Scaling Terraform agents using the Terraform Operator on Amazon EKS Auto Mode

Learn how to deploy and autoscale HCP Terraform agents using the Terraform Operator on EKS Auto Mode, enabling efficient resource utilization and automated capacity management for platform teams.

Sep 15, 2025

HashiCorp Cloud Platform (HCP) Terraform provides organizations with a managed service for infrastructure automation, offering multiple execution mode options to suit different requirements. While many teams can effectively use HCP Terraform with standard remote execution workflows, certain scenarios, such as managing resources within private networks, call for HCP Terraform Agents.

HCP Terraform Agents are self-hosted execution environments that run Terraform operations locally while maintaining seamless integration with the HCP Terraform service. They provide flexibility for organizations that need to maintain control over the execution environment while still benefiting from HCP Terraform’s collaborative features and workflow management.

As organizations that utilize agents scale their infrastructure deployments, managing these HCP Terraform Agents becomes increasingly complex. Platform teams face the challenging task of striking a balance between ensuring sufficient agent capacity for peak workloads and avoiding resource waste during quiet periods. Manual scaling processes are time-consuming and error-prone, often resulting in either costly over-provisioning or performance-limiting under-provisioning. This problem intensifies in environments where multiple teams share Terraform resources, resulting in unpredictable demand fluctuations. In this blog post, we’ll demonstrate how combining the HCP Terraform Operator with Amazon EKS Auto Mode creates an intelligent, self-adjusting agent scaling solution that responds dynamically to workload demands, eliminating manual intervention while optimizing both performance and cost.

The HCP Terraform Operator advantage for agent scaling

The HCP Terraform Operator for Kubernetes simplifies deploying and managing Terraform agents by handling their lifecycle through Kubernetes custom resources. While the basic deployment of agents is straightforward, the operator’s autoscaling capability is particularly valuable for platform teams.

The most significant advantage is the operator’s ability to automatically scale the number of agents based on pending Terraform workloads. This ensures:

Sufficient capacity during high-demand periods
Resource conservation during low utilization
Elimination of manual scaling operations

Without autoscaling, platform teams face challenges:

Over-provisioning agents “just in case,” wasting resources
Under-provisioning agents leading to execution delays
Manual intervention to adjust capacity as needs change

Why EKS Auto Mode with HCP Terraform Operator to manage Terraform agents

EKS Auto Mode extends AWS management beyond the Kubernetes control plane to include the underlying infrastructure that enables the smooth operation of your workloads. For platform teams managing Terraform deployments, this brings several important advantages:

Streamlined operations: EKS Auto Mode automates cluster node provisioning, scaling, security patching, and upgrades. This reduces the operational burden on platform teams who need to focus on delivering infrastructure as a service rather than maintaining Kubernetes clusters.
Dynamic scalability with Karpenter: EKS Auto Mode leverages Karpenter for intelligent node scaling, which is ideal for Terraform agent workloads. Unlike traditional cluster autoscaling solutions, Karpenter observes unschedulable pods and makes scheduling decisions based on actual workload requirements. When Terraform runs spike, Karpenter launches new nodes to run the new Terraform agent pods. Similarly, when there is a decrease in runs, Karpenter disrupts and terminates nodes when they are no longer needed, optimizing resource usage.
Enhanced security: EKS Auto Mode uses immutable AMIs with read-only root filesystems and SELinux mandatory access controls. Nodes are automatically replaced within a maximum 21-day lifecycle (which you can reduce).
Cost optimization: EKS Auto Mode consolidates workloads onto fewer nodes when possible and terminates unused instances, helping control costs for your agent infrastructure.

Unlike EKS with Fargate, EKS Auto Mode is fully Kubernetes conformant and supports all EC2 runtime purchase options, including Spot instances. This gives you flexibility to optimize your Terraform agent infrastructure based on your specific workload characteristics and cost requirements.

Let’s examine how to implement autoscaling with the Terraform Operator on EKS Auto Mode.

Deploying the HCP Terraform Operator on EKS Auto Mode

To demonstrate this deployment pattern, I’ve created a GitHub repository with complete Terraform configurations that you can use to follow along. The repository contains modular configurations for each component of the solution. The solution deployed is shown in the architecture diagram. Note that this code is for demonstration purposes only and should not be used in production environments.

Create an EKS Auto Mode cluster using Terraform.

The code for this section is available here. Follow the instructions in the link to configure your AWS credentials and apply the Terraform configuration. Note that running this demo will incur cloud service costs. Ensure you understand the pricing implications before proceeding.

When the EKS cluster is first created, no pods are running.

Since there are no deployments or unscheduled pods, no compute nodes are provisioned.

Create HCP Terraform Team and Token

This section follows the guidance in the HashiCorp Terraform Operator v2 Agent Pool tutorial. The code for this section is available here. Follow the instructions in the linked code to configure your HCP credentials and apply the Terraform configuration.

This creates a Terraform team named k8sop-agents with permissions to manage agent pools and read workspaces. These permissions are necessary because the Agent Pool Custom Resource Definition (CRD) requires the ability to create agent pools and detect pending runs in Terraform workspaces to effectively scale agents within the pool.

A team token is also created.

Create the Terraform Operator

The code for this section is available here. Follow the instructions in the link to create the operator within the EKS Auto Mode cluster.

The EKS Cluster now has 2 Terraform Operator pods running.

A node is automatically provisioned by the EKS cluster as the hosting environment for the pods. This node is managed by the system built-in node pool. Refer to the Amazon EKS documentation for running critical add-ons on dedicated instances for more information about scheduling pods onto the system node pool.

EKS cluster compute configuration shows a node created

Viewing the node capacity allocation shows the 2 pods and their memory reservation.

Create the Terraform Agent Pool CRD

The code for this section is available here. Follow the instructions to create the operator agent pool CRD. For this demo, the Terraform agent pool is configured with minReplicas as 5 and maxReplicas as 50. These settings are compatible with agents running on HCP Terraform Premium, which supports up to 300 self-hosted agents. If you are using HCP Terraform Essentials or Standard, you should modify these values to align with your plan’s agent limits.

Note that beyond autoscaling capabilities, the operator also automatically configures agent pods with a default 15-minute graceful termination period, allowing active Terraform runs to complete before the agents are shut down. This prevents interruption of long-running infrastructure operations and ensures data integrity. You can customize this termination grace period by setting terminationGracePeriodSeconds in your AgentPool specification if your typical Terraform runs require more or less time to complete safely.

5 Terraform agent pods prefixed with agents-of-demo-pool are created in the EKS cluster. This is in addition to the original 2 HCP Terraform operator pods prefixed with demo-hcp-terraform-operator.

When the agent pods are created, EKS Auto Mode provisions a new node within the cluster to accommodate them, as the Terraform agent pods are scheduled onto the general-purpose built-in node pool.

EKS cluster compute configuration shows a general-purpose node created

Viewing the node capacity allocation shows the 5 Terraform agent pods and their memory reservation.

EKS cluster node capacity allocation shows the Terraform agents

A Terraform agent pool is also created with the 5 agents registered and in Idle state.

Testing with workspaces

With both the operator and agent pool successfully deployed into the environment, the next step involves testing how the Terraform Cloud agents scale out and in based on workload. The test setup, available here, creates a GitHub repository containing simple Terraform configuration files and automatically provisions 50 Terraform workspaces. Each workspace is connected to this repository through VCS integration, enabling automatic plan and apply operations when code changes are pushed. All workspaces are specifically configured to use agent execution mode and are linked to the agent pool previously created using the Terraform agent pool CRD.

GitHub repo and workspaces

GitHub repository created with simple Terraform resources.

GitHub repository with simple Terraform resources

50 Terraform workspaces are created. Some workspace runs are picked up by the 5 pre-existing idle agents. The rest of the workspaces are queued while the operator detects that workspaces are pending and begins to scale out the Terraform agents. The logic for how the agent pool controller scales is available in the hcp-terraform-operator GitHub repo.

Terraform Operator scaling out

The controller detects that there are pending workspace runs and scales out the number of agents. At this point, 50 agents match the maxReplicas argument that was specified in the Terraform Operator Agent Pool configuration.

EKS Cluster has 52 pods running. There are 2 pods for the Terraform Operator and 50 pods for the Terraform agent.

EKS cluster shows 50 agent pods and 2 operator pods

Karpenter automatically scales out from 1 to 2 nodes that are managed by the general-purpose node pool to accommodate the large increase in Terraform agent pods.

Karpenter scales out the number of nodes managed by the general-purpose node pool

44 of the new Terraform agent pods are scheduled on the newly launched r5a.xlarge node.

The original node has the original 5 Terraform agent pods and 1 new Terraform agent pod.

Terraform Operator scaling in

Eventually, all the workspaces are applied.

Terraform agents scale to 5 agents. This matches the minReplicas parameter that was specified in the Terraform Operator Agent Pool configuration.

EKS cluster shows that the 5 Terraform agent pods and 2 Terraform Operator pods are running.

EKS cluster pods show the 5 Terraform agent pods and 2 Terraform Operator pods

After some time with no pending workspace runs or unscheduled pods, Karpenter automatically consolidates resources by scaling in the the number of nodes.

The remaining general-purpose node now shows only 5 Terraform agent pods.

EKS cluster general-purpose node shows 5 agent pods

Next steps

The combination of EKS Auto Mode and the HCP Terraform Operator creates a powerful solution for dynamically scaling Terraform agents. This approach delivers significant benefits for platform teams managing infrastructure at scale, from operational efficiency and streamlined workflows to enhanced security and consistent performance. By automating the provisioning, scaling, and lifecycle management of agent infrastructure, teams can focus on strategic infrastructure delivery rather than managing the underlying components.

As demonstrated in this blog, the solution automatically adapts to changing workload demands, spinning up additional capacity when Terraform runs increase and consolidating resources when demand decreases. This ensures your organization always has the right amount of capacity for your Terraform operations without manual intervention.

Get started with Terraform agents on EKS Auto Mode by following these additional resources:

HCP Terraform Operator for Kubernetes documentation
Amazon EKS Auto Mode documentation
Try the HCP Terraform Agent tutorial to learn more about agent capabilities

Note that EKS Auto Mode prices are in addition to the Amazon EC2 instance price, which covers the EC2 instances themselves. Like EC2 instance charges, EKS Auto Mode charges are billed per second, with a one-minute minimum. Refer to the EKS pricing page for more information.

HashiCorp Solutions Engineering Blog

Discussion about this post