Introduction
This workshop describes how to deploy a reliable, highly available and production-ready Kubernetes cluster on AWS with Terraform and KOPS.
Amazon EKS is the default go-to solution for Kubernetes on AWS. It simplifies Kubernetes cluster deployment by taking away the hassle of maintaining a master control plane. It leaves worker node provisioning to you, which is simplified by Amazon EKS pre-configured Amazon Machine Images (AMIs).
But sometimes the default settings are not enough for a particular solution.
- For example, Amazon EKS does not allow custom settings on its control plane, so if that’s something you need, you’ll have to consider a self-hosted solution for your Kubernetes cluster.
- Another case is deploying Pods (groups of one or more containers with shared storage/network, and a specification for how to run the containers) on the master nodes. This capability is locked in Amazon EKS. For certain add-ons that can be installed in Kubernetes, like KIAM (a tool that provides IAM credentials to Pods for target IAM roles), a master/worker Pod setup is necessary. Since you can’t run Pods on the master nodes in Amazon EKS, you’ll need to provision some extra nodes in your cluster to simulate a master role for this type of add-on.
If you find yourself in any of the above scenarios, I recommend that you deploy a self-managed Kubernetes cluster on AWS using Terraform and kops.
In this tutorial, we will deploy the following architecture using Terraform and KOPS. You use Terraform to create and manage the shared VPC resources, the KOPS resources and the application environment (AWS ECR, AWS ACM, AWS Route 53, …). You use KOPS as the mechanism to install and manage the K8S cluster in AWS. You use kubectl to test and manage the K8S application deployment.
Architecture

The following table maps most of the AWS products used in this tutorial to their use case in the K8S cluster.

| Service | Use case |
|---|---|
| VPC | Provision a logically isolated section of the AWS Cloud |
| ACM | AWS Certificate Manager: provision SSL/TLS certificates |
| ECR | Amazon ECR is a fully managed Docker container registry |
| Kubernetes Server | Bastion host used to SSH into the private K8S cluster |
| K8S Worker Nodes | Worker machines, part of the Kubernetes cluster |
| Internet Gateway | Allows communication between VPC instances and the internet |
| Route 53 | Amazon Route 53 is a cloud DNS service |
| S3 | Amazon Simple Storage Service is an object storage service |
Assumptions and Prerequisites
- You have basic knowledge of AWS
- You have basic knowledge of Kubernetes
- You have basic knowledge of Terraform
- You have Terraform v0.12.x / v0.11.x installed on your machine
- You have kubectl and KOPS installed on your machine
- You have an AWS account, with an IAM key pair that has owner access to your AWS environment
Objectives
This guide walks you through the following tasks:
- ✅ Use Terraform and KOPS to create a Kubernetes cluster
- ✅ Create a bastion machine to manage your cluster masters/nodes
- ✅ Deploy a sample Kubernetes application in the created cluster
- ✅ Learn and use HCL (HashiCorp Configuration Language), Terraform and KOPS best practices
Software Dependencies
- Terraform v0.12.x and v0.11.x (you can use the Terraform Switcher tool)
- Terraform AWS provider plugin v2.57
- KOPS
- Kubectl
- jq
What is out-of-scope
This is not a tutorial on Terraform, but even without knowing it you should still be able to understand most of it. You can learn the basics in my previous blog post on Azure AKS.
We will also not dive deep into Kubernetes; we will limit ourselves to creating the cluster.
Before you begin
Set up the Terraform state environment
In order to deploy the Kubernetes cluster on AWS with Terraform and KOPS, we first need to create 2 resources:
- An S3 bucket (in our tutorial it will be named terraform-eks-dev; I recommend enabling versioning)
- A DynamoDB table (in our tutorial it will be named terraform-state-lock)
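These two resources can be created from the AWS console or CLI. If you prefer to bootstrap them with Terraform as well, a minimal sketch (not part of the repository; the region is an assumption) could look like this:

```hcl
# Bootstrap resources for the Terraform remote state (sketch, run once with local state).
provider "aws" {
  region = "eu-central-1" # assumption: same region as the rest of the tutorial
}

resource "aws_s3_bucket" "terraform_state" {
  bucket = "terraform-eks-dev"

  versioning {
    enabled = true # recommended, so previous state versions can be recovered
  }
}

resource "aws_dynamodb_table" "terraform_state_lock" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID" # the S3 backend expects this exact attribute name for locking

  attribute {
    name = "LockID"
    type = "S"
  }
}
```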
Configuring AWS
In order to follow best practices, let’s create a dedicated user for Terraform. Go to your AWS console and create the terraform_user user:

Give it the appropriate rights. In my example, I need Terraform to be able to manage all of my AWS Cloud resources:

Don’t forget to store the AWS access key ID and secret access key. Next, copy them into your AWS credentials file; you can also execute $ aws configure to add the new profile.
[terraform_user]
aws_access_key_id = xxxxxxxxxxxxxxxxxxx
aws_secret_access_key = xxx/xxxxxxxxxxxxx/xxxx
Technical setup of our cluster
AWS
We are going to create a Kubernetes cluster inside a private VPC (which we will create using Terraform) in the Frankfurt region (eu-central-1).
This VPC will have 3 private and 3 public subnets (one per Availability zone).
For our private subnets we will have only 1 NAT gateway (to keep costs down).
Kubernetes
Our Kubernetes cluster will run in a private topology (i.e. in private subnets).
The Kubernetes API (running on the master nodes) will only be accessible through a load balancer (created by kops).
The nodes won’t be internet-accessible by default, but using a bastion host we will be able to SSH into them.
The following setup is “prod” ready: we will have 3 masters (one per Availability Zone) and 2 nodes.
Kubernetes imposes the following fundamental requirements (shamefully stolen from here):
- All containers can communicate with all other containers without NAT
- All nodes can communicate with all containers (and vice-versa) without NAT
- The IP address that a container sees itself as is the same IP address that others see it as
So in AWS we need to choose a network plugin. Here we will use the amazon-vpc-cni-k8s plugin. It is the recommended plugin and it’s maintained by AWS.
To deploy a K8S cluster with Terraform and KOPS, the first step is to obtain the source code from my GitHub repository.
This will clone the sample repository and make it the current directory:
$ git clone https://github.com/AymenSegni/aws-eks-cluster-tf-kops.git
$ cd aws-eks-cluster-tf-kops
At the end, our project directory tree will look like this:

Project structure
1- Terraform config directory: /terraform
a- modules: the Terraform modules (generic, re-usable functions) of this layout. In this lab we have 4 modules:
– shared_vpc: defines the shared VPC resources
– kops_resources: the AWS resources needed to run the KOPS configs
– ecr: creates an AWS ECR repository used to store the Docker images needed to deploy the Kubernetes application later
– app_env: hosts the Terraform configs necessary to create the Route 53 DNS records and the ACM SSL certificate for the Kubernetes application deployment
b- deployment: the root Terraform function of the layout, responsible for the K8S cluster deployment on AWS. It contains:
– main.tf: defines the Terraform modules already created in the /modules sub-folders, with the appropriate inputs defined in variables.tf or in a terraform.tfvars file (which is not covered in this guide)
– provider.tf: defines the AWS provider configuration used by Terraform, including the version, the main deployment region and the AWS technical user (terraform_user)
– backend.tf: defines the S3 bucket and the DynamoDB table that manage the Terraform state file
2- KOPS config directory: /kops
– template.yaml: templates the K8S cluster creation in AWS
The rest of the KOPS config files are auto-generated by the KOPS CLI; only the template file is needed.
3- Kubernetes application deployment config directory: /k8s-deployment
a- /src: holds the application source code, the nginx config file and the Dockerfile
b- /deploy-app: manages the K8S application deployment and the service definition
As you can see, our Terraform and kops configurations are separated. This is because the kops configuration files are fully managed by kops, and modifications to them are not persisted between kops runs.
Terraform deployment setup
To deploy the Kubernetes cluster on AWS with Terraform and KOPS, we need to set up our Terraform deployment in the root function at /terraform/deployment.
At this stage, we must define the provider and the backend configs as follows:
1- provider.tf
2- backend.tf
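The exact files live in the repository; as a rough sketch (the state key and profile name are assumptions), they look something like this:

```hcl
# provider.tf -- AWS provider configuration
provider "aws" {
  version = "~> 2.57"
  region  = "eu-central-1"
  profile = "terraform_user" # the technical user created earlier
}

# backend.tf -- remote state in S3, locked with DynamoDB
terraform {
  backend "s3" {
    bucket         = "terraform-eks-dev"
    key            = "deployment/terraform.tfstate" # assumption: any unique key works
    region         = "eu-central-1"
    dynamodb_table = "terraform-state-lock"
  }
}
```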
Stay tuned, in the next section, we’re going to talk about how to create shared AWS VPC resources using Terraform modules.
Shared VPC resources
We need to set up some Terraform resources that will be used by kops to deploy the K8S cluster, but that could also be used by other things.
We will use the very good terraform-aws-vpc module to avoid having to set up each resource individually.
But first, we need to define the generic TF module terraform/modules/shared_vpc that will be used throughout the whole tutorial.
Our VPC will be on 10.0.0.0/16, with a separation of private and public subnets.
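A sketch of what the shared_vpc module can look like, built on terraform-aws-vpc (the variable names, CIDR split and cluster name handling are illustrative; the repository holds the exact values):

```hcl
# terraform/modules/shared_vpc/main.tf (sketch)
variable "vpc_name" {}
variable "cluster_name" {} # used to build the tags kops expects

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 2.0"

  name = var.vpc_name
  cidr = "10.0.0.0/16"

  azs             = ["eu-central-1a", "eu-central-1b", "eu-central-1c"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

  enable_nat_gateway = true
  single_nat_gateway = true # only one NAT gateway, to keep costs down

  # Tags that let kops recognize the VPC and subnets it is allowed to use
  tags = {
    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
  }
  public_subnet_tags = {
    "kubernetes.io/role/elb" = "1"
  }
  private_subnet_tags = {
    "kubernetes.io/role/internal-elb" = "1"
  }
}
```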
As you can see we are applying some specific tags to our AWS subnets so that kops can recognize them.
Now let’s actually apply this configuration to our AWS account. Navigate to the root deployment folder, where the vpc module deployment is defined.
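In terraform/deployment/main.tf, the call to this module is only a few lines; roughly (variable names are assumptions):

```hcl
# terraform/deployment/main.tf (excerpt, sketch)
module "shared_vpc" {
  source = "../modules/shared_vpc"

  vpc_name     = var.vpc_name
  cluster_name = var.cluster_name
}
```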
To run this deployment, let’s define some global variables in terraform.tfvars.
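For example (illustrative values; the cluster name depends on your own domain):

```hcl
# terraform/deployment/terraform.tfvars (sketch)
vpc_name     = "krypton-vpc"
cluster_name = "aymen.krypton.berlin" # illustrative, use your own cluster/domain name
```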
KOPS AWS resources
Let’s also create an S3 bucket (with versioning enabled) where kops will save the configuration of our cluster, and a security group to whitelist IP access to the Kubernetes API.
In our project layout, the kops resources are defined in the Terraform module /terraform/modules/kops_resources.
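A sketch of that module (the bucket name, allowed CIDRs and variable names are assumptions):

```hcl
# terraform/modules/kops_resources/main.tf (sketch)
variable "kops_state_bucket_name" {}
variable "vpc_id" {}
variable "admin_cidrs" {
  description = "IP ranges allowed to reach the Kubernetes API"
  type        = list(string)
}

# Bucket where kops stores the cluster state/configuration
resource "aws_s3_bucket" "kops_state" {
  bucket = var.kops_state_bucket_name

  versioning {
    enabled = true
  }
}

# Security group whitelisting access to the Kubernetes API load balancer
resource "aws_security_group" "k8s_api_access" {
  name   = "k8s-api-access"
  vpc_id = var.vpc_id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = var.admin_cidrs
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```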
Output
The output we define below will be used by kops to configure and create our cluster.
terraform/deployment/output.tf:
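The cluster_name and kops_s3_bucket_name outputs are referenced by name in the kops commands later on; the remaining outputs below are an illustrative subset and assume the module layout sketched earlier (the shared_vpc module is assumed to re-export the terraform-aws-vpc outputs):

```hcl
# terraform/deployment/output.tf (sketch)
output "cluster_name" {
  value = var.cluster_name
}

output "kops_s3_bucket_name" {
  value = module.kops_resources.kops_state_bucket_name
}

output "vpc_id" {
  value = module.shared_vpc.vpc_id
}

output "public_subnet_ids" {
  value = module.shared_vpc.public_subnet_ids
}

output "private_subnet_ids" {
  value = module.shared_vpc.private_subnet_ids
}
```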
Finally, we can now run the Terraform magic (if you use my code, don’t forget to comment out all the other resources and keep only the vpc and kops_resources modules in main.tf).
$ cd /terraform/deployment
$ terraform init
$ terraform plan
$ terraform apply
Examples of the $ terraform plan output:


Example of the $ terraform apply output:

Bingo, our shared resources are done! ✅ We can verify in the AWS console that the krypton-vpc is created and available:

Kops: Deploy Kubernetes cluster on AWS with Terraform and KOPS
kops/template.yaml
The above template will be used by the kops templating tool to create a cluster, with:
- 3 masters, each in a different Availability Zone
- 2 nodes
- 1 bastion to have SSH access to any node of our cluster (master and nodes)
Using it: the KOPS magic
We are going to use our previous Terraform output as values for the template (run this in the kops/ directory).
$ TF_OUTPUT=$(cd ../terraform/deployment && terraform output -json)
$ CLUSTER_NAME="$(echo ${TF_OUTPUT} | jq -r .cluster_name.value)"
$ kops toolbox template --name ${CLUSTER_NAME} --values <(echo ${TF_OUTPUT}) --template template.yaml --format-yaml > cluster.yaml
Now cluster.yaml contains the real cluster definition. We are going to put it in the kops state S3 bucket.
$ STATE="s3://$(echo ${TF_OUTPUT} | jq -r .kops_s3_bucket_name.value)"
$ kops replace -f cluster.yaml --state ${STATE} --name ${CLUSTER_NAME} --force
$ kops create secret --state ${STATE} --name ${CLUSTER_NAME} sshpublickey admin -i ~/.ssh/id_rsa.pub
The last command will use your public key in ~/.ssh/id_rsa.pub to allow you to access the bastion host.
Now that the kops state has been updated, we can use it to generate the Terraform files that will represent our cluster.
$ kops update cluster \
--out=. \
--target=terraform \
--state ${STATE} \
--name ${CLUSTER_NAME}
And let’s deploy it on AWS 😃
Oops 😬, one thing is missing: KOPS is not yet compatible with Terraform 0.12.x. You must downgrade to Terraform v0.11.x in your terminal before continuing (pro tip: you can use the Terraform Switcher tool 😊).
$ terraform init
$ terraform plan
$ terraform apply
Congratulations! 🎉 Our cluster is deployed on AWS, with the bastion server, the load balancer and all the desired resources.

Wrapping up
You should now have a cluster with multiple nodes and multiple masters, running on a VPC you control outside of kops.
This cluster uses the AWS VPC CNI plugin (amazon-vpc-cni-k8s), so pod networking uses the native network of AWS.
You should be able to see all your nodes by running the following (don’t forget to add your public IP to the cluster security group):
$ kubectl get nodes

You also have a bastion host to connect to your cluster VMs 😄
Deploy a Kubernetes Application to the cluster
Now it’s time to deploy a simple application to our cluster. It’s just going to be a simple nginx server serving an index.html. We will create an ECR repository through Terraform, create a container image serving the index.html (based on a standard nginx container image), build it and push it to the newly created repository.
Create ECR repository with Terraform
As usual, we follow Terraform best practices and use Terraform modules to provision our cloud resources; we defined the Terraform configuration of the ECR service in /terraform/modules/ecr.
ecr.tf module example:
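A minimal sketch of that module (the variable name is an assumption):

```hcl
# terraform/modules/ecr/ecr.tf (sketch)
variable "repository_name" {}

resource "aws_ecr_repository" "app" {
  name = var.repository_name
}

output "repository_url" {
  value = aws_ecr_repository.app.repository_url # used later to tag and push the image
}
```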
Let’s deploy the AWS ECR resource using the root deployment folder /terraform/deployment (you can just uncomment the corresponding code section).
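The corresponding call in the root deployment is just a module block along these lines (the repository name is illustrative):

```hcl
# terraform/deployment/main.tf (excerpt, sketch)
module "ecr" {
  source          = "../modules/ecr"
  repository_name = "hello-nginx" # illustrative image/repository name
}
```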
Now let’s let Terraform handle the rest (don’t forget to upgrade back to Terraform v0.12.x again 😉):
$ terraform init
$ terraform plan
$ terraform apply
Example of the Terraform apply execution output:

Build and Push The Docker Image
The source code already contains the Dockerfile needed for the application. Build, tag and push the image using the commands below (you can find the push commands in the ECR service console):
$ cd /k8s-deployment/src && docker build -t <image_name> .
$ docker tag <image_name>:latest <ecr_uri>/<image_name>:latest
$ docker push <ecr_uri>/<image_name>:latest
Deploy and expose the application in K8S cluster
After that, we will deploy an example application to our Kubernetes cluster and expose the website to the outside world. We will use a public load balancer for that, and the result should be reachable from the internet via hello.aymen.krypton.berlin (feel free to use your own domain name).
Since we want to expose our website securely, we need to get a valid SSL certificate from ACM (we will use Terraform for that) and attach it to the load balancer.
The following steps show you how to create a sample application and then expose it with a Kubernetes LoadBalancer Service:
Create a sample application
1. To create a sample NGINX deployment, run the following commands:
$ cd k8s-deployment/deploy-app
$ kubectl apply -f deployment.yaml
Create a LoadBalancer service
1. To create a LoadBalancer service, we create a file called service.yaml and set its type to LoadBalancer. See the following example:
apiVersion: v1
kind: Service
metadata:
  name: aymen-krypton
spec:
  type: LoadBalancer
  selector:
    app: aymen-krypton
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
To apply the LoadBalancer service, run the following command:
$ kubectl create -f service.yaml
Verify the Deployment
To verify the application deployment, you can run the following kubectl CLI commands:
|--⫸ kubectl get pods
NAME READY STATUS RESTARTS AGE
aymen-krypton-7dc69c7d7d-5bp4w 1/1 Running 0 25h
aymen-krypton-7dc69c7d7d-gx87l 1/1 Running 0 25h
aymen-krypton-7dc69c7d7d-mvrbx 1/1 Running 0 25h
|--⫸ kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
aymen-krypton LoadBalancer 10.0.3.6 afe022044489a44d8ae4a47c6f43c44c-2036026668.eu-central-1.elb.amazonaws.com 80:30770/TCP 25h
kubernetes ClusterIP 10.0.0.1 <none> 443/TCP 40h
Create a DNS record and generate a valid SSL certificate
In order to finalise the deployment of the Kubernetes cluster on AWS using Terraform and KOPS, we should create a DNS record for our deployed application using the AWS Route 53 service, then generate a valid SSL certificate and attach it to the application load balancer. To do all of that we will use Terraform, of course 😍
As with the other modules, we defined the Terraform configuration of the application environment resources in /terraform/modules/app_env, as follows:
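As a sketch, the module requests an ACM certificate for the application hostname, validates it through DNS and creates a Route 53 record pointing that hostname at the load balancer created by the Kubernetes service (the zone, record and variable names below are assumptions):

```hcl
# terraform/modules/app_env/main.tf (sketch)
variable "zone_name" {}   # e.g. "krypton.berlin"
variable "record_name" {} # e.g. "hello.aymen.krypton.berlin"
variable "lb_dns_name" {} # the EXTERNAL-IP hostname of the LoadBalancer service

data "aws_route53_zone" "app" {
  name = var.zone_name
}

# CNAME pointing the application hostname at the Kubernetes load balancer
resource "aws_route53_record" "app" {
  zone_id = data.aws_route53_zone.app.zone_id
  name    = var.record_name
  type    = "CNAME"
  ttl     = 300
  records = [var.lb_dns_name]
}

# SSL certificate for the application hostname, validated through DNS
resource "aws_acm_certificate" "app" {
  domain_name       = var.record_name
  validation_method = "DNS"
}

resource "aws_route53_record" "cert_validation" {
  zone_id = data.aws_route53_zone.app.zone_id
  name    = aws_acm_certificate.app.domain_validation_options[0].resource_record_name
  type    = aws_acm_certificate.app.domain_validation_options[0].resource_record_type
  ttl     = 300
  records = [aws_acm_certificate.app.domain_validation_options[0].resource_record_value]
}

resource "aws_acm_certificate_validation" "app" {
  certificate_arn         = aws_acm_certificate.app.arn
  validation_record_fqdns = [aws_route53_record.cert_validation.fqdn]
}
```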
Now, let’s deploy the AWS ACM and Route 53 resources using the root deployment folder /terraform/deployment.
Now let’s let Terraform handle the rest:
$ terraform init
$ terraform plan
$ terraform apply
Explore the Application
Excited to see the results of this long journey 😄? So am I.
Let’s open a web browser and navigate to hello.aymen.krypton.berlin:

Bingo 🥳 Congratulations! Our application has been successfully deployed on our Kubernetes cluster!
Updates and Clean Up
If you make changes to your code, running the plan and apply commands again will let Terraform use its knowledge of the deployed resources (.tfstate) to calculate what changes need to be made, whether building or destroying. Finally, when you want to bring down your infrastructure, simply issue a $ terraform destroy command and down it comes.
That’s all folks!
That’s all for this lab, thanks for reading 🙏
Later posts may cover how to deploy a configured cluster with hundreds of microservices deployed in one click!