In a dynamic world, your Kubernetes security infrastructure should also be dynamic. Many teams add security features to their Kubernetes applications only after the applications are deployed or ready to be deployed. No wonder 55% of application rollouts were delayed, and 94% experienced a security incident last year.
Kubernetes offers several security mechanisms that can be provisioned, such as NetworkPolicies and PodSecurityPolicies. Kubernetes NetworkPolicies act like a basic firewall by controlling the traffic to and from each pod. Calico NetworkPolicies extend Kubernetes network policies with additional security features such as referencing sets of IP subnetworks, creating non-namespaced GlobalNetworkPolicies, and allowing or denying access to external services.
Manually managing the sets of external IP addresses used by Kubernetes deployments is complex. In an environment the size of Taboola’s, it’s impractical and takes too much time. It can also introduce additional risks to the Kubernetes cluster.
We wanted to see if there was a way we could sync our Kubernetes NetworkPolicies dynamically with tools we already use. Keep reading to see how we use Consul and Calico to do it.
Dynamic security equals scalability
For the Taboola Engineering team, scalability is a high priority. We manage seven Kubernetes clusters with more than 100,000 cores on-premises. Plus, we have more than 6,000 physical nodes and VMs that run other workloads. Running infrastructure of this size means we’re making dozens of changes per day, sometimes even dozens per hour.
To keep the k8s workloads isolated and operational, we’ve got to update the NetworkPolicies regularly. Every deployment, auto-scaling, and maintenance task can lead to an IP address update in the NetworkPolicy. That’s impractical to do manually with such a large Kubernetes deployment.
We knew there had to be a way to do this dynamically, so we started investigating.
Searching for the best dynamic security option
In our infrastructure, we use Consul to discover services running outside of k8s.
We had three critical requirements to use Consul and Calico to dynamically enforce security:
- If an IP address is added to a Consul service, it should be added automatically to Calico GlobalNetworkSet.
- If an IP address is deleted from a Consul service, it should be removed from Calico GlobalNetworkSet only after a defined grace period, so that a flapping node does not trigger repeated removals and re-additions.
- The update process should not overload the Calico datastore or the Consul API.
With our requirements set, we looked at three options to enable Kubernetes security with Consul and Calico.
Option 1: A bash script and cron job combination
We could run a bash script in a k8s cron job that uses the curl and calicoctl command line tools to fetch the Consul service catalog and update Calico GlobalNetworkSet.
Option 2: A Consul template and Calico command line combination
We could create a Consul template for each Calico GlobalNetworkSet that would apply the updated GlobalNetworkSet whenever a Consul change was made.
Option 3: A sync process using Consul and Calico APIs
We could use the Consul API to watch for service changes and update Calico GlobalNetworkSet directly with the libcalico-go library.
The Winner: Option 3
We rejected the first two options because they didn’t satisfy all our requirements. Neither can delete an IP address from Calico after a grace period. They could also overload Consul with too much API traffic.
We discovered that running a Consul to Calico sync process was the winning option to enforce dynamic Kubernetes security. It allowed us to add IP addresses dynamically while giving us the ability to add a grace period for all removals/deletions. It prevents bottlenecks on the API and scales easily, no matter the size of the Kubernetes deployment.
To use this process, we run a blocking query for each Consul service configured for the operator. The process stores an updated list of IPs from both Consul and Calico. On every event from Consul, it compares the two IP lists by getting the GlobalNetworkSet from Calico and the specific service catalog from Consul.
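As a rough illustration of the blocking-query step, here is a minimal Python sketch against Consul's HTTP catalog endpoint. It is not the Consul2Calico implementation. The agent address, the 5-minute wait, and the helper names (`blocking_url`, `extract_ips`, `watch_service`) are assumptions made for this example.

```python
import json
import urllib.parse
import urllib.request

CONSUL_ADDR = "http://127.0.0.1:8500"  # assumption: a local Consul agent


def blocking_url(service, index, wait="5m", base=CONSUL_ADDR):
    """Build a catalog URL; a non-zero index makes Consul hold the request
    until the service changes or the wait time expires."""
    query = urllib.parse.urlencode({"index": index, "wait": wait})
    return f"{base}/v1/catalog/service/{urllib.parse.quote(service)}?{query}"


def extract_ips(entries):
    """Collect one IP per catalog entry, preferring the service address
    and falling back to the node address when it is empty."""
    return {e.get("ServiceAddress") or e["Address"] for e in entries}


def watch_service(service, on_change):
    """Loop forever, calling on_change(ips) whenever the catalog index moves."""
    index = 0
    while True:
        url = blocking_url(service, index)
        # Client timeout is longer than the server-side wait plus jitter.
        with urllib.request.urlopen(url, timeout=330) as resp:
            new_index = int(resp.headers["X-Consul-Index"])
            entries = json.load(resp)
        if new_index < index:
            index = 0  # index went backwards: start over from scratch
            continue
        if new_index != index:
            index = new_index
            on_change(extract_ips(entries))
```

Because each request blocks until something changes, an idle service costs one long-lived HTTP call per wait window rather than a steady stream of polls, which is what keeps the load on the Consul API low.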
Then, before it makes any changes in Calico, the process will check:
- If the IP address exists in Consul but not in Calico, it will add the IP to GlobalNetworkSet.
- If the IP address exists in Calico but not in Consul, it will wait 30 minutes before checking again. If the IP is still not in Consul after those 30 minutes, it will delete it from GlobalNetworkSet.
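The two checks above can be sketched as a single pure function. This is an illustrative Python sketch, not the actual Consul2Calico code: the `reconcile` name, its arguments, and the `missing_since` bookkeeping are assumptions chosen to show how immediate additions and delayed removals fit together.

```python
import time

GRACE_SECONDS = 30 * 60  # the 30-minute grace period described above


def reconcile(consul_ips, calico_ips, missing_since, now=None, grace=GRACE_SECONDS):
    """Return (desired_calico_ips, updated_missing_since).

    missing_since maps an IP to the time it was first seen absent from Consul.
    """
    now = time.time() if now is None else now
    consul_ips, calico_ips = set(consul_ips), set(calico_ips)

    # IPs present in Consul but not in Calico are added immediately.
    desired = calico_ips | consul_ips

    pending = {}
    for ip in calico_ips - consul_ips:
        first_missing = missing_since.get(ip, now)
        if now - first_missing >= grace:
            desired.discard(ip)          # still missing after the grace period
        else:
            pending[ip] = first_missing  # keep the IP and keep waiting
    # IPs that reappeared in Consul are not copied into pending,
    # so a flapping node resets its own timer.
    return desired, pending
```

Calling the function on every Consul event, and again when a grace period is due to expire, gives the behavior described above: the caller writes `desired` to the GlobalNetworkSet only when it differs from what Calico already holds.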
Real-time scenarios: How we’re using dynamic Kubernetes security
The dynamic approach we use eliminates manual operations and reduces human error while improving agility, scalability, and delivery times. Here are two scenarios that often happen at Taboola.
Scenario 1: Scraping pods from an external Prometheus
We use Prometheus outside our clusters to collect metrics from our pods. To do this, we need a secure way to allow incoming Prometheus traffic to scrape the servers for the data.
We configure the new consul-calico-sync process for the Prometheus Consul service and the correlated GlobalNetworkSet.
Now, any time the service catalog changes, the relevant GlobalNetworkSet is updated dynamically, and the GlobalNetworkPolicy that references it lets the Prometheus server successfully scrape the pods.
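To make the synced object more concrete, the following Python sketch builds a Calico v3 GlobalNetworkSet manifest from a set of IPs. The resource name, the `role: prometheus` label, and the `global_network_set` helper are hypothetical values for this example; only the manifest fields (`apiVersion`, `kind`, `metadata`, `spec.nets`) follow the Calico resource format.

```python
import ipaddress
import json


def global_network_set(name, ips, labels):
    """Build a Calico v3 GlobalNetworkSet manifest from a set of IPs."""
    # ip_network turns a bare address into a single-host CIDR (e.g. /32).
    nets = sorted(str(ipaddress.ip_network(ip)) for ip in ips)
    return {
        "apiVersion": "projectcalico.org/v3",
        "kind": "GlobalNetworkSet",
        "metadata": {"name": name, "labels": labels},
        "spec": {"nets": nets},
    }


if __name__ == "__main__":
    manifest = global_network_set(
        "prometheus",                 # hypothetical resource name
        {"10.0.0.2", "10.0.0.1"},     # IPs taken from the Consul service
        {"role": "prometheus"},       # hypothetical label used by policies
    )
    print(json.dumps(manifest, indent=2))
```

A GlobalNetworkPolicy can then match this set through a source selector such as `role == 'prometheus'`, so the policy itself never has to change when the IPs do.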
Scenario 2: Querying external databases
Some of our applications that run on k8s regularly use databases outside the cluster. To allow this traffic, we constantly sync all IP addresses in the Consul database service with those in Calico.
This way, we only allow traffic from pods with the allow_db label and reject everything else.
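A policy of this kind might look like the sketch below, again written as a Python helper that emits a Calico v3 manifest. The policy name, the `role == 'db'` label on the database GlobalNetworkSet, and the `allow_egress_to_set` helper are assumptions for illustration. The sketch only covers the allow rule; rejecting everything else is assumed to come from a separate default-deny policy that is not shown.

```python
import json


def allow_egress_to_set(name, pod_selector, set_selector):
    """Build a GlobalNetworkPolicy that lets the selected pods send traffic
    to the destinations in the network sets matching set_selector."""
    return {
        "apiVersion": "projectcalico.org/v3",
        "kind": "GlobalNetworkPolicy",
        "metadata": {"name": name},
        "spec": {
            "selector": pod_selector,   # which pods the policy applies to
            "types": ["Egress"],
            "egress": [
                {
                    "action": "Allow",
                    "destination": {"selector": set_selector},
                }
            ],
        },
    }


if __name__ == "__main__":
    policy = allow_egress_to_set(
        "allow-db-egress",      # hypothetical policy name
        "has(allow_db)",        # pods carrying the allow_db label
        "role == 'db'",         # hypothetical label on the synced set
    )
    print(json.dumps(policy, indent=2))
```

Since the rule points at a label rather than at addresses, the sync process can keep rewriting the IPs in the GlobalNetworkSet without anyone touching the policy.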
In both of these scenarios, dynamic Kubernetes security removes the need for human intervention, which saves time and reduces errors. Both are extremely important when we’re considering scalability.
Over to you
And there you have it, a way to dynamically manage security in your Kubernetes deployments at scale. Try adding the Consul2Calico sync process to your next Kubernetes application and stop making manual IP address changes to save time and effort.
For more details on how to run this solution in your Kubernetes cluster, check out Consul2Calico on GitHub.