Kubernetes Security for Sysadmins

Software security of any kind is a broad topic that can quickly become overwhelming, and Kubernetes is no different. Because of how Kubernetes is built, there are multiple components an attacker can target, from the kubelet to the API server to, of course, your container images. Each of them has to be treated as an attack vector if you want to defend your cluster properly.

The Kubernetes documentation offers a good high-level guide to approaching security, but not much in terms of specifics. You get a lot of "you should secure this," but not much "here's exactly how you do it." This guide aims to give you some specifics on how to secure your clusters and why each step matters.

A refresher on the k8s architecture

Instead of slapping on a few “best practices” and calling it a day, it's worth revisiting the Kubernetes components one by one. Understanding what each component does gives you a much better picture of how to secure it.

Kubernetes uses a control plane architecture: a control plane, made up of a few subcomponents, talks to each node in the cluster and tells it what to do.

The Control Plane

The API server is the main entry point to the cluster, because every request goes through it. Whether you're running kubectl, deploying a pod, or a service account is trying to list secrets, the request hits the API server first. This makes it the most critical component from a security perspective.

There's also etcd, the scheduler, and the controller manager, but in managed Kubernetes (EKS, GKE, AKS), these are abstracted away. Even in self-managed clusters, securing them usually means "don't expose them to the internet," and that's about it. We'll skip them for now, since most folks never touch them directly.

The Worker Nodes

Every node runs a kubelet, which starts pods, pulls their images through the container runtime, and mounts volumes. Each node also has a container runtime (containerd, CRI-O, or Docker if you're still on an older version). This is what actually runs your containers, and it's where container escapes happen.

Why this matters

When we talk about securing Kubernetes, we usually mean securing either a specific component or the paths between components. An attacker needs to compromise one of them to get anywhere, so that's where we focus our efforts: API server access, kubelet endpoints, and the container runtime.

Securing the control plane

Because the control plane houses many critical components, it's worth spending some time here getting things right.

RBAC

RBAC, beta since Kubernetes 1.6 and stable since 1.8, lets you grant role-based access to both workloads and the administrators managing your clusters. The problem is that everyone just slaps cluster-admin on everything and calls it a day.

When installing new tools such as Argo CD on your cluster, make sure you use the production installation. Many operators and tools provide a "dev" install aimed at getting you up and running as fast as possible, but it should NEVER be used in prod, because these installs give the tool way more permissions than it needs.

When securing workloads you want to leverage RoleBindings (namespace-scoped) instead of ClusterRoleBindings (cluster-wide) whenever possible.

For example, a namespace-scoped role that lets a single service account read pods and their logs in production:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: production
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods", "pods/log"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: production
subjects:
- kind: ServiceAccount
  name: my-app
  namespace: production
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io

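Following the principle of least privilege, you also want to avoid wildcards in your RBAC rules. A wildcard matches every API group, resource, and verb, including ones added later by new CRDs, so a role that looks harmless today can quietly grow.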

# DON'T DO THIS
rules:
- apiGroups: ["*"]
  resources: ["*"]
  verbs: ["*"]

For assigning roles to humans, many cloud providers already give you a kubeconfig derived from your IAM/cloud accounts. However, not everyone needs admin access to a cluster, in which case you can use RBAC to restrict access. Two things to keep in mind: RBAC is additive only, so there is no "deny" rule (you keep someone away from secrets by never granting them), and resourceNames matches object names, not namespaces. Namespace scoping comes from where you bind a role, not from the role itself:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: developer-role
rules:
# Let devs view most things
- apiGroups: ["", "apps", "batch"]
  resources: ["pods", "deployments", "jobs", "services"]
  verbs: ["get", "list", "watch"]
# Let them edit deployments, but only in the namespaces
# where this role is granted through a RoleBinding (e.g. dev, staging)
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["create", "update", "patch", "delete"]
# No rule for secrets at all: anything not granted is denied
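Since scoping comes from the binding, here is a sketch of how developer-role could be granted only in a dev namespace, with read-only access to production through Kubernetes' built-in view ClusterRole, which deliberately excludes Secrets. The developers group name is hypothetical; use whatever group your identity provider or cloud IAM mapping actually emits, and repeat the first binding with namespace: staging if you want staging covered too.

```yaml
# Edit rights in the dev namespace only; copy with namespace: staging for staging
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: developers-edit
  namespace: dev
subjects:
- kind: Group
  name: developers            # hypothetical group from your IdP / cloud IAM mapping
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole           # a RoleBinding may reference a ClusterRole; it only applies in this namespace
  name: developer-role
  apiGroup: rbac.authorization.k8s.io
---
# Read-only access to production via the built-in "view" ClusterRole (no secrets)
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: developers-view
  namespace: production
subjects:
- kind: Group
  name: developers
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: view
  apiGroup: rbac.authorization.k8s.io
```

To sanity-check the result, kubectl auth can-i get secrets -n production --as=jane --as-group=developers should answer no (jane is a placeholder user).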

Audit logs

RBAC controls who can do what, but audit logs tell you who actually did what. The API server can log every request: who made it, what they asked for, and whether it succeeded.

For managed clusters (EKS, GKE, AKS), check your cloud provider's console; audit logging is usually under "cluster settings" or "logging". Each provider does it differently, so you’d have to poke around their docs for specifics.

For self-managed clusters, check whether the API server was started with the audit flags:

# Check if the API server has audit flags
kubectl get pod -n kube-system -l component=kube-apiserver -o yaml | grep -E "audit-log-path|audit-policy-file"

# Or, if you have shell access to a control plane node
ps aux | grep kube-apiserver | grep -E "audit-log-path|audit-policy-file"
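If neither command turns anything up, auditing is off. Below is a minimal sketch of a policy file to start from; the level split is an assumption to tune for your cluster, not a standard. Rules are matched top to bottom and the first match wins, which is why the catch-all sits last.

```yaml
# Hypothetical location: /etc/kubernetes/audit-policy.yaml
apiVersion: audit.k8s.io/v1
kind: Policy
# Skip the extra event emitted when a request is first received
omitStages:
  - "RequestReceived"
rules:
  # Drop noisy health and version checks
  - level: None
    nonResourceURLs:
      - "/healthz*"
      - "/livez*"
      - "/readyz*"
      - "/version"
  # Secrets and configmaps: record who touched them, never their contents
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets", "configmaps"]
  # Changes to RBAC objects: record full request and response bodies
  - level: RequestResponse
    verbs: ["create", "update", "patch", "delete"]
    resources:
      - group: "rbac.authorization.k8s.io"
        resources: ["roles", "rolebindings", "clusterroles", "clusterrolebindings"]
  # Everything else: metadata only
  - level: Metadata
```

The API server then needs --audit-policy-file pointing at this file and --audit-log-path for where events are written. On kubeadm-style clusters, both paths also have to be mounted into the kube-apiserver static pod.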

Securing nodes

While the control plane manages the operations of your entire cluster, the nodes are still where the magic happens, which means equal attention should be paid to them.

Generally, you want to start by ensuring your nodes aren't publicly accessible. It sounds like generic advice, but it applies even if you need to SSH into a node (for whatever reason): use a more secure option such as Tailscale, or SSM Session Manager if you're on AWS, instead of exposing SSH. Exposed nodes can also be hit with a DDoS once an attacker finds them, and if a node goes down, autoscaling may kick in and your bill goes up at the end of the month.
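The kubelet endpoints deserve the same treatment, since the kubelet serves its own API on port 10250 of every node. A minimal sketch of a KubeletConfiguration that rejects anonymous requests and lets the API server decide what authenticated callers may do, assuming you manage the kubelet config file yourself (managed node images often set these already). The clientCAFile path is kubeadm's default and may differ on your nodes.

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
authentication:
  anonymous:
    enabled: false          # reject requests that carry no credentials
  webhook:
    enabled: true           # validate bearer tokens against the API server
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt   # assumption: kubeadm's default CA path
authorization:
  mode: Webhook             # ask the API server (and so RBAC) whether each request is allowed
readOnlyPort: 0             # disable the unauthenticated read-only port
```

A quick check from another machine: curl -k https://<node-ip>:10250/pods should come back 401 Unauthorized instead of a pod list.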

Similarly, exposing services via NodePort is not the way to go. A NodePort opens the same port on every node, you can only use ports from a limited range (30000-32767 by default), and those open ports give malicious actors a good idea of what you're running, which further aids their fingerprinting and enumeration. Use a proper ingress controller or the LoadBalancer service type instead.
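A minimal sketch of the ingress route: the Service stays internal (ClusterIP) and traffic arrives through the ingress controller over TLS. This assumes an ingress-nginx controller installed with the class name nginx; the app name, namespace, hostname, and TLS secret are hypothetical.

```yaml
# Internal-only Service; ClusterIP is reachable from inside the cluster only
apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: production
spec:
  type: ClusterIP
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080
---
# Public entry point handled by the ingress controller
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  namespace: production
spec:
  ingressClassName: nginx        # assumption: ingress-nginx is installed with this class
  tls:
  - hosts:
    - app.example.com
    secretName: my-app-tls       # hypothetical TLS certificate secret
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-app
            port:
              number: 80
```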

Sign and verify images

For the longest time, open source, and much of software in general, has run on implicit trust: the assumption that systems will behave as intended and that no application has bad intent. This is exactly the thinking that supply chain attacks exploit.

When running images of any kind, it's important to verify them. Cosign lets you sign images and verify those signatures, and a policy engine like Kyverno lets you enforce that only signed images, or images from a vetted registry, are allowed to run.

Kyverno does this through the use of policies, which look something like this:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-image-signature
spec:
  validationFailureAction: Enforce
  background: false
  webhookTimeoutSeconds: 30
  failurePolicy: Fail
  rules:
    - name: verify-signature
      match:
        any:
        - resources:
            kinds:
            - Pod
      verifyImages:
      - imageReferences:
        - "ghcr.io/your-org/*"
        attestors:
        - count: 1
          entries:
          - keys:
              publicKeys: |-
                -----BEGIN PUBLIC KEY-----
                MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAE...
                -----END PUBLIC KEY-----

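This blocks any pod that uses an image from ghcr.io/your-org unless that image is signed with your key. Images that don't match imageReferences aren't checked by this rule at all, so pair it with a policy that only allows images from registries you trust.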
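For the vetted-registry side, a sketch of a Kyverno policy that rejects pods pulling from anywhere other than your own registry, modeled on the restrict-image-registries pattern from Kyverno's sample policies. ghcr.io/your-org is the same placeholder as in the signature policy; check the syntax against the Kyverno version you run.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-image-registries
spec:
  validationFailureAction: Enforce
  background: true
  rules:
    - name: allowed-registries
      match:
        any:
        - resources:
            kinds:
            - Pod
      validate:
        message: "Images must come from ghcr.io/your-org."
        pattern:
          spec:
            # =() marks optional fields: checked only when present
            =(ephemeralContainers):
            - image: "ghcr.io/your-org/*"
            =(initContainers):
            - image: "ghcr.io/your-org/*"
            containers:
            - image: "ghcr.io/your-org/*"
```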

Scan with purpose

When scanning your images, do it with purpose: set thresholds that fail builds on the findings that matter, and actually act on what the scanner reports. A GitHub Actions step with Trivy looks like this:

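# .github/workflows/scan.yml (a step in a job that has already built ${{ env.IMAGE }})
- name: Run Trivy vulnerability scanner
  uses: aquasecurity/trivy-action@master  # pin a release tag or commit SHA in real use
  with:
    image-ref: ${{ env.IMAGE }}
    format: 'table'  # readable output in the build log
    severity: 'CRITICAL,HIGH'  # Don't fail on MEDIUM unless you want noise
    exit-code: '1'  # Actually fail the build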

The key is setting thresholds that make sense. Failing builds on every CVE means nothing ships. Failing only on CRITICAL lets plenty of serious HIGH-severity issues slip through, which is why the step above fails on both.

Also scan running images in the cluster, not just at build time:

# Pin a release tag instead of main outside of quick tests
kubectl apply -f https://raw.githubusercontent.com/aquasecurity/trivy-operator/main/deploy/static/trivy-operator.yaml

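This continuously scans what's actually running rather than what you think is running, and stores the results as VulnerabilityReport resources you can list with kubectl get vulnerabilityreports -A. That image you built six months ago was checked against six-month-old vulnerability data, so it should probably be re-evaluated.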

Security at Runtime

Most of the tips so far focus on preventing incidents before they happen: hardening access, signing images, and scanning builds. But what happens when an application is compromised at runtime?

Runtime security aims to protect applications while they’re running, detecting and preventing suspicious behavior that violates defined security policies. In Kubernetes, this means stopping compromised pods from doing further damage, like spawning unexpected shells, writing to host paths, or making unusual network calls.
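Part of that damage can be ruled out before anything goes wrong. A minimal sketch of a hardened pod plus a default-deny egress policy for its namespace. The names and image are hypothetical, runAsNonRoot assumes the image runs as a non-root user, and the NetworkPolicy only has an effect if your CNI (for example Calico or Cilium) enforces network policies.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app                          # hypothetical name
  namespace: production
spec:
  automountServiceAccountToken: false   # no API token unless the app calls the API server
  securityContext:
    runAsNonRoot: true                  # refuse to start if the image would run as root
    seccompProfile:
      type: RuntimeDefault              # apply the container runtime's default seccomp filter
  containers:
  - name: app
    image: ghcr.io/your-org/my-app:1.0.0   # hypothetical image
    securityContext:
      allowPrivilegeEscalation: false   # block setuid-style privilege gains
      readOnlyRootFilesystem: true      # the container can't write to its own root filesystem
      capabilities:
        drop: ["ALL"]
---
# Deny all outbound traffic from every pod in the namespace by default
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress
  namespace: production
spec:
  podSelector: {}        # selects every pod in the namespace
  policyTypes:
  - Egress               # no egress rules listed, so all egress is denied
```

Denying all egress also blocks DNS, so you'll need additional policies that allow the traffic your app legitimately makes, and apps that write to /tmp need an emptyDir volume mounted there once the root filesystem is read-only.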

Tools like Falco (a CNCF project) monitor system calls and detect abnormal activity in real time. For example, this rule fires when a shell starts inside a container:

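- rule: Terminal shell in container
  desc: Detect when a shell is run inside a container
  # spawned_process limits the rule to process starts; without it, the rule
  # is evaluated against every syscall a shell process makes
  condition: spawned_process and container and shell_procs
  output: "Shell spawned inside container (user=%user.name command=%proc.cmdline container=%container.name image=%container.image.repository)"
  priority: WARNING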
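Rules for the other behaviors follow the same shape. A sketch of one that flags a containerized process opening files under /etc for writing; it assumes the open_write and container macros from Falco's default ruleset are loaded first, and the rule name is made up for this example.

```yaml
- rule: Write below etc in container
  desc: Detect a containerized process opening a file under /etc for writing
  # open_write and container are macros defined in Falco's default ruleset
  condition: open_write and container and fd.name startswith /etc/
  output: "File under /etc opened for writing (file=%fd.name command=%proc.cmdline container=%container.name image=%container.image.repository)"
  priority: ERROR
```

Expect some noise from entrypoint scripts that template config files into /etc at startup; those are usually better handled with exceptions than by loosening the condition.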

Closing thoughts

Securing Kubernetes requires more than just following best practices. It’s about understanding where your biggest risks actually lie and prioritizing accordingly. For example, if you’re running untrusted workloads, it makes little sense to spend time hardening etcd when you could be hardening your nodes and leveraging tools like gVisor.
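For the untrusted-workload case, Kubernetes lets you pick a sandboxed runtime per pod through a RuntimeClass. A minimal sketch, assuming gVisor's runsc is already installed on the nodes and registered with containerd or CRI-O under the handler name runsc; the pod name and image are hypothetical.

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc                 # must match the handler configured in the container runtime
---
apiVersion: v1
kind: Pod
metadata:
  name: untrusted-job          # hypothetical name
spec:
  runtimeClassName: gvisor     # run this pod's containers under gVisor instead of runc
  containers:
  - name: app
    image: ghcr.io/your-org/untrusted-job:1.0.0   # hypothetical image
```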

Perfect security doesn’t exist, but with the right focus, you can avoid being on the front page of Hacker News as the latest breach :)

EverythingDevOps
