Bubble Wrap for Containers

gVisor in Kubernetes

8 min read

Working as a solutions architect while going deep on Kubernetes security — prevention-first thinking, open source tooling, and a daily rabbit hole of hands-on learning. I make the mistakes, then figure out how to fix them (eventually).

Kubernetes makes it easy to forget what’s really running underneath. You write a Deployment, set a few limits, and let the control plane take it from there. But once that Pod lands on a node, it’s no longer YAML — it’s syscalls hitting the kernel.

Containers aren’t magic sandboxes; they’re just processes sharing the same kernel with a light dusting of isolation. That’s fine for speed, but it’s also why “container escapes” can show up (yes, back to my container escape obsession). They’re not exploits so much as reminders that namespaces aren’t armor.

Enter gVisor, Google’s user-space kernel that intercepts syscalls before they ever reach the host. Instead of trusting the Linux kernel to stay polite, gVisor runs your workload inside its own miniature kernel, enforcing isolation at the syscall boundary.

It sits somewhere between runc and a full-blown VM: fast enough to stay in the Kubernetes loop, but restrictive enough to squash most escape paths.

gVisor isn’t new, but it’s worth a burrito look—what it takes to install, where it shines, where it hurts, and why your favorite nsenter trick suddenly stops working.


Installing gVisor on Ubuntu (ARM)

I’m running this on my usual Mac setup: an Ubuntu ARM VM (Apple Silicon under the hood) with a kubeadm cluster using containerd as the runtime. If you’re running something else, the steps should translate fairly directly.

The plan:

  1. Install the gVisor binaries (runsc and the containerd shim).
  2. Tell containerd about the new runtime.
  3. Restart containerd and sanity-check.

Do this on every node that will run gVisor-protected workloads.

0. Quick sanity checks

Make sure you’re on ARM64 and using containerd:

uname -m
containerd --version

1. Install the gVisor binaries

curl -fsSL https://gvisor.dev/archive.key | sudo gpg --dearmor -o /usr/share/keyrings/gvisor.gpg
echo "deb [signed-by=/usr/share/keyrings/gvisor.gpg] https://storage.googleapis.com/gvisor/releases release main" | sudo tee /etc/apt/sources.list.d/gvisor.list
sudo apt update
sudo apt install -y runsc   # the runsc package also ships the containerd-shim-runsc-v1 binary

Validate:

runsc --version

2. Wire gVisor into containerd

Create or edit /etc/containerd/config.toml. Heads up: the heredoc below replaces the whole file; on a node that already has a tuned config (SystemdCgroup, sandbox image, CNI paths, and so on), merge just the runsc runtime block into your existing file instead:

cat <<EOF | sudo tee /etc/containerd/config.toml
version = 2
[plugins."io.containerd.runtime.v1.linux"]
  shim_debug = true
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runsc]
  runtime_type = "io.containerd.runsc.v1"
EOF

Restart containerd:

sudo systemctl restart containerd

Running Kubernetes Pods with gVisor

Start by creating a RuntimeClass:

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc

Apply:

kubectl apply -f runtimeclass-gvisor.yaml
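If only some of your nodes have runsc installed, RuntimeClass also supports a scheduling block that keeps gVisor Pods off the other nodes. A sketch, assuming you label the runsc-capable nodes yourself; the runtime: gvisor label below is hypothetical:

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
scheduling:
  # Hypothetical label: apply it with `kubectl label node <node> runtime=gvisor`
  nodeSelector:
    runtime: gvisor
```

Pods that reference this RuntimeClass then only schedule onto labeled nodes, which pairs nicely with the “install on every node” instruction from earlier.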

Now run a test pod:

apiVersion: v1
kind: Pod
metadata:
  name: gvisor-test
spec:
  runtimeClassName: gvisor
  containers:
    - name: ubuntu
      image: ubuntu:22.04
      command: ["bash", "-c", "sleep 36000"]

Apply and verify:

kubectl apply -f gvisor-test.yaml
kubectl get pod

Get the container ID and confirm it’s using gVisor:

CID=$(kubectl get pod gvisor-test -o jsonpath='{.status.containerStatuses[0].containerID}' | sed 's#containerd://##')
sudo runsc --root /run/containerd/runsc/k8s.io list | grep $CID

gVisor vs runc Deep Dive

Instead of starting with theory, we’re going to follow the Burrito Way™: look at what actually happens first, then decide what we think. Two Ubuntu containers, same image, same command, same cluster:

  • one using runc
  • one using runsc

The differences show you far more about gVisor’s philosophy than any diagram.

Each section includes:

  • test commands
  • what you should observe
  • and what it actually means

Test Setup: Ubuntu Pods (gVisor vs runc)

Baseline (runc)

apiVersion: v1
kind: Pod
metadata:
  name: nogvisor-test
spec:
  containers:
    - name: ubuntu
      image: ubuntu:22.04
      command: ["/bin/bash", "-c", "sleep 3600"]

gVisor

apiVersion: v1
kind: Pod
metadata:
  name: gvisor-test
spec:
  runtimeClassName: gvisor
  containers:
    - name: ubuntu
      image: ubuntu:22.04
      command: ["/bin/bash", "-c", "sleep 3600"]

Process Visibility (Inside the Container)

Commands

runc

kubectl exec -it nogvisor-test -- bash
ps aux

gVisor

kubectl exec -it gvisor-test -- bash
ps aux

Expected

  • runc: PID 1 (sleep), bash, ps
  • gVisor: PID 1 (sleep), bash, ps
  • TTY differs:
    • runc → pts/0
    • gVisor → ? (a literal question mark: no host PTY behind it)

Assessment

Inside the container, gVisor looks almost identical to runc. PID namespaces behave the same. That’s the trick: gVisor changes the kernel boundary, not the container environment. From the inside, nothing looks strange.


Process Visibility (From the Host)

Commands

Check for runc container process:

ps aux | grep sleep

Check for gVisor process wrappers:

ps aux | grep runsc

Expected

  • runc: host sees sleep 3600 as a real process
  • gVisor: host sees runsc-sandbox, runsc-gofer, etc.

Assessment

This is where the façade cracks. With runc, containers are just host processes. With gVisor, your workload runs inside a userspace kernel, not directly on the host. This is the clearest indicator that gVisor is more than “runc but safer.”


TTY Behavior

Command

ps aux

Expected

  • runc: TTY = pts/0
  • gVisor: TTY = ? (a literal question mark in ps output)

Assessment

TTYs behave differently because gVisor doesn’t map container PTYs to real host pseudo-terminals. You’re talking to a virtualized console layer.


/proc Virtualization

Commands

cat /proc/modules | grep tcp_diag

Expected

  • runc: shows real kernel modules (matching host)
  • gVisor: empty or missing

Assessment

Under gVisor, /proc is synthetic. runsc generates a fake procfs, so nothing from the real kernel leaks through. Kernel modules, device info, and other structural details disappear entirely. This is strong proof that syscalls never reach the kernel directly.
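A couple of extra probes in the same spirit (hedged: the exact strings and counts vary by gVisor release, but the pattern holds):

```shell
# Run inside each pod and compare with the node itself
cat /proc/version             # runc: matches the host kernel; gVisor: runsc's own synthetic version string
ls /proc/sys/kernel | wc -l   # gVisor implements only a subset of sysctls, so this count drops sharply
```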

Capabilities

Command

grep Cap /proc/self/status

Expected

runc:

CapEff: 00000000a80425fb

gVisor:

CapEff: 00000000a80405fb

Assessment

The masks look nearly identical, but they don’t mean the same thing.

  • In runc, capability bits map to real (namespaced) kernel capabilities.
  • In gVisor, the bits are synthetic values exposed by runsc so applications don't break.

Even if CAP_SYS_ADMIN shows up in the mask, the underlying syscalls never reach the host.
The permissions appear real, but the power behind them isn’t.
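The single differing bit is easy to pin down with shell arithmetic. Bit numbering follows linux/capability.h, where bit 13 is CAP_NET_RAW; so at least for the two masks above, the only capability missing from gVisor's mask is NET_RAW:

```shell
# XOR the two CapEff masks from above to isolate the differing bit(s)
runc_caps=$(( 0xa80425fb ))
gvisor_caps=$(( 0xa80405fb ))
printf 'diff: 0x%x\n' $(( runc_caps ^ gvisor_caps ))   # prints diff: 0x2000, i.e. 1 << 13 = CAP_NET_RAW
```

If libcap's tools are installed, `capsh --decode=00000000a80425fb` gives you the full named list as a cross-check.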


Syscall Behavior (strace)

Note: you need to install strace inside the container.

Commands

apt update && apt install -y strace
strace ls

Expected

  • runc:
    execve("/usr/bin/ls", ["ls"], ...)
    
  • gVisor:
    execve(0xffffffffffffffda, ["ls"], ...)
    

Assessment

On the host and in runc, execve shows a real path because the syscall goes directly into the host kernel.

gVisor shows a sentinel hex value instead of a path. That’s runsc intercepting the syscall before it reaches the kernel. The rest of the call trace often looks similar because gVisor emulates most of Linux’s syscall surface — but it’s emulation, not the real thing.
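An aside on that sentinel value: interpreted as a signed 64-bit integer, 0xffffffffffffffda is -38, and errno 38 on Linux is ENOSYS ("Function not implemented"). My read, and this is an assumption rather than anything the strace output documents, is that you're seeing an -ENOSYS placeholder where a decoded pathname would normally go:

```shell
# bash arithmetic is signed 64-bit, so the hex sentinel wraps to a small negative number
printf '%d\n' $(( 0xffffffffffffffda ))   # prints -38; errno 38 = ENOSYS on Linux
```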


Filesystem & Mount Behavior

Commands

mount -t proc proc /mnt
touch /proc/sys/kernel/randomize_va_space

Expected

  • runc:
    mount: /mnt: cannot mount proc read-only.
    
  • gVisor:
    mount: /mnt: permission denied.
    

Assessment

Both runtimes reject the mount, but for completely different reasons:

  • In runc, the real kernel enforces container restrictions (read-only proc, etc.).
  • In gVisor, runsc denies the syscall immediately, before the kernel even sees it.

This highlights the fundamental boundary difference:
runc relies on the kernel’s own namespace model, while gVisor implements mount and filesystem semantics in userspace.


Simulating a Classic Container Escape (runc vs gVisor)

This is the last container escape demo. (Until the next one.)

runc Escape (Ubuntu Node)

Save as escape.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: escape
  labels:
    app: escape
spec:
  hostPID: true
  containers:
    - name: escape
      image: nicolaka/netshoot:latest
      command: ["sleep", "3600"]
      securityContext:
        privileged: true
      volumeMounts:
        - name: host-root
          mountPath: /host
  volumes:
    - name: host-root
      hostPath:
        path: /
        type: Directory
  restartPolicy: Never

Apply and exec:

kubectl apply -f escape.yaml
kubectl exec -it escape -- bash

Escape to the host:

nsenter --target 1 --mount --uts --ipc --net --pid

You now land directly on the host:

uname
whoami
cat /etc/os-release

Trying the Same Escape Under gVisor

Now use the same pod spec, but with gVisor:

apiVersion: v1
kind: Pod
metadata:
  name: gvisor-escape
  labels:
    app: gvisor-escape
spec:
  hostPID: true
  runtimeClassName: gvisor
  containers:
    - name: escape
      image: ubuntu:22.04
      command: ["/bin/bash", "-c", "sleep 3600"]
      securityContext:
        privileged: true
      volumeMounts:
        - name: host-root
          mountPath: /host
  volumes:
    - name: host-root
      hostPath:
        path: /
        type: Directory
  restartPolicy: Never

Exec in:

kubectl exec -it gvisor-escape -- bash
ps aux

PID 1 here is just the /pause infrastructure container.

Attempt the escape:

nsenter --target 1 --mount --uts --ipc --net --pid
# nsenter: failed to execute /bin/sh: No such file or directory

This drops you into the infra container’s namespaces — not the host — and the infra container has no shell.
Trying to pivot to your own namespace:

nsenter --target 3 --mount --uts --ipc --net --pid -- ls /
# works, but just shows your same container root

nsenter --target 3 --mount --uts --ipc --net --pid
# no visible change — you're already there

Nothing interesting happens because:

  • gVisor mediates all namespaces
  • /proc is virtualized
  • escape pivots that rely on host namespaces simply don’t exist

Same YAML. Very different outcome.
runc → host access.
gVisor → sandbox stays a sandbox.


Wrapping Up

Kubernetes makes containers feel tidy and predictable. YAML goes in, Pods come out, and somewhere in between the scheduler pretends it’s your friend. But once a container starts running, every security guarantee boils down to one question:

Who actually handles your syscalls?

  • With runc, the answer is: the host kernel.
    Great for performance, great for density, and great for escape demos.

  • With gVisor, the answer becomes: a userspace kernel you don’t control from inside the container.
    Syscalls stop inside runsc, /proc becomes synthetic, capabilities lose their teeth, mounts break differently, and classic escape tricks like nsenter --target 1 simply stop working because the host kernel never sees the request.

That’s the gVisor mindset: keep Kubernetes fast, but stop trusting the kernel as a security boundary.

Is gVisor a silver bullet? No. But it genuinely changes the attack surface without requiring VMs or a massive architectural overhaul. That makes it worth understanding.

I’ll revisit this later to look at additional examples (and the very real performance hit), but that’s it for now.