building a 600-line container runtime in Go (because why not)

containers are not magic. containers are a small pile of linux primitives wearing a trench coat. you can build one in an afternoon if you don't care about correctness, security, performance, or your friendships.

i didn't care about any of those things, so i built one.

the four ingredients

namespaces — for isolation (pid, mount, net, uts, ipc, user)
cgroups v2 — for resource limits (cpu, mem, io)
an overlay filesystem — for the rootfs
vibes — for everything else

the entire fork-and-exec

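// needs "os", "os/exec" and "syscall"; must is assumed to be a panic-on-error helper.
// re-exec ourselves as "child", this time inside fresh namespaces.
cmd := exec.Command("/proc/self/exe", append([]string{"child"}, args...)...)
cmd.SysProcAttr = &syscall.SysProcAttr{
    Cloneflags: syscall.CLONE_NEWUTS |
                syscall.CLONE_NEWPID |
                syscall.CLONE_NEWNS  |
                syscall.CLONE_NEWNET |
                syscall.CLONE_NEWIPC |
                syscall.CLONE_NEWUSER,
    // map our uid/gid to root inside the user namespace. without a mapping
    // the child runs as "nobody" and loses its capabilities on exec.
    UidMappings: []syscall.SysProcIDMap{{ContainerID: 0, HostID: os.Getuid(), Size: 1}},
    GidMappings: []syscall.SysProcIDMap{{ContainerID: 0, HostID: os.Getgid(), Size: 1}},
}
cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
must(cmd.Run())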
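
the /proc/self/exe trick only works if the binary knows which half of itself it is. a rough sketch of that dispatch follows; the "run" mode name and parseMode are my own inventions for illustration, only the "child" argument comes from the snippet above.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// parseMode splits argv into a mode ("run" or "child") and the user's command.
func parseMode(argv []string) (mode string, command []string, err error) {
	if len(argv) < 3 {
		return "", nil, errors.New("usage: <prog> run <cmd> [args...]")
	}
	switch argv[1] {
	case "run", "child":
		return argv[1], argv[2:], nil
	}
	return "", nil, fmt.Errorf("unknown mode %q", argv[1])
}

func main() {
	mode, command, err := parseMode(os.Args)
	if err != nil {
		fmt.Println(err)
		return
	}
	switch mode {
	case "run":
		// parent: this is where the exec.Command("/proc/self/exe", "child", ...) call goes
		fmt.Println("parent: would re-exec /proc/self/exe child", command)
	case "child":
		// child: already inside the new namespaces, so set up the rootfs and exec
		fmt.Println("child: would set up the container and exec", command)
	}
}
```

the point of the two-step dance is that the namespaces only exist once the clone has happened, so the setup code has to run in the re-exec'd copy rather than in the parent.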

in the child you set the hostname to something embarrassing (mine is 'localhorst'), pivot_root onto your overlay, mount a fresh /proc inside the new root so tools like ps see only the new pid namespace, and exec the user's command. that's it. that is the whole thing. you have invented a sad runc.
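
here is roughly what the child side could look like, as a sketch and not my actual 600 lines. every path and function name below is hypothetical. it mounts /proc after the pivot so the mount lands in the new root, and it assumes the rootfs already contains a /proc directory and that your kernel lets you mount overlayfs from inside a user namespace (i believe that needs 5.11 or newer).

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"syscall"
)

// overlayOpts builds the data string for an overlay mount.
func overlayOpts(lower, upper, work string) string {
	return fmt.Sprintf("lowerdir=%s,upperdir=%s,workdir=%s", lower, upper, work)
}

// enterRootfs mounts the overlay at merged, pivots into it, and mounts a fresh /proc.
// it needs CAP_SYS_ADMIN in a new mount namespace, so it only works inside the child.
func enterRootfs(lower, upper, work, merged string) error {
	// keep our mount changes from propagating back to the host
	if err := syscall.Mount("", "/", "", syscall.MS_REC|syscall.MS_PRIVATE, ""); err != nil {
		return fmt.Errorf("make / private: %w", err)
	}
	// pivot_root wants new_root to be a mount point; the overlay mount makes it one
	if err := syscall.Mount("overlay", merged, "overlay", 0, overlayOpts(lower, upper, work)); err != nil {
		return fmt.Errorf("mount overlay: %w", err)
	}
	if err := os.MkdirAll(filepath.Join(merged, ".old_root"), 0o700); err != nil {
		return err
	}
	if err := syscall.PivotRoot(merged, filepath.Join(merged, ".old_root")); err != nil {
		return fmt.Errorf("pivot_root: %w", err)
	}
	if err := os.Chdir("/"); err != nil {
		return err
	}
	// mount /proc while the old root is still attached, then drop the old root
	if err := syscall.Mount("proc", "/proc", "proc", 0, ""); err != nil {
		return fmt.Errorf("mount /proc: %w", err)
	}
	if err := syscall.Unmount("/.old_root", syscall.MNT_DETACH); err != nil {
		return fmt.Errorf("unmount old root: %w", err)
	}
	return os.Remove("/.old_root")
}

// runChild does the whole child-side sequence and never returns on success.
func runChild(command []string, lower, upper, work, merged string) error {
	if len(command) == 0 {
		return errors.New("no command given")
	}
	if err := syscall.Sethostname([]byte("localhorst")); err != nil {
		return fmt.Errorf("sethostname: %w", err)
	}
	if err := enterRootfs(lower, upper, work, merged); err != nil {
		return err
	}
	path, err := exec.LookPath(command[0]) // resolved inside the new root
	if err != nil {
		return err
	}
	return syscall.Exec(path, command, os.Environ())
}

func main() {
	// only print the mount options here; runChild belongs in the "child" branch
	fmt.Println(overlayOpts("/var/lib/sadrunc/base", "/var/lib/sadrunc/upper", "/var/lib/sadrunc/work"))
}
```

syscall.Exec replaces the process image, so the user's command ends up as pid 1 of the new pid namespace instead of a grandchild of the runtime.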

do not use this in production. do not use this in staging. do not let it within ten meters of an audit. i love you.

what's actually hard

cgroups v2 was a delight after years of fighting v1's hierarchies. user namespaces remain a special kind of nightmare — uid mappings, capability semantics, the small but persistent voice in the back of your head whispering 'are you sure'. networking, of course, is its own continent. veth pairs and a bridge get you basically nowhere; routes, iptables, dns, all of it has to be built. give up early and shell out to slirp4netns. don't be a hero.
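
for the "shell out to slirp4netns" route, a minimal sketch might look like this. slirpArgs and startSlirp are names i made up, tap0 is just the conventional device name, and the flags are the ones from the slirp4netns readme as i remember it, so check them against your installed version.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
)

// slirpArgs builds the slirp4netns arguments for a container whose
// first process has the given pid as seen from the host.
func slirpArgs(pid int, tap string) []string {
	return []string{
		"--configure",             // bring the tap device up and give it an address
		"--mtu=65520",             // large mtu, as in the upstream example
		"--disable-host-loopback", // keep the container away from the host's 127.0.0.1
		strconv.Itoa(pid),
		tap,
	}
}

// startSlirp launches slirp4netns in the background.
// the caller is responsible for killing it when the container exits.
func startSlirp(pid int) (*exec.Cmd, error) {
	cmd := exec.Command("slirp4netns", slirpArgs(pid, "tap0")...)
	cmd.Stderr = os.Stderr
	if err := cmd.Start(); err != nil {
		return nil, fmt.Errorf("start slirp4netns: %w", err)
	}
	return cmd, nil
}

func main() {
	// print the command line instead of running it
	fmt.Println("slirp4netns", slirpArgs(os.Getpid(), "tap0"))
}
```

one wrinkle: this needs the child's pid while the child is still alive, so the parent has to use cmd.Start(), read cmd.Process.Pid, start slirp4netns, and only then cmd.Wait(), rather than a single cmd.Run().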
