// PROJECT

Mini Container Runtime

Build a minimal container runtime using Linux namespaces and cgroups. Run isolated processes with resource limits. Applies Module 10.


What You're Building

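A contain command that runs a process in new PID, UTS, and mount namespaces, pivots into its own root filesystem, and applies cgroup v2 limits on memory, CPU, and process count.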

Why This Project

You've been running Docker containers for years. This project shows you what Docker actually does under the hood. It's Linux-specific (it won't work on macOS) and uses real kernel features. It's the deepest systems programming project in the course.

If you can explain namespaces, cgroups, and pivot_root in an interview — and point to code you wrote that uses them — you're operating at a level most DevOps engineers never reach.

Usage

# Run a shell in an isolated container
sudo ./contain run --rootfs /path/to/alpine-rootfs -- /bin/sh

# Run with memory limit
sudo ./contain run --rootfs ./rootfs --memory 64M -- /bin/sh

# Run with CPU and PID limits (--cpu is the cpu.max quota in microseconds per 100000 µs period, so 50000 = 50% of one CPU)
sudo ./contain run --rootfs ./rootfs --memory 128M --cpu 50000 --pids 64 -- /bin/sh

# Run a specific command
sudo ./contain run --rootfs ./rootfs --hostname mycontainer -- /bin/echo "hello from container"
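A minimal sketch of parsing the run subcommand with Go's standard flag package, assuming main has already matched os.Args[1] == "run" and passes the remaining arguments in. The runConfig type and parseRunArgs function are hypothetical names. The flag package accepts both -rootfs and --rootfs, and it stops parsing at the "--" terminator, so fs.Args() returns exactly the command to run inside the container.

```go
package main

import (
	"errors"
	"flag"
	"io"
)

// runConfig holds the options for "contain run" (hypothetical layout).
type runConfig struct {
	Rootfs   string
	Hostname string
	Memory   string // e.g. "64M"; converted to bytes later with parseMemory
	CPU      int    // cpu.max quota in microseconds per 100000 µs period
	Pids     int
	Command  []string
}

// parseRunArgs parses everything after "run", e.g.
// --rootfs ./rootfs --memory 64M -- /bin/sh
func parseRunArgs(args []string) (*runConfig, error) {
	fs := flag.NewFlagSet("run", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // report errors through the returned error instead
	cfg := &runConfig{}
	fs.StringVar(&cfg.Rootfs, "rootfs", "", "path to the container root filesystem")
	fs.StringVar(&cfg.Hostname, "hostname", "", "hostname inside the container")
	fs.StringVar(&cfg.Memory, "memory", "", "memory limit, e.g. 64M")
	fs.IntVar(&cfg.CPU, "cpu", 0, "cpu.max quota in microseconds")
	fs.IntVar(&cfg.Pids, "pids", 0, "maximum number of processes")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	if cfg.Rootfs == "" {
		return nil, errors.New("--rootfs is required")
	}
	// Parse consumed the "--" terminator; what remains is the command.
	cfg.Command = fs.Args()
	if len(cfg.Command) == 0 {
		return nil, errors.New("no command given after --")
	}
	return cfg, nil
}
```

Keeping --memory as a string at this stage leaves the unit handling to parseMemory, so the suffix rules live in one place.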

Getting a Root Filesystem

You need a minimal Linux rootfs to use as the container's filesystem. Alpine is perfect:

# Download and extract Alpine mini root filesystem
mkdir rootfs
curl -o alpine.tar.gz https://dl-cdn.alpinelinux.org/alpine/v3.19/releases/x86_64/alpine-minirootfs-3.19.0-x86_64.tar.gz
tar -xzf alpine.tar.gz -C rootfs

Expected Output

$ sudo ./contain run --rootfs ./rootfs --memory 64M --hostname sandbox -- /bin/sh

[contain] creating namespaces: pid uts mnt
[contain] setting hostname: sandbox
[contain] setting up rootfs: ./rootfs
[contain] pivoting root
[contain] mounting /proc
[contain] applying cgroup limits: memory=64M
[contain] running: /bin/sh

/ # hostname
sandbox
/ # ps aux
PID   USER     TIME  COMMAND
    1 root      0:00 /bin/sh
    2 root      0:00 ps aux
/ # cat /proc/self/cgroup
0::/contain-<id>
/ # ls /
bin    dev    etc    home   lib    media  mnt    opt    proc   root   run    sbin   srv    sys    tmp    usr    var
/ # exit

[contain] container exited: exit status 0
[contain] cleaning up cgroups

Requirements

Core

Namespace Flags

Namespace   Constant        What It Isolates
PID         CLONE_NEWPID    Process IDs: the container's first process sees itself as PID 1
UTS         CLONE_NEWUTS    Hostname: the container gets its own
Mount       CLONE_NEWNS     Mounts: the container has its own mount table
Net         CLONE_NEWNET    Network: the container has its own network stack (stretch goal)

Cgroup v2 File Layout

/sys/fs/cgroup/contain-<id>/
├── cgroup.procs      ← Write PID here to add process to group
├── memory.max        ← Max memory in bytes (e.g., "67108864" for 64M)
├── cpu.max           ← "quota period" in µs (e.g., "50000 100000" = 50% of one CPU)
└── pids.max          ← Max number of processes (e.g., "64")
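A sketch of cgroup.go that writes those files, assuming the cgroup v2 hierarchy is mounted at /sys/fs/cgroup and the memory, cpu, and pids controllers are already enabled for child groups (if memory.max is missing from the new directory, write "+memory +cpu +pids" to /sys/fs/cgroup/cgroup.subtree_control first). newCgroup, formatCPUMax, and the path field match the names used in the Testing section; limitFiles, apply, and remove are hypothetical helpers. The 100000 µs period is the kernel's default cpu.max period.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
)

const (
	cgroupRoot = "/sys/fs/cgroup"
	cpuPeriod  = 100000 // microseconds; default cpu.max period
)

type cgroup struct {
	path string
}

func newCgroup(id string) *cgroup {
	return &cgroup{path: filepath.Join(cgroupRoot, "contain-"+id)}
}

// formatCPUMax renders a quota as "quota period" for cpu.max.
func formatCPUMax(quota int) string {
	return fmt.Sprintf("%d %d", quota, cpuPeriod)
}

// limitFiles maps interface file names to the values to write.
// A zero argument means "no limit requested" and that file is skipped.
func limitFiles(memBytes int64, cpuQuota, pids int) map[string]string {
	files := map[string]string{}
	if memBytes > 0 {
		files["memory.max"] = strconv.FormatInt(memBytes, 10)
	}
	if cpuQuota > 0 {
		files["cpu.max"] = formatCPUMax(cpuQuota)
	}
	if pids > 0 {
		files["pids.max"] = strconv.Itoa(pids)
	}
	return files
}

// apply creates the cgroup directory, writes the limits, then moves pid
// (as seen from the host) into the group. Requires root.
func (cg *cgroup) apply(pid int, files map[string]string) error {
	if err := os.Mkdir(cg.path, 0o755); err != nil && !os.IsExist(err) {
		return err
	}
	for name, val := range files {
		if err := os.WriteFile(filepath.Join(cg.path, name), []byte(val), 0o644); err != nil {
			return fmt.Errorf("write %s: %w", name, err)
		}
	}
	return os.WriteFile(filepath.Join(cg.path, "cgroup.procs"), []byte(strconv.Itoa(pid)), 0o644)
}

// remove deletes the cgroup; rmdir succeeds only once no processes remain in it.
func (cg *cgroup) remove() error {
	return os.Remove(cg.path)
}
```

Because rmdir on a cgroup fails while it still has members, a defer cg.remove() in the parent should run only after cmd.Wait() has returned.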

Helper: Parse Memory Strings

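import (
    "fmt"
    "strconv"
    "strings"
)

// parseMemory converts sizes like "64M", "512K", "1G", or a plain byte
// count into bytes. Suffixes are binary: K = 1024, M = 1024*1024, G = 1024*1024*1024.
func parseMemory(s string) (int64, error) {
    s = strings.TrimSpace(s)
    if len(s) == 0 {
        return 0, fmt.Errorf("empty memory string")
    }
    multipliers := map[byte]int64{'K': 1024, 'M': 1024 * 1024, 'G': 1024 * 1024 * 1024}
    last := s[len(s)-1]
    if m, ok := multipliers[last]; ok {
        n, err := strconv.ParseInt(s[:len(s)-1], 10, 64)
        if err != nil {
            return 0, fmt.Errorf("invalid memory string %q: %w", s, err)
        }
        return n * m, nil
    }
    return strconv.ParseInt(s, 10, 64)
}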

Suggested Structure

contain/
├── main.go           ← CLI entry point, run vs child dispatch
├── container.go      ← Namespace setup, re-exec, run the command
├── filesystem.go     ← pivot_root, mount /proc, unmount old root
├── cgroup.go         ← Create cgroup, write limits, cleanup
├── cgroup_test.go    ← Test memory parsing, limit file generation
└── rootfs/           ← Alpine mini rootfs (not committed to git)

Hints

Suggested approach:

  1. Start with the re-exec pattern: the parent creates a child process and the child prints "hello from child"; verify it works
  2. Add CLONE_NEWUTS and set the hostname; verify with the hostname command
  3. Add CLONE_NEWPID; verify the child sees itself as PID 1
  4. Add filesystem isolation: pivot_root into the Alpine rootfs
  5. Mount /proc; verify ps shows only the container's processes
  6. Add cgroup memory limits; verify by allocating memory beyond the limit
  7. Add CPU and PID limits
  8. Add cleanup logic with defer
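Steps 4 and 5 can be sketched as below, to run inside the child once its mount namespace exists. pivot_root(2) requires the new root to be a mount point and rejects shared mount propagation, so the sketch first makes / recursively private and bind-mounts the rootfs onto itself. The .old_root directory name and the pivotRoot and mountProc function names are assumptions, not fixed by the project.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// pivotRoot makes rootfs the root of the current mount namespace and
// detaches the old root. Run as root, in the child, after CLONE_NEWNS.
func pivotRoot(rootfs string) error {
	rootfs, err := filepath.Abs(rootfs)
	if err != nil {
		return err
	}
	if _, err := os.Stat(rootfs); err != nil {
		return fmt.Errorf("rootfs: %w", err)
	}
	// Keep mount events in this namespace from propagating to the host.
	if err := syscall.Mount("", "/", "", syscall.MS_PRIVATE|syscall.MS_REC, ""); err != nil {
		return fmt.Errorf("make / private: %w", err)
	}
	// A bind mount onto itself turns rootfs into a mount point.
	if err := syscall.Mount(rootfs, rootfs, "", syscall.MS_BIND|syscall.MS_REC, ""); err != nil {
		return fmt.Errorf("bind mount rootfs: %w", err)
	}
	putOld := filepath.Join(rootfs, ".old_root")
	if err := os.MkdirAll(putOld, 0o700); err != nil {
		return err
	}
	if err := syscall.PivotRoot(rootfs, putOld); err != nil {
		return fmt.Errorf("pivot_root: %w", err)
	}
	if err := os.Chdir("/"); err != nil {
		return err
	}
	// The host filesystem now sits at /.old_root; lazily detach it.
	if err := syscall.Unmount("/.old_root", syscall.MNT_DETACH); err != nil {
		return fmt.Errorf("unmount old root: %w", err)
	}
	return os.Remove("/.old_root")
}

// mountProc mounts a fresh procfs so ps sees only this PID namespace.
func mountProc() error {
	if err := os.MkdirAll("/proc", 0o555); err != nil {
		return err
	}
	return syscall.Mount("proc", "/proc", "proc", 0, "")
}
```

The order matters: pivot first, then mount /proc, so the new procfs lands inside the container's filesystem rather than on the host's /proc.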

Testing Memory Limits

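# Inside the container, allocate more than the limit in a single buffer
/ # dd if=/dev/zero of=/dev/null bs=200M count=1
# dd allocates a bs-sized buffer, so this should be OOM-killed when memory.max is below 200M
# (bs=1M count=200 would only reuse one 1M buffer; if the host has swap, also set memory.swap.max to 0)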

Testing PID Limits

# Inside the container, start more processes than pids.max allows (a bounded loop, not a real fork bomb)
/ # for i in $(seq 1 100); do sleep 100 & done
# Forks should start failing once the container reaches the pids.max limit

Debugging Tips

# Check what namespaces a process is in
ls -la /proc/self/ns/

# Check cgroup membership
cat /proc/self/cgroup

# Verify cgroup limits are applied
cat /sys/fs/cgroup/contain-*/memory.max

Safety Notes

Testing

Some parts (namespace creation, pivot_root) require root and are hard to unit test. Focus tests on the pure logic:

package main

import "testing"

func TestParseMemory(t *testing.T) {
    tests := []struct {
        input string
        want  int64
    }{
        {"64M", 67108864},
        {"1G", 1073741824},
        {"512K", 524288},
        {"1048576", 1048576},
    }
    for _, tt := range tests {
        t.Run(tt.input, func(t *testing.T) {
            got, err := parseMemory(tt.input)
            if err != nil {
                t.Fatal(err)
            }
            if got != tt.want {
                t.Errorf("parseMemory(%q) = %d, want %d", tt.input, got, tt.want)
            }
        })
    }
}

func TestCgroupPaths(t *testing.T) {
    cg := newCgroup("test-container")
    if cg.path != "/sys/fs/cgroup/contain-test-container" {
        t.Errorf("unexpected cgroup path: %s", cg.path)
    }
}

func TestCPUMaxFormat(t *testing.T) {
    // 50% CPU = "50000 100000"
    got := formatCPUMax(50000)
    want := "50000 100000"
    if got != want {
        t.Errorf("formatCPUMax(50000) = %q, want %q", got, want)
    }
}

For integration testing, write a script that runs the full binary in a VM and checks output:

#!/bin/bash
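# integration_test.sh: run in a VM with root
# Assumes contain writes its "[contain] ..." log lines to stderr, so only the
# container command's stdout is captured in OUTPUT.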
OUTPUT=$(sudo ./contain run --rootfs ./rootfs --hostname testbox -- /bin/hostname)
if [ "$OUTPUT" != "testbox" ]; then
    echo "FAIL: expected 'testbox', got '$OUTPUT'"
    exit 1
fi
echo "PASS"

Stretch Goals

Skills Used: Syscalls (clone, pivot_root, mount, sethostname), namespaces, cgroups v2, process management, file I/O, exec.Command, SysProcAttr, byte parsing, defer for cleanup, CLI argument handling.