The What, Why and How of Containers

Alternate title: A history of OS-Level Virtualisation on UNIX and then Linux.

This post is part of an entry for the Handmade Network’s Learning Jam 2024

The problem

It all starts with chroot (UNIX, 1982)

$ apropos chroot
chroot (1)           - run command or interactive shell with special root directory
chroot (2)           - change root directory

Chroot is a kernel mechanism, available since early versions of UNIX, that runs a process with an alternate root directory. The process shares the kernel and hardware with the rest of the system, but only has access to a subtree of the file system.
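As a rough illustration, the same mechanism can be reached from Python through os.chroot, which wraps chroot(2). The sketch below assumes root privileges and a directory that already contains the program to run; the function name run_in_chroot and the path /srv/rootfs in the usage comment are made up for this example.

```python
import os
import sys


def run_in_chroot(new_root, argv):
    """Run argv in a child process whose root directory is new_root.

    Needs root (CAP_SYS_CHROOT). Returns the child's exit code.
    """
    if not os.path.isdir(new_root):
        raise NotADirectoryError(new_root)

    pid = os.fork()
    if pid == 0:
        # Child: change the root, then move into it so that the
        # working directory does not point outside the new tree.
        try:
            os.chroot(new_root)
            os.chdir("/")
            os.execvp(argv[0], argv)  # the lookup happens inside new_root
        except OSError as exc:
            print(f"chroot/exec failed: {exc}", file=sys.stderr)
        os._exit(127)

    # Parent: wait for the child and report how it ended.
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)


# Hypothetical usage (as root): run_in_chroot("/srv/rootfs", ["/bin/sh"])
```

Only the child process is affected: the parent keeps its original root, which is why the call is made after the fork.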

Jails on FreeBSD (FreeBSD, 2000)

FreeBSD jails take chroot and build upon it by adding mechanisms to isolate and control the use of other system resources besides the filesystem.

The end result is a kernel mechanism that is a complete container implementation:

  • A userspace that is fully isolated (more or less, depending on the exact configuration of the jail)
  • Running on the same kernel as the host

Quick tangent: Control Groups (cgroups) (Linux, 2007, major update in 2016)

Control groups are a Linux mechanism for controlling how much of the system’s resources a process (and its child processes) can use. They weren’t originally meant for virtualisation, but rather as a way to keep processes from fighting over hardware, and to implement quotas. They did, however, turn out to be very useful for implementing containers.
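To make this a bit more concrete: with cgroups v2 the interface is a filesystem, usually mounted at /sys/fs/cgroup. Creating a directory creates a group, and writing to files such as memory.max, pids.max and cgroup.procs sets limits and moves processes. The sketch below assumes cgroups v2, root privileges, and that the memory and pids controllers are enabled for the parent group; the function name limit_process and the group name in the usage comment are hypothetical.

```python
import os


def limit_process(pid, group, memory_max, pids_max, cgroup_root="/sys/fs/cgroup"):
    """Create a cgroup v2 group, set two limits, and move pid into it."""
    path = os.path.join(cgroup_root, group)
    os.makedirs(path, exist_ok=True)  # on cgroupfs, mkdir creates the group

    settings = {
        "memory.max": memory_max,  # memory limit in bytes
        "pids.max": pids_max,      # maximum number of processes
        "cgroup.procs": pid,       # writing a PID moves that process here
    }
    # Dicts keep insertion order, so the limits are set before the move.
    for name, value in settings.items():
        with open(os.path.join(path, name), "w") as f:
            f.write(f"{value}\n")
    return path


# Hypothetical usage (as root): cap the current process at 64 MiB and 32 tasks.
# limit_process(os.getpid(), "demo", 64 * 1024 * 1024, 32)
```

Child processes created afterwards start in the same group, so the limits apply to the whole process tree.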

Namespaces (Linux, starting 2002)

Namespaces isolate the processes inside them from the rest of the system with regard to a specific global resource, such as mount points, process IDs, user IDs, interprocess communication, networking or time.

But more crucially, they can also isolate a process’s view of cgroups, giving it the illusion of being alone on the system.
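One way to see namespaces at work is to look under /proc/<pid>/ns, where Linux exposes each namespace a process belongs to as a symbolic link. Two processes share a namespace when the corresponding links have the same target. The helpers below are a small sketch of that idea; their names, and the proc_root parameter, are inventions of this example, and reading the links of another user’s process generally requires extra privileges.

```python
import os


def namespaces_of(pid="self", proc_root="/proc"):
    """Map namespace type to its identifier, e.g. {"pid": "pid:[4026531836]"}."""
    ns_dir = os.path.join(proc_root, str(pid), "ns")
    return {name: os.readlink(os.path.join(ns_dir, name))
            for name in sorted(os.listdir(ns_dir))}


def shared_namespaces(pid_a, pid_b, proc_root="/proc"):
    """Return the namespace types that both processes have in common."""
    a = namespaces_of(pid_a, proc_root)
    b = namespaces_of(pid_b, proc_root)
    return sorted(name for name in a if a[name] == b.get(name))


# Hypothetical usage: compare the current process with PID 1.
# print(shared_namespaces("self", 1))
```

Run from inside a container, a comparison against a host process would be expected to show few or no shared namespaces.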

How to make a Linux container

With those mechanisms (chroot, cgroups and namespaces) in place, creating a container is conceptually relatively simple:

  • First you populate the subtree that your container will have access to, ready to be chroot’ed into
  • Then you create namespaces for everything you need to isolate (this usually includes at least PIDs, user IDs, mount points and cgroups)
  • Finally you run your containerized process within your namespaces, chroot’ed to its subtree
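The steps above can be sketched in a few lines of Python. This is only an illustration under several assumptions: Python 3.12 or later (for os.unshare, which wraps unshare(2)), root privileges, and a subtree that has already been populated. The user namespace is left out because it also needs UID and GID mappings, and the function name launch is hypothetical.

```python
import os
import sys


def launch(rootfs, argv):
    """Run argv in new namespaces, chroot'ed to rootfs (needs root)."""
    # Step 1 is assumed done: rootfs already holds the container's files.
    if not os.path.isdir(rootfs):
        raise NotADirectoryError(rootfs)

    # Step 2: new mount, PID, UTS and cgroup namespaces. Note that this
    # also changes the namespaces of the calling process itself.
    os.unshare(os.CLONE_NEWNS | os.CLONE_NEWPID
               | os.CLONE_NEWUTS | os.CLONE_NEWCGROUP)

    # A new PID namespace only applies to children created afterwards,
    # so fork: the child becomes PID 1 inside the namespace.
    pid = os.fork()
    if pid == 0:
        try:
            # Step 3: confine the child to its subtree and run the program.
            os.chroot(rootfs)
            os.chdir("/")
            os.execvp(argv[0], argv)
        except OSError as exc:
            print(f"container setup failed: {exc}", file=sys.stderr)
        os._exit(127)

    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)


# Hypothetical usage (as root): launch("/srv/rootfs", ["/bin/sh"])
```

This leaves out resource limits, networking and most hardening, so it should be read as a demonstration of how the pieces fit together rather than as a usable container runtime.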

Conclusion

In practice you don’t need to build containers from scratch: people have already made systems for managing them, with nicer user interfaces. These include Docker, LXC and systemd-nspawn, to name just a few, and there is probably one that is just right for your use case.


2024-03-24