The traditional method of restricting a process is with the chroot() system call. This
system call changes the root directory from which all other paths are referenced for a
process and any child processes. For this call to succeed, the process must have execute
(search) permission on the directory being referenced. The new environment does not
actually take effect until you chdir() into it. Note also that a process with root
privilege can easily break out of a chroot environment, for example by creating device
nodes to read kernel memory, by attaching a debugger to a process outside of the
chroot(8) environment, or in many other creative ways.
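The chroot()-then-chdir() sequence described above can be sketched roughly as follows. This is a minimal illustration, not code from any particular program: the function name enter_chroot is hypothetical, and the feature-test macro is only there because some C libraries hide the chroot() prototype without it.

```c
/* Assumption: needed by some C libraries to expose chroot(); harmless elsewhere. */
#define _DEFAULT_SOURCE

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Confine the calling process to new_root (hypothetical helper).
 * Returns 0 on success, or -1 with errno set by the call that failed.
 * chroot() normally requires superuser privilege.
 */
int
enter_chroot(const char *new_root)
{
	int saved_errno;

	/* Change the root directory used for all later path lookups. */
	if (chroot(new_root) == -1) {
		saved_errno = errno;
		fprintf(stderr, "chroot(%s): %s\n", new_root, strerror(saved_errno));
		errno = saved_errno;
		return (-1);
	}

	/* Move the working directory inside the new root. */
	if (chdir("/") == -1) {
		saved_errno = errno;
		fprintf(stderr, "chdir(/): %s\n", strerror(saved_errno));
		errno = saved_errno;
		return (-1);
	}

	return (0);
}
```

A caller would typically invoke enter_chroot() early, before opening files it does not want to carry into the restricted environment, and give up root privilege afterwards, since a process that keeps root privilege can break out again.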
The behavior of the chroot() system call can be controlled somewhat with the
kern.chroot_allow_open_directories sysctl variable. When this value is set to 0, chroot()
will fail with EPERM if there are any directories open. If set to the default value of 1,
then chroot() will fail with EPERM if there are any directories open and the process is
already subject to a chroot() call. For any other value, the check for open directories
will be bypassed completely.
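The three cases above can be summarized as a small decision function. This is only a userland model of the rule as stated in the text, not the kernel's implementation; the function and parameter names are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Model of the open-directory check (illustrative only, not kernel code).
 *
 * allow_open_dirs:  value of kern.chroot_allow_open_directories
 * has_open_dirs:    the process holds at least one open directory
 * already_chrooted: the process is already subject to a chroot() call
 *
 * Returns 0 if chroot() would pass this check, or EPERM if it would be refused.
 */
int
chroot_open_dir_check(int allow_open_dirs, bool has_open_dirs,
    bool already_chrooted)
{
	switch (allow_open_dirs) {
	case 0:
		/* Any open directory is enough to refuse the call. */
		return (has_open_dirs ? EPERM : 0);
	case 1:
		/* Default: refuse only if the process is already chrooted. */
		return ((has_open_dirs && already_chrooted) ? EPERM : 0);
	default:
		/* Any other value bypasses the check completely. */
		return (0);
	}
}
```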
The concept of a jail extends chroot() by limiting the powers of the superuser to create
a true "virtual server". Once a prison is set up, all network communication must take
place through the specified IP address, and the power of "root privilege" in this jail is
severely constrained.
While in a prison, any tests of superuser power within the kernel using the suser() call
will fail. However, some calls to suser() have been changed to a new interface,
suser_xxx(). This function is responsible for recognizing or denying access to superuser
power for imprisoned processes.
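The split between the two checks can be pictured with a toy userland model. Everything here is an assumption made for illustration: the struct, the flag, and the model_ function names are hypothetical and do not reproduce the kernel's actual types or signatures. The sketch assumes that each call site tells the newer check whether the operation is one that an imprisoned superuser may still perform.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical flag: this operation is allowed for root inside a jail. */
#define MODEL_ALLOW_IN_JAIL	0x1

/* Hypothetical, heavily simplified stand-in for a process. */
struct model_proc {
	uid_t	uid;	/* effective user ID */
	bool	jailed;	/* true if the process is imprisoned */
};

/*
 * Model of the jail-aware check: root outside a jail always passes;
 * root inside a jail passes only if the caller marked the operation
 * as permitted. Returns 0 on success or EPERM on denial.
 */
int
model_suser_xxx(const struct model_proc *p, int flag)
{
	if (p->uid != 0)
		return (EPERM);
	if (p->jailed && (flag & MODEL_ALLOW_IN_JAIL) == 0)
		return (EPERM);
	return (0);
}

/* Model of the plain check: never grants power to a jailed process. */
int
model_suser(const struct model_proc *p)
{
	return (model_suser_xxx(p, 0));
}
```

Under this model, a call site converted to the newer interface can opt in to jailed root, while every unconverted call site keeps denying it.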
A superuser process within a jailed environment has the power to:
Manipulate credentials with setuid, seteuid, setgid, setegid, setgroups, setreuid, setregid, setlogin
Set resource limits with setrlimit
Modify some sysctl nodes (kern.hostname)
chroot()
Set flags on a vnode: chflags, fchflags
Set attributes of a vnode such as file permission, owner, group, size, access time, and modification time.
Bind to privileged ports in the Internet domain (ports < 1024)
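As an example of one of the operations listed above, the following sketch lowers the soft limit on open file descriptors with setrlimit. Nothing in it is specific to jails; it only shows the kind of call that remains available to the superuser inside one. The helper name limit_open_files is hypothetical.

```c
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>

/*
 * Set the soft limit on open file descriptors to max_files, capped at the
 * current hard limit (hypothetical helper).
 * Returns 0 on success, or -1 with errno set on failure.
 */
int
limit_open_files(rlim_t max_files)
{
	struct rlimit rl;

	/* Read the current limits so the hard limit is left unchanged. */
	if (getrlimit(RLIMIT_NOFILE, &rl) == -1)
		return (-1);

	/* The soft limit may not exceed the hard limit. */
	rl.rlim_cur = (max_files < rl.rlim_max) ? max_files : rl.rlim_max;

	return (setrlimit(RLIMIT_NOFILE, &rl));
}
```

Only the soft limit is touched here, so an unprivileged process can make the same call; raising the hard limit is where superuser power comes into play.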
Jail is a very useful tool for running applications in a secure environment, but it does
have some shortcomings. Currently, the IPC mechanisms have not been converted to the
suser_xxx() interface, so applications such as MySQL cannot be run within a jail.
Superuser access may have a very limited meaning within a jail, but there is no way to
specify exactly what "very limited" means.
POSIX® has released a working draft that adds event auditing, access control lists, fine-grained privileges, information labeling, and mandatory access control.
This is a work in progress and is the focus of the TrustedBSD project. Some of the initial work has been committed to FreeBSD-CURRENT (cap_set_proc(3)).