This patch I posted yesterday needs some reviews! It adds a qemu-img “diff” subcommand. One use for this is for when you’ve copied/cloned a guest, and you want to go back to a thin-provisioned guest over a base image.
To show it in action, we can use virt-ls. Firstly I’ll start with a Windows 2003 guest which I’ll copy:
$ xzcat /mnt/media/guest-library/win2003.img.xz > base.img
$ cp base.img copy.img
At this point, I boot copy.img, log in and install the Windows updates, just so the guest has some changes over the base image.
Now I have two 6 GB files**, but I know that they’re very similar to each other:
$ ll -h
total 6.9G
-rw-rw-r--. 1 rjones rjones 6.0G May 18 11:26 base.img
-rw-rw-r--. 1 rjones rjones 6.0G May 18 11:42 copy.img
I could save some space here using qemu-img diff!
$ ~/d/qemu/qemu-img diff -b base.img copy.img diff.qcow2
Formatting 'diff.qcow2', fmt=qcow2 size=6442450944 backing_file='base.img' encryption=off cluster_size=65536
$ ll -h
total 7.2G
-rw-rw-r--. 1 rjones rjones 6.0G May 18 11:26 base.img
-rw-rw-r--. 1 rjones rjones 6.0G May 18 11:42 copy.img
-rw-r--r--. 1 rjones rjones 308M May 18 11:44 diff.qcow2
copy.img is no longer needed because base.img + diff.qcow2 contains the same data. Before we delete it, let’s check that using virt-ls*:
$ virt-ls -lR -a base.img --checksum / > base.files
$ virt-ls -lR -a copy.img --checksum / > copy.files
$ virt-ls -lR -a diff.qcow2 --checksum / > diff.files
Note we expect that “base” is different from the other two, but “copy” and “diff” should be identical if all went well:
$ md5sum *.files
bef46650ff6f2d3f22f562c2a820230f base.files
fc723e43a9c16bffd6a9fb7dfd701752 copy.files
fc723e43a9c16bffd6a9fb7dfd701752 diff.files
So we can now delete copy.img with confidence.
* = Note there is a bug which stops --checksum from working in released versions of virt-ls.
** = The eagle-eyed will spot that I don’t in fact have two 6 GB files, because cp has cleverly made the copy sparse. Nevertheless qemu-img diff still saves space.
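To make the idea concrete, here is a rough Python sketch of what a "diff" over a base image has to find: the clusters in the copy that differ from the base. It is only an illustration working on in-memory byte strings. The function name changed_clusters is made up for this example, and this is not how the qemu-img patch itself is implemented.

```python
CLUSTER_SIZE = 65536  # same value as cluster_size=65536 in the output above


def changed_clusters(base, copy, cluster_size=CLUSTER_SIZE):
    """Return the indexes of clusters whose contents differ between base and copy.

    Hypothetical helper for illustration: base and copy are bytes objects
    standing in for two disk images of the same virtual size.
    """
    if len(base) != len(copy):
        raise ValueError("this sketch assumes both images have the same size")
    changed = []
    for offset in range(0, len(copy), cluster_size):
        if base[offset:offset + cluster_size] != copy[offset:offset + cluster_size]:
            changed.append(offset // cluster_size)
    return changed


if __name__ == "__main__":
    base = bytes(4 * CLUSTER_SIZE)        # four zero-filled clusters
    copy = bytearray(base)
    copy[CLUSTER_SIZE + 10] = 0xFF        # touch one byte in cluster 1
    copy[3 * CLUSTER_SIZE] = 0x01         # and one byte in cluster 3
    print(changed_clusters(base, bytes(copy)))
```

Only the changed clusters would need to be stored in the overlay, which is why diff.qcow2 is so much smaller than copy.img.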
Previously if you wanted to mount a disk image on the host from libguestfs, you had to use a separate program called guestmount.
A couple of months ago, we added FUSE support directly into the libguestfs API, and rewrote guestmount to use it. This also means you can use FUSE from your own libguestfs programs.
Yesterday I wrote a short example of using the mount-local API from C. This program creates a new disk image, formats it etc using libguestfs, then gives you a subshell so you can copy your own files, directories etc. in. When you quit the subshell, the disk image is synced and closed, and you end up with a virtual disk image containing all the files you just added. (Nothing that you couldn’t easily do before, but a nice little demonstration anyway).
Here it is in use:
$ gcc -Wall mount_local.c -o mount_local -lguestfs
$ ./mount_local /tmp/test.img
This is the 'mount-local' demonstration program. Follow the instructions on screen.
Creating and formatting the disk image, please wait a moment ...
The _current directory_ is a FUSE filesystem backed by the disk image which is managed by libguestfs.
Any files or directories you copy into here (up to 512 MB) will be saved into the disk image.
You can also delete files, create certain special files and so on.
When you have finished adding files, hit ^D or exit to exit the shell and return to the mount-local program.
mount-local-shell> ls
lost+found PUT_FILES_AND_DIRECTORIES_HERE
From the subshell, I copy in some files:
mount-local-shell> cp -a /usr/share/doc/libguestfs-devel-1.17.40/ .
mount-local-shell> ls
libguestfs-devel-1.17.40 lost+found PUT_FILES_AND_DIRECTORIES_HERE
mount-local-shell> ls libguestfs-devel-1.17.40/
AUTHORS                        example-ubuntu.xml
BUGS                           example-windows-2003-x64-cd.xml
ChangeLog                      example-windows-2003-x86-cd.xml
copy_over.c                    example-windows.xml
create_disk.c                  example-windows-xp-cd.xml
display_icon.c                 HACKING
example-debian-netinst-cd.xml  inspect_vm.c
example-debian.xml             README
example-fedora-dvd.xml         RELEASE-NOTES
example-fedora-netinst-cd.xml  ROADMAP
example-fedora.xml             TODO
example-rhel-6-dvd.xml         virt-dhcp-address.c
example-rhel-6-netinst-cd.xml  virt-inspector.rng
example-ubuntu-live-cd.xml
After copying in my files, I exit from the subshell:
mount-local-shell> exit
Any files or directories that you copied in have been saved into the disk image called '/tmp/test.img'.
Try opening the disk image with guestfish to see those files:
guestfish -a /tmp/test.img -m /dev/sda1
Here is the disk image that was created (note it is sparse, so it’s not really so large as it appears):
$ ll /tmp/test.img
-rw-r--r--. 1 rjones rjones 536870912 May 14 12:03 /tmp/test.img
$ du -sh /tmp/test.img
18M /tmp/test.img
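The difference between the two numbers above (apparent size from ll, allocated space from du) can also be read programmatically. The following Python sketch assumes a POSIX system where st_blocks is available and counted in 512-byte units, as it is on Linux. The helper name apparent_and_allocated is invented for this example.

```python
import os
import tempfile


def apparent_and_allocated(path):
    """Return (apparent size, allocated bytes), roughly what ls -l and du report."""
    st = os.stat(path)
    # On Linux st_blocks is counted in 512-byte units, whatever the
    # filesystem block size is.
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as f:
        f.truncate(512 * 1024 * 1024)   # a 512 MB hole, no data written
        f.flush()
        size, allocated = apparent_and_allocated(f.name)
        print("apparent:", size, "allocated:", allocated)
```

On a filesystem that supports sparse files the allocated figure should be far smaller than the apparent size, just as with /tmp/test.img.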
We can use guestfish to look inside it:
$ guestfish -a /tmp/test.img -m /dev/sda1
Welcome to guestfish, the libguestfs filesystem interactive shell for
editing virtual machine filesystems.
Type: 'help' for help on commands
'man' to read the manual
'quit' to quit the shell
><fs> ll /
total 18
drwxr-xr-x 4 root root 1024 May 14 12:03 .
drwxr-xr-x 23 1000 1000 4096 May 14 12:18 ..
-rw-r--r-- 1 root root 0 May 14 12:03 PUT_FILES_AND_DIRECTORIES_HERE
drwxr-xr-x 2 root root 1024 May 14 08:37 libguestfs-devel-1.17.40
drwx------ 2 root root 12288 May 14 12:03 lost+found
Download the test program here: https://github.com/libguestfs/libguestfs/blob/master/examples/mount_local.c
As discussed here previously, gettextize is a particularly sociopathic piece of software that causes unnecessary confusion when building any software that it touches.
Well, in libguestfs we finally got rid of it.
The replacement, a simple Makefile.am, is 110 lines long, which is considerably smaller than the maze of m4 macros and shell scripts that it replaced. It also runs much faster because make can compile the PO files in parallel.
virt-rescue is a useful “rescue tool” (like a rescue CD) for virtual machines.
New in libguestfs ≥ 1.17.36 is the virt-rescue --scratch[=N] option which lets you create scratch disks to play with.
Firstly it’s useful for playing around with Linux utilities that you might not normally get to use, such as mdadm and btrfs. For example, suppose you want to try spanning a btrfs filesystem over 4 devices. This is now simple and you don’t even need root:
$ virt-rescue --scratch=4
><rescue> mkfs.btrfs /dev/vda /dev/vdb /dev/vdc /dev/vdd
><rescue> mount /dev/vda /sysroot
><rescue> btrfs filesystem show
Label: none uuid: da1693d6-a89f-4cb6-8405-d277869e289b
  Total devices 4 FS bytes used 28.00KB
  devid 1 size 10.00GB used 2.02GB path /dev/vda
  devid 2 size 10.00GB used 2.00GB path /dev/vdb
  devid 3 size 10.00GB used 1.01GB path /dev/vdc
  devid 4 size 10.00GB used 1.01GB path /dev/vdd
Secondly it’s a way to get a second temporary disk attached to an ordinary guest while you’re rescuing it, for example if you need to temporarily copy some data off the guest while you’re fixing it.
# virt-rescue -d Fedora16 --scratch
><rescue> fdisk /dev/vdb
(use fdisk and mkfs to partition the scratch disk and then use it for temporary data)
You should use libguestfs live with libvirt because it takes the pain out of it.
However if for some reason you don’t want to use libvirt, here is how to use it directly. Note normally you should use libvirt and you do not need to do any of this!
(1) Your guest needs to have guestfsd -r installed and running. [Normally you just install libguestfs-live-service which does everything necessary]
(2) You need to start up qemu (the guest) with these special options, or the equivalent. [Normally you just set this up through libvirt]
-chardev pty,id=charserial0
-device isa-serial,chardev=charserial0,id=serial0
-chardev socket,id=charchannel0,path=/tmp/socket,server,nowait
-device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.libguestfs.channel.0
(3) Start guestfish and use the attach-method command to point to the socket. [Normally using '-d Guest --live' does all this automatically for you by getting the necessary information out of libvirt]
$ guestfish
><fs> attach-method unix:/tmp/socket
><fs> run
><fs> ll /
[use libguestfs command as normal here]
17G debian5x64.img.orig
The original image has about 4.7 GB of data, plus a large swap partition, according to virt-df:
$ virt-df -a debian5x64.img.orig -h
Filesystem Size Used Available Use%
debian5x64.img.orig:/dev/sda1 322M 66M 239M 21%
debian5x64.img.orig:/dev/debian5x64.home.annexia.org/home
3.4G 359M 2.9G 11%
debian5x64.img.orig:/dev/debian5x64.home.annexia.org/root
320M 301M 2.4M 95%
debian5x64.img.orig:/dev/debian5x64.home.annexia.org/tmp
300M 8.2M 276M 3%
debian5x64.img.orig:/dev/debian5x64.home.annexia.org/usr
3.4G 2.0G 1.2G 60%
debian5x64.img.orig:/dev/debian5x64.home.annexia.org/var
2.6G 2.0G 536M 76%
Using virt-sparsify, all unused space in the image is made sparse. The most recent version can sparsify swap partitions too:
4.6G debian5x64.img
xz --best -T 0 reduces the final image to under a gigabyte:
971M debian5x64.img.xz
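As a quick worked check of those three sizes, here is the arithmetic in Python. It assumes the G and M suffixes printed above are powers of 1024, which is what du -h style output normally means. The figures themselves are taken straight from the listings.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

original = 17 * GIB       # debian5x64.img.orig
sparsified = 4.6 * GIB    # debian5x64.img after virt-sparsify
compressed = 971 * MIB    # debian5x64.img.xz after xz --best


def percent_of(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


print(f"sparsified: {percent_of(sparsified, original):.1f}% of the original")
print(f"compressed: {percent_of(compressed, original):.1f}% of the original")
```

With these inputs the sparsified image is about 27% of the original and the compressed image about 5.6%.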
(1) Is the guest generating disk and network activity (a.k.a are the lights flashing)?
I don’t have nice code for this, but you can see the technique that Oz uses here. Oz uses the libvirt monitoring APIs to look for disk and network activity, and signals when it hasn’t seen any after a certain timeout period.
(2) When was the last time a user logged in?
If this is the sort of “activity” you’re after, then you can use virt-cat on Linux, or virt-win-reg [sorry, no actual example yet] on Windows.
(3) What was the last file updated in a guest?
Finally, if you’re interested in the newest file updated in a guest, see this technique which will work for any Linux or Windows guest.
Are there other kinds of “activity” that it’s interesting to find from guests?
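For case (1), the general shape of the "no activity within a timeout" technique can be sketched in a few lines of Python. This is not Oz's code. The read_counters callable is a hypothetical stand-in for whatever returns the guest's disk and network counters (for example values obtained through the libvirt monitoring APIs), and the clock and sleep parameters exist only so the loop can be exercised without real waiting.

```python
import time


def wait_until_idle(read_counters, idle_timeout, poll_interval=1.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Block until read_counters() has not changed for idle_timeout seconds.

    read_counters is a hypothetical callable returning any comparable
    snapshot of activity counters, e.g. a tuple of
    (bytes read, bytes written, packets received, packets sent).
    Returns the last snapshot seen.
    """
    last = read_counters()
    last_change = clock()
    while clock() - last_change < idle_timeout:
        sleep(poll_interval)
        current = read_counters()
        if current != last:
            # Counters moved, so the guest is still active: restart the timer.
            last = current
            last_change = clock()
    return last
```

A caller would pass in its own counter-reading function and pick idle_timeout to suit how bursty the guest's workload is.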
I’ve updated the libguestfs preview packages for RHEL 6.2/6.3. These contain many new features and bug fixes compared to RHEL 6.2.
An easy way to use these packages is to run this command, which creates a yum repository pointing to the preview packages:
# cat <<EOF > /etc/yum.repos.d/libguestfs-RHEL-6.3-preview.repo
[libguestfs-RHEL-6.3-preview]
name=libguestfs RHEL 6.3 preview - x86_64
baseurl=http://people.redhat.com/~rjones/libguestfs-RHEL-6.3-preview/
enabled=1
gpgcheck=0
EOF
Thursday 2012-04-12 is Fedora 17 virtualization test day and you are warmly invited.
You will need: A Fedora 17 pre-release version installed and fully updated, and at least one guest.
Here are the libguestfs tests.
New in libguestfs ≥ 1.17.22 is the ability to mount the guest filesystem on a local mountpoint. Well, you can already do that using guestmount, but the new thing is that you can do it from the API (from any language).
Here is an example using guestfish:
$ guestfish --ro -a /dev/vg_pin/F16x64 -i
Welcome to guestfish, the libguestfs filesystem interactive shell for
editing virtual machine filesystems.
Type: 'help' for help on commands
'man' to read the manual
'quit' to quit the shell
Operating system: Fedora release 16 (Verne)
/dev/vg_f16x64/lv_root mounted on /
/dev/vda2 mounted on /boot
><fs> ! mkdir /tmp/mnt # creates a local directory
><fs> mount-local /tmp/mnt readonly:true
><fs> mount-local-run
# the errors come from a GNOME daemon that
# looks at all new mountpoints
libguestfs: error: lstat: /.Trash: No such file or directory
libguestfs: error: lstat: /.Trash-500: No such file or directory
Over in another terminal, we can see the filesystem mounted on the local directory /tmp/mnt:
$ ls /tmp/mnt
bin   dev  home  lib64       media  opt   root  sbin     srv  tmp  var
boot  etc  lib   lost+found  mnt    proc  run   selinux  sys  usr
$ cat /tmp/mnt/etc/redhat-release
Fedora release 16 (Verne)
Unmounting the filesystem causes the guestfish mount-local-run command to return (since that command was actually serving the FUSE requests from the kernel).
$ fusermount -u /tmp/mnt
New in libguestfs 1.17.18, virt-sparsify can now find and sparsify even more space in your guests — Linux swap partitions are detected and zeroed (though not if they contain hibernation data).
The --zero option can be used to remove unwanted partitions.
The new version is also faster and uses less temporary space during the first stage.
You can now easily add the libguestfs RHEL 6.3 preview packages to your RHEL 6.2 system as a yum repo:
# cat <<EOF > /etc/yum.repos.d/libguestfs-RHEL-6.3-preview.repo
[libguestfs-RHEL-6.3-preview]
name=libguestfs RHEL 6.3 preview - x86_64
baseurl=http://people.redhat.com/~rjones/libguestfs-RHEL-6.3-preview/
enabled=1
gpgcheck=0
EOF
For more information, see the README file.
In recent months I have spent more of my time working on projects immediately above/related to the core libvirt library, such as libvirt-glib, libosinfo and virt-sandbox. To that list I have now added OpenStack, where my goal is to ensure that the libvirt driver is following all the best practices and start to take advantage of libosinfo for optimizing virtual hardware configuration. I’m familiar with hacking on python so that’s no big issue, but what is new about OpenStack is dealing with Gerrit. For the sake of reference, here were the steps I went through on Fedora 16 for my first patch (a tweak to the tools/install_venv.sh file)
$ mkdir $HOME/src/cloud
$ cd $HOME/src/cloud
$ git clone git://github.com/openstack/nova.git
$ cd nova
$ sudo yum install gcc python-pep8 python-virtualenv m2crypto libvirt libvirt-python libxslt-devel libxml2-devel
$ sudo yum remove python-distutils-extra
$HOME/.ssh/id_rsa.pub)
$ ssh -p 29418 berrange@review.openstack.org
The authenticity of host '[review.openstack.org]:29418 ([173.203.103.119]:29418)' can't be established.
RSA key fingerprint is ee:2f:ac:1b:f8:25:d0:39:be:55:02:c7:76:5e:39:53.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '[review.openstack.org]:29418,[173.203.103.119]:29418' (RSA) to the list of known hosts.
**** Welcome to Gerrit Code Review ****
Hi Daniel Berrange, you have successfully connected over SSH.
Unfortunately, interactive shells are disabled.
To clone a hosted Git repository, use:
git clone ssh://berrange@review.openstack.org:29418/REPOSITORY_NAME.git
Connection to review.openstack.org closed.
$ scp -p -P 29418 berrange@review.openstack.org:hooks/commit-msg .git/hooks/
$ git remote add gerrit ssh://berrange@review.openstack.org:29418/openstack/nova.git
$ git checkout -b venv-install-fixes
$ vi tools/virtual_venv.py
$ git add -u
(Don't forget to add yourself to Authors if this is your first change)
$ git commit -s
$ git show
commit fd682a28fb4591c65f20129d4bfb4eccf1232cb8
Author: Daniel P. Berrange <berrange@redhat.com>
Date: Thu Jan 5 13:15:15 2012 +0000
    Tell users what is about to be installed via sudo
    Rather than just giving users the sudo password prompt immediately, actually tell them what is about to be installed, so they know what privileged action is being attempted.
    Change-Id: Ic0c1de812be119384753895531a008075b13494e
    Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If the commit is fixing an OpenStack bug, then the commit message should include a line “BugXXXX” where XXXX is the bug number. Gerrit uses this to link to the bug tracker.
$ ./run_tests.sh
$ ./run_tests.sh --pep8
$ git push gerrit HEAD:refs/for/master
There is also a GIT plugin “git review” available in the git-review RPM, which can provide syntactic sugar for step 12, but personally I don’t find it adds significant value to be worth my while using.
I can see the attraction of Gerrit, but I personally still prefer the practice of using git send-email for reviewing on mailing lists. My problems with Gerrit are
The libguestfs binary “fixed” appliance [README] is a copy of the Fedora appliance, adapted so you can just download it and use it directly with libguestfs.
Just download libguestfs ≥ 1.17.10 (or from git), compile without the usual appliance:
./configure --disable-appliance --disable-daemon
make
sudo make install
Set your path to point to wherever you unpacked the appliance:
export LIBGUESTFS_PATH=/usr/local/lib/libguestfs/appliance
and then run libguestfs and the virt tools in the usual way.
I’ve updated the libguestfs RHEL 6.3 preview packages to libguestfs 1.16.4. Barring any bugs that we find over the next few months, these should be close to what we ship in RHEL 6.3.
http://people.redhat.com/~rjones/libguestfs-RHEL-6.3-preview/
You can install these on RHEL 6.2, but make sure you read the README file first.
A few months back the Red Hat KVM team held a mass keysigning party to set up a web of trust between each other’s keys. IIRC, there were approximately 20 people participating in this, which potentially meant a lot of tedious typing of GPG commands, with the potential for error such tedium implies. Fortunately we had Jim Meyering on hand to give us some tips for facilitating/optimizing the process, the most important of which was to introduce us to the ‘Pius‘ tool. To quote from its website:
pius (PGP Individual UID Signer) helps attendees of PGP keysigning parties. It is the main utility and allows you to quickly and easily sign each UID on a set of PGP keys. It is designed to take the pain out of the sign-all-the-keys part of PGP Keysigning Party while adding security to the process.…
That can already be time consuming, but preferably, you want to verify the identity in each UID, which means verifying the email addresses. There are a few ways to do this, but one of them is to sign each UID on the key individually (which requires import-sign-export-delete for each UID), encrypt-emailing that key to the email address in the UID. This can be incredibly time consuming.
That’s where pius comes in. Pius will do all the work for you – all you have to do is confirm the fingerprint for each key. It will then take care of signing each UID cleanly, minimizing the key, and using PGP/Mime email to send it, encrypted, to the email address in the UID.
The steps Jim defined for us to follow using Pius were as follows
# cat > keyids.txt <<EOF
4096R/000BEEEE 2010-06-14 Jim Meyering
4096R/E1B768A0 2011-10-11 Richard W.M. Jones
4096R/15104FDF 2011-10-11 Daniel P. Berrange
...
EOF
# id_list=$(perl -nle 'm!^\d{4}R/(\S{8}) ! and print $1' keyids.txt)
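For anyone who finds the Perl one-liner hard to read, here is an equivalent sketch in Python using the same regular expression. The function name extract_key_ids is made up for this example, and the sample lines are the ones from keyids.txt above.

```python
import re

# Same pattern as the Perl one-liner: four digits, "R/", then an
# 8-character key ID followed by a space.
KEY_LINE = re.compile(r"^\d{4}R/(\S{8}) ")


def extract_key_ids(lines):
    """Return the key IDs from lines like '4096R/15104FDF 2011-10-11 Name'."""
    ids = []
    for line in lines:
        match = KEY_LINE.match(line)
        if match:
            ids.append(match.group(1))
    return ids


if __name__ == "__main__":
    sample = [
        "4096R/000BEEEE 2010-06-14 Jim Meyering",
        "4096R/E1B768A0 2011-10-11 Richard W.M. Jones",
        "4096R/15104FDF 2011-10-11 Daniel P. Berrange",
        "...",
    ]
    print(" ".join(extract_key_ids(sample)))
```

Lines that do not match the pattern, such as the "..." placeholder, are simply skipped.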
# gpg --recv-keys $(echo $id_list)
# gpg --fingerprint $(echo $id_list)
MAIL_HOST=smtp.your.mail.server.com
me=your@email.address.com (eg dan@berrange.com)
my_id=XXXXXXXXXXX (Your GPG Key ID eg 15104FDF)
# pius --mail-host=$MAIL_HOST --no-pgp-mime --mail=$me --signer=$my_id $(echo $id_list)
What Pius does here is that for each key ID it is given, it will sign each individual identity (email address). The signature will be ascii-armoured and then sent to the email address associated with that identity. If a user has multiple email addresses on their key, they will receive one signature email per address. The email contains instructions for what the recipient should do. The email will look something like this:
From: eblake@redhat.com
To: berrange@redhat.com
Subject: Your signed PGP key
[-- Attachment #1 --]
[-- Type: text/plain, Encoding: 7bit, Size: 0.7K --]
Hello,
Attached is a copy of your PGP key (0x15104fdf) signed by my key (0xa7a16b4a2527436a).
If your key has more than one UID, than this key only has the UID associated with this email address (berrange@redhat.com) signed and you will receive additional emails containing signatures of the other UIDs at the respective email addresses.
Please take the attached message and decrypt it and then import it. Something like this should work:
gpg -d | gpg --import
Then, don't forget to send it to a keyserver:
gpg --keyserver pool.sks-keyservers.net --send-key 15104fdf
If you have any questions, let me know.
Generated by PIUS (http://www.phildev.net/pius/).
[-- Attachment #2: 15104fdf__berrange_at_redhat.com_ENCRYPTED.asc --]
[-- Type: application/octet-stream, Encoding: 7bit, Size: 4.6K --]
The final thing, once everyone has dealt with the emails they received, is to refresh your local key database to pull down all the new signatures
# gpg --recv-keys $(echo $id_list)
I should point out that Pius isn’t just for mass key signing parties. Even if you only have a single key you want to sign, it is still a very convenient tool to use. The simplified set of steps to go through would be:
# gpg --recv-key XXXXXXXX
# gpg --fingerprint XXXXXXXX
# ...verify person's identity & fingerprint
# pius --mail-host=$MAIL_HOST --no-pgp-mime --mail=$me --signer=$my_id XXXXXXXX
# ....some time later...
# gpg --recv-key XXXXXXXX
Thanks again to Jim Meyering for pointing out Pius and doing the organization for our key signing party & defining the steps I describe above. BTW, Pius is available in Fedora from F16 onwards.
As mentioned previously, today I presented a talk at FOSDEM 2012, titled “Building application sandboxes on top of LXC and KVM with libvirt”. As promised I have now uploaded the PDF slides for public access. For further information about libvirt-sandbox, consult this previous blog post on the subject. Also keep an eye on this site for further blog posts in the future. Thanks to everyone who attended the talk. I look forward to returning again in a year’s time for another update.

Big theatre!

The code examples are in this git repository (git clone git://git.annexia.org/git/libguestfs-talks.git)
Here:
http://people.redhat.com/~rjones/libguestfs-RHEL-6.3-preview/
These are based on libguestfs 1.16.2.
FreeDOS 1.1 running in KVM with 4 MB (sic) of virtual RAM:

This is more than just a silly experiment. Being able to run very small VMs (and this is by far the smallest real VM I have been able to run) allows us to test the scalability of KVM to hundreds or thousands of guests using standard hardware.
It has revealed a couple of bugs in libguestfs too …
As many readers are no doubt aware, the FOSDEM 2012 conference is taking place this weekend in Brussels. This year I was organized enough to submit a proposal for a talk and was very happy to be accepted. My talk is titled “Building app sandboxes on top of LXC and KVM with libvirt” and is part of the Virtualization & Cloud Dev Room. As you can guess from the title, I will be talking in some detail about the libvirt-sandbox project I recently announced. Richard Jones is also attending to provide a talk on libguestfs and how it is used in cloud projects like OpenStack. There will be three talks covering different aspects of the oVirt project, a general project overview, technical look at the management engine and a technical look at the node agent VDSM. Finally the GNOME Boxes project I mentioned a few weeks ago will also be represented in the CrossDesktop devroom.
Besides these virtualization related speakers, there are a great many other Red Hat people attending FOSDEM this year, so we put together a small flyer highlighting all their talks. In keeping with the spirit of FOSDEM, these talks will of course be community / technically focused, not corporate marketing ware :-) I look forward to meeting many people at FOSDEM this year, and if all goes well, make it a regular conference to attend.
In libguestfs 1.16 we added experimental GObject bindings and support for GObject Introspection. These are experimental because we may change them a little in future. They do allow you to access libguestfs from Javascript, specifically from gjs.
Here is an example program (fixed and updated):
const Guestfs = imports.gi.Guestfs;

function inspect (filename)
{
  var g = new Guestfs.Session ();
  //g.set_trace (true);
  var optargs = new Guestfs.AddDriveOpts ({readonly: true});
  g.add_drive_opts (filename, optargs);
  g.launch ();
  var roots = g.inspect_os ();
  if (roots.length == 0)
    printerr ("inspection: no operating systems found in", filename);
  else {
    for (var i = 0; i < roots.length; ++i) {
      inspect_root (g, roots[i]);
    }
  }
}

function inspect_root (g, root)
{
  print ("inspecting operating system root", root);
  print (" product name:", g.inspect_get_product_name (root));
  print (" version:",
         g.inspect_get_major_version (root),
         g.inspect_get_minor_version (root));
  //print (" type:", g.inspect_get_type (root));
  print (" distro:", g.inspect_get_distro (root));

  // Mount up the disks like guestfish -i
  var mps = g.inspect_get_mountpoints (root);
  var keys = [];
  for (var key in mps) { keys.push (key); }
  // Sort mountpoints by length so that parents are mounted before children.
  function compare (a, b) {
    if (a.length > b.length) return 1;
    else if (a.length == b.length) return 0;
    else return -1;
  }
  keys.sort (compare);
  for (var i = 0; i < keys.length; ++i) {
    g.mount_ro (mps[keys[i]], keys[i]);
  }

  // Get the list of applications.
  print (" applications:");
  var apps = g.inspect_list_applications (root);
  for (var i = 0; i < apps.length; ++i) {
    print (" ", apps[i].app_name,
           apps[i].app_version, apps[i].app_release);
  }

  g.umount_all ();
}

if (ARGV.length != 1) {
  printerr ("Usage: gjs test.js disk.img");
} else {
  inspect (ARGV[0]);
}
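The one slightly subtle step in the program is sorting the mountpoints by length before mounting them. A path is always longer than any of its parents, so shortest-first order guarantees that / is mounted before /boot, /usr before /usr/local, and so on. Here is the same idea as a small standalone Python sketch. It does not use libguestfs at all, and mount_order is a name invented for this example. The sample mapping reuses the Fedora 16 devices shown in an earlier post.

```python
def mount_order(mountpoints):
    """Return (mountpoint, device) pairs sorted so parents come before children.

    mountpoints maps a mountpoint path to a device name, the same shape
    that inspect_get_mountpoints returns in the program above.
    """
    return sorted(mountpoints.items(), key=lambda item: len(item[0]))


if __name__ == "__main__":
    mps = {"/boot": "/dev/vda2", "/": "/dev/vg_f16x64/lv_root"}
    for mountpoint, device in mount_order(mps):
        print(device, "->", mountpoint)
```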
There is a new tool in libguestfs ≥ 1.15.17: virt-format lets you erase disks and make blank disks.
Usage is quite simple:
$ virt-format -a disk.img
(Note that this erases any data on disk.img!)
This works for any format of disk, eg. qcow2, or you can run it on host partitions, LVs, USB storage etc. By default it just creates a partition, but using other options you can make empty filesystems and logical volumes.
This is a simple tool that doesn’t cover everything you might want to do. For more complex requirements, see virt-make-fs or guestfish.
I have mentioned in passing every now & then over the past few months, that I have been working on a tool for creating application sandboxes using libvirt, LXC and KVM. Last Thursday, I finally got around to creating a first public release of a package that is now called libvirt-sandbox. Before continuing it is probably worth defining what I consider the term “application sandbox” to mean. My working definition is that an “application sandbox” is simply a way to confine the execution environment of an application, limiting the access it has to OS resources. To me one notable point is that there is no need for a separate / special installation of the application to be confined. An application sandbox ought to be able to run any existing application installed in the OS.
For a few Fedora releases, users have had the SELinux sandbox command which will execute a command with a strictly confined SELinux context applied. It is also able to make limited use of the kernel filesystem namespace feature, to allow changes to the mount table inside the sandbox. For example, the common case is to put in place a different $HOME. The SELinux sandbox has been quite effective, but there is a limit to what can be done with SELinux policy alone, as evidenced by the need to create a setuid helper to enable use of the kernel namespace feature. Architecturally this gets even more problematic as new feature requests need to be dealt with.
As most readers are no doubt aware, libvirt provides a virtualization management API, with support for a wide variety of virtualization technologies. The KVM driver is easily the most advanced and actively developed driver for libvirt with a very wide array of features for machine based virtualization. In terms of container based virtualization, the LXC driver is the most advanced driver in libvirt, often getting new features “for free” since it shares a lot of code with the KVM driver, in particular anything cgroup based. The LXC driver has always had the ability to pass arbitrary host filesystems through to the container, and the KVM driver gained similar capabilities last year with the inclusion of support for virtio 9p filesystems. One of the well known security features in libvirt is sVirt, which leverages MAC technology like SELinux to strictly confine the execution environment of QEMU. This has also now been adapted to work for the LXC driver.
Looking at the architecture of the SELinux sandbox command last year, it occurred to me that the core concepts mapped very well to the host filesystem passthrough & sVirt features in libvirt’s KVM & LXC drivers. In other words, it ought to be possible to create application sandboxes using the libvirt API and suitably advanced drivers like KVM or LXC. A few weeks of hacking resulted in a proof of concept tool virt-sandbox which can run simple commands in sandboxes built on LXC or KVM.
A command line tool for running applications inside a sandbox is great, but even more useful would be an API for creating application sandboxes that programmers can use directly. While libvirt provides an API that is portable across different virtualization technologies, it cannot magically hide the differences in feature set or architecture between the technologies. Thus the decision was taken to create a new library called libvirt-sandbox that provides a higher level API for managing application sandboxes, built on top of libvirt. The virt-sandbox command from the proof of concept would then be re-implemented using this library API.
The libvirt-sandbox library is built using GObject to enable it to be accessible to any programming language via GObject Introspection. The basic idea is that the programmer simply defines the desired characteristics of the sandbox, such as the command to be executed, any arguments, filesystems to be exposed from host, any bind mounts, private networking configuration, etc. From this configuration description, libvirt-sandbox will decide upon & construct a libvirt guest XML configuration that can actually provide the requested characteristics. In other words, the libvirt-sandbox API is providing a layer of policy above libvirt, to isolate the application developer from the implementation details of the underlying hypervisor.
Building sandboxes using LXC is quite straightforward, since application confinement is a core competency of LXC. Thus I will move straight to the KVM implementation, which is where the real fun is. Booting up an entire virtual machine probably sounds like quite a slow process, but it really need not be, particularly if you have a well constrained hardware definition which avoids any need for probing. People also generally assume that running a KVM guest means having a guest operating system install. This is absolutely something that is not acceptable for application sandboxing, and indeed not actually necessary. In a nutshell, libvirt-sandbox creates a new initrd image containing a custom init binary. This init binary simply loads the virtio-9p kernel module and then mounts the host OS’ root filesystem as the guest’s root filesystem, readonly of course. It then hands off to a second boot strap process which runs the desired application binary and forwards I/O back to the host OS, until the sandboxed application exits. Finally the init process powers off the virtual machine. To get an idea of the overhead, the /bin/false binary can be executed inside a KVM sandbox with an overall execution time of 4 seconds. That is the total time for libvirt to start QEMU, QEMU to run its BIOS, the BIOS to load the kernel + initrd, the kernel to boot up, /bin/false to run, and the kernel to shutdown & QEMU to exit. I think that is pretty impressive for all of that work. This is a constant overhead, so for a long running command like an MP3 encoder, it disappears into the background noise. With sufficient optimization, I’m fairly sure we could get the overhead down to approx 2 seconds.
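To put a number on "disappears into the background noise", here is a small worked calculation in Python. It takes the 4 second /bin/false measurement as the fixed startup and shutdown cost, which assumes /bin/false itself takes no measurable time. The job lengths are made-up examples, not measurements.

```python
def overhead_fraction(startup_seconds, job_seconds):
    """Fraction of total wall-clock time spent on sandbox startup/shutdown."""
    return startup_seconds / (startup_seconds + job_seconds)


# Hypothetical job lengths: 1 second, 1 minute, 10 minutes.
for job in (1, 60, 600):
    share = 100 * overhead_fraction(4, job)
    print(f"{job:>4}s job: {share:.2f}% of the total time is sandbox overhead")
```

For a one second command the sandbox dominates the run time, but for a ten minute job the same fixed cost is well under one percent.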
The Fedora review of the libvirt-sandbox package was nice & straightforward, so the package is already available in rawhide, ready for testing the VirtSandbox F17 feature. The virt-sandbox command is provided by the libvirt-sandbox RPM package:
# yum install libvirt-sandbox
Assuming libvirt is already installed & able to run either LXC or KVM guests, everything is ready to use immediately.
A first example is to run the ‘/bin/date’ command inside a KVM sandbox:
$ virt-sandbox -c qemu:///session /bin/date
Thu Jan 12 22:30:03 GMT 2012
You want proof that this really is running an entire KVM guest? How about looking at the /proc/cpuinfo contents:
$ virt-sandbox -c qemu:///session /bin/cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 2
model name : QEMU Virtual CPU version 1.0
stepping : 3
cpu MHz : 2793.084
cache size : 4096 KB
fpu : yes
fpu_exception : yes
cpuid level : 4
wp : yes
flags : fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm up rep_good nopl pni cx16 hypervisor lahf_lm
bogomips : 5586.16
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
How about using LXC instead of KVM, and providing an interactive console instead of just a one-shot command? Yes, we can do that too:
$ virt-sandbox -c lxc:/// /bin/sh
sh-4.2$ ps -axuwf
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 165436 3756 pts/0 Ss+ 22:31 0:00 libvirt-sandbox-init-lxc
berrange 24 0.0 0.1 167680 4688 pts/0 S+ 22:31 0:00 libvirt-sandbox-init-common
berrange 47 0.0 0.0 13852 1608 pts/1 Ss 22:31 0:00 \_ /bin/sh
berrange 48 0.0 0.0 13124 996 pts/1 R+ 22:31 0:00 \_ ps -axuwf
Notice how we only see the processes from our sandbox, none from the host OS. There are many more examples I’d like to illustrate, but this post is already far too long.
This blog post might give the impression that everything is complete & operational, but that is far from the truth. This is only the bare minimum functionality to enable some real world usage. Things that are yet to be dealt with include
For those attending FOSDEM this year, I will be giving a presentation about libvirt-sandbox in the virt/cloud track.
Oh and as well as the released tar.gz mentioned in the first paragraph, or the Fedora RPM, the code is all available in GIT
Last updated: May 19, 2012 09:11 AM