QEMU is able to leverage a number of live tracing systems, with the choice of backend configurable at build time. Upstream QEMU enables the “log” trace backend by default, since it is portable across platforms and very simple to use: just add “-d trace:PATTERN” to the QEMU command line. For example, to enable logging of all trace events in the QEMU I/O subsystem (aka “qio”) we can run:
$ qemu -d trace:qio* ...some args...
23266@1547735759.137292:qio_channel_socket_new Socket new ioc=0x563a8a39d400
23266@1547735759.137305:qio_task_new Task new task=0x563a891d0570 source=0x563a8a39d400 func=0x563a86f1e6c0 opaque=0x563a89078000
23266@1547735759.137326:qio_task_thread_start Task thread start task=0x563a891d0570 worker=0x563a86f1ce50 opaque=0x563a891d9d90
23273@1547735759.137491:qio_task_thread_run Task thread run task=0x563a891d0570
23273@1547735759.137503:qio_channel_socket_connect_sync Socket connect sync ioc=0x563a8a39d400 addr=0x563a891d9d90
23273@1547735759.138108:qio_channel_socket_connect_fail Socket connect fail ioc=0x563a8a39d400
This is very simple and surprisingly effective much of the time, but it is not without its downsides.
For these reasons, some downstreams chose not to use the default “log” backend. Both Fedora and RHEL have instead enabled the “dtrace” backend, which can be leveraged via systemtap on Linux. This provides a very powerful tracing system, but the cost is that the previously simple task of printing a formatted string when a probe point fires has become MUCH more complicated. For example, getting output equivalent to that of QEMU’s log backend would require:
# cat > trace.stp <<EOF
probe qemu.system.x86_64.qio_task_new {
    printf("%d@%d qio_task_new Task new task=%p source=%p func=%p opaque=%p\n", pid(), gettimeofday_ns(), task, source, func, opaque)
}
EOF
# stap trace.stp
22806@1547735341399862570 qio_task_new Task new task=0x56135cd66eb0 source=0x56135d1d7c00 func=0x56135af746c0 opaque=0x56135bf06400
Repeat that code snippet for every qio* probe point you want to watch, figuring out the set of args each one has available to print. This quickly becomes tedious for what should be a simple logging job, especially if you need to reference null-terminated strings from userspace.
After cursing this difficulty one time too many, it occurred to me that QEMU could easily do more to make life easier for systemtap users. The QEMU build system already auto-generates all the trace-backend-specific code from a generic description of probes in the QEMU source tree. Each probe has a format string, which is used by the syslog, log and ftrace backends but ignored by the dtrace backend. It did not take much to change the code generator so that it uses this format string to generate a convenient systemtap tapset equivalent to the manually written probe above:
probe qemu.system.x86_64.log.qio_task_new = qemu.system.x86_64.qio_task_new ?
{
    printf("%d@%d qio_task_new Task new task=%p source=%p func=%p opaque=%p\n", pid(), gettimeofday_ns(), task, source, func, opaque)
}
This can be trivially executed with minimal knowledge of the systemtap tapset language:
# stap -e "qemu.system.x86_64.log.qio_task_new{}"
22806@1547735341399862570 qio_task_new Task new task=0x56135cd66eb0 source=0x56135d1d7c00 func=0x56135af746c0 opaque=0x56135bf06400
Even better, we have now gained the ability to use wildcards too:
# stap -e "qemu.system.x86_64.log.qio*{}"
23266@1547735759.137292:qio_channel_socket_new Socket new ioc=0x563a8a39d400
23266@1547735759.137305:qio_task_new Task new task=0x563a891d0570 source=0x563a8a39d400 func=0x563a86f1e6c0 opaque=0x563a89078000
23266@1547735759.137326:qio_task_thread_start Task thread start task=0x563a891d0570 worker=0x563a86f1ce50 opaque=0x563a891d9d90
23273@1547735759.137491:qio_task_thread_run Task thread run task=0x563a891d0570
23273@1547735759.137503:qio_channel_socket_connect_sync Socket connect sync ioc=0x563a8a39d400 addr=0x563a891d9d90
23273@1547735759.138108:qio_channel_socket_connect_fail Socket connect fail ioc=0x563a8a39d400
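The generator change described above lives in QEMU's own tracetool scripts, which are written in Python. The C sketch below only illustrates the idea and is not that code. It assumes a trace event is described by a name, a format string and a list of argument names, and it emits a probe in the same shape as the generated tapset shown earlier. The function names are hypothetical, and string arguments are not handled (a real generator has to read those from userspace memory).

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Append formatted text at buf + *used; returns -1 if it would not fit. */
static int append(char *buf, size_t cap, size_t *used, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf + *used, cap - *used, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= cap - *used)
        return -1;
    *used += (size_t)n;
    return 0;
}

/* Hypothetical generator for one "log" tapset probe.
 * prefix: probe prefix such as "qemu.system.x86_64"
 * event:  trace event name; fmt: the event's format string
 * args:   names of the event's arguments, in order */
static int gen_log_probe(char *buf, size_t cap, const char *prefix,
                         const char *event, const char *fmt,
                         const char *const *args, size_t nargs)
{
    size_t used = 0;

    if (cap == 0)
        return -1;
    buf[0] = '\0';
    /* The "?" makes the alias optional if the underlying probe is missing. */
    if (append(buf, cap, &used, "probe %s.log.%s = %s.%s ?\n{\n",
               prefix, event, prefix, event) ||
        /* Emits: printf("%d@%d <event> <fmt>\n", pid(), gettimeofday_ns() */
        append(buf, cap, &used,
               "    printf(\"%%d@%%d %s %s\\n\", pid(), gettimeofday_ns()",
               event, fmt))
        return -1;
    for (size_t i = 0; i < nargs; i++)
        if (append(buf, cap, &used, ", %s", args[i]))
            return -1;
    return append(buf, cap, &used, ")\n}\n");
}
```

Because the format string is passed as an argument to "%s", its own %p conversions are copied through verbatim into the generated systemtap printf.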
Users still, however, need to be aware of the naming convention for QEMU’s systemtap tapsets and how it maps to the particular QEMU binary that is used, and they must not forget the trailing “{}”. So I decided to go one step further and ship a small helper tool to make it even easier to use:
$ qemu-trace-stap run qemu-system-x86_64 'qio*'
22806@1547735341399856820 qio_channel_socket_new Socket new ioc=0x56135d1d7c00
22806@1547735341399862570 qio_task_new Task new task=0x56135cd66eb0 source=0x56135d1d7c00 func=0x56135af746c0 opaque=0x56135bf06400
22806@1547735341399865943 qio_task_thread_start Task thread start task=0x56135cd66eb0 worker=0x56135af72e50 opaque=0x56135c071d70
22806@1547735341399976816 qio_task_thread_run Task thread run task=0x56135cd66eb0
The second argument to this tool is the QEMU binary filename to be traced, which can be relative (to search $PATH) or absolute. What is clever is that it will set the SYSTEMTAP_TAPSET environment variable to point to the right location to find the corresponding tapset definition. This is very useful when you have multiple copies of QEMU on the system and need to make sure systemtap traces the right one.
The ‘qemu-trace-stap’ script takes a verbose arg so you can understand what it is running behind the scenes:
$ qemu-trace-stap run /home/berrange/usr/qemu-git/bin/qemu-system-x86_64 'qio*'
Using tapset dir '/home/berrange/usr/qemu-git/share/systemtap/tapset' for binary '/home/berrange/usr/qemu-git/bin/qemu-system-x86_64'
Compiling script 'probe qemu.system.x86_64.log.qio* {}'
Running script, <Ctrl>-c to quit
...trace output...
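The verbose output suggests how the helper derives its settings from the binary path. The tapset directory sits at share/systemtap/tapset next to the binary's bin directory, and the probe prefix is the binary name with dashes turned into dots. The C sketch below reproduces just that mapping for an absolute path to a system emulator. The function names are hypothetical, and the real script, which also searches $PATH for relative names, may differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* "/prefix/bin/qemu-system-x86_64" -> "/prefix/share/systemtap/tapset".
 * Only absolute paths are handled; returns 0 on success, -1 otherwise. */
static int tapset_dir(const char *binary, char *out, size_t outlen)
{
    const char *last = strrchr(binary, '/');   /* slash before the basename */
    if (last == NULL || last == binary)
        return -1;
    const char *p = last - 1;                  /* walk back over "bin" */
    while (p > binary && *p != '/')
        p--;
    if (*p != '/')
        return -1;
    int n = snprintf(out, outlen, "%.*s/share/systemtap/tapset",
                     (int)(p - binary), binary);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/* Build the one-line stap script for a single probe pattern, e.g.
 * "probe qemu.system.x86_64.log.qio* {}". */
static int probe_script(const char *binary, const char *pattern,
                        char *out, size_t outlen)
{
    char prefix[256];
    const char *base = strrchr(binary, '/');
    base = base ? base + 1 : binary;

    size_t len = strlen(base);
    if (len >= sizeof prefix)
        return -1;
    for (size_t i = 0; i <= len; i++)          /* copies the NUL as well */
        prefix[i] = (base[i] == '-') ? '.' : base[i];

    int n = snprintf(out, outlen, "probe %s.log.%s {}", prefix, pattern);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

The resulting directory is what the helper would export as SYSTEMTAP_TAPSET before invoking stap with the generated script.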
It can enable multiple probes at once
$ qemu-trace-stap run qemu-system-x86_64 'qio*' 'qcrypto*' 'buffer*'
By default it monitors all existing running processes and all processes launched in the future. This can be restricted to a specific PID using the --pid arg:
$ qemu-trace-stap run --pid 2532 qemu-system-x86_64 'qio*'
Finally, if you can’t remember which probes are valid, it can tell you:
$ qemu-trace-stap list qemu-system-x86_64
ahci_check_irq
ahci_cmd_done
ahci_dma_prepare_buf
ahci_dma_prepare_buf_fail
ahci_dma_rw_buf
ahci_irq_lower
...snip...
This new functionality was merged into QEMU upstream a short while ago and will be included in the QEMU 4.0 release coming at the end of April.
A couple of times I have already run into the situation that I wanted to provide a small guest disk image to other people. For example, one time I wanted to provide a test application like LTP to colleagues via a server where I only had some limited disk quota available. Back then I was still able to resolve the problem by installing a stock Linux distribution together with the test software into a normal qcow2 image, and then shrinking the image with qemu-img convert and xz to approximately 500 MiB.
But when I started to think about the QEMU advent calendar 2018, where I wanted to provide many small images for various different target architectures, it was clear to me that I needed a different approach. First, the disk images needed to be much smaller due to network traffic constraints. Second, for many of the “non-mainstream” target architectures (like MicroBlaze or Xtensa) you cannot easily get a standard Linux distribution that installs without problems on the machines that QEMU provides.
Instead of using a pre-built Linux distribution, it would also be possible to cross-compile the kernel and user space programs and build a small disk image with that on your own. However, figuring out how to do that for multiple target architectures would have been very cumbersome and time consuming.
So after doing some research, I finally discovered buildroot, which is an excellent framework for doing exactly what I wanted: it allows you to create small disk images for non-x86 target CPUs, with all the magic of cross-compiling and image creation wrapped into its internal scripts, and with a very flexible Kconfig-style configuration system on top.
For those who are interested, here is a short description of how to use buildroot to create a small guest disk image:
$ wget https://buildroot.org/downloads/buildroot-2018.02.9.tar.bz2
$ tar -xaf buildroot-2018.02.9.tar.bz2
$ cd buildroot-2018.02.9/
$ cd configs/
$ ls qemu*
qemu_aarch64_virt_defconfig qemu_nios2_10m50_defconfig
qemu_arm_versatile_defconfig qemu_or1k_defconfig
qemu_arm_versatile_nommu_defconfig qemu_ppc64le_pseries_defconfig
qemu_arm_vexpress_defconfig qemu_ppc64_pseries_defconfig
qemu_m68k_mcf5208_defconfig qemu_ppc_g3beige_defconfig
qemu_m68k_q800_defconfig qemu_ppc_mpc8544ds_defconfig
qemu_microblazebe_mmu_defconfig qemu_ppc_virtex_ml507_defconfig
qemu_microblazeel_mmu_defconfig qemu_sh4eb_r2d_defconfig
qemu_mips32r2el_malta_defconfig qemu_sh4_r2d_defconfig
qemu_mips32r2_malta_defconfig qemu_sparc64_sun4u_defconfig
qemu_mips32r6el_malta_defconfig qemu_sparc_ss10_defconfig
qemu_mips32r6_malta_defconfig qemu_x86_64_defconfig
qemu_mips64el_malta_defconfig qemu_x86_defconfig
qemu_mips64_malta_defconfig qemu_xtensa_lx60_defconfig
qemu_mips64r6el_malta_defconfig qemu_xtensa_lx60_nommu_defconfig
qemu_mips64r6_malta_defconfig
$ cd ..
$ make qemu_ppc64_pseries_defconfig
Then run make menuconfig to fine-tune your build. I recommend having a look at the following settings first:
Now save your configuration, exit the config menu, and type make for a first test to see whether it produces a usable image. Note: Don’t use the -j parameter of make here; buildroot will figure that out on its own instead.
After the build has finished, you can find the resulting images in the output/images/ directory. You can start your guest with the results from there to give it a try. For example, if you built with the ppc64 pseries configuration, with the initrd linked into the kernel:
$ qemu-system-ppc64 -M pseries -m 1G -kernel output/images/vmlinux
You should see the kernel booting up, and if you have a look at the serial console, there is also a getty running where you can log in as root and look around.
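To add your own files to the image, create an overlay directory that mirrors the layout of the root filesystem, for example:
$ mkdir -p overlay/etc/init.d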
$ cp my-startup-script.sh overlay/etc/init.d/S99myscript # If you have one
$ cp output/target/etc/inittab overlay/etc/inittab
$ echo 'tty1::respawn:/sbin/getty -L tty1 0 linux' >> overlay/etc/inittab
Then run make menuconfig and set the Root filesystem overlay directories option in the System Configuration section to the overlay folder that you have just created. Run make again, and the next time you start your guest you should see the new files in the image, e.g. a getty also running on the graphical console. Note: Do not try to add or change files directly in the output/target/ folder. That looks tempting at first, but it is just a temporary folder used by the build system, which can be overwritten at any time and will be erased when you run make clean, for example.
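Assuming the image uses buildroot's default BusyBox init, each inittab line has four colon-separated fields, id:runlevels:action:process. BusyBox ignores the runlevels field, which is why the getty line appended earlier has an empty second field. The small C parser below is hypothetical and only makes that field layout explicit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct inittab_entry {
    char id[32];         /* BusyBox uses this as the controlling tty */
    char runlevels[32];  /* ignored by BusyBox init */
    char action[32];     /* e.g. respawn, sysinit, askfirst */
    char process[256];   /* command line, runs to the end of the line */
};

static int copy_field(char *dst, size_t cap, const char *src, size_t len)
{
    if (len >= cap)
        return -1;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return 0;
}

/* Split "id:runlevels:action:process" at the first three colons. */
static int parse_inittab_line(const char *line, struct inittab_entry *e)
{
    const char *c1 = strchr(line, ':');
    const char *c2 = c1 ? strchr(c1 + 1, ':') : NULL;
    const char *c3 = c2 ? strchr(c2 + 1, ':') : NULL;
    if (c3 == NULL)
        return -1;
    if (copy_field(e->id, sizeof e->id, line, (size_t)(c1 - line)) ||
        copy_field(e->runlevels, sizeof e->runlevels, c1 + 1, (size_t)(c2 - c1 - 1)) ||
        copy_field(e->action, sizeof e->action, c2 + 1, (size_t)(c3 - c2 - 1)) ||
        copy_field(e->process, sizeof e->process, c3 + 1, strcspn(c3 + 1, "\n")))
        return -1;
    return 0;
}

/* Actions understood by BusyBox init. */
static int is_busybox_action(const char *action)
{
    static const char *const actions[] = {
        "sysinit", "wait", "once", "respawn", "askfirst",
        "shutdown", "restart", "ctrlaltdel",
    };
    for (size_t i = 0; i < sizeof actions / sizeof actions[0]; i++)
        if (strcmp(action, actions[i]) == 0)
            return 1;
    return 0;
}
```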
If you need to tweak the kernel configuration, you can run make linux-menuconfig and make the appropriate changes there. For example, if you want keyboard input for the ppc64 pseries machine on the graphical console, you should also enable the USB XHCI driver in the kernel. Once you are happy with the kernel configuration, save it, exit the menu and type make linux-rebuild && make. Note: To prevent the kernel config from being reset when you run make clean at a later point in time, you should copy output/build/linux-*/.config to a safe location. Then run make menuconfig, change the Kernel -> Kernel configuration setting to Use a custom config file, and set the Configuration file path to the copied file.
That’s it. You should now be able to package your software in really small VM images. Of course, there are still lots of other settings that you can tweak in the buildroot environment. If you need any of these, just have a look at the good buildroot manual for more information.
The VIRTIO 1.1 specification for paravirtualized I/O devices includes the new packed vring layout and the GPU, input, crypto, and socket device types. In addition to this there are other improvements and new features in the specification. The new vring layout will increase performance and offers new features that devices can take advantage of.
You can review the specification and post comments until February 21st 2019: VIRTIO 1.1 csprd01.
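In the packed layout, driver and device share a single descriptor ring instead of using separate available and used rings. Ownership is signalled through two flag bits that each side compares against its own ring wrap counter. The C sketch below follows our reading of the VIRTIO 1.1 draft, from which the flag bit positions and descriptor layout are taken. Because the specification was still under review, check the final text before relying on these details. Byte order is ignored here, although the real ring is little-endian.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VIRTQ_DESC_F_AVAIL (1u << 7)
#define VIRTQ_DESC_F_USED  (1u << 15)

/* Packed ring descriptor; little-endian on the wire (not handled here). */
struct pvirtq_desc {
    uint64_t addr;
    uint32_t len;
    uint16_t id;
    uint16_t flags;
};

/* Driver: AVAIL matches the driver's wrap counter, USED is its inverse. */
static void desc_mark_avail(struct pvirtq_desc *d, bool wrap)
{
    uint16_t f = d->flags & (uint16_t)~(VIRTQ_DESC_F_AVAIL | VIRTQ_DESC_F_USED);
    f |= wrap ? VIRTQ_DESC_F_AVAIL : VIRTQ_DESC_F_USED;
    d->flags = f;
}

/* Device: both AVAIL and USED match the device's wrap counter. */
static void desc_mark_used(struct pvirtq_desc *d, bool wrap)
{
    uint16_t f = d->flags & (uint16_t)~(VIRTQ_DESC_F_AVAIL | VIRTQ_DESC_F_USED);
    if (wrap)
        f |= VIRTQ_DESC_F_AVAIL | VIRTQ_DESC_F_USED;
    d->flags = f;
}

static bool desc_is_avail(const struct pvirtq_desc *d, bool wrap)
{
    bool avail = (d->flags & VIRTQ_DESC_F_AVAIL) != 0;
    bool used = (d->flags & VIRTQ_DESC_F_USED) != 0;
    return avail == wrap && used != wrap;
}

static bool desc_is_used(const struct pvirtq_desc *d, bool wrap)
{
    bool avail = (d->flags & VIRTQ_DESC_F_AVAIL) != 0;
    bool used = (d->flags & VIRTQ_DESC_F_USED) != 0;
    return avail == wrap && used == wrap;
}

/* Advance a ring position; the wrap counter flips each time it wraps. */
static void ring_advance(uint16_t *idx, bool *wrap, uint16_t size)
{
    if (++*idx == size) {
        *idx = 0;
        *wrap = !*wrap;
    }
}
```

The wrap counter is what lets the same two bits mean "available" on one pass over the ring and "stale" on the next, without ever clearing descriptors.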
QEMU participates in open source internship programs including Google Summer of Code (GSoC) and Outreachy. These full-time remote work opportunities allow talented new developers to get involved in our community. This post highlights what our interns achieved in 2018.
Julia Suvorova (Outreachy) and Steffen Görtz (GSoC) tackled adding emulation support for the micro:bit ARM board. Although QEMU already has plenty of ARM emulation code, the Cortex-M0 CPU used in the micro:bit was not yet implemented and the nRF51 system-on-chip was also missing.
The goal of this project was to run micro:bit programs (usually created with the MicroPython or JavaScript/Blocks IDEs) with a core set of emulated devices, including the serial port, pushbuttons, and LEDs.
QEMU 3.1 already shipped the groundwork for the new qemu-system-arm -M microbit machine type. Enough functionality to run basic micro:bit programs is expected in the next QEMU release.
This project was mentored by Jim Mussared, Joel Stanley, and Stefan Hajnoczi.
Shubham Jain (GSoC) created a REST API for the Patchew continuous integration system that is at the heart of QEMU’s development process. The previous API was not RESTful and exposed database schema internals.
The improvements to the REST API have been included into Patchew and are deployed on patchew.org. They are not in use yet, pending more work on authentication; this may be the topic of a future Summer of Code internship.
This project was mentored by Paolo Bonzini and Fam Zheng.
Emanuele Esposito (GSoC) enhanced QEMU’s test infrastructure with an engine that starts tests with all variants of devices that they are capable of driving.
This is a complicated task in QEMU since certain devices and buses are available in an architecture-specific way on each emulation target, making it hard to write test cases without lots of hardcoded dependencies - and to keep them up-to-date!
The qgraph framework that Emanuele created eliminates the need to hardcode each variant into the test. Emanuele also converted several existing tests. His framework was also presented at KVM Forum 2018 by Laurent Vivier and should be merged in 4.0.
This project was mentored by Paolo Bonzini and Laurent Vivier.
Nathan Gauër (GSoC) improved VirGL, which provides an OpenGL path well supported on Linux guests. On the host, QEMU offers several console back-ends, from EGL to SDL. Adding a Vulkan path requires changing the current VirGL API, writing new guest drivers, and also offering a way to display the output. This is a huge task, which can be split into several sub-projects. Extending the current VirGL API to support Vulkan is the first step.
Code is available here.
This project was mentored by Marc-André Lureau.
QEMU will apply to Google Summer of Code and Outreachy again in 2019. We hope to offer more great open source internship opportunities for new developers.
Ever get included on an email thread that isn't relevant? It can be distracting to see new emails appear on a thread you already know is not interesting. You could mark them as read manually, but that is tedious.
This mute-thread script silences email threads that you don't want to read, even after new emails are received.
Download it here.
It relies on the awesome notmuch(1) email utility, so make sure you have that set up in order to use this script.
The following .muttrc macro integrates this with the mutt(1) email client. When you press M the entire email thread is muted:
macro index M "<enter-command>unset wait_key<enter><pipe-message>~/.mutt/mute-thread add<enter><enter-command>set wait_key<enter><read-thread>" "Mute thread"
After fetching new emails, run notmuch new and then mute-thread apply.
If you change your mind, run mute-thread remove MESSAGE-ID to unmute a thread again. Future emails will not be silenced.
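The post does not say how mute-thread identifies a thread internally. Since the mutt macro pipes the whole message into mute-thread add, and removal takes a MESSAGE-ID, a plausible first step is extracting the Message-ID header, which notmuch can then look up with an id: search term. The helper below is a hypothetical C sketch of only that step and is not code from the script.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive prefix test (header names are case-insensitive). */
static int starts_with_ci(const char *s, const char *prefix)
{
    for (; *prefix != '\0'; s++, prefix++)
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
    return 1;
}

/* Copy the Message-ID (without angle brackets) of a raw message into out.
 * Returns 0 on success, -1 if there is none or out is too small. */
static int extract_message_id(const char *msg, char *out, size_t outlen)
{
    const char *line = msg;

    /* Only scan the header block, which ends at the first empty line. */
    while (*line != '\0' && *line != '\n' && strncmp(line, "\r\n", 2) != 0) {
        const char *eol = strchr(line, '\n');
        const char *next = eol ? eol + 1 : line + strlen(line);

        if (starts_with_ci(line, "Message-ID:")) {
            /* Include folded continuation lines (starting with WSP). */
            const char *end = next;
            while (*end == ' ' || *end == '\t') {
                const char *e = strchr(end, '\n');
                end = e ? e + 1 : end + strlen(end);
            }
            const char *lt = memchr(line, '<', (size_t)(end - line));
            const char *gt = lt ? memchr(lt, '>', (size_t)(end - lt)) : NULL;
            if (gt == NULL || (size_t)(gt - lt - 1) >= outlen)
                return -1;
            memcpy(out, lt + 1, (size_t)(gt - lt - 1));
            out[gt - lt - 1] = '\0';
            return 0;
        }
        line = next;
    }
    return -1;
}
```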
We would like to announce the availability of the QEMU 3.1.0 release. This release contains 1900+ commits from 189 authors.
You can grab the tarball from our download page. The full list of changes is available in the Wiki.
Highlights include:
Thank you to everyone involved!
Starting today, on December 1st, the first door of the QEMU Advent Calendar 2018 can now be opened! The advent calendar reveals a new disk image for download on each of the first 24 days in December 2018, to create a fun experience for the QEMU community, to celebrate the 15th anniversary of QEMU, and to provide some good images for testing the various CPU targets of QEMU – this year it will contain way more images for non-x86 targets than before, so if you are interested in collecting test images for the various CPU targets of QEMU, be sure to check the calendar regularly!
The libvirt library has long provided translations of its end-user-facing strings, which largely means error messages and console output from command line tools / daemons. Since libvirt uses autotools for its build system, it naturally used the standard automake integration provided by gettext for handling .po files. The libvirt.pot file with master strings is exported to Zanata, where the actual translation work is outsourced to the Fedora translation team, who support up to ~100 languages. At the time of writing, libvirt has some level of translation in ~45 languages.
With use of Zanata, libvirt must periodically create an updated libvirt.pot file and push it to Zanata, and then just before release it must pull the latest translated .po files back into GIT for release.
There have been a number of problems with this approach which have been annoying us pretty much since the start, and earlier this year it finally became too much to bear any longer.
One suggested approach to deal with this is to stop storing translations in GIT at all and simply export them from Zanata only at the time of ‘make dist’. The concern with this approach is that the GIT repository no longer contains the full source for the project in a self-contained manner. ‘make dist’ now needs a live network connection to the Zanata servers. If we were to replace Zanata with a new tool in the future (Zanata is already a replacement for the previously used Transifex), we would potentially lose access to translations for old releases.
With this in mind we decided to optimize the way translations are managed in GIT.
The first easy win was to simply remove the master libvirt.pot file from GIT entirely. This file is auto-generated from the source files and is out of date the moment any source file changes, so no one would ever want to use the stored copy.
The second, more complex step was to minimize and canonicalize the per-language translation files. msgmerge is used to take the full .po file, strip out the source file locations and sort the strings alphabetically. A perl script is then used to further process the content, dropping any translations marked as “fuzzy” and any strings for which no translated text is available. The resulting output still uses the normal .po file format, but we call these ‘.mini.po’ files to indicate that they are stripped down compared to what you’d normally expect to see.
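The actual filtering is done by a Perl script that the post does not show. As a rough illustration of the rule it applies, the hypothetical C function below decides whether a single .po entry survives. Entries flagged fuzzy are dropped, and so are entries whose msgstr (including continuation lines and plural forms) is empty. Splitting a file into entries at blank lines is left out to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Does the line (of length len) contain word? */
static int line_contains(const char *line, size_t len, const char *word)
{
    size_t wlen = strlen(word);
    for (size_t i = 0; i + wlen <= len; i++)
        if (memcmp(line + i, word, wlen) == 0)
            return 1;
    return 0;
}

/* Is the first quoted string on the line non-empty, i.e. not ""? */
static int quoted_nonempty(const char *line, size_t len)
{
    const char *q = memchr(line, '"', len);
    return q != NULL && (size_t)(q - line) + 1 < len && q[1] != '"';
}

/* Keep an entry only if it is not fuzzy and has some translated text. */
static int po_entry_keep(const char *entry)
{
    int in_msgstr = 0, translated = 0;
    const char *line = entry;

    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);

        if (len >= 2 && strncmp(line, "#,", 2) == 0) {
            if (line_contains(line, len, "fuzzy"))
                return 0;                        /* fuzzy: drop */
        } else if (strncmp(line, "msgid", 5) == 0 ||
                   strncmp(line, "msgctxt", 7) == 0) {
            in_msgstr = 0;                       /* source text follows */
        } else if (strncmp(line, "msgstr", 6) == 0) {
            in_msgstr = 1;                       /* msgstr or msgstr[N] */
            translated |= quoted_nonempty(line, len);
        } else if (in_msgstr && len > 0 && line[0] == '"') {
            translated |= quoted_nonempty(line, len);  /* continuation */
        }
        line += len;
        if (*line == '\n')
            line++;
    }
    return translated;
}
```

Note that the header entry (empty msgid, metadata in msgstr continuation lines) is kept by this rule, which a usable .po file needs.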
The final step was to remove the gettext / autotools integration and write a custom Makefile.am to handle the key tasks.
/usr/share/locale for use at runtime. This avoids the need to ship the full .po files in release tarballs.
After all this work was completed, the key benefits are:
while having no visible impact on the translators using Zanata. In the event anyone does need to see the full translation files outside of Zanata, there is an extra step to generate the full .po files from the .mini.po files, but this is countered by the fact that the result will be fully up to date with respect to translatable strings and source file locations.
I’d encourage any project which is using gettext autotools integration, while also outsourcing to a system like Zanata, to consider whether they’d benefit from taking similar steps to libvirt. Not all projects will get the same degree of space saving but diffstats with good signal/noise ratios and removing side effects from ‘make dist’ are wins that are likely desirable for any project.
Donations to Software Freedom Conservancy, the charity that acts as the legal home for QEMU and many other popular open source projects that don't run their own foundations or charities, are being matched again this year. That means your donation is doubled thanks to a group of donors who have pledged to match donations.
Software Freedom Conservancy helps projects with the details of running an open source project (legal advice, handling expenses, organizing conferences, etc) as well as taking a leading position on open source licensing and enforcement. Their work is not-for-profit and in the interest of the entire open source community.
If you want more projects like QEMU, Git, Samba, Inkscape, and Selenium to succeed as healthy open source communities, then donating to Software Freedom Conservancy is a good way to help.
Find out about becoming a Supporter here.
QEMU Advent Calendar is running again this year. Each day from December 1st through 24th a surprise QEMU disk image will be released for your entertainment.
Check out the website on December 1st for the first disk image:
https://www.qemu-advent-calendar.org/2018/
Thomas Huth is organizing QEMU Advent Calendar 2018 with the help of others from the QEMU community. If you want to contribute a disk image, take a look at the call for images email.
This year, we are celebrating the 15th anniversary of QEMU (QEMU 0.1 was announced in March 2003), and to contribute to this celebration, we will have another edition of the QEMU Advent Calendar this year. The new website for the advent calendar is now online at www.qemu-advent-calendar.org – but please do not try to open any of the doors before December 1st. We are also still looking for some images which we can present this year. If you would like to help, please have a look at the “QEMU Advent Calendar 2018 - Help wanted” mail that I have sent to the QEMU mailing lists.
Fedora 29 was released last week, so here are the fresh Fedora 29 images for qemu.
As usual, the images don't have a root password. You have to set one using virt-customize -a <image> --root-password "password:<secret>", otherwise you can't log in after boot.
Some images use grub2 as bootloader, some use systemd-boot. The filename indicates which uses which. The x86_64 and i686 images can be booted with both uefi and bios firmware. The arm images come as grub2 variant only. systemd-boot doesn't support 32bit arm and crashes on 64bit arm.
The images can also be booted as a container, using systemd-nspawn --boot --image <file>, but you have to convert them to raw first (for example with qemu-img convert -O raw), as systemd-nspawn can't handle qcow2.
The 32bit arm image (armhfp) isn't there because it doesn't boot for me. It seems the Fedora grub2-efi.armhfp package has trouble booting the kernel in qemu (with edk2 firmware). To be investigated if I find some time. Note: The Fedora 28 image uses a custom grub2-efi.armhfp package, as Fedora didn't ship grub2-efi.armhfp in version 28.
The images have been created with imagefish.
I gave a talk about security in QEMU at KVM Forum 2018. It covers the architecture of QEMU and focuses on the attack surfaces that are exposed to guests. I hope it will be useful to anyone auditing or writing device emulation code. It also describes the key design principles for isolating the QEMU process and limiting the damage that can be done if a guest escapes.
The video of the talk is now available:
The slides are available here (PDF).
Let's start with some history ...
The VGA was introduced by IBM in 1987. It had a bunch of new features, and also included old ones which were already present in the predecessor devices CGA and EGA, including:
The VGA has 256k of video memory, and it is accessed using a memory window at 0xa0000. It is not possible to access all video memory at the same time; you have to set bank registers to map the piece of memory you want to access into the window.
All vga devices emulated by qemu support this.
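As a worked example of banked access, assume a 64 KiB window at 0xa0000 with 64 KiB bank granularity. This is common for SVGA banked modes, but it is an assumption here, since granularity differed between cards. Locating a byte of video memory then means picking the bank and the offset inside the window:

```c
#include <assert.h>
#include <stdint.h>

#define VGA_WINDOW_BASE 0xA0000u  /* legacy memory window from the text */
#define BANK_SIZE       0x10000u  /* assumption: 64 KiB window and granularity */

struct banked {
    uint32_t bank;         /* value to program into the bank register */
    uint32_t window_addr;  /* CPU address inside the 0xa0000 window */
};

static struct banked locate(uint32_t vram_offset)
{
    struct banked b;
    b.bank = vram_offset / BANK_SIZE;
    b.window_addr = VGA_WINDOW_BASE + vram_offset % BANK_SIZE;
    return b;
}

/* Byte offset of pixel (x, y) in a packed 8 bits-per-pixel mode. */
static uint32_t pixel_offset_8bpp(uint32_t x, uint32_t y, uint32_t width)
{
    return y * width + x;
}
```

For pixel (100, 200) in a 640-pixel-wide 8 bpp mode, the byte offset is 128100, which lies in bank 1 at window address 0xAF464.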
In the early 90s, various enhanced VGA cards, typically named "Super VGA" (abbreviated SVGA), became available from various vendors. The cirrus vga emulated by qemu is a typical SVGA card which was quite popular back then. These cards added various new features:
All SVGA devices in qemu (except cirrus) have support for the bochs display interface. That interface was implemented by the bochs emulator first (this is where the name comes from). It was implemented in qemu too. For the qemu standard vga it is the primary interface. qxl-vga, virtio-vga and vmsvga support the bochs dispi interface when they are in vga compatibility mode, which is typically the case at boot, before the guest os loads the native display driver.
The bochs display interface is a paravirtual interface, with just the bare essentials to set video modes on a virtual display device. There are no registers for clock rate and other timing stuff for example.
Traditionally, the bochs display interface uses I/O ports 0x1ce (bochs register index) and 0x1cf (bochs register data). As both registers are 16 bit, the data register is unaligned, which does not work on non-x86 archs, so 0x1d0 is supported as a data port too.
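The index/data protocol described above can be modelled in a few lines of C. This is a toy model of the device side plus the guest-side access pattern, with register indices taken from the bochs VBE interface as we understand it. Real guest code would use outw()/inw() on the ports instead of calling a model, and the model's names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define VBE_DISPI_IOPORT_INDEX    0x01CE
#define VBE_DISPI_IOPORT_DATA     0x01CF  /* unaligned data port (x86) */
#define VBE_DISPI_IOPORT_DATA_ALT 0x01D0  /* aligned alternative data port */

#define VBE_DISPI_INDEX_XRES 0x1
#define VBE_DISPI_INDEX_YRES 0x2
#define VBE_DISPI_INDEX_BPP  0x3
#define DISPI_NREGS          0xa  /* registers 0x0..0x9 in this sketch */

struct dispi_model {
    uint16_t index;              /* latched by writes to the index port */
    uint16_t regs[DISPI_NREGS];  /* register file selected by index */
};

static void model_outw(struct dispi_model *m, uint16_t port, uint16_t val)
{
    if (port == VBE_DISPI_IOPORT_INDEX)
        m->index = val;
    else if ((port == VBE_DISPI_IOPORT_DATA || port == VBE_DISPI_IOPORT_DATA_ALT) &&
             m->index < DISPI_NREGS)
        m->regs[m->index] = val;
}

static uint16_t model_inw(const struct dispi_model *m, uint16_t port)
{
    if (port == VBE_DISPI_IOPORT_INDEX)
        return m->index;
    if ((port == VBE_DISPI_IOPORT_DATA || port == VBE_DISPI_IOPORT_DATA_ALT) &&
        m->index < DISPI_NREGS)
        return m->regs[m->index];
    return 0xffff;  /* anything else reads as all ones in this model */
}

/* Guest side: select the register first, then access the data port. */
static void dispi_write(struct dispi_model *m, uint16_t data_port,
                        uint16_t reg, uint16_t val)
{
    model_outw(m, VBE_DISPI_IOPORT_INDEX, reg);
    model_outw(m, data_port, val);
}

static uint16_t dispi_read(struct dispi_model *m, uint16_t data_port,
                           uint16_t reg)
{
    model_outw(m, VBE_DISPI_IOPORT_INDEX, reg);
    return model_inw(m, data_port);
}
```

Both data ports reach the same register, so a non-x86 guest can simply use the aligned 0x1d0 port.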
Let's have a look at what modern guests are doing in the graphics field:
So, we have a lot of rather complex code to emulate features not used at all by modern guests. There have been security bugs in the past in that complex but largely unused code ...
Turns out: yes, we can. The first step already happened in qemu 1.3: the qemu stdvga got an MMIO bar. The MMIO bar can be used as an alternative way to access the vga registers and also the bochs dispi interface registers.
OVMF (UEFI implementation for qemu) uses the MMIO bar. The bochs-drm.ko linux kms driver uses the MMIO bar too. In fact, both use the bochs display interface registers only, except for setting the unblank bit so the screen will not stay black.
So, the guest code already ignores the vga emulation. Cool. We can build on that.
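For comparison, here is a sketch of a mode set through the MMIO bar instead of I/O ports. It assumes the layout used by QEMU's PCI VGA devices as far as we know it: VGA ioports 0x3c0..0x3df mirrored at offset 0x400, and bochs dispi registers at 0x500, with each 16-bit register at index * 2. The sequence loosely follows what bochs-drm does, including writing 0x20 to the attribute controller port 0x3c0 to unblank. A plain byte array stands in for the mapped bar.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_VGA_IOPORT_OFFSET 0x400  /* VGA ioports 0x3c0..0x3df */
#define PCI_VGA_BOCHS_OFFSET  0x500  /* bochs dispi registers, 16 bit each */
#define PCI_VGA_MMIO_SIZE     0x1000

#define VBE_DISPI_INDEX_XRES        0x1
#define VBE_DISPI_INDEX_YRES        0x2
#define VBE_DISPI_INDEX_BPP         0x3
#define VBE_DISPI_INDEX_ENABLE      0x4
#define VBE_DISPI_INDEX_BANK        0x5
#define VBE_DISPI_INDEX_VIRT_WIDTH  0x6
#define VBE_DISPI_INDEX_VIRT_HEIGHT 0x7
#define VBE_DISPI_INDEX_X_OFFSET    0x8
#define VBE_DISPI_INDEX_Y_OFFSET    0x9
#define VBE_DISPI_ENABLED           0x01
#define VBE_DISPI_LFB_ENABLED       0x40

/* Stand-in for the mapped MMIO bar; a driver would use MMIO accessors. */
static uint8_t bar[PCI_VGA_MMIO_SIZE];

/* 16-bit little-endian access at a bar offset. */
static void mmio_write16(uint32_t off, uint16_t val)
{
    bar[off] = (uint8_t)(val & 0xff);
    bar[off + 1] = (uint8_t)(val >> 8);
}

static uint16_t mmio_read16(uint32_t off)
{
    return (uint16_t)(bar[off] | (bar[off + 1] << 8));
}

static void dispi_write(uint16_t index, uint16_t val)
{
    mmio_write16(PCI_VGA_BOCHS_OFFSET + index * 2u, val);
}

static void vga_writeb(uint16_t port, uint8_t val)
{
    bar[PCI_VGA_IOPORT_OFFSET + (port - 0x3c0)] = val;
}

static void set_mode(uint16_t xres, uint16_t yres, uint16_t bpp)
{
    vga_writeb(0x3c0, 0x20);                 /* unblank via attribute controller */
    dispi_write(VBE_DISPI_INDEX_ENABLE, 0);  /* disable while reprogramming */
    dispi_write(VBE_DISPI_INDEX_BPP, bpp);
    dispi_write(VBE_DISPI_INDEX_XRES, xres);
    dispi_write(VBE_DISPI_INDEX_YRES, yres);
    dispi_write(VBE_DISPI_INDEX_BANK, 0);
    dispi_write(VBE_DISPI_INDEX_VIRT_WIDTH, xres);
    dispi_write(VBE_DISPI_INDEX_VIRT_HEIGHT, yres);
    dispi_write(VBE_DISPI_INDEX_X_OFFSET, 0);
    dispi_write(VBE_DISPI_INDEX_Y_OFFSET, 0);
    dispi_write(VBE_DISPI_INDEX_ENABLE,
                VBE_DISPI_ENABLED | VBE_DISPI_LFB_ENABLED);
}
```

Apart from the single unblank write, nothing here touches the classic VGA registers. That is the point the text makes about guests already ignoring the vga emulation.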
bochs-display: a new display device, merged in qemu 3.0. Features:
So, all set for UEFI guests. You can switch from stdvga to bochs-display, and everything continues to work fine.
Accessing the vga hardware directly for text mode is rare these days. Typically seabios and linux boot loaders call vgabios functions to render text on the display. So, we can hook in there and support text rendering without the hardware actually having text mode support. A very similar approach is taken by sgabios to redirect vga text output to the serial line.
Luckily we are not the first ones facing that problem. coreboot can initialize the graphics hardware and setup a framebuffer with the native display resolution. Having to switch back to text mode when running seabios as coreboot payload is not exactly nice. So, there is a vgabios variant for coreboot which renders text to a framebuffer.
So, take that, tweak the initialization code to program the bochs dispi interface instead of looking for a framebuffer setup by coreboot, and we are ready to go. Seabios boot messages show up on the bochs-display framebuffer. Yay!
This will work out of the box in qemu 3.1. The vgabios is already present in qemu 3.0, but due to a bug it is not installed by default; it must be copied over manually to get things going.
There are some drawbacks, which may or may not be a problem depending on your use case:
That's it. Enjoy the new legacy-free display device.
$ cat /etc/udev/rules.d/99-cole-nic-names.rules
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="70:8b:cd:80:e5:5f", ATTR{type}=="1", NAME="lan0main"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="68:05:ca:1a:f5:da", ATTR{type}=="1", NAME="lan1pcie"
KVM virtualization is now supported on IBM z Systems. However, this feature is only available in the newly introduced user space based on kernel version 4.14, provided by the kernel-alt packages. See here for further details.
virt-builder is a tool for rapidly creating customized Linux images. Recently I’ve added support for Windows although for rather obvious licensing reasons we cannot distribute the Windows templates which would be needed to provide Windows support for everyone. However you can build your own Windows templates as described here and then:
$ virt-builder -l | grep windows
windows-10.0-server x86_64 Windows Server 2016 (x86_64)
windows-6.2-server x86_64 Windows Server 2012 (x86_64)
windows-6.3-server x86_64 Windows Server 2012 R2 (x86_64)
$ virt-builder windows-6.3-server
[ 0.6] Downloading: http://xx/builder/windows-6.3-server.xz
[ 5.1] Planning how to build this image
[ 5.1] Uncompressing
[ 60.1] Opening the new disk
[ 77.6] Setting a random seed
virt-builder: warning: random seed could not be set for this type of guest
virt-builder: warning: passwords could not be set for this type of guest
[ 77.6] Finishing off
Output file: windows-6.3-server.img
Output size: 10.0G
Output format: raw
Total usable space: 9.7G
Free space: 3.5G (36%)
To build a Windows template repository you will need the latest libguestfs sources checked out from https://github.com/libguestfs/libguestfs and you will also need a suitable Windows Volume License, KMS or MSDN developer subscription. Also the final Windows templates are at least ten times larger than Linux templates, so virt-builder operations take correspondingly longer and use lots more disk space.
First download install ISOs for the Windows guests you want to use.
After cloning the latest libguestfs sources, go into the builder/templates subdirectory. Edit the top of the make-template.ml script to set the path which contains the Windows ISOs. You will also possibly need to edit the names of the ISOs later in the script.
Build a template, eg:
$ ../../run ./make-template.ml windows 2k12 x86_64
You’ll need to read the script to understand what the arguments do. The script will ask you for the product key, where you should enter the volume license key or your MSDN key.
Each time you run the script successfully you’ll end up with two files called something like:
windows-6.2-server.xz
windows-6.2-server.index-fragment
The version numbers are Windows internal version numbers.
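For the server templates listed earlier, the virt-builder output shows how the internal versions map to product names: 6.2 is Windows Server 2012, 6.3 is Windows Server 2012 R2 and 10.0 is Windows Server 2016. A hypothetical C lookup helper encoding just those three pairs:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Only these pairs come from the virt-builder listing above. */
static const struct {
    const char *version;
    const char *product;
} win_server_versions[] = {
    { "6.2",  "Windows Server 2012" },
    { "6.3",  "Windows Server 2012 R2" },
    { "10.0", "Windows Server 2016" },
};

/* Map a template name like "windows-6.3-server" to its product name,
 * or return NULL if the name does not match a known server template. */
static const char *template_product(const char *name)
{
    static const char prefix[] = "windows-";
    static const char suffix[] = "-server";
    size_t n = strlen(name);
    size_t plen = sizeof prefix - 1, slen = sizeof suffix - 1;

    if (n <= plen + slen || strncmp(name, prefix, plen) != 0 ||
        strcmp(name + n - slen, suffix) != 0)
        return NULL;

    size_t vlen = n - plen - slen;  /* length of the version part */
    for (size_t i = 0; i < sizeof win_server_versions / sizeof win_server_versions[0]; i++) {
        const char *v = win_server_versions[i].version;
        if (strlen(v) == vlen && strncmp(name + plen, v, vlen) == 0)
            return win_server_versions[i].product;
    }
    return NULL;
}
```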
After you’ve created templates for all the Windows guest types you need, copy them to any (private) web server, and concatenate all the index fragments into the final index file:
$ cat *.index-fragment > index
Finally create a virt-builder repo file pointing to this index file:
# cat /etc/virt-builder/repos.d/windows.conf
[windows]
uri=http://xx/builder/index
You can now create Windows guests in virt-builder. However note they are not sysprepped. We can’t do this because it requires some Windows tooling. So while these guests are good for small tests and similar, they’re not suitable for creating actual Windows long-lived VMs. To do that you will need to add a sysprep.exe step somewhere in the template creation process.
Note that this is a blog post for (new) QEMU developers. If you are just interested in using QEMU, you can certainly skip this text. Otherwise, in case you have ever been in touch with the QEMU device model (“qdev”), you are likely aware of the basic qdev code boilerplate already:
#include "qemu/osdep.h"
#include "hw/sysbus.h"

#define TYPE_MYDEV "mydev"

typedef struct MyDevState {
    SysBusDevice parent_obj;
    uint32_t field;         /* backing storage for the "myprop" property */
} MyDevState;

static void mydev_realize(DeviceState *dev, Error **errp)
{
    /* callback function that is run during device "realization" */
}

static void mydev_instance_init(Object *obj)
{
    /* callback function that is run during device instance init */
}

static Property mydev_properties[] = {
    DEFINE_PROP_xxx("myprop", MyDevState, field, ...),
    /* ... */
    DEFINE_PROP_END_OF_LIST(),
};

static void mydev_class_init(ObjectClass *oc, void *data)
{
    DeviceClass *dc = DEVICE_CLASS(oc);

    dc->realize = mydev_realize;
    dc->desc = "My cool device";
    dc->props = mydev_properties;
    /* ... and other device class setup code ... */
}

static const TypeInfo mydev_info = {
    .name = TYPE_MYDEV,
    .parent = TYPE_SYS_BUS_DEVICE, /* or something else */
    .instance_size = sizeof(MyDevState),
    .instance_init = mydev_instance_init,
    .class_init = mydev_class_init,
};

static void mydev_register_types(void)
{
    type_register_static(&mydev_info);
}

type_init(mydev_register_types)
There are three different initialization functions involved here: class_init, instance_init and realize. The class_init function is easy to tell apart from the other two, since it initializes the class data rather than the data of an individual instance (similar to the object model with classes and instances in C++). But I initially always wondered about the difference between the instance_init() and the realize() functions. Having fixed quite a lot of related bugs in the QEMU code base over the past months, I now know that many other people are not properly aware of the difference either, so I think it is time to write down what I have learned, to make sure that I don’t forget it again, and maybe to help others avoid related bugs in the future ;-)
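To make the class_init/instance_init split concrete, here is a minimal stand-alone sketch in plain C. It does not use QEMU at all: ToyClass, ToyDev, toy_class_init(), toy_instance_init() and toy_object_new() are hypothetical names for a toy model that only loosely mirrors the QOM idea that class setup runs once per type while instance setup runs once per object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical toy type system, NOT QEMU's QOM API. */

typedef struct ToyClass {
    const char *desc;           /* class data: shared by every instance */
} ToyClass;

typedef struct ToyDev {
    const ToyClass *klass;      /* points at the shared class data */
    int id;                     /* instance data: one copy per object */
} ToyDev;

static ToyClass toy_class;
static bool toy_class_ready;
static int class_init_calls;
static int instance_init_calls;

/* Runs once per type and fills in the shared class data. */
static void toy_class_init(ToyClass *kc)
{
    kc->desc = "My cool device";
    class_init_calls++;
}

/* Runs once per object and sets up that object's own fields. */
static void toy_instance_init(ToyDev *dev)
{
    dev->id = instance_init_calls++;
}

/* Initializes the class the first time the type is used, then the instance. */
static ToyDev *toy_object_new(void)
{
    if (!toy_class_ready) {
        toy_class_init(&toy_class);
        toy_class_ready = true;
    }
    ToyDev *dev = calloc(1, sizeof(*dev));
    assert(dev != NULL);        /* toy stand-in for an allocator that aborts on OOM */
    dev->klass = &toy_class;
    toy_instance_init(dev);
    return dev;
}
```

Creating two objects with toy_object_new() runs toy_class_init() once and toy_instance_init() twice, and both objects point at the same ToyClass.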
First, it is of course always a good idea to have a look at the documentation. While the documentation of TypeInfo (where instance_init() is defined) is not very helpful for understanding the differences, the documentation of DeviceClass (where realize() is defined) has some more useful information: you can learn there that object instantiation is done first, before the device is realized, i.e. the instance_init() function is called first, and the realize() function is called afterwards. The former must not fail, while the latter can return an error to its caller via a pointer to an “Error” object pointer (the Error **errp parameter).
So the basic idea here is that device objects are first instantiated, then these objects can be inspected for their interfaces, and their creators can set up their properties to configure their settings and wire them up with other devices, before the device finally becomes “active” by being realized.
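The ordering described above (instantiate, configure, then realize) can be sketched with a toy device that records each lifecycle step. All names here (ToyDev, toy_set_irq_line(), the irq_line "property") are hypothetical, and the bool return of toy_realize() is a simplification: QEMU's realize hook reports failure through its Error **errp parameter instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical toy model of the qdev lifecycle, NOT QEMU's real API. */

typedef struct ToyDev {
    int irq_line;               /* a "property" configured by the creator */
    bool realized;
    char trace[64];             /* records the order of lifecycle steps */
} ToyDev;

/* Appends a step name to the trace, comma-separated, without overflowing. */
static void trace_step(ToyDev *dev, const char *step)
{
    size_t room = sizeof(dev->trace) - strlen(dev->trace) - 1;
    if (dev->trace[0] != '\0' && room > 0) {
        strncat(dev->trace, ",", room);
        room--;
    }
    strncat(dev->trace, step, room);
}

/* Step 1: instantiation. Only sets defaults and cannot fail. */
static void toy_instance_init(ToyDev *dev)
{
    memset(dev, 0, sizeof(*dev));
    dev->irq_line = -1;         /* -1 means "not configured yet" */
    trace_step(dev, "instance_init");
}

/* Step 2: the creator configures the still-inactive object.
 * Properties are meant to be set before realize; the toy enforces that. */
static void toy_set_irq_line(ToyDev *dev, int line)
{
    assert(!dev->realized);
    dev->irq_line = line;
    trace_step(dev, "set_prop");
}

/* Step 3: realization activates the device and is allowed to fail. */
static bool toy_realize(ToyDev *dev)
{
    trace_step(dev, "realize");
    if (dev->irq_line < 0) {
        return false;           /* refuse to activate an unconfigured device */
    }
    dev->realized = true;
    return true;
}
```

A device that skips the configuration step still instantiates fine; only its realization fails.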
It is important to notice here that devices can be instantiated (and also finalized) without being realized! This happens, for example, when the device is introspected: if you enter device_add xyz,help in the HMP monitor, or if you send the device-list-properties QMP command to QEMU to retrieve the device’s properties, QEMU creates a temporary instance of the device to query the properties of the object, without realizing it. The object gets destroyed (“finalized”) immediately afterwards.
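A toy sketch of this introspection path: a temporary instance is created, queried and finalized, and realize is never invoked. The counters and the toy_introspect_myprop() helper are hypothetical stand-ins for what QEMU does internally on device_add xyz,help or device-list-properties; none of this is QEMU code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy model, NOT QEMU's real API. The counters show which
 * lifecycle hooks an introspection request actually triggers. */

typedef struct ToyDev {
    int myprop;
} ToyDev;

static int n_instance_init, n_realize, n_finalize;

static void toy_instance_init(ToyDev *dev)
{
    dev->myprop = 42;           /* property default, visible to introspection */
    n_instance_init++;
}

static void toy_realize(ToyDev *dev)
{
    (void)dev;
    n_realize++;
}

static void toy_finalize(ToyDev *dev)
{
    (void)dev;
    n_finalize++;
}

/* Mimics introspection such as "device_add xyz,help": create a temporary
 * instance, read a property, destroy it again. Realize is never called. */
static int toy_introspect_myprop(void)
{
    ToyDev *tmp = malloc(sizeof(*tmp));
    assert(tmp != NULL);
    toy_instance_init(tmp);
    int value = tmp->myprop;
    toy_finalize(tmp);
    free(tmp);
    return value;
}

/* Normal creation path for comparison: instantiate, then realize. */
static ToyDev *toy_create_realized(void)
{
    ToyDev *dev = malloc(sizeof(*dev));
    assert(dev != NULL);
    toy_instance_init(dev);
    toy_realize(dev);
    return dev;
}
```

After one introspection call, toy_instance_init() and toy_finalize() have each run once while toy_realize() has not run at all.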
Knowing this, you can avoid a class of bugs that have been found in a number of devices in the past:
If you want your device to provide properties for other parts of the QEMU code or for the users, and you want to add those properties via one of the many object_property_add*() functions of QEMU (instead of using the “props” field of the DeviceClass), then you should do this in the instance_init() and not in the realize() function. Otherwise the properties won’t show up when the user runs --device xyz,help or the device-list-properties QMP command to get some information about your device.
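The following toy model shows why the registration point matters: introspection only runs instance_init, so only properties registered there are visible. The names (toy_property_add(), "myprop", "lateprop") are hypothetical and only loosely inspired by QEMU's object_property_add*() family; this is not QEMU's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical toy model of per-instance dynamic properties,
 * NOT QEMU's real API. */

#define TOY_MAX_PROPS 8

typedef struct ToyDev {
    const char *prop_names[TOY_MAX_PROPS];
    int n_props;
} ToyDev;

static void toy_property_add(ToyDev *dev, const char *name)
{
    /* Exceeding the toy capacity is a coding bug, not a runtime condition. */
    assert(dev->n_props < TOY_MAX_PROPS);
    dev->prop_names[dev->n_props++] = name;
}

static bool toy_has_property(const ToyDev *dev, const char *name)
{
    for (int i = 0; i < dev->n_props; i++) {
        if (strcmp(dev->prop_names[i], name) == 0) {
            return true;
        }
    }
    return false;
}

/* Right place: introspection runs instance_init, so it sees "myprop". */
static void toy_instance_init(ToyDev *dev)
{
    memset(dev, 0, sizeof(*dev));
    toy_property_add(dev, "myprop");
}

/* Wrong place: introspection never realizes, so "lateprop" stays hidden. */
static void toy_realize(ToyDev *dev)
{
    toy_property_add(dev, "lateprop");
}

/* Introspection: a temporary, never-realized instance answers the query. */
static bool toy_introspect_has(const char *name)
{
    ToyDev tmp;
    toy_instance_init(&tmp);
    return toy_has_property(&tmp, name);
}
```

"lateprop" does exist on a realized instance, which is exactly why this kind of bug is easy to miss when only testing a running guest.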
instance_init() functions must really never fail, i.e. they must also not call abort() or exit(). Otherwise QEMU can terminate unexpectedly when a user simply wanted to have a look at the list of device properties with device_add xyz,help or the device-list-properties QMP command.
If your device cannot work in certain circumstances, check for the error condition in the realize() function instead and return an appropriate error there.
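A minimal sketch of deferring an environment check from instance_init to realize, using a hypothetical ToyError type that imitates QEMU's Error **errp convention (a NULL errp means the caller ignores the error). host_has_feature is an invented stand-in for whatever condition a real device might depend on, and the bool return value is a toy convenience, since QEMU's realize hook itself returns void.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical toy Error type and "Error **errp" convention, loosely
 * modelled on QEMU's error reporting; NOT QEMU's real API. */
typedef struct ToyError {
    char msg[128];
} ToyError;

static void toy_error_set(ToyError **errp, const char *msg)
{
    if (errp == NULL) {
        return;                 /* caller chose to ignore the error */
    }
    *errp = calloc(1, sizeof(**errp));
    assert(*errp != NULL);      /* toy stand-in for an allocator that aborts on OOM */
    snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
}

/* Hypothetical host capability the device depends on. */
static bool host_has_feature = false;

typedef struct ToyDev {
    bool realized;
} ToyDev;

/* Never fails, never calls abort()/exit(): it must work on any host,
 * because introspection calls it even where the device is unusable. */
static void toy_instance_init(ToyDev *dev)
{
    dev->realized = false;
}

/* The environment check lives here and is reported through errp. */
static bool toy_realize(ToyDev *dev, ToyError **errp)
{
    if (!host_has_feature) {
        toy_error_set(errp, "toy device needs a host feature that is missing");
        return false;
    }
    dev->realized = true;
    return true;
}
```

On a host without the feature, instantiation (and therefore introspection) still works; only realization reports an error that the caller must free.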
Never assume that your device is always instantiated only with the machine that it was designed for. It’s of course a good idea to set the “user_creatable = false” flag in the DeviceClass of your device if your device cannot be plugged into arbitrary machines. But device introspection can still happen at any time, with any machine.
So if you wrote a device called “mydev-a” that only works with --machine A, the user can still start QEMU with the option --machine B instead and then run device_add mydev-a,help or the device-list-properties QMP command.
The instance_init() function of your device will be called to create a temporary instance of your device, even though the base machine is B and not A here. So you should especially take care not to depend on the availability of certain buses or other devices in the instance_init() function, nor to use things like serial_hd() or nd_table[] in your instance_init() function, since these might (and should) have been used by the machine init function already. If your device needs to be wired up, provide properties as interfaces to the outside and let the creator of your device (e.g. the machine init code) wire your device between the device instantiation and the realize phase instead.
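A toy sketch of this wiring pattern: instance_init leaves the backend link empty, and the creator (here a hypothetical machine-A init function) sets it between instantiation and realize. ToyChardev, toy_set_chardev() and toy_machine_a_init() are invented for illustration and only stand in for the property-based wiring described above; they are not QEMU functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, NOT QEMU's real API: a backend owned by the
 * machine, and a device that must be wired to it before realize. */
typedef struct ToyChardev {
    const char *label;
} ToyChardev;

typedef struct ToyDev {
    ToyChardev *chr;            /* link-style property, NULL until wired */
    bool realized;
} ToyDev;

/* Does not grab a backend itself (compare serial_hd() in the text): on
 * another machine there may be none, or the machine may already use it. */
static void toy_instance_init(ToyDev *dev)
{
    dev->chr = NULL;
    dev->realized = false;
}

/* Property-style setter the creator uses to wire the device up. */
static void toy_set_chardev(ToyDev *dev, ToyChardev *chr)
{
    assert(!dev->realized);     /* wiring happens before realize */
    dev->chr = chr;
}

static bool toy_realize(ToyDev *dev)
{
    if (dev->chr == NULL) {
        return false;           /* not wired up: refuse to activate */
    }
    dev->realized = true;
    return true;
}

/* Hypothetical init code for machine A, which owns the backend and wires
 * the device between instantiation and realize. */
static bool toy_machine_a_init(ToyDev *dev, ToyChardev *serial0)
{
    toy_instance_init(dev);
    toy_set_chardev(dev, serial0);
    return toy_realize(dev);
}
```

A probe instance created on some other machine simply stays unwired, which is harmless for introspection, and any attempt to realize it fails instead of taking a resource that belongs to the machine.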
Make sure that your device leaves a clean state after a temporary instance is destroyed again, i.e. don’t assume that there will be only one instance of your device, created right after QEMU has been started and destroyed at the very end before QEMU terminates. In particular, do not assume that whatever you set up in your instance_init() needs no explicit clean-up because the device instance will only be destroyed when QEMU terminates. Device instances can be created and destroyed at any time, so when the device is finalized, you must not leave any dangling pointers or references to your device behind, e.g. in the QOM tree. When you create other objects in your instance_init() function, make sure to set proper parents for these objects or use an instance_finalize() function, so that the created objects get cleaned up correctly when your device is destroyed.
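A toy sketch of the clean-up rule, taking the instance_finalize() route: everything the toy instance_init creates or publishes is undone again at finalization, so creating and destroying temporary instances repeatedly leaves no leaks and no dangling reference. live_objects and toy_tree_slot are hypothetical stand-ins for allocations and for a reference held in the QOM tree; none of the names are QEMU API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy model, NOT QEMU's real API. */

static int live_objects;            /* allocated but not yet freed */
static void *toy_tree_slot;         /* stand-in for a reference in a global tree */

typedef struct ToyChild {
    int dummy;
} ToyChild;

typedef struct ToyDev {
    ToyChild *child;                /* created in instance_init, owned by dev */
} ToyDev;

static void *toy_alloc(size_t size)
{
    void *p = calloc(1, size);
    assert(p != NULL);              /* toy stand-in for an allocator that aborts on OOM */
    live_objects++;
    return p;
}

static void toy_free(void *p)
{
    if (p != NULL) {
        free(p);
        live_objects--;
    }
}

static void toy_instance_init(ToyDev *dev)
{
    dev->child = toy_alloc(sizeof(*dev->child));
    toy_tree_slot = dev;            /* publish a reference to this instance */
}

/* Counterpart of instance_init: release the child and withdraw the
 * published reference so nothing dangles once the instance is gone. */
static void toy_instance_finalize(ToyDev *dev)
{
    toy_free(dev->child);
    dev->child = NULL;
    if (toy_tree_slot == dev) {
        toy_tree_slot = NULL;
    }
}

static ToyDev *toy_object_new(void)
{
    ToyDev *dev = toy_alloc(sizeof(*dev));
    toy_instance_init(dev);
    return dev;
}

static void toy_object_unref(ToyDev *dev)
{
    toy_instance_finalize(dev);
    toy_free(dev);
}
```

Without toy_instance_finalize(), each temporary instance would leak its child and leave toy_tree_slot pointing at freed memory.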
All in all, if you write code for a new QEMU device, it is likely a good idea to use the instance_init() function only for things like creating properties and other setup that is required before device realization, and then do the main work in the realize() function instead.