Blogging about open source virtualization

News from QEMU, KVM, libvirt, libguestfs, virt-manager and related tools


Planet Feeds

August 19, 2019

KVM on Z

QEMU v4.1 released

QEMU v4.1 is out. For highlights from a KVM on Z perspective see the Release Notes.
Note: The DASD IPL feature is still considered experimental.

by Stefan Raspl at August 19, 2019 11:32 AM

August 16, 2019

QEMU project

QEMU version 4.1.0 released

We would like to announce the availability of the QEMU 4.1.0 release. This release contains 2000+ commits from 176 authors.

You can grab the tarball from our download page. The full list of changes are available in the Wiki.

Highlights include:

  • ARM: FPU emulation support for Cortex-M CPUs, FPU fixes for Cortex-R5F
  • ARM: ARMv8.5-RNG extension support for CPU-generated random numbers
  • ARM: board build options now configurable via new Kconfig-based system
  • ARM: Exynos4210 SoC model now supports PL330 DMA controllers
  • MIPS: improved emulation performance of numerous MSA instructions, mostly integer and data permuting operations
  • MIPS: improved support for MSA ASE instructions on big-endian hosts, handling for ‘division by zero’ cases now matches reference hardware
  • PowerPC: pseries: support for NVIDIA V100 GPU/NVLink2 passthrough via VFIO
  • PowerPC: pseries: in-kernel acceleration for XIVE interrupt controller
  • PowerPC: pseries: support for hot-plugging PCI host bridges
  • PowerPC: emulation optimizations for vector (Altivec/VSX) instructions
  • RISC-V: support for new “spike” machine model
  • RISC-V: ISA 1.11.0 support for privileged architectures
  • RISC-V: improvements for 32-bit syscall ABI, illegal instruction handling, and built-in debugger
  • RISC-V: support for CPU topology in device trees
  • s390: bios support for booting from ECKD DASD assigned to guest via vfio-ccw
  • s390: emulation support for all “Vector Facility” instructions
  • s390: additional facilities and support for gen15 machines, including support for AP Queue Interruption Facility for using interrupts for vfio-ap devices
  • SPARC: sun4m/sun4u: fixes when running with -vga none (OpenBIOS)
  • x86: emulation support for new Hygon Dhyana and Intel SnowRidge CPU models
  • x86: emulation support for RDRAND extension
  • x86: md-clear/mds-no feature flags, for detection/mitigation of MDS vulnerabilities (CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091)
  • x86: CPU die topology now configurable using -smp …,dies=
  • Xtensa: support for memory protection unit (MPU) option
  • Xtensa: support for Exclusive Access option
  • GUI: virtio-gpu 2D/3D rendering may now be offloaded to an external vhost-user process, such as QEMU vhost-user-gpu
  • GUI: semihosting output can now be redirected to a chardev backend
  • qemu-img: added a --salvage option to qemu-img convert, which prevents the conversion process from aborting on I/O errors (can be used for example to salvage partially corrupted qcow2 files)
  • qemu-img: qemu-img rebase now works even when the input file doesn’t have a backing file yet
  • VMDK block driver now has read-only support for the seSparse subformat
  • GPIO: support for SiFive GPIO controller
  • and lots more…

Thank you to everyone involved!

August 16, 2019 05:50 AM

August 09, 2019

KVM on Z

New Documentation: Configuring Crypto Express Adapters for KVM Guests

See here for a new documentation release on how to configure Crypto Express adapters for KVM guests.

by Stefan Raspl at August 09, 2019 02:49 PM

August 07, 2019

Daniel Berrange

ANNOUNCE: gtk-vnc 1.0.0 release

I’m pleased to announce a new release of GTK-VNC, version 1.0.0.
sha256sum: a81a1f1a79ad4618027628ffac27d3391524c063d9411c7a36a5ec3380e6c080

Pay particular attention to the first two major changes in
this release:

  • Autotools build system replaced with meson
  • Support for GTK-2 is dropped. GTK-3 is mandatory
  • Support for libview is dropped in example program
  • Improvements to example demos
  • Use MAP_ANON if MAP_ANONYMOUS doesn’t exist to help certain macOS versions
  • Fix crash when connection attempt fails early
  • Initialize gcrypt early in auth process
  • Emit vnc-auth-failure signal when SASL auth fails
  • Emit vnc-error signal when authentication fails
  • Fix double free when failing to read certificates
  • Run unit tests in RPM build
  • Modernize RPM spec
  • Fix race condition in unit tests
  • Fix install of missing header for cairo framebuffer
  • Fix typo in gir package name
  • Add missing VncPixelFormat file to gir data

Thanks to all those who reported bugs and provided patches that went into this new release.

by Daniel Berrange at August 07, 2019 03:06 PM

August 05, 2019

Stefan Hajnoczi

Determining why a Linux syscall failed

One is often left wondering what caused an errno value when a system call fails. Figuring out the reason can be tricky because a single errno value can have multiple causes. Applications get an errno integer and no additional information about what went wrong in the kernel.

There are several ways to determine the reason for a system call failure (from easiest to most involved):

  1. Check the system call's man page for the meaning of the errno value. Sometimes this is enough to explain the failure.
  2. Check the kernel log using dmesg(1). If something went seriously wrong (like a hardware error) then there may be detailed error information. It may help to increase the kernel log level.
  3. Read the kernel source code to understand various error code paths and identify the most relevant one.
  4. Use the function graph tracer to see which code path was taken.
  5. Add printk() calls, recompile the kernel (module), and rerun to see the output.

Reading the man page and checking dmesg(1) are fairly easy for application developers and do not require knowledge of kernel internals. If this does not produce an answer then it is necessary to look closely at the kernel source code to understand a system call's error code paths.

This post discusses the function graph tracer and how it can be used to identify system call failures without recompiling the kernel. This is useful because running a custom kernel may not be possible (e.g. due to security or reliability concerns) and recompiling the kernel is slow.
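To make the starting point concrete: all an application receives is the bare errno integer, with no context about which kernel code path produced it. A minimal Python sketch (the path is made up for illustration):

```python
import errno
import os

# Attempt a syscall that will fail; the kernel reports failure
# through a single errno integer and nothing else.
try:
    os.open("/nonexistent/path/for/demo", os.O_RDONLY)
except OSError as e:
    # e.errno is the raw integer the kernel returned (2 for ENOENT)
    print(e.errno == errno.ENOENT)
    print(errno.errorcode[e.errno])
```

The symbolic name is recoverable from the integer, but the *reason* is not; that is what the techniques below are for.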

An example

In order to explore some debugging techniques let's take the io_uring_setup(2) system call as an example. It is failing with ENOMEM but the system is not under memory pressure, so ENOMEM is not expected.

The io_uring_setup(2) source code (fs/io_uring.c) contains many ENOMEM locations but it is not possible to conclusively identify which one is failing. The next step is to determine which code path is taken using dynamic instrumentation.

The function graph tracer

The Linux function graph tracer records kernel function entries and returns so that function call relationships are made apparent. The io_uring_setup(2) system call is failing with ENOMEM but it is unclear at which point in the system call this happens. It is possible to find the answer by studying the function call graph produced by the tracer and following along in the Linux source code.

Since io_uring_setup(2) is a system call it's not an ordinary C function definition and has a special symbol name in the kernel ELF file. It is possible to look up the (architecture-specific) symbol for the currently running kernel:

# grep io_uring_setup /proc/kallsyms
ffffffffbd357130 T __x64_sys_io_uring_setup

Let's trace all __x64_sys_io_uring_setup calls:

# cd /sys/kernel/debug/tracing
# echo '__x64_sys_io_uring_setup' > set_graph_function
# echo 'function_graph' >current_tracer
# cat trace_pipe >/tmp/trace.log

Run the application in another terminal...

The trace contains many successful io_uring_setup(2) calls that look like this:
 1)               |  __x64_sys_io_uring_setup() {
 1)               |    io_uring_setup() {
 1)               |      capable() {
 1)               |        ns_capable_common() {
 1)               |          security_capable() {
 1)   0.199 us    |            cap_capable();
 1)   7.095 us    |          }
 1)   7.594 us    |        }
 1)   8.059 us    |      }
 1)               |      kmem_cache_alloc_trace() {
 1)               |        _cond_resched() {
 1)   0.244 us    |          rcu_all_qs();
 1)   0.708 us    |        }
 1)   0.208 us    |        should_failslab();
 1)   0.220 us    |        memcg_kmem_put_cache();
 1)   2.201 us    |      }
 1)               |      fd_install() {
 1)   0.223 us    |        __fd_install();
 1)   0.643 us    |      }
 1) ! 190.396 us  |    }
 1) ! 216.236 us  |  }

Although the goal is to understand system call failures, looking at a successful invocation can be useful too. Failed calls in trace output can be identified on the basis that they differ from successful calls. This knowledge can be valuable when searching through large trace files. A failed io_uring_setup(2) call aborts early and does not invoke fd_install(). Now it is possible to find a failed call amongst all the io_uring_setup(2) calls:

 2)               |  __x64_sys_io_uring_setup() {
 2)               |    io_uring_setup() {
 2)               |      capable() {
 2)               |        ns_capable_common() {
 2)               |          security_capable() {
 2)   0.236 us    |            cap_capable();
 2)   0.872 us    |          }
 2)   1.419 us    |        }
 2)   1.951 us    |      }
 2)   0.419 us    |      free_uid();
 2)   3.389 us    |    }
 2) + 48.769 us   |  }

The fs/io_uring.c code shows the likely error code paths:

        account_mem = !capable(CAP_IPC_LOCK);

        if (account_mem) {
                ret = io_account_mem(user,
                                ring_pages(p->sq_entries, p->cq_entries));
                if (ret) {
                        free_uid(user);
                        return ret;
                }
        }

        ctx = io_ring_ctx_alloc(p);
        if (!ctx) {
                if (account_mem)
                        io_unaccount_mem(user, ring_pages(p->sq_entries,
                                                p->cq_entries));
                free_uid(user);
                return -ENOMEM;
        }
But is there enough information in the trace to determine which of these return statements is executed? The trace shows free_uid() so we can be confident that both these code paths are valid candidates. By looking back at the success code path we can use kmem_cache_alloc_trace() as a landmark. It is called by io_ring_ctx_alloc() so we should see kmem_cache_alloc_trace() in the trace before free_uid() if the second return statement is taken. Since it does not appear in the trace output we conclude that the first return statement is being taken!
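When trace files are large, the landmark reasoning above can be mechanized. The sketch below is hypothetical; the heuristic and function names follow this io_uring_setup(2) example only and are not part of any tracing tool:

```python
def classify_call(call_graph):
    """Classify one __x64_sys_io_uring_setup() call graph
    using landmark functions seen in the trace output."""
    # fd_install() only appears on the success path.
    if "fd_install()" in call_graph:
        return "success"
    # kmem_cache_alloc_trace() is called by io_ring_ctx_alloc(),
    # so seeing it means failure happened at or after allocation.
    if "kmem_cache_alloc_trace()" in call_graph:
        return "failed in io_ring_ctx_alloc() path"
    # Neither landmark: the call aborted before allocation,
    # i.e. the first return statement (accounting failure).
    return "failed in io_account_mem() path"

print(classify_call("io_uring_setup() { free_uid(); }"))
```

Splitting /tmp/trace.log into per-call chunks and feeding each through such a classifier quickly surfaces the failed invocations.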

When trace output is inconclusive

Function graph tracer output only shows functions in the ELF file. When the compiler inlines code, no entry or return is recorded in the function graph trace. This can make it hard to identify the exact return statement taken in a long function. Functions containing few function calls and many conditional branches are also difficult to analyze from just a function graph trace.

We can enhance our understanding of the trace by adding dynamic probes that record function arguments, local variables, and/or return values via perf-probe(2). By knowing these values we can make inferences about the code path being taken.

If this is not enough to infer which code path is being taken, detailed code coverage information is necessary.

One way to approximate code coverage is using a sampling CPU profiler, like perf(1), and letting it run under load for some time to gather statistics on which code paths are executed frequently. This is not as precise as code coverage tools, which record each branch encountered in a program, but it can be enough to observe code paths in functions that are not amenable to the function graph tracer due to the low number of function calls.

This is done as follows:

  1. Run the system call in question in a tight loop so the CPU is spending a significant amount of time in the code path you wish to observe.
  2. Start perf record -a and let it run for 30 seconds.
  3. Stop perf-record(1) and run perf-report(1) to view the annotated source code of the function in question.

The error code path should have a significant number of profiler samples and it should be prominent in the perf-report(1) annotated output.


Determining the cause for a system call failure can be hard work. The function graph tracer is helpful in shedding light on the code paths being taken by the kernel. Additional debugging is possible using perf-probe(2) and the sampling profiler, so that in most cases it's not necessary to recompile the kernel with printk() just to learn why a system call is failing.

by Stefan Hajnoczi at August 05, 2019 03:54 PM

July 30, 2019

Cole Robinson

Blog moved to Pelican and GitHub Pages

I've moved my blog from to a static site generated with Pelican and hosted on GitHub Pages. This is a dump of some of the details.

The content is hosted in three branches across two repos:

The motivation for the split is that according to this pelican SEO article, master branches of GitHub repos are indexed by google, so if you store HTML content in a master branch your canonical blog might be battling your GitHub repo in the search results. And since you can only put content in the master branch of a $ repo, I added a separate blog.git repo. Maybe I could shove all the content into the blog/gh-pages branch, but I think dealing with multiple subdomains prevents it. I've already spent too much time playing with all this stuff though, so that's for another day to figure out. Of course, suggestions welcome; blog comments are enabled with Disqus.

One issue I hit is that pushing updated content to blog/gh-pages doesn't consistently trigger a new GitHub Pages deployment. There's a bunch of hits about this around the web (this stackoverflow post in particular) but no authoritative explanation about what criteria GitHub Pages uses to determine whether to redeploy. The simplest 'fix' I found is to tweak the index.html content via the GitHub web UI and commit the change which seems to consistently trigger a refresh as reported by the repo's deployments page.

You may notice the blog looks a lot like stock Jekyll with its minima theme. I didn't find any Pelican theme that I liked as much as minima, so I grabbed the CSS from a minima instance and started adapting the Pelican simple-bootstrap theme to use it. The end result is basically a simple reimplementation of minima for Pelican. I learned a lot in the process but it likely would have been much simpler if I just used Jekyll in the first place, but I'm in too deep to switch now!

by Cole Robinson at July 30, 2019 07:30 PM

July 12, 2019

KVM on Z

KVM at SHARE Pittsburgh 2019

Yes, we will be at SHARE in Pittsburgh this August!
See the following session in the Linux and VM/Virtualization track:

  • KVM on IBM Z News (Session #25978): Latest news on our development work with the open source community

by Stefan Raspl at July 12, 2019 04:42 PM

July 10, 2019

Cornelia Huck

s390x changes in QEMU 4.1

QEMU has just entered hard freeze for 4.1, so the time is here again to summarize the s390x changes for that release.


  • All instructions that have been introduced with the "Vector Facility" in the z13 machines are now emulated by QEMU. In particular, this allows Linux distributions built for z13 or later to be run under TCG (vector instructions are generated when we compile for z13; other z13 facilities are optional.)

CPU Models

  • As the needed prerequisites in TCG now have been implemented, the "qemu" cpu model now includes the "Vector Facility" and has been bumped to a stripped-down z13.
  • Models for the upcoming gen15 machines (the official name is not yet known) and some new facilities have been added.
  • If the host kernel supports it, we now indicate the AP Queue Interruption facility. This is used by vfio-ap and makes it possible to provide interrupts for AP devices to the guest.

I/O Devices

  • vfio-ccw has gained support for relaying HALT SUBCHANNEL and CLEAR SUBCHANNEL requests from the guest to the device, if the host kernel vfio-ccw driver supports it. Otherwise, these instructions continue to be emulated by QEMU, as before.
  • The bios now supports IPLing (booting) from DASD attached via vfio-ccw.


  • The bios tolerates signatures written by zipl, if present; but it does not actually handle them. See the 'secure' option for zipl introduced in s390-tools 2.9.0.

And the usual fixes and cleanups.

by Cornelia Huck at July 10, 2019 02:16 PM

July 08, 2019

KVM on Z

SLES 15 SP1 released

SLES 15 SP1 is out! See the announcement and their release notes with Z-specific changes.
It ships the following code level updates:
  • QEMU v3.1 (GA: v2.11)
  • libvirt v5.1 (GA: v4.0)
See previous blog entries on QEMU v2.12, v3.0 and v3.1 for details on new features that became available with the QEMU package update.
Furthermore, SLES 15 SP1 introduces the kvm_stat tool, which can be used for guest event analysis.

by Stefan Raspl at July 08, 2019 10:14 PM

June 16, 2019

Gerd Hoffmann

macos guest support for qemu

display support

After one of the minor updates (10.14.3 or 10.14.4, don't remember) my macOS Mojave guest started to switch video modes at boot. Also the "Display" panel in "System Preferences" started to offer three video modes to choose from. Unfortunately FullHD (aka 1920x1080) is not on the list.

Decided to look into this. Assuming that macOS learned to switch video modes using the EFI GOP interface I've tweaked the mode list in OVMF (QemuVideoDxe driver). No effect. Huh?

Next I've looked at the list of drivers, using kextstat. Found a AppleVirtualGraphics.kext entry. Interesting. Checking out Info.plist (in /System/Library/Extensions/AppleVirtualGraphics.kext/Contents) I found this:

    [ ... ]
            [ ... ]
            [ ... ]
            [ ... ]
            [ ... ]
    [ ... ]

So recent macOS Mojave ships with a driver for qemu stdvga and qemu cirrus vga. Nice. Unfortunately the question of how to switch the display into 1920x1080 mode (to match the host's display resolution) isn't solved yet.

virtio support

While looking around I've noticed there is a AppleVirtIO.kext too, with this in Info.plist:

    [ ... ]
        [ ... ]
        [ ... ]
        [ ... ]
        [ ... ]
        [ ... ]

Apparently a virtio driver with support for virtio-console/serial, virtio-blk and virtio-9p.

Tried to switch the system disk from sata to virtio-blk. Clover seems to be slightly confused: it stops showing the nice disk icons. But booting macOS works fine regardless, at least when using the transitional device, so the driver seems to support legacy mode only.

virtio-9p for filesystem sharing looks pretty interesting too. So, lets try that (in libvirt xml):

  [ ... ]
    [ ... ]
    <filesystem type='mount' accessmode='mapped'>
      <source dir='/path/to/some/host/directory'/>
      <target dir='/test9p'/>
    [ ... ]
  [ ... ]

macOS seems to not mount the filesystem automatically. But it is easy to do, using the terminal. You need to create the target directory first (sudo mkdir /test9p). Then run sudo mount_9p. Done.

by Gerd Hoffmann at June 16, 2019 10:00 PM

June 05, 2019

Gerd Hoffmann

recent qemu sound improvements

The qemu sound system got a bunch of improvements in 2018 and 2019.

New in qemu 3.0

The hda emulation uses a high resolution timer now to better emulate the timing-sensitive dma transfer of sound samples. Credits for this implementation go to Martin Schrodt.

Unfortunately this is incompatible with older qemu versions, so it is only enabled for 3.0+ machine type versions. So upgrading qemu is not enough to get this; you also have to make sure you are using a new enough machine type (qemu -M command line switch).

libvirt stores the machine type in the domain xml when the guest is created. It is never updated automatically. So have a look at your domain configuration (using virsh edit domain-name for example) and check the version is 3.0 or newer:

[ ... ]
    <type arch='x86_64' machine='pc-q35-3.0'>hvm</type>
[ ... ]
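To check this programmatically rather than by eye, one could parse the machine attribute out of the domain XML; a toy sketch (this is plain string handling over a snippet like the one above, not a libvirt API):

```python
import xml.etree.ElementTree as ET

# Hypothetical check: is the versioned machine type 3.0 or newer?
snippet = "<type arch='x86_64' machine='pc-q35-3.0'>hvm</type>"
machine = ET.fromstring(snippet).get("machine")

# Versioned machine types end in the QEMU version, e.g. pc-q35-3.0.
version = tuple(int(p) for p in machine.rsplit("-", 1)[1].split("."))
print(machine, version >= (3, 0))  # prints: pc-q35-3.0 True
```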

New in qemu 3.1

The pulseaudio backend got fixes in 3.1, so if you are using pulse you should upgrade to at least qemu version 3.1.

New in qemu upcoming 4.0

Yet another pulseaudio bugfix.

Initial support for the -audiodev command line switch was finally merged. So audio support is no longer the odd one out that is configured in a completely different way, using environment variables instead of command line switches. Credits for this go to Kővágó, Zoltán.

In the pipeline

There are more -audiodev improvements in the pipeline, they are expected to land upstream in the 4.1 or 4.2 devel cycle.

Latency tuning

While being at it one final note:

Bugs in qemu sound device emulation and audio backends are not the only possible root cause for bad sound quality. Crackling sound -- typically caused by buffer underruns -- can also be caused by latency problems elsewhere in qemu.

One known offender is disk I/O, specifically the linux aio support which isn't as async as it should be and blocks now and then. linux aio support is configured with io=native for block device backends.

A better choice is io=threads. In libvirt xml:

[ ... ]
    <disk type='...' device='disk'>
      <driver name='qemu' type='...' cache='none' io='threads'/>
[ ... ]

Another known issue is spice audio compression, so better turn that off when using spice:

[ ... ]
    <graphics type='spice'>
      [ ... ]
      <playback compression='off'/>
[ ... ]

by Gerd Hoffmann at June 05, 2019 10:00 PM

May 22, 2019

QEMU project

QEMU 4.0 adds micro:bit emulation support

micro:bit emulation support is available from QEMU 4.0 onwards and can be used for low-level software testing and development. Unlike existing micro:bit simulators, QEMU performs full-system emulation and actually runs the same ARM code as the real hardware. This blog post explains what full-system emulation means and why QEMU is now a useful tool for developing micro:bit software.

The micro:bit is a tiny ARM board designed for teaching. It is increasingly being used around the world to expose children to computers, programming, and electronics in a low-cost way with an active online community that shares project ideas, lesson plans, and programming tips.

micro:bit board

Simulators and emulators

Simulators are used for many tasks from mobile app development to performance analysis of computer hardware. It is possible to develop code using a simulator without having access to real hardware. Oftentimes using a simulator is more convenient than flashing and debugging programs on real hardware.

Emulators allow programs written for one computer system to run on a different computer system. They use techniques like machine code interpreters and just-in-time compilers to execute guest programs that do not run natively on the host computer. Each CPU instruction must be correctly implemented by the emulator so it can run guest software.
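As a toy illustration of the interpreter technique (nothing like QEMU's actual TCG, which translates guest code to host code): a fetch-decode-execute loop over an invented two-instruction machine:

```python
def run(program):
    """Interpret a made-up machine: ('add', n) adds n to the
    accumulator, ('halt',) stops and returns the accumulator."""
    acc = 0
    pc = 0
    while True:
        insn = program[pc]        # fetch the next instruction
        pc += 1
        if insn[0] == "add":      # decode + execute
            acc += insn[1]
        elif insn[0] == "halt":
            return acc

print(run([("add", 2), ("add", 3), ("halt",)]))  # prints 5
```

A real emulator does this for every guest CPU instruction, which is why each one must be implemented correctly and why performance matters so much.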

How existing micro:bit simulators work

Simulators can be implemented at various layers in the software stack. The MakeCode editor for JavaScript development includes a micro:bit simulator:

MakeCode editor

This simulator does not execute any ARM code and is therefore not running the same CPU instructions as a real micro:bit. Instead it reuses the JavaScript engine already available in your web browser to execute micro:bit JavaScript programs. This is achieved by providing the micro:bit JavaScript APIs that micro:bit programs expect. The programs don’t need to know whether those APIs are implemented by the real micro:bit software stack or whether they are actually calling into the MakeCode simulator.

In the screenshot above the micro:bit program calls showString("Hello world!") and this becomes a call into the MakeCode simulator code to render images of LEDs in the web browser. On real hardware the code path is different and eventually leads to an LED matrix driver that lights up the LEDs by driving output pins on the micro:bit board.

Full-system emulation

Unlike the MakeCode simulator, QEMU emulates the micro:bit CPU and boots from the same ARM code as the real micro:bit board. The simulation happens at the CPU instruction and hardware interface level instead of at the JavaScript API level. This is called full-system emulation because the entire guest software environment is present.

What are the advantages of full-system emulation?

  • Programs written in any language can run (MicroPython, mbed C/C++, etc)
  • Boot, device driver, and language run-time code can be tested
  • Bugs in lower layers of the software stack can be reproduced
  • CPU architecture-specific bugs can be reproduced (stack and memory corruption bugs)
  • A debugger can be connected to inspect the entire software stack

The main disadvantage of full-system emulation is that the performance overhead is higher since simulation happens at the CPU instruction level. Programs consist of many CPU instructions so the task of emulation is performance-sensitive. Luckily the micro:bit’s CPU is much less powerful than CPUs available in our laptops and desktops, so programs execute at a reasonable speed.

Running micro:bit programs on QEMU

QEMU emulates the core devices on the micro:bit, including the serial port (UART) and timers. This is enough for developing and testing low-level software but does not offer the LEDs, radio, and other devices that most micro:bit programs rely on. These devices might be emulated by QEMU in the future, but for now the main use of QEMU is for developing and testing low-level micro:bit code.

To run test.hex:

$ qemu-system-arm -M microbit -device loader,file=test.hex -serial stdio

Any output written to the serial port is printed to the terminal by QEMU.

Debugging micro:bit programs with QEMU and GDB

QEMU has GDB guest debugging support. This means GDB can connect to QEMU in order to debug the guest software. This is similar to debugging a real system over JTAG, except no hardware is necessary!

Connect with GDB to debug the guest:

$ qemu-system-arm -M microbit -device loader,file=test.hex -s
$ gdb
(gdb) target remote tcp:
(gdb) x/10i $pc
=> 0x161c4:	ldr	r3, [r4, #0]
   0x161c6:	cmp	r3, #0
   0x161c8:	beq.n	0x161d2
   0x161ca:	ldr	r3, [pc, #48]	; (0x161fc)
   0x161cc:	ldr	r3, [r3, #0]
   0x161ce:	cmp	r3, #0
   0x161d0:	bne.n	0x161d8
   0x161d2:	movs	r0, #6
   0x161d4:	bl	0x16160
   0x161d8:	ldr	r0, [r4, #0]

Having a debugger is very powerful. QEMU can also load ELF files in addition to the popular .hex files used for micro:bit programs. ELF files can contain debugging information that enables source-level debugging so GDB can display function and variable names as well as listing the source code instead of showing assembly instructions.


QEMU now offers a platform for developing and testing micro:bit programs. It is open to future extension, hopefully to emulate more devices and offer a graphical user interface.

micro:bit emulation was contributed by Julia Suvorova and Steffen Görtz as part of their Outreachy and Google Summer of Code internships with QEMU. Jim Mussared, Joel Stanley, and Stefan Hajnoczi acted as mentors and contributed patches as well.

May 22, 2019 10:45 AM

May 17, 2019

KVM on Z

QEMU v3.1 released

QEMU v3.1 is out. Besides a number of small enhancements, some items that we would like to highlight from a KVM on Z perspective:
  • Huge Pages Support: KVM guests can now utilize 1MB pages. As this removes one layer of address translation for the guest backing, fewer page faults need to be processed, and fewer translation lookaside buffer (TLB) entries are needed to hold translations. This, as well as the TLB improvements in z14, will improve KVM guest performance.
    To use:
    Create config file /etc/modprobe.d/kvmhpage.conf file with the following content to enable huge pages for KVM:

       options kvm hpage=1

    Furthermore, add the following line to /etc/sysctl.conf to reserve N huge pages:

       vm.nr_hugepages = N

    Alternatively, append the following statement to the kernel parameter line in case support is compiled into the kernel: kvm.hpage=1 hugepages=N.
    Note that there are ways to add huge pages dynamically after boot, but due to effects like memory fragmentation it is preferable to define huge pages as early as possible.
    If successful, the file /proc/sys/vm/nr_hugepages should show N huge pages. See here for further documentation.
    Then, to enable huge pages for a guest, add the following element to the respective domain XML:

       <memoryBacking>
         <hugepages/>
       </memoryBacking>

    The use of huge pages in the host is orthogonal to the use of huge pages in the guest. Both will improve the performance independently by reducing the number of page faults and the number of page table walks after a TLB miss.
    The biggest performance improvement can be achieved by using huge pages in both host and guest, e.g. with libhugetlbfs, as this will also make use of the larger 1M TLB entries in the hardware.
    Requires Linux kernel 4.19.
  • vfio-ap: The Adjunct Processor (AP) facility is an IBM Z cryptographic facility comprised of three AP instructions and up to 256 cryptographic adapter cards. Each adapter card is partitioned into up to 85 domains, each of which provides cryptographic services. An AP queue is the means by which AP messages are sent to and received from an AP adapter. Each AP queue is connected to a particular domain within a particular adapter. vfio-ap enables assignment of a subset of AP adapters and domains to one or more guests such that each guest has exclusive access to a discrete set of AP queues.
    Here is a small sample script illustrating host setup:

       # load vfio-ap device driver
       modprobe vfio-ap

       # reserve domain 7 for use by KVM guests
       echo -0x7 > /sys/bus/ap/aqmask
       # to reserve all domains of an adapter, use the following
       # line instead (by uncommenting it), and replace NN with the
       # adapter number:
       # echo -0xNN > /sys/bus/ap/apmask

       # create a mediated device (mdev) to provide userspace access
       # to a device in a secure manner
       echo $UUID > /sys/devices/vfio_ap/matrix/mdev_supported_types/ \

       # assign adapter, domain and control domain
       echo 0x3 > /sys/devices/vfio_ap/matrix/${UUID}/assign_adapter
       echo 0x7 > /sys/devices/vfio_ap/matrix/${UUID}/assign_domain
       echo 0x7 > /sys/devices/vfio_ap/matrix/${UUID}/ \

    To make use of the AP device in a KVM guest, add the following element to the respective domain XML:

       <hostdev mode='subsystem' type='mdev' managed='no' model='vfio-ap'>
           <address uuid='e926839d-a0b4-4f9c-95d0-c9b34190c4ba'/>
       </hostdev>

    Once complete, use the passthrough device in a KVM guest just like a regular crypto adapter.
    Requires Linux kernel 4.20 and libvirt 4.9, and is also available in RHEL 8, Ubuntu 18.04 and SLES 15 SP1.
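Back on the huge pages item: sizing the vm.nr_hugepages reservation is simple arithmetic. A back-of-envelope sketch, assuming a hypothetical 4 GiB guest and the 1 MiB page size described above:

```python
GUEST_MEM_MIB = 4096   # assumed guest memory size, 4 GiB
HUGE_PAGE_MIB = 1      # 1 MiB huge pages, as in the post

# N for vm.nr_hugepages must at least cover the guest's backing.
nr_hugepages = GUEST_MEM_MIB // HUGE_PAGE_MIB
print(nr_hugepages)  # prints 4096
```

In practice you would reserve a little extra headroom per guest on top of this minimum.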

by Stefan Raspl at May 17, 2019 11:33 PM

May 07, 2019

KVM on Z

RHEL 8 released

Red Hat Enterprise Linux 8 is out! See the announcement and their release notes with Z-specific changes.
It ships the following code levels:

by Stefan Raspl at May 07, 2019 04:52 PM

April 30, 2019

KVM on Z

QEMU v4.0 released

QEMU v4.0 is out. Besides a number of small enhancements, some items that we would like to highlight from a KVM on Z perspective:
  • CPU models for z14 GA2 as follows:
       $ qemu-system-s390x -cpu help -enable-kvm | grep z14.2
       s390 z14.2-base      IBM z14 GA2           (static, migration-safe)
       s390 z14.2           IBM z14 GA2           (migration-safe)
  • vfio-ap now supports hotplugging of vfio-ap devices.

by Stefan Raspl at April 30, 2019 08:17 AM

April 24, 2019

QEMU project

QEMU version 4.0.0 released

We would like to announce the availability of the QEMU 4.0.0 release. This release contains 3100+ commits from 220 authors.

You can grab the tarball from our download page. The full list of changes are available in the Wiki.

Highlights include:

  • ARM: ARMv8+ extensions for SB, PredInv, HPD, LOR, FHM, AA32HPD, PAuth, JSConv, CondM, FRINT, and BTI
  • ARM: new emulation support for “Musca” and “MPS2” development boards
  • ARM: virt: support for >255GB of RAM and u-boot “noload” image types
  • ARM: improved emulation of ARM PMU
  • HPPA: support for TLB protection IDs and TLB trace events
  • MIPS: support for multi-threaded TCG emulation
  • MIPS: emulation support for I7200 and I6500 CPUs, QMP-based querying of CPU types, and improved support for SAARI and SAAR configuration registers
  • MIPS: improvements to Interthread Communication Unit, Fulong 2E machine types, and end-user documentation.
  • PowerPC: pseries/powernv: support for POWER9 large decrementer
  • PowerPC: pseries: emulation support for XIVE interrupt controller
  • PowerPC: pseries: support for hotplugging PCI host bridges (PHBs)
  • PowerPC: pseries: Spectre/Meltdown mitigations enabled by default, additional support for count-cache-flush mitigation
  • RISC-V: virt: support for PCI and USB
  • RISC-V: support for TSR, TW, and TVM fields of mstatus, FS field now supports three states (dirty, clean, and off)
  • RISC-V: built-in gdbserver supports register lists via XML files
  • s390: support for z14 GA 2 CPU model, Multiple-epoch and PTFF features now enabled in z14 CPU model by default
  • s390: vfio-ap: now supports hot plug/unplug, and no longer inhibits memory ballooning
  • s390: emulation support for floating-point extension facility and vector support instructions
  • x86: HAX accelerator now supported on POSIX hosts other than Darwin, including Linux and NetBSD
  • x86: Q35: advertised PCIe root port speeds will now optimally default to maximum link speed (16GT/s) and width (x32) provided by PCIe 4.0 for QEMU 4.0+ machine types; older machine types will retain 2.5GT/s and x1 defaults for compatibility.
  • x86: Xen PVH images can now be booted with “-kernel” option
  • Xtensa: xtfpga: improved SMP support for linux (interrupt distributor, IPI, and runstall) and new SMP-capable test_mmuhifi_c3 core configuration
  • Xtensa: support for Flexible length instructions extension (FLIX)
  • GUI: new ‘-display spice-app’ to configure/launch a Spice client GUI with a similar UI to QEMU GTK. VNC server now supports access controls via tls-authz/sasl-authz options
  • QMP: support for “out-of-band” command execution, can be useful for postcopy migration recovery. Additional QMP commands for working with block devices and dirty bitmaps
  • VFIO: EDID interface for supported mdev (Intel vGPU for kernel 5.0+), allows resolution setting via xres/yres options.
  • Xen: new ‘xen-disk’ device which can create a Xen PV disk backend, and performance improvements for Xen PV disk backend.
  • Network Block Device: improved tracing and error diagnostics, improved client compatibility with buggy NBD server implementations, new --bitmap, --list, --tls-authz options for qemu-nbd
  • virtio-blk now supports DISCARD and WRITE_ZEROES
  • pvrdma device now supports RDMA Management Datagram services (MAD)
  • and lots more…

Thank you to everyone involved!

April 24, 2019 05:45 AM

April 18, 2019

Stefan Hajnoczi

What's new in VIRTIO 1.1?

The VIRTIO 1.1 specification has been published! This article covers the major new features in this specification.

New Devices

The following new devices are defined:

  • virtio-input is a Linux evdev input device (mouse, keyboard, joystick)
  • virtio-gpu is a 2D graphics device (with 3D support planned)
  • virtio-vsock is a host<->guest socket communications device
  • virtio-crypto is a cryptographic accelerator device

New Device Features




New Core Features

There is a new virtqueue memory layout called packed virtqueues. The old layout is called split virtqueues because the avail and used rings are separate from the descriptor table. The new packed virtqueue layout uses just a single descriptor table as the single ring. The layout is optimized for a friendlier CPU cache footprint and there are several features that devices can exploit for better performance.

The VIRTIO_F_NOTIFICATION_DATA feature is an optimization mainly for hardware implementations of VIRTIO. The driver writes extra information as part of the Available Buffer Notification. Thanks to the information included in the notification, the device does not need to fetch this information from memory anymore. This is useful for PCI hardware implementations where minimizing DMA operations improves performance significantly.

by Unknown ( at April 18, 2019 12:56 PM

April 09, 2019

Cole Robinson

Host 'Network Interfaces' panel removed from virt-manager

I released virt-manager 2.0.0 in October 2018. Since the release contained the full port to python3, it seemed like a good opportunity to drop some baggage from the app.

The biggest piece we removed was the UI for managing host network interfaces. This is the Connection Details->Network Interfaces panel, and the New Interface wizard for defining host network definitions for things like bridges, bonds, and vlan devices. The main screen of the old UI looked like this:

virt-manager host interfaces panel

Some history

Behind the scenes, this UI was using libvirt's Interface APIs, which also power the virsh iface-* commands. These APIs are little more than a wrapper around the netcf library.

netcf aimed to be a linux distro-independent API for network device configuration. On Red Hat distros this meant turning the API's XML format into ifcfg files under /etc/sysconfig/network-scripts. There were even pie-in-the-sky ideas about NetworkManager one day using netcf.

In practice though the library never really took off. It was years before a debian backend showed up, contributed by a Red Hatter in the hope of increasing library uptake, though it didn't seem to help. netcf basically only existed to serve the libvirt Interface APIs, yet those APIs were never really used by any major libvirt consuming app, besides virt-manager. And in virt-manager's case it was largely just slapping some UI over the XML format and lifecycle operations.

For virt-manager's use cases we hoped that netcf would make it trivial to bridge the host's network interface, which when used with VMs would give them first class IP addresses on the host network setup, not NAT like the 'default' virtual network. Unfortunately, though the UI would create the ifcfg files well enough, behind the scenes nothing played well with NetworkManager for years and years. The standard suggestion was to disable NetworkManager if you wanted to bridge your host NIC. Not very user friendly. Some people did manage to use the UI to that effect, but it was never a trivial process.

The state today

Nowadays NetworkManager can handle bridging natively and is much more powerful than what virt-manager/libvirt/netcf provide. The virt-manager UI was more likely to shoot you in the foot than make things simple. And it had become increasingly clear that virt-manager was not the place to maintain host network config UI.
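As a rough illustration of the native bridging mentioned above, setting up a host bridge with NetworkManager comes down to a few nmcli calls. This is a sketch only: the NIC name eth0 and the connection names are assumptions, and your distro's NetworkManager version may differ.

```shell
# Create a bridge connection and enslave the host NIC to it
# (eth0 and the connection names are illustrative placeholders)
nmcli connection add type bridge ifname br0 con-name br0
nmcli connection add type bridge-slave ifname eth0 master br0 con-name br0-port-eth0
nmcli connection up br0
```

A VM attached to br0 then gets a first class address on the host network, which is what the removed virt-manager UI was trying to achieve.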

So we made the decision to drop all this from virt-manager in 2.0.0. netcf and the libvirt interface APIs still exist. If you're interested in some more history on the interface API/netcf difficulties, check out Laine's email to virt-tools-list.

by Cole Robinson at April 09, 2019 06:01 PM

April 02, 2019

Gerd Hoffmann

drminfo 6 released

drminfo is a small collection of tools for drm and fbdev devices. They print device information and can run some basic tests.

New in version 6 are a number of avocado test cases for qemu display devices (stdvga, cirrus, qxl and virtio).

drminfo has a homepage and a git repository.
My copr repo has Fedora and EPEL rpm packages.

by Gerd Hoffmann at April 02, 2019 10:00 PM

March 12, 2019

Cornelia Huck

s390x changes in QEMU 4.0

QEMU is now entering softfreeze for the 4.0 release (expected in April), so here is the usual summary of s390x changes in that release.

CPU Models

  • A cpu model for the z14 GA 2 has been added. Currently, no new features have been added.
  • The cpu model for z14 now does, however, include the multiple epoch and PTFF enhancement features per default.
  • The 'qemu' cpu model now includes the zPCI feature per default. No more prerequisites are needed for pci support (see below).


  • QEMU for s390x is now always built with pci support. If we want to provide backwards compatibility, we cannot simply disable pci (we need the s390 pci host bus); it is easier to simply make pci mandatory. Note that disabling pci was never supported by the normal build system anyway.
  • zPCI devices have gained support for instruction counters (on a Linux guest, these are exposed through /sys/kernel/debug/pci/<function>/statistics).
  • zPCI devices always lacked support for migrating their s390-specific state (not implemented...); if you tried to migrate a guest with a virtio-pci device on s390x, odd things might happen. To avoid surprises, the 'zpci' devices are now explicitly marked as unmigratable. (Support for migration will likely be added in the future.)
  • Hot(un)plug of the vfio-ap matrix device is now supported.
  • Adding a vfio-ap matrix device no longer inhibits usage of a memory ballooner: Memory usage by vfio-ap does not clash with the concept of a memory balloon.


  • Support for the floating-point extension facility has been added.
  • The first part of support for z13 vector instructions has been added (vector support instructions). Expect support for the remaining vector instructions in the next release; it should support enough of the instructions introduced with z13 to be able to run a distribution built for that cpu. 

by Cornelia Huck ( at March 12, 2019 06:20 PM

March 11, 2019

KVM on Z

libvirt v4.10 released, providing PCI passthrough support

libvirt v4.10, available for download at the libvirt project website, adds support for PCI passthrough devices on IBM Z (requires Linux kernel 4.14 and QEMU v2.11).
To setup passthrough for a PCI device, follow these steps:
  1. Make sure the vfio-pci module is available, e.g. using the modinfo command:
       $ modinfo vfio-pci
       filename:       /lib/modules/4.18.0/kernel/drivers/vfio/pci/vfio-pci.ko
       description:    VFIO PCI - User Level meta-driver
  2. Verify that the pciutils package, providing the lspci command et al, is available using your distro's package manager
  3. Determine the PCI device's address using the lspci command:
       $ lspci
       0002:06:00.0 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family
                    [ConnectX-3/ConnectX-3 Pro Virtual Function]
  4. Add the following element to the guest domain XML's devices section:
       <hostdev mode='subsystem' type='pci' managed='yes'>
         <source>
           <address domain='0x0002' bus='0x06' slot='0x00' function='0x0'/>
         </source>
       </hostdev>
    Note that if attribute managed is set to no (which is the default), it becomes the user's duty to unbind the PCI device from the respective device driver, and rebind to vfio-pci in the host prior to starting the guest.
Once done and the guest is started, running the lspci command in the guest should show the PCI device, and one can proceed to configure it as needed.
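For the managed='no' case mentioned above, the manual unbind/rebind might be sketched as follows. This is an assumption-laden illustration: it uses the device address from the lspci output above, relies on the kernel's driver_override mechanism (available in reasonably recent kernels), and must be run as root.

```shell
# Sketch: detach the PCI function from its host driver and hand it to vfio-pci
# (device address taken from the lspci example; adjust to your system)
DEV=0002:06:00.0
echo "$DEV" > /sys/bus/pci/devices/$DEV/driver/unbind
echo vfio-pci > /sys/bus/pci/devices/$DEV/driver_override
echo "$DEV" > /sys/bus/pci/drivers_probe
```

With managed='yes', libvirt performs these steps for you when the guest starts.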
It is well worth checking out the expanded domain XML:
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0002' bus='0x06' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0002' bus='0x00' slot='0x01' function='0x0'>
        <zpci uid='0x0001' fid='0x00000000'/>
      </address>
    </hostdev>
Theoretically, the PCI address in the guest can change between boots. However, the <zpci> element guarantees address persistence inside of the guest. The actual address of the passthrough device is based solely on the uid attribute: The uid becomes the PCI domain, and all remaining values of the address (PCI bus, slot and function) are set to zero. Therefore, in this example, the PCI address in the guest would be 0001:00:00.0.
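The derivation described above is simple enough to sketch in shell: the uid becomes the four-digit PCI domain, and bus, slot and function are all zero.

```shell
# Derive the guest PCI address from a zpci uid
# (per the rule above: uid -> domain, bus/slot/function are zero)
uid=0x0001
printf '%04x:00:00.0\n' "$uid"
# prints 0001:00:00.0
```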
Take note of the fid attribute, whose value is required to hotplug/hotunplug PCI devices within a guest.
Furthermore note that the target PCI address is not visible anywhere (except within the QEMU process) at all. I.e. it is not related to the PCI address as observed within the KVM guest, and could be set to an arbitrary value. However, choosing the "wrong" values might have undesired subtle side effects with QEMU. Therefore, we strongly recommend not to specify a target address and to rely on auto-assignment instead. If the guest's PCI address has to be chosen, restrict the target address element to at most uid (which defines the PCI address) and fid (so that e.g. scripts in the guest for hotplugging PCI devices can rely on a specific value), as follows:
   <address type='pci'>
     <zpci uid='0x0001' fid='0x00000000'/>
   </address>
For further (rather technical) details see here and here (git commit).

by Stefan Raspl ( at March 11, 2019 03:11 PM

March 02, 2019

Gerd Hoffmann

EDID support for qemu

Over the last months I've worked on adding EDID support to qemu. This allows passing all kinds of information about the (virtual) display to the guest: preferred video mode, display resolution, monitor name, monitor serial number and more. Current focus is getting the infrastructure in place. Once we have this we can build new features on top. HiDPI support comes to mind for example.

New in qemu 3.1

In qemu 3.1 the EDID generator code and support for the qemu stdvga was added. Right now EDID support is turned off by default, use edid=on to enable it. With EDID enabled you can also use the xres and yres properties to set the preferred video mode. Here is an example: qemu -device VGA,edid=on,xres=1280,yres=800

The qemu-edid utility has been added too. Main purpose is to allow testing the generator code without having to boot a guest for that, so typically the qemu-edid output is piped into the edid-decode utility to verify the generator works correctly. If you need an EDID blob for other reasons you might find this useful.
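The test loop described above might look like this (assuming both qemu-edid and the edid-decode utility are installed; default generator settings are used):

```shell
# Generate an EDID blob with qemu's generator and decode it for inspection,
# without having to boot a guest
qemu-edid > edid.bin
edid-decode edid.bin
```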

New in linux kernel 5.0

Infrastructure work: Some interface updates have been merged:

  • EDID support has been added to the virtio-gpu protocol.
  • The vfio mdev interface for vgpus got EDID support too.

We also got EDID support in drm drivers for both qemu stdvga (bochs-drm.ko) and virtio-gpu.

If both guest (linux kernel) and host (qemu) are new enough, the drm drivers will use the information from the edid blob to create the video mode list. It will also be available in sysfs; you can use edid-decode to get a human-readable version: edid-decode /sys/class/drm/card0-Virtual-1/edid

Planned for qemu 4.0

Support for EDID in virtio-gpu will be added; it is already merged in the master branch. It is also turned off by default, use edid=on to enable it (similar to stdvga).

Support for EDID in vgpus (i.e. vfio-mdev) is planned too, patches are out for review right now. Note that the kernel's mdev driver must support that too.

The qemu macos driver for the stdvga has been updated to support EDID.

Future plans

Intel is working on adding EDID support to gvt (the intel graphics mdev driver). Should land in the 5.1 or 5.2 linux kernel merge window.

Once the EDID support has received some real-world testing, it will be enabled by default for both stdvga and virtio-gpu. Unless something unexpected happens, that will probably be in qemu 4.1.

As already mentioned above looking at HiDPI support (starting with the gtk UI probably) is something I plan to look at when I find some time.

by Gerd Hoffmann at March 02, 2019 11:00 PM

February 28, 2019

Stefan Hajnoczi

QEMU accepted into Google Summer of Code and Outreachy 2019

QEMU is participating in the Google Summer of Code and Outreachy open source internship programs again this year. These 12-week, full-time, paid, remote work internships allow people interested in contributing to QEMU to get started. Each intern works with one or more mentors who can answer questions and are experienced developers. This is a great way to try out working on open source if you are considering it as a career.

For more information (including eligibility requirements), see our GSoC and our Outreachy pages.

by Unknown ( at February 28, 2019 05:17 PM

February 27, 2019

Gerd Hoffmann

ramfb display in qemu

ramfb is a very simple framebuffer display device. It is intended to be configured by the firmware and used as boot framebuffer, until the guest OS loads a real GPU driver.

The framebuffer memory is allocated from guest RAM and initialized using the firmware config interface (fw_cfg). edk2 (uefi firmware) has ramfb support. There also is a vgabios, which emulates vga text mode and renders it to the framebuffer.

The most interesting use case for this is boot display support for vgpus. vfio has a non-hotpluggable variant which allows enabling ramfb support: qemu -device vfio-pci-nohotplug,ramfb=on,... Once the guest OS has initialized the vgpu, qemu will show the vgpu display; otherwise the ramfb framebuffer is used. The firmware messages, boot loader menu and efifb/vesafb output will all show via ramfb.

There also is a standalone device, mostly intended for testing: qemu -vga none -device ramfb.

Even though it is possible to use ramfb as the primary display, it isn't a good idea to actually do that, as it isn't very efficient.

by Gerd Hoffmann at February 27, 2019 11:00 PM

February 26, 2019

QEMU project

Announcing GSoC and Outreachy 2019 internships

QEMU is once again participating in Google Summer of Code and Outreachy this year! These open source internship programs offer full-time remote work opportunities for talented new developers wishing to get involved in our community.

Each intern works with one or more mentors who support them in their project. Code is submitted according to QEMU’s normal development process, giving the intern experience in open source software development. Our projects range from device emulation to performance optimization to test infrastructure.

If you are interested in contributing to QEMU through a paid 12-week internship from May to August 2019, take a look at our GSoC page and our Outreachy page for more information.

Both GSoC and Outreachy have eligibility criteria, which you can review here (GSoC) and here (Outreachy) before applying.

You can read about projects that were completed in 2018 here.

These internships are generously funded by Google (GSoC) and Red Hat (Outreachy).

February 26, 2019 07:00 AM

February 18, 2019

Daniel Berrange

Easier QEMU live tracing using systemtap

QEMU is able to leverage a number of live tracing systems, the choice configurable at build time between

  • log – printf formatted string for each event sent into QEMU’s logging system which writes to stderr
  • syslog – printf formatted string for each event sent via syslog
  • simple – binary data stream for each event written to a file or fifo pipe
  • ftrace – printf formatted string for each event sent to kernel ftrace facility
  • dtrace – user space probe markers dynamically enabled via dtrace or systemtap
  • ust – user space probe markers dynamically enabled via LTT-ng
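The backend is selected when QEMU is built. As a sketch (the --enable-trace-backends configure flag accepts a comma-separated list, so several backends can be combined):

```shell
# Build QEMU with the dtrace backend instead of the default "log" backend
./configure --enable-trace-backends=dtrace
make
```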

Upstream QEMU enables the “log” trace backend by default since it is cross-platform portable and very simple to use by adding “-d trace:PATTERN” on the QEMU command line. For example to enable logging of all trace events in the QEMU I/O subsystem (aka “qio“) we can

$ qemu -d trace:qio* ...some args...
23266@1547735759.137292:qio_channel_socket_new Socket new ioc=0x563a8a39d400
23266@1547735759.137305:qio_task_new Task new task=0x563a891d0570 source=0x563a8a39d400 func=0x563a86f1e6c0 opaque=0x563a89078000
23266@1547735759.137326:qio_task_thread_start Task thread start task=0x563a891d0570 worker=0x563a86f1ce50 opaque=0x563a891d9d90
23273@1547735759.137491:qio_task_thread_run Task thread run task=0x563a891d0570
23273@1547735759.137503:qio_channel_socket_connect_sync Socket connect sync ioc=0x563a8a39d400 addr=0x563a891d9d90
23273@1547735759.138108:qio_channel_socket_connect_fail Socket connect fail ioc=0x563a8a39d400

This is very simple and surprisingly effective much of the time, but it is not without its downsides

  • Inactive probes have non-negligible performance impact on hot codepaths
  • It is targeted at human consumption, so it is not easy to reliably process with machines
  • It requires adding arguments to QEMU’s command line so is not easy to enable in many cases
  • It is custom to QEMU so does not facilitate getting correlated traces across the whole system

For these reasons, some downstreams chose not to use the default “log” backend. Both Fedora and RHEL have instead enabled the “dtrace” backend which can be leveraged via systemtap on Linux. This provides a very powerful tracing system, but the cost is that the previous simple task of printing a formatted string when a probe point fires has become MUCH more complicated. For example to get equivalent output to that seen with QEMU’s log backend would require

# cat > trace.stp <<EOF
probe qemu.system.x86_64.qio_task_new {
    printf("%d@%d qio_task_new Task new task=%p source=%p func=%p opaque=%p\n",
           pid(), gettimeofday_ns(), task, source, func, opaque)
}
EOF
# stap trace.stp
22806@1547735341399862570 qio_task_new Task new task=0x56135cd66eb0 source=0x56135d1d7c00 func=0x56135af746c0 opaque=0x56135bf06400

Repeat that code snippet for every qio* probe point you want to watch, figuring out the set of args it has available to print. This quickly becomes tedious for what should be a simple logging job, especially if you need to reference null terminated strings from userspace.

After cursing this difficulty one time too many, it occurred to me that QEMU could easily do more to make life easier for systemtap users. The QEMU build system is already auto-generating all the trace backend specific code from a generic description of probes in the QEMU source tree. It has a format string which is used in the syslog, log and ftrace backends, but this is ignored for the dtrace backend. It did not take much to change the code generator so that it can use this format string to generate a convenient systemtap tapset representing the above manually written probe:

probe qemu.system.x86_64.log.qio_task_new = qemu.system.x86_64.qio_task_new ?
    printf("%d@%d qio_task_new Task new task=%p source=%p func=%p opaque=%p\n",
           pid(), gettimeofday_ns(), task, source, func, opaque)

This can be trivially executed with minimal knowledge of the systemtap tapset language

# stap -e "qemu.system.x86_64.log.qio_task_new{}"
22806@1547735341399862570 qio_task_new Task new task=0x56135cd66eb0 source=0x56135d1d7c00 func=0x56135af746c0 opaque=0x56135bf06400

Even better, we have now gained the ability to use wildcards too

# stap -e "qemu.system.x86_64.log.qio*{}"
23266@1547735759.137292:qio_channel_socket_new Socket new ioc=0x563a8a39d400
23266@1547735759.137305:qio_task_new Task new task=0x563a891d0570 source=0x563a8a39d400 func=0x563a86f1e6c0 opaque=0x563a89078000
23266@1547735759.137326:qio_task_thread_start Task thread start task=0x563a891d0570 worker=0x563a86f1ce50 opaque=0x563a891d9d90
23273@1547735759.137491:qio_task_thread_run Task thread run task=0x563a891d0570
23273@1547735759.137503:qio_channel_socket_connect_sync Socket connect sync ioc=0x563a8a39d400 addr=0x563a891d9d90
23273@1547735759.138108:qio_channel_socket_connect_fail Socket connect fail ioc=0x563a8a39d400

Users still, however, need to be aware of the naming convention for QEMU’s systemtap tapsets and how it maps to the particular QEMU binary that is used, and don’t forget the trailing “{}”. Thus I decided to go one step further and ship a small helper tool to make it even easier to use

$ qemu-trace-stap run qemu-system-x86_64 'qio*'
22806@1547735341399856820 qio_channel_socket_new Socket new ioc=0x56135d1d7c00
22806@1547735341399862570 qio_task_new Task new task=0x56135cd66eb0 source=0x56135d1d7c00 func=0x56135af746c0 opaque=0x56135bf06400
22806@1547735341399865943 qio_task_thread_start Task thread start task=0x56135cd66eb0 worker=0x56135af72e50 opaque=0x56135c071d70
22806@1547735341399976816 qio_task_thread_run Task thread run task=0x56135cd66eb0

The second argument to this tool is the QEMU binary filename to be traced, which can be relative (to search $PATH) or absolute. What is clever is that it will set the SYSTEMTAP_TAPSET env variable to point to the right location to find the corresponding tapset definition. This is very useful when you have multiple copies of QEMU on the system and need to make sure systemtap traces the right one.

The ‘qemu-trace-stap‘ script takes a verbose arg so you can understand what it is running behind the scenes:

$ qemu-trace-stap run /home/berrange/usr/qemu-git/bin/qemu-system-x86_64 'qio*'
Using tapset dir '/home/berrange/usr/qemu-git/share/systemtap/tapset' for binary '/home/berrange/usr/qemu-git/bin/qemu-system-x86_64'
Compiling script 'probe qemu.system.x86_64.log.qio* {}'
Running script, <Ctrl>-c to quit
...trace output...

It can enable multiple probes at once

$ qemu-trace-stap run qemu-system-x86_64 'qio*' 'qcrypto*' 'buffer*'

By default it monitors all existing running processes and all future launched processes. This can be restricted to a specific PID using the --pid arg

$ qemu-trace-stap run --pid 2532 qemu-system-x86_64 'qio*'

Finally if you can’t remember what probes are valid it can tell you

$ qemu-trace-stap list qemu-system-x86_64

This new functionality merged into QEMU upstream a short while ago and will be included in the QEMU 4.0 release coming at the end of April.

by Daniel Berrange at February 18, 2019 03:00 PM

January 28, 2019

Thomas Huth

How to create small VMs with buildroot

A couple of times I already ran into the situation that I wanted to provide a small guest disk image to other people. For example, one time I wanted to provide a test application like LTP to colleagues via a server where I only had some limited disk quota available. Back then I was still able to resolve the problem by installing a stock Linux distribution together with the test software into a normal qcow2 image, and then to shrink the image with qemu-img convert and xz to approximately 500 MiB.
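The shrinking step described above can be sketched like this (file names are placeholders; the -c flag asks qemu-img for a compressed qcow2):

```shell
# Re-export the image with qcow2 compression, then compress the result with xz
qemu-img convert -O qcow2 -c input.qcow2 small.qcow2
xz -9 small.qcow2
```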

But when I started to think about the QEMU advent calendar 2018, where I wanted to provide many small images for various different target architectures, it was clear to me that I needed a different approach. First, the disk images needed to be much smaller due to network traffic constraints, and for many of the “non-mainstream” target architectures (like MicroBlaze or Xtensa) you also cannot easily get a standard Linux distribution that installs without problems on the machines that QEMU provides.

Instead of using a pre-built Linux distribution, it would also be possible to cross-compile the kernel and user space programs and build a small disk image with that on your own. However, figuring out how to do that for multiple target architectures would have been very cumbersome and time consuming.

So after doing some research, I finally discovered buildroot, which is an excellent framework for doing exactly what I wanted: It allows creating small disk images for non-x86 target CPUs, with all the magic of cross-compiling and image creation wrapped into its internal scripts, and with a very flexible Kconfig-style configuration system on top.

For those who are interested, here’s now a short description how to use buildroot for creating a small guest disk image:

  1. Download the version that you like to use from the buildroot download page and unpack it:
    $ wget
    $ tar -xaf buildroot-2018.02.9.tar.bz2 
    $ cd buildroot-2018.02.9/
  2. Now you have to choose for which CPU and machine target you want to build. Have a look at the pre-defined config files and then select one. In the following example, I’m going to use the “pseries” POWER machine:
    $ cd configs/
    $ ls qemu*
    qemu_aarch64_virt_defconfig         qemu_nios2_10m50_defconfig
    qemu_arm_versatile_defconfig        qemu_or1k_defconfig
    qemu_arm_versatile_nommu_defconfig  qemu_ppc64le_pseries_defconfig
    qemu_arm_vexpress_defconfig         qemu_ppc64_pseries_defconfig
    qemu_m68k_mcf5208_defconfig         qemu_ppc_g3beige_defconfig
    qemu_m68k_q800_defconfig            qemu_ppc_mpc8544ds_defconfig
    qemu_microblazebe_mmu_defconfig     qemu_ppc_virtex_ml507_defconfig
    qemu_microblazeel_mmu_defconfig     qemu_sh4eb_r2d_defconfig
    qemu_mips32r2el_malta_defconfig     qemu_sh4_r2d_defconfig
    qemu_mips32r2_malta_defconfig       qemu_sparc64_sun4u_defconfig
    qemu_mips32r6el_malta_defconfig     qemu_sparc_ss10_defconfig
    qemu_mips32r6_malta_defconfig       qemu_x86_64_defconfig
    qemu_mips64el_malta_defconfig       qemu_x86_defconfig
    qemu_mips64_malta_defconfig         qemu_xtensa_lx60_defconfig
    qemu_mips64r6el_malta_defconfig     qemu_xtensa_lx60_nommu_defconfig
    $ cd ..
    $ make qemu_ppc64_pseries_defconfig
  3. Now run make menuconfig to fine-tune your build. I recommend having a look at the following settings first:
    • In the Toolchain section, you might need to enable other languages like C++ in case it is required for the application that you want to ship in the image.
    • In the System Configuration section, change the System Banner to something that better suits your disk image.
    • Check the Kernel section to see whether the right kernel settings are used here. The defaults should be fine most of the time, but in case you want to use a newer kernel version for example, or a different kernel config file, you can adjust it here. Note that you also should adjust the kernel header version in the Toolchain section if you change the kernel version here.
    • Have a look at the Target packages section – maybe the application that you want to include is already available by the base buildroot system. In that case you can already enable it here.
    • Check the Filesystem images section and decide which kind of image you want to ship later. For example, for most of the QEMU advent calendar images, I used a simple initrd only, so I unchecked the ext2/3/4 root filesystem here and used initial RAM filesystem linked into linux kernel instead.
  4. Now save your configuration, exit the config menu, and type make for a first test to see whether it produces a usable image. Note: Don’t use the -j parameter of make here, buildroot will figure that out on its own instead.

  5. Once the build has finished successfully, have a look at the output/images/ directory. You can start your guest with the results from there to give it a try. For example if you built with the ppc64 pseries configuration, with the initrd linked into the kernel:
    $ qemu-system-ppc64 -M pseries -m 1G -kernel output/images/vmlinux

    You should see the kernel booting up, and if you have a look at the serial console, there is also a getty running where you can log in as root and look around.

  6. To customize your build, you sooner or later want to add additional files to the image, for example some additional init scripts in the /etc/init.d/ folder. Or in the above case, it would be good to also have getty running on the graphical console. So to add custom files, the best way is to create an overlay folder which will be copied into the destination filesystem during the make process:
    $ mkdir -p overlay/etc/init.d
    $ cp /path/to/S99myscript overlay/etc/init.d/  # If you have one
    $ cp output/target/etc/inittab overlay/etc/inittab
    $ echo 'tty1::respawn:/sbin/getty -L tty1 0 linux' >> overlay/etc/inittab

    Then run make menuconfig and set the Root filesystem overlay directories option in the System Configuration section to the overlay folder that you have just created. Run make again and the next time you start your guest, you should see the new files in the image, e.g. also a getty running on the graphical console. Note: Do not try to add/change files directly in the output/target/ folder. That looks tempting first, but this is just a temporary folder used by the build system, which can be overwritten at any time and will be erased when you run make clean for example.

  7. If you need to tweak the kernel configuration, you can run make linux-menuconfig and do the appropriate changes there. For example, if you want to get keyboard input for the ppc64 pseries machine on the graphical console, you should enable the USB XHCI driver in the kernel, too. Once you are happy with the kernel configuration, save it, exit the menu and type make linux-rebuild && make. Note: To avoid that the kernel config gets reset after you run make clean at a later point in time, you should copy output/build/linux-*/.config to a safe location. Then run make menuconfig, change the Kernel -> Kernel configuration setting to Use a custom config file and set the Configuration file path to the copied file.

  8. If you want to add additional software to your image, you basically have to provide a Config.in file and a *.mk file. I recommend having a look at the various packages in the package/ directory: use a piece of software from there with a similar build system as a template, and have a closer look at the buildroot manual for details. Tweaking the build system of your software to cross-compile properly can sometimes be a little tricky, but most software that uses a standard build system like autoconf should be fine.
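    As a rough sketch, a minimal package for a hypothetical autotools-based program called myapp would consist of two files under package/myapp/ (the package name, URL and license here are made-up placeholders; you also need to source the new Config.in from package/Config.in):

```makefile
# package/myapp/Config.in (Kconfig syntax), shown here as comments:
#
#   config BR2_PACKAGE_MYAPP
#           bool "myapp"
#           help
#             My example application.

# package/myapp/myapp.mk:
MYAPP_VERSION = 1.0
MYAPP_SITE = https://example.com/downloads
MYAPP_SOURCE = myapp-$(MYAPP_VERSION).tar.gz
MYAPP_LICENSE = GPL-2.0

$(eval $(autotools-package))
```

    The autotools-package infrastructure takes care of running configure with the right cross-compilation settings; for software with a plain Makefile, generic-package is the corresponding fallback.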

That’s it! You should now be able to package your software in really small VM images. Of course, there are still lots of other settings that you can tweak in the buildroot environment – if you need any of them, have a look at the excellent buildroot manual for more information.

January 28, 2019 02:20 PM

January 25, 2019

Stefan Hajnoczi

VIRTIO 1.1 is available for public review until Feb 21st 2019

The VIRTIO 1.1 specification for paravirtualized I/O devices includes the new packed vring layout and the GPU, input, crypto, and socket device types. In addition to this there are other improvements and new features in the specification. The new vring layout will increase performance and offers new features that devices can take advantage of.

You can review the specification and post comments until February 21st 2019: VIRTIO 1.1 csprd01.

by Unknown at January 25, 2019 02:42 PM

January 23, 2019

QEMU project

GSoC and Outreachy 2018 retrospective

QEMU participates in open source internship programs including Google Summer of Code (GSoC) and Outreachy. These full-time remote work opportunities allow talented new developers to get involved in our community. This post highlights what our interns achieved in 2018.

micro:bit board emulation

Julia Suvorova (Outreachy) and Steffen Görtz (GSoC) tackled adding emulation support for the micro:bit ARM board. Although QEMU already has plenty of ARM emulation code, the Cortex-M0 CPU used in the micro:bit was not yet implemented and the nRF51 system-on-chip was also missing.

The goal of this project was to run micro:bit programs (usually created with the MicroPython or JavaScript/Blocks IDEs) with a core set of emulated devices, including the serial port, pushbuttons, and LEDs.

QEMU 3.1 already shipped the groundwork for the new qemu-system-arm -M microbit machine type. Enough functionality to run basic micro:bit programs is expected in the next QEMU release.
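As a rough illustration, a micro:bit firmware image could be tried out along these lines once the machine type has the needed devices; program.hex is a placeholder for a firmware image built for the board, and the exact options may differ in your QEMU version:

```shell
# Hypothetical invocation of the new machine type; program.hex is a
# placeholder for micro:bit firmware in Intel HEX format.
qemu-system-arm -M microbit -device loader,file=program.hex -serial stdio
```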

This project was mentored by Jim Mussared, Joel Stanley, and Stefan Hajnoczi.

Patchew REST API improvements

Shubham Jain (GSoC) created a REST API for the Patchew continuous integration system that is at the heart of QEMU’s development process. The previous API was not RESTful and exposed database schema internals.

The improvements to the REST API have been included in Patchew and are deployed on They are not in use yet, pending more work on authentication; this may be the topic of a future Summer of Code internship.

This project was mentored by Paolo Bonzini and Fam Zheng.

Qtest Driver Framework

Emanuele Esposito (GSoC) enhanced QEMU’s test infrastructure with an engine that starts tests with all variants of devices that they are capable of driving.

This is a complicated task in QEMU since certain devices and buses are available in an architecture-specific way on each emulation target, making it hard to write test cases without lots of hardcoded dependencies - and to keep them up-to-date!

The qgraph framework that Emanuele created eliminates the need to hardcode each variant into the test. Emanuele also converted several existing tests. His framework was also presented at KVM Forum 2018 by Laurent Vivier and should be merged in 4.0.

This project was mentored by Paolo Bonzini and Laurent Vivier.

Vulkan-izing VirGL

Nathan Gauër (GSoC) improved VirGL, which provides an OpenGL path that is well supported on Linux guests. On the host, QEMU offers several console back-ends, from EGL to SDL. Adding a Vulkan path will require changing the current VirGL API, writing new guest drivers, and also offering a way to display the output. This is a huge task, which can be split into several sub-projects. Extending the current VirGL API to support Vulkan is the first step.

Code is available here.

This project was mentored by Marc-André Lureau.

Stay tuned for 2019 internships!

QEMU will apply to Google Summer of Code and Outreachy again in 2019. We hope to offer more great open source internship opportunities for new developers.

January 23, 2019 06:50 AM

January 07, 2019

KVM on Z

Knowledge Series: CPU Models and Migration Considerations

Another new entry in our Knowledge Series explains CPU models and their relationship to live guest migration.

by Stefan Raspl at January 07, 2019 09:23 PM

Last updated: August 19, 2019 12:02 PM