Blogging about open source virtualization

News from QEMU, KVM, libvirt, libguestfs, virt-manager and related tools


Planet Feeds

November 09, 2019

Stefano Garzarella

KVM Forum 2019: virtio-vsock in QEMU, Firecracker and Linux

Slides and recording are available for the “virtio-vsock in QEMU, Firecracker and Linux: Status, Performance and Challenges” talk that Andra Paraschiv and I presented at KVM Forum 2019. This was the 13th edition of the KVM Forum conference. It took place in Lyon, France in October 2019.

We talked about the current status and future work of the VSOCK drivers in Linux and how Firecracker and QEMU provide the virtio-vsock device.


Initially, Andra gave an overview of VSOCK, describing the state of the art and its key features:

  • it is very simple to configure: the host assigns a unique CID (Context-ID) to each guest, and no configuration is needed inside the guest;

  • it provides the AF_VSOCK address family, allowing user-space applications in the host and guest to communicate using the standard POSIX socket API (e.g. bind, listen, accept, connect, send, recv, etc.)
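As an illustrative sketch (not from the talk), this is roughly what the two ends look like with Python ≥ 3.7 on Linux, which exposes AF_VSOCK directly; the port number and helper names are made up for the example:

```python
import socket

EXAMPLE_PORT = 1234  # arbitrary port chosen for illustration


def guest_listener(port=EXAMPLE_PORT):
    """Guest side: bind and listen on a VSOCK port (no CID setup needed)."""
    s = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
    s.bind((socket.VMADDR_CID_ANY, port))
    s.listen(1)
    return s


def host_client(guest_cid, port=EXAMPLE_PORT):
    """Host side: connect to the guest using the CID the host assigned to it."""
    s = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
    s.connect((guest_cid, port))
    return s
```

The same bind/listen/accept/connect/send/recv calls work on either side; only the addresses (a CID and port instead of an IP and port) differ from AF_INET.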

Andra also described common use cases for VSOCK, such as guest agents (clipboard sharing, remote console, etc.), network applications using SOCK_STREAM, and services provided by the hypervisor to the guests.

Going into the implementation details, Andra explained how the device in the guest communicates with the vhost backend in the host, exchanging data and events (i.e. ioeventfd, irqfd).


Focusing on Firecracker, Andra gave a brief overview of this new VMM (Virtual Machine Monitor) written in Rust, and she explained why, in the v0.18.0 release, they switched from the experimental vhost-vsock implementation to a vhost-less solution:

  • focus on security impact
  • fewer dependencies on host kernel features

This change required device emulation in Firecracker, which implements the virtio-vsock device model over MMIO. The device is exposed to the host using UDS (Unix Domain Sockets).

Andra described how Firecracker maps the VSOCK ports on the uds_path specified in the VM configuration:

  • Host-Initiated Connections

    • Guest: create an AF_VSOCK socket and listen() on PORT
    • Host: connect() to AF_UNIX at uds_path
    • Host: send() “CONNECT PORT\n”
    • Guest: accept() the new connection
  • Guest-Initiated Connections

    • Host: create and listen() on an AF_UNIX socket at uds_path_PORT
    • Guest: create an AF_VSOCK socket and connect() to HOST_CID and PORT
    • Host: accept() the new connection
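A sketch of the host side of that handshake in Python (the uds_path and port are placeholders; per Firecracker's vsock documentation the VMM answers with an "OK <host_port>\n" line before the raw byte stream starts):

```python
import socket


def host_initiated_connect(uds_path, guest_port):
    """Connect to Firecracker's AF_UNIX socket and request forwarding
    to the guest's VSOCK port, following the CONNECT handshake above."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.connect(uds_path)
    s.sendall(b"CONNECT %d\n" % guest_port)
    reply = s.recv(64)           # e.g. b"OK 1024\n" on success
    if not reply.startswith(b"OK"):
        s.close()
        raise ConnectionError("guest not listening on port %d" % guest_port)
    return s                     # from here on: plain byte stream to the guest
```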

Finally, she showed the performance of this solution, running the iperf-vsock benchmark while varying the size of the buffer used in Firecracker to transfer packets between the virtio-vsock device and the UNIX domain socket. The throughput on the guest-to-host path reaches 10 Gbps.


In the second part of the talk, I described the QEMU implementation. QEMU provides the virtio-vsock device using the vhost-vsock kernel module.

The vsock device in QEMU handles only:

  • configuration
    • user or management tool can configure the guest CID
  • live-migration
    • connected SOCK_STREAM sockets become disconnected. Applications must handle a connection reset error and should reconnect.
    • the guest CID may not be available on the new host, because it can already be assigned to another VM. In this case the guest is notified about the CID change.
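A minimal sketch of the reconnect logic such an application could use (generic Python, not tied to any particular agent; the helper name is made up):

```python
import time


def run_with_reconnect(connect, handle, retries=5, delay=0.5):
    """Keep a session alive across live migration: when the stream is
    reset (as happens to connected SOCK_STREAM vsock sockets on
    migration), open a fresh connection and run the handler again."""
    for _ in range(retries):
        sock = connect()          # returns a fresh connected socket
        try:
            return handle(sock)   # raises ConnectionResetError on reset
        except (ConnectionResetError, BrokenPipeError):
            sock.close()
            time.sleep(delay)     # back off before re-establishing
    raise ConnectionError("gave up after %d attempts" % retries)
```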

The vhost-vsock kernel module handles the communication with the guest, providing in-kernel virtio device emulation, to achieve very high performance and to interface directly with the host socket layer. In this way, host applications can also use the POSIX socket API directly to communicate with the guest, so guest and host applications can be swapped, changing only the destination CID.

virtio-vsock Linux drivers

After that, I told the story of VSOCK in the Linux tree, which started in 2013 when the first implementation was merged, and covered the changes of the last year.

These changes are mainly fixes, but for the virtio/vhost transports we also improved performance with two simple changes released in Linux v5.4:

  • reducing the number of credit update messages exchanged
  • increasing the size of packets queued in the virtio-vsock device from 4 KB up to 64 KB, the maximum packet size handled by virtio-vsock devices.

With these changes we are able to reach ~40 Gbps in the Guest -> Host path, because the guest can now send packets of up to 64 KB directly to the host; in the Host -> Guest path we reached ~25 Gbps, because the host is still limited to the 4 KB buffers preallocated by the guest.

Tools and languages that support VSOCK

In the last few years, several applications, tools, and languages started to support VSOCK and I listed them to update the audience:

  • Tools:

    • wireshark >= 2.4.0 [2017-07-19]
    • iproute2 >= 4.15 [2018-01-28]
      • ss
    • tcpdump
      • merged in master [2019-04-16]
    • nmap >= 7.80 [2019-08-10]
      • ncat
      • nbd
    • nbdkit >= 1.15.5 [2019-10-19]
    • libnbd >= 1.1.6 [2019-10-19]
    • iperf-vsock
      • iperf3 fork
  • Languages:

    • C
      • glibc >= 2.18 [2013-08-10]
    • Python
      • python >= 3.7 alpha 1 [2017-09-19]
    • Golang
    • Rust
      • libc crate >= 0.2.59 [2019-07-08]
        • struct sockaddr_vm
        • VMADDR_* macros
      • nix crate >= 0.15.0 [2019-08-10]
        • VSOCK supported in the socket API (nix::sys::socket)

Next steps

Concluding, I went through the next challenges that we are going to face:

  • multi-transport, to use VSOCK in a nested VM environment. The current implementation can handle only one transport loaded at run time, so we can’t load virtio_transport and vhost_transport together in the L1 guest. I already sent some patches upstream [RFC, v1], but they are still in progress.

  • network namespace support to create independent addressing domains with VSOCK socket. This could be useful for partitioning VMs in different domains or, in a nested VM environment, to isolate host applications from guest applications bound to the same port.

  • virtio-net as a transport for virtio-vsock, to avoid re-implementing features already available in virtio-net, such as mergeable buffers, page allocation, and small packet handling.

From the audience

Other points to be addressed came from the comments we received from the audience:

  • loopback device could be very useful for developers to test applications that use VSOCK sockets. The current implementation supports loopback only in the guest, but it would be better to support it in the host as well, adding a VMADDR_CID_LOCAL special address.

  • VM to VM communication was requested by several people. Introducing it in the VSOCK core could complicate the protocol and the addressing, and could require some sort of firewall. For now we do not plan to do it, but I developed a simple user space application to solve this issue: vsock-bridge. In order to improve the performance of this solution, we will consider adding sendfile(2) or MSG_ZEROCOPY support to the AF_VSOCK core.

  • virtio-vsock Windows drivers are not planned, but contributions are welcome. Other virtio Windows drivers are available in the vm-guest-drivers-windows repository.

Stay tuned!

by Stefano Garzarella at November 09, 2019 05:45 PM

November 06, 2019

Cornelia Huck

s390x changes in QEMU 4.2

You know the drill: QEMU is entering freeze (this time for 4.2), and there's a post on the s390x changes for the upcoming release.


  • Emulation for IEP (Instruction Execution Protection), a z14 feature, has been added.
  • A bunch of fixes in the vector instruction emulation and in the fault-handling code.


  • For quite some time now, the code has been implicitly relying on the presence of the 'flic' (floating interrupt controller) KVM device (which had been added in Linux 3.15). Nobody really complained, so we won't try to fix this up and instead make the dependency explicit.
  • The KVM memslot handling was reworked to be actually sane. Unfortunately, this breaks migration of huge KVM guests with more than 8TB of memory from older QEMUs. Migration of guests with less than 8TB continues to work, and there's no planned breakage of migration of >8TB guests starting with 4.2.

CPU models

  • We now know that the gen15a is called 'z15', so reflect this in the cpu model description.
  • The 'qemu' and the 'max' models gained some more features.
  • Under KVM, 'query-machines' will now return the correct default cpu model ('host-s390x-cpu').


  • The usual array of bugfixes, including in SCLP handling and in the s390-ccw bios.

by Cornelia Huck at November 06, 2019 03:12 PM

October 20, 2019

KVM on Z

Ubuntu 19.10 released

Ubuntu Server 19.10 is out!
For a detailed list of KVM on Z changes, see the release notes here.

by Stefan Raspl at October 20, 2019 11:36 AM

October 16, 2019

Fabiano Fidêncio

Libosinfo (Part I)

This is the first blog post of a series which will cover Libosinfo, what it is, who uses it, how it is used, how to manage it, and, finally, how to contribute to it.

A quick overview

Libosinfo is the operating system information database. As a project, it consists of three different parts, with the goal to provide a single place containing all the required information about an operating system in order to provision and manage it in a virtualized environment.

The project allows management applications to:

  • Automatically identify which operating system an ISO image or an installation tree is intended for;

  • Find the download location of installable ISO and LiveCD images;

  • Find the location of installation trees;

  • Query the minimum, recommended, and maximum CPU / memory / disk resources for an operating system;

  • Query the hardware supported by an operating system;

  • Generate scripts suitable for automating “Server” and “Workstation” installations;

The library (libosinfo)

The library API is written in C, taking advantage of GLib and GObject. Thanks to GObject Introspection, the API is automatically available in all dynamic programming languages with bindings for GObject (JavaScript, Perl, Python, and Ruby). Auto-generated bindings for Vala are also provided.

As part of libosinfo, three tools are provided:

  • osinfo-detect: Used to detect an Operating System from a given ISO or installation tree.

  • osinfo-install-script: Used to generate a “Server” or “Workstation” install-script to perform automated installation of an Operating System;

  • osinfo-query: Used to query information from the database;

The database (osinfo-db)

The database is written in XML and it can either be consumed via libosinfo APIs or directly via management applications’ own code.

It contains information about the operating systems, devices, installation scripts, platforms, and datamaps (keyboard and language mappings for Windows and Linux OSes).
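As a rough illustration, here is a condensed, hypothetical osinfo-db-style entry (the real schema has more elements and namespaces) and how an application could read it directly with Python's standard XML support, without going through the libosinfo API:

```python
import xml.etree.ElementTree as ET

# A condensed, illustrative osinfo-db-like entry; the real schema is
# richer than this.  RAM sizes are in bytes.
SAMPLE = """\
<libosinfo>
  <os id="http://fedoraproject.org/fedora/30">
    <short-id>fedora30</short-id>
    <name>Fedora 30</name>
    <resources arch="all">
      <minimum><ram>1073741824</ram><n-cpus>1</n-cpus></minimum>
      <recommended><ram>2147483648</ram></recommended>
    </resources>
  </os>
</libosinfo>
"""


def minimum_ram(xml_text, short_id):
    """Return the minimum RAM (bytes) declared for an OS, or None."""
    root = ET.fromstring(xml_text)
    for os_elem in root.findall("os"):
        if os_elem.findtext("short-id") == short_id:
            return int(os_elem.findtext("resources/minimum/ram"))
    return None


print(minimum_ram(SAMPLE, "fedora30"))  # 1073741824
```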

The database tools (osinfo-db-tools)

These are tools that can be used to manage the database, which is distributed as a tarball archive.

  • osinfo-db-import: Used to import an osinfo database archive;

  • osinfo-db-export: Used to export an osinfo database archive;

  • osinfo-db-validate: Used to validate the XML files in one of the osinfo database locations for compliance with the RNG schema.

  • osinfo-db-path: Used to report the paths associated with the standard database locations;

The consumers …

Libosinfo and osinfo-db have management applications as their target audience. Currently the libosinfo project is consumed by big players in the virtual machine management environment such as OpenStack Nova, virt-manager, GNOME Boxes, Cockpit Machines, and KubeVirt.

… a little bit about them …

  • OpenStack Nova: An OpenStack project that provides a way to provision virtual machines, baremetal servers, and (with limited support) system containers.

  • virt-manager: An application for managing virtual machines through libvirt.

  • GNOME Boxes: A simple application to view, access, and manage remote and virtual systems.

  • Cockpit Machines: A Cockpit extension to manage virtual machines running on the host.

  • KubeVirt: Virtual Machine Management on Kubernetes.

… and why they use it

  • Download ISOs: As libosinfo provides the ISO URLs, management applications can offer the user the option to download a specific operating system;

  • Automatically detect the ISO being used: As libosinfo can detect the operating system of an ISO, management applications can use this info to set reasonable default values for resources, to select the hardware supported, and to perform unattended installations.

  • Start tree installation: As libosinfo provides the tree installation URLs, management applications can use it to start a network-based installation without having to download the whole operating system ISO;

  • Set reasonable default values for RAM, CPU, and disk resources: As libosinfo knows the values that are recommended by the operating system’s vendors, management applications can rely on that when setting the default resources for an installation.

  • Automatically set the hardware supported: As libosinfo provides the list of hardware supported by an operating system, management applications can choose the best defaults based on this information, without taking the risk of ending up with a non-bootable guest.

  • Unattended install: as libosinfo provides unattended installations scripts for CentOS, Debian, Fedora, Fedora Silverblue, Microsoft Windows, OpenSUSE, Red Hat Enterprise Linux, and Ubuntu, management applications can perform unattended installations for both “Workstation” and “Server” profiles.

What’s next?

The next blog post will provide a “demo” of an unattended installation using both GNOME Boxes and virt-install and, based on that, explain how libosinfo is internally used by these projects.

By doing that, we’ll both cover how libosinfo can be used and also demonstrate how it can ease the usage of those management applications.

October 16, 2019 12:00 AM

October 07, 2019

KVM on Z

KVM on IBM z15 Features

To take advantage of the new features of z15, the latest addition to the IBM Z family as previously announced here, use any of the following CPU models in your guest's domain XML:
  • Pre-defined model for z15
      <cpu mode='custom'>
        <model>gen15a</model>
      </cpu>
  • Use z15 features in a migration-safe way (recommended). E.g. when running on z15 this will be a superset of the gen15a model, and feature existence will be verified on the target system prior to a live migration:
      <cpu mode='host-model'/>
  • Use z15 features in a non-migration-safe way. I.e. feature existence will not be verified on the target system prior to a live migration:
      <cpu mode='host-passthrough'/>
Here is a list of features of the new hardware generation as supported in Linux kernel 5.2 and QEMU 4.1, all activated by default in the CPU models listed above:
  • Miscellaneous Instructions
    Following the example of previous machines, new helper and general purpose instructions were added.
      minste3     Miscellaneous-Instruction-Extensions Facility 3 
  • SIMD Extensions
    Following up on the SIMD instructions introduced with the previous z13 and z14 models, this feature provides further vector instructions, which can again be used in KVM guests.
    These new vector instructions can be used to improve decimal calculations as well as for implementing high performance variants of certain cryptographic operations.
    In the z15 CPU models, the respective features are:
      vxpdeh      Vector-Packed-Decimal-Enhancement Facility
      vxeh2       Vector enhancements facility 2
  • Deflate Conversion
    Provides acceleration for zlib compression and decompression.
    In the z15 CPU model, the respective feature is:
      dflt        Deflate conversion facility
  • MSA Updates
    z15 introduces a new Message Security Assist MSA9, providing elliptic curve cryptography. It supports message authentication, the generation of elliptic curve keys, and scalar multiplication.
    This feature can be exploited in KVM guests' kernels and userspace applications independently (i.e. a KVM guest's userspace applications can take advantage of these features irrespective of the guest's kernel version).
    In the z15 CPU model, the respective features are:
      msa9        Message-security-assist-extension 9 facility
      msa9_pckmo  Message-security-assist-extension 9 PCKMO
                  subfunctions for protected ECC keys
The z15 CPU model will be backported to several distributions. It is already available in Ubuntu 18.04 and later, and work on inclusion in the major Linux distributions supporting KVM on Z is underway.

by Stefan Raspl at October 07, 2019 04:21 PM

September 26, 2019

Gerd Hoffmann

VGA and other display devices in qemu

There are a lot of emulated display devices available in qemu. This blog post introduces them, explains the differences between them and the use cases they are good for.

The TL;DR version is in the recommendations section at the end of the article.

standard VGA

  • qemu: -vga std or -device VGA
  • libvirt: <model type='vga'/>
  • ✓ VGA compatible
  • ✓ vgabios support
  • ✓ UEFI support (QemuVideoDxe)
  • ✓ linux driver (bochs-drm.ko)

This is the default display device (on x86). It provides full VGA compatibility and support for a simple linear framebuffer (using the bochs dispi interface). It is the best choice compatibility wise, pretty much any guest should be able to bring up a working display on this device. Performance or usability can be better with other devices, see discussion below.

The device has 16 MB of video memory by default. This can be changed using the vgamem_mb property, -device VGA,vgamem_mb=32 for example will double the amount of video memory. The size must be a power of two, the valid range is 1 MB to 256 MB.

The linux driver supports page-flipping, so having room for 3-4 framebuffers is a good idea. The driver can leave the framebuffers in vram then instead of swapping them in and out. FullHD (1920x1080) for example needs a bit more than 8 MB for a single framebuffer, so 32 or 64 MB would be a good choice for that.
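The sizing rule of thumb above can be written down as a quick calculation (assuming 4 bytes per pixel; the helper name is made up for illustration):

```python
def vgamem_for(width, height, frames=4, bytes_per_pixel=4):
    """Pick a power-of-two vgamem_mb value (in MB) large enough to keep
    `frames` framebuffers resident in vram."""
    needed = width * height * bytes_per_pixel * frames
    mb = 1
    while mb * 1024 * 1024 < needed:
        mb *= 2
    return mb


# FullHD: one framebuffer is ~8 MB, so four of them fit in 32 MB.
print(vgamem_for(1920, 1080))  # 32
```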

The UEFI setup allows choosing the display resolution which OVMF will use to initialize the display at boot. Press ESC at the tianocore splash screen to enter setup, then go to "Device Manager" → "OVMF Platform Configuration".

bochs display device

  • qemu: -device bochs-display
  • libvirt: <model type='bochs'/>
  • ✗ not VGA compatible
  • ✓ vgabios support
  • ✓ UEFI support (QemuVideoDxe)
  • ✓ linux driver (bochs-drm.ko)

This device supports a simple linear framebuffer. It also uses the bochs dispi interface for modesetting, therefore the linear framebuffer configuration is fully compatible with the standard VGA device.

The bochs display is not VGA compatible though. There is no support for text mode, planar video modes, memory windows at 0xa0000 and other legacy VGA features in the virtual hardware.

The main advantage over standard VGA is that this device is a lot simpler. The code size and complexity needed to emulate this device is an order of magnitude smaller, resulting in a reduced attack surface. Another nice feature is that you can place this device in a PCI Express slot.

For UEFI guests it is safe to use the bochs display device instead of the standard VGA device. The firmware will setup a linear framebuffer as GOP anyway and never use any legacy VGA features.

For BIOS guests this device might be usable as well, depending on whether they rely on direct VGA hardware access or not. There is a vgabios which supports text rendering on a linear framebuffer, so software which uses the vgabios services for text output will continue to work. Linux bootloaders typically fall into this category. The linux text mode console (vgacon) uses direct hardware access and does not work. The framebuffer console (fbcon running on vesafb or bochs-drm) works.

virtio vga

  • qemu: -vga virtio or -device virtio-vga
  • libvirt: <model type='virtio'/> (on x86).
  • ✓ VGA compatible
  • ✓ vgabios support
  • ✓ UEFI support (QemuVideoDxe)
  • ✓ linux driver (virtio-gpu.ko)

This is a modern, virtio-based display device designed for virtual machines. It comes with VGA compatibility mode. You need a guest driver to make full use of this device. If your guest OS has no driver it should still show a working display thanks to the VGA compatibility mode, but the device will not provide any advantages over standard VGA then.

This device has (optional) hardware-assisted opengl acceleration support. This can be enabled using the virgl=on property, which in turn needs opengl support enabled (gl=on) in the qemu display.

This device has multihead support, which can be enabled using the max_outputs=2 property.

This device has no dedicated video memory (except for VGA compatibility), gpu data will be stored in main memory instead. Therefore this device has no config options for video memory size.

This is the place where most development happens, support for new, cool features will most likely be added to this device.

virtio gpu

  • qemu: -device virtio-gpu-pci
  • libvirt: <model type='virtio'/> (on arm).
  • ✗ not VGA compatible
  • ✗ no vgabios support
  • ✓ UEFI support (VirtioGpuDxe)
  • ✓ linux driver (virtio-gpu.ko)

This device lacks VGA compatibility mode but is otherwise identical to the virtio vga device. UEFI firmware can handle this, and if your guest has drivers too you can use this instead of virtio-vga. This will reduce the attack surface (no complex VGA emulation support) and reduce the memory footprint by 8 MB (no pci memory bar for VGA compatibility). This device can be placed in a PCI Express slot.

vhost-user virtio gpu

There is a vhost-user variant of both virtio vga and virtio gpu. This makes it possible to run the virtio-gpu emulation in a separate process. This is good from a security perspective, especially if you want to use virgl 3D acceleration, and it also helps with opengl performance.

Run the gpu emulation process (see contrib/vhost-user-gpu/ in the qemu source tree):

./vhost-user-gpu --virgl -s vgpu.sock

Run qemu:

qemu \
  -chardev socket,id=vgpu,path=vgpu.sock \
  -device vhost-user-vga,chardev=vgpu \
  [ ... ]

libvirt support is in the works.

qxl vga

  • qemu: -vga qxl or -device qxl-vga.
  • libvirt: <model type='qxl' primary='yes'/>.
  • ✓ VGA compatible
  • ✓ vgabios support
  • ✓ UEFI support (QemuVideoDxe)
  • ✓ linux driver (qxl.ko)
  • ✓ windows driver

This is a slightly dated display device designed for virtual machines. It comes with VGA compatibility mode. You need a guest driver to make full use of this device. If your guest OS has no driver it should still show a working display thanks to the VGA compatibility mode, but the device will not provide any advantages over standard VGA then.

This device has support for 2D acceleration. This becomes more and more useless though as modern display devices don't have dedicated 2D acceleration support any more and use the 3D engine for everything. The same happens on the software side, modern desktops are rendering with opengl or vulkan instead of using 2D acceleration.

Spice and qxl support offloading 2D acceleration to the spice client (typically virt-viewer these days). That is quite complex, and with 2D acceleration being on the way out this becomes increasingly useless too. You might want to pick some simpler device for security reasons.

This device has multihead support, which can be enabled using the max_outputs=2 property. The linux driver will use this; the windows driver expects multiple devices instead (see below).

The amount of video memory for this device is configurable using the ram_size_mb and vram_size_mb properties for the two pci memory bars. The default is 64 MB for both, which should be plenty for typical use cases. When using 4K display resolution or multihead support you should assign more video memory though. When using small resolutions like 1024x768 you can assign less video memory to reduce the memory footprint.


qxl

  • qemu: -device qxl.
  • libvirt: <model type='qxl' primary='no'/>.

This device lacks VGA compatibility mode but is otherwise identical to the qxl vga device. Providing multihead support for windows guests is pretty much the only use case for this device. The windows guest driver expects one qxl device per secondary display (in addition to one qxl-vga device for the primary display).

cirrus vga

  • qemu: -vga cirrus or -device cirrus-vga.
  • libvirt: <model type='cirrus'/>.
  • ✓ VGA compatible
  • ✓ vgabios support
  • ✓ UEFI support (QemuVideoDxe)
  • ✓ linux driver (cirrus.ko)

Emulates a Cirrus SVGA device which used to be modern in the 1990s, more than 20 years ago. For the most part my blog article from 2014 is still correct; the device is mostly useful for guests which are equally old and ship with a driver for cirrus vga devices.

Two things have changed meanwhile though: Since qemu version 2.2 cirrus is not the default vga device any more. Also the cirrus driver in the linux kernel has been completely rewritten. In kernel 5.2 & newer the cirrus driver uses a shadow framebuffer and converts formats on the fly to hide some of the cirrus oddities from userspace (Xorg/wayland), so things are working a bit better now. That doesn't cure everything though, especially the available display resolutions are still constrained by the small amount of video memory.

ati vga

  • qemu: -device ati-vga.
  • ✓ VGA compatible
  • ✓ vgabios support
  • ✗ no UEFI support

Emulates two ATI SVGA devices, the model property can be used to pick the variant. model=rage128p selects the "Rage 128 Pro" and model=rv100 selects the "Radeon RV100".

The devices are newer (late 1990s / early 2000s) and more modern than the cirrus VGA. Nevertheless the use case is very similar: guests of a similar age which ship with drivers for those devices.

This device was added to qemu only recently and development is still in progress. The fundamentals are working (modesetting, hardware cursor). The most important 2D accel ops are implemented too. 3D acceleration is not implemented yet.

Linux has both drm and fbdev drivers for these devices. The drm drivers are not working yet because the emulation is still incomplete (which hopefully changes in the future). The fbdev drivers are working. Modern linux distros prefer the drm drivers though, so you probably have to build your own kernel if you want to use this device.


ramfb

  • qemu: -device ramfb.
  • ✗ not VGA compatible
  • ✓ vgabios support
  • ✓ UEFI support (QemuRamfbDxe)

Very simple display device. It uses a framebuffer stored in guest memory. The firmware initializes it and allows using it as a boot display (grub boot menu, efifb, ...) without needing complex legacy VGA emulation. Details can be found here.

no display device

  • qemu: -vga none -nographic.

You don't have to use a display device. If you don't need one you can run your guests with a serial console instead.

embedded devices

There are a bunch of other display devices. Those are typically SoC-specific and used by embedded board emulation. Just mentioning them here for completeness. You can't select the display device for embedded boards, the qemu emulation simply matches physical hardware here.


recommendations

For the desktop use case (assuming display performance matters and/or you need multihead support), in order of preference:

For the server use case (assuming the GUI is rarely used, or not at all), in order of preference:

On arm systems display devices with a pci memory bar do not work, which reduces the choices a lot. We are left with:

by Gerd Hoffmann at September 26, 2019 10:00 PM

September 13, 2019

KVM on Z

IBM z15 announced

Today, IBM announced the new IBM Z models:
Furthermore, check the updated IBM Z tested platforms matrix here.
We will look at features in support of the new IBM Z model in a separate blog entry soon.

by Stefan Raspl at September 13, 2019 09:42 AM

September 02, 2019

KVM on Z

virt-manager 2.2 released

virt-manager v2.2 was released a while ago. One feature we contributed is the ability to choose a temporary boot device. Here is a quick write-up on how to use that feature.

virt-xml is a simple command line tool for editing domain definitions. It can be used interactively or for batch processing. Starting with virt-manager v2.2, virt-xml allows booting a guest with a temporarily changed domain definition. This lets us specify a boot configuration other than the one in the guest's current domain definition. This is especially useful as the IBM Z architecture allows for only a single boot device, and therefore the boot order settings do not work the way they do on other platforms: if the first boot device fails to boot, no attempt is made to boot from the next boot device. In addition, the architecture/BIOS has no support for interactively changing the boot device during the boot/IPL process.
Therefore, two new command line options were introduced:
  • --no-define makes any changes to the domain definition transient (i.e. the guest's persistent domain XML will not be modified)
  • --start allows the user to start the domain after the changes to the domain XML were applied
Here is a simple example illustrating the usage:
  1. First, select the device to be changed using a selector. In this example, the unique target name of the disk is used. See man virt-xml for a list of further possibilities.
  2. Temporarily modify the boot order, assign the first slot to device vdc, and start the guest right away:

      $ virt-xml --edit target='vdc' --disk='boot_order=1' --start \
                 --no-define sample_domain

    Note: If there was another device that already had boot_order=1, its boot order would be incremented.
As soon as the guest is stopped, the changes vanish.

by Stefan Raspl at September 02, 2019 03:22 PM

August 29, 2019

KVM on Z

Webinar: How to Virtualize with KVM in Live Demo, August 28

We will explain basic KVM concepts, including CPU and memory virtualization, storage, and network management, and give a brief overview of commonalities and differences with other virtualization environments. Furthermore, a live demo will show how to use the KVM management tools to create and install Linux guests, and how to operate and monitor them.

Speaker: Christian Bornträger, Chief Product Owner Linux and KVM on IBM Z.

Register here. You can check the system requirements here.
After registering, you will receive a confirmation email containing information about joining the webinar.

Replay & Archive
All sessions are recorded. For the archive as well as a replay and handout of this session and all previous webcasts, see here.

by Stefan Raspl at August 29, 2019 12:10 PM

    August 24, 2019

    Stefano Garzarella

    How to measure the boot time of a Linux VM with QEMU/KVM

    The stefano-garzarella/qemu-boot-time repository contains a Python perf-script and (Linux, QEMU, SeaBIOS) patches to measure the boot time of a Linux VM with QEMU/KVM.

    Using I/O writes, we can trace events to measure the time consumed during the boot phase by the different components:

    We extended the I/O port addresses and values defined in qboot/benchmark.h adding new trace points to trace the kernel boot time.

    In the repository you can find patches for Linux, QEMU, and SeaBIOS to add the I/O writes in the components involved during the boot, and a Python perf-script useful to process the data recorded through perf using perf-script’s built-in Python interpreter.

    Trace points

    The benchmark.h file contains the following trace points used in the patches:

    • QEMU
      • qemu_init_end: first kvm_entry (i.e. QEMU initialization has finished)
    • Firmware (SeaBIOS + optionrom or qboot)
      • fw_start: first entry of the firmware
      • fw_do_boot: after the firmware initialization (e.g. PCI setup, etc.)
      • linux_start_boot: before the jump to the Linux kernel
      • linux_start_pvhboot: before the jump to the Linux PVH kernel
    • Linux Kernel
      • linux_start_kernel: first entry of the Linux kernel
      • linux_start_user: before starting the init process

    Custom trace points

    If you want to add new trace points, you can simply add an I/O write to the LINUX_EXIT_PORT or FW_EXIT_PORT I/O port with a value (> 7) that identifies the trace point:

        outb(10, LINUX_EXIT_PORT);

    The perf script output will contain an Exit point 10 line that identifies your custom trace point:

     qemu_init_end: 143.770419
     fw_start: 143.964328 (+0.193909)
     fw_do_boot: 164.71107 (+20.746742)
     Exit point 10: 165.396804 (+0.685734)
     linux_start_kernel: 165.979486 (+0.582682)
     linux_start_user: 272.178335 (+106.198849)
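On the processing side, naming such values can be as simple as a dictionary lookup with a fallback; a sketch (PREDEFINED is a placeholder for the mapping defined by benchmark.h):

```python
# Map a trace-point value from the benchmark I/O ports to a printable
# name; values > 7 are custom trace points and get a generic label.
PREDEFINED = {}  # value -> name, to be filled from benchmark.h

def point_name(val):
    return PREDEFINED.get(val, "Exit point %d" % val)
```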

    How to use

    Clone qemu-boot-time repository

    cd ${REPOS}
    git clone https://github.com/stefano-garzarella/qemu-boot-time

    Apply patches to Linux, QEMU and SeaBIOS

    Trace points are printed only if they are recorded, so you can enable just a few of them by patching only the components you are interested in.


    Apply the patches/linux.patch to your Linux kernel in order to trace kernel events

    cd ${REPOS}/linux
    git checkout -b benchmark
    git am ${REPOS}/qemu-boot-time/patches/linux.patch


    Apply the patches/qemu.patch to your QEMU in order to trace optionrom events

    cd ${REPOS}/qemu
    git checkout -b benchmark
    git am ${REPOS}/qemu-boot-time/patches/qemu.patch
    mkdir build-benchmark
    cd build-benchmark
    ../configure --target-list=x86_64-softmmu ...
    make

    You can use qemu-system-x86_64 -L ${REPOS}/qemu/build-benchmark/pc-bios/optionrom/ ... to use the patched optionrom.


    Apply the patches/seabios.patch to your SeaBIOS in order to trace bios events

    cd ${REPOS}/seabios
    git checkout -b benchmark
    git am ${REPOS}/qemu-boot-time/patches/seabios.patch
    make clean distclean
    cp ${REPOS}/qemu/roms/config.seabios-256k .config
    make oldnoconfig
    make

    You can use qemu-system-x86_64 -bios ${REPOS}/seabios/out/bios.bin ... to use the patched SeaBIOS image.


    qboot already defines these trace points; we just need to compile it with BENCHMARK_HACK defined:

    cd ${REPOS}/qboot
    make clean

    You can use qemu-system-x86_64 -bios ${REPOS}/qboot/bios.bin ... to use the qboot image.

    Enable KVM events

    The following steps allow perf record to get the kvm trace events:

    echo 1 > /sys/kernel/debug/tracing/events/kvm/enable
    echo -1 > /proc/sys/kernel/perf_event_paranoid
    mount -o remount,mode=755 /sys/kernel/debug
    mount -o remount,mode=755 /sys/kernel/debug/tracing

    Record the trace events

    Start perf record to get the trace events

    perf record -a -e kvm:kvm_entry -e kvm:kvm_pio -e sched:sched_process_exec \
                -o $PERF_DATA &
    PERF_PID=$!

    You can run QEMU multiple times to also get some statistics (Avg/Min/Max):

    qemu-system-x86_64 -machine q35,accel=kvm \
                       -bios seabios/out/bios.bin \
                       -L qemu/build-benchmark/pc-bios/optionrom/ \
                       -kernel linux/bzImage ...
    qemu-system-x86_64 -machine q35,accel=kvm \
                       -bios seabios/out/bios.bin \
                       -L qemu/build-benchmark/pc-bios/optionrom/ \
                       -kernel linux/bzImage ...
    qemu-system-x86_64 -machine q35,accel=kvm \
                       -bios seabios/out/bios.bin \
                       -L qemu/build-benchmark/pc-bios/optionrom/ \
                       -kernel linux/bzImage ...

    Stop perf record

    kill $PERF_PID

    Process the recorded trace using the perf script.

    Note: times are printed in milliseconds.

    perf script -s ${REPOS}/qemu-boot-time/perf-script/ -i $PERF_DATA
    in trace_begin
    sched__sched_process_exec     1 55061.435418353   289738 qemu-system-x86
    kvm__kvm_entry           1 55061.466887708   289741 qemu-system-x86
    kvm__kvm_pio             1 55061.467070650   289741 qemu-system-x86      rw=1, port=0xf5, size=1, count=1, val=1
    kvm__kvm_pio             1 55061.475818073   289741 qemu-system-x86      rw=1, port=0xf5, size=1, count=1, val=4
    kvm__kvm_pio             1 55061.477168037   289741 qemu-system-x86      rw=1, port=0xf4, size=1, count=1, val=3
    kvm__kvm_pio             1 55061.558779540   289741 qemu-system-x86      rw=1, port=0xf4, size=1, count=1, val=5
    kvm__kvm_pio             1 55061.686849663   289741 qemu-system-x86      rw=1, port=0xf4, size=1, count=1, val=6
    sched__sched_process_exec     4 55067.461869075   289793 qemu-system-x86
    kvm__kvm_entry           4 55067.496402472   289796 qemu-system-x86
    kvm__kvm_pio             4 55067.496555385   289796 qemu-system-x86      rw=1, port=0xf5, size=1, count=1, val=1
    kvm__kvm_pio             4 55067.505067184   289796 qemu-system-x86      rw=1, port=0xf5, size=1, count=1, val=4
    kvm__kvm_pio             4 55067.506395502   289796 qemu-system-x86      rw=1, port=0xf4, size=1, count=1, val=3
    kvm__kvm_pio             4 55067.584029910   289796 qemu-system-x86      rw=1, port=0xf4, size=1, count=1, val=5
    kvm__kvm_pio             4 55067.704751791   289796 qemu-system-x86      rw=1, port=0xf4, size=1, count=1, val=6
    sched__sched_process_exec     0 55070.073823767   289827 qemu-system-x86
    kvm__kvm_entry           0 55070.110507211   289830 qemu-system-x86
    kvm__kvm_pio             0 55070.110694645   289830 qemu-system-x86      rw=1, port=0xf5, size=1, count=1, val=1
    kvm__kvm_pio             1 55070.120092692   289830 qemu-system-x86      rw=1, port=0xf5, size=1, count=1, val=4
    kvm__kvm_pio             1 55070.121437922   289830 qemu-system-x86      rw=1, port=0xf4, size=1, count=1, val=3
    kvm__kvm_pio             1 55070.198628779   289830 qemu-system-x86      rw=1, port=0xf4, size=1, count=1, val=5
    kvm__kvm_pio             1 55070.315734630   289830 qemu-system-x86      rw=1, port=0xf4, size=1, count=1, val=6
    in trace_end
    Trace qemu-system-x86
    1) pid 289738
     qemu_init_end: 31.469355
     fw_start: 31.652297 (+0.182942)
     fw_do_boot: 40.39972 (+8.747423)
     linux_start_boot: 41.749684 (+1.349964)
     linux_start_kernel: 123.361187 (+81.611503)
     linux_start_user: 251.43131 (+128.070123)
    2) pid 289793
     qemu_init_end: 34.533397
     fw_start: 34.68631 (+0.152913)
     fw_do_boot: 43.198109 (+8.511799)
     linux_start_boot: 44.526427 (+1.328318)
     linux_start_kernel: 122.160835 (+77.634408)
     linux_start_user: 242.882716 (+120.721881)
    3) pid 289827
     qemu_init_end: 36.683444
     fw_start: 36.870878 (+0.187434)
     fw_do_boot: 46.268925 (+9.398047)
     linux_start_boot: 47.614155 (+1.34523)
     linux_start_kernel: 124.805012 (+77.190857)
     linux_start_user: 241.910863 (+117.105851)
    Avg:
     qemu_init_end: 34.228732
     fw_start: 34.403161 (+0.174429)
     fw_do_boot: 43.288918 (+8.885757)
     linux_start_boot: 44.630088 (+1.34117)
     linux_start_kernel: 123.442344 (+78.812256)
     linux_start_user: 245.408296 (+121.965952)
    Min:
     qemu_init_end: 31.469355
     fw_start: 31.652297 (+0.182942)
     fw_do_boot: 40.39972 (+8.747423)
     linux_start_boot: 41.749684 (+1.349964)
     linux_start_kernel: 122.160835 (+80.411151)
     linux_start_user: 241.910863 (+119.750028)
    Max:
     qemu_init_end: 36.683444
     fw_start: 36.870878 (+0.187434)
     fw_do_boot: 46.268925 (+9.398047)
     linux_start_boot: 47.614155 (+1.34523)
     linux_start_kernel: 124.805012 (+77.190857)
     linux_start_user: 242.882716 (+118.077704)
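The summary statistics can be reproduced from the per-run numbers; a minimal sketch using the qemu_init_end values of the three runs above:

```python
# Reduce per-run boot measurements (in ms) to Avg/Min/Max, like the
# script's summary; values are the qemu_init_end times of the runs above.
def summarize(samples):
    return (sum(samples) / len(samples), min(samples), max(samples))

qemu_init_end = [31.469355, 34.533397, 36.683444]
avg, lo, hi = summarize(qemu_init_end)
print("Avg: %f  Min: %f  Max: %f" % (avg, lo, hi))
```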

    by Stefano Garzarella at August 24, 2019 01:03 PM

    August 23, 2019

    Stefano Garzarella

    QEMU 4.0 boots uncompressed Linux x86_64 kernel

    QEMU 4.0 is now able to boot directly into the uncompressed Linux x86_64 kernel binary with minimal firmware involvement using the PVH entry point defined in the x86/HVM direct boot ABI. (CONFIG_PVH=y must be enabled in the Linux config file).

    The x86/HVM direct boot ABI was initially developed for Xen guests, but with latest changes in both QEMU and Linux, QEMU is able to use that same entry point for booting KVM guests.


    • QEMU >= 4.0
    • Linux kernel >= 4.21 (released as Linux 5.0)
      • CONFIG_PVH=y enabled
      • vmlinux uncompressed image
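A quick way to check the CONFIG_PVH requirement on a given kernel config file (a sketch; the config path is just an example):

```python
# Return True if the given kernel config file enables CONFIG_PVH.
def has_pvh(config_path):
    with open(config_path) as f:
        return any(line.strip() == "CONFIG_PVH=y" for line in f)
```

For the running kernel you could pass e.g. `"/boot/config-%s" % os.uname().release`.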

    How to use

    To boot the PVH kernel image, you can use the -kernel parameter specifying the path to the vmlinux image.

    qemu-system-x86_64 -machine q35,accel=kvm \
        -kernel /path/to/vmlinux \
        -drive file=/path/to/rootfs.ext2,if=virtio,format=raw \
        -append 'root=/dev/vda console=ttyS0' -vga none -display none \
        -serial mon:stdio

    The -initrd and -append parameters are also supported as for compressed images.


    QEMU will automatically recognize if the vmlinux image has the PVH entry point and it will use SeaBIOS with the new pvh.bin optionrom to load the uncompressed image into the guest VM.

    As an alternative, qboot can be used to load the PVH image.


    Perf script and patches used to measure boot time:

    The following values are expressed in milliseconds [ms]

    • QEMU (q35 machine) + SeaBIOS + bzImage

      • qemu_init_end: 36.072056
      • linux_start_kernel: 114.669522 (+78.597466)
      • linux_start_user: 191.748567 (+77.079045)
    • QEMU (q35 machine) + SeaBIOS + vmlinux(PVH)

      • qemu_init_end: 51.588200
      • linux_start_kernel: 62.124665 (+10.536465)
      • linux_start_user: 139.460582 (+77.335917)
    • QEMU (q35 machine) + qboot + bzImage

      • qemu_init_end: 36.443638
      • linux_start_kernel: 106.73115 (+70.287512)
      • linux_start_user: 184.575531 (+77.844381)
    • QEMU (q35 machine) + qboot + vmlinux(PVH)

      • qemu_init_end: 51.877656
      • linux_start_kernel: 56.710735 (+4.833079)
      • linux_start_user: 133.808972 (+77.098237)
    • Tracepoints:

      • qemu_init_end: first kvm_entry (i.e. QEMU initialization has finished)
      • linux_start_kernel: first entry of the Linux kernel (start_kernel())
      • linux_start_user: before starting the init process


    Linux patches merged upstream in Linux 4.21:

    QEMU patches merged upstream in QEMU 4.0:

    qboot patches merged upstream:

    by Stefano Garzarella at August 23, 2019 01:26 PM

    August 22, 2019

    Stefano Garzarella

    iperf3-vsock: how to measure VSOCK performance

    The iperf-vsock repository contains a few patches that add support for the VSOCK address family to iperf3. In this way, iperf3 can be used to measure the performance between guest and host using VSOCK sockets.

    The VSOCK address family facilitates communication between virtual machines and the host they are running on.

    To test VSOCK sockets (Linux only), you must use the new --vsock option on both server and client. Other iperf3 options (e.g. -t, -l, -P, -R, --bidir) are also supported in VSOCK tests.
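For reference, the AF_VSOCK API that iperf3 uses follows the usual POSIX socket pattern; a minimal sketch in Python (>= 3.7 on Linux), where CID 3 and port 5201 are just example values:

```python
# Minimal AF_VSOCK echo pair (a sketch; requires a Linux kernel with
# vsock support and, on the host, the vhost_vsock module loaded).
import socket

def vsock_server(port=5201):
    s = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
    s.bind((socket.VMADDR_CID_ANY, port))  # accept from any CID
    s.listen(1)
    conn, (peer_cid, peer_port) = s.accept()
    conn.sendall(conn.recv(4096))          # echo one message back
    conn.close()
    s.close()

def vsock_client(cid=3, port=5201, payload=b"ping"):
    s = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
    s.connect((cid, port))                 # (CID, port) instead of (host, port)
    s.sendall(payload)
    reply = s.recv(4096)
    s.close()
    return reply
```

Note how only the address tuple changes with respect to AF_INET: a (CID, port) pair instead of (host, port).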


    • Linux host kernel >= 4.8
    • Linux guest kernel >= 4.8
    • QEMU >= 2.8

    Build iperf3-vsock from source

    Clone repository

    git clone
    cd iperf-vsock


    mkdir build
    cd build

    (Note: If configure fails, try running ./ first)

    Example with Fedora 30 (host and guest):

    Host: start the VM

    sudo modprobe vhost_vsock
    sudo qemu-system-x86_64 -m 1G -smp 2 -cpu host -M accel=kvm	\
         -drive if=virtio,file=/path/to/fedora.img,format=qcow2     \
         -device vhost-vsock-pci,guest-cid=${GUEST_CID}

    Guest: start iperf server

    # SELinux can block you, so you can write a policy or temporarily disable it
    sudo setenforce 0
    iperf-vsock/build/src/iperf3 --vsock -s

    Host: start iperf client

    iperf-vsock/build/src/iperf3 --vsock -c ${GUEST_CID}


    Connecting to host 3, port 5201
    [  5] local 2 port 4008596529 connected to 3 port 5201
    [ ID] Interval           Transfer     Bitrate
    [  5]   0.00-1.00   sec  1.30 GBytes  11.2 Gbits/sec
    [  5]   1.00-2.00   sec  1.67 GBytes  14.3 Gbits/sec
    [  5]   2.00-3.00   sec  1.57 GBytes  13.5 Gbits/sec
    [  5]   3.00-4.00   sec  1.49 GBytes  12.8 Gbits/sec
    [  5]   4.00-5.00   sec   971 MBytes  8.15 Gbits/sec
    [  5]   5.00-6.00   sec  1.01 GBytes  8.71 Gbits/sec
    [  5]   6.00-7.00   sec  1.44 GBytes  12.3 Gbits/sec
    [  5]   7.00-8.00   sec  1.62 GBytes  13.9 Gbits/sec
    [  5]   8.00-9.00   sec  1.61 GBytes  13.8 Gbits/sec
    [  5]   9.00-10.00  sec  1.63 GBytes  14.0 Gbits/sec
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval           Transfer     Bitrate
    [  5]   0.00-10.00  sec  14.3 GBytes  12.3 Gbits/sec                  sender
    [  5]   0.00-10.00  sec  14.3 GBytes  12.3 Gbits/sec                  receiver
    iperf Done.
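When comparing runs, the per-interval bitrates can be pulled out of iperf3's text output with a small helper; a sketch (the regex assumes Gbits/sec lines like the ones above, and also matches the sender/receiver summary lines):

```python
# Extract bitrates (in Gbits/sec) from iperf3 text output.
import re

def bitrates(output):
    return [float(m.group(1))
            for m in re.finditer(r"([\d.]+) Gbits/sec", output)]
```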

    by Stefano Garzarella at August 22, 2019 03:52 PM

    August 19, 2019

    KVM on Z

    QEMU v4.1 released

    QEMU v4.1 is out. For highlights from a KVM on Z perspective see the Release Notes.
    Note: The DASD IPL feature is still considered experimental.

    by Stefan Raspl at August 19, 2019 01:14 PM

    August 16, 2019

    QEMU project

    QEMU version 4.1.0 released

    We would like to announce the availability of the QEMU 4.1.0 release. This release contains 2000+ commits from 176 authors.

    You can grab the tarball from our download page. The full list of changes is available in the Wiki.

    Highlights include:

    • ARM: FPU emulation support for Cortex-M CPUs, FPU fixes for Cortex-R5F
    • ARM: ARMv8.5-RNG extension support for CPU-generated random numbers
    • ARM: board build options now configurable via new Kconfig-based system
    • ARM: Exynos4210 SoC model now supports PL330 DMA controllers
    • MIPS: improved emulation performance of numerous MSA instructions, mostly integer and data permuting operations
    • MIPS: improved support for MSA ASE instructions on big-endian hosts, handling for ‘division by zero’ cases now matches reference hardware
    • PowerPC: pseries: support for NVIDIA V100 GPU/NVLink2 passthrough via VFIO
    • PowerPC: pseries: in-kernel acceleration for XIVE interrupt controller
    • PowerPC: pseries: support for hot-plugging PCI host bridges
    • PowerPC: emulation optimizations for vector (Altivec/VSX) instructions
    • RISC-V: support for new “spike” machine model
    • RISC-V: ISA 1.11.0 support for privileged architectures
    • RISC-V: improvements for 32-bit syscall ABI, illegal instruction handling, and built-in debugger
    • RISC-V: support for CPU topology in device trees
    • s390: bios support for booting from ECKD DASD assigned to guest via vfio-ccw
    • s390: emulation support for all “Vector Facility” instructions
    • s390: additional facilities and support for gen15 machines, including support for AP Queue Interruption Facility for using interrupts for vfio-ap devices
    • SPARC: sun4m/sun4u: fixes when running with -vga none (OpenBIOS)
    • x86: emulation support for new Hygon Dhyana and Intel SnowRidge CPU models
    • x86: emulation support for RDRAND extension
    • x86: md-clear/mds-no feature flags, for detection/mitigation of MDS vulnerabilities (CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091)
    • x86: CPU die topology now configurable using -smp …,dies=
    • Xtensa: support for memory protection unit (MPU) option
    • Xtensa: support for Exclusive Access option
    • GUI: virtio-gpu 2D/3D rendering may now be offloaded to an external vhost-user process, such as QEMU vhost-user-gpu
    • GUI: semihosting output can now be redirected to a chardev backend
    • qemu-img: added a --salvage option to qemu-img convert, which prevents the conversion process from aborting on I/O errors (can be used for example to salvage partially corrupted qcow2 files)
    • qemu-img: qemu-img rebase works now even when the input file doesn’t have a backing file yet
    • VMDK block driver now has read-only support for the seSparse subformat
    • GPIO: support for SiFive GPIO controller
    • and lots more…

    Thank you to everyone involved!

    August 16, 2019 05:50 AM

    August 09, 2019

    KVM on Z

    New Documentation: Configuring Crypto Express Adapters for KVM Guests

    See here for a new documentation release on how to configure Crypto Express adapters for KVM guests.

    by Stefan Raspl at August 09, 2019 02:49 PM

    August 07, 2019

    Daniel Berrange

    ANNOUNCE: gtk-vnc 1.0.0 release

    I’m pleased to announce a new release of GTK-VNC, version 1.0.0.
    sha256sum: a81a1f1a79ad4618027628ffac27d3391524c063d9411c7a36a5ec3380e6c080

    Pay particular attention to the first two major changes in
    this release:

    • Autotools build system replaced with meson
    • Support for GTK-2 is dropped. GTK-3 is mandatory
    • Support for libview is dropped in example program
    • Improvements to example demos
    • Use MAP_ANON if MAP_ANONYMOUS doesn’t exist to help certain macOS versions
    • Fix crash when connection attempt fails early
    • Initialize gcrypt early in auth process
    • Emit vnc-auth-failure signal when SASL auth fails
    • Emit vnc-error signal when authentication fails
    • Fix double free when failing to read certificates
    • Run unit tests in RPM build
    • Modernize RPM spec
    • Fix race condition in unit tests
    • Fix install of missing header for cairo framebuffer
    • Fix typo in gir package name
    • Add missing VncPixelFormat file to gir data

    Thanks to all those who reported bugs and provided patches that went into this new release.

    by Daniel Berrange at August 07, 2019 03:06 PM

    August 05, 2019

    Stefan Hajnoczi

    Determining why a Linux syscall failed

    One is often left wondering what caused an errno value when a system call fails. Figuring out the reason can be tricky because a single errno value can have multiple causes. Applications get an errno integer and no additional information about what went wrong in the kernel.

    There are several ways to determine the reason for a system call failure (from easiest to most involved):

    1. Check the system call's man page for the meaning of the errno value. Sometimes this is enough to explain the failure.
    2. Check the kernel log using dmesg(1). If something went seriously wrong (like a hardware error) then there may be detailed error information. It may help to increase the kernel log level.
    3. Read the kernel source code to understand various error code paths and identify the most relevant one.
    4. Use the function graph tracer to see which code path was taken.
    5. Add printk() calls, recompile the kernel (module), and rerun to see the output.

    Reading the man page and checking dmesg(1) are fairly easy for application developers and do not require knowledge of kernel internals. If this does not produce an answer then it is necessary to look closely at the kernel source code to understand a system call's error code paths.
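From the application's point of view, all of this starts from a bare errno integer attached to the failed call; for example (shown in Python for brevity, with a guaranteed ENOENT):

```python
# An application only sees the errno integer attached to the failure.
import errno, os

try:
    os.open("/nonexistent/path", os.O_RDONLY)
except OSError as e:
    assert e.errno == errno.ENOENT
    print(os.strerror(e.errno))  # the generic message the man page documents
```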

    This post discusses the function graph tracer and how it can be used to identify system call failures without recompiling the kernel. This is useful because running a custom kernel may not be possible (e.g. due to security or reliability concerns) and recompiling the kernel is slow.

    An example

    In order to explore some debugging techniques let's take the io_uring_setup(2) system call as an example. It is failing with ENOMEM but the system is not under memory pressure, so ENOMEM is not expected.

    The io_uring_setup(2) source code (fs/io_uring.c) contains many ENOMEM locations but it is not possible to conclusively identify which one is failing. The next step is to determine which code path is taken using dynamic instrumentation.

    The function graph tracer

    The Linux function graph tracer records kernel function entries and returns so that function call relationships are made apparent. The io_uring_setup(2) system call is failing with ENOMEM but it is unclear at which point in the system call this happens. It is possible to find the answer by studying the function call graph produced by the tracer and following along in the Linux source code.

    Since io_uring_setup(2) is a system call it's not an ordinary C function definition and has a special symbol name in the kernel ELF file. It is possible to look up the (architecture-specific) symbol for the currently running kernel:

    # grep io_uring_setup /proc/kallsyms
    ffffffffbd357130 T __x64_sys_io_uring_setup

    Let's trace all __x64_sys_io_uring_setup calls:

    # cd /sys/kernel/debug/tracing
    # echo '__x64_sys_io_uring_setup' > set_graph_function
    # echo 'function_graph' > current_tracer
    # cat trace_pipe > /tmp/trace.log

    Then run the application in another terminal.

    The trace contains many successful io_uring_setup(2) calls that look like this:
     1)               |  __x64_sys_io_uring_setup() {
     1)               |    io_uring_setup() {
     1)               |      capable() {
     1)               |        ns_capable_common() {
     1)               |          security_capable() {
     1)   0.199 us    |            cap_capable();
     1)   7.095 us    |          }
     1)   7.594 us    |        }
     1)   8.059 us    |      }
     1)               |      kmem_cache_alloc_trace() {
     1)               |        _cond_resched() {
     1)   0.244 us    |          rcu_all_qs();
     1)   0.708 us    |        }
     1)   0.208 us    |        should_failslab();
     1)   0.220 us    |        memcg_kmem_put_cache();
     1)   2.201 us    |      }
     1)               |      fd_install() {
     1)   0.223 us    |        __fd_install();
     1)   0.643 us    |      }
     1) ! 190.396 us  |    }
     1) ! 216.236 us  |  }

    Although the goal is to understand system call failures, looking at a successful invocation can be useful too. Failed calls in trace output can be identified on the basis that they differ from successful calls. This knowledge can be valuable when searching through large trace files. A failed io_uring_setup(2) call aborts early and does not invoke fd_install(). Now it is possible to find a failed call amongst all the io_uring_setup(2) calls:

     2)               |  __x64_sys_io_uring_setup() {
     2)               |    io_uring_setup() {
     2)               |      capable() {
     2)               |        ns_capable_common() {
     2)               |          security_capable() {
     2)   0.236 us    |            cap_capable();
     2)   0.872 us    |          }
     2)   1.419 us    |        }
     2)   1.951 us    |      }
     2)   0.419 us    |      free_uid();
     2)   3.389 us    |    }
     2) + 48.769 us   |  }

    The fs/io_uring.c code shows the likely error code paths:

        account_mem = !capable(CAP_IPC_LOCK);

        if (account_mem) {
                ret = io_account_mem(user,
                                ring_pages(p->sq_entries, p->cq_entries));
                if (ret) {
                        free_uid(user);
                        return ret;
                }
        }

        ctx = io_ring_ctx_alloc(p);
        if (!ctx) {
                if (account_mem)
                        io_unaccount_mem(user, ring_pages(p->sq_entries,
                                                p->cq_entries));
                free_uid(user);
                return -ENOMEM;
        }

    But is there enough information in the trace to determine which of these return statements is executed? The trace shows free_uid() so we can be confident that both these code paths are valid candidates. By looking back at the success code path we can use the kmem_cache_alloc_trace() as a landmark. It is called by io_ring_ctx_alloc() so we should see kmem_cache_alloc_trace() in the trace before free_uid() if the second return statement is taken. Since it does not appear in the trace output we conclude that the first return statement is being taken!
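For large traces, the search for failed call blocks can be automated; a rough sketch (assuming well-formed function-graph output and using fd_install() as the success landmark, as above):

```python
# Flag __x64_sys_io_uring_setup() call blocks that never reach
# fd_install(), i.e. likely failed invocations, by brace counting.
def find_failed_calls(trace_text, entry="__x64_sys_io_uring_setup",
                      landmark="fd_install"):
    failed, block, depth = [], None, 0
    for line in trace_text.splitlines():
        if block is None:
            if entry in line and "{" in line:
                block, depth = [line], 1
            continue
        block.append(line)
        depth += line.count("{") - line.count("}")
        if depth == 0:                       # call block closed
            if not any(landmark in l for l in block):
                failed.append("\n".join(block))
            block = None
    return failed
```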

    When trace output is inconclusive

    Function graph tracer output only shows functions in the ELF file. When the compiler inlines code, no entry or return is recorded in the function graph trace. This can make it hard to identify the exact return statement taken in a long function. Functions containing few function calls and many conditional branches are also difficult to analyze from just a function graph trace.

    We can enhance our understanding of the trace by adding dynamic probes that record function arguments, local variables, and/or return values via perf-probe(1). By knowing these values we can make inferences about the code path being taken.

    If this is not enough to infer which code path is being taken, detailed code coverage information is necessary.

    One way to approximate code coverage is using a sampling CPU profiler, like perf(1), and letting it run under load for some time to gather statistics on which code paths are executed frequently. This is not as precise as code coverage tools, which record each branch encountered in a program, but it can be enough to observe code paths in functions that are not amenable to the function graph tracer due to the low number of function calls.

    This is done as follows:

    1. Run the system call in question in a tight loop so the CPU is spending a significant amount of time in the code path you wish to observe.
    2. Start perf record -a and let it run for 30 seconds.
    3. Stop perf-record(1) and run perf-report(1) to view the annotated source code of the function in question.

    The error code path should have a significant number of profiler samples and it should be prominent in the perf-report(1) annotated output.


    Determining the cause for a system call failure can be hard work. The function graph tracer is helpful in shedding light on the code paths being taken by the kernel. Additional debugging is possible using perf-probe(1) and the sampling profiler, so that in most cases it's not necessary to recompile the kernel with printk() just to learn why a system call is failing.

    by Unknown at August 05, 2019 03:54 PM

    July 30, 2019

    Cole Robinson

    Blog moved to Pelican and GitHub Pages

    I've moved my blog to a static site generated with Pelican and hosted on GitHub Pages. This is a dump of some of the details.

    The content is hosted in three branches across two repos:

    The motivation for the split is that according to this pelican SEO article, master branches of GitHub repos are indexed by google, so if you store HTML content in a master branch your canonical blog might be battling your GitHub repo in the search results. And since you can only put content in the master branch of a $ repo, I added a separate blog.git repo. Maybe I could shove all the content into the blog/gh-pages branch, but I think dealing with multiple subdomains prevents it. I've already spent too much time playing with all this stuff though, so that's for another day to figure out. Of course, suggestions welcome; blog comments are enabled with Disqus.

    One issue I hit is that pushing updated content to blog/gh-pages doesn't consistently trigger a new GitHub Pages deployment. There's a bunch of hits about this around the web (this stackoverflow post in particular) but no authoritative explanation about what criteria GitHub Pages uses to determine whether to redeploy. The simplest 'fix' I found is to tweak the index.html content via the GitHub web UI and commit the change which seems to consistently trigger a refresh as reported by the repo's deployments page.

    You may notice the blog looks a lot like stock Jekyll with its minima theme. I didn't find any Pelican theme that I liked as much as minima, so I grabbed the CSS from a minima instance and started adapting the Pelican simple-bootstrap theme to use it. The end result is basically a simple reimplementation of minima for Pelican. I learned a lot in the process but it likely would have been much simpler if I just used Jekyll in the first place, but I'm in too deep to switch now!

    by Cole Robinson at July 30, 2019 07:30 PM

    July 12, 2019

    KVM on Z

    KVM at SHARE Pittsburgh 2019

    Yes, we will be at SHARE in Pittsburgh this August!
    See the following session in the Linux and VM/Virtualization track:

    • KVM on IBM Z News (Session #25978): Latest news on our development work with the open source community

    by Stefan Raspl at July 12, 2019 04:42 PM

    July 10, 2019

    Cornelia Huck

    s390x changes in QEMU 4.1

    QEMU has just entered hard freeze for 4.1, so the time is here again to summarize the s390x changes for that release.


    • All instructions that have been introduced with the "Vector Facility" in the z13 machines are now emulated by QEMU. In particular, this allows Linux distributions built for z13 or later to be run under TCG (vector instructions are generated when we compile for z13; other z13 facilities are optional.)

    CPU Models

    • As the needed prerequisites in TCG now have been implemented, the "qemu" cpu model now includes the "Vector Facility" and has been bumped to a stripped-down z13.
    • Models for the upcoming gen15 machines (the official name is not yet known) and some new facilities have been added.
    • If the host kernel supports it, we now indicate the AP Queue Interruption facility. This is used by vfio-ap and makes it possible to deliver AP interrupts to the guest.

    I/O Devices

    • vfio-ccw has gained support for relaying HALT SUBCHANNEL and CLEAR SUBCHANNEL requests from the guest to the device, if the host kernel vfio-ccw driver supports it. Otherwise, these instructions continue to be emulated by QEMU, as before.
    • The bios now supports IPLing (booting) from DASD attached via vfio-ccw.


    • The bios tolerates signatures written by zipl, if present; but it does not actually handle them. See the 'secure' option for zipl introduced in s390-tools 2.9.0.
    And the usual fixes and cleanups.

    by Cornelia Huck at July 10, 2019 02:16 PM

    July 08, 2019

    KVM on Z

    SLES 15 SP1 released

    SLES 15 SP1 is out! See the announcement and their release notes with Z-specific changes.
    It ships the following code level updates:
    • QEMU v3.1 (GA: v2.11)
    • libvirt v5.1 (GA: v4.0)
    See previous blog entries on QEMU v2.12, v3.0 and v3.1 for details on new features that become available with the QEMU package update.
    Furthermore, SLES 15 SP1 introduces the kvm_stat tool, which can be used for guest event analysis.

    by Stefan Raspl at July 08, 2019 10:14 PM

    June 16, 2019

    Gerd Hoffmann

    macos guest support for qemu

    display support

    After one of the minor updates (10.14.3 or 10.14.4, don't remember) my macOS Mojave guest started to switch video modes at boot. Also the "Display" panel in "System Preferences" started to offer three video modes to choose from. Unfortunately FullHD (aka 1920x1080) is not on the list.

    Decided to look into this. Assuming that macOS learned to switch video modes using the EFI GOP interface I've tweaked the mode list in OVMF (QemuVideoDxe driver). No effect. Huh?

    Next I've looked at the list of drivers, using kextstat. Found an AppleVirtualGraphics.kext entry. Interesting. Checking out Info.plist (in /System/Library/Extensions/AppleVirtualGraphics.kext/Contents) I found this:

        [ ... ]
                [ ... ]
                [ ... ]
                [ ... ]
                [ ... ]
        [ ... ]

    So recent macOS Mojave ships with a driver for qemu stdvga and qemu cirrus vga. Nice. Unfortunately the question of how to switch the display into 1920x1080 mode (to match the host's display resolution) isn't solved yet.

    virtio support

    While looking around I've noticed there is an AppleVirtIO.kext too, with this in Info.plist:

        [ ... ]
            [ ... ]
            [ ... ]
            [ ... ]
            [ ... ]
            [ ... ]

    Apparently a virtio driver with support for virtio-console/serial, virtio-blk and virtio-9p.

    Tried to switch the system disk from sata to virtio-blk. Clover seems to be slightly confused: it stops showing the nice disk icons. But booting macOS works fine regardless, though only when using the transitional device, so the driver appears to support legacy mode only.

    virtio-9p for filesystem sharing looks pretty interesting too. So, lets try that (in libvirt xml):

      [ ... ]
        [ ... ]
        <filesystem type='mount' accessmode='mapped'>
          <source dir='/path/to/some/host/directory'/>
          <target dir='/test9p'/>
        [ ... ]
      [ ... ]

    macOS seems to not mount the filesystem automatically. But it is easy to do, using the terminal. You need to create the target directory first (sudo mkdir /test9p). Then run sudo mount_9p. Done.
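Collected into one place, the manual terminal steps above are just:

```shell
$ sudo mkdir /test9p
$ sudo mount_9p
```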

    by Gerd Hoffmann at June 16, 2019 10:00 PM

    June 05, 2019

    Gerd Hoffmann

    recent qemu sound improvements

    The qemu sound system got a bunch of improvements in 2018 and 2019.

    New in qemu 3.0

    The hda emulation uses a high resolution timer now to better emulate the timing-sensitive dma transfer of sound samples. Credits for this implementation go to Martin Schrodt.

    Unfortunately this is incompatible with older qemu versions, so it is only enabled for 3.0+ machine type versions. So upgrading qemu is not enough to get this, you also have to make sure you are using a new enough machine type (qemu -M command line switch).

    libvirt stores the machine type in the domain xml when the guest is created. It is never updated automatically. So have a look at your domain configuration (using virsh edit domain-name for example) and check the version is 3.0 or newer:

    [ ... ]
        <type arch='x86_64' machine='pc-q35-3.0'>hvm</type>
    [ ... ]

    New in qemu 3.1

    The pulseaudio backend got fixes in 3.1, so if you are using pulse you should upgrade to at least qemu version 3.1.

    New in qemu upcoming 4.0

    Yet another pulseaudio bugfix.

    Initial support for the -audiodev command line switch was finally merged. So audio support is no longer the odd one out, configured in a completely different way using environment variables instead of command line switches. Credits for this go to Kővágó, Zoltán.

    In the pipeline

    There are more -audiodev improvements in the pipeline, they are expected to land upstream in the 4.1 or 4.2 devel cycle.

    Latency tuning

    While being at it one final note:

    Bugs in qemu sound device emulation and audio backends are not the only possible root cause for bad sound quality. Crackling sound -- typically caused by buffer underruns -- can also be caused by latency problems elsewhere in qemu.

    One known offender is disk I/O, specifically the linux aio support, which isn't as async as it should be and blocks now and then. It is configured with io=native for block device backends.

    A better choice is io=threads. In libvirt xml:

    [ ... ]
        <disk type='...' device='disk'>
          <driver name='qemu' type='...' cache='none' io='threads'/>
    [ ... ]
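The libvirt io= attribute maps to QEMU's aio= drive option; an equivalent plain QEMU command line might look roughly like this (the disk path is a placeholder):

```shell
# aio=threads uses a worker thread pool instead of linux-aio (aio=native)
qemu-system-x86_64 \
    -drive file=/path/to/disk.img,if=virtio,cache=none,aio=threads
```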

    Another known issue is spice audio compression, so better turn that off when using spice:

    [ ... ]
        <graphics type='spice'>
          [ ... ]
          <playback compression='off'/>
    [ ... ]

    by Gerd Hoffmann at June 05, 2019 10:00 PM

    May 22, 2019

    QEMU project

    QEMU 4.0 adds micro:bit emulation support

    micro:bit emulation support is available from QEMU 4.0 onwards and can be used for low-level software testing and development. Unlike existing micro:bit simulators, QEMU performs full-system emulation and actually runs the same ARM code as the real hardware. This blog post explains what full-system emulation means and why QEMU is now a useful tool for developing micro:bit software.

    The micro:bit is a tiny ARM board designed for teaching. It is increasingly being used around the world to expose children to computers, programming, and electronics in a low-cost way with an active online community that shares project ideas, lesson plans, and programming tips.

    micro:bit board

    Simulators and emulators

    Simulators are used for many tasks, from mobile app development to performance analysis of computer hardware. It is possible to develop code using a simulator without having access to real hardware. Oftentimes using a simulator is more convenient than flashing and debugging programs on real hardware.

    Emulators allow programs written for one computer system to run on a different computer system. They use techniques like machine code interpreters and just-in-time compilers to execute guest programs that do not run natively on the host computer. Each CPU instruction must be correctly implemented by the emulator so it can run guest software.

    How existing micro:bit simulators work

    Simulators can be implemented at various layers in the software stack. The MakeCode editor for JavaScript development includes a micro:bit simulator:

    MakeCode editor

    This simulator does not execute any ARM code and is therefore not running the same CPU instructions as a real micro:bit. Instead it reuses the JavaScript engine already available in your web browser to execute micro:bit JavaScript programs. This is achieved by providing the micro:bit JavaScript APIs that micro:bit programs expect. The programs don’t need to know whether those APIs are implemented by the real micro:bit software stack or whether they are actually calling into the MakeCode simulator.

    In the screenshot above the micro:bit program calls showString("Hello world!") and this becomes a call into the MakeCode simulator code to render images of LEDs in the web browser. On real hardware the code path is different and eventually leads to an LED matrix driver that lights up the LEDs by driving output pins on the micro:bit board.

    Full-system emulation

    Unlike the MakeCode simulator, QEMU emulates the micro:bit CPU and boots from the same ARM code as the real micro:bit board. The simulation happens at the CPU instruction and hardware interface level instead of at the JavaScript API level. This is called full-system emulation because the entire guest software environment is present.

    What are the advantages of full-system emulation?

    • Programs written in any language can run (MicroPython, mbed C/C++, etc)
    • Boot, device driver, and language run-time code can be tested
    • Bugs in lower layers of the software stack can be reproduced
    • CPU architecture-specific bugs can be reproduced (stack and memory corruption bugs)
    • A debugger can be connected to inspect the entire software stack

    The main disadvantage of full-system emulation is that the performance overhead is higher since simulation happens at the CPU instruction level. Programs consist of many CPU instructions so the task of emulation is performance-sensitive. Luckily the micro:bit’s CPU is much less powerful than CPUs available in our laptops and desktops, so programs execute at a reasonable speed.

    Running micro:bit programs on QEMU

    QEMU emulates the core devices on the micro:bit, including the serial port (UART) and timers. This is enough for developing and testing low-level software but does not offer the LEDs, radio, and other devices that most micro:bit programs rely on. These devices might be emulated by QEMU in the future, but for now the main use of QEMU is for developing and testing low-level micro:bit code.

    To run test.hex:

    $ qemu-system-arm -M microbit -device loader,file=test.hex -serial stdio

    Any output written to the serial port is printed to the terminal by QEMU.

    Debugging micro:bit programs with QEMU and GDB

    QEMU has GDB guest debugging support. This means GDB can connect to QEMU in order to debug the guest software. This is similar to debugging a real system over JTAG, except no hardware is necessary!

    Connect with GDB to debug the guest:

    $ qemu-system-arm -M microbit -device loader,file=test.hex -s
    $ gdb
    (gdb) target remote tcp:
    (gdb) x/10i $pc
    => 0x161c4:	ldr	r3, [r4, #0]
       0x161c6:	cmp	r3, #0
       0x161c8:	beq.n	0x161d2
       0x161ca:	ldr	r3, [pc, #48]	; (0x161fc)
       0x161cc:	ldr	r3, [r3, #0]
       0x161ce:	cmp	r3, #0
       0x161d0:	bne.n	0x161d8
       0x161d2:	movs	r0, #6
       0x161d4:	bl	0x16160
       0x161d8:	ldr	r0, [r4, #0]

    Having a debugger is very powerful. QEMU can also load ELF files in addition to the popular .hex files used for micro:bit programs. ELF files can contain debugging information that enables source-level debugging so GDB can display function and variable names as well as listing the source code instead of showing assembly instructions.
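For example, with an ELF build loaded instead of a .hex file, a source-level session could look roughly like this (-s listens on TCP port 1234 by default, and -S halts the CPU at startup; binary name and breakpoint are placeholders):

```shell
$ qemu-system-arm -M microbit -device loader,file=test.elf -s -S
$ gdb test.elf
(gdb) target remote :1234
(gdb) break main
(gdb) continue
```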


    QEMU now offers a platform for developing and testing micro:bit programs. It is open to future extension, hopefully to emulate more devices and offer a graphical user interface.

    micro:bit emulation was contributed by Julia Suvorova and Steffen Görtz as part of their Outreachy and Google Summer of Code internships with QEMU. Jim Mussared, Joel Stanley, and Stefan Hajnoczi acted as mentors and contributed patches as well.

    May 22, 2019 10:45 AM

    May 17, 2019

    KVM on Z

    QEMU v3.1 released

    QEMU v3.1 is out. Besides a number of small enhancements, some items that we would like to highlight from a KVM on Z perspective:
    • Huge Pages Support: KVM guests can now utilize 1MB pages. As this removes one layer of address translation for the guest backing, fewer page faults need to be processed, and fewer translation lookaside buffer (TLB) entries are needed to hold translations. This, as well as the TLB improvements in z14, will improve KVM guest performance.
      To use:
      Create config file /etc/modprobe.d/kvmhpage.conf file with the following content to enable huge pages for KVM:

         options kvm hpage=1

      Furthermore, add the following line to /etc/sysctl.conf to reserve N huge pages:

         vm.nr_hugepages = N

      Alternatively, append the following statement to the kernel parameter line in case support is compiled into the kernel: kvm.hpage=1 hugepages=N.
      Note that there are ways to add huge pages dynamically after boot, but due to effects like memory fragmentation it is preferable to define huge pages as early as possible.
      If successful, the file /proc/sys/vm/nr_hugepages should show N huge pages. See here for further documentation.
      Then, to enable huge pages for a guest, add the following element to the respective domain XML:

         <memoryBacking>
           <hugepages/>
         </memoryBacking>

      The use of huge pages in the host is orthogonal to the use of huge pages in the guest. Both will improve the performance independently by reducing the number of page faults and the number of page table walks after a TLB miss.
      The biggest performance improvement can be achieved by using huge pages in both host and guest, e.g. with libhugetlbfs, as this will also make use of the larger 1M TLB entries in the hardware.
      Requires Linux kernel 4.19.
    • vfio-ap: The Adjunct Processor (AP) facility is an IBM Z cryptographic facility comprised of three AP instructions and up to 256 cryptographic adapter cards. Each adapter card is partitioned into up to 85 domains, each of which provides cryptographic services. An AP queue is the means by which AP messages are sent to and received from an AP adapter. Each AP queue is connected to a particular domain within a particular adapter. vfio-ap enables assignment of a subset of AP adapters and domains to one or more guests such that each guest has exclusive access to a discrete set of AP queues.
      Here is a small sample script illustrating host setup:

         # load vfio-ap device driver
         modprobe vfio-ap

         # reserve domain 7 for use by KVM guests
         echo -0x7 > /sys/bus/ap/aqmask
         # to reserve all domains of an adapter, use the following
         # line instead (by uncommenting it), and replace NN with the
         # adapter number:
         # echo -0xNN > /sys/bus/ap/apmask

         # create a mediated device (mdev) to provide userspace access
         # to a device in a secure manner
         echo $UUID > /sys/devices/vfio_ap/matrix/mdev_supported_types/ \

         # assign adapter, domain and control domain
         echo 0x3 > /sys/devices/vfio_ap/matrix/${UUID}/assign_adapter
         echo 0x7 > /sys/devices/vfio_ap/matrix/${UUID}/assign_domain
         echo 0x7 > /sys/devices/vfio_ap/matrix/${UUID}/ \

      To make use of the AP device in a KVM guest, add the following element to the respective domain XML:

         <hostdev mode='subsystem' type='mdev' managed='no' model='vfio-ap'>
             <address uuid='e926839d-a0b4-4f9c-95d0-c9b34190c4ba'/>
         </hostdev>

      Once complete, use the passthrough device in a KVM guest just like a regular crypto adapter.
      Requires Linux kernel 4.20 and libvirt 4.9, and is also available in RHEL 8, Ubuntu 18.04 and SLES 15 SP1.

    by Stefan Raspl at May 17, 2019 11:33 PM

    May 07, 2019

    KVM on Z

    RHEL 8 released

    Red Hat Enterprise Linux 8 is out! See the announcement and their release notes with Z-specific changes.
    It ships the following code levels:

    by Stefan Raspl at May 07, 2019 04:52 PM

    April 30, 2019

    KVM on Z

    QEMU v4.0 released

    QEMU v4.0 is out. Besides a number of small enhancements, some items that we would like to highlight from a KVM on Z perspective:
    • CPU models for z14 GA2 as follows:
         $ qemu-system-s390x -cpu help -enable-kvm | grep z14.2
         s390 z14.2-base      IBM z14 GA2           (static, migration-safe)
         s390 z14.2           IBM z14 GA2           (migration-safe)
    • vfio-ap now supports hotplugging of vfio-ap devices.
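Hotplug can then be exercised from the QEMU monitor; a sketch, reusing the mdev UUID from the vfio-ap domain XML example above (the device id is made up):

```shell
(qemu) device_add vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/e926839d-a0b4-4f9c-95d0-c9b34190c4ba,id=apdev0
(qemu) device_del apdev0
```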

    by Stefan Raspl at April 30, 2019 08:17 AM

    April 24, 2019

    QEMU project

    QEMU version 4.0.0 released

    We would like to announce the availability of the QEMU 4.0.0 release. This release contains 3100+ commits from 220 authors.

    You can grab the tarball from our download page. The full list of changes is available in the Wiki.

    Highlights include:

    • ARM: ARMv8+ extensions for SB, PredInv, HPD, LOR, FHM, AA32HPD, PAuth, JSConv, CondM, FRINT, and BTI
    • ARM: new emulation support for “Musca” and “MPS2” development boards
    • ARM: virt: support for >255GB of RAM and u-boot “noload” image types
    • ARM: improved emulation of ARM PMU
    • HPPA: support for TLB protection IDs and TLB trace events
    • MIPS: support for multi-threaded TCG emulation
    • MIPS: emulation support for I7200 and I6500 CPUs, QMP-based querying of CPU types, and improved support for SAARI and SAAR configuration registers
    • MIPS: improvements to Interthread Communication Unit, Fulong 2E machine types, and end-user documentation.
    • PowerPC: pseries/powernv: support for POWER9 large decrementer
    • PowerPC: pseries: emulation support for XIVE interrupt controller
    • PowerPC: pseries: support for hotplugging PCI host bridges (PHBs)
    • PowerPC: pseries: Spectre/Meltdown mitigations enabled by default, additional support for count-cache-flush mitigation
    • RISC-V: virt: support for PCI and USB
    • RISC-V: support for TSR, TW, and TVM fields of mstatus; FS field now supports three states (dirty, clean, and off)
    • RISC-V: built-in gdbserver supports register lists via XML files
    • s390: support for z14 GA 2 CPU model, Multiple-epoch and PTFF features now enabled in z14 CPU model by default
    • s390: vfio-ap: now supports hot plug/unplug, and no longer inhibits memory ballooning
    • s390: emulation support for floating-point extension facility and vector support instructions
    • x86: HAX accelerator now supports POSIX hosts other than Darwin, including Linux and NetBSD
    • x86: Q35: advertised PCIe root port speeds will now optimally default to maximum link speed (16GT/s) and width (x32) provided by PCIe 4.0 for QEMU 4.0+ machine types; older machine types will retain 2.5GT/s and x1 defaults for compatibility.
    • x86: Xen PVH images can now be booted with “-kernel” option
    • Xtensa: xtfpga: improved SMP support for linux (interrupt distributor, IPI, and runstall) and new SMP-capable test_mmuhifi_c3 core configuration
    • Xtensa: support for Flexible length instructions extension (FLIX)
    • GUI: new ‘-display spice-app’ to configure/launch a Spice client GUI with a similar UI to QEMU GTK. VNC server now supports access controls via tls-authz/sasl-authz options
    • QMP: support for “out-of-band” command execution, can be useful for postcopy migration recovery. Additional QMP commands for working with block devices and dirty bitmaps
    • VFIO: EDID interface for supported mdev (Intel vGPU for kernel 5.0+), allows resolution setting via xres/yres options.
    • Xen: new ‘xen-disk’ device which can create a Xen PV disk backend, and performance improvements for Xen PV disk backend.
    • Network Block Device: improved tracing and error diagnostics, improved client compatibility with buggy NBD server implementations, new --bitmap, --list, --tls-authz options for qemu-nbd
    • virtio-blk now supports DISCARD and WRITE_ZEROES
    • pvrdma device now supports RDMA Management Datagram services (MAD)
    • and lots more…

    Thank you to everyone involved!

    April 24, 2019 05:45 AM

    April 18, 2019

    Stefan Hajnoczi

    What's new in VIRTIO 1.1?

    The VIRTIO 1.1 specification has been published! This article covers the major new features in this specification.

    New Devices

    The following new devices are defined:

    • virtio-input is a Linux evdev input device (mouse, keyboard, joystick)
    • virtio-gpu is a 2D graphics device (with 3D support planned)
    • virtio-vsock is a host/guest socket communications device
    • virtio-crypto is a cryptographic accelerator device

    New Device Features




    New Core Features

    There is a new virtqueue memory layout called packed virtqueues. The old layout is called split virtqueues because the avail and used rings are separate from the descriptor table. The new packed virtqueue layout uses just a single descriptor table as the single ring. The layout is optimized for a friendlier CPU cache footprint and there are several features that devices can exploit for better performance.

    The VIRTIO_F_NOTIFICATION_DATA feature is an optimization mainly for hardware implementations of VIRTIO. The driver writes extra information as part of the Available Buffer Notification. Thanks to the information included in the notification, the device does not need to fetch this information from memory anymore. This is useful for PCI hardware implementations where minimizing DMA operations improves performance significantly.

    by Unknown at April 18, 2019 12:56 PM

    April 09, 2019

    Cole Robinson

    Host 'Network Interfaces' panel removed from virt-manager

    I released virt-manager 2.0.0 in October 2018. Since the release contained the full port to python3, it seemed like a good opportunity to drop some baggage from the app.

    The biggest piece we removed was the UI for managing host network interfaces. This is the Connection Details->Network Interfaces panel, and the New Interface wizard for defining host network definitions for things like bridges, bonds, and vlan devices. The main screen of the old UI looked like this:

    virt-manager host interfaces panel

    Some history

    Behind the scenes, this UI was using libvirt's Interface APIs, which also power the virsh iface-* commands. These APIs are little more than a wrapper around the netcf library.

    netcf aimed to be a linux distro independent API for network device configuration. On Red Hat distros this meant turning the API's XML format into an /etc/sysconfig/network script. There were even pie-in-the-sky ideas about NetworkManager one day using netcf.

    In practice though the library never really took off. It was years before a debian backend showed up, contributed by a Red Hatter in the hope of increasing library uptake, though it didn't seem to help. netcf basically only existed to serve the libvirt Interface APIs, yet those APIs were never really used by any major libvirt consuming app, besides virt-manager. And in virt-manager's case it was largely just slapping some UI over the XML format and lifecycle operations.

    For virt-manager's usecases we hoped that netcf would make it trivial to bridge the host's network interface, which when used with VMs would give them first class IP addresses on the host network setup, not NAT like the 'default' virtual network. Unfortunately though the UI would create the ifcfg files well enough, behind the scenes nothing played well with NetworkManager for years and years. The standard suggestion was to disable NetworkManager if you wanted to bridge your host NIC. Not very user friendly. Some people did manage to use the UI to that effect but it was never a trivial process.

    The state today

    Nowadays NetworkManager can handle bridging natively and is much more powerful than what virt-manager/libvirt/netcf provide. The virt-manager UI was more likely to shoot you in the foot than make things simple. And it had become increasingly clear that virt-manager was not the place to maintain host network config UI.
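For comparison, the bridged-NIC setup the old UI struggled with can be sketched today with NetworkManager alone (interface and connection names are placeholders):

```shell
# create the bridge, enslave the physical NIC, bring it up
nmcli connection add type bridge ifname br0 con-name br0
nmcli connection add type bridge-slave ifname eth0 master br0
nmcli connection up br0
```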

    So we made the decision to drop all this from virt-manager in 2.0.0. netcf and the libvirt interface APIs still exist. If you're interested in some more history on the interface API/netcf difficulties, check out Laine's email to virt-tools-list.

    by Cole Robinson at April 09, 2019 06:01 PM

    Last updated: November 12, 2019 08:13 AM