Blogging about open source virtualization

News from QEMU, KVM, libvirt, libguestfs, virt-manager and related tools


Planet Feeds

October 04, 2018

Cole Robinson

Setting custom network names on Fedora

systemd's predictable network interface names give us host interface names like enp3s0. On one of my hosts, I have two interfaces: one that is my regular hard-wired connection, and another I only plug in occasionally for some virt network testing. I can never remember the systemd names, so I want to rename the interfaces to something more descriptive for my needs: in my case, lan0main and lan1pcie.

The page referenced says to use systemd links. However, after struggling with that for a while, I'm fairly sure that's only relevant to systemd-networkd usage and doesn't apply to Fedora's default use of NetworkManager. So I needed another way.

Long story short I ended up with some custom udev rules that are patterned after the old 70-persistent-net.rules file:

$ cat /etc/udev/rules.d/99-cole-nic-names.rules 
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="70:8b:cd:80:e5:5f", ATTR{type}=="1", NAME="lan0main"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="68:05:ca:1a:f5:da", ATTR{type}=="1", NAME="lan1pcie"
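If you later forget which MAC is pinned to which name, the pairs can be pulled back out of the rules file. A minimal sketch (it works on a throwaway copy of one rule so it runs anywhere; on the real host, point it at /etc/udev/rules.d/99-cole-nic-names.rules):

```shell
# Sketch: list the MAC -> name pairs pinned by a udev rules file.
rules=$(mktemp)
cat > "$rules" <<'EOF'
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="70:8b:cd:80:e5:5f", ATTR{type}=="1", NAME="lan0main"
EOF
# Extract the ATTR{address} value and the NAME value from each rule line.
pairs=$(sed -n 's/.*ATTR{address}=="\([^"]*\)".*NAME="\([^"]*\)".*/\1 \2/p' "$rules")
echo "$pairs"
rm -f "$rules"
```

The addresses can then be compared against `ip link` output. Note the rules only take effect on the next "add" event for the device, i.e. after a re-plug or reboot.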

by Cole Robinson at October 04, 2018 09:27 PM

October 03, 2018

KVM on Z

RHEL 7.5 Beta supports KVM on Z

The Red Hat Enterprise Linux 7.5 Beta ships with support for KVM on Z through the kernel-alt packages, which essentially provide Linux kernel 4.14.
Here is the respective section from the release notes:
KVM virtualization is now supported on IBM z Systems. However, this feature is only available in the newly introduced user space based on kernel version 4.14, provided by the kernel-alt packages.
See here for further details.

by Stefan Raspl at October 03, 2018 09:14 AM

October 01, 2018

KVM on Z

Knowledge Series: Black Box Guest Analysis Using kvm_stat

Another new entry in our Knowledge Series details how to gain insights into black box KVM guests using kvm_stat.

by Stefan Raspl at October 01, 2018 12:12 PM

September 24, 2018

KVM on Z

Knowledge Series: How to use vnc for Guest Installs

A new entry in our Knowledge Series details how to use VNC for graphical installs, exemplified using RHEL 7.5.

by Stefan Raspl at September 24, 2018 08:05 PM

September 13, 2018

Richard Jones

Creating Windows templates for virt-builder

virt-builder is a tool for rapidly creating customized Linux images. Recently I’ve added support for Windows, although for rather obvious licensing reasons we cannot distribute the Windows templates that would be needed to provide Windows support for everyone. However, you can build your own Windows templates as described here and then:

$ virt-builder -l | grep windows
windows-10.0-server      x86_64     Windows Server 2016 (x86_64)
windows-6.2-server       x86_64     Windows Server 2012 (x86_64)
windows-6.3-server       x86_64     Windows Server 2012 R2 (x86_64)
$ virt-builder windows-6.3-server
[   0.6] Downloading: http://xx/builder/windows-6.3-server.xz
[   5.1] Planning how to build this image
[   5.1] Uncompressing
[  60.1] Opening the new disk
[  77.6] Setting a random seed
virt-builder: warning: random seed could not be set for this type of guest
virt-builder: warning: passwords could not be set for this type of guest
[  77.6] Finishing off
                   Output file: windows-6.3-server.img
                   Output size: 10.0G
                 Output format: raw
            Total usable space: 9.7G
                    Free space: 3.5G (36%)

To build a Windows template repository you will need the latest libguestfs sources checked out from and you will also need a suitable Windows Volume License, KMS or MSDN developer subscription. Also the final Windows templates are at least ten times larger than Linux templates, so virt-builder operations take correspondingly longer and use lots more disk space.

First download install ISOs for the Windows guests you want to use.

After cloning the latest libguestfs sources, go into the builder/templates subdirectory. Edit the top of the script to set the path containing the Windows ISOs. You may also need to edit the names of the ISOs later in the script.

Build a template, e.g.:

$ ../../run ./ windows 2k12 x86_64

You’ll need to read the script to understand what the arguments do. The script will ask you for the product key, where you should enter the volume license key or your MSDN key.

Each time you run the script successfully you’ll end up with two files called something like:


The version numbers are Windows internal version numbers.

After you’ve created templates for all the Windows guest types you need, copy them to any (private) web server, and concatenate all the index fragments into the final index file:

$ cat *.index-fragment > index

Finally create a virt-builder repo file pointing to this index file:

# cat /etc/virt-builder/repos.d/windows.conf
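The contents of the repo file aren't shown above. As an assumption based on virt-builder's repos.d configuration format (a named section with a uri key; the server URL here is a placeholder), a minimal file pointing at a private index might look roughly like:

```ini
[windows]
uri=http://example.com/builder/index
```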

You can now create Windows guests in virt-builder. However, note they are not sysprepped. We can’t do this because it requires some Windows tooling. So while these guests are good for small tests and similar, they’re not suitable for creating long-lived Windows VMs. To do that you will need to add a sysprep.exe step somewhere in the template creation process.

by rich at September 13, 2018 09:07 AM

September 11, 2018

KVM on Z

2018 Linux on IBM Z and LinuxONE Workshop, Poughkeepsie, NY

Meet us at this event, taking place November 5-6, 2018, at IBM Poughkeepsie, NY. See the full announcement here.
Naturally, KVM on IBM Z will be covered by both presentations and workgroup sessions.

Find the agenda here.

Registration is open here until October 25.

by Stefan Raspl at September 11, 2018 08:25 AM

September 10, 2018

KVM on Z

libvirt v4.7.0 released

libvirt v4.7.0, available for download at the libvirt project website, adds support for vsock for CCW.
For a full usage example and related information, see this article in our Knowledge series.

by Stefan Raspl at September 10, 2018 02:17 PM

Thomas Huth

QEMU's instance_init() vs. realize()

Note that this is a blog post for (new) QEMU developers. If you are just interested in using QEMU, you can certainly skip this text. Otherwise, in case you have ever been in touch with the QEMU device model (“qdev”), you are likely aware of the basic qdev code boilerplate already:

static void mydev_realize(DeviceState *dev, Error **errp)
{
    /* callback function that is run during device "realization" */
}

static void mydev_instance_init(Object *obj)
{
    /* callback function that is run during device instance init */
}

static Property mydev_properties[] = {
    DEFINE_PROP_xxx("myprop", MyDevState, field, ...),
    /* ... */
    DEFINE_PROP_END_OF_LIST(),
};

static void mydev_class_init(ObjectClass *oc, void *data)
{
    DeviceClass *dc = DEVICE_CLASS(oc);

    dc->realize = mydev_realize;
    dc->desc = "My cool device";
    dc->props = mydev_properties;
    /* ... and other device class setup code ... */
}

static const TypeInfo mydev_info = {
    .name          = TYPE_MYDEV,
    .parent        = TYPE_SYS_BUS_DEVICE,  /* or something else */
    .instance_size = sizeof(mydev_state),
    .instance_init = mydev_instance_init,
    .class_init    = mydev_class_init,
};

static void mydev_register_types(void)
{
    type_register_static(&mydev_info);
}
type_init(mydev_register_types)

There are three different initialization functions involved here, the class_init, the instance_init and the realize function. While it is quite obvious to distinguish the class_init function from the two others (it is used for initializing the class data, not the data that is used for an instance … this is similar to the object model with classes and instances in C++), I initially always wondered about the difference between the instance_init() and the realize() functions. Having fixed quite a lot of related bugs in the past months in the QEMU code base, I now know that a lot of other people are also not properly aware of the difference here, so I think it is now time to write down some information that I’m now aware of, to make sure that I don’t forget about this again, and maybe help others to avoid related bugs in the future ;-)

First it is of course always a good idea to have a look at the documentation. While the documentation of TypeInfo (where instance_init() is defined) is not very helpful to understand the differences, the documentation of DeviceClass (where realize() is defined) has some more useful information: You can learn here that the object instantiation is done first, before the device is realized, i.e. the instance_init() function is called first, and the realize() function is called afterwards. The former must not fail, while the latter can return an error to its caller via a pointer to an “Error” object pointer.

So the basic idea here is that device objects are first instantiated, then these objects can be inspected for their interfaces and their creators can set up their properties to configure their settings and wire them up with other devices, before the device finally becomes “active” by being realized. It is important here to notice that devices can be instantiated (and also finalized) without being realized! This happens for example if the device is introspected: If you enter for example device_add xyz,help at the HMP monitor, or if you send the device-list-properties QOM command to QEMU to retrieve the device’s properties, QEMU creates a temporary instance of the device to query the properties of the object, without realizing it. The object gets destroyed (“finalized”) immediately afterwards.

Knowing this, you can avoid a set of bugs which could be found with a couple of devices in the past:

  • If you want your device to provide properties for other parts of the QEMU code or for the users, and you want to add those properties via one of the many object_property_add*() functions of QEMU (instead of using the “props” field of the DeviceClass), then you should do this in the instance_init() and not in the realize() function. Otherwise the properties won’t show up when the user runs --device xyz,help or the device-list-properties QOM command to get some information about your device.

  • instance_init() functions must really never fail, i.e. also not call abort() or exit(). Otherwise QEMU can terminate unexpectedly when a user simply wanted to have a look at the list of device properties with device_add xyz,help or the device-list-properties QOM command. If your device cannot work in certain circumstances, check for the error condition in the realize() function instead and return with an appropriate error there.

  • Never assume that your device is always instantiated only with the machine that it was designed for. It’s of course a good idea to set the “user_creatable = false” flag in the DeviceClass of your device if your device cannot be plugged in arbitrary machines. But device introspection can still happen at any time, with any machine. So if you wrote a device called “mydev-a” that only works with --machine A, the user still can start QEMU with the option --machine B instead and then run device_add mydev-a,help or the device-list-properties QOM command. The instance_init() function of your device will be called to create a temporary instance of your device, even though the base machine is B and not A here. So you especially should take care to not depend on the availability of certain buses or other devices in the instance_init() function, nor use things like serial_hd() or nd_table[] in your instance_init() function, since these might (and should) have been used by the machine init function already. If your device needs to be wired up, provide properties as interfaces to the outside and let the creator of your device (e.g. the machine init code) wire your device between the device instantiation and the realize phase instead.

  • Make sure that your device leaves a clean state after a temporary instance is destroyed again, i.e. don’t assume that there will be only one instance of your device which is created at the beginning right after QEMU has been started and is destroyed at the very end before QEMU terminates. Thus do not assume that the things that you do in your instance_init() don’t need explicit clean-up since the device instance will only be destroyed when QEMU terminates. Device instances can be created and destroyed at any time, so when the device is finalized, you must not leave any dangling pointers or references to your device behind you, e.g. in the QOM tree. When you create other objects in your instance_init() function, make sure to set proper parents of these objects or use an instance_finalize() function, so that the created objects get cleaned up correctly again when your device is destroyed.

All in all, if you write code for a new QEMU device, it is likely a good idea to use the instance_init() function only for e.g. creating properties and other things that are required before device realization, and then do the main work in the realize() function instead.

September 10, 2018 01:05 PM

September 06, 2018

KVM on Z

QEMU v2.11 released

QEMU v2.11 is out. Here are the highlights from a KVM on Z perspective:
  • TOD-Clock Epoch Extension Support: Extends the TOD clock beyond the year 2042.
  • Setting sysctl vm.allocate_pgste is now superfluous.
  • Netboot: The network boot firmware sets the client architecture option (93) in the DHCP request to 0x1f ("s390 Basic"). This allows a DHCP server to deliver the correct boot image for IBM Z guests. This is useful in situations where a single DHCP server has to provide network boot images for multiple architectures, e.g. for the purpose of installing operating systems.
  • Added support for virtio-input-ccw and virtio-gpu-ccw. These newly supported devices lay the foundation for applications that require graphical interfaces, which thereby become usable remotely via VNC or SPICE.
    Here is a sample XML snippet for a guest definition:

        <input type='keyboard' bus='virtio'/>
        <graphics type='vnc' port='-1' autoport='yes' listen=''>
          <listen type='address' address=''/>
        </graphics>
        <video>
          <model type='virtio' heads='1' primary='yes'/>
        </video>

by Stefan Raspl at September 06, 2018 03:46 PM

August 26, 2018

KVM on Z

QEMU v3.0 released

QEMU v3.0 is out. Besides a number of small enhancements, here are some items we would like to highlight from a KVM on Z perspective:

  • A new CPU model representing the IBM z14 Model ZR1 was added (long name: IBM z14 Model ZR1 GA1).
  • Re-use your existing infrastructure for LPAR installs by utilizing the newly added support for .INS files in network boot.

by Stefan Raspl at August 26, 2018 07:47 AM

August 21, 2018

Gerd Hoffmann

USB recommendations for qemu

A collection of tips on using usb with qemu.

Picking a host adapter

The short answer for this one is: Unless you are running an operating system museum just use -device qemu-xhci.

Any recent operating system should support xhci out-of-the box. The only OS without xhci support which is still in widespread use is Windows 7.

In case your qemu version doesn't support qemu-xhci you can use nec-usb-xhci instead.

The -usb command line switch adds usb controllers matching the emulated hardware platform. So for the 'pc' machine type (emulating a 20+ year old i440FX chipset) this is a uhci host adapter (supporting usb1). For the 'q35' machine type (emulating an almost 10 year old Q35 chipset) it is ehci (for usb2 devices) with uhci companions (for usb1 devices). This is what you can use when running old guests which lack xhci support.

When using xhci you should better not use -usb, because you would get two usb busses then. That is a valid configuration, but it requires naming the usb host adapter and specifying the usb bus when adding usb devices if you want to avoid qemu picking a random usb bus:

-device qemu-xhci,id=xhci -device usb-tablet,bus=xhci.0

With a single usb bus you can just say -device usb-tablet and be done with it.

Not enough usb ports?

Qemu can emulate a usb hub (-device usb-hub). But the hub supports usb1 only, so you should avoid using it. A better solution is to just increase the number of root ports. xhci has four root ports by default, but it supports up to 15 ports. And in case this still isn't enough, a second xhci adapter can be added to the virtual machine.

To create a host adapter with 8 ports use -device qemu-xhci,p2=8,p3=8. The libvirt configuration is:

<controller type='usb' model='qemu-xhci' ports='8'/>

In case you wonder why qemu-xhci needs both p2 and p3 parameters: p2 specifies the number of usb2 ports (which support usb1 too), and p3 specifies the number of usb3 ports. It is possible to assign different counts here. When using -device qemu-xhci,p2=8,p3=4 you'll get an xhci adapter where ports 1-4 support both usb2 and usb3 and ports 5-8 are usb2-only. This can be used to force a usb3-capable usb device into usb2 mode by plugging it into a usb2-only xhci port. There should rarely be a need to actually do that in practice though.

by Gerd Hoffmann at August 21, 2018 10:00 PM

August 17, 2018

Daniel Berrange

ANNOUNCE: gtk-vnc 0.9.0 release

I’m pleased to announce a new release of GTK-VNC, version 0.9.0. This is a cleanup/modernization release. Note that the next release (1.0.0) will drop support for GTK-2

  • Requires gnutls >= 3.1.18
  • Requires libgcrypt >= 1.5.0
  • Requires glib2 >= 2.42.0
  • Use libgcrypt for DES routines
  • Add missing cipher close calls in ARD auth
  • Check for errors after reading mslogon params
  • Support newer UltraVNC mslogon auth type code
  • Avoid divide by zero in mslogin auth from bogus params
  • Re-allow python2 accidentally blocked when removing python binding

Thanks to all those who reported bugs and provided patches that went into this new release.

by Daniel Berrange at August 17, 2018 04:01 PM

August 15, 2018

QEMU project

QEMU version 3.0.0 released

We’d like to announce the availability of the QEMU 3.0.0 release. This release contains 2300+ commits from 169 authors.

A note from the maintainer: Why 3.0? Well, we felt that our version numbers were getting a bit unwieldy, and since this year is QEMU’s 15th birthday it seemed like a good excuse to roll over the major digit. Going forward we plan to increment the major version once a year, for the first release of the year. Don’t read too much into it: it doesn’t imply a drastic compatibility break. Rumours of our triskaidekaphobia have been greatly exaggerated ;-)

You can grab the tarball from our download page. The full list of changes are available in the Wiki.

Highlights include:

  • Support for additional x86/AMD mitigations against Speculative Store Bypass (Spectre Variant 4, CVE-2018-3639)
  • Improved support for nested KVM guests running on Hyper-V
  • Block device support for active disk-mirroring, which avoids convergence issues which may arise when doing passive/background mirroring of busy devices
  • Improved support for AHCI emulation, SCSI emulation, and persistent reservations / cluster management
  • OpenGL ES support for SDL front-end, additional framebuffer device options for early boot display without using legacy VGA emulation
  • Live migration support for TPM TIS devices, capping bandwidth usage during post-copy migration, and recovering from a failed post-copy migration
  • Improved latency when using user-mode networking / SLIRP
  • ARM: support for SMMUv3 IOMMU when using ‘virt’ machine type
  • ARM: v8M extensions for VLLDM and VLSTM floating-point instructions, and improved support for AArch64 v8.2 FP16 extensions
  • ARM: support for Scalable Vector Extensions in linux-user mode
  • Microblaze: support for 64-bit address sizes and translation bug fixes
  • PowerPC: PMU support for mac99 machine type and improvements for Uninorth PCI host bridge emulation for Mac machine types
  • PowerPC: preliminary support for emulating POWER9 hash MMU mode when using powernv machine type
  • RISC-V: improvement for privileged ISA emulation
  • s390: support for z14 ZR1 CPU model
  • s390: bpb/ppa15 Spectre mitigations enabled by default for z196 and later CPU models
  • s390: support for configuring consoles via -serial options
  • and lots more…

Thank you to everyone involved!

August 15, 2018 11:25 AM

August 01, 2018

Daniel Berrange

ANNOUNCE: gtk-vnc 0.8.0 release

I’m pleased to announce a new release of GTK-VNC, version 0.8.0. This is a small maintenance release tidying up some loose ends

  • Deleted the python2 binding in favour of GObject introspection
  • Pull in latest keycodemapdb content
  • Disable/fix -Wcast-function-type warnings

Thanks to all those who reported bugs and provided patches that went into this new release.

by Daniel Berrange at August 01, 2018 04:45 PM

Cornelia Huck

s390x changes in QEMU 3.0

QEMU 3.0 is currently in the late -rc phase (with the final release expected early/mid August), so here's a quick summary of what has been changed for s390x.

CPU models

  • A CPU model for the z14 Model ZR1 has been added. This is the "small", single-frame z14.
  • The feature bits for Spectre mitigation (bpb and ppa15) are now included in the default CPU model for z196 and up. This means that these features will be available to the guest (given the host supports them) without needing to specify them explicitly.


Devices

  • You can now configure consoles via -serial as well.
  • vfio-ccw devices have gained a "force-orb-pfch" property. This is not very useful for Linux guests, but if you are trying to use vfio-ccw with a guest that does not specify "unlimited prefetch" for its requests but does not actually rely on the semantics, this will help you. Adding support to vfio-ccw to accommodate channel programs that must not be prefetched is unfortunately not straightforward and will not happen in the foreseeable future.

Booting and s390 bios

  • The s390-netboot image has been enhanced: It now supports indirect loading via .INS files and pxelinux.cfg-style booting.
  • The boot menu can now also deal with non-sequential entries.


Miscellaneous

  • Handling of the TOD clock in tcg has been improved; CPU hotplug under tcg is now working.
  • And the usual fixes, cleanups and improvements.

by Cornelia Huck at August 01, 2018 11:36 AM

July 27, 2018

Daniel Berrange

ANNOUNCE: virt-viewer 7.0 release

I am happy to announce a new bugfix release of virt-viewer 7.0 (gpg), including experimental Windows installers for Win x86 MSI (gpg) and Win x64 MSI (gpg). The virsh and virt-viewer binaries in the Windows builds should now successfully connect to libvirtd, following fixes to libvirt’s mingw port.

Signatures are created with key DAF3 A6FD B26B 6291 2D0E 8E3F BE86 EBB4 1510 4FDF (4096R)

All historical releases are available from:

Changes in this release include:

  • Require spice-gtk >= 0.35
  • Clarify error message when no running VM is found
  • Improve check for libgovirt requirements
  • Support “-” as a URI for input connection file
  • Remove support for spice controller interface
  • Misc man page fixes
  • Lookup win32 translations relative to install dir
  • Position connect window in center not top-left
  • Misc fixes for ovirt foreign menu support

Thanks to everyone who contributed towards this release.

by Daniel Berrange at July 27, 2018 03:23 PM

July 23, 2018

Nathan Gauër

Vulkan-ize Virglrenderer - experiment

Virglrenderer provides OpenGL acceleration to a guest running on QEMU.

My current GSoC project is to add support for the Vulkan API.

Vulkan is drastically different from OpenGL, so this addition is not straightforward. My current idea is to add an alternative path for Vulkan: two different states are kept, one for OpenGL and one for Vulkan, and commands go either to the OpenGL or the Vulkan front-end.

For now, only compute shaders are supported. The work is divided in two parts: a Vulkan ICD in MESA, and a new front-end for Virgl and vtest.

If you have any feedback, do not hesitate to share it!

This experiment can be tested using this repository. If you have an Intel driver in use, you might be able to use the Dockerfile provided.

Each part is also available independently:

July 23, 2018 10:00 PM

July 20, 2018

KVM on Z

Knowledge: Use virt-install for KVM Guest Installations

Tired of fiddling with guest XML files and virtual DJing with ISO images? Then the latest entry in our Knowledge Series might come in handy! Plus, it also offers access to further Linux distributions on IBM Z!
Read more on how virt-install can help here.

by Stefan Raspl at July 20, 2018 10:59 PM

June 29, 2018

Daniel Berrange

CPU model configuration for QEMU/KVM on x86 hosts

With the various CPU hardware vulnerabilities reported this year, guest CPU configuration is now a security critical task. This blog post contains content I’ve written that is on its way to become part of the QEMU documentation.

QEMU / KVM virtualization supports two ways to configure CPU models:

Host passthrough
This passes the host CPU model features, model, and stepping exactly to the guest. Note that KVM may filter out some host CPU model features if they cannot be supported with virtualization. Live migration is unsafe when this mode is used, as libvirt / QEMU cannot guarantee a stable CPU is exposed to the guest across hosts. This is the recommended CPU model to use, provided live migration is not required.
Named model
QEMU comes with a number of predefined named CPU models, that typically refer to specific generations of hardware released by Intel and AMD. These allow the guest VMs to have a degree of isolation from the host CPU, allowing greater flexibility in live migrating between hosts with differing hardware.

In both cases, it is possible to optionally add or remove individual CPU features, to alter what is presented to the guest by default.

Libvirt supports a third way to configure CPU models known as “Host model”. This uses the QEMU “Named model” feature, automatically picking a CPU model that is similar to the host CPU, and then adding extra features to approximate the host model as closely as possible. This does not guarantee the CPU family, stepping, etc will precisely match the host CPU, as they would with “Host passthrough”, but gives much of the benefit of passthrough, while making live migration safe.

Recommendations for KVM CPU model configuration on x86 hosts

The information that follows provides recommendations for configuring CPU models on x86 hosts. The goals are to maximise performance, while protecting guest OS against various CPU hardware flaws, and optionally enabling live migration between hosts with heterogeneous CPU models.

Preferred CPU models for Intel x86 hosts

The following CPU models are preferred for use on Intel hosts. Administrators / applications are recommended to use the CPU model that matches the generation of the host CPUs in use. In a deployment with a mixture of host CPU models between machines, if live migration compatibility is required, use the newest CPU model that is compatible across all desired hosts.

Intel Xeon Processor (Skylake, 2016)
Intel Core Processor (Skylake, 2015)
Intel Core Processor (Broadwell, 2014)
Intel Core Processor (Haswell, 2013)
Intel Xeon E3-12xx v2 (Ivy Bridge, 2012)
Intel Xeon E312xx (Sandy Bridge, 2011)
Westmere E56xx/L56xx/X56xx (Nehalem-C, 2010)
Intel Core i7 9xx (Nehalem Class Core i7, 2008)
Intel Core 2 Duo P9xxx (Penryn Class Core 2, 2007)
Intel Celeron_4x0 (Conroe/Merom Class Core 2, 2006)

Important CPU features for Intel x86 hosts

The following are important CPU features that should be used on Intel x86 hosts, when available in the host CPU. Some of them require explicit configuration to enable, as they are not included by default in some, or all, of the named CPU models listed above. In general all of these features are included if using “Host passthrough” or “Host model”.

pcid
Recommended to mitigate the cost of the Meltdown (CVE-2017-5754) fix. Included by default in Haswell, Broadwell & Skylake Intel CPU models. Should be explicitly turned on for Westmere, SandyBridge, and IvyBridge Intel CPU models. Note that some desktop/mobile Westmere CPUs cannot support this feature.

spec-ctrl
Required to enable the Spectre (CVE-2017-5753 and CVE-2017-5715) fix, in cases where retpolines are not sufficient. Included by default in Intel CPU models with -IBRS suffix. Must be explicitly turned on for Intel CPU models without -IBRS suffix. Requires the host CPU microcode to support this feature before it can be used for guest CPUs.

ssbd
Required to enable the CVE-2018-3639 fix. Not included by default in any Intel CPU model. Must be explicitly turned on for all Intel CPU models. Requires the host CPU microcode to support this feature before it can be used for guest CPUs.

pdpe1gb
Recommended to allow guest OS to use 1GB size pages. Not included by default in any Intel CPU model. Should be explicitly turned on for all Intel CPU models. Note that not all CPU hardware will support this feature.

Preferred CPU models for AMD x86 hosts

The following CPU models are preferred for use on AMD hosts. Administrators / applications are recommended to use the CPU model that matches the generation of the host CPUs in use. In a deployment with a mixture of host CPU models between machines, if live migration compatibility is required, use the newest CPU model that is compatible across all desired hosts.

AMD EPYC Processor (2017)
AMD Opteron 63xx class CPU (2012)
AMD Opteron 62xx class CPU (2011)
AMD Opteron 23xx (Gen 3 Class Opteron, 2009)
AMD Opteron 22xx (Gen 2 Class Opteron, 2006)
AMD Opteron 240 (Gen 1 Class Opteron, 2004)

Important CPU features for AMD x86 hosts

The following are important CPU features that should be used on AMD x86 hosts, when available in the host CPU. Some of them require explicit configuration to enable, as they are not included by default in some, or all, of the named CPU models listed above. In general all of these features are included if using “Host passthrough” or “Host model”.

ibpb
Required to enable the Spectre (CVE-2017-5753 and CVE-2017-5715) fix, in cases where retpolines are not sufficient. Included by default in AMD CPU models with -IBPB suffix. Must be explicitly turned on for AMD CPU models without -IBPB suffix. Requires the host CPU microcode to support this feature before it can be used for guest CPUs.

virt-ssbd
Required to enable the CVE-2018-3639 fix. Not included by default in any AMD CPU model. Must be explicitly turned on for all AMD CPU models. This should be provided to guests, even if amd-ssbd is also provided, for maximum guest compatibility. Note for some QEMU / libvirt versions, this must be force enabled when using “Host model”, because this is a virtual feature that doesn’t exist in the physical host CPUs.

amd-ssbd
Required to enable the CVE-2018-3639 fix. Not included by default in any AMD CPU model. Must be explicitly turned on for all AMD CPU models. This provides higher performance than virt-ssbd, so it should be exposed to guests whenever available in the host. virt-ssbd should nonetheless also be exposed for maximum guest compatibility, as some kernels only know about virt-ssbd.

amd-no-ssb
Recommended to indicate the host is not vulnerable to CVE-2018-3639. Not included by default in any AMD CPU model. Future hardware generations of CPU will not be vulnerable to CVE-2018-3639, and thus the guest should be told not to enable its mitigations, by exposing amd-no-ssb. This is mutually exclusive with virt-ssbd and amd-ssbd.

pdpe1gb
Recommended to allow guest OS to use 1GB size pages. Not included by default in any AMD CPU model. Should be explicitly turned on for all AMD CPU models. Note that not all CPU hardware will support this feature.

Default x86 CPU models

The default QEMU CPU models are designed such that they can run on all hosts. If an application does not wish to perform any host compatibility checks before launching guests, the default is guaranteed to work.

The default CPU models will, however, leave the guest OS vulnerable to various CPU hardware flaws, so their use is strongly discouraged. Applications should follow the earlier guidance to setup a better CPU configuration, with host passthrough recommended if live migration is not needed.

QEMU Virtual CPU version 2.5+ (32 & 64 bit variants). qemu64 is used for x86_64 guests and qemu32 is used for i686 guests, when no -cpu argument is given to QEMU, or no <cpu> is provided in libvirt XML.

Other non-recommended x86 CPUs

The following CPU models are compatible with most AMD and Intel x86 hosts, but their usage is discouraged, as they expose a very limited featureset, which prevents guests from having optimal performance.

Common KVM processor (32 & 64 bit variants). Legacy models just for historical compatibility with ancient QEMU versions.
Various very old x86 CPU models, mostly predating the introduction of hardware assisted virtualization, that should thus not be required for running virtual machines.

Syntax for configuring CPU models

The examples below illustrate the approach to configuring the various CPU models / features in QEMU and libvirt.

QEMU command line

Host passthrough
   $ qemu-system-x86_64 -cpu host

With feature customization:

   $ qemu-system-x86_64 -cpu host,-vmx,...
Named CPU models
   $ qemu-system-x86_64 -cpu Westmere

With feature customization:

   $ qemu-system-x86_64 -cpu Westmere,+pcid,...

Libvirt guest XML

Host passthrough
   <cpu mode='host-passthrough'/>

With feature customization:

   <cpu mode='host-passthrough'>
       <feature name="vmx" policy="disable"/>
   </cpu>
Host model
   <cpu mode='host-model'/>

With feature customization:

   <cpu mode='host-model'>
       <feature name="vmx" policy="disable"/>
   </cpu>
Named model
   <cpu mode='custom'>
       <model>Westmere</model>
   </cpu>

With feature customization:

   <cpu mode='custom'>
       <model>Westmere</model>
       <feature name="pcid" policy="require"/>
   </cpu>


by Daniel Berrange at June 29, 2018 12:49 PM

KVM on Z

Getting Started: Alpine Linux Instructions added

Instructions for Alpine Linux were added to the Getting Started with KVM on Z series.
See here for the actual page.

by Stefan Raspl at June 29, 2018 09:58 AM

June 21, 2018

QEMU project


Ever since I started working on QEMU, a small directory called tests/tcg has been in a perpetually broken state. It contains tests that exercise QEMU’s ability to work across architectures using the power of the Tiny Code Generator. However, as these tests needed to be compiled for the guest architectures and not the host architecture—this is known as cross-compiling—most developers never ran them. As the tests were hardly ever built, inevitably a certain amount of bit-rot set in.

Cross Compilers

In the old days, cross-compilation setups were almost all hand-crafted affairs which involved building versions of binutils, gcc and a basic libc. If you couldn’t get someone to give you a pre-built tarball, it was something you laboured through once and hopefully never had to touch again. There were even dedicated scripts like crosstool-ng which attempted to make the process of patching and configuring your toolchain easier.

While the distributions have improved their support for cross compilers over the years, there are still plenty of variations in how they are deployed. It is hard for a project like QEMU, which has to build on a wide range of operating systems and architectures, to seamlessly use any given distribution's compiler setup. However for those with cross compilers at hand, configure now accepts two additional flags:


With a compiler specified for each guest architecture you want to test the build system can now build and run the tests. For developers that don’t have cross compilers around, they can take advantage of QEMU’s docker images.

Enter Docker Containers

If you work in IT you would be hard pressed not to have noticed the hype around Docker and the concept of containerisation over the last few years. Put simply, containers allow you to define a known working set of software that gets run in an isolated environment for a given task. While this has many uses, for QEMU it allows us to define build environments that any developer can run without having to mess around with their preferred host setup.

Over the last few years QEMU’s build system has been expanding the number of docker images it supports. Most of this has been in service of our CI testing such as Patchew and Shippable but any developer with a docker setup can run the exact same images. For example if you want to check your patches won’t break when compiled on a 32 bit ARM system you can run:

make docker-test-build@debian-armhf-cross J=n

instead of tracking down a piece of ARM hardware to actually build on. Run make docker in your source tree to see the range of builds and tests it can support.

make check-tcg

With the latest work merged into master we can now take advantage of either hand-configured or Docker-based cross compilers to build test cases for TCG again. To run the TCG tests after you have built QEMU:

make check-tcg

and the build system will build and run all the tests it can for your configured targets.

Rules for tests/tcg

So now that we have the infrastructure in place, what rules need to be followed to add new tests?

Well the first thing to note is currently all the tests are for the linux-user variant of QEMU. This means the tests are all currently user-space tests that have access to the Linux syscall ABI.

Another thing to note is the tests are separate from the rest of the QEMU test infrastructure. To keep things simple they are compiled as standalone “static” binaries. As the cross-compilation setup can be quite rudimentary for some of the rarer architectures, we only compile against a standard libc. There is no support for linking to other libraries such as glib. Thread and maths support is part of glibc so shouldn’t be a problem.

Finally when writing new tests consider if it really is architecture specific or can be added to tests/tcg/multiarch. The multiarch tests are re-built for every supported architecture and should be the default place for anything that tests syscalls or other common parts of the code base.

What’s next

My hope with this work is that we can start adding more tests to systematically defend functionality in linux-user. In fact I hope the first port of call for reproducing a crash would be writing a test case that can be added to our growing library of tests.

Another thing that needs sorting out is getting toolchains for all of the less common architectures. The current work relies heavily on the excellent work of the Debian toolchain team in making multiarch aware cross compilers available in their distribution. However QEMU supports a lot more architectures than Debian, some only as system emulations. In principle supporting them is as easy as adding another docker recipe but it might be these recipes end up having to compile the compilers from source.

The tests/tcg directory still contains a number of source files we don’t build.

The cris and openrisc directories contain user-space tests which just need the support of a toolchain and the relevant Makefile plumbing to be added.

The lm32, mips and xtensa targets have a set of tests that need a system emulator. Aside from adding the compilers as docker images some additional work is needed to handle the differences between plain linux-user tests which can simply return an exit code to getting the results from a qemu-system emulation. Some architectures have semi-hosting support already for this while others report their test status over a simple serial link which will need to be parsed and handled in custom versions of the run-%: rule.

by Alex Bennée at June 21, 2018 02:30 PM

June 10, 2018

KVM on Z

KVM at SHARE St. Louis 2018

Yes, we will be at SHARE in St. Louis this August!
See our sessions in the Linux and VM track as follows:

by Stefan Raspl at June 10, 2018 09:17 PM

May 31, 2018

QEMU project

QEMU’s new -nic command line option

If you used QEMU in the past, you are probably familiar with the -net command line option, which can be used to configure a network connection for the guest, or with the -netdev option, which configures a network back-end. Yet, QEMU v2.12 introduces a third way to configure NICs, the -nic option.

The ChangeLog of QEMU v2.12 says that -nic can “quickly create a network front-end (emulated NIC) and a host back-end”. But why did QEMU need yet another way to configure the network, and how does it compare with -net and -netdev? To answer these questions, we need to look at the model behind network virtualization in QEMU.

As hinted by the ChangeLog entry, a network interface consists of two separate entities:

  1. The emulated hardware that the guest sees, i.e. the so-called NIC (network interface controller). On systems that support PCI cards, these typically could be an e1000 network card, a rtl8139 network card or a virtio-net device. This entity is also called the “front-end”.

  2. The network back-end on the host side, i.e. the interface that QEMU uses to exchange network packets with the outside (like other QEMU instances or other real hosts in your intranet or the internet). The common host back-ends are the “user” (a.k.a. SLIRP) back-end which provides access to the host’s network via NAT, the “tap” back-end which allows the guest to directly access the host’s network, or the “socket” back-end which can be used to connect multiple QEMU instances to simulate a shared network for their guests.

Based on this, it is already possible to define the most obvious difference between -net, -netdev and -nic: the -net option can create either a front-end or a back-end (and also does other things); -netdev can only create a back-end; while a single occurrence of -nic will create both a front-end and a back-end. But for the non-obvious differences, we also need to have a detailed look at the -net and -netdev options first …

The legacy -net option

QEMU’s initial way of configuring the network for the guest was the -net option. The emulated NIC hardware can be chosen with the -net nic,model=xyz,... parameter, and the host back-end with the -net <backend>,... parameter (e.g. -net user for the SLIRP back-end). However, the emulated NIC and the host back-end are not directly connected. They are rather both connected to an emulated hub (called “vlan” in older versions of QEMU). Therefore, if you start QEMU with -net nic,model=e1000 -net user -net nic,model=virtio -net tap for example, you get a setup where all the front-ends and back-ends are connected together via a hub:

Networking with -net

That means the e1000 NIC also gets the network traffic from the virtio-net NIC and both host back-ends… this is probably not what the users expected; it’s more likely that they wanted two separate networks in the guest, one for each NIC. Because -net always connects its NIC to a hub, you would have to tell QEMU to use two separate hubs, using the “vlan” parameter. For example -net nic,model=e1000,vlan=0 -net user,vlan=0 -net nic,model=virtio,vlan=1 -net tap,vlan=1 moves the virtio-net NIC and the “tap” back-end to a second hub (with ID #1).

Please note that the “vlan” parameter will be dropped in QEMU v3.0 since the term was rather confusing (it’s not related to IEEE 802.1Q for example) and caused a lot of misconfigurations in the past. Additional hubs can still be instantiated with -netdev (or -nic) and the special “hubport” back-end. The -net option itself will still stay around since it is still useful if you only want to use one front-end and one back-end together, or if you want to tunnel the traffic of multiple NICs through one back-end only (something like -net nic,model=e1000 -net nic,model=virtio -net l2tpv3,... for example).

The modern -netdev option

Besides the confusing “vlan” parameter of the -net option, there is one more major drawback with -net: the emulated hub between the NIC and the back-end gets in the way when the NIC front-end has to work closely together with the host back-end. For example, vhost acceleration cannot be enabled if you create a virtio-net device with -net nic,model=virtio.

To configure a network connection where the emulated NIC is directly connected to a host network back-end, without a hub in between, the well-established solution is to use the -netdev option for the back-end, together with -device for the front-end. Assuming that you want to configure the same devices as in the -net example above, you could use -netdev user,id=n1 -device e1000,netdev=n1 -netdev tap,id=n2 -device virtio-net,netdev=n2. This will give you straight 1:1 connections between the NICs and the host back-ends:

Networking with -netdev

Note that you can also still connect the devices to a hub with the special -netdev hubport back-end, but in most of the normal use cases, the use of a hub is not required anymore.

Now while -netdev together with -device provide a very flexible and extensive way to configure a network connection, there are still two drawbacks with this option pair which prevented us from deprecating the legacy -net option completely:

  1. The -device option can only be used for pluggable NICs. Boards (e.g. embedded boards) which feature an on-board NIC cannot be configured with -device yet, so -net nic,netdev=<id> must be used here instead.

  2. In some cases, the -net option is easier to use (less to type). For example, assuming you want to set up a “tap” network connection and your default scripts /etc/qemu-ifup and -down are already in place, it’s enough to type -net nic -net tap to start your guest. To do the same with -netdev, you always have to specify an ID here, too, for example like this: -netdev tap,id=n1 -device e1000,netdev=n1.

The new -nic option

Looking at the disadvantages listed above, users could benefit from a convenience option that:

  • is easier to use (and shorter to type) than -netdev <backend>,id=<id> -device <dev>,netdev=<id>
  • can be used to configure on-board / non-pluggable NICs, too
  • does not place a hub between the NIC and the host back-end.

This is where the new -nic option kicks in: this option can be used to configure both the guest’s NIC hardware and the host back-end in one go. For example, instead of -netdev tap,id=n1 -device e1000,netdev=n1 you can simply type -nic tap,model=e1000. If you don’t care about the exact NIC model type, you can even omit the model=... parameter and type -nic tap. This is even shorter and more convenient than the previous shortest way of typing -net nic -net tap. To get a list of NIC models that you can use with this option, you can simply run QEMU with -nic model=help.

Besides being easier to use, the -nic option can be used to configure on-board NICs, too (just like the -net option). For machines that have on-board NICs, the first -nic option configures the first on-board NIC, the second -nic option configures the second on-board NIC, and so forth.


Conclusion

  • The new -nic option gives you an easy and quick way to configure the networking of your guest.
  • For more detailed configuration, e.g. when you need to tweak the details of the emulated NIC hardware, you can use -device together with -netdev.
  • The -net option should be avoided these days unless you really want to configure a set-up with a hub between the front-ends and back-ends.

by Thomas Huth at May 31, 2018 07:50 AM

May 24, 2018

Gerd Hoffmann

Fedora 28 images uploaded

Fedora 28 was released a few weeks ago. New Fedora 28 images are finally uploaded now.

There are no raspberry pi images any more. Just use the standard Fedora arm images, they work just fine for both arm (rpi 2) and aarch64 (rpi 3).

The efi images are for qemu. Some use grub2 as bootloader, some use systemd-boot; the filename indicates which. The efi images can also be booted as container, using systemd-nspawn --boot --image <file>, but you have to convert them to raw first as systemd-nspawn can't handle qcow2.
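The conversion can be done with qemu-img, for example (file names here are placeholders for whichever image you downloaded):

    # convert a qcow2 image to raw so systemd-nspawn can boot it
    qemu-img convert -f qcow2 -O raw fedora-28-efi.qcow2 fedora-28-efi.raw
    systemd-nspawn --boot --image fedora-28-efi.raw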

As usual the images don't have a root password. You have to set one using virt-customize -a <image> --root-password "password:<secret>", otherwise you can't log in after boot.

The images have been created with imagefish.

by Gerd Hoffmann at May 24, 2018 10:00 PM

May 17, 2018

KVM on Z

Knowledge Series: Managing KVM on IBM Z with oVirt

See here for a new entry in our "knowledge" series, providing step-by-step instructions on how to include IBM Z hosts in an oVirt data center.

by Stefan Raspl at May 17, 2018 02:51 PM

May 14, 2018

KVM on Z

Getting Started: RHEL 7.5 Instructions added

Instructions for RHEL7.5 were added to the Getting Started with KVM on Z series.
See here for the actual page.

by Stefan Raspl at May 14, 2018 12:40 PM

Getting Started: Instructions for Ubuntu 16.04 to 17.10 added

Instructions for Ubuntu 16.04 to 17.10 were added to the Getting Started with KVM on Z series.
See here for the entry page.

by Stefan Raspl at May 14, 2018 12:34 PM

May 13, 2018

Gerd Hoffmann

Welcome to Jekyll!

Switched my blog from wordpress to jekyll.

Automatic import handled most of the content. It didn’t cover syntax highlighting (an extra wordpress plugin) though, so that needed some manual intervention. Also copying over the (few) images was a manual process.

Configuration isn’t imported automatically, but that is just editing a few lines in _config.yml. Permalinks can be configured to be compatible with wordpress without much trouble, so all the article links stay valid.

My blog is stored in git now. Everything is just static pages. No database needed. No user data stored anywhere.

I’m pretty pleased with the result.

by Gerd Hoffmann at May 13, 2018 10:00 PM

May 04, 2018

KVM on Z

Ubuntu 18.04 released

Ubuntu Server 18.04 LTS is out! Support for IBM Z is available here.
It ships
As obvious from these package versions, support for IBM z14 is readily in place.

Since this is a so-called LTS (Long Term Support) release providing approx. 5 years of support (in contrast to the usual 9 months of non-LTS releases), it is of particular interest to Ubuntu users interested in a stable environment for production deployments.

by Stefan Raspl at May 04, 2018 08:47 PM

May 03, 2018

Cornelia Huck

A vfio-ccw primer

While basic support for vfio-ccw has been included in Linux and QEMU for some time, work has recently started to ramp up again and it seems like a good time to give some basic overview.

Why vfio-ccw?

Historically, QEMU on s390x presented paravirtualized virtio devices to the guest; first, via a protocol inspired by lguest, later, as emulated channel devices. This satisfies most needs (you get block devices, network devices, a console device, and lots more), but the device types are different from those found on LPARs or z/VM guests, and you may have a need to use e.g. a DASD directly.

For that reason, we want to do the same thing as on other platforms: pass a host device to the guest directly via vfio.

How does this work?

vfio-ccw is using the vfio mediated device framework; see the kernel documentation for an overview.

In a nutshell: The subchannel to be passed to the guest is unbound from its normal host driver (in this case, the I/O subchannel driver) and bound to the vfio-ccw driver. Any I/O request is intercepted and executed on the real device, and interrupts from the real device are relayed back to the guest.
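In sysfs terms, the rebinding looks roughly like the following sketch (the subchannel ID 0.0.0313 is a placeholder; see the kernel's vfio-ccw documentation for the authoritative steps):

    # unbind the subchannel from the default I/O subchannel driver
    echo 0.0.0313 > /sys/bus/css/devices/0.0.0313/driver/unbind
    # bind it to the vfio-ccw driver instead
    echo 0.0.0313 > /sys/bus/css/drivers/vfio_ccw/bind
    # create a mediated device that QEMU can then be pointed at
    uuid=$(uuidgen)
    echo "$uuid" > /sys/bus/css/devices/0.0.0313/mdev_supported_types/vfio_ccw-io/create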

Why subchannels and not ccw devices?

The initial attempt to implement this actually worked at the ccw device level. However, this means that the Linux common I/O layer in the host will perform various actions like handling of channel paths - which may interfere with what the guest is trying to do. Therefore, it seemed like a better idea to keep out of the way as much as possible and just implement a minimal subchannel driver that does not do much beyond what the guest actually triggered itself.

How is an actual I/O request processed?

When the guest is ready to use a channel device, it will issue I/O requests via channel programs (see here for an explanation on how that works and what things like scsw and orb mean.) The channel I/O instructions are mandatory SIE intercepts, so the host will get control for any START SUBCHANNEL the guest issues. QEMU is in charge of interpretation of channel I/O instructions, so it will process the ssch as a request to a pass-through device.

All channel I/O instructions are privileged, which means that the host kernel now needs to get involved again. QEMU does so by writing to an I/O region: the scsw (which contains, amongst other things, the fctl field specifying the start function) and the orb (pointing to the channel program). The host kernel driver now has enough information to actually issue the request on the real device after translating the ccw chain and its addresses to host addresses (involving pinning, idals and other things I will not explain here for brevity.)

After the device has processed the I/O request, it will make the subchannel status pending and generate an I/O interrupt. The host kernel driver collects the state and makes it available via the same I/O region (the IRB field), and afterwards triggers QEMU via an eventfd. QEMU now has all information needed to update its internal structures for the devices so that the guest can obtain the information related to the I/O request.

Isn't that all a bit too synchronous?

Yes, it is. Channel I/O is supposed to be asynchronous (give the device an I/O request, collect status later), but our implementation isn't yet. Why? Short answer: It is hard, and we wanted something to get us going. But this is on the list of things to be worked on.

Where is the IOMMU for this?

Due to the way channel programs work, we don't have a real IOMMU.

Does this cover everything supported by the architecture?

Not yet. Channel program wise, we support the format Linux drivers use. Also, we're emulating things like HALT SUBCHANNEL and CLEAR SUBCHANNEL in QEMU, while they really should be handed through to the device (support for this is in the works).

On the whole, you should be able to pass an ECKD DASD to a Linux guest without (known) issues.

How can I try this out?

Recent QEMU and Linux versions should have everything you need in the host; see this wiki entry for details. As a guest, any guest that can run under KVM should be fine.

What's the deal with that "unrestricted cssids" thing?

If you look at this older article, you'll notice the 'fe' value for the cssid of virtio devices (with the promise to explain it later... which I sadly never did). The basic idea at the time was to put 'virtual' devices like virtio and 'non-virtual' devices like vfio-ccw into different channel subsystem images, so that e.g. channel paths (which are per channel subsystem image) don't clash. In other words, 'virtual' and 'non-virtual' devices (and channel paths) would have different cssids (the first part of their identifiers).

This sounded like a good idea at the time; however, there's a catch: A guest operating system will by default only see the devices in the default channel subsystem image. To see all of them, it needs to explicitly enable the Multiple Channel Subsystems Extended (MCSS-E) feature - and I do not know of any operating system that has done so as of today (not very surprising, as QEMU is the only implementation of MCSS-E I'm aware of).

To work around this, we originally introduced the 's390-squash-mcss' parameter to QEMU, which would put all devices into the default channel subsystem image. But as MCSS-E support is unlikely to arrive in any guest operating system anytime soon, we agreed to rather drop the restriction of virtual devices being in css fe and non-virtual devices everywhere else (since QEMU 2.12).
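With that restriction dropped, virtual and passed-through devices can simply share the default-visible channel subsystem image, e.g. (a sketch with placeholder drive and mdev IDs):

    qemu-system-s390x ... \
        -device virtio-blk-ccw,drive=d0,devno=fe.0.0001 \
        -device vfio-ccw,sysfsdev=/sys/bus/mdev/devices/$uuid,devno=fe.0.0002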

What are the plans for the future?

Several things are already actively worked on, while others may come up later.
  • Initial libvirt support for vfio-ccw has been posted here.
  • Reworking the Linux host driver to make things more asynchronous and to support halt/clear is in progress.
  • Improvements in channel path handling (for example, to enable the guest to see path availability changes) are also in progress. We may need to consider things like dasd reserve/release as well.

by Cornelia Huck at May 03, 2018 01:53 PM

Last updated: October 18, 2018 12:34 AM