Blogging about open source virtualization

News from QEMU, KVM, libvirt, libguestfs, virt-manager and related tools


Planet Feeds

January 14, 2020

KVM on Z

KVM at SHARE Fort Worth 2020

Yes, we will be at SHARE in Fort Worth, TX, February 23-28!
Meet us at the following session in the Linux and VM/Virtualization track:
  • KVM on IBM Z News (Session #26219): Latest news on our development work with the open source community

by Stefan Raspl at January 14, 2020 07:38 AM

January 09, 2020

Gerd Hoffmann

PCI IDs for virtual devices

It's a bit complicated. There are a bunch of different vendor IDs in use in the linux/qemu virtualization world. I'm trying to cover them all.

Vendor ID 1af4 (Red Hat)

This is the Red Hat vendor ID. The device ID range from 1af4:1000 to 1af4:10ff is reserved for virtio-pci devices. For virtio 1.0 (and newer) devices there is a fixed relationship between the virtio device ID and the PCI device ID (offset 0x1040). So allocating a virtio device ID is enough; that implicitly allocates a PCI device ID too.
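For instance, the fixed offset means the PCI device ID can be derived from the virtio device ID with simple arithmetic. virtio-scsi has virtio device ID 8, which yields the 1af4:1048 shown in the lspci example further down:

```shell
# Derive the PCI device ID of a virtio 1.0 device from its virtio device ID.
# virtio-scsi has virtio device ID 8, so the PCI device ID is 0x1040 + 8.
virtio_id=8
printf '1af4:%04x\n' $((0x1040 + virtio_id))   # prints 1af4:1048
```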

Qemu uses 1af4:1100 as PCI Subsystem ID for most devices.

The ivshmem device uses 1af4:1110.

Example (virtio scsi controller):

$ lspci -vnns3:
03:00.0 SCSI storage controller [0100]: Red Hat, Inc. \
		Virtio SCSI [1af4:1048] (rev 01)
	Subsystem: Red Hat, Inc. Device [1af4:1100]
	[ ... ]

Vendor ID 1b36 (Red Hat / Qumranet)

This was the Qumranet vendor ID; with the Qumranet acquisition by Red Hat the PCI vendor ID moved to Red Hat too. The device ID range from 1b36:0001 to 1b36:00ff is reserved for qemu, for virtual devices which are not virtio.

Such devices can either be qemu specific (test device, rocker ethernet) or devices which guest drivers typically match by PCI class code (pci bridges, serial, sdhci, xhci, nvme, ...) so they don't need to masquerade as some existing physical device.

New PCI ID assignments for qemu should use this pool.

Current allocations are listed in docs/specs/pci-ids.txt (here is the qemu 4.2 version) and have a #define in include/hw/pci/pci.h. If needed, an additional file in docs/specs/ should contain the device specification.

To allocate an ID send a patch updating both docs/specs/pci-ids.txt and include/hw/pci/pci.h to the qemu-devel mailing list. Add me to Cc: to make sure it gets my attention. The commit message should describe the device you want to allocate an ID for.

The 1b36:0100 ID is used by the qxl display device.

Example (xhci usb controller):

$ lspci -vnns2:
02:00.0 USB controller [0c03]: Red Hat, Inc. QEMU XHCI Host \
		Controller [1b36:000d] (rev 01) (prog-if 30 [XHCI])
	Subsystem: Red Hat, Inc. Device [1af4:1100]
	[ ... ]

Vendor ID 1234

Even though there are #defines for PCI_VENDOR_ID_QEMU with this ID in various places, it is not assigned to qemu. According to the PCI ID database it belongs to "Technical Corp".

Do not use this for new devices.

The qemu stdvga uses 1234:1111. This ID is hardcoded all over the place in guest drivers, so unfortunately we are stuck with it.

Luckily no problems due to conflicts have shown up so far; it seems this vendor ID isn't used by its owner to build PCI devices.

Other Vendor IDs

It is perfectly fine to use IDs from other vendors for virtual qemu devices. They must be explicitly reserved for qemu though, to avoid possible conflicts with physical hardware from the same vendor. So, if you have your own vendor ID and want to contribute a device to qemu, you can also allocate a device ID from your own ID space instead of taking one from the Red Hat 1b36 pool.

PCI IDs for emulated devices

When emulating existing physical hardware qemu will of course use the PCI IDs of the hardware being emulated, so the guest drivers will find the device. In most cases the PCI Subsystem ID 1af4:1100 indicates the device is emulated by qemu.

Example (sound device):

$ lspci -vnns1b
00:1b.0 Audio device [0403]: Intel Corporation 82801I (ICH9 Family) \
		HD Audio Controller [8086:293e] (rev 03)
	Subsystem: Red Hat, Inc. QEMU Virtual Machine [1af4:1100]
	[ ... ]
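The 1af4:1100 subsystem ID can also be matched programmatically to spot emulated devices from inside a guest. The sketch below filters lspci -vnn style output for it; the captured sample is a trimmed stand-in so the filter is self-contained, and on a real guest you would pipe live `lspci -vnn` output instead:

```shell
# Print the device line preceding each "Subsystem ... [1af4:1100]" line,
# i.e. the devices QEMU marks as emulated. The sample input is a
# hand-trimmed stand-in for real `lspci -vnn` output.
lspci_sample='00:1b.0 Audio device [0403]: Intel Corporation HD Audio Controller [8086:293e]
    Subsystem: Red Hat, Inc. QEMU Virtual Machine [1af4:1100]
00:02.0 VGA compatible controller [0300]: Device [1234:1111]
    Subsystem: Red Hat, Inc. Device [1af4:1100]'
printf '%s\n' "$lspci_sample" | awk '/\[1af4:1100\]/ { print prev } { prev = $0 }'
```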

by Gerd Hoffmann at January 09, 2020 11:00 PM

January 07, 2020

Gerd Hoffmann

sound configuration changes in qemu

The sound subsystem was the odd one out in qemu: configuration used environment variables like QEMU_AUDIO_DRV instead of command line switches. That time is over now. For qemu 4.2 we finally completed merging full -audiodev support, written by Zoltán Kővágó. Read on to learn what has changed.

Using the environment variables continues to work for now. But this configuration method is deprecated and will be dropped at some point in the future. Maybe in qemu 5.1 (the earliest release the qemu deprecation policy would allow), or maybe we'll leave the compatibility code in for a few more releases to allow a longer transition period, given this is a rather fundamental change.

Creating audio backends

The new -audiodev command line switch creates an audio backend, similar to -netdev for network backends or -blockdev for block device backends. All sound backend configuration is done using -audiodev parameters.

The simplest case is to just specify the backend and assign an id to refer to it, like this:

qemu -audiodev spice,id=snd0

Some backends have additional configuration options. For pulseaudio, for example, it is possible to specify the server hostname:

qemu -audiodev pa,id=snd0,server=localhost

Stream parameters can be specified separately for in (record) and out (playback). There are some parameters which are common for all backends (frequency, channels, ...) and backend-specific parameters like the pulseaudio stream name (visible in mixer applications like pavucontrol) or the alsa device:

qemu -audiodev pa,id=snd0,out.frequency=48000
qemu -audiodev alsa,id=snd0

Buffer sizes are specified in microseconds everywhere. So configuring alsa with a buffer for 10 milliseconds of sound data and 4 periods (2.5 milliseconds each) works this way:

qemu -audiodev alsa,id=snd0,out.buffer-length=10000,out.period-length=2500
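As a quick sanity check on the units in the alsa example above: 10 milliseconds is 10000 microseconds, and dividing the buffer length by the period length gives the four periods:

```shell
# Buffer/period math for the alsa example; all -audiodev lengths are
# given in microseconds.
buffer_us=10000   # 10 ms total buffer
period_us=2500    # 2.5 ms per period
echo $((buffer_us / period_us))   # number of periods: 4
```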

The -audiodev switch accepts json too (similar to -blockdev), so the alsa example above can also be specified this way:

qemu -audiodev "{
    'driver' : 'alsa',
    'id' : 'snd0',
    'out' : {
      'buffer-length' : 10000,
      'period-length' : 2500
    }
  }"

Consult the audio qapi schema for all available config options (including documentation).

Using audio backends

That is the simple part. All sound devices got a new audiodev parameter to link the device with an audio backend, using the id assigned to the backend:

qemu -device usb-audio,audiodev=snd0


Note: Right now the audiodev= parameter is optional, for backward compatibility reasons. The parameter will become mandatory though when we drop the code for the deprecated environment variable configuration method. So I strongly recommend using the parameter explicitly everywhere, to be prepared for the future.

Multiple audio backends and 5:1 surround support

Finally, a more complex example to showcase what the new sound configuration allows for:

qemu \
  -audiodev pa,id=hda,out.mixing-engine=off \
  -audiodev pa,id=usb,out.mixing-engine=off \
  -device intel-hda -device hda-output,audiodev=hda \
  -device qemu-xhci -device usb-audio,audiodev=usb,multi=on


  • We can have multiple backends, by simply specifying -audiodev multiple times on the command line and assigning different ids. That can be useful even in case of two identical backends. With pulseaudio each backend is a separate stream and can be routed to different output devices on the host (using a pulse mixer app like pavucontrol).
  • Using the backend ids we assign one backend to the HDA device and the other to the USB device.
  • qemu has an internal audio mixer, for mixing audio streams from multiple devices into one output stream. The internal mixer can also do resampling and format conversion if needed. With the pulseaudio backend we don't need it, as the pulseaudio daemon can handle all this for us. Also, the internal mixer is limited to mono and stereo; it can't handle multichannel (5:1) sound data. So we use mixing-engine=off to turn off the internal mixer.
  • The USB audio device has multichannel (5:1) support. This is disabled by default though, the multi=on parameter turns it on.


by Gerd Hoffmann at January 07, 2020 11:00 PM

December 20, 2019

Cornelia Huck

A 2019 recap (and a bit of an outlook)

The holiday season for 2019 will soon be upon us, so I decided to do a quick recap of what I consider the highlights for this year, from my perspective, regarding s390x virtualization, and the wider ecosystem.


I attended the following conferences this year.

Linux Plumbers Conference

LPC 2019 was held in Lisbon, Portugal, on September 9-11. Of particular interest for me was the VFIO/IOMMU/PCI microconference. I talked a bit about cross-architecture considerations (and learned about some quirks on other architectures as well); the rest of the topics, while not currently concerning my work directly, were nevertheless helpful to move things forward. As usual at conferences, the hallway track is probably the most important one; met some new folks, saw others once again, and talked more about s390 I/O stuff than I anticipated. I can recommend this conference for meeting people to talk to about (not only) deeply technical things.

KVM Forum

KVM Forum 2019 was held in Lyon, France, on October 30 - November 1. As usual, a great place to meet people and have discussions in the hallway, e.g. about vfio migration. No talk from me this year, but an assortment of interesting topics presented by others; I contributed to an article covering the conference. Of note from an s390x perspective were the talks about protected virtualization and nested testing mentioned in the article, and also the presentation on running kvm unit tests beyond KVM.

s390x changes in QEMU and elsewhere

There's a new machine (z15) on the horizon, but support for older things has been introduced or enhanced as well.


Lots of work has gone into tcg to emulate the vector instructions introduced with z13. Distributions are slowly switching to compiling for z13, which means gcc is generating vector instructions. As of QEMU 4.2, you should be able to boot recent distributions under tcg once again.


vfio-ccw has now gained support for sending HALT SUBCHANNEL and CLEAR SUBCHANNEL to the real device; this is useful e.g. for error handling, when you want to make sure an operation is really terminated at the device. Also, it is now possible to boot from a DASD attached via vfio-ccw.

vfio-ap has seen some improvements as well, including support for hotplugging the matrix device and for interrupts.

Guest side

A big change on the guest side of things was support for protected virtualization (also see the talk given at KVM Forum). This is a bit like AMD's SEV, but (of course) different. Current Linux kernels should be ready to run as a protected guest; host side support is still in progress (see below).

Other developments of interest

mdev, mdev everywhere

There has been a lot of activity around mediated devices this year. They have been successfully used in various places in the past (GPUs, vfio-ccw, vfio-ap, ...). A new development is trying to push parts of it into userspace ('muser', see the talk at KVM Forum). An attempt was made to make use of the mediating part without the vfio part, but that was met with resistance. Ideas for aggregation are still being explored.

In order to manage and persist mdev devices, we introduced the mdevctl tool, which is currently included in at least Fedora and Debian.

vfio migration

Efforts to introduce generic migration support for vfio (or at least, in a first shot, for pci) are still ongoing. Current concerns mostly revolve around dirty page tracking. It might make sense to take a stab at wiring up vfio-ccw once the interface is stable.

What's up next?

While there probably will be some not-yet-expected developments next year, some things are bound to come around in 2020.

Protected virtualization

Patch sets for KVM and QEMU to support protected virtualization on s390 have already been posted this year; expect new versions of the patch sets to show up in 2020 (and hopefully make their way into the kernel and QEMU, respectively).


Patches to support detecting path status changes and relaying them to the guest have already been posted; expect an updated version to make its way into the kernel and QEMU in 2020. Also likely: further cleanups and bugfixes, and probably some kind of testing support, e.g. via kvm unit tests. Migration support also might be on that list.

virtio-fs support on s390x

Instead of using virtio-9p, virtio-fs is a much better way to share files between host and guest; expect support on s390x to become available once sharing of fds in QEMU without numa becomes possible. Shared memory regions on s390x (for DAX support) still need discussion, however.

by Cornelia Huck at December 20, 2019 02:34 PM

December 17, 2019

KVM on Z

SLES 12 SP5 released

SLES 12 SP5 is out! See this section in the release notes for a detailed look at IBM Z-specific changes in support of KVM.
Otherwise, the update ships the following code levels:
  • Linux kernel 4.12 (SP4: 4.12, unchanged),
  • QEMU v3.1 (SP4: v2.11), and
  • libvirt v5.1 (SP4: v4.0).
See previous blog entries on QEMU v2.12 and v3.0, and libvirt v4.7 and v4.10 for further details on new features that become available by the respective package updates.
Also take note of this article, which adds further details on IBM z15 support.

by Stefan Raspl at December 17, 2019 11:22 AM

December 13, 2019

KVM on Z

Documentation: New Performance Papers

Two new performance papers on KVM network performance were published.
See also the Performance Hot topics page in IBM Knowledge Center.

by Stefan Raspl at December 13, 2019 07:49 AM

QEMU v4.2 released

QEMU v4.2 is out. For highlights from a KVM on Z perspective see the Release Notes.

by Stefan Raspl at December 13, 2019 07:45 AM

QEMU project

QEMU version 4.2.0 released

We would like to announce the availability of the QEMU 4.2.0 release. This release contains 2200+ commits from 198 authors.

You can grab the tarball from our download page. The full list of changes is available in the Wiki.

Highlights include:

  • TCG plugin support for passive monitoring of instructions and memory accesses
  • block: NBD block driver now supports more efficient handling of copy-on-read requests
  • block: NBD server optimizations for copying of sparse images, and general fixes/improvements for NBD server/client implementations
  • block/crypto: improved performance for AES-XTS encryption for LUKS disk encryption
  • vfio-pci support for “failover_pair_id” property for easier migration of VFIO devices
  • virtio-mmio now supports virtio-compatible v2 personality and virtio 1.1 support for packed virtqueues
  • 68k: new “next-cube” machine for emulating a classic NeXTcube
  • 68k: new “q800” machine for emulating the Macintosh Quadra 800
  • ARM: new “ast2600-evb” machine for emulating Aspeed AST2600 SoC
  • ARM: semihosting v2.0 support with STDOUT_STDERR/EXIT_EXTENDED extensions
  • ARM: KVM support for more than 256 CPUs
  • ARM: “virt” machine now supports memory hotplugging
  • ARM: improved TCG emulation performance
  • ARM: KVM support for SVE SIMD instructions on SVE-capable hardware
  • PowerPC: emulation support for mffsce, mffscrn, and mffscrni POWER9 instructions
  • PowerPC: “powernv” machine now supports Homer and OCC SRAM system devices
  • RISC-V: “-initrd” argument now supported
  • RISC-V: debugger can now see all architectural state
  • s390: emulation support for IEP (Instruction Execution Protection)
  • SPARC: “sun4u” IOMMU now supports “invert endianness” bit
  • x86: VMX features can be enabled/disabled via “-cpu” flags
  • x86: new “microvm” machine that uses virtio-mmio instead of PCI for use as baseline for performance optimizations
  • x86: emulation support for AVX512 BFloat16 extensions
  • x86: new CPU models for Denverton (server-class Atom-based SoC), Snowridge, and Dhyana
  • x86: macOS Hypervisor.framework support (“-accel hvf”) now considered stable
  • xtensa: new “virt” machine type
  • xtensa: call0 ABI support for user-mode emulation
  • and lots more…

Thank you to everyone involved!

December 13, 2019 06:00 AM

December 09, 2019

KVM on Z

KVM on IBM z15 Features

To take advantage of the new features of z15, the latest addition to the IBM Z family as previously announced here, use any of the following CPU models in your guest's domain XML:
  • Pre-defined model for z15
      <cpu mode='custom'>
        <model>gen15a</model>
      </cpu>
  • Use z15 features in a migration-safe way (recommended). E.g. when running on z15 this will be a superset of the gen15a model, and feature existence will be verified on the target system prior to a live migration:
      <cpu mode='host-model'/>
  • Use z15 features in a non-migration-safe way. I.e. feature existence will not be verified on the target system prior to a live migration:
      <cpu mode='host-passthrough'/>
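To see which model libvirt would use for mode='host-model' on a given host, `virsh domcapabilities` prints the host model expansion. Below is a minimal sketch of pulling the model name out of that XML; the sample fragment and the model name in it are illustrative stand-ins, and on a real host you would pipe `virsh domcapabilities` output into the sed command instead:

```shell
# Extract the CPU <model> name from a (sample) `virsh domcapabilities`
# fragment. The fragment is hand-written for illustration only.
caps='<cpu>
  <mode name="host-model" supported="yes">
    <model fallback="forbid">gen15a-base</model>
  </mode>
</cpu>'
printf '%s\n' "$caps" | sed -n 's/.*<model[^>]*>\(.*\)<\/model>.*/\1/p'
```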
Here is a list of features of the new hardware generation as supported in Linux kernel 5.2 and QEMU 4.1, all activated by default in the CPU models listed above:
  • Miscellaneous Instructions
    Following the example of previous machines, new helper and general purpose instructions were added.
    In the z15 CPU model, the respective feature is:
      minste3     Miscellaneous-Instruction-Extensions Facility 3 
  • SIMD Extensions
    Following up on the SIMD instructions introduced with the previous z13 and z14 models, this feature again provides further vector instructions, which can again be used in KVM guests.
    These new vector instructions can be used to improve decimal calculations as well as for implementing high performance variants of certain cryptographic operations.
    In the z15 CPU models, the respective feature is:
      vxpdeh      Vector-Packed-Decimal-Enhancement Facility
      vxeh2       Vector enhancements facility 2
  • Deflate Conversion
    Provides acceleration for zlib compression and decompression.
    In the z15 CPU model, the respective feature is:
      dflt        Deflate conversion facility
  • MSA Updates
    z15 introduces a new Message Security Assist MSA9, providing elliptic curve cryptography. It supports message authentication, the generation of elliptic curve keys, and scalar multiplication.
    This feature can be exploited in KVM guests' kernels and userspace applications independently (i.e. a KVM guest's userspace applications can take advantage of these features irrespective of the guest's kernel version).
    In the z15 CPU model, the respective feature is:
      msa9        Message-security-assist-extension 9 facility
      msa9_pckmo  Message-security-assist-extension 9 PCKMO
                  subfunctions for protected ECC keys
The z15 CPU model was backported into several Linux distributions. It is readily available in RHEL8.1, SLES 12 SP5, SLES 15 SP1 (via maintweb updates for kernel and qemu) and Ubuntu 18.04.

by Stefan Raspl at December 09, 2019 03:29 PM

Webinar: Dynamically provisioning resources to KVM hosted Linux virtual servers

Are you just getting started with KVM? In this session we review dynamic provisioning and de-provisioning of processor, memory, disk storage, and network connectivity for your KVM host and the guest virtual servers. Examples include both graphical management and command line interfaces. Come and see how simple it can be to manage your servers.

Richard Young, Executive IT Specialist, IBM Systems

Register here. You can check the system requirements here.
After registering, you will receive a confirmation email containing information about joining the webinar.

Replay & Archive
All sessions are recorded. For the archive as well as a replay and handout of this session and all previous webcasts see here.

by Stefan Raspl at December 09, 2019 01:17 PM

December 06, 2019

KVM on Z

Documentation: KVM Virtual Server Management Update

Intended for KVM virtual server administrators, this book illustrates how to set up, configure, and operate Linux on KVM instances and their virtual devices running on the KVM host and IBM Z hardware.

This major update includes information about VFIO pass-through devices, virtual server setup with the virt-install command, setting up virtual switches with VNIC characteristics, and other features that are available with the latest versions of QEMU and libvirt.

You can download the .pdf here.

by Stefan Raspl at December 06, 2019 07:14 AM

December 01, 2019

Fabiano Fidêncio

Adopting GitLab workflow

In October 2018 there was a short face-to-face meeting with a good portion of the libosinfo maintainers, some contributors, and some users.

This short meeting took place during a lunch break on one of the KVM Forum 2018 days and, among other things, we discussed whether we should allow and / or prefer receiving patches through GitLab Merge Requests.

Here’s the announcement:

[Libosinfo] Merge Requests are enabled!

    From: Fabiano Fidêncio <fidencio redhat com>
    To: "libosinfo redhat com" <libosinfo redhat com>
    Subject: [Libosinfo] Merge Requests are enabled!
    Date: Fri, 21 Dec 2018 16:48:14 +0100


Although the preferred way to contribute to libosinfo, osinfo-db and
osinfo-db-tools is still sending patches to this ML, we've decided to
also enable Merge Requests on our gitlab!

Best Regards,
Fabiano Fidêncio

Now, one year past that decision, let’s check what has been done, review some numbers, and discuss what’s my take, as one of the maintainers, of the decision we made.

2019, the experiment begins …

After the e-mail shown above was sent, I kept using the mailing list as the preferred way to submit and review patches, keeping an eye on GitLab Merge Requests, until August 2019, when I did a full switch to using GitLab instead of the mailing list.

… and what changed? …

Well, to be honest, not much. But in order to explain a little bit more, I have to describe my not-quite-optimal workflow.

Even before describing my workflow, let me just make clear that:

  • I don’t have any scripts that would fetch the patches from my e-mail and apply them automagically for me;

  • I never ever got used to text-based mail clients (I’m a former Evolution developer, and I’ve been an Evolution user for several years);

Knowing those things, this is how my workflow looks:

  • Development: I’ve been using GitLab for a few years as the main host of my forks of the projects I contribute to. When developing a new feature, I would:

    • Create a new branch;
    • Do the needed changes;
    • Push the new branch to the project on my GitLab account;
    • Submit the patches;
  • Review: It may sound weird, maybe it really is, but the way I do review patches is by:

    • Getting the patches submitted;
    • Applying atop of master;
    • Doing a git rebase -i so I can go through each one of the patches;
    • Then, for each one of the patches I would:
      • Add comments;
      • Do fix-up changes;
      • Squash my fixes atop of the original patch;
      • Move to the next patch;

And now, knowing my workflow, I can tell that pretty much nothing changed.

As part of the development workflow:

  • Submitting patches:

    • git publish -> click on the URL printed when a new branch is pushed to GitLab;
  • Reviewing patches:

    • Saving patch e-mails as mbox, applying them to my tree -> pull the MR

Everything else stays pretty much the same. I still do a git rebase -i and go through the patches, adding comments / fix-ups which, later on I’ll have to organise and paste somewhere (either replying to the e-mail or adding to GitLab’s web UI) and that’s the part which consumes the most of my time.

However, although the change was not big to me as a developer, some people had to adapt their workflow in order to start reviewing all the patches I’ve been submitting to GitLab. But let’s approach this later on … :-)

Anyways, it’s important to make it crystal clear that this is my personal experience and that I do understand that people who rely more heavily on text-based mail clients and / or with a bunch of scripts tailored for their development would have a different, way way different, experience.

… do we have more contributions since the switch? …

As by November 26th, I’ve checked the amount of submissions we had on both libosinfo mailing list and libosinfo GitLab page during the current year.

Mind that I’m not counting my own submissions, and that I’m counting an osinfo-db addition, which usually consists of adding data & tests, as a single submission.

As for the mailing list, we’ve received 32 patches; as for the GitLab, we’ve received 34 patches.

Quite similar number of contributions, let’s dig a little bit more.

The 32 patches sent to our mailing list came from 8 different contributors, and all of them had at least one previous patch merged in one of the libosinfo projects.

The 34 patches sent to our GitLab came from 15 different contributors and, from those, only 6 of them had at least one previous patch merged in one of the libosinfo projects, whilst 9 of them were first time contributors (and I hope they’ll stay around, I sincerely do ;-)).

Maybe one thing to consider here is whether forking a project on GitLab is easier than subscribing to a new mailing list when submitting a patch. This is something people usually do once per project they contribute to, but subscribing to a mailing list may actually be a barrier.

Some people would argue, though, that it’s a barrier both ways, mainly considering one may extensively contribute to projects using one or the other workflow. IMHO, that’s not exactly true. Subscribing to a mailing list and getting the patches correctly formatted feels more difficult than forking a repo and submitting a Merge Request.

In my personal case, I can tell the only projects I contribute to which still didn’t adopt GitLab / GitHub workflow are the libvirt ones, although it may change in the near future, as mentioned by Daniel P. Berrangé on his KVM Forum 2019 talk.

… what are the pros and cons? …

When talking about the “pros” and “cons” it is really hard to pin down exactly which ones are objective and which are subjective.

  • pros

    • CI: The possibility to have a CI running for all libosinfo projects, running the tests we have on each MR, without any effort / knowledge of the contributor about this;

    • Tracking non-reviewed patches: Although this one may be related to each one’s workflow, it’s objective that figuring out which Merge Requests need review on GitLab is way easier for a new contributor than navigating through a mailing list;

    • Centralisation: This is one of the subjective ones, for sure. For libosinfo we have adopted GitLab as its issue tracker as well, which makes my life as maintainer quite easy as I have “Issues” and “Merge Requests” in a single place. It may not be true for different projects, though.

  • cons

    • Reviewing commit messages: It seems to be impossible to review commit messages, unless you make a comment about them. Making a comment, though, is not exactly practical, as I cannot go to the specific line I want to comment on and make a suggestion.

    • Creating an account to yet another service: This is another one of the subjective ones. It bothers me a lot, having to create an account on a different service in order to contribute to a project. This is my case with GitLab, GNOME GitLab, and GitHub. However, is that different from subscribing to a few different mailing lists? :-)

Those are, for me, the most prominent “pros” and “cons”. There are a few other things that I’ve seen people complaining, being the most common one related to changing their workflow. And this is something worth its own section! :-)

… is there something out there to make my workflow change easier? …

Yes and no. That’s a horrible answer, ain’t it? :-)

Daniel P. Berrangé has created a project called Bichon, which is a tool providing a terminal based user interface for reviewing GitLab Merge Requests.

Cool, right? In general, yes. But you have to keep in mind that the project is still in its embryonic stage. When more mature, I’m pretty sure it’ll help people used to mailing lists workflow to easily adapt to GitLab workflow without leaving behind the facilities of doing everything via command-line.

I’ve been using the tool for simple things, and I’ve been contributing to it with simple patches. It’s fair to say that I do prefer adding a comment to Merge Requests, approving, and merging them using Bichon rather than via the web UI. Is the tool enough to suffice all people’s needs? Of course not. Will it be? Hardly. But it may be enough to remove the blockers to migrating away from the mailing list workflow.

… a few words from different contributors …

I’ve decided to ask Cole Robinson and Felipe Borges a word or two about this subject as they are contributors / reviewers of libosinfo projects.

It should go without saying that their opinions should not be taken as “this workflow is better than the other”. However, take their words as valid points from people who are heavily using one workflow or the other, as Cole Robinson comes from libvirt / virt-tools world, which rely heavily on mailing list, and Felipe Borges comes from GNOME world, which is a huge GitLab consumer.

“The change made things different for me, slightly worse but not in any disruptive way. The main place I feel the pain is putting review comments into a web UI rather than inline in email which is more natural for me. For a busier project than libosinfo I think the pain would ramp up, but it would also force me to adapt more. I’m still largely maintaining an email based review workflow and not living in GitLab / GitHub” - Cole Robinson

“The switch to Gitlab has significantly lowered the threshold for people getting started. The mailing list workflow has its advantages but it is an entry barrier for new contributors that don’t use native mail clients and that learned the Pull Request workflow promoted by GitLab/GitHub. New contributors now can easily browse the existing Issues and find something to work on, all in the same place. Reviewing contributions with inline discussions and being able to track the status of CI pipelines in the same interface is definitely a must. I’m sure Libosinfo foresees an increase in the number of contributors without losing existing ones, considering that another advantage of Gitlab is that it allows developers to interact with the service from email, similarly to the email-driven git workflow that we were using before.” - Felipe Borges

… is there any conclusion from the author’s side?

As the first thing, I’ve to emphasize two points:

  • Avoid keeping both workflows: Although we do that on libosinfo, it’s something I’d strongly discourage. It’s almost impossible to keep the information in sync in both places in a reasonable way.

  • Be aware of changes, be welcome to changes: As mentioned above, migrating from one workflow to another will be disruptive at some level. Is it actually a blocker? Although it was not for me, it may be for you. The thing to keep in mind here is to be aware of changes and welcome them knowing you won’t have a 1:1 replacement for your current workflow.

With that said, I’m mostly happy with the change made. The number of old time contributors has not decreased and, at the same time, the number of first time contributors has increased.

Another interesting fact is that the number of contributions through the mailing list has decreased: we have had only 4 contributions through that channel since June 2019.

Well, that’s all I have to say about the topic. I sincerely hope a reading through this content somehow helps your project and the contributors of your project to have a better idea about the migration.

December 01, 2019 12:00 AM

November 29, 2019

Stefan Hajnoczi

Visiting the Centre for Computing History

I visited the Centre for Computing History today in Cambridge, UK. It's home to old machines from computer science history, 80s home computers, 90s games consoles, and much more. It was nice to see familiar machines that I used to play with back in the day. This post has pictures from the visit to the museum.

The journey starts with the Megaprocessor, a computer built from 15,000 transistors with countless LEDs that allow you to see what's happening inside the machine while a program runs.

The Megaprocessor has its own instruction set architecture (ISA) with 4 General Purpose Registers (GPRs). The contents of the GPRs are visible on seven-segment displays and LEDs.

The instruction decoder looks fun. Although I didn't look in detail, it seems to be an old-school decoder where each bit in an instruction is hardcoded to enable or disable certain hardware units. No microcoded instructions here!

Ada Lovelace is considered the first programmer thanks to her work on the Analytical Engine. On a Women in Computer Science note, I learnt that Margaret Hamilton coined the term "software engineering". Hedy Lamarr also has an interesting background: movie star and inventor. There are posters throughout the museum featuring profiles on women in computer science that are quite interesting.

The museum is very hands-on with machines available to use and other items like books open to visitors. If nostalgia strikes and you want to sit down and play a game or program in BASIC, or just explore an exotic old machine, you can just do it! That is quite rare for a museum since these are historic items that can be fragile or temperamental.

Moving along in chronological order, here is the PDP-11 minicomputer that UNIX ran on in 1970! I've seen them in museums before but have yet to interact with one.

In the 1980s the MicroVAX ran VMS or ULTRIX. I've read about these machines but they were before my time! It's cool to see one.

This HP Graphics Terminal was amusing. I don't see anything graphical about ASCII art, but I think the machine was connected to a plotter/printer.

The museum has a lot of microcomputers from the 1980s including brands I've never heard of. There were also machines with laserdiscs or more obscure optical media, early examples of what eventually became the "multi-media" buzzword in the 90s when CD-ROMs became mainstream.

Speaking of optical media, here is a physical example of bitrot, the deterioration of data media or software!

Classic home computers: ZX Spectrum, Commodore 64, Atari ST Mega 2, and Acorn. The museum has machines that were popular in the UK, so the selection is a little different from what you find in the Computer History Museum in Mountain View, CA, USA.

There are games consoles from the 80s, 90s, and 2000s. The Gameboy holds a special place for me. Both the feel of the buttons and the look of the display still seem right in this age of high resolution color displays.

The museum has both the popular Nintendo, SEGA, and Sony consoles as well as rarer specimens that I've never seen in real life before. It was cool to see an Intellivision, Jaguar, etc.

Back to UNIX. This SGI Indy brought back memories. I remember getting a used one in order to play with the IRIX operating system. It was already an outdated machine at the time but the high resolution graphics and the camera were clearly ahead of their time.

Say hello to an old friend. I remember having exactly the same 56K modem! What a wonderful dial-up sound :).

And finally, the Palm pilot. Too bad that the company failed; they had neat hardware before smartphones came along. I remember programming and reverse engineering on the Palm.


If you visit Cambridge, UK be sure to check out the Centre for Computing History. It has an excellent collection of home computers and games consoles. I hope it will be expanded to cover the 2000s internet era too (old web browsers, big websites that no longer exist, early media streaming, etc).

by Unknown ( at November 29, 2019 07:33 PM

November 27, 2019

Gerd Hoffmann

virtio gpu status and plans

Time for a status update for virtio-gpu development, so let's go ...

Before we begin: If you want to follow development and discussions more closely, head over to the virgl project at freedesktop gitlab. Git repos are there, and most discussions are happening in gitlab issues.

What happened

Over the course of the last year (2019) a bunch of cleanups happened in the virtio-gpu linux kernel driver code base, to prepare for the planned changes and to reduce the code size by using common drm code. Memory management was switched over to use the shmem helpers instead of ttm. fbdev support (for the framebuffer console) was switched over to the generic drm fbdev emulation. fence support was improved. Various bugfixes.

Planned feature: shared mappings

A small change to reduce image data copying. Currently resources have a guest and a host buffer, and there are transfer commands to copy data between the two. Shared mappings allow the host to use the guest buffer directly.

On the guest side this is pretty simple: the guest only needs to tell the host that a shared mapping of the given resource is fine, i.e. that the host may see changes without explicit transfer commands.

On the host side this is a bit more involved. Qemu will create a dma-buf for the resource using the udmabuf driver, which in turn allows qemu to create a linear mapping of the resource even if it is scattered in guest memory. That way the resource can be used directly (e.g. a pixman image can be created for it).
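As an illustration of that host-side flow, here is a hedged sketch (illustrative code, not what qemu actually does): a sealed memfd stands in for scattered guest pages, and the UDMABUF_CREATE ioctl turns it into one linear dma-buf. The struct definition is mirrored locally from the kernel's linux/udmabuf.h uapi header; the program skips gracefully where /dev/udmabuf is absent.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/types.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Mirrors struct udmabuf_create from <linux/udmabuf.h>. */
struct udmabuf_create {
    __u32 memfd;
    __u32 flags;
    __u64 offset;
    __u64 size;
};
#define UDMABUF_CREATE _IOW('u', 0x42, struct udmabuf_create)

/* Returns 0 both on success and on a graceful skip. */
int udmabuf_demo(void)
{
    int dev = open("/dev/udmabuf", O_RDWR);
    if (dev < 0) {
        puts("no /dev/udmabuf, skipping");
        return 0;
    }

    /* A sealed memfd stands in for the pages backing a guest resource. */
    int memfd = memfd_create("guest-ram", MFD_ALLOW_SEALING);
    ftruncate(memfd, 2 * 4096);
    /* udmabuf requires the backing memfd to be sealed against shrinking. */
    fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK);

    struct udmabuf_create create = {
        .memfd = memfd, .offset = 0, .size = 2 * 4096,
    };
    int buf = ioctl(dev, UDMABUF_CREATE, &create);
    if (buf >= 0) {
        printf("created linear dma-buf, fd %d\n", buf);
        close(buf);
    } else {
        puts("udmabuf create failed");
    }
    close(memfd);
    close(dev);
    return 0;
}
```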

Status: Almost ready to be submitted upstream.

Planned feature: blob resources

Currently virtio-gpu resources always have a format attached to them. So they are all-in-one objects, handling both memory management and image metadata. Combining both works ok for opengl, because we have a 1:1 relationship there. It will not work for vulkan though, because memory management works radically differently there.

The drm subsystem has separate entities too: gem objects for memory management and framebuffer objects for the format metadata. That is difficult to model for virtio-gpu. Because of that, virtio-gpu supports only a single format for dumb framebuffers, and dma-buf imports can't be supported either.

Blob resources will fix all that. A blob resource is just a bunch of bytes, i.e. it has only the memory management aspect.

Status: Proof-of-concept works. Various details are to be hashed out. Next in line after "shared mappings" feature.

Planned feature: metadata query

Allows guest userspace to query the host renderer for capabilities and allocation requirements. Similar to capsets but more flexible.

Status: Some test code exists. Not fully clear yet whether that makes sense as a feature of its own. Maybe it'll be folded into vulkan support.

Planned feature: host memory

For some features -- coherent memory for example, which is required by vulkan and at least one opengl extension -- the host gpu driver must allocate resources. So the model of using guest-allocated memory for resources doesn't work; we have to map the host-allocated resources into the guest address space instead.

Virtio recently got support for (host-managed) shared memory, because virtiofs needs this for dax support. The plan is to use that for virtio-gpu too.

Status: Incomplete test code exists. A lot of the kernel driver cleanups were done to prepare the driver for this.

Planned feature: vulkan support

It's coming, but will surely take a while to actually arrive. As you have probably noticed while reading the article, the plans are already made with vulkan in mind, even where the features are useful without vulkan, so we don't have to change things again when vulkan actually lands.

by Gerd Hoffmann at November 27, 2019 11:00 PM

Stefan Hajnoczi

Software Freedom Conservancy donation matching is back!

Software Freedom Conservancy is a non-profit that provides a home for Git, QEMU, Inkscape, and many other popular open source projects. Conservancy is also influential in promoting free software and open source licenses, including best practices for license compliance. They help administer the Outreachy open source internship program that encourages diversity in open source. They are a small organization with just 5 full-time employees taking on many tasks important in the open source community.

The yearly donation matching event has started again, so now is the best time to become a supporter by donating!

by Unknown ( at November 27, 2019 09:31 AM

November 26, 2019

Daniel Berrange

ANNOUNCE: libvirt-glib release 3.0.0

I am pleased to announce that a new release of the libvirt-glib package, version 3.0.0, is now available from

The packages are GPG signed with

Key fingerprint: DAF3 A6FD B26B 6291 2D0E 8E3F BE86 EBB4 1510 4FDF (4096R)

Changes in this release:

  • Add support for bochs video device
  • Add API to query firmware config
  • Improve testing coverage
  • Validate min/max glib API versions in use
  • Remove deprecated G_PARAM_PRIVATE
  • Fix docs build linking problems
  • Convert python demos to be python 3 compatible & use modern best practice for pyobject introspection bindings
  • Add API to query domain capabilities
  • Refresh translations
  • Simplify build process for handling translations
  • Fix some memory leaks
  • Add API for setting storage volume features

Thanks to everyone who contributed to this new release through testing, patches, bug reports, translations and more.

by Daniel Berrange at November 26, 2019 01:15 PM

November 21, 2019

QEMU project

Presentations from KVM Forum 2019

Last month, the KVM Forum 2019 took place in Lyon, France. The conference also featured quite a lot talks about QEMU, and now the videos of the presentation are available online. So for those who could not attend, here is the list of the QEMU-related presentations:

More interesting virtualization-related talks can be found in the KVM Forum Youtube Channel, and for LWN readers, there is the “A recap of KVM Forum 2019” article, too.

by Thomas Huth at November 21, 2019 04:30 PM

November 19, 2019

Stefan Hajnoczi

Video and slides available for "virtio-fs: A Shared File System for Virtual Machines"

This year I presented virtio-fs at KVM Forum 2019 with David Gilbert and Miklos Szeredi. virtio-fs is a host<->guest file system that allows guests to access a shared directory on the host. We've been working on virtio-fs together with Vivek Goyal and community contributors since 2018 and are excited that it is now being merged upstream in Linux and QEMU.

virtio-fs gives guests file system access without the need for disk image files or copying files between the guest and host. You can even boot a guest from a directory on the host without a disk image file. Kata Containers 1.7 and later ship with virtio-fs support for running VM-isolated containers.

What is new and interesting about virtio-fs is that it takes advantage of the co-location of guests and the hypervisor to avoid file server communication and to provide local file system semantics. The guest can map the contents of files from the host page cache. This bypasses the guest page cache to reduce memory footprint and avoid copying data into guest RAM. Network file systems and earlier attempts at paravirtualized file systems, like virtio-9p, cannot do this since they are designed for message-passing communication only.

To learn more about virtio-fs, check out the video or slides (PDF) from the presentation.

by Unknown ( at November 19, 2019 08:25 AM

November 15, 2019

QEMU project

Micro-Optimizing KVM VM-Exits

Background on VM-Exits

KVM (Kernel-based Virtual Machine) is the Linux kernel module that allows a host to run virtualized guests (Linux, Windows, etc). The KVM “guest execution loop”, with QEMU (the open source emulator and virtualizer) as its user space, is roughly as follows: QEMU issues the ioctl(), KVM_RUN, to tell KVM to prepare to enter the CPU’s “Guest Mode” – a special processor mode which allows guest code to safely run directly on the physical CPU. The guest code, which is inside a “jail” and thus cannot interfere with the rest of the system, keeps running on the hardware until it encounters a request it cannot handle. Then the processor gives the control back (referred to as “VM-Exit”) either to kernel space, or to the user space to handle the request. Once the request is handled, native execution of guest code on the processor resumes again. And the loop goes on.

There are dozens of reasons for VM-Exits (Intel’s Software Developer Manual outlines 64 “Basic Exit Reasons”). For example, when a guest executes the CPUID instruction, it causes a “light-weight exit” to kernel space, because CPUID (among a few others) is emulated in the kernel itself, for performance reasons. But when the kernel cannot handle a request, e.g. to emulate certain hardware, it results in a “heavy-weight exit” to QEMU, to perform the emulation. These VM-Exits and subsequent re-entries (“VM-Enters”), even the light-weight ones, can be expensive. What can be done about it?
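The guest execution loop described above can be sketched from user space against the documented KVM ioctl API. This is a minimal, hypothetical sketch (not QEMU code, and with error handling trimmed): open /dev/kvm, create a VM and a vCPU, map the shared kvm_run area, and issue a single KVM_RUN. With no guest code loaded the vCPU exits immediately, which is enough to see a VM-Exit reason come back; the demo skips gracefully when /dev/kvm is unavailable.

```c
#include <fcntl.h>
#include <linux/kvm.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 0 both on success and on a graceful skip. */
int run_kvm_loop_demo(void)
{
    int kvm = open("/dev/kvm", O_RDWR);
    if (kvm < 0) {
        puts("no /dev/kvm, skipping");
        return 0;
    }

    int vm = ioctl(kvm, KVM_CREATE_VM, 0);
    int vcpu = vm >= 0 ? ioctl(vm, KVM_CREATE_VCPU, 0) : -1;
    long sz = ioctl(kvm, KVM_GET_VCPU_MMAP_SIZE, 0);

    /* The kvm_run structure is shared with the kernel; it reports
     * why the vCPU left guest mode. */
    struct kvm_run *run = (vcpu >= 0 && sz > 0)
        ? mmap(NULL, sz, PROT_READ | PROT_WRITE, MAP_SHARED, vcpu, 0)
        : MAP_FAILED;
    if (run == MAP_FAILED) {
        if (vm >= 0) close(vm);
        close(kvm);
        puts("kvm setup failed, skipping");
        return 0;
    }

    /* One iteration of the loop: enter guest mode, come back on VM-Exit.
     * A real VMM would handle the exit and re-enter in a loop. */
    ioctl(vcpu, KVM_RUN, 0);
    printf("VM-Exit reason: %u\n", run->exit_reason);

    munmap(run, sz);
    close(vcpu);
    close(vm);
    close(kvm);
    return 0;
}
```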

Guest workloads that are hard to virtualize

At the 2019 edition of the KVM Forum in Lyon, kernel developer Andrea Arcangeli addressed the kernel part of minimizing VM-Exits.

His talk touched on the cost of VM-Exits into the kernel, especially for guest workloads (e.g. enterprise databases) that are sensitive to their performance penalty. However, these workloads cannot avoid triggering VM-Exits with a high frequency. Andrea then outlined some of the optimizations he’s been working on to improve the VM-Exit performance in the KVM code path – especially in light of applying mitigations for speculative execution flaws (Spectre v2, MDS, L1TF).

Andrea gave a brief recap of the mitigations for the different kinds of speculative execution attacks (retpolines, IBPB, PTI, SSBD, etc). Following that, he outlined the performance impact of Spectre-v2 mitigations in the context of KVM.

The microbenchmark: CPUID in a one-million-iteration loop

Andrea constructed a synthetic microbenchmark program (without any GCC optimizations or caching) which runs the CPUID instruction one million times in a loop. This microbenchmark is meant to focus on measuring the performance of a specific area of the code – in this case, to test the latency of VM-Exits.

While stressing that the results of these microbenchmarks do not represent real-world workloads, he had two goals in mind with it: (a) explain how the software mitigation works; and (b) to justify to the broader community the value of the software optimizations he’s working on in KVM.

Andrea then reasoned through several interesting graphs that show how CPU computation time gets impacted when you disable or enable the various kernel-space mitigations for Spectre v2, L1TF, MDS, et al.

The proposal: “KVM Monolithic”

Based on his investigation, Andrea proposed a patch series, “KVM monolithic”, to get rid of the KVM common module, ‘kvm.ko’. Instead the KVM common code gets linked into each of the vendor-specific KVM modules, ‘kvm-intel.ko’ and ‘kvm-amd.ko’, so it is present twice on disk.

The reason for doing this is that the ‘kvm.ko’ module indirectly calls (via the “retpoline” technique) the vendor-specific KVM modules at every VM-Exit, several times. These indirect calls—via function pointers in the C source code—were not optimal before, but the “retpoline” mitigation (which isolates indirect branches, that allow a CPU to execute code from arbitrary locations, from speculative execution) for Spectre v2 compounds the problem, as it degrades performance.

This approach will result in a few MiB of increased disk space for ‘kvm-intel.ko’ and ‘kvm-amd.ko’, but the upside in saved indirect calls, and the elimination of “retpoline” overhead at run-time more than compensate for it.

With the “KVM Monolithic” patch series applied, Andrea’s microbenchmarks show a double-digit improvement in performance with default mitigations (for Spectre v2, et al) enabled on both Intel ‘VMX’ and AMD ‘SVM’. And with ‘spectre_v2=off’, or for CPUs with IBRS_ALL in ARCH_CAPABILITIES, “KVM monolithic” still improve[s] performance, albeit on the order of 1%.


Removal of the common KVM module has a non-negligible positive performance impact. And the “KVM Monolithic” patch series is still actively being reviewed, modulo some pending clean-ups. Based on the upstream review discussion, KVM Maintainer, Paolo Bonzini, and other reviewers seemed amenable to merge the series.

Although we will still have to deal with mitigations for ‘indirect branch prediction’ for a long time, reducing the VM-Exit latency is important in general; and more specifically, for guest workloads that happen to trigger frequent VM-Exits, without having to disable Spectre v2 mitigations on the host, as Andrea stated in the cover letter of his patch series.

by Kashyap Chamarthy at November 15, 2019 05:00 AM

November 09, 2019

Stefano Garzarella

KVM Forum 2019: virtio-vsock in QEMU, Firecracker and Linux

Slides and recording are available for the “virtio-vsock in QEMU, Firecracker and Linux: Status, Performance and Challenges“ talk that Andra Paraschiv and I presented at KVM Forum 2019. This was the 13th edition of the KVM Forum conference. It took place in Lyon, France in October 2019.

We talked about the current status and future work on the VSOCK drivers in Linux, and how Firecracker and QEMU provide the virtio-vsock device.


Initially, Andra gave an overview of VSOCK, describing the state of the art and its key features:

  • it is very simple to configure: the host assigns a unique CID (Context-ID) to each guest, and no configuration is needed inside the guest;

  • it provides the AF_VSOCK address family, allowing user space applications in the host and guest to communicate using the standard POSIX Socket API (e.g. bind, listen, accept, connect, send, recv, etc.)

Andra also described common use cases for VSOCK, such as guest agents (clipboard sharing, remote console, etc.), network applications using SOCK_STREAM, and services provided by the hypervisor to the guests.
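As a sketch of how plain that socket API is, here is a minimal AF_VSOCK listener using only the standard calls mentioned above. The port number is an arbitrary example, and the program skips gracefully where the kernel lacks vsock support.

```c
#include <sys/socket.h>
#include <linux/vm_sockets.h>   /* struct sockaddr_vm, VMADDR_* */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Create a listening vsock socket; returns 0 on success or graceful skip. */
int vsock_listen_demo(unsigned int port)
{
    int fd = socket(AF_VSOCK, SOCK_STREAM, 0);
    if (fd < 0) {
        puts("AF_VSOCK not supported here, skipping");
        return 0;
    }

    struct sockaddr_vm addr;
    memset(&addr, 0, sizeof(addr));
    addr.svm_family = AF_VSOCK;
    addr.svm_cid = VMADDR_CID_ANY;   /* accept connections from any CID */
    addr.svm_port = port;            /* vsock ports are independent of TCP */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
        listen(fd, 1) == 0) {
        printf("listening on vsock port %u\n", port);
        /* a real server would accept() and recv()/send() here */
    }
    close(fd);
    return 0;
}
```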

Going into the implementation details, Andra explained how the device in the guest communicates with the vhost backend in the host, exchanging data and events (i.e. ioeventfd, irqfd).


Focusing on Firecracker, Andra gave a brief overview on this new VMM (Virtual Machine Monitor) written in Rust and she explained why, in the v0.18.0 release, they switched from the experimental vhost-vsock implementation to a vhost-less solution:

  • focus on security impact
  • less dependency on host kernel features

This change required device emulation in Firecracker, which implements the virtio-vsock device model over MMIO. The device is exposed in the host using UDS (Unix Domain Sockets).

Andra described how Firecracker maps the VSOCK ports on the uds_path specified in the VM configuration:

  • Host-Initiated Connections

    • Guest: create an AF_VSOCK socket and listen() on PORT
    • Host: connect() to AF_UNIX at uds_path
    • Host: send() “CONNECT PORT\n”
    • Guest: accept() the new connection
  • Guest-Initiated Connections

    • Host: create and listen() on an AF_UNIX socket at uds_path_PORT
    • Guest: create an AF_VSOCK socket and connect() to HOST_CID and PORT
    • Host: accept() the new connection
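The host-initiated handshake above boils down to writing a "CONNECT PORT\n" line after connecting to the AF_UNIX socket at uds_path. A small sketch of just the message formatting (the function name is made up for illustration; per Firecracker's vsock documentation the muxer then replies with an "OK ..." line before forwarding data):

```c
#include <stddef.h>
#include <stdio.h>

/* Format the "CONNECT PORT\n" request Firecracker expects on the
 * AF_UNIX socket; returns the number of bytes that make up the line. */
int format_connect_request(char *buf, size_t len, unsigned int guest_port)
{
    return snprintf(buf, len, "CONNECT %u\n", guest_port);
}
```

After write()ing this line on the connected AF_UNIX socket, the host reads the reply and then uses the same file descriptor as the data channel to the guest's vsock port.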

Finally, she showed the performance of this solution, running the iperf-vsock benchmark while varying the size of the buffer used in Firecracker to transfer packets between the virtio-vsock device and the UNIX domain socket. The throughput on the guest to host path reaches 10 Gbps.


In the second part of the talk, I described the QEMU implementation. QEMU provides the virtio-vsock device using the vhost-vsock kernel module.

The vsock device in QEMU handles only:

  • configuration
    • user or management tool can configure the guest CID
  • live-migration
    • connected SOCK_STREAM sockets become disconnected. Applications must handle a connection reset error and should reconnect.
    • the guest CID may not be available on the new host, because it can already be assigned to another VM. In this case the guest is notified about the CID change.

The vhost-vsock kernel module handles the communication with the guest, providing in-kernel virtio device emulation, to achieve very high performance and to interface directly with the host socket layer. In this way, host applications can also directly use the POSIX Socket API to communicate with the guest. So guest and host applications can swap roles, changing only the destination CID.

virtio-vsock Linux drivers

After that, I told the story of VSOCK in the Linux tree, which started in 2013 when the first implementation was merged, and the changes made in the last year.

These changes are mainly fixes, but for the virtio/vhost transports we also improved the performance with two simple changes released with Linux v5.4:

  • reducing the number of credit update messages exchanged
  • increasing the size of packets queued in the virtio-vsock device from 4 KB up to 64 KB, the maximum packet size handled by virtio-vsock devices.

With these changes we are able to reach ~40 Gbps in the Guest -> Host path, because the guest can now send up to 64 KB packets directly to the host; for the Host -> Guest path, we reached ~25 Gbps, because the host is still using the 4 KB buffers preallocated by the guest.

Tools and languages that support VSOCK

In the last few years, several applications, tools, and languages started to support VSOCK and I listed them to update the audience:

  • Tools:

    • wireshark >= 2.40 [2017-07-19]
    • iproute2 >= 4.15 [2018-01-28]
      • ss
    • tcpdump
      • merged in master [2019-04-16]
    • nmap >= 7.80 [2019-08-10]
      • ncat
      • nbd
    • nbdkit >= 1.15.5 [2019-10-19]
    • libnbd >= 1.1.6 [2019-10-19]
    • iperf-vsock
      • iperf3 fork
  • Languages:

    • C
      • glibc >= 2.18 [2013-08-10]
    • Python
      • python >= 3.7 alpha 1 [2017-09-19]
    • Golang
    • Rust
      • libc crate >= 0.2.59 [2019-07-08]
        • struct sockaddr_vm
        • VMADDR_* macros
      • nix crate >= 0.15.0 [2019-08-10]
        • VSOCK supported in the socket API (nix::sys::socket)

Next steps

Concluding, I went through the next challenges that we are going to face:

  • multi-transport, to use VSOCK in a nested VM environment. We are limited by the fact that the current implementation can handle only one transport loaded at run time, so we can’t load virtio_transport and vhost_transport together in the L1 guest. I already sent some patches upstream [RFC, v1], but they are still in progress.

  • network namespace support to create independent addressing domains with VSOCK socket. This could be useful for partitioning VMs in different domains or, in a nested VM environment, to isolate host applications from guest applications bound to the same port.

  • virtio-net as a transport for virtio-vsock, to avoid re-implementing features already available in virtio-net, such as mergeable buffers, page allocation, and small packet handling.

From the audience

Other points to be addressed came from the comments we received from the audience:

  • loopback device could be very useful for developers to test applications that use VSOCK sockets. The current implementation supports loopback only in the guest, but it would be better to also support it in the host, adding a VMADDR_CID_LOCAL special address.

  • VM to VM communication was asked about by several people. Introducing it in the VSOCK core could complicate the protocol and the addressing, and could require some sort of firewall. For now we do not plan to do it, but I developed a simple user space application to solve this issue: vsock-bridge. In order to improve the performance of this solution, we will consider the possibility of adding sendfile(2) or MSG_ZEROCOPY support to the AF_VSOCK core.

  • virtio-vsock Windows drivers are not planned to be addressed, but contributions are welcome. Other virtio Windows drivers are available in the vm-guest-drivers-windows repository.

Stay tuned!

by (Stefano Garzarella) at November 09, 2019 05:45 PM

November 07, 2019

QEMU project

Fuzzing QEMU Device Emulation

QEMU ( emulates a large number of network cards, disk controllers, and other devices needed to simulate a virtual computer system, called the “guest”.

The guest is untrusted and QEMU may even be used to run malicious software, so it is important that bugs in emulated devices do not allow the guest to compromise QEMU and escape the confines of the guest. For this reason a Google Summer of Code project was undertaken to develop fuzz tests for emulated devices.

QEMU device emulation attack surface

Fuzzing is a testing technique that feeds random inputs to a program in order to trigger bugs. Random inputs can be generated quickly without relying on human guidance and this makes fuzzing an automated testing approach.

Device Fuzzing

Emulated devices are exposed to the guest through a set of registers and also through data structures located in guest RAM that are accessed by the device in a process known as Direct Memory Access (DMA). Fuzzing emulated devices involves mapping random inputs to the device registers and DMA memory structures in order to explore code paths in QEMU’s device emulation code.

Device fuzzing overview

Fuzz testing discovered an assertion failure in the virtio-net network card emulation code in QEMU that can be triggered by a guest. Fixing such bugs is usually easy once fuzz testing has generated a reproducer.

Modern fuzz testing intelligently selects random inputs such that new code paths are explored and previously-tested code paths are not tested repeatedly. This is called coverage-guided fuzzing and involves an instrumented program executable, so the fuzzer can detect the code paths that are taken for a given input. This was surprisingly effective at automatically exploring the input space of emulated devices in QEMU without requiring the fuzz test author to provide detailed knowledge of device internals.
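To make the mapping of random bytes to device accesses concrete, here is a toy harness using the libFuzzer entry point shape. The four-register "device" is invented purely for illustration and is not one of QEMU's devices or its actual fuzzing code; each pair of input bytes becomes one (register, value) write.

```c
#include <stddef.h>
#include <stdint.h>

/* Imaginary device with four 8-bit registers (illustration only). */
static uint8_t regs[4];

static void device_write(uint8_t reg, uint8_t val)
{
    regs[reg & 3] = val;   /* the mask keeps the register index in range */
}

/* libFuzzer calls this once per random input. Coverage instrumentation
 * (added at build time with -fsanitize=fuzzer) then guides the fuzzer
 * toward inputs that reach new code paths in device_write(). */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    for (size_t i = 0; i + 1 < size; i += 2)
        device_write(data[i], data[i + 1]);
    return 0;
}
```

The real QEMU fuzz tests do the analogous thing against emulated device registers and DMA structures rather than a toy array.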

How Fuzzing was Integrated into QEMU

Device fuzzing in QEMU is driven by the open source libfuzzer library ( A special build of QEMU includes device emulation fuzz tests and launches without running a normal guest. Instead the fuzz test directly programs device registers and stores random data into DMA memory structures.

The next step for the QEMU project will be to integrate fuzzing into Google’s OSS-Fuzz ( continuous fuzzing service. This will ensure that fuzz tests are automatically run after new code is merged into QEMU and bugs are reported to the community.


Fuzzing emulated devices has already revealed bugs in QEMU that would have been time-consuming to find through manual testing approaches. So far only a limited number of devices have been fuzz-tested and we hope to increase this number now that the foundations have been laid. The goal is to integrate these fuzz tests into OSS-Fuzz so that fuzz testing happens continuously.

This project would not have been possible without Google’s generous funding of Google Summer of Code. Alexander Oleinik developed the fuzzing code and was mentored by Bandan Das, Paolo Bonzini, and Stefan Hajnoczi.

by Stefan Hajnoczi and Alexander Oleinik at November 07, 2019 05:50 AM

November 06, 2019

Cornelia Huck

s390x changes in QEMU 4.2

You know the drill: QEMU is entering freeze (this time for 4.2), and there's a post on the s390x changes for the upcoming release.


  • Emulation for IEP (Instruction Execution Protection), a z14 feature, has been added.
  • A bunch of fixes in the vector instruction emulation and in the fault-handling code.


  • For quite some time now, the code has been implicitly relying on the presence of the 'flic' (floating interrupt controller) KVM device (which had been added in Linux 3.15). Nobody really complained, so we won't try to fix this up and instead make the dependency explicit.
  • The KVM memslot handling was reworked to be actually sane. Unfortunately, this breaks migration of huge KVM guests with more than 8TB of memory from older QEMUs. Migration of guests with less than 8TB continues to work, and there's no planned breakage of migration of >8TB guests starting with 4.2.

CPU models

  • We now know that the gen15a is called 'z15', so reflect this in the cpu model description.
  • The 'qemu' and the 'max' models gained some more features.
  • Under KVM, 'query-machines' will now return the correct default cpu model ('host-s390x-cpu').


  • The usual array of bugfixes, including in SCLP handling and in the s390-ccw bios.

by Cornelia Huck ( at November 06, 2019 03:12 PM

October 20, 2019

KVM on Z

Ubuntu 19.10 released

Ubuntu Server 19.10 is out!
For a detailed list of KVM on Z changes, see the release notes here.

by Stefan Raspl ( at October 20, 2019 01:12 PM

October 16, 2019

Fabiano Fidêncio

Libosinfo (Part I)

This is the first blog post of a series which will cover Libosinfo, what it is, who uses it, how it is used, how to manage it, and, finally, how to contribute to it.

A quick overview

Libosinfo is the operating system information database. As a project, it consists of three different parts, with the goal to provide a single place containing all the required information about an operating system in order to provision and manage it in a virtualized environment.

The project allows management applications to:

  • Automatically identify which operating system an ISO image or an installation tree is intended for;

  • Find the download locations of installable ISO and LiveCD images;

  • Find the location of installation trees;

  • Query the minimum, recommended, and maximum CPU / memory / disk resources for an operating system;

  • Query the hardware supported by an operating system;

  • Generate scripts suitable for automating “Server” and “Workstation” installations;

The library (libosinfo)

The library API is written in C, taking advantage of GLib and GObject. Thanks to GObject Introspection, the API is automatically available in all dynamic programming languages with bindings for GObject (JavaScript, Perl, Python, and Ruby). Auto-generated bindings for Vala are also provided.

As part of libosinfo, three tools are provided:

  • osinfo-detect: Used to detect an Operating System from a given ISO or installation tree.

  • osinfo-install-script: Used to generate a “Server” or “Workstation” install-script to perform automated installation of an Operating System;

  • osinfo-query: Used to query information from the database;

The database (osinfo-db)

The database is written in XML and it can either be consumed via libosinfo APIs or directly via management applications’ own code.

It contains information about the operating systems, devices, installation scripts, platforms, and datamaps (keyboard and language mappings for Windows and Linux OSes).

The database tools (osinfo-db-tools)

These are tools that can be used to manage the database, which is distributed as a tarball archive.

  • osinfo-db-import: Used to import an osinfo database archive;

  • osinfo-db-export: Used to export an osinfo database archive;

  • osinfo-db-validate: Used to validate the XML files in one of the osinfo database locations for compliance with the RNG schema.

  • osinfo-db-path: Used to report the paths associated with the standard database locations;

The consumers …

Libosinfo and osinfo-db have management applications as their target audience. Currently the libosinfo project is consumed by big players in the virtual machine management environment such as OpenStack Nova, virt-manager, GNOME Boxes, Cockpit Machines, and KubeVirt.

… a little bit about them …

  • OpenStack Nova: An OpenStack project that provides a way to provision virtual machines, baremetal servers, and (limited support for) system containers.

  • virt-manager: An application for managing virtual machines through libvirt.

  • GNOME Boxes: A simple application to view, access, and manage remote and virtual systems.

  • Cockpit Machines: A Cockpit extension to manage virtual machines running on the host.

  • KubeVirt: Virtual Machine Management on Kubernetes.

… and why they use it

  • Download ISOs: As libosinfo provides the ISO URLs, management applications can offer the user the option to download a specific operating system;

  • Automatically detect the ISO being used: As libosinfo can detect the operating system of an ISO, management applications can use this info to set reasonable default values for resources, to select the hardware supported, and to perform unattended installations.

  • Start tree installation: As libosinfo provides the tree installation URLs, management applications can use it to start a network-based installation without having to download the whole operating system ISO;

  • Set reasonable default values for RAM, CPU, and disk resources: As libosinfo knows the values that are recommended by the operating system’s vendors, management applications can rely on that when setting the default resources for an installation.

  • Automatically set the hardware supported: As libosinfo provides the list of hardware supported by an operating system, management applications can choose the best defaults based on this information, without taking the risk of ending up with a non-bootable guest.

  • Unattended install: As libosinfo provides unattended installation scripts for CentOS, Debian, Fedora, Fedora Silverblue, Microsoft Windows, OpenSUSE, Red Hat Enterprise Linux, and Ubuntu, management applications can perform unattended installations for both “Workstation” and “Server” profiles.

What’s next?

The next blog post will provide a “demo” of an unattended installation using both GNOME Boxes and virt-install and, based on that, explain how libosinfo is internally used by these projects.

By doing that, we’ll both cover how libosinfo can be used and also demonstrate how it can ease the usage of those management applications.

October 16, 2019 12:00 AM

September 26, 2019

Gerd Hoffmann

VGA and other display devices in qemu

There are a lot of emulated display devices available in qemu. This blog post introduces them, explains the differences between them and the use cases they are good for.

The TL;DR version is in the recommendations section at the end of the article.

standard VGA

  • qemu: -vga std or -device VGA
  • libvirt: <model type='vga'/>
  • ✓ VGA compatible
  • ✓ vgabios support
  • ✓ UEFI support (QemuVideoDxe)
  • ✓ linux driver (bochs-drm.ko)

This is the default display device (on x86). It provides full VGA compatibility and support for a simple linear framebuffer (using the bochs dispi interface). It is the best choice compatibility wise, pretty much any guest should be able to bring up a working display on this device. Performance or usability can be better with other devices, see discussion below.

The device has 16 MB of video memory by default. This can be changed using the vgamem_mb property, -device VGA,vgamem_mb=32 for example will double the amount of video memory. The size must be a power of two, the valid range is 1 MB to 256 MB.
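In libvirt the same knob is exposed as the vram attribute (given in KiB) of the video model element. A sketch of the corresponding domain XML snippet, doubling the default to 32 MB:

```xml
<video>
  <!-- vram is given in KiB: 32768 KiB = 32 MB -->
  <model type='vga' vram='32768' heads='1'/>
</video>
```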

The linux driver supports page-flipping, so having room for 3-4 framebuffers is a good idea. The driver can then keep the framebuffers in vram instead of swapping them in and out. FullHD (1920x1080) for example needs a bit more than 8 MB for a single framebuffer, so 32 or 64 MB would be a good choice for that.
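The 8 MB figure follows directly from the mode geometry: width × height × 4 bytes per pixel at 32 bpp. A quick shell sketch for a few common modes:

```shell
# 32 bpp framebuffer size: width * height * 4 bytes
for mode in "1280 1024" "1920 1080" "3840 2160"; do
    set -- $mode
    echo "${1}x${2}: $(($1 * $2 * 4)) bytes"
done
```

1920x1080 comes out at 8294400 bytes, so three or four framebuffers fit comfortably in 32 MB of vram; a single 4K (3840x2160) framebuffer already needs about four times that.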

The UEFI setup allows choosing the display resolution which OVMF will use to initialize the display at boot. Press ESC at the tianocore splash screen to enter setup, then go to "Device Manager" → "OVMF Platform Configuration".

bochs display device

  • qemu: -device bochs-display
  • libvirt: <model type='bochs'/>
  • ✗ not VGA compatible
  • ✓ vgabios support
  • ✓ UEFI support (QemuVideoDxe)
  • ✓ linux driver (bochs-drm.ko)

This device supports a simple linear framebuffer. It also uses the bochs dispi interface for modesetting, therefore the linear framebuffer configuration is fully compatible to the standard VGA device.

The bochs display is not VGA compatible though. There is no support for text mode, planar video modes, memory windows at 0xa0000 and other legacy VGA features in the virtual hardware.

The main advantage over standard VGA is that this device is a lot simpler. The code size and complexity needed to emulate this device is an order of magnitude smaller, resulting in a reduced attack surface. Another nice feature is that you can place this device in a PCI Express slot.

For UEFI guests it is safe to use the bochs display device instead of the standard VGA device. The firmware will setup a linear framebuffer as GOP anyway and never use any legacy VGA features.

For BIOS guests this device might be usable as well, depending on whether they depend on direct VGA hardware access or not. There is a vgabios which supports text rendering on a linear framebuffer, so software which uses the vgabios services for text output will continue to work. Linux bootloaders typically fall into this category. The linux text mode console (vgacon) uses direct hardware access and does not work. The framebuffer console (fbcon running on vesafb or bochs-drm) works.

virtio vga

  • qemu: -vga virtio or -device virtio-vga
  • libvirt: <model type='virtio'/> (on x86).
  • ✓ VGA compatible
  • ✓ vgabios support
  • ✓ UEFI support (QemuVideoDxe)
  • ✓ linux driver (virtio-gpu.ko)

This is a modern, virtio-based display device designed for virtual machines. It comes with VGA compatibility mode. You need a guest driver to make full use of this device. If your guest OS has no driver it should still show a working display thanks to the VGA compatibility mode, but the device will not provide any advantages over standard VGA then.

This device has (optional) hardware-assisted opengl acceleration support. This can be enabled using the virgl=on property, which in turn needs opengl support enabled (gl=on) in the qemu display.

This device has multihead support, which can be enabled using the max_outputs=2 property.
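Putting the two properties together, a command-line sketch (assuming a host display that supports opengl; the remaining options are elided):

```shell
qemu-system-x86_64 \
  -device virtio-vga,virgl=on,max_outputs=2 \
  -display gtk,gl=on \
  [ ... ]
```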

This device has no dedicated video memory (except for VGA compatibility), gpu data will be stored in main memory instead. Therefore this device has no config options for video memory size.

This is the place where most development happens, support for new, cool features will most likely be added to this device.

virtio gpu

  • qemu: -device virtio-gpu-pci
  • libvirt: <model type='virtio'/> (on arm).
  • ✗ not VGA compatible
  • ✗ no vgabios support
  • ✓ UEFI support (VirtioGpuDxe)
  • ✓ linux driver (virtio-gpu.ko)

This device lacks VGA compatibility mode but is otherwise identical to the virtio vga device. UEFI firmware can handle this, and if your guest has drivers too you can use this instead of virtio-vga. This will reduce the attack surface (no complex VGA emulation support) and reduce the memory footprint by 8 MB (no pci memory bar for VGA compatibility). This device can be placed in a PCI Express slot.

vhost-user virtio gpu

There is a vhost-user variant for both virtio vga and virtio gpu. This allows running the virtio-gpu emulation in a separate process. This is good from the security perspective, especially if you want to use virgl 3D acceleration, and it also helps with opengl performance.

Run the gpu emulation process (see contrib/vhost-user-gpu/ in the qemu source tree):

./vhost-user-gpu --virgl -s vgpu.sock

Run qemu:

qemu \
  -chardev socket,id=vgpu,path=vgpu.sock \
  -device vhost-user-vga,chardev=vgpu \
  [ ... ]

libvirt support is in the works.

qxl vga

  • qemu: -vga qxl or -device qxl-vga.
  • libvirt: <model type='qxl' primary='yes'/>.
  • ✓ VGA compatible
  • ✓ vgabios support
  • ✓ UEFI support (QemuVideoDxe)
  • ✓ linux driver (qxl.ko)
  • ✓ windows driver

This is a slightly dated display device designed for virtual machines. It comes with VGA compatibility mode. You need a guest driver to make full use of this device. If your guest OS has no driver it should still show a working display thanks to the VGA compatibility mode, but the device will not provide any advantages over standard VGA then.

This device has support for 2D acceleration. This becomes more and more useless though as modern display devices don't have dedicated 2D acceleration support any more and use the 3D engine for everything. The same happens on the software side, modern desktops are rendering with opengl or vulkan instead of using 2D acceleration.

Spice and qxl support offloading 2D acceleration to the spice client (typically virt-viewer these days). That is quite complex, and with 2D acceleration on the way out this becomes increasingly useless too. You might want to pick a simpler device for security reasons.

This device has multihead support, which can be enabled using the max_outputs=2 property. The linux driver will use this; the windows driver expects multiple devices instead (see below).

The amount of video memory for this device is configurable using the ram_size_mb and vram_size_mb properties for the two pci memory bars. The default is 64 MB for both, which should be plenty for typical use cases. When using 4K display resolution or multihead support you should assign more video memory though. When using small resolutions like 1024x768 you can assign less video memory to reduce the memory footprint.
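For a 4K or multihead setup, both bars could be bumped like this (a sketch using the two properties named above; the remaining options are elided):

```shell
qemu-system-x86_64 \
  -device qxl-vga,ram_size_mb=128,vram_size_mb=128 \
  [ ... ]
```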


qxl

  • qemu: -device qxl.
  • libvirt: <model type='qxl' primary='no'/>.

This device lacks VGA compatibility mode but is otherwise identical to the qxl vga device. Providing multihead support for windows guests is pretty much the only use case for this device. The windows guest driver expects one qxl device per secondary display (additionally to one qxl-vga device for the primary display).

cirrus vga

  • qemu: -vga cirrus or -device cirrus-vga.
  • libvirt: <model type='cirrus'/>.
  • ✓ VGA compatible
  • ✓ vgabios support
  • ✓ UEFI support (QemuVideoDxe)
  • ✓ linux driver (cirrus.ko)

Emulates a Cirrus SVGA device which was modern in the 1990s, more than 20 years ago. For the most part my blog article from 2014 is still correct; the device is mostly useful for guests which are equally old and ship with a driver for cirrus vga devices.

Two things have changed in the meantime though: since qemu version 2.2 cirrus is no longer the default vga device. Also the cirrus driver in the linux kernel has been completely rewritten. In kernel 5.2 and newer the cirrus driver uses a shadow framebuffer and converts formats on the fly to hide some of the cirrus oddities from userspace (Xorg/wayland), so things work a bit better now. That doesn't cure everything though; in particular the available display resolutions are still constrained by the small amount of video memory.

ati vga

  • qemu: -device ati-vga.
  • ✓ VGA compatible
  • ✓ vgabios support
  • ✗ no UEFI support

Emulates two ATI SVGA devices, the model property can be used to pick the variant. model=rage128p selects the "Rage 128 Pro" and model=rv100 selects the "Radeon RV100".

The devices are newer (late 1990s / early 2000s) and more modern than the cirrus VGA. Nevertheless the use case is very similar: guests of a similar age which ship with drivers for these devices.

This device was added to qemu only recently and development is still in progress. The fundamentals are working (modesetting, hardware cursor). The most important 2D accel ops are implemented too. 3D acceleration is not implemented yet.

Linux has both drm and fbdev drivers for these devices. The drm drivers are not working yet because the emulation is still incomplete (which hopefully changes in the future). The fbdev drivers are working. Modern linux distros prefer the drm drivers though, so you probably have to build your own kernel if you want to use this device.


ramfb

  • qemu: -device ramfb.
  • ✗ not VGA compatible
  • ✓ vgabios support
  • ✓ UEFI support (QemuRamfbDxe)

Very simple display device. Uses a framebuffer stored in guest memory. The firmware initializes it and allows using it as a boot display (grub boot menu, efifb, ...) without needing complex legacy VGA emulation. Details can be found here.

no display device

  • qemu: -vga none -nographic.

You don't have to use a display device. If you don't need one you can run your guests with a serial console instead.

embedded devices

There are a bunch of other display devices. These are typically SoC-specific and used by embedded board emulation; they are mentioned here only for completeness. You can't select the display device for embedded boards; the qemu emulation simply matches the physical hardware here.


recommendations

For the desktop use case (assuming display performance matters and/or you need multihead support), in order of preference:

For the server use case (assuming the GUI is rarely used, or not at all), in order of preference:

On arm systems display devices with a pci memory bar do not work, which reduces the choices a lot. We are left with:

by Gerd Hoffmann at September 26, 2019 10:00 PM

September 13, 2019

KVM on Z

IBM z15 announced

Today, IBM announced the new IBM Z models:
Furthermore, check the updated IBM Z tested platforms matrix here.
We will look at features in support of the new IBM Z model in a separate blog entry soon.

by Stefan Raspl ( at September 13, 2019 09:42 AM

September 02, 2019

KVM on Z

virt-manager 2.2 released

virt-manager v2.2 was released a while ago. One feature we contributed is the ability to choose a temporary boot device. Here is a quick write-up on how to use that feature.

virt-xml is a simple command line tool for editing domain definitions. It can be used interactively or for batch processing. Starting with virt-manager v2.2, virt-xml allows booting a guest with a temporarily changed domain definition. This allows us to specify a boot configuration other than the guest's current domain definition. This is especially useful as the IBM Z architecture allows for only a single boot device, and therefore the boot order settings do not work the way they do on other platforms: if the first boot device fails to boot, there is no attempt made to boot from the next boot device. In addition, the architecture/BIOS has no support for interactively changing the boot device during the boot/IPL process.
Therefore, two new command line options were introduced:
  • --no-define makes any changes to the domain definition transient (i.e. the guest's persistent domain XML will not be modified)
  • --start allows the user to start the domain after the changes to the domain XML were applied
Here is a simple example illustrating the usage:
  1. First, select the device to change via a selector. In this example, the unique target name of the disk is used. See man virt-xml for a list of further possibilities.
  2. Temporarily modify the boot order, assign the first slot to device vdc, and start the guest right away:

      $ virt-xml --edit target='vdc' --disk='boot_order=1' --start \
                 --no-define sample_domain

    Note: If there was another device that already had boot_order=1, its boot order would be incremented.
As soon as the guest is stopped, the changes will vanish.

    by Stefan Raspl ( at September 02, 2019 03:22 PM

    August 29, 2019

    KVM on Z

    Webinar: How to Virtualize with KVM in Live Demo, August 28

    We will explain basic KVM concepts, including CPU and memory virtualization, storage, and network management, as well as give a brief overview of commonalities and differences with other virtualization environments. Furthermore, a live demo will show how to use the KVM management tools to create and install Linux guests, and how to operate and monitor them.

    Speaker: Christian Bornträger, Chief Product Owner Linux and KVM on IBM Z.

    Register here. You can check the system requirements here.
    After registering, you will receive a confirmation email containing information about joining the webinar.

    Replay & Archive
    All sessions are recorded. For the archive as well as a replay and handout of this session and all previous webcasts see here.

    by Stefan Raspl ( at August 29, 2019 12:10 PM

    August 24, 2019

    Stefano Garzarella

    How to measure the boot time of a Linux VM with QEMU/KVM

    The stefano-garzarella/qemu-boot-time repository contains a Python perf-script and patches (for Linux, QEMU, and SeaBIOS) to measure the boot time of a Linux VM with QEMU/KVM.

    Using I/O writes, we can trace events to measure the time consumed during the boot phase by the different components:

    We extended the I/O port addresses and values defined in qboot/benchmark.h adding new trace points to trace the kernel boot time.

    In the repository you can find patches for Linux, QEMU, and SeaBIOS to add the I/O writes in the components involved during the boot, and a Python perf-script useful to process the data recorded through perf using perf-script’s built-in Python interpreter.

    Trace points

    The benchmark.h file contains the following trace points used in the patches:

    • QEMU
      • qemu_init_end: first kvm_entry (i.e. QEMU initialization has finished)
    • Firmware (SeaBIOS + optionrom or qboot)
      • fw_start: first entry of the firmware
      • fw_do_boot: after the firmware initialization (e.g. PCI setup, etc.)
      • linux_start_boot: before the jump to the Linux kernel
      • linux_start_pvhboot: before the jump to the Linux PVH kernel
    • Linux Kernel
      • linux_start_kernel: first entry of the Linux kernel
      • linux_start_user: before starting the init process

    Custom trace points

    If you want to add new trace points, you can simply add an I/O write to the LINUX_EXIT_PORT or FW_EXIT_PORT I/O port with a value (> 7) that identifies the trace point:

        outb(10, LINUX_EXIT_PORT);

    The perf script output will contain an Exit point 10 line that identifies your custom trace point:

     qemu_init_end: 143.770419
     fw_start: 143.964328 (+0.193909)
     fw_do_boot: 164.71107 (+20.746742)
     Exit point 10: 165.396804 (+0.685734)
     linux_start_kernel: 165.979486 (+0.582682)
     linux_start_user: 272.178335 (+106.198849)
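The "+delta" values are simply the differences between consecutive absolute timestamps. A Python sketch of that formatting step (the function name is illustrative; the sample timestamps are copied from the output above):

```python
# Format a boot trace as "name: t (+delta)" lines, like the perf script does.
def format_boot_trace(samples):
    lines, prev = [], None
    for name, t in samples:
        if prev is None:
            lines.append(f" {name}: {t}")
        else:
            lines.append(f" {name}: {t} (+{round(t - prev, 6)})")
        prev = t
    return lines

trace = [
    ("qemu_init_end", 143.770419),
    ("fw_start", 143.964328),
    ("fw_do_boot", 164.711070),
    ("Exit point 10", 165.396804),
]
print("\n".join(format_boot_trace(trace)))
```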

    How to use

    Clone qemu-boot-time repository

    cd ${REPOS}
    git clone

    Apply patches to Linux, QEMU and SeaBIOS

    Trace points are printed only if they are recorded, so you can enable a few of them, patching only the components that you are interested in.


    Apply the patches/linux.patch to your Linux kernel in order to trace kernel events

    cd ${REPOS}/linux
    git checkout -b benchmark
    git am ${REPOS}/qemu-boot-time/patches/linux.patch


    Apply the patches/qemu.patch to your QEMU in order to trace optionrom events

    cd ${REPOS}/qemu
    git checkout -b benchmark
    git am ${REPOS}/qemu-boot-time/patches/qemu.patch
    mkdir build-benchmark
    cd build-benchmark
    ../configure --target-list=x86_64-softmmu ...
    make

    You can use qemu-system-x86_64 -L ${REPOS}/qemu/build-benchmark/pc-bios/optionrom/ ... to use the patched optionrom.


    Apply the patches/seabios.patch to your SeaBIOS in order to trace bios events

    cd ${REPOS}/seabios
    git checkout -b benchmark
    git am ${REPOS}/qemu-boot-time/patches/seabios.patch
    make clean distclean
    cp ${REPOS}/qemu/roms/config.seabios-256k .config
    make oldnoconfig

    You can use qemu-system-x86_64 -bios ${REPOS}/seabios/out/bios.bin ... to use the patched SeaBIOS image.


    qboot already defines trace points; we just need to compile it with BENCHMARK_HACK defined

    cd ${REPOS}/qboot
    make clean

    You can use qemu-system-x86_64 -bios ${REPOS}/qboot/bios.bin ... to use the qboot image.

    Enable KVM events

    The following steps allow perf record to get the kvm trace events:

    echo 1 > /sys/kernel/debug/tracing/events/kvm/enable
    echo -1 > /proc/sys/kernel/perf_event_paranoid
    mount -o remount,mode=755 /sys/kernel/debug
    mount -o remount,mode=755 /sys/kernel/debug/tracing

    Record the trace events

    Start perf record to get the trace events

    perf record -a -e kvm:kvm_entry -e kvm:kvm_pio -e sched:sched_process_exec \
                -o $PERF_DATA &

    You can run QEMU multiple times to also get some statistics (Avg/Min/Max)

    qemu-system-x86_64 -machine q35,accel=kvm \
                       -bios seabios/out/bios.bin \
                       -L qemu/build-benchmark/pc-bios/optionrom/ \
                       -kernel linux/bzImage ...
    qemu-system-x86_64 -machine q35,accel=kvm \
                       -bios seabios/out/bios.bin \
                       -L qemu/build-benchmark/pc-bios/optionrom/ \
                       -kernel linux/bzImage ...
    qemu-system-x86_64 -machine q35,accel=kvm \
                       -bios seabios/out/bios.bin \
                       -L qemu/build-benchmark/pc-bios/optionrom/ \
                       -kernel linux/bzImage ...

    Stop perf record

    kill $PERF_PID

    Process the recorded trace using the Python perf-script

    Note: times printed in milliseconds

    perf script -s ${REPOS}/qemu-boot-time/perf-script/ -i $PERF_DATA
    in trace_begin
    sched__sched_process_exec     1 55061.435418353   289738 qemu-system-x86
    kvm__kvm_entry           1 55061.466887708   289741 qemu-system-x86
    kvm__kvm_pio             1 55061.467070650   289741 qemu-system-x86      rw=1, port=0xf5, size=1, count=1, val=1
    kvm__kvm_pio             1 55061.475818073   289741 qemu-system-x86      rw=1, port=0xf5, size=1, count=1, val=4
    kvm__kvm_pio             1 55061.477168037   289741 qemu-system-x86      rw=1, port=0xf4, size=1, count=1, val=3
    kvm__kvm_pio             1 55061.558779540   289741 qemu-system-x86      rw=1, port=0xf4, size=1, count=1, val=5
    kvm__kvm_pio             1 55061.686849663   289741 qemu-system-x86      rw=1, port=0xf4, size=1, count=1, val=6
    sched__sched_process_exec     4 55067.461869075   289793 qemu-system-x86
    kvm__kvm_entry           4 55067.496402472   289796 qemu-system-x86
    kvm__kvm_pio             4 55067.496555385   289796 qemu-system-x86      rw=1, port=0xf5, size=1, count=1, val=1
    kvm__kvm_pio             4 55067.505067184   289796 qemu-system-x86      rw=1, port=0xf5, size=1, count=1, val=4
    kvm__kvm_pio             4 55067.506395502   289796 qemu-system-x86      rw=1, port=0xf4, size=1, count=1, val=3
    kvm__kvm_pio             4 55067.584029910   289796 qemu-system-x86      rw=1, port=0xf4, size=1, count=1, val=5
    kvm__kvm_pio             4 55067.704751791   289796 qemu-system-x86      rw=1, port=0xf4, size=1, count=1, val=6
    sched__sched_process_exec     0 55070.073823767   289827 qemu-system-x86
    kvm__kvm_entry           0 55070.110507211   289830 qemu-system-x86
    kvm__kvm_pio             0 55070.110694645   289830 qemu-system-x86      rw=1, port=0xf5, size=1, count=1, val=1
    kvm__kvm_pio             1 55070.120092692   289830 qemu-system-x86      rw=1, port=0xf5, size=1, count=1, val=4
    kvm__kvm_pio             1 55070.121437922   289830 qemu-system-x86      rw=1, port=0xf4, size=1, count=1, val=3
    kvm__kvm_pio             1 55070.198628779   289830 qemu-system-x86      rw=1, port=0xf4, size=1, count=1, val=5
    kvm__kvm_pio             1 55070.315734630   289830 qemu-system-x86      rw=1, port=0xf4, size=1, count=1, val=6
    in trace_end
    Trace qemu-system-x86
    1) pid 289738
     qemu_init_end: 31.469355
     fw_start: 31.652297 (+0.182942)
     fw_do_boot: 40.39972 (+8.747423)
     linux_start_boot: 41.749684 (+1.349964)
     linux_start_kernel: 123.361187 (+81.611503)
     linux_start_user: 251.43131 (+128.070123)
    2) pid 289793
     qemu_init_end: 34.533397
     fw_start: 34.68631 (+0.152913)
     fw_do_boot: 43.198109 (+8.511799)
     linux_start_boot: 44.526427 (+1.328318)
     linux_start_kernel: 122.160835 (+77.634408)
     linux_start_user: 242.882716 (+120.721881)
    3) pid 289827
     qemu_init_end: 36.683444
     fw_start: 36.870878 (+0.187434)
     fw_do_boot: 46.268925 (+9.398047)
     linux_start_boot: 47.614155 (+1.34523)
     linux_start_kernel: 124.805012 (+77.190857)
     linux_start_user: 241.910863 (+117.105851)
     qemu_init_end: 34.228732
     fw_start: 34.403161 (+0.174429)
     fw_do_boot: 43.288918 (+8.885757)
     linux_start_boot: 44.630088 (+1.34117)
     linux_start_kernel: 123.442344 (+78.812256)
     linux_start_user: 245.408296 (+121.965952)
     qemu_init_end: 31.469355
     fw_start: 31.652297 (+0.182942)
     fw_do_boot: 40.39972 (+8.747423)
     linux_start_boot: 41.749684 (+1.349964)
     linux_start_kernel: 122.160835 (+80.411151)
     linux_start_user: 241.910863 (+119.750028)
     qemu_init_end: 36.683444
     fw_start: 36.870878 (+0.187434)
     fw_do_boot: 46.268925 (+9.398047)
     linux_start_boot: 47.614155 (+1.34523)
     linux_start_kernel: 124.805012 (+77.190857)
     linux_start_user: 242.882716 (+118.077704)

    by (Stefano Garzarella) at August 24, 2019 01:03 PM

    August 23, 2019

    Stefano Garzarella

    QEMU 4.0 boots uncompressed Linux x86_64 kernel

    QEMU 4.0 is now able to boot directly into the uncompressed Linux x86_64 kernel binary with minimal firmware involvement using the PVH entry point defined in the x86/HVM direct boot ABI. (CONFIG_PVH=y must be enabled in the Linux config file).

    The x86/HVM direct boot ABI was initially developed for Xen guests, but with the latest changes in both QEMU and Linux, QEMU is able to use that same entry point for booting KVM guests.


    Requirements

    • QEMU >= 4.0
    • Linux kernel >= 4.21
      • CONFIG_PVH=y enabled
      • vmlinux uncompressed image

    How to use

    To boot the PVH kernel image, you can use the -kernel parameter specifying the path to the vmlinux image.

    qemu-system-x86_64 -machine q35,accel=kvm \
        -kernel /path/to/vmlinux \
        -drive file=/path/to/rootfs.ext2,if=virtio,format=raw \
        -append 'root=/dev/vda console=ttyS0' -vga none -display none \
        -serial mon:stdio

    The -initrd and -append parameters are also supported as for compressed images.


    QEMU will automatically recognize if the vmlinux image has the PVH entry point and it will use SeaBIOS with the new pvh.bin optionrom to load the uncompressed image into the guest VM.
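Whether a given vmlinux carries the PVH entry point can be checked by looking for the Xen ELF note (XEN_ELFNOTE_PHYS32_ENTRY) in the binary — a sketch, with output formatting depending on the binutils version:

```shell
readelf -n /path/to/vmlinux | grep -i -A2 xen
```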

    As an alternative, qboot can be used to load the PVH image.


    Perf script and patches used to measure boot time:

    The following values are expressed in milliseconds [ms]

    • QEMU (q35 machine) + SeaBIOS + bzImage

      • qemu_init_end: 36.072056
      • linux_start_kernel: 114.669522 (+78.597466)
      • linux_start_user: 191.748567 (+77.079045)
    • QEMU (q35 machine) + SeaBIOS + vmlinux(PVH)

      • qemu_init_end: 51.588200
      • linux_start_kernel: 62.124665 (+10.536465)
      • linux_start_user: 139.460582 (+77.335917)
    • QEMU (q35 machine) + qboot + bzImage

      • qemu_init_end: 36.443638
      • linux_start_kernel: 106.73115 (+70.287512)
      • linux_start_user: 184.575531 (+77.844381)
    • QEMU (q35 machine) + qboot + vmlinux(PVH)

      • qemu_init_end: 51.877656
      • linux_start_kernel: 56.710735 (+4.833079)
      • linux_start_user: 133.808972 (+77.098237)
    • Tracepoints:

      • qemu_init_end: first kvm_entry (i.e. QEMU initialization has finished)
      • linux_start_kernel: first entry of the Linux kernel (start_kernel())
      • linux_start_user: before starting the init process
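From these tracepoints, the firmware-plus-kernel-load phase is linux_start_kernel minus qemu_init_end. A small Python sketch recomputing it from the numbers reported above:

```python
# (qemu_init_end, linux_start_kernel) in ms, copied from the measurements above
results = {
    "SeaBIOS + bzImage":      (36.072056, 114.669522),
    "SeaBIOS + vmlinux(PVH)": (51.588200, 62.124665),
    "qboot + bzImage":        (36.443638, 106.731150),
    "qboot + vmlinux(PVH)":   (51.877656, 56.710735),
}
for name, (init_end, start_kernel) in results.items():
    print(f"{name}: firmware + kernel load took {start_kernel - init_end:.2f} ms")
```

The PVH path cuts this phase from roughly 78.6 ms to 10.5 ms with SeaBIOS, and from roughly 70.3 ms to 4.8 ms with qboot.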


    Linux patches merged upstream in Linux 4.21:

    QEMU patches merged upstream in QEMU 4.0:

    qboot patches merged upstream:

    by (Stefano Garzarella) at August 23, 2019 01:26 PM

    Powered by Planet!
    Last updated: January 18, 2020 12:04 PM
    Powered by OpenShift Online