Blogging about open source virtualization

News from QEMU, KVM, libvirt, libguestfs, virt-manager and related tools


Planet Feeds

December 07, 2023

KVM on Z

Red Hat Ansible Automation Platform available on IBM Z and LinuxONE

While Linux on IBM Z and LinuxONE has long been usable as a target for Ansible scripts, the backend had to run on other architectures. But no longer: starting today, the entire Red Hat Ansible Automation Platform is available on IBM Z and LinuxONE!

See here for more details, and here for the formal announcement from Red Hat.

by Stefan Raspl at December 07, 2023 04:05 PM

December 01, 2023

KVM on Z

New Releases: RHEL 8.9 and RHEL 9.3 on IBM Z & LinuxONE

Both Red Hat Enterprise Linux 8.9 and 9.3 are out! See the press release here, and Red Hat's blog entry here!

Both releases ship

  • s390-tools v2.27 (renamed to s390utils)
  • smc-tools v1.8.2
  • openCryptoki v3.21

Further information can be found in the release notes for RHEL 8.9 and RHEL 9.3.

by Stefan Raspl at December 01, 2023 11:26 AM

November 30, 2023

Gerd Hoffmann

physical address space in qemu

The physical address space is where all memory and most IO resources are located: PCI memory BARs, PCI MMIO BARs, platform devices like lapic, io-apic, hpet, tpm, ...

On your linux machine you can use lscpu to see the size of the physical address space:

$ lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         39 bits physical, 48 bits virtual
[ ... ]

In /proc/iomem you can see how the address space is used. Note that the actual addresses are only shown to root.
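For example (the entries below are purely illustrative, the real output depends on the machine, and the actual addresses require root):

$ sudo cat /proc/iomem
00000000-00000fff : Reserved
00001000-0009ffff : System RAM
[ ... ]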

The physical address space problem on x86_64

The very first x86_64 processor (AMD Opteron) shipped with a physical address space of 40 bits (aka one TeraByte). So when qemu added support for the (back then) new architecture the qemu vcpu likewise got 40 bits of physical address space, probably assuming that this would be a safe baseline. It is still the default in qemu (version 8.1 as of today) for backward compatibility reasons.

Enter Intel. The first 64-bit processors shipped by Intel featured only 36 bits of physical address space. More recent Intel processors have 39, 42 or more physical address bits. The problem is that this limit applies not only to the real physical address space, but also to Extended Page Tables (EPT), which means the physical address space of virtual machines is limited too.

So, the problem is the virtual machine firmware does not know how much physical address space it actually has. When checking CPUID it gets back 40 bits, but it could very well be it actually has only 36 bits.

Traditional firmware behavior

To address that problem the virtual machine firmware was very conservative with address space usage, to avoid crossing the unknown limit.

OVMF used to have an MMIO window with a fixed size (32GB), starting at the first multiple of 32GB after normal RAM. So a typical, smallish virtual machine had 0 -> 32GB for RAM and 32GB -> 64GB for IO, staying below the limit for 36 bits of physical address space (which equals 64GB).

VMs having more than 30GB of RAM will need address space above 32GB for RAM, which pushes the IO window above the 64GB limit. The assumption that hosts which have enough physical memory to run such big virtual machines also have a physical address space larger than 64GB seems to have worked well enough.

Nevertheless the fixed 32G-sized IO window became increasingly problematic. Memory sizes are growing, not only for main memory, but also for device memory. GPUs have gigabytes of memory these days.

Config options in qemu

Qemu has had three -cpu options to control the physical address space advertised to the guest for quite a while already:

  • host-phys-bits={on,off}: When enabled qemu will use the host's physical address bits for the guest, i.e. the guest can see the actual limit. I recommend enabling this everywhere.
    Upstream default: off (except for -cpu host where it is on).
    Some downstream linux distro builds flip this to on by default.
  • host-phys-bits-limit=bits: Used only with host-phys-bits=on. Can be used to reduce the number of physical address space bits communicated to the guest. Useful for live migration compatibility in case your machine cluster has machines with different physical address space sizes.
  • phys-bits=bits: Used only with host-phys-bits=off. Can be used to set the number of physical address space bits to any value you want, including non-working values. Use only if you know what you are doing; it's easy to shoot yourself in the foot with this one.
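As a rough sketch of how these options look on the command line (the CPU model and the trailing arguments are just placeholders):

$ # trust the host's physical address bits
$ qemu-system-x86_64 -cpu Skylake-Server,host-phys-bits=on [ ... ]
$ # same, but never advertise more than 40 bits (e.g. for migration compatibility)
$ qemu-system-x86_64 -cpu Skylake-Server,host-phys-bits=on,host-phys-bits-limit=40 [ ... ]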

Changes in OVMF

Recent OVMF versions (edk2-stable202211 and newer) try to figure out the size of the physical address space using a heuristic: in case the physical address space bits value received via CPUID is 40 or below it is checked against known-good values, which are 36 and 39 for Intel processors and 40 for AMD processors. If that check passes, or the number of bits is 41 or higher, OVMF assumes qemu is configured with host-phys-bits=on and the value can be trusted.

In case there is no trustworthy phys-bits value OVMF will continue with the traditional behavior described above.

In case OVMF trusts the phys-bits value it will apply some OVMF-specific limitations before actually using it:

  • The concept of virtual memory does not exist in UEFI, so the firmware will identity-map everything. Without 5-level paging (which is not yet supported in OVMF) at most 128TB (phys-bits=47) can be identity-mapped, so OVMF can not use more than that.
    The actual limit is phys-bits=46 (64TB) for now due to older linux kernels (4.15) having problems if OVMF uses phys-bits=47.
  • In case gigabyte pages are not available OVMF will not use more than phys-bits=40 (1TB). This avoids high memory usage and long boot times due to OVMF creating lots of page tables for the identity mapping.

The final phys-bits value will be used to calculate the size of the physical address space available. The 64-bit IO window will be placed as high as possible, i.e. at the end of the physical address space. The size of the IO window and also the size of the PCI bridge windows (for prefetchable 64-bit bars) will be scaled up with the physical address space, i.e. on machines with a larger physical address space you will also get larger IO windows.

Changes in SeaBIOS

Starting with version 1.16.3 SeaBIOS uses a heuristic similar to OVMF to figure out whether there is a trustworthy phys-bits value.

If that is the case SeaBIOS will enable the 64-bit IO window by default and place it at the end of the address space like OVMF does. SeaBIOS will also scale the size of the IO window with the size of the address space.

Although the overall behavior is similar there are some noteworthy differences:

  • SeaBIOS will not enable the 64-bit IO window in case there is no RAM above 4G, for better compatibility with old -- possibly 32-bit -- guests.
  • SeaBIOS will not enable the 64-bit IO window in case the CPU has no support for long mode (i.e. it is a 32-bit processor), likewise for better compatibility with old guests.
  • SeaBIOS will limit phys-bits to 46, similar to OVMF, likewise for better compatibility with old guests. SeaBIOS does not use paging though and does not care about support for gigabyte pages, it will never limit phys-bits to 40.
  • SeaBIOS has a list of devices which will never be placed in the 64-bit IO window. This list includes devices where SeaBIOS drivers must be able to access the PCI bars. SeaBIOS runs in 32-bit mode so these PCI bars must be mapped below 4GB.

Changes in qemu

Starting with release 8.2 the firmware images bundled with upstream qemu are new enough to include the OVMF and SeaBIOS changes described above.

Live migration and changes in libvirt

The new firmware behavior triggered a few bugs elsewhere ...

When doing live migration the vcpu configuration on source and target host must be identical. That includes the size of the physical address space.

libvirt can calculate the cpu baseline for a given cluster, i.e. create a vcpu configuration which is compatible with all cluster hosts. That calculation did not include the size of the physical address space though.

With the traditional, very conservative firmware behavior this bug did not cause problems in practice, but with OVMF starting to use the full physical address space live migrations in heterogeneous clusters started to fail because of that.

In libvirt 9.5.0 and newer this has been fixed.
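libvirt exposes the baseline calculation through virsh; a minimal sketch of the kind of workflow involved (the file names are placeholders, and the exact XML handling is left out):

$ # on every host in the cluster, capture the host CPU description
$ virsh capabilities > hostN-caps.xml
$ # concatenate them and compute a guest CPU model compatible with all hosts
$ virsh cpu-baseline all-hosts-caps.xml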

Troubleshooting tips

In general, it is a good idea to set the qemu config option host-phys-bits=on.

In case guests can't deal with PCI bars being mapped at high addresses the host-phys-bits-limit=bits option can be used to limit the address space usage. I'd suggest sticking to values seen in actual processors, so 40 for AMD and 39 for Intel are good candidates.

In case you are running 32-bit guests with a lot of memory (which btw isn't a good idea performance-wise) you might need to turn off long mode support to force the PCI bars to be mapped below 4G. This can be done by simply using qemu-system-i386 instead of qemu-system-x86_64, or by explicitly setting lm=off in the -cpu options.
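A sketch of both variants (all other arguments omitted):

$ # drop long mode from the vcpu explicitly ...
$ qemu-system-x86_64 -cpu qemu64,lm=off [ ... ]
$ # ... or use the 32-bit system emulator in the first place
$ qemu-system-i386 [ ... ]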

by Gerd Hoffmann at November 30, 2023 11:00 PM

Daniel Berrange

ANNOUNCE: libvirt-glib release 5.0.0

I am pleased to announce that a new release of the libvirt-glib package, version 5.0.0, is now available from

The packages are GPG signed with

Key fingerprint: DAF3 A6FD B26B 6291 2D0E 8E3F BE86 EBB4 1510 4FDF (4096R)

Changes in this release:

  • Fix compatibility with libxml2 >= 2.12.0
  • Bump min libvirt version to 2.3.0
  • Bump min meson to 0.56.0
  • Require use of GCC >= 4.8 / CLang > 3.4 / XCode CLang > 5.1
  • Mark USB disks as removable by default
  • Add support for audio device backend config
  • Add support for DBus graphics backend config
  • Add support for controlling firmware feature flags
  • Improve compiler flag handling in meson
  • Extend library version script handling to FreeBSD
  • Fix pointer sign issue in capabilities config API
  • Fix compat with gnome.mkenums() in Meson 0.60.0
  • Avoid compiler warnings from gi-ir-scanner generated code by not setting glib version constraints
  • Be more robust about NULL GError parameters
  • Disable unimportant cast alignment compiler warnings
  • Use ‘pragma once’ in all header files
  • Updated translations

Thanks to everyone who contributed to this new release.

by Daniel Berrange at November 30, 2023 02:59 PM

November 12, 2023

Stefan Hajnoczi

Upcoming talk: "Trust, confidentiality, and hardening: the virtio lessons" at LPC 2023

I will be at Linux Plumbers Conference 2023 to present "Trust, confidentiality, and hardening: the virtio lessons" at 2:30pm on Wednesday, November 15th. Michael Tsirkin and I prepared this talk about the evolution of the trust model of the Linux VIRTIO drivers. It explores how the drivers have been hardened in response to new use cases for VIRTIO, including Linux VDUSE, hardware VIRTIO devices, and Confidential Computing.

Details are available on the LPC schedule. Come watch the talk to find out how drivers work when you can't trust the hypervisor!

by Unknown at November 12, 2023 02:07 AM

October 20, 2023

KVM on Z

New Release: Ubuntu 23.10

Canonical released a new version of their Ubuntu server offering Ubuntu Server 23.10!

See the announcement on the mailing list here, and the blog entry at Canonical with Z-specific highlights here.

by Stefan Raspl at October 20, 2023 02:13 PM

New Release: Ubuntu 23.04

Canonical released a new version of their Ubuntu server offering Ubuntu Server 23.04!

See the announcement on the mailing list here, and the blog entry at Canonical with Z-specific highlights here.

by Stefan Raspl at October 20, 2023 02:11 PM

October 06, 2023

Daniel Berrange

Bye Bye BIOS: a tool for when you need to warn users the VM image is EFI only

The x86 platform has been ever so slowly moving towards a world where EFI is used to boot everything, with legacy BIOS put out to pasture. Virtual machines in general have been somewhat behind the cutting edge in this respect though. This has mostly been due to the virtualization and cloud platforms being somewhat slow in enabling use of EFI at all, let alone making it the default. In a great many cases the platforms still default to using BIOS unless explicitly asked to use EFI. With this in mind most of the mainstream distros tend to provide general purpose disk images built such that they can boot under either BIOS or EFI, thus adapting to whatever environment the user deploys them in.

In recent times there is greater interest in the use of TPM sealing and SecureBoot for protecting guest secrets (eg LUKS passphrases), the introduction of UKIs as the means to extend the SecureBoot signature to close the initrd/cmdline hole, and the advent of confidential virtualization technology. These all combine to increase the likelihood that a virtual machine image will exclusively target EFI, fully discontinuing support for legacy BIOS.

This presents a bit of a usability trapdoor for people deploying images though, as it has been taken for granted that BIOS boot always works. If one takes an EFI only disk image and attempts to boot it via legacy BIOS, the user is likely to get an entirely blank graphical display and/or serial console, with no obvious hint that EFI is required. Even if the requirement for EFI is documented, it is inevitable that users will make mistakes.

Can we do better than this ? Of course we can.

Enter ‘Bye Bye BIOS‘  (

This is a simple command line tool that, when pointed to a disk image, will inject a MBR sector that prints out a message to the user on the primary VGA display and serial port informing them that UEFI is required, then puts the CPUs in a ‘hlt‘ loop.

The usage is as follows, with a guest serial port connected to the local terminal:

$ byebyebios test.img
$ qemu-system-x86_64 \
    -blockdev driver=file,filename=test.img,node-name=img \
    -device virtio-blk,drive=img \
    -m 2000 -serial stdio

STOP: Machine was booted from BIOS or UEFI CSM
 _    _         _   _ ___________ _____   ___
| \  | |       | | | |  ___|  ___|_   _| |__ \
|  \ | | ___   | | | | |__ | |_    | |      ) |
| . `  |/ _ \  | | | |  __||  _|   | |     / /
| |\   | (_) | | |_| | |___| |    _| |_   |_|
\_| \_/ \___/   \___/\____/\_|    \___/   (_)

Installation requires UEFI firmware to boot

Meanwhile the graphical console shows the same:

QEMU showing “No UEFI” message when booted from BIOS

The message shown here is a default, but it can be customized by pointing to an alternative message file

$ echo "Bye Bye BIOS" | figlet -f bubble | unix2dos > msg.txt
$ byebyebios --message msg.txt test.img
$ qemu-system-x86_64 \
    -blockdev driver=file,filename=test.img,node-name=img \
    -device virtio-blk,drive=img \
    -m 2000 -serial stdio

  _   _   _     _   _   _     _   _   _   _
 / \ / \ / \   / \ / \ / \   / \ / \ / \ / \
( B | y | e ) ( B | y | e ) ( B | I | O | S )
 \_/ \_/ \_/   \_/ \_/ \_/   \_/ \_/ \_/ \_/

The code behind this is simplicity itself, just a short piece of x86 asm

$ cat bootstub.S
# SPDX-License-Identifier: MIT-0

.code16
.global bye_bye_bios

bye_bye_bios:
  mov $something_important, %si   # %si points at the message string
  mov $0xe, %ah                   # BIOS int 0x10 "teletype output" function
  mov $0x3f8,%dx                  # COM1 serial port

say_a_little_more:
  lodsb                           # load the next character into %al
  cmp $0, %al
  je this_is_the_end              # NUL terminator -> stop
  int $0x10                       # print to the VGA text console
  outb %al,%dx                    # mirror the character to the serial port
  jmp say_a_little_more

this_is_the_end:
  hlt
  jmp this_is_the_end

# The string message will be appended here at time of install
something_important:

This is compiled with the GNU assembler to create an i486 ELF object file

$ as -march i486 -mx86-used-note=no --32 -o bootstub.o bootstub.S

From this ELF object file we have to extract the raw machine code bytes

$ ld -m elf_i386 --oformat binary -e bye_bye_bios -Ttext 0x7c00 -o bootstub.bin bootstub.o

The byebyebios python tool takes this bootstub.bin, appends the text message and a NUL terminator, pads to fill 446 bytes, then adds a dummy partition table and boot signature to fill the whole 512-byte sector.

With the boot stub binary at 21 bytes in size, this leaves 424 bytes available for the message to display to the user, which is ample for the purpose.
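If you want to sanity-check the result, the last two bytes of the first sector must be the classic 0x55 0xAA boot signature (test.img as in the examples above):

$ dd if=test.img bs=1 skip=510 count=2 status=none | xxd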

In conclusion, if you need to ship an EFI only virtual machine image, do your users a favour and use byebyebios to add a dummy MBR to tell them that the image is EFI only when they inevitably make a mistake and run it under legacy BIOS.


by Daniel Berrange at October 06, 2023 01:53 PM

October 03, 2023

Marcin Juszkiewicz

Testing *BSD on SBSA Reference Platform

The SystemReady specification mentions that a system to be certified needs to be able to boot several operating systems:

In addition, OS installation and boot logs are required:

  • Windows PE boot log, from a GPT partitioned disk, is required.
  • VMware ESXi-Arm installation and boot logs are recommended.
  • Installation and boot logs from two of the Linux distros or BSDs are required.

All logs must be submitted using the ES/SR template.

In choosing the Linux distros or BSDs, maximize the coverage by diversifying the heritage. For example, the following shows the grouping of the heritage:

  • RHEL/Fedora/CentOS/AlmaLinux/Rocky Linux/Oracle Linux/Anolis OS
  • SLES/openSUSE
  • Ubuntu/Debian
  • CBL-Mariner
  • NetBSD/OpenBSD/FreeBSD

So during the last week I went through the *BSD ones.
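(For context, all of the boots below use the sbsa-ref machine. The invocation is roughly the sketch that follows; the firmware file names, the memory/CPU choices and the way the disk is wired up are my assumptions for a typical setup, not anything mandated by the tests.)

$ qemu-system-aarch64 -machine sbsa-ref -cpu neoverse-n1 -m 4G -smp 2 \
    -drive if=pflash,format=raw,file=SBSA_FLASH0.fd \
    -drive if=pflash,format=raw,file=SBSA_FLASH1.fd \
    -drive file=disk.img,format=raw \
    -serial stdio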


OpenBSD

I started with the “download OpenBSD” page and found out that there is no installation ISO for the aarch64 architecture. Not good.

So I fetched miniroot73.img disk image instead and went on with booting:

>> OpenBSD/arm64 BOOTAA64 1.16
cannot open sd0a:/etc/random.seed: No such file or directory
booting sd0a:/bsd: 2798224+1058776+12709688+630920 [229059+91+651336+254968]=0x1
Copyright (c) 1982, 1986, 1989, 1991, 1993
        The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2023 OpenBSD. All rights reserved.

OpenBSD 7.3 (RAMDISK) #1941: Sat Mar 25 14:42:22 MDT 2023
real mem  = 4287451136 (4088MB)
avail mem = 4073807872 (3885MB)
random: boothowto does not indicate good seed
mainbus0 at root: ACPI
psci0 at mainbus0: PSCI 1.1, SMCCC 1.2
cpu0 at mainbus0 mpidr 0: ARM Cortex-A57 r1p0
cpu0: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
cpu0: 2048KB 64b/line 16-way L2 cache
efi0 at mainbus0: UEFI 2.7
efi0: EFI Development Kit II / SbsaQemu rev 0x10000
smbios0 at efi0: SMBIOS 3.4.0
smbios0: vendor EFI Development Kit II / SbsaQemu version "1.0" date 09/15/2023
smbios0: QEMU QEMU SBSA-REF Machine
agintc0 at mainbus0 shift 4:3 nirq 256 nredist 2: "interrupt-controller"
agtimer0 at mainbus0: 62500 kHz
acpi0 at mainbus0: ACPI 6.0
acpimcfg0 at acpi0
acpimcfg0: addr 0xf0000000, bus 0-255
pluart0 at acpi0 COM0 addr 0x60000000/0x1000 irq 33
pluart0: console
ahci0 at acpi0 AHC0 addr 0x60100000/0x10000 irq 42: AHCI 1.0
ahci0: port 0: 1.5Gb/s
ahci0: port 1: 1.5Gb/s
scsibus0 at ahci0: 32 targets
sd0 at scsibus0 targ 0 lun 0: <ATA, QEMU HARDDISK, 2.5+> t10.ATA_QEMU_HARDDISK_QM00001_
sd0: 43MB, 512 bytes/sector, 88064 sectors, thin
sd1 at scsibus0 targ 1 lun 0: <ATA, QEMU HARDDISK, 2.5+> t10.ATA_QEMU_HARDDISK_QM00003_
sd1: 504MB, 512 bytes/sector, 1032192 sectors, thin
ehci0 at acpi0 USB0 addr 0x60110000/0x10000 irq 43panic: uvm_fault failed: ffffff800034c3e8 esr 96000050 far ffffff8066ef5048

The operating system has halted.
Please press any key to reboot.

As you can see, it hung on an attempt to initialize the USB controller. Which shows that our move from EHCI to XHCI was not properly tested ;(

The problem was that our virtual hardware (QEMU) had an XHCI (USB 3) controller on the non-discoverable platform bus, but the firmware (EDK2) claimed it was an EHCI (USB 2) one.

This got solved with Yuquan Wang’s patch making EDK2 initialize and describe the XHCI USB controller (the change is already merged upstream). After rebuilding EDK2, OpenBSD booted fine right to the installation prompt (previous messages skipped):

xhci0 at acpi0 USB0 addr 0x60110000/0x10000 irq 43, xHCI 0.0
usb0 at xhci0: USB revision 3.0
uhub0 at usb0 configuration 1 interface 0 "Generic xHCI root hub" rev 3.00/1.00 addr 1
acpipci0 at acpi0 PCI0
pci0 at acpipci0
0:1:0: rom address conflict 0xfffc0000/0x40000
0:2:0: rom address conflict 0xffff8000/0x8000
"Red Hat Host" rev 0x00 at pci0 dev 0 function 0 not configured
em0 at pci0 dev 1 function 0 "Intel 82574L" rev 0x00: msi, address 52:54:00:12:34:56
"Bochs VGA" rev 0x02 at pci0 dev 2 function 0 not configured
"ACPI0007" at acpi0 not configured
"ACPI0007" at acpi0 not configured
simplefb0 at mainbus0: 1280x800, 32bpp
wsdisplay0 at simplefb0 mux 1
wsdisplay0: screen 0 added (std, vt100 emulation)
uhidev0 at uhub0 port 1 configuration 1 interface 0 "QEMU QEMU USB Keyboard" rev 2.00/0.00 addr 2
uhidev0: iclass 3/1
ukbd0 at uhidev0
wskbd0 at ukbd0 mux 1
wskbd0: connecting to wsdisplay0
uhidev1 at uhub0 port 2 configuration 1 interface 0 "QEMU QEMU USB Tablet" rev 2.00/0.00 addr 3
uhidev1: iclass 3/0
uhid at uhidev1 not configured
softraid0 at root
scsibus1 at softraid0: 256 targets
root on rd0a swap on rd0b dump on rd0b
erase ^?, werase ^W, kill ^U, intr ^C, status ^T

Welcome to the OpenBSD/arm64 7.3 installation program.
(I)nstall, (U)pgrade, (A)utoinstall or (S)hell?

After this I added booting OpenBSD to QEMU tests for SBSA Reference Platform to make sure that we have something non-Linux based there.


FreeBSD

The next one was FreeBSD. And here the situation started to get weird…

First I took the 13.2 release, used firmware with the XHCI information, and was greeted with:

Copyright (c) 1992-2021 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 13.2-RELEASE releng/13.2-n254617-525ecfdad597 GENERIC arm64
FreeBSD clang version 14.0.5 ( llvmorg-14.0.5-0-gc12386ae247c)
VT(efifb): resolution 1280x800
module firmware already present!
real memory  = 4294967296 (4096 MB)
avail memory = 4160204800 (3967 MB)
Starting CPU 1 (1)
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
arc4random: WARNING: initial seeding bypassed the cryptographic random device because it was not yet seeded and the knob 'bypass_before_seeding' was enabled.
random: entropy device external interface
MAP 100fbdf0000 mode 2 pages 128
MAP 100fbe70000 mode 2 pages 160
MAP 100fbf10000 mode 2 pages 80
MAP 100fbfb0000 mode 2 pages 80
MAP 100ff500000 mode 2 pages 400
MAP 100ff690000 mode 2 pages 592
MAP 10000000 mode 0 pages 1728
MAP 60010000 mode 0 pages 1
kbd0 at kbdmux0
acpi0: Power Button (fixed)
acpi0: Sleep Button (fixed)
acpi0: Could not update all GPEs: AE_NOT_CONFIGURED
psci0: <ARM Power State Co-ordination Interface Driver> on acpi0
gic0: <ARM Generic Interrupt Controller v3.0> iomem 0x40060000-0x4007ffff,0x40080000-0x4407ffff on acpi0
its0: <ARM GIC Interrupt Translation Service> mem 0x44081000-0x440a0fff on gic0
generic_timer0: <ARM Generic Timer> irq 3,4,5 on acpi0
Timecounter "ARM MPCore Timecounter" frequency 62500000 Hz quality 1000
Event timer "ARM MPCore Eventtimer" frequency 62500000 Hz quality 1000
efirtc0: <EFI Realtime Clock>
efirtc0: registered as a time-of-day clock, resolution 1.000000s
uart0: <PrimeCell UART (PL011)> iomem 0x60000000-0x60000fff irq 0 on acpi0
uart0: console (115200,n,8,1)
ahci0: <AHCI SATA controller> iomem 0x60100000-0x6010ffff irq 1 on acpi0
ahci0: AHCI v1.00 with 6 1.5Gbps ports, Port Multiplier not supported
ahcich0: <AHCI channel> at channel 0 on ahci0
ahcich1: <AHCI channel> at channel 1 on ahci0
ahcich2: <AHCI channel> at channel 2 on ahci0
ahcich3: <AHCI channel> at channel 3 on ahci0
ahcich4: <AHCI channel> at channel 4 on ahci0
ahcich5: <AHCI channel> at channel 5 on ahci0
xhci0: <Generic USB 3.0 controller> iomem 0x60110000-0x6011ffff irq 2 on acpi0
xhci0: 32 bytes context size, 32-bit DMA

And it hung there…

Let's check newer FreeBSD

I contacted people on the #freebsd IRC channel and Mina Galić (meena on IRC) asked me to boot FreeBSD 14 or 15 images. So I tried both:

xhci0: <Generic USB 3.0 controller> iomem 0x60110000-0x6011ffff irq 2 on acpi0
xhci0: 32 bytes context size, 64-bit DMA
usbus0 on xhci0

The system booted further. Note the “64-bit DMA” information instead of the “32-bit DMA” from the 13.2 release. I reported bug 274237 for it. On the same day the required change was identified and marked for potential backport.

AHCI issue

But that was not the only problem. It turned out that none of the AHCI devices were found… So there was no way to run an installer:

ahci0: <AHCI SATA controller> iomem 0x60100000-0x6010ffff irq 1 on acpi0
ahci0: AHCI v1.00 with 6 1.5Gbps ports, Port Multiplier not supported
ahcich0: <AHCI channel> at channel 0 on ahci0
ahcich1: <AHCI channel> at channel 1 on ahci0
ahcich2: <AHCI channel> at channel 2 on ahci0
ahcich3: <AHCI channel> at channel 3 on ahci0
ahcich4: <AHCI channel> at channel 4 on ahci0
ahcich5: <AHCI channel> at channel 5 on ahci0
Release APs...done
Trying to mount root from cd9660:/dev/iso9660/13_2_RELEASE_AARCH64_BO [ro]...
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
ahcich0: Poll timeout on slot 1 port 0
ahcich0: is 00000000 cs 00000002 ss 00000000 rs 00000002 tfd 170 serr 00000000 cmd 0000c017
(aprobe0:ahcich0:0:0:0): SOFT_RESET. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
(aprobe0:ahcich0:0:0:0): CAM status: Command timeout
(aprobe0:ahcich0:0:0:0): Error 5, Retries exhausted

I checked QEMU 7.2 (from the Fedora package) and it booted fine. 8.0.5 failed, 8.0.0 booted. Hm… I started a “git bisect” to find out which change broke it. After several rebuilds I found the commit to blame:

commit 7bcd32128b227cee1fb39ff242d486ed9fff7648
Author: Niklas Cassel <>
Date:   Fri Jun 9 16:08:40 2023 +0200

    hw/ide/ahci: simplify and document PxCI handling

    The AHCI spec states that:
    For NCQ, PxCI is cleared on command queued successfully.
Is it AArch64 only or not?

The next step: checking whether it is a global problem or an aarch64-only one.

I built the x86-64 emulation target and checked the Q35 machine (which also uses AHCI). And FreeBSD failed in exactly the same way. This made bug reporting a lot easier, as several architectures and more users were affected.

I mailed the author and the QEMU developers about it, described the problem, gave the exact command line arguments for QEMU, etc. Niklas Cassel replied:

I will have a look at this.

So it will be done.


NetBSD

Here the situation was a bit similar to the FreeBSD one.

I fetched the NetBSD 9.3 image and booted it, just to see it hang (printk.time removed from the output):

Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
    2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017,
    2018, 2019, 2020, 2021, 2022
    The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California.  All rights reserved.

NetBSD 9.3 (GENERIC64) #0: Thu Aug  4 15:30:37 UTC 2022
total memory = 4075 MB
avail memory = 3929 MB
running cgd selftest aes-xts-256 aes-xts-512 done
armfdt0 (root)
simplebus0 at armfdt0: QEMU QEMU SBSA-REF Machine
simplebus1 at simplebus0
acpifdt0 at simplebus0
acpifdt0: using EFI runtime services for RTC
ACPI: RSDP 0x00000100FC020018 000024 (v02 LINARO)
ACPI: XSDT 0x00000100FC02FE98 00006C (v01 LINARO SBSAQEMU 20200810 LNRO 00000001)
ACPI: FACP 0x00000100FC02FB98 000114 (v06 LINARO SBSAQEMU 20200810 LNRO 00000001)
ACPI: DSDT 0x00000100FC02E998 000CD8 (v02 LINARO SBSAQEMU 20200810 INTL 20220331)
ACPI: DBG2 0x00000100FC02FA98 00005C (v00 LINARO SBSAQEMU 20200810 LNRO 00000001)
ACPI: MCFG 0x00000100FC02FE18 00003C (v01 LINARO SBSAQEMU 20200810 LNRO 00000001)
ACPI: SPCR 0x00000100FC02FF98 000050 (v02 LINARO SBSAQEMU 20200810 LNRO 00000001)
ACPI: IORT 0x00000100FC027518 0000DC (v00 LINARO SBSAQEMU 20200810 LNRO 00000001)
ACPI: APIC 0x00000100FC02E498 000108 (v04 LINARO SBSAQEMU 20200810 LNRO 00000001)
ACPI: SSDT 0x00000100FC02E898 000067 (v02 LINARO SBSAQEMU 20200810 LNRO 00000001)
ACPI: PPTT 0x00000100FC02FD18 0000B8 (v02 LINARO SBSAQEMU 20200810 LNRO 00000001)
ACPI: GTDT 0x00000100FC02E618 000084 (v03 LINARO SBSAQEMU 20200810 LNRO 00000001)
ACPI: 2 ACPI AML tables successfully acquired and loaded
acpi0 at acpifdt0: Intel ACPICA 20190405
cpu0 at acpi0: unknown CPU (ID = 0x411fd402)
cpu0: package 0, core 0, smt 0
cpu0: IC enabled, DC enabled, EL0/EL1 stack Alignment check enabled
cpu0: Cache Writeback Granule 16B, Exclusives Reservation Granule 16B
cpu0: Dcache line 64, Icache line 64
cpu0: L1 0KB/64B 4-way PIPT Instruction cache
cpu0: L1 0KB/64B 4-way PIPT Data cache
cpu0: L2 0KB/64B 8-way PIPT Unified cache
cpu0: revID=0x0, 4k table, 16k table, 64k table, 16bit ASID
cpu0: auxID=0x1011111110212120, GICv3, CRC32, SHA1, AES+PMULL, rounding, NaN propagation, denormals, 32x64bitRegs, Fused Multiply-Add
cpu1 at acpi0: unknown CPU (ID = 0x411fd402)
cpu1: package 0, core 1, smt 0
gicvthree0 at acpi0: GICv3
gicvthree0: ITS #0 at 0x44081000
gicvthree0: ITS [#0] Devices table @ 0x10009210000/0x80000, Cacheable WA WB, Inner shareable
gicvthree0: ITS [#1] Collections table @ 0x10009290000/0x10000, Cacheable WA WB, Inner shareable

As the 9.3 release is quite old, I tested NetBSD 10 Beta:

gicvthree0 at acpi0: GICv3
gicvthree0: ITS #0 at 0x44081000
gicvthree0: ITS [#0] Devices table @ 0x10008a60000/0x80000, Cacheable WA WB, Inner shareable
gicvthree0: ITS [#1] Collections table @ 0x10008ae0000/0x10000, Cacheable WA WB, Inner shareable
gtmr0 at acpi0: irq 27
armgtmr0 at gtmr0: Generic Timer (62500 kHz, virtual)
plcom0 at acpi0 (COM0, ARMH0011-0): mem 0x60000000-0x60000fff irq 33
plcom0: console
NetBSD-10.0_BETA Install System

I went to the #netbsd channel on IRC and started a discussion. Michael van Elst (mlelstv on IRC) gave me a helping hand and debugged the problem. It looks like the kernel went into an infinite loop parsing the GTDT table from ACPI. Newer branches of NetBSD have an additional check there.

Filed bug 57642 for it. And, like in the FreeBSD case, it looks like a backport to the stable branch is needed.


Testing platforms for SystemReady compliance needs to include *BSD systems. Linux and NetBSD were fine with our USB controller mess: they printed a “something is wrong” message and went on. FreeBSD and OpenBSD complained and stopped the boot process.

We also need to do more testing before merging big changes in the future. This USB controller mess could have been avoided or handled better.

by Marcin Juszkiewicz at October 03, 2023 06:46 PM

September 15, 2023

Marcin Juszkiewicz

SBSA Reference Platform update

There were several changes done since my previous post on the topic. So after some discussions I decided to write a post about it.

There are improvements, fixes and even issues with BSA specification.

Versioning related changes

The SBSA Reference Platform (“sbsa-ref” for short) is now at version 0.3. Note that this is an internal number; the machine name is still the same.

The first bump was adding GIC data into the (minimalistic) device tree so the firmware can configure it without using any magic numbers (as it did before).

The second update added GIC ITS (Interrupt Translation Service) support, which means that we can have MSI-X interrupts and a complex PCI Express setup.

The third time, we said goodbye to the USB 2.0 (EHCI) host controller. It never worked and only generated kernel warnings. An XHCI (USB 3) controller is used instead now. EDK2 enablement is still work in progress.

Firmware improvements

Most of the versioning updates involved firmware changes. Information about hardware details gets passed from the virtual hardware level to the operating system in standards-defined ways:

  • Trusted Firmware (TF-A) gets minimalistic Device-Tree from QEMU
  • UEFI (EDK2) uses Secure Monitor Calls to get information from TF-A
  • operating system uses ACPI tables

This way we were able to get rid of some of the “magic numbers” in the firmware components.

CPU updates

We can use the Neoverse V1 CPU core now. It uses the Armv8.4 architecture and brings SVE and a bunch of other interesting features. You may need to update Trusted Firmware to make use of it.
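Selecting it is just a matter of the -cpu option; a sketch (firmware and disk arguments, as usual for sbsa-ref, are omitted):

$ qemu-system-aarch64 -machine sbsa-ref -cpu neoverse-v1 [ ... ]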

QEMU got Arm Cortex-A710 CPU core support. It is the first Armv9.0 core there. Due to its 40-bit address space we cannot use it for sbsa-ref, but it prepares the code for the Neoverse N2/V2 cores.

PCI Express changes and disputes

SBSA Reference Platform passes most of BSA ACS tests from PCI Express module:

      *** Starting PCIe tests ***

Operating System View:
 801 : Check ECAM Presence                        : Result:  PASS
 802 : PE - ECAM Region accessibility check       : Result:  PASS
 803 : All EP/Sw under RP in same ECAM Region     : Result:  PASS
 804 : Check RootPort NP Memory Access            : Result:  PASS
 805 : Check RootPort P Memory Access             : Result:  PASS
 806 : Legacy int must be SPI & lvl-sensitive
       Checkpoint --  2                           : Result:  SKIPPED
 808 : Check all 1's for out of range             : Result:  PASS
 809 : Vendor specfic data are PCIe compliant     : Result:  PASS
 811 : Check RP Byte Enable Rules                 : Result:  PASS
 817 : Check Direct Transl P2P Support
       Checkpoint --  1                           : Result:  SKIPPED
 818 : Check RP Adv Error Report
       Checkpoint --  1                           : Result:  SKIPPED
 819 : RP must suprt ACS if P2P Txn are allow
       Checkpoint --  1                           : Result:  SKIPPED
 820 : Type 0/1 common config rule                : Result:  PASS
 821 : Type 0 config header rules                 : Result:  PASS
 822 : Check Type 1 config header rules
       BDF 0x400 : SLT attribute mismatch: 0xFF020100 instead of 0x20100
       BDF 0x500 : SLT attribute mismatch: 0xFF030300 instead of 0x30300
       BDF 0x600 : SLT attribute mismatch: 0xFF040400 instead of 0x40400
       BDF 0x700 : SLT attribute mismatch: 0xFF050500 instead of 0x50500
       BDF 0x800 : SLT attribute mismatch: 0xFF060600 instead of 0x60600
       BDF 0x900 : SLT attribute mismatch: 0xFF080700 instead of 0x80700
       BDF 0x10000 : SLT attribute mismatch: 0xFF020201 instead of 0x20201
       Failed on PE -    0
       Checkpoint --  7                           : Result:  FAIL
 824 : Device capabilities reg rule               : Result:  PASS
 825 : Device Control register rule               : Result:  PASS
 826 : Device cap 2 register rules                : Result:  PASS
 830 : Check Cmd Reg memory space enable
       BDF 400 MSE functionality failure
       Failed on PE -    0
       Checkpoint --  1                           : Result:  FAIL
 831 : Check Type0/1 BIST Register rule           : Result:  PASS
 832 : Check HDR CapPtr Register rule             : Result:  PASS
 833 : Check Max payload size supported           : Result:  PASS
 835 : Check Function level reset                 : Result:  PASS
 836 : Check ARI forwarding enable rule           : Result:  PASS
 837 : Check Config Txn for RP in HB              : Result:  PASS
 838 : Check all RP in HB is in same ECAM         : Result:  PASS
 839 : Check MSI support for PCIe dev             : Result:  PASS
 840 : PCIe RC,PE - Same Inr Shareable Domain     : Result:  PASS
 841 : NP type-1 PCIe supp 32-bit only
       NP type-1 pcie is not 32-bit mem type
       Failed on PE -    0
       Checkpoint --  1                           : Result:  FAIL
 842 : PASID support atleast 16 bits
       Checkpoint --  3                           : Result:  SKIPPED

      One or more PCIe tests failed or were skipped.

     Total Tests run  =   30  Tests Passed  =   22  Tests Failed =    3

As you can see, some of them still require work.

Root ports SLT issue

I reported the problem with test 822 to the QEMU developers and it turned out to be a bug there. I got a patch from Michael S. Tsirkin (one of the QEMU PCI maintainers) and it made the test pass. I hope it will be merged soon.

PCIe to PCI bridge issue

I wonder how many SBSA physical platforms will use one of those. Probably none, but my testing setup has one.

And it makes test 841 fail. This time the problem requires more discussion, because the BSA specification says (chapter E.2 PCI Express Memory Space):

When PCI Express memory space is mapped as normal memory, the system must support unaligned accesses to that region. PCI Type 1 headers, used in PCI-to-PCI bridges, and therefore in root ports and switches, have to be programmed with the address space resources claimed by the given bridge. For non-prefetchable (NP) memory, Type 1 headers only support 32-bit addresses. This implies that endpoints on the other end of a PCI-to-PCI bridge only support 32-bit NP BARs.

On the other side we have PCI Express Base Specification Revision 6.0 which, in chapter, says that BAR can be either 32 or 64-bit long:

Base Address registers that map into Memory Space can be 32 bits or 64 bits wide (to support mapping into a 64-bit address space) with bit 0 hardwired to 0b. For Memory Base Address registers, bits 2 and 1 have an encoded meaning as shown in Table 7-9. Bit 3 should be set to 1b if the data is prefetchable and set to 0b otherwise. A Function is permitted to mark a range as prefetchable if there are no side effects on reads, the Function returns all bytes on reads regardless of the byte enables, and host bridges can merge processor writes into this range 150 without causing errors. Bits 3-0 are read-only.

Table 7-9 Memory Base Address Register Bits 2:1 Encoding

Bits 2:1(b) Meaning
00 Base register is 32 bits wide and can be mapped anywhere in the 32 address bit Memory Space.
01 Reserved
10 Base register is 64 bits wide and can be mapped anywhere in the 64 address bit Memory Space.
11 Reserved

And pcie-pci-bridge device in QEMU uses 64-bit BAR.

I opened support ticket for it at Arm. Will see how it ends.

Non-secure EL2 virtual timer

The Armv8.1 architecture brought the Virtualization Host Extensions (VHE for short), and with it one more timer: the non-secure EL2 virtual timer.

BSA ACS checks for it and we were failing:

 226 : Check NS EL2-Virt timer PPI Assignment         START

       NS EL2 Virtual timer interrupt 28 not received
       Failed on PE -    0
       Checkpoint --  4                           : Result:  FAIL

It turned out that everything needed to make it pass was already present in QEMU, except the code to enable it for our platform. Two lines of code were enough.

After I sent my small patch, Leif Lindholm extracted the timer definitions into a separate include file and cleaned up the code around it to make it easier to compare the QEMU code with the BSA specification.

Result? Test passes:

 226 : Check NS EL2-Virt timer PPI Assignment         START

       Received vir el2 interrupt
                                       : Result:  PASS


The SBSA Reference Platform in QEMU gets better and better with time. We can emulate more complex systems, and information about hardware details gets passed from the virtual hardware level to the operating system in standards-defined ways.

We still have test failures, but fewer than in the past.

by Marcin Juszkiewicz at September 15, 2023 01:30 PM

September 09, 2023

Stefan Hajnoczi

How nostr could enable peer-to-peer apps

I hacked up a prototype multi-player game in just static HTML/JS files. The game runs in players' browsers without the need for a centralized game server. This peer-to-peer model - getting rid of the server - is something I've been interested in for a long time. I finally discovered a way to make it work without hosting my own server or relying on a hosted service that requires API keys, accounts, or payments. That missing piece came in the form of nostr, a decentralized network protocol that I'll get into later.

Recently p2panda and Veilid were released. They are decentralized application frameworks. Neither has the exact properties I like, but that spurred me to work on a prototype game that shows the direction that I find promising for decentralized applications.

Distributed application models

Most distributed applications today are built on a centralized client-server model. Applications are not a single program, but two programs. A client application on the user's device communicates with a server application on the application owner's machines. The way it works is pretty simple: the server holds the data and the client sends commands to operate on the data.

The centralized client-server model is kind of a drag because you need to develop two separate programs and maintain a server so that the application remains online at all times. Non-technical users can't really host the application themselves. It costs money to run the server. If the application owner decides to pull the plug on the server then users cannot use the application anymore. Bad practices of locking in, hoarding, and selling user data as well as monitoring and manipulating user behavior are commonplace because the server controls access to user data.

Peer-to-peer applications solve many of these issues. The advantages are roughly:

  • Eliminating the cost, effort, and skill required to maintain servers.
  • Improving user privacy by not involving a server.
  • Operating without constant internet connectivity.
  • Enabling users to run the application even after the developer has stopped supporting it.
  • Reducing application complexity by replacing client/server with a single program.

How to make a peer-to-peer application

This needs to work for web, mobile, and desktop applications because people switch between these three environments all the time. It would be impractical if the solution does not support all environments. The web is the most restrictive environment, mostly for security reasons. Many technologies are not available on the web, including networking APIs that desktop peer-to-peer applications tend to rely on. But if a solution works on the web, then mobile and desktop applications are likely to be able to use the same technology and interoperate with web applications.

Luckily the web environment has one technology that can be used to build peer-to-peer applications: WebRTC. Implementations are available for mobile and desktop environments as well. WebRTC's DataChannels can be thought of as network connections that transfer messages between two devices. They are the primitive for communicating in a peer-to-peer application in place of the HTTPS, TCP, or UDP connections that most existing applications use today.

Unfortunately WebRTC is not fully peer-to-peer because it relies on a "signaling server" for connection establishment. The signaling server exchanges connectivity information so that a peer-to-peer connection can be negotiated. This negotiation process does not always succeed, by the way, so in some cases it is not possible to create a peer-to-peer connection. I have no solution for that without hosting servers.

The crux of using WebRTC is that a signaling server is needed, but we don't want to host one for each application. Over the years I've investigated existing peer-to-peer networks like Tor and WebTorrent to see if they could act as the signaling server. I didn't find one that is usable from the web environment (it's too restricted) until now.

It turns out that nostr, originally designed for social network applications but now being used for a bunch of different applications, is web-friendly and could act as a WebRTC signaling server quite easily. In my prototype I abused the encrypted direct message (NIP-04) feature for WebRTC signaling. It works but has the downside that the nostr relay wastes storage because there is no need to preserve the messages. That can be fixed by assigning an "ephemeral kind" so the relay knows it can discard messages after delivery.

(Another option is to build a free public WebRTC signaling service. Its design would be remarkably close to the nostr protocol, so I decided not to reinvent the wheel. If anyone wants to create a public service, let me know and I can share ideas and research.)

Once connectivity has been established via WebRTC, it's up to the application to decide how to communicate. It could be a custom protocol like the JSON messages that my prototype uses, it could be the nostr protocol, it could be HTTP, or literally anything.

The user experience

Here is how my game prototype works:

  1. Player A opens the game web page (just static files hosted on GitLab Pages) and clicks "Host" game.
  2. Player A shares the game link with player B.
  3. Player B opens the game link and uses nostr to exchange WebRTC signaling messages encrypted with the other player's public key.
  4. A WebRTC DataChannel is negotiated and nostr is no longer used once the peer-to-peer connection is established.
  5. The game proceeds with player A and B exchanging game messages over the DataChannel.

In order to connect apps, a user must share a public key with the other user. The public key allows the other user to connect. In my prototype the player hosting the game gets a URL that can be shared with the other player. When the other player visits the URL they will join the game because the public key is embedded in the URL. The role of the public key is similar to the idea behind INET256's "stable addresses derived from public keys".

When devices go offline it is no longer possible to connect to them. This is not a problem for short-lived use cases like playing a game of chess or synchronizing the state of an RSS reader application between a phone and a laptop. For long-lived use cases like a discussion forum or a team chat there are two options: a fully peer-to-peer replicated and eventually consistent data model or a traditional centralized server hosted on a supernode. Both of these options are possible.

Try it out

You can try out my prototype in your web browser. It's a 2-player tic-tac-toe game: If the game does not start, try it again (sorry, I hacked it up in a weekend and it's not perfect).

If you want to discuss or share other peer-to-peer application approaches, see my contact details here.

by Unknown at September 09, 2023 01:02 AM

August 22, 2023

QEMU project

QEMU version 8.1.0 released

We’d like to announce the availability of the QEMU 8.1.0 release. This release contains 2900+ commits from 250 authors.

You can grab the tarball from our download page. The full list of changes are available in the changelog.

Highlights include:

  • VFIO: improved live migration support, no longer an experimental feature
  • GTK GUI now supports multi-touch events
  • ARM, PowerPC, and RISC-V can now use AES acceleration on host processor
  • PCIe: new QMP commands to inject CXL General Media events, DRAM events and Memory Module events
  • ARM: KVM VMs on a host which supports MTE (the Memory Tagging Extension) can now use MTE in the guest
  • ARM: emulation support for bpim2u (Banana Pi BPI-M2 Ultra) board and neoverse-v1 (Cortex Neoverse-V1) CPU
  • ARM: new architectural feature support for: FEAT_PAN3 (SCTLR_ELx.EPAN), FEAT_LSE2 (Large System Extensions v2), and experimental support for FEAT_RME (Realm Management Extensions)
  • Hexagon: new instruction support for v68/v73 scalar, and v68/v69 HVX
  • Hexagon: gdbstub support for HVX
  • MIPS: emulation support for Ingenic XBurstR1/XBurstR2 CPUs, and MXU instructions
  • PowerPC: TCG SMT support, allowing pseries and powernv to run with up to 8 threads per core
  • PowerPC: emulation support for Power9 DD2.2 CPU model, and perf sampling support for POWER CPUs
  • RISC-V: ISA extension support for BF16/Zfa, and disassembly support for Zcm/Zinx/XVentanaCondOps/Xthead
  • RISC-V: CPU emulation support for Veyron V1
  • RISC-V: numerous KVM/emulation fixes and enhancements
  • s390: instruction emulation fixes for LDER, LCBB, LOCFHR, MXDB, MXDBR, EPSW, MDEB, MDEBR, MVCRL, LRA, CKSM, CLM, ICM, MC, STIDP, EXECUTE, and CLGEBR(A)
  • SPARC: updated target/sparc to use tcg_gen_lookup_and_goto_ptr() for improved performance
  • Tricore: emulation support for TC37x CPU that supports ISA v1.6.2 instructions
  • Tricore: instruction emulation of POPCNT.W, LHA, CRC32L.W, CRC32.B, SHUFFLE, SYSCALL, and DISABLE
  • x86: CPU model support for GraniteRapids
  • and lots more…

Thank you to everybody who contributed to this release, whether that was by writing code, reporting bugs, improving documentation, testing, or providing the project with CI resources. We couldn’t do these without you!

August 22, 2023 04:42 PM

August 10, 2023

KVM on Z

2023 IBM TechXchange EMEA Client Workshop for Linux on IBM Z and LinuxONE

Interested in the latest news on Linux on IBM Z & LinuxONE? Come and meet us at the 2023 IBM TechXchange EMEA Client Workshop for Linux on IBM Z and LinuxONE on September 19-20 in Ehningen, Germany!

Register here.

by Stefan Raspl at August 10, 2023 01:24 PM

July 10, 2023

KVM on Z

KVM in Linux Distributions in 2Q 2023

The second quarter of 2023 was quite productive in terms of new Linux distribution releases and the KVM-related features shipped in them. Here they are, in chronological order.

Ubuntu 23.04 

The most recent Ubuntu release contains the following new KVM-related functionality:
  • Interpretive vfio-pci support for ISM: Allows pass-through of ISM devices to KVM guests, enabling high-bandwidth and low-latency network communications using SMC-D.
  • Encrypted dump for Secure Execution: Enhances problem determination capabilities while not compromising the security of secure KVM guests.
  • Bus id for subchannels: Allows you to identify passthrough CCW devices by their device bus id in the host without going through hoops.
  • Driverctl now lists persisted overrides: Makes it easier to identify and manage passthrough devices.

RHEL 8.8 and 9.2

While being distinct releases, both share the same set of new functionality, which is:
  • Interpretive vfio-pci support for ISM: Allows pass-through of ISM devices to KVM guests, enabling high-bandwidth and low-latency network communication using SMC-D.
  • Encrypted dump for Secure Execution: Enhances problem determination capabilities while not compromising the security of secure KVM guests.
  • Bus id for subchannels: Allows you to identify passthrough CCW devices by their device bus id in the host without going through hoops.
  • Dynamic configuration updates  for vfio-ap: Allows you to hot plug and unplug Crypto domains of a Crypto passthrough configuration for running KVM guests.


SLES 15 SP5

Being a large service pack, there are numerous new features delivered:
  • Interpretive vfio-pci support for ISM: Allows pass-through of ISM devices to KVM guests, enabling high-bandwidth and low-latency network communication using SMC-D.
  • Encrypted dump for Secure Execution: Enhances problem determination capabilities while not compromising the security of secure KVM guests.
  • Bus id for subchannels: Allows you to identify passthrough CCW devices by their device bus id in the host without going through hoops.
  • Driverctl now lists persisted overrides: Makes it easier to identify and manage passthrough devices.
  • Persistent configuration for vfio-ap: The s390-tools command zdev can now be used to persist Crypto passthrough configurations.
  • Dynamic configuration updates  for vfio-ap: Allows you to hot plug and unplug Crypto domains of a Crypto passthrough configuration for running KVM guests.
  • Remote attestation for Secure Execution: Provides cryptographic evidence of workload authenticity and integrity, which facilitates integration into common Confidential Computing frameworks.
  • Support of long kernel command lines of up to 64 KB length, which, for example, allows you to specify plenty of I/O devices.
...and other minor improvements

by Hendrik Brueckner at July 10, 2023 02:49 PM

June 24, 2023

Thomas Huth

New KVM features in RHEL 9.2 and 8.8 on IBM Z

A couple of weeks ago, Red Hat Enterprise Linux 9.2 and Red Hat Enterprise Linux 8.8 were released – time to look at the new features here with regard to KVM virtualization on IBM Z systems.

Rebased versions in RHEL 9.2

The KVM code in the 5.14-based kernel of RHEL 9.2 has been refreshed to the state of the upstream 6.0 kernel.

Additionally, many packages from the virtualization stack have been rebased in RHEL 9.2. The following versions are now available:

  • QEMU 7.2.0 (updated from 7.0.0 in RHEL 9.1)
  • libvirt 9.0.0 (updated from 8.5.0 in RHEL 9.1)
  • virt-install 4.1.0 (updated from 4.0.0 in RHEL 9.1)
  • libguestfs 1.48.4
  • libslirp 4.4.0

Speaking of libslirp, a new alternative to the “slirp” user mode networking called passt has been added in RHEL 9.2 for the first time and can be used by installing the “passt” package and adjusting the XML definition of your guest accordingly. “passt” should provide better performance than “slirp” and was designed with security in mind.

Besides the generic new features that are available thanks to the rebased packages in RHEL 9.2, there are also some cool new IBM Z-specific features which have been explicitly backported to the RHEL 9.2 and 8.8 code base:

Secure execution guest dump encryption with customer keys

When running secure guests it is of course normally not possible to dump the guest’s memory from the host (e.g. with virsh dump --memory-only) since the memory pages of the guest are not available to the host system anymore.

However, in some cases (e.g. when debugging a misbehaving or crashing kernel in the guest), the owner of the guest VM still might want to get a dump of the guest memory – just without providing it in clear text to the administrator of the host system. With RHEL 9.2 and 8.8, this is now possible on the new z16 mainframe. Please see the related documentation from IBM to learn how to set up such a dump.

Crypto passthrough hotplug

vfio-ap crypto adapters can now be hotplugged to guests during runtime, too, which brings you more flexibility, without the need to shutdown your guests to change their configurations.

Enhanced interpretation for PCI functions

The kernel code in RHEL 9.2 and 8.8 can now enable a new firmware/hardware feature of the recent IBM Z machines that can speed up the performance of passthrough PCI devices (more events can be handled within the guest, without intervention of the KVM hypervisor). Additionally, this now also allows passing ISM PCI devices through to KVM guests (which was not possible before).

June 24, 2023 10:45 AM

May 23, 2023

Marcin Juszkiewicz

Versioning of sbsa-ref machine

QEMU has emulation of several machines. One of them is “sbsa-ref”, which stands for SBSA Reference Platform: an Arm server, in simpler words.

In the past I worked on it when my help was needed. We have CI jobs which run some tests (SBSA ACS, BSA ACS) and do some checks to see where we are with SBSA compliance.


One day there was a discussion that we need a way to recognize variants of “sbsa-ref” in some sane way. The idea was to get rid of most of the hardcoded values and provide a way to have data going from QEMU up to the firmware.

We started with adding “platform version major/minor” fields into the DeviceTree, starting with “0.0” as the value. And for some time nothing changed here, as some of the people working on the SBSA Reference Platform changed jobs and others worked on other parts of it.

Note that this is different from other QEMU targets. We do not go the “sbsa-ref-8.0”, “sbsa-ref-8.1” way, as this would add maintenance work without any gain for us.

During the last Linaro Connect we had some discussion on how we want to proceed, and some more afterwards (as not everyone got there — UK visa issues).

The plan

The plan is simple:

  • QEMU adds data into DeviceTree
  • TF-A parses DeviceTree and extracts data from it
  • TF-A provides Secure Monitor Calls (SMC) with data from DT
  • EDK2 uses SMC to gather data from TF-A
  • EDK2 creates ACPI tables
  • OS uses ACPI to get hardware information


After setting the plan I created a bunch of Jira tickets and started writing code. Some changes were new, some were adapted from our work-in-progress ones.

0.0: Platform version SMC

Trusted Firmware (TF-A) reads the DeviceTree from QEMU and provides the platform version (PV from now on) to the higher-level firmware via SMC. EDK2 reads it and does nothing with it yet (as expected).

0.1: GIC data SMC

The firmware knows which platform version we are on, so it can do something about it. So we bump the value in QEMU and provide the Arm GIC addresses via additional SMCs.

TF-A uses those values instead of hardcoded ones to initialize GIC. Then EDK2 does the same.

If such firmware boots on an older QEMU then the hardcoded values are used and the machine is still operational.


Here things start to be more interesting. We add Interrupt Translation Service support to GIC. Which means we have LPI, MSI(-X) etc. In other words: have normal, working PCI Express with root ports, proper interrupts etc.

From the code side it is like the previous step: QEMU adds the address to the DT, TF-A reads it and provides it via an SMC to EDK2.

If such firmware boots on an older QEMU, then the ITS is not initialized, as it was not present in a PV 0.0 system.

0.x: PCIe SMC

Normal PCI Express is present, so let's get rid of the hardcoded values. Similar steps and behaviour as above.

0.y: go PCIe!

At this step we have a normal, working PCI Express structure. So let's get rid of some platform devices and replace them with expansion cards:

  • AHCI (sata storage)
  • EHCI (usb 2.0)

We can use the “ich9-ahci” card instead of the former and “qemu-xhci” for the latter.

This step is EDK2-only, as we do not touch those parts in TF-A. There is no real code yet, as it needs some conditions added to the existing ASL code so that the operating system will not get information about those platform devices in the DSDT table.

Again: if booted on a lower PV, then the hardcoded values are used.

Other changes

Recently some additional changes to “sbsa-ref” were merged.

We exchanged the graphics card from the basic VGA one on the legacy PCI bus to the Bochs one (which uses PCI Express). From the firmware or OS point of view not much changed, as both were already supported.

The other change was the default processor. QEMU 8.0 brought emulation of the Arm Neoverse-N1 CPU. It has been enabled in TF-A for a while, so we switched to using it by default (instead of the ancient Cortex-A57). With the move from Armv8.0 to v8.2 we got several CPU features and functionalities.


The above steps are cleanup preparing “sbsa-ref” for future work. We want to be able to change the hardware definition more, for example to select an exact GIC model (like GIC-600) instead of the generic “arm-gic-v3” one.

The SBSA Reference Platform is a system where most expansion is expected to happen by adding PCI Express cards.
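For instance, booting the machine with TF-A/EDK2 firmware images and adding an extra NVMe card to that PCI Express structure could look roughly like this (a hedged sketch; the SBSA_FLASH*.fd names come from the edk2-platforms SbsaQemu build and the disk path is a placeholder):

$ qemu-system-aarch64 \
    -machine sbsa-ref -m 4G \
    -pflash SBSA_FLASH0.fd -pflash SBSA_FLASH1.fd \
    -device nvme,serial=deadbeef,drive=nvme0 \
    -drive if=none,id=nvme0,format=raw,file=disk.img \
    -serial stdio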

by Marcin Juszkiewicz at May 23, 2023 10:43 AM

KVM on Z

WARNING: Updating from RHEL8.6 to RHEL8.7/RHEL8.8 may break Network Access with RoCE Express Adapters

RoCE interfaces may lose their IP settings due to an unexpected change of the network interface name.

The RoCE Express adapters can lose their IP settings due to an unexpected change of the network interface name if both of the following conditions are met:

  • User upgrades from a RHEL 8.6 system or earlier.
  • The RoCE card is enumerated by UID.

To workaround this problem:

Create a systemd.link file in /etc/systemd/network/ with the following content:


[Link]
NamePolicy=kernel database slot path
AlternativeNamesPolicy=database slot path

After rebooting the system for the changes to take effect, you can safely upgrade to RHEL 8.7 or later.

RoCE interfaces that are enumerated by function ID (FID, indicated by the ens prefix in their interface names) are not unique and are not affected by this issue. Set the kernel parameter net.naming-scheme=rhel-8.7 to switch to predictable interface names with the eno prefix. See the Networking with RoCE Express book for further details.
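For example, the parameter can be added persistently with grubby (a hedged sketch; adapt it to your boot configuration):

$ sudo grubby --update-kernel=ALL --args="net.naming-scheme=rhel-8.7"
$ sudo reboot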

by Stefan Raspl ( at May 23, 2023 09:08 AM

April 28, 2023

KVM on Z

2023 Client Workshop for Linux on IBM Z and LinuxONE North America

Join us at the 2023 Client Workshop for Linux on IBM Z and LinuxONE for North America in-person at the IBM Innovation Studio in Poughkeepsie, NY!
Go here to register.

Dates and Times

  • Dates: Wednesday, May 24th and Thursday, May 25th
  • Times: 9:30 AM until 5:30 PM local time on both days


Location

  • IBM Poughkeepsie
  • 705 Development Court
  • Poughkeepsie, NY

General Information 

On May 24th and May 25th, you will have the opportunity to get the latest news and technical information on Hybrid Cloud with Linux on IBM zSystems, LinuxONE, KVM, and z/VM. The training will be delivered onsite at the IBM Innovation Studio Poughkeepsie. You will have the chance to interact directly with our IBM experts and join various small work group sessions on these two days.

Enrollment and Costs/Fees 

The class is free of charge. Travel & living expenses are on your own.

Target Audience

Linux on IBM zSystems and LinuxONE Clients, Partners and Software Vendors, and IBM zSystems Technical Sales

Topic Highlights

  • Hybrid Cloud on IBM zSystems with Linux and z/OS
  • Getting the most out of the Latest Features in Linux on IBM Z and LinuxONE
  • The Gravity of Data - Co-locating z/OS and Linux
  • IBM LinuxONE Security and Compliance Center Overview & Demo
  • Common z/VM Hurdles and How to Overcome Them
  • Update on z/VM Express System Installation
  • z/VM Security Session
  • IBM Z next middleware HW overview
  • Quantum Deep Dive
  • Red Hat OpenShift, Storage and Solutioning Values
  • Turbonomic resource management with OpenShift on IBM zSystems
  • Running Databases on LinuxONE
  • Confidential computing - Confidential Containers
  • Cloud Native DevSecOps using OpenShift Pipelines on IBM LinuxONE
  • Various Workgroup Sessions

by Stefan Raspl ( at April 28, 2023 09:25 AM

April 27, 2023

Stefan Hajnoczi

libblkio 1.3.0 is out


The 1.3.0 release of the libblkio high-performance block device I/O library is out. libblkio provides an API that abstracts various storage interfaces that are efficient but costly to integrate into applications including io_uring, NVMe uring_cmd, virtio-blk-pci, vdpa-blk, and more. Switching between them is very easy using libblkio and gives you flexibility to target both kernel and userspace storage interfaces in your application.

Linux packaging work has progressed over the past few months. Arch Linux, Fedora, and CentOS Stream now carry libblkio packages and more will come in the future. This makes it easier to use libblkio in your applications because you don't need to compile it from source yourself.
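On Fedora, for instance, installing the library and its development headers is now a one-liner (a hedged example; package names may differ on other distributions):

$ sudo dnf install libblkio libblkio-devel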

In this release the vdpa-blk storage interface support has been improved. vdpa-blk is a virtio-blk-based storage interface designed for hardware implementation, typically on Data Processing Unit (DPU) PCIe adapters. Applications can use vdpa-blk to talk directly to the hardware from userspace. This approach can be used either as part of a hypervisor like QEMU or simply to accelerate I/O-bound userspace applications. QEMU uses libblkio to make vdpa-blk devices available to guests.

The downloads and release notes are available here.

by Unknown ( at April 27, 2023 08:58 PM

April 20, 2023

QEMU project

QEMU version 8.0.0 released

We’d like to announce the availability of the QEMU 8.0.0 release. This release contains 2800+ commits from 238 authors.

You can grab the tarball from our download page. The full list of changes are available in the changelog.

Highlights include:

  • ARM: emulation support for FEAT_EVT, FEAT_FGT, and AArch32 ARMv8-R
  • ARM: CPU emulation for Cortex-A55 and Cortex-R52, and new Olimex STM32 H405 machine type
  • ARM: gdbstub support for M-profile system registers
  • HPPA: fid (Floating-Point Identify) instruction support and 32-bit emulation improvements
  • RISC-V: additional ISA and Extension support for smstateen, native debug icount trigger, cache-related PMU events in virtual mode, Zawrs/Svadu/T-Head/Zicond extensions, and ACPI support
  • RISC-V: updated machine support for OpenTitan, PolarFire, and OpenSBI
  • RISC-V: wide ranges of fixes covering PMP propagation for TLB, mret exceptions, uncompressed instructions, and other emulation/virtualization improvements
  • s390x: improved zPCI passthrough device handling
  • s390x: support for asynchronous teardown of memory of secure KVM guests during reboot
  • x86: support for Xen guests under KVM with Linux v5.12+
  • x86: new SapphireRapids CPU model
  • x86: TCG support for FSRM, FZRM, FSRS, and FSRC CPUID flags
  • virtio-mem: support for using preallocation in conjunction with live migration
  • VFIO: experimental migration support updated to v2 VFIO migration protocol
  • qemu-nbd: improved efficiency over TCP and when using TLS
  • and lots more…

Thank you to everybody who contributed to this release, whether that was by writing code, reporting bugs, improving documentation, testing, or providing the project with CI resources. We couldn’t do these without you!

April 20, 2023 06:53 PM

April 04, 2023

KVM on Z

Introducing IBM z16 A02 & IBM LinuxONE Rockhopper 4

Today we announced new IBM z16 and IBM LinuxONE single frame and rack mount options. That’s right – a mainframe in a rack mount! The unique design provides more flexibility and choice for clients to choose the best fit for their business, whether they are a start-up or a major enterprise.

See the official press release here.

Also, see Ian Cutress from TechTechPotato giving us a visit on site in Böblingen, Germany, to check out our hardware here.

And participate in our virtual launch events as follows:

  • Event registration and replays: 
  • IBM z16 Virtual Launch April 4th (Replay)
  • IBM LinuxONE Virtual Launch April 17th  (Registration)

by Stefan Raspl ( at April 04, 2023 04:57 PM

March 24, 2023

QEMU project

Preparing a consistent Python environment

Building QEMU is a complex task, split across several programs: the configure script finds the host and cross compilers that are needed to build emulators and firmware; Meson prepares the build environment for the emulators; finally, Make and Ninja actually perform the build, and in some cases they run tests as well.

In addition to compiling C code, many build steps run tools and scripts which are mostly written in the Python language. These include processing the emulator configuration, code generators for tracepoints and QAPI, extensions for the Sphinx documentation tool, and the Avocado testing framework. The Meson build system itself is written in Python, too.

Some of these tools are run through the python3 executable, while others are invoked directly as sphinx-build or meson, and this can create inconsistencies. For example, QEMU’s configure script checks for a minimum version of Python and rejects too-old interpreters. However, what would happen if code run by Sphinx used a different version?

This situation has been largely hypothetical until recently; QEMU’s Python code is already tested with a wide range of versions of the interpreter, and it would not be a huge issue if Sphinx used a different version of Python as long as both of them were supported. This will change in version 8.1 of QEMU, which will bump the minimum supported version of Python from 3.6 to 3.8. While all the distros that QEMU supports have a recent-enough interpreter, the default on RHEL8 and SLES15 is still version 3.6, and that is what all binaries in /usr/bin use unconditionally.

As of QEMU 8.0, even if configure is told to use /usr/bin/python3.8 for the build, QEMU’s custom Sphinx extensions would still run under Python 3.6. configure does separately check that Sphinx is executing with a new enough Python version, but it would be nice if there were a more generic way to prepare a consistent Python environment.

This post will explain how QEMU 8.1 will ensure that a single interpreter is used for the whole of the build process. Getting there will require some familiarity with Python packaging, so let’s start with virtual environments.

Virtual environments

It is surprisingly hard to find out which Python interpreter a given script will use. You can try to parse the first line of the script, which will be something like #! /usr/bin/python3, but there is no guarantee of success. For example, on some versions of Homebrew /usr/bin/meson will be a wrapper script like:

PYTHONPATH="/usr/local/Cellar/meson/0.55.0/lib/python3.8/site-packages" \
  exec "/usr/local/Cellar/meson/0.55.0/libexec/bin/meson" "$@"

The file with the Python shebang line will be hidden somewhere in /usr/local/Cellar. Therefore, performing some kind of check on the files in /usr/bin is ruled out. QEMU needs to set up a consistent environment on its own.

If a user who is building QEMU wanted to do so, the simplest way would be to use Python virtual environments. A virtual environment takes an existing Python installation but gives it a local set of Python packages. It also has its own bin directory; place it at the beginning of your PATH and you will be able to control the Python interpreter for scripts that begin with #! /usr/bin/env python3.

Furthermore, when packages are installed into the virtual environment with pip, they always refer to the Python interpreter that was used to create the environment. Virtual environments mostly solve the consistency problem at the cost of an extra pip install step to put QEMU’s build dependencies into the environment.
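A minimal sketch of that manual workflow, assuming the user picks /usr/bin/python3.8 and installs Meson and Sphinx from PyPI:

$ /usr/bin/python3.8 -m venv qemu-venv
$ qemu-venv/bin/pip install meson sphinx
$ export PATH="$PWD/qemu-venv/bin:$PATH"   # meson and sphinx-build now run under Python 3.8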

Unfortunately, this extra step has a substantial downside. Even though the virtual environment can optionally refer to the base installation’s installed packages, pip will always install packages from scratch into the virtual environment. For all Linux distributions except RHEL8 and SLES15 this is unnecessary, and users would be happy to build QEMU using the versions of Meson and Sphinx included in the distribution.

Even worse, pip install will access the Python package index (PyPI) over the Internet, which is often impossible on build machines that are sealed from the outside world. Automated installation of PyPI dependencies may actually be a welcome feature, but it must also remain strictly optional.

In other words, the ideal solution would use a non-isolated virtual environment, to be able to use system packages provided by Linux distributions; but it would also ensure that scripts (sphinx-build, meson, avocado) are placed into bin just like pip install does.

Distribution packages

When it comes to packages, Python surely makes an effort to be confusing. The fundamental unit for importing code into a Python program is called a package; os and sys are two examples. However, a program or library that is distributed on PyPI consists of many such “import packages”: that’s because while pip is usually said to be a “package installer” for Python, more precisely it installs “distribution packages”.

To add to the confusion, the term “distribution package” is often shortened to either “package” or “distribution”. And finally, the metadata of the distribution package remains available even after installation, so “distributions” include things that are already installed (and are not being distributed anywhere).

All this matters because distribution metadata will be the key to building the perfect virtual environment. If you look at the content of bin/meson in a virtual environment, after installing the package with pip, this is what you find:

# -*- coding: utf-8 -*-
import re
import sys
from mesonbuild.mesonmain import main
if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
    sys.exit(main())

This looks a lot like automatically generated code, and in fact it is; the only parts that vary are the from mesonbuild.mesonmain import main import, and the invocation of the main() function on the last line. pip creates this invocation script based on the setup.cfg file in Meson’s source code, more specifically based on the following stanza:

console_scripts =
  meson = mesonbuild.mesonmain:main

Similar declarations exist in Sphinx, Avocado and so on, and accessing their content is easy via importlib.metadata (available in Python 3.8+):

$ python3
>>> from importlib.metadata import distribution
>>> distribution('meson').entry_points
[EntryPoint(name='meson', value='mesonbuild.mesonmain:main', group='console_scripts')]

importlib looks up the metadata in the running Python interpreter’s search path; if Meson is installed under another interpreter’s site-packages directory, it will not be found:

$ python3.8
>>> from importlib.metadata import distribution
>>> distribution('meson').entry_points
Traceback (most recent call last):
importlib.metadata.PackageNotFoundError: meson

So finally we have a plan! configure can build a non-isolated virtual environment, use importlib to check that the required packages exist in the base installation, and create scripts in bin that point to the right Python interpreter. Then, it can optionally use pip install to install the missing packages.

While this process includes a certain amount of specialized logic, Python provides a customizable venv module to create virtual environments. The custom steps can be performed by subclassing venv.EnvBuilder.

This will provide the same experience as QEMU 8.0, except that there will be no need for the --meson and --sphinx-build options to the configure script. The path to the Python interpreter is enough to set up all Python programs used during the build.

There is only one thing left to fix…

Nesting virtual environments

Remember how we started with a user that creates her own virtual environment before building QEMU? Well, this would not work anymore, because virtual environments cannot be nested. As soon as configure creates its own virtual environment, the packages installed by the user are not available anymore.

Fortunately, the “appearance” of a nested virtual environment is easy to emulate. Detecting whether python3 runs in a virtual environment is as easy as checking sys.prefix != sys.base_prefix; if it does, we need to retrieve the parent virtual environment’s site-packages directory:

>>> import sysconfig
>>> sysconfig.get_path('purelib')

and write it to a .pth file in the lib directory of the new virtual environment. The following demo shows how a distribution package in the parent virtual environment will be available in the child as well:
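A minimal sketch of such a demo, using meson as the example package (paths and names are illustrative):

$ python3 -m venv outer
$ outer/bin/pip install meson
$ outer/bin/python3 -m venv inner
$ outer/bin/python3 -c 'import sysconfig; print(sysconfig.get_path("purelib"))' \
    > "$(inner/bin/python3 -c 'import sysconfig; print(sysconfig.get_path("purelib"))')/outer.pth"
$ inner/bin/python3 -c 'import mesonbuild; print(mesonbuild.__file__)'   # resolves to outer's site-packages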

A small detail is that configure’s new virtual environment should mirror the isolation setting of the parent. An isolated venv can be detected because sys.base_prefix in site.PREFIXES is false.


Right now, QEMU only makes a minimal attempt at ensuring consistency of the Python environment; Meson is always run using the interpreter that was passed to the configure script with --python or $PYTHON, but that’s it. Once the above technique is implemented in QEMU 8.1, there will be no difference in the build experience, but configuration will be easier and a wider set of invalid build environments will be detected. We will merge these checks before dropping support for Python 3.6, so that users on older enterprise distributions will have a smooth transition.

March 24, 2023 09:00 AM

March 22, 2023

Stefan Hajnoczi

How to debug stuck VIRTIO devices in QEMU

Every once in a while a bug comes along where a guest hangs while communicating with a QEMU VIRTIO device. In this blog post I'll share some debugging approaches that can help QEMU developers who are trying to understand why a VIRTIO device is stuck.

There are a number of reasons why communication with a VIRTIO device might cease, so it helps to identify the nature of the hang:

  • Did the QEMU device see the requests that the guest driver submitted?
  • Did the QEMU device complete the request?
  • Did the guest driver see the requests that the device completed?

The case I will talk about is when QEMU itself is still responsive (the QMP/HMP monitor works) and the guest may or may not be responsive.

Finding requests that are stuck

There is a QEMU monitor command to inspect virtqueues called x-query-virtio-queue-status (QMP) and info virtio-queue-status (HMP). This is a quick way to extract information about a virtqueue from QEMU.

This command allows us to answer the question of whether the QEMU device completed its requests. The shadow_avail_idx and used_idx values in the output are the Available Ring index and Used Ring index, respectively. When they are equal the device has completed all requests. When they are not equal there are still requests in flight and the request must be stuck inside QEMU.
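For example, with a libvirt-managed guest the QMP variant can be issued through virsh qemu-monitor-command (a hedged sketch; the guest name and QOM path are placeholders, list the real paths with x-query-virtio first):

$ virsh qemu-monitor-command guest1 '{"execute": "x-query-virtio"}'
$ virsh qemu-monitor-command guest1 '{"execute": "x-query-virtio-queue-status", "arguments": {"path": "/machine/peripheral-anon/device[1]/virtio-backend", "queue": 0}}'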

Here is a little more background on the index values. Remember that VIRTIO Split Virtqueues have an Available Ring index and a Used Ring index. The Available Ring index is incremented by the driver whenever it submits a request. The Used Ring index is incremented by the device whenever it completes a request. If the Available Ring index is equal to the Used Ring index then all requests have been completed.

Note that shadow_avail_idx is not the vring Available Ring index in guest RAM but just the last cached copy that the device saw. That means we cannot tell if there are new requests that the device hasn't seen yet. We need to take another approach to figure that out.

Finding requests that the device has not seen yet

Maybe the device has not seen new requests recently and this is why the guest is stuck. That can happen if the device is not receiving Buffer Available Notifications properly (normally this is done by reading a virtqueue kick ioeventfd, also known as a host notifier in QEMU).

We cannot use QEMU monitor commands here, but attaching the GDB debugger to QEMU will allow us to peek at the Available Ring index in guest RAM. The following GDB Python script loads the Available Ring index for a given VirtQueue:

$ cat
import gdb

# ADDRESS is the address of a VirtQueue struct
vq = gdb.Value(ADDRESS).cast(gdb.lookup_type('VirtQueue').pointer())
uint16_type = gdb.lookup_type('uint16_t')  # needed to read the 16-bit ring index
avail_idx = vq['vring']['caches']['avail']['ptr'].cast(uint16_type.pointer())[1]
if avail_idx != vq['shadow_avail_idx']:
  print('Device has not seen all available buffers: avail_idx {} shadow_avail_idx {} in {}'.format(avail_idx, vq['shadow_avail_idx'], vq.dereference()))

You can run the script using the source GDB command. Finding the address of the virtqueue depends on the type of device that you are debugging.
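For example (a hedged sketch; ADDRESS must be filled in first, and the file name is whatever you saved the script as):

$ gdb -p "$(pidof qemu-system-x86_64)"
(gdb) source avail-idx.py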

Finding completions that the guest has not seen

If requests are not stuck inside QEMU and the device has seen the latest request, then the guest driver might have missed the Used Buffer Notification from the device (normally an interrupt handler or polling loop inside the guest detects completed requests).

In VIRTIO the driver's current index in the Used Ring is not visible to the device. This means we have no general way of knowing whether the driver has seen completions. However, there is a cool trick for modern devices that have the VIRTIO_RING_F_EVENT_IDX feature enabled.

The trick is that the Linux VIRTIO driver code updates the Used Event Index every time a completed request is popped from the virtqueue. So if we look at the Used Event Index we know the driver's index into the Used Ring and can find out whether it has seen request completions.

The following GDB Python script loads the Used Event Index for a given VirtQueue:

$ cat
import gdb

# ADDRESS is the address of a VirtQueue struct
vq = gdb.Value(ADDRESS).cast(gdb.lookup_type('VirtQueue').pointer())
uint16_type = gdb.lookup_type('uint16_t')  # needed to read the 16-bit event index
used_event = vq['vring']['caches']['avail']['ptr'].cast(uint16_type.pointer())[2 + vq['vring']['num']]
if used_event != vq['used_idx']:
  print('Driver has not seen all used buffers: used_event {} used_idx {} in {}'.format(used_event, vq['used_idx'], vq.dereference()))

You can run the script using the source GDB command. Finding the address of the virtqueue depends on the type of device that you are debugging.


I hope this helps anyone who has to debug a VIRTIO device that seems to have gotten stuck.

by Unknown ( at March 22, 2023 07:39 PM

March 17, 2023

KVM on Z

KVM Upstream Roundup 2022

The KVM on Z community has been quite busy in 2022, delivering a number of features. Here is a look back at a selection of significant new features in recent months:
  • Support for IBM z16 and LinuxONE Emperor 4
    • Allows guest exploitation of new hardware features and live migration for the latest machine generation.
    • Requires QEMU 6.1 and Linux kernel 5.14 or later.
  • Support of long kernel command lines
    • Kernel command lines of up to 64 KB in length, e.g. allowing plenty of I/O devices to be specified.
    • Requires QEMU 7.0, s390-tools 2.20 or later.
  • Remote attestation for Secure Execution
    • Cryptographic evidence of workload authenticity and integrity facilitates integration into common Confidential Computing frameworks.
    • Requires QEMU 7.1 and Linux kernel 5.19 or later.
  • Encrypted dump for Secure Execution
    • Enhances problem determination capabilities while not compromising the security of secure KVM guests.
    • Requires QEMU 7.1, Linux 5.19, and s390-tools 2.22 or later.
  • Persistent configuration for vfio-ap 
    • The s390-tools command zdev can now be used to persist Crypto passthrough configurations.
    • Requires s390-tools 2.22 or later.
  • Interpretive vfio-pci support for ISM
    • Allows pass-through of ISM devices to KVM guests, enabling high-bandwidth and low-latency network communication using SMC-D.
    • Requires QEMU 7.2 and Linux kernel 6.2 or later.

by Unknown ( at March 17, 2023 02:56 PM

March 09, 2023

Daniel Berrange

make-tiny-image.py: creating tiny initrds for testing QEMU or Linux kernel/userspace behaviour

As a virtualization developer a significant amount of time is spent in understanding and debugging the behaviour and interaction of QEMU and the guest kernel/userspace code. As such my development machines have a variety of guest OS installations that get booted for various tasks. Some tasks, however, require a repeated cycle of QEMU code changes, or QEMU config changes, followed by guest testing. Waiting for an OS to boot can quickly become a significant time sink affecting productivity and lead to frustration. What is needed is a very low overhead way to accomplish simple testing tasks without an OS getting in the way.

Enter the ‘make-tiny-image.py’ tool for creating minimal initrd images.

If invoked with no arguments, this tool will create an initrd containing nothing more than busybox. The “init” program will be a script that creates a few device nodes, mounts proc/sysfs and then runs the busybox ‘sh’ binary to provide an interactive shell. This is intended to be used as follows

$ ./make-tiny-image.py

$ qemu-system-x86_64 \
    -kernel /boot/vmlinuz-$(uname -r) \
    -initrd tiny-initrd.img \
    -append 'console=ttyS0 quiet' \
    -accel kvm -m 1000 -display none -serial stdio
~ # uname  -a
Linux (none) 6.0.8-300.fc37.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Nov 11 15:09:04 UTC 2022 x86_64 x86_64 x86_64 Linux
~ # uptime
 15:05:42 up 0 min,  load average: 0.00, 0.00, 0.00
~ # free
              total        used        free      shared  buff/cache   available
Mem:         961832       38056      911264        1388       12512      845600
Swap:             0           0           0
~ # df
Filesystem           1K-blocks      Used Available Use% Mounted on
none                    480916         0    480916   0% /dev
~ # ls
bin   dev   init  proc  root  sys   usr
~ # <Ctrl+D>
[   23.841282] reboot: Power down

When I say “low overhead”, just how low are we talking about ? With KVM, it takes less than a second to bring up the shell. Testing with emulation is where this really shines. Booting a full Fedora OS with QEMU emulation is slow enough that you don’t want to do it at all frequently. With this tiny initrd, it’ll take a little under 4 seconds to boot to the interactive shell. Much slower than KVM, but fast enough you’ll be fine repeating this all day long, largely unaffected by the (lack of) speed relative to KVM.

The tool will create the initrd such that it drops you into a shell, but it can be told to run another command instead. This is how I tested the overheads mentioned above

$ ./make-tiny-image.py --run poweroff

$ time qemu-system-x86_64 \
     -kernel /boot/vmlinuz-$(uname -r) \
     -initrd tiny-initrd.img \
     -append 'console=ttyS0 quiet' \
     -m 1000 -display none -serial stdio -accel kvm
[    0.561174] reboot: Power down

real	0m0.828s
user	0m0.613s
sys	0m0.093s
$ time qemu-system-x86_64 \
     -kernel /boot/vmlinuz-$(uname -r) \
     -initrd tiny-initrd.img \
     -append 'console=ttyS0 quiet' \
     -m 1000 -display none -serial stdio -accel tcg
[    2.741983] reboot: Power down

real	0m3.774s
user	0m3.626s
sys	0m0.174s

As a more useful real world example, I wanted to test the effect of changing the QEMU CPU configuration with KVM and TCG, by comparing the guest /proc/cpuinfo.

$ ./make-tiny-image.py --run 'cat /proc/cpuinfo'

$ qemu-system-x86_64 \
    -kernel /boot/vmlinuz-$(uname -r) \
    -initrd tiny-initrd.img \
    -append 'console=ttyS0 quiet' \
    -m 1000 -display none -serial stdio -accel tcg -cpu max | grep '^flags'
flags		: fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
                  cmov pat pse36 clflush acpi mmx fxsr sse sse2 ss syscall 
                  nx mmxext pdpe1gb rdtscp lm 3dnowext 3dnow rep_good nopl 
                  cpuid extd_apicid pni pclmulqdq monitor ssse3 cx16 sse4_1 
                  sse4_2 movbe popcnt aes xsave rdrand hypervisor lahf_lm 
                  svm cr8_legacy abm sse4a 3dnowprefetch vmmcall fsgsbase 
                  bmi1 smep bmi2 erms mpx adx smap clflushopt clwb xsaveopt 
                  xgetbv1 arat npt vgif umip pku ospke la57

$ qemu-system-x86_64 \
    -kernel /boot/vmlinuz-$(uname -r) \
    -initrd tiny-initrd.img \
    -append 'console=ttyS0 quiet' \
    -m 1000 -display none -serial stdio -accel kvm -cpu max | grep '^flags'
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
                  cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx 
                  pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good 
                  nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx 
                  ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe 
                  popcnt tsc_deadline_timer aes xsave avx f16c rdrand 
                  hypervisor lahf_lm abm 3dnowprefetch cpuid_fault 
                  invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced 
                  tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase 
                  tsc_adjust sgx bmi1 avx2 smep bmi2 erms invpcid mpx 
                  rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 
                  xsaves arat umip sgx_lc md_clear arch_capabilities

NB, with the list of flags above, I’ve manually line wrapped the output for saner presentation in this blog rather than have one giant long line.

These examples have relied on tools provided by busybox, but we’re not limited by that. It is possible to tell it to copy in arbitrary extra binaries from the host OS by just listing their name. If it is a dynamically linked ELF binary, it’ll follow the ELF header dependencies, pulling in any shared libraries needed.

$ ./make-tiny-image.py hwloc-info lstopo-no-graphics
Copy bin /usr/bin/hwloc-info -> /tmp/make-tiny-imagexu_mqd99/bin/hwloc-info
Copy bin /usr/bin/lstopo-no-graphics -> /tmp/make-tiny-imagexu_mqd99/bin/lstopo-no-graphics
Copy lib /lib64/ -> /tmp/make-tiny-imagexu_mqd99/lib64/
Copy lib /lib64/ -> /tmp/make-tiny-imagexu_mqd99/lib64/
Copy lib /lib64/ -> /tmp/make-tiny-imagexu_mqd99/lib64/
Copy lib /lib64/ -> /tmp/make-tiny-imagexu_mqd99/lib64/
Copy lib /lib64/ -> /tmp/make-tiny-imagexu_mqd99/lib64/

$ qemu-system-x86_64 -kernel /boot/vmlinuz-$(uname -r) -initrd tiny-initrd.img -append 'console=ttyS0 quiet' -m 1000 -display none -serial stdio -accel kvm 
~ # hwloc-info 
depth 0:           1 Machine (type #0)
 depth 1:          1 Package (type #1)
  depth 2:         1 L3Cache (type #6)
   depth 3:        1 L2Cache (type #5)
    depth 4:       1 L1dCache (type #4)
     depth 5:      1 L1iCache (type #9)
      depth 6:     1 Core (type #2)
       depth 7:    1 PU (type #3)
Special depth -3:  1 NUMANode (type #13)
Special depth -4:  1 Bridge (type #14)
Special depth -5:  3 PCIDev (type #15)
Special depth -6:  1 OSDev (type #16)
Special depth -7:  1 Misc (type #17)

~ # lstopo-no-graphics 
Machine (939MB total)
  Package L#0
    NUMANode L#0 (P#0 939MB)
    L3 L#0 (16MB) + L2 L#0 (4096KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
    PCI 00:01.1 (IDE)
      Block "sr0"
    PCI 00:02.0 (VGA)
    PCI 00:03.0 (Ethernet)

An obvious limitation is that if the binary/library requires certain data files, those will not be present in the initrd. It isn’t attempting to do anything clever like query the corresponding RPM file list and copy those. This tool is meant to be simple and fast and keep out of your way. If certain data files are critical for testing though, the --copy argument can be used. The copied files will be put at the same path inside the initrd as found on the host

$ ./make-tiny-image.py --copy /etc/redhat-release
Copy extra /etc/redhat-release -> /tmp/make-tiny-imageicj1tvq4/etc/redhat-release

$ qemu-system-x86_64 \
    -kernel /boot/vmlinuz-$(uname -r) \
    -initrd tiny-initrd.img \
    -append 'console=ttyS0 quiet' \
    -m 1000 -display none -serial stdio -accel kvm 
~ # cat /etc/redhat-release 
Fedora release 37 (Thirty Seven)

What if the problem being tested requires using some kernel modules ? That’s covered too with the --kmod argument, which will copy in the modules listed, along with their dependencies and the insmod command itself. As an example of its utility, I used this recently to debug a regression in support for the iTCO watchdog in Linux kernels

$ ./make-tiny-image.py --kmod lpc_ich --kmod iTCO_wdt --kmod i2c_i801
Copy kmod /lib/modules/6.0.8-300.fc37.x86_64/kernel/drivers/mfd/lpc_ich.ko.xz -> /tmp/make-tiny-image63td8wbl/lib/modules/lpc_ich.ko.xz
Copy kmod /lib/modules/6.0.8-300.fc37.x86_64/kernel/drivers/watchdog/iTCO_wdt.ko.xz -> /tmp/make-tiny-image63td8wbl/lib/modules/iTCO_wdt.ko.xz
Copy kmod /lib/modules/6.0.8-300.fc37.x86_64/kernel/drivers/watchdog/iTCO_vendor_support.ko.xz -> /tmp/make-tiny-image63td8wbl/lib/modules/iTCO_vendor_support.ko.xz
Copy kmod /lib/modules/6.0.8-300.fc37.x86_64/kernel/drivers/mfd/intel_pmc_bxt.ko.xz -> /tmp/make-tiny-image63td8wbl/lib/modules/intel_pmc_bxt.ko.xz
Copy kmod /lib/modules/6.0.8-300.fc37.x86_64/kernel/drivers/i2c/busses/i2c-i801.ko.xz -> /tmp/make-tiny-image63td8wbl/lib/modules/i2c-i801.ko.xz
Copy kmod /lib/modules/6.0.8-300.fc37.x86_64/kernel/drivers/i2c/i2c-smbus.ko.xz -> /tmp/make-tiny-image63td8wbl/lib/modules/i2c-smbus.ko.xz
Copy bin /usr/sbin/insmod -> /tmp/make-tiny-image63td8wbl/bin/insmod
Copy lib /lib64/ -> /tmp/make-tiny-image63td8wbl/lib64/
Copy lib /lib64/ -> /tmp/make-tiny-image63td8wbl/lib64/
Copy lib /lib64/ -> /tmp/make-tiny-image63td8wbl/lib64/
Copy lib /lib64/ -> /tmp/make-tiny-image63td8wbl/lib64/
Copy lib /lib64/ -> /tmp/make-tiny-image63td8wbl/lib64/
Copy lib /lib64/ -> /tmp/make-tiny-image63td8wbl/lib64/
Copy lib /lib64/ -> /tmp/make-tiny-image63td8wbl/lib64/

$ ~/src/virt/qemu/build/qemu-system-x86_64 -kernel /boot/vmlinuz-$(uname -r) -initrd tiny-initrd.img -append 'console=ttyS0 quiet' -m 1000 -display none -serial stdio -accel kvm  -M q35 -global ICH9-LPC.noreboot=false -watchdog-action poweroff -trace ich9* -trace tco*
ich9_cc_read addr=0x3410 val=0x20 len=4
ich9_cc_write addr=0x3410 val=0x0 len=4
ich9_cc_read addr=0x3410 val=0x0 len=4
ich9_cc_read addr=0x3410 val=0x0 len=4
ich9_cc_write addr=0x3410 val=0x20 len=4
ich9_cc_read addr=0x3410 val=0x20 len=4
tco_io_write addr=0x4 val=0x8
tco_io_write addr=0x6 val=0x2
tco_io_write addr=0x6 val=0x4
tco_io_read addr=0x8 val=0x0
tco_io_read addr=0x12 val=0x4
tco_io_write addr=0x12 val=0x32
tco_io_read addr=0x12 val=0x32
tco_io_write addr=0x0 val=0x1
tco_timer_reload ticks=50 (30000 ms)
~ # mknod /dev/watchdog0 c 10 130
~ # cat /dev/watchdog0
tco_io_write addr=0x0 val=0x1
tco_timer_reload ticks=50 (30000 ms)
cat: read error: Invalid argument
[   11.052062] watchdog: watchdog0: watchdog did not stop!
tco_io_write addr=0x0 val=0x1
tco_timer_reload ticks=50 (30000 ms)
~ # tco_timer_expired timeouts_no=0 no_reboot=0/1
tco_timer_reload ticks=50 (30000 ms)
tco_timer_expired timeouts_no=1 no_reboot=0/1
tco_timer_reload ticks=50 (30000 ms)
tco_timer_expired timeouts_no=0 no_reboot=0/1
tco_timer_reload ticks=50 (30000 ms)

The Linux regression had accidentally left the watchdog with the ‘no reboot’ bit set, so it would never trigger the action, which we diagnosed from seeing repeated QEMU trace events for tco_timer_expired after triggering the watchdog in the guest. This was quickly fixed by the Linux maintainers.

In spite of being such a simple and crude script, with many, many, many unhandled edge cases, it has proved remarkably useful at enabling low overhead debugging of QEMU/Linux guest behaviour.

by Daniel Berrange at March 09, 2023 03:54 PM

March 08, 2023

QEMU project

KVM Forum 2023: Call for presentations

KVM Forum is an annual event that presents a rare opportunity for KVM and QEMU developers and users to discuss the state of Linux virtualization technology and plan for the challenges ahead. Sessions include updates on the state of the KVM virtualization stack, planning for the future, and many opportunities for attendees to collaborate.

This year’s event will be held in Brno, Czech Republic on June 14-15, 2023. It will be in-person only and will be held right before the DevConf.CZ open source community conference.

June 14 will be at least partly dedicated to a hackathon or “day of BoFs”. This will provide time for people to get together and discuss strategic decisions, as well as other topics that are best solved within smaller groups.

Call for presentations

We encourage you to submit presentations via the KVM Forum CfP page. Suggested topics include:

  • Scalability and Optimization
  • Hardening and security
  • Confidential computing
  • Testing
  • KVM and the Linux Kernel:
    • New Features and Ports
    • Device Passthrough: VFIO, mdev, vDPA
    • Network Virtualization
    • Virtio and vhost
  • Virtual Machine Monitors and Management:
    • VMM Implementation: APIs, Live Migration, Performance Tuning, etc.
    • Multi-process VMMs: vhost-user, vfio-user, QEMU Storage Daemon
    • QEMU without KVM: Hypervisor.framework and other hypervisors
    • Managing KVM: Libvirt, KubeVirt, Kata Containers
  • Emulation:
    • New Devices, Boards and Architectures
    • CPU Emulation and Binary Translation

The deadline for submitting presentations is April 2, 2023 - 11:59 PM PDT. Accepted speakers will be notified on April 17, 2023.

Attending KVM Forum

Admission to KVM Forum and DevConf.CZ is free. However, registration is required and the number of attendees is limited by the space available at the venue.

The DevConf.CZ program will feature technical talks on a variety of topics, including cloud and virtualization infrastructure—so make sure to register for DevConf.CZ as well if you would like to attend.

Both conferences are committed to fostering an open and welcoming environment for everybody. Participants are expected to abide by the code of conduct and media policy.

March 08, 2023 12:45 PM

February 23, 2023

QEMU project

Announcing QEMU Google Summer of Code and Outreachy 2023 internships

QEMU is participating in Google Summer of Code and Outreachy again this year! Google Summer of Code and Outreachy are open source internship programs that offer paid remote work opportunities for contributing to open source. Internships generally run May through August, so if you have time and want to experience open source development, read on to find out how you can apply.

Each intern is paired with one or more mentors, experienced QEMU contributors who support them during the internship. Code developed by the intern is submitted through the same open source development process that all QEMU contributions follow. This gives interns experience with contributing to open source software. Some interns then choose to pursue a career in open source software after completing their internship.

Find out if you are eligible

Information on who can apply is here for Google Summer of Code and here for Outreachy. Note that Outreachy initial applications ended on February 6th so only those who have been accepted into Outreachy can apply for QEMU Outreachy internships.

Select a project idea

Look through the list of QEMU project ideas and see if there is something you are interested in working on. Once you have found a project idea you want to apply for, email the mentor for that project idea to ask any questions you may have and discuss the idea further.

Submit your application

You can apply for Google Summer of Code from March 20th to April 4th and apply for Outreachy from March 6th to April 3rd.

Good luck with your applications!

If you have questions about applying for QEMU GSoC or Outreachy, please email Stefan Hajnoczi or ask on the #qemu-gsoc IRC channel.

February 23, 2023 01:30 PM

February 20, 2023

Stefan Hajnoczi

Writing a C library in Rust

I started working on libblkio in 2020 with the goal of creating a high-performance block I/O library. The internals are written in Rust while the library exposes a public C API for easy integration into existing applications. Most languages have a way to call C APIs, often called a Foreign Function Interface (FFI). It's the most universal way to call into code written in different languages within the same program. The choice of building a C API was a deliberate one in order to make it easy to create bindings in many programming languages. However, writing a library in Rust that exposes a C API is relatively rare (librsvg is the main example I can think of), so I wanted to share what I learnt from this project.

Calling Rust code from C

Rust has good support for making functions callable from C. The documentation on calling Rust code from C covers the basics. Here is the Rust implementation of void blkioq_set_completion_fd_enabled(struct blkioq *q, bool enable) from libblkio:

#[no_mangle]
pub extern "C" fn blkioq_set_completion_fd_enabled(q: &mut Blkioq, enable: bool) {
    q.set_completion_fd_enabled(enable);
}

A C program just needs a function prototype for blkioq_set_completion_fd_enabled() and can call it directly like a C function.
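For instance, once the crate is built as a cdylib, the C side is plain compilation and linking (a hedged sketch; demo.c and the exact library name are illustrative):

$ cargo build --release                      # Cargo.toml: [lib] crate-type = ["cdylib"]
$ cc -o demo demo.c -L target/release -lblkio
$ LD_LIBRARY_PATH=target/release ./demo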

What's really nice is that most primitive Rust types can be passed between languages without special conversion code in Rust. That means the function can accept arguments and return values that map naturally from Rust to C. In the code snippet above you can see that the Rust bool argument can be used without explicit conversion.

C pointers are converted to Rust pointers or references automatically by the compiler. If you want them to be nullable, just wrap them in Rust Option and the C NULL value becomes Rust None while a non-NULL pointer becomes Some. This makes it a breeze to pass data between Rust and C. In the example above, the Rust &mut Blkioq argument is a C struct blkioq *.

Rust structs also map to C nicely when they are declared with repr(C). The Rust compiler lays out the struct in memory so that its representation is compatible with the equivalent C struct.

Limitations of Rust FFI

It's not all roses though. There are fundamental differences between Rust and C that make FFI challenging. Not all language constructs are supported by FFI and some that are require manual work.

Rust generics and dynamically sized types (DST) cannot be used in extern "C" function signatures. Generics require the Rust compiler to generate code, which does not make sense in a C API because there is no Rust compiler involved. DSTs have no mapping to C and so they need to be wrapped in something that can be expressed in C, like a struct. DSTs include trait objects, so you cannot directly pass trait objects across the C/Rust language boundary.

Two extremes in library design

The limitations of FFI raise the question of how to design the library. The first extreme is to use the lowest common denominator language features supported by FFI. In the worst case this means writing C in Rust with frequent use of unsafe (because pointers and unpacked DSTs are passed around). This is obviously a bad approach because it foregoes the safety and expressiveness benefits of Rust. I think few human programmers would follow this approach although code generators or translators might output Rust code of this sort.

The other extreme is to forget about C and focus on writing an idiomatic Rust crate and then build a C API afterwards. Although this sounds nice, it's not entirely a good idea either because of the FFI limitations I mentioned. The Rust crate might be impossible to express as a C API, requiring significant glue code and possibly performance sacrifices if values cannot be passed across language boundaries efficiently.

Lessons learnt

When I started libblkio I thought primarily in terms of the C API. Although the FFI code was kept isolated and the rest of the codebase was written in acceptably nice Rust, the main mistake was that I didn't think of what the native Rust crate API should look like. Only thinking of the C API meant that some of the key design decisions were suboptimal for a native Rust crate. Later on, when we began experimenting with a native Rust crate, it became clear where assumptions from the unsafe C API had crept in. It is hard to change them now, although Alberto Faria has done great work in revamping the codebase for a natural Rust API.

I erred too much on the side of the C API. In the future I would try to stay closer to the middle or slightly towards the native Rust API (but not to the extreme). That approach is most likely to end up with code that presents an efficient C API while still implementing it in idiomatic Rust. Overall, implementing a C library API in Rust was a success. I would continue to do this instead of writing new libraries in C because Rust's language features are more attractive than C's.

by Unknown ( at February 20, 2023 03:24 PM

Video and slides available for "vhost-user-blk: a fast userspace block I/O interface"

At FOSDEM '23 I gave a talk about vhost-user-blk and its use as a userspace block I/O interface. The video and slides are now available here. Enjoy!

by Unknown ( at February 20, 2023 12:47 PM

January 27, 2023

Stefan Hajnoczi

Speaking at FOSDEM '23 about "vhost-user-blk: A fast userspace block I/O interface"

vhost-user-blk has connected hypervisors to software-defined storage since around 2017, but it was mainly seen as virtualization technology. Did you know that vhost-user-blk is not specific to virtual machines? I think it's time to use it more generally as a userspace block I/O interface because it's fast, unprivileged, and avoids exposing kernel attack surfaces.

My article about Accessing QEMU storage features without a VM already hinted at this, but now it's time to focus on what vhost-user-blk is and why it's easy to integrate into your applications. libblkio is a simple and familiar block I/O API with vhost-user-blk support. You can connect to existing SPDK-based software-defined storage applications, qemu-storage-daemon, and other vhost-user-blk back-ends.

Come see my FOSDEM '23 talk about vhost-user-blk as a fast userspace block I/O interface live on Saturday Feb 4 2023, 11:15 CET. It will be streamed on the FOSDEM website and recordings will be available later. Slides are available here.

by Unknown ( at January 27, 2023 08:46 PM

Powered by Planet!
Last updated: December 07, 2023 08:06 PM