Blogging about open source virtualization

News from QEMU, KVM, libvirt, libguestfs, virt-manager and related tools


Planet Feeds

July 03, 2020

KVM on Z

RHEL providing unlimited KVM Guests on IBM Z

Red Hat has announced a new offering for IBM Z and LinuxONE here.
Red Hat Enterprise Linux for IBM Z with premium support also includes
  • Red Hat Enterprise Linux Extended Update Support add-on (new)
  • Red Hat Enterprise Linux High Availability add-on (new)
  • Red Hat Smart Management (new)
  • Red Hat Insights (new)
  • Unlimited virtual guests (KVM)

by Stefan Raspl ( at July 03, 2020 09:10 AM

July 02, 2020

Stefan Hajnoczi

Avoiding bitrot in C macros

A common approach to debug messages that can be toggled at compile-time in C programs is:

#ifdef ENABLE_DEBUG
#define DPRINTF(fmt, ...) do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0)
#else
#define DPRINTF(fmt, ...)
#endif

Usually the ENABLE_DEBUG macro is not defined in normal builds, so the C preprocessor expands the debug printfs to nothing. No messages are printed at runtime and the program's binary size is smaller since no instructions are generated for the debug printfs.

This approach has the disadvantage that it suffers from bitrot, the tendency for source code to break over time when it is not actively built and used. Consider what happens when one of the variables used in the debug printf is not updated after being renamed:

- int r;
+ int radius;
DPRINTF("radius %d\n", r);

The code continues to compile after r is renamed to radius because the DPRINTF() macro expands to nothing. The compiler does not syntax check the debug printf and misses that the outdated variable name r is still in use. When someone defines ENABLE_DEBUG months or years later, the compiler error becomes apparent and that person is confronted with fixing a new bug on top of whatever they were trying to debug when they enabled the debug printf!

It's actually easy to avoid this problem by writing the macro differently:

#define ENABLE_DEBUG 0

#define DPRINTF(fmt, ...) do { \
        if (ENABLE_DEBUG) { \
            fprintf(stderr, fmt, ## __VA_ARGS__); \
        } \
    } while (0)

When ENABLE_DEBUG is 0 the macro expands to:

do {
    if (0) {
        fprintf(stderr, fmt, ...);
    }
} while (0)

What is the difference? This time the compiler parses and syntax checks the debug printf even when it is disabled. Luckily compilers are smart enough to eliminate dead code (code that cannot be executed), so the binary size remains small.

This applies not just to debug printfs. More generally, all preprocessor conditionals suffer from bitrot. If an #if ... #else ... #endif can be replaced with equivalent unconditional code then it's often worth doing.

by Unknown ( at July 02, 2020 08:10 AM

June 16, 2020

Cole Robinson

virt-manager is deprecated in RHEL (but only RHEL)

TL;DR: I'm the primary author of virt-manager. virt-manager is deprecated in RHEL8 in favor of cockpit, but ONLY in RHEL8 and future RHEL releases. The upstream project virt-manager is still maintained and is still relevant for other distros.

Google 'virt-manager deprecated' and you'll find some discussions suggesting virt-manager is no longer maintained, Cockpit is replacing virt-manager, virt-manager is going to be removed from every distro, etc. These conclusions are misinformed.

The primary source of this confusion is 'virt-manager has been deprecated' in the virtualization deprecation section of the RHEL8 release notes. Relevant quote from the RHEL8.2 docs:

The Virtual Machine Manager application, also known as virt-manager, has been deprecated. The RHEL 8 web console, also known as Cockpit, is intended to become its replacement in a subsequent release.

What that means:

  • virt-manager is in RHEL8 and will be there for the lifetime of RHEL8.
  • Red Hat engineering effort assigned to the virt-manager UI has been reduced compared to previous RHEL versions.
  • The tentative plan is to not ship the virt-manager UI in RHEL9.

Why is this happening? As I understand it, RHEL wants to roughly standardize on Cockpit as their host admin UI tool. It's a great project with great engineers and great UI designers. Red Hat is going all in on it for RHEL and aims to replace the mishmash of system-config-X tools and project specific admin frontends (like virt-manager) with a unified project. (Please note: this is my paraphrased understanding, I'm not speaking on behalf of Red Hat here.)

Notice though, this is all about RHEL. virt-manager is not deprecated upstream, or deprecated in other distros automatically just because RHEL has made this decision. The upstream virt-manager project continues on and Red Hat continues to allocate resources to maintain it.

Also, I'm distinguishing virt-manager UI from the virt-manager project, which includes tools like virt-install. I fully expect virt-install to be shipped in RHEL9 and actively maintained (FWIW Cockpit uses it behind the scenes).

And even if the virt-manager UI is not in RHEL9 repos, it will likely end up shipped in EPEL, so the UI will still be available for install, just through external repos.

Overall my personal opinion is that as long as libvirt+KVM is in use on linux desktops and servers, virt-manager will be relevant. I don't expect anything to change in that area any time soon.

by Cole Robinson at June 16, 2020 05:00 PM

May 22, 2020

Stefan Hajnoczi

How to check VIRTIO feature bits inside Linux guests

VIRTIO devices have feature bits that indicate the presence of optional features. The feature bit space is divided into core VIRTIO features (e.g. notify on empty), transport-specific features (PCI, MMIO, CCW), and device-specific features (e.g. virtio-net checksum offloading). This article shows how to check whether a feature is enabled inside Linux guests.

The feature bits are used during VIRTIO device initialization to negotiate features between the device and the driver. The device reports a fixed set of features, typically all the features that the device implementors wanted to offer from the VIRTIO specification version that they developed against. The driver also reports features, typically all the features that the driver developers wanted to offer from the VIRTIO specification version that they developed against.

Feature bit negotiation determines the subset of features supported by both the device and the driver. A new driver might not be able to enable all the features it supports if the device is too old. The same is true vice versa. This offers compatibility between devices and drivers. It also means that you don't know which features are enabled until the device and driver have negotiated them at runtime.
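The negotiation itself amounts to a bitwise AND of the two feature sets. A minimal sketch in Python, using feature bit numbers from the Linux headers:

```python
# VIRTIO feature negotiation sketch: the enabled set is the intersection
# (bitwise AND) of what the device offers and what the driver supports.
VIRTIO_RING_F_INDIRECT_DESC = 28
VIRTIO_RING_F_EVENT_IDX = 29

device_features = (1 << VIRTIO_RING_F_INDIRECT_DESC) | (1 << VIRTIO_RING_F_EVENT_IDX)
driver_features = 1 << VIRTIO_RING_F_EVENT_IDX  # an older driver without indirect descriptors

negotiated = device_features & driver_features

assert negotiated & (1 << VIRTIO_RING_F_EVENT_IDX)          # both sides support it
assert not negotiated & (1 << VIRTIO_RING_F_INDIRECT_DESC)  # driver doesn't, so disabled
```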

Where to find feature bit definitions

VIRTIO feature bits are listed in the VIRTIO specification. You can also grep the linux/virtio_*.h header files:

$ grep VIRTIO.*_F_ /usr/include/linux/virtio_*.h
virtio_ring.h:#define VIRTIO_RING_F_INDIRECT_DESC 28
virtio_ring.h:#define VIRTIO_RING_F_EVENT_IDX 29
virtio_scsi.h:#define VIRTIO_SCSI_F_INOUT 0
virtio_scsi.h:#define VIRTIO_SCSI_F_HOTPLUG 1
virtio_scsi.h:#define VIRTIO_SCSI_F_CHANGE 2

Here the VIRTIO_SCSI_F_INOUT (0) constant is for the 1st bit (1ull << 0). Bit-numbering can be confusing because different standards, vendors, and languages express it differently. Here it helps to think of a bit shift operation like 1 << BIT.
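To make the numbering concrete, the same convention expressed in Python (constants copied from the header output above):

```python
# "Bit N" in the headers means the mask value (1 << N).
VIRTIO_SCSI_F_INOUT = 0       # the 1st bit
VIRTIO_RING_F_EVENT_IDX = 29  # the 30th bit

assert 1 << VIRTIO_SCSI_F_INOUT == 0x1
assert 1 << VIRTIO_RING_F_EVENT_IDX == 0x20000000
```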

How to check feature bits inside the guest

The Linux virtio.ko driver that is used for all VIRTIO devices has a sysfs file called features. This file contains the feature bits in binary representation starting with the 1st bit on the left and more significant bits to the right. The reported bits are the subset that both the device and the driver support.

To check if the virtio-blk device /dev/vda has the VIRTIO_RING_F_EVENT_IDX (29) bit set:

$ python -c "print('$(</sys/block/vda/device/driver/virtio*/features)'[29])"

Other device types can be found through similar sysfs paths.
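The check can also be written as a small helper that follows the layout described above, with the leftmost character as bit 0. The feature string here is a made-up example, not real device output:

```python
def feature_enabled(features: str, bit: int) -> bool:
    """Check one bit in the sysfs 'features' string, where the leftmost
    character is bit 0 and more significant bits follow to the right."""
    return features[bit] == "1"

# Made-up example string: bits 0 and 29 set, everything else clear.
features = "".join("1" if i in (0, 29) else "0" for i in range(64))

assert feature_enabled(features, 0)    # e.g. VIRTIO_SCSI_F_INOUT
assert feature_enabled(features, 29)   # e.g. VIRTIO_RING_F_EVENT_IDX
assert not feature_enabled(features, 28)
```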

by Unknown ( at May 22, 2020 01:46 PM

May 01, 2020

Daniel Berrange

ANNOUNCE: virt-viewer version 9.0 released

I am happy to announce a new bugfix release of virt-viewer 9.0 (gpg), including experimental Windows installers for Win x86 MSI (gpg) and Win x64 MSI (gpg).

Signatures are created with key DAF3 A6FD B26B 6291 2D0E 8E3F BE86 EBB4 1510 4FDF (4096R)

With this release the project has moved over to use GitLab for its hosting needs instead of Pagure. Instead of sending patches to the old mailing list, we have adopted modern best practices and now welcome contributions as merge requests, from where they undergo automated CI testing of the build. Bug reports directed towards upstream maintainers should also be filed at the GitLab project now instead of the Red Hat Bugzilla.

All historical releases are available from:

Changes in this release include:

  • Project moved to
  • Allow toggling shared clipboard in remote-viewer
  • Fix handling when initial spice connection fails
  • Fix check for govirt library
  • Add bash completion of cli args
  • Improve errors in file transfer dialog
  • Fix ovirt foreign menu storage domains query
  • Prefer TLS certs from oVirt instead of CLI
  • Improve USB device cleanup when Ctrl-C is used
  • Remember monitor mappings across restarts
  • Add a default file extension to screenshots
  • Updated translations
  • Fix misc memory leaks

by Daniel Berrange at May 01, 2020 05:19 PM

April 30, 2020

KVM on Z

QEMU v5.0 released

QEMU v5.0 is out. For highlights from a KVM on Z perspective see the Release Notes.

by Stefan Raspl ( at April 30, 2020 05:28 PM

Stefan Hajnoczi

How the Linux VFS, block layer, and device drivers fit together

The Linux kernel storage stack consists of several components including the Virtual File System (VFS) layer, the block layer, and device drivers. This article gives an overview of the main objects that a device driver interacts with and their relationships to each other. Actual I/O requests are not covered, instead the focus is on the objects representing the disk.

Let's start with a diagram of the key data structures and then an explanation of how they work together.

The Virtual File System (VFS) layer

The VFS layer is where file system concepts like files and directories are handled. The VFS provides an interface that file systems like ext4, XFS, and NFS implement to register themselves with the kernel and participate in file system operations. The struct file_operations interface is the most interesting for device drivers as we are about to see.

System calls like open(2), read(2), etc are handled by the VFS and dispatched to the appropriate struct file_operations functions.

Block device nodes like /dev/sda are implemented in fs/block_dev.c, which forms a bridge between the VFS and the Linux block layer. The block layer handles the actual I/O requests and is aware of disk-specific information like capacity and block size.

The main VFS concept that device drivers need to be aware of is struct block_device_operations and the struct block_device instances that represent block devices in Linux. A struct block_device connects the VFS inode and struct file_operations interface with the block layer struct gendisk and struct request_queue.

In Linux there are separate device nodes for the whole device (/dev/sda) and its partitions (/dev/sda1, /dev/sda2, etc). This is handled by struct block_device so that a partition has a pointer to its parent in bd_contains.

The block layer

The block layer handles I/O request queues, disk partitions, and other disk-specific functionality. Each disk is represented by a struct gendisk and may have multiple struct hd_struct partitions. There is always part0, a special "partition" covering the entire block device.

I/O requests are placed into queues for processing. Requests can be merged and scheduled by the block layer. Ultimately a device driver receives a request for submission to the physical device. Queues are represented by struct request_queue.

The device driver

The disk device driver registers a struct gendisk with the block layer and sets up the struct request_queue to receive requests that need to be submitted to the physical device.

There is one struct gendisk for the entire device even though userspace may open struct block_device instances for multiple partitions on the disk. Disk partitions are not visible at the driver level because I/O requests have already had their Logical Block Address (LBA) adjusted with the partition start offset.
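A sketch of that remapping: the block layer translates a partition-relative LBA into a whole-disk LBA before the request reaches the driver (the partition start offsets here are illustrative, not from a real disk):

```python
# Partition start offsets in sectors, as recorded in the partition table
# (illustrative values).
partition_start = {"sda1": 2048, "sda2": 1050624}

def to_whole_disk_lba(partition: str, lba: int) -> int:
    """Translate a partition-relative LBA to a whole-disk LBA, as the block
    layer does before handing the request to the device driver."""
    return partition_start[partition] + lba

# A read at LBA 100 of /dev/sda2 reaches the driver as LBA 1050724.
assert to_whole_disk_lba("sda2", 100) == 1050724
```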

How it all fits together

The VFS is aware of the block layer struct gendisk. The device driver is aware of both the block layer and the VFS struct block_device. The block layer does not have direct connections to the other components but the device driver provides callbacks.

One of the interesting aspects is that a device driver may drop its reference to struct gendisk but struct block_device instances may still have their references. In this case no I/O can occur anymore because the driver has stopped the disk and the struct request_queue, but userspace processes can still call into the VFS and struct block_device_operations callbacks in the device driver can still be invoked.

Thinking about this case is why I drew the diagram and ended up writing about this topic!

by Unknown ( at April 30, 2020 04:21 PM

KVM on Z

Redbook on KVM on Z

A new Redbook titled "Virtualization Cookbook for IBM Z Volume 5 KVM" is available now. Among others, it covers tasks such as installation, host configuration and guest deployments for Linux distributions by Red Hat, SUSE and Ubuntu.

by Stefan Raspl ( at April 30, 2020 10:41 AM

April 29, 2020

QEMU project

QEMU version 5.0.0 released

We’d like to announce the availability of the QEMU 5.0.0 release. This release contains 2800+ commits from 232 authors.

You can grab the tarball from our download page. The full list of changes is available in the Wiki.

Highlights include:

  • Support for passing host filesystem directory to guest via virtiofsd
  • Live migration support for external processes running on QEMU D-Bus
  • Support for using memory backends for main/”built-in” guest RAM
  • block: support for compressed backup images via block jobs
  • block: qemu-img: ‘measure’ command now supports LUKS images, ‘convert’ command now supports skipping zero’ing of target image
  • block: experimental support for qemu-storage-daemon, which provides access to QEMU block-layer/QMP features like block jobs or built-in NBD server without starting a full VM
  • ARM: support for the following architecture features: ARMv8.1 VHE/VMID16/PAN/PMU ARMv8.2 UAO/DCPoP/ATS1E1/TTCNP ARMv8.3 RCPC/CCIDX ARMv8.4 PMU/RCPC
  • ARM: support for Cortex-M7 CPU
  • ARM: new board support for tacoma-bmc, Netduino Plus 2, and Orangepi PC
  • ARM: ‘virt’ machine now supports vTPM and virtio-iommu devices
  • HPPA: graphical console support via HP Artist graphics device
  • MIPS: support for GINVT (global TLB invalidation) instruction
  • PowerPC: ‘pseries’ machine no longer requires reboot to negotiate between XIVE/XICS interrupt controllers when ic-mode=dual
  • PowerPC: ‘powernv’ machine can now emulate KVM hardware acceleration to run KVM guests while in TCG mode
  • PowerPC: support for file-backed NVDIMMs for persistent memory emulation
  • RISC-V: ‘virt’ and ‘sifive_u’ boards now support generic syscon drivers in Linux to control power/reboot
  • RISC-V: ‘virt’ board now supports Goldfish RTC
  • RISC-V: experimental support for v0.5 of draft hypervisor extension
  • s390: support for Adapter Interrupt Suppression while running in KVM mode
  • and lots more…

Thank you to everyone involved!

April 29, 2020 06:00 PM

KVM on Z

RHEL 7 Structure A (KVM host support) Support Lifecycle Extended

Red Hat has updated the Red Hat Enterprise Linux Life Cycle, extending the full support lifecycle for Red Hat Enterprise Linux Structure A on IBM Z to May 31, 2021. See here for details, and here for an entry on the Red Hat Blog, referring to Structure A as "alt packages".
The Structure A release provides updated kernel, QEMU and libvirt packages to run KVM on IBM Z.

by Stefan Raspl ( at April 29, 2020 02:40 PM

April 27, 2020

KVM on Z

Ubuntu 20.04 released

Ubuntu Server 20.04 is out!
It ships with support for Secure Execution, as announced here.
For a detailed list of KVM on Z changes, see the release notes here.

by Stefan Raspl ( at April 27, 2020 01:55 PM

April 24, 2020

ARM Datacenter Project

NUMA balancing

NUMA balancing impact on common benchmarks

NUMA balancing can lead to performance degradation on NUMA-based arm64 systems when tasks migrate and their memory accesses then suffer additional latency.
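As context for the ON/OFF configurations below, automatic NUMA balancing is controlled through a sysctl; a sketch of toggling it (run as root; the path is the standard Linux interface):

```shell
# Check whether automatic NUMA balancing is currently enabled (1) or disabled (0).
cat /proc/sys/kernel/numa_balancing

# Disable it before a "numa_balancing-OFF" benchmark run...
sysctl -w kernel.numa_balancing=0

# ...and re-enable it for the "numa_balancing-ON" runs.
sysctl -w kernel.numa_balancing=1
```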


System Information

Architecture:       aarch64
Processor version:  Kunpeng 920-6426
CPUs:               128
NUMA nodes:         4
Kernel release:     5.6.0+
Node name:          ARMv2-3

Test results


perf bench -f simple sched pipe  
Test Result
numa_balancing-ON 10.012 (usecs/op)
numa_balancing-OFF 10.509 (usecs/op)


perf bench -f simple sched messaging -l 10000  
Test Result
numa_balancing-ON 6.417 (Sec)
numa_balancing-OFF 6.494 (Sec)


perf bench -f simple  mem memset -s 4GB -l 5 -f default  
Test Result
numa_balancing-ON 17.438783330964565 (GB/sec)
numa_balancing-OFF 17.63163114627642 (GB/sec)


perf bench -f simple futex wake -s -t 1024 -w 1  
Test Result
numa_balancing-ON 9.2742 (ms)
numa_balancing-OFF 9.2178 (ms)


sysbench cpu --time=10 --threads=64 --cpu-max-prime=10000 run  
Test Result
numa_balancing-ON 214960.28 (Events/sec)
numa_balancing-OFF 214965.55 (Events/sec)


sysbench memory --memory-access-mode=rnd --threads=64 run  
Test Result
numa_balancing-ON 1645 (MB/s)
numa_balancing-OFF 1959 (MB/s)


sysbench threads --threads=64 run  
Test Result
numa_balancing-ON 4604 (Events/sec)
numa_balancing-OFF 5390 (Events/sec)


sysbench mutex --mutex-num=1 --threads=512 run  
Test Result
numa_balancing-ON 33.2165 (Sec)
numa_balancing-OFF 32.1088 (Sec)

by Peter at April 24, 2020 11:04 PM

April 23, 2020

ARM Datacenter Project

LISA-QEMU Presentation

We recently gave a presentation on LISA-QEMU to the Linaro Toolchain Working Group.

This presentation highlights our work on LISA-QEMU and provides all the details on what LISA-QEMU is, why we established this project, and how to get up and running creating VMs with the tools we developed.

Please visit the links below to view the presentation or meeting recording.

by Rob Foley at April 23, 2020 03:27 PM

April 22, 2020

ARM Datacenter Project

How to debug the kernel using QEMU and an aarch64 VM

QEMU is a great tool to use when needing to debug the kernel.
There are many recipes online for this too, I have listed a few helpful ones at the end of the article for reference.

We would like to share our steps for debugging the kernel, focused on aarch64 systems, as some of the steps might be slightly different for this type of system.

First, create a directory to work in and run these commands to create the flash images:

dd if=/dev/zero of=flash1.img bs=1M count=64
dd if=/dev/zero of=flash0.img bs=1M count=64
dd if=/usr/share/qemu-efi-aarch64/QEMU_EFI.fd of=flash0.img conv=notrunc

Next, download a QEMU image. We will use an Ubuntu image that we previously created.

We should mention that our procedure involves building our own kernel from scratch, and feeding this image to QEMU.

Thus the first step is to actually create a QEMU image. We will assume you already have an image to use. If not, check out our articles on:

We prefer the first procedure using LISA-QEMU since we also have a helpful script to install your kernel into the VM image automatically.

But don’t worry, if you want to take a different route we will show all the steps for that too!

Installing Kernel

You have a few options here. One is to boot the image and install the kernel manually; another is to use the LISA-QEMU scripts to install it. The below command will boot the image in case you want to use the manual approach: boot the image, scp in the kernel package (maybe a .deb file), and install it manually with dpkg -i <package>.deb.

qemu/build/aarch64-softmmu/qemu-system-aarch64 -nographic \
    -machine virt,gic-version=max -m 2G -cpu max \
    -netdev user,id=vnet,hostfwd=: \
    -device virtio-net-pci,netdev=vnet \
    -drive file=./mini_ubuntu.img,if=none,id=drive0,cache=writeback \
    -device virtio-blk,drive=drive0,bootindex=0 \
    -drive file=./flash0.img,format=raw,if=pflash \
    -drive file=./flash1.img,format=raw,if=pflash -smp 4

To bring up QEMU with a kernel, typically you will need a kernel image (that you built), an initrd image (built after installing the kernel in your image), and the OS image (created above).

Keep in mind the below steps assume a raw image. If you have a qcow2, then use qemu-img to convert it to raw first. For example:

qemu-img convert -O raw my_image.qcow2 my_image_output.raw

Below is how to mount an image to copy out files. You need to copy out the initrd in this case.

$ mkdir mnt
$ sudo losetup -f -P ubuntu.img
$ sudo losetup -l
/dev/loop0         0      0         0  0 ubuntu.img   0     512
$ sudo mount /dev/loop0p2 ./mnt
$ ls ./mnt/boot
config-4.15.0-88-generic  grub                          initrd.img-5.5.11             vmlinuz-5.5.11
config-5.5.11             initrd.img                    initrd.img.old                vmlinuz                    vmlinuz.old
efi                       initrd.img-4.15.0-88-generic  vmlinuz-4.15.0-88-generic
$ cp ./mnt/initrd.img-5.5.11 .
$ sudo umount ./mnt
$ sudo losetup -d /dev/loop0

Next, boot the kernel you built with your initrd. Note the kernel you built can be found at arch/arm64/boot/Image.

This command line will bring up your kernel image with your initrd and your OS Image.

One item you might need to customize is the "root=/dev/vda2" argument. This tells the kernel where to find your root partition. This might vary depending on your VM image.

qemu/build/aarch64-softmmu/qemu-system-aarch64 -nographic\
                  -machine virt,gic-version=max -m 2G -cpu max\
                  -netdev user,id=vnet,hostfwd=:\
                  -device virtio-net-pci,netdev=vnet\
                  -drive file=./mini_ubuntu.img,if=none,id=drive0,cache=writeback\
                  -device virtio-blk,drive=drive0,bootindex=0\
                  -drive file=./flash0.img,format=raw,if=pflash\
                  -drive file=./flash1.img,format=raw,if=pflash -smp 4\
                  -kernel ./linux/arch/arm64/boot/Image\
                  -append "root=/dev/vda2 nokaslr console=ttyAMA0"\
                  -initrd ./initrd.img-5.5.11 -s -S

-s tells QEMU to listen for a gdb connection on TCP port 1234.
-S will pause at startup, waiting for the debugger to attach.

Before we get started debugging, update your ~/.gdbinit with the following:

add-auto-load-safe-path linux-5.5.11/scripts/gdb/

In another window, start the debugger. Note, if you are on an x86 host debugging aarch64, then you need to use gdb-multiarch (sudo apt-get install gdb-multiarch). In our case below we are on an aarch64 host, so we just use gdb.

It's very important that we receive the "done" message below, indicating symbols were loaded successfully; otherwise the following steps will not work.

$ gdb linux-5.5.11/vmlinux
GNU gdb (Ubuntu 8.1-0ubuntu3.2)
Reading symbols from linux-5.5.11/vmlinux...done.

Attach the debugger to the kernel. Remember the -s argument above? It told QEMU to use port :1234. We will connect to it now.

(gdb) target remote localhost:1234
Remote debugging using localhost:1234
0x0000000000000000 in ?? ()

That's it. The debugger is connected.

Now let’s test out the setup.
Add a breakpoint in the kernel as a test.

(gdb) hbreak start_kernel
Hardware assisted breakpoint 1 at 0xffff800011330cdc: file init/main.c, line 577.
(gdb) c

Thread 1 hit Breakpoint 1, start_kernel () at init/main.c:577
577 {
(gdb) l
572 {
573     rest_init();
574 }
576 asmlinkage __visible void __init start_kernel(void)
577 {
578     char *command_line;
579     char *after_dashes;
581     set_task_stack_end_magic(&init_task);

We hit the breakpoint!

Remember above that we used the -S option to QEMU? This told QEMU to wait to start running the image until we connected the debugger. Thus once we hit continue, QEMU actually starts booting the kernel.


by Rob Foley at April 22, 2020 10:51 AM

April 20, 2020

Stefan Hajnoczi

virtio-fs has landed in QEMU 5.0!

The virtio-fs shared host->guest file system has landed in QEMU 5.0! It consists of two parts: the QEMU -device vhost-user-fs-pci and the actual file server called virtiofsd. Guests need to have a virtio-fs driver in order to access shared file systems. In Linux the driver is called virtiofs.ko and has been upstream since Linux v5.4.

Using virtio-fs

Thanks to libvirt virtio-fs support, it's possible to share directory trees from the host with the guest like this:

<filesystem type='mount' accessmode='passthrough'>
  <driver type='virtiofs'/>
  <binary xattr='on'>
    <lock posix='on' flock='on'/>
  </binary>
  <source dir='/path/on/host'/>
  <target dir='mount_tag'/>
</filesystem>
The host /path/on/host directory tree can be mounted inside the guest like this:

# mount -t virtiofs mount_tag /mnt

Applications inside the guest can then access the files as if they were local files. For more information about virtio-fs, see the project website.
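For setups without libvirt, a roughly equivalent configuration can be sketched directly on the QEMU command line. virtio-fs requires guest RAM to be shared with the virtiofsd process, hence the memfd memory backend; the socket path, shared directory, and sizes below are placeholders:

```shell
# Start the file server for the directory to be shared (placeholder paths).
virtiofsd --socket-path=/tmp/vhostqemu -o source=/path/on/host &

# Launch the guest with a vhost-user-fs device backed by that socket.
qemu-system-x86_64 \
    -m 4G \
    -object memory-backend-memfd,id=mem,size=4G,share=on \
    -numa node,memdev=mem \
    -chardev socket,id=char0,path=/tmp/vhostqemu \
    -device vhost-user-fs-pci,chardev=char0,tag=mount_tag \
    ... # remaining machine options
```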

How it works

For the most part, -device vhost-user-fs-pci just facilitates the connection to virtiofsd where the real work happens. When guests submit file system requests they are handled directly by the virtiofsd process on the host and don't need to go via the QEMU process.

virtiofsd is a FUSE file system daemon with virtio-fs extensions. virtio-fs is built on top of the FUSE protocol and therefore supports the POSIX file system semantics that applications expect from a native Linux file system. The Linux guest driver shares a lot of code with the traditional fuse.ko kernel module.

Resources on virtio-fs

I have given a few presentations on virtio-fs:

Future features

A key feature of virtio-fs is the ability to directly access the host page cache, eliminating the need to copy file contents into guest RAM. This so-called DAX support is not upstream yet.

Live migration is not yet implemented. It is a little challenging to transfer all file system state to the destination host and seamlessly continue file system operation without remounting, but it should be doable.

There is a Rust implementation of virtiofsd that is close to reaching maturity and will replace the C implementation. The advantage is that Rust has better memory and thread safety than C so entire classes of bugs can be eliminated. Also, the codebase is written from scratch whereas the C implementation was a combination of several existing pieces of software that were not designed together.

by Unknown ( at April 20, 2020 02:23 PM

April 16, 2020

KVM on Z

Secure Execution for IBM z15 arriving with New Models

IBM announced the latest additions to its IBM z15 series:
  • IBM z15 Model T02
  • IBM LinuxONE III Model LT2
A substantial part of the announcement is a new feature called Secure Execution. For a brief overview, see here. Secure Execution will become available in the following Linux distributions as announced by the respective distribution partners:
We will publish more details on Secure Execution later. The impatient with an interest in lower level technical details might want to check out the presentations here and here.

by Stefan Raspl ( at April 16, 2020 09:14 AM

April 08, 2020

Cornelia Huck

s390x changes in QEMU 5.0

QEMU is currently in hardfreeze, with the 5.0 release expected at the end of the month. Here's a quick list of some notable s390x changes.

  • You can finally enable Adapter Interrupt Suppression in the cpu model (ais=on) when running under KVM. This had been working under TCG for some time now, but KVM was missing an interface that was provided later -- and we finally actually check for that interface in QEMU. This is mostly interesting for PCI.
  • QEMU had been silently fixing odd memory sizes to something that can be reported via SCLP for some time. Silently changing user input is probably not such a good idea; compat machines will continue to do so to enable migration from old QEMUs for machines with odd sizes, but will print a warning now. If you have such an old machine (and you can modify it), it might be a good idea to either specify the memory size it gets rounded to or to switch to the 5.0 machine type, where memory sizes can be more fine-grained due to the removal of support for memory hotplug. We may want to get rid of the code doing the fixup at some time in the future.
  • QEMU now properly performs the whole set of initial, clear, and normal cpu reset.
  • And the usual fixes, cleanups, and improvements.
For 5.1, expect more changes; support for protected virtualization will be a big item.

by Cornelia Huck ( at April 08, 2020 01:58 PM

ARM Datacenter Project

How to easily install the kernel in a VM

This article is a follow-up to an earlier article we wrote Introducing LISA-QEMU.

This article will outline the steps to install a kernel into a VM using some scripts we developed. In our case we have an x86_64 host and a aarch64 VM.

We will assume you have cloned the LISA-QEMU repository already. As part of the LISA-QEMU integration we have added a script to automate the process of installing a kernel into a VM. The scripts we talk about below can be found in the LISA-QEMU GitHub repository.

git clone
cd lisa-qemu
git submodule update --init --recursive

We also assume you have built the kernel .deb install package. We covered the detailed steps in our README. You can also find needed dependencies for this article at that link.

You can use this script to generate a new image with the kernel of your choice installed.
Assuming you have a VM image created following the steps in this post, just launch a command like the one below to install your kernel.

$ sudo python3 scripts/ --kernel_pkg ../linux/linux-image-5.5.11_5.5.11-1_arm64.deb 
scripts/ image: build/VM-ubuntu.aarch64/ubuntu.aarch64.img
scripts/ kernel_pkg: ../linux/linux-image-5.5.11_5.5.11-1_arm64.deb

Install kernel successful.
Image path: /home/rob/qemu/lisa-qemu/build/VM-ubuntu.aarch64/ubuntu.aarch64.img.kernel-5.5.11-1

To start this image run this command:
python3 /home/rob/qemu/lisa-qemu/scripts/ -p /home/rob/qemu/lisa-qemu/build/VM-ubuntu.aarch64/ubuntu.aarch64.img.kernel-5.5.11-1

We need to use sudo for these commands since mounting images requires root privileges.

Note that the argument is:
-p or --kernel_pkg: the .deb kernel package to install

Also note that the last lines in the output show the command to issue to bring this image up.

To start this image run this command:
python3 /home/rob/qemu/lisa-qemu/scripts/ -p /home/rob/qemu/lisa-qemu/build/VM-ubuntu.aarch64/ubuntu.aarch64.img.kernel-5.5.11-1

You might wonder where we got the VM image from.
It was found in a default location after running our script. See this post for more details.

If you want to supply your own image, we have an argument for that. :)
--image: the VM image to start from

When supplying the image, the command line might look like the below.

sudo python3 scripts/ --kernel_pkg ../linux/linux-image-5.5.11_5.5.11-1_arm64.deb --image build/VM-ubuntu.aarch64/ubuntu.aarch64.img

There are a few options for installing the kernel.

By default the script will attempt to install your kernel using a chroot environment. This is done for speed more than anything else, since in our case it is faster to use the chroot than to bring up the emulated aarch64 VM and install the kernel.

We also support the --vm option, which will bring up the VM with QEMU and then install the kernel into it. If you run into issues with the chroot environment install, this is a good alternative.

An example of the VM install method:

sudo python3 scripts/ --vm --kernel_pkg ../linux/linux-image-5.5.11_5.5.11-1_arm64.deb

Thanks for taking the time to learn more about our work on LISA-QEMU!

by Rob Foley at April 08, 2020 11:50 AM

April 02, 2020

ARM Datacenter Project


This article is a follow-up to an earlier article we wrote Introducing LISA-QEMU.

LISA-QEMU provides an integration which allows LISA to work with QEMU VMs. LISA's goal is to help Linux kernel developers to measure the impact of modifications in core parts of the kernel.1 Integration with QEMU will allow developers to test a wide variety of hardware configurations, including the ARM architecture and complex NUMA topologies.

This demo will walk through all the steps needed to build and bring up an aarch64 VM on an x86 platform. Future articles will work through reconfiguring the hardware for these VMs, inserting a new kernel into these VMs, and more!

The first step is to get your Linux machine ready to run LISA-QEMU. In this step we will download all the dependencies needed. We assume Ubuntu in the below steps.

apt-get build-dep -y qemu
apt-get install -y python3-yaml wget git qemu-efi-aarch64 qemu-utils genisoimage qemu-user-static git

Now that we have the correct dependencies, let’s download the LISA-QEMU code.

git clone
cd lisa-qemu
git submodule update --init --progress --recursive

One note on the above: if you do not plan to use LISA, you can leave off the --recursive and the update will be much quicker.

The next step is to build a new VM. The build command below takes all the defaults. If you want to learn more about the possible options, take a look at --help.

$ time python3 scripts/  --help
usage: [-h] [--debug] [--dry_run] [--ssh]
                      [--image_type IMAGE_TYPE] [--image_path IMAGE_PATH]
                      [--config CONFIG] [--skip_qemu_build]

Build the qemu VM image for use with lisa.

optional arguments:
  -h, --help            show this help message and exit
  --debug, -D           enable debug output
  --dry_run             for debugging.  Just show commands to issue.
  --ssh                 Launch VM and open an ssh shell.
  --image_type IMAGE_TYPE, -i IMAGE_TYPE
                        Type of image to build.
                        From external/qemu/tests/vm.
                        default is ubuntu.aarch64
  --image_path IMAGE_PATH, -p IMAGE_PATH
                        Allows overriding path to image.
  --config CONFIG, -c CONFIG
                        config file.
                        default is conf/conf_default.yml.
  --skip_qemu_build     For debugging script.

  To select all defaults:
  Or select one or more arguments
    scripts/ -i ubuntu.aarch64 -c conf/conf_default.yml

But we digress… OK, let's build that image:

python3 scripts/

You will see the progress of the build and other steps of the image creation on your screen. If you would like to see more comprehensive output and progress, use the --debug option.

Depending on your system this might take many minutes. Below are some example times.

50 minutes - Intel i7 laptop with 2 cores and 16 GB of memory
6 minutes - Huawei Taishan 2286 V2 with 128 ARM cores and 512 GB of memory

Once the image creation is complete, you will see a message like the following.

Image creation successful.
Image path: /home/lisa-qemu/build/VM-ubuntu.aarch64/ubuntu.aarch64.img

Now that we have an image, we can test it out by bringing up the image and opening an ssh connection to it.

python3 scripts/

The time to bring up the VM will vary based on your machine, but it should come up in about 2-3 minutes on most machines.

You should expect to see the following as the system boots and we open an ssh connection to bring us to the guest prompt.

$ python3 scripts/
Conf:        /home/lisa-qemu/build/VM-ubuntu.aarch64/conf.yml
Image type:  ubuntu.aarch64
Image path:  /home/lisa-qemu/build/VM-ubuntu.aarch64/ubuntu.aarch64.img


Now that the system is up and running, you could, for example, use it for a LISA test.

In our case we issue one command to show that we are in fact running an aarch64 architecture with 8 cores.

qemu@ubuntu-guest:~$ lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  1
Core(s) per socket:  8
Socket(s):           1
NUMA node(s):        1
Vendor ID:           0x00
Model:               0
Stepping:            0x0
BogoMIPS:            125.00
NUMA node0 CPU(s):   0-7
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma sha3 sm3 sm4 asimddp sha512 sve asimdfhm flagm

Once you are done with the VM, you can close it simply by typing "exit" at the command prompt.

qemu@ubuntu-guest:~$ exit
Connection to closed by remote host.

That’s it. The VM was gracefully powered off.

We hope this article was helpful in understanding just how easy it can be to build and launch a VM with LISA-QEMU!

  1. This definition can be found on the LISA github page 

by Rob Foley at April 02, 2020 11:07 AM

April 01, 2020

ARM Datacenter Project

Introducing LISA-QEMU

LISA-QEMU provides an integration which allows LISA to work with QEMU VMs. LISA's goal is to help Linux kernel developers to measure the impact of modifications in core parts of the kernel.1 Integration with QEMU will allow developers to test a wide variety of hardware configurations, including the ARM architecture and complex NUMA topologies.

One of our goals is to allow developers to test the impact of modifications on aarch64 architectures with complex NUMA topologies. Currently we are focusing on testing the kernel CFS scheduler's task placement decision mechanism and its interaction with NUMA_BALANCING.

In order to simplify and streamline the development process we created scripts and configuration files which allow developers to quickly create QEMU VMs with a configurable number of cores and NUMA nodes. We also created a script to install a custom-built kernel on these VMs. Once a VM is configured with the desired topology and kernel version, developers can run interactive and/or automated LISA tests.

Please note that you do not need physical aarch64 hardware. In fact, we have demoed this project on a laptop with a Core i7-7600U CPU with two cores.

Our approach is to contribute improvements in QEMU and LISA back to the mainstream. In our repository we will keep scripts and configurations belonging to the integration between LISA and QEMU.

LISA Overview: The LISA project provides a toolkit that supports regression testing and interactive analysis of Linux kernel behavior. LISA's goal is to help Linux kernel developers measure the impact of modifications in core parts of the kernel. LISA itself runs on a host machine, and uses the devlib toolkit to interact with the target via SSH, ADB or telnet. LISA provides features to describe workloads (notably using rt-app) and run them on targets. It can collect trace files from the target OS (e.g. systrace and ftrace traces) and parse them via the TRAPpy framework, so that detailed target behaviour during the workload's execution can be examined.1

Peter also contributed to this article.

We also have articles on LISA-QEMU:

  1. This definition can be found on the LISA github page 

by Rob Foley at April 01, 2020 08:30 PM

March 25, 2020

Marcin Juszkiewicz

Sharing PCIe cards across architectures

Some days ago, during one of our conference calls, one of my co-workers asked:

Has anyone ever tried PCI forwarding to an ARM VM on an x86 box?

As my machine was already open, I just turned it off and inserted a SATA controller into one of the unused PCI Express slots. After boot I started one of my AArch64 CirrOS VM instances and gave it this card. It worked perfectly:

[   21.603194] pcieport 0000:00:01.0: pciehp: Slot(0): Attention button pressed
[   21.603849] pcieport 0000:00:01.0: pciehp: Slot(0) Powering on due to button press
[   21.604124] pcieport 0000:00:01.0: pciehp: Slot(0): Card present
[   21.604156] pcieport 0000:00:01.0: pciehp: Slot(0): Link Up
[   21.739977] pci 0000:01:00.0: [1b21:0612] type 00 class 0x010601
[   21.740159] pci 0000:01:00.0: reg 0x10: [io  0x0000-0x0007]
[   21.740199] pci 0000:01:00.0: reg 0x14: [io  0x0000-0x0003]
[   21.740235] pci 0000:01:00.0: reg 0x18: [io  0x0000-0x0007]
[   21.740271] pci 0000:01:00.0: reg 0x1c: [io  0x0000-0x0003]
[   21.740306] pci 0000:01:00.0: reg 0x20: [io  0x0000-0x001f]
[   21.740416] pci 0000:01:00.0: reg 0x24: [mem 0x00000000-0x000001ff]
[   21.742660] pci 0000:01:00.0: BAR 5: assigned [mem 0x10000000-0x100001ff]
[   21.742709] pci 0000:01:00.0: BAR 4: assigned [io  0x1000-0x101f]
[   21.742770] pci 0000:01:00.0: BAR 0: assigned [io  0x1020-0x1027]
[   21.742803] pci 0000:01:00.0: BAR 2: assigned [io  0x1028-0x102f]
[   21.742834] pci 0000:01:00.0: BAR 1: assigned [io  0x1030-0x1033]
[   21.742866] pci 0000:01:00.0: BAR 3: assigned [io  0x1034-0x1037]
[   21.742935] pcieport 0000:00:01.0: PCI bridge to [bus 01]
[   21.742961] pcieport 0000:00:01.0:   bridge window [io  0x1000-0x1fff]
[   21.744805] pcieport 0000:00:01.0:   bridge window [mem 0x10000000-0x101fffff]
[   21.745749] pcieport 0000:00:01.0:   bridge window [mem 0x8000000000-0x80001fffff 64bit pref]

Let’s go deeper

The next day I turned off the desktop for a CPU cooler upgrade. During the process I went through my box of expansion cards and plugged in an additional USB 3.0 controller (Renesas based). I also added a SATA hard drive and connected it to the previously added controller.

Once the computer was back online I created a new VM instance, this time using the Fedora 32 beta. But when I tried to add the PCI Express card I got an error:

Error while starting domain: internal error: process exited while connecting to monitor: 2020-03-25T13:43:39.107524Z qemu-system-aarch64: -device vfio-pci,host=0000:29:00.0,id=hostdev0,bus=pci.3,addr=0x0: VFIO_MAP_DMA: -22
2020-03-25T13:43:39.107560Z qemu-system-aarch64: -device vfio-pci,host=0000:29:00.0,id=hostdev0,bus=pci.3,addr=0x0: vfio 0000:29:00.0: failed to setup container for group 28: memory listener initialization failed: Region mach-virt.ram: vfio_dma_map(0x563169753c80, 0x40000000, 0x100000000, 0x7fb2a3e00000) = -22 (Invalid argument)

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/", line 75, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/", line 111, in tmpcb
    callback(*args, **kwargs)
  File "/usr/share/virt-manager/virtManager/object/", line 66, in newfn
    ret = fn(self, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/object/", line 1279, in startup
  File "/usr/lib64/python3.8/site-packages/", line 1234, in create
    if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
libvirt.libvirtError: internal error: process exited while connecting to monitor: 2020-03-25T13:43:39.107524Z qemu-system-aarch64: -device vfio-pci,host=0000:29:00.0,id=hostdev0,bus=pci.3,addr=0x0: VFIO_MAP_DMA: -22
2020-03-25T13:43:39.107560Z qemu-system-aarch64: -device vfio-pci,host=0000:29:00.0,id=hostdev0,bus=pci.3,addr=0x0: vfio 0000:29:00.0: failed to setup container for group 28: memory listener initialization failed: Region mach-virt.ram: vfio_dma_map(0x563169753c80, 0x40000000, 0x100000000, 0x7fb2a3e00000) = -22 (Invalid argument)

Hmm. It worked before. I tried another card — with the same effect.


I went to the #qemu IRC channel and started discussing the issue with QEMU developers. It turned out that probably no one had tried sharing expansion cards with a foreign-architecture guest (in TCG mode instead of same-architecture KVM mode).

As I had a VM instance where sharing the card worked, I started checking what was wrong. After some restarts it was clear that crossing 3054 MB of guest memory was enough to get VFIO errors like the above.


An issue that is not reported does not exist, so I opened a bug against QEMU. I filled it with error messages, "lspci" output for the cards used, the QEMU command line (generated by libvirt), etc.

Looks like the problem lies in architecture differences between x86-64 (host) and aarch64 (guest). Let me quote Alex Williamson:

The issue is that the device needs to be able to DMA into guest RAM, and to do that transparently (ie. the guest doesn’t know it’s being virtualized), we need to map GPAs into the host IOMMU such that the guest interacts with the device in terms of GPAs, the host IOMMU translates that to HPAs. Thus the IOMMU needs to support GPA range of the guest as IOVA. However, there are ranges of IOVA space that the host IOMMU cannot map, for example the MSI range here is handled by the interrupt remapper, not the DMA translation portion of the IOMMU (on physical ARM systems these are one and the same, on x86 they are different components, using different mapping interfaces of the IOMMU). Therefore if the guest programmed the device to perform a DMA to 0xfee00000, the host IOMMU would see that as an MSI, not a DMA. When we do an x86 VM on an x86 host, both the host and the guest have complementary reserved regions, which avoids this issue.

Also, to expand on what I mentioned on IRC, every x86 host is going to have some reserved range below 4G for this purpose, but if the aarch64 VM has no requirements for memory below 4G, the starting GPA for the VM could be at or above 4G and avoid this issue.

I have to admit that this is too low-level for me. I hope that the problem I hit will help someone to improve QEMU.

by Marcin Juszkiewicz at March 25, 2020 05:23 PM

ARM Datacenter Project

Understanding pthread_cond_broadcast

Recently we came across a piece of code in QEMU using the pthread_cond_broadcast function.

This method is intended to wake up all threads waiting on a condition variable. However, it needs to be used with care. In particular, it should only be used if you can guarantee either a) that the waiter is in fact waiting, or b) that there is another mechanism to wake up the waiter if the broadcast signal arrives while the thread is not waiting.

For example, suppose we have the following code:

while (first_cpu->stopped) {
    pthread_cond_wait(first_cpu->halt_cond, first_cpu->lock);
}

/* process any pending work */

Also suppose we have another thread which will call pthread_cond_broadcast() to wake up this thread.

If the above thread is waiting in pthread_cond_wait() when it is woken up by pthread_cond_broadcast(), then all is well.

However, if this thread is outside of the pthread_cond_wait() in the loop when pthread_cond_broadcast() is called, then the wakeup is lost. In other words, when the thread loops around to pthread_cond_wait() it will block again, with nothing left to wake it.

This means that either we need to guarantee the thread is waiting when the broadcast is sent OR we need to make sure that there is another way to wakeup the thread.

One other option is to change the pthread_cond_wait() to a pthread_cond_timedwait() to ensure that we periodically perform this pending work, even if the pthread_cond_broadcast() is missed.

by Rob Foley at March 25, 2020 10:25 AM

March 23, 2020

ARM Datacenter Project

Testing QEMU emulation: how to change QTest Accelerator

How can we change QEMU QTest to use different accelerators? And why would we do this?
This article is a follow-up to a prior article we posted on how to debug QEMU Qtests.

Each QTest will decide which accelerators it uses. For example, the test might try to use ‘kvm’, which causes QEMU to use KVM to execute code. Or the test might try to use ‘TCG’ support, where QEMU will emulate the instructions itself. Regardless of which path is chosen, this choice inevitably results in different code paths getting exercised inside QEMU itself.

In some cases when developing QEMU code, we might want to force certain code paths which are specific to different accelerators. Take, for example, the case where we want to force a specific TCG code path on an aarch64 machine for an aarch64 QTest. We will use the tests/qtest/arm-cpu-features test as an example.

This test selects the specific accelerator(s) to use for each test case. It is possible that we might want to force the use of a specific accelerator to exercise that code path in QEMU — for instance, TCG instead of KVM.

In this case we would need to edit the test, for instance tests/qtest/arm-cpu-features.c, and replace the use of “kvm” with “tcg”, or in cases where both -accel kvm and -accel tcg are used, just remove the kvm.

This will have the effect of forcing the use of a specific code path, which can be very useful when debugging or validating a change.

by Rob Foley at March 23, 2020 01:25 PM

March 20, 2020

ARM Datacenter Project

Testing QEMU emulation: how to debug QTest with gdb

QEMU has several different kinds of tests for exercising different aspects of the code. A page here has good details on these tests. Another very useful document is testing.rst. Within this document it states that “QTest is a device emulation testing framework. It can be very useful to test device models; it could also control certain aspects of QEMU (such as virtual clock stepping), with a special purpose “qtest” protocol.”

We can run the QTests as part of either make check or make check-qtest.

If we run the QTests with the below command, it shows us more information about the commands it is executing.

make check-qtest V=1

For example, you might see something like this displayed.

$ QTEST_QEMU_BINARY=aarch64-softmmu/qemu-system-aarch64 QTEST_QEMU_IMG=qemu-img tests/qtest/tpm-tis-device-test -m=quick -k --tap < /dev/null | ./scripts/ --test-name="tpm-tis-device-test"
PASS 1 tpm-tis-device-test /aarch64/tpm-tis/test_check_localities
PASS 2 tpm-tis-device-test /aarch64/tpm-tis/test_check_access_reg
PASS 3 tpm-tis-device-test /aarch64/tpm-tis/test_check_access_reg_seize
PASS 4 tpm-tis-device-test /aarch64/tpm-tis/test_check_access_reg_release
PASS 5 tpm-tis-device-test /aarch64/tpm-tis/test_check_transmit

Let’s break this down a bit.

QTEST_QEMU_BINARY - This is the command that gets issued when starting QEMU. This is useful since we can add onto it if we would like to run QEMU within another command (like a debugger).

tests/qtest/tpm-tis-device-test -m=quick -k --tap - This is the actual QTest which will get executed. Inside this test it will launch QEMU.

Another useful option is:

--verbose
Once you add that to any qtest command, it would look something like this with more information displayed.

$ QTEST_QEMU_BINARY=aarch64-softmmu/qemu-system-aarch64 QTEST_QEMU_IMG=qemu-img tests/qtest/bios-tables-test -m=quick -k --tap < /dev/null | ./scripts/ --test-name="bios-tables-test"  --show-failures-only --verbose
   random seed: R02S0d429b0279b778325d7c631f360b375b
   Start of aarch64 tests
   Start of acpi tests
   starting QEMU: exec aarch64-softmmu/qemu-system-aarch64 -qtest unix:/tmp/qtest-29244.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-29244.qmp,id=char0 -mon chardev=char0,mode=control -display none -machine virt  -accel tcg -nodefaults -nographic -drive if=pflash,format=raw,file=pc-bios/edk2-aarch64-code.fd,readonly -drive if=pflash,format=raw,file=pc-bios/edk2-arm-vars.fd,snapshot=on -cdrom tests/data/uefi-boot-images/bios-tables-test.aarch64.iso.qcow2 -cpu cortex-a57 -accel qtest
   Start of virt tests
   starting QEMU: exec aarch64-softmmu/qemu-system-aarch64 -qtest unix:/tmp/qtest-29244.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-29244.qmp,id=char0 -mon chardev=char0,mode=control -display none -machine virt  -accel tcg -nodefaults -nographic -drive if=pflash,format=raw,file=pc-bios/edk2-aarch64-code.fd,readonly -drive if=pflash,format=raw,file=pc-bios/edk2-arm-vars.fd,snapshot=on -cdrom tests/data/uefi-boot-images/bios-tables-test.aarch64.iso.qcow2  -cpu cortex-a57 -object memory-backend-ram,id=ram0,size=128M -numa node,memdev=ram0 -accel qtest
   starting QEMU: exec aarch64-softmmu/qemu-system-aarch64 -qtest unix:/tmp/qtest-29244.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-29244.qmp,id=char0 -mon chardev=char0,mode=control -display none -machine virt  -accel tcg -nodefaults -nographic -drive if=pflash,format=raw,file=pc-bios/edk2-aarch64-code.fd,readonly -drive if=pflash,format=raw,file=pc-bios/edk2-arm-vars.fd,snapshot=on -cdrom tests/data/uefi-boot-images/bios-tables-test.aarch64.iso.qcow2  -cpu cortex-a57 -m 256M,slots=3,maxmem=1G -object memory-backend-ram,id=ram0,size=128M -object memory-backend-ram,id=ram1,size=128M -numa node,memdev=ram0 -numa node,memdev=ram1 -numa dist,src=0,dst=1,val=21 -accel qtest
   End of virt tests
   End of acpi tests
   End of aarch64 tests

We can launch the QTest from the debugger with something like this.

QTEST_QEMU_BINARY=aarch64-softmmu/qemu-system-aarch64 QTEST_QEMU_IMG=qemu-img gdb --args tests/qtest/bios-tables-test -m=quick -k --tap

But what if we want to debug QEMU itself with gdb?

To achieve this we would change the QTEST_QEMU_BINARY to something like this:

QTEST_QEMU_BINARY="sudo xterm -e gdb --tty $(tty) --args aarch64-softmmu/qemu-system-aarch64"

At least on our system we found that this only works by using sudo in front of xterm. Otherwise we found that the debugger actually seems to crash and exit with error. :(

When you launch the test, an xterm will pop up with a window. Inside that window, just hit r to run the test. Also keep in mind that you need to have the DISPLAY environment variable set for xterm to pop up.

export DISPLAY=12.345.67.89:0

When you use the verbose option you get to see the actual QEMU command used.

It might look something like this:

 starting QEMU: exec i386-softmmu/qemu-system-i386 -qtest unix:/tmp/qtest-33869.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-33869.qmp,id=char0 -mon chardev=char0,mode=control -display none -machine q35,kernel-irqchip=off -accel kvm -accel tcg -net none -display none -device pci-bridge,chassis_nr=1 -drive id=hd0,if=none,file=tests/acpi-test-disk-X74eKE,format=raw -device ide-hd,drive=hd0  -accel qtest

We also posted a follow-up article on how to change QEMU Qtest accelerators.

by Rob Foley at March 20, 2020 08:28 PM

March 12, 2020

ARM Datacenter Project

Testing QEMU emulation: how to debug kvm-unit-tests

kvm-unit-tests are a set of low level tests designed to exercise KVM. These tests are completely separate from QEMU (a different repo), but these tests can also be used to exercise QEMU.

Testing QEMU with kvm-unit-tests is the use case we will explore below.
A comprehensive description of kvm-unit-tests is outside the scope of this blog post, but check out this description for more details. It is worth noting that these tests launch QEMU using their test as the -kernel argument, and thus debugging the test can be viewed as similar to debugging a kernel under QEMU.

To run the tests with QEMU use a command similar to the below.
The -v option shows the commands it is issuing, which we will need for the next step.
The QEMU= is needed to point it to the QEMU you want to test, and the ACCEL=tcg is needed since we are testing with QEMU (instead of kvm).
There are more details on running the tests in the kvm-unit-tests README.

QEMU=../../qemu/build/aarch64-softmmu/qemu-system-aarch64 ACCEL=tcg ./ -v

When a test fails, the log directory (./logs) contains the log files for all tests. A failure might look like this.

$ TIMEOUT=0 QEMU=../../qemu/build/aarch64-softmmu/qemu-system-aarch64 ACCEL=tcg ./ -v
TESTNAME=gicv2-mmio TIMEOUT=0 ACCEL=tcg ./arm/run arm/gic.flat -smp $((($MAX_SMP < 8)?$MAX_SMP:8)) -machine gic-version=2 -append 'mmio'
PASS gicv2-mmio (17 tests, 1 skipped)
TESTNAME=gicv2-mmio-up TIMEOUT=0 ACCEL=tcg ./arm/run arm/gic.flat -smp 1 -machine gic-version=2 -append 'mmio'
FAIL gicv2-mmio-up (17 tests, 2 unexpected failures)
TESTNAME=gicv2-mmio-3p TIMEOUT=0 ACCEL=tcg ./arm/run arm/gic.flat -smp $((($MAX_SMP < 3)?$MAX_SMP:3)) -machine gic-version=2 -append 'mmio'
FAIL gicv2-mmio-3p (17 tests, 3 unexpected failures)

The first line of the log file has the command needed to run this test standalone.

../../qemu/build/aarch64-softmmu/qemu-system-aarch64 -nodefaults -machine virt,accel=tcg -cpu cortex-a57 -device virtio-serial-device -device virtconsole,chardev=ctd -chardev testdev,id=ctd -device pci-testdev -display none -serial stdio -kernel arm/gic.flat -smp 1 -machine gic-version=2 -append mmio # -initrd /tmp/tmp.RrDNvP8sPT

In order to attach the debugger we want to add two arguments to the above command line.
-s tells QEMU to accept a gdb connection on TCP port 1234.
-S will pause at startup, waiting for the debugger to attach.

../../qemu/build/aarch64-softmmu/qemu-system-aarch64 -s -S -nodefaults -machine virt,accel=tcg -cpu cortex-a57 -device virtio-serial-device -device virtconsole,chardev=ctd -chardev testdev,id=ctd -device pci-testdev -display none -serial stdio -kernel arm/gic.flat -smp 1 -machine gic-version=2 -append mmio # -initrd /tmp/tmp.RrDNvP8sPT

Then in another console, we will run gdb.

It is worth noting that in this case we are debugging an aarch64 QEMU on an x86_64 host, so we will need to use gdb-multiarch, since it can do cross-debugging.

$ gdb-multiarch
(gdb) set arch aarch64
The target architecture is assumed to be aarch64
(gdb) file ./arm/gic.elf
Reading symbols from ./arm/gic.elf...(no debugging symbols found)...done.

Note how it says no debugging symbols found above. That indicates that the .elf file is missing symbols. The .elf file is essentially the built “kernel” that we are going to give to QEMU. There is also a similar .flat file; it is the same as the .elf, just stripped of any unnecessary sections.

In order to get the symbols built into the .elf file, we need to modify the Makefile.

The issue is that when we link the .elf file, it excludes the symbols. Because of the special way we are constructing the test, essentially as a kernel, kvm-unit-tests uses a custom linker script. The script is arm/

There is a section of the where we tell the linker which sections of the objects to set aside in the resulting binary.

    /DISCARD/ : {

We will need to remove the “*(.debug)” line so that the symbol table is not excluded (!excluded == included).

This section will then look like the below.

    /DISCARD/ : {

Once we re-build the binaries, we can resume our debugging process where we left off launching qemu in one console and then launching gdb-multiarch in a different console.

$ gdb-multiarch
(gdb) set arch aarch64
The target architecture is assumed to be aarch64
(gdb) file ./arm/gic.elf
Reading symbols from ./arm/gic.elf...done.

Note how the symbols were read successfully!

But we’re not out of the woods yet. To get the symbols loaded properly at the correct address, we need to first find the location where our code is loaded.

We will first attach to the target. Note that we use :1234; this is the port that was specified by the -s option to QEMU.

(gdb) target remote :1234
Remote debugging using :1234
warning: No executable has been specified and target does not support
determining executable automatically.  Try using the "file" command.
0x0000000040000000 in ?? ()

We then disassemble the first snippet of code. We disassemble using the address which is displayed after we attach to the target: 0x0000000040000000

(gdb) disassemble 0x0000000040000000,+100
Dump of assembler code from 0x40000000 to 0x40000064:
=> 0x0000000040000000:  ldr x0, 0x40000018
   0x0000000040000004:  mov x1, xzr
   0x0000000040000008:  mov x2, xzr
   0x000000004000000c:  mov x3, xzr
   0x0000000040000010:  ldr x4, 0x40000020
   0x0000000040000014:  br  x4
   0x0000000040000018:  .inst   0x44000000 ; undefined
   0x000000004000001c:  .inst   0x00000000 ; undefined
   0x0000000040000020:  .inst   0x40080000 ; undefined

If you follow the assembly above, you will see that we load in an address to jump to: 0x40080000

This happens to be the address where we loaded our code.

In order to load the symbols we need to determine the current location of our .text and .data segments.

“info files” will list the current offsets for our gic.elf binary.

(gdb) info files
Symbols from "/home/rob/qemu/alex-kvm-unit-tests/build/arm/gic.elf".
Local exec file:
    `/home/rob/qemu/alex-kvm-unit-tests/build/arm/gic.elf', file type elf64-littleaarch64.
    Entry point: 0x0
    0x0000000000000000 - 0x000000000000ef84 is .text
    0x0000000000010720 - 0x0000000000010768 is .dynsym
    0x0000000000010768 - 0x0000000000010769 is .dynstr
    0x0000000000010770 - 0x0000000000010788 is .hash
    0x0000000000010788 - 0x00000000000107a4 is .gnu.hash
    0x00000000000107a8 - 0x00000000000107c8 is .got
    0x00000000000107c8 - 0x0000000000012308 is .rodata
    0x0000000000012308 - 0x0000000000013670 is .data
    0x0000000000013670 - 0x0000000000022798 is .bss
    0x0000000000010000 - 0x0000000000010720 is .rela.dyn

Unload the current symbols with the “file” command (no argument). Note that we need to answer ‘y’ below.

(gdb) file
No executable file now.
Discard symbol table from `/home/rob/qemu/alex-kvm-unit-tests/build/arm/gic.elf'? (y or n) y
No symbol file now.

Find the current .text and .data addresses. Since we know our code was loaded at the address we found above (0x40080000), we merely need to add the offsets from the table to the load address.

.text is at offset 0 in the table, so its address is 0x40080000.
.data is at offset 0x12308, so its address is 0x40080000 + 0x12308 = 0x40092308.

The add-symbol-file command will map the symbols properly. Note again we need to answer ‘y’ below.

(gdb) add-symbol-file ./arm/gic.elf 0x40080000 -s .data 0x40092308
add symbol table from file "./arm/gic.elf" at
    .text_addr = 0x40080000
    .data_addr = 0x40092308
(y or n) y

That’s it! The symbols are loaded.

We can test this by setting a breakpoint or listing code.

Reading symbols from ./arm/gic.elf...done.
(gdb) b main
Breakpoint 1 at 0x40081748: file /home/rob/qemu/alex-kvm-unit-tests/arm/gic.c, line 517.
(gdb) l main
511     if (gic_version() == 2)
512         test_targets(nr_irqs);
513 }
515 int main(int argc, char **argv)
516 {
517     if (!gic_init()) {

by Rob Foley at March 12, 2020 08:48 PM

March 10, 2020

ARM Datacenter Project

A Lightweight Thread based storage runtime


Massive concurrency is the primary signature of enterprise storage IO workloads. POSIX threads, a.k.a. pthreads, as implemented in a Unix-like OS, are too “heavy” to effectively support such concurrency.

The overhead incurred by pthreads can be grouped into two main areas:

1) memory cost of maintaining thread computational state, which includes duplicating all registers and data (stack, heap) plus the memory associated with tracking the thread state;

2) context switch cost, which includes the cost of changing processor context (registers, counters, etc.) and changing computational context (interrupts, scheduling, and processor and RAM cache).

These costs put a physical limit on how many pthreads a system can have and dictate how often a context switch may be profitably utilized.

User space lightweight threads (ULTs) are an alternative to traditional threads. ULTs allow a number of logical threads to share a thread context in a single physical pthread, thereby reducing memory consumption. A non-preemptive user space scheduler allows an application to switch between lightweight threads without involving the OS/kernel system scheduler. We can envision a runtime/library which combines the features of a) sharing a single physical thread and b) user space scheduling. This runtime will have the advantage of being more efficient at managing large numbers of short-duration work items. We should mention that we expect these work items to have few external dependencies such as system calls.

Argobots Runtime for Storage (ABT-ST)

There are many lightweight thread implementations; the proprietary Microsoft UMS scheduler and the open source Argobots are a couple of examples. The Argobots Runtime for Storage builds upon the Argobots framework and serves the needs of storage workloads. The main goal is to provide support for some typical storage workload characteristics:

  1. Mapping: A mechanism to evenly distribute I/O load across the available CPUs. When an I/O arrives, it is assigned to a lightweight thread (LWT), which is associated with a single load-group. A load-group has an OS thread that is bound to a single CPU core and is shared by many LWTs. With the non-preemptive/cooperative user-mode scheduler, instructions within a load-group are serialized and do not require a spinlock. This improves thread-level parallelism, which is vital in today’s storage servers with large numbers of cores. It is especially important for ARM servers with hundreds of cores and multiple NUMA domains.

  2. Workload rebalance: When the load among physical CPU cores becomes imbalanced, a rebalancer redistributes LWTs to utilize CPU resources optimally. The rebalancing algorithm also takes the NUMA topology into consideration when deciding how to migrate load-groups to different CPUs (see the rebalance diagram).

  3. Stackable schedulers: Each load-group has its own scheduler, while a group of sequential instruction streams executed on a CPU core can have a main scheduler. This allows different types of load-groups to share a CPU core while keeping their own schedulers. For example, a front-end write-to-cache I/O load-group and a back-end write-to-disk I/O load-group may share a CPU core while keeping their own low-latency and high-throughput schedulers, respectively.
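The rebalancing step can be illustrated with a toy greedy placement: assign each load-group, heaviest first, to the currently least-loaded core. This is our own simplified sketch, not the ABT-ST algorithm, which additionally weighs NUMA distance when migrating load-groups.

```python
def rebalance(group_loads, num_cores):
    """Toy rebalancer: place each load-group (heaviest first) on the
    currently least-loaded core. Returns (placement, per-core load)."""
    cores = [0.0] * num_cores
    placement = {}
    for group, load in sorted(group_loads.items(), key=lambda kv: -kv[1]):
        target = min(range(num_cores), key=lambda c: cores[c])  # least loaded
        placement[group] = target
        cores[target] += load
    return placement, cores

placement, cores = rebalance({"g0": 4, "g1": 3, "g2": 2, "g3": 1}, 2)
```

A real rebalancer would also fold in migration cost: moving a load-group across NUMA domains invalidates caches, so a slightly imbalanced local placement can beat a perfectly balanced remote one.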

The runtime source code is hosted on GitHub. It is a work in progress as of today.

by Li at March 10, 2020 08:41 PM

March 05, 2020

ARM Datacenter Project

QEMU aarch64 ARM VMs

This is a follow-up to a prior post on how to launch ARM aarch64 VMs from scratch.

We are working on QEMU enhancements to support aarch64 ARM VMs inside QEMU’s vm-build infrastructure. This is a bit of test infrastructure which allows for building and testing QEMU source code within various flavors of VMs.

The aarch64 VMs are supported for Ubuntu and CentOS VMs.

Although we are working to get this upstreamed into mainline QEMU, the current WIP for this project is here:

Note that this support is also being used by the lisa-qemu integration in order to enable easier testing and debug of varied hardware architectures.

To try this out you can see the available VMs via:

$ make vm-help

vm-build-ubuntu.aarch64 - Build QEMU in ubuntu aarch64 VM 
vm-build-centos.aarch64 - Build QEMU in CentOS aarch64 VM 
QEMU_CONFIG=/path/conf.yml - Change path to VM configuration .yml file. See config_example.yml for file format details.

To create the VM and build qemu inside it:

$ make vm-build-ubuntu.aarch64

or to make the VM and ssh into it:

$ make vm-boot-ssh-ubuntu.aarch64

Configuration YAML

We also support providing a configuration YAML file. This allows passing specific arguments to QEMU to configure the hardware. For example, in our config_example.yml, we configure a 4 NUMA node topology. To use a specific configuration, just provide the configuration YAML like so:

$ QEMU_CONFIG=../tests/vm/config_default.yml make vm-boot-ssh-ubuntu.aarch64

by Rob Foley at March 05, 2020 12:15 PM

March 04, 2020

ARM Datacenter Project

How to configure NUMA nodes with QEMU

QEMU does allow very flexible configuration of NUMA node topology.

When starting QEMU we can select a NUMA topology with the -numa argument.

In its most basic form, we can specify the CPUs assigned to each NUMA node.

-smp cpus=16 -numa node,cpus=0-3,nodeid=0 \
-numa node,cpus=4-7,nodeid=1 \
-numa node,cpus=8-11,nodeid=2 \
-numa node,cpus=12-15,nodeid=3

This gives us a system that looks like the following from lscpu.

Note below that we specified 4 NUMA nodes with each NUMA node containing 4 CPUs.

qemu@ubuntu:~$ lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              16
On-line CPU(s) list: 0-15
Thread(s) per core:  1
Core(s) per socket:  8
Socket(s):           2
NUMA node(s):        4
Vendor ID:           ARM
Model:               0
Model name:          Cortex-A57
Stepping:            r1p0
BogoMIPS:            125.00
NUMA node0 CPU(s):   0-3
NUMA node1 CPU(s):   4-7
NUMA node2 CPU(s):   8-11
NUMA node3 CPU(s):   12-15
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
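The `-numa node` arguments above follow a regular pattern, so for larger topologies it can be convenient to generate them rather than write them by hand. A small helper along these lines (our own sketch, not part of QEMU) produces the command line used above:

```python
def numa_node_args(num_nodes, cpus_per_node):
    """Generate QEMU -smp/-numa node arguments for an even CPU split
    across num_nodes NUMA nodes."""
    args = [f"-smp cpus={num_nodes * cpus_per_node}"]
    for n in range(num_nodes):
        first = n * cpus_per_node            # first CPU id in this node
        last = first + cpus_per_node - 1     # last CPU id in this node
        args.append(f"-numa node,cpus={first}-{last},nodeid={n}")
    return " ".join(args)

print(numa_node_args(4, 4))
```

Running it with (4, 4) reproduces the 4-node, 16-CPU configuration shown above.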

To go one step further we can also specify the NUMA distance between each CPU.

Suppose we have the following NUMA topology.

 ________      ________
|        |    |        |
| Node 0 |    | Node 1 |
|        |-15-|        |
|________|    |________|
     |    \  /     |
    20     20     20
 ____|___ /  \ ____|___
|        |/   |        |
| Node 2 |    | Node 3 |
|        |-15-|        |
|________|    |________|

We can use the -numa dist option to add on the specific NUMA distances.

-smp cpus=16 -numa node,cpus=0-3,nodeid=0 \
-numa node,cpus=4-7,nodeid=1 \
-numa node,cpus=8-11,nodeid=2 \
-numa node,cpus=12-15,nodeid=3 \
-numa dist,src=0,dst=1,val=15 \
-numa dist,src=2,dst=3,val=15 \
-numa dist,src=0,dst=2,val=20 \
-numa dist,src=0,dst=3,val=20 \
-numa dist,src=1,dst=2,val=20 \
-numa dist,src=1,dst=3,val=20

The numactl command will confirm the distances that we configured in QEMU.

qemu@ubuntu:~$ numactl --hardware
available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3
node 0 size: 983 MB
node 0 free: 852 MB
node 1 cpus: 4 5 6 7
node 1 size: 1007 MB
node 1 free: 923 MB
node 2 cpus: 8 9 10 11
node 2 size: 943 MB
node 2 free: 812 MB
node 3 cpus: 12 13 14 15
node 3 size: 1007 MB
node 3 free: 916 MB
node distances:
node   0   1   2   3 
  0:  10  15  20  20 
  1:  15  10  20  20 
  2:  20  20  10  15 
  3:  20  20  15  10 
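The matrix numactl prints can be reconstructed from the `-numa dist` pairs: distances are symmetric, and a node's distance to itself is 10 by convention (the ACPI SLIT local distance). A quick sketch of that reconstruction:

```python
def distance_matrix(num_nodes, dists, local=10):
    """Build the full NUMA distance matrix from (src, dst, val) triples,
    as passed to QEMU's -numa dist option. Distances are symmetric and
    the diagonal is the local distance (10 by ACPI SLIT convention)."""
    m = [[local if i == j else None for j in range(num_nodes)]
         for i in range(num_nodes)]
    for src, dst, val in dists:
        m[src][dst] = val
        m[dst][src] = val   # QEMU only needs one direction; mirror it
    return m

m = distance_matrix(4, [(0, 1, 15), (2, 3, 15),
                        (0, 2, 20), (0, 3, 20),
                        (1, 2, 20), (1, 3, 20)])
```

The result matches the `node distances` table above row for row.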

by Rob Foley at March 04, 2020 08:32 PM

How to launch ARM aarch64 VM with QEMU from scratch.

The below instructions will allow for bringing up an ARM VM from scratch.

It is worth mentioning that our work includes automating all of the below, resulting in a more streamlined option for building and launching the VM. See the article on QEMU aarch64 VMs for more details.

To launch an aarch64 VM we first need to install a few dependencies, including QEMU and the qemu-efi-aarch64 package, which includes the EFI firmware.

apt-get install qemu-system-arm
apt-get install qemu-efi-aarch64
apt-get install qemu-utils

Create the flash images with the correct sizes.

dd if=/dev/zero of=flash1.img bs=1M count=64
dd if=/dev/zero of=flash0.img bs=1M count=64
dd if=/usr/share/qemu-efi-aarch64/QEMU_EFI.fd of=flash0.img conv=notrunc

Download the image you want to boot.

For our example we use an Ubuntu installer.


Create the empty Ubuntu image file we will install Ubuntu into.

We will use 20 gigabytes for this file.

qemu-img create ubuntu-image.img 20G

Start QEMU with the installer.

qemu-system-aarch64 -nographic -machine virt,gic-version=max -m 512M -cpu max -smp 4 \
-netdev user,id=vnet,hostfwd=: -device virtio-net-pci,netdev=vnet \
-drive file=ubuntu-image.img,if=none,id=drive0,cache=writeback -device virtio-blk,drive=drive0,bootindex=0 \
-drive file=mini.iso,if=none,id=drive1,cache=writeback -device virtio-blk,drive=drive1,bootindex=1 \
-drive file=flash0.img,format=raw,if=pflash -drive file=flash1.img,format=raw,if=pflash 

Follow the instructions to install Ubuntu to the ubuntu-image.img file.

Once the install is finished you can exit QEMU with Ctrl-a x.

Then restart QEMU without the installer image with the following command.

qemu-system-aarch64 -nographic -machine virt,gic-version=max -m 512M -cpu max -smp 4 \
-netdev user,id=vnet,hostfwd=: -device virtio-net-pci,netdev=vnet \
-drive file=ubuntu-image.img,if=none,id=drive0,cache=writeback -device virtio-blk,drive=drive0,bootindex=0 \
-drive file=flash0.img,format=raw,if=pflash -drive file=flash1.img,format=raw,if=pflash 

by Rob Foley at March 04, 2020 07:16 PM

Marcin Juszkiewicz

CirrOS 0.5.0 released

Someone may say that I am the main reason why the CirrOS project does releases.

In 2016 I got a task at Linaro to get it running on AArch64. More details are in my blog post ‘my work on changing CirrOS images’. The result was the 0.4.0 release.

Last year I got another task at Linaro. So we released the 0.5.0 version today.

But that’s not how it happened.

Multiple contributors

Since the 0.4.0 release there have been changes from several developers.

Robin H. Johnson took care of kernel modules. Added new ones, updated names. Also added several new features.

Murilo Opsfelder Araujo fixed build on Ubuntu 16.04.3 as gcc changed preprocessor output.

Jens Harbott took care of lack of space for data read from config-drive.

Paul Martin upgraded CirrOS build system to BuildRoot 2019.02.1 and bumped kernel/grub versions.

Maciej Józefczyk took care of metadata requests.

Marcin Sobczyk fixed starting of Dropbear and dropped creation of DSS ssh key which was no longer supported.

My Linaro work

At Linaro I got Jira card with “Upgrade CirrOS’ kernel to Ubuntu 18.04’s kernel” title.

This was needed as the 4.4 kernel was far too old and gave us several booting issues. Internally we had builds with the 4.15 kernel, but it should be done properly and upstream.

So I fetched the code, did some test builds and started looking at how to improve the situation. I spoke with Scott Moser (owner of the CirrOS project) and he told me about his plans to migrate from Launchpad to GitHub. So we did that in December 2019 and then the fun started.

Continuous Integration

GitHub has several ways of adding CI to projects. First we tried GitHub Actions, but it turned out to be a paid service. I looked around and then decided to go with Travis CI.

Scott generated all the required keys and integration started. Soon we had every pull request going through CI. Then I added a simple script (bin/test-boot) so each image was booted after the build. Scott improved the script and fixed a Power boot issue.

The next step was caching downloads and ccache files. This was a huge improvement!

In the meantime Travis bumped the free service to 5 simultaneous builders, which made our builds even faster.

CirrOS supports building only under Ubuntu LTS. But I use Fedora, so we merged two changes to make sure that the proper ‘grub(2)-mkimage’ command is used.

Kernel changes

The 4.4 kernel had to go. The first idea was to move to 4.18 from the Ubuntu 18.04 release. But if we upgrade, then why not go for the HWE one? I checked the 5.0 and 5.3 versions. As both worked fine, we decided to go with the newer one.

Modules changes

During the start of a CirrOS image several kernel modules are loaded. But there were several “no kernel module found”-style messages for built-in ones.

We took care of it by querying the /sys/module/ directory, so now module loading is a quiet process. At the end a list of loaded modules is printed.
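The /sys/module/ check boils down to one directory test: built-in and already-loaded modules both have an entry under /sys/module/, so only the rest need modprobe. A sketch of the idea (our illustration — CirrOS does this in its shell init scripts — with the sysfs root as a parameter so it can be exercised off-target):

```python
import os

def needs_modprobe(module, sysfs="/sys/module"):
    """Return True if the module still has to be loaded with modprobe.

    Built-in and already-loaded modules both have a directory under
    /sys/module/, so they can be skipped quietly.
    """
    return not os.path.isdir(os.path.join(sysfs, module))
```

Note that names under /sys/module/ use underscores (e.g. virtio_rng), so a dashed module name has to be normalized before the lookup.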

VirtIO changes

A lot of things have happened since the 4.4 kernel, so we added several VirtIO modules.

One of the results is a working graphical console on AArch64, thanks to ‘virtio-gpu’ providing a framebuffer and ‘hid-generic’ handling USB input devices.

As lack of entropy is a common issue in VM instances, we added the ‘virtio-rng’ module. No more ‘uninitialized urandom read’ messages from the kernel.

Final words

Yesterday Scott created the 0.5.0 tag and CI built all the release images. Then I wrote the release notes (based on the ones from the pre-releases). The Kolla project got a patch to move to the new version.

When is the next release? Looking at history, someone may say 2023, as the previous one was in 2016. But who knows. Maybe we will get someone with a “please add s390x support” question ;D

by Marcin Juszkiewicz at March 04, 2020 10:53 AM

Powered by Planet!
Last updated: July 04, 2020 06:10 PM