Blogging about open source virtualization

News from QEMU, KVM, libvirt, libguestfs, virt-manager and related tools

November 21, 2022

KVM on Z

New Release: RHEL 9.1 on IBM zSystems

A week after releasing RHEL 8.7, Red Hat has now published RHEL 9.1; see the press release here! It ships, among other things:

  • QEMU v7.0
  • libvirt v8.5

Note that RHEL 9.1 is NOT an EUS (Extended Update Support) release, so it will go out of support with the GA of RHEL 9.2. For details, please see the "Red Hat Enterprise Linux Life Cycle" here.



by Stefan Raspl (noreply@blogger.com) at November 21, 2022 04:06 PM

IBM Cloud Infrastructure Center 1.1.6 with several KVM on IBM zSystems enhancements

The new version of IBM Cloud Infrastructure Center is available and brings several improvements for KVM on IBM zSystems:

    • Consistency group and group snapshots
    • Security group on Red Hat KVM
    • Hybrid hypervisor support
    • Virtual machine backup and restore on KVM
    • Performance enhancement in Day2 operations
    • Visualization of network related components


by Christian Bornträger (noreply@blogger.com) at November 21, 2022 12:43 PM

November 18, 2022

Stefan Hajnoczi

LWN article on "Accessing QEMU storage features without a VM"

At KVM Forum 2022 Kevin Wolf and Stefano Garzarella gave a talk on qemu-storage-daemon, a way to get QEMU's storage functionality without running a VM. It's great for accessing disk images, basically taking the older qemu-nbd to the next level. The cool thing is this makes QEMU's software-defined storage functionality - block devices with snapshots, incremental backup, image file formats, etc - available to other programs. Backup and forensics tools as well as other types of programs can take advantage of qemu-storage-daemon.

Here is the full article about Accessing QEMU storage features without a VM. Enjoy!

by Unknown (noreply@blogger.com) at November 18, 2022 08:44 PM

KVM on Z

Documentation: Red Hat High Availability

Our solution assurance team published a new paper providing guidance, hints and tricks, and practical examples to help you configure and use the Red Hat Enterprise Linux High Availability Add-on (Red Hat HA).

You can access the paper here.

by Stefan Raspl (noreply@blogger.com) at November 18, 2022 01:48 PM

November 17, 2022

QEMU project

Introduction to Zoned Storage Emulation

This summer I worked on adding Zoned Block Device (ZBD) support to virtio-blk as part of the Outreachy internship program. QEMU hasn’t directly supported ZBDs before so this article explains how they work and why QEMU needed to be extended.

Zoned block devices

Zoned block devices (ZBDs) are divided into regions called zones that can only be written sequentially. By allowing only sequential writes, SSD write amplification can be reduced by eliminating the need for a Flash Translation Layer, potentially leading to higher throughput and increased capacity. Zoned storage concepts are standardized as ZBC (the SCSI standard), ZAC (the ATA standard), and ZNS (NVMe), providing a new storage software stack. Meanwhile, the virtio protocol for block devices (virtio-blk) should also be aware of ZBDs instead of treating them as regular block devices, so that such devices can be passed through to the guest. An overview of the necessary work is as follows:

  1. Virtio protocol: extend virtio-blk protocol with main zoned storage concept, Dmitry Fomichev
  2. Linux: implement the virtio specification extensions, Dmitry Fomichev
  3. QEMU: add zoned storage APIs to the block layer, Sam Li
  4. QEMU: implement zoned storage support in virtio-blk emulation, Sam Li

Once the QEMU and Linux patches have been merged it will be possible to expose a virtio-blk ZBD to the guest like this:

-blockdev node-name=drive0,driver=zoned_host_device,filename=/path/to/zbd,cache.direct=on \
-device virtio-blk-pci,drive=drive0 \

And then we can perform zoned block commands on that device in the guest OS:

# blkzone report /dev/vda
start: 0x000000000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
start: 0x000020000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
start: 0x000040000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
start: 0x000060000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
start: 0x000080000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
start: 0x0000a0000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
start: 0x0000c0000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
start: 0x0000e0000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
start: 0x000100000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x000120000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x000140000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x000160000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]

Zoned emulation

Currently, QEMU can support zoned devices via virtio-scsi or PCI device passthrough, which requires specifying the device type being talked to, whereas storage controller emulation uses block layer APIs instead of directly accessing disk images. Extending virtio-blk emulation avoids code duplication and simplifies support by hiding the device types under a unified zoned storage interface, simplifying VM deployment for different types of zoned devices. Virtio-blk can also be implemented in hardware; if those devices wish to follow the zoned storage model then the virtio-blk specification needs to natively support zoned storage. With such support, individual NVMe namespaces or anything that is a zoned Linux block device can be exposed to the guest without passing through a full device.

For zoned storage emulation, the zoned storage APIs support the three zoned models (conventional, host-managed, host-aware), four zone management commands (Report Zone, Open Zone, Close Zone, Finish Zone), and Append Zone. The QEMU block layer has a BlockDriverState graph that propagates device information inside the block layer. The file-posix driver is the lowest level within the graph, and this is where the zoned storage APIs reside.

After receiving the block driver state, the virtio-blk emulation recognizes zoned devices and sends the zoned feature bit to the guest, so the guest can see the host's zoned device. When the guest executes zoned operations, the virtio-blk driver issues the corresponding requests, which are captured by the virtio-blk device inside QEMU. Afterwards, the virtio-blk device sends the requests to the file-posix driver, which performs the zoned operations using Linux ioctls.
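
As a hedged illustration of that last step, the sketch below shows how zones can be reported from user space with the Linux BLKREPORTZONE ioctl, which belongs to the same family of ioctls the file-posix driver uses. This is a stand-alone example program rather than QEMU code; error handling is minimal and the device path is only an example.

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/blkzoned.h>

int main(int argc, char **argv)
{
    const char *dev = argc > 1 ? argv[1] : "/dev/vda";  /* example device */
    unsigned int nr = 8;                                /* report a few zones */
    int fd = open(dev, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    struct blk_zone_report *rep =
        calloc(1, sizeof(*rep) + nr * sizeof(struct blk_zone));
    rep->sector = 0;        /* start reporting at the beginning of the disk */
    rep->nr_zones = nr;

    if (ioctl(fd, BLKREPORTZONE, rep) < 0) {
        perror("BLKREPORTZONE");
        return 1;
    }

    for (unsigned int i = 0; i < rep->nr_zones; i++) {
        struct blk_zone *z = &rep->zones[i];
        printf("zone %u: start=%llu len=%llu wp=%llu type=%u cond=%u\n",
               i, (unsigned long long)z->start, (unsigned long long)z->len,
               (unsigned long long)z->wp, (unsigned)z->type, (unsigned)z->cond);
    }
    free(rep);
    close(fd);
    return 0;
}

Compiled on a system with kernel headers installed, this can be pointed at a zoned device such as the virtio-blk ZBD shown earlier.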

Unlike zone management operations, Linux doesn't have a user-space API for issuing zone append requests to zoned devices. With the help of write pointer emulation, which tracks the location of each zone's write pointer, the QEMU block layer can perform append writes by turning them into regular writes. Write pointer locks guarantee ordered execution of requests; the write pointer location is only updated when a request finishes successfully and must not be updated on failure.
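
The idea can be sketched in a few lines of C. This is a simplified illustration of the approach, not the actual QEMU code: the zone_state structure and emulate_zone_append function are invented names, and a real implementation would also handle short writes, alignment, and asynchronous I/O.

#include <pthread.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Illustrative per-zone state; not the actual QEMU data structure. */
struct zone_state {
    pthread_mutex_t lock;       /* serializes writes to this zone */
    uint64_t start;             /* zone start offset in bytes */
    uint64_t capacity;          /* writable bytes in the zone */
    uint64_t wp;                /* emulated write pointer (byte offset) */
};

/* Emulate a zone append: issue a regular write at the tracked write
 * pointer and advance the pointer only if the write succeeded.
 * Returns the offset the data landed at, or -1 on failure. */
ssize_t emulate_zone_append(int fd, struct zone_state *z,
                            const void *buf, size_t len)
{
    ssize_t ret = -1;

    pthread_mutex_lock(&z->lock);
    if (z->wp + len <= z->start + z->capacity) {
        uint64_t off = z->wp;
        ssize_t written = pwrite(fd, buf, len, (off_t)off);
        if (written == (ssize_t)len) {
            z->wp += len;       /* success: advance the write pointer */
            ret = (ssize_t)off; /* report where the append landed */
        }                       /* failure: leave the write pointer alone */
    }
    pthread_mutex_unlock(&z->lock);
    return ret;
}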

Problems can always be solved with the right mindset and the right tools. A good approach to avoiding pitfalls is test-driven development. In the beginning, utilities like qemu-io can invoke the new block layer APIs. Moving towards the guest, existing tools like blktests, zonefs-tools, and fio are used for broader testing. Depending on the size of the zoned device, some tests may take a long time to finish. Tracing is also a good tool for spotting bugs: QEMU's tracing tools and blktrace monitor block layer I/O, providing detailed information for analysis.

Starting the journey with open source

As a student interested in computer science, I am enthusiastic about building real applications and was fortunate to find this opportunity this summer. I had a wonderful experience with QEMU, where I got the chance to work with experienced engineers and meet peers sharing the same interests. It is a good starting point for me to continue exploring storage systems and open source projects.

Public communication, reaching out to people, and admitting to failures used to be hard for me. Those feelings faded away as I put more effort into this project over time. For people who may have the same trouble as me, it might be useful to focus on the tasks ahead of you instead of worrying about the consequences of rejection from others.

Finally, I would like to thank Stefan Hajnoczi, Damien Le Moal, Dmitry Fomichev, and Hannes Reinecke for mentoring me: they guided me through this project with patience and expertise whenever I hit obstacles in design or implementation, and introduced me to a fun and vibrant open source world. Thanks also to the QEMU community and Outreachy for organizing this program.

Conclusion

The current status of this project is that it is waiting for the virtio specification extension and the Linux driver support patches to be accepted. The up-to-date patch series for zoned device support welcomes any new comments.

The next step for zoned storage emulation in QEMU is to enable full zoned emulation through virtio-blk. By adding support on top of a regular file, it lets developers access a zoned device environment without real zoned storage hardware. Furthermore, virtio-scsi may need to add full emulation support to complete the zoned storage picture in QEMU. QEMU's NVMe ZNS emulation can also use the new block layer APIs to attach real zoned storage if the emulation is used in production in the future.

by Sam Li at November 17, 2022 12:00 AM

November 13, 2022

KVM on Z

New Release: RHEL 8.7 on IBM zSystems

RHEL 8.7 is out; see the blog entry here! It ships, among other things:

  • QEMU v6.2, supporting virtio-fs on IBM Z
  • libvirt v8.0

For a detailed list of Linux on Z-specific changes, see the release notes.

by Stefan Raspl (noreply@blogger.com) at November 13, 2022 10:42 PM

November 10, 2022

Stefan Hajnoczi

Using qemu-img to access vhost-user-blk storage

vhost-user-blk is a high-performance storage protocol that connects virtual machines to software-defined storage like SPDK or qemu-storage-daemon. Until now, tool support for vhost-user-blk has been lacking. Accessing vhost-user-blk devices involved running a virtual machine, which requires more setup than one would like.

QEMU 7.2 adds vhost-user-blk support to the qemu-img tool. This is possible thanks to libblkio, a library that other programs besides QEMU can use too.

Check for vhost-user-blk support in your installed qemu-img version like this (if it says 0 then you need to update qemu-img or compile it from source with libblkio enabled):

$ qemu-img --help | grep virtio-blk-vhost-user | wc -l
1

You can copy a raw disk image file into a vhost-user-blk device like this:


$ qemu-img convert \
--target-image-opts \
-n \
test.img \
driver=virtio-blk-vhost-user,path=/tmp/vhost-user-blk.sock,cache.direct=on

The contents of the vhost-user-blk device can be saved as a qcow2 image file like this:

$ qemu-img convert \
--image-opts \
-O qcow2 \
driver=virtio-blk-vhost-user,path=/tmp/vhost-user-blk.sock,cache.direct=on out.qcow2

The size of the virtual disk can be read:

$ qemu-img info \
--image-opts \
driver=virtio-blk-vhost-user,path=/tmp/vhost-user-blk.sock,cache.direct=on
image: json:{"driver": "virtio-blk-vhost-user"}
file format: virtio-blk-vhost-user
virtual size: 4 GiB (4294967296 bytes)
disk size: unavailable

Other qemu-img sub-commands like bench and dd are also available for quickly accessing the vhost-user-blk device without running a virtual machine:


$ qemu-img bench \
--image-opts \
driver=virtio-blk-vhost-user,path=/tmp/vhost-user-blk.sock,cache.direct=on
Sending 75000 read requests, 4096 bytes each, 64 in parallel (starting at offset 0, step size 4096)
Run completed in 1.443 seconds.

Being able to access vhost-user-blk devices from qemu-img makes vhost-user-blk a little easier to work with.

by Unknown (noreply@blogger.com) at November 10, 2022 07:35 PM

October 25, 2022

KVM on Z

New Release: Ubuntu 22.10

Canonical released a new version of their Ubuntu server offering, Ubuntu Server 22.10!

One of the highlights is the addition of a new feature providing Secure Execution attestation.

See the announcement on the mailing list here, and the blog entry at Canonical with Z-specific highlights here.

by Stefan Raspl (noreply@blogger.com) at October 25, 2022 06:55 PM

October 19, 2022

KVM on Z

RHEL 8.6: RHEL-AV is no longer necessary for the latest version of KVM

by Christian Bornträger (noreply@blogger.com) at October 19, 2022 01:48 PM

August 30, 2022

QEMU project

QEMU version 7.1.0 released

We’d like to announce the availability of the QEMU 7.1.0 release. This release contains 2800+ commits from 238 authors.

You can grab the tarball from our download page. The full list of changes are available in the Wiki.

Highlights include:

  • Live migration: support for zero-copy-send on Linux
  • QMP: new options for exporting NBD images with dirty bitmaps via ‘block-export-add’ command
  • QMP: new ‘query-stats’ and ‘query-stats-schema’ commands for retrieving statistics from various QEMU subsystems
  • QEMU guest agent: improved Solaris support, new commands ‘guest-get-diskstats’/’guest-get-cpustats’, ‘guest-get-disks’ now reports NVMe SMART information, and ‘guest-get-fsinfo’ now reports NVMe bus-type
  • ARM: emulation support for new machine types: Aspeed AST1030 SoC, Qualcomm, and fby35 (AST2600 / AST1030)
  • ARM: emulation support for Cortex-A76 and Neoverse-N1 CPUs
  • ARM: emulation support for Scalable Matrix Extensions, cache speculation control, RAS, and many other CPU extensions
  • ARM: ‘virt’ board now supports emulation of GICv4.0
  • HPPA: new SeaBIOS v6 firmware with support for PS/2 keyboard in boot menu when running with GTK UI, improved serial port emulation, and additional STI text fonts
  • LoongArch: initial support for LoongArch64 architecture, Loongson 3A5000 multiprocessor SoC, and the Loongson 7A1000 host bridge
  • Nios2: 10m50-ghrd board (-machine 10m50-ghrd) now supports a Vectored Interrupt Controller, shadow register sets, and improved exception handling
  • OpenRISC: ‘or1k-sim’ machine now supports 4 16550A UART serial devices instead of 1
  • RISC-V: new ISA extensions with support for privileged spec version 1.12.0, software access to MIP SEIP, Sdtrig extension, vector extension improvements, native debug, PMU improvements, and many other features and miscellaneous fixes/improvements
  • RISC-V: ‘virt’ board now supports TPM
  • RISC-V: ‘OpenTitan’ board now supports Ibex SPI
  • s390x: emulation support for s390x Vector-Enhancements Facility 2
  • s390x: s390-ccw BIOS now supports booting from drives with non-512 sector sizes
  • x86: virtualization support for architectural LBRs
  • Xtensa: support for lx106 core and cache testing opcodes
  • and lots more…

Thank you to everyone involved!

August 30, 2022 11:00 PM

July 19, 2022

Gerd Hoffmann

edk2 and firmware packaging

Firmware autobuilder goes EOL

Some people already noticed and asked questions. So I guess I'd better write things down in my blog so I don't have to answer the questions over and over again, and I hope to also clarify some things on distro firmware builds.

So, yes, the jenkins autobuilder creating the firmware repository at https://www.kraxel.org/repos/jenkins/ was shut down yesterday (Jul 19th 2022). The repository will stay online for the time being, so your established workflows will not instantly break. But the repository will not get updates any more, so it is wise to start looking for alternatives now.

The obvious primary choice would be to just use the firmware builds provided by your distribution. I'll cover edk2 only, which seems to be by far the most popular use, even though there are also builds for other firmware projects.

RHEL / Fedora edk2 firmware builds

Given I'm quite familiar with the RHEL / Fedora world I can give some advice here. The edk2-ovmf package comes with multiple images for the firmware code and the varstore template, which allow for various combinations. The most important ones are:

OVMF_CODE.secboot.fd and OVMF_VARS.secboot.fd
Run the secure-boot capable firmware build with secure boot enabled. The varstore has the microsoft secure boot keys enrolled and secure boot enabled.
Requires q35. Requires smm mode support (which is enabled by default these days).
OVMF_CODE.secboot.fd and OVMF_VARS.fd
Run the secure-boot capable firmware build with secure boot disabled. The varstore is blank.
Requires q35 and smm mode support too.
OVMF_CODE.fd and OVMF_VARS.fd
Run the firmware build without secure boot support with the blank varstore.
Works with both q35 and pc machine types. Only available on Fedora.

Configure libvirt domains for UEFI

The classic way to setup this in libvirt looks like this:

<domain type='kvm'>
[ ... ]
  <os>
    <type arch='x86_64' machine='q35'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE.secboot.fd</loader>
    <nvram template='/usr/share/OVMF/OVMF_VARS.fd'/>
  </os>

To make this easier the firmware builds come with json files describing the capabilities and requirements. You can find these files in /usr/share/qemu/firmware/. libvirt can use them to automatically find suitable firmware images, so you don't have to write the firmware image paths into the domain configuration. You can simply use this instead:

<domain type='kvm'>
[ ... ]
  <os firmware='efi'>
    <type arch='x86_64' machine='q35'>hvm</type>
  </os>

libvirt also allows asking for specific firmware features. If you don't want to use secure boot, for example, you can ask for the blank varstore template (no secure boot keys enrolled) this way:

<domain type='kvm'>
[ ... ]
  <os firmware='efi'>
    <type arch='x86_64' machine='q35'>hvm</type>
    <firmware>
      <feature name='enrolled-keys' enabled='no' />
    </firmware>
  </os>

In case you change the configuration of an existing virtual machine you might (depending on the kind of change) have to run virsh start --reset-nvram domain once to start over with a fresh copy of the varstore template.

But why shutdown the autobuilder?

The world has moved forward. UEFI isn't a niche use case any more. Linux distributions all provide good packages these days. The edk2 project got good CI coverage (years ago it was my autobuilder raising the flag when a commit broke the gcc build). The edk2 project got a regular release process distros can (and do) follow.

All in all the effort to maintain the autobuilder doesn't look justified any more.

by Gerd Hoffmann at July 19, 2022 10:00 PM

KVM on Z

Persistent configuration of crypto passthrough

Are you using CryptoExpress cards with KVM on IBM zSystems or LinuxONE? Sebastian Mitterle has a very good overview on how to make crypto device passthrough persistent.

http://learningbytesting.mathume.com/2022/07/persistent-crypto-device-passthrough-on.html

by Christian Bornträger (noreply@blogger.com) at July 19, 2022 11:43 AM

Daniel Berrange

Trying sd-boot and unified kernel images in a KVM virtual machine

A recent thread on the Fedora development list about unified kernel images coincided with work I'm involved in wrt confidential computing (AMD SEV[-SNP], Intel TDX, etc). In exploring the different options for booting virtual machines in a confidential computing environment, one of the problems that keeps coming up is that of validating the boot measurements of the initrd and kernel command line. The initrd is currently generated on the fly at the time the kernel is installed on a host, while the command line typically contains host-specific UUIDs for filesystems or LUKS volumes. Before even dealing with those problems, grub2's support for TPMs causes pain due to its need to measure every single grub.conf configuration line that is executed into a PCR. Even with the most minimal grub.conf using autodiscovery based on the boot loader spec, the grub.conf boot measurements are horribly cumbersome to deal with.

With this in mind, in working on confidential virtualization, we’re exploring options for simplifying the boot process by eliminating any per-host variable measurements. A promising way of achieving this is to make use of sd-boot instead of grub2, and using unified kernel images pre-built and signed by the OS vendor. I don’t have enough familiarity with this area of Linux, so I’ve been spending time trying out the different options available to better understand their operation. What follows is a short description of how i took an existing Fedora 36 virtual machine and converted it to sd-boot with a unified kernel image.

First of all, I’m assuming that the virtual machine has been installed using UEFI (EDK2’s OOVMF build) as the firmware, rather than legacy BIOS (aka SeaBIOS). This is not the default with virt-manager/virt-install, but an opt-in is possible at time of provisioning the guest. Similarly it is possible to opt-in to adding a virtual TPM to the guest, for the purpose of receiving boot measurements. Latest upstream code for virt-manager/virt-install will always add a vTPM if UEFI is requested.

Assuming UEFI + vTPM are enabled for the guest, the default Fedora / RHEL setup will also result in SecureBoot being enabled in the guest. This is good in general, but the sd-boot shipped in Fedora is not currently signed. Thus for (current) testing, either disable SecureBoot, or manually sign the sd-boot binary with a local key and enroll that key with UEFI. SecureBoot isn’t immediately important, so the quickest option is disabling SecureBoot with the following libvirt guest XML config setup:

<os firmware='efi'>
  <type arch='x86_64' machine='pc-q35-6.2'>hvm</type>
  <firmware>
    <feature enabled='no' name='secure-boot'/>
  </firmware>
  <loader secure='no'/>
  <boot dev='hd'/>
</os>

The next time the guest is cold-booted, the ‘--reset-nvram‘ flag needs to be passed to ‘virsh start‘ to make it throwaway the existing SecureBoot enabled NVRAM and replace it with one disabling SecureBoot.

$ virsh start --reset-nvram fedora36test

Inside the guest, surprisingly, there were only two steps required, installing ‘sd-boot’ to the EFI partition, and building the unified kernel images. Installing ‘sd-boot’ will disable the use of grub, so don’t reboot after this first step, until the kernels are setup:

$ bootctl install
Created "/boot/efi/EFI/systemd".
Created "/boot/efi/loader".
Created "/boot/efi/loader/entries".
Created "/boot/efi/EFI/Linux".
Copied "/usr/lib/systemd/boot/efi/systemd-bootx64.efi" to "/boot/efi/EFI/systemd/systemd-bootx64.efi".
Copied "/usr/lib/systemd/boot/efi/systemd-bootx64.efi" to "/boot/efi/EFI/BOOT/BOOTX64.EFI".
Updated /etc/machine-info with KERNEL_INSTALL_LAYOUT=bls
Random seed file /boot/efi/loader/random-seed successfully written (512 bytes).
Not installing system token, since we are running in a virtualized environment.
Created EFI boot entry "Linux Boot Manager".

While the ‘/boot/efi/loader‘ directory could be populated with config files specifying kernel/initrd/cmdline to boot, the desire is to be able to demonstrate booting with zero host local configuration. So the next step is to build and install the unified kernel image. The Arch Linux wiki has a comprehensive guide, but the easiest option for Fedora appears to be to use dracut with its ‘--uefi‘ flag

$ for i in /boot/vmlinuz-*x86_64
do
   kver=${i#/boot/vmlinuz-}
   echo "Generating $kver"
   dracut  --uefi --kver $kver --kernel-cmdline "root=UUID=5fd49e99-6297-4880-92ef-bc31aef6d2f0 ro rd.luks.uuid=luks-6806c81d-4169-4e7a-9bbc-c7bf65cabcb2 rhgb quiet"
done
Generating 5.17.13-300.fc36.x86_64
Generating 5.17.5-300.fc36.x86_64

The observant will notice the ‘--kernel-cmdline’ argument refers to install-specific UUIDs for the LUKS volume and root filesystem. This ultimately needs to be eliminated too, which would require configuring the guest disk image to comply with the discoverable partitions spec. That is beyond the scope of my current exercise of merely demonstrating use of sd-boot and unified kernels. It ought to be possible to write a kickstart file to automate creation of a suitable cloud image though.

At this point the VM is rebooted, and watching the graphical console confirms that the grub menu has disappeared and display output goes straight from the UEFI splash screen into Linux. There’s no menu shown by sd-boot by default, but if desired this can be enabled by editing /boot/efi/loader/loader.conf to uncomment the line timeout 3, at which point it will show the kernel version selection at boot.

If following this scheme, bear in mind that nothing is wired up to handle this during kernel updates. The kernel RPM triggers will continue to setup grub.conf and generate standalone initrds. IOW don’t try this on a VM that you care about. I assume there’s some set of commands I could use to uninstall sd-boot and switch back to grub, but I’ve not bothered to figure this out.

Overall this exercise was surprisingly simple and painless. The whole idea of using a drastically simplified boot loader instead of grub, along with pre-built unified kernel images, feels like it has a lot of promise, especially in the context of virtual machines where the set of possible boot hardware variants is small and well understood.

by Daniel Berrange at July 19, 2022 10:26 AM

July 18, 2022

Cornelia Huck

VIRTIO 1.2 is out!

A new version of the virtio specification has been released! As it has been three years since the 1.1 release, quite a lot of changes have accumulated. I have attempted to list some of them below; for details, you are invited to check out the spec :)

There are already some changes queued for 1.3; let’s hope it won’t take us three years again before the next release ;)

New device types in 1.2

Several new device types have been added.

  • virtio-pmem: persistent memory device; useful to avoid a separate page cache in the guest
  • virtio-fs: access a file system; kind of the spiritual successor to the never officially standardized virtio-9p
  • virtio-rpmb: a tamper-resistant and anti-replay storage device
  • virtio-iommu: can both be a proxy for a physical IOMMU, or act as a virtual IOMMU
  • virtio-snd: a sound card supporting input and output PCM streams
  • virtio-mem: provides a memory region in guest physical address space; useful to implement memory hot(un)plugging
  • virtio-i2c: a virtual I2C adapter
  • virtio-scmi: implements the Arm System Control and Management Interface, for things like sensors etc.
  • virtio-gpio: a virtual GPIO device to manage named I/O lines

New features for existing device types

Enhancements have been added to some already existing device types.

  • virtio-blk
    • multiqueue support
    • lifetime metrics
    • secure erase
  • virtio-net
    • support for the guest providing the exact header length
    • receive-side scaling
    • per-packet hash reporting
    • per-virtqueue driver notifications
    • UDP segmentation offload
  • virtio-gpu
    • support for 3D commands
    • resource sharing
    • blob resources
    • context initialization
  • virtio-balloon
    • free page hints
    • page poisoning
    • free page reporting
  • virtio-vsock
    • seqpacket sockets

Features not specific to a device type

Some general enhancements include:

  • support for vendor-specific PCI capabilities
  • support for sharing resources between devices
  • support for resetting individual virtqueues

by Cornelia Huck at July 18, 2022 07:45 AM

July 05, 2022

Alberto Garcia

Running the Steam Deck’s OS in a virtual machine using QEMU

SteamOS desktop

Introduction

The Steam Deck is a handheld gaming computer that runs a Linux-based operating system called SteamOS. The machine comes with SteamOS 3 (code name “holo”), which is in turn based on Arch Linux.

Although there is no SteamOS 3 installer for a generic PC (yet), it is very easy to install on a virtual machine using QEMU. This post explains how to do it.

The goal of this VM is not to play games (you can already install Steam on your computer after all) but to use SteamOS in desktop mode. The Gamescope mode (the console-like interface you normally see when you use the machine) requires additional development to make it work with QEMU and will not work with these instructions.

A SteamOS VM can be useful for debugging, development, and generally playing and tinkering with the OS without risking breaking the Steam Deck.

Running the SteamOS desktop in a virtual machine only requires QEMU and the OVMF UEFI firmware and should work in any relatively recent distribution. In this post I'm using QEMU directly, but you can also use virt-manager or some other tool if you prefer; we're emulating a standard x86_64 machine here.

General concepts

SteamOS is a single-user operating system and it uses an A/B partition scheme, which means that there are two sets of partitions and two copies of the operating system. The root filesystem is read-only and system updates happen on the partition set that is not active. This allows for safer updates, among other things.

There is one single /home partition, shared by both partition sets. It contains the games, user files, and anything that the user wants to install there.

Although the user can trivially become root, make the root filesystem read-write and install or change anything (the pacman package manager is available), this is not recommended because

  • it increases the chances of breaking the OS, and
  • any changes will disappear with the next OS update.

A simple way for the user to install additional software that survives OS updates and doesn’t touch the root filesystem is Flatpak. It comes preinstalled with the OS and is integrated with the KDE Discover app.

Preparing all the necessary files

The first thing that we need is the installer. For that we have to download the Steam Deck recovery image from here: https://store.steampowered.com/steamos/download/?ver=steamdeck&snr=

Once the file has been downloaded, we can uncompress it and we’ll get a raw disk image called steamdeck-recovery-4.img (the number may vary).

Note that the recovery image is already SteamOS (just not the most up-to-date version). If you simply want to have a quick look you can play a bit with it and skip the installation step. In this case I recommend that you extend the image before using it, for example with ‘truncate -s 64G steamdeck-recovery-4.img‘ or, better, create a qcow2 overlay file and leave the original raw image unmodified: ‘qemu-img create -f qcow2 -F raw -b steamdeck-recovery-4.img steamdeck-recovery-extended.qcow2 64G‘

But here we want to perform the actual installation, so we need a destination image. Let’s create one:

$ qemu-img create -f qcow2 steamos.qcow2 64G

Installing SteamOS

Now that we have all files we can start the virtual machine:

$ qemu-system-x86_64 -enable-kvm -smp cores=4 -m 8G \
    -device usb-ehci -device usb-tablet \
    -device intel-hda -device hda-duplex \
    -device VGA,xres=1280,yres=800 \
    -drive if=pflash,format=raw,readonly=on,file=/usr/share/ovmf/OVMF.fd \
    -drive if=virtio,file=steamdeck-recovery-4.img,driver=raw \
    -device nvme,drive=drive0,serial=badbeef \
    -drive if=none,id=drive0,file=steamos.qcow2

Note that we’re emulating an NVMe drive for steamos.qcow2 because that’s what the installer script expects. This is not strictly necessary but it makes things a bit easier. If you don’t want to do that you’ll have to edit ~/tools/repair_device.sh and change DISK and DISK_SUFFIX.

SteamOS installer shortcuts

Once the system has booted we’ll see a KDE Plasma session with a few tools on the desktop. If we select “Reimage Steam Deck” and click “Proceed” on the confirmation dialog then SteamOS will be installed on the destination drive. This process should not take a long time.

Now, once the operation finishes a new confirmation dialog will ask if we want to reboot the Steam Deck, but here we have to choose “Cancel”. We cannot use the new image yet because it would try to boot into the Gamescope session, which won’t work, so we need to change the default desktop session.

SteamOS comes with a helper script that allows us to enter a chroot after automatically mounting all SteamOS partitions, so let’s open a Konsole and make the Plasma session the default one in both partition sets:

$ sudo steamos-chroot --disk /dev/nvme0n1 --partset A
# steamos-readonly disable
# echo '[Autologin]' > /etc/sddm.conf.d/zz-steamos-autologin.conf
# echo 'Session=plasma.desktop' >> /etc/sddm.conf.d/zz-steamos-autologin.conf
# steamos-readonly enable
# exit

$ sudo steamos-chroot --disk /dev/nvme0n1 --partset B
# steamos-readonly disable
# echo '[Autologin]' > /etc/sddm.conf.d/zz-steamos-autologin.conf
# echo 'Session=plasma.desktop' >> /etc/sddm.conf.d/zz-steamos-autologin.conf
# steamos-readonly enable
# exit

After this we can shut down the virtual machine. Our new SteamOS drive is ready to be used. We can discard the recovery image now if we want.

Booting SteamOS and first steps

To boot SteamOS we can use a QEMU line similar to the one used during the installation. This time we’re not emulating an NVMe drive because it’s no longer necessary.

$ cp /usr/share/OVMF/OVMF_VARS.fd .
$ qemu-system-x86_64 -enable-kvm -smp cores=4 -m 8G \
   -device usb-ehci -device usb-tablet \
   -device intel-hda -device hda-duplex \
   -device VGA,xres=1280,yres=800 \
   -drive if=pflash,format=raw,readonly=on,file=/usr/share/ovmf/OVMF.fd \
   -drive if=pflash,format=raw,file=OVMF_VARS.fd \
   -drive if=virtio,file=steamos.qcow2 \
   -device virtio-net-pci,netdev=net0 \
   -netdev user,id=net0,hostfwd=tcp::2222-:22

(the last two lines redirect tcp port 2222 to port 22 of the guest to be able to SSH into the VM. If you don’t want to do that you can omit them)

If everything went fine, you should see KDE Plasma again, this time with a desktop icon to launch Steam and another one to “Return to Gaming Mode” (which we should not use because it won’t work). See the screenshot that opens this post.

Congratulations, you’re running SteamOS now. Here are some things that you probably want to do:

  • (optional) Change the keyboard layout in the system settings (the default one is US English)
  • Set the password for the deck user: run ‘passwd‘ on a terminal
  • Enable / start the SSH server: ‘sudo systemctl enable sshd‘ and/or ‘sudo systemctl start sshd‘.
  • SSH into the machine: ‘ssh -p 2222 deck@localhost‘

Updating the OS to the latest version

The Steam Deck recovery image doesn’t install the most recent version of SteamOS, so now we should probably do a software update.

  • First of all ensure that you're giving enough RAM to the VM (in my examples I run QEMU with -m 8G). The OS update might fail if you use less.
  • (optional) Change the OS branch if you want to try the beta release: ‘sudo steamos-select-branch beta‘ (or main, if you want the bleeding edge)
  • Check the currently installed version in /etc/os-release (see the BUILD_ID variable)
  • Check the available version: ‘steamos-update check‘
  • Download and install the software update: ‘steamos-update‘

Note: if the last step fails after reaching 100% with a post-install handler error then go to Connections in the system settings, rename Wired Connection 1 to something else (anything, the name doesn’t matter), click Apply and run steamos-update again. This works around a bug in the update process. Recent images fix this and this workaround is not necessary with them.

As we did with the recovery image, before rebooting we should ensure that the new update boots into the Plasma session, otherwise it won’t work:

$ sudo steamos-chroot --partset other
# steamos-readonly disable
# echo '[Autologin]' > /etc/sddm.conf.d/zz-steamos-autologin.conf
# echo 'Session=plasma.desktop' >> /etc/sddm.conf.d/zz-steamos-autologin.conf
# steamos-readonly enable
# exit

After this we can restart the system.

If everything went fine we should be running the latest SteamOS release. Enjoy!

Reporting bugs

SteamOS is under active development. If you find problems or want to request improvements please go to the SteamOS community tracker.

Edit 06 Jul 2022: Small fixes, mention how to install the OS without using NVMe.

by berto at July 05, 2022 07:11 PM

June 30, 2022

Stefan Hajnoczi

Comparing VIRTIO, NVMe, and io_uring queue designs

Queues and their implementation using shared memory ring buffers are a standard tool for communicating with I/O devices and between CPUs. Although ring buffers are widely used, there is no standard memory layout and it's interesting to compare the differences between designs. When defining libblkio's APIs, I surveyed the ring buffer designs in VIRTIO, NVMe, and io_uring. This article examines some of the differences between the ring buffers and queue semantics in VIRTIO, NVMe, and io_uring.

Ring buffer basics

A ring buffer is a circular array where new elements are written or produced on one side and read or consumed on the other side. Often terms such as head and tail or reader and writer are used to describe the array indices at which the next element is accessed. When the end of the array is reached, one moves back to the start of the array. The empty and full conditions are special states that must be checked to avoid underflow and overflow.

VIRTIO, NVMe, and io_uring all use single producer, single consumer shared memory ring buffers. This allows a CPU and an I/O device or two CPUs to communicate across a region of memory to which both sides have access.
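
To make this concrete, here is a minimal single-producer/single-consumer ring buffer sketched in C. It is only a toy illustrating the head/tail indices and the full/empty checks described above, not any of the three designs discussed, and it omits the memory barriers a real cross-CPU or device-facing implementation would need.

#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 16u           /* must be a power of two for this index scheme */

struct ring {
    uint32_t head;              /* next slot the producer fills */
    uint32_t tail;              /* next slot the consumer reads */
    uint64_t slots[RING_SIZE];  /* descriptors or other elements */
};

bool ring_push(struct ring *r, uint64_t elem)
{
    if (r->head - r->tail == RING_SIZE) {
        return false;           /* full */
    }
    r->slots[r->head % RING_SIZE] = elem;
    r->head++;
    return true;
}

bool ring_pop(struct ring *r, uint64_t *elem)
{
    if (r->head == r->tail) {
        return false;           /* empty */
    }
    *elem = r->slots[r->tail % RING_SIZE];
    r->tail++;
    return true;
}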

Embedding data in descriptors

At a minimum a ring buffer element, or descriptor, contains the memory address and size of a data buffer:

Offset  Type  Name
0x0     u64   buf
0x8     u64   len

In a storage device the data buffer contains a request structure with information about the I/O request (logical block address, number of sectors, etc). In order to process a request, the device first loads the descriptor and then loads the request structure described by the descriptor. Performing two loads is sub-optimal and it would be faster to fetch the request structure in a single load.

Embedding the data buffer in the descriptor is a technique that reduces the number of loads. The descriptor layout looks like this:

Offset  Type  Name
0x0     u64   remainder_buf
0x8     u64   remainder_len
0x10    ...   request structure

The descriptor is extended to make room for the data. If the size of the data varies and is sometimes too large for a descriptor, then the remainder is put into an external buffer. The common case will only require a single load but larger variable-sized buffers can still be handled with 2 loads as before.
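
Expressed as C structs, the two layouts from the tables above might look like the following sketch. The embedded request fields are purely illustrative and do not correspond to any particular device specification.

#include <stdint.h>

/* Plain descriptor: points at an external buffer holding the request. */
struct plain_desc {
    uint64_t buf;               /* address of the request structure */
    uint64_t len;
};

/* Descriptor with an embedded request so the common case needs one load;
 * the request fields below are invented for illustration only. */
struct embedded_desc {
    uint64_t remainder_buf;     /* overflow buffer for oversized requests */
    uint64_t remainder_len;
    struct {
        uint32_t type;          /* e.g. read or write */
        uint32_t nsectors;      /* number of sectors */
        uint64_t lba;           /* logical block address */
    } request;                  /* embedded request structure */
};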

VIRTIO does not embed data in descriptors due to its layered design. The data buffers are defined by the device type (net, blk, etc) and virtqueue descriptors are one layer below device types. They have no knowledge of the data buffer layout and therefore cannot embed data.

NVMe embeds the request structure into the Submission Queue Entry. The Command Dword 10, 11, 12, 13, 14, and 15 fields contain the request data and their meaning depends on the Opcode (request type). I/O buffers are still external and described by Physical Region Pages (PRPs) or Scatter Gather Lists (SGLs).

io_uring's struct io_uring_sqe embeds the request structure. Only I/O buffer(s) need to be external as their size varies, would be too large for the ring buffer, and typically zero-copy is desired due to the size of the data.

It seems that VIRTIO could learn from NVMe and io_uring. Instead of having small 16-byte descriptors, it could embed part of the data buffer into the descriptor so that devices need to perform fewer loads during request processing. The 12-byte struct virtio_net_hdr and 16-byte struct virtio_blk_req request headers would fit into a new 32-byte descriptor layout. I have not prototyped and benchmarked this optimization, so I don't know how effective it is.

Descriptor chaining vs external descriptors

I/O requests often include variable size I/O buffers that require scatter-gather lists similar to POSIX struct iovec arrays. Long arrays don't fit into a descriptor so descriptors have fields that point to an external array of descriptors.

Another technique for scatter-gather lists is to chain descriptors together within the ring buffer instead of relying on memory external to the ring buffer. When descriptor chaining is used, I/O requests that don't fit into a single descriptor can occupy multiple descriptors.

Advantages of chaining are better cache locality when a sequence of descriptors is used and no need to allocate separate per-request external descriptor memory.

A consequence of descriptor chaining is that the maximum queue size, or queue depth, becomes variable. It is not possible to guarantee space for a specific number of I/O requests because the available number of descriptors depends on the chain size of requests placed into the ring buffer.

VIRTIO supports descriptor chaining although drivers usually forego it when VIRTIO_F_RING_INDIRECT_DESC is available.
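
For reference, the split virtqueue descriptor from the VIRTIO specification looks roughly like the C sketch below; the next field together with the VRING_DESC_F_NEXT flag is what makes chaining possible (the spec defines the fields as little-endian).

#include <stdint.h>

#define VRING_DESC_F_NEXT 1     /* this descriptor continues via 'next' */

/* Split virtqueue descriptor (16 bytes). */
struct vring_desc {
    uint64_t addr;              /* guest-physical address of the buffer */
    uint32_t len;               /* buffer length in bytes */
    uint16_t flags;             /* VRING_DESC_F_NEXT, VRING_DESC_F_WRITE, ... */
    uint16_t next;              /* index of the next descriptor in the chain */
};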

NVMe and io_uring do not support descriptor chaining, instead relying on embedded and external descriptors.

Limits on in-flight requests

The maximum number of in-flight requests depends on the ring buffer design. Designs where descriptors are occupied from submission until completion prevent descriptor reuse for other requests while the current request is in flight.

An alternative design is where the device processes submitted descriptors and they are considered free again as soon as the device has looked at them. This approach is natural when separate submission and completion queues are used and there is no relationship between the two descriptor rings.

VIRTIO requests occupy descriptors for the duration of their lifetime, at least in the Split Virtqueue format. Therefore the number of in-flight requests is influenced by the descriptor table size.

NVMe has separate Submission Queues and Completion Queues, but its design still limits the number of in-flight requests to the queue size. The Completion Queue Entry's SQ Head Pointer (SQHD) field precludes having more requests in flight than the Submission Queue size because the field would no longer be unique. Additionally, the driver has no way of detecting Submission Queue Head changes, so it only knows there is space for more submissions when completions occur.

io_uring has independent submission (SQ) and completions queues (CQ) with support for more in-flight requests than the ring buffer size. When there are more in-flight requests than CQ capacity, it's possible to overflow the CQ. io_uring has a backlog mechanism for this case, although the intention is for applications to properly size queues to avoid hitting the backlog often.
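
As an illustration of that last point, the following sketch uses liburing's io_uring_queue_init_params() with the IORING_SETUP_CQSIZE flag to request a completion queue larger than the submission queue, so completions for more in-flight requests fit without overflowing. It assumes liburing is installed (link with -luring).

#include <liburing.h>
#include <stdio.h>

int main(void)
{
    struct io_uring ring;
    struct io_uring_params p = {0};

    /* Ask for a completion queue four times larger than the 64-entry
     * submission queue. */
    p.flags = IORING_SETUP_CQSIZE;
    p.cq_entries = 256;

    int ret = io_uring_queue_init_params(64, &ring, &p);
    if (ret < 0) {
        fprintf(stderr, "io_uring_queue_init_params: %d\n", ret);
        return 1;
    }

    /* The kernel reports the actual ring sizes it allocated. */
    printf("SQ entries: %u, CQ entries: %u\n", p.sq_entries, p.cq_entries);

    io_uring_queue_exit(&ring);
    return 0;
}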

Conclusion

VIRTIO, NVMe, and io_uring have slightly different takes on queue design. The semantics and performance vary due to these differences. VIRTIO lacks data embedding inside descriptors. io_uring supports more in-flight requests than the queue size. NVMe and io_uring rely on external descriptors with no ability to chain descriptors.

by Unknown (noreply@blogger.com) at June 30, 2022 01:22 PM

May 16, 2022

Gerd Hoffmann

edk2 quickstart for virtualization

Here is a quickstart for everyone who wants (or needs to) deal with edk2 firmware, with a focus on virtual machine firmware. The article assumes you are using a linux machine with gcc.

Building firmware for VMs

To build edk2 you need to have a bunch of tools installed. A compiler and make are required of course, but also iasl, nasm and libuuid. So install them first (package names are for centos/fedora).

dnf install -y make gcc binutils iasl nasm libuuid-devel

If you want to cross-build arm firmware on an x86 machine you also need cross compilers. While being at it, also set the environment variables needed to make the build system use the cross compilers:

dnf install -y gcc-aarch64-linux-gnu gcc-arm-linux-gnu
export GCC5_AARCH64_PREFIX="aarch64-linux-gnu-"
export GCC5_ARM_PREFIX="arm-linux-gnu-"

Next clone the tianocore/edk2 repository and also fetch the git submodules.

git clone https://github.com/tianocore/edk2.git
cd edk2
git submodule update --init

The edksetup script will prepare the build environment for you. The script must be sourced because it sets some environment variables (WORKSPACE being the most important one). This must be done only once (as long as you keep the shell with the configured environment variables open).

source edksetup.sh

Next step is building the BaseTools (also needed only once):

make -C BaseTools

Note: Currently (April 2022) BaseTools are being rewritten in Python, so most likely this step will not be needed any more at some point in the future.

Finally the build (for x64 qemu) can be kicked off:

build -t GCC5 -a X64 -p OvmfPkg/OvmfPkgX64.dsc

The firmware volumes built can be found in Build/OvmfX64/DEBUG_GCC5/FV.

Building the aarch64 firmware instead:

build -t GCC5 -a AARCH64 -p ArmVirtPkg/ArmVirtQemu.dsc

The build results land in Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/FV.

Qemu expects the aarch64 firmware images to be 64M in size. The firmware images can't be used as-is because of that; some padding is needed to create an image which can be used for pflash:

dd of="QEMU_EFI-pflash.raw" if="/dev/zero" bs=1M count=64
dd of="QEMU_EFI-pflash.raw" if="QEMU_EFI.fd" conv=notrunc
dd of="QEMU_VARS-pflash.raw" if="/dev/zero" bs=1M count=64
dd of="QEMU_VARS-pflash.raw" if="QEMU_VARS.fd" conv=notrunc

There are a bunch of compile time options, typically enabled using -D NAME or -D NAME=TRUE. Options which are enabled by default can be turned off using -D NAME=FALSE. Available options are defined in the *.dsc files referenced by the build command. So a feature-complete build looks more like this:

build -t GCC5 -a X64 -p OvmfPkg/OvmfPkgX64.dsc \
    -D FD_SIZE_4MB \
    -D NETWORK_IP6_ENABLE \
    -D NETWORK_HTTP_BOOT_ENABLE \
    -D NETWORK_TLS_ENABLE \
    -D TPM2_ENABLE

Secure boot support (on x64) requires SMM mode. Well, it builds and works without SMM, but it's not secure then. Without SMM nothing prevents the guest OS writing directly to flash, bypassing the firmware, so protected UEFI variables are not actually protected.

Also, suspend (S3) support with SMM enabled only works in case parts of the firmware (PEI specifically, see below for details) run in 32bit mode. So the secure boot variant must be compiled this way:

build -t GCC5 -a IA32 -a X64 -p OvmfPkg/OvmfPkgIa32X64.dsc \
    -D FD_SIZE_4MB \
    -D SECURE_BOOT_ENABLE \
    -D SMM_REQUIRE \
    [ ... add network + tpm + other options as needed ... ]

The FD_SIZE_4MB option creates a larger firmware image, being 4MB instead of 2MB (default) in size, offering more space for both code and vars. The RHEL/CentOS builds use that. The Fedora builds are 2MB in size, for historical reasons.

If you need 32-bit firmware builds for some reason, here is how to do it:

build -t GCC5 -a ARM -p ArmVirtPkg/ArmVirtQemu.dsc
build -t GCC5 -a IA32 -p OvmfPkg/OvmfPkgIa32.dsc

The build results will be in Build/ArmVirtQemu-ARM/DEBUG_GCC5/FV and Build/OvmfIa32/DEBUG_GCC5/FV

Booting fresh firmware builds

The x86 firmware builds create three different images:

OVMF_VARS.fd
This is the firmware volume for persistent UEFI variables, i.e. where the firmware stores all configuration (boot entries and boot order, secure boot keys, ...). Typically this is used as a template for an empty variable store, and each VM gets its own private copy; libvirt for example stores them in /var/lib/libvirt/qemu/nvram.
OVMF_CODE.fd
This is the firmware volume with the code. Separating this from VARS (a) allows for easy firmware updates, and (b) allows mapping the code read-only into the guest.
OVMF.fd
The all-in-one image with both CODE and VARS. This can be loaded as ROM using -bios, with two drawbacks: (a) UEFI variables are not persistent, and (b) it does not work for SMM_REQUIRE=TRUE builds.

qemu handles pflash storage as block devices, so we have to create block devices for the firmware images:

CODE=${WORKSPACE}/Build/OvmfX64/DEBUG_GCC5/FV/OVMF_CODE.fd
VARS=${WORKSPACE}/Build/OvmfX64/DEBUG_GCC5/FV/OVMF_VARS.fd
qemu-system-x86_64 \
  -blockdev node-name=code,driver=file,filename=${CODE},read-only=on \
  -blockdev node-name=vars,driver=file,filename=${VARS},snapshot=on \
  -machine q35,pflash0=code,pflash1=vars \
  [ ... ]

Here is the arm version of that (using the padded files created using dd, see above):

CODE=${WORKSPACE}/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/FV/QEMU_EFI-pflash.raw
VARS=${WORKSPACE}/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/FV/QEMU_VARS-pflash.raw
qemu-system-aarch64 \
  -blockdev node-name=code,driver=file,filename=${CODE},read-only=on \
  -blockdev node-name=vars,driver=file,filename=${VARS},snapshot=on \
  -machine virt,pflash0=code,pflash1=vars \
  [ ... ]

Source code structure

The core edk2 repo holds a number of packages, each package has its own toplevel directory. Here are the most interesting ones:

OvmfPkg
This holds both the x64-specific code (i.e. OVMF itself) and virtualization-specific code shared by all architectures (virtio drivers).
ArmVirtPkg
Arm specific virtual machine support code.
MdePkg, MdeModulePkg
Most core code is here (PCI support, USB support, generic services and drivers, ...).
PcAtChipsetPkg
Some Intel architecture drivers and libs.
ArmPkg, ArmPlatformPkg
Common Arm architecture support code.
CryptoPkg, NetworkPkg, FatPkg, CpuPkg, ...
As the names of the packages already suggest: Crypto support (using openssl), Network support (including network boot), FAT Filesystem driver, ...

Firmware boot phases

The firmware modules in the edk2 repo are often named after the boot phase they run in. Most drivers are named SomeThingDxe, for example.

ResetVector
This is where code execution starts after a machine reset. The code will do the bare minimum needed to enter SEC. On x64 the most important step is the transition from 16-bit real mode to 32-bit mode or 64bit long mode.
SEC (Security)
This code typically loads and uncompresses the code for PEI and DXE. On physical hardware SEC often lives in ROM memory and can not be updated. The PEI and DXE firmware volumes are loaded from (updateable) flash.
With OVMF both the SEC firmware volume and the compressed volume holding the PEI and DXE code are part of the OVMF_CODE image and will simply be mapped into guest memory.
PEI (Pre-EFI Initialization)
Platform Initialization is done here. Initialize the chipset. Not much to do here in virtual machines, other than loading the x64 e820 memory map (via fw_cfg) from qemu, or getting the memory map from the device tree (on aarch64). The virtual hardware is ready to go without much extra preparation.
PEIMs (PEI Modules) can implement functionality which must be executed before entering the DXE phase. This includes security-sensitive things like initializing SMM mode and locking down flash memory.
DXE (Driver Execution Environment)
When PEI is done it hands over control to the full EFI environment contained in the DXE firmware volume. Most code is here: all kinds of drivers, the firmware setup EFI app, ...
Strictly speaking this isn't only one phase. The code for all phases after PEI is part of the DXE firmware volume though.

by Gerd Hoffmann at May 16, 2022 10:00 PM

May 13, 2022

Thomas Huth

Improved KVM virtualization with RHEL 9 on IBM Z

This week, Red Hat Enterprise Linux 9 has been announced, which will also bring us lots of new stuff for our beloved mainframe.

First, compared with RHEL 8, a lot of generic packages have been updated, of course. For example, RHEL 9 on IBM Z comes with:

  • Linux kernel 5.14
  • glibc 2.34
  • gcc 11.2
  • clang 13.0
  • binutils 2.35
  • s390utils 2.19

And of course all of these have been thoroughly tested during the past months, which is also the reason why RHEL sometimes does not ship the very latest bleeding edge versions of the upstream projects – thorough testing needs some time. But you can be sure that Red Hat also backported lots of selected upstream fixes and improvements e.g. for the kernel to their downstream packages, so this is very up to date and stable software here.

The new KVM virtualization stack

The first big news is: There is no need anymore to install the separate virt:av (“Advanced Virtualization”) module to get the latest and greatest virtualization features on IBM Z. Everything is packaged along with the main RHEL distribution for easier installation now and will be kept up-to-date there, with important new features like virtio-fs enabled by default. And of course, as with the latest releases of RHEL 8, there is also no limit to 4 guests anymore, so you don’t have to worry about the number of supported KVM guests (as long as your hardware can handle them).

The versions that will be shipped with RHEL 9.0 are:

  • QEMU 6.2.0
  • libvirt 8.0.0
  • libguestfs 1.46.1
  • virt-install 3.2.0
  • libslirp 4.4.0

To answer the maybe most important question: Yes, this will also support the brand new IBM z16 mainframe already. Basic support for this new generation has already been added to QEMU 6.1.0 and kernel 5.14, and additional z16 features have been enabled by default in QEMU 6.2.0.

Another great new change is that it is now possible to configure mediated devices directly with the virtualization CLI tools on IBM Z. You can now add vfio-ap and vfio-ccw mediated devices to your KVM guests using virt-install or virt-xml. With virt-install, you can also create a VM that uses an existing DASD mediated device as its primary disk.

Additionally, many small performance improvements (like the specification exception interpretation feature) and bug fixes have been backported to the RHEL 9 kernel and the userspace tools to give you a great virtualization experience with RHEL 9.

One more thing that is worth mentioning (though it is not specific to IBM Z), and which you might have noticed already by clicking on the links in the previous paragraphs: there is another big change in RHEL 9. The development of the upcoming minor RHEL 9 releases (i.e. 9.1, 9.2, etc.) is now done in public via the CentOS Stream repositories. That means you can not only peek at the work that will be integrated in the next 9.y release, you can now even directly participate in the development of these next releases if you like! Isn't that cool?

Anyway, no matter whether you are planning to participate or just want to use the software, please enjoy the new KVM virtualization stack on the mainframe!

May 13, 2022 03:45 PM

April 29, 2022

Stefan Hajnoczi

Debugging Flatpak applications

Flatpak is a way to distribute applications on Linux. Its container-style approach allows applications to run across Linux distributions. This means native packages (rpm, deb, etc) are not needed and it's relatively easy to get your app to Linux users with fewer worries about distro compatibility. This makes life a lot easier for developers and is also convenient for users.

I've run popular applications like OBS Studio as flatpaks and even publish my own on Flathub, a popular hosting site for applications. Today I figured out how to debug flatpaks, which requires some extra steps that I'll share below so I don't forget them myself!

Bonus Tip: Testing local flatpaks

If you're building a flatpak of your own application it's handy to use the dir sources type in the manifest to compile your application's source code from a local directory instead of a git tag or tarball URL. This way you can make changes to the source code and test them quickly inside Flatpak.

Put something along these lines in the manifest's modules object where /home/user/my-app is your local directory with your app's source code:

{
  "name": "my-app",
  "sources": [
    {
      "type": "dir",
      "path": "/home/user/my-app"
    }
  ],
  ...
}

Building and installing apps with debuginfo

flatpak-builder(1) automatically creates a separate .Debug extension for your flatpak that contains your application's debuginfo. You'll need the .Debug extension if you want proper backtraces and source level debugging. At the time of writing the Flatpak documentation did not mention how to install the locally-built .Debug extension. Here is how:

$ flatpak-builder --user --force-clean --install build my.org.app.json
$ flatpak install --user --reinstall --assumeyes "$(pwd)/.flatpak-builder/cache" my.org.app.Debug
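
If you want to double-check that the .Debug extension actually ended up in your user installation, listing it should work:

$ flatpak list --user | grep -i debug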

It might be a good idea to install debuginfo for the system libraries in your SDK too in case it's not already installed:

$ flatpak install org.kde.Sdk.Debug # or your runtime's SDK

Running applications for debugging

There is a flatpak(1) option that launches the application with the SDK instead of the Runtime:

$ flatpak run --user --devel my.org.app

The SDK contains development tools whereas the Runtime just has the files needed to run applications.

It can also be handy to launch a shell so you can control the launch of your app and maybe use gdb or strace:

$ flatpak run --user --devel --command=sh my.org.app
[📦 my.org.app ~]$ gdb /app/bin/my-app
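
If you would rather trace system calls than attach a debugger, strace works from the same shell; a minimal sketch, assuming /app/bin/my-app is your application's binary as above:

[📦 my.org.app ~]$ strace -f -o /tmp/my-app.trace /app/bin/my-app  # output path is arbitrary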

Working with core dumps

If your application crashes it will dump core like any other process. However, existing ways of inspecting core dumps like coredumpctl(1) are not fully functional because the process ran inside namespaces and debuginfo is located inside flatpaks instead of the usual system-wide /usr/lib/debug location. coredumpctl(1), gdb, etc aren't Flatpak-aware and need extra help.

Use the flatpak-coredumpctl wrapper to launch gdb:

$ flatpak-coredumpctl -m <PID> my.org.app

You can get PID from the list printed by coredumpctl(1).
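
For example, a hypothetical session could look like this (the binary name my-app and the PID are purely illustrative):

$ coredumpctl list my-app          # note the PID column of the most recent entry
$ flatpak-coredumpctl -m 12345 my.org.app   # 12345: PID taken from the list above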

Conclusion

This article showed how to install locally-built .Debug extensions and inspect core dumps when using Flatpak. I hope that over time these manual steps will become unnecessary as flatpak-builder(1) and coredumpctl(1) are extended to automatically install .Debug extensions and handle Flatpak core dumps. For now it just takes a few extra commands compared to debugging regular applications.

by Unknown (noreply@blogger.com) at April 29, 2022 05:41 PM

April 26, 2022

KVM on Z

Ubuntu 22.04 released

Canonical released a new LTS (Long Term Support) version of its Ubuntu Server offering: Ubuntu Server 22.04!
It ships

  • Linux kernel 5.15
  • QEMU v6.2
  • libvirt v8.0
See the release notes here, and the blog entry at Canonical with Z-specific highlights here.

by Stefan Raspl (noreply@blogger.com) at April 26, 2022 10:57 AM

Ubuntu 21.10 released

Ubuntu Server 21.10 is out!
It ships

  • Linux kernel 5.13 (including, among others, features as described here and here)
  • QEMU v6.0
  • libvirt v7.6
See the release notes here, and the blog entry at Canonical with Z-specific highlights here.

by Stefan Raspl (noreply@blogger.com) at April 26, 2022 10:56 AM

April 21, 2022

KVM on Z

Howto: Verifying Secure Execution Host Key Documents

TL;DR

Using genprotimg to build an IBM Secure Execution for Linux image may fail after CA certificate reissue in April 2022.
If this happens you can work around it by using this script
to verify the validity of the host key document and run genprotimg with the --no-verify option.

The certificates used to verify Host Key Documents for IBM Secure Execution for Linux are about to be renewed in April 2022. As a result, the genprotimg tool contained in your Linux distribution may report a verification failure and refuse to build a secure image. Patches for RHEL 8 as well as for SLES 15 and Ubuntu 20.04 are underway, so the issue can eventually be fixed by updating your Linux installation.
Until then, you can work around the issue as follows:
  1. Download the check_hostkeydoc script at https://github.com/ibm-s390-linux/s390-tools/blob/master/genprotimg/samples/check_hostkeydoc.
  2. Run
    check_hostkeydoc <your host key document> \
        ibm-z-host-key-signing.crt -c DigiCertCA.crt
     
  3. If the previous step succeeded, it's safe to
    run genprotimg --no-verify -k <your host key document> \
        -o <output file> -i <kernel file> -r <ramdisk file> \
        -p <parameter file>
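
Put together, a hypothetical end-to-end run could look like this (the host key document name and the file paths are purely illustrative; substitute your own):

    # HKD-8651-000000001.crt and all paths below are illustrative examples
    check_hostkeydoc HKD-8651-000000001.crt \
        ibm-z-host-key-signing.crt -c DigiCertCA.crt
    genprotimg --no-verify -k HKD-8651-000000001.crt \
        -o /boot/secure-image -i /boot/vmlinuz -r /boot/initrd.img \
        -p parmfile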
More information about IBM Secure Execution for Linux can be found here.

by Stefan Raspl (noreply@blogger.com) at April 21, 2022 02:28 PM

April 19, 2022

QEMU project

QEMU version 7.0.0 released

We’d like to announce the availability of the QEMU 7.0.0 release. This release contains 2500+ commits from 225 authors.

You can grab the tarball from our download page. The full list of changes are available in the Wiki.

Highlights include:

  • ACPI: support for logging guest events via ACPI ERST interface
  • virtiofs: improved security label support
  • block: improved flexibility for fleecing backups, including support for non-qcow2 images
  • ARM: ‘virt’ board support for virtio-mem-pci, specifying guest CPU topology, and enabling PAuth when using KVM/hvf
  • ARM: ‘xlnx-versal-virt’ board support for PMC SLCR and emulating the OSPI flash memory controller
  • ARM: ‘xlnx-zynqmp’ now models the CRF and APU control
  • HPPA: support for up to 16 vCPUs, improved graphics driver for HP-UX VDE/CDE environments, setting SCSI boot order, and a number of other new features
  • OpenRISC: ‘sim’ board support for up to 4 cores, loading an external initrd image, and automatically generating a device tree for the boot kernel
  • PowerPC: ‘pseries’ emulation support for running guests as a nested KVM hypervisor, and new support for spapr-nvdimm device
  • PowerPC: ‘powernv’ emulation improvements for XIVE and PHB 3/4, and new support for XIVE2 and PHB5
  • RISC-V: support for KVM
  • RISC-V: support for ratified 1.0 Vector extension, as well as Zve64f, Zve32f, Zfhmin, Zfh, zfinx, zdinx, and zhinx{min} extensions.
  • RISC-V: ‘spike’ machine support for OpenSBI binary loading
  • RISC-V: ‘virt’ machine support for 32 cores, and AIA support.
  • s390x: support for “Miscellaneous-Instruction-Extensions Facility 3” (a z15 extension)
  • x86: Support for Intel AMX
  • and lots more…

Thank you to everyone involved!

April 19, 2022 04:04 PM

April 07, 2022

KVM on Z

IBM z16 announced!

Today, IBM announced the new IBM z16, with a planned availability date of May 31.

See here for the press release, and here for the official homepage. For further details, including a list of supported Linux distributions, see Eberhard's blog here.

And for a more hands-on tour of the new box, check out this video.

by Stefan Raspl (noreply@blogger.com) at April 07, 2022 02:57 PM

March 31, 2022

KVM on Z

Documentation: Solution Assurance

The solution assurance team started to publish solution setups, recommendations, and step-by-step guidelines for a broad range of topics, for example:

  • High availability clustering
  • IBM Cloud Infrastructure Center
  • CPUMF
  • kdump
and more!

You can access the materials here, but don't forget to check back periodically: There is more to come!

by Stefan Raspl (noreply@blogger.com) at March 31, 2022 09:34 PM

March 29, 2022

KVM on Z

Documentation: SAP Application Server on KVM

This new publication aims towards providing practical insights for running real-world workloads on KVM on IBM Z. From the abstract:

The SAP on IBM Z Performance team, in Poughkeepsie, NY, conducted a series of measurements to assess the performance cost of implementing a KVM environment to host SAP application servers. The tests used SAP (SBS 9.0) core banking workloads, with a Db2 database having 100 million banking accounts, which are comparable to some of the largest banks in the world. Tests were conducted that used both banking workload types, Account Settlement (batch) and Day Posting, which simulates online transactional processing (OLTP). They were executed on an IBM z15 with 16 and 32 Integrated Facility for Linux (IFL) processor configurations, that used various degrees of virtualization.

The document is available here.

by Stefan Raspl (noreply@blogger.com) at March 29, 2022 04:20 PM

March 07, 2022

QEMU project

Apply for a QEMU Google Summer of Code internship

We have great news to share: QEMU has been accepted as a Google Summer of Code 2022 organization! Google Summer of Code is an open source internship program offering paid remote work opportunities for contributing to open source. The internship runs from June 13th to September 12th.

Now is the chance to get involved in QEMU development! The QEMU community has put together a list of project ideas here.

Google has dropped the requirement that you need to be enrolled in a higher education course. We’re excited to work with a wider range of contributors this year! For details on the new eligibility requirements, see here.

You can submit your application from April 4th to 19th.

GSoC interns work together with their mentors, experienced QEMU contributors who support their interns in their projects. Code developed during the internship is submitted through the same open source development process that all QEMU contributions follow. This gives interns experience with contributing to open source software. Some interns then choose to pursue a career in open source software after completing their internship.

If you have questions about applying for QEMU GSoC, please email Stefan Hajnoczi or ask on the #qemu-gsoc IRC channel.

March 07, 2022 01:30 PM

Stefan Hajnoczi

vhost-user is coming to non-Linux hosts!

Sergio Lopez sent a QEMU patch series and vhost-user protocol specification update that maps vhost-user to non-Linux POSIX host operating systems. This is great news because vhost-user has become a popular way to develop emulated devices in any programming language that execute as separate processes with their own security sandboxing. Until now they have only been available on Linux hosts.

At the moment the BSD and macOS implementation is slower than the Linux implementation because the KVM ioeventfd and irqfd primitives are unavailable on those operating systems. Instead, POSIX pipes are used and the VMM (QEMU) needs to act as a forwarder for MMIO/PIO accesses and interrupt injections. On Linux the kvm.ko kernel module has direct support for this, bypassing the VMM process and achieving higher efficiency. However, similar mechanisms could be added to non-KVM virtualization drivers in the future.

This means that vhost-user devices can now start to support multiple host operating systems and I'm sure they will be used in new ways that no one thought about before.

by Unknown (noreply@blogger.com) at March 07, 2022 09:37 AM

February 28, 2022

Gerd Hoffmann

Introducing ovmfctl

New project: Tools for ovmf (and armvirt) firmware volumes. It's written in Python and can be installed with a simple pip3 install ovmfctl. The project is hosted at gitlab.

ovmfdump

Usage: ovmfdump --input file.fd.

It's a debugging tool which just prints the structure and content of firmware volumes.

ovmfctl

This is a tool to print and modify variable store volumes. Main focus has been on certificate handling so far.

Enrolling certificates for secure boot support in virtual machines has been a rather painful process. It's handled by EnrollDefaultKeys.efi which needs to be started inside a virtual machine to enroll the certificates and enable secure boot mode.

With ovmfctl it is dead simple:

ovmfctl --input /usr/share/edk2/ovmf/OVMF_VARS.fd \
        --enroll-redhat \
        --secure-boot \
        --output file.fd

This enrolls the Red Hat Secure Boot certificate which is used by Fedora, CentOS and RHEL as platform key. The usual Microsoft certificates are added to the certificate database too, so Windows guests and shim.efi work as expected.
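
To actually boot a guest with the modified variable store, one common pattern is to hand both firmware images to QEMU as pflash drives. A rough sketch only; the OVMF_CODE path and machine options below are distribution-specific assumptions you may need to adjust:

# OVMF_CODE path and machine options are illustrative; file.fd is the output from above
qemu-system-x86_64 -machine q35,smm=on \
    -drive if=pflash,format=raw,readonly=on,file=/usr/share/edk2/ovmf/OVMF_CODE.secboot.fd \
    -drive if=pflash,format=raw,file=file.fd \
    ...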

If you want more fine-grained control you can use the --set-pk, --add-kek, --add-db and --add-mok switches instead. The --enroll-redhat switch above is actually just a shortcut for:

--set-pk  a0baa8a3-041d-48a8-bc87-c36d121b5e3d RedHatSecureBootPKKEKkey1.pem \
--add-kek a0baa8a3-041d-48a8-bc87-c36d121b5e3d RedHatSecureBootPKKEKkey1.pem \
--add-kek 77fa9abd-0359-4d32-bd60-28f4e78f784b MicrosoftCorporationKEKCA2011.pem \
--add-db  77fa9abd-0359-4d32-bd60-28f4e78f784b MicrosoftWindowsProductionPCA2011.pem \
--add-db  77fa9abd-0359-4d32-bd60-28f4e78f784b MicrosoftCorporationUEFICA2011.pem \

If you just want the variable store be printed use ovmfctl --input file.fd --print. Add --hexdump for more details.

Extract all certificates: ovmfctl --input file.fd --extract-certs.
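
For instance, to inspect the variable store of an existing libvirt guest, something along these lines should work (the nvram path is just an example of where libvirt typically keeps per-guest VARS files):

# path is illustrative; point --input at your guest's VARS file
ovmfctl --input /var/lib/libvirt/qemu/nvram/guest_VARS.fd --print --hexdump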

Try ovmfctl --help for a complete list of command line switches. Note that input and output files can be identical for in-place updates.

That's it. Enjoy!

by Gerd Hoffmann at February 28, 2022 11:00 PM

February 15, 2022

QEMU project

QEMU welcomes Outreachy internship applicants

QEMU is offering open source internships in Outreachy’s May-August 2022 round. You can submit your application until February 25th 2022 if you want to contribute to QEMU in a remote work internship this summer.

Outreachy internships are extended to people who are subject to systemic bias and underrepresentation in the technical industry where they are living. For details on applying, please see the Outreachy website. If you are not eligible, don’t worry, QEMU is also applying to participate in Google Summer of Code again and we hope to share news about additional internships later this year.

Outreachy interns work together with their mentors, experienced QEMU contributors who support their interns in their projects. Code developed during the internship is submitted via the same open source development process that all QEMU code follows. This gives interns experience with contributing to open source software. Some interns then choose to pursue a career in open source software after completing their internship.

Now is the chance to get involved in QEMU development!

If you have questions about applying for QEMU Outreachy, please email Stefan Hajnoczi or ask on the #qemu-gsoc IRC channel.

February 15, 2022 01:30 PM

Powered by Planet!
Last updated: November 30, 2022 08:05 PM