Starting with RHEL 8.6 it is no longer necessary to use the RHEL-AV variant to get updated and new versions of QEMU/libvirt etc. Check this link for how to configure dnf/yum to use the RHEL versions instead.
Some people have already noticed and asked questions. So I guess I better write things down in my blog so I don't have to answer the questions over and over again, and I hope to also clarify some things about distro firmware builds.
So, yes, the jenkins autobuilder creating the firmware repository at https://www.kraxel.org/repos/jenkins/ has been shut down yesterday (Jul 19th 2020). The repository will stay online for the time being, so your established workflows will not instantly break. But the repository will not get updates any more, so it is wise to start looking for alternatives now.
The obvious primary choice would be to just use the firmware builds provided by your distribution. I'll cover edk2 only, which seems to be by far the most popular use, even though there are also builds for other firmware projects.
Given I'm quite familiar with the RHEL / Fedora world I can give some advice here. The edk2-ovmf package comes with multiple images for the firmware code and the varstore template which allow for various combinations. The most important ones are:
OVMF_CODE.secboot.fd and OVMF_VARS.secboot.fd
OVMF_CODE.secboot.fd and OVMF_VARS.fd
OVMF_CODE.fd and OVMF_VARS.fd
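To check which of these images your distribution actually ships, simply list the package contents; this is just a quick sketch for the rpm world, adjust for your distribution:
rpm -ql edk2-ovmf | grep -E 'OVMF_(CODE|VARS)'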
The classic way to setup this in libvirt looks like this:
<domain type='kvm'>
[ ... ]
<os>
<type arch='x86_64' machine='q35'>hvm</type>
<loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE.secboot.fd</loader>
<nvram template='/usr/share/OVMF/OVMF_VARS.fd'/>
</os>
To make this easier the firmware builds come with json files describing the capabilities and requirements. You can find these files in /usr/share/qemu/firmware/. libvirt can use them to automatically find suitable firmware images, so you don't have to write the firmware image paths into the domain configuration. You can simply use this instead:
<domain type='kvm'>
[ ... ]
<os firmware='efi'>
<type arch='x86_64' machine='q35'>hvm</type>
</os>
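To see which firmware descriptors libvirt can choose from on a given host, just have a look at the json files mentioned above (path as shipped on Fedora/RHEL; other distributions may differ):
ls /usr/share/qemu/firmware/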
libvirt also allows asking for specific firmware features. If you don't want to use secure boot, for example, you can ask for the blank varstore template (no secure boot keys enrolled) this way:
<domain type='kvm'>
[ ... ]
<os firmware='efi'>
<type arch='x86_64' machine='q35'>hvm</type>
<firmware>
<feature name='enrolled-keys' enabled='no' />
</firmware>
</os>
In case you change the configuration of an existing virtual machine you might (depending on the kind of change) have to run virsh start --reset-nvram domain once to start over with a fresh copy of the varstore template.
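For example (the domain name here is just a placeholder):
virsh start --reset-nvram fedora-efi-guest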
The world has moved forward. UEFI isn't a niche use case any more. Linux distributions all provide good packages these days. The edk2 project has good CI coverage (years ago it was my autobuilder raising the flag when a commit broke the gcc build). The edk2 project also has a regular release process distros can (and do) follow.
All in all the effort to maintain the autobuilder doesn't look justified any more.
Are you using CryptoExpress cards with KVM on IBM zSystems or LinuxONE? Sebastian Mitterle has a very good overview on how to make crypto device passthrough persistent.
http://learningbytesting.mathume.com/2022/07/persistent-crypto-device-passthrough-on.html
A recent thread on the Fedora development list about unified kernel images coincided with work I'm involved in wrt confidential computing (AMD SEV[-SNP], Intel TDX, etc). In exploring the different options for booting virtual machines in a confidential computing environment, one of the problems that keeps coming up is that of validating the boot measurements of the initrd and kernel command line. The initrd is currently generated on the fly at the time the kernel is installed on a host, while the command line typically contains host specific UUIDs for filesystems or LUKS volumes. Before even dealing with those problems, grub2's support for TPMs causes pain due to its need to measure every single grub.conf configuration line that is executed into a PCR. Even with the most minimal grub.conf using autodiscovery based on the boot loader spec, the grub.conf boot measurements are horribly cumbersome to deal with.
With this in mind, in working on confidential virtualization, we're exploring options for simplifying the boot process by eliminating any per-host variable measurements. A promising way of achieving this is to make use of sd-boot instead of grub2, and using unified kernel images pre-built and signed by the OS vendor. I don't have enough familiarity with this area of Linux, so I've been spending time trying out the different options available to better understand their operation. What follows is a short description of how I took an existing Fedora 36 virtual machine and converted it to sd-boot with a unified kernel image.
First of all, I'm assuming that the virtual machine has been installed using UEFI (EDK2's OVMF build) as the firmware, rather than legacy BIOS (aka SeaBIOS). This is not the default with virt-manager/virt-install, but an opt-in is possible at time of provisioning the guest. Similarly it is possible to opt-in to adding a virtual TPM to the guest, for the purpose of receiving boot measurements. Latest upstream code for virt-manager/virt-install will always add a vTPM if UEFI is requested.
Assuming UEFI + vTPM are enabled for the guest, the default Fedora / RHEL setup will also result in SecureBoot being enabled in the guest. This is good in general, but the sd-boot shipped in Fedora is not currently signed. Thus for (current) testing, either disable SecureBoot, or manually sign the sd-boot binary with a local key and enroll that key with UEFI. SecureBoot isn't immediately important, so the quickest option is disabling SecureBoot with the following libvirt guest XML config setup:
<os firmware='efi'>
  <type arch='x86_64' machine='pc-q35-6.2'>hvm</type>
  <firmware>
    <feature enabled='no' name='secure-boot'/>
  </firmware>
  <loader secure='no'/>
  <boot dev='hd'/>
</os>
The next time the guest is cold-booted, the '--reset-nvram' flag needs to be passed to 'virsh start' to make it throw away the existing SecureBoot enabled NVRAM and replace it with one disabling SecureBoot.
$ virsh start --reset-nvram fedora36test
Inside the guest, surprisingly, there were only two steps required, installing ‘sd-boot’ to the EFI partition, and building the unified kernel images. Installing ‘sd-boot’ will disable the use of grub, so don’t reboot after this first step, until the kernels are setup:
$ bootctl install Created "/boot/efi/EFI/systemd". Created "/boot/efi/loader". Created "/boot/efi/loader/entries". Created "/boot/efi/EFI/Linux". Copied "/usr/lib/systemd/boot/efi/systemd-bootx64.efi" to "/boot/efi/EFI/systemd/systemd-bootx64.efi". Copied "/usr/lib/systemd/boot/efi/systemd-bootx64.efi" to "/boot/efi/EFI/BOOT/BOOTX64.EFI". Updated /etc/machine-info with KERNEL_INSTALL_LAYOUT=bls Random seed file /boot/efi/loader/random-seed successfully written (512 bytes). Not installing system token, since we are running in a virtualized environment. Created EFI boot entry "Linux Boot Manager".
While the ‘/boot/efi/loader
‘ directory could be populated with config files specifying kernel/initrd/cmdline to boot, the desire is to be able to demonstrate booting with zero host local configuration. So the next step is to build and install the unified kernel image. The Arch Linux wiki has a comprehensive guide, but the easiest option for Fedora appears to be to use dracut
with its ‘--uefi
‘ flag
$ for i in /boot/vmlinuz-*x86_64
  do
    kver=${i#/boot/vmlinuz-}
    echo "Generating $kver"
    dracut --uefi --kver $kver --kernel-cmdline "root=UUID=5fd49e99-6297-4880-92ef-bc31aef6d2f0 ro rd.luks.uuid=luks-6806c81d-4169-4e7a-9bbc-c7bf65cabcb2 rhgb quiet"
  done
Generating 5.17.13-300.fc36.x86_64
Generating 5.17.5-300.fc36.x86_64
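To double check the result, one can list the generated images in the EFI system partition and the entries sd-boot discovered; this is just a sanity check on top of the original procedure:
$ ls /boot/efi/EFI/Linux/
$ bootctl list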
The observant will notice the '--kernel-cmdline' argument refers to install specific UUIDs for the LUKS volume and root filesystem. This ultimately needs to be eliminated too, which would require configuring the guest disk image to comply with the discoverable partitions spec. That is beyond the scope of my current exercise of merely demonstrating use of sd-boot and unified kernels. It ought to be possible to write a kickstart file to automate creation of a suitable cloud image though.
At this point the VM is rebooted, and watching the graphical console confirms that the grub menu has disappeared and display output goes straight from the UEFI splash screen into Linux. There's no menu shown by sd-boot by default, but if desired this can be enabled by editing /boot/efi/loader/loader.conf to uncomment the line timeout 3, at which point it will show the kernel version selection at boot.
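For reference, the relevant part of that file then looks something like this (only the timeout line needs changing):
# /boot/efi/loader/loader.conf
timeout 3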
If following this scheme, bear in mind that nothing is wired up to handle this during kernel updates. The kernel RPM triggers will continue to set up grub.conf and generate standalone initrds. IOW don't try this on a VM that you care about. I assume there's some set of commands I could use to uninstall sd-boot and switch back to grub, but I've not bothered to figure this out.
Overall this exercise was surprisingly simple and painless. The whole idea of using a drastically simplified boot loader instead of grub, along with pre-built unified kernel images, feels like it has a lot of promise, especially in the context of virtual machines where the set of possible boot hardware variants is small and well understood.
A new version of the virtio specification has been released! As it has been three years since the 1.1 release, quite a lot of changes have accumulated. I have attempted to list some of them below; for details, you are invited to check out the spec :)
There are already some changes queued for 1.3; let’s hope it won’t take us three years again before the next release ;)
Several new device types have been added.
Enhancements have been added to some already existing device types.
Some general enhancements include:
The Steam Deck is a handheld gaming computer that runs a Linux-based operating system called SteamOS. The machine comes with SteamOS 3 (code name “holo”), which is in turn based on Arch Linux.
Although there is no SteamOS 3 installer for a generic PC (yet), it is very easy to install on a virtual machine using QEMU. This post explains how to do it.
The goal of this VM is not to play games (you can already install Steam on your computer after all) but to use SteamOS in desktop mode. The Gamescope mode (the console-like interface you normally see when you use the machine) requires additional development to make it work with QEMU and will not work with these instructions.
A SteamOS VM can be useful for debugging, development, and generally playing and tinkering with the OS without risking breaking the Steam Deck.
Running the SteamOS desktop in a virtual machine only requires QEMU and the OVMF UEFI firmware and should work in any relatively recent distribution. In this post I'm using QEMU directly, but you can also use virt-manager or some other tool if you prefer; we're emulating a standard x86_64 machine here.
SteamOS is a single-user operating system and it uses an A/B partition scheme, which means that there are two sets of partitions and two copies of the operating system. The root filesystem is read-only and system updates happen on the partition set that is not active. This allows for safer updates, among other things.
There is one single /home partition, shared by both partition sets. It contains the games, user files, and anything that the user wants to install there.
Although the user can trivially become root, make the root filesystem read-write and install or change anything (the pacman package manager is available), this is not recommended because the root filesystem is replaced during system updates, so any changes made there will be lost.
A simple way for the user to install additional software that survives OS updates and doesn’t touch the root filesystem is Flatpak. It comes preinstalled with the OS and is integrated with the KDE Discover app.
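For example, installing an application from the command line looks like this; the app ID is just an example and this assumes the Flathub remote is configured, as on a stock SteamOS install:
$ flatpak install flathub org.videolan.VLC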
The first thing that we need is the installer. For that we have to download the Steam Deck recovery image from here: https://store.steampowered.com/steamos/download/?ver=steamdeck&snr=
Once the file has been downloaded, we can uncompress it and we'll get a raw disk image called steamdeck-recovery-4.img (the number may vary).

Note that the recovery image is already SteamOS (just not the most up-to-date version). If you simply want to have a quick look you can play a bit with it and skip the installation step. In this case I recommend that you extend the image before using it, for example with 'truncate -s 64G steamdeck-recovery-4.img' or, better, create a qcow2 overlay file and leave the original raw image unmodified: 'qemu-img create -f qcow2 -F raw -b steamdeck-recovery-4.img steamdeck-recovery-extended.qcow2 64G'
But here we want to perform the actual installation, so we need a destination image. Let’s create one:
$ qemu-img create -f qcow2 steamos.qcow2 64G
Now that we have all files we can start the virtual machine:
$ qemu-system-x86_64 -enable-kvm -smp cores=4 -m 8G \
    -device usb-ehci -device usb-tablet \
    -device intel-hda -device hda-duplex \
    -device VGA,xres=1280,yres=800 \
    -drive if=pflash,format=raw,readonly=on,file=/usr/share/ovmf/OVMF.fd \
    -drive if=virtio,file=steamdeck-recovery-4.img,driver=raw \
    -device nvme,drive=drive0,serial=badbeef \
    -drive if=none,id=drive0,file=steamos.qcow2
Note that we're emulating an NVMe drive for steamos.qcow2 because that's what the installer script expects. This is not strictly necessary but it makes things a bit easier. If you don't want to do that you'll have to edit ~/tools/repair_device.sh and change DISK and DISK_SUFFIX.
Once the system has booted we’ll see a KDE Plasma session with a few tools on the desktop. If we select “Reimage Steam Deck” and click “Proceed” on the confirmation dialog then SteamOS will be installed on the destination drive. This process should not take a long time.
Now, once the operation finishes a new confirmation dialog will ask if we want to reboot the Steam Deck, but here we have to choose “Cancel”. We cannot use the new image yet because it would try to boot into the Gamescope session, which won’t work, so we need to change the default desktop session.
SteamOS comes with a helper script that allows us to enter a chroot after automatically mounting all SteamOS partitions, so let’s open a Konsole and make the Plasma session the default one in both partition sets:
$ sudo steamos-chroot --disk /dev/nvme0n1 --partset A
# steamos-readonly disable
# echo '[Autologin]' > /etc/sddm.conf.d/zz-steamos-autologin.conf
# echo 'Session=plasma.desktop' >> /etc/sddm.conf.d/zz-steamos-autologin.conf
# steamos-readonly enable
# exit
$ sudo steamos-chroot --disk /dev/nvme0n1 --partset B
# steamos-readonly disable
# echo '[Autologin]' > /etc/sddm.conf.d/zz-steamos-autologin.conf
# echo 'Session=plasma.desktop' >> /etc/sddm.conf.d/zz-steamos-autologin.conf
# steamos-readonly enable
# exit
After this we can shut down the virtual machine. Our new SteamOS drive is ready to be used. We can discard the recovery image now if we want.
To boot SteamOS we can use a QEMU line similar to the one used during the installation. This time we’re not emulating an NVMe drive because it’s no longer necessary.
$ cp /usr/share/OVMF/OVMF_VARS.fd .
$ qemu-system-x86_64 -enable-kvm -smp cores=4 -m 8G \
    -device usb-ehci -device usb-tablet \
    -device intel-hda -device hda-duplex \
    -device VGA,xres=1280,yres=800 \
    -drive if=pflash,format=raw,readonly=on,file=/usr/share/ovmf/OVMF.fd \
    -drive if=pflash,format=raw,file=OVMF_VARS.fd \
    -drive if=virtio,file=steamos.qcow2 \
    -device virtio-net-pci,netdev=net0 \
    -netdev user,id=net0,hostfwd=tcp::2222-:22
(the last two lines redirect tcp port 2222 to port 22 of the guest to be able to SSH into the VM. If you don’t want to do that you can omit them)
If everything went fine, you should see KDE Plasma again, this time with a desktop icon to launch Steam and another one to “Return to Gaming Mode” (which we should not use because it won’t work). See the screenshot that opens this post.
Congratulations, you’re running SteamOS now. Here are some things that you probably want to do:
Set a password for the deck user: run 'passwd' on a terminal.
Enable and/or start the SSH server: 'sudo systemctl enable sshd' and/or 'sudo systemctl start sshd'. You can then log in with 'ssh -p 2222 deck@localhost'.
The Steam Deck recovery image doesn't install the most recent version of SteamOS, so now we should probably do a software update.
Make sure the VM has enough memory for the update (hence the -m 8G in the QEMU command line). The OS update might fail if you use less.
Optionally switch to a different update channel with 'sudo steamos-select-branch beta' (or main, if you want the bleeding edge).
You can see the currently installed version in /etc/os-release (see the BUILD_ID variable).
Check if there is an update available with 'steamos-update check'.
Download and install the update with 'steamos-update'.
Note: if the last step fails after reaching 100% with a post-install handler error then go to Connections in the system settings, rename Wired Connection 1 to something else (anything, the name doesn't matter), click Apply and run steamos-update again. This works around a bug in the update process. Recent images fix this and this workaround is not necessary with them.
As we did with the recovery image, before rebooting we should ensure that the new update boots into the Plasma session, otherwise it won’t work:
$ sudo steamos-chroot --partset other
# steamos-readonly disable
# echo '[Autologin]' > /etc/sddm.conf.d/zz-steamos-autologin.conf
# echo 'Session=plasma.desktop' >> /etc/sddm.conf.d/zz-steamos-autologin.conf
# steamos-readonly enable
# exit
After this we can restart the system.
If everything went fine we should be running the latest SteamOS release. Enjoy!
SteamOS is under active development. If you find problems or want to request improvements please go to the SteamOS community tracker.
Edit 06 Jul 2022: Small fixes, mention how to install the OS without using NVMe.
Queues and their implementation using shared memory ring buffers are a standard tool for communicating with I/O devices and between CPUs. Although ring buffers are widely used, there is no standard memory layout and it's interesting to compare the differences between designs. When defining libblkio's APIs, I surveyed the ring buffer designs in VIRTIO, NVMe, and io_uring. This article examines some of the differences between the ring buffers and queue semantics in VIRTIO, NVMe, and io_uring.
A ring buffer is a circular array where new elements are written or produced on one side and read or consumed on the other side. Often terms such as head and tail or reader and writer are used to describe the array indices at which the next element is accessed. When the end of the array is reached, one moves back to the start of the array. The empty and full conditions are special states that must be checked to avoid underflow and overflow.
VIRTIO, NVMe, and io_uring all use single producer, single consumer shared memory ring buffers. This allows a CPU and an I/O device or two CPUs to communicate across a region of memory to which both sides have access.
At a minimum a ring buffer element, or descriptor, contains the memory address and size of a data buffer:
| Offset | Type | Name |
| --- | --- | --- |
| 0x0 | u64 | buf |
| 0x8 | u64 | len |
In a storage device the data buffer contains a request structure with information about the I/O request (logical block address, number of sectors, etc). In order to process a request, the device first loads the descriptor and then loads the request structure described by the descriptor. Performing two loads is sub-optimal and it would be faster to fetch the request structure in a single load.
Embedding the data buffer in the descriptor is a technique that reduces the number of loads. The descriptor layout looks like this:
| Offset | Type | Name |
| --- | --- | --- |
| 0x0 | u64 | remainder_buf |
| 0x8 | u64 | remainder_len |
| 0x10 | ... | request structure |
The descriptor is extended to make room for the data. If the size of the data varies and is sometimes too large for a descriptor, then the remainder is put into an external buffer. The common case will only require a single load but larger variable-sized buffers can still be handled with 2 loads as before.
VIRTIO does not embed data in descriptors due to its layered design. The data buffers are defined by the device type (net, blk, etc) and virtqueue descriptors are one layer below device types. They have no knowledge of the data buffer layout and therefore cannot embed data.
NVMe embeds the request structure into the Submission Queue Entry. The Command Dword 10, 11, 12, 13, 14, and 15 fields contain the request data and their meaning depends on the Opcode (request type). I/O buffers are still external and described by Physical Region Pages (PRPs) or Scatter Gather Lists (SGLs).
io_uring's struct io_uring_sqe embeds the request structure. Only I/O buffer(s) need to be external as their size varies, would be too large for the ring buffer, and typically zero-copy is desired due to the size of the data.
It seems that VIRTIO could learn from NVMe and io_uring. Instead of having small 16-byte descriptors, it could embed part of the data buffer into the descriptor so that devices need to perform fewer loads during request processing. The 12-byte struct virtio_net_hdr and 16-byte struct virtio_blk_req request headers would fit into a new 32-byte descriptor layout. I have not prototyped and benchmarked this optimization, so I don't know how effective it is.
I/O requests often include variable size I/O buffers that require scatter-gather lists similar to POSIX struct iovec arrays. Long arrays don't fit into a descriptor so descriptors have fields that point to an external array of descriptors.
Another technique for scatter-gather lists is to chain descriptors together within the ring buffer instead of relying on memory external to the ring buffer. When descriptor chaining is used, I/O requests that don't fit into a single descriptor can occupy multiple descriptors.
Advantages of chaining are better cache locality when a sequence of descriptors is used and no need to allocate separate per-request external descriptor memory.
A consequence of descriptor chaining is that the maximum queue size, or queue depth, becomes variable. It is not possible to guarantee space for a specific number of I/O requests because the available number of descriptors depends on the chain size of requests placed into the ring buffer.
VIRTIO supports descriptor chaining although drivers usually forego it when VIRTIO_F_RING_INDIRECT_DESC is available.
NVMe and io_uring do not support descriptor chaining, instead relying on embedded and external descriptors.
The maximum number of in-flight requests depends on the ring buffer design. Designs where descriptors are occupied from submission until completion prevent descriptor reuse for other requests while the current request is in flight.
An alternative design is where the device processes submitted descriptors and they are considered free again as soon as the device has looked at them. This approach is natural when separate submission and completion queues are used and there is no relationship between the two descriptor rings.
VIRTIO requests occupy descriptors for the duration of their lifetime, at least in the Split Virtqueue format. Therefore the number of in-flight requests is influenced by the descriptor table size.
NVMe has separate Submission Queues and Completion Queues, but its design still limits the number of in-flight requests to the queue size. The Completion Queue Entry's SQ Head Pointer (SQHD) field precludes having more requests in flight than the Submission Queue size because the field would no longer be unique. Additionally, the driver has no way of detecting Submission Queue Head changes, so it only knows there is space for more submissions when completions occur.
io_uring has independent submission (SQ) and completions queues (CQ) with support for more in-flight requests than the ring buffer size. When there are more in-flight requests than CQ capacity, it's possible to overflow the CQ. io_uring has a backlog mechanism for this case, although the intention is for applications to properly size queues to avoid hitting the backlog often.
VIRTIO, NVMe, and io_uring have slightly different takes on queue design. The semantics and performance vary due to these differences. VIRTIO lacks data embedding inside descriptors. io_uring supports more in-flight requests than the queue size. NVMe and io_uring rely on external descriptors with no ability to chain descriptors.
Here is a quickstart for everyone who wants (or needs to) deal with edk2 firmware, with a focus on virtual machine firmware. The article assumes you are using a linux machine with gcc.
To build edk2 you need to have a bunch of tools installed. A compiler and make are required of course, but also iasl, nasm and libuuid. So install them first (package names are for centos/fedora).
dnf install -y make gcc binutils iasl nasm libuuid-devel
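On Debian or Ubuntu based systems the corresponding package set should look roughly like this (the package names are an assumption here, double-check against your release):
apt install make gcc binutils acpica-tools nasm uuid-dev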
If you want to cross-build arm firmware on an x86 machine you also need cross compilers. While being at it, also set the environment variables needed to make the build system use the cross compilers:
dnf install -y gcc-aarch64-linux-gnu gcc-arm-linux-gnu
export GCC5_AARCH64_PREFIX="aarch64-linux-gnu-"
export GCC5_ARM_PREFIX="arm-linux-gnu-"
Next clone the tianocore/edk2 repository and also fetch the git submodules.
git clone https://github.com/tianocore/edk2.git
cd edk2
git submodule update --init
The edksetup script will prepare the build environment for you. The script must be sourced because it sets some environment variables (WORKSPACE being the most important one). This must be done only once (as long as you keep the shell with the configured environment variables open).
source edksetup.sh
Next step is building the BaseTools (also needed only once):
make -C BaseTools
Note: Currently (April 2022) BaseTools are being rewritten in Python, so most likely this step will not be needed any more at some point in the future.
Finally the build (for x64 qemu) can be kicked off:
build -t GCC5 -a X64 -p OvmfPkg/OvmfPkgX64.dsc
The firmware volumes built can be found in Build/OvmfX64/DEBUG_GCC5/FV.
Building the aarch64 firmware instead:
build -t GCC5 -a AARCH64 -p ArmVirtPkg/ArmVirtQemu.dsc
The build results land in Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/FV.
Qemu expects the aarch64 firmware images to be 64M in size. The firmware images can't be used as-is because of that; some padding is needed to create an image which can be used for pflash:
dd of="QEMU_EFI-pflash.raw" if="/dev/zero" bs=1M count=64
dd of="QEMU_EFI-pflash.raw" if="QEMU_EFI.fd" conv=notrunc
dd of="QEMU_VARS-pflash.raw" if="/dev/zero" bs=1M count=64
dd of="QEMU_VARS-pflash.raw" if="QEMU_VARS.fd" conv=notrunc
There are a bunch of compile time options, typically enabled using -D NAME or -D NAME=TRUE. Options which are enabled by default can be turned off using -D NAME=FALSE. Available options are defined in the *.dsc files referenced by the build command. So a feature-complete build looks more like this:
build -t GCC5 -a X64 -p OvmfPkg/OvmfPkgX64.dsc \
-D FD_SIZE_4MB \
-D NETWORK_IP6_ENABLE \
-D NETWORK_HTTP_BOOT_ENABLE \
-D NETWORK_TLS_ENABLE \
-D TPM2_ENABLE
Secure boot support (on x64) requires SMM mode. Well, it builds and works without SMM, but it's not secure then. Without SMM nothing prevents the guest OS writing directly to flash, bypassing the firmware, so protected UEFI variables are not actually protected.
Also suspend (S3) support works with SMM enabled only in case parts of the firmware (PEI specifically, see below for details) run in 32bit mode. So the secure boot variant must be compiled this way:
build -t GCC5 -a IA32 -a X64 -p OvmfPkg/OvmfPkgIa32X64.dsc \
-D FD_SIZE_4MB \
-D SECURE_BOOT_ENABLE \
-D SMM_REQUIRE \
[ ... add network + tpm + other options as needed ... ]
The FD_SIZE_4MB option creates a larger firmware image, being 4MB instead of 2MB (default) in size, offering more space for both code and vars. The RHEL/CentOS builds use that. The Fedora builds are 2MB in size, for historical reasons.
If you need 32-bit firmware builds for some reason, here is how to do it:
build -t GCC5 -a ARM -p ArmVirtPkg/ArmVirtQemu.dsc
build -t GCC5 -a IA32 -p OvmfPkg/OvmfPkgIa32.dsc
The build results will be in Build/ArmVirtQemu-ARM/DEBUG_GCC5/FV and Build/OvmfIa32/DEBUG_GCC5/FV.
The x86 firmware builds create three different images:
OVMF_VARS.fd: the varstore template. libvirt typically stores the per-VM copies of the variable store in /var/lib/libvirt/qemu/nvram.
OVMF_CODE.fd: the firmware code, meant to be combined with a separate varstore.
OVMF.fd: the all-in-one image with both CODE and VARS. This can be loaded as ROM using -bios, with two drawbacks: (a) UEFI variables are not persistent, and (b) it does not work for SMM_REQUIRE=TRUE builds.
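For quick experiments the all-in-one image is still handy; a minimal sketch using the build output path from above:
qemu-system-x86_64 -bios Build/OvmfX64/DEBUG_GCC5/FV/OVMF.fd [ ... ]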
qemu handles pflash storage as block devices, so we have to create block devices for the firmware images:
CODE=${WORKSPACE}/Build/OvmfX64/DEBUG_GCC5/FV/OVMF_CODE.fd
VARS=${WORKSPACE}/Build/OvmfX64/DEBUG_GCC5/FV/OVMF_VARS.fd
qemu-system-x86_64 \
-blockdev node-name=code,driver=file,filename=${CODE},read-only=on \
-blockdev node-name=vars,driver=file,filename=${VARS},snapshot=on \
-machine q35,pflash0=code,pflash1=vars \
[ ... ]
Here is the arm version of that (using the padded files created using dd, see above):
CODE=${WORKSPACE}/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/FV/QEMU_EFI-pflash.raw
VARS=${WORKSPACE}/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/FV/QEMU_VARS-pflash.raw
qemu-system-aarch64 \
-blockdev node-name=code,driver=file,filename=${CODE},read-only=on \
-blockdev node-name=vars,driver=file,filename=${VARS},snapshot=on \
-machine virt,pflash0=code,pflash1=vars \
[ ... ]
The core edk2 repo holds a number of packages, each package has its own toplevel directory. Here are the most interesting ones:
The firmware modules in the edk2 repo are often named after the boot phase they are running in. Most drivers are named SomeThingDxe for example.
This week, Red Hat Enterprise Linux 9 has been announced, which will also bring us lots of new stuff for our beloved mainframe.
First, compared with RHEL 8, a lot of generic packages have been updated, of course. For example, RHEL 9 on IBM Z comes with:
And of course all of these have been thoroughly tested during the past months, which is also the reason why RHEL sometimes does not ship the very latest bleeding edge versions of the upstream projects – thorough testing needs some time. But you can be sure that Red Hat also backported lots of selected upstream fixes and improvements e.g. for the kernel to their downstream packages, so this is very up to date and stable software here.
The first big news is: There is no need anymore to install the separate virt:av (“Advanced Virtualization”) module to get the latest and greatest virtualization features on IBM Z. Everything is packaged along with the main RHEL distribution for easier installation now and will be kept up-to-date there, with important new features like virtio-fs enabled by default. And of course, as with the latest releases of RHEL 8, there is also no limit of 4 guests anymore, so you don’t have to worry about the number of supported KVM guests (as long as your hardware can handle them).
The versions that will be shipped with RHEL 9.0 are:
To answer the maybe most important question: Yes, this will also support the brand new IBM z16 mainframe already. Basic support for this new generation has already been added to QEMU 6.1.0 and kernel 5.14, and additional z16 features have been enabled by default in QEMU 6.2.0.
Another great new change is that it is now possible to configure mediated devices directly with the virtualization CLI tools on IBM Z. You can now add vfio-ap and vfio-ccw mediated devices to your KVM guests using virt-install or virt-xml. With virt-install, you can also create a VM that uses an existing DASD mediated device as its primary disk.
Additionally, many small performance improvements (like the specification exception interpretation feature) and bug fixes have been backported to the RHEL 9 kernel and the userspace tools to give you a great virtualization experience with RHEL 9.
One more thing that is worth mentioning (though it is not specific to IBM Z), which you might have noticed by clicking on the links in the previous paragraphs already: there is another big change in RHEL 9. The development of the upcoming minor RHEL 9 releases (i.e. 9.1, 9.2, etc.) is now done in public via the CentOS Stream repositories. That means you can not only peek at the work that will be integrated in the next 9.y release, you can now even directly participate in the development of the next releases if you like! Isn’t that cool?
Anyway, no matter whether you are planning to participate or just want to use the software, please enjoy the new KVM virtualization stack on the mainframe!
Flatpak is a way to distribute applications on Linux. Its container-style approach allows applications to run across Linux distributions. This means native packages (rpm, deb, etc) are not needed and it's relatively easy to get your app to Linux users with fewer worries about distro compatibility. This makes life a lot easier for developers and is also convenient for users.
I've run popular applications like OBS Studio as flatpaks and even publish my own on Flathub, a popular hosting site for applications. Today I figured out how to debug flatpaks, which requires some extra steps that I'll share below so I don't forget them myself!
If you're building a flatpak of your own application it's handy to use the dir sources type in the manifest to compile your application's source code from a local directory instead of a git tag or tarball URL. This way you can make changes to the source code and test them quickly inside Flatpak.
Put something along these lines in the manifest's modules object, where /home/user/my-app is your local directory with your app's source code:
{
"name": "my-app",
"sources": [
{
"type": "dir",
"path": "/home/user/my-app"
}
],
...
}
flatpak-builder(1) automatically creates a separate .Debug extension for your flatpak that contains your application's debuginfo. You'll need the .Debug extension if you want proper backtraces and source level debugging. At the time of writing the Flatpak documentation did not mention how to install the locally-built .Debug extension. Here is how:
$ flatpak-builder --user --force-clean --install build my.org.app.json
$ flatpak install --user --reinstall --assumeyes "$(pwd)/.flatpak-builder/cache" my.org.app.Debug
It might be a good idea to install debuginfo for the system libraries in your SDK too in case it's not already installed:
$ flatpak install org.kde.Sdk.Debug # or your runtime's SDK
There is a flatpak(1) option that launches the application with the SDK instead of the Runtime:
$ flatpak run --user --devel my.org.app
The SDK contains development tools whereas the Runtime just has the files needed to run applications.
It can also be handy to launch a shell so you can control the launch of your app and maybe use gdb or strace:
$ flatpak run --user --devel --command=sh my.org.app
[📦 my.org.app ~]$ gdb /app/bin/my-app
If your application crashes it will dump core like any other process. However, existing ways of inspecting core dumps like coredumpctl(1) are not fully functional because the process ran inside namespaces and debuginfo is located inside flatpaks instead of the usual system-wide /usr/lib/debug location. coredumpctl(1), gdb, etc aren't Flatpak-aware and need extra help.
Use the flatpak-coredumpctl wrapper to launch gdb:
$ flatpak-coredumpctl -m <PID> my.org.app
You can get PID from the list printed by coredumpctl(1).
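If you do not have the PID handy, list the recorded crashes first with plain coredumpctl(1):
$ coredumpctl list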
This article showed how to install locally-built .Debug extensions and inspect core dumps when using Flatpak. I hope that over time these manual steps will become unnecessary as flatpak-builder(1) and coredumpctl(1) are extended to automatically install .Debug extensions and handle Flatpak core dumps. For now it just takes a few extra commands compared to debugging regular applications.
Canonical released a new LTS (Long Term Support) version of its Ubuntu server offering, Ubuntu Server 22.04!
It ships
Ubuntu Server 21.10 is out!
It ships
TL;DR
We’d like to announce the availability of the QEMU 7.0.0 release. This release contains 2500+ commits from 225 authors.
You can grab the tarball from our download page. The full list of changes are available in the Wiki.
Highlights include:
Thank you to everyone involved!
Today, IBM announced the new IBM z16, with a planned availability date of May 31.
See here for the press release, and here for the official homepage. For further details, including a list of supported Linux distributions, see Eberhard's blog here.
And for a more hands-on tour of the new box, check out this video.
The solution assurance team started to publish solution setups, recommendations, and step-by-step guidelines for a broad range of topics, for example:
You can access the materials here, but don't forget to check back periodically: There is more to come!
This new publication aims towards providing practical insights for running real-world workloads on KVM on IBM Z. From the abstract:
The SAP on IBM Z Performance team, in Poughkeepsie, NY, conducted a series of measurements to assess the performance cost of implementing a KVM environment to host SAP application servers. The tests used SAP (SBS 9.0) core banking workloads, with a Db2 database having 100 million banking accounts, which are comparable to some of the largest banks in the world. Tests were conducted that used both banking workload types, Account Settlement (batch) and Day Posting, which simulates online transactional processing (OLTP). They were executed on an IBM z15 with 16 and 32 Integrated Facility for Linux (IFL) processor configurations, that used various degrees of virtualization.

The document is available here.
We have great news to share: QEMU has been accepted as a Google Summer of Code 2022 organization! Google Summer of Code is an open source internship program offering paid remote work opportunities for contributing to open source. The internship runs from June 13th to September 12th.
Now is the chance to get involved in QEMU development! The QEMU community has put together a list of project ideas here.
Google has dropped the requirement that you need to be enrolled in a higher education course. We’re excited to work with a wider range of contributors this year! For details on the new eligibility requirements, see here.
You can submit your application from April 4th to 19th.
GSoC interns work together with their mentors, experienced QEMU contributors who support their interns in their projects. Code developed during the internship is submitted through the same open source development process that all QEMU contributions follow. This gives interns experience with contributing to open source software. Some interns then choose to pursue a career in open source software after completing their internship.
If you have questions about applying for QEMU GSoC, please email Stefan Hajnoczi or ask on the #qemu-gsoc IRC channel.
Sergio Lopez sent a QEMU patch series and vhost-user protocol specification update that maps vhost-user to non-Linux POSIX host operating systems. This is great news because vhost-user has become a popular way to develop emulated devices in any programming language that execute as separate processes with their own security sandboxing. Until now they have only been available on Linux hosts.
At the moment the BSD and macOS implementation is slower than the Linux implementation because the KVM ioeventfd and irqfd primitives are unavailable on those operating systems. Instead, POSIX pipes are used and the VMM (QEMU) needs to act as a forwarder for MMIO/PIO accesses and interrupt injections. On Linux the kvm.ko kernel module has direct support for this, bypassing the VMM process and achieving higher efficiency. However, similar mechanisms could be added to non-KVM virtualization drivers in the future.
This means that vhost-user devices can now start to support multiple host operating systems and I'm sure they will be used in new ways that no one thought about before.
New project: Tools for ovmf (and armvirt) firmware volumes.
It's written in python and can be installed with a simple pip3 install ovmfctl. The project is hosted at gitlab.
Usage: ovmfctl --input file.fd.
It's a debugging tool which just prints the structure and content of firmware volumes.
This is a tool to print and modify variable store volumes. Main focus has been on certificate handling so far.
Enrolling certificates for secure boot support in virtual machines has been a rather painful process. It's handled by EnrollDefaultKeys.efi which needs to be started inside a virtual machine to enroll the certificates and enable secure boot mode.
With ovmfctl it is dead simple:
ovmfctl --input /usr/share/edk2/ovmf/OVMF_VARS.fd \
--enroll-redhat \
--secure-boot \
--output file.fd
This enrolls the Red Hat Secure Boot certificate which is used by Fedora, CentOS and RHEL as platform key. The usual Microsoft certificates are added to the certificate database too, so windows guests and shim.efi work as expected.
If you want more fine-grained control you can use the --set-pk, --add-kek, --add-db and --add-mok switches instead. The --enroll-redhat switch above is actually just a shortcut for:
--set-pk a0baa8a3-041d-48a8-bc87-c36d121b5e3d RedHatSecureBootPKKEKkey1.pem \
--add-kek a0baa8a3-041d-48a8-bc87-c36d121b5e3d RedHatSecureBootPKKEKkey1.pem \
--add-kek 77fa9abd-0359-4d32-bd60-28f4e78f784b MicrosoftCorporationKEKCA2011.pem \
--add-db 77fa9abd-0359-4d32-bd60-28f4e78f784b MicrosoftWindowsProductionPCA2011.pem \
--add-db 77fa9abd-0359-4d32-bd60-28f4e78f784b MicrosoftCorporationUEFICA2011.pem
If you just want the variable store to be printed use ovmfctl --input file.fd --print. Add --hexdump for more details.
Extract all certificates: ovmfctl --input file.fd --extract-certs.
Try ovmfctl --help for a complete list of command line switches. Note that input and output file can be identical for in-place updates.
That's it. Enjoy!
QEMU is offering open source internships in Outreachy’s May-August 2022 round. You can submit your application until February 25th 2022 if you want to contribute to QEMU in a remote work internship this summer.
Outreachy internships are extended to people who are subject to systemic bias and underrepresentation in the technical industry where they are living. For details on applying, please see the Outreachy website. If you are not eligible, don’t worry, QEMU is also applying to participate in Google Summer of Code again and we hope to share news about additional internships later this year.
Outreachy interns work together with their mentors, experienced QEMU contributors who support their interns in their projects. Code developed during the internship is submitted via the same open source development process that all QEMU code follows. This gives interns experience with contributing to open source software. Some interns then choose to pursue a career in open source software after completing their internship.
Now is the chance to get involved in QEMU development!
If you have questions about applying for QEMU Outreachy, please email Stefan Hajnoczi or ask on the #qemu-gsoc IRC channel.
I will give a talk titled What's coming in VIRTIO 1.2: New virtual I/O devices and features on Saturday, February 5th 2022 at 10:00 CET at the FOSDEM virtual conference (it's free and there is no registration!). The 9 new device types will be covered, as well as some of the other features that have been added to the upcoming 1.2 release of the VIRTIO specification. I hope to see you there and if you miss it there will be slides and video available afterwards.
In the first part of this article, I talked about how you can use versioned machine types to ensure compatibility. But the more interesting part is how this actually works under the covers.
QEMU devices often come with a list of properties that influence how the device is created and how it operates. Typically, authors try to come up with reasonable default values, which may be overridden if desired. However, the idea of what is considered reasonable may change over time, and a newer QEMU may provide a different default value for a property.
If you want to migrate a guest from an older QEMU machine to a more recent QEMU, you obviously need to use the default values from that older QEMU machine as well. For that, QEMU uses arrays of GlobalProperty structures.
If you take a look at hw/core/machine.c, you will notice several arrays named hw_compat_<major>_<minor>. These contain triplets specifying (from right to left) the default value for a certain property for a certain device. The arrays are designed to be included by the compat machine for <major>.<minor>, thus specifying a default value for that machine version and older. (More on this later in this article.)
For example, QEMU 5.2 changed the default number of virtio queues defined for virtio-blk and virtio-scsi devices: prior to 5.2, one queue would be present if no other value had been specified; with 5.2, the default number of queues would align with the number of vcpus for virtio-pci. Therefore, hw_compat_5_1 contains the following lines:
{ "virtio-blk-device", "num-queues", "1"},
{ "virtio-scsi-device", "num_queues", "1"},
(and some corresponding lines for vhost.) This makes sure that any virtio-blk or virtio-scsi device on a -5.1 or older machine type will have one virtio queue per default. Note that this holds true for all virtio-blk and virtio-scsi devices, regardless of which transport they are using; for transports like ccw where nothing changed with 5.2, this simply does not make any difference.
Generally, statements for all devices can go into the hw_compat_ arrays; if a device is not present or even not available at all for the machine that is started, the statement will simply not take any effect.
For the x86 machine types (pc-i440fx and pc-q35), pc_compat_<major>_<minor> arrays are defined in hw/i386/pc.c, mostly covering properties for x86 cpus, but also some other x86-specific devices.
Some incompatible changes are not happening at the device property level, so the compat properties approach cannot be used. Instead, the individual machines need to take care of those changes.
For example, in QEMU 6.2 the smp parsing code started to prefer cores over sockets instead of preferring sockets. Therefore, all 6.1 compat machines have code like
m->smp_props.prefer_sockets = true;
to set prefer_sockets to true in the MachineClass. (Note that the m68k virt machine does not support smp, and therefore does not need that statement.)
Machines also sometimes need to configure associated capabilities in a compatible way. For example, the s390x cpu models may gain new feature flags in newer QEMU releases; when using a compat machine, those new flags need to be off in the cpu models that are used by default.
Compat machines for older machine types need the compatibility changes for newer machine types as well as some changes on top. Typically, this is done by the MachineState respectively MachineClass initializing functions for version n-1 calling the respective initializing functions for version n. As all new compatibility changes are added for the latest versioned machine type, changes are propagated down the whole stack of versions.
All machine types for version n include the hw_compat_<n> array (and the pc_compat_<n> array for x86), unless they are the latest version (which does not need any compat handling yet.) The older compat property arrays are included via the inheritance mechanism.
QEMU currently supports versioned machine types for x86 (pc-i440fx, pc-q35), arm (virt), aarch64 (virt), s390x (s390-ccw-virtio), ppc64 (pseries), and m68k (virt). At the beginning of each development cycle, new (empty) arrays of compat properties for the last version are added and wired up in the machine types for that last version, new versions of each of these machines are added to the code, and the defaults switched to them (well, that’s the goal.) After that, the framework for adding incompatible changes is in place.
If you find that these changes have not yet been made when you plan to make an incompatible change, it is important that you add the new machine types first.
If you plan to change the default value of a device property, or add a new property with a default value that will cause guest-observable changes, you need to add an entry that preserves the old value (or sets a value that does not change the behaviour) to the compat property array for the last version. In general (non-x86 specific change), that means adding it to the hw_compat_ array, and all machine types will use it automatically.
Take care to use the right device for specifying the property; for example, there is often some confusion when dealing with virtio devices. If you e.g. modify a virtio-blk property (as in the example above), you need to add a statement for virtio-blk-device and not for virtio-blk-pci, or virtio-blk instances using the ccw or mmio transports would be left out. If, on the other hand, you modify a property only for virtio-blk devices using the pci transport, you need to add a statement for virtio-blk-pci. Similar considerations apply to other devices inheriting from base types.
If you change a non-device default characteristic, you need to add a compatibility statement for the machine types for the last version in their instance (or class) init functions. The hardest part here is making sure that all relevant machine types get the update.
For example, if you add a change in the s390x cpu models, it is easy to see that you only need to modify the code for the s390-ccw-virtio machine. For other changes, every versioned machine needs the change. And there are cases like the prefer_sockets change mentioned above, that apply to any machine type that supports smp.
I hope that these explanations help a bit with understanding how machine type compatibility works, and where to add your own changes.
If you want to migrate a guest initially started on an older QEMU version to a newer version of QEMU, you need to make sure that the two machines are actually compatible with each other. Once you exclude things like devices that cannot be migrated at all and make sure both QEMU invocations actually create the same virtual hardware, this basically boils down to using compatible machines.
If you simply want to create a machine without any consideration regarding migration compatibility, you will usually do something like
qemu-system-ppc64 -machine pseries (...)
This will create a machine of the pseries type. But in this case, pseries is actually an alias to the latest version of this machine type; for 6.2, this would be pseries-6.2. You can find out which machine types are versioned (and which machine types actually exist for a given binary) via -machine ?:
$ qemu-system-ppc64 -machine ?
Supported machines are:
40p IBM RS/6000 7020 (40p)
bamboo bamboo
g3beige Heathrow based PowerMAC
mac99 Mac99 based PowerMAC
mpc8544ds mpc8544ds
none empty machine
pegasos2 Genesi/bPlan Pegasos II
powernv10 IBM PowerNV (Non-Virtualized) POWER10
powernv8 IBM PowerNV (Non-Virtualized) POWER8
powernv IBM PowerNV (Non-Virtualized) POWER9 (alias of powernv9)
powernv9 IBM PowerNV (Non-Virtualized) POWER9
ppce500 generic paravirt e500 platform
pseries-2.1 pSeries Logical Partition (PAPR compliant)
pseries-2.10 pSeries Logical Partition (PAPR compliant)
pseries-2.11 pSeries Logical Partition (PAPR compliant)
pseries-2.12 pSeries Logical Partition (PAPR compliant)
pseries-2.12-sxxm pSeries Logical Partition (PAPR compliant)
pseries-2.2 pSeries Logical Partition (PAPR compliant)
pseries-2.3 pSeries Logical Partition (PAPR compliant)
pseries-2.4 pSeries Logical Partition (PAPR compliant)
pseries-2.5 pSeries Logical Partition (PAPR compliant)
pseries-2.6 pSeries Logical Partition (PAPR compliant)
pseries-2.7 pSeries Logical Partition (PAPR compliant)
pseries-2.8 pSeries Logical Partition (PAPR compliant)
pseries-2.9 pSeries Logical Partition (PAPR compliant)
pseries-3.0 pSeries Logical Partition (PAPR compliant)
pseries-3.1 pSeries Logical Partition (PAPR compliant)
pseries-4.0 pSeries Logical Partition (PAPR compliant)
pseries-4.1 pSeries Logical Partition (PAPR compliant)
pseries-4.2 pSeries Logical Partition (PAPR compliant)
pseries-5.0 pSeries Logical Partition (PAPR compliant)
pseries-5.1 pSeries Logical Partition (PAPR compliant)
pseries-5.2 pSeries Logical Partition (PAPR compliant)
pseries-6.0 pSeries Logical Partition (PAPR compliant)
pseries-6.1 pSeries Logical Partition (PAPR compliant)
pseries pSeries Logical Partition (PAPR compliant) (alias of pseries-6.2)
pseries-6.2 pSeries Logical Partition (PAPR compliant) (default)
ref405ep ref405ep
sam460ex aCube Sam460ex
taihu taihu
virtex-ml507 Xilinx Virtex ML507 reference design
As you can see, there are various pseries-x.y machine types for older versions; these are designed to present a configuration that is compatible with a default machine that was created with an older QEMU version. For example, if you wanted to migrate a guest running on a pseries machine that was created using QEMU 5.1, the receiving QEMU would need to be started with
qemu-system-ppc64 -machine pseries-5.1 (...)
Note: the following applies to upstream QEMU. Distributions may support different versioned machine types in their builds.
This list is as of QEMU 6.2; new versioned machine types may be added in the future, and sometimes old ones deprecated and removed. The machine types for the next QEMU release are usually introduced early in the release cycle (at least, that is the goal…)
The virt machine type (arm/aarch64) supports versions since 2.6.
The virt machine type (m68k) supports versions since 6.0.
The pseries machine type supports versions since 2.1.
The s390-ccw-virtio machine type supports versions since 2.4.
The pc-i440fx machine type supports versions since 1.4 (there used to be even older ones, but they have been removed), while the pc-q35 machine type supports versions since 2.4.
There’s an additional thing to consider here: the pc machine type alias points (as of QEMU 6.2) to the latest pc-i440fx machine type; if you want the latest pc-q35 machine type instead, you have to use q35.
If you want to simply fire up a QEMU instance and shut it down again without wanting to migrate it anywhere, you can stick to the default machine type. However, if you might want to migrate the machine later, it is probably a good idea to specify a versioned machine type explicitly, so that you don’t have to remember which QEMU version you started it with.
Or just use management software like libvirt, which will do the machine type expansion to the latest version for you automatically, so you don’t have to worry about it later.
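For example, if a guest was defined with the bare q35 alias, libvirt will have stored the expanded, versioned machine type in the domain XML; the guest name and the version shown here are only illustrative:
$ virsh dumpxml guest | grep machine=
<type arch='x86_64' machine='pc-q35-6.2'>hvm</type>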
This concludes the usage part of compatible machine types; a follow-up post will look at how this is actually implemented.
We’d like to announce the availability of the QEMU 6.2.0 release. This release contains 2300+ commits from 189 authors.
You can grab the tarball from our download page. The full list of changes are available in the Wiki.
Highlights include:
Thank you to everyone involved!
Sometimes it's necessary to add debuginfo so perf-report(1) and similar commands can display human-readable function names instead of raw addresses. For instance, if a program was built from source and stripped of symbols when installing into /usr/local/bin/ then perf(1) does not have the symbol information available.
perf(1) maintains a cache of debuginfos keyed by the build-id (also known as .note.gnu.build-id) that uniquely identifies executables and shared objects on Linux. perf.data files contain the build-ids involved when the data was recorded. This allows perf-report(1) and similar commands to look up the required debuginfo from the build-id cache for address to function name translation.
If perf-report(1) displays raw addresses instead of human-readable function names, then we need to get the debuginfo for the build-ids in the perf.data file and add it to the build-id cache. You can show the build-ids required by a perf.data file with perf-buildid-list(1):
$ perf buildid-list # reads ./perf.data by default
b022da126fad1e0a287a6a25016f6c7c996e68c9 /lib/modules/5.14.11-200.fc34.x86_64/kernel/arch/x86/kvm/kvm-intel.ko.xz
f8aa9d9bf047e67b76f22426ad4af310f9b0325a /lib/modules/5.14.11-200.fc34.x86_64/kernel/arch/x86/kvm/kvm.ko.xz
6740f24c4733268d03b41f9483282297dde6b286 [vdso]
Your build-id cache may be missing debuginfo or have limited debuginfo with less symbol information than you need. For example, if data was collected from a stripped /usr/local/bin/my-program executable and you now want to update the build-id cache with the executable that contains full debuginfo, use the perf-buildid-cache(1) command:
$ perf buildid-cache --update=path/to/my-program-with-symbols
There is also an --add=path/to/debuginfo option for adding new build-ids that are not yet in the cache.
Now perf-report(1) and similar tools will display human-readable function names from path/to/my-program-with-symbols instead of the stripped /usr/local/bin/my-program executable. If that doesn't work, verify that the build-ids in my-program-with-symbols and my-program match.
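One way to check that (the file names are just placeholders) is to compare the GNU build-id notes of the two binaries with readelf from binutils; both commands should print the same "Build ID:" value:
$ readelf -n /usr/local/bin/my-program | grep -i "build id"
$ readelf -n path/to/my-program-with-symbols | grep -i "build id"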
RHEL 8.5 Advanced Virtualization (AV) is out! See the official announcement and the release notes.
KVM is supported via Advanced Virtualization, and provides
Furthermore, RHEL 8.5 AV now makes it possible to persist mediated devices.
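As a rough sketch of how that looks in practice (the UUID, parent address and mdev type below are made up, and the exact mdevctl option names may differ, so check mdevctl --help on your system):
# define a persistent mediated device and have it start automatically
mdevctl define --uuid 669d9b23-fe1b-4ecb-be08-a2fabca99b71 \
        --parent 0000:00:02.0 --type i915-GVTg_V5_4 --auto
mdevctl start --uuid 669d9b23-fe1b-4ecb-be08-a2fabca99b71
mdevctl list --defined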
For a detailed list of Linux on Z-specific changes, also see this blog entry at Red Hat.
This blog post describes my mail setup, with a focus on how I handle patch email. Let's start with a general mail overview. I won't go too deep into the details here; the internet has plenty of documentation and configuration tutorials.
Most of my machines have a local postfix configured for outgoing mail. My workstation and my laptop forward all mail (over vpn) to the company internal email server. All I need for this to work is a relayhost line in /etc/postfix/main.cf:
relayhost = [smtp.corp.redhat.com]
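A quick way to check the relay actually works (assuming postfix is running; the address is just a placeholder):
sudo postfix reload
echo "relay test" | sendmail -v someone@example.com
mailq    # shows anything stuck in the local queue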
Most unix utilities (including git send-email) try to send mail using /usr/sbin/sendmail by default. This tool will place the mail in the postfix queue for processing. The name of the binary is a convention dating back to the days when sendmail was the one and only unix mail processing daemon.
All my mail is synced to local maildir storage. I'm using offlineimap for the job. Plenty of other tools exist, isync is another popular choice.
Local mail storage has the advantage that reading mail is faster, especially if you are on a slow internet link. It also allows you to easily index and search all your mail with notmuch.
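To give a rough idea (the search terms are arbitrary examples), indexing and searching with notmuch looks like this:
notmuch new                                                   # index newly synced mail
notmuch search tag:unread subject:PATCH                       # query the index
notmuch show --format=mbox --entire-thread=true id:<message-id>   # dump a thread as mbox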
I'm using server side filtering. The major advantage is that I always have the same view on all my mail. I can use a mail client on my workstation, the web interface or a mobile phone. Doesn't matter, I always see the same folder structure.
All modern email clients should be able to use maildir folders. I'm using neomutt. I also have used thunderbird and evolution in the past. All working fine.
The reason I use neomutt is that it is simply faster than GUI-based mailers, which matters when you have to handle a lot of email. It is also very easy to hook up scripts, which is quite useful when it comes to patch processing.
I'm using git send-email for the simple cases and git-publish for the more complex ones. "Simple" typically means a single change (not a patch series) where it is unlikely that I will have to send another version addressing review comments.
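For the simple case this boils down to something like the following (the list address is just an example):
# send the top-most commit as a single patch, with a chance to edit it first
git send-email --annotate --to=qemu-devel@nongnu.org -1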
git publish keeps track of the revisions you have sent by storing a git tag in your repo. It also stores the cover letter and the list of people Cc'ed on the patch, so sending out a new revision of a patch series is much easier than with plain git send-email.
git publish also features config profiles. This is helpful for larger projects where different subsystems use different mailing lists (and possibly different development branches too).
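Profiles live in a .gitpublish file in the repository. The snippet below is only a sketch from memory; key names and addresses may not match your setup, so treat it as an illustration rather than a reference:
[gitpublishprofile "default"]
base = master
to = qemu-devel@nongnu.org

[gitpublishprofile "block"]
base = master
to = qemu-devel@nongnu.org
cc = qemu-block@nongnu.org
# select with: git publish --profile block   (option name assumed)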
So, here comes the more interesting part: hooking scripts into neomutt for patch processing. Let's start with the config (~/.muttrc) snippet:
# patch processing
bind index,pager p noop # default: print
macro index,pager pa "<pipe-entry>~/.mutt/bin/patch-apply.sh<enter>"
macro index,pager pl "<pipe-entry>~/.mutt/bin/patch-lore.sh<enter>"
First I map the 'p' key to noop (instead of print, which is the default configuration), which allows using two-key combinations starting with 'p' for patch processing. Then 'pa' is configured to run my patch-apply.sh script, and 'pl' runs patch-lore.sh.
Let's have a look at the patch-apply.sh script, which applies a single patch:
#!/bin/sh
# store patch
file="$(mktemp ${TMPDIR-/tmp}/mutt-patch-apply-XXXXXXXX)"
trap "rm -f $file" EXIT
cat > "$file"
# find project
source ~/.mutt/bin/patch-find-project.sh
if test "$project" = ""; then
echo "ERROR: can't figure project"
exit 1
fi
# go!
clear
cd $HOME/projects/$project
branch=$(git rev-parse --abbrev-ref HEAD)
clear
echo "#"
echo "# try applying patch to $project, branch $branch"
echo "#"
if git am --message-id --3way --ignore-whitespace --whitespace=fix "$file"; then
echo "#"
echo "# OK"
echo "#"
else
echo "# FAILED, cleaning up"
cp -v .git/rebase-apply/patch patch-apply-failed.diff
cp -v "$file" patch-apply-failed.mail
git am --abort
git reset --hard
fi
The mail is passed to the script on stdin, so the first thing the script does is store that mail in a temporary file. Next it tries to figure out which project the patch is for. The logic for that is in a separate file so other scripts can share it, see below. Finally it tries to apply the patch using git am. In case of a failure it stores both the decoded patch and the complete email before cleaning up and exiting.
Now for patch-find-project.sh. This script snippet tries to figure out the project by checking which mailing list the mail was sent to:
#!/bin/sh
if test "$PATCH_PROJECT" != ""; then
project="$PATCH_PROJECT"
elif grep -q -e "devel@edk2.groups.io" "$file"; then
project="edk2"
elif grep -q -e "qemu-devel@nongnu.org" "$file"; then
project="qemu"
# [ ... more checks snipped ... ]
fi
if test "$project" = ""; then
echo "Can't figure project automatically."
echo "Use env var PATCH_PROJECT to specify."
fi
The PATCH_PROJECT environment variable can be used to override the autodetect logic if needed.
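For example, when a patch arrives via a mailing list the autodetection doesn't know about, the mail can be saved and piped into the script manually (the file name is made up):
PATCH_PROJECT=qemu ~/.mutt/bin/patch-apply.sh < saved-patch.eml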
The last script is patch-lore.sh. That one tries to apply a complete patch series, with the help of the b4 tool. b4 makes patch series management an order of magnitude simpler. It will find the latest revision of a patch series, bring the patches into the correct order, pick up tags (Reviewed-by, Tested-by etc.) from replies, check signatures, and more.
#!/bin/sh
# store patch
file="$(mktemp ${TMPDIR-/tmp}/mutt-patch-queue-XXXXXXXX)"
trap "rm -f $file" EXIT
cat > "$file"
# find project
source ~/.mutt/bin/patch-find-project.sh
if test "$project" = ""; then
echo "ERROR: can't figure project"
exit 1
fi
# find msgid
msgid=$(grep -i -e "^message-id:" "$file" | head -n 1 \
| sed -e 's/.*<//' -e 's/>.*//')
# go!
clear
cd $HOME/projects/$project
branch=$(git rev-parse --abbrev-ref HEAD)
clear
echo "#"
echo "# try queuing patch (series) for $project, branch $branch"
echo "#"
echo "# msgid: $msgid"
echo "#"
# create work dir
WORK="${TMPDIR-/tmp}/${0##*/}-$$"
mkdir "$WORK" || exit 1
trap 'rm -rf $file "$WORK"' EXIT
echo "# fetching from lore ..."
echo "#"
b4 am --outdir "$WORK" \
--apply-cover-trailers \
--sloppy-trailers \
$msgid || exit 1
count=$(ls $WORK/*.mbx 2>/dev/null | wc -l)
if test "$count" = "0"; then
echo "#"
echo "# got nothing, trying notmuch instead ..."
echo "#"
echo "# update db ..."
notmuch new
echo "# find thread ..."
notmuch show \
--format=mbox \
--entire-thread=true \
id:$msgid > $WORK/notmuch.thread
echo "# process mails ..."
b4 am --outdir "$WORK" \
--apply-cover-trailers \
--sloppy-trailers \
--use-local-mbox $WORK/notmuch.thread \
$msgid || exit 1
count=$(ls $WORK/*.mbx 2>/dev/null | wc -l)
fi
echo "#"
echo "# got $count patches, trying to apply ..."
echo "#"
if git am -m -3 $WORK/*.mbx; then
echo "#"
echo "# OK"
echo "#"
else
echo "# FAILED, cleaning up"
git am --abort
git reset --hard
fi
The first part of the script (store mail, find project) is the same as in patch-apply.sh. Then the script extracts the message id of the mail passed in and feeds that into b4. b4 will try to find the email thread on lore.kernel.org. In case this doesn't return results the script queries notmuch for the email thread instead and feeds that into b4 using the --use-local-mbox switch. Finally it tries to apply the complete patch series prepared by b4 with git am.
So, with all that in place, applying a patch series is just two key strokes in neomutt. Well, almost. I still need a terminal on the side to make sure the correct branch is checked out, to run build tests, etc.
This article gives an overview of the architecture of Urbit applications. I spent a weekend trying out Urbit, reading documentation, and digging through the source code. I'm always on the lookout for the next milestone system that will change the internet and computing landscape. In particular, I find decentralized and peer-to-peer systems interesting because I have a sense that the internet is not quite right. It could be better if only someone could figure out how to do that and make it mainstream.
Urbit is an operating system and network designed to give users control by running applications on personal servers instead of on centralized servers operated by the application creators. This means data is stored on personal servers and is not immediately accessible to application creators. Both the Urbit operating system and network run on top of existing computing infrastructure. It's not a baremetal operating system (it runs under Linux, macOS, and Windows) or a new Layer 3 network protocol (it uses UDP). If you want to know more there is an overview of Urbit here.
The Urbit kernel, Arvo, is a single-function operating system in the sense of purely functional programming. The operating system function takes the previous state and input events and produces the next state and output events. This means that the state of the system can be saved after each invocation. If there is a power failure or the system otherwise stops execution it's easy to resume it later from the last state.
Urbit has a virtual machine and runtime that supports this programming environment. The low-level language is called Nock and the higher-level language is called Hoon. I haven't investigated them in detail, but they appear to support deterministic purely functional programming with I/O and other side-effects kept outside via monads and passing around inputs like the current time.
Applications, also called agents, follow the purely functional model where they produce the next state as their result. Agents expose their services in three ways: peeks (read-only queries), pokes (one-off requests or commands), and subscriptions (streams of updates).
For example, an application that keeps a counter can define a poke interface for incrementing the counter and a peek interface for querying its value. A subscription can be used to receive an update whenever the counter changes.
Urbit supports peeks, pokes, and subscriptions over the network. This is how applications running on different personal servers can communicate. If we want to replicate a remote counter we can subscribe to it and then poke our local counter every time an update is received. This replication model leads to the store/hook/view architecture, a way of splitting applications into components that support local state, remote replication, and a user interface. In our counter example the store would be the counter, the hook would be the code that replicates remote counters, and the view would provide any logic needed for the user interface to control the counter.
User interfaces for applications are typically implemented in Landscape, a web-based user interface for interacting with Urbit from your browser. The user interface can be a modern JavaScript application that communicates with the agent running inside Urbit via the HTTP JSON API. This API supports peeks, pokes, and subscriptions. In other words, the application's backend is implemented as an Urbit agent while the frontend is a regular client-side web application.
Of course there are also APIs for data storage, networking, HTTP, etc. For example, the weather widget in Landscape fetches the weather from a web service using an HTTP request.
Urbit also supports peer discovery so you can resolve the funny IDs like ~bitbet-bolbel and establish connections to remote Urbit instances. The IDs are allocated hierarchically and ultimately registered on the Ethereum blockchain.
Keep in mind I only spent a weekend investigating Urbit so I don't understand the full system and could be wrong about what I have described. Also, I've spent a lot of time and am therefore invested in Linux and conventional programming environments. Feedback from the Urbit community is welcome, just send me an email or message me on IRC or Matrix.
The application and network model is intended for personal servers. I don't think people want personal servers. It's been tried before by Sandstorm, FreedomBox, and various projects without mainstream success. I think a more interesting model for the powerful devices we run today is one without any "server" at all. Instead of having an always-on server that is hosted somewhere, apps should be able to replicate and sync directly between a laptop and a phone. Having the option to run a personal server for always-on services like chat rooms or file hosting is nice, but many things don't need this. I wish Urbit was less focussed on personal servers and more on apps that replicate and sync directly between each other.
Urbit is obfuscated by the most extreme not invented here (NIH) syndrome I have ever seen. I tried to keep the terminology at a minimum in this article, so it might not be obvious unless you dive into the documentation or source code yourself. Not only is most of it a reinvention of existing stuff but it also uses new terminology for everything. It was disappointing to find that what first appeared like an alien system that might hold interesting discoveries was just a quirky reimplementation of existing concepts.
It will be difficult for Urbit to catch on as a platform since it has no common terminology with existing programming environments. If you want to write an app for Urbit using the Hoon programming language you'll have to wade through a lot of NIH at every level of the stack (programming language, operating system, APIs). There is an argument that reinventing everything allows the whole system to be small and self-contained, but in practice that's not true since Landscape apps are JavaScript web applications. They drag in the entire conventional computing environment that Urbit was supposed to replace. I wonder if the same kind of system can be built on top of a browser plus Deno with WebRTC for the server side, reusing existing technology that is actively developed by teams much larger than Urbit. It seems like a waste because Urbit doesn't really innovate in VMs, programming languages, etc yet the NIH stuff needs to be maintained.
Finally, for a system that is very much exposed to the network, I didn't see a strong discipline or much support for writing secure applications. The degree of network transparency that Urbit agents have also means that they present an attack surface. I would have expected the documentation and APIs/tooling to steer developers in a direction where it's hard to make mistakes. My impression is that a lot of the attack surface in agents is hand coded and security issues could become commonplace when Urbit gains more apps written by a wider developer community.
Despite this criticism I really enjoyed playing with Urbit. It is a cool rabbit hole to go down.
Urbit applications boil down to a relatively familiar interface similar to what can be done with gRPC: command/response, querying data, and subscriptions. The Urbit network allows applications to talk to each other directly in a peer-to-peer fashion. Users run apps on personal servers instead of centralized servers operated by the application creators (like Twitter, Facebook, etc). If Urbit can attract enough early adopters then it could become an interesting operating system and ecosystem that overcomes some of the issues of today's centralized internet. If you're wondering, I think it's worth spending a weekend exploring Urbit!