Blogging about open source virtualization

News from QEMU, KVM, libvirt, libguestfs, virt-manager and related tools

May 22, 2020

Stefan Hajnoczi

How to check VIRTIO feature bits inside Linux guests

VIRTIO devices have feature bits that indicate the presence of optional features. The feature bit space is divided into core VIRTIO features (e.g. notify on empty), transport-specific features (PCI, MMIO, CCW), and device-specific features (e.g. virtio-net checksum offloading). This article shows how to check whether a feature is enabled inside Linux guests.

The feature bits are used during VIRTIO device initialization to negotiate features between the device and the driver. The device reports a fixed set of features, typically all the features that the device implementors wanted to offer from the VIRTIO specification version that they developed against. The driver likewise reports the features its developers chose to support from the specification version they developed against.

Feature bit negotiation determines the subset of features supported by both the device and the driver. A new driver might not be able to enable all the features it supports if the device is too old, and vice versa. This provides compatibility between devices and drivers. It also means that you don't know which features are enabled until the device and driver have negotiated them at runtime.
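On the QEMU side, the set of features a device offers can often be trimmed with device properties, which in turn changes the outcome of negotiation. For example (the drive id and device type are illustrative; event_idx and indirect_desc are standard virtio device properties in QEMU), a device can be made to stop offering certain ring features:

-device virtio-blk-pci,drive=drive0,event_idx=off,indirect_desc=off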

Where to find feature bit definitions

VIRTIO feature bits are listed in the VIRTIO specification. You can also grep the linux/virtio_*.h header files:


$ grep VIRTIO.*_F_ /usr/include/linux/virtio_*.h
virtio_ring.h:#define VIRTIO_RING_F_INDIRECT_DESC 28
virtio_ring.h:#define VIRTIO_RING_F_EVENT_IDX 29
virtio_scsi.h:#define VIRTIO_SCSI_F_INOUT 0
virtio_scsi.h:#define VIRTIO_SCSI_F_HOTPLUG 1
virtio_scsi.h:#define VIRTIO_SCSI_F_CHANGE 2
...

Here the VIRTIO_SCSI_F_INOUT (0) constant refers to the first bit, bit 0 (1ull << 0). Bit numbering can be confusing because different standards, vendors, and languages express it differently; it helps to think of a bit shift operation like 1 << BIT.
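For example, feature bit 29 (VIRTIO_RING_F_EVENT_IDX) corresponds to the mask 1 << 29, which shell arithmetic confirms:

$ echo $(( 1 << 29 ))
536870912
$ printf '0x%x\n' $(( 1 << 29 ))
0x20000000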

How to check feature bits inside the guest

The Linux virtio.ko driver that is used for all VIRTIO devices has a sysfs file called features. This file contains the feature bits in binary representation, beginning with bit 0 on the left and more significant bits to the right. The reported bits are the subset that both the device and the driver support.

To check if the virtio-blk device /dev/vda has the VIRTIO_RING_F_EVENT_IDX (29) bit set:


$ python -c "print('$(</sys/block/vda/device/driver/virtio*/features)'[29])"
01100010011101100000000000100010100

Other device types can be found through similar sysfs paths.
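For example, assuming the standard /sys/bus/virtio layout, the negotiated features of every VIRTIO device in the guest can be listed in one go (device numbering will vary):

$ grep . /sys/bus/virtio/devices/virtio*/features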

by Unknown (noreply@blogger.com) at May 22, 2020 01:12 PM

May 01, 2020

Daniel Berrange

ANNOUNCE: virt-viewer version 9.0 released

I am happy to announce a new bugfix release of virt-viewer 9.0 (gpg), including experimental Windows installers for Win x86 MSI (gpg) and Win x64 MSI (gpg).

Signatures are created with key DAF3 A6FD B26B 6291 2D0E 8E3F BE86 EBB4 1510 4FDF (4096R)

With this release the project has moved over to use GitLab for its hosting needs instead of Pagure. Instead of sending patches to the old mailing list, we have adopted modern best practices and now welcome contributions as merge requests, from where they undergo automated CI testing of the build. Bug reports directed towards upstream maintainers should also be filed at the GitLab project now instead of the Red Hat Bugzilla.

All historical releases are available from:

http://virt-manager.org/download/

Changes in this release include:

  • Project moved to https://gitlab.com/virt-viewer/virt-viewer
  • Allow toggling shared clipboard in remote-viewer
  • Fix handling when initial spice connection fails
  • Fix check for govirt library
  • Add bash completion of cli args
  • Improve errors in file transfer dialog
  • Fix ovirt foreign menu storage domains query
  • Prefer TLS certs from oVirt instead of CLI
  • Improve USB device cleanup when Ctrl-C is used
  • Remember monitor mappings across restarts
  • Add a default file extension to screenshots
  • Updated translations
  • Fix misc memory leaks

by Daniel Berrange at May 01, 2020 05:19 PM

April 30, 2020

KVM on Z

QEMU v5.0 released

QEMU v5.0 is out. For highlights from a KVM on Z perspective see the Release Notes.

by Stefan Raspl (noreply@blogger.com) at April 30, 2020 05:28 PM

Stefan Hajnoczi

How the Linux VFS, block layer, and device drivers fit together

The Linux kernel storage stack consists of several components including the Virtual File System (VFS) layer, the block layer, and device drivers. This article gives an overview of the main objects that a device driver interacts with and their relationships to each other. Actual I/O requests are not covered; instead the focus is on the objects representing the disk.

Let's start with a diagram of the key data structures and then an explanation of how they work together.

The Virtual File System (VFS) layer

The VFS layer is where file system concepts like files and directories are handled. The VFS provides an interface that file systems like ext4, XFS, and NFS implement to register themselves with the kernel and participate in file system operations. The struct file_operations interface is the most interesting for device drivers as we are about to see.

System calls like open(2), read(2), etc are handled by the VFS and dispatched to the appropriate struct file_operations functions.

Block device nodes like /dev/sda are implemented in fs/block_dev.c, which forms a bridge between the VFS and the Linux block layer. The block layer handles the actual I/O requests and is aware of disk-specific information like capacity and block size.

The main VFS concept that device drivers need to be aware of is struct block_device_operations and the struct block_device instances that represent block devices in Linux. A struct block_device connects the VFS inode and struct file_operations interface with the block layer struct gendisk and struct request_queue.

In Linux there are separate device nodes for the whole device (/dev/sda) and its partitions (/dev/sda1, /dev/sda2, etc). This is handled by struct block_device so that a partition has a pointer to its parent in bd_contains.

The block layer

The block layer handles I/O request queues, disk partitions, and other disk-specific functionality. Each disk is represented by a struct gendisk and may have multiple struct hd_struct partitions. There is always part0, a special "partition" covering the entire block device.

I/O requests are placed into queues for processing. Requests can be merged and scheduled by the block layer. Ultimately a device driver receives a request for submission to the physical device. Queues are represented by struct request_queue.

The device driver

The disk device driver registers a struct gendisk with the block layer and sets up the struct request_queue to receive requests that need to be submitted to the physical device.

There is one struct gendisk for the entire device even though userspace may open struct block_device instances for multiple partitions on the disk. Disk partitions are not visible at the driver level because I/O requests have already had their Logical Block Address (LBA) adjusted with the partition start offset.

How it all fits together

The VFS is aware of the block layer struct gendisk. The device driver is aware of both the block layer and the VFS struct block_device. The block layer does not have direct connections to the other components but the device driver provides callbacks.

One of the interesting aspects is that a device driver may drop its reference to struct gendisk but struct block_device instances may still have their references. In this case no I/O can occur anymore because the driver has stopped the disk and the struct request_queue, but userspace processes can still call into the VFS and struct block_device_operations callbacks in the device driver can still be invoked.

Thinking about this case is why I drew the diagram and ended up writing about this topic!

by Unknown (noreply@blogger.com) at April 30, 2020 04:21 PM

KVM on Z

Redbook on KVM on Z



A new Redbook titled "Virtualization Cookbook for IBM Z Volume 5 KVM" is available now. Among other topics, it covers tasks such as installation, host configuration and guest deployments for Linux distributions from Red Hat, SUSE and Ubuntu.


by Stefan Raspl (noreply@blogger.com) at April 30, 2020 10:41 AM

April 29, 2020

QEMU project

QEMU version 5.0.0 released

We’d like to announce the availability of the QEMU 5.0.0 release. This release contains 2800+ commits from 232 authors.

You can grab the tarball from our download page. The full list of changes are available in the Wiki.

Highlights include:

  • Support for passing host filesystem directory to guest via virtiofsd
  • Live migration support for external processes running on QEMU D-Bus
  • Support for using memory backends for main/”built-in” guest RAM
  • block: support for compressed backup images via block jobs
  • block: qemu-img: ‘measure’ command now supports LUKS images, ‘convert’ command now supports skipping zero’ing of target image
  • block: experimental support for qemu-storage-daemon, which provides access to QEMU block-layer/QMP features like block jobs or built-in NBD server without starting a full VM
  • ARM: support for the following architecture features: ARMv8.1 VHE/VMID16/PAN/PMU, ARMv8.2 UAO/DCPoP/ATS1E1/TTCNP, ARMv8.3 RCPC/CCIDX, ARMv8.4 PMU/RCPC
  • ARM: support for Cortex-M7 CPU
  • ARM: new board support for tacoma-bmc, Netduino Plus 2, and Orangepi PC
  • ARM: ‘virt’ machine now supports vTPM and virtio-iommu devices
  • HPPA: graphical console support via HP Artist graphics device
  • MIPS: support for GINVT (global TLB invalidation) instruction
  • PowerPC: ‘pseries’ machine no longer requires reboot to negotiate between XIVE/XICS interrupt controllers when ic-mode=dual
  • PowerPC: ‘powernv’ machine can now emulate KVM hardware acceleration to run KVM guests while in TCG mode
  • PowerPC: support for file-backed NVDIMMs for persistent memory emulation
  • RISC-V: ‘virt’ and ‘sifive_u’ boards now support generic syscon drivers in Linux to control power/reboot
  • RISC-V: ‘virt’ board now supports Goldfish RTC
  • RISC-V: experimental support for v0.5 of draft hypervisor extension
  • s390: support for Adapter Interrupt Suppression while running in KVM mode
  • and lots more…

Thank you to everyone involved!

April 29, 2020 06:00 PM

KVM on Z

RHEL 7 Structure A (KVM host support) Support Lifecycle Extended

Red Hat has updated the Red Hat Enterprise Linux Life Cycle, extending the full support lifecycle for Red Hat Enterprise Linux Structure A on IBM Z to May 31, 2021. See here for details, and here for an entry on the Red Hat Blog, referring to Structure A as "alt packages".
The Structure A release provides updated kernel, QEMU and libvirt packages to run KVM on IBM Z.

by Stefan Raspl (noreply@blogger.com) at April 29, 2020 02:40 PM

April 27, 2020

KVM on Z

Ubuntu 20.04 released

Ubuntu Server 20.04 is out!
It ships with support for Secure Execution, as announced here.
For a detailed list of KVM on Z changes, see the release notes here.

by Stefan Raspl (noreply@blogger.com) at April 27, 2020 01:55 PM

April 24, 2020

ARM Datacenter Project

NUMA balancing

NUMA balancing impact on common benchmarks

NUMA balancing can lead to performance degradation on NUMA-based arm64 systems when tasks migrate and their memory accesses then suffer additional latency.

Platform

System Information
Architecture:       aarch64
Processor version:  Kunpeng 920-6426
CPUs:               128
NUMA nodes:         4
Kernel release:     5.6.0+
Node name:          ARMv2-3
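Automatic NUMA balancing can be toggled at runtime via the kernel.numa_balancing sysctl, which is presumably how the ON/OFF configurations below were selected:

# sysctl kernel.numa_balancing=1   # numa_balancing-ON
# sysctl kernel.numa_balancing=0   # numa_balancing-OFF
# cat /proc/sys/kernel/numa_balancing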

Test results

PerfBenchSchedPipe

perf bench -f simple sched pipe  
Test Result
numa_balancing-ON 10.012 (usecs/op)
numa_balancing-OFF 10.509 (usecs/op)

PerfBenchSchedMessaging

perf bench -f simple sched messaging -l 10000  
Test Result
numa_balancing-ON 6.417 (Sec)
numa_balancing-OFF 6.494 (Sec)

PerfBenchMemMemset

perf bench -f simple  mem memset -s 4GB -l 5 -f default  
Test Result
numa_balancing-ON 17.438783330964565 (GB/sec)
numa_balancing-OFF 17.63163114627642 (GB/sec)

PerfBenchFutexWake

perf bench -f simple futex wake -s -t 1024 -w 1  
Test Result
numa_balancing-ON 9.2742 (ms)
numa_balancing-OFF 9.2178 (ms)

SysBenchCpu

sysbench cpu --time=10 --threads=64 --cpu-max-prime=10000 run  
Test Result
numa_balancing-ON 214960.28 (Events/sec)
numa_balancing-OFF 214965.55 (Events/sec)

SysBenchMemory

sysbench memory --memory-access-mode=rnd --threads=64 run  
Test Result
numa_balancing-ON 1645 (MB/s)
numa_balancing-OFF 1959 (MB/s)

SysBenchThreads

sysbench threads --threads=64 run  
Test Result
numa_balancing-ON 4604 (Events/sec)
numa_balancing-OFF 5390 (Events/sec)

SysBenchMutex

sysbench mutex --mutex-num=1 --threads=512 run  
Test Result
numa_balancing-ON 33.2165 (Sec)
numa_balancing-OFF 32.1088 (Sec)

by Peter at April 24, 2020 11:04 PM

April 23, 2020

ARM Datacenter Project

LISA-QEMU Presentation

We recently gave a presentation on LISA-QEMU to the Linaro Toolchain Working Group.

This presentation highlights our work on LISA-QEMU and provides all the details on what LISA-QEMU is, why we established this project, and how to get up and running creating VMs with the tools we developed.

Please visit the links below to view the presentation or meeting recording.

by Rob Foley at April 23, 2020 03:27 PM

April 22, 2020

ARM Datacenter Project

How to debug kernel using QEMU and aarch64 VM.

QEMU is a great tool to use when you need to debug the kernel.
There are many recipes online for this too; I have listed a few helpful ones at the end of the article for reference.

We would like to share our steps for debugging the kernel, focused on aarch64 systems, as some of the steps are slightly different for this type of system.

First, create a directory to work in and run these commands to create the flash images:

dd if=/dev/zero of=flash1.img bs=1M count=64
dd if=/dev/zero of=flash0.img bs=1M count=64
dd if=/usr/share/qemu-efi-aarch64/QEMU_EFI.fd of=flash0.img conv=notrunc

Next, download a QEMU image. We will use an ubuntu image that we previously created.

We should mention that our procedure involves building our own kernel from scratch, and feeding this image to QEMU.

Thus the first step is to actually create a QEMU image. We will assume you already have an image to use. If not, check out our articles on:

We prefer the first procedure using LISA-QEMU since we also have a helpful script to install your kernel into the VM image automatically.

But don’t worry, if you want to take a different route we will show all the steps for that too!

Installing Kernel

You have a few options here. One is to boot the image and install the kernel manually, or use the LISA-QEMU scripts to install it. The below command will boot the image in case you want to use the manual approach: boot the image, scp in the kernel (maybe a .deb file), and install it manually with dpkg -i on the .deb file.

qemu/build/aarch64-softmmu/qemu-system-aarch64 -nographic\
                    -machine virt,gic-version=max -m 2G -cpu max\
                    -netdev user,id=vnet,hostfwd=:127.0.0.1:0-:22\
                    -device virtio-net-pci,netdev=vnet\ 
                    -drive file=./mini_ubuntu.img,if=none,id=drive0,cache=writeback\ 
                    -device virtio-blk,drive=drive0,bootindex=0\ 
                    -drive file=./flash0.img,format=raw,if=pflash \
                    -drive file=./flash1.img,format=raw,if=pflash -smp 4 

To bring up QEMU with a kernel, typically you will need a kernel image (that you built), an initrd image (built after installing the kernel in your image), and the OS image (created above).

Keep in mind the below steps assume a raw image. If you have a qcow2, then use qemu-img to convert it to raw first. For example:

qemu-img convert -O raw my_image.qcow2 my_image_output.raw

Below is how to mount an image to copy out files. You need to copy out the initrd in this case.

$ mkdir mnt
$ sudo losetup -f -P ubuntu.img
$ sudo losetup -l
NAME       SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE                                DIO LOG-SEC
/dev/loop0         0      0         0  0 ubuntu.img   0     512
$ sudo mount /dev/loop0p2 ./mnt
$ ls ./mnt/boot
config-4.15.0-88-generic  grub                          initrd.img-5.5.11             System.map-5.5.11          vmlinuz-5.5.11
config-5.5.11             initrd.img                    initrd.img.old                vmlinuz                    vmlinuz.old
efi                       initrd.img-4.15.0-88-generic  System.map-4.15.0-88-generic  vmlinuz-4.15.0-88-generic
$ cp ./mnt/initrd.img-5.5.11 .
$ sudo umount ./mnt
$ sudo losetup -d /dev/loop0

Next, boot the kernel you built with your initrd. Note the kernel you built can be found at arch/arm64/boot/Image.

This command line will bring up your kernel image with your initrd and your OS Image.

One item you might need to customize is the "root=/dev/vda2" argument. This tells the kernel where to find your root filesystem. This might vary depending on your VM image.
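If you are not sure which partition holds the root filesystem, listing the partition table of the raw image can help; for example (image name as above):

$ fdisk -l mini_ubuntu.img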

qemu/build/aarch64-softmmu/qemu-system-aarch64 -nographic\
                  -machine virt,gic-version=max -m 2G -cpu max\
                  -netdev user,id=vnet,hostfwd=:127.0.0.1:0-:22\
                  -device virtio-net-pci,netdev=vnet\
                  -drive file=./mini_ubuntu.img,if=none,id=drive0,cache=writeback\
                  -device virtio-blk,drive=drive0,bootindex=0\
                  -drive file=./flash0.img,format=raw,if=pflash\
                  -drive file=./flash1.img,format=raw,if=pflash -smp 4\
                  -kernel ./linux/arch/arm64/boot/Image\
                  -append "root=/dev/vda2 nokaslr console=ttyAMA0"\
                  -initrd ./initrd.img-5.5.11 -s -S

-s tells QEMU to listen for a gdb connection on TCP port 1234 (it is shorthand for -gdb tcp::1234).
-S will pause at startup, waiting for the debugger to attach.

Before we get started debugging, update your ~/.gdbinit with the following:

add-auto-load-safe-path linux-5.5.11/scripts/gdb/vmlinux-gdb.py

In another window, start the debugger. Note, if you are on an x86 host debugging aarch64, then you need to use gdb-multiarch (sudo apt-get install gdb-multiarch). In our case below we are on an aarch64 host, so we just use gdb.

It’s very important that we receive the “done” message below indicating symbols were loaded successfully; otherwise the following steps will not work.

$ gdb linux-5.5.11/vmlinux
GNU gdb (Ubuntu 8.1-0ubuntu3.2) 8.1.0.20180409-git
Reading symbols from linux-5.5.11/vmlinux...done.

Attach the debugger to the kernel. Remember the -s argument above? It told QEMU to use port :1234. We will connect to it now.

(gdb) target remote localhost:1234
Remote debugging using localhost:1234
0x0000000000000000 in ?? ()

That's it. The debugger is connected.

Now let’s test out the setup.
Add a breakpoint in the kernel as a test.

(gdb) hbreak start_kernel
Hardware assisted breakpoint 1 at 0xffff800011330cdc: file init/main.c, line 577.
(gdb) c
Continuing.

Thread 1 hit Breakpoint 1, start_kernel () at init/main.c:577
577 {
(gdb) l
572 {
573     rest_init();
574 }
575 
576 asmlinkage __visible void __init start_kernel(void)
577 {
578     char *command_line;
579     char *after_dashes;
580 
581     set_task_stack_end_magic(&init_task);
(gdb) 

We hit the breakpoint !

Remember above that we used the -S option to QEMU? This told QEMU to wait to start running the image until we connected the debugger. Thus once we hit continue, QEMU actually starts booting the kernel.

References:

by Rob Foley at April 22, 2020 10:51 AM

April 20, 2020

Stefan Hajnoczi

virtio-fs has landed in QEMU 5.0!

The virtio-fs shared host->guest file system has landed in QEMU 5.0! It consists of two parts: the QEMU -device vhost-user-fs-pci and the actual file server called virtiofsd. Guests need to have a virtio-fs driver in order to access shared file systems. In Linux the driver is called virtiofs.ko and has been upstream since Linux v5.4.

Using virtio-fs

Thanks to libvirt virtio-fs support, it's possible to share directory trees from the host with the guest like this:

<filesystem type='mount' accessmode='passthrough'>
  <driver type='virtiofs'/>
  <binary xattr='on'>
    <lock posix='on' flock='on'/>
  </binary>
  <source dir='/path/on/host'/>
  <target dir='mount_tag'/>
</filesystem>

The host /path/on/host directory tree can be mounted inside the guest like this:

# mount -t virtiofs mount_tag /mnt

Applications inside the guest can then access the files as if they were local files. For more information about virtio-fs, see the project website.
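libvirt is not required: the same setup can be assembled by hand by starting virtiofsd and pointing QEMU's vhost-user-fs-pci device at its socket. A rough sketch (socket path, tag, and sizes are illustrative; vhost-user requires shared guest memory, hence the memfd memory backend):

# virtiofsd --socket-path=/tmp/vhostqemu -o source=/path/on/host -o cache=always &
# qemu-system-x86_64 -M q35 -m 4G \
    -object memory-backend-memfd,id=mem,size=4G,share=on -numa node,memdev=mem \
    -chardev socket,id=char0,path=/tmp/vhostqemu \
    -device vhost-user-fs-pci,chardev=char0,tag=mount_tag \
    ...other guest options...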

How it works

For the most part, -device vhost-user-fs-pci just facilitates the connection to virtiofsd where the real work happens. When guests submit file system requests they are handled directly by the virtiofsd process on the host and don't need to go via the QEMU process.

virtiofsd is a FUSE file system daemon with virtio-fs extensions. virtio-fs is built on top of the FUSE protocol and therefore supports the POSIX file system semantics that applications expect from a native Linux file system. The Linux guest driver shares a lot of code with the traditional fuse.ko kernel module.

Resources on virtio-fs

I have given a few presentations on virtio-fs:

Future features

A key feature of virtio-fs is the ability to directly access the host page cache, eliminating the need to copy file contents into guest RAM. This so-called DAX support is not upstream yet.

Live migration is not yet implemented. It is a little challenging to transfer all file system state to the destination host and seamlessly continue file system operation without remounting, but it should be doable.

There is a Rust implementation of virtiofsd that is close to reaching maturity and will replace the C implementation. The advantage is that Rust has better memory and thread safety than C so entire classes of bugs can be eliminated. Also, the codebase is written from scratch whereas the C implementation was a combination of several existing pieces of software that were not designed together.

by Unknown (noreply@blogger.com) at April 20, 2020 02:23 PM

April 16, 2020

KVM on Z

Secure Execution for IBM z15 arriving with New Models

IBM announced the latest additions to its IBM z15 series:
  • IBM z15 Model T02
  • IBM LinuxONE III Model LT2
A substantial part of the announcement is a new feature called Secure Execution. For a brief overview, see here. Secure Execution will become available in the following Linux distributions as announced by the respective distribution partners:
We will publish more details on Secure Execution later. The impatient with an interest in lower level technical details might want to check out the presentations here and here.

by Stefan Raspl (noreply@blogger.com) at April 16, 2020 09:14 AM

April 08, 2020

Cornelia Huck

s390x changes in QEMU 5.0

QEMU is currently in hardfreeze, with the 5.0 release expected at the end of the month. Here's a quick list of some notable s390x changes.

  • You can finally enable Adapter Interrupt Suppression in the cpu model (ais=on) when running under KVM. This had been working under TCG for some time now, but KVM was missing an interface that was provided later -- and we finally actually check for that interface in QEMU. This is mostly interesting for PCI.
  • QEMU had been silently fixing odd memory sizes to something that can be reported via SCLP for some time. Silently changing user input is probably not such a good idea; compat machines will continue to do so to enable migration from old QEMUs for machines with odd sizes, but will print a warning now. If you have such an old machine (and you can modify it), it might be a good idea to either specify the memory size it gets rounded to or to switch to the 5.0 machine type, where memory sizes can be more finegrained due to the removal of support for memory hotplug. We may want to get rid of the code doing the fixup at some time in the future.
  • QEMU now properly performs the whole set of initial, clear, and normal cpu reset.
  • And the usual fixes, cleanups, and improvements.
For 5.1, expect more changes; support for protected virtualization will be a big item.

by Cornelia Huck (noreply@blogger.com) at April 08, 2020 01:58 PM

ARM Datacenter Project

How to easily install the kernel in a VM

This article is a follow-up to an earlier article we wrote Introducing LISA-QEMU.

This article will outline the steps to install a kernel into a VM using some scripts we developed. In our case we have an x86_64 host and an aarch64 VM.

We will assume you have cloned the LISA-QEMU repository already. As part of the LISA-QEMU integration we have added a script to automate the process of installing a kernel into a VM. The scripts we talk about below can be found in the LISA-QEMU GitHub repository:

git clone https://github.com/rf972/lisa-qemu.git
cd lisa-qemu
git submodule update --init --recursive

We also assume you have built the kernel .deb install package. We covered the detailed steps in our README. You can also find needed dependencies for this article at that link.

You can use install_kernel.py to generate a new image with the kernel of your choice installed.
Assuming you have a VM image created via steps similar to those in this post, just launch a command like the one below to install your kernel.

$ sudo python3 scripts/install_kernel.py --kernel_pkg ../linux/linux-image-5.5.11_5.5.11-1_arm64.deb 
scripts/install_kernel.py: image: build/VM-ubuntu.aarch64/ubuntu.aarch64.img
scripts/install_kernel.py: kernel_pkg: ../linux/linux-image-5.5.11_5.5.11-1_arm64.deb

Install kernel successful.
Image path: /home/rob/qemu/lisa-qemu/build/VM-ubuntu.aarch64/ubuntu.aarch64.img.kernel-5.5.11-1

To start this image run this command:
python3 /home/rob/qemu/lisa-qemu/scripts/launch_image.py -p /home/rob/qemu/lisa-qemu/build/VM-ubuntu.aarch64/ubuntu.aarch64.img.kernel-5.5.11-1

We need to use sudo for these commands since sudo is required as part of mounting images.

Note that the argument is:
-p or --kernel_pkg argument with the .deb kernel package

Also note that the last lines in the output show the command to issue to bring this image up.

To start this image run this command:
python3 /home/rob/qemu/lisa-qemu/scripts/launch_image.py -p /home/rob/qemu/lisa-qemu/build/VM-ubuntu.aarch64/ubuntu.aarch64.img.kernel-5.5.11-1

You might wonder where we got the VM image from.
It was found in a default location after running our build_image.py script. See this post for more details.

If you want to supply your own image, we have an argument for that. :)
--image argument with the VM image to start from.

When supplying the image, the command line might look like the below.

sudo python3 scripts/install_kernel.py --kernel_pkg ../linux/linux-image-5.5.11_5.5.11-1_arm64.deb --image build/VM-ubuntu.aarch64/ubuntu.aarch64.img

There are a few options for installing the kernel.

By default install_kernel.py will attempt to install your kernel using a chroot environment. This is done for speed more than anything else, since in our case it is faster to use the chroot than to bring up the aarch64 emulated VM and install the kernel.

We also support the --vm option, which will bring up the VM with QEMU and then install the kernel into it. If you run into issues with the chroot-based install, this would be a good alternative.

An example of the VM install method.

sudo python3 scripts/install_kernel.py --vm --kernel_pkg ../linux/linux-image-5.5.11_5.5.11-1_arm64.deb


Thanks for taking the time to learn more about our work on LISA-QEMU !

by Rob Foley at April 08, 2020 11:50 AM

April 02, 2020

ARM Datacenter Project

LISA-QEMU Demo

This article is a follow-up to an earlier article we wrote Introducing LISA-QEMU.

LISA-QEMU provides an integration which allows LISA to work with QEMU VMs. LISA’s goal is to help Linux kernel developers to measure the impact of modifications in core parts of the kernel.1 Integration with QEMU will allow developers to test a wide variety of hardware configurations, including the ARM architecture and complex NUMA topologies.

This demo will walk through all the steps needed to build and bring up an aarch64 VM on an x86 platform. Future articles will work through reconfiguring the hardware for these VMs, inserting a new kernel into these VMs and more !

The first step is to get your linux machine ready to run LISA-QEMU. In this step we will download all the dependencies needed. We assume Ubuntu in the below steps.

apt-get build-dep -y qemu
apt-get install -y python3-yaml wget git qemu-efi-aarch64 qemu-utils genisoimage qemu-user-static git

Now that we have the correct dependencies, let’s download the LISA-QEMU code.

git clone https://github.com/rf972/lisa-qemu.git
cd lisa-qemu
git submodule update --init --progress --recursive

One note on the above: if you do not plan to use LISA, then you can leave off the --recursive and it will update much more quickly.

The next step is to build a new VM. This build command takes all the defaults. If you want to learn more about the possible options take a look at build_image.py --help.

$ time python3 scripts/build_image.py  --help
usage: build_image.py [-h] [--debug] [--dry_run] [--ssh]
                      [--image_type IMAGE_TYPE] [--image_path IMAGE_PATH]
                      [--config CONFIG] [--skip_qemu_build]

Build the qemu VM image for use with lisa.

optional arguments:
  -h, --help            show this help message and exit
  --debug, -D           enable debug output
  --dry_run             for debugging.  Just show commands to issue.
  --ssh                 Launch VM and open an ssh shell.
  --image_type IMAGE_TYPE, -i IMAGE_TYPE
                        Type of image to build.
                        From external/qemu/tests/vm.
                        default is ubuntu.aarch64
  --image_path IMAGE_PATH, -p IMAGE_PATH
                        Allows overriding path to image.
  --config CONFIG, -c CONFIG
                        config file.
                        default is conf/conf_default.yml.
  --skip_qemu_build     For debugging script.

examples:
  To select all defaults:
   scripts/build_image.py
  Or select one or more arguments
    scripts/build_image.py -i ubuntu.aarch64 -c conf/conf_default.yml

But we digress… Below is the command to build the image.

OK let’s build that image…

python3 scripts/build_image.py

You will see the progress of the build and other steps of the image creation on your screen. If you would like to see more comprehensive output and progress, use the --debug option.

Depending on your system this might take many minutes. Below are some example times.

50 minutes - Intel i7 laptop with 2 cores and 16 GB of memory
6 minutes - Huawei Taishan 2286 V2 with 128 ARM cores and 512 GB of memory.

Once the image creation is complete, you will see a message like the following.

Image creation successful.
Image path: /home/lisa-qemu/build/VM-ubuntu.aarch64/ubuntu.aarch64.img

Now that we have an image, we can test it out by bringing up the image and opening an ssh connection to it.

python3 scripts/launch_image.py

The time to bring up the VM will vary based on your machine, but it should come up in about 2-3 minutes on most machines.

You should expect to see the following as the system boots and we open an ssh connection to bring us to the guest prompt.

$ python3 scripts/launch_image.py
Conf:        /home/lisa-qemu/build/VM-ubuntu.aarch64/conf.yml
Image type:  ubuntu.aarch64
Image path:  /home/lisa-qemu/build/VM-ubuntu.aarch64/ubuntu.aarch64.img

qemu@ubuntu-aarch64-guest:~$

Now that the system is up and running, you could, for example, use it for a LISA test.

In our case we issue one command to show that we are in fact an aarch64 architecture with 8 cores.

qemu@ubuntu-guest:~$ lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  1
Core(s) per socket:  8
Socket(s):           1
NUMA node(s):        1
Vendor ID:           0x00
Model:               0
Stepping:            0x0
BogoMIPS:            125.00
NUMA node0 CPU(s):   0-7
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma sha3 sm3 sm4 asimddp sha512 sve asimdfhm flagm

Once you are done with the VM, you can close the VM simply by typing “exit” at the command prompt.

qemu@ubuntu-guest:~$ exit
exit
Connection to 127.0.0.1 closed by remote host.

That’s it. The VM was gracefully powered off.

We hope this article was helpful to understand just how easy it can be to build and launch a VM with LISA-QEMU !

  1. This definition can be found on the LISA github page 

by Rob Foley at April 02, 2020 11:07 AM

April 01, 2020

ARM Datacenter Project

Introducing LISA-QEMU

LISA-QEMU provides an integration which allows LISA to work with QEMU VMs. LISA’s goal is to help Linux kernel developers to measure the impact of modifications in core parts of the kernel1. Integration with QEMU will allow developers to test a wide variety of hardware configurations, including the ARM architecture and complex NUMA topologies.

One of our goals is to allow developers to test the impact of modifications on aarch64 architectures with complex NUMA topologies. Currently we are focusing on testing how the kernel CFS scheduler's task placement decisions interact with NUMA_BALANCING.

In order to simplify and streamline the development process we created scripts and configuration files which allow developers to quickly create QEMU VMs with a configurable number of cores and NUMA nodes. We also created a script to install a custom-built kernel on these VMs. Once a VM is configured with the desired topology and kernel version, developers can run interactive and/or automated LISA tests.

Please note that you do not need physical aarch64 hardware. In fact we have demoed this project on a laptop with a Core i7-7600U CPU with two cores.

Our approach is to contribute improvements in QEMU and LISA back to the mainstream. In our repository we will keep scripts and configurations belonging to the integration between LISA and QEMU.

LISA Overview: The LISA project provides a toolkit that supports regression testing and interactive analysis of Linux kernel behavior. LISA’s goal is to help Linux kernel developers measure the impact of modifications in core parts of the kernel. LISA itself runs on a host machine, and uses the devlib toolkit to interact with the target via SSH, ADB or telnet. LISA provides features to describe workloads (notably using rt-app) and run them on targets. It can collect trace files from the target OS (e.g. systrace and ftrace traces), parse them via the TRAPpy framework. These traces can then be parsed and analysed in order to examine detailed target behaviour during the workload’s execution.1

Peter also contributed to this article.

We also have articles on LISA-QEMU:

  1. This definition can be found on the LISA github page 

by Rob Foley at April 01, 2020 08:30 PM

March 25, 2020

Marcin Juszkiewicz

Sharing PCIe cards across architectures

Some days ago, during one of our conference calls, one of my co-workers asked:

Has anyone ever tried PCI forwarding to an ARM VM on an x86 box?

As my machine was already open, I just turned it off and inserted a SATA controller into one of the unused PCI Express slots. After boot I started one of my AArch64 CirrOS VM instances and gave it this card. It worked perfectly:

[   21.603194] pcieport 0000:00:01.0: pciehp: Slot(0): Attention button pressed
[   21.603849] pcieport 0000:00:01.0: pciehp: Slot(0) Powering on due to button press
[   21.604124] pcieport 0000:00:01.0: pciehp: Slot(0): Card present
[   21.604156] pcieport 0000:00:01.0: pciehp: Slot(0): Link Up
[   21.739977] pci 0000:01:00.0: [1b21:0612] type 00 class 0x010601
[   21.740159] pci 0000:01:00.0: reg 0x10: [io  0x0000-0x0007]
[   21.740199] pci 0000:01:00.0: reg 0x14: [io  0x0000-0x0003]
[   21.740235] pci 0000:01:00.0: reg 0x18: [io  0x0000-0x0007]
[   21.740271] pci 0000:01:00.0: reg 0x1c: [io  0x0000-0x0003]
[   21.740306] pci 0000:01:00.0: reg 0x20: [io  0x0000-0x001f]
[   21.740416] pci 0000:01:00.0: reg 0x24: [mem 0x00000000-0x000001ff]
[   21.742660] pci 0000:01:00.0: BAR 5: assigned [mem 0x10000000-0x100001ff]
[   21.742709] pci 0000:01:00.0: BAR 4: assigned [io  0x1000-0x101f]
[   21.742770] pci 0000:01:00.0: BAR 0: assigned [io  0x1020-0x1027]
[   21.742803] pci 0000:01:00.0: BAR 2: assigned [io  0x1028-0x102f]
[   21.742834] pci 0000:01:00.0: BAR 1: assigned [io  0x1030-0x1033]
[   21.742866] pci 0000:01:00.0: BAR 3: assigned [io  0x1034-0x1037]
[   21.742935] pcieport 0000:00:01.0: PCI bridge to [bus 01]
[   21.742961] pcieport 0000:00:01.0:   bridge window [io  0x1000-0x1fff]
[   21.744805] pcieport 0000:00:01.0:   bridge window [mem 0x10000000-0x101fffff]
[   21.745749] pcieport 0000:00:01.0:   bridge window [mem 0x8000000000-0x80001fffff 64bit pref]

Let’s go deeper

The next day I turned off the desktop for a CPU cooler upgrade. In the process I went through my box of expansion cards and plugged in an additional USB 3.0 controller (Renesas based). I also added a SATA hard drive and connected it to the previously added controller.

Once the computer was back online I created a new VM instance. This time I used the Fedora 32 Beta. But when I tried to add a PCI Express card I got an error:

Error while starting domain: internal error: process exited while connecting to monitor: 2020-03-25T13:43:39.107524Z qemu-system-aarch64: -device vfio-pci,host=0000:29:00.0,id=hostdev0,bus=pci.3,addr=0x0: VFIO_MAP_DMA: -22
2020-03-25T13:43:39.107560Z qemu-system-aarch64: -device vfio-pci,host=0000:29:00.0,id=hostdev0,bus=pci.3,addr=0x0: vfio 0000:29:00.0: failed to setup container for group 28: memory listener initialization failed: Region mach-virt.ram: vfio_dma_map(0x563169753c80, 0x40000000, 0x100000000, 0x7fb2a3e00000) = -22 (Invalid argument)

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 75, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 111, in tmpcb
    callback(*args, **kwargs)
  File "/usr/share/virt-manager/virtManager/object/libvirtobject.py", line 66, in newfn
    ret = fn(self, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/object/domain.py", line 1279, in startup
    self._backend.create()
  File "/usr/lib64/python3.8/site-packages/libvirt.py", line 1234, in create
    if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
libvirt.libvirtError: internal error: process exited while connecting to monitor: 2020-03-25T13:43:39.107524Z qemu-system-aarch64: -device vfio-pci,host=0000:29:00.0,id=hostdev0,bus=pci.3,addr=0x0: VFIO_MAP_DMA: -22
2020-03-25T13:43:39.107560Z qemu-system-aarch64: -device vfio-pci,host=0000:29:00.0,id=hostdev0,bus=pci.3,addr=0x0: vfio 0000:29:00.0: failed to setup container for group 28: memory listener initialization failed: Region mach-virt.ram: vfio_dma_map(0x563169753c80, 0x40000000, 0x100000000, 0x7fb2a3e00000) = -22 (Invalid argument)

Hmm. It worked before. I tried another card — with the same effect.

Debugging

I went to the #qemu IRC channel and started discussing the issue with QEMU developers. It turned out that probably no one had tried sharing expansion cards with a foreign-architecture guest (in TCG mode instead of same-architecture KVM mode).

As I had a VM instance where sharing the card worked, I started checking what was wrong. After some restarts it was clear that crossing 3054 MB of guest memory was enough to trigger VFIO errors like the above.

Reporting

An issue that is not reported does not exist, so I opened a bug against QEMU. I filled it with error messages, “lspci” output for the cards used, the QEMU command line (generated by libvirt), etc.

It looks like the problem lies in architecture differences between x86-64 (host) and aarch64 (guest). Let me quote Alex Williamson:

The issue is that the device needs to be able to DMA into guest RAM, and to do that transparently (ie. the guest doesn’t know it’s being virtualized), we need to map GPAs into the host IOMMU such that the guest interacts with the device in terms of GPAs, the host IOMMU translates that to HPAs. Thus the IOMMU needs to support GPA range of the guest as IOVA. However, there are ranges of IOVA space that the host IOMMU cannot map, for example the MSI range here is handled by the interrupt remmapper, not the DMA translation portion of the IOMMU (on physical ARM systems these are one-in-the-same, on x86 they are different components, using different mapping interfaces of the IOMMU). Therefore if the guest programmed the device to perform a DMA to 0xfee00000, the host IOMMU would see that as an MSI, not a DMA. When we do an x86 VM on and x86 host, both the host and the guest have complimentary reserved regions, which avoids this issue.

Also, to expand on what I mentioned on IRC, every x86 host is going to have some reserved range below 4G for this purpose, but if the aarch64 VM has no requirements for memory below 4G, the starting GPA for the VM could be at or above 4G and avoid this issue.
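On the host, the reserved IOVA ranges Alex mentions can be inspected via sysfs; on a typical x86 box the MSI window shows up along these lines (group numbers and exact ranges vary):

$ cat /sys/kernel/iommu_groups/*/reserved_regions | sort -u
0x00000000fee00000 0x00000000feefffff msi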

I have to admit that this is too low-level for me. I hope that the problem I hit will help someone to improve QEMU.

by Marcin Juszkiewicz at March 25, 2020 05:23 PM

ARM Datacenter Project

Understanding pthread_cond_broadcast

Recently we came across a piece of code in QEMU using the pthread_cond_broadcast function.

This method is intended to wake up all threads waiting on a condition variable. However, it needs to be used with care. In particular, it should only be used if you can guarantee either a) that the waiter is in fact waiting, or b) that there is another mechanism to wake up the waiter if the broadcast arrives while the thread is not waiting.

For example, suppose we have the following code:

pthread_mutex_lock(&first_cpu->lock);
while (first_cpu->stopped) {
    pthread_cond_wait(first_cpu->halt_cond, &first_cpu->lock);
    pthread_mutex_unlock(&first_cpu->lock);

    /* process any pending work */
    pending_work();    
    pthread_mutex_lock(&first_cpu->lock);
}
pthread_mutex_unlock(&first_cpu->lock);

Also suppose we have another thread which will call pthread_cond_broadcast() to wake up this thread.

If the above thread is waiting in pthread_cond_wait() when it is woken up by pthread_cond_broadcast(), then all is well.

However, if this thread is outside of the pthread_cond_wait() (for example, busy in pending_work()) when pthread_cond_broadcast() is called, then this thread will not be woken up. In other words, when the thread loops around to pthread_cond_wait() it will block and NOT wake up, since the broadcast is not remembered.

This means that either we need to guarantee the thread is waiting when the broadcast is sent, OR we need to make sure that there is another way to wake up the thread.

One other option is to change the pthread_cond_wait() to a pthread_cond_timedwait() to ensure that we periodically perform pending_work(), even if the pthread_cond_broadcast() is missed.

by Rob Foley at March 25, 2020 10:25 AM

March 23, 2020

ARM Datacenter Project

Testing QEMU emulation: how to change QTest Accelerator

How can we change QEMU QTest to use different accelerators? And why would we do this?
This article is a follow-up to a prior article we posted on how to debug QEMU Qtests.

Each QTest will decide which accelerators it uses. For example, the test might try to use ‘kvm’, which causes QEMU to use KVM to execute code. Or the test might try to use ‘TCG’ support, where QEMU will emulate the instructions itself. Regardless of which path is chosen, this choice inevitably results in different code paths getting exercised inside QEMU itself.

In some cases when developing QEMU code, we might want to force certain code paths which are specific to different accelerators. Take the case, for example, where we want to force a specific TCG code path on an aarch64 machine for an aarch64 QTest. We will use the tests/qtest/arm-cpu-features test as an example.

This test selects the specific accelerator(s) to use for each test case. We might want to force the use of a specific accelerator to exercise that code path in QEMU; for instance, TCG instead of KVM.

In this case we would need to edit the test, for instance tests/qtest/arm-cpu-features.c, and replace the use of “kvm” with “tcg”, or in cases where both -accel kvm and -accel tcg are used, just remove the kvm.

This will have the effect of forcing the use of a specific code path, which can be very useful when debugging or validating a change.
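After editing and rebuilding the tests, the modified test binary can be run standalone using the same invocation pattern shown in the QTest debugging article further down this page (paths are relative to the QEMU build directory):

$ QTEST_QEMU_BINARY=aarch64-softmmu/qemu-system-aarch64 tests/qtest/arm-cpu-features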

by Rob Foley at March 23, 2020 01:25 PM

March 20, 2020

ARM Datacenter Project

Testing QEMU emulation: how to debug QTest with gdb

QEMU has several different kinds of tests for exercising different aspects of the code. A page here has good details on these tests. Another very useful document is testing.rst, which states that “QTest is a device emulation testing framework. It can be very useful to test device models; it could also control certain aspects of QEMU (such as virtual clock stepping), with a special purpose “qtest” protocol.”

We can run the QTests as part of either make check or make check-qtest.

If we run the QTests with the below command, it shows us more information about the commands it is executing.

make check-qtest V=1

For example, you might see something like this displayed.

$ QTEST_QEMU_BINARY=aarch64-softmmu/qemu-system-aarch64 QTEST_QEMU_IMG=qemu-img tests/qtest/tpm-tis-device-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="tpm-tis-device-test"
PASS 1 tpm-tis-device-test /aarch64/tpm-tis/test_check_localities
PASS 2 tpm-tis-device-test /aarch64/tpm-tis/test_check_access_reg
PASS 3 tpm-tis-device-test /aarch64/tpm-tis/test_check_access_reg_seize
PASS 4 tpm-tis-device-test /aarch64/tpm-tis/test_check_access_reg_release
PASS 5 tpm-tis-device-test /aarch64/tpm-tis/test_check_transmit

Let’s break this down a bit.
QTEST_QEMU_BINARY - This is the command that gets issued when starting QEMU. This is useful since we can add onto it if we would like to run QEMU within another command (like a debugger).
tests/qtest/tpm-tis-device-test -m=quick -k --tap - This is the actual QTest which will get executed. Inside this test it will launch QEMU.

Another useful option is:

--verbose

Once you add that to any qtest command, it would look something like this with more information displayed.

$ QTEST_QEMU_BINARY=aarch64-softmmu/qemu-system-aarch64 QTEST_QEMU_IMG=qemu-img tests/qtest/bios-tables-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="bios-tables-test"  --show-failures-only --verbose
   random seed: R02S0d429b0279b778325d7c631f360b375b
   Start of aarch64 tests
   Start of acpi tests
   starting QEMU: exec aarch64-softmmu/qemu-system-aarch64 -qtest unix:/tmp/qtest-29244.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-29244.qmp,id=char0 -mon chardev=char0,mode=control -display none -machine virt  -accel tcg -nodefaults -nographic -drive if=pflash,format=raw,file=pc-bios/edk2-aarch64-code.fd,readonly -drive if=pflash,format=raw,file=pc-bios/edk2-arm-vars.fd,snapshot=on -cdrom tests/data/uefi-boot-images/bios-tables-test.aarch64.iso.qcow2 -cpu cortex-a57 -accel qtest
   Start of virt tests
   starting QEMU: exec aarch64-softmmu/qemu-system-aarch64 -qtest unix:/tmp/qtest-29244.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-29244.qmp,id=char0 -mon chardev=char0,mode=control -display none -machine virt  -accel tcg -nodefaults -nographic -drive if=pflash,format=raw,file=pc-bios/edk2-aarch64-code.fd,readonly -drive if=pflash,format=raw,file=pc-bios/edk2-arm-vars.fd,snapshot=on -cdrom tests/data/uefi-boot-images/bios-tables-test.aarch64.iso.qcow2  -cpu cortex-a57 -object memory-backend-ram,id=ram0,size=128M -numa node,memdev=ram0 -accel qtest
   starting QEMU: exec aarch64-softmmu/qemu-system-aarch64 -qtest unix:/tmp/qtest-29244.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-29244.qmp,id=char0 -mon chardev=char0,mode=control -display none -machine virt  -accel tcg -nodefaults -nographic -drive if=pflash,format=raw,file=pc-bios/edk2-aarch64-code.fd,readonly -drive if=pflash,format=raw,file=pc-bios/edk2-arm-vars.fd,snapshot=on -cdrom tests/data/uefi-boot-images/bios-tables-test.aarch64.iso.qcow2  -cpu cortex-a57 -m 256M,slots=3,maxmem=1G -object memory-backend-ram,id=ram0,size=128M -object memory-backend-ram,id=ram1,size=128M -numa node,memdev=ram0 -numa node,memdev=ram1 -numa dist,src=0,dst=1,val=21 -accel qtest
   End of virt tests
   End of acpi tests
   End of aarch64 tests

We can launch the QTest from the debugger with something like this.

QTEST_QEMU_BINARY=aarch64-softmmu/qemu-system-aarch64 QTEST_QEMU_IMG=qemu-img gdb --args tests/qtest/bios-tables-test -m=quick -k --tap

But what if we want to debug QEMU itself with gdb?

To achieve this we would change the QTEST_QEMU_BINARY to something like this:

QTEST_QEMU_BINARY="sudo xterm -e gdb --tty $(tty) --args aarch64-softmmu/qemu-system-aarch64"

At least on our system we found that this only works by using sudo in front of xterm. Otherwise we found that the debugger actually seems to crash and exit with error. :(

When you launch the test, an xterm will pop up with a window. Inside that window, just hit r to run the test. Also keep in mind that you need to have the DISPLAY environment variable set for xterm to pop up.

export DISPLAY=12.345.67.89:0

When you use the verbose option you get to see the actual QEMU command used.

It might look something like this:

 starting QEMU: exec i386-softmmu/qemu-system-i386 -qtest unix:/tmp/qtest-33869.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-33869.qmp,id=char0 -mon chardev=char0,mode=control -display none -machine q35,kernel-irqchip=off -accel kvm -accel tcg -net none -display none -device pci-bridge,chassis_nr=1 -drive id=hd0,if=none,file=tests/acpi-test-disk-X74eKE,format=raw -device ide-hd,drive=hd0  -accel qtest

We also posted a follow-up article on how to change QEMU Qtest accelerators.

by Rob Foley at March 20, 2020 08:28 PM

March 12, 2020

ARM Datacenter Project

Testing QEMU emulation: how to debug kvm-unit-tests

kvm-unit-tests are a set of low level tests designed to exercise KVM. These tests are completely separate from QEMU (a different repo), but these tests can also be used to exercise QEMU.

Testing QEMU with kvm-unit-tests is the use case we will explore below.
A comprehensive description of kvm-unit-tests is outside the scope of this blog post, but check out this description for more details. It is worth noting that these tests launch QEMU with the test binary as the -kernel argument, so debugging a test is similar to debugging a kernel under QEMU.

To run the tests with QEMU use a command similar to the below.
The -v option shows the commands it is issuing, which we will need for the next step.
The QEMU= is needed to point it to the QEMU you want to test, and the ACCEL=tcg is needed since we are testing with QEMU (instead of kvm).
There are more details on running the tests in the kvm-unit-tests README.

QEMU=../../qemu/build/aarch64-softmmu/qemu-system-aarch64 ACCEL=tcg ./run_tests.sh -v

When a test fails, the log directory (./logs) contains the log files for all tests. A failure might look like this.

$ TIMEOUT=0 QEMU=../../qemu/build/aarch64-softmmu/qemu-system-aarch64 ACCEL=tcg ./run_tests.sh -v
TESTNAME=gicv2-mmio TIMEOUT=0 ACCEL=tcg ./arm/run arm/gic.flat -smp $((($MAX_SMP < 8)?$MAX_SMP:8)) -machine gic-version=2 -append 'mmio'
PASS gicv2-mmio (17 tests, 1 skipped)
TESTNAME=gicv2-mmio-up TIMEOUT=0 ACCEL=tcg ./arm/run arm/gic.flat -smp 1 -machine gic-version=2 -append 'mmio'
FAIL gicv2-mmio-up (17 tests, 2 unexpected failures)
TESTNAME=gicv2-mmio-3p TIMEOUT=0 ACCEL=tcg ./arm/run arm/gic.flat -smp $((($MAX_SMP < 3)?$MAX_SMP:3)) -machine gic-version=2 -append 'mmio'
FAIL gicv2-mmio-3p (17 tests, 3 unexpected failures)

The first line of the log file has the command needed to run this test standalone.

../../qemu/build/aarch64-softmmu/qemu-system-aarch64 -nodefaults -machine virt,accel=tcg -cpu cortex-a57 -device virtio-serial-device -device virtconsole,chardev=ctd -chardev testdev,id=ctd -device pci-testdev -display none -serial stdio -kernel arm/gic.flat -smp 1 -machine gic-version=2 -append mmio # -initrd /tmp/tmp.RrDNvP8sPT

In order to attach the debugger we want to add two arguments to the above command line.
-s tells QEMU to listen for a gdb connection on TCP port 1234 (it is shorthand for -gdb tcp::1234).
-S will pause at startup, waiting for the debugger to attach.

../../qemu/build/aarch64-softmmu/qemu-system-aarch64 -s -S -nodefaults -machine virt,accel=tcg -cpu cortex-a57 -device virtio-serial-device -device virtconsole,chardev=ctd -chardev testdev,id=ctd -device pci-testdev -display none -serial stdio -kernel arm/gic.flat -smp 1 -machine gic-version=2 -append mmio # -initrd /tmp/tmp.RrDNvP8sPT

Then in another console, we will run gdb.

It is worth noting that in this case we are debugging aarch64 code under QEMU on an x86_64 host, so we will need to use gdb-multiarch since it can do cross-debugging.

$ gdb-multiarch
(gdb) set arch aarch64
The target architecture is assumed to be aarch64
(gdb) file ./arm/gic.elf
Reading symbols from ./arm/gic.elf...(no debugging symbols found)...done.
(gdb) 

Note how it says no debugging symbols found above. That indicates that the .elf file is missing symbols. Note that the .elf file is essentially the built “kernel” that we are going to give to QEMU. There is also a similar .flat file. This is the same as the elf, but just stripped of any unnecessary sections.

In order to get the symbols built into the .elf file, we need to modify the Makefile.

The issue is that when we link the .elf file, it excludes the symbols. Because of the special way we are constructing the test, essentially as a kernel, kvm-unit-tests uses a custom linker script. The script is arm/flat.lds.

There is a section of the flat.lds where we tell the linker which sections of the objects to set aside in the resulting binary.

    /DISCARD/ : {
        *(.note*)
        *(.interp)
        *(.debug*)
        *(.comment)
        *(.dynamic)
    }

We will need to remove the “*(.debug*)” line so that the symbol table is not excluded (!excluded == included).

This section will then look like the below.

    /DISCARD/ : {
        *(.note*)
        *(.interp)
        *(.comment)
        *(.dynamic)
    }

Once we re-build the binaries, we can resume our debugging process where we left off launching qemu in one console and then launching gdb-multiarch in a different console.

$ gdb-multiarch
(gdb) set arch aarch64
The target architecture is assumed to be aarch64
(gdb) file ./arm/gic.elf
Reading symbols from ./arm/gic.elf...done.
(gdb) 

Note how the symbols were read successfully !

But we’re not out of the woods yet. To get the symbols loaded properly at the correct address, we need to first find the location where our code is loaded.

We will first attach to the target. Note that we use :1234; this is the port that was specified for us by the -s option to QEMU.

(gdb) target remote :1234
Remote debugging using :1234
warning: No executable has been specified and target does not support
determining executable automatically.  Try using the "file" command.
0x0000000040000000 in ?? ()

We then disassemble the first snippet of code. We disassemble using the address which is displayed after we attach to the target: 0x0000000040000000

(gdb) disassemble 0x0000000040000000,+100
Dump of assembler code from 0x40000000 to 0x40000064:
=> 0x0000000040000000:  ldr x0, 0x40000018
   0x0000000040000004:  mov x1, xzr
   0x0000000040000008:  mov x2, xzr
   0x000000004000000c:  mov x3, xzr
   0x0000000040000010:  ldr x4, 0x40000020
   0x0000000040000014:  br  x4
   0x0000000040000018:  .inst   0x44000000 ; undefined
   0x000000004000001c:  .inst   0x00000000 ; undefined
   0x0000000040000020:  .inst   0x40080000 ; undefined

If you follow the assembly above, you will see that we load in an address to jump to: 0x40080000

This happens to be the address where we loaded our code.

In order to load the symbols we need to determine the current location of our .text and .data segments.

“info files” will list the current offsets for our gic.elf binary.

(gdb) info files
Symbols from "/home/rob/qemu/alex-kvm-unit-tests/build/arm/gic.elf".
Local exec file:
    `/home/rob/qemu/alex-kvm-unit-tests/build/arm/gic.elf', file type elf64-littleaarch64.
    Entry point: 0x0
    0x0000000000000000 - 0x000000000000ef84 is .text
    0x0000000000010720 - 0x0000000000010768 is .dynsym
    0x0000000000010768 - 0x0000000000010769 is .dynstr
    0x0000000000010770 - 0x0000000000010788 is .hash
    0x0000000000010788 - 0x00000000000107a4 is .gnu.hash
    0x00000000000107a8 - 0x00000000000107c8 is .got
    0x00000000000107c8 - 0x0000000000012308 is .rodata
    0x0000000000012308 - 0x0000000000013670 is .data
    0x0000000000013670 - 0x0000000000022798 is .bss
    0x0000000000010000 - 0x0000000000010720 is .rela.dyn

Unload the current symbols with the “file” command (no argument). Note that we need to answer ‘y’ below.

(gdb) file
No executable file now.
Discard symbol table from `/home/rob/qemu/alex-kvm-unit-tests/build/arm/gic.elf'? (y or n) y
No symbol file now.

Find the current .text and .data addresses. Since we know our code was loaded at the address we found above (0x40080000), we merely need to add the offsets from the table to the load address.

.text is at offset 0 in the table, so its load address is 0x40080000.
.data is at offset 0x12308, so its load address is 0x40080000 + 0x12308 = 0x40092308.

The add-symbol-file command will map the symbols properly. Note again we need to answer ‘y’ below.

(gdb) add-symbol-file ./arm/gic.elf 0x40080000 -s .data 0x40092308
add symbol table from file "./arm/gic.elf" at
    .text_addr = 0x40080000
    .data_addr = 0x40092308
(y or n) y

That’s it! The symbols are loaded.

We can test this by setting a breakpoint or listing code.

Reading symbols from ./arm/gic.elf...done.
(gdb) b main
Breakpoint 1 at 0x40081748: file /home/rob/qemu/alex-kvm-unit-tests/arm/gic.c, line 517.
(gdb) l main
511     if (gic_version() == 2)
512         test_targets(nr_irqs);
513 }
514 
515 int main(int argc, char **argv)
516 {
517     if (!gic_init()) {
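As an aside, if you find yourself repeating the offset arithmetic and the add-symbol-file step, the sequence can be scripted with GDB’s Python support (covered in a later post on this page). A rough sketch, assuming the load address 0x40080000 and the .data offset 0x12308 found above, saved in a hypothetical load-syms.py:

# load-syms.py -- hypothetical helper; run inside gdb-multiarch with: source load-syms.py
LOAD_ADDR = 0x40080000   # address the boot stub branches to (found by disassembling)
DATA_OFF = 0x12308       # .data offset reported by "info files"

gdb.execute('set confirm off')   # skip the y/n prompt from add-symbol-file
gdb.execute('add-symbol-file ./arm/gic.elf {:#x} -s .data {:#x}'.format(
    LOAD_ADDR, LOAD_ADDR + DATA_OFF))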

by Rob Foley at March 12, 2020 08:48 PM

March 10, 2020

ARM Datacenter Project

A Lightweight Thread based storage runtime

Background

Massive concurrency is the primary signature of enterprise storage I/O workloads. POSIX threads (pthreads), as implemented in Unix-like OSes, are too “heavy” to effectively support such concurrency.

The overhead incurred by pthreads can be grouped into two main areas:

1) the memory cost of maintaining thread computational state, which includes duplicating all registers and data (stack, heap) plus the memory associated with tracking thread state;

2) the context switch cost, which includes the cost of changing processor context (registers, counters, etc.) and changing computational context (interrupts, scheduling, and processor and RAM caches).

These costs put a physical limit on how many pthreads a system can have and dictate how frequently context switches can be profitably used.

User space lightweight threads (ULTs) are an alternative to traditional threads. ULTs allow a number of logical threads to share a thread context in a single physical pthread, thereby reducing memory consumption. A non-preemptive user space scheduler allows an application to switch between lightweight threads without involving the OS/kernel scheduler. We can envision a runtime/library that combines the features of a) sharing a single physical thread and b) user space scheduling. Such a runtime has the advantage of being more efficient at managing large numbers of short-duration work items. We should mention that we expect these work items to have few external dependencies, such as system calls.
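To make the idea concrete, here is a toy sketch (not Argobots, just plain Python generators standing in for lightweight threads) of a non-preemptive, round-robin user space scheduler; all names are made up for illustration:

import collections

def worker(name, steps):
    for i in range(steps):
        print('{} doing step {}'.format(name, i))
        yield                      # voluntarily hand control back to the scheduler

# Round-robin, cooperative scheduling entirely in user space: no kernel
# context switch happens when we move from one "thread" to the next.
run_queue = collections.deque([worker('io-a', 3), worker('io-b', 2)])
while run_queue:
    w = run_queue.popleft()
    try:
        next(w)                    # run the worker until it yields again
        run_queue.append(w)        # it still has work to do, so requeue it
    except StopIteration:
        pass                       # this lightweight "thread" has finished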

Argobots Runtime for Storage (ABT-ST)

There are many lightweight thread implementations; the proprietary Microsoft UMS scheduler and the open source Argobots are a couple of examples. The Argobots Runtime for Storage builds upon the Argobots framework and serves the needs of storage workloads. The main goal is to provide support for some typical storage workload characteristics:

  1. Mapping: A mechanism to evenly distribute I/O load across the available CPUs. When an I/O arrives, it is assigned to a lightweight thread (LWT), which is associated with a single load-group. A load-group has an OS thread that is bound to a single CPU core and is shared by many LWTs. With the non-preemptive/cooperative user mode scheduler, instructions within the load-group are serialized and do not require a spinlock. This improves thread-level parallelism, which is vital in today’s storage servers with large numbers of cores. It is especially important for ARM servers with hundreds of cores and multiple NUMA domains.

  2. Workload rebalance: When the load among physical CPU cores becomes imbalanced, a rebalancer will redistribute LWTs to optimally utilize CPU resources. The rebalancing algorithm also takes different NUMA topologies into consideration when deciding how to migrate load-groups to different CPUs. See the rebalance diagram.

  3. Stackable schedulers: Each load-group has its own scheduler, while a group of sequential instruction streams executed on a CPU core can have a main scheduler. This allows different types of load-groups to share a CPU core while keeping their own schedulers. For example, a front-end write-to-cache I/O load-group and a back-end write-to-disk I/O load-group may share a CPU core while keeping their own low-latency and high-throughput schedulers, respectively.

The runtime source code is hosted on GitHub. It is a work in progress as of today.

by Li at March 10, 2020 08:41 PM

March 05, 2020

ARM Datacenter Project

QEMU aarch64 ARM VMs

This is a follow-up to a prior post on how to launch ARM aarch64 VMs from scratch.

We are working on QEMU enhancements to support aarch64 ARM VMs inside QEMU’s vm-build infrastructure. This is a bit of test infrastructure which allows for building and testing QEMU source code within various flavors of VMs.

The aarch64 VMs are supported for Ubuntu and CentOS guests.

Although we are working to get this upstreamed into mainline QEMU, the current WIP for this project is here: https://github.com/rf972/qemu/tree/aarch64vm_2.7

Note that this support is also being used by the lisa-qemu integration (https://github.com/rf972/lisa-qemu) to enable easier testing and debugging of varied hardware architectures.

To try this out you can see the available VMs via:

$ make vm-help

vm-build-ubuntu.aarch64 - Build QEMU in ubuntu aarch64 VM 
vm-build-centos.aarch64 - Build QEMU in CentOS aarch64 VM 
QEMU_CONFIG=/path/conf.yml - Change path to VM configuration .yml file. See config_example.yml for file format details.

To create the VM and build qemu inside it:

$ make vm-build-ubuntu.aarch64

or to make the VM and ssh into it:

$ make vm-boot-ssh-ubuntu.aarch64

Configuration yaml

We also support providing a configuration YAML file. This allows passing specific arguments to QEMU to configure the hardware. For example, in our config_example.yaml we configure a 4-node NUMA topology. To use a specific configuration, just provide the configuration YAML like so:

$ QEMU_CONFIG=../tests/vm/config_default.yml make vm-boot-ssh-ubuntu.aarch64

by Rob Foley at March 05, 2020 12:15 PM

March 04, 2020

ARM Datacenter Project

How to configure NUMA nodes with QEMU

QEMU does allow very flexible configuration of NUMA node topology.

When starting QEMU we can select a NUMA topology with the -numa argument.

In its most basic form, we can specify the CPUs assigned to each NUMA node.

-smp cpus=16 -numa node,cpus=0-3,nodeid=0 \
-numa node,cpus=4-7,nodeid=1 \
-numa node,cpus=8-11,nodeid=2 \
-numa node,cpus=12-15,nodeid=3

This gives us a system that looks like the following from lscpu.

Note below that we specified 4 NUMA nodes with each NUMA node containing 4 CPUs.

qemu@ubuntu:~$ lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              16
On-line CPU(s) list: 0-15
Thread(s) per core:  1
Core(s) per socket:  8
Socket(s):           2
NUMA node(s):        4
Vendor ID:           ARM
Model:               0
Model name:          Cortex-A57
Stepping:            r1p0
BogoMIPS:            125.00
NUMA node0 CPU(s):   0-3
NUMA node1 CPU(s):   4-7
NUMA node2 CPU(s):   8-11
NUMA node3 CPU(s):   12-15
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid

To go one step further, we can also specify the NUMA distance between nodes.

Suppose we have the following NUMA topology.

 ________      ________
|        |    |        |
| Node 0 |    | Node 1 |
|        |-15-|        |
|________|    |________|
     |    \  /     |
    20     20     20
 ____|___ /  \ ____|___
|        |/   |        |
| Node 2 |    | Node 3 |
|        |-15-|        |
|________|    |________|

We can use the -numa dist option to add on the specific NUMA distances.

-smp cpus=16 -numa node,cpus=0-3,nodeid=0 \
-numa node,cpus=4-7,nodeid=1 \
-numa node,cpus=8-11,nodeid=2 \
-numa node,cpus=12-15,nodeid=3 \
-numa dist,src=0,dst=1,val=15 \
-numa dist,src=2,dst=3,val=15 \
-numa dist,src=0,dst=2,val=20 \
-numa dist,src=0,dst=3,val=20 \
-numa dist,src=1,dst=2,val=20 \
-numa dist,src=1,dst=3,val=20

The numactl command will confirm the distances that we configured in QEMU.

qemu@ubuntu:~$ numactl --hardware
available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3
node 0 size: 983 MB
node 0 free: 852 MB
node 1 cpus: 4 5 6 7
node 1 size: 1007 MB
node 1 free: 923 MB
node 2 cpus: 8 9 10 11
node 2 size: 943 MB
node 2 free: 812 MB
node 3 cpus: 12 13 14 15
node 3 size: 1007 MB
node 3 free: 916 MB
node distances:
node   0   1   2   3 
  0:  10  15  20  20 
  1:  15  10  20  20 
  2:  20  20  10  15 
  3:  20  20  15  10 
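The same distance matrix is exposed by the kernel under /sys/devices/system/node/. As a small sketch (just an alternative to numactl, with nothing QEMU-specific assumed), it can be read directly with a few lines of Python:

import glob

# Print each node's distance row straight from sysfs; this is the same data
# that numactl --hardware displays in its "node distances" table.
for path in sorted(glob.glob('/sys/devices/system/node/node[0-9]*/distance')):
    node = path.split('/')[-2]
    with open(path) as f:
        print(node, f.read().strip())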

by Rob Foley at March 04, 2020 08:32 PM

How to launch ARM aarch64 VM with QEMU from scratch.

The below instructions will allow for bringing up an ARM VM from scratch.

It is worth mentioning that our work includes automating all of the below, resulting in a more streamlined option for building and launching the VM. See the article on QEMU aarch64 VMs for more details.

To launch an aarch64 VM we first need to install a few dependencies, including QEMU and the qemu-efi-aarch64 package, which provides the EFI firmware.

apt-get install qemu-system-arm
apt-get install qemu-efi-aarch64
apt-get install qemu-utils

Create the flash images with the correct sizes.

dd if=/dev/zero of=flash1.img bs=1M count=64
dd if=/dev/zero of=flash0.img bs=1M count=64
dd if=/usr/share/qemu-efi-aarch64/QEMU_EFI.fd of=flash0.img conv=notrunc

Download the image you want to boot.

For our example we use an Ubuntu installer.

wget http://ports.ubuntu.com/ubuntu-ports/dists/bionic-updates/main/installer-arm64/current/images/netboot/mini.iso

Create the empty Ubuntu image file we will install Ubuntu into.

We will use 20 gigabytes for this file.

qemu-img create ubuntu-image.img 20G

Start QEMU with the installer.

qemu-system-aarch64 -nographic -machine virt,gic-version=max -m 512M -cpu max -smp 4 \
-netdev user,id=vnet,hostfwd=:127.0.0.1:0-:22 -device virtio-net-pci,netdev=vnet \
-drive file=ubuntu-image.img,if=none,id=drive0,cache=writeback -device virtio-blk,drive=drive0,bootindex=0 \
-drive file=mini.iso,if=none,id=drive1,cache=writeback -device virtio-blk,drive=drive1,bootindex=1 \
-drive file=flash0.img,format=raw,if=pflash -drive file=flash1.img,format=raw,if=pflash 

Follow the instructions to install Ubuntu to the ubuntu-image.img file.

Once the install is finished you can exit QEMU with Ctrl-a x.

Then restart QEMU without the installer image with the following command.

qemu-system-aarch64 -nographic -machine virt,gic-version=max -m 512M -cpu max -smp 4 \
-netdev user,id=vnet,hostfwd=:127.0.0.1:0-:22 -device virtio-net-pci,netdev=vnet \
-drive file=ubuntu-image.img,if=none,id=drive0,cache=writeback -device virtio-blk,drive=drive0,bootindex=0 \
-drive file=flash0.img,format=raw,if=pflash -drive file=flash1.img,format=raw,if=pflash 

by Rob Foley at March 04, 2020 07:16 PM

Marcin Juszkiewicz

CirrOS 0.5.0 released

Someone may say that I am the main reason why the CirrOS project does releases.

In 2016 I got a task at Linaro to get it running on AArch64. More details are in my blog post ‘my work on changing CirrOS images’. The result was the 0.4.0 release.

Last year I got another task at Linaro. So we released the 0.5.0 version today.

But that’s not how it happened.

Multiple contributors

Since 0.4.0 release there were changes done by several developers.

Robin H. Johnson took care of kernel modules: he added new ones, updated names, and also added several new features.

Murilo Opsfelder Araujo fixed build on Ubuntu 16.04.3 as gcc changed preprocessor output.

Jens Harbott took care of the lack of space for data read from config-drive.

Paul Martin upgraded CirrOS build system to BuildRoot 2019.02.1 and bumped kernel/grub versions.

Maciej Józefczyk took care of metadata requests.

Marcin Sobczyk fixed the starting of Dropbear and dropped creation of the DSS ssh key, which was no longer supported.

My Linaro work

At Linaro I got a Jira card titled “Upgrade CirrOS’ kernel to Ubuntu 18.04’s kernel”.

This was needed as the 4.4 kernel was far too old and gave us several booting issues. Internally we had builds with a 4.15 kernel, but it should be done properly and upstream.

So I fetched the code, did some test builds and started looking at how to improve the situation. I spoke with Scott Moser (owner of the CirrOS project) and he told me about his plans to migrate from Launchpad to GitHub. We did that in December 2019 and then the fun started.

Continuous Integration

GitHub has several ways of adding CI to projects. First we tried GitHub Actions, but it turned out to be a paid service. I looked around and then decided to go with Travis CI.

Scott generated all the required keys and the integration started. Soon we had every pull request going through CI. Then I added a simple script (bin/test-boot) so each image was booted after the build. Scott improved the script and fixed a Power boot issue.

The next step was caching downloads and ccache files. This was a huge improvement!

In the meantime Travis bumped the free service to 5 simultaneous builders, which made our builds even faster.

CirrOS supports building only under Ubuntu LTS. But I use Fedora, so we merged two changes to make sure that the proper ‘grub(2)-mkimage’ command is used.

Kernel changes

The 4.4 kernel had to go. The first idea was to move to 4.18 from the Ubuntu 18.04 release. But if we upgrade, then why not go for the HWE one? I checked the 5.0 and 5.3 versions. As both worked fine, we decided to go with the newer one.

Modules changes

During startup of a CirrOS image several kernel modules are loaded. But there were several “no kernel module found” style messages for built-in ones.

We took care of it by querying the /sys/module/ directory, so module loading is now a quiet process. At the end a list of the loaded modules is printed.

VirtIO changes

A lot has happened since the 4.4 kernel, so we added several VirtIO modules.

One of the results is a working graphical console on AArch64, thanks to ‘virtio-gpu’ providing a framebuffer and ‘hid-generic’ handling USB input devices.

As lack of entropy is a common issue in VM instances, we added the ‘virtio-rng’ module. No more ‘uninitialized urandom read’ messages from the kernel.

Final words

Yesterday Scott created the 0.5.0 tag and CI built all the release images. Then I wrote the release notes (based on the ones from the pre-releases). The Kolla project got a patch to move to the new version.

When is the next release? Looking at history, someone may say 2023, as the previous one was in 2016. But who knows. Maybe we will get someone with a “please add s390x support” question ;D

by Marcin Juszkiewicz at March 04, 2020 10:53 AM

February 26, 2020

QEMU project

Announcing Google Summer of Code 2020 internships

QEMU is participating in Google Summer of Code 2020 again this year! Google Summer of Code (GSoC) is an internship program that brings students into open source development. GSoC offers full-time remote work opportunities for talented new developers wishing to get involved in our community.

Each intern works with one or more mentors who support them in their project. Code is submitted through QEMU’s open source development process, giving the intern experience in open source software development.

If you are interested in contributing to QEMU through a paid 12-week internship from May to August, take a look at our project ideas for more information. Applications are open March 16-31, 2020.

Please review the eligibility criteria before applying.

QEMU is also participating in the Outreachy internship program, so be sure to check that out as well!

February 26, 2020 07:00 AM

February 25, 2020

Stefano Garzarella

AF_VSOCK: nested VMs and loopback support available

During the last KVM Forum 2019, we discussed some next steps and several requests came from the audience.

In the last months we worked on that, and recent Linux releases contain interesting new features that we will describe in this blog post.

DevConf.CZ 2020

These updates and an introduction to AF_VSOCK were presented at DevConf.CZ 2020 during the “VSOCK: VM↔host socket with minimal configuration” talk. Slides and recording are available.

Nested VMs

Before Linux 5.5, the AF_VSOCK core supported only one transport loaded at run time. That was a limitation for nested VMs, because there we need multiple transports loaded together.

Types of transport

Under the AF_VSOCK core, which provides the socket interface to user space applications, we have several transports that implement the communication channel between guest and host.

[figure: vsock transports]

These transports depend on the hypervisor and we can put them in two groups:

  • H2G (host to guest) transports: they run in the host and usually they provide the device emulation; currently we have vhost and vmci transports.
  • G2H (guest to host) transports: they run in the guest and usually they are device drivers; currently we have virtio, vmci, and hyperv transports.

Multi-transports

In a nested VM environment, we need to load both G2H and H2G transports together in the L1 guest. For this reason, we implemented multi-transport support so that vsock can be used across nested VMs.

[figure: vsock and nested VMs]

Starting from Linux 5.5, the AF_VSOCK core can handle two types of transports loaded together at runtime:

  • H2G transport, to communicate with the guest
  • G2H transport, to communicate with the host.

So in the QEMU/KVM environment, the L1 guest will load both virtio-transport, to communicate with L0, and vhost-transport to communicate with L2.

Local Communication

Another feature recently added is the possibility to communicate locally on the same host. This feature, suggested by Richard WM Jones, can be very useful for testing and debugging applications that use AF_VSOCK without running VMs.

Linux 5.6 introduces a new transport called vsock-loopback, and a new well-known CID for local communication: VMADDR_CID_LOCAL (1). It’s a special CID to direct packets to the same host that generated them.

[figure: vsock loopback]

Other CIDs can be used for the same purpose, but it’s preferable to use VMADDR_CID_LOCAL:

  • Local Guest CID
    • if G2H is loaded (e.g. running in a VM)
  • VMADDR_CID_HOST (2)
    • if H2G is loaded and G2H is not loaded (e.g. running on L0). If G2H is also loaded, then VMADDR_CID_HOST is used to reach the host

Richard recently used the vsock local communication to implement a regression test for nbdkit/libnbd vsock support, using the new VMADDR_CID_LOCAL.

Example

# Listening on port 1234 using ncat(1)
l0$ nc --vsock -l 1234

# Connecting to the local host using VMADDR_CID_LOCAL (1)
l0$ nc --vsock 1 1234
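The same local round trip can also be exercised from Python, whose socket module supports AF_VSOCK on Linux. The sketch below is illustrative only: the port number is arbitrary and VMADDR_CID_LOCAL is defined by hand because not every Python version exposes it as a socket constant (it also assumes the vsock-loopback transport from Linux 5.6 is available):

import socket
import threading

VMADDR_CID_LOCAL = 1     # CID for local communication (Linux >= 5.6)
PORT = 1234              # arbitrary example port

ready = threading.Event()

def server():
    srv = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
    srv.bind((socket.VMADDR_CID_ANY, PORT))
    srv.listen(1)
    ready.set()                      # tell the client it can connect now
    conn, (cid, port) = srv.accept()
    print('server: connection from CID {} port {}'.format(cid, port))
    print('server: received', conn.recv(64))
    conn.close()
    srv.close()

t = threading.Thread(target=server)
t.start()
ready.wait()

cli = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
cli.connect((VMADDR_CID_LOCAL, PORT))
cli.sendall(b'hello over vsock loopback')
cli.close()
t.join()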

Patches

by sgarzare@redhat.com (Stefano Garzarella) at February 25, 2020 07:30 PM

February 15, 2020

Stefan Hajnoczi

An introduction to GDB scripting in Python

Sometimes it's not humanly possible to inspect or modify data structures manually in a debugger because they are too large or complex to navigate. Think of a linked list with hundreds of elements, one of which you need to locate. Finding the needle in the haystack is only possible by scripting the debugger to automate repetitive steps.

This article gives an overview of the GNU Debugger's Python scripting support so that you can tackle debugging tasks that are not possible manually.

What scripting GDB in Python can do

GDB can load Python scripts to automate debugging tasks and to extend debugger functionality. I will focus mostly on automating debugging tasks, but extending the debugger is also very powerful, though rarely used.

Say you want to search a linked list for a particular node:

(gdb) p node.next
...
(gdb) p node.next.next
...
(gdb) p node.next.next.next

Doing this manually can be impossible for lists with too many elements. GDB scripting support allows this task to be automated by writing a script that executes debugger commands and interprets the results.

Loading Python scripts

The source GDB command executes files ending with the .py extension in a Python interpreter. The interpreter has access to the gdb Python module that exposes debugging APIs so your script can control GDB.


$ cat my-script.py
print('Hi from Python, this is GDB {}'.format(gdb.VERSION))
$ gdb
(gdb) source my-script.py
Hi from Python, this is GDB Fedora 8.3.50.20190824-28.fc31

Notice that the gdb module is already imported. See the GDB Python API documentation for full details of this module.

It's also possible to run ad-hoc Python commands from the GDB prompt:

(gdb) py print('Hi')
Hi

Executing commands

GDB commands are executed using gdb.execute(command, from_tty, to_string). For example, gdb.execute('step') runs the step command. Output can be collected as a Python string by setting to_string to True. By default output goes to the interactive GDB session.
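For instance, this small illustrative snippet (not from the original post) captures a backtrace into a string and prints only the innermost frame:

bt = gdb.execute('bt', to_string=True)   # run "bt" and capture its output
print(bt.splitlines()[0])                # show only the innermost frame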

Although gdb.execute() is fundamental to GDB scripting, at best it allows screen-scraping (interpreting the output string) rather than a Pythonic way of controlling GDB. There is actually a full Python API that represents the debugged program's types and values in Python. Most scripts will use this API instead of simply executing GDB commands as if simulating an interactive shell session.

Navigating program variables

The entry point to navigating program variables is gdb.parse_and_eval(expression). It returns a gdb.Value.

When a gdb.Value is a struct its fields can be indexed using value['field1']['child_field1'] syntax. The following example iterates a linked list:


elem = gdb.parse_and_eval('block_backends.tqh_first')
while elem:
    name = elem['name'].string()
    if name == 'drive2':
        print('Found {}'.format(elem['dev']))
        break
    elem = elem['link']['tqe_next']

This script iterates the block_backends linked list and checks the name field of each element against "drive2". When it finds "drive2" it prints the dev field of that element.

There is a lot more that GDB Python scripts can do but you'll have to check out the API documentation to learn about it.
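As a taste of the extension side mentioned earlier, new GDB commands can also be registered from Python by subclassing gdb.Command. A minimal sketch, where the command name find-drive and its behaviour are made up for illustration (it simply wraps the linked-list walk shown above):

class FindDrive(gdb.Command):
    """find-drive NAME -- walk block_backends and print the element called NAME."""

    def __init__(self):
        super(FindDrive, self).__init__('find-drive', gdb.COMMAND_USER)

    def invoke(self, arg, from_tty):
        elem = gdb.parse_and_eval('block_backends.tqh_first')
        while elem:
            if elem['name'].string() == arg:
                print('Found {}'.format(elem['dev']))
                return
            elem = elem['link']['tqe_next']
        print('{} not found'.format(arg))

FindDrive()   # registering the command happens as a side effect of construction

After sourcing such a script, the command can be invoked as "(gdb) find-drive drive2".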

Conclusion

Python scripts can automate tedious debugging tasks in GDB. Having the full power of Python and access to file I/O, HTTP requests, etc. means pretty much any debugging task can be turned into a full-blown program. A subset of this was possible in the past through GDB command scripts, but Python is a much more flexible programming language and familiar to many developers (more so than GDB's own looping and logic commands!).

by Unknown (noreply@blogger.com) at February 15, 2020 04:16 PM

Powered by Planet!
Last updated: May 27, 2020 05:10 AM