Blogging about open source virtualization

News from QEMU, KVM, libvirt, libguestfs, virt-manager and related tools


March 12, 2019

Cornelia Huck

s390x changes in QEMU 4.0

QEMU is now entering softfreeze for the 4.0 release (expected in April), so here is the usual summary of s390x changes in that release.

CPU Models

  • A cpu model for the z14 GA 2 has been added; it does not yet enable any new features.
  • The cpu model for z14 now does, however, include the multiple epoch and PTFF enhancement features by default.
  • The 'qemu' cpu model now includes the zPCI feature by default, so no further prerequisites are needed for pci support (see below).
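Either model can be selected explicitly on the command line; a minimal sketch (the machine and memory options here are illustrative):

```shell
# start an s390x guest with the z14 cpu model; TCG guests default to the
# 'qemu' model, which now includes zPCI
qemu-system-s390x -machine s390-ccw-virtio -cpu z14 -m 1024 -nographic
```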

Devices


  • QEMU for s390x is now always built with pci support. If we want to provide backwards compatibility, we cannot simply disable pci (we need the s390 pci host bus); it is easier to make pci mandatory. Note that disabling pci was never supported by the normal build system anyway.
  • zPCI devices have gained support for instruction counters (on a Linux guest, these are exposed through /sys/kernel/debug/pci/<function>/statistics).
  • zPCI devices have always lacked support for migrating their s390-specific state (it was never implemented); if you tried to migrate a guest with a virtio-pci device on s390x, odd things might happen. To avoid surprises, 'zpci' devices are now explicitly marked as unmigratable. (Support for migration will likely be added in the future.)
  • Hot(un)plug of the vfio-ap matrix device is now supported.
  • Adding a vfio-ap matrix device no longer inhibits usage of a memory ballooner: Memory usage by vfio-ap does not clash with the concept of a memory balloon.
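The zPCI instruction counters mentioned above can be read from a Linux guest via debugfs; a sketch, where the function address 0002:06:00.0 is a made-up example:

```shell
# requires debugfs to be mounted (usually the case by default)
mount -t debugfs none /sys/kernel/debug 2>/dev/null
cat /sys/kernel/debug/pci/0002:06:00.0/statistics
```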

TCG

  • Support for the floating-point extension facility has been added.
  • The first part of support for z13 vector instructions has been added (vector support instructions). Support for the remaining vector instructions is expected in the next release; that should cover enough of the instructions introduced with z13 to run a distribution built for that cpu.

by Cornelia Huck (noreply@blogger.com) at March 12, 2019 05:49 PM

March 11, 2019

KVM on Z

libvirt v4.10 released, providing PCI passthrough support

libvirt v4.10, available for download at the libvirt project website, adds support for PCI passthrough devices on IBM Z (requires Linux kernel 4.14 and QEMU v2.11).
To set up passthrough for a PCI device, follow these steps:
  1. Make sure the vfio-pci module is available, e.g. using the modinfo command:
       $ modinfo vfio-pci
       filename:       /lib/modules/4.18.0/kernel/drivers/vfio/pci/vfio-pci.ko
       description:    VFIO PCI - User Level meta-driver
  2. Verify that the pciutils package, which provides the lspci command among others, is installed, using your distro's package manager
  3. Determine the PCI device's address using the lspci command:
       $ lspci
       0002:06:00.0 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family
                    [ConnectX-3/ConnectX-3 Pro Virtual Function]
  4. Add the following element to the guest domain XML's devices section:
       <hostdev mode='subsystem' type='pci' managed='yes'>
         <source>
           <address domain='0x0002' bus='0x06' slot='0x00' function='0x0'/>
         </source>
       </hostdev>

    Note that if the managed attribute is set to no (which is the default), it becomes the user's duty to unbind the PCI device from its host device driver and rebind it to vfio-pci prior to starting the guest.
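For the managed='no' case, the unbind/rebind can be done through sysfs; a sketch using the device address from this example (driver_override is available in reasonably recent kernels):

```shell
# detach the function from its current host driver, if any
echo 0002:06:00.0 > /sys/bus/pci/devices/0002:06:00.0/driver/unbind
# have vfio-pci claim the device on the next probe
echo vfio-pci > /sys/bus/pci/devices/0002:06:00.0/driver_override
echo 0002:06:00.0 > /sys/bus/pci/drivers_probe
```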
Once done and the guest is started, running the lspci command in the guest should show the PCI device, and one can proceed to configure it as needed.
It is well worth checking out the expanded domain XML:
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0002' bus='0x06' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0002' bus='0x00' slot='0x01' function='0x0'>
        <zpci uid='0x0001' fid='0x00000000'/>
      </address>
    </hostdev>

Theoretically, the PCI address in the guest can change between boots. However, the <zpci> element guarantees address persistence inside of the guest. The actual address of the passthrough device is based solely on the uid attribute: the uid becomes the PCI domain, and all remaining parts of the address (PCI bus, slot and function) are set to zero. Therefore, in this example, the PCI address in the guest would be 0001:00:00.0.
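The mapping from uid to guest PCI address is easy to compute; a small sketch:

```shell
# the uid becomes the PCI domain; bus, slot and function are all zero
uid=0x0001
printf '%04x:00:00.0\n' $((uid))
```

which prints 0001:00:00.0.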
Take note of the fid attribute, whose value is required to hotplug/hotunplug PCI devices within a guest.
Furthermore, note that the target PCI address is not visible anywhere outside of the QEMU process. It is unrelated to the PCI address observed within the KVM guest and could be set to an arbitrary value; however, choosing the "wrong" values can have subtle, undesired side effects with QEMU. We therefore strongly recommend not specifying a target address and relying on auto-assignment instead. If the guest's PCI address has to be chosen, restrict the target address element to at most uid (which defines the PCI address) and fid (so that e.g. scripts in the guest for hotplugging PCI devices can rely on a specific value), as follows:
   <address type='pci'>
     <zpci uid='0x0001' fid='0x00000000'/>
   </address>


For further (rather technical) details see here and here (git commit).

by Stefan Raspl (noreply@blogger.com) at March 11, 2019 03:11 PM

March 02, 2019

Gerd Hoffmann

EDID support for qemu

Over the last months I've worked on adding EDID support to qemu. This allows passing all kinds of information about the (virtual) display to the guest: preferred video mode, display resolution, monitor name, monitor serial number and more. The current focus is getting the infrastructure in place. Once we have that we can build new features on top; HiDPI support comes to mind, for example.

New in qemu 3.1

In qemu 3.1 the EDID generator code and support for the qemu stdvga were added. Right now EDID support is turned off by default; use edid=on to enable it. With EDID enabled you can also use the xres and yres properties to set the preferred video mode. Here is an example: qemu -device VGA,edid=on,xres=1280,yres=800

The qemu-edid utility has been added too. Its main purpose is to allow testing the generator code without having to boot a guest, so typically the qemu-edid output is piped into the edid-decode utility to verify the generator works correctly. If you need an EDID blob for other reasons, you might find it useful as well.
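A quick round-trip looks like this; a sketch, assuming the -x/-y resolution options of current qemu-edid (check qemu-edid -h):

```shell
# generate an EDID blob with a 1280x800 preferred mode and decode it
qemu-edid -x 1280 -y 800 > edid.bin
edid-decode edid.bin
# or pipe it directly
qemu-edid -x 1280 -y 800 | edid-decode
```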

New in linux kernel 5.0

Infrastructure work: Some interface updates have been merged:

  • EDID support has been added to the virtio-gpu protocol.
  • The vfio mdev interface for vgpus got EDID support too.

We also got EDID support in drm drivers for both qemu stdvga (bochs-drm.ko) and virtio-gpu.

If both guest (linux kernel) and host (qemu) are new enough, the drm drivers will use the information from the EDID blob to create the video mode list. The blob is also available in sysfs; you can use edid-decode to get a human-readable version: edid-decode /sys/class/drm/card0-Virtual-1/edid

Planned for qemu 4.0

Support for EDID in virtio-gpu will be added; it is already merged in the master branch. It is also turned off by default, use edid=on to enable it (similar to stdvga).

Support for EDID in vgpus (i.e. vfio-mdev) is planned too, patches are out for review right now. Note that the kernel's mdev driver must support that too.

The qemu macos driver for the stdvga has been updated to support EDID.

Future plans

Intel is working on adding EDID support to gvt (the intel graphics mdev driver). It should land in the 5.1 or 5.2 linux kernel merge window.

Once the EDID support has seen some real-world testing, it will be enabled by default for both stdvga and virtio-gpu. Unless something unexpected happens, that will probably be in qemu 4.1.

As already mentioned above, HiDPI support (probably starting with the gtk UI) is something I plan to look at when I find some time.

by Gerd Hoffmann at March 02, 2019 11:00 PM

February 28, 2019

Stefan Hajnoczi

QEMU accepted into Google Summer of Code and Outreachy 2019

QEMU is participating in the Google Summer of Code and Outreachy open source internship programs again this year. These 12-week, full-time, paid, remote work internships allow people interested in contributing to QEMU to get started. Each intern works with one or more mentors, experienced developers who can answer questions. This is a great way to try out working on open source if you are considering it as a career.

For more information (including eligibility requirements), see our GSoC and our Outreachy pages.

by stefanha (noreply@blogger.com) at February 28, 2019 05:17 PM

February 27, 2019

Gerd Hoffmann

ramfb display in qemu

ramfb is a very simple framebuffer display device. It is intended to be configured by the firmware and used as boot framebuffer, until the guest OS loads a real GPU driver.

The framebuffer memory is allocated from guest RAM and initialized using the firmware config interface (fw_cfg). edk2 (uefi firmware) has ramfb support. There also is a vgabios, which emulates vga text mode and renders it to the framebuffer.

The most interesting use case for this is boot display support for vgpus. vfio has a non-hotpluggable device variant, which allows enabling ramfb support: qemu -device vfio-pci-nohotplug,ramfb=on,... Once the guest OS has initialized the vgpu, qemu will show the vgpu display; until then the ramfb framebuffer is used. The firmware messages, boot loader menu and efifb/vesafb output all show up via ramfb.

There also is a standalone device, mostly intended for testing: qemu -vga none -device ramfb.

Even though it is possible to use ramfb as the primary display, it isn't a good idea to actually do so, as it is not very efficient.

by Gerd Hoffmann at February 27, 2019 11:00 PM

February 26, 2019

QEMU project

Announcing GSoC and Outreachy 2019 internships

QEMU is once again participating in Google Summer of Code and Outreachy this year! These open source internship programs offer full-time remote work opportunities for talented new developers wishing to get involved in our community.

Each intern works with one or more mentors who support them in their project. Code is submitted according to QEMU’s normal development process, giving the intern experience in open source software development. Our projects range from device emulation to performance optimization to test infrastructure.

If you are interested in contributing to QEMU through a paid 12-week internship from May to August 2019, take a look at our GSoC page and our Outreachy page for more information.

Both GSoC and Outreachy have eligibility criteria, which you can review here (GSoC) and here (Outreachy) before applying.

You can read about projects that were completed in 2018 here.

These internships are generously funded by Google (GSoC) and Red Hat (Outreachy).

February 26, 2019 07:00 AM

February 18, 2019

Daniel Berrange

Easier QEMU live tracing using systemtap

QEMU is able to leverage a number of live tracing systems, with the choice configurable at build time:

  • log – printf formatted string for each event sent into QEMU’s logging system which writes to stderr
  • syslog – printf formatted string for each event sent via syslog
  • simple – binary data stream for each event written to a file or fifo pipe
  • ftrace – printf formatted string for each event sent to kernel ftrace facility
  • dtrace – user space probe markers dynamically enabled via dtrace or systemtap
  • ust – user space probe markers dynamically enabled via LTT-ng

Upstream QEMU enables the “log” trace backend by default since it is cross-platform portable and very simple to use by adding “-d trace:PATTERN” on the QEMU command line. For example, to enable logging of all trace events in the QEMU I/O subsystem (aka “qio“) we can run:

$ qemu -d trace:qio* ...some args...
23266@1547735759.137292:qio_channel_socket_new Socket new ioc=0x563a8a39d400
23266@1547735759.137305:qio_task_new Task new task=0x563a891d0570 source=0x563a8a39d400 func=0x563a86f1e6c0 opaque=0x563a89078000
23266@1547735759.137326:qio_task_thread_start Task thread start task=0x563a891d0570 worker=0x563a86f1ce50 opaque=0x563a891d9d90
23273@1547735759.137491:qio_task_thread_run Task thread run task=0x563a891d0570
23273@1547735759.137503:qio_channel_socket_connect_sync Socket connect sync ioc=0x563a8a39d400 addr=0x563a891d9d90
23273@1547735759.138108:qio_channel_socket_connect_fail Socket connect fail ioc=0x563a8a39d400

This is very simple and surprisingly effective much of the time, but it is not without its downsides:

  • Inactive probes have a non-negligible performance impact on hot codepaths
  • It is targeted at human consumption, so it is not easy to process reliably by machine
  • It requires adding arguments to QEMU’s command line, so it is not easy to enable in many cases
  • It is specific to QEMU, so it does not facilitate getting correlated traces across the whole system

For these reasons, some downstreams chose not to use the default “log” backend. Both Fedora and RHEL have instead enabled the “dtrace” backend which can be leveraged via systemtap on Linux. This provides a very powerful tracing system, but the cost is that the previous simple task of printing a formatted string when a probe point fires has become MUCH more complicated. For example to get equivalent output to that seen with QEMU’s log backend would require

# cat > trace.stp <<EOF
probe qemu.system.x86_64.qio_task_new {
    printf("%d@%d qio_task_new Task new task=%p source=%p func=%p opaque=%p\n", 
           pid(), gettimeofday_ns(), task, source, func, opaque)
}
EOF
# stap trace.stp
22806@1547735341399862570 qio_task_new Task new task=0x56135cd66eb0 source=0x56135d1d7c00 func=0x56135af746c0 opaque=0x56135bf06400

Repeat that code snippet for every qio* probe point you want to watch, figuring out the set of args it has available to print. This quickly becomes tedious for what should be a simple logging job, especially if you need to reference null terminated strings from userspace.

After cursing this difficulty one time too many, it occurred to me that QEMU could easily do more to make life easier for systemtap users. The QEMU build system is already auto-generating all the trace backend specific code from a generic description of probes in the QEMU source tree. It has a format string which is used in the syslog, log and ftrace backends, but this is ignored for the dtrace backend. It did not take much to change the code generator so that it can use this format string to generate a convenient systemtap tapset representing the above manually written probe:

probe qemu.system.x86_64.log.qio_task_new = qemu.system.x86_64.qio_task_new ?
{
    printf("%d@%d qio_task_new Task new task=%p source=%p func=%p opaque=%p\n",
           pid(), gettimeofday_ns(), task, source, func, opaque)
}

This can be trivially executed with minimal knowledge of the systemtap tapset language:

# stap -e "qemu.system.x86_64.log.qio_task_new{}"
22806@1547735341399862570 qio_task_new Task new task=0x56135cd66eb0 source=0x56135d1d7c00 func=0x56135af746c0 opaque=0x56135bf06400

Even better, we have now gained the ability to use wildcards too

# stap -e "qemu.system.x86_64.log.qio*{}"
23266@1547735759.137292:qio_channel_socket_new Socket new ioc=0x563a8a39d400
23266@1547735759.137305:qio_task_new Task new task=0x563a891d0570 source=0x563a8a39d400 func=0x563a86f1e6c0 opaque=0x563a89078000
23266@1547735759.137326:qio_task_thread_start Task thread start task=0x563a891d0570 worker=0x563a86f1ce50 opaque=0x563a891d9d90
23273@1547735759.137491:qio_task_thread_run Task thread run task=0x563a891d0570
23273@1547735759.137503:qio_channel_socket_connect_sync Socket connect sync ioc=0x563a8a39d400 addr=0x563a891d9d90
23273@1547735759.138108:qio_channel_socket_connect_fail Socket connect fail ioc=0x563a8a39d400

Users still, however, need to be aware of the naming convention for QEMU’s systemtap tapsets and how it maps to the particular QEMU binary that is used, and must not forget the trailing “{}”. Thus I decided to go one step further and ship a small helper tool that makes it even easier to use:

$ qemu-trace-stap run qemu-system-x86_64 'qio*'
22806@1547735341399856820 qio_channel_socket_new Socket new ioc=0x56135d1d7c00
22806@1547735341399862570 qio_task_new Task new task=0x56135cd66eb0 source=0x56135d1d7c00 func=0x56135af746c0 opaque=0x56135bf06400
22806@1547735341399865943 qio_task_thread_start Task thread start task=0x56135cd66eb0 worker=0x56135af72e50 opaque=0x56135c071d70
22806@1547735341399976816 qio_task_thread_run Task thread run task=0x56135cd66eb0

The second argument to this tool is the QEMU binary filename to be traced, which can be relative (to search $PATH) or absolute. What is clever is that it will set the SYSTEMTAP_TAPSET env variable to point to the right location to find the corresponding tapset definition. This is very useful when you have multiple copies of QEMU on the system and need to make sure systemtap traces the right one.

The ‘qemu-trace-stap‘ script takes a verbose arg so you can understand what it is running behind the scenes:

$ qemu-trace-stap run /home/berrange/usr/qemu-git/bin/qemu-system-x86_64 'qio*'
Using tapset dir '/home/berrange/usr/qemu-git/share/systemtap/tapset' for binary '/home/berrange/usr/qemu-git/bin/qemu-system-x86_64'
Compiling script 'probe qemu.system.x86_64.log.qio* {}'
Running script, <Ctrl>-c to quit
...trace output...

It can enable multiple probes at once

$ qemu-trace-stap run qemu-system-x86_64 'qio*' 'qcrypto*' 'buffer*'

By default it monitors all existing running processes and all future launched processes. This can be restricted to a specific PID using the --pid arg

$ qemu-trace-stap run --pid 2532 qemu-system-x86_64 'qio*'

Finally if you can’t remember what probes are valid it can tell you

$ qemu-trace-stap list qemu-system-x86_64
ahci_check_irq
ahci_cmd_done
ahci_dma_prepare_buf
ahci_dma_prepare_buf_fail
ahci_dma_rw_buf
ahci_irq_lower
...snip...

This new functionality merged into QEMU upstream a short while ago and will be included in the QEMU 4.0 release coming at the end of April.

by Daniel Berrange at February 18, 2019 03:00 PM

January 28, 2019

Thomas Huth

How to create small VMs with buildroot

I have already run into the situation a couple of times that I wanted to provide a small guest disk image to other people. For example, one time I wanted to provide a test application like LTP to colleagues via a server where I only had a limited disk quota available. Back then I was still able to solve the problem by installing a stock Linux distribution together with the test software into a normal qcow2 image, and then shrinking the image with qemu-img convert and xz to approximately 500 MiB.

But when I started to think about the QEMU advent calendar 2018, where I wanted to provide many small images for various different target architectures, it was clear to me that I needed a different approach. First, the disk images needed to be much smaller due to network traffic constraints, and for many of the “non-mainstream” target architectures (like MicroBlaze or Xtensa) you also cannot easily get a standard Linux distribution that installs without problems on the machines that QEMU provides.

Instead of using a pre-built Linux distribution, it would also be possible to cross-compile the kernel and user space programs and build a small disk image with that on your own. However, figuring out how to do that for multiple target architectures would have been very cumbersome and time consuming.

So after doing some research, I finally discovered buildroot, which is an excellent framework for doing exactly what I wanted: it allows creating small disk images for non-x86 target CPUs, with all the magic of cross-compiling and image creation wrapped in its internal scripts, and with a very flexible Kconfig-style configuration system on top.

For those who are interested, here is a short description of how to use buildroot to create a small guest disk image:

  1. Download the version that you would like to use from the buildroot download page and unpack it:
    $ wget https://buildroot.org/downloads/buildroot-2018.02.9.tar.bz2
    $ tar -xaf buildroot-2018.02.9.tar.bz2 
    $ cd buildroot-2018.02.9/
    
  2. Now you have to choose for which CPU and machine target you want to build. Have a look at the pre-defined config files and then select one. In the following example, I’m going to use the “pseries” POWER machine:
    $ cd configs/
    $ ls qemu*
    qemu_aarch64_virt_defconfig         qemu_nios2_10m50_defconfig
    qemu_arm_versatile_defconfig        qemu_or1k_defconfig
    qemu_arm_versatile_nommu_defconfig  qemu_ppc64le_pseries_defconfig
    qemu_arm_vexpress_defconfig         qemu_ppc64_pseries_defconfig
    qemu_m68k_mcf5208_defconfig         qemu_ppc_g3beige_defconfig
    qemu_m68k_q800_defconfig            qemu_ppc_mpc8544ds_defconfig
    qemu_microblazebe_mmu_defconfig     qemu_ppc_virtex_ml507_defconfig
    qemu_microblazeel_mmu_defconfig     qemu_sh4eb_r2d_defconfig
    qemu_mips32r2el_malta_defconfig     qemu_sh4_r2d_defconfig
    qemu_mips32r2_malta_defconfig       qemu_sparc64_sun4u_defconfig
    qemu_mips32r6el_malta_defconfig     qemu_sparc_ss10_defconfig
    qemu_mips32r6_malta_defconfig       qemu_x86_64_defconfig
    qemu_mips64el_malta_defconfig       qemu_x86_defconfig
    qemu_mips64_malta_defconfig         qemu_xtensa_lx60_defconfig
    qemu_mips64r6el_malta_defconfig     qemu_xtensa_lx60_nommu_defconfig
    qemu_mips64r6_malta_defconfig
    $ cd ..
    $ make qemu_ppc64_pseries_defconfig
    
  3. Now run make menuconfig to fine-tune your build. I recommend having a look at the following settings first:
    • In the Toolchain section, you might need to enable other languages like C++ in case it is required for the application that you want to ship in the image.
    • In the System Configuration section, change the System Banner to something that better suits your disk image.
    • Check the Kernel section to see whether the right kernel settings are used here. The defaults should be fine most of the time, but in case you want to use a newer kernel version for example, or a different kernel config file, you can adjust it here. Note that you also should adjust the kernel header version in the Toolchain section if you change the kernel version here.
    • Have a look at the Target packages section – maybe the application that you want to include is already provided by the base buildroot system. In that case you can simply enable it here.
    • Check the Filesystem images section and decide which kind of image you want to ship later. For example, for most of the QEMU advent calendar images, I used a simple initrd only, so I unchecked the ext2/3/4 root filesystem here and used initial RAM filesystem linked into linux kernel instead.
  4. Now save your configuration, exit the config menu, and type make for a first test to see whether it produces a usable image. Note: Don’t use the -j parameter of make here; buildroot will figure that out on its own.

  5. Once the build has finished successfully, have a look at the output/images/ directory. You can start your guest with the results from there to give it a try. For example, if you built with the ppc64 pseries configuration, with the initrd linked into the kernel:
    $ qemu-system-ppc64 -M pseries -m 1G -kernel output/images/vmlinux
    

    You should see the kernel booting up, and if you have a look at the serial console, there is also a getty running where you can log in as root and look around.

  6. To customize your build, you will sooner or later want to add additional files to the image, for example some additional init scripts in the /etc/init.d/ folder. Or, in the above case, it would be good to also have a getty running on the graphical console. The best way to add custom files is to create an overlay folder which will be copied into the destination filesystem during the make process:
    $ mkdir -p overlay/etc/init.d
    $ cp my-startup-script.sh overlay/etc/init.d/S99myscript  # If you have one
    $ cp output/target/etc/inittab overlay/etc/inittab
    $ echo 'tty1::respawn:/sbin/getty -L tty1 0 linux' >> overlay/etc/inittab
    

    Then run make menuconfig and set the Root filesystem overlay directories option in the System Configuration section to the overlay folder that you have just created. Run make again, and the next time you start your guest you should see the new files in the image, e.g. a getty running on the graphical console, too. Note: Do not try to add or change files directly in the output/target/ folder. That looks tempting at first, but it is just a temporary folder used by the build system, which can be overwritten at any time and will be erased when you run make clean, for example.

  7. If you need to tweak the kernel configuration, you can run make linux-menuconfig and make the appropriate changes there. For example, if you want to get keyboard input for the ppc64 pseries machine on the graphical console, you should also enable the USB XHCI driver in the kernel. Once you are happy with the kernel configuration, save it, exit the menu and type make linux-rebuild && make. Note: To avoid the kernel config being reset when you run make clean at a later point in time, you should copy output/build/linux-*/.config to a safe location. Then run make menuconfig, change the Kernel -> Kernel configuration setting to Use a custom config file, and set the Configuration file path to the copied file.

  8. If you want to add additional software to your image, you basically have to provide a Config.in file and a *.mk file. I recommend having a look at the various packages in the package/ directory. Use a package with a similar build system as a template, and have a closer look at the buildroot manual for details. Tweaking the build system of your software to properly cross-compile can sometimes be a little bit tricky, but most software that uses standard systems like autoconf should be fine.

That’s it. You should now be able to package your software in really small VM images. Of course, there are still lots of other settings that you can tweak in the buildroot environment – if you need any of them, just have a look at the buildroot manual for more information.

January 28, 2019 02:20 PM

January 25, 2019

Stefan Hajnoczi

VIRTIO 1.1 is available for public review until Feb 21st 2019

The VIRTIO 1.1 specification for paravirtualized I/O devices includes the new packed vring layout and the GPU, input, crypto, and socket device types. In addition to this there are other improvements and new features in the specification. The new vring layout will increase performance and offers new features that devices can take advantage of.

You can review the specification and post comments until February 21st 2019: VIRTIO 1.1 csprd01.

by stefanha (noreply@blogger.com) at January 25, 2019 02:42 PM

January 23, 2019

QEMU project

GSoC and Outreachy 2018 retrospective

QEMU participates in open source internship programs including Google Summer of Code (GSoC) and Outreachy. These full-time remote work opportunities allow talented new developers to get involved in our community. This post highlights what our interns achieved in 2018.

micro:bit board emulation

Julia Suvorova (Outreachy) and Steffen Görtz (GSoC) tackled adding emulation support for the micro:bit ARM board. Although QEMU already has plenty of ARM emulation code, the Cortex-M0 CPU used in the micro:bit was not yet implemented and the nRF51 system-on-chip was also missing.

The goal of this project was to run micro:bit programs (usually created with the MicroPython or Javascript/Blocks IDEs) with a core set of emulated devices, including the serial port, pushbuttons, and LEDs.

QEMU 3.1 already shipped the groundwork for the new qemu-system-arm -M microbit machine type. Enough functionality to run basic micro:bit programs is expected in the next QEMU release.

This project was mentored by Jim Mussared, Joel Stanley, and Stefan Hajnoczi.

Patchew REST API improvements

Shubham Jain (GSoC) created a REST API for the Patchew continuous integration system that is at the heart of QEMU’s development process. The previous API was not RESTful and exposed database schema internals.

The improvements to the REST API have been included into Patchew and are deployed on patchew.org. They are not in use yet, pending more work on authentication; this may be the topic of a future Summer of Code internship.

This project was mentored by Paolo Bonzini and Fam Zheng.

Qtest Driver Framework

Emanuele Esposito (GSoC) enhanced QEMU’s test infrastructure with an engine that starts tests with all variants of devices that they are capable of driving.

This is a complicated task in QEMU since certain devices and buses are available in an architecture-specific way on each emulation target, making it hard to write test cases without lots of hardcoded dependencies - and to keep them up-to-date!

The qgraph framework that Emanuele created eliminates the need to hardcode each variant into the test. Emanuele also converted several existing tests. His framework was also presented at KVM Forum 2018 by Laurent Vivier and should be merged in 4.0.

This project was mentored by Paolo Bonzini and Laurent Vivier.

Vulkan-izing VirGL

Nathan Gauër (GSoC) improved VirGL, which provides an OpenGL path that is well supported on Linux guests. On the host, QEMU offers several console back-ends, from EGL to SDL. Adding a Vulkan path will require changing the current VirGL API, writing new guest drivers, and also offering a way to display the output. This is a huge task which can be split into several sub-projects. Extending the current VirGL API to support Vulkan is the first step.

Code is available here.

This project was mentored by Marc-André Lureau.

Stay tuned for 2019 internships!

QEMU will apply to Google Summer of Code and Outreachy again in 2019. We hope to offer more great open source internship opportunities for new developers.

January 23, 2019 06:50 AM

January 07, 2019

KVM on Z

Knowledge Series: CPU Models and Migration Considerations


Another new entry in our Knowledge Series explains CPU models and their relationship to live guest migration.

by Stefan Raspl (noreply@blogger.com) at January 07, 2019 09:23 PM

January 06, 2019

Stefan Hajnoczi

mute-thread: a script to mute email threads with notmuch

Ever get included on an email thread that isn't relevant? It can be distracting to see new emails appear on a thread you already know is not interesting. You could mark them as read manually, but that is tedious.

This mute-thread script silences email threads that you don't want to read, even after new emails are received.

Download it here.

Setup

It relies on the awesome notmuch(1) email utility, so make sure you have that set up in order to use this script.

The following .muttrc macro integrates this with the mutt(1) email client. When you press M the entire email thread is muted:

macro index M "<enter-command>unset wait_key<enter><pipe-message>~/.mutt/mute-thread add<enter><enter-command>set wait_key<enter><read-thread>" "Mute thread"

After fetching new emails, run notmuch new and then mute-thread apply.
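This step can be automated with a notmuch post-new hook; a sketch, assuming the script lives in ~/.mutt and the notmuch database is under ~/mail:

```shell
#!/bin/sh
# ~/mail/.notmuch/hooks/post-new -- run automatically after 'notmuch new'
# has finished indexing; the hook file must be executable
~/.mutt/mute-thread apply
```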

Unmuting threads

If you change your mind, run mute-thread remove MESSAGE-ID to unmute a thread again. Future emails will not be silenced.

by stefanha (noreply@blogger.com) at January 06, 2019 07:56 AM

December 17, 2018

KVM on Z

QEMU v3.1 released

QEMU v3.1 is out. Besides a number of small enhancements, here are some items that we would like to highlight from a KVM on Z perspective:
  • Huge Pages Support: KVM guests can now utilize 1MB pages. As this removes one layer of address translation for the guest backing, fewer page faults need to be processed, and fewer translation lookaside buffer (TLB) entries are needed to hold translations. This, together with the TLB improvements in z14, will improve KVM guest performance.
    To use:
    Create config file /etc/modprobe.d/kvmhpage.conf file with the following content to enable huge pages for KVM:

       options kvm hpage=1


    Furthermore, add the following line to /etc/sysctl.conf to reserve N huge pages:

       vm.nr_hugepages = N

    Alternatively, append the following statement to the kernel parameter line in case support is compiled into the kernel: kvm.hpage=1 hugepages=N.
    Note that huge pages can be added dynamically after boot, but due to effects like memory fragmentation it is preferable to define them as early as possible.
    If successful, the file /proc/sys/vm/nr_hugepages should show N huge pages. See here for further documentation.
    Then, to enable huge pages for a guest, add the following element to the respective domain XML:

       <memoryBacking>
         <hugepages/>
       </memoryBacking>


    The use of huge pages in the host is orthogonal to the use of huge pages in the guest. Both will improve the performance independently by reducing the number of page faults and the number of page table walks after a TLB miss.
    The biggest performance improvement can be achieved by using huge pages in both host and guest, e.g. with libhugetlbfs, as this will also make use of the larger 1M TLB entries in the hardware.
    Requires Linux kernel 4.19.
  • virtio-ap: The Adjunct Processor (AP) facility is an IBM Z cryptographic facility comprising three AP instructions and up to 256 cryptographic adapter cards, each of which can be grouped into up to 85 domains, providing cryptographic services. virtio-ap maps a subset of the AP devices/domains to one or more KVM guests, such that the host and each guest have exclusive access to a discrete set of AP devices.
    Here is a small sample script illustrating host setup:

       # load vfio-ap device driver
       modprobe vfio-ap

       # choose a UUID for the mediated device (or use uuidgen instead)
       UUID=e926839d-a0b4-4f9c-95d0-c9b34190c4ba

       # reserve AP queue 7 on adapter 3 for use by a KVM guest
       echo -0x3 > /sys/bus/ap/apmask
       echo -0x7 > /sys/bus/ap/aqmask

       # create a mediated device (mdev) to provide userspace access
       # to a device in a secure manner
       echo $UUID > \
            /sys/devices/vfio_ap/matrix/mdev_supported_types/vfio_ap-passthrough/create
       # assign adapter, domain and control domain
       echo +0x3 > /sys/devices/vfio_ap/matrix/${UUID}/assign_adapter
       echo +0x7 > /sys/devices/vfio_ap/matrix/${UUID}/assign_domain
       echo +0x7 > /sys/devices/vfio_ap/matrix/${UUID}/assign_control_domain


    To make use of the AP device in a KVM guest, add the following element to the respective domain XML:

       <hostdev mode='subsystem' type='mdev' managed='no' model='vfio-ap'>
         <source>
           <address uuid='e926839d-a0b4-4f9c-95d0-c9b34190c4ba'/>
         </source>
       </hostdev>


    Once complete, use the passthrough device in a KVM guest just like a regular crypto adapter.
    Requires Linux kernel 4.20 and libvirt 4.9.
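Coming back to the huge pages item above: the number of pages N to reserve follows directly from how much guest memory you want backed by 1M pages. A minimal sketch of the arithmetic (the 4G guest size is just an example value):

```shell
#!/bin/sh
# Sketch: compute the number of 1M huge pages (N) needed to fully back
# a guest of a given size. GUEST_MEM_MIB is an example value.
GUEST_MEM_MIB=4096            # guest memory size in MiB
HPAGE_MIB=1                   # s390x KVM huge page size with kvm.hpage=1
N=$((GUEST_MEM_MIB / HPAGE_MIB))
echo "vm.nr_hugepages = $N"
```

In practice you may want to reserve a few extra pages on top of the exact guest size, since the pool is shared by all guests on the host.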

by Stefan Raspl (noreply@blogger.com) at December 17, 2018 04:31 PM

December 13, 2018

KVM on Z

SLES 12 SP4 released

SLES 12 SP4 is out! See the announcement and the release notes with Z-specific changes.
It ships the following code levels:
  • Linux kernel 4.12 (SP3: 4.4),
  • QEMU v2.11 (SP3: v2.9), and
  • libvirt v4.0 (SP3: v3.3).
See previous blog entries on QEMU v2.10 and v2.11 for details on new features that become available by the QEMU package update.
See previous blog entries on Linux kernel 4.8 and 4.11 for details on new features becoming available through the kernel update, e.g. nested virtualization support.
An additional feature in this release is the availability of STHYI information in LPAR environments. Requires qclib v1.3 or later. See this blog post for general information on qclib.
Furthermore, note that these changes provide a full CPU model, which provides protection against live guest migration compatibility troubles. E.g. migrating a guest exploiting the latest features to a KVM instance running on an earlier IBM Z machine lacking said features would be detected and prevented.
Note: With this feature, live guest migration back to a KVM instance that does not yet support CPU models (e.g. SLES 12 SP3) will not work anymore.

      by Stefan Raspl (noreply@blogger.com) at December 13, 2018 10:04 AM

      December 12, 2018

      QEMU project

      QEMU version 3.1.0 released

      We would like to announce the availability of the QEMU 3.1.0 release. This release contains 1900+ commits from 189 authors.

      You can grab the tarball from our download page. The full list of changes is available in the Wiki.

      Highlights include:

      • ARM: emulation support for microbit and Xilinx Versal machine models
      • ARM: support for ARMv6M architecture and Cortex-M0 CPU model
      • ARM: support for Cortex-A72 CPU model
      • ARM: virt/xlnx-zynqmp: virtualization extensions for GICv2 interrupt controller
      • ARM: emulation of AArch32 virtualization/hypervisor mode now supported for Cortex-A7 and Cortex-A15
      • MIPS: emulation support for nanoMIPS I7200
      • MIPS: emulation support for MXU SIMD instructions for MIPS32
      • PowerPC: pseries: enablement of nested virtualization via KVM-HV
      • PowerPC: prep: deprecated in favor of 40p machine model
      • PowerPC: 40p: IRQ routing fixes, switch from Open HackWare to OpenBIOS
      • PowerPC: g3beige/mac99: support for booting from virtio-blk-pci
      • s390: VFIO passthrough support for crypto devices (vfio-ap)
      • s390: KVM support for backing guests with huge pages
      • SPARC: sun4u: support for booting from virtio-blk-pci
      • x86: multi-threaded TCG support
      • x86: KVM support for Enlightened VMCS (improved perf for Hyper-V on KVM)
      • x86: KVM support for Hyper-V IPI enlightenments
      • Xtensa: support for input from chardev consoles
      • Support for AMD IOMMU interrupt remapping and guest virtual APIC mode
      • XTS cipher mode is now ~2x faster
      • stdvga and bochs-display devices can expose EDID information to the guest (for use with xres/yres resolution options)
      • qemu-img tool can now generate LUKS-encrypted files through the ‘convert’ command
      • and lots more…

      Thank you to everyone involved!

      December 12, 2018 06:50 AM

      December 04, 2018

      Cornelia Huck

      Notes from KVM Forum 2018

      KVM Forum 2018 took place October 24 - 26 in Edinburgh, Scotland. Better late than never, here are some of my notes and impressions. As always, there was a lot going on, and I could not attend everything that I would have found interesting. Fortunately, video recordings are available (see the page linked above, or the YouTube channel); here, I'd like to thank the folks organizing the logistics, recording the talks, and uploading nicely edited versions!

      This year, KVM Forum was again co-located with OSS Europe, and on the first day (which also featured the annual QEMU summit), talks were on a shared track. This meant an opportunity for people attending OSS to hear some KVM and virtualization related talks; unfortunately, it also meant that the room where the KVM Forum talks were held was very crowded. Nevertheless, it is always nice if a talk is interesting enough to attract a good number of people; I'm happy that my maintainership talk also attracted a nice audience. Other talks from the first day I enjoyed were Alex' talk about L1TF and Marc's talk about running huge libvirt installations.

      The second and third day featured some more comfortable rooms; organization-wise, I liked that talks about similar topics were grouped back-to-back.

      On these days, we had the keynotes for KVM, QEMU, and libvirt; as well as the contributor Q&A panel - some good questions from the audience there. Also check out Christian's talk about the various architectures supported by KVM and how much commonality is there (or not).

      Most of the time, days two and three were dual-track. Some of the topics covered were vfio and migration with vfio; nested virtualization; not-so-common architectures (including s390!); testing and continuous integration. I find it hard to point out specific sessions and recommend browsing through the posted videos instead.

      Some topics were delved into more deeply in BOF sessions; myself, I attended the vfio migration BOF which gave me a couple of things to think about. Many BOF sessions subsequently posted summaries on the relevant mailing lists.

      One of the most important features of any conference is, of course, the hallway track: meeting new people, seeing old acquaintances again, and impromptu discussions about a lot of different topics. I find that this is one of the most valuable experiences, both for putting a face to a name and for discussing things you did not even think about beforehand.

      So, for an even shorter summary of my short notes: KVM Forum 2018 was great, go watch some videos, and consider attending future KVM Forums :)

      by Cornelia Huck (noreply@blogger.com) at December 04, 2018 06:52 PM

      December 03, 2018

      KVM on Z

      SLES 12 SP3 Updates


      SLES 12 SP3, released late last year, received a couple of mostly performance and security-related updates in support of IBM z14 and LinuxONE through the maintenance web updates.
      In particular:

        by Stefan Raspl (noreply@blogger.com) at December 03, 2018 08:38 AM

        December 01, 2018

        Thomas Huth

        QEMU Advent Calendar 2018 opened the first door

        Starting today, on December 1st, the first door of the QEMU Advent Calendar 2018 can now be opened! The advent calendar reveals a new disk image for download on each of the first 24 days in December 2018, to create a fun experience for the QEMU community, to celebrate the 15th anniversary of QEMU, and to provide some good images for testing the various CPU targets of QEMU – this year it will contain way more images for non-x86 targets than before, so if you are interested in collecting test images for the various CPU targets of QEMU, be sure to check the calendar regularly!

        December 01, 2018 07:05 AM

        November 29, 2018

        Daniel Berrange

        Improved translation po file handling by ditching gettext autotools integration

        The libvirt library has long provided translations of its end user facing strings, which largely means error messages and console output from command line tools / daemons. Since libvirt uses autotools for its build system, it naturally used the standard automake integration provided by gettext for handling .po files. The libvirt.pot file with master strings is exported to Zanata, where the actual translation work is outsourced to the Fedora translation team who support up to ~100 languages. At time of writing libvirt has some level of translation in ~45 languages.

        With use of Zanata, libvirt must periodically create an updated libvirt.pot file and push it to Zanata, and then just before release it must pull the latest translated .po files back into GIT for release.

        There have been a number of problems with this approach which have been annoying us pretty much since the start, and earlier this year it finally became too much to bear any longer.

        • The per-language translation files stored in git contain source file name and line number annotations to indicate where each translatable string originates. Since the translation files are not re-generated on every single source file changes, the file locations annotations becomes increasingly out of date after every commit. When the translation files are updated 98% of the diff is simply changing source file locations leading to a very poor signal/noise ratio.
        • The strings in the per-language translation files are sorted according to source filename. Thus when code is moved between files, or when files are renamed, the strings in the updated translation files all get needlessly reordered, again leading to a poor signal/noise ratio in diffs.
        • Each language translation file contains every translatable string even those which do not have any translation yet. This makes sense if translators are working directly against the .po files, but in libvirt everything is done via the Zanata UI which already knows the list of untranslated strings.
        • The per-language translation files grow in size over time with previously used message strings appended to the end of the file, never discarded by the gettext tools. This again makes sense if translators are working directly against .po files, but Zanata already provides a full translation memory containing historically used strings.
        • Whenever ‘make dist’ is run the gettext autotools integration will regenerate the per-language translation files. As a result of the three previous points, every time a release is made there’s a giant commit more than 100MB in size that contains diffs for translated files which are entirely noise and no signal.

        One suggested approach to deal with this is to stop storing translations in GIT at all and simply export them from Zanata only at time of ‘make dist’. The concern with this approach is that the GIT repository no longer contains the full source for the project in a self-contained manner. ‘make dist‘ now needs a live network connection to the Zanata servers. If we were to replace Zanata with a new tool in the future (Zanata is already a replacement for the previously used Transifex), we would potentially lose access to translations for old releases.

        With this in mind we decided to optimize the way translations are managed in GIT.

        The first easy win was to simply remove the master libvirt.pot file from GIT entirely. This file is auto-generated from the source files and is out of date the moment any source file changes, so no one would ever want to use the stored copy.

        The second more complex step was to minimize and canonicalize the per-language translation files. msgmerge is used to take the full .po file, strip out the source file locations, and sort the strings alphabetically. A perl script is then used to further process the content, dropping any translations marked as “fuzzy” and dropping any strings for which there is no translated text available. The resulting output is still using the normal .po file format, but we call these ‘.mini.po‘ files to indicate that they are stripped down compared to what you’d normally expect to see.
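To illustrate the minimization idea (not libvirt's actual implementation, which uses msgmerge plus a perl script), here is a toy sketch that keeps only entries with a non-empty translation, assuming simple one-line msgid/msgstr pairs separated by blank lines:

```shell
#!/bin/sh
# Toy po-minimizer: drop entries whose msgstr is empty.
# Assumes one-line msgid/msgstr pairs separated by blank lines;
# real .po files (multi-line strings, fuzzy flags) need the full tools.
minimize_po() {
    # paragraph mode: each blank-line-separated entry is one record;
    # keep records that do not end in an empty msgstr
    awk -v RS='' -v ORS='\n\n' '!/msgstr ""$/' "$1"
}
```

The real pipeline additionally strips source locations and fuzzy entries, which this sketch ignores.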

        The final step was to remove the gettext / autotools integration and write a custom Makefile.am to handle the key tasks.

        • A target ‘update-mini-po‘ to automate the process of converting full .po files into .mini.po files. This is used when pulling down new translations from Zanata to be stored in git before release.
        • A target ‘update-po’ to automate the inverse process of converting .mini.po files back into full .po files. This is to be used by anyone who might need to look at full language translations outside of Zanata.
        • An install hook to generate the binary .gmo files from the .mini.po files and install them into /usr/share/locale for use at runtime. This avoids the need to ship the full .po files in release tarballs.
        • A target ‘zanata-push‘ to automate the process of re-generating the libvirt.pot file and uploading it to Zanata.
        • A target ‘zanata-pull‘ to automate the process of pulling new translations down from Zanata and then triggering ‘update-mini-po’

        After all this work was completed the key benefits are

        • The size of content stored in GIT was reduced from ~100MB to ~18MB.
        • Updates to the translations in GIT now produce small diffstats with a high signal/noise ratio
        • Files stored in GIT are never changed as a side effect of build system commands like ‘make dist’
        • The autotools integration is easier to understand

        while not having any visible change on the translators using Zanata. In the event anyone does need to see full translation languages outside of Zanata there is an extra step to generate the full .po files from the .mini.po files but this is countered by the fact that the result will be fully up to date with respect to translatable strings and source file locations.

        I’d encourage any project which is using gettext autotools integration, while also outsourcing to a system like Zanata, to consider whether they’d benefit from taking similar steps to libvirt. Not all projects will get the same degree of space saving but diffstats with good signal/noise ratios and removing side effects from ‘make dist’ are wins that are likely desirable for any project.

         

        by Daniel Berrange at November 29, 2018 12:22 PM

        November 28, 2018

        Stefan Hajnoczi

        Software Freedom Conservancy donations are being matched again!

        Donations to Software Freedom Conservancy, the charity that acts as the legal home for QEMU and many other popular open source projects that don't run their own foundations or charities, are being matched again this year. That means your donation is doubled thanks to a group of donors who have pledged to match donations.

        Software Freedom Conservancy helps projects with the details of running an open source project (legal advice, handling expenses, organizing conferences, etc) as well as taking a leading position on open source licensing and enforcement. Their work is not-for-profit and in the interest of the entire open source community.

        If you want more projects like QEMU, Git, Samba, Inkscape, and Selenium to succeed as healthy open source communities, then donating to Software Freedom Conservancy is a good way to help.

        Find out about becoming a Supporter here.

        by stefanha (noreply@blogger.com) at November 28, 2018 10:37 AM

        November 27, 2018

        Stefan Hajnoczi

        QEMU Advent Calendar 2018 is coming!

        QEMU Advent Calendar is running again this year. Each day from December 1st through 24th a surprise QEMU disk image will be released for your entertainment.

        Check out the website on December 1st for the first disk image:

        https://www.qemu-advent-calendar.org/2018/

        Thomas Huth is organizing QEMU Advent Calendar 2018 with the help of others from the QEMU community. If you want to contribute a disk image, take a look at the call for images email.

        by stefanha (noreply@blogger.com) at November 27, 2018 09:15 AM

        November 14, 2018

        Cornelia Huck

        s390x changes in QEMU 3.1

        QEMU is now in the -rc phase for 3.1, with a release expected in early/mid December, and, as usual, this is a good time to summarize the s390x changes for that release.

        CPU models

        • s390x now supports the 'max' cpu model as well (which somehow had been forgotten...) When using KVM, this behaves like the 'host' model; when using TCG, this is the 'qemu' model plus some additional, experimental features. Note that this is neither static nor migration-safe.

        Devices

        • Support for vfio-ap has been added. This allows passing crypto cards on the AP bus through to the guest. Support for this has been merged into the Linux kernel with 4.20. As this is a rather large feature, I plan to do a separate writeup for it.

        KVM

        • Support for enabling huge page backing has been added. This requires a host kernel of version 4.19 or higher. Note that this is only available for the s390-ccw-virtio-3.1 or later machines (due to compat handling), and that, as of this writing, it is incompatible with nested virtualization (which should change in the future).
        • Support for the etoken facility (spectre mitigation) has been added. This, as well, needs a host kernel of version 4.19 or higher.

        TCG

        • Support for instruction flags and AFP registers has been added.

        Miscellaneous

        • The deprecated 's390-squash-mcss' option has been removed.
        • And the usual fixes, cleanups and improvements.

        by Cornelia Huck (noreply@blogger.com) at November 14, 2018 06:20 PM

        Thomas Huth

        QEMU Advent Calendar 2018 website online

        This year, we are celebrating the 15th anniversary of QEMU (QEMU 0.1 was announced in March 2003), and to contribute to this celebration, we will have another edition of the QEMU Advent Calendar this year. The new website for the advent calendar is now online at www.qemu-advent-calendar.org – but please do not try to open any of the doors before December 1st. We are also still looking for some images which we can present this year. If you would like to help, please have a look at the “QEMU Advent Calendar 2018 - Help wanted” mail that I have sent to the QEMU mailing lists.

        November 14, 2018 10:15 AM

        November 08, 2018

        Gerd Hoffmann

        Fedora 29 images uploaded

        Fedora 29 was released last week, so here are the fresh Fedora 29 images for qemu.

        As usual the images don't have a root password. You have to set one using virt-customize -a <image> --root-password "password:<secret>", otherwise you can't login after boot.

        Some images use grub2 as bootloader, some use systemd-boot. The filename indicates which one is used. The x86_64 and i686 images can be booted with both uefi and bios firmware. The arm images come as grub2 variant only; systemd-boot doesn't support 32bit arm and crashes on 64bit arm.

        The images can also be booted as container, using systemd-nspawn --boot --image <file>, but you have to convert them to raw first as systemd-nspawn can't handle qcow2.

        The 32bit arm image (armhfp) isn't there because it doesn't boot for me. Seems the fedora grub2-efi.armhfp package has trouble booting the kernel in qemu (with edk2 firmware). To be investigated if I find some time. Note: The fedora 28 image uses a custom grub2-efi.armhfp package as fedora didn't ship grub2-efi.armhfp in version 28.

        The images have been created with imagefish.

        by Gerd Hoffmann at November 08, 2018 11:00 PM

        November 06, 2018

        KVM on Z

        Ubuntu 18.10 released

        Ubuntu Server 18.10 is out! Support for IBM Z is available here.
        It ships

        by Stefan Raspl (noreply@blogger.com) at November 06, 2018 12:21 AM

        November 04, 2018

        Stefan Hajnoczi

        Video and slides available for "Security in QEMU"

        I gave a talk about security in QEMU at KVM Forum 2018. It covers the architecture of QEMU and focusses on the attack surfaces that are exposed to guests. I hope it will be useful to anyone auditing or writing device emulation code. It also describes the key design principles for isolating the QEMU process and limiting the damage that can be done if a guest escapes.

        The video of the talk is now available:

        The slides are available here (PDF).

        by stefanha (noreply@blogger.com) at November 04, 2018 07:12 PM

        October 24, 2018

        Gerd Hoffmann

        VGA emulation in qemu - where do we want to go?

        Let's start with some history ...

        The original VGA

        It was introduced by IBM in 1987. It had a bunch of new features, and also included old ones which were already present in the predecessor devices CGA and EGA, including:

        • text modes (80x25, also 80x50 using a smaller font)
        • 16 color mode (640x480, 4 bits per pixel, one plane per bit)
        • 256 color mode (320x200, 8 bits per pixel)
        • various tweaks you can do, like enabling double scan or split screen.

        The VGA has 256k of video memory and it is accessed using a memory window at 0xa0000. It is not possible to access all video memory at the same time; you have to set bank registers to map the piece of memory you want to access into the window.

        All vga devices emulated by qemu support this.
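The bank switching described above is simple arithmetic. A small sketch, assuming a 64 KiB window (the actual bank granularity varies by card):

```shell
#!/bin/sh
# Sketch of VGA bank arithmetic: map a byte offset into the 256k video
# memory to a bank number plus an offset inside the 0xa0000 window.
# A 64 KiB window is assumed here; real cards differ in granularity.
VRAM_OFFSET=200000            # example byte offset into video memory
WINDOW=$((64 * 1024))         # size of the window at 0xa0000
BANK=$((VRAM_OFFSET / WINDOW))
REL=$((VRAM_OFFSET % WINDOW))
echo "bank=$BANK offset=$REL"
```

To read or write that byte, the guest programs BANK into the bank register and then accesses 0xa0000 + REL.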

        Super VGA

        In the early '90s, various enhanced VGA cards, typically named "Super VGA" (abbreviated SVGA), became available from various vendors. The cirrus vga emulated by qemu is a typical SVGA card which was quite popular back then. These cards add various new features:

        • more video memory, which in turn allows for:
        • higher resolutions for 256 color modes.
        • more colors (64k, 16 bit per pixel).
        • even more colors (16M, 24 or 32 bit per pixel).
        • linear framebuffer, so you can access all video memory at the same time, without having to bank-switch the video memory into the rather small window at 0xa0000.
        • 2D acceleration (cirrus blitter for example).

        SVGA in qemu

        All SVGA devices in qemu (except cirrus) have support for the bochs display interface. That interface was implemented by the bochs emulator first (this is where the name comes from). It was implemented in qemu too. For the qemu standard vga it is the primary interface. qxl-vga, virtio-vga and vmsvga support the bochs dispi interface when they are in vga compatibility mode, which is typically the case at boot, before the guest os loads the native display driver.

        The bochs display interface is a paravirtual interface, with just the bare essentials to set video modes on a virtual display device. There are no registers for clock rate and other timing stuff for example.

        Traditionally the bochs display interface uses I/O ports 0x1ce (bochs register index) and 0x1cf (bochs register data). As both registers are 16bit, the data register is unaligned, which does not work on non-x86 archs, so 0x1d0 is supported as data port too.

        Graphics usage by modern guests

        Lets have a look at what modern guests are doing in the graphics field:

        • Everything but 32 bit true color modes is pretty much unused. The only exception is 16 bit modes, which are still used sometimes in resource-constrained environments (raspberry pi for example).
        • 2D acceleration is dead. It's either software rendering into a dumb framebuffer, or using the 3D engine for 2D rendering.
        • text mode is used only with BIOS firmware, and even then only at boot (bootloader, vgacon until the kms driver loads). UEFI goes straight to graphics mode.
        • Banked video memory access is dead. Text mode still uses the 0xa0000 window, but the text buffer is small enough that there is no bank switching needed.

        So, we have a lot of rather complex code to emulate features not used at all by modern guests. There have been security bugs in the past in that complex but largely unused code ...

        So, can we simplify things?

        Turns out: yes, we can. The first step already happened in qemu 1.3, when the qemu stdvga got a MMIO bar. The MMIO bar can be used as an alternative way to access the vga registers and also the bochs dispi interface registers.

        OVMF (UEFI implementation for qemu) uses the MMIO bar. The bochs-drm.ko linux kms driver uses the MMIO bar too. In fact, both use the bochs display interface registers only, except for setting the unblank bit so the screen will not stay black.

        So, the guest code already ignores the vga emulation. Cool. We can build on that.

        Introducing -device bochs-display

        New display device, merged in qemu 3.0. Features:

        • No VGA compatibility. PCI class is display/other instead of display/vga.
        • It has a stdvga-style MMIO bar. The vga registers are not available, of course; otherwise the register interface is identical to the stdvga.
        • Implemented from scratch, no code sharing with vga. Code size is an order of magnitude smaller when compared to vga.
        • No I/O ports needed. You can plug it into a PCIe slot.
        • OVMF supports it.
        • bochs-drm.ko supports it too.

        So, all set for UEFI guests. You can switch from stdvga to bochs-display, and everything continues to work fine.

        But what about BIOS and text mode?

        Accessing the vga hardware directly for text mode is rare these days. Typically seabios and linux boot loaders call vgabios functions to render text on the display. So, we can hook in there and support text rendering without the hardware actually having text mode support. A very similar approach is taken by sgabios, to redirect vga text output to the serial line.

        Luckily we are not the first ones facing that problem. coreboot can initialize the graphics hardware and setup a framebuffer with the native display resolution. Having to switch back to text mode when running seabios as coreboot payload is not exactly nice. So, there is a vgabios variant for coreboot which renders text to a framebuffer.

        So, take that, tweak the initialization code to program the bochs dispi interface instead of looking for a framebuffer setup by coreboot, and we are ready to go. Seabios boot messages show up on the bochs-display framebuffer. Yay!

        This will work out-of-the-box in qemu 3.1. The vgabios is already present in qemu 3.0, but due to a bug it is not installed by default; it must be copied over manually to get things going.

        There are some drawbacks, which may or may not be a problem depending on your use case:

        • linux vgacon does not work due to direct vga hardware access. So you have to use vesafb or just live with not having early boot messages. Once bochs-drm.ko loads, fbcon will be functional.
        • The vgabios uses a fixed 1024x768 resolution and does not support switching modes after initialization. The reason is that the initialization code runs in big real mode, so accessing the mmio bar is easy then; that is not the case for vgabios function calls. Resolutions smaller than 1024x768 are allowed by the vgabios and will simply use the upper left corner of the display.
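To put the fixed 1024x768 mode in perspective, here is the framebuffer size it implies; the 32 bpp depth is an assumption for illustration (the post does not state the depth used):

```shell
#!/bin/sh
# Size of a 1024x768 framebuffer at 32 bpp (4 bytes per pixel).
# The 32 bpp depth is an assumption for illustration.
W=1024; H=768; BYTES_PER_PIXEL=4
FB_BYTES=$((W * H * BYTES_PER_PIXEL))
echo "$FB_BYTES bytes ($((FB_BYTES / 1024 / 1024)) MiB)"
```

That is well within the video memory of any of the emulated display devices, which is presumably why a single fixed mode is good enough for boot messages.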

        That's it. Enjoy the new legacy-free display device.

        by Gerd Hoffmann at October 24, 2018 10:00 PM

        October 04, 2018

        Cole Robinson

        Setting custom network names on Fedora

        systemd predictable network names give us host interface names like enp3s0. On one of my hosts, I have two interfaces: one that is my regular hard-wired connection, and another I only plug in occasionally for some virt network testing. I can never remember the systemd names, so I want to rename the interfaces to something more descriptive for my needs: in my case, lan0main and lan1pcie.

        The page referenced says to use systemd links. However, after struggling with that for a while, I'm fairly sure that's only relevant to systemd-networkd usage and doesn't apply to Fedora's default use of NetworkManager. So I needed another way.

        Long story short I ended up with some custom udev rules that are patterned after the old 70-persistent-net.rules file:

        $ cat /etc/udev/rules.d/99-cole-nic-names.rules 
        SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="70:8b:cd:80:e5:5f", ATTR{type}=="1", NAME="lan0main"
        SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="68:05:ca:1a:f5:da", ATTR{type}=="1", NAME="lan1pcie"
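To find the MAC addresses to put into the ATTR{address} match, you can read them straight out of sysfs; a small sketch (the interface set obviously varies per machine):

```shell
#!/bin/sh
# Print each network interface with its MAC address, as needed for the
# ATTR{address} match in udev rules like the ones above.
list_macs() {
    for dev in /sys/class/net/*; do
        [ -r "$dev/address" ] || continue
        printf '%s %s\n' "${dev##*/}" "$(cat "$dev/address")"
    done
}
list_macs
```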

        by Cole Robinson (noreply@blogger.com) at October 04, 2018 09:27 PM

        October 03, 2018

        KVM on Z

        RHEL 7.5 Beta supports KVM on Z

        The Red Hat Enterprise Linux 7.5 Beta ships with support for KVM on Z through the kernel-alt packages, which essentially provide Linux kernel 4.14.
        Here is the respective section from the release notes:
        KVM virtualization is now supported on IBM z Systems. However, this feature is only available in the newly introduced user space based on kernel version 4.14, provided by the kernel-alt packages.
        See here for further details.

        by Stefan Raspl (noreply@blogger.com) at October 03, 2018 09:14 AM

        October 01, 2018

        KVM on Z

        Knowledge Series: Black Box Guest Analysis Using kvm_stat


        Another new entry in our Knowledge Series details how to gain insights into black box KVM guests using kvm_stat.

        by Stefan Raspl (noreply@blogger.com) at October 01, 2018 12:12 PM

        Last updated: March 23, 2019 06:08 AM