Blogging about open source virtualization

News from QEMU, KVM, libvirt, libguestfs, virt-manager and related tools

June 16, 2019

Gerd Hoffmann

macos guest support for qemu

display support

After one of the minor updates (10.14.3 or 10.14.4, I don't remember which) my macOS Mojave guest started to switch video modes at boot. Also the "Display" panel in "System Preferences" started to offer three video modes to choose from. Unfortunately FullHD (aka 1920x1080) is not on the list.

Decided to look into this. Assuming that macOS had learned to switch video modes using the EFI GOP interface, I tweaked the mode list in OVMF (QemuVideoDxe driver). No effect. Huh?

Next I looked at the list of loaded drivers, using kextstat, and found an AppleVirtualGraphics.kext entry. Interesting. Checking out Info.plist (in /System/Library/Extensions/AppleVirtualGraphics.kext/Contents) I found this:

    [ ... ]
    <key>IOKitPersonalities</key>
    <dict>
        <key>AppleBochVGAFB</key>
        <dict>
            [ ... ]
            <key>IOPCIPrimaryMatch</key>
            <string>0x11111234&amp;0xFFFFFFFF</string>
            [ ... ]
        </dict>
        <key>AppleCirrusGD5446FB</key>
        <dict>
            [ ... ]
            <key>IOPCIPrimaryMatch</key>
            <string>0x00001013&amp;0x0000FFFF</string>
            [ ... ]
        </dict>
    </dict>
    [ ... ]

So recent macOS Mojave ships with a driver for the qemu stdvga and the qemu cirrus vga. Nice. Unfortunately the question of how to switch the display into 1920x1080 mode (to match the host's display resolution) isn't solved yet.
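
Whether one of these drivers actually matched and loaded can be checked from the terminal (a quick sketch; the grep pattern is just illustrative):

    $ kextstat | grep AppleVirtual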

virtio support

While looking around I noticed there is an AppleVirtIO.kext too, with this in Info.plist:

    [ ... ]
    <key>IOKitPersonalities</key>
    <dict>
        <key>AppleVirtIO9P</key>
        [ ... ]
        <key>AppleVirtIO9PVFS</key>
        [ ... ]
        <key>AppleVirtIOBlock</key>
        [ ... ]
        <key>AppleVirtIOConsole</key>
        [ ... ]
        <key>AppleVirtIOPCITransport</key>
        [ ... ]
    </dict>

Apparently a virtio driver with support for virtio-console/serial, virtio-blk and virtio-9p.

Tried to switch the system disk from sata to virtio-blk. Clover seems to be slightly confused: it stops showing the nice disk icons. But booting macOS works fine regardless, at least when using the transitional device, so the driver apparently supports legacy mode only.
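
In libvirt xml the switch could look like this (a sketch; the image path and format are placeholders, and whether the transitional or the modern virtio-blk variant is used depends on machine type and slot):

  <disk type='file' device='disk'>
    <driver name='qemu' type='qcow2'/>
    <source file='/path/to/macos.qcow2'/>
    <target dev='vda' bus='virtio'/>
  </disk>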

virtio-9p for filesystem sharing looks pretty interesting too. So, let's try that (in libvirt xml):

  [ ... ]
  <devices>
    [ ... ]
    <filesystem type='mount' accessmode='mapped'>
      <source dir='/path/to/some/host/directory'/>
      <target dir='/test9p'/>
      <readonly/>
    </filesystem>
    [ ... ]
  </devices>
  [ ... ]

macOS does not seem to mount the filesystem automatically, but it is easy to do using the terminal. You need to create the target directory first (sudo mkdir /test9p). Then run sudo mount_9p. Done.
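
Spelled out as terminal commands (following the steps above):

    $ sudo mkdir /test9p
    $ sudo mount_9p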

by Gerd Hoffmann at June 16, 2019 10:00 PM

June 05, 2019

Gerd Hoffmann

recent qemu sound improvements

The qemu sound system got a bunch of improvements in 2018 and 2019.

New in qemu 3.0

The hda emulation uses a high resolution timer now to better emulate the timing-sensitive dma transfer of sound samples. Credits for this implementation go to Martin Schrodt.

Unfortunately this is incompatible with older qemu versions, so it is only enabled for machine types of version 3.0 or newer. Upgrading qemu alone is therefore not enough; you also have to make sure you are using a new enough machine type (qemu -M command line switch).
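
On a plain qemu command line that could look as follows (a sketch; intel-hda/hda-duplex are just one possible sound device setup):

    $ qemu-system-x86_64 -M pc-q35-3.0 -device intel-hda -device hda-duplex ...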

libvirt stores the machine type in the domain xml when the guest is created. It is never updated automatically. So have a look at your domain configuration (using virsh edit domain-name for example) and check the version is 3.0 or newer:

[ ... ]
  <os>
    <type arch='x86_64' machine='pc-q35-3.0'>hvm</type>
                                        ^^^
[ ... ]

New in qemu 3.1

The pulseaudio backend got fixes in 3.1, so if you are using pulse you should upgrade to at least qemu version 3.1.

New in the upcoming qemu 4.0

Yet another pulseaudio bugfix.

Initial support for the -audiodev command line switch was finally merged, so audio support is no longer the odd one out that is configured in a completely different way, using environment variables instead of command line switches. Credits for this go to Kővágó, Zoltán.
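
A first try could look like this (a sketch, assuming the pulseaudio backend; before -audiodev the backend would have been selected via the QEMU_AUDIO_DRV=pa environment variable):

    $ qemu-system-x86_64 -audiodev pa,id=snd0 -device intel-hda -device hda-duplex ...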

In the pipeline

There are more -audiodev improvements in the pipeline; they are expected to land upstream in the 4.1 or 4.2 devel cycle.

Latency tuning

While we are at it, one final note:

Bugs in qemu sound device emulation and audio backends are not the only possible root cause for bad sound quality. Crackling sound -- typically the result of buffer underruns -- can also come from latency problems elsewhere in qemu.

One known offender is disk I/O, specifically the linux aio support, which isn't as asynchronous as it should be and blocks now and then. linux aio is selected with io=native for block device backends.

The better choice is io=threads. In libvirt xml:

[ ... ]
  <devices>
    <disk type='...' device='disk'>
      <driver name='qemu' type='...' cache='none' io='threads'/>
                                                  ^^^^^^^^^^^^
[ ... ]

Another known issue is spice audio compression, so better turn that off when using spice:

[ ... ]
    <graphics type='spice'>
      [ ... ]
      <playback compression='off'/>
    </graphics>
[ ... ]

by Gerd Hoffmann at June 05, 2019 10:00 PM

May 22, 2019

QEMU project

QEMU 4.0 adds micro:bit emulation support

micro:bit emulation support is available from QEMU 4.0 onwards and can be used for low-level software testing and development. Unlike existing micro:bit simulators, QEMU performs full-system emulation and actually runs the same ARM code as the real hardware. This blog post explains what full-system emulation means and why QEMU is now a useful tool for developing micro:bit software.

The micro:bit is a tiny ARM board designed for teaching. It is increasingly being used around the world to expose children to computers, programming, and electronics in a low-cost way with an active online community that shares project ideas, lesson plans, and programming tips.

micro:bit board

Simulators and emulators

Simulators are used for many tasks from mobile app development to performance analysis of computer hardware. It is possible to develop code using a simulator without having access to real hardware. Oftentimes using a simulator is more convenient than flashing and debugging programs on real hardware.

Emulators allow programs written for one computer system to run on a different computer system. They use techniques like machine code interpreters and just-in-time compilers to execute guest programs that do not run natively on the host computer. Each CPU instruction must be correctly implemented by the emulator so it can run guest software.

How existing micro:bit simulators work

Simulators can be implemented at various layers in the software stack. The MakeCode editor for JavaScript development includes a micro:bit simulator:

MakeCode editor

This simulator does not execute any ARM code and is therefore not running the same CPU instructions as a real micro:bit. Instead it reuses the JavaScript engine already available in your web browser to execute micro:bit JavaScript programs. This is achieved by providing the micro:bit JavaScript APIs that micro:bit programs expect. The programs don’t need to know whether those APIs are implemented by the real micro:bit software stack or whether they are actually calling into the MakeCode simulator.

In the screenshot above the micro:bit program calls showString("Hello world!") and this becomes a call into the MakeCode simulator code to render images of LEDs in the web browser. On real hardware the code path is different and eventually leads to an LED matrix driver that lights up the LEDs by driving output pins on the micro:bit board.

Full-system emulation

Unlike the MakeCode simulator, QEMU emulates the micro:bit CPU and boots from the same ARM code as the real micro:bit board. The simulation happens at the CPU instruction and hardware interface level instead of at the JavaScript API level. This is called full-system emulation because the entire guest software environment is present.

What are the advantages of full-system emulation?

  • Programs written in any language can run (MicroPython, mbed C/C++, etc)
  • Boot, device driver, and language run-time code can be tested
  • Bugs in lower layers of the software stack can be reproduced
  • CPU architecture-specific bugs can be reproduced (stack and memory corruption bugs)
  • A debugger can be connected to inspect the entire software stack

The main disadvantage of full-system emulation is that the performance overhead is higher since simulation happens at the CPU instruction level. Programs consist of many CPU instructions so the task of emulation is performance-sensitive. Luckily the micro:bit’s CPU is much less powerful than CPUs available in our laptops and desktops, so programs execute at a reasonable speed.

Running micro:bit programs on QEMU

QEMU emulates the core devices on the micro:bit, including the serial port (UART) and timers. This is enough for developing and testing low-level software but does not offer the LEDs, radio, and other devices that most micro:bit programs rely on. These devices might be emulated by QEMU in the future, but for now the main use of QEMU is for developing and testing low-level micro:bit code.

To run test.hex:

$ qemu-system-arm -M microbit -device loader,file=test.hex -serial stdio

Any output written to the serial port is printed to the terminal by QEMU.

Debugging micro:bit programs with QEMU and GDB

QEMU has GDB guest debugging support. This means GDB can connect to QEMU in order to debug the guest software. This is similar to debugging a real system over JTAG, except no hardware is necessary!

Connect with GDB to debug the guest:

$ qemu-system-arm -M microbit -device loader,file=test.hex -s
$ gdb
(gdb) target remote tcp:127.0.0.1:1234
(gdb) x/10i $pc
=> 0x161c4:	ldr	r3, [r4, #0]
   0x161c6:	cmp	r3, #0
   0x161c8:	beq.n	0x161d2
   0x161ca:	ldr	r3, [pc, #48]	; (0x161fc)
   0x161cc:	ldr	r3, [r3, #0]
   0x161ce:	cmp	r3, #0
   0x161d0:	bne.n	0x161d8
   0x161d2:	movs	r0, #6
   0x161d4:	bl	0x16160
   0x161d8:	ldr	r0, [r4, #0]

Having a debugger is very powerful. QEMU can also load ELF files in addition to the popular .hex files used for micro:bit programs. ELF files can contain debugging information that enables source-level debugging so GDB can display function and variable names as well as listing the source code instead of showing assembly instructions.
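
Such a session could be set up along these lines (a sketch; test.elf stands for a hypothetical debug build of the program, which is assumed to define a main symbol, and -S makes QEMU wait until the debugger resumes the guest):

$ qemu-system-arm -M microbit -device loader,file=test.elf -s -S
$ gdb test.elf
(gdb) target remote tcp:127.0.0.1:1234
(gdb) break main
(gdb) continue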

Conclusion

QEMU now offers a platform for developing and testing micro:bit programs. It is open to future extension, hopefully to emulate more devices and offer a graphical user interface.

micro:bit emulation was contributed by Julia Suvorova and Steffen Görtz as part of their Outreachy and Google Summer of Code internships with QEMU. Jim Mussared, Joel Stanley, and Stefan Hajnoczi acted as mentors and contributed patches as well.

May 22, 2019 10:45 AM

May 17, 2019

KVM on Z

QEMU v3.1 released

QEMU v3.1 is out. Besides a number of small enhancements, here are some items that we would like to highlight from a KVM on Z perspective:
  • Huge Pages Support: KVM guests can now utilize 1MB pages. As this removes one layer of address translation for the guest backing, fewer page faults need to be processed, and fewer translation lookaside buffer (TLB) entries are needed to hold translations. This, as well as the TLB improvements in z14, will improve KVM guest performance.
    To use:
    Create a config file /etc/modprobe.d/kvmhpage.conf with the following content to enable huge pages for KVM:

       options kvm hpage=1


    Furthermore, add the following line to /etc/sysctl.conf to reserve N huge pages:

       vm.nr_hugepages = N

    Alternatively, append the following statement to the kernel parameter line in case support is compiled into the kernel: kvm.hpage=1 hugepages=N.
    Note that there are means to add huge pages dynamically after boot, but due to effects like memory fragmentation, it is preferable to define huge pages as early as possible.
    If successful, the file /proc/sys/vm/nr_hugepages should show N huge pages. See here for further documentation.
    Then, to enable huge pages for a guest, add the following element to the respective domain XML:

       <memoryBacking>
         <hugepages/>
       </memoryBacking>


    The use of huge pages in the host is orthogonal to the use of huge pages in the guest. Both will improve performance independently by reducing the number of page faults and the number of page table walks after a TLB miss.
    The biggest performance improvement can be achieved by using huge pages in both host and guest, e.g. with libhugetlbfs, as this will also make use of the larger 1M TLB entries in the hardware.
    Requires Linux kernel 4.19.
  • vfio-ap: The Adjunct Processor (AP) facility is an IBM Z cryptographic facility comprised of three AP instructions and up to 256 cryptographic adapter cards. Each adapter card is partitioned into up to 85 domains, each of which provides cryptographic services. An AP queue is the means by which AP messages are sent to and received from an AP adapter. Each AP queue is connected to a particular domain within a particular adapter. vfio-ap enables assignment of a subset of AP adapters and domains to one or more guests such that each guest has exclusive access to a discrete set of AP queues.
    Here is a small sample script illustrating host setup:

       # load vfio-ap device driver
       modprobe vfio-ap

       # reserve domain 7 for use by KVM guests
       echo -0x7 > /sys/bus/ap/aqmask
       # to reserve all domains of an adapter, use the following
       # line instead (by uncommenting it), and replace NN with the
       # adapter number:
       # echo -0xNN > /sys/bus/ap/apmask

       # create a mediated device (mdev) to provide userspace access
       # to a device in a secure manner
       UUID=e926839d-a0b4-4f9c-95d0-c9b34190c4ba
       echo $UUID > /sys/devices/vfio_ap/matrix/mdev_supported_types/vfio_ap-passthrough/create

       # assign adapter, domain and control domain
       echo 0x3 > /sys/devices/vfio_ap/matrix/${UUID}/assign_adapter
       echo 0x7 > /sys/devices/vfio_ap/matrix/${UUID}/assign_domain
       echo 0x7 > /sys/devices/vfio_ap/matrix/${UUID}/assign_control_domain


    To make use of the AP device in a KVM guest, add the following element to the respective domain XML:

       <hostdev mode='subsystem' type='mdev' managed='no' model='vfio-ap'>
         <source>
           <address uuid='e926839d-a0b4-4f9c-95d0-c9b34190c4ba'/>
         </source>
       </hostdev>


    Once complete, use the passthrough device in a KVM guest just like a regular crypto adapter.
    Requires Linux kernel 4.20 and libvirt 4.9, and is also available in RHEL 8, Ubuntu 18.04 and SLES 15 SP1.

by Stefan Raspl (noreply@blogger.com) at May 17, 2019 11:33 PM

May 07, 2019

KVM on Z

RHEL 8 released

Red Hat Enterprise Linux 8 is out! See the announcement and their release notes with Z-specific changes.
It ships the following code levels:
  • Linux kernel 4.18,
  • QEMU v2.12, and
  • libvirt v4.5.

by Stefan Raspl (noreply@blogger.com) at May 07, 2019 04:52 PM

April 30, 2019

KVM on Z

QEMU v4.0 released

QEMU v4.0 is out. Besides a number of small enhancements, here are some items that we would like to highlight from a KVM on Z perspective:
  • CPU models for z14 GA2 as follows:
       $ qemu-system-s390x -cpu help -enable-kvm | grep z14.2
       s390 z14.2-base      IBM z14 GA2           (static, migration-safe)
       s390 z14.2           IBM z14 GA2           (migration-safe)
  • vfio-ap: Hotplugging of vfio-ap devices is now supported.
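
For instance, a prepared vfio-ap mediated device could then be hotplugged via the QEMU monitor along these lines (a sketch; the mdev UUID is a placeholder that must match an existing mediated device on the host):

       (qemu) device_add vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/e926839d-a0b4-4f9c-95d0-c9b34190c4ba,id=apdev0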

by Stefan Raspl (noreply@blogger.com) at April 30, 2019 08:17 AM

April 24, 2019

QEMU project

QEMU version 4.0.0 released

We would like to announce the availability of the QEMU 4.0.0 release. This release contains 3100+ commits from 220 authors.

You can grab the tarball from our download page. The full list of changes is available in the Wiki.

Highlights include:

  • ARM: ARMv8+ extensions for SB, PredInv, HPD, LOR, FHM, AA32HPD, PAuth, JSConv, CondM, FRINT, and BTI
  • ARM: new emulation support for “Musca” and “MPS2” development boards
  • ARM: virt: support for >255GB of RAM and u-boot “noload” image types
  • ARM: improved emulation of ARM PMU
  • HPPA: support for TLB protection IDs and TLB trace events
  • MIPS: support for multi-threaded TCG emulation
  • MIPS: emulation support for I7200 and I6500 CPUs, QMP-based querying of CPU types, and improved support for SAARI and SAAR configuration registers
  • MIPS: improvements to Interthread Communication Unit, Fulong 2E machine types, and end-user documentation.
  • PowerPC: pseries/powernv: support for POWER9 large decrementer
  • PowerPC: pseries: emulation support for XIVE interrupt controller
  • PowerPC: pseries: support for hotplugging PCI host bridges (PHBs)
  • PowerPC: pseries: Spectre/Meltdown mitigations enabled by default, additional support for count-cache-flush mitigation
  • RISC-V: virt: support for PCI and USB
  • RISC-V: support for TSR, TW, and TVM fields of mstatus, FS field now supports three states (dirty, clean, and off)
  • RISC-V: built-in gdbserver supports register lists via XML files
  • s390: support for z14 GA 2 CPU model, Multiple-epoch and PTFF features now enabled in z14 CPU model by default
  • s390: vfio-ap: now supports hot plug/unplug, and no longer inhibits memory ballooning
  • s390: emulation support for floating-point extension facility and vector support instructions
  • x86: HAX accelerator now supports POSIX hosts other than Darwin, including Linux and NetBSD
  • x86: Q35: advertised PCIe root port speeds will now optimally default to maximum link speed (16GT/s) and width (x32) provided by PCIe 4.0 for QEMU 4.0+ machine types; older machine types will retain 2.5GT/s and x1 defaults for compatibility.
  • x86: Xen PVH images can now be booted with “-kernel” option
  • Xtensa: xtfpga: improved SMP support for linux (interrupt distributor, IPI, and runstall) and new SMP-capable test_mmuhifi_c3 core configuration
  • Xtensa: support for Flexible length instructions extension (FLIX)
  • GUI: new ‘-display spice-app’ to configure/launch a Spice client GUI with a similar UI to QEMU GTK. VNC server now supports access controls via tls-authz/sasl-authz options
  • QMP: support for “out-of-band” command execution, can be useful for postcopy migration recovery. Additional QMP commands for working with block devices and dirty bitmaps
  • VFIO: EDID interface for supported mdev (Intel vGPU for kernel 5.0+), allows resolution setting via xres/yres options.
  • Xen: new ‘xen-disk’ device which can create a Xen PV disk backend, and performance improvements for Xen PV disk backend.
  • Network Block Device: improved tracing and error diagnostics, improved client compatibility with buggy NBD server implementations, new --bitmap, --list, --tls-authz options for qemu-nbd
  • virtio-blk now supports DISCARD and WRITE_ZEROES
  • pvrdma device now supports RDMA Management Datagram services (MAD)
  • and lots more…

Thank you to everyone involved!

April 24, 2019 05:45 AM

April 18, 2019

Stefan Hajnoczi

What's new in VIRTIO 1.1?

The VIRTIO 1.1 specification has been published! This article covers the major new features in this specification.

New Devices

The following new devices are defined:

  • virtio-input is a Linux evdev input device (mouse, keyboard, joystick)
  • virtio-gpu is a 2D graphics device (with 3D support planned)
  • virtio-vsock is a host<->guest socket communications device
  • virtio-crypto is a cryptographic accelerator device

New Device Features

virtio-net

virtio-blk

virtio-balloon

New Core Features

There is a new virtqueue memory layout called packed virtqueues. The old layout is called split virtqueues because the avail and used rings are separate from the descriptor table. The new packed virtqueue layout uses just a single descriptor table as the single ring. The layout is optimized for a friendlier CPU cache footprint and there are several features that devices can exploit for better performance.

The VIRTIO_F_NOTIFICATION_DATA feature is an optimization mainly for hardware implementations of VIRTIO. The driver writes extra information as part of the Available Buffer Notification. Thanks to the information included in the notification, the device does not need to fetch this information from memory anymore. This is useful for PCI hardware implementations where minimizing DMA operations improves performance significantly.

by Unknown (noreply@blogger.com) at April 18, 2019 12:56 PM

April 09, 2019

Cole Robinson

Host 'Network Interfaces' panel removed from virt-manager

I released virt-manager 2.0.0 in October 2018. Since the release contained the full port to python3, it seemed like a good opportunity to drop some baggage from the app.

The biggest piece we removed was the UI for managing host network interfaces. This is the Connection Details->Network Interfaces panel, and the 'New Interface' wizard for defining host network definitions for things like bridges, bonds, and vlan devices. The main screen of the old UI looked like this:


Behind the scenes, this UI was using libvirt's Interface APIs, which also power the 'virsh iface-*' commands. These APIs are little more than a wrapper around the netcf library.

netcf aimed to be a linux distro independent API for network device configuration. On Red Hat distros this meant turning the API's XML format into an /etc/sysconfig/network script. There were even pie-in-the-sky ideas about NetworkManager one day using netcf.

In practice though the library never really took off. It was years before a debian backend showed up, contributed by a Red Hatter in the hope of increasing library uptake, though it didn't seem to help. netcf basically only existed to serve the libvirt Interface APIs, yet those APIs were never really used by any major libvirt consuming app, besides virt-manager. And in virt-manager's case it was largely just slapping some UI over the XML format and lifecycle operations.

For virt-manager's usecases we hoped that netcf would make it trivial to bridge the host's network interface, which when used with VMs would give them first class IP addresses on the host network, not NAT like the 'default' virtual network. Unfortunately, though the UI would create the ifcfg files well enough, behind the scenes nothing played well with NetworkManager for years and years. The standard suggestion was to disable NetworkManager if you wanted to bridge your host NIC. Not very user friendly. Some people did manage to use the UI to that effect but it was never a trivial process.


Nowadays NetworkManager can handle bridging natively and is much more powerful than what virt-manager/libvirt/netcf provide. The virt-manager UI was more likely to shoot you in the foot than make things simple. And it had become increasingly clear that virt-manager was not the place to maintain host network config UI.

So we made the decision to drop all this from virt-manager in 2.0.0. netcf and the libvirt interface APIs still exist. If you're interested in some more history on the interface API/netcf difficulties, check out Laine's email to virt-tools-list.

by Cole Robinson (noreply@blogger.com) at April 09, 2019 02:01 PM

April 02, 2019

Gerd Hoffmann

drminfo 6 released

drminfo is a small collection of tools for drm and fbdev devices. They print device information and can run some basic tests.

New in version 6 are a number of avocado test cases for qemu display devices (stdvga, cirrus, qxl and virtio).

drminfo has a homepage and a git repository.
My copr repo has Fedora and EPEL rpm packages.

by Gerd Hoffmann at April 02, 2019 10:00 PM

March 12, 2019

Cornelia Huck

s390x changes in QEMU 4.0

QEMU is now entering softfreeze for the 4.0 release (expected in April), so here is the usual summary of s390x changes in that release.

CPU Models

  • A cpu model for the z14 GA 2 has been added. Currently, it does not include any new features.
  • The cpu model for z14 now does, however, include the multiple epoch and PTFF enhancement features by default.
  • The 'qemu' cpu model now includes the zPCI feature by default. No more prerequisites are needed for pci support (see below).

Devices


  • QEMU for s390x is now always built with pci support. If we want to provide backwards compatibility, we cannot simply disable pci (we need the s390 pci host bus); it is easier to just make pci mandatory. Note that disabling pci was never supported by the normal build system anyway.
  • zPCI devices have gained support for instruction counters (on a Linux guest, these are exposed through /sys/kernel/debug/pci/<function>/statistics).
  • zPCI devices always lacked support for migrating their s390-specific state (not implemented...); if you tried to migrate a guest with a virtio-pci device on s390x, odd things might happen. To avoid surprises, the 'zpci' devices are now explicitly marked as unmigratable. (Support for migration will likely be added in the future.)
  • Hot(un)plug of the vfio-ap matrix device is now supported.
  • Adding a vfio-ap matrix device no longer inhibits usage of a memory ballooner: Memory usage by vfio-ap does not clash with the concept of a memory balloon.

TCG

  • Support for the floating-point extension facility has been added.
  • The first part of support for z13 vector instructions has been added (vector support instructions). Expect support for the remaining vector instructions in the next release; it should support enough of the instructions introduced with z13 to be able to run a distribution built for that cpu. 

by Cornelia Huck (noreply@blogger.com) at March 12, 2019 06:20 PM

March 11, 2019

KVM on Z

libvirt v4.10 released, providing PCI passthrough support

libvirt v4.10, available for download at the libvirt project website, adds support for PCI passthrough devices on IBM Z (requires Linux kernel 4.14 and QEMU v2.11).
To setup passthrough for a PCI device, follow these steps:
  1. Make sure the vfio-pci module is available, e.g. using the modinfo command:
       $ modinfo vfio-pci
       filename:       /lib/modules/4.18.0/kernel/drivers/vfio/pci/vfio-pci.ko
       description:    VFIO PCI - User Level meta-driver
  2. Verify that the pciutils package, providing the lspci command et al, is available using your distro's package manager
  3. Determine the PCI device's address using the lspci command:
       $ lspci
       0002:06:00.0 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
  4. Add the following element to the guest domain XML's devices section:
       <hostdev mode='subsystem' type='pci' managed='yes'>
         <source>
           <address domain='0x0002' bus='0x06' slot='0x00' function='0x0'/>
         </source>
       </hostdev>

    Note that if attribute managed is set to no (which is the default), it becomes the user's duty to unbind the PCI device from the respective device driver and rebind it to vfio-pci in the host prior to starting the guest.
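
    Such a manual rebind could be done via sysfs along these lines (a sketch using the device address from the example above; the driver_override mechanism requires Linux 3.16 or later):

       # unbind from the current host driver, then bind to vfio-pci
       echo 0002:06:00.0 > /sys/bus/pci/devices/0002:06:00.0/driver/unbind
       echo vfio-pci > /sys/bus/pci/devices/0002:06:00.0/driver_override
       echo 0002:06:00.0 > /sys/bus/pci/drivers_probe
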
Once done and the guest is started, running the lspci command in the guest should show the PCI device, and one can proceed to configure it as needed.
It is well worth checking out the expanded domain XML:
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0002' bus='0x06' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0002' bus='0x00' slot='0x01' function='0x0'>
        <zpci uid='0x0001' fid='0x00000000'/>
      </address>
    </hostdev>

Theoretically, the PCI address in the guest can change between boots. However, the <zpci> element guarantees address persistence inside the guest. The actual address of the passthrough device is based solely on the uid attribute: The uid becomes the PCI domain, and all remaining parts of the address (PCI bus, slot and function) are set to zero. Therefore, in this example, the PCI address in the guest would be 0001:00:00.0.
Take note of the fid attribute, whose value is required to hotplug/hotunplug PCI devices within a guest.
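
For instance, hotplugging could then be done with virsh, using a file that holds the hostdev element shown above (a sketch; guest1 and pci-hostdev.xml are hypothetical names):

    $ virsh attach-device guest1 pci-hostdev.xml --live
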
Furthermore, note that the target PCI address is not visible anywhere except within the QEMU process. It is not related to the PCI address as observed within the KVM guest and could be set to an arbitrary value. However, choosing the "wrong" values might have undesired subtle side effects with QEMU. Therefore, we strongly recommend not to specify a target address and to rely on auto-assignment instead. If the guest's PCI address has to be chosen, restrict the target address element at most to uid (for the PCI address definition) and fid (so that e.g. scripts in the guest for hotplugging PCI devices can rely on a specific value) as follows:
   <address type='pci'>
     <zpci uid='0x0001' fid='0x00000000'/>
   </address>


For further (rather technical) details see here and here (git commit).

by Stefan Raspl (noreply@blogger.com) at March 11, 2019 03:11 PM

March 02, 2019

Gerd Hoffmann

EDID support for qemu

Over the last months I've worked on adding EDID support to qemu. This allows passing all kinds of information about the (virtual) display to the guest: preferred video mode, display resolution, monitor name, monitor serial number and more. The current focus is getting the infrastructure in place. Once we have this we can build new features on top; HiDPI support comes to mind, for example.

New in qemu 3.1

In qemu 3.1 the EDID generator code and support for the qemu stdvga were added. Right now EDID support is turned off by default; use edid=on to enable it. With EDID enabled you can also use the xres and yres properties to set the preferred video mode. Here is an example: qemu -device VGA,edid=on,xres=1280,yres=800

The qemu-edid utility has been added too. Its main purpose is to allow testing the generator code without having to boot a guest, so typically the qemu-edid output is piped into the edid-decode utility to verify the generator works correctly. If you need an EDID blob for other reasons you might find it useful too.
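
A quick sanity check could look like this (a sketch, assuming the default generator settings):

    $ qemu-edid | edid-decode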

New in linux kernel 5.0

Infrastructure work: Some interface updates have been merged:

  • EDID support has been added to the virtio-gpu protocol.
  • The vfio mdev interface for vgpus got EDID support too.

We also got EDID support in drm drivers for both qemu stdvga (bochs-drm.ko) and virtio-gpu.

If both the guest (linux kernel) and the host (qemu) are new enough, the drm drivers will use the information from the EDID blob to create the video mode list. It is also available in sysfs; you can use edid-decode to get a human-readable version: edid-decode /sys/class/drm/card0-Virtual-1/edid

Planned for qemu 4.0

Support for EDID in virtio-gpu will be added; it is already merged in the master branch. It is also turned off by default, use edid=on to enable it (similar to stdvga).

Support for EDID in vgpus (i.e. vfio-mdev) is planned too, patches are out for review right now. Note that the kernel's mdev driver must support that too.

The qemu macos driver for the stdvga has been updated to support EDID.

Future plans

Intel is working on adding EDID support to gvt (the intel graphics mdev driver). Should land in the 5.1 or 5.2 linux kernel merge window.

Once the EDID support has seen some real world testing it will be enabled by default for both stdvga and virtio-gpu. Unless something unexpected happens that will probably be in qemu 4.1.

As already mentioned above, looking at HiDPI support (starting with the gtk UI probably) is something I plan to do when I find some time.

by Gerd Hoffmann at March 02, 2019 11:00 PM

February 28, 2019

Stefan Hajnoczi

QEMU accepted into Google Summer of Code and Outreachy 2019

QEMU is participating in the Google Summer of Code and Outreachy open source internship programs again this year. These 12-week, full-time, paid, remote work internships allow people interested in contributing to QEMU to get started. Each intern works with one or more mentors, experienced developers who can answer questions. This is a great way to try out working on open source if you are considering it as a career.

For more information (including eligibility requirements), see our GSoC and our Outreachy pages.

by Unknown (noreply@blogger.com) at February 28, 2019 05:17 PM

February 27, 2019

Gerd Hoffmann

ramfb display in qemu

ramfb is a very simple framebuffer display device. It is intended to be configured by the firmware and used as boot framebuffer, until the guest OS loads a real GPU driver.

The framebuffer memory is allocated from guest RAM and initialized using the firmware config interface (fw_cfg). edk2 (uefi firmware) has ramfb support. There also is a vgabios, which emulates vga text mode and renders it to the framebuffer.

The most interesting use case for this is boot display support for vgpus. vfio has a non-hot-pluggable variant which allows enabling ramfb support: qemu -device vfio-pci-nohotplug,ramfb=on,... Once the guest OS has initialized the vgpu, qemu will show the vgpu display. Until then the ramfb framebuffer is used; the firmware messages, boot loader menu and efifb/vesafb output will all show via ramfb.

There also is a standalone device, mostly intended for testing: qemu -vga none -device ramfb.

Even though it is possible to use ramfb as the primary display, it isn't a good idea to actually do that, as it isn't very efficient.

by Gerd Hoffmann at February 27, 2019 11:00 PM

February 26, 2019

QEMU project

Announcing GSoC and Outreachy 2019 internships

QEMU is once again participating in Google Summer of Code and Outreachy this year! These open source internship programs offer full-time remote work opportunities for talented new developers wishing to get involved in our community.

Each intern works with one or more mentors who support them in their project. Code is submitted according to QEMU’s normal development process, giving the intern experience in open source software development. Our projects range from device emulation to performance optimization to test infrastructure.

If you are interested in contributing to QEMU through a paid 12-week internship from May to August 2019, take a look at our GSoC page and our Outreachy page for more information.

Both GSoC and Outreachy have eligibility criteria, which you can review here (GSoC) and here (Outreachy) before applying.

You can read about projects that were completed in 2018 here.

These internships are generously funded by Google (GSoC) and Red Hat (Outreachy).

February 26, 2019 07:00 AM

February 18, 2019

Daniel Berrange

Easier QEMU live tracing using systemtap

QEMU is able to leverage a number of live tracing systems, the choice configurable at build time between

  • log – printf formatted string for each event sent into QEMU’s logging system which writes to stderr
  • syslog – printf formatted string for each event sent via syslog
  • simple – binary data stream for each event written to a file or fifo pipe
  • ftrace – printf formatted string for each event sent to kernel ftrace facility
  • dtrace – user space probe markers dynamically enabled via dtrace or systemtap
  • ust – user space probe markers dynamically enabled via LTT-ng

Upstream QEMU enables the “log” trace backend by default since it is cross-platform portable and very simple to use by adding “-d trace:PATTERN” on the QEMU command line. For example, to enable logging of all trace events in the QEMU I/O subsystem (aka “qio”) we can run:

$ qemu -d trace:qio* ...some args...
23266@1547735759.137292:qio_channel_socket_new Socket new ioc=0x563a8a39d400
23266@1547735759.137305:qio_task_new Task new task=0x563a891d0570 source=0x563a8a39d400 func=0x563a86f1e6c0 opaque=0x563a89078000
23266@1547735759.137326:qio_task_thread_start Task thread start task=0x563a891d0570 worker=0x563a86f1ce50 opaque=0x563a891d9d90
23273@1547735759.137491:qio_task_thread_run Task thread run task=0x563a891d0570
23273@1547735759.137503:qio_channel_socket_connect_sync Socket connect sync ioc=0x563a8a39d400 addr=0x563a891d9d90
23273@1547735759.138108:qio_channel_socket_connect_fail Socket connect fail ioc=0x563a8a39d400

This is very simple and surprisingly effective much of the time, but it is not without its downsides:

  • Inactive probes have non-negligible performance impact on hot codepaths
  • It is targeted at human consumption, so it is not easy to reliably process with machines
  • It requires adding arguments to QEMU’s command line so is not easy to enable in many cases
  • It is custom to QEMU so does not facilitate getting correlated traces across the whole system

For these reasons, some downstreams chose not to use the default “log” backend. Both Fedora and RHEL have instead enabled the “dtrace” backend which can be leveraged via systemtap on Linux. This provides a very powerful tracing system, but the cost is that the previous simple task of printing a formatted string when a probe point fires has become MUCH more complicated. For example to get equivalent output to that seen with QEMU’s log backend would require

# cat > trace.stp <<EOF
probe qemu.system.x86_64.qio_task_new {
    printf("%d@%d qio_task_new Task new task=%p source=%p func=%p opaque=%p\n", 
           pid(), gettimeofday_ns(), task, source, func, opaque)
}
EOF
# stap trace.stp
22806@1547735341399862570 qio_task_new Task new task=0x56135cd66eb0 source=0x56135d1d7c00 func=0x56135af746c0 opaque=0x56135bf06400

Repeat that code snippet for every qio* probe point you want to watch, figuring out the set of args it has available to print. This quickly becomes tedious for what should be a simple logging job, especially if you need to reference null terminated strings from userspace.

After cursing this difficulty one time too many, it occurred to me that QEMU could easily do more to make life easier for systemtap users. The QEMU build system is already auto-generating all the trace backend specific code from a generic description of probes in the QEMU source tree. It has a format string which is used in the syslog, log and ftrace backends, but this is ignored for the dtrace backend. It did not take much to change the code generator so that it can use this format string to generate a convenient systemtap tapset representing the above manually written probe:

probe qemu.system.x86_64.log.qio_task_new = qemu.system.x86_64.qio_task_new ?
{
    printf("%d@%d qio_task_new Task new task=%p source=%p func=%p opaque=%p\n",
           pid(), gettimeofday_ns(), task, source, func, opaque)
}

This can be trivially executed with minimal knowledge of systemtap tapset language required

# stap -e "qemu.system.x86_64.log.qio_task_new{}"
22806@1547735341399862570 qio_task_new Task new task=0x56135cd66eb0 source=0x56135d1d7c00 func=0x56135af746c0 opaque=0x56135bf06400

Even better, we have now gained the ability to use wildcards too

# stap -e "qemu.system.x86_64.log.qio*{}"
23266@1547735759.137292:qio_channel_socket_new Socket new ioc=0x563a8a39d400
23266@1547735759.137305:qio_task_new Task new task=0x563a891d0570 source=0x563a8a39d400 func=0x563a86f1e6c0 opaque=0x563a89078000
23266@1547735759.137326:qio_task_thread_start Task thread start task=0x563a891d0570 worker=0x563a86f1ce50 opaque=0x563a891d9d90
23273@1547735759.137491:qio_task_thread_run Task thread run task=0x563a891d0570
23273@1547735759.137503:qio_channel_socket_connect_sync Socket connect sync ioc=0x563a8a39d400 addr=0x563a891d9d90
23273@1547735759.138108:qio_channel_socket_connect_fail Socket connect fail ioc=0x563a8a39d400

Users still, however, need to be aware of the naming convention for QEMU’s systemtap tapsets and how it maps to the particular QEMU binary that is used, and must not forget the trailing “{}”. Thus I decided to go one step further and ship a small helper tool to make it even easier to use:

$ qemu-trace-stap run qemu-system-x86_64 'qio*'
22806@1547735341399856820 qio_channel_socket_new Socket new ioc=0x56135d1d7c00
22806@1547735341399862570 qio_task_new Task new task=0x56135cd66eb0 source=0x56135d1d7c00 func=0x56135af746c0 opaque=0x56135bf06400
22806@1547735341399865943 qio_task_thread_start Task thread start task=0x56135cd66eb0 worker=0x56135af72e50 opaque=0x56135c071d70
22806@1547735341399976816 qio_task_thread_run Task thread run task=0x56135cd66eb0

The second argument to this tool is the QEMU binary filename to be traced, which can be relative (to search $PATH) or absolute. What is clever is that it will set the SYSTEMTAP_TAPSET env variable to point to the right location to find the corresponding tapset definition. This is very useful when you have multiple copies of QEMU on the system and need to make sure systemtap traces the right one.

The ‘qemu-trace-stap‘ script takes a verbose arg so you can understand what it is running behind the scenes:

$ qemu-trace-stap run /home/berrange/usr/qemu-git/bin/qemu-system-x86_64 'qio*'
Using tapset dir '/home/berrange/usr/qemu-git/share/systemtap/tapset' for binary '/home/berrange/usr/qemu-git/bin/qemu-system-x86_64'
Compiling script 'probe qemu.system.x86_64.log.qio* {}'
Running script, <Ctrl>-c to quit
...trace output...

It can enable multiple probes at once

$ qemu-trace-stap run qemu-system-x86_64 'qio*' 'qcrypto*' 'buffer*'

By default it monitors all existing running processes and all future launched processes. This can be restricted to a specific PID using the --pid arg

$ qemu-trace-stap run --pid 2532 qemu-system-x86_64 'qio*'

Finally if you can’t remember what probes are valid it can tell you

$ qemu-trace-stap list qemu-system-x86_64
ahci_check_irq
ahci_cmd_done
ahci_dma_prepare_buf
ahci_dma_prepare_buf_fail
ahci_dma_rw_buf
ahci_irq_lower
...snip...

This new functionality merged into QEMU upstream a short while ago and will be included in the QEMU 4.0 release coming at the end of April.

by Daniel Berrange at February 18, 2019 03:00 PM

January 28, 2019

Thomas Huth

How to create small VMs with buildroot

A couple of times I have run into the situation that I wanted to provide a small guest disk image to other people. For example, one time I wanted to provide a test application like LTP to colleagues via a server where I only had some limited disk quota available. Back then I was still able to resolve the problem by installing a stock Linux distribution together with the test software into a normal qcow2 image, and then shrinking the image with qemu-img convert and xz to approximately 500 MiB.

But when I started to think about the QEMU advent calendar 2018, where I wanted to provide many small images for various different target architectures, it was clear to me that I needed a different approach. First, the disk images needed to be much smaller due to network traffic constraints, and for many of the “non-mainstream” target architectures (like MicroBlaze or Xtensa) you also can not easily get a standard Linux distribution that installs without problems on the machines that QEMU provides.

Instead of using a pre-built Linux distribution, it would also be possible to cross-compile the kernel and user space programs and build a small disk image with that on your own. However, figuring out how to do that for multiple target architectures would have been very cumbersome and time consuming.

So after doing some research, I finally discovered buildroot, which is an excellent framework for doing exactly what I wanted: It allows creating small disk images for non-x86 target CPUs, with all the magic of cross-compiling and image creation wrapped in its internal scripts, and with a very flexible Kconfig-style configuration system on top.

For those who are interested, here’s now a short description how to use buildroot for creating a small guest disk image:

  1. Download the version that you like to use from the buildroot download page and unpack it:
    $ wget https://buildroot.org/downloads/buildroot-2018.02.9.tar.bz2
    $ tar -xaf buildroot-2018.02.9.tar.bz2 
    $ cd buildroot-2018.02.9/
    
  2. Now you have to choose for which CPU and machine target you want to build. Have a look at the pre-defined config files and then select one. In the following example, I’m going to use the “pseries” POWER machine:
    $ cd configs/
    $ ls qemu*
    qemu_aarch64_virt_defconfig         qemu_nios2_10m50_defconfig
    qemu_arm_versatile_defconfig        qemu_or1k_defconfig
    qemu_arm_versatile_nommu_defconfig  qemu_ppc64le_pseries_defconfig
    qemu_arm_vexpress_defconfig         qemu_ppc64_pseries_defconfig
    qemu_m68k_mcf5208_defconfig         qemu_ppc_g3beige_defconfig
    qemu_m68k_q800_defconfig            qemu_ppc_mpc8544ds_defconfig
    qemu_microblazebe_mmu_defconfig     qemu_ppc_virtex_ml507_defconfig
    qemu_microblazeel_mmu_defconfig     qemu_sh4eb_r2d_defconfig
    qemu_mips32r2el_malta_defconfig     qemu_sh4_r2d_defconfig
    qemu_mips32r2_malta_defconfig       qemu_sparc64_sun4u_defconfig
    qemu_mips32r6el_malta_defconfig     qemu_sparc_ss10_defconfig
    qemu_mips32r6_malta_defconfig       qemu_x86_64_defconfig
    qemu_mips64el_malta_defconfig       qemu_x86_defconfig
    qemu_mips64_malta_defconfig         qemu_xtensa_lx60_defconfig
    qemu_mips64r6el_malta_defconfig     qemu_xtensa_lx60_nommu_defconfig
    qemu_mips64r6_malta_defconfig
    $ cd ..
    $ make qemu_ppc64_pseries_defconfig
    
  2. Now run make menuconfig to fine-tune your build. I recommend having a look at the following settings first:
    • In the Toolchain section, you might need to enable other languages like C++ in case it is required for the application that you want to ship in the image.
    • In the System Configuration section, change the System Banner to something that better suits your disk image.
    • Check the Kernel section to see whether the right kernel settings are used here. The defaults should be fine most of the time, but in case you want to use a newer kernel version for example, or a different kernel config file, you can adjust it here. Note that you also should adjust the kernel header version in the Toolchain section if you change the kernel version here.
    • Have a look at the Target packages section – maybe the application that you want to include is already available by the base buildroot system. In that case you can already enable it here.
    • Check the Filesystem images section and decide which kind of image you want to ship later. For example, for most of the QEMU advent calendar images, I used a simple initrd only, so I unchecked the ext2/3/4 root filesystem here and used initial RAM filesystem linked into linux kernel instead.
  4. Now save your configuration, exit the config menu, and type make for a first test to see whether it produces a usable image. Note: Don’t use the -j parameter of make here, buildroot will figure that out on its own instead.

  5. Once the build finished successfully, have a look at the output/images/ directory. You can start your guest with the results from there to give it a try. For example if you built with the ppc64 pseries configuration, with the initrd linked into the kernel:
    $ qemu-system-ppc64 -M pseries -m 1G -kernel output/images/vmlinux
    

    You should see the kernel booting up, and if you have a look at the serial console, there is also a getty running where you can log in as root and look around.

  6. To customize your build, you sooner or later want to add additional files to the image, for example some additional init scripts in the /etc/init.d/ folder. Or in the above case, it would be good to also have getty running on the graphical console. So to add custom files, the best way is to create an overlay folder which will be copied into the destination filesystem during the make process:
    $ mkdir -p overlay/etc/init.d
    $ cp my-startup-script.sh overlay/etc/init.d/S99myscript  # If you have one
    $ cp output/target/etc/inittab overlay/etc/inittab
    $ echo 'tty1::respawn:/sbin/getty -L tty1 0 linux' >> overlay/etc/inittab
    

    Then run make menuconfig and set the Root filesystem overlay directories option in the System Configuration section to the overlay folder that you have just created. Run make again and the next time you start your guest, you should see the new files in the image, e.g. also a getty running on the graphical console. Note: Do not try to add/change files directly in the output/target/ folder. That looks tempting first, but this is just a temporary folder used by the build system, which can be overwritten at any time and will be erased when you run make clean for example.

  7. If you need to tweak the kernel configuration, you can run make linux-menuconfig and do the appropriate changes there. For example, if you want to get keyboard input for the ppc64 pseries machine on the graphical console, you should enable the USB XHCI driver in the kernel, too. Once you are happy with the kernel configuration, save it, exit the menu and type make linux-rebuild && make. Note: To avoid that the kernel config gets reset after you run make clean at a later point in time, you should copy output/build/linux-*/.config to a safe location. Then run make menuconfig, change the Kernel -> Kernel configuration setting to Use a custom config file and set the Configuration file path to the copied file.

  8. If you want to add additional software to your image, you basically have to provide a Config.in file and a *.mk file. I recommend having a look at the various packages in the package/ directory: use one with a similar build system as a template, and have a closer look at the buildroot manual for details; a minimal skeleton is also sketched below. Tweaking the build system of your software to cross-compile properly can sometimes be a little tricky, but most software that uses standard systems like autoconf should be fine.
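
    A minimal package skeleton could look like the following (a sketch; "myapp", its version and download URL are hypothetical placeholders, and the new package must additionally be referenced from package/Config.in via source "package/myapp/Config.in"):

    # package/myapp/Config.in
    config BR2_PACKAGE_MYAPP
            bool "myapp"
            help
              My custom application.

    # package/myapp/myapp.mk
    MYAPP_VERSION = 1.0
    MYAPP_SITE = http://example.com/downloads
    define MYAPP_BUILD_CMDS
            $(MAKE) CC="$(TARGET_CC)" -C $(@D)
    endef
    define MYAPP_INSTALL_TARGET_CMDS
            $(INSTALL) -D -m 0755 $(@D)/myapp $(TARGET_DIR)/usr/bin/myapp
    endef
    $(eval $(generic-package))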

That’s it. You now should be able to package your software in really small VM images. Of course, there are still lots of other settings that you can tweak in the buildroot environment – if you need any of these just have a look at the good buildroot manual for more information.

January 28, 2019 02:20 PM

January 25, 2019

Stefan Hajnoczi

VIRTIO 1.1 is available for public review until Feb 21st 2019

The VIRTIO 1.1 specification for paravirtualized I/O devices includes the new packed vring layout and the GPU, input, crypto, and socket device types. In addition to this there are other improvements and new features in the specification. The new vring layout will increase performance and offers new features that devices can take advantage of.

You can review the specification and post comments until February 21st 2019: VIRTIO 1.1 csprd01.

by Unknown (noreply@blogger.com) at January 25, 2019 02:42 PM

January 23, 2019

QEMU project

GSoC and Outreachy 2018 retrospective

QEMU participates in open source internship programs including Google Summer of Code (GSoC) and Outreachy. These full-time remote work opportunities allow talented new developers to get involved in our community. This post highlights what our interns achieved in 2018.

micro:bit board emulation

Julia Suvorova (Outreachy) and Steffen Görtz (GSoC) tackled adding emulation support for the micro:bit ARM board. Although QEMU already has plenty of ARM emulation code, the Cortex-M0 CPU used in the micro:bit was not yet implemented and the nRF51 system-on-chip was also missing.

The goal of this project was to run micro:bit programs (usually created with the MicroPython or Javascript/Blocks IDEs) with a core set of emulated devices, including the serial port, pushbuttons, and LEDs.

QEMU 3.1 already shipped the groundwork for the new qemu-system-arm -M microbit machine type. Enough functionality to run basic micro:bit programs is expected in the next QEMU release.

This project was mentored by Jim Mussared, Joel Stanley, and Stefan Hajnoczi.

Patchew REST API improvements

Shubham Jain (GSoC) created a REST API for the Patchew continuous integration system that is at the heart of QEMU’s development process. The previous API was not RESTful and exposed database schema internals.

The improvements to the REST API have been included into Patchew and are deployed on patchew.org. They are not in use yet, pending more work on authentication; this may be the topic of a future Summer of Code internship.

This project was mentored by Paolo Bonzini and Fam Zheng.

Qtest Driver Framework

Emanuele Esposito (GSoC) enhanced QEMU’s test infrastructure with an engine that starts tests with all variants of devices that they are capable of driving.

This is a complicated task in QEMU since certain devices and buses are available in an architecture-specific way on each emulation target, making it hard to write test cases without lots of hardcoded dependencies - and to keep them up-to-date!

The qgraph framework that Emanuele created eliminates the need to hardcode each variant into the test. Emanuele also converted several existing tests. His framework was also presented at KVM Forum 2018 by Laurent Vivier and should be merged in 4.0.

This project was mentored by Paolo Bonzini and Laurent Vivier.

Vulkan-izing VirGL

Nathan Gauër (GSoC) improved VirGL, which provides an OpenGL path well supported on Linux guests. On the host, QEMU offers several console back-ends, from EGL to SDL. Adding a Vulkan path will require changing the current VirGL API, writing new guest drivers, and also offering a way to display the output. This is a huge task, which can be split into several sub-projects. Extending the current VirGL API to support Vulkan is the first step.

Code is available here.

This project was mentored by Marc-André Lureau.

Stay tuned for 2019 internships!

QEMU will apply to Google Summer of Code and Outreachy again in 2019. We hope to offer more great open source internship opportunities for new developers.

January 23, 2019 06:50 AM

January 07, 2019

KVM on Z

Knowledge Series: CPU Models and Migration Considerations


Another new entry in our Knowledge Series explains CPU models and their relationship to live guest migration.

by Stefan Raspl (noreply@blogger.com) at January 07, 2019 09:23 PM

January 06, 2019

Stefan Hajnoczi

mute-thread: a script to mute email threads with notmuch

Ever get included on an email thread that isn't relevant? It can be distracting to see new emails appear on a thread you already know is not interesting. You could mark them as read manually, but that is tedious.

This mute-thread script silences email threads that you don't want to read, even after new emails are received.

Download it here.

Setup

It relies on the awesome notmuch(1) email utility, so make sure you have that set up in order to use this script.

The following .muttrc macro integrates this with the mutt(1) email client. When you press M the entire email thread is muted:

macro index M "<enter-command>unset wait_key<enter><pipe-message>~/.mutt/mute-thread add<enter><enter-command>set wait_key<enter><read-thread>" "Mute thread"

After fetching new emails, run notmuch new and then mute-thread apply.
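
In other words, a typical mail routine could end like this (a sketch; fetching itself is assumed to happen via your usual tool):

notmuch new
mute-thread apply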

Unmuting threads

If you change your mind, run mute-thread remove MESSAGE-ID to unmute a thread again. Future emails will not be silenced.

by Unknown (noreply@blogger.com) at January 06, 2019 07:56 AM

December 13, 2018

KVM on Z

SLES 12 SP4 released

SLES 12 SP4 is out! See the announcement and their release notes with Z-specific changes.
It ships the following code levels:
  • Linux kernel 4.12 (SP3: 4.4),
  • QEMU v2.11 (SP3: v2.9), and
  • libvirt v4.0 (SP3: v3.3).
See previous blog entries on QEMU v2.10 and v2.11 for details on new features that become available by the QEMU package update.
See previous blog entries on Linux kernel 4.8 and 4.11 for details on new features becoming available through the kernel update, e.g. nested virtualization support.
An additional feature in this release is the availability of STHYI information in LPAR environments. Requires qclib v1.3 or later. See this blog post for general information on qclib.
Furthermore, note that these changes provide a full CPU model, which protects against live guest migration compatibility troubles: e.g. migrating a guest exploiting the latest features to a KVM instance running on an earlier IBM Z machine lacking said features would be detected and prevented.
Note: With this feature, live guest migration back to a KVM instance that does not yet support CPU models (e.g. SLES 12 SP3) will not work anymore.

      by Stefan Raspl (noreply@blogger.com) at December 13, 2018 10:04 AM

December 12, 2018

QEMU project

QEMU version 3.1.0 released

We would like to announce the availability of the QEMU 3.1.0 release. This release contains 1900+ commits from 189 authors.

You can grab the tarball from our download page. The full list of changes is available in the Wiki.
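
If you prefer building from source, the usual steps apply; a minimal sketch (the download URL follows the project's standard naming, and the configure flags are just an example):

    wget https://download.qemu.org/qemu-3.1.0.tar.xz
    tar xf qemu-3.1.0.tar.xz
    cd qemu-3.1.0
    ./configure --target-list=x86_64-softmmu
    make -j$(nproc)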

Highlights include:

• ARM: emulation support for microbit and Xilinx Versal machine models
• ARM: support for ARMv6M architecture and Cortex-M0 CPU model
• ARM: support for Cortex-A72 CPU model
• ARM: virt/xlnx-zynqmp: virtualization extensions for GICv2 interrupt controller
• ARM: emulation of AArch32 virtualization/hypervisor mode now supported for Cortex-A7 and Cortex-A15
• MIPS: emulation support for nanoMIPS I7200
• MIPS: emulation support for MXU SIMD instructions for MIPS32
• PowerPC: pseries: enablement of nested virtualization via KVM-HV
• PowerPC: prep: deprecated in favor of 40p machine model
• PowerPC: 40p: IRQ routing fixes, switch from Open HackWare to OpenBIOS
• PowerPC: g3beige/mac99: support for booting from virtio-blk-pci
• s390: VFIO passthrough support for crypto devices (vfio-ap)
• s390: KVM support for backing guests with huge pages
• SPARC: sun4u: support for booting from virtio-blk-pci
• x86: multi-threaded TCG support
• x86: KVM support for Enlightened VMCS (improved perf for Hyper-V on KVM)
• x86: KVM support for Hyper-V IPI enlightenments
• Xtensa: support for input from chardev consoles
• Support for AMD IOMMU interrupt remapping and guest virtual APIC mode
• XTS cipher mode is now ~2x faster
• stdvga and bochs-display devices can expose EDID information to guests (for use with the xres/yres resolution options; see the example after this list)
• qemu-img tool can now generate LUKS-encrypted files through the ‘convert’ command
• and lots more…
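
As an illustration of the EDID and qemu-img LUKS items above (a sketch; adapt device names, secrets and paths to your setup):

    # Expose a preferred 1920x1080 resolution to the guest via EDID:
    qemu-system-x86_64 -device VGA,edid=on,xres=1920,yres=1080 ...

    # Convert an image into a LUKS-encrypted one with qemu-img:
    qemu-img convert --object secret,id=sec0,data=mypassword \
        -O luks -o key-secret=sec0 disk.img disk.luks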

Thank you to everyone involved!

December 12, 2018 06:50 AM

December 04, 2018

Cornelia Huck

Notes from KVM Forum 2018

KVM Forum 2018 took place October 24 - 26 in Edinburgh, Scotland. Better late than never, here are some of my notes and impressions. As always, there was a lot going on, and I could not attend everything that I would have found interesting. Fortunately, video recordings are available (see the page linked above, and the YouTube channel); here, I'd like to thank the folks organizing the logistics, recording the talks, and uploading nicely edited versions!

This year, KVM Forum was again co-located with OSS Europe, and on the first day (which also featured the annual QEMU summit), talks were on a shared track. This meant an opportunity for people attending OSS to hear some KVM and virtualization related talks; unfortunately, it also meant that the room where the KVM Forum talks were held was very crowded. Nevertheless, it is always nice if a talk is interesting enough to attract a good number of people; I'm happy that my maintainership talk also attracted a nice audience. Other talks from the first day I enjoyed were Alex's talk about L1TF and Marc's talk about running huge libvirt installations.

The second and third day featured some more comfortable rooms; organization-wise, I liked that talks about similar topics were grouped back-to-back.

On these days, we had the keynotes for KVM, QEMU, and libvirt; as well as the contributor Q&A panel - some good questions from the audience there. Also check out Christian's talk about the various architectures supported by KVM and how much commonality is there (or not).

Most of the time, days two and three were dual-track. Some of the topics covered were vfio and migration with vfio; nested virtualization; not-so-common architectures (including s390!); testing and continuous integration. I find it hard to point out specific sessions and recommend browsing through the posted videos instead.

Some topics were delved into more deeply in BOF sessions; myself, I attended the vfio migration BOF which gave me a couple of things to think about. Many BOF sessions subsequently posted summaries on the relevant mailing lists.

One of the most important features of any conference is, of course, the hallway track: meeting new people, seeing old acquaintances again, and impromptu discussions about a lot of different topics. I find that this is one of the most valuable experiences, both for putting a face to a name and for discussing things you did not even think about beforehand.

So, for an even shorter summary of my short notes: KVM Forum 2018 was great, go watch some videos, and consider attending future KVM Forums :)

by Cornelia Huck (noreply@blogger.com) at December 04, 2018 06:52 PM

December 03, 2018

KVM on Z

SLES 12 SP3 Updates

SLES 12 SP3, released late last year, received a couple of mostly performance and security-related updates in support of IBM z14 and LinuxONE through the maintenance web updates.
In particular:

by Stefan Raspl (noreply@blogger.com) at December 03, 2018 08:38 AM

December 01, 2018

Thomas Huth

QEMU Advent Calendar 2018 opened the first door

Starting today, on December 1st, the first door of the QEMU Advent Calendar 2018 can be opened! The advent calendar reveals a new disk image for download on each of the first 24 days of December 2018. It is meant to create a fun experience for the QEMU community, to celebrate the 15th anniversary of QEMU, and to provide some good images for testing the various CPU targets of QEMU. This year it contains far more images for non-x86 targets than before, so if you are interested in collecting test images for the various CPU targets of QEMU, be sure to check the calendar regularly!

December 01, 2018 07:05 AM

November 29, 2018

Daniel Berrange

Improved translation po file handling by ditching gettext autotools integration

The libvirt library has long provided translations of its end-user-facing strings, which largely means error messages and console output from command line tools / daemons. Since libvirt uses autotools for its build system, it naturally used the standard automake integration provided by gettext for handling .po files. The libvirt.pot file with master strings is exported to Zanata, where the actual translation work is outsourced to the Fedora translation team, who support up to ~100 languages. At the time of writing, libvirt has some level of translation in ~45 languages.

With use of Zanata, libvirt must periodically create an updated libvirt.pot file and push it to Zanata, and then just before release it must pull the latest translated .po files back into GIT for release.

There have been a number of problems with this approach which have been annoying us pretty much since the start, and earlier this year it finally became too much to bear any longer.

• The per-language translation files stored in git contain source file name and line number annotations to indicate where each translatable string originates. Since the translation files are not re-generated on every source file change, the file location annotations become increasingly out of date after every commit. When the translation files are updated, 98% of the diff is simply changing source file locations, leading to a very poor signal/noise ratio.
• The strings in the per-language translation files are sorted according to source filename. Thus when code is moved between files, or when files are renamed, the strings in the updated translation files all get needlessly reordered, again leading to a poor signal/noise ratio in diffs.
• Each language translation file contains every translatable string, even those which do not have any translation yet. This makes sense if translators are working directly against the .po files, but in libvirt everything is done via the Zanata UI, which already knows the list of untranslated strings.
• The per-language translation files grow in size over time, with previously used message strings appended to the end of the file and never discarded by the gettext tools. This again makes sense if translators are working directly against .po files, but Zanata already provides a full translation memory containing historically used strings.
• Whenever ‘make dist’ is run, the gettext autotools integration will regenerate the per-language translation files. As a result of the three previous points, every time a release is made there’s a giant commit, more than 100MB in size, that contains diffs for translated files which are entirely noise and no signal.

One suggested approach to deal with this is to stop storing translations in GIT at all and simply export them from Zanata only at the time of ‘make dist’. The concern with this approach is that the GIT repository no longer contains the full source for the project in a self-contained manner. ‘make dist‘ now needs a live network connection to the Zanata servers. If we were to replace Zanata with a new tool in the future (Zanata is already a replacement for the previously used Transifex), we would potentially lose access to translations for old releases.

With this in mind we decided to optimize the way translations are managed in GIT.

The first easy win was to simply remove the master libvirt.pot file from GIT entirely. This file is auto-generated from the source files and is out of date the moment any source file changes, so no one would ever want to use the stored copy.

The second more complex step was to minimize and canonicalize the per-language translation files. msgmerge is used to take the full .po file, strip out the source file locations, and sort the strings alphabetically. A Perl script is then used to further process the content, dropping any translations marked as “fuzzy” and dropping any strings for which there is no translated text available. The resulting output is still using the normal .po file format, but we call these ‘.mini.po‘ files to indicate that they are stripped down compared to what you’d normally expect to see.
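
The effect of this minimization can be approximated with stock gettext tools; the following is a sketch of the idea rather than libvirt's actual tooling (file names are illustrative):

    # Strip '#: file:line' location comments and sort strings for a
    # stable, diff-friendly order, then drop fuzzy, untranslated and
    # obsolete entries:
    msgmerge --no-location --sort-output de.po libvirt.pot \
      | msgattrib --translated --no-fuzzy --no-obsolete -o de.mini.po -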

The final step was to remove the gettext / autotools integration and write a custom Makefile.am to handle the key tasks.

• A target ‘update-mini-po‘ to automate the process of converting full .po files into .mini.po files. This is used when pulling down new translations from Zanata to be stored in git before release.
• A target ‘update-po’ to automate the inverse process of converting .mini.po files back into full .po files. This is to be used by anyone who might need to look at full language translations outside of Zanata.
• An install hook to generate the binary .gmo files from the .mini.po files and install them into /usr/share/locale for use at runtime. This avoids the need to ship the full .po files in release tarballs (see the sketch after this list).
• A target ‘zanata-push‘ to automate the process of re-generating the libvirt.pot file and uploading it to Zanata.
• A target ‘zanata-pull‘ to automate the process of pulling new translations down from Zanata and then triggering ‘update-mini-po’.
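
For instance, the install-time step essentially compiles the minimized files straight into binary catalogs, roughly like this (a sketch under the same naming assumptions as above):

    # A .mini.po file is still a valid .po file, so msgfmt can compile
    # it directly into the binary catalog that gets installed under
    # /usr/share/locale/<lang>/LC_MESSAGES/:
    msgfmt -o de.gmo de.mini.po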

After all this work was completed the key benefits are

• The size of content stored in GIT was reduced from ~100MB to ~18MB.
• Updates to the translations in GIT now produce small diffstats with a high signal/noise ratio
• Files stored in GIT are never changed as a side effect of build system commands like ‘make dist’
• The autotools integration is easier to understand

while having no visible impact on the translators using Zanata. In the event anyone does need to see full translation languages outside of Zanata, there is an extra step to generate the full .po files from the .mini.po files, but this is countered by the fact that the result will be fully up to date with respect to translatable strings and source file locations.

I’d encourage any project which is using gettext autotools integration, while also outsourcing to a system like Zanata, to consider whether they’d benefit from taking similar steps to libvirt. Not all projects will get the same degree of space saving, but diffstats with good signal/noise ratios and removing side effects from ‘make dist’ are wins that are likely desirable for any project.


by Daniel Berrange at November 29, 2018 12:22 PM

November 28, 2018

Stefan Hajnoczi

Software Freedom Conservancy donations are being matched again!

Donations to Software Freedom Conservancy, the charity that acts as the legal home for QEMU and many other popular open source projects that don't run their own foundations or charities, are being matched again this year. That means your donation is doubled thanks to a group of donors who have pledged to match donations.

Software Freedom Conservancy helps projects with the details of running an open source project (legal advice, handling expenses, organizing conferences, etc) as well as taking a leading position on open source licensing and enforcement. Their work is not-for-profit and in the interest of the entire open source community.

If you want more projects like QEMU, Git, Samba, Inkscape, and Selenium to succeed as healthy open source communities, then donating to Software Freedom Conservancy is a good way to help.

Find out about becoming a Supporter here.

by Stefan Hajnoczi (noreply@blogger.com) at November 28, 2018 10:37 AM

November 27, 2018

Stefan Hajnoczi

QEMU Advent Calendar 2018 is coming!

QEMU Advent Calendar is running again this year. Each day from December 1st through 24th, a surprise QEMU disk image will be released for your entertainment.

Check out the website on December 1st for the first disk image:

https://www.qemu-advent-calendar.org/2018/

Thomas Huth is organizing QEMU Advent Calendar 2018 with the help of others from the QEMU community. If you want to contribute a disk image, take a look at the call for images email.

by Stefan Hajnoczi (noreply@blogger.com) at November 27, 2018 09:15 AM
