Blogging about open source virtualization

News from QEMU, KVM, libvirt, libguestfs, virt-manager and related tools


August 17, 2018

Daniel Berrange

ANNOUNCE: gtk-vnc 0.9.0 release

I’m pleased to announce a new release of GTK-VNC, version 0.9.0. This is a cleanup/modernization release. Note that the next release (1.0.0) will drop support for GTK-2.

  • Requires gnutls >= 3.1.18
  • Requires libgcrypt >= 1.5.0
  • Requires glib2 >= 2.42.0
  • Use libgcrypt for DES routines
  • Add missing cipher close calls in ARD auth
  • Check for errors after reading mslogon params
  • Support newer UltraVNC mslogon auth type code
  • Avoid divide by zero in mslogin auth from bogus params
  • Re-allow python2 accidentally blocked when removing python binding

Thanks to all those who reported bugs and provided patches that went into this new release.

by Daniel Berrange at August 17, 2018 04:01 PM

August 15, 2018

QEMU project

QEMU version 3.0.0 released

We’d like to announce the availability of the QEMU 3.0.0 release. This release contains 2300+ commits from 169 authors.

A note from the maintainer: Why 3.0? Well, we felt that our version numbers were getting a bit unwieldy, and since this year is QEMU’s 15th birthday it seemed like a good excuse to roll over the major digit. Going forward we plan to increment the major version once a year, for the first release of the year. Don’t read too much into it: it doesn’t imply a drastic compatibility break. Rumours of our triskaidekaphobia have been greatly exaggerated ;-)

You can grab the tarball from our download page. The full list of changes is available in the Wiki.

Highlights include:

  • Support for additional x86/AMD mitigations against Speculative Store Bypass (Spectre Variant 4, CVE-2018-3639)
  • Improved support for nested KVM guests running on Hyper-V
  • Block device support for active disk-mirroring, which avoids convergence issues which may arise when doing passive/background mirroring of busy devices
  • Improved support for AHCI emulation, SCSI emulation, and persistent reservations / cluster management
  • OpenGL ES support for SDL front-end, additional framebuffer device options for early boot display without using legacy VGA emulation
  • Live migration support for TPM TIS devices, capping bandwidth usage during post-copy migration, and recovering from a failed post-copy migration
  • Improved latency when using user-mode networking / SLIRP
  • ARM: support for SMMUv3 IOMMU when using ‘virt’ machine type
  • ARM: v8M extensions for VLLDM and VLSTM floating-point instructions, and improved support for AArch64 v8.2 FP16 extensions
  • ARM: support for Scalable Vector Extensions in linux-user mode
  • Microblaze: support for 64-bit address sizes and translation bug fixes
  • PowerPC: PMU support for mac99 machine type and improvements for Uninorth PCI host bridge emulation for Mac machine types
  • PowerPC: preliminary support for emulating POWER9 hash MMU mode when using powernv machine type
  • RISC-V: improvement for privileged ISA emulation
  • s390: support for z14 ZR1 CPU model
  • s390: bpb/ppa15 Spectre mitigations enabled by default for z196 and later CPU models
  • s390: support for configuring consoles via -serial options
  • and lots more…

Thank you to everyone involved!

August 15, 2018 11:25 AM

August 01, 2018

Daniel Berrange

ANNOUNCE: gtk-vnc 0.8.0 release

I’m pleased to announce a new release of GTK-VNC, version 0.8.0. This is a small maintenance release tidying up some loose ends.

  • Deleted the python2 binding in favour of GObject introspection
  • Pull in latest keycodemapdb content
  • Disable/fix -Wcast-function-type warnings

Thanks to all those who reported bugs and provided patches that went into this new release.

by Daniel Berrange at August 01, 2018 04:45 PM

Cornelia Huck

s390x changes in QEMU 3.0

QEMU 3.0 is currently in the late -rc phase (with the final release expected early/mid August), so here's a quick summary of what has been changed for s390x.

CPU models

  • A CPU model for the z14 Model ZR1 has been added. This is the "small", single-frame z14.
  • The feature bits for Spectre mitigation (bpb and ppa15) are now included in the default CPU model for z196 and up. This means that these features will be available to the guest (given the host supports them) without needing to specify them explicitly.

Devices

  • You can now configure consoles via -serial as well (see the example after this list).
  • vfio-ccw devices have gained a "force-orb-pfch" property. This is not very useful for Linux guests, but if you are trying to use vfio-ccw with a guest that does not specify "unlimited prefetch" for its requests but does not actually rely on the semantics, this will help you. Adding support to vfio-ccw to accommodate channel programs that must not be prefetched is unfortunately not straightforward and will not happen in the foreseeable future.
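For the console support via -serial mentioned above, a minimal sketch of an invocation could look like this (machine type, memory size and image name are just placeholders):

qemu-system-s390x -M s390-ccw-virtio -m 1024 \
    -drive file=guest.qcow2,format=qcow2,if=virtio \
    -display none -serial mon:stdio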

Booting and s390 bios

  • The s390-netboot image has been enhanced: It now supports indirect loading via .INS files and pxelinux.cfg-style booting.
  • The boot menu can now also deal with non-sequential entries.

Miscellaneous

  • Handling of the TOD clock in tcg has been improved; CPU hotplug under tcg is now working.
  • And the usual fixes, cleanups and improvements.

by Cornelia Huck (noreply@blogger.com) at August 01, 2018 09:36 AM

July 27, 2018

Daniel Berrange

ANNOUNCE: virt-viewer 7.0 release

I am happy to announce a new bugfix release of virt-viewer 7.0 (gpg), including experimental Windows installers for Win x86 MSI (gpg) and Win x64 MSI (gpg). The virsh and virt-viewer binaries in the Windows builds should now successfully connect to libvirtd, following fixes to libvirt’s mingw port.

Signatures are created with key DAF3 A6FD B26B 6291 2D0E 8E3F BE86 EBB4 1510 4FDF (4096R)

All historical releases are available from:

http://virt-manager.org/download/

Changes in this release include:

  • Require spice-gtk >= 0.35
  • Clarify error message when no running VM is found
  • Improve check for libgovirt requirements
  • Support “-” as a URI for input connection file
  • Remove support for spice controller interface
  • Misc man page fixes
  • Lookup win32 translations relative to install dir
  • Position connect window in center not top-left
  • Misc fixes for ovirt foreign menu support

Thanks to everyone who contributed towards this release.

by Daniel Berrange at July 27, 2018 03:23 PM

July 23, 2018

Nathan Gauër

Vulkan-ize Virglrenderer - experiment

Virglrenderer provides OpenGL acceleration to a guest running on QEMU.

My current GSoC project is to add support for the Vulkan API.

Vulkan is drastically different from OpenGL, so this addition is not straightforward. My current idea is to add an alternative path for Vulkan. Currently, two different states are kept, one for OpenGL, and one for Vulkan. Commands will go to either the OpenGL or the Vulkan front-end.

For now, only compute shaders are supported. The work is divided into two parts: a Vulkan ICD in Mesa, and a new front-end for Virgl and vtest.

If you have any feedback, do not hesitate!

This experiment can be tested using this repository. If you have an Intel driver in use, you might be able to use the Dockerfile provided.

Each part is also available independently.

July 23, 2018 10:00 PM

July 20, 2018

KVM on Z

Knowledge: Use virt-install for KVM Guest Installations

Tired of fiddling with guest XML files and virtual DJing with ISO images? Then our latest entry in our Knowledge Series might come in handy! Plus it also offers access to further Linux distributions on IBM Z!
Read more on how virt-install can help here.
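For illustration only (not taken from the linked article; names, sizes and the install URL below are placeholders), a virt-install invocation typically looks like this:

   $ virt-install --name guest1 --memory 4096 --vcpus 2 \
       --disk size=10 \
       --location http://example.com/path/to/install/tree/ \
       --os-variant generic

virt-install then generates the guest XML and drives the installation for you, so there is no need to hand-edit domain definitions or juggle ISO images.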

by Stefan Raspl (noreply@blogger.com) at July 20, 2018 10:59 PM

June 29, 2018

Daniel Berrange

CPU model configuration for QEMU/KVM on x86 hosts

With the various CPU hardware vulnerabilities reported this year, guest CPU configuration is now a security critical task. This blog post contains content I’ve written that is on its way to become part of the QEMU documentation.

QEMU / KVM virtualization supports two ways to configure CPU models

Host passthrough
This passes the host CPU model features, model, and stepping exactly to the guest. Note that KVM may filter out some host CPU model features if they cannot be supported with virtualization. Live migration is unsafe when this mode is used as libvirt / QEMU cannot guarantee a stable CPU is exposed to the guest across hosts. This is the recommended CPU configuration to use, provided live migration is not required.
Named model
QEMU comes with a number of predefined named CPU models, that typically refer to specific generations of hardware released by Intel and AMD. These allow the guest VMs to have a degree of isolation from the host CPU, allowing greater flexibility in live migrating between hosts with differing hardware.

In both cases, it is possible to optionally add or remove individual CPU features, to alter what is presented to the guest by default.

Libvirt supports a third way to configure CPU models known as “Host model”. This uses the QEMU “Named model” feature, automatically picking a CPU model that is similar to the host CPU, and then adding extra features to approximate the host model as closely as possible. This does not guarantee the CPU family, stepping, etc will precisely match the host CPU, as they would with “Host passthrough”, but gives much of the benefit of passthrough, while making live migration safe.

Recommendations for KVM CPU model configuration on x86 hosts

The information that follows provides recommendations for configuring CPU models on x86 hosts. The goals are to maximise performance, while protecting the guest OS against various CPU hardware flaws, and optionally enabling live migration between hosts with heterogeneous CPU models.

Preferred CPU models for Intel x86 hosts

The following CPU models are preferred for use on Intel hosts. Administrators / applications are recommended to use the CPU model that matches the generation of the host CPUs in use. In a deployment with a mixture of host CPU models between machines, if live migration compatibility is required, use the newest CPU model that is compatible across all desired hosts.

Skylake-Server
Skylake-Server-IBRS
Intel Xeon Processor (Skylake, 2016)
Skylake-Client
Skylake-Client-IBRS
Intel Core Processor (Skylake, 2015)
Broadwell
Broadwell-IBRS
Broadwell-noTSX
Broadwell-noTSX-IBRS
Intel Core Processor (Broadwell, 2014)
Haswell
Haswell-IBRS
Haswell-noTSX
Haswell-noTSX-IBRS
Intel Core Processor (Haswell, 2013)
IvyBridge
IvyBridge-IBRS
Intel Xeon E3-12xx v2 (Ivy Bridge, 2012)
SandyBridge
SandyBridge-IBRS
Intel Xeon E312xx (Sandy Bridge, 2011)
Westmere
Westmere-IBRS
Westmere E56xx/L56xx/X56xx (Nehalem-C, 2010)
Nehalem
Nehalem-IBRS
Intel Core i7 9xx (Nehalem Class Core i7, 2008)
Penryn
Intel Core 2 Duo P9xxx (Penryn Class Core 2, 2007)
Conroe
Intel Celeron_4x0 (Conroe/Merom Class Core 2, 2006)

Important CPU features for Intel x86 hosts

The following are important CPU features that should be used on Intel x86 hosts, when available in the host CPU. Some of them require explicit configuration to enable, as they are not included by default in some, or all, of the named CPU models listed above. In general all of these features are included if using “Host passthrough” or “Host model”.

pcid
Recommended to mitigate the cost of the Meltdown (CVE-2017-5754) fix. Included by default in Haswell, Broadwell & Skylake Intel CPU models. Should be explicitly turned on for Westmere, SandyBridge, and IvyBridge Intel CPU models. Note that some desktop/mobile Westmere CPUs cannot support this feature.
spec-ctrl
Required to enable the Spectre (CVE-2017-5753 and CVE-2017-5715) fix, in cases where retpolines are not sufficient. Included by default in Intel CPU models with -IBRS suffix. Must be explicitly turned on for Intel CPU models without -IBRS suffix. Requires the host CPU microcode to support this feature before it can be used for guest CPUs.
ssbd
Required to enable the CVE-2018-3639 fix. Not included by default in any Intel CPU model. Must be explicitly turned on for all Intel CPU models. Requires the host CPU microcode to support this feature before it can be used for guest CPUs.
pdpe1gb
Recommended to allow guest OS to use 1GB size pages. Not included by default in any Intel CPU model. Should be explicitly turned on for all Intel CPU models. Note that not all CPU hardware will support this feature.
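As a hedged illustration of how to combine the features above with a named model (the model choice is just an example; adjust it to match your hosts):

   $ qemu-system-x86_64 -cpu Westmere,+pcid,+spec-ctrl,+ssbd,+pdpe1gb

The equivalent libvirt configuration uses <feature policy='require'/> elements, as shown in the syntax section later in this post.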

Preferred CPU models for AMD x86 hosts

The following CPU models are preferred for use on AMD hosts. Administrators / applications are recommended to use the CPU model that matches the generation of the host CPUs in use. In a deployment with a mixture of host CPU models between machines, if live migration compatibility is required, use the newest CPU model that is compatible across all desired hosts.

EPYC
EPYC-IBPB
AMD EPYC Processor (2017)
Opteron_G5
AMD Opteron 63xx class CPU (2012)
Opteron_G4
AMD Opteron 62xx class CPU (2011)
Opteron_G3
AMD Opteron 23xx (Gen 3 Class Opteron, 2009)
Opteron_G2
AMD Opteron 22xx (Gen 2 Class Opteron, 2006)
Opteron_G1
AMD Opteron 240 (Gen 1 Class Opteron, 2004)

Important CPU features for AMD x86 hosts

The following are important CPU features that should be used on AMD x86 hosts, when available in the host CPU. Some of them require explicit configuration to enable, as they are not included by default in some, or all, of the named CPU models listed above. In general all of these features are included if using “Host passthrough” or “Host model”.

ibpb
Required to enable the Spectre (CVE-2017-5753 and CVE-2017-5715) fix, in cases where retpolines are not sufficient. Included by default in AMD CPU models with -IBPB suffix. Must be explicitly turned on for AMD CPU models without -IBPB suffix. Requires the host CPU microcode to support this feature before it can be used for guest CPUs.
virt-ssbd
Required to enable the CVE-2018-3639 fix. Not included by default in any AMD CPU model. Must be explicitly turned on for all AMD CPU models. This should be provided to guests, even if amd-ssbd is also provided, for maximum guest compatibility. Note that for some QEMU / libvirt versions, this must be force enabled when using “Host model”, because this is a virtual feature that doesn’t exist in the physical host CPUs.
amd-ssbd
Required to enable the CVE-2018-3639 fix. Not included by default in any AMD CPU model. Must be explicitly turned on for all AMD CPU models. This provides higher performance than virt-ssbd, so it should be exposed to guests whenever available in the host. virt-ssbd should nonetheless also be exposed for maximum guest compatibility, as some kernels only know about virt-ssbd.
amd-no-ssb
Recommended to indicate the host is not vulnerable to CVE-2018-3639. Not included by default in any AMD CPU model. Future hardware generations of CPUs will not be vulnerable to CVE-2018-3639, and thus the guest should be told not to enable its mitigations, by exposing amd-no-ssb. This is mutually exclusive with virt-ssbd and amd-ssbd.
pdpe1gb
Recommended to allow guest OS to use 1GB size pages. Not included by default in any AMD CPU model. Should be explicitly turned on for all AMD CPU models. Note that not all CPU hardware will support this feature.
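Similarly, a hedged example for an AMD host (again, the model choice is only illustrative):

   $ qemu-system-x86_64 -cpu EPYC,+ibpb,+virt-ssbd,+pdpe1gb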

Default x86 CPU models

The default QEMU CPU models are designed such that they can run on all hosts. If an application does not wish to perform any host compatibility checks before launching guests, the default is guaranteed to work.

The default CPU models will, however, leave the guest OS vulnerable to various CPU hardware flaws, so their use is strongly discouraged. Applications should follow the earlier guidance to setup a better CPU configuration, with host passthrough recommended if live migration is not needed.

qemu32
qemu64
QEMU Virtual CPU version 2.5+ (32 & 64 bit variants). qemu64 is used for x86_64 guests and qemu32 is used for i686 guests, when no -cpu argument is given to QEMU, or no <cpu> is provided in libvirt XML.

Other non-recommended x86 CPUs

The following CPU models are compatible with most AMD and Intel x86 hosts, but their usage is discouraged, as they expose a very limited featureset, which prevents guests from having optimal performance.

kvm32
kvm64
Common KVM processor (32 & 64 bit variants). Legacy models just for historical compatibility with ancient QEMU versions.
486
athlon
phenom
coreduo
core2duo
n270
pentium
pentium2
pentium3
Various very old x86 CPU models, mostly predating the introduction of hardware assisted virtualization, that should thus not be required for running virtual machines.

Syntax for configuring CPU models

The examples below illustrate how to configure the various CPU models / features in QEMU and libvirt.

QEMU command line

Host passthrough
   $ qemu-system-x86_64 -cpu host

With feature customization:

   $ qemu-system-x86_64 -cpu host,-vmx,...
Named CPU models
   $ qemu-system-x86_64 -cpu Westmere

With feature customization:

   $ qemu-system-x86_64 -cpu Westmere,+pcid,...

Libvirt guest XML

Host passthrough
   <cpu mode='host-passthrough'/>

With feature customization:

   <cpu mode='host-passthrough'>
       <feature name="vmx" policy="disable"/>
       ...
   </cpu>
Host model
   <cpu mode='host-model'/>

With feature customization:

   <cpu mode='host-model'>
       <feature name="vmx" policy="disable"/>
       ...
   </cpu>
Named model
   <cpu mode='custom'>
       <model>Westmere</model>
   </cpu>

With feature customization:

   <cpu mode='custom'>
       <model>Westmere</model>
       <feature name="pcid" policy="require"/>
       ...
   </cpu>

 

by Daniel Berrange at June 29, 2018 12:49 PM

KVM on Z

Getting Started: Alpine Linux Instructions added


Instructions for Alpine Linux were added to the Getting Started with KVM on Z series.
See here for the actual page.

by Stefan Raspl (noreply@blogger.com) at June 29, 2018 09:58 AM

June 21, 2018

QEMU project

QEMU TCG Tests

Ever since I started working on QEMU, a small directory called tests/tcg has been in a perpetually broken state. It contains tests that exercise QEMU’s ability to work across architectures using the power of the Tiny Code Generator. However as these tests needed to be compiled for the guest architectures and not the host architecture—this is known as cross-compiling—most developers never ran them. As the tests were hardly ever built inevitably a certain amount of bit-rot set in.

Cross Compilers

In the old days, cross-compilation setups were almost all hand-crafted affairs which involved building versions of binutils, gcc and a basic libc. If you couldn’t get someone to give you a pre-built tarball, it was something you laboured through once and hopefully never had to touch again. There were even dedicated scripts like crosstool-ng which attempted to make the process of patching and configuring your toolchain easier.

While the distributions have improved their support for cross compilers over the years, there are still plenty of variations in how they are deployed. It is hard for a project like QEMU, which has to build on a wide range of operating systems and architectures, to seamlessly use any given distribution’s compiler setup. However for those with cross compilers at hand configure now accepts two additional flags:

--cross-cc-$(ARCH)
--cross-cc-flags-$(ARCH)

With a compiler specified for each guest architecture you want to test the build system can now build and run the tests. For developers that don’t have cross compilers around, they can take advantage of QEMU’s docker images.
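For example, a sketch of such a configure invocation (the compiler name depends on your distribution's cross toolchain packages):

./configure --target-list=aarch64-linux-user \
    --cross-cc-aarch64=aarch64-linux-gnu-gcc

If the compiler needs extra options, they can be passed via --cross-cc-flags-aarch64.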

Enter Docker Containers

If you work in IT you would be hard pressed not to have noticed the hype around Docker and the concept of containerisation over the last few years. Put simply containers allow you to define a known working set of software that gets run in an isolated environment for a given task. While this has many uses for QEMU it allows us to define build environments that any developer can run without having to mess around with their preferred host setup.

Over the last few years QEMU’s build system has been expanding the number of docker images it supports. Most of this has been in service of our CI testing such as Patchew and Shippable but any developer with a docker setup can run the exact same images. For example if you want to check your patches won’t break when compiled on a 32 bit ARM system you can run:

make docker-test-build@debian-armhf-cross J=n

instead of tracking down a piece of ARM hardware to actually build on. Run make docker in your source tree to see the range of builds and tests it can support.

make check-tcg

With the latest work merged into master we can now take advantage of either hand-configured or Docker-based cross compilers to build test cases for TCG again. To run the TCG tests after you have built QEMU:

make check-tcg

and the build system will build and run all the tests it can for your configured targets.

Rules for tests/tcg

So now that we have the infrastructure in place, what rules need to be followed to add new tests?

Well the first thing to note is currently all the tests are for the linux-user variant of QEMU. This means the tests are all currently user-space tests that have access to the Linux syscall ABI.

Another thing to note is the tests are separate from the rest of the QEMU test infrastructure. To keep things simple they are compiled as standalone “static” binaries. As the cross-compilation setup can be quite rudimentary for some of the rarer architectures we only compile against a standard libc. There is no support for linking to other libraries like for example glib. Thread and maths support is part of glibc so shouldn’t be a problem.

Finally when writing new tests consider if it really is architecture specific or can be added to tests/tcg/multiarch. The multiarch tests are re-built for every supported architecture and should be the default place for anything that tests syscalls or other common parts of the code base.

What’s next

My hope with this work is we can start adding more tests to systematically defend functionality in linux-user. In fact I hope the first port of call to reproducing a crash would be writing a test case that can be added to our growing library of tests.

Another thing that needs sorting out is getting toolchains for all of the less common architectures. The current work relies heavily on the excellent work of the Debian toolchain team in making multiarch aware cross compilers available in their distribution. However QEMU supports a lot more architectures than Debian does, some only as system emulations. In principle supporting them is as easy as adding another docker recipe but it might be these recipes end up having to compile the compilers from source.

The tests/tcg directory still contains a number of source files we don’t build.

The cris and openrisc directories contain user-space tests which just need the support of a toolchain and the relevant Makefile plumbing to be added.

The lm32, mips and xtensa targets have a set of tests that need a system emulator. Aside from adding the compilers as docker images some additional work is needed to handle the differences between plain linux-user tests which can simply return an exit code to getting the results from a qemu-system emulation. Some architectures have semi-hosting support already for this while others report their test status over a simple serial link which will need to be parsed and handled in custom versions of the run-%: rule.

by Alex Bennée at June 21, 2018 02:30 PM

June 10, 2018

KVM on Z

KVM at Share St.Louis 2018


Yes, we will be at SHARE in St. Louis this August!
See our sessions in the Linux and VM track.


by Stefan Raspl (noreply@blogger.com) at June 10, 2018 09:17 PM

May 31, 2018

QEMU project

QEMU’s new -nic command line option

If you used QEMU in the past, you are probably familiar with the -net command line option, which can be used to configure a network connection for the guest, or with the -netdev option, which configures a network back-end. Yet, QEMU v2.12 introduces a third way to configure NICs, the -nic option.

The ChangeLog of QEMU v2.12 says that -nic can “quickly create a network front-end (emulated NIC) and a host back-end”. But why did QEMU need yet another way to configure the network, and how does it compare with -net and -netdev? To answer these questions, we need to look at the model behind network virtualization in QEMU.

As hinted by the ChangeLog entry, a network interface consists of two separate entities:

  1. The emulated hardware that the guest sees, i.e. the so-called NIC (network interface controller). On systems that support PCI cards, these typically could be an e1000 network card, a rtl8139 network card or a virtio-net device. This entity is also called the “front-end”.

  2. The network back-end on the host side, i.e. the interface that QEMU uses to exchange network packets with the outside (like other QEMU instances or other real hosts in your intranet or the internet). The common host back-ends are the “user” (a.k.a. SLIRP) back-end which provides access to the host’s network via NAT, the “tap” back-end which allows the guest to directly access the host’s network, or the “socket” back-end which can be used to connect multiple QEMU instances to simulate a shared network for their guests.

Based on this, it is already possible to define the most obvious difference between -net, -netdev and -nic: the -net option can create either a front-end or a back-end (and also does other things); -netdev can only create a back-end; while a single occurrence of -nic will create both a front-end and a back-end. But for the non-obvious differences, we also need to have a detailed look at the -net and -netdev options first …

The legacy -net option

QEMU’s initial way of configuring the network for the guest was the -net option. The emulated NIC hardware can be chosen with the -net nic,model=xyz,... parameter, and the host back-end with the -net <backend>,... parameter (e.g. -net user for the SLIRP back-end). However, the emulated NIC and the host back-end are not directly connected. They are rather both connected to an emulated hub (called “vlan” in older versions of QEMU). Therefore, if you start QEMU with -net nic,model=e1000 -net user -net nic,model=virtio -net tap for example, you get a setup where all the front-ends and back-ends are connected together via a hub:

Networking with -net

That means the e1000 NIC also gets the network traffic from the virtio-net NIC and both host back-ends… this is probably not what the users expected; it’s more likely that they wanted two separate networks in the guest, one for each NIC. Because -net always connects its NIC to a hub, you would have to tell QEMU to use two separate hubs, using the “vlan” parameter. For example -net nic,model=e1000,vlan=0 -net user,vlan=0 -net nic,model=virtio,vlan=1 -net tap,vlan=1 moves the virtio-net NIC and the “tap” back-end to a second hub (with ID #1).

Please note that the “vlan” parameter will be dropped in QEMU v3.0 since the term was rather confusing (it’s not related to IEEE 802.1Q for example) and caused a lot of misconfigurations in the past. Additional hubs can still be instantiated with -netdev (or -nic) and the special “hubport” back-end. The -net option itself will still stay around since it is still useful if you only want to use one front-end and one back-end together, or if you want to tunnel the traffic of multiple NICs through one back-end only (something like -net nic,model=e1000 -net nic,model=virtio -net l2tpv3,... for example).

The modern -netdev option

Beside the confusing “vlan” parameter of the -net option, there is one more major drawback with -net: the emulated hub between the NIC and the back-end gets in the way when the NIC front-end has to work closely together with the host back-end. For example, vhost acceleration cannot be enabled if you create a virtio-net device with -net nic,model=virtio.

To configure a network connection where the emulated NIC is directly connected to a host network back-end, without a hub in between, the well-established solution is to use the -netdev option for the back-end, together with -device for the front-end. Assuming that you want to configure the same devices as in the -net example above, you could use -netdev user,id=n1 -device e1000,netdev=n1 -netdev tap,id=n2 -device virtio-net,netdev=n2. This will give you straight 1:1 connections between the NICs and the host back-ends:

Networking with -netdev

Note that you can also still connect the devices to a hub with the special -netdev hubport back-end, but in most of the normal use cases, the use of a hub is not required anymore.

Now while -netdev together with -device provide a very flexible and extensive way to configure a network connection, there are still two drawbacks with this option pair which prevented us from deprecating the legacy -net option completely:

  1. The -device option can only be used for pluggable NICs. Boards (e.g. embedded boards) which feature an on-board NIC cannot be configured with -device yet, so -net nic,netdev=<id> must be used here instead.

  2. In some cases, the -net option is easier to use (less to type). For example, assuming you want to set up a “tap” network connection and your default scripts /etc/qemu-ifup and -down are already in place, it’s enough to type -net nic -net tap to start your guest. To do the same with -netdev, you always have to specify an ID here, too, for example like this: -netdev tap,id=n1 -device e1000,netdev=n1.

The new -nic option

Looking at the disadvantages listed above, users could benefit from a convenience option that:

  • is easier to use (and shorter to type) than -netdev <backend>,id=<id> -device <dev>,netdev=<id>
  • can be used to configure on-board / non-pluggable NICs, too
  • does not place a hub between the NIC and the host back-end.

This is where the new -nic option kicks in: this option can be used to configure both the guest’s NIC hardware and the host back-end in one go. For example, instead of -netdev tap,id=n1 -device e1000,netdev=n1 you can simply type -nic tap,model=e1000. If you don’t care about the exact NIC model type, you can even omit the model=... parameter and type -nic tap. This is even shorter and more convenient than the previous shortest way of typing -net nic -net tap. To get a list of NIC models that you can use with this option, you can simply run QEMU with -nic model=help.

Beside being easier to use, the -nic option can be used to configure on-board NICs, too (just like the -net option). For machines that have on-board NICs, the first -nic option configures the first on-board NIC, the second -nic option configures the second on-board NIC, and so forth.
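Putting this together, a hedged example that gives a guest two independent networks, one via SLIRP and one via tap (the disk image name is a placeholder, and the tap back-end still expects your ifup/ifdown scripts to be in place):

qemu-system-x86_64 -m 2G \
    -nic user,model=e1000 \
    -nic tap,model=virtio-net-pci \
    -drive file=guest.qcow2,format=qcow2

Each -nic creates its own front-end/back-end pair, so no hub is involved and the two NICs stay separate inside the guest.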

Conclusion

  • The new -nic option gives you an easy and quick way to configure the networking of your guest.
  • For more detailed configuration, e.g. when you need to tweak the details of the emulated NIC hardware, you can use -device together with -netdev.
  • The -net option should be avoided these days unless you really want to configure a set-up with a hub between the front-ends and back-ends.

by Thomas Huth at May 31, 2018 07:50 AM

May 24, 2018

Gerd Hoffmann

Fedora 28 images uploaded

Fedora 28 was released a few weeks ago. New Fedora 28 images are finally uploaded now.

There are no raspberry pi images any more. Just use the standard Fedora arm images, they work just fine for both arm (rpi 2) and aarch64 (rpi 3).

The efi images are for qemu. Some use grub2 as bootloader, some use systemd-boot. The file name indicates which is which. The efi images can also be booted as container, using systemd-nspawn --boot --image <file>, but you have to convert them to raw first as systemd-nspawn can't handle qcow2.

As usual the images don't have a root password. You have to set one using virt-customize -a <image> --root-password password:<secret>, otherwise you can't login after boot.
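For example (the image file name is a placeholder):

virt-customize -a fedora-28-image.qcow2 --root-password password:<secret>

qemu-img convert -O raw fedora-28-image.qcow2 fedora-28-image.raw
sudo systemd-nspawn --boot --image fedora-28-image.raw

The first command sets the root password; the last two convert an efi image to raw and boot it as a container.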

The images have been created with imagefish.

by Gerd Hoffmann at May 24, 2018 10:00 PM

May 17, 2018

KVM on Z

Knowledge Series: Managing KVM on IBM Z with oVirt

See here for a new entry in our "knowledge" series, providing step-by-step instructions on how to include IBM Z hosts in an oVirt data center.

by Stefan Raspl (noreply@blogger.com) at May 17, 2018 02:51 PM

May 14, 2018

KVM on Z

Getting Started: RHEL 7.5 Instructions added

Instructions for RHEL7.5 were added to the Getting Started with KVM on Z series.
See here for the actual page.

by Stefan Raspl (noreply@blogger.com) at May 14, 2018 12:40 PM

Getting Started: Instructions for Ubuntu 16.04 to 17.10 added

Instructions for Ubuntu 16.04 to 17.10 were added to the Getting Started with KVM on Z series.
See here for the entry page.

by Stefan Raspl (noreply@blogger.com) at May 14, 2018 12:34 PM

May 13, 2018

Gerd Hoffmann

Welcome to Jekyll!

Switched my blog from wordpress to jekyll.

Automatic import handled most of the content. It didn’t cover syntax highlighting (an extra wordpress plugin) though, so that needed some manual intervention. Also copying over the (few) images was a manual process.

Configuration isn’t imported automatically, but that is just editing a few lines in _config.yml. Permalinks can be configured to be compatible with wordpress without much trouble, so all the article links stay valid.

My blog is stored in git now. Everything is just static pages. No database needed. No user data stored anywhere.

I’m pretty pleased with the result.

by Gerd Hoffmann at May 13, 2018 10:00 PM

May 04, 2018

KVM on Z

Ubuntu 18.04 released

Ubuntu Server 18.04 LTS is out! Support for IBM Z is available here.
It ships with an updated virtualization stack; as is obvious from the package versions, support for IBM z14 is readily in place.

Since this is a so-called LTS (Long Term Support) release providing approx. 5 years of support (in contrast to the usual 9 months of non-LTS releases), it is of particular interest to Ubuntu users who want a stable environment for production deployments.

by Stefan Raspl (noreply@blogger.com) at May 04, 2018 08:47 PM

May 03, 2018

Cornelia Huck

A vfio-ccw primer

While basic support for vfio-ccw has been included in Linux and QEMU for some time, work has recently started to ramp up again and it seems like a good time to give some basic overview.

Why vfio-ccw?

Historically, QEMU on s390x presented paravirtualized virtio devices to the guest; first, via a protocol inspired by lguest, later, as emulated channel devices. This satisfies most needs (you get block devices, network devices, a console device, and lots more), but the device types are different from those found on LPARs or z/VM guests, and you may have a need to use e.g. a DASD directly.

For that reason, we want to do the same thing as on other platforms: pass a host device to the guest directly via vfio.

How does this work?

vfio-ccw is using the vfio mediated device framework; see the kernel documentation for an overview.

In a nutshell: The subchannel to be passed to the guest is unbound from its normal host driver (in this case, the I/O subchannel driver) and bound to the vfio-ccw driver. Any I/O request is intercepted and executed on the real device, and interrupts from the real device are relayed back to the guest.

Why subchannels and not ccw devices?

The initial attempt to implement this actually worked at the ccw device level. However, this means that the Linux common I/O layer in the host will perform various actions like handling of channel paths - which may interfere with what the guest is trying to do. Therefore, it seemed like a better idea to keep out of the way as much as possible and just implement a minimal subchannel driver that does not do much beyond what the guest actually triggered itself.

How is an actual I/O request processed?

When the guest is ready to use a channel device, it will issue I/O requests via channel programs (see here for an explanation on how that works and what things like scsw and orb mean.) The channel I/O instructions are mandatory SIE intercepts, so the host will get control for any START SUBCHANNEL the guest issues. QEMU is in charge of interpretation of channel I/O instructions, so it will process the ssch as a request to a pass-through device.

All channel I/O instructions are privileged, which means that the host kernel now needs to get involved again. QEMU does so by writing to an I/O region: the scsw (which contains, amongst other things, the fctl field specifying the start function) and the orb (pointing to the channel program). The host kernel driver now has enough information to actually issue the request on the real device after translating the ccw chain and its addresses to host addresses (involving pinning, idals and other things I will not explain here for brevity.)

After the device has processed the I/O request, it will make the subchannel status pending and generate an I/O interrupt. The host kernel driver collects the state and makes it available via the same I/O region (the IRB field), and afterwards triggers QEMU via an eventfd. QEMU now has all information needed to update its internal structures for the devices so that the guest can obtain the information related to the I/O request.

Isn't that all a bit too synchronous?

Yes, it is. Channel I/O is supposed to be asynchronous (give the device an I/O request, collect status later), but our implementation isn't yet. Why? Short answer: It is hard, and we wanted something to get us going. But this is on the list of things to be worked on.

Where is the IOMMU for this?

Due to the way channel programs work, we don't have a real IOMMU.

Does this cover everything supported by the architecture?

Not yet. Channel program wise, we support the format Linux drivers use. Also, we're emulating things like HALT SUBCHANNEL and CLEAR SUBCHANNEL in QEMU, while they really should be handed through to the device (support for this is in the works).

On the whole, you should be able to pass an ECKD DASD to a Linux guest without (known) issues.

How can I try this out?

Recent QEMU and Linux versions should have everything you need in the host; see this wiki entry for details. As a guest, any guest that can run under KVM should be fine.
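A rough sketch of the host-side steps, following the kernel's vfio-ccw documentation (the subchannel number 0.0.17aa below is an example; use the one lscss reports for your DASD):

# detach the subchannel from the host I/O subchannel driver
echo 0.0.17aa > /sys/bus/css/devices/0.0.17aa/driver/unbind
# hand it to the vfio_ccw driver
echo 0.0.17aa > /sys/bus/css/drivers/vfio_ccw/bind
# create a mediated device for it
UUID=$(uuidgen)
echo $UUID > /sys/bus/css/devices/0.0.17aa/mdev_supported_types/vfio_ccw-io/create

The mediated device can then be passed to QEMU with something like -device vfio-ccw,sysfsdev=/sys/bus/mdev/devices/$UUID.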

What's the deal with that "unrestricted cssids" thing?

If you look at this older article, you'll notice the 'fe' value for the cssid of virtio devices (with the promise to explain it later... which I sadly never did). The basic idea at the time was to put 'virtual' devices like virtio and 'non-virtual' devices like vfio-ccw into different channel subsystem images, so that e.g. channel paths (which are per channel subsystem image) don't clash. In other words, 'virtual' and 'non-virtual' devices (and channel paths) would have different cssids (the first part of their identifiers).

This sounded like a good idea at the time; however, there's a catch: A guest operating system will by default only see the devices in the default channel subsystem image. To see all of them, it needs to explicitly enable the Multiple Channel Subsystems Extended (MCSS-E) feature - and I do not know of any operating system that has done so as of today (not very surprising, as QEMU is the only implementation of MCSS-E I'm aware of).

To work around this, we originally introduced the 's390-squash-mcss' parameter to QEMU, which would put all devices into the default channel subsystem image. But as MCSS-E support is unlikely to arrive in any guest operating system anytime soon, we agreed to rather drop the restriction of virtual devices being in css fe and non-virtual devices everywhere else (since QEMU 2.12).

What are the plans for the future?

Several things are already actively worked on, while others may come up later.
  • Initial libvirt support for vfio-ccw has been posted here.
  • Reworking the Linux host driver to make things more asynchronous and to support halt/clear is in progress.
  • Improvements in channel path handling (for example, to enable the guest to see path availability changes) are also in progress. We may need to consider things like dasd reserve/release as well.

by Cornelia Huck (noreply@blogger.com) at May 03, 2018 01:53 PM

KVM on Z

QEMU v2.12 released

QEMU v2.12 is out. Here are the highlights from a KVM on Z perspective:
  • Added support for an interactive bootloader. As always, we strongly recommend using the existing support in libvirt.
    To enable/disable, add the following element to your guest definition:

       <os>
         <bootmenu enable='yes|no' timeout='n'/>
         ...
       </os>


    The timeout parameter specifies a timeout in milliseconds after which the default entry is chosen.
    Alternatively, set the loadparm attribute to PROMPT in the respective disk's <boot> element to enable the boot menu without a timeout:

       <disk ...>
         <boot order='1' loadparm='PROMPT'/>
         ...
       </disk>


    Example:
    To enable the boot menu with a 32-second timeout, the libvirt domain XML
    looks as follows:

       <domain type='kvm'>
         <os>
           <bootmenu enable='yes' timeout='32000'/>
           ...
         </os>
       </domain>
  • Exposure of guest crash information: When a guest is started using libvirt and crashes due to disabled wait, wrong interrupts or a program check loop, libvirt will print the information to the guest’s log, typically located at /var/log/libvirt/qemu.
    E.g. a crash due to a disabled wait results in an entry as follows:

       s390: psw-mask='0xXXXXXXXXXXXXXXXX', psw-addr='0xXXXXXXXXXXXXXXXX',
             crash reason: disabled wait


    Requires libvirt v4.2.
  • Added support for guests with more than 8TB of memory.

by Stefan Raspl (noreply@blogger.com) at May 03, 2018 09:49 AM

May 02, 2018

Richard Jones

Dockerfile for running libguestfs, virt-tools and virt-v2v

FROM fedora
RUN dnf install -y libguestfs libguestfs-tools-c virt-v2v \
                   libvirt-daemon libvirt-daemon-config-network

# https://bugzilla.redhat.com/show_bug.cgi?id=1045069
RUN useradd -ms /bin/bash v2v
USER v2v
WORKDIR /home/v2v

# This is required for virt-v2v because neither systemd nor
# root libvirtd runs, and therefore there is no virbr0, and
# therefore virt-v2v cannot set up the network through libvirt.
ENV LIBGUESTFS_BACKEND direct
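A possible way to build and smoke-test the resulting image (the tag is arbitrary):

$ docker build -t virt-v2v .
$ docker run -it virt-v2v virt-v2v --version

For a real conversion you would additionally bind-mount the directory containing the source disks into the container (for example with docker run -v).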

by rich at May 02, 2018 03:35 PM

April 25, 2018

Peter Maydell

Debian on QEMU’s Raspberry Pi 3 model

For the QEMU 2.12 release we added support for a model of the Raspberry Pi 3 board (thanks to everybody involved in developing and upstreaming that code). The model is sufficient to boot a Debian image, so I wanted to write up how to do that.

Things to know before you start

Before I start, some warnings about the current state of the QEMU emulation of this board:

  • We don’t emulate the boot rom, so QEMU will not automatically boot from an SD card image. You need to manually extract the kernel, initrd and device tree blob from the SD image first. I’ll talk about how to do that below.
  • We don’t have an emulation of the BCM2835 USB controller. This means that there is no networking support, because on the raspi devices the ethernet hangs off the USB controller.
  • Our raspi3 model will only boot AArch64 (64-bit) kernels. If you want to boot a 32-bit kernel you should use the “raspi2” board model.
  • The QEMU model is missing models of some devices, and others are guesswork due to a lack of documentation of the hardware; so although the kernel I tested here will boot, it’s quite possible that other kernels may fail.

You’ll need the following things on your host system:

  • QEMU version 2.12 or better
  • libguestfs (on Debian and Ubuntu, install the libguestfs-tools package)

Getting the image

I’m using the unofficial preview images described on the Debian wiki.

$ wget https://people.debian.org/~stapelberg/raspberrypi3/2018-01-08/2018-01-08-raspberry-pi-3-buster-PREVIEW.img.xz
$ xz -d 2018-01-08-raspberry-pi-3-buster-PREVIEW.img.xz

Extracting the guest boot partition contents

I use libguestfs to extract files from the guest SD card image. There are other ways to do this but I think libguestfs is the easiest to use. First, check that libguestfs is working on your system:

$ virt-filesystems -a 2018-01-08-raspberry-pi-3-buster-PREVIEW.img
/dev/sda1
/dev/sda2

If this doesn’t work, then you should sort that out first. A couple of common reasons I’ve seen:

  • if you’re on Ubuntu then your kernels in /boot are installed not-world-readable; you can fix this with sudo chmod 644 /boot/vmlinuz*
  • if you’re running Virtualbox on the same host it will interfere with libguestfs’s attempt to run KVM; you can fix that by exiting Virtualbox

Now you can ask libguestfs to extract the contents of the boot partition:

$ mkdir bootpart
$ guestfish --ro -a 2018-01-08-raspberry-pi-3-buster-PREVIEW.img -m /dev/sda1

Then at the guestfish prompt type:

copy-out / bootpart/
quit

This should have copied various files into the bootpart/ subdirectory.

Run the guest image

You should now be able to run the guest image:

$ qemu-system-aarch64 \
  -kernel bootpart/vmlinuz-4.14.0-3-arm64 \
  -initrd bootpart/initrd.img-4.14.0-3-arm64 \
  -dtb bootpart/bcm2837-rpi-3-b.dtb \
  -M raspi3 -m 1024 \
  -serial stdio \
  -append "rw earlycon=pl011,0x3f201000 console=ttyAMA0 loglevel=8 root=/dev/mmcblk0p2 fsck.repair=yes net.ifnames=0 rootwait memtest=1" \
  -drive file=2018-01-08-raspberry-pi-3-buster-PREVIEW.img,format=raw,if=sd

and have it boot to a login prompt (the root password for this Debian image is “raspberry”).

There will be several WARNING logs and backtraces printed by the kernel as it starts; these will have a backtrace like this:

[  145.157957] [] uart_get_baud_rate+0xe4/0x188
[  145.158349] [] pl011_set_termios+0x60/0x348
[  145.158733] [] uart_change_speed.isra.3+0x50/0x130
[  145.159147] [] uart_set_termios+0x7c/0x180
[  145.159570] [] tty_set_termios+0x168/0x200
[  145.159976] [] set_termios+0x2b0/0x338
[  145.160647] [] tty_mode_ioctl+0x358/0x590
[  145.161127] [] n_tty_ioctl_helper+0x54/0x168
[  145.161521] [] n_tty_ioctl+0xd4/0x1a0
[  145.161883] [] tty_ioctl+0x150/0xac0
[  145.162255] [] do_vfs_ioctl+0xc4/0x768
[  145.162620] [] SyS_ioctl+0x8c/0xa8

These are ugly but harmless. (The underlying cause is that QEMU doesn’t implement the undocumented ‘cprman’ clock control hardware, and so Linux thinks that the UART is running at a zero baud rate and complains.)

by pm215 at April 25, 2018 08:07 AM

QEMU project

QEMU version 2.12.0 released

We’d like to announce the availability of the QEMU 2.12.0 release. This release contains 2700+ commits from 204 authors.

You can grab the tarball from our download page. The full list of changes is available in the Wiki.

Highlights include:

  • Spectre/Meltdown mitigation support for x86/pseries/s390 guests. For more details see: https://www.qemu.org/2018/02/14/qemu-2-11-1-and-spectre-update/
  • Numerous block support improvements, including support for directly interacting with userspace NVMe driver, and general improvements to NBD server/client including more efficient reads of sparse files
  • Networking support for VMWare paravirtualized RDMA device (RDMA HCA and Soft-RoCE supported), CAN bus support via Linux SocketCAN and SJA1000-based PCI interfaces, and general improvements for dual-stack IPv4/IPv6 environments
  • GUI security/bug fixes, dmabufs support for GTK/Spice.
  • Better IPMI support for Platform Events and SEL logging in internal BMC emulation
  • SMBIOS support for “OEM Strings”, which can be used for automating guest image activation without relying on network-based querying
  • Disk cache information via virtio-balloon
  • ARM: AArch64 new instructions for FCMA/RDM and SIMD/FP16/crypto/complex number extensions
  • ARM: initial support for Raspberry Pi 3 machine type
  • ARM: Cortex-M33/Armv8-M emulation via new mps2-an505 board and many other improvements for M profile emulation
  • HPPA: support for full machine emulation (hppa-softmmu)
  • PowerPC: PPC4xx emulation improvements, including I2C bus support
  • PowerPC: new Sam460ex machine type
  • PowerPC: significant TCG performance improvements
  • PowerPC: pseries: support for Spectre/Meltdown mitigations
  • RISC-V: new RISC-V target via “spike_v1.9.1”, “spike_v1.10”, and “virt” machine types
  • s390: non-virtual devices no longer require dedicated channel subsystem and guest support for multiple CSSs
  • s390: general PCI improvements, MSI-X support for virtio-pci devices
  • s390: improved TCG emulation support
  • s390: KVM support for systems larger than 7.999TB
  • SPARC: sun4u power device emulation
  • SPARC: improved trace-event support and emulation/debug fixes
  • Tricore: new instruction variants for JEQ/JNE and 64-bit MOV
  • x86: Intel IOMMU support for 48-bit addresses
  • Xtensa: backend now uses libisa for instruction decoding/disassembly
  • Xtensa: multi-threaded TCG support and noMMU configuration variants
  • and lots more…

Thank you to everyone involved!

April 25, 2018 03:30 AM

April 24, 2018

Yoni Bettan

VirtIO

My name is Yonathan Bettan and I work at Red Hat in the KVM virtualization team.

This blog has two main purposes: the first is to give you an idea of what VirtIO is and why we should use it, while the second is to serve as a step-by-step guide describing how to write a VirtIO device from zero with some code examples.

In addition, I will write a simple, documented VirtIO example device that you will be able to find in the QEMU project. Clone the https://github.com/qemu/qemu.git repo for the full project.

Motivation

Let us start with a NIC (Network Interface Controller) as an example to better understand virtualization. A NIC is responsible for transmitting and receiving packets through the network. The received packets are written into memory and the packets to be sent are copied from the memory to the NIC for transmission, with CPU intervention or without (DMA). When the NIC finishes a specific task it sends an interrupt to the OS.

If we want a physical machine to have a network connection, we have to buy a NIC; by the same logic, if we want a virtual machine (VM) to have a network connection, we need to supply a virtual NIC.

One possible solution is to make the hypervisor fully emulate the NIC according to its spec – Virtual device.

When a packet is sent by the guest OS it is sent to the virtual NIC (vNIC). For each byte of data we will get:

A virtual interrupt will be generated ==> a VMexit will occur ==> the hypervisor will send the data to the physical NIC (pNIC) ==> the pNIC will interrupt the host OS when it finishes the transaction ==> the hypervisor will finally interrupt the guest OS to notify it that the transaction is finished.

Here we can see a function of a NIC driver whose purpose is to read data from the device into a buffer.

NOTE: even if we use MMIO instead of PIO we still have a limitation on the MMIO write size, and each MMIO write generates a VMexit, so we may still have multiple VMexits.

The main benefit, in this case, is that the OS stays unchanged, because the virtual device acts like a physical device, so the already-written NIC driver does the job correctly on the emulated device. On the other hand, it works slowly, since each access to the vNIC generates a VMexit for each byte (as a pNIC would have done); but in reality this is not a real device (only code variables), so we don't need to VMexit on each byte; instead we can just write the whole buffer and generate a single VMexit.

Another possible solution is to give the guest direct access to the pNIC – Device assignment.

When a packet is sent by the guest OS it is sent to the vNIC. For each byte of data we will get:

The data is sent directly to the pNIC without hypervisor intervention ==> the pNIC will interrupt the guest OS directly when it finishes the transaction.

Now we have the maximum performance the HW can supply, but we need a separate pNIC for each guest and another one for the host; this becomes expensive.

The tradeoff between Virtual devices and Device assignment is Paravirtual devices and their protocol, VirtIO.

This case is quite similar to the Virtual device case except for two facts: the first is that the emulated device doesn't pretend to act like a real device (no need to send virtual interrupts for each byte written, only a single virtual interrupt once the whole buffer is written), and the second is that we now have to write a new driver, since the original driver no longer fits the emulated HW.

We can now see the same function in the new NIC driver.

Another reason to use VirtIO devices is that Linux supports multiple hypervisors such as KVM, Xen, VMware etc., and therefore has drivers for each one of them. VirtIO provides driver unification, a uniform ABI for all those hypervisors. An ABI is an interface at the compiler level and describes how parameters are passed to functions (registers/stack), how interrupts are propagated, etc. VirtIO also provides device check and configuration.

Virtualization VS Paravirtualization

Virtualization:
  • The guest is unaware that it is being virtualized.
  • No changes are required in the OS.
  • The hypervisor must emulate device HW, which leads to low performance.

Paravirtualization:
  • The guest is aware that it is running on a hypervisor (and not on real HW).
  • Requires modification of the OS.
  • The guest and the hypervisor can work cooperatively to make this emulation efficient.

 

Sources

 

by Yonathan Bettan at April 24, 2018 12:26 PM

April 17, 2018

Fabian Deutsch

Running minikube v0.26.0 with CRIO and KVM nesting enabled by default

Probably not worth a post, as it’s mentioned in the readme, but CRIO was recently updated in minikube v0.26.0 which now makes it work like a charm.

When updating to 0.26 make sure to update the minikube binary, but also the docker-machine-driver-kvm2 binary.

Like in the past it is possible to switch to CRIO using

$ minikube start --container-runtime=cri-o
Starting local Kubernetes v1.10.0 cluster...
Starting VM...
Getting VM IP address...
Moving files into cluster...
Setting up certs...
Connecting to cluster...
Setting up kubeconfig...
Starting cluster components...
Kubectl is now configured to use the cluster.
Loading cached images from config file.
$

However, my favorite launch line is:

minikube start --container-runtime=cri-o --network-plugin=cni --bootstrapper=kubeadm --vm-driver=kvm2

Which will use CRIO as the container runtime, CNI for networking, kubeadm for bringing up kube inside a KVM VM.

April 17, 2018 08:05 AM

April 12, 2018

KVM on Z

White Paper: Exploiting HiperSockets in a KVM Environment Using IP Routing with Linux on Z

Our performance group has published a new white paper titled "Exploiting HiperSockets in a KVM Environment Using IP Routing with Linux on Z".
Abstract:
"The IBM Z platforms provide the HiperSockets technology feature for high-speed communications. This paper documents how to set up and configure KVM virtual machines to use HiperSockets with IP routing capabilities of the TCP/IP stack.
It provides a Network Performance comparison between various network configurations and illustrates how HiperSockets can achieve greater performance for many workload types, across a wide range of data-flow patterns, compared with using an OSA 10GbE card.
"
This white paper is available as .pdf and .html.

by Stefan Raspl (noreply@blogger.com) at April 12, 2018 04:25 PM

April 11, 2018

KVM on Z

RHEL 7.5 with support for KVM on Z available

Red Hat Enterprise Linux 7.5 is out. From the release notes, available here:
Availability across multiple architectures
To further support customer choice in computing architecture, Red Hat Enterprise Linux 7.5 is simultaneously available across all supported architectures, including x86, IBM Power, IBM z Systems, and 64-bit Arm.
Support for IBM Z is available through the kernel-alt package, as indicated earlier here, which provides Linux kernel 4.14. QEMU ships v2.10 via package qemu-kvm-ma, and libvirt is updated to v3.9.0 for all platforms.
Thereby, all IBM z14 features as previously listed here are available.
Check these instructions on how to get started. 

by Stefan Raspl (noreply@blogger.com) at April 11, 2018 04:17 PM

April 09, 2018

Gerd Hoffmann

vgpu display support finally merged upstream

It took more than a year from the first working patches to the upstream merge. But now it's finally done. The linux kernel 4.16 (released on easter weekend) has the kernel-side code needed. The qemu code has been merged too (for gtk and spice user interfaces) and will be in the upcoming 2.12 release which is in code freeze right now. The 2.12 release candidates already have the code, so you can grab one if you don't want to wait for the final release to play with this.

The vgpu code in the intel driver is off by default and must be enabled via a module option. While at it, it also makes sense to load the kvmgt module. So I've dropped a config file with these lines ...

options i915 enable_gvt=1
softdep i915 pre: kvmgt

... into /etc/modprobe.d/. For some reason dracut didn't pick the changes up even after regenerating the initrd. Because of that I've blacklisted the intel driver (rd.driver.blacklist=i915 on the kernel command line) so the driver gets loaded later, after mounting the root filesystem, and modprobe actually sets the parameter.

With that in place you should have a /sys/class/mdev_bus directory with the intel gpu in there. You can create vgpu devices now. Check the mediated device documentation for details.
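Creating a vgpu boils down to picking one of the supported types and writing a UUID into its create attribute, roughly like this (the PCI address and type name are examples; the available types depend on your hardware):

ls /sys/class/mdev_bus/0000:00:02.0/mdev_supported_types/
echo $(uuidgen) > /sys/class/mdev_bus/0000:00:02.0/mdev_supported_types/i915-GVTg_V5_4/create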

One final thing to take care of: Currently using gvt mlocks all guest memory. For that to work, the mlock limit (ulimit -l) must be high enough, otherwise the vgpu will not work correctly and you'll see a scrambled display. The limit can be configured in /etc/security/limits.conf.
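For example, entries along these lines (the user name is a placeholder; values are in kbytes, or 'unlimited'):

youruser  soft  memlock  unlimited
youruser  hard  memlock  unlimited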

Now let's use our new vgpu with qemu:

qemu-system-x86_64 \
     -enable-kvm \
     -m 1G \
     -nodefaults \
     -M graphics=off \
     -serial stdio \
     -display gtk,gl=on \
     -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/UUID,display=on \
     -cdrom /vmdisk/iso/Fedora-Workstation-Live-x86_64-27-1.6.iso

Details on the non-obvious qemu switches:

-nodefaults
Do not create default devices (such as vga and nic).
-M graphics=off
Hint for the firmware that the guest runs without a graphical display. This enables serial console support in seabios. We use this here because the vgpu has no firmware support (i.e. no vgabios), therefore nothing is visible on the display until the i915 kernel module loads.
-display gtk,gl=on
Use gtk display, enable opengl.
-device vfio-pci,sysfsdev=/sys/bus/mdev/devices/UUID,display=on
Add the vgpu to the guest, enable the display. Of course you have to replace UUID with your device.

libvirt support is still being worked on. Most bits are there, but some little details are missing. For example there is no way (yet) to tell libvirt the guest doesn't need an emulated vga device, so you'll end up with two spice windows, one for the emulated vga and one for the vgpu. Other than that, things work in a pretty straightforward way. You need spice with opengl support enabled:

<graphics type='spice'>
  <listen type='none'/>
  <gl enable='yes'/>
</graphics>

And the vgpu must be added of course:

<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci'>
  <source>
    <address uuid='UUID'/>
  </source>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</hostdev>

Then you can start the domain. Use "virt-viewer --attach guest" to connect to the guest. Note that guest and virt-viewer must run on the same machine, sending the vgpu display to a remote machine does not yet work.

by Gerd Hoffmann at April 09, 2018 11:17 AM

March 26, 2018

Cornelia Huck

s390x changes in QEMU 2.12

As QEMU is now in hard freeze for 2.12 (with the final release expected in mid/late April), now is a good point in time to summarize some of the changes that made it into QEMU 2.12 for s390x.

I/O Devices

  • Channel I/O: Any device can now be put into any channel subsystem image, regardless whether it is a virtual device (like virtio-ccw) or a device passed through via vfio-ccw. This obsoletes the s390-squash-mcss option (which was needed to explicitly squash vfio-ccw devices into the default channel subsystem image in order to make it visible to guests not enabling MCSS-E).
  • PCI: Fixes and refactoring, including handling of subregions. This enables usage of virtio-pci devices on s390x (although only if MSI-X is enabled, as s390x depends on it.) Previously, you could add virtio-pci devices on s390x, but they were not usable.
    For more information about PCI, see this blog entry.

Booting and s390-ccw bios

  • Support for an interactive boot menu. Note that this is a bit different than on other architectures (although it hooks into the same infrastructure). The boot menu is written on the (virtual) disk via the 'zipl' program, and these entries need to be parsed and displayed via SCLP.

System Emulation

  • KVM: In case you were short on memory before: You can now run guests with 8 TB or more.
  • KVM: Support for the bpb and ppa15 CPU features (for spectre mitigation). These have been backported to 2.11.1 as well.
  • TCG: Lots of improvements: Implementation of missing instructions, full (non-experimental) SMP support.
  • TCG: Improvements in handling of the STSI instruction (you can look at some information obtained that way via /proc/sysinfo.) Note that a TCG guest reports itself as a KVM guest, rather than an LPAR: In many ways, a TCG guest is closer to KVM, and reporting itself as an LPAR makes the Linux guest code choose an undesired target for its console output by default.
  • TCG: Wire up the zPCI instructions; you can now use virtio-pci devices under TCG.
  • CPU models: Switch the 'qemu' model to a stripped-down z12, adding all features required by kernels on recent distributions. This means that you can now run recent distributions (Fedora 26/27, Ubuntu 18.04, ...) under TCG. Older distributions may not work (older kernels required some features not implemented under TCG), unless they were built for a z900 like Debian stable.

Miscellaneous

  • Support for memory hotplug via SCLP has been removed. This was an odd interface: Unlike on other architectures, the guest could enable 'standby' memory if it had been supplied. Another problem was that this never worked with migration. Old command lines will continue to work, but no 'standby' memory will be available to the guest any more.
    Memory hotplug on s390x will probably come back in the future with an interface that matches better what is done elsewhere, likely via some paravirtualized interface. Support for the SCLP interface might come back in the future as well, implemented in an architecture-specific way that does not try to look like memory hotplug elsewhere.
  • And of course, the usual fixes, cleanups and other improvements.

by Cornelia Huck (noreply@blogger.com) at March 26, 2018 06:14 PM

KVM on Z

SLES12 SP3 Updates


SLES12SP3, released late last year, received a couple of mostly performance and security-related updates in support of IBM z14 and LinuxONE through the maintenance web updates.
In particular:

by Stefan Raspl (noreply@blogger.com) at March 26, 2018 09:19 AM
