Blogging about open source virtualization

News from QEMU, KVM, libvirt, libguestfs, virt-manager and related tools

July 20, 2018

KVM on Z

Knowledge: Use virt-install for KVM Guest Installations

Tired of fiddling with guest XML files and virtual DJing with ISO images? Then our latest entry in our Knowledge Series might come in handy! It also offers access to further Linux distributions on IBM Z!
Read more on how virt-install can help here.
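As a hedged illustration only (the guest name, sizes and ISO path below are placeholders, and IBM Z installs may prefer --location over --cdrom), a minimal virt-install invocation typically looks like this:

$ virt-install --name demo-guest --memory 2048 --vcpus 2 \
    --disk size=10 --cdrom /path/to/install.iso \
    --os-variant generic --graphics none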

by Stefan Raspl (noreply@blogger.com) at July 20, 2018 09:09 PM

June 29, 2018

Daniel Berrange

CPU model configuration for QEMU/KVM on x86 hosts

With the various CPU hardware vulnerabilities reported this year, guest CPU configuration is now a security critical task. This blog post contains content I’ve written that is on its way to become part of the QEMU documentation.

QEMU / KVM virtualization supports two ways to configure CPU models

Host passthrough
This passes the host CPU model's features, model, and stepping exactly to the guest. Note that KVM may filter out some host CPU model features if they cannot be supported with virtualization. Live migration is unsafe when this mode is used as libvirt / QEMU cannot guarantee a stable CPU is exposed to the guest across hosts. This is the recommended CPU model to use, provided live migration is not required.
Named model
QEMU comes with a number of predefined named CPU models, that typically refer to specific generations of hardware released by Intel and AMD. These allow the guest VMs to have a degree of isolation from the host CPU, allowing greater flexibility in live migrating between hosts with differing hardware.

In both cases, it is possible to optionally add or remove individual CPU features, to alter what is presented to the guest by default.

Libvirt supports a third way to configure CPU models known as “Host model”. This uses the QEMU “Named model” feature, automatically picking a CPU model that is similar to the host CPU, and then adding extra features to approximate the host model as closely as possible. This does not guarantee the CPU family, stepping, etc will precisely match the host CPU, as they would with “Host passthrough”, but gives much of the benefit of passthrough, while making live migration safe.

Recommendations for KVM CPU model configuration on x86 hosts

The information that follows provides recommendations for configuring CPU models on x86 hosts. The goals are to maximise performance, while protecting the guest OS against various CPU hardware flaws, and optionally enabling live migration between hosts with heterogeneous CPU models.

Preferred CPU models for Intel x86 hosts

The following CPU models are preferred for use on Intel hosts. Administrators / applications are recommended to use the CPU model that matches the generation of the host CPUs in use. In a deployment with a mixture of host CPU models between machines, if live migration compatibility is required, use the newest CPU model that is compatible across all desired hosts.

Skylake-Server
Skylake-Server-IBRS
Intel Xeon Processor (Skylake, 2016)
Skylake-Client
Skylake-Client-IBRS
Intel Core Processor (Skylake, 2015)
Broadwell
Broadwell-IBRS
Broadwell-noTSX
Broadwell-noTSX-IBRS
Intel Core Processor (Broadwell, 2014)
Haswell
Haswell-IBRS
Haswell-noTSX
Haswell-noTSX-IBRS
Intel Core Processor (Haswell, 2013)
IvyBridge
IvyBridge-IBRS
Intel Xeon E3-12xx v2 (Ivy Bridge, 2012)
SandyBridge
SandyBridge-IBRS
Intel Xeon E312xx (Sandy Bridge, 2011)
Westmere
Westmere-IBRS
Westmere E56xx/L56xx/X56xx (Nehalem-C, 2010)
Nehalem
Nehalem-IBRS
Intel Core i7 9xx (Nehalem Class Core i7, 2008)
Penryn
Intel Core 2 Duo P9xxx (Penryn Class Core 2, 2007)
Conroe
Intel Celeron_4x0 (Conroe/Merom Class Core 2, 2006)
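To cross-check which of these named models a particular QEMU binary or libvirt installation actually knows about, you can query them directly; a quick sketch (assuming qemu-system-x86_64 and virsh are installed):

   $ qemu-system-x86_64 -cpu help      # models known to this QEMU build
   $ virsh cpu-models x86_64           # models known to libvirt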

Important CPU features for Intel x86 hosts

The following are important CPU features that should be used on Intel x86 hosts, when available in the host CPU. Some of them require explicit configuration to enable, as they are not included by default in some, or all, of the named CPU models listed above. In general all of these features are included if using “Host passthrough” or “Host model”.

pcid
Recommended to mitigate the cost of the Meltdown (CVE-2017-5754) fix. Included by default in Haswell, Broadwell & Skylake Intel CPU models. Should be explicitly turned on for Westmere, SandyBridge, and IvyBridge Intel CPU models. Note that some desktop/mobile Westmere CPUs cannot support this feature.
spec-ctrl
Required to enable the Spectre (CVE-2017-5753 and CVE-2017-5715) fix, in cases where retpolines are not sufficient. Included by default in Intel CPU models with -IBRS suffix. Must be explicitly turned on for Intel CPU models without -IBRS suffix. Requires the host CPU microcode to support this feature before it can be used for guest CPUs.
ssbd
Required to enable the CVE-2018-3639 fix. Not included by default in any Intel CPU model. Must be explicitly turned on for all Intel CPU models. Requires the host CPU microcode to support this feature before it can be used for guest CPUs.
pdpe1gb
Recommended to allow guest OS to use 1GB size pages. Not included by default in any Intel CPU model. Should be explicitly turned on for all Intel CPU models. Note that not all CPU hardware will support this feature. A combined example of enabling these features on a named model is shown below.
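As an illustration only, the features above could be enabled on top of a named model such as Westmere like this (drop any feature your hosts or microcode do not support):

   $ qemu-system-x86_64 -cpu Westmere,+pcid,+spec-ctrl,+ssbd,+pdpe1gb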

Preferred CPU models for AMD x86 hosts

The following CPU models are preferred for use on AMD hosts. Administrators / applications are recommended to use the CPU model that matches the generation of the host CPUs in use. In a deployment with a mixture of host CPU models between machines, if live migration compatibility is required, use the newest CPU model that is compatible across all desired hosts.

EPYC
EPYC-IBPB
AMD EPYC Processor (2017)
Opteron_G5
AMD Opteron 63xx class CPU (2012)
Opteron_G4
AMD Opteron 62xx class CPU (2011)
Opteron_G3
AMD Opteron 23xx (Gen 3 Class Opteron, 2009)
Opteron_G2
AMD Opteron 22xx (Gen 2 Class Opteron, 2006)
Opteron_G1
AMD Opteron 240 (Gen 1 Class Opteron, 2004)

Important CPU features for AMD x86 hosts

The following are important CPU features that should be used on AMD x86 hosts, when available in the host CPU. Some of them require explicit configuration to enable, as they are not included by default in some, or all, of the named CPU models listed above. In general all of these features are included if using “Host passthrough” or “Host model”.

ibpb
Required to enable the Spectre (CVE-2017-5753 and CVE-2017-5715) fix, in cases where retpolines are not sufficient. Included by default in AMD CPU models with -IBPB suffix. Must be explicitly turned on for AMD CPU models without -IBPB suffix. Requires the host CPU microcode to support this feature before it can be used for guest CPUs.
virt-ssbd
Required to enable the CVE-2018-3639 fix. Not included by default in any AMD CPU model. Must be explicitly turned on for all AMD CPU models. This should be provided to guests, even if amd-ssbd is also provided, for maximum guest compatibility. Note for some QEMU / libvirt versions, this must be force enabled when using “Host model”, because this is a virtual feature that doesn’t exist in the physical host CPUs.
amd-ssbd
Required to enable the CVE-2018-3639 fix. Not included by default in any AMD CPU model. Must be explicitly turned on for all AMD CPU models. This provides higher performance than virt-ssbd so should be exposed to guests whenever available in the host. virt-ssbd should nonetheless also be exposed for maximum guest compatibility as some kernels only know about virt-ssbd.
amd-no-ssb
Recommended to indicate the host is not vulnerable to CVE-2018-3639. Not included by default in any AMD CPU model. Future CPU hardware generations will not be vulnerable to CVE-2018-3639, and thus the guest should be told not to enable its mitigations, by exposing amd-no-ssb. This is mutually exclusive with virt-ssbd and amd-ssbd.
pdpe1gb
Recommended to allow guest OS to use 1GB size pages. Not included by default in any AMD CPU model. Should be explicitly turned on for all AMD CPU models. Note that not all CPU hardware will support this feature.
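As a sketch, enabling the features above on top of the EPYC-IBPB model might look like this on the QEMU command line (amd-no-ssb is omitted since it is mutually exclusive with the ssbd features):

   $ qemu-system-x86_64 -cpu EPYC-IBPB,+virt-ssbd,+amd-ssbd,+pdpe1gb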

Default x86 CPU models

The default QEMU CPU models are designed such that they can run on all hosts. If an application does not wish to perform any host compatibility checks before launching guests, the default is guaranteed to work.

The default CPU models will, however, leave the guest OS vulnerable to various CPU hardware flaws, so their use is strongly discouraged. Applications should follow the earlier guidance to setup a better CPU configuration, with host passthrough recommended if live migration is not needed.

qemu32
qemu64
QEMU Virtual CPU version 2.5+ (32 & 64 bit variants). qemu64 is used for x86_64 guests and qemu32 is used for i686 guests, when no -cpu argument is given to QEMU, or no <cpu> is provided in libvirt XML.

Other non-recommended x86 CPUs

The following CPU models are compatible with most AMD and Intel x86 hosts, but their usage is discouraged, as they expose a very limited featureset, which prevents guests from having optimal performance.

kvm32
kvm64
Common KVM processor (32 & 64 bit variants). Legacy models just for historical compatibility with ancient QEMU versions.
486
athlon
phenom
coreduo
core2duo
n270
pentium
pentium2
pentium3
Various very old x86 CPU models, mostly predating the introduction of hardware assisted virtualization, that should thus not be required for running virtual machines.

Syntax for configuring CPU models

The examples below illustrate how to configure the various CPU models / features in QEMU and libvirt.

QEMU command line

Host passthrough
   $ qemu-system-x86_64 -cpu host

With feature customization:

   $ qemu-system-x86_64 -cpu host,-vmx,...
Named CPU models
   $ qemu-system-x86_64 -cpu Westmere

With feature customization:

   $ qemu-system-x86_64 -cpu Westmere,+pcid,...

Libvirt guest XML

Host passthrough
   <cpu mode='host-passthrough'/>

With feature customization:

   <cpu mode='host-passthrough'>
       <feature name="vmx" policy="disable"/>
       ...
   </cpu>
Host model
   <cpu mode='host-model'/>

With feature customization:

   <cpu mode='host-model'>
       <feature name="vmx" policy="disable"/>
       ...
   </cpu>
Named model
   <cpu mode='custom'>
       <model name="Westmere"/>
   </cpu>

With feature customization:

   <cpu mode='custom'>
       <model name="Westmere"/>
       <feature name="pcid" policy="require"/>
       ...
   </cpu>
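With libvirt, the <cpu> element is normally edited into the domain definition and then verified; one possible workflow, where the guest name is a placeholder:

   $ virsh edit myguest                        # add or adjust the <cpu> element
   $ virsh dumpxml myguest | grep -A 3 '<cpu'  # inspect what the guest actually got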

 

by Daniel Berrange at June 29, 2018 12:49 PM

KVM on Z

Getting Started: Alpine Linux Instructions added


Instructions for Alpine Linux were added to the Getting Started with KVM on Z series.
See here for the actual page.

by Stefan Raspl (noreply@blogger.com) at June 29, 2018 09:58 AM

June 21, 2018

QEMU project

QEMU TCG Tests

Ever since I started working on QEMU, a small directory called tests/tcg has been in a perpetually broken state. It contains tests that exercise QEMU’s ability to work across architectures using the power of the Tiny Code Generator. However as these tests needed to be compiled for the guest architectures and not the host architecture—this is known as cross-compiling—most developers never ran them. As the tests were hardly ever built inevitably a certain amount of bit-rot set in.

Cross Compilers

In the old days, cross-compilation setups were almost all hand-crafted affairs which involved building versions of binutils, gcc and a basic libc. If you couldn’t get someone to give you a pre-built tarball, it was something you laboured through once and hopefully never had to touch again. There were even dedicated scripts like crosstool-ng which attempted to make the process of patching and configuring your toolchain easier.

While the distributions have improved their support for cross compilers over the years, there are still plenty of variations in how they are deployed. It is hard for a project like QEMU which has to build on a wide range of operating systems and architectures to seamlessly use any given distribution’s compiler setup. However for those with cross compilers at hand configure now accepts two additional flags:

--cross-cc-$(ARCH)
--cross-cc-flags-$(ARCH)

With a compiler specified for each guest architecture you want to test, the build system can now build and run the tests. For developers that don’t have cross compilers around, they can take advantage of QEMU’s docker images.
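For example, on a host with Debian’s ARM cross compiler installed, the configure invocation might look roughly like this (the compiler name and target list are just an illustration and depend on your distribution):

./configure --target-list=arm-linux-user \
    --cross-cc-arm=arm-linux-gnueabihf-gcc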

Enter Docker Containers

If you work in IT you would be hard pressed not to have noticed the hype around Docker and the concept of containerisation over the last few years. Put simply containers allow you to define a known working set of software that gets run in an isolated environment for a given task. While this has many uses for QEMU it allows us to define build environments that any developer can run without having to mess around with their preferred host setup.

Over the last few years QEMU’s build system has been expanding the number of docker images it supports. Most of this has been in service of our CI testing such as Patchew and Shippable but any developer with a docker setup can run the exact same images. For example if you want to check your patches won’t break when compiled on a 32 bit ARM system you can run:

make docker-test-build@debian-armhf-cross J=n

instead of tracking down a piece of ARM hardware to actually build on. Run make docker in your source tree to see the range of builds and tests it can support.

make check-tcg

With the latest work merged into master we can now take advantage of either hand-configured or Docker-based cross compilers to build test cases for TCG again. To run the TCG tests after you have built QEMU:

make check-tcg

and the build system will build and run all the tests it can for your configured targets.

Rules for tests/tcg

So now that we have the infrastructure in place, what rules need to be followed to add new tests?

Well the first thing to note is currently all the tests are for the linux-user variant of QEMU. This means the tests are all currently user-space tests that have access to the Linux syscall ABI.

Another thing to note is the tests are separate from the rest of the QEMU test infrastructure. To keep things simple they are compiled as standalone “static” binaries. As the cross-compilation setup can be quite rudimentary for some of the rarer architectures we only compile against a standard libc. There is no support for linking to other libraries like for example glib. Thread and maths support is part of glibc so shouldn’t be a problem.
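Conceptually each test is just a statically linked binary built with the guest-architecture cross compiler and executed under the matching linux-user binary. The lines below are only a hand-run sketch of that idea (the file names are illustrative; the real build is driven by the tests/tcg makefiles):

aarch64-linux-gnu-gcc -static -o mytest mytest.c
./aarch64-linux-user/qemu-aarch64 ./mytest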

Finally when writing new tests consider if it really is architecture specific or can be added to tests/tcg/multiarch. The multiarch tests are re-built for every supported architecture and should be the default place for anything that tests syscalls or other common parts of the code base.

What’s next

My hope with this work is we can start adding more tests to systematically defend functionality in linux-user. In fact I hope the first port of call to reproducing a crash would be writing a test case that can be added to our growing library of tests.

Another thing that needs sorting out is getting toolchains for all of the less common architectures. The current work relies heavily on the excellent work of the Debian toolchain team in making multiarch aware cross compilers available in their distribution. However QEMU supports a lot more architectures than Debian does, some only as system emulations. In principle supporting them is as easy as adding another docker recipe but it might be these recipes end up having to compile the compilers from source.

The tests/tcg directory still contains a number of source files we don’t build.

The cris and openrisc directories contain user-space tests which just need the support of a toolchain and the relevant Makefile plumbing to be added.

The lm32, mips and xtensa targets have a set of tests that need a system emulator. Aside from adding the compilers as docker images some additional work is needed to handle the differences between plain linux-user tests which can simply return an exit code to getting the results from a qemu-system emulation. Some architectures have semi-hosting support already for this while others report their test status over a simple serial link which will need to be parsed and handled in custom versions of the run-%: rule.

by Alex Bennée at June 21, 2018 02:30 PM

June 10, 2018

KVM on Z

KVM at Share St.Louis 2018


Yes, we will be at SHARE in St.Louis this August!
See our sessions in the Linux and VM track as follows:


by Stefan Raspl (noreply@blogger.com) at June 10, 2018 09:17 PM

May 31, 2018

QEMU project

QEMU’s new -nic command line option

If you used QEMU in the past, you are probably familiar with the -net command line option, which can be used to configure a network connection for the guest, or with the -netdev option, which configures a network back-end. Yet, QEMU v2.12 introduces a third way to configure NICs, the -nic option.

The ChangeLog of QEMU v2.12 says that -nic can “quickly create a network front-end (emulated NIC) and a host back-end”. But why did QEMU need yet another way to configure the network, and how does it compare with -net and -netdev? To answer these questions, we need to look at the model behind network virtualization in QEMU.

As hinted by the ChangeLog entry, a network interface consists of two separate entities:

  1. The emulated hardware that the guest sees, i.e. the so-called NIC (network interface controller). On systems that support PCI cards, these typically could be an e1000 network card, a rtl8139 network card or a virtio-net device. This entity is also called the “front-end”.

  2. The network back-end on the host side, i.e. the interface that QEMU uses to exchange network packets with the outside (like other QEMU instances or other real hosts in your intranet or the internet). The common host back-ends are the “user” (a.k.a. SLIRP) back-end which provides access to the host’s network via NAT, the “tap” back-end which allows the guest to directly access the host’s network, or the “socket” back-end which can be used to connect multiple QEMU instances to simulate a shared network for their guests.

Based on this, it is already possible to define the most obvious difference between -net, -netdev and -nic: the -net option can create either a front-end or a back-end (and also does other things); -netdev can only create a back-end; while a single occurrence of -nic will create both a front-end and a back-end. But for the non-obvious differences, we also need to have a detailed look at the -net and -netdev options first …

The legacy -net option

QEMU’s initial way of configuring the network for the guest was the -net option. The emulated NIC hardware can be chosen with the -net nic,model=xyz,... parameter, and the host back-end with the -net <backend>,... parameter (e.g. -net user for the SLIRP back-end). However, the emulated NIC and the host back-end are not directly connected. They are rather both connected to an emulated hub (called “vlan” in older versions of QEMU). Therefore, if you start QEMU with -net nic,model=e1000 -net user -net nic,model=virtio -net tap for example, you get a setup where all the front-ends and back-ends are connected together via a hub:

Networking with -net

That means the e1000 NIC also gets the network traffic from the virtio-net NIC and both host back-ends… this is probably not what the users expected; it’s more likely that they wanted two separate networks in the guest, one for each NIC. Because -net always connects its NIC to a hub, you would have to tell QEMU to use two separate hubs, using the “vlan” parameter. For example -net nic,model=e1000,vlan=0 -net user,vlan=0 -net nic,model=virtio,vlan=1 -net tap,vlan=1 moves the virtio-net NIC and the “tap” back-end to a second hub (with ID #1).

Please note that the “vlan” parameter will be dropped in QEMU v3.0 since the term was rather confusing (it’s not related to IEEE 802.1Q for example) and caused a lot of misconfigurations in the past. Additional hubs can still be instantiated with -netdev (or -nic) and the special “hubport” back-end. The -net option itself will still stay around since it is still useful if you only want to use one front-end and one back-end together, or if you want to tunnel the traffic of multiple NICs through one back-end only (something like -net nic,model=e1000 -net nic,model=virtio -net l2tpv3,... for example).

The modern -netdev option

Beside the confusing “vlan” parameter of the -net option, there is one more major drawback with -net: the emulated hub between the NIC and the back-end gets in the way when the NIC front-end has to work closely together with the host back-end. For example, vhost acceleration cannot be enabled if you create a virtio-net device with -net nic,model=virtio.

To configure a network connection where the emulated NIC is directly connected to a host network back-end, without a hub in between, the well-established solution is to use the -netdev option for the back-end, together with -device for the front-end. Assuming that you want to configure the same devices as in the -net example above, you could use -netdev user,id=n1 -device e1000,netdev=n1 -netdev tap,id=n2 -device virtio-net,netdev=n2. This will give you straight 1:1 connections between the NICs and the host back-ends:

Networking with -netdev

Note that you can also still connect the devices to a hub with the special -netdev hubport back-end, but in most of the normal use cases, the use of a hub is not required anymore.

Now while -netdev together with -device provide a very flexible and extensive way to configure a network connection, there are still two drawbacks with this option pair which prevented us from deprecating the legacy -net option completely:

  1. The -device option can only be used for pluggable NICs. Boards (e.g. embedded boards) which feature an on-board NIC cannot be configured with -device yet, so -net nic,netdev=<id> must be used here instead.

  2. In some cases, the -net option is easier to use (less to type). For example, assuming you want to set up a “tap” network connection and your default scripts /etc/qemu-ifup and -down are already in place, it’s enough to type -net nic -net tap to start your guest. To do the same with -netdev, you always have to specify an ID here, too, for example like this: -netdev tap,id=n1 -device e1000,netdev=n1.

The new -nic option

Looking at the disadvantages listed above, users could benefit from a convenience option that:

  • is easier to use (and shorter to type) than -netdev <backend>,id=<id> -device <dev>,netdev=<id>
  • can be used to configure on-board / non-pluggable NICs, too
  • does not place a hub between the NIC and the host back-end.

This is where the new -nic option kicks in: this option can be used to configure both the guest’s NIC hardware and the host back-end in one go. For example, instead of -netdev tap,id=n1 -device e1000,netdev=n1 you can simply type -nic tap,model=e1000. If you don’t care about the exact NIC model type, you can even omit the model=... parameter and type -nic tap. This is even shorter and more convenient than the previous shortest way of typing -net nic -net tap. To get a list of NIC models that you can use with this option, you can simply run QEMU with -nic model=help.
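As a complete (if hypothetical) example, booting a guest image with the new option and a user-mode back-end could look like this; the disk image name is a placeholder and hostfwd is just one back-end property you might mix in:

$ qemu-system-x86_64 -M q35 -m 1G \
    -drive file=guest.qcow2,format=qcow2,if=virtio \
    -nic user,model=virtio-net-pci,hostfwd=tcp::2222-:22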

Beside being easier to use, the -nic option can be used to configure on-board NICs, too (just like the -net option). For machines that have on-board NICs, the first -nic option configures the first on-board NIC, the second -nic option configures the second on-board NIC, and so forth.

Conclusion

  • The new -nic option gives you an easy and quick way to configure the networking of your guest.
  • For more detailed configuration, e.g. when you need to tweak the details of the emulated NIC hardware, you can use -device together with -netdev.
  • The -net option should be avoided these days unless you really want to configure a set-up with a hub between the front-ends and back-ends.

by Thomas Huth at May 31, 2018 07:50 AM

May 24, 2018

Gerd Hoffmann

Fedora 28 images uploaded

Fedora 28 was released a few weeks ago. New Fedora 28 images are finally uploaded now.

There are no raspberry pi images any more. Just use the standard Fedora arm images, they work just fine for both arm (rpi 2) and aarch64 (rpi 3).

The efi images are for qemu. Some use grub2 as bootloader, some use systemd-boot; the filename indicates which one is used. The efi images can also be booted as container, using systemd-nspawn --boot --image <file>, but you have to convert them to raw first as systemd-nspawn can't handle qcow2.
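For example, converting one of the qcow2 images and booting it as a container might look like this (the file names are placeholders for whichever image you downloaded):

qemu-img convert -O raw <image>.qcow2 <image>.raw
systemd-nspawn --boot --image <image>.raw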

As usual the images don't have a root password. You have to set one using virt-customize -a <image> --root-password "password:<secret>", otherwise you can't log in after boot.

The images have been created with imagefish.

by Gerd Hoffmann at May 24, 2018 10:00 PM

May 17, 2018

KVM on Z

Knowledge Series: Managing KVM on IBM Z with oVirt

See here for a new entry in our "knowledge" series, providing step-by-step instructions on how to include IBM Z hosts in an oVirt data center.

by Stefan Raspl (noreply@blogger.com) at May 17, 2018 02:51 PM

May 14, 2018

KVM on Z

Getting Started: RHEL 7.5 Instructions added

Instructions for RHEL7.5 were added to the Getting Started with KVM on Z series.
See here for the actual page.

by Stefan Raspl (noreply@blogger.com) at May 14, 2018 12:40 PM

Getting Started: Instructions for Ubuntu 16.04 to 17.10 added

Instructions for Ubuntu 16.04 to 17.10 were added to the Getting Started with KVM on Z series.
See here for the entry page.

by Stefan Raspl (noreply@blogger.com) at May 14, 2018 12:34 PM

May 13, 2018

Gerd Hoffmann

Welcome to Jekyll!

Switched my blog from wordpress to jekyll.

Automatic import handled most of the content. It didn’t cover syntax highlighting (an extra wordpress plugin) though, so that needed some manual intervention. Also copying over the (few) images was a manual process.

Configuration isn’t imported automatically, but that is just editing a few lines in _config.yml. Permalinks can be configured to be compatible with wordpress without much trouble, so all the article links stay valid.

My blog is stored in git now. Everything is just static pages. No database needed. No user data stored anywhere.

I’m pretty pleased with the result.

by Gerd Hoffmann at May 13, 2018 10:00 PM

May 04, 2018

KVM on Z

Ubuntu 18.04 released

Ubuntu Server 18.04 LTS is out! Support for IBM Z is available here.
As obvious from the package versions it ships, support for IBM z14 is readily in place.

Since this is a so-called LTS (Long Term Support) release providing approx. 5 years of support (in contrast to the usual 9 months of non-LTS releases), it is of particular interest to Ubuntu users who need a stable environment for production deployments.

by Stefan Raspl (noreply@blogger.com) at May 04, 2018 08:47 PM

May 03, 2018

Cornelia Huck

A vfio-ccw primer

While basic support for vfio-ccw has been included in Linux and QEMU for some time, work has recently started to ramp up again and it seems like a good time to give some basic overview.

Why vfio-ccw?

Historically, QEMU on s390x presented paravirtualized virtio devices to the guest; first, via a protocol inspired by lguest, later, as emulated channel devices. This satisfies most needs (you get block devices, network devices, a console device, and lots more), but the device types are different from those found on LPARs or z/VM guests, and you may have a need to use e.g. a DASD directly.

For that reason, we want to do the same thing as on other platforms: pass a host device to the guest directly via vfio.

How does this work?

vfio-ccw is using the vfio mediated device framework; see the kernel documentation for an overview.

In a nutshell: The subchannel to be passed to the guest is unbound from its normal host driver (in this case, the I/O subchannel driver) and bound to the vfio-ccw driver. Any I/O request is intercepted and executed on the real device, and interrupts from the real device are relayed back to the guest.

Why subchannels and not ccw devices?

The initial attempt to implement this actually worked at the ccw device level. However, this means that the Linux common I/O layer in the host will perform various actions like handling of channel paths - which may interfere with what the guest is trying to do. Therefore, it seemed like a better idea to keep out of the way as much as possible and just implement a minimal subchannel driver that does not do much beyond what the guest actually triggered itself.

How is an actual I/O request processed?

When the guest is ready to use a channel device, it will issue I/O requests via channel programs (see here for an explanation on how that works and what things like scsw and orb mean.) The channel I/O instructions are mandatory SIE intercepts, so the host will get control for any START SUBCHANNEL the guest issues. QEMU is in charge of interpretation of channel I/O instructions, so it will process the ssch as a request to a pass-through device.

All channel I/O instructions are privileged, which means that the host kernel now needs to get involved again. QEMU does so by writing to an I/O region: the scsw (which contains, amongst other things, the fctl field specifying the start function) and the orb (pointing to the channel program). The host kernel driver now has enough information to actually issue the request on the real device after translating the ccw chain and its addresses to host addresses (involving pinning, idals and other things I will not explain here for brevity.)

After the device has processed the I/O request, it will make the subchannel status pending and generate an I/O interrupt. The host kernel driver collects the state and makes it available via the same I/O region (the IRB field), and afterwards triggers QEMU via an eventfd. QEMU now has all information needed to update its internal structures for the devices so that the guest can obtain the information related to the I/O request.

Isn't that all a bit too synchronous?

Yes, it is. Channel I/O is supposed to be asynchronous (give the device an I/O request, collect status later), but our implementation isn't yet. Why? Short answer: It is hard, and we wanted something to get us going. But this is on the list of things to be worked on.

Where is the IOMMU for this?

Due to the way channel programs work, we don't have a real IOMMU.

Does this cover everything supported by the architecture?

Not yet. Channel program wise, we support the format Linux drivers use. Also, we're emulating things like HALT SUBCHANNEL and CLEAR SUBCHANNEL in QEMU, while they really should be handed through to the device (support for this is in the works).

On the whole, you should be able to pass an ECKD DASD to a Linux guest without (known) issues.

How can I try this out?

Recent QEMU and Linux versions should have everything you need in the host; see this wiki entry for details. As a guest, any guest that can run under KVM should be fine.
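Very roughly, and merely paraphrasing the kernel's vfio-ccw documentation rather than this post, the host-side setup consists of rebinding the subchannel and creating a mediated device; the subchannel ID 0.0.013f below is a placeholder for the subchannel of the DASD you want to pass through:

# echo 0.0.013f > /sys/bus/css/devices/0.0.013f/driver/unbind
# echo 0.0.013f > /sys/bus/css/drivers/vfio_ccw/bind
# uuid=$(uuidgen)
# echo $uuid > /sys/bus/css/devices/0.0.013f/mdev_supported_types/vfio_ccw-io/create
# qemu-system-s390x ... -device vfio-ccw,sysfsdev=/sys/bus/mdev/devices/$uuid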

What's the deal with that "unrestricted cssids" thing?

If you look at this older article, you'll notice the 'fe' value for the cssid of virtio devices (with the promise to explain it later... which I sadly never did). The basic idea at the time was to put 'virtual' devices like virtio and 'non-virtual' devices like vfio-ccw into different channel subsystem images, so that e.g. channel paths (which are per channel subsystem image) don't clash. In other words, 'virtual' and 'non-virtual' devices (and channel paths) would have different cssids (the first part of their identifiers).

This sounded like a good idea at the time; however, there's a catch: A guest operating system will by default only see the devices in the default channel subsystem image. To see all of them, it needs to explicitly enable the Multiple Channel Subsystems Extended (MCSS-E) feature - and I do not know of any operating system that has done so as of today (not very surprising, as QEMU is the only implementation of MCSS-E I'm aware of).

To work around this, we originally introduced the 's390-squash-mcss' parameter to QEMU, which would put all devices into the default channel subsystem image. But as MCSS-E support is unlikely to arrive in any guest operating system anytime soon, we agreed to rather drop the restriction of virtual devices being in css fe and non-virtual devices everywhere else (since QEMU 2.12).

What are the plans for the future?

Several things are already actively worked on, while others may come up later.
  • Initial libvirt support for vfio-ccw has been posted here.
  • Reworking the Linux host driver to make things more asynchronous and to support halt/clear is in progress.
  • Improvements in channel path handling (for example, to enable the guest to see path availability changes) are also in progress. We may need to consider things like dasd reserve/release as well.

by Cornelia Huck (noreply@blogger.com) at May 03, 2018 01:53 PM

KVM on Z

QEMU v2.12 released

QEMU v2.12 is out. Here are the highlights from a KVM on Z perspective:
  • Added support for an interactive bootloader. As always, we strongly recommend using the existing support in libvirt.
    To enable/disable, add the following element to your guest definition:

       <os>
         <bootmenu enable='yes|no' timeout='n'/>
         ...
       </os>


    The timeout parameter specifies a timeout in milliseconds after which the default entry is chosen.
    Alternatively, set attribute loadparm to PROMPT to enable the boot menu without timeout in the respective disk's element:

       <disk ...>
         <boot order='1' loadparm='PROMPT'/>
         ...
       </disk>


    Example: to enable the boot menu for 32 seconds, the guest's libvirt
    domain XML would contain:

       <domain type='kvm'>
         <os>
           <bootmenu enable='yes' timeout='32000'/>
           ...
         </os>
       </domain>
  • Exposure of guest crash information: When a guest is started using libvirt and crashes due to disabled wait, wrong interrupts or a program check loop, libvirt will print the information to the guest’s log, typically located at /var/log/libvirt/qemu.
    E.g. a crash due to a disabled wait results in an entry as follows:

       s390: psw-mask='0xXXXXXXXXXXXXXXXX', psw-addr='0xXXXXXXXXXXXXXXXX',
             crash reason: disabled wait


    Requires libvirt v4.2.
  • Added support for guests with more than 8TB of memory.

by Stefan Raspl (noreply@blogger.com) at May 03, 2018 09:49 AM

May 02, 2018

Richard Jones

Dockerfile for running libguestfs, virt-tools and virt-v2v

FROM fedora
RUN dnf install -y libguestfs libguestfs-tools-c virt-v2v \
                   libvirt-daemon libvirt-daemon-config-network

# https://bugzilla.redhat.com/show_bug.cgi?id=1045069
RUN useradd -ms /bin/bash v2v
USER v2v
WORKDIR /home/v2v

# This is required for virt-v2v because neither systemd nor
# root libvirtd runs, and therefore there is no virbr0, and
# therefore virt-v2v cannot set up the network through libvirt.
ENV LIBGUESTFS_BACKEND direct
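Building and trying out the image could then look something like this (the tag is arbitrary):

$ docker build -t virt-v2v .
$ docker run -it virt-v2v virt-v2v --version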

by rich at May 02, 2018 03:35 PM

April 25, 2018

Peter Maydell

Debian on QEMU’s Raspberry Pi 3 model

For the QEMU 2.12 release we added support for a model of the Raspberry Pi 3 board (thanks to everybody involved in developing and upstreaming that code). The model is sufficient to boot a Debian image, so I wanted to write up how to do that.

Things to know before you start

Before I start, some warnings about the current state of the QEMU emulation of this board:

  • We don’t emulate the boot rom, so QEMU will not automatically boot from an SD card image. You need to manually extract the kernel, initrd and device tree blob from the SD image first. I’ll talk about how to do that below.
  • We don’t have an emulation of the BCM2835 USB controller. This means that there is no networking support, because on the raspi devices the ethernet hangs off the USB controller.
  • Our raspi3 model will only boot AArch64 (64-bit) kernels. If you want to boot a 32-bit kernel you should use the “raspi2” board model.
  • The QEMU model is missing models of some devices, and others are guesswork due to a lack of documentation of the hardware; so although the kernel I tested here will boot, it’s quite possible that other kernels may fail.

You’ll need the following things on your host system:

  • QEMU version 2.12 or better
  • libguestfs (on Debian and Ubuntu, install the libguestfs-tools package)

Getting the image

I’m using the unofficial preview images described on the Debian wiki.

$ wget https://people.debian.org/~stapelberg/raspberrypi3/2018-01-08/2018-01-08-raspberry-pi-3-buster-PREVIEW.img.xz
$ xz -d 2018-01-08-raspberry-pi-3-buster-PREVIEW.img.xz

Extracting the guest boot partition contents

I use libguestfs to extract files from the guest SD card image. There are other ways to do this but I think libguestfs is the easiest to use. First, check that libguestfs is working on your system:

$ virt-filesystems -a 2018-01-08-raspberry-pi-3-buster-PREVIEW.img
/dev/sda1
/dev/sda2

If this doesn’t work, then you should sort that out first. A couple of common reasons I’ve seen:

  • if you’re on Ubuntu then your kernels in /boot are installed not-world-readable; you can fix this with sudo chmod 644 /boot/vmlinuz*
  • if you’re running Virtualbox on the same host it will interfere with libguestfs’s attempt to run KVM; you can fix that by exiting Virtualbox

Now you can ask libguestfs to extract the contents of the boot partition:

$ mkdir bootpart
$ guestfish --ro -a 2018-01-08-raspberry-pi-3-buster-PREVIEW.img -m /dev/sda1

Then at the guestfish prompt type:

copy-out / bootpart/
quit

This should have copied various files into the bootpart/ subdirectory.

Run the guest image

You should now be able to run the guest image:

$ qemu-system-aarch64 \
  -kernel bootpart/vmlinuz-4.14.0-3-arm64 \
  -initrd bootpart/initrd.img-4.14.0-3-arm64 \
  -dtb bootpart/bcm2837-rpi-3-b.dtb \
  -M raspi3 -m 1024 \
  -serial stdio \
  -append "rw earlycon=pl011,0x3f201000 console=ttyAMA0 loglevel=8 root=/dev/mmcblk0p2 fsck.repair=yes net.ifnames=0 rootwait memtest=1" \
  -drive file=2018-01-08-raspberry-pi-3-buster-PREVIEW.img,format=raw,if=sd

and have it boot to a login prompt (the root password for this Debian image is “raspberry”).

There will be several WARNING logs and backtraces printed by the kernel as it starts; these will have a backtrace like this:

[  145.157957] [] uart_get_baud_rate+0xe4/0x188
[  145.158349] [] pl011_set_termios+0x60/0x348
[  145.158733] [] uart_change_speed.isra.3+0x50/0x130
[  145.159147] [] uart_set_termios+0x7c/0x180
[  145.159570] [] tty_set_termios+0x168/0x200
[  145.159976] [] set_termios+0x2b0/0x338
[  145.160647] [] tty_mode_ioctl+0x358/0x590
[  145.161127] [] n_tty_ioctl_helper+0x54/0x168
[  145.161521] [] n_tty_ioctl+0xd4/0x1a0
[  145.161883] [] tty_ioctl+0x150/0xac0
[  145.162255] [] do_vfs_ioctl+0xc4/0x768
[  145.162620] [] SyS_ioctl+0x8c/0xa8

These are ugly but harmless. (The underlying cause is that QEMU doesn’t implement the undocumented ‘cprman’ clock control hardware, and so Linux thinks that the UART is running at a zero baud rate and complains.)

by pm215 at April 25, 2018 08:07 AM

QEMU project

QEMU version 2.12.0 released

We’d like to announce the availability of the QEMU 2.12.0 release. This release contains 2700+ commits from 204 authors.

You can grab the tarball from our download page. The full list of changes is available in the Wiki.

Highlights include:

  • Spectre/Meltdown mitigation support for x86/pseries/s390 guests. For more details see: https://www.qemu.org/2018/02/14/qemu-2-11-1-and-spectre-update/
  • Numerous block support improvements, including support for directly interacting with userspace NVMe driver, and general improvements to NBD server/client including more efficient reads of sparse files
  • Networking support for VMWare paravirtualized RDMA device (RDMA HCA and Soft-RoCE supported), CAN bus support via Linux SocketCAN and SJA1000-based PCI interfaces, and general improvements for dual-stack IPv4/IPv6 environments
  • GUI security/bug fixes, dmabufs support for GTK/Spice.
  • Better IPMI support for Platform Events and SEL logging in internal BMC emulation
  • SMBIOS support for “OEM Strings”, which can be used for automating guest image activation without relying on network-based querying
  • Disk cache information via virtio-balloon
  • ARM: AArch64 new instructions for FCMA/RDM and SIMD/FP16/crypto/complex number extensions
  • ARM: initial support for Raspberry Pi 3 machine type
  • ARM: Cortex-M33/Armv8-M emulation via new mps2-an505 board and many other improvements for M profile emulation
  • HPPA: support for full machine emulation (hppa-softmmu)
  • PowerPC: PPC4xx emulation improvements, including I2C bus support
  • PowerPC: new Sam460ex machine type
  • PowerPC: significant TCG performance improvements
  • PowerPC: pseries: support for Spectre/Meltdown mitigations
  • RISC-V: new RISC-V target via “spike_v1.9.1”, “spike_v1.10”, and “virt” machine types
  • s390: non-virtual devices no longer require dedicated channel subsystem and guest support for multiple CSSs
  • s390: general PCI improvements, MSI-X support for virtio-pci devices
  • s390: improved TCG emulation support
  • s390: KVM support for systems larger than 7.999TB
  • SPARC: sun4u power device emulation
  • SPARC: improved trace-event support and emulation/debug fixes
  • Tricore: new instruction variants for JEQ/JNE and 64-bit MOV
  • x86: Intel IOMMU support for 48-bit addresses
  • Xtensa: backend now uses libisa for instruction decoding/disassembly
  • Xtensa: multi-threaded TCG support and noMMU configuration variants
  • and lots more…

Thank you to everyone involved!

April 25, 2018 03:30 AM

April 24, 2018

Yoni Bettan

VirtIO

My name is Yonathan Bettan and I work at Red Hat in the virtualization KVM team.

This blog has 2 main purposes: the first is to give you an idea of what VirtIO is and why we should use it, while the second is to serve as a step-by-step guide describing how to write a VirtIO device from zero with some code examples.

In addition, I will write a SIMPLE and documented VirtIO example device that you will be able to find in the QEMU project. Clone the https://github.com/qemu/qemu.git repo for the full project.

Motivation

Let us start with a NIC (Network Interface Controller) as an example to better understand virtualization. A NIC is responsible for transmitting and receiving packets through the network. The received packets are written into memory and the packets to be sent are copied from the memory to the NIC for transmission, with or without CPU intervention (DMA). When the NIC finishes a specific task it sends an interrupt to the OS.

If we want a physical machine to have a network connection we will have to buy a NIC and with the same logic if we want a virtual machine (VM) to have a network connection we will need to supply a virtual NIC.

One possible solution is to make the hypervisor fully emulate the NIC according to its spec – Virtual device.

When a packet is sent by the guest OS it is sent to the virtual NIC (vNIC). For each byte of data we will get:

A virtual interrupt will be generated ==> a VMexit will occur ==> the hypervisor will send the data to the physical NIC (pNIC) ==> the pNIC will interrupt the host OS when it finishes the transaction ==> the hypervisor will finally interrupt the guest OS to notify it that the transaction is finished.

Here we can see a function of a NIC driver whose purpose is to read data from the device into a buffer.

NOTE: even if we use MMIO instead of PIO we still have a limitation on the MMIO write size, and each MMIO write generates a VMexit, so we may still have multiple VMexits.

The main benefit, in this case, is that the OS stays unchanged, because the virtual device acts like a physical device, so the already-written NIC driver does its job correctly on the emulated device. On the other hand it works slowly, since each access to the vNIC generates a VMexit for each byte (as a pNIC would have done), but in reality this is not a real device (only code variables), so we don’t need to VMexit on each byte; instead we can just write the whole buffer and generate a single VMexit.

Another possible solution is to give the guest direct access to the pNIC – Device assignment.

When a packet is sent by the guest OS it is sent to the vNIC. For each byte of data we will get:

The data is sent directly to the pNIC without the hypervisor intervention ==> the pNIC will interrupt the guest OS directly when it finishes the transaction.

Now we have the maximum performance that the HW can supply, but we need a separate pNIC for each guest and another one for the host; this becomes expensive.

The tradeoff between Virtual devices and Device assignment is Paravirtual devices and their protocol, VirtIO.

This case is quite similar to the Virtual device case except for 2 facts: the first is that the emulated device doesn’t pretend to act like a real device (no need to send virtual interrupts for each byte written, but only a single virtual interrupt once the whole buffer is written), and the second is that we now have to write a new driver, since the original driver no longer fits the emulated HW.

We can now see the same function in the new NIC driver.

Another reason to use VirtIO devices is that Linux supports multiple hypervisors such as KVM, Xen, VMware etc. Therefore we have drivers for each one of them. VirtIO provides driver unification: a uniform ABI for all those hypervisors. An ABI is an interface at the compiler level and describes how parameters are passed to functions (registers/stack), how interrupts are propagated, etc. VirtIO also provides device check and configuration.

Virtualization VS Paravirtualization

Virtualization:
  • The guest is unaware that it is being virtualized.
  • No changes are requested on the OS.
  • The hypervisor must emulate device HW, this will lead to low performance.

Paravirtualization:
  • The guest is aware that it is running on a hypervisor (and not on real HW).
  • Requires modification of the OS.
  • The guest and the hypervisor can work cooperatively to make this emulation efficient.

 


by Yonathan Bettan at April 24, 2018 12:26 PM

April 17, 2018

Fabian Deutsch

Running minikube v0.26.0 with CRIO and KVM nesting enabled by default

Probably not worth a post, as it’s mentioned in the readme, but CRIO was recently updated in minikube v0.26.0 which now makes it work like a charm.

When updating to 0.26 make sure to update the minikube binary, but also the docker-machine-driver-kvm2 binary.

Like in the past it is possible to switch to CRIO using

$ minikube start --container-runtime=cri-o
Starting local Kubernetes v1.10.0 cluster...
Starting VM...
Getting VM IP address...
Moving files into cluster...
Setting up certs...
Connecting to cluster...
Setting up kubeconfig...
Starting cluster components...
Kubectl is now configured to use the cluster.
Loading cached images from config file.
$

However, my favorite launch line is:

minikube start --container-runtime=cri-o --network-plugin=cni --bootstrapper=kubeadm --vm-driver=kvm2

Which will use CRIO as the container runtime, CNI for networking, kubeadm for bringing up kube inside a KVM VM.

April 17, 2018 08:05 AM

April 12, 2018

KVM on Z

White Paper: Exploiting HiperSockets in a KVM Environment Using IP Routing with Linux on Z

Our performance group has published a new white paper titled "Exploiting HiperSockets in a KVM Environment Using IP Routing with Linux on Z".
Abstract:
"The IBM Z platforms provide the HiperSockets technology feature for high-speed communications. This paper documents how to set up and configure KVM virtual machines to use HiperSockets with IP routing capabilities of the TCP/IP stack.
It provides a Network Performance comparison between various network configurations and illustrates how HiperSockets can achieve greater performance for many workload types, across a wide range of data-flow patterns, compared with using an OSA 10GbE card.
"
This white paper is available as .pdf and .html.

by Stefan Raspl (noreply@blogger.com) at April 12, 2018 04:25 PM

April 11, 2018

KVM on Z

RHEL 7.5 with support for KVM on Z available

Red Hat Enterprise Linux 7.5 is out. From the release notes, available here:
Availability across multiple architectures
To further support customer choice in computing architecture, Red Hat Enterprise Linux 7.5 is simultaneously available across all supported architectures, including x86, IBM Power, IBM z Systems, and 64-bit Arm.
Support for IBM Z is available through the kernel-alt package, as indicated earlier here, which provides Linux kernel 4.14. QEMU ships v2.10 via package qemu-kvm-ma, and libvirt is updated to v3.9.0 for all platforms.
Thereby, all IBM z14 features as previously listed here are available.
Check these instructions on how to get started. 

by Stefan Raspl (noreply@blogger.com) at April 11, 2018 04:17 PM

April 09, 2018

Gerd Hoffmann

vgpu display support finally merged upstream

It took more than a year from the first working patches to the upstream merge. But now it's finally done. The linux kernel 4.16 (released on easter weekend) has the kernel-side code needed. The qemu code has been merged too (for gtk and spice user interfaces) and will be in the upcoming 2.12 release which is in code freeze right now. The 2.12 release candidates already have the code, so you can grab one if you don't want to wait for the final release to play with this.

The vgpu code in the intel driver is off by default and must be enabled via a module option. While at it, I also suggest loading the kvmgt module. So I've dropped a config file with these lines ...

options i915 enable_gvt=1
softdep i915 pre: kvmgt

... into /etc/modprobe.d/. For some reason dracut didn't pick the changes up even after regenerating the initrd. Because of that I've blacklisted the intel driver (rd.driver.blacklist=i915 on the kernel command line) so the driver gets loaded later, after mounting the root filesystem, and modprobe actually sets the parameter.

With that in place you should have a /sys/class/mdev_bus directory with the intel gpu in there. You can create vgpu devices now. Check the mediated device documentation for details.

One final thing to take care of: Currently using gvt mlocks all guest memory. For that to work the mlock limit (ulimit -l) must be high enough, otherwise the vgpu will not work correctly and you'll see a scrambled display. The limit can be configured in /etc/security/limits.conf.

Now lets use our new vgpu with qemu:

qemu-system-x86_64 \
     -enable-kvm \
     -m 1G \
     -nodefaults \
     -M graphics=off \
     -serial stdio \
     -display gtk,gl=on \
     -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/UUID,display=on \
     -cdrom /vmdisk/iso/Fedora-Workstation-Live-x86_64-27-1.6.iso

Details on the non-obvious qemu switches:

-nodefaults
Do not create default devices (such as vga and nic).
-M graphics=off
Hint for the firmware that the guest runs without a graphical display. This enables serial console support in seabios. We use this here because the vgpu has no firmware support (i.e. no vgabios), therefore nothing is visible on the display until the i915 kernel module loads.
-display gtk,gl=on
Use gtk display, enable opengl.
-device vfio-pci,sysfsdev=/sys/bus/mdev/devices/UUID,display=on
Add the vgpu to the guest, enable the display. Of course you have to replace UUID with your device.

libvirt support is still being worked on. Most bits are there, but some little details are missing. For example there is no way (yet) to tell libvirt the guest doesn't need an emulated vga device, so you'll end up with two spice windows, one for the emulated vga and one for the vgpu. Other than that things are working pretty straight forward. You need spice with opengl support enabled:

<graphics type='spice'>
  <listen type='none'/>
  <gl enable='yes'/>
</graphics>

And the vgpu must be added of course:

<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci'>
  <source>
    <address uuid='UUID'/>
  </source>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</hostdev>

Then you can start the domain. Use "virt-viewer --attach guest" to connect to the guest. Note that guest and virt-viewer must run on the same machine, sending the vgpu display to a remote machine does not yet work.

by Gerd Hoffmann at April 09, 2018 11:17 AM

March 26, 2018

Cornelia Huck

s390x changes in QEMU 2.12

As QEMU is now in hard freeze for 2.12 (with the final release expected in mid/late April), now is a good point in time to summarize some of the changes that made it into QEMU 2.12 for s390x.

I/O Devices

  • Channel I/O: Any device can now be put into any channel subsystem image, regardless of whether it is a virtual device (like virtio-ccw) or a device passed through via vfio-ccw. This obsoletes the s390-squash-mcss option (which was needed to explicitly squash vfio-ccw devices into the default channel subsystem image in order to make them visible to guests not enabling MCSS-E).
  • PCI: Fixes and refactoring, including handling of subregions. This enables usage of virtio-pci devices on s390x (although only if MSI-X is enabled, as s390x depends on it.) Previously, you could add virtio-pci devices on s390x, but they were not usable.
    For more information about PCI, see this blog entry.

Booting and s390-ccw bios

  • Support for an interactive boot menu. Note that this is a bit different than on other architectures (although it hooks into the same infrastructure). The boot menu is written on the (virtual) disk via the 'zipl' program, and these entries need to be parsed and displayed via SCLP.

System Emulation

  • KVM: In case you were short on memory before: You can now run guests with 8 TB or more.
  • KVM: Support for the bpb and ppa15 CPU features (for spectre mitigation). These have been backported to 2.11.1 as well.
  • TCG: Lots of improvements: Implementation of missing instructions, full (non-experimental) SMP support.
  • TCG: Improvements in handling of the STSI instruction (you can look at some information obtained that way via /proc/sysinfo.) Note that a TCG guest reports itself as a KVM guest, rather than an LPAR: In many ways, a TCG guest is closer to KVM, and reporting itself as an LPAR makes the Linux guest code choose an undesired target for its console output by default.
  • TCG: Wire up the zPCI instructions; you can now use virtio-pci devices under TCG.
  • CPU models: Switch the 'qemu' model to a stripped-down z12, adding all features required by kernels on recent distributions. This means that you can now run recent distributions (Fedora 26/27, Ubuntu 18.04, ...) under TCG. Older distributions may not work (older kernels required some features not implemented under TCG), unless they were built for a z900 like Debian stable.

Miscellaneous

  • Support for memory hotplug via SCLP has been removed. This was an odd interface: Unlike as on other architectures, the guest could enable 'standby' memory if it had been supplied. Another problem was that this never worked with migration. Old command lines will continue to work, but no 'standby' memory will be available to the guest any more.
    Memory hotplug on s390x will probably come back in the future with an interface that matches better what is done elsewhere, likely via some paravirtualized interface. Support for the SCLP interface might come back in the future as well, implemented in an architecture-specific way that does not try to look like memory hotplug elsewhere.
  • And of course, the usual fixes, cleanups and other improvements.

by Cornelia Huck (noreply@blogger.com) at March 26, 2018 06:14 PM

KVM on Z

SLES12 SP3 Updates


SLES12SP3, released late last year, received a couple of mostly performance and security-related updates in support of IBM z14 and LinuxONE through the maintenance web updates.
In particular:

by Stefan Raspl (noreply@blogger.com) at March 26, 2018 09:19 AM

March 23, 2018

Daniel Berrange

ANNOUNCE: gtk-vnc 0.7.2 release

I’m pleased to announce a new release of GTK-VNC, version 0.7.2. The release focus is on bug fixing, and addresses an important regression in TLS handling from the previous release.

  • Deprecated the manual python2 binding in favour of GObject introspection. It will be deleted in the next release.
  • Emit led state notification on connect
  • Fix incorrect copyright notices
  • Simplify shifted-tab key handling
  • Don’t short circuit TLS credential request
  • Improve check for keymap under XWayland
  • Update doap description of project
  • Modernize RPM specfile

    Thanks to all those who reported bugs and provided patches that went into this new release.

    by Daniel Berrange at March 23, 2018 02:20 PM

    March 17, 2018

    KVM on Z

    KVM on z14 features

    With the latest addition to the IBM Z family announced, here is a list of features in past releases of the Linux kernel, QEMU and libvirt that support specific capabilities of the new hardware generation, all activated by default in the z14 CPU model:
    • Instruction Execution Protection
      This feature provides KVM hypervisor support for the Instruction Execution Protection (IEP) facility in the z14. The IEP prevents code execution from memory regions marked as non-executable, improving the security model.
      Other than activating/deactivating this feature in the applicable CPU models in QEMU (which holds true for most hardware-related features on IBM Z in general; see the example after this list), there are no further switches associated with this feature.
      Requires Linux kernel 4.11 in the KVM host and guests, as well as QEMU v2.10 (host only).
      In the z14 CPU model, the respective feature is:
        iep       Instruction-execution-protection facility
    • SIMD Extensions
      Following up on the SIMD instructions introduced with the previous z13 model, the new z14 provides further vector instructions, which can again be used in KVM guests.
      These new vector instructions can be used to improve decimal calculations as well as for implementing high performance variants of certain cryptographic operations.
      Requires Linux kernel 4.11 as well as QEMU v2.10 in the KVM host, and binaries or a respective Java Runtime Environment in guests using the new vector instructions.
      In the z14 CPU model, the respective feature is:
        vxpd      Vector packed decimal facility
        vxeh      Vector enhancements facility
    • Keyless Guest Support
      This feature supports the so-called Keyless Subset (KSS) facility, a new feature of the z14 hardware. With the KSS facility enabled, a host is not required to perform the (costly) storage key initialization and management for KVM guests, unless a guest issues a storage key instruction.
      Requires Linux kernel 4.12 in the KVM host. As for the guests, note that starting with SLES12SP1, RHEL7.2 and Ubuntu 16.04, Linux on IBM Z does not issue any storage key operations anymore.
      This feature does not have a separate entry in the z14 CPU model.
    • CPUMF Basic Sample Configuration Level Indication
      Basic mode samples as defined in "The Load-Program-Parameter and the CPU-Measurement Facilities" (SA23-2260) do not provide an indication whether the sample was taken in a KVM host or guest. Beginning with z14, the hardware provides an indication of the configuration level (level of SIE, e.g. LPAR or KVM). This item exploits this information to make the perf guest/host decision reliable.
      Requires Linux kernel 4.12 in the KVM host.
      There is no separate entry in the z14 CPU model, since this feature applies to the host only.
    • Semaphore assist
      Improves performance of semaphore locks.
      Requires Linux kernel 4.7 and QEMU v2.10 in the KVM host. Exploitation in Linux kernels in guests is still in progress here, scheduled for 4.14.
      In the z14 CPU model, the respective feature is:
        sema      Semaphore-assist facility
    • Guarded storage
      This feature is specifically aimed at Java Virtual Machines running in KVM guests to run with fewer and shorter pauses for garbage collection.
      Requires Linux kernel 4.12 and QEMU 2.10 in the KVM host, and a Java Runtime Environment with respective support in the guests.
      In the z14 CPU model, the respective feature is:
        gs        Guarded-storage facility
    • MSA Updates
      z14 introduces 3 new Message Security Assists (MSA) for the following functionalities:
          MSA6: SHA3 hashing
          MSA7: A True Random Number Generator (TRNG)
          MSA8: The CIPHER MESSAGE WITH AUTHENTICATION instruction,
                which provides support for the Galois-counter-mode (GCM)
      MSA6 and MSA7 require Linux kernel 4.7, while MSA8 requires Linux kernel 4.12. All require QEMU v2.10 in the KVM host. These features can be exploited in KVM guests' kernels and userspace applications independently (i.e. a KVM guest's userspace applications can take advantage of these features irrespective of the guest's kernel version).
      In the z14 CPU model, the respective features are:
        msa6      Message-security-assist-extension 6 facility
        msa7      Message-security-assist-extension 7 facility
        msa8      Message-security-assist-extension 8 facility
    • Compression enhancements
      New instructions improve compression capabilities and performance.
      Requires Linux kernel 4.7 in the KVM host.
      In the z14 CPU model, the respective features are:
        opc       Order Preserving Compression facility
        eec       Entropy encoding compression facility
    • Miscellaneous instructions
      Details on these instructions are to be published in the forthcoming z14 Principles of Operation (PoP).
      Requires Linux kernel 4.7 and QEMU 2.10 in the KVM host, and binaries that were compiled for the z14 instruction set using binutils v2.28 and gcc v7.1 in the guests.
      In the z14 CPU model, the respective feature is:
        minste2   Miscellaneous-instruction-extensions facility 2
    Note: All versions specified are minimum versions.
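    As an illustration of how such facilities are switched (the invocations below are a sketch of QEMU's generic CPU model handling, not commands from the items above), the features listed here appear as boolean properties of the CPU model:

        # run a guest with the full z14 model (the KVM host must offer all of its features)
        qemu-system-s390x -machine s390-ccw-virtio,accel=kvm -cpu z14 ...

        # the same model with the instruction-execution-protection facility disabled
        qemu-system-s390x -machine s390-ccw-virtio,accel=kvm -cpu z14,iep=off ...

        # list the CPU models known to this QEMU binary
        qemu-system-s390x -cpu help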

    Further features will be announced in future blog posts as usual as they find their way into the respective Open Source projects.
    Also, don't forget to check this blog entry with further details on z14 in general and Linux on z in particular.

    by Stefan Raspl (noreply@blogger.com) at March 17, 2018 12:28 PM

    March 12, 2018

    Fabian Deutsch

    Running Ubuntu on Kubernetes with KubeVirt v0.3.0

    You have this image of a VM which you want to run - alongside containers - why? Well, you need it. Some people would say it’s dope, but sometimes you really need it, because it has an app you want to integrate with pods.

    Here is how you can do this with KubeVirt.

    1 Deploy KubeVirt

    Deploy KubeVirt on your cluster - or follow the demo guide to set up a fresh minikube cluster.

    2 Download Ubuntu

    While KubeVirt comes up (check with kubectl get --all-namespaces pods), download Ubuntu Server.

    3 Install kubectl plugin

    Make sure to have the latest (or at least a recent) kubectl tool installed, and install the pvc plugin:

    curl -L https://github.com/fabiand/kubectl-plugin-pvc/raw/master/install.sh | bash
    

    4 Create disk

    Upload the Ubuntu server image:

    $ kubectl plugin pvc create ubuntu1704 1Gi $PWD/ubuntu-17.04-server-amd64.iso disk.img
    Creating PVC
    persistentvolumeclaim "ubuntu1704" created
    Populating PVC
    pod "ubuntu1704" created
    total 701444
    701444 -rw-rw-r--    1 1000     1000      685.0M Aug 25  2017 disk.img
    Cleanup
    pod "ubuntu1704" deleted
    

    5 Create and launch VM

    Create a VM:

    $ kubectl apply -f -
    apiVersion: kubevirt.io/v1alpha1
    kind: VirtualMachinePreset
    metadata:
      name: large
    spec:
      selector:
        matchLabels:
          kubevirt.io/size: large
      domain:
        resources:
          requests:
            memory: 1Gi
    ---
    apiVersion: kubevirt.io/v1alpha1
    kind: OfflineVirtualMachine
    metadata:
      name: ubuntu
    spec:
      running: true
      selector:
        matchLabels:
          guest: ubuntu
      template:
        metadata:
          labels: 
            guest: ubuntu
            kubevirt.io/size: large
        spec:
          domain:
            devices:
              disks:
                - name: ubuntu
                  volumeName: ubuntu
                  disk:
                    bus: virtio
          volumes:
            - name: ubuntu
              persistentVolumeClaim:
                claimName: ubuntu1704
    

    6 Connect to VM

    $ ./virtctl-v0.3.0-linux-amd64 vnc --kubeconfig ~/.kube/config ubuntu
    

    Final notes - This is booting the Ubuntu ISO image. But this flow should work for existing images, which might be much more useful.
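    If the VNC console stays blank, it can help to first check that the VM object and its launcher pod actually came up (a sketch; the resource names below match KubeVirt v0.3.0 and may differ in later versions):

    # the definition object and the running VM object derived from it
    kubectl get offlinevirtualmachines
    kubectl get virtualmachines

    # the pod that hosts libvirt/QEMU for the guest
    kubectl get pods | grep virt-launcher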

    March 12, 2018 03:56 PM

    March 06, 2018

    Fabian Deutsch

    v2v-job v0.2.0 POC for importing VMs into KubeVirt

    KubeVirt is becoming usable. And to make it easier to use, it would be nice to be able to import existing VMs. After all, migration is a strong point of KubeVirt.

    virt-v2v is the tool of choice to convert some guest to run on the KVM hypervisor. What a great fit.

    Thus recently I started a little POC to check if this would really work.

    This post is just to wrap it up, as I just tagged v0.2.0 and finished a nice OVA import.

    What the POC does:

    • Take a URL pointing to an OVA
    • Download and convert the OVA to a domxml and raw disk image
    • Create a PVC and move the raw disk image to it
    • Create an OfflineVirtualMachine from the domxml using xslt

    This is pretty straightforward and currently lives in a Job which can be found here: https://github.com/fabiand/v2v-job

    It’s actually using an OpenShift Template, but so far it only works on Kubernetes, because I didn’t finish the RBAC profiles. However, using the oc tool you can run it even on Kubernetes, which lacks Template support:

    $ oc process --local -f manifests/template.yaml \
        -p SOURCE_TYPE=ova \
        -p SOURCE_NAME=http://192.168.42.1:8000/my.ova \
      | kubectl apply -f -
    serviceaccount "kubevirt-privileged" created
    job "v2v" created
    
    

    The interested reader can take a peek at the whole process in this log.
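    If you run it yourself, you can follow along with standard tooling as well (assuming the job is called v2v as above; the job-name label is added automatically by the Job controller):

    kubectl get jobs
    kubectl logs -f $(kubectl get pods -l job-name=v2v -o name | head -n 1)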

    And btw - this little but awesome patch on libguestfs by Pino will help this job to auto-detect - well, guess - the guest operating system and set the OfflineVirtualMachine annotations correctly, so that the right VirtualMachinePresets can then be applied at runtime to launch the guest with optimized defaults.

    March 06, 2018 10:41 AM

    March 02, 2018

    Marcin Juszkiewicz

    OpenStack ‘Queens’ release done

    The OpenStack community released the ‘Queens’ version this week. IMHO it is quite an important moment for the AArch64 community as well, because it works out of the box for us.

    Gone are things like setting hw_firmware_type=uefi for each image you upload to Glance — Nova assumes UEFI to be the default firmware on AArch64 (unless you set the variable to a different value for some reason). This simplifies things, as users do not have to worry about it, and we should have fewer support questions on new setups of Linaro Developer Cloud (which will be based on ‘Queens’ instead of ‘Newton’).
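    For comparison, on a pre-Queens cloud you would still have to tag every AArch64 image yourself, roughly like this (the image name is just an example):

    openstack image set --property hw_firmware_type=uefi my-aarch64-image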

    There is a working graphical console if your guest image uses a properly configured kernel (4.14 from Debian/stretch-backports works fine; 4.4 from Ubuntu/xenial, used by CirrOS, does not have graphics enabled). A handy feature which some users had already asked for.

    The sad part is the state of live migration on AArch64. It simply does not work through the whole stack (Nova, libvirt, QEMU), because we have no idea which exact CPU we are running on and how compatible it is with other CPUs. In theory, live migration between the same type of processors (like XGene1 -> XGene1) should be possible, but we do not even have that level of information available. More information can be found in bug 1430987 reported against libvirt.

    The less sad part? We now set cpu_model to ‘host-passthrough’ by default (in Nova), so no matter which deployment method is used, it should work out of the box.
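    If you want to set it explicitly (for example on an older release), the knob lives in the [libvirt] section of nova.conf; a sketch, using crudini purely for illustration (note that Nova's libvirt driver spells the option cpu_mode):

    crudini --set /etc/nova/nova.conf libvirt cpu_mode host-passthrough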

    When it comes to building (Kolla) and deploying (Kolla Ansible), most of the changes were done during the Pike cycle. During the Queens cycle most of the changes were small tweaks here and there. I think that our biggest change was convincing everyone in Kolla(-ansible) to migrate from MariaDB 10.0.x (usually from external repositories) to 10.1.x taken from distribution (Debian) or from RDO.

    What will Rocky bring? Better hotplug for PCI Express machines (AArch64/virt, x86/q35 models) is one thing. I hope that the live migration situation will improve as well.

    by Marcin Juszkiewicz at March 02, 2018 01:22 PM

    February 23, 2018

    Fabian Deutsch

    KubeVirt v0.3.0-alpha.3: Kubernetes native networking and storage

    First post for quite some time - a side effect of being busy streamlining our KubeVirt user experience.

    KubeVirt v0.3.0 was not released at the beginning of the month.

    That release was intended to be a little bigger, because it included a large architecture change (for the better). The change itself was amazingly friendly and went in without many problems - even if it took some time.

    But the work building upon this patch in the storage and network areas was delayed and didn’t make it in time. Thus we skipped the release in order to let storage and network catch up.

    The important thing about these two areas is that KubeVirt was able to connect a VM to a network and boot off an iSCSI target, but this was not really tightly integrated with Kubernetes.

    Now, just this week two patches landed which actually do integrate these areas with Kubernetes.

    Storage

    The first is storage - mainly written by Artyom, and finalized by David - which allows a user to use a persistent volume as the backing storage for a VM disk:

    metadata:
      name: myvm
    apiVersion: kubevirt.io/v1alpha1
    kind: VirtualMachine
    spec:
      domain:
        devices:
          disks:
          - name: mypvcdisk
            volumeName: mypvc
            lun: {}
      volumes:
        - name: mypvc
          persistentVolumeClaim:
            claimName: mypvc
    

    This means that any storage solution supported by Kubernetes to provide PVs can be used to store virtual machine images. This is a big step forward in terms of compatibility.
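    For completeness, the mypvc claim referenced above is just an ordinary Kubernetes PersistentVolumeClaim; a minimal sketch (size and access mode are placeholders for whatever your cluster provides) could be applied like this:

    $ kubectl apply -f -
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: mypvc
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi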

    This actually works by taking this claim and attaching it to the VM’s pod definition, so that the kubelet mounts the respective volume into the VM’s pod. Once that is done, KubeVirt will take care of connecting the disk image within that PV to the VM itself. This is only possible because the architecture change moved libvirt to run inside every VM pod, and thus allows the VM to consume the pod’s resources.

    Side note, another project is in progress to actually let a user upload a virtual machine disk to the cluster in a convenient way: https://github.com/kubevirt/containerized-data-importer.

    Network

    The second change is about network which Vladik worked on for some time. This change also required the architectural changes, in order to allow the VM and libvirt to consume the pod’s network resource.

    Just like with pods, the user does not need to do anything to get basic network connectivity. KubeVirt will connect the VM to the NIC of the pod in order to give it the most compatible integration. Thus you are now able to expose a TCP or UDP port of the VM to the outside world using regular Kubernetes Services.
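    As a sketch of what that can look like (this assumes you put a label of your own, here guest: myvm, on the VM template and that KubeVirt propagates it to the launcher pod - an assumption on my part for v0.3.0), a plain Service can then select the VM:

    $ kubectl apply -f -
    apiVersion: v1
    kind: Service
    metadata:
      name: myvm-ssh
    spec:
      selector:
        guest: myvm
      ports:
        - protocol: TCP
          port: 22
          targetPort: 22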

    A side note here is that despite this integration we are now looking to enhance this further to allow the usage of sidecars like Istio.

    Alpha Release

    The three changes - and their delay - caused the delay of v0.3.0 - which will now be released at the beginning of March. But we have done a few pre-releases in order to allow interested users to try this code right now:

    KubeVirt v0.3.0-alpha.3 is the most recent alpha release and should work fairly well.

    More

    But these items were just a small fraction of what we have been doing.

    If you look at the kubevirt org on GitHub you will notice many more repositories there, covering storage, cockpit, and deployment with ansible - and it will be another post to write about how all of this is fitting together.

    Welcome aboard!

    KubeVirt is really speeding up and we are still looking for support. So if you are interested in working on a bleeding edge project tightly coupled with Kubernetes, but also having its own notion, and a great team, then just reach out to me.

    February 23, 2018 11:06 AM
