Blogging about open source virtualization

News from QEMU, KVM, libvirt, libguestfs, virt-manager and related tools

July 02, 2021

Stefan Hajnoczi

Slides available for "Bring Your Own Virtual Devices: Frameworks for Software and Hardware Device Virtualization"

The PDF slides for my "Bring Your Own Virtual Devices: Frameworks for Software and Hardware Device Virtualization" talk from the 16th Workshop on Virtualization in High-Performance Cloud Computing are now available.

This talk covers out-of-process device interfaces including vhost (kernel), vhost-user, Linux VFIO, mdev, vfio-user, vDPA, and VDUSE. It gives a brief overview of each interface, how it works, and how to develop your own devices.

The growing number of out-of-process device interfaces available in QEMU/KVM can make it hard to understand and compare them. Each of these interfaces is designed for different use cases: for example, whether you want to pass through hardware or implement the device in software, and whether you want to implement your device in the host kernel or in host userspace. This talk will give you the necessary knowledge to compare these interfaces yourself so you can decide which one is most appropriate for your use case.

For more information about the design of out-of-process device interfaces, see also my previous blog post about requirements for out-of-process devices.

by Stefan Hajnoczi (noreply@blogger.com) at July 02, 2021 03:42 PM

June 30, 2021

KVM on Z

Webinar: 2021 Linux on IBM Z and LinuxONE Technical Client Workshop

Join us for the 2021 Linux on IBM Z and LinuxONE Virtual Client Workshop!

Abstract

Get the latest news about the Linux exploitation and advantages of the IBM Z and LinuxONE platform in this technical workshop. Presented by our developers and solution architects, the training focuses on the latest news and technical information for Linux on IBM Z, LinuxONE, z/VM, and KVM, such as Red Hat OpenShift Container Platform, Red Hat OpenShift Container Storage, Security, Performance, Networking and Virtualization. You will have the chance to interact directly with IBM developers and solution experts during the event, especially in the interactive workgroup sessions, which will be held on the last day.

This workshop is free of charge.

Agenda Highlights
  • What's New on RHOCP on IBM Z & LinuxONE 
  • Hybrid Cloud and why RHOCP on IBM Z & LinuxONE can enable highest flexibility
  • Introduction of Red Hat OpenShift Container Storage
  • Red Hat OpenShift Container Platform on IBM Z & LinuxONE: CPU Consumption Demystified
  • Cloud Ready Development: how to profit from multi-architecture capabilities and several features in RHOCP on IBM Z
  • FUJITSU Enterprise Postgres: Finally! An OCP-certified Database for Linux on IBM Z and LinuxONE that exploits our hardware capabilities
  • Reduce your IT costs with IBM LinuxONE
  • How IBM Cloud Paks drive business value and lower IT costs
  • z/VM Platform Update
  • Linux and KVM on IBM Z and LinuxONE - What's New
  • kdump - Recommendations for Linux on IBM Z and LinuxONE
  • Elasticsearch on IBM Z - Performance Experiences, Hints and Tips
  • Crypto Update
  • Fully homomorphic encryption Introduction and Update
  • Putting SMC-Dv2 to work
  • Java on IBM Z - News, Updates, and other Pulp Fiction
  • Various workgroup sessions

Schedules & Registration

Americas, Europe, Middle East & Africa
July 12-16, every day 8:30 - 11:30 AM EST / 14:30 - 17:30 CET
Register here.

Asia Pacific
July 27-29, 2021, every day 8:30 - 11:30 AM CET / 2:30 - 5:30 PM Singapore time
Register here.

by Stefan Raspl (noreply@blogger.com) at June 30, 2021 10:08 PM

June 25, 2021

KVM on Z

SLES 15 SP3 Released

SUSE Linux Enterprise Server 15 SP3 is out! See the official announcement and the release notes. It provides

  • QEMU v5.2, supporting virtio-fs on IBM Z
  • libvirt v7.1
For a detailed list of IBM Z and LinuxONE-specific (non-KVM) features see here.

by Stefan Raspl (noreply@blogger.com) at June 25, 2021 03:06 PM

June 21, 2021

Gerd Hoffmann

My kubernetes test cluster, overview.

This is an article series about my kubernetes test cluster.

  1. Cluster node installation on fedora and basic cluster setup.
  2. Planned: Setup ingress and other useful cluster services.

by Gerd Hoffmann at June 21, 2021 10:00 PM

My kubernetes test cluster, part one — install.

I'm running a kubernetes test cluster in my home network. It is used to learn kubernetes and try out various things, for example kata containers and kubevirt. Not used much (yet?) for actual development.

After mentioning it here and there some people asked for details, so here we go. I'll go describe my setup, with some kubernetes and container basics sprinkled in.

This is part one of an article series and will cover cluster node installation and basic cluster setup.

The cluster nodes

Most cluster nodes are dual-core virtual machines. The control-plane node (formerly known as master node) has 8G of memory, most worker nodes have 4G of memory. It is a mix of x86_64 and aarch64 nodes. Kubernetes names these architectures amd64 and arm64, which is easily confused, so take care 😎.

The virtual nodes use bridged networking, so there is no separate network: they simply show up on my 192.168.2.0/24 home network just like the physical machines connected to it. They get a static IP address assigned by the DHCP server, and I can easily ssh into each node.

All cluster nodes run Fedora 34, Server Edition.

Node configuration

I have a git repository with some config files, to simplify rebuilding a cluster node from scratch. The repository also has some shell scripts with the commands listed later in this blog post.

Let's go over the config files one by one.

$ cat /etc/sysctl.d/kubernetes.conf
kernel.printk=4
net.ipv4.ip_forward=1
net.bridge.bridge-nf-call-iptables=1
net.bridge.bridge-nf-call-ip6tables=1

This is needed for kubernetes networking.

$ cat /etc/modules-load.d/kubernetes.conf
# networking
bridge
br_netfilter
# kata
vhost
vhost_net
vhost_vsock

Load some kernel modules needed at boot. Again for kubernetes networking. Also vhost support which is needed by kata containers.
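
To activate both the modules and the sysctl settings on an already running node without a reboot, something along these lines does the trick:

$ sudo modprobe -a bridge br_netfilter vhost vhost_net vhost_vsock
$ sudo sysctl --system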

$ cat /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-$basearch
enabled=0
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg

The upstream kubernetes rpm repository. Note this is not enabled (enabled=0) because I don't want normal fedora system updates to also update the kubernetes packages. For installing/updating kubernetes packages I can enable the repo using dnf --enablerepo=kubernetes ....
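
For example, a later update of the kubernetes packages could look like this (a sketch; the package list matches what gets installed below):

$ sudo dnf update --enablerepo=kubernetes kubectl kubeadm kubelet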

Package installation

Given I want to play with different container runtimes I've decided to use cri-o, which allows me to do just that. Fedora has packages; they are in a module though, so that must be enabled first.

$ sudo dnf module list cri-o
$ sudo dnf module enable cri-o:${version}

The cri-o version should match the kubernetes version you want to run. That is not the case in my cluster right now because I only learned that after setting up the cluster, so obviously the sky isn't falling in case they don't match. The next time I update the cluster I'll bring them into sync.

Now we can go install the packages from the fedora repos: cri-o, runc (the default container runtime), and a handful of useful utilities.

$ sudo dnf install podman skopeo buildah runc cri-o cri-tools \
    containernetworking-plugins bridge-utils telnet jq

Next in line are the kubernetes packages from the google repo. The repo has all versions, not only the most recent, so you can ask for the version you want and you'll get it. As mentioned above the repo must be enabled on the command line.

$ sudo dnf install --enablerepo=kubernetes \
    {kubectl,kubeadm,kubelet}-${version}

Configure and start services

kubelet needs some configuration, my git repo with the config files has this:

$ cat /etc/sysconfig/kubelet
KUBELET_EXTRA_ARGS=--cgroup-driver=systemd --fail-swap-on=false

Asking kubelet to delegate all cgroups work to systemd is needed to make kubelet work with cgroups v2. With that in place we can reload the configuration and start the services:

$ sudo systemctl daemon-reload
$ sudo systemctl enable --now crio
$ sudo systemctl enable --now kubelet

Kubernetes cluster nodes need a few firewall entries so the nodes can speak to each other. I was too lazy to set up all that and just turned off the firewall. The cluster isn't reachable from the internet anyway, so 🤷.

$ sudo systemctl disable --now firewalld

Initialize the control plane node

All the preparation steps up to this point are the same for all cluster nodes. Now we go initialize the control plane node.

$ sudo kubeadm init \
	--pod-network-cidr=10.85.0.0/16 \
	--kubernetes-version=${version} \
	--ignore-preflight-errors=Swap

I picked the 10.85.0.0/16 network because that happens to be the default network used by cri-o, see /etc/cni/net.d/100-crio-bridge.conf.

This command will take a while. It will pull kubernetes container images from the internet, start them using the kubelet service, and finally initialize the cluster.

kubeadm will write the config file needed to access the cluster with kubectl to /etc/kubernetes/admin.conf. It'll make you cluster root; kubernetes calls this the cluster-admin role in the rbac (role based access control) scheme.

For my devel cluster I simply use that file as-is instead of setting up some more advanced user authentication and access control. I place a copy of the file at $HOME/.kube/config (the default location used by kubectl). Copying the file to other machines works, so I can also run kubectl on my laptop or workstation instead of ssh'ing into the control plane node.
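
For reference, putting the copy in place boils down to:

$ mkdir -p $HOME/.kube
$ sudo cp /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config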

Time to run the first kubectl command to see whether everything worked:

$ kubectl get nodes
NAME                        STATUS   ROLES                  AGE   VERSION
k8s-node1.home.kraxel.org   Ready    control-plane,master   5m    v1.21.1

Yay! First milestone.

Side note: single node cluster

By default kubeadm init adds a taint to the control plane node so kubernetes won't schedule pods there:

$ kubectl describe node k8s-node1.home.kraxel.org | grep NoSchedule
Taints:             node-role.kubernetes.io/master:NoSchedule

If you want to go for a single node cluster all you have to do is remove that taint so kubernetes will schedule and run your pods directly on your new and shiny control plane node. The magic words for that are:

$ kubectl taint nodes --all node-role.kubernetes.io/master-

Done. You can start playing with the cluster now.

If you want to add one or more worker nodes to the cluster instead and watch kubernetes distribute the load, read on ...

Initialize worker nodes

The worker nodes need a bootstrap token to authenticate when they want to join the cluster. The kubeadm init command creates a token and will also print the kubeadm join command needed to join. If you don't have that any more, no problem, you can always get the token later using kubeadm token list. In case the token did expire (they are valid for a day or so) you can create a new one using kubeadm token create. Besides the token kubeadm also needs the hostname and port to be used to connect to the control plane node. The default port for the kubernetes API is 6443, so ...

$ sudo kubeadm join "k8s-node1.home.kraxel.org:6443" \
	--token "${token}" \
	--discovery-token-unsafe-skip-ca-verification \
	--ignore-preflight-errors=Swap

... and check results:

$ kubectl get nodes
NAME                        STATUS   ROLES                  AGE   VERSION
k8s-node1.home.kraxel.org   Ready    control-plane,master   22m   v1.21.1
k8s-node2.home.kraxel.org   Ready    <none>                 2m    v1.21.1

The node may show up in "NotReady" state for a while if it has already registered but not yet completed initialization.

Now repeat that procedure on every node you want to add to the cluster.

Side note: scripting kubernetes with json

Both kubeadm and kubectl can return the data you ask for in various formats. By default they print a nice, human-readable table to the terminal. But you can also ask for yaml, json and others using the -o or --output switch. Specifically json is very useful for scripting: you can pipe the output through the jq utility (you might have noticed it in the list of packages to install at the start of this blog post) to fish out the items you actually need.

For starters two simple examples. You can get the raw bootstrap token this way:

$ kubeadm token list -o json | jq -r .token

Or check out some node details:

$ kubectl get node k8s-node1.home.kraxel.org -o json | jq .status.nodeInfo
{
  "architecture": "amd64",
  "bootID": "a18dcad0-3427-4a12-a238-7b815fe45ea0",
  "containerRuntimeVersion": "cri-o://1.19.0-dev",
  "kernelVersion": "5.12.9-300.fc34.x86_64",
  "kubeProxyVersion": "v1.21.1",
  "kubeletVersion": "v1.21.1",
  "machineID": "a2b3a7ba9ec54b2d84b66d70156702d2",
  "operatingSystem": "linux",
  "osImage": "Fedora 34 (Thirty Four)",
  "systemUUID": "7f4854c4-2b92-4fea-9bb7-3d28537af675"
}

There are way more possible use cases. When reading config and patch files kubectl likewise accepts both yaml and json as input.
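
For example, a small label patch can be passed to kubectl as inline json just as well as yaml (the label below is made up):

$ kubectl patch node k8s-node2.home.kraxel.org --type=merge \
    -p '{"metadata":{"labels":{"example.org/rack":"1"}}}'   # example label only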

Pod networking with flannel

There is one more basic thing to set up: installing a network fabric to get the pod network going. This is needed to allow pods running on different cluster nodes to talk to each other. When running a single node cluster this can be skipped.

There are a bunch of different solutions out there; I've settled on flannel in "host-gw" mode. First download kube-flannel.yml from github. Then tweak the configuration: Make sure the network matches the pod network passed to kubeadm init, and change the backend. Here are the changes I've made:

--- kube-flannel.yml	2021-04-26 11:15:09.820696429 +0200
+++ kube-flannel-local.yml	2021-04-26 11:15:18.403551923 +0200
@@ -125,9 +125,9 @@
     }
   net-conf.json: |
     {
-      "Network": "10.244.0.0/16",
+      "Network": "10.85.0.0/16",
       "Backend": {
-        "Type": "vxlan"
+        "Type": "host-gw"
       }
     }
 ---

Now apply the yaml file to install flannel:

$ kubectl apply -f kube-flannel-local.yml

The flannel pods are created in the kube-system namespace; you can check the status this way:

$ kubectl get pods -n kube-system
NAME                            READY   STATUS    RESTARTS   AGE
[ ... ]
kube-flannel-ds-5l7x6           1/1     Running   0          3m
kube-flannel-ds-7xjtz           1/1     Running   0          3m
[ ... ]

Once all pods are up and running your pod network should be working. One nice thing with "host-gw" mode is that it uses the standard network routing of the cluster nodes, so you can inspect the state with standard linux tools:

$ ip route list | grep 10.85
10.85.0.0/24 dev cni0 proto kernel scope link src 10.85.0.1 
10.85.1.0/24 via 192.168.2.112 dev enp2s0
[ ... ]

Each cluster node gets a /24 subnet of the pod network assigned. The cni0 bridge device carries the subnet of the local node. The other subnets are routed to the other cluster nodes. Pretty straightforward.
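
The same assignment is visible on the kubernetes side, for example:

$ kubectl get nodes -o custom-columns=NAME:.metadata.name,CIDR:.spec.podCIDR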

Rounding up

So, that's it for part one. The internet has tons of kubernetes tutorials and examples which you can try on the cluster now. One good starting point is Kubernetes by example.

My plan for part two of this article series is installing and configuring some useful cluster services, with one of them being ingress which is needed to access your cluster services with a web browser.

by Gerd Hoffmann at June 21, 2021 10:00 PM

June 20, 2021

Stefan Hajnoczi

My performance benchmarking workflow (2021)

Benchmarking computer systems is time-consuming because setting up the necessary environment involves a lot of work. Over time I have built a workflow that mitigates the cost of setting up benchmarks and allows me to analyze performance more effectively. This blog post covers my workflow as of 2021.

Performance investigations often follow these steps:

  1. Set up hardware and software.
  2. Run initial benchmarks to verify that the bottleneck under investigation is being triggered.
  3. Collect a full set of benchmark results and monitoring data.
  4. Analyze results and form a hypothesis about the bottleneck.
  5. Implement a proof-of-concept optimization to test the hypothesis.
  6. Go to Step 3 until the desired benchmark results are reached, keeping those optimizations that helped.

This is a long process that is costly to pause/resume or replicate again in the future. Setting up hardware and software manually is both time-consuming and error-prone. Therefore we don't want to do it more than once. There is a risk that replicating the benchmark on another machine will fail to produce identical results due to differences in environments.

The consequence of high-overhead processes is that we minimize their use since we cannot afford to run through the process as often as we'd like. This means we cannot answer all the performance questions we'd like to and therefore our understanding is limited. We cannot discover all the truths that would enable us to make performance improvements.

A more lightweight process would encourage experimentation and lead to higher productivity.

An ideal workflow

In a low-overhead world I would like to do the following:

  1. Set up hardware and software once only and be able to return to that state again in the future at the press of a button.
  2. Capture the full benchmarking environment so the configuration can be inspected and modified easily.
  3. Store benchmark results so that each run is available for further analysis in the future.

The workflow is actually quite similar to developing code with git:

  1. Create a topic branch for this performance investigation.
  2. Add an environment definition to produce the desired hardware and software state.
  3. Run the benchmark and collect the results.
  4. Commit the environment and results.
  5. Go to Step 2 to modify the environment (e.g. apply proof-of-concept patches to software) and repeat.

Since the performance investigation is captured in a git branch it's easy to switch to another investigation without losing history.

This is actually what I do! Git provides the storage and time machine functionality for easily pausing/resuming or replicating performance investigations.
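
In shell terms a single iteration looks roughly like this (branch and file names are just placeholders; the playbook part is covered in the next section):

$ git checkout -b blk-polling-investigation   # placeholder branch name
$ $EDITOR playbook.yml                        # adjust the environment definition
$ ansible-playbook -i inventory playbook.yml
$ git add playbook.yml notebook/
$ git commit -m "fio results with polling enabled"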

Ansible

Ansible provides the automation system necessary to put hardware and software into the desired state for benchmarking. Ansible's killer feature is the large ecosystem of modules for handling tasks like installing packages, configuring virtual machines and containers, etc. I find Ansible more productive than Python or shell scripting thanks to Ansible's modules collection.

I've begun collecting Ansible tasks for Linux KVM development in virt-tasks. If you're wondering what the configuration for running a benchmark looks like, here is an Ansible playbook that builds QEMU and a guest kernel, creates a Fedora 34 virtual machine, runs the fio disk I/O benchmark, and collects the results:

---
- hosts: hosts
  tasks:
    - include_tasks: tasks/build-qemu.yml
      vars:
        - repo: https://gitlab.com/qemu-project/qemu.git
        - version: v6.0.0

    - name: create disk image
      include_tasks: tasks/virt-builder-create-image.yml
      vars:
        - os_version: fedora-34
        - size: 32G
        - output: /var/lib/libvirt/images/test.img
        - format: raw

    - name: build guest kernel
      include_tasks: tasks/build-kernel.yml
      vars:
        - repo: https://gitlab.com/stefanha/linux.git
        - version: cpuidle-haltpoll-virtqueue
        - config_src_path: files/.config

    - name: stop vm
      virt:
        name: test
        state: shutdown
      ignore_errors: yes

    - name: start vm
      include_tasks: tasks/start-vm.yml
      vars:
        - xml: "{{ lookup('file', 'files/test.xml') }}"
        - host: 192.168.122.192

- hosts: vms
  tasks:
    - name: install fio and rsync
      dnf:
        state: present
        name:
          - fio
          - rsync

    - name: run fio
      script: files/fio.sh

    - name: fetch fio output files
      synchronize:
        src: fio-output/
        dest: notebook/fio-output/poll_source-off
        mode: pull

    - name: run fio
      script: files/fio.sh --enable

    - name: fetch fio output files
      synchronize:
        src: fio-output/
        dest: notebook/fio-output/poll_source-on
        mode: pull

- hosts: hosts
  tasks:
    - name: stop vm
      virt:
        name: test
        state: shutdown

An important point is that the Ansible playbook sets up the full environment, runs all benchmarks, and collects the results. Unlike a playbook that runs a single benchmark this one runs the full suite so that the playbook captures the entire environment that produced the results. This distinction is important when tweaking the benchmark configuration or trying out proof-of-concept optimizations. Each git commit needs to encompass the full environment so that the performance investigation is reproducible and can be resumed in the future.

Jupyter

I recently started using JupyterLab notebooks for data analysis. It provides a convenient environment for graphing results and organizing them in documents.

Thanks to the official Jupyter container images you can get a full Python data analysis environment running with just one command:


$ podman run --userns=keep-id -e JUPYTER_ENABLE_LAB=yes -p 8888:8888 --rm -v "$PWD":/home/jovyan/work:z jupyter/scipy-notebook

So far I have only scratched the surface of JupyterLab. It works well for visualizing data although the way I currently use it is not much different from writing a Python matplotlib script and running it from the command-line. In time I'll get a better appreciation for its strengths and weaknesses.

Conclusion

A git-based workflow that automates benchmark setup and stores the results goes a long way towards mitigating the high overhead of performance investigations. If I'm interrupted or need to switch to a different machine it's easy to resume the investigation. Having the results and details of the environment stored together makes it possible to revisit benchmark runs in the future to reproduce or tweak them. The combination of git, Ansible, and Jupyter achieves this workflow quite well, but if you're familiar with other tools I'd love to hear!

by Stefan Hajnoczi (noreply@blogger.com) at June 20, 2021 02:04 PM

May 31, 2021

KVM on Z

RHEL 8.4 Released

RHEL 8.4 is out! See the official announcement and the release notes.

KVM is supported via Advanced Virtualization, and provides

  • QEMU v5.2, supporting virtio-fs on IBM Z
  • libvirt v7.0

Furthermore, RHEL 8.4 now supports graphical installation for guest installs. Just add the input and graphics arguments (the first two lines below) to your virt-install command line for a RHEL 8.4 install on a RHEL 8.4 KVM host:

    virt-install --input keyboard,bus=virtio --input mouse,bus=virtio \
      --graphics vnc --video virtio \
      --disk size=8 --memory 2048 --name rhel84 \
      --cdrom /var/lib/libvirt/images/RHEL-8.4.0-20210503.1-s390x-dvd1.iso

And the installation will enter the fancy graphical installer.

Make sure to have package virt-viewer installed on the host, and X forwarding enabled (option -X for ssh).

This new support also allows graphical installs started in cockpit.

by Stefan Raspl (noreply@blogger.com) at May 31, 2021 04:55 PM

May 30, 2021

Gerd Hoffmann

Adding cut+paste support to qemu

The spice project has supported cut+paste for ages; now the rest of qemu is playing catch-up.

Implementation options

So, what are the choices for implementing cut+paste support? Without guest cooperation the only possible way would be to send text as keystrokes to the guest. Which has a number of drawbacks:

  • It works for text only.
  • It is one-way (host to guest) only.
  • Has keyboard mapping problems even when limiting to us-ascii,
    sending unicode (ä ø Я € © 漢字 ❤ 😎) reliably is impossible.
  • Too slow for larger text blocks.

So, this is not something to consider seriously. Instead we need help from the guest, which is typically implemented with some agent process running inside the guest. The options are:

  1. Write a new cut+paste agent.
  2. Add cut+paste support to the qemu guest agent.
  3. Use the spice agent which already supports cut+paste.

Reusing the spice agent has some major advantages. For starters there is no need to write any new guest code for this. Less work for developers and maintainers. Also the agent has been packaged for years by most distributions (typically the package is named spice-vdagent). So it is easily available, making things easier for users, and guest images with the agent installed work out-of-the-box.

The downside is that this is a bit confusing, as you need the spice agent in the guest even when not using spice on the host. So I'm writing this article to address that ...

Some background on spice cut+paste

The spice guest agent is not a single process but two: One global daemon running as system service (spice-vdagentd) and one process (spice-vdagent) running in desktop session context.

The desktop process will handle everything which needs access to your display server. That includes cut+paste support. It will also talk to the system service. The system service in turn connects to the host using a virtio-serial port. It will relay data messages between desktop process and host and also process some of the requests (mouse messages for example) directly.

On the host side qemu simply forwards the agent data stream to the spice client and vice versa. So effectively the spice guest agent can communicate directly with the spice client. It's configured this way:

qemu-system-x86_64 [ ... ] \
  -chardev spicevmc,id=ch1,name=vdagent \
  -device virtio-serial-pci \
  -device virtserialport,chardev=ch1,id=ch1,name=com.redhat.spice.0
  • spicevmc: This is the data channel to the spice client.
  • virtio-serial: The virtio device which manages the ports.
  • virtserialport: The port for the guest/host connection. It'll show up as /dev/virtio-ports/com.redhat.spice.0 inside the guest.

The qemu clipboard implementation.

The central piece of code is the new qemu clipboard manager (ui/clipboard.c). Initially it supports only plain text. The interfaces are designed for multiple data types though, so adding support for more data types later on is possible.

There are three peers which can talk to the qemu clipboard manager:

  • vnc: The vnc server got clipboard support (ui/vnc-clipboard.c), so vnc clients with cut+paste support can exchange data with the qemu clipboard.
  • gtk: The gtk ui got clipboard support too (ui/gtk-clipboard.c) and connects the qemu clipboard manager with your desktop clipboard.
  • vdagent: Qemu got an implementation of the spice agent protocol (ui/vdagent.c), which connects the guest to the qemu clipboard.

This landed in the qemu upstream repo a few days ago and will be shipped with the qemu 6.1 release.

Configure the qemu vdagent

The qemu vdagent is implemented as chardev. It is a drop-in replacement for the spicevmc chardev, and instead of forwarding everything to the spice client it implements the spice agent protocol and parses the messages itself. So only the chardev configuration changes, the virtserialport stays as-is:

qemu-system-x86_64 [ ... ] \
  -chardev qemu-vdagent,id=ch1,name=vdagent,clipboard=on \
  -device virtio-serial-pci \
  -device virtserialport,chardev=ch1,id=ch1,name=com.redhat.spice.0

The vdagent has two options to enable/disable vdagent protocol features:

  • mouse={on,off}: enable/disable mouse messages. When enabled, absolute mouse events can travel this way instead of using a usb or virtio tablet device. Default is on.
  • clipboard={on,off}: enable/disable clipboard support. Default is off (for security reasons).

Future work

No immediate plans right now, but I have some ideas what could be done:

  • Add more peers: Obvious candidates are the other UIs (SDL, cocoa). Possibly also more guest protocols; I think vmware supports cut+paste too (via vmport and agent).
  • Add more data types: Image support is a hot candidate. Chances are high that this involves more than just passing data: spice uses png as baseline image format, whereas vnc uses bmp. So qemu most likely has to do image format conversions.

Maybe I look into them when I find some time. No promise though. Patches are welcome.

by Gerd Hoffmann at May 30, 2021 10:00 PM

May 20, 2021

Gerd Hoffmann

virtio-gpu and qemu graphics in 2021

Time for an update; a few things have happened since the previous update in November 2019.

virtio-gpu features

Progress is rather slow in qemu due to shifted priorities. That doesn't mean virglrenderer development is completely stalled though. crosvm (aka Chrome OS Virtual Machine Monitor) has virtio-gpu support too and is pushing forward virglrenderer development these days. There is good progress in the virglrenderer library (although I don't follow that closely any more these days), in crosvm, and in the linux kernel driver.

Let's go through the feature list from 2019 for a quick update first:

  • shared mappings: Not a separate feature, it's part of blob resources now.
  • blob resources: Specification is final. The Linux kernel got support, crosvm too as far as I know. qemu lags behind.
  • metadata query: Implemented using execbuffer commands, which are the communication path between guest driver and virglrenderer, so virtio-gpu doesn't need any changes for this.
  • host memory: Also part of blob resources.
  • vulkan support: Not fully sure where we stand here. Blob resources are designed with vulkan memory management needs in mind, so there shouldn't be any blockers left in the virtio-gpu guest/host interface. It should "only" be a matter of coding things up in the guest driver and virglrenderer. And adding blob resource support to qemu of course.

Another feature which was added to virtio-gpu and which is not on the 2019 list is UUID support. This allows attaching a UUID to a virtio-gpu resource, which is specifically useful for dma-buf sharing between virtio drivers inside the guest. A guest driver importing a virtio-gpu resource can send the UUID to the host device for lookup, so the host devices can easily share the resource too.

virtio-gpu state in qemu

Not much progress here up to qemu 6.0. There are a few changes merged or in the pipeline for the next release (6.1) though.

First, the virtio-gpu device is being split. It will lose the virgl=on|off property. There will be two devices instead: virtio-vga and virtio-vga-gl (same for the other device variants).

This will de-clutter the source code and it will also remove the hard virglrenderer dependency from virtio-gpu. With modular qemu builds you'll now have two modules: one for the simple virtio-vga device, without any external dependencies, and one for the virtio-vga-gl device, which supports virgl and thus depends on the virglrenderer library.
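
On the command line the switch will look something like this (a sketch; display options depend on your build and desktop):

qemu-system-x86_64 [ ... ] \
  -device virtio-vga-gl \
  -display gtk,gl=on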

Second, blob resource support for the simple virtio-vga device is in progress, and it will bring support for shared resource mappings to qemu. This will accelerate the display path due to less or no copying of pixel data.

You may wonder why this is useful for the non-virgl device. The use case is 3D-rendering using a pci-assigned GPU (or vgpu). In that model the GPU handles only the rendering, virtio-gpu handles the display scanout, and framebuffers are shared between drivers using dma-bufs. If you worked with arm socs before this may sound familiar because they often handle rendering and scanout with separate hardware blocks too. So virt graphics will use the same approach, and userspace (xorg + wayland) luckily is already prepared for it.

modular graphics in qemu

When building qemu with configure --enable-modules you'll get a modular qemu build, which means some functionality is built as separate modules (aka shared objects) which are loaded on demand. This allows distributions to move some functionality to separate, optional sub-packages, which is especially useful for modules which depend on shared libraries. That way you can have a rather lightweight qemu install by simply not installing the sub-packages you don't need.

Block backend drivers were modularized first, audio backend drivers next. The easy UI code was modularized early on too: curses, gtk and sdl.

Last year we took a big step forward in modularizing qemu graphics. Two features got added to qemu: First support for building devices as modules, and second support for modules depending on other modules. This allowed building pretty much all qemu graphics code with external shared library dependencies modular:

  • opengl, egl-headless: depend on mesa libraries and drivers
  • spice-core, spice-app: depend on libspice-server (plus more indirect deps) and on the qemu opengl module
  • qxl device: depends on libspice-server too, and on the qemu spice-core module
  • virtio-vga-gl device: depends on virglrenderer

You can see the results on Fedora 34. Installing qemu-system-x86-core on a fresh system installs 29 packages, summing up to an 18M download and 74M installed size. Installing qemu-system-x86 (which pulls in all module sub-packages) on top adds 125 more packages with 45M download size and 145M installed size.

linux kernel drm driver updates

As already mentioned above the virtio drm driver got support for blob resources. Also a collection of bugfixes all over the place.

The ttm (drm memory manager) got a bunch of cleanups over the last few upstream kernel releases. A bunch of sanity checks have been added along the way, and they flagged a number of issues in the qxl drm driver (which uses ttm to manage video memory). That in turn caused a bunch of bugfixes and some other improvements in the qxl drm driver. Merged upstream for linux 5.13, most important fixes have been backported to 5.10+ stable branches.

by Gerd Hoffmann at May 20, 2021 10:00 PM

April 30, 2021

QEMU project

QEMU version 6.0.0 released

We’d like to announce the availability of the QEMU 6.0.0 release. This release contains 3300+ commits from 268 authors.

You can grab the tarball from our download page. The full list of changes is available in the Wiki.

Highlights include:

  • 68k: new ‘virt’ machine type based on virtio devices
  • ARM: support for ARMv8.1-M ‘Helium’ architecture and Cortex-M55 CPU
  • ARM: support for ARMv8.4 TTST, SEL2, and DIT extensions
  • ARM: ARMv8.5 MemTag extension now available for both system and usermode emulation
  • ARM: support for new mps3-an524, mps3-an547 board models
  • ARM: additional device emulation support for xlnx-zynqmp, xlnx-versal, sbsa-ref, npcm7xx, and sabrelite board models
  • Hexagon: new emulation support for Qualcomm hexagon DSP units
  • MIPS: new Loongson-3 ‘virt’ machine type
  • PowerPC: external BMC support for powernv machine type
  • PowerPC: pseries machines now report memory unplug failures to management tools, as well as retrying unsuccessful CPU unplug requests
  • RISC-V: Microchip PolarFire board now supports QSPI NOR flash
  • Tricore: support for new TriBoard board model emulating Infineon TC27x SoC
  • x86: AMD SEV-ES support for running guests with secured CPU register state
  • x86: TCG emulation support for protection keys (PKS)

  • ACPI: support for assigning NICs to known names in guest OS independently of PCI slot placement
  • NVMe: new emulation support for v1.4 spec with many new features, experimental support for Zoned Namespaces, multipath I/O, and End-to-End Data Protection.
  • virtiofs: performance improvements with new USE_KILLPRIV_V2 guest feature
  • VNC: virtio-vga support for scaling resolution based on client window size
  • QMP: backup jobs now support multiple asynchronous requests in parallel

  • and lots more…

Thank you to everyone involved!

April 30, 2021 12:39 AM

April 27, 2021

Thomas Huth

How to check your shell scripts for portability

This blog post is mainly a reminder to myself about the various possibilities for checking my shell scripts for portability, but maybe it's helpful for some other people, too.

First, why bother? Well, while bash is the default /bin/sh shell on many rpm-based Linux distributions (so it’s also the default shell on the systems I’m developing with and thus referring to here), it’s often not the case on other Linux distributions like Debian or Alpine, and it’s certainly not the case on non-Linux systems like the various *BSD flavors or illumos based installations.

Test your scripts with other shells

The most obvious suggestion is, of course, to run your script with a different shell than bash to see whether it works as expected.

Using dash

Probably the most important thing to check is whether your script works with dash. dash is the default /bin/sh shell on most Debian-based distributions, so if you want to make sure that your script also works on such systems, this is the bare minimum that you should check. The basic idea of dash is to run scripts as fast as possible, without adding bloat to the shell. Therefore the shell is restricted to a minimum with regards to the syntax that it understands, and with regards to the user interface, e.g. the interactive shell prompt is way less comfortable compared with shells like bash.

Since dash is also available in Fedora and in RHEL via EPEL, its installation is as easy as typing something like:

 sudo dnf install dash

Thus checking your scripts with dash is almost no additional effort and thus a very good place to start.
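
Then simply run your script with dash instead of your default shell:

 dash ~/script-to-test.sh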

Using posh

posh stands for “Policy-compliant Ordinary SHell” – it’s another shell that has been developed within the Debian project to check shell scripts for POSIX compliance. Unlike dash, the syntax that this shell understands is really restricted to the bare minimum set that the POSIX standard suggests for shells, so if your script works with posh, you can be pretty sure that it is portable to most POSIX-compliant shells indeed.

Unfortunately, I haven’t seen a pre-compiled binary of posh for Fedora or RHEL yet, and I haven’t spotted a dedicated website for this shell either, so the installation is a little bit more complicated compared to dash. The best thing you can do on a non-Debian based system is to download the tar.xz source package from https://packages.debian.org/sid/posh and compile it on your own:

 wget http://deb.debian.org/debian/pool/main/p/posh/posh_0.14.1.tar.xz
 tar -xaf posh_0.14.1.tar.xz
 cd posh-0.14.1/
 autoreconf -i
 ./configure
 make
 ./posh ~/script-to-test.sh

Using virtual machines

Of course you can also check your scripts on other systems using virtual machines, e.g. on guest installations with FreeBSD, NetBSD, OpenBSD or one of the illumos distributions. But since this is quite some additional effort (e.g. you have to boot a guest and make your script available to it), I normally skip this step – testing with dash and posh catches most of the issues already anyway.

Test your scripts with shell checkers

There are indeed programs that help you to check the syntax of your shell scripts. The two I've been using so far are checkbashisms and ShellCheck:

Using checkbashisms

checkbashisms is a Perl script, maintained again by the Debian people, to check for portability issues that occur when a shell script has been written only with bash in mind. It is part of the devscripts package in Debian. Fortunately, the script is also available in Fedora by installing the so-called devscripts-checkbashisms package (which can also be used on RHEL, by the way). checkbashisms focuses on the syntax constructs that are typically only available in bash, so this is a good and easy way to check your scripts on distributions where /bin/sh is bash by default.
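
Installation and usage are straightforward, e.g. on Fedora:

 sudo dnf install devscripts-checkbashisms
 checkbashisms ~/script-to-test.sh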

Using ShellCheck

ShellCheck is another static shell script analyzer tool which is available for most distributions or can be installed via the instructions provided on its GitHub page. The nice thing about ShellCheck is that they even provide the possibility to check your script via upload on their www.shellcheck.net website – so for small, public scripts, you don't have to install anything at all to try it out, just copy and paste your script into the text box on the website.
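
If you prefer to run it locally, the package name on Fedora should be ShellCheck:

 sudo dnf install ShellCheck
 shellcheck ~/script-to-test.sh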

April 27, 2021 09:00 AM

April 22, 2021

Daniel Berrange

ANNOUNCE: virt-viewer release 10.0

I am happy to announce a new bugfix release of virt-viewer 10.0 (gpg), including experimental Windows installers for Win x86 MSI (gpg) and Win x64 MSI (gpg).

Signatures are created with key DAF3 A6FD B26B 6291 2D0E 8E3F BE86 EBB4 1510 4FDF (4096R)

With this release the project replaced the autotools build system with Meson and Ninja and re-designed the user interface to eliminate the menu bar.

All historical releases are available from:

http://virt-manager.org/download/

Changes in this release include:

  • Switch to use Meson for build system instead of autotools
  • Require libvirt >= 1.2.8
  • Redesign UI to use title bar widget instead of menu bar
  • Request use of dark theme by default, if available
  • Don’t filter out oVirt DATA storage domains for ISO image sharing
  • Add --keymap arg to allow keys to be remapped
  • Display error message if no extension is present for screenshot filename
  • Fix misc memory leaks
  • Use nicer error message if no ISOs are available
  • Use more explicit accelerator hint to distinguish left and right ctrl/alt keys
  • Report detailed file transfer errors
  • Use standard about dialog
  • Refresh and improve translations
  • Install appstream data file in preferred location
  • Refresh appstream data file contents
  • Display VM title when listing VMs, if available
  • Display VM description as tooltip, if available
  • Sort VM names when listing
  • Enable ASLR and NX for Windows builds
  • Add --shared arg to request a shared session for VNC
  • Disable all accels when not grabbed in kiosk mode
  • Allow num keypad to be used for zoom changes
  • Disable grab sequence in kiosk mode to prevent escape
  • Allow zoom hotkeys to be set on the command line / vv file
  • Display error message if VNC connection fails
  • Fix warnings about atomics with new GLib
  • Remove use of deprecated GTK APIs
  • Document cursor ungrab sequence in man pages
  • Honour Ctrl-C when auth dialog is active
  • Minor UI tweaks to auth dialog
  • Support VM power control actions with VNC
  • Add --cursor arg to control whether a local pointer is rendered with VNC
  • Add --auto-resize arg and menu to control whether to resize the remote framebuffer to match the local window size
  • Add support for remote framebuffer resize with VNC
  • Handle case sensitivity when parsing accelerator mappings

by Daniel Berrange at April 22, 2021 05:39 PM

April 15, 2021

Daniel Berrange

ANNOUNCE: gtk-vnc release 1.2.0 available

I’m pleased to announce a new release of GTK-VNC, version 1.2.0.

https://download.gnome.org/sources/gtk-vnc/1.2/gtk-vnc-1.2.0.tar.xz (213K)
sha256sum: 7aaf80040d47134a963742fb6c94e970fcb6bf52dc975d7ae542b2ef5f34b94a

Changes in this release include

  • Add API to request fixed zoom level
  • Add API to request fixed aspect ratio when scaling
  • Add APIs for client initiated desktop resize
  • Implement “Extended Desktop Resize” VNC extension
  • Implement “Desktop Rename” VNC extension
  • Implement “Last Rect” VNC extension
  • Implement “XVP” (power control) VNC extension
  • Implement VeNCrypt “plain” auth mode
  • Implement alpha cursor VNC extension
  • Use GTK preferred width/height helpers for resizing
  • Fix misc docs/introspection annotation bugs
  • Honour meson warninglevel setting for compiler flags
  • Fix JPEG decoding in low colour depth modes
  • Fix minor memory leaks
  • Add header file macros for checking API version
  • Change some meson options from “bool” to “feature”
  • Validate GLib/GTK min/max symbol versions at build time
  • Avoid recreating framebuffer if size/format is unchanged
  • Emit resize signal after WMVi update
  • Various fixes & enhancements to python demo program
  • Ensure Gir files build against local libs
  • Enable stack protector on more platforms
  • Don’t force disable introspection on windows
  • Relax min x11 deps for older platforms
  • Avoid mutex deadlock on FreeBSD in test suite
  • Stop using deprecated GLib thread APIs
  • Stop using deprecated GLib main loop APIs
  • Stop using deprecated GObject class private data APIs
  • Add fixes for building on macOS
  • Fix deps for building example program
  • Update translations

Thanks to all those who reported bugs and provided patches that went into this new release.

by Daniel Berrange at April 15, 2021 10:56 AM

April 06, 2021

KVM on Z

Webinar: Red Hat OpenShift for IBM Z and LinuxONE on RHEL 8.3 KVM

Join us for our webinar on Wednesday, April 21, 11:00 AM - 12:00 PM EST!

Abstract

Red Hat OpenShift is available on RHEL 8.3 KVM starting with Red Hat OpenShift version 4.7 on IBM Z and LinuxONE. We discuss the deployment of a Red Hat OpenShift Cluster on RHEL KVM from a high-level perspective, including supported configurations and requirements, especially the available network and storage options.
Furthermore, we explain the installation steps of Red Hat OpenShift 4.7 on RHEL KVM in detail, including best practices and a short excursion on cluster debugging.

Speakers

  • Dr. Wolfgang Voesch, Iteration Manager - OpenShift on IBM Z and LinuxONE
  • Holger Wolf, Product Owner - OpenShift on Linux on IBM Z and LinuxONE

Registration

Register here. You can check the system requirements here.
After registering, you will receive a confirmation email containing information about joining the webinar.

Replay & Archive

All sessions are recorded. For the archive as well as a replay and handout of this session and all previous webinars see here.

by Stefan Raspl (noreply@blogger.com) at April 06, 2021 01:22 PM

April 05, 2021

Stefan Hajnoczi

Learning programming languages

You may be productive and comfortable in one programming language but find the idea of learning a new programming language daunting. Or you may know and use multiple programming languages but haven't learnt a new one in a while. Or you might be a programming language geek who is just curious about how others dive into new programming languages and get productive quickly. No matter how easy or difficult it is for you to engage in new programming languages, this article explains how I like to learn new programming languages. Although people learn best in different ways, I hope you'll find my thought process interesting even if you decide to take a different approach.

Background

Language N+1

This article isn't aimed at learning to program. Learning your first programming language is much harder than learning an additional one. The reason is that many abstract concepts are involved in computer programming. When you first encounter programming, most languages require you to understand concepts like iteration, scopes, (im)mutability, arrays, modules, functions, and much more. The good news is that when you learn an additional language you'll already be familiar with common concepts and can therefore take a more streamlined approach in order to get up to speed quickly.

Courses, videos, exercises

There is a lot of educational material online that teaches various programming languages, but I don't find structured courses, videos, or exercises efficient. If you already know common programming concepts and have an idea of what you want to build in the new programming language, then it's more efficient to chart your own course. Materials that let you jump/skip around will let you focus on information that is novel and that you actually need. Working through a series of exercises that someone else designed may be time spent practicing the wrong things since usually you are the one with the best idea of what to practice. Courses, videos, and exercises tend to be an "on the rails" experience where you are exposed to information in a linear fashion whether it's useful at the moment or not.

1. Understanding the computational model

The first question about a new programming language is "what is its computational model?". Sadly, many language manuals and websites do not describe the computational model beyond what programming paradigms are supported (object-oriented, concatenative, functional, logic programming, etc). The actual computational model may only become fully apparent later. Or it might be expressed in too much detail in a language standards document to be of use early on. In any case, it's worthwhile reading the programming language's website for information on the computational model to grasp the big picture.

It's the computational model that you need to understand in order to write programs. Often we think about syntax and language features too much when learning a new language. The computational model informs us how to break down requirements into programs. We approach logic programming differently from object-oriented programming in how we organize data and code. The syntax and to an extent even the language features don't matter.

Understanding the computational model also helps you situate the new programming language relative to others, especially programming languages that you already know. It will give you an idea of how different programming will be and where you'll need to learn new concepts.

2. The language tutorial

After familiarizing yourself with the computational model of the programming language, the next step is to learn the basic syntax and concepts. Most modern programming languages have an official tutorial available online. The tutorial introduces the language elements, usually with short examples, and its table of contents gives an overview of what the language consists of. The tutorial can be completed in a few hours or days. Unlike full courses, official programming language tutorials often lend themselves to non-linear reading, which is helpful when certain aspects of the language are already familiar or will not be relevant to you.

I remember reading the Python tutorial in an afternoon years ago, but watch out: at this point you might be able to write valid syntax but you won't be writing idiomatic code yet. There's that saying "you can write FORTRAN in any language". In order to write programs that are expressed naturally and take advantage of the language effectively, more effort will be necessary.

3. Writing toy programs

After becoming aware of the language elements the next step is to explore how the language works. This can be done by writing small programs. Often these toy programs are familiar tasks you've already solved in other languages. If you want to write games, maybe it's Pong. If you write web applications, it could be a todo list. There are lots of different well-known programs to write.

During the course of writing toy programs you'll encounter syntax errors or issues with the program. Learning to interpret common error messages is important because they will come up in more complicated scenarios later where it can be harder to resolve them if you haven't seen them before.

You'll also hit common tasks for which you need to find solutions in the standard library or language reference manual. Whether it's parsing command-line options, regular expression matching, HTTP requests, or error handling, the language probably has a way of doing it. Toy programs present a simple environment in which to explore the basic facilities of a programming language.

4. Gaining a deeper appreciation for the language

Once you have written some toy programs you'll be able to start writing your own programs that solve new problems. At this stage you start being productive but there is still more to learn. In particular, the language's idioms and patterns must be studied in order to write natural code. Once I have experience with the basics of a language I like to read the source code to the standard library, popular libraries, and popular applications. In the beginning this is hard because they use unfamiliar language features or library dependencies, but after following up on unknown parts of one program, you'll find it becomes easier to read other programs because your knowledge of the language has expanded.

At this point it is also worth looking for style guides, manuals on language idioms, and documentation on common gotchas or anti-patterns. These will provide the information about thinking natively in the new programming language. This is what's needed to become fluent in the language and capable of reading and writing real programs confidently.

Although I have presented steps in a linear order, learning complex subjects is often an iterative process. Sometimes I find myself jumping back and forth between steps as my understanding evolves.

Conclusion

Learning a new programming language is time-consuming no matter how you do it. However, it doesn't all need to happen upfront, and after a few days of reading the documentation and experimenting with toy programs, it's possible to perform basic tasks. Learning how to use a language effectively by studying popular programs and reading guides is the quickest way I've found to reach fluency. Finally, it just takes practice!

by Stefan Hajnoczi (noreply@blogger.com) at April 05, 2021 09:29 PM

March 26, 2021

KVM on Z

Installing Red Hat OpenShift on KVM on Z

While there is no documentation on how to install Red Hat OCP on Linux on Z with a static IP under KVM today, the instructions here will get you almost there. However, there are a few parts within the section "Creating Red Hat Enterprise Linux CoreOS (RHCOS) machines" that require attention. Here is an updated version that will get you through:
 
4. You can use an empty QCOW2 image: Using the prepared one will also work, but it will be overwritten anyway.

5. Start the guest with the following modified command-line:
  $ virt-install --noautoconsole \
      --boot kernel=/bootkvm/rhcos-4.7.0-s390x-live-kernel-s390x,initrd=/bootkvm/rhcos-4.7.0-s390x-live-initramfs.s390x.img,kernel_args='rd.neednet=1 dfltcc=off coreos.inst.install_dev=/dev/vda coreos.live.rootfs_url=https://mirror.openshift.com/pub/openshift-v4/s390x/dependencies/rhcos/4.7/4.7.0/rhcos-4.7.0-s390x-live-rootfs.s390x.img coreos.inst.ignition_url=http://192.168.5.106:8080/ignition/bootstrap.ign ip=192.168.5.11::192.168.5.1:24:bootstrap-0.pok-241-macvtap-mars.com::none nameserver=9.1.1.1' \
      --connect qemu:///system \
      --name bootstrap-0 \
      --memory 16384 \
      --vcpus 8 \
      --disk /home/libvirt/images/bootstrap-0.qcow2 \
      --accelerate \
      --import \
      --network network=macvtap-mv1 \
      --qemu-commandline="-drive if=none,id=ignition,format=raw,file=/bootkvm/bootstrap.ign,readonly=on -device virtio-blk,serial=ignition,drive=ignition"

Note the following changes:

  • Use the live installer kernel, initrd (you can get them from the redhat mirror) and parmline (this you need to create yourself once for each guest) in the --boot parameter. This is basically like installing on z/VM, and will write the image to your QCOW2 image with the correct static IP configuration. Keep in mind that the ignition file needs to be provided by an http/s server for this method to work
  • dfltcc=off is required for IBM z15 and LinuxONE III

6. To restart the guest later on, you will need to change the guest definition to boot from the QCOW2 image.
When the kernel parms are passed into the installer, the domain xml will look like this once the guest is installed and running:
  <os>
    <type arch='s390x' machine='s390-ccw-virtio-rhel8.2.0'>hvm</type>
    <kernel>/bootkvm/rhcos-4.7.0-s390x-live-kernel-s390x</kernel>
    <initrd>/bootkvm/rhcos-4.7.0-s390x-live-initramfs.s390x.img</initrd>
    <cmdline>rd.neednet=1 dfltcc=off coreos.inst.install_dev=/dev/vda coreos.live.rootfs_url=https://mirror.openshift.com/pub/openshift-v4/s390x/dependencies/rhcos/4.7/4.7.0/rhcos-4.7.0-s390x-live-rootfs.s390x.img coreos.inst.ignition_url=http://192.168.5.106:8080/ignition/worker.ign ip=192.168.5.49::192.168.5.1:24:worker-1.pok-241-macvtap-mars.com::none nameserver=1.1.1.1</cmdline>
    <boot dev='hd'/>
  </os>

However, this domain XML still points at the installation media, hence a reboot will not work (it will merely restart the installation).
Remove the <kernel>, <initrd>, <cmdline> elements, so that all that is left is the following:
  <os>
    <type arch='s390x' machine='s390-ccw-virtio-rhel8.2.0'>hvm</type>
    <boot dev='hd'/>
  </os>

With this, the guest will start successfully.
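
In practice that means editing the domain and starting it again (guest name as created above):

  $ virsh edit bootstrap-0
  $ virsh start bootstrap-0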

 [Content contributed by Alexander Klein]

by Stefan Raspl (noreply@blogger.com) at March 26, 2021 08:59 PM

March 12, 2021

Stefan Hajnoczi

Overcoming fear of public communication in open source

When given the choice between communicating in private or in public, many people opt for private communication. They send questions or unfinished patches to individuals instead of posting on mailing lists, forums, or chat rooms. This post explains why public communication is a faster and more efficient way for developers to communicate in open source communities than private communication. A big factor in the private vs public decision is psychological and I've provided a checklist to give you confidence when communicating in public.

Time and time again I find people initiating discussions through private channels when I know it would be advantageous to have them in public instead. I even keep an email reply template handy asking the sender to reach out to the public mailing list instead of communicating with me in private. Communicating in public is a good default unless you need confidentiality (security bugs, business reasons, etc). So why do people prefer to ask questions or send patches off-list?

Why we fear public communication

I'm interested in understanding why people avoid public communications channels in open source communities. The purpose of mailing lists, chat rooms, and forums is to engage in discussions and share knowledge. Members of these communities are interested in the topic and want to engage in discussion. But there are some common reasons I've found why people don't make use of public communications channels:

  • I don't want to create noise. Mailing lists are home to important discussions between experienced members of the community. Sending a relatively simple question that no expert would need to ask, or an unfinished patch, can make you doubt whether it deserves to be sent at all. This is a fallacy. As long as the question or patches are relevant to the community and you have done your homework (see the checklist below), there is no need to worry about creating noise.
  • I will look dumb or seem like a bad programmer. When you need help it's likely that your understanding is incomplete. We often hold ourselves to artificially high standards when asking for help in public, yet when we observe others interacting in public we don't hold their questions or unfinished code against them. Only if they show a repeated pattern of sloppy work does it damage their reputation. Asking for help in public won't harm your image.
  • Too much traffic. Big open source communities are often so active that no single person can keep track of everything that is going on. Even core members of the community rely on filtering only topics relevant to them. Since filtering becomes necessary at scale anyway, there's no need to worry about sending too many messages.

A note about tone: in the past people were more likely to be put off by the unfriendly tone in chat rooms and mailing lists. Over time the tone seems to have improved in general, probably due to multiple factors like open source becoming more professional, codes of conduct making participants aware of their behavior, etc. I don't want to cover the pros and cons of these factors, but suffice to say that nowadays tone is less of a problem and that's a good thing for everyone.

Why public communication is faster and more efficient

Next let's look at the reasons why public communication is beneficial:

Including the community from the start avoids waste

If you ask for help in private it's possible that the answer you get will not end up being consensus in the community. When you eventually send patches to the community they might disagree with the approach you settled on in private and you'll have to redo your work to get the patches merged. This can be avoided by including the community from the start.

Similarly, you can avoid duplicating work when multiple people are investigating the same problem in private without knowing about each other. If you discuss what you are doing in public then you can collaborate and avoid spending time creating multiple solutions, only one of which can be merged.

Public discussions create a searchable knowledge base

If you find a solution to your problem in private, that doesn't help others who have the same question. Public discussions are typically archived and searchable on the web. When the next person has the same question they will find the answer online and won't need to ask at all!

Additionally, everyone benefits and learns from each other when questions are asked in public. We soak up knowledge by following these discussions. If they are private then we don't have this opportunity.

Get a reply even if the person you asked is unavailable

The person you intended to reach may be away due to timezones, holidays, etc. A private discussion is blocked until they respond to your message. Public discussions, on the other hand, can progress even when the original participants are unavailable. You can get an answer to your question faster by asking in public.

Public activity gives visibility to your work

Private discussions are invisible to your teammates, managers, and other people affected by your progress. When you communicate in public they will be able to plan better, help out when needed, and give you credit for the effort you are putting in.

A checklist before you press Send

Hopefully I have encouraged you to go ahead and ask questions or send unfinished patches in public. Here is a checklist that will help you feel confident about communicating in public:

Before asking questions...

  1. Have you searched the web, documentation, and code?
  2. Have you added printfs, GDB breakpoints, or enabled tracing to understand the behavior of the system?

Before sending patches...

  1. Is the code formatted according to the coding standard?
  2. Are the error cases handled, memory allocation/ownership correctly implemented, and thread safety addressed? Most of the time you can figure these out yourself and forgetting to do so may distract from the topics you want help with.
  3. Did you add todo comments pointing out unfinished aspects of the code? Being upfront about what is missing helps readers understand the status of the code and saves them time trying to distinguish requirements you forgot from things that are simply not yet implemented.
  4. Do the commit descriptions explain the purpose of the code changes and does the cover letter give an overview of the patch series?
  5. Are you using git-format-patch --subject-prefix RFC (same for git-publish) to mark your patches as a "request for comment"? This tells people you are seeking input on unfinished code.

Finally, do you know who to CC on emails or mention in comments/chats? Look up the relevant maintainers or active developers using git-log(1), scripts/get_maintainer.pl, etc.
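
As a concrete sketch (the file path below is just an illustration), marking a series as a request for comment and looking up whom to CC might look like this:

  $ git format-patch --subject-prefix=RFC -o outgoing/ origin/master
  $ scripts/get_maintainer.pl -f hw/block/virtio-blk.c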

Conclusion

Private communication can be slower, less efficient, and adds less value to an open source community. For many people public communication feels a little scary and they prefer to avoid it. I hope that by understanding the advantages of public communication you will be motivated to use public communication channels more.

by Unknown (noreply@blogger.com) at March 12, 2021 06:15 PM

March 10, 2021

QEMU project

Google Summer of Code 2021 is on!

QEMU has been accepted into Google Summer of Code 2021 and we look forward to mentoring talented students from around the world as they make open source contributions this summer. GSoC is a remote work open source internship program where students work on a project for an open source organization like QEMU.

Check out the project ideas page where there are 10 projects that eligible students can apply for. This year we have C, Rust, and Python projects in various areas related to emulation and virtualization.

If you are a student who is interested in doing an internship this summer, head over to QEMU’s GSoC organization page where you can read about how to apply and learn more about Google Summer of Code in general.

The GSoC 2021 timeline is:

  • Student application period - March 29 - April 13
  • Student projects announced - May 17
  • Community bonding period - May 17 - June 7
  • Coding - June 7 - August 16

We look forward to meeting you and answering questions on the #qemu-gsoc IRC channel on irc.oftc.net!

by Stefan Hajnoczi at March 10, 2021 07:00 AM

February 24, 2021

Stefan Hajnoczi

Milestone Systems: Software that changes how things are done

Every few years a project comes out with a new approach that becomes influential. Often it involves combining existing concepts in a novel way. People argue about whether the project is actually novel or whether it was just in the right place at the right time and popularized existing technology. Regardless, I find these projects fascinating and try to learn about them because they are milestones that future systems are based on.

Here is a short list of projects that I think fall into this category. I hope you enjoy them (if you haven't already explored them). Send me your picks!

Tor

Tor is an onion router. It enables (mostly) anonymous communication by tunneling encrypted connections. The client does not know the IP address of the server (when connecting to so-called hidden services), the server does not know the IP address of the client, and the intermediate hops only know about their immediate predecessor and successor.

The design of Tor is described in a paper.

BitTorrent

BitTorrent is a decentralized peer-to-peer file sharing protocol that can be used to reduce load on file hosting servers and improve download times. It's commonly used to share copyrighted material, but is also used by Linux distributions to publish ISO images and by software update systems.

A central aspect of BitTorrent is that peers exchange pieces of the file amongst themselves thanks to a Merkle tree. Pieces received from untrusted peers are checked against the file's Merkle tree to ensure that data has not been corrupted or manipulated.

A paper about the economics of BitTorrent described some of the ideas behind it. The actual protocol is described by the protocol specification.

git

Git is the most popular version control system as of 2020. It replaced the older CVS and Subversion systems that were widely used before it. Other systems like Mercurial, Darcs, Perforce, and BitKeeper had similar use cases and ideas.

Git is a content-addressable object store with a convention for representing trees of files as well as commits and tags. I wrote about how the object store is implemented here if you want to learn about pack files and deltas.

Bitcoin

Bitcoin is a decentralized currency, also known as cryptocurrency. A network of mutually untrusted nodes maintains a ledger called the blockchain that records transactions. Bitcoin is famous for mining where nodes compete to solve a computationally-expensive problem in order to extend the ledger.

What is interesting about Bitcoin is that the blockchain prevents abuse as long as at least half of the nodes are not controlled or colluding. In other words, it is a decentralized consensus - although there can be short-lived splits where not all nodes agree on the current state.

The Bitcoin paper gives an overview of how the system works.

Conclusion

I hope this was a fun post that motivated you to look at a system you haven't studied yet or made you think about systems that you consider milestone systems. Please get in touch if you want to share yours!

by Unknown (noreply@blogger.com) at February 24, 2021 08:29 PM

KVM on Z

QEMU v5.2 released [UPDATE Feb 24, 2021]

QEMU v5.2 is out. A highlight from a KVM on Z perspective:

  • PCI passthrough support now goes beyond RoCE Express cards and includes other PCI devices, e.g. NVMe devices. However, ISM devices, as needed for SMC-D, require extra support and cannot be used at this point.
  • virtiofs support via virtio-fs-ccw: a shared filesystem allowing KVM guests to access host directories.
    Use cases:
    • Container image access in lightweight VMs (e.g. in Kata Containers)
    • CI/CD and development enablement
    • Filesystem as a service, to easily switch backends
    To use, add the following to the guest's domain XML on the host:
      <domain>
        <memoryBacking>
          <access mode='shared'/>
        </memoryBacking>
        <devices>
          <filesystem type='mount'
                accessmode='passthrough'>
            <driver type='virtiofs'/>
            <source dir='/<hostpath>'/>
            <target dir='mount_tag'/>
          </filesystem>
          ...
        </devices>
        ...
      </domain>

    Then mount in guests as follows:
      # mount -t virtiofs mount_tag /mnt/<path>
    Requires Linux kernel 5.4 and libvirt v7.0.

For further details, see the Release Notes.

UPDATE: A previous version had falsely listed ISM devices as supported.

by Stefan Raspl (noreply@blogger.com) at February 24, 2021 05:13 PM

Red Hat OpenShift Container Platform 4.7 Released

Red Hat OCP 4.7 is out!

Among others, it adds support for KVM on Z as provided by RHEL 8.3 as the hypervisor for user-provisioned infrastructure.

See here for the full list of IBM Z-specific changes and improvements.

by Stefan Raspl (noreply@blogger.com) at February 24, 2021 04:54 PM

February 17, 2021

QEMU project

QEMU is applying to Google Summer of Code and Outreachy 2021

QEMU is applying to Google Summer of Code 2021 and is participating in Outreachy May-August 2021. Both of these open source internship programs offer remote work opportunities for new developers wishing to get involved in our community.

Interns work with mentors who support them in their project. The code developed during the project is submitted via the same open source development process that all QEMU code follows. This gives interns experience with contributing to open source software.

QEMU’s mentors are experienced contributors who enjoy working with talented individuals who are getting started in open source. You can find a list of project ideas that mentors are proposing here.

Outreachy

Initial applications are open until February 22nd at 16:00 UTC. Outreachy’s goal is to increase diversity in open source and is open to anyone who faces under-representation, systemic bias, or discrimination in the technology industry of their country.

You can learn more about Outreachy May-August and how to apply at the Outreachy website.

Google Summer of Code

Google Summer of Code (GSoC) is a 10-week internship for students. Applications are open from March 29th to April 13th. You can find the details of how to apply at the Google Summer of Code website.

Google will announce accepted organizations on March 9th. QEMU is applying and we hope to mentor GSoC interns again this year!

Please review the eligibility criteria for GSoC before applying.

by Stefan Hajnoczi at February 17, 2021 07:00 AM

February 16, 2021

Stefan Hajnoczi

Video and slides available for "The Evolution of File Descriptor Monitoring in Linux"

My FOSDEM 2021 talk "The Evolution of File Descriptor Monitoring in Linux: From select(2) to io_uring" is now available:

The talk compares the file descriptor monitoring system calls available in Linux and discusses their design. Benchmark results show how well they scale when there are many file descriptors. I hope this is a useful overview to this important kernel feature that GUI applications, network services, and many other programs rely on.

If you are interested in API design and performance, this talk highlights how different approaches like stateless vs stateful APIs can affect performance and how to minimize the number of API calls through careful design.

Enjoy!

by Unknown (noreply@blogger.com) at February 16, 2021 09:24 AM

February 15, 2021

Daniel Berrange

ANNOUNCE: libvirt-glib release 4.0.0

I am pleased to announce that a new release of the libvirt-glib package, version 4.0.0, is now available from

https://libvirt.org/sources/glib/

The packages are GPG signed with

Key fingerprint: DAF3 A6FD B26B 6291 2D0E 8E3F BE86 EBB4 1510 4FDF (4096R)

Changes in this release:

  • Replace autotools build system with meson
  • Mandate libvirt >= 1.2.8
  • Mandate libxml2 >= 2.9.1
  • Mandate glib >= 2.48.0
  • Mandate gobject-introspection >= 1.46.0
  • Fix docs incompatibility with gtk-doc >= 1.30
  • Updated translations
  • Misc API docs fixes
  • Add constants related to NVRAM during domain delete
  • Add domain config API for controller ports attribute
  • Fix compat with newer glib by avoiding volatile for enum types

Thanks to everyone who contributed to this new release.

by Daniel Berrange at February 15, 2021 12:36 PM

February 03, 2021

Stefan Hajnoczi

Keeping a clean git commit history

Does the commit history of your source repository look like this:

f02af91822 docs: fix incorrect subheadings
dd7bee8b38 cli: add --import option
900ca2936a cli: extract move_topic() helper function

Or like this:

7011cc9868 lunch time
a07c82331d resolve code review comments
331d79a8ff more fixes

?

The first is a clean git commit history where each commit has a clear purpose and is a single logical change. The second is a messy commit history where the commits have no inherent structure:

  • "more fixes" does not describe clearly what is being fixed and the plural ("fixes") hints it may contain multiple logical changes instead of just one.
  • "resolve code review comments" contains changes requested by code reviewers in relation to another commit, it's not a self-contained logical change.
  • "lunch time" is an unfinished commit that was created because the programmer wanted to save their work.

Commit anti-patterns

The example above illustrates several anti-patterns:

Vague commit messages

If the commit message is vague and does not express a clear purpose, then it is hard to know what a commit does from the commit message. If git-log(1) doesn't provide useful information about commits then one has to resort to searching the code diffs. That is very tedious and sometimes it's almost impossible to come up with a good code search query while a clear commit message would have been easy to search. So clear commit messages are the first step towards clean commit history.

Doing too many things in one commit

Commits that make several logical code changes are hard to review and impede backporting fixes to stable branches. For example, a commit that fixes a bug as well as adding a new feature may need to be rewritten for a stable branch. If instead the code had been split into two commits, then the bug fix commit could have been backported easily. Therefore it is good practice to separate distinct bug fixes, features, and other logical code changes into separate commits.

Addressing code review comments

The code review and testing history is usually not useful information once a commit has been merged. For example, if there was a continuous integration (CI) test failure and a pull request needed to be changed, then the change should be made directly to the buggy commit so that the final commit passes the tests. No one needs to know about the code review or testing history once the code is merged and keeping these artifacts makes the commit history unwieldy by spreading a logical code change across multiple incomplete commits.

Saving work

There are valid reasons to temporarily save your work in a commit, but work-in-progress (WIP) commits should be cleaned up before merging them. For example, sometimes people make arbitrary commits to save work at the end of the day. That is fine in a local branch, but those temporary commits can be restructured into clean commits using git-rebase(1). No one else needs to know about temporary commits.
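
For example, a minimal sketch of restructuring those temporary commits with an interactive rebase (assuming the branch was forked from origin/master):

  $ git rebase -i origin/master
  # in the editor, mark the temporary commits as "squash" or "fixup",
  # reorder them as needed, and then write proper commit messages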

Broken commits

It can be easy to accidentally include a commit that does not build or fails tests if a later commit happens to resolve the issue. Since the later commit hides the issue it may not be apparent when testing the branch. When reordering commits the risk of introducing broken commits increases because those commits were originally written in a different order. I use the git-rebase(1) exec action to build and run tests after every commit to detect broken commits when doing extensive rebases.
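
A sketch of that workflow, assuming the project has a make check target:

  $ git rebase -i --exec 'make && make check' origin/master

The rebase stops whenever the exec command fails, so the broken commit can be fixed before continuing.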

Why clean commit history is important

Not all reasons for maintaining a clean commit history are obvious. Unfortunately, all of the anti-patterns above make commit history less useful, so if you value any of the following reasons for keeping a clean commit history, then all of the anti-patterns need to be avoided.

Code review

Reviewers have an easier time reading clean commits than an unstructured series of commits. For example, if there is a broken commit because a function is used before it is defined in a later commit, then that affects code reviewers who read the commits linearly. They will be puzzled by the non-existent function and unable to decide whether it is being used correctly because it has not been defined yet. Although code reviewers could put in extra effort to reread the commits multiple times and try to remember the misordered changes, it's better to let code reviewers spend time on real issues rather than on untangling poorly structured commits.

Capturing the rationale for code changes

When each commit is a single logical change it becomes possible to write good commit descriptions that give the rationale for the code change. Explanations for why a code change is necessary, as well as links to issue trackers, email discussions, etc can be valuable when revisiting the commit history later. If commits contain multiple logical code changes or are incomplete then it is hard to include a good commit description, so the commit history is less useful when referring back to it later on.

Making cherry-picking easy

Many software projects maintain stable branches that still receive bug fixes for some time. This allows development to introduce new features and less mature code while users can run a mature stable release. However, maintaining stable branches can be time-consuming. Maintainers need to identify commits suitable for stable branches and cherry-pick or backport them. This requires clean commit history so that bug fixes can be applied in isolation without dragging in other code changes that do not fit the criteria for stable branches.
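
As a minimal sketch, reusing the example commit from the beginning of this post and a hypothetical stable branch name:

  $ git checkout stable-1.0
  $ git cherry-pick -x f02af91822

The -x option records the original commit hash in the new commit message, which makes backports easier to track.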

Enabling git-bisect(1)

When a bug is observed it may not be clear which commit introduced it. The git-bisect(1) command systematically searches the commit history and identifies the commit that caused the bug. However, git-bisect(1) only works with clean commit history. If there are broken commits then bisection becomes unreliable because some portions of commit history cannot be tested. Poorly structured commits, such as huge changes that do many different things, also make it difficult to identify which line caused the bug even when git-bisect(1) has determined which commit is to blame.
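
As a sketch, assuming the project has a make check target and known good and bad commits, automated bisection looks like this:

  $ git bisect start <bad-commit> <good-commit>
  $ git bisect run make check
  $ git bisect reset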

When clean commit history does not matter

The reason I have found that not everyone practices clean commit history is that they may not need any of this. Especially small projects developed by a single author may involve little code review, backporting changes to stable branches, or git-bisect(1). In that case the effort required to split code changes into clean commits and write good commit messages may seem unjustified. Of course this can change but once the commit history is messy there is not much to be done. So it's worth thinking carefully about whether to take shortcuts.

Another factor is poor tooling. Gerrit and GitHub's code review have historically made it hard to practice clean commit history. They were not designed for reviewing commit series and favored anti-patterns like squashing everything into a single commit or adding additional commits to address code review feedback. These are tool limitations and luckily GitHub code review has become better over the years. Tools that encourage you to review a commit series as a single diff are not conducive to clean commit history.

Finally, clean commit history requires proficiency with git-rebase(1) and that you are comfortable with the idea of rewriting your local branch to clean it up before publishing it. It takes a little practice to become competent at reordering, squashing, and splitting commits. The process can be a little scary, although git-reflog(1) makes it possible to undo even the most serious errors where commits were accidentally lost. On a related note, some people falsely believe that a pull or merge request branch should not be rebased. Although it is good practice to avoid rewriting the history of branches that other people track, rewriting history and force-pushing a pull request is different. Most of the time no one else will maintain a local branch based on it and therefore force-pushing will not inconvenience anyone. Even if it is necessary to develop branches based on someone else's not-yet-merged branches, one needs to weigh the extra short-term work against the drawback of having a messy commit history forever.

Conclusion

I hope this is a useful summary of why each commit should have a clear purpose and embody a single logical change. For source repositories that are used by more than one person it is especially important to think about commit best practices. Clean commit history facilitates better code review, bug-finding, and maintaining stable branches. Beyond that it also provides a useful form of communication and sharing knowledge about the codebase that is missing when commit history is disregarded.

by Unknown (noreply@blogger.com) at February 03, 2021 09:25 AM

February 01, 2021

KVM on Z

Webcast: IBM Secure Execution for Linux Introduction and Demo

IBM Secure Execution for Linux makes it possible to build a Trusted Execution Environment for IBM Z and LinuxONE that helps protect data in use.
This webcast gives an overview of the value and the key concepts of the technology, followed by a hands-on demo, outlining the steps needed to secure Linux workloads.

    UPDATE: A recording of the event is now available here.

Audience: Clients, Business Partners, IT Architects, Systems Admins

Speaker: Viktor Mihajlovski, Linux on IBM Z Development, Product Owner for KVM on IBM Z

Date: November 18, 11:00 AM - 12:15 PM EST

Registration: Register here, and check system requirements here.

by Stefan Raspl (noreply@blogger.com) at February 01, 2021 08:15 PM

January 30, 2021

Stefan Hajnoczi

Why learning the Vim text editor is worth it

Vim logo, GNU General Public License v2 or Later

Many tools come and go as our software and devices change - or we get bored and want to try something new and shiny. One of the few exceptions for me has been the Vim text editor, which I use for programming, emails, and writing every day. In this post I want to share why Vim is remarkable but more generally why learning a text editor is a great investment.

It may not be obvious why text editors are useful tools. Many programs, like web browsers, email clients, and integrated development environments (IDEs), have built-in text editing functionality. Why use a separate text editor? Text editors go deep. They are much more powerful than text boxes in browsers and email clients while being more general than IDEs. While IDEs often have excellent programming language-specific functionality, they are rarely used for other text editing tasks like writing emails or documents because they are specialized tools. Text editors strike a good balance of powerful editing and support for programming without being boxed into a narrow use-case.

Getting started with a text editor is easy. Vim implements the arcane vi editor user interface but has many videos, cheatsheets, and tutorials that make it fun to try. Really getting familiar with the features and customizing the editor to your own needs takes time though. This is true for any popular text editor because the number of settings, extensions, or plugins available can be huge. However, once you are familiar with a text editor you will have a powerful tool that can be used for most tasks involving writing or manipulating text. The time investment will pay off as you use the editor for todo lists, emails, documentation, programming, configuration files, and more.

This explains why I've found Vim a useful and enduring tool that I use daily. But what makes it a particularly strong text editor compared to the other options? Text editors go in and out of fashion all the time. I remember many that attracted attention for a time but then faded away. Vim has remained popular and I think there are a few reasons for that.

Powerful text editing plus IDE-like functionality. The vi user interface is actually a language of text editing operators. The keys you press aren't just keyboard shortcuts, they are like a bytecode (!) for a text manipulation CPU that is Turing-complete. Years ago I wrote vi macros that solve the stable marriage problem to demonstrate this. For many geeks this alone might be enough to convince you to learn Vim! But on top of this crazy text editing power Vim also has IDE-like functionality including syntax highlighting, completion, code search and navigation, compiler error navigation, and diffing. There is a large collection of plugins and scripts if you want to extend Vim's functionality even further.

Vim is ubiquitous. It runs on all major operating systems but furthermore it is found on devices from tiny Wi-Fi routers to the largest servers. It has a GUI but also a terminal interface if you are connecting to a remote machine over SSH. I always find it strange when I see people use their editor of choice on their laptop but then use another, less-familiar editor when connecting to remote machines. Learn Vim and you can use it everywhere!

Keyboard-friendly. Constantly moving my hand between the mouse and keyboard is tiring and distracting. Vim has excellent keyboard support and many things can be done without leaving the home row on the keyboard, including navigating, inserting, and deleting text. I find there is no need to use the arrow keys, mouse, or anything that is hard to reach.

If you are looking for a powerful text editor that you can use for many years then I recommend Vim. It's also worth looking at Emacs, which has a different angle but is also a good time investment. Looking back on 17 years of using Vim, I'm happy I stopped switching between language-specific IDEs and instead found a text editor capable of handling all tasks.

by Unknown (noreply@blogger.com) at January 30, 2021 11:02 AM

January 22, 2021

Stefano Garzarella

SOCAT now supports AF_VSOCK

SOCAT is a CLI utility which enables the concatenation of two sockets. It establishes two bidirectional byte streams and transfers data between them.

socat supports several address types (e.g. TCP, UDP, UNIX domain sockets, etc.) to construct the streams. The latest version, 1.7.4, released earlier this year [2021-01-04], also supports AF_VSOCK addresses:

  • VSOCK-LISTEN:<port>

    • Listens on port and accepts a VSOCK connection.
  • VSOCK-CONNECT:<cid>:<port>

    • Establishes a VSOCK stream connection to the specified cid and port.

FOSDEM 2021

If you are interested in VSOCK, I’ll talk with Andra Paraschiv (AWS) about it at FOSDEM 2021. The talk is titled Leveraging virtio-vsock in the cloud and containers and it’s scheduled for Saturday, February 6th 2021 at 11:30 AM (CET).

We will show cool VSOCK use cases and some demos about developing, debugging, and measuring the VSOCK performance, including socat demos.

Examples

socat could be very useful for concatenating and redirecting sockets. In this section we will see some examples.

All examples below refer to a guest with CID 42 that we created using virt-builder and virt-install.

VM setup

virt-builder is able to download the installer and create the disk image with Fedora 33 or other distros. It is also able to set the root password and inject the ssh public key, simplifying the creation of the guest disk image:

VM_IMAGE="vsockguest_f33.qcow2"

host$ virt-builder --root-password=password:mypassword \
        --ssh-inject root:file:/home/user/.ssh/id_rsa.pub \
        --output=${VM_IMAGE} \
        --format=qcow2 --size 10G --selinux-relabel \
        --update fedora-33

Once the disk image is ready, we create our VM with virt-install. We can specify the VM settings like the number of vCPUs, the amount of RAM, and the CID assigned to the VM [42]:

host$ virt-install --name vsockguest \
        --ram 2048 --vcpus 2 --os-variant fedora33 \
        --import --disk path=${VM_IMAGE},bus=virtio \
        --graphics none --vsock cid.address=42

After the creation of the VM, we will remain attached to the console and we can detach from it by pressing ctrl-].

We can reattach to the console in this way:

host$ virsh console vsockguest

If the VM is turned off, we can boot it and attach directly to the console in this way:

host$ virsh start --console vsockguest

ncat like

It’s possible to use socat like ncat, transferring stdin and stdout via VSOCK.

Guest listening

In this example we start socat in the guest listening on port 1234:

guest$ socat - VSOCK-LISTEN:1234

Then we connect from the host using the CID 42 assigned to the VM:

host$ socat - VSOCK-CONNECT:42:1234

At this point we can exchange characters between guest and host, since stdin and stdout are linked through the VSOCK socket.

Host listening

In this example we do the opposite, starting socat in the host listening on port 1234:

host$ socat - VSOCK-LISTEN:1234

Then, in the guest, we connect to the host using the well-defined CID 2, which is always used to reach the host:

guest$ socat - VSOCK-CONNECT:2:1234

ssh over VSOCK

The coolest feature of socat is to concatenate sockets of different address families, so in this example we redirect ssh traffic through the VSOCK socket exposed by the VM.

This example could be useful if the VM doesn’t have any NIC attached and we want to provide some network connectivity, like the ssh access.

First of all, in the guest we start socat linking the VSOCK socket listening on port 22, to a TCP socket which will connect to the local TCP port 22 where the ssh server is listening:

guest$ socat VSOCK-LISTEN:22,reuseaddr,fork TCP:localhost:22

On the host we link a TCP socket listening on a port of our choice (e.g. 4321) to the guest port 22 just opened using VSOCK:

host$ socat TCP4-LISTEN:4321,reuseaddr,fork VSOCK-CONNECT:42:22

Finally from the host we can connect to the guest using ssh on the local port 4321, where socat is listening:

host$ ssh -p 4321 root@localhost

socat redirects all the traffic between the sockets and allows us to use ssh over VSOCK to reach the guest.

by sgarzare@redhat.com (Stefano Garzarella) at January 22, 2021 02:16 PM

January 19, 2021

QEMU project

Configuring virtio-blk and virtio-scsi Devices

The previous article in this series introduced QEMU storage concepts. Now we move on to look at the two most popular emulated storage controllers for virtualization: virtio-blk and virtio-scsi.

This post provides recommendations for configuring virtio-blk and virtio-scsi and how to choose between the two devices. The recommendations provide good performance in a wide range of use cases and are suitable as default settings in tools that use QEMU.

Virtio storage devices

Key points

  • Prefer virtio storage devices over other emulated storage controllers.
  • Use the latest virtio drivers.

Virtio devices are recommended over other emulated storage controllers as they are generally the most performant and fully-featured storage controllers in QEMU.

Unlike emulations of hardware storage controllers, virtio-blk and virtio-scsi are specifically designed and optimized for virtualization. The details of how they work are published for driver and device implementors in the VIRTIO specification.

Virtio drivers are available for Linux, Windows, and other operating systems. Installing the latest version is recommended for the latest bug fixes and performance enhancements.

If virtio drivers are not available, the AHCI (SATA) device is widely supported by modern x86 operating systems and can be used as a fallback. On non-x86 guests the default storage controller can be used as a fallback.

Comparing virtio-blk and virtio-scsi

Key points

  • Prefer virtio-blk in performance-critical use cases.
  • Prefer virtio-scsi for attaching more than 28 disks or for full SCSI support.
  • With virtio-scsi, use scsi-block for SCSI passthrough and otherwise use scsi-hd.

Two virtio storage controllers are available: virtio-blk and virtio-scsi.

virtio-blk

The virtio-blk device presents a block device to the virtual machine. Each virtio-blk device appears as a disk inside the guest. virtio-blk was available before virtio-scsi and is the most widely deployed virtio storage controller.

The virtio-blk device offers high performance thanks to a thin software stack and is therefore a good choice when performance is a priority. It does not support non-disk devices such as CD-ROM drives.

CD-ROMs and in general any application that sends SCSI commands are better served by the virtio-scsi device, which has full SCSI support. SCSI passthrough was removed from the Linux virtio-blk driver in v5.6 in favor of using virtio-scsi.

Virtual machines that require access to many disks can hit limits based on availability of PCI slots, which are under contention with other devices exposed to the guest, such as NICs. For example a typical i440fx machine type default configuration allows for about 28 disks. It is possible to use multi-function devices to pack multiple virtio-blk devices into a single PCI slot at the cost of losing hotplug support, or additional PCI busses can be defined. Generally though it is simpler to use a single virtio-scsi PCI adapter instead.
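
For reference, a minimal sketch of attaching a disk image via virtio-blk on the QEMU command line (the image path is illustrative):

  $ qemu-system-x86_64 -enable-kvm -m 2048 \
      -drive if=none,id=drive0,file=disk.qcow2,format=qcow2 \
      -device virtio-blk-pci,drive=drive0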

virtio-scsi

The virtio-scsi device presents a SCSI Host Bus Adapter to the virtual machine. SCSI offers a richer command set than virtio-blk and supports more use cases.

Each device supports up to 16,383 LUNs (disks) per target and up to 255 targets. This allows a single virtio-scsi device to handle all disks in a virtual machine, although defining more virtio-scsi devices makes it possible to tune for NUMA topology as we will see in a later blog post.

Emulated LUNs can be exposed as hard disk drives or CD-ROMs. Physical SCSI devices can be passed through into the virtual machine, including CD-ROM drives, tapes, and other devices besides hard disk drives.

Clustering software that uses SCSI Persistent Reservations is supported by virtio-scsi, but not by virtio-blk.

Performance of virtio-scsi may be lower than virtio-blk due to a thicker software stack, but in many use cases, this is not a significant factor. The following graph compares 4KB random read performance at various queue depths:

Comparing virtio-blk and virtio-scsi performance

virtio-scsi configuration

The following SCSI devices are available with virtio-scsi:

Device         SCSI Passthrough    Performance
scsi-hd        No                  Highest
scsi-block     Yes                 Lower
scsi-generic   Yes                 Lowest

The scsi-hd device is suitable for disk image files and host block devices when SCSI passthrough is not required.

The scsi-block device offers SCSI passthrough and is preferred over scsi-generic due to higher performance.
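
As a rough sketch (again with an illustrative image path), a virtio-scsi controller with a scsi-hd disk attached can be configured like this:

  $ qemu-system-x86_64 -enable-kvm -m 2048 \
      -device virtio-scsi-pci,id=scsi0 \
      -drive if=none,id=drive0,file=disk.qcow2,format=qcow2 \
      -device scsi-hd,drive=drive0,bus=scsi0.0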

The following graph compares the sequential I/O performance of these devices using virtio-scsi with an iothread:

Comparing scsi-hd, scsi-block, and scsi-generic performance

Conclusion

The virtio-blk and virtio-scsi devices offer a choice between a single block device and a full-fledged SCSI Host Bus Adapter. Virtualized guests typically use one or both of them depending on functional and performance requirements. This post compared the two and offered recommendations on how to choose between them.

The next post in this series will discuss the iothreads feature that both virtio-blk and virtio-scsi support for increased performance.

by Stefan Hajnoczi and Sergio Lopez at January 19, 2021 07:00 AM

December 11, 2020

KVM on Z

New Publications

 

The following videos and publications are now available on the IBM Knowledge Center:

For more updates, check this blog entry.

by Stefan Raspl (noreply@blogger.com) at December 11, 2020 08:04 PM

Powered by Planet!
Last updated: July 27, 2021 03:05 PM