
Easy way to build a virtual cluster with KVM

Before getting our hands dirty

This is my first post written in English, and it is written for my “underboss” and for general Linux users who have a little experience with virtual machines. This time, we are going to build a virtual cluster that resembles old-school supercomputers, which means our configuration will look somewhat outdated and ugly compared to the latest fancy clusters, but it is much closer to the mainstream supercomputers found in the real world.

Architecture

To let you have a better understanding of what we are going to build, I will introduce the architecture first.

  • Physical Server (Host)
    • Public IP: 10.20.x.x (Assigned by Campus)
    • Private IP: 192.168.122.1 (Static)
    • KVM
      • Login Node
        • Public IP: 10.20.y.y (Assigned by Campus)
        • Private IP: 192.168.122.10 (Static)
      • Computing Node 1-4
        • Private IP: 192.168.122.11-14 (Static)
  • My Laptop
    • Public IP: 10.20.z.z (Assigned by Campus)

The static IPs are the ones we assign ourselves.

The list below describes my hardware and software configuration.

  • Server
    • CPU: Intel Xeon E5-2697 v4
    • OS: Ubuntu 18.04 LTS
  • Virtual Machines
    • OS: Ubuntu 20.04 LTS
    • Job Scheduler: IBM LSF
    • Network Storage: NFS

Set up KVM hypervisor on Host

You have probably heard of VMware, Hyper-V, or VirtualBox. KVM is similar to them and is also capable of creating highly isolated virtual environments.

Make sure Virtualization support is enabled

Some manufacturers disable CPU virtualization support for security reasons, so please check your BIOS settings and enable it. Besides, the KVM kernel module is enabled on Ubuntu by default. If you are unsure whether everything is correct, this article will guide you through some extra checks.
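
If you want a quick sanity check from the shell, the following commands are enough on Ubuntu (kvm-ok comes from the cpu-checker package; exact output wording may vary):

$ grep -Ec '(vmx|svm)' /proc/cpuinfo   # non-zero means VT-x / AMD-V is exposed to the OS
$ sudo apt install cpu-checker
$ sudo kvm-ok                          # should report that KVM acceleration can be used
$ lsmod | grep kvm                     # kvm_intel or kvm_amd should be loaded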

Install necessary software

The following simple command will install everything we need, including the KVM hypervisor. Note that apt is the package manager for Ubuntu, just like yum for CentOS.

$ sudo apt install virt-manager
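
Depending on your setup, you may also want to add your user to the libvirt group so that virt-manager and virsh can talk to the daemon without sudo, then verify the daemon is reachable. A quick check, assuming Ubuntu's default group name:

$ sudo adduser $USER libvirt                 # log out and back in for the group change to take effect
$ virsh -c qemu:///system list --all         # an empty list (and no error) means libvirt is working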

Launch GUI VM Manager

Actually, it is quite hard to create virtual machines from the command line interface (CLI), since you have to specify a bunch of arguments manually. Luckily, there is a powerful graphical tool for managing virtual machines, and there is a simple way to display a graphical application running on the remote server on your own computer.

For Linux / Mac User

By appending the -X flag to the ssh command, X11 forwarding will be enabled. In brief, the graphical interface's data will be forwarded to and rendered on your computer.

local$ ssh -X 10.20.x.x # Your Server's IP
remote$ virt-manager

Then you will notice that virt-manager shows up on your computer.

For Windows User

Unfortunately, Windows doesn’t have a built-in X11 library, so additional software called VcXsrv is required. After installing it, open it, follow the path Multiple windows -> Start a program -> Start a program on remote computer, and fill in your information.

Note that Remote program can be either xterm (if you have it) or virt-manager. xterm is a terminal emulator, from which you can launch virt-manager directly.

Create login node

Create the virtual machine with virt-manager, just as you would with VMware or similar tools. But there are several points you should pay attention to.

CPU and Memory

I recommend checking the boxes Copy host CPU configuration, which helps the guest OS identify the CPU instruction set correctly, and Manually set CPU topology. My configuration is one socket, four cores, and one thread per core for each node. As for memory, under most circumstances, two gigabytes per core should be enough.
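
For reference, these two options end up as a <cpu> element in the domain XML, which you can inspect with virsh edit. A rough sketch (the exact mode and attributes depend on your virt-manager version):

<cpu mode='host-model' check='partial'>
  <topology sockets='1' cores='4' threads='1'/>
</cpu>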

Disk

By default, the Storage format is qcow2. Despite the full virtual size you specify, a qcow2 image only consumes disk space as the virtual machine actually writes new data, so it is a fine choice; formats such as vmdk behave similarly and are mainly useful for compatibility with VMware tools.

The best Disk bus is VirtIO, which is the default option for Ubuntu virtual machines since Ubuntu has native VirtIO support, and it provides the best performance among the alternatives.
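
If you want to see the thin provisioning for yourself, you can create and inspect a qcow2 image manually. A sketch only; the path and the 40G size are placeholders, and virt-manager normally does this step for you:

$ sudo qemu-img create -f qcow2 /var/lib/libvirt/images/node-login.qcow2 40G
$ sudo qemu-img info /var/lib/libvirt/images/node-login.qcow2   # compare "virtual size" with "disk size"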

Network

Every node will have a network interface card (NIC) that only accesses our private network, so Network source and Device model should be NAT and VirtIO respectively. Again, VirtIO is the best option, for the same reason as above.
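
To confirm the private subnet, you can inspect libvirt's default NAT network on the host; the 192.168.122.0/24 range used throughout this post comes from this definition:

$ sudo virsh net-list --all        # the "default" network should be listed and active
$ sudo virsh net-dumpxml default   # shows the 192.168.122.0/24 subnet and its DHCP range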

Display VNC

If you cannot type anything into the virtual machine’s console, try changing the display server type from Spice to VNC.

Install Guest OS

During installation, some settings should be modified.

Necessary software

SSH must be installed because a lot of distributed programs rely on it. Other software is optional.

Change Software Source (Optional)

You can change the software source during installation if you have trouble fetching packages in China. https://mirrors.tuna.tsinghua.edu.cn is recommended. Additionally, you can download the OS image from this site.

Network

The IP should be assigned manually and belong to the same private subnet. For example, by default the DHCP server will automatically assign your node an IP like 192.168.122.233. That IP does belong to your private subnet, but it is picked at random. Instead, we set 192.168.122.10 as the login node’s address.

Clone nodes

After installing the OS, this first virtual machine becomes our login node, which also takes on the storage node’s responsibility and provides shared storage. The other nodes will be made by cloning it, and virt-clone is a good utility for that. Here is the usage.

$ virt-clone -o existing_node_name -n new_node_name --auto-clone

For now, we just need to clone one node; in other words, we should have the login node and node 1 after this step. Since some settings will be shared by all computing nodes, we will create the remaining nodes after configuring node 1.
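
For example, assuming the original VM was named node-login in virt-manager (shut the VM down before cloning):

$ virt-clone -o node-login -n node1 --auto-clone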

Make login node accessible

So far, only a private IP is assigned to the login node, which means our own computer cannot connect to it without the server’s help. The solution is straightforward: add a new NIC with a public IP. Click the Add Hardware button and select Network. This time, Network source should be one of the physical NICs, specifically the one the server uses to connect to the campus network. Source mode should be Bridge, which allows the campus network infrastructure to recognize your virtual machines directly. As for VEPA, it is similar to Bridge, but it requires an advanced physical switch that supports the “hairpin” mode and theoretically improves network performance.

Note that one known flaw is that the server cannot access the virtual machine via its public IP address, and vice versa. This is caused by the macvtap driver and can be regarded as a trade-off between functionality and performance. The server can still access the virtual machine via its private IP, though. If you cannot accept this flaw, consider the old approach: setting up network bridges (virtual switches) and tun/tap devices (virtual NICs) manually.
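
For reference, the bridged NIC appears in the domain XML as a "direct" (macvtap) interface. A rough sketch, where eno1 stands for whatever physical NIC your server actually uses:

<interface type='direct'>
  <source dev='eno1' mode='bridge'/>
  <model type='virtio'/>
</interface>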

Configure shared network storage

On many supercomputers in the real world, there are a few special nodes called storage nodes. They are responsible for storing all the user data, such as the /home directory, and sharing it within the cluster. Since user programs will be executed by many nodes simultaneously, every node should be able to access the program data. That’s why shared network storage is deployed on the cluster.

Although many production clusters use far more advanced and complex storage systems, even distributed ones, NFS already satisfies our needs. Moreover, it is simple.

On Login Node

First, create the /opt directory, which will store shared software such as Intel Parallel Studio.

$ sudo mkdir -p /opt   # -p: no error if the directory already exists

Then modify the file /etc/exports (sudo required) and add the following lines.

/home 192.168.122.0/24(rw,sync,no_root_squash,no_subtree_check)
/opt 192.168.122.0/24(rw,sync,no_root_squash,no_subtree_check)

Note that 192.168.122.0/24 should be replaced with your subnet address if your private IP doesn’t begin with 192.168.122.

Then restart the NFS service.

$ sudo service nfs-kernel-server restart
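
To double-check that the exports are active, you can list them on the login node (showmount is part of the nfs-common package, which the NFS server pulls in):

$ sudo exportfs -v              # lists the active exports and their options
$ showmount -e 192.168.122.10   # what clients will see; run from any node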

On Computing Node
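
The fstab entries below mount NFS shares, which requires the client utilities; on Ubuntu they come from the nfs-common package, which may not be present on a minimal install:

$ sudo apt install nfs-common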

Add the following lines to /etc/fstab (sudo required), then reboot.

192.168.122.10:/home /home nfs auto,nofail,noatime,nolock,intr,tcp,actimeo=1800 0 0
192.168.122.10:/opt /opt nfs auto,nofail,noatime,nolock,intr,tcp,actimeo=1800 0 0

Note that this change is permanent. Although a mistake here is unlikely to destroy the whole system, the commands below let you test whether you can mount the NFS shares before rebooting.

$ mkdir /tmp/home
$ mkdir /tmp/opt
$ sudo mount 192.168.122.10:/home /tmp/home
$ sudo mount 192.168.122.10:/opt /tmp/opt
$ sudo touch /tmp/home/test
$ sudo touch /tmp/opt/test

If you can see the test files in the /home and /opt directories on the login node, everything works correctly.

Clone computing nodes

Refer to the Clone nodes section and clone the remaining nodes.

Change Hostname and IP

Now all the nodes share the same hostname, which looks weird. hostnamectl is the tool to change it. I used the names node-login, node1, node2, … for my cluster.

$ sudo hostnamectl set-hostname your_new_name

Furthermore, all the nodes currently use the same IP address, which will break communication between them. So open the directory /etc/netplan/, find the YAML file inside, and open it; you should see something similar to the following.

# This is the network config written by 'subiquity'
network:
  ethernets:
    ens3:
      dhcp4: true
  version: 2

And replace it with the following code.

# This is the network config written by 'subiquity'
network:
  ethernets:
    ens3:
      dhcp4: false
      addresses: [192.168.122.11/24] # Your new static IP address
      gateway4: 192.168.122.1 # Your server's private IP
      nameservers:
        addresses: [172.18.1.92, 172.18.1.93] # DNS Servers
  version: 2

Then apply the new configuration.

$ sudo netplan apply

Enter the command ip a to check whether the new IP address has been applied.
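
A quick connectivity check from node 1, using the addresses from the plan above:

$ ping -c 3 192.168.122.1    # the host (our gateway)
$ ping -c 3 192.168.122.10   # the login node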

Write list of machines

/etc/hosts

~/machinefile
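
Based on the hostnames and addresses used above, the two files would look roughly like this (a sketch; adjust the names and IPs to your own plan and to whatever launcher consumes the machinefile):

# /etc/hosts (identical on every node)
192.168.122.10 node-login
192.168.122.11 node1
192.168.122.12 node2
192.168.122.13 node3
192.168.122.14 node4

# ~/machinefile (one hostname per line, consumed by MPI launchers)
node1
node2
node3
node4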

(to be continued)

Set up password-less login

Set up password-less sudo

(to be continued)

Install extra utilities (Optional)

Parallel-SSH

Environment Modules

IBM Spectrum LSF

(to be continued)