FirstServed Tech Blog - FirstServed and the Art of Server Tuning

Archive for the ‘Xen’ Category

Xen and SNMPD

Friday, January 4th, 2008

SNMPD has problems with the Xen network drivers.
Why?
Xen's network scripts give the interfaces they create for bridging the same dummy MAC address (FE:FF:FF:FF:FF:FF),
which in its turn generates an identical IPv6 link-local address on each of those interfaces.
SNMPD cannot handle these duplicated addresses and detects an IP conflict.
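To see why the addresses collide: the kernel derives an interface's IPv6 link-local address from its MAC address using the modified EUI-64 rule (RFC 4291). It flips the universal/local bit of the first octet and inserts ff:fe between the third and fourth octets. The sketch below does the same calculation. mac_to_linklocal is our own name, and bash is assumed. Any two interfaces that share a MAC end up with the same fe80:: address. One example is the fe:ff:ff:ff:ff:ff dummy address that setup_bridge_port assigns in Xen's scripts further down.

```shell
#!/bin/bash
# mac_to_linklocal MAC: print the modified EUI-64 IPv6 link-local address
# (RFC 4291) that IPv6 autoconfiguration derives from a MAC address.
mac_to_linklocal() {
    local IFS=:
    # Split the MAC into its six octets: $1 .. $6
    set -- $1
    # Flip the universal/local bit (0x02) of the first octet
    local first=$(( 0x$1 ^ 0x02 ))
    # Interface ID = (o1^02)(o2) : (o3)ff : fe(o4) : (o5)(o6)
    printf 'fe80::%x:%x:%x:%x\n' \
        $(( (first << 8) | 0x$2 )) \
        $(( (0x$3 << 8) | 0xff )) \
        $(( 0xfe00 | 0x$4 )) \
        $(( (0x$5 << 8) | 0x$6 ))
}

mac_to_linklocal fe:ff:ff:ff:ff:ff   # prints fe80::fcff:ffff:feff:ffff
mac_to_linklocal 00:16:3e:00:00:01   # prints fe80::216:3eff:fe00:1
```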

Adding VLANs to XenServer 4.0

Thursday, September 6th, 2007

In our Xen configurations, we like to configure different VLANs on the Dom0 network interfaces, which are then bridged to the DomU guests. Not only is this more secure than configuring the VLAN interfaces on the virtual machine, it's also the only way we know of that works with Windows guests. On Windows, you need the NIC vendor's proprietary software (for Broadcom or Intel NICs) to configure VLANs, and the Xen Virtual Network Adapter certainly doesn't come with any such software.

We were annoyed, to say the least, when we found out that the brand new XenServer 4.0 doesn't allow one to add VLANs to the host interfaces. It costs five times as much as its predecessor, XenServer 3.2. Both the XenCenter console and the xe vlan-create command return ‘This operation is not allowed with your current license’.
Hope was not lost, however, since we figured out the following workaround:

In /etc/sysconfig/network-scripts/ifcfg-eth0, comment out the BRIDGE=xenbr0 line, since eth0 itself will no longer be a port of the bridge:

DEVICE=eth0
ONBOOT=yes
TYPE=Ethernet
HWADDR=00:19:b9:ea:4d:b7
#BRIDGE=xenbr0
check_link_down() { return 1 ; }

Create a configuration file for the new VLAN interface, called ifcfg-eth0.142, in the same directory:

DEVICE=eth0.142
BOOTPROTO=static
ONBOOT=yes
TYPE=Ethernet
BRIDGE=xenbr0
check_link_down() { return 1 ; }
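If you need more than one VLAN, you can generate the ifcfg file above instead of typing it. The helper below is hypothetical and not part of XenServer. It prints the same keys as the ifcfg-eth0.142 example for a given parent interface, VLAN ID and bridge.

```shell
#!/bin/bash
# make_vlan_ifcfg PARENT VLAN_ID BRIDGE
# Prints a Red Hat-style ifcfg file for the VLAN interface PARENT.VLAN_ID,
# enslaved to BRIDGE, with the same keys as the example in the article.
make_vlan_ifcfg() {
    local parent=$1 vid=$2 bridge=$3
    cat <<EOF
DEVICE=${parent}.${vid}
BOOTPROTO=static
ONBOOT=yes
TYPE=Ethernet
BRIDGE=${bridge}
check_link_down() { return 1 ; }
EOF
}

# Usage sketch:
# make_vlan_ifcfg eth0 142 xenbr0 > /etc/sysconfig/network-scripts/ifcfg-eth0.142
make_vlan_ifcfg eth0 142 xenbr0
```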

Add the following line to /etc/sysconfig/network:

VLAN=yes

You can now test the new setup without restarting by using the following commands:

brctl delif xenbr0 eth0
modprobe 8021q
vconfig add eth0 142
ifup ifcfg-eth0.142
brctl addif xenbr0 eth0.142
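You can wrap the manual commands above in a small helper so you can preview them first. This is a sketch: move_bridge_to_vlan and run are our own names, and with DRY_RUN set the helper only prints the commands. Run it for real from a console rather than over the network. Dom0 will most likely lose connectivity between the first and the last step.

```shell
#!/bin/bash
# run: execute a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# move_bridge_to_vlan BRIDGE PARENT VLAN_ID
# Same steps as the manual commands: detach the physical NIC from the
# bridge, create the tagged interface and enslave that one instead.
move_bridge_to_vlan() {
    local bridge=$1 parent=$2 vid=$3
    run brctl delif "$bridge" "$parent"
    run modprobe 8021q
    run vconfig add "$parent" "$vid"
    run ifup "ifcfg-${parent}.${vid}"
    run brctl addif "$bridge" "${parent}.${vid}"
}

# Preview the commands without touching the network:
DRY_RUN=1 move_bridge_to_vlan xenbr0 eth0 142
```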

You should now have connectivity on your newly created VLAN interface.

Bridging VLAN interfaces in Xen

Thursday, July 19th, 2007

The init scripts bundled with Xen offer little or no support for VLANs.  One way to configure VLANs is to do it at the DomU level, but that’s a trick that’s only feasible on Linux guests, since on Windows VLAN support for ethernet cards is implemented by proprietary software of the NIC vendor, and there’s certainly no such thing as Windows software for the Xen Virtual NIC.

In case the Dom0 doesn’t need active interfaces on the VLAN bridges, you can follow the solution outlined on this page: Bridging domains to tagged VLANs in Xen.

I found a much simpler solution. The reason Xen's network-bridge doesn't play ball with VLAN interfaces is that at a certain point, /sbin/ifdown is called to bring down the net device. This is a prerequisite for renaming it and reassigning it to the bridge. Ifdown, however, has the undesirable side effect of removing any VLANs configured on the device it's bringing down, effectively deleting the VLAN interface entirely. So when running the network-bridge script, you'll get error messages saying the device was not found:

SIOCSIFNAME: No such device

I modified xen-network-common.sh so ifdown is no longer called directly. Instead, the interface's IP addresses are flushed manually and its link is brought down, which leaves the VLAN configured on the interface intact. The following script is based on Xen 3.0.3's xen-network-common.sh script.

if [ -e /etc/SuSE-release ]
then
  preiftransfer()
  {
    eval `/sbin/getcfg -d /etc/sysconfig/network/ -f ifcfg- -- $1`
  }
  ifup()
  {
    /sbin/ifup ${HWD_CONFIG_0} $1
  }
elif ! which ifup >/dev/null 2>/dev/null
then
  preiftransfer()
  {
    true
  }
  ifup()
  {
    false
  }
  ifdown()
  {
    false
  }
else
  preiftransfer()
  {
    true
  }
  # do not call ifdown directly
  ifdown()  {
    ip addr flush $1
    ip link set $1 down
    true
  }

fi

first_file()
{
  t="$1"
  shift
  for file in $@
  do
    if [ "$t" "$file" ]
    then
      echo "$file"
      return
    fi
  done
}

find_dhcpd_conf_file()
{
  first_file -f /etc/dhcp3/dhcpd.conf /etc/dhcpd.conf
}

find_dhcpd_init_file()
{
  first_file -x /etc/init.d/{dhcp3-server,dhcp,dhcpd}
}

# configure interfaces which act as pure bridge ports:
#  - make quiet: no arp, no multicast (ipv6 autoconf)
#  - set mac address to fe:ff:ff:ff:ff:ff
setup_bridge_port() {
    local dev="$1"

    # take interface down …
    ip link set ${dev} down

    # … and configure it
    ip link set ${dev} arp off
    ip link set ${dev} multicast off
    ip link set ${dev} addr fe:ff:ff:ff:ff:ff
    ip addr flush ${dev}
}

# Usage: create_bridge bridge
create_bridge () {
    local bridge=$1

    # Don’t create the bridge if it already exists.
    if [ ! -e "/sys/class/net/${bridge}/bridge" ]; then
        brctl addbr ${bridge}
        brctl stp ${bridge} off
        brctl setfd ${bridge} 0
        sysctl -w "net.bridge.bridge-nf-call-arptables=0"
        sysctl -w "net.bridge.bridge-nf-call-ip6tables=0"
        sysctl -w "net.bridge.bridge-nf-call-iptables=0"
        ip link set ${bridge} arp off
        ip link set ${bridge} multicast off
    fi
    # A small MTU disables IPv6 (and therefore IPv6 addrconf).
    mtu=$(ip link show ${bridge} | sed -n 's/.* mtu \([0-9]\+\).*/\1/p')
    ip link set ${bridge} mtu 68
    ip link set ${bridge} up
    ip link set ${bridge} mtu ${mtu:-1500}
}

# Usage: add_to_bridge bridge dev
add_to_bridge () {
    local bridge=$1
    local dev=$2

    # Don’t add $dev to $bridge if it’s already on a bridge.
    if [ -e "/sys/class/net/${bridge}/brif/${dev}" ]; then
        ip link set ${dev} up || true
        return
    fi
    brctl addif ${bridge} ${dev}
    ip link set ${dev} up
}

You can then start up the needed bridges from within your my-network-bridge script, assuming you change (network-script network-bridge) to (network-script my-network-bridge) in xend-config.sxp:

#!/bin/sh
dir=$(dirname "$0")
"$dir/network-bridge" "$@" vifnum=0 netdev=eth0 bridge=xenbr0
"$dir/network-bridge" "$@" vifnum=1 netdev=eth1 bridge=xenbr1
"$dir/network-bridge" "$@" vifnum=2 netdev=eth1.142 bridge=xenbr142
"$dir/network-bridge" "$@" vifnum=3 netdev=eth1.143 bridge=xenbr143

The above example gives you the choice of four different bridges for each domU, some of which are transparently bound to a VLAN.
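With more VLANs, you can generate the my-network-bridge script instead. In the sketch below, gen_bridge_script is a hypothetical helper. It numbers vifnum in the order the devices are given. It names physical bridges xenbr<vifnum> and VLAN bridges xenbr<VLAN ID>, as in the example. Keep in mind the note in network-bridge itself that vifnum values of 8 and up need a higher nloopbacks setting.

```shell
#!/bin/bash
# gen_bridge_script NETDEV...
# Prints a my-network-bridge script with one network-bridge call per device.
# Physical devices get bridge xenbr<vifnum>; VLAN devices (ethX.VID) get
# bridge xenbr<VID>.
gen_bridge_script() {
    local vifnum=0 netdev bridge
    echo '#!/bin/sh'
    echo 'dir=$(dirname "$0")'
    for netdev in "$@"; do
        case $netdev in
            *.*) bridge="xenbr${netdev##*.}" ;;
            *)   bridge="xenbr${vifnum}" ;;
        esac
        # Emit the literal text: "$dir/network-bridge" "$@" vifnum=... netdev=... bridge=...
        echo "\"\$dir/network-bridge\" \"\$@\" vifnum=${vifnum} netdev=${netdev} bridge=${bridge}"
        vifnum=$((vifnum + 1))
    done
}

gen_bridge_script eth0 eth1 eth1.142 eth1.143
```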

How to mount and eject a CD-rom on a Windows Xen guest

Friday, June 15th, 2007

Originally, it was possible to access the QEMU monitor from within a VNC viewer window by pressing Ctrl-Alt-2. This was a major security hole, since users could mount any file on Dom0 this way, so the feature has been disabled in recent Xen releases. That makes for a major problem: mounting, ejecting and changing CDs in Windows then seems only possible by rebooting the virtual machine.

After a bit of trial and error, we came to a solution that allows mounting and ejecting CDs from within Windows:

In your Xen guest config file, you must specify an empty CD device. Without it, Windows will not recognize a CD-ROM drive at all:

disk=['phy:/dev/...,ioemu:hda,w',',hdc:cdrom,r']

Mounting a CD-rom

Execute xm block-list to view the configured block devices for your Xen guest:

# xm block-list <vm-id> --long
(768
    ((backend-id 0)
        (virtual-device 768)
        (device-type disk)
        (state 1)
        (backend /local/domain/0/backend/vbd/1/768)
    )
)
(5632
    ((backend-id 0)
        (virtual-device 5632)
        (device-type cdrom)
        (state 1)
        (backend /local/domain/0/backend/vbd/1/5632)
    )
)

Note the CD-ROM's device number (5632 in the listing above) and detach it from the guest. Use the --force switch (-f), otherwise detaching will fail.

# xm block-detach 1 5632 -f
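Rather than reading the device number off the listing, you can extract it with awk. As far as we can tell, the number is the Linux device number of the emulated drive (major * 256 + minor): 768 = 3 * 256 is hda, and 5632 = 22 * 256 is hdc. The helper below is our own sketch. It simply looks for the cdrom entry in the --long output.

```shell
#!/bin/bash
# find_cdrom_devid: read `xm block-list <domain> --long` output on stdin and
# print the virtual-device number of the first entry of device-type cdrom.
find_cdrom_devid() {
    awk '
        # Remember the most recent virtual-device number seen
        /\(virtual-device / { gsub(/[()]/, ""); dev = $2 }
        # The device-type line follows it within the same entry
        /\(device-type cdrom\)/ { print dev; exit }
    '
}

# Usage sketch (domain ID 1, as in the commands above):
# devid=$(xm block-list 1 --long | find_cdrom_devid)
# xm block-detach 1 "$devid" -f
```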

Now reattach the device with the correct path specified ( phy:/dev/cdrom, file:/path/to/some/iso, … ):

# xm block-attach 1 phy:/dev/cdrom /dev/hdc r

Unmounting a CD-rom

Eject the cdrom from Windows by right-clicking on its icon and selecting ‘Eject’.

Eject the cdrom physically from Dom0 if needed:

# eject /dev/cdrom

Remounting a CD-rom

Now’s the fun part: it appears that if you try to remount exactly the same backend device, e.g. /dev/cdrom, the Windows HVM guest will not be signalled that a new device has been inserted.  A workaround for this is to attach and to detach another device first – any will do, as long as it’s different:

# xm block-attach 1 phy:/dev/sda /dev/hdc r
# xm block-detach 1 5632 -f

Then attach your new device:

# xm block-attach 1 phy:/dev/cdrom /dev/hdc r
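The remount workaround boils down to three xm calls. They are wrapped here in a hypothetical helper that only prints the commands when DRY_RUN is set. Any dummy backend will do, as long as it differs from the previous one. An ISO image (file:/path/to/some/iso) is probably a safer choice than a host disk, since the guest briefly gets read access to whatever you attach.

```shell
#!/bin/bash
# run: execute a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi
}

# swap_cdrom DOMAIN DEVID DUMMY_BACKEND NEW_BACKEND
# Attach a different backend first and force-detach it, so the Windows HVM
# guest notices a "new" medium when NEW_BACKEND is attached afterwards.
swap_cdrom() {
    local dom=$1 devid=$2 dummy=$3 media=$4
    run xm block-attach "$dom" "$dummy" /dev/hdc r
    run xm block-detach "$dom" "$devid" -f
    run xm block-attach "$dom" "$media" /dev/hdc r
}

DRY_RUN=1 swap_cdrom 1 5632 phy:/dev/sda phy:/dev/cdrom
```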

Booting XenServer or XenEnterprise from software RAID

Thursday, June 7th, 2007

Since XenServer and XenEnterprise do not support installing the operating system on an MD software RAID device during the installation, you’ll have to undertake a few steps afterwards if you want to mirror your boot disks.

This article is largely based on Harry de Jong’s solution, as posted on the XenSource forums, with a few differences:

  • There’s no need to copy mdadm.static from another server to the Xen host;
  • The initrd ramdisk’s configuration is a bit simpler;
  • There’s no need to swap the physical disks, which means the procedure can be accomplished remotely.

Create identical partitions to your boot disk /dev/sda on your second disk /dev/sdb

If /dev/sda looks like this:

# fdisk -l /dev/sda
Disk /dev/sda: 249.3 GB, 249376538624 bytes
255 heads, 63 sectors/track, 30318 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1         499     4008186   83  Linux
/dev/sda2             500         998     4008217+  83  Linux
/dev/sda3             999       30318   235512900   83  Linux

/dev/sdb should look like this:

# fdisk -l /dev/sdb
Disk /dev/sdb: 249.3 GB, 249376538624 bytes
255 heads, 63 sectors/track, 30318 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1         499     4008186   fd  Linux raid autodetect
/dev/sdb2             500         998     4008217+  fd  Linux raid autodetect
/dev/sdb3             999       30318   235512900   fd  Linux raid autodetect

Do not forget to set the partition type to ‘fd’ (Linux RAID autodetect) instead of the default 83 (Linux).
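Instead of creating the partitions on /dev/sdb by hand with fdisk, a common approach is to dump sda's partition table with sfdisk -d and feed it to sfdisk for sdb. The filter below is our own helper. It changes the partition types from 83 to fd on the way. The dump syntax differs between sfdisk versions, so the filter handles both the older Id=83 and the newer type=83 form. Double-check the result with fdisk -l /dev/sdb, as shown above, before going further.

```shell
#!/bin/bash
# to_raid_ids: rewrite an `sfdisk -d` dump on stdin so that every partition
# of type 83 (Linux) becomes type fd (Linux RAID autodetect).
to_raid_ids() {
    sed -E 's/(Id|type)=( *)83(,|$)/\1=\2fd\3/'
}

# Usage sketch: copy sda's layout to sdb with RAID partition types.
# sfdisk -d /dev/sda | to_raid_ids | sfdisk /dev/sdb
```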

Create the necessary device nodes

# mknod /dev/md0 b 9 0
# mknod /dev/md1 b 9 1
# mknod /dev/md2 b 9 2

Create your MD RAID 1 arrays (with missing disks)

# mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 missing
# mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdb2 missing
# mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdb3 missing
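The device nodes and arrays above follow a simple pattern: /dev/mdN uses block major 9, minor N, and mirrors partition N+1. You can script them in a loop. This is a sketch with hypothetical helper names. With DRY_RUN set, it only prints what it would run.

```shell
#!/bin/bash
# run: execute a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi
}

# create_degraded_mirrors [DISK]
# For N = 0..2: create /dev/mdN (block major 9) if it is missing, then a
# two-way RAID 1 on DISK(N+1) with the second half marked "missing".
create_degraded_mirrors() {
    local disk=${1:-/dev/sdb} n
    for n in 0 1 2; do
        [ -e "/dev/md$n" ] || run mknod "/dev/md$n" b 9 "$n"
        run mdadm --create "/dev/md$n" --level=1 --raid-devices=2 "${disk}$((n + 1))" missing
    done
}

DRY_RUN=1 create_degraded_mirrors /dev/sdb
```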

/dev/sda2 is just an empty, unmounted partition used by XenServer for upgrades.

Copy the Xen Storage Manager data over to RAID

# pvcreate /dev/md2
# vgextend VG_XenStorage-3553b468-fca7-46c0-baeb-7cd471a6a9ab /dev/md2
# pvmove /dev/sda3 /dev/md2

Replace the uuid in the vgextend line above with the uuid of your own Xen SR.  Use ‘sm info’ to display your SR’s uuid.
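If you'd rather not copy the UUID by hand, you can pick the volume group name out of the vgs output. This is a sketch under one assumption: the host has a single SR, and its volume group follows the VG_XenStorage-<uuid> naming shown above.

```shell
#!/bin/bash
# find_sr_vg: read `vgs --noheadings -o vg_name` output on stdin and print
# the first volume group whose name starts with VG_XenStorage-.
find_sr_vg() {
    awk '$1 ~ /^VG_XenStorage-/ { print $1; exit }'
}

# Usage sketch:
# vg=$(vgs --noheadings -o vg_name | find_sr_vg)
# vgextend "$vg" /dev/md2
```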

Remove /dev/sda3 from the SR volume group and add it to the RAID array

# vgreduce VG_XenStorage-3553b468-fca7-46c0-baeb-7cd471a6a9ab /dev/sda3
# pvremove /dev/sda3
# mdadm -a /dev/md2 /dev/sda3

Mount /dev/md0 and copy the filesystem to it

# mkfs.ext3 /dev/md0
# mount /dev/md0 /mnt
# cd /
# cp -axv . /mnt

Make a new initrd ramdisk containing the MD RAID drivers

Modify /mnt/etc/fstab so the system will mount / from /dev/md0 instead of LABEL=/-main.  Replace "LABEL=/-main" by "/dev/md0".
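You can also make the fstab change with sed rather than an editor. Here is a minimal sketch (the function name is ours). It only touches a line whose first field is exactly LABEL=/-main.

```shell
#!/bin/bash
# fstab_root_to_md0: filter fstab text from stdin to stdout, replacing a
# leading LABEL=/-main device field with /dev/md0.
fstab_root_to_md0() {
    sed -e 's|^LABEL=/-main\([[:space:]]\)|/dev/md0\1|'
}

# Usage sketch (keeps the original file around):
# cp /mnt/etc/fstab /mnt/etc/fstab.orig
# fstab_root_to_md0 < /mnt/etc/fstab.orig > /mnt/etc/fstab
```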

Create a new boot image and uncompress it:

# mkdir /mnt/root/initrd-raid
# mkinitrd --fstab=/mnt/etc/fstab /mnt/root/initrd-raid/initrd-2.6.16.38-xs3.2.0.531.3960xen-raid.img 2.6.16.38-xs3.2.0.531.3960xen
# cd /mnt/root/initrd-raid
# zcat initrd-2.6.16.38-xs3.2.0.531.3960xen-raid.img | cpio -i

Since mkinitrd looks at /etc/fstab to determine what device the root volume is on, it has now added the necessary raid drivers to the new boot image.

Uncompress the current ramdisk and add the raid drivers to it

# mkdir /root/initrd
# cd /root/initrd
# zcat /boot/initrd-2.6.16.38-xs3.2.0.531.3960xen.img | cpio -i

Now add the raid module from the new ramdisk and modify the init file:

# cp /mnt/root/initrd-raid/lib/raid1.ko lib
# vi init

Add the following lines before the second line containing "/sbin/udevstart":

echo "Loading raid1.ko module"
insmod /lib/raid1.ko

Add the following lines before the line containing "echo Creating root device":

raidautorun /dev/md0
raidautorun /dev/md1
raidautorun /dev/md2

Note: if you’ve created other MD raid devices, add a ‘raidautorun’ statement for them as well.
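Editing init by hand works fine, but you can also script the two insertions. The awk sketch below is a hypothetical helper, and it assumes init contains the lines quoted above. It inserts the module-loading lines before the second /sbin/udevstart line. It also inserts one raidautorun line per array name you pass, before the "echo Creating root device" line. Compare the result with the original before repacking.

```shell
#!/bin/bash
# patch_initrd_init MD...: filter an initrd init script from stdin to stdout.
patch_initrd_init() {
    awk -v mds="$*" '
        BEGIN { n = split(mds, md, " ") }
        # Before the second /sbin/udevstart: load the RAID 1 module
        /\/sbin\/udevstart/ && ++seen == 2 {
            print "echo \"Loading raid1.ko module\""
            print "insmod /lib/raid1.ko"
        }
        # Before the root device is created: start every array given
        /echo Creating root device/ {
            for (i = 1; i <= n; i++) print "raidautorun /dev/" md[i]
        }
        { print }
    '
}

# Usage sketch; redirecting into the existing file keeps it executable:
# cd /root/initrd && cp init init.orig && patch_initrd_init md0 md1 md2 < init.orig > init
```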

Copy the new ramdisk to the /mnt/boot folder and modify GRUB's boot menu

# find . -print | cpio -o -Hnewc | gzip -c > /mnt/boot/initrd-2.6.16.38-xs3.2.0.531.3960xen-raid.img
# rm /mnt/boot/initrd-2.6-xen.img
rm: remove symbolic link `/mnt/boot/initrd-2.6-xen.img’? y
# ln -s initrd-2.6.16.38-xs3.2.0.531.3960xen-raid.img /mnt/boot/initrd-2.6-xen.img
# vi /mnt/boot/grub/menu.lst

Replace "root=LABEL=/-main" by "root=/dev/md0" in all menu entries.

Unmount /dev/md0

# cd /root
# umount /dev/md0
# sync

Set up the Master Boot Record on /dev/sdb

# grub

grub> device (hd0) /dev/sdb

grub> root (hd0,0)
 Filesystem type is ext2fs, partition type 0xfd

grub> setup (hd0)
 Checking if "/boot/grub/stage1" exists… yes
 Checking if "/boot/grub/stage2" exists… yes
 Checking if "/boot/grub/e2fs_stage1_5" exists… yes
 Running "embed /boot/grub/e2fs_stage1_5 (hd0)"…  16 sectors are embedded.
succeeded
 Running "install /boot/grub/stage1 (hd0) (hd0)1+16 p (hd0,0)/boot/grub/stage2
/boot/grub/grub.conf"… succeeded
Done.

grub> quit
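You can also script the same GRUB session. The function below is a hypothetical helper that just prints the commands typed above. Piping them into grub --batch, which runs the GRUB legacy shell without its interactive interface, should have the same effect. Check the output for "succeeded", as in the session above.

```shell
#!/bin/bash
# grub_setup_cmds DISK: print the GRUB legacy shell commands that map DISK
# to (hd0) and install the boot loader into its MBR.
grub_setup_cmds() {
    local disk=$1
    printf '%s\n' "device (hd0) $disk" 'root (hd0,0)' 'setup (hd0)' 'quit'
}

# Usage sketch:
# grub_setup_cmds /dev/sdb | grub --batch
```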

Modify GRUB’s boot menu to boot from the second disk

# vi /boot/grub/menu.lst

Replace the line in the first XenServer entry that says "root (hd0,0)" with "root (hd1,0)". Also replace "root=LABEL=/-main" by "root=/dev/md0" in all menu entries.  This will allow you to boot from /dev/md0 instead of /dev/sda.  In case something goes wrong, you could always reboot using one of the other entries, which still point to /dev/sda.
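You can make both menu.lst edits in one pass with awk. This sketch uses a hypothetical helper name. It assumes the first "root (hd0,0)" line belongs to the first menu entry. It changes only that line to (hd1,0), and it replaces root=LABEL=/-main with root=/dev/md0 everywhere.

```shell
#!/bin/bash
# boot_from_md: filter menu.lst from stdin to stdout.
boot_from_md() {
    awk '
        # Only the first "root (hd0,0)" line moves to the second disk
        !done && /^[ \t]*root[ \t]+\(hd0,0\)/ { sub(/\(hd0,0\)/, "(hd1,0)"); done = 1 }
        # Every entry mounts / from the mirror instead of the label
        { gsub(/root=LABEL=\/-main/, "root=/dev/md0"); print }
    '
}

# Usage sketch:
# cp /boot/grub/menu.lst /boot/grub/menu.lst.orig
# boot_from_md < /boot/grub/menu.lst.orig > /boot/grub/menu.lst
```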

Reboot

# shutdown -r now

If all goes well, you should see your system mounting /dev/md0 as the filesystem root.  If not, reboot using one of the other GRUB menu entries and check out the previous steps.

Change /dev/sda’s partition types from Linux to Linux RAID autodetect

# fdisk /dev/sda

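Use fdisk's t command to change the type of each partition to fd, then w to write the table. Listing the partitions afterwards should give something like this: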

Disk /dev/sda: 249.3 GB, 249376538624 bytes
255 heads, 63 sectors/track, 30318 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1         499     4008186   fd  Linux raid autodetect
/dev/sda2             500         998     4008217+  fd  Linux raid autodetect
/dev/sda3             999       30318   235512900   fd  Linux raid autodetect

Add the missing partitions to the RAID arrays

# mdadm -a /dev/md0 /dev/sda1
mdadm: hot added /dev/sda1
# mdadm -a /dev/md1 /dev/sda2
mdadm: hot added /dev/sda2
# mdadm -a /dev/md2 /dev/sda3
mdadm: hot added /dev/sda3

Then wait for the sync to complete:

# watch cat /proc/mdstat
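If you'd rather have a script wait for the resync than watch it, you can check /proc/mdstat for a running or pending resync or recovery. This sketch assumes the usual mdstat wording, such as "recovery = 12.6%" or "resync=DELAYED".

```shell
#!/bin/bash
# mdstat_busy: succeed if the mdstat text on stdin shows a resync or
# recovery that is running or still pending.
mdstat_busy() {
    grep -Eq '(resync|recovery) *='
}

# Usage sketch:
# while mdstat_busy < /proc/mdstat; do sleep 30; done
```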

Copy the running RAID setup to /etc/mdadm.conf

# mdadm --detail --scan >> /etc/mdadm.conf

Be sure to add a line ‘DEVICE partitions’ at the top of /etc/mdadm.conf if you want to define MD devices that span a whole disk instead of just a partition.  Without this line, MD won’t be able to detect your drives as MD components at boot.
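Here is a small sketch that adds the DEVICE line only when mdadm.conf doesn't have one yet, so it is safe to run more than once. The helper name is ours.

```shell
#!/bin/bash
# ensure_device_line CONF: prepend "DEVICE partitions" to CONF unless a
# DEVICE line is already present.
ensure_device_line() {
    local conf=$1
    grep -q '^DEVICE[[:space:]]' "$conf" 2>/dev/null && return 0
    { echo 'DEVICE partitions'; cat "$conf" 2>/dev/null; } > "$conf.tmp" &&
        mv "$conf.tmp" "$conf"
}

# Usage sketch:
# mdadm --detail --scan >> /etc/mdadm.conf
# ensure_device_line /etc/mdadm.conf
```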

That’s it!

Getting Xen networking to work on a Dell Poweredge 2950

Friday, April 6th, 2007

What happened to our networking?

Here we were, admiring our brand new Dell Poweredge 2950 server that we were going to use as a Xen host. Fedora Core 6 installed without a hitch, but when we booted the Xen kernel, Dom0 lost connectivity on the first network card. The second NIC, which had not been touched by Xen's bridging setup, still worked like a charm. So, regretfully, we set out once again to learn things we would rather have left unlearned…

Dell, Broadcom and the art of technical support

After googling around a bit, we found that the problem was hardly a new one. It appears that the Broadcom NetXtreme II BCM5708 NICs that ship with the Dell Poweredge 2950 do not play ball with Xen's bridging setup, at least not when support for Dell's IPMI card (DRAC) is enabled. Xen does some internal tweaking with your network interfaces and ends up assigning eth0's MAC address to a new virtual interface, which becomes Dom0's eth0. The original eth0 gets renamed to peth0 and gets a MAC address of fe:ff:ff:ff:ff:ff, since under normal circumstances this interface should not be visible on the LAN.

It appears that the Broadcom BCM5708 NICs filter out all traffic that is not directed at the NIC's physical MAC address, at least when the NIC's management function is enabled. The funny thing is that Dell acknowledged this problem as far back as the summer of 2006, but they haven't seen fit to release a firmware update to solve it. Calling Dell Support got us no further: nothing is known about if or when a patch will be released. I don't know how Dell is going to cope with the new SUSE Linux Enterprise Server 10 and Red Hat Enterprise Linux 5, both of which proudly announce the wonders of Xen virtualisation.

Two solutions for the problem

The first, quick and dirty workaround is to disable the management functions on your Broadcom NICs. This means going without IPMI support and forgoing remote reboot and serial text console functionality, so it's more of a choice between Scylla and Charybdis. Anyway, this workaround is documented at:

How to disable the management function on Broadcom NetXtreme II BCM5708 NIC

We opted for a second solution, which is slightly more complex but has the advantage of keeping IPMI connectivity intact. Since the Broadcom NICs require that the physical NICs' MAC addresses remain unchanged, we had to tweak Xen's bridging scripts to allow this. Dom0's network interfaces, which are normally assigned the physical NICs' MAC addresses, get a MAC address from Xen's own 00:16:3e range instead. Beware: assign different addresses to every Xen server on the same subnet, or you may experience connectivity problems to your Dom0s.

So here’s what we did:

In your Xen config file, found at /etc/xen/xend-config.sxp in our installation, replace the default network script with a custom one:

# comment this one out
#(network-script network-bridge)
# add this one
(network-script my-network-bridge)

Then create your custom bridging script. Place it in /etc/xen/scripts/my-network-bridge, or in whichever directory your Xen scripts reside. The script should contain the following lines. The second network-bridge call is not strictly necessary if you only want to bridge one network interface:

#!/bin/sh
dir=$(dirname "$0")
# The mac addresses you assign here should be different for every server.
# Also, watch out for conflicts with your DomU’s mac addresses.
# Use the 00:16:3e:xx:xx:xx range, which is Xen’s assigned range
"$dir/network-bridge" "$@" vifnum=0 macaddr=00:16:3e:00:00:00
"$dir/network-bridge" "$@" vifnum=1 macaddr=00:16:3e:00:00:01

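To keep Dom0 addresses unique across servers, one option is to reserve a sub-range such as 00:16:3e:ff:xx:xx for Dom0 interfaces. One octet holds a per-server number and one holds the vifnum. This is our own hypothetical convention, not something Xen prescribes. Make sure none of your DomU MACs fall in that sub-range.

```shell
#!/bin/bash
# dom0_mac SERVER_ID VIFNUM: print 00:16:3e:ff:<server id>:<vifnum> in hex.
# Both arguments must be decimal numbers between 0 and 255.
dom0_mac() {
    local server_id=$1 vifnum=$2 n
    for n in "$server_id" "$vifnum"; do
        case $n in
            ''|*[!0-9]*) echo "not a number: '$n'" >&2; return 1 ;;
        esac
        if [ "$n" -gt 255 ]; then
            echo "out of range: $n" >&2
            return 1
        fi
    done
    printf '00:16:3e:ff:%02x:%02x\n' "$server_id" "$vifnum"
}

# Usage sketch in my-network-bridge, for server number 2:
# "$dir/network-bridge" "$@" vifnum=0 macaddr=$(dom0_mac 2 0)
dom0_mac 2 1
```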
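Finally, modify your network-bridge script as follows. The changes are marked by comments in the script: the new macaddr parameter, the replacement for the setup_bridge_port call, and the MAC address handling in op_start and op_stop.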

#!/bin/sh
#============================================================================
# Default Xen network start/stop script.
# Xend calls a network script when it starts.
# The script name to use is defined in /etc/xen/xend-config.sxp
# in the network-script field.
#
# This script creates a bridge (default xenbr${vifnum}), adds a device
# (default eth${vifnum}) to it, copies the IP addresses from the device
# to the bridge and adjusts the routes accordingly.
#
# If all goes well, this should ensure that networking stays up.
# However, some configurations are upset by this, especially
# NFS roots. If the bridged setup does not meet your needs,
# configure a different script, for example using routing instead.
#
# Usage:
#
# network-bridge (start|stop|status) {VAR=VAL}*
#
# Vars:
#
# vifnum     Virtual device number to use (default 0). Numbers >=8
#            require the netback driver to have nloopbacks set to a
#            higher value than its default of 8.
# bridge     The bridge to use (default xenbr${vifnum}).
# netdev     The interface to add to the bridge (default eth${vifnum}).
# antispoof  Whether to use iptables to prevent spoofing (default no).
#
# Internal Vars:
# pdev="p${netdev}"
# vdev="veth${vifnum}"
# vif0="vif0.${vifnum}"
#
# start:
# Creates the bridge
# Copies the IP and MAC addresses from netdev to vdev
# Renames netdev to be pdev
# Renames vdev to be netdev
# Enslaves pdev, vdev to bridge
#
# stop:
# Removes netdev from the bridge
# Transfers addresses, routes from netdev to pdev
# Renames netdev to vdev
# Renames pdev to netdev
# Deletes bridge
#
# status:
# Print addresses, interfaces, routes
#
#============================================================================

dir=$(dirname "$0")
. "$dir/xen-script-common.sh"
. "$dir/xen-network-common.sh"

findCommand "$@"
evalVariables "$@"

vifnum=${vifnum:-$(ip route list | awk '/^default / { print $NF }' | sed 's/^[^0-9]*//')}
vifnum=${vifnum:-0}
bridge=${bridge:-xenbr${vifnum}}
netdev=${netdev:-eth${vifnum}}
antispoof=${antispoof:-no}
# add a new macaddr parameter to the script
macaddr=${macaddr:-ff:ff:ff:ff:ff:ff}

pdev="p${netdev}"
vdev="veth${vifnum}"
vif0="vif0.${vifnum}"

get_ip_info() {
    addr_pfx=`ip addr show dev $1 | egrep '^ *inet' | sed -e 's/ *inet //' -e 's/ .*//'`
    gateway=`ip route show dev $1 | fgrep default | sed 's/default via //'`
}

do_ifup() {
    if ! ifup $1 ; then
        if [ ${addr_pfx} ] ; then
            # use the info from get_ip_info()
            ip addr flush $1
            ip addr add ${addr_pfx} dev $1
            ip link set dev $1 up
            [ ${gateway} ] && ip route add default via ${gateway}
        fi
    fi
}

# Usage: transfer_addrs src dst
# Copy all IP addresses (including aliases) from device $src to device $dst.
transfer_addrs () {
    local src=$1
    local dst=$2
    # Don’t bother if $dst already has IP addresses.
    if ip addr show dev ${dst} | egrep -q '^ *inet ' ; then
        return
    fi
    # Address lines start with 'inet' and have the device in them.
    # Replace 'inet' with 'ip addr add' and change the device name $src
    # to 'dev $src'.
    ip addr show dev ${src} | egrep '^ *inet ' | sed -e "
s/inet/ip addr add/
s@\([0-9]\+\.[0-9]\+\.[0-9]\+\.[0-9]\+/[0-9]\+\)@\1@
s/${src}/dev ${dst}/
" | sh -e
    # Remove automatic routes on destination device
    ip route list | sed -ne "
/dev ${dst}\( \|$\)/ {
  s/^/ip route del /
  p
}" | sh -e
}

# Usage: transfer_routes src dst
# Get all IP routes to device $src, delete them, and
# add the same routes to device $dst.
# The original routes have to be deleted, otherwise adding them
# for $dst fails (duplicate routes).
transfer_routes () {
    local src=$1
    local dst=$2
    # List all routes and grep the ones with $src in.
    # Stick ‘ip route del’ on the front to delete.
    # Change $src to $dst and use ‘ip route add’ to add.
    ip route list | sed -ne "
/dev ${src}\( \|$\)/ {
  h
  s/^/ip route del /
  P
  g
  s/${src}/${dst}/
  s/^/ip route add /
  P
  d
}" | sh -e
}

##
# link_exists interface
#
# Returns 0 if the interface named exists (whether up or down), 1 otherwise.
#
link_exists()
{
    if ip link show "$1" >/dev/null 2>/dev/null
    then
        return 0
    else
        return 1
    fi
}

# Set the default forwarding policy for $dev to drop.
# Allow forwarding to the bridge.
antispoofing () {
    iptables -P FORWARD DROP
    iptables -F FORWARD
    iptables -A FORWARD -m physdev --physdev-in ${pdev} -j ACCEPT
    iptables -A FORWARD -m physdev --physdev-in ${vif0} -j ACCEPT
}

# Usage: show_status dev bridge
# Print ifconfig and routes.
show_status () {
    local dev=$1
    local bridge=$2

    echo '============================================================'
    ip addr show ${dev}
    ip addr show ${bridge}
    echo ' '
    brctl show ${bridge}
    echo ' '
    ip route list
    echo ' '
    route -n
    echo '============================================================'
}

op_start () {
    if [ "${bridge}" = "null" ] ; then
        return
    fi

    if ! link_exists "$vdev"; then
        if link_exists "$pdev"; then
            # The device is already up.
            return
        else
            echo "
Link $vdev is missing.
This may be because you have reached the limit of the number of interfaces
that the loopback driver supports.  If the loopback driver is a module, you
may raise this limit by passing it as a parameter (nloopbacks=<N>); if the
driver is compiled statically into the kernel, then you may set the parameter
using loopback.nloopbacks=<N> on the domain 0 kernel command line.
" >&2
            exit 1
        fi
    fi

    create_bridge ${bridge}

    if link_exists "$vdev"; then
        mac=`ip link show ${netdev} | grep 'link\/ether' | sed -e 's/.*ether \(..:..:..:..:..:..\).*/\1/'`
        preiftransfer ${netdev}
        transfer_addrs ${netdev} ${vdev}
        if ! ifdown ${netdev}; then
            # If ifdown fails, remember the IP details.
            get_ip_info ${netdev}
            ip link set ${netdev} down
            ip addr flush ${netdev}
        fi
        ip link set ${netdev} name ${pdev}
        ip link set ${vdev} name ${netdev}

#       do not use the default bridging function as this resets the mac address
#       setup_bridge_port ${pdev}

#       instead use the code below, which leaves the mac address intact
        ip link set ${pdev} down
        ip link set ${pdev} arp off
        ip link set ${pdev} multicast off
        ip addr flush ${pdev}

        setup_bridge_port ${vif0}

#       do not assign pethX's mac address to Dom0's ethX
#       ip link set ${netdev} addr ${mac} arp on

#       instead use the $macaddr parameter we passed to the script
        ip link set ${netdev} addr ${macaddr} arp on

        ip link set ${bridge} up
        add_to_bridge  ${bridge} ${vif0}
        add_to_bridge2 ${bridge} ${pdev}
        do_ifup ${netdev}
    else
        # old style without ${vdev}
        transfer_addrs  ${netdev} ${bridge}
        transfer_routes ${netdev} ${bridge}
    fi

    if [ ${antispoof} = 'yes' ] ; then
        antispoofing
    fi
}

op_stop () {
    if [ "${bridge}" = "null" ]; then
        return
    fi
    if ! link_exists "$bridge"; then
        return
    fi

    if link_exists "$pdev"; then
        ip link set dev ${vif0} down
        mac=`ip link show ${netdev} | grep 'link\/ether' | sed -e 's/.*ether \(..:..:..:..:..:..\).*/\1/'`
        transfer_addrs ${netdev} ${pdev}
        if ! ifdown ${netdev}; then
            get_ip_info ${netdev}
        fi
        ip link set ${netdev} down arp off
        ip link set ${netdev} addr fe:ff:ff:ff:ff:ff
        ip link set ${pdev} down
        ip addr flush ${netdev}

#       do not reassign pethX’s mac address, since it hasn’t changed
#       ip link set ${pdev} addr ${mac} arp on

        brctl delif ${bridge} ${pdev}
        brctl delif ${bridge} ${vif0}
        ip link set ${bridge} down

        ip link set ${netdev} name ${vdev}
        ip link set ${pdev} name ${netdev}
        do_ifup ${netdev}
    else
        transfer_routes ${bridge} ${netdev}
        ip link set ${bridge} down
    fi
    brctl delbr ${bridge}
}

# adds $dev to $bridge but waits for $dev to be in running state first
add_to_bridge2() {
    local bridge=$1
    local dev=$2
    local maxtries=10

    echo -n "Waiting for ${dev} to negotiate link."
    ip link set ${dev} up
    for i in `seq ${maxtries}` ; do
        if ifconfig ${dev} | grep -q RUNNING ; then
            break
        else
            echo -n '.'
            sleep 1
        fi
    done

    if [ ${i} -eq ${maxtries} ] ; then echo '(link isnt in running state)' ; fi

    add_to_bridge ${bridge} ${dev}
}

case "$command" in
    start)
        op_start
        ;;

    stop)
        op_stop
        ;;

    status)
        show_status ${netdev} ${bridge}
        ;;

    *)
        echo "Unknown command: $command" >&2
        echo 'Valid commands are: start, stop, status' >&2
        exit 1
esac

Finally, comment out the HWADDR lines in the interface configuration files (such as /etc/sysconfig/network-scripts/ifcfg-eth0). Otherwise, your OS will complain about setting up eth0 and eth1 with MAC addresses that differ from the ones in those files:

# Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet
DEVICE=eth0
BOOTPROTO=static
BROADCAST=192.168.255.255
#HWADDR=00:1A:A0:05:12:AB
IPADDR=192.168.0.11
NETMASK=255.255.0.0
NETWORK=192.168.0.0
ONBOOT=yes

Restart your services as needed, and you’ll be able to connect to your network interfaces without a hitch!
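To confirm the setup after restarting, a small check can compare the addresses the kernel reports under /sys/class/net. This is a sketch that rests on two assumptions. Dom0's ethX should carry a 00:16:3e address (the macaddr you passed), and pethX should keep its hardware address rather than Xen's usual fe:ff:ff:ff:ff:ff. The optional second argument (a sysfs root) exists only to make the function testable.

```shell
#!/bin/bash
# check_macs NETDEV [SYSFS_ROOT]
# Verify that pNETDEV kept a real MAC and NETDEV got a 00:16:3e address.
check_macs() {
    local netdev=$1 root=${2:-/sys/class/net}
    local phys virt
    phys=$(cat "$root/p$netdev/address") || return 1
    virt=$(cat "$root/$netdev/address") || return 1
    case $virt in
        00:16:3e:*) ;;
        *) echo "$netdev has $virt, expected a 00:16:3e address" >&2; return 1 ;;
    esac
    if [ "$phys" = "fe:ff:ff:ff:ff:ff" ]; then
        echo "p$netdev lost its hardware address" >&2
        return 1
    fi
    echo "p$netdev=$phys $netdev=$virt"
}

# Usage sketch:
# check_macs eth0 && check_macs eth1
```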

Using ntfsclone and LVM to backup and restore Xen Windows images

Friday, March 23rd, 2007

For faster living and greater comfort, you should use LVM 2 logical volumes for the disks you export to your Windows virtual machines. This will allow you to back up the disks easily using LVM snapshots. As an added bonus, it'll be really easy to clone further Windows images once you've done your first Windows installation.

The recipe goes as follows:

Taking a snapshot of your virtual disk

Assuming your initial Windows VM resides on /dev/vgdrbd0/lvdrbd0 – which it does in our installations:

# lvcreate -L 1G -s -n lvsnapshot /dev/vgdrbd0/lvdrbd0

This will create a 1 gigabyte LVM snapshot which we’ll use for backup.  Note that the snapshot volume doesn’t have to be as big as the source volume, since it only stores differences between itself and the original.

Backing up your snapshot using ntfsclone

The obvious way to copy your snapshot to a safe location would be to use dd, and you can't go wrong with a bit-for-bit copy of your volume. We chose ntfsclone instead, which is quite a bit faster because it only copies the clusters that are actually in use. Written to a regular file, its output is a sparse file. With the -s (--save-image) option used below, it writes its own compact image format instead. Either way, the image only takes up the size of the allocated blocks on the partition, not the size of the entire partition.

ntfsclone operates at the partition level, not at the disk level, and it's important to keep this in mind. We don't want to back up the disk /dev/vgdrbd0/lvsnapshot, but its partitions (just one in this case). The NTFS partition on our logical volume normally isn't used by the Linux OS. So we first have to make it known to the system using kpartx, which creates device-mapper entries from the volume's partition table:

# kpartx -a /dev/vgdrbd0/lvsnapshot

A new entry has now been added by the device mapper: /dev/mapper/lvsnapshot1.  We’ll backup this volume using ntfsclone:

# ntfsclone -s -o windows.img /dev/mapper/lvsnapshot1
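The mapper names used in this article (lvsnapshot1 here, lvdrbd0p1 further on) follow kpartx's habit of inserting a "p" between the device name and the partition number when the name ends in a digit. The sketch below reproduces that rule for scripting. It is our own helper. kpartx versions differ in how they name mappings, and some include the volume group in the name. Check with kpartx -l before relying on it.

```shell
#!/bin/bash
# kpartx_name LV_PATH PARTNUM: print the expected /dev/mapper name of a
# partition on LV_PATH, using the LV's basename and kpartx's "p" rule.
kpartx_name() {
    local base=${1##*/} part=$2
    case $base in
        *[0-9]) echo "/dev/mapper/${base}p${part}" ;;
        *)      echo "/dev/mapper/${base}${part}" ;;
    esac
}

kpartx_name /dev/vgdrbd0/lvsnapshot 1
kpartx_name /dev/vgdrbd0/lvdrbd0 1
```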

This may take considerable time, depending on the size of the partition you want to backup.  Get yourself your favourite hot beverage and watch the progress meter creep along.  Do not forget to clean up your system afterwards, else your LVM snapshot will continue to register its differences with its source volume until it fills up completely:

# kpartx -d /dev/vgdrbd0/lvsnapshot
# lvremove /dev/vgdrbd0/lvsnapshot

Do you really want to remove active logical volume "lvsnapshot"? [y/n]: y
  Logical volume "lvsnapshot" successfully removed

Restoring your Windows VM to your original partition

Restoring your image to the same volume is simply reversing the steps above.  Of course, your target logical volume should not be in use by the Xen domU when you’re restoring.

# kpartx -a /dev/vgdrbd0/lvdrbd0
# ntfsclone -r -O /dev/mapper/lvdrbd0p1 windows.img
# kpartx -d /dev/vgdrbd0/lvdrbd0

Do NOT copy your image file to /dev/vgdrbd0/lvdrbd0, since that will copy your image to your disk instead of to your partition. As a result, you'd overwrite the crucial first 512 bytes of your disk, which contain your MBR (master boot record) and partition table.

Cloning your Windows VM to another disk

Your target volume should be exactly the same size as the source volume you're cloning, and should contain the same partition information. Otherwise, Windows may fail to boot, since the partition's NTFS boot sector will not match the partition layout on the new disk. There are ways and means to work around this problem using hex editor magic, but that's outside the scope of this article.

In order to get an exact copy of your original disk, create an LVM volume of the same size as the original (assuming your original volume had a size of 2.5 gigabyte ):

# lvcreate -L 2.5G -n lvdrbd1 vgdrbd0
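Rather than typing the size, you can read the exact size of the source volume with blockdev --getsize64 and hand it to lvcreate. The helper below has our own, hypothetical name. It only converts bytes to MiB and refuses sizes that are not a whole number of MiB. LVM volume sizes normally are, since they consist of whole extents.

```shell
#!/bin/bash
# lv_size_mib BYTES: print BYTES as a whole number of MiB, or fail.
lv_size_mib() {
    local bytes=$1
    if [ $(( bytes % 1048576 )) -ne 0 ]; then
        echo "size $bytes is not a whole number of MiB" >&2
        return 1
    fi
    echo $(( bytes / 1048576 ))
}

# Usage sketch (lowercase m = MiB for lvcreate):
# lvcreate -L "$(lv_size_mib "$(blockdev --getsize64 /dev/vgdrbd0/lvdrbd0)")m" -n lvdrbd1 vgdrbd0
lv_size_mib 2684354560
```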

Since no Windows installation was run on this disk, all you have at this moment is an empty disk. Among other things, it is missing its MBR and partition table.

Copy the first 512 bytes of your original volume to a file using dd and copy them to your newly created logical volume.  They contain the disk’s MBR and partition table.  An alternative would be to copy the configuration using fdisk, but hey, why bother since both disks are identical!

# dd if=/dev/vgdrbd0/lvdrbd0 of=mbr.img count=1 bs=512
# dd if=mbr.img of=/dev/vgdrbd0/lvdrbd1 bs=512 count=1
# fdisk -ul /dev/vgdrbd0/lvdrbd1

Disk /dev/vgdrbd0/lvdrbd1: 2684 MB, 2684354560 bytes
128 heads, 63 sectors/track, 650 cylinders, total 5242880 sectors
Units = sectors of 1 * 512 = 512 bytes

                Device   Boot       Start  End         Blocks     Id System
/dev/vgdrbd0/lvdrbd1p1   *          63     5233535     2616736+   7  HPFS/NTFS
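Before going further, it doesn't hurt to check that the first 512 bytes really made it across. Here is a minimal sketch. It assumes bash, for process substitution, and a head that supports -c.

```shell
#!/bin/bash
# same_mbr A B: succeed if the first 512 bytes of A and B are identical.
same_mbr() {
    cmp -s <(head -c 512 "$1") <(head -c 512 "$2")
}

# Usage sketch:
# same_mbr /dev/vgdrbd0/lvdrbd0 /dev/vgdrbd0/lvdrbd1 && echo "MBR copied"
```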

After this, proceed as when restoring to your original partition:

# kpartx -a /dev/vgdrbd0/lvdrbd1
# ntfsclone -r -O /dev/mapper/lvdrbd1p1 windows.img
# kpartx -d /dev/vgdrbd0/lvdrbd1

Et voilà!