Tuesday, August 20, 2013

About KVM QCOW2 Live backup

Today I had to work with a cluster made of QEMU-KVM hosts using QCOW2 as the storage format for VM disks.
It's a cool platform for hosting VMs; it's Linux based, so if anything doesn't work I (and those who know) can dig deep into the system and find the reason.
But it's not as easy as you might think... I come from the VMware world, where everything is deeply tested (I hope) and easy to do.

My problem now was to find a way to make "hot" backups of running VMs with QCOW2 disks. The usual best practice for storing disk images is to use LVM and take snapshots with the underlying volume manager, then mount the snapshot read-only and extract the files to back up.
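As a sketch, that snapshot-based procedure boils down to something like this (volume group, LV, mount point and backup paths are placeholders, not a real setup; the `run` wrapper just prints the commands so you can inspect the sequence before running it for real):

```shell
#!/bin/sh
# Sketch of the LVM-snapshot backup procedure; all names are placeholders.
DRY_RUN=1                                   # set to 0 to actually run
run() { echo "+ $*"; [ "$DRY_RUN" = 1 ] || "$@"; }

run lvcreate --snapshot --size 2G --name vmsnap /dev/vg0/vmdata
run mount -o ro /dev/vg0/vmsnap /mnt/vmsnap
run rsync -a /mnt/vmsnap/ /backup/vmdata/   # extract files from the frozen view
run umount /mnt/vmsnap
run lvremove -f /dev/vg0/vmsnap
```

Note that this only works if the host can mount the guest filesystem, which is exactly one of the objections below.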

I am against this procedure for 3 reasons:

  1.  It's "dirty", in the sense that it freezes I/O at the device level
  2.  It's architecture dependent: the host OS needs to be able to mount the snapshot to extract files (i.e., understand the filesystem)
  3.  It's very different from the VMware approach to backups

Yes, LVM volumes are cool, I must admit, and they guarantee high performance... but they don't fit my needs.

So how can I do the backup?

By googling I found an interesting discussion here. One of the participants is Eric Blake from Red Hat, so a trusted author, at least for me.

So, after reading it, I wrote a small shell script (called bkvm, which you can download) that makes backups using the blockcopy feature of libvirt/qemu. It doesn't stop the VM, but guarantees (if blockcopy does its job) data integrity by saving both the volumes and the VM state.
The maximum downtime is the time needed to save the VM memory and restore it (about 30 seconds for a VM with 4 GB of memory).
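The sequence the script performs can be sketched roughly like this (VM name, disk target and paths are placeholders; on libvirt of that era, blockcopy typically also required the domain to be transient, a detail bkvm has to deal with and which is omitted here):

```shell
#!/bin/sh
# Rough sketch of the blockcopy-based hot backup; names are placeholders.
VM=myvm
DISK=vda
BKDIR=/mnt/local_vmbackup/$VM

DRY_RUN=1                                  # set to 0 on a real host
run() { echo "+ $*"; [ "$DRY_RUN" = 1 ] || "$@"; }

run mkdir -p "$BKDIR"

# 1. Copy the disk; once the copy completes, qemu keeps mirroring new
#    writes to both the original and the backup image.
run virsh blockcopy "$VM" "$DISK" "$BKDIR/$DISK-backup.qcow2" --wait --verbose

# 2. Pause the guest and stop the mirror: both images are now identical.
run virsh suspend "$VM"
run virsh blockjob "$VM" "$DISK" --abort

# 3. Save the VM state (this powers the domain off), then bring it back.
run virsh save "$VM" "$BKDIR/$VM.state"
run virsh restore "$BKDIR/$VM.state"
```

The real script adds logging, argument handling and backup rotation on top of this skeleton.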

I've tested it on libvirt 1.0.6 and qemu 1.5; don't blame me if it destroys your system :-)

I'm obviously in no way responsible for any damage this script could cause to your system. If you want to try it, remember that the default backup directory is /mnt/local_vmbackup, but you can easily change it via shell arguments.

If anyone is using it, let me know your impressions.

13 comments:

  1. Hello Luca, thanks for posting this. It is just what I was looking for to do backups of KVM VMs from the host server itself.

    I have a few script adjustments to pass back to you based on trying it out.

    # diff -u bkvm bkvm-new
    --- bkvm-orig 2013-12-30 10:54:39.959802764 -0500
    +++ bkvm-new 2013-12-30 11:14:04.011774960 -0500
    @@ -10,18 +10,23 @@
    # or problem caused by this script and its uses

    -# Default backup path
    +# The values below are defaults -- some can be overridden by providing script command-line arguments when the script is called
    +# Default base backup path -- this directory will be created if it does not already exist

    -# Maximum backup allowed
    +# Default virtual machine disk file path or matching substring if there are multiple data file paths
    +# Maximum history of VM backups allowed (e.g. 2 means keep the current backup and one previous backup)

    -# Default email (disabled)
    +# Default email (an empty string means no e-mail will be sent)


    @@ -72,7 +77,7 @@
    function getVMBlockDevs() {
    local VMNAME=$1
    # get the VM's disks
    - BLOCKDEVS=$(virsh domblklist $VMNAME 2> /dev/null | grep "/mnt/vm" | cut -d' ' -f1)
    + BLOCKDEVS=$(virsh domblklist $VMNAME 2> /dev/null | grep "$VMDATAPATH" | cut -d' ' -f1)
    if [ "x$BLOCKDEVS" == "x" ]; then
    logLine "Cannot get block device list for domain. Exit."
    doExit 1
    @@ -286,6 +291,15 @@
    function backupVM() {
    local VMNAME=$1

    + # create the base backup directory if it does not exist and set correct ownership
    + mkdir -p "$BKPATH"
    + if [ $? -gt 0 ]; then
    + logLine "Problem creating base backup directory. Exiting."
    + doExit 1
    + fi
    + # set permissions
    + chown -R libvirt-qemu:kvm "$BKPATH"
    # start backup
    logLine "Backup of $VMNAME started"

    @@ -348,9 +362,11 @@
    if [ "x$OPTARG" == "x" ] || [ $OPTARG -le 0 ] || [ $OPTARG -gt 4 ]; then
    - logLine "Invalid number of backup specified. It must be between q and 4."
    + logLine "Invalid number of backups to keep specified. It must be between 1 and 4."
    exit 1
    + logLine "Number of backups to keep set to $OPTARG"
    logLine "Sending email to $OPTARG"

    Tim Miller Dyck

    1. Thanks for the patch you suggested; I use the bash script inside a complex Python-based backup solution which does the backup job on a cluster of 5 physical servers with 30 VMs running.

      The only problem I've found using the script was after the final "resume" of the VM; in the past (though after some kernel updates inside the VMs the problem seems to have disappeared) I had to hard-reboot Linux VMs (mainly Ubuntu 12.04 LTS) after resume because they were "frozen"... very strange... Does it work correctly for you? Which distro are you running inside your VMs?

  2. How are these meant to be restored in case of a failure of the host?
    Let's say the host system is completely destroyed and the backups are used to get the virtual machines back.
    The qcow2 image part is clear, but is there a way to restore the saved VM memory to avoid any data loss?

    1. Virtual machines are backed up using the blockcopy feature; after the script has made copies of all the virtual devices, KVM puts them in "mirroring" mode, replicating write operations to both the original and the backup disks. You correctly ask about data loss due to, for example, metadata or contents not yet flushed to disk. To avoid this, I suspend the virtual machine, stop the mirroring operation and store the VM state to disk. This automatically puts the VM in a stopped state and freezes the metadata inside the memory dump. After this step the script needs to manually restore the VM to the running state from the saved dump, since the save-state operation powers it off.
      It's not strictly a zero-downtime backup script; we experience 1-2 seconds of downtime (depending on VM memory size) on every backup.

      In case of disaster, I copy the QCOW2 disk back and resume the VM from the saved state; this restores the metadata in VM memory.

      The system notices that the clock jumped into the future (which is correct) and the disk cache immediately expires, flushing dirty buffers to disk. Of course some running services crash because of the time jump, but after a reboot everything restarts correctly.

      I think this approach (which should be intrinsically safe) gives us some guarantee. I've done some restore tests and they were successful; in many of them the VM just reported that time had jumped into the future and kept working normally, with the clock updated.

      To be really sure it works we should run a backup during heavy I/O and check whether data gets corrupted on restore, but for my needs this is enough.

      Let me know if this helps you,

    2. Thanks for the quick reply.

      Yes, I understand how it works, I just haven't ever restored a machine in practice, so I have no idea what to do with the file containing the dumped memory, or where to place it if I need to rebuild the host machine from scratch.

      Oh, btw, I've been testing your script with Ubuntu 12.04 and libvirt 1.1.1 (from ) and it seems to work fine.

    3. Uh, I'm sorry, I forgot to write how to restore the VM :-) The saved VM memory can be restored via:

      virsh restore filename

      It surely works, because it's the same command used at the end of the backup script, in the "restoreVMState" function in bkfunctions.
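      Putting the two pieces together, a minimal restore on a rebuilt host might look like this (paths and file names are hypothetical, patterned on the defaults above; note that the saved state file references the original disk path, so the image has to go back to the same location first):

```shell
#!/bin/sh
# Hypothetical restore after a host rebuild; paths are placeholders.
DRY_RUN=1                                  # set to 0 on a real host
run() { echo "+ $*"; [ "$DRY_RUN" = 1 ] || "$@"; }

BKDIR=/mnt/local_vmbackup/myvm

# Put the disk image back where the saved state expects to find it,
# then resume the VM from the memory dump.
run cp "$BKDIR/vda-backup.qcow2" /mnt/vm/myvm/vda.qcow2
run virsh restore "$BKDIR/myvm.state"
```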

    4. Oh, I didn't think it would be _that_ easy :)

      Thanks a lot!

    5. Thanks for your feedback! If you have time to write a "friendly" restore script, I'll surely test it :-)

    6. I'm hoping that there won't ever be a time that I would need such a script :D

  3. I'm going to use your script for backup of my small business virtualization server.

  4. Here is a version of this script with some minor changes. It supports copying images with rsync if the machine is not running.

    1. ;-)

  5. Hello Jan, the problem here is that if you don't use the "--sparse" option with rsync, you end up with a full-size image:

    The original disk is a 30 GB qcow2, but with only 7.2 GB occupied:
    $ sudo qemu-img info /ssd840pool/ssd840_vmpool/ovirt_italian.qcow2
    image: /ssd840pool/ssd840_vmpool/ovirt_italian.qcow2
    file format: qcow2
    virtual size: 30G (32212254720 bytes)
    disk size: 7.2G
    cluster_size: 65536

    After offline backup with rsync:
    $ sudo qemu-img info /ssd840pool/ssd840_vmpool/ovirt_italian/backup-0/ovirt_italian.qcow2-backup.qcow2
    image: /ssd840pool/ssd840_vmpool/ovirt_italian/backup-0/ovirt_italian.qcow2-backup.qcow2
    file format: qcow2
    virtual size: 30G (32212254720 bytes)
    disk size: 30G
    cluster_size: 65536

    with "--sparse" option added:
    $ qemu-img info /ssd840pool/ssd840_vmpool/ovirt_italian/backup-0/ovirt_italian.qcow2-backup.qcow2
    image: /ssd840pool/ssd840_vmpool/ovirt_italian/backup-0/ovirt_italian.qcow2-backup.qcow2
    file format: qcow2
    virtual size: 30G (32212254720 bytes)
    disk size: 6.7G
    cluster_size: 65536

    Haven't tried a restore yet....