Migrating instances between compute nodes

Modified: 26 Sep 2019 21:20 UTC

This document describes how to migrate instances, both zones (SmartOS and Container Native Linux) and HVMs (Linux/Windows), between two compute nodes (CNs). Differences between zone and KVM instances are noted in the details below.

Notes:

Warnings

Because of this risk, customers with current Triton support contracts must contact Joyent Support for access to the migrator.sh script. To do this, please send an email to help@joyent.com

Overview

The migrator.sh script is designed to be used from the head node to handle the full migration of an instance between two compute nodes.

Of note, migrator.sh will determine the current run state of the instance prior to migration. If the instance is running prior to migration, it will be started on the destination compute node at the end of migration. If it is in any state other than running, the instance will not be restarted.

The script will create its own SSH keys and push them to the source and destination compute nodes; these keys enable migrator.sh to move the ZFS data between the compute nodes without requiring passwords. These keys are removed at the completion of the migration.

At this time, migrator.sh handles three kinds of migrations: full, incremental, and automatic incremental.

There are a number of options available for the migrator.sh script:

[root@headnode (cak-1) ~]# /opt/custom/bin/migrator.sh -h

migrator.sh migrates an instance from one CN to another within an
    AZ in SDC.  A migration can be all at once or incremental.  All at
    once means the instance will be immediately shut down, datasets
    transferred, config files copied, APIs updated, and restarted once
    transfer is complete to the new CN.  Incremental means the instance
    will stay online while incremental dataset snapshots are
    transferred to the new CN, decreasing in size until the instance's
    migration state file is removed.  Once the state file is removed,
    the instance is shut down, a final snapshot taken and transferred
    to the new CN, config files copied, APIs updates, and the instance
    is restarted on the new CN.

Usage:  migrator.sh [ -h ]
        migrator.sh [ -v ]
        migrator.sh [ -i | -a ] [ -D ] INST_UUID DEST_CN_NAME

    -h            This help output
    -D            Allow migration to an older dest PI
    -v            Display migrator.sh version and exit
    -i            Perform an incremental migration (optional); A log
                    file and state file will be created at:
                        /var/tmp/migrator-INST_UUID-DEST_CN_NAME-ST_EPOCH.EXT
                    where EXT is either 'log' or 'st'; incremental
                    dataset transfers will continue to occur while
                    the state file exists; once the state file is
                    removed, the migration be will finalized of all
                    other work to complete the migration
    -a            Automatic incremental mode (optional); A hybrid mode in which
                    one incremenatal dataset transfer will occurr after which
                    migrator will remove the statefile for you and finish the
                    migration normally.  a implies i
    INST_UUID     Instance UUID to be migrated
    DEST_CN_NAME  CN Hostname to migrate the instance to
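
Based on the usage text above, the three migration modes differ only in the flag passed before the instance UUID. The following is a minimal sketch that assembles (but does not run) the command line. The helper name migrator_cmd and the mode keywords are hypothetical; the script path is the one shown in the help example and may differ on your head node.

```shell
# Hypothetical helper: print the migrator.sh command line for a given mode.
# The -i and -a flags come from the usage text; nothing is executed here.
migrator_cmd() {
    mode=$1 uuid=$2 dest=$3
    case "$mode" in
        full)        flag="" ;;
        incremental) flag="-i" ;;
        auto)        flag="-a" ;;   # per the help text, -a implies -i
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    # ${flag:+ $flag} adds the flag (with a leading space) only when set.
    echo "/opt/custom/bin/migrator.sh${flag:+ $flag} $uuid $dest"
}

migrator_cmd incremental 5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa dest-server
```

Printing the command first gives you a chance to review the UUID and destination before running anything.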

The migrator.sh script now supports migrating instances on Fabric (VXLAN) Networks and Docker instances.

Full migration

This is the default migration option, and requires that the source instance be down during the entire process.

At a high level, the migrator.sh script performs the following for a Full Migration.

  1. Validates the instance UUID.
  2. Validates access to the source and destination compute nodes.
  3. Generates and pushes SSH keys for use between the compute nodes.
  4. Reserves the instance's IP address(es).
  5. Verifies the instance on the source compute node.
  6. Shuts the instance down and validates the attributes.
  7. Snapshots the datasets associated with the instance.
  8. Transfers the datasets and configuration from the source to the destination.
  9. Reviews the transfer and cleans up the snapshots.
  10. Creates the cores dataset on the destination compute node.
  11. Sets up the /etc/zones/index file on the destination compute node.
  12. Forces the API to acknowledge the instance on the destination compute node.
  13. Verifies the instance in the API.
  14. Boots the instance and validates state (Only if instance was in a running state).
  15. Unreserves the instance's IP address(es).
  16. Provides instructions on cleaning data from the source compute node.

Preparation

You will need to have root level access to the source compute node (CN), the destination compute node, and the head node (HN). You will also need to know the UUID of the instance to be migrated, and you should ensure that the destination compute node has the requisite traits, network tags, disk space, memory, and CPU requirements necessary to host the migrated instance.

Note: It is highly recommended that you push an SSH key from the head node to the compute nodes to allow passwordless SSH between the head node and the compute nodes. This will eliminate the need to type passwords during this process. The keys that are created by migrator.sh are used only between the source and destination compute nodes, not between the head node and the compute nodes.

Identify instance

Identify the instance's current server_uuid, alias, create_timestamp, and state. Also determine the hostname of the server it is running on; this can be done using the sdc-vmapi and sdc-cnapi tools. In this example, we'll be migrating the SmartOS instance "kirby" with the UUID "5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa" from the server source-server.

headnode# sdc-vmapi /vms/5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa | json -Ha server_uuid alias create_timestamp state
00000000-0000-0000-0000-d43d7ef73056 kirby 2013-12-12T20:37:34.494Z running
headnode# sdc-cnapi /servers/00000000-0000-0000-0000-d43d7ef73056 | json -Ha hostname
source-server
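
Because migrator.sh only restarts instances that were running before the migration, it can help to check the state field up front. The sketch below splits a line in the shape of the json -Ha output above and reports what to expect. The function name restart_note is hypothetical, and the sketch assumes the four space-separated fields shown above.

```shell
# Hypothetical helper: given one line of
#   "server_uuid alias create_timestamp state"
# report whether the instance would be restarted after migration.
restart_note() {
    # Left unquoted on purpose so the line splits into its four fields.
    set -- $1
    server_uuid=$1 alias_name=$2 state=$4
    if [ "$state" = "running" ]; then
        echo "$alias_name on $server_uuid: running, will be restarted on the destination"
    else
        echo "$alias_name on $server_uuid: $state, will not be restarted"
    fi
}

restart_note "00000000-0000-0000-0000-d43d7ef73056 kirby 2013-12-12T20:37:34.494Z running"
```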

Estimate migration time

To determine the amount of time required to migrate an instance, use the mig-estimator.sh script. This script is designed to be run on the compute node hosting the instance to be migrated, and assumes a data transfer rate of 20 MB/s; the script can be adjusted to use a different data transfer rate if desired. The script provides an estimate for each instance found on the compute node, as well as an overall "purge" value to indicate how long it would take to migrate all instances off the compute node.

All networks are different; test your network speed and adjust the mig-estimator.sh script as required for your network.

For more about how to use this script, please contact support at support@joyent.com.
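
For a quick back-of-the-envelope figure, the arithmetic can be sketched as below. This is not mig-estimator.sh itself: the function name estimate_minutes is hypothetical, the 20 MB/s default is the rate quoted above, and the fixed 2 minute overhead is an assumption taken from the "+2 minutes extra" note in the migrator.sh summary output shown later in this document.

```shell
# Hypothetical helper (not mig-estimator.sh): estimated minutes to move
# SIZE_MB megabytes at RATE_MBS MB/s, plus a fixed overhead in minutes.
estimate_minutes() {
    size_mb=$1
    rate_mbs=${2:-20}      # default rate quoted in this document
    overhead_min=${3:-2}   # assumed from "+2 minutes extra" in the summary
    awk -v s="$size_mb" -v r="$rate_mbs" -v o="$overhead_min" \
        'BEGIN { printf "%.2f\n", s / r / 60 + o }'
}

estimate_minutes 207.98      # prints 2.17
estimate_minutes 100065.28   # 97.72 GB expressed in MB; prints 85.39
```

Both results line up with the estimates migrator.sh prints for the two example instances in this document (2.17 minutes, and 1.42 hours, which is about 85 minutes). Pass a second argument to try a transfer rate measured on your own network.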

Locate a destination compute node

You will need to ensure that the destination compute node meets all necessary requirements for the migrated instance, including traits, network tags, disk space, memory, and CPU.

The migrator is an operator-level tool, so it bypasses the normal sanity checks (for example, checking that the right networks and traits are present). If you are not familiar with operator tools, it is possible to render the VM unbootable on the new hardware. Consider contacting support instead.

Validate access to source and destination compute nodes

Identify the IP address for both source (src) and destination (dest) compute nodes; the code snippet below is one way to determine the IP addresses. Replace 10.1.1 with the internal subnet you are using for your Triton installation:

headnode# sdc-oneachnode -n source-server,dest-server 'ifconfig -a| grep 10.1.1' | awk '!/HOST/{print $1" ("$3")"}'
source-server (10.1.1.33)
dest-server (10.1.1.34)

Once you have the IP addresses, validate that you are able to establish an SSH connection. The example below uses private/public key authentication. Passwords are acceptable, but it is highly recommended that you configure SSH keys between your head node and the compute nodes to help streamline this process.

headnode# ssh 10.1.1.33 uname -a
SunOS source-server 5.11 joyent_20140115T175151Z i86pc i386 i86pc
headnode# ssh 10.1.1.34 uname -a
SunOS dest-server 5.11 joyent_20140115T175151Z i86pc i386 i86pc
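
The same check can be scripted so that it fails fast instead of prompting for a password. This is a sketch under stated assumptions: the function name check_cn_access and the SSH_CMD variable are hypothetical, and the BatchMode and ConnectTimeout options assume an OpenSSH-compatible client on the head node.

```shell
# SSH_CMD is overridable so the check can be dry-run with a stub.
SSH_CMD=${SSH_CMD:-ssh}

# Hypothetical helper: confirm non-interactive SSH access to each host.
check_cn_access() {
    for host in "$@"; do
        # BatchMode=yes makes ssh fail rather than prompt for a password.
        if $SSH_CMD -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
            return 1
        fi
    done
}

# Example (run from the head node):
#   check_cn_access 10.1.1.33 10.1.1.34
```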

Using the migrator.sh script - full migration

You can use migrator.sh from the head node to handle fully migrating an instance between two compute nodes. Of note, migrator.sh will determine the current run state of the instance prior to migration. If the instance is running prior to migration, it will be started on the destination compute node at the end of migration. If it is in any state other than running, the instance will not be restarted.

Verify the instance on the source compute node

Verify the current run state and datasets of the instance on the source compute node:

source-server# INSTANCE_UUID="5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa" ; zoneadm list -vic | grep ${INSTANCE_UUID} ; zfs list -rtall | grep ${INSTANCE_UUID}
   3 5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa running    /zones/5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa joyent   excl
zones/5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa                                                    208M  24.8G  2.41G  /zones/5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa
zones/cores/5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa                                               31K   100G    31K  /zones/5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa/cores

Note on time estimation

From the above datasets, there's about 208 MB of data that will need to be migrated to the destination compute node, based on the USED value (2nd column) of the zfs output above. For these examples, we assume a data transfer rate of 1 - 1.5 GB of data per minute. This means data transfer should take roughly 8 to 12 seconds, with about 2-3 minutes of overhead from migrator.sh. However, all networks are different, so please test your network speed and throughput and adjust accordingly.

Destination compute node prep work

On the destination compute node, run the following to watch the progress of the migration (the output repeats once per minute, but you can adjust this by changing the sleep time):

dest-server#  INSTANCE_UUID="5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa" ; while : ; do date ;  zoneadm list -vic | grep ${INSTANCE_UUID} ; zfs list -rtall | grep ${INSTANCE_UUID} ; echo ; sleep 60 ; done
March  3, 2014 09:58:11 PM UTC

Destroy or recreate NAT zones

Use nat-recreate.sh to destroy or recreate a NAT zone, and migrate the zone off a compute node.

The migrator.sh script will not migrate a NAT zone. It also will not migrate any core zones used by Triton DataCenter or Object Storage.

Start the migration

On the head node, start the migration. You'll want to execute migrator.sh with root privileges. Note that you can either pass the instance UUID and compute node name on the command line, or allow the script to prompt you for that data. The following is the full runtime output of our migration:

headnode# ./migrator.sh

           -: Welcome to the SDC Migrator :-

UUID of Instance to Move: 5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa
Destination Server: dest-server

* Please wait, gathering instance and CN data....
    + retrieving instance alias, server_uuid, brand, create_timestamp,
      image_uuid, zone_state, quota, ram, and owner_uuid ....           [ DONE ]
    + checking instance for any IP addrs to potentially reserve during
      the migration ...                             [ DONE ]
    + retrieving SRC CN hostname, SDC Version, reservation status, and
      IP addr ...                               [ DONE ]
    + retrieving DEST CN UUID, SDC Version, reservation status, and
      IP addr ...                               [ DONE ]
    + checking instance for datasets to migrate during the migration ...    [ DONE ]
    + Creating an SSH key on source-server to copy to authorized_keys on
      dest-server for the migration; Key will be removed once migration completes.  [ DONE ]
    + Copying SSH key to dest-server...                     [ DONE ]
  - Data gathering complete!

We will be migrating:
     INSTANCE:
                uuid:  5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa
               alias:  kirby
          IP Addr(s):  192.168.212.115
          datacenter:  cak-1
          zone state:  running
                type:  ZONE
               owner:  d56db211-59cf-4913-c10c-cb35d3a26bee
           create at:  2014-04-14T16:52:42.749Z
          base image:  398deede-c025-11e3-8b24-f3ba141900bd
  total dataset size:  207.98 MBytes across 1 datasets
 est. migration time:  2.17 minutes (@ ~20 MB / second (~1.17 GB / minute); +2 minutes extra)
      migration type:  Non-Incremental

                    SRC CN                                         DEST CN
 ----------------------------------------------  ----------------------------------------------
Host:     source-server                         Host:     dest-server
UUID:     10895171-5599-db11-8667-763f24705829  UUID:     00000000-0000-0000-0000-d43d7ef73056
SDC Ver:  7.0                                   SDC Ver:  7.0
IP Addr:  10.1.1.33                             IP Addr:  10.1.1.1
reserved: false                                 reserved: false
migr key: /root/.ssh/migr-key-38135             auth_keys bkup: /root/.ssh/authorized_keys.38135

Are you ready to proceed? [y|n] y
Here we go...

* Checking if tcp_max_buf, tcp_xmit_hiwat, and tcp_recv_hiwat have
  been tuned on both source-server and dest-server ...
  - nothing to do for source-server
  - nothing to do for dest-server
* Checking for origin dataset for the instance...
    + origin dataset is zones/398deede-c025-11e3-8b24-f3ba141900bd@final
      > origin DS doesn't exist on dest-server, will need to try import
        of origin DS (398deede-c025-11e3-8b24-f3ba141900bd@final)
* Attempting import of image (398deede-c025-11e3-8b24-f3ba141900bd) to dest-server
  - Image imported successfully!

**** Runtime impact and pre-migration changes to ZONE instance
****    5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa (kirby)
**** start here!

* Instance kirby is running; shutting it down, please wait ..  [ DONE ]
  - New state is:  stopped
* Source XML file modifications...
    + Checking instance XML file for create-timestamp (JPC-1212)...
    = create-timestamp already exists
    + Checking instance XML file for dataset-uuid (image_uuid; AGENT-629)...
    = dataset-uuid already exists
  - Source XML file modifications complete!
* Getting Zone index configuration...                       [ DONE ]
    + checking if we need to correct it (JPC-1421)
  - Zone index configuration check complete!
* Creating dataset snapshots on source-server
    + Creating zones/5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa snapshot...       [ DONE ]
* Transferring dataset snapshots to dest-server
    + Tranferring zones/5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa@vmsnap-1393137060_ops_migration_1393137060 ... The authenticity of host '10.1.1.1 (10.1.1.1)' can't be established.
RSA key fingerprint is 3c:86:e6:13:e6:59:4d:fc:9a:ae:59:05:19:30:fb:11.
Are you sure you want to continue connecting (yes/no)? yes
done
* Creating cores dataset on dest-server
  - Created cores dataset:  zones/cores/5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa
* Transferring Zone XML to the destination...                   [ DONE ]
* Deleting migration snapshots on dest-server...                    [ DONE ]
* Disabling instance on source-server...                        [ DONE ]
* Importing & Attaching Zone on dest-server...                  [ DONE ]
* Restarting vmadmd on dest-server, to ensure detection of transferred VM's presence.
* Temporarily reserving instance IP addrs so they don't get reprovisioned elsewhere
  in the short amount of time that the instance may show up as "destroyed"...
    + reserving 192.168.212.115 on network_uuid e76a5115-1353-4788-b3a3-7302e2f2b710...    Done
* Setting attr 'do-not-inventory' for 5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa on source-server...
    + Restarting vmadmd again on destination, to hasten update of VM's presence.
    + Restarting heartbeater on destination, to hasten update of VM's presence.
* Checking VMAPI for 'do-not-inventory' update (VMAPI will show dest-server's
  server_uuid if updated); may take up to a minute or so ......... Success!
* Unreserving instance IP addrs since we're no longer at risk of
  the instance showing up as "destroyed".
    - unreserving 192.168.212.115 on network_uuid e76a5115-1353-4788-b3a3-7302e2f2b710...    Done
* Enabling Autostart on the dest-server...                   [ DONE ]
* VM kirby is ready for startup, please wait for boot..Done. (State is: running)

                    ===   Done! ===

ZONE Instance:  5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa (kirby)

          is now installed on

      Dest CN:  dest-server (00000000-0000-0000-0000-d43d7ef73056)

    Migration started:  April 28, 2014 03:43:40 PM UTC
      Migration ended:  April 28, 2014 03:51:42 PM UTC
Duration of migration:  2.23 minutes
    Instance downtime:  17 seconds
       Migration type:  Non-Incremental
 # dataset increments:  1

Don't forget to at least comment out 5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa
  on source-server in /etc/zones/index, if not outright remove it.
To remove it on source-server:
   * /usr/bin/ssh 10.1.1.33 "/usr/sbin/zonecfg -z 5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa delete -F"
   * /usr/bin/ssh 10.1.1.33 "/usr/sbin/zfs destroy zones/cores/5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa"
   * /usr/bin/ssh 10.1.1.33 "/usr/sbin/zfs destroy -r zones/5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa"

 Please validate in AdminUI:  https://10.1.1.26/vms/5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa
        or validate via CLI:  /opt/smartdc/bin/sdc-vmapi /vms/5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa

* Clearing migration SSH keys

Monitoring progress

On the destination compute node, we started a while loop to watch the progress of the dataset transfers. The following is a sample (taken at 1 minute intervals) of that output while migrator.sh was running:

dest-server#  INSTANCE_UUID="5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa" ; while : ; do date ;  zoneadm list -vic | grep ${INSTANCE_UUID} ; zfs list -rtall | grep ${INSTANCE_UUID} ; echo ; sleep 60 ; done
March  3, 2014 09:58:11 PM UTC
zones/5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa                              100M  24.8G  2.69G  /zones/5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa

March  3, 2014 09:59:11 PM UTC
zones/5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa                              200M  24.8G  2.69G  /zones/5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa

March  3, 2014 10:00:11 PM UTC
zones/5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa                              215M  24.8G  2.69G  /zones/5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa
zones/cores/5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa                        144K  2.50G   144K  /zones/5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa/cores

End state on destination compute node

Post migration, the instance should appear on the destination compute node as:

dest-server#  INSTANCE_UUID="5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa" ; zoneadm list -vic | grep ${INSTANCE_UUID} ; zfs list -rtall | grep ${INSTANCE_UUID}
  28 5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa running    /zones/5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa joyent   excl
zones/5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa                              215M  24.8G  2.69G  /zones/5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa
zones/cores/5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa                        144K  2.50G   144K  /zones/5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa/cores

Validate migration via APIs

Validate the instance was successfully migrated and is reported as being hosted by the destination compute node:

headnode# sdc-vmapi /vms/5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa | json -Ha server_uuid alias create_timestamp state
00000000-0000-0000-0000-d43d7ef73056 kirby 2014-04-14T16:52:42.749Z running

Verify instance

Verify the state and sanity of the migrated instance with the instance user. Provided they have no issues, you can destroy the instance on the source compute node.

Clean up the source compute node

If you haven't yet verified the instance with the instance user, at least comment it out in /etc/zones/index on the source compute node by using a # in front of the appropriate line.

source-server# INSTANCE_UUID="5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa"; grep ${INSTANCE_UUID} /etc/zones/index
#5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa:installed:/zones/5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa:5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa
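
If you prefer to script the comment-out step, the following is one possible sketch. The function name comment_out_zone is hypothetical. It writes through a temporary file instead of relying on an in-place sed option, and it is best tried on a copy of the index file first, since replacing the file this way may not preserve its ownership and permissions.

```shell
# Hypothetical helper: prefix the instance's line in a zones index file
# with '#'. Lines that do not start with "UUID:" are left unchanged.
comment_out_zone() {
    index_file=$1
    uuid=$2
    tmp="${index_file}.tmp.$$"
    # '&' in the replacement re-inserts the matched "UUID:" text.
    sed "s|^${uuid}:|#&|" "$index_file" > "$tmp" && mv "$tmp" "$index_file"
}

# Example on a copy, leaving the real file untouched:
#   cp /etc/zones/index /var/tmp/index.copy
#   comment_out_zone /var/tmp/index.copy 5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa
```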

If you have verified the instance as stable with the user, remove the instance from the source compute node:

headnode# /usr/bin/ssh 10.1.1.33 "/usr/sbin/zonecfg -z 5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa delete -F"
headnode# /usr/bin/ssh 10.1.1.33 "/usr/sbin/zfs destroy zones/cores/5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa"
headnode# /usr/bin/ssh 10.1.1.33 "/usr/sbin/zfs destroy -r zones/5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa"

Zone, Bhyve and KVM differences

KVM instances have two extra datasets that zones do not have: zones/:UUID-disk0 and zones/:UUID-disk1. If we had migrated a KVM instance, there would be additional destroy commands for the two zones/:UUID-disk# datasets associated with it.
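
To see how the cleanup list grows for a KVM instance, the sketch below prints (but does not run) the source-side commands. The function name print_cleanup_cmds is hypothetical, and the exact form of the extra disk destroy commands is an assumption modeled on the zone commands shown earlier; compare against the instructions migrator.sh prints for your instance before running anything.

```shell
# Hypothetical helper: print the source-side cleanup commands.
# NDISKS is the number of zones/UUID-diskN datasets (0 for a zone,
# 2 for the KVM layout described above).
print_cleanup_cmds() {
    src_ip=$1 uuid=$2 ndisks=${3:-0}
    echo "/usr/bin/ssh $src_ip \"/usr/sbin/zonecfg -z ${uuid} delete -F\""
    echo "/usr/bin/ssh $src_ip \"/usr/sbin/zfs destroy zones/cores/${uuid}\""
    n=0
    while [ "$n" -lt "$ndisks" ]; do
        # Assumed form of the additional KVM destroy commands.
        echo "/usr/bin/ssh $src_ip \"/usr/sbin/zfs destroy -r zones/${uuid}-disk${n}\""
        n=$((n + 1))
    done
    echo "/usr/bin/ssh $src_ip \"/usr/sbin/zfs destroy -r zones/${uuid}\""
}

print_cleanup_cmds 10.1.1.33 5ca5b3dc-48fe-6f75-ed8c-8d8fbd77e2aa 2
```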

Bhyve instances have the top-level disk quota and reservation at 100% of the sum of the root and child dataset quotas. There is no additional quota for snapshots. For this reason, the migrator has extra steps to temporarily bump up the quota before migration and reset it after migration.

Incremental migration

An incremental migration differs from a standard migration in that migrator.sh will enable the operator to minimize the amount of downtime the instance will incur as a result of the migration. The benefit to the incremental migration is fully realized when you have large instances that would take hours (or days) to migrate.

Rather than immediately shutting down the instance as a full migration would, an incremental migration keeps the instance online until the last set of snapshots. Provided that the amount of data changed before the state file is removed is no more than a few GB (say < 2 GB), the instance will only be down for about 2 minutes.

In the course of an incremental migration, migrator.sh will start sending incremental snapshots of the instance's datasets to the destination compute node immediately while the instance is still running.

Once the first round of snapshots has been received, migrator.sh will pause for 60 seconds, create a new set of snapshots for the instance, and send the deviation between the new snapshots and the original snapshots. During this, the instance continues to run.

Once the 2nd round of snapshots has been received, migrator.sh will pause for 60 seconds, create a new set of snapshots for the instance, and send the deviation between the new snapshots and the 2nd round of snapshots. To track when it should stop snapshotting and finalize the migration, migrator.sh creates a state file to indicate that the migration type is incremental. As long as that state file exists on the head node, migrator.sh continues to create and send snapshots.

As soon as that state file is removed, migrator.sh will finish sending the current round of snapshots, shut down the instance, and take one final incremental snapshot of the instance's datasets. It will then send the deviation between the final snapshots and the penultimate snapshots, at which point the destination compute node has the full content of the instance's datasets. From there, migrator.sh finishes the migration of the instance just as it would in a full migration.
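
The state file name follows the pattern given in the help text (/var/tmp/migrator-INST_UUID-DEST_CN_NAME-ST_EPOCH.EXT). A small sketch for building the path is shown here; the helper name migrator_file is hypothetical, and the epoch value is the one from the sample run later in this section. In practice, use the exact path that migrator.sh prints in its summary.

```shell
# Hypothetical helper: build the log or state file path from the naming
# pattern in the help text.
migrator_file() {
    uuid=$1 dest=$2 epoch=$3 ext=$4   # ext is 'log' or 'st'
    echo "/var/tmp/migrator-${uuid}-${dest}-${epoch}.${ext}"
}

state_file=$(migrator_file 43aba933-2cd6-6c47-e63e-b1d2d1ab4956 dest-server 1398709539 st)
echo "$state_file"

# When you are ready for the final cutover, on the head node:
#   rm "$state_file"
```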

At a high level, the migrator.sh script performs the following for an Incremental Migration.

  1. Validates the instance UUID.
  2. Validates access to the source and destination compute nodes.
  3. Creates a state file to be used to determine when to complete the migration.
  4. Generates and pushes SSH keys for use between the compute nodes.
  5. Reserves the instance's IP address(es).
  6. Verifies the instance on the source compute node.
  7. Snapshots the datasets associated with the instance.
  8. Transfers the datasets to the destination compute node.
  9. Pauses at completion of transfer and then loops and:
    • Creates a new set of snapshots.
    • Sends the deviation between the snapshots to the destination compute node.
    • Checks to see if the state file exists.
    • If the state file exists, goes back to the top of the loop.
    • If the state file does not exist, the script breaks out of the loop and continues.
  10. Transfers the configuration from the source to the destination.
  11. Shuts the instance down and validates the attributes.
  12. Reviews the transfer and cleans up the snapshots.
  13. Creates the cores dataset on the destination compute node.
  14. Sets up the /etc/zones/index file on the destination compute node.
  15. Forces the API to acknowledge the instance on the destination compute node.
  16. Verifies the instance in the API.
  17. Boots the instance and validates state.
  18. Unreserves the instance's IP address(es).
  19. Provides instructions on cleaning data from the source compute node.

There is an exponential number of backups for incremental migrations.
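
The snapshot loop in step 9 can be sketched as a small simulation. This is not the migrator.sh source: take_snapshot and send_increment are hypothetical stand-ins that only print what would happen, and the pause is a parameter (defaulting to the 60 seconds described above) so the loop can be tried quickly.

```shell
# Hypothetical stand-ins for the real snapshot and transfer steps.
take_snapshot()  { echo "snapshot round $1"; }
send_increment() { echo "send increment $1"; }

# Simulate the incremental loop: keep snapshotting and sending until the
# state file disappears, then fall through to the final cutover.
incremental_loop() {
    state_file=$1
    pause=${2:-60}
    round=1
    take_snapshot "$round"      # first round carries the bulk of the data
    send_increment "$round"
    while :; do
        sleep "$pause"
        round=$((round + 1))
        take_snapshot "$round"
        send_increment "$round"
        # The state file is checked after each round, as in step 9.
        [ -e "$state_file" ] || break
    done
    echo "state file removed: proceeding to final cutover"
}

# Example: incremental_loop /var/tmp/example.st 5
```

Note that the check happens after a round completes, so removing the state file mid-transfer lets the current round finish before the cutover begins.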

Preparation

You will need to have root level access to the source compute node (CN), the destination compute node, and the head node (HN). You will also need to know the UUID of the instance to be migrated, and you should ensure that the destination compute node has the requisite traits, network tags, disk space, memory, and CPU requirements necessary to host the migrated instance.

Note: It is highly recommended that you push an SSH key from the head node to the compute nodes to allow passwordless SSH between the head node and the compute node. This will eliminate the need to type passwords during this process. The keys that are created by migrator.sh are used only between the source and destination compute nodes, not the head node and the compute nodes.

Identify instance

Identify the instance's current server_uuid, alias, create_timestamp, and state. Also determine the hostname of the server it is running on; this can be done using the sdc-vmapi and sdc-cnapi tools. In this example, we'll be migrating the SmartOS instance "pkgbuild" with the UUID "43aba933-2cd6-6c47-e63e-b1d2d1ab4956" from the server source-server.

headnode# sdc-vmapi /vms/43aba933-2cd6-6c47-e63e-b1d2d1ab4956 | json -Ha server_uuid alias create_timestamp state
00000000-0000-0000-0000-d43d7ef73056 pkgbuild 2013-12-12T20:37:34.494Z running
headnode# sdc-cnapi /servers/00000000-0000-0000-0000-d43d7ef73056 | json -Ha hostname
source-server

Estimate migration time

To determine the amount of time required to migrate an instance, use the mig-estimator.sh script. This script is designed to be run on the compute node hosting the instance to be migrated, and assumes a data transfer rate of 20 MB/s; the script can be adjusted to use a different data transfer rate if desired. The script provides an estimate for each instance found on the compute node, as well as an overall "purge" value to indicate how long it would take to migrate all instances off the compute node.

All networks are different; test your network speed and adjust the mig-estimator.sh script as required for your network.

For more about how to use this script, please contact support at support@joyent.com.

Locate a destination compute node

You will need to ensure that the destination compute node meets all necessary requirements for the migrated instance, including traits, network tags, disk space, memory, and CPU.

Validate access to source and destination compute nodes

Identify the IP address for both source (src) and destination (dest) compute nodes; the code snippet below is one way to determine the IP addresses. Replace 10.1.1 with the internal subnet you are using for your Triton installation:

headnode# sdc-oneachnode -n source-server,dest-server 'ifconfig -a| grep 10.1.1' | awk '!/HOST/{print $1" ("$3")"}'
source-server (10.1.1.33)
dest-server (10.1.1.34)

Once you have the IP addresses, validate that you are able to establish an SSH connection. The example below uses private/public key authentication. Passwords are acceptable, but it is highly recommended that you configure SSH keys between your head node and the compute nodes to help streamline this process.

headnode# ssh 10.1.1.33 uname -a
SunOS source-server 5.11 joyent_20140115T175151Z i86pc i386 i86pc
headnode# ssh 10.1.1.34 uname -a
SunOS dest-server 5.11 joyent_20140115T175151Z i86pc i386 i86pc

Using the migrator.sh script - incremental migration

You can use migrator.sh from the head node to handle fully migrating an instance between two compute nodes. Of note, migrator.sh will determine the current run state of the instance prior to migration. If the instance is running prior to migration, it will be started on the destination compute node at the end of migration. If it is in any state other than running, the instance will not be restarted.

Verify the instance on the source compute node

Verify the current run state and datasets of the instance on the source compute node:

source-server# INSTANCE_UUID="43aba933-2cd6-6c47-e63e-b1d2d1ab4956" ; zoneadm list -vic | grep ${INSTANCE_UUID} ; zfs list -rtall | grep ${INSTANCE_UUID}
   - 43aba933-2cd6-6c47-e63e-b1d2d1ab4956 installed  /zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956 joyent   excl
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956                              98G   34.8G    98G  /zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956
zones/cores/43aba933-2cd6-6c47-e63e-b1d2d1ab4956                        144K  15.0G   144K  /zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956/cores

Note on time estimation

From the above datasets, there's about 98 GB of data that will need to be migrated to the destination compute node, based on the USED value (2nd column) of the zfs output above. For these examples, we assume a data transfer rate of 1 - 1.5 GB of data per minute. This means data transfer should take roughly 65 to 100 minutes, with about 2-3 minutes of overhead from migrator.sh. However, all networks are different, so please test your network speed and throughput and adjust accordingly.

Destination compute node prep work

On the destination compute node, run the following to watch the progress of the migration (the output repeats once per minute, but you can adjust this by changing the sleep time):

dest-server#  INSTANCE_UUID="43aba933-2cd6-6c47-e63e-b1d2d1ab4956" ; while : ; do date ;  zoneadm list -vic | grep ${INSTANCE_UUID} ; zfs list -rtall | grep ${INSTANCE_UUID} ; echo ; sleep 60 ; done
March  3, 2014 09:58:11 PM UTC

Start the migration

On the head node, start the migration. You'll want to execute migrator.sh with root privileges. Note that you can either pass the instance UUID and compute node name on the command line, or allow the script to prompt you for that data. The following is the full runtime output of our migration:

headnode# ./migrator.sh -i

           -: Welcome to the SDC Migrator :-

UUID of Instance to Move: 43aba933-2cd6-6c47-e63e-b1d2d1ab4956
Destination Server: dest-server

* Please wait, gathering instance and CN data....
    + retrieving instance alias, server_uuid, brand, create_timestamp,
      image_uuid, zone_state, quota, ram, and owner_uuid ....           [ DONE ]
    + checking instance for any IP addrs to potentially reserve during
      the migration ...                             [ DONE ]
    + retrieving SRC CN hostname, SDC Version, reservation status, and
      IP addr ...                               [ DONE ]
    + retrieving DEST CN UUID, SDC Version, reservation status, and
      IP addr ...                               [ DONE ]
    + checking instance for datasets to migrate during the migration ...    [ DONE ]
    + Creating an SSH key on source-server to copy to authorized_keys on
      dest-server for the migration; Key will be removed once migration completes.  [ DONE ]
    + Copying SSH key to dest-server...                     [ DONE ]
  - Data gathering complete!

We will be migrating:
     INSTANCE:
                uuid:  43aba933-2cd6-6c47-e63e-b1d2d1ab4956
               alias:  pkgbuild
          IP Addr(s):  192.168.212.108
          datacenter:  cak-1
          zone state:  installed
                type:  ZONE
               owner:  d56db211-59cf-4913-c10c-cb35d3a26bee
           create at:  2014-03-13T15:44:35.586Z
          base image:  74c3b232-7961-11e3-a7a7-935768270b93
  total dataset size:  97.72 GBytes across 1 datasets
 est. migration time:  1.42 hours (@ ~20 MB / second (~1.17 GB / minute); +2 minutes extra)
      migration type:  Incremental
  migration log file:  cak-1 HN:/var/tmp/migrator-43aba933-2cd6-6c47-e63e-b1d2d1ab4956-dest-server-1398709539.log
migration state file:  cak-1 HN:/var/tmp/migrator-43aba933-2cd6-6c47-e63e-b1d2d1ab4956-dest-server-1398709539.st

                    SRC CN                                         DEST CN
 ----------------------------------------------  ----------------------------------------------
Host:     source-server                               Host:     dest-server
UUID:     10895171-5599-db11-8667-763f24705829  UUID:     00000000-0000-0000-0000-d43d7ef73056
SDC Ver:  7.0                                   SDC Ver:  7.0
IP Addr:  10.1.1.33                             IP Addr:  10.1.1.1
reserved: false                                 reserved: false
migr key: /root/.ssh/migr-key-52742             auth_keys bkup: /root/.ssh/authorized_keys.52742

Are you ready to proceed? [y|n] y
Here we go...

* Checking if tcp_max_buf, tcp_xmit_hiwat, and tcp_recv_hiwat have
  been tuned on both source-server and dest-server ...
  - nothing to do for source-server
  - nothing to do for dest-server
* Checking for origin dataset for the instance...
    + origin dataset null or non-standard (-)
      > instance DS origin does not reference IMG_UUID@final, relation to
        origin DS will be lost in the course of migrating the instance.
* Creating dataset snapshots on source-server
    + Creating zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956 snapshot 0...     [ DONE ]
* Transferring incremental dataset snapshot 0 to dest-server
    + Tranferring zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956@vmsnap-1393137060_ops_migration_1393137060-0 ... done
    + Sleeping for 60 seconds before next snapshot.
* Creating dataset snapshots on source-server
    + Creating zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956 snapshot 1...     [ DONE ]
* Transferring incremental dataset snapshot 1 to dest-server
    + Tranferring zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956@vmsnap-1393137060_ops_migration_1393137060-1 ... done
    + Sleeping for 60 seconds before next snapshot.
* Creating dataset snapshots on source-server
    + Creating zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956 snapshot 2...     [ DONE ]
* Transferring incremental dataset snapshot 2 to dest-server
    + Tranferring zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956@vmsnap-1393137060_ops_migration_1393137060-2 ... done
    + Sleeping for 60 seconds before next snapshot.
* Creating dataset snapshots on source-server
    + Creating zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956 snapshot 3...     [ DONE ]
* Transferring incremental dataset snapshot 3 to dest-server
    + Tranferring zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956@vmsnap-1393137060_ops_migration_1393137060-3 ... done
    + Sleeping for 60 seconds before next snapshot.
* Creating dataset snapshots on source-server
    + Creating zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956 snapshot 4...     [ DONE ]
* Transferring incremental dataset snapshot 4 to dest-server
    + Tranferring zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956@vmsnap-1393137060_ops_migration_1393137060-4 ... done
    + Sleeping for 60 seconds before next snapshot.
* Creating dataset snapshots on source-server
    + Creating zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956 snapshot 5...     [ DONE ]
* Transferring incremental dataset snapshot 5 to dest-server
    + Tranferring zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956@vmsnap-1393137060_ops_migration_1393137060-5 ... done
    + Sleeping for 60 seconds before next snapshot.
* Creating dataset snapshots on source-server
    + Creating zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956 snapshot 6...     [ DONE ]
* Transferring incremental dataset snapshot 6 to dest-server
    + Tranferring zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956@vmsnap-1393137060_ops_migration_1393137060-6 ... done
    + Sleeping for 60 seconds before next snapshot.

**** Runtime impact and pre-migration changes to ZONE instance
****    43aba933-2cd6-6c47-e63e-b1d2d1ab4956 (pkgbuild)
**** start here!

* Instance pkgbuild is already shutdown.  Proceeding.
* Source XML file modifications...
    + Checking instance XML file for create-timestamp (JPC-1212)...
    = create-timestamp already exists
    + Checking instance XML file for dataset-uuid (image_uuid; AGENT-629)...
    = dataset-uuid already exists
  - Source XML file modifications complete!
* Getting Zone index configuration...                       [ DONE ]
    + checking if we need to correct it (JPC-1421)
  - Zone index configuration check complete!
* Creating final incrmental dataset snapshot on source-server
    + Creating zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956 snapshot 7...     [ DONE ]
    + Tranferring zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956@vmsnap-1393137060_ops_migration_1393137060-7 ... done
* Creating cores dataset on dest-server
  - Created cores dataset:  zones/cores/43aba933-2cd6-6c47-e63e-b1d2d1ab4956
* Transferring Zone XML to the destination...                   [ DONE ]
* Deleting migration snapshots on dest-server...                    [ DONE ]
* Disabling instance on source-server...                        [ DONE ]
* Importing & Attaching Zone on dest-server...                  [ DONE ]
* Restarting vmadmd on dest-server, to ensure detection of transferred VM's presence.
* Temporarily reserving instance IP addrs so they don't get reprovisioned elsewhere
  in the short amount of time that the instance may show up as "destroyed"...
    + reserving 192.168.212.108 on network_uuid e76a5115-1353-4788-b3a3-7302e2f2b710...    Done
* Setting attr 'do-not-inventory' for 43aba933-2cd6-6c47-e63e-b1d2d1ab4956 on source-server...
    + Restarting vmadmd again on destination, to hasten update of VM's presence.
    + Restarting heartbeater on destination, to hasten update of VM's presence.
* Checking VMAPI for 'do-not-inventory' update (VMAPI will show dest-server's
  server_uuid if updated); may take up to a minute or so ........ Success!
* Unreserving instance IP addrs since we're no longer at risk of
  the instance showing up as "destroyed".
    - unreserving 192.168.212.108 on network_uuid e76a5115-1353-4788-b3a3-7302e2f2b710...    Done
* Enabling Autostart on the dest-server...                   [ DONE ]
 - VM pkgbuild was in state installed when we started
   thus not running.  Leaving in "installed" state.

                    ===   Done! ===

ZONE Instance:  43aba933-2cd6-6c47-e63e-b1d2d1ab4956 (pkgbuild)

          is now installed on

      Dest CN:  dest-server (00000000-0000-0000-0000-d43d7ef73056)

    Migration started:  April 28, 2014 06:43:10 PM UTC
      Migration ended:  April 28, 2014 09:30:00 PM UTC
Duration of migration:  2.78 hours
    Instance downtime:  36 seconds
       Migration type:  Incremental
 # dataset increments:  8
   migration log file:  cak-1 HN:/var/tmp/migrator-43aba933-2cd6-6c47-e63e-b1d2d1ab4956-dest-server-1398709539.log

Don't forget to at least comment out 43aba933-2cd6-6c47-e63e-b1d2d1ab4956
  on source-server in /etc/zones/index, if not outright remove it.
To remove it on source-server:
   * /usr/bin/ssh 10.1.1.33 "/usr/sbin/zonecfg -z 43aba933-2cd6-6c47-e63e-b1d2d1ab4956 delete -F"
   * /usr/bin/ssh 10.1.1.33 "/usr/sbin/zfs destroy zones/cores/43aba933-2cd6-6c47-e63e-b1d2d1ab4956"
   * /usr/bin/ssh 10.1.1.33 "/usr/sbin/zfs destroy -r zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956"

 Please validate in AdminUI:  https://10.1.1.26/vms/43aba933-2cd6-6c47-e63e-b1d2d1ab4956
        or validate via CLI:  /opt/smartdc/bin/sdc-vmapi /vms/43aba933-2cd6-6c47-e63e-b1d2d1ab4956

* Clearing migration SSH keys
You have new mail in /var/mail/root

Note that the log above shows the migration took 2.78 hours, but the actual downtime was only 36 seconds.
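The total duration can be cross-checked against the start and end times in the summary. The following bash sketch does that arithmetic for two times on the same day; the helper names to_seconds and elapsed_hours are hypothetical, and the times are the summary's 06:43:10 PM and 09:30:00 PM converted to 24-hour form.

```shell
# Convert HH:MM:SS (24-hour clock) to seconds since midnight.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Elapsed hours between two times on the same day, two decimals.
elapsed_hours() {
    local start end
    start=$(to_seconds "$1")
    end=$(to_seconds "$2")
    awk -v s="$start" -v e="$end" 'BEGIN { printf "%.2f\n", (e - s) / 3600 }'
}

# Start and end times from the migration summary.
elapsed_hours 18:43:10 21:30:00
```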

Monitoring progress

On the destination compute node, we started a while loop to watch the progress of the dataset transfers. The following is a sample of that output, taken at 1-minute intervals while migrator.sh was running. The output has been trimmed for brevity, but you can see the snapshots being generated once the first iteration of the data transfer has completed, followed by the removal of those snapshots once the state file is removed and the transfer is finalized.

dest-server# INSTANCE_UUID="43aba933-2cd6-6c47-e63e-b1d2d1ab4956" ; while : ; do date ;  zoneadm list -vic | grep ${INSTANCE_UUID} ; zfs list -rtall | grep ${INSTANCE_UUID} ; echo ; sleep 60 ; done
April 28, 2014 06:43:02 PM UTC

April 28, 2014 06:44:02 PM UTC
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956                              540M   149G   540M  /zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956

April 28, 2014 06:45:02 PM UTC
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956                             1.19G   149G  1.19G  /zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956

April 28, 2014 06:46:02 PM UTC
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956                             1.85G   148G  1.85G  /zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956

April 28, 2014 06:47:02 PM UTC
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956                             2.63G   147G  2.63G  /zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956

April 28, 2014 06:48:02 PM UTC
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956                             3.57G   146G  3.57G  /zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956

<----------SNIP---------->

April 28, 2014 08:01:57 PM UTC
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956                              114G  36.3G   114G  /zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956

April 28, 2014 08:02:58 PM UTC
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956                              114G  35.6G   114G  /zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956

April 28, 2014 08:03:59 PM UTC
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956                              115G  34.9G   115G  /zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956

April 28, 2014 08:05:01 PM UTC
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956                                                115G  34.8G   115G  /zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956@vmsnap-1393137060_ops_migration_1393137060-0      0      -   115G  -

April 28, 2014 08:07:01 PM UTC
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956                                                115G  34.8G   115G  /zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956@vmsnap-1393137060_ops_migration_1393137060-0     8K      -   115G  -
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956@vmsnap-1393137060_ops_migration_1393137060-1      0      -   115G  -

April 28, 2014 08:08:01 PM UTC
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956                                                115G  34.8G   115G  /zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956@vmsnap-1393137060_ops_migration_1393137060-0     8K      -   115G  -
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956@vmsnap-1393137060_ops_migration_1393137060-1     8K      -   115G  -
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956@vmsnap-1393137060_ops_migration_1393137060-2      0      -   115G  -

April 28, 2014 08:09:01 PM UTC
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956                                                115G  34.8G   115G  /zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956@vmsnap-1393137060_ops_migration_1393137060-0     8K      -   115G  -
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956@vmsnap-1393137060_ops_migration_1393137060-1     8K      -   115G  -
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956@vmsnap-1393137060_ops_migration_1393137060-2     8K      -   115G  -
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956@vmsnap-1393137060_ops_migration_1393137060-3      0      -   115G  -

April 28, 2014 08:10:01 PM UTC
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956                                                115G  34.8G   115G  /zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956@vmsnap-1393137060_ops_migration_1393137060-0     8K      -   115G  -
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956@vmsnap-1393137060_ops_migration_1393137060-1     8K      -   115G  -
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956@vmsnap-1393137060_ops_migration_1393137060-2     8K      -   115G  -
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956@vmsnap-1393137060_ops_migration_1393137060-3     8K      -   115G  -
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956@vmsnap-1393137060_ops_migration_1393137060-4      0      -   115G  -

April 28, 2014 08:11:01 PM UTC
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956                                                115G  34.8G   115G  /zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956@vmsnap-1393137060_ops_migration_1393137060-0     8K      -   115G  -
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956@vmsnap-1393137060_ops_migration_1393137060-1     8K      -   115G  -
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956@vmsnap-1393137060_ops_migration_1393137060-2     8K      -   115G  -
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956@vmsnap-1393137060_ops_migration_1393137060-3     8K      -   115G  -
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956@vmsnap-1393137060_ops_migration_1393137060-4     8K      -   115G  -
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956@vmsnap-1393137060_ops_migration_1393137060-5      0      -   115G  -

April 28, 2014 08:12:01 PM UTC
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956                                                115G  34.8G   115G  /zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956@vmsnap-1393137060_ops_migration_1393137060-0     8K      -   115G  -
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956@vmsnap-1393137060_ops_migration_1393137060-1     8K      -   115G  -
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956@vmsnap-1393137060_ops_migration_1393137060-2     8K      -   115G  -
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956@vmsnap-1393137060_ops_migration_1393137060-3     8K      -   115G  -
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956@vmsnap-1393137060_ops_migration_1393137060-4     8K      -   115G  -
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956@vmsnap-1393137060_ops_migration_1393137060-5     8K      -   115G  -
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956@vmsnap-1393137060_ops_migration_1393137060-6      0      -   115G  -

<----------SNIP---------->

April 28, 2014 08:15:01 PM UTC
   - 43aba933-2cd6-6c47-e63e-b1d2d1ab4956 installed  /zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956 joyent   excl
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956                              115G  34.8G   115G  /zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956
zones/cores/43aba933-2cd6-6c47-e63e-b1d2d1ab4956                        144K  15.0G   144K  /zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956/cores

April 28, 2014 08:16:01 PM UTC
   - 43aba933-2cd6-6c47-e63e-b1d2d1ab4956 installed  /zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956 joyent   excl
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956                              115G  34.8G   115G  /zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956
zones/cores/43aba933-2cd6-6c47-e63e-b1d2d1ab4956                        144K  15.0G   144K  /zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956/cores

Remove state file to finish migration

By design, the incremental migration continues to loop and resend snapshots every 60 seconds (measured from the end of the previous snapshot send) until the state file is deleted. The state file path is shown in the header information that the script prints at startup, and it is also recorded in the log:

headnode# ls -l migrator-43aba933-2cd6-6c47-e63e-b1d2d1ab4956-dest-server-1398709539.*
-rw-r--r--   1 root     root        5300 Apr 28 21:28 migrator-43aba933-2cd6-6c47-e63e-b1d2d1ab4956-dest-server-1398709539.log
-rw-r--r--   1 root     root           0 Apr 28 18:25 migrator-43aba933-2cd6-6c47-e63e-b1d2d1ab4956-dest-server-1398709539.st

Removing the state file tells the script to shut down the instance and complete the migration.

headnode# rm migrator-43aba933-2cd6-6c47-e63e-b1d2d1ab4956-dest-server-1398709539.st
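The state-file mechanism described above can be illustrated with a small bash sketch. This is not migrator.sh's actual code: send_increment and run_until_state_removed are hypothetical names, and the "send" is only an echo. It shows the general pattern of looping while a file exists and finalizing once it is removed.

```shell
# Hypothetical stand-in for "create a snapshot and send it".
send_increment() {
    echo "sending increment $1"
}

# Keep sending increments for as long as the state file exists.
run_until_state_removed() {
    local state_file="$1"
    local interval="$2"   # seconds to wait after each send
    local n=0
    while [ -e "$state_file" ]; do
        send_increment "$n"
        n=$((n + 1))
        sleep "$interval"
    done
    # The state file is gone: this is where a final pass would run.
    echo "state file removed after $n increments; finalizing"
}

# Demo: a background job removes a temporary state file after about
# 2 seconds, standing in for the operator's rm on the head node.
demo_state=$(mktemp)
( sleep 2; rm -f "$demo_state" ) &
run_until_state_removed "$demo_state" 1
wait
```

Using a file as the signal means the operator can end the incremental phase from any shell on the head node without sending signals to the script.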

End state on destination compute node

Post migration, the instance should appear on the destination compute node as:

dest-server# INSTANCE_UUID="43aba933-2cd6-6c47-e63e-b1d2d1ab4956" ; zoneadm list -vic | grep ${INSTANCE_UUID} ; zfs list -rtall | grep ${INSTANCE_UUID}
   - 43aba933-2cd6-6c47-e63e-b1d2d1ab4956 installed  /zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956 joyent   excl
zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956                              115G  34.8G   115G  /zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956
zones/cores/43aba933-2cd6-6c47-e63e-b1d2d1ab4956                        144K  15.0G   144K  /zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956/cores

Validate migration via APIs

Validate the instance was successfully migrated and is reported as being hosted by the destination compute node:

headnode# sdc-vmapi /vms/43aba933-2cd6-6c47-e63e-b1d2d1ab4956 | json -Ha server_uuid alias create_timestamp state
00000000-0000-0000-0000-d43d7ef73056 pkgbuild 2014-03-13T15:44:35.586Z running
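The check can also be scripted by comparing the first field of that output with the destination compute node's UUID. The following is a minimal bash sketch; check_server_uuid is a hypothetical helper, and it works on a line of output passed in as text rather than calling sdc-vmapi itself.

```shell
# Compare the server_uuid field (first column of the json -Ha output)
# with the UUID of the destination compute node.
check_server_uuid() {
    local vmapi_line="$1"      # one line of output, as shown above
    local expected_uuid="$2"   # destination compute node UUID
    local actual_uuid
    actual_uuid=$(echo "$vmapi_line" | awk '{ print $1 }')
    if [ "$actual_uuid" = "$expected_uuid" ]; then
        echo "OK: instance is on $expected_uuid"
    else
        echo "MISMATCH: VMAPI reports $actual_uuid"
        return 1
    fi
}

# Sample line copied from the output above.
line="00000000-0000-0000-0000-d43d7ef73056 pkgbuild 2014-03-13T15:44:35.586Z running"
check_server_uuid "$line" "00000000-0000-0000-0000-d43d7ef73056"
```

On a real head node, the line argument would come from the sdc-vmapi command shown above.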

Verify instance

Verify the state and sanity of the migrated instance with the instance user. Provided they have no issues, you can destroy the instance on the source compute node.

Clean up the source compute node

If you haven't yet verified the instance with the instance user, at least comment out its entry in /etc/zones/index on the source compute node by placing a # at the start of the appropriate line.

source-server# INSTANCE_UUID="43aba933-2cd6-6c47-e63e-b1d2d1ab4956" ; grep ${INSTANCE_UUID} /etc/zones/index
#43aba933-2cd6-6c47-e63e-b1d2d1ab4956  :installed:/zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956:43aba933-2cd6-6c47-e63e-b1d2d1ab4956
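Commenting out the entry can be done in any editor. As an illustration only, here is a bash sketch that does it with sed; comment_out_zone is a hypothetical helper, and the demo runs against a scratch file with a sample entry instead of the real /etc/zones/index. Back up the index file before trying anything like this on a compute node.

```shell
# Prefix the index entry for one instance with '#'. Entries that are
# already commented out do not match and are left unchanged.
comment_out_zone() {
    local index_file="$1"
    local uuid="$2"
    local tmp
    tmp=$(mktemp) || return 1
    # Write to a temporary file first, then copy the result back so the
    # original file keeps its ownership and permissions.
    sed "s/^${uuid}[[:space:]]*:/#&/" "$index_file" > "$tmp" && cat "$tmp" > "$index_file"
    rm -f "$tmp"
}

# Demo on a scratch file containing one sample entry.
scratch=$(mktemp)
echo "43aba933-2cd6-6c47-e63e-b1d2d1ab4956:installed:/zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956:43aba933-2cd6-6c47-e63e-b1d2d1ab4956" > "$scratch"
comment_out_zone "$scratch" "43aba933-2cd6-6c47-e63e-b1d2d1ab4956"
cat "$scratch"
rm -f "$scratch"
```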

If you have verified the instance as stable with the user, remove the instance from the source compute node:

headnode# /usr/bin/ssh 10.1.1.33 "/usr/sbin/zonecfg -z 43aba933-2cd6-6c47-e63e-b1d2d1ab4956 delete -F"
headnode# /usr/bin/ssh 10.1.1.33 "/usr/sbin/zfs destroy zones/cores/43aba933-2cd6-6c47-e63e-b1d2d1ab4956"
headnode# /usr/bin/ssh 10.1.1.33 "/usr/sbin/zfs destroy -r zones/43aba933-2cd6-6c47-e63e-b1d2d1ab4956"

Zone and KVM differences

KVM instances have two extra datasets that zones do not have: zones/:UUID-disk0 and zones/:UUID-disk1. If we had migrated a KVM instance, there would be additional destroy commands for the two zones/:UUID-disk# datasets associated with that instance.
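To make the difference concrete, the following bash sketch prints (but does not run) a cleanup command list for either instance type. print_cleanup_commands is a hypothetical helper, and the exact commands and ordering that migrator.sh suggests for a KVM instance are an assumption here; only the dataset names come from the text above.

```shell
# Print, without executing, the source-side cleanup commands for an
# instance. For KVM, the two disk datasets are added to the list.
print_cleanup_commands() {
    local uuid="$1"
    local type="$2"   # "zone" or "kvm"
    echo "zonecfg -z ${uuid} delete -F"
    echo "zfs destroy zones/cores/${uuid}"
    if [ "$type" = "kvm" ]; then
        # Assumed form of the extra commands for the KVM disk datasets.
        echo "zfs destroy zones/${uuid}-disk0"
        echo "zfs destroy zones/${uuid}-disk1"
    fi
    echo "zfs destroy -r zones/${uuid}"
}

print_cleanup_commands "43aba933-2cd6-6c47-e63e-b1d2d1ab4956" kvm
```

Printing the commands instead of running them leaves the destructive step as a deliberate manual action after the instance has been verified.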

For bhyve instances, the top-level disk quota and reservation are set to 100% of the sum of the root and child dataset quotas, with no additional quota for snapshots. For this reason, the migrator takes extra steps to temporarily raise the quota before migration and reset it afterward.