Migrate multiple repositories from GitHub.com to GitHub Enterprise

Scenario:

https://github.com/MyOrg/MyRepo → https://ghe.example.com/MyOrg/MyRepo

https://github.com/MyOrg/MyOtherRepo → https://ghe.example.com/MyOrg/MyOtherRepo

Preparation & setup

  1. Make a backup of your instance using backup-utils

    Making a backup of the instance will allow you to easily roll back to a pre-migration state during trial runs.

  2. Set up a sandboxed instance for trial runs

    If you're planning to migrate repos onto a production GitHub Enterprise instance, you should set up a sandboxed duplicate of your instance in which to test your migration.

    This may not be necessary on a fresh GitHub Enterprise installation.

  3. SSH into your GitHub Enterprise instance

    ssh -p 122 admin@ghe.example.com

  4. Ensure you have ownership permissions for the Organization you're migrating

    For the current version of organizations, you must be a member of the "Owners" team. For the newer version of organizations (in private beta at the time of this writing), you must be a direct owner of the organization.

Note: For simplicity, complete the following steps by executing the commands while SSHed into your GitHub Enterprise instance. This will prevent the need for additional file transfers later.

Set environment variables

Before getting started, set an environment variable for the personal access token you created earlier. The example snippets provided here assume that this variable is set.

export GITHUB_TOKEN=0000000000000000000000
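Because every later snippet reads this variable, it can help to fail early when it is missing. The following is a minimal sketch, assuming a Bash-compatible shell; the function name require_token is hypothetical and not part of any GitHub tooling.

```shell
# Hypothetical helper (not part of GitHub tooling): returns non-zero,
# with a message on stderr, when GITHUB_TOKEN is unset or empty.
require_token() {
  if [ -z "${GITHUB_TOKEN:-}" ]; then
    echo "GITHUB_TOKEN is not set; export it before continuing." >&2
    return 1
  fi
}
```

Running require_token before the first curl command stops the session from sending unauthenticated requests by accident.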

Perform a trial run

Note: It is strongly recommended that you perform a backup between as many steps as possible using backup-utils. This will consistently provide recent restore points in the event of an import error.

Perform the migration

Start the migration for the repositories (trial run)

For the trial run, there is no need to lock the repositories. Commits may land after the archive is made, but that doesn't matter: it's assumed you'll restore the archive to a sandboxed instance of GitHub Enterprise and discard it afterward.

Command:

curl -H "Authorization: token ${GITHUB_TOKEN}" -X POST \
  -H "Accept: application/vnd.github.wyandotte-preview+json" \
  -d'{"lock_repositories":false,"repositories":["MyOrg/MyRepo", "MyOrg/MyOtherRepo"]}' \
  https://api.github.com/orgs/MyOrg/migrations

Be sure to replace instances of MyOrg, MyRepo, and MyOtherRepo with your appropriate information. Also, be sure to retain the guid and url values from the JSON response. You'll need them later.
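When the repository list is long, typing the JSON body by hand is error-prone. The sketch below builds the same request body from arguments. The function name build_migration_payload is hypothetical, and it assumes that organization and repository names contain no characters that need JSON escaping.

```shell
# Hypothetical helper: prints the JSON request body for the migrations endpoint.
# $1 = organization, $2 = true|false (lock_repositories), remaining args = repository names.
# The body runs in a subshell ( ... ) so its variables do not leak into your session.
build_migration_payload() (
  org=$1
  lock=$2
  shift 2
  repos=""
  for repo in "$@"; do
    # Append a comma only when the list already has an entry.
    repos="${repos:+${repos},}\"${org}/${repo}\""
  done
  printf '{"lock_repositories":%s,"repositories":[%s]}\n' "$lock" "$repos"
)
```

The output can be passed to the curl command above in place of the literal body, for example -d "$(build_migration_payload MyOrg false MyRepo MyOtherRepo)".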

You can also optionally store the guid value into an environment variable, which is what is assumed for the rest of this guide.

export MIGRATION_GUID=fs26dac8-1w87-1135-8e13-a8tjd25q0fdd
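If you saved the JSON response to a file, the guid can be pulled out with standard tools instead of being copied by hand. This is a rough sketch: the function name extract_guid and the file name response.json are assumptions, and the pattern is plain text matching rather than a real JSON parser.

```shell
# Hypothetical helper: prints the first "guid" string value found in JSON on stdin.
# Plain text matching only; it does not handle escaped quotes inside the value.
extract_guid() {
  sed -n 's/.*"guid"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}
```

For example, after adding -o response.json to the curl command that starts the migration, export MIGRATION_GUID=$(extract_guid < response.json) sets the variable in one step.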

Monitor the status of the migration export (trial run)

Once you start the migration, it can take several minutes to complete, depending on the combined size of the repositories. This snippet checks the status of the migration every 30 seconds. Once the state is exported, press ^C to stop the loop. Replace the URL with the url value you obtained in the previous step.

while true; do \
  curl -s -H "Authorization: token ${GITHUB_TOKEN}" \
  -H "Accept: application/vnd.github.wyandotte-preview+json" \
  https://api.github.com/orgs/MyOrg/migrations/999 \
  | grep -E 'state|exported'; sleep 30; done
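The loop above has to be stopped by hand. As an alternative, the following sketch stops on its own once the export finishes. The names migration_state, wait_for_export, fetch_status, and POLL_INTERVAL are hypothetical. It assumes the status response carries a "state" field, and that a "failed" value signals an unsuccessful export; check both against the responses you actually receive.

```shell
# Hypothetical helper: prints the first "state" string value found in JSON on stdin.
migration_state() {
  sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Hypothetical helper: runs the given command repeatedly until its output
# reports the "exported" state. Returns 1 if the state is "failed" (assumed value).
# POLL_INTERVAL (seconds) is an assumed knob; it defaults to 30.
wait_for_export() {
  while true; do
    state=$("$@" | migration_state)
    echo "state: ${state}"
    case "$state" in
      exported) return 0 ;;
      failed)   return 1 ;;
    esac
    sleep "${POLL_INTERVAL:-30}"
  done
}

# Example usage (not executed here):
#   fetch_status() {
#     curl -s -H "Authorization: token ${GITHUB_TOKEN}" \
#       -H "Accept: application/vnd.github.wyandotte-preview+json" \
#       https://api.github.com/orgs/MyOrg/migrations/999
#   }
#   wait_for_export fetch_status
```

Passing the fetch command as an argument keeps the polling logic separate from the URL, so the same loop works for the trial run and the production run.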

Download the migration archive (trial run)

Once the export is complete, a copy of the archive is uploaded to a file server. However, the archive does not have a publicly accessible URL at this point. This snippet asks the file server to issue a temporary public URL for the file. For security reasons, the URL expires within 60 seconds, so the snippet also downloads the file as soon as the URL becomes available. Again, be sure to replace the organization name and migration number in the example URL.

ARCHIVE_URL=`curl -H "Authorization: token ${GITHUB_TOKEN}" \
  -H "Accept: application/vnd.github.wyandotte-preview+json" \
  https://api.github.com/orgs/MyOrg/migrations/999/archive`; \
  curl "${ARCHIVE_URL}" -o migration_archive.tar.gz

The archive will be saved to your current directory with the filename migration_archive.tar.gz.
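Because the download URL expires quickly, a late or failed request can leave an error message or an empty file under the archive's name. A quick sanity check, sketched here with a hypothetical function name check_archive, is to ask tar to list the contents before moving on.

```shell
# Hypothetical helper: succeeds only if the file is a readable
# gzip-compressed tar archive; prints nothing either way.
check_archive() {
  tar -tzf "$1" > /dev/null 2>&1
}
```

For example, check_archive migration_archive.tar.gz && echo "archive looks intact". This only confirms that the file can be read as a tarball, not that its contents are complete.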

Delete the migration archive from remote server (trial run)

After downloading the archive, you'll want to delete it from the file server. Even though the URL will not be accessible and archives are automatically deleted after 7 days, it's good practice to remove the file from the file server immediately.

curl -H "Authorization: token ${GITHUB_TOKEN}" -X DELETE \
  -H "Accept: application/vnd.github.wyandotte-preview+json" \
  https://api.github.com/orgs/MyOrg/migrations/999/archive

Prepare import from archive (trial run)

The preparation step opens the archive, reads it, and loads resource URL data into the database so the resources can be mapped in the next step. If you did not set a $MIGRATION_GUID environment variable above, be sure to replace it with the guid value you retained earlier.

ghe-migrator prepare migration_archive.tar.gz -g $MIGRATION_GUID

Map records and resolve conflicts (trial run)

Detect import conflicts

When importing your migration archive, it's possible that users, repositories, organizations, or other entities will have conflicting names. ghe-migrator comes with a utility to detect these conflicts and output them to a CSV file.

ghe-migrator conflicts -g $MIGRATION_GUID > conflicts.csv

This will output the conflicts into conflicts.csv, which may look like this:

model_name,source_url,target_url,recommended_action
user,https://github.source/jonmagic,https://github.target/jonmagic,map
user,https://github.source/leo,https://github.target/leo,map
user,https://github.source/adalyn,https://github.target/adalyn,map
user,https://github.source/red,https://github.target/red,map
organization,https://github.source/acme,https://github.target/acme,map
repository,https://github.source/acme/widgets,https://github.target/acme/widgets,rename
team,https://github.source/orgs/acme/teams/owners,https://github.target/orgs/acme/teams/owners,merge
team,https://github.source/orgs/acme/teams/builders,https://github.target/orgs/acme/teams/builders,merge
team,https://github.source/orgs/acme/teams/testers,https://github.target/orgs/acme/teams/testers,merge

Keys:

  • model_name: The type of resource that has a name conflict

  • source_url: The defining URL of the resource as it exists on the source server (GitHub.com)

  • target_url: The defining URL of the resource as it exists on the target server (GitHub Enterprise)

  • recommended_action: How the conflict should be resolved. Defaults to a suggested action.

Recommended actions:

  • map: Any references to the source resource should now reference the target resource

  • rename: Import the source resource with the name provided by the target resource and update references

  • merge: Combine data from the source resource with the target resource and update references
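Before editing the file, it can be useful to see how many conflicts fall under each recommended action. The sketch below tallies the fourth column with awk. The function name count_actions is hypothetical, and it assumes the four-column layout shown above with no commas inside fields.

```shell
# Hypothetical helper: prints "<action> <count>" for each recommended_action
# in a conflicts CSV, skipping the header row. Assumes no commas inside fields.
count_actions() {
  awk -F, 'NR > 1 && NF >= 4 { n[$4]++ } END { for (a in n) print a, n[a] }' "$1" | sort
}
```

Run against the sample file above, count_actions conflicts.csv reports five map rows, three merge rows, and one rename row.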

Resolving conflicts

In conflicts.csv, you can change the target_url and recommended_action for each resource, using the actions defined above. For example, even if the recommended action is to merge two copies of a team, you may wish to rename the team as it exists on your instance of GitHub Enterprise. You can open the CSV file in a text editor or spreadsheet application to make changes. Just be sure to save the file as CSV when you are done editing.

So this line:

team,https://github.source/orgs/acme/teams/owners,https://github.target/orgs/acme/teams/owners,merge

becomes:

team,https://github.source/orgs/acme/teams/owners,https://github.target/orgs/acme/teams/used-to-be-owners,rename
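The same change can be scripted rather than made in an editor, which helps when the trial run is repeated. This is a sketch only: the function name set_resolution and the output file name are hypothetical, and it assumes fields contain no commas or backslashes.

```shell
# Hypothetical helper: rewrites target_url and recommended_action for the row
# whose source_url equals $2, printing the updated CSV to stdout.
# $1 = CSV file, $2 = source_url, $3 = new target_url, $4 = new action.
set_resolution() {
  awk -F, -v OFS=, -v src="$2" -v tgt="$3" -v act="$4" \
    '$2 == src { $3 = tgt; $4 = act } { print }' "$1"
}

# Example usage (not executed here):
#   set_resolution conflicts.csv \
#     https://github.source/orgs/acme/teams/owners \
#     https://github.target/orgs/acme/teams/used-to-be-owners \
#     rename > conflicts.resolved.csv
```

Writing to a new file keeps the generated conflicts.csv untouched, so the original recommendations remain available for comparison.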

Process conflict resolutions and map resources

Once you have decided how to resolve the resource conflicts, use the map command to send your changes back to ghe-migrator. If you kept the recommended actions that were generated, this command applies those actions.

ghe-migrator map -i /home/admin/conflicts.csv -g $MIGRATION_GUID

Import repositories (trial run)

Once your conflict resolutions have been submitted, you can import the archive. You will be prompted to authenticate with an administrator's credentials for the target appliance. You may provide either the password or a personal access token for the administrative user. You can also pass these as option flags: -u USERNAME and -p PASSWORD.

ghe-migrator import /home/admin/migration_archive.tar.gz -g $MIGRATION_GUID -u AdminUser

Note: This command can take a long time to run, depending on archive size. During your trial run, you may prefix the command with time. This will give you an idea of how long the import will take on production and roughly how much downtime to expect.

Audit imported records (trial run)

Once the import is complete, you should validate the integrity of the imported data by performing an audit. To generate an audit file, use the following command:

ghe-migrator audit -g $MIGRATION_GUID

This will output a manifest of migrated records to a CSV file.

You can specify states using the -s flag (the flag accepts a comma-separated list of states). States you can specify for auditing are:

  • import

  • map

  • rename

  • merge

  • imported

  • mapped

  • renamed

  • merged

  • failed_import

  • failed_map

  • failed_rename

  • failed_merge

For example, to output a list of records that failed to import:

ghe-migrator audit -s failed_import,failed_map,failed_rename,failed_merge -g $MIGRATION_GUID

If no state is specified, ghe-migrator audit defaults to the imported state.

You can also specify models to audit using the -m flag (the flag accepts a comma-separated list of models). For example, to audit only pull requests:

ghe-migrator audit -m pull_request -g $MIGRATION_GUID

When performing an audit, we recommend opening two web browser windows side by side. Select a line in the generated audit CSV file, then open the source (GitHub.com) and target (GitHub Enterprise) URLs for comparison and check that the data looks consistent. Repeat this process until you are satisfied with the integrity of the migrated records. Audit various types of records, including users, teams, issues, and pull requests.
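For a large migration, a random sample makes the manual comparison more manageable. The sketch below assumes the audit output has been saved to a file with a header row on the first line; the function name sample_audit_rows and the file name audit.csv are hypothetical. It relies on shuf from GNU coreutils, which may not be present on every system.

```shell
# Hypothetical helper: prints the header row plus N randomly chosen data rows.
# $1 = audit CSV file, $2 = number of rows to sample (default 10).
# Requires shuf (GNU coreutils).
sample_audit_rows() {
  head -n 1 "$1"
  tail -n +2 "$1" | shuf -n "${2:-10}"
}
```

For example, sample_audit_rows audit.csv 20 prints the header and 20 randomly selected records to check in the browser.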

Once you are satisfied with the trial-run migration results, continue with a production migration.

Start the migration for the repositories (production)

When archiving for the purposes of restoring to production, you need to lock the source repositories. This prevents future commits, issues, or any other changes from occurring. This way, users are forced to only push changes to the new, migrated repositories once they become available.

Command:

curl -H "Authorization: token ${GITHUB_TOKEN}" -X POST \
  -H "Accept: application/vnd.github.wyandotte-preview+json" \
  -d'{"lock_repositories":true,"repositories":["MyOrg/MyRepo", "MyOrg/MyOtherRepo"]}' \
  https://api.github.com/orgs/MyOrg/migrations

Be sure to replace instances of MyOrg, MyRepo, and MyOtherRepo with your appropriate information. Also, be sure to retain the guid and url values from the JSON response. You'll need them later.

You can also optionally store the guid value into an environment variable, which is what is assumed for the rest of this guide.

export MIGRATION_GUID=fs26dac8-1w87-1135-8e13-a8tjd25q0fdd

Monitor the status of the migration export (production)

Same step as in the trial run.

while true; do \
  curl -s -H "Authorization: token ${GITHUB_TOKEN}" \
  -H "Accept: application/vnd.github.wyandotte-preview+json" \
  https://api.github.com/orgs/MyOrg/migrations/999 \
  | grep -E 'state|exported'; sleep 30; done

Download the migration archive (production)

Same step as in the trial run.

ARCHIVE_URL=`curl -H "Authorization: token ${GITHUB_TOKEN}" \
  -H "Accept: application/vnd.github.wyandotte-preview+json" \
  https://api.github.com/orgs/MyOrg/migrations/999/archive`; \
  curl "${ARCHIVE_URL}" -o migration_archive.tar.gz

Delete the migration archive from remote server (production)

Same step as in the trial run.

curl -H "Authorization: token ${GITHUB_TOKEN}" -X DELETE \
  -H "Accept: application/vnd.github.wyandotte-preview+json" \
  https://api.github.com/orgs/MyOrg/migrations/999/archive

Prepare import from archive (production)

Same step as in the trial run.

ghe-migrator prepare migration_archive.tar.gz -g $MIGRATION_GUID

Map records and resolve conflicts (production)

Same step as in the trial run.

You may use the same file that was generated during your trial run.

ghe-migrator map -i /home/admin/conflicts.csv -g $MIGRATION_GUID

Import repositories (production)

Same step as in the trial run.

ghe-migrator import /home/admin/migration_archive.tar.gz -g $MIGRATION_GUID -u AdminUser

Audit imported records (production)

Same step as in the trial run.

ghe-migrator audit -g $MIGRATION_GUID

Complete the migration

Once you are satisfied that the migration has completed successfully, you need to unlock the repositories on the GitHub Enterprise instance to allow users to access them.

Unlock the imported repositories from the command line

You can unlock the imported repositories from the command line. While SSHed into your GitHub Enterprise appliance:

Command:

ghe-migrator unlock -g $MIGRATION_GUID -u YOUR-USERNAME -p YOUR-TOKEN

Unlock the imported repositories from the web

Access your GitHub Enterprise instance with a web browser.

Go to Admin Tools (stafftools) for each repository being migrated.

Click on Admin in the left sidebar.

Click Unlock in the Single Repository Lock area.

Cleaning up

Wait a week or two and then delete the (still locked) migrated repositories from GitHub.com.
