Migrating From SVN To Git: The PowerShell Way

For the last few months, I have been working with a client to update older programs to meet their current development standards. Part of this process requires us to migrate code from their SVN repositories to Git, but for a while we had no good way of preserving the revision history as commits. One of my colleagues turned to the official Git documentation for a solution, but unfortunately it did not completely suit our needs. The documentation combines Bash and Perl code with Git commands as part of the migration process, which would not work on our Windows development PCs unless we used Cygwin or WSL (the commands did not work with Git Bash). However, I did have some PowerShell scripting experience from my last job, and the documentation provided an excellent starting point for a script. So how can we use PowerShell to help automate this process?

Prerequisites

Before we can begin writing our script, we must first ensure some conditions are satisfied:

  • Windows 10 is our development environment (8/8.1 might work too, but if you are using 7, then you need to upgrade pronto).

  • PowerShell 5.1 is installed on our system. We aim to use the Active Directory module in our script, which does not work with PowerShell Core 6+ at the time of this writing.

  • The Remote Server Administration Tools (RSAT) are enabled on our system, as this allows us to import the Active Directory module for PowerShell.

  • The usernames of SVN users must match what is in Active Directory.


Quick Disclaimer

Since the script I am describing was written for the client, I cannot share it in its entirety. WHAT I CAN DO is describe the process I went through to write it and provide a few commands that I found the most helpful.

Step 1: Understanding What We Need to Accomplish

Since our goal was to preserve revision history, we aimed to make each SVN revision look as much like a proper Git commit as possible. Typically, a commit will contain the following information:

  • The first and last name of the commit author.

  • The email address of the commit author.

  • A commit hash.

  • The commit message.

However, if we were to use the “git svn clone” command with the minimum arguments (the location of the SVN repo, and where we want the converted repo stored), we would get a commit that looks similar to the one below (note: I am referencing this from the linked documentation).

commit 37efa680e8473b615de980fa935944215428a35a
Author: schacon <schacon@4c93b258-373f-11de-be05-5f7a86268029>
Date:   Sun May 3 00:12:22 2009 +0000

    fixed install - go to trunk

    git-svn-id: https://myproject.googlecode.com/svn/trunk@94 4c93b258-373f-11de-be05-5f7a86268029

The commit author’s first name, last name, and email do not appear in the commit. Instead, we only see their SVN username appear twice in the Author section. And while the commit message is present, it has a git-svn-id appended to it. What we want is a commit that looks more like this (again, referenced from the linked documentation):

commit 03a8785f44c8ea5cdb0e8834b7c8e6c469be2ff2
Author: Scott Chacon <schacon@geemail.com>
Date:   Sun May 3 00:12:22 2009 +0000

    fixed install - go to trunk

Hey, that looks much better! The author’s first name, last name, and email are now present in the Author section, and the commit message no longer has the git-svn-id appended to it. But how do we get our commit to look like that?

The attached Git documentation achieves this by doing the following:

  • Get the Authors’ Usernames: While targeting the desired repo, use a command to build a list of all of its authors’ usernames.

  • Create a File Containing Users’ Data: Import the usernames to a text file, and then add the first name, last name, and email address of each author next to the associated username.

  • Make a Clone On Our Development PC That Converts the Repo from SVN to Git: Attach the file to the “git svn clone” command, which will then render our commit messages in the desired format.

  • Perform Cleanup: Git rid of (hehhehheh) any references to the former SVN repo that we cloned/converted from.

Okay, so all we need to do is replicate what the authors of the Git documentation are doing using PowerShell (while looking for opportunities to improve the process along the way).

Step 2: Get the Authors’ Usernames

Next, we need to communicate with the SVN repo we wish to convert and retrieve a list of all the authors who have done work on it. The documentation we are using as a starting point accomplished this by using the following command:

$ svn log --xml --quiet | grep author | sort -u | \
  perl -pe 's/.*>(.*?)<.*/$1 = /'

Which performs the following actions:

  1. Obtains all the log entries from an SVN repo and outputs them as XML.

  2. Executes ‘grep’ on the output, which pulls all lines containing the word ‘author’.

  3. Removes duplicate entries with ‘sort -u’.

  4. Performs a final pipe on the output to Perl, which uses a regular expression to remove the author tags, leaving only the usernames behind.

The same can be accomplished in PowerShell using the following command:

# Each line ends with a pipe so Windows PowerShell 5.1 reads this as one pipeline.
$revisionAuthors = Invoke-Expression "svn log --xml | Select-String 'author'" -ErrorAction SilentlyContinue |
    ForEach-Object { [regex]::Matches($_, '<author>([^<]+)</author>') } |
    ForEach-Object { $_.Groups[1].Value } |
    Sort-Object | Get-Unique

However, since PowerShell is a scripting language that works with objects (not text), our approach needs to be different. To better understand this, let’s go over the above command step by step:

  1. The first portion of the command is wrapped in the Invoke-Expression cmdlet, which executes the string it is given as a command line. Note that this wrapper is not strictly required: PowerShell can run non-PowerShell commands (i.e. Git/SVN commands) directly from a script, just as it does at the console where we could type ‘svn log --xml’, as long as the executable is on the PATH.

  2. The code that we provide Invoke-Expression will pull all the log entries from an SVN repo and output them as XML, then use the Select-String cmdlet to extract all lines of XML containing the word ‘author’.

  3. Here is where things start to get tricky. We use a regular expression to pull all the data between the author tags, AND PIPE IT OUT AS A LIST OF OBJECTS! ([regex]::Matches returns match objects, not plain strings.) Our next step needs to locate where the author names are stored in each object. This is done by…

  4. Calling foreach-object again to loop through each object (represented by $_) and take the value of the property $_.Groups[1].Value, which is of type string. This produces a list of strings for us.

  5. Since strings are objects as well, we use sort-object to sort them in alphabetical order.

  6. Finally, we use get-unique to remove any duplicate entries from our list of strings. We now have our list of SVN usernames, which is stored in a variable called $revisionAuthors.
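
Because ‘svn log --xml’ already emits well-formed XML, the same list could also be read by casting the output to [xml] instead of using a regular expression. The following is only a sketch of that alternative, not the client script. The function name Get-SvnAuthor and the hard-coded sample log are assumptions, used so that the example runs without an SVN server.

```powershell
# Hypothetical helper: extract unique author names from 'svn log --xml' output.
function Get-SvnAuthor {
    param(
        # Raw XML text, e.g. the output of: svn log --xml --quiet | Out-String
        [Parameter(Mandatory)]
        [string]$LogXml
    )

    $log = [xml]$LogXml
    # A revision without an author has no <author> element, so drop empty values.
    $log.log.logentry |
        ForEach-Object { $_.author } |
        Where-Object { $_ } |
        Sort-Object -Unique
}

# Sample data standing in for a real repository (assumption for illustration).
$sampleXml = @'
<?xml version="1.0" encoding="UTF-8"?>
<log>
  <logentry revision="3"><author>selse</author><date>2009-05-03T00:12:22.000000Z</date></logentry>
  <logentry revision="2"><author>schacon</author><date>2009-05-02T00:00:00.000000Z</date></logentry>
  <logentry revision="1"><author>schacon</author><date>2009-05-01T00:00:00.000000Z</date></logentry>
</log>
'@

$revisionAuthors = Get-SvnAuthor -LogXml $sampleXml
$revisionAuthors
```

Against a real working copy, the sample string would be replaced with the output of (svn log --xml --quiet | Out-String). Sort-Object -Unique sorts and removes duplicates in one step, and it compares case-insensitively by default.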

Step 3: Create a File Containing Users’ Data

Now that we have all our usernames, it is time to match them up with their corresponding first names, last names, and email addresses. The linked Git documentation discusses setting up a file, called Users.txt, where this information is entered manually, as shown below:

schacon = Scott Chacon <schacon@geemail.com>
selse = Someo Nelse <selse@geemail.com>

Fortunately, Active Directory can help us automate this process. First, you will need to import the Active Directory module into your script:

Import-Module ActiveDirectory

Next, loop through the $revisionAuthors list. For each iteration, use each SVN username to query against the appropriate Active Directory property (in my case, SamAccountName), and pull the properties that correspond with their first name, last name, and email (in my case, givenname, surname, and userprincipalname respectively) and save it in an object called $adUserProfile:

$adUserProfile = Get-ADUser -Filter 'SamAccountName -eq $revisionAuthor' |
    Select-Object GivenName, Surname, UserPrincipalName

In the same loop, check if $adUserProfile has values. If so, add the username, first name, last name, and email in the same format as we saw with the Git documentation:

Add-Content -Path "./Users.txt" -Value "$($revisionAuthor) = $($adUserProfile.givenname) $($adUserProfile.surname) <$($adUserProfile.userprincipalname)>"

You will likely discover that some of your SVN users no longer have accounts in Active Directory. This can be problematic, as the clone command we will be using expects our file to follow a specific format in order for our commits to be saved properly. To get around this, simply use the SVN username in place of the first name, last name, and email address.

Add-Content -Path "./Users.txt" -Value "$($revisionAuthor) = $($revisionAuthor) <$($revisionAuthor)>"

Once the looping is finished, Users.txt will be ready to be used in the cloning process.
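
To show how the two cases fit together in one loop, here is a minimal sketch. It is not the client script. Format-AuthorLine and $fakeDirectory are hypothetical names, and the hashtable stands in for Active Directory so that the example runs on a machine without the module. In a real loop, the lookup would be the Get-ADUser call shown earlier.

```powershell
# Hypothetical helper: build one authors-file line for 'git svn clone'.
function Format-AuthorLine {
    param(
        [Parameter(Mandatory)]
        [string]$UserName,
        # Object with GivenName, Surname and UserPrincipalName; $null when no match was found.
        $AdUser
    )

    if ($AdUser) {
        "$UserName = $($AdUser.GivenName) $($AdUser.Surname) <$($AdUser.UserPrincipalName)>"
    }
    else {
        # Fallback for accounts that no longer exist in Active Directory.
        "$UserName = $UserName <$UserName>"
    }
}

# Stand-in for Active Directory (assumption, for illustration only).
$fakeDirectory = @{
    schacon = [pscustomobject]@{
        GivenName         = 'Scott'
        Surname           = 'Chacon'
        UserPrincipalName = 'schacon@geemail.com'
    }
}

# 'olduser' is a made-up username with no directory entry.
$lines = foreach ($revisionAuthor in @('schacon', 'olduser')) {
    Format-AuthorLine -UserName $revisionAuthor -AdUser $fakeDirectory[$revisionAuthor]
}
$lines
```

Collecting the lines first and writing them once (for example with Set-Content -Path ./Users.txt -Value $lines) also avoids appending to a stale Users.txt left over from an earlier run.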

Step 4: Make a Clone On Our Development PC That Converts the Repo from SVN to Git

Now that we have everything we need, it is time to perform the conversion. The documentation shows us how a hypothetical project, called my_project, is converted from SVN to Git using the following command:

$ git svn clone http://my-project.googlecode.com/svn/ \
--authors-file=users.txt --no-metadata --prefix "" -s my_project

Since this is nothing more than a Git command, we can pass it as an argument to the Invoke-Expression cmdlet, and change the Git command arguments to suit our needs. Let’s use the example above as a base to create the PowerShell equivalent of this command (keep the quoted command on a single line, since a line break inside the string would split it into separate commands):

Invoke-Expression "git svn clone http://my-project.googlecode.com/svn/ --authors-file=./Users.txt --no-metadata --prefix '' -s my_project" -ErrorAction SilentlyContinue

After running that command, you will have a fully-functional Git repository. However, our work does not stop there.
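
One thing -ErrorAction SilentlyContinue does not tell us is whether git itself succeeded. External programs report that through the automatic $LASTEXITCODE variable. The sketch below shows one way a script could check it before moving on to cleanup. Assert-NativeSuccess is a hypothetical helper, not part of the original script, and the demonstration passes exit codes explicitly so that it runs without a repository.

```powershell
# Hypothetical helper: stop the script when an external command reported failure.
function Assert-NativeSuccess {
    param(
        [Parameter(Mandatory)]
        [string]$Step,
        # Defaults to the exit code of the most recent external command.
        [int]$ExitCode = $LASTEXITCODE
    )

    if ($ExitCode -ne 0) {
        throw "$Step failed with exit code $ExitCode"
    }
}

# Intended use (not executed here, since it needs a reachable SVN server):
#   run the git svn clone command, then call
#   Assert-NativeSuccess -Step 'git svn clone'

# Demonstration with explicit exit codes so the sketch runs anywhere.
Assert-NativeSuccess -Step 'example success' -ExitCode 0
try {
    Assert-NativeSuccess -Step 'example failure' -ExitCode 128
}
catch {
    $failureMessage = $_.Exception.Message
    $failureMessage
}
```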

Step 5: Perform Cleanup

When we converted the repository to our local development machine, we also brought SVN remote tracking branches with us. Since our goal is to move forward by using Git, we want to be done with SVN once and for all. To remove the remote tracking branch, add the following command to your conversion script:

Invoke-Expression "git branch -D -r -- -sgit-svn"

Once this step is complete, you will be able to move forward using Git!

Conclusion

Well, there you have it: an easy way to convert your SVN repos to Git while having Active Directory and PowerShell do some of the leg work! As stated before, the script I developed is now the property of the client, so I cannot show it to you in its entirety. However, I do hope that you are able to use this walkthrough to develop your own (and even improved) script to help you and your development team with the conversion process. As always, take care and have an excellent day!