Git

I've been playing around with Git, with several end goals in mind:

  • Managing local changes to core files of open source projects on which I'm not a committer (LedgerSMB, primarily)
  • Managing builds of Dojo files
  • Managing multiple installations of projects like Joomla, ZenCart, etc.
  • Creating a branch to update Project Auriga for some changes to the Dojo API

Three out of that list are directly related to Dojo. Given its build system, its changing API, and similar issues related to embedding it in other projects, Git promises to be a much better tool to meet my needs than Subversion (which we currently use extensively).

However, the layout of Dojo's Subversion repository has made it very challenging to import into Git.

Tracking SVN repositories in Git

Subversion recommends setting up your repositories in one of two ways. The first looks like this:
  • branches/
    • projectA/
    • projectB/
    • projectC/
  • tags/
    • projectA/
    • projectB/
    • projectC/
  • trunk/
    • projectA/
    • projectB/
    • projectC/

If you have a Subversion repository that's laid out like this, importing into git couldn't be easier:
git svn clone http://path/to/svn -s
The -s option tells git svn to expect the standard layout, with trunk, branches, and tags directories at the top level. The command creates a new directory named after the last part of the path, creates a new .git repo inside it, and starts pulling the entire Subversion history into the git repository.

... or, alternately, the other recommended layout for Subversion looks like this:

  • projectA/
    • branches/
    • tags/
    • trunk/
  • projectB/
    • branches/
    • tags/
    • trunk/
  • projectC/
    • branches/
    • tags/
    • trunk/
In this case, you would create three separate git repositories, one for each project (the -T trunk -t tags -b branches options spell out what -s abbreviates):
git svn clone http://path/to/svn/projectA -T trunk -t tags -b branches
git svn clone http://path/to/svn/projectB -T trunk -t tags -b branches
git svn clone http://path/to/svn/projectC -T trunk -t tags -b branches

You can then set these up as git submodules underneath a regular git repository.
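
As a rough sketch of that last step, the function below adds one submodule per project to a parent repository. The base URL is hypothetical: it stands for wherever you publish the three git svn clones, and the function name is mine, not part of git.

```shell
# Add one submodule per project to the parent repository in the current directory.
# base_url is a placeholder for the location of the published git svn clones.
add_project_submodules() {
    base_url=$1
    shift
    for project in "$@"; do
        # git submodule add <repository> <path>
        git submodule add "${base_url}/${project}" "$project" || return 1
    done
}

# Usage (hypothetical URL), run inside a parent repository created with git init:
#   add_project_submodules git://git.example.com projectA projectB projectC
#   git commit -m "add project submodules"
```

Each submodule keeps its own history, so the parent repository only records which commit of each project it points at.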

Importing non-standard Subversion layouts

However, the Dojo toolkit repository is laid out like this:
  • branches/
    • dijit/
    • dojo/
    • dojox/
    • util/
  • dijit/
    • trunk/
  • dojo/
    • trunk/
  • dojox/
    • trunk/
  • tags/
    • dijit/
    • dojo/
    • dojox/
    • util/
  • util/
    • trunk/

(While there are other directories scattered around, this is where the code we want to track lives. None of the other modules are relevant, and a number of other tags and branches directories are empty.)

We want to normalize this directory structure, so that our other applications can depend upon the dijit, dojo, dojox, and util directories being available, whichever branch we want to check out. We want a tagged version to look the same as a branch (aside from changes to the code), and also the same layout as the trunk version. So we want to have a persistent "trunk" branch that contains the trunk versions of dijit, dojo, dojox, and util. But how?

There are several steps to get there:

  1. Create the repository
  2. Import the individual modules
  3. Create the trunk branch, with modules as a subtree
  4. Import the branches and tags
  5. Set up module branches to keep up to date with the original repository
  6. Update trunk branch

Create the repository

git svn is the main tool for moving changes between a git repository and an svn repository. Its subcommands let you create a new repository (init or clone), fetch new revisions, rebase your work onto them, and commit changes back to the remote repository (dcommit). While you can use a wildcard in the paths for branches and tags, you cannot use one for the trunk. So we're going to set up one svn remote to collect all the tags and branches, and then separate svn remotes to grab the trunk out of each module. To do this for dojo:

git svn init http://svn.dojotoolkit.org/src -t tags -b branches

... then edit the .git/config to add the other remotes:

[svn-remote "svn"]
url = http://svn.dojotoolkit.org/src
branches = branches/*:refs/remotes/*
tags = tags/*:refs/remotes/tags/*
[svn-remote "dojo"]
url = http://svn.dojotoolkit.org/src
fetch = dojo/trunk:refs/remotes/dojo-trunk
[svn-remote "dijit"]
url = http://svn.dojotoolkit.org/src
fetch = dijit/trunk:refs/remotes/dijit-trunk
[svn-remote "dojox"]
url = http://svn.dojotoolkit.org/src
fetch = dojox/trunk:refs/remotes/dojox-trunk
[svn-remote "util"]
url = http://svn.dojotoolkit.org/src
fetch = util/trunk:refs/remotes/util-trunk

Note that we gave each module a unique svn-remote name, and adjusted the fetch config parameter. The left side of the parameter (e.g. dojo/trunk) is the path in the remote svn repository to pull down. The right side is the ref that the fetched history is stored under, which gives you the name you'll use when loading this branch: dojo-trunk.
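
Since the four trunk stanzas differ only in the module name, they can be generated rather than typed. This is only a sketch: the function name is made up, and it prints the stanzas so you can review them before appending them to .git/config.

```shell
# URL of the Subversion repository, as used in the config above.
SVN_URL="http://svn.dojotoolkit.org/src"

# Print one [svn-remote] stanza per module name given as an argument.
print_trunk_remotes() {
    for module in "$@"; do
        printf '[svn-remote "%s"]\n' "$module"
        printf '\turl = %s\n' "$SVN_URL"
        printf '\tfetch = %s/trunk:refs/remotes/%s-trunk\n' "$module" "$module"
    done
}

# Usage, from the top of the repository, after reviewing the output:
#   print_trunk_remotes dojo dijit dojox util >> .git/config
```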

Import individual modules

Now that we have our layout set up, we need to fetch the data. One thing to note: this layout is very time-consuming to download. If you download an entire Subversion repository at once, git svn essentially parses each commit and drops it in the right place. However, if you're only getting part of a repository (which we have to do to be able to get these 4 modules where we want them), git svn will parse through the entire history of each particular branch, one at a time. So with 80-some branches, each with a history of thousands of commits, it's going to take a VERY long time.

So we'll leave the branches and tags for later--we want the trunk so we can get started.
git svn fetch dojo
git svn fetch dijit
git svn fetch dojox
git svn fetch util
... each of these will search back and pull down the entire history of the trunk branch--but at least we don't need to wait for this to happen to all the tagged versions. Yet.

Create the trunk branch, with modules as a subtree

This was the hardest part to figure out. I based this on the howto in the git-docs on using subtrees. The main difference, aside from the remotes already being set up (as svn-remotes in our .git/config file), is that we are pulling in a branch in our current repository instead of an external repository.

First, create our trunk branch, and make it empty:
git checkout -b trunk
rm -Rf *
git commit -a

... so we should now have an empty branch to work in. Next, we're going to do the following process for each module:

  1. Merge it into our trunk branch, without committing
  2. Pull the files into the subdirectory
  3. Commit the changes, so you can merge the next module
It looks like this:
git merge -s ours --no-commit dojo-trunk
git read-tree --prefix=dojo/ -u dojo-trunk
git commit -m "merge dojo into trunk"
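Notes: the -s option to merge selects a merge strategy. The ours strategy records the merge in the history but leaves the current tree untouched, so none of the module's files are copied into the top-level directory; read-tree then places them under the prefix instead. We use dojo-trunk, which is the name of the remote branch, because we don't (yet) have a local branch. You can see all the branches using git branch -a. Git 2.9 and later refuse to merge histories that share no common ancestor unless you add --allow-unrelated-histories to the merge command.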

Now repeat for the other modules:
git merge -s ours --no-commit dijit-trunk
git read-tree --prefix=dijit/ -u dijit-trunk
git commit -m "merge dijit into trunk"
git merge -s ours --no-commit dojox-trunk
git read-tree --prefix=dojox/ -u dojox-trunk
git commit -m "merge dojox into trunk"
git merge -s ours --no-commit util-trunk
git read-tree --prefix=util/ -u util-trunk
git commit -m "merge util into trunk"

Done! You now have a trunk with each of the 4 modules loaded. You can pull this into your other projects.
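
The per-module commands above are identical apart from the module name, so they can be wrapped in a loop. This is a sketch that assumes you are on the trunk branch and that the remote branches are named <module>-trunk as in the config earlier; the function name is my own.

```shell
# Merge each module's trunk into the current branch under its own subdirectory.
merge_modules_into_trunk() {
    for module in "$@"; do
        # Record the merge without changing the tree (the "ours" strategy).
        # Git 2.9+ may also need --allow-unrelated-histories here.
        git merge -s ours --no-commit "${module}-trunk" || return 1
        # Read the module's files into a subdirectory named after the module.
        git read-tree --prefix="${module}/" -u "${module}-trunk" || return 1
        git commit -m "merge ${module} into trunk" || return 1
    done
}

# Usage, from the trunk branch:
#   merge_modules_into_trunk dojo dijit dojox util
```

The function stops at the first failing command, so a problem with one module does not leave later modules half merged.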

Import the branches and tags

Unfortunately, getting the rest of the repository is going to take a while. A long while--it took me several days to get this down the first time (and due to some poor choices early on, I'm having to do it again...)

This is, however, a simple step:
git svn fetch
... and let it grind away.

Set up module branches to keep up to date with the original repository

I'm not absolutely sure this is the best way, but what I found worked is to create a local branch for each of the modules, and then update that from the original Subversion branch.
git checkout -b dojo dojo-trunk
This creates a local branch named "dojo" that corresponds to the remote dojo-trunk branch. Now update it:
git svn rebase dojo-trunk
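git svn rebase fetches new revisions from the Subversion parent of the current branch and replays any local commits on top of them. That keeps the local branch up-to-date with the remote svn repository, so that you can dcommit back to it if you have the rights...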

Update trunk branch

Once you have created your local branches for the modules, you can just switch to them before updating. If you just did the previous step, your local branches are already up-to-date. Otherwise, update them:
git checkout dojo
git svn rebase dojo-trunk
git checkout dijit
git svn rebase dijit-trunk
...

Finally, it's time to update our trunk branch:
git checkout trunk
git pull -s subtree ./ dojo
git pull -s subtree ./ dijit
git pull -s subtree ./ dojox
git pull -s subtree ./ util
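
Put together, the whole update can be sketched as one function. It assumes the local module branches and the trunk branch already exist under the names used in this article, and it repeats the same commands shown above; the function name is hypothetical.

```shell
# Bring every module branch up to date, then merge each one into trunk.
update_trunk() {
    for module in "$@"; do
        git checkout "$module" || return 1
        git svn rebase "${module}-trunk" || return 1
    done
    git checkout trunk || return 1
    for module in "$@"; do
        # Merge the local module branch into its subdirectory of trunk.
        git pull -s subtree ./ "$module" || return 1
    done
}

# Usage:
#   update_trunk dojo dijit dojox util
```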

Switch to a tag

I'm still working this out... Subversion tags show up as remote branches starting with "tags/". You can check out the remote branch directly, but that leaves you on a detached HEAD with no local branch to work on, so it's better practice to create a local branch first.

So to switch to a tagged release the first time, simply do:
git checkout -b release-1.1.1 tags/release-1.1.1
... and to go back to trunk,
git checkout trunk
After you've created the branch, you can just do:
git checkout release-1.1.1
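
The two cases can be folded into one helper that creates the local branch only when it does not exist yet. This is a sketch: the function name is made up, and it assumes tags are stored under tags/ as in the config earlier.

```shell
# Check out a Subversion tag, creating the local branch on first use.
switch_to_tag() {
    tag=$1
    if git show-ref --verify --quiet "refs/heads/${tag}"; then
        # The local branch already exists: just switch to it.
        git checkout "$tag"
    else
        # First time: create the local branch from the remote tag branch.
        git checkout -b "$tag" "tags/${tag}"
    fi
}

# Usage:
#   switch_to_tag release-1.1.1
```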


Git links

Since I've worked with Subversion a lot, the most helpful links for me so far have been the various tutorials for svn users. Here are some pages I found useful getting up to speed, and figuring out the steps on this page:
  • 5min
  • deserters
  • Course
  • git
  • Subtree How-to (/usr/share/doc/git-doc/howto/using-merge-subtree.txt in the Ubuntu git-doc package)
  • Several others I scoured to pick up general help, but these were the critical ones I read through several times while experimenting to get it right...