Manager: Michelle, is it true that in Subversion anyone can check-out the code?
Me: Yes, absolutely. Anyone with an account can check-out the source.
Developer 1: Yes, see that’s the problem. If I check-out a file, I have no idea if someone else is working on the file as well. Visual Source Safe was much better in that respect.
Me: But that’s the point, unlike VSS, SVN supports concurrent development. It is smart enough to automatically keep track of who is changing what. Even if you edit the same line, SVN will detect this.
Developer 2: But we’re working on a major feature and need to know that no-one else is working on those files.
Me: In that case you should create a branch, and merge the changes back when you’re done. And if you want to automatically keep track of who is working on what we can set up email notifications, allowing you to be notified of any commits automagically.
Manager: No that’s not good enough. We can’t risk developers clobbering each other’s code. You need to change this.
Me: But it’s really not necessary, and defeats the point of a multi-versioned, concurrent development…
Manager: No Michelle, it’s too dangerous. Please change SVN to make it single check-out only, now.*
This is a conversation I have actually had. Surprisingly, it seems to be a common mindset amongst former VSS users.
It happens often. Companies struggle for years to manage their source code in VSS, which is little more than a glorified file share. To keep track of their changes, they establish mind-bogglingly complex and disciplined processes, introducing some rather bad habits along the way. And then when they finally migrate to a proper concurrent versioning system (that even supports branching, <shock, gasp>), those ingrained VSS habits become difficult to break.
Here are the three main legacy VSS habits I’ve come across:
1. Mega commits
Developers check-out an area of source code. Then maybe a week, fortnight, or even several months later, when they are happy with their change, they check all 23 files back into source control in one mega commit.
This comes about because in VSS when code is checked out to a user, it is locked and no-one else can modify it, so there is little motivation to regularly commit incremental changes back to the repository.
Why mega-commits are wrong:
- Essentially all the changes are stored on the developer’s computer. What if the hard disk dies or is stolen, or the developer suffers a concussion while going out shooting that weekend?
- Developers can’t roll back to their last working version. If three days into the work they discover a problem, they can’t roll back to a version they know didn’t have that problem and do a diff. The end result is a lot of wasted time.
- Proper source control tools like SVN can automatically merge changes made by different people on the same file, but only when changes are committed reasonably regularly. Unfortunately a common scenario with ex-VSS users is this:
  - Developer 1 checks out project XXY and begins work for specification #6.
  - Two weeks later Developer 1 is pulled off the project to work on another project.
  - Over the next few weeks three other developers proceed to fix a number of bugs, and also add several new features to project XXY and commit them back to the repository.
  - A month later Developer 1 is finally able to complete the specification and proceeds to do a mega-commit. However, because so many changes have happened since that first check-out, SVN can’t merge them automatically and reports conflicts. The SVN admin is summoned to fix the mess, and then has to listen to woes of how awful SVN is and how, back in the good old days of VSS, these sorts of issues just never happened.
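To see why the size of the gap matters, here is a toy three-way merge in Python. It is only a sketch of the general idea, not how SVN actually implements merging: it assumes every edit replaces a line in place (no insertions or deletions), and the function name `merge3` and the sample lines are made up for illustration.

```python
def merge3(base, mine, theirs):
    """Toy three-way merge over equal-length lists of lines.

    Assumption: edits only replace lines in place, so all three
    versions have the same number of lines.
    Returns (merged_lines, indexes_of_conflicting_lines).
    """
    if not (len(base) == len(mine) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")
    merged, conflicts = [], []
    for i, (b, m, t) in enumerate(zip(base, mine, theirs)):
        if m == t:        # both sides agree, or neither touched the line
            merged.append(m)
        elif m == b:      # only the other developer changed this line
            merged.append(t)
        elif t == b:      # only I changed this line
            merged.append(m)
        else:             # both changed it differently: a human must decide
            merged.append(m)
            conflicts.append(i)
    return merged, conflicts


base = ["open file", "read data", "close file"]
mine = ["open file", "read data in chunks", "close file"]
theirs = ["open file", "read data", "close and flush file"]

# Edits to different lines merge cleanly: no conflicts are reported.
print(merge3(base, mine, theirs))

# Edits to the same line cannot be merged automatically: line 1 conflicts.
print(merge3(base, mine, ["open file", "read data lazily", "close file"]))
```

The longer a working copy sits uncommitted, the more lines on both sides have drifted from the common base, and the more likely it is that two people have touched the same ones.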
2. Single check-outs
Only developers actively working on a file are allowed to check it out, and authorization must be sought before checking anything out from the source code repository.
In VSS, to have some sense of order, the administrator generally decides where code lives, and who is allowed to check out what and when.
So, when migrating to something supporting concurrent development which allows anyone to check-out whatever they want, project managers start feeling anxious. How will they ever know who is working on what? How will they maintain any code quality? To them, the only way to achieve this is to completely lock down the repository. (Of course the real answer lies with commit notifications, browsable history, integrated bug tracking in commit comments, and branching policies. However, this is all functionality not present in VSS, and so is often overlooked.)
Why locked single check-outs are wrong:
- Single check-out is essentially sequential development. It is awfully inefficient.
- Source code administration becomes a full-time job. Proper concurrent versioning systems require more or less no ongoing maintenance once configured, and it’s still possible to see who is working on what, where and why at a glance.
- Any developer should be able to check-out code. Restrictions should be placed around committing (checking in) instead. Placing read restrictions on source code creates a fragile cloud of mysterious mush, and has a detrimental effect on long-term software development.
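One way to put the restriction on commits rather than on reads is a pre-commit hook. What follows is a minimal sketch in Python, not a production script: the account names in `COMMITTERS` and the function names are hypothetical. It relies on Subversion passing the repository path and the transaction name to the hook, and on `svnlook author -t` to find out who is committing.

```python
import subprocess
import sys

# Hypothetical accounts that may commit; everyone else stays read-only.
COMMITTERS = {"michelle", "developer1", "developer2"}


def may_commit(author, committers=COMMITTERS):
    """Return True if the given account is allowed to commit."""
    return author in committers


def transaction_author(repos, txn):
    """Ask svnlook who is behind the pending transaction."""
    result = subprocess.run(
        ["svnlook", "author", "-t", txn, repos],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def main(argv):
    """Pre-commit entry point; argv is [script, REPOS_PATH, TXN_NAME]."""
    repos, txn = argv[1], argv[2]
    author = transaction_author(repos, txn)
    if may_commit(author):
        return 0  # zero exit status lets the commit through
    # Subversion relays the hook's stderr to the client when the hook fails.
    print(f"{author} may read this repository but not commit to it.",
          file=sys.stderr)
    return 1  # non-zero exit status rejects the commit
```

Installed as the repository’s pre-commit hook, the script would end with `sys.exit(main(sys.argv))`. Check-outs are untouched, so anyone can still read the code. Subversion’s built-in path-based authorization can express the same read-for-all, write-for-some policy without a script.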
3. Checking binaries into source control
With each release, all the binaries are checked back into the repository.
Now, with a good source control system, it is possible to take a snapshot of the source code tree at any given point. So, if we build a release for a customer on Monday 17th August, we can label the source with the release at that point. And then in three months’ time, if the customer happens to report a problem with that release, we can roll back to exactly the same source the customer’s build came from, rebuild it, and reproduce and debug the problem.
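In SVN a label is just a cheap server-side copy, conventionally from trunk into a tags directory. Here is a small Python sketch that builds the command. It assumes the usual trunk/tags layout; the repository URL, the release name and the function names are placeholders of mine.

```python
import subprocess


def tag_command(repo_url, release, message=None):
    """Build the svn command that copies trunk to tags/<release>.

    Assumes the conventional layout with trunk/ and tags/ at the
    top of the repository.
    """
    source = f"{repo_url}/trunk"
    target = f"{repo_url}/tags/{release}"
    message = message or f"Tag release {release}"
    return ["svn", "copy", source, target, "-m", message]


def tag_release(repo_url, release):
    """Run the copy on the server; needs the svn client and commit access."""
    subprocess.run(tag_command(repo_url, release), check=True)


# Hypothetical repository URL and release name, for illustration only.
command = tag_command("https://svn.example.com/repo", "customer-release-1")
print(command)
```

Three months later, checking out `https://svn.example.com/repo/tags/customer-release-1` gives back the source tree exactly as it stood when the release was built.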
However, the concept of labeling (or tagging) the repository is not supported in VSS, so there is no easy means of rolling back to a specific build. To cope with this, companies insist on checking every binary for every release back into VSS (and then also into any subsequent version control system they migrate to).
Why it’s wrong:
- Binaries are large. Within no time the project will be gigabytes in size, putting unnecessary strain on bandwidth and disk space. The server becomes bogged down and slow.
- Introducing binaries into the source repository encourages bad development habits. No-one validates the build output and there is no guarantee that any given binary in the repository can in fact even be reproduced.
- Binaries are a product of source code. Keeping binaries in source control is like an architect building and storing a house for every plan s/he releases to clients. If the architect is so insecure about their plans that they need to keep a replica for reference, then I really wouldn’t be trusting them to build my house. Source code should be structured in such a manner that the application can be reproduced from source at any point in time.
*And yes, I did write a trigger script hack that did this. But at the end of the day, I couldn’t bring myself to activate it, really I couldn’t; it just seemed wrong.