SCM Requirements

From Apache OpenOffice Wiki

The problem

CVS has been an invaluable SCM (Software Configuration Management) tool [1] [2] [3] for OpenOffice.org over the past 6 years, but it is showing its age. There have been calls to replace it with a modern SCM solution. The stated reasons vary, but the following topics are mentioned most often:

Branching and tagging are O(n) operations
The OpenOffice.org CWS (Child Workspace) development model relies heavily on branches and tags. In CVS, the cost of branching and tagging scales with the number of files affected, and both are so slow that they actually hinder development for a project the size of OpenOffice.org.
Versioned renaming of files and directories
History-preserving renaming or moving of files is awkward in CVS; renaming directories is plain impossible.
Proper handling of binary files
The CVS way of handling binary files is clumsy and error-prone.
Atomic commits
CVS has no atomic commits, something that practically all modern SCMs have. Interestingly, I have never felt that the lack of atomic operations is a problem for "commit" operations, but atomicity is badly needed for tagging and branching. It is quite common for a tag run to be interrupted (for example during "cwsadd"), and then the repository has to be cleaned up before the operation can be retried.
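The difference between per-file and whole-tree tagging, and why an interrupted tag run leaves a mess, can be illustrated with a toy model. This is a minimal sketch under simplifying assumptions, not a description of how CVS or any particular modern SCM stores data: it assumes that a CVS-style repository records a tag separately in every file's history (one write per file), while a snapshot-based repository records a tag as a single name pointing at one whole-tree revision. The class names and the `fail_after` parameter are hypothetical and exist only to simulate an interrupted tag run.

```python
from dataclasses import dataclass, field


@dataclass
class PerFileRepo:
    """Toy model of CVS-style storage: every file keeps its own tag table."""
    # file path -> {tag name: revision string}
    tags_by_file: dict = field(default_factory=dict)

    def tag(self, tag_name, revisions, fail_after=None):
        """Tag every file and return the number of file writes (O(n) in files).

        fail_after (hypothetical) simulates an interruption after that many
        files; the files tagged so far keep the tag, so the repository is
        left in a partially tagged state that has to be cleaned up.
        """
        writes = 0
        for path, rev in revisions.items():
            if fail_after is not None and writes == fail_after:
                raise InterruptedError(f"tag run interrupted after {writes} files")
            self.tags_by_file.setdefault(path, {})[tag_name] = rev
            writes += 1  # one write per tagged file
        return writes


@dataclass
class SnapshotRepo:
    """Toy model of snapshot storage: a tag names one whole-tree revision."""
    tags: dict = field(default_factory=dict)

    def tag(self, tag_name, tree_revision):
        """Record the tag with a single write, independent of the file count."""
        self.tags[tag_name] = tree_revision
        return 1


if __name__ == "__main__":
    # Hypothetical tree: 100 modules with 500 files each.
    files = {f"module{i}/file{j}.cxx": "1.1" for i in range(100) for j in range(500)}
    print(PerFileRepo().tag("milestone_tag", files))  # 50000 writes
    print(SnapshotRepo().tag("milestone_tag", 4711))  # 1 write
```

In the per-file model both the cost and the failure mode grow with the size of the tree; in the snapshot model the tag either exists or it does not, which is the atomicity the paragraph above asks for.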

Development model

Within the child workspace development model all development is done in private branches. That way a change can be developed and QAed in complete isolation from changes in the trunk. In the SCM literature these kinds of branches are also called "feature branches"; we extend that concept even to the vast majority of bug fixes. The CWS tools provide means to update a branch to a newer version ("milestone") of the trunk, and release engineering is responsible for the integration (merging) of the branches into the trunk. There seems to be universal agreement that this model is a good model for OpenOffice.org; no one wants to go back to the bad old days of everyone committing directly to the trunk or a release branch. We just need a tool that supports this kind of development model better than CVS does.

There is much more to the CWS development model than just the branches and their handling, but this is outside the scope of the SCM tool.

Requirements for the next OpenOffice.org SCM tool (preliminary list)

We want an SCM tool that fits our development model, rather than modifying the development model to fit an SCM tool.

Mandatory requirements

  1. The SCM tool and its repository format must be stable enough to support a code base the size of OpenOffice.org (this is self-evident). The best proof of this is successful use in other large software projects.
  2. Clients for the SCM tool have to be available for the major development platforms (Linux, MacOSX, Solaris, Windows) and should be available for most of the OpenOffice.org platforms.
  3. The new SCM should support, in a reasonable way, the subset of CVS that is used in everyday life, such as "status", "diff", "annotate", "log", etc. I'm certain that every modern SCM does this; CVS sets the lower bar here.
  4. The general operation of the SCM should not be significantly slower than CVS, at least for the important operations: "commit", "diff", "log", etc. If some seldom-used operations such as "annotate" or "history" are slower, then this is probably not much of a problem.
  5. The SCM tool must easily support the concept of branches.
    1. Branch creation must be lightweight. We create branches even for one-line fixes in single files (bugfix CWS).
    2. It must be possible to repeatedly update (resync) a branch to a newer version (milestone) of the trunk or a release branch. We create branches that live for many months with constant work on them (huge feature CWS).
    3. If the update (resync) operation is expensive (in terms of merge time and repository size), then branching and resyncing must be possible on a subset of the tree, for example on only a number of modules.
  6. The SCM must easily support the concept of tags.
    1. Tag creation must be lightweight. We regularly create new milestones, which need to be tagged (milestone tags). If the repeated update mechanism for branches requires tagging to prevent multiple merging, as CVS does, then this is even more important (anchor tags).
  7. There must be an easy way to share changes on a branch even before the branch is ready for integration. From time to time there is a need to cross-merge between branches. Usually not a complete change set is cross-merged, but only selected pieces (pulling selected fixes from another CWS).
    1. Note: with a centralized SCM tool this requirement is inherently fulfilled, but for a distributed SCM tool it requires the setup of a publishing framework.
  8. Pushing changes into a publicly visible repository must be adequately secured by an authentication mechanism. We cannot risk someone sneaking bad code (back doors, encumbered code, embarrassing stuff) into our code base. OpenOffice.org now has a public visibility that makes this kind of attack more likely, and the code base is huge enough that certain code changes might go unnoticed for a long time.
    1. Note: this is also something that comes more or less naturally to a centralized SCM tool but needs to be set up for a DSCM (distributed SCM).
  9. There must be a way to replicate the "one-and-true" repository into remote r/o repositories to reduce network load.
    1. Note: this is one thing that a DSCM can do by design, but it needs to be set up for centralized SCM tools.
  10. It must be possible to preserve the history of the trunk, all release branches and all active CWS branches during import into the new SCM tool. The import should not take an unreasonable amount of time.
  11. Proper handling of binary files.
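Requirements 5.2 and 6.1 hinge on how a repeated resync avoids merging the same trunk change twice. The toy model below is a minimal sketch of the anchor idea, assuming the trunk history can be treated as an ordered list of change IDs and a milestone as "the first N changes of the trunk". It is not how the CWS tools are implemented; the `Branch` class and `naive_resync` function are hypothetical. In CVS the same effect is usually obtained by passing two `-j` options to `cvs update`, so that only the changes between the old anchor tag and the new milestone tag are merged.

```python
from dataclasses import dataclass, field


@dataclass
class Branch:
    """Toy child workspace that remembers the last milestone it was synced to."""
    anchor: int  # number of trunk changes already contained (the "anchor tag")
    applied: list = field(default_factory=list)  # trunk change IDs merged so far

    def resync(self, trunk_history, milestone):
        """Merge only the trunk changes between the anchor and the new milestone."""
        if milestone < self.anchor:
            raise ValueError("cannot resync to an older milestone")
        new_changes = trunk_history[self.anchor:milestone]
        self.applied.extend(new_changes)
        self.anchor = milestone  # move the anchor forward for the next resync
        return new_changes


def naive_resync(branch_point, trunk_history, milestone):
    """Without an anchor, every resync re-merges everything since the branch point."""
    return trunk_history[branch_point:milestone]


if __name__ == "__main__":
    trunk = ["c1", "c2", "c3", "c4", "c5"]
    cws = Branch(anchor=2)           # branched off after the first two changes
    print(cws.resync(trunk, 4))      # ['c3', 'c4']
    print(cws.resync(trunk, 5))      # ['c5']
    print(naive_resync(2, trunk, 5)) # ['c3', 'c4', 'c5'] (c3 and c4 would be merged again)
```

With a lightweight tag the anchor costs almost nothing to move after each resync; with CVS-style O(n) tagging, every resync of a large CWS pays the full tagging cost again.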

Things we'll consider strongly in favor of a new SCM

  1. If the SCM tool is a drop-in replacement for CVS.
  2. If the SCM tool allows easy integration into a development framework, such as commit messages, correlation of commits with issues, etc.
  3. If the SCM tool plays nicely with CEE from collab.net (our site hosting software).
  4. If it's enough to modify our CWS tooling to work with the new SCM tool, without inventing completely new authentication and publishing schemes.
  5. An easy scriptable interface or client libraries with multiple language bindings (Perl preferred).
  6. Easily traverses firewalls.
  7. A web-browsable interface.
  8. Available Bonsai and Tinderbox integration.
  9. If it's easy to replicate the "one-and-true" repository, at least for r/o access.
  10. The ability to rename/move files with full history preservation.
  11. The ability to rename/move directories with full history preservation.
  12. Atomic commits.
  13. The repository will be accessed by non-developers too, so an understandable interface and good documentation are desirable.
  14. The ability to handle multiple heads and to cross-merge changes.
    1. There are quite a lot of 'flavors' of OpenOffice.org: vanilla upstream, StarOffice, ooo-build (Novell version, Debian version, Win32 version, ...), the MacOSX port, ... The new SCM should be able to handle the scenario where a branch is already merged into one of the heads but is cross-merged into vanilla upstream later, after approval. (Easy with a DSCM; not sure about a centralized SCM.)

Nice to have

  1. Be able to fully work offline.
  2. Familiar commands to reduce retraining effort.
  3. Integration into popular IDEs like NetBeans and Eclipse.
  4. Before or during the conversion we should convert tabs in C/C++ files to spaces (as the current OOo coding standards recommend).

Preceding efforts

Kai Backmann and others have already invested some work into a Subversion conversion (see SVNMigration).

External links

[1] http://en.wikipedia.org/wiki/Software_Configuration_Management
[2] http://en.wikipedia.org/wiki/Software_configuration_management/MEE
[3] http://en.wikipedia.org/wiki/Revision_control

[Comparison of various SCMs]
