Thursday, December 21, 2006

Compass as compared with Maven's SNAPSHOT system

In this article, I'll describe some of the differences between Maven 2.x and "Compass", the internal home-grown system we use at work.  I'll first describe our repository layout, then our component descriptor file, and finally I'll summarize the advantages and disadvantages of each system and suggest future work.

The Compass system was designed with Maven 1.x in mind.  The original developers had said, roughly: "You know, Maven's got the right idea, but this really hasn't been implemented the way we'd want it.  We should rewrite it ourselves from scratch."

Repository Layout

Like Maven, Compass has one or more remote repositories containing official built artifacts (or "components", as we call them), as well as a local repo on each developer's machine, which caches artifacts from the remote repo and holds locally built artifacts.  Where Maven and Compass diverge substantially is in how artifacts are stored in the repository.

While Compass doesn't have a notion of "groupId", our remote repository is divided up into sections, like this:

    thirdparty/
        log4j/
        junit/
    firstparty/
        RECENT/
            foo/
            bar/
        INTERNAL/
            foo/
            bar/
        RELEASE/
            foo/
            bar/

[NOTE: This isn't exactly how it looks, but it's close enough.]

Within a given section, you find a flat list of components.  In this example, "foo" and "bar" are buildable components that we've created; log4j and junit are, naturally, components built by other people.  "RECENT" contains only freshly built components.  "INTERNAL" contains components that have been blessed by some human being, and are intended for internal consumption.  "RELEASE" contains released components and products.

In practice, there are 914 components in RECENT and 671 components in INTERNAL.

Within a given component directory, you'll find a number of subdirectories, each of which is a "version" of the component.  Thirdparty versions may use arbitrary strings for their names (e.g. "3.8.1", "1.0beta3", "deerpark").  Firstparty versions, however, are strictly defined: each is simply the Perforce (P4) changelist number of the product at the time it was built.

(A quick note about changelist numbers as opposed to revision numbers.  Most people are familiar with the distinction between CVS revision numbers and SVN revision numbers: CVS revision numbers are "per file" whereas SVN revision numbers are global to the repository.  P4 changelist numbers are like SVN revision numbers.  [Also note that you can calculate something like an SVN revision number in CVS, simply by noting the timestamp of the most recent check-in.])

So, within the "foo" directory in RECENT, you'll see this:

    foo/
        123456/
            foo.jar
        123457/
            foo.jar
        123458/
            foo.jar
        @LATEST -> 123458

That's three numbered directories with a "LATEST" symlink, pointing to the most recent build in that directory.

The first thing to note about this system is that if you build 123458 and then rebuild 123458, it will replace the old "123458" directory.  The second thing to note is that if you change foo at all, it will get a new changelist/revision number, and so it will get a new subdirectory under "foo" once automatically built.

The three sections within the "firstparty" directory (RECENT, INTERNAL, RELEASE) are called "release levels", and we have a process for how components move into each release level.  "foo" and "bar" are automatically built every night and deployed into RECENT; if there are more than three builds in RECENT, we automatically delete the oldest build.
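
To make the nightly rotation concrete, here's a rough sketch in Python.  This is not the actual Compass code: the function name deploy_to_recent, the keep parameter, and the use of a plain POSIX symlink for LATEST are all assumptions made for illustration.

```python
import os
import shutil

def deploy_to_recent(component_dir, changelist, built_files, keep=3):
    """Deploy one build into <component_dir>/<changelist>/ (hypothetical helper).

    built_files maps a file name (e.g. "foo.jar") to its bytes.
    """
    build_dir = os.path.join(component_dir, str(changelist))
    # Rebuilding the same changelist replaces the old directory.
    if os.path.isdir(build_dir):
        shutil.rmtree(build_dir)
    os.makedirs(build_dir)
    for name, data in built_files.items():
        with open(os.path.join(build_dir, name), "wb") as f:
            f.write(data)

    # Only numbered directories count as builds; sort them numerically.
    builds = sorted(
        (d for d in os.listdir(component_dir) if d.isdigit()), key=int
    )
    # Delete the oldest builds beyond the retention limit.
    for old in builds[:-keep]:
        shutil.rmtree(os.path.join(component_dir, old))
    builds = builds[-keep:]

    # Repoint LATEST at the newest remaining build.
    latest = os.path.join(component_dir, "LATEST")
    if os.path.lexists(latest):
        os.remove(latest)
    os.symlink(builds[-1], latest)
    return builds
```

Deploying a fourth changelist into a directory that already holds three would remove the oldest numbered directory and move LATEST forward.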

If somebody thinks that a build of "foo" is good enough to keep around, they "promote" that build into INTERNAL by simply copying the numbered changelist directory into INTERNAL.  Once we think it's good enough to release, we can promote that INTERNAL build into RELEASE by copying it there.  There is no tool, nor any need for a tool, to rebuild for release or make even the slightest changes to the released binaries.
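
Since promotion is nothing more than a copy, it can be sketched in a few lines of Python.  The promote function and its parameters are hypothetical, and the paths simply follow the simplified layout shown earlier (which, as noted, isn't exactly what our real repository looks like).

```python
import os
import shutil

def promote(repo_root, component, changelist, src_level, dst_level):
    """Copy one numbered build directory from one release level to another."""
    src = os.path.join(repo_root, "firstparty", src_level, component, str(changelist))
    dst = os.path.join(repo_root, "firstparty", dst_level, component, str(changelist))
    if not os.path.isdir(src):
        raise FileNotFoundError(src)
    # A plain recursive copy: nothing is rebuilt, renamed, or modified.
    shutil.copytree(src, dst)
    return dst
```

For example, promote(root, "foo", 123458, "RECENT", "INTERNAL") would leave an identical foo.jar under INTERNAL/foo/123458.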

Especially note that we don't put any of this information in the filename of the jar.  It's called "foo.jar" whether it's in RECENT, INTERNAL, or RELEASE.  We do burn the changelist number of foo.jar into its META-INF/MANIFEST.MF at build time, and that information remains constant whether "foo.jar" is copied to INTERNAL or RELEASE.
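
Reading that number back out of a jar is straightforward, since a jar is just a zip file.  Here's a minimal Python sketch that reads the main section of the manifest.  Note that the attribute name "Changelist" used in the usage note is purely hypothetical; I haven't said what our attribute is actually called.

```python
import zipfile

def read_main_attributes(jar_path):
    """Return the main-section attributes of a jar's manifest as a dict."""
    with zipfile.ZipFile(jar_path) as jar:
        # Match the entry name case-insensitively to tolerate odd casing.
        entry = next(
            (n for n in jar.namelist() if n.upper() == "META-INF/MANIFEST.MF"),
            None,
        )
        if entry is None:
            return {}
        text = jar.read(entry).decode("utf-8")

    attrs = {}
    key = None
    for line in text.splitlines():
        if not line:
            break  # a blank line ends the main section
        if line.startswith(" ") and key is not None:
            attrs[key] += line[1:]  # continuation of a long value
        elif ":" in line:
            key, _, value = line.partition(":")
            attrs[key] = value.strip()
    return attrs
```

With a manifest containing a line like "Changelist: 123458", read_main_attributes("foo.jar").get("Changelist") would give you the build's changelist number as a string.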

Component Descriptor File

Compass has a file that looks a lot like the Maven POM XML file... our file is called "component.xml".  component.xml defines a list of <dependency> elements.  Here's an example component.xml file:

<component>
    <name>foo</name>
    <release>6.1.0</release>
    <branch>main</branch>
    <depends>
        <dependency type='internal'>
            <name>bar</name>
            <branch>2.1.x</branch>
            <version>242483</version>
            <release-level>RECENT</release-level>
        </dependency>
        <!-- ... -->
    </depends>
</component>

Note that the component does not declare its own version number.  (Since in Compass-lingo, version numbers are SCM revision numbers, it would be impossible to declare this in the descriptor file; as soon as you checked in, it would be wrong!)  Instead, it allows you to declare a "branch" name, usually something like "main" or "feature" or "2.1.x" as you see above, as well as a "release", which looks more like a Maven version number, but is purely descriptive... it's not used for resolving artifacts at all.

Also note the presence of the <release-level> tag in the <dependency> element, which specifies the release level (RECENT, INTERNAL, RELEASE) of the dependency in question.

We do have a simple tool, which we call "DepWalker" (Dependency Walker), that automatically verifies whether a component is suitable for promotion to INTERNAL or RELEASE.  You can use it to check whether "foo" depends on any components in RECENT, either directly or transitively (through anything "foo" depends on, anything those components depend on, and so on).  Components in RECENT are temporary, and therefore unsuitable for long-term reproducibility.
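
The core of that check is just a graph walk.  Here's a rough Python sketch of the idea; it is not DepWalker itself.  The function name and the in-memory graph are assumptions, and as a simplification it identifies components by name only, ignoring branch and version.

```python
def find_recent_dependencies(component, graph):
    """Return every dependency path from `component` that ends in RECENT.

    `graph` maps a component name to a list of
    (dependency name, release level) pairs.
    """
    offenders = []
    seen = set()
    stack = [(component, [component])]
    while stack:
        name, path = stack.pop()
        if name in seen:
            continue  # already walked; also guards against cycles
        seen.add(name)
        for dep, level in graph.get(name, []):
            dep_path = path + [dep]
            if level == "RECENT":
                offenders.append(dep_path)
            stack.append((dep, dep_path))
    return offenders
```

An empty result would mean the component is safe to promote; a non-empty one shows the chain of dependencies that leads to something in RECENT.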

Of course, if you like, you can also wire up your <dependency> tag to depend on RECENT/bar/LATEST.  In that case, you can continuously integrate with the latest version of bar.  You do that like this:

<dependency type='internal'>
    <name>bar</name>
    <branch>main</branch>
    <version label='LATEST'>242483</version>
    <release-level>RECENT</release-level>
</dependency>

The presence of the attribute "label='LATEST'" informs Compass that we want to automatically upgrade to the current LATEST version of "bar" that's available.

In an optional step we call "pre-build", Compass automatically modifies the number in <version> to match the LATEST version number.  If you don't "pre-build", the existing version number will be used.  The official nightly build system always runs "pre-build", and then automatically checks the updated component.xml into source control.
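
A pre-build step along these lines can be sketched in Python with the standard XML library.  This is only an illustration under stated assumptions: the function pre_build and the latest_lookup callback (which stands in for "ask the repository what LATEST points to") are hypothetical, and the sketch ignores <branch> when looking up the latest build.

```python
import xml.etree.ElementTree as ET

def pre_build(component_xml, latest_lookup):
    """Rewrite <version label='LATEST'> numbers to the current LATEST build.

    latest_lookup(release_level, name) returns the newest changelist.
    Returns (updated XML text, whether anything changed).
    """
    root = ET.fromstring(component_xml)
    changed = False
    for dep in root.iter("dependency"):
        version = dep.find("version")
        # Only dependencies explicitly labelled LATEST are upgraded.
        if version is None or version.get("label") != "LATEST":
            continue
        name = dep.findtext("name")
        level = dep.findtext("release-level")
        newest = str(latest_lookup(level, name))
        if version.text != newest:
            version.text = newest
            changed = True
    return ET.tostring(root, encoding="unicode"), changed
```

The changed flag is what would tell a nightly build whether there is anything to check in.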

With that said, you don't have to use label=LATEST if you don't want to.  If you don't care about reproducibility, you can just say this:

<dependency type='internal'>
    <name>bar</name>
    <branch>2.1.x</branch>
    <version cl='242483'>LATEST</version>
    <release-level>RECENT</release-level>
</dependency>

Since the version number is LATEST, we can't reproduce this build later.  In that case, the automated build system still performs automated check-ins to modify the "cl='242483'" attribute, but that information is only there so humans can know what "LATEST" was at a given time, and so we can automatically bump the version number (by checking in, we increase the revision number). 

Finally, if you really don't care about reproducibility, you say this:

<dependency type='internal'>
    <name>bar</name>
    <branch>2.1.x</branch>
    <version>LATEST</version>
    <release-level>RECENT</release-level>
</dependency>

In that case, your build is totally unreproducible, but it has the advantage that we won't bother with automated check-ins.
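
To summarize the forms of <version> shown above, here's a small Python sketch that tells them apart.  The mode names ("tracked", "annotated", "floating", "pinned") are labels I'm making up for this illustration; they are not Compass terminology.

```python
import xml.etree.ElementTree as ET

def version_mode(dependency_xml):
    """Classify how a <dependency> element pins its version."""
    version = ET.fromstring(dependency_xml).find("version")
    text = (version.text or "").strip()
    if version.get("label") == "LATEST":
        # Literal number, auto-upgraded by pre-build: reproducible.
        return "tracked"
    if text == "LATEST" and version.get("cl") is not None:
        # Resolves to LATEST; the cl attribute is for humans only.
        return "annotated"
    if text == "LATEST":
        # Resolves to LATEST with no record and no automated check-ins.
        return "floating"
    # A plain changelist number with no automatic upgrade.
    return "pinned"
```

Only "tracked" and "pinned" dependencies resolve to a fixed changelist, so only those yield a reproducible build.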

Advantages/Disadvantages Relative to the Current "Snapshot" System

Advantages:

  • "SNAPSHOT" is a marker that indicates that a given component/project is under development.  But that means that you necessarily have to modify the binaries in order to release them; using today's release plug-in, you actually have to check in modified source code before you can release, which is really troubling.

    The Compass system doesn't use a "SNAPSHOT" marker, and so promoting/releasing is simply a copy step.

  • Under today's SNAPSHOT system, there's no notion of a "build number" for a non-SNAPSHOT release.  If you deploy foo-1.0 today, then make some changes and redeploy foo-1.0, today's deploy system will simply replace the old foo-1.0 with the newer version.

    In Compass, everything always has a build number (and it's the same as the SCM revision number).

  • label=LATEST guarantees reproducibility while allowing for "soft" version numbering.  Maven only allows for unreproducible "soft" version numbers.

    Reproducibility is generally a virtue, but specifically it pays off when "foo" depends on "bar" and "bar" introduces a change that breaks "foo".  When "foo" is a reproducible build, you can say "this (automated) check-in 123456 broke the build of foo", see what changed in that check-in, and immediately identify the source of the problem (a bad new "bar").  It also allows you to roll back to an older check-in of "foo" to fix the problem.

    With unreproducible "soft" version numbers, you find yourself automatically upgraded to the new "bar", with no reliable way to detect that this happened.  The same code in "foo" may build successfully on Tuesday but fail on Wednesday, with no apparent explanation as to why.

Disadvantages:

  • If you don't use a SNAPSHOT marker, it's not as easy to tell whether the file in question is an official release or not.  (This may matter a lot more to open source developers than closed source developers.  Open source developers make pre-release builds available for public download, which creates a risk that someone may download a pre-release binary and then ask for support.  Closed source developers typically keep pre-release binaries a secret and only make release binaries available through official channels.)

    Since under the Compass system even the filename typically doesn't include version information, the only way to figure out the "version" (changelist/revision#) of a given jar is to crack it open and look at its META-INF/MANIFEST.MF file.  Even that won't tell you whether the jar has been officially released, but it should give you enough information to check whether it is an official release.

  • label=LATEST does automated check-ins...  in some cases, it does a lot of automated check-ins.  These can clutter your revision logs.

Conclusion

I wouldn't want to suggest from this writing that anyone should cast aside the existing "SNAPSHOT" system in favor of Compass.  However, it is a major goal of mine to make Maven's release mechanism powerful enough that we could follow/enforce the Compass system using Maven.

Here's what I'd like to do, in no particular order:

  • Allow users to optionally name the repo in which a given artifact must be found.  (Internally, we'd probably use this as an equivalent to our <release-level> tag, but I think this would be generally useful just to make it simpler to diagnose problems when a given artifact can't be found.)
  • Modify Maven's deploy mechanism and repository layout to ensure that a build number is always available in the remote repository, even for official releases.
  • Allow POMs to declare a dependency on a particular jar/version/build number, even for official releases.
  • Enhance the build-numbering mechanism to allow Continuum and other continuous build engines to deploy using an SCM revision number as a build number.
  • Create a mechanism that will allow you, if you wish, to automatically upgrade dependencies in POM files, by declaring both a literal version number + build number as well as a "soft" version number which serves only as a guideline for the automatic upgrade tool.

1 comment:

Unknown said...

It's been over a year since you posted this, but I wanted to see if you knew of any progress on the issues you described. Both this blog post and the "BEA Maven Requirements" you wrote clearly describe what must be a very common problem with Maven.

My understanding is that NMaven has made progress in keeping the version number out of the artifact file name, but the holy grail of soft dependency versions combined with repeatable builds is still impossible.

I'd much appreciate if you could post an update on the current state of affairs or perhaps just point me in the general direction of some recent info.