Evan Herbst
[eherbst cs washington edu] (link goes to a web form)

Build System for (C/C++-Based?) Research Code

I'm working on a build system designed for C/C++-based projects (which I say because I don't know how similar the build process is for Fortran, say, or OCaml, and this system might not fit them). It's intended for research code: I'm not worrying very much about portability or about interfacing with version control, but I do worry about files moving around or being renamed often; I don't want to have to edit a dir's makefile each time I create/delete/move/rename a file. In the same vein, I want to provide 1) correctness and 2) as much intelligence as possible at build time; build speed is not my first priority. For example, for speed, makedepend assumes that the dependences of header file X are the same no matter which file in a project dir includes X; I don't. My system emphasizes checking for dependences of various sorts. I don't want to have to specify any more than necessary about what depends on what, because code (at least mine :) ) changes so much so often. (Build speed does matter to me, though.)

The system is all in ruby and relies on rake for ordering tasks after the dependence graph has been generated. I don't use many other rake features. I've written the basic functionality to handle simple C/C++ projects but haven't provided lots of specific builder classes for other compiled languages, and there's only basic support for generated files (eg I haven't provided a builder class specifically to run flex).

To be clear: by "project" I mean the codebase for one research project, some of which may be shared with other research projects. This generally produces multiple static libraries and executables, some of which are unrelated to each other.

Organization

build-framework.rb goes in a per-user-or-group rubylib dir; a project-specific Rakefile.include.rb goes in the project root dir (under which it's assumed all the code is located); a directory-specific Rakefile.rb is written for each source directory. (One for each source dir, but not necessarily one in each source dir; a rakefile can specify that it gives build info for its whole subtree.)

Features

  • Easy maintenance: as a project grows, rakefile changes are minimized. Most editing should happen in the Rakefile.include, to specify new external libraries or build types. I've found the resulting rakefiles to be 30 to 80 percent shorter than the corresponding makefiles for a moderately sized project (~50 dirs; dir hierarchy of 3 - 4 levels).
  • Call from anywhere: each individual directory's rakefile knows where the project's Rakefile.include is and includes it. After that, Rakefile.include pulls in rakefiles from other source dirs as necessary (and only as necessary; I don't want to have to read every rakefile in the project if I'm only building one executable).
  • Separate source/build trees: the build tree is put in a directory specified by the Rakefile.include, which allows for multiple coexistent builds (eg multithreaded/single-threaded, ubuntu/redhat, 32-bit/64-bit, debug/release). Since rake provides command-line args, the build dir can be specified by the user at build time.

    After building an executable in the build tree, the system sticks a symlink to it in the source tree.

  • Multiple source trees: sources from any number of directories can be built as one project. The build tree's location doesn't have to be related to the source tree's.
  • Intelligent filepath handling: all paths coming into the system are canonicalized. This does require visiting the filesystem, but allows us to use filepaths as keys in various maps, which is what we want most of the time since a file is uniquely identified by its absolute filepath. In most places absolute and relative filepaths are both accepted.
  • Generated files: as of 12 / 08, generated headers are supported by default. No generated non-header source files yet.
  • Intelligent prerequisite finding:
    • Source files are parsed for include filenames, and the system allows for included files that don't exist yet because they'll be generated as part of the build process.
    • Third-party (including system) libraries are specified, including mappings from header filenames to the libraries they front for so that dependences can be figured out. This minimizes the amount of prerequisite-listing that needs to go into individual-directory rakefiles. These lists of external libs can be shared among projects.
    • Headers are automatically associated with source files of the same name. (Meaning you need to be slightly careful naming, but you usually shouldn't name unrelated files the same thing anyway.) Project-internal static libs are made unnecessary because objects being linked pull in headers they include, which pull in objects or libs they front for, and so on recursively.
    The result of all this is that in most cases you need to specify the external libs your project uses, a list of executable filenames to be produced, and nothing else.
  • Per-target customization: you can specify one or more arbitrary Ruby functions to set up the build process for a particular target or all targets matching a regex. (Currently each of these functions takes a Compiler object.)
  • Extensibility:
    • Based on rake. At a high level, all I do is create some rake tasks and some lookup tables for them to use. You can always mix my system with rake tasks that don't depend on mine.
    • Based on an existing programming language. You can create new Builder classes for new types of build rules (such as calling flex) and add the corresponding functions to Raker.
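
To illustrate the header-to-library mapping described under "Intelligent prerequisite finding", here is a minimal sketch. LibRegistry, Lib3p and libs_for_header are hypothetical names, not the framework's register3pLib implementation. The sketch assumes that header patterns are regexes matched against absolute header paths and that :requires is followed transitively.

```ruby
# Hypothetical registry; NOT the register3pLib code from build-framework.rb.
Lib3p = Struct.new(:sym, :name, :dir, :headers, :requires)

class LibRegistry
  def initialize
    @libs = {}
  end

  # opts mirrors the hash style used in the rakefiles: :sym, :name, :dir, :headers, :requires
  def register(opts)
    sym = opts.fetch(:sym)
    @libs[sym] = Lib3p.new(sym, opts[:name] || sym.to_s, opts[:dir],
                           Array(opts[:headers]), Array(opts[:requires]))
  end

  # Symbols of every lib whose header regexes match the absolute path,
  # each followed by the libs it requires (assumed to be transitive).
  def libs_for_header(abs_path)
    found = []
    @libs.each_value do |lib|
      add_with_requirements(lib.sym, found) if lib.headers.any? { |re| re =~ abs_path }
    end
    found
  end

  private

  def add_with_requirements(sym, found)
    return if found.include?(sym)
    found << sym
    @libs.fetch(sym).requires.each { |req| add_with_requirements(req, found) }
  end
end

registry = LibRegistry.new
registry.register(:sym => :gl, :name => 'GL', :headers => /GL\/gl\.h$/)
registry.register(:sym => :glu, :name => 'GLU', :headers => /GL\/glu\.h$/, :requires => :gl)
registry.register(:sym => :glut, :headers => /GL\/glut\.h$/, :requires => :glu)

glut_libs = registry.libs_for_header('/usr/include/GL/glut.h')
gl_libs = registry.libs_for_header('/usr/include/GL/gl.h')
no_libs = registry.libs_for_header('/home/me/src/image.h')
p glut_libs
```

A source file that includes GL/glut.h would then be linked against glut, GLU and GL without any of them being listed in that directory's rakefile.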

High-Level Details

The basic idea is there are compile-time dependences and link-time dependences. At compile time, x.cpp "depends on" x.h in that x.h must have been built (generated or found in the filesystem) before x.cpp can be compiled. At link time, x.h "depends on" x.o in that if y.cpp #includes x.h, y.o probably needs to be linked with x.o. Exactly how to define these dependence relationships isn't obvious, at least to me; it took me a few iterations to get it right. In addition, you need to make sure objects being linked don't include the same symbols (eg if a.o is in b.a, don't link both). Because I want to do #include parsing at build time (which encompasses compile and link times), the dependence graphs get updated at build time, so I need to put off compilation and/or linking until I'm sure the relevant parts of the dependence graphs are complete.
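
As a rough illustration of how the two kinds of dependence combine, the sketch below computes the objects to link for one executable from a toy include graph. The filenames, the INCLUDES table and the link_set function are all hypothetical; the real system builds its graphs incrementally at build time and also has to cope with objects that already live inside libraries, which this sketch ignores.

```ruby
require 'set'

# Toy include graph (hypothetical files); the real system gets this by parsing sources.
INCLUDES = {
  'main.cpp' => ['x.h'],
  'x.cpp'    => ['x.h', 'y.h'],
  'x.h'      => [],
  'y.cpp'    => ['y.h'],
  'y.h'      => ['types.h'],
  'types.h'  => []            # header-only: there is no types.cpp
}

# The same-named source a header fronts for, or nil if there is none.
def source_for(header, includes)
  src = header.sub(/\.h\z/, '.cpp')
  includes.key?(src) ? src : nil
end

# Follow compile-time edges (file -> headers it includes) and link-time edges
# (header -> same-named source) until nothing new turns up.
def link_set(main_src, includes)
  sources = Set.new
  seen_headers = Set.new
  worklist = [main_src]
  until worklist.empty?
    file = worklist.pop
    if file.end_with?('.cpp')
      next unless sources.add?(file)        # add? returns nil if already present
    else
      next unless seen_headers.add?(file)
      src = source_for(file, includes)
      worklist << src if src
    end
    worklist.concat(includes.fetch(file, []))
  end
  sources.map { |s| s.sub(/\.cpp\z/, '.o') }.sort
end

objects = link_set('main.cpp', INCLUDES)
p objects
```

Here main.cpp only includes x.h, yet y.o ends up in the link set because x.cpp includes y.h. Using a set keeps each object from being linked twice.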

Alternatively I could do #include parsing at graph setup time. In this case there's no way around having to parse every source file in every directory that gets visited, which is horribly slow and results in your having to decide what to do with compile errors on files that might not be needed during the build the user requested. You could ignore these errors until the build step for the source file; keep them until you know whether the file will be needed, then display them if it will; or avoid compiling the file at all, which means deciding at build time what needs to be built. I've tried all of these and ended up with the third, for efficiency. For the same reason, I cache lists of included headers.
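
Caching include lists could look roughly like the sketch below. It is an assumption-laden stand-in: IncludeCache is a hypothetical class, the cache lives in memory rather than in files in the build tree, invalidation is by mtime, and a simple regex scan of direct #include lines replaces a real preprocessor run.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical cache of direct #include lists, keyed by canonical path and
# invalidated when the file's mtime changes.
class IncludeCache
  INCLUDE_RE = /^\s*#\s*include\s*[<"]([^>"]+)[>"]/

  attr_reader :scans   # number of times a file was actually read

  def initialize
    @entries = {}
    @scans = 0
  end

  def includes_of(path)
    key = File.realpath(path)
    mtime = File.mtime(key)
    entry = @entries[key]
    return entry[:includes] if entry && entry[:mtime] == mtime

    @scans += 1
    includes = File.readlines(key).map { |line| line[INCLUDE_RE, 1] }.compact
    @entries[key] = { :mtime => mtime, :includes => includes }
    includes
  end
end

dir = Dir.mktmpdir
src = File.join(dir, 'x.cpp')
File.write(src, "#include \"x.h\"\n#include <vector>\nint f() { return 0; }\n")

cache = IncludeCache.new
first = cache.includes_of(src)
second = cache.includes_of(src)   # served from the cache, no rescan
scans = cache.scans
FileUtils.remove_entry(dir)
p first
```

Keying on the canonical path means the same file reached through two different relative paths shares one cache entry. Any mtime-based check inherits the usual timestamp problems.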

System setup

  • Filesystem layout: source and build trees can be wherever you want. There must be a "project root" directory so the system knows where to find the Rakefile.include and other per-project files.
  • Required ruby libraries: you need ruby (possibly v1.8 or newer, since that's what I'm using); you need rake; you need the Ruby Graph Library (since I need to keep track of more types of dependences than just what rake does). If you have rubygems installed you can get rake, RGL and streams (which RGL requires) via it. If you want to extend my code, you may find the rake API and RGL API useful.

User guide

  • Basic usage:
    • To make target executable in the current dir: rake executable
    • To clean all targets in the current dir: rake clean
    • To clean all targets in the current dir and everything they depend on: rake clean-all
      (Doesn't always work at present. You may need to rm -rf $PROJROOT/build/$BUILD_NAME/, which removes all built things for the project under the specified build configuration.)
  • Multiple source dirs: the rakefiles in dirs that are shared between projects should load the Rakefile.include from the dir ENV['projroot']; this lets the user specify a projroot dir on the command line if building in a dir that doesn't belong unambiguously to one project.
  • Example project setup: [OUT OF DATE as of 1 / 09, but still gives you the gist]

    $PROJROOT/Rakefile.include.rb:

    #extend Evan's C/C++-based-project build system framework built on rake

    require 'rubylib/build-framework'

    ######################################################################################
    ###extend build-framework by adding project-specific vars (some of which must be global) and project-specific functions here if you want
    ###(for instance, you might want to change file extension info in the filename-conversion functions)
    #project-wide build environment
    #for projects involving compile and link stages, and possibly generated source code
    #a module, but not meant to be mixed in
    module BuildEnv

    ##### project setup #####

    private

    #globals ($...) are meant to be available to all rakefiles
    #ENV is a rake thing: if the user calls 'rake KEY=VAL', ENV will be {'KEY' => 'VAL'}

    #basics
    PROJROOT = Dir.pwd() #project root
    SRC = File.join(PROJROOT, 'src')
    @@build_dirname = File.join('build', ENV['build'] || 'default') #all built files are put in PROJROOT/build_dirname/...
    $CXX = ENV['CXX'] || 'g++'
    $CC = $CXX
    $LD = $CXX
    OPTIM_FLAGS = ENV['OPTIMFLAGS'] || '-g -O2'
    $DEBUGGING = (OPTIM_FLAGS =~ /-g/) #it seems ruby uses $DEBUG; defining that leads to lots and lots of text :) -- EVH 20080422

    #non-c++ interpreters/command-line tools
    $RM = 'rm -fv'
    $AR = 'ar'

    #external libs
    PROJLIBS = "proj_shared_data_dir/libs"
    CUSTOMIZED_LIBS = "#{SRC}/customized_libs" #third-party libs customized for this project
    BOOSTDIR = "my_libs/boost-1.35.0"
    CIMGDIR = "#{CUSTOMIZED_LIBS}/CImg-1.2.7"
    VWDIR = "#{PROJLIBS}/VisionWorkbench-2.0.alpha4"
    OPENCVDIR = "#{PROJLIBS}/opencv-1.0.0"
    $OPENGL_INCDIR = '/usr/include'
    $OPENCV_INCDIR = "#{OPENCVDIR}/include/opencv"

    $CXXFLAGS = "-Wall #{OPTIM_FLAGS} -I#{PROJROOT}/src -I#{BOOSTDIR}/include/boost-1_35 -I#{CIMGDIR} -I#{VWDIR}/src"
    $CFLAGS = "#{$CXXFLAGS}"
    $LDFLAGS = "#{OPTIM_FLAGS}"

    RAKEFILENAME = 'Rakefile.rb' #the only filename that will be looked for in individual source dirs

    #return: Pathname for project root dir
    def self.projrootPathname()
    return Pathname.new(PROJROOT)
    end

    public

    #return: Pathname to root of project source tree
    def self.sourceTreeRoot()
    return projrootPathname()
    end

    #return: Pathname to root of project build tree
    def self.buildTreeRoot()
    return projrootPathname().join(@@build_dirname)
    end

    ##### project-specific libs #####

    private

    #project-external libs
    CIMG_H = /CImg\.h$/
    BOOST_LIBDIR = File.join(BOOSTDIR, 'lib')
    def self.boostlibname(namebase) return "boost_#{namebase}-gcc41-mt-1_35" end

    register3pLib(:sym => :boost_regex, :name => boostlibname('regex'), :dir => BOOST_LIBDIR, :headers => [/boost\/regex\.hpp$/, /boost\/regex\/.*\.hpp$/])
    register3pLib(:sym => :boost_iostreams, :name => boostlibname('iostreams'), :dir => BOOST_LIBDIR, :headers => /boost\/iostreams\/.*\.hpp$/)
    register3pLib(:sym => :boost_filesystem, :name => boostlibname('filesystem'), :dir => BOOST_LIBDIR, :headers => /boost\/filesystem\/.*\.hpp$/)
    register3pLib(:sym => :boost_serialization, :name => boostlibname('serialization'), :dir => BOOST_LIBDIR, :headers => [/boost\/serialization\/.*\.hpp$/, /boost\/archive\/.*\.hpp$/])
    register3pLib(:sym => :vw, :dir => "#{VWDIR}/lib", :headers => /\/vw\/.*\.h$/)
    register3pLib(:sym => :gl, :name => 'GL', :headers => /GL\/gl\.h$/)
    register3pLib(:sym => :glu, :name => 'GLU', :headers => /GL\/glu\.h$/, :requires => :gl)
    register3pLib(:sym => :glut, :headers => /GL\/glut\.h$/, :requires => :glu)
    register3pLib(:sym => :opencv, :name => 'cv', :dir => "#{OPENCVDIR}/lib", :headers => /\/cv\.h$/)
    register3pLib(:sym => :math, :name => 'm')
    register3pLib(:sym => :pthread, :headers => /\/pthread\.h$/)
    register3pLib(:sym => :X11, :headers => CIMG_H)

    end

    ######################################################################################
    ### extend build-framework by adding project-specific Builder subclasses here if you want

    ######################################################################################
    ### extend build-framework by adding factory functions for project-specific builders here if you want
    #one Raker per directory; created in rakefiles
    #any cleaning of pathnames from outside this file is done in this class (ie there might not be any done, but if you want to add it, do that here)
    class Raker
    ##### for rakefile use #####
    public
    end

    When the user calls rake with the option build=debug, for example, the build tree will be rooted at $PROJROOT/build/debug.

    $DEBUGGING seems like a good thing to call your project-specific are-we-in-debugging-mode flag that the individual-dir rakefiles can use, since $DEBUG is apparently used by ruby or rake or something else my framework requires.

    The regexes for header filenames associated with each external library are compared to the absolute filepath of each header we look at, so if all your files #include <gl.h>, for example, you need to specify :headers => /gl\.h$/ rather than :headers => 'gl.h' when registering the OpenGL lib. The string form is provided mostly for cases when A/b.h and C/b.h are both included but require different external libs.

    If external library A requires external library B, create entries for both. As for any external lib, if B can be found in the default lib-search path, you don't need to specify a :dir for it.

    Rather than listing lots of include dirs with the -I option, I could have added them to the array $INCDIRS. Different format, same effect.

    $PROJROOT/Rakefile.rb:

    Dir.chdir('.') do require 'Rakefile.include' end
    raker = Raker.new(:vars => {'INCDIRS' => [$OPENCV_INCDIR]})

    raker.exe(:name => 'lucasKanadeOpticalFlow') #build from lucasKanadeOpticalFlow.{c or cpp}
    raker.exe(:name => 'hornSchunckOpticalFlow', :mainmod => 'hornSchunck') #exe filename different from main module's source filename

    raker.lib(:sym => 'flow', :modules => ['flowPyramid', 'image', 'gradients', '../aux/aux'])

    Two executables and a static library will be built in the build directory corresponding to this source directory.

    The elements of $INCDIRS will be appended to $CXXFLAGS/$CFLAGS in the appropriate manner. It would really be best to initialize $INCDIRS in the Rakefile.include instead of hardcoding include paths into $CXXFLAGS there.

    $PROJROOT/subdir/Rakefile.rb:

    Dir.chdir('..') do require 'Rakefile.include' end
    raker = Raker.new(:subtree => true)

    #these are all equivalent ways of specifying target-specific variable overrides
    raker.set(:prereqs => /image_.*\.cpp/, :vars => {'INCDIRS' => ['image'], 'CXXFLAGS' => ' -Wall'})
    raker.set(:prereqs => /image_.*\.cpp/, :vars => {'INCDIRS' => ['image'], 'CXXFLAGS' => ' -Wall'}, :how => :add)
    raker.set(:targets => /image_.*\.o/, :vars => {'INCDIRS' => ['image'], 'CXXFLAGS' => ' -Wall'})

    raker.set(:type => :cobj, :vars => {'INCDIRS' => ['c_incdir1', 'c_incdir2']}, :how => :replace) #replace previous var values for targets of type C object

    raker.ruby_gen_src(:script => 'autogen_header.rb', :output => 'header.h', :redirect => true) #:redirect: call `script > header` instead of `script`

    This rakefile proclaims that it provides build info for its entire subtree, not just the directory it's in. It also overrides variables for particular targets.

    The two modes for variable overrides are :add and :replace, which do what they sound like. Override rules are visited in the order in which they're declared; any number can be used for a particular target. Rules to which overrides should be applied can be specified by prereq name (string/regex; any prereq can match), target name (string/regex), target module type or prereq module type (any prereq can match).

  • Near the top of build-framework.rb is a list of module types (eg :cobj in the example file you just read) and various information about them that the build system uses; you may want to add new module types here or in the BuildEnv module in your Rakefile.include. At the bottom of the same file is a list of public functions in the Raker class, which are meant for use in individual-directory rakefiles, and specs for their parameters. If you want to add Raker functions to declare new types of build information, or add subclasses of Builder, do so in your project's Rakefile.include or in your copy of build-framework.rb. I'd appreciate your sending me any additions you make; I'll be happy to acknowledge you. Similarly you may use my code for any legal purpose as long as you don't claim to have written it.
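
The :add and :replace override modes from the example rakefile can be pictured with the following sketch. OverrideRule and vars_for are hypothetical names, not the framework's raker.set implementation, and the sketch matches on target names only. It assumes that rules apply in declaration order, that :add appends to the current value and that :replace discards it.

```ruby
# Hypothetical stand-in for the rules created by raker.set; not the framework's code.
OverrideRule = Struct.new(:pattern, :vars, :how)

# Apply, in declaration order, every rule whose pattern matches the target name.
# A Regexp pattern matches anywhere in the name; a String pattern must equal it.
def vars_for(target, defaults, rules)
  vars = defaults.dup
  rules.each do |rule|
    next unless rule.pattern === target
    rule.vars.each do |name, value|
      if rule.how == :replace
        vars[name] = value
      else
        # :add appends; + builds a new object, so defaults are never mutated
        vars[name] = (vars[name] || value.class.new) + value
      end
    end
  end
  vars
end

defaults = { 'INCDIRS' => ['src'], 'CXXFLAGS' => '-O2' }
rules = [
  OverrideRule.new(/image_.*\.o/, { 'INCDIRS' => ['image'], 'CXXFLAGS' => ' -Wall' }, :add),
  OverrideRule.new('image_io.o', { 'INCDIRS' => ['c_incdir1'] }, :replace)
]

filter_vars = vars_for('image_filter.o', defaults, rules)
io_vars = vars_for('image_io.o', defaults, rules)
main_vars = vars_for('main.o', defaults, rules)
p filter_vars, io_vars, main_vars
```

Because rules are visited in order, a later :replace wins over an earlier :add for the same variable, as with INCDIRS for image_io.o here.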

To do

  • I don't deal with generated source files; only generated headers. Furthermore, as of 8 / 2 / 09 dealing with generated files is broken because I'm not careful enough when parsing for #include dependences. (Which, by the way, is really really slow already. Not sure how to deal with that since all I'm doing is calling gcc -M.)
  • You can use headers in a project-external dir, but you can't use source files.
  • Rake does timestamp checking to figure out what's been updated. Due to timestamps on different machines not in general matching, this should be replaced by file hashing a la cons, a perl-based build system.

    I can confirm this really is a problem in UWCS: sometimes the include-list files are found to be up to date when they aren't.
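
For the hashing item above, a content-based up-to-date check might look like the sketch below. DigestStore is a hypothetical class and SHA-256 is an arbitrary choice; nothing like this exists in the framework yet, and a real version would persist the digests in the build tree.

```ruby
require 'digest/sha2'
require 'tmpdir'
require 'fileutils'

# Hypothetical change detector: a file counts as changed when its content digest
# differs from the one recorded at the last build, whatever its mtime says.
class DigestStore
  def initialize
    @digests = {}
  end

  def changed?(path)
    Digest::SHA256.file(path).hexdigest != @digests[File.realpath(path)]
  end

  def record(path)
    @digests[File.realpath(path)] = Digest::SHA256.file(path).hexdigest
  end
end

dir = Dir.mktmpdir
header = File.join(dir, 'x.h')
File.write(header, "int f();\n")

store = DigestStore.new
before_record = store.changed?(header)   # never seen, so it counts as changed
store.record(header)
after_record = store.changed?(header)

future = Time.now + 3600
File.utime(future, future, header)       # timestamp moves, content does not
after_touch = store.changed?(header)

File.write(header, "int f(int);\n")      # content really changes
after_edit = store.changed?(header)
FileUtils.remove_entry(dir)
p [before_record, after_record, after_touch, after_edit]
```

The cost is reading every checked file in full, which trades some build speed for not depending on clocks agreeing across machines.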

Code

I've put a "stable" version at github. (You can get a free github account with nothing but username, e-mail addr and password. Also read up on git.) Realize this is code in progress; plenty of things are broken. I've included a small example project. Because the code doesn't actually do anything (since it's meant to test dependence checking), you might need to build without optimization, ie don't use opt= on the rake command line.

When I say "broken", I mean that while this version does make maintenance easier than using make, you have to delete the build directory more often than you might like. (rm -rf $PROJROOT/build/$BUILD_NAME is usually what I do when it's obvious the build is screwing up and I'm not quite sure why. If you set up your Rakefile.include the way I do in the example project, $BUILD_NAME here is the argument you gave to opt=, or 'default' if none.)


updated 9 / 09