The LHCb Nightly Build System is based on a few subsystems:
The LHCb Nightly builds are organized in slots, projects and platforms.
A slot is a named set of projects meant to test the build of the software under some well-defined conditions. For example, the slot lhcb-head is used to build the latest version in the repository of all the LHCb software projects on top of a released version of Gaudi (and the externals), while lhcb-cmake is used to test the build with CMake of the projects already converted to it.
A project in a slot is a well-defined version of an LHCb software project, which can be a tagged version (as it can be released) or the latest version in the repository (using the special version tag HEAD). A project can also be tuned by changing the version used for one (or more) of its packages with respect to the one implied by the specified version of the project (for example, using a released version of a package in the HEAD version of the project, or vice versa).
Each project in each slot is built and tested on one or more platforms, i.e. combinations of CPU architecture, Operating System (OS), compiler and optimization level. A platform is identified by a string in which the four parts of its definition are separated by a -, for example x86_64-slc6-gcc46-opt means Intel/AMD (x86) 64-bit architecture, Scientific Linux CERN 6 (SLC6), gcc 4.6.x and optimized build.
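The platform naming convention above can be illustrated with a small sketch. The Platform class and the parse_platform function are hypothetical names used only for this example (they are not part of LbNightlyTools), and the sketch assumes that none of the four parts contains a - itself.

```python
from typing import NamedTuple


class Platform(NamedTuple):
    """The four parts of a platform string, in the order given in the text."""
    architecture: str
    os: str
    compiler: str
    optimization: str


def parse_platform(platform: str) -> Platform:
    """Split a platform string such as 'x86_64-slc6-gcc46-opt' into its parts."""
    parts = platform.split("-")
    if len(parts) != 4:
        raise ValueError(
            f"expected 4 '-'-separated parts, got {len(parts)}: {platform!r}"
        )
    return Platform(*parts)


p = parse_platform("x86_64-slc6-gcc46-opt")
print(p.architecture, p.os, p.compiler, p.optimization)
```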
TO-DO
There is no particular scheduling (or throttling) for the nightly build slots.
Every night, around midnight, the nightly bootstrap Jenkins job (nightly-slot) is started. It is a parameterized job that can be used to start any slot, but in the default configuration it triggers all the slots flagged as enabled (or, better, not disabled) in the configuration.
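The "not disabled" rule can be sketched as follows: a slot is started unless it is explicitly flagged as disabled. The data layout used here (a list of dictionaries with a name and an optional disabled key) is an assumption made for illustration, not the actual configuration schema, and lhcb-experimental is a made-up slot name.

```python
def enabled_slots(slots):
    """Return the names of the slots that are not explicitly disabled."""
    # Hypothetical layout: each slot is a dict with a 'name' and an optional
    # boolean 'disabled' key. A missing flag means the slot is enabled.
    return [slot["name"] for slot in slots if not slot.get("disabled", False)]


config = [
    {"name": "lhcb-head"},                            # no flag: enabled
    {"name": "lhcb-cmake", "disabled": False},        # explicitly enabled
    {"name": "lhcb-experimental", "disabled": True},  # made-up slot, skipped
]
print(enabled_slots(config))
```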
The nightly-slot job triggers one nightly-slot-checkout job per enabled slot, which will then trigger nightly-slot-build-platform either directly or via nightly-slot-precondition (as shown in the figure below).
Trigger diagram of the jobs controlling the nightly builds in Jenkins.
We try to recover from temporary failures of the jobs infrastructure by automatically restarting the failed jobs (up to 3 times). An overview of the status of the nightly build jobs can be found in the Jenkins Jobs Status page.
Note that only failures of the Nightly Build System are shown as failures in Jenkins; failures of the builds or tests are considered successful runs of the Jenkins jobs.
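The automatic restart policy can be pictured with the sketch below. It is not the actual mechanism, which lives in the Jenkins job configuration. The function name run_with_restarts is hypothetical, and the sketch assumes that "up to 3 times" means three restarts after the first run. Only infrastructure failures are modelled as exceptions, since failed builds or tests still count as successful job runs.

```python
def run_with_restarts(job, max_restarts=3):
    """Run job(); if it raises, restart it, at most max_restarts times."""
    restarts = 0
    while True:
        try:
            return job()
        except Exception:
            if restarts >= max_restarts:
                raise  # out of restarts: report the infrastructure failure
            restarts += 1


attempts = []


def flaky_job():
    """Made-up job that fails twice before succeeding."""
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("temporary infrastructure failure")
    return "done"


print(run_with_restarts(flaky_job), "after", len(attempts), "runs")
```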
TO-DO
TO-DO
- XML (can use the web-based configuration editor, but not the new features of the build system)
- JSON (requires manual editing from a checkout of LHCbNightlyConf)
- Python (most powerful, requires manual editing)
- create the slot directories and volumes in $LHCBNIGHTLIES (they can be symlinks to other directories, if needed)
- add 'afs' to the slot deployment field in the configuration
Sometimes it is necessary to stop a slot before it completes (for example to restart the builds).
If there are pathological problems with the build of a slot on one platform, or before triggering its rebuild, we can stop it following these steps:
The build will terminate shortly, after some Jenkins internal bookkeeping operations.
If the slot is still in the checkout step, stopping the checkout job will be enough:
If the checkout was completed, you need to stop all the building platforms and the wrapper build job:
Re-building can be triggered at different levels:
This is the easiest option and should be preferred to the others if we can afford the time it takes for a checkout (for slots with several projects it may take more than one hour).
This is also the only option in case we need a fresh checkout.
The field os_label allows you to override the system a build is run on, for example to build slc5 binaries on an slc6 machine or to force the build on a specific host. In most cases it must be left empty.
Useful if the checkout of a slot was correct, but all the builds failed for some reason.
If, for example, there has been a problem with a machine you can rebuild only one platform:
Note that you can access the specific build page from the Jenkins Jobs Status page if you cannot find it through the Nightly Builds Dashboard.
In principle there is no need to remove builds from the database, because each new complete build of a slot is reported in its own table and new partial builds overwrite the old entries. Sometimes, however, a broken (or aborted) build is just noise in the web page.
- connect to lhcb-archive.cern.ch as lhcbsoft
- remove the symlink /data/archive/artifacts/nightly/<slot>/<day>, where <day> is the current date as yyyy-mm-dd
- cd ~/LbNightlyTools
- source setup.csh
- from LbNightlyTools.Utils import Dashboard
- d = Dashboard()
- d.dropBuild(<slot>, <build_id>)
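To double-check which symlink has to be removed in the steps above, the path can be computed with a short sketch like the following. The helper artifact_symlink is a hypothetical name introduced here; only the directory layout and the yyyy-mm-dd date format come from the instructions above.

```python
import datetime
import posixpath

# Directory quoted in the steps above
ARTIFACTS_ROOT = "/data/archive/artifacts/nightly"


def artifact_symlink(slot, day=None):
    """Return the path of the per-day symlink of a slot (<day> as yyyy-mm-dd)."""
    if day is None:
        day = datetime.date.today()  # the instructions refer to the current date
    # date.isoformat() yields the yyyy-mm-dd form used in the archive
    return posixpath.join(ARTIFACTS_ROOT, slot, day.isoformat())


print(artifact_symlink("lhcb-head"))
```

The function only builds the path; removing the link and calling dropBuild remain manual steps.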
To update the dashboard CouchApp without downtime of the web page, we need to use a fallback replica.
- connect to http://lbcouchdb.cern.ch:5984/_utils/replicator.html (only a few machines can do it)
- select the local database nightlies-nightly as source and nightlies-nightly-bk as destination
- click on the Replicate button and wait
- either from the web
- go to http://lbcouchdb.cern.ch:5984/_utils/database.html?nightlies-nightly-bk
- select a view (under _dashboard_) in the dropdown list (all views of the dashboard will be cached, which will take some time, but you can check the progress at http://lbcouchdb.cern.ch:5984/_utils/status.html)
or with a script (from LbNightlyTools):
./cron/preheat_nightly_dashboard.sh -v -d http://lbcouchdb.cern.ch:5984/nightlies-nightly-bk/_design/dashboard
- edit /etc/httpd/conf.d/25-lbcouchdb443.conf replacing nightlies-nightly with nightlies-nightly-bk
- (as root) call service httpd reload
- either from the web
- go to http://lbcouchdb.cern.ch:5984/_utils/database.html?nightlies-nightly
- select a view (under _dashboard_) in the dropdown list (all views of the dashboard will be cached, which will take some time, but you can check the progress at http://lbcouchdb.cern.ch:5984/_utils/status.html)
or with a script (from LbNightlyTools):
./cron/preheat_nightly_dashboard.sh -v -d http://lbcouchdb.cern.ch:5984/nightlies-nightly/_design/dashboard
Replicate new documents from the backup instance to the main one
- same as step 1, but swapping source and target
- check for conflicts
Restore the original web page configuration (see step 4)
Replicate once more from the backup instance to the main one (see step 7)
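The replication steps above use the CouchDB web interface. As a possible alternative, CouchDB also exposes replication through its HTTP API (a POST to /_replicate with a JSON body naming source and target). The sketch below only builds the two requests and does not send them. Whether this endpoint is reachable and permitted on lbcouchdb.cern.ch is an assumption, and replication_request is a hypothetical helper name.

```python
import json
import urllib.request

COUCHDB = "http://lbcouchdb.cern.ch:5984"  # server named in the steps above


def replication_request(source, target, server=COUCHDB):
    """Build (but do not send) a POST to CouchDB's /_replicate endpoint."""
    body = json.dumps({"source": source, "target": target}).encode("utf-8")
    return urllib.request.Request(
        server + "/_replicate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Main database to the backup, then the opposite direction to copy back
# the documents that arrived while the backup was serving the web page.
to_backup = replication_request("nightlies-nightly", "nightlies-nightly-bk")
to_main = replication_request("nightlies-nightly-bk", "nightlies-nightly")
print(to_backup.get_method(), to_backup.full_url, to_backup.data.decode())
```

Sending a request would be a matter of passing it to urllib.request.urlopen from one of the machines allowed to connect to the server.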