SQR-006: The LSST the Docs Platform for Continuous Documentation Delivery

  • Jonathan Sick

Latest Revision: 2016-07-28

1   Introduction

Documentation is an integral deliverable of LSST Data Management’s software development work, and is produced at all stages of our process. Before code is written, we document architectural designs, specifications, and experiments. While code is being written, documentation describes implementations and interfaces. Documentation makes the software understandable through user guides, tutorials, and references.

Documentation underpins the usefulness of LSST software and data as a resource for the astronomical community. All LSST Data Management software is open source and engineered with the expectation that it will find use beyond LSST’s alert and data release pipelines. For instance, astronomers exploring LSST data through the Level 3 compute environment will become direct users of LSST software. And though it is not an explicit mission statement, we expect that future imaging surveys will adopt and build upon LSST software since the LSST pipelines are already engineered to process data from several observatories for testing purposes. Documentation is prerequisite for any use and adoption of LSST software beyond LSST’s internal teams.

1.1   Importance of Integrating Documentation and Code in Git

Originally we authored and published documentation in tools removed from the code itself, such as Confluence wikis or PDFs archived to Docushare. While these tools made the initial acts of content creation and publication straightforward, they provided no obvious workflow for updating and versioning documentation as part of regular development work. As a result, documentation would often fall out of step with code. One manifestation of this problem has been the silent breakage of tutorials and user guides hosted on wikis while the code was being developed elsewhere.

On the other hand, we already have an excellent workflow for developing well-tested and peer-reviewed code (see the DM Development Workflow). By treating documentation as code we realize the same workflow advantages for documentation development. First, co-locating documentation and code in the same Git repository removes any versioning ambiguity: documentation can track changes in code at per-commit granularity. Such documentation can also be continuously integrated against the associated code. Continuous integration ensures that examples work, that application programming interfaces are properly documented, and that the documentation web site itself can be built without errors. Finally, such documentation can be reviewed as part of pull requests. Taken together, treating documentation as code can improve the documentation culture of a software development organization. There is much less cognitive overhead for a developer to update documentation that lives in a Git repository (even within a source code file) than there is to deal with a separate environment like a wiki.

Simply storing documentation in Git is only the first step. Infrastructure is needed to build and deploy that documentation. LSST Data Management has already used Doxygen to extract and render documentation stored in the Git repositories of software packages. However, the documentation deployment implementation had several short-comings. Documentation was only published once merged to master, hindering well-informed conversations about the documentation during code reviews. This documentation build and deployment system was also highly specific to both the LSST Science Pipelines architecture and the build server. This made it difficult to independently debug documentation builds. Each LSST software project largely had to invent its own documentation deployment system, resulting in several ad-hoc approaches across the science pipelines, database, user interface and simulations teams. Finally, limitations in Doxygen encouraged software documentation to also be hosted on Confluence wikis, leading to an entire class of issues already described.

1.2   Read the Docs and Continuous Documentation Delivery

Read the Docs redefined expectations for how software documentation should be deployed. Along with the Sphinx documentation build tool, Read the Docs has provided a common documentation infrastructure for open source software projects. As of 2015, Read the Docs hosts documentation for 28,000 projects and served documentation to over 38 million unique visitors.

Through a GitHub Service Hook, Read the Docs is notified when a Sphinx-based project has new commits. Read the Docs then clones the Git repository, builds the Sphinx project (i.e., make html) and deploys the HTML product.

Read the Docs can be configured to listen to different Git branches. By default, Read the Docs builds documentation from the master Git branch and publishes it to a /en/latest/ URL path prefix. It can publish additional branches, either for private code review or to maintain documentation for stable releases. Such branches are published from /en/branch-slug/ path prefixes. Through a UI element, Read the Docs allows readers to discover and switch between different versions of a software project.

Overall, the key innovation of Read the Docs is the generic automation of versioned documentation deployment that has enabled thousands of open source developers to maintain documentation with minimal overhead.

1.3   Beyond Read the Docs

LSST Data Management deployed as many as 39 documentation projects on Read the Docs, including the DM Developer Guide, several stand-alone Python projects, and technical notes (see SQR-000: The LSST DM Technical Note Publishing Platform). This experience allowed us to better understand our documentation deployment needs, and culminated in the design and implementation of a new documentation deployment platform, LSST the Docs.

Our experience underscored two categories of documentation deployment needs: first that we need to own the documentation build environments, and secondly that we require deeper integration and automation with respect to our development workflow.

Owning the build environment is the most important requirement, which in fact blocked us from publishing the LSST Science Pipelines documentation on Read the Docs. Software documentation projects require that the software itself be built and available to the documentation build tool. Sphinx, and Numpydoc in particular, inspect the docstrings of installed code to automatically build accurate API reference documentation. Continuous integration also requires that the software be installed to test examples and tutorials. Read the Docs assumes that Python projects can be built with Python’s standard Setuptools/Distutils (i.e., a setup.py file). The LSST Science Pipelines build process does not follow this architecture, and instead uses EUPS to coordinate the versioning of tens of GitHub repositories and Scons to compile the software. Simply put, the LSST Science Pipelines build process is incompatible with Read the Docs.

Another challenge was the scalability of administration for Read the Docs-hosted projects. This challenge was particularly acute with LSST Technotes. Each technote was implemented as its own Read the Docs project. To encourage the adoption of Technotes, a single DM team member was responsible for configuring a new Read the Docs project (including DNS configuration). While this ensured consistent configuration, it created a bottleneck in Technote publication. In some cases, DM team members forgot or didn’t read enough of the documentation to realize they needed to ask the administrator to create their Read the Docs project. Ideally, documentation project provisioning should be fully automated, perhaps even as a ChatOps command.

In day-to-day development work, this administration model was also a bottleneck. For each development branch, the developer would have to ask the administrator to create a new branch build so that the documentation could be previewed in development and code review. Often, developers would never see their rendered documentation until it was merged, sometimes resulting in obvious formatting errors on the published master branch documentation. Alternatively, developers would rely upon GitHub’s rendering of a reStructuredText file. Developers who did this were often confused about rendering errors, not realizing that GitHub does not render the extended reStructuredText syntax available to Sphinx. Instead, we want new versions of documentation to be published immediately and automatically for each branch.

In response to these challenges, the Science Quality and Reliability Engineering (SQuaRE) Team in LSST Data Management undertook the design and engineering of LSST the Docs, a platform for the continuous delivery of documentation.

This technote describes the architecture of LSST the Docs. Those wishing to operate an instance of LSST the Docs, or produce content for a project published by LSST the Docs, should also refer to the documentation listed in 10   Additional Resources.

2   LSST the Docs: A Microservice Architecture

In setting out to design LSST the Docs, we realized that the short-comings of both the original LSST Science Pipelines documentation deployment scripts and Read the Docs stemmed from their integrated architectures. The Pipelines documentation deployment script was deeply integrated with the architecture of the LSST Science Pipelines and its build environment, making it impossible to adapt to other projects. Read the Docs, on the other hand, provided a common build and publication pipeline—yet that build environment was not flexible enough for all projects.

Instead, LSST the Docs is designed as a set of microservices. Each service has a clear, well-defined responsibility along with well-defined interfaces between them. This gives LSST the Docs the flexibility to host a range of documentation projects, from simple Sphinx projects to multi-repository EUPS-based software stacks. In fact, LSST the Docs is agnostic of the documentation build tool; a LaTeX document could be published as easily as a full Sphinx project. This microservice architecture also improves development efficiency since each component can be updated independently of the others. LSST the Docs also takes advantage of standard third-party infrastructure wherever possible.

The main components of LSST the Docs are LTD Mason, LTD Keeper, Amazon Web Services S3 and Route 53, and Fastly. Figure 1 shows how these services operate together.


Figure 1 Architecture of LSST the Docs (LTD) when specifically used to deploy documentation for the EUPS-based LSST Science Pipelines. Other projects might use other build platforms such as Travis CI, and even their own HTML compilation tools. LTD Mason provides the common interface for delivering built HTML to LSST the Docs.

In brief, a documentation deployment begins with LTD Mason, which is a Python tool intended to be run on a project’s existing continuous integration server, such as Jenkins or the publicly-hosted Travis CI, that is triggered by pushes to GitHub. Not only does this strategy absolve LSST the Docs from maintaining build environments, it also follows our philosophy of treating documentation as code. Mason’s primary responsibility is to upload documentation builds onto Amazon S3. Mason is also optionally capable of driving complex multi-repository documentation builds (this functionality gives Mason its name).

LTD Keeper is a RESTful web application that maintains a database of documentation projects. An arbitrary number of projects can be hosted by an LSST the Docs deployment. Keeper coordinates all services, such as directing Mason uploads to S3, registering project domains on Route 53, and purging content from the Fastly cache. Other projects, such as the LSST DocHub, can use LTD Keeper’s public API to discover documentation projects and their published versions.

Finally, Fastly is a third-party content distribution network that serves documentation to readers. In addition to making content delivery fast and reliable, Fastly allows LSST the Docs to serve highly intuitive URLs for versioned documentation.

The remainder of this technote will discuss each aspect of LSST the Docs in further detail.

3   LTD Mason for Multi-Repository EUPS Documentation Projects

The role of LTD Mason in LSST the Docs is to register and upload new documentation builds from the continuous integration server to AWS S3. Since LSST the Docs was initially created specifically to build documentation for multi-repository EUPS products such as lsst_apps and qserv, we added optional affordances to LTD Mason to build such projects. LTD Mason can also be used for non-EUPS projects; see 5   LTD Mason on Travis.

For EUPS products, documentation exists in two strata. The base tier consists of the repositories of individual EUPS packages. The second tier is the product’s doc repo.

Documentation embedded in packages provides API references and guides specific to the code in that package. Co-locating documentation in the code’s Git repository ensures that documentation is versioned in step with the code itself. The documentation for a package should also be independently buildable in a developer’s local environment. Although broken cross-package links may be inevitable with local builds, such local builds are critical for the productivity of documentation writers.

The product’s doc repo is a Sphinx project that produces the coherent documentation structure for the EUPS product itself. The product doc repo establishes the overall table of contents that links into package documentation, and also contains its own content that applies at a product-wide level. In fact, the product doc repo is the lone Sphinx project seen directly by the Sphinx builder.

The product’s doc repo is not distributed by EUPS to end-users, so it is not an EUPS package. Instead, EUPS tags for releases are mapped to branch names in the product doc repo.

3.1   Package documentation organization

To effect the integration of package documentation content into the product documentation repo, each package must adhere to the following file layout:

<package_name>/
   # ...
   doc/
      Makefile
      conf.py
      index.rst
      # ...
      _static/
         <package_name>/
            <image files>...

The role of the doc/Makefile, doc/conf.py and doc/index.rst files is solely to allow local builds. Also note how static assets for packages are isolated in the _static/<package_name>/ directory.

3.2   Product documentation organization

When ltd-mason builds documentation for an EUPS product, it links documentation resources from individual package repositories into a cloned copy of the product’s documentation repository. In Data Management’s Jenkins environment, the Git repositories for each EUPS package are available on the filesystem through lsstsw.

Given the structure of individual packages described above, softlinks are made from the product documentation repo as follows:

<product_doc_repo>/
  # ...
  <package_1>/
    index.rst -> /<package_1>/doc/index.rst
    # ...
  <package_2>/
    index.rst -> /<package_2>/doc/index.rst
    # ...
  _static/
    <package_1>/ -> /<package_1>/doc/_static/<package_1>/
    <package_2>/ -> /<package_2>/doc/_static/<package_2>/
    # ...

In this scheme, the absolute paths to static assets in the _static/ directory are unchanged whether a package’s documentation is built alone or integrated into the EUPS product.

Once an EUPS product’s documentation is linked, it is built by LTD Mason like any other Sphinx project.
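The linking scheme above can be sketched in Python as follows. This is a minimal illustration rather than LTD Mason's actual implementation: the function name link_package_docs and its arguments are hypothetical, and the sketch links only index.rst and the package's _static/<package>/ directory.

```python
import os
from pathlib import Path


def link_package_docs(product_repo, package_dirs):
    """Symlink package documentation into a cloned product doc repo.

    ``package_dirs`` maps a package name to the directory where that
    package is installed (hypothetical argument for this sketch).
    Returns the list of symlinks that were created.
    """
    product_repo = Path(product_repo)
    static_root = product_repo / "_static"
    static_root.mkdir(parents=True, exist_ok=True)
    created = []

    for name, package_dir in sorted(package_dirs.items()):
        # Resolve to an absolute path so the link target does not depend
        # on the location of the link itself.
        doc_dir = Path(package_dir).resolve() / "doc"
        if not doc_dir.is_dir():
            continue  # this package ships no documentation

        # <product>/<package>/index.rst -> <package>/doc/index.rst
        content_dir = product_repo / name
        content_dir.mkdir(exist_ok=True)
        index_link = content_dir / "index.rst"
        os.symlink(doc_dir / "index.rst", index_link)
        created.append(index_link)

        # <product>/_static/<package>/ -> <package>/doc/_static/<package>/
        static_source = doc_dir / "_static" / name
        if static_source.is_dir():
            static_link = static_root / name
            os.symlink(static_source, static_link, target_is_directory=True)
            created.append(static_link)

    return created
```

Because each package keeps its assets under _static/<package>/ in both layouts, a reference to /_static/<package>/figure.png resolves the same way in a local package build and in the linked product build.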

3.3   The Build’s YAML Manifest interface to LTD Mason

Although LTD Mason runs on Jenkins in the Stack build environment, LTD Mason is not integrated tightly with LSST’s build technologies (EUPS and Scons). This choice allows the documentation build system to evolve independently of the Stack build environment, and even to accommodate non-EUPS build environments.

The interface layer that bridges the software build system (buildsstsw.sh) to the documentation build system (the ltd-mason command line tool) is a manifest file, formatted as YAML. Note that this YAML manifest is the sole input to ltd-mason, besides environment variable-based configurations.

A minimal example of the manifest:

product_name: 'lsst_apps'
build_id: 'b1925'
refs: ['u/jonathansick/DM-4195']
requester_github_handle: 'jonathansick'
doc_repo:
    url: 'https://github.com/lsst-sqre/pipelines_docs.git'
    ref: 'tickets/DM-4196'
packages:
    afw:
        dir: '/mnt/stack_docs/lsstsw/stack/Linux64/afw/2015_10.0-14-g7c5ed66'
        url: 'https://github.com/lsst/afw.git'
        ref: 'u/jonathansick/DM-4195'

A formal schema for this YAML manifest file is available in the LTD Mason repository. For reference, the fields are:

product_name
    The slug identifier that maps to a Product resource in ltd-keeper. For EUPS-based projects, this should correspond to the Stack meta-package name (e.g., lsst_apps).
build_id
    A string uniquely identifying the Jenkins build. Typically this is a monotonically increasing (or time-sortable) number.
refs
    The set of branches or tags that a user entered upon triggering a Jenkins build of the software, e.g. [tickets/DM-XXXX, tickets/DM-YYYY]. This field is used by LTD Keeper to map documentation Builds to Editions.
requester_github_handle
    An optional field that can contain the GitHub username of the person requesting the build. If provided, it is used to notify the build requester through Slack.
doc_repo.url
    The Git URL of the product’s documentation repository.
doc_repo.ref
    The Git reference (commit, tag or branch) to check out from the product’s documentation repository.
packages
    Key-value objects for each package in an EUPS-based multi-package software product. The keys correspond to the names of individual packages (and the Git repository names in the github.com/lsst organization). Each package object has the following fields:

    dir
        Local directory where the package was installed by lsstsw.
    url
        URL of the package’s Git repository on GitHub.
    ref
        Git reference (typically a branch name) that was cloned and installed by lsstsw.
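As an illustration of how a consumer might sanity-check a manifest once it has been parsed from YAML into a Python mapping, consider the sketch below. It is not the formal schema from the LTD Mason repository, which remains authoritative. The function name check_manifest, the choice of which fields are treated as required, and the assumption that per-package objects live under a packages key are all made for this example.

```python
# Fields this sketch treats as required (an assumption; see the formal schema).
REQUIRED_TOP_LEVEL = ("product_name", "build_id", "refs", "doc_repo")
REQUIRED_PACKAGE_KEYS = ("dir", "url", "ref")


def check_manifest(manifest):
    """Return a list of human-readable problems found in a parsed manifest."""
    problems = []

    for key in REQUIRED_TOP_LEVEL:
        if key not in manifest:
            problems.append("missing field: " + key)

    # The documentation repository needs both a URL and a Git reference.
    for key in ("url", "ref"):
        if key not in manifest.get("doc_repo", {}):
            problems.append("missing field: doc_repo." + key)

    # Each package entry needs its install directory, Git URL and Git ref.
    for name, package in manifest.get("packages", {}).items():
        for key in REQUIRED_PACKAGE_KEYS:
            if key not in package:
                problems.append("missing field: packages.%s.%s" % (name, key))

    return problems
```

An empty list means that the manifest carries every field this sketch looks for; requester_github_handle is deliberately not checked because it is optional.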

3.4   Summary of the documentation build process for EUPS-based projects

Given the input file, ltd-mason runs the following process to build an EUPS-based software product’s HTML documentation:

  1. Clone the product’s documentation repo and checkout the appropriate Git reference (based on the YAML manifest’s doc_repo key).
  2. Link the doc/ directories of each installed package (in lsstsw/install/) to the cloned product documentation repository (see 3.2   Product documentation organization).
  3. Run a Sphinx build of the complete product documentation with sphinx-build.

The result is a built static HTML site.
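The clone and build steps can be expressed as a short command plan. The sketch below only assembles the commands; it assumes that the product documentation repo keeps its Sphinx conf.py at the repository root, and the function name plan_build_commands and the directory arguments are hypothetical.

```python
def plan_build_commands(manifest, clone_dir, html_dir):
    """Return the shell commands (as argument lists) for steps 1 and 3.

    Step 2, linking package doc/ directories into ``clone_dir``, is a
    filesystem operation and happens between the checkout and the build.
    """
    doc_repo = manifest["doc_repo"]
    return [
        # Step 1: clone the product doc repo and check out the requested ref.
        ["git", "clone", doc_repo["url"], clone_dir],
        ["git", "-C", clone_dir, "checkout", doc_repo["ref"]],
        # Step 3: build the linked project into a static HTML site.
        ["sphinx-build", "-b", "html", clone_dir, html_dir],
    ]
```

Each command could then be executed in order with subprocess.run(command, check=True), so that a failing clone or Sphinx build stops the process.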

4   LTD Mason Documentation Uploads

Once documentation is compiled into a static website consisting of HTML, CSS, images, and other assets, LTD Mason uploads those resources to the LSST the Docs Amazon Web Services S3 bucket.

The upload process is governed by a handshake with the LTD Keeper API server. When an LTD Mason instance wants to upload a new build it sends a POST /products/(slug)/builds/ request to LTD Keeper. The request body describes what Git branch this documentation corresponds to, and the request URL specifies the documentation Product maintained by LTD Keeper. The response from LTD Keeper to this request is a Build resource that specifies where this build should be uploaded in the S3 bucket, along with metadata that should be attached to uploaded artifacts.

LTD Mason uses boto3 to upload static documentation sites to S3. During this upload, LTD Mason gives every object a public-read Access Control List header to facilitate API access by Fastly. Content-Type headers are also set, based on mimetypes.guess_type(), to enable gzip-compressed delivery with Fastly. The Cache-Control header is set to max-age=31536000, which allows content to be retained in Fastly’s cache for one year, or until specifically purged. Purges are facilitated by also setting the x-amz-meta-surrogate-key header according to the Build resource returned by the LTD Keeper request. This surrogate key allows individual documentation builds to be purged from the Fastly CDN.
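The per-object settings described above map onto keyword arguments of boto3's S3 put_object call. The following sketch only builds those arguments so that it runs without AWS credentials; the function name s3_upload_args is hypothetical and the code is an illustration, not LTD Mason's implementation.

```python
import mimetypes

ONE_YEAR_SECONDS = 31536000


def s3_upload_args(bucket, key, surrogate_key):
    """Build keyword arguments for an S3 ``put_object`` call."""
    args = {
        "Bucket": bucket,
        "Key": key,
        # Every object is publicly readable so that Fastly can fetch it.
        "ACL": "public-read",
        # Let Fastly keep the object for a year unless it is purged.
        "CacheControl": "max-age=%d" % ONE_YEAR_SECONDS,
        # S3 sends user metadata as x-amz-meta-<name> headers, so this
        # becomes the x-amz-meta-surrogate-key header.
        "Metadata": {"surrogate-key": surrogate_key},
    }
    content_type, _encoding = mimetypes.guess_type(key)
    if content_type is not None:
        args["ContentType"] = content_type
    return args
```

With boto3 installed, an upload would then look like boto3.client("s3").put_object(Body=data, **s3_upload_args(bucket, key, surrogate_key)).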

Once the upload is complete, LTD Mason notifies LTD Keeper by sending a PATCH request to the build resource that changes the uploaded field from false to true.
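The two ends of this handshake can be sketched with the standard library. The requests are constructed but not sent, and authentication is omitted. The body field name git_refs and the host keeper.example.org are assumptions for illustration; the uploaded field and the POST /products/(slug)/builds/ route come from the description above.

```python
import json
from urllib.request import Request

JSON_HEADERS = {"Content-Type": "application/json"}


def new_build_request(keeper_url, product_slug, git_refs):
    """Request that registers a new build for a product with LTD Keeper."""
    url = "%s/products/%s/builds/" % (keeper_url.rstrip("/"), product_slug)
    # "git_refs" is a hypothetical field name for the branch information.
    body = json.dumps({"git_refs": git_refs}).encode("utf-8")
    return Request(url, data=body, headers=JSON_HEADERS, method="POST")


def confirm_upload_request(build_url):
    """Request that marks a build as uploaded once the S3 upload is done."""
    body = json.dumps({"uploaded": True}).encode("utf-8")
    return Request(build_url, data=body, headers=JSON_HEADERS, method="PATCH")
```

Between the two requests, the client would read the upload location and surrogate key from the Build resource in the POST response, and upload the site to S3.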

5   LTD Mason on Travis

Although LTD Mason can run documentation builds for EUPS-based projects, not all projects use EUPS. In fact, not all projects will even use Sphinx. For such generic projects, Travis CI is a popular continuous integration environment. LTD Mason provides an alternative command line interface, ltd-mason-travis, specifically for publishing documentation from a Travis environment. In this mode, LTD Mason can upload any static website to LSST the Docs, regardless of the tooling used to create that site.

The LTD Mason documentation describes how to configure Travis. The following is a realistic example Travis configuration .travis.yml for a Python project that is also publishing documentation to LSST the Docs:

sudo: false
language: python
python:
  - '2.7'
  - '3.4'
  - '3.5'
  - '3.5-dev'
matrix:
  allow_failures:
    - python: "3.5-dev"
  include:
    # This is the ltd-mason documentation deployment build
    - python: "3.5"
      env: LTD_MASON_BUILD=true
install:
  - pip install -r requirements.txt
  - pip install ltd-mason
  - pip install -e .
script:
  - py.test --flake8 --cov=ltdmason
  - sphinx-build -b html -a -n -W -d docs/_build/doctree docs docs/_build/html
after_success:
  - ltd-mason-travis --html-dir docs/_build/html
env:
  global:
    - LTD_MASON_BUILD=false  # disable builds in regular test matrix
    - LTD_MASON_PRODUCT="ltd-mason"
    # travis encrypt "LTD_MASON_AWS_ID=... LTD_MASON_AWS_SECRET=... LTD_KEEPER_URL=... LTD_KEEPER_USER=... LTD_KEEPER_PASSWORD=..." --add env.global
    - secure: "CIpaoNzWwEQngjmj0/OQBRUOnkT9Rq8273N5ZgXmZTtVSliukfJMROQnp9m42x3a2XFamaYV60mmuAvMRNU8VHi4nePxF2vp7utVnp8cF4zFQQzL6KnN2rqWv0H3Snqc1sfMT2n4H9qgBlYG7w5Cv52VIXdwh8MqGSxl8HAiYgqcVNJ+q1Rxeb1Yk+Bv3VW6O0/K4AlrhGY2Gl/zbwgM4ph0K0UvT1IZg8ZjCdddOpgwxPq66kvzHNcpCR6JUnvy5vRVH+IgC83Ar+oJqOA/4pizcFccriLF7nANkVJMrRSL8B1h2IHuuGYpC2VzDPMlAuEPmU6t8QAhVCOq9BSy98902TgKkvt4enPcxS2iNqMoOJSNUW7q9yqvVacz4JApJfHWlq5K7uTy00p4XHV4TUs+9NEgBUCwEFE5CXcRQvg+Y2y1wqUUkH+12nb1Nv4CdGxG6k7yG+eM+qmANJ87jZK9vX0RmDLKXuA3gpJyVomrAKX1+MqqwD0Qu885AUsHCQevO+oDmXv6nKLK/x2ZeyHQrgWISj3LXU6B7LarLrqsrE7JWTwgo/iX6xiVHS422tj94/+rab3JarBWe+ntdG9rZBdILU92kLqzgMA570ryVxtsnu8GnzOB0/3yvdtW+duAgrrBUusBcg9E/Kz/68Cm5RbMLyjaeA6HxP6mfM4="

Several aspects of this .travis.yml example are relevant to LSST the Docs users. First, an extra build is added to the testing matrix where the environment variable LTD_MASON_BUILD is explicitly set to true. Since the ltd-mason-travis command is always run in the after_success phase of a Travis build, the LTD_MASON_BUILD environment variable helps ensure that only one build in the matrix follows through on documentation publication.

We recommend running Sphinx (or similar) in the script phase. This ensures that errors in the documentation build cause the entire build to fail, which is likely desirable behavior. In fact, sphinx-build is run with the -n and -W flags: -n (nit-picky mode) makes Sphinx warn about missing references, and -W turns warnings into errors. Again, note how sphinx-build is entirely separate from ltd-mason-travis; any static site builder could be used.

Finally, in the env.global section, LTD Mason is configured through several environment variables. Travis’s encrypted environment variable feature is used to securely store credentials for AWS S3 and LTD Keeper. The private key needed to decrypt these fields is known only to Travis and is directly associated with the GitHub repository. In other words, forks of a repository cannot gain access to, or publish to, LSST the Docs.
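The role of the LTD_MASON_BUILD variable can be illustrated with a small check. This is a sketch of the gating idea only, and ltd-mason-travis may apply further conditions; the function name should_publish is hypothetical.

```python
import os


def should_publish(environ=None):
    """Return True only for the matrix job designated to publish docs."""
    environ = os.environ if environ is None else environ
    # Jobs in the regular test matrix inherit LTD_MASON_BUILD=false from
    # env.global; only the extra included job sets it to true.
    return environ.get("LTD_MASON_BUILD", "false").lower() == "true"
```

Because after_success runs in every successful job of the matrix, a check like this is what keeps the documentation from being uploaded once per Python version.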

6   Versioned Documentation URLs

LSST the Docs is designed to host an arbitrary number of documentation projects, along with an arbitrary number of versions of those projects.

LSST the Docs serves each documentation project from its own subdomain of lsst.io. For example, sqr-000.lsst.io or ltd-keeper.lsst.io. These subdomains are memorable, and allow documentation to be referenced without need for a link shortener. This URL model also concurs with the recent practice by Apple’s Safari browser to collapse the entire URL to just the domain in the location bar.

Also note that LSST the Docs publishes specifically to lsst.io rather than lsst.org. This is because LSST the Docs requires programmatic access to a domain’s DNS settings, and the lsst.io domain allows us to do that without interfering with lsst.org’s operations. Our intention is to brand lsst.io as synonymous with ‘LSST Documentation.’

6.1   The default documentation edition

From the root URL for a documentation product, for example https://example.lsst.io/, LSST the Docs serves what is considered to be the ‘default’ version of the documentation. Conventionally, this is documentation built from the master branch of a Git repository. This choice can be changed on a per-project basis for strategic reasons. For example, a software project may choose to serve documentation from a stable release branch at the root URL.

6.2   Additional editions for Git branches

LSST the Docs serves separate editions of documentation for each branch of the project’s parent repository. These editions are served from a /v/ path off of the root domain. For example, a branch named v1 would be served from https://example.lsst.io/v/v1/.

For ticket branches used by Data Management (e.g., tickets/DM-1234), LSST the Docs transforms that branch name to create more convenient edition URLs: https://example.lsst.io/v/DM-1234/.
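A minimal sketch of this branch-to-URL mapping is shown below. The ticket-branch rule follows the example above; replacing the remaining slashes with hyphens for other branch names is an assumption of this sketch, not documented behavior, and the function names are hypothetical.

```python
import re

# Data Management ticket branches look like tickets/DM-1234.
TICKET_BRANCH = re.compile(r"^tickets/(DM-\d+)$")


def edition_slug(branch):
    """Turn a Git branch name into a URL-friendly edition slug."""
    match = TICKET_BRANCH.match(branch)
    if match:
        return match.group(1)
    # Assumption: other slashes are replaced so the slug is one path segment.
    return branch.replace("/", "-")


def edition_url(product, branch):
    """URL from which the edition for ``branch`` would be served."""
    return "https://%s.lsst.io/v/%s/" % (product, edition_slug(branch))
```

For example, edition_url("example", "tickets/DM-1234") gives https://example.lsst.io/v/DM-1234/, and edition_url("example", "v1") gives https://example.lsst.io/v/v1/.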

Editions are created automatically for every new branch (as in, they are provisioned on-demand when LTD Mason POSTs a build from a new Git branch). We believe that this automation will be incredibly useful for code reviews. For any pull request it will be unambiguous where corresponding documentation can be found. Making documentation more visible in code reviews should improve the culture of documentation within Data Management.

6.3   Archived documentation builds

LSST the Docs stores every documentation build uploaded as an immutable object that is never deleted, by default. When a new documentation build is uploaded by LTD Mason, that build exists alongside the previous documentation builds rather than replacing them. These individual builds are available from the /builds/ path off the root domain. For example, the first build would be available at https://example.lsst.io/builds/1/.

Retaining builds serves two purposes. First, it allows “A/B” comparisons of documentation during development. During a code review, or debugging session, a developer can link to individual builds corresponding to individual pushes to GitHub.

Second, keeping builds available provides a recovery mechanism should a published build for an edition be broken. If old builds were not available, the only recourse would be to rebuild and re-upload the documentation from scratch. Yet if the documentation is somehow broken, this may not be a quick recovery operation. This entire scenario is solved by retaining all builds so that recovery to a known ‘good’ build is immediate.

6.4   Discovery of available editions and builds

A reader of an LSST the Docs-published project will likely want a convenient interface for discovering and switching between the available editions and even builds. Such services are enabled by LTD Keeper’s RESTful API.

One type of interface would be edition-switcher interface elements embedded in published HTML pages. Such interface elements are specific to the front-end architecture of documents published on LSST the Docs, and are out of scope of this document.

Another type of interface would be dashboard pages that dynamically list metadata about available editions and builds. Though not yet implemented, we envision that such pages would be available at


for editions of a documentation project and


for builds of a documentation project. These dashboards would leverage data from the LTD Keeper API, and be rendered entirely on the client with React, for example.

In addition, we anticipate that the LTD Keeper API will be consumed by DocHub, a proposed LSST-wide API for documentation discovery. With DocHub and the LTD Keeper API, documentation projects and their main editions would be dynamically listed from LSST documentation landing pages.

6.5   Presenting versioned documentation to search engines

Having so many instances of documentation sites is detrimental to those sites’ rankings in search engines, such as Google. Furthermore, we likely want a potential documentation reader to always land on the default edition of the documentation. These objectives can be achieved by setting the page’s canonical URL in HTML:

<link rel="canonical" href="https://example.lsst.io/index.html">

Of course, this will require modification of the HTML presentation of projects published on LSST the Docs. As an alternative, LSST the Docs may in the future set the canonical URL of pages it serves through an HTTP header:

Link: <https://example.lsst.io/index.html>; rel="canonical"
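To make the idea concrete, the sketch below derives the canonical URL of a page from the URL of any edition or build of that page, using the /v/ and /builds/ path conventions described earlier. It assumes that a page keeps the same relative path in every edition; the function names are hypothetical.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Matches a leading /v/<edition> or /builds/<build> path segment pair.
VERSION_PREFIX = re.compile(r"^/(?:v|builds)/[^/]+(?=/|$)")


def canonical_url(url):
    """Map an edition or build URL onto the default edition's URL."""
    parts = urlsplit(url)
    path = VERSION_PREFIX.sub("", parts.path, count=1) or "/"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def canonical_link_header(url):
    """Value of a Link header announcing the canonical URL."""
    return '<%s>; rel="canonical"' % canonical_url(url)
```

Under these assumptions, https://example.lsst.io/v/DM-1234/index.html and https://example.lsst.io/builds/1/index.html would both advertise https://example.lsst.io/index.html as their canonical URL.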

7   Serving Versioned Documentation for Unlimited Projects with Fastly

The previous section laid out the URL architecture of documentation projects hosted on LSST the Docs. This section focuses on the practical implementation of documentation delivery to the reader.

Besides serving beautiful URLs, the hosting design of LSST the Docs is governed by three key requirements. First, LSST the Docs must be capable of serving an arbitrarily large number of documentation projects, along with an arbitrarily large number of versions of those documentation projects. Second, web page delivery must be fast and reliable. Since documentation consumption is an integral aspect of LSST development work, any documentation download latency or downtime is unacceptable. Third, LSST the Docs will host highly public documentation projects, such as documentation for LSST data releases, so it must gracefully handle any web-scale traffic load.

To meet these requirements, LSST the Docs uses two managed services: Amazon Web Services S3 and the Fastly content distribution network.

The role of S3 is to authoritatively store all documentation sites hosted by LSST the Docs. When readers visit an lsst.io site, they do not directly interact with S3, but rather with Fastly. As a content distribution network, Fastly has points of presence distributed globally. When a page from LSST the Docs is requested for the first time from a point of presence, Fastly retrieves the page from S3 and forwards it to the original requester. At the same time, Fastly caches the page in its point of presence. The next time the same page is requested, it is served directly from the nearby Fastly point of presence. By bringing the documentation content closer to the reader, regardless of where on Earth the reader is, LSST the Docs can deliver content with less latency.

7.1   Organization of documentation in S3

Static web pages are conceptually simple to serve since individual files on the server’s filesystem map directly to URLs. Specifically, S3 provides a cost-effective static site hosting solution that is highly available and resilient to any traffic load. S3 even includes a setting to turn its buckets into statically hosted public websites. In this mode, the S3 bucket’s URL is named after the domain the site is served from. For LSST the Docs, this would imply that each documentation project would need its own bucket to be served from its own subdomain. Creating so many buckets, especially autonomously, is not a scalable approach since Amazon asserts an apparent limit of 100 buckets per AWS account.

Instead of using multiple S3 buckets, we adopted a scalable solution advocated by Seth Vargo of HashiCorp where multiple sites are stored in a single S3 bucket but served separately through Fastly.

Files for each documentation project are stored in separate root directories of the common S3 bucket. The names of these directories match the projects’ subdomains. For example:


Within these project directories, builds are stored in a /builds subdirectory, and editions are stored in /v/ directories. For example:


This path architecture purposefully mirrors the URL architecture. This enables Fastly to serve multiple sites, and builds or editions thereof, by transforming the requested URL into a URL in the S3 bucket. This mechanism is described in the next section.

7.2   Re-writing URLs in Varnish Configuration Language

Every HTTP request to Fastly is processed by Varnish. Varnish is an open source caching HTTP reverse proxy. Varnish gives LSST the Docs a great deal of flexibility since each HTTP request is processed in the Varnish Configuration Language (VCL), which is an extensible Turing-complete programming language.

Thus when a request is received, we have programmed Varnish to map the requested URL to a URL in the S3 origin bucket through simple regular-expression-based manipulations. The following table describes the three types of URLs that need to be supported.

Type     Request URL                                  S3 Origin URL
default  https://example.lsst.io/                     {{ bucket }}.s3.amazonaws.com/example/v/main/index.html
edition  https://example.lsst.io/v/{{ edition }}/     {{ bucket }}.s3.amazonaws.com/example/v/{{ edition }}/index.html
build    https://example.lsst.io/builds/{{ build }}/  {{ bucket }}.s3.amazonaws.com/example/builds/{{ build }}/index.html

This URL manipulation is accomplished with approximately the following VCL code:

sub vcl_recv {

  # ...

  set req.http.Fastly-Orig-Host = req.http.host;
  set req.http.host = "bucket.s3.amazonaws.com";

  # ... HTTP -> HTTPS redirect code

  # ... Read the Docs redirection code

  # Rewrite URL for default edition (root URL).
  # The host pattern (assumed here) captures the product subdomain as \1.
  if( req.url !~ "^/v/|^/builds/" ) {
        set req.url = regsub(req.http.Fastly-Orig-Host,
                             "^(.+)\.lsst\.io$",
                             "/\1/v/main") req.url;
  }

  # Rewrite URL for editions and builds
  if( req.url ~ "^/v/" || req.url ~ "^/builds/" ) {
        set req.url = regsub(req.http.Fastly-Orig-Host,
                             "^(.+)\.lsst\.io$",
                             "/\1") req.url;
  }

  # ...
}


On line 6, the domain is changed from *.lsst.io to the bucket’s API endpoint.

In the next highlighted section, we take the URL path (stored in the req.url variable) and test whether it does not start with /v/ or /builds/, meaning that the default documentation is being requested. In that case, the path is re-written such that req.url is relative to the /v/main/ subdirectory of a product in the S3 bucket (the default edition is an alias for /v/main/). The directory of the product is obtained from the subdomain of the original request domain (req.http.Fastly-Orig-Host).

For regular edition or build URLs, all that must be done is to combine the product name extracted from req.http.Fastly-Orig-Host with req.url to obtain the path in the S3 bucket.
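
The same mapping can be mirrored outside Varnish, which is convenient for reasoning about the rules or unit testing them. The following Python sketch is illustrative only and is not part of LSST the Docs; the function name origin_path and the host pattern are assumptions. Appending index.html to directory paths is a separate step and is not handled here.

```python
import re

# Hypothetical helper mirroring the VCL rewrite; not part of LSST the Docs.
# Assumption: the product name is everything before ".lsst.io" in the host.
HOST_PATTERN = re.compile(r"^(.+)\.lsst\.io$")


def origin_path(host: str, path: str) -> str:
    """Map a requested host and URL path to a path in the S3 bucket."""
    match = HOST_PATTERN.match(host)
    if match is None:
        raise ValueError(f"not an lsst.io subdomain: {host}")
    product = match.group(1)
    if path.startswith("/v/") or path.startswith("/builds/"):
        # Edition and build URLs only need the product directory prepended.
        return f"/{product}{path}"
    # Anything else belongs to the default edition, an alias for /v/main/.
    return f"/{product}/v/main{path}"
```

For instance, origin_path("example.lsst.io", "/builds/1/") gives /example/builds/1/, while the root path / maps to /example/v/main/.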

7.3   Replicating web server behavior from S3’s REST API

We configured Fastly to obtain resources from S3 through its REST endpoint (e.g., {{ bucket }}.s3.amazonaws.com) rather than the S3 website endpoint (e.g., {{ bucket }}.s3-website-us-east-1.amazonaws.com). The advantage of using the REST endpoint is that communications between Fastly and S3 are encrypted with TLS, which the website endpoint does not support, preventing a ‘man-in-the-middle’ attack.

Using the REST endpoint, on the other hand, means forgoing some conveniences of a web server built for browser traffic. For example, an example.lsst.io/ path does not automatically imply example.lsst.io/index.html. Instead, these conveniences must be built into the VCL logic.

For example, the code to re-write a directory URL to the index.html document is

sub vcl_recv {
  # ...

  if( req.url ~ "/$" ) {
    set req.url = req.url "index.html";
  }

  # ...
}

Web servers also provide courtesy directory redirects: even if a user requests only example.lsst.io/dirname, a server will know to redirect and serve example.lsst.io/dirname/index.html. This feature is more difficult to implement in VCL since Varnish does not know if /dirname is a directory or merely a file named dirname. LSST the Docs solves this issue in a two-phased approach.

First, LTD Mason automatically uploads ‘directory redirect objects’ to S3 with each build. These objects, created for each directory, possess the directory’s name. Such objects can exist in an S3 bucket because S3 does not implement directories as filesystem objects; rather, directories are inferred from objects’ keys. These directory redirect objects are empty, except for a metadata header: x-amz-meta-dir-redirect=true.

When a user requests a URL like example.lsst.io/v/demo, Fastly will receive the directory redirect object from S3. In the vcl_fetch Varnish phase, this header is detected:

if ( beresp.http.x-amz-meta-dir-redirect ) {
      error 901 "Fastly Internal";
}

The internal 901 error is caught in vcl_deliver and converted into a 301 permanent redirect response.

if (resp.status == 901 ) {
    set resp.status = 301;
    set resp.response = "Moved Permanently";
}

And the destination URL for the redirect is set:

if( req.url !~ "/$" && resp.status == 301 ) {
    set resp.http.location = "https://" req.http.Fastly-Orig-Host req.http.x-original-url "/index.html";
}

Note that the req.http.x-original-url and req.http.Fastly-Orig-Host are the originally requested paths and domain names, respectively, cached early by the Varnish processing stack.
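
The decision made across vcl_fetch and vcl_deliver can be summarized in a few lines of Python. This is only a sketch of the logic described above, not code from LSST the Docs; the function name directory_redirect and its arguments are hypothetical, and the origin headers are assumed to arrive as a simple mapping.

```python
from typing import Mapping, Optional, Tuple

# Metadata header that marks a directory redirect object in S3.
DIR_REDIRECT_HEADER = "x-amz-meta-dir-redirect"


def directory_redirect(
    host: str, original_path: str, origin_headers: Mapping[str, str]
) -> Optional[Tuple[int, str]]:
    """Return (status, location) if the origin object marks a directory."""
    if original_path.endswith("/"):
        # Already a directory URL; index.html is appended without a redirect.
        return None
    headers = {name.lower(): value for name, value in origin_headers.items()}
    if DIR_REDIRECT_HEADER not in headers:
        # An ordinary file; it is served as-is.
        return None
    return 301, f"https://{host}{original_path}/index.html"
```

A request for /v/demo on example.lsst.io whose origin object carries the marker header therefore yields a 301 to https://example.lsst.io/v/demo/index.html.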

7.4   Redirecting Read the Docs URLs

When LSST the Docs was launched, tens of LSST documents were already being published with Read the Docs. Whereas LSST the Docs serves default documentation from the root URL, example.lsst.io/, Read the Docs always exposes a version name in its URLs. On Read the Docs, the default version was served from example.lsst.io/en/latest/. To prevent broken URLs, we coded the VCL to send a 301 permanent HTTP redirect response to any path beginning with /en/latest/, using a pattern similar to the courtesy directory redirects described previously.

In vcl_recv, the deprecated URL is detected:

if( req.url ~ "^/en/latest" ) {
  error 900 "Fastly Internal";
}

In vcl_deliver, the internal 900 error is converted into a 301 response:

if (resp.status == 900 ) {
  set resp.status = 301;
  set resp.response = "Moved Permanently";
}

if( req.url ~ "^/en/latest" && resp.status == 301 ) {
  set resp.http.location = "https://" req.http.Fastly-Orig-Host regsub(req.url, "^/en/latest(.+)$", "\1");
}

Sending a 301 redirect rather than silently re-writing the URL improves search engine optimization since the canonical URL is enforced.

7.5   Serving HTTPS to the browser

Another convenience of Fastly is that web pages are encrypted to the browser with TLS (that is, served over HTTPS). LSST the Docs uses a shared wildcard certificate to serve all *.lsst.io domains.

Although HTTP requests are accepted, we configured Fastly to redirect HTTP requests to HTTPS so that all communications are encrypted.

Non-TLS requests are detected early in the vcl_recv block with the Fastly-SSL header passed from Fastly’s TLS terminator to the caching layer:

if( !req.http.Fastly-SSL ) {
    set req.http.host = req.http.Fastly-Orig-Host;
}

if( req.url ) {
  if (!req.http.Fastly-SSL) {
     error 801 "Force SSL";
  }
  # ...
}

Note how req.http.host is reset to the original host (*.lsst.io) rather than the S3 hostname.

This 801 error is serviced in vcl_error:

if (obj.status == 801) {
   set obj.status = 301;
   set obj.response = "Moved Permanently";
   set obj.http.Location = "https://" req.http.host req.url;
   synthetic {""};
   return (deliver);
}

7.6   Serving Gzip-compressed content

We have configured Fastly to serve text-based content with Gzip compression. Specifically, HTML, CSS, JavaScript, web font, JSON, XML and SVG content is compressed en route to the browser. This reduces bandwidth and creates a better user experience.

7.7   Managing Fastly and browser caching

Caches accelerate browsing performance. In LSST the Docs there is not one cache but two: Fastly, and the local cache maintained by a web browser. With caches there is a natural tension between the lifetime of objects in a cache and ensuring that a browser is always displaying the most recent content. This section summarizes how LSST the Docs manages caches. Note that this cache logic is controlled both by the LTD Mason upload phase and LTD Keeper’s Edition updates.

7.7.1   Controlling the Fastly cache with surrogate key purges

LSST the Docs ensures that Fastly points of presence retain data for as long as possible by either setting Cache-Control: max-age=31536000 or x-amz-meta-surrogate-control: max-age=31536000 (i.e., one year) in the headers of objects stored on S3.

If content cached in Fastly needs to be updated, LSST the Docs is able to do so with a surrogate key purge. LTD Keeper assigns unique surrogate keys to every Build and Edition resource. When either LTD Mason or LTD Keeper adds files to S3, these surrogate keys are inserted into the x-amz-meta-surrogate-key headers of objects. Thus when an Edition is updated, for example, LTD Keeper is able to purge that Edition specifically through its surrogate key with the Fastly API: POST /service/{{id}}/purge/{{key}}.
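
As a rough illustration of the purge call, the following Python sketch builds (but does not send) such a request using only the standard library. It is not LTD Keeper’s implementation; the function name build_purge_request and the placeholder service ID, surrogate key, and API key are assumptions. The Fastly-Key header is how the Fastly API authenticates requests.

```python
import urllib.request

FASTLY_API = "https://api.fastly.com"


def build_purge_request(
    service_id: str, surrogate_key: str, api_key: str
) -> urllib.request.Request:
    """Build, without sending, a surrogate-key purge request for Fastly."""
    url = f"{FASTLY_API}/service/{service_id}/purge/{surrogate_key}"
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Fastly-Key": api_key, "Accept": "application/json"},
    )


# Placeholder values; a real call needs a valid service ID and API key.
request = build_purge_request("SERVICE_ID", "EDITION_KEY", "API_KEY")
```

Passing the request to urllib.request.urlopen would perform the purge.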

7.7.2   Controlling the browser’s caching

For maximum performance, the browser’s cache must also be managed. By default, LTD Mason uploads objects with a Cache-Control: max-age=31536000 header. This header applies to both Fastly and browsers. Since builds are immutable, such a potentially long-lived cache in the browser is acceptable.

Editions have more complex caching requirements since objects at a given URL can be updated. In fact, for editions serving development branches, a developer will want the edition to reliably represent the most recent push to GitHub. To accomplish this, LTD Keeper alters the headers of objects in editions to include the following headers:

x-amz-meta-surrogate-control: max-age=31536000
Cache-Control: no-cache

The x-amz-meta-surrogate-control header instructs Fastly to retain the edition in its caches for one year (or until purged). This surrogate control header is only used by Fastly, and is not sent to the browser. This allows the Cache-Control header to exclusively manage browser caching.

Here, Cache-Control: no-cache means that a browser can cache content, but that each request must be re-validated by the server (Fastly). In this re-validation process, the browser provides Fastly with the ETag of the object in its cache. If that ETag matches the current version, Fastly responds with a content-less HTTP 304 response. Otherwise, Fastly returns the entire new object. This caching approach balances the needs of reducing network bandwidth while ensuring content is up-to-date, though at the expense of lightweight validation requests to Fastly.
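
The re-validation exchange can be sketched as a small Python function. This is a simplified model rather than Fastly’s behavior in full: it assumes a single strong ETag in the browser’s If-None-Match header and ignores weak validators and ETag lists. The function name revalidate is hypothetical.

```python
from typing import Optional, Tuple


def revalidate(
    current_etag: str, current_body: bytes, if_none_match: Optional[str]
) -> Tuple[int, bytes]:
    """Answer a conditional GET as a cache holding the current object would."""
    if if_none_match is not None and if_none_match == current_etag:
        # The browser's copy is current, so no body is sent.
        return 304, b""
    # The browser has no copy, or a stale one, so send the whole object.
    return 200, current_body
```

Only the second branch transfers the page content; the first costs a single lightweight round trip.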

8   LTD Keeper API

LTD Keeper is a microservice that plays a central coordination and automation role in LSST the Docs. It is implemented as a Python 3 web application, built upon the Flask microframework. As shown in Figure 1, LTD Keeper directly interacts with AWS S3 (storage), AWS Route53 (DNS) and Fastly (CDN). LTD Keeper also maintains an SQL database of all documentation products, editions and builds. Clients can interact with LTD Keeper resources, and trigger actions, through a RESTful HTTP API. LTD Mason is the original consumer of this API.

LTD Keeper’s API is documented at https://ltd-keeper.lsst.io. This section will describe the API resources and methods broadly; those writing clients should consult the API reference documentation.

8.1   LTD Keeper Authentication and Authorization

LTD Keeper, at the moment, generally accepts anonymous read requests to facilitate clients that discover documentation through the API. HTTP methods that change state (POST, PUT and PATCH) require the client to be both authenticated and authorized.

Authentication is implemented with HTTP basic auth. Registered clients have a username and password. Clients send these credentials in the basic auth header to the POST /token API endpoint to receive a temporary auth token. This auth token is used for all other API endpoints.
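
A client’s first step might look like the following Python sketch, which builds (but does not send) the token request with a standard HTTP basic auth header. The function names and the https://keeper.example.org base URL are assumptions for illustration; how the returned token is presented to other endpoints is specified in the API reference and is not shown here.

```python
import base64
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Encode credentials as an HTTP basic auth header value."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def build_token_request(
    keeper_url: str, username: str, password: str
) -> urllib.request.Request:
    """Build, without sending, a POST /token request."""
    return urllib.request.Request(
        keeper_url.rstrip("/") + "/token",
        method="POST",
        headers={"Authorization": basic_auth_header(username, password)},
    )


# Hypothetical deployment URL and credentials.
request = build_token_request("https://keeper.example.org", "user", "pass")
```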

Users are also assigned different authorization roles. These roles are:

Role             Description
ADMIN_USER       Can create a new API user, view API users, and modify API user permissions.
ADMIN_PRODUCT    Can add, modify and deprecate Products.
ADMIN_EDITION    Can add, modify and deprecate Editions.
UPLOAD_BUILD     Can create a new Build.
DEPRECATE_BUILD  Can deprecate a Build.

A given user can have several roles, although users should be given only the minimum permission set to accomplish their activities. For example, the user accounts used by LTD Mason only have the UPLOAD_BUILD role.

8.2   API resources

As a RESTful application, LTD Keeper makes resources available through URL endpoints that can be acted upon with HTTP methods. The main resources are Products, Builds, and Editions.

8.2.1   Products

Products (/products/) are the root resource. A Product corresponds to a software project (such as lsst_apps or Qserv) or a pure documentation project, such as a technical note or design document. Each Product is served from its own subdomain of lsst.io.

An administrator creates a new Product with POST /products/. When a new Product is created, LTD Keeper configures a CNAME DNS entry for that product’s subdomain to the Fastly endpoint. LTD Keeper also automatically creates an Edition called main that serves documentation from the root URL.

Information about a single Product can be retrieved with GET /products/(slug). A listing of all Products is obtained with GET /products/.

See the /products/ resource documentation for a full listing of the methods and metadata associated with a Product.

8.2.2   Builds

Builds are discrete, immutable uploads of a Product’s documentation, typically uploaded by LTD Mason. The process of uploading a build is described above.

Build resources contain a surrogate_key that corresponds to the X-Surrogate-Key HTTP header set by LTD Mason. Through this surrogate key, Fastly can purge a specific build from its cache.

Build resources also contain a git_refs field, which is a list of Git branches that describe the documentation’s version. (Note that git_refs is a list type to accommodate multi-repository projects). This git_refs field is used to identify Builds that can be published through an Edition.

Builds for a single Product can be discovered through the GET /products/(slug)/builds/ endpoint.

8.2.3   Editions

Editions are documentation published from branches of a Git repository (e.g. example.lsst.io/v/{{ branch }}). The default documentation published at the root URL is also an Edition.

Editions have a slug that corresponds to both the Edition’s subdirectory in S3 and the Edition’s URL path. Editions also have a tracked_refs field that lists the set of Git branches for which the Edition serves documentation. The slug is typically derived from tracked_refs, though not necessarily. For example, LSST the Docs includes a rule to transform ticket branch names like tickets/DM-1234 into readable slugs like DM-1234.
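
A slug rule of this kind could be written as follows. This Python sketch is not LTD Keeper’s actual rule set: only the tickets/DM-1234 to DM-1234 transformation comes from the text above, while the function name edition_slug and the fallback of replacing URL-unfriendly characters with hyphens are assumptions.

```python
import re

# Ticket branches look like "tickets/DM-1234".
TICKET_BRANCH = re.compile(r"^tickets/([A-Z]+-[0-9]+)$")


def edition_slug(branch: str) -> str:
    """Derive a URL-friendly Edition slug from a Git branch name."""
    match = TICKET_BRANCH.match(branch)
    if match:
        return match.group(1)
    # Assumed fallback: collapse characters that are awkward in a URL path
    # segment (such as "/") into hyphens.
    return re.sub(r"[^A-Za-z0-9._-]+", "-", branch).strip("-")
```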

Editions also have a pointer to the Build that they are currently publishing, as well as a surrogate key. This surrogate key is separate from the one used by the Build, and instead allows a specific Edition to be reliably purged from Fastly’s cache.

Updating Editions with new Builds

An Edition can be updated by uploading new Builds with git_refs fields that match the tracked_refs field of the Edition. Whenever a new build is posted, LTD Keeper automatically checks if that build corresponds to an Edition. An edition can also be manually ‘re-built’ by sending a PATCH request to the Edition resource that contains a new build_url. This feature is useful for scenarios where a new Build is broken and the Edition needs to be reset to a previous Build without needing to upload a completely new Build.
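
The automatic check can be pictured with the following Python sketch. It is a simplified model, not LTD Keeper’s code: the Edition class and the editions_for_build function are hypothetical, and ‘match’ is assumed to mean that the two lists of refs are equal.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Edition:
    """Minimal stand-in for an Edition resource."""

    slug: str
    tracked_refs: List[str]


def editions_for_build(git_refs: List[str], editions: List[Edition]) -> List[Edition]:
    """Return the Editions that a newly posted Build should update."""
    # Assumption: a Build matches when its git_refs equal the tracked_refs.
    return [edition for edition in editions if edition.tracked_refs == git_refs]
```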

When an Edition is being updated, the old copy of the Edition is deleted and the new build is copied to the Edition’s location in the S3 bucket. During this copy operation the surrogate key metadata in the files is changed from that of the Build to that of the Edition. Cache control headers are also modified to ensure that browsers request the latest version of any Edition. By associating a stable surrogate key to an Edition, purges are easy to carry out. Indeed, once the new build is copied into the Edition’s directory, LTD Keeper purges the Edition from the Fastly cache. This means that during the copy there is no downtime since content is served from Fastly’s cache. Once the copy is complete, and the old build purged, the updated Edition is served. See Section 7.7, Managing Fastly and browser caching, for more details.
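
The header changes made during the copy can be sketched as a pure function over an object’s headers. This Python example is illustrative rather than LTD Keeper’s implementation: the header names and values are those given in Section 7.7, but the function name edition_metadata is hypothetical and the S3 copy itself is omitted.

```python
from typing import Dict


def edition_metadata(
    build_metadata: Dict[str, str], edition_surrogate_key: str
) -> Dict[str, str]:
    """Derive an Edition object's headers from the Build object's headers."""
    metadata = dict(build_metadata)  # leave the Build's headers untouched
    # Swap the Build's surrogate key for the Edition's stable key.
    metadata["x-amz-meta-surrogate-key"] = edition_surrogate_key
    # Fastly keeps the object for a year, or until it is purged.
    metadata["x-amz-meta-surrogate-control"] = "max-age=31536000"
    # Browsers must re-validate with Fastly on every request.
    metadata["Cache-Control"] = "no-cache"
    return metadata
```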

9   LTD Keeper Deployment with Kubernetes

LTD Keeper is deployed in Docker containers orchestrated by Kubernetes on Google Container Engine.

9.1   LSST the Docs as a manifestation of DevOps culture

Given that LTD Keeper is a relatively modest application, it may not have been unreasonable in some organizations to deploy the application manually. This process would likely involve provisioning a virtual machine, installing dependencies like Python on it, installing Nginx and uWSGI, installing the LTD Keeper application, and finally configuring all this software. The problem with this approach is that it does not scale. Each hand-configured server is a special snowflake with its own operational rules. Without extensive documentation, such applications cannot be managed by anyone on the team other than the person who originally configured them.

The generic solution to this problem is to treat infrastructure as code. In this case, the infrastructure is completely specified in code that can be checked into a Git repository and documented. Software (like Puppet and Terraform) can apply this configuration to provision and manage servers. If a server breaks or an application needs to be updated, the operator simply applies or updates the configuration. Treating infrastructure as code dramatically improves service reliability, improves a team’s operational efficiency, and makes it easier for a team to collectively manage production services.

Infrastructure as code also gives rise to DevOps (development/operations). In DevOps, an application’s developers are also its operational administrators. SQuaRE, the team that builds LSST the Docs, is an excellent example of a DevOps team. Since we are a small, agile group, we cannot afford to hire staff who only develop, or only operate, services. Another advantage of DevOps is that there are massive incentives for developers to write reliable, easy-to-maintain services; otherwise developers would never have time to develop new features. Google’s Site Reliability Engineering (SRE) program is an especially good example. Google SREs are only ‘allowed’ to spend 50% of their time operating services. If a service requires more operational effort, regular developers are temporarily drafted into an SRE team until systematic operational issues are resolved [1]. This feedback loop ensures that operational technical debt is kept in check.

9.2   Docker and Kubernetes

Containers, particularly Docker containers, are an excellent tool for DevOps. Essentially, a container is a very lightweight isolated Linux environment. With containers, a developer can build and test an application in exactly the same environment as in production. Furthermore, this environment is fully specified in a Dockerfile that is maintained in Git.

Containers are closely aligned with the idea of microservices: each container should only serve a specific function. For example, a Python web application, HTTP reverse proxy, and database should all reside in separate containers. This architecture makes containers (or rather, their images) easier to re-use across projects (see Docker Hub), isolates complexity, and also makes a deployment easier to scale.

Given the proliferation of containers in a typical deployment, a vibrant class of container orchestration platforms has established itself. Examples include Mesos and Mesosphere DC/OS, Docker Swarm and Compose, and Kubernetes.

Ultimately we chose to deploy LTD Keeper with Kubernetes for several reasons. First, we subjectively found Kubernetes easy to use. Kubernetes is spun off of Google’s proprietary Borg orchestration platform. Thus Kubernetes inherits Google’s operational experience. We found that Kubernetes’ Pod, Replication Controller, and Load Balancer service patterns (all configured with YAML) were easy to build a complete LTD Keeper deployment around (see below).

Another benefit of Kubernetes is that it allows us to deploy LTD Keeper in a cloud (saving operational costs and improving reliability) without being locked into a single cloud provider. As a counter-example, Amazon Web Services’ Elastic Container Service (ECS) provides a variant of Mesos. If we developed an LTD Keeper deployment against ECS, we would effectively be locked into the integrated Amazon Web Services API. By contrast, Kubernetes is positioned as a developer-friendly orchestration service that can itself be deployed on OpenStack, Amazon Web Services, or even atop another orchestration layer such as Mesos.

Currently, LTD Keeper runs on a Kubernetes deployment managed by the Google Cloud Platform (Google Container Engine). At any time SQuaRE could opt to use its own Kubernetes deployment should it make strategic sense.

As a footnote, LTD Keeper also supports Docker Compose for local development. The next section, however, will focus on production deployments with Kubernetes.

9.3   Kubernetes deployment architecture

The Kubernetes deployment architecture for LTD Keeper is depicted in the following diagram.

Full operational details are provided in LTD Keeper’s documentation.


Figure 2 LTD Keeper deployment with Kubernetes. In this diagram, an incoming web request is roughly processed from top to bottom.

9.4   TLS termination service tier

The web request first encounters the nginx-ssl-proxy service. In Kubernetes, a service encapsulates networking details from the outside to Pods running within. nginx-ssl-proxy is unique in that it is exposed to external internet traffic and gives LTD Keeper a fixed IP address. Services like nginx-ssl-proxy also act as load balancers, distributing traffic to pods.

Pods in the nginx-ssl-proxy service are managed by a replication controller of the same name. The role of replication controllers in Kubernetes is to launch Pods, and ensure that the desired number of Pods is active. If a pod dies (perhaps because it crashed, or the physical node it is running on failed), the replication controller automatically schedules a replacement pod. Replication controllers also provide a means of scaling the number of pods in a service. A configuration template for the nginx-ssl-proxy replication controller is available on GitHub.

Pods run under nginx-ssl-proxy each host an Nginx reverse proxy container, whose image is made available by Google Cloud Platform. Containers created from this image are configured to terminate TLS traffic using our own TLS certificate, as well as permanently redirect non-TLS traffic to HTTPS. These containers are configured with TLS certificates deployed via Kubernetes Secrets. A configuration template for nginx-ssl-proxy Secrets is available on GitHub.

9.5   Keeper service tier

Traffic from nginx-ssl-proxy is directed to the keeper service. The role of this service is to provide a networking endpoint for the LTD Keeper application pods, as well as to load balance traffic to these pods. Pods containing the LTD Keeper application are managed by a replication controller, which, again, ensures that the required number of keeper pods is available, and scales that number on demand.

Rather than create a replication controller directly, we chose to use the higher-level Kubernetes deployment API. In addition to maintaining a replication controller, deployments provide a convenient API for upgrading pods. Pod updates can be rolled out, use a canary pattern for testing, and be rolled back if necessary. Through the deployment API, deploying an upgrade of LTD Keeper in production is as simple as pushing a new image to Docker Hub, and rolling out that update with a single Kubernetes command.

keeper pods consist of two containers that are internally networked. (In Kubernetes, pods are a mechanism for ensuring that closely related containers are scheduled together on the same node.) The first container is an Nginx reverse proxy, while the second contains the LTD Keeper codebase and runs it as a uWSGI application. The uWSGI container receives its configuration through a keeper-secrets resource. These secrets (which include the secret key for hashing passwords, the administrator’s password, API keys for AWS and Fastly, and more benign configuration such as the database URI) are mapped to environment variables that the LTD Keeper application reads to configure itself. A keeper-deployment.yaml template is available on GitHub.
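
Reading configuration from environment variables might look like the following Python sketch. It is not LTD Keeper’s configuration code: the function name load_config and the variable names KEEPER_SECRET_KEY and KEEPER_DB_URI are hypothetical, chosen only to show how secrets mapped into the environment can be validated at start-up.

```python
import os
from typing import Dict, Mapping

# Hypothetical variable names; the real names are defined by LTD Keeper.
REQUIRED = ("KEEPER_SECRET_KEY", "KEEPER_DB_URI")


def load_config(environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build an application configuration from environment variables."""
    missing = [name for name in REQUIRED if name not in environ]
    if missing:
        # Fail at start-up rather than when a setting is first used.
        raise RuntimeError("missing configuration: " + ", ".join(missing))
    return {
        "SECRET_KEY": environ["KEEPER_SECRET_KEY"],
        "SQLALCHEMY_DATABASE_URI": environ["KEEPER_DB_URI"],
    }
```

Accepting the environment as an argument keeps the function testable without touching the real process environment.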

Both containers are derived from images based on the Python 3.5 base image on Docker Hub. Using a common base image reduces the container footprint on the host node. See the lsst-sqre/nginx-python-docker project and the lsst-sqre/ltd-keeper GitHub repositories for Dockerfiles specifying both containers’ images.

Note that in the overall Kubernetes deployment there are two layers of Nginx reverse proxies: one in nginx-ssl-proxy and another embedded in keeper pods. This architecture, while not strictly necessary, is consistent with a microservices approach. Nginx reverse proxies in nginx-ssl-proxy are solely responsible for TLS termination, while Nginx reverse proxy containers in keeper pods provide a solid HTTP interface to the LTD Keeper uWSGI application server. In the future, Nginx containers in nginx-ssl-proxy may be replaced by a built-in Kubernetes Ingress Service that terminates TLS. As well, pairing an Nginx reverse proxy container with LTD Keeper’s uWSGI container allows us to test their interaction on a local development environment with Docker Compose.

9.6   Management pods

In general, containers run by Pods start automatically and there is no need to log into a running container. Pods are intended to be run as immutable infrastructure.

Databases run counter to this philosophy since they are stateful. Provisioning a new database, or migrating a database’s schema, are special events that require an operator in the loop.

To deal with these circumstances we use a management pod that is modified to run a container with the LTD Keeper codebase, yet not serve traffic. When deployed, an operator can log into the management pod and run maintenance tasks included in the LTD Keeper codebase, while having access to production configurations. LTD Keeper’s documentation includes a playbook for executing database migrations through a management pod.

9.7   Database

For its initial launch, LSST the Docs’ Keeper deployment uses SQLite as its relational database. Since Kubernetes pods are ephemeral, the SQLite database is stored on a Google Compute Engine persistent disk that is attached to the node hosting the keeper pod. This choice imposes a limitation on LTD Keeper’s reliability since the persistent disk can only be attached to a single node. Ideally we would run several keeper pods simultaneously from multiple nodes.

We plan to eventually migrate from SQLite to a hosted relational database solution. Given our current use of Google Cloud Platform, the Google Cloud SQL hosted MySQL database is the primary choice. Since LTD Keeper uses SQLAlchemy to generically interact with SQL databases, this eventual migration will be easy.

10   Additional Resources

LSST the Docs code is MIT-Licensed open source. It’s built either natively for, or compatible with, Python 3. Here are the main repositories and their documentation:

Work on LSST the Docs is labeled under ‘lsst-the-docs’ on LSST Data Management’s JIRA.

LSST the Docs is part of a greater LSST Data Management documentation and communications strategy. For more information:

11   References

[1] Niall Richard Murphy, Jennifer Petoff, Chris Jones, and Betsy Beyer. Site Reliability Engineering: How Google Runs Production Systems. O’Reilly Media, Sebastopol, California, 2016.