Architecture

Overview

Treeherder is a Django-based Python 3 application that consumes CI data from a variety of sources, making it available via REST APIs to a static Neutrino/webpack-built React frontend.
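
To illustrate the data flow, the same REST APIs used by the frontend can also be queried directly. The sketch below is only illustrative; the endpoint path, query parameters and response fields are assumptions made for this example, not a definitive API reference:

```python
# Minimal sketch: fetching recent pushes from Treeherder's REST API.
# The endpoint path, query parameters and response fields are assumptions
# made for illustration, not a definitive description of the API.
import requests

response = requests.get(
    "https://treeherder.mozilla.org/api/project/autoland/push/",
    params={"count": 10},
    timeout=30,
)
response.raise_for_status()

for push in response.json().get("results", []):
    print(push.get("revision"))
```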

Treeherder's production, stage and prototype instances are hosted on the Heroku platform; however, their MySQL databases are hosted on Amazon RDS. These RDS instances are owned by the moz-devservices AWS account and, to reduce latency, are located in the same AWS region (us-east-1) as Heroku's US common runtime.

The apps are structured like so:

Architecture Diagram

To learn more about Heroku, see How Heroku Works and their Python app tutorial.

Web requests

Incoming requests are directed to web dynos by Heroku's HTTP router, which also terminates TLS. Certificates are automatically renewed by the Let's Encrypt-powered Automated Certificate Management feature.

On each web dyno, the Django WSGI app is served directly using gunicorn, which also handles requests for static assets via the Django middleware WhiteNoise. In the future the frontend may instead be hosted separately on Netlify (see bug 1504996).
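
As a rough sketch of how this fits together, WhiteNoise is enabled as Django middleware so the same gunicorn workers that serve the API also serve the built frontend assets. The settings below are illustrative, not a copy of Treeherder's actual configuration:

```python
# settings.py (illustrative sketch, not Treeherder's actual settings).
MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
    # WhiteNoise is recommended to sit directly after SecurityMiddleware,
    # so requests for static assets never reach the rest of the stack.
    "whitenoise.middleware.WhiteNoiseMiddleware",
    # ... the rest of the middleware stack ...
]

# Serve compressed, cache-busted copies of the files gathered by collectstatic.
STATICFILES_STORAGE = "whitenoise.storage.CompressedManifestStaticFilesStorage"
```

With this in place the web dyno only needs to start gunicorn against the project's WSGI module; no separate static file server is required.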

Background tasks

Background tasks are primarily managed using Celery with RabbitMQ as a broker. The RabbitMQ instances are provided by the CloudAMQP Heroku addon.
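
Wiring Celery to the broker essentially amounts to pointing it at the AMQP connection string the addon provides. A minimal sketch, assuming the URL is exposed via a CLOUDAMQP_URL environment variable (the module layout and task below are examples, not Treeherder's actual code):

```python
# Illustrative sketch of a Celery app using a RabbitMQ broker on Heroku.
import os

from celery import Celery

app = Celery("treeherder")

# Fall back to a local RabbitMQ instance when the addon's URL isn't set,
# e.g. during local development.
app.conf.broker_url = os.environ.get(
    "CLOUDAMQP_URL", "amqp://guest:guest@localhost:5672//"
)


@app.task
def parse_log(job_id):
    # A worker dyno started with `celery worker` would execute this.
    ...
```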

Version control pushes and CI job result data are consumed from Mozilla's event stream, Pulse, via dynos running Kombu-powered Django management commands. These do not ingest the data themselves; instead they add tasks to internal queues for the Celery workers to process. The Celery workers then handle storing this push/job data, along with tasks spawned as a result (such as parsing the logs associated with those jobs).
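
The split between ingestion and processing can be sketched roughly as follows. The broker URLs, exchange, queue and task names here are hypothetical and exist only to show the hand-off from the Kombu consumer to a Celery queue:

```python
# Rough sketch of a Kombu-powered listener that hands work off to Celery.
# The broker URLs, exchange, queue and task names are hypothetical,
# not Treeherder's actual configuration.
from celery import Celery
from kombu import Connection, Exchange, Queue
from kombu.mixins import ConsumerMixin

app = Celery("treeherder", broker="amqp://guest:guest@localhost:5672//")


@app.task
def store_pulse_data(payload):
    # A Celery worker dyno would store the push/job data here and spawn
    # follow-up tasks such as log parsing.
    ...


pulse_queue = Queue(
    "example-ingestion-queue",
    exchange=Exchange("exchange/example/v1/job-completed", type="topic"),
    routing_key="#",
)


class PulseListener(ConsumerMixin):
    def __init__(self, connection):
        self.connection = connection

    def get_consumers(self, Consumer, channel):
        return [Consumer(queues=[pulse_queue], callbacks=[self.on_message])]

    def on_message(self, body, message):
        # The listener itself does no processing; it only enqueues a Celery
        # task for the worker dynos to pick up, then acknowledges the message.
        store_pulse_data.delay(body)
        message.ack()


if __name__ == "__main__":
    # In Treeherder this loop would live inside a Django management command.
    with Connection("amqps://user:password@pulse.example.org:5671") as connection:
        PulseListener(connection).run()
```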

For more details on process/worker types, see Treeherder's Procfile.

There are also tasks that are run on a schedule, triggered via either:

  1. Heroku's scheduler addon

    This runs the commands configured within the addon in one-off dynos at the chosen interval (see adjusting scheduled tasks).

    The tasks it currently runs are:

    • update_bugscache (hourly)
    • cycle_data (daily)
    • run_intermittents_commenter (daily)
  2. The celery_scheduler dyno

    This schedules the tasks listed in CELERY_BEAT_SCHEDULE in settings.py (sketched below), which are then processed by the worker_misc Celery worker dyno. However, we're moving away from using this in favour of the Heroku scheduler addon (see deps of bug 1176492).
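
For context, such a schedule is a dictionary in settings.py mapping entry names to tasks and intervals. The entries below are illustrative examples, not Treeherder's actual schedule:

```python
# settings.py (illustrative sketch; the entry names, task names and intervals
# are examples, not Treeherder's actual schedule).
from celery.schedules import crontab

CELERY_BEAT_SCHEDULE = {
    "example-nightly-cycle-data": {
        "task": "cycle-data",
        "schedule": crontab(hour=2, minute=0),  # once a day at 02:00 UTC
    },
    "example-hourly-bugscache-update": {
        "task": "update-bugscache",
        "schedule": 60 * 60,  # every hour, expressed in seconds
    },
}
```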

Deployment lifecycle

The Heroku apps are linked to Treeherder's repository using Heroku's GitHub integration, and each instance is normally set to auto-deploy from specific branches (see Deploying Treeherder).

Once a Treeherder deployment is initiated, the following occurs:

  1. Building the application "slug"

    • The Treeherder source code is checked out from GitHub.
    • Files matching .slugignore entries are removed.
    • The buildpacks configured on the app are invoked, which in our case are:

      1. heroku-buildpack-nodejs: Installs Node.js and Yarn (using the versions from package.json), runs yarn install, then builds the Treeherder frontend via yarn heroku-postbuild.

      2. heroku-buildpack-python: Installs Python (using the version from runtime.txt), installs the Python dependencies listed in requirements.txt via pip, runs collectstatic (since this is a Django app), then triggers our custom bin/post_compile script.

    • The results of the above are packaged into a compressed tar archive known as a "slug".
  2. Creating a "release"

    • The slug from the build step is combined with the app's config variables to form a "release", which is what gets run on the dynos.
  3. Running the new release

    • The existing dynos are shut down.
    • New dynos are spun up using the newly-created release with the process types/commands specified in our Procfile.

Note

Heroku follows the twelve-factor app principle of having three distinct phases for the deployment process: build, release and run. Whilst all three phases occur for a standard deployment, that is not always the case.

For example, when updating environment variables or performing a rollback, the build step is skipped and a new release is created that reuses the slug from a previous build. This is why it is important to only interact with external services (such as running DB migrations or notifying New Relic about a deployment) during the release step (using Release Phase) and not during the build.