
haskell-ci how-to: caching and using your program executable

In this article I show how to extend the haskell-ci GitHub Actions workflow to pass the built executable to subsequent jobs.

Background

The Haskell Security Response Team recently bootstrapped the haskell/security-advisories database. This repository contains:

With both tool sources and advisory data in the repository our continuous integration (CI) pipelines have to do several things:

The remainder of this post explains how we use haskell-ci and GitHub Actions reusable workflows to achieve the first two objectives. The Security Response Team has not yet tackled publishing but the same techniques should be applicable.

Introduction to haskell-ci

haskell-ci is a tool that generates CI workflows for Haskell projects. It supports GitHub Actions (actively maintained) and Travis-CI (unmaintained), and can also generate shell scripts for local testing. You can install haskell-ci via cabal:

% cabal install haskell-ci

Alternatively, you can clone the Git repository and build from there:

% git clone https://github.com/haskell-CI/haskell-ci
% cd haskell-ci
% cabal install

Now that haskell-ci is on the PATH you can generate the GitHub Actions workflow in a couple of steps. First, add the GHC versions you want to test with to the tested-with field in your package’s .cabal file:

cabal-version:      2.4
name:               hsec-tools
version:            0.1.0.0
tested-with:
  GHC ==8.10.7 || ==9.0.2 || ==9.2.7 || ==9.4.5 || ==9.6.2
…

Run haskell-ci list-ghc to see the list of GHC versions it knows about. haskell-ci updates usually follow soon after GHC releases, especially major versions.

Next run haskell-ci github path/to/package.cabal. It will inspect the .cabal file to see what GHC versions to include in the build matrix, and write .github/workflows/haskell-ci.yml. Then commit the changes and push (or create a pull request). For example:

% haskell-ci github code/hsec-tools/hsec-tools.cabal
*INFO* Generating GitHub config for testing for GHC versions: 8.10.7 9.0.2 9.2.7 9.4.5 9.6.2
% git add code .github
% git commit -m 'ci: add haskell-ci workflow' --quiet
% git push
…

What does the haskell-ci workflow do?

This post is not the place to belabour the details of GitHub Actions workflow syntax. But I will make a few observations about the steps in the haskell-ci workflow.

Adding Haddock and HLint jobs

haskell-ci makes it easy to add Haddock (build documentation) and HLint (source code suggestions) jobs to your workflow. Just use the --haddock and --hlint options when creating the workflow:

% haskell-ci github --haddock --hlint path/to/package.cabal

The Haddock step (if enabled) runs on every job in the build matrix. Haddock is part of the GHC toolchain so there are no extra dependencies.

HLint is an extra dependency; if the HLint step is enabled, the workflow installs HLint via cabal v2-install. The HLint step is skipped for all but one of the jobs in the matrix. By default that job is the one for the most recent version of GHC.

haskell-ci prefers a particular version of HLint. Sometimes that version of HLint doesn’t build against the latest version of GHC. Use the --hlint-job option to override the job:

% haskell-ci github --hlint --hlint-job 9.4.5 foo.cabal

Updating the build matrix

When a new release of GHC comes along, updating the haskell-ci workflow is as simple as adding it to the tested-with list, then running:

% haskell-ci regenerate
No haskell-ci.sh, skipping bash regeneration
*INFO* Generating GitHub config for testing for GHC versions: 8.10.7 9.0.2 9.2.7 9.4.5 9.6.2
No .travis.yml, skipping travis regeneration

haskell-ci regenerate reuses the options from the original invocation of haskell-ci github. These were recorded in a comment starting with # REGENDATA in haskell-ci.yml. After running haskell-ci regenerate, all that’s left is to commit and push the changes.

GitHub Actions: passing the executable between jobs

Now that the haskell-ci workflow is set up, it will build and test the package on every push or pull request. We have a further CI use case: using the built executable to perform additional actions. So we now turn to the problem of how to use data produced by the haskell-ci workflow in other jobs.

GitHub Actions provides (at least) 3 mechanisms for passing data between jobs.

By default GitHub retains artifacts for 90 days. The duration can be customised.

We need to add two steps to the linux job. First we install the hsec-tools executable. It was already built—this just copies it to a known location. --install-method=copy ensures the executable is copied to that location, not symlinked.

      - name: install executable
        if: matrix.compiler == 'ghc-9.6.2'
        run: |
          $CABAL v2-install $ARG_COMPILER \
            --install-method=copy exe:hsec-tools

The second step uses the upload-artifact action to archive the executable. The artifact bundle name includes the commit hash. The file within the bundle keeps the name hsec-tools.

      - name: upload executable
        uses: actions/upload-artifact@v3
        if: matrix.compiler == 'ghc-9.6.2'
        with:
          name: hsec-tools-${{ github.sha }}
          path: ~/.cabal/bin/hsec-tools

All Haskell dependencies are statically linked in the binary. It does need some system libraries including libgmp and libffi. But we do not need to preserve the Cabal store or provide the GHC toolchain when we use the artifact.

Notice that each of the new steps has the condition:

        if: matrix.compiler == 'ghc-9.6.2'

The build matrix produces jobs for several different GHC versions. But we only need one copy of the hsec-tools executable. I’m not totally happy with this approach because the patch will need updating as the matrix evolves. But I can live with it for now.
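One way to soften that maintenance burden is to derive the newest GHC version from the tested-with field instead of remembering to check the condition by hand. The following is a minimal sketch, not something haskell-ci provides. The function names tested_with_versions and newest_ghc are hypothetical, and the sketch assumes GNU sort (for the -V option) and a tested-with field that lists exact == versions, as in the example earlier.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of haskell-ci.

# Print every version number found in the tested-with field of a .cabal file.
tested_with_versions() {
  awk '
    tolower($0) ~ /^tested-with:/ {
      infield = 1
      sub(/^[^:]*:/, "")        # drop the field name, keep any inline value
      print
      next
    }
    infield && /^[ \t]/ { print; next }   # indented lines continue the field
    { infield = 0 }                        # anything else ends it
  ' "$1" | grep -oE '[0-9]+(\.[0-9]+)+'
}

# Print the highest version (GNU sort -V compares version numbers).
newest_ghc() {
  tested_with_versions "$1" | sort -V | tail -n 1
}
```

Running newest_ghc code/hsec-tools/hsec-tools.cabal against the tested-with list shown earlier prints 9.6.2. The result could be compared with the version hard-coded in the if: condition, for example in a local pre-commit check.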

GitHub Actions: workflows and jobs

A repository can define one or more CI workflows. They are written as YAML files in the .github/workflows/ directory.

Each workflow consists of one or more jobs. It is straightforward to declare dependencies between jobs within a workflow. But workflows themselves are independent. There is no reasonable way to specify that a particular workflow depends on the result or outputs of another workflow.

This means that for our use case we have to create a new job within the Haskell-CI workflow. Because haskell-ci.yml is generated by the haskell-ci tool we have to patch this file. Fortunately haskell-ci provides a mechanism to apply specified patches when generating haskell-ci.yml (shown later in this article). Unfortunately, defining and maintaining our additional job(s) as patches to YAML files is even more unpleasant than dealing with them as plain YAML.

GitHub Actions: reusable workflows

Reusable workflows provide a neat solution. A reusable workflow is defined as a separate YAML file, just like ordinary workflows. The main differences are:

The main goal of reusable workflows is to enable reuse, like subroutines in programming. Our use case is a bit different. We will define the check-advisories behaviour as a reusable workflow. Although we will not be using it from multiple places, it still gives us several advantages:

Defining the check-advisories workflow

The check-advisories workflow is defined in .github/workflows/check-advisories.yml. The full content is below, with commentary.

name: Check security advisories
on:
  workflow_call:
    inputs:
      artifact-name:
        required: true
        type: string

The workflow_call trigger condition establishes it as a reusable workflow. We also define the artifact-name input. The caller is required to provide it.

jobs:
  check-advisories:
    runs-on: ubuntu-20.04
    steps:
      - uses: actions/checkout@v3
        with:
          path: source

The workflow has a single job called check-advisories. As usual the first step is to check out the repository.

      - run: mkdir -p ~/.local/bin
      - id: download
        uses: actions/download-artifact@v3
        with:
          name: ${{ inputs.artifact-name }}
          path: ~/.local/bin
      - run: chmod +x ~/.local/bin/hsec-tools

Next we download the hsec-tools artifact to ~/.local/bin, which is in the PATH. Then we chmod it to make it executable.

      - name: run checks
        run: |
          cd source
          RESULT=0
          while read FILE ; do
            echo -n "$FILE: "
            hsec-tools check < "$FILE" || RESULT=1
          done < <(find advisories EXAMPLE_ADVISORY.md -name "*.md")
          exit $RESULT

Finally we find all the advisory files and run hsec-tools check on each one. If any of the checks fail the whole job fails (after checking each file—we don’t want to short-circuit).
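The "record the failure but keep going" pattern in that step can be tried outside of CI. Below is a minimal sketch of the same loop as a standalone bash function. The name check_all is hypothetical and not part of hsec-tools; the checker command is passed in as arguments, so any program that reads a file on stdin can stand in for hsec-tools check. Unlike the workflow step, the sketch uses NUL-delimited file names so that unusual characters in paths are handled.

```shell
#!/usr/bin/env bash
# Sketch only: check_all is a hypothetical helper, not part of hsec-tools.
# Usage: check_all DIR CHECKER [ARGS...]
# Runs CHECKER with each *.md file under DIR on stdin, and reports failure
# at the end if any single run failed.
check_all() {
  local dir=$1; shift
  local result=0 file
  while IFS= read -r -d '' file; do
    printf '%s: ' "$file"
    "$@" < "$file" || result=1     # remember the failure, keep going
  done < <(find "$dir" -name '*.md' -print0)
  return "$result"
}
```

For example, check_all advisories hsec-tools check would mirror the workflow step for the advisories directory.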

Calling the check-advisories workflow

Add a new job to the haskell-ci.yml workflow. It must be a separate job, not a step of the existing linux job.

  check-advisories:
    name: Invoke check-advisories workflow
    needs: linux
    uses: ./.github/workflows/check-advisories.yml
    with:
      artifact-name: hsec-tools-${{ github.sha }}

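The meaning of the fields is as follows: name is the display name of the job; needs: linux makes the job wait until the linux matrix jobs have succeeded, so the artifact exists before we try to download it; uses gives the path to the reusable workflow file in the same repository; and with supplies the workflow's inputs, here the artifact-name that the linux job used when uploading.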

You can call workflows defined in other repositories. For example:

uses: user-or-org/repo/.github/workflows/workflow.yml@v1

Patching haskell-ci.yml

At this stage I have committed the check-advisories.yml reusable workflow. I also have uncommitted changes to haskell-ci.yml:

diff --git a/.github/workflows/haskell-ci.yml b/.github/workflows/haskell-ci.yml
index d51bb64..7ff8684 100644
--- a/.github/workflows/haskell-ci.yml
+++ b/.github/workflows/haskell-ci.yml
@@ -224,3 +224,19 @@ jobs:
         with:
           key: ${{ runner.os }}-${{ matrix.compiler }}-${{ github.sha }}
           path: ~/.cabal/store
+      - name: install executable
+        if: matrix.compiler == 'ghc-9.6.2'
+        run: |
+          $CABAL v2-install $ARG_COMPILER \
+            --install-method=copy exe:hsec-tools
+      - name: upload executable
+        uses: actions/upload-artifact@v3
+        if: matrix.compiler == 'ghc-9.6.2'
+        with:
+          name: hsec-tools-${{ github.sha }}
+          path: ~/.cabal/bin/hsec-tools
+  check-advisories:
+    name: Invoke check-advisories workflow
+    needs: linux
+    uses: ./.github/workflows/check-advisories.yml
+    with:
+      artifact-name: hsec-tools-${{ github.sha }}

We could commit these changes as is, but they will be lost the next time we run haskell-ci regenerate. Instead create a patch file:

% git diff > .github/haskell-ci.patch

Then tell haskell-ci to apply the patch when (re)generating haskell-ci.yml. What I would like to do is run:

% haskell-ci regenerate \
    --github-patches .github/haskell-ci.patch

The above command regenerates haskell-ci.yml and correctly applies our patch. But it does not add the new arguments to the REGENDATA line. As a consequence, subsequent executions of haskell-ci regenerate will not apply the patch unless you use the --github-patches option every time. This is not what we want, and possibly a bug (I will investigate further, but not today).

The workaround: manually edit haskell-ci.yml, inserting "--github-patches",".github/haskell-ci.patch" in the REGENDATA line. As a result of that change, running haskell-ci regenerate without extra arguments applies the patch.
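The manual edit can also be scripted. The sketch below assumes the REGENDATA comment records the original arguments as a list of quoted strings that includes "github", and it inserts the two new arguments directly after that entry. That layout is an assumption, so inspect the line in your own haskell-ci.yml before relying on it. The function name add_patch_option is hypothetical, and sed -i as used here is GNU sed syntax.

```shell
#!/usr/bin/env bash
# Sketch only: add_patch_option is a hypothetical helper.
# ASSUMPTION: the workflow contains a line starting with "# REGENDATA"
# that lists the recorded arguments as quoted strings, including "github".
add_patch_option() {
  local workflow=$1 patch=$2
  # Do nothing if the option is already recorded.
  grep -q '^# REGENDATA.*"--github-patches"' "$workflow" && return 0
  sed -i "/^# REGENDATA/ s|\"github\",|\"github\",\"--github-patches\",\"$patch\",|" "$workflow"
}
```

It would be called as add_patch_option .github/workflows/haskell-ci.yml .github/haskell-ci.patch. Running it a second time leaves the file unchanged.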

The final step is to commit the patch file together with the updated haskell-ci.yml.

Final words

In this article I showed how to use haskell-ci to generate a GitHub Actions workflow for testing Haskell projects. I also demonstrated how to extend the haskell-ci workflow to save a built executable as an artifact, which can then be used by other CI jobs.

I hope it has been a useful article, both for people starting out and wondering how to test their Haskell projects, and for projects with more advanced CI workflows.

One area I would like to investigate further is how to skip the haskell-ci workflow when the tool code did not change. For example, someone might submit a pull request that adds or updates an advisory but does not touch the hsec-tools code. Artifacts and cache entries have a name or key. Right now we use the Git commit hash in the artifact name. Perhaps we could use the Git tree hash of the code/hsec-tools directory instead:

% git rev-parse HEAD:code/hsec-tools 
a08aa5a2ee93ed09ec0025809226571969e24e3d

Uploading the artifact with a name based on the tree hash seems straightforward. The bigger challenge is how to skip the linux jobs when the artifact for the current hsec-tools tree already exists. And how to not skip the check-advisories job, even though it depends on the linux jobs. I think it’s probably possible. But it’s a nice-to-have; this yak’s haircut will have to wait for another day.
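To make the naming idea concrete, here is a minimal sketch of how such a name could be computed. It is untested in the actual workflow, and the function name artifact_name is hypothetical. It only covers the naming half of the problem, not the skipping logic.

```shell
#!/usr/bin/env bash
# Sketch only: artifact_name is a hypothetical helper.
# Prints an artifact name keyed on the Git tree hash of a directory, so the
# name changes only when something under that directory changes.
artifact_name() {
  local dir=${1:-code/hsec-tools}
  local tree
  tree=$(git rev-parse "HEAD:$dir") || return 1
  printf 'hsec-tools-%s\n' "$tree"
}
```

Because the tree hash depends only on the contents of the directory, a commit that touches nothing but advisories yields the same name as its parent, while any change under code/hsec-tools yields a new one.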

Except where otherwise noted, this work is licensed under a Creative Commons Attribution 4.0 International License.