<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>pureblog</title>
    <link href="https://frasertweedale.github.io/blog-fp/atom.xml" rel="self" />
    <link href="https://frasertweedale.github.io/blog-fp" />
    <id>https://frasertweedale.github.io/blog-fp/atom.xml</id>
    <author>
        <name>Fraser Tweedale</name>
        
        <email>frase@frase.id.au</email>
        
    </author>
    <updated>2026-03-28T00:00:00Z</updated>
    <entry>
    <title>Generating Hakyll pages from Haskell source</title>
    <link href="https://frasertweedale.github.io/blog-fp/posts/2026-03-28-hakyll-pages-from-haskell-source.html" />
    <id>https://frasertweedale.github.io/blog-fp/posts/2026-03-28-hakyll-pages-from-haskell-source.html</id>
    <published>2026-03-28T00:00:00Z</published>
    <updated>2026-03-28T00:00:00Z</updated>
    <summary type="html"><![CDATA[<h1 id="generating-hakyll-pages-from-haskell-source">Generating Hakyll pages from Haskell source</h1>
<p>In typical use, Hakyll programs convert external content files into
HTML pages. But what if your content is defined in Haskell source
files that are part of the Hakyll program? That can work—but you
need a trick or two. In this post I’ll show you how.</p>
<h2 id="the-scenario">The scenario <a href="#the-scenario" class="section">§</a></h2>
<p>For whatever reason, you’ve got some Haskell data structure you want
to present somehow. A typical example would be a list or table or
tree of data. My use case was a list of objects describing some
files, to be presented as a table. A file may or may not be
available for download (<code>Maybe FilePath</code>).</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">module</span> <span class="dt">Files</span> <span class="kw">where</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">File</span> <span class="ot">=</span> <span class="dt">File</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a>  {<span class="ot"> fileDate ::</span> <span class="dt">String</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a>  ,<span class="ot"> fileName ::</span> <span class="dt">Maybe</span> <span class="dt">FilePath</span></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a>  ,<span class="ot"> fileDesc ::</span> <span class="dt">String</span></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a>  }</span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a><span class="ot">fileList ::</span> [<span class="dt">File</span>]</span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a>fileList <span class="ot">=</span></span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a>  [ <span class="dt">File</span> <span class="st">&quot;2026-01-15&quot;</span> (<span class="dt">Just</span> <span class="st">&quot;alice-to-tribunal.pdf&quot;</span>)</span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a>      <span class="st">&quot;Alice&#39;s submissions to the Tribunal&quot;</span></span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a>  , <span class="dt">File</span> <span class="st">&quot;2025-12-23&quot;</span> <span class="dt">Nothing</span></span>
<span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"></a>      <span class="st">&quot;Bob&#39;s submissions to the Tribunal (confidential)&quot;</span></span>
<span id="cb1-15"><a href="#cb1-15" aria-hidden="true" tabindex="-1"></a>  , <span class="dt">File</span> <span class="st">&quot;2025-12-17&quot;</span> (<span class="dt">Just</span> <span class="st">&quot;tribunal-directions.pdf&quot;</span>)</span>
<span id="cb1-16"><a href="#cb1-16" aria-hidden="true" tabindex="-1"></a>      <span class="st">&quot;Orders made at the directions hearing&quot;</span></span>
<span id="cb1-17"><a href="#cb1-17" aria-hidden="true" tabindex="-1"></a>  ]</span></code></pre></div>
<p>I want Hakyll to generate an HTML page that formats this list as a
<code>&lt;table&gt;</code>. Importantly, I also want Hakyll to notice when
<code>Files.hs</code> changes and update the output, <em>even when nothing else
changed</em>.</p>
<h2 id="context-and-template">Context and template <a href="#context-and-template" class="section">§</a></h2>
<p>We first define how to process a <code>File</code> into a <code>Context File</code> that
can be fed to a template.</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="ot">fileContext ::</span> <span class="dt">Context</span> <span class="dt">File</span></span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a>fileContext <span class="ot">=</span></span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a>  field <span class="st">&quot;filename&quot;</span></span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a>    ( <span class="fu">maybe</span></span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a>        (noResult <span class="st">&quot;unavailable&quot;</span>)</span>
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a>        (<span class="fu">pure</span> <span class="op">.</span> toUrl <span class="op">.</span> (<span class="st">&quot;files/&quot;</span> <span class="op">&lt;&gt;</span>))</span>
<span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a>    <span class="op">.</span> fileName <span class="op">.</span> itemBody )</span>
<span id="cb2-8"><a href="#cb2-8" aria-hidden="true" tabindex="-1"></a>  <span class="op">&lt;&gt;</span> field <span class="st">&quot;date&quot;</span> (<span class="fu">pure</span> <span class="op">.</span> fileDate <span class="op">.</span> itemBody)</span>
<span id="cb2-9"><a href="#cb2-9" aria-hidden="true" tabindex="-1"></a>  <span class="op">&lt;&gt;</span> field <span class="st">&quot;desc&quot;</span> (<span class="fu">pure</span> <span class="op">.</span> fileDesc <span class="op">.</span> itemBody)</span></code></pre></div>
<p>The Hakyll compiler for the HTML page populates a <code>listField</code> with
the contents of <code>fileList</code>. <code>fileContext</code> generates the context for the
individual items.</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Hakyll</span></span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Files</span></span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a><span class="ot">main ::</span> <span class="dt">IO</span> ()</span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a>main <span class="ot">=</span> hakyll <span class="op">$</span> <span class="kw">do</span></span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a>  create [<span class="st">&quot;files.html&quot;</span>] <span class="op">$</span> <span class="kw">do</span></span>
<span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a>    route idRoute</span>
<span id="cb3-9"><a href="#cb3-9" aria-hidden="true" tabindex="-1"></a>    compile <span class="op">$</span> <span class="kw">do</span></span>
<span id="cb3-10"><a href="#cb3-10" aria-hidden="true" tabindex="-1"></a>      <span class="kw">let</span> filesContext <span class="ot">=</span></span>
<span id="cb3-11"><a href="#cb3-11" aria-hidden="true" tabindex="-1"></a>            listField <span class="st">&quot;files&quot;</span> fileContext</span>
<span id="cb3-12"><a href="#cb3-12" aria-hidden="true" tabindex="-1"></a>              (<span class="fu">traverse</span> makeItem fileList)</span>
<span id="cb3-13"><a href="#cb3-13" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-14"><a href="#cb3-14" aria-hidden="true" tabindex="-1"></a>      makeItem <span class="st">&quot;&quot;</span></span>
<span id="cb3-15"><a href="#cb3-15" aria-hidden="true" tabindex="-1"></a>        <span class="op">&gt;&gt;=</span> loadAndApplyTemplate</span>
<span id="cb3-16"><a href="#cb3-16" aria-hidden="true" tabindex="-1"></a>              <span class="st">&quot;templates/files.html&quot;</span> filesContext</span>
<span id="cb3-17"><a href="#cb3-17" aria-hidden="true" tabindex="-1"></a>        <span class="op">&gt;&gt;=</span> loadAndApplyTemplate</span>
<span id="cb3-18"><a href="#cb3-18" aria-hidden="true" tabindex="-1"></a>              <span class="st">&quot;templates/default.html&quot;</span> defaultContext</span>
<span id="cb3-19"><a href="#cb3-19" aria-hidden="true" tabindex="-1"></a>        <span class="op">&gt;&gt;=</span> relativizeUrls</span></code></pre></div>
<p>The template content follows:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode html"><code class="sourceCode html"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="dt">&lt;</span><span class="kw">table</span><span class="ot"> id</span><span class="op">=</span><span class="st">&quot;files&quot;</span><span class="dt">&gt;</span></span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a>    <span class="dt">&lt;</span><span class="kw">tr</span><span class="dt">&gt;&lt;</span><span class="kw">th</span><span class="ot"> colspan</span><span class="op">=</span><span class="st">&quot;3&quot;</span><span class="dt">&gt;</span>Tribunal proceeding files<span class="dt">&lt;/</span><span class="kw">th</span><span class="dt">&gt;&lt;/</span><span class="kw">tr</span><span class="dt">&gt;</span></span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a>    $for(files)$</span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a>    <span class="dt">&lt;</span><span class="kw">tr</span><span class="dt">&gt;</span></span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a>        <span class="dt">&lt;</span><span class="kw">td</span><span class="dt">&gt;</span>$date$<span class="dt">&lt;/</span><span class="kw">td</span><span class="dt">&gt;</span></span>
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a>        $if(filename)$</span>
<span id="cb4-7"><a href="#cb4-7" aria-hidden="true" tabindex="-1"></a>        <span class="dt">&lt;</span><span class="kw">td</span><span class="dt">&gt;&lt;</span><span class="kw">a</span><span class="ot"> href</span><span class="op">=</span><span class="st">&quot;$filename$&quot;</span><span class="dt">&gt;</span>download<span class="dt">&lt;/</span><span class="kw">a</span><span class="dt">&gt;&lt;/</span><span class="kw">td</span><span class="dt">&gt;</span></span>
<span id="cb4-8"><a href="#cb4-8" aria-hidden="true" tabindex="-1"></a>        $else$</span>
<span id="cb4-9"><a href="#cb4-9" aria-hidden="true" tabindex="-1"></a>        <span class="dt">&lt;</span><span class="kw">td</span><span class="dt">&gt;&lt;</span><span class="kw">em</span><span class="dt">&gt;</span>N/A<span class="dt">&lt;/</span><span class="kw">em</span><span class="dt">&gt;&lt;/</span><span class="kw">td</span><span class="dt">&gt;</span></span>
<span id="cb4-10"><a href="#cb4-10" aria-hidden="true" tabindex="-1"></a>        $endif$</span>
<span id="cb4-11"><a href="#cb4-11" aria-hidden="true" tabindex="-1"></a>        <span class="dt">&lt;</span><span class="kw">td</span><span class="dt">&gt;</span>$desc$<span class="dt">&lt;/</span><span class="kw">td</span><span class="dt">&gt;</span></span>
<span id="cb4-12"><a href="#cb4-12" aria-hidden="true" tabindex="-1"></a>    <span class="dt">&lt;/</span><span class="kw">tr</span><span class="dt">&gt;</span></span>
<span id="cb4-13"><a href="#cb4-13" aria-hidden="true" tabindex="-1"></a>    $endfor$</span>
<span id="cb4-14"><a href="#cb4-14" aria-hidden="true" tabindex="-1"></a><span class="dt">&lt;/</span><span class="kw">table</span><span class="dt">&gt;</span></span></code></pre></div>
<p>The <code>noResult</code> in the <code>Nothing</code> case for the <code>fileName</code> field is
what makes the <code>$if(filename)$</code> / <code>$else$</code> conditional work. Apart
from that, the template is trivial.</p>
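<p>The same trick can be packaged for any optional field. The helper
below is only a sketch: the name <code>maybeField</code> is hypothetical (it is
not part of Hakyll), and it assumes a Hakyll version that exports
<code>noResult</code>, which the context above already relies on.</p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell">import Hakyll

-- Hypothetical helper, not a Hakyll function.
-- The field is defined only when the projection returns Just,
-- so $if(key)$ falls through to $else$ for Nothing.
maybeField :: String -&gt; (a -&gt; Maybe String) -&gt; Context a
maybeField key f =
  field key
    ( maybe (noResult ("no value for " &lt;&gt; key)) pure
    . f . itemBody )</code></pre></div>
<p>With it, the <code>filename</code> field could be written as
<code>maybeField "filename" (fmap (toUrl . ("files/" &lt;&gt;)) . fileName)</code>.</p>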
<h2 id="recompiling-on-change">Recompiling on change <a href="#recompiling-on-change" class="section">§</a></h2>
<p>One more thing. Because the source of the data is not a markdown
(or whatever) file, Hakyll won’t automatically notice if the data
changed and regenerate the HTML. We have to set up an explicit
dependency on the Haskell source file.</p>
<p>Add a <code>match</code> rule for the relevant <code>.hs</code> file, so that Hakyll will
monitor it. Also update the compiler for <code>files.html</code> to load the
<code>Files.hs</code> “item”, to establish the dependency:</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>main <span class="ot">=</span> hakyll <span class="op">$</span> <span class="kw">do</span></span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a>  match <span class="st">&quot;Files.hs&quot;</span> <span class="op">$</span> <span class="kw">do</span></span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a>    compile <span class="op">$</span> makeItem ()</span>
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a>  create [<span class="st">&quot;files.html&quot;</span>] <span class="op">$</span> <span class="kw">do</span></span>
<span id="cb5-7"><a href="#cb5-7" aria-hidden="true" tabindex="-1"></a>    route idRoute</span>
<span id="cb5-8"><a href="#cb5-8" aria-hidden="true" tabindex="-1"></a>    compile <span class="op">$</span> <span class="kw">do</span></span>
<span id="cb5-9"><a href="#cb5-9" aria-hidden="true" tabindex="-1"></a>      _ <span class="ot">&lt;-</span> load <span class="st">&quot;Files.hs&quot;</span><span class="ot"> ::</span> <span class="dt">Compiler</span> (<span class="dt">Item</span> ())</span>
<span id="cb5-10"><a href="#cb5-10" aria-hidden="true" tabindex="-1"></a>      <span class="kw">let</span> filesContext <span class="ot">=</span></span>
<span id="cb5-11"><a href="#cb5-11" aria-hidden="true" tabindex="-1"></a>        …</span></code></pre></div>
<p>And that’s all there is to it!</p>]]></summary>
</entry>
<entry>
    <title>Type-level programming for safer resource management</title>
    <link href="https://frasertweedale.github.io/blog-fp/posts/2025-07-19-type-nats-and-constraints.html" />
    <id>https://frasertweedale.github.io/blog-fp/posts/2025-07-19-type-nats-and-constraints.html</id>
    <published>2025-07-19T00:00:00Z</published>
    <updated>2025-07-19T00:00:00Z</updated>
    <summary type="html"><![CDATA[<h1 id="type-level-programming-for-safer-resource-management">Type-level programming for safer resource management</h1>
<p>I find type-level naturals useful for enforcing proper usage of APIs
or protocols that involve transactions, locking, memory
(de)allocation, and similar concerns. In this post I demonstrate
the main idea and discuss some of the shortcomings when using this
technique with Haskell.</p>
<h2 id="overview-of-the-technique">Overview of the technique <a href="#overview-of-the-technique" class="section">§</a></h2>
<p>Consider the following API.</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="co">-- provides type-level numeric literals</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# LANGUAGE DataKinds #-}</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a><span class="co">-- provides type-level numeric operations</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">GHC.TypeLits</span></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">DbHandle</span> n</span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a><span class="ot">openDatabase      ::</span> <span class="dt">IO</span> (<span class="dt">DbHandle</span> <span class="dv">0</span>)</span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a><span class="ot">startTransaction  ::</span> <span class="dt">DbHandle</span> n <span class="ot">-&gt;</span> <span class="dt">IO</span> (<span class="dt">DbHandle</span> (n <span class="op">+</span> <span class="dv">1</span>))</span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a><span class="ot">commitTransaction ::</span> <span class="dt">DbHandle</span> n <span class="ot">-&gt;</span> <span class="dt">IO</span> (<span class="dt">DbHandle</span> (n <span class="op">-</span> <span class="dv">1</span>))</span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a><span class="ot">closeDatabase     ::</span> <span class="dt">DbHandle</span> <span class="dv">0</span> <span class="ot">-&gt;</span> <span class="dt">IO</span> ()</span></code></pre></div>
<p>The phantom type parameter <code>n</code> in <code>DbHandle n</code> tracks the
level of nested transactions. The database is initially opened at
level <code>0</code> and can only be closed when the level is <code>0</code>. The API
enforces that transactions must be committed. (Error handling and
rollback are left as an exercise for the reader.)</p>
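<p>To experiment with this API, the signatures need bodies. What
follows is a minimal sketch, not a real database binding: it assumes
a stub <code>DbHandle</code> with a single constructor and operations that only
print a message. The constructor, the messages and the extra
<code>LANGUAGE</code> pragmas are assumptions for illustration. Its <code>main</code>
opens two nested transactions, so the level goes 0, 1, 2, 1, 0.</p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell">{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeOperators #-}

module Main (main) where

import GHC.TypeLits

-- Hypothetical stub: a real handle would wrap a connection.
-- The parameter n is phantom; no field mentions it.
data DbHandle (n :: Nat) = DbHandle

openDatabase :: IO (DbHandle 0)
openDatabase = DbHandle &lt;$ putStrLn "open"

startTransaction :: DbHandle n -&gt; IO (DbHandle (n + 1))
startTransaction DbHandle = DbHandle &lt;$ putStrLn "begin"

commitTransaction :: DbHandle n -&gt; IO (DbHandle (n - 1))
commitTransaction DbHandle = DbHandle &lt;$ putStrLn "commit"

closeDatabase :: DbHandle 0 -&gt; IO ()
closeDatabase DbHandle = putStrLn "close"

main :: IO ()
main = do
  db0 &lt;- openDatabase             -- DbHandle 0
  db1 &lt;- startTransaction db0     -- DbHandle 1
  db2 &lt;- startTransaction db1     -- DbHandle 2
  db1' &lt;- commitTransaction db2   -- DbHandle 1
  db0' &lt;- commitTransaction db1'  -- DbHandle 0
  closeDatabase db0'</code></pre></div>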
<p>Here’s a trivial example of using this API:</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="ot">main ::</span> <span class="dt">IO</span> ()</span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a>main <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a>  db <span class="ot">&lt;-</span> startTransaction <span class="op">=&lt;&lt;</span> openDatabase</span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a>  <span class="co">-- do stuff</span></span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a>  commitTransaction db <span class="op">&gt;&gt;=</span> closeDatabase</span></code></pre></div>
<p>All happy. But if we fail to commit the transaction…</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>main <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>  db <span class="ot">&lt;-</span> startTransaction <span class="op">=&lt;&lt;</span> openDatabase</span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a>  <span class="co">-- do stuff</span></span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a>  closeDatabase db</span></code></pre></div>
<p>…then we get a type error:</p>
<pre><code>X.hs:21:9: error: [GHC-83865]
    • Couldn&#39;t match type ‘1’ with ‘0’
      Expected: DbHandle 0 -&gt; IO (DbHandle 0)
        Actual: DbHandle 0 -&gt; IO (DbHandle (0 + 1))
    • In the first argument of ‘(=&lt;&lt;)’, namely ‘startTransaction’
      In a stmt of a &#39;do&#39; block: db &lt;- startTransaction =&lt;&lt; openDatabase
      In the expression:
        do db &lt;- startTransaction =&lt;&lt; openDatabase
           closeDatabase db
   |
21 |   db &lt;- startTransaction =&lt;&lt; openDatabase
   |         ^^^^^^^^^^^^^^^^</code></pre>
<p>Very well.</p>
<h2 id="unnatural-inhabitants">Unnatural inhabitants <a href="#unnatural-inhabitants" class="section">§</a></h2>
<p>Because GHC’s type-level naturals are not a true inductive type,
subtraction is allowed even when the result would be less than zero.
In this case, the value just appears as an un-normalised <code>(0 - 1)</code>.
We can see this in the type error when we attempt to compile the
following program:</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>main <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>  db <span class="ot">&lt;-</span> openDatabase  <span class="co">-- we didn&#39;t start a transaction</span></span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a>  <span class="co">-- do stuff</span></span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a>  commitTransaction db <span class="op">&gt;&gt;=</span> closeDatabase</span></code></pre></div>
<pre><code>X.hs:23:3: error: [GHC-83865]
    • Couldn&#39;t match type ‘0 - 1’ with ‘0’
      Expected: IO (DbHandle 0)
        Actual: IO (DbHandle (0 - 1))</code></pre>
<p>Unfortunately, the compiler still accepts a program that commits a
transaction when no transaction is in progress, as long as nothing
forces the resulting level to match a concrete number. The following
program compiles without error:</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>main <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a>  db <span class="ot">&lt;-</span> openDatabase</span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a>  <span class="co">-- do stuff</span></span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a>  commitTransaction db  <span class="co">-- we didn&#39;t start a transaction</span></span>
<span id="cb7-5"><a href="#cb7-5" aria-hidden="true" tabindex="-1"></a>  <span class="fu">pure</span> ()               <span class="co">-- we didn&#39;t close the database</span></span></code></pre></div>
<h2 id="constraints-to-the-rescue">Constraints to the rescue <a href="#constraints-to-the-rescue" class="section">§</a></h2>
<p>We can add a constraint to <code>commitTransaction</code> to ensure that it can
only be applied to handles that are in a (possibly nested)
transaction:</p>
<div class="sourceCode" id="cb8"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>commitTransaction</span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a><span class="ot">  ::</span> (<span class="dv">1</span> <span class="op">&lt;=</span> n)</span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a>  <span class="ot">=&gt;</span> <span class="dt">DbHandle</span> n <span class="ot">-&gt;</span> <span class="dt">IO</span> (<span class="dt">DbHandle</span> (n <span class="op">-</span> <span class="dv">1</span>))</span></code></pre></div>
<p>Compiling the previous program now results in an error:</p>
<pre><code>X.hs:25:3: error: [GHC-64725]
    • Cannot satisfy: 1 &lt;= 0
    • In a stmt of a &#39;do&#39; block: commitTransaction db
      In the expression:
        do db &lt;- openDatabase
           commitTransaction db
           pure ()
</code></pre>
<p>And the original (correct) program still compiles. But
now it emits a <em>redundant constraint</em> warning:</p>
<pre><code>X.hs:15:6: warning: [GHC-30606] [-Wredundant-constraints]
    Redundant constraint: 1 &lt;= n
    In the type signature for:
         commitTransaction :: forall (n :: Natural).
                              (1 &lt;= n) =&gt;
                              DbHandle n -&gt; IO (DbHandle (n - 1))
   |
15 |   :: (1 &lt;= n)</code></pre>
<p>This is a bit annoying. I don’t know how to silence this warning
except to disable <code>-Wredundant-constraints</code>, and I don’t want to do
that. But in the end, these few spurious warnings are a small price
to pay for a safer API that prevents incorrect use at compile time.</p>
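<p>A partial compromise, if it is acceptable in your code base, is to
keep the API definitions in their own module and switch the warning
off for that module alone, so every other module is still checked.
The following is a minimal sketch under assumptions: the module name
<code>Database</code> and the stub <code>DbHandle</code> constructor are hypothetical, and
only <code>commitTransaction</code> is shown.</p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell">-- Applies to this file only; other modules keep the warning.
{-# OPTIONS_GHC -Wno-redundant-constraints #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeOperators #-}

-- Hypothetical module name, for illustration.
module Database
  ( DbHandle
  , commitTransaction
  ) where

import GHC.TypeLits

-- Stub handle (an assumption); n is a phantom parameter.
data DbHandle (n :: Nat) = DbHandle

-- The body never uses (1 &lt;= n), which is why GHC reports
-- the constraint as redundant when the warning is enabled.
commitTransaction
  :: (1 &lt;= n)
  =&gt; DbHandle n -&gt; IO (DbHandle (n - 1))
commitTransaction DbHandle = pure DbHandle</code></pre></div>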
<h2 id="ergonomics">Ergonomics <a href="#ergonomics" class="section">§</a></h2>
<p>User code typically uses locks, transactions or allocation in a
<em>symmetric</em> way. In other words, they initialise a context, perform
some actions, then tidy up. It makes sense to wrap this pattern up
in a function that handles the bookkeeping:</p>
<div class="sourceCode" id="cb11"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a>withTransaction</span>
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a><span class="ot">  ::</span> <span class="dt">DbHandle</span> (<span class="ot">n ::</span> <span class="dt">Natural</span>)</span>
<span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a>  <span class="ot">-&gt;</span> (<span class="kw">forall</span> m<span class="op">.</span> <span class="dt">DbHandle</span> (<span class="ot">m ::</span> <span class="dt">Natural</span>) <span class="ot">-&gt;</span> <span class="dt">IO</span> a)</span>
<span id="cb11-4"><a href="#cb11-4" aria-hidden="true" tabindex="-1"></a>  <span class="ot">-&gt;</span> <span class="dt">IO</span> a</span>
<span id="cb11-5"><a href="#cb11-5" aria-hidden="true" tabindex="-1"></a>withTransaction db k <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb11-6"><a href="#cb11-6" aria-hidden="true" tabindex="-1"></a>  db&#39; <span class="ot">&lt;-</span> startTransaction db</span>
<span id="cb11-7"><a href="#cb11-7" aria-hidden="true" tabindex="-1"></a>  r <span class="ot">&lt;-</span> k db</span>
<span id="cb11-8"><a href="#cb11-8" aria-hidden="true" tabindex="-1"></a>  commitTransaction db&#39;</span>
<span id="cb11-9"><a href="#cb11-9" aria-hidden="true" tabindex="-1"></a>  <span class="fu">pure</span> r</span></code></pre></div>
<p>The <code>forall m.</code> (universal quantification) prevents improper use of
the database handle (e.g. premature commit) by requiring the action
function to work with database handles with any transaction level
(including <code>0</code>).</p>
<p>Unfortunately this code does not compile:</p>
<pre><code>X.hs:28:3: error: [GHC-64725]
    • Cannot satisfy: 1 &lt;= n + 1
    • In a stmt of a &#39;do&#39; block: commitTransaction db&#39;
      In the expression:
        do db&#39; &lt;- startTransaction db
           r &lt;- k db
           commitTransaction db&#39;
           pure r</code></pre>
<p>The proposition to be satisfied is obviously true. But again the
lack of inductive reasoning in GHC’s type-level naturals bites us.
Sprinkling this constraint into the type signature solves the
problem (and also lets us avoid the explicit kind signatures):</p>
<div class="sourceCode" id="cb13"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a>withTransaction</span>
<span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a><span class="ot">  ::</span> (<span class="dv">1</span> <span class="op">&lt;=</span> n <span class="op">+</span> <span class="dv">1</span>)</span>
<span id="cb13-3"><a href="#cb13-3" aria-hidden="true" tabindex="-1"></a>  <span class="ot">=&gt;</span> <span class="dt">DbHandle</span> n</span>
<span id="cb13-4"><a href="#cb13-4" aria-hidden="true" tabindex="-1"></a>  <span class="ot">-&gt;</span> (<span class="kw">forall</span> m<span class="op">.</span> (<span class="dv">1</span> <span class="op">&lt;=</span> m <span class="op">+</span> <span class="dv">1</span>) <span class="ot">=&gt;</span> <span class="dt">DbHandle</span> m <span class="ot">-&gt;</span> <span class="dt">IO</span> a)</span>
<span id="cb13-5"><a href="#cb13-5" aria-hidden="true" tabindex="-1"></a>  <span class="ot">-&gt;</span> <span class="dt">IO</span> a</span></code></pre></div>
<p>Now we can write a program that uses transactions like so:</p>
<div class="sourceCode" id="cb14"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a>main <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb14-2"><a href="#cb14-2" aria-hidden="true" tabindex="-1"></a>  db <span class="ot">&lt;-</span> openDatabase</span>
<span id="cb14-3"><a href="#cb14-3" aria-hidden="true" tabindex="-1"></a>  withTransaction db (<span class="fu">const</span> <span class="op">$</span> <span class="fu">pure</span> ())</span>
<span id="cb14-4"><a href="#cb14-4" aria-hidden="true" tabindex="-1"></a>  closeDatabase db</span></code></pre></div>
<p>Nested transactions also work:</p>
<div class="sourceCode" id="cb15"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a>main <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a>  db <span class="ot">&lt;-</span> openDatabase</span>
<span id="cb15-3"><a href="#cb15-3" aria-hidden="true" tabindex="-1"></a>  withTransaction db</span>
<span id="cb15-4"><a href="#cb15-4" aria-hidden="true" tabindex="-1"></a>    ( \db&#39; <span class="ot">-&gt;</span> withTransaction db&#39; (<span class="fu">const</span> <span class="op">$</span> <span class="fu">pure</span> ()) )</span>
<span id="cb15-5"><a href="#cb15-5" aria-hidden="true" tabindex="-1"></a>  closeDatabase db</span></code></pre></div>
<p>But operations that rely on knowing the transaction level—like
committing—are excluded. This program:</p>
<div class="sourceCode" id="cb16"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a>main <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb16-2"><a href="#cb16-2" aria-hidden="true" tabindex="-1"></a>  db <span class="ot">&lt;-</span> openDatabase</span>
<span id="cb16-3"><a href="#cb16-3" aria-hidden="true" tabindex="-1"></a>  withTransaction db (\db&#39; <span class="ot">-&gt;</span> commitTransaction db&#39;)</span>
<span id="cb16-4"><a href="#cb16-4" aria-hidden="true" tabindex="-1"></a>  closeDatabase db</span></code></pre></div>
<p>…results in a type error:</p>
<pre><code>X.hs:38:31: error: [GHC-64725]
    • Cannot satisfy: 1 &lt;= m
    • In the expression: commitTransaction db&#39;
      In the second argument of ‘withTransaction’, namely
        ‘(\ db&#39; -&gt; commitTransaction db&#39;)’
      In a stmt of a &#39;do&#39; block:
        withTransaction db (\ db&#39; -&gt; commitTransaction db&#39;)
   |
38 |   withTransaction db (\db&#39; -&gt; commitTransaction db&#39;)
   |                               ^^^^^^^^^^^^^^^^^</code></pre>
<p>Which is exactly what we want.</p>
<h2 id="conclusion">Conclusion <a href="#conclusion" class="section">§</a></h2>
<p>I have demonstrated a technique that enables safer APIs for
resource-related operations such as locking/unlocking, transactions,
and memory management. Some type-level boilerplate is needed to
work around shortcomings in Haskell’s type-level naturals.</p>
<p>We also discussed an idea to make the API more ergonomic by
providing a “wrapper” function that handles the initialisation and
cleanup. You would need additional functionality to safely handle
special operations that <em>should</em> be callable by user code, such as
rollbacks. One idea is to use a sum type that allows
user code to request special operations. For example:</p>
<div class="sourceCode" id="cb18"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb18-1"><a href="#cb18-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">TransactionResult</span> a <span class="ot">=</span> <span class="dt">Rollback</span> <span class="op">|</span> <span class="dt">Commit</span> a</span>
<span id="cb18-2"><a href="#cb18-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb18-3"><a href="#cb18-3" aria-hidden="true" tabindex="-1"></a>withTransaction</span>
<span id="cb18-4"><a href="#cb18-4" aria-hidden="true" tabindex="-1"></a><span class="ot">  ::</span> (<span class="dv">1</span> <span class="op">&lt;=</span> n <span class="op">+</span> <span class="dv">1</span>)</span>
<span id="cb18-5"><a href="#cb18-5" aria-hidden="true" tabindex="-1"></a>  <span class="ot">=&gt;</span> <span class="dt">DbHandle</span> n</span>
<span id="cb18-6"><a href="#cb18-6" aria-hidden="true" tabindex="-1"></a>  <span class="ot">-&gt;</span> (<span class="kw">forall</span> m<span class="op">.</span> (<span class="dv">1</span> <span class="op">&lt;=</span> m <span class="op">+</span> <span class="dv">1</span>) <span class="ot">=&gt;</span> <span class="dt">DbHandle</span> m <span class="ot">-&gt;</span> <span class="dt">IO</span> (<span class="dt">TransactionResult</span> a))</span>
<span id="cb18-7"><a href="#cb18-7" aria-hidden="true" tabindex="-1"></a>  <span class="ot">-&gt;</span> <span class="dt">IO</span> (<span class="dt">TransactionResult</span> a)</span>
<span id="cb18-8"><a href="#cb18-8" aria-hidden="true" tabindex="-1"></a>withTransaction db k <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb18-9"><a href="#cb18-9" aria-hidden="true" tabindex="-1"></a>  db&#39; <span class="ot">&lt;-</span> startTransaction db</span>
<span id="cb18-10"><a href="#cb18-10" aria-hidden="true" tabindex="-1"></a>  r <span class="ot">&lt;-</span> k db&#39;</span>
<span id="cb18-11"><a href="#cb18-11" aria-hidden="true" tabindex="-1"></a>  <span class="kw">case</span> r <span class="kw">of</span></span>
<span id="cb18-12"><a href="#cb18-12" aria-hidden="true" tabindex="-1"></a>    <span class="dt">Rollback</span> <span class="ot">-&gt;</span> rollback db&#39; <span class="op">$&gt;</span> r</span>
<span id="cb18-13"><a href="#cb18-13" aria-hidden="true" tabindex="-1"></a>    <span class="dt">Commit</span> _ <span class="ot">-&gt;</span> <span class="fu">pure</span> r</span></code></pre></div>
<p>This is just a sketch of the idea (I haven’t tried it myself yet).
I encourage interested readers to explore further and share their
results.</p>]]></summary>
</entry>
<entry>
    <title>Education fund modelling with Haskell</title>
    <link href="https://frasertweedale.github.io/blog-fp/posts/2023-10-10-education-fund-modelling.html" />
    <id>https://frasertweedale.github.io/blog-fp/posts/2023-10-10-education-fund-modelling.html</id>
    <published>2023-10-10T00:00:00Z</published>
    <updated>2023-10-10T00:00:00Z</updated>
    <summary type="html"><![CDATA[<h1 id="education-fund-modelling-with-haskell">Education fund modelling with Haskell</h1>
<p>Like most people, I don’t like big financial surprises or sudden,
substantial changes to cashflow. Although we can’t control all
circumstances, we can plan for projected future expenses, like your
kids’ education. <span class="abstract">In this post I share a basic model built in
Haskell to help plan for education expenses (or other large, future,
time-bounded expenses).</span></p>
<p>This <strong>beginner-friendly post</strong> demonstrates many simple Haskell
functions, especially for working with lists. It also shows how to
build and execute stateful computations using <code>State</code> from <em>mtl</em>. I
(mostly) avoid type signatures and just focus on defining the terms,
but there are plenty of links to API documentation. At the end of
the post I suggest some enhancements to the model that would be good
<strong>exercises for learners</strong> (and might be fun even for more
experienced Haskell programmers).</p>
<h2 id="scenario-and-simplifying-assumptions">Scenario and simplifying assumptions <a href="#scenario-and-simplifying-assumptions" class="section">§</a></h2>
<p>The general scenario I built the model for is to save for private
secondary <strong>school fees for two children</strong>. They are 3 years apart
with the older child commencing in 4 years. Costs of primary
(elementary) schooling are not considered in this scenario, although
the model would accommodate that.</p>
<p>Given the time span (&gt; 10 years) we have to consider <strong>inflation</strong>.
The model uses a constant rate of inflation of 5% per annum, which
is less than the rate of inflation in Australia at time of writing,
but more than the RBA’s long-term target of 2–3%.</p>
<p>We also model an annual <strong>investment return</strong> of 8%. Short-term
volatility is inevitable but this is less than the <em>long-term
average</em> for the Australian stock market.</p>
<p><strong>Contributions</strong> to the fund will be annual. In real life, for
stable cashflow and to achieve <a href="https://en.wikipedia.org/wiki/Dollar_cost_averaging"><em>dollar cost averaging</em></a>,
contributions could be made more regularly (e.g. each pay day). The
model is therefore slightly pessimistic in this regard, but simpler
to implement.</p>
<h2 id="general-description-of-the-model">General description of the model <a href="#general-description-of-the-model" class="section">§</a></h2>
<p>The inputs to the model are a <strong>fee structure</strong>, and an annual
contribution amount which is fixed (does not grow with inflation or
income).</p>
<p>The model projects the costs of education in future years, and works
backwards to determine how much money needs to be in the fund at the
start of each year. The output of the model is a list of these
required balances, the first of which is the required <strong>initial
starting balance</strong> for the education fund. If the starting balance
is too high, increase the yearly contribution and evaluate the model
again. Continue until you find a balance between starting value and
contribution amount that works for you.</p>
<h2 id="modelling-the-costs">Modelling the costs <a href="#modelling-the-costs" class="section">§</a></h2>
<p>A school’s <em>current</em> fee structure, for the six years of secondary
education, is the ordered list of these numbers:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a>feesBase <span class="ot">=</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>  [ <span class="dv">12000</span>, <span class="dv">12000</span>, <span class="dv">12000</span>     <span class="co">-- grade  7,  8,  9</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a>  , <span class="dv">13500</span>, <span class="dv">13500</span>, <span class="dv">13500</span> ]   <span class="co">-- grade 10, 11, 12</span></span></code></pre></div>
<p>Child 1 will be starting high school in 4 years; Child 2 in 7 years.
We consider the intervening years to have nil cost, which we
represent by using <a href="https://hackage.haskell.org/package/base-4.19.0.0/docs/Prelude.html#v:replicate"><code>replicate</code></a> to make lists of
<code>0</code> of the required lengths. We append these sublists using
<a href="https://hackage.haskell.org/package/base-4.19.0.0/docs/Prelude.html#v:-60--62-"><code>&lt;&gt;</code></a>. We also extend the <em>shorter</em> list with
additional zeroes (<em>only</em> the shorter list, because
<a href="https://hackage.haskell.org/package/base-4.19.0.0/docs/Prelude.html#v:repeat"><code>repeat</code></a> makes an <strong>infinite list</strong>).</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>feesChild1Base <span class="ot">=</span></span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a>  <span class="fu">replicate</span> <span class="dv">4</span> <span class="dv">0</span> <span class="op">&lt;&gt;</span> feesBase <span class="op">&lt;&gt;</span> <span class="fu">repeat</span> <span class="dv">0</span></span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a>feesChild2Base <span class="ot">=</span></span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a>  <span class="fu">replicate</span> <span class="dv">7</span> <span class="dv">0</span> <span class="op">&lt;&gt;</span> feesBase</span></code></pre></div>
<div class="note">
<p>The <a href="https://hackage.haskell.org/package/base-4.19.0.0/docs/Prelude.html#v:-60--62-"><code>&lt;&gt;</code></a> function appends values of many types, where the
operation is associative and the result is always defined. You can
also append lists with <a href="https://hackage.haskell.org/package/base-4.19.0.0/docs/Prelude.html#v:-43--43-"><code>++</code></a>, which is
specific to the list type.</p>
</div>
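<p>As a small illustration of the note above (a sketch with made-up
names, not part of the model): <code>&lt;&gt;</code> comes from the
<code>Semigroup</code> class, so it also works on non-list types such as
<code>Ordering</code>, whereas <code>++</code> only accepts lists.</p>
<pre class="sourceCode haskell"><code class="sourceCode haskell">-- Hypothetical names, for illustration only.
gapThenFees :: [Double]
gapThenFees = replicate 2 0 &lt;&gt; [12000, 13500]

-- The same list, built with the list-specific operator.
sameWithPlusPlus :: [Double]
sameWithPlusPlus = replicate 2 0 ++ [12000, 13500]

-- (&lt;&gt;) on Ordering keeps the first result that is not EQ.
tieBreak :: Ordering
tieBreak = compare (1 :: Int) 1 &lt;&gt; compare (2 :: Int) 3</code></pre>
<p>Here <code>gapThenFees</code> and <code>sameWithPlusPlus</code> are the same list, and
<code>tieBreak</code> is <code>LT</code>.</p>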
<p>We could add the yearly fees using <a href="https://hackage.haskell.org/package/base-4.19.0.0/docs/Prelude.html#v:zipWith"><code>zipWith</code></a>,
whose arguments are a binary combining function and two lists:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>feesCombinedBase <span class="ot">=</span></span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>  <span class="fu">zipWith</span> (<span class="op">+</span>) feesChild1Base feesChild2Base</span></code></pre></div>
<p>However, many schools offer discounts when you have multiple
children enrolled. In this scenario, the school gives a 10%
discount for the younger (cheaper) child, when both are enrolled.
We define the combining function using <a href="https://hackage.haskell.org/package/base-4.19.0.0/docs/Prelude.html#v:min"><code>min</code></a>,
<a href="https://hackage.haskell.org/package/base-4.19.0.0/docs/Prelude.html#v:max"><code>max</code></a>, addition and multiplication:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="co">-- define our custom combining function...</span></span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a>sumWithDiscount a b <span class="ot">=</span></span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a>  <span class="fu">max</span> a b <span class="op">+</span> <span class="fu">min</span> a b <span class="op">*</span> <span class="fl">0.9</span></span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a><span class="co">-- ... and update feesCombinedBase to use it</span></span>
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a>feesCombinedBase <span class="ot">=</span></span>
<span id="cb4-7"><a href="#cb4-7" aria-hidden="true" tabindex="-1"></a>  <span class="fu">zipWith</span> sumWithDiscount feesChild1Base feesChild2Base</span></code></pre></div>
<p>Let’s evaluate <code>feesCombinedBase</code> and <a href="https://hackage.haskell.org/package/base-4.19.0.0/docs/Prelude.html#v:print"><code>print</code></a> the
values. <a href="https://hackage.haskell.org/package/base-4.19.0.0/docs/Prelude.html#v:traverse"><code>traverse_</code></a> (from <code>Data.Foldable</code>) applies an action to each
element of a list (or other container), then discards the result.</p>
<pre><code>λ&gt; traverse_ print feesCombinedBase
0.0
0.0
0.0
0.0
12000.0
12000.0
12000.0
24300.0
24300.0
24300.0
13500.0
13500.0
13500.0</code></pre>
<p>Now <strong>inflation</strong> must have its way with these numbers. Use
<a href="https://hackage.haskell.org/package/base-4.19.0.0/docs/Prelude.html#v:iterate"><code>iterate</code></a> to generate the inflation factor for
successive years (<em>ad infinitum</em>) by iteratively applying our inflation
rate, starting at <code>1</code>. To make the numbers presentable we’ll also
<a href="https://hackage.haskell.org/package/base-4.19.0.0/docs/Prelude.html#v:round"><code>round</code></a> to 4 decimal places.</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>round4    <span class="ot">=</span> (<span class="op">/</span> <span class="dv">10000</span>) <span class="op">.</span> <span class="fu">fromIntegral</span> <span class="op">.</span> <span class="fu">round</span> <span class="op">.</span> (<span class="op">*</span> <span class="dv">10000</span>)</span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a>inflation <span class="ot">=</span> <span class="fu">fmap</span> round4 (<span class="fu">iterate</span> (<span class="op">*</span> <span class="fl">1.05</span>) <span class="dv">1</span>)</span></code></pre></div>
<div class="note">
<p><a href="https://hackage.haskell.org/package/base-4.19.0.0/docs/Prelude.html#v:fmap"><code>fmap</code></a> applies a function (the first argument) to
every element of a container or producer (the second argument).</p>
</div>
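<p>Because <code>iterate</code> produces an infinite list, you only ever inspect
a finite prefix of it, for example with <code>take</code>. The sketch below
restates the two definitions with explicit type signatures (my
addition, to keep the snippet self-contained) and looks at the first
few factors. The name <code>firstFactors</code> is made up for this example.</p>
<pre class="sourceCode haskell"><code class="sourceCode haskell">-- Round to 4 decimal places (same definition as above, with a signature).
round4 :: Double -&gt; Double
round4 = (/ 10000) . fromIntegral . (round :: Double -&gt; Integer) . (* 10000)

-- Infinite list of inflation factors: 1, 1.05, 1.05^2, ...
inflation :: [Double]
inflation = fmap round4 (iterate (* 1.05) 1)

-- Laziness means only the requested prefix is ever computed.
firstFactors :: [Double]
firstFactors = take 5 inflation</code></pre>
<p>In GHCi, <code>firstFactors</code> should come out as approximately
<code>[1.0, 1.05, 1.1025, 1.1576, 1.2155]</code>.</p>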
<p>Then we can apply the inflation to our uninflated costs, year by
year. One thing I did not yet mention is that the fees above are
nearly a year old and will be going up soon, so we’ll use
<a href="https://hackage.haskell.org/package/base-4.19.0.0/docs/Prelude.html#v:drop"><code>drop</code></a> to “shift left” the inflation figures by one
year. Once more we use <code>zipWith</code> for a very neat expression:</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>feesCombinedInflated <span class="ot">=</span></span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a>  <span class="fu">zipWith</span> (<span class="op">*</span>) feesCombinedBase (<span class="fu">drop</span> <span class="dv">1</span> inflation)</span></code></pre></div>
<p>Let’s <code>print</code> the projected fees:</p>
<pre><code>λ&gt; traverse_ print feesCombinedInflated
0.0
0.0
0.0
0.0
15315.6
16081.2
16885.2
35903.25
37696.59
39582.27
23089.05
24244.65
25455.6</code></pre>
<p>This looks right. The highest costs are in the three “overlap”
years where both children are enrolled, and the impact of inflation
is evident.</p>
<h2 id="modelling-the-fund-balance">Modelling the fund balance <a href="#modelling-the-fund-balance" class="section">§</a></h2>
<p>A <strong>stateful computation</strong> will work out how much money we need in
the fund at the start of each year. Specifically, we use the
<a href="https://hackage.haskell.org/package/mtl-2.3.1/docs/Control-Monad-State-Lazy.html#t:State"><code>State</code></a> type from the <a href="https://hackage.haskell.org/package/mtl"><em>mtl</em></a> library
to define the computation.</p>
<p>The model takes into account the contribution amount and the growth
factor. To do this it has to work backwards in time. Each <code>step</code>
of the computation takes the schooling fee for that year, and the
state value tracks the required balance of the fund.</p>
<p>We define the <code>step</code> function, which considers what happens to the
fund over one year. It must first <a href="https://hackage.haskell.org/package/base-4.19.0.0/docs/Prelude.html#v:subtract"><code>subtract</code></a>
the contribution from the required balance. This excludes the
contribution from the (presumed) growth over the year; a simplifying
assumption that is reasonable <em>over the long-term</em>. We also ensure
the balance will not be negative. We intend to <strong>exhaust the fund</strong>
when fees are paid in the final year, and a negative balance will
spoil the subsequent calculations. The <a href="https://hackage.haskell.org/package/mtl-2.3.1/docs/Control-Monad-State-Lazy.html#v:modify"><code>modify</code></a>
function applies its argument (the subtraction) to <strong>modify the
state value</strong>.</p>
<div class="sourceCode" id="cb9"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a>step contrib fee <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a>  modify (<span class="fu">max</span> <span class="dv">0</span> <span class="op">.</span> <span class="fu">subtract</span> contrib)</span></code></pre></div>
<p>Next we <em>divide</em> the balance (state value) by the <strong>growth factor</strong>
(1 plus the 8% investment return), to obtain the nominal value of the
fund at the start of the year:</p>
<div class="sourceCode" id="cb10"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a>  modify (<span class="op">/</span> <span class="fl">1.08</span>)</span></code></pre></div>
<p>Finally, we have to <strong>pay the fees</strong> at (or near) the start of the
year. So we must <em>increase</em> the required balance by that amount.
Some schools offer a discount for full year payment up-front. This
model applies a discount of 5%, but this is another area where the
model could be parameterised.</p>
<div class="sourceCode" id="cb11"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a>  modify (<span class="op">+</span> (fee <span class="op">*</span> <span class="fl">0.95</span>))</span></code></pre></div>
<p>All together the <code>step</code> function contains a few simple operations.
At the end it returns the state value via the <a href="https://hackage.haskell.org/package/mtl-2.3.1/docs/Control-Monad-State-Lazy.html#v:get"><code>get</code></a>
function. This will enable us to see how the value of the fund
changes year by year.</p>
<div class="sourceCode" id="cb12"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a>step contrib fee <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a>  modify (<span class="fu">max</span> <span class="dv">0</span> <span class="op">.</span> <span class="fu">subtract</span> contrib)</span>
<span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a>  modify (<span class="op">/</span> <span class="fl">1.08</span>)</span>
<span id="cb12-4"><a href="#cb12-4" aria-hidden="true" tabindex="-1"></a>  modify (<span class="op">+</span> (fee <span class="op">*</span> <span class="fl">0.95</span>))</span>
<span id="cb12-5"><a href="#cb12-5" aria-hidden="true" tabindex="-1"></a>  get</span></code></pre></div>
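<p>To get a feel for what one <code>step</code> does, you can run it on its own
with <code>runState</code>. This is a sketch: the type signature is my addition,
and the <code>50000</code> balance and <code>20000</code> fee in the second example are
made-up numbers.</p>
<pre class="sourceCode haskell"><code class="sourceCode haskell">import Control.Monad.State (State, get, modify, runState)

-- The step function from above, with an explicit signature.
step :: Double -&gt; Double -&gt; State Double Double
step contrib fee = do
  modify (max 0 . subtract contrib)
  modify (/ 1.08)
  modify (+ (fee * 0.95))
  get

-- Final school year: nothing needs to be left in the fund afterwards.
finalYear :: (Double, Double)
finalYear = runState (step 10000 25455.6) 0

-- A hypothetical middle year: 50000 must remain at the end of the year.
middleYear :: (Double, Double)
middleYear = runState (step 10000 20000) 50000</code></pre>
<p>Both components of each pair are equal, because <code>step</code> ends with
<code>get</code>. For <code>finalYear</code> the value is roughly 24182.82: the clamped
balance is zero, so only the discounted fee remains. For <code>middleYear</code>
it is roughly (50000 − 10000) / 1.08 + 19000 ≈ 56037.04.</p>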
<p>Now that we have the <code>step</code> function, we can
<a href="https://hackage.haskell.org/package/base-4.19.0.0/docs/Prelude.html#v:traverse"><code>traverse</code></a> the list of year costs and apply the
<code>step</code> to each element. Like <code>traverse_</code>, <code>traverse</code> applies an
action to each element of a container, but instead of discarding the
results it replaces each element of the container with the “return
value” of the action.</p>
<p>We have to start at the final year and work backwards, so first
<a href="https://hackage.haskell.org/package/base-4.19.0.0/docs/Prelude.html#v:reverse"><code>reverse</code></a> the list, then <code>traverse</code> it:</p>
<div class="sourceCode" id="cb13"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a><span class="ot">go ::</span> <span class="dt">Double</span> <span class="ot">-&gt;</span> <span class="dt">State</span> <span class="dt">Double</span> [<span class="dt">Double</span>]</span>
<span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a>go contrib <span class="ot">=</span></span>
<span id="cb13-3"><a href="#cb13-3" aria-hidden="true" tabindex="-1"></a>  <span class="fu">traverse</span> (step contrib) (<span class="fu">reverse</span> feesCombinedInflated)</span></code></pre></div>
<p>This is the first time I have shown a <strong>type signature</strong> in this
whole post! The <code>go</code> function takes the contribution amount and
returns a <em>state computation</em> whose <em>state variable</em> (the required
balance) is a real number (<code>Double</code>) and whose <em>output</em> is a list of
numbers (the required balance at the start of each year).</p>
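<p>If <code>traverse</code> over a state computation is new to you, a smaller
example may help. The sketch below (hypothetical names, unrelated to
the fund) keeps a running total in the state and returns it after each
element. It is run with <code>evalState</code>, starting from a total of
<code>0</code>.</p>
<pre class="sourceCode haskell"><code class="sourceCode haskell">import Control.Monad.State (State, evalState, get, modify)

-- Add the element to the running total, then return the new total.
addAndReport :: Int -&gt; State Int Int
addAndReport x = do
  modify (+ x)
  get

-- Each element is replaced by the total accumulated so far.
runningTotals :: [Int] -&gt; [Int]
runningTotals xs = evalState (traverse addAndReport xs) 0</code></pre>
<p>For example, <code>runningTotals [1, 2, 3]</code> is <code>[1,3,6]</code>. The <code>go</code>
function has the same shape, with <code>step contrib</code> in place of
<code>addAndReport</code>.</p>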
<p>To actually <strong>run the state computation</strong> we use
<a href="https://hackage.haskell.org/package/mtl-2.3.1/docs/Control-Monad-State-Lazy.html#v:evalState"><code>evalState</code></a>, whose arguments are the state
computation and an <em>initial state</em>. Our initial state is <em>$0</em>,
because we intend to exhaust the fund when we pay for the final year
of schooling.</p>
<div class="sourceCode" id="cb14"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a>model contrib <span class="ot">=</span> <span class="fu">fmap</span> <span class="fu">round</span> (<span class="fu">reverse</span> (evalState (go contrib) <span class="dv">0</span>))</span></code></pre></div>
<div class="note">
<p><code>evalState</code> yields the final <em>output</em> of the state computation,
discarding the state variable. If you instead want the final value
of the <em>state variable</em>, use <a href="https://hackage.haskell.org/package/mtl-2.3.1/docs/Control-Monad-State-Lazy.html#v:execState"><code>execState</code></a>.
<a href="https://hackage.haskell.org/package/mtl-2.3.1/docs/Control-Monad-State-Lazy.html#v:runState"><code>runState</code></a> yields the <em>(output, state var)</em>
pair.</p>
</div>
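<p>The difference between the three runners is easiest to see side by
side. The counter below is a made-up example, not part of the model.
Note that the pair from <code>runState</code> has the output first and the
final state second.</p>
<pre class="sourceCode haskell"><code class="sourceCode haskell">import Control.Monad.State (State, evalState, execState, get, put, runState)

-- Hypothetical counter: outputs a message about the old value
-- and stores the old value plus one as the new state.
tick :: State Int String
tick = do
  n &lt;- get
  put (n + 1)
  pure ("was " &lt;&gt; show n)

outputOnly :: String
outputOnly = evalState tick 41   -- "was 41"

stateOnly :: Int
stateOnly = execState tick 41    -- 42

both :: (String, Int)
both = runState tick 41          -- ("was 41", 42)</code></pre>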
<p>The result of <code>model</code> is a list of the required fund balance at the
start of each year, in order. The first value is the initial
balance required for the given yearly contribution amount.</p>
<h2 id="running-the-model">Running the model <a href="#running-the-model" class="section">§</a></h2>
<p>Let’s see what the model tells us for various contribution amounts.
First, let’s just pick a number, say <em>$10,000</em>.</p>
<pre><code>λ&gt; traverse_ print (model 10000)
43542
57025
71587
87314
104299
106929
108984
110379
92373
71086
46161
36165
24183</code></pre>
<p>The final year’s value is <code>24183</code>. That will <em>always</em> be the final
value, regardless of contribution amount, because that is what the
final fee payment will be.</p>
<p>As for the required starting value, for a <em>$10,000</em> yearly
contribution we would need to start the fund with <em>$43,542</em>. But
what if you don’t have any money to start the fund? With a bit of
trial and error I found that a yearly contribution of <em>$15,778</em> is
enough (note the start value for the <em>second</em> year):</p>
<pre><code>λ&gt; traverse_ print (model 15778)
0
15776
32816
51220
71095
76847
82273
87309
73235
56195
35857
30815
24183</code></pre>
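<p>The trial and error can also be automated. The sketch below repeats
the model in one self-contained module (type signatures are my
addition) and searches for the smallest whole-dollar contribution whose
required starting balance rounds to zero. <code>minimalContribution</code> is a
hypothetical helper, and the linear search relies on the required
starting balance never increasing as the contribution grows.</p>
<pre class="sourceCode haskell"><code class="sourceCode haskell">import Control.Monad.State (State, evalState, get, modify)

feesBase :: [Double]
feesBase = [12000, 12000, 12000, 13500, 13500, 13500]

-- Combined, uninflated fees with the 10% sibling discount.
feesCombinedBase :: [Double]
feesCombinedBase =
  zipWith sumWithDiscount
    (replicate 4 0 &lt;&gt; feesBase &lt;&gt; repeat 0)
    (replicate 7 0 &lt;&gt; feesBase)
  where
    sumWithDiscount a b = max a b + min a b * 0.9

round4 :: Double -&gt; Double
round4 = (/ 10000) . fromIntegral . (round :: Double -&gt; Integer) . (* 10000)

feesCombinedInflated :: [Double]
feesCombinedInflated =
  zipWith (*) feesCombinedBase (drop 1 (fmap round4 (iterate (* 1.05) 1)))

step :: Double -&gt; Double -&gt; State Double Double
step contrib fee = do
  modify (max 0 . subtract contrib)
  modify (/ 1.08)
  modify (+ (fee * 0.95))
  get

model :: Double -&gt; [Integer]
model contrib =
  fmap round (reverse (evalState (traverse (step contrib) years) 0))
  where
    years = reverse feesCombinedInflated

-- Try 0, 1, 2, ... dollars until the first required balance rounds to 0.
minimalContribution :: Integer
minimalContribution = until startsAtZero (+ 1) 0
  where
    startsAtZero c = take 1 (model (fromIntegral c)) == [0]</code></pre>
<p>Since <code>model 15778</code> already starts at <code>0</code>, the search cannot
return anything larger than <code>15778</code>. A bisection would need fewer
evaluations, but the model is cheap enough that counting up dollar
by dollar finishes almost instantly.</p>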
<p>Finally, how much would you need if you wanted the fund to be
completely passive—no further contributions after the initial
amount?</p>
<pre><code>λ&gt; traverse_ print (model 0)
118903
128415
138688
149783
161766
158993
155213
150306
125494
96857
63994
45424
24183</code></pre>
<p><em>$118,903</em>. Not many people have that kind of money at their
immediate disposal. But if you do, or if you get a big windfall,
you could “set and forget” an investment for your children’s
schooling, or other long-term financial objective.</p>
<h2 id="possible-model-enhancements-or-variations">Possible model enhancements or variations <a href="#possible-model-enhancements-or-variations" class="section">§</a></h2>
<p>There are several ways the model could be improved, or tweaked to
suit the circumstances or preferences of the investor.</p>
<p>You could consider <strong>increasing the contribution over time</strong> to
adjust for expected income growth. There are several ways to do
it. One way is to pass the inflation factor to the step function
and apply it to the base contribution amount there. Another way is
to precompute the annual inflated contribution, <code>zip</code> it with the
inflated fee list, and <code>traverse</code> the <code>step</code> function over the
<em>(contribution, fee)</em> pairs. I leave it to the reader to play
around, if interested.</p>
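<p>As a starting point for the second approach, here is a minimal
sketch. The names <code>stepPair</code>, <code>contributions</code> and <code>modelGrowing</code>
are hypothetical, and the 3% yearly growth in contributions is an
assumption chosen only for illustration.</p>
<pre class="sourceCode haskell"><code class="sourceCode haskell">import Control.Monad.State (State, evalState, get, modify)

-- Like step, but takes the year's (contribution, fee) pair.
stepPair :: (Double, Double) -&gt; State Double Double
stepPair (contrib, fee) = do
  modify (max 0 . subtract contrib)
  modify (/ 1.08)
  modify (+ (fee * 0.95))
  get

-- Contributions growing by an assumed 3% per year from a base amount.
contributions :: Double -&gt; [Double]
contributions base = fmap (* base) (iterate (* 1.03) 1)

-- The fee list must be finite, because the pairs are reversed.
modelGrowing :: Double -&gt; [Double] -&gt; [Integer]
modelGrowing base fees =
  fmap round (reverse (evalState (traverse stepPair (reverse pairs)) 0))
  where
    pairs = zip (contributions base) fees</code></pre>
<p>Because <code>zip</code> stops at the end of the shorter list, the infinite
contribution list is cut to the length of the fee list. You would
call it as <code>modelGrowing 10000 feesCombinedInflated</code>.</p>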
<p>Another obvious area for improvement is that the model is hardcoded
for exactly two children. Enhancing it to <strong>handle different
numbers of children</strong> would be a good exercise. <code>zipWith</code> will no
longer cut it for combining fees. Furthermore, schools can have
diverse discount structures for multiple children (e.g. second child
10% off, 25% for third child, and so on). So the discount structure
should be parameterised. If you want to improve your skill working
with lists in Haskell, this would be a good exercise.</p>
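<p>One possible shape for a solution, offered as a sketch rather than
a finished design: describe the discount structure as a list of
multipliers, sort each year’s fees from most to least expensive, and
pair them up. The names <code>combineYear</code> and <code>combineFees</code> are
hypothetical, and the rule that the most expensive child pays full
price is my assumption, generalised from the two-child case above.</p>
<pre class="sourceCode haskell"><code class="sourceCode haskell">import Data.List (sortOn, transpose)
import Data.Ord (Down (..))

-- Combine one year's fees: the most expensive child pays the first
-- multiplier, the next most expensive the second, and so on.
-- Children beyond the end of the multiplier list pay full price.
combineYear :: [Double] -&gt; [Double] -&gt; Double
combineYear multipliers fees =
  sum (zipWith (*) (multipliers &lt;&gt; repeat 1) (sortOn Down fees))

-- Combine any number of finite per-child fee lists, year by year.
-- Shorter lists are padded with zeroes to the longest length.
combineFees :: [Double] -&gt; [[Double]] -&gt; [Double]
combineFees multipliers children =
  fmap (combineYear multipliers) (transpose (fmap pad children))
  where
    years = maximum (0 : fmap length children)
    pad fees = take years (fees &lt;&gt; repeat 0)

-- Example: the two children from this post, second child 10% off.
feesBase :: [Double]
feesBase = [12000, 12000, 12000, 13500, 13500, 13500]

twoChildren :: [Double]
twoChildren =
  combineFees [1, 0.9] [replicate 4 0 &lt;&gt; feesBase, replicate 7 0 &lt;&gt; feesBase]</code></pre>
<p>With two children this gives the same thirteen yearly figures as
<code>feesCombinedBase</code>, including 24300 in the overlap years. A third
child at 25% off would use the multipliers <code>[1, 0.9, 0.75]</code>.</p>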
<p>Other areas for improvement include dollar-cost averaging the yearly
contribution amount, or step functions for different payment
frequencies (e.g. quarterly / per term). Both of these tasks would
be good <strong>practice with <code>State</code> computations</strong>.</p>]]></summary>
</entry>
<entry>
    <title>haskell-ci how-to: caching and using your program executable</title>
    <link href="https://frasertweedale.github.io/blog-fp/posts/2023-06-04-haskell-ci-use-executable.html" />
    <id>https://frasertweedale.github.io/blog-fp/posts/2023-06-04-haskell-ci-use-executable.html</id>
    <published>2023-06-04T00:00:00Z</published>
    <updated>2023-06-04T00:00:00Z</updated>
    <summary type="html"><![CDATA[<h1 id="haskell-ci-how-to-caching-and-using-your-program-executable"><code>haskell-ci</code> how-to: caching and using your program executable</h1>
<p>In this article I show how to extend the <code>haskell-ci</code> GitHub Actions
workflow to pass the built executable to subsequent jobs.</p>
<h2 id="background">Background <a href="#background" class="section">§</a></h2>
<p>The Haskell <em>Security Response Team</em> recently bootstrapped the
<a href="https://github.com/haskell/security-advisories"><em>haskell/security-advisories</em></a> database.
This repository contains:</p>
<ul>
<li>The security advisories themselves. These are freeform Markdown
files with a TOML header. The header contains various required or
optional fields encoding information about the security issue, the
package it affects, and the affected package versions.</li>
<li>Tools for maintaining the database and exporting the data in
various formats. The <em>hsec-tools</em> Cabal package contains a
library that defines the advisory data format and parsers, and the
<code>hsec-tools</code> executable which is a CLI front-end for processing
advisories.</li>
</ul>
<p>With both tool sources and advisory data in the repository our
<em>continuous integration (CI)</em> pipelines have to do several things:</p>
<ul>
<li><strong>Build and test the tools.</strong> We want to test against several
recent GHC versions (to avoid inconvenience for contributors). We
also perform a Nix build.</li>
<li><strong>Check the validity of the advisory data.</strong> Advisories have to
conform to our schema. We can use <code>hsec-tools</code> to check each
advisory file. We <em>should</em> use the version of <code>hsec-tools</code> from
the same commit, to allow the advisory format and tooling to
evolve in lockstep.</li>
<li><strong>Publish advisories.</strong> We will likely want to set up automation
to publish <a href="https://osv.dev/">OSV</a> streams, a web site, and other relevant
artifacts.</li>
</ul>
<p>The remainder of this post explains how we use <code>haskell-ci</code> and
GitHub Actions reusable workflows to achieve the first two
objectives. The Security Response Team has not yet tackled
<em>publishing</em> but the same techniques should be applicable.</p>
<h2 id="introduction-to-haskell-ci">Introduction to <code>haskell-ci</code> <a href="#introduction-to-haskell-ci" class="section">§</a></h2>
<p><a href="https://github.com/haskell-CI/haskell-ci"><code>haskell-ci</code></a> is a tool that generates CI
workflows for Haskell projects. It supports GitHub Actions
(actively maintained) and Travis-CI (unmaintained), and can also
generate shell scripts for local testing. You can install
<code>haskell-ci</code> via <code>cabal</code>:</p>
<pre class="shell"><code>% cabal install haskell-ci</code></pre>
<p>Alternatively, you can clone the Git repository and build from
there:</p>
<pre class="shell"><code>% git clone https://github.com/haskell-CI/haskell-ci
% cd haskell-ci
% cabal install</code></pre>
<p>Now that <code>haskell-ci</code> is on the <code>PATH</code> you can generate the GitHub
Actions workflow in a couple of steps. First, add the GHC versions
you want to test with to the <a href="https://cabal.readthedocs.io/en/3.6/cabal-package.html#pkg-field-tested-with"><code>tested-with</code></a>
field in your package’s <code>.cabal</code> file:</p>
<pre><code>cabal-version:      2.4
name:               hsec-tools
version:            0.1.0.0
tested-with:
  GHC ==8.10.7 || ==9.0.2 || ==9.2.7 || ==9.4.5 || ==9.6.2
…</code></pre>
<div class="note">
<p>Run <code>haskell-ci list-ghc</code> to see the list of GHC versions it knows
about. <code>haskell-ci</code> updates usually follow soon after GHC releases,
especially major versions.</p>
</div>
<p>Next run <code>haskell-ci github path/to/package.cabal</code>. It will
inspect the <code>.cabal</code> file to see what GHC versions to include in the
build matrix, and write <code>.github/workflows/haskell-ci.yml</code>. Then
commit the changes and push (or create a pull request). For
example:</p>
<pre class="shell"><code>% haskell-ci github code/hsec-tools/hsec-tools.cabal
*INFO* Generating GitHub config for testing for GHC versions: 8.10.7 9.0.2 9.2.7 9.4.5 9.6.2
% git add code .github
% git commit -m &#39;ci: add haskell-ci workflow&#39; --quiet
% git push
…</code></pre>
<h3 id="what-does-the-haskell-ci-workflow-do">What does the <code>haskell-ci</code> workflow do? <a href="#what-does-the-haskell-ci-workflow-do" class="section">§</a></h3>
<p>This post is not the place to belabour the details of GitHub Actions
<a href="https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions">workflow syntax</a>. But I will make a few
observations about the steps in the <code>haskell-ci</code> workflow.</p>
<ul>
<li>The build runs on Ubuntu (version 20.04 at time of writing). You
can tell <code>haskell-ci</code> to install <strong>extra APT packages</strong> via option
<code>--apt "space separated list"</code>. For example, if your package uses
the FFI to bind to a C library, use this option to install that
library and its development headers.</li>
<li>There is a job <em>matrix</em> with a different job for each of the GHC
versions mentioned in the <code>tested-with</code> field.</li>
<li>Each job downloads GHC via <a href="https://www.haskell.org/ghcup/">GHCup</a>, a popular, multi-platform
installation tool for Haskell.</li>
<li>The package under test is not built <em>in situ</em>. Instead, a source
distribution is built using <code>cabal sdist</code>. It is then unpacked in
a different location, and built and tested there. This helps
<strong>detect packaging errors</strong> (e.g. missing extra source or data
files).</li>
<li>The job caches build tools and Haskell dependencies using the
GitHub Actions cache mechanism (discussed later in the post).
This saves time on subsequent test runs.</li>
<li>There is a step that runs <code>cabal check</code>, which checks for issues
that Hackage may complain about if you try to publish your package
there. This could be a mild annoyance for private or toy
projects.</li>
</ul>
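<p>To make the job matrix idea concrete, here is a heavily simplified
sketch. It is <em>not</em> the output of <code>haskell-ci</code> (the generated
workflow is much longer). The runner label, the two matrix entries
and the <code>echo</code> step are assumptions for illustration. Only the job
name <code>linux</code> and the <code>matrix.compiler</code> key come from the real
workflow.</p>
<div class="sourceCode"><pre class="sourceCode yaml"><code class="sourceCode yaml">jobs:
  linux:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        # one job is created per entry in this list
        include:
          - compiler: ghc-9.6.2
          - compiler: ghc-9.4.5
    steps:
      # placeholder step; the real workflow installs GHC, builds and tests
      - run: echo &quot;would build and test with ${{ matrix.compiler }}&quot;</code></pre></div>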
<h3 id="building-documentation">Building documentation <a href="#building-documentation" class="section">§</a></h3>
<p><code>haskell-ci</code> makes it easy to build package documentation during
your CI jobs. All you need to do is use the <code>--haddock</code> option when
creating the workflow, and it will add a step that runs the
<a href="https://haskell-haddock.readthedocs.io/en/latest/">Haddock</a> tool.</p>
<pre class="shell"><code>% haskell-ci github --haddock path/to/package.cabal</code></pre>
<p>The Haddock step (if enabled) runs on every job in the build matrix.
Haddock is part of the GHC toolchain so there are no extra
dependencies.</p>
<h3 id="updating-the-build-matrix">Updating the build matrix <a href="#updating-the-build-matrix" class="section">§</a></h3>
<p>When a new release of GHC comes along, updating the <code>haskell-ci</code>
workflow is as simple as adding it to the <code>tested-with</code> list, then
running:</p>
<pre class="shell"><code>% haskell-ci regenerate
No haskell-ci.sh, skipping bash regeneration
*INFO* Generating GitHub config for testing for GHC versions: 8.10.7 9.0.2 9.2.7 9.4.5 9.6.2
No .travis.yml, skipping travis regeneration</code></pre>
<p><code>haskell-ci regenerate</code> reuses the options from the original
invocation of <code>haskell-ci github</code>. These were recorded in a comment
starting with <code># REGENDATA</code> in <code>haskell-ci.yml</code>. After running
<code>haskell-ci regenerate</code>, all that’s left is to commit and push the
changes.</p>
<h2 id="github-actions-passing-the-executable-between-jobs">GitHub Actions: passing the executable between jobs <a href="#github-actions-passing-the-executable-between-jobs" class="section">§</a></h2>
<p>Now that the <code>haskell-ci</code> job is set up, it will build and test the
package on every push or pull request. We have a further CI use
case: using the built executable to perform additional actions. So
we now turn to the problem of <strong>how to use data produced by the
<code>haskell-ci</code> workflow in other jobs</strong>.</p>
<p>GitHub Actions provides (at least) 3 mechanisms for passing data
between jobs.</p>
<ul>
<li>Jobs can define <a href="https://docs.github.com/en/actions/using-jobs/defining-outputs-for-jobs"><strong><em>outputs</em></strong></a>. They must be Unicode
strings, and each output is limited to 1MB (50MB in total across a
workflow run). Both limitations make this
mechanism unsuitable for passing the built executable to dependent
jobs.</li>
<li>Jobs can <a href="https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows"><strong><em>cache</em></strong></a> dependencies to speed up the
build. But we want to cache the <em>result</em> of the build, which will
often be different from the previous build. It seems to me that we
<em>could</em> use the caching mechanism, but it doesn’t feel like a good
fit.</li>
<li>Jobs can upload build <a href="https://docs.github.com/en/actions/using-workflows/storing-workflow-data-as-artifacts"><strong><em>artifacts</em></strong></a>. This makes
them available to subsequent jobs in the workflow. Unlike caches,
they can also be downloaded by anyone with access to the
repository. This is the appropriate mechanism for our use case.</li>
</ul>
<div class="note">
<p>By default GitHub retains artifacts for 90 days. If this is not
suitable you can <a href="https://docs.github.com/en/actions/using-workflows/storing-workflow-data-as-artifacts#configuring-a-custom-artifact-retention-period">customise the duration</a>.</p>
</div>
<p>We need to add two steps to the <code>linux</code> job. First we install the
<code>hsec-tools</code> executable. It was already built—this just copies it
to a known location. <code>--install-method=copy</code> ensures the executable
is copied to that location, not symlinked.</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode yaml"><code class="sourceCode yaml"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="at">      </span><span class="kw">-</span><span class="at"> </span><span class="fu">name</span><span class="kw">:</span><span class="at"> install executable</span></span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a><span class="at">        </span><span class="fu">if</span><span class="kw">:</span><span class="at"> matrix.compiler == &#39;ghc-9.6.2&#39;</span></span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a><span class="fu">        run</span><span class="kw">: </span><span class="ch">|</span></span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a>          $CABAL v2-install $ARG_COMPILER \</span>
<span id="cb7-5"><a href="#cb7-5" aria-hidden="true" tabindex="-1"></a>            --install-method=copy exe:hsec-tools</span></code></pre></div>
<p>The second step uses the <code>upload-artifact</code> action to archive the
executable. The artifact <em>bundle name</em> includes the commit hash.
The file <em>within the bundle</em> keeps the name <code>hsec-tools</code>.</p>
<div class="sourceCode" id="cb8"><pre class="sourceCode yaml"><code class="sourceCode yaml"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="at">      </span><span class="kw">-</span><span class="at"> </span><span class="fu">name</span><span class="kw">:</span><span class="at"> upload executable</span></span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a><span class="at">        </span><span class="fu">uses</span><span class="kw">:</span><span class="at"> actions/upload-artifact@v3</span></span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a><span class="at">        </span><span class="fu">if</span><span class="kw">:</span><span class="at"> matrix.compiler == &#39;ghc-9.6.2&#39;</span></span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a><span class="at">        </span><span class="fu">with</span><span class="kw">:</span></span>
<span id="cb8-5"><a href="#cb8-5" aria-hidden="true" tabindex="-1"></a><span class="at">          </span><span class="fu">name</span><span class="kw">:</span><span class="at"> hsec-tools-${{ github.sha }}</span></span>
<span id="cb8-6"><a href="#cb8-6" aria-hidden="true" tabindex="-1"></a><span class="at">          </span><span class="fu">path</span><span class="kw">:</span><span class="at"> ~/.cabal/bin/hsec-tools</span></span></code></pre></div>
<div class="note">
<p>All <em>Haskell</em> dependencies are statically linked in the binary. It
does need some system libraries including <em>libgmp</em> and <em>libffi</em>.
But we do not need to preserve the Cabal store or provide the GHC
toolchain when we use the artifact.</p>
</div>
<p>Notice that each of the new steps has the condition:</p>
<div class="sourceCode" id="cb9"><pre class="sourceCode yaml"><code class="sourceCode yaml"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="at">        </span><span class="fu">if</span><span class="kw">:</span><span class="at"> matrix.compiler == &#39;ghc-9.6.2&#39;</span></span></code></pre></div>
<p>The build matrix produces jobs for several different GHC versions.
But we only need one copy of the <code>hsec-tools</code> executable. I’m not
totally happy with this approach because the patch will need
updating as the matrix evolves. But I can live with it for now.</p>
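<p>Returning to artifact retention: if the default of 90 days is longer
than you need, the <code>upload-artifact</code> action accepts a <code>retention-days</code>
input. The following is a sketch of the upload step with that input
added. The value <code>7</code> is an arbitrary assumption for illustration, not
a recommendation.</p>
<div class="sourceCode"><pre class="sourceCode yaml"><code class="sourceCode yaml">      - name: upload executable
        uses: actions/upload-artifact@v3
        if: matrix.compiler == &#39;ghc-9.6.2&#39;
        with:
          name: hsec-tools-${{ github.sha }}
          path: ~/.cabal/bin/hsec-tools
          # assumption: delete the artifact after 7 days instead of 90
          retention-days: 7</code></pre></div>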
<h2 id="github-actions-workflows-and-jobs">GitHub Actions: workflows and jobs <a href="#github-actions-workflows-and-jobs" class="section">§</a></h2>
<p>A repository can define one or more CI <em>workflows</em>. They are
written as YAML files in the <code>.github/workflows/</code> directory.</p>
<p>Each <em>workflow</em> consists of one or more <em>jobs</em>. It is
straightforward to declare dependencies between jobs <em>within a
workflow</em>. But workflows themselves are independent. There is no
reasonable way to specify that a particular workflow depends on the
result or outputs of another workflow.</p>
<p>This means that for our use case we have to create a new <em>job</em>
<strong>within the <code>Haskell-CI</code> workflow</strong>. Because <code>haskell-ci.yml</code> is
generated by the <code>haskell-ci</code> tool we have to patch this file.
Fortunately <code>haskell-ci</code> provides a mechanism to apply specified
patches when generating <code>haskell-ci.yml</code> (shown later in this
article). Unfortunately, defining and maintaining our additional
job(s) as <em>patches</em> to YAML files is even more unpleasant than
dealing with them as plain YAML.</p>
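<p>For reference, a dependency between two jobs <em>within one workflow</em>
is declared with the <code>needs</code> key. Below is a minimal sketch. The
workflow name, job names and <code>echo</code> commands are hypothetical
placeholders.</p>
<div class="sourceCode"><pre class="sourceCode yaml"><code class="sourceCode yaml">name: Example
on: push
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - run: echo &quot;build something&quot;
  test:
    # this job starts only after the build job has succeeded
    needs: build
    runs-on: ubuntu-latest
    steps:
      - run: echo &quot;test something&quot;</code></pre></div>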
<h2 id="github-actions-reusable-workflows">GitHub Actions: reusable workflows <a href="#github-actions-reusable-workflows" class="section">§</a></h2>
<p><a href="https://docs.github.com/en/actions/using-workflows/reusing-workflows"><em>Reusable workflows</em></a> provide a neat
solution. A reusable workflow is defined as a separate YAML file,
just like ordinary workflows. The main differences are:</p>
<ul>
<li>Reusable workflows use the trigger condition <code>workflow_call</code>,
instead of the usual triggers like <code>push</code> or <code>pull_request</code>.</li>
<li>Reusable workflows can be parameterised by <em>inputs</em>. The calling
job provides the values. An input can be required or optional.</li>
</ul>
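<p>As an illustration of those two points, the trigger section of a
reusable workflow might look like the sketch below. The input names
<code>target</code> and <code>verbose</code> are hypothetical. An optional input may
declare a <code>default</code>, which is used when the caller omits it.</p>
<div class="sourceCode"><pre class="sourceCode yaml"><code class="sourceCode yaml">on:
  workflow_call:
    inputs:
      # hypothetical required input: the caller must supply a value
      target:
        required: true
        type: string
      # hypothetical optional input: falls back to the default
      verbose:
        required: false
        type: boolean
        default: false</code></pre></div>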
<p>The main goal of reusable workflows is to enable reuse, like
subroutines in programming. Our use case is a bit different. We
will define the <em>check-advisories</em> behaviour as a reusable workflow.
Although we will not be using it from multiple places, it still
gives us several advantages:</p>
<ul>
<li>Separation of concerns: checking the advisories uses an artifact
from the <code>haskell-ci</code> build/test job, but it’s a distinct task.</li>
<li>Maintainability: the behaviour is specified in an ordinary
workflow YAML file. We do not need to edit patch files to modify
the workflow.</li>
<li>We minimise the size and complexity of the patch to be applied to
<code>haskell-ci.yml</code>. The patch itself should rarely change, even if
the reusable workflow definition changes.</li>
</ul>
<h2 id="defining-the-check-advisories-workflow">Defining the <em>check-advisories</em> workflow <a href="#defining-the-check-advisories-workflow" class="section">§</a></h2>
<p>The <em>check-advisories</em> workflow is defined in
<code>.github/workflows/check-advisories.yml</code>. The full content is
below, with commentary.</p>
<div class="sourceCode" id="cb10"><pre class="sourceCode yaml"><code class="sourceCode yaml"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="fu">name</span><span class="kw">:</span><span class="at"> Check security advisories</span></span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a><span class="fu">on</span><span class="kw">:</span></span>
<span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a><span class="at">  </span><span class="fu">workflow_call</span><span class="kw">:</span></span>
<span id="cb10-4"><a href="#cb10-4" aria-hidden="true" tabindex="-1"></a><span class="at">    </span><span class="fu">inputs</span><span class="kw">:</span></span>
<span id="cb10-5"><a href="#cb10-5" aria-hidden="true" tabindex="-1"></a><span class="at">      </span><span class="fu">artifact-name</span><span class="kw">:</span></span>
<span id="cb10-6"><a href="#cb10-6" aria-hidden="true" tabindex="-1"></a><span class="at">        </span><span class="fu">required</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
<span id="cb10-7"><a href="#cb10-7" aria-hidden="true" tabindex="-1"></a><span class="at">        </span><span class="fu">type</span><span class="kw">:</span><span class="at"> string</span></span></code></pre></div>
<p>The <code>workflow_call</code> trigger condition establishes it as a reusable
workflow. We also define the <code>artifact-name</code> input. The caller is
<code>required</code> to provide it.</p>
<div class="sourceCode" id="cb11"><pre class="sourceCode yaml"><code class="sourceCode yaml"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="fu">jobs</span><span class="kw">:</span></span>
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a><span class="at">  </span><span class="fu">check-advisories</span><span class="kw">:</span></span>
<span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a><span class="at">    </span><span class="fu">runs-on</span><span class="kw">:</span><span class="at"> ubuntu-20.04</span></span>
<span id="cb11-4"><a href="#cb11-4" aria-hidden="true" tabindex="-1"></a><span class="at">    </span><span class="fu">steps</span><span class="kw">:</span></span>
<span id="cb11-5"><a href="#cb11-5" aria-hidden="true" tabindex="-1"></a><span class="at">      </span><span class="kw">-</span><span class="at"> </span><span class="fu">uses</span><span class="kw">:</span><span class="at"> actions/checkout@v3</span></span>
<span id="cb11-6"><a href="#cb11-6" aria-hidden="true" tabindex="-1"></a><span class="at">        </span><span class="fu">with</span><span class="kw">:</span></span>
<span id="cb11-7"><a href="#cb11-7" aria-hidden="true" tabindex="-1"></a><span class="at">          </span><span class="fu">path</span><span class="kw">:</span><span class="at"> source</span></span></code></pre></div>
<p>The workflow has a single job called <code>check-advisories</code>. As usual
the first step is to check out the repository.</p>
<div class="sourceCode" id="cb12"><pre class="sourceCode yaml"><code class="sourceCode yaml"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="at">      </span><span class="kw">-</span><span class="at"> </span><span class="fu">run</span><span class="kw">:</span><span class="at"> mkdir -p ~/.local/bin</span></span>
<span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a><span class="at">      </span><span class="kw">-</span><span class="at"> </span><span class="fu">id</span><span class="kw">:</span><span class="at"> download</span></span>
<span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a><span class="at">        </span><span class="fu">uses</span><span class="kw">:</span><span class="at"> actions/download-artifact@v3</span></span>
<span id="cb12-4"><a href="#cb12-4" aria-hidden="true" tabindex="-1"></a><span class="at">        </span><span class="fu">with</span><span class="kw">:</span></span>
<span id="cb12-5"><a href="#cb12-5" aria-hidden="true" tabindex="-1"></a><span class="at">          </span><span class="fu">name</span><span class="kw">:</span><span class="at"> ${{ inputs.artifact-name }}</span></span>
<span id="cb12-6"><a href="#cb12-6" aria-hidden="true" tabindex="-1"></a><span class="at">          </span><span class="fu">path</span><span class="kw">:</span><span class="at"> ~/.local/bin</span></span>
<span id="cb12-7"><a href="#cb12-7" aria-hidden="true" tabindex="-1"></a><span class="at">      </span><span class="kw">-</span><span class="at"> </span><span class="fu">run</span><span class="kw">:</span><span class="at"> chmod +x ~/.local/bin/hsec-tools</span></span></code></pre></div>
<p>Next we download the <code>hsec-tools</code> artifact to <code>~/.local/bin</code>, which
is in the <code>PATH</code>. Then we <code>chmod</code> it to make it executable.</p>
<div class="sourceCode" id="cb13"><pre class="sourceCode yaml"><code class="sourceCode yaml"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a><span class="at">      </span><span class="kw">-</span><span class="at"> </span><span class="fu">name</span><span class="kw">:</span><span class="at"> run checks</span></span>
<span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a><span class="fu">        run</span><span class="kw">: </span><span class="ch">|</span></span>
<span id="cb13-3"><a href="#cb13-3" aria-hidden="true" tabindex="-1"></a>          cd source</span>
<span id="cb13-4"><a href="#cb13-4" aria-hidden="true" tabindex="-1"></a>          RESULT=0</span>
<span id="cb13-5"><a href="#cb13-5" aria-hidden="true" tabindex="-1"></a>          while read FILE ; do</span>
<span id="cb13-6"><a href="#cb13-6" aria-hidden="true" tabindex="-1"></a>            echo -n &quot;$FILE: &quot;</span>
<span id="cb13-7"><a href="#cb13-7" aria-hidden="true" tabindex="-1"></a>            hsec-tools check &lt; &quot;$FILE&quot; || RESULT=1</span>
<span id="cb13-8"><a href="#cb13-8" aria-hidden="true" tabindex="-1"></a>          done &lt; &lt;(find advisories EXAMPLE_ADVISORY.md -name &quot;*.md&quot;)</span>
<span id="cb13-9"><a href="#cb13-9" aria-hidden="true" tabindex="-1"></a>          exit $RESULT</span></code></pre></div>
<p>Finally we <code>find</code> all the advisory files and run <code>hsec-tools check</code>
on each one. If any of the checks fail the whole job fails (after
checking each file—we don’t want to short-circuit).</p>
<h2 id="calling-the-check-advisories-workflow">Calling the <em>check-advisories</em> workflow <a href="#calling-the-check-advisories-workflow" class="section">§</a></h2>
<p>Add a new job to the <code>haskell-ci.yml</code> workflow. It must be a
<strong>separate job</strong>, not a <em>step</em> of the existing <code>linux</code> job.</p>
<div class="sourceCode" id="cb14"><pre class="sourceCode yaml"><code class="sourceCode yaml"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a><span class="at">  </span><span class="fu">check-advisories</span><span class="kw">:</span></span>
<span id="cb14-2"><a href="#cb14-2" aria-hidden="true" tabindex="-1"></a><span class="at">    </span><span class="fu">name</span><span class="kw">:</span><span class="at"> Invoke check-advisories workflow</span></span>
<span id="cb14-3"><a href="#cb14-3" aria-hidden="true" tabindex="-1"></a><span class="at">    </span><span class="fu">needs</span><span class="kw">:</span><span class="at"> linux</span></span>
<span id="cb14-4"><a href="#cb14-4" aria-hidden="true" tabindex="-1"></a><span class="at">    </span><span class="fu">uses</span><span class="kw">:</span><span class="at"> ./.github/workflows/check-advisories.yml</span></span>
<span id="cb14-5"><a href="#cb14-5" aria-hidden="true" tabindex="-1"></a><span class="at">    </span><span class="fu">with</span><span class="kw">:</span></span>
<span id="cb14-6"><a href="#cb14-6" aria-hidden="true" tabindex="-1"></a><span class="at">      </span><span class="fu">artifact-name</span><span class="kw">:</span><span class="at"> hsec-tools-${{ github.sha }}</span></span></code></pre></div>
<p>The meaning of the fields is as follows:</p>
<ul>
<li><strong><code>uses</code></strong>: <em>calls</em> the <code>check-advisories.yml</code> workflow.</li>
<li><strong><code>with</code></strong>: specifies values for the inputs, which in our
case is the <code>artifact-name</code>.</li>
<li><strong><code>needs</code></strong>: expresses the dependency on the <code>linux</code> job.</li>
</ul>
<div class="note">
<p>You can call workflows defined in other repositories. For example:</p>
<div class="sourceCode" id="cb15"><pre class="sourceCode yaml"><code class="sourceCode yaml"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a><span class="fu">uses</span><span class="kw">:</span><span class="at"> user-or-org/repo/.github/workflows/workflow.yml@v1</span></span></code></pre></div>
</div>
<h2 id="patching-haskell-ci.yml">Patching <code>haskell-ci.yml</code> <a href="#patching-haskell-ci.yml" class="section">§</a></h2>
<p>At this stage I have committed the <code>check-advisories.yml</code> reusable
workflow. I also have <em>uncommitted changes</em> to <code>haskell-ci.yml</code>:</p>
<div class="sourceCode" id="cb16"><pre class="sourceCode diff"><code class="sourceCode diff"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a><span class="kw">diff --git a/.github/workflows/haskell-ci.yml b/.github/workflows/haskell-ci.yml</span></span>
<span id="cb16-2"><a href="#cb16-2" aria-hidden="true" tabindex="-1"></a>index d51bb64..7ff8684 100644</span>
<span id="cb16-3"><a href="#cb16-3" aria-hidden="true" tabindex="-1"></a><span class="dt">--- a/.github/workflows/haskell-ci.yml</span></span>
<span id="cb16-4"><a href="#cb16-4" aria-hidden="true" tabindex="-1"></a><span class="dt">+++ b/.github/workflows/haskell-ci.yml</span></span>
<span id="cb16-5"><a href="#cb16-5" aria-hidden="true" tabindex="-1"></a><span class="dt">@@ -224,3 +224,19 @@ jobs:</span></span>
<span id="cb16-6"><a href="#cb16-6" aria-hidden="true" tabindex="-1"></a>         with:</span>
<span id="cb16-7"><a href="#cb16-7" aria-hidden="true" tabindex="-1"></a>           key: ${{ runner.os }}-${{ matrix.compiler }}-${{ github.sha }}</span>
<span id="cb16-8"><a href="#cb16-8" aria-hidden="true" tabindex="-1"></a>           path: ~/.cabal/store</span>
<span id="cb16-9"><a href="#cb16-9" aria-hidden="true" tabindex="-1"></a><span class="va">+      - name: install executable</span></span>
<span id="cb16-10"><a href="#cb16-10" aria-hidden="true" tabindex="-1"></a><span class="va">+        if: matrix.compiler == &#39;ghc-9.6.2&#39;</span></span>
<span id="cb16-11"><a href="#cb16-11" aria-hidden="true" tabindex="-1"></a><span class="va">+        run: |</span></span>
<span id="cb16-12"><a href="#cb16-12" aria-hidden="true" tabindex="-1"></a><span class="va">+          $CABAL v2-install $ARG_COMPILER \</span></span>
<span id="cb16-13"><a href="#cb16-13" aria-hidden="true" tabindex="-1"></a><span class="va">+            --install-method=copy exe:hsec-tools</span></span>
<span id="cb16-14"><a href="#cb16-14" aria-hidden="true" tabindex="-1"></a><span class="va">+      - name: upload executable</span></span>
<span id="cb16-15"><a href="#cb16-15" aria-hidden="true" tabindex="-1"></a><span class="va">+        uses: actions/upload-artifact@v3</span></span>
<span id="cb16-16"><a href="#cb16-16" aria-hidden="true" tabindex="-1"></a><span class="va">+        if: matrix.compiler == &#39;ghc-9.6.2&#39;</span></span>
<span id="cb16-17"><a href="#cb16-17" aria-hidden="true" tabindex="-1"></a><span class="va">+        with:</span></span>
<span id="cb16-18"><a href="#cb16-18" aria-hidden="true" tabindex="-1"></a><span class="va">+          name: hsec-tools-${{ github.sha }}</span></span>
<span id="cb16-19"><a href="#cb16-19" aria-hidden="true" tabindex="-1"></a><span class="va">+          path: ~/.cabal/bin/hsec-tools</span></span>
<span id="cb16-20"><a href="#cb16-20" aria-hidden="true" tabindex="-1"></a><span class="va">+  check-advisories:</span></span>
<span id="cb16-21"><a href="#cb16-21" aria-hidden="true" tabindex="-1"></a><span class="va">+    name: Invoke check-advisories workflow</span></span>
<span id="cb16-22"><a href="#cb16-22" aria-hidden="true" tabindex="-1"></a><span class="va">+    needs: linux</span></span>
<span id="cb16-23"><a href="#cb16-23" aria-hidden="true" tabindex="-1"></a><span class="va">+    uses: ./.github/workflows/check-advisories.yml</span></span>
<span id="cb16-24"><a href="#cb16-24" aria-hidden="true" tabindex="-1"></a><span class="va">+    with:</span></span>
<span id="cb16-25"><a href="#cb16-25" aria-hidden="true" tabindex="-1"></a><span class="va">+      artifact-name: hsec-tools-${{ github.sha }}</span></span></code></pre></div>
<p>We could commit these changes <em>as is</em>, but they will be lost the
next time we run <code>haskell-ci regenerate</code>. Instead create a <em>patch</em>
file:</p>
<pre class="shell"><code>% git diff &gt; .github/haskell-ci.patch</code></pre>
<p>Then tell <code>haskell-ci</code> to apply the patch when (re)generating
<code>haskell-ci.yml</code>. What I would <em>like to do</em> is run:</p>
<pre class="shell"><code>% haskell-ci regenerate \
    --github-patches .github/haskell-ci.patch</code></pre>
<p>The above command regenerates the <code>haskell-ci.yml</code> and correctly
applies our patch. But it <strong>does not add the new arguments to the
<code>REGENDATA</code> line</strong>. As a consequence, subsequent executions of
<code>haskell-ci regenerate</code> will not apply the patch unless you use the
<code>--github-patches</code> option every time. This is not what we want, and
possibly a bug (I will investigate further, but not today).</p>
<p><strong>The workaround</strong>: manually edit <code>haskell-ci.yml</code>, inserting
<code>"--github-patches",".github/haskell-ci.patch"</code> in the <code>REGENDATA</code>
line. As a result of that change, running <code>haskell-ci regenerate</code>
without extra arguments applies the patch.</p>
<p>The final step is to <strong>commit the patch file</strong> together with the
updated <code>haskell-ci.yml</code>.</p>
<h2 id="final-words">Final words <a href="#final-words" class="section">§</a></h2>
<p>In this article I showed how to use <code>haskell-ci</code> to generate a
GitHub Actions workflow for testing Haskell projects. I also
demonstrated how to extend the <code>haskell-ci</code> workflow to save a built
executable as an artifact, which can then be used by other CI jobs.</p>
<p>I hope it has been a useful article, both for people starting out
and wondering how to test their Haskell projects, and for
projects with more advanced CI workflows.</p>
<p>One area I would like to investigate further is how to skip the
<code>haskell-ci</code> workflow when the tool code did not change, for
example when someone submits a pull request that adds or updates an
advisory but does not touch the <code>hsec-tools</code> code. Artifacts and cache
entries have a name or key. Right now we use the Git <em>commit</em> hash
in the artifact name. Perhaps we could use the Git <em>tree</em> hash of
the <code>code/hsec-tools</code> directory instead:</p>
<pre class="shell"><code>% git rev-parse HEAD:code/hsec-tools
a08aa5a2ee93ed09ec0025809226571969e24e3d</code></pre>
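<p>As a rough, untested sketch of the naming part of that idea, a step
could compute the tree hash and expose it as a step output, which the
upload step then uses in the artifact name. The step id <code>tree</code> and
the output name <code>hash</code> are hypothetical, and the sketch assumes the
step runs inside the Git checkout.</p>
<div class="sourceCode"><pre class="sourceCode yaml"><code class="sourceCode yaml">      - name: compute tree hash
        id: tree
        # assumption: the working directory is the repository checkout
        run: |
          echo &quot;hash=$(git rev-parse HEAD:code/hsec-tools)&quot; &gt;&gt; &quot;$GITHUB_OUTPUT&quot;
      - name: upload executable
        uses: actions/upload-artifact@v3
        with:
          # name the artifact after the tree hash, not the commit hash
          name: hsec-tools-${{ steps.tree.outputs.hash }}
          path: ~/.cabal/bin/hsec-tools</code></pre></div>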
<p>Uploading the artifact with a name based on the tree hash seems
straightforward. The bigger challenge is how to skip the <code>linux</code>
jobs when the artifact for the current <code>hsec-tools</code> tree already
exists. And how to <em>not</em> skip the <code>check-advisories</code> job, even
though it depends on the <code>linux</code> jobs. I think it’s probably
possible. But it’s a <em>nice-to-have</em>; this yak’s haircut will have
to wait for another day.</p>]]></summary>
</entry>
<entry>
    <title>Haskell FFI call safety and garbage collection</title>
    <link href="https://frasertweedale.github.io/blog-fp/posts/2022-09-23-ffi-safety-and-gc.html" />
    <id>https://frasertweedale.github.io/blog-fp/posts/2022-09-23-ffi-safety-and-gc.html</id>
    <published>2022-09-23T00:00:00Z</published>
    <updated>2022-09-23T00:00:00Z</updated>
    <summary type="html"><![CDATA[<h1 id="haskell-ffi-call-safety-and-garbage-collection">Haskell FFI call safety and garbage collection</h1>
<p>The Haskell <em>Foreign Function Interface (FFI)</em> lets you interface
with code written in other languages, including C. Some kinds of
foreign calls—such as those that could call back into Haskell
code—require the GHC <em>runtime system (RTS)</em> to do some bookkeeping.
This bookkeeping has a performance cost, so there is a mechanism to
opt out of it for foreign calls that can’t call back into Haskell. This
mechanism is called the <em>safety level</em>. There are two levels:</p>
<ul>
<li><strong><code>safe</code></strong>: do the bookkeeping; callbacks are safe</li>
<li><strong><code>unsafe</code></strong>: skip the bookkeeping; callbacks have undefined
behaviour</li>
</ul>
<p>But beware! Besides callback safety, there are other situations
that require a <code>safe</code> foreign call. And some that may require an
<code>unsafe</code> call (not just for performance). <span class="abstract">In this post I explain
the garbage collection behaviour of <code>safe</code> and <code>unsafe</code> foreign
calls, and describe how the wrong choice led to a nasty deadlock bug
in <em>hs-notmuch</em>.</span></p>
<h2 id="foreign-imports">Foreign imports <a href="#foreign-imports" class="section">§</a></h2>
<p><a href="https://www.haskell.org/onlinereport/haskell2010/haskellch8.html">Chapter 8 of the Haskell 2010 Language Report</a> specifies the
foreign function interface syntax and semantics. A <code>foreign import</code>
declaration creates a Haskell binding to a foreign function or
value:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a>foreign <span class="kw">import</span> ccall unsafe &quot;notmuch.h notmuch_database_open&quot;</span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>  notmuch_database_open</span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="ot">    ::</span> <span class="dt">CString</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a>    <span class="ot">-&gt;</span> <span class="dt">CInt</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a>    <span class="ot">-&gt;</span> <span class="dt">Ptr</span> (<span class="dt">Ptr</span> <span class="dt">DatabaseHandle</span>)</span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a>    <span class="ot">-&gt;</span> <span class="dt">IO</span> <span class="dt">CInt</span></span></code></pre></div>
<p>You can see that the <code>foreign import</code> declaration contains:</p>
<ul>
<li>the safety declaration (<code>unsafe</code>)</li>
<li>a reference to the C header and symbol to be imported</li>
<li>a name for the function on the Haskell side</li>
<li>a type annotation, which corresponds to the C type signature</li>
</ul>
<p>If you need a <code>safe</code> foreign call, write <code>safe</code> or just omit the
safety declaration (<code>safe</code> is the default).</p>
<div class="note">
<p><code>notmuch_database_open</code> is a C <em>double-pointer style constructor</em>.
The arguments are the filesystem path (<code>CString</code>), a mode enum
(<code>CInt</code>) and a location to write the pointer to the database handle
upon success (<code>Ptr (Ptr DatabaseHandle)</code>). The return value is <code>0</code>
on success or a nonzero error code (<code>CInt</code>).</p>
</div>
<h2 id="finalizers">Finalizers <a href="#finalizers" class="section">§</a></h2>
<p>Haskell is a garbage collected language. It is possible to use the
garbage collector to clean up objects that were allocated in foreign
calls, when they are no longer referenced. The clean up functions
are called <em>finalizers</em>. Often, finalizers are themselves imported
from the foreign library:</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>foreign <span class="kw">import</span> ccall &quot;notmuch.h &amp;notmuch_database_destroy&quot;</span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a><span class="ot">  notmuch_database_destroy ::</span> <span class="dt">FinalizerPtr</span> <span class="dt">DatabaseHandle</span></span></code></pre></div>
<p>The ampersand (<code>&amp;</code>) denotes that we are importing a <em>function
pointer</em> rather than the function itself.</p>
<p><code>FinalizerPtr</code> is a type synonym defined in the
<a href="https://hackage.haskell.org/package/base-4.16.2.0/docs/Foreign-ForeignPtr.html#t:FinalizerPtr"><code>Foreign.ForeignPtr</code></a> module:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">FinalizerPtr</span> a <span class="ot">=</span> <span class="dt">FunPtr</span> (<span class="dt">Ptr</span> a <span class="ot">-&gt;</span> <span class="dt">IO</span> ())</span></code></pre></div>
<p>This arises from the usual definition of a destructor or <code>free</code>
function. That is, a void function whose single argument is the
pointer to the object to be destroyed, or memory to be freed.</p>
<p>Programs need to associate finalizers with the objects they are to
clean up. The function to do this is
<a href="https://hackage.haskell.org/package/base-4.16.2.0/docs/Foreign-ForeignPtr.html#v:newForeignPtr"><code>newForeignPtr</code></a>:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>newForeignPtr</span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a><span class="ot">  ::</span> <span class="dt">FinalizerPtr</span> a <span class="ot">-&gt;</span> <span class="dt">Ptr</span> a <span class="ot">-&gt;</span> <span class="dt">IO</span> (<span class="dt">ForeignPtr</span> a)</span></code></pre></div>
<div class="note">
<p>A <code>ForeignPtr a</code> can have multiple (or zero) finalizers. Use cases
for multiple finalizers are uncommon.</p>
</div>
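<p>As a small self-contained illustration, the sketch below attaches a finalizer to memory obtained from <code>malloc</code>. It uses <code>finalizerFree</code> from <code>Foreign.Marshal.Alloc</code> in place of a destructor imported from a foreign library. The name <code>newCell</code> is made up for this example.</p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell">import Foreign.C.Types (CInt)
import Foreign.ForeignPtr (ForeignPtr, newForeignPtr, withForeignPtr)
import Foreign.Marshal.Alloc (finalizerFree, malloc)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peek, poke)

-- Allocate a CInt with malloc and hand ownership to the garbage
-- collector.  finalizerFree is a FinalizerPtr that calls free().
newCell :: CInt -&gt; IO (ForeignPtr CInt)
newCell n = do
  p &lt;- malloc :: IO (Ptr CInt)
  poke p n
  newForeignPtr finalizerFree p

main :: IO ()
main = do
  cell &lt;- newCell 42
  -- withForeignPtr keeps the object alive while the raw pointer is in use.
  withForeignPtr cell peek &gt;&gt;= print
  -- No explicit free: the finalizer runs some time after `cell`
  -- becomes unreachable (or at program exit).</code></pre></div>
<p>The program never says <em>when</em> the memory is released. That timing is left to the garbage collector.</p>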
<h2 id="ffi-safety-and-garbage-collection">FFI safety and garbage collection <a href="#ffi-safety-and-garbage-collection" class="section">§</a></h2>
<p>Consider the wording of the Haskell 2010 FFI chapter:</p>
<blockquote>
<p>A <code>safe</code> call … guarantees to leave the Haskell system in a state
that allows callbacks from the external code. In contrast, an
<code>unsafe</code> call, while carrying less overhead, must not trigger a
callback into the Haskell system. If it does, the system behaviour
is undefined. … Note that a callback into the Haskell system
implies that a garbage collection might be triggered after an
external entity was called, but before this call returns.</p>
</blockquote>
<p>This says that garbage collection can occur during a <code>safe</code> call.
But it <em>does not say</em> whether GC is allowed, or not, during an
<code>unsafe</code> call. It is up to implementations to decide what to do.</p>
<p>GHC’s behaviour here has changed over time. Since version 8.4, GHC
<em>guarantees</em> that <strong>garbage collection will never occur during an
<code>unsafe</code> FFI call.</strong> This guarantee allows <code>unsafe</code> FFI calls to
work with heap-allocated data, which enables some performance
optimisations.</p>
<div class="note">
<p>The <a href="https://downloads.haskell.org/ghc/9.4.1/docs/users_guide/exts/ffi.html#guaranteed-call-safety">GHC users guide</a> has a more thorough treatment of
this topic. It also mentions important details about threading
and the FFI, among other things.</p>
</div>
<h2 id="crouching-gc-hidden-deadlock">Crouching GC, hidden deadlock <a href="#crouching-gc-hidden-deadlock" class="section">§</a></h2>
<p>We have discussed the FFI, finalizers, foreign call (un)safety and
garbage collection. What’s it all coming to?</p>
<p>The earlier foreign import examples are from
<a href="https://hackage.haskell.org/package/notmuch"><em>hs-notmuch</em></a>, my Haskell binding to the
<a href="https://notmuchmail.org/"><em>notmuch</em></a> mail indexer. Note the following:</p>
<ul>
<li><p><code>notmuch_database_open</code> is an <code>unsafe</code> foreign call (because it
doesn’t call back into Haskell and I don’t want the bookkeeping
overhead).</p></li>
<li><p><code>notmuch_database_destroy</code> is a finalizer that closes the database
and frees resources. The garbage collector schedules the
finalizer when the database handle is no longer in use.</p></li>
<li><p>Wrapper code in <em>hs-notmuch</em> uses <code>newForeignPtr</code> to associate
the <code>notmuch_database_destroy</code> finalizer with the pointers created
by <code>notmuch_database_open</code>.</p></li>
<li><p>The finalizer (called after GC) is the <em>only way</em> to close a
database handle. The <em>hs-notmuch</em> API does not offer an explicit
close function.</p></li>
</ul>
<p>An application could attempt to open a database multiple times.
This might be intentional. Or it could occur when there is an
unreferenced database handle whose finalizer has not yet been
executed.</p>
<p><em>libnotmuch</em> uses locks to prevent multiple read-write sessions to a
single database. <code>notmuch_database_open</code> blocks if the lock is
already held. In the case of <em>accidental</em> multiple open this isn’t
a problem because GC will eventually occur, finalizers will run and
the lock will be released.</p>
<p><strong>Except it won’t, because GHC prevents garbage collection during
<code>unsafe</code> foreign calls.</strong> As a result, the program deadlocks.
Non-deterministically.</p>
<p>This bug went unnoticed for a long time. It was <a href="https://github.com/purebred-mua/purebred/issues/468">eventually
detected</a> by <a href="https://github.com/purebred-mua/purebred"><em>purebred</em></a>’s automated user acceptance
tests, which perform many user actions very quickly. (<em>purebred</em> is
a mail program that uses <em>hs-notmuch</em>.) Whether deadlock is likely
to occur depends very much on the application and/or user behaviour.</p>
<p>Fortunately, the fix was simple: make <code>notmuch_database_open</code> a
<code>safe</code> foreign call. Opening the database would typically be an
infrequent operation so the bookkeeping overhead is tolerable.</p>
<h2 id="conclusion">Conclusion <a href="#conclusion" class="section">§</a></h2>
<p>This post discussed the FFI, finalizers, and GHC’s garbage
collection behaviour (or lack thereof) during <code>safe</code> and <code>unsafe</code>
foreign calls. I used a deadlock bug in a foreign binding library
as a case study of this behaviour.</p>
<p>The folk wisdom regarding <code>safe</code> versus <code>unsafe</code> foreign calls
mainly deals with callbacks and performance overheads. I have
rarely seen garbage collection mentioned. This is unfortunate
because the GC behaviour is critical to program safety and
correctness (as the case study proves). Resources (wiki pages, blog
posts, etc) that discuss FFI call safety but fail to mention the GC
behaviour of <code>safe</code> versus <code>unsafe</code> should be updated.</p>
<p>With these things in mind, here are my recommendations for Haskell
programmers working with the FFI:</p>
<ul>
<li><p>If a foreign function could call back into Haskell code, it must
be <code>safe</code>.</p></li>
<li><p>If a foreign call might block, it probably needs to be <code>safe</code>
(unless you are certain about what you are doing).</p></li>
<li><p>If you are unsure about whether a foreign call could block (or
why), make it <code>safe</code>.</p></li>
</ul>
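<p>To make the distinction concrete, here is a sketch that imports two C library functions with different safety levels. It assumes a POSIX system where <code>sin</code> and <code>sleep</code> are available to the linker. The Haskell-side names <code>c_sin</code> and <code>c_sleep</code> are illustrative only.</p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell">{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Foreign.C.Types (CDouble (..), CUInt (..))

-- Cheap, never blocks, never calls back into Haskell:
-- a reasonable candidate for an unsafe call.
foreign import ccall unsafe &quot;math.h sin&quot;
  c_sin :: CDouble -&gt; CDouble

-- Blocks the calling thread, so keep it safe.  The garbage collector
-- (and therefore finalizers) can still run while it waits.
foreign import ccall safe &quot;unistd.h sleep&quot;
  c_sleep :: CUInt -&gt; IO CUInt

main :: IO ()
main = do
  print (c_sin 0)
  _ &lt;- c_sleep 1
  putStrLn &quot;done&quot;</code></pre></div>
<p>The only difference in the declarations is one keyword, which is why the choice is easy to get wrong and easy to fix.</p>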
<p>In fact, it’s fine to make every foreign import <code>safe</code> unless:</p>
<ul>
<li><p>You need to guarantee that heap-allocated objects (e.g. unpinned
<code>ByteArray#</code>) will not move during the foreign call, or</p></li>
<li><p>The bookkeeping overhead is a real performance issue (e.g. C-style
<code>_valid()</code>/<code>_get()</code>/<code>_next()</code> iterators, calls in tight loops).</p></li>
</ul>
<p>Doing so might deliver you from debugging a non-deterministic
deadlock.</p>]]></summary>
</entry>
<entry>
    <title>Writing tests for GHC</title>
    <link href="https://frasertweedale.github.io/blog-fp/posts/2022-05-31-ghc-test-suite.html" />
    <id>https://frasertweedale.github.io/blog-fp/posts/2022-05-31-ghc-test-suite.html</id>
    <published>2022-05-31T00:00:00Z</published>
    <updated>2022-05-31T00:00:00Z</updated>
    <summary type="html"><![CDATA[<h1 id="writing-tests-for-ghc">Writing tests for GHC</h1>
<p>In this post I explain how to write functional tests for GHC, with
examples.</p>
<h2 id="overview-of-the-ghc-test-suite">Overview of the GHC test suite <a href="#overview-of-the-ghc-test-suite" class="section">§</a></h2>
<p>The <em>Glasgow Haskell Compiler (GHC)</em> is a huge project. It
includes:</p>
<ul>
<li>a Haskell compiler (parser, simplifier, codegen, etc)</li>
<li>the runtime system (GC, thread scheduler, STM, etc)</li>
<li>GHCi (interactive interpreter / REPL)</li>
<li>bundled libraries (<em>base</em>, <em>template-haskell</em>, <em>ghc-prim</em>, etc)</li>
<li>build tooling (Makefiles, <a href="https://gitlab.haskell.org/ghc/ghc/-/wikis/building/hadrian">Hadrian</a>)</li>
<li>the users guide</li>
<li>a test suite</li>
</ul>
<p>The test suite should test all the functional parts of GHC. There
are different kinds of tests:</p>
<ul>
<li><strong>Testing the compiler.</strong> The test suite includes Haskell program
sources. A test can assert that the program compiles, or that
compilation failure is expected.</li>
<li><strong>Testing the resulting programs.</strong> Does the behaviour of a
program match expectations?</li>
<li><strong>Testing the bundled libraries.</strong> This is conceptually distinct
from the preceding point. In practice it can be achieved in
the same way.</li>
<li><strong>Performance tests</strong> ensure that the performance of the
compiler, and of compiled programs, does not regress. This is a
complex topic and I won’t discuss it further in this post. The
GHC wiki has <a href="https://gitlab.haskell.org/ghc/ghc/-/wikis/building/running-tests/performance-tests">a good introduction</a>.</li>
</ul>
<p>The GHC GitLab instance runs a <strong>continuous integration (CI)</strong>
pipeline for all merge requests. It builds GHC and runs the test
suite on a variety of target architectures and operating systems.
Here’s an <a href="https://gitlab.haskell.org/frasertweedale/ghc/-/pipelines/51652">example for one of my merge requests</a>.</p>
<p>The test suite driver is implemented in Python. Tests are described
in terms of its library interface. I won’t discuss the
<em>implementation</em> of the test driver—I just want to show you how to
use it.</p>
<h2 id="writing-tests">Writing tests <a href="#writing-tests" class="section">§</a></h2>
<p>In the GHC source repository, tests are organised hierarchically
under <code>testsuite/tests/</code>. Bundled libraries can also supply tests.
For example, some tests for <em>base</em> sit under
<code>libraries/base/tests/</code>. <code>.T</code> files describe the tests, using the
Python DSL. Usually the name <code>all.T</code> is used. The source code for
each test lives alongside the <code>.T</code> file.</p>
<h3 id="the-test-driver-dsl">The test driver DSL <a href="#the-test-driver-dsl" class="section">§</a></h3>
<p>Each test is described in a <code>.T</code> file by an invocation of the <code>test</code>
function:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a>test(<span class="op">&lt;</span>name<span class="op">&gt;</span>, <span class="op">&lt;</span>setup<span class="op">&gt;</span>, <span class="op">&lt;</span>test<span class="op">-</span>fn<span class="op">&gt;</span>, <span class="op">&lt;</span>args<span class="op">&gt;</span>)</span></code></pre></div>
<ul>
<li><code>&lt;name&gt;</code> is the <strong>name of the test</strong> source file (without file
extension), or the directory containing the source code (for
multi-module builds). Often, the name is based on an issue
number.</li>
<li><code>&lt;setup&gt;</code> is a function or list of functions that affect <strong>when or
how to run the test program</strong>. For example, you can set extra
program arguments, declare the expected exit status, or supply a
predicate for skipping the test. The GHC wiki gives a <a href="https://gitlab.haskell.org/ghc/ghc/-/wikis/building/running-tests/adding#the-setup-field">full
list</a> with descriptions.</li>
<li><code>&lt;test-fn&gt;</code> specifies <strong>how to compile</strong> the test program, and
<strong>whether to run</strong> the resulting program. Common values include
<code>compile</code>, <code>compile_fail</code> (compilation failure expected) and
<code>compile_and_run</code> (run the resulting program). There are <a href="https://gitlab.haskell.org/ghc/ghc/-/wikis/building/running-tests/adding#the-test-fn-field">several
other options</a> such as for multi-module builds, and
GHCi sessions.</li>
<li><code>&lt;args&gt;</code> specifies extra <strong>arguments for GHC</strong> when compiling the
test program. It also has other use patterns depending on the
value of <code>&lt;test-fn&gt;</code>.</li>
</ul>
<p>Alongside the source file (<code>&lt;name&gt;.hs</code>) are optional files for
specifying the input and output of the test program:</p>
<ul>
<li><code>&lt;name&gt;.stdin</code>: data to feed on standard input</li>
<li><code>&lt;name&gt;.stdout</code>: data expected on standard output</li>
<li><code>&lt;name&gt;.stderr</code>: data expected on standard error from the test
program (for compile-only tests, from GHC).</li>
</ul>
<h3 id="example">Example <a href="#example" class="section">§</a></h3>
<p>Test <code>T20757</code> is a regression test for issue <a href="https://gitlab.haskell.org/ghc/ghc/-/issues/20757"><code>#20757</code></a>.
The test program source code lives in
<code>testsuite/tests/ghc-api/T20757.hs</code>:</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="kw">module</span> <span class="dt">Main</span> <span class="kw">where</span></span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">GHC.SysTools.BaseDir</span></span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a><span class="ot">main ::</span> <span class="dt">IO</span> ()</span>
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a>main <span class="ot">=</span> findToolDir <span class="dt">False</span> <span class="st">&quot;/&quot;</span> <span class="op">&gt;&gt;=</span> <span class="fu">print</span></span></code></pre></div>
<p>The test driver file <code>testsuite/tests/ghc-api/all.T</code> declares the
test (alongside several others):</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>test(<span class="st">&#39;T20757&#39;</span>,</span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>     [unless(opsys(<span class="st">&#39;mingw32&#39;</span>), skip), exit_code(<span class="dv">1</span>)],</span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a>     compile_and_run,</span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a>     [<span class="st">&#39;-package ghc&#39;</span>])</span></code></pre></div>
<p>This declaration:</p>
<ul>
<li>tells the test driver to both compile and run the test program</li>
<li>skips the test on operating systems other than Windows</li>
<li>sets the expected exit status to <code>1</code></li>
<li>adds <code>-package ghc</code> to the <em>compiler</em> CLI options</li>
</ul>
<p>Additionally, <code>testsuite/tests/ghc-api/T20757.stderr</code> exists. The
test driver shall assert that whatever the <em>test program</em> writes to
standard error matches the contents of that file.</p>
<h2 id="running-tests">Running tests <a href="#running-tests" class="section">§</a></h2>
<p>I use <a href="https://gitlab.haskell.org/ghc/ghc/-/wikis/building/hadrian">Hadrian</a> to build and test GHC:</p>
<pre class="shell"><code>% ./hadrian/build -j test</code></pre>
<p>The <code>-j[N]</code> option sets the number of jobs that can be run in
parallel. If the integer argument is not specified, it defaults to
the number of CPU cores.</p>
<p>There are around 9000 tests in the GHC test suite. It takes a while
to run them all (~15 minutes with 12 parallel jobs on my 12-core
workstation). If you’re hacking on GHC you might want to limit the
driver to one or just a few tests. Use the <code>--only</code> option to do
this:</p>
<pre class="shell"><code>% ./hadrian/build -j test --only=T20757
...
SUMMARY for test run started at Wed May 18 22:58:26 2022 
0:00:00.162930 spent to go through
       1 total tests, which gave rise to
       9 test cases, of which
       9 were skipped
       0 had missing libraries

       0 expected passes
       0 expected failures

       0 caused framework failures
       0 caused framework warnings
       0 unexpected passes
       0 unexpected failures
       0 unexpected stat failures
       0 fragile tests

Build completed in 1.30s</code></pre>
<p>The driver ran zero tests. Well, I am using FreeBSD; <code>T20757</code> skips
on all platforms except Windows. Let’s add one more test. You can
specify multiple tests via <code>--only</code>, <strong>separated by spaces</strong>:</p>
<pre class="shell"><code>% ./hadrian/build -j test --only=&quot;T20757 executablePath&quot;
...
SUMMARY for test run started at Wed May 18 23:02:33 2022
0:00:00.513652 spent to go through
       2 total tests, which gave rise to
      18 test cases, of which
      17 were skipped
       0 had missing libraries

       1 expected passes
       0 expected failures

       0 caused framework failures
       0 caused framework warnings
       0 unexpected passes
       0 unexpected failures
       0 unexpected stat failures
       0 fragile tests

Build completed in 11.82s</code></pre>
<p>That’s more like it.</p>
<p>Each test seems to inflate to 9 <em>test cases</em>. I think these
correspond to the different <a href="https://gitlab.haskell.org/ghc/ghc/-/wikis/commentary/rts/compiler-ways"><em>compiler ways</em></a>, and the test is
only run for the configured way(s).</p>
<p>Use <code>--test-verbose=[1,2,3,4,5]</code> to see more verbose output. The
commands and output for compiling and running the test program
appear at level <code>3</code> and above.</p>
<p>See the GHC wiki for further <a href="https://gitlab.haskell.org/ghc/ghc/-/wikis/building/running-tests/running">details about running tests</a>.</p>
<h2 id="testing-executablepath">Testing <code>executablePath</code> <a href="#testing-executablepath" class="section">§</a></h2>
<p>In my <a href="2022-05-10-improved-executable-path-queries.html">previous post</a> I described
<a href="https://downloads.haskell.org/ghc/9.4.1-alpha1/docs/html/libraries/base/System-Environment.html#v:executablePath"><code>System.Environment.executablePath</code></a>, an improved
way to query the path to the executable file of the calling process.</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="ot">executablePath ::</span> <span class="dt">Maybe</span> (<span class="dt">IO</span> (<span class="dt">Maybe</span> <span class="dt">FilePath</span>))</span></code></pre></div>
<p>The <code>IO</code> query is not defined on all platforms. Where it is
defined, its behaviour differs by platform. So it is an interesting
feature to test.</p>
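<p>The nested type takes a moment to read, so here is a minimal sketch of how a program might consume it. It assumes <em>base-4.17</em> or later. <code>describe</code> is a hypothetical helper, not part of <em>base</em>.</p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell">import System.Environment (executablePath)

-- The outer Maybe says whether the platform has a query mechanism at
-- all.  The inner Maybe says whether the query produced a result.
describe :: Maybe (IO (Maybe FilePath)) -&gt; IO String
describe Nothing = pure &quot;no query mechanism on this platform&quot;
describe (Just query) =
  maybe &quot;query defined, but no result&quot; (&quot;executable: &quot; ++) &lt;$&gt; query

main :: IO ()
main = describe executablePath &gt;&gt;= putStrLn</code></pre></div>
<p>Taking the value as an argument, rather than referring to <code>executablePath</code> directly, lets all three cases be exercised on any platform.</p>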
<p>One could write multiple small test programs, one for each operating
system. Then tell the test driver to run the test for the current
system, and skip the others. Alternatively, define a single test
program, but tell it what platform it’s running on. That is what I
did.</p>
<p>The test driver declaration adds the operating system to the test
program’s arguments:</p>
<div class="sourceCode" id="cb8"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>test(<span class="st">&#39;executablePath&#39;</span>,</span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a>     extra_run_opts(config.os),</span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a>     compile_and_run,</span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a>     [<span class="st">&#39;&#39;</span>])</span></code></pre></div>
<p>The test program itself then implements some OS-aware checks of the
behaviour of <code>executablePath</code>. First come lists of which systems
have what behaviour:</p>
<div class="sourceCode" id="cb9"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a>canQuery <span class="ot">=</span> [<span class="st">&quot;mingw32&quot;</span>,<span class="st">&quot;freebsd&quot;</span>,<span class="st">&quot;linux&quot;</span>,<span class="st">&quot;darwin&quot;</span>,<span class="st">&quot;netbsd&quot;</span>]</span>
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a>canDelete <span class="ot">=</span> [<span class="st">&quot;freebsd&quot;</span>,<span class="st">&quot;linux&quot;</span>,<span class="st">&quot;darwin&quot;</span>,<span class="st">&quot;netbsd&quot;</span>]</span>
<span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a>canQueryAfterDelete <span class="ot">=</span> [<span class="st">&quot;netbsd&quot;</span>]</span></code></pre></div>
<p>In <code>main</code>, grab the OS from the program arguments:</p>
<div class="sourceCode" id="cb10"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="ot">main ::</span> <span class="dt">IO</span> ()</span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a>main <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a>  [os] <span class="ot">&lt;-</span> getArgs</span></code></pre></div>
<p>Next grab the query function. If the query is <em>expectedly</em>
undefined, stop here (<code>exitSuccess</code>). If the query is <em>unexpectedly
undefined</em>, or <em>unexpectedly defined</em>, fail the test. Otherwise
return the query.</p>
<div class="sourceCode" id="cb11"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a>  query <span class="ot">&lt;-</span> <span class="kw">case</span> (os <span class="ot">`elem`</span> canQuery, executablePath) <span class="kw">of</span></span>
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a>    (<span class="dt">False</span>, <span class="dt">Nothing</span>) <span class="ot">-&gt;</span> exitSuccess</span>
<span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a>    (<span class="dt">False</span>, <span class="dt">Just</span> _) <span class="ot">-&gt;</span> die <span class="st">&quot;query unexpectedly defined&quot;</span></span>
<span id="cb11-4"><a href="#cb11-4" aria-hidden="true" tabindex="-1"></a>    (<span class="dt">True</span>, <span class="dt">Nothing</span>) <span class="ot">-&gt;</span> die <span class="st">&quot;query unexpected not defined&quot;</span></span>
<span id="cb11-5"><a href="#cb11-5" aria-hidden="true" tabindex="-1"></a>    (<span class="dt">True</span>, <span class="dt">Just</span> k) <span class="ot">-&gt;</span> <span class="fu">pure</span> k</span></code></pre></div>
<p>Now run the query. If it returns <code>Nothing</code>, fail the test
(it should return a result).</p>
<div class="sourceCode" id="cb12"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a>  before <span class="ot">&lt;-</span> query <span class="op">&gt;&gt;=</span> \r <span class="ot">-&gt;</span> <span class="kw">case</span> r <span class="kw">of</span></span>
<span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a>    <span class="dt">Nothing</span>   <span class="ot">-&gt;</span> die <span class="st">&quot;query unexpectedly returned Nothing&quot;</span></span>
<span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a>    <span class="dt">Just</span> path <span class="ot">-&gt;</span> <span class="fu">pure</span> path</span></code></pre></div>
<p>We need to compare the result (<code>before</code>) to the expected value.
That is, the file <code>executablePath</code> in the current directory:</p>
<div class="sourceCode" id="cb13"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a>  cwd <span class="ot">&lt;-</span> getCurrentDirectory</span>
<span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a>  <span class="kw">let</span> expected <span class="ot">=</span> cwd <span class="op">&lt;/&gt;</span> <span class="st">&quot;executablePath&quot;</span></span></code></pre></div>
<p>Also drop the file extension (if any) from the result of the query.
This is needed because GHC names the executables it generates with a
file extension on some platforms (e.g. <code>.exe</code> on Windows).</p>
<div class="sourceCode" id="cb14"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a>  <span class="kw">let</span> actual <span class="ot">=</span> dropExtension before</span></code></pre></div>
<p>Now compare <code>expected</code> and <code>actual</code>. Use <code>equalFilePath</code> because
the query may return a non-normalised path on some systems (I have
observed this on NetBSD):</p>
<div class="sourceCode" id="cb15"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a>  unless (equalFilePath actual expected) <span class="op">$</span></span>
<span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a>    die <span class="st">&quot;query result did not match expected&quot;</span></span></code></pre></div>
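<p>The comparison can be tried in isolation. This sketch assumes the <em>filepath</em> package (bundled with GHC) and POSIX-style paths. <code>matchesExpected</code> and the sample paths are made up for illustration.</p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell">import System.FilePath (dropExtension, equalFilePath)

-- Compare a reported executable path against the expected one,
-- ignoring any file extension and differences in normalisation.
matchesExpected :: FilePath -&gt; FilePath -&gt; Bool
matchesExpected expected reported =
  equalFilePath (dropExtension reported) expected

main :: IO ()
main = do
  let expected = &quot;/tmp/t/executablePath&quot;
  -- A non-normalised path to the same file.
  print (matchesExpected expected &quot;/tmp/t/./executablePath&quot;)
  -- The same file name with an extension.
  print (matchesExpected expected &quot;/tmp/t/executablePath.exe&quot;)
  -- A different file.
  print (matchesExpected expected &quot;/tmp/t/other&quot;)</code></pre></div>
<p>Note that <code>dropExtension</code> only suits this test because the expected file name contains no dot of its own.</p>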
<p>Now, what happens if we <em>delete the executable</em> while the process
runs? First of all, some operating systems don’t even allow
that. We grant those systems an honourable discharge. The
remaining systems delete the file.</p>
<div class="sourceCode" id="cb16"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a>  unless (os <span class="ot">`elem`</span> canDelete)</span>
<span id="cb16-2"><a href="#cb16-2" aria-hidden="true" tabindex="-1"></a>    exitSuccess</span>
<span id="cb16-3"><a href="#cb16-3" aria-hidden="true" tabindex="-1"></a>  removeFile before</span></code></pre></div>
<p>Finally we run the query again. Once again, the expected behaviour
differs by platform. On Linux, Mac OS X and FreeBSD, we expect <code>Nothing</code>.
But NetBSD successfully returns the original value.</p>
<div class="sourceCode" id="cb17"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb17-1"><a href="#cb17-1" aria-hidden="true" tabindex="-1"></a>  after <span class="ot">&lt;-</span> query</span>
<span id="cb17-2"><a href="#cb17-2" aria-hidden="true" tabindex="-1"></a>  <span class="kw">case</span> after <span class="kw">of</span></span>
<span id="cb17-3"><a href="#cb17-3" aria-hidden="true" tabindex="-1"></a>    <span class="dt">Nothing</span></span>
<span id="cb17-4"><a href="#cb17-4" aria-hidden="true" tabindex="-1"></a>      <span class="op">|</span> os <span class="ot">`elem`</span> canQueryAfterDelete</span>
<span id="cb17-5"><a href="#cb17-5" aria-hidden="true" tabindex="-1"></a>      <span class="ot">-&gt;</span> die <span class="st">&quot;query unexpected failed after delete&quot;</span></span>
<span id="cb17-6"><a href="#cb17-6" aria-hidden="true" tabindex="-1"></a>      <span class="op">|</span> <span class="fu">otherwise</span></span>
<span id="cb17-7"><a href="#cb17-7" aria-hidden="true" tabindex="-1"></a>      <span class="ot">-&gt;</span> <span class="fu">pure</span> ()</span>
<span id="cb17-8"><a href="#cb17-8" aria-hidden="true" tabindex="-1"></a>    <span class="dt">Just</span> _</span>
<span id="cb17-9"><a href="#cb17-9" aria-hidden="true" tabindex="-1"></a>      <span class="op">|</span> os <span class="ot">`elem`</span> canQueryAfterDelete</span>
<span id="cb17-10"><a href="#cb17-10" aria-hidden="true" tabindex="-1"></a>      <span class="ot">-&gt;</span> <span class="fu">pure</span> ()</span>
<span id="cb17-11"><a href="#cb17-11" aria-hidden="true" tabindex="-1"></a>      <span class="op">|</span> <span class="fu">otherwise</span></span>
<span id="cb17-12"><a href="#cb17-12" aria-hidden="true" tabindex="-1"></a>      <span class="ot">-&gt;</span> die <span class="op">$</span> <span class="st">&quot;query unexpected succeeded after delete&quot;</span></span></code></pre></div>
<p>Phew, quite a lot of code to test one little feature. There is no
standard system interface for querying the executable path. So it
is no surprise to see such diverse behaviour across different
platforms—including no query mechanism at all (looking at you,
OpenBSD).</p>
<h2 id="conclusion">Conclusion <a href="#conclusion" class="section">§</a></h2>
<p>In this article I gave an introduction to writing tests for the GHC
test suite, with some examples. The GHC wiki <a href="https://gitlab.haskell.org/ghc/ghc/-/wikis/building/running-tests/adding">contains more
comprehensive documentation</a>. <a href="https://gitlab.haskell.org/ghc/ghc/-/wikis/building/running-tests/performance-tests">Performance
tests</a> are a more complex aspect of the GHC test
suite which I didn’t discuss in detail.</p>]]></summary>
</entry>
<entry>
    <title>Better executable path queries in GHC 9.4</title>
    <link href="https://frasertweedale.github.io/blog-fp/posts/2022-05-10-improved-executable-path-queries.html" />
    <id>https://frasertweedale.github.io/blog-fp/posts/2022-05-10-improved-executable-path-queries.html</id>
    <published>2022-05-10T00:00:00Z</published>
    <updated>2022-05-10T00:00:00Z</updated>
    <summary type="html"><![CDATA[<h1 id="better-executable-path-queries-in-ghc-9.4">Better executable path queries in GHC 9.4</h1>
<p>I <a href="2021-01-01-fixing-getExecutablePath-FreeBSD.html">previously wrote about</a>
<a href="https://downloads.haskell.org/ghc/9.4.1-alpha1/docs/html/libraries/base/System-Environment.html#v:getExecutablePath"><code>System.Environment.getExecutablePath</code></a> and how
I fixed it on FreeBSD. Unfortunately, this function still has some
problems. In this post I explain the problems and introduce
<a href="https://downloads.haskell.org/ghc/9.4.1-alpha1/docs/html/libraries/base/System-Environment.html#v:executablePath"><code>executablePath</code></a>, the solution arriving in
<em>base-4.17.0.0</em> (GHC 9.4.1).</p>
<h2 id="problems-with-getexecutablepath">Problems with <code>getExecutablePath</code> <a href="#problems-with-getexecutablepath" class="section">§</a></h2>
<p><code>getExecutablePath :: IO FilePath</code> is a way for a Haskell program to
query the path to its own executable. It has several significant
problems:</p>
<ul>
<li><p><strong>Not all operating systems provide a reliable mechanism to query
the executable path.</strong> Where an OS-specific implementation does
not exist, <code>getExecutablePath</code> falls back to providing the value
of <code>argv[0]</code> (<a href="https://gitlab.haskell.org/ghc/ghc/-/issues/12377">#12377</a>). The invoking process chooses the
value; it does not necessarily represent the path to the
executable. It might represent or resolve to a different
executable. <code>argv</code> could even be an empty array, in which case
<code>getExecutablePath</code> throws an exception!</p></li>
<li><p><strong>Divergent behaviour when the executable has been deleted.</strong> When we
say “executable” we mean “<em>file which contains <strong>program</strong> text,
which the OS can load and execute (becoming a <strong>process</strong>)</em>”.
That file could be deleted while the process is running. In this
case, the behaviour of <code>getExecutablePath</code> differs by platform.
On FreeBSD it throws an exception. On Linux it returns the
original <code>FilePath</code> suffixed with <code>" (deleted)"</code> (<a href="https://gitlab.haskell.org/ghc/ghc/-/issues/10957">#10957</a>).
These differences impede cross-platform development.</p></li>
<li><p><strong>The documentation is wrong.</strong> Until I fixed it, the
documentation for <code>getExecutablePath</code> stated, <em>“Returns the
absolute pathname of the current executable.”</em> It didn’t explain
any of the discrepancies mentioned in the preceding points.
Programmers can easily stumble into the unsafe behaviour (I did).</p></li>
</ul>
<h2 id="type-of-the-solution">Type of the solution <a href="#type-of-the-solution" class="section">§</a></h2>
<p>Types are an essential tool for modelling a problem and guiding the
development of a solution. The problems with <code>getExecutablePath</code>
reveal that:</p>
<ul>
<li><p>Some OSes provide a mechanism to query the executable path, and
some do not. This is a static property of the platform; it does
not change over the lifetime of a process.</p></li>
<li><p>The query mechanism (if it exists) might be unable to return a
result. For example, when the executable file has been deleted.
The result may vary during the lifetime of a process.</p></li>
</ul>
<p>The <code>Maybe a</code> type models the existence or absence of a value:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Maybe</span> a <span class="ot">=</span> <span class="dt">Nothing</span> <span class="op">|</span> <span class="dt">Just</span> a</span></code></pre></div>
<p>Accordingly, a suitable type to model this problem is:</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="ot">executablePath ::</span> <span class="dt">Maybe</span> (<span class="dt">IO</span> (<span class="dt">Maybe</span> <span class="dt">FilePath</span>))</span></code></pre></div>
<p>The outer <code>Maybe</code> models the presence or absence of a query
mechanism. The query itself has the type <code>IO (Maybe FilePath)</code>.
The inner <code>Maybe</code> models that the query might be unable to return
a valid <code>FilePath</code>.</p>
<p>The type is also a kind of (machine-checked) documentation. It
reveals things that the written documentation for
<code>getExecutablePath</code> <strong><em>should have said, but didn’t</em></strong>.</p>
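<p>To make the two layers concrete, here is a minimal sketch of how a
caller might consume a value of this type. It assumes <em>base-4.17.0.0</em>
or later. The name <code>describeExecutablePath</code> and the message
strings are hypothetical, chosen only for illustration.</p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell">import System.Environment (executablePath)

-- Hypothetical helper: report which of the three cases applies.
describeExecutablePath :: IO String
describeExecutablePath =
  case executablePath of
    -- Outer Nothing: the platform has no query mechanism at all.
    Nothing -&gt; pure &quot;no query mechanism on this platform&quot;
    -- Outer Just: a query action exists, so run it.
    Just query -&gt; do
      result &lt;- query
      pure $ case result of
        -- Inner Nothing: the query ran but could not give a path.
        Nothing   -&gt; &quot;query returned no path&quot;
        -- Inner Just: the query produced a path.
        Just path -&gt; &quot;executable path: &quot; ++ path

main :: IO ()
main = describeExecutablePath &gt;&gt;= putStrLn</code></pre></div>
<p>The outer <code>case</code> needs no <code>IO</code>, which reflects that the
presence of a query mechanism is a static property of the platform.</p>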
<div class="note">
<p><code>FilePath</code> is defined as a type synonym for <code>String</code>, which is
itself a type synonym for <code>[Char]</code>:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">FilePath</span> <span class="ot">=</span> <span class="dt">String</span></span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">String</span>   <span class="ot">=</span> [<span class="dt">Char</span>]</span></code></pre></div>
<p>It can be argued on multiple grounds that this is not an appropriate
type for representing file paths:</p>
<ul>
<li><p>Performance: <code>[Char]</code> is a linked list of individual characters.
Packed strings have better performance.</p></li>
<li><p>Correctness: <code>FilePath</code> admits any string value, not just valid
paths. See above for a real world example: paths suffixed with
<code>" (deleted)"</code> on Linux.</p></li>
</ul>
<p>I did not go further down this rabbit hole for the change discussed
in this post. <code>FilePath</code> pervades <em>base</em> and other “standard”
libraries. Furthermore, GHC targets a variety of operating systems;
accurately modelling valid file paths on diverse platforms drives up
complexity. If you have specific needs not met by <code>FilePath</code>, check
out the <a href="https://hackage.haskell.org/packages/search?terms=filepath">many path libraries</a> which offer different approaches to
representing and working with paths.</p>
</div>
<h2 id="implementation-of-executablepath">Implementation of <code>executablePath</code> <a href="#implementation-of-executablepath" class="section">§</a></h2>
<p>In this section I’ll briefly review the implementation. GHC
<a href="https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4779">merge request !4779</a> has the gory details, for those
interested.</p>
<p>I was able to implement <code>executablePath</code> without modifying any code
that uses the <em>foreign function interface (FFI)</em>.
<code>getExecutablePath</code> was unchanged. <code>executablePath</code> implementations
wrap the former. See <a href="2021-01-01-fixing-getExecutablePath-FreeBSD.html">my earlier post</a> for an example
of how <code>getExecutablePath</code> uses the FFI.</p>
<h3 id="mac-os-x-freebsd-and-netbsd">Mac OS X, FreeBSD and NetBSD <a href="#mac-os-x-freebsd-and-netbsd" class="section">§</a></h3>
<p>The FreeBSD and NetBSD implementations of <code>getExecutablePath</code> are
nearly identical, but the implementation for Mac OS X is very
different. Nevertheless, the observable behaviour is identical: the
system calls fail with <code>ENOENT</code> when the executable has been
deleted, and succeed otherwise. No other expected failure scenarios
are known (yet).</p>
<p>Therefore, the <code>executablePath</code> implementation for these platforms
boils down to catching the Haskell exception value corresponding to
<code>ENOENT</code> and turning it into <code>Nothing</code>. Unexpected exceptions are
re-thrown.</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>executablePath <span class="ot">=</span></span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a>  <span class="dt">Just</span> (<span class="fu">fmap</span> <span class="dt">Just</span> getExecutablePath <span class="ot">`catch`</span> f)</span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a>    <span class="kw">where</span></span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a>    f e <span class="op">|</span> isDoesNotExistError e <span class="ot">=</span> <span class="fu">pure</span> <span class="dt">Nothing</span></span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a>        <span class="op">|</span> <span class="fu">otherwise</span>             <span class="ot">=</span> throw e</span></code></pre></div>
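<p>The same catch-and-convert pattern can be sketched for an arbitrary
<code>IO</code> action, independent of GHC internals. This is an
illustration under assumptions, not the code from the merge request:
<code>nothingIfMissing</code> is a hypothetical helper name, the file
path is a placeholder, and the sketch re-throws with
<code>throwIO</code> rather than <code>throw</code>.</p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell">import Control.Exception (catch, throwIO)
import System.IO.Error (isDoesNotExistError)

-- Hypothetical helper: turn a &quot;does not exist&quot; IOError into Nothing
-- and re-throw every other IOError unchanged.
nothingIfMissing :: IO a -&gt; IO (Maybe a)
nothingIfMissing action = fmap Just action `catch` handler
  where
    -- isDoesNotExistError fixes the exception type to IOError.
    handler e
      | isDoesNotExistError e = pure Nothing
      | otherwise             = throwIO e

main :: IO ()
main = do
  -- Placeholder path, assumed not to exist on the machine running this.
  result &lt;- nothingIfMissing (readFile &quot;/nonexistent/placeholder-file&quot;)
  print result</code></pre></div>
<p>If the placeholder path really is absent, opening it fails with a
“does not exist” error and the program prints <code>Nothing</code>. Any
other failure, such as a permission error, still propagates as an
exception.</p>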
<h3 id="linux">Linux <a href="#linux" class="section">§</a></h3>
<p>The Linux implementation of <code>getExecutablePath</code> reads the value of
<code>/proc/self/exe</code> (part of <a href="https://manpages.debian.org/buster/manpages/procfs.5.en.html"><code>procfs(5)</code></a>). The
man page states:</p>
<blockquote>
<p>If the pathname has been unlinked, the symbolic link will contain
the string ‘(deleted)’ appended to the original pathname.</p>
</blockquote>
<p><code>executablePath</code> checks for this condition and, if detected, returns
<code>Nothing</code>. Note that we could have stripped the suffix and returned
<code>Just</code> the “original” path. Returning <code>Nothing</code> makes it consistent
with the other platforms.</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>executablePath <span class="ot">=</span> <span class="dt">Just</span> (<span class="fu">fmap</span> check getExecutablePath)</span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>  <span class="kw">where</span></span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a>  check s <span class="op">|</span> <span class="st">&quot;(deleted)&quot;</span> <span class="ot">`isSuffixOf`</span> s <span class="ot">=</span> <span class="dt">Nothing</span></span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a>          <span class="op">|</span> <span class="fu">otherwise</span>                  <span class="ot">=</span> <span class="dt">Just</span> s</span></code></pre></div>
<div class="note">
<p>What if the file is named <code>foo (deleted)</code>? The behaviour is
ambiguous. Checking the existence of the file is not safe either.
If the file was <code>foo</code>, a <em>different</em> file <code>foo (deleted)</code> could
exist beside it. Better a false negative in an unlikely scenario,
than an <strong>unsafe false positive</strong>.</p>
</div>
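<p>Because the check is a pure function, its behaviour is easy to try
in isolation, including the false negative described in the note. The
sketch below copies the guard into a standalone program; the sample
paths are invented for illustration.</p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell">import Data.List (isSuffixOf)

-- Same guard as in the Linux implementation, lifted to the top level.
check :: FilePath -&gt; Maybe FilePath
check s
  | &quot;(deleted)&quot; `isSuffixOf` s = Nothing
  | otherwise                  = Just s

main :: IO ()
main = mapM_ (print . check)
  [ &quot;/usr/local/bin/foo&quot;            -- ordinary path: Just
  , &quot;/usr/local/bin/foo (deleted)&quot;  -- suffix present: Nothing
  , &quot;/home/me/backup (deleted)&quot;     -- a file really named this way
                                    -- is also Nothing (false negative)
  ]</code></pre></div>
<p>The function cannot tell the second and third inputs apart, which is
exactly the ambiguity that makes returning <code>Nothing</code> the safer
choice.</p>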
<h3 id="windows">Windows <a href="#windows" class="section">§</a></h3>
<p>Windows prevents the deletion of an executable file during the
lifetime of any process created from it. So <code>executablePath</code> simply
wraps the result of <code>getExecutablePath</code> with a <code>Just</code>.</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>executablePath <span class="ot">=</span> <span class="dt">Just</span> (<span class="fu">fmap</span> <span class="dt">Just</span> getExecutablePath)</span></code></pre></div>
<h3 id="fallback-implementation">Fallback implementation <a href="#fallback-implementation" class="section">§</a></h3>
<p>The “fallback implementation” is for platforms that don’t have a
reliable mechanism for querying the executable path (or for which no
one has implemented one in GHC yet). In this case, <code>executablePath</code>
does not even supply the query <code>IO</code> action.</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>executablePath <span class="ot">=</span> <span class="dt">Nothing</span></span></code></pre></div>
<p>Programs that want to query the executable path have to deal with
the <code>Nothing</code> case. That is: the possibility that there <em>is no
reliable way</em> to get it. That’s a good thing.</p>
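<p>Callers that do not need to distinguish “no query mechanism” from
“the query returned nothing” can collapse the two layers into one. A
minimal sketch, assuming <em>base-4.17.0.0</em> or later; the name
<code>queryExecutablePath</code> and the fallback message are
hypothetical.</p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell">import Data.Maybe (fromMaybe)
import System.Environment (executablePath)

-- Hypothetical helper: where no query action exists, substitute an
-- action that always answers Nothing.
queryExecutablePath :: IO (Maybe FilePath)
queryExecutablePath = fromMaybe (pure Nothing) executablePath

main :: IO ()
main = do
  result &lt;- queryExecutablePath
  -- The caller still has to decide what to do without a path.
  putStrLn (fromMaybe &quot;executable path unavailable&quot; result)</code></pre></div>
<p>Collapsing the layers discards information, so it only suits callers
whose fallback is the same in both cases.</p>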
<h2 id="conclusion">Conclusion <a href="#conclusion" class="section">§</a></h2>
<p>This article explained the problems of <code>getExecutablePath</code> and
reviewed the solution coming in GHC 9.4, called <code>executablePath</code>. I
encourage programs that use <code>getExecutablePath</code> to migrate when
feasible, especially if multi-platform support is important.</p>
<p>One topic I did not discuss is how I implemented tests for this
feature in the GHC test suite. I will cover this in an upcoming
post.</p>]]></summary>
</entry>
<entry>
    <title>Haddock: disambiguating types and values</title>
    <link href="https://frasertweedale.github.io/blog-fp/posts/2021-11-12-haddock-disambiguation.html" />
    <id>https://frasertweedale.github.io/blog-fp/posts/2021-11-12-haddock-disambiguation.html</id>
    <published>2021-11-12T00:00:00Z</published>
    <updated>2021-11-12T00:00:00Z</updated>
    <summary type="html"><![CDATA[<h1 id="haddock-disambiguating-types-and-values">Haddock: disambiguating types and values</h1>
<p>Haskell has separate namespaces for types and values. When types
and data constructors share a name, <a href="https://haskell-haddock.readthedocs.io/en/latest/index.html">Haddock</a>, Haskell’s
documentation generator, can get confused. <span class="abstract">In this post I show how
to disambiguate types and values in Haddock
documentation.</span></p>
<h2 id="demonstrating-the-problem">Demonstrating the problem <a href="#demonstrating-the-problem" class="section">§</a></h2>
<p>For demonstration purposes I created a simple module, <code>ACME.Disamb</code>:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">module</span> <span class="dt">ACME.Disamb</span> (<span class="dt">Foo</span>(<span class="op">..</span>), <span class="dt">Bar</span>(<span class="op">..</span>), <span class="dt">Quux</span>, <span class="dt">Xyxxy</span>) <span class="kw">where</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Foo</span> <span class="ot">=</span> <span class="dt">Foo</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a><span class="co">-- | A bar contains a &#39;Foo&#39;.  Example:</span></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a><span class="co">--</span></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a><span class="co">-- @let bar = &#39;Bar&#39; &#39;Foo&#39;@</span></span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a><span class="co">--</span></span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Bar</span> <span class="ot">=</span> <span class="dt">Bar</span> <span class="dt">Foo</span></span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Quux</span></span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> <span class="dt">Xyxxy</span> a</span></code></pre></div>
<p>Note that <code>Foo</code> is the name of both a type and a data constructor.
Same for <code>Bar</code>. <code>Quux</code> is a type with no constructor and <code>Xyxxy</code> is
a class. The Haddock comment for type <code>Bar</code> contains ambiguous
references to both <code>Bar</code> and <code>Foo</code>.</p>
<p>Let’s look at the HTML Haddock generated for each top-level
declaration:</p>
<h3 id="data-foo"><code>data Foo</code> <a href="#data-foo" class="section">§</a></h3>
<div class="sourceCode" id="cb2"><pre class="sourceCode html"><code class="sourceCode html"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="dt">&lt;</span><span class="kw">div</span><span class="ot"> class</span><span class="op">=</span><span class="st">&quot;top&quot;</span><span class="dt">&gt;</span></span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a>  <span class="dt">&lt;</span><span class="kw">p</span><span class="ot"> class</span><span class="op">=</span><span class="st">&quot;src&quot;</span><span class="dt">&gt;</span></span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a>    <span class="dt">&lt;</span><span class="kw">span</span><span class="ot"> class</span><span class="op">=</span><span class="st">&quot;keyword&quot;</span><span class="dt">&gt;</span>data<span class="dt">&lt;/</span><span class="kw">span</span><span class="dt">&gt;</span> <span class="dt">&lt;</span><span class="kw">a</span><span class="ot"> id</span><span class="op">=</span><span class="st">&quot;t:Foo&quot;</span><span class="ot"> class</span><span class="op">=</span><span class="st">&quot;def&quot;</span><span class="dt">&gt;</span>Foo<span class="dt">&lt;/</span><span class="kw">a</span><span class="dt">&gt;</span></span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a>    <span class="dt">&lt;</span><span class="kw">a</span><span class="ot"> href</span><span class="op">=</span><span class="st">&quot;#t:Foo&quot;</span><span class="ot"> class</span><span class="op">=</span><span class="st">&quot;selflink&quot;</span><span class="dt">&gt;</span>#<span class="dt">&lt;/</span><span class="kw">a</span><span class="dt">&gt;</span></span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a>  <span class="dt">&lt;/</span><span class="kw">p</span><span class="dt">&gt;</span></span>
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a>  <span class="dt">&lt;</span><span class="kw">div</span><span class="ot"> class</span><span class="op">=</span><span class="st">&quot;subs constructors&quot;</span><span class="dt">&gt;</span></span>
<span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a>    <span class="dt">&lt;</span><span class="kw">p</span><span class="ot"> class</span><span class="op">=</span><span class="st">&quot;caption&quot;</span><span class="dt">&gt;</span>Constructors<span class="dt">&lt;/</span><span class="kw">p</span><span class="dt">&gt;</span></span>
<span id="cb2-8"><a href="#cb2-8" aria-hidden="true" tabindex="-1"></a>    <span class="dt">&lt;</span><span class="kw">table</span><span class="dt">&gt;</span></span>
<span id="cb2-9"><a href="#cb2-9" aria-hidden="true" tabindex="-1"></a>      <span class="dt">&lt;</span><span class="kw">tbody</span><span class="dt">&gt;</span></span>
<span id="cb2-10"><a href="#cb2-10" aria-hidden="true" tabindex="-1"></a>        <span class="dt">&lt;</span><span class="kw">tr</span><span class="dt">&gt;</span></span>
<span id="cb2-11"><a href="#cb2-11" aria-hidden="true" tabindex="-1"></a>          <span class="dt">&lt;</span><span class="kw">td</span><span class="ot"> class</span><span class="op">=</span><span class="st">&quot;src&quot;</span><span class="dt">&gt;&lt;</span><span class="kw">a</span><span class="ot"> id</span><span class="op">=</span><span class="st">&quot;v:Foo&quot;</span><span class="ot"> class</span><span class="op">=</span><span class="st">&quot;def&quot;</span><span class="dt">&gt;</span>Foo<span class="dt">&lt;/</span><span class="kw">a</span><span class="dt">&gt;&lt;/</span><span class="kw">td</span><span class="dt">&gt;</span></span>
<span id="cb2-12"><a href="#cb2-12" aria-hidden="true" tabindex="-1"></a>          <span class="dt">&lt;</span><span class="kw">td</span><span class="ot"> class</span><span class="op">=</span><span class="st">&quot;doc empty&quot;</span><span class="dt">&gt;</span><span class="dv">&amp;nbsp;</span><span class="dt">&lt;/</span><span class="kw">td</span><span class="dt">&gt;</span></span>
<span id="cb2-13"><a href="#cb2-13" aria-hidden="true" tabindex="-1"></a>        <span class="dt">&lt;/</span><span class="kw">tr</span><span class="dt">&gt;</span></span>
<span id="cb2-14"><a href="#cb2-14" aria-hidden="true" tabindex="-1"></a>      <span class="dt">&lt;/</span><span class="kw">tbody</span><span class="dt">&gt;</span></span>
<span id="cb2-15"><a href="#cb2-15" aria-hidden="true" tabindex="-1"></a>    <span class="dt">&lt;/</span><span class="kw">table</span><span class="dt">&gt;</span></span>
<span id="cb2-16"><a href="#cb2-16" aria-hidden="true" tabindex="-1"></a>  <span class="dt">&lt;/</span><span class="kw">div</span><span class="dt">&gt;</span></span>
<span id="cb2-17"><a href="#cb2-17" aria-hidden="true" tabindex="-1"></a><span class="dt">&lt;/</span><span class="kw">div</span><span class="dt">&gt;</span></span></code></pre></div>
<p>The element representing type <code>Foo</code> has <code>id="t:Foo"</code>, whereas the
constructor has <code>id="v:Foo"</code>. These identifiers can be used as
fragment identifiers in hyperlinks. Types and values are
disambiguated through the <code>t:…</code> and <code>v:…</code> identifier prefixes.</p>
<h3 id="data-bar"><code>data Bar</code> <a href="#data-bar" class="section">§</a></h3>
<div class="sourceCode" id="cb3"><pre class="sourceCode html"><code class="sourceCode html"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="dt">&lt;</span><span class="kw">div</span><span class="ot"> class</span><span class="op">=</span><span class="st">&quot;top&quot;</span><span class="dt">&gt;</span></span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>  <span class="dt">&lt;</span><span class="kw">p</span><span class="ot"> class</span><span class="op">=</span><span class="st">&quot;src&quot;</span><span class="dt">&gt;</span></span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a>    <span class="dt">&lt;</span><span class="kw">span</span><span class="ot"> class</span><span class="op">=</span><span class="st">&quot;keyword&quot;</span><span class="dt">&gt;</span>data<span class="dt">&lt;/</span><span class="kw">span</span><span class="dt">&gt;</span> <span class="dt">&lt;</span><span class="kw">a</span><span class="ot"> id</span><span class="op">=</span><span class="st">&quot;t:Bar&quot;</span><span class="ot"> class</span><span class="op">=</span><span class="st">&quot;def&quot;</span><span class="dt">&gt;</span>Bar<span class="dt">&lt;/</span><span class="kw">a</span><span class="dt">&gt;</span></span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a>    <span class="dt">&lt;</span><span class="kw">a</span><span class="ot"> href</span><span class="op">=</span><span class="st">&quot;#t:Bar&quot;</span><span class="ot"> class</span><span class="op">=</span><span class="st">&quot;selflink&quot;</span><span class="dt">&gt;</span>#<span class="dt">&lt;/</span><span class="kw">a</span><span class="dt">&gt;</span></span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a>  <span class="dt">&lt;/</span><span class="kw">p</span><span class="dt">&gt;</span></span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a>  <span class="dt">&lt;</span><span class="kw">div</span><span class="ot"> class</span><span class="op">=</span><span class="st">&quot;doc&quot;</span><span class="dt">&gt;</span></span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a>    <span class="dt">&lt;</span><span class="kw">p</span><span class="dt">&gt;</span></span>
<span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a>      A bar contains a</span>
<span id="cb3-9"><a href="#cb3-9" aria-hidden="true" tabindex="-1"></a>      <span class="dt">&lt;</span><span class="kw">code</span><span class="dt">&gt;&lt;</span><span class="kw">a</span><span class="ot"> href</span><span class="op">=</span><span class="st">&quot;ACME-Disamb.html#t:Foo&quot;</span><span class="ot"> title</span><span class="op">=</span><span class="st">&quot;ACME.Disamb&quot;</span><span class="dt">&gt;</span>Foo<span class="dt">&lt;/</span><span class="kw">a</span><span class="dt">&gt;&lt;/</span><span class="kw">code</span><span class="dt">&gt;</span>. Example:</span>
<span id="cb3-10"><a href="#cb3-10" aria-hidden="true" tabindex="-1"></a>    <span class="dt">&lt;/</span><span class="kw">p</span><span class="dt">&gt;</span></span>
<span id="cb3-11"><a href="#cb3-11" aria-hidden="true" tabindex="-1"></a>    <span class="dt">&lt;</span><span class="kw">pre</span><span class="dt">&gt;</span>let bar = <span class="dt">&lt;</span><span class="kw">code</span><span class="dt">&gt;&lt;</span><span class="kw">a</span><span class="ot"> href</span><span class="op">=</span><span class="st">&quot;ACME-Disamb.html#t:Bar&quot;</span><span class="ot"> title</span><span class="op">=</span><span class="st">&quot;ACME.Disamb&quot;</span><span class="dt">&gt;</span>Bar<span class="dt">&lt;/</span><span class="kw">a</span><span class="dt">&gt;&lt;/</span><span class="kw">code</span><span class="dt">&gt;</span> <span class="dt">&lt;</span><span class="kw">code</span><span class="dt">&gt;&lt;</span><span class="kw">a</span><span class="ot"> href</span><span class="op">=</span><span class="st">&quot;ACME-Disamb.html#t:Foo&quot;</span><span class="ot"> title</span><span class="op">=</span><span class="st">&quot;ACME.Disamb&quot;</span><span class="dt">&gt;</span>Foo<span class="dt">&lt;/</span><span class="kw">a</span><span class="dt">&gt;&lt;/</span><span class="kw">code</span><span class="dt">&gt;&lt;/</span><span class="kw">pre</span><span class="dt">&gt;</span></span>
<span id="cb3-12"><a href="#cb3-12" aria-hidden="true" tabindex="-1"></a>  <span class="dt">&lt;/</span><span class="kw">div</span><span class="dt">&gt;</span></span>
<span id="cb3-13"><a href="#cb3-13" aria-hidden="true" tabindex="-1"></a>  <span class="co">&lt;!-- constructors elided --&gt;</span></span>
<span id="cb3-14"><a href="#cb3-14" aria-hidden="true" tabindex="-1"></a><span class="dt">&lt;/</span><span class="kw">div</span><span class="dt">&gt;</span></span></code></pre></div>
<p>Here we can see that the references to <code>Foo</code> and <code>Bar</code> in the
documentation I wrote <strong>all link to <code>t:Foo</code> or <code>t:Bar</code></strong>. This is
not what I intended. The usage example should refer to the data
constructors.</p>
<p>In my example this is a minor nuisance, but recall that <code>Foo</code> could
be the constructor of some other type. The <em>type</em> <code>Foo</code> could be
unrelated!</p>
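<p>A small sketch of that situation follows. The module and all of its
names are hypothetical and are not part of <code>ACME.Disamb</code>. The
type <code>Foo</code> and the data constructor <code>Foo</code> live in
separate namespaces, so both declarations can coexist even though they
have nothing to do with each other.</p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell">module ACME.Unrelated (Foo(..), Baz(..)) where

-- | A type named Foo, whose only constructor has a different name.
data Foo = MkFoo

-- | An unrelated type that happens to have a constructor named Foo.
--
-- An unqualified reference such as &#39;Foo&#39; in this comment is
-- ambiguous: it could mean the type above or the constructor below.
data Baz = Foo | Qux</code></pre></div>
<p>In a module like this, a link that resolves to the type when the
constructor was meant sends the reader to the wrong declaration
entirely.</p>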
<h3 id="data-quux"><code>data Quux</code> <a href="#data-quux" class="section">§</a></h3>
<div class="sourceCode" id="cb4"><pre class="sourceCode html"><code class="sourceCode html"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="dt">&lt;</span><span class="kw">div</span><span class="ot"> class</span><span class="op">=</span><span class="st">&quot;top&quot;</span><span class="dt">&gt;</span></span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a>  <span class="dt">&lt;</span><span class="kw">p</span><span class="ot"> class</span><span class="op">=</span><span class="st">&quot;src&quot;</span><span class="dt">&gt;</span></span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a>    <span class="dt">&lt;</span><span class="kw">span</span><span class="ot"> class</span><span class="op">=</span><span class="st">&quot;keyword&quot;</span><span class="dt">&gt;</span>data<span class="dt">&lt;/</span><span class="kw">span</span><span class="dt">&gt;</span> <span class="dt">&lt;</span><span class="kw">a</span><span class="ot"> id</span><span class="op">=</span><span class="st">&quot;t:Quux&quot;</span><span class="ot"> class</span><span class="op">=</span><span class="st">&quot;def&quot;</span><span class="dt">&gt;</span>Quux<span class="dt">&lt;/</span><span class="kw">a</span><span class="dt">&gt;</span></span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a>    <span class="dt">&lt;</span><span class="kw">a</span><span class="ot"> href</span><span class="op">=</span><span class="st">&quot;#t:Quux&quot;</span><span class="ot"> class</span><span class="op">=</span><span class="st">&quot;selflink&quot;</span><span class="dt">&gt;</span>#<span class="dt">&lt;/</span><span class="kw">a</span><span class="dt">&gt;</span></span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a>  <span class="dt">&lt;/</span><span class="kw">p</span><span class="dt">&gt;</span></span>
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a><span class="dt">&lt;/</span><span class="kw">div</span><span class="dt">&gt;</span></span></code></pre></div>
<p><code>Quux</code> has no constructor. As a result, there is no element with <code>id="v:…"</code>.</p>
<h3 id="class-xyxxy-a"><code>class Xyxxy a</code> <a href="#class-xyxxy-a" class="section">§</a></h3>
<div class="sourceCode" id="cb5"><pre class="sourceCode html"><code class="sourceCode html"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="dt">&lt;</span><span class="kw">div</span><span class="ot"> class</span><span class="op">=</span><span class="st">&quot;top&quot;</span><span class="dt">&gt;</span></span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>  <span class="dt">&lt;</span><span class="kw">p</span><span class="ot"> class</span><span class="op">=</span><span class="st">&quot;src&quot;</span><span class="dt">&gt;</span></span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a>    <span class="dt">&lt;</span><span class="kw">span</span><span class="ot"> class</span><span class="op">=</span><span class="st">&quot;keyword&quot;</span><span class="dt">&gt;</span>class<span class="dt">&lt;/</span><span class="kw">span</span><span class="dt">&gt;</span> <span class="dt">&lt;</span><span class="kw">a</span><span class="ot"> id</span><span class="op">=</span><span class="st">&quot;t:Xyxxy&quot;</span><span class="ot"> class</span><span class="op">=</span><span class="st">&quot;def&quot;</span><span class="dt">&gt;</span>Xyxxy<span class="dt">&lt;/</span><span class="kw">a</span><span class="dt">&gt;</span> a</span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a>    <span class="dt">&lt;</span><span class="kw">a</span><span class="ot"> href</span><span class="op">=</span><span class="st">&quot;#t:Xyxxy&quot;</span><span class="ot"> class</span><span class="op">=</span><span class="st">&quot;selflink&quot;</span><span class="dt">&gt;</span>#<span class="dt">&lt;/</span><span class="kw">a</span><span class="dt">&gt;</span></span>
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a>  <span class="dt">&lt;/</span><span class="kw">p</span><span class="dt">&gt;</span></span>
<span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a><span class="dt">&lt;/</span><span class="kw">div</span><span class="dt">&gt;</span></span></code></pre></div>
<p>Type class names inhabit the type namespace. Therefore the
corresponding element identifiers also use the <code>t:…</code> prefix.</p>
<h2 id="the-solution">The solution <a href="#the-solution" class="section">§</a></h2>
<p>To refer explicitly to a type or value, prefix the reference with
<code>t</code> or <code>v</code>. For example:</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="co">-- | A bar contains a &#39;Foo&#39;.  Example:</span></span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a><span class="co">--</span></span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a><span class="co">-- @let bar = v&#39;Bar&#39; v&#39;Foo&#39;@</span></span>
<span id="cb6-4"><a href="#cb6-4" aria-hidden="true" tabindex="-1"></a><span class="co">--</span></span>
<span id="cb6-5"><a href="#cb6-5" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Bar</span> <span class="ot">=</span> <span class="dt">Bar</span> <span class="dt">Foo</span></span></code></pre></div>
<p>The resulting HTML:</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode html"><code class="sourceCode html"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>…</span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a>  <span class="dt">&lt;</span><span class="kw">div</span><span class="ot"> class</span><span class="op">=</span><span class="st">&quot;doc&quot;</span><span class="dt">&gt;</span></span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a>    <span class="dt">&lt;</span><span class="kw">p</span><span class="dt">&gt;</span></span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a>      A bar contains a</span>
<span id="cb7-5"><a href="#cb7-5" aria-hidden="true" tabindex="-1"></a>      <span class="dt">&lt;</span><span class="kw">code</span><span class="dt">&gt;&lt;</span><span class="kw">a</span><span class="ot"> href</span><span class="op">=</span><span class="st">&quot;ACME-Disamb.html#t:Foo&quot;</span><span class="ot"> title</span><span class="op">=</span><span class="st">&quot;ACME.Disamb&quot;</span><span class="dt">&gt;</span>Foo<span class="dt">&lt;/</span><span class="kw">a</span><span class="dt">&gt;&lt;/</span><span class="kw">code</span><span class="dt">&gt;</span>. Example:</span>
<span id="cb7-6"><a href="#cb7-6" aria-hidden="true" tabindex="-1"></a>    <span class="dt">&lt;/</span><span class="kw">p</span><span class="dt">&gt;</span></span>
<span id="cb7-7"><a href="#cb7-7" aria-hidden="true" tabindex="-1"></a>    <span class="dt">&lt;</span><span class="kw">pre</span><span class="dt">&gt;</span>let bar = <span class="dt">&lt;</span><span class="kw">code</span><span class="dt">&gt;&lt;</span><span class="kw">a</span><span class="ot"> href</span><span class="op">=</span><span class="st">&quot;ACME-Disamb.html#v:Bar&quot;</span><span class="ot"> title</span><span class="op">=</span><span class="st">&quot;ACME.Disamb&quot;</span><span class="dt">&gt;</span>Bar<span class="dt">&lt;/</span><span class="kw">a</span><span class="dt">&gt;&lt;/</span><span class="kw">code</span><span class="dt">&gt;</span> <span class="dt">&lt;</span><span class="kw">code</span><span class="dt">&gt;&lt;</span><span class="kw">a</span><span class="ot"> href</span><span class="op">=</span><span class="st">&quot;ACME-Disamb.html#v:Foo&quot;</span><span class="ot"> title</span><span class="op">=</span><span class="st">&quot;ACME.Disamb&quot;</span><span class="dt">&gt;</span>Foo<span class="dt">&lt;/</span><span class="kw">a</span><span class="dt">&gt;&lt;/</span><span class="kw">code</span><span class="dt">&gt;&lt;/</span><span class="kw">pre</span><span class="dt">&gt;</span></span>
<span id="cb7-8"><a href="#cb7-8" aria-hidden="true" tabindex="-1"></a>  <span class="dt">&lt;/</span><span class="kw">div</span><span class="dt">&gt;</span></span>
<span id="cb7-9"><a href="#cb7-9" aria-hidden="true" tabindex="-1"></a>…</span></code></pre></div>
<div class="note">
<p>This feature is available since <a href="https://hackage.haskell.org/package/haddock-2.23.0/changelog">haddock-2.23.0</a> (<a href="https://github.com/haskell/haddock/commit/dd47029cb29c80b1ab4db520c9c2ce4dca37f833">commit</a>).
The published <a href="https://haskell-haddock.readthedocs.io/en/latest/index.html">user guide</a> is out of date but you can read
<a href="https://github.com/haskell/haddock/blob/haddock-2.25.0-release/doc/markup.rst#hyperlinked-identifiers">up-to-date documentation on GitHub</a>.</p>
</div>
<h2 id="inter-module-references">Inter-module references <a href="#inter-module-references" class="section">§</a></h2>
<p>Consider the following module:</p>
<div class="sourceCode" id="cb8"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="co">-- | See also &#39;ACME.Disamb.Quux&#39;.</span></span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a><span class="kw">module</span> <span class="dt">ACME.Disamb2</span> <span class="kw">where</span></span></code></pre></div>
<p>Unlike references <em>within</em> a module, inter-module references default
to the <em>value</em> namespace:</p>
<div class="sourceCode" id="cb9"><pre class="sourceCode html"><code class="sourceCode html"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="dt">&lt;</span><span class="kw">p</span><span class="ot"> class</span><span class="op">=</span><span class="st">&quot;caption&quot;</span><span class="dt">&gt;</span>Description<span class="dt">&lt;/</span><span class="kw">p</span><span class="dt">&gt;</span></span>
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a><span class="dt">&lt;</span><span class="kw">div</span><span class="ot"> class</span><span class="op">=</span><span class="st">&quot;doc&quot;</span><span class="dt">&gt;</span></span>
<span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a>  <span class="dt">&lt;</span><span class="kw">p</span><span class="dt">&gt;</span></span>
<span id="cb9-4"><a href="#cb9-4" aria-hidden="true" tabindex="-1"></a>    See also</span>
<span id="cb9-5"><a href="#cb9-5" aria-hidden="true" tabindex="-1"></a>    <span class="dt">&lt;</span><span class="kw">code</span><span class="dt">&gt;&lt;</span><span class="kw">a</span><span class="ot"> href</span><span class="op">=</span><span class="st">&quot;ACME-Disamb.html#v:Quux&quot;</span><span class="ot"> title</span><span class="op">=</span><span class="st">&quot;ACME.Disamb&quot;</span><span class="dt">&gt;</span>Quux<span class="dt">&lt;/</span><span class="kw">a</span><span class="dt">&gt;&lt;/</span><span class="kw">code</span><span class="dt">&gt;</span>.</span>
<span id="cb9-6"><a href="#cb9-6" aria-hidden="true" tabindex="-1"></a>  <span class="dt">&lt;/</span><span class="kw">p</span><span class="dt">&gt;</span></span>
<span id="cb9-7"><a href="#cb9-7" aria-hidden="true" tabindex="-1"></a><span class="dt">&lt;/</span><span class="kw">div</span><span class="dt">&gt;</span></span></code></pre></div>
<p>Recall that <code>Quux</code> has no constructor. So the link doesn’t even
target the wrong identifier; it targets a <em>non-existent</em> identifier.</p>
<p>The solution is the same: prefix the whole reference with <code>t</code>:</p>
<div class="sourceCode" id="cb10"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="co">-- | See also t&#39;ACME.Disamb.Quux&#39;.</span></span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a><span class="kw">module</span> <span class="dt">ACME.Disamb2</span> <span class="kw">where</span></span></code></pre></div>]]></summary>
</entry>
<entry>
    <title>How to protect aeson code from hash flooding</title>
    <link href="https://frasertweedale.github.io/blog-fp/posts/2021-10-12-aeson-hash-flooding-protection.html" />
    <id>https://frasertweedale.github.io/blog-fp/posts/2021-10-12-aeson-hash-flooding-protection.html</id>
    <published>2021-10-12T00:00:00Z</published>
    <updated>2021-10-12T00:00:00Z</updated>
    <summary type="html"><![CDATA[<h1 id="how-to-protect-aeson-code-from-hash-flooding">How to protect <em>aeson</em> code from hash flooding</h1>
<p>A few weeks ago Tom Sydney Kerckhove (<a href="https://twitter.com/kerckhove_ts/">@kerckhove_ts</a>)
published <a href="https://cs-syd.eu/posts/2021-09-11-json-vulnerability">an excellent writeup</a> of a serious DoS
vulnerability in <a href="https://hackage.haskell.org/package/aeson"><em>aeson</em></a>, a widely used Haskell
JSON library. <span class="abstract">A new <em>aeson</em> release addresses the hash flooding
issue, but you <strong>need more than a version bump</strong> to ensure your
programs are protected.</span> This post outlines how <em>aeson</em>
addressed the vulnerability and what action <em>you</em> need to take.</p>
<h2 id="overview-of-the-issue">Overview of the issue <a href="#overview-of-the-issue" class="section">§</a></h2>
<p><a href="https://cs-syd.eu/posts/2021-09-11-json-vulnerability">Tom’s article</a> is great and if you want the gory details,
go read it. There’s no need for me to repeat it here. It’s enough
to say that the attack, called <em>hash flooding</em> or <em>hash DoS</em>,
exploits the behaviour of the <a href="https://hackage.haskell.org/package/unordered-containers-0.2.14.0/docs/Data-HashMap-Lazy.html"><code>HashMap</code></a>
implementation from <em>unordered-containers</em>, which <em>aeson</em> used. It
results in a denial of service through CPU consumption. This
technique has been used in real-world attacks against a variety of
languages, libraries and frameworks over the years.</p>
<h2 id="am-i-vulnerable">Am I vulnerable? <a href="#am-i-vulnerable" class="section">§</a></h2>
<p>If you are using <code>aeson &lt; 2.0.0.0</code> and processing JSON from
untrusted sources, you are probably vulnerable. You could mitigate
the attack by refusing to decode large inputs, if your use case
allows it. Rate limiting may be a possible mitigation for some
applications.</p>
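<p>As a rough sketch of the size-limit mitigation (this is not part of <em>aeson</em>), the hypothetical <code>decodeWithLimit</code> helper below wraps any decoder that consumes a lazy <code>ByteString</code>, such as <code>Data.Aeson.eitherDecode</code>, and rejects oversized input before decoding starts. The helper name and the error message are illustrative assumptions.</p>

```haskell
import qualified Data.ByteString.Lazy as BL
import Data.Int (Int64)

-- | Hypothetical helper: run the supplied decoder only when the input
-- is at most @limit@ bytes long.
decodeWithLimit
  :: Int64                               -- maximum accepted size in bytes
  -> (BL.ByteString -> Either String a)  -- the real decoder
  -> BL.ByteString                       -- untrusted input
  -> Either String a
decodeWithLimit limit decode input
  -- Dropping @limit@ bytes and testing for emptiness avoids forcing
  -- the whole lazy input just to measure its length.
  | BL.null (BL.drop limit input) = decode input
  | otherwise = Left ("input exceeds " ++ show limit ++ " bytes")
```

<p>With <em>aeson</em> in scope, a call might look like <code>decodeWithLimit (1024 * 1024) eitherDecode body</code>. What limit is appropriate depends entirely on the application.</p>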
<h2 id="how-did-aeson-address-the-vulnerability">How did <em>aeson</em> address the vulnerability? <a href="#how-did-aeson-address-the-vulnerability" class="section">§</a></h2>
<p>Whereas prior versions used <code>HashMap</code> directly, starting at version
<code>2.0.0.0</code> <em>aeson</em> abstracts the map implementation behind a new data
type: <a href="https://hackage.haskell.org/package/aeson-2.0.1.0/docs/src/Data.Aeson.KeyMap.html"><code>Data.Aeson.KeyMap</code></a>. The <code>ordered-keymap</code>
Cabal flag selects the underlying implementation. When set, <em>aeson</em>
uses the <code>Ord</code>-based <a href="https://hackage.haskell.org/package/containers-0.6.0.1/docs/Data-Map-Lazy.html#t:Map"><code>Map</code></a> from <em>containers</em>. If
unset, <em>aeson</em> uses <a href="https://hackage.haskell.org/package/unordered-containers-0.2.14.0/docs/Data-HashMap-Lazy.html"><code>HashMap</code></a>.</p>
<p>Version <code>2.0.0.0</code> defaults the flag to <code>False</code>. As of <code>2.0.1.0</code> it
defaults to <code>True</code>. Importantly, the maintainers offer <a href="https://github.com/haskell/aeson/issues/864#issuecomment-939363297"><strong>no
guarantee that the default won’t change again</strong></a>. So
if you use <em>aeson</em> and want to protect yourself from hash flooding
attacks, take the extra precautions outlined in the following
sections.</p>
<p>This is an API-breaking change, hence the major version bump. Most
users will not have to change much code, but there will be
exceptions (I had to change quite a lot for <a href="https://hackage.haskell.org/package/jose"><em>jose</em></a>).</p>
<p>The <code>Map</code> version also behaves differently from <code>HashMap</code>. In
particular, objects may be serialised with a different key order,
and object keys are iterated in different orders. And who knows
what systems out there depend on the key order in some way, even
though they should not. That is a big reason why the maintainers
felt it was necessary to keep the option of using <code>HashMap</code>.</p>
<p>Also, these data structures have different performance
characteristics, with <code>Map</code> having <em>O(log n)</em> insertion and lookup
time. <code>HashMap</code> insertion and lookup are amortised <em>O(1)</em>,
degrading to <em>O(n)</em> for pathological inputs—which is the cause of
the vulnerability!</p>
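<p>To see why colliding keys hurt, here is a toy model. It assumes a simple chained hash table with a deliberately weak hash function; it is <em>not</em> how <em>unordered-containers</em> is implemented, and every name in it is made up for illustration. When an attacker chooses keys that all hash to the same value, they all land in one chain, and each lookup has to walk that chain.</p>

```haskell
import Data.Char (ord)
import Data.List (permutations)

-- Deliberately weak toy hash: the sum of the character codes.
weakHash :: String -> Int
weakHash = sum . map ord

-- Bucket index of a key in a table with @n@ buckets.
bucketOf :: Int -> String -> Int
bucketOf n key = weakHash key `mod` n

-- Length of the longest chain after inserting all keys into a table
-- with @n@ buckets.  A lookup may have to walk a chain this long.
longestChain :: Int -> [String] -> Int
longestChain n keys = maximum (0 : map chainLength [0 .. n - 1])
  where
    chainLength b = length (filter ((== b) . bucketOf n) keys)

-- 26 keys with distinct hashes: every chain has length at most 1.
benignKeys :: [String]
benignKeys = map (: []) ['a' .. 'z']

-- 720 keys that all have the same character sum, hence the same bucket.
hostileKeys :: [String]
hostileKeys = permutations "abcdef"
```

<p>In this model <code>longestChain 64 benignKeys</code> is 1, whereas <code>longestChain 64 hostileKeys</code> is 720: every hostile key shares a single chain, so each operation is linear in the number of keys.</p>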
<h2 id="compiling-a-safe-version-of-aeson">Compiling a safe version of aeson <a href="#compiling-a-safe-version-of-aeson" class="section">§</a></h2>
<p>If you have a program or library that uses <em>aeson</em>, you need to
ensure that the <em>aeson</em> you link against was compiled with the
<code>ordered-keymap</code> flag. There is no way to express this condition in
a <code>.cabal</code> file, but you <em>can</em> express this constraint in the
<code>cabal.project</code> file:</p>
<pre><code>packages: .
constraints:
  aeson +ordered-keymap</code></pre>
<p>For Stack users, configure the flag in your <code>stack.yaml</code>:</p>
<pre><code>flags:
  aeson:
    ordered-keymap: true</code></pre>
<p>If you’re building and installing <em>aeson</em> directly, via
<em>cabal-install</em> (the <code>cabal</code> program), you can use the
<code>--flags=ordered-keymap</code> command line option.</p>
<h2 id="runtime-checks">Runtime checks <a href="#runtime-checks" class="section">§</a></h2>
<p>In your program or library you can also detect the <code>KeyMap</code>
implementation at runtime. If you detect <code>HashMap</code> you could abort,
emit a warning, or employ other mitigations like limiting the input
size.</p>
<p><code>Data.Aeson.KeyMap</code> exports the following values:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>coercionToHashMap</span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a><span class="ot">  ::</span> <span class="dt">Maybe</span> (<span class="dt">Coercion</span> (<span class="dt">HashMap</span> <span class="dt">Key</span> v) (<span class="dt">KeyMap</span> v))</span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a>coercionToMap</span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a><span class="ot">  ::</span> <span class="dt">Maybe</span> (<span class="dt">Coercion</span>     (<span class="dt">Map</span> <span class="dt">Key</span> v) (<span class="dt">KeyMap</span> v))</span></code></pre></div>
<p>The values are coercions—proofs of representational equality
enabling zero-cost conversions; see
<a href="https://hackage.haskell.org/package/base-4.15.0.0/docs/Data-Type-Coercion.html#t:Coercion"><code>Data.Type.Coercion</code></a>. Only one of <code>HashMap</code> or
<code>Map</code> is actually used, which is why they’re wrapped in <code>Maybe</code>.
The map implementation that <em>aeson</em> is using has a non-<code>Nothing</code>
coercion.</p>
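<p>If you have not met <code>Coercion</code> before, the following sketch shows the same "optional proof" pattern using only <em>base</em>. The <code>Wrapped</code> type and the two stand-in values are hypothetical; they merely play the roles of <code>KeyMap</code>, <code>coercionToMap</code> and <code>coercionToHashMap</code>.</p>

```haskell
import Data.Type.Coercion (Coercion (..), coerceWith)

-- Hypothetical newtype standing in for KeyMap.
newtype Wrapped = Wrapped Int
  deriving (Eq, Show)

-- Wrapped really is represented by Int, so this proof exists.
coercionFromInt :: Maybe (Coercion Int Wrapped)
coercionFromInt = Just Coercion

-- Wrapped is not represented by Bool, so no proof is offered.
coercionFromBool :: Maybe (Coercion Bool Wrapped)
coercionFromBool = Nothing

-- Convert a list only when a proof is available.
wrapAll :: Maybe (Coercion a Wrapped) -> [a] -> Maybe [Wrapped]
wrapAll proof xs = fmap (\c -> map (coerceWith c) xs) proof
```

<p>Here <code>wrapAll coercionFromInt [1, 2]</code> yields <code>Just [Wrapped 1,Wrapped 2]</code>, while <code>wrapAll coercionFromBool [True]</code> yields <code>Nothing</code>. Checking which <em>aeson</em> coercion is a <code>Just</code> works the same way.</p>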
<p>In <a href="https://hackage.haskell.org/package/jose"><em>jose</em></a> I will export the following value to make
it easy for library users to check that the implementation is safe
from hash flooding:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="ot">vulnerableToHashFlood ::</span> <span class="dt">Bool</span></span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a>vulnerableToHashFlood <span class="ot">=</span></span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a>  <span class="kw">case</span> KeyMap.coercionToMap <span class="kw">of</span></span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a>    <span class="dt">Just</span> _  <span class="ot">-&gt;</span> <span class="dt">False</span></span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a>    <span class="dt">Nothing</span> <span class="ot">-&gt;</span> <span class="dt">True</span></span></code></pre></div>
<p>Users can (and hopefully will) check that value and respond in
whatever way is suitable for their use case. I might go even
further and cause all JWS processing to immediately fail when the
vulnerable implementation is detected, unless the caller overrides
this behaviour.</p>
<h2 id="what-about-other-things-that-use-hashmap">What about other things that use <code>HashMap</code>? <a href="#what-about-other-things-that-use-hashmap" class="section">§</a></h2>
<p>The <code>HashMap</code> data structure from <em>unordered-containers</em> remains
vulnerable to hash flooding attacks. Users and maintainers are
discussing potential solutions and mitigations in <a href="https://github.com/haskell-unordered-containers/unordered-containers/issues/319">issue #319</a>.
There are several interesting ideas, including:</p>
<ul>
<li><p>Initialise the library with a random salt, via <code>unsafePerformIO</code>.
Many libraries in other language ecosystems use this approach.
But it breaks referential transparency. Values and orders will not
be stable across different executions.</p></li>
<li><p>Use a more collision-resistant hash algorithm, or multiple hashes,
to make it harder to compute collisions.</p></li>
<li><p>Don’t do anything, because the other ideas come with performance
or usability penalties. If your program needs to be safe against
hash flooding, employ other mitigations (size check, rate
limiting, etc) or use an ordered map.</p></li>
</ul>
<p>This discussion is ongoing. The only change so far is to add a
security advisory to the package description.</p>
<h2 id="conclusion">Conclusion <a href="#conclusion" class="section">§</a></h2>
<p><code>aeson &gt;= 2.0.0.0</code> has mitigated the hash flooding vulnerability.
Users of the library must not only upgrade
<em>aeson</em> to the latest version, but also ensure it is compiled with
the correct flags. Programs can also perform runtime checks and
take appropriate action if <em>aeson</em> is using <code>HashMap</code>.</p>]]></summary>
</entry>
<entry>
    <title>Reusing random generators in Hedgehog</title>
    <link href="https://frasertweedale.github.io/blog-fp/posts/2021-10-03-hedgehog-reuse-random.html" />
    <id>https://frasertweedale.github.io/blog-fp/posts/2021-10-03-hedgehog-reuse-random.html</id>
    <published>2021-10-03T00:00:00Z</published>
    <updated>2021-10-03T00:00:00Z</updated>
    <summary type="html"><![CDATA[<h1 id="reusing-random-generators-in-hedgehog">Reusing random generators in Hedgehog</h1>
<p><a href="https://hedgehog.qa/">Hedgehog</a> has a powerful API for generating arbitrary values of
your types. But sometimes a library will already provide a random
generator. <span class="abstract">In this post I show how to use existing generators with
Hedgehog, and discuss the advantages and disadvantages.</span></p>
<h2 id="random-generator-use-cases">Random generator use cases <a href="#random-generator-use-cases" class="section">§</a></h2>
<p>Libraries may need to provide random generators of (some of) their
types for a variety of reasons. Cryptographic keys, secrets and
unique identifiers come to mind immediately.</p>
<p>One use case we have in <a href="https://hackage.haskell.org/package/purebred-email"><em>purebred-email</em></a>
is generation of MIME multipart boundary values (<a href="https://www.rfc-editor.org/rfc/rfc2046.html#section-5.1">RFC
2046</a>). The boundary is a string with 1–70 characters
from a restricted alphabet. Using a random boundary is useful
because the boundary delimiter line (the boundary value preceded by
two hyphens) must not appear anywhere within the message parts.</p>
<p>The <code>Boundary</code> type is defined as follows:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="co">-- constructor NOT exported</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="kw">newtype</span> <span class="dt">Boundary</span> <span class="ot">=</span> <span class="dt">Boundary</span> <span class="dt">ByteString</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a>  <span class="kw">deriving</span> (<span class="dt">Eq</span>, <span class="dt">Show</span>)</span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a><span class="ot">unBoundary ::</span> <span class="dt">Boundary</span> <span class="ot">-&gt;</span> <span class="dt">ByteString</span></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a>unBoundary (<span class="dt">Boundary</span> s) <span class="ot">=</span> s</span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a><span class="co">-- smart constructor; checks length and validity</span></span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a><span class="ot">makeBoundary ::</span> <span class="dt">ByteString</span> <span class="ot">-&gt;</span> <span class="dt">Either</span> <span class="dt">ByteString</span> <span class="dt">Boundary</span></span></code></pre></div>
<p>We don’t export the constructor. Users must use the <code>makeBoundary</code>
<em>smart constructor</em> which checks that the input is a valid boundary
value.</p>
<p>We also instance the <a href="https://hackage.haskell.org/package/random-1.2.0/docs/System-Random-Stateful.html#t:Uniform"><code>Uniform</code></a> type class from
the <a href="https://hackage.haskell.org/package/random"><em>random</em></a> package (version 1.2.0 onwards).
This instance provides a convenient way for users to generate
conformant boundary values that have a negligible probability of
matching any line in an arbitrary message.</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="kw">qualified</span> <span class="dt">Data.ByteString</span> <span class="kw">as</span> <span class="dt">B</span></span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="kw">qualified</span> <span class="dt">Data.ByteString.Internal</span> <span class="kw">as</span> <span class="dt">B</span></span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="kw">qualified</span> <span class="dt">Data.ByteString.Char8</span> <span class="kw">as</span> <span class="dt">C8</span></span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Uniform</span> <span class="dt">Boundary</span> <span class="kw">where</span></span>
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a><span class="ot">  uniformM ::</span> <span class="dt">StatefulGen</span> g m <span class="ot">=&gt;</span> g <span class="ot">-&gt;</span> m <span class="dt">Boundary</span></span>
<span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a>  uniformM g <span class="ot">=</span></span>
<span id="cb2-8"><a href="#cb2-8" aria-hidden="true" tabindex="-1"></a>    <span class="dt">Boundary</span> <span class="op">.</span> B.unsafePackLenBytes <span class="dv">64</span> <span class="op">&lt;$&gt;</span> randString</span>
<span id="cb2-9"><a href="#cb2-9" aria-hidden="true" tabindex="-1"></a>  <span class="kw">where</span></span>
<span id="cb2-10"><a href="#cb2-10" aria-hidden="true" tabindex="-1"></a>    randString  <span class="ot">=</span> replicateM <span class="dv">64</span> randChar</span>
<span id="cb2-11"><a href="#cb2-11" aria-hidden="true" tabindex="-1"></a>    randChar    <span class="ot">=</span> B.index bchars <span class="op">&lt;$&gt;</span> randIndex</span>
<span id="cb2-12"><a href="#cb2-12" aria-hidden="true" tabindex="-1"></a>    randIndex   <span class="ot">=</span> uniformRM (<span class="dv">0</span>, B.length bchars <span class="op">-</span> <span class="dv">1</span>) g</span>
<span id="cb2-13"><a href="#cb2-13" aria-hidden="true" tabindex="-1"></a>    bchars      <span class="ot">=</span> C8.pack <span class="op">$</span></span>
<span id="cb2-14"><a href="#cb2-14" aria-hidden="true" tabindex="-1"></a>                       [<span class="ch">&#39;a&#39;</span><span class="op">..</span><span class="ch">&#39;z&#39;</span>] <span class="op">&lt;&gt;</span> [<span class="ch">&#39;A&#39;</span><span class="op">..</span><span class="ch">&#39;Z&#39;</span>]</span>
<span id="cb2-15"><a href="#cb2-15" aria-hidden="true" tabindex="-1"></a>                    <span class="op">&lt;&gt;</span> [<span class="ch">&#39;0&#39;</span><span class="op">..</span><span class="ch">&#39;9&#39;</span>] <span class="op">&lt;&gt;</span> <span class="st">&quot;&#39;()+_,-./:=?&quot;</span></span></code></pre></div>
<div class="note">
<p>A <code>Uniform</code> instance is supposed to draw from all possible values of
a type. In the <code>Boundary</code> instance we are only generating values of
length 64. This is acceptable for our use case but may surprise
some users.</p>
</div>
<p>The <em>random</em> library provides a very general interface to
instantiate and use random number generators. I cannot cover it in
any detail in this post. Assuming you already have a generator
value, <a href="https://hackage.haskell.org/package/random-1.2.0/docs/System-Random-Stateful.html#t:Uniform"><code>System.Random.uniform</code></a> generates a value
of any type with an instance of <code>Uniform</code>:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="ot">uniform ::</span> (<span class="dt">RandomGen</span> g, <span class="dt">Uniform</span> a) <span class="ot">=&gt;</span> g <span class="ot">-&gt;</span> (a, g)</span></code></pre></div>
<p>You can use <code>uniform</code> with
<a href="https://hackage.haskell.org/package/random-1.2.0/docs/System-Random.html#v:getStdRandom"><code>System.Random.getStdRandom</code></a> to generate
values using a global pseudo-random number generator initialised
from system entropy, as an <code>IO</code> action:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="ot">getStdRandom ::</span> <span class="dt">MonadIO</span> m <span class="ot">=&gt;</span> (<span class="dt">StdGen</span> <span class="ot">-&gt;</span> (a, <span class="dt">StdGen</span>)) <span class="ot">-&gt;</span>  m a</span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a><span class="ot">getStdRandom ::</span>              (<span class="dt">StdGen</span> <span class="ot">-&gt;</span> (a, <span class="dt">StdGen</span>)) <span class="ot">-&gt;</span> <span class="dt">IO</span> a</span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a>getStdRandom<span class="ot"> uniform ::</span> (<span class="dt">MonadIO</span> m, <span class="dt">Uniform</span> a) <span class="ot">=&gt;</span>  m a</span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a>getStdRandom<span class="ot"> uniform ::</span>            (<span class="dt">Uniform</span> a) <span class="ot">=&gt;</span> <span class="dt">IO</span> a</span></code></pre></div>
<h2 id="hedgehog-and-hidden-constructors">Hedgehog and hidden constructors <a href="#hedgehog-and-hidden-constructors" class="section">§</a></h2>
<p>If a module does not expose the constructor of some type, how can
the test suite generate random values of that type? There are
several ways you could tackle this:</p>
<ol type="1">
<li><p>Export the constructor from some “internal” module, which is not
really internal. In this way, library users may be
discouraged—but not prevented—from constructing bad data. The
test module can import the constructor from the library’s
“internal” module and use it to define the generator.</p></li>
<li><p>Export a Hedgehog <code>Gen</code> for the type from the library itself.
This causes the library to depend on Hedgehog, which is usually
not desirable.</p></li>
<li><p>For a <code>newtype</code>, use
<a href="https://hackage.haskell.org/package/base-4.15.0.0/docs/Unsafe-Coerce.html"><code>Unsafe.Coerce.unsafeCoerce</code></a> in the <code>Gen</code>
definition to coerce the underlying type to the wrapped type.
You cannot use <a href="https://hackage.haskell.org/package/base-4.15.0.0/docs/Data-Coerce.html"><code>Data.Coerce.coerce</code></a> if the
constructor is not in scope. This is nasty, but not unspeakable
given we’re talking about generators for the test suite.</p></li>
</ol>
<ol start="4" type="1">
<li>Export a “lightweight” random generator from the library, and
reuse it to define the <code>Gen</code> in the test suite. If you were
going to export a <code>Uniform</code> (or <code>UniformRange</code>) instance anyway,
this will be low-effort. This approach is the main topic of this
article.</li>
</ol>
<h2 id="implementing-gen-using-uniform">Implementing <code>Gen</code> using <code>Uniform</code> <a href="#implementing-gen-using-uniform" class="section">§</a></h2>
<p>I was aware that Hedgehog depends on <em>random</em>, and was hopeful of
finding a way to use the existing <code>Uniform</code> instance to implement a
<code>Gen Boundary</code>. Looking through the docs, I stumbled across
<a href="https://hackage.haskell.org/package/hedgehog-1.0.5/docs/Hedgehog-Internal-Gen.html#v:generate"><code>generate</code></a>:</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="ot">generate ::</span> <span class="dt">MonadGen</span> m <span class="ot">=&gt;</span> (<span class="dt">Size</span> <span class="ot">-&gt;</span> <span class="dt">Seed</span> <span class="ot">-&gt;</span> a) <span class="ot">-&gt;</span> m a</span></code></pre></div>
<p>It was not immediately apparent whether I could use <code>generate</code> to
define a <code>Gen Boundary</code>. First, does <code>Gen</code> have an instance of
<code>MonadGen</code>?</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">Gen</span> <span class="ot">=</span> <span class="dt">GenT</span> <span class="dt">Identity</span></span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a><span class="dt">Monad</span> m <span class="ot">=&gt;</span> <span class="dt">MonadGen</span> (<span class="dt">GenT</span> m)</span></code></pre></div>
<p>Yes, it does. Next, I had to work out how to turn a <code>Size</code> and a
<code>Seed</code> into a <code>Boundary</code>. To my delight, I saw that <code>Seed</code> has an
instance of <code>RandomGen</code>. Putting it together, all that is required
is to apply <code>uniform</code> to the <code>Seed</code>, and discard the new generator
value. I ignore the <code>Size</code>.</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Hedgehog</span> (<span class="dt">Gen</span>)</span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Hedgehog.Internal.Gen</span> (generate)</span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a><span class="ot">genBoundary ::</span> <span class="dt">Gen</span> <span class="dt">Boundary</span></span>
<span id="cb7-5"><a href="#cb7-5" aria-hidden="true" tabindex="-1"></a>genBoundary <span class="ot">=</span> generate (\_size seed <span class="ot">-&gt;</span> <span class="fu">fst</span> (uniform seed))</span></code></pre></div>
<h2 id="disadvantages">Disadvantages <a href="#disadvantages" class="section">§</a></h2>
<p>There are a few disadvantages to reusing a library’s random
generator to define your Hedgehog <code>Gen</code>.</p>
<p>First, the generated values are restricted to whatever the library’s
generator gives you. In my case, the <code>Boundary</code> generator only
generates values of length 64. It follows that Hedgehog could miss
all kinds of bugs. For example, if <em>purebred-email</em> fails to decode
boundaries of length 70 due to an off-by-one error, I have no hope
of catching that bug.</p>
<p>Second, <code>generate</code> gives you a <code>Gen</code> with no shrinks. If Hedgehog
finds a counterexample, it can’t do anything to try to simplify it.
Automatic shrinking is one of Hedgehog’s killer features, but you
give it up by using <code>generate</code>.</p>
<p>You can use the <code>shrink</code> function to supply additional shrinking
behaviour to a <code>Gen</code>:</p>
<div class="sourceCode" id="cb8"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="ot">shrink ::</span> <span class="dt">MonadGen</span> m <span class="ot">=&gt;</span> (a <span class="ot">-&gt;</span> [a]) <span class="ot">-&gt;</span> m a <span class="ot">-&gt;</span> m a </span></code></pre></div>
<p>But when you don’t have access to the constructor for the data type
you’re generating, defining your own shrinks is at best awkward, and
maybe impossible. I <em>could</em> implement <code>Boundary</code> shrinking by
extracting the underlying <code>ByteString</code> (<code>unBoundary</code>), shrinking it,
applying the smart constructor (<code>makeBoundary</code>) and filtering
invalid values. That’s a lot of work. I didn’t bother.</p>
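<p>For the curious, the general shape of that approach might look something
like the following sketch. It uses only <em>base</em>. <code>shrinkVia</code>,
<code>Name</code>, <code>makeName</code> and <code>dropOne</code> are hypothetical names:
<code>Name</code> stands in for an opaque type like <code>Boundary</code>, and
<code>dropOne</code> is a deliberately naive shrink for the underlying
representation.</p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell">import Data.Maybe (mapMaybe)

-- Build a shrink function for an opaque type from a projection to
-- its representation, a smart constructor that may reject values,
-- and a shrink function for the representation.
shrinkVia :: (a -&gt; r) -&gt; (r -&gt; Maybe a) -&gt; (r -&gt; [r]) -&gt; a -&gt; [a]
shrinkVia unwrap rewrap shrinkRep = mapMaybe rewrap . shrinkRep . unwrap

-- Hypothetical stand-in for an opaque type: a non-empty string.
newtype Name = Name String deriving (Eq, Show)

unName :: Name -&gt; String
unName (Name s) = s

-- Smart constructor: rejects the empty string.
makeName :: String -&gt; Maybe Name
makeName s = if null s then Nothing else Just (Name s)

-- Naive representation shrink: remove one element at a time.
dropOne :: [c] -&gt; [[c]]
dropOne xs = [ take i xs ++ drop (i + 1) xs | i &lt;- [0 .. length xs - 1] ]

-- Candidates that the smart constructor rejects are filtered out.
main :: IO ()
main = do
  print (shrinkVia unName makeName dropOne (Name "ab"))
  print (shrinkVia unName makeName dropOne (Name "a"))</code></pre></div>
<p>The resulting <code>a -&gt; [a]</code> function is what you would pass to
<code>shrink</code>. For <code>Boundary</code> you would supply <code>unBoundary</code>, a
wrapper that adapts the result of <code>makeBoundary</code> to <code>Maybe</code>, and a
<code>ByteString</code> shrink of your choosing.</p>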
<h2 id="conclusion">Conclusion <a href="#conclusion" class="section">§</a></h2>
<p>Defining Hedgehog <code>Gen</code> values can be awkward or very difficult for
types whose constructors are hidden. But if you have a function
that uses a <code>RandomGen</code> to generate values, you can use it with
Hedgehog’s <code>generate</code> function to define a <code>Gen</code>. The downsides are
that you don’t get automatic shrinking, and you are restricted to
whatever values the generator produces.</p>
<p>Alternative approaches include exposing the constructor via an
“internal” (but actually public) module, or using <code>unsafeCoerce</code>.</p>]]></summary>
</entry>

</feed>
