Reusing random generators in Hedgehog
Hedgehog has a powerful API for generating arbitrary values of your types. But sometimes a library will already provide a random generator. In this post I show how to use existing generators with Hedgehog, and discuss the advantages and disadvantages.
Random generator use cases §
Libraries may need to provide random generators of (some of) their types for a variety of reasons. Cryptographic keys, secrets and unique identifiers come to mind immediately.
One use case we have in purebred-email is generation of MIME multipart boundary values (RFC 2046). The boundary is a string with 1–70 characters from a restricted alphabet. Using a random boundary is useful because the boundary delimiter line (the boundary value preceded by two hyphens) must not appear anywhere within the message parts.
The Boundary type is defined as follows:
-- constructor NOT exported
newtype Boundary = Boundary ByteString
  deriving (Eq, Show)

unBoundary :: Boundary -> ByteString
unBoundary (Boundary s) = s

-- smart constructor; checks length and validity
makeBoundary :: ByteString -> Either ByteString Boundary
We don’t export the constructor. Users must use the makeBoundary
smart constructor which checks that the input is a valid boundary
value.
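To make the smart constructor concrete, here is a minimal sketch of how makeBoundary could be written. It is not the purebred-email implementation: the bchars alphabet and the exact checks are assumptions based on the description above (1 to 70 characters from a restricted alphabet). RFC 2046 itself is slightly more permissive, since it also allows spaces anywhere except the final position.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as C8

-- In a library, the constructor would NOT be exported.
newtype Boundary = Boundary ByteString
  deriving (Eq, Show)

unBoundary :: Boundary -> ByteString
unBoundary (Boundary s) = s

-- Assumed alphabet: letters, digits and some punctuation (no space).
bchars :: ByteString
bchars = C8.pack (['a'..'z'] <> ['A'..'Z'] <> ['0'..'9'] <> "'()+_,-./:=?")

-- Smart constructor: Right for a valid boundary,
-- Left (carrying the rejected input) otherwise.
makeBoundary :: ByteString -> Either ByteString Boundary
makeBoundary s
  | validLength && validChars = Right (Boundary s)
  | otherwise = Left s
  where
    validLength = B.length s >= 1 && B.length s <= 70
    validChars = B.all (`B.elem` bchars) s
```

With these definitions, makeBoundary accepts a short alphanumeric string and rejects the empty string, anything longer than 70 bytes, and anything containing a byte outside bchars.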
We also provide an instance of the Uniform type class from the random package (version 1.2.0 onwards).
This instance provides a convenient way for users to generate
conformant boundary values that have a negligible probability of
matching any line in an arbitrary message.
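import Control.Monad (replicateM)
import qualified Data.ByteString as B
import qualified Data.ByteString.Internal as B
import qualified Data.ByteString.Char8 as C8
import System.Random.Stateful (StatefulGen, Uniform(..), UniformRange(..))

instance Uniform Boundary where
  -- the instance signature requires the InstanceSigs extension
  uniformM :: StatefulGen g m => g -> m Boundary
  uniformM g =
    Boundary . B.unsafePackLenBytes 64 <$> randString
    where
      -- 64 bytes, each drawn uniformly from the allowed alphabet
      randString = replicateM 64 randChar
      randChar = B.index bchars <$> randIndex
      randIndex = uniformRM (0, B.length bchars - 1) g
      bchars = C8.pack $
        ['a'..'z'] <> ['A'..'Z']
        <> ['0'..'9'] <> "'()+_,-./:=?"
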
A Uniform instance is supposed to draw from all possible values of a type. In the Boundary instance we are only generating values of length 64. This is acceptable for our use case but may surprise some users.
The random library provides a very general interface to
instantiate and use random number generators. I cannot cover it in
any detail in this post. Assuming you already have a generator
value, System.Random.uniform generates a value of any type with an instance of Uniform:
uniform :: (RandomGen g, Uniform a) => g -> (a, g)
You can use uniform with System.Random.getStdRandom to generate values using a global pseudo-random number generator initialised from system entropy, as an IO action:
-- general type, and specialised to IO
getStdRandom :: MonadIO m => (StdGen -> (a, StdGen)) -> m a
getStdRandom :: (StdGen -> (a, StdGen)) -> IO a

-- applied to uniform
getStdRandom uniform :: (MonadIO m, Uniform a) => m a
getStdRandom uniform :: (Uniform a) => IO a

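As a small illustration of both entry points, the sketch below defines a Uniform instance for a hypothetical Die type (not part of any library) and draws values from it, first purely from a seeded StdGen and then in IO via the global generator. It assumes random 1.2.0 or later.

```haskell
import System.Random (getStdRandom, mkStdGen, uniform)
import System.Random.Stateful (Uniform (..), uniformRM)

-- Hypothetical example type: the face of a six-sided die.
newtype Die = Die Int
  deriving (Eq, Show)

instance Uniform Die where
  -- draw the underlying Int uniformly from the inclusive range 1..6
  uniformM g = Die <$> uniformRM (1, 6) g

-- Pure: the same seed always yields the same value.
rollWithSeed :: Int -> Die
rollWithSeed seed = fst (uniform (mkStdGen seed))

-- IO: uses (and updates) the global generator.
rollIO :: IO Die
rollIO = getStdRandom uniform
```

The pure version is handy when you need reproducibility; the IO version is the convenient one for application code.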
Hedgehog and hidden constructors §
If a module does not expose the constructor of some type, how can the test suite generate random values of that type? There are several ways you could tackle this:
- Export the constructor from some “internal” module, which is not really internal. In this way, library users may be discouraged (but not prevented) from constructing bad data. The test module can import the constructor from the library’s “internal” module and use it to define the generator.
- Export a Hedgehog Gen for the type from the library itself. This causes the library to depend on Hedgehog, which is usually not desirable.
- For a newtype, use Unsafe.Coerce.unsafeCoerce in the Gen definition to coerce the underlying type to the wrapped type. You cannot use Data.Coerce.coerce if the constructor is not in scope. This is nasty, but not unspeakable given we’re talking about generators for the test suite.
- Export a “lightweight” random generator from the library, and reuse it to define the Gen in the test suite. If you were going to export a Uniform (or UniformRange) instance anyway, this will be low-effort. This approach is the main topic of this article.
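For comparison, the unsafeCoerce option might look roughly like the sketch below. The names genBoundaryBytes and genBoundaryUnsafe are hypothetical, and the Boundary type is a local stand-in so that the example is self-contained; in a real test suite the type would be imported from the library without its constructor. Note that this bypasses the smart constructor entirely, so the generator itself is responsible for producing only valid values.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as C8
import Hedgehog (Gen)
import qualified Hedgehog.Gen as Gen
import qualified Hedgehog.Range as Range
import Unsafe.Coerce (unsafeCoerce)

-- Stand-in for the library type (constructor hidden in real life).
newtype Boundary = Boundary ByteString
  deriving (Eq, Show)

unBoundary :: Boundary -> ByteString
unBoundary (Boundary s) = s

-- Generate the underlying representation: 1 to 70 allowed characters.
genBoundaryBytes :: Gen ByteString
genBoundaryBytes =
  C8.pack <$> Gen.list (Range.linear 1 70) (Gen.element alphabet)
  where
    alphabet = ['a'..'z'] <> ['A'..'Z'] <> ['0'..'9'] <> "'()+_,-./:=?"

-- Coerce the ByteString to the newtype without using the constructor.
-- No validation happens here.
genBoundaryUnsafe :: Gen Boundary
genBoundaryUnsafe = unsafeCoerce <$> genBoundaryBytes
```

Because this generator is built from ordinary Hedgehog combinators, it covers the full range of lengths and keeps Hedgehog's usual shrinking behaviour. The price is the unsafeCoerce.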
Implementing Gen using Uniform §
I was aware that Hedgehog depends on random, and was hopeful of finding a way to use the existing Uniform instance to implement a Gen Boundary. Looking through the docs, I stumbled across generate:
generate :: MonadGen m => (Size -> Seed -> a) -> m a
It was not immediately apparent whether I could use generate to define a Gen Boundary. First, does Gen have an instance of MonadGen?
type Gen = GenT Identity
instance Monad m => MonadGen (GenT m)
Yes, it does. Next, I had to work out how to turn a Size and a Seed into a Boundary. To my delight, I saw that Seed has an instance of RandomGen. Putting it together, all that is required is to apply uniform to the Seed, and discard the new generator value. I ignore the Size.
import Hedgehog (Gen)
import Hedgehog.Internal.Gen (generate)
import System.Random (uniform)

genBoundary :: Gen Boundary
genBoundary = generate (\_size seed -> fst (uniform seed))

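To show the generator in use, here is a self-contained sketch that wires a Uniform instance into a Hedgehog property. The Boundary type, makeBoundary and the instance are simplified stand-ins for the library code (the validity rules are assumptions), and prop_generatedBoundaryIsValid and runChecks are hypothetical names.

```haskell
import Control.Monad (replicateM)
import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as C8
import Hedgehog (Gen, Property, check, forAll, property, (===))
import Hedgehog.Internal.Gen (generate)
import System.Random (uniform)
import System.Random.Stateful (Uniform (..), uniformRM)

-- Stand-ins for the library definitions.
newtype Boundary = Boundary ByteString
  deriving (Eq, Show)

unBoundary :: Boundary -> ByteString
unBoundary (Boundary s) = s

bchars :: ByteString
bchars = C8.pack (['a'..'z'] <> ['A'..'Z'] <> ['0'..'9'] <> "'()+_,-./:=?")

makeBoundary :: ByteString -> Either ByteString Boundary
makeBoundary s
  | B.length s >= 1 && B.length s <= 70 && B.all (`B.elem` bchars) s =
      Right (Boundary s)
  | otherwise = Left s

instance Uniform Boundary where
  -- always 64 bytes, each drawn uniformly from bchars
  uniformM g = Boundary . B.pack <$> replicateM 64 randByte
    where
      randByte = B.index bchars <$> uniformRM (0, B.length bchars - 1) g

-- Seed is a RandomGen, so uniform can consume it directly.
genBoundary :: Gen Boundary
genBoundary = generate (\_size seed -> fst (uniform seed))

-- Every generated value should survive the smart constructor.
prop_generatedBoundaryIsValid :: Property
prop_generatedBoundaryIsValid = property $ do
  b <- forAll genBoundary
  makeBoundary (unBoundary b) === Right b
  B.length (unBoundary b) === 64

runChecks :: IO Bool
runChecks = check prop_generatedBoundaryIsValid
```

The second assertion makes the limitation visible: every value the property ever sees has length 64.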
Disadvantages §
There are a few disadvantages to reusing a library’s random generator to define your Hedgehog Gen.
First, the generated values are restricted to whatever the library’s generator gives you. In my case, the Boundary generator only generates values of length 64. It follows that Hedgehog could miss all kinds of bugs. For example, if purebred-email fails to decode boundaries of length 70 due to an off-by-one error, I have no hope of catching that bug.
Second, generate gives you a Gen with no shrinks. If Hedgehog finds a counterexample, it can’t do anything to try and simplify it. Automatic shrinking is one of Hedgehog’s killer features, but you give it up by using generate.
You can use the shrink function to supply additional shrinking behaviour to a Gen:
shrink :: MonadGen m => (a -> [a]) -> m a -> m a
But when you don’t have access to the constructor for the data type you’re generating, defining your own shrinks is at best awkward, and maybe impossible. I could implement Boundary shrinking by extracting the underlying ByteString (unBoundary), shrinking it, applying the smart constructor (makeBoundary) and filtering invalid values. That’s a lot of work. I didn’t bother.
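For the curious, the shrinking approach just described could be sketched as follows. This is only one possible strategy (it proposes shorter prefixes of the original value), and shrinkBoundary and withShrinks are hypothetical names. The Boundary definitions are again local stand-ins; a real test suite would import only unBoundary and makeBoundary from the library.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as C8
import Hedgehog (Gen)
import qualified Hedgehog.Gen as Gen

-- Stand-ins for the library definitions.
newtype Boundary = Boundary ByteString
  deriving (Eq, Show)

unBoundary :: Boundary -> ByteString
unBoundary (Boundary s) = s

makeBoundary :: ByteString -> Either ByteString Boundary
makeBoundary s
  | B.length s >= 1 && B.length s <= 70 && B.all (`B.elem` bchars) s =
      Right (Boundary s)
  | otherwise = Left s
  where
    bchars = C8.pack (['a'..'z'] <> ['A'..'Z'] <> ['0'..'9'] <> "'()+_,-./:=?")

-- Candidate shrinks: strictly shorter, non-empty prefixes (shortest
-- first), keeping only those the smart constructor accepts.
shrinkBoundary :: Boundary -> [Boundary]
shrinkBoundary b =
  [ b'
  | n <- [1 .. B.length bytes - 1]
  , Right b' <- [makeBoundary (B.take n bytes)]
  ]
  where
    bytes = unBoundary b

-- Attach the shrink function to any existing generator.
withShrinks :: Gen Boundary -> Gen Boundary
withShrinks = Gen.shrink shrinkBoundary
```

Applying withShrinks to a generator built with generate would restore some shrinking, though only along the one dimension (length) that this function explores.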
Conclusion §
Defining Hedgehog Gen values can be awkward or very difficult for types whose constructors are hidden. But if you have a function that uses a RandomGen to generate values, you can use it with Hedgehog’s generate function to define a Gen. The downsides are that you don’t get automatic shrinking, and you are restricted to whatever values the generator produces.
Alternative approaches include exposing the constructor via an “internal” (but actually public) module, or using unsafeCoerce.