Reusing random generators in Hedgehog
Hedgehog has a powerful API for generating arbitrary values of your types. But sometimes a library will already provide a random generator. In this post I show how to use existing generators with Hedgehog, and discuss the advantages and disadvantages.
Random generator use cases §
Libraries may need to provide random generators of (some of) their types for a variety of reasons. Cryptographic keys, secrets and unique identifiers come to mind immediately.
One use case we have in purebred-email is generation of MIME multipart boundary values (RFC 2046). The boundary is a string with 1–70 characters from a restricted alphabet. Using a random boundary is useful because the boundary delimiter line (the boundary value preceded by two hyphens) must not appear anywhere within the message parts.
The Boundary type is defined as follows:
-- constructor NOT exported
newtype Boundary = Boundary ByteString
  deriving (Eq, Show)
unBoundary :: Boundary -> ByteString
unBoundary (Boundary s) = s
-- smart constructor; checks length and validity
makeBoundary :: ByteString -> Either ByteString Boundary
We don’t export the constructor. Users must use the makeBoundary
smart constructor, which checks that the input is a valid boundary
value.
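To make the checks concrete, here is a minimal sketch of what such a smart constructor could look like. It is not the purebred-email implementation: the rules (1 to 70 characters from the RFC 2046 bchars alphabet, not ending in a space) come from the RFC, while returning the rejected input in Left and the helper name bcharsWithSpace are assumptions made for illustration.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as C8

newtype Boundary = Boundary ByteString
  deriving (Eq, Show)

unBoundary :: Boundary -> ByteString
unBoundary (Boundary s) = s

-- RFC 2046 "bchars": letters, digits, some punctuation and the space.
-- (Hypothetical helper name, not taken from the library.)
bcharsWithSpace :: ByteString
bcharsWithSpace = C8.pack $
  ['a'..'z'] <> ['A'..'Z'] <> ['0'..'9'] <> "'()+_,-./:=? "

-- Sketch of a smart constructor. Assumption: Left carries the rejected input.
makeBoundary :: ByteString -> Either ByteString Boundary
makeBoundary s
  | B.null s || B.length s > 70           = Left s  -- length must be 1..70
  | C8.last s == ' '                      = Left s  -- must not end with a space
  | B.any (`B.notElem` bcharsWithSpace) s = Left s  -- restricted alphabet only
  | otherwise                             = Right (Boundary s)
```

With these definitions, makeBoundary accepts a value such as "a b" but rejects the empty string, "ab " (trailing space) and "a*b" (character outside the alphabet).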
We also provide an instance of the Uniform type class from
the random package (version 1.2.0 onwards).
This instance provides a convenient way for users to generate
conformant boundary values that have a negligible probability of
matching any line in an arbitrary message.
import Control.Monad (replicateM)
import qualified Data.ByteString as B
import qualified Data.ByteString.Internal as B
import qualified Data.ByteString.Char8 as C8
import System.Random.Stateful (StatefulGen, Uniform(..), uniformRM)
instance Uniform Boundary where
  -- the instance signature requires the InstanceSigs extension
  uniformM :: StatefulGen g m => g -> m Boundary
  uniformM g =
    Boundary . B.unsafePackLenBytes 64 <$> randString
    where
      -- 64 bytes, each drawn uniformly from the alphabet below
      randString = replicateM 64 randChar
      randChar = B.index bchars <$> randIndex
      randIndex = uniformRM (0, B.length bchars - 1) g
      bchars = C8.pack $
        ['a'..'z'] <> ['A'..'Z']
        <> ['0'..'9'] <> "'()+_,-./:=?"
A Uniform instance is supposed to draw from all possible values of
a type. In the Boundary instance we are only generating values of
length 64. This is acceptable for our use case but may surprise
some users.
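To see what the instance produces, one could draw a value from a pure generator with a fixed seed. The sketch below is self-contained, so it redefines a stand-in Boundary type and a simplified version of the instance (using B.pack rather than unsafePackLenBytes). The seed 2021 and the name sampleBoundary are arbitrary choices for illustration.

```haskell
import Control.Monad (replicateM)
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as C8
import System.Random (mkStdGen, uniform)
import System.Random.Stateful (Uniform (..), uniformRM)

-- Stand-in for the library's type; the real constructor is not exported.
newtype Boundary = Boundary B.ByteString
  deriving (Eq, Show)

unBoundary :: Boundary -> B.ByteString
unBoundary (Boundary s) = s

-- The 74-character alphabet used by the instance in this post.
bchars :: B.ByteString
bchars = C8.pack (['a'..'z'] <> ['A'..'Z'] <> ['0'..'9'] <> "'()+_,-./:=?")

-- Same idea as the instance above, written with B.pack for simplicity.
instance Uniform Boundary where
  uniformM g = Boundary . B.pack <$> replicateM 64 randChar
    where
      randChar = B.index bchars <$> uniformRM (0, B.length bchars - 1) g

-- Draw one value from a pure generator with a fixed (arbitrary) seed.
sampleBoundary :: Boundary
sampleBoundary = fst (uniform (mkStdGen 2021))
```

Whatever the seed, B.length (unBoundary sampleBoundary) is 64 and every byte comes from bchars, which is exactly the restriction described above.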
The random library provides a very general interface to
instantiate and use random number generators. I cannot cover it in
any detail in this post. Assuming you already have a generator
value, System.Random.uniform generates a value
of any type with an instance of Uniform:
uniform :: (RandomGen g, Uniform a) => g -> (a, g)
You can use uniform with System.Random.getStdRandom to generate
values using a global pseudo-random number generator initialised
from system entropy, as an IO action:
getStdRandom :: MonadIO m => (StdGen -> (a, StdGen)) -> m a
getStdRandom :: (StdGen -> (a, StdGen)) -> IO a  -- specialised to IO
getStdRandom uniform :: (MonadIO m, Uniform a) => m a
getStdRandom uniform :: (Uniform a) => IO a  -- specialised to IO
Hedgehog and hidden constructors §
If a module does not expose the constructor of some type, how can the test suite generate random values of that type? There are several ways you could tackle this:
- Export the constructor from some “internal” module, which is not really internal. In this way, library users may be discouraged (but not prevented) from constructing bad data. The test module can import the constructor from the library’s “internal” module and use it to define the generator.
- Export a Hedgehog Gen for the type from the library itself. This causes the library to depend on Hedgehog, which is usually not desirable.
- For a newtype, use Unsafe.Coerce.unsafeCoerce in the Gen definition to coerce the underlying type to the wrapped type. You cannot use Data.Coerce.coerce if the constructor is not in scope. This is nasty, but not unspeakable given we’re talking about generators for the test suite.
- Export a “lightweight” random generator from the library, and reuse it to define the Gen in the test suite. If you were going to export a Uniform (or UniformRange) instance anyway, this will be low-effort. This approach is the main topic of this article.
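For contrast with the approach taken in this article, the unsafeCoerce option from the list above might look roughly like the sketch below. The Boundary type here is a stand-in so that the block compiles on its own (in a real test suite it would be imported from the library with its constructor out of scope), and the names bcharsNoSpace, genRawBoundary and genBoundaryCoerced are hypothetical.

```haskell
import qualified Data.ByteString as B
import Hedgehog (Gen)
import qualified Hedgehog.Gen as Gen
import qualified Hedgehog.Range as Range
import Unsafe.Coerce (unsafeCoerce)

-- Stand-in for the library's newtype. In a real test suite this would be
-- imported from the library, without its constructor.
newtype Boundary = Boundary B.ByteString
  deriving (Eq, Show)

-- Boundary alphabet without the space, matching the instance in this post.
bcharsNoSpace :: [Char]
bcharsNoSpace = ['a'..'z'] <> ['A'..'Z'] <> ['0'..'9'] <> "'()+_,-./:=?"

-- Generate the underlying ByteString: 1 to 70 ASCII characters, so the
-- UTF-8 encoding has exactly one byte per character.
genRawBoundary :: Gen B.ByteString
genRawBoundary = Gen.utf8 (Range.linear 1 70) (Gen.element bcharsNoSpace)

-- Coerce the underlying type to the wrapped type. This relies on Boundary
-- being a newtype over ByteString; the type checker cannot verify that here.
genBoundaryCoerced :: Gen Boundary
genBoundaryCoerced = unsafeCoerce <$> genRawBoundary
```

A generator built this way covers the full range of lengths and keeps Hedgehog's shrinking, at the cost of bypassing the smart constructor's checks.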
Implementing Gen using Uniform §
I was aware that Hedgehog depends on random, and was hopeful of
finding a way to use the existing Uniform instance to implement a
Gen Boundary. Looking through the docs, I stumbled across
generate:
generate :: MonadGen m => (Size -> Seed -> a) -> m a
It was not immediately apparent whether I could use generate to
define a Gen Boundary. First, does Gen have an instance of
MonadGen?
type Gen = GenT Identity
instance Monad m => MonadGen (GenT m)
Yes, it does. Next, I had to work out how to turn a Size and a
Seed into a Boundary. To my delight, I saw that Seed has an
instance of RandomGen. Putting it together, all that is required
is to apply uniform to the Seed, and discard the new generator
value. I ignore the Size.
import Hedgehog (Gen)
import Hedgehog.Internal.Gen (generate)
import System.Random (uniform)
-- Size is ignored; the Seed serves as the RandomGen for uniform
genBoundary :: Gen Boundary
genBoundary = generate (\_size seed -> fst (uniform seed))
Disadvantages §
There are a few disadvantages to reusing a library’s random
generator to define your Hedgehog Gen.
First, the generated values are restricted to whatever the library’s
generator gives you. In my case, the Boundary generator only
generates values of length 64. It follows that Hedgehog could miss
all kinds of bugs. For example, if purebred-email fails to decode
boundaries of length 70 due to an off-by-one error, I have no hope
of catching that bug.
Second, generate gives you a Gen with no shrinks. If Hedgehog
finds a counterexample, it can’t do anything to try to simplify it.
Automatic shrinking is one of Hedgehog’s killer features, but you
give it up by using generate.
You can use the shrink function to supply additional shrinking
behaviour to a Gen:
shrink :: MonadGen m => (a -> [a]) -> m a -> m a
But when you don’t have access to the constructor for the data type
you’re generating, defining your own shrinks is at best awkward, and
maybe impossible. I could implement Boundary shrinking by
extracting the underlying ByteString (unBoundary), shrinking it,
applying the smart constructor (makeBoundary) and filtering
invalid values. That’s a lot of work. I didn’t bother.
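For the curious, the shrinking recipe just described could be sketched as follows. This is only an illustration: the Boundary type, unBoundary and makeBoundary are stand-ins for the library exports so that the block compiles on its own, and shrinkBytes and shrinkBoundary are hypothetical names with a deliberately simple strategy (try the first half, then drop the last byte).

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as C8
import Data.List (nub)

-- Stand-ins for what the library exports (the real constructor is hidden).
newtype Boundary = Boundary ByteString
  deriving (Eq, Show)

unBoundary :: Boundary -> ByteString
unBoundary (Boundary s) = s

makeBoundary :: ByteString -> Either ByteString Boundary
makeBoundary s
  | B.null s || B.length s > 70 = Left s
  | C8.last s == ' '            = Left s
  | C8.any (`notElem` alphabet) s = Left s
  | otherwise                   = Right (Boundary s)
  where
    alphabet = ['a'..'z'] <> ['A'..'Z'] <> ['0'..'9'] <> "'()+_,-./:=? "

-- Candidate shrinks of the raw bytes: the first half, then all but the last byte.
shrinkBytes :: ByteString -> [ByteString]
shrinkBytes s
  | B.length s <= 1 = []
  | otherwise       = nub [B.take (B.length s `div` 2) s, B.init s]

-- Extract, shrink, re-validate, and keep only the valid candidates.
shrinkBoundary :: Boundary -> [Boundary]
shrinkBoundary b =
  [ b' | s <- shrinkBytes (unBoundary b), Right b' <- [makeBoundary s] ]
```

A function like shrinkBoundary could then be passed to shrink, as in shrink shrinkBoundary genBoundary, to give the generator some shrinking behaviour without access to the constructor.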
Conclusion §
Defining Hedgehog Gen values can be awkward or very difficult for
types whose constructors are hidden. But if you have a function
that uses a RandomGen to generate values, you can use it with
Hedgehog’s generate function to define a Gen. The downsides are
that you don’t get automatic shrinking, and you are restricted to
whatever values the generator produces.
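The same recipe is not specific to Boundary. As a rough sketch, it could be packaged as a helper that works for any type with a Uniform instance. The names genUniform and prop_word8Bounded are hypothetical, and Word8 is used only because random already provides a Uniform instance for it.

```haskell
import Data.Word (Word8)
import Hedgehog (Gen, Property, assert, check, forAll, property)
import Hedgehog.Internal.Gen (generate)
import System.Random (Uniform, uniform)

-- Hypothetical helper: a Gen for any type with a Uniform instance.
-- The Size is ignored and the resulting Gen has no shrinks.
genUniform :: Uniform a => Gen a
genUniform = generate (\_size seed -> fst (uniform seed))

-- A trivial property, only to show the generator being used with forAll.
prop_word8Bounded :: Property
prop_word8Bounded = property $ do
  w <- forAll (genUniform :: Gen Word8)
  assert (toInteger w < 256)

-- Run the property from IO; check reports whether it passed.
runExample :: IO Bool
runExample = check prop_word8Bounded
```

The trade-offs are the same as for genBoundary: values come straight from the Uniform instance, and failing cases are reported without being shrunk.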
Alternative approaches include exposing the constructor via an
“internal” (but actually public) module, or using unsafeCoerce.