And here come random data generators
I just checked in data.random, a collection of random
data generators and their combinators. The names of the API functions
are not yet fixed, but I think overall it's in good shape.
(Since 0.9.4 is overdue, I might release it without making
data.random official. I'm not sure yet.)
It provides a bunch of primitive random generators, such as the following.
- uniform distribution
  - (integer size :optional (start 0)) returns a generator that produces random integers between start and start+size-1, uniformly.
  - (integer-between lo hi) returns a generator that produces random integers between lo and hi (both inclusive).
  - uint8 etc. are preset generators that produce integers in the range their names suggest.
  - (char :optional cset) returns a generator of random characters from a character set. When cset is omitted, #[A-Za-z0-9] is used as the default character set.
- We also have
- We want to have exact rational generators and complex generators, but I wonder how the range and distribution should be specified.
- nonuniform distribution
- For discrete sampling, we have the geometric and Poisson distributions.
- For continuous sampling, we have the normal and exponential distributions.
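To make the idea of a primitive generator concrete, here is a minimal sketch of how something like integer-between could be written by hand. It assumes a generator is simply a thunk (the gauche.generator convention) and uses random-integer from SRFI 27 as the random source. The name my-integer-between is hypothetical, and this is not meant to reflect how data.random is actually implemented.

```scheme
(use srfi-27)   ; provides random-integer

;; Hypothetical helper (not part of data.random):
;; returns a thunk that yields a uniformly distributed integer
;; between lo and hi, both inclusive, on every call.
(define (my-integer-between lo hi)
  (let ((size (+ (- hi lo) 1)))
    ;; (random-integer size) yields an integer in [0, size)
    (lambda () (+ lo (random-integer size)))))

;; A generator for a six-sided die.
(define die (my-integer-between 1 6))

(die)   ; some integer from 1 to 6
```

Since the generator never returns an EOF object, it behaves as an infinite generator.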
Then, those generators can be combined to make more complex generators.
- random choice
  - (one-of generators) returns a generator that randomly picks one of the generators to produce the next value.
  - (weighted-sample weight&generators) lets you specify a selection weight for each generator.
- aggregate data
  - (pair-of gen1 gen2), (tuple-of gen ...)
  - list-of, string-of: these combinators can be called in two different forms, e.g.
    - (list-of sizer item-gen): sizer can be an integer or an integer generator, and gives the length of the resulting list. item-gen is a generator that produces the elements.
    - (list-of item-gen): If sizer is omitted, a default generator determines the length of the resulting list. Currently I use
  - I also have combination-of, which takes a list of items (not item generators).
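The combinators described above can be sketched in a few lines if generators are treated as plain thunks. The following is only an illustration under that assumption: my-one-of, my-list-of and const-gen are hypothetical names, SRFI 27's random-integer stands in for the random source, and the real data.random code may look quite different.

```scheme
(use srfi-27)   ; provides random-integer

;; Hypothetical sketch of (one-of generators):
;; each call picks one of the given thunks at random and calls it.
(define (my-one-of generators)
  (let ((gens (list->vector generators)))
    (lambda ()
      ((vector-ref gens (random-integer (vector-length gens)))))))

;; Hypothetical sketch of (list-of sizer item-gen):
;; sizer is either an integer or a thunk that yields an integer.
(define (my-list-of sizer item-gen)
  (lambda ()
    (let ((n (if (integer? sizer) sizer (sizer))))
      (let loop ((i 0) (acc '()))
        (if (< i n)
            (loop (+ i 1) (cons (item-gen) acc))
            (reverse acc))))))

;; Trivial generator that always yields the same value.
(define (const-gen x) (lambda () x))

;; A generator of three-element lists whose items are a or b.
(define ab-lists
  (my-list-of 3 (my-one-of (list (const-gen 'a) (const-gen 'b)))))

(ab-lists)   ; a list of three symbols, each either a or b
```

Accepting either an integer or a generator as sizer mirrors the two roles described for list-of: a fixed length, or a length that is itself drawn at random.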
What I like about the current shape is that those generators can be
used with the gauche.generator framework as well; e.g. you
can have a series of sums of two dice rolls by:
(gmap + (integer-between 1 6) (integer-between 1 6))
or apply a filter:
(gfilter (cut < 0 <> 1) (exponential 1))
or take some values into a list:
(generator->list (poisson 5) 10)
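These pieces compose with each other, too. As a self-contained sketch that does not depend on the still-unsettled data.random names, the example below uses a hand-written die thunk (a hypothetical stand-in for (integer-between 1 6), built on SRFI 27's random-integer) and chains gmap, gfilter and generator->list from gauche.generator.

```scheme
(use gauche.generator)   ; gmap, gfilter, generator->list
(use srfi-27)            ; random-integer

;; Hypothetical stand-in for (integer-between 1 6):
;; a thunk that yields an integer from 1 to 6.
(define (die) (+ 1 (random-integer 6)))

;; Sum two dice, keep only the sums of 10 or more,
;; and collect the first five such sums into a list.
(define high-rolls
  (generator->list (gfilter (cut <= 10 <>) (gmap + die die)) 5))

high-rolls   ; a list of five integers, each between 10 and 12
```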
Here are some aspects of the API I'm still pondering:
- We have procedures that create a generator (e.g. char) and pre-created generators (e.g. int8). Without static typing support, having these two layers could be confusing. Shall we use some naming convention to distinguish them?
- There's an idea rolling in my head to provide plural names as aliases, e.g. chars for char. It plays nicely with the combinators, e.g. (string-of 5 (chars)). But I also feel this is just a superficial convenience; we'd double the number of exported names and gain nothing functionally.
- The handling of the omitted argument of list-of etc. is also different from Gauche's convention for optional arguments.
If you have ideas for data generators to be thrown into this module, let me know.
Now I'm writing a generative test framework, using this module for its data generators.