2013/03/12
And here come random data generators
I just checked in data.random, a collection of random
data generators and their combinators. The names of the API functions
are not yet fixed, but I think overall it's in good shape.
(Since 0.9.4 is overdue, I might release it without
making data.random official. I'm not sure yet.)
Here's the code: http://gauche.git.sourceforge.net/git/gitweb.cgi?p=gauche/Gauche;a=blob;f=lib/data/random.scm;hb=HEAD
It provides a bunch of primitive random generators, such as the following.
- uniform distribution
  - (integer size :optional (start 0)) returns a generator that produces random integers between start and start+size-1, uniformly.
  - (integer-between lo hi) returns a generator that produces random integers between lo and hi (both inclusive).
  - int8, uint8, etc. are preset generators that produce the ranges their names suggest.
  - (char :optional cset) returns a generator of random characters from a character set. When cset is omitted, #[A-Za-z0-9] is used as the default character set.
  - We also have boolean, real, and real-between.
  - We want to have exact rational generators and complex generators, but I wonder how the range and distribution should be specified.
- nonuniform distribution
  - For discrete sampling, we have geometric and poisson distributions.
  - For continuous sampling, we have normal and exponential distributions.
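To make the semantics of these primitives concrete, here is a minimal sketch of a few of them built directly on SRFI-27. It is not the data.random source: the sketch-* names are hypothetical, the exponential argument is assumed to be the mean, and Poisson sampling uses Knuth's multiplication method, which may differ from what the module does. Each generator is just a thunk, which is all gauche.generator requires.

```scheme
(use srfi-27)           ; random-integer, random-real
(use gauche.generator)  ; generator->list

;; Hypothetical: uniform integers in [start, start+size-1],
;; mirroring (integer size :optional (start 0)).
(define (sketch-integer size :optional (start 0))
  (lambda () (+ start (random-integer size))))

;; Hypothetical: uniform integers in [lo, hi], both inclusive.
(define (sketch-integer-between lo hi)
  (sketch-integer (+ (- hi lo) 1) lo))

;; Hypothetical: exponential distribution, taking the argument as the mean.
;; Inverse-transform sampling: if U is uniform on (0,1), -mean*log(U)
;; is exponentially distributed. SRFI-27's random-real never returns 0.
(define (sketch-exponential mean)
  (lambda () (* (- mean) (log (random-real)))))

;; Hypothetical: Poisson distribution with mean lam (Knuth's method).
;; Multiply uniforms until the product falls below e^-lam; the number
;; of extra multiplications needed is the sample.
(define (sketch-poisson lam)
  (let ([limit (exp (- lam))])
    (lambda ()
      (let loop ([k 0] [p (random-real)])
        (if (< p limit)
          k
          (loop (+ k 1) (* p (random-real))))))))
```

For example, (generator->list (sketch-integer-between 1 6) 5) gives five die rolls. Knuth's method takes time proportional to the mean, which is fine for small means like the ones used in this post.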
Then, those generators can be combined to make more complex generators.
- random choice
  - (one-of generators) returns a generator that randomly picks one of the generators to produce each next value.
  - (weighted-sample weight&generators) allows you to specify a selection-probability weight for each generator.
- aggregate data
  - (pair-of gen1 gen2), (tuple-of gen ...)
  - list-of, vector-of, string-of: these combinators can be called in two different forms, e.g.:
    - (list-of sizer item-gen): sizer can be an integer, or an integer generator, giving the length of the resulting list. item-gen is a generator that produces the elements.
    - (list-of item-gen): if sizer is omitted, some default generator determines the length of the resulting list. Currently I use (poisson 4) provisionally.
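The combinators above can be sketched as plain thunk-returning procedures. This is an illustration under stated assumptions, not the module's code: the sketch-* names are hypothetical, the weight&generators argument is assumed to be a list of (weight . generator) pairs, and the default sizer is a uniform 0..8 stand-in rather than the (poisson 4) the post uses. case-lambda handles the two calling forms of list-of.

```scheme
(use srfi-27)           ; random-integer, random-real
(use gauche.generator)  ; generator->list

;; Hypothetical stand-in for (integer-between lo hi).
(define (sketch-integer-between lo hi)
  (lambda () (+ lo (random-integer (+ (- hi lo) 1)))))

;; Like (one-of generators): each call picks one generator uniformly
;; at random and returns that generator's next value.
(define (sketch-one-of gens)
  (let ([v (list->vector gens)])
    (lambda ()
      ((vector-ref v (random-integer (vector-length v)))))))

;; Like (weighted-sample weight&generators), ASSUMING a list of
;; (weight . generator) pairs; a generator is chosen with
;; probability weight / total-weight.
(define (sketch-weighted-sample pairs)
  (let ([total (apply + (map car pairs))])
    (lambda ()
      (let loop ([r (* total (random-real))] [ps pairs])
        ;; The null? check guards against floating-point round-off.
        (if (or (null? (cdr ps)) (< r (caar ps)))
          ((cdar ps))
          (loop (- r (caar ps)) (cdr ps)))))))

;; Like list-of: (sketch-list-of sizer item-gen) or (sketch-list-of item-gen).
;; sizer is either an integer or a generator of integers.
(define sketch-list-of
  (case-lambda
    [(item-gen) (sketch-list-of (sketch-integer-between 0 8) item-gen)]
    [(sizer item-gen)
     (lambda ()
       (generator->list item-gen (if (integer? sizer) sizer (sizer))))]))
```

Because sizer may itself be a generator, list lengths can be randomized with the same building blocks used for the elements.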
I also have permutation-of and combination-of, which take
a list of items (not item generators).
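Since permutation-of and combination-of work on a fixed list of items, a natural implementation is a Fisher-Yates shuffle. The sketch below is a hypothetical illustration, not the module's code: the sketch-* names are invented, and the size argument of sketch-combination-of is an assumption because the post does not give combination-of's signature.

```scheme
(use srfi-1)            ; take
(use srfi-27)           ; random-integer
(use gauche.generator)  ; generator->list

;; Hypothetical: each call returns a fresh, uniformly random permutation
;; of the list items, using the Fisher-Yates shuffle.
(define (sketch-permutation-of items)
  (lambda ()
    (let ([v (list->vector items)])
      ;; Walk i from the last index down to 1, swapping v[i]
      ;; with v[j] for a random j in [0, i].
      (do ([i (- (vector-length v) 1) (- i 1)])
          ((< i 1) (vector->list v))
        (let ([j (random-integer (+ i 1))]
              [tmp (vector-ref v i)])
          (vector-set! v i (vector-ref v j))
          (vector-set! v j tmp))))))

;; Hypothetical (size is an assumed parameter): each call returns a random
;; size-element subset, taken as a prefix of a random permutation.
;; The order of elements in the result carries no meaning.
(define (sketch-combination-of size items)
  (let ([perm (sketch-permutation-of items)])
    (lambda () (take (perm) size))))
```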
What I like about the current shape is that these generators can also be
combined using the gauche.generator framework; e.g. you
can get a series of sums of two dice rolls with:
(gmap + (integer-between 1 6) (integer-between 1 6))
or apply a filter:
(gfilter (cut < 0 <> 1) (exponential 1))
or take some values into a list:
(generator->list (poisson 5) 10)
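Since a gauche.generator generator is just a thunk, the compositions above work with any thunk. Here is a self-contained sketch that substitutes a hypothetical die thunk built on SRFI-27 for (integer-between 1 6). It then uses gmap, gfilter, and generator->list as in the examples, and tallies the distribution of dice sums. Only die, two-dice, sevens, and tally are invented here.

```scheme
(use srfi-27)           ; random-integer
(use gauche.generator)  ; gmap, gfilter, generator->list

;; Hypothetical stand-in for (integer-between 1 6): one die roll per call.
(define (die) (+ 1 (random-integer 6)))

;; Sum of two dice: gmap pulls one value from each generator per item.
(define two-dice (gmap + die die))

;; Only the rolls that total 7, via gfilter and cut.
(define sevens (gfilter (cut = <> 7) two-dice))

;; Tally n sums into a vector indexed by total (only indices 2..12 are used).
(define (tally n)
  (let ([counts (make-vector 13 0)])
    (for-each (lambda (s) (vector-set! counts s (+ 1 (vector-ref counts s))))
              (generator->list two-dice n))
    counts))
```

With enough samples, (tally 600) should peak around index 7, matching the triangular distribution of two-dice sums.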
Here are some aspects of the API I'm still pondering:
- We have procedures that create a generator (e.g. integer, real, char) and pre-created generators (e.g. fixnum, int8). Without static typing support, these two layers could be confusing. Shall we use some naming convention to distinguish them?
- There's an idea rolling around in my head to provide plural names as aliases, e.g. chars for char. It plays nicely with the combinators, e.g. (list-of fixnums) or (string-of 5 (chars)). But I also feel this is just a superficial convenience; we double the number of exported names without adding anything functionally.
- The handling of the omitted argument of list-of etc. also differs from Gauche's convention for optional arguments: here the optional sizer comes before the required item-gen, whereas Gauche's optional arguments normally come last.
If you have data generator ideas to throw into this module, let me know.
Now I'm writing a generative test framework, using this module as the data generator.
Tags: data.random, Generators