2013/03/12
And here come random data generators
I just checked in data.random, a collection of random data generators
and their combinators. The names of the API functions are not yet
fixed, but I think overall it's in good shape. (Since 0.9.4 is
overdue, I might release it without making data.random official.
I'm not sure yet.)
Here's the code: http://gauche.git.sourceforge.net/git/gitweb.cgi?p=gauche/Gauche;a=blob;f=lib/data/random.scm;hb=HEAD
It provides a bunch of primitive random generators, such as the following.
- uniform distribution
  - (integer size :optional (start 0)) returns a generator that produces random integers between start and start+size-1, uniformly.
  - (integer-between lo hi) returns a generator that produces random integers between lo and hi (both inclusive).
  - int8, uint8, etc. are preset generators that produce the range their names suggest.
  - (char :optional cset) returns a generator of random characters from a character set. When cset is omitted, we use #[A-Za-z0-9] as the default character set.
  - We also have boolean, real, and real-between.
  - We want to have exact rational generators and complex generators, but I wonder how the range and distribution should be specified.
- nonuniform distribution
  - For discrete sampling, we have geometric and poisson distributions.
  - For continuous sampling, we have normal and exponential distributions.
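To make the idea concrete, here is a minimal sketch of what such a primitive might look like, assuming a generator is just a thunk that returns a fresh value on each call (the gauche.generator convention). The name my-integer-between is a hypothetical stand-in built on srfi-27's random-integer; it is not the data.random source and may differ from it in detail.

```scheme
(use srfi-27)            ; random-integer
(use gauche.generator)   ; generator->list

;; Hypothetical stand-in for (integer-between lo hi); NOT the data.random code.
;; Returns a thunk producing integers in [lo, hi], both ends inclusive.
(define (my-integer-between lo hi)
  (let ((size (+ (- hi lo) 1)))
    ;; (random-integer size) yields an integer in [0, size)
    (lambda () (+ lo (random-integer size)))))

;; A six-sided die is a generator over 1..6.
(define die (my-integer-between 1 6))

;; Pull ten values out of the infinite generator.
(print (generator->list die 10))
```

Because the result is an ordinary thunk, anything that consumes gauche.generator generators can consume it too.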
Then, those generators can be combined to make more complex generators.
- random choice
  - (one-of generators) returns a generator that randomly picks one generator from generators to produce the next value.
  - (weighted-sample weight&generators) allows you to specify the selection weight for each generator.
- aggregate data
  - (pair-of gen1 gen2), (tuple-of gen ...)
  - list-of, vector-of, string-of: these combinators can be called in two different forms, e.g.
    - (list-of sizer item-gen): sizer can be an integer, or an integer generator, that gives the length of the resulting list. item-gen is a generator that produces the elements.
    - (list-of item-gen): If sizer is omitted, we use some default generator to determine the length of the resulting list. Currently I use (poisson 4) provisionally.
I also have permutation-of and combination-of, which take
a list of items (not item generators).
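The two calling forms of list-of and the behavior of one-of can be sketched roughly as follows. All my-* names are hypothetical stand-ins, not the data.random implementation, and the default sizer here is a uniform 0..8 generator chosen only to keep the sketch self-contained; it is an assumption replacing the provisional (poisson 4).

```scheme
(use srfi-27)   ; random-integer

;; Hypothetical stand-in for integer-between (inclusive on both ends).
(define (my-integer-between lo hi)
  (let ((size (+ (- hi lo) 1)))
    (lambda () (+ lo (random-integer size)))))

;; Hypothetical stand-in for one-of: pick one of the generators at random,
;; then ask it for the next value.
(define (my-one-of gens)
  (let ((v (list->vector gens)))
    (lambda () ((vector-ref v (random-integer (vector-length v)))))))

;; Hypothetical stand-in for list-of.
;;   (my-list-of sizer item-gen)  sizer is an integer or an integer generator
;;   (my-list-of item-gen)        length comes from an assumed default sizer
(define (my-list-of a . rest)
  (let ((sizer    (if (null? rest) (my-integer-between 0 8) a))
        (item-gen (if (null? rest) a (car rest))))
    (lambda ()
      ;; A fixed integer is used directly; a generator is asked for a length.
      (let ((n (if (integer? sizer) sizer (sizer))))
        (let loop ((i 0) (acc '()))
          (if (>= i n)
              (reverse acc)
              (loop (+ i 1) (cons (item-gen) acc))))))))

(define die (my-integer-between 1 6))
(print ((my-list-of 3 die)))                          ; always three rolls
(print ((my-list-of (my-integer-between 1 4) die)))   ; one to four rolls
(print ((my-list-of (my-one-of (list die (my-integer-between 100 105))))))
```

Dispatching on the number of arguments is what lets the optional sizer come first, at the cost of departing from the usual trailing-optional-argument style.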
What I like about the current shape is that those generators can also be
combined using the gauche.generator framework; e.g. you
can get a series of sums of two dice rolls with:
(gmap + (integer-between 1 6) (integer-between 1 6))
or apply a filter:
(gfilter (cut < 0 <> 1) (exponential 1))
or take some values into a list:
(generator->list (poisson 5) 10)
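For readers who want to try the composition without the new module, here is a small runnable sketch of the same pattern. It assumes only srfi-27 and gauche.generator; my-integer-between is a hypothetical stand-in for integer-between, not the data.random version.

```scheme
(use srfi-27)            ; random-integer
(use gauche.generator)   ; gmap, gfilter, generator->list

;; Hypothetical stand-in for integer-between (inclusive on both ends).
(define (my-integer-between lo hi)
  (let ((size (+ (- hi lo) 1)))
    (lambda () (+ lo (random-integer size)))))

;; gmap pulls one value from each source generator per step and adds them.
(define two-dice
  (gmap + (my-integer-between 1 6) (my-integer-between 1 6)))

;; gfilter lazily drops sums below 10; each remaining value is a high roll.
(define high-rolls
  (gfilter (cut >= <> 10) two-dice))

(print (generator->list two-dice 10))    ; ten sums, each between 2 and 12
(print (generator->list high-rolls 5))   ; five sums, each at least 10
```

Note that high-rolls consumes values from two-dice, so the two share the same underlying stream; for random data that is harmless.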
Here are some aspects of the API I'm still pondering:
- We have procedures that create a generator (e.g. integer, real, char) and pre-created generators (e.g. fixnum, int8). Without static typing support, having these two layers could be confusing. Shall we use some naming convention to distinguish them?
- There's an idea rolling around in my head to provide plural names as aliases, e.g. chars for char. It plays nicely with the combinators, e.g. (list-of fixnums) or (string-of 5 (chars)). But I also feel this is just a superficial convenience; we double the number of exported names and gain nothing functionally.
- The handling of the omitted argument of list-of etc. is also different from Gauche's convention for optional arguments.
If you have ideas for data generators to be thrown into this module, let me know.
Now I'm writing a generative test framework, using this module as its data generator.
Tags: data.random, Generators