# 2013/03/12

## And here come random data generators

I just checked in `data.random`, a collection of random
data generators and their combinators. The names of API functions
are not yet fixed, but I think overall it's in good shape.
(Since 0.9.4 is overdue, I might release it without
making `data.random` official. I'm not sure yet.)

Here's the code: http://gauche.git.sourceforge.net/git/gitweb.cgi?p=gauche/Gauche;a=blob;f=lib/data/random.scm;hb=HEAD

It provides a bunch of primitive random generators, such as the following.

- uniform distribution
    - `(integer size :optional (start 0))` returns a generator that produces random integers between *start* and *start*+*size*-1, uniformly.
    - `(integer-between lo hi)` returns a generator that produces random integers between *lo* and *hi* (both inclusive).
    - `int8`, `uint8`, etc. are preset generators that produce integers in the ranges their names suggest.
    - `(char :optional cset)` returns a generator of random characters from a character set. When *cset* is omitted, `#[A-Za-z0-9]` is used as the default character set.
    - We also have `boolean`, `real`, and `real-between`.
    - We want to have exact rational generators and complex generators, but I wonder how their range and distribution should be specified.
- nonuniform distribution
    - For discrete sampling, we have geometric and Poisson distributions.
    - For continuous sampling, we have normal and exponential distributions.
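To make the primitives above a bit more concrete, here's a rough sketch; it is not the module's actual code. It relies on the fact that a `gauche.generator` generator is just a thunk returning the next value on each call, and builds a few primitives as closures over SRFI 27's `random-integer` and `random-real`. The `my-` names are hypothetical stand-ins. Treating the exponential generator's parameter as its mean is an assumption, and so is the choice of Knuth's method for Poisson sampling and the Box-Muller transform for the normal distribution.

```scheme
(use srfi-27)          ; random-integer, random-real
(use gauche.generator) ; generator->list

;; Hypothetical stand-ins (my-*) for some primitives above; NOT the
;; module's real code.  A generator is a thunk; these never return EOF.

;; Uniform integer in [start, start+size-1], like (integer size start).
(define (my-integer size :optional (start 0))
  (lambda () (+ start (random-integer size))))

;; Uniform integer in [lo, hi], both inclusive.
(define (my-integer-between lo hi)
  (my-integer (+ (- hi lo) 1) lo))

;; Exponential, ASSUMING the parameter is the mean (inverse transform).
;; random-real returns a value strictly inside (0, 1), so log is finite.
(define (my-exponential mean)
  (lambda () (* (- mean) (log (random-real)))))

;; Poisson with mean lam (Knuth's method): count how many extra uniforms
;; are needed before their running product drops to exp(-lam) or below.
(define (my-poisson lam)
  (let ([limit (exp (- lam))])
    (lambda ()
      (let loop ([k 0] [p (random-real)])
        (if (<= p limit)
          k
          (loop (+ k 1) (* p (random-real))))))))

;; Normal via the Box-Muller transform (uses one of its two outputs).
(define (my-normal mean stddev)
  (let ([two-pi (* 8 (atan 1))])
    (lambda ()
      (+ mean (* stddev
                 (sqrt (* -2 (log (random-real))))
                 (cos (* two-pi (random-real))))))))

(print (generator->list (my-poisson 5) 10))  ; ten Poisson samples
```

Because each of these is an infinite thunk, they plug directly into the combinators and the `gauche.generator` utilities discussed below.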

Then, those generators can be combined to make more complex generators.

- random choice
    - `(one-of generators)` returns a generator that randomly picks one generator in *generators* to produce each next value.
    - `(weighted-sample weight&generators)` lets you specify a weight (relative selection probability) for each generator.

- aggregate data
    - `(pair-of gen1 gen2)`, `(tuple-of gen ...)`
    - `list-of`, `vector-of`, `string-of`: these combinators can be called in two different forms, e.g.
        - `(list-of sizer item-gen)`: *sizer* can be an integer, or an integer generator, giving the length of the resulting list. *item-gen* is a generator that produces the elements.
        - `(list-of item-gen)`: if *sizer* is omitted, some default generator determines the length of the resulting list. Currently I use `(poisson 4)` provisionally.
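The combinators compose the same way. The sketch below, again using hypothetical `my-` names and not taken from the module, shows one plausible way to implement `one-of`, `weighted-sample`, `tuple-of` and the two calling forms of `list-of`. The `(weight . generator)` pair format for the weighted version is an assumption, since the actual argument format isn't spelled out here.

```scheme
(use srfi-27)          ; random-integer, random-real
(use gauche.generator) ; generator->list

;; Hypothetical stand-ins (my-*) for the combinators above; NOT the
;; module's real code.  All generators here are infinite thunks.

(define (my-integer-between lo hi)
  (lambda () (+ lo (random-integer (+ (- hi lo) 1)))))

;; Poisson with mean lam (Knuth's method), used as the default sizer.
(define (my-poisson lam)
  (let ([limit (exp (- lam))])
    (lambda ()
      (let loop ([k 0] [p (random-real)])
        (if (<= p limit) k (loop (+ k 1) (* p (random-real))))))))

;; Each call picks one of GENS uniformly and pulls its next value.
(define (my-one-of gens)
  (let ([v (list->vector gens)])
    (lambda () ((vector-ref v (random-integer (vector-length v)))))))

;; Weighted choice.  ASSUMES weight&gens is a list of (weight . generator)
;; pairs; the real module's argument format may differ.
(define (my-weighted-sample weight&gens)
  (let ([total (apply + (map car weight&gens))])
    (lambda ()
      (let loop ([r (* total (random-real))] [ps weight&gens])
        (if (or (null? (cdr ps)) (< r (caar ps)))
          ((cdar ps))                      ; call the chosen generator
          (loop (- r (caar ps)) (cdr ps)))))))

;; A fresh list holding one value from each generator, like tuple-of.
(define (my-tuple-of . gens)
  (lambda () (map (lambda (g) (g)) gens)))

;; (my-list-of sizer item-gen) or (my-list-of item-gen).  SIZER is an
;; integer or an integer generator; if omitted, (my-poisson 4) is used.
(define (my-list-of . args)
  (receive (sizer item-gen)
      (if (null? (cdr args))
        (values (my-poisson 4) (car args))
        (values (car args) (cadr args)))
    (lambda ()
      (generator->list item-gen (if (integer? sizer) sizer (sizer))))))

(print ((my-list-of 3 (my-one-of (list (my-integer-between 1 6)
                                       (my-integer-between 100 106))))))
```

Checking whether the first argument is the only argument is what makes the leading *sizer* optional here, which is exactly the part that departs from the usual trailing-optional style.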

I also have `permutation-of` and `combination-of`, which take
a list of items (not item generators).
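Because `permutation-of` and `combination-of` sample from a fixed list rather than from item generators, they can be sketched with classic sampling algorithms: a Fisher-Yates shuffle for permutations and selection sampling for combinations. This is only an illustration under assumptions; the `my-` names and the `(size items)` signature of the combination version are hypothetical.

```scheme
(use srfi-27)          ; random-integer
(use gauche.generator) ; generator->list

;; Hypothetical stand-ins (my-*); NOT the module's real code.

;; Each call returns a fresh random permutation of ITEMS (Fisher-Yates).
(define (my-permutation-of items)
  (lambda ()
    (let ([v (list->vector items)])
      (do ([i (- (vector-length v) 1) (- i 1)])
          [(<= i 0) (vector->list v)]
        (let* ([j (random-integer (+ i 1))]   ; 0 <= j <= i
               [tmp (vector-ref v i)])
          (vector-set! v i (vector-ref v j))
          (vector-set! v j tmp))))))

;; ASSUMES a (size items) signature.  Each call returns a random
;; SIZE-element subset of ITEMS in their original order, keeping each
;; item with probability need/left (selection sampling).
(define (my-combination-of size items)
  (lambda ()
    (let loop ([xs items] [need size] [left (length items)] [acc '()])
      (cond [(or (zero? need) (null? xs)) (reverse acc)]
            [(< (random-integer left) need)
             (loop (cdr xs) (- need 1) (- left 1) (cons (car xs) acc))]
            [else (loop (cdr xs) need (- left 1) acc)]))))

(print (generator->list (my-permutation-of '(a b c)) 3))
(print (generator->list (my-combination-of 2 '(a b c d)) 3))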

What I like about the current shape is that those generators can be
combined using the `gauche.generator` framework as well; e.g. you
can get a series of sums of two dice rolls by:

    (gmap + (integer-between 1 6) (integer-between 1 6))

or apply a filter:

    (gfilter (cut < 0 <> 1) (exponential 1))

or take some values into a list:

    (generator->list (poisson 5) 10)
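The three expressions above need `data.random` to run. To try the same composition with only `gauche.generator` and SRFI 27, one can substitute hypothetical stand-ins for the module's generators (`my-integer-between`, and `my-exponential`, which assumes a mean parameter); `gmap`, `gfilter`, `generator->list` and `cut` are used exactly as above.

```scheme
(use srfi-27)          ; random-integer, random-real
(use gauche.generator) ; gmap, gfilter, generator->list

;; Hypothetical stand-ins for integer-between and exponential;
;; my-exponential ASSUMES its parameter is the mean.
(define (my-integer-between lo hi)
  (lambda () (+ lo (random-integer (+ (- hi lo) 1)))))
(define (my-exponential mean)
  (lambda () (* (- mean) (log (random-real)))))

;; Sums of two dice: gmap pulls one value from each generator per call.
(define dice-sums
  (gmap + (my-integer-between 1 6) (my-integer-between 1 6)))

;; Keep only exponential samples strictly between 0 and 1.
(define small-exps
  (gfilter (cut < 0 <> 1) (my-exponential 1)))

(print (generator->list dice-sums 10))   ; ten sums, each in 2..12
(print (generator->list small-exps 5))   ; five reals in (0, 1)
```

Since the stand-ins are plain thunks, this also shows why the composition works at all: anything that follows the generator protocol can be fed to `gmap` or `gfilter`.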

Here are some aspects of the API I'm still pondering:

- We have procedures that create a generator (e.g. `integer`, `real`, `char`) and pre-created generators (e.g. `fixnum`, `int8`). Without static typing support, this kind of layering could be confusing. Shall we use some naming convention to distinguish these two layers?
- There's an idea rolling around in my head to provide plural names as aliases, e.g. `chars` for `char`. It plays nicely with the combinators, e.g. `(list-of fixnums)` or `(string-of 5 (chars))`. But I also feel this is just a superficial convenience; we'd double the number of exported names without adding anything functionally.
- The handling of the omitted argument of `list-of` etc. also differs from Gauche's convention for optional arguments: here the omitted argument is the leading one, whereas Gauche's `:optional` arguments come last.

If you have ideas for data generators to add to this module, let me know.

Now I'm writing a generative test framework that uses this module as its source of test data.

Tags: data.random, Generators
