2013/05/09
Export-time renaming
Recently I implemented rename feature in the export
form.
With the import options (see Import options: part one and Import options: part two),
this completes the infrastructure to support R[67]RS's modules
on top of our module system.
The syntax of export-time renaming is the same as R[67]RS. If you have the following module:
(define-module example1 (export (rename foo boo)) (define foo 3))
Then the name foo
in example1 module can be
referred as boo
from the modules that imports it.
gosh> (import example1) #<undef> gosh> boo 1
* * *
During the course,
I changed ScmModule
structure to manage the exported symbols.
A module is a map from names (symbols) to locations (global locations, or GLOCs).Essentially it's just a hashtable. Visibility (whether a name can be seen from others that import this module) is an auxiliary information.
Initially ScmModule had a list of exported symbols. When an identifier was looked up, we scanned the exported list of each imported modules, and if we found a match, we looked up the hashtable to get the corresponding GLOC.
Obviously this didn't scale when export list got longer. So several years ago I switched to put a flag in each binding to indicate whether the symbol was exported. Then I only needed to look up a hashtable and check a flag. But what does the binding mean? Conceptually, it is the association of a name and a GLOC, which is a hash table entry in our implementation. There's no place to add a flag in a hash table entry itself. So I made GLOC to have the exported flag.
There I stepped into a shady area. If a GLOC can be shared among different modules, there might be a case that it's exported in one module but not in another. We didn't have such cases in good old days, but the import options introduced GLOC-sharing cases. It wasn't a problem so far, since import options operates only on exported symbols. Yet this kind of hack reeks, and may bites back down the road.
Then it comes to the export renaming. A GLOC can now have multiple names, and we need to choose which name to look up depending on whether we're searching exported symbols or not. A straightforward way is to have two tables, one for exported names, and another for internal names.
And if we have a separate table for exported names, then the mere fact that the name is registered to the table indicates the fact that the name is exported---we don't need an extra flag in GLOC. Yay!
So the flag is removed from GLOC, and a new table for exported names
is added to ScmModule. The module-exports
introspection API
returns a list of exported names for the backward compatibility,
but now it calculates the result list from the exported name table
every time it is called.
There's one caveat, caused by the openness of Gauche module.
In Gauche, a programmer can export symbols of existing modules
at any time, using with-module
. (During development, sometimes
I do (with-module foo.internal (export-all))
so that I can
call internal procedures of foo.internal
easily.)
This wasn't a problem before, since exporting already exported
symbol was just a no-op.
With export-time renaming, it's no longer true. An internal symbol
foo
may have been exported as boo
, but now it can be
exported as voo
. How should this situation be handled?
- Should the previous export be removed? I decided not. It's costly
to search if the internal symbol
foo
has been exported in another name. (we could have a reverse map, but it seems unnecessary complexity.) Plus, any code that counts on the external nameboo
may break. - What if another symbol has already been exported as
voo
? This also would break code that counts on the previousvoo
, but the operation may be intentional (e.g. hot-patching). I assume such case shouldn't happen in normal circumstances, but needed in emergency. So I make a warning issued but allow the meaning of external namevoo
to be updated to point tofoo
.
* * *
I already implemented R7RS module system on top of this, and I'll describe it in the next entry.
2013/03/19
When the inexact square root of an integer is exact
In Exact sqrt entry, I wrote that, for a natural number smaller than 2^53, we could use double-precision floating point sqrt
and take the answer exact if the result was an integer. As Mark H Weaver pointed out, it was wrong.
Suppose we have nonnegative integers N, M, m, and non-zero real number e, where M = m^2 < 2^53, N = (m*(1+e))^2 < 2^53.
If the absolute value of e isn't greater than 2^-53,
square root of N, which is m*(1+e), can be rounded to
m in the double-precision sqrt
calcluation.
The result becomes an integer, but not exact. That's the case
we want to exclude.
N - M = (2e+e^2)M. The maximum bound of (2e+e^2) is (2^-52 + 2^-106) when e = 2^-53, so if M is greater than 2^52, N - M can exceed 1 and we might incorrectly recognize N as a square number of m.
The greatest square number below 2^52 is (2^26-1)^2 = 4503599493152769, and (2^-52 + 2^-106)*4503599493152769 is 81129635996755064984528658366465/81129638414606681695789005144064, which is smaller than 1.
2^52 = 4503599627370496 is also a square number, and
calculating (sqrt (inexact (+ (expt 2 52) 1)))
yields
a rounded integer 67108864.0
.
But the lower bound, (-(2^52) + 2^-106)*4503599627370496 is
-18014398509481983/18014398509481984, which is grater than -1,
so applying inexact sqrt on 2^52-1 won't lead us to the rounded
integer. (Indeed, (sqrt (inexact (- (expt 2 52) 1)))
yields
67108863.99999999
.)
So we can use inexact square root when the input is smaller than 2^52.
Tags: Flonums, sqrt, exact-integer-sqrt
2013/03/16
Checking scripts
I write small scripts in Gauche, all the time. They're not so 'serious' as to require unit-tests and configure scripts and other scaffolds, but they're not one-time throwaway scripts, either. So it's annoying to find out a silly bug like misspelling in less-executed code paths later.
Here's a command I run to quickly check such scripts.
gosh -ugauche.test -l script-name -E "test-module 'user" -Eexit
Tag: Script
2013/03/12
And here comes random data generators
I just checked in data.random
, a collection of random
data generators and their combinators. The names of API functions
are not yet fixed, but I think the overall it's in a good shape.
(Since 0.9.4 is overdue, I might be going to release it without
making data.random
official. I'm not sure yet.)
Here's the code: http://gauche.git.sourceforge.net/git/gitweb.cgi?p=gauche/Gauche;a=blob;f=lib/data/random.scm;hb=HEAD
It provides a bunch of primitive random generators such as followings.
- uniform distribution
(integer size :optional (start 0))
returns a generator that produces random integer between start and start+size-1, uniformly.(integer-between lo hi)
returns a generator that produces random integer between lo and hi (both inclusive).int8
,uint8
etc. are preset generators to produce the range their name suggest.(char :optional cset)
returns a generator of random characters from a character set. When omitted, we use#[A-Za-z0-9]
as the default character set.- We also have
boolean
,real
,real-between
. - We want to have exact rational generators and complex generators, but I wonder how the range and distribution should be specified.
- nonuniform distribution
- For discrete sampling, we have geometric and poisson distribution.
- For continuous sampling, we have normal and exponential distribution.
Then, those generators can be combined to make more complex generators.
- random choice
(one-of generators)
returns a generator that picks one generator in generators randomly to produce the next value.(weighted-sample weight&generators)
allows you to specify weight of selection probability for each generators.
- aggregate data
(pair-of gen1 gen2)
,(tuple-of gen ...)
list-of
,vector-of
,string-of
- these combinators can be called in two different forms, e.g.(list-of sizer item-gen)
: sizer can be an integer, or an integer-generator, to give the length of the resulting list. item-gen is a generator to produce elements.(list-of item-gen)
: If sizer is omitted, we use some default generator to determine the length of the resulting list. Currently I use(poisson 4)
provisionally.
I also have permutation-of
and combination-of
, which takes
a list of items (not item generators).
What I like about the current shape is that those generators can be
combined using gauche.generator
framework as well; e.g. you
can have series of sum of two dice rolling by:
(gmap + (integer-between 1 6) (integer-between 1 6))
or apply a filter:
(gfilter (cut < 0 <> 1) (exponential 1))
or taking some values into a list:
(generator->list (poisson 5) 10)
Here are some elements about API I'm still pondering about:
- We have procedures that creates a generator (e.g.
integer
,real
,char
) and pre-created generators (e.g.fixnum
,int8
). Without the static typing support, this kind of layers could be confusing. Shall we use some naming convention to distinguish these two layers? - There's an idea rolling in my head to provide plural names as an
alias, e.g.
chars
forchar
. It plays nicely with the combinators, e.g.(list-of fixnums)
or(string-of 5 (chars))
. But I also feel this is just a superficial convenience; we double the number of exported names to get nothing added functionally. - The handling of omitted argument of
list-of
etc. is also different from Gauche's convention of optional arugments.
If you have data generator ideas to be thrown in to this module, let me know.
Now I'm writing a generative test framework, using this module as a data generators.
Tags: data.random, Generators
2013/03/11
Math fun
Recently I added a few math functions in the core of Gauche, for they were needed to implement some more domain-specific functions.
Just committed are gamma
and lgamma
,
Gamma function and natural logarithm of
absolute value of it.
(What prompted me to add them was the random variable of
Poisson distribution; the algorithm I found needed log(n!)
.
And the reason I was doing Poisson distribution thing was
that I was writing a general random data generator module and
I wanted to cover typical distributions. And the reason that
I was writing such a module was that they were needed for
generative test framework. It's like a domino effect; I'm writing
X, then finds X needs Y, then Y needs Z, etc...)
Some time ago I also added expt-mod
, that calculates modulus
exponentiation.
I used to be worried for adding too much stuff to the core library, and tried to put functions together into a module whenever I could attach a suitable name for the bunch. However, some functions are just fundamental enough that appear in different fields now and then, and the core library may be the suitable place for them, after all.
Comments (0)