2013/05/09

Export-time renaming

Recently I implemented rename feature in the export form. With the import options (see Import options: part one and Import options: part two), this completes the infrastructure to support R[67]RS's modules on top of our module system.

The syntax of export-time renaming is the same as R[67]RS. If you have the following module:

(define-module example1
  (export (rename foo boo))
  (define foo 3))

Then the name foo in example1 module can be referred as boo from the modules that imports it.

gosh> (import example1)
#<undef>
gosh> boo
1

＊＊＊

During the course, I changed ScmModule structure to manage the exported symbols.

A module is a map from names (symbols) to locations (global locations, or GLOCs).Essentially it's just a hashtable. Visibility (whether a name can be seen from others that import this module) is an auxiliary information.

Initially ScmModule had a list of exported symbols. When an identifier was looked up, we scanned the exported list of each imported modules, and if we found a match, we looked up the hashtable to get the corresponding GLOC.

Obviously this didn't scale when export list got longer. So several years ago I switched to put a flag in each binding to indicate whether the symbol was exported. Then I only needed to look up a hashtable and check a flag. But what does the binding mean? Conceptually, it is the association of a name and a GLOC, which is a hash table entry in our implementation. There's no place to add a flag in a hash table entry itself. So I made GLOC to have the exported flag.

There I stepped into a shady area. If a GLOC can be shared among different modules, there might be a case that it's exported in one module but not in another. We didn't have such cases in good old days, but the import options introduced GLOC-sharing cases. It wasn't a problem so far, since import options operates only on exported symbols. Yet this kind of hack reeks, and may bites back down the road.

Then it comes to the export renaming. A GLOC can now have multiple names, and we need to choose which name to look up depending on whether we're searching exported symbols or not. A straightforward way is to have two tables, one for exported names, and another for internal names.

And if we have a separate table for exported names, then the mere fact that the name is registered to the table indicates the fact that the name is exported---we don't need an extra flag in GLOC. Yay!

So the flag is removed from GLOC, and a new table for exported names is added to ScmModule. The module-exports introspection API returns a list of exported names for the backward compatibility, but now it calculates the result list from the exported name table every time it is called.

There's one caveat, caused by the openness of Gauche module.

In Gauche, a programmer can export symbols of existing modules at any time, using with-module. (During development, sometimes I do (with-module foo.internal (export-all)) so that I can call internal procedures of foo.internal easily.) This wasn't a problem before, since exporting already exported symbol was just a no-op.

With export-time renaming, it's no longer true. An internal symbol foo may have been exported as boo, but now it can be exported as voo. How should this situation be handled?

Should the previous export be removed? I decided not. It's costly to search if the internal symbol foo has been exported in another name. (we could have a reverse map, but it seems unnecessary complexity.) Plus, any code that counts on the external name boo may break.
What if another symbol has already been exported as voo? This also would break code that counts on the previous voo, but the operation may be intentional (e.g. hot-patching). I assume such case shouldn't happen in normal circumstances, but needed in emergency. So I make a warning issued but allow the meaning of external name voo to be updated to point to foo.

＊＊＊

I already implemented R7RS module system on top of this, and I'll describe it in the next entry.

Tags: Module, export

Comments (0)

2013/03/19

#permalink

When the inexact square root of an integer is exact

In Exact sqrt entry, I wrote that, for a natural number smaller than 2^53, we could use double-precision floating point sqrt and take the answer exact if the result was an integer. As Mark H Weaver pointed out, it was wrong.

Suppose we have nonnegative integers N, M, m, and non-zero real number e, where M = m^2 < 2^53, N = (m*(1+e))^2 < 2^53.

If the absolute value of e isn't greater than 2^-53, square root of N, which is m*(1+e), can be rounded to m in the double-precision sqrt calcluation. The result becomes an integer, but not exact. That's the case we want to exclude.

N - M = (2e+e^2)M. The maximum bound of (2e+e^2) is (2^-52 + 2^-106) when e = 2^-53, so if M is greater than 2^52, N - M can exceed 1 and we might incorrectly recognize N as a square number of m.

The greatest square number below 2^52 is (2^26-1)^2 = 4503599493152769, and (2^-52 + 2^-106)*4503599493152769 is 81129635996755064984528658366465/81129638414606681695789005144064, which is smaller than 1.

2^52 = 4503599627370496 is also a square number, and calculating (sqrt (inexact (+ (expt 2 52) 1))) yields a rounded integer 67108864.0. But the lower bound, (-(2^52) + 2^-106)*4503599627370496 is -18014398509481983/18014398509481984, which is grater than -1, so applying inexact sqrt on 2^52-1 won't lead us to the rounded integer. (Indeed, (sqrt (inexact (- (expt 2 52) 1))) yields 67108863.99999999.)

So we can use inexact square root when the input is smaller than 2^52.

Tags: Flonums, sqrt, exact-integer-sqrt

Comments (0)

2013/03/16

#permalink

Checking scripts

I write small scripts in Gauche, all the time. They're not so 'serious' as to require unit-tests and configure scripts and other scaffolds, but they're not one-time throwaway scripts, either. So it's annoying to find out a silly bug like misspelling in less-executed code paths later.

Here's a command I run to quickly check such scripts.

gosh -ugauche.test -l script-name -E "test-module 'user" -Eexit

Tag: Script

Comments (0)

2013/03/12

#permalink

And here comes random data generators

I just checked in data.random, a collection of random data generators and their combinators. The names of API functions are not yet fixed, but I think the overall it's in a good shape. (Since 0.9.4 is overdue, I might be going to release it without making data.random official. I'm not sure yet.)

Here's the code: http://gauche.git.sourceforge.net/git/gitweb.cgi?p=gauche/Gauche;a=blob;f=lib/data/random.scm;hb=HEAD

It provides a bunch of primitive random generators such as followings.

uniform distribution
- (integer size :optional (start 0)) returns a generator that produces random integer between start and start+size-1, uniformly.
- (integer-between lo hi) returns a generator that produces random integer between lo and hi (both inclusive).
- int8, uint8 etc. are preset generators to produce the range their name suggest.
- (char :optional cset) returns a generator of random characters from a character set. When omitted, we use #[A-Za-z0-9] as the default character set.
- We also have boolean, real, real-between.
- We want to have exact rational generators and complex generators, but I wonder how the range and distribution should be specified.
nonuniform distribution
- For discrete sampling, we have geometric and poisson distribution.
- For continuous sampling, we have normal and exponential distribution.

Then, those generators can be combined to make more complex generators.

random choice
- (one-of generators) returns a generator that picks one generator in generators randomly to produce the next value.
- (weighted-sample weight&generators) allows you to specify weight of selection probability for each generators.
aggregate data
- (pair-of gen1 gen2), (tuple-of gen ...)
- list-of, vector-of, string-of - these combinators can be called in two different forms, e.g.
  - (list-of sizer item-gen): sizer can be an integer, or an integer-generator, to give the length of the resulting list. item-gen is a generator to produce elements.
  - (list-of item-gen): If sizer is omitted, we use some default generator to determine the length of the resulting list. Currently I use (poisson 4) provisionally.

I also have permutation-of and combination-of, which takes a list of items (not item generators).

What I like about the current shape is that those generators can be combined using gauche.generator framework as well; e.g. you can have series of sum of two dice rolling by:

    (gmap + (integer-between 1 6) (integer-between 1 6))

or apply a filter:

    (gfilter (cut < 0 <> 1) (exponential 1))

or taking some values into a list:

    (generator->list (poisson 5) 10)

Here are some elements about API I'm still pondering about:

We have procedures that creates a generator (e.g. integer, real, char) and pre-created generators (e.g. fixnum, int8). Without the static typing support, this kind of layers could be confusing. Shall we use some naming convention to distinguish these two layers?
There's an idea rolling in my head to provide plural names as an alias, e.g. chars for char. It plays nicely with the combinators, e.g. (list-of fixnums) or (string-of 5 (chars)). But I also feel this is just a superficial convenience; we double the number of exported names to get nothing added functionally.
The handling of omitted argument of list-of etc. is also different from Gauche's convention of optional arugments.

If you have data generator ideas to be thrown in to this module, let me know.

Now I'm writing a generative test framework, using this module as a data generators.

Tags: data.random, Generators

Comments (0)

2013/03/11

#permalink

Math fun

Recently I added a few math functions in the core of Gauche, for they were needed to implement some more domain-specific functions.

Just committed are gamma and lgamma, Gamma function and natural logarithm of absolute value of it.

(What prompted me to add them was the random variable of Poisson distribution; the algorithm I found needed log(n!). And the reason I was doing Poisson distribution thing was that I was writing a general random data generator module and I wanted to cover typical distributions. And the reason that I was writing such a module was that they were needed for generative test framework. It's like a domino effect; I'm writing X, then finds X needs Y, then Y needs Z, etc...)

Some time ago I also added expt-mod, that calculates modulus exponentiation.

I used to be worried for adding too much stuff to the core library, and tried to put functions together into a module whenever I could attach a suitable name for the bunch. However, some functions are just fundamental enough that appear in different fields now and then, and the core library may be the suitable place for them, after all.

Tags: gamma, expt-mod

Comments (0)

More entries ...

Gauche Devlog

About

Author

Recent Entries

Recent Comments

Recent Releases

2013/05/09

Export-time renaming

2013/03/19

When the inexact square root of an integer is exact

2013/03/16

Checking scripts

2013/03/12

And here comes random data generators

2013/03/11

Math fun