2013/05/22
R7RS support
We don't have an official announcement yet, but it seems that R7RS has been ratified. Yay! Many thanks to the WG members for their long and hard work to make it happen.
I couldn't participate in the discussions as much as I did for R6RS, mainly due to time constraints; another reason is that I was generally happy with the drafts, unlike how I felt during R6RS development.
I don't hate R6RS; it has some parts I like (e.g. the I/O system) and I expect them to be in R7RS-large. I just think R6RS was too ambitious; it tried so hard to plug all the loopholes that some of its parts were introduced prematurely, IMHO. R7RS-small isn't perfect, but it fixes some of the biggest shortcomings of R5RS and is "good enough" to move on. I believe that, to fix the remaining defects, it's better to wait for SRFIs to become quasi-standard by being adopted by most active implementations. The standard can come later, merely to codify the de facto, proven ways, as R7RS did for some SRFIs.
* * *
The development HEAD of Gauche already has some R7RS support.
If you invoke gosh as gosh -r7, it starts a REPL with the
R7RS environment.
Currently it implicitly imports all the R7RS-small libraries.
You can also load files containing define-library form.
(The -r7 option only sets up the default behavior; there isn't
a distinct R7RS language mode. You'll be able
to use R7RS libraries from Gauche code, and import
Gauche libraries from R7RS code. Aside from the reader mode
described below, the differences between R7RS and Gauche are
merely namespaces.)
However, it can't quite load portable R7RS libraries yet.
The biggest obstacle is the lexical syntax: the \xNN; style
escape in strings and symbols isn't supported yet, because
of a backward compatibility problem. Gauche has been using
the \xNN style (exactly two hex digits, no semicolon terminator).
It doesn't generally appear in source code (the Unicode escape,
\uNNNN, is preferred), but it does appear in data files written
out by write. Changing it would break existing data files,
which would be a disaster.
There are also a few minor reader incompatibilities. For example,
Gauche treats a
single quote as a delimiter, so abc'def is parsed as a symbol
abc followed by a list (quote def). In R7RS, this is a reader error.
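Here's a toy Python sketch of the two behaviors for abc'def. It is not Gauche's reader: read_all, read_datum and the list encoding ["quote", x] for (quote x) are made up for illustration, and it only understands symbols, whitespace and the quote character.

```python
def skip_space(src, pos):
    """Advance pos past whitespace."""
    while pos < len(src) and src[pos].isspace():
        pos += 1
    return pos


def read_datum(src, pos, mode):
    """Read one symbol or quoted datum; return (datum, new_pos).

    mode "gauche": a quote ends a symbol, as described above.
    mode "r7rs":   a quote inside a symbol is a reader error.
    """
    pos = skip_space(src, pos)
    if pos >= len(src):
        raise SyntaxError("unexpected end of input")
    if src[pos] == "'":
        datum, pos = read_datum(src, pos + 1, mode)
        return ["quote", datum], pos
    start = pos
    while pos < len(src) and not src[pos].isspace():
        if src[pos] == "'":
            if mode == "gauche":
                break  # the quote acts as a delimiter
            raise SyntaxError(f"quote inside a symbol at position {pos}")
        pos += 1
    return src[start:pos], pos


def read_all(src, mode):
    """Read every datum in src."""
    data, pos = [], skip_space(src, 0)
    while pos < len(src):
        datum, pos = read_datum(src, pos, mode)
        data.append(datum)
        pos = skip_space(src, pos)
    return data


print(read_all("abc'def", "gauche"))  # ['abc', ['quote', 'def']]
```

Calling read_all("abc'def", "r7rs") raises SyntaxError instead.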
My plan is to provide a few reader modes:
- Legacy Gauche: completely backward compatible
- r7rs-compatible: accepts both formats, preferring R7RS when ambiguous
- r7rs-strict: rejects syntax that doesn't comply with R7RS
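A toy Python sketch of how the three modes might treat \x escapes in a string body. The mode names loosely follow the list above, and the rule used for "ambiguous" (take the R7RS reading whenever \x<hex digits>; matches) is my assumption, not a description of Gauche's reader. The first example shows why even a compatible mode changes the meaning of data written in the legacy style.

```python
import re

# \x followed by hex digits and a terminating semicolon (R7RS style)
R7RS_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]+);")
# \x followed by exactly two hex digits (legacy Gauche style)
LEGACY_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})")


def decode_hex_escapes(body, mode):
    """Decode \\x escapes in a string body under one of three modes.

    Only \\x escapes are handled; every other character is copied as-is.
    """
    out, i = [], 0
    while i < len(body):
        if not body.startswith("\\x", i):
            out.append(body[i])
            i += 1
            continue
        r7rs = R7RS_ESCAPE.match(body, i)
        legacy = LEGACY_ESCAPE.match(body, i)
        if mode == "legacy":
            m = legacy
        elif mode == "r7rs-strict":
            m = r7rs
        elif mode == "r7rs-compatible":
            m = r7rs or legacy  # prefer R7RS when both readings apply
        else:
            raise ValueError(f"unknown mode: {mode}")
        if m is None:
            raise SyntaxError(f"bad \\x escape at {i} in {mode} mode")
        out.append(chr(int(m.group(1), 16)))
        i = m.end()
    return "".join(out)


legacy_data = r"\x01;"  # written by legacy write: U+0001 followed by ';'
print(len(decode_hex_escapes(legacy_data, "legacy")))           # 2
print(len(decode_hex_escapes(legacy_data, "r7rs-compatible")))  # 1
print(decode_hex_escapes(r"\x41B", "r7rs-compatible"))          # AB
```

In r7rs-strict mode, the legacy-only form \x41B raises SyntaxError because there is no terminating semicolon.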
There are also a small number of unsupported library procedures and syntaxes, which I'm implementing gradually in my spare time. See lib/r7rs.scm to check what isn't supported yet.
The high-level macro system also needs to be enhanced to comply with R7RS.
Internal define-syntax isn't supported yet.
* * *
The R7RS import form works differently from Gauche's import.
Gauche's import works purely on in-memory module objects and doesn't involve
loading files. R7RS import is more similar to Gauche's use,
which can be described as require plus (Gauche's) import.
I pondered a few options for some time: Overload the import
form with dual functionality? Change Gauche's import so that
it works like R7RS import? In the end I decided to implement
completely separate forms.
Gauche's import is mostly used in the define-module form,
which isn't R7RS, so I don't expect much confusion. We can
always rename Gauche's import to something like import-module
in the future.
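As a toy Python model (all names are hypothetical; Gauche's real loader is more involved), the difference looks like this: Gauche's import only links a module object that already exists in memory, while use, and likewise R7RS import, first requires the file that defines it. The module-name-to-path mapping (foo.bar to foo/bar.scm) is an assumption of the sketch.

```python
class Module:
    def __init__(self, name):
        self.name = name
        self.imports = []  # modules whose exported bindings are visible here


registry = {}         # module name -> Module: everything already in memory
loaded_files = set()  # files that require has already loaded


def load_file(path):
    """Stand-in for loading a file; pretend foo/bar.scm defines foo.bar."""
    name = path[: -len(".scm")].replace("/", ".")
    registry[name] = Module(name)


def require(path):
    """Load a file at most once."""
    if path not in loaded_files:
        load_file(path)
        loaded_files.add(path)


def gauche_import(current, name):
    """Gauche's import: link an in-memory module; never loads files."""
    if name not in registry:
        raise LookupError(f"no such module: {name}")
    if registry[name] not in current.imports:
        current.imports.append(registry[name])


def use(current, name):
    """use = require + import, which is roughly what R7RS import does."""
    require(name.replace(".", "/") + ".scm")
    gauche_import(current, name)
```

With a fresh user module, gauche_import(user, "foo.bar") raises LookupError because nothing has loaded foo.bar yet, while use(user, "foo.bar") loads the file first and then succeeds.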
2013/05/09
Export-time renaming
Recently I implemented the rename feature in the export form.
Together with the import options (see Import options: part one and Import options: part two),
this completes the infrastructure to support R[67]RS modules
on top of our module system.
The syntax of export-time renaming is the same as in R[67]RS. If you have the following module:
(define-module example1
  (export (rename foo boo))
  (define foo 3))
Then the name foo in the example1 module can be
referred to as boo from modules that import it.
gosh> (import example1)
#<undef>
gosh> boo
3
* * *
Along the way,
I changed the ScmModule structure to manage exported symbols.
A module is a map from names (symbols) to locations (global locations, or GLOCs). Essentially, it's just a hash table. Visibility (whether a name can be seen from other modules that import this module) is auxiliary information.
Initially, ScmModule had a list of exported symbols. When an identifier was looked up, we scanned the export list of each imported module, and if we found a match, we looked up the hash table to get the corresponding GLOC.
Obviously this didn't scale as export lists got longer. So several years ago I switched to putting a flag in each binding to indicate whether the symbol was exported. Then I only needed to look up a hash table and check a flag. But what is a binding? Conceptually, it is the association of a name and a GLOC, which is a hash table entry in our implementation. There's no room to add a flag to a hash table entry itself, so I put the exported flag in the GLOC.
There I stepped into a shady area. If a GLOC can be shared among different modules, it might be exported in one module but not in another. We didn't have such cases in the good old days, but the import options introduced GLOC sharing. It hasn't been a problem so far, since import options operate only on exported symbols. Still, this kind of hack smells, and it may bite back down the road.
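To make the shady area concrete, here's a toy Python model of the flag-on-GLOC design (the class and function names are hypothetical, not Gauche's C structures). Lookup is one hash-table access plus a flag check per imported module, but once two modules share a GLOC, exporting it from one makes it visible through the other as well, which the design has no way to express. The sharing below is set up by hand; as noted above, real import options only share exported GLOCs.

```python
class Gloc:
    """A global location; in this design it also carries the export flag."""
    def __init__(self, value):
        self.value = value
        self.exported = False


class Module:
    def __init__(self, name):
        self.name = name
        self.table = {}  # symbol -> Gloc

    def define(self, sym, value):
        self.table[sym] = Gloc(value)

    def export(self, sym):
        self.table[sym].exported = True  # the flag lives on the GLOC


def find_binding(imported_modules, sym):
    """One hash lookup and one flag check per imported module."""
    for mod in imported_modules:
        gloc = mod.table.get(sym)
        if gloc is not None and gloc.exported:
            return gloc
    return None


a = Module("a")
a.define("x", 42)
b = Module("b")
b.table["x"] = a.table["x"]  # b shares a's GLOC for x

a.export("x")
# b never exported x, but the shared flag makes it visible through b.
print(find_binding([b], "x") is not None)  # True
```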
Then came export-time renaming. A GLOC can now have multiple names, and we need to choose which name to look up depending on whether we're searching exported symbols or not. A straightforward way is to have two tables: one for exported names and another for internal names.
And if we have a separate table for exported names, the mere fact that a name is registered in that table means it is exported, so we don't need an extra flag in GLOC. Yay!
So the flag is removed from GLOC, and a new table for exported names
is added to ScmModule. The module-exports introspection API
still returns a list of exported names for backward compatibility,
but now it builds the result list from the exported-name table
every time it is called.
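A minimal Python sketch of the two-table layout, assuming a plain dictionary for each table (the names are hypothetical; the real ScmModule is a C structure). A renamed export is just a second name pointing at the same GLOC, and being present in the exported table is what "exported" means.

```python
class Gloc:
    """A global location; no export flag is needed any more."""
    def __init__(self, value):
        self.value = value


class Module:
    def __init__(self, name):
        self.name = name
        self.internal = {}  # internal name -> Gloc
        self.exported = {}  # exported (possibly renamed) name -> Gloc

    def define(self, sym, value):
        self.internal[sym] = Gloc(value)

    def export(self, sym, as_name=None):
        """(export sym), or (export (rename sym as_name)) if as_name is given."""
        external = sym if as_name is None else as_name
        self.exported[external] = self.internal[sym]

    def lookup(self, sym, exported_only):
        """Search the exported-name table or the internal one."""
        table = self.exported if exported_only else self.internal
        gloc = table.get(sym)
        return None if gloc is None else gloc.value

    def module_exports(self):
        """Rebuilt from the exported-name table on every call."""
        return list(self.exported)


m = Module("example1")
m.define("foo", 3)
m.export("foo", "boo")
print(m.lookup("boo", exported_only=True))  # 3
print(m.lookup("foo", exported_only=True))  # None
print(m.module_exports())                   # ['boo']
```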
There's one caveat, caused by the openness of Gauche modules.
In Gauche, a programmer can export symbols of existing modules
at any time, using with-module. (During development, I sometimes
do (with-module foo.internal (export-all)) so that I can
call internal procedures of foo.internal easily.)
This wasn't a problem before, since exporting an already exported
symbol was just a no-op.
With export-time renaming, that's no longer true. An internal symbol
foo may have been exported as boo, and now it can be
exported again as voo. How should this situation be handled?
- Should the previous export be removed? I decided not. It's costly
to check whether the internal symbol foo has already been exported under another name. (We could keep a reverse map, but it seems like unnecessary complexity.) Plus, any code that counts on the external name boo may break.
- What if another symbol has already been exported as voo? This would also break code that counts on the previous voo, but the operation may be intentional (e.g. hot-patching). I assume such a case shouldn't happen in normal circumstances, but it may be needed in an emergency. So I issue a warning but allow the external name voo to be updated to point to foo.
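The two decisions above can be sketched in Python roughly as follows. This is a toy model, not Gauche's code: a one-slot list stands in for a GLOC, and Python's warnings module stands in for Gauche's warning. Earlier exports of the same binding are left alone, and a clash with a different binding warns but lets the new export win.

```python
import warnings


class Module:
    def __init__(self, name):
        self.name = name
        self.internal = {}  # internal name -> GLOC
        self.exported = {}  # exported name -> GLOC

    def define(self, sym, value):
        self.internal[sym] = [value]  # a one-slot list stands in for a GLOC

    def export(self, sym, as_name=None):
        external = sym if as_name is None else as_name
        gloc = self.internal[sym]
        previous = self.exported.get(external)
        if previous is not None and previous is not gloc:
            # Another binding already uses this external name: warn,
            # but let the new export win (e.g. for hot-patching).
            warnings.warn(f"{self.name}: exported name {external} "
                          f"now refers to {sym}")
        # Earlier exports of the same GLOC under other names are kept;
        # finding them would need a reverse map.
        self.exported[external] = gloc


m = Module("foo.internal")
m.define("foo", 1)
m.define("bar", 2)
m.export("foo", "boo")
m.export("bar", "voo")
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    m.export("foo", "voo")  # voo already names bar: warn, then repoint
print(len(caught))          # 1
print(sorted(m.exported))   # ['boo', 'voo']
```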
* * *
I've already implemented the R7RS module system on top of this, and I'll describe it in the next entry.
2013/03/19
When the inexact square root of an integer is exact
In the Exact sqrt entry, I wrote that, for a natural number smaller than 2^53, we could use the double-precision floating-point sqrt and treat the answer as exact if the result was an integer. As Mark H Weaver pointed out, that was wrong.
Suppose we have nonnegative integers N, M, m, and a non-zero real number e, where M = m^2 < 2^53 and N = (m*(1+e))^2 < 2^53.
If the absolute value of e isn't greater than 2^-53,
the square root of N, which is m*(1+e), can be rounded to
m in the double-precision sqrt calculation.
The result is an integer, but it isn't exact. That's the case
we want to exclude.
Since N = M(1+e)^2, N - M = (2e+e^2)M. The maximum of (2e+e^2) is (2^-52 + 2^-106), at e = 2^-53, so once M reaches 2^52, N - M can exceed 1, and we might incorrectly recognize N as the square of m.
The greatest square number below 2^52 is (2^26-1)^2 = 4503599493152769, and (2^-52 + 2^-106)*4503599493152769 is 81129635996755064984528658366465/81129638414606681695789005144064, which is smaller than 1.
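The arithmetic in the last two paragraphs can be double-checked with exact rationals. This is a small check in Python using the standard fractions module; it only re-derives the numbers above.

```python
from fractions import Fraction

e = Fraction(1, 2**53)          # the largest e we need to consider
factor = 2 * e + e * e          # N - M = (2e + e^2) M
assert factor == Fraction(1, 2**52) + Fraction(1, 2**106)

M = (2**26 - 1) ** 2            # greatest square below 2^52
print(M)                        # 4503599493152769
print(factor * M)               # 81129635996755064984528658366465/81129638414606681695789005144064
print(factor * M < 1)           # True: N - M stays below 1 for squares below 2^52
print(factor * 2**52 > 1)       # True: at M = 2^52 the bound exceeds 1
```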
2^52 = 4503599627370496 is also a square number, and
calculating (sqrt (inexact (+ (expt 2 52) 1))) yields
the rounded integer 67108864.0.
But the lower bound, (-2^-52 + 2^-106)*4503599627370496 (at e = -2^-53), is
-18014398509481983/18014398509481984, which is greater than -1,
so applying inexact sqrt to 2^52-1 won't give us the rounded
integer. (Indeed, (sqrt (inexact (- (expt 2 52) 1))) yields
67108863.99999999.)
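The same boundary behavior can be reproduced in Python, whose math.sqrt is the correctly rounded IEEE double sqrt on common platforms, so it should agree with the Gauche results above. The last lines recompute the lower bound exactly.

```python
import math
from fractions import Fraction

m = 2**26
above = math.sqrt(2**52 + 1)  # the int converts to a double exactly (< 2^53)
below = math.sqrt(2**52 - 1)
print(above == m)   # True: rounded to an integer, yet 2^52 + 1 isn't a square
print(below == m)   # False
print(below)        # 67108863.99999999

e = Fraction(-1, 2**53)
print((2 * e + e * e) * 2**52)  # -18014398509481983/18014398509481984
```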
So we can use the inexact square root when the input is smaller than 2^52.
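A Python sketch of putting that conclusion to work: an exact-when-possible sqrt for nonnegative integers that trusts the floating-point result below 2^52 and falls back to exact integer arithmetic (math.isqrt, Python 3.8+) above it. The function name is hypothetical, and this isn't Gauche's implementation.

```python
import math

FLONUM_LIMIT = 2**52  # below this, an integral flonum sqrt means a perfect square


def sqrt_exact_if_square(n):
    """Return sqrt(n) as an int if n is a perfect square, else as a float."""
    if n < 0:
        raise ValueError("n must be nonnegative")
    if n < FLONUM_LIMIT:
        r = math.sqrt(n)  # n converts to a double exactly
        return int(r) if r.is_integer() else r
    s = math.isqrt(n)     # exact integer square root for larger n
    # Note: math.sqrt raises OverflowError for n beyond the double range.
    return s if s * s == n else math.sqrt(n)


print(sqrt_exact_if_square(16))         # 4
print(sqrt_exact_if_square(2**52 + 1))  # 67108864.0
```

The second call shows why the cutoff matters: 2^52 + 1 isn't a square, so the answer stays inexact even though its float value looks like an integer.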
Tags: Flonums, sqrt, exact-integer-sqrt
2013/03/16
Checking scripts
I write small scripts in Gauche all the time. They're not so 'serious' as to require unit tests, configure scripts and other scaffolding, but they're not one-time throwaway scripts, either. So it's annoying to discover a silly bug, like a misspelled name, in a rarely executed code path later.
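Here's a command I run to quickly check such scripts.
gosh -ugauche.test -l script-name -E "test-module 'user" -Eexit
The -u option uses the gauche.test module, -l loads the script, and -E evaluates its argument as if it were surrounded by parentheses, so this runs (test-module 'user) followed by (exit). test-module reports global variables that are referenced in the module but never defined, which catches misspelled names even in code paths that have never been executed.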
Tag: Script
