2013/09/18
Macro system extension
I finally added the syntax-rules extensions (SRFI-46) to Gauche, which makes Gauche's hygienic macro system compatible with R7RS (except for a few known bugs).
The current hygienic macro expander is written in C and is an ugly pile of spaghetti. Originally I planned to ditch the legacy code and write an explicit-renaming macro expander as the new basis of our hygienic macro system, then implement syntax-rules on top of it.
I like ER-macro because it is transparent about what it does for hygiene. It isn't necessarily the easiest one to use: destructuring the input form and then renaming identifiers explicitly would be cumbersome for day-to-day programming. But those things can easily be alleviated by combining other tools. For example, we can just use the util.match matcher to destructure the input form (instead of yet another pattern matcher tied to the macro system).
In fact, I implemented an ER-macro expander to some extent in the er-macro branch of the repo. But it turned out I need more time to replace the low-level macro layer completely.
A major issue is keeping compatibility between ER-macro, which allows raw symbols inserted by the macro expander to capture symbols in macro calls, and the current syntax-rules implementation, which turns all symbols into identifiers. (mjt describes the same issue here, in Japanese.)
Since I'd like to push out an R7RS-compatible release soon, I just went into the legacy code and added some more spaghetti to make it support SRFI-46.
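For readers who haven't met SRFI-46, here is a small sketch of its two extensions, both of which R7RS adopted: a tail pattern after an ellipsis, and a custom ellipsis identifier. The macro names last-of and list-of are made up for illustration; this is portable R7RS, not code from Gauche's expander.

```scheme
(import (scheme base))

;; 1. Tail patterns: elements may follow the ellipsis in a pattern.
;;    `x ...` takes all but the last argument; `y` binds the last one.
(define-syntax last-of
  (syntax-rules ()
    ((_ x ... y) 'y)))

;; 2. Custom ellipsis: the identifier placed before the literals list
;;    (`etc` here) takes the role of `...`. This mainly matters for
;;    macro-defining macros, where the inner macro's ellipsis must not be
;;    consumed by the outer one.
(define-syntax list-of
  (syntax-rules etc ()
    ((_ x etc) (list x etc))))

(last-of a b c)  ; => c
(list-of 1 2 3)  ; => (1 2 3)
```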
* * *
I realized this enhancement makes syntax-rules a lot more useful. I also adapted the define-values form to R7RS, which allows generic formals, as follows:
(define-values (x y . z) (values 1 2 3 4))
z => (3 4)
With R7RS syntax-rules it's not difficult to distinguish proper and improper lists (see Gauche:lib/gauche/defvalues.scm).
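To illustrate that point, here is a toy macro (not the actual code in lib/gauche/defvalues.scm) that uses R7RS syntax-rules patterns to tell a proper formals list from a dotted one. The name formals-kind is made up for this example.

```scheme
(import (scheme base))

;; Classify a formals list at expansion time.
;; Rule order matters: a proper list would also match the second pattern
;; (with `rest` bound to ()), so the proper-list rule must come first.
(define-syntax formals-kind
  (syntax-rules ()
    ((_ (v ...)) 'proper)              ; e.g. (x y)
    ((_ (v ... . rest)) 'improper)))   ; e.g. (x y . z)

(formals-kind (x y))      ; => proper
(formals-kind (x y . z))  ; => improper
```

A define-values implementation can dispatch the same way, binding the dotted tail to the list of remaining values.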
2013/08/01
.gaucherc
When gosh is started in interactive REPL mode, it loads ~/.gaucherc if it exists. I suppose it may be handy if users need their own local setup, even though I personally haven't used the rc file yet; I guess it's part of traditional Unix culture.
Recently I realized this feature interferes with R7RS mode. The .gaucherc file is loaded into #<module user>, but what's visible from the user module differs greatly when gosh is invoked with the -r7 option. It'd be quite difficult to write a .gaucherc that works both in traditional Gauche mode and in R7RS mode.
(Note: I say R7RS mode and Gauche mode, but it's not that there are two separate modes, except for the planned reader compatibility modes. You can load an R7RS library from a standard Gauche program and a Gauche library from a standard R7RS program, no matter whether you start gosh with the -r7 option or not. The -r7 option merely specifies which environment you're in when the interactive REPL starts.)
I considered a few options:
- If the -r7 option is given, try to load a different rc file, e.g. ~/.gaucherc-r7. This option is less appealing: it scatters more rc files in the home directory. Besides, I expect the things you want to do in an rc file are likely to need Gauche-specific features (e.g. add-load-path), and you can't access those easily from the R7RS environment. You would need to create a separate module for the setup code, e.g. mysetup.scm, then (import (mysetup)) from .gaucherc-r7.
- Load the rc file into a module other than user, say, a gauche.user module. Then you can use Gauche features in .gaucherc regardless of the -r7 option. This is clean, but adding a new module just for the rc file seems a bit overkill. Besides, it is incompatible with the current version if a user defines something in .gaucherc and expects it to be visible from the user module.
- Drop .gaucherc support. This is a tempting solution, for it makes things simpler. But who knows? Sometimes this kind of hook comes in handy unexpectedly.
Eventually I settled on a somewhat compromised design.
- We load .gaucherc into #<module user>, as we have been doing.
- When gosh is started with the -r7 option, the initial module will be #<module r7rs.user>, not #<module user>.
It looks like a somewhat ad-hoc solution, but let's give it a shot.
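As an illustration of this design, here is what a minimal .gaucherc might look like. The directory and the helper my-helper are hypothetical placeholders, and the comments reflect our reading of the design rather than documented behavior: because the file is still loaded into #<module user>, Gauche-specific forms such as add-load-path remain usable even under gosh -r7, while definitions made here live in user and presumably aren't visible from the r7rs.user REPL.

```scheme
;; ~/.gaucherc : a hypothetical example.
;; gosh loads this file into #<module user> when it starts an interactive REPL,
;; so Gauche-specific forms work here even when gosh is started with -r7.

;; Add a personal library directory (placeholder path) to the load path.
(add-load-path "/home/me/scheme/lib")

;; A hypothetical helper. It is defined in #<module user>; with -r7 the REPL
;; starts in #<module r7rs.user>, so it presumably isn't visible there directly.
(define (my-helper) 'hello)
```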
2013/05/22
R7RS support
We don't have an official announcement yet, but it seems that R7RS has been ratified. Yay! Great thanks to the WG members for their long, hard work to make it happen.
I couldn't participate in the discussions as much as I did for R6RS, mainly due to time constraints; another reason is that I was generally happy with the drafts, unlike what I felt during R6RS development.
I don't hate R6RS; it has some parts I like (e.g. the I/O system), and I expect them to be in R7RS-large. I just think R6RS was too ambitious; it tried so hard to plug all the loopholes that some of its parts were introduced prematurely, IMHO. R7RS-small isn't perfect, but it fixes some of the biggest shortcomings of R5RS and is "good enough" to move on. I believe that, in order to fix the remaining defects, it's better to wait for quasi-standard SRFIs that are adopted by most active implementations. The standard can come later, merely to codify the de facto, proven ways, as R7RS did for some SRFIs.
* * *
The development HEAD of Gauche already has some R7RS support.
If you invoke gosh as gosh -r7, it starts the REPL in the R7RS environment. Currently it implicitly imports all the R7RS-small libraries.
You can also load files containing define-library forms.
(The -r7 option only sets up the default behavior; there isn't a distinct R7RS language mode. You'll be able to use R7RS libraries from Gauche code, and import Gauche libraries from R7RS code. Aside from the reader mode described below, the difference between R7RS and Gauche is merely a matter of namespaces.)
However, it's not quite ready yet to load portable R7RS libraries. The biggest obstacle is the lexical syntax: the \xNN; style of escaping in strings and symbols is not supported yet, because of a backward compatibility problem. Gauche has been using the \xNN style (exactly two digits, no semicolon terminator). It doesn't generally appear in source code (the Unicode escape, \uNNNN, is preferred), but it appears in data files dumped by write. Changing it would break existing data files, which would be a disaster.
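For reference, here is the R7RS escape syntax in question, as a small sketch. It assumes a reader that accepts R7RS lexical syntax (which, per the paragraph above, Gauche's reader did not yet do at the time of writing); the comments contrast it with the legacy two-digit form.

```scheme
(import (scheme base))

;; R7RS hex escape: \x, one or more hex digits, then a terminating semicolon.
(define r7rs-style "\x41;BC")   ; reads as "ABC"
(define multi-digit "\x3bb;")   ; a single character, U+03BB (Greek small lambda)

;; Under the legacy Gauche rule (exactly two hex digits, no terminator),
;; "\x41;BC" would instead read as "A;BC", and a legacy escape such as "\x01"
;; lacks the semicolon an R7RS reader requires. That is the compatibility clash.
```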
There are also a few minor reader incompatibilities. For example, Gauche treats a single quote as a delimiter, so abc'def is parsed as the symbol abc followed by the list (quote def). In R7RS, this is a reader error.
My plan is to provide a few reader modes:
- Legacy Gauche: Completely backward compatible
- r7rs-compatible: Accepts both formats, preferring R7RS when ambiguous
- r7rs-strict: Rejects syntax that doesn't comply with R7RS
There are also a small number of unsupported library functions and syntaxes, which I'm implementing gradually in my spare time. See lib/r7rs.scm to check what isn't supported yet.
The high-level macro system also needs to be enhanced to comply with R7RS. Internal define-syntax is yet to be supported.
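To make "internal define-syntax" concrete, here is a short portable R7RS sketch: a macro defined at the head of a procedure body, visible only inside that body. The names swap-pair and swap! are illustrative.

```scheme
(import (scheme base))

;; `swap!` is defined with define-syntax at the start of a procedure body,
;; so its scope is limited to `swap-pair`.
(define (swap-pair p)
  (define-syntax swap!
    (syntax-rules ()
      ((_ a b) (let ((tmp a)) (set! a b) (set! b tmp)))))
  (let ((x (car p))
        (y (cdr p)))
    (swap! x y)          ; hygiene keeps the macro's `tmp` from clashing
    (cons x y)))

(swap-pair (cons 1 2))  ; => (2 . 1)
```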
* * *
The R7RS import form works differently from Gauche's import. Gauche's version works purely on in-memory module objects and doesn't involve loading files. R7RS import is rather similar to Gauche's use, which can be explained as require plus (Gauche's) import.
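To make the comparison concrete, here is a small Gauche sketch. The two-step expansion shown in comments follows the description above (require, then Gauche's import); the R7RS line assumes Gauche maps the library name (util match) to the module util.match, so treat it as illustrative. The helper sum-of-two is made up for the example.

```scheme
;; Gauche: load util/match.scm if it hasn't been loaded yet, then make the
;; bindings exported by module util.match visible in the current module.
(use util.match)

;; Roughly the same as the two-step form:
;;   (require "util/match")  ; load the file once, searching *load-path*
;;   (import util.match)     ; Gauche's import: an in-memory module operation
;;
;; R7RS style, where import itself locates and loads the library
;; (assuming the name (util match) maps to module util.match):
;;   (import (util match))

;; Use something util.match exports, to show the import took effect.
(define (sum-of-two lst)
  (match lst
    ((a b) (+ a b))   ; exactly two elements
    (_ #f)))          ; anything else
```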
I pondered a few options for some time: Overload the import form with dual functionality? Change Gauche's import so that it works like R7RS import? Finally I decided to implement completely separate forms.
Gauche's import is mostly used in define-module forms, which aren't R7RS, so I don't expect much confusion. We can always rename Gauche's import to something like import-module in the future.
2013/05/09
Export-time renaming
Recently I implemented the rename feature in the export form. Together with the import options (see Import options: part one and Import options: part two), this completes the infrastructure to support R[67]RS modules on top of our module system.
The syntax of export-time renaming is the same as in R[67]RS. Suppose you have the following module:
(define-module example1
  (export (rename foo boo))
  (define foo 3))
Then the name foo in the example1 module can be referred to as boo from modules that import it.
gosh> (import example1)
#<undef>
gosh> boo
3
* * *
In the course of this, I changed how the ScmModule structure manages exported symbols.
A module is a map from names (symbols) to locations (global locations, or GLOCs). Essentially it's just a hashtable. Visibility (whether a name can be seen from other modules that import this module) is auxiliary information.
Initially ScmModule had a list of exported symbols. When an identifier was looked up, we scanned the export list of each imported module, and if we found a match, we looked up the hashtable to get the corresponding GLOC.
Obviously this didn't scale as export lists got longer. So several years ago I switched to putting a flag in each binding to indicate whether the symbol was exported. Then I only needed to look up a hashtable and check a flag. But what does the binding mean? Conceptually, it is the association of a name and a GLOC, which is a hash table entry in our implementation. There's no place to add a flag in a hash table entry itself. So I put the exported flag in the GLOC.
There I stepped into a shady area. If a GLOC can be shared among different modules, there might be a case where it's exported in one module but not in another. We didn't have such cases in the good old days, but the import options introduced GLOC-sharing cases. It hasn't been a problem so far, since import options operate only on exported symbols. Yet this kind of hack smells, and it may bite back down the road.
Then came export renaming. A GLOC can now have multiple names, and we need to choose which name to look up depending on whether we're searching exported symbols or not. A straightforward way is to have two tables, one for exported names and another for internal names.
And if we have a separate table for exported names, then the mere fact that a name is registered in the table indicates that the name is exported; we don't need an extra flag in GLOC. Yay!
So the flag is removed from GLOC, and a new table for exported names is added to ScmModule. The module-exports introspection API still returns a list of exported names for backward compatibility, but now it calculates the result list from the exported name table every time it is called.
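To make the new layout concrete, here is a toy model in portable R7RS Scheme. It is not Gauche's C code: GLOCs are modelled as mutable records, the two tables as association lists, and every name (toy-gloc, toy-module, toy-export! and so on) is hypothetical. What it shows is that presence in the exported table is itself the export flag, and that one GLOC can sit under an internal name and a different external name.

```scheme
(import (scheme base))

;; Hypothetical stand-in for a GLOC: a mutable cell holding a global value.
(define-record-type toy-gloc
  (make-toy-gloc value)
  toy-gloc?
  (value toy-gloc-value set-toy-gloc-value!))

;; Hypothetical stand-in for ScmModule with two tables (alists here):
;;   internal: internal name -> GLOC
;;   exported: external name -> GLOC
;; A name counts as exported iff it appears in `exported`; GLOCs carry no flag.
(define-record-type toy-module
  (make-toy-module internal exported)
  toy-module?
  (internal toy-module-internal set-toy-module-internal!)
  (exported toy-module-exported set-toy-module-exported!))

(define (new-toy-module) (make-toy-module '() '()))

;; Like (define name value) evaluated inside module M.
(define (toy-define! m name value)
  (let ((entry (assq name (toy-module-internal m))))
    (if entry
        (set-toy-gloc-value! (cdr entry) value)
        (set-toy-module-internal!
         m (cons (cons name (make-toy-gloc value))
                 (toy-module-internal m))))))

;; Like (export (rename name external)): register the same GLOC under EXTERNAL.
;; Conflicting re-exports are not handled in this sketch.
(define (toy-export! m name external)
  (let ((entry (assq name (toy-module-internal m))))
    (unless entry (error "no such binding" name))
    (set-toy-module-exported!
     m (cons (cons external (cdr entry)) (toy-module-exported m)))))

;; What an importing module sees: only the exported table is consulted.
;; Returns #f if EXTERNAL is not exported.
(define (toy-lookup-exported m external)
  (let ((entry (assq external (toy-module-exported m))))
    (and entry (toy-gloc-value (cdr entry)))))

;; Like module-exports: rebuilt from the exported table on every call.
(define (toy-module-exports m)
  (map car (toy-module-exported m)))

;; The example1 module from above: foo is defined as 3 and exported as boo.
(define example1 (new-toy-module))
(toy-define! example1 'foo 3)
(toy-export! example1 'foo 'boo)
(toy-lookup-exported example1 'boo)  ; => 3
(toy-lookup-exported example1 'foo)  ; => #f, the internal name is not exported
(toy-module-exports example1)        ; => (boo)
```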
There's one caveat, caused by the openness of Gauche modules. In Gauche, a programmer can export symbols of existing modules at any time, using with-module. (During development, I sometimes do (with-module foo.internal (export-all)) so that I can call internal procedures of foo.internal easily.)
This wasn't a problem before, since exporting an already exported symbol was just a no-op. With export-time renaming, that's no longer true. An internal symbol foo may have been exported as boo, but now it can be exported as voo. How should this situation be handled?
- Should the previous export be removed? I decided not. It's costly to search whether the internal symbol foo has been exported under another name (we could have a reverse map, but it seems like unnecessary complexity). Plus, any code that counts on the external name boo may break.
- What if another symbol has already been exported as voo? Rebinding it would also break code that counts on the previous voo, but the operation may be intentional (e.g. hot-patching). I assume such a case shouldn't happen under normal circumstances, but it may be needed in an emergency. So I made it issue a warning but allow the meaning of the external name voo to be updated to point to foo.
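The two decisions above can be sketched as a small pure function over an association list of exported names. This is a hypothetical model, not Gauche's implementation: export-as and alist-remove are names made up for the example, and the "GLOCs" are stand-in objects compared with eq?.

```scheme
(import (scheme base) (scheme write))

;; Return ALIST without any entry whose key is KEY.
(define (alist-remove alist key)
  (cond ((null? alist) '())
        ((eq? (caar alist) key) (alist-remove (cdr alist) key))
        (else (cons (car alist) (alist-remove (cdr alist) key)))))

;; Export GLOC under EXTERNAL, given EXPORTED (external name -> GLOC).
;; Returns the updated table.
;; - Entries for other external names are kept, so exporting foo as voo
;;   after it was exported as boo leaves boo in place.
;; - If EXTERNAL already names a different GLOC, warn and rebind it anyway.
(define (export-as exported external gloc)
  (let ((old (assq external exported)))
    (cond ((not old)
           (cons (cons external gloc) exported))
          ((eq? (cdr old) gloc)
           exported)                        ; already exported this way: no-op
          (else
           (let ((err (current-error-port)))
             (display "warning: exported name " err)
             (write external err)
             (display " now refers to a different binding" err)
             (newline err))
           (cons (cons external gloc) (alist-remove exported external))))))

;; Usage: stand-in GLOCs, distinguished by identity (each list is a fresh pair).
(define foo-gloc (list 'foo))
(define bar-gloc (list 'bar))
(define t1 (export-as '() 'boo foo-gloc))  ; foo exported as boo
(define t2 (export-as t1 'voo bar-gloc))   ; bar exported as voo
(define t3 (export-as t2 'voo foo-gloc))   ; foo as voo: warns, voo -> foo
```

Keeping the table as a value that each call returns makes the two policies easy to read off the cond clauses; the real module would update its table in place.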
* * *
I've already implemented the R7RS module system on top of this; I'll describe it in the next entry.