2010/05/14

Supporting general transformers: Step 1

The development of Gauche is a kind of stretched bootstrapping process. Initially, the entire VM, the compiler, and most of the builtin procedures were written in C (with a bit of help from STk to generate stub code). That's because I needed reasonable performance from the beginning to use Gauche in production; the initial VM was not well tuned, and if I had written a compiler in Gauche from the beginning, the rudimentary version would've been too slow for my purpose.

Gradually I rewrote the VM (still in C, but most of it is written in a sort of DSL using S-expressions), then the compiler (almost entirely written in Gauche now). The initial optimization target of the compiler was the compiler itself, and it worked well. I'm also in the process of gradually rewriting builtin procedures in Scheme, whenever doing so doesn't affect overall performance.

So the VM and the compiler have been rewritten. But there's one component left untouched: the syntax-rules expander. It's a nasty spaghetti of C code I didn't dare to touch. There were portable Scheme expanders, but I was afraid that they were not optimal to run on Gauche---I need a fast macro expander, since Gauche needs to compile on the fly. I planned to write an expander from scratch, tuned to take advantage of Gauche's runtime.

The change I committed today is the very first step in rewriting the macro subsystem. It still uses the old syntax-rules expander, but it decouples the expander from the compiler. The compiler used to recognize syntax-rules only as a part of define-syntax etc. That is, the syntax-rules form alone didn't mean anything to the compiler:

gosh> (syntax-rules () [(_ x) 'x])
*** ERROR: unbound variable: x

Now syntax-rules itself is a macro, which evaluates to a macro transformer.

gosh> (syntax-rules () [(_ x) 'x])
#<macro #f>

Syntactic binding forms such as define-syntax, let-syntax and letrec-syntax are changed so that they evaluate the rhs in the compile-time environment (which is supposed to yield a macro transformer) and create a syntactic binding for the given name.

★ ★ ★

An interesting outcome is that this change officially supports aliasing syntactic/macro keywords.

Gauche doesn't separate compile-time global bindings and runtime global bindings, so evaluating a syntax/macro keyword reveals the syntax handlers or macro transformers as a first-class value.

gosh> if
#<syntax if>
gosh> let1
#<macro let1>

And you can rebind those transformers to another global variable as if they are runtime bindings:

gosh> (define xif if)    ; don't do this!
xif
gosh> (xif #f (error "oops") 'ok)   ; works like if
ok

This is an unintended artifact, relying on the fact that Gauche compiles each toplevel form right before executing it in the normal mode of operation. It has never been encouraged, since it mixes phases and will break unexpectedly when the timing of compilation and execution changes. In fact, it doesn't work if both forms are enclosed in a single toplevel form:

gosh> (begin (define yif if) (yif #f (error "oops") 'ok))
*** ERROR: oops

It also doesn't work if the file containing those forms is precompiled. (Precompilation is not officially documented, for it still behaves unreliably in general cases. But quite a few built-in procedures and the compiler itself are precompiled into arrays of VM instructions.)

With today's change, the rhs of define-syntax etc. can be any Scheme expression as long as it yields a macro transformer (or a syntactic handler; I won't go into the details of the difference between the two for now). Now, this is a proper way to give an alias to a syntactic/macro keyword:

gosh> (define-syntax zif if)
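
As a quick check of the alias, the new keyword can be used exactly like the original. This is a minimal sketch, assuming a Gauche that includes this change; the names wif, r1 and r2 are made up for illustration.

```scheme
;; Alias the built-in syntax `if` under a new keyword.
(define-syntax zif if)

;; The untaken branch is never evaluated, so no error is raised.
(define r1 (zif #f (error "oops") 'ok))

;; The syntactic binding is created at compile time, so enclosing both
;; forms in a single toplevel form should not matter here, unlike the
;; (define yif if) trick shown earlier.
(begin
  (define-syntax wif if)
  (define r2 (wif #f (error "oops") 'ok)))
```

If the alias behaves as described, both r1 and r2 end up bound to the symbol ok.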

★ ★ ★

I'm still pondering the interface of macro transformers.

Internally, the current implementation uses a procedure that receives a source form and the compile-time environment, and returns an expanded form with possible syntactic annotations. But the compile-time environment is a private structure of the compiler and I don't want to expose such internal guts to programmers, so I want to wrap the transformer with some nice abstraction.

A possible choice is the model defined in R6RS---a macro transformer is a procedure that receives a syntax object and returns a syntax object. That in itself is ok (I can put the source form and compiler environment together into an opaque object). But the more I read the R6RS, the less I want to follow the spec... There are various ways the transformer is called (whether it receives the entire form or just a keyword, or whether the keyword can appear in the lhs of set!) and the rules to determine which one is used look somewhat ad hoc. I understand those variations are needed to support identifier macros and variable transformers. I just wish they were designed as utility APIs on top of a simpler axiom, something you can say in one sentence.

Actually, it is not very clear to me whether R6RS intends that transformer API as the API, or just as one of the possible APIs. The definition of a transformer procedure is not in the main report at all, but rather in the library report. It's as though this particular definition of transformers is attached to the (rnrs syntax-case (6)) library. Is an implementation allowed to provide a different transformer interface, e.g. an explicit renaming one, if the program imports another library? It doesn't look straightforward, since define-syntax etc. would need to switch the interpretation of transformer procedures depending on which library is imported. It would be more reasonable for the transformer library to also provide the syntactic binding forms, or to provide an intermediate macro that translates between the library's transformer API and whatever the system-internal transformer API is:

(define-syntax foo
  (sc-transformer
    (lambda (expr)
      (syntax-case expr
        ....))))

I think MIT-scheme has something like this, and so do some other implementations.

Probably I'll take this approach. My current plan is to provide an explicit renaming transformer as a basis, and build a syntax-case one on top of it. Alex Shinn's Chibi-Scheme does that, I believe. The only question is whether I can get it fast enough, and I need to implement it to find out.
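
To make the idea of an explicit renaming transformer concrete, here is a minimal sketch. It assumes a wrapper named er-macro-transformer whose procedure takes the form, a rename procedure and a compare procedure; that is the shape used by several implementations, not an interface this entry settles on. The macro swap! and the variables are made up for illustration.

```scheme
;; Assumption: `er-macro-transformer` exists with the (form rename compare)
;; convention. Nothing in this entry fixes that interface for Gauche.
(define-syntax swap!
  (er-macro-transformer
    (lambda (form rename compare)
      (let ([a   (cadr form)]
            [b   (caddr form)]
            [tmp (rename 'tmp)])   ; renamed, so it cannot capture a user's tmp
        `(,(rename 'let) ([,tmp ,a])
           (,(rename 'set!) ,a ,b)
           (,(rename 'set!) ,b ,tmp))))))

;; The user variable is deliberately called tmp to exercise the renaming.
(define tmp 1)
(define other 2)
(swap! tmp other)
```

The transformer sees only a plain form and two procedures; the compile-time environment stays hidden behind rename and compare, which is the kind of abstraction discussed above.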

Tags: 0.9.1, macro, r6rs

2010/05/06

Import options: part two

In the previous article I introduced 0.9.1's import options. Now I describe how it is implemented currently.

How names are searched for in modules

Modules have two kinds of relations. Importing is the way to use one module from another module; the importing-imported relationship forms a directed graph, possibly containing cycles. Inheriting is the way to extend existing module(s) to add something, like augmenting existing modules with new bindings, or making a single facade for a bunch of modules. Inheritance is handled the same way as class inheritance, and forms a directed acyclic graph.

Importing is formed by import, and it is not transitive. If module X imports module Y which imports module Z, X only sees Y's exported bindings but not Z's. Inheritance is formed by extend, and it is transitive. (ref:import, ref:extend).
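
The difference in transitivity can be checked with a few throwaway modules. This is a sketch under assumptions: the module and variable names are made up, and it relies on global-variable-bound? reporting whether a name is visible from a module.

```scheme
;; Import chain: X imports Y, and Y imports Z.
(define-module Z (export z-val) (define z-val 'from-z))
(define-module Y (import Z) (export y-val) (define y-val 'from-y))
(define-module X (import Y))

;; Inheritance: E extends Z, and F imports E.
(define-module E (extend Z) (export e-val) (define e-val 'from-e))
(define-module F (import E))

;; X sees Y's export, but not Z's: import is not transitive.
(define x-sees-y (global-variable-bound? 'X 'y-val))
(define x-sees-z (global-variable-bound? 'X 'z-val))

;; F sees Z's export through E: inheritance is transitive.
(define f-sees-z (global-variable-bound? 'F 'z-val))
```

By the rules above, x-sees-y and f-sees-z should be true and x-sees-z false.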

Module inheritance is used less than import (except that all modules inherit the gauche module by default), but it comes in handy from time to time. Implementing import options turned out to be one of those times, unexpectedly.

Suppose you set up modules as follows:

(define-module P (extend Q R))
(define-module S (extend T U))
(define-module V (extend W X))

(define-module A
  (import V S P)
  (extend B C))

The following figure shows how a name used in module A is searched for. Red numbers are the order of the search. (It doesn't show the default inherited modules like gauche, scheme etc.)

Note that this search occurs at most twice per global name: once at compile time to see if the name has a syntactic binding, and once more the first time the code is run. Once the name is resolved, the result is cached and the name is never searched for again.

Thus, although the modules are open, once the name is resolved you cannot insert a shadowing binding into a module between the current one and the one the name is resolved in. This is a trade-off between speed and flexibility; if you want the new shadowing binding to be reflected, you can always reload the current module.

Supporting 'prefix'

A 'prefix' slot is added to a module. When a name is searched for in a prefixed module, the prefix is stripped from the name first---if the name doesn't have the prefix, we can stop searching that particular path in the imported module.

A prefix is attached by the importing module, so we cannot modify the imported module itself (another module may import the same module with a different prefix or without a prefix at all). The 'prefix' import option creates an anonymous module that inherits the imported module, and imports the created module instead. The created module behaves the same as the original imported module, except that it strips the prefix.
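
The prefix-stripping search can be sketched with a toy model. This is not Gauche's real data structure: here a module is just a list of (prefix bindings parent) with a single parent, and all names are made up for illustration.

```scheme
;; Toy module: (prefix bindings parent). prefix is a symbol or #f,
;; bindings is an alist, parent is another toy module or #f.
(define (make-mod prefix bindings parent) (list prefix bindings parent))
(define (mod-prefix m) (car m))
(define (mod-bindings m) (cadr m))
(define (mod-parent m) (caddr m))

;; Return NAME with PREFIX removed, or #f if NAME lacks the prefix.
(define (strip-prefix name prefix)
  (let ([n (symbol->string name)]
        [p (symbol->string prefix)])
    (and (>= (string-length n) (string-length p))
         (string=? (substring n 0 (string-length p)) p)
         (string->symbol (substring n (string-length p) (string-length n))))))

;; Search M and then its ancestors, stripping M's prefix first.
(define (search m name)
  (and m
       (let ([name (if (mod-prefix m)
                       (strip-prefix name (mod-prefix m))
                       name)])
         (and name                       ; no prefix: give up on this path
              (cond [(assq name (mod-bindings m)) => cdr]
                    [else (search (mod-parent m) name)])))))

;; The imported module, and two anonymous children with different prefixes.
(define lib (make-mod #f '((iota . iota-proc) (fold . fold-proc)) #f))
(define anon1 (make-mod 'srfi-1: '() lib))
(define anon2 (make-mod 's: '() lib))
```

In this model (search anon1 'srfi-1:iota) finds iota-proc, (search anon1 'iota) fails immediately, and lib itself is untouched, so anon2 can expose the same bindings under s: at the same time.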

Supporting 'except'

except is also done with module inheritance, and another new feature, a negative binding. It is a special binding that answers "no, the name doesn't have a binding along this path" when a name is looked for in it.

So, the except option creates an anonymous module inheriting the imported module, and inserts negative bindings for the names listed in the option. When one of those names is searched for, the search is given up at this anonymous module. For other exported names, the search continues to the ancestor modules and eventually gets a hit.
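
A negative binding can be modeled in a few lines. Again this is only a toy: a module is a pair of (bindings . parent), the marker object and all names are made up, and Gauche's real implementation differs.

```scheme
;; Unique marker that stands for "no binding along this path".
(define negative (list 'negative))

;; Toy module: (bindings . parent), bindings being an alist.
(define (make-mod bindings parent) (cons bindings parent))

(define (search m name)
  (and m
       (cond [(assq name (car m))
              => (lambda (b) (if (eq? (cdr b) negative) #f (cdr b)))]
             [else (search (cdr m) name)])))

;; The imported module, with three exported names.
(define lib (make-mod '((a . 1) (b . 2) (c . 3)) #f))

;; (import (lib :except (b))): an anonymous child with one negative binding.
(define anon (make-mod (list (cons 'b negative)) lib))

;; Modules are open: a binding added to lib later...
(set-car! lib (cons (cons 'd 4) (car lib)))
```

Here (search anon 'a) gives 1, (search anon 'b) stops at the negative binding, and (search anon 'd) gives 4 even though d was added after the import was set up. That last property is what a one-time set difference could not provide.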

An except and a prefix can be combined into one anonymous module. The order in which they come is reflected in the names inserted into the anonymous module: if prefix comes first, we insert the negative bindings under the names with the prefix stripped, since the search process strips the prefix before looking into this module. If except comes first, we insert the negative bindings under the names as they are.

Supporting 'only'

For the only option, we create an anonymous module that does not inherit anything, and insert bindings for the listed names from the imported module.

What we actually do is take the gloc object that is the value in the module's hash table keyed by the name, and register the name and the gloc object as a new binding in the anonymous module. So the two bindings share the same gloc object. This allows the importing module to see the change whenever the original binding is modified by set!.

In effect, we create aliases to existing global bindings, although in the only case the aliased name is the same and only the visibility differs.
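
The effect of sharing a gloc can be sketched as follows. The gloc here is modeled as a one-slot vector and the module tables as alists; these stand-ins and all names are assumptions for illustration, not Gauche's internals.

```scheme
;; Toy gloc: a one-slot mutable cell holding a global's value.
(define (make-gloc v) (vector v))
(define (gloc-ref g) (vector-ref g 0))
(define (gloc-set! g v) (vector-set! g 0 v))

;; Binding table of the imported module: name -> gloc.
(define lib-table
  (list (cons 'iota (make-gloc 'iota-v1))
        (cons 'fold (make-gloc 'fold-v1))))

;; :only (name ...): a parentless table whose entries reuse the same glocs.
(define (only-table table names)
  (map (lambda (n) (cons n (cdr (assq n table)))) names))

(define anon-table (only-table lib-table '(iota)))

;; A set! in the original module mutates the shared gloc...
(gloc-set! (cdr (assq 'iota lib-table)) 'iota-v2)

;; ...and the importing side sees the new value through its alias.
(define seen (gloc-ref (cdr (assq 'iota anon-table))))
```

Since the anonymous table has no parent, fold is simply absent from it, which is all that only needs to do.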

Supporting 'rename'

rename was the trickiest. It must not only make the new names visible, but also make sure that importing modules won't see the original names. The interaction with prefix is also nasty, for we may want to see the renamed symbol with or without the prefix, depending on which option comes first.

We create an anonymous module inheriting the imported module. Then we create alias bindings in the same way as only, but using the renamed names instead of the original names. Next, we insert negative bindings for the original names, except those original names that are also used as renamed names.

For example, when we have this crazy setting:

(import (M :rename ((kar kdr) (kdr kar) (kons snok))))

We insert the following bindings to the anonymous module:

an alias binding of kdr (which shares the gloc with the name kar in the module M)
an alias binding of kar (which shares the gloc with the name kdr in the module M)
an alias binding of snok (which shares the gloc with the name kons in the module M)
a negative binding of kons

Since it inherits the imported module M, searching all other names falls into M.
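
The construction for this example can be replayed with a toy model. Tables are alists, each value merely stands in for a gloc, and the names negative, rename-table and lookup as well as the extra binding klist are made up for illustration.

```scheme
;; Unique marker for a negative binding.
(define negative (list 'negative))

;; Module M's table. Each value stands in for the gloc of that name.
(define M '((kar . gloc-kar) (kdr . gloc-kdr)
            (kons . gloc-kons) (klist . gloc-klist)))

;; Build the anonymous module's table from ((old new) ...) specs.
(define (rename-table table specs)
  (let ([news (map cadr specs)])
    (append
     ;; alias bindings: new name -> gloc of the old name
     (map (lambda (s) (cons (cadr s) (cdr (assq (car s) table)))) specs)
     ;; negative bindings for old names that are not reused as new names
     (let loop ([ss specs] [acc '()])
       (cond [(null? ss) (reverse acc)]
             [(memq (caar ss) news) (loop (cdr ss) acc)]
             [else (loop (cdr ss)
                         (cons (cons (caar ss) negative) acc))])))))

(define anon (rename-table M '((kar kdr) (kdr kar) (kons snok))))

;; Search the anonymous table first, then fall through to its parent M.
(define (lookup name)
  (cond [(assq name anon)
         => (lambda (b) (if (eq? (cdr b) negative) #f (cdr b)))]
        [(assq name M) => cdr]
        [else #f]))
```

In this model kdr resolves to M's kar, kar to M's kdr, snok to M's kons, kons is hidden, and klist falls through to M unchanged.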

If prefix is added after the rename option, after-rename names also get prefix, so we can just add prefix to the anonymous module.

If prefix is added before the rename option, however, we have to make the after-rename names without prefix. So we need two anonymous modules, the first one for prefixing, and the second one with the renaming setup, inheriting the first one.

Tags: 0.9.1, import, module

2010/05/03

Import options: part one

A bit of background

Gauche's module system is based on STk's. What I liked about STk's module system was its simplicity and flexibility.

It's simple since none of the forms has lots of options. Compare it with CL's defpackage---I always have to look up the manual whenever I write a new defpackage form.

It's flexible because modules are open---I can add definitions to existing modules later, or even alter the definitions afterwards. Altering definitions comes in handy when you have to patch an existing library but you don't have permission to change the installed libraries, or don't want to risk affecting other programs by changing shared libraries. In your program you can just put some code like the following:

(with-module target-module
  (define (foo  ...)
    ...fixed definition..))

This (I think it is called monkey patching) isn't recommended in the final code to be shipped, but sometimes you have to step outside of "the Right Things" to fix holes in an emergency.

However, the system lacks some handy features like prefixing imported symbols. This limitation gets in the programmer's way more often as we have more libraries, and the chance of importing two modules that export the same name grows.

I've been aware of this issue for a long time, but kept putting off implementing a solution, for I wanted to avoid both a complex beast that makes everybody look up the manual constantly, and a half-baked solution that covers simple cases but falls short and needs to be reworked in practical situations. When R6RS was finalized, offering a module system with various options, it became my reference point for which features to implement.

In 0.9.1 the feature will finally be available. As a taste, the following code loads srfi-1 but imports only iota and fold from it, prefixing them with srfi-1:.

(use srfi-1 :only (iota fold) :prefix srfi-1:)

(srfi-1:fold + 0 (srfi-1:iota 100 1)) => 5050

Challenge of open module

I eventually want to provide an R6RS-compatible layer, so putting in enough functionality to support the R6RS library form is one of the goals. However, the R6RS specification poses some challenges to an open module system like ours.

R6RS has a concept of import-set which is a set of names to import, and defines only, except, prefix and rename as operations on the sets; that is, each option takes a set of names, and returns a modified set of names. It is easy to explain, and straightforward to implement if you know all the names you are dealing with.

That "if" is the problem in our open modules, since we don't know the complete set of names when we process import forms. For example, except cannot be just a set-difference operation; if a new exported binding that is not listed in the except list is added later, that binding should become visible in the importing module.

I regret that I didn't discuss this more during the R6RS review process. Being simple to explain is a virtue, but the R6RS way covers a broader space than necessary; for example, you can nest an arbitrary number of prefix and rename forms. Suppose library lib exports names w, x, y, and z. If you import the library like the following, can you figure out what names you use to access the original names?

(import (rename (prefix (rename (prefix (lib) n:)
                                (n:x y) (n:y x))
                        m:)
                (m:n:z z) (m:x y)))

If I understand R6RS correctly, it'll be like this:

imported	original
y	y
m:y	x
z	z
m:n:w	w
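
The table can be double-checked mechanically by treating an import set as an alist of (imported . original) pairs and applying the operations from the inside out. This is only a sketch of the set-based reading of R6RS described above; prefix-set and rename-set are made-up helper names.

```scheme
;; An import set is an alist of (imported-name . original-name).

;; Prepend prefix P to every imported name.
(define (prefix-set set p)
  (map (lambda (e)
         (cons (string->symbol
                (string-append (symbol->string p) (symbol->string (car e))))
               (cdr e)))
       set))

;; Apply ((from to) ...) to the imported names, all at once.
(define (rename-set set specs)
  (map (lambda (e)
         (cond [(assq (car e) specs) => (lambda (s) (cons (cadr s) (cdr e)))]
               [else e]))
       set))

(define lib '((w . w) (x . x) (y . y) (z . z)))

(define result
  (rename-set
   (prefix-set
    (rename-set (prefix-set lib 'n:) '((n:x y) (n:y x)))
    'm:)
   '((m:n:z z) (m:x y))))
;; result => ((m:n:w . w) (m:y . x) (y . y) (z . z))
```

The result agrees with the table: w is reachable as m:n:w, x as m:y, and y and z under their own names.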

It's hard to imagine when this kind of setup is useful, but even if you really need this kind of multiple layering, I imagine it is doable by writing an intermediate module to remap names. Probably nobody would want to write nested prefix/renames. But R6RS-compatible implementations need to support them.

In the R6RS world, modules are closed, and the set of names is fixed when you process imports. So that's not a defect of R6RS per se, but I prefer designs that encourage alternative views of the world over designs that put hurdles in their way.

Anyway, when we process import forms, we don't know yet the entire set of the names to import. We don't want to recalculate the set whenever a new exported binding is added to a module, or whenever we search for the imported module with those options.

So I employed a few tricks.

For prefixing, we run the operation in reverse. That is, when we search into an imported module with a prefix, we strip the prefix from the symbol being searched for first and look for the stripped symbol in the module.
For only and the bindings inserted by rename, the import form creates an anonymous intermediate module into which the necessary names are injected. This is a one-time cost at processing import and costs nothing at symbol lookup.
For except and the bindings hidden by rename, we also use an intermediate module that has a special shadow binding that prevents the name search from going further into the module's ancestors. This is also a one-time cost at processing import and costs nothing at symbol lookup.

The implementation is a bit more complicated than I'd like, but it doesn't seem to have too much impact on performance. In particular, if you don't use prefix, the overhead is negligible.

New import and use form

The import form is extended as follows:

<import-form> : (import <import-spec> ...)

<import-spec> : <module-name>
              | (<module-name> <import-option> ...)

<import-option> : :only (<symbol> ...)
                | :except (<symbol> ...)
                | :rename ((<symbol> <symbol>) ...)
                | :prefix <symbol>

<module-name> : <symbol>

The use form is also extended to accept import options. You don't need extra parentheses, for use takes only one module (note that import can take multiple modules; that's why we needed parens).

<use-form> : (use <module-name> <import-option> ...)

The options modify the imported symbols in the order they appear, so the order matters. The following two forms are equivalent; both make iota available in the current module under the name srfi-1:iota.

   (use srfi-1 :only (iota) :prefix srfi-1:)

   (use srfi-1 :prefix srfi-1: :only (srfi-1:iota))

In the latter form, symbols in :only option must be prefixed since they are already prefixed in the previous :prefix option.

I think it is a good idea to put :only and :except option always before :prefix, for less confusion.

On the other hand, you may need both orders of :rename and :prefix, depending on what you want. If you put :prefix clause after :rename, the renamed identifier gets prefix as well:

   (use srfi-1 :rename ((iota i)) :prefix srfi-1:)
   
   srfi-1:i => #<iota>
   srfi-1:fold => #<fold>

If you put :prefix first, you can import renamed symbols without prefix:

   (use srfi-1 :prefix srfi-1: :rename ((srfi-1:iota i)))
   
   i => #<iota>
   srfi-1:fold => #<fold>

The contrived complex imports above can be written in our syntax as follows, though I don't recommend it.

(use lib
     :prefix n:
     :rename ((n:x y) (n:y x))
     :prefix m:
     :rename ((m:n:z z) (m:x y)))

In the next entry, I'd like to explain how this is implemented in Gauche. Stay tuned.

Tags: 0.9.1, import, use, r6rs

2010/05/01

Records and util.match

This is not a new feature for 0.9.1; it's been there for a long time. But I rediscovered it today with a pleasant surprise, so it may be worth noting, for I expect to use records more as srfi-99 is supported in 0.9.1 (cf. ERR5RS Records and beyond).

The pattern matcher util.match can be used to match records. The feature already existed in the original Andrew Wright's match, and I believe it's available for many other Scheme implementations.

(use gauche.record)
(use util.match)
(use math.const)  ; for pi

(define-record-type circle #t #t radius)

(define-record-type rectangle #t #t width height)

(define (area shape)
  (match shape
    [($ circle r) (* pi r r)]
    [($ rectangle w h) (* w h)]))

gosh> (area (make-circle 5))
78.53981633974483
gosh> (area (make-rectangle 4 8))
32

($ class var ...) matches the instance of class, with binding slot values to var ... in order.

This pattern can be used for any instance; nevertheless, I haven't used it much before. I care less about the order of slots when I think in terms of classes. Slots are always accessed by their names, and I freely change the order of slots in class definitions, so it's a bit cumbersome to have patterns that depend on the order of slots.
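
For classes, a pattern that names the slots avoids the dependence on slot order. The sketch below assumes that util.match also accepts the (@ class (slot pat) ...) form, as I recall from its pattern grammar; the class <point> and the procedure names are made up for illustration.

```scheme
(use util.match)

(define-class <point> ()
  ((x :init-keyword :x)
   (y :init-keyword :y)))

;; Positional: depends on the slot order in the class definition.
(define (point->list/positional p)
  (match p
    [($ <point> a b) (list a b)]))

;; By name (assumed @ pattern): the order in the pattern is irrelevant.
(define (point->list/named p)
  (match p
    [(@ <point> (y b) (x a)) (list a b)]))

(define pt (make <point> :x 1 :y 2))
```

With the slots defined as x then y, both procedures should return (1 2) for pt; only the positional one would change if the slot order in the class definition were swapped.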

However, records' slots feel more positional---maybe because their default constructor takes positional arguments to initialize slots. Also, their emphasis on immutability reminds me of the style of other functional languages.

If I find myself using this pattern more and more, it'll be worth optimizing. Currently it looks up the n-th slot name in the list of slots of the object being matched, then uses slot-ref to obtain the value. For general objects we need to do that, since the association between position and slot can be changed by class redefinition. But for records we don't need to worry about class redefinition, and internally it's much faster to access a record's slot by its position than by its name.

Tags: gauche.record, srfi-99, util.match

2010/04/30

Extended formals

One thing I miss most when I hop back to Scheme from CL is CL's lambda list for optional and keyword arguments. I feel CL's spec is too complicated for my taste (CLHS specifies 10 different kinds of lambda lists); nevertheless, I know it is useful.

Gauche has been providing argument parsing utility macros, let-optionals*, let-keywords and let-keywords* (ref:let-optionals*, ref:let-keywords*). These cover enough functionality to deal with optional and keyword arguments. Yet they are different from being able to specify those arguments directly in the formals. One thing is that those macros make code longer. Another is that I feel specifying them directly in the formals makes them more a part of the public contract of the procedure. They stand out in the source code, claiming that the procedure takes, say, two required arguments and three optional arguments. That makes the source easier to read, which is precious.

So I've been secretly experimenting with CL-like extended formal lists for almost two years. Now I'm convinced that the feature has enough advantages to be included officially.

Actually, there's a SRFI for extended formals (srfi:89). It introduces new forms, define* and lambda*, that recognize extended syntax for optional and named (keyword) arguments.

Providing new forms is a polite way---it leaves original Scheme intact, and won't step on existing code accidentally. It is highly desirable for a portable library, and understandable that srfi-89 took that path.

However, for this feature, I opted to extend define and lambda instead. Support of extended formals is an upward-compatible change (I mean, even with this extension, proper R5RS programs run just fine), and having different forms only to be polite makes the language unnecessarily complex. Another important advantage is that, by extending the existing forms, extended formals become available to macros that expand into defines or lambdas. If we had different forms, we'd need to change such macros to make the extended feature available.

This kind of decision depends heavily on the target of the implementation. I think it is bad to conflate extensions with the standard syntax if the implementation is for education; students could confuse the language itself with the implementation's specifics. But that's not the target of Gauche.

For what it's worth, this extension is attached to define and lambda of the gauche module. The original semantics is still kept if you import the null module (corresponds to (null-environment)) or the scheme module (corresponds to (scheme-report-environment 5)). In other words, Gauche's lambda has a different syntactic binding from R5RS's lambda.

★ ★ ★

Gauche's extended formal syntax is similar to Common Lisp's, but I use keywords :optional, :key, :allow-other-keys and :rest instead of CL's &optional, &key, &allow-other-keys and &rest. I saw no point adding extra reserved symbols.

(define (foo x :optional  (y 0) (z 1)) (list x y z))

gosh> (foo 9)
(9 0 1)
gosh> (foo 9 10)
(9 10 1)
gosh> (foo 9 10 11)
(9 10 11)
gosh> (foo 9 10 11 12)
*** ERROR: too many arguments for (lambda (x :optional (y 0) (z 1)) (list x y z))

(define (foo x :key (y 0) (z 1)) (list x y z))

gosh> (foo 9 :z -1)
(9 0 -1)
gosh> (foo 9 :z -1 :zz 3)
*** ERROR: unknown keyword :zz

(define (foo :key (x 0) :allow-other-keys) x)

gosh> (foo :z 9)
0
gosh> (foo :x 8 :y 9)
8

If no default value is given, the variable is bound to #<undef> (which can be tested with undefined?, though that hasn't been documented; I'll make it public in 0.9.1, too).

(define (foo :key x) x)

gosh> (foo)
#<undef>

#<undef> is a first-class value, so we can't be sure whether the argument wasn't provided, or the argument was provided but its value happened to be #<undef>. CL solves this problem by allowing an extra parameter, supplied-p-parameter, that is bound to a boolean value indicating whether the argument was provided. Gauche doesn't support that feature yet.
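
Until something like supplied-p-parameter exists, one workaround is to use a private, unique object as the default value. This is a sketch of that idiom, not a feature of the extended formals themselves; the names unsupplied and describe-x are made up.

```scheme
;; A fresh list is eq? only to itself, so no caller can pass it by accident.
(define unsupplied (list 'unsupplied))

(define (describe-x :key (x unsupplied))
  (if (eq? x unsupplied)
      'not-supplied
      (list 'supplied x)))

(define r-none  (describe-x))
(define r-one   (describe-x :x 1))
(define r-undef (describe-x :x (undefined)))   ; explicitly passing #<undef>
```

Here r-none is not-supplied and r-one is (supplied 1), while r-undef is a list starting with supplied, so an explicit #<undef> is still distinguished from a missing argument.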

Internally, these lambdas with extended formals are expanded into the base lambdas and let-optionals*/let-keywords*.
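
Roughly speaking, the hand-written equivalents look like the following. This is a sketch of the idea rather than the exact expansion: the real expansion may differ in details such as how excess or unknown arguments are checked. The procedure names are made up.

```scheme
;; Roughly what (define (f x :optional (y 0) (z 1)) ...) stands for.
(define (opt-demo x . args)
  (let-optionals* args ([y 0] [z 1])
    (list x y z)))

;; Roughly what (define (f x :key (y 0) (z 1)) ...) stands for.
(define (key-demo x . args)
  (let-keywords args ([y 0] [z 1])
    (list x y z)))

(define o1 (opt-demo 9))          ; both defaults used
(define o2 (opt-demo 9 10))       ; y supplied, z defaulted
(define k1 (key-demo 9 :z -1))    ; z supplied by keyword
```

For arguments within the declared ones, these behave like the extended-formals examples above: o1 is (9 0 1), o2 is (9 10 1), and k1 is (9 0 -1).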

One twist I added is an optional parameter after :allow-other-keys.

(define (foo :key x y :allow-other-keys others) 
   (list x y others))

The parameter others is bound to a keyword-value list of the arguments that weren't consumed by the :key parameters.

gosh> (foo :w 0 :x 1 :y 2 :z 3)
(1 2 (:z 3 :w 0))

It is handy for a procedure that wraps another procedure and wants to filter out its own keyword arguments before passing the rest along.
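
A wrapper of that kind might look like the following sketch. The procedures draw-box and draw-box/tagged and their keywords are made up for illustration.

```scheme
;; The wrapped procedure knows only :width and :height.
(define (draw-box :key (width 10) (height 5))
  (list 'box width height))

;; The wrapper consumes :tag itself and forwards everything else.
(define (draw-box/tagged :key (tag 'untagged) :allow-other-keys others)
  (list tag (apply draw-box others)))

(define w1 (draw-box/tagged :tag 'demo :width 3))
(define w2 (draw-box/tagged :height 7))
```

Because :tag never reaches draw-box, the inner procedure doesn't need :allow-other-keys of its own. In this sketch w1 is (demo (box 3 5)) and w2 is (untagged (box 10 7)).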

★ ★ ★

After implementing the extended formal list, I rewrote the argument parsing macros in my code with this feature. Interestingly, I found that let-keywords* etc. weren't completely obsoleted by extended formals. They are still useful when you factor out common option processing:

(define (some-api x y z . options)
  (check-common-options options)
  ...)

(define (another-api a b . options)
  (check-common-options options)
  ...)

(define (check-common-options options)
  (let-keywords* options ((key1 init1) (key2 init2) ...)
    ...))

In CL, I would use destructuring-bind. Lacking that, there is merit in having an argument parsing feature that is separate from the lambda syntax.

Tags: 0.9.1, formals, define, lambda
