Gauche Devlog

2010/06/11

Constructing circular structures

Since I brought up circular structures in the last entry, I describe another related issue that hasn't solved yet.

Using #n= and #n# notation (srfi:38), you can give an external representation to a circular structure. When read in, you'll get a structure with the same topology as DAG. It is pretty handy in practice.

A simple way to reconstruct a circular structure is that you allocate the container first, and fills its content later; e.g. if you see #1=(#1#), you allocate a cons cell and register it with label '1' first; then when you encounter #1# you set-car! the pointer to the cons cell to the car of the cons cell.

As trivial as it sounds, this scheme doesn't play well with functional style, where once an object is constructed it is immutable. Actually it is more than a matter of style.

We have srfi:10 that allows user-defined structures to have a readable extrenal representation. Which is also pretty handy in practice, and I've used the feature in production works number of times (it's too bad that R6RS conflicts with srfi-10; srfi-10 preceded years before R6RS.)

Unfortunately, srfi-10 requires all the initialization arguments must be present before constructing the object. So, there seems no clear way to mix circular reference and srfi-10, such as the following:

  #1=#,(my-record (x y #1#))

Gauche has a very ad-hoc solution for this. In fact, it's too ugly that I haven't made it official, although it's been in the code for years. In the current implementation, when a reader encounters a shared reference that hasn't been realized (#1# in the above example), it creates a special read-reference object to pass to the constructor. When a constructor sees a read-reference object, it can register a callback procedure which will be called when the reference has a concrete value. The callback procedure then replaces read references for the given values.

I don't like this solution since it doesn't have enough abstraction barrier; if a user-registered read-time constructor doesn't register the callback, for example, the read object contains read-reference objects, which we don't want to expose to the rest of the program.

There are several ways to address this issue.

Implicit pointer swapping

This automates the callback process; if we can know all the pointers in the objects created by read-time constructor, we can trace the objects after all the shared labels are realized, and replaces pointers to the read references for pointers to the real objects.

This would be trivial with precise GC. With Gauche's conservative GC, however, we need to ask user-defined structures with read-time constructors to register pointer layout of the objects. Besides, the structure created by read can be arbitrary large, and I wonder if it can be efficient enough.

Implicit pointer forwarding

If we have something like implicit future, we can wrap the to-be-constructed shared reference by a future to pass to the constructor. Implicit means the existence of the future is transparent to the program; if the code tries to use the future as a value, the system implicitly returns wrapped value. (It somewhat resembles with implicit forcing of promises, but the realization of the value is done by the time read returns, without explicitly forcing it, so calling it a future seems more appropriate. Cf. http://www.ps.uni-saarland.de/alice/manual/futures.html ).

This is probably cleanest solution since side effects are nicely abstracted, and also more general, for implicit futures can be used in various other places.

The obstacle is that we have to insert checks to see whether the given Scheme object is a future everywhere. For Scheme programs it is not a big deal since VM handles it all, but we also have to consider C code that deal with ScmObjs. For primitive aggregate types such as pairs and vectors, C API exposes accessor macros so we can insert future check transparently; but other type of C structs that contains ScmObjs.... I'm not sure we can cover every cases.

Two-phase construction

This idea is to split read-time constructor to allocate phase and initialize phase, like CLOS. It encapsulates side effects into initialize method, which is good. Srfi-10 style read-time constructors can be understood as a combination of allocate and initialize, and we can raise an error if circular reference is involved in structures that don't offer separate allocate and initialize methods.

An issue is that there are structures that adopt single-phase construction protocol. Some built-in objects do, and also the record system (srfi:9, srfi:99) assumes it.

This would work if we can say all objects should be able to be constructed in two phases in Gauche; otherwise it would be very confusing.

★ ★ ★

I'm currently swinging between implicit future and two-phase construction. Is there any other ways?

Tags: srfi-10, srfi-38, srfi-9, srfi-99

2010/06/09

(cdr '#1='#1#) => ?

What should (cdr '#1='#1#) print?

This started from a blog post by zick (btw, zick is the author of Manga guide to Lisp.)

He reported Gauche segfaults after printing large number of single quotes. Here are results of other lisp/schemes.

  • CLISP (2.42) : stack overflow
  • Allegro CL (8.2) : (#1='#1#)
  • SBCL (1.0.11) : (#1='#1#)
  • MzScheme (v372) : #0=((quote . #0#))

I just fixed Gauche at r7170 so it now works just like MzScheme:

gosh> (cdr '#1='#1#)
#0=((quote . #0#))

The bug was trivial in hindsight. The write/ss uses two-pass algorithm: first it scans the structure to find shared objects, then it walks the structure again to print. In the second pass it detects the pattern such as (quote X) and using abbreviated notation:

(let/cc return
  (if (obj-is-shared? obj)
    (if (first-time-to-print? obj)
      (begin ... display #n= ...)
      (begin ... display #n# ... (return))))

  (if (and (pair? obj) (null? (cdr obj)) (eq? (car obj) 'quote))
    (begin (display #\') (write/ss (cadr obj)))
    (begin ...general print... ))

When Gauche tries to write cdr of #1=(quote #1#), the first pass marks the cell whose cdr is '() and whose car is a cell #1=(quote #1#) as a shared object (see the figure below). So it starts writing #0=(, then it recognizes the (quote X) pattern; the bug is that it just recurse to (cadr obj), skipping the check of whether (cdr obj) is shared object or not.

[image]

The fix is trivial that just add an extra check in the pattern recognition that cdr of (quote X) is not a shared object, which produces #0=((quote . #0#)).

However, Allegro and SBCL prints the structure differently (aside from their number counting from one): Are they just another notation of the same structure? As shown in the figure in zick's blog (I reproduce the figure below), they should be different.

[image]

I'm curious that what algorithm produces Allegro's result. I can't imagine one-pass algorithm, so my guess is that instead of having a separate code path for pass 1 and 2, they may use a single code walker (that handles (quote X) pattern detection) and calls it twice with different modes, once for shared-structure detection and again for output. Then it may fail to recognize that the initial object is the same as cdr of (quote X), because recursing into (cadr obj) will skip that particular cell.

Well, I can check it by looking at SBCL but this is so minor problem so I put it off for someday...

Tag: srfi-38

2010/05/14

Supporting general transformers: Step 1

The development of Gauche is a kind of streched bootstrapping process. Initially, the entire VM, compiler, and most of builtin procedures are written in C (with a bit help of STk to generate stub code). That's because I needed a reasonable performance from the beginning to use Gauche in the production; the initial VM was not well tuned, and if I had written a compiler in Gauche from the beginning, the rudimental version would've been too slow for my purpose.

Gradually I rewrote VM (still in C, but most part is written in a sort of DSL using S-expressions), then the compiler (almost entirely written in Gauche now). The initial optimization target of the compiler was the compiler itself, and it worked well. I'm also in the process of gradually rewriting builtin procedures in Scheme, whenever doing so doesn't affect overall performance.

So the VM and the compiler has been rewritten. But there's one component left untouched: The syntax-rules expander. It's a nasty spaghetti of C code I did't dare to touch. There were portable Scheme expanders, but I was afraid that they were not optimal to run on Gauche---I need a fast macro expander, since Gauche needs to compile on the fly. I planned to write an expander from scratch tuned to take advantage of Gauche's runtime.

The change I committed today is the very first step of rewriting macro subsystem. It still uses old syntax-rules expander, but it decouples the expander from the compiler. The compiler used to recognize syntax-rules as a part of define-syntax etc. That is, the syntax-rules form alone didn't mean anything to the compiler:

gosh> (syntax-rules () [(_ x) 'x])
*** ERROR: unbound variable: x

Now syntax-rules itself is a macro, which evaluates to a macro transformer.

gosh> (syntax-rules () [(_ x) 'x])
#<macro #f>

Syntactic bindings such as define-syntax, let-syntax and letrec-syntax are changed so that it evaluates rhs in the compile-time environment (which is supposed to yield a macro transformer) and creates a syntactic binding to the given name.

★ ★ ★

An interesting outcome is that this change officially supports aliasing syntactic/macro keywords.

Gauche doesn't separate compile-time global bindings and runtime global bindings, so evaluating a syntax/macro keyword reveals the syntax handlers or macro transformers as a first-class value.

gosh> if
#<syntax if>
gosh> let1
#<macro let1>

And you can rebind those transformers to another global variable as if they are runtime bindings:

gosh> (define xif if)    ; don't do this!
xif
gosh> (xif #f (error "oops") 'ok)   ; works like if
ok

This is an unintended artifact, relying on the fact that Gauche compiles each toplevel form right before executing it in normal mode of operation. This hasn't been encouraged, however, since it mixes phases and will break unexpectedly when the timing of compilation and execution is changed. In fact, it doesn't work if both form is enclosed in a single toplevel form:

gosh> (begin (define yif if) (yif #f (error "oops") 'ok))
*** ERROR: oops

It also doesn't work if the file with those forms are precompiled. (Precompilation is not officially documented, for there are still unreliable behaviors in general cases. But quite a few built-in procedures and the compiler itself are precompiled into arrays of VM instructions).

With today's change, the rhs of define-syntax etc. can be any Scheme expression as far as it yields a macro transformer (or a syntactic handler; I won't go into details of difference of two for now.) Now, this is a proper way to give an alias to a syntactic/macro keyword:

gosh> (define-syntax zif if)

★ ★ ★

I'm still pondering the interface of macro transformers.

Internally, the current implementation uses a procedure that receives a source form and the compile-time enviornment, and returns an expanded form with possible syntactic annotations. But the compile-time enviornment is a private structure to the compiler and I don't want to expose such internal guts to the programmers, so I want to wrap the transformer with some nice abstraction.

A possible choice is the model defined in R6RS--- a macro transformer is a procedure that receives a syntax object and returns a syntax object. Which itself is ok (I can put the source form and compiler environment together into an opaque object). But the more I read the R6RS, the less I want to follow the spec... There are various ways the transformer is called (whether it receives entire form or just a keyword, or whether the keyword can appear in lhs of set!) and the rules to detemine which one is used looks somewhat ad-hoc. I understand those variations are needed to support identifier macros and variable transformers. I just hope they are designed as utility APIs on top of a simpler axiom, something you can say in one sentence.

Actually, it is not very clear to me that R6RS intends those transformer API as the API, or just one of possible APIs. The definition of transformer procedure is not in the main report at all, but rather in the library report. It's as though this particular definition of transformers are attached to (rnrs syntax-case (6)) library. Is an implementation allowed to provide different transformer interface, e.g. explicit renaming one, if the implementation imports other library? It doesn't look straightforward, since define-syntax etc. need to switch the interpretation of transformer procedures depeding on which library is imported. It would be more reasonable that the transfomer library also provides syntactic binding forms, or provide a intermediate macro to translate whatever system-internal transfomer API into the explicit transformer API:

(define-syntax foo
  (sc-transformer
    (lambda (expr)
      (syntax-case expr
        ....))))

I think MIT-scheme have something like this, and some other implementations too.

Probably I'll take this approach. My current plan is to provide an explicit renaming transformer as a basis, and build a syntax-case one on top of it. Alex Shinn's Chibi-Scheme does that, I believe. The only question is if I can get it fast enough, and that's what I need to implement it to see.

Tags: 0.9.1, macro, r6rs

2010/05/06

Import options: part two

In the previous article I introduced 0.9.1's import options. Now I describe how it is implemented currently.

How names are searched for in modules

Modules have two kinds of relations. Importing is the way to use one module from another module; importing-imported relationship forms a directed graph, possibly contains cycles. Inheriting is the way to extend existing module(s) to add something; like augumenting existing modules with new bindings, or making a single facade of a bunch of modules. Inheritance is handled the same way as the class inheritance, and forms a directed acyclic graph.

Importing is formed by import, and it is not transitive. If module X imports module Y which imports module Z, X only sees Y's exported bindings but not Z's. Inheritance is formed by extend, and it is transitive. (ref:import, ref:extend).

Module inheritance is less used than imports (except that all modules inherits gauche module by default), but it comes handy time to time. And implementing import options is one of the times inheritance comes handy unexpectedly.

Suppose you set up modules as follows:

(define-module P (extend Q R))
(define-module S (extend T U))
(define-module V (extend W X))

(define-module A
  (import V S P)
  (extend B C))

The following figure shows how a name used in module A is searched for . Red numbers are the order of the search. (It doesn't show the default inherited modules like gauche, scheme etc.)

[image]

Note that this search occurs at most twice per global name; once in compile time to see if the name has a syntactic binding, and another in the first time the code is run. Once the name is resolved, the result is cached and never be searched again.

Thus, although the modules are open, once the name is resolved you cannot insert shadowing binding into the module between the current one and the one the name is resolved in. This is a trade-off between speed and flexibility; if you want the new shadowing binding to be reflected, you can always reload the current module.

Supporting 'prefix'

A 'prefix' slot is added to a module. When a name is searched into the prefixed module, the prefix is stripped from the name first---if the name doesn't have the prefix, we can stop serching of that particular path on the imported module.

A prefix is attached by the importing module, so we cannot modify the imported module itself (Another module may import the same module with different prefix or without prefix at all). The 'prefix' import option creates an anonymous module that inherits the imported moudle, and import the created module instead. The created module behaves the same as the original imported module, except that it strips the prefix.

Supporting 'except'

except is also done with module inheritance, and another new feature, a negative binding. It is a special binding that answers "no, the name doesn't have a binding along this path" when a name is looked for in it.

So, the except option creates an anonymous module inheriting the imported module, and inserts the negative bindings of the names listed in the option. When one of the names are searched for, the search is gave up at this anonymous module. For other exported names, the search is continued to the ancestor modules and eventually gets a hit.

An except and a prefix can be combined to one anonymous modules. The difference of which one comes first can be reflected to the names inserted into the anonymous module: If prefix comes first, we insert the negative bindings with the name the prefix is stripped, since the search process strips the prefix before looking into this module. If except comes first, we insert the negative bindings with the name as they are.

Supporting 'only'

For the only option, we create an anonymous module that do not inherit anything, and inserts the bindings with the listed names from the imported module.

What we actually do is taking a gloc object which is a value of the module hash table keyed by a name, and registering the name and the gloc object as a new binding in the anonymous module. So the two bindings share the same gloc object. It allows the importing module to see whenever the original binding is modified by set!.

In effect, we create aliases to exising global bindings, although in only case the aliased name is the same, only visibility differs.

Supporting 'rename'

rename was the trickiest. Not only making a new names visible, but also it must make sure that importing modules won't see the original names before renaming. The interaction with prefix is also nasty, for we may want to see the renamed symbol with or without prefix, depending on which option comes first.

We create an anonymous module, inheriting the imported module. Then creates an alias binding in the same way as only, but using the renamed name instead of the original name. Next, we insert negative bindings to the original names, except the original names that are used as renamed names as well.

For example, when we have this crazy setting:

(import (M :rename ((kar kdr) (kdr kar) (kons snok))))

We insert the following bindings to the anonymous module:

  • an alias binding of kdr (which shares the gloc with the name kar in the module M)
  • an alias binding kar (which shares the gloc with the name kdr in the module M)
  • an alias binding 'snok (which shares the gloc with the name kons in the module M)
  • a negative binding of kons

Since it inherits the imported module M, searching all other names falls into M.

If prefix is added after the rename option, after-rename names also get prefix, so we can just add prefix to the anonymous module.

If prefix is added before the rename option, however, we have to make the after-rename names without prefix. So we need two anonymous modules, the first one for prefixing, and the second one with the renaming setup, inheriting the first one.

Tags: 0.9.1, import, module

2010/05/03

Import options: part one

A bit of background

Gauche's module system is based on STk's. What I liked about STk's module system was its simplicity and flexibility.

It's simple since all the forms don't have lots of options. Compare it with CL's defpackage---I always have to look up the manual whenever I write a new defpackage form.

It's flexible because modules are open---I can add definitions to existing modules later, or even alter the definition afterwards. Altering the definitions comes handy when you have to patch existing library, but you don't have permission to change the installed libraries, or don't want to risk affecting other programs by changing shared libraries. In your program you can just put some code like the following:

(with-module target-module
  (define (foo  ...)
    ...fixed definition..))

This (I think it is called monkey patching) isn't recommended in the final code to be shipped, but sometimes you have to step outside of "the Right Things" to fix the holes in emergency.

However, the system lacks some handy features like prefixing imported symbols. The limit gets in the programmer's way more often as we have more libraries, and the possibility to import from two modules that exports the same name grows.

I've been aware with this issue for long time, but procrastinating to implement it, for I want to avoid neither a complex beast that everybody have to look up manuals constantly, nor a half-baked solution that covers simple cases but falls short and nees to be reworked in practical situations. As R6RS was finalized, and it offered a module system with various options, that became my reference point in terms of implementing features.

In 0.9.1 the feature will be finally available. For a taster, the following code says load srfi-1 but only import iota and fold from it, prefixing them with srfi-1:.

(use srfi-1 :only (iota fold) :prefix srfi-1:)

(srfi-1:fold + 0 (srfi-1:iota 100 1)) => 5050

Challenge of open module

I eventually want to provide R6RS-compatible layer, so putting enough functionarity to support R6RS library form is one of the goals. However, R6RS specification poses some challenges to open module system like ours.

R6RS has a concept of import-set which is a set of names to import, and defines only, except, prefix and rename as operations on the sets; that is, each option takes a set of names, and returns a modified set of names. It is easy to explain, and straightforward to implement if you know all the names you are dealing with.

The if is the problem in our open modules, since we don't know the complete set of names when we process import forms. For example, except cannot be just a set-difference operation; if a new exported binding is added that is not listed in the except list later, that binding should become visible in the importing module.

I regret that I didn't discuss more on this during R6RS review process. Being simple to explain is a virtue, but the way of R6RS covers too broad space than necessary; for example, you can nest arbitrary number of prefix and rename forms. Suppose library lib exports names w, x, y, and z. If you import the library like the following, can you figure out what names you use to access to the original names?

(import (rename (prefix (rename (prefix (lib) n:)
                                (n:x y) (n:y x))
                        m:)
                (m:n:z z) (m:x y)))

If I understand R6RS correctly, it'll be like this:

imported original
y y
m:y x
z z
m:n:w w

It's hard to imagine when this kind of setting up is useful, but even if you really need this kind of multiple layering, I imagine it is doable by writing an intermediate module to remapping names. Probably nobody would want to write nested prefix/renames. But R6RS compatible implementations need to support them.

In the R6RS world, modules are closed, and the set of names is fixed when you process imports. So that's not a defect of R6RS per-se, but I like the designs more that encourages alternative views of the worlds, instead of putting hurdles to them.

Anyway, when we process import forms, we don't know yet the entire set of the names to import. We don't want to recalculate the set whenever a new exported binding is added to a module, or whenever we search for the imported module with those options.

So I employed a few tricks.

  • For prefixing, we ran operation in reverse. That is, when we search into an imported module with prefix, we strip the prefix from the searching symbol first and look for the stripped symbol in the module.
  • For only and inserted bindings by rename, the import form creates an anonymous intermediate module to which necessary names are injected. This is one-time cost at processing import and doesn't cost at symbol lookup.
  • For except and hidden binding by rename, we also use an intermedidate module that has a special shadow binding that prevents name searching further into the module's ancestors. This is also one-time cost at processing import and doesn't cost at symbol lookup.

The implementation is a bit more compilcated than I like, but it doesn't seem to have too much impact in performance. Espcecially, if you don't use prefix, overhead is negligible.

New import and use form

The import form is extended as follows:

<import-form> : (import <import-spec> ...)

<import-spec> : <module-name>
              | (<module-name> <import-option> ...)

<import-option> : :only (<symbol> ...)
                | :except (<symbol> ...)
                | :rename ((<symbol> <symbol>) ...)
                | :prefix <symbol>

<module-name> : <symbol>

use form is also extended to accept import options. You don't need extra parentheses, for use takes only one modules (note that import can take multiple modules, that's why we needed parens).

<use-form> : (use <module-name> <import-option> ...)

The option modifies imported symbols as the way it appears, so the order matters. The following two import forms are equivalent, both make iota available in the current module under the name srfi-1:iota.

   (use srfi-1 :only (iota) :prefix srfi-1:)
   (use srif-1 :prefix srfi-1: :only (srfi-1:iota))

In the latter form, symbols in :only option must be prefixed since they are already prefixed in the previous :prefix option.

I think it is a good idea to put :only and :except option always before :prefix, for less confusion.

On the other hand, you may need both orders of :rename and :prefix, depending on what you want. If you put :prefix clause after :rename, the renamed identifier gets prefix as well:

   (use srfi-1 :rename ((iota i)) :prefix srfi-1:)
   
   srfi-1:i => #<iota>
   srfi-1:fold => #<fold>

If you put :prefix first, you can import renamed symbols without prefix:

   (use srfi-1 :prefix srfi-1: :rename ((srfi-1:iota i)))
   
   i => #<iota>
   srfi-1:fold => #<fold>

The contrived complex imports above can be written in our syntax as follows, though I don't recommend it.

(use lib
     :prefix n:
     :rename ((n:x y) (n:y x))
     :prefix m:
     :rename ((m:n:z z) (m:x y)))

In the next entry, I'd like to explain how this is implemented in Gauche. Stay tuned.

Tags: 0.9.1, import, use, r6rs

More entries ...