2010/05/06
Import options: part two
In the previous article I introduced 0.9.1's import options. Now I describe how it is implemented currently.
How names are searched for in modules
Modules have two kinds of relations. Importing is the way to use one module from another module; importing-imported relationship forms a directed graph, possibly contains cycles. Inheriting is the way to extend existing module(s) to add something; like augumenting existing modules with new bindings, or making a single facade of a bunch of modules. Inheritance is handled the same way as the class inheritance, and forms a directed acyclic graph.
Importing is formed by import, and it is not transitive. If module X imports module Y which imports module Z, X only sees Y's exported bindings but not Z's. Inheritance is formed by extend, and it is transitive. (ref:import, ref:extend).
Module inheritance is less used than imports (except that all modules inherits gauche module by default), but it comes handy time to time. And implementing import options is one of the times inheritance comes handy unexpectedly.
Suppose you set up modules as follows:
(define-module P (extend Q R)) (define-module S (extend T U)) (define-module V (extend W X)) (define-module A (import V S P) (extend B C))
The following figure shows how a name used in module A is searched for . Red numbers are the order of the search. (It doesn't show the default inherited modules like gauche, scheme etc.)
Note that this search occurs at most twice per global name; once in compile time to see if the name has a syntactic binding, and another in the first time the code is run. Once the name is resolved, the result is cached and never be searched again.
Thus, although the modules are open, once the name is resolved you cannot insert shadowing binding into the module between the current one and the one the name is resolved in. This is a trade-off between speed and flexibility; if you want the new shadowing binding to be reflected, you can always reload the current module.
Supporting 'prefix'
A 'prefix' slot is added to a module. When a name is searched into the prefixed module, the prefix is stripped from the name first---if the name doesn't have the prefix, we can stop serching of that particular path on the imported module.
A prefix is attached by the importing module, so we cannot modify the imported module itself (Another module may import the same module with different prefix or without prefix at all). The 'prefix' import option creates an anonymous module that inherits the imported moudle, and import the created module instead. The created module behaves the same as the original imported module, except that it strips the prefix.
Supporting 'except'
except is also done with module inheritance, and another new feature, a negative binding. It is a special binding that answers "no, the name doesn't have a binding along this path" when a name is looked for in it.
So, the except option creates an anonymous module inheriting the imported module, and inserts the negative bindings of the names listed in the option. When one of the names are searched for, the search is gave up at this anonymous module. For other exported names, the search is continued to the ancestor modules and eventually gets a hit.
An except and a prefix can be combined to one anonymous modules. The difference of which one comes first can be reflected to the names inserted into the anonymous module: If prefix comes first, we insert the negative bindings with the name the prefix is stripped, since the search process strips the prefix before looking into this module. If except comes first, we insert the negative bindings with the name as they are.
Supporting 'only'
For the only option, we create an anonymous module that do not inherit anything, and inserts the bindings with the listed names from the imported module.
What we actually do is taking a gloc object which is a value of the module hash table keyed by a name, and registering the name and the gloc object as a new binding in the anonymous module. So the two bindings share the same gloc object. It allows the importing module to see whenever the original binding is modified by set!.
In effect, we create aliases to exising global bindings, although in only case the aliased name is the same, only visibility differs.
Supporting 'rename'
rename was the trickiest. Not only making a new names visible, but also it must make sure that importing modules won't see the original names before renaming. The interaction with prefix is also nasty, for we may want to see the renamed symbol with or without prefix, depending on which option comes first.
We create an anonymous module, inheriting the imported module. Then creates an alias binding in the same way as only, but using the renamed name instead of the original name. Next, we insert negative bindings to the original names, except the original names that are used as renamed names as well.
For example, when we have this crazy setting:
(import (M :rename ((kar kdr) (kdr kar) (kons snok))))
We insert the following bindings to the anonymous module:
- an alias binding of kdr (which shares the gloc with the name kar in the module M)
- an alias binding kar (which shares the gloc with the name kdr in the module M)
- an alias binding 'snok (which shares the gloc with the name kons in the module M)
- a negative binding of kons
Since it inherits the imported module M, searching all other names falls into M.
If prefix is added after the rename option, after-rename names also get prefix, so we can just add prefix to the anonymous module.
If prefix is added before the rename option, however, we have to make the after-rename names without prefix. So we need two anonymous modules, the first one for prefixing, and the second one with the renaming setup, inheriting the first one.
2010/05/03
Import options: part one
A bit of background
Gauche's module system is based on STk's. What I liked about STk's module system was its simplicity and flexibility.
It's simple since all the forms don't have lots of options. Compare it with CL's defpackage---I always have to look up the manual whenever I write a new defpackage form.
It's flexible because modules are open---I can add definitions to existing modules later, or even alter the definition afterwards. Altering the definitions comes handy when you have to patch existing library, but you don't have permission to change the installed libraries, or don't want to risk affecting other programs by changing shared libraries. In your program you can just put some code like the following:
(with-module target-module (define (foo ...) ...fixed definition..))
This (I think it is called monkey patching) isn't recommended in the final code to be shipped, but sometimes you have to step outside of "the Right Things" to fix the holes in emergency.
However, the system lacks some handy features like prefixing imported symbols. The limit gets in the programmer's way more often as we have more libraries, and the possibility to import from two modules that exports the same name grows.
I've been aware with this issue for long time, but procrastinating to implement it, for I want to avoid neither a complex beast that everybody have to look up manuals constantly, nor a half-baked solution that covers simple cases but falls short and nees to be reworked in practical situations. As R6RS was finalized, and it offered a module system with various options, that became my reference point in terms of implementing features.
In 0.9.1 the feature will be finally available. For a taster, the following code says load srfi-1 but only import iota and fold from it, prefixing them with srfi-1:.
(use srfi-1 :only (iota fold) :prefix srfi-1:) (srfi-1:fold + 0 (srfi-1:iota 100 1)) => 5050
Challenge of open module
I eventually want to provide R6RS-compatible layer, so putting enough functionarity to support R6RS library form is one of the goals. However, R6RS specification poses some challenges to open module system like ours.
R6RS has a concept of import-set which is a set of names to import, and defines only, except, prefix and rename as operations on the sets; that is, each option takes a set of names, and returns a modified set of names. It is easy to explain, and straightforward to implement if you know all the names you are dealing with.
The if is the problem in our open modules, since we don't know the complete set of names when we process import forms. For example, except cannot be just a set-difference operation; if a new exported binding is added that is not listed in the except list later, that binding should become visible in the importing module.
I regret that I didn't discuss more on this during R6RS review process. Being simple to explain is a virtue, but the way of R6RS covers too broad space than necessary; for example, you can nest arbitrary number of prefix and rename forms. Suppose library lib exports names w, x, y, and z. If you import the library like the following, can you figure out what names you use to access to the original names?
(import (rename (prefix (rename (prefix (lib) n:) (n:x y) (n:y x)) m:) (m:n:z z) (m:x y)))
If I understand R6RS correctly, it'll be like this:
imported | original |
y | y |
m:y | x |
z | z |
m:n:w | w |
It's hard to imagine when this kind of setting up is useful, but even if you really need this kind of multiple layering, I imagine it is doable by writing an intermediate module to remapping names. Probably nobody would want to write nested prefix/renames. But R6RS compatible implementations need to support them.
In the R6RS world, modules are closed, and the set of names is fixed when you process imports. So that's not a defect of R6RS per-se, but I like the designs more that encourages alternative views of the worlds, instead of putting hurdles to them.
Anyway, when we process import forms, we don't know yet the entire set of the names to import. We don't want to recalculate the set whenever a new exported binding is added to a module, or whenever we search for the imported module with those options.
So I employed a few tricks.
- For prefixing, we ran operation in reverse. That is, when we search into an imported module with prefix, we strip the prefix from the searching symbol first and look for the stripped symbol in the module.
- For only and inserted bindings by rename, the import form creates an anonymous intermediate module to which necessary names are injected. This is one-time cost at processing import and doesn't cost at symbol lookup.
- For except and hidden binding by rename, we also use an intermedidate module that has a special shadow binding that prevents name searching further into the module's ancestors. This is also one-time cost at processing import and doesn't cost at symbol lookup.
The implementation is a bit more compilcated than I like, but it doesn't seem to have too much impact in performance. Espcecially, if you don't use prefix, overhead is negligible.
New import and use form
The import form is extended as follows:
<import-form> : (import <import-spec> ...) <import-spec> : <module-name> | (<module-name> <import-option> ...) <import-option> : :only (<symbol> ...) | :except (<symbol> ...) | :rename ((<symbol> <symbol>) ...) | :prefix <symbol> <module-name> : <symbol>
use form is also extended to accept import options. You don't need extra parentheses, for use takes only one modules (note that import can take multiple modules, that's why we needed parens).
<use-form> : (use <module-name> <import-option> ...)
The option modifies imported symbols as the way it appears, so the order matters. The following two import forms are equivalent, both make iota available in the current module under the name srfi-1:iota.
(use srfi-1 :only (iota) :prefix srfi-1:)
(use srif-1 :prefix srfi-1: :only (srfi-1:iota))
In the latter form, symbols in :only option must be prefixed since they are already prefixed in the previous :prefix option.
I think it is a good idea to put :only and :except option always before :prefix, for less confusion.
On the other hand, you may need both orders of :rename and :prefix, depending on what you want. If you put :prefix clause after :rename, the renamed identifier gets prefix as well:
(use srfi-1 :rename ((iota i)) :prefix srfi-1:) srfi-1:i => #<iota> srfi-1:fold => #<fold>
If you put :prefix first, you can import renamed symbols without prefix:
(use srfi-1 :prefix srfi-1: :rename ((srfi-1:iota i))) i => #<iota> srfi-1:fold => #<fold>
The contrived complex imports above can be written in our syntax as follows, though I don't recommend it.
(use lib :prefix n: :rename ((n:x y) (n:y x)) :prefix m: :rename ((m:n:z z) (m:x y)))
In the next entry, I'd like to explain how this is implemented in Gauche. Stay tuned.
2010/05/01
Records and util.match
This is not a new feature for 0.9.1; it's been there for long time. But I rediscover it today with a pleasant surprise, so it may be worth to note about it, for I expect to use records more, as srfi-99 is supported in 0.9.1 (cf. ERR5RS Records and beyond).
The pattern matcher util.match can be used to match records. The feature already existed in the original Andrew Wright's match, and I believe it's available for many other Scheme implementations.
(use gauche.record) (use util.match) (use math.const) ; for pi (define-record-type circle #t #t radius) (define-record-type rectangle #t #t width height) (define (area shape) (match shape [($ circle r) (* pi r r)] [($ rectangle w h) (* w h)])) gosh> (area (make-circle 5)) 78.53981633974483 gosh> (area (make-rectangle 4 8)) 32
($ class var ...) matches the instance of class, with binding slot values to var ... in order.
This pattern can be used for any instances, nevertheless I haven't used it much before. I care less about the order of slots when I think in terms of classes. Slots are always accessed by thier name, and I freely change the order of slots in class definitions, so it's a bit cumbersome to have patterns that depend on the order of slots.
However, records' slots feel more positional---maybe because their default constructor takes positional arguments to initialize slots. Also their emphasis of immutability reminds me the style of other functional languages.
If I find myself using this pattern more and more, it'll be worth to optimize it. Currently it looks up n-th slot name in the list of slots of the matching object, then uses slot-ref to obtain a value. For general objects we need to do that, since associations of the position and the slot can be changed by class redefinition. But for records we don't need to worry about class redefinition, and internally it's much faster to access records' slot via its position instead of its name.
Tags: gauche.record, srfi-99, util.match
2010/04/30
Extended formals
One thing I miss most when I hop back to Scheme from CL is CL's lambda list for optional and keyword arguments. I feel the CL's spec is too complicated for my taste (CLHS specifies 10 different kind of lambda lists), nevertheless I know it is useful.
Gauche has been providing argument parsing utility macros, let-optionals*, let-keywords and let-keywords* (ref:let-optionals*, ref:let-keywords*). These cover enough functionalities to deal with optional and keyword arguments. Yet they are different from being able to specify those arguments directly in the formals. One thing is that those macros makes code longer. Another thing is that I feel specifying them directly in the formals somewhat makes them more a part of the public contract of the procedure. They stand out in the source code, claiming that the procedure takes two required argments and three optional arguments, something like that. It makes easier to read the source. Which is precious.
So I've been secretly experimenting the CL-like extended formal list for almost two years. Now I'm convinced that it has enough advantages to be in officially.
Actually, there's a SRFI for extended formals (srfi:89). It introduces new forms, define* and lambda*, that recognize extended syntax for optional and named (keyword) arguments.
Providing new forms is a polite way---it leaves original Scheme intact, and won't step on existing code accidentally. It is highly desirable for a portable library, and understandable that srfi-89 took that path.
However, for this feature, I rather opted to extend define and lambda. Support of extended formals is an upper compatible change (I mean, even with this extention, proper R5RS programs runs just fine), and having different forms only to be polite makes the language unnecessarily complex. Another important advantage is that, by extending existing forms, extended formals will be available for the macros that expands into defines or lambdas. If we were to have different forms, we'd need to change such macros around to make the extended feature available.
This is a kind of decision highly depends on the target of the implementation. I think it is bad to conflate standard syntax if the implementation is for education; students could confuse the language itself and the implementation's specifics. But that's not the target of Gauche.
For what's worth, this extension is attached to define and lambda of the gauche module. The original semantics is still kept if you import null module (corresponds to (null-environment)) or scheme module (corresponds to (scheme-report-enviornment 5)). In other words, Gauche's lambda has different syntactic binding from R5RS's lambda.
★ ★ ★
Gauche's extended formal syntax is similar to Common Lisp's, but I use keywords :optional, :key, :allow-other-keys and :rest instead of CL's &optional, &key, &allow-other-keys and &rest. I saw no point adding extra reserved symbols.
(define (foo x :optional (y 0) (z 1)) (list x y z)) gosh> (foo 9) (9 0 1) gosh> (foo 9 10) (9 10 1) gosh> (foo 9 10 11) (9 10 11) gosh> (foo 9 10 11 12) *** ERROR: too many arguments for (lambda (x :optional (y 0) (z 1)) (list x y z))
(define (foo x :key (y 0) (z 1)) (list x y z)) gosh> (foo 9 :z -1) (9 0 -1) gosh> (foo 9 :z -1 :zz 3) *** ERROR: unknown keyword :zz
(define (foo :key (x 0) :allow-other-keys) x) gosh> (foo :z 9) 0 gosh> (foo :x 8 :y 9) 8
If no default value is given, the variable is bound to #<undef>. (which can be tested with undefined?, but that hasn't been documented. I'll make it public in 0.9.1, too.)
(define (foo :key x) x) gosh> (foo) #<undef>
#<undef> is first-class value, so we can't be sure if the argument isn't provided, or the argument is provided but its value happens to be #<undef>. CL solves this problem by allowing extra parameter, supplied-p-parameter, that binds to a boolean value indicating whether the argument is provided. Gauche doesn't support that feature yet.
Internally, these lambdas with extended formals are expanded into the base lambdas and let-optionals*/let-keywords*.
One twist I added is an optional parameter after :allow-other-keys.
(define (foo :key x y :allow-other-keys others) (list x y others))
The parameter others is bound to a keyword-value list that didn't consumed by :key parameters.
gosh> (foo :w 0 :x 1 :y 2 :z 3) (1 2 (:z 3 :w 0))
It is handy to a procedure that wraps another procedure, and that wants to filter out whatever extra keyword argument it gets.
★ ★ ★
After implementing extended formal list, I rewrote argument parsing macros in my code with this feature. Interestingly, I found let-keywords* etc. weren't obsoleted competely by extended formals. They are still useful when you factor out common option processing:
(define (some-api x y z . options) (check-common-options options) ...) (define (another-api a b . options) (check-common-options options) ...) (define (check-common-options options) (let-keywords* options ((key1 init1) (key2 init2) ...) ...))
In CL, I would use destructuring-bind. But lack of that, it has a merit to have argument parsing feature separately from lambda syntax.
2010/04/29
Regexp read-write invariance
This is one of those things that appear trivial, but when you try to implement them you find a vine is attached that leads to unexpected places.
Gauche adopts a dedicated syntax for regexp. #/regexp/ is read as a literal regexp, that is, a regexp object is contructed by the reader. Then it is only natural that, when a regexp object is written out, it is represented as #/regexp/. In Gauche 0.9 and before, however, it is not guaranteed for a regexp object to have external representation. Such regexp objects are written as #<regexp>.
You may not have noticed that there are such regexps, since regexp objects created by the normal means have been written out in regexp syntax.
gosh> #/(\d\d):(\d\d):(\d\d)/ #/(\d\d):(\d\d):(\d\d)/ gosh> (string->regexp "(\\d\\d):(\\d\\d):(\\d\\d)") #/(\d\d):(\d\d):(\d\d)/
This is done trivially by saving the source string in a regexp object. You can access this string by regexp->string.
gosh> (regexp->string #/(\d\d):(\d\d):(\d\d)/) "(\\d\\d):(\\d\\d):(\\d\\d)"
However, there's an undocumented way to create a regexp object procedurally, without composing a string representation. And some of Gauche internals have used that interface, yielding unprintable regexps.
The construction of regexp objects is not an atomic blackbox, but handled in three steps: regexp-parse parses a string representation of regexp and creates an S-expression representing the AST. regexp-optimize transforms an AST, doing some rudimental optimizations. And finally regexp-compile takes an AST and generates an regexp object, which contains a VM instruction sequence that executes NFA of the regexp. (This VM is a specialized one for regexp, not the same one as Gauche's main VM. I'm musing a vague idea to integrate them for some time.)
If a regexp object is created directly from S-expression, it doesn't have a source string, hence it doesn't have an external representation in 0.9.
I finally hit an occasion that I was annoyed with unprintable regexp, and ended up making a procedure regexp-unparse, that takes AST and constructs a string representation of regrexp. This close the circle. Now regexp->string checks if the regexp object has a source string, and if not, it calls regexp-unparse to construct one. Another API regexp-ast returns an AST of the given regexp object.
★ ★ ★
The original plan was to provide alternative ways to denote regexps, specifically, SRE was one of those alternative syntax I had in my mind. The ways to plug in more sophisiticated regexp optimizer written in Scheme was another plan.
The current AST format is something I hacked up quickly and I've been hoping to come back to redesign it. That's why I've procrastinated documenting regexp-parse etc.
But after finishing regexp-unparse and looking back, it seems to me that the current AST is good enough, although not perfect. Providing the API set to manipulate regexps may open up interesting possibilities for users. So I decided to make it public in 0.9.1.
I'm working on the documentation, but here's a summary of regexp "under-the-hood" API.
- regexp-parse string :key case-fold => ast
- regexp-optimize ast => ast
- regexp-compile ast => regexp
- regexp-ast regexp => ast
- regexp-unparse ast :key (on-error :error) => string
Comments (0)