Gauche Devlog

< (cdr '#1='#1#) => ? | div, mod, div0, mod0 >

2010/06/11

Constructing circular structures

Since I brought up circular structures in the last entry, I describe another related issue that hasn't solved yet.

Using #n= and #n# notation (srfi:38), you can give an external representation to a circular structure. When read in, you'll get a structure with the same topology as DAG. It is pretty handy in practice.

A simple way to reconstruct a circular structure is that you allocate the container first, and fills its content later; e.g. if you see #1=(#1#), you allocate a cons cell and register it with label '1' first; then when you encounter #1# you set-car! the pointer to the cons cell to the car of the cons cell.

As trivial as it sounds, this scheme doesn't play well with functional style, where once an object is constructed it is immutable. Actually it is more than a matter of style.

We have srfi:10 that allows user-defined structures to have a readable extrenal representation. Which is also pretty handy in practice, and I've used the feature in production works number of times (it's too bad that R6RS conflicts with srfi-10; srfi-10 preceded years before R6RS.)

Unfortunately, srfi-10 requires all the initialization arguments must be present before constructing the object. So, there seems no clear way to mix circular reference and srfi-10, such as the following:

  #1=#,(my-record (x y #1#))

Gauche has a very ad-hoc solution for this. In fact, it's too ugly that I haven't made it official, although it's been in the code for years. In the current implementation, when a reader encounters a shared reference that hasn't been realized (#1# in the above example), it creates a special read-reference object to pass to the constructor. When a constructor sees a read-reference object, it can register a callback procedure which will be called when the reference has a concrete value. The callback procedure then replaces read references for the given values.

I don't like this solution since it doesn't have enough abstraction barrier; if a user-registered read-time constructor doesn't register the callback, for example, the read object contains read-reference objects, which we don't want to expose to the rest of the program.

There are several ways to address this issue.

Implicit pointer swapping

This automates the callback process; if we can know all the pointers in the objects created by read-time constructor, we can trace the objects after all the shared labels are realized, and replaces pointers to the read references for pointers to the real objects.

This would be trivial with precise GC. With Gauche's conservative GC, however, we need to ask user-defined structures with read-time constructors to register pointer layout of the objects. Besides, the structure created by read can be arbitrary large, and I wonder if it can be efficient enough.

Implicit pointer forwarding

If we have something like implicit future, we can wrap the to-be-constructed shared reference by a future to pass to the constructor. Implicit means the existence of the future is transparent to the program; if the code tries to use the future as a value, the system implicitly returns wrapped value. (It somewhat resembles with implicit forcing of promises, but the realization of the value is done by the time read returns, without explicitly forcing it, so calling it a future seems more appropriate. Cf. http://www.ps.uni-saarland.de/alice/manual/futures.html ).

This is probably cleanest solution since side effects are nicely abstracted, and also more general, for implicit futures can be used in various other places.

The obstacle is that we have to insert checks to see whether the given Scheme object is a future everywhere. For Scheme programs it is not a big deal since VM handles it all, but we also have to consider C code that deal with ScmObjs. For primitive aggregate types such as pairs and vectors, C API exposes accessor macros so we can insert future check transparently; but other type of C structs that contains ScmObjs.... I'm not sure we can cover every cases.

Two-phase construction

This idea is to split read-time constructor to allocate phase and initialize phase, like CLOS. It encapsulates side effects into initialize method, which is good. Srfi-10 style read-time constructors can be understood as a combination of allocate and initialize, and we can raise an error if circular reference is involved in structures that don't offer separate allocate and initialize methods.

An issue is that there are structures that adopt single-phase construction protocol. Some built-in objects do, and also the record system (srfi:9, srfi:99) assumes it.

This would work if we can say all objects should be able to be constructed in two phases in Gauche; otherwise it would be very confusing.

★ ★ ★

I'm currently swinging between implicit future and two-phase construction. Is there any other ways?

Tags: srfi-10, srfi-38, srfi-9, srfi-99

Post a comment

Name: