Gauche Devlog

2010/12/11

New directory structure

Until 0.9, you had to recompile Gauche extensions for every new release of Gauche, even if it was a micro version up (i.e. 0.8.12 -> 0.8.13). It was because C API could be changed between them.

After 1.0, it is our goal to keep API & ABI compatibility at least for minor version level, so that extension modules compiled for version X.Y.z1 will work for X.Y.z2 where z2 >= z1. 0.9.x series is the run-through rehearsal for the stable 1.0 release, to see our scheme really works after 1.0.

At 0.9 release, however, I overlooked one thing: Full Gauche version number (major.minor.micro) was embedded in the pathnames where architecture-depended files are installed; extensions' DSOs are in /usr/lib/gauche/site/X.Y.Z/${arch}/ (if Gauche was installed with --prefix=/usr.) This wouldn't work if we want to share extensions among different micro versions.

An easy way could be to keep using "0.9" throughout 0.9.x series for the 'site' directory. That is, while Gauche "core" files would still go to /usr/lib/gauche/X.Y.Z/${arch}/, extension modules install their files to /usr/lib/gauche/site/X.Y/${arch}/. Version 0.9's structure already fit this scheme, so transition would be smooth.

However, this scheme somewhat mixed gauche version (X.Y.Z) and ABI version (X.Y). That is, in /usr/lib/gauche/0.9/, the "0.9" part would mean gauche version, while in /usr/lib/gauche/site/0.9/, the "0.9" part would mean ABI version. It might not matter in practice, but I didn't quite like such mixture.

And there came another issue: The Gauche runtime DSO (libgauche.so) is placed in the system's common library directory, such as /usr/lib. It is common to append versions after the name (e.g. libgauche.so.0 and libgauche.so.0.9) and make versionless name a symlink to the versioned name. However, the common assumption for this scheme is that binary compatibility is kept among the same major versions; if something is compiled with libgauche.so.1.0, it is expected to work with libgauche.so.1.1 as well. At this moment I want to reserve the possibility to change ABI between Gauche version 1.0 and 1.1.

So, I decided to name the Gauche runtime DSO as libgauche-X.Y.so, where X.Y is the ABI version. Then it is clear that which version is compatible to which DSO. It also allows to install more than one versions of Gauche with different ABI versions.

If we name DSO as libgauche-X.Y, then why don't we name other runtime libraries as well? That is, we can install stuff under /usr/lib/gauche-X.Y/ and /usr/share/gauche-X.Y. Then it is clear that which ABI version the file(s) are for, and it is easier to manage when you have multiple versions of Gauche with different ABI versions (e.g. it's easy to delete files for old versions).

So, from 0.9.1, I adopt the following naming scheme:

  • Gauche runtime DSO
    • ${exec_prefix}/libgauche-X.Y.so
    • ${exec_prefix}/libgauche-X.Y.so.0 (soname)
    • ${exec_prefix}/libgauche-X.Y.so.0.Z (realname)
  • Architecture dependent files
    • ${exec_prefix}/gauche-X.Y/X.Y.Z/${arch}/* ;; core's *.so
    • ${exec_prefix}/gauche-X.Y/X.Y.Z/include/* ;; core's *.h
    • ${exec_prefix}/gauche-X.Y/site/${arch}/* ;; extensions' *.so
    • ${exec_prefix}/gauche-X.Y/site/include/* ;; extensions' *.h
  • Architecture independent files
    • ${datadir}/gauche-X.Y/X.Y.Z/* ;; core's *.scm
    • ${datadir}/gauche-X.Y/site/* ;; extensions' *.scm

During 0.9.x period, the old versionless directories are automatically added to the library search path, and symlink are created from libgauche.so etc. to the versioned DSOs. So, the extension modules you've installed for Gauche 0.9 should keep working after you install 0.9.1.

(Note for those who are chasing development trunk: If you've installed extensions for 0.9.1_pre1 or 0.9.1_pre2, the official 0.9.1 release may not be able to find them, since I tweaked structure at the last minute. Sorry.)

Tags: 0.9.1, DirectoryStructure

2010/12/06

To quote or not to quote

I've been bitten by this twice, so I write it down to avoid another bite.

Short summary: The popular way to pass compiler command-line options by command/parameter substitution does not work with arguments including whitespaces.

Here I'm talking about the typical Makefile idioms such as the following:

  gcc `gauche-config -I` ...

Or like this:

  CFLAGS=`gauche-config -L`
  gcc $(CFLAGS) ...

The commands gauche-config -I and gauche-config -L produce the -I and -L flag(s) to give to the compiler, respectively. Typically there's one -I flag, but there may be more than one -L flags. So it should be interspersed into the command line. That is, we can't quote outside of substitution like this:

  gcc "`gauche-config -L`"

On Windows, Gauche may be installed under a path that contains whitespaces. In fact, Gauche Windows installer uses C:\Program Files\Gauche as the default. To pass such pathnames to -I and -L options, each option must already be quoted right after substitution.

So, initially I naively changed the output of gauche-config to quote the pathname:

  -I"c:\Program Files\Gauche\lib\gauche-0.9\0.9.1_pre2\include"

Here came a twist. The modern way to compile extension modules is to use gauche-package compile command, which takes care of gory details and makes Makefile simpler; you just list the source files and the script takes care of compiling and linking, with proper options. Internally it uses gauche.config module to obtain the same information as output of gauche-config -I and gauche-config -L.

I used Windows installer to install Gauche under C:\Program Files\ and compiled Gauche-gl using it with MinGW/MSYS. Everything worked smoothly. I was satisfied.

Then, a few days later I was testing 0.9.1 prerelease on my Linux box, and found some extension modules didn't compile. Their makefile didn't use gauche-package, but directly invoked gcc with `guache-config -I` for the arguments.

I remembered I had been tripped with the same problem a few years ago and had given up. This time I wanted to solve it once for all. I thought my quoting scheme was wrong, and fiddled with gauche-config output for some time. I couldn't managed it to work. I carefully read the man page of bash. And leaned this:

  • Quote processing is done before command/parameter substitution.
  • The result of command/parameter substitution is subject of word splitting, unless the argument (before substitution) is quoted.
  • Word splitting honor neither quotes, nor escaping (such as backslash). - it is simple string splitting with the characters specified by $IFS.

This means that, as far as we use shell-level command-line substitution, the output of gauche-config cannot contain whitespaces inside each argument. Tools using the same scheme, such as pkg-config, have the same limitation, and in fact it is documented in pkg-config manual.

If we can have intermediate step to preprocess the command line, and passes the quoted pathnames to the shell, it works. It is what gauche-package does. An alternative way could be to invoke shell within Makefile, and let make consturct the command line:

  INCLUDES = $(shell gauche-config -I)
  gcc $(INCLUDES) ...

With this, shell sees already expanded options, so it can process quotes correctly.

However, there may be a case that an extention can't use gauche-package compile because it requires special build process, and also it can't use GNU make. For the backward compatibility, I keep gauche-config -I and gauche-config -L not to quote pathnames. Hence, they are inherently unsafe way to construct a command line.

So, what should be the proper way for the extension makefile and gauche-config to handle pathnames with spaces? I don't know yet. For the time being, I added --incdirs and --archdirs options to gauche-config. They return pathnames separated by colon (or semincolon on Windows), and gauche-package constructs command line arguments from them. I'm not satisfied with it, though.

Tags: extensions, makefile, gauche-config, gauche-package

2010/11/19

Some improvements of constant propagation

Most of this feature was implemented long time ago (December 2009 - April 2010) but have no appropriate place to describe except in release notes, which haven't yet come. It is probably worth to mention it here.

A constant propagation code in the compiler was overhauled and now it can precompute a lot wider range of the code. It can recognize constant bindings defined by define-constant, and many built-in functions are precomputed if it is side-effect free and arguments are constant expressions.

For example, in the following code (logior ...) is precomputed at the compile time and becomes a constant value, as most C programmers would expect:

(define-constant *flag-0* (ash 1 0))
(define-constant *flag-3* (ash 1 3))

(func (logior *flag-0* *flag-3*))

(Note that macros can't save this case, since a macro can only see the code it contains; it can't know whether *flag-0* is constant binding, if it is defined outside of the macro, unless you put all the constant bindings to macro expansion phase, which has its own inconvenience.)

If you are unsure, you can use disasm to check.

gosh> (disasm (lambda () (func (logior *flag-0* *flag-3*))))
CLOSURE #<closure #f>
main_code (name=#f, code=0x20f7820, size=4, const=1, stack=4):
args: #f
     0 CONSTI-PUSH(9) 
     1 GREF-TAIL-CALL(1) #<identifier user#func>; (func (logior *flag-0* *flag-3*))
     3 RET 

Precomputation isn't limited to numeric computations.

gosh> (define-constant *data* '(#(a b c) #(d e f)))
*data*
gosh> (disasm (lambda () (vector-ref (cadr *data*) 1)))
CLOSURE #<closure #f>
main_code (name=#f, code=0x1024b70, size=2, const=1, stack=0):
args: #f
     0 CONST-RET e

Currently only built-in SUBRs (procedures directly implemented in C) are subject of precomputation. Note that some elementary functions are defined in Scheme to handle complex numbers, on top of real-only SUBR version.

gosh> (use math.const)
#<undef>

;; this doesn't fold constants
gosh> (disasm (lambda () (func (/ (sqrt pi) 2))))
CLOSURE #<closure #f>
main_code (name=#f, code=0x21d4ea0, size=12, const=3, stack=11):
args: #f
     0 PRE-CALL(1) 6
     2 CONST-PUSH 3.141592653589793
     4 GREF-CALL(1) #<identifier user#sqrt>; (sqrt pi)
     6 PUSH 
     7 CONSTI(2) 
     8 NUMDIV2                  ; (/ (sqrt pi) 2)
     9 PUSH-GREF-TAIL-CALL(1) #<identifier user#func>; (func (/ (sqrt pi) 2))
    11 RET 

;; ... but this does:
gosh> (disasm (lambda () (func (/ (%sqrt pi) 2))))
CLOSURE #<closure #f>
main_code (name=#f, code=0x21cf540, size=5, const=2, stack=4):
args: #f
     0 CONST-PUSH 0.8862269254527579
     2 GREF-TAIL-CALL(1) #<identifier user#func>; (func (/ (%sqrt pi) 2))
     4 RET 

I hope this restriction is removed soon, since at least for standard functions, programmer shouldn't care if they are implemented in Scheme or as SUBRs. I'm not sure I can do that by 0.9.1 release, though. Anyway, most of the time you don't need to care, but when you're writing a performance sensitive code it may worth to play with disasm a bit to find out what the compiler can do for you.

(Added 2010/11/20 20:03:14 UTC): Oh, by the way, constant folding is done only when the bindings of built-in procedures haven't been altered at the time of compilation. This is a subtle issue; if you value dynamism, that you can change any parts of the system anytime, then you might expect that the argument of func should be recalculated when you redefine / or %sqrt some later time. R6RS made library modules closed, that is, you cannot alter its exported bindings afterwards. That's a possible solution. I like Gauche to be a bit more flexible, though. I think it is reasonable to expect programmers to consider the risk of overriding existing bindings. Yes you can do that, but if you do so, do it boldly. At the beginning of the module write redefinitions explicitly and prominently so that not only the compilers but the readers of your program will also notice you're attempting some trick.

★ ★ ★

This precomputation once caused an interesting event. I had a function like the following, which worked fine in 0.9 but stopped working after I put the new constant folding code.

(define (fn ...)
  (define tree '(root))

  (define (populate! node)
    ... a code that destcructively modifies cdr of node ...)

  (populate! tree)
  (cdr tree))

I represented a tree in which each node was something like (<name> <node> ...). The variable tree began with an empty root node, the populate! function grew the tree, and finally fn returned a list of root's children.

The above code is buggy, since it destructively modifies a literal list. Gauche doesn't check mutation on a literal pairs, since doing so would be costly.

On 0.9 the code seemed working, since fn was actually called only once during program execution. So mutated literal won't affect other parts.

With the improved constant folding, the compiler deduced that the return value of fn was a cdr of constant, thus computable at compile time. It eliminated all the code in fn as dead code and made fn a constant function that merely returned ().

Ultimately the compiler should warn if it can detect the code may mutate a constant. Doing so generally is difficult, but there may be some obvious cases easy to catch.

★ ★ ★

Actually, the real fun begins when this extended constant folding meets procedure inlining. Then there are some interesting code transformation happening; such that conditional expressions can be eliminated because the compile knows the condition always be satisfied (or never satisified).

Unfortunately inlining feature is still picky and I hesitate to expose it publicly yet. I hope I can write about it soon.

Tag: compiler

2010/11/16

Partial continuations and fun of stack fiddling

The 0.9.1 release will officially support partial continuations natively. The feature was added six months ago, but I finally managed to document it.

Partial continuations have been used in the continuation-based web application framework Kahua. It had a makeshift implementation of shift and reset in terms of full continuations, as shown in Gasbichler and Sperber's ICFP02 paper ( http://citeseer.ist.psu.edu/gasbichler02final.html ).

The implementation needs to copy full stack for every reset and shift, but it just worked, and it didn't seem to be a significant bottleneck. So I wasn't bothered to implement partial continuations natively.

It had come from unexpected corner. It was just a byproduct of something else.

In May I was working on a small project using Gauche, and it seemed I could write it neatly if I passed continuations among threads back and forth.

The implementation then, however, did not allow me to do that; a full continuation captured by call/cc grabs the chain of activation records; most of the elements in it were independent from the thread and the VM it was executed in. I had made so, expecting some day I'd want to invoke a continuation in different thread. However, there were a pitfall: When Scheme code is invoked from C routine, a reference to the C stack frame is recorded in the first activation record of VM. It is used to prevent the Scheme code from invoking a continuation which had become obsolete (since the C stack, on top of which the continuation was captured, could have been popped at the time the continuation was invoked.)

★ ★ ★

I tend to lose track while thinking about continuations, so let me go over my thoughts step by step. This is the initial state: a C stack ('C') at the bottom.

  C

A C routine calls a Scheme routine, and the Scheme routine calls other Scheme routines, pushing Scheme stacks ('S') into the chain of activation records.

  C <- S <- S <- S

Sometimes a Scheme routine calls back to C routine, which may in turn calls other Scheme routines. So the chain of activation records may look like this:

  C <- S <- S <- S <- C <- S <- S <- S

When call/cc is called, it creates a continuation object that keeps the pointer to this chain.

  C <- S <- S <- S <- C <- S <- S <- S
                                     ^
                               #<continuation>

When the continuation is invoked, the VM will restore this chain to the VM stack (actually, we copy activation records when a continuation is captured, but when it is invoked we directly refer activation records on heap. So there's no copy back. But that's an implementation detail.)

Now, suppose we return from the Scheme routines to the intermediate C callback. From C's point of view, it looks like it is returned from one of several APIs such as Scm_Eval.

  C <- S <- S <- S <- C

We further returns from this intermediate C frame:

  C <- S <- S

What happens if we invoke the continuation captured above here? VM can restore the chain of Scheme activation frames, since we have total control over VM stack frames. However, we cannot restore C stack frame... we can't save C stack frame totaly portably. There's a nasty technique to save C stack frames (SCM does that) but even using that, many C routines are not written in a way that function calls return more than once.

So, Gauche used to raise an error when an invoked continuation contained a reference to a C stack frame that was not in the current executing environment.

This, off course, prevented a continuation captured on one thread to be invoked on another thread, since C stack frames of the latter were totally different from the former.

However, this was unnecessarily strict limitation. What we wanted to prevent was to return to an obsoleted C stack. Even a continuation contained a reference to an obsoleted C stack, if we would never returned to it---e.g. by invoking another continuation before returning to the obsoleted C stack---we could safely execute the continuation.

  C <- S <- S

1) Captures a continuation (*)

  C <- S <- S
            *

2) Calls the continuation captured far above.  It has
 a reference to an obsoleted C stack frame (@)

  C <- S <- S <- S <- C <- S <- S <- S 
            *         @

3) Returns from Scheme routines:

  C <- S <- S <- S <- C <- S
                      @

4) Before returning to the obsoleted C stack frame, invokes
 the continuation captured by 1.  It is safe.

  C <- S <- S
            *

4') If we don't invoke the continuation(*) and the Scheme
 routine returns to the obsoleted C stack, we raise an error.

  C <- S <- S <- S <- C
            *         @

So, instead of raising an error at the step 2, we just allowed to invoke the continuation, and only checks the validity of C stack frames when we return from Scheme to C.

★ ★ ★

While pondering about this change, an idea occurred to me. What if we just never record references to the C stack frames? Instead, the first Scheme frame directly called from C routine can just have dangling end of activation record chain (indicated by 'o' below).

  o <- S <- S <- S

If a Scheme routine calls a C routine and it calls back to a Scheme routine, we make another chain.

  o <- S <- S <- S   (C-routine)   o <- S <- S <- S

The intermediate C routine knows which Scheme routine it should return to, so we can return normally even the activation chain itself isn't continuous.

This idea didn't replace the existing continuations because of other issues after all. But I realized that if we call the above (C-routine) as reset, we just get a partial continuation.

 Entering reset form:

  o <- S <- S <- S <- reset

 Executing some more expressions, then calls (shift k ...)

  o <- S <- S <- S <- reset  o <- S <- S <- S

 We capture a partial coninuation bound to k.

                             o <- S <- S <- S <- k

The actual implementation was a bit more involved, but in a couple of days we had both continuations that can be passed to another thread, and a native support of partial continuations.

Executing the amb example in Gasbichler&Sperber's paper is 3.5 times faster in this native shift/reset compared to the version emulating them with full continuations.

Tags: Continuation, shift, reset

2010/11/12

Exact sqrt

Several months ago one of our users asked me why (sqrt 4) yields 2.0; if the argument is an exact square number, should the result be exact as well?

Yes, it should, although coercing the result to inexact numbers is allowed in R5RS, I believe. It was just as it was because I was lazy: "Fix it when it becomes a problem." It seemed that the time had come.

It turned out to be an interesting little exercise to implement a 'proper' sqrt that returns an exact result when the input is a square of an exact number. That includes rational numbers. Maybe it's a nice quiz for CS freshmen. If you haven't implemented one, just think how you'll do it.

Here are some results in the current Gauche (the long digit numbers are wrapped around):

gosh> (sqrt 144)
12
gosh> (sqrt 169/81)
13/9
gosh> (sqrt 340282366920938463463374607431768211456)
18446744073709551616
gosh> (sqrt 32317006071311007300714876688669951960444102669
71548403213034542752465513886789089319720141152291346368871
79609218980194941195591504909210950881523864482831206308773
67300996091750197750389652106796057638384067568276792218642
61975616183809433847617047058164585203630504288757589154106
58086075523991239303855219143333896683424206849747865645694
94856176035326322058077805659331026192708460314150258592864
17711672594360371846185735759835115230164590440369761323328
72312271256847108202097251571017269313234696785425806566979
35045997268352998638215525166389437335543602135433229604645
318478604952148193555853611059596230656)
17976931348623159077293051907890247336179769789423065727343
00811577326758055009631327084773224075360211201138798713933
57658789768814416622492847430639474124377767893424865485276
30221960124609411945308295208500576883815068234246288147391
31105408272371633505106845862982399472459384797163048353563
29624224137216

Let's forget about non-integer rationals and think about exact integers. I ended up handling three cases depending on the range of the input.

  • If the input is less than 2^52, the integer can be represented exactly in IEEE double floating point number, and if FP sqrt yields an integer we know it's the result. FP sqrt which is fast on the modern hardware. (Edit: This part is corrected on 2013/03/20 02:14:15 UTC, thanks to Mark H Weaver. See this entry.)
  • Otherwise we run Newton-Rhapson with exact arithmetic.
    • If the input is representable by IEEE double (e.g. up to approx. 2.0^1024), we can get an approximated result by using FP sqrt, and we use an integer close to it as the initial approximation.
    • Otherwise we use 2^floor((log2(input)+1)/2) as the initial approximation. That's because log2(input) may be calculated quickly.

Other than utilizing FP sqrt, this is pretty straightforward. If you know clever implementation of exact integer sqrt, let me know.

For exact rationals, I naively apply exact integer sqrt to both its numerator and its denominator. I guess that's faster than running Newton-Rhapson directly with exact rationals, for Gauche's rational artihmetic isn't optimized at all.

Oh, btw, R6RS's exact-integer-sqrt also returns the remainder if the input is not an exact square of an integer. That adds another little fun to the exercise.

Tags: sqrt, exact-integer-sqrt

More entries ...