2010/12/16
Looking for alternative read-time constructor syntax
I rely on the srfi-10 read-time constructor #,(tag datum ...) a lot.
It does have a dark corner (there are no clear semantics to
ensure that a particular tag is available at the time
it is read), but having a uniform syntax and a standard way
to extend it is indispensable for practical applications.
So, it is very unfortunate that R6RS made an incompatible choice.
#,X is taken for (unsyntax X).
Although I don't plan to make Gauche fully conform to R6RS,
I'd like to make it compatible with R6RS as much as possible,
and it is desirable to be able to read R6RS code.
The plan is to switch the reader: In R6RS mode, #, is for
unsyntax. Otherwise, #, is
for srfi-10. After all, you can write unsyntax without using
abbreviation, but you cannot write read-time constructor
in other ways.
However, there will be a time when someone wants to write abbreviated unsyntax and a read-time constructor in one file. It won't hurt to have an alternative read-time constructor syntax for more flexibility.
Specifically, I'm thinking of making records (gauche.record)
printable by default, just like Common Lisp's structs,
but it would be better to use the existing srfi-10 syntax
instead of inventing a new one. If records have
a standard external representation,
I expect the srfi-10 syntax to appear in data and code a lot more
frequently. If Gauche adopts syntax-case, the demand
for abbreviated unsyntax will also grow. I see
a potential for conflict here.
★ ★ ★
What would be a good choice for an alternative read-time constructor syntax? I don't have a concrete idea yet. I just record some ideas here for future reference and discussion.
- #.(tag datum ...): Borrows from the read-time eval syntax of Common Lisp. I bet the chance that a Scheme standard adopts read-time evaluation is a lot smaller than the chance that it adopts read-time constructors: the former opens a big can of worms about which environment the expression should be evaluated in. The similarity between read-time evaluation and read-time construction, however, could lead to more confusion than the other choices.
- #!ctor(tag datum ...): The ctor word can be different. This is valid syntax in R6RS, in which the #!ctor part is just treated as a comment and the whole expression is read as a list. I'm not sure whether that is a good thing or not, though. It is also more verbose than the other choices.
- #!(tag datum ...): Some implementations (and past RnRS's) use #! as a prefix for special data, e.g. #!null. This choice can be seen as an extension of that. A disadvantage: if this appears at the top of a file, it can be mistaken for an interpreter line.
- #@(tag datum ...): The character @ is kind of arbitrary. ChezScheme uses this prefix for fasl objects. It gives me a sort of "internal representation" feeling. Maybe too arbitrary.
- #$(name datum ...): I think of this more as a dedicated syntax for records. It looks like Common Lisp's #S(...), and it would be more compact than #,(record name datum ...). Chicken uses #$ for a special purpose, so we would conflict with it.
Tags: R6RS, srfi-10, syntax, gauche.record
2010/12/13
Release and test
Whew, I finally got 0.9.1 out the door. There are several loose ends I couldn't manage to tie up; for example, the new interface of rfc.http wasn't made public since I couldn't test it enough. We still had enough stuff to make long release notes, though.
The bottleneck of the release cycle is testing. Yes, we have unit tests, but they mostly test internal consistency: they check that whatever you do inside Gauche, you won't get wrong results.
The kind of tests before releases are focused on the external consistency---how Gauche interacts with the external world. Does it behave the same on different OSes? Does it work consistently when combined with other programs, and communicating over the network?
This time I happened to find a bug in gauche-config program
right after the release. The bug would affect where extension
packages are installed, and fixing things after many extension
packages are placed in the wrong directory would be messy. So I
replaced the release with the fix.
How can I prevent problems like this, and ensure checking
other stuff that interacts with the outside world?
I had added some unit tests for utility scripts gauche-install,
gauche-config and gauche-package, but they were not enough to
catch the error I had.
One idea is to have a script that automates packaging, installing, and checking integration with the external world. It should be automated, since the release test is taking longer and longer as more external programs interact with Gauche. I'm curious how other projects manage release testing.
2010/12/11
New directory structure
Until 0.9, you had to recompile Gauche extensions for every new release of Gauche, even if it was only a micro version bump (e.g. 0.8.12 -> 0.8.13). That was because the C API could change between them.
After 1.0, our goal is to keep API and ABI compatibility at least at the minor version level, so that extension modules compiled for version X.Y.z1 will work with X.Y.z2 where z2 >= z1. The 0.9.x series is the run-through rehearsal for the stable 1.0 release, to see whether our scheme will really work after 1.0.
At the 0.9 release, however, I overlooked one thing:
the full Gauche version number (major.minor.micro) was embedded
in the pathnames where architecture-dependent files
are installed; extensions' DSOs are
in /usr/lib/gauche/site/X.Y.Z/${arch}/
(if Gauche was installed with --prefix=/usr).
This wouldn't work if we want to share extensions among
different micro versions.
An easy way could be to keep using "0.9" throughout 0.9.x
series for the 'site' directory. That is, while Gauche "core"
files would still go to /usr/lib/gauche/X.Y.Z/${arch}/,
extension modules install their files to
/usr/lib/gauche/site/X.Y/${arch}/. Version 0.9's
structure already fits this scheme, so the transition would
be smooth.
However, this scheme somewhat mixes the Gauche version
(X.Y.Z) and the ABI version (X.Y). That is,
in /usr/lib/gauche/0.9/, the "0.9" part would mean gauche version,
while in /usr/lib/gauche/site/0.9/, the "0.9" part would mean ABI
version. It might not matter in practice, but I didn't quite
like such mixture.
And there came another issue: The Gauche runtime DSO
(libgauche.so) is placed in the system's common library directory,
such as /usr/lib. It is common to append versions after the
name (e.g. libgauche.so.0 and libgauche.so.0.9) and make versionless
name a symlink to the versioned name. However, the common assumption
for this scheme is that binary compatibility is kept among
the same major versions; if something is compiled with libgauche.so.1.0,
it is expected to work with libgauche.so.1.1 as well.
At this moment I want to reserve the possibility to change ABI
between Gauche version 1.0 and 1.1.
So, I decided to name the Gauche runtime DSO as libgauche-X.Y.so,
where X.Y is the ABI version. Then it is clear which version
is compatible with which DSO. It also allows installing more than
one version of Gauche with different ABI versions.
If we name DSO as libgauche-X.Y, then why don't we name
other runtime libraries as well? That is, we can install
stuff under /usr/lib/gauche-X.Y/ and /usr/share/gauche-X.Y.
Then it is clear which ABI version the file(s) are for,
and it is easier to manage when you have multiple versions
of Gauche with different ABI versions (e.g. it's easy to delete
files for old versions).
So, from 0.9.1, I adopt the following naming scheme:
- Gauche runtime DSO
    ${exec_prefix}/libgauche-X.Y.so
    ${exec_prefix}/libgauche-X.Y.so.0      (soname)
    ${exec_prefix}/libgauche-X.Y.so.0.Z    (realname)
- Architecture dependent files
    ${exec_prefix}/gauche-X.Y/X.Y.Z/${arch}/*    ;; core's *.so
    ${exec_prefix}/gauche-X.Y/X.Y.Z/include/*    ;; core's *.h
    ${exec_prefix}/gauche-X.Y/site/${arch}/*     ;; extensions' *.so
    ${exec_prefix}/gauche-X.Y/site/include/*     ;; extensions' *.h
- Architecture independent files
    ${datadir}/gauche-X.Y/X.Y.Z/*    ;; core's *.scm
    ${datadir}/gauche-X.Y/site/*     ;; extensions' *.scm
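As a rough illustration of the naming scheme above, the following shell sketch derives the ABI version (X.Y) and the micro version (Z) from a full version string and prints the resulting paths. The values of version, exec_prefix and datadir are examples chosen only for illustration, and the ${arch} component is left out; this is not how Gauche's own build scripts compute the paths.

```shell
#!/bin/sh
# Sketch only: example values, not taken from Gauche's build scripts.
version=0.9.1
exec_prefix=/usr/lib      # assumed example value
datadir=/usr/share        # assumed example value

abi=${version%.*}         # strip the last component, leaving X.Y
micro=${version##*.}      # keep only the last component, Z

dso_link="$exec_prefix/libgauche-$abi.so"
dso_soname="$dso_link.0"
dso_realname="$dso_link.0.$micro"
core_dir="$exec_prefix/gauche-$abi/$version"   # changes with every release
site_dir="$exec_prefix/gauche-$abi/site"       # shared within one ABI version
scm_dir="$datadir/gauche-$abi/$version"

printf '%s\n' "$dso_link" "$dso_soname" "$dso_realname" \
              "$core_dir" "$site_dir" "$scm_dir"
```

Note that site_dir contains only the ABI version, which is what lets extensions survive a micro version upgrade.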
During 0.9.x period, the old versionless directories are automatically
added to the library search path, and symlinks are created from
libgauche.so etc. to the versioned DSOs. So, the extension
modules you've installed for Gauche 0.9 should keep working
after you install 0.9.1.
(Note for those who are chasing development trunk: If you've installed extensions for 0.9.1_pre1 or 0.9.1_pre2, the official 0.9.1 release may not be able to find them, since I tweaked structure at the last minute. Sorry.)
Tags: 0.9.1, DirectoryStructure
2010/12/06
To quote or not to quote
I've been bitten by this twice, so I write it down to avoid another bite.
Short summary: The popular way to pass compiler command-line options by command/parameter substitution does not work with arguments that include whitespace.
Here I'm talking about the typical Makefile idioms such as the following:
gcc `gauche-config -I` ...
Or like this:
CFLAGS=`gauche-config -L`
gcc $(CFLAGS) ...
The commands gauche-config -I and gauche-config -L
produce the -I and -L flag(s) to give to the compiler,
respectively.
Typically there's one -I flag, but there may be more
than one -L flag.
So the output has to be split into separate words on the command line. That is,
we can't put quotes around the substitution like this:
gcc "`gauche-config -L`"
On Windows, Gauche may be installed under a path that contains
whitespace. In fact, the Gauche Windows installer uses
C:\Program Files\Gauche as the default.
To pass such pathnames to the -I and -L options,
each option must still be quoted after substitution.
So, initially I naively changed the output of gauche-config to quote the pathname:
-I"c:\Program Files\Gauche\lib\gauche-0.9\0.9.1_pre2\include"
Here came a twist.
The modern way to compile extension modules is
to use gauche-package compile command,
which takes care of gory details and makes Makefile simpler;
you just list the source files and the script takes care of
compiling and linking, with proper options.
Internally it uses gauche.config module to obtain the same
information as output of gauche-config -I
and gauche-config -L.
I used the Windows installer to install Gauche under C:\Program Files\
and compiled Gauche-gl using it with MinGW/MSYS. Everything worked
smoothly. I was satisfied.
Then, a few days later I was testing the 0.9.1 prerelease on my Linux box,
and found that some extension modules didn't compile. Their makefiles didn't
use gauche-package, but directly invoked gcc with
`gauche-config -I` for the arguments.
I remembered I had been tripped up by the same problem
a few years ago and had given up. This time I wanted to solve it
once and for all. I thought my quoting
scheme was wrong, and fiddled with the gauche-config output
for some time. I couldn't get it to work. I carefully read
the man page of bash. And learned this:
- Quote processing is done before command/parameter substitution.
- The result of command/parameter substitution is subject to word splitting, unless the argument (before substitution) is quoted.
- Word splitting honors neither quotes nor escaping (such as backslash); it is simple string splitting on the characters specified by $IFS.
This means that, as long as we use shell-level command substitution,
the output of gauche-config cannot contain whitespace inside
each argument. Tools using the same scheme, such as pkg-config,
have the same limitation, and in fact it is documented in
the pkg-config manual.
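The effect is easy to reproduce without Gauche installed. In the sketch below, fake_config is a hypothetical stand-in for gauche-config -I that prints one quoted option with a space in the path (the path itself is made up), and count_args just reports how many arguments it received.

```shell
#!/bin/sh
# fake_config is a hypothetical stand-in for `gauche-config -I`;
# the path is invented for this example.
fake_config() {
  printf '%s\n' '-I"/opt/Program Files/Gauche/include"'
}

# Report how many arguments the "compiler" would receive.
count_args() { echo "$#"; }

# Unquoted substitution: the output is split on $IFS, and the double
# quotes in the output are treated as ordinary characters.
unquoted=$(count_args $(fake_config))

# Quoted substitution: no splitting at all, so several flags would
# also be fused into a single argument.
quoted=$(count_args "$(fake_config)")

echo "unquoted: $unquoted argument(s)"
echo "quoted:   $quoted argument(s)"
```

The unquoted form hands the compiler two broken arguments, one ending in Program and one starting with Files, while the quoted form hands it a single argument that still contains literal quote characters.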
If we can have an intermediate step that preprocesses the command line
and passes the quoted pathnames to the shell, it works.
That is what gauche-package does. An alternative way
could be to invoke the shell within the Makefile, and let make
construct the command line:
INCLUDES = $(shell gauche-config -I)
gcc $(INCLUDES) ...
With this, the shell sees the already expanded options, so it can process quotes correctly.
However, there may be cases where an extension can't use
gauche-package compile because it requires a special build process,
and it also can't use GNU make. For backward compatibility,
I keep gauche-config -I and gauche-config -L
from quoting pathnames. Hence, they are an inherently unsafe way
to construct a command line.
So, what should be the proper way for the extension makefile
and gauche-config to handle pathnames with spaces?
I don't know yet. For the time being, I added
--incdirs and --archdirs options to gauche-config.
They return pathnames separated by colons (or semicolons on Windows),
and gauche-package constructs command line arguments from them.
I'm not satisfied with it, though.
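For a makefile or script that wants to consume such a colon-separated list itself, one possible approach in plain sh is to split on the colon only and rebuild the options as separate positional parameters. This is a sketch under assumptions: the incdirs string below is a made-up sample of what --incdirs might print, not real output, and this is not how gauche-package is implemented.

```shell
#!/bin/sh
# Made-up sample of a colon-separated directory list.
incdirs='/opt/Program Files/Gauche/include:/usr/local/include'

old_ifs=$IFS
IFS=:          # split on colons only, so spaces inside a path survive
set -f         # disable globbing while the unquoted expansion is split
set --         # start with an empty argument list
for dir in $incdirs; do
  set -- "$@" "-I$dir"
done
set +f
IFS=$old_ifs

nflags=$#
first_flag=$1
second_flag=$2

# "$@" expands to one word per flag, e.g.: gcc "$@" -c foo.c
printf '[%s]\n' "$@"
```

Each directory ends up as exactly one argument, regardless of embedded spaces, because the only splitting character is the colon.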
Tags: extensions, makefile, gauche-config, gauche-package
2010/11/19
Some improvements of constant propagation
Most of this feature was implemented a long time ago (December 2009 - April 2010), but there has been no appropriate place to describe it except the release notes, which haven't come out yet. It is probably worth mentioning here.
The constant propagation code in the compiler was overhauled, and now it can precompute a much wider range of code. It recognizes constant bindings defined by define-constant, and a call to a built-in function is precomputed if the function is side-effect free and its arguments are constant expressions.
For example, in the following code (logior ...) is precomputed at the compile time and becomes a constant value, as most C programmers would expect:
(define-constant *flag-0* (ash 1 0))
(define-constant *flag-3* (ash 1 3))
(func (logior *flag-0* *flag-3*))
(Note that macros can't save this case, since a macro can only see the code it contains; it can't know whether *flag-0* is a constant binding if it is defined outside of the macro, unless you put all the constant bindings into the macro expansion phase, which has its own inconvenience.)
If you are unsure, you can use disasm to check.
gosh> (disasm (lambda () (func (logior *flag-0* *flag-3*))))
CLOSURE #<closure #f>
main_code (name=#f, code=0x20f7820, size=4, const=1, stack=4):
args: #f
0 CONSTI-PUSH(9)
1 GREF-TAIL-CALL(1) #<identifier user#func>; (func (logior *flag-0* *flag-3*))
3 RET
Precomputation isn't limited to numeric computations.
gosh> (define-constant *data* '(#(a b c) #(d e f)))
*data*
gosh> (disasm (lambda () (vector-ref (cadr *data*) 1)))
CLOSURE #<closure #f>
main_code (name=#f, code=0x1024b70, size=2, const=1, stack=0):
args: #f
0 CONST-RET e
Currently only built-in SUBRs (procedures directly implemented in C) are subject to precomputation. Note that some elementary functions are defined in Scheme to handle complex numbers, on top of a real-only SUBR version.
gosh> (use math.const)
#<undef>
;; this doesn't fold constants
gosh> (disasm (lambda () (func (/ (sqrt pi) 2))))
CLOSURE #<closure #f>
main_code (name=#f, code=0x21d4ea0, size=12, const=3, stack=11):
args: #f
0 PRE-CALL(1) 6
2 CONST-PUSH 3.141592653589793
4 GREF-CALL(1) #<identifier user#sqrt>; (sqrt pi)
6 PUSH
7 CONSTI(2)
8 NUMDIV2 ; (/ (sqrt pi) 2)
9 PUSH-GREF-TAIL-CALL(1) #<identifier user#func>; (func (/ (sqrt pi) 2))
11 RET
;; ... but this does:
gosh> (disasm (lambda () (func (/ (%sqrt pi) 2))))
CLOSURE #<closure #f>
main_code (name=#f, code=0x21cf540, size=5, const=2, stack=4):
args: #f
0 CONST-PUSH 0.8862269254527579
2 GREF-TAIL-CALL(1) #<identifier user#func>; (func (/ (%sqrt pi) 2))
4 RET
I hope this restriction will be removed soon, since at least for standard functions, programmers shouldn't have to care whether they are implemented in Scheme or as SUBRs. I'm not sure I can do that by the 0.9.1 release, though. Anyway, most of the time you don't need to care, but when you're writing performance-sensitive code it may be worth playing with disasm a bit to find out what the compiler can do for you.
(Added 2010/11/20 20:03:14 UTC): Oh, by the way, constant folding is done only when the bindings of built-in procedures haven't been altered at the time of compilation. This is a subtle issue; if you value dynamism, meaning that you can change any part of the system at any time, then you might expect that the argument of func should be recalculated when you redefine / or %sqrt at some later time. R6RS made library modules closed, that is, you cannot alter their exported bindings afterwards. That's a possible solution. I'd like Gauche to be a bit more flexible, though. I think it is reasonable to expect programmers to consider the risk of overriding existing bindings. Yes, you can do that, but if you do so, do it boldly. At the beginning of the module, write the redefinitions explicitly and prominently so that not only the compiler but also the readers of your program will notice you're attempting some trick.
★ ★ ★
This precomputation once caused an interesting incident. I had a function like the following, which worked fine in 0.9 but stopped working after I put in the new constant folding code.
(define (fn ...)
(define tree '(root))
(define (populate! node)
... code that destructively modifies the cdr of node ...)
(populate! tree)
(cdr tree))
I represented a tree in which each node was something like (<name> <node> ...). The variable tree began with an empty root node, the populate! function grew the tree, and finally fn returned a list of root's children.
The above code is buggy, since it destructively modifies a literal list. Gauche doesn't check for mutation of literal pairs, since doing so would be costly.
On 0.9 the code seemed to work, since fn was actually called only once during program execution, so the mutated literal didn't affect other parts.
With the improved constant folding, the compiler deduced that the return value of fn was the cdr of a constant, and thus computable at compile time. It eliminated all the code in fn as dead code and made fn a constant function that merely returned ().
Ultimately the compiler should warn if it can detect that the code may mutate a constant. Doing so in general is difficult, but there may be some obvious cases that are easy to catch.
★ ★ ★
Actually, the real fun begins when this extended constant folding meets procedure inlining. Then some interesting code transformations happen; for example, conditional expressions can be eliminated because the compiler knows the condition will always be satisfied (or never satisfied).
Unfortunately the inlining feature is still picky and I hesitate to expose it publicly yet. I hope I can write about it soon.
Tag: compiler
