2019/12/14

0.9.9 is out

What's left for 1.0?

The goal of the 0.9.x releases has been to stabilize the binary API, so that after the 1.0 release we can be sure that extension libraries won't break during the 1.0.x series.

By now, we feel the API is pretty stable; we just want to make sure the exposed structure has enough information so that it won't hinder future development of runtime analysis tools such as debuggers.

Other than that, the items on the table so far are:

- Full support of R7RS-Large Red and Tangerine Editions
  - This includes support of exact complex numbers
- Tweaking continuation frame handling. We expect it to open up several pending features, such as a fix for the lost stack trace issue ( https://github.com/shirok/Gauche/issues/521 ), support of continuation marks (srfi-157), integration of stack trace and call trace, and a better inspector-debugger.

Well, it doesn't seem like a lot, but you never know.

So we hope the next release will be 1.0, but it may be 0.9.10 if we get sidetracked by some unforeseen issues. Let's see.

Tag: 0.9.9

2019/12/08

Definition is now a compile-time construct

As of 0.9.9, toplevel definitions insert bindings into the current module at compile time. Of course the value of the binding isn't known at compile time, so the variable is first marked as uninitialized. At runtime, the actual value is calculated and bound to the variable.

This is for consistent behavior with modern module systems (the issue https://github.com/shirok/Gauche/issues/549 was the trigger of this change). Basically, defines in the same toplevel must first bring all of their names into the same scope, and only then proceed to calculate the values. R6RS is clear about it, while R7RS allows implementations some leeway.

If you happen to use the value of the variable before it is bound to an actual value, you'll get an error saying the variable is not initialized.

This doesn't make any difference if each definition is a toplevel form on its own. However, if multiple toplevel definitions are enclosed in a form such as begin, define-module or define-library, you'll see the difference.

This may affect an idiom that was popular up through the R5RS era:

(define orig-error error)
(define (error . args)
  (write args) (newline)
  (apply orig-error args))

The intention is to save the original error procedure in orig-error, then redefine error so that it shows the arguments and then calls the original error.

In R5RS where there's one toplevel, the (define (error ...) ...) is understood as reassignment to the original variable, so it works.

However, since R6RS, we have multiple toplevels as separate lexical scopes, and we have ambiguity.

(import (scheme base)      ; imports 'error' binding (in R6RS, it's (rnrs))
        (scheme write))
(define orig-error error)  ; which 'error' should we refer to?
(define (error . args)
  (write args) (newline)
  (apply orig-error args))

With the lexical scoping rule (toplevel definitions are treated as if in letrec* bindings; see R6RS section 10, for example), the error referenced in the second form must refer to the error defined in the third form. And it's a violation to take the value of a variable that hasn't been calculated yet, hence the code above is invalid. In R7RS, it's implementation dependent.

Actually, R6RS also prohibits defining a toplevel variable that conflicts with an imported name, so the above code can't work in that sense, either. In R7RS, it's implementation dependent, so portable code can't do that.

The proper way is to use renaming import:

(import (except (scheme base) error)
        (rename (scheme base) (error r7rs:error))
        (scheme write))
(define (error . args)
  (write args) (newline)
  (apply r7rs:error args))

Now, Gauche has been rather permissive about this kind of implementation-dependent behavior. And the above orig-error example still works if the file is loaded. In that case, Gauche processes each toplevel form one by one, so when it sees (define orig-error error) it doesn't know yet whether error will be defined in this scope or not. So it refers to the imported error.

However, if the file is included (which effectively wraps all the forms in begin), or you write similar code within define-library or define-module, all the definitions are compiled at once, and then executed. In that case you'll see an "uninitialized variable" error in 0.9.9.

We strongly recommend avoiding such ambiguous code. However, if you're using existing code that happens to rely on the old behavior, you can switch back to it by defining the environment variable GAUCHE_LEGACY_DEFINE.
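
For Gauche-specific code, one way to keep the wrapping idiom unambiguous even when the forms are compiled together is to name the module the original binding comes from. The following is only a sketch, and it rests on assumptions: that the error you want to wrap is the one visible in the gauche module, and that shadowing it in the current module is what you intend. The begin wrapper stands in for include, define-module or define-library.

```scheme
;; Sketch: wrap 'error' without depending on the order in which
;; toplevel forms are compiled.
(begin
  ;; Take 'error' explicitly from the gauche module, so this reference
  ;; cannot be confused with the 'error' defined just below.
  (define orig-error (with-module gauche error))
  (define (error . args)
    (write args) (newline)
    (apply orig-error args)))

;; The wrapper prints its arguments, then raises the error as before.
(define outcome
  (guard (e [else 'caught])
    (error "something went wrong:" 42)))
```

Here outcome ends up as the symbol caught, after the argument list has been written to the current output port.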

Tags: 0.9.9, toplevel

2019/07/22

Nasty undefined

tl;dr - We discourage using #<undef> where a generalized boolean is expected. Defining the GAUCHE_CHECK_UNDEFINED_TEST environment variable turns on the warning.

The other day I was shaving a yak, and encountered a nasty tick in the wool. Its name is undefined.

Sometimes, a Scheme expression is defined to evaluate to an unspecified value. It means it can be any value, at the discretion of the implementation. However, if an ordinary-looking value such as #f is returned, users may come to depend on it. Such code is hazardous, since it breaks when ported to another implementation.

Partly because of that, many implementations choose to return a special value from an expression whose return value is not specified. In Gauche, we use an undefined value, printed as #<undef>. It is just a placeholder indicating the value doesn't (and shouldn't) matter.

A few days ago I was tweaking a precompiler and made a trivial change in the generation of the .sci (interface) file. The change broke the precompiler with no obvious cause. I reverted the change and started reintroducing the new code step by step. The breakage was reproducible, but the reason was incomprehensible. For example, whether or not a newline was emitted to the file made the difference.

It took a few hours, plus one night's sleep, to finally identify the cause. The closure I was touching had called a procedure at the tail position that returned #<undef>, so I assumed its return value didn't matter. I changed the closure in a way that it returned #f.

However, in another part of the code, the closure was called in something like this:

(or (and (some-condition) 
         (the-closure) 
         #t)
    (some-other-action))

Since #<undef> counts as a true value, it used to execute only the first arm of or. After the change (the-closure) returned #f, so (some-other-action) was also executed.

D'oh.

If #<undef> implies the value doesn't matter, we shouldn't rely on it being counted as true. Any conditional that tests the return value of a procedure whose value is undefined carries a time bomb.
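
To make the hazard concrete, here is a small sketch. All the procedure names are made up for illustration; the only Gauche-specific piece is (undefined), which returns #<undef>. The fragile version depends on the truthiness of whatever the thunk returns, while the robust version calls the thunk for effect and ignores its value.

```scheme
;; Two hypothetical stand-ins for "the closure" before and after the change.
(define (returns-undef) (undefined))  ; Gauche: yields #<undef>
(define (returns-false) #f)

;; Fragile: whether 'fallback is reached depends on the thunk's
;; return value, even though that value is not supposed to matter.
(define (fragile thunk)
  (or (and (thunk) #t)
      'fallback))

;; Robust: run the thunk for its side effect only, then yield #t.
(define (robust thunk)
  (or (begin (thunk) #t)
      'fallback))

(define fragile-results (list (fragile returns-undef) (fragile returns-false)))
(define robust-results  (list (robust returns-undef)  (robust returns-false)))
```

With these definitions, fragile-results is (#t fallback), so the behavior flips with the thunk's return value, while robust-results is (#t #t).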

Upon this realization, I added a check in the Gauche VM to warn when #<undef> appears in the context of a boolean test in conditionals. Then...

I found a lot of such cases in Gauche code itself. A typical pitfall is an and-let* form in which one of the test expressions yields #<undef>. And sometimes they were hard to track down, for the source of #<undef> may not always be obvious. (It was one of the relatively uncommon occasions when I did miss static typing.)

The warnings can be very annoying, so they aren't turned on by default. You can turn them on by defining the environment variable GAUCHE_CHECK_UNDEFINED_TEST. This is an experimental feature and I might change my mind later, but the current plan is to eventually enable it during unit tests.

Note: There are occasions when #<undef> is used intentionally; one of them is as a placeholder indicating that a value isn't given. Such usage is not desirable but tolerated, and in general those values are checked with undefined?. Gauche only warns when #<undef> appears as the result of the test expression of a conditional.
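
As an illustration of the tolerated placeholder usage, the sketch below uses a hypothetical greet procedure whose optional argument defaults to #<undef>. The point is that the placeholder is examined with undefined? and never used directly as the test of a conditional.

```scheme
;; Hypothetical example: 'name' defaults to #<undef> when omitted.
(define (greet :optional (name (undefined)))
  (if (undefined? name)          ; explicit check, not (if name ...)
      "Hello, stranger"
      (string-append "Hello, " name)))

(define greeting-without-name (greet))
(define greeting-with-name    (greet "Alice"))
```

Here greeting-without-name is "Hello, stranger" and greeting-with-name is "Hello, Alice".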

Tags: undefined, 0.9.9

2018/12/24

Unbalanced unquotes undermine utility

tl;dr - Our design choice of quasirename having implicit quasiquoting was wrong, and we'll fix it.

＊＊＊

In Common Lisp, backquotes and commas are handled by the reader. This means (1) every comma (and comma-atmark) must have a corresponding backquote that lexically surrounds it, and (2) once an S-expression is read, you never see a trace of the commas and backquotes.

Scheme took a different approach. Quasiquotes and unquotes are just shorthands for the forms (quasiquote form) and (unquote form). Their interpretation is left to the semantics of these forms.

This opens tempting possibilities to expand usage of these forms. SCSH's extended process form ( https://scsh.net/docu/html/man-Z-H-3.html ) is one example. Its redirection forms are implicitly quasiquoted, and unquote forms in it are evaluated without a corresponding quasiquote.

(define *outfile* "output.txt")

;; Redirect output of my-program to the file named by the value of *outfile*
(run (my-program) (> ,*outfile*))

＊＊＊

Gauche adopted explicit-renaming macros for the lower layer of hygienic macros (ref:er-macro-transformer). While syntax-case provides pattern matching and syntactic wrapping all in one set, ER-macro provides a minimal mechanism that hides the underlying macro expansion system. In practice syntax-case is handy, but its features are inseparably tied to it. For example, you can't simply use its pattern matcher as a runtime library independent of the macro system. We prefer basic tools each of which does one thing well, and building complicated systems by combining those orthogonal tools.

For the pattern matcher, we already have the mighty match (ref:util.match). On the other hand, constructing macro output is rather cumbersome with bare ER-macro, as we have to apply the rename procedure to every identifier we want to protect from name conflicts:

(define-syntax when-not
  (er-macro-transformer
    (^[form rename id=?]
      (match form
        [(_ test expr1 expr ...)
         `(,(rename 'if) (,(rename 'not) ,test)
            (,(rename 'begin) ,expr1 ,@expr))]
        [_ (error "malformed when-not:" form)]))))

So we introduced quasirename (ref:quasirename), which works like quasiquote with renaming:

(define-syntax when-not
  (er-macro-transformer
    (^[form rename id=?]
      (match form
        [(_ test expr1 expr ...)
         (quasirename rename
           (if (not ,test) (begin ,expr1 ,@expr)))]
        [_ (error "malformed when-not:" form)]))))

Quasirename employs implicit quasiquote. It replaces every identifier in the form with the result of applying the rename procedure to it, except the unquoted (and unquote-spliced) portions, which expand to the value of the expression as is. The code can be written almost identically to a legacy macro, except for replacing quasiquote with quasirename (and providing the rename procedure).

We were quite happy with it and started rewriting lots of macros using it. Then we realized its shortcomings.

＊＊＊

When quasiquote is nested, corresponding unquote should also be nested. The outermost quasiquote corresponds to the innermost unquote. It is simply implemented by keeping track of nesting levels (when you see quasiquote, increment the nest level; when you see unquote, decrement it; and keep the unquotes except zero-level ones).

For example, suppose you have the following nested quasiquote forms:

(let ((a 'outer))
  `(let ((a 'inner))
     `(list ,',a ,,'a)))

When you unwrap the outer quasiquote form, you get:

   (let ((a 'inner))
     `(list ,'outer ,a))

And when you unwrap the inner quasiquote you get:

      (list outer inner)

The nested unquotes may look scary but the rule is simple.

- Count the level of unquote from left to right.
- If you want to evaluate the form in a particular level (except the innermost level), put ', (quote - unquote).
- Or, if you want to keep the form untouched in that level and leave it to be evaluated in higher level, put ,' (unquote - quote).
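
The two unwrapping steps above can be checked mechanically. The sketch below builds the nested form, then evaluates the result once more; the names stage1 and stage2 are introduced only for this illustration, and eval with (interaction-environment) is used to play the role of "unwrapping the inner quasiquote".

```scheme
;; First unwrapping: evaluating the outer quasiquote.
(define stage1
  (let ((a 'outer))
    `(let ((a 'inner))
       `(list ,',a ,,'a))))

;; Second unwrapping: evaluating the form produced by the first step.
(define stage2 (eval stage1 (interaction-environment)))
```

stage1 is the form (let ((a 'inner)) `(list ,'outer ,a)), and stage2 is the list (list outer inner), matching the steps shown earlier.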

To make this mechanism work, however, every quasiquote form must know the levels of the unquotes in it. Unquote forms that don't have corresponding quasiquotes would trip up quasiquote forms.

Using implicit quasiquote in quasirename makes it very difficult, if not impossible, to write a quasiquote form that yields quasirename form, or other combination of nestings.

＊＊＊

So, what shall we do?

One solution is to recognize quasirename as a built-in syntax just like quasiquote; let each one know about the other, and count nestings properly.

However, that would make quasirename an inherently unportable, Gauche-specific syntax. Furthermore, what if we want to add more implicitly quasiquoted forms in the future? Do we want to change the expander of every quasi-something form?

Another solution is to let quasirename require its second argument to be quasiquoted. That is, this should be the proper form:

(quasirename r
  `(form ...))

and the argument without quasiquote should be invalid.

For backward compatibility, we could allow the form without quasiquote for a while. The only incompatible case is existing code that intends to yield a quasiquoted form. In that case, it should be rewritten to use double quasiquotes.

(quasirename r
  ``(form ...))
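
For comparison, here is how the earlier when-not macro would look with the explicitly quasiquoted form. This is a sketch under an assumption: it requires a Gauche release in which quasirename accepts a quasiquoted second argument as proposed here. The only change from the earlier definition is the backquote in front of the template.

```scheme
(use util.match)

(define-syntax when-not
  (er-macro-transformer
    (^[form rename id=?]
      (match form
        [(_ test expr1 expr ...)
         ;; The template is now an ordinary quasiquote form, so the
         ;; unquotes inside it have a matching quasiquote.
         (quasirename rename
           `(if (not ,test) (begin ,expr1 ,@expr)))]
        [_ (error "malformed when-not:" form)]))))

;; The test is false, so the body runs and its last value is returned.
(define when-not-result (when-not #f 'first 'ran))
```

Here when-not-result is the symbol ran.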

Tags: 0.9.8, quasiquote, quasirename, macro

2018/12/22

Upgrading to 0.9.7

0.9.7 is out. (I noticed I didn't announce 0.9.6 in this blog.)

As described in the release notes, this release is not binary compatible; extensions must be recompiled. If you run a server with a bunch of Gauche extensions (like me), it's a bit of work.

In case you forgot which extension modules you've installed, check the directory ${prefix}/share/gauche-0.9/site/lib/.packages/. It contains the *.gpd files of extensions you've installed for 0.9.6 and before. For 0.9.7 and later, the gpd files go into ${prefix}/share/gauche-0.97/site/lib/.packages/. (The 0.9 and 0.97 suffixes in the directory names indicate the ABI version.)

Tag: 0.9.7
