Gauche Devlog

2019/12/08

Definition is now a compile-time construct

As of 0.9.9, toplevel definitions insert bindings into the current module at compile time. Of course, the value of a binding isn't known at compile time, so the variable is first marked as uninitialized. At runtime, the actual value is calculated and bound to the variable.

This makes the behavior consistent with modern module systems (the issue https://github.com/shirok/Gauche/issues/549 triggered this change). Basically, the definitions in the same toplevel must first bring all of their names into the same scope, and only then proceed to calculate the values. R6RS is clear about this, while R7RS allows implementations some leeway.

If you happen to use the value of a variable before it is bound to its actual value, you'll get an error saying the variable is not initialized.

This doesn't make any difference if each definition is a toplevel form on its own. However, if multiple toplevel definitions are enclosed in a form such as begin, define-module or define-library, you'll see the difference.
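
As a minimal sketch of the case that stays safe, the following groups two definitions in one begin. The names my-even? and my-odd? are hypothetical, chosen only for illustration. The forward reference to my-odd? sits inside a procedure body, so it isn't evaluated until both variables have been initialized:

```scheme
;; Illustrative sketch; my-even? and my-odd? are hypothetical names.
(begin
  ;; my-odd? is referenced here, but only inside the procedure body,
  ;; so it is not looked up until my-even? is actually called.
  (define (my-even? n) (if (= n 0) #t (my-odd? (- n 1))))
  (define (my-odd? n)  (if (= n 0) #f (my-even? (- n 1)))))

;; By now both variables hold their procedures.
(display (my-even? 10)) (newline)   ; prints #t
```

Only a reference that is evaluated while the definitions themselves are being executed can run into an uninitialized variable.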


This may affect an idiom that was popular up through the R5RS era:

(define orig-error error)
(define (error . args)
  (write args) (newline)
  (apply orig-error args))

The intention is to save the original error procedure in orig-error, then redefine error so that it shows its arguments and then calls the original error.

In R5RS, where there's one toplevel, (define (error ...) ...) is understood as a reassignment of the original variable, so it works.

However, since R6RS, we have multiple toplevels as separate lexical scopes, which introduces an ambiguity:

(import (scheme base)      ; imports 'error' binding (in R6RS, it's (rnrs))
        (scheme write))
(define orig-error error)  ; which 'error' does this refer to?
(define (error . args)
  (write args) (newline)
  (apply orig-error args))

With the lexical scoping rule (toplevel definitions are treated as if they were letrec* bindings; see R6RS section 10, for example), the error referenced in (define orig-error error) must refer to the error defined by the (define (error . args) ...) form that follows it. It's a violation to take the value of a variable whose value hasn't been calculated yet, hence the code above is invalid. In R7RS, it's implementation dependent.
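
To make the letrec* reading concrete, here is a rough sketch of the two definitions rewritten as letrec* bindings. It is only an analogy for the scoping rule, not what any implementation literally produces, and the call in the body is added purely for illustration. The code is intentionally invalid: what happens when it runs depends on the implementation.

```scheme
(import (scheme base)
        (scheme write))

;; Both names are in scope for both init expressions, so the `error`
;; in the first init is the inner binding, not the imported procedure.
(letrec* ((orig-error error)          ; reads the inner `error` before it has a value
          (error (lambda args
                   (write args) (newline)
                   (apply orig-error args))))
  (error "oops"))                     ; illustrative call; not expected to work portably
```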

Actually, R6RS also prohibits defining a toplevel variable that conflicts with an imported name, so the above code can't work in that sense, either. In R7RS, it's implementation dependent, so portable code can't do that.

The proper way is to use a renaming import:

(import (except (scheme base) error)
        (rename (scheme base) (error r7rs:error))
        (scheme write))
(define (error . args)
  (write args) (newline)
  (apply r7rs:error args))

Now, Gauche has been rather permissive about this kind of implementation-dependent behavior, and the orig-error example above still works if the file is loaded. In that case, Gauche processes each toplevel form one by one, so when it sees (define orig-error error), it doesn't know yet whether error will be defined in this scope. So it refers to the imported error.

However, if the file is included (which effectively wraps all the forms in begin), or you write similar code within define-library or define-module, all the definitions are compiled at once and then executed. In that case, you'll see an "uninitialized variable" error in 0.9.9.
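
For instance, wrapping the idiom in an explicit begin, as a rough stand-in for what include does, should be enough to see the difference. This is a sketch based on the description above; the exact wording of the error message is not reproduced here.

```scheme
;; Both definitions belong to one toplevel form, so they are compiled
;; together and `error` is already a binding of this scope when the
;; first definition is executed.
(begin
  (define orig-error error)     ; in 0.9.9: error is still uninitialized here
  (define (error . args)
    (write args) (newline)
    (apply orig-error args)))
```

Splitting the two definitions back into separate toplevel forms in a loaded file restores the one-by-one processing described earlier, although the renaming import remains the portable fix.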

We strongly recommend avoiding such ambiguous code. However, if you're using existing code that happens to rely on the old behavior, you can switch back to it by defining the environment variable GAUCHE_LEGACY_DEFINE.

Tags: 0.9.9, toplevel
