Gauche Devlog


2022/10/05

Source info propagation with macro expansion

Gauche tracks source code location information and shows it in stack traces. But what if the code is generated by macros? In 0.9.12, the macro expander re-attached the original source info only to the outermost form of the macro output. If a runtime error occurred in a constructed form other than the outermost one, the stack trace couldn't find the info and had to show "[unknown location]". This was especially annoying when the code was the result of nested macro expansions, because you got no clue about where the error came from.

I was annoyed enough that, from 0.9.13, you get a better stack trace. (If you've used other Scheme implementations that employ a syntax-case macro expander, you're already familiar with this feature. Yes, Gauche finally caught up.)

Let's show it with a somewhat contrived example. The following cxr macro expands to the nested car/cdr calls equivalent to cxxx...xxr, according to the given sequence of a's and d's.

;; (cxr a r obj) == (car obj)
;; (cxr a a r obj) == (caar obj)
;; (cxr a d a r obj) == (cadar obj)
;; etc.
(define-syntax cxr
  (syntax-rules (a d r)
    [(_ r obj) obj]
    [(_ a xs ...) (car (cxr xs ...))]
    [(_ d xs ...) (cdr (cxr xs ...))]
    [(_ . xs) (syntax-error "Malformed cxr:" (cxr . xs))]))
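
To make the recursion concrete, here is a hand-traced expansion of one call. The intermediate forms are written out by hand in the comments rather than produced by a tool, and the sample list is made up for illustration.

```scheme
;; Same definition as above, repeated so this snippet stands alone.
(define-syntax cxr
  (syntax-rules (a d r)
    [(_ r obj) obj]
    [(_ a xs ...) (car (cxr xs ...))]
    [(_ d xs ...) (cdr (cxr xs ...))]
    [(_ . xs) (syntax-error "Malformed cxr:" (cxr . xs))]))

;; Each expansion step peels one a or d off the front:
;;   (cxr a d a r obj)
;;   => (car (cxr d a r obj))
;;   => (car (cdr (cxr a r obj)))
;;   => (car (cdr (car (cxr r obj))))
;;   => (car (cdr (car obj)))          ; same as (cadar obj)

(cxr a d a r '((1 2) 3))   ; => 2
(cadar '((1 2) 3))         ; => 2
```

Every intermediate (cxr ...) form in that chain is a separate macro call, which is why the 0.9.13 stack trace below shows one "expanded from" line per step.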

In 0.9.12, if you pass something that causes a runtime error, you get the annoying unknown location:

gosh$ (cxr a a a a r '(1 2 3 4))
*** ERROR: pair required, but got 1
Stack Trace:
_______________________________________
  0  (car (cxr a r '(1 2 3 4)))
        [unknown location]
  1  (eval expr env)
        at "/usr/share/gauche-0.98/0.9.12/lib/gauche/interactive.scm":336

In 0.9.13, you'll get this:

gosh$ (cxr a a a a r '(1 2 3 4))
*** ERROR: pair required, but got 1
Stack Trace:
_______________________________________
  0  (car (cxr a r '(1 2 3 4)))
        at "/home/shiro/src/Gauche/test/macro-source-info.scm":15
        expanded from (cxr a a r '(1 2 3 4))
        at "/home/shiro/src/Gauche/test/macro-source-info.scm":15
        expanded from (cxr a a a r '(1 2 3 4))
        at "/home/shiro/src/Gauche/test/macro-source-info.scm":15
        expanded from (cxr a a a a r '(1 2 3 4))
        at "(standard input)":34
  1  (eval expr env)
        at "/home/shiro/src/Gauche/src/../lib/gauche/interactive.scm":354

This works with ER-macro, too. Suppose we have another macro, c*r, in which you can give a's and d's in a single symbol. That is, (c*r aada obj) is (caadar obj). We also let the code print the given symbol, just for the sake of making things complicated.

;; (c*r aa obj) == print 'aa' and return (caar obj)
;; (c*r aad obj) == print 'aad' and return (caadr obj)
;; etc.
(use util.match)   ; provides match, used by the transformer below
(define-syntax c*r
  (er-macro-transformer
   (^[form rename cmp]
     (match form
       [(_ xs obj)
        (let1 cs (map ($ string->symbol $ string $)
                      (string->list (symbol->string xs)))
          (quasirename rename
            `(begin
               (print ',xs)
               (cxr ,@cs r ,obj))))]))))
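
The least obvious part of c*r is how the single symbol becomes a sequence of a's and d's. The sketch below runs the same pipeline outside the macro; split-symbol is a hypothetical helper name introduced only for this illustration.

```scheme
;; Split a symbol into a list of one-character symbols,
;; using the same expression that c*r binds to cs.
(define (split-symbol sym)            ; hypothetical helper
  (map ($ string->symbol $ string $)  ; char -> one-char string -> symbol
       (string->list (symbol->string sym))))

(split-symbol 'aad)    ; => (a a d)

;; So (c*r aad obj) expands to
;;   (begin (print 'aad) (cxr a a d r obj))
;; and cxr takes over from there.
```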

Here's 0.9.12:

gosh$ (c*r aad '(1 2 3 4))
aad
*** ERROR: pair required, but got 2
Stack Trace:
_______________________________________
  0  (car (cxr a d r '(1 2 3 4)))
        [unknown location]
  1  (eval expr env)
        at "/usr/share/gauche-0.98/0.9.12/lib/gauche/interactive.scm":336

And HEAD:

gosh$ (c*r aad '(1 2 3 4))
aad
*** ERROR: pair required, but got 2
Stack Trace:
_______________________________________
  0  (car (cxr a d r '(1 2 3 4)))
        at "/home/shiro/src/Gauche/test/macro-source-info.scm":15
        expanded from (cxr a a d r '(1 2 3 4))
        at "/home/shiro/src/Gauche/test/macro-source-info.scm":60
        expanded from (quasirename rename `(begin (print ',xs) (cxr ,@cs r  ...
        at "/home/shiro/src/Gauche/test/macro-source-info.scm":57
  1  (eval expr env)
        at "/home/shiro/src/Gauche/src/../lib/gauche/interactive.scm":354

Now, if you use syntax-case or syntax-rules, it's no wonder how this can be done. Macro output is constructed as syntax objects, which can carry any sideband information. But with ER-macro, you construct the output as a plain S-expression, so it's not obvious where that information comes from.

Gauche has extended pairs, which can carry extra information besides car and cdr. This sideband data isn't visible as long as you treat the object as a pair, nor does it affect the equal?-ity of pairs. Source code information is stored there by the read procedure and its family.

gosh$ (read-from-string "(a b c d)")
(a b c d)
gosh$ (pair-attributes *1)
((source-info "(input string port)" 1))

If you construct lists with cons or list, that information won't be attached. However, quasirename does the trick. It extracts the original source info from the input, and re-attaches it to the constructed form.
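
A small sketch of the difference, assuming pair-attribute-get returns its fallback argument for a pair that carries no attributes (which is how the quasirename code further down relies on it). The variable names are made up for illustration.

```scheme
;; One list from the reader, one built by hand.
(define from-reader (read-from-string "(a b c d)"))
(define from-list   (list 'a 'b 'c 'd))

;; The reader-made pair carries a source-info attribute ...
(pair-attribute-get from-reader 'source-info #f)
;; ... of the same shape as in the transcript above: a file or port name and a line.

;; The hand-made pair has nothing attached, so we get the fallback.
(pair-attribute-get from-list 'source-info #f)   ; => #f

;; The attribute doesn't take part in comparison.
(equal? from-reader from-list)                   ; => #t
```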

Note that, for a macro expander, we need to consider two kinds of source information: one for the macro definition, and the other for the macro input. The source info of the macro definition is available through the argument of quasirename. But how can it get the macro input information? The macro input is already deconstructed by the time quasirename is called.

We use another sideband mechanism: procedure tags. SRFI-229 defines a general mechanism to attach an arbitrary Scheme object to a procedure. Gauche has a more general mechanism (not documented yet) in which a procedure can have multiple tags, and the macro input is attached to the rename procedure as one such tag. quasirename then extracts that information from the rename procedure and attaches it to the output.
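
For readers who haven't met procedure tags, here is a minimal sketch of the standard SRFI-229 interface only. It assumes the srfi.229 module is available in your Gauche, and it does not use Gauche's undocumented multiple-tag mechanism. The tag value and procedure name are made up for illustration.

```scheme
(use srfi.229)

;; lambda/tag creates a procedure with one arbitrary object attached.
(define doubler
  (lambda/tag '(demo-tag "any Scheme object works here")
    (x)
    (* x 2)))

(doubler 21)              ; => 42, it is still an ordinary procedure
(procedure/tag? doubler)  ; => #t
(procedure-tag doubler)   ; => (demo-tag "any Scheme object works here")
```

The rename procedure handed to an ER-macro transformer plays the role of doubler here: it is called like any procedure, while quasirename reads the macro input off its tag.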

In the definition of quasirename, the output construction code looks like this:

;; in src/libmacro.scm

   (if-let1 si (pair-attribute-get objs 'source-info #f)
     (let1 orig (assoc-ref ((with-module gauche.internal %procedure-tags-alist) r)
                           'macro-input)
       `(,extended-cons. ,xx ,yys '((source-info ,@si)
                                    ,@(cond-list
                                       [orig `(original . ,orig)]))))
     `(,cons. ,xx ,yys)))))

The pair attribute source-info holds the source info of the macro definition, and the attribute original holds the macro input form.


The disadvantage of keeping source info in the sideband data of pairs is, obviously, that you can't attach source info to objects other than pairs. I don't find it a big issue in practice, since most expressions that need attention are function calls, macro calls or special forms.

On the other hand, it has the advantage that quoted literal lists can carry source code information. That can't be done with syntax objects, because quote strips any syntax wrapping. It is handy when you write a literal nested structure as a DSL and let its walker signal an error with the location of the literal structure.
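
A rough sketch of such a walker, assuming the source-info attribute value is a list of a file name and a line number, as in the pair-attributes transcript above. The table *fields* and the helpers location-of and check-field are hypothetical names for this illustration.

```scheme
;; A literal DSL table. When this file is loaded, the reader attaches
;; source-info to each pair, and quote leaves it in place.
(define *fields*
  '((name "widget")
    (size 10)
    (colour)))          ; malformed on purpose: the value is missing

;; Render source-info as "file:line", or a placeholder if absent.
(define (location-of obj)
  (let1 si (and (pair? obj) (pair-attribute-get obj 'source-info #f))
    (if si
      (format "~a:~a" (car si) (cadr si))
      "[unknown location]")))

;; Each field must look like (key value).
(define (check-field field)
  (unless (and (pair? field) (pair? (cdr field)))
    (errorf "malformed field ~s at ~a" field (location-of field))))

;; Signals an error for (colour), pointing at the line
;; where that entry was written.
(for-each check-field *fields*)
```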


This is a desired feature and I'm happy to have it. However, I feel a bit ambivalent, too.

The reason I prefer ER-macro to syntax-case is that ER-macro is explicit: input and output are raw S-expressions which you can directly touch and rearrange. With syntax-case, things are wrapped in opaque syntax objects, and even though you can unwrap and rewrap the objects, that opaqueness bothers me.

However, with this quasirename modification, I did introduce an implicit operation: even though the output of quasirename can be treated as an ordinary S-expression, quasirename does more than just consing.

If I feel comfortable with this, maybe I can feel comfortable with syntax-case, too. I don't know yet. Let's see.

Tags: Macro, 0.9.13, quasirename
