Source info propagation with macro expansion
Gauche tracks source code location information and shows it in the stack trace. However, what if the source is generated by macros? In 0.9.12, the macro expander re-attached the original source info to the outermost form of the macro output. However, if a runtime error occurred in constructed code other than the outermost one, stack trace couldn't find the info and had to show "[unknown location]". It was annoying especially when the code was the result of nested macro expansions, that you didn't get a clue about where the error came from.
I was annoyed enough, so from 0.9.13, you can have better stack trace. (Well, if you're familiar with other Scheme that employs syntax-case macro expander, you're already familiar with such a feature. Yes, Gauche finally caught up.)
Let's show it with a somewhat contrived example. The following
macro expands to
cxxx...xxr according to the given sequence
;; (cxr a r obj) == (car obj) ;; (cxr a a r obj) == (caar obj) ;; (cxr a d a r obj) == (cadar obj) ;;etc. (define-syntax cxr (syntax-rules (a d r) [(_ r obj) obj] [(_ a xs ...) (car (cxr xs ...))] [(_ d xs ...) (cdr (cxr xs ...))] [(_ . xs) (syntax-error "Malformed cxr:" (cxr . xs))]))
In 0.9.12, if you pass something that causes a runtime error, you get the annoying unknown location:
gosh$ (cxr a a a a r '(1 2 3 4)) *** ERROR: pair required, but got 1 Stack Trace: _______________________________________ 0 (car (cxr a r '(1 2 3 4))) [unknown location] 1 (eval expr env) at "/usr/share/gauche-0.98/0.9.12/lib/gauche/interactive.scm":336
In 0.9.13, you'll get this:
gosh$ (cxr a a a a r '(1 2 3 4)) *** ERROR: pair required, but got 1 Stack Trace: _______________________________________ 0 (car (cxr a r '(1 2 3 4))) at "/home/shiro/src/Gauche/test/macro-source-info.scm":15 expanded from (cxr a a r '(1 2 3 4)) at "/home/shiro/src/Gauche/test/macro-source-info.scm":15 expanded from (cxr a a a r '(1 2 3 4)) at "/home/shiro/src/Gauche/test/macro-source-info.scm":15 expanded from (cxr a a a a r '(1 2 3 4)) at "(standard input)":34 1 (eval expr env) at "/home/shiro/src/Gauche/src/../lib/gauche/interactive.scm":354
This works with ER-macro, too. Suppose we have another macro,
c*r, in which you can give
d's in a single symbol.
(c*r aada obj) is
We also let the code print the given symbol, just for the sake of
making things complicated.
;; (c*r aa obj) == print 'aa' and return (caar obj) ;; (c*r addar obj) == print 'addar' and return (caadr obj) ;; etc. (define-syntax c*r (er-macro-transformer (^[form rename cmp] (match form [(_ xs obj) (let1 cs (map ($ string->symbol $ string $) (string->list (symbol->string xs))) (quasirename rename `(begin (print ',xs) (cxr ,@cs r ,obj))))]))))
gosh$ (c*r aad '(1 2 3 4)) aad *** ERROR: pair required, but got 2 Stack Trace: _______________________________________ 0 (car (cxr a d r '(1 2 3 4))) [unknown location] 1 (eval expr env) at "/usr/share/gauche-0.98/0.9.12/lib/gauche/interactive.scm":336
gosh$ (c*r aad '(1 2 3 4)) aad *** ERROR: pair required, but got 2 Stack Trace: _______________________________________ 0 (car (cxr a d r '(1 2 3 4))) at "/home/shiro/src/Gauche/test/macro-source-info.scm":15 expanded from (cxr a a d r '(1 2 3 4)) at "/home/shiro/src/Gauche/test/macro-source-info.scm":60 expanded from (quasirename rename `(begin (print ',xs) (cxr ,@cs r ... at "/home/shiro/src/Gauche/test/macro-source-info.scm":57 1 (eval expr env) at "/home/shiro/src/Gauche/src/../lib/gauche/interactive.scm":354
Now, if you're user of
no wonder how it can be done. Macro output is constructed
as syntactic objects, which can carry any sideband information.
But with ER-macro, you construct the output as a simple S-expression, so
it's not obviouhs where those information comes from.
Gauche has an extended pair that can carry extra information other
than car and cdr. Those sideband data isn't visible as far as
you're treating it as a pair, nor it affects
the pairs. Source code information is stored there by
procedure and its families.
gosh$ (read-from-string "(a b c d)") (a b c d) gosh$ (pair-attributes *1) ((source-info "(input string port)" 1))
If you consturct lists with
list, those information
won't be attached. However,
quasirename does the trick.
It extracts the original source info from the input, and re-attaches
it to the constructed form.
Note that, for a macro expander, we need to consider
two kinds of source information: One is of the macro definition,
and another is of the macro input. The source info of the
macro definition is available through the argument of
But how can it get the macro input information? The macro input is
already deconstructed by the time
quasirename is called.
We use another sideband mechanism, procedure tags. Srfi-229 defines
a general mechanism to attach an arbitrary Scheme object to a procedure.
Gauche has more general mechanism (although not documented yet) that
a procedure can have multiple tags, and the macro input is attached
to the rename procedure as one of such tags. Then
extracts that information from the rename procedure and applies
it to the output.
In the definition of
quasirename, the output construction code
looks like this:
;; in src/libmacro.scm (if-let1 si (pair-attribute-get objs 'source-info #f) (let1 orig (assoc-ref ((with-module gauche.internal %procedure-tags-alist) r) 'macro-input) `(,extended-cons. ,xx ,yys '((source-info ,@si) ,@(cond-list [orig `(original . ,orig)])))) `(,cons. ,xx ,yys)))))
The pair attribute
source-info holds the source info of
macro definition, and
original holds the macro input form.
The disadvantage of having source info in the sideband data of pairs is, obviously, that you can't attach source info to other objects than pairs. I find it not a big issue in practice, for most expressions that need attention are function calls, macro calls or special forms.
On the other hand, it has an advantage that quoted literal lists can have
source code information. It can't be done with syntax objects,
quote strips any syntax wrappings. It is handy when you
put a literal nested structure as DSL and let its walker signals
an error with the location of the literal structure.
This is a desired feature and I'm happy to have it. However, I'm feeling a bit of ambivalence, too.
The reason I prefer ER-macro to syntax-case is that ER-macro is explicit---input and output are raw S-expression which you can direclty touch and rearrange. With syntax-case, things are wrapped in opaque syntax object, and even though you can unwrap and rewrap the objects, that opaqueness bothers me.
However, with this
quasirename modification, I did introduce
an implicit operation; even though the output of
can be treated as an ordinary S-expression, it does more to it than just
If I feel comfortable with this, maybe I can also feel comfortable
syntax-case, too. I don't know yet. Let's see.
Tags: Macro,, 0.9.13,, quasirename
Using Gauche in GitHub Actions
I created a GitHub action to install Gauche in the runner, so that you can use Gauche in subsequent steps: setup-gauche. Currently the action works on Linux and OSX.
To use the action, simply say
uses: shirok/setup-gauche@v3 in your job steps (check the latest version number in the setup-gauche page). The following is an excerpt of
.github/workflow/main.yml of Gauche-mecab:
jobs: build-and-test-linux: runs-on: ubuntu-latest timeout-minutes: 10 steps: - uses: actions/checkout@v3 - uses: shirok/setup-gauche@v3 - name: Install dependencies run: | sudo apt install -y libmecab-dev mecab-ipadic-utf8 - name: Build and check run: | ./configure make make -s check make -s check-dep
Gauche is installed in standard path (
/usr on Linux,
/usr/local on OSX) so that you can build Gauche extensions or run Gauche applications without any extra settings.
By default, it installs the latest release. You can choose a specific version of Gauche to install via
gauche-version input parameter; specifically, saying 'snapshot' installs the latest snapshot (prerelease) build, if there's any newer than the latest release.
Running gosh without installing
Recently I wrote some test scripts in Gauche for a project that didn't use Gauche in particular. I could've kicked get-gauche script during
make check to install Gauche locally as needed, but that seemed a bit of overkill, especially it was just for small test scripts.
Then I thought, well, I already have a Docker image. If I can feed a local script to it...
So here it is. I included in a Docker image a small script
which chdirs into
/home/app and run gosh. If you mount local cwd on
/home/app, the scripts, libraries and data in it are all visible to
gosh in the Docker:
docker run --rm -ti -v `pwd`:/home/app practicalscheme/gauche gosh-script TEST-SCRIPT
Or, you can use run-gosh-in-docker.sh script.
You can't acceses local resources other than the filesystem below the current directory, and you can't use extra libraries. But for the simple tasks this is enough.
See README in Gauche-docker-image for the details.
Is this an Undefined Behavior?
Automated tests of Gauche HEAD on Windows platform started failing since several days ago. The log showed SHA1 digest result didn't match. It's weird, for I haven't touched that part of code for long time.
I isolated the reproducible condition. It happens with the fairly
new gcc (11.2.0) with
-O2. It doesn't exhibit without optimization,
nor with gcc 10.2.0 or other previous versions of gcc I have.
The problematic code is Aaron D. Gifford's SHA implementation sha2.c ( http://www.aarongifford.com/computers/sha.html ). It was last updated in January 2004, so it's pretty old, but I think it's still widely used.
I narrowed down the problem to around here:
/* Set the bit count: */ #if BYTE_ORDER == LITTLE_ENDIAN /* Convert FROM host byte order */ REVERSE64(context->s1.bitcount,context->s1.bitcount); #endif void *buf56 = &context->s1.buffer; *(sha_word64*)buf56 = context->s1.bitcount; /* Final transform: */ SHA1_Internal_Transform(context, (sha_word32*)context->s1.buffer);
In our case,
is a macro to swap the byte order of a 64bit word.
context->s1.buffer is an array of unsigned chars.
What it does is to store 64bit-word of
bitcount into the
from 56th octet in the network byte order, and calls
It compiles to this code with optimization:
25ca75e9d: 48 8b 53 18 mov 0x18(%rbx),%rdx 25ca75ea1: 48 0f ca bswap %rdx 25ca75ea4: 48 89 53 18 mov %rdx,0x18(%rbx) 25ca75ea8: 48 89 d9 mov %rbx,%rcx 25ca75eab: 4c 89 e2 mov %r12,%rdx 25ca75eae: e8 8d fa ff ff call 25ca75940 <SHA1_Internal_Transform>
%rbx contains the pointer to
context->s1.buffer. The first three instructions swap the
64bit word. (By the way,
REVERSE64 macro is written with shifts and
bitmasks. Gcc cleverly figures out its intent and
replaces the whole expression by a
The next three instruction is the calling sequence of
Wait. There're no instructions emitted to store
*buf56. I checked the assembly after this but there're no
instructions for the store either.
If I insert a dummy external function call before
SHA1_Internal_Transform like this:
/* Set the bit count: */ #if BYTE_ORDER == LITTLE_ENDIAN /* Convert FROM host byte order */ REVERSE64(context->s1.bitcount,context->s1.bitcount); #endif void *buf56 = &context->s1.buffer; *(sha_word64*)buf56 = context->s1.bitcount; puts("foo"); /* Final transform: */ SHA1_Internal_Transform(context, (sha_word32*)context->s1.buffer);
Then the storing to
*buf56 appears (
mov %rdx, 0x58(%rbx)):
25ca75e9d: 48 8b 53 18 mov 0x18(%rbx),%rdx 25ca75ea1: 48 0f ca bswap %rdx 25ca75ea4: 48 89 53 18 mov %rdx,0x18(%rbx) 25ca75ea8: 48 8d 0d 8f c7 00 00 lea 0xc78f(%rip),%rcx # 25ca8263e <.rdata+0x9e> 25ca75eaf: 48 89 53 58 mov %rdx,0x58(%rbx) 25ca75eb3: e8 60 2e 00 00 call 25ca78d18 <puts> 25ca75eb8: 4c 89 e2 mov %r12,%rdx 25ca75ebb: 48 89 d9 mov %rbx,%rcx 25ca75ebe: e8 7d fa ff ff call 25ca75940 <SHA1_Internal_Transform>
Now, accessing type punned pointer can break the strict aliasing rule.
The gcc might have figured the storing
*buf56 had nothing to do with
But I feel there still needs to be
a leap that it completely eliminates the store instruction.
*(sha_word64*)buf56 = context->s1.bitcount triggers
Undefined Behavior? That's why gcc is entitled to remove that code?
Tags: gcc, Optimization, UndefinedBehavior
Better test failure report
I keep Gauche's test framework ref:gauche.test intentionally simple--a test evaluates a given expression and compares its result with the expected result; if they don't agree, reports it. That's all.
It doesn't have fancy knobs and dials, but it does the job. Fancy features can be written using Gauche's other features; e.g. if you need setup/teardown, you can just wrap tests with
I prefer this kind of explicit code to fat frameworks in which you need to track down its documents and (sometimes) implementation to know what exactly is done.
However, there has been one frustration: I can't easily change how the test failure is reported. Especially, when a test yields a large amount of results and it doesn't agree with expected one, it is hard to tell where is the difference, by looking at the entire expected and actual results.
Now I can have it. See the following test:
(test* "Beatrice" ;; expected '("What fire is in mine ears? Can this be true?" "Stand I condemned for pride and scorn so much?" "Contempt, farewell, and maiden pride, adieu!" "No glory lives behind the back of such.") ;; actual "What fire is in mine ears? Can this be true?\n\ Stand I condemn'd for pride and scorn so much?\n\ Contempt, farewell! and maiden pride, adieu!\n\ No glory lives behind the back of such.\n" test-check-diff ; check test-report-failure-diff) ; report
The expected text and the actual text have slight difference. This reports the difference in unified diff format.
ERROR: GOT diffs: --- expected +++ actual @@ -1,4 +1,4 @@ What fire is in mine ears? Can this be true? -Stand I condemned for pride and scorn so much? -Contempt, farewell, and maiden pride, adieu! +Stand I condemn'd for pride and scorn so much? +Contempt, farewell! and maiden pride, adieu! No glory lives behind the back of such.
The third argument of
test* is to compare the expected and
actual result. If you prepare expected text in one big string,
you can just use the default one;
test-check-diff adds a bit
of convenience by accepting a few different formats.
The fourth argument is the main addition. It accepts a report
proceudre which is called when the expected result and the actual
result didn't match, with three arguments, **message**,
**expected-result** and **acutual-result**. The **message**
argument is the first argument passed to
module to display the difference of the results in diff format
You can customize reporting as you wish. Another custom reporting we'd like to have is to show difference of tree structures.
Please refer to the manual for the details. (Before releasing 0.9.11, you can view the draft document.