Gauche Devlog

Exact and repeating decimals

2025-04-13T18:53:28+00:00

Exact and repeating decimals

Novice programmers are often perplexed by most programming languages being not able to add 0.1 ten times ``correctly'':

s = 0
for i in range(10):
   s += 0.1
print(s)

# prints: 0.9999999999999999

"Floating point numbers are inexact, that's why," tells a tutor. "You should expect some errors."

Gauche isn't an exception, for decimal notation is read as inexact numbers:

gosh> (apply + (make-list 10 0.1))
0.9999999999999999

However, Scheme also has exact numbers. Numbers without a decimal point or exponent, or rational numbers, are read as exact numbers. You can also prefix decimal numbers with #e to make them exact. Using exact numbers, you can have an exact result.

gosh> (apply + (make-list 10 #e0.1))
1

The trick is that Gauche reads #e0.1 as an exact rational number 1/10, and perform computation as exact rationals. It is revealed when the result is not a whole number:

gosh> (+ #e0.1 #e0.1)
1/5

It is incovenient, though, when you want to perform exact computation with decimal numbers, i.e. adding prices with dollars and cents. If you add $15.15 and $8.91, you want to see the result as 24.06 instead of 1203/50.

;; Inexact
gosh> (+ 15.15 8.91)
24.060000000000002

;; Exact
gosh> (+ #e15.15 #e8.91)
1203/50

So, we added a new REPL print mode, exact-decimal. If you set it to #t, Gauche tries to print exact non-integer result as decimal notation whenever possible.

gosh> ,pm exact-decimal #t
Current print mode:
        length :  50
         level :  10
        pretty :  #t
         width :  79
          base :  10
         radix :  #f
 string-length : 256
    bytestring :  #f
 exact-decimal :  #t

Let's see:

gosh> (+ #e15.15 #e8.91)
#e24.06

We can always have exact decimal notation of rational numbers whose denominator's factor contains only 2 and 5.

gosh> 1/65536
#e0.0000152587890625

As far as we use addition, subtraction, and multiplication of exact decimal notated numbers, the result is always representable with exact decimal notation.

But what if division is involved? Isn't it a shame that we have an exact value (as a rational number), but can't print it as a decimal exactly?

Decimal notation of rational numbers whose denominator contains factors other than 2 and 5 becomes repeating decimals. Hence if we have a notation of repeating decimals, we can cover such cases.

So, here it is. If a numeric literal contains # followed by one or more digits, we understand the digits after # repeating infinitely.

gosh> 0.#3
0.3333333333333333
gosh> 0.0#123
0.012312312312312312
gosh> 0.#5
0.5555555555555556
gosh> 0.1#9
0.2

(Note: If no digits follows #, it is "insignificant digit" notation in R5RS.)

The above examples have limited number of digits because they're inexact numbers (note that we didn't prefix them with #e). For exact numbers, we can represent any rational numbers exactly with this notation:

gosh> 1/3
#e0.#3
gosh> 1/7
#e0.#142857
gosh> (* 1/7 2)
#e0.#285714
gosh> (* #e0.#3 #e0.#142857)
#e0.#047619

Note that the length of repetition can be arbitrarily long, so there are numbers that can't practically be printed in this notation. For the time being, we have a hard limit of 1024 for the length of repetition. If the result exceeds this limitation, we fall back to rational notation.

;; 1/2063 has repeating cycle of 1031 digits
gosh> (/ 1 2063)
1/2063

Tags: Numbers, Syntax

Running prebuilt Gauche on GitHub workflow

2024-06-29T02:08:33+00:00

Running prebuilt Gauche on GitHub workflow

The setup-gauche action installs Gauche on GitHub workflow runners for you (Using Gauche in GitHub Actions). But it downloaded source tarball and compiled, which took time. Especially if your repo is a small library, it feels waste of time compiling Gauche every time you push to the repo.

Now, setup-gauche can use a prebuilt binary on ubuntu-latest and macos-latest platforms. Just give prebuilt-binary: true as the parameter:

name: Build and test

on: [push, pull_request]

jobs:
  build-and-test:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
    - uses: actions/checkout@v3
    - uses: practical-scheme/setup-gauche@v5
      with:
        prebuilt-binary: true
    - name: Install dependencies
      run: |
        sudo apt install -y gettext
    - name: Build and check
      run: |
        ./configure
        make
        make -s check

Installing prebuilt binary takes around 10s or so; huge time saving.

Note that the prebuilt binary is provided with the latest Gauche release only. Other parameters of setup-gauche are ignored if you use the prebuilt binary.

(You may have noticed that the repository name is now under practical-scheme instead of shirok--I made practical-scheme organization and am gradually moving Gauche repositories to there, for easier maintenance. The URL is redirected from shirok so you don't need to update immediately, but just FYI.)

The following is for those who are curious about behind-the-scene.

Prebuilt binaries are prepared in a different repository: https://github.com/practical-scheme/setup-gauche-binary

It has GitHub actions that fetches the latest release tarball, build in GitHub runner, and upload the result as the assets of the repo's release. That ensures the binary runs on GitHub runners.

Tags: github, CI

Caching formatter procedure

2024-01-18T02:37:07+00:00

Caching formatter procedure

Lisp's format procedure is very un-Schemy. Instead of having a set of composable, orthogonal, do-one-thing-well procedures, format introduces a mini-language that's syntactically and semantically separate from the base language. It is not extendable, loaded with obscure features from the past. Yet it is handy for typical trivial tasks and that's why Gauche (and other Schemes, plus a couple fo SRFIs) offer it. (And to be honest, there's some pleasure to tinker such mini-language implementations.)

Aside from the non-composability, another glaring drawback of format is that it needs to interpret the mini language (format string) at runtime. Most format calls have a literal format string, and it is waste of time to parse it every time format is called. An obvious optimization is to recognize the literal format string and translates the call to format by simpler procedures at compile-time. I believe most CL implemenations do so.

However, Gauche, as well as some other Scheme implementations and SRFI-48, allows the port argument to be omitted. It is convenient, but it indeed makes compile-time transformation difficult. If the first argument of format is a non-literal expression (it is the case if you're passing a port), it is diffuclt for the compiler to recognize if the format string is a constant, even the second argument is a literal string that looks like a format string. If the first expression yields a string at runtime, that is the format string and the literal string is an argument to be shown.

Despite these difficulties, we can still take advantage of literal format string, by caching the format string compilation result at run-time.

It is not exactly the same as memoization. It is difficult to control amount of memoized results, and we only want to cache literal format strings, which needs to be determined at compile time.

So, we implemented a hybrid solution. The compiler macro attached to format checks if possible format string is a literal, and if so, it transforms the call into an internal procedure that takes an extra argument. The extra argument contains the position of the possible literal format string, and a mutable box. The following is the core part of the compile-time transformation:

(define-syntax make-format-transformer
  (er-macro-transformer
   (^[f r c]
     (match f
       [(_ shared?)
        (quasirename r
          `(er-macro-transformer
            (^[f r c]
              (define (context-literal pos) `(,',shared? ,pos ,(box #f)))
              (match f
                [(_ (? string?) . _)
                 (quasirename r
                   `(format-internal ',(context-literal 0) (list ,@(cdr f))))]
                [(_ _ (? string?) . _)
                 (quasirename r
                   `(format-internal ',(context-literal 1) (list ,@(cdr f))))]
                [(_ _ _ (? string?) . _)
                 (quasirename r
                   `(format-internal ',(context-literal 2) (list ,@(cdr f))))]
                [_ f]))))]))))

(NB: shared? flag is used to share the routine with format and format/ss. We need to check the literal string in first, second and third position, for Gauche's format allows two optional arguments before the format string.)

At run-time, the internal function can see if the literal string is indeed a format string. If so, it computes a formatter procedure based on the format string, and stores it to the mutable box. Subsequent calls will use the computed formatter procedure, skipping parsing and compiling the format string. The caching occurs per-call-site, much like the global variable lookup (we cache the object, the result of lookup, in the code vector).

The format-internal procedure checks optional arguments, and calls format-2. Its first argument can be a mutable box introduced by the above macro, if we do know the format string is literal.

(define (format-2 formatter-cache shared? out control fmtstr args)
  (let1 formatter (if formatter-cache
                    (or (unbox formatter-cache)
                        (rlet1 f (formatter-compile fmtstr)
                          (set-box! formatter-cache f)))
                    (formatter-compile fmtstr))
    (case out
      [(#t)
       (call-formatter shared? #t formatter (current-output-port) control args)]
      [(#f) (let1 out (open-output-string)
              (call-formatter shared? #f formatter out control args)
              (get-output-string out))]
      [else (call-formatter shared? #t formatter out control args)])))

A micro benchmark shows it's effective. In real code, the effect may not be so prominent, but it does remove worries that you're wasting time for parsing format string.

(define (run p)
  (dotimes [n 1000000]
    (format p "n=~7d 1/n=~8,6f\n" n (/. n))))

(define (main _)
  (time (call-with-output-file "/dev/null" run))
  0)

With caching off:

;(time (call-with-output-file "/dev/null" run))
; real  19.796
; user  19.790
; sys    0.000

With caching on:

;(time (call-with-output-file "/dev/null" run))
; real  10.313
; user  10.310
; sys    0.000

Tag: format

Pipeworks

2023-09-30T08:24:26+00:00

Pipeworks

Ports are very handy abstraction of data source and sink. In Gauche libraries, you can find many utitlies that reads from input port or writes to output port, and then another utilities (e.g. convert from/to string) are built on top of them.

While they are useful, it becomes tricky when you want to compose those utilities. Suppose you have a procedure f that writes to an output port, and a procedure g that read from an input port. You want to feed the output of f to g while make f and g run concurrently, so some threading is involved. You can write such a pipe using procedural ports but it is cumbersome to do so for every occasion. I want something that's as easy as Unix pipe.

So I initially started to writing a pipe utility using procedural ports. Then I realised I also want a device dual to it; while a pipe flows data from an output port to an input port, the co-pipe, or pump, pulls data from an input port and push it to an output port. An example is that you run a subprocess and feed its error output to your current output port. When you invoke a subprocess (ref:gauche.process), you can get its error output from an input port. So you need to read it actively and feed the data to your current output port.

Then you might want to peek the error output to find out a specific error message appears. So your contraption reads actively an input port, and feed the data to an output port, and you can read whatever data flows through it from another input port to monitor.

There are many variations, and mulling over it for some time, I wrote a library that abstracts any of such configurations. I call the device plumbing (draft:control.plumbing).

You can also create an output port that feeds the data to multiple outputs, or gather multiple input port into one input port. Refer to the manual to see what you can do.

Tags: 0.9.13, control.plumbing

Real numerical functions

2023-09-29T00:31:52+00:00

Real numerical functions

Scheme devines a set of elementary functions that can handle complex numbers. In Gauche, complex elementary functions is built on top of real domain functions. Up to 0.9.12, we had real-only version with the name such as %sin or %exp. As the percent prefix suggests, they are not meant to be used directly; sin ro exp are built on top of them.

However, sometimes you want to use real-only versions to avoid overhead of type testing and dispatching complex numbers. srfi:94 defines real-domain functions, so we decided to adapt them. Now you have real-sin, real-exp etc. (draft:real-exp) as built-in.

Note that scheme.flonum also provides "flonum-only" version of elementary functions, e.g. flsin (ref:scheme.flonum). They won't even accept exact numbers. Since it is in R7RS-large, you may want to use them for portable code.

Although the names %sin etc. are undocumented and not meant to be directly used, they were visible by default, so some existing code are relying on it. It needs some effort to rewrite all occurrences of such functions with the new real-sin etc, so we provide a compatibility module, compat.real-elementary-functions. Just using it in your code provides compatibility names. If you want to make your code work on both 0.9.12 and 0.9.13, you can use cond-expand:

(cond-expand
  ((library compat.real-elementary-functions)
   (use compat.real-elementary-functions))
  (else))

Tags: 0.9.13, NumericFunctions

Pretty print indentation

2023-09-28T21:54:35+00:00

Pretty print indentation

Yet another small thing good to have. You can now specify base indentation to the pretty printer (ref:pprint). It is applied to the second line and after.

gosh> (pprint (make-list 100 'abc) :indent 20)
(abc abc abc abc abc abc abc abc abc abc abc abc abc abc
                     abc abc abc abc abc abc abc abc abc abc abc abc abc abc
                     abc abc abc abc abc abc abc abc abc abc abc abc abc abc
                     abc abc abc abc abc abc abc abc abc abc abc abc abc abc
                     abc abc abc abc abc abc abc abc abc abc abc abc abc abc
                     abc abc abc abc abc abc abc abc abc abc abc abc abc abc
                     abc abc abc abc abc abc abc abc abc abc abc abc abc abc
                     abc abc)
#

To say more precisely, when the pretty printer spills data to another line, it inserts "a newline + whitespace * indent", plus any indent computed by the pretty printer.

The benefit easiest to see is when the pretty printer is used inside format. When a pretty printing triggered by the ~:w directive, it sets the base indentation at the column it starts printing. Hence the entire pretty print is indented to align nicely:

gosh> (format #t "Long list: ~:w\n" (make-list 100 'abc))
Long list: (abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc
            abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc
            abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc
            abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc
            abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc
            abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc abc
            abc abc)
#

Since pretty printing is built-in to the core printer (pprint is just a simple interface to use), other output routines such as write can also use base indentation. You can set indent slot of a write-controls.

gosh> (write (make-list 100 'abc) (make-write-controls :pretty #t :indent 20 :width 79))
(abc abc abc abc abc abc abc abc abc abc abc abc abc abc
                     abc abc abc abc abc abc abc abc abc abc abc abc abc abc
                     abc abc abc abc abc abc abc abc abc abc abc abc abc abc
                     abc abc abc abc abc abc abc abc abc abc abc abc abc abc
                     abc abc abc abc abc abc abc abc abc abc abc abc abc abc
                     abc abc abc abc abc abc abc abc abc abc abc abc abc abc
                     abc abc abc abc abc abc abc abc abc abc abc abc abc abc
                     abc abc)#

Tags: 0.9.13, pretty-printing

Segmented completion

2023-09-26T17:30:47+00:00

Segmented completion

Another little something on REPL. It can now complete symbols like call-with-current-continuation from c-w-c-c. This is an old tradition of Lisp environment.

We added a new module text.segmented-match (draft:text.segmented-match) to support this.

Tags: 0.9.13, REPL

Hints for unbound variable error

2023-09-26T07:21:18+00:00

Hints for unbound variable error

While working on REPL, sometimes you accidentally try to evaluate a variable that isn't visible from your current module. It is a bit annoying if you know the module is loaded, just you forget to use it in the current module.

So we added a little feature. When REPL reports an unbound varriable error, it also lists if there are variables of that name, exported from modules that are loaded into the process but not visible from the evaluating module:

gosh$ (thread-start! (make-thread (^[] (print "Hi"))))
*** UNBOUND-VARIABLE-ERROR: unbound variable: make-thread
    NOTE: `make-thread' is exported from the following module:
     - gauche.threads
Stack Trace:
_______________________________________
  0  (report-error e)
  1  (make-thread (^ () (print "Hi")))
        at "(input string port)":1

It may be nice to show modules that aren't even loaded, too, but that would be too costly so we avoided it. It also doesn't show non-exported variables, which is debatable--sometimes you forgot to export one and that caused this error. Let's use this for a while and see if we need non-exported ones, too.

This is realized in a general mechanism in error reporting. We haven't documented it yet, for we may tweak the interface, but I'll show it to give the general idea.

The error message in REPL, including the stack trace, is produced by report-error (ref:report-error). It prints *** ... line, with the condition class name and error message, then calls a generic function report-additional-condition on the condition. We have a specialized method for which searches the name in the loaded modules and prints the hint.

If the thrown condition is a compound condition, report-additional-condition is called over each component of the compound condition. This allows custom report for each component. When you load a file that has a statically detectable error, you get the additional information (While compiling ...). It is also realized by the same mechanism. The compiler and the loader adds the location information as a compound condition, and report-error calls report-additional-condition on them, which shows those additional messages.

gosh> ,l ./foo
*** ERROR: wrong number of arguments: cons requires 2, but got 1
    While compiling "./foo.scm" at line 1: (define (bar x) (cons x))
    While loading "./foo.scm" at line 2
Stack Trace:
_______________________________________
  0  (report-error e)
  1  (errorf "wrong number of arguments: ~a requires ~a, but got ~"...
  2  (pass1/expand-inliner program id gval cenv)

Tags: 0.9.13, REPL, report-error

:immutable slot option

2023-04-13T21:40:43+00:00

`:immutable` slot option

Immutability is far more valued nowadays than when CLOS was designed. Back then, basically the system allows programmers to change almost everything, assuming he knows what he was doing.

However, specifying something immutable is not only to protect users from shooting their own foot; it communicates to the readers. Here the readers can be a programmer who reads and uses the code years later (who can be the same programmer who wrote it bot has forgotten the details), or a program-processing programs such as optimizer to take advantage of immutablity.

CLOS (and Gauche's object system) have enough flexibility to implement immutable slots, but it is somewhat awkward. It's not as simple as having a custom setter that throws an error; for, the slot setter is also called in the instance initializer which runs at the instance construction. You have to distinguish whether the slot setter is invoked during initialization or outside initialization, but such dynamic introspection would be costly.

We came up an alternative mechanism which is effectively realizes immutable slots in practical situations, but does not require to distinguish whether it's in initialization or not.

If a slot has a true value for :immutable slot option, the slot can only be initialized once--that is, the setter sets the value if the slot is previously unbound, but throws an error if not. If you give the slot an initial value, either with :init-keyword or :init-value etc., then that one chance to set the value is used within initializer. Uninitialized immutable slots don't make much sense, so we expect almost always immutable slots are initialized this way.

It is possible that the initializer leaves the slot unbound, and later the user call slot-set! to set it once. It can be viewed as delayed initialization.

(We first named this option init-once, for the slot can be set once, but changed our mind for it could be confusing.)

Tag: ObjectSystem

Source info propagation with macro expansion

2022-10-05T09:05:28+00:00

Source info propagation with macro expansion

Gauche tracks source code location information and shows it in the stack trace. However, what if the source is generated by macros? In 0.9.12, the macro expander re-attached the original source info to the outermost form of the macro output. However, if a runtime error occurred in constructed code other than the outermost one, stack trace couldn't find the info and had to show "[unknown location]". It was annoying especially when the code was the result of nested macro expansions, that you didn't get a clue about where the error came from.

I was annoyed enough, so from 0.9.13, you can have better stack trace. (Well, if you're familiar with other Scheme that employs syntax-case macro expander, you're already familiar with such a feature. Yes, Gauche finally caught up.)

Let's show it with a somewhat contrived example. The following cxr macro expands to cxxx...xxr according to the given sequence of a or d.

;; (cxr a r obj) == (car obj)
;; (cxr a a r obj) == (caar obj)
;; (cxr a d a r obj) == (cadar obj)
;;etc.
(define-syntax cxr
  (syntax-rules (a d r)
    [(_ r obj) obj]
    [(_ a xs ...) (car (cxr xs ...))]
    [(_ d xs ...) (cdr (cxr xs ...))]
    [(_ . xs) (syntax-error "Malformed cxr:" (cxr . xs))]))

In 0.9.12, if you pass something that causes a runtime error, you get the annoying unknown location:

gosh$ (cxr a a a a r '(1 2 3 4))
*** ERROR: pair required, but got 1
Stack Trace:
_______________________________________
  0  (car (cxr a r '(1 2 3 4)))
        [unknown location]
  1  (eval expr env)
        at "/usr/share/gauche-0.98/0.9.12/lib/gauche/interactive.scm":336

In 0.9.13, you'll get this:

gosh$ (cxr a a a a r '(1 2 3 4))
*** ERROR: pair required, but got 1
Stack Trace:
_______________________________________
  0  (car (cxr a r '(1 2 3 4)))
        at "/home/shiro/src/Gauche/test/macro-source-info.scm":15
        expanded from (cxr a a r '(1 2 3 4))
        at "/home/shiro/src/Gauche/test/macro-source-info.scm":15
        expanded from (cxr a a a r '(1 2 3 4))
        at "/home/shiro/src/Gauche/test/macro-source-info.scm":15
        expanded from (cxr a a a a r '(1 2 3 4))
        at "(standard input)":34
  1  (eval expr env)
        at "/home/shiro/src/Gauche/src/../lib/gauche/interactive.scm":354

This works with ER-macro, too. Suppose we have another macro, c*r, in which you can give a's and d's in a single symbol. That is, (c*r aada obj) is (caadar obj). We also let the code print the given symbol, just for the sake of making things complicated.

;; (c*r aa obj) == print 'aa' and return (caar obj)
;; (c*r addar obj) == print 'addar' and return (caadr obj)
;; etc.
(define-syntax c*r
  (er-macro-transformer
   (^[form rename cmp]
     (match form
       [(_ xs obj)
        (let1 cs (map ($ string->symbol $ string $)
                      (string->list (symbol->string xs)))
          (quasirename rename
            `(begin
               (print ',xs)
               (cxr ,@cs r ,obj))))]))))

Here's 0.9.12:

gosh$ (c*r aad '(1 2 3 4))
aad
*** ERROR: pair required, but got 2
Stack Trace:
_______________________________________
  0  (car (cxr a d r '(1 2 3 4)))
        [unknown location]
  1  (eval expr env)
        at "/usr/share/gauche-0.98/0.9.12/lib/gauche/interactive.scm":336

And HEAD:

gosh$ (c*r aad '(1 2 3 4))
aad
*** ERROR: pair required, but got 2
Stack Trace:
_______________________________________
  0  (car (cxr a d r '(1 2 3 4)))
        at "/home/shiro/src/Gauche/test/macro-source-info.scm":15
        expanded from (cxr a a d r '(1 2 3 4))
        at "/home/shiro/src/Gauche/test/macro-source-info.scm":60
        expanded from (quasirename rename `(begin (print ',xs) (cxr ,@cs r  ...
        at "/home/shiro/src/Gauche/test/macro-source-info.scm":57
  1  (eval expr env)
        at "/home/shiro/src/Gauche/src/../lib/gauche/interactive.scm":354

Now, if you're user of syntax-case or syntax-rules, there's no wonder how it can be done. Macro output is constructed as syntactic objects, which can carry any sideband information. But with ER-macro, you construct the output as a simple S-expression, so it's not obviouhs where those information comes from.

Gauche has an extended pair that can carry extra information other than car and cdr. Those sideband data isn't visible as far as you're treating it as a pair, nor it affects equal?-ity of the pairs. Source code information is stored there by read procedure and its families.

gosh$ (read-from-string "(a b c d)")
(a b c d)
gosh$ (pair-attributes *1)
((source-info "(input string port)" 1))

If you consturct lists with cons or list, those information won't be attached. However, quasirename does the trick. It extracts the original source info from the input, and re-attaches it to the constructed form.

Note that, for a macro expander, we need to consider two kinds of source information: One is of the macro definition, and another is of the macro input. The source info of the macro definition is available through the argument of quasirename. But how can it get the macro input information? The macro input is already deconstructed by the time quasirename is called.

We use another sideband mechanism, procedure tags. Srfi-229 defines a general mechanism to attach an arbitrary Scheme object to a procedure. Gauche has more general mechanism (although not documented yet) that a procedure can have multiple tags, and the macro input is attached to the rename procedure as one of such tags. Then quasirename extracts that information from the rename procedure and applies it to the output.

In the definition of quasirename, the output construction code looks like this:

;; in src/libmacro.scm

   (if-let1 si (pair-attribute-get objs 'source-info #f)
     (let1 orig (assoc-ref ((with-module gauche.internal %procedure-tags-alist) r)
                           'macro-input)
       `(,extended-cons. ,xx ,yys '((source-info ,@si)
                                    ,@(cond-list
                                       [orig `(original . ,orig)]))))
     `(,cons. ,xx ,yys)))))

The pair attribute source-info holds the source info of macro definition, and original holds the macro input form.

The disadvantage of having source info in the sideband data of pairs is, obviously, that you can't attach source info to other objects than pairs. I find it not a big issue in practice, for most expressions that need attention are function calls, macro calls or special forms.

On the other hand, it has an advantage that quoted literal lists can have source code information. It can't be done with syntax objects, for quote strips any syntax wrappings. It is handy when you put a literal nested structure as DSL and let its walker signals an error with the location of the literal structure.

This is a desired feature and I'm happy to have it. However, I'm feeling a bit of ambivalence, too.

The reason I prefer ER-macro to syntax-case is that ER-macro is explicit---input and output are raw S-expression which you can direclty touch and rearrange. With syntax-case, things are wrapped in opaque syntax object, and even though you can unwrap and rewrap the objects, that opaqueness bothers me.

However, with this quasirename modification, I did introduce an implicit operation; even though the output of quasirename can be treated as an ordinary S-expression, it does more to it than just consing.

If I feel comfortable with this, maybe I can also feel comfortable with syntax-case, too. I don't know yet. Let's see.

Tags: Macro,, 0.9.13,, quasirename

Using Gauche in GitHub Actions

2022-06-17T10:56:36+00:00

Using Gauche in GitHub Actions

I created a GitHub action to install Gauche in the runner, so that you can use Gauche in subsequent steps: setup-gauche. Currently the action works on Linux and OSX.

To use the action, simply say uses: shirok/setup-gauche@v3 in your job steps (check the latest version number in the setup-gauche page). The following is an excerpt of .github/workflow/main.yml of Gauche-mecab:

jobs:
  build-and-test-linux:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
    - uses: actions/checkout@v3
    - uses: shirok/setup-gauche@v3
    - name: Install dependencies
      run: |
        sudo apt install -y libmecab-dev mecab-ipadic-utf8
    - name: Build and check
      run: |
        ./configure
        make
        make -s check
        make -s check-dep

Gauche is installed in standard path (/usr on Linux, /usr/local on OSX) so that you can build Gauche extensions or run Gauche applications without any extra settings.

By default, it installs the latest release. You can choose a specific version of Gauche to install via gauche-version input parameter; specifically, saying 'snapshot' installs the latest snapshot (prerelease) build, if there's any newer than the latest release.

Tags: github, CI

Running gosh without installing

2022-04-02T00:44:54+00:00

Running gosh without installing

Recently I wrote some test scripts in Gauche for a project that didn't use Gauche in particular. I could've kicked get-gauche script during make check to install Gauche locally as needed, but that seemed a bit of overkill, especially it was just for small test scripts.

Then I thought, well, I already have a Docker image. If I can feed a local script to it...

So here it is. I included in a Docker image a small script gosh-script, which chdirs into /home/app and run gosh. If you mount local cwd on /home/app, the scripts, libraries and data in it are all visible to gosh in the Docker:

docker run --rm -ti -v `pwd`:/home/app practicalscheme/gauche gosh-script TEST-SCRIPT

Or, you can use run-gosh-in-docker.sh script.

You can't acceses local resources other than the filesystem below the current directory, and you can't use extra libraries. But for the simple tasks this is enough.

See README in Gauche-docker-image for the details.

Tag: Docker

Is this an Undefined Behavior?

2022-06-04T21:58:23+00:00

Is this an Undefined Behavior?

Automated tests of Gauche HEAD on Windows platform started failing since several days ago. The log showed SHA1 digest result didn't match. It's weird, for I haven't touched that part of code for long time.

I isolated the reproducible condition. It happens with the fairly new gcc (11.2.0) with -O2. It doesn't exhibit without optimization, nor with gcc 10.2.0 or other previous versions of gcc I have.

The problematic code is Aaron D. Gifford's SHA implementation sha2.c ( http://www.aarongifford.com/computers/sha.html ). It was last updated in January 2004, so it's pretty old, but I think it's still widely used.

I narrowed down the problem to around here:

        /* Set the bit count: */
#if BYTE_ORDER == LITTLE_ENDIAN
        /* Convert FROM host byte order */
        REVERSE64(context->s1.bitcount,context->s1.bitcount);
#endif
        void *buf56 = &context->s1.buffer[56];
        *(sha_word64*)buf56 = context->s1.bitcount;

        /* Final transform: */
        SHA1_Internal_Transform(context, (sha_word32*)context->s1.buffer);

In our case, BYTE_ORDER is LITTLE_ENDIAN. REVERSE64 is a macro to swap the byte order of a 64bit word. context->s1.bitcount is uint64_t, and context->s1.buffer is an array of unsigned chars. What it does is to store 64bit-word of bitcount into the buffer from 56th octet in the network byte order, and calls SHA1_Internal_Transform.

It compiles to this code with optimization:

   25ca75e9d:   48 8b 53 18             mov    0x18(%rbx),%rdx
   25ca75ea1:   48 0f ca                bswap  %rdx
   25ca75ea4:   48 89 53 18             mov    %rdx,0x18(%rbx)
   25ca75ea8:   48 89 d9                mov    %rbx,%rcx
   25ca75eab:   4c 89 e2                mov    %r12,%rdx
   25ca75eae:   e8 8d fa ff ff          call   25ca75940

Here, %rbx contains the pointer to context, and %r12 to context->s1.buffer. The first three instructions swap the 64bit word. (By the way, REVERSE64 macro is written with shifts and bitmasks. Gcc cleverly figures out its intent and replaces the whole expression by a bswap instrucion.) The next three instruction is the calling sequence of SHA1_Internal_Transform.

Wait. There're no instructions emitted to store bitcount into *buf56. I checked the assembly after this but there're no instructions for the store either.

If I insert a dummy external function call before SHA1_Internal_Transform like this:

        /* Set the bit count: */
#if BYTE_ORDER == LITTLE_ENDIAN
        /* Convert FROM host byte order */
        REVERSE64(context->s1.bitcount,context->s1.bitcount);
#endif
        void *buf56 = &context->s1.buffer[56];
        *(sha_word64*)buf56 = context->s1.bitcount;

        puts("foo");

        /* Final transform: */
        SHA1_Internal_Transform(context, (sha_word32*)context->s1.buffer);

Then the storing to *buf56 appears (mov %rdx, 0x58(%rbx)):

   25ca75e9d:   48 8b 53 18             mov    0x18(%rbx),%rdx
   25ca75ea1:   48 0f ca                bswap  %rdx
   25ca75ea4:   48 89 53 18             mov    %rdx,0x18(%rbx)
   25ca75ea8:   48 8d 0d 8f c7 00 00    lea    0xc78f(%rip),%rcx        # 25ca8263e <.rdata+0x9e>
   25ca75eaf:   48 89 53 58             mov    %rdx,0x58(%rbx)
   25ca75eb3:   e8 60 2e 00 00          call   25ca78d18 
   25ca75eb8:   4c 89 e2                mov    %r12,%rdx
   25ca75ebb:   48 89 d9                mov    %rbx,%rcx
   25ca75ebe:   e8 7d fa ff ff          call   25ca75940

Now, accessing type punned pointer can break the strict aliasing rule. The gcc might have figured the storing into *buf56 had nothing to do with SHA1_Internal_Transform. But I feel there still needs to be a leap that it completely eliminates the store instruction.

Does *(sha_word64*)buf56 = context->s1.bitcount triggers Undefined Behavior? That's why gcc is entitled to remove that code?

Tags: gcc, Optimization, UndefinedBehavior

Better test failure report

2021-10-21T06:23:27+00:00

Better test failure report

I keep Gauche's test framework ref:gauche.test intentionally simple--a test evaluates a given expression and compares its result with the expected result; if they don't agree, reports it. That's all.

It doesn't have fancy knobs and dials, but it does the job. Fancy features can be written using Gauche's other features; e.g. if you need setup/teardown, you can just wrap tests with unwind-protect. I prefer this kind of explicit code to fat frameworks in which you need to track down its documents and (sometimes) implementation to know what exactly is done.

However, there has been one frustration: I can't easily change how the test failure is reported. Especially, when a test yields a large amount of results and it doesn't agree with expected one, it is hard to tell where is the difference, by looking at the entire expected and actual results.

Now I can have it. See the following test:

(test* "Beatrice"
       ;; expected
        '("What fire is in mine ears?  Can this be true?"
          "Stand I condemned for pride and scorn so much?"
          "Contempt, farewell, and maiden pride, adieu!"
          "No glory lives behind the back of such.")
       ;; actual
       "What fire is in mine ears?  Can this be true?\n\
        Stand I condemn'd for pride and scorn so much?\n\
        Contempt, farewell! and maiden pride, adieu!\n\
        No glory lives behind the back of such.\n"
        test-check-diff           ; check
        test-report-failure-diff) ; report

The expected text and the actual text have slight difference. This reports the difference in unified diff format.

ERROR: GOT diffs:
--- expected
+++ actual
@@ -1,4 +1,4 @@
 What fire is in mine ears?  Can this be true?
-Stand I condemned for pride and scorn so much?
-Contempt, farewell, and maiden pride, adieu!
+Stand I condemn'd for pride and scorn so much?
+Contempt, farewell! and maiden pride, adieu!
 No glory lives behind the back of such.

The third argument of test* is to compare the expected and actual result. If you prepare expected text in one big string, you can just use the default one; test-check-diff adds a bit of convenience by accepting a few different formats.

The fourth argument is the main addition. It accepts a report proceudre which is called when the expected result and the actual result didn't match, with three arguments, **message**, **expected-result** and **acutual-result**. The **message** argument is the first argument passed to test*.

The test-report-failure-diff uses text.diff module to display the difference of the results in diff format (ref:text.diff).

You can customize reporting as you wish. Another custom reporting we'd like to have is to show difference of tree structures.

Please refer to the manual for the details. (Before releasing 0.9.11, you can view the draft document.

Tags: 0.9.11, Testing, text.diff

Negative zero

2021-08-26T19:15:44+00:00

Negative zero

Gauche supports IEEE754 negative zero -0.0. It simply wraps an IEEE754 double as a scheme object, so mostly it just works as specified in IEEE754 (and as supported by the underlying math library). Or, so we thought.

Let's recap the behavior of -0.0. It's numerically indistinguishable from 0.0 (so, it is not "an infinitely small value less than zero"):

(= -0.0 0.0) ⇒ #t

(< -0.0 0.0) ⇒ #f

(zero? -0.0) ⇒ #t

But it can make a difference when there's a functon f(x) such that it is discontinuous at x = 0, and f(x) goes to different values when x approaches to zero from positive side or negative side.

(/ 0.0)  ⇒ +inf.0
(/ -0.0) ⇒ -inf.0

For arithmetic primitive procedures, we simply pass unboxed double to the underlying math functions, so we didn't think we need to handle -0.0 specially.

The first wakeup call was this article via HackerNews:

One does not simply calculate the absolute value

It talks about writing abs in Java, but every time I saw articles like this I just try it out on Gauche, and alas!

;; Gauche 0.9.10
(abs -0.0) ⇒ -0.0    ; Ouch!

Yeah, the culprit was the C implementation of abs, the gist of which was:

   if (x < 0.0) return -x;
   else return x;

-0.0 doesn't satisfy x < 0.0 so it was returned without negation.

The easy fix is to use signbit.

   if (signbit(x)) return -x;

I reported the fix on Twitter, then somebody raised an issue: What about (eqv? -0.0 0.0)?

My initial reaction was that it should be #t, since (= -0.0 0.0) is #t. In fact, R5RS states this:

The eqv? procedure returns #t if: ... obj1 and obj2 are both numbers, are numerically equal (see = ...), and are either both exact or both inexact.

However, I realized that R7RS has more subtle definition.

The eqv? procedure returns #f if: ... obj1 and obj2 are both inexact numbers such that either they are numerically unequal (in the sense of =), or they do not yield the same results (...) when passed as arguments to any other procedure that can be defined as a finite composition of Scheme's standard arithmetic procedures, ...

Clearly, -0.0 and 0.0 don't yield the same results when passed to /, so it should return #f. (It is also mentioned in 6.2.4 that -0.0 is distinct from 0.0 in a sense of eqv?.)

Fix for this is a bit involved. When I fixed eqv?, a bunch of tests started failing. It looks like some inexact integer division routines in the tests yield -0.0, and are compared to 0.0 with equal?, which should follow eqv? if arguments are numbers.

It turned out that the root cause was rounding primitives returning -0.0:

;; Gauche 0.9.10
(ceiling -0.5) ⇒ -0.0

Although this itself is plausible, in most of the cases when you're thinking of integers (exact or inexact), you want to treat zero as zero. Certainly you don't want to deal with two distint zeros in quotients or remainders. The choices would be either leave the rounding primitives as are and fix the integer divisions, or change the rounding primitives altogether. I choose the latter.

The fixes are in the HEAD now.

;; Gauche 0.9.11
(eqv? -0.0 0.0) ⇒ #f
(ceiling -0.5) ⇒ 0.0

Tags: 0.9.11, Flonums