Gauche Devlog

2021/10/24

Is this an Undefined Behavior?

Automated tests of Gauche HEAD on Windows platform started failing since several days ago. The log showed SHA1 digest result didn't match. It's weird, for I haven't touched that part of code for long time.

I isolated the reproducible condition. It happens with the fairly new gcc (11.2.0) with -O2. It doesn't exhibit without optimization, nor with gcc 10.2.0 or other previous versions of gcc I have.

The problematic code is Aaron D. Gifford's SHA implementation sha2.c ( http://www.aarongifford.com/computers/sha.html ). It was last updated in January 2004, so it's pretty old, but I think it's still widely used.

I narrowed down the problem to around here:

        /* Set the bit count: */
#if BYTE_ORDER == LITTLE_ENDIAN
        /* Convert FROM host byte order */
        REVERSE64(context->s1.bitcount,context->s1.bitcount);
#endif
        void *buf56 = &context->s1.buffer[56];
        *(sha_word64*)buf56 = context->s1.bitcount;

        /* Final transform: */
        SHA1_Internal_Transform(context, (sha_word32*)context->s1.buffer);

In our case, BYTE_ORDER is LITTLE_ENDIAN. REVERSE64 is a macro to swap the byte order of a 64bit word. context->s1.bitcount is uint64_t, and context->s1.buffer is an array of unsigned chars. What it does is to store 64bit-word of bitcount into the buffer from 56th octet in the network byte order, and calls SHA1_Internal_Transform.

It compiles to this code with optimization:

   25ca75e9d:   48 8b 53 18             mov    0x18(%rbx),%rdx
   25ca75ea1:   48 0f ca                bswap  %rdx
   25ca75ea4:   48 89 53 18             mov    %rdx,0x18(%rbx)
   25ca75ea8:   48 89 d9                mov    %rbx,%rcx
   25ca75eab:   4c 89 e2                mov    %r12,%rdx
   25ca75eae:   e8 8d fa ff ff          call   25ca75940 <SHA1_Internal_Transform>

Here, %rbx contains the pointer to context, and %r12 to context->s1.buffer. The first three instructions swap the 64bit word. (By the way, REVERSE64 macro is written with shifts and bitmasks. Gcc cleverly figures out its intent and replaces the whole expression by a bswap instrucion.) The next three instruction is the calling sequence of SHA1_Internal_Transform.

Wait. There're no instructions emitted to store bitcount into *buf56. I checked the assembly after this but there're no instructions for the store either.

If I insert a dummy external function call before SHA1_Internal_Transform like this:

        /* Set the bit count: */
#if BYTE_ORDER == LITTLE_ENDIAN
        /* Convert FROM host byte order */
        REVERSE64(context->s1.bitcount,context->s1.bitcount);
#endif
        void *buf56 = &context->s1.buffer[56];
        *(sha_word64*)buf56 = context->s1.bitcount;

        puts("foo");

        /* Final transform: */
        SHA1_Internal_Transform(context, (sha_word32*)context->s1.buffer);

Then the storing to *buf56 appears (mov %rdx, 0x58(%rbx)):

   25ca75e9d:   48 8b 53 18             mov    0x18(%rbx),%rdx
   25ca75ea1:   48 0f ca                bswap  %rdx
   25ca75ea4:   48 89 53 18             mov    %rdx,0x18(%rbx)
   25ca75ea8:   48 8d 0d 8f c7 00 00    lea    0xc78f(%rip),%rcx        # 25ca8263e <.rdata+0x9e>
   25ca75eaf:   48 89 53 58             mov    %rdx,0x58(%rbx)
   25ca75eb3:   e8 60 2e 00 00          call   25ca78d18 <puts>
   25ca75eb8:   4c 89 e2                mov    %r12,%rdx
   25ca75ebb:   48 89 d9                mov    %rbx,%rcx
   25ca75ebe:   e8 7d fa ff ff          call   25ca75940 <SHA1_Internal_Transform>

Now, accessing type punned pointer can break the strict aliasing rule. The gcc might have figured the storing into *buf56 had nothing to do with SHA1_Internal_Transform. But I feel there still needs to be a leap that it completely eliminates the store instruction.

Does *(sha_word64*)buf56 = context->s1.bitcount triggers Undefined Behavior? That's why gcc is entitled to remove that code?

Tags: gcc, Optimization, UndefinedBehavior

2021/10/21

Better test failure report

I keep Gauche's test framework ref:gauche.test intentionally simple--a test evaluates a given expression and compares its result with the expected result; if they don't agree, reports it. That's all.

It doesn't have fancy knobs and dials, but it does the job. Fancy features can be written using Gauche's other features; e.g. if you need setup/teardown, you can just wrap tests with unwind-protect. I prefer this kind of explicit code to fat frameworks in which you need to track down its documents and (sometimes) implementation to know what exactly is done.

However, there has been one frustration: I can't easily change how the test failure is reported. Especially, when a test yields a large amount of results and it doesn't agree with expected one, it is hard to tell where is the difference, by looking at the entire expected and actual results.

Now I can have it. See the following test:

(test* "Beatrice"
       ;; expected
        '("What fire is in mine ears?  Can this be true?"
          "Stand I condemned for pride and scorn so much?"
          "Contempt, farewell, and maiden pride, adieu!"
          "No glory lives behind the back of such.")
       ;; actual
       "What fire is in mine ears?  Can this be true?\n\
        Stand I condemn'd for pride and scorn so much?\n\
        Contempt, farewell! and maiden pride, adieu!\n\
        No glory lives behind the back of such.\n"
        test-check-diff           ; check
        test-report-failure-diff) ; report

The expected text and the actual text have slight difference. This reports the difference in unified diff format.

ERROR: GOT diffs:
--- expected
+++ actual
@@ -1,4 +1,4 @@
 What fire is in mine ears?  Can this be true?
-Stand I condemned for pride and scorn so much?
-Contempt, farewell, and maiden pride, adieu!
+Stand I condemn'd for pride and scorn so much?
+Contempt, farewell! and maiden pride, adieu!
 No glory lives behind the back of such.

The third argument of test* is to compare the expected and actual result. If you prepare expected text in one big string, you can just use the default one; test-check-diff adds a bit of convenience by accepting a few different formats.

The fourth argument is the main addition. It accepts a report proceudre which is called when the expected result and the actual result didn't match, with three arguments, **message**, **expected-result** and **acutual-result**. The **message** argument is the first argument passed to test*.

The test-report-failure-diff uses text.diff module to display the difference of the results in diff format (ref:text.diff).

You can customize reporting as you wish. Another custom reporting we'd like to have is to show difference of tree structures.

Please refer to the manual for the details. (Before releasing 0.9.11, you can view the draft document.

Tags: 0.9.11, Testing, text.diff

2021/08/26

Negative zero

Gauche supports IEEE754 negative zero -0.0. It simply wraps an IEEE754 double as a scheme object, so mostly it just works as specified in IEEE754 (and as supported by the underlying math library). Or, so we thought.

Let's recap the behavior of -0.0. It's numerically indistinguishable from 0.0 (so, it is not "an infinitely small value less than zero"):

(= -0.0 0.0) ⇒ #t

(< -0.0 0.0) ⇒ #f

(zero? -0.0) ⇒ #t

But it can make a difference when there's a functon f(x) such that it is discontinuous at x = 0, and f(x) goes to different values when x approaches to zero from positive side or negative side.

(/ 0.0)  ⇒ +inf.0
(/ -0.0) ⇒ -inf.0

For arithmetic primitive procedures, we simply pass unboxed double to the underlying math functions, so we didn't think we need to handle -0.0 specially.


The first wakeup call was this article via HackerNews:

One does not simply calculate the absolute value

It talks about writing abs in Java, but every time I saw articles like this I just try it out on Gauche, and alas!

;; Gauche 0.9.10
(abs -0.0) ⇒ -0.0    ; Ouch!

Yeah, the culprit was the C implementation of abs, the gist of which was:

   if (x < 0.0) return -x;
   else return x;

-0.0 doesn't satisfy x < 0.0 so it was returned without negation.

The easy fix is to use signbit.

   if (signbit(x)) return -x;

I reported the fix on Twitter, then somebody raised an issue: What about (eqv? -0.0 0.0)?

My initial reaction was that it should be #t, since (= -0.0 0.0) is #t. In fact, R5RS states this:

The eqv? procedure returns #t if: ... obj1 and obj2 are both numbers, are numerically equal (see = ...), and are either both exact or both inexact.

However, I realized that R7RS has more subtle definition.

The eqv? procedure returns #f if: ... obj1 and obj2 are both inexact numbers such that either they are numerically unequal (in the sense of =), or they do not yield the same results (...) when passed as arguments to any other procedure that can be defined as a finite composition of Scheme's standard arithmetic procedures, ...

Clearly, -0.0 and 0.0 don't yield the same results when passed to /, so it should return #f. (It is also mentioned in 6.2.4 that -0.0 is distinct from 0.0 in a sense of eqv?.)

Fix for this is a bit involved. When I fixed eqv?, a bunch of tests started failing. It looks like some inexact integer division routines in the tests yield -0.0, and are compared to 0.0 with equal?, which should follow eqv? if arguments are numbers.

It turned out that the root cause was rounding primitives returning -0.0:

;; Gauche 0.9.10
(ceiling -0.5) ⇒ -0.0

Although this itself is plausible, in most of the cases when you're thinking of integers (exact or inexact), you want to treat zero as zero. Certainly you don't want to deal with two distint zeros in quotients or remainders. The choices would be either leave the rounding primitives as are and fix the integer divisions, or change the rounding primitives altogether. I choose the latter.


The fixes are in the HEAD now.

;; Gauche 0.9.11
(eqv? -0.0 0.0) ⇒ #f
(ceiling -0.5) ⇒ 0.0

Tags: 0.9.11, Flonums

2021/06/09

Cookbook: Running commands remotely

(I frequently write throw-away scripts in Gauche, and it occurred to me that they can be a nice source of cookbook recipe. I'll write them up as I come to useful snippets.)

With run-process or do-process, you can invoke external commands (ref:gauche.process). One of their interesting feature is that you can run the commands on a remote host, if you have ssh access with public-key authentication to it.

Just add :host keyword argument.

(do-process '(ls) :host "remote.example.com")
 ;=> you see listing of your home directory at remote.example.com

Stdio is forwarded to the local process, so as process exit status. The :directory keyword argument works, though it is relative to your home directory of the remote machine. So, you can mix local execution and remote execution pretty much seamlessly.

The following snippet pushes local commits to the repo, pulls them on the remote machine, rebuild and restart the service. The return value of do-process is a boolean indicating command success or failure, so combining with and is like && in shell scripts.

(and (do-process '(git push) :directory *local-dir*)
     (do-process '(git pull) :host *host* :directory *remote-dir*)
     (do-process '(make)     :host *host* :directory *remote-dir*)
     (do-process '(make restart) :host *host* :directory *remote-dir*))

Tags: Cookbook, gauche.process

2021/05/31

Functional and linear-updating interfaces

Recent trend of data structure SRFIs is to provide two flavors of updating procedures:

  • Functional updaters never mutate the input structure, and always return a newly allocated structure.
  • Linear updaters are allowed to reuse the storage of input structure to produce the output, given that the caller guarantees the input structure will never be used.

Functional interface has a good nature--it won't create hidden dependencies thus the code is easy to reason about. It also plays nicely with concurrent execution, for you don't need to worry that your operations step on other threads' toes.

Linear updating interface gives the user to express opportunities of optimization. The implementation can take advantage of it to reduce allocations.

So, it appears to be a nice combination---except that, I think, the way they are currently specified is actually pulling each one's leg and reducing their merits.


Performance-sensitive users often frown on functional data structures, for they seem to copy everything every time. "It won't be that bad," functionally-minded users replies, "for it is often the case that the input structure and the updated structure can share most of their internals; the updated structure just allocates enough to store the updated parts. In the extreme case, the updater can just return the input as is, when it finds out the structure isn't altered at all (e.g. adjoining an item to a set that already has the item). The beauty of functional programming is that nobody cares whether it is shared or not---only the content matters."

It is true if everything is written functionally. However, to use the linear-updating interface, the caller needs to know that the structure to pass in isn't shared. If the functional interface may return a (partially) shared structure, it's hard to guarantee the "no-share" condition. Thus, SRFI states the functional interface always copies the input, even if there's no change at all. It can't take advantage of partial sharing as well, if the linear-updating version mutates internal structure.

This takes out the opportunity of optimization in the functional interface. The implementation needs to choose either (1) makes a slow functional version, in order to provide an efficient linear-updating version, or (2) makes a linear-updating version not mutate the input at all, and put functional optimizations.


I think we can do better. One idea is this:

  • The data structure has a mutability flag internally.
  • The functional interface always returns immutable data. It may return the input as is, or return a structure that partially shares the input, if the input structure is flagged immutable. If the input is flagged mutable, it always returns a fleshly copied immutable structure.
  • The linear-updating interface may mutate the input structure if it is flagged mutable, and copies if the input structure is flagged immutable.

If the SRFI does not provide an explicitly-mutating interface, it is actually almost indistinguishable from the existing SRFI spec, except when you compare input and output structures with eq?.

Given that explicitly-mutating interfaces (such as vector-set!) aren't popular in the SRFIs, I think it's good to allow the implementation to take the latter choice.


Discussion on srfi-discuss

Tags: srfi, Immutability, DataStructures

More entries ...