2021/10/24
Is this an Undefined Behavior?
Automated tests of Gauche HEAD on Windows platform started failing since several days ago. The log showed SHA1 digest result didn't match. It's weird, for I haven't touched that part of code for long time.
I isolated the reproducible condition. It happens with the fairly
new gcc (11.2.0) with -O2. It doesn't exhibit without optimization,
nor with gcc 10.2.0 or other previous versions of gcc I have.
The problematic code is Aaron D. Gifford's SHA implementation sha2.c ( http://www.aarongifford.com/computers/sha.html ). It was last updated in January 2004, so it's pretty old, but I think it's still widely used.
I narrowed down the problem to around here:
/* Set the bit count: */
#if BYTE_ORDER == LITTLE_ENDIAN
/* Convert FROM host byte order */
REVERSE64(context->s1.bitcount,context->s1.bitcount);
#endif
void *buf56 = &context->s1.buffer[56];
*(sha_word64*)buf56 = context->s1.bitcount;
/* Final transform: */
SHA1_Internal_Transform(context, (sha_word32*)context->s1.buffer);
In our case, BYTE_ORDER is LITTLE_ENDIAN. REVERSE64
is a macro to swap the byte order of a 64bit word. context->s1.bitcount
is uint64_t, and context->s1.buffer is an array of unsigned chars.
What it does is to store 64bit-word of bitcount into the buffer
from 56th octet in the network byte order, and calls
SHA1_Internal_Transform.
It compiles to this code with optimization:
25ca75e9d: 48 8b 53 18 mov 0x18(%rbx),%rdx 25ca75ea1: 48 0f ca bswap %rdx 25ca75ea4: 48 89 53 18 mov %rdx,0x18(%rbx) 25ca75ea8: 48 89 d9 mov %rbx,%rcx 25ca75eab: 4c 89 e2 mov %r12,%rdx 25ca75eae: e8 8d fa ff ff call 25ca75940 <SHA1_Internal_Transform>
Here, %rbx contains the pointer to context, and %r12 to
context->s1.buffer. The first three instructions swap the
64bit word. (By the way, REVERSE64 macro is written with shifts and
bitmasks. Gcc cleverly figures out its intent and
replaces the whole expression by a bswap instrucion.)
The next three instruction is the calling sequence of
SHA1_Internal_Transform.
Wait. There're no instructions emitted to store bitcount
into *buf56. I checked the assembly after this but there're no
instructions for the store either.
If I insert a dummy external function call before
SHA1_Internal_Transform like this:
/* Set the bit count: */
#if BYTE_ORDER == LITTLE_ENDIAN
/* Convert FROM host byte order */
REVERSE64(context->s1.bitcount,context->s1.bitcount);
#endif
void *buf56 = &context->s1.buffer[56];
*(sha_word64*)buf56 = context->s1.bitcount;
puts("foo");
/* Final transform: */
SHA1_Internal_Transform(context, (sha_word32*)context->s1.buffer);
Then the storing to *buf56 appears (mov %rdx, 0x58(%rbx)):
25ca75e9d: 48 8b 53 18 mov 0x18(%rbx),%rdx 25ca75ea1: 48 0f ca bswap %rdx 25ca75ea4: 48 89 53 18 mov %rdx,0x18(%rbx) 25ca75ea8: 48 8d 0d 8f c7 00 00 lea 0xc78f(%rip),%rcx # 25ca8263e <.rdata+0x9e> 25ca75eaf: 48 89 53 58 mov %rdx,0x58(%rbx) 25ca75eb3: e8 60 2e 00 00 call 25ca78d18 <puts> 25ca75eb8: 4c 89 e2 mov %r12,%rdx 25ca75ebb: 48 89 d9 mov %rbx,%rcx 25ca75ebe: e8 7d fa ff ff call 25ca75940 <SHA1_Internal_Transform>
Now, accessing type punned pointer can break the strict aliasing rule.
The gcc might have figured the storing
into *buf56 had nothing to do with SHA1_Internal_Transform.
But I feel there still needs to be
a leap that it completely eliminates the store instruction.
Does *(sha_word64*)buf56 = context->s1.bitcount triggers
Undefined Behavior? That's why gcc is entitled to remove that code?
Tags: gcc, Optimization, UndefinedBehavior

Gambiteer (2021/10/25 22:25:35):
shiro (2021/11/09 07:37:58):