Gauche Devlog

< Searching multibyte strings | New REPL >


Curse of extended floating point arithmetic

In hindsight, Intel's decision of having 80bit extended-precision arithmetic internally in x87 was unfortunate. 64bit double-precision became pervasive, and the curse of double-rounding has bitten number of careless programmers.

And it finally bit me when I was implementing more reasonable flonum to rational conversion. To be precise, the issue had been already there, but that feature made me find a concrete example to exhibit the issue.

★ ★ ★

The new exact procedure tries to find the simplest rational number within the range where a flonum covers. With that scheme, 1.0000000000000002 (the "next" floating point number above 1.0 in double-precision flonum) becomes 3002399751580332/3002399751580331.

gosh> (exact 1.0000000000000002)

However, users on i686 systems reported that converting the rational back to flonum yielded 1.0000000000000004. That is, 1 unit greater than the original number.

Let's see what was happening, exactly. First, save these big numerator and denominator in variables for the convenience.

gosh> (define x 3002399751580332)
gosh> (define y 3002399751580331)

When we divide x by y with double precision arithmetic, what will we get?

gosh> (quotient&remainder (* x (expt 2 52)) y)
gosh> (* 1501199875790165 2)
3002399751580330  ; smaller than y=3002399751580331

Since the remainder is slightly less than the divisor, we will round down the result and the mantissa will be 4503599627370497, or, in binary:

          ^53th bit

In normalized double precision number, the 1 in MSB is hidden, and the rest of 52 bits fit into the 52bit mantissa field.

The fact that the remainder is so close to the divisor makes this division susceptible to the rounding errors. Let's see the extended precision division. 80-bit extended floating point number does not use hidden bit and the fraction part is 63 bit wide.

gosh> (quotient&remainder (* x (expt 2 63)) y)

The quotient, in binary notation, is as follows:

          ^53th bit

However, the remainder is greater than the half of divisor, so it is rounded up in the FPU register:

          ^53th bit

When this extended number is stored into a double-precision floating point variable, the bits beyond 53-th one is rounded. since it falls exactly the half point of two double-precision floating numbers, "round-to-even" rule is applied, and the result becomes...

          ^53th bit

Alas, the trap of double-rounding.

★ ★ ★

We can set up the control register to force the FPU to round at double-precision for every operation. The way to do so depends on OS, though, so I've been avoided it. This time there seemed no choice but using it---I could've carried out the division in bignum, as we shown above, and manually fitted the calculated 52bits into the flonum's mantissa field, but it would be huge performance loss, given that almost all cases FPU would yield correct results.

I tried to find a way to test if the input falls in this kind of particular cases---if I could do that cheaply, I'd first apply the test then would do the expensive calculation only in the special cases--- but couldn't find any. I ended up having special C preprocessor macros in src/gauche/float.h to set up FPU registers depending on OSes.

Tags: Flonums, Rational, 0.9.4, exact, inexact

Post a comment