Gauche Devlog

2013/01/28

Curse of extended floating point arithmetic

In hindsight, Intel's decision to use 80-bit extended-precision arithmetic internally in the x87 was unfortunate. 64-bit double precision became pervasive, and the curse of double rounding has bitten a number of careless programmers.

And it finally bit me when I was implementing a more reasonable flonum-to-rational conversion. To be precise, the issue had been there all along; the new feature just gave me a concrete example that exhibits it.

★ ★ ★

The new exact procedure tries to find the simplest rational number within the range a flonum covers. With that scheme, 1.0000000000000002 (the "next" floating point number above 1.0 in double precision) becomes 3002399751580332/3002399751580331.

gosh> (exact 1.0000000000000002)
3002399751580332/3002399751580331

However, users on i686 systems reported that converting the rational back to a flonum yielded 1.0000000000000004, that is, one unit greater than the original number.

Let's see exactly what was happening. First, save the big numerator and denominator in variables for convenience.

gosh> (define x 3002399751580332)
x
gosh> (define y 3002399751580331)
y

When we divide x by y with double-precision arithmetic, what do we get?

gosh> (quotient&remainder (* x (expt 2 52)) y)
4503599627370497
1501199875790165
gosh> (* 1501199875790165 2)
3002399751580330  ; smaller than y=3002399751580331

Since twice the remainder is slightly less than the divisor (that is, the remainder is just under half the divisor), we round down, and the mantissa will be 4503599627370497, or, in binary:

1000...0001
          ^53rd bit

In a normalized double-precision number, the leading 1 (the MSB) is hidden, and the remaining 52 bits fit into the 52-bit mantissa field.
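
Scaling that mantissa back by 2^-52 confirms it is exactly the flonum we wanted; both factors below are exactly representable, so the multiplication itself introduces no error:

gosh> (* 4503599627370497 (expt 2.0 -52))
1.0000000000000002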

The fact that the remainder is so close to half the divisor makes this division susceptible to rounding errors. Let's see what happens with extended-precision division. The 80-bit extended floating point format does not use a hidden bit, and its fraction part is 63 bits wide.

gosh> (quotient&remainder (* x (expt 2 63)) y)
9223372036854778879
3002399751579307

The quotient, in binary notation, is as follows:

1000...000101111111111
          ^53rd bit

However, the remainder is greater than half of the divisor, so the result is rounded up in the FPU register:

1000...000110000000000
          ^53rd bit

When this extended-precision number is stored into a double-precision variable, the bits beyond the 53rd are rounded off. Since the value falls exactly halfway between two double-precision numbers, the "round-to-even" rule is applied, and the result becomes...

1000...0010
          ^53rd bit

Alas, the trap of double-rounding.
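
We can reproduce that second rounding with exact arithmetic: after the first rounding, the extended register holds 1 + 3*2^-53, which sits exactly halfway between the two neighboring doubles, so round-to-even picks the larger one.

gosh> (exact->inexact (+ 1 (* 3 (expt 2 -53))))
1.0000000000000004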

★ ★ ★

We can set up the control register to force the FPU to round to double precision for every operation. The way to do so depends on the OS, though, so I had been avoiding it. This time there seemed to be no choice but to use it. I could have carried out the division in bignums, as shown above, and manually fitted the calculated 52 bits into the flonum's mantissa field, but that would be a huge performance loss, given that in almost all cases the FPU yields the correct result.
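
For the record, here is roughly what that bignum route would look like. This is only a sketch, not the actual implementation; it assumes the quotient x/y lies in [1, 2), and exact-div->flonum is a made-up name.

;; Divide in exact integers, round the 53-bit mantissa by hand
;; (round to nearest, ties to even), then scale back to a flonum.
(define (exact-div->flonum x y)
  (receive (q r) (quotient&remainder (* x (expt 2 52)) y)
    (let* ([r2 (* 2 r)]
           [q  (cond [(> r2 y) (+ q 1)]                  ; round up
                     [(< r2 y) q]                        ; round down
                     [else (if (even? q) q (+ q 1))])])  ; exact tie: to even
      (* q (expt 2.0 -52)))))

gosh> (exact-div->flonum 3002399751580332 3002399751580331)
1.0000000000000002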

I tried to find a way to test whether the input falls into this kind of special case---if I could do that cheaply, I'd apply the test first and do the expensive calculation only in the special cases---but I couldn't find one. I ended up adding special C preprocessor macros in src/gauche/float.h to set up the FPU control register depending on the OS.

Tags: Flonums, Rational, 0.9.4, exact, inexact

2013/01/24

Searching multibyte strings

While integrating a patch in string.c, I remembered a couple of optimizations to string-scan I had planned to add but had long forgotten.

The current code uses Boyer-Moore if both the source string and the search string are single-byte, but falls back to a naive search method when either one is multibyte. That's because, if the internal encoding is EUC-JP or Shift_JIS, a bytewise scan could lead to a spurious match (a match that crosses character boundaries). However, utf-8 is designed in a way that such spurious matches are impossible, so I just made it always use Boyer-Moore search.

Another possible optimization is for the case when the search string has length 1. It may seem that you'd rather search for the character instead of a string of length 1 in such a case. Unfortunately, one character may occupy multiple bytes, so the current "search a character in a string" code naively converts it into a search for a length-1 string. I changed it to handle the single-byte case specially.
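
For reference, string-scan returns the character index of the first occurrence (or #f when there is none), whichever search path is taken internally. The two optimized cases look like this:

gosh> (string-scan "abcあいうえおxyz" "うえ")  ; multibyte needle
5
gosh> (string-scan "abcあいうえおxyz" "x")     ; length-1, single-byte needle
8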

The optimization is pretty effective. The following simple benchmark searches for multibyte and single-byte strings within a long multibyte string. (It shows a slowdown when the multibyte match is at the beginning of the source string, since the optimized version has the overhead of setting up the Boyer-Moore skip table. But in such cases the search returns quickly, so the degradation won't be a problem.)

Optimized:

last-mb    #<time-result 100000 times/ 17.202 real/ 17.170 user/  0.040 sys>
middle-mb  #<time-result 100000 times/  8.700 real/  8.700 user/  0.000 sys>
first-mb   #<time-result 100000 times/  0.023 real/  0.020 user/  0.000 sys>
last-sb    #<time-result 100000 times/  8.827 real/  8.830 user/  0.000 sys>
middle-sb  #<time-result 100000 times/  4.530 real/  4.520 user/  0.000 sys>
first-sb   #<time-result 100000 times/  0.011 real/  0.020 user/  0.000 sys>

Original:

last-mb    #<time-result 100000 times/ 29.267 real/ 29.240 user/  0.020 sys>
middle-mb  #<time-result 100000 times/ 14.604 real/ 14.590 user/  0.010 sys>
first-mb   #<time-result 100000 times/  0.014 real/  0.020 user/  0.000 sys>
last-sb    #<time-result 100000 times/ 26.303 real/ 26.290 user/  0.020 sys>
middle-sb  #<time-result 100000 times/ 13.149 real/ 13.140 user/  0.000 sys>
first-sb   #<time-result 100000 times/  0.013 real/  0.010 user/  0.000 sys>

Tags: string-scan, Strings

2013/01/23

Fun with primes

Every year, during the New Year's holidays, I tend to play with writing algorithms that aren't directly related to work. (In the Japanese tradition, we do calligraphy at the beginning of each year, called 書き初め (first writing). Mine is a bit like that---first coding.)

Last year I played with procedural modeling using Markov-chain Monte Carlo (WiLiKi:Gauche:MetropolisProceduralModeling). Initially I planned to pick some graphics-related papers again. However, @cddddr wrote a nice entry on generating an infinite prime sequence taking advantage of Gauche's generators, and I thought that would be something nice to have in Gauche, so I decided to add a math.prime module and throw in some utilities related to prime numbers.

The first thing the module offers is an infinite lazy sequence of prime numbers, *primes*. I optimized the original code by @cddddr a bit, notably to avoid using continuations; currently it can generate the first 10^7 primes in 14 seconds on a 2.4GHz Core2 machine.

gosh> (time (list-ref *primes* #e1e7))
;(time (list-ref *primes* 10000000))
; real  14.297
; user  14.060
; sys    0.240
179424691

Primality tests also belong in the math.prime module, right? I only knew the Miller-Rabin probabilistic primality test, so I googled a bit to hunt for other options. I found a few interesting things.

  • The Miller-Rabin test doesn't need to be probabilistic. Up to certain bounds it is known to give a definite answer with selected "witness" values. Wikipedia lists 341550071728321---for a number below that, we can say definitely whether it's a prime or not by using only the first 7 primes (2 to 17) as witnesses (see the sketch after this list). The bound can be pushed higher with more witnesses, though I found that the BPSW test below becomes faster than 8 or more rounds of Miller-Rabin.
  • The Baillie-PSW primality test is, like Miller-Rabin, a test that may report a pseudoprime as a prime, but it has been verified that no such pseudoprime exists below 2^64, according to the page (the author says an independent check is needed, though). So within that range it can be used as a deterministic test.
  • Then there's the AKS primality test, the fastest known deterministic primality test.
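
Here is a minimal sketch of that deterministic Miller-Rabin variant (not the actual math.prime code): it writes n-1 = d * 2^s with d odd and runs the strong test for each of the witnesses 2..17, so it is only meaningful below 341550071728321. It relies on Gauche's expt-mod for modular exponentiation.

(use srfi-1)  ; any, every

(define (mr-prime? n)
  (define witnesses '(2 3 5 7 11 13 17))
  (define (strong-probable-prime? a)
    ;; write n-1 = d * 2^s with d odd, then run the strong test for base a
    (let loop ([d (- n 1)] [s 0])
      (if (even? d)
        (loop (quotient d 2) (+ s 1))
        (let ([x (expt-mod a d n)])
          (or (= x 1) (= x (- n 1))
              (let square ([x x] [i 1])
                (and (< i s)
                     (let ([x2 (expt-mod x 2 n)])
                       (cond [(= x2 (- n 1)) #t]
                             [(= x2 1) #f]
                             [else (square x2 (+ i 1))])))))))))
  (cond [(< n 2) #f]
        [(memv n witnesses) #t]                                ; small primes
        [(any (lambda (p) (zero? (modulo n p))) witnesses) #f] ; small factors
        [else (every strong-probable-prime? witnesses)]))

gosh> (mr-prime? 179424691)
#t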

I implemented the Miller-Rabin and BPSW tests but haven't done the AKS test; I ran out of time.

What else would be nice to have? How about prime factorization? This, again, is a very interesting topic with various algorithms to explore. I found this answer on StackOverflow and used it as a starting point. I ended up writing a Monte-Carlo factorization (Brent's modification of Pollard's rho algorithm), as well as the naive "divide by every prime" approach.
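
mc-factorize uses Brent's variant; just to illustrate the idea, here is a bare-bones version of the classic Pollard's rho with Floyd cycle detection (rho-factor is a made-up name, not the module's API):

;; Returns one non-trivial factor of a composite n, or #f if this run
;; happens to fail (in which case you retry with a different c).
(define (rho-factor n :optional (c 1))
  (define (f x) (modulo (+ (* x x) c) n))
  (let loop ([x 2] [y 2])
    (let* ([x (f x)]
           [y (f (f y))]
           [d (gcd (abs (- x y)) n)])
      (cond [(= d 1) (loop x y)]
            [(= d n) #f]
            [else d]))))

gosh> (rho-factor 8051)   ; 8051 = 83 * 97
97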

There's something fascinating about prime factorization. I found myself idly typing random numbers to factorize, again and again...

gosh> (mc-factorize 435728572856238754559)
(104579 1018217 4091958413)
gosh> (mc-factorize 984597945623658375937)
(79 12463265134476688303)

Once the number gets to around 20 decimal digits or so, Monte-Carlo factorization sometimes takes a very long time. Factoring 19740274219868223167 took several minutes.

gosh> (mc-factorize 19740274219868223167)
(2971215073 6643838879)

I ran mc-factorize on the 7th Fermat number (2^128+1) for more than a day, but it couldn't find a factor.

I thought I'd stop here, but when I tried WolframAlpha, it factorized those big numbers without breaking a sweat, in seconds. OK, I'll proceed to elliptic curve factorization... some day...

The module also provides the Jacobi symbol (used in the BPSW test) and Euler's totient function. Following Wikipedia links reveals more and more interesting functions (the Möbius function? What a cool name!), but it would be an endless process, so I had to stop, for now.

I have no immediate prospect of using this module in practical work---if you need serious computation in these fields, there already exist more general and optimized packages like PARI/GP, so I think the math.prime module has little practical importance. Nonetheless, it was fun to make these things run---and who knows, someday it might come in handy.

Tags: math.prime, Prime

2012/12/12

NUL in a string

Recently there was a discussion on the R7RS list about whether we should support the NUL character in a string (AFAIK, the resolution is that an implementation is allowed not to support NUL chars in strings, but it's OK to support them). One of the primary concerns is interoperability with the NUL-terminated string representation used by foreign libraries; it can cause security problems such as http://www.ruby-lang.org/en/news/2012/10/12/poisoned-NUL-byte-vulnerability/
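
To see why it matters, consider a hypothetical string with an embedded NUL. In Scheme the NUL is just an ordinary character:

gosh> (define fname (string-append "/tmp/safe.txt" (string (integer->char 0)) ".evil"))
fname
gosh> (string-length fname)
19

But a C function that treats fname as a NUL-terminated string stops at the embedded NUL, sees only "/tmp/safe.txt", and silently drops the rest.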

Gauche had the same problem, and I recently fixed it (commit abca7b2). I addressed it at the C calling interface, where I had two choices---either throw an error when Scm_GetStringConst is applied to a Scheme string containing NUL, or keep the existing function as it is and provide an additional function that performs the check.

The former would make the check exhaustive, but it could break existing code that intentionally passes a character array containing NUL in the middle. As an old-school C programmer I have written such code---sometimes as an ad-hoc way of passing a struct to ioctl---and I do remember a weird API that took an array of strings as "each string is separated by a NUL byte, and the end of the array is marked by two consecutive NUL bytes".

So I chose the latter---I added a new 'safe' version of the string conversion and changed a bunch of system call functions to use it.

Today I stumbled upon Peter Bex's article Lessons learned from NUL byte bugs, and it made me change my mind. The cases where I need to pass a char array with NUL in the middle are much, much rarer than passing ordinary C strings, and even when they arise, we can provide a special API for those rare cases while making the default API safe.

The fix can break backward compatibility, but I expect that to be very unlikely. If you know your code passes a character array containing NUL as a string, let me know.

Tag: string

2012/09/30

(exact (/ 3.0)) => ?

Since Gauche's floating point numbers (flonums) have limited precision, you can always calculate the exact number the flonum represents. If a flonum d is finite, it can be represented as follows:

 d = s * m * 2^e
    
 where s : 1 or -1
       m,e : integer

If e >= 0, d has no fractional part. If e < 0, we can have an exact rational number for d as follows:

 (s * m) / 2^(-e)
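
To make this concrete, here is a small sketch of that conversion, assuming Gauche's decode-float, which returns a vector #(m e s) of exact integers for a finite flonum (flonum->exact-rational is a made-up name, not the built-in):

(define (flonum->exact-rational d)
  (let* ([v (decode-float d)]
         [m (vector-ref v 0)]   ; mantissa
         [e (vector-ref v 1)]   ; exponent
         [s (vector-ref v 2)])  ; sign
    (if (>= e 0)
      (* s m (expt 2 e))
      (/ (* s m) (expt 2 (- e))))))

gosh> (flonum->exact-rational 0.1)
3602879701896397/36028797018963968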

As of Gauche 0.9.3.3, exact works exactly like this. This is correct, but it has an annoying behavior.

;; Gauche 0.9.3.3
gosh> (exact (/ 3.0))
6004799503160661/18014398509481984
gosh> (exact 0.1)
3602879701896397/36028797018963968

Neither 1/3 nor 1/10 can be represented exactly by a flonum; internally we use the closest flonum, whose exact value happens to be such a gigantic rational number.

But does that make sense? An inexact number is, after all, inexact. We can regard a flonum d as an approximation of any real number within a certain range around d. There are infinitely many rationals in that range. Why don't we pick the simplest rational among them?

(Since I sometimes see the argument that floating point numbers are inexact, I'm tempted to add a note. A floating point number itself can be either inexact or exact; there's nothing inherently inexact in the floating point representation. It is the operations on them, in general, that are inexact (when rounding, overflow, or underflow occurs). If we used "exact floating point numbers", then the gigantic rational representation above would be the correct answer. It is just that Gauche uses the floating point representation for inexact numbers, which allows us to choose an alternative rational representation.)

We already do a similar sort of thing when printing flonums; we choose the shortest decimal representation that uniquely identifies the given flonum.

Now that we have rationalize, we can use the same algorithm to find the simplest rational number within the precision of a flonum. We can't use rationalize directly, though, since the error range is more complicated.
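
For comparison, this is what rationalize itself does with an explicit tolerance: it picks the simplest rational within plus or minus the given error bound.

gosh> (rationalize 3/10 1/10)
1/3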

In general, any real number r that falls within the following range will be represented by the flonum d:

 d - q/2  <   r <  d + q/2   ; m is odd
 d - q/2  <=  r <= d + q/2   ; m is even
 
 where d = s * m * 2^e
   and q = 2^e

That is, q is the least quantum representable with the current exponent.

Since Gauche adopts the round-to-even rule, a value that falls exactly halfway between two floating point numbers belongs to the even flonum---the one whose mantissa LSB is zero.

This alone makes rationalize fall short for our purpose, since it always considers a closed range.
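
As a quick sanity check of the range, take d = 1.0000000000000002, whose exact value is 1 + 2^-52 and whose mantissa is odd (so the range is open), with q = 2^-52. Rationals strictly inside the range read back as the same flonum:

gosh> (exact->inexact (+ 1 (expt 2 -52) (* 1/4 (expt 2 -52))))
1.0000000000000002
gosh> (exact->inexact (- (+ 1 (expt 2 -52)) (* 1/4 (expt 2 -52))))
1.0000000000000002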

There are also two special cases we need to consider. If a flonum is a power of two, the flonums below it are more "dense" than usual, so the error range on the lower side is smaller:

 d - q/4  <= r <= d + q/2

Also, if d falls in the denormalized range, q is fixed to the smallest denormalized number.

So we implemented a new procedure, real->rational, which takes individual upper and lower error boundaries and a flag indicating whether the range is open or closed. Then rationalize can be written on top of it. Our new exact and inexact->exact call real->rational when the argument is an inexact real. Here's what you get in HEAD:

gosh> (exact (/ 3.0))
1/3
gosh> (exact 0.1)
1/10

If you need the exact rational representation of d itself, you can call real->rational with zero error tolerance.

gosh> (real->rational (/. 3) 0 0)
6004799503160661/18014398509481984
gosh> (real->rational 0.1 0 0)
3602879701896397/36028797018963968

Tags: Flonums, 0.9.4, exact, inexact
