2013/01/23
Fun with primes
Every year during the New Year holidays, I tend to play with writing algorithms that aren't directly related to work. (In Japanese tradition, we do calligraphy at the beginning of each year, called 書き初め (first writing). Mine is a bit like that---first coding.)
Last year I played with procedural modeling using Markov-chain Monte Carlo
(WiLiKi:Gauche:MetropolisProceduralModeling). Initially I planned to pick some graphics-related papers again. However, @cddddr wrote a nice entry on generating an infinite prime sequence, taking advantage of Gauche's generators, and I thought that'd be something nice to have in Gauche, so I decided to add a math.prime module and throw in some utilities related to prime numbers.
The first thing the module offers is an infinite lazy sequence of prime numbers, *primes*. I optimized the original code by @cddddr a bit, notably to avoid using continuations; currently it can generate the first 10^7 primes in 14 seconds on a 2.4GHz Core2 machine.
gosh> (time (list-ref *primes* #e1e7))
;(time (list-ref *primes* 10000000))
; real  14.297
; user  14.060
; sys    0.240
179424691
Primality tests also belong in the math.prime module, right?
I only knew the Miller-Rabin probabilistic primality test, so I googled a bit to hunt for other options. I found a few interesting things.
- The Miller-Rabin test doesn't need to be probabilistic. Up to certain bounds it is known to give a definite answer with selected "witness" values. Wikipedia lists the bound 341550071728321---for any number below that, we can say definitely whether it's prime or not using only the first 7 primes (2 to 17) as witnesses. Increasing the number of witnesses raises this bound, though I found that the BPSW test below becomes faster than 8 or more rounds of Miller-Rabin.
- The Baillie-PSW primality test is, like Miller-Rabin, a test that may report a pseudoprime as a prime, but it has been verified that no such pseudoprime exists below 2^64, according to the page (the author of the page says an independent check is needed, though). So up to that range it can be used as a deterministic test.
- Then there's the AKS primality test, the first deterministic polynomial-time primality test.
I implemented the Miller-Rabin and BPSW tests, but haven't done the AKS test; I ran out of time.
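For reference, here's a minimal sketch of the deterministic use of Miller-Rabin described above. The names (small-prime? and friends) are made up here, and the actual math.prime code is more elaborate.

(use srfi-1)                          ; every

;; b^e mod m by repeated squaring
(define (expt-mod b e m)
  (let loop ((b (modulo b m)) (e e) (r 1))
    (cond ((zero? e) r)
          ((odd? e)  (loop (modulo (* b b) m) (quotient e 2) (modulo (* r b) m)))
          (else      (loop (modulo (* b b) m) (quotient e 2) r)))))

;; one Miller-Rabin round on odd n with witness a; #t means "n passes"
(define (passes-witness? n a)
  (let loop ((d (- n 1)) (s 0))
    (if (even? d)
        (loop (quotient d 2) (+ s 1))
        (let ((x (expt-mod a d n)))
          (or (= x 1) (= x (- n 1))
              (let square ((x x) (i 1))
                (and (< i s)
                     (let ((x2 (modulo (* x x) n)))
                       (or (= x2 (- n 1)) (square x2 (+ i 1)))))))))))

;; deterministic below 341550071728321 with the witness set 2..17
(define (small-prime? n)
  (cond ((< n 2) #f)
        ((memv n '(2 3 5 7 11 13 17)) #t)
        ((even? n) #f)
        (else (every (lambda (a) (passes-witness? n a)) '(2 3 5 7 11 13 17)))))

For example, (small-prime? 179424691) returns #t for the 10^7-th prime shown above.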
What else would be nice to have? How about prime factorization? This, again, is a very interesting topic with various algorithms to explore. I found this answer on Stack Overflow and used it as a starting point. I ended up writing Monte-Carlo factorization (Brent's modification of Pollard's rho algorithm), as well as the naive "divide by every prime" approach.
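To give the flavor of the rho idea, here's a bare-bones sketch---Floyd's cycle detection rather than Brent's refinement, and it only returns one nontrivial factor of an odd composite n; it is not the actual mc-factorize.

;; Pollard's rho: walk x_{k+1} = x_k^2 + c (mod n) at two speeds and look
;; for a collision modulo a factor of n.  Assumes n is odd and composite.
(define (pollard-rho n :optional (c 1))
  (define (f x) (modulo (+ (* x x) c) n))
  (let loop ((x 2) (y 2))
    (let* ((x (f x))
           (y (f (f y)))
           (d (gcd (abs (- x y)) n)))
      (cond ((= d n) (pollard-rho n (+ c 1)))  ; degenerate cycle; retry with another constant
            ((> d 1) d)                        ; found a nontrivial factor
            (else (loop x y))))))

For example, (pollard-rho 8051) returns 97 (8051 = 83 * 97).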
There's something fascinating about prime factorization. I found myself idly typing random numbers to factorize, again and again...
gosh> (mc-factorize 435728572856238754559)
(104579 1018217 4091958413)
gosh> (mc-factorize 984597945623658375937)
(79 12463265134476688303)
If the number is around 20 decimal digits or so, Monte-Carlo factorization sometimes takes a very long time. Factoring 19740274219868223167 took several minutes.
gosh> (mc-factorize 19740274219868223167)
(2971215073 6643838879)
I ran mc-factorize on the 7th Fermat number (2^128+1) for more than a day, but it couldn't find a factor.
I thought I'd stop here, but when I tried WolframAlpha, it factorized those big numbers without breaking a sweat, in seconds. OK, I'll proceed to elliptic curve factorization... someday...
The module also provides the Jacobi symbol (used in the BPSW test) and Euler's totient function. Following Wikipedia links reveals more and more interesting functions (Möbius function? What a cool name!), but it would be an endless process, so I had to stop for now.
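Just for fun, the totient can be computed from the factorization above; this is a sketch with a made-up name, not how the module implements it.

(use srfi-1)   ; fold, delete-duplicates

;; Euler's totient via phi(n) = n * product of (1 - 1/p) over distinct primes p
(define (my-totient n)
  (fold (lambda (p acc) (- acc (/ acc p)))
        n
        (delete-duplicates (mc-factorize n))))

For example, (my-totient 36) returns 12.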
I have no prospect of using this module in any practical work so far---if you need serious calculation in these fields, there already exist more general and optimized packages like PARI/GP, so I think the math.prime module has little practical importance. Nonetheless, it was fun to make these things run---and who knows, someday it might come in handy.
Tags: math.prime, Prime
2012/12/12
NUL in a string
Recently there was a discussion on the R7RS list about whether we should support the NUL character in a string (AFAIK, the resolution is that an implementation is allowed not to support NUL characters in strings, but it's OK to support them). One of the primary concerns is interoperability with the NUL-terminated string representation of foreign libraries; it can cause security problems such as http://www.ruby-lang.org/en/news/2012/10/12/poisoned-NUL-byte-vulnerability/
Gauche had the same problem and I recently fixed it (commit abca7b2). I addressed it at the C calling interface, where I had two choices---either throw an error when Scm_GetStringConst is applied to a Scheme string containing NUL, or keep the existing function as it is and provide an additional function that checks.
The former would make the check exhaustive, but it could break existing code that intentionally passes a character array containing NUL in the middle. As an old-school C programmer I have written such code---sometimes as an ad-hoc way of passing a struct to ioctl, and I do remember a weird API that took an array of strings as "each string is separated by a NUL byte, and the end of the array is marked by two consecutive NUL bytes".
So I chose the latter---I added a new 'safe' version of the string-converting function, and changed a bunch of system call functions to use the safe version.
Today I stumbled upon Peter Bex's article Lessons learned from NUL byte bugs, and it made me change my mind. The cases where I need to pass a char array with NUL in the middle are much, much rarer than passing C strings, and even when such a case arises, we can provide a special API for it while making the default API safe.
The fix can break backward compatibility, but I expect such breakage is very unlikely. If you know your code passes a character array containing NUL as a string, let me know.
Tag: string
2012/09/30
(exact (/ 3.0)) => ?
Since Gauche's floating point numbers (flonums) have limited precision, you can always calculate the exact number a flonum represents. If a flonum d is finite, it can be represented as follows:
d = s * m * 2^e
  where s    : 1 or -1
        m, e : integers
If e >= 0, d has no fractional part. If e < 0, we can have an exact rational number for d as follows:
(s * m) / 2^(-e)
As of Gauche 0.9.3.3, exact works exactly like this.
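For illustration, here's a rough sketch of that conversion (flonum->exact-rational is a made-up name): doubling a finite flonum is exact, so we can peel off the binary exponent until the fractional part vanishes.

(define (flonum->exact-rational d)   ; d must be a finite flonum
  (let loop ((x (abs d)) (e 0))
    (if (integer? x)
        (/ (* (if (negative? d) -1 1) (exact x))
           (expt 2 e))
        (loop (* x 2) (+ e 1)))))

Calling it on 0.1 gives exactly the gigantic rational shown below.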
This is correct, but it has annoying behavior.
;; Gauche 0.9.3.3
gosh> (exact (/ 3.0))
6004799503160661/18014398509481984
gosh> (exact 0.1)
3602879701896397/36028797018963968
Neither 1/3 nor 1/10 can be represented accurately by a flonum; internally we use the closest flonum, whose exact value is such a gigantic rational.
But does that make sense? An inexact number is, after all, inexact. We can regard a flonum d as an approximation of any real number within a certain range around d. There are an infinite number of rationals in that range. Why don't we pick the simplest rational among them?
(Since I sometimes see the argument that floating point numbers are inexact, I'm tempted to add a note. A floating point number itself can be either inexact or exact; there's nothing inherently inexact in the floating point representation. It is the operations on them that are, in general, inexact (when rounding, overflow, or underflow occurs). If we used "exact floating point numbers", then the above gigantic rational representation would be the correct answer. It is just that Gauche uses the floating point representation for inexact numbers, which allows us to choose an alternative rational representation.)
We already do a similar sort of thing when printing flonums; we choose the shortest decimal representation that uniquely identifies the given flonum.
Now that we have rationalize, we can use the same algorithm to find the simplest rational number within the precision of a flonum. We can't use rationalize directly, though, since the error range is more complicated.
In general, any real number r that falls within the following range will be represented by the flonum d:
d - q/2 <  r <  d + q/2    ; m is odd
d - q/2 <= r <= d + q/2    ; m is even
  where d = s * m * 2^e and q = 2^e
That is, q is the least quantum representable with the current exponent.
Since Gauche adopts the round-to-even rule, a value that falls exactly between two floating point numbers belongs to the even flonum---the one whose LSB of the mantissa is zero.
This alone makes rationalize fall short for our purpose, since it always considers a closed range.
There are also two special cases to consider. If a flonum is a power of two, the flonums below it are denser than usual, so the lower error range is smaller:
d - q/4 <= r <= d + q/2
Also, if d falls in the denormalized range, q is fixed to the smallest denormalized number.
So we implemented a new procedure real->rational, which can take individual upper and lower error boundaries and a flag indicating whether the range is open or not. Then rationalize can be written on top of it.
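As a rough illustration (the argument order of real->rational here is my guess, based on the example near the end of this entry), rationalize could sit on top of it like this:

;; closed range [x-e, x+e], matching the standard rationalize
(define (my-rationalize x e)
  (real->rational x e e))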
Our new exact and inexact->exact call real->rational if the argument is an inexact real. Here's what you get in HEAD:
gosh> (exact (/ 3.0))
1/3
gosh> (exact 0.1)
1/10
If you need the exact rational representation of d itself, you can call real->rational with zero error tolerance.
gosh> (real->rational (/. 3) 0 0)
6004799503160661/18014398509481984
gosh> (real->rational 0.1 0 0)
3602879701896397/36028797018963968
2012/09/25
Way to rationalize
I confess. I claimed Gauche conformed to R5RS, but it didn't.
In the early stage of development, I somehow thought rationalize was an optional procedure and I didn't bother to implement it. But it wasn't optional; it was a library procedure. Ugh.
So one day I decided to write one, and here's my little journey.
What (rationalize x e) returns is the simplest rational number p/q, where |x - p/q| is within the specified error bound e.
A quick search finds a method using the Stern-Brocot tree. Suppose we have a/b and c/d where a/b < x < c/d. We calculate the mediant M = (a+c)/(b+d); then we have the following cases:
- If M is close enough to x, it's the answer.
- If M is above x, shrink the search range to a/b < x < M and recurse.
- If M is below x, shrink the search range to M < x < c/d and recurse.
Reasonable. This is the straight implementation (for simplicity, I made it always return an exact rational, and also omitted handling of exceptional cases like +inf.0).
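Something along these lines, where rationalize1 is my reconstruction of that straight implementation rather than the original code:

(define (rationalize1 x e)
  (define (close-enough? m) (<= (abs (- x m)) e))
  (let ((fl (exact (floor x))))
    (cond ((close-enough? fl) fl)
          ((close-enough? (+ fl 1)) (+ fl 1))
          (else
           ;; invariant: a/b < x < c/d; narrow the interval by the mediant
           (let loop ((a fl) (b 1) (c (+ fl 1)) (d 1))
             (let ((m (/ (+ a c) (+ b d))))
               (cond ((close-enough? m) m)
                     ((> m x) (loop a b (+ a c) (+ b d)))
                     (else    (loop (+ a c) (+ b d) c d)))))))))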
Looks good, yeah?
gosh> (use math.const)
#<undef>
gosh> (rationalize1 pi 1/100)
22/7
gosh> (rationalize1 pi 1/1000)
201/64
gosh> (rationalize1 pi 1/100000)
355/113
gosh> (rationalize1 pi 1/10000000)
75948/24175
gosh> (rationalize1 (/ 3.0) 1e-17)
1/3
Well, not quite. This doesn't work well when the error bound is small and x falls just outside the error bound from either a/b or c/d. Suppose x is 1/1000000 and the error bound is 1/2000000. We start from 0/1 and 1/1, but the mediant is always greater than x, so only the upper bound moves towards x. The denominator of the lower bound stays fixed at 1, so the mediant's denominator increases by only 1 at every iteration. It takes about 1000000 steps to get the answer.
So I searched a bit more. Someone described an algorithm that calculates the (regular) continued fraction up to the point where the error gets small enough. Calculating a continued fraction is easy; it's basically the same as the Euclidean algorithm.
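Here's one way to write such a continued-fraction procedure as a lazy sequence (a sketch, not necessarily the code used below):

(use gauche.generator)

;; yields the regular continued fraction terms of an exact number x
(define (continued-fraction x)
  (generator->lseq
   (let ((x x) (done #f))
     (lambda ()
       (if done
           (eof-object)
           (let ((a (floor x)))
             (if (= x a)
                 (set! done #t)
                 (set! x (/ (- x a))))
             a))))))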
gosh> (continued-fraction (exact pi))
(3 7 15 1 292 1 1 1 2 1 3 1 14 3 3 2 1 3 3 7 2 1 1 3 2 42 2)
gosh> (continued-fraction (exact e))
(2 1 2 1 1 4 1 1 6 1 1 8 1 1 10 1 1 12 1 1 11 1 1 1 11 5 1 1 2 1 4 2 1 1 9 17 3)
That is, π = 3 + 1/(7 + 1/(15 + 1/(1 + 1/(292 + ...)))) and so on.
(Note1: Since the floating point constants pi and e have limited precision, the above expansions don't match the real expansions of π and e after the first dozen items or so.)
(Note2: Each rational number has two regular continued fraction representations, for you can always decrement the last item and append '1' to the tail; e.g. 3/8 can be [0; 2, 1, 2] and [0; 2, 1, 1, 1]. Our continued-fraction procedure always returns the shorter one.)
Suppose we have a continued fraction [a0; a1, a2, ...], and its value calculated up to the n-th term is p_n/q_n. Then it is known that the value up to the (n+1)-th term, p_{n+1}/q_{n+1}, is (a_{n+1}*p_n + p_{n-1})/(a_{n+1}*q_n + q_{n-1}), where p_0/q_0 = a0 and p_1/q_1 = a0 + 1/a1. So we can keep the two previous approximations to calculate the next one, instead of recomputing the continued fraction every time we add a term.
Here's an implementation. Again, we always return an exact result for simplicity.
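The sketch below is my reconstruction rather than the original code; it accumulates convergents p/q using the recurrence above until the error bound is met.

(define (rationalize2 x e)
  (let1 x (exact x)
    (let loop ((an (continued-fraction x))
               (p0 0) (q0 1)    ; p_{n-2}, q_{n-2}
               (p1 1) (q1 0))   ; p_{n-1}, q_{n-1}
      (if (null? an)
          (/ p1 q1)             ; terms exhausted: the last convergent is exactly x
          (let* ((a (car an))
                 (p (+ (* a p1) p0))
                 (q (+ (* a q1) q0)))
            (if (<= (abs (- x (/ p q))) e)
                (/ p q)
                (loop (cdr an) p1 q1 p q)))))))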
Note that continued-fraction returns a lazy sequence, so we don't calculate the continued fraction further than necessary. (If an becomes empty, Qn must be exactly the same as x, so we never need to recurse further.)
Is this good enough?
Unfortunately, this converges too quickly and may miss a simpler (but less close) approximation. In the following example, rationalize1 finds the desired approximation (1/500), but rationalize2 returns the closer but less simple rational 1/999.
gosh> (rationalize1 2/1999 1/1000)
1/500
gosh> (rationalize2 2/1999 1/1000)
1/999
The Wikipedia entry on continued fractions has a good explanation. Basically, if we have a truncated continued fraction [a0; a1, a2, ..., an-1, an], then we have to check [a0; a1, a2, ..., an-1, ceiling(an/2)], [a0; a1, a2, ..., an-1, ceiling(an/2)+1], etc. This would make convergence slow again when an is very large.
The solution is under the next heading of the same Wikipedia article. We can find the simplest rational between two numbers by comparing the continued fractions of the two. Let's try an example.
gosh> (continued-fraction 31415/10000)
(3 7 14 1 8 2)
gosh> (continued-fraction 31416/10000)
(3 7 16 11)
To find the simplest rational between 31415/10000 and 31416/10000, we look for the first item that differs between the two sequences---in this case it's the third item, 14 vs 16. We replace that item with (+ (min 14 16) 1), which is 15, and discard the rest; that is, [3; 7, 15] is the answer.
gosh> (+ 3 (/ (+ 7 (/ 15))))
333/106
Given x and e, the answer of rationalize is the simplest rational between x-e and x+e, which gives us this implementation:
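Here's a sketch along those lines (rationalize3 is a stand-in name, and it assumes the closed range [x-e, x+e] and e < 1; it is not the actual code):

(use srfi-1)   ; fold

(define (rationalize3 x e)
  (define (simplest lo hi)
    (let loop ((ls (continued-fraction lo))
               (hs (continued-fraction hi))
               (common '()))                 ; shared leading terms, reversed
      (cond ((null? ls) lo)                  ; lo's terms ran out first => lo itself
            ((null? hs) hi)                  ; likewise for hi
            ((= (car ls) (car hs))
             (loop (cdr ls) (cdr hs) (cons (car ls) common)))
            (else                            ; first difference: min term + 1, drop the rest
             (fold (lambda (a acc) (+ a (/ acc)))
                   (+ (min (car ls) (car hs)) 1)
                   common)))))
  (simplest (exact (- x e)) (exact (+ x e))))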
The actual code in Gauche includes handling of inexactness, infinites and NaNs.
(Note: The above implementation finds the answer in the closed range between x-e and x+e, which is OK for rationalize, but if you want to search in an open range, the refine part will be a bit more complicated. Besides, the above implementation doesn't handle the case where e >= 1.)
Tags: r5rs, rationalize
2012/07/15
Working with generators - John Nash's Cipherer
Generators and lazy sequences are relatively new additions in Gauche, and I keep looking for small exercises to practice using them. Here's one I did several months ago.
I stumbled on John Nash's 1955 letter to the NSA discussing how an encryption scheme should be designed, accompanied by a concrete design of a cipherer/decipherer. It is basically a state machine that reads bits and writes out modified bits, and the configuration of the state machine is the "key" of the encryption. The machine can be implemented naturally as a generator converter---it takes a generator that feeds the input bits and returns a generator that yields the encrypted/decrypted bits.
Here's the code:
make-permuter is a common function that implements a bit-stream transformer (takes 1 bit, returns 1 bit) with internal state. make-encipherer and make-decipherer return procedures that take an input bit generator and return an output bit generator.
To run this "machine", you want some convenience functions that convert between text and bit generators (Note: I recently fixed reverse-bits->generator; you need git HEAD of Gauche at this moment to run this example):
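To give the flavor, here's one possible shape of the string-side helpers---hypothetical code, not the original: the bit order within a character is a guess, only single-byte characters are considered, and read-bools/write-bools are omitted.

(use gauche.generator)
(use srfi-1)

(define (string->bools str)            ; string -> generator of booleans
  (gconcatenate
   (gmap (lambda (ch)
           (let ((b (char->integer ch)))
             (list->generator (map (lambda (i) (logbit? i b)) (iota 8)))))
         (list->generator (string->list str)))))

(define (bools->string gen)            ; generator of booleans -> string
  (let loop ((chars '()))
    (let ((bits (generator->list gen 8)))    ; pull 8 bits per character
      (if (null? bits)
          (list->string (reverse chars))
          (loop (cons (integer->char
                       (fold (lambda (i c)
                               (if (list-ref bits i) (logior c (ash 1 i)) c))
                             0 (iota 8)))
                      chars))))))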
Now, here's an example configuration (key) of the machine:
(define key '((1 3 6 2 4 0 5)
              (2 1 5 0 3 6 4)
              (#f #t #f #f #t #f #f)
              (#t #t #t #f #f #t #t)))
(define E (apply make-encipherer key))
(define D (apply make-decipherer key))
And a couple of examples. The first one takes a string, encrypts it with E, immediately decrypts it with D, and converts the result back to a string to show it. The second one reads text from the current input port, encrypts and decrypts it, and echoes it back.
($ bools->string $ D $ E $ string->bools "hi, there!")
($ write-bools $ D $ E $ read-bools (current-input-port))
It is pleasant to be able to pipe components just as I would with physical machines. It could be done with lazy sequences, but the generator version has less overhead.
Tag: gauche.generator