Gauche Devlog

2012/09/25

Way to rationalize

I confess. I claimed Gauche conformed R5RS, but it didn't. In the early stage of development, I somehow thought rationalize was an optional procedure and I didn't bother to implement it. But it wasn't optional; it was a library procedure. Ugh.

So one day I decided to write one, and here's my little journey.

What (rationalize x e) returns is the simplest rational number p/q, where |x - p/q| is within the specified error bound e.

A quick search finds a method to use Stern-Brocot tree. Suppose we have a/b and c/d where a/b < x < c/d. We calculate the mediant M = (a+c)/(b+d), then we have the following cases:

  • If M is close enough to x, it's the answer.
  • If M is above x, shrink the search range to a/b < x < M and recurse.
  • If M is below x, shrink the search range to M < x < c/d and recurse.

Reasonable. This is the straight implementation (for simplicity, I made it always return an exact rational, and also omitted handling exceptional cases like +inf.0).

Looks good, yeah?

gosh> (use math.const)
#<undef>
gosh> (rationalize1 pi 1/100)
22/7
gosh> (rationalize1 pi 1/1000)
201/64
gosh> (rationalize1 pi 1/100000)
355/113
gosh> (rationalize1 pi 1/10000000)
75948/24175
gosh> (rationalize1 (/ 3.0) 1e-17)
1/3

Well, not quite. This doesn't work well when the error bound is small and x falls just outside of error bound from either a/b or c/d. Suppose x is 1/1000000 and error bound is 1/2000000. We start from 0/1 and 1/1, but the mediant is always greater than x so only higher bound is moving towards x. The denominator of the lower bound is fixed to 1, so the mediant's denominator increases only 1 at every iteration. It'll take 1000000 steps to get the answer.

So I searched a bit more. Someone described an algorithm about calculating (regular) continued fraction up to the point the error gets small enough. Calculating continued fraction is easy; it's basically the same as Euclidean algorithm.

gosh> (continued-fraction (exact pi))
(3 7 15 1 292 1 1 1 2 1 3 1 14 3 3 2 1 3 3 7 2 1 1 3 2 42 2)
gosh> (continued-fraction (exact e))
(2 1 2 1 1 4 1 1 6 1 1 8 1 1 10 1 1 12 1 1 11 1 1 1 11 5 1 1 2 1 4 2 1 1 9 17 3)

That is, π = 3 + 1/(7 + 1/(15 + 1/(1 + 1/(292 + ...)))) and so on.

(Note1: Since we have limited precision in the floating number constant pi and e, the above expansions don't match the real expansion of pi and e after first dozen items or so.)

(Note2: Each rational number has two regular continued fraction representation, for you can always decrement the last item and append '1' to the tail; e.g. 3/8 can be [0; 2, 1, 2] and [0; 2, 1, 1, 1]. Our continued-fraction procedure always returns the shorter one.)

Suppose we have a continued fraction [a0; a1, a2, ...], and its calculated value up to n-th term is p_n/q_n, then it is known that the value up to n+1-th term p_{n+1}/q_{n+1} is (a_{n+1}*p_{n} + p_{n-1})/(a_{n+1}*q_{n} + q_{n-1}), where p_0/q_0 = a0 and p_1/q_1 = a0 + 1/a1. So we can keep two previous approximated value to calcuate the next approximation, instead of calculating the continued fraction every time we add the term.

Here's an implementation. Again, we always return an exact result for the simplicity. Note that continued-fraction returns a lazy sequence, so we don't calculate the continued fraction more than necessary.

(If an gets empty, Qn must be exactly the same as x, so we never need to recurse further.)

Is this good enough? Unfortunately, this converges too quickly and possibly misses simpler (but less closer) approximation. In the following example, rationalize1 finds the desired approximation (1/500) but rationalize2 returns closer but less simple rational (1/999).

gosh> (rationalize1 2/1999 1/1000)
1/500
gosh> (rationalize2 2/1999 1/1000)
1/999

The Wikipedia entry of continued fraction has some good explanation. Basically, if we have a truncated continued fraction [a0; a1, a2, ..., an-1, an], then we have to check [a0; a1, a2, ... , an-1, ceiling(an/2)], [a0; a1, a2, ... , an-1, ceiling(an/2)+1], ... etc. This would make convergence slow again, when we have very large an.

The solution is in the next headings of the same Wikipedia article. We can find the simplest rational between two numbers, by comparing continued fractions of the two. Let's try some examples.

gosh> (continued-fraction 31415/10000)
(3 7 14 1 8 2)
gosh> (continued-fraction 31416/10000)
(3 7 16 11)

To find a simplest rational between 31415/10000 and 31416/10000, we look for the first item that differ between the two sequences---in this case, it's the third item, 14 vs 16. We replace the item with (+ (min 14 16) 1), which is 15, and discard the rest; that is, [3; 7, 15] is the answer.

gosh> (+ 3 (/ (+ 7 (/ 15))))
333/106

Given x and e, the answer of rationalize is the simplest rational between x-e and x+e, hence we get this implementation:

The actual code in Gauche includes handling of inexactness, infinites and NaNs.

(Note: The above implementation finds the answer in the closed range between x-e and x+e, which is ok for rationalize, but if you want to search in the open range, the refine part will be a bit more complicated. Besides, the above implementation doesn't handle the case when e >= 1.)

Tags: r5rs, rationalize

2012/07/15

Working with generators - John Nash's Cipherer

Generators and lazy sequences are relatively new addition in Gauche, and I keep looking for small exercises I can practice using them. Here's one I did several months ago.

I stumbled on John Nash's 1955 letter to NSA discussing how encryption scheme should be designed, accompanied with a concrete design of cipherer/decipherer. It is basically a state machine that reads bits and write out moified bits, and the configuration of the state machine is the "key" of encryption. The nature of the machine can be implemented naturally as a generator converter---it takes a generator that feeds input information, and returns a generator that yields encrypted/decrypted information.

Here's the code:

make-permuter is a common function that implements a bit stream (takes 1 bit, and returns 1 bit) with internal states. make-encipherer and make-decipherer return procedures that take an input bit generator and return an output bit generator.

To run this "machine", you want some convenience functions that converts between text and bit generators (Note: Recently I fixed reverse-bits->generator; you need git HEAD of Gauche at this moment to run this example):

Now, here's an example configuration (key) of the machine:

(define key '((1 3 6 2 4 0 5)
              (2 1 5 0 3 6 4)
              (#f #t #f #f #t #f #f)
              (#t #t #t #f #f #t #t)))

(define E (apply make-encipherer key))
(define D (apply make-decipherer key))

And a couple of examples. First one takes a string, encripts it by E, immediately decripts by D, and converts back to string to show it. The second one reads the text from current input port, encripts and decripts it and echo back.

($ bools->string $ D $ E $ string->bools "hi, there!")

($ write-bools $ D $ E $ read-bools (current-input-port))

It is pleasant to be able to pipe componends just like I would do with machines. Could be done with lazy sequences, but generator version has less overhead.

Tag: gauche.generator

2012/05/11

Release timing

I finally pushed Gauche 0.9.3 out of door the day before yesterday---something I've been thinking I'd get to do "soon" for the last few months. There hasn't been big changes for a while and I thought it was quite stable and ready.

The day after the release, I found that I couldn't build git HEAD with 0.9.3 on Windows. Damn. I have Jenkins running on Linux that checks self-build, but the bug was in Windows-specific code. It doesn't affect ordinary users who just install Gauche (building from a release tarball is just fine), but it's inconvenient for developers, so I quickly wrap the fixed version as 0.9.3.1.

The day after that, somebody notified about a build failure if Gauche is configured with sjis or eucjp character encoding. I introduced the bug when I supported out-of-tree build but it did't affect utf-8 build. I pondered whether I'd just say "apply this patch if you want sjis/eucjp Gauche". But well, I've just went through packaging process so many times (including testing), it's just as easy to make another release---0.9.3.2.

I feel that the difficulty of releasing mostly comes from the fact that I'm not used to it enough. If it were everyday thing, I'd just "know" what to do and could avoid the pitfalls almost unconsciously. With the pace of one release in several months, however, I need to go through details consciously, keep checking if I'm not missing anything. That's a lot of mental burden.

It seems that there's a "release admittance" parameter, which increases if you know the process well but decreases as the time passes. The parameter is maximum when you've just went thorough the process. ("Admittance" is EE jargon: It's a reciprocal of impedance. It shows how easy the release process is.)

[image]

Releasing 0.9.3.1 and 0.9.3.2 were so easy since my release admittance was still high.

On the other hand, there's a "release pressure"; as the time passes and unreleased changes accumulates, the pressure increases.

[image]

(I think the pressure gradually saturates, since if there are already so many unreleased changes, the relative importance of one more unreleased change decreases. I suspect this curve actually decreases if I extend it further---if there's no releases for years and everybody you know has started using git HEAD, you won't feel much pressure for the next release.)

I always feel I've waited too long for a new release; the release happens when I couldn't hold the pressure any longer, but at the time my admittance gets too low and I have to plow through the process against high resistance.

Probably the optimum time of release is the peak of Y*P.

[image]

That is, when the pressure and easiness gets well balanced.

My feeling is that it is somewhere between 1-3 months after the release. I aim at that range for 0.9.4 (crossing fingers).

Tags: 0.9.3, release

2012/03/18

Consumers for generators

I've been trying out generators for day-to-day scripts and have repeatedly seen a common pattern--- to repeat reading from a generator and process values until it is exhausted. I wrote a bunch of utilities in gauche.generator but they are mostly a sort of filters, taking generators and yielding another generator. What I needed is a kind of "terminator" of the chain of generators, where the generated values were actually consumed.

Since a generator is just a thunk, I can use already existing tools---actually, port-fold, port-map and port-for-each expect thunks (in spite of their names), and we can pass a generator directly.

(port-for-each (^v ...) generator-expr)

Probably I should rename them to indicate they are actually about generators, not ports.

(Another option is to convert a generator to a lazy sequence by generator->lseq, so that we can use all list-handling utilities without wasting space for intermediate lists. But it still has some overhead (not only CPU-wise, but mentally for programmers who read and write the code). So I think it makes sense to have specialized utilities for generators.)

For lists, we have for-each. But sometimes it is handy and more readable to use dolist macro, taken from CL. We can have its equivalent for generators:

(do-generator [v generator-expr]
  ... do something with v ...)

What if we want to run across multiple generators, or even to mix generators and other sequences? Instead of inventing more macros, we can extend the existing srfi-42 mechanism to handle generators.

(do-ec (:parallel (: x '(a b c))
                  (: y (gdrop-while skip-preamble
                                    (file->generator "foo"))))
  (do-something x y))

For the latter, we added :generator EC-qualifier (so the second clause of :parallel in the above example can also be written as (:generator y ...)). The generic dispatcher of : dispatches to the :generator if its argument is an applicable entity taking zero arguments.

Those changes have been pushed to the master.

Tags: Generator, gauche.generator, srfi-42

2012/02/08

Threads, build process, lazy sequences

Haven't updated this blog for a while (again). Here's a quick progress report.

Windows threads

On Windows/MinGW, threads are finally supported. It uses Windows thread (not pthread-win32.) Configure with --enable-threads=win32 on MinGW&MSYS environment.

  ./configure --enable-threads=win32

By 0.9.2, you might have checked the availability of threads by gauche.sys.pthreads feature identifier in cond-expand. This feature identifier isn't available on Windows threads build. We created a new feature identifier, gauche.sys.threads, which is defined in both pthreads build and Windows threads build, and suitable for the Scheme programs that wants to switch behavior based on the availability of threads.

  (cond-expand
    [gauche.sys.threads
      ... code using threads ...]
    [else
      ... code not using threads ...])

The feature id gauche.sys.pthreads is still available on pthreads build, and a new feature id gauche.sys.wthreads is available on Windows threads build, in case if you need finer distinction of the thread library (but there's hardly visible difference in Scheme level).

Better thread safety

Kirill Zorin is using Gauche in a way that is a good stress test for Gauche's thread support, and he has identified and fixed numerous thread-related bugs. Thanks, Kirill! I won't surprise if we find more, but I'm quite happy about the improvement.

Out-of-source-tree build

In the very early stage, we made it so that Gauche can be build in separate directory from the source tree, but it has been broken since then. It's not trivial to fix since Gauche needs multi-stage build, that is, one process generates a source file to be compiled, and running the compiled one generates another source file, and so on. Now I finsished updating various *.in files so that it could compile out of source tree again.

To build in a different directory, just call the 'configure' script of the source tree from the build directory, e.g.

  $ ls
  Gauche
  $ mkdir Gauche-build
  $ cd Gauche-build
  $ ../Gauche/configure
  $ make

Lazy sequences

Finally documented lazy sequences and generators. Check out the chapter of "Lazy evaluation", and also "gauche.sequence" and "gauche.lazy" modules.

Tags: Threads, gauche.lazy, gauche.sequence

More entries ...