Gauche Devlog

2013/03/19

When the inexact square root of an integer is exact

In Exact sqrt entry, I wrote that, for a natural number smaller than 2^53, we could use double-precision floating point sqrt and take the answer exact if the result was an integer. As Mark H Weaver pointed out, it was wrong.

Suppose we have nonnegative integers N, M, m, and non-zero real number e, where M = m^2 < 2^53, N = (m*(1+e))^2 < 2^53.

If the absolute value of e isn't greater than 2^-53, square root of N, which is m*(1+e), can be rounded to m in the double-precision sqrt calcluation. The result becomes an integer, but not exact. That's the case we want to exclude.

N - M = (2e+e^2)M. The maximum bound of (2e+e^2) is (2^-52 + 2^-106) when e = 2^-53, so if M is greater than 2^52, N - M can exceed 1 and we might incorrectly recognize N as a square number of m.

The greatest square number below 2^52 is (2^26-1)^2 = 4503599493152769, and (2^-52 + 2^-106)*4503599493152769 is 81129635996755064984528658366465/81129638414606681695789005144064, which is smaller than 1.

2^52 = 4503599627370496 is also a square number, and calculating (sqrt (inexact (+ (expt 2 52) 1))) yields a rounded integer 67108864.0. But the lower bound, (-(2^52) + 2^-106)*4503599627370496 is -18014398509481983/18014398509481984, which is grater than -1, so applying inexact sqrt on 2^52-1 won't lead us to the rounded integer. (Indeed, (sqrt (inexact (- (expt 2 52) 1))) yields 67108863.99999999.)

So we can use inexact square root when the input is smaller than 2^52.

Tags: Flonums, sqrt, exact-integer-sqrt

2013/03/16

Checking scripts

I write small scripts in Gauche, all the time. They're not so 'serious' as to require unit-tests and configure scripts and other scaffolds, but they're not one-time throwaway scripts, either. So it's annoying to find out a silly bug like misspelling in less-executed code paths later.

Here's a command I run to quickly check such scripts.

gosh -ugauche.test -l script-name -E "test-module 'user" -Eexit

Tag: Script

2013/03/12

And here comes random data generators

I just checked in data.random, a collection of random data generators and their combinators. The names of API functions are not yet fixed, but I think the overall it's in a good shape. (Since 0.9.4 is overdue, I might be going to release it without making data.random official. I'm not sure yet.)

Here's the code: http://gauche.git.sourceforge.net/git/gitweb.cgi?p=gauche/Gauche;a=blob;f=lib/data/random.scm;hb=HEAD

It provides a bunch of primitive random generators such as followings.

  • uniform distribution
    • (integer size :optional (start 0)) returns a generator that produces random integer between start and start+size-1, uniformly.
    • (integer-between lo hi) returns a generator that produces random integer between lo and hi (both inclusive).
    • int8, uint8 etc. are preset generators to produce the range their name suggest.
    • (char :optional cset) returns a generator of random characters from a character set. When omitted, we use #[A-Za-z0-9] as the default character set.
    • We also have boolean, real, real-between.
    • We want to have exact rational generators and complex generators, but I wonder how the range and distribution should be specified.
  • nonuniform distribution
    • For discrete sampling, we have geometric and poisson distribution.
    • For continuous sampling, we have normal and exponential distribution.

Then, those generators can be combined to make more complex generators.

  • random choice
    • (one-of generators) returns a generator that picks one generator in generators randomly to produce the next value.
    • (weighted-sample weight&generators) allows you to specify weight of selection probability for each generators.
  • aggregate data
    • (pair-of gen1 gen2), (tuple-of gen ...)
    • list-of, vector-of, string-of - these combinators can be called in two different forms, e.g.
      • (list-of sizer item-gen): sizer can be an integer, or an integer-generator, to give the length of the resulting list. item-gen is a generator to produce elements.
      • (list-of item-gen): If sizer is omitted, we use some default generator to determine the length of the resulting list. Currently I use (poisson 4) provisionally.

I also have permutation-of and combination-of, which takes a list of items (not item generators).

What I like about the current shape is that those generators can be combined using gauche.generator framework as well; e.g. you can have series of sum of two dice rolling by:

    (gmap + (integer-between 1 6) (integer-between 1 6))

or apply a filter:

    (gfilter (cut < 0 <> 1) (exponential 1))

or taking some values into a list:

    (generator->list (poisson 5) 10)

Here are some elements about API I'm still pondering about:

  • We have procedures that creates a generator (e.g. integer, real, char) and pre-created generators (e.g. fixnum, int8). Without the static typing support, this kind of layers could be confusing. Shall we use some naming convention to distinguish these two layers?
  • There's an idea rolling in my head to provide plural names as an alias, e.g. chars for char. It plays nicely with the combinators, e.g. (list-of fixnums) or (string-of 5 (chars)). But I also feel this is just a superficial convenience; we double the number of exported names to get nothing added functionally.
  • The handling of omitted argument of list-of etc. is also different from Gauche's convention of optional arugments.

If you have data generator ideas to be thrown in to this module, let me know.

Now I'm writing a generative test framework, using this module as a data generators.

Tags: data.random, Generators

2013/03/11

Math fun

Recently I added a few math functions in the core of Gauche, for they were needed to implement some more domain-specific functions.

Just committed are gamma and lgamma, Gamma function and natural logarithm of absolute value of it.

(What prompted me to add them was the random variable of Poisson distribution; the algorithm I found needed log(n!). And the reason I was doing Poisson distribution thing was that I was writing a general random data generator module and I wanted to cover typical distributions. And the reason that I was writing such a module was that they were needed for generative test framework. It's like a domino effect; I'm writing X, then finds X needs Y, then Y needs Z, etc...)

Some time ago I also added expt-mod, that calculates modulus exponentiation.

I used to be worried for adding too much stuff to the core library, and tried to put functions together into a module whenever I could attach a suitable name for the bunch. However, some functions are just fundamental enough that appear in different fields now and then, and the core library may be the suitable place for them, after all.

Tags: gamma, expt-mod

2013/02/23

Land of Lisp on Gauche

I've been working on translating Conrad Barski's "Land of Lisp" to Japanese, and it finally hit the bookstores in Japan (published from O'Reilly Japan). To celebrate that, I ported the code in the book to Gauche.

Since Gauche adopts quite a few Common Lisp idioms, the port is pretty straightforward. The code may serve as an example to show how to translate CL-style code into Gauche.

Here are some observations.

  • Most of CL's loop use cases can be written using srfi:42.
    • The trick is that, in CL, you think "OK, I loop over this (list, vector, integer, etc.) and I want to get this (list, sum, etc.)", while in srfi-42, you first think what you want (ok, I want list so I use list-ec / ok, I want to stop iteration when any of the element satisfies the condition, so I use any?-ec), then you think about what to iterate.
  • CL has many idioms that take advantage of () being boolean false, and car/cdr of nil is still nil. It is not a good idea to carry over that idiom to Scheme, or you'll be frustrated feeling that Scheme's distinction of #f and () gets in your way. It just needs a different mindset (complaining about it is like a C programmer complaining 0 isn't false in some languages).
    • For example, in CL, you can access to the value associated to the key in alist by the following code, without worrying the case when key isn't found:
      (cdr (assoc key alist))
      
      You'll be annoyed that in Scheme, assoc returns #f when key isn't found, and cdr barfs. In Gauche, recommended way is as follows:
      (assoc-ref alist key :optional default)
      
    • Another common CL idiom is to build a list conditionally:
      (append (list up down)
              (unless (zerop (mod pos *board-size*))
                (list (1- up) (1- pos)))
              (unless (zerop (mod (1+ pos) *board-size*))
                (list (1+ pos) (1+ down))))
      
      This idiom uses the fact that unless returns nil when the condition is satisfied. In Gauche, unless returns #<undef> so this doesn't work. Instead of falling back to more verbose if, you can use cond-list.
      (cond-list 
       [#t @ (list up down)]
       [(not (zero? (mod pos *board-size*))) @ (list (- up 1) (- pos 1))]
       [(not (zero? (mod (+ pos 1) *board-size*))) @ (list (+ pos 1) (+ down 1))])
      
  • The lazy evaluation stuff built in Chapter 18 (lazy.lisp) comes free in Gauche, as util.stream module. The lazy.scm file in Gauche version only shows the correspondence of two APIs.
  • I have a plan to make format CL-compatible. It's not done yet, so I had to rewrite some code using advanced formatting directives with the plain Scheme code.
  • The CL's defstruct code is translated to define-record-type straightforwardly, except that our record type can't set the default value when the constructor argument is omitted. (R6RS records can do that using protocol. I prefer easier way, though.)

Tag: CommonLisp

More entries ...