Gauche Devlog

2010/04/28

Shorter names

People have different opinions on terse code. I generally agree with Paul Graham's "Succinctness is Power", but I'm less bold than him and tend to take more cautious steps; I'm afraid that optimizing for small domain of applications may put us into a local maximum. For example, APL may be at such a local maximum; probably no other language can beat it on succinctness in the field APL is for, but extending it to broader applications looks almost impossible. (Disclaimer: I've never programmed in APL, just skimmed through several documentations. I may be wrong.)

There are certain things I can introduce to make code shorter without deviating from standard Scheme syntax, and actually I have been trying out some. For 0.9.1, I chose (at this moment) two of such ideas to support officially.

~ : a universal accessor.

One of the pervasive verbosities I see in my Lisp/Scheme code is accessors. Lisp tends to give accessors descriptive names, resulting code that looks like this:

  (slot-ref (vector-ref (hash-table-get pvhash 'prop) 0) 'name)

It has a virtue that the reader can guess pvhash is a hashtable that contains vectors of objects, without type declaration or anything. But... I often feel envy to other languages in which I may be able to write something like this:

  pvhash['prop'][0].name

Gauche already has a generic function ref defined on most aggregate types, so the above expression can be written as follows:

   (ref (ref (ref pvhash 'prop) 0) 'name)

It is shorter, but it doesn't feel much shorter. Maybe it's because the number of nodes or nesting levels remains the same.

Issac Trotts has suggested ref*, which allows chaining. (ref* a b c) = (ref* (ref* a b) c) etc. It hasn't been officially documented but in Gauche since 2006. Using ref*, the expression gets even shorter.

   (ref* pvhash 'prop 0 'name)

However, I found myself not using ref* much. Most of the places where I use accessor, it is not chained, so ref is shorter. And I felt it cumbersome to switch to a ref* when I found I need chanining.

So I looked for shorter abbreviation of ref*. (. x y) will be easier to understand from the convention of C family, but that required changing S-expression syntax (Clojure took this notation, though). (-> x y) and (.. x y) are also easily understandable, but those requires two characters. Ideally I want to cut it down to a single character. I tried (@ x y) for several projects, but ultimately found it stood out too much.

After all, I settled down to ~. There are not many one-character punctuations that are usable, so it is rather chosen by elimination. I think it not bad once you get used to, but if you find it doesn't look good, well, you can always use ref. The above example comes down to this:

   (~ pvhash 'prop 0 'name)

Which is only two characters wider than my hypothetical example:

   pvhash['prop'][0].name

For slot accesses, now I tend to drop the space before the quote of the slot name:

   (~ obj'slot1'slot2'slot3) 

Which I think is not bad compared to these:

   obj.slot1.slot2.slot3
   obj->slot1->slot2->slot3

Oh, and note also that generalized setter works.

   (set!  (~ pvhash 'prop 0 'name) "newname")

This comes very handy when you have to write OO-ish code (I mean, network of mutable objects).

^: an alias of lambda.

Quack has a nice feature to display lambda as λ, but that's an illusion. Your code still contains six-character lambda and opening the code with other editors reveals that. Seasoned Lisper/Schemers are trained to recognize these six characters as one chunk, so it may not be much a problem for reading, but still it takes screen estate. I'm an old type who feels awkward when code doesn't fits in 80-columns. Having six characters for lambda tends to make my line longer which makes me insert more line breaks, which resulting vertically streched code. Arrrggg.

Gauche doesn't have a problem to treat λ as an alias of lambda, but typing λ is not very convenient, probably except for Greeks.

So again, I looked for punctuation characters I can use. This time ^ seemed a good choice. Actually, the use of λ for functional abstraction originally came from a caret, I was told.

Combined with ~, I found I could cram more logic to a single line.

(fold (^(block sum) (+ (or (~ block'recv-total) 0) sum)) 0 (~ db'blocks))

I also define ^a, ^b, ... ^z, ^_ as a macro for the case of single argument function. That is, (^p (string? (cdr p))) is the same as (lambda (p) (string? (cdr p))). (Note that in this case I can't use cut (ref:cut)).

I've been testing these for, well, probably more than two years, and I feel they have a merit. They can also be implemented relatively easily in portable Scheme, so they don't break portability significantly (compared to introducing a new syntax, such as Clojure's use of vectors as shorthand notation of closures).

Tags: 0.9.1, ~, ^

2010/04/27

Blowfish password hashing

Ever since I read this article on password hashing, which was more than two years ago, I wanted to bundle a proper password hashing library to Gauche. I finally managed to put bcrypt in, as crypt.bcrypt module. (ref:crypt.bcrypt : This link will be valid after 0.9.1 release).

The bcrypt library does password encryption with adaptive hashing, using blowfish algorithm. (Do not confuse it with another program also called bcrypt, which also uses blowfish but it encrypts files.) They say it is the default password hashing used on OpenBSD. Most popular lightweight languages have an extension package to use bcrypt password hashing, e.g. bcrypt-ruby for Ruby, py-bcrypt for Python, or Crypt::Eksblowfish::Bcrypt for Perl.

I decided to include crypt.bcrypt in the Gauche distribution instead of making it an optional extension package. Gauche has sys-crypt (ref:sys-crypt) which is a direct interface to system's crypt(3) function, and also a few popular hash functions such as MD5 and SHA1/SHA2 (ref:rfc.md5, ref:rfc.sha). If crypt.bcrypt is something you have to install separately, it would be very tempting to use one of those coming with Gauche by default instead. I confess I have used MD5 before for an web app.

I'm totally convinced by the Thomas Ptacek's article I linked at the top of this post; original unix crypt(3) or even MD5 or SHA shouldn't be used for password hashing anymore. It's not so much because MD5 is a weak hash; it's because password hashing requires different property than data hashing schemes such as SHA. See the article for more details. But even if somebody doesn't agree or doesn't care, there's not much reason to use weaker schemes if the proper hashing library also comes by default.

I took bcrypt code from http://www.openwall.com/crypt/ , which is provided in public domain. So it doesn't add any external dependency.

Using bcrypt is very easy. You only need to remember just one function, bcrypt-hashpw. To calculate a password hash, call it with the password string:

  (bcrypt-hashpw password-string)  => hashed-string

The library automatically takes care of salting. To check if the given password is correct, pass the given password and hashed string. If returned string matches the provided hashed string, the password is correct.

  (bcrypt-hashpw password-string hashed-string) => hashed-string

There's also a function that you can calcurate a secure salt if you want.

Tags: 0.9.1, crypt.bcrypt

2010/04/26

Enhanced queue

util.queue module (ref:util.queue) has been completely rewritten for 0.9.1.

The main motivation is to support thread-safe queue (mtqueue). Last couple of years I had quite a few projects that needed to fully utilize multiple cores, and for almost every one of them I ended up writing some kind of thread-safe queue. It is a simple but robust way as a synchronization primitive. Naturally the feature is a strong candidate for Gauche to provide natively.

Writing a thread-safe queue is seemingly a simple task. Yet it took some time for me to come up an API I feel right.

Just protecting access to a queue by a mutex makes the queue thread-safe, but it doesn't make the queue usable for synchronization; we want a consumer thread to block when it finds the queue is empty, and to resume automatically as soon as some data is available in the queue.

Providing a simple-minded thread-safety, that just protects access by a mutex, isn't a good strategy. It doesn't help much to build synchronization operations, since the latter needs a condition variable to cooperate with the mutex of the queue. Such a strategy is a dead-end of abstraction; it does abstract something, but does not useful to build more abstractions on top of it.

So, we use a condition variable and let a consumer thread block on an empty queue. Is that enough? Not really.

If a producer thread fails for some reason, the consumer thread would wait indefinitely. It is usually unacceptable in the production code; the consumer thread must detect abnormality and try to recover, report to supervisor, or shut itself down gracefully. There's also a situation that a consumer thread fails; in such case a producer thread would just keep pushing the data into the queue until it exhausts memory, which is also undesirable.

I explored some options. In one option, a queue had low and high water marks, and a callback was called when the length goes below the low mark or above the high mark, and it adjusted the pace of either a consumer or a producer threads. It worked for the project, but I felt it too complicated for general use.

Finally I ended up on a simple concept of timeouts, plus optional upper-bound of the queue length. The API is parallel to other timeout-able thread operations so that it's easy to remember, and I think it is general enough so that more sophisticated stuff can be built on top of it.

Here's an outline of synchronizing mtqueue operations. See the manual in svn trunk if you are curious to know the details.

  • make-mtqueue :key max-length
    Returns a thread-safe queue, with optional upper bound of the queue length.
  • dequeue/wait! mtqueue :optional (timeout #f) (timeout-val #f)
    Pop an item from mtqueue, or block if the queue is empty, with optional timeout. The semantics of timeout and timeout-val are the same as thread-join! etc. (ref:thread-join!).
  • enqueue/wait! mtqueue obj :optional (timeout #f) (timeout-val #f)
    Push obj into the queue, or block if the queue has maximum number of elements, with optional timeout.

Note that there are also dequeue! and enqueue! that do not block. They work on both mtqueue and normal (non-thread-safe, but cheap) queue. dequeue! throws an error by default if the queue is empty, though you can suppress it by providing a fallback value.

Tags: 0.9.1, util.queue

2010/04/26

Better REPL?

Gosh's default REPL provides bare minimum features. I almost always run gosh inside Emacs and usually it's fine, but sometimes I do feel it can be better.

So I'm experimenting ideas here: http://gauche.svn.sourceforge.net/viewvc/gauche/Gauche-scripts/trunk/xrepl/

Here are some stuff I have. I'll use it for a while to see if it's worth to be included into gosh or not. Other ideas and feedbacks are welcome.

Prettyprinting the result

It has rudimental pretty printer (which is a bit dumb; it may take exponential time for large S-expr that I need to fix). Ideally pretty printer should be a part of Gauche, integrated with built-in format etc.

gosh> (call-with-input-string
          (values-ref (http-get "blog.practical-scheme.net" "/gauche") 2)
        (cut ssax:xml->sxml <> '()))
(*TOP*
 (html
  (head
   (title "Gauche Devlog")
   (base (|@| (href "http://blog.practical-scheme.net/gauche")))
   (link
    (|@| (type "application/rss+xml") (title "RSS") (rel "alternate")
     (href "http://blog.practical-scheme.net/gauche?c=rss")))
   (link (|@| (type "text/css") (rel "stylesheet") (href "blog.css")))
   (link (|@| (type "text/css") (rel "stylesheet") (href "gauche-devlog.css"))))
  (body
...

History

Accessing recent evaluation results. This feature has been sitting long in my wishlist, but I couldn't make up my mind of what symbols I would use; CL uses *, **, ***, ... but in Scheme * would conflict with the built-in function. I finally decided to adapt Clojure's convention; *1 refers to the last evaluated result, *2 for the second last, etc.

gosh> *1
(*TOP*
 (html
  (head
   (title "Gauche Devlog")
...
gosh> (length (cadr *1))
3

The procedure *history prints the list of history.

*1, *2, ... only keeps the first value of the result(s). In case if we have more than one values, *1+, *2+, ... keeps the list of values.

If the last evaluation ends as an error, the condition object is bound to *e.

gosh> (car 3)
*** ERROR: pair required, but got 3
Stack Trace:
_______________________________________
  0  (with-error-handler (lambda (e) (let ((e e)) (%guard-rec e e (else ...
        [unknown location]
gosh> *e
#<error "pair required, but got 3">

Shell-like commands

You can type shell commands by leading it with a comma:

gosh> ,cd ~/src/Gauche
/home/shiro/src/Gauche
gosh> ,ls
AUTHORS       README          configure.ac       ltmain.sh
COPYING       VERSION         configure.ac.orig  m4/
ChangeLog     acinclude.m4    doc/               missing*
DIST*         aclocal.m4      examples/          mkinstalldirs*
DIST_EXCLUDE  config.guess*   ext/               rpmfiles-common.txt
Gauche.spec   config.log      framework.sh*      rpmfiles-encoding.txt
HACKING       config.rpath*   gc/                rpmfiles-gdbm.txt
INSTALL.in    config.status*  gc.diff            src/
Makefile      config.sub*     install-sh*        test/
Makefile.in   config.threads  lib/               test.record
NEWS          configure*      libsrc/            winnt/
gosh> ,make -j
for d in gc src lib ext doc; do (cd $d; make all); done
make[1]: Entering directory `/home/shiro/src/Gauche/gc'
...

Sometimes I'm too lazy to switch from *scheme* buffer to *shell* buffer (or, more like that I forget to switch and am annoyed by getting 'unbound variable: ls').

It would be nicer if I can easily grab the shell output into Scheme variable, and feed the content of Scheme variable into shell commands.

Smarter (or verbose) error handling

I'm not sure I'll keep this one, but giving a try.

"." isn't in *load-path* by default, for the obvious security reasons. But sometimes I forget to type "./foo.scm" and being annoyed, for I know foo.scm in the current dir is safe. So, here's a little help...

gosh> (load "t.scm")
*** REPL: Cannot load file "t.scm" in *load-path* ("../lib" "../libsrc" "../src" "/usr/share/gauche/site/lib" "/usr/share/gauche/0.9.1_pre1/lib" "/usr/share/gauche/0.9/lib" "/home/shiro/src/Gauche-scripts/xrepl/")
But you have the file under the current directory.
Do you want to add "." in *load-path*? (y/n): 

This is actually implemented in more general "error filter" framework. When REPL gets an error, a series of registered tests are run, and if there's a match, an associated action is invoked.

This kind of thing may be too annoying (in fact, I don't even like CL implementations entering the debugger on error by default.) But I'll see.

Tag: repl

2010/04/26

ERR5RS Records and beyond

Release 0.9.1 will include support for ERR5RS Records (srfi:99). It also supersedes the existing srfi-9 records (srfi:9).

The srfi-9 record implementation as of Gauche 0.9 is just a thin wrapper of Gauche's class facility. That is, a record type that has slot x, y, z is actually an ordinary Gauche class with slot x, y, z. It has no advantage to, and less features than Gauche's object system, except that it is portable.

In 0.9.1 I make records a bit more special. A record type is still a class, but using MOP, it is customized to take advantage of inflexibility of records---we can make it more efficient. So in some places records can be better choice than classes.

First of all, I don't make record types obey the class redefinition protocol. If you define a record type and bound to a variable which previously bound to another record type, then the variable is just overwritten; the old type and the new type don't have any relation; and instances of the old type remains as they are, instead of being upgraded to the new type.

This allows us to allocate a record instance as a flat array, instead of a header pointing to a value array. We can also pre-calculate offsets of slots for accessors and modifiers. I'm also planning to enable some aggressive inlining so that record accessors and modifiers are inline expanded to be as fast as vector access. (The inlining part has not implemented yet. But it is a fun MOP twiddling to adapt the record semantics into Gauche's class system).

Secondly, I extended ERR5RS and added a "pseudo record type", which treats other type of aggregates such as lists or vectors as if they are records. For Common Lisp users, it is like (:type list) or (:type vector) option of defstruct. Here's an example:

gosh> (use gauche.record)
#<undef>
gosh> (define-record-type (point (pseudo-rtd <vector>))
        #t #t (x) (y) (z))
point-z-set!
gosh> (make-point 2 3 4)
#(2 3 4)
gosh> (point-x (make-point 2 3 4))
2
gosh> (let1 p (make-point 2 3 4) (point-z-set! p 9) p)
#(2 3 9)

The point record type inherits a pseudo record type returned from 'pseudo-rtd, which in this case a record type that uses a vector as a storage. The constructor make-point returns a vector. Accessors and modifiers takes a vector.

Besides the efficiency (vector access is very fast in Gauche), I see another advantage of this; it is less binding. Suppose your library defines concrete point record type, and ask users to send the data in points through API. If the user module have different representation of points, it have to convert every points back and forth to call your library. If your library exposes points with pseudo type, on the other hand, the user can simply use vectors; which can be interpreted in may other ways.

Currently I'm only testing with vector-backed records, but the plan is also enable uniform-vector backed ones---then it will be more attractive since uniform vectors can be directly passed into GL layer (via Gauche-gl).

Finally, a small addition to the srfi; I made the record accessor work with generalized set!, if the slot is mutable.

gosh> (define-record-type pare #t #t (kar) (kdr))
pare-kdr-set!
gosh> (define p (make-pare 1 2))
p
gosh> (set! (pare-kar p) 100)
#<undef>
gosh> (pare-kar p)
100

Another idea I'm pondering is to define a default printer to records, so that it prints out its slot values something like CL's struct.

Tags: 0.9.1, gauche.record, srfi-9, srfi-99

More entries ...