Gauche Devlog



Shorter names

People have different opinions on terse code. I generally agree with Paul Graham's "Succinctness is Power", but I'm less bold than he is and tend to take more cautious steps; I'm afraid that optimizing for a small domain of applications may trap us in a local maximum. APL, for example, may be at such a local maximum: probably no other language can beat it on succinctness in the field APL was designed for, but extending it to broader applications looks almost impossible. (Disclaimer: I've never programmed in APL; I've only skimmed through some of its documentation. I may be wrong.)

There are certain things I can introduce to make code shorter without deviating from standard Scheme syntax, and I have actually been trying some of them out. For 0.9.1, I have chosen (for now) two such ideas to support officially.

~: a universal accessor.

One of the pervasive sources of verbosity I see in my Lisp/Scheme code is accessors. Lisp tends to give accessors descriptive names, resulting in code that looks like this:

  (slot-ref (vector-ref (hash-table-get pvhash 'prop) 0) 'name)

It has the virtue that the reader can guess that pvhash is a hash table containing vectors of objects, without any type declarations. But... I often envy other languages in which I might be able to write something like this:


Gauche already has a generic function ref defined on most aggregate types, so the above expression can be written as follows:

   (ref (ref (ref pvhash 'prop) 0) 'name)

It is shorter, but it doesn't feel much shorter. Maybe it's because the number of nodes or nesting levels remains the same.

Issac Trotts suggested ref*, which allows chaining: (ref* a b c) = (ref* (ref* a b) c), and so on. It hasn't been officially documented, but it has been in Gauche since 2006. Using ref*, the expression gets even shorter.

   (ref* pvhash 'prop 0 'name)
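The chaining rule is simple enough to write down. Below is a minimal sketch in portable R7RS Scheme, not Gauche's actual implementation: ref1 and ref-chain are hypothetical names, ref1 understands only vectors and lists (indexed by exact integers) and association lists (keyed by symbols), and an alist stands in for the hash table.

```scheme
(import (scheme base) (scheme write))

;; Hypothetical one-step accessor (NOT Gauche's generic ref):
;; vectors and lists are indexed by exact integers,
;; association lists are keyed by symbols.
(define (ref1 obj key)
  (cond ((and (vector? obj) (exact-integer? key))
         (vector-ref obj key))
        ((and (pair? obj) (symbol? key))
         (let ((entry (assq key obj)))
           (if entry
               (cdr entry)
               (error "ref1: no such key" key))))
        ((and (list? obj) (exact-integer? key))
         (list-ref obj key))
        (else (error "ref1: unsupported object/key" obj key))))

;; Chained accessor: (ref-chain a b c) == (ref1 (ref1 a b) c), and so on.
;; With no keys it returns the object itself.
(define (ref-chain obj . keys)
  (let loop ((obj obj) (keys keys))
    (if (null? keys)
        obj
        (loop (ref1 obj (car keys)) (cdr keys)))))

;; Example data shaped like pvhash in the text, with an alist
;; standing in for the hash table.
(define pvhash
  (list (cons 'prop (vector (list (cons 'name "foo"))))))

(display (ref-chain pvhash 'prop 0 'name))  ; prints foo
(newline)
```

All that chaining adds over a one-step accessor is the left-to-right walk over the keys. Gauche's own ref* (and ~) performs each step with the generic ref, so it covers every type ref supports rather than the three shapes handled here.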

However, I found that I didn't use ref* much. In most places where I use an accessor, it isn't chained, so ref is shorter. And I found it cumbersome to switch to ref* once I realized I needed chaining.

So I looked for a shorter abbreviation of ref*. (. x y) would be easy to understand given the convention of the C family, but it would require changing S-expression syntax (Clojure adopted this notation, though). (-> x y) and (.. x y) are also easy to understand, but they require two characters. Ideally I wanted to cut it down to a single character. I tried (@ x y) in several projects, but ultimately found that it stood out too much.

In the end, I settled on ~. There aren't many usable one-character punctuation marks, so it was chosen largely by elimination. I think it's not bad once you get used to it, but if you don't like how it looks, well, you can always use ref. The above example comes down to this:

   (~ pvhash 'prop 0 'name)

Which is only two characters wider than my hypothetical example:


For slot accesses, I now tend to drop the space before the quote of the slot name:

   (~ obj'slot1'slot2'slot3) 

Which I think is not bad compared to these:


Oh, and note that the generalized setter works too.

   (set! (~ pvhash 'prop 0 'name) "newname")

This comes in very handy when you have to write OO-ish code (I mean, a network of mutable objects).
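Gauche's set! follows SRFI 17: (set! (proc arg ...) val) is treated as ((setter proc) arg ... val), which is why a ~ form can be assigned to. R7RS-small has no setter protocol, so the portable sketch below takes the keys as an explicit list instead. It only illustrates what a chained setter has to do; ref1, ref1-set! and ref-chain-set! are hypothetical names, and they handle just vectors and lists (by exact integer index) and association lists (by symbol key), with an alist standing in for the hash table.

```scheme
(import (scheme base) (scheme write))

;; Hypothetical one-step accessor: vectors and lists by exact integer
;; index, association lists by symbol key.
(define (ref1 obj key)
  (cond ((and (vector? obj) (exact-integer? key)) (vector-ref obj key))
        ((and (pair? obj) (symbol? key))
         (let ((entry (assq key obj)))
           (if entry (cdr entry) (error "ref1: no such key" key))))
        ((and (list? obj) (exact-integer? key)) (list-ref obj key))
        (else (error "ref1: unsupported object/key" obj key))))

;; Matching one-step mutator (also hypothetical).
(define (ref1-set! obj key val)
  (cond ((and (vector? obj) (exact-integer? key)) (vector-set! obj key val))
        ((and (pair? obj) (symbol? key))
         (let ((entry (assq key obj)))
           (if entry
               (set-cdr! entry val)
               (error "ref1-set!: no such key" key))))
        ((and (list? obj) (exact-integer? key))
         (set-car! (list-tail obj key) val))
        (else (error "ref1-set!: unsupported object/key" obj key))))

;; Chained setter: follow every key except the last with ref1, then
;; mutate the final container in place.  KEYS must be non-empty.
(define (ref-chain-set! obj keys val)
  (let loop ((obj obj) (keys keys))
    (if (null? (cdr keys))
        (ref1-set! obj (car keys) val)
        (loop (ref1 obj (car keys)) (cdr keys)))))

;; Alist standing in for the hash table; built with list/cons/vector
;; (not quoted literals) so that it is safe to mutate.
(define pvhash
  (list (cons 'prop (vector (list (cons 'name "oldname"))))))

(ref-chain-set! pvhash '(prop 0 name) "newname")
(display (ref1 (ref1 (ref1 pvhash 'prop) 0) 'name))  ; prints newname
(newline)
```

The key point is that only the last step mutates anything; the earlier steps are ordinary reads that locate the container holding the place being assigned.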

^: an alias of lambda.

Quack has a nice feature that displays lambda as λ, but that's an illusion. Your code still contains the six-character lambda, and opening it in other editors reveals that. Seasoned Lispers/Schemers are trained to recognize these six characters as one chunk, so it may not be much of a problem for reading, but it still takes up screen real estate. I'm an old type who feels awkward when code doesn't fit in 80 columns. Having six characters for lambda tends to make my lines longer, which makes me insert more line breaks, resulting in vertically stretched code. Arrrggg.

Gauche has no problem treating λ as an alias of lambda, but typing λ is not very convenient, except probably for Greeks.

So again, I looked for punctuation characters I could use. This time ^ seemed a good choice. Actually, I was told that the use of λ for functional abstraction originally came from a caret.

Combined with ~, I found I could cram more logic into a single line.

   (fold (^(block sum) (+ (or (~ block'recv-total) 0) sum)) 0 (~ db'blocks))

I also define ^a, ^b, ... ^z, and ^_ as macros for single-argument functions. That is, (^p (string? (cdr p))) is the same as (lambda (p) (string? (cdr p))). (Note that in this case I can't use cut, since the argument appears inside the nested call (cdr p) rather than directly as an argument of string?.)
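For comparison, the plain ^ alias is easy to express with a standard syntax-rules macro. The sketch below assumes only R7RS-small; the fold helper, the blocks data, and the recv-total accessor (standing in for (~ block'recv-total)) are hypothetical stand-ins so that the fold example from above can run without Gauche.

```scheme
(import (scheme base) (scheme write))

;; ^ as a plain alias of lambda.  Both the formals and the body come
;; from the caller, so an ordinary hygienic syntax-rules macro suffices.
(define-syntax ^
  (syntax-rules ()
    ((_ formals body ...) (lambda formals body ...))))

;; Minimal left fold with SRFI 1's argument order (kons elem acc);
;; defined here only so the example needs nothing beyond (scheme base).
(define (fold kons knil lst)
  (if (null? lst)
      knil
      (fold kons (kons (car lst) knil) (cdr lst))))

;; Hypothetical "blocks" as alists; the second one lacks recv-total.
(define blocks
  (list (list (cons 'recv-total 10))
        (list)
        (list (cons 'recv-total 32))))

;; Hypothetical accessor standing in for (~ block'recv-total):
;; returns #f when the entry is missing.
(define (recv-total block)
  (let ((entry (assq 'recv-total block)))
    (and entry (cdr entry))))

;; Sum recv-total over all blocks, treating a missing entry as 0.
(display (fold (^(block sum) (+ (or (recv-total block) 0) sum)) 0 blocks)) ; prints 42
(newline)

;; (^p (string? (cdr p))) spelled with explicit formals:
(define cdr-string? (^(p) (string? (cdr p))))
(display (cdr-string? (cons 1 "x")))  ; prints #t
(newline)
```

One caveat concerns the single-letter forms such as ^p. There the macro itself supplies the parameter name, and if a syntax-rules template introduced p, hygiene would rename that binding, so the p written in the body would not refer to it. Defining ^a ... ^z therefore calls for a hygiene-breaking macro facility beyond plain syntax-rules.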

I've been testing these for, well, probably more than two years, and I feel they have merit. They can also be implemented relatively easily in portable Scheme, so they don't hurt portability significantly (compared to introducing new syntax, such as Clojure's use of the #(...) notation, which denotes a vector in Scheme, as shorthand for closures).

Tags: 0.9.1, ~, ^

Past comment(s)

Taylor Venable (2010/04/29 13:48:30):

For what it's worth, if one does like using λ in one's code: In Vim, Greek characters can be written fairly easily using the builtin digraphs in insert mode — ^k l * inserts λ, ^k a * inserts α, etc. See :help digraph for more info. With Emacs one can enter TeX input mode (C-x <RET> C-\ TeX) and type \lambda - the input mode can then be toggled back and forth with C-\.

shiro (2010/04/29 15:08:06):

Thanks, Taylor. I didn't know much about Vim. It seems the age of freely using non-ASCII characters in source code has already come? (Fortress may fly!) With Emacs, I use C-\ to switch between Japanese and ASCII input modes. C-\ ramuda SPC RET gives me λ. ("ramuda" is the phonetic spelling of "らむだ", which is the kana spelling of "λ".) If people don't have a problem using λ, then I could even provide λ as an alias of lambda by default.

I still have memories of bad experiences from the 90s, when source code containing Japanese characters caused lots of headaches because we had a few different encodings. That makes me reluctant to use extended characters in source code. Old habits die hard...

Taylor Venable (2010/04/29 19:03:17):

I think being able to use λ for "lambda" would be cool; PLT does it (DrScheme also provides the convenience of inserting λ when typing C-\) so from that perspective there is some precedent too. Of course, those writing portable code probably wouldn't want to use it, but as I usually write for one specific system, I'd much prefer λ when writing compact lambdas.

By the way, I think having a blog for developing work in Gauche is a great idea, I look forward to reading more about the latest and greatest features. :-)

Grant Rettke (2010/05/13 14:38:08):

Emacs 'pretty-mode is nice for Scheme and any other language where you want to visualize your program's representation differently.

John Cowan (2012/04/19 18:10:10):

Note that 'a'b'c is not portable; some Schemes interpret it as (quote a'b'c) rather than (quote a) (quote b) (quote c).

shiro (2012/04/20 07:22:35):

Hi John. Yes, I'm following the discussion about delimiters on the r7rs discussion list. I'm ambivalent: I like the compactness of (~ a'b) if the quote character delimits the symbol. OTOH, I occasionally wish I could have a variable name with primes, e.g. a', probably because of Haskell's influence.

Arne Babenhauserheide (2014/01/21 16:35:51):

In GNU Guile you can also just use λ for lambda - and you can configure your keyboard with extra layers which make it easy to insert - for example via

shiro (2014/01/22 01:33:36):

Since Unicode has become the de facto standard for Scheme source, λ can be a sensible choice. These days I find myself using non-ASCII letters in source code without worrying too much.
