Gauche Devlog


Splitting generators

(This entry is inspired by a private email exchange.)

Generators (ref:gauche.generator - Generators, srfi:121) are very handy abstraction to construct a data-flow pipeline. You can put together small parts, each of which modifies data streams locally, to carry out the whole computation. You can also concatenate (ref:gappend, ref:gconcatenate, ref:gflatten) or merge (ref:gmerge) generators.

But how about splitting? If you think in terms of data-flow, you sometimes want to split a stream, say, you have an incoming stream of integers and want to have two output streams, one with odd numbers and another with even integers. Why don't we have gpartition?

Once you start writing it, it becomes immediately apparent that you need infinite buffering. You split generator of integers to odds and evens. You try to read from odds. But what if the input generator yields even numbers consecutively? You have to save them up so that they can be retrieved later when you read from "even" generators.

It's doable, using a pair of queues for example, but it'll be a bit messy.

In fact, there's a lot simpler solution.

(use gauche.lazy)
(use gauche.generator)

(define (gpartition pred input-gen)
  (let1 input-lseq (generator->lseq input-gen)
    (values (list->generator (lfilter pred input-lseq))
            (list->generator (lfilter (complement pred) input-lseq)))))

That is, convert the input generator to a lazy seq (lseq), creates two lazy sequences from it, and convert them back to generators. The buffering is implicitly handled as a realized portion of the lazy sequence.

Let's split stream of integers into odds and evens:

gosh> (define-values (o e) (gpartition odd? (grange 0)))
gosh> (o)
gosh> (o)
gosh> (o)
gosh> (o)
gosh> (o)
gosh> (e)
gosh> (e)
gosh> (o)

However, if you do this, you might as well build the whole network using lseqs, without bothering to convert generators and lseqs back and forth.

This is a sort of unfortunate situation that you have to choose. In languages like Haskell, there's no question that you'll go with lazy sequences from the first place. In Gauche, and in Scheme general, lazy streams incur certain overhead compared to eager solutions and you need to speculate, before starting to write code, which way would lead a good balance between code clarity and performance.

Fully lazy stream (like srfi:40 and srfi:41) needs to allocate a thunk for each element, unless the implementation has special optimizations. Generators can eliminate per-item allocation at all. Lightweight lseqs (such as Gauche's ref:Lazy sequences and srfi:127) can be a good compromise, for it requires to allocate just one pair per item, and allocation of pairs is likely to be optimized in wider range of implementations. At this moment, I'd suggest to go with lseqs as a first choice, and if you can't afford allocations at all you'd switch to full-generator design.

Btw, with srfi-127, gpartition can be written as follows:

(use srfi-127)

(define (gpartition pred input-gen)
  (let1 input-lseq (generator->lseq input-gen)
    (values (lseq->generator (lseq-filter pred input-lseq))
            (lseq->generator (lseq-remove pred input-lseq)))))

Gauche has srfi-127 in the development HEAD.

Tags: generator, lseq, srfi-121, srfi-127


0.9.5, and what's next

Released 0.9.5, finally. Every release I say I'll make next release quicker, but slip away every time, so I don't say it this time. (I did have hard time to put the release notes together, so I do hope shorter release notes next time.)

There are a few things I marked "after 0.9.5", so I guess I start with them. Here are top priority items:

  • Finishing up line-editing feature: It's "experimental" in 0.9.5, with a few known issues. Let's make it official. Completion feature is also nice to have.
  • Rewriting internal legacy macros with ER-macros. I can only do this after 0.9.5, since the source must be compilable by the latest release. I'm also hoping to rewrite syntax-rules expander in Scheme.
  • I like to look into better instrument to examine what's going on in the running program---e.g. debugger and profiler. The advantage of dynamically typed language is that you can examine and modify the running program without stopping it. We need to take maximum advantage of it.
  • Better way to organize and maintain library packages written by others---a central site to register and track them, I'm thinking.

Other thoughts, in more long term, but before 1.0:

  • Performance - I haven't visited this area for a while, but the technology in this field is advancing, so it might be a good time to revisit it once more. The current VM instructions was designed when ScmWord is 32bit and it's not likely to optimal for 64bit architecture; especially, lots of instruction space is wasted. I might redesign VM instructions, with possibility of adopting JIT.
  • Deployment - there are several experimental features---such as precompiling Gauche program, or statically linking Gauche runtime---but there's no standard workflow yet.

Tag: 0.9.5


Better error reporting during loading/compiling

As the release of 0.9.5 nears, we'd like to post some miscellaneous topics about the features introduced in 0.9.5 so that it can be referred back.

If you've been using HEAD, you might have noticed slightly different error message when you hit something wrong during loading or compiling. This is how it looks like (I added random thingy in rfc.uri just to trigger an error.):

gosh> (use rfc.http)
*** ERROR: unbound variable: error!
    While loading "../lib/rfc/uri.scm" at line 344
    While compiling "../lib/rfc/http.scm" at line 46: (define-module rfc.http (use srfi-11) (use srfi-13) (use rfc.822) (use rfc.uri) (use rfc.base64) (use ...
    While loading "../lib/rfc/http.scm" at line 76
    While compiling "(standard input)" at line 1: (use rfc.http)
Stack Trace:
  0  (eval expr env)
        at "../lib/gauche/interactive.scm":282

It shows that Gauche just evaluated the form typed in to REPL, which is (use rfc.http), which caused to load ../lib/rfc/http.scm, which triggers to compile (define-module ...), which in turn loads ../lib/rfc/uri.scm, which had a problem.

We implement this feature using compound conditions.

  1. load-from-port and compile catches an error, and re-raise a compound condition with the original condition plus a mixin condition that holds the context of loading or compilation.
  2. The mixin condition, <load-condition-mixin> or <compile-error-mixin>, inherits <mixin-condition>.
  3. The standard error reporting procedure, report-error, prints the information of thrown condition before the stack trace. If the thrown condition consists of a main condition (e.g. <error> and several mixin conditions, it first shows the main condition, then lists the mixin condition info using a method report-mixin-condition. In the above example, the ERROR: line is the main condition, and each While ... line is generated by report-mixin-condition method.

This mechanism is open to the user---if you want to add some context information to the error, all you need to do is:

  1. Define a condition as a subclass of <mixin-condition>.
  2. Capture the error while you're doing stuff and re-raise a condition with additional mixin condition.
    (guard (e [(condition? e)
               (raise ($ make-compound-condition e
                         $ make <my-mixin-condition> :key val ...))])
  3. Define report-mixin-condition specialized to your mixin condition.
    (define-method report-mixin-condition ((e <my-mixin-condition>) port)

You can see how <load-mixin-condition> and <compile-error-mixin> condition are handled in src/libomega.scm.

Tags: Condition, 0.9.5


Did you mean...?

Recently I improved REPL info feature (browsing document on REPL) in a few ways. If you use Emacs, it is more convenient to jump to info directly; however, having document available on REPL is sometimes handy, for example, when you start gosh on ssh prompt to do some quick chores.

To view online doc on REPL, use info function, or toplevel ,info command:

gosh> ,info assoc
 -- Function: assq obj list
 -- Function: assv obj list
 -- Function: assoc obj list :optional key=
     [R7RS][SRFI-1] Each element in LIST must be a pair.  These
     procedures search a pair whose car matches OBJ (in the sense of
     'eq?' for 'assq', 'eqv?' for 'assv', and 'equal?' for 'assoc') from
     left to right, and return the leftmost matched pair if any.  If no
     pair matches, these return '#f'.

     If the optional argument of 'ascoc' is given, it is called instead
     of 'equal?' to check the equivalence of OBJ and each key.

Here are the new features:

  • You can now search not only function and syntax names, but also module name and class names:
gosh> ,info data.imap
 -- Module: data.imap
     This module provides a immutable data structure with O(log n)
     access and update operations (here, update means to return a new
     structure with requested changes).  The current implementation is
     based on the functional red-black tree.

gosh> ,info <tree-map>
 -- Builtin Class: <tree-map>
     Tree map class.  Tree maps are a data structure that maps key
     objects to value objects.  It's like hash tables except tree maps
     uses balanced tree internally.  Insertion and lookup is O(log n).

     Unlike hashtables, a tree map keeps the order of the keys, so it is
     easy to traverse entries in the order of keys, to find
     minimum/maximum keys, or to find a key closest to the given value.

     The '<tree-map>' class inherits '<sequence>' and
  • If there are more than one entry with the same name, the system prompts you to choose:
gosh> ,info string-map
There are multiple entries for string-map:
 1. "R7RS base library"
 2. "SRFI-13 String mapping"
Select number, or q to cancel [1]: 
 -- Function: string-map proc str ...
 -- Function: string-for-each proc str ...
     [R7RS] These take different arguments from 'string-map' and
     'string-for-each' in SRFI-13 (*note SRFI-13 String mapping::), so
     provided only in 'scheme.base' module to avoid confusion.

  • If there are no entries for the given word, it suggests similar words if there's any:
gosh> ,info vectre
There is no entry for vectre.
Did you mean:

I hope these changes would make REPL more pleasant to use. BTW, is ,info intuitive enough? I'm also tempted to add alias ,doc, especially for clojure users.

Oh, by the way, when you're on Emacs, this little elisp code allows you to jump to info page with a few keystrokes on a word...

(defun gauche-info-index (topic)
   (list (read-string
          (concat "Gauche help topic : ")
  (switch-to-buffer-other-window (get-buffer-create "*info*"))
  (info "/usr/share/info/")
  (Info-index topic))

(define-key global-map "\C-xH" 'gauche-info-index)

Tags: info, repl



A new comparator srfi, srfi:128, was finalized a couple of weeks ago. It is a rewrite of srfi:114, with simpler interface that appears to work with other parts of Scheme better (e.g. the comparison procedure of srfi-114, which returns -1, 0 or 1, doesn't seem Scheme-y, even though such interface is seen in other languages.)

I like srfi-128 better and I would've adopted it natively if Gauche hadn't already adopted srfi-114. At this moment, however, srfi-114 comparator is built-in and tightly coupled with Gauche's native datatypes such as built-in treemaps. Also, Gauche's internal compare API has been returning -1/0/1 scheme, so it goes better with srfi-114.

On the other hand, we expect future portable libraries will use srfi-128, and we want to encourage Gauche libraries to do so. We don't want to penalize code using srfi-128 with the overhead of adaptation layer.

So we chose to support both srfi-114 and srfi-128 natively. Srfi-114 comparator can emulate srfi-128's, and vice versa. Speed-sensitive internal code checks which flavor the passed comparator is, and uses optimized path. Most Scheme-level code would use high-level utilities such as =? and <? anyway, which hides the difference under the hood.

The only conflict in two srfis is the constructor, make-comparator. We already extended srfi-114 make-comparator to take the optional name argument, so it'll be tricky to distinguish two flavors with extra arguments. Besides, we do want to encourage new code to use srfi-128; so we decided to break the backward compatibility.

Gauche's make-comparator will be an extension of srfi-128, with optional name argument. We provide srfi-114 comparator under a different name make-comparator/compare. (The name could be confusing, but it reflects the compare generic function which is compatible to srfi-114's comparison procedure---thus, make-comparator with compare-like procedure.) If you import srfi-114, it imports make-comparator/compare as make-comparator and shadows the built-in one.

Gauche have supported subset of srfi-114 as built-in. Some of them are the same in srfi-128, some have different names, and some are newly introduced in srfi-128. Here's the correspondence.

srfi-114 srfi-128 notes
make-comparator *1
comparator? comparator?
comparator-comparison-procedure? comparator-ordered?
comparator-hash-function? comparator-hashable?
comparator-type-test-procedure comparator-type-test-predicate
comparator-equality-predicate comparator-equality-predicate
comparator-comparison-procedure *2
comparator-ordering-predicate *3
comparator-hash-function comparator-hash-function
comparator-test-type comparator-test-type
comparator-check-type comparator-check-type
comparator-hash comparator-hash
comparator-compare *4
default-hash *5
  • *1 Provided as make-comparator/compare.
  • *2 If srfi-128 comparator is given, returns a synthesized procedure that uses equality and ordering predicates.
  • *3 If srfi-114 comparator is given, returns a synthesized procedure that uses comparison procedure.
  • *4 If srfi-128 comparator is given, use its equality and ordering predicates.
  • *5 Same as Gauche's hash.

The change is currently undergone. We expect this to be the last major change before releasing 0.9.5.

Tags: Comparator, srfi-114, srfi-128

More entries ...