Gauche Devlog


Method call optimization - skipping sort-applicable-methods

It's not that we don't sort methods. It's just that if we have only one applicable method, we don't need to call sort-applicable-methods at all. Obvious, right?

I thought it wouldn't matter, for that part is written in C (we call Scm_SortMethods) and it just returns without sorting when there's only one method. But lo and behold, skipping the call entirely is quite effective. Here's the average of several runs, calling ref on a sparse vector 10M times.

Baseline:   3.065 real, 3.863 cpu
Optimized:  2.431 real, 2.793 cpu

Scm_SortMethods uses shell sort on a C array to avoid allocation, and there's a setup overhead (converting the method list to an array). That's a waste when there's only one applicable method. And in many cases, there is.

Not bad for just a few lines of change.

We could also optimize the cases where we have just two or three applicable methods, in which case we can use hard-wired comparisons instead of a general sort. Whether it's worth it depends on how often such cases occur; some benchmarking is required.
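The idea can be sketched in Scheme (the real code is C, in Scm_SortMethods; `more-specific?` is a stand-in for the actual specificity comparison, which also takes the actual arguments into account):

```scheme
;; Sketch only: assumes METHODS is a non-empty list of applicable methods.
(define (sort-applicable-methods methods more-specific?)
  (cond [(null? (cdr methods)) methods]      ; one method: skip sorting entirely
        [(null? (cddr methods))              ; two methods: one comparison suffices
         (if (more-specific? (car methods) (cadr methods))
           methods
           (list (cadr methods) (car methods)))]
        [else (sort methods more-specific?)])) ; general case: full sort
```

The first branch is the optimization described above; the second is the hypothetical hard-wired two-method case.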

Tags: 0.9.6, sort-applicable-methods


Method call optimization - avoiding next-method

Gauche's object system is a direct descendant of stklos, which is based on TinyCLOS. Each method takes an implicit parameter, next-method, which is bound to a next-method object and can be used to invoke the less specific method in a method chain. It is similar to the super method in class-oriented object systems. However, the next-method object must remember the actual arguments used in the method's invocation. That is, when it is called without arguments, it should pass the original arguments to the next method:

gosh> (define-method foo ((a <number>)) `((number ,a)))
#<generic foo (2)>
gosh> (define-method foo ((a <real>)) (cons `(real ,a) (next-method)))
#<generic foo (2)>
gosh> (foo 3)
((real 3) (number 3))

Here, (next-method) in (foo <real>) passes 3 to the next method, (foo <number>). Note that the next-method object is a first-class object, so for example, it can be saved to a global variable:

gosh> (define nm #f)
gosh> (define-method foo ((a <real>)) (set! nm next-method) nm)
#<generic foo (2)>
gosh> (foo 3)
#<next-method (#<method (foo <number>)>)0 (3)>
gosh> nm
#<next-method (#<method (foo <number>)>)0 (3)>

And invoked later:

gosh> (nm)
((number 3))

This means that a next-method object must be allocated for every invocation of a method. In many cases, though, next-method isn't used at all in the method body, so it's a waste. Can we do something?

An easy path would be to let programmers indicate whether a method wants to use next-method, probably with a CL-style method qualifier. But that's kind of silly---whether the method uses next-method is already evident in the code; why should the programmer have to say it again?

Fortunately, during compilation, we track how many times each local variable is referenced. So at some point in the compiler passes, we know whether next-method is used in the method body or not. We can extract that information, save it in the method object, and check it at invocation time to avoid creating the next-method object when possible.

The define-method form is expanded as follows:

(define-method name ((arg spec) ...) <method-body>)

  ==>

(add-method! name
             (make <method>
               :specializers (spec ...)
               :body (lambda (next-method arg ...)
                       <method-body>)))

The body of the method is just a plain lambda form; there's nothing about it that marks it as a method body, so the compiler can't treat it specially.

Instead, we introduced a general mechanism to save a list of unused arguments in each closure; it is meta-information, saved alongside the source-code information.

The <method> initializer looks at that information to find out whether the next-method argument is used. If not, it sets the method-leaf flag of the method object.

The method invocation code looks at the flag and omits creation of the next-method object.
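Conceptually, the invocation-side check looks like this. This is a hypothetical Scheme sketch: the real check happens in the C dispatch code, and `method-leaf?`, `method-body`, and `make-next-method-object` are made-up names for illustration:

```scheme
;; Hypothetical sketch; the real logic lives in the C dispatcher.
(define (invoke-method method next-methods args)
  (if (method-leaf? method)
      ;; next-method is never referenced in the body:
      ;; pass a placeholder and allocate nothing.
      (apply (method-body method) #f args)
      ;; otherwise allocate a next-method object capturing
      ;; the remaining methods and the actual arguments.
      (apply (method-body method)
             (make-next-method-object next-methods args)
             args)))
```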

Let's see how effective this optimization is. This is the benchmark script.

(use gauche.time)

;; method without next-method
(define-method foo (x) #t)

;; method with next-method (for comparison)
(define-method bar (x) (when x (next-method)))

($ time-these/report 10000000
   `((foo . ,(cut foo #f))
     (bar . ,(cut bar #f))))

On 0.9.5:

Benchmark: ran foo, bar, each for 10000000 times.
  foo: 2.021 real, 3.550 cpu (3.470 user + 0.080 sys)@2816901.41/s n=10000000
  bar: 2.099 real, 3.680 cpu (3.570 user + 0.110 sys)@2717391.30/s n=10000000


On HEAD (to be 0.9.6):

Benchmark: ran foo, bar, each for 10000000 times.
  foo: 1.313 real, 1.740 cpu (1.730 user + 0.010 sys)@5747126.44/s n=10000000
  bar: 2.051 real, 3.670 cpu (3.570 user + 0.100 sys)@2724795.64/s n=10000000

In wall-clock time, it's about a 35% speedup (averaging multiple runs shows it's more like 40%). It's also interesting that CPU time is halved---since GC runs in parallel, this indicates the modification generates a lot less garbage, hence less GC time.

Note: Since 0.9.6's built-in methods will be pre-compiled by the 0.9.5 compiler, you won't see the effect of this optimization on built-in methods in the 0.9.6 release. You'll need to rebuild the source with 0.9.6.

There's still lots of room for optimization in our method dispatch mechanism, and we'll explore it more in the coming releases.

Tags: 0.9.6, next-method


Another little improvement of online doc

Here's another little feature to be in 0.9.6: you can search info entries with a regexp.

gosh> ,i #/^symbol-/
symbol->string           Symbols:62
symbol-append            Symbols:96
symbol-hash              Hashing:154
symbol-interned?         Symbols:54
symbol-sans-prefix       Symbols:87

Since the info module builds a table mapping entry names to their locations in the info documents, it's easy to pick the entry names that match the given regexp.
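Conceptually, the lookup is just a filter over that table's keys (a sketch; the actual table representation in the info module may differ). In Gauche, a regexp object is applicable, returning a match object or #f:

```scheme
;; Sketch: pick the entry names matching RX from a
;; name -> location hash table, sorted for display.
(define (search-entries rx name->location)
  (sort (filter (^[name] (rx name))
                (hash-table-keys name->location))
        string<?))
```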

This raises an interesting question: We already have apropos which can search symbols. What's the difference?

On systems such as CL or Clojure, where docstrings are tied to symbols, it's reasonable to use apropos for searching and documentation/doc for retrieving the actual document.

In Gauche, we chose not to adopt the docstring convention; instead, we provide a way to look up separately provided documentation. This lets you browse the documentation of symbols that aren't loaded into the current REPL, which is handy, since you often need to read the docs before knowing which module to import.

We can consider apropos more as an introspection tool for the currently running process. With that view, there could be options to enhance apropos---e.g. showing the visibility of each binding from the current module, and if it's visible, why (e.g. it's visible because the current module imports a module that inherits it, etc.)

Tags: Documentation, repl


Little improvement of online doc

As of 0.9.5, the online document (the info command in the REPL) shows only the specified entry, instead of the entire info page that contains it.

gosh> ,info make-queue
 -- Function: make-queue
     Creates and returns an empty simple queue.

It's easier to read than the entire page, but has one drawback---from the entry alone, it's not clear which module you should import. If you're reading it in info, it's easy to see which section you're in, and the section shows the module name. The online doc lacks that context.

I've been mulling it over and finally decided to go for a kind of brute-force solution: add the module name to every entry. In HEAD (which is to be 0.9.6), it shows as follows:

gosh> ,info make-queue
 -- Function: make-queue
     {data.queue} Creates and returns an empty simple queue.

It may be a bit annoying when you're reading the info document, since the module name is repeated in every entry. But it's more frustrating when necessary info isn't shown.

Tags: Documentation, repl


Splitting generators

(This entry is inspired by a private email exchange.)

Generators (ref:gauche.generator - Generators, srfi:121) are a very handy abstraction for constructing data-flow pipelines. You can put together small parts, each of which modifies the data stream locally, to carry out a whole computation. You can also concatenate (ref:gappend, ref:gconcatenate, ref:gflatten) or merge (ref:gmerge) generators.

But how about splitting? If you think in terms of data flow, you sometimes want to split a stream: say, you have an incoming stream of integers and want two output streams, one with the odd numbers and another with the even numbers. Why don't we have gpartition?

Once you start writing it, it becomes immediately apparent that you need unbounded buffering. Suppose you split a generator of integers into odds and evens, and try to read from the odds. What if the input generator yields even numbers consecutively? You have to save them up so they can be retrieved later when you read from the "even" generator.

It's doable, using a pair of queues for example, but it'll be a bit messy.
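For comparison, here's a sketch of the queue-based approach using data.queue (gpartition/queues is a made-up name). Each output generator first drains its own queue, and otherwise pulls from the input, stashing non-matching items on the other generator's queue:

```scheme
(use data.queue)

;; Sketch of the eager, queue-buffered version of gpartition.
(define (gpartition/queues pred input-gen)
  (define yes-q (make-queue))
  (define no-q  (make-queue))
  (define (maker my-q other-q keep?)
    (lambda ()
      (let loop ()
        (cond [(not (queue-empty? my-q)) (dequeue! my-q)] ; buffered item first
              [else
               (let1 v (input-gen)
                 (cond [(eof-object? v) v]                ; input exhausted
                       [(keep? v) v]                      ; ours: yield it
                       [else (enqueue! other-q v)         ; theirs: buffer it
                             (loop)]))]))))
  (values (maker yes-q no-q pred)
          (maker no-q yes-q (complement pred))))
```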

In fact, there's a lot simpler solution.

(use gauche.lazy)
(use gauche.generator)

(define (gpartition pred input-gen)
  (let1 input-lseq (generator->lseq input-gen)
    (values (list->generator (lfilter pred input-lseq))
            (list->generator (lfilter (complement pred) input-lseq)))))

That is, we convert the input generator to a lazy sequence (lseq), create two lazy sequences from it, and convert them back to generators. The buffering is implicitly handled by the realized portion of the lazy sequence.

Let's split a stream of integers into odds and evens:

gosh> (define-values (o e) (gpartition odd? (grange 0)))
gosh> (o)
1
gosh> (o)
3
gosh> (o)
5
gosh> (o)
7
gosh> (o)
9
gosh> (e)
0
gosh> (e)
2
gosh> (o)
11

However, if you go this route, you might as well build the whole network with lseqs, without converting back and forth between generators and lseqs.

It's somewhat unfortunate that you have to choose. In languages like Haskell, there's no question that you'd go with lazy sequences from the start. In Gauche, and in Scheme in general, lazy streams incur a certain overhead compared to eager solutions, so you need to guess, before you start writing code, which way will strike a good balance between code clarity and performance.

Fully lazy streams (like srfi:40 and srfi:41) need to allocate a thunk for each element, unless the implementation has special optimizations. Generators can eliminate per-item allocation entirely. Lightweight lseqs (such as Gauche's ref:Lazy sequences and srfi:127) can be a good compromise, since they require allocating just one pair per item, and pair allocation is likely to be optimized in a wider range of implementations. At this moment, I'd suggest going with lseqs as a first choice, switching to a full-generator design if you can't afford the allocations.

Btw, with srfi-127, gpartition can be written as follows:

(use srfi-127)

(define (gpartition pred input-gen)
  (let1 input-lseq (generator->lseq input-gen)
    (values (lseq->generator (lseq-filter pred input-lseq))
            (lseq->generator (lseq-remove pred input-lseq)))))

Gauche has srfi-127 in the development HEAD.

Tags: generator, lseq, srfi-121, srfi-127
