Gauche Devlog

< Partial continuations and fun of stack fiddling | To quote or not to quote >

2010/11/19

Some improvements of constant propagation

Most of this feature was implemented long time ago (December 2009 - April 2010) but have no appropriate place to describe except in release notes, which haven't yet come. It is probably worth to mention it here.

A constant propagation code in the compiler was overhauled and now it can precompute a lot wider range of the code. It can recognize constant bindings defined by define-constant, and many built-in functions are precomputed if it is side-effect free and arguments are constant expressions.

For example, in the following code (logior ...) is precomputed at the compile time and becomes a constant value, as most C programmers would expect:

(define-constant *flag-0* (ash 1 0))
(define-constant *flag-3* (ash 1 3))

(func (logior *flag-0* *flag-3*))

(Note that macros can't save this case, since a macro can only see the code it contains; it can't know whether *flag-0* is constant binding, if it is defined outside of the macro, unless you put all the constant bindings to macro expansion phase, which has its own inconvenience.)

If you are unsure, you can use disasm to check.

gosh> (disasm (lambda () (func (logior *flag-0* *flag-3*))))
CLOSURE #<closure #f>
main_code (name=#f, code=0x20f7820, size=4, const=1, stack=4):
args: #f
     0 CONSTI-PUSH(9) 
     1 GREF-TAIL-CALL(1) #<identifier user#func>; (func (logior *flag-0* *flag-3*))
     3 RET 

Precomputation isn't limited to numeric computations.

gosh> (define-constant *data* '(#(a b c) #(d e f)))
*data*
gosh> (disasm (lambda () (vector-ref (cadr *data*) 1)))
CLOSURE #<closure #f>
main_code (name=#f, code=0x1024b70, size=2, const=1, stack=0):
args: #f
     0 CONST-RET e

Currently only built-in SUBRs (procedures directly implemented in C) are subject of precomputation. Note that some elementary functions are defined in Scheme to handle complex numbers, on top of real-only SUBR version.

gosh> (use math.const)
#<undef>

;; this doesn't fold constants
gosh> (disasm (lambda () (func (/ (sqrt pi) 2))))
CLOSURE #<closure #f>
main_code (name=#f, code=0x21d4ea0, size=12, const=3, stack=11):
args: #f
     0 PRE-CALL(1) 6
     2 CONST-PUSH 3.141592653589793
     4 GREF-CALL(1) #<identifier user#sqrt>; (sqrt pi)
     6 PUSH 
     7 CONSTI(2) 
     8 NUMDIV2                  ; (/ (sqrt pi) 2)
     9 PUSH-GREF-TAIL-CALL(1) #<identifier user#func>; (func (/ (sqrt pi) 2))
    11 RET 

;; ... but this does:
gosh> (disasm (lambda () (func (/ (%sqrt pi) 2))))
CLOSURE #<closure #f>
main_code (name=#f, code=0x21cf540, size=5, const=2, stack=4):
args: #f
     0 CONST-PUSH 0.8862269254527579
     2 GREF-TAIL-CALL(1) #<identifier user#func>; (func (/ (%sqrt pi) 2))
     4 RET 

I hope this restriction is removed soon, since at least for standard functions, programmer shouldn't care if they are implemented in Scheme or as SUBRs. I'm not sure I can do that by 0.9.1 release, though. Anyway, most of the time you don't need to care, but when you're writing a performance sensitive code it may worth to play with disasm a bit to find out what the compiler can do for you.

(Added 2010/11/20 20:03:14 UTC): Oh, by the way, constant folding is done only when the bindings of built-in procedures haven't been altered at the time of compilation. This is a subtle issue; if you value dynamism, that you can change any parts of the system anytime, then you might expect that the argument of func should be recalculated when you redefine / or %sqrt some later time. R6RS made library modules closed, that is, you cannot alter its exported bindings afterwards. That's a possible solution. I like Gauche to be a bit more flexible, though. I think it is reasonable to expect programmers to consider the risk of overriding existing bindings. Yes you can do that, but if you do so, do it boldly. At the beginning of the module write redefinitions explicitly and prominently so that not only the compilers but the readers of your program will also notice you're attempting some trick.

★ ★ ★

This precomputation once caused an interesting event. I had a function like the following, which worked fine in 0.9 but stopped working after I put the new constant folding code.

(define (fn ...)
  (define tree '(root))

  (define (populate! node)
    ... a code that destcructively modifies cdr of node ...)

  (populate! tree)
  (cdr tree))

I represented a tree in which each node was something like (<name> <node> ...). The variable tree began with an empty root node, the populate! function grew the tree, and finally fn returned a list of root's children.

The above code is buggy, since it destructively modifies a literal list. Gauche doesn't check mutation on a literal pairs, since doing so would be costly.

On 0.9 the code seemed working, since fn was actually called only once during program execution. So mutated literal won't affect other parts.

With the improved constant folding, the compiler deduced that the return value of fn was a cdr of constant, thus computable at compile time. It eliminated all the code in fn as dead code and made fn a constant function that merely returned ().

Ultimately the compiler should warn if it can detect the code may mutate a constant. Doing so generally is difficult, but there may be some obvious cases easy to catch.

★ ★ ★

Actually, the real fun begins when this extended constant folding meets procedure inlining. Then there are some interesting code transformation happening; such that conditional expressions can be eliminated because the compile knows the condition always be satisfied (or never satisified).

Unfortunately inlining feature is still picky and I hesitate to expose it publicly yet. I hope I can write about it soon.

Tag: compiler

Post a comment

Name: