2024/01/17
Caching formatter procedure
Lisp's format
procedure is very un-Schemy. Instead of having a set of
composable, orthogonal, do-one-thing-well procedures, format
introduces
a mini-language that's syntactically and semantically separate from the base
language. It is not extendable, loaded with obscure features from the past.
Yet it is handy for typical trivial tasks and that's why Gauche (and other Schemes,
plus a couple fo SRFIs) offer it.
(And to be honest, there's some pleasure to tinker such mini-language implementations.)
Aside from the non-composability, another glaring drawback of format
is
that it needs to interpret the mini language (format string) at runtime.
Most format
calls have a literal format string, and it is waste of time
to parse it every time format
is called. An obvious optimization
is to recognize the literal format string and translates the call to format
by simpler procedures at compile-time. I believe most CL implemenations do so.
However,
Gauche, as well as some other Scheme implementations and SRFI-48, allows the
port argument to be omitted. It is convenient,
but it indeed makes compile-time transformation difficult. If the first
argument of format
is a non-literal expression (it is the case
if you're passing a port), it is diffuclt for the compiler to recognize
if the format string is a constant, even the second argument is a literal
string that looks like a format string. If the first expression yields
a string at runtime, that is the format string and the literal
string is an argument to be shown.
Despite these difficulties, we can still take advantage of literal format string, by caching the format string compilation result at run-time.
It is not exactly the same as memoization. It is difficult to control amount of memoized results, and we only want to cache literal format strings, which needs to be determined at compile time.
So, we implemented a hybrid solution. The compiler macro attached
to format
checks if possible format string is a literal, and if so,
it transforms the call into an internal procedure that takes an extra
argument. The extra argument contains the position of the possible literal
format string, and a mutable box. The following is the core part of
the compile-time transformation:
(define-syntax make-format-transformer (er-macro-transformer (^[f r c] (match f [(_ shared?) (quasirename r `(er-macro-transformer (^[f r c] (define (context-literal pos) `(,',shared? ,pos ,(box #f))) (match f [(_ (? string?) . _) (quasirename r `(format-internal ',(context-literal 0) (list ,@(cdr f))))] [(_ _ (? string?) . _) (quasirename r `(format-internal ',(context-literal 1) (list ,@(cdr f))))] [(_ _ _ (? string?) . _) (quasirename r `(format-internal ',(context-literal 2) (list ,@(cdr f))))] [_ f]))))]))))
(NB: shared?
flag is used to share the routine with format
and
format/ss
. We need to check the literal string in first, second and third
position, for Gauche's format
allows two optional arguments before the
format string.)
At run-time, the internal function can see if the literal string is
indeed a format string. If so, it computes a formatter procedure
based on the format string, and stores it to the mutable box. Subsequent
calls will use the computed formatter procedure, skipping parsing and
compiling the format string. The caching occurs per-call-site, much like
the global variable lookup (we cache the <gloc>
object, the result
of lookup, in the code vector).
The format-internal
procedure checks optional arguments, and calls
format-2
. Its first argument can be a mutable box introduced
by the above macro, if we do know the format string is literal.
(define (format-2 formatter-cache shared? out control fmtstr args) (let1 formatter (if formatter-cache (or (unbox formatter-cache) (rlet1 f (formatter-compile fmtstr) (set-box! formatter-cache f))) (formatter-compile fmtstr)) (case out [(#t) (call-formatter shared? #t formatter (current-output-port) control args)] [(#f) (let1 out (open-output-string) (call-formatter shared? #f formatter out control args) (get-output-string out))] [else (call-formatter shared? #t formatter out control args)])))
A micro benchmark shows it's effective. In real code, the effect may not be so prominent, but it does remove worries that you're wasting time for parsing format string.
(define (run p) (dotimes [n 1000000] (format p "n=~7d 1/n=~8,6f\n" n (/. n)))) (define (main _) (time (call-with-output-file "/dev/null" run)) 0)
With caching off:
;(time (call-with-output-file "/dev/null" run)) ; real 19.796 ; user 19.790 ; sys 0.000
With caching on:
;(time (call-with-output-file "/dev/null" run)) ; real 10.313 ; user 10.310 ; sys 0.000
Tag: format
Post a comment