Plugged an annoying error behavior
I've been aware of an annoying behavior in Gauche. From time to time, I myself got bitten by it and thought "Gee, it's terrible! I should fix it." But there was always more urgent tasks to finish so it was let untouched.
It is that, when an error is caused by a huge object (e.g. long list or deep tree), Gauche tries to report the offending object entirely, producing huge error message:
*** ERROR: vector required, but got (*TOP* (html (head (title "RSSMix: Recent En tries") (link (|@| (type "text/css") (rel "stylesheet") (href "wiliki-sample.css ")))) (body (h1 "RSSMix: Recent Entries") (div (|@| (align "right")) "[" (a (|@| (href "http://practical-scheme.net/wiliki/wiliki.cgi?WiLiKi:RSSMix")) "What's T his?") "][" (a (|@| (href "?c=info")) "Sources") "]") (hr) (table (tr (td "2018/ 05/13 16:27:40 UTC") (td (a (|@| (href "http://ja.reddit.com/r/lisp_ja/")) "Redd it - LISP ja") ": " (a (|@| (href "https://ja.reddit.com/r/lisp_ja/comments/8j4y ff/実行時のデータ型の表現手法_2012/")) "実行時のデータ型の表現手法 (2012)"))) (tr (td "20 18/05/13 15:11:00 UTC") (td (a (|@| (href "http://ja.reddit.com/r/lisp_ja/")) "R eddit - LISP ja") ": " (a (|@| (href "https://ja.reddit.com/r/lisp_ja/comments/8 j4ez7/evolution_in_lisps_qiita/")) "Evolution In LISPs - Qiita"))) (tr (td "2018 ...
It is especially bad when you're running
buffer of Emacs with font-lock mode. Emacs starts to parse this huge
S-expression and does nothing else---even not accepting keystrokes.
Yesterday I hit it again and had to kill Emacs. That was the last straw.
It turns out it's so simple that I wonder why I didn't already do it long time ago.
--- a/src/libexc.scm +++ b/src/libexc.scm @@ -73,7 +73,7 @@ (values '() (list exc))) (let1 name (condition-type-name exc) (if (condition-has-type? exc <message-condition>) - (format out "*** ~a: ~a\n" name (~ exc'message)) + (format out "*** ~a: ~,,,,200:a\n" name (~ exc'message)) (format out "*** ~a\n" name))) (for-each (cut report-mixin-condition <> out) mixins)))))
Now the error message is truncated if it's too long.
*** ERROR: vector required, but got ((*TOP* (html (head (title "RSSMix: Recent Entries") (link (|@| (type "text/css") (rel "stylesheet") (href "wiliki-sample.css")))) (body (h1 "RSSMix: Recent Entries") (div ... Stack Trace: _______________________________________ 0 (vector-ref zz 1) at "(standard input)":4
Distributing a standalone binary executable is the easiest way to deliver Gauche applications to the users. However, what if you need your code to be built on the user's machine? Maybe your code is part of larger system and the client needs to build it on their site.
Once upon a time, delivering a source tarball and asking the user to run
./configure && make && make install wasn't a big deal. It was so much easier than before, when you had to read instructions cafefully and edit Makefiles according to your environment. However, the world has moved on.
So I wrote a small shell script get-gauche.
If you trust me enough, you can ask the user to do this:
curl -s https://raw.githubusercontent.com/shirok/getgauche/master/get-gauche.sh | /bin/bash
It works as follows:
- Ask the user where to install Gauche
- Check if the latest version of Gauche is already installed; if so, do nothing.
- Check if the user has write permission to the install destination; if not, ask the user if it's ok to use 'sudo' for installation.
- Download the official tarball of the latest release, compile and run check, then install it.
You can also download the
get-gauche script and run it.
The script can accept a bunch of command-line options to customize the behavior. See https://github.com/shirok/get-gauche for the details.
One instance I used it was like this: I wanted to use Gauche for testing and various management work in the product. I included
get-gauche.sh in the source tree, and in the Makefile I invoked it to install Gauche under the build tree.
Static linking and standalone executables
One of the most frequent-asked feature for Gauche is the ability to compile a Scheme program and produce an executable file. Well, it finally comes in the upcoming 0.9.6 release!
Already in the repo, the build process creates a static library of
libgauche by default. There's also a script
build-standalone that takes Scheme script source files and spits a stand-alone binary executable---meaning, you can copy the single file to another machine and just run it, without having Gauche runtime there, given that the other machine is the same architecture. (Note: It statically links
libgauche but still depends on quasi-standard dynamic libraries such as
The detailed explanation is in the "Building standalone executables" section of the manual. There's a sample source in
examples/standalone you can play with.
If you just want to use the feature, that's all you need to know. The following is the discussion behind the scene...
＊ ＊ ＊
The ability to produce standalone executable is somewhat considered a distinguishing feature of programming language implementation that separates itself from interpreters or scripting languages.
However, we'd say such a feature has nothing to do with the language being interpreted or scripted. It is purely for the convenience of distributing Gauche programs.
In fact, what
build-standalone does is to take Scheme source files and generates a C code snippet with the Scheme code embedded as a C string literal, which is just
evaled when the binary is run. There's no performance advantage in the binary form. The static version of
libgauche also contains all of library code written in Scheme as C string literals, to be
evaled on demand.
In fact, it is a trade-off; a standalone binary is easier to distribute, but you lose flexibility of scripting that the user can easily modify the source and run immediately. Also, the size of stand-alone binaries is not small (16MB on Linux x86_64), because it carries around entire Gauche runtime in it. (It's Lisp's curse---because of
eval, we can't statically pick only required code.)
To distribute Gauche applications as Scheme scripts, you need Gauche runtime installed on the target machines. But that's the same for most languages---you need Java runtime to run Java applications, or even need
libc to run C applications. If you deploy a bunch of Gauche applications in a limited number of users, it might still be better to install Gauche runtime on each target machine and distribute Scheme files, rather than build a standalone binary for each of applications.
That said, the standalone executable feature will expand the use of Gauche, I hope.
(And I do plan to improve ahead-of-time compilation, employing more optimizations, so standalone binary may have performance advantage in some day.)
Generalized setter inlining
(This is a continuation of Procedure inliner improvement.)
Another improvement of procedure inlining in 0.9.6 is about generalized setters.
Generalized setter srfi:17 allows
set! to behave as if it takes
lvalue of the first argument.
gosh> (define p (list (vector 1))) p gosh> p (#(1)) gosh> (set! (vector-ref (car p) 0) 2) #<undef> gosh> p (#(2))
It isn't really calculating lvalue of
(vector-ref (car p) 0).
The trick is a very simple conversion:
(set! (fn arg ...) value) ↓ ((setter fn) arg ... value)
fn is known at the compile time, and
is a constant binding, the compiler can inline the value of
(setter fn). In the following example, you see
the compiler knows
(setter vector-ref) is
vector-set!, and emit the VM instruction instead
of making a procedure call:
gosh> (disasm (^ (set! (vector-ref (car p) 0) 2))) CLOSURE #<closure (#f)> === main_code (name=#f, code=0x2939600, size=6, const=1 stack=1): signatureInfo: ((#f)) 0 GREF #<identifier user#p.2629740>; p 2 CAR-PUSH ; (car p) 3 CONSTI(2) 4 VEC-SETI(0) ; ((setter vector-ref) (car p) 0 2) 5 RET
This is exactly the same code as
(vector-set! (car p) 0 2).
Actually, srfi-17 is written with this optimization in mind. The
ref:getter-with-setter procedure is specified to
bind an accessor and a mutator in immutable way, so that the
compiler can safely replace
(setter fn) form with
fn's mutator. It's just that we haven't taken advantage of it
Since we have this optimization now, it is less important to have
separate mutator procedure than accessor---you can attach the mutator
as the setter of the accessor, and without penalty you can use
to invoke the mutator.
Procedure inliner improvement
This is about the compiler internal change in 0.9.6. There's no user-visible changes; it might be interesting if you're curious about compiler internals.
＊ ＊ ＊
Some of Gauche's primitive procedures are inlined as VM instructions.
For example, Gauche's VM has an instruction to reference an element
of a vector with immediate offset. If
vector-ref's second argument
is a constant, the instruction is used:
gosh> (disasm (^x (vector-ref x 0))) CLOSURE #<closure (#f x)> === main_code (name=#f, code=0x20e3160, size=3, const=0 stack=0): signatureInfo: ((#f x)) 0 LREF0 ; x 1 VEC-REFI(0) ; (vector-ref x 0) 2 RET
This tends to be pretty effective in performance.
You can also define a procedure as inlinable, using
define-inline. We've been using it
to optimize Gauche runtime.
However, up to 0.9.5, inlining is limited only when the inlinable procedure
appears at the procedure position of the function application.
In the following example,
vector-ref wasn't inlined because
it does not appear literally at the procedure position:
gosh> (define-inline (foo ref) (^x (ref x 0))) foo gosh> (disasm (^x ((foo vector-ref) x))) CLOSURE #<closure (#f x)> === main_code (name=#f, code=0x2228f50, size=9, const=2 stack=14): signatureInfo: ((#f x)) 0 LREF0-PUSH ; x 1 PRE-CALL(1) 7 ; (foo vector-ref) 3 GREF-PUSH #<identifier user#vector-ref.2238300>; vector-ref 5 GREF-CALL(1) #<identifier user#foo.2238340>; (foo vector-ref) 7 TAIL-CALL(1) ; ((foo vector-ref) x) 8 RET
It's been bothering us, since it is natural to expect that the effect of inlining and β-reduction cascades as follows:
((foo vector-ref) v) ↓ (((^[ref] (^x (ref x 0))) vector-ref) v) ↓ ((^x (vector-ref x 0)) v) ↓ (vector-ref v 0)
It was the old inline interface that prevented this.
＊ ＊ ＊
Gauche's compiler has 5 passes. Pass 1 expands macros and replaces input S expressions into a graph called IForm. Pass 2-4 works on IForm to optimize, and pass 5 walks final IForm to emit VM instructions.
It is pass 1 where
vector-ref was expanded. Inlinable
procedures have an attached inliner procedure, which takes
an input S expression and compile-time environment Cenv, and
returns expanded IForm:
Inliner :: Sexpr, Cenv -> IForm
When the compiler see a form
(fn arg ...) and
an inliner, the form
(fn arg ...) is passed to it and
the returned IForm becomes the result of pass1 of the original form.
fn is defined with
define-inline, the inliner procedure
is a closure that keeps IForm of the original procedure. If the compiler
sees this definition:
(define-inline (vref v) (vector-ref v 0))
(v (vector-ref v 0)) with pass 1 and obtain
its IForm. Let's denote it as
#<IForm (^v (vector-ref v 0))> for now.
The inliner of
vref would be something like this:
(^[sexpr cenv] (#Apply #<IForm (^v (vector-ref v 0))> (map (cut pass1 <> cenv) (cdr sexpr))))
Here, the pseudo function "#Apply" builds an IForm that applies the
first argument on the list of IForms in the second argument.
The subsequent optimization passes would eliminate binding of
v if possible.
We could've kept the body of
vref in S-expression and just expand
it much like legacy macro expansion:
(^[sexpr cenv] (pass1 `(apply (^v (vector-ref v 0)) ,@(cdr sexpr)) cenv))
But such way would run into the hygiene problem. We have to
vector-ref to avoid potential name conflict. Running
through pass1 resolves all the identifier references, hence it addresses
hygiene issue as well.
The problem is that the inliner's input is Sexpr and output is IForm.
that means we can't apply an inliner on the result of another inliner.
It was the reason that the
foo case above didn't inlined as expected.
There are also other opportunities of inlining during pass 3. Once we do closure optimization and redundant local variable elimination, we might get a procedure call where the procedure has attached inliner. Since we're working on IForm in pass 3, we couldn't apply inliners in such case.
＊ ＊ ＊
The solution is to change inliner protocol. Instead of taking input Sexpr, we can make the inliner takes IForms:
Inliner :: IForm, [IForm] -> IForm
The first IForm is the body of inlinable procedure, and the second list of IForms are the IForm from argument expressions. The information of Cenv is already folded in IForm.
In pass 1, we process the procedure and the argument with pass1 before calling the inliner. In pass 3 we can just call the inliner.
Now the effect of inlining cascades.
;; Returns a procedure that takes the first element of the given sequence, ;; using REF as the accessor. (define-inline (foo ref) (^v (ref v 0))) ;; Returns accessor accoring to the symbol TYPE. (define-inline (bar type) (cond [(eq? type 'v) vector-ref] [(eq? type 'u8) u8vector-ref])) ;; Using foo and bar to access the first element of V, using an accessor ;; designated by TYPE: (define-inline (baz type v) (let ((ref (bar type))) ((foo ref) v)))
type is constant, the
cond expression in
is computed at the compile time, allowing that we inline everything
down to a VM instruction:
gosh> (disasm (^v (baz 'v v))) CLOSURE #<closure (#f v)> === main_code (name=#f, code=0x28783e0, size=3, const=0 stack=0): signatureInfo: ((#f v)) 0 LREF0 ; v 1 VEC-REFI(0) ; (ref v 0) 2 RET gosh> (disasm (^v (baz 'u8 v))) CLOSURE #<closure (#f v)> === main_code (name=#f, code=0x2878340, size=4, const=0 stack=1): signatureInfo: ((#f v)) 0 LREF0-PUSH ; v 1 CONSTI(0) 2 UVEC-REF(1) ; (ref v 0) 3 RET
＊ ＊ ＊
This change allows us to write abstraction without worring that it taxes performance.