# 2013/02/04

## Building a list with index---which way is faster?

We often need to create a list of length N whose k-th element is computed from k, e.g. as `(f k)`. The shortest way is to map `f` over a sequence of indices:

```
(map f (iota N))
```

If it's a throwaway script I just do this. But in reusable code, creating a temporary list with `(iota N)` looks kind of wasteful. Is there a better way?

Using a lazy list for the indices avoids creating a full intermediate list. However, since we realize every element anyway, the overhead of lazy evaluation can be a drag.

```
(map f (liota N))
```
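The eager/lazy trade-off here is not Scheme-specific. As a rough Python analogue (a sketch only, not how Gauche implements `iota`/`liota`): a materialized index list corresponds to `iota`, while a generator produces indices on demand but pays per-element overhead.

```python
def f(k):
    return -k

N = 5

# Eager: materialize the full index list first, like (map f (iota N)).
eager = [f(k) for k in list(range(N))]

# Lazy: pull indices one at a time, like (map f (liota N)); there is
# no intermediate list, but every element goes through the generator.
def liota(n):
    i = 0
    while i < n:
        yield i
        i += 1

lazy = list(map(f, liota(N)))

print(eager)  # [0, -1, -2, -3, -4]
print(lazy)   # [0, -1, -2, -3, -4]
```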

If `f` is a somewhat complicated expression, I like to use eager comprehensions, which avoid the intermediate list entirely. The only downside is the dependency on the `srfi-42` module (there's nothing wrong with it, but sometimes pulling in all of srfi-42 just for this small expression seems overkill).

```
(list-ec (: i n) (f i))
```

What else? The comprehensive `srfi-1` provides the universal list builders `unfold` and `unfold-right`.

```
(unfold (cut = n <>) f (cut + <> 1) 0)
(unfold-right negative? f (cut - <> 1) (- n 1))
```
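For reference, since the argument order is easy to forget: `(unfold p f g seed)` produces `(f seed)`, `(f (g seed))`, … until `(p seed)` holds, while `unfold-right` conses each new element onto the front, so the first element produced ends up last. A rough Python sketch of the contract (the helper names are mine, not a real library; `insert(0, …)` is O(n), so this is purely illustrative):

```python
def unfold(p, f, g, seed):
    """Build [f(seed), f(g(seed)), ...] until p(seed) is true."""
    result = []
    while not p(seed):
        result.append(f(seed))
        seed = g(seed)
    return result

def unfold_right(p, f, g, seed):
    """Like unfold, but each element is consed onto the front, so
    no final reverse is needed to get the result in order."""
    result = []
    while not p(seed):
        result.insert(0, f(seed))
        seed = g(seed)
    return result

n = 5
# Both build the same list for f = negate:
print(unfold(lambda i: i == n, lambda i: -i, lambda i: i + 1, 0))
print(unfold_right(lambda i: i < 0, lambda i: -i, lambda i: i - 1, n - 1))
# [0, -1, -2, -3, -4] in both cases
```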

And this is the freshman's code.

```
(do ([i 0 (+ i 1)]
     [r '() (cons (f i) r)])
    [(= i n) (reverse r)])
```

Now, time to benchmark! To measure the performance difference of these looping constructs, I just pass `-` as `f`.

```
(define (timeit n)
  (time-these/report '(cpu 3)
    `((eager   . ,(^[] (map - (iota n))))
      (lazy    . ,(^[] (map - (liota n))))
      (ec      . ,(^[] (list-ec (: i n) (- i))))
      (unfold  . ,(^[] (unfold (cut = n <>) -
                               (cut + <> 1)
                               0)))
      (unfoldr . ,(^[] (unfold-right negative? -
                                     (cut - <> 1)
                                     (- n 1))))
      (do      . ,(^[] (do ([i 0 (+ i 1)]
                            [r '() (cons (- i) r)])
                           [(= i n) (reverse r)]))))))
```

And the winner is... (drum)

```
gosh> (timeit 10)
Benchmark: ran eager, lazy, ec, unfold, unfoldr, do, each for at least 3 cpu seconds.
  eager: 3.000 real, 3.030 cpu (3.030 user + 0.000 sys)@275029.70/s n=833340
   lazy: 3.265 real, 3.280 cpu (3.270 user + 0.010 sys)@139736.89/s n=458337
     ec: 3.100 real, 3.080 cpu (3.080 user + 0.000 sys)@243506.49/s n=750000
 unfold: 3.072 real, 3.070 cpu (3.060 user + 0.010 sys)@274837.13/s n=843750
unfoldr: 3.275 real, 3.280 cpu (3.280 user + 0.000 sys)@474576.22/s n=1556610
     do: 3.302 real, 3.270 cpu (3.260 user + 0.010 sys)@663932.42/s n=2171059

             Rate eager  lazy    ec unfold unfoldr    do
eager   275030/s     -- 1.968 1.129  1.001   0.580 0.414
lazy    139737/s  0.508    -- 0.574  0.508   0.294 0.210
ec      243506/s  0.885 1.743    --  0.886   0.513 0.367
unfold  274837/s  0.999 1.967 1.129     --   0.579 0.414
unfoldr 474576/s  1.726 3.396 1.949  1.727      -- 0.715
do      663932/s  2.414 4.751 2.727  2.416   1.399    --
#<undef>
gosh> (timeit 100)
Benchmark: ran eager, lazy, ec, unfold, unfoldr, do, each for at least 3 cpu seconds.
  eager: 3.266 real, 3.260 cpu (3.240 user + 0.020 sys)@36154.91/s n=117865
   lazy: 3.034 real, 3.030 cpu (3.030 user + 0.000 sys)@19042.90/s n=57700
     ec: 3.072 real, 3.070 cpu (3.060 user + 0.010 sys)@40719.87/s n=125010
 unfold: 3.167 real, 3.160 cpu (3.150 user + 0.010 sys)@37299.05/s n=117865
unfoldr: 3.221 real, 3.230 cpu (3.230 user + 0.000 sys)@53212.07/s n=171875
     do: 3.220 real, 3.220 cpu (3.220 user + 0.000 sys)@71172.05/s n=229174

            Rate eager  lazy    ec unfold unfoldr    do
eager    36155/s    -- 1.899 0.888  0.969   0.679 0.508
lazy     19043/s 0.527    -- 0.468  0.511   0.358 0.268
ec       40720/s 1.126 2.138    --  1.092   0.765 0.572
unfold   37299/s 1.032 1.959 0.916     --   0.701 0.524
unfoldr  53212/s 1.472 2.794 1.307  1.427      -- 0.748
do       71172/s 1.969 3.737 1.748  1.908   1.338    --
#<undef>
gosh> (timeit 1000)
Benchmark: ran eager, lazy, ec, unfold, unfoldr, do, each for at least 3 cpu seconds.
  eager: 3.264 real, 3.250 cpu (3.250 user + 0.000 sys)@3846.15/s n=12500
   lazy: 3.024 real, 3.020 cpu (3.020 user + 0.000 sys)@1986.75/s n=6000
     ec: 3.080 real, 3.090 cpu (3.090 user + 0.000 sys)@4414.24/s n=13640
 unfold: 3.127 real, 3.130 cpu (3.130 user + 0.000 sys)@2284.35/s n=7150
unfoldr: 3.277 real, 3.290 cpu (3.290 user + 0.000 sys)@5453.19/s n=17941
     do: 3.221 real, 3.220 cpu (3.220 user + 0.000 sys)@7320.81/s n=23573

           Rate eager  lazy    ec unfold unfoldr    do
eager   3846/s     -- 1.936 0.871  1.684   0.705 0.525
lazy    1987/s  0.517    -- 0.450  0.870   0.364 0.271
ec      4414/s  1.148 2.222    --  1.932   0.809 0.603
unfold  2284/s  0.594 1.150 0.517     --   0.419 0.312
unfoldr 5453/s  1.418 2.745 1.235  2.387      -- 0.745
do      7321/s  1.903 3.685 1.658  3.205   1.342    --
#<undef>
```

The freshman's code using `do` is the clear winner. (I'm using git HEAD).

This is a bit disappointing, for my goal is to make the most concise and clear code run the fastest. I don't like being forced into lower-level abstractions just for performance, for the code eventually gets cluttered with those "hand optimizations". It indicates a shortcoming of the implementation.

Toward this goal, I guess I should optimize the eager comprehension---it is a macro, so it ought to be able to generate code as efficient as the hand-coded `do` loop in such a simple case!

The performance of `unfold-right` is notable. I don't use the `unfold` family much, simply because I always forget the order of arguments and need to look it up in the manual. Maybe I should make myself more familiar with it.

For the readers: don't take this entry as "you should use a `do` loop for performance". It's fast in the current versions of Gauche, but I'll improve Gauche over time so that the concise code can run as fast as the expanded `do` loop.

★ ★ ★

(Edit: 2013/02/05 10:32:47 UTC): Peter Bex pointed me to `list-tabulate`, which is made exactly for this kind of task. Aha! Here's the updated benchmark. Yes, `list-tabulate` is the winner!

```
(define (timeit n)
  (time-these/report '(cpu 3)
    `((eager    . ,(^[] (map - (iota n))))
      (lazy     . ,(^[] (map - (liota n))))
      (ec       . ,(^[] (list-ec (: i n) (- i))))
      (unfold   . ,(^[] (unfold (cut = n <>) -
                                (cut + <> 1)
                                0)))
      (unfoldr  . ,(^[] (unfold-right negative? -
                                      (cut - <> 1)
                                      (- n 1))))
      (do       . ,(^[] (do ([i 0 (+ i 1)]
                             [r '() (cons (- i) r)])
                            [(= i n) (reverse r)])))
      (tabulate . ,(^[] (list-tabulate n -))))))
```
```
gosh> (timeit 1000)
Benchmark: ran eager, lazy, ec, unfold, unfoldr, do, tabulate, each for at least 3 cpu seconds.
   eager: 3.312 real, 3.300 cpu (3.300 user + 0.000 sys)@3846.67/s n=12694
    lazy: 3.056 real, 3.060 cpu (3.060 user + 0.000 sys)@1960.78/s n=6000
      ec: 3.138 real, 3.140 cpu (3.140 user + 0.000 sys)@4777.07/s n=15000
  unfold: 3.045 real, 3.050 cpu (3.050 user + 0.000 sys)@2459.02/s n=7500
 unfoldr: 3.044 real, 3.040 cpu (3.040 user + 0.000 sys)@5550.99/s n=16875
      do: 3.076 real, 3.090 cpu (3.090 user + 0.000 sys)@7281.55/s n=22500
tabulate: 3.030 real, 3.040 cpu (3.040 user + 0.000 sys)@8509.87/s n=25870

            Rate eager  lazy    ec unfold unfoldr    do tabulate
eager    3847/s     -- 1.962 0.805  1.564   0.693 0.528    0.452
lazy     1961/s  0.510    -- 0.410  0.797   0.353 0.269    0.230
ec       4777/s  1.242 2.436    --  1.943   0.861 0.656    0.561
unfold   2459/s  0.639 1.254 0.515     --   0.443 0.338    0.289
unfoldr  5551/s  1.443 2.831 1.162  2.257      -- 0.762    0.652
do       7282/s  1.893 3.714 1.524  2.961   1.312    --    0.856
tabulate 8510/s  2.212 4.340 1.781  3.461   1.533 1.169       --
#<undef>
```
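Semantically, `(list-tabulate n f)` just returns the list `(f 0)`, `(f 1)`, …, `(f (- n 1))`; since the length is known up front, it can build the result in a single pass with no index sequence at all. A minimal Python sketch of the contract (not Gauche's actual implementation):

```python
def list_tabulate(n, f):
    """SRFI-1 list-tabulate: return [f(0), f(1), ..., f(n-1)]."""
    return [f(i) for i in range(n)]

print(list_tabulate(5, lambda i: -i))  # [0, -1, -2, -3, -4]
```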

Tags: Performance, unfold, srfi-42

Past comment(s)

## Peter Bex (2013/02/05 09:09:19):

I'm missing SRFI-1's list-tabulate from the benchmarks. That would naturally be the first thing I'd try for this task.

## shiro (2013/02/05 10:28:42):

Oh yeah! I completely forgot about it. Let me check...