Gauche Devlog

< Two concurrency utilities | Cookbook: Running commands remotely >


Functional and linear-updating interfaces

Recent trend of data structure SRFIs is to provide two flavors of updating procedures:

  • Functional updaters never mutate the input structure, and always return a newly allocated structure.
  • Linear updaters are allowed to reuse the storage of input structure to produce the output, given that the caller guarantees the input structure will never be used.

Functional interface has a good nature--it won't create hidden dependencies thus the code is easy to reason about. It also plays nicely with concurrent execution, for you don't need to worry that your operations step on other threads' toes.

Linear updating interface gives the user to express opportunities of optimization. The implementation can take advantage of it to reduce allocations.

So, it appears to be a nice combination---except that, I think, the way they are currently specified is actually pulling each one's leg and reducing their merits.

Performance-sensitive users often frown on functional data structures, for they seem to copy everything every time. "It won't be that bad," functionally-minded users replies, "for it is often the case that the input structure and the updated structure can share most of their internals; the updated structure just allocates enough to store the updated parts. In the extreme case, the updater can just return the input as is, when it finds out the structure isn't altered at all (e.g. adjoining an item to a set that already has the item). The beauty of functional programming is that nobody cares whether it is shared or not---only the content matters."

It is true if everything is written functionally. However, to use the linear-updating interface, the caller needs to know that the structure to pass in isn't shared. If the functional interface may return a (partially) shared structure, it's hard to guarantee the "no-share" condition. Thus, SRFI states the functional interface always copies the input, even if there's no change at all. It can't take advantage of partial sharing as well, if the linear-updating version mutates internal structure.

This takes out the opportunity of optimization in the functional interface. The implementation needs to choose either (1) makes a slow functional version, in order to provide an efficient linear-updating version, or (2) makes a linear-updating version not mutate the input at all, and put functional optimizations.

I think we can do better. One idea is this:

  • The data structure has a mutability flag internally.
  • The functional interface always returns immutable data. It may return the input as is, or return a structure that partially shares the input, if the input structure is flagged immutable. If the input is flagged mutable, it always returns a fleshly copied immutable structure.
  • The linear-updating interface may mutate the input structure if it is flagged mutable, and copies if the input structure is flagged immutable.

If the SRFI does not provide an explicitly-mutating interface, it is actually almost indistinguishable from the existing SRFI spec, except when you compare input and output structures with eq?.

Given that explicitly-mutating interfaces (such as vector-set!) aren't popular in the SRFIs, I think it's good to allow the implementation to take the latter choice.

Discussion on srfi-discuss

Tags: srfi, Immutability, DataStructures

Post a comment