Gauche Devlog


NUL in a string

Recently there was a discussion on the R7RS list about whether we should support the NUL character in a string (AFAIK, the resolution is that an implementation is allowed not to support NUL chars in a string, but it's ok to support them.) One of the primary concerns is interoperability with the NUL-terminated string representation used by foreign libraries; it can cause security problems such as

Gauche had the same problem and I recently fixed it (commit abca7b2). I addressed it on the C calling interface, where I had two choices---either I'd throw an error when Scm_GetStringConst was applied to a Scheme string containing NUL, or keep the existing function as it was and provide an additional function that checks it.

The former would make the check exhaustive, but there's a possibility that it would break existing code that intentionally passes a character array containing NUL in the middle of it. As an old-type C programmer I had written such code---sometimes as an ad-hoc way of passing a struct to ioctl, but I do remember there was a weird API that took an array of strings as "each string is separated by a NUL byte, and the end of the array is marked by two consecutive NUL bytes".
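To make that last layout concrete, here is a minimal sketch in plain C of how such a buffer is walked. The function name count_multi_string is made up for illustration; it is not taken from any particular API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, for illustration only.
   Counts the entries in a buffer laid out as "foo\0bar\0\0":
   each entry ends with a NUL byte, and an empty entry
   (two consecutive NUL bytes) marks the end of the list. */
size_t count_multi_string(const char *list)
{
    size_t n = 0;
    assert(list != NULL);
    while (*list != '\0') {          /* an empty entry ends the list */
        n++;
        list += strlen(list) + 1;    /* skip the entry and its NUL */
    }
    return n;
}
```

A buffer like this can't be treated as an ordinary C string: strlen sees only the first entry, so the caller has to carry the whole byte array across the language boundary.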

So I chose the latter---I added a new 'safe' version of converting strings, and changed a bunch of system call functions to use the safe version.
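The check behind such a 'safe' conversion is small. The following is only a sketch in plain C, not Gauche's actual code; the name checked_c_string and its signature are assumptions made for illustration. It takes a length-counted byte buffer and refuses to produce a C string if a NUL byte appears inside it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of Gauche's API.
   Copies a length-counted byte buffer into a freshly allocated,
   NUL-terminated C string. Returns NULL if the buffer contains a
   NUL byte (or if allocation fails), so the caller never hands a
   silently truncated string to C code. The caller frees the result. */
char *checked_c_string(const char *data, size_t len)
{
    char *s;
    assert(data != NULL);
    if (memchr(data, '\0', len) != NULL)
        return NULL;                 /* embedded NUL: reject */
    s = malloc(len + 1);
    if (s == NULL)
        return NULL;
    memcpy(s, data, len);
    s[len] = '\0';
    return s;
}
```

A real implementation would raise a Scheme error instead of returning NULL, but the idea is the same: the length is known on the Scheme side, so the scan can happen before the C library ever sees the pointer.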

Today I stumbled upon Peter Bex's article "Lessons learned from NUL byte bugs" and it made me change my mind. The case where I need to pass a char array with NUL in the middle is much, much rarer than passing C strings, and even when such a case arises, we can provide a special API for it while making the default API safe.

The fix can break backward compatibility but I expect it's very unlikely. If you know your code passes a character array with NUL as a string, let me know.

Tag: string

Past comment(s)

Peter (2012/12/12 22:30:35):

Hey Shiro,

Great to see that my post has some positive influence!

I think safety trumps convenience or performance in these cases, and you've made the right call to revert the change (if I understand it correctly).

If the user knows what he or she is doing, it's possible to get around the check in Chicken, too. We have a "scheme-object" FFI type which just passes the pointer as-is. If you have a string, blob or u8vector containing raw bytes you can use that, and it won't do any copying or checking on the data. Generally you *must* know the length in these situations anyway.

Cheers, Peter

Peter (2012/12/12 22:31:59):

Sorry, that's "scheme-pointer". "scheme-object" just passes the object itself, while "scheme-pointer" passes the data pointer inside the object (which is the raw pointer to the start of the string, if it's a string). I always get those two mixed up for some silly reason :)

shiro (2012/12/13 03:22:32):

Thanks for the article, Peter.

In our case, it wasn't about performance or convenience but more about the backward compatibility. But yes, your article made me realize that trading security for potential rare incompatibility wouldn't cut it.

In case it wasn't clear, what I decided is (1) to put a NUL check in the API that converts a Scheme string to a C string, so that it raises an error if the string contains a NUL character, and (2) to provide a new API that specifically extracts a byte array and its length from a Scheme string, in case some weird C API expects NUL in it. If change #1 breaks compatibility, the user can change the code to use the new API.
