Edit: The problem has been resolved! See this announcement for details.
I recently started writing a Go client for
librosie and ran into an obstacle:
Go allocates very small stacks for goroutines, too small for a goroutine to
instantiate and use a Rosie matching engine.
A workaround suitable for development but not actual use is to pin the main
routine of the Go “test program” to the initial OS thread. (The “test program”
librosie from Go.) This works because the initial thread has access
to an essentially unlimited stack.
The rest of this post ignores this workaround and considers the general case in which goroutines may be scheduled on any thread.
Go stacks and C stacks
My understanding is that when Go code runs in a goroutine, the stack grows as needed. But when that same Go code calls a C library routine, the C code, having been separately compiled, will of course follow C convention and assume that it has a “typical” (and large) amount of contiguous stack space available.
This means that the C code can blindly overrun the small amount of stack space allotted by the Go runtime, thus corrupting memory.
librosie is written in C.
cgo, a tool distributed with Go, to (attempt to) write my Go client of
librosie. The Go compiler knows when
cgo is used and which calls are to the
integrated C library. It seems possible for this knowledge to end up in my
compiled Go program, causing it to allocate large stacks for goroutines that
could call my C library.
Now, I could certainly reduce the stack space needed by the “largest”
routines, with respect to stack usage. The worst offender is
being implemented naively, employs two local (i.e. stack-allocated) arrays of
32-bit integers, where each array has 64k entries. Two arrays of 65,536 4-byte
entries consumes a cool 512Kb.
Instrumenting Rosie reveals that typical usage requires less than 40 entries in those arrays, which are functioning as stacks in an iterative implementation of code that was once recursive. A good approach in this situation, of course, is to allocate 40 or so entries statically, and expand when needed.
This is straightforward to do and is on the Rosie work item list.
By experimentation using pthreads (see the sample program mt.c, a stack allocation of 1MB should be sufficient (on OS X) with the current implementation. The improvement described above would eliminate nearly 512Kb from this figure.
But this discussion of reducing the stack space needed to call
librosie is not
why I’m writing this post.
How is this kind of issue solved by Go developers?
What I would like to know is how to solve this problem generally. I am not a Go developer, though I do like many things about the language and its tools. When Go developers integrate a foreign (non-Go) library, what do they do?
How do they control the amount of stack space allocated by the Go runtime when a goroutine calls foreign code? If this is not possible, then how are such integrations made to work?
And, assuming some control over stack space is possible, what’s the best practice for establishing how much space is needed?
In my case, I wrote
librosie and I have access to the source code for the
(non-system) libraries that Rosie uses. I can model the stack needs of this
code, up to libraries like
libc, etc. (And those are available for
Linux and possibly for OS X, if I wanted to spend a lot more time reading code.)
Or I can experimentally determine the minimum stack size needed to execute some
Rosie workload. Having done this, I find the results unsatisfying in their
fragility. Even a partial model of
librosie execution should give better
results, because it would guide the construction of adversarial workloads with
which to test.
So, can anyone shed some light on this for me? In the land of Go, how is this issue solved?
Discussion on reddit
Please comment on the Rosie subreddit to show me the way. Thanks in advance!