Edit: The problem has been resolved! See this announcement for details.


I recently started writing a Go client for librosie and ran into an obstacle: Go allocates very small stacks for goroutines, too small for a goroutine to instantiate and use a Rosie matching engine.

A workaround suitable for development but not actual use is to pin the main routine of the Go “test program” to the initial OS thread. (The “test program” exercises librosie from Go.) This works because the initial thread has access to an essentially unlimited stack.

The rest of this post ignores this workaround and considers the general case in which goroutines may be scheduled on any thread.

Go stacks and C stacks

My understanding is that when Go code runs in a goroutine, the stack grows as needed. But when that same Go code calls a C library routine, the C code, having been separately compiled, will of course follow C convention and assume that it has a “typical” (and large) amount of contiguous stack space available.

This means that the C code can blindly overrun the small amount of stack space allotted by the Go runtime, thus corrupting memory.

And librosie is written in C.

I used cgo, a tool distributed with Go, to (attempt to) write my Go client of librosie. The Go compiler knows when cgo is used and which calls are to the integrated C library. It seems possible for this knowledge to end up in my compiled Go program, causing it to allocate large stacks for goroutines that could call my C library.

Now, I could certainly reduce the stack space needed by the “largest” librosie routines, with respect to stack usage. The worst offender is match(), which, being implemented naively, employs two local (i.e. stack-allocated) arrays of 32-bit integers, where each array has 64k entries. Two arrays of 65,536 4-byte entries consumes a cool 512Kb.

Instrumenting Rosie reveals that typical usage requires less than 40 entries in those arrays, which are functioning as stacks in an iterative implementation of code that was once recursive. A good approach in this situation, of course, is to allocate 40 or so entries statically, and expand when needed.

This is straightforward to do and is on the Rosie work item list.

By experimentation using pthreads (see the sample program mt.c, a stack allocation of 1MB should be sufficient (on OS X) with the current implementation. The improvement described above would eliminate nearly 512Kb from this figure.

But this discussion of reducing the stack space needed to call librosie is not why I’m writing this post.

How is this kind of issue solved by Go developers?

What I would like to know is how to solve this problem generally. I am not a Go developer, though I do like many things about the language and its tools. When Go developers integrate a foreign (non-Go) library, what do they do?

How do they control the amount of stack space allocated by the Go runtime when a goroutine calls foreign code? If this is not possible, then how are such integrations made to work?

And, assuming some control over stack space is possible, what’s the best practice for establishing how much space is needed?

In my case, I wrote librosie and I have access to the source code for the (non-system) libraries that Rosie uses. I can model the stack needs of this code, up to libraries like pthread, libc, etc. (And those are available for Linux and possibly for OS X, if I wanted to spend a lot more time reading code.)

Or I can experimentally determine the minimum stack size needed to execute some Rosie workload. Having done this, I find the results unsatisfying in their fragility. Even a partial model of librosie execution should give better results, because it would guide the construction of adversarial workloads with which to test.

So, can anyone shed some light on this for me? In the land of Go, how is this issue solved?

Discussion on reddit

Please comment on the Rosie subreddit to show me the way. Thanks in advance!


Follow us on Twitter for announcements about the RPL approach to #modernpatternmatching.