To please a new generation of readers, Bentley changed the question to deal with a "multi-megabyte" machine, with "only about a megabyte free", in the second edition of Programming Pearls, published in early 2000. I suppose Bentley's reasoning was, "anything smaller than that would sound unreasonable to a modern reader". How disappointing.
It was therefore very refreshing to read Eliminating the Call Stack to Save RAM, by Xuejun Yang, Nathan Cooprider, and John Regehr (LCTES '09). Over twenty-five years after Bentley's column first appeared, there still are people trying to save RAM on embedded systems, machines with "between hundreds of bytes and tens of kilybytes"! And, hey, what a neat memory-saving trick they present!
The idea presented in the paper is flattening: replacing function invocation and returns with simple jumps. This is different from inlining, since functions called from multiple sites are not duplicated; and in fact, flattening is performed only after single-callsite functions are inlined. With flattening, a function such as:
int foo (int formal1, int formal2) {
... body ...
return ret;
}Is transformed into:
_foo_body:
... body ...
_foo_ret_value = ret;
switch (_foo_ret_site) {
case 0: goto _foo_ret_site_0;
case 1: goto _foo_ret_site_1;
case 2: goto _foo_ret_site_2;
...
default: panic();
}
And a simple call like val = foo(actual1, actual2) is transformed into:
_foo_formal1 = actual1;
_foo_formal2 = actual2;
_foo_ret_site = 2;
goto _foo_body;
_foo_ret_site_2:
val = _foo_ret_value;
Clearly, the technique cannot be easily applied to programs with any sort of recursion. The presence of other "flattening hazards", such as calls through function pointers, also complicate things. But in the world of embedded software for micro-controller units (MCUs), these are rarities, exceptions rather than the norm. (Also clearly, the authors are being conservative, since the default clause in the returning switch statement could be replaced with the last valid value.)
Flattening is followed by lifting, where global variables are transformed into local variables (of main, the last remaining function). As the authors note, "In effect, flattening and lifting trick a legacy C compiler into performing whole-program optimizations. This works because modern compilers, running on modern desktop machines, are perfectly capable of whole-program compilation for MCU-sized codes."
Flattening and lifting increase program size (by 14% on average, ranging from about 2% to slightly less than 40% on a set of 15 test applications). How can this save memory? Well, as the title implies, the process, when applied aggressively, eliminates all usage of a call stack. And this, in turn, reduces RAM usage by 20% on average (ranging from 2% to over 40% for individual cases). CPU usage is decreased by 3% on average (while CPU usage for some programs is reduced by over 15%, it is actually increased for some other programs, by as much as 22%. I wonder how much would this be reduced by optimizing the order of the options in the "return switch"?).
In their lucidly written paper, the authors describe some of the interesting challenges they encountered in implementing this technique (for example, false execution paths that prevent some compiler optimizations) and the solutions found (in this case, passing callgraph data into the optimizer, even though the callgraph no longer exists as such by the time the optimizer runs, due to flattening). They also present some unexpected benefits, for example in terms of program security (no buffer overflow vulnerabilities!).
I find it fun to play in the domains where these types of things still matter. This seems like a nice summary of our work for that paper. You might get a kick out of our RAM compression paper too. It comes a couple years earlier, but follows the same vein of doing crazy things to save RAM.
ReplyDeleteHi Doc,
ReplyDeleteThis comment is not about this post.
I am pretty impressed with your writing and would like to follow you on twitter.
My Twitter is @kanchirk
Can you please let me know your twitter profile id ?
Also a suggestion , please add the twitter gadget to your webpage so that future fans would find it convenient to follow you.
Thank you
Raghuraman