Assertions are a Good Thing. Sadly, most programming languages support a very limited set of assertions; the most common case is just specifying a boolean expression, based on current variable values, that must evaluate to true in order for the assertion to succeed. Some languages, such as Eiffel, offer a slightly more advanced language for expressing some assertions -- for example, the old keyword can be used in post-conditions to refer to the original value of variables that might have changed during evaluation.
In general, any assertion mechanism that allows programmers to use higher-level concepts, or measure properties that are not readily accessible otherwise, is good progress. Assertions for heap properties are not a new thing, but in GC Assertions: Using the Garbage Collector to Check Heap Properties (PLDI '09), E. A. Aftandilian and S. Z. Guyer suggest both a rich set of heap-related assertions, plus a novel mechanism for performing the tests with minimal runtime overhead (<3% when a significant set of assertion is used).
The authors claim their system to be more precise than static analysis, since it tests the actual, concrete heap, and work well in the presence of (commonly used) mechanisms like reflection and dynamic class loading; more efficient than runtime invariant checks; and more accurate than heuristics.
The set of assertions offered in this system include assert-dead(ref) (zero incoming references), assert-alldead() (for all objects allocated in a marked region), assert-instances(type), num), assert-unshared(ref) (one incoming pointer), and more. Assertion violations provide a dump that shows the path from a memory root (e.g., thread stack or static field) to the violating object.
The system's strength lies in the fact that assertions are tested not when encountered at runtime, but rather on the following GC cycle; but this imposes some limitations on the assertions that can be tested for. What I found missing is (a) a slightly richer set of assertions, with variants such as assert-min-references or assert-max-references; and (b) conditional asserts, as in, "test this assertion only if that condition holds". The conditions can be limited to very trivial ones, e.g., "if ref != null, assert-min-references(ref, 2)" can be used to verify that a given reference to an object is never the only one. (Yes, this specific example can be achieved by means of weak references, but that complicates the program's code, and unlike assertions, it cannot be turned off; I don't want the system to auto-release my reference if it's the last one, I want to assert that it never is.)
All in all, the paper presents a great new take on memory-related assertions. If Design-by-Contract is ever added to the Java language (as it should be), this paper provides a first glimpse into the possibilities of contracts that go beyond the trivial and easily-tested ones, by wisely taking advantage of the managed runtime environment.
Wednesday, December 23, 2009
Sunday, December 6, 2009
Eliminating the Call Stack to Save RAM
"How do I sort on disk?". This is the question that opens the first column in Jon Bentley's delectable Programming Pearls. The column was originally printed in 1983, and in book format in 1986. The asker thought that sorting on disk is required because the system he was programming for only had "about a thousand 16-bit words" (in other words, about 2KB) in free memory.
To please a new generation of readers, Bentley changed the question to deal with a "multi-megabyte" machine, with "only about a megabyte free", in the second edition of Programming Pearls, published in early 2000. I suppose Bentley's reasoning was, "anything smaller than that would sound unreasonable to a modern reader". How disappointing.
It was therefore very refreshing to read Eliminating the Call Stack to Save RAM, by Xuejun Yang, Nathan Cooprider, and John Regehr (LCTES '09). Over twenty-five years after Bentley's column first appeared, there still are people trying to save RAM on embedded systems, machines with "between hundreds of bytes and tens of kilybytes"! And, hey, what a neat memory-saving trick they present!
The idea presented in the paper is flattening: replacing function invocation and returns with simple jumps. This is different from inlining, since functions called from multiple sites are not duplicated; and in fact, flattening is performed only after single-callsite functions are inlined. With flattening, a function such as:
Is transformed into:
And a simple call like val = foo(actual1, actual2) is transformed into:
Clearly, the technique cannot be easily applied to programs with any sort of recursion. The presence of other "flattening hazards", such as calls through function pointers, also complicate things. But in the world of embedded software for micro-controller units (MCUs), these are rarities, exceptions rather than the norm. (Also clearly, the authors are being conservative, since the default clause in the returning switch statement could be replaced with the last valid value.)
Flattening is followed by lifting, where global variables are transformed into local variables (of main, the last remaining function). As the authors note, "In effect, flattening and lifting trick a legacy C compiler into performing whole-program optimizations. This works because modern compilers, running on modern desktop machines, are perfectly capable of whole-program compilation for MCU-sized codes."
Flattening and lifting increase program size (by 14% on average, ranging from about 2% to slightly less than 40% on a set of 15 test applications). How can this save memory? Well, as the title implies, the process, when applied aggressively, eliminates all usage of a call stack. And this, in turn, reduces RAM usage by 20% on average (ranging from 2% to over 40% for individual cases). CPU usage is decreased by 3% on average (while CPU usage for some programs is reduced by over 15%, it is actually increased for some other programs, by as much as 22%. I wonder how much would this be reduced by optimizing the order of the options in the "return switch"?).
In their lucidly written paper, the authors describe some of the interesting challenges they encountered in implementing this technique (for example, false execution paths that prevent some compiler optimizations) and the solutions found (in this case, passing callgraph data into the optimizer, even though the callgraph no longer exists as such by the time the optimizer runs, due to flattening). They also present some unexpected benefits, for example in terms of program security (no buffer overflow vulnerabilities!).
To please a new generation of readers, Bentley changed the question to deal with a "multi-megabyte" machine, with "only about a megabyte free", in the second edition of Programming Pearls, published in early 2000. I suppose Bentley's reasoning was, "anything smaller than that would sound unreasonable to a modern reader". How disappointing.
It was therefore very refreshing to read Eliminating the Call Stack to Save RAM, by Xuejun Yang, Nathan Cooprider, and John Regehr (LCTES '09). Over twenty-five years after Bentley's column first appeared, there still are people trying to save RAM on embedded systems, machines with "between hundreds of bytes and tens of kilybytes"! And, hey, what a neat memory-saving trick they present!
The idea presented in the paper is flattening: replacing function invocation and returns with simple jumps. This is different from inlining, since functions called from multiple sites are not duplicated; and in fact, flattening is performed only after single-callsite functions are inlined. With flattening, a function such as:
int foo (int formal1, int formal2) {
... body ...
return ret;
}Is transformed into:
_foo_body:
... body ...
_foo_ret_value = ret;
switch (_foo_ret_site) {
case 0: goto _foo_ret_site_0;
case 1: goto _foo_ret_site_1;
case 2: goto _foo_ret_site_2;
...
default: panic();
}
And a simple call like val = foo(actual1, actual2) is transformed into:
_foo_formal1 = actual1;
_foo_formal2 = actual2;
_foo_ret_site = 2;
goto _foo_body;
_foo_ret_site_2:
val = _foo_ret_value;
Clearly, the technique cannot be easily applied to programs with any sort of recursion. The presence of other "flattening hazards", such as calls through function pointers, also complicate things. But in the world of embedded software for micro-controller units (MCUs), these are rarities, exceptions rather than the norm. (Also clearly, the authors are being conservative, since the default clause in the returning switch statement could be replaced with the last valid value.)
Flattening is followed by lifting, where global variables are transformed into local variables (of main, the last remaining function). As the authors note, "In effect, flattening and lifting trick a legacy C compiler into performing whole-program optimizations. This works because modern compilers, running on modern desktop machines, are perfectly capable of whole-program compilation for MCU-sized codes."
Flattening and lifting increase program size (by 14% on average, ranging from about 2% to slightly less than 40% on a set of 15 test applications). How can this save memory? Well, as the title implies, the process, when applied aggressively, eliminates all usage of a call stack. And this, in turn, reduces RAM usage by 20% on average (ranging from 2% to over 40% for individual cases). CPU usage is decreased by 3% on average (while CPU usage for some programs is reduced by over 15%, it is actually increased for some other programs, by as much as 22%. I wonder how much would this be reduced by optimizing the order of the options in the "return switch"?).
In their lucidly written paper, the authors describe some of the interesting challenges they encountered in implementing this technique (for example, false execution paths that prevent some compiler optimizations) and the solutions found (in this case, passing callgraph data into the optimizer, even though the callgraph no longer exists as such by the time the optimizer runs, due to flattening). They also present some unexpected benefits, for example in terms of program security (no buffer overflow vulnerabilities!).
Subscribe to:
Posts (Atom)