Friday, November 11, 2011

Let the Children Code

I've recently read The Design of Kodu: A Tiny Visual Programming Language for Children on the XBox 360 (Matthew MacLaurin, POPL'11). Like me, the designers of Kodu first learned programming as children, using ROM BASIC in the home computers of the 80s (it was an Atari 800XL and then an Apple II in my case). I think this was a wonderful way to learn programming (Dijkstra's quip notwithstanding), but can children today be introduced to programming in a remotely similar manner?

Apparently not. As the paper states, "We realized very early that the kids of today have a very different context than we ourselves had in the 80s, particularly in regard to user interface and the games themselves, which often have art budgets comparable to those of the largest Hollywood films. We didn't want 21st century language for building 1980s games; instead, we felt our new language should be able to achieve the same quality of work product that kids experience in commercial console games today."

To address this goal, the team (in Microsoft Research) created Kodu as a GUI environment were kids use sensors, filters, and selectors to decide when to change the world using actuators and modifiers. A program could be something like:

1. See (sensor) - purple (filter) - move (actuator) - away (modifier)
2. See (sensor) - apple (filter) - move (actuator) - towards (modifier)

with the expected results regarding the behavior of the avatar in the game. "In the case of a purple apple, the rule to move away from purple objects will take priority" (because it appears first). More complex designs are made possible by means of pages, where an actuator can switch a "robot" (as the avatars are called) from one "page" (program) to another, in effect changing the robot's control procedure and allowing for some level of structured programming.

There are no variables in Kodu: state is stored and read from the world. Apparently, the intended way to keep state was using, for example, objects' colors; but children quickly realized that they could use the score of each robot as an integer variable.

More than anything else, this clarifies to me that programming should be taught using programming languages. I agree that console output, not to mention command-line games, is too limited for today's children. But, aiming at youth the same age I was when I first started programming (10-12 or so), I think that using a real programming language is far superior to using a toy one. My recommendation would be starting with HTML: it's not a programming language per se, but it would teach children the importance of accuracy, nesting, the implication of a simple typo on your output, the difference between a visible mistake and an invisible one, and so forth. Then, add JavaScript. There's no need for expert-level HTML and JS, mind you. The HTML would just set the ground for the JavaScript output; pick up images, or areas for text, etc. with getElementById, and then manipulate that element -- its location, size, and textual content. And input is very simple, too, using either text fields, buttons, or clickable elements with onClick events.

Just make sure you don't try to teach the children object-oriented programming using JavaScript. The language (at least in its current version) is simply not object-oriented, and seeing Java/C#/C++ programmers trying to force object-oriented design into an object-based, but not object-oriented, langauge is painful, and reminds me of the old adage about FORTRAN programmers who remain FORTRAN programmers, no matter what programming language they use.

Sunday, May 22, 2011

A Co-Relational Model of Data for Large Shared Data Banks

A Co-Relational Model of Data for Large Shared Data Banks is a paper by Erik Meijer (a used programming language salesman) and Gavin Bierman, in Communications of the ACM, vol. 54, no. 4 (April 2011; originally published in Queue vol. 9, no. 3). The paper is rather pretentious, but fascinating nonetheless.

The authors correctly compare the situation in the current noSQL market with that of the early database days, before Codd's landmark paper from 1970 heralded relational databases. Whereas customers can choose between relational (SQL) DB providers based on product merits, but using a standard interface (which makes switching relatively easy, and the market competitive), no such choice exists in the world of noSQL. While there are multiple noSQL products (Cassandra, Dynamo, Hibari, and many others), there is no unifying standard, and once a project picks a noSQL database, it becomes tightly bound to it.

Meijer and Bierman claim that noSQL should actually be named coSQL, since, in terms of category theory, the two models are complementary. Here's some quick background: category theory explores math by viewing the world as entities and arrows between them (domains and functions). Given a category T of objects and arrows, co(T) is the same set of objects, with the direction of arrows reversed. Well, in SQL, the arrows are between properties of objects and the objects themselves (e.g., if a book has multiple authors, then the table AUTHORS will have a foreign key into table BOOK). In co(SQL), the arrows would be from the book to the authors: and indeed, this is the case in noSQL, where (using in-memory representation) a "Book" object has pointers or references (arrows) to one or more "Author" objects. Ergo, noSQL is coSQL.

The authors then go into further mathematical details, and in particular discuss how, mathematically speaking, a MapReduce operation over a noSQL data store is effectively a Monad in category theory. Ultimately, their claim is that obtaining a standardized, mathematical view of noSQL (just as Todd did with pre-SQL databases, using relational algebra) will result in a universal standard for noSQL products, and a truly competitive and thriving market. They do not hesitate to share their belief that this mathematical formalization of noSQL will lead to similar economic growth. In other words, what we have here is a paper that claims in advance to be a landmark paper.

And it might just be right.

Friday, April 22, 2011

C++ template syntax patterns

A simple and straightforward guide to reading C++ template code: C++ template syntax patterns, by Eli Bendersky. Useful if you found yourself trying to interpret odd generic C++ code, and failed. Note, though, that it's not very deep -- more of an overview.

C++ increasingly bothers me. I'll write more about this soon; meanwhile, you can see an earlier post by Bendersky: Modern C++ Scares Me. (In general, Bendersky's blog is worth following.)

Tuesday, March 15, 2011

On OOPSLA 2010's acceptance policy

The opening pages of a Proceedings volume, the editor's forward, is something one normally skips without even noticing. Yet the OOPSLA 2010 proceeding start with a rather unusual "Program Chair Statement" that is well worth reading. And debating.

The chair, Martin Rinard, joins the heated discussion that is raging on the pages of Communications of the ACM, under the editorial guidance of Moshe Y. Vardi, for quite some time now: how to improve the state of publications in computer science. See, for example, Vardi's editorial from the March 2010 issue: Revisiting the Publication Culture in Computing Research.

Rinard, however, decided to take action on the matter. And he introduced several key changes to the acceptance procedure for OOPSLA. These include:


  • Unilateral paper acceptance, where each program committee member can pick a single paper that will be accepted, regardless of the opinions of the rest of the committee.

  • Standard scientific review criteria, focusing on whether the paper introduces new results or techniques, are its results significant and interesting, and whether it contains errors.

  • Reproducibility, measuring whether the paper contained sufficient information for other researchers to reproduce the result.

There were additional, more technical changes (e.g., with regard to submissions by program committee members, page limits, etc.).

Overall, I think these changes are most welcome. I particularly applaud the reproducibility concern, which (finally!) states the obvious: our field has a problem with papers by researches in the industry that work with proprietary systems, and produce papers that, while often fascinating, do not provide sufficient detail to truly advance the science.

It is the unilateral acceptance policy that bothers me; more specifically, the way it was executed. The cause, I should state, is a worthy one: as Rinard claims, "Papers that open up new directions or go against the established value system in the field are routinely rejected, either because they are controversial or because the program committee finds it difficult to evaluate the paper with the same degree of certainty as more mainstream papers"; and, in Rinard's view, "this situation has become severe enough to seriously hamper the development of the field" (!). In other words, the mundane wins over the challenging, and researchers prefer churning out mundane papers over doing ground-breaking research.

And is unilateral acceptance a good solution? I actually think it is. Except for this: Rinard chose to publish exactly which papers were unilaterally accepted, thus singling them out as the "odd ones"; worse still, the table of contents explicitly states which program committee member chose what paper.

The problem, obviously, is that this could create a sense of personal debt in the industry. Let's say committee member X picked a paper by author Y. Next year, Y would be in a committee of some other conference, to which X will submit a paper. Can we truly say that Y will never give X preferential treatment? Can we truly say that X will not hold a grudge against Y if he learns that, for this new conference, Y picked some other paper as part of his "unilateral acceptance prerogative" rather than picking X's?

Even if X and Y are both perfectly honest people, things can still seem wrong to others. If Y does pick X's paper in the following conference, will everyone believe it was a purely impersonal choice? When Y argues for X's paper in the next conference's PC meetings, how will the other PC members hold his views?

With the best of intentions, Rinard and the OOPSLA 2010 committee created a dangerous precedence. Yes, PC members should be able to pick one paper unilaterally. But they should justify their choice clearly to other PC members: why do they believe this paper they've picked is ground-breaking and novel. And more importantly, once the choice is made, the details should never be published in public; much like the scores given by PC members to different papers, during the review process, are never published except anonymously.

[Full disclosure: I have not submitted a paper to OOPSLA 2010. For that matter, since moving to the industry, I haven't submitted a paper to a conference in the past three years or so. So this isn't about me. I did take part in reviewing papers for several academic conferences during this time, though.]

Saturday, November 20, 2010

Arraylets, Schism, and Z-Rays

Garbage collection is a Good Thing™ -- few software engineers will dispute that. But garbage collection is a real pain for real-time software engineering. The problem is that GC cycles sometimes lead to pauses of unpredictable length in the program's execution.

One possible approach to that is putting a cap on object sizes: for any given size limit, we can calculate a maximal incurred GC pause time, thereby yielding predictability. This works real nicely... until you reach the problem of arrays. Arrays, by their very nature, can be of arbitrary sizes, and will mock any limit you put on object sizes.

An interesting solution to the unbound size of arrays is representing them, internally, as multiple, independent memory blocks, called arraylets. These were introduced in Bacon, Cheng, and Rajan's paper from POPL '03: A Real-Time Garbage Collector with Low Overhead and Consistent Utilization. The idea is rather straightforward: an array now consists of a "spine", which is an array of pointers to smaller array parts. The array lookup operation a[i], which was previously a simple calculation and a single dereferencing (i.e., deref(a + i * sizeof(T)), where a is of type T[]), is now a more complex operation, involving two dereferences: deref(deref(base + i / elements_per_arraylet) + (i % elements_per_arraylet) * sizeof(T)). The first dereferencing finds the relevant arraylet's location, the second finds the desired element at index i inside that arraylet.

While the new lookup code is still O(1), it does introduce a significant loss of efficiency. Still, it also gives us back control over the GC's performance. As was shown by Bacon et al., the result can be used in real-time systems.

In PLDI '10, two papers continue this research trend: Pizlo et al.'s Schism: Fragmentation-Tolerant Real-Time Garbage Collector focus on improving the GC algorithm where arraylets are used. The authors implement a new GC, "Schism/CMR", in the Fiji VM, that reaches 65% of Sun's JDK throughput, while being fragmentation-tolerant. (Lack of fragmentation tolerance implies a risk that long-running programs will eventually fail. From personal experience I can attend that this risk is very real, and that finding and debugging a memory fragmentation issue in a non-GC environment, namely C++, can be significantly harder than finding a memory leak. A previous effort, CMR, reached 84% of the JDK's throughput but was not fragmentation-tolerant.)

I found the second paper to be closer to my personal field of interest. Sartor et al.'s Z-Rays: Divide Arrays and Conquer Speed and Flexibility discusses a set of incremental, yet significant, enhancements to the basic arraylet concept. These include:
  • Separate memory region. Since arraylets are all of identical size (i.e., if the type is larger, there are fewer objects per arraylet), this greatly simplifies the memory management of arraylets.
  • First-N optimization. The authors show that, empirically, nearly 90% of array access operations use the first 4KB of the array. By storing this prefix in the array object itself, alongside the spine (rather than in an arraylet), they were able to reduce the performance overhead of using arraylets by half.
  • Lazy allocation and zero compression. Arrays in Java are initialized to all-zero (or all-null). Z-Rays are initialized by having all new arrays point to a global, immutable zero arraylet; actual arraylets are allocated lazily, on the first write operation. And whenever an arraylet is detected (during GC) to be all-zero, it is released, and the spine pointer is set to point to the shared zero arraylet.
  • Fast array copy and copy-on-write, taking advantage of the Z-Ray structure to optimize the library call arraycopy. Since many higher-level operations depend on arraycopy (e.g., insertion to and deletion from an ArrayList), this can be highly beneficial.
All in call, the authors reach a 14.5% overhead compared to using standard arrays (this cannot really be compared to the results of Pizlo et al., since different benchmark suites are used). More interestingly, the authors hint that a final optimization that they did not carry out could yield even better results -- possibly to the extent of being faster than a naive, continuousness array implementation! The idea is the improve the JIT compiler so that it becomes aware of arraylets, and reduces the number of arraylet lookups when scanning an array serially (a very common operation). When going over indexes 0 to size-1, there's no need to find the arraylet with every iteration, but rather only every elements_per_arraylet iterations. While array lookup remains slower than with naive arrays, the GC time savings apparently more than compensate for that.

Of course, this is mostly an unfounded speculation; we should now await a follow-up research, by the same authors or by others, that will carry out this optimization and report the results. Still, I find it absolutely fascinating how existing Java programs (real-time or otherwise) can possibly be enhanced by optimizing such a basic facet of the JVM -- something which was generally considered too basic to optimize. The benefits of managed environments keep on piling.

Sunday, November 7, 2010

Evaluating the Accuracy of Java Profilers

A well-known saying by Don Knuth is, "premature optimization is the root of all evil". But I was just convinced that there is, in fact, a greater evil in the world of program optimization: optimizing based on carefully-gathered, informative performance data... which is outright wrong. Unlike premature optimization, the evildoer here is not the codesmith, but rather the toolmaker, i.e., the profiler developer.

In the paper titled Evaluating the Accuracy of Java Profilers (PLDI '10), the four authors, Todd Mytkowicz, Amer Diwan, Matthias Hauswirth, and Peter F. Sweeney show how four different, popular profilers for Java programs completely disagree with each other regarding the performance profile of the programs they analyze. These profilers are xprof, hprof, jprofile, and yourkit; and I am rather certain that many other profilers suffer from the exact same problem.

The authors show that the profilers provide surprisingly inconsistent answers, and then -- in the beautifully-written Section 6 of the paper -- explain exactly why this happens. Consider, for example, the following simple program (Listing 1 from the paper):
static int[] array = new int[1024];

public static void hot (int i) {
int ii = (i + 10 ∗ 100) % array.length;
int jj = (ii + i / 33) % array.length;
if (ii < 0) ii = −ii;
if (jj < 0) jj = −jj;
array [ii] = array[jj] + 1;
}

public static void cold() {
for (int i = 0; i < Integer.MAX_VALUE; i++)
hot(i) ;
}
Clearly, this program spends a small amount of time in method cold proper, and most of its time in hot; the method that profilers should point at is hot. Yet xprof, for example, attributes 99.8% of the runtime to method cold. Other profilers provide similar answers.

How come? As it turns out, these profilers (and presumably others) sample the program not in a uniform and randomized manner, but rather by relying on yield points in the compiled code. In this particular example, method hot has no yield points, so it is practically a blind spot to the profilers -- a stealth method, if you will! (This probably indicates that the problem is largely limited to profilers in managed environment, such as profilers for Java or C# programs.)

Yet if they all suffer from the same problem, why do different profilers still disagree with each other (while all being wrong)? Because of the observer effect, something Heisenberg could have told you about ages ago. Basically, the profiler itself affects the JVM and the runtime environment (memory usage, paging, cache state), and each profiler does so differently, affecting in a different manner the very program it attempts to measure.

In Section 7, the authors describe a proof-of-concept profiler they have developed, and show how it is significantly more accurate than the other profilers examined. Let's hope the techniques presented in this work will quickly find their way to real-world profilers.

Friday, October 29, 2010

Aspects and Laziness

Can aspects add lazy evaluation to a programming language?

Based on the example of Java and AspectJ, the answer is, sadly, no. Many years ago, somebody new to AspectJ asked in an AspectJ mailing list what amounts to: can AspectJ help us do lazy evaluation?

More specifically, the question was about logging. Logging statements in Java look something like this:
logger.log(Level.INFO, "Value is: " + value);
But since string concatenation can be costly, many developers place the condition (which is part of the log method) in an if statement around the logging statement, like this:
if (logger.getLevel().intValue() <= Level.INFO.intValue()) {
logger.log(Level.INFO, "Value is: " + value);
}
Which is (a) more efficient and (b) patently ugly.

So, can AspectJ help? What about an around advice like this,
void around(Logger log, Level level, String s): callLog(log, level, s) {
if (log.getLevel().intValue() <= level.intValue() {
proceed(log, level, s);
}
}
-- with callLog defined as the appropriate pointcut for calls to log?

This certainly looks promising... except it doesn't work. Sure, the advice will prevent log from being called, but it won't prevent the value from being calculated. In particular, the advice is called with a ready value for the string argument s, meaning any required string concatenation had already taken place. This just adds an extra if that will then be repeated inside the log method, and never saves any time at all.

There is no proper AspectJ solution to this problem (that I'm aware of). In a recent paper called Transactional Pointcuts: Designation, Reification and Advice of Interrelated Join Points (GPCE '09) [link is to ACM Digital Library; I did not find a free PDF online], Hossein Sadat-Mohtasham and H. James Hoover outline an AspectJ solution that, among other things, solves this problem, by allowing the developer to define a sequence of events as a single, "transactional" pointcut, which can then be adviced. In this case, we can treat the single statement
logger.log(Level.INFO, "Value is: " + value);
as two statements: one builds the string value (using StringBuilder), and the other calls log. By defining a transactional pointcut around both, we can prevent the concatenated string from being built and the log method from being invoked needlessly. Mission accomplished.

Except, not really. Leaving aside the fact that the transactional pointcut requires twenty lines to define, it is also extremely specific. It will not prevent the string-returning costlyMethod from being called in a statement like this:
logger.log(Level.INFO, costlyMethod());
because there's no StringBuilder involved. The solution is extremely ad-hoc and utterly helpless in solving the general problem.

(This is not to say that transactional pointcuts, in general, are a bad idea. They're just not a solution to this specific problem.)

So, how can we solve this problem (and further generalize the solution, so it's not just about logging)? Let's look at what other languages do. In Python, logging statements accept a format string, followed by a list of values. Format strings are similar to the strings C's printf accepts. So in Python, our familiar logging line would look like this:
logging.info("Value is: %s", s)
The formatting, if required, takes place only inside the logging method; no costly string operations are performed if the logging will be suppressed in the current logging level. Good!

But this solution cannot be generalized for operations other than string handling. In particular, the problem remains with statements like
logging.info(costlyMethod())
To address this, Python programmers can use Lambda expressions. A method can accept a callback function rather than an explicit value, if it knows this parameter is not always required and could be costly to create. So we can have a line like:
foo(lambda: costlyMethod() + 3)
And if all we need is costlyMethod, we can just use the function name as the lambda expression:
foo(costlyMethod)  # No '()'
Still, I think a much neater solution would be to allow a method to declare a parameter as lazy. Something like:
public void log(Level level, lazy String str) { ... }
which would be completely transparent to the caller.

Maybe in Java 12.