Premature Optimization vs “Prototyping”

“Premature Optimization is the root of all evil.”
–Donald Knuth

In response to <a href=”http://www.alexjacobson.com/archives/2004_04.html#000046&#8243; my post of a relational database management system in HaskellI got a bunch of backchannel feedback from folks about whether Haskell is just a prototype language and about the utility of this sort of code. I’m going to try to answer it here.

Concern: Functional languages may be easier to use but they consume more CPU and memory which can cause problems when you need to scale.

Response I don’t know that Haskell performs substantiallyworse than e.g. Java or Python on various tasks, but, either way, my bet is that CPU and memory have gotten sufficiently cheap so that it takes a while before
scale hits you and until that time, the thing to optimize is development time. Also, scale is more of an architectural and systems issue than a programming language issue. 1-10x differences in memory/cpu just aren’t enough to justify
language decisions given that your choice of algorithm (O(n) vs O(n^2) vs O(log(n)) matter so much more when you get scale and Haskell make it easier to program efficient algorithms. However, although on the memory
consumption side, Haskell does not yet have a 64-bit implementation, its compiler developers say it would not be hard to get there. (Note: There is no 64 bit java either). On the CPU side, there is already GPH, a version ofHaskell that parellizes code accross multiple
CPUs. Haskell’s referential transparency means that it is much more straightforward to parellize execution than it is in traditional imperative programming lanugages.

Concern: Why not use an off-the-shelf RDBMS?

Response:Off the shelf RDBMSs are optimizations for a prior era when memory was VERY expensive and people believed in long transaction locks. Today Sun’s web site offers servers with .5 terabyte of RAM (nearly 2k for every man, woman, and child in the US!) and the management costs of big disk arrays combined with the development and CPU cost of optimizing disk read/writes mean that memory is actually cheaper than disk in many contexts. (Note: Google doesn’t use disk for exactly this reason). The web made long duration transactions obsolete. All the information for a database transaction is available in a single HTTP request. If the server imposes a total order on all HTTP requests (see SEDA), then write ahead logging of HTTP requests combined with periodic serialization of server state to disk obviates the need for much of the transaction infrastructure in databases (see Prevayler). The bonus here is much faster development time and a substantial performance improvement because you are not writing and executing code to martial data/queries to/from some external relational database. (Prevayler in Java claims a 1000x improvement over MySQL and a 3000x improvement over Oracle)

Concern:Why not use a more standard language (databases are a small part of application code and there are tons of examples of how to do stuff in traditional lanuages with OTS DBMSs)?

Response: If I can write a RDBMS in Haskell in a couple of part-time weeks in under 1000 lines of code then you can probably write your application equivalently fast and integrating that app with a Haskell DBMS will be easier than calling an external one (and save you money on system administration). Also I recommend this Paul Graham here:

We knew that everyone else was writing their software in C++ or Perl. But we also knew that that didn’t mean anything. If you chose technology that way, you’d be running Windows. When you choose technology, you have to ignore what otherpeople are doing, and consider only what will work the best.

This is especially true in a startup. In a big company, you can do what all the other big companies are doing. But a startup can’t do what all the other startups do. I don’t think a lot ofpeople realize this, even in startups.

The average big company grows at about ten percent a year. So if you’re running a big company and you do everything the way the average big company does it, you can expect to do as well as the average
big company– that is, to grow about ten percent a year.

The same thing will happen if you’re running a startup, of course. If you do everything the way the average startup does it, you should expect average performance. The problem here is, average performance means that you’ll go out of business. The survival rate for startups is way less than fifty percent. So if you’re running a startup, you had better be doing something odd. If not, you’re in trouble.

Beating the Averages

Concern: OTS RDBMS’s do indexing.

Response: My HARDBMS automatically indexes all fields (and lets you define custom datatypes and indexes for specialized data). Moreover being native means that recursive queries (such as all the friends of friends of friends etc.) can be done with reasonable performance (traditional database require a round trip to the DBMS for every level of recursion and that quickly becomes too hard — note the performance problems of Friendster running MySQL!).

Advertisements

2 Responses to Premature Optimization vs “Prototyping”

  1. mike says:

    Friendster’s performance problems are not with mysql, it is the computation of the friend network, which isn’t done at the database level, from what I understand.

  2. My understanding is that the performance problems is that computation of the friend network requires download of basically the entire database on each query or they have to engineer synching the mysql data with some other network traversal database….etc.

    Can you think of another alternative?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: