Google to offer web cache

May 5, 2005

Fron CNet:

A beta, or test version, of Web Accelerator was introduced via the Google Labs technology incubation site late Wednesday. The tool, which must be downloaded, will tap into the power of Google’s global computer network and thus help sites load faster, according to the company.

Web Accelerator works by sending URL requests through company servers designated specifically for speeding site downloads. The application also can compress site data before sending it to computers.

Testing New Anti-Spam Software

April 27, 2005

Based on my sysadmin’s recomendation, I am experimenting with CRM114. CRM114 is a system that needs training to operate well. It requires training on error rather than training on a corpus of prototypes. So, I wanted to have it installed so that all legit mail comes to my inbox and all my spam to a spam folder. While training I will check both folders, saving false positives in a teach-non-spam folder and false negatives in a teach-spam folder. For this to work I needed a program to grab the contents of my teach folders and pass them to CRM114 for training. So I wrote some code that I actually put into production! If it works, I will put into into production for the other people on this server. It works for anyone using an IMAP server with access to shell. Now I have cron run it every 5 minutes and I am now training. Yay. The source is here. Feel free to use it and tell me what you think.

HAppS 0.2 Released!

February 22, 2005

I’ve been working on an application server in Haskell on and off for a while. Here is the announcement I just posted to the Haskell mailing lists:

HAppS is a Haskell library for building Internet applications, featuring:

  • HAppS.ACID: Guarantee application integrity in the face of unplanned outages using this module’s integrated write-ahead logging and checkpointing framework.
  • HAppS.DBMS: Do relational operations in Haskell (rather than SQL) on Haskell sets (outside the IO Monad). Define custom indices for your Haskell datatypes (e.g. geographic/geometric types). Use in combination with ACID for a robust relational DBMS customized for your application.
  • HAppS.Protocols: Expose your application using as an HTTP server and/or by recieving and sending SMTP.

HAppS is released by under the terms of the GNU General Public License Version 2, a copy of which is enclosed with this package.
How do I get HAppS?
More information is available at
A tarball of HAppS is available at HAppS is also available via darcs:

darcs get

Another reason to build in Haskell

January 13, 2005

Someone built a web server in Erlang. It scales really well on multiprocessor architectures because of its threading model. I believe that Haskell has a similar threading model and therefore could achieve similar performance.

Via John Udell, Patrick Logan makes this point in the contect of pointing to this register article is saying that multi-core computing is the future and this bodes well for the sorts of threading models found in Erlang and Haskell:

A single Erlang node on a single CPU today can comfortably get into the tens of thousands of dynamic processes. What would your system look like running hundreds of thousands or a million dynamic processes and lot of activity is spent collaborating with other systems also running at that scale?

Thinking about a better open source license

November 16, 2004

The GPL and the MPL combine with Moore’s law and the Internet to allow people to take source private by operating it as a service. I am planning to release an app server open source and want to remedy this problem. Here are the key terms. I’m looking for comments:

Simple Source License, Version 0.1
Copyright (c) 2004 All rights reserved.


The copyright holder of the covered work has released it without charge to the developer community in the hopes that it will be of value and in the hopes that other developers will in turn make and distribute improvements back and in turn get the same value.

This license reflects these hopes by permitting anyone to use, modify, bundle, or provide a service derived from covered work in whatever manner they choose and for whatever they wish to charge as long as they make the associated source freely available as well.

Terms and Conditions
1. Scope of License
This license covers any work that contains notice placed by the copyright holder stating that fact.

2. Covered Works
You may reproduce, modify, display, and/or distribute verbatim copies of covered works.

3. Modified Versions Of Covered Works
You may reproduce, modify, display, and/or distribute modified versions of covered works or patches to original or modified versions of covered works
a. only if they retain the original’s copyright, license, disclaimer, and all notices, if any, thereof,
b. only if they include prominent notice that they derive directly or indirectly from covered work and are NOT verbatim copies thereof, and
c. only if they include some way to identify the covered work from which they directly or indirectly derive.

4. Derived Works and Derived Services
A derived work is any work that results from modification or transformation of derived or covered works. A derived service is any service that depends in part or whole on the operation of derived work. Associated source is any
source used to create a derived work or provide a derived service. You may distribute derived works and/or provide derived services and/or permit others to do the same only if for the following year you make all associated source
available to the recipients of such works and/or users of such services
a. at no charge,
b. at a URL displayed prominently and designated as such in the acknowledgment section of the associated end-user documentation, and
c. under terms conformant with Open Source Definition 1.9 or later as available from the Open Source Initiative or under this license,
d. and only if the license or terms of use conspicuously advises the recipient and/or end user of their rights to the source under the terms of this license.

5. Acknowledgement
Any derived work or service must have end-user documentation with anacknowledgment section containing the following text:
“This product includes software developed by [copyright holder’s name] ([copyright holder’s URL]).” (replacing bracketed text with its value).

How Much Does it Cost to Serve EBay

June 22, 2004

According to Carlos Perez eBay serves ~400M page views per day on a database of 60 million auctions with 30k lines of code changes per week with 99.92% uptime. According to eBay’s 10-k, it has 41.2M active users and is growing at 50% per year! The 10-k is ambiguous but it looks like eBay depreciated at least $21M of equipment last year. I think eBay is paying WAYYY too much for server ops.

A modern CPU can handle ~10k HTTP requests per second (see these benchmarks. I don’t know how much memory eBay’s database consumes, but we can make some estimates. Lets assume each auction and each user requires 1k of memory. 60M auctions plus 40M users implies 100GB of memory. Now lets double that for indexing and other overhead and we are at 200GB.

Today you can buy a Sun E25K for $3.6M with 72 processors and up to 500GB of memory. That seems like enough to handle eBay’s load. Potential problems

  • Need to save to disk: Use Prevayler style architecture (Note there are major problems with Prevayler itself, but I am describing a style).
  • Memory access bottleneck: Someone with more hardware knowledge, please enlighten me.
  • Failover: Ok, buy two E25Ks. Have the second one always recovering.
  • Code Changes: This one strikes me as more difficult. Getting adequate web server performance requires compilation. You need a development infrastructure that allows dynamic modification of code.

How To Run An Organization

June 15, 2004

I was talking to a friend a few days ago who is trying to organize a comunity living space. He was talking about all the detail he had to manage. Some general thoughts on leading a business/organization:

  • Establish a clear mission/strategy. A leader’s fundamental job is to get everyone to understand it. Once they do, they can execute against it without undue communication burden on the leader. In Built To Last the authors argue that religious adherence to a mission/strategy is the key to the success of great companies. Note, a mission is much more specific than e.g. earn money. It is something the heart can grasp. It is something that excludes as well as included.
  • Let people do the things they are best at even if their best is worse than yours. You need to do the things you are best at. This is basic David Ricardo and it is as true today as it was in the Eighteenth Century
  • Seek excellence. In many areas of cognitive skill (e.g. programming, writing, selling, managing, etc) the best is more than 10x better than the worst. If you can find these 10x better people and get them working for you in the areas they specialize, that is a gigantic win. Organize the company to leverage their excellence and insure against their inadequacies. Note: You too may be both excellent and terribly flawed. Insure against your own flaws too.
  • Trial and Error. In Art and Fear, the authors tell a story of divding a ceramics class in two. One half is told that they will be judged on quantity. At the end of the term, all their projects will be weighed; 50 lbs gets an A, 40 gets a B, etc. The other half will be judged on quality. They only have to produce one piece. It just has to be great. At the end of the term, all the best art came from the quantity half. Think about why. There are lots of reasons. Structure your business so you can iterate quickly and learn fast. Lower the cost of trial, insure against error.
  • Plan and execute. Eisenhower one said ” In preparing for battle I have always found that plans are useless, but planning is indispensable.” Planning helps you react effectively when things go wrong, because part of the planning process is thinking about risk and contingency. By the way, things always go wrong. Don’t freak. Just handle them.

By the way, it is much easier to say these things than to live with them. I am writing this post as much to reinforce these thoughts in myself as I am to share them with my readers.