MySQL forks: could Drizzle be the next of the new generation of relational databases?

By Andy Oram
July 22, 2008 | Comments: 6

I had a brief talk with leading MySQL developer Brian Aker today about one of the biggest turns in MySQL history: this morning's Drizzle announcement. Brian presented Drizzle as an irrevocable fork of MySQL. To me it represents three deliberate steps in one: one big step backward and two big steps forward. It could also be the signal of a new era in databases.

The big step backward, as is obvious immediately to anyone who has followed MySQL over the past few years, is the removal of precisely the major "enterprise" features that MySQL worked so hard to add between versions 4 and 5: stored procedures (which actually were added between 4.0 and 4.1), views, and triggers. A few other features were also removed.

Aker presents this step as a return to the quick and lightweight MySQL that made it popular in the first place, a database engine that may not appeal to large corporate back offices but can easily power web sites. I see it also as a step back to the philosophy that Aker calls "Databases without business logic": let the application handle consistency and complex calculations instead of making the database do them. Trust your programmers.

The first step forward is to position MySQL to better handle the physical infrastructure of modern computing (Aker cites clouds and multicores). The second step forward is to welcome vibrant participation by the community, something that up to now has eluded the MySQL AB company (now part of Sun Microsystems).

MySQL was always free as in beer (to many classes of users), but it wasn't placed under an open license until pretty far along in its existence. Even now, the company's dual-licensing strategy drives it to compulsively hire the best contributors from the community, so that no significant base of code can build up outside of its ownership. (That said, the community has developed many wonderful utilities and some promising new storage engines that are not owned by MySQL AB.) And the company still uses the powerful but proprietary tool BitKeeper (which the Linux community sidestepped long ago) to maintain its source code.

Drizzle comes out just as Margo Seltzer (a leading CS researcher and former member of the Berkeley DB team) publishes an article called "Beyond Relational Databases" in the most recent issue of Communications of the ACM (July 2008). Seltzer's complaints echo a lot of Aker's. Databases offer more features than anyone needs (like most software packages nowadays); they have correspondingly become slow, hard to administer, buggy, and expensive to deploy; and they need to slim down in order to adapt to the wide range of new hardware and applications that they have to work with.

Seltzer calls on manufacturers to make databases more modular, obviously what Drizzle is doing with its micro-kernel approach. But she also wants users to be able to choose from a menu of features. Don't need transactions? Leave 'em out and save a lot of overhead. Don't expect a lot of concurrent queries? Skip threads and run each query in its own process.

Aker does not present Drizzle as a configurable collection of options; he just promises to strip MySQL down to what he considers the essentials for modern web-driven applications. So for now, Drizzle is not the extensible database Seltzer envisions. But one can't avoid speculating about what MySQL itself could look like if the company adopted this micro-kernel approach and started adding back features (which Aker insists Drizzle will not do).

After all, forks are expensive. What company wants to maintain two completely different code bases? It looks like Aker conceives of Drizzle as a community-maintained project, with nominal support from Sun. I get the feeling this new database engine is a pet project of Aker and his community collaborators, who don't feel a need to look back or consider the long-term business needs of the larger MySQL project. But if both Drizzle and MySQL are successful over time, somebody is going to insist on coordinating them again.

Now we have to consider where MySQL is right now. Its innovative approach to multiple storage engines is definitely modular, and some of Seltzer's suggestions (such as allowing a choice between hashes and B+ trees for index storage) are present in MySQL. But I don't believe MySQL is modular in the way that it would have to be to support a separate Drizzle project. I'm sure some refactoring would be necessary to achieve the radical configurability Seltzer wants.
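To make the hash-versus-B+-tree choice a little more concrete, here is a minimal sketch in Python of what a pluggable index looks like from the table's point of view. It is not MySQL or Drizzle code: the class names (`HashIndex`, `OrderedIndex`, `Table`) are hypothetical, and a sorted list searched with `bisect` merely stands in for a real B+ tree. The point is only that the table talks to one small interface, and the structure chosen behind it decides which queries are cheap or even possible.

```python
import bisect


class HashIndex:
    """Exact-match lookups only, in the spirit of a hash index."""

    def __init__(self):
        self._rows = {}

    def insert(self, key, row):
        self._rows.setdefault(key, []).append(row)

    def lookup(self, key):
        return list(self._rows.get(key, []))


class OrderedIndex:
    """Keeps keys sorted, so it can also answer range queries.

    A sorted list is an illustrative stand-in for a B+ tree.
    """

    def __init__(self):
        self._keys = []
        self._rows = []

    def insert(self, key, row):
        pos = bisect.bisect_right(self._keys, key)
        self._keys.insert(pos, key)
        self._rows.insert(pos, row)

    def lookup(self, key):
        return self.range(key, key)

    def range(self, low, high):
        # Return all rows whose key falls in [low, high], in key order.
        start = bisect.bisect_left(self._keys, low)
        end = bisect.bisect_right(self._keys, high)
        return self._rows[start:end]


class Table:
    """A table that delegates key lookups to whichever index it was given."""

    def __init__(self, key_column, index):
        self.key_column = key_column
        self.index = index

    def insert(self, row):
        self.index.insert(row[self.key_column], row)

    def find(self, key):
        return self.index.lookup(key)


# Same table code, two different index structures chosen by the user.
pages = Table("id", HashIndex())
events = Table("day", OrderedIndex())
```

In this sketch only `OrderedIndex` offers `range`, which mirrors the usual trade-off: a hash index suits "give me the row with this key," while an ordered structure is needed once queries ask for everything between two keys.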

But I hope it happens. I think it could bring databases even more thoroughly into the next computing age. And if MySQL goes this route, other projects will do so too.


6 Comments

What a joke! As a programmer who's written many a DB-backed app, I don't want to have to re-invent those features on the application side where they will be weak, poorly optimized and potentially unportable. "Let the application handle consistency" translates to "abandon any guarantee of consistency." This is a giant step backwards.

I'm not clear why there's a need for a stripped down client-server DB.

In the majority of instances, for low traffic (beginner) websites you'd not even need multi-user client-server MySQL, and SQLite would work just fine for the level of concurrency required - which is mostly "here's a set of ID keys, get the page content"

@G - the point is that if you're already managing the consistency in your application (i.e. you're not taking advantage of the features in the DB), then you can probably speed things up by removing support for them. Makes sense to me for some web stuff where you control the client and the server. No way would I remove consistency from the DB for desktop clients... Russian roulette until some user doesn't apply the latest update and so corrupts everybody else's data...

@Neil - SQLite is fine when you don't really *need* SQL, any type of data storage would do and you just happen to choose SQL because it's the easiest solution.

If you're dealing with a low traffic website, you wouldn't notice (or care) if your data takes 0.5 or 1.0 seconds to retrieve. You'll probably still be waiting for images to load, even though it's a 100% difference. For a high traffic site, that's the difference between running 5 or 10 database servers to handle the load, for example.


Hi!

The idea with the micro-kernel is to allow others to extend, think Apache. So will someone someday maybe want to have a place to insert an application engine, aka SP, into the database?

Possibly?

We are not adding that connection point to the kernel right now. The focus right now is ACL, logs, and functions.

Cheers,
-Brian

Hi Andy,

You make some good points but I don't think MySQL can make the cut for cloud computing. In that same issue of the ACM mag, there are some other articles, like the one on BASE. Anyway, my point here is that the architecture of MySQL is not what will scale in a cloud environment.

SimpleDB and BigTable are the first entry points but those really won't cut it for enterprise applications. They can scale but aren't ACID.

What is needed is ACID (or a good replacement) and scalability. Oracle RAC is the first generation that really does that but it can only scale to cluster size (say 100 or so nodes).

Vertica is truly scalable. There are a couple of others coming like Hadoop and, possibly, LucidDB. I'm sure others are on the horizon.

Considering that, I just don't see a good reason for Drizzle. Why not make SPs, Views, etc installable options? I just don't see Drizzle being significant.

Thanks,

LewisC

I applaud this back-to-basics approach! The MySQL story has been a David vs. Goliath battle, but David was gradually turning into Goliath, and it's good to see them take stock of where they were going and what the original goals were.

It's inspired a fairytale that I've posted on my blog at blogs.ingres.com/emmamcgrattan
