James Turner: This is James Turner, contributing editor to O'Reilly Online. With me today is Brian Aker, Director of Technology for MySQL. Brian is the author of Running Weblogs with Slash. He's also leading a tutorial at O'Reilly's Open Source Convention, July 21-25, in Portland, Oregon, titled "Memcached and MySQL: Everything You Need to Know." Good day, Brian.
Brian Aker: Ah good day.
JT: Why don't you start with a little bit of history of how you got to where you are today?
JT: You can take it from there I think would be good.
BA: [Laughs] Okay; so the story goes like this; we used to work on a project a very long time ago called the Virtual Hospital. You may remember that from early Webbies. One of the projects we were running needed a database, and at the time we didn't really have licenses for any other databases for the project. So what I did is I went out on the net and said, okay, can we go find a database? The only database we could find that would really work, and that we thought might be free, was this thing called Mini-SQL; it was written by David Hughes, who is Australian. So we wrote the application around Mini-SQL and it promptly blew up within about two months--if it lasted that long; it turns out that the database really couldn't handle anywhere near the sort of data that we had inserted into it.
So at that point I went hunting around and I found MySQL; it took about 30 minutes to port to MySQL at the time and then it took all of three or four days before I found any bugs and then I just kind of started fixing stuff and started exchanging email back and forth on the mailing list and from there on I pretty much have been either hacking or working on MySQL or working with it or something for the last 10 years.
JT: What would you say your major contributions have been into the MySQL chain?
BA: Well I'd say my major contributions around it have generally been in the addition of storage engines. Early on we had this concept that we would have different storage engines, and that comes from the previous version of MySQL, which was a database called UNIREG. What I wanted to do, and what the agreement was when David Axmark talked me into coming to MySQL, was to make that a livable design. So I and a number of other people, including Antony Curtis and Sergei Golubchik, pretty much started working on making this thing an actual interface with a design that would allow us to dynamically load engines. Now I would say we are at the first iteration of that at this point, but that was one of the major contributions that I wanted to make: the idea of bringing features into MySQL in a modular and kind of online fashion.
JT: There was some controversy when Sun acquired MySQL because some of the engines--there was some question about whether they would be able to be carried over in a Sun acquisition. Has there been any issues there?
BA: No, no; there's been no issues around engines or anything like that. Today pretty much the market for engines is there; there's the one that MySQL maintains, there's one that Oracle maintains; IBM has one; there is PBXT, and I think they're now up to a couple of employees working on that. No; the market for engines--I don't know; it still looks to be pretty steady.
JT: So what has been happening lately with MySQL that makes you excited?
BA: A lot of what gets me excited about MySQL is that there's been kind of a change in the internal view of what we should be doing and how we should be doing it. I spoke the other day to one of our Replication Engineers and he was talking about part of the design that they're working on, and how they had worked out the design behind an interface so that it should work to MySQL's benefit; so kind of that whole modularity thing. That part had me fairly excited. I was excited to see some of the work that Heikki had done inside of InnoDB as far as page compression and so forth. It's been really neat to see what Monty has been doing with Maria as far as rethinking how MyISAM would be done, going from what is really a stream format to kind of a page format with the possibility of transactions and such, so I find that pretty exciting to watch.
JT: MySQL has kind of been moving up the food chain over the last few years. The line at which you see I need to go and use Oracle or I need to go and use Sybase for the people who still say that seems to be moving up from year to year. How do you consider yourself or the product compared to the traditional big SQL databases at this point?
BA: Well let's see; if you're talking about big SQL databases feature-wise, what I believe personally is this: if you look at what the majority of users do, they're using SQL at about what we call the SQL92 standard; SQL92 is your basic create a table, insert, cursor-based stuff and so forth. There is obviously a need for stored procedures for some people, especially for more legacy-based applications. So pretty much today all of the relational databases are on par at this level. There are a lot of things in SQL99 that either we don't support or half the other vendors don't support either, but then there was a lot done there that people thought would be exciting five years later, and it turned out to pretty much not be all that exciting.
BA: So what went into the decision? Well I heard what went into the decision was that Jonathan talked Martin into dinner--that's what I heard. In fact, I think that can even be pretty easily confirmed. As for Sun itself, Sun has been around for a very long time. They have some incredibly capable hardware engineers, people who are really, I think, a generation or two out there as far as what is being built. The more I get to poke around Sun and see what they're designing hardware-wise, the more it gives you a view of what the future is, and that's pretty cool to think about when you're writing something like a database. It also means that you get an idea of where hardware is going, beyond just a simple concept of Moore's Law; I mean, we're moving to solid state disks; we're getting more and more cores. You know, all of that stuff you can generally get. There are other pieces to it, and I think it just really reinforces how quickly that kind of world is going to be hitting us.
JT: Sun has been kind of all over the place lately with their Open Source strategies; they have open sourced Java; there was some rumors that they were going to actually close parts of MySQL and then they didn't. Where do you guys fit into Sun's strategy?
JT: Sun obviously has a large investment in Java; it's been one of their crown jewels. Bringing you onboard are we going to start to see more of an integration of Java into the database and the database into Java?
BA: MySQL is C and C++; there's already a Java DB out there, so I don't see much of a reason to translate it. There's been an interest around--one of our Engineers, in fact a couple of our Engineers, Antony Curtis and Eric Herman, are out writing a way to plug external languages into stored procedures--something that Postgres has actually had for a while now. I think Antony and Eric looked a little bit at their design and looked at what Oracle did, and have been trying to come up with something that looks pretty reasonable. You can actually go and find that code on Launchpad if you're interested. But that's bringing in not only Java; I think they already have a Lua one up; they have a Perl one up. That's bringing many languages into the database. So really, if you want to have applications that are built into your database, there shouldn't be any reason to stick you with learning a language that you don't want to use or a language that's really not optimized for what you're doing. The language should be whatever you want to use.
JT: Turning more to a general discussion of databases; in a lot of ways it seems like there have been slow incremental improvements in the SQL standard but pretty much the SQL you were writing as you said earlier 10--15 years ago looks a lot like the SQL that you write today. Is the underlying philosophy and architecture of databases stagnated; is there something that really needs to happen in terms of a shift of thinking?
BA: I think there's two things right now that are really pushing the database world. The first thing that's going to push the basic old OLTP transactional database world, which you're right--that world really hasn't changed in some time now--is a change in the number of cores and the move to solid state disks, because a lot of the code that has been written, or a lot of the concepts around databases, assume that you don't have access to enough memory, your disk is slow and can't do random reads very well, and you have maybe one, maybe eight processors. But you look at some of the upper-end hardware and the many-core stuff, like some of what Sun has got and, to a lesser degree, Intel has got, and you're almost looking at kind of an array of processing; you've got access to so many processors. And the whole story of trying to optimize around the problem of random IO being expensive, well, that's not that big of a deal when you actually have solid state disks. So that's one whole area that I think will not actually push, but will cause a rethinking in what we think of today as the standard Jim Gray relational database design.
On the second side of this, which may actually be more exciting, is the issue of--instead of the structured data world of the relational database--the semi-structured world. You look at what is being done today with CouchDB, you look at Amazon ScaleDB--not ScaleDB, SimpleDB--and to a lesser or similar extent Tokyo Cabinet; those databases are really kind of fascinating because they are redefining how we access data and how we are going to be searching and using data. So there's a whole world out there that's just starting to open up in that direction.
JT: Another thing we're seeing as part of the whole platform as a service movement is that things like www.force.com and Amazon S3 are kind of blurring the line between where your application lives and where your database lives, and having to keep your data in-house as opposed to close to your server. Where does that play into where databases are going?
BA: You mean as far as--well I mean you still have an issue of latency, so no matter what, you've got data sitting in something that you have to run through an application server, and there's a latency issue involved in that. So that's an unsolved problem at this point, trying to make sure that everything is in close proximity. On the second part of your question, people at this point are not wanting to deploy data centers; in fact it's becoming rather egotistical to say you want to build out a data center. If you're building something new today, you're going to be building it in some kind of hosted facility, whether that be something a little more traditional along the lines of, say, a Rackspace or a DreamHost, up to something as nontraditional as EC2, or something that's just the application process, like what we see in Google Apps or Project Caroline. You know, that in itself is an entirely different realm. But certainly buying a rack of servers, and continuing to buy racks of servers, doesn't seem to be the right way of spending your initial capital anymore.
JT: Where is MySQL going to fit into there?
BA: Well the only thing we know today is that MySQL is quite popular inside of EC2; I think we're told that we're generally the most popular database brand inside there, so it kind of gives strength to MySQL's being generally lightweight and easy to install. I think you'll see a few more things with the technology than what we're doing today. I mean certainly on one side of that you can already see some of that with the proxy project and how people want to do automatic sharding of databases. There's certainly quite a bit of that going on.
JT: So how about service oriented architectures? Does that put any new demands or different demands onto database users?
BA: I think it does; well it actually removes some demand. It removes some of the demand by--you mean by--in use of them or in trying to create them?
JT: I think in both because you've got--you--as a consumer of it, it kind of changes your data model and as a creator of it you have to think in different ways about how you want to gather your data.
BA: Yeah; I mean for the user of it, if we look for instance at how RightScale works today, or a few other vendors that have started popping up in that space, or even a SimpleDB, you're pushing the cost of management of the database to some other vendor to handle for you, which means that you deal with whatever their latency issues are, whatever their growth issues are, but you get to walk away from that problem, and that's something that is pretty sexy to a lot of people. As far as building out architectures today, I think it goes back to the reason why we see proxies and such being talked about. In very few application environments today do you have something which is really not an object relational model being mapped into a standard relational database. Because of that ORM, though, you can do all kinds of nifty little things; it makes sharding very easy because you've got objects you're moving around. I was introduced recently to a database called ZopeDB, which is sort of Python-specific. It allows people to store objects natively inside of its own structures, where it can then use MySQL to store its data, so that kind of stuff is pretty fascinating to see.
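[Sidebar: Brian's point that objects make sharding easy can be sketched in a few lines. This is only an illustration, not anything from MySQL or the proxy project: the shard names, the `shard_for` helper, and the `ShardedStore` class are all hypothetical, and each "shard" is a plain in-memory dict standing in for a separate database server. The idea is simply that when every object has a key, a hash of that key decides which server holds it.]

```python
import hashlib

# Hypothetical shard names; in a real deployment each would be a MySQL server.
SHARDS = ["shard0", "shard1", "shard2", "shard3"]


def shard_for(object_id, shards=SHARDS):
    """Pick a shard by hashing the object's key (simple modulo scheme)."""
    digest = hashlib.md5(object_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


class ShardedStore:
    """In-memory stand-in: one dict per shard instead of one database per shard."""

    def __init__(self, shards=SHARDS):
        self.shards = list(shards)
        self.data = {name: {} for name in self.shards}

    def save(self, object_id, obj):
        # The whole object travels to exactly one shard, chosen by its key.
        self.data[shard_for(object_id, self.shards)][object_id] = obj

    def load(self, object_id):
        # The same hash sends the read to the shard that received the write.
        return self.data[shard_for(object_id, self.shards)].get(object_id)


store = ShardedStore()
store.save("user:42", {"name": "Ada"})
print(shard_for("user:42"), store.load("user:42"))
```

[A modulo scheme like this reshuffles most keys when the number of shards changes, so it is a starting point for the idea rather than a production design.]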
JT: Looking forward, what do you see the big challenges for database vendors and database users being over the next say five or ten years?
BA: I think on the database vendor side, the hard thing we have in front of us is that I don't think anybody is built today to handle the number of cores that are going to be thrown at us in the next decade. I think we are all still scrambling for a design that will actually work. I think that scramble to find a design that works in that environment is going to open up the playing field for new databases to actually show up. So it's kind of one of the first times in a while that there might be a good chance that somebody else could build a new database from scratch. The same thing goes with solid state disks; all the designs that we see today are pretty much optimized around hard disks. There are very rare exceptions to that, so to me that is something that is going to be a challenge for anybody writing a database today.
As far as users go, I think the number of different technologies coming out really opens up the world of what they can actually make use of. It's not just a database today; you have anything from structured data to unstructured data to semi-structured data. So for your unstructured data, what are you going to pick out there to use? Are you going to go down to the level of something like an Isilon to store data; are you going to move up to a MogileFS? Are you going to push that out to something like an S3 and have somebody else handle it? The semi-structured stuff is still so new that I don't think we have a clear destiny at this point. And on the structured side it's going to be finding the right database that matches the hardware that you're buying today.
JT: To finish up, you're going to be speaking at OSCON this summer. Do you want to give us a little bit of a preview of what you're planning to be talking about?
BA: Well at the moment I know I'm confirmed on at least one thing. There's at least two other things I'm unconfirmed on, so I'll talk about the confirmed thing. The confirmed thing is that dormando, who is another co-writer on a project called Memcached, and I are going to be doing a tutorial, I think on the first day of the tutorial sessions for OSCON, presenting for about three or so hours on all the internals and how to work with Memcached, when not to use Memcached, and a number of different architectures and use cases that you can use with it, so that will be pretty exciting. If you Google for it you can find a memcached study, which is a PDF we have put together, because we're trying to get more people interested in that technology but also understanding where and when to actually deploy it--and even when not to deploy it; so that's the main thing. I also hear that we will be doing something at the Conference on the issues around creating crippleware versus Open Source, namely how you can take an Open Source project and make sure that it's Open Source but still allow people to extend it in both Open Source and proprietary ways, but in some manner that you don't end up with some crippleware application.
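[Sidebar: for readers who have not paired a cache with a database before, one common arrangement is the "cache-aside" pattern: check the cache first, fall back to the database on a miss, and drop the cached copy when the row changes. The sketch below is not taken from the tutorial and does not use a real memcached client; `FakeCache` is a hypothetical dict-backed stand-in with get/set/delete methods, `UserRepository` is an invented name, and the "database" is a plain dict standing in for a MySQL table.]

```python
class FakeCache:
    """Dict-backed stand-in for a cache client (hypothetical, not a real client API)."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        return self._items.get(key)

    def set(self, key, value):
        self._items[key] = value

    def delete(self, key):
        self._items.pop(key, None)


class UserRepository:
    """Cache-aside reads and invalidate-on-write over a dict 'database'."""

    def __init__(self, cache, database):
        self.cache = cache
        self.database = database  # plain dict standing in for a MySQL table
        self.db_reads = 0         # counts how often the slow path is taken

    def get_user(self, user_id):
        key = "user:%s" % user_id
        cached = self.cache.get(key)
        if cached is not None:
            return cached             # cache hit: the database is not touched
        self.db_reads += 1            # cache miss: read from the database
        row = self.database.get(user_id)
        if row is not None:
            self.cache.set(key, row)  # remember the row for the next reader
        return row

    def update_user(self, user_id, row):
        self.database[user_id] = row
        self.cache.delete("user:%s" % user_id)  # drop the stale cached copy


repo = UserRepository(FakeCache(), {1: {"name": "Ada"}})
repo.get_user(1)
repo.get_user(1)
print(repo.db_reads)
```

[The second read is served from the cache, which is the whole appeal. Data that changes on nearly every read gains little from this arrangement, which is one example of the "when not to use it" side of the question.]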
JT: All right; well it's been great talking to you, Brian.
BA: Hey it's been great talking to you as well.
JT: We've been talking with Brian Aker, Director of Technology for MySQL and you can see Brian this summer at OSCON in Portland.
Photo of Brian Aker courtesy of Julian Cash.