Audio: Luiz Barroso on Energy Proportional Computing

By Timothy M. O'Brien
July 1, 2008

You may also download this file. Running time: 00:15:48

Editor's Note: This interview focuses on the IEEE Computer article "The Case for Energy Proportional Computing" from December 2007. Grab a pair of good headphones; the sound quality is less than ideal, but we didn't want to hold this back. Luiz Barroso has some important things to say, regardless of the audio quality. Photo credit: James Duncan Davidson.


Tim O'Brien: I had the chance to sit down with Luiz Barroso, a Google Distinguished Engineer with a PhD in Computer Science from the University of Southern California. The context for the interview is "The Case for Energy Proportional Computing," an article published in IEEE Computer in December 2007 with co-author Urs Holzle, also of Google. In the article, Barroso and Holzle make the case that dramatic changes are needed in the way computers are architected and the way hardware is designed. Based on a statistical analysis of servers at Google, they identify that most servers operate at an average CPU utilization of between 15% and 45%. They identify this as the region in which a computer operates least efficiently, while most efficiency benchmarks assume that a computer is operating at its maximum potential performance.

I began the interview by asking Barroso to explain the concept of Energy Proportional Computing:

Luiz Barroso: It's something we feel strongly about because it seems in principle something that's relatively obvious, but it's something that I think the computer industry as a whole has missed. When you talk about energy efficiency you're never talking about just one point in the activity spectrum; you're talking about the entire spectrum.

LB: It turns out that most people will present information about the energy efficiency of their computing infrastructure using the benchmarks that are available today which try to run the machine as fast as they possibly can. And, it turns out that when you use benchmarks, that people will design [computers] to do well on those benchmarks and as a result if you look at the SPECpower numbers that have been recently published, the energy efficiency of just about every computer out there peaks when you are running it full blast. But, every time you're not running the computer at 100-percent utilization you are much less energy efficient than what the benchmark results would have hinted at. In particular, if you look at server class machines, the argument that we make is that, especially for large server installations, servers are not like mobile computers or like PDAs that are on standby most of the time, and then when they're working they're actually trying to work as fast as possible to be responsive to the user. Servers are running in the--somewhere between the 10-percentile and the 50-percentile on average over time. And in that particular region of the activity spectrum our computers today are very inefficient and some of the reasons for that I think are because it is a bit harder to be energy efficient in that range for the type of components we use, particularly for things like DRAM and for disk drives. I think a lot of it has to do with the fact that people just haven't really focused on it. The reason I was asked to write that article was to provide a little bit more motivation for the computing industry as a whole to understand that. Does that make sense?
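Editor's note: to make the shape of that argument concrete, here is a minimal sketch of how work per watt falls off when a server draws a large share of its peak power even while idle. The idle and peak wattages and the linear power model are assumptions for illustration only, not figures from Barroso or from Google.

```python
# Illustrative sketch (not measured data): why efficiency collapses at low
# utilization when a server is far from energy proportional.

IDLE_WATTS = 180.0   # assumed power draw at 0% utilization
PEAK_WATTS = 300.0   # assumed power draw at 100% utilization

def power(utilization):
    """Simple linear power model between idle and peak draw."""
    return IDLE_WATTS + (PEAK_WATTS - IDLE_WATTS) * utilization

def relative_efficiency(utilization):
    """Work per watt at this load, normalized so 100% utilization scores 1.0."""
    if utilization == 0:
        return 0.0
    # Work delivered is assumed to scale linearly with utilization.
    return (utilization / power(utilization)) / (1.0 / PEAK_WATTS)

for u in [0.1, 0.3, 0.5, 1.0]:
    print(f"{u:>4.0%} utilization: {power(u):5.0f} W, "
          f"relative efficiency {relative_efficiency(u):.2f}")
```

In this toy model a server running at 30-percent utilization delivers less than half the work per watt it delivers at peak, which is the gap that per-load-level results like SPECpower's are designed to expose.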

TO: Yes; it looks like you've identified the region between 15-percent and 45-percent as the range in which most of the servers in your study were operating. Is that true?

LB: On average, yeah; so, see, this is an average over six months. Of course, during that time there will be large groups of servers that will be running much faster than that, for example. Energy is fundamentally different from power, obviously, in the sense that energy is sort of an average over time as opposed to power, which is an instantaneous metric, so if you focus on energy alone it's actually not unreasonable for servers to be in that range. Most well put-together internet services are likely to be in that range. There's nothing particularly embarrassing or wrong with that as an average utilization range. It's not unreasonable. Now, the exact range that I mentioned is not the important thing, right. It's certainly not 100-percent; it's certainly not 80-percent or 60-percent. It's a number that on average is much smaller than that.

TO: Would it be accurate to say that we purchase our servers for the highest possible speed that they can run but the real number that we need to be interested in if we're looking to conserve energy is the efficiency in this sweet spot?

LB: That's exactly it.

TO: Is it true that you've been involved with the EPA setting standards for data center energy efficiency?

LB: I think quite a few of us at Google have collaborated with the EPA. Bill Weihl who is our energy efficiency czar has been more directly involved in that. I believe I went to maybe one of the EPA workshops and of course I provided--or myself and a bunch of people at Google have provided feedback to the EPA on--on that effort, which is I think a really--really nice effort and very timely.

TO: When you buy a water heater or an air-conditioner you get a number from the EPA: "this has scored 200 on a scale of 300 on an efficiency scale." Is there any way to do something similar for computers--whether that would mean rating the CPU or the entire computer?

LB: That's a great question. It's much more difficult for computers, so I think it's sort of understandable to some extent that the computer industry is maybe behind the HVAC industry in that regard, in part because the HVAC industry is a much older industry. But it's also much easier to measure the performance of an air-conditioning system; it is a more objective metric than for a computer system. A computer system is such a general purpose machine that it's hard to pin down what it is producing for you. Now, having said that, there's a combination of factors you have to consider. One of them is how efficient the entire data center is in using its energy budget, and that's a metric that the industry uses called power usage efficiency, or PUE, that essentially tries to give you an idea of how much energy is wasted or used on things that are not computers. So that's one metric we need to optimize. The other metric is the metric we're trying to craft as part of our Climate Savers Computing Initiative, which is something that Google and Intel spearheaded together with the World Wildlife Fund and maybe hundreds of other members, trying to increase the efficiency of power conversion in personal computers and servers--that is, the power supply and the voltage regulator modules that are in PCs and laptops--trying to make sure that, of the power we're sucking out of the sockets in the wall, the vast majority is actually going to the chips in the machine as opposed to being dissipated as heat in the power supply or the voltage regulators. We've established minimum efficiency requirements for that, and we're working with the EPA as well--the Climate Savers Computing Initiative is working with the EPA--setting standards for high efficiency in that regard. The last piece of the puzzle is, once you put energy on the chips, how do you define the amount of work they're doing for you, and that's the part that is tricky.
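Editor's note: a rough sketch of the first of those factors, the data-center-level PUE metric Barroso mentions above, is shown here. The kilowatt figures are invented for illustration.

```python
# Rough sketch of the data-center-level metric (PUE) Barroso mentions.
# The facility numbers below are made up for illustration.

it_equipment_kw = 1000.0   # power delivered to servers, storage, networking
cooling_kw      = 350.0    # chillers, fans, air handlers
power_dist_kw   = 150.0    # losses in transformers, UPS, distribution

total_facility_kw = it_equipment_kw + cooling_kw + power_dist_kw

# PUE = total facility power / IT equipment power; 1.0 would mean no overhead.
pue = total_facility_kw / it_equipment_kw
print(f"PUE = {pue:.2f}")   # 1.50 for these illustrative numbers
```

A PUE of 1.0 would mean every watt drawn from the grid reaches the computing equipment; everything above that is the cooling and power-distribution overhead Barroso describes.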

Now, the SPECpower benchmarks are really good first steps. I think right now--the last time I checked--we only had one benchmark, which is a Java enterprise workload. But there are already quite a few systems benchmarked under it, and one of the things that they do very well in that benchmark is that they don't measure the efficiency only when the workload is running at peak; they actually give you 10--actually 11--data points, from idle, 10-percent utilization, 20, 30, all the way to 100. In that sense it's very nice for the energy proportionality argument, because it will give the potential customer that is purchasing that equipment a much better idea of where they think they are on average in terms of utilization of that machine and what kind of energy efficiency they are likely to achieve at that point. So that was a long answer, I guess, because as I mentioned this is a really tricky issue, but there are mainly three components, to summarize: we need to make sure that data centers are very efficient in removing heat--the cooling infrastructure--and not losing energy in the power distribution, like transformers, etcetera.

We need to make sure that once the energy gets all the way to the server itself, the power supply and the voltage regulators after the power supply are not throwing away all this energy before it gets to the chips. And then finally, we want to make sure that once you put energy on the memory chips, computer chips, and disk drives, the workloads are being as efficient as possible, and that's something that things like the SPECpower benchmarks can help us with.
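Editor's note: one back-of-the-envelope way to see how those three components compound is sketched below. The efficiencies are assumed values for illustration, not measurements from the article or the interview.

```python
# Hedged sketch of the three-stage chain Barroso summarizes, with assumed numbers:
# wall power -> data center overhead -> PSU/VRM conversion -> useful work on the chips.

pue           = 1.5    # facility overhead: cooling, power distribution
psu_vrm_eff   = 0.85   # fraction of server input power that reaches the components
work_per_watt = 0.42   # relative efficiency of the workload at its typical utilization
                       # (e.g. the ~30% utilization point from the earlier sketch)

# Fraction of each watt drawn from the grid that ends up doing useful work,
# relative to a perfect facility running the hardware at its peak-efficiency point.
end_to_end = (1.0 / pue) * psu_vrm_eff * work_per_watt
print(f"End-to-end efficiency: {end_to_end:.2f}")   # about 0.24 here
```

Improving any single stage helps, but the overall result is the product of all three stages.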

TO: So is this something that needs to be solved at the hardware level in terms of moving towards CPUs that are more energy efficient or as you coined the phrase--energy proportional? Or, is this something that software can play a part in?

LB: That's a really great question. That's sort of the core of one of the things I want to mention on Monday. The real answer is both. Ideally, and what we are trying to push for, is that the electronics industry helps: the people that manufacture components like DRAM, disk drives, flash--the components that go into the computer--need to do better at making their components naturally energy proportional, which would mean that if you're only utilizing your memory to a certain small fraction of its capacity, then your memory system will only consume a small fraction of its maximum energy capacity. If you don't have that kind of natural energy proportionality, then you'd have to solve this thing somehow in software. And in some cases that is easy, but in a lot of cases it's actually very, very hard, because software, especially for a large-scale internet service, is very, very complex.

A lot of the things that you want to do to make an internet service very efficient and very reliable kind of go against the grain of power management, essentially, and hopefully I'll be able to make that a little bit more clear in the talk than I'm making it today. But essentially there are two aspects of this; the article that you're looking at tries to talk about it, unfortunately in only one or two paragraphs. Two things that are tried and true methods in designing a good quality distributed system--and internet services are essentially distributed system software--are doing a really good job at load balancing and doing a really good job at widespread data distribution. These things help you in performance, and help you in making sure that you're very likely to have your data available to you so you won't have data problems. And when you do that, you create situations where it's very unlikely, when you are in a period of lower load, that any one individual machine is idle for long periods of time. What you have, instead of lots of machines working very hard, is lots of machines working a little bit less hard, but you don't have lots of useful periods of full idleness. At that point, for software to create these periods of idleness so that the machines can be more energy proportional--if the underlying hardware is not--it's really challenging; it's really difficult, and will potentially end up sacrificing performance or availability, which is not good either, right. So it's a tough problem.

My main argument is that this cannot be fully solved in software; the component manufacturers need to do better at it for us to have systems in which, as a whole, the combination of hardware and software is more energy proportional.
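Editor's note: the tension Barroso describes can be seen with the same toy power model used earlier. With hardware that is far from energy proportional, consolidating load onto fewer machines saves a lot of power, but it works against the wide data distribution and load balancing that keep a service fast and available. The numbers are again illustrative assumptions.

```python
# Illustrative only: total power for the same aggregate load, spread thin vs.
# consolidated, using the non-proportional linear model from the earlier sketch.

IDLE_WATTS = 180.0
PEAK_WATTS = 300.0

def power(utilization):
    """Linear power model between assumed idle and peak draw."""
    return IDLE_WATTS + (PEAK_WATTS - IDLE_WATTS) * utilization

total_load = 1.5  # aggregate work equal to 1.5 fully busy machines

spread = 10 * power(total_load / 10)   # 10 machines at 15% utilization each
packed = 2 * power(total_load / 2)     # 2 machines at 75% utilization each

print(f"Spread across 10 machines: {spread:.0f} W")   # 1980 W
print(f"Packed onto 2 machines:    {packed:.0f} W")   #  540 W
```

Consolidation looks attractive on paper, but as Barroso notes, it trades away exactly the redundancy and data spread that large services rely on; hardware that consumed power in proportion to load would largely remove that penalty.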

TO: Is this going to force hardware manufacturers, people who make CPUs like Intel--is this going to force them to stop focusing on clock speed and start focusing on other concerns like efficiency? Is what you're talking about--energy efficient, energy proportional computing--sort of the end of Moore's Law?

LB: No; no it isn't. I mean, it will force them to focus not on performance alone, but they're already there. We don't need to push them in that direction. There was another power-related issue with CPUs a few years back that has already forced them to do the big switch to multi-cores, for example. That wasn't necessarily an energy issue, and that's a little bit of a subtle difference, but the main reason why you've seen more multi-cores instead of more megahertz these days is because of temperature, and temperature essentially is a function of power and area. So folks just were unable, with next generation CMOS technology, to operate chips at very high frequencies and still not have them melt on you, essentially. That issue alone has already caused the CPU industry to take a broader view of value that's not just megahertz but performance for cost and performance in general. Now, I must say that the CPU industry is not perfect in terms of energy proportionality, but they are ahead of the other components in your system: if you look at CPU energy usage from low activity to high activity, it varies by a much wider margin than what you typically see for DRAM, for example, and for disk drives in particular, and even networking.

So one of the arguments that I try to make in that article is that this is not just something that the CPU manufacturers are going to have to do; in fact, they are doing a better job attacking this problem right now than the rest of the ecosystem. But this has to be a fact everywhere.

TO: Is there anyone in the industry doing anything exotic with CPU design that could affect power consumption? Is there anybody looking at moving away from CMOS to something different that would consume less power?

LB: You know, lots of people are looking for it, and that's the part that's a little bit sad, in that there's no good candidate on the horizon. It looks like we're still going to be able to scale CMOS for a little while longer; you should ask other experts in that industry, not me, but people talk about CMOS still being around within the next 10 years. There's not going to be anything that replaces CMOS within the next 10 years. It looks like, realistically, as an industry we're stuck with CMOS technology. There's nothing that's so much better that it's an obvious winner at this point. So it's a really tough problem we need to tackle.

PHOTO CREDIT: Photograph from Velocity 2008 in Burlingame, California, presented by O'Reilly Media. Produced by Good Company Communications. Photograph copyright James Duncan Davidson.

