Ten minutes with…Christof Fetzer, Technische Universität Dresden
Christof Fetzer holds an endowed chair (Heinz-Nixdorf endowment) in Systems Engineering in the Computer Science Department at Technische Universität Dresden (TUD), as well as being chair of the Distributed Systems Engineering International Master’s Programme. His PhD students Thomas Knauth and Lenar Yazdanov also work on the ParaDIME project.
Can you tell me a bit about your main research interests? What led you to work in this field?
My research interests include cloud computing, dependability, security and energy efficiency, partly because I supervise several PhD students doing different things. I take on new research problems which interest me personally – so a few years ago I thought there were interesting problems in cloud computing. For example, as a cloud customer, how can I trust that confidentiality is ensured for my data and its computation? How do we ensure the integrity of the data and its availability?
Cloud computing has some great advantages with regard to energy efficiency because the shared infrastructure means that you do not have to provide your own infrastructure for peak load. Computers are more fully utilised and are therefore more energy efficient.
There are varying daily patterns, with some periods experiencing high loads and others low loads. Most data centres don’t switch off their machines; as a result, a 2012 study by the New York Times found that data centres can waste 90% of the electricity they take from the grid, as they use only a small percentage of the electricity powering their servers to perform computations and the rest to keep servers idling – and servers are idle for about 90% of the time. You could therefore potentially achieve energy savings in the region of an order of magnitude if the servers were utilised 80% of the time. One way of doing this would be to consolidate the load, moving all the computation onto a few machines and switching off the other machines. This presents significant challenges, as moving applications too often should be avoided.
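The consolidation idea can be illustrated with a rough sketch. It is not a method from the ParaDIME project: the linear power model, the wattage figures and the function names are all hypothetical, chosen only to show how packing load onto fewer machines reduces total power draw when the remaining machines are switched off.

```python
IDLE_WATTS = 100.0  # assumed draw of an idle server (hypothetical figure)
PEAK_WATTS = 200.0  # assumed draw at 100% utilisation (hypothetical figure)


def server_power(utilisation):
    """Assumed linear power model: idle draw plus a load-proportional share."""
    return IDLE_WATTS + (PEAK_WATTS - IDLE_WATTS) * utilisation


def consolidate(loads, target=0.8):
    """First-fit decreasing: pack loads onto as few servers as possible
    without pushing any server above the target utilisation."""
    servers = []
    for load in sorted(loads, reverse=True):
        for i, used in enumerate(servers):
            if used + load <= target + 1e-9:  # small tolerance for float sums
                servers[i] = used + load
                break
        else:
            servers.append(load)  # no existing server has room: keep one more on
    return servers


loads = [0.1] * 16  # 16 servers, each only 10% utilised
before = sum(server_power(u) for u in loads)
packed = consolidate(loads)  # servers not in this list are switched off
after = sum(server_power(u) for u in packed)

print(f"servers on: {len(loads)} -> {len(packed)}")
print(f"power draw: {before:.0f} W -> {after:.0f} W")
```

How much is saved depends heavily on the assumed idle power, and the sketch ignores the cost of migrating applications, which is the challenge mentioned above.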
What does Dresden bring in particular to the ParaDIME project?
TUD has two large projects which are complementary to ParaDIME. The first is a cluster which forms part of the German Excellence Initiative, the Center for Advancing Electronics Dresden (cfaed). In this cluster we look at resilient computing, and one of the topics we consider is how far we can lower the energy consumption of computers, to the point where errors may appear in the computation, so that we can try to detect them, correct them and from there aim not to introduce them in the first place. We are also investigating new technologies, materials and devices, such as carbon nanotubes for integrated circuits, which might provide better energy efficiency.
The other major project is 5G Lab Germany, which researches the next generation of wireless networks that will replace Long-Term Evolution (LTE) networks. In this project, we are researching edge clouds to see how we can distribute computing so as to reduce latency and communicate and compute within less than a millisecond. Applied to the tactile internet, for example, such low response times would give users immediate feedback with no noticeable latency, which would allow the creation of a range of new applications in domains such as health or music.
What, for you, are the most compelling reasons why we should create more energy-efficient computing systems?
One reason is the cost of computing, as energy consumption contributes to the total cost of ownership. Another reason is that if computers are more energy efficient you could pack them closer together, thereby achieving a higher compute density in the data centre and reducing the space required for the data centre.
Ecological reasons are also important: the total electricity consumption by the ICT infrastructure is greater than that of India. It therefore makes a lot of sense from an ecological perspective to increase energy efficiency.
What, for you, are the key technical challenges which need to be tackled in order to achieve more energy-efficient computing systems?
One of the ways to save energy in data centres is to switch off some of the machines when they are not needed. However, this poses technical problems as the machines might not come back up when they are switched on again, so technicians would theoretically need to be available in case of any issues.
Storage is usually attached directly to the computer, as there is higher throughput when you attach solid-state drives directly to the computing nodes. If you switch off the machine, the data on the storage attached to it is no longer accessible. We need to find a solution which would allow us to support directly attached storage and still be able to switch off machines.
Is it possible to deliver genuine energy savings while achieving optimum performance?
It depends what you mean by optimum performance. As computers running at maximum capacity are more energy efficient, we need to keep them at a high level of utilisation. The most efficient algorithms should also be the most energy efficient, but in parallel computing you often want to reduce the runtime of an application, which you do by parallelisation. However, you almost never get linear increases in speed. This means you pay a price in terms of energy in order to achieve shorter runtimes, so the throughput per server is lower than in the case of sequential programming.
So what we would have to do is to maximise the throughput per server of an application and not minimise the runtime of the application. In so doing I think we can achieve genuine energy savings for batch jobs, but this might not be the case for interactive jobs, where you need to get a response quickly.
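The trade-off between runtime and throughput per server can be made concrete with a small sketch. It uses Amdahl's law as a stand-in model for sub-linear speed-up, which is an assumption and not something stated in the interview; the serial fraction and runtime are hypothetical, and energy is assumed to be roughly proportional to server-seconds.

```python
def speedup(n, serial_fraction):
    """Amdahl's law: speed-up on n servers when a fixed fraction of the
    work cannot be parallelised (an assumed model, not from the interview)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)


def server_seconds(n, serial_fraction, sequential_runtime):
    """Total server time consumed by one job: n servers busy for the
    parallel runtime. Used here as a rough proxy for energy."""
    runtime = sequential_runtime / speedup(n, serial_fraction)
    return n * runtime


SERIAL_FRACTION = 0.1  # hypothetical: 10% of the job is inherently sequential
SEQ_RUNTIME = 100.0    # hypothetical: seconds on a single server

for n in (1, 2, 4, 8, 16):
    s = speedup(n, SERIAL_FRACTION)
    cost = server_seconds(n, SERIAL_FRACTION, SEQ_RUNTIME)
    print(f"{n:2d} servers: runtime {SEQ_RUNTIME / s:6.1f} s, "
          f"server-seconds {cost:6.1f}, throughput per server {s / n:.2f}")
```

Under these assumptions the runtime keeps falling as servers are added, but the server-seconds spent per job rise, which is why maximising throughput per server favours batch jobs that can tolerate longer runtimes.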
What are the main improvements which you would like to see in the computing systems of the future?
What I really want to see is decentralised computing infrastructure that will compute at the edges of the internet. That would have a positive impact on energy efficiency as less data would have to be transferred and therefore the energy consumption of the network would be reduced. If you keep computation local you can achieve much higher energy efficiency in comparison to a centralised system where you have to transfer data across Europe, for example, and back again.
What will the lasting impact of the ParaDIME project be?
Edge computing will allow devices to become more ‘intelligent’: with traffic systems, for example, you could offload some of the intelligence of controlling the cars and routing the traffic. To optimise the energy consumption of cars and schedule routes you would need intelligent computational infrastructure which we don’t have today but could have in the future. Cars could therefore be more connected and autonomous and could interact with the system around them, which could lead to more energy-efficient transportation.