Abstract: Large Cluster (Data Center) Management
Cluster management is the term that Google uses to describe how we control the computing infrastructure in our data centers that supports almost all of our external services. It includes allocating resources to different applications on our fleet of computers, looking after software installations, and monitoring.
I will present an overview of these systems, introduce the new cluster management tool that Google team is building, and present some of the challenges that are exciting research opportunities. Then, I will concentrate on our job scheduling research project, which resulted in an actual implementation of our algorithm in all the existing Google’s data centers.
Joint work with Monika Henzinger, Clifford Stein and many of Google engineers.
Ana Radovanovic graduated from University of Belgrade, Faculty of Electrical Engineering in 1999. Then, she got accepted to the graduate program in Electrical Engineering, Columbia University, New York. Upon receiving her PhD in 2004, she joined IBM T.J. Watson Research Labs as a Research Staff Member. In 2008, she joined Google Research, where she has been involved in various research projects in the areas of scheduling, online advertising, traffic modeling.