I’m an assistant professor at Stanford CS, where I work on computer systems and machine learning as part of Stanford DAWN. I’m also co-founder and Chief Technologist of Databricks, a data and AI platform startup. Before joining Stanford, I was an assistant professor at MIT.
    I’m interested in computer systems for emerging large-scale workloads such as machine learning, big data analytics and cloud computing. In DAWN, we’re working on infrastructure for usable machine learning to make it dramatically easier to bring ML applications to production; in practice, these deployment issues are often much larger obstacles than the ML algorithms themselves. My work includes software runtimes, quality assurance tools and systems optimizations for ML. Beyond usability, I am interested in data privacy as the flip side of big data, and have worked on systems that can provide scalable privacy for communication, Internet queries and SaaS applications.
    Our group works closely with the open source community to test and publish our ideas. During my PhD, I started the Apache Spark project, which is now one of the most widely used frameworks for distributed data processing, and co-started other widely used datacenter software such as Apache Mesos, Alluxio, and Spark Streaming. At Stanford, we developed DAWNBench, a machine learning performance competition that drew submissions from the top industry groups and influenced the industry-standard MLPerf, and we are continuing to develop open source software such as Weld, Sparser, NoScope, and MacroBase DIFF.