Asst. Prof. Aaron Elmore Receives CAREER Award for Resource-Efficient Databases

For decades, the number one mission of databases was speed. Researchers and software developers raced to design systems that sifted through ever larger and more complex datasets under the hood, returning answers to their users as quickly as possible. With Moore’s Law scaling up the available computation, resource use wasn’t usually a concern. But that luxury is coming to an end, and resource efficiency is a new priority as more computation shifts to the pay-for-compute cloud and remote devices.

Aaron Elmore, assistant professor at UChicago CS, develops database models that address this need, giving users the power to sacrifice speed for reduced resource use and cost. His approach, intermittent query processing (IQP), grafts machine learning prediction onto database processing, making computation more efficient for systems that handle bursty data or intermittent monitoring. As a new recipient of the CAREER award, the National Science Foundation’s most prestigious award in support of early-career faculty, Elmore will continue designing these systems for data-driven applications.

[Read about the five UChicago CS faculty who received NSF CAREER awards in the 2021 cycle.]

Many modern technologies, such as Internet of Things devices, sensors, or e-commerce, produce “bursty data” — data that comes in spaced-out, unpredictable packets. But databases are typically designed for either relatively static datasets that are updated infrequently, or more recently, steady streams of data. For these database types, systems use batch processing or continuous query strategies, but bursty data is a poor fit for either framework.

“Database systems traditionally were designed such that when you ask them to do something, they do everything they can do to get you that answer right now, or if data is constantly coming in, as soon as they get data they update the answer,” Elmore said. “My vision was to think about what we can do when we know that either the data, or a user’s interest, is going to be bursty. In this case, how can we be smart about deferring work to save resources?”

An example might be a bank keeping track of customer accounts and wanting to monitor balances that are higher than the average. While the overall average can be easily updated as new records are added to the database, searching for outliers requires comparing every account to the average every time it changes, a more computationally intensive task. IQP analyzes and separates out these “easy” and “hard” tasks, and then schedules them at the frequency that meets the user’s expectations, whether that’s determined by time, cost, or available computation.
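The bank example can be sketched in a few lines of Python. This is a toy illustration only, not IQP or CrocodileDB code: the class `AccountMonitor`, the `defer_every` knob, and the rule of rescanning after a fixed number of inserts are all assumptions made for the sake of the example. The running average (the "easy" task) is updated on every insert, while the scan for above-average balances (the "hard" task) is deferred.

```python
class AccountMonitor:
    """Toy sketch: cheap work runs eagerly, expensive work is deferred.

    Hypothetical names throughout; this is not the IQP implementation.
    """

    def __init__(self, defer_every):
        self.balances = []
        self.total = 0.0
        self.defer_every = defer_every  # assumed knob: inserts between full scans
        self.pending = 0                # inserts seen since the last full scan
        self.above_average = []         # possibly stale result of the "hard" task
        self.scans = 0                  # number of full scans performed

    def average(self):
        # "Easy" task: answered in O(1) from the running total.
        return self.total / len(self.balances) if self.balances else 0.0

    def insert(self, balance):
        self.balances.append(balance)
        self.total += balance
        self.pending += 1
        # The full scan runs only once enough new records have accumulated.
        if self.pending >= self.defer_every:
            self.refresh()

    def refresh(self):
        # "Hard" task: compare every account to the current average, O(n).
        avg = self.average()
        self.above_average = [b for b in self.balances if b > avg]
        self.pending = 0
        self.scans += 1


monitor = AccountMonitor(defer_every=3)
for balance in (100.0, 200.0, 900.0, 50.0):
    monitor.insert(balance)

print(monitor.average())        # always up to date
print(monitor.above_average)    # as of the last full scan
print(monitor.scans)
```

A larger `defer_every` means fewer full scans and less computation, at the price of an outlier list that can lag behind the data. That is the kind of trade-off the article describes, reduced here to a single counter.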

“It might be a case where you’re on an edge or a battery-powered device, and you have limited resources. Or you could be on a cloud, where you’re paying for everything that you do,” Elmore said. “We wanted a knob that somebody could turn and say, ‘if I slow things down, or if I’m willing to trade this off, can I save some amount of money?’”

Elmore’s system folds in machine learning to help make these decisions by predicting when new data will arrive and how long different tasks will take. Because the database language SQL is declarative — users specify what they want, not how to do it — these estimates aren’t simple, Elmore said. So IQP builds a model on previous runs of the query, predicting future runtimes and resource usage. Users can then make their choices about how much they prioritize cost versus performance, and the database will automatically find the level of operation that satisfies those high-level goals.
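As a rough illustration of predicting runtime from earlier runs and then picking an operating point, the Python sketch below fits a straight line to past (rows processed, seconds) measurements and chooses the largest batch size whose predicted runtime stays within a user-supplied budget. The linear model, the function names `fit_line` and `choose_batch_size`, and the sample numbers are assumptions for illustration; the article does not say which model IQP uses.

```python
def fit_line(history):
    """Least-squares fit of seconds = intercept + slope * rows.

    `history` is a list of (rows, seconds) pairs from earlier runs and
    must contain at least two distinct row counts.
    """
    n = len(history)
    mean_x = sum(x for x, _ in history) / n
    mean_y = sum(y for _, y in history) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in history)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in history)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def choose_batch_size(history, candidates, max_seconds):
    """Pick the largest candidate batch whose predicted runtime fits the budget.

    Larger batches mean fewer runs, so the fixed per-run overhead is paid
    less often. Falls back to the smallest candidate if none fits.
    """
    intercept, slope = fit_line(history)
    feasible = [c for c in candidates if intercept + slope * c <= max_seconds]
    return max(feasible) if feasible else min(candidates)


# Made-up measurements that lie on seconds = 0.5 + 0.01 * rows.
past_runs = [(100, 1.5), (200, 2.5), (400, 4.5)]

print(choose_batch_size(past_runs, [100, 500, 1000], max_seconds=6.0))
print(choose_batch_size(past_runs, [100, 500, 1000], max_seconds=1.0))
```

The user states only the high-level goal (`max_seconds` here), and the choice of batch size follows from the fitted model. A real system would need a richer model, since query runtime rarely depends on row count alone.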

Thus far, Elmore has created early versions of IQP in Spark and with the programming language Rust. The latter project, nicknamed CrustyDB, is used in his Introduction to Database Systems course, co-taught with Assistant Professor Raul Castro Fernandez. The IQP system is also part of a project with Sanjay Krishnan and Michael Franklin called CrocodileDB — so named because crocodiles remain motionless for long periods of time before they quickly strike, just like a database running intermittent queries.

The research involved in developing these systems goes hand in hand with Elmore’s teaching and mentoring work. The pedagogical databases he uses to teach database principles to computer science and Master’s in Computational Analysis and Public Policy (MS-CAPP) students include many IQP features, allowing students with an interest in database research to easily contribute to the project. Elmore has also taught younger students database basics as a mentor in the “BigDataX: From Theory to Practice in Big Data computing at eXtreme Scales” REU program and through the compileHer student group and their annual tech capstone event for middle school girls.

“As the value of data continues to grow, database systems are increasingly a critical part for teams working on data science, AI, big data, or IoT platforms,” Elmore said. “As faculty, it is essential to train the generation working with data to understand the solutions, principles, and opportunities of database systems.”
