Lin Tan (Purdue)- LLMs for Code: More Data or More Domain Knowledge? Can They Replace Programmers?

Date & Time:

November 19, 2024 12:30 pm – 1:30 pm

Location:

Crerar 298, 5730 S. Ellis Ave., Chicago, IL

Abstract: Recent work applies deep learning, including large language models (LLMs), to coding tasks such as code generation, automated program repair, security vulnerability fixing, and binary analysis. An important question is whether adding more data or adding more domain knowledge to deep-learning models is the more effective direction for improving LLMs for code. I will discuss existing studies and techniques that point in either direction. I will also introduce our code-generation benchmark RepoCod, which answers the question “Can Language Models Replace Programmers?” to some extent. RepoCod tasks are real-world, whole-function code generation tasks with repository-level context, and they include test cases for validation. Our results show that GPT-4o and other LLMs achieve less than 30% pass@1 on RepoCod’s code generation tasks.

https://lt-asset.github.io/REPOCOD/
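For readers unfamiliar with the pass@1 metric mentioned in the abstract, here is a minimal sketch of how pass@k is commonly computed. The abstract does not say how the RepoCod numbers were calculated; this sketch assumes the widely used unbiased estimator introduced alongside the HumanEval benchmark, where n samples are generated per task and c of them pass the task's tests. The function names and the sample data are hypothetical and are not RepoCod results.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k samples passes.

    n: samples generated for the task, c: samples that passed its tests.
    Uses the estimator 1 - C(n - c, k) / C(n, k); for k = 1 this is c / n.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # math.comb returns 0 when k > n - c, so the estimate is then 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks; results holds one (n, c) pair per task."""
    if not results:
        raise ValueError("results must contain at least one task")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Made-up data: four tasks, one sample each, only the first one passes.
    made_up_results = [(1, 1), (1, 0), (1, 0), (1, 0)]
    print(f"pass@1 = {benchmark_pass_at_k(made_up_results, k=1):.2f}")
```

With one sample per task, pass@1 reduces to the fraction of tasks whose single generated function passes all of its tests, so a score below 30% means fewer than three in ten tasks were solved on the first attempt.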

Speakers

Lin Tan

Mary J. Elmore New Frontiers Professor, Purdue University

Lin Tan is a Mary J. Elmore New Frontiers Professor in the Department of Computer Science at Purdue University. She received her PhD from the University of Illinois, Urbana-Champaign. Prior to joining Purdue, she was a Canada Research Chair and an associate professor at the University of Waterloo. Her research interests include software dependability, software-AI synergy, and software text analytics. Her research focuses include leveraging machine learning and natural language processing techniques to improve software dependability, and using software approaches to improve the dependability of machine learning systems. Dr. Tan’s co-authored papers have received ACM Distinguished Paper Awards at CCS 2024, ASE 2020, MSR 2018, and FSE 2016, as well as IEEE Micro’s Top Picks in 2006. Dr. Tan has received an Early Career Academic Achievement Alumni Award from the University of Illinois, Urbana-Champaign, a Canada Research Chair, an NSERC Discovery Accelerator Supplements Award, an Ontario Early Researcher Award, an Ontario Professional Engineers Award–Engineering Medal for Young Engineer, and multiple industry awards, including J.P. Morgan AI Faculty Research Awards, Meta/Facebook Research Awards, Google Faculty Research Awards, and an IBM CAS Research Project of the Year Award. She served as program co-chair of FSE 2024 (one of the top two conferences in software engineering). She was an associate editor of IEEE Transactions on Software Engineering (2017-2022) and of the Springer Empirical Software Engineering Journal (2015-2021). She was the ACM SIGSOFT Treasurer and an elected Member-at-Large (2021-2024).
