The State of Data Mining, Benchmarks, Double Blind Trials, and Software Engineering Systems in Industry
This program is tentative and subject to change.
This Vision and Reflection talk presents lessons from building and evaluating software engineering systems in industry, spanning both traditional developer infrastructure and AI-assisted tools. The focus is on the interaction between data mining from software repositories, benchmark-based evaluation, and controlled experimental trials in production environments.
The talk first examines how benchmarks are designed and used for training and system comparison. Mining results and benchmark improvements provide evidence for selecting which model or system should be rolled into a production experiment, but they are not sufficient indicators of real-world impact. The talk then describes experimental methodology for defining goal metrics and safety or guardrail metrics, and for running double blind randomized trials or A/B experiments in production settings. This includes examples where AI benchmarks showed promising gains, but deployment led to productivity regressions that were identified through safety trials and addressed by updating the developer workflow.
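As a rough illustration of the goal-metric and guardrail-metric idea described above, the following minimal Python sketch compares a control arm and a treatment arm of a randomized experiment. It is not the speaker's method: the function names, the example data, and the cutoff of 2.0 on the Welch t statistic are assumptions made for illustration only. The sketch does not model blinding or randomization, and it assumes each sample has nonzero variance.

```python
import math
from statistics import mean, variance


def welch_t(control, treatment):
    """Return (difference in means, Welch t statistic) for two samples.

    Assumes both samples have at least two values and nonzero variance.
    """
    diff = mean(treatment) - mean(control)
    se = math.sqrt(
        variance(control) / len(control) + variance(treatment) / len(treatment)
    )
    return diff, diff / se


def evaluate_trial(goal, guardrail, t_crit=2.0):
    """Decide a trial outcome from (control, treatment) sample pairs.

    Hypothetical convention: higher is better for the goal metric,
    lower is better for the guardrail metric. t_crit is an assumed cutoff.
    """
    _, goal_t = welch_t(*goal)
    _, guard_t = welch_t(*guardrail)
    # The guardrail is checked first: a regression blocks the rollout
    # even when the goal metric improved.
    if guard_t > t_crit:
        return "block: guardrail regression"
    if goal_t > t_crit:
        return "ship: goal metric improved"
    return "inconclusive"


# Made-up data: a goal metric per developer, and a guardrail metric
# such as review latency in hours.
goal = ([10, 11, 9, 10, 12, 10], [13, 14, 12, 13, 15, 13])
guardrail_bad = ([5, 6, 5, 4, 6, 5], [8, 9, 8, 7, 9, 8])
guardrail_ok = ([5, 6, 5, 4, 6, 5], [5, 6, 5, 4, 6, 5])

print(evaluate_trial(goal, guardrail_bad))
print(evaluate_trial(goal, guardrail_ok))
```

Checking the guardrail before the goal metric mirrors the situation described in the abstract, where a promising gain can still be held back by a regression that only the trial exposes.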
Examples are drawn from recent studies and deployed systems, including work on repository mining, code reviewer recommendation, workload balancing, code change risk prediction, developer tooling, and program repair systems. Discrepancies between historical back-testing and randomized trials are highlighted, along with cases where controlled experiments revealed effects not visible from benchmark results alone. The talk concludes by arguing for evaluation practices that tightly integrate data mining, benchmarks, and double blind trials to ensure that software engineering systems deliver measurable value in practice.
Tue 14 Apr
Displayed time zone: Brasilia, Distrito Federal, Brazil
09:00 - 10:30 | Plenary: Awards & Keynote II / Industry Track / Technical Papers / Junior PC / Tutorials / MSR Awards / FOSS Award / Social Events / Vision and Reflection / Data and Tool Showcase Track / Mining Challenge / Registered Reports / Keynotes / MSR Program at Oceania IV
Chair(s): Gema Rodríguez-Pérez (Department of Computer Science, Mathematics, Physics and Statistics, University of British Columbia, Okanagan Campus); Igor Steinmacher (RESHAPE LAB, Northern Arizona University, USA); Bianca Trinkenreich (Colorado State University)
09:00 | 10m | Talk | Day Opening | MSR Program
09:10 | 30m | Talk | The State of Data Mining, Benchmarks, Double Blind Trials, and Software Engineering Systems in Industry | Vision and Reflection | Peter Rigby (Concordia University; Meta)
09:40 | 50m | Keynote | From Hallucinations to Helpful Agents: Advancing Trustworthy Automation in Code Review | Keynotes | Patanamon Thongtanunam (The University of Melbourne)
