How do Agents Refactor: An Empirical Study (MSR 2026 - Mining Challenge)

Mon 13 - Tue 14 April 2026, Rio de Janeiro, Brazil

co-located with ICSE 2026

Who

Lukas Ottenhof, Daniel Penner, Abram Hindle, Thibaud Lutellier

Track

MSR 2026 Mining Challenge

Abstract

Software development agents such as Claude Code, GitHub Copilot, Cursor Agent, Devin, and OpenAI Codex are increasingly integrated into developer workflows. While prior work has evaluated agent capabilities for code completion and task automation, little work has investigated how these agents perform Java refactoring in practice, the types of changes they make, and their impact on code quality. In this study, we present the first analysis of agentic refactoring pull requests in Java, comparing them to developer refactorings across 86 projects per group. Using RefactoringMiner and DesigniteJava 3.0, we identify refactoring types and detect code smells before and after refactoring commits. Our results show that agent refactorings are dominated by annotation changes (the five most common refactoring types performed by agents are annotation-related), in contrast to the diverse structural improvements typical of developers. Despite these differences in refactoring types, we find Cursor to be the only agent to show a statistically significant increase in refactoring smells.

Link to Preprint

https://arxiv.org/abs/2601.20160

Lukas Ottenhof

University of Alberta

Canada

Daniel Penner

University of Alberta

Canada

Abram Hindle

University of Alberta

Canada

Thibaud Lutellier

University of Alberta

Canada

Tracks