Consistent or Sensitive? Automated Code Revision Tools Against Semantics-Preserving Perturbations
Automated Code Revision (ACR) tools aim to reduce human effort in code revision by automatically generating revised code from reviewer feedback, and they have shown promising performance. Yet, to be reliable in real-world software development practice, they must generate consistent revisions for input code variants that express the same issue. This challenge is even more pronounced for AI-assisted ACR tools, whose probabilistic generation process can lead to nondeterministic and inconsistent revisions across semantically equivalent code variants. In this paper, we evaluate the consistency of five state-of-the-art transformer-based ACR tools against semantics-preserving perturbations: controlled code modifications that alter a program's structure or syntax without changing its behavior. We designed ten types of such perturbations and applied them to 2,032 Java methods from real-world GitHub projects, generating over 18K perturbed variants for evaluation. Each perturbation was applied one at a time in order to isolate its effect and to systematically assess how consistently models such as T5, LLaMA variants, ChatGPT, and DeepSeek generate correct revisions for semantically identical but syntactically altered code. Our findings show that these models' ability to generate correct revisions can drop by up to 45.3% when they are presented with semantically equivalent yet structurally modified code. The most severe drops in consistency occur when perturbations land near the specific lines of code referenced in the reviewer's feedback: the closer the perturbation is to this target region, the more likely an ACR tool is to fail to generate the correct revision.
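The abstract does not enumerate the ten perturbation types here; as a hypothetical illustration (not taken from the paper's artifact), a semantics-preserving perturbation might rename a local variable and rewrite a `for` loop as a `while` loop, leaving the method's behavior unchanged:

```java
public class PerturbationExample {
    // Original method as it might appear in a GitHub project.
    static int sumPositive(int[] values) {
        int total = 0;
        for (int i = 0; i < values.length; i++) {
            if (values[i] > 0) {
                total += values[i];
            }
        }
        return total;
    }

    // Perturbed variant: identical behavior, altered structure and syntax.
    static int sumPositivePerturbed(int[] values) {
        int acc = 0;                   // local variable renamed
        int idx = 0;
        while (idx < values.length) {  // loop construct swapped
            if (values[idx] > 0) {
                acc += values[idx];
            }
            idx++;
        }
        return acc;
    }
}
```

A consistent ACR tool should produce the same (correct) revision for both variants, since they express the same issue.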
We further explored potential mitigation strategies, evaluating prompting heuristics intended to direct the model's attention to the relevant code, such as Chain-of-Thought prompting, repeating the referenced code region within the review comment, and embedding the feedback directly as inline code comments near that region. None of these strategies improved consistency.
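As a minimal sketch of the last heuristic, the reviewer's feedback could be spliced into the source as a line comment immediately above the referenced line before the code is passed to the ACR model. The method name, signature, and comment marker below are illustrative assumptions, not the paper's actual implementation:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class InlineFeedbackPrompt {
    // Inserts the review comment as a line comment just above the
    // 1-indexed line the reviewer referenced (hypothetical helper).
    static String embedFeedback(String code, int targetLine, String feedback) {
        List<String> lines = new ArrayList<>(Arrays.asList(code.split("\n", -1)));
        // Clamp the insertion point so out-of-range references stay safe.
        int idx = Math.max(0, Math.min(targetLine - 1, lines.size()));
        lines.add(idx, "// REVIEW: " + feedback);
        return String.join("\n", lines);
    }
}
```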