Beyond Bug Fixes: An Empirical Investigation of Post-Merge Code Quality Issues in Agent-Generated Pull Requests
The increasing adoption of AI coding agents has led to a growing number of agent-generated pull requests (PRs) being merged with little or no human intervention. While such agentic PRs promise productivity gains, their impact on code quality remains underexplored. Prior work has evaluated coding agents using benchmarks and controlled tasks, but large-scale evidence on post-merge quality issues in agentic PRs remains limited. In this study, we analyze 1,210 merged agent-generated bug-fix PRs from Python repositories in the AIDev dataset. Using SonarQube, we perform a differential analysis between base and merged commits to identify code quality issues newly introduced by PR changes. We examine issue frequency, density, severity, and rule-level prevalence across five agents. Our results show that apparent differences in raw issue counts across agents largely disappear after normalizing by code churn, indicating that higher issue counts are primarily driven by larger PRs. Across all agents, code smells dominate, particularly at critical and major severities, while bugs are less frequent but often severe. Security hotspots occur unevenly across agents, most notably for OpenAI Codex. Overall, our findings show that merge success does not reliably reflect post-merge code quality, highlighting the need for systematic, size-aware quality checks for agent-generated bug-fix PRs.
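
The sketch below illustrates the kind of differential, churn-normalized analysis the abstract describes: diffing the SonarQube issue sets of the base and merged commits and normalizing the newly introduced issues by lines changed. The issue key, the per-1,000-changed-lines normalization, and the input format are illustrative assumptions, not the study's exact procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    rule: str       # SonarQube rule id, e.g. "python:S1481"
    file: str       # path of the affected file
    severity: str   # e.g. "CRITICAL", "MAJOR"


def new_issues(base: set[Issue], merged: set[Issue]) -> set[Issue]:
    """Issues present after the merge but absent in the base commit."""
    return merged - base


def issue_density(introduced: set[Issue], churn: int) -> float:
    """Introduced issues per 1,000 changed lines (added + deleted)."""
    return 1000 * len(introduced) / churn if churn else 0.0


# Hypothetical example: a PR that changed 400 lines and introduced two issues.
base = {Issue("python:S1481", "app/util.py", "MINOR")}
merged = base | {
    Issue("python:S3776", "app/util.py", "CRITICAL"),
    Issue("python:S1135", "app/cli.py", "INFO"),
}
print(issue_density(new_issues(base, merged), churn=400))  # -> 5.0
```

Normalizing by churn in this way is what lets per-agent comparisons control for PR size, which is why raw issue-count differences across agents shrink once density is used.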