An Empirical Study on Line-Level Software Defect Prediction (MSR 2026 - Technical Papers)

Who

Enci Zhang, Yutong Jiang, Tianmeng Zhang, Haonan Tong

Track

MSR 2026 Technical Papers

This program is tentative and subject to change.

Time Zone

The program is currently displayed in (GMT-03:00) Brasilia, Distrito Federal, Brazil.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Mon 13 Apr 2026 11:30 - 11:40 at Oceania IV - Session 1-B: Quality & Security I

Abstract

Line-level software defect prediction (LLDP) is crucial for locating and modifying defective code and has drawn increasing attention from both academia and industry. A series of deep learning-based LLDP models have recently been proposed, but no systematic comparison of them exists. In this paper, we compare the performance of these models, evaluate their consistency across scenarios, and investigate whether fusing handcrafted features improves prediction performance. To this end, we conducted a comprehensive empirical study of eight recent state-of-the-art models on 32 benchmark datasets drawn from releases of nine software projects. We evaluated the models in both cross-version and cross-project scenarios using four widely used performance metrics: AUC, Recall@Top20%LOC, Effort@Top20%Recall, and IFA. We also investigated the impact of handcrafted features on LLDP models under two different fusion strategies. The results show that 1) the differences among these models are statistically significant; 2) no model is superior on every performance metric, but Bugsplorer generally performs best on all metrics except IFA; 3) model performance rankings are inconsistent between the cross-version and cross-project scenarios, indicating that a model’s effectiveness is highly scenario-dependent; and 4) fusing handcrafted features significantly improves prediction performance, and the choice of fusion strategy also matters. In conclusion, the selection of an LLDP model should be guided by the target scenario and performance metrics, and a hybrid model that combines deep learning representations with handcrafted features is a promising alternative for LLDP.
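
The abstract names four metrics without defining them. The sketch below shows one way they might be computed for a single file, assuming the definitions commonly used in line-level defect prediction work: lines are ranked by predicted defect score, Recall@Top20%LOC is the share of defective lines found in the top 20% of ranked lines, Effort@Top20%Recall is the share of lines inspected to find 20% of the defective lines, and IFA is the number of clean lines inspected before the first defective one. The paper's exact definitions may differ, and all function names and the toy data here are hypothetical.

```python
def rank_lines(scores, labels):
    """Return the labels ordered by descending predicted score (ties keep input order)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return [labels[i] for i in order]


def recall_at_top20_loc(scores, labels):
    """Share of defective lines found when inspecting the top 20% of ranked lines."""
    ranked = rank_lines(scores, labels)
    total_defective = sum(ranked)
    if total_defective == 0:
        return 0.0
    budget = len(ranked) // 5  # 20% of lines, rounded down (an assumption)
    return sum(ranked[:budget]) / total_defective


def effort_at_top20_recall(scores, labels):
    """Share of lines that must be inspected to find 20% of the defective lines."""
    ranked = rank_lines(scores, labels)
    total_defective = sum(ranked)
    if total_defective == 0:
        return 0.0
    target = -(-total_defective // 5)  # ceil(20% of defective lines)
    found = 0
    for inspected, label in enumerate(ranked, start=1):
        found += label
        if found >= target:
            return inspected / len(ranked)


def ifa(scores, labels):
    """Initial False Alarms: clean lines inspected before the first defective line."""
    ranked = rank_lines(scores, labels)
    for position, label in enumerate(ranked):
        if label == 1:
            return position
    return len(ranked)


def auc(scores, labels):
    """AUC as the probability that a defective line outscores a clean one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: ten lines, two of them defective (label 1).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [0, 1, 0, 0, 1, 0, 0, 0, 0, 0]

print(recall_at_top20_loc(scores, labels))     # 0.5
print(effort_at_top20_recall(scores, labels))  # 0.2
print(ifa(scores, labels))                     # 1
print(auc(scores, labels))                     # 0.75
```

Higher is better for AUC and Recall@Top20%LOC, while lower is better for Effort@Top20%Recall and IFA. This difference in direction is one reason a model can rank first on some metrics and not on others.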

Enci Zhang

Beijing Jiaotong University

Yutong Jiang

Beijing Jiaotong University

Tianmeng Zhang

Beijing Jiaotong University

Haonan Tong

Beijing Jiaotong University

Session Program

Mon 13 Apr
Displayed time zone: Brasilia, Distrito Federal, Brazil

11:00 - 12:30	Session 1-B: Quality & Security I	Technical Papers / Industry Track / MSR Program	at Oceania IV

11:00 10m Research paper	Where Do Smart Contract Security Analyzers Fall Short?	Technical Papers	Tamer Abdelaziz NYU Abu Dhabi, Salma Alsaghir NYU Abu Dhabi, Karim Ali NYU Abu Dhabi
11:10 10m Talk	An Empirical Study of Vulnerabilities in Python Packages and Their Detection	Technical Papers	Haowei Quan Monash University, Junjie Wang Tianjin University, Xinzhe Li College of Intelligence and Computing, Tianjin University, Terry Yue Zhuo Monash University and CSIRO's Data61, Xiao Chen University of Newcastle, Xiaoning Du Monash University
11:20 10m Talk	Does Programming Language Matter? An Empirical Study of Fuzzing Bug Detection	Technical Papers	Tatsuya Shirai Nara Institute of Science and Technology, Olivier Nourry The University of Osaka, Yutaro Kashiwa Nara Institute of Science and Technology, Kenji Fujiwara Nara Women’s University, Hajimu Iida Nara Institute of Science and Technology
11:30 10m Talk	An Empirical Study on Line-Level Software Defect Prediction	Technical Papers	Enci Zhang Beijing Jiaotong University, Yutong Jiang Beijing Jiaotong University, Tianmeng Zhang Beijing Jiaotong University, Haonan Tong Beijing Jiaotong University
11:40 10m Talk	Characterizing and Modeling the GitHub Security Advisories Review Pipeline	Technical Papers	Claudio Segal UFF, Paulo Segal UFF, Carlos Eduardo de Schuller Banjar UFRJ, Felipe Paixão Federal University of Bahia (UFBA), Hudson Silva Borges UFMS, Paulo Silveira Neto Federal University Rural of Pernambuco, Eduardo Santana de Almeida Federal University of Bahia, Joanna C. S. Santos University of Notre Dame, Anton Kocheturov Siemens Technology, Gaurav Kumar Srivastava Siemens, Daniel Sadoc Menasche UFRJ, Brazil
11:50 10m Talk	Linux Kernel Recency Matters, CVE Severity Doesn’t, and History Fades	Technical Papers	Piotr Przymus Nicolaus Copernicus University in Toruń, Poland, Witold Weiner Nicolaus Copernicus University in Toruń and Adtran Networks Sp. z o.o, Krzysztof Rykaczewski Nicolaus Copernicus University in Toruń, Poland, Gunnar Kudrjavets Amazon Web Services, USA
12:00 10m Talk	Beyond Single Code Changes: An Empirical Study of Topic-Based Code Review Practices in Gerrit for OpenStack	Technical Papers	Moataz Chouchen Concordia University, Mahi Begoug ETS Montreal, Ali Ouni Ecole de Technologie Superieure (ETS)
12:10 10m Talk	LogSieve: Task-Aware CI Log Reduction for Sustainable LLM-Based Analysis	Technical Papers	Marcus Barnes University of Toronto, Taher A. Ghaleb Trent University, Safwat Hassan University of Toronto
12:20 5m Talk	Finding Important Stack Frames in Large Systems	Industry Track	Aleksandr Khvorov JetBrains; Constructor University Bremen, Yaroslav Golubev JetBrains Research, Denis Sushentsev JetBrains
12:25 5m Talk	Stop Comparing Apples and Oranges: Matching for Better Results in Mining Software Repositories Studies	Technical Papers	Sabato Nocera University of Salerno, Nyyti Saarimäki University of Luxembourg, Valentina Lenarduzzi University of Southern Denmark, Davide Taibi University of Southern Denmark and University of Oulu, Sira Vegas Universidad Politecnica de Madrid