MSR 2026 - Mining Challenge

This program is tentative and subject to change.

Mon 13 Apr
Displayed time zone: (GMT-03:00) Brasilia, Distrito Federal, Brazil

09:00 - 10:30	Plenary: Opening + Keynote I	MSR Program at Oceania V

11:00 - 12:30	Session 1-B: Quality & Security I	Technical Papers / Industry Track / MSR Program at Oceania IV

11:00 10m Research paper		Where Do Smart Contract Security Analyzers Fall Short? Technical Papers Tamer Abdelaziz NYU Abu Dhabi, Salma Alsaghir NYU Abu Dhabi, Karim Ali NYU Abu Dhabi DOI Pre-print
11:10 10m Talk		An Empirical Study of Vulnerabilities in Python Packages and Their Detection Technical Papers Haowei Quan Monash University, Junjie Wang Tianjin University, Xinzhe Li College of Intelligence and Computing, Tianjin University, Terry Yue Zhuo Monash University and CSIRO's Data61, Xiao Chen University of Newcastle, Xiaoning Du Monash University
11:20 10m Talk		Does Programming Language Matter? An Empirical Study of Fuzzing Bug Detection Technical Papers Tatsuya Shirai Nara Institute of Science and Technology, Olivier Nourry The University of Osaka, Yutaro Kashiwa Nara Institute of Science and Technology, Kenji Fujiwara Nara Women’s University, Hajimu Iida Nara Institute of Science and Technology
11:30 10m Talk		An Empirical Study on Line-Level Software Defect Prediction Technical Papers Enci Zhang Beijing Jiaotong University, Yutong Jiang Beijing Jiaotong University, Tianmeng Zhang Beijing Jiaotong University, Haonan Tong Beijing Jiaotong University
11:40 10m Talk		Characterizing and Modeling the GitHub Security Advisories Review Pipeline Technical Papers Claudio Segal UFF, Paulo Segal UFF, Carlos Eduardo de Schuller Banjar UFRJ, Felipe Paixão Federal University of Bahia (UFBA), Hudson Silva Borges UFMS, Paulo Silveira Neto Federal University Rural of Pernambuco, Eduardo Santana de Almeida Federal University of Bahia, Joanna C. S. Santos University of Notre Dame, Anton Kocheturov Siemens Technology, Gaurav Kumar Srivastava Siemens, Daniel Sadoc Menasche UFRJ, Brazil Pre-print
11:50 10m Talk		Linux Kernel Recency Matters, CVE Severity Doesn’t, and History Fades Technical Papers Piotr Przymus Nicolaus Copernicus University in Toruń, Poland, Witold Weiner Nicolaus Copernicus University in Toruń and Adtran Networks Sp. z o.o, Krzysztof Rykaczewski Nicolaus Copernicus University in Toruń, Poland, Gunnar Kudrjavets Amazon Web Services, USA Pre-print
12:00 10m Talk		Beyond Single Code Changes: An Empirical Study of Topic-Based Code Review Practices in Gerrit for OpenStack Technical Papers Moataz Chouchen Concordia University, Mahi Begoug ETS Montreal, Ali Ouni Ecole de Technologie Superieure (ETS)
12:10 10m Talk		LogSieve: Task-Aware CI Log Reduction for Sustainable LLM-Based Analysis Technical Papers Marcus Barnes University of Toronto, Taher A. Ghaleb Trent University, Safwat Hassan University of Toronto Pre-print
12:20 5m Talk		Finding Important Stack Frames in Large Systems Industry Track Aleksandr Khvorov JetBrains; Constructor University Bremen, Yaroslav Golubev JetBrains Research, Denis Sushentsev JetBrains
12:25 5m Talk		Stop Comparing Apples and Oranges: Matching for Better Results in Mining Software Repositories Studies Technical Papers Sabato Nocera University of Salerno, Nyyti Saarimäki University of Luxembourg, Valentina Lenarduzzi University of Southern Denmark, Davide Taibi University of Southern Denmark and University of Oulu, Sira Vegas Universidad Politecnica de Madrid

11:00 - 12:30	Session 1-A: AI Agents & Automation	Technical Papers / Industry Track / MSR Program at Oceania V

11:00 10m Talk		Toward Linking Declined Proposals and Source Code: An Exploratory Study on the Go Repository Technical Papers Sota Nakashima Kyushu University, Masanari Kondo Kyushu University, Mahmoud Alfadel University of Calgary, Aly Ahmad University of Calgary, Toshihiro Nakae DENSO CORPORATION, Hidenori Matsuzaki DENSO CORPORATION, Yasutaka Kamei Kyushu University Pre-print
11:10 10m Talk		IntelliSA: An Intelligent Static Analyzer for IaC Security Smell Detection Using Symbolic Rules and Neural Inference Technical Papers Qiyue Mei The University of Melbourne, Michael Fu The University of Melbourne Pre-print File Attached
11:20 10m Talk		Model See, Model Do? Exposure-Aware Evaluation of Bug-vs-Fix Preference in Code LLMs Technical Papers Ali Al-Kaswan Delft University of Technology, Netherlands, Claudio Spiess University of California, Davis, Prem Devanbu University of California at Davis, Arie van Deursen TU Delft, Maliheh Izadi Delft University of Technology Pre-print
11:30 10m Talk		A Match Made in Heaven? AI-driven Matching of Vulnerabilities and Security Unit Tests Technical Papers Emanuele Iannone Hamburg University of Technology, Quang-Cuong Bui Hamburg University of Technology, Riccardo Scandariato Hamburg University of Technology Pre-print
11:40 10m Talk		PhantomRun: Auto Repair of Compilation Errors in Embedded Open Source Software Technical Papers Han Fu, Sigrid Eldh Ericsson AB, Mälardalen University, Carleton University, Kristian Wiklund Ericsson AB, Andreas Ermedahl Ericsson AB; KTH Royal Institute of Technology, Philipp Haller KTH Royal Institute of Technology, Cyrille Artho KTH Royal Institute of Technology, Sweden
11:50 10m Talk		Promises, Perils, and (Timely) Heuristics for Mining Coding Agent Activity Technical Papers Romain Robbes CNRS, LaBRI, University of Bordeaux, Théo Matricon CNRS, LaBRI, University of Bordeaux, Thomas Degueule CNRS, Andre Hora UFMG, Stefano Zacchiroli LTCI, Télécom Paris, Institut Polytechnique de Paris, Palaiseau, France Pre-print
12:00 10m Talk		From Logic to Toolchains: An Empirical Study of Bugs in the TypeScript Ecosystem Technical Papers TianYi Tang Simon Fraser University, Saba Alimadadi Simon Fraser University, Nick Sumner Simon Fraser University Pre-print
12:10 10m Talk		Are We All Using Agents Now? An Empirical Study of Core and Peripheral Developers’ Use of Coding Agents Technical Papers Shamse Tasnim Cynthia University of Saskatchewan, Joy Krishan Das University of Saskatchewan, Banani Roy University of Saskatchewan
12:20 5m Talk		Context Engineering for AI Agents in Open-Source Software Technical Papers Seyedmoein Mohsenimofidi Heidelberg University, Matthias Galster University of Canterbury, Christoph Treude Singapore Management University, Sebastian Baltes Heidelberg University Pre-print
12:25 5m Talk		A Blueprint for Trustworthy Code Annotation at Scale: An LLM-Powered Pipeline for Industrial Software Analytics Industry Track Ailon dos Santos Teixeira UFAM, Jaine Brito da Silva UFAM, Nikolas Rocha de Medeiros UFAM, Raimundo da Silva Barreto UFAM, José Reginaldo Hughes Carvalho UFAM, Alex Fernando Monteiro UFAM

14:00 - 15:30	Session 2 - Posters	MSR Program / Mining Challenge at Catering and Exhibition Hall (Europa I to IV)

14:00 90m Poster		Session 2 - Posters MSR Program
14:00 90m Talk		When AI Agents Touch CI/CD Configurations: Frequency and Success Mining Challenge Taher A. Ghaleb Trent University Pre-print
14:00 90m Talk		Fingerprinting AI Coding Agents on GitHub Mining Challenge Taher A. Ghaleb Trent University Pre-print
14:00 90m Talk		When AI Code Doesn’t Stick: An Empirical Study on Reverted Changes Introduced by AI Coding Agents Mining Challenge Issam Oukay Department of Software and IT Engineering, ETS Montreal, University of Quebec, Montreal, Canada, Mahi Begoug ETS Montreal, Moataz Chouchen Concordia University, Ali Ouni Ecole de Technologie Superieure (ETS)
14:00 90m Talk		Characterizing Self-Admitted Technical Debt Generated by AI Coding Agents Mining Challenge Zaki Brahmi ETS Montreal, University of Quebec, Ali Ouni Ecole de Technologie Superieure (ETS), Mohammed Sayagh ETS Montreal, University of Quebec, Mohamed Aymen Saied Laval University
14:00 90m Talk		How Do Agents Perform Code Optimization? An Empirical Study Mining Challenge Huiyun Peng Purdue University, Antonio Zhong Qiu Purdue University, Ricardo Andres Calvo Mendez Purdue University, Kelechi G. Kalu Purdue University, James C. Davis Purdue University
14:00 90m Talk		Comparing AI Coding Agents: A Task-Stratified Analysis of Pull Request Acceptance Mining Challenge Giovanni Pinna University of Trieste, Jingzhi Gong King's College London, David Williams University College London, Federica Sarro University College London
14:00 90m Talk		More Code, Less Reuse: Investigation on Code Quality and Reviewer Sentiment towards AI-generated Pull Requests Mining Challenge Haoming Huang Institute of Science Tokyo, Pongchai Jaisri Nara Institute of Science and Technology, Shota Shimizu Ritsumeikan University, Lingfeng Chen Kyushu University, Sota Nakashima Kyushu University, Gema Rodríguez-Pérez Department of Computer Science, Mathematics, Physics and Statistics, University of British Columbia, Okanagan Campus DOI Pre-print
14:00 90m Short-paper		On the Adoption of AI Coding Agents in Open-source Android and iOS Development Mining Challenge Muhammad Ahmad Khan Lahore University of Management Sciences, Hasnain Ali Lahore University of Management Sciences, Muneeb Rana Xtra App Studios, Muhammad Saqib Ilyas Lahore University of Management Sciences, Abdul Ali Bangash Lahore University of Management Sciences Pre-print
14:00 90m Talk		Who Writes the Docs in SE 3.0? Agent vs. Human Documentation Pull Requests Mining Challenge Kazuma Yamasaki Nara Institute of Science and Technology, Joseph Ayobami Joshua Nara Institute of Science and Technology, Tasha Settewong Nara Institute of Science and Technology, Mahmoud Alfadel University of Calgary, Kazumasa Shimari Nara Institute of Science and Technology, Kenichi Matsumoto Nara Institute of Science and Technology DOI Pre-print
14:00 90m Talk		Testing with AI Agents: An Empirical Study of Test Generation Frequency, Quality, and Coverage Mining Challenge Suzuka Yoshimoto Nara Institute of Science and Technology, Shun Fujita Nara Institute of Science and Technology, Kosei Horikawa, Daniel Feitosa University of Groningen, Yutaro Kashiwa Nara Institute of Science and Technology, Hajimu Iida Nara Institute of Science and Technology
14:00 90m Talk		Safer Builders, Risky Maintainers: A Comparative Study of Breaking Changes in Human vs Agentic PRs Mining Challenge K M Ferdous Kennesaw State University, Dipayan Banik Quanta Technology, Kowshik Chowdhury Kennesaw State University, Shazibul Islam Shamim Kennesaw State University
14:00 90m Talk		On the Reliability of Agentic AI in Continuous Integration Pipelines Mining Challenge Jasem Khelifi École de technologie supérieure, Mahi Begoug ETS Montreal, Ali Ouni Ecole de Technologie Superieure (ETS), Mohammed Sayagh ETS Montreal, University of Quebec, Mohamed Aymen Saied Laval University, Moataz Chouchen Concordia University
14:00 90m Talk		Early-Stage Prediction of Review Effort in AI-Generated Pull Requests Mining Challenge Dao Sy Duy Minh University of Science - VNUHCM, Huynh Trung Kiet University of Science - VNUHCM, Nguyen Lam Phu Quy University of Science - VNUHCM, Pham Phu Hoa University of Science - VNUHCM, Tran Chi Nguyen University of Science - VNUHCM, Nguyen Dinh Ha Duong University of Science - VNUHCM, Truong Bao Tran University of Economics and Law - VNUHCM
14:00 90m Talk		Test Coverage of Code Changes in AI-Generated Pull Requests Mining Challenge Tales Alves Informatics Center, Federal University of Pernambuco, Leopoldo Teixeira Federal University of Pernambuco
14:00 90m Talk		When AI Teammates Meet Code Review: Collaboration Signals Shaping the Integration of Agent-Authored Pull Requests Mining Challenge Costain Nachuma Idaho State University, Minhaz Zibran Idaho State University Pre-print
14:00 90m Talk		Toward Instructions-as-Code: Understanding the Impact of Instruction Files on Agentic Pull Requests Mining Challenge Ali Arabat École de Technologie Supérieure, Mohammed Sayagh ETS Montreal, University of Quebec
14:00 90m Talk		An Empirical Study of Code Clone Genealogies in Human–AI Collaborative Development Mining Challenge Denis Sousa State University of Ceara, Brazil, Italo Uchoa State University of Ceará, Matheus Paixao State University of Ceará, Chaiyong Ragkhitwetsagul Mahidol University, Thailand, Thiago Lima State University of Ceara, Brazil
14:00 90m Talk		On the Footprints of Reviewer Bots' Feedback on Agentic Pull Requests in OSS GitHub Repositories Mining Challenge Syeda Kaneez Fatima Lahore University of Management Sciences, Yousuf Abrar Lahore University of Management Sciences, Abdul Rehman Lahore University of Management Sciences, Amelia Nawaz Lahore University of Management Sciences, Shamsa Abid National University of Computer and Emerging Sciences, Abdul Ali Bangash Lahore University of Management Sciences
14:00 90m Talk		When Bots Get the Boot: Understanding Pull Request Rejections in the Era of AI Coders Mining Challenge Karla Gonzalez Royal Military College of Canada, Mariam El Mezouar Royal Military College
14:00 90m Talk		Understanding the Rejection of Fixes Generated by Agentic Pull Requests - Insights from the AIDev Dataset Mining Challenge Mahmoud Abujadallah ETS - Québec University, Ali Arabat ETS - Québec University, Mohammed Sayagh ETS Montreal, University of Quebec
14:00 90m Talk		What to Cut? Predicting Unnecessary Methods in Agentic Code Generation Mining Challenge Kan Watanabe Nara Institute of Science and Technology, Tatsuya Shirai Nara Institute of Science and Technology, Yutaro Kashiwa Nara Institute of Science and Technology, Hajimu Iida Nara Institute of Science and Technology
14:00 90m Talk		How AI Coding Agents Modify Code: A Large-Scale Study of GitHub Pull Requests Mining Challenge Daniel Ogenrwot University of Nevada Las Vegas, John Businge University of Antwerp; Flanders Make; University of Nevada at Las Vegas DOI Pre-print
14:00 90m Talk		Reliability of AI Bots Footprints in GitHub Actions CI/CD Workflows Mining Challenge Syed Muhammad Ashhar Shah Lahore University of Management Sciences, Lahore, Sehrish Habib Lahore University of Management Sciences, Lahore, Muizz Ahmed Hussain Lahore University of Management Sciences, Lahore, Maryam Abdul Ghafoor Lahore University of Management Sciences, Lahore, Abdul Ali Bangash Lahore University of Management Sciences
14:00 90m Talk		The Dose Makes the Agent: Therapeutic Index Analysis of AI Coding Contributions Mining Challenge Giuseppe Destefanis University College London, Ronnie de Souza Santos University of Calgary, Marco Ortu University of Cagliari, Mairieli Wessel Radboud University
14:00 90m Talk		Beyond Bug Fixes: An Empirical Investigation of Post-Merge Code Quality Issues in Agent-Generated Pull Requests Mining Challenge Shamse Tasnim Cynthia University of Saskatchewan, Al Muttakin University of Saskatchewan, Banani Roy University of Saskatchewan
14:00 90m Talk		Why Are Agentic Pull Requests Merged or Rejected? An Empirical Study Mining Challenge Sien Reeve O. Peralta Waseda University, Fumika Hoshi Waseda University, Hironori Washizaki Waseda University, Naoyasu Ubayashi Waseda University, Inase Kondo Osaka University, Yoshiki Higo Osaka University, Hiroki Mukai Ritsumeikan University, Norihiro Yoshida Ritsumeikan University, Kazuki Kusama, Hidetake Tanaka Nara Institute of Science and Technology, Youmei Fan Nara Institute of Science and Technology
14:00 90m Talk		Let's Make Every Pull Request Meaningful: An Empirical Analysis of Developer and Agentic Pull Requests Mining Challenge Haruhiko Yoshioka Nara Institute of Science and Technology, Takahiro Monno Nara Institute of Science and Technology, Haruka Tokumasu Kyushu University, Taiki Wakamatsu Kyushu University, Yuki Ota Ritsumeikan University, Nimmi Weeraddana University of Calgary, Kenichi Matsumoto Nara Institute of Science and Technology DOI Pre-print
14:00 90m Talk		Humans Integrate, Agents Fix: How Agent-Authored Pull Requests Are Referenced in Practice Mining Challenge Islem Khemissi Concordia University, Moataz Chouchen Concordia University, Dong Wang Tianjin University, Raula Gaikovina Kula The University of Osaka
14:00 90m Short-paper		How Do Agentic AI Systems Deal With Software Energy Concerns? A Pull Request-Based Study Mining Challenge Tanjum Motin Mitul University of Manitoba, Md. Masud Mazumder University of Manitoba, Md Nahidul Islam Opu University of Manitoba, Shaiful Chowdhury University of Manitoba Pre-print
14:00 90m Talk		When AI Writes Code: Investigating Security Issues in Agentic Software Changes Mining Challenge Esteban Dectot-Le Monnier de Gouville Polytechnique Montréal, Mohammad Hamdaqa Polytechnique Montreal, Moataz Chouchen Concordia University
14:00 90m Talk		Novice Developers Produce Larger Review Overhead for Project Maintainers while Vibe Coding Mining Challenge Syed Ammar Asdaque Lahore University of Management Sciences, Imran Haider Lahore University of Management Sciences, Muhammad Umar Malik Lahore University of Management Sciences, Maryam Abdul Ghafoor Lahore University of Management Sciences, Lahore, Abdul Ali Bangash Lahore University of Management Sciences
14:00 90m Talk		Code Change Characteristics and Description Alignment: A Comparative Study of Agentic versus Human Pull Requests Mining Challenge Dung Pham Trent University, Taher A. Ghaleb Trent University Pre-print
14:00 90m Talk		A Task-Level Evaluation of AI Agents in Open-Source Projects Mining Challenge Shojibur Rahman Idaho State University, Md Fazle Rabbi Idaho State University, Minhaz Zibran Idaho State University Pre-print
14:00 90m Talk		Behind Agentic Pull Requests: An Empirical Study on Developer Interventions in AI Agent-Authored Pull Requests Mining Challenge Syrine Khelifi École de technologie supérieure (ÉTS) Montréal, Ali Ouni Ecole de Technologie Superieure (ETS), Maha Khemaja ISSAT Sousse, PRINCE Lab, University of Sousse
14:00 90m Talk		Readability of AI-Generated Pull Request Descriptions Across Pull Request Types Mining Challenge Aidan Tobar Bowling Green State University, Joseph Peterson Bowling Green State University, Abbas Heydarnoori Bowling Green State University
14:00 90m Short-paper		The Quiet Contributions: Insights into AI-Generated Silent Pull Requests Mining Challenge S. M. Mahedy Hasan Idaho State University, Md Fazle Rabbi Idaho State University, Minhaz Zibran Idaho State University Pre-print
14:00 90m Talk		AI IDEs or Autonomous Agents? Measuring the Impact of Coding Agents on Software Development Mining Challenge Shyam Agarwal Carnegie Mellon University, Hao He Carnegie Mellon University, Bogdan Vasilescu Carnegie Mellon University Pre-print
14:00 90m Talk		AI builds, We Analyze: An Empirical Study of AI-Generated Build Code Quality Mining Challenge Anwar Ghammam University of Michigan - Dearborn, Mohamed Almukhtar University of Michigan-Flint
14:00 90m Talk		Understanding Dominant Themes in Reviewing Agentic AI-authored Code Mining Challenge Md. Asif Haider University of California, Irvine, Thomas Zimmermann University of California, Irvine Pre-print
14:00 90m Talk		Why and When Agentic Pull Requests are (not) Accepted: An Exploratory Study Mining Challenge Marius Christoph Strauss Anhalt University of Applied Sciences, Sandro Schulze Anhalt University of Applied Sciences DOI Pre-print
14:00 90m Talk		Analyzing Message-Code Inconsistency in AI Coding Agent-Authored Pull Requests Mining Challenge Jingzhi Gong King's College London, Giovanni Pinna University of Trieste, Yixin Bian Harbin Normal University, Jie M. Zhang King's College London
14:00 90m Talk		How Do Agentic AI Systems Address Performance Optimizations? A BERTopic-Based Analysis of Pull Requests Mining Challenge Md Nahidul Islam Opu University of Manitoba, Md Shahidul Islam University of Manitoba, Muhammad Asaduzzaman University of Windsor, Shaiful Chowdhury University of Manitoba Pre-print
14:00 90m Talk		Mining Type Constructs Using Patterns in AI-Generated Code Mining Challenge Imgyeong Lee University of Alberta, Tayyib Ul Hassan University of Alberta, Abram Hindle University of Alberta
14:00 90m Talk		Bug-Fixing in the Age of AI: Human vs. Agentic Pull Requests Mining Challenge Renato Domingues UFPE, Fernando Castor University of Twente, Fernanda Madeiral Universidade Federal de Pernambuco
14:00 90m Talk		Why Are AI Agent–Involved Pull Requests (Fix-Related) Remain Unmerged? An Empirical Study Mining Challenge Khairul Alam University of Saskatchewan, Saikat Mondal University of Saskatchewan, Banani Roy University of Saskatchewan
14:00 90m Talk		LGTM! Characteristics of Auto-Merged LLM-based Agentic PRs Mining Challenge Ruben Branco LASIGE, Informática, Faculdade de Ciências, Universidade de Lisboa, Paulo Canelas Carnegie Mellon University, Catarina Gamboa Carnegie Mellon University and University of Lisbon, Alcides Fonseca LASIGE; University of Lisbon DOI Pre-print Media Attached
14:00 90m Talk		Do AI-Generated Pull Requests Get Rejected More? (Yes but Why?) Mining Challenge Rosie Wang University of Alberta, Zhou Yang University of Alberta, Alberta Machine Intelligence Institute
14:00 90m Talk		How AI Coding Agents Communicate: A Study of Pull Request Characteristics and Human Review Responses Mining Challenge Kan Watanabe Nara Institute of Science and Technology, Rikuto Tsuchida Nara Institute of Science and Technology, Takahiro Monno Nara Institute of Science and Technology, Bin Huang Nara Institute of Science and Technology, Kazuma Yamasaki Nara Institute of Science and Technology, Youmei Fan Nara Institute of Science and Technology, Kazumasa Shimari Nara Institute of Science and Technology, Kenichi Matsumoto Nara Institute of Science and Technology
14:00 90m Talk		Where Do AI Coding Agents Fail? An Empirical Study of Failed Agentic Pull Requests in GitHub Mining Challenge Ramtin Ehsani Drexel University, Sakshi Pathak Drexel University, Shriya Rawal Drexel University, Abdullah Al Mujahid Missouri University of Science and Technology, Mia Mohammad Imran Missouri University of Science and Technology, Preetha Chatterjee Drexel University, USA Pre-print
14:00 90m Talk		An Empirical Study of Tests in Agentic Pull Requests Mining Challenge Sabrina Haque The University of Texas at Arlington, Sarvesh Ingale The University of Texas at Arlington, Christoph Csallner University of Texas at Arlington DOI Pre-print Media Attached
14:00 90m Talk		Who Said CVE? How Vulnerability Identifiers Are Mentioned by Humans, Bots, and Agents in Pull Requests Mining Challenge Pien Rooijendijk Radboud University, Christoph Treude Singapore Management University, Mairieli Wessel Radboud University
14:00 90m Talk		Behavioral Analysis of AI Code Generation Agents: Edit, Rewrite, and Repetition Mining Challenge Mahdieh Abazar University of Calgary, Reyhaneh Farahmand University of Calgary, Gouri Ginde Schulich School of Engineering, University of Calgary, Calgary, Alberta, Canada, Benjamin Tan University of Calgary, Lorenzo De Carli University of Calgary, Canada
14:00 90m Talk		A Study on Code Clone Lifecycles in Pull Requests Created by AI Agents Mining Challenge Italo Uchoa State University of Ceará, Denis Sousa State University of Ceara, Brazil, Henrique Chuvas State University of Ceará, Matheus Paixao State University of Ceará, Chaiyong Ragkhitwetsagul Mahidol University, Thailand, Thiago Lima State University of Ceara, Brazil
14:00 90m Talk		On Autopilot? An Empirical Study of Human–AI Teaming and Review Practices in Open Source Mining Challenge Haoyu Gao The University of Melbourne, Peerachai Banyongrakkul The University of Melbourne, Hao Guan The University of Melbourne, Mansooreh Zahedi The University of Melbourne, Christoph Treude Singapore Management University
14:00 90m Talk		From Industry Claims to Empirical Reality: An Empirical Study of Code Review Agents in Pull Requests Mining Challenge Kowshik Chowdhury Kennesaw State University, Dipayan Banik Quanta Technology, K M Ferdous Kennesaw State University, Shazibul Islam Shamim Kennesaw State University
14:00 90m Talk		A Study of Library Usage in Agent-Authored Pull Requests Mining Challenge Lukas Twist King's College London, Jie M. Zhang King's College London DOI Pre-print
14:00 90m Talk		Studying the Footprints of AI Coding Agents in Blockchain Repositories Mining Challenge Munim Iftikhar Lahore University of Management Sciences, Lahore, Maaz Shahid Lahore University of Management Sciences, Lahore, Shahreyar Ashraf Lahore University of Management Sciences, Lahore, Muhammad Saqib Ilyas Lahore University of Management Sciences, Abdul Ali Bangash Lahore University of Management Sciences
14:00 90m Talk		Human-Agent versus Human Pull Requests: A Testing-Focused Characterization and Comparison Mining Challenge Roberto Milanese Politecnico di Torino, University of Molise, Francesco Salzano University of Molise, Angelica Spina University of Molise, Antonio Vitale Politecnico di Torino, University of Molise, Remo Pareschi University of Molise, Fausto Fasano University of Molise, Mattia Fazzini University of Minnesota DOI Pre-print
14:00 90m Talk		Do AI Agents Really Improve Code Readability? Mining Challenge Kyogo Horikawa National Institute of Technology, Nara College, Kosei Horikawa, Yutaro Kashiwa Nara Institute of Science and Technology, Hidetake Uwano National Institute of Technology, Nara College, Japan, Hajimu Iida Nara Institute of Science and Technology
14:00 90m Talk		When is Generated Code Difficult to Comprehend? Assessing AI Agent Python Code Proficiency in the Wild Mining Challenge Nanthit Temkulkiat Mahidol University, Chaiyong Ragkhitwetsagul Mahidol University, Thailand, Morakot Choetkiertikul Mahidol University, Thailand, Ruksit Rojpaisarnkit Nara Institute of Science and Technology, Raula Gaikovina Kula The University of Osaka
14:00 90m Talk		How do Agents Refactor: An Empirical Study Mining Challenge Lukas Ottenhof University of Alberta, Daniel Penner University of Alberta, Abram Hindle University of Alberta, Thibaud Lutellier University of Alberta Pre-print
14:00 90m Talk		An Empirical Analysis of Test Failures in AI-Generated Pull Requests Mining Challenge Alireza Hoseinpour Bowling Green State University, Sajjad Rezvani Boroujeni Bowling Green State University, Jashhvanth Tamilselvan Kunthavai Bowling Green State University, Kyle Cusimano Bowling Green State University, Abbas Heydarnoori Bowling Green State University

16:00 - 17:30	Session 3-B: Demo and tool & Tutorial	Data and Tool Showcase Track / Tutorials / MSR Program at Oceania IV

16:00 90m Talk		MOOT: a Repository of many Multi-objective Optimization Tasks Data and Tool Showcase Track Tim Menzies North Carolina State University, Tao Chen University of Birmingham, Yulong Ye University of Birmingham, Kishan Kumar Ganguly NC State, Amirali Rayegan NC State, Srinath Srinivasan North Carolina State University, Andre Lustosa North Carolina State University
16:00 90m Talk		Mapping Decentralized Autonomous Organization Governance Across Chains: An Updated, Multi-Platform Dataset Data and Tool Showcase Track Mashiat Amin Farin University of Texas at Dallas, Samer Hassan Institute of Knowledge Technology, Universidad Complutense de Madrid, Madrid, Spain & Berkman Klein Center at Harvard University, Cambridge MA, USA, Javier Arroyo Dpt. of Computer Science, Universidad de Alcalá, Madrid, Spain & Institute of Knowledge Technology, Universidad Complutense de Madrid Madrid, Spain
16:00 90m Talk		Assessing Task-based Chatbots: Snapshot and Curated Datasets for Dialogflow Data and Tool Showcase Track Elena Masserini University of Milano - Bicocca, Diego Clerissi University of Milano-Bicocca, Daniela Micucci University of Milano-Bicocca, Italy, Leonardo Mariani University of Milano-Bicocca
16:00 90m Talk		LILA: Decentralized Build Reproducibility Monitoring for the Functional Package Management Model Data and Tool Showcase Track Julien Malka LTCI, Télécom Paris, Institut Polytechnique de Paris, France, Arnout Engelen Independent
16:00 90m Talk		Mining Kubernetes Repositories: The Cloud was Not Built in a Day Data and Tool Showcase Track Giuseppe Destefanis University College London, Silvia Bartolucci University College London, Daniel Feitosa University of Groningen
16:00 90m Talk		RustXec: A Vulnerability Reproduction Dataset for Assessing Security Risks in Open-Source Rust Applications Data and Tool Showcase Track Zhengjie Ji Virginia Tech, Xin Wang Virginia Tech, Wang Lingxiang Unaffiliated, Geng Li Wake Forest University, Fan Yang Wake Forest University, Ying Zhang Wake Forest University
16:00 90m Talk		OSSGameBench: A Large-Scale Dataset of Development Activities in Open-Source Video Games Data and Tool Showcase Track Faiz Marsad University of Calgary, Nimmi Weeraddana University of Calgary DOI Pre-print
16:00 90m Talk		JavaBackports: A Dataset for Benchmarking Automated Backporting in Java Data and Tool Showcase Track Kaushal Kahapola University of Moratuwa, Sri Lanka, Sharada Galappaththi University of Moratuwa, Sri Lanka, Dinith Ranasinghe University of Moratuwa, Sri Lanka, Ridwan Salihin Shariffdeen SonarSource, Nisansa de Silva University of Moratuwa, Sri Lanka, Srinath Perera WSO2, Sandareka Wickramanayake University of Moratuwa, Sri Lanka
16:00 90m Talk		HackRep: A Large-Scale Dataset of GitHub Hackathon Projects Data and Tool Showcase Track Sjoerd Halmans Eindhoven University of Technology, Lavinia Francesca Paganini Eindhoven University of Technology, Alexander Serebrenik Eindhoven University of Technology, Alexander Nolte Eindhoven University of Technology
16:00 90m Talk		KubeObjects: A Dataset of Real-World Kubernetes Objects Data and Tool Showcase Track Matteo Grella University of Twente, Danil Aliforenko University of Twente, Luca Mariot University of Twente
16:00 90m Talk		IssuePilot: An Agentic Framework for Personalized Issue Recommendation and Onboarding in Open-Source Projects Data and Tool Showcase Track Shlok Pandey IIIT Hyderabad, Akhila Sri Manasa Venigalla IIIT Hyderabad
16:00 90m Talk		DBSecQA: A Curated Dataset of Developer Discussions on Database Security from Stack Exchange Data and Tool Showcase Track Md Rakibul Islam Lamar University, Farha Kamal Lamar University, MD HUMAUN KABIR Lamar University, Md Murad Sharif Lamar University
16:00 90m Talk		PoolinGH: Fast, Efficient, and Robust GitHub Repository Mining Data and Tool Showcase Track Maxime ANDRÉ Namur Digital Institute, University of Namur, Marco Raglianti REVEAL @ Software Institute – USI, Lugano, Switzerland, Souhaila Serbout University of Zurich, Zurich, Switzerland, Anthony Cleve University of Namur, Michele Lanza Software Institute - USI, Lugano Pre-print
16:00 90m Talk		GivenWhenThen: A Dataset of BDD Test Scenarios Mined from Open Source Projects Data and Tool Showcase Track Luciano Belo de Alcântara Júnior UFMG, João Eduardo Montandon Universidade Federal de Minas Gerais (UFMG)
16:00 90m Talk		AnoMod: A Dataset for Anomaly Detection and Root Cause Analysis in Microservice System Data and Tool Showcase Track Ke Ping University of Helsinki, Hamza Bin Mazhar University of Helsinki, Yuqing Wang University of Helsinki, Finland, Ying Song University of Helsinki, Mika Mäntylä University of Helsinki and University of Oulu
16:00 90m Talk		GLiSE: A Prompt-Driven and ML-Powered Tool for Automated Grey Literature Extraction in Software Engineering Data and Tool Showcase Track Brahim Mahmoudi École de technologie supérieure, Zacharie Chenail-Larcher École de technologie supérieure (ÉTS), Houcine Abdelkader Cherief Ecole de Technologie Supérieure, Quentin Stiévenart Université du Québec à Montréal, Naouel Moha École de Technologie Supérieure (ETS), Florent AVELLANEDA Université du Québec à Montréal
16:00 90m Talk		OmniCCG: Agnostic Code Clone Genealogy Extractor Data and Tool Showcase Track Denis Sousa State University of Ceara, Brazil, Matheus Paixao State University of Ceará, Thiago Lima State University of Ceara, Brazil, Adriely Silva State University of Ceara, Brazil, Italo Uchoa State University of Ceará, Chaiyong Ragkhitwetsagul Mahidol University
16:00 90m Talk		GitEvo: Code Evolution Analysis for Git Repositories Data and Tool Showcase Track Andre Hora UFMG Pre-print
16:00 90m Talk		Skyt: Prompt Contracts for Software Repeatability in LLM-Assisted Development Data and Tool Showcase Track Heitor Roriz Filho Massimus, Nasser Jazdi University of Stuttgart, Vicente Lucena Universidade Federal do Amazonas
16:00 90m Talk		InEx-Bug: A Human Annotated Dataset of Intrinsic and Extrinsic Bugs in the NPM Ecosystem Data and Tool Showcase Track Tanner Wright University of British Columbia, Adams Chen University of British Columbia, Gema Rodríguez-Pérez Department of Computer Science, Mathematics, Physics and Statistics, University of British Columbia, Okanagan Campus
16:50 40m Talk		Running Large Language Models at Scale for Mining Software Repositories: Lessons Learned from HPC-Based Batch Inference Tutorials Ruoyu Su, Matteo Esposito University of Oulu, Davide Taibi University of Southern Denmark and University of Oulu, Valentina Lenarduzzi University of Southern Denmark

16:00 - 17:30	Session 3-A: Tutorial + Registered reports talks - Registered Reports / Tutorials / MSR Program at Oceania V

16:00 40m Talk		Selecting the Data Source that Matter: Fine-Tuning Domain-Specific Ecosystem Studies with MARIN Tutorials Johannes Düsing Technische Universität Dortmund, Ben Hermann University of Stuttgart
16:40 5m Talk		Ask, Then Think: Enhancing LLM Performance with Socratic Reasoning Registered Reports Antonio Della Porta University of Salerno, Jonan Richards Radboud University, Lucageneroso Cammarota University of Salerno, Stefano Lambiase Department of Computer Science, Aalborg University, Denmark, Fabio Palomba University of Salerno, Mairieli Wessel Radboud University DOI Pre-print
16:45 5m Talk		Beyond the Prompt: Assessing Domain Knowledge Strategies for High-Dimensional LLM Optimization in Software Engineering Registered Reports Srinath Srinivasan North Carolina State University, Tim Menzies North Carolina State University
16:50 5m Talk		Does Impact Analysis Support the Review of Changes to Build Specifications? Registered Reports Mahtab Nejati University of Waterloo, Mahmoud Alfadel University of Calgary, Shane McIntosh University of Waterloo DOI Pre-print
16:55 5m Talk		Parameterized Tests in Practice: Adoption, Styles, and Impact in Apache Java Projects Registered Reports Xinyi Li Stevens Institute of Technology, Lu Xiao Stevens Institute of Technology, Gengwu Zhao Stevens Institute of Technology, Sunny Wong Envestnet DOI Pre-print
17:00 5m Talk		Causal Inference for the Effect of Code Coverage on Bug Introduction Registered Reports Lukas Schulte University of Passau, Gordon Fraser University of Passau, Steffen Herbold University of Passau DOI Pre-print
17:05 5m Talk		Automated Testing of Task-based Chatbots: How Far Are We? Registered Reports Diego Clerissi University of Milano-Bicocca, Elena Masserini University of Milano - Bicocca, Daniela Micucci University of Milano-Bicocca, Italy, Leonardo Mariani University of Milano-Bicocca DOI Pre-print
17:10 5m Talk		The Influence of Code Smells in Efferent Neighbors on Class Stability Registered Reports Zushuai Zhang University of Auckland, Elliott Wen The University of Auckland, Ewan Tempero The University of Auckland DOI Pre-print

Tue 14 Apr

	09:00 - 10:30	Plenary: Awards & Keynote II - MSR Program at Oceania V

11:00 - 12:30	Session 1-B: Maintenance, Evolution & Processes - Technical Papers / MSR Program at Oceania IV

11:00 10m Talk		Source Code Hotspots: A Diagnostic Method for Quality Issues Technical Papers Saleha Muzammil University of Virginia, Mughees Ur Rehman Virginia Tech, Zoe Kotti AUEB & DeepSea Technologies, Diomidis Spinellis AUEB & TU Delft Pre-print
11:10 10m Talk		Evolving Kubernetes: A Technical Debt Perspective Technical Papers Jesse Maarleveld University of Groningen, Giuseppe Destefanis University College London, Daniel Feitosa University of Groningen
11:20 10m Talk		How do third-party Python libraries use type annotations? Technical Papers Eric Asare New York University Abu Dhabi, Sarah Nadi New York University Abu Dhabi Pre-print
11:30 10m Talk		Coordination at Scale in Large Distributed Development: The Case of Kubernetes Technical Papers Sabrina Aufiero University College London (UCL), Matteo Vaccargiu University of Cagliari, Silvia Bartolucci University College London, Fabio Caccioli University College London (UCL), Giuseppe Destefanis University College London
11:40 10m Talk		Combining Example-Based and Rule-Based Program Transformations to Resolve Build Conflicts Technical Papers Sheikh Shadab Towqir Virginia Tech, Fei He Tsinghua University, Todd Mytkowicz Google, Na Meng Virginia Tech Pre-print
11:50 10m Talk		Mining Quantum Software Patterns in Open-Source Projects Technical Papers Neilson Carlos Leite Ramalho Universidade de São Paulo, Erico Augusto Da Silva Universidade de São Paulo, Higor Amario de Souza University of São Paulo, Marcos Lordello Chaim University of São Paulo
12:00 10m Talk		Analyzing Dependency Distribution Changes Arising from Code Smell Interactions Technical Papers Zushuai Zhang University of Auckland, Elliott Wen, Ewan Tempero The University of Auckland Pre-print
12:10 10m Talk		The Value of Effective Pull Request Description Technical Papers Shirin Pirouzkhah University of Zurich, Pavlina Wurzel Goncalves University of Zurich, Alberto Bacchelli IfI, University of Zurich Pre-print
12:20 10m Talk		Secret Leak Detection in Software Issue Reports using LLMs: A Comprehensive Evaluation Technical Papers Sadif Ahmed Bangladesh University of Engineering and Technology, Md Nafiu Rahman Bangladesh University of Engineering and Technology, Zahin Wahab The University of British Columbia, Gias Uddin York University, Canada, Rifat Shahriyar Bangladesh University of Engineering and Technology Dhaka, Bangladesh Pre-print Media Attached

11:00 - 12:30	Session 1-A: AI & Autonomous Agents - Technical Papers / MSR Program at Oceania V

11:00 10m Talk		Speed at the Cost of Quality: How Cursor AI Increases Short-Term Velocity and Long-Term Complexity in Open-Source Projects Technical Papers Hao He Carnegie Mellon University, Courtney Miller Carnegie Mellon University, Shyam Agarwal Carnegie Mellon University, Christian Kästner Carnegie Mellon University, Bogdan Vasilescu Carnegie Mellon University Pre-print
11:10 10m Talk		LLM-Based Detection of Tangled Code Changes for Higher-Quality Method-Level Bug Datasets Technical Papers Md Nahidul Islam Opu University of Manitoba, Shaowei Wang University of Manitoba, Shaiful Chowdhury University of Manitoba Pre-print
11:20 10m Talk		Adversarial Bug Reports as a Security Risk in Language Model-Based Automated Program Repair Technical Papers Piotr Przymus Nicolaus Copernicus University in Toruń, Poland, Andreas Happe TU Wien, Jürgen Cito TU Wien Pre-print
11:30 10m Talk		Investigating Autonomous Agent Contributions in the Wild: Activity Patterns and Code Change over Time Technical Papers Răzvan Mihai Popescu Delft University of Technology, David Gros University of California, Davis, Andrei Botocan Delft University of Technology, Rahul Pandita GitHub, Inc., Prem Devanbu University of California at Davis, Maliheh Izadi Delft University of Technology
11:40 10m Talk		Evaluating the Use of LLMs for Automated DOM-Level Resolution of Web Performance Issues Technical Papers Gideon Peters Concordia University, SayedHassan Khatoonabadi Concordia University, Emad Shihab Concordia University
11:50 10m Talk		Are Coding Agents Generating Over-Mocked Tests? An Empirical Study Technical Papers Andre Hora UFMG, Romain Robbes CNRS, LaBRI, University of Bordeaux Pre-print
12:00 10m Talk		Consistent or Sensitive? Automated Code Revision Tools Against Semantics-Preserving Perturbations Technical Papers Shirin Pirouzkhah University of Zurich, Souhaila Serbout University of Zurich, Zurich, Switzerland, Alberto Bacchelli IfI, University of Zurich Pre-print
12:10 10m Talk		Beyond the Prompt: An Empirical Study of Cursor Rules Technical Papers Shaokang Jiang University of California, Irvine, Daye Nam University of California, Irvine Pre-print
12:20 10m Talk		Bridging Design and Implementation: A Study of Multi-Agent LLM Architectures for Automated Front-End Generation Technical Papers Caren Rizk Concordia University, SayedHassan Khatoonabadi Concordia University, Emad Shihab Concordia University

14:00 - 15:30	Session 2-B: Quality - MSR Program / Technical Papers at Oceania IV

14:00 10m Talk		How are MLOps Frameworks Used in Open Source Projects? An Empirical Characterization Technical Papers Fiorella Zampetti University of Sannio, Italy, Federico Stocchetti University of Sannio, Italy, Federica Razzano University of Sannio, Italy, Damian Andrew Tamburri University of Sannio - JADS/NXP Semiconductors, Massimiliano Di Penta University of Sannio, Italy Pre-print
14:10 10m Talk		Do We Agree on What an “Audit” Is? Toward Standardized Smart Contract Audit Reporting Technical Papers Ilham Qasse Reykjavik University, Mohammad Hamdaqa Polytechnique Montreal, Gísli Hjálmtýsson Reykjavik University
14:20 10m Talk		AFGNN: API Misuse Detection using Graph Neural Networks and Clustering Technical Papers Ponnampalam Pirapuraj IIT Hyderabad, Tamal Mondal Oracle, Sharanya Gupta Yokogawa Digital, Akash Lal Microsoft Research, Somak Aditya IIT Kharagpur, Jyothi Vedurada IIT Hyderabad
14:30 10m Talk		An Empirical Analysis of Cross-OS Portability Issues in Python Projects Technical Papers Denini Silva Federal University of Pernambuco, MohamadAli Farahat North Carolina State University, Marcelo d'Amorim North Carolina State University Pre-print
14:40 10m Talk		Learning Compiler Fuzzing Mutators from Historical Bugs Technical Papers Lingjun Liu North Carolina State University, Feiran Qin North Carolina State University, Owolabi Legunsen Cornell University, Marcelo d'Amorim North Carolina State University
14:50 40m Meeting		Mining Challenge Finalists MSR Program

14:00 - 15:30	Session 2-A: Ecosystems & Methods - Technical Papers / Industry Track / MSR Program at Oceania V

14:00 10m Talk		Analyzing GitHub Issues and Pull Requests in nf-core Pipelines: Insights into nf-core Pipeline Repositories Technical Papers Khairul Alam University of Saskatchewan, Banani Roy University of Saskatchewan
14:10 10m Talk		Modeling Sampling Workflows for Code Repositories Technical Papers Romain Lefeuvre University of Rennes, Maiwenn Le Goasteller University of Rennes, Inria, CNRS, IRISA, Jessie Galasso-Carbonnel McGill University, Benoit Combemale University of Rennes, Inria, CNRS, IRISA, Quentin Perez INSA Rennes, Houari Sahraoui DIRO, Université de Montréal
14:20 10m Talk		Quantifying Competitive Relationships Among Open-Source Software Projects Technical Papers Yuki Takei Japan Advanced Institute of Science and Technology, Toshiaki Aoki JAIST, Chaiyong Ragkhitwetsagul Mahidol University, Thailand Pre-print
14:30 10m Talk		Role of CI Adoption in Mobile App Success: An Empirical Study of Open-Source Android Projects Technical Papers Xiaoxin Zhou University of Toronto, Taher A. Ghaleb Trent University, Safwat Hassan University of Toronto Pre-print
14:40 10m Talk		ML in a Box: Analyzing Containerization Practices in Open Source ML Projects Technical Papers Faten Jebari Grand Valley State University, Emna Ksontini University of North Carolina Wilmington, Amine Barrak Oakland University, USA, Wael Kessentini DePaul University
14:50 10m Talk		An Empirical Study of Policy as Code: Adoption, Purpose, and Maintenance Technical Papers Ruben Opdebeeck Vrije Universiteit Brussel, Mahmoud Alfadel University of Calgary, Akond Rahman Auburn University, Yutaro Kashiwa Nara Institute of Science and Technology, João F. Ferreira Faculty of Engineering, University of Porto & INESC-ID, Raula Gaikovina Kula The University of Osaka, Coen De Roover Vrije Universiteit Brussel Pre-print
15:00 10m Talk		Tracing Stereotypes in Pre-trained Transformers: From Biased Neurons to Fairer Models Technical Papers Gianmario Voria University of Salerno, Moses Openja Polytechnique Montreal, Foutse Khomh Polytechnique Montréal, Gemma Catolino University of Salerno, Fabio Palomba University of Salerno Pre-print
15:10 5m Industry talk		Can Data Mining Help to Survive the Annual Compiler Upgrade? Industry Track Gunnar Kudrjavets Amazon Web Services, USA, Aditya Kumar Google, Piotr Przymus Nicolaus Copernicus University in Toruń, Poland Pre-print
15:15 5m Talk		Underutilization in Research GPU Clusters: SE Challenges Industry Track Krzysztof Kaczmarski Warsaw University of Technology, Jakub Narębski Nicolaus Copernicus University in Toruń, Piotr Przymus Nicolaus Copernicus University in Toruń, Poland

	16:00 - 17:30	Plenary: Vision and FCA Award - MSR Program at Oceania V

Accepted Papers

	Title
	AI builds, We Analyze: An Empirical Study of AI-Generated Build Code Quality Mining Challenge Anwar Ghammam, Mohamed Almukhtar
	AI IDEs or Autonomous Agents? Measuring the Impact of Coding Agents on Software Development Mining Challenge Shyam Agarwal, Hao He, Bogdan Vasilescu Pre-print
	Analyzing Message-Code Inconsistency in AI Coding Agent-Authored Pull Requests Mining Challenge Jingzhi Gong, Giovanni Pinna, Yixin Bian, Jie M. Zhang
	An Empirical Analysis of Test Failures in AI-Generated Pull Requests Mining Challenge Alireza Hoseinpour, Sajjad Rezvani Boroujeni, Jashhvanth Tamilselvan Kunthavai, Kyle Cusimano, Abbas Heydarnoori
	An Empirical Study of Code Clone Genealogies in Human–AI Collaborative Development Mining Challenge Denis Sousa, Italo Uchoa, Matheus Paixao, Chaiyong Ragkhitwetsagul, Thiago Lima
	An Empirical Study of Tests in Agentic Pull Requests Mining Challenge Sabrina Haque, Sarvesh Ingale, Christoph Csallner DOI Pre-print Media Attached
	A Study of Library Usage in Agent-Authored Pull Requests Mining Challenge Lukas Twist, Jie M. Zhang DOI Pre-print
	A Study on Code Clone Lifecycles in Pull Requests Created by AI Agents Mining Challenge Italo Uchoa, Denis Sousa, Henrique Chuvas, Matheus Paixao, Chaiyong Ragkhitwetsagul, Thiago Lima
	A Task-Level Evaluation of AI Agents in Open-Source Projects Mining Challenge Shojibur Rahman, Md Fazle Rabbi, Minhaz Zibran Pre-print
	Behavioral Analysis of AI Code Generation Agents: Edit, Rewrite, and Repetition Mining Challenge Mahdieh Abazar, Reyhaneh Farahmand, Gouri Ginde, Benjamin Tan, Lorenzo De Carli
	Behind Agentic Pull Requests: An Empirical Study on Developer Interventions in AI Agent-Authored Pull Requests Mining Challenge Syrine Khelifi, Ali Ouni, Maha Khemaja
	Beyond Bug Fixes: An Empirical Investigation of Post-Merge Code Quality Issues in Agent-Generated Pull Requests Mining Challenge Shamse Tasnim Cynthia, Al Muttakin, Banani Roy
	Bug-Fixing in the Age of AI: Human vs. Agentic Pull Requests Mining Challenge Renato Domingues, Fernando Castor, Fernanda Madeiral
	Characterizing Self-Admitted Technical Debt Generated by AI Coding Agents Mining Challenge Zaki Brahmi, Ali Ouni, Mohammed Sayagh, Mohamed Aymen Saied
	Code Change Characteristics and Description Alignment: A Comparative Study of Agentic versus Human Pull Requests Mining Challenge Dung Pham, Taher A. Ghaleb Pre-print
	Comparing AI Coding Agents: A Task-Stratified Analysis of Pull Request Acceptance Mining Challenge Giovanni Pinna, Jingzhi Gong, David Williams, Federica Sarro
	Do AI Agents Really Improve Code Readability? Mining Challenge Kyogo Horikawa, Kosei Horikawa, Yutaro Kashiwa, Hidetake Uwano, Hajimu Iida
	Do AI-Generated Pull Requests Get Rejected More? (Yes but Why?) Mining Challenge Rosie Wang, Zhou Yang
	Early-Stage Prediction of Review Effort in AI-Generated Pull Requests Mining Challenge Dao Sy Duy Minh, Huynh Trung Kiet, Nguyen Lam Phu Quy, Pham Phu Hoa, Tran Chi Nguyen, Nguyen Dinh Ha Duong, Truong Bao Tran
	Fingerprinting AI Coding Agents on GitHub Mining Challenge Taher A. Ghaleb Pre-print
	From Industry Claims to Empirical Reality: An Empirical Study of Code Review Agents in Pull Requests Mining Challenge Kowshik Chowdhury, Dipayan Banik, K M Ferdous, Shazibul Islam Shamim
	How AI Coding Agents Communicate: A Study of Pull Request Characteristics and Human Review Responses Mining Challenge Kan Watanabe, Rikuto Tsuchida, Takahiro Monno, Bin Huang, Kazuma Yamasaki, Youmei Fan, Kazumasa Shimari, Kenichi Matsumoto
	How AI Coding Agents Modify Code: A Large-Scale Study of GitHub Pull Requests Mining Challenge Daniel Ogenrwot, John Businge DOI Pre-print
	How Do Agentic AI Systems Address Performance Optimizations? A BERTopic-Based Analysis of Pull Requests Mining Challenge Md Nahidul Islam Opu, Md Shahidul Islam, Muhammad Asaduzzaman, Shaiful Chowdhury Pre-print
	How Do Agentic AI Systems Deal With Software Energy Concerns? A Pull Request-Based Study Mining Challenge Tanjum Motin Mitul, Md. Masud Mazumder, Md Nahidul Islam Opu, Shaiful Chowdhury Pre-print
	How Do Agents Perform Code Optimization? An Empirical Study Mining Challenge Huiyun Peng, Antonio Zhong Qiu, Ricardo Andres Calvo Mendez, Kelechi G. Kalu, James C. Davis
	How do Agents Refactor: An Empirical Study Mining Challenge Lukas Ottenhof, Daniel Penner, Abram Hindle, Thibaud Lutellier Pre-print
	Human-Agent versus Human Pull Requests: A Testing-Focused Characterization and Comparison Mining Challenge Roberto Milanese, Francesco Salzano, Angelica Spina, Antonio Vitale, Remo Pareschi, Fausto Fasano, Mattia Fazzini DOI Pre-print
	Humans Integrate, Agents Fix: How Agent-Authored Pull Requests Are Referenced in Practice Mining Challenge Islem Khemissi, Moataz Chouchen, Dong Wang, Raula Gaikovina Kula
	Let's Make Every Pull Request Meaningful: An Empirical Analysis of Developer and Agentic Pull Requests Mining Challenge Haruhiko Yoshioka, Takahiro Monno, Haruka Tokumasu, Taiki Wakamatsu, Yuki Ota, Nimmi Weeraddana, Kenichi Matsumoto DOI Pre-print
	LGTM! Characteristics of Auto-Merged LLM-based Agentic PRs Mining Challenge Ruben Branco, Paulo Canelas, Catarina Gamboa, Alcides Fonseca DOI Pre-print Media Attached
	Mining Type Constructs Using Patterns in AI-Generated Code Mining Challenge Imgyeong Lee, Tayyib Ul Hassan, Abram Hindle
	More Code, Less Reuse: Investigation on Code Quality and Reviewer Sentiment towards AI-generated Pull Requests Mining Challenge Haoming Huang, Pongchai Jaisri, Shota Shimizu, Lingfeng Chen, Sota Nakashima, Gema Rodríguez-Pérez DOI Pre-print
	Novice Developers Produce Larger Review Overhead for Project Maintainers while Vibe Coding Mining Challenge Syed Ammar Asdaque, Imran Haider, Muhammad Umar Malik, Maryam Abdul Ghafoor, Abdul Ali Bangash
	On Autopilot? An Empirical Study of Human–AI Teaming and Review Practices in Open Source Mining Challenge Haoyu Gao, Peerachai Banyongrakkul, Hao Guan, Mansooreh Zahedi, Christoph Treude
	On the Adoption of AI Coding Agents in Open-source Android and iOS Development Mining Challenge Muhammad Ahmad Khan, Hasnain Ali, Muneeb Rana, Muhammad Saqib Ilyas, Abdul Ali Bangash Pre-print
	On the Footprints of Reviewer Bots' Feedback on Agentic Pull Requests in OSS GitHub Repositories Mining Challenge Syeda Kaneez Fatima, Yousuf Abrar, Abdul Rehman, Amelia Nawaz, Shamsa Abid, Abdul Ali Bangash
	On the Reliability of Agentic AI in Continuous Integration Pipelines Mining Challenge Jasem Khelifi, Mahi Begoug, Ali Ouni, Mohammed Sayagh, Mohamed Aymen Saied, Moataz Chouchen
	Readability of AI-Generated Pull Request Descriptions Across Pull Request Types Mining Challenge Aidan Tobar, Joseph Peterson, Abbas Heydarnoori
	Reliability of AI Bots Footprints in GitHub Actions CI/CD Workflows Mining Challenge Syed Muhammad Ashhar Shah, Sehrish Habib, Muizz Ahmed Hussain, Maryam Abdul Ghafoor, Abdul Ali Bangash
	Safer Builders, Risky Maintainers: A Comparative Study of Breaking Changes in Human vs Agentic PRs Mining Challenge K M Ferdous, Dipayan Banik, Kowshik Chowdhury, Shazibul Islam Shamim
	Studying the Footprints of AI Coding Agents in Blockchain Repositories Mining Challenge Munim Iftikhar, Maaz Shahid, Shahreyar Ashraf, Muhammad Saqib Ilyas, Abdul Ali Bangash
	Test Coverage of Code Changes in AI-Generated Pull Requests Mining Challenge Tales Alves, Leopoldo Teixeira
	Testing with AI Agents: An Empirical Study of Test Generation Frequency, Quality, and Coverage Mining Challenge Suzuka Yoshimoto, Shun Fujita, Kosei Horikawa, Daniel Feitosa, Yutaro Kashiwa, Hajimu Iida
	The Dose Makes the Agent: Therapeutic Index Analysis of AI Coding Contributions Mining Challenge Giuseppe Destefanis, Ronnie de Souza Santos, Marco Ortu, Mairieli Wessel
	The Quiet Contributions: Insights into AI-Generated Silent Pull Requests Mining Challenge S. M. Mahedy Hasan, Md Fazle Rabbi, Minhaz Zibran Pre-print
	Toward Instructions-as-Code: Understanding the Impact of Instruction Files on Agentic Pull Requests Mining Challenge Ali Arabat, Mohammed Sayagh
	Understanding Dominant Themes in Reviewing Agentic AI-authored Code Mining Challenge Md. Asif Haider, Thomas Zimmermann Pre-print
	Understanding the Rejection of Fixes Generated by Agentic Pull Requests - Insights from the AIDev Dataset Mining Challenge Mahmoud Abujadallah, Ali Arabat, Mohammed Sayagh
	What to Cut? Predicting Unnecessary Methods in Agentic Code Generation Mining Challenge Kan Watanabe, Tatsuya Shirai, Yutaro Kashiwa, Hajimu Iida
	When AI Agents Touch CI/CD Configurations: Frequency and Success Mining Challenge Taher A. Ghaleb Pre-print
	When AI Code Doesn’t Stick: An Empirical Study on Reverted Changes Introduced by AI Coding Agents Mining Challenge Issam Oukay, Mahi Begoug, Moataz Chouchen, Ali Ouni
	When AI Teammates Meet Code Review: Collaboration Signals Shaping the Integration of Agent-Authored Pull Requests Mining Challenge Costain Nachuma, Minhaz Zibran Pre-print
	When AI Writes Code: Investigating Security Issues in Agentic Software Changes Mining Challenge Esteban Dectot-Le Monnier de Gouville, Mohammad Hamdaqa, Moataz Chouchen
	When Bots Get the Boot: Understanding Pull Request Rejections in the Era of AI Coders Mining Challenge Karla Gonzalez, Mariam El Mezouar
	When is Generated Code Difficult to Comprehend? Assessing AI Agent Python Code Proficiency in the Wild Mining Challenge Nanthit Temkulkiat, Chaiyong Ragkhitwetsagul, Morakot Choetkiertikul, Ruksit Rojpaisarnkit, Raula Gaikovina Kula
	Where Do AI Coding Agents Fail? An Empirical Study of Failed Agentic Pull Requests in GitHub Mining Challenge Ramtin Ehsani, Sakshi Pathak, Shriya Rawal, Abdullah Al Mujahid, Mia Mohammad Imran, Preetha Chatterjee Pre-print
	Who Said CVE? How Vulnerability Identifiers Are Mentioned by Humans, Bots, and Agents in Pull Requests Mining Challenge Pien Rooijendijk, Christoph Treude, Mairieli Wessel
	Who Writes the Docs in SE 3.0? Agent vs. Human Documentation Pull Requests Mining Challenge Kazuma Yamasaki, Joseph Ayobami Joshua, Tasha Settewong, Mahmoud Alfadel, Kazumasa Shimari, Kenichi Matsumoto DOI Pre-print
	Why and When Agentic Pull Requests are (not) Accepted: An Exploratory Study Mining Challenge Marius Christoph Strauss, Sandro Schulze DOI Pre-print
	Why Are Agentic Pull Requests Merged or Rejected? An Empirical Study Mining Challenge Sien Reeve O. Peralta, Fumika Hoshi, Hironori Washizaki, Naoyasu Ubayashi, Inase Kondo, Yoshiki Higo, Hiroki Mukai, Norihiro Yoshida, Kazuki Kusama, Hidetake Tanaka, Youmei Fan
	Why Are AI Agent–Involved Pull Requests (Fix-Related) Remain Unmerged? An Empirical Study Mining Challenge Khairul Alam, Saikat Mondal, Banani Roy

Call for Mining Challenge Papers

Mining challenge PDF available here
AIDev dataset preprint available here

Update 2025-12-30: Author Notification moved to Jan 19, 2026, and Camera Ready to Jan 26, 2026.

Update 2025-11-06: We updated the Zenodo link for the dataset and added a new FAQ section.

Update 2025-10-28: Paper Deadline is now Dec 23, 2025 (AoE) to avoid the holiday period.

Update 2025-10-14: The submission deadlines have been extended to provide authors with more time to prepare submissions.

Update 2025-09-29: Accepted papers will be published as Short Papers in the ACM Digital Library.

AI coding agents are rapidly reshaping the landscape of software engineering by autonomously developing features, fixing bugs, and writing tests. These tools, such as Claude Code, Cursor, Devin, GitHub Copilot, and OpenAI Codex, are no longer just assisting developers; they are becoming active AI teammates in the software development process. Yet, despite their growing presence, the research community lacks a comprehensive, large-scale understanding of how AI coding agents collaborate with developers in real-world projects: how they propose code changes, how developers respond, and what kinds of collaboration patterns emerge.

This year’s MSR Mining Challenge invites the global research community to explore unprecedented questions and present their insights using AIDev, the first large-scale, openly available dataset capturing agent-authored pull requests (Agentic-PRs) from real-world GitHub repositories:

Scale: 932,791 Agentic-PRs
Breadth: 116,211 repositories and 72,189 developers, across five AI agents (Claude Code, Cursor, Devin, GitHub Copilot, OpenAI Codex)
Depth: 33,596 curated Agentic-PRs from 2,807 popular repositories (over 100 stars), enriched with comments, reviews, commits, and related issues

Challenge

The AIDev dataset opens up rich and timely research directions around AI adoption, code quality, testing, review dynamics, risks, and human-AI collaboration in software engineering. Example research questions include (but are not limited to):

1) Adoption and Practices

i. Who adopts Coding Agents on GitHub (e.g., newcomers vs. experienced developers)?
ii. How do adoption patterns vary across repositories and ecosystems?
iii. What practices (e.g., PR size, task type, and commit granularity) correlate with the quality of Agentic-PRs?
iv. How can these practices inform concrete guidelines for developers to work with Agentic-PRs?

2) Code Patch Characteristics

i. How do Agentic-PRs change code (e.g., additions, deletions, files touched)?
ii. How consistent are their descriptions with the actual code changes?
iii. To what extent do Agentic-PRs introduce original code versus reusing existing snippets?
iv. What are the implications for maintainability?
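
As a possible starting point for question 2.i, the sketch below counts added lines, deleted lines, and files touched in a git-style unified diff. It is an illustration only and not part of the AIDev tooling: the function name summarize_patch and the sample diff are hypothetical, and the code assumes that every file section starts with a "diff --git" line. Under that assumption a rename with edits is counted as two paths.

```python
def summarize_patch(diff_text: str) -> dict:
    """Count additions, deletions, and files touched in a git-style unified diff."""
    additions = deletions = 0
    files = set()
    in_hunk = False  # True once the first @@ hunk header of a file section is seen
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            in_hunk = False  # a new file section begins; its headers follow
        elif line.startswith("@@"):
            in_hunk = True
        elif not in_hunk:
            # File headers look like "--- a/path" and "+++ b/path".
            if line.startswith(("--- ", "+++ ")):
                path = line[4:].strip()
                if path != "/dev/null":  # /dev/null marks a created or deleted file
                    files.add(path[2:] if path[:2] in ("a/", "b/") else path)
        elif line.startswith("+"):
            additions += 1
        elif line.startswith("-"):
            deletions += 1
    return {"additions": additions, "deletions": deletions, "files_touched": len(files)}


# Hypothetical two-file diff, written out line by line for readability.
SAMPLE_DIFF = "\n".join([
    "diff --git a/src/app.py b/src/app.py",
    "--- a/src/app.py",
    "+++ b/src/app.py",
    "@@ -1,2 +1,3 @@",
    " import os",
    '-print("old")',
    '+print("new")',
    '+print("extra")',
    "diff --git a/README.md b/README.md",
    "--- a/README.md",
    "+++ b/README.md",
    "@@ -1 +1 @@",
    "--- old heading",  # a deleted line whose content starts with "-- "
    "+# new heading",
])

print(summarize_patch(SAMPLE_DIFF))
# {'additions': 3, 'deletions': 2, 'files_touched': 2}
```

Tracking whether the parser is inside a hunk keeps a deleted line such as "-- old heading" from being mistaken for a file header.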

3) Testing Behavior

i. How frequently do Coding Agents contribute tests? What types (e.g., unit, integration, end-to-end) are most common?
ii. What is the test-to-code churn ratio across ecosystems?
iii. When tests are missing in initial Agentic-PRs, do developers intervene to ensure reliable software testing (via follow-up commits or related PRs)?
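
One way to make the test-to-code churn ratio in question 3.ii concrete is sketched below, taking churn as added plus deleted lines. Everything here is an assumption for illustration: the path heuristic in is_test_path, the function names, and the (path, additions, deletions) input shape are hypothetical and do not reflect the AIDev schema. A real study would likely need per-ecosystem rules for recognizing test files.

```python
from pathlib import PurePosixPath

# Hypothetical directory names treated as test locations.
TEST_DIRS = {"test", "tests", "__tests__", "spec", "specs"}


def is_test_path(path: str) -> bool:
    """Heuristically decide whether a changed file is a test file."""
    p = PurePosixPath(path.lower())
    in_test_dir = any(part in TEST_DIRS for part in p.parts[:-1])
    stem = p.stem  # file name without its final extension
    return (
        in_test_dir
        or stem.startswith("test_")
        or stem.endswith(("_test", ".test", ".spec"))
    )


def churn_ratio(changes):
    """Return test churn divided by non-test churn, or None if no non-test churn.

    `changes` is an iterable of (path, additions, deletions) tuples.
    """
    test_churn = code_churn = 0
    for path, added, deleted in changes:
        if is_test_path(path):
            test_churn += added + deleted
        else:
            code_churn += added + deleted
    return test_churn / code_churn if code_churn else None


# Hypothetical file-level changes from a single pull request.
changes = [
    ("src/app.py", 40, 20),        # production code: churn 60
    ("tests/test_app.py", 20, 5),  # test code: churn 25
    ("web/app.test.js", 5, 0),     # test code: churn 5
]
print(churn_ratio(changes))  # 0.5
```

Grouping the same computation by repository language would give one ratio per ecosystem.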

4) Review Dynamics

i. What aspects of Agentic-PRs (e.g., correctness, style, security, testing) receive the most attention during review?
ii. To what extent do Coding Agents address review comments?
iii. Which comment types are challenging for agents to resolve?

5) Failure Patterns and Risks

i. What common failure patterns and code quality issues appear in Agentic-PRs? Why do they occur?
ii. How can we leverage these insights to reduce failure rates, optimize human–AI collaboration, and improve AI model training that prioritizes learning from mistakes?
iii. How well can early signals (e.g., PR description, touched paths, and patch characteristics) predict Agentic-PR rejection or review effort?
iv. How frequently do Agentic-PRs introduce or mitigate security vulnerabilities?

We also suggest checking our preprint paper for more research questions and ideas: https://arxiv.org/abs/2507.15003

How to Participate in the Challenge

First, familiarize yourself with the AIDev dataset:

The details about the AIDev infrastructure and the data are provided in our preprint.
The dataset can be downloaded from either Hugging Face or Zenodo.
GitHub (example code & notebooks): https://github.com/SAILResearch/AI_Teammates_in_SE3.
An example Jupyter notebook demonstrating how to load and analyze the dataset is available here; you can also open it directly in Google Colab.
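
For a first exploratory pass, a per-agent merge rate is one simple statistic to compute once the pull request records are loaded. The sketch below uses only the Python standard library and a few synthetic rows. The column names agent and merged_at are assumptions made for this example, so check the dataset documentation for the actual schema, and note that the rows are invented and are not AIDev data.

```python
import csv
import io
from collections import Counter


def merge_rate_by_agent(rows):
    """Return the fraction of merged pull requests per agent.

    `rows` is an iterable of dicts with the assumed keys "agent" and
    "merged_at"; an empty "merged_at" means the pull request was not merged.
    """
    total = Counter()
    merged = Counter()
    for row in rows:
        agent = row["agent"]
        total[agent] += 1
        if row["merged_at"]:
            merged[agent] += 1
    return {agent: merged[agent] / total[agent] for agent in total}


# Synthetic rows standing in for an exported pull request table.
SAMPLE_CSV = (
    "agent,merged_at\n"
    "Devin,2025-06-01T12:00:00Z\n"
    "Devin,\n"
    "Cursor,2025-06-02T08:30:00Z\n"
    "Cursor,2025-06-03T17:45:00Z\n"
)

rates = merge_rate_by_agent(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(rates)  # {'Devin': 0.5, 'Cursor': 1.0}
```

The same function works on a file exported from the dataset by passing csv.DictReader an open file handle instead of the in-memory string.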

Use the dataset to answer your research questions, and report your findings in a four-page challenge paper that you submit to our challenge. If your paper is accepted, present your results at MSR 2026 in Rio de Janeiro, Brazil!

Submission

IMPORTANT: Accepted papers in the Mining Challenge will be published as Short Papers in the ACM Digital Library. Starting in 2026, all articles published by ACM will be made Open Access. This is greatly beneficial to the advancement of computer science and leads to increased usage and citation of research.

Most authors will be covered by ACM OPEN agreements by that point and will not have to pay Article Processing Charges (APC). Check if your institution participates in ACM OPEN.
Authors not covered by ACM OPEN agreements may have to pay APC; however, ACM is offering several automated and discretionary APC Waivers and Discounts.

A challenge paper should describe the results of your work by providing an introduction to the problem you address and why it is worth studying, the version of the dataset you used, the approach and tools you used, your results and their implications, and conclusions. Make sure your report highlights the contributions and the importance of your work. See also our open science policy regarding the publication of software and additional data you used for the challenge.

To ensure clarity and consistency in research submissions:

When detailing methodologies or presenting findings, authors should specify which snapshot/version of the AIDev dataset was utilized.
Given the continuous updates to the dataset, authors are reminded to be precise in their dataset references. This will help maintain transparency and ensure consistent replication of results.

All authors should use the official “ACM Primary Article Template”, as can be obtained from the ACM Proceedings Template page. LaTeX users should use the sigconf option, as well as the review (to produce line numbers for easy reference by the reviewers) and anonymous (omitting author names) options. To that end, the following LaTeX code can be placed at the start of the LaTeX document:

\documentclass[sigconf,review,anonymous]{acmart}
\acmConference[MSR 2026]{MSR '26: Proceedings of the 23rd International Conference on Mining Software Repositories}{April 2026}{Rio de Janeiro, Brazil}

Submissions to the Challenge Track can be made via the submission site by the submission deadline. We encourage authors to upload their paper info early (the PDF can be submitted later) to properly enter conflicts for anonymous reviewing. All submissions must adhere to the following requirements:

Submissions must not exceed the page limit (4 pages plus 1 additional page of references). The page limit is strict, and it will not be possible to purchase additional pages at any point in the process (including after acceptance).
Submissions must strictly conform to the ACM formatting instructions. Alterations of spacing, font size, and other changes that deviate from the instructions may result in desk rejection without further review.
Submissions must not reveal the authors’ identities. The authors must make every effort to honor the double-anonymous review process. In particular, the authors’ names must be omitted from the submission and references to their prior work should be in the third person. Further advice, guidance, and explanation about the double-anonymous review process can be found in the Q&A page for ICSE 2026.
Submissions should consider the ethical implications of the research conducted within a separate section before the conclusion.
The official publication date is the date the proceedings are made available in the ACM or IEEE Digital Libraries. This date may be up to two weeks prior to the first day of ICSE 2026. The official publication date affects the deadline for any patent filings related to published work.
Purchases of additional pages in the proceedings are not allowed.

Any submission that does not comply with these requirements is likely to be desk rejected by the PC Chairs without further review. In addition, by submitting to the MSR Challenge Track, the authors acknowledge that they are aware of and agree to be bound by the following policies:

The ACM Policy and Procedures on Plagiarism and the IEEE Plagiarism FAQ. In particular, papers submitted to MSR 2026 must not have been published elsewhere and must not be under review or submitted for review elsewhere whilst under consideration for MSR 2026. Contravention of this concurrent submission policy will be deemed a serious breach of scientific ethics, and appropriate action will be taken in all such cases (including immediate rejection and reporting of the incident to ACM/IEEE). To check for double submission and plagiarism issues, the chairs reserve the right to (1) share the list of submissions with the PC Chairs of other conferences with overlapping review periods and (2) use external plagiarism detection software, under contract to the ACM or IEEE, to detect violations of these policies.
The authorship policy of the ACM and the authorship policy of the IEEE.

Upon notification of acceptance, all authors of accepted papers will be asked to fill out a copyright form and will receive further instructions for preparing the camera-ready version of their papers. At least one author of each paper is expected to register and present the paper at the MSR 2026 conference. All accepted contributions will be published in the electronic proceedings of the conference.

The AIDev dataset can be cited as:

@article{li2025aidev,
title={{The Rise of AI Teammates in Software Engineering (SE) 3.0: How Autonomous Coding Agents Are Reshaping Software Engineering}}, 
author={Li, Hao and Zhang, Haoxiang and Hassan, Ahmed E.},
journal={arXiv preprint arXiv:2507.15003},
year={2025}
}

A preprint is available online: https://arxiv.org/abs/2507.15003

Submission Site

Papers must be submitted through HotCRP: https://msr2026-challenge.hotcrp.com/

Important Dates (AoE)

Abstract Deadline: Dec 18, 2025 (Optional, but encouraged to help us plan the review process)
Paper Deadline: Dec 23, 2025
Author Notification: Jan 19, 2026
Camera Ready Deadline: Jan 26, 2026

Open Science Policy

Openness in science is key to fostering progress via transparency, reproducibility and replicability. Our steering principle is that all research output should be accessible to the public and that empirical studies should be reproducible. In particular, we actively support the adoption of open data and open source principles. To increase reproducibility and replicability, we encourage all contributing authors to disclose:

the source code of the software they used to retrieve and analyze the data
the (anonymized and curated) empirical data they retrieved in addition to the AIDev dataset
a document with instructions for other researchers describing how to reproduce or replicate the results

Already upon submission, authors can privately share their anonymized data and software on archives such as Zenodo or Figshare (tutorial available here). Zenodo accepts up to 50GB per dataset (more upon request). There is no need to use Dropbox or Google Drive. After acceptance, data and software should be made public so that they receive a DOI and become citable. Zenodo and Figshare accounts can easily be linked with GitHub repositories to automatically archive software releases. In the unlikely case that authors need to upload terabytes of data, Archive.org may be used.

We recognise that anonymizing artifacts such as source code is more difficult than preserving anonymity in a paper. We ask authors to take a best-effort approach to not revealing their identities. We will also ask reviewers to avoid trying to identify authors by looking at commit histories and other such information that is not easily anonymized. Authors wanting to share GitHub repositories may want to look into https://anonymous.4open.science/, an open source tool that helps you quickly double-blind your repository.

We encourage authors to self-archive pre- and postprints of their papers in open, preserved repositories such as arXiv.org. This is legal and allowed by all major publishers including ACM and IEEE and it lets anybody in the world reach your paper. Note that you are usually not allowed to self-archive the PDF of the published article (that is, the publisher proof or the Digital Library version). Please note that the success of the open science initiative depends on the willingness (and possibilities) of authors to disclose their data and that all submissions will undergo the same review process independent of whether or not they disclose their analysis code or data. We encourage authors who cannot disclose industrial or otherwise non-public data, for instance due to non-disclosure agreements, to provide an explicit (short) statement in the paper.

Best Mining Challenge Paper Award

As mentioned above, all submissions will undergo the same review process independent of whether or not they disclose their analysis code or data. However, only accepted papers for which code and data are available on preserved archives, as described in the open science policy, will be considered by the program committee for the best mining challenge paper award.

Best Student Presentation Award

As in previous years, a public vote will be held during the conference to select the best mining challenge presentation. This award often goes to authors of compelling work who present an engaging story to the audience. Only students can compete for this award.

FAQ

Q1. Can we augment AIDev with additional data for the challenge?

Yes. You are welcome and encouraged to “bring your own data” (BYOD) by integrating the AIDev dataset with information from other public, readily available sources (e.g., GitHub REST/GraphQL APIs, repository clones, ecosystem registries). Please document all sources and extraction steps. We urge participants to thoroughly consider the ethical implications of merging the AIDev dataset with other sources. Sharing or using personally identifiable information (PII) is strictly prohibited.

Q2. Some PRs seem to have missing patch content. What happened and what should we do?

The dataset has been updated to include all available patches provided by the GitHub API. However, the GitHub API may omit content for large patches. If you need the exact patch in these cases, you may need to clone the source repositories and obtain the commit diffs locally. When using patches, verify and comply with the original repository licenses.
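
Obtaining a diff locally can be sketched as follows. This is a minimal sketch, assuming git is installed and the source repository has already been cloned; the helper names `build_show_command` and `commit_patch` are hypothetical and not part of the AIDev tooling.

```python
import subprocess
from typing import List, Optional


def build_show_command(repo_dir: str, sha: str, path: Optional[str] = None) -> List[str]:
    """Build a `git show` invocation that prints only the diff of one commit.

    Hypothetical helper for illustration; not part of AIDev.
    """
    # -C runs git inside the cloned repository.
    # --format= suppresses the commit header so that only the diff is printed.
    # --patch requests the full textual diff.
    cmd = ["git", "-C", repo_dir, "show", "--format=", "--patch", sha]
    if path is not None:
        # Everything after "--" is treated as a path and restricts the diff to it.
        cmd += ["--", path]
    return cmd


def commit_patch(repo_dir: str, sha: str, path: Optional[str] = None) -> str:
    """Run git and return the diff text; raises CalledProcessError if git fails."""
    result = subprocess.run(
        build_show_command(repo_dir, sha, path),
        capture_output=True,
        encoding="utf-8",
        errors="replace",  # patches may contain bytes that are not valid UTF-8
        check=True,
    )
    return result.stdout
```

For example, `commit_patch("path/to/clone", "<commit sha>", "src/module.py")` would return the diff of that one file for the given commit (the path and file name here are placeholders). Merge commits are displayed differently by `git show`, so they may need separate handling.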

Q3. Where is the data dictionary/table?

A data dictionary is available here: https://huggingface.co/datasets/hao-li/AIDev/blob/main/data_table.md

Q4. What do the `commit_stats_additions/deletions` vs. `additions/deletions` fields mean in `pr_commit_details`?

`commit_stats_additions`, `commit_stats_deletions`: Totals per commit (sum across all files touched in that commit).
`additions`, `deletions`: Per-file counts within that commit (for the specific file row).
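
The relationship between the two pairs of fields can be checked with a short script. This is a minimal sketch, assuming each row of `pr_commit_details` is available as a dictionary with non-null integer counts; the commit identifier column name `sha` and the function name `find_stat_mismatches` are assumptions made for illustration, so consult the data dictionary for the actual column names.

```python
from collections import defaultdict
from typing import Iterable, List, Mapping


def find_stat_mismatches(rows: Iterable[Mapping[str, object]],
                         commit_key: str = "sha") -> List[str]:
    """Return ids of commits whose per-file sums differ from the commit totals."""
    file_sums = defaultdict(lambda: [0, 0])  # commit id -> [additions, deletions]
    totals = {}                              # commit id -> (additions, deletions)
    for row in rows:
        cid = row[commit_key]
        # Per-file counts: one row per file touched in the commit.
        file_sums[cid][0] += int(row["additions"])
        file_sums[cid][1] += int(row["deletions"])
        # Commit-level totals are repeated on every file row of that commit.
        totals[cid] = (int(row["commit_stats_additions"]),
                       int(row["commit_stats_deletions"]))
    return [cid for cid, sums in file_sums.items() if tuple(sums) != totals[cid]]


# Made-up rows for illustration only; they are not taken from AIDev.
example_rows = [
    {"sha": "c1", "additions": 3, "deletions": 1,
     "commit_stats_additions": 5, "commit_stats_deletions": 1},
    {"sha": "c1", "additions": 2, "deletions": 0,
     "commit_stats_additions": 5, "commit_stats_deletions": 1},
    {"sha": "c2", "additions": 4, "deletions": 2,
     "commit_stats_additions": 10, "commit_stats_deletions": 2},
]
```

Here `find_stat_mismatches(example_rows)` returns `["c2"]`, because the single file row for `c2` does not add up to its commit-level total. In real data, such a mismatch could suggest that not every file row of the commit is present, which is worth checking before aggregating per-file counts.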

Call for Mining Challenge Proposals

The International Conference on Mining Software Repositories (MSR) has hosted a mining challenge since 2006. With this challenge, we call upon everyone interested to apply their tools to a common dataset. The challenge is for researchers and practitioners to bravely use their mining tools and approaches on a dare.

One of the secret ingredients behind the success of the International Conference on Mining Software Repositories (MSR) is its annual Mining Challenge, in which MSR participants can showcase their techniques, tools, and creativity on a common data set. In true MSR fashion, this data set is a real data set contributed by researchers in the community, solicited through an open call. There are many benefits of sharing a data set for the MSR Mining Challenge. The selected challenge proposal explaining the data set will appear in the MSR 2026 proceedings, and the challenge papers using the data set will be required to cite the challenge proposal or an existing paper of the researchers about the selected data set. Furthermore, the authors of the data set will join the MSR 2026 organizing committee as Mining Challenge (co-)chair(s), who will manage the reviewing process (e.g., recruiting a Challenge PC, managing submissions, and reviewing assignments). Finally, it is not uncommon for challenge data sets to feature in MSR and other publications well after the edition of the conference in which they appear!

If you would like to submit your dataset for consideration for the 2026 MSR Mining Challenge, prepare a short proposal (1-2 pages plus appendices, if needed) containing the following information:

Title of data set.
High-level overview:
- Short description, including what types of artifacts the data set contains.
- Summary statistics (how many artifacts of different types).
Internal structure:
- How are the data structured and organized?
- (Link to) Schema, if applicable
How to access:
- How can the data set be obtained?
- What are the recommended ways to access it? Include examples of specific tools, shell commands, etc., if applicable.
- What skills, infrastructure, and/or credentials would challenge participants need to effectively work with the data set?
What kinds of research questions do you expect challenge participants could answer?
A link to a (sub)sample of the data for the organizing committee to peruse (e.g., via GitHub, Zenodo, Figshare).

Submissions must conform to the IEEE conference proceedings template, specified in the IEEE Conference Proceedings Formatting Guidelines (title in 24pt font and full text in 10pt type; LaTeX users must use \documentclass[10pt,conference]{IEEEtran} without including the compsoc or compsocconf options). Submit your proposal here.

The first task of the authors of the selected proposal will be to prepare the Call for Challenge Papers, which outlines the expected content and structure of submissions, as well as the technical details of how to access and analyze the dataset. This call will be published on the MSR website on September 2nd. By making the challenge data set available by late summer, we hope that many students will be able to use the challenge data set for their graduate class projects in the Fall semester.

Important dates:

Submission site: https://msr2026-miningchallenge.hotcrp.com/

Deadline for proposals: August 20, 2025

Notification: August 28, 2025

Call for Challenge Papers Published: September 15, 2025


Hao Li (Mining Challenge Co-Chair), Queen's University, Canada
Haoxiang Zhang (Mining Challenge Co-Chair), Queen's University, Canada
Ahmad Abdellatif (Committee Member), University of Calgary, Canada
Mohammad Abdollahi (Committee Member), York University, Canada
Saima Afrin (Committee Member), William and Mary, United States
Md Ahasanuzzaman (Committee Member), Queen's University, Canada
Adekunle Ajibode (Committee Member), Queen's University, Canada
Khairul Alam (Committee Member), University of Saskatchewan, Canada
Mohamed Almukhtar (Committee Member), University of Michigan-Flint
Giusy Annunziata (Committee Member), University of Salerno, Italy
Muhammad Asaduzzaman (Committee Member), University of Windsor
Abdul Ali Bangash (Committee Member), Lahore University of Management Sciences, Pakistan
Oussama Ben Sghaier (Committee Member), Queen's University
Narjes Bessghaier (Committee Member), SE researcher, Canada
Ricardo Andres Calvo Mendez (Committee Member), Purdue University, United States
Genevieve Caumartin (Committee Member), Concordia University, Canada
Debasish Chakroborti (Committee Member), University of Saskatchewan
Arifa Islam Champa (Committee Member), Idaho State University, United States
Shi Chang (Committee Member)
An Ran Chen (Committee Member), University of Alberta, Canada
Shaiful Chowdhury (Committee Member), University of Manitoba, Canada
Shamse Tasnim Cynthia (Committee Member), University of Saskatchewan, Canada
James C. Davis (Committee Member), Purdue University, United States
Zishuo Ding (Committee Member), The Hong Kong University of Science and Technology (Guangzhou), China
Kalvin Eng (Committee Member)
Arianna Fedeli (Committee Member), Gran Sasso Science Institute (GSSI), Italy
Eduardo Figueiredo (Committee Member), Federal University of Minas Gerais
