JavaBackports: A Dataset for Benchmarking Automated Backporting in Java
This program is tentative and subject to change.
Manually backporting critical patches to long-term support versions is both error-prone and often overlooked, resulting in substantial security risks. Progress in this area is constrained by the absence of datasets that capture the semantic complexities across versions, inherent to backporting in large Java ecosystems. To address this gap, we present JavaBackports, a curated dataset of 474 real-world backport instances, systematically selected and manually validated from more than 11,000 candidate patches in eight widely used open-source Java projects: Druid, Elasticsearch, Hadoop, Kafka, and four major JDK versions (jdk8, jdk11, jdk17, jdk21). To assess the utility of JavaBackports, we conduct preliminary experiments using state-of-the-art Large Language Models (LLMs) to automatically generate backported patches. The results indicate that current LLMs struggle with backporting tasks, particularly when the required changes involve non-trivial logical or structural modifications. These findings demonstrate both the difficulty of the problem and the potential of JavaBackports to stimulate new research directions in automated software maintenance and repair.
| JavaBackports Presentation Video (JavaBackports_presentation_video.mp4) | 10.56MiB |
| JavaBackports Presentation (JavaBackports_msr_2026_presentation.pdf) | 590KiB |
Mon 13 Apr (displayed time zone: Brasilia, Distrito Federal, Brazil)
16:00 - 17:30 | Session 3-B: Demo and Tools. Data and Tool Showcase Track / Registered Reports / MSR Program at Catering and Exhibition Hall (Europa I to IV). Chair(s): Klaas-Jan Stol Lero; University College Cork; SINTEF Digital, Minghui Zhou Peking University | ||
16:00 50mTalk | MOOT: a Repository of many Multi-objective Optimization Tasks Data and Tool Showcase Track Tim Menzies North Carolina State University, Tao Chen University of Birmingham, Yulong Ye University of Birmingham, Kishan Kumar Ganguly NC State, Amirali Rayegan NC State, Srinath Srinivasan North Carolina State University, Andre Lustosa North Carolina State University Pre-print | ||
16:00 50mTalk | Mapping Decentralized Autonomous Organization Governance Across Chains: An Updated, Multi-Platform Dataset Data and Tool Showcase Track Mashiat Amin Farin University of Texas at Dallas, Samer Hassan Institute of Knowledge Technology, Universidad Complutense de Madrid, Madrid, Spain & Berkman Klein Center at Harvard University, Cambridge MA, USA, Javier Arroyo Dpt. of Computer Science, Universidad de Alcalá, Madrid, Spain & Institute of Knowledge Technology, Universidad Complutense de Madrid Madrid, Spain | ||
16:00 50mTalk | Web3BlockSet: A Dataset for Empirical Research in Blockchain-Oriented Software Engineering Data and Tool Showcase Track Pamella Soares State University of Ceara (UECE), Giuseppe Destefanis University College London, Allan C. N. dos Santos Universidade Federal Fluminense, Allysson Allex Araújo Federal University of Cariri, Raphael Saraiva State University of Ceara, Jerffeson Teixeira de Souza State University of Ceara, Brazil | ||
16:00 50mTalk | Assessing Task-based Chatbots: Snapshot and Curated Datasets for Dialogflow Data and Tool Showcase Track Elena Masserini University of Milano - Bicocca, Diego Clerissi University of Milano-Bicocca, Daniela Micucci University of Milano-Bicocca, Italy, Leonardo Mariani University of Milano-Bicocca | ||
16:00 50mTalk | LILA: Decentralized Build Reproducibility Monitoring for the Functional Package Management Model Data and Tool Showcase Track Julien Malka LTCI, Télécom Paris, Institut Polytechnique de Paris, France, Arnout Engelen Independent | ||
16:00 50mTalk | Mining Kubernetes Repositories: The Cloud was Not Built in a Day Data and Tool Showcase Track Giuseppe Destefanis University College London, Silvia Bartolucci University College London, Daniel Feitosa University of Groningen | ||
16:00 50mTalk | RustXec: A Vulnerability Reproduction Dataset for Assessing Security Risks in Open-Source Rust Applications Data and Tool Showcase Track Zhengjie Ji Virginia Tech, Xin Wang Virginia Tech, Wang Lingxiang Unaffiliated, Geng Li Wake Forest University, Fan Yang Wake Forest University, Ying Zhang Wake Forest University | ||
16:00 50mTalk | OSSGameBench: A Large-Scale Dataset of Development Activities in Open-Source Video Games Data and Tool Showcase Track DOI Pre-print | ||
16:00 50mTalk | JavaBackports: A Dataset for Benchmarking Automated Backporting in Java Data and Tool Showcase Track Kaushal Kahapola University of Moratuwa, Sri Lanka, Sharada Galappaththi University of Moratuwa, Sri Lanka, Dinith Ranasinghe University of Moratuwa, Sri Lanka, Ridwan Salihin Shariffdeen SonarSource, Nisansa de Silva University of Moratuwa, Sri Lanka, Srinath Perera WSO2, Sandareka Wickramanayake University of Moratuwa, Sri Lanka Media Attached File Attached | ||
16:00 50mTalk | HackRep: A Large-Scale Dataset of GitHub Hackathon Projects Data and Tool Showcase Track Sjoerd Halmans Eindhoven University of Technology, Lavinia Francesca Paganini Eindhoven University of Technology, Alexander Serebrenik Eindhoven University of Technology, Alexander Nolte Eindhoven University of Technology | ||
16:00 50mTalk | A Large-Scale Dataset of MCP Implementations on GitHub Data and Tool Showcase Track Benny Toeppe Oakland University, Amine Barrak Oakland University, USA, Emna Ksontini University of North Carolina Wilmington | ||
16:00 50mTalk | SMELLDroid: A Dataset for Code Smells in Android Apps Data and Tool Showcase Track Joyce Champie Florida Polytechnic University, Karim Elish Florida Polytechnic University, Mahmoud Elish Florida Polytechnic University | ||
16:00 50mTalk | KubeObjects: A Dataset of Real-World Kubernetes Objects Data and Tool Showcase Track Matteo Grella University of Twente, Danil Aliforenko University of Twente, Luca Mariot University of Twente | ||
16:00 50mTalk | IssuePilot: An Agentic Framework for Personalized Issue Recommendation and Onboarding in Open-Source Projects Data and Tool Showcase Track | ||
16:00 50mTalk | VoxTransPD: A Reusable Framework for Noise-Resilient Speech Analysis and Early Parkinson's Detection Data and Tool Showcase Track | ||
16:00 50mTalk | DBSecQA: A Curated Dataset of Developer Discussions on Database Security from Stack Exchange Data and Tool Showcase Track Md Rakibul Islam Lamar University, Farha Kamal Lamar University, MD HUMAUN KABIR Lamar University, Md Murad Sharif Lamar University | ||
16:00 50mTalk | AndroMetric: Bridging Multi-Dimensional Software Metrics and Mobile Application Security Data and Tool Showcase Track | ||
16:00 50mTalk | PoolinGH: Fast, Efficient, and Robust GitHub Repository Mining Data and Tool Showcase Track Maxime ANDRÉ Namur Digital Institute, University of Namur, Marco Raglianti REVEAL @ Software Institute – USI, Lugano, Switzerland, Souhaila Serbout University of Zurich, Zurich, Switzerland, Anthony Cleve University of Namur, Michele Lanza Software Institute - USI, Lugano Pre-print | ||
16:00 50mTalk | How Does Experience Influence Developer Perceptions of Atoms of Confusion? Registered Reports Guoshuai Shi University of Waterloo, Farshad Kazemi University of Waterloo, Shane McIntosh University of Waterloo, Michael W. Godfrey University of Waterloo, Canada DOI Pre-print | ||
16:00 50mTalk | SQuaD: The Software Quality Dataset Data and Tool Showcase Track Mikel Robredo University of Oulu, Matteo Esposito University of Oulu, Davide Taibi University of Southern Denmark and University of Oulu, Rafael Penaloza University of Milano-Bicocca, Valentina Lenarduzzi University of Southern Denmark Pre-print | ||
16:00 50mTalk | GivenWhenThen: A Dataset of BDD Test Scenarios Mined from Open Source Projects Data and Tool Showcase Track Luciano Belo de Alcântara Júnior UFMG, João Eduardo Montandon Universidade Federal de Minas Gerais (UFMG) | ||
16:00 50mTalk | World of Logs: A Dataset of Logs from Online Documents Data and Tool Showcase Track Xiaohui Wang University of Waterloo, Kundi Yao Ontario Tech University, Lizhi Liao University of Guelph, Pengyu Nie University of Waterloo, Xuan Zhang Yunnan University, Weiyi Shang University of Waterloo | ||
16:00 50mTalk | AnoMod: A Dataset for Anomaly Detection and Root Cause Analysis in Microservice System Data and Tool Showcase Track Ke Ping University of Helsinki, Hamza Bin Mazhar University of Helsinki, Yuqing Wang University of Helsinki, Finland, Ying Song University of Helsinki, Mika Mäntylä University of Helsinki and University of Oulu | ||
16:00 50mTalk | Causal Inference for the Effect of Code Coverage on Bug Introduction Registered Reports Lukas Schulte University of Passau, Gordon Fraser University of Passau, Steffen Herbold University of Passau DOI Pre-print | ||
16:00 50mTalk | GLiSE: A Prompt-Driven and ML-Powered Tool for Automated Grey Literature Extraction in Software Engineering Data and Tool Showcase Track Brahim Mahmoudi École de technologie supérieure, Zacharie Chenail-Larcher École de technologie supérieure (ÉTS), Houcine Abdelkader Cherief Ecole de Technologie Supérieure, Quentin Stiévenart Université du Québec à Montréal, Naouel Moha École de Technologie Supérieure (ETS), Florent AVELLANEDA Université du Québec à Montréal | ||
16:00 50mTalk | OmniCCG: Agnostic Code Clone Genealogy Extractor Data and Tool Showcase Track Denis Sousa State University of Ceara, Brazil, Matheus Paixao State University of Ceará, Thiago Lima State University of Ceara, Brazil, Adriely Silva State University of Ceara, Brazil, Italo Uchoa State University of Ceará, Chaiyong Ragkhitwetsagul Mahidol University | ||
16:00 50mTalk | GitEvo: Code Evolution Analysis for Git Repositories Data and Tool Showcase Track Andre Hora UFMG Pre-print Media Attached | ||
16:00 50mTalk | Skyt: Prompt Contracts for Software Repeatability in LLM-Assisted Development Data and Tool Showcase Track Heitor Roriz Filho Massimus, Nasser Jazdi University of Stuttgart, Vicente Lucena Universidade Federal do Amazonas | ||
16:00 50mTalk | AndroT: A Dataset of Android Apps with Tests Data and Tool Showcase Track | ||
16:00 50mTalk | InEx-Bug: A Human Annotated Dataset of Intrinsic and Extrinsic Bugs in the NPM Ecosystem Data and Tool Showcase Track Tanner Wright University of British Columbia, Adams Chen University of British Columbia, Gema Rodríguez-Pérez Department of Computer Science, Mathematics, Physics and Statistics, University of British Columbia, Okanagan Campus | ||