JavaBackports: A Dataset for Benchmarking Automated Backporting in Java
Manually backporting critical patches to long-term support versions is both error-prone and often overlooked, resulting in substantial security risks. Progress in this area is constrained by the absence of datasets that capture the cross-version semantic complexities inherent to backporting in large Java ecosystems. To address this gap, we present JavaBackports, a curated dataset of 474 real-world backport instances, systematically selected and manually validated from more than 11,000 candidate patches across eight widely used open-source Java projects: Druid, Elasticsearch, Hadoop, Kafka, and four major JDK versions (jdk8, jdk11, jdk17, jdk21). To assess the utility of JavaBackports, we conduct preliminary experiments using state-of-the-art Large Language Models (LLMs) to automatically generate backported patches. The results indicate that current LLMs struggle with backporting tasks, particularly when the required changes involve non-trivial logical or structural modifications. These findings demonstrate both the difficulty of the problem and the potential of JavaBackports to stimulate new research directions in automated software maintenance and repair.