ICSE 2026 (series) / MSR 2026 (series) / Tutorials /
Running Large Language Models at Scale for Mining Software Repositories: Lessons Learned from HPC-Based Batch Inference
This program is tentative and subject to change.
Mon 13 Apr 2026 16:50 - 17:30 at Oceania IV - Session 3-B: Demo and tool & Tutorial
Mon 13 Apr
Displayed time zone: Brasilia, Distrito Federal, Brazil
16:00 - 17:30 | Session 3-B: Demo and tool & Tutorial | Data and Tool Showcase Track / Tutorials / MSR Program at Oceania IV
16:00 | 90m | Talk | MOOT: a Repository of many Multi-objective Optimization Tasks | Data and Tool Showcase Track | Tim Menzies North Carolina State University, Tao Chen University of Birmingham, Yulong Ye University of Birmingham, Kishan Kumar Ganguly NC State, Amirali Rayegan NC State, Srinath Srinivasan North Carolina State University, Andre Lustosa North Carolina State University
16:00 | 90m | Talk | Mapping Decentralized Autonomous Organization Governance Across Chains: An Updated, Multi-Platform Dataset | Data and Tool Showcase Track | Mashiat Amin Farin University of Texas at Dallas, Samer Hassan Institute of Knowledge Technology, Universidad Complutense de Madrid, Madrid, Spain & Berkman Klein Center at Harvard University, Cambridge MA, USA, Javier Arroyo Dpt. of Computer Science, Universidad de Alcalá, Madrid, Spain & Institute of Knowledge Technology, Universidad Complutense de Madrid, Madrid, Spain
16:00 | 90m | Talk | Assessing Task-based Chatbots: Snapshot and Curated Datasets for Dialogflow | Data and Tool Showcase Track | Elena Masserini University of Milano - Bicocca, Diego Clerissi University of Milano-Bicocca, Daniela Micucci University of Milano-Bicocca, Italy, Leonardo Mariani University of Milano-Bicocca
16:00 | 90m | Talk | LILA: Decentralized Build Reproducibility Monitoring for the Functional Package Management Model | Data and Tool Showcase Track | Julien Malka LTCI, Télécom Paris, Institut Polytechnique de Paris, France, Arnout Engelen Independent
16:00 | 90m | Talk | Mining Kubernetes Repositories: The Cloud was Not Built in a Day | Data and Tool Showcase Track | Giuseppe Destefanis University College London, Silvia Bartolucci University College London, Daniel Feitosa University of Groningen
16:00 | 90m | Talk | RustXec: A Vulnerability Reproduction Dataset for Assessing Security Risks in Open-Source Rust Applications | Data and Tool Showcase Track | Zhengjie Ji Virginia Tech, Xin Wang Virginia Tech, Wang Lingxiang Unaffiliated, Geng Li Wake Forest University, Fan Yang Wake Forest University, Ying Zhang Wake Forest University
16:00 | 90m | Talk | OSSGameBench: A Large-Scale Dataset of Development Activities in Open-Source Video Games | Data and Tool Showcase Track
16:00 | 90m | Talk | JavaBackports: A Dataset for Benchmarking Automated Backporting in Java | Data and Tool Showcase Track | Kaushal Kahapola University of Moratuwa, Sri Lanka, Sharada Galappaththi University of Moratuwa, Sri Lanka, Dinith Ranasinghe University of Moratuwa, Sri Lanka, Ridwan Salihin Shariffdeen SonarSource, Nisansa de Silva University of Moratuwa, Sri Lanka, Srinath Perera WSO2, Sandareka Wickramanayake University of Moratuwa, Sri Lanka
16:00 | 90m | Talk | HackRep: A Large-Scale Dataset of GitHub Hackathon Projects | Data and Tool Showcase Track | Sjoerd Halmans Eindhoven University of Technology, Lavinia Francesca Paganini Eindhoven University of Technology, Alexander Serebrenik Eindhoven University of Technology, Alexander Nolte Eindhoven University of Technology
16:00 | 90m | Talk | KubeObjects: A Dataset of Real-World Kubernetes Objects | Data and Tool Showcase Track | Matteo Grella University of Twente, Danil Aliforenko University of Twente, Luca Mariot University of Twente
16:00 | 90m | Talk | IssuePilot: An Agentic Framework for Personalized Issue Recommendation and Onboarding in Open-Source Projects | Data and Tool Showcase Track
16:00 | 90m | Talk | DBSecQA: A Curated Dataset of Developer Discussions on Database Security from Stack Exchange | Data and Tool Showcase Track | Md Rakibul Islam Lamar University, Farha Kamal Lamar University, MD HUMAUN KABIR Lamar University, Md Murad Sharif Lamar University
16:00 | 90m | Talk | PoolinGH: Fast, Efficient, and Robust GitHub Repository Mining | Data and Tool Showcase Track | Maxime ANDRÉ Namur Digital Institute, University of Namur, Marco Raglianti REVEAL @ Software Institute – USI, Lugano, Switzerland, Souhaila Serbout University of Zurich, Zurich, Switzerland, Anthony Cleve University of Namur, Michele Lanza Software Institute - USI, Lugano
16:00 | 90m | Talk | GivenWhenThen: A Dataset of BDD Test Scenarios Mined from Open Source Projects | Data and Tool Showcase Track | Luciano Belo de Alcântara Júnior UFMG, João Eduardo Montandon Universidade Federal de Minas Gerais (UFMG)
16:00 | 90m | Talk | AnoMod: A Dataset for Anomaly Detection and Root Cause Analysis in Microservice System | Data and Tool Showcase Track | Ke Ping University of Helsinki, Hamza Bin Mazhar University of Helsinki, Yuqing Wang University of Helsinki, Finland, Ying Song University of Helsinki, Mika Mäntylä University of Helsinki and University of Oulu
16:00 | 90m | Talk | GLiSE: A Prompt-Driven and ML-Powered Tool for Automated Grey Literature Extraction in Software Engineering | Data and Tool Showcase Track | Brahim Mahmoudi École de technologie supérieure, Zacharie Chenail-Larcher École de technologie supérieure (ÉTS), Houcine Abdelkader Cherief Ecole de Technologie Supérieure, Quentin Stiévenart Université du Québec à Montréal, Naouel Moha École de Technologie Supérieure (ETS), Florent AVELLANEDA Université du Québec à Montréal
16:00 | 90m | Talk | OmniCCG: Agnostic Code Clone Genealogy Extractor | Data and Tool Showcase Track | Denis Sousa State University of Ceara, Brazil, Matheus Paixao State University of Ceará, Thiago Lima State University of Ceara, Brazil, Adriely Silva State University of Ceara, Brazil, Italo Uchoa State University of Ceará, Chaiyong Ragkhitwetsagul Mahidol University
16:00 | 90m | Talk | GitEvo: Code Evolution Analysis for Git Repositories | Data and Tool Showcase Track | Andre Hora UFMG
16:00 | 90m | Talk | Skyt: Prompt Contracts for Software Repeatability in LLM-Assisted Development | Data and Tool Showcase Track | Heitor Roriz Filho Massimus, Nasser Jazdi University of Stuttgart, Vicente Lucena Universidade Federal do Amazonas
16:00 | 90m | Talk | InEx-Bug: A Human Annotated Dataset of Intrinsic and Extrinsic Bugs in the NPM Ecosystem | Data and Tool Showcase Track | Tanner Wright University of British Columbia, Adams Chen University of British Columbia, Gema Rodríguez-Pérez Department of Computer Science, Mathematics, Physics and Statistics, University of British Columbia, Okanagan Campus
16:50 | 40m | Talk | Running Large Language Models at Scale for Mining Software Repositories: Lessons Learned from HPC-Based Batch Inference | Tutorials | Ruoyu Su, Matteo Esposito University of Oulu, Davide Taibi University of Southern Denmark and University of Oulu, Valentina Lenarduzzi University of Southern Denmark