SQuaD: The Software Quality Dataset
Software quality research increasingly relies on large-scale datasets that measure both the product and process aspects of software systems. However, existing resources often cover only a few dimensions, such as code smells, technical debt, or refactoring activity, which restricts comprehensive analyses across time and quality dimensions. To address this gap, we present the Software Quality Dataset (SQuaD), a multi-dimensional, time-aware collection of software quality metrics extracted from 450 mature open-source projects across diverse ecosystems, including Apache, Mozilla, FFmpeg, and the Linux kernel. By integrating nine state-of-the-art static analysis tools (SonarQube, CodeScene, PMD, Understand, CK, JaSoMe, RefactoringMiner, RefactoringMiner++, and PyRef), the dataset unifies over 700 unique metrics at the method, class, file, and project levels. Covering 63,586 analyzed project releases in total, SQuaD also provides version control and issue-tracking histories, software vulnerability data (CVE/CWE), and process metrics proven to enhance Just-In-Time (JIT) defect prediction. SQuaD enables empirical research on maintainability, technical debt, software evolution, and quality assessment at unprecedented scale. We also outline emerging research directions, including automated dataset updates and cross-project quality modeling, to support the continuous evolution of software analytics. The dataset is publicly available on Zenodo (DOI: 10.5281/zenodo.17541458).
Mon 13 Apr (displayed time zone: Brasilia, Distrito Federal, Brazil)
16:00 - 17:30 | Session 3-B: Demo and Tools | Data and Tool Showcase Track / Registered Reports / MSR Program | Catering and Exhibition Hall (Europa I to IV) | Chair(s): Klaas-Jan Stol Lero; University College Cork; SINTEF Digital, Minghui Zhou Peking University
16:00 | 50m | Talk | MOOT: a Repository of many Multi-objective Optimization Tasks | Data and Tool Showcase Track | Tim Menzies North Carolina State University, Tao Chen University of Birmingham, Yulong Ye University of Birmingham, Kishan Kumar Ganguly NC State, Amirali Rayegan NC State, Srinath Srinivasan North Carolina State University, Andre Lustosa North Carolina State University
16:00 | 50m | Talk | Mapping Decentralized Autonomous Organization Governance Across Chains: An Updated, Multi-Platform Dataset | Data and Tool Showcase Track | Mashiat Amin Farin University of Texas at Dallas, Samer Hassan Institute of Knowledge Technology, Universidad Complutense de Madrid, Madrid, Spain & Berkman Klein Center at Harvard University, Cambridge MA, USA, Javier Arroyo Dpt. of Computer Science, Universidad de Alcalá, Madrid, Spain & Institute of Knowledge Technology, Universidad Complutense de Madrid, Madrid, Spain
16:00 | 50m | Talk | Web3BlockSet: A Dataset for Empirical Research in Blockchain-Oriented Software Engineering | Data and Tool Showcase Track | Pamella Soares State University of Ceara (UECE), Giuseppe Destefanis University College London, Allan C. N. dos Santos Universidade Federal Fluminense, Allysson Allex Araújo Federal University of Cariri, Raphael Saraiva State University of Ceara, Jerffeson Teixeira de Souza State University of Ceara, Brazil
16:00 | 50m | Talk | Assessing Task-based Chatbots: Snapshot and Curated Datasets for Dialogflow | Data and Tool Showcase Track | Elena Masserini University of Milano - Bicocca, Diego Clerissi University of Milano-Bicocca, Daniela Micucci University of Milano-Bicocca, Italy, Leonardo Mariani University of Milano-Bicocca
16:00 | 50m | Talk | LILA: Decentralized Build Reproducibility Monitoring for the Functional Package Management Model | Data and Tool Showcase Track | Julien Malka LTCI, Télécom Paris, Institut Polytechnique de Paris, France, Arnout Engelen Independent
16:00 | 50m | Talk | Mining Kubernetes Repositories: The Cloud was Not Built in a Day | Data and Tool Showcase Track | Giuseppe Destefanis University College London, Silvia Bartolucci University College London, Daniel Feitosa University of Groningen
16:00 | 50m | Talk | RustXec: A Vulnerability Reproduction Dataset for Assessing Security Risks in Open-Source Rust Applications | Data and Tool Showcase Track | Zhengjie Ji Virginia Tech, Xin Wang Virginia Tech, Wang Lingxiang Unaffiliated, Geng Li Wake Forest University, Fan Yang Wake Forest University, Ying Zhang Wake Forest University
16:00 | 50m | Talk | OSSGameBench: A Large-Scale Dataset of Development Activities in Open-Source Video Games | Data and Tool Showcase Track
16:00 | 50m | Talk | JavaBackports: A Dataset for Benchmarking Automated Backporting in Java | Data and Tool Showcase Track | Kaushal Kahapola University of Moratuwa, Sri Lanka, Sharada Galappaththi University of Moratuwa, Sri Lanka, Dinith Ranasinghe University of Moratuwa, Sri Lanka, Ridwan Salihin Shariffdeen SonarSource, Nisansa de Silva University of Moratuwa, Sri Lanka, Srinath Perera WSO2, Sandareka Wickramanayake University of Moratuwa, Sri Lanka
16:00 | 50m | Talk | HackRep: A Large-Scale Dataset of GitHub Hackathon Projects | Data and Tool Showcase Track | Sjoerd Halmans Eindhoven University of Technology, Lavinia Francesca Paganini Eindhoven University of Technology, Alexander Serebrenik Eindhoven University of Technology, Alexander Nolte Eindhoven University of Technology
16:00 | 50m | Talk | A Large-Scale Dataset of MCP Implementations on GitHub | Data and Tool Showcase Track | Benny Toeppe Oakland University, Amine Barrak Oakland University, USA, Emna Ksontini University of North Carolina Wilmington
16:00 | 50m | Talk | SMELLDroid: A Dataset for Code Smells in Android Apps | Data and Tool Showcase Track | Joyce Champie Florida Polytechnic University, Karim Elish Florida Polytechnic University, Mahmoud Elish Florida Polytechnic University
16:00 | 50m | Talk | KubeObjects: A Dataset of Real-World Kubernetes Objects | Data and Tool Showcase Track | Matteo Grella University of Twente, Danil Aliforenko University of Twente, Luca Mariot University of Twente
16:00 | 50m | Talk | IssuePilot: An Agentic Framework for Personalized Issue Recommendation and Onboarding in Open-Source Projects | Data and Tool Showcase Track
16:00 | 50m | Talk | VoxTransPD: A Reusable Framework for Noise-Resilient Speech Analysis and Early Parkinson's Detection | Data and Tool Showcase Track
16:00 | 50m | Talk | DBSecQA: A Curated Dataset of Developer Discussions on Database Security from Stack Exchange | Data and Tool Showcase Track | Md Rakibul Islam Lamar University, Farha Kamal Lamar University, MD HUMAUN KABIR Lamar University, Md Murad Sharif Lamar University
16:00 | 50m | Talk | AndroMetric: Bridging Multi-Dimensional Software Metrics and Mobile Application Security | Data and Tool Showcase Track
16:00 | 50m | Talk | PoolinGH: Fast, Efficient, and Robust GitHub Repository Mining | Data and Tool Showcase Track | Maxime ANDRÉ Namur Digital Institute, University of Namur, Marco Raglianti REVEAL @ Software Institute – USI, Lugano, Switzerland, Souhaila Serbout Quantena AG, Anthony Cleve University of Namur, Michele Lanza Software Institute - USI, Lugano
16:00 | 50m | Talk | How Does Experience Influence Developer Perceptions of Atoms of Confusion? | Registered Reports | Guoshuai Shi University of Waterloo, Farshad Kazemi University of Waterloo, Shane McIntosh University of Waterloo, Michael W. Godfrey University of Waterloo, Canada
16:00 | 50m | Talk | SQuaD: The Software Quality Dataset | Data and Tool Showcase Track | Mikel Robredo University of Oulu, Matteo Esposito University of Oulu, Davide Taibi University of Southern Denmark and University of Oulu, Rafael Penaloza University of Milano-Bicocca, Valentina Lenarduzzi University of Southern Denmark
16:00 | 50m | Talk | GivenWhenThen: A Dataset of BDD Test Scenarios Mined from Open Source Projects | Data and Tool Showcase Track | Luciano Belo de Alcântara Júnior UFMG, João Eduardo Montandon Universidade Federal de Minas Gerais (UFMG)
16:00 | 50m | Talk | World of Logs: A Dataset of Logs from Online Documents | Data and Tool Showcase Track | Xiaohui Wang University of Waterloo, Kundi Yao Ontario Tech University, Lizhi Liao University of Guelph, Pengyu Nie University of Waterloo, Xuan Zhang Yunnan University, Weiyi Shang University of Waterloo
16:00 | 50m | Talk | AnoMod: A Dataset for Anomaly Detection and Root Cause Analysis in Microservice System | Data and Tool Showcase Track | Ke Ping University of Helsinki, Hamza Bin Mazhar University of Helsinki, Yuqing Wang University of Helsinki, Finland, Ying Song University of Helsinki, Mika Mäntylä University of Helsinki and University of Oulu
16:00 | 50m | Talk | Causal Inference for the Effect of Code Coverage on Bug Introduction | Registered Reports | Lukas Schulte University of Passau, Gordon Fraser University of Passau, Steffen Herbold University of Passau
16:00 | 50m | Talk | GLiSE: A Prompt-Driven and ML-Powered Tool for Automated Grey Literature Extraction in Software Engineering | Data and Tool Showcase Track | Brahim Mahmoudi École de technologie supérieure, Zacharie Chenail-Larcher École de technologie supérieure (ÉTS), Houcine Abdelkader Cherief Ecole de Technologie Supérieure, Quentin Stiévenart Université du Québec à Montréal, Naouel Moha École de Technologie Supérieure (ETS), Florent AVELLANEDA Université du Québec à Montréal
16:00 | 50m | Talk | OmniCCG: Agnostic Code Clone Genealogy Extractor | Data and Tool Showcase Track | Denis Sousa State University of Ceara, Brazil, Matheus Paixao State University of Ceará, Thiago Lima State University of Ceara, Brazil, Adriely Silva State University of Ceara, Brazil, Italo Uchoa State University of Ceará, Chaiyong Ragkhitwetsagul Mahidol University
16:00 | 50m | Talk | GitEvo: Code Evolution Analysis for Git Repositories | Data and Tool Showcase Track | Andre Hora UFMG
16:00 | 50m | Talk | Skyt: Prompt Contracts for Software Repeatability in LLM-Assisted Development | Data and Tool Showcase Track | Heitor Roriz Filho Massimus, Nasser Jazdi University of Stuttgart, Vicente Lucena Universidade Federal do Amazonas
16:00 | 50m | Talk | AndroT: A Dataset of Android Apps with Tests | Data and Tool Showcase Track
16:00 | 50m | Talk | InEx-Bug: A Human Annotated Dataset of Intrinsic and Extrinsic Bugs in the NPM Ecosystem | Data and Tool Showcase Track | Tanner Wright University of British Columbia, Adams Chen University of British Columbia, Gema Rodríguez-Pérez Department of Computer Science, Mathematics, Physics and Statistics, University of British Columbia, Okanagan Campus