A Blueprint for Trustworthy Code Annotation at Scale: An LLM-Powered Pipeline for Industrial Software Analytics
In modern software ecosystems like Kotlin/Android, data-driven quality assurance is often stalled by a critical bottleneck: the scarcity of high-quality, expertly labeled datasets. Manual annotation is economically unviable and does not scale. This presentation introduces a validated, production-grade blueprint for automating code annotation, transforming noisy commit histories into high-confidence data for software analytics.
We present a practical two-stage pipeline acting as a "classifier-as-a-service." First, we employ MSR-driven mining to filter candidate pools from over 75,000 industrial commits. Second, we apply an ensemble of large language models (each exceeding 20 billion parameters) acting as a virtual expert panel. By leveraging Chain-of-Thought prompting and enforcing consensus logic, our approach mitigates the hallucinations and biases of individual models.
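The consensus logic of the virtual expert panel can be sketched as a simple voting rule: a commit receives a label only when enough of the ensemble's models agree, and disagreements are routed elsewhere (e.g., to human review). The function name, labels, and threshold below are illustrative assumptions, not the pipeline's actual implementation.

```python
from collections import Counter

def consensus_label(model_labels, min_agreement=1.0):
    """Return the panel's label only when agreement meets the threshold.

    model_labels: labels independently emitted by each LLM in the ensemble
    min_agreement: fraction of models that must agree (1.0 = unanimous)
    """
    if not model_labels:
        return None
    # Take the most common label and check whether it clears the bar.
    label, votes = Counter(model_labels).most_common(1)[0]
    if votes / len(model_labels) >= min_agreement:
        return label
    return None  # no consensus: abstain rather than emit a noisy label

# A unanimous panel keeps the label; a split panel abstains.
print(consensus_label(["bug-fix", "bug-fix", "bug-fix"]))  # bug-fix
print(consensus_label(["bug-fix", "refactor", "bug-fix"]))  # None
```

Requiring unanimity (the default here) trades recall for precision, which matters when the labels feed downstream analytics; relaxing `min_agreement` to a majority threshold recovers more commits at the cost of noisier annotations.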
Key contributions include:
- Empirical Reliability: Validation demonstrating 86.84% unanimous agreement between our LLM consensus and senior human experts.
- Actionable Guidelines: Identification of a 20B-parameter performance threshold necessary for nuanced code analysis.
- Reusable Methodology: A strategic engineering asset that organizations can adapt to build their own scalable software analytics capabilities.
This session provides MSR attendees with a reliable path to overcome the data bottleneck, moving beyond theoretical discussions to deliver a deployable solution for industrial challenges.