Secret Leak Detection in Software Issue Reports using LLMs: A Comprehensive Evaluation (MSR 2026 - Technical Papers)

Who

Sadif Ahmed, Md Nafiu Rahman, Zahin Wahab, Gias Uddin, Rifat Shahriyar

Track

MSR 2026 Technical Papers

Time Zone

The program is currently displayed in (GMT-03:00) Brasilia, Distrito Federal, Brazil.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Tue 14 Apr 2026 12:20 - 12:30 at Oceania IV - Session 1-B: Maintenance, Evolution & Processes. Chair(s): Gregorio Robles

Abstract

In the digital era, the accidental exposure of sensitive information such as API keys, tokens, and credentials is a growing security threat. While most prior work focuses on detecting secrets in source code, leakage in software issue reports remains largely unexplored. This study fills that gap through a large-scale analysis of exposed secrets in GitHub issues and a practical pipeline for detecting them. Our pipeline combines regular expression–based extraction with large language model (LLM)–based contextual classification to detect real secrets and reduce false positives. We build a benchmark of 54,148 instances from public GitHub issues, including 5,881 manually verified true secrets. Using this dataset, we evaluate the entropy-based baselines and keyword heuristics used by prior secret detection tools, along with classical machine learning, deep learning, and LLM-based methods. Regex- and entropy-based approaches achieve high recall but poor precision, while smaller models such as RoBERTa and CodeBERT greatly improve performance (F1 = 92.70%). Proprietary models such as GPT-4o perform moderately in few-shot settings (F1 = 80.13%), and larger fine-tuned open-source LLMs such as Qwen and LLaMA reach up to 94.49% F1. Finally, we validate our approach on 178 real-world GitHub repositories, achieving an F1-score of 81.6%, which demonstrates its strong ability to generalize to in-the-wild scenarios.
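The extraction stage and the entropy baseline mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the patterns in `PATTERNS`, the 3.5 bits/character threshold, the 200-character context radius, and the prompt wording in `build_classification_prompt` are all hypothetical. In the pipeline the abstract describes, each regex candidate and its surrounding issue text would go to a contextual classifier (a fine-tuned model or an LLM) rather than being accepted or rejected on entropy alone.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical patterns for illustration only; not the paper's regex set.
PATTERNS = {
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # key = value style assignments; group 1 captures the value.
    "generic_assignment": re.compile(
        r"(?i)\b(?:api[_-]?key|token|secret|password)\s*[:=]\s*['\"]?([A-Za-z0-9_\-]{16,})"
    ),
}


@dataclass
class Candidate:
    kind: str   # name of the pattern that matched
    value: str  # the suspected secret string
    start: int  # offset of value in the issue text
    end: int


def shannon_entropy(s: str) -> float:
    """Shannon entropy of s in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def extract_candidates(text: str) -> list[Candidate]:
    """Regex extraction: high recall, but many false positives."""
    found = []
    for kind, pattern in PATTERNS.items():
        # Report the capture group if the pattern has one, else the whole match.
        g = 1 if pattern.groups else 0
        for m in pattern.finditer(text):
            found.append(Candidate(kind, m.group(g), m.start(g), m.end(g)))
    return found


def entropy_filter(cands: list[Candidate], threshold: float = 3.5) -> list[Candidate]:
    """Entropy baseline: keep only strings that look random enough."""
    return [c for c in cands if shannon_entropy(c.value) >= threshold]


def context_window(text: str, cand: Candidate, radius: int = 200) -> str:
    """Issue text surrounding a candidate, to give a classifier context."""
    return text[max(0, cand.start - radius):min(len(text), cand.end + radius)]


def build_classification_prompt(text: str, cand: Candidate) -> str:
    """Hypothetical prompt for an LLM-based secret / non-secret decision."""
    return (
        "Does the candidate string below appear to be a real, exposed secret "
        "(not a placeholder, example, or redacted value)? "
        "Answer SECRET or NOT_SECRET.\n"
        f"Candidate: {cand.value}\n"
        f"Context:\n{context_window(text, cand)}"
    )


if __name__ == "__main__":
    issue = ('Deploy fails with 403. I set AWS key AKIAIOSFODNN7EXAMPLE '
             'and token = "changeme_changeme" in config.')
    cands = extract_candidates(issue)
    for c in cands:
        print(c.kind, c.value, round(shannon_entropy(c.value), 2))
    print([c.value for c in entropy_filter(cands)])
```

On the sample issue text, the regex stage flags both the AWS-style key and the `token = ...` assignment, and the entropy filter drops the low-entropy placeholder `changeme_changeme`. The surviving value, `AKIAIOSFODNN7EXAMPLE`, is the example key from AWS's own documentation, so an entropy score alone still accepts a non-secret. Recognizing cases like this from the surrounding text is the kind of judgment a contextual classifier is meant to supply.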

Link to Preprint

https://arxiv.org/abs/2410.23657

File attachments

Presentation Slide (Secret_Leak_Prevention_in_Software_Issue_Reports__MSR_ Presentation.pdf)	630KiB

Sadif Ahmed

Bangladesh University of Engineering and Technology

Bangladesh

Md Nafiu Rahman

Bangladesh University of Engineering and Technology

Zahin Wahab

The University of British Columbia

Bangladesh

Gias Uddin

York University, Canada

Canada

Rifat Shahriyar

Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
