An Empirical Study of Vulnerabilities in Python Packages and Their Detection (MSR 2026 - Technical Papers)

Mon 13 - Tue 14 April 2026 Rio de Janeiro, Brazil

co-located with ICSE 2026

Who

Haowei Quan, Junjie Wang, Xinzhe Li, Terry Yue Zhuo, Xiao Chen, Xiaoning Du

Track

MSR 2026 Technical Papers

Abstract

In the rapidly evolving software development landscape, Python stands out for its simplicity, versatility, and extensive ecosystem. The security of Python packages, which serve as units of organization, reuse, and distribution, has become a pressing concern, as highlighted by the considerable number of vulnerability reports. Existing benchmarks either do not target Python-package vulnerabilities or face label accuracy issues stemming from non-security-related changes within patching commits. This paper addresses these gaps by introducing PyVul, the first comprehensive benchmark suite of Python-package vulnerabilities. PyVul includes 1,157 publicly reported, developer-verified vulnerabilities, annotated at both the commit level and the function level. To enhance labeling quality, we propose LLM-VDC, a generic vulnerability benchmark cleansing method that leverages the code semantic understanding capability of LLMs. LLM-VDC improves PyVul’s function-level label accuracy 2.0-fold and establishes PyVul as the most precise automatically collected vulnerability benchmark. Based on PyVul, we conduct the first empirical study to unveil the characteristics of Python-package vulnerabilities and the limitations of state-of-the-art detection tools. Our empirical analysis reveals that current rule-based vulnerability detectors suffer from mismatches between their assumptions and real-world security scenarios, as well as from limited support for high-order vulnerabilities, cross-language interactions, and Python’s unique language features. ML-based detectors, on the other hand, are limited by their inability to reach the necessary context. PyVul provides a solid foundation for advancing vulnerability research and tool development in this domain.

Haowei Quan

Monash University

Junjie Wang

Tianjin University