Where Do Smart Contract Security Analyzers Fall Short?
This program is tentative and subject to change.
Smart contracts underpin high-value ecosystems such as decentralized finance (DeFi), yet recurring vulnerabilities continue to cause losses worth billions of dollars. Although numerous security analyzers that detect such flaws exist, real-world attacks remain frequent, raising the question of whether these tools are truly effective or simply under-used due to low developer trust. Prior benchmarks have evaluated analyzers on synthetic or vulnerable-only contract datasets, limiting their ability to measure false positives, false negatives, and usability factors that drive adoption.
To close this gap, we present a mixed-methods study that combines large-scale benchmarking with practitioner insights. We evaluate six widely used analyzers (i.e., Confuzzius, DLVA, Mythril, Osiris, Oyente, and Slither) on 653 real-world smart contracts that cover three high-impact vulnerability classes from the OWASP Smart Contract Top Ten (i.e., reentrancy, suicidal contract termination, and integer arithmetic errors). Our results show substantial variation in accuracy (F1 = 31.2–94.6%), high false-positive rates (up to 32.6%), and runtimes exceeding 700 seconds per contract. We then survey 150 professional developers and auditors to understand how they use and perceive these tools. Our findings reveal that excessive false positives, vague explanations, and long analysis times are the main barriers to trust and adoption in practice. By linking measurable performance gaps to developer perceptions, we provide concrete recommendations for improving the precision, explainability, and usability of smart-contract security analyzers.
| (MSR26 Presentation.pdf) | 708KiB |
This program is tentative and subject to change.
Mon 13 AprDisplayed time zone: Brasilia, Distrito Federal, Brazil change
11:00 - 12:30 | Session 1-B: Quality & Security ITechnical Papers / Industry Track / MSR Program at Oceania IV Chair(s): Diomidis Spinellis AUEB & TU Delft | ||
11:00 10mResearch paper | Where Do Smart Contract Security Analyzers Fall Short? Technical Papers DOI Pre-print File Attached | ||
11:10 10mTalk | An Empirical Study of Vulnerabilities in Python Packages and Their Detection Technical Papers Haowei Quan Monash University, Junjie Wang Tianjin University, Xinzhe Li College of Intelligence and Computing, Tianjin University, Terry Yue Zhuo Monash University and CSIRO's Data61, Xiao Chen University of Newcastle, Xiaoning Du Monash University Media Attached | ||
11:20 10mTalk | Does Programming Language Matter? An Empirical Study of Fuzzing Bug Detection Technical Papers Tatsuya Shirai Nara Institute of Science and Technology, Olivier Nourry The University of Osaka, Yutaro Kashiwa Nara Institute of Science and Technology, Kenji Fujiwara Nara Women’s University, Hajimu Iida Nara Institute of Science and Technology | ||
11:30 10mTalk | An Empirical Study on Line-Level Software Defect Prediction Technical Papers Enci Zhang Beijing Jiaotong University, Yutong Jiang Beijing Jiaotong University, Tianmeng Zhang Beijing Jiaotong University, Haonan Tong Beijing Jiaotong University | ||
11:40 10mTalk | Characterizing and Modeling the GitHub Security Advisories Review Pipeline Technical Papers Claudio Segal UFF, Paulo Segal UFF, Carlos Eduardo de Schuller Banjar UFRJ, Felipe Paixão Federal University of Bahia (UFBA), Hudson Silva Borges UFMS, Paulo Silveira Neto Federal University Rural of Pernambuco, Eduardo Santana de Almeida Federal University of Bahia, Joanna C. S. Santos University of Notre Dame, Anton Kocheturov Siemens Technology, Gaurav Kumar Srivastava Siemens, Daniel Sadoc Menasche UFRJ, Brazil Pre-print | ||
11:50 10mTalk | Linux Kernel Recency Matters, CVE Severity Doesn’t, and History Fades Technical Papers Piotr Przymus Nicolaus Copernicus University in Toruń, Poland, Witold Weiner Nicolaus Copernicus University in Toruń and Adtran Networks Sp. z o.o, Krzysztof Rykaczewski Nicolaus Copernicus University in Toruń, Poland, Gunnar Kudrjavets Amazon Web Services, USA Pre-print | ||
12:00 10mTalk | Beyond Single Code Changes: An Empirical Study of Topic-Based Code Review Practices in Gerrit for OpenStack Technical Papers Moataz Chouchen Concordia University, Mahi Begoug ETS Montreal, Ali Ouni Ecole de Technologie Superieure (ETS) | ||
12:10 10mTalk | LogSieve: Task-Aware CI Log Reduction for Sustainable LLM-Based Analysis Technical Papers Marcus Barnes University of Toronto, Taher A. Ghaleb Trent University, Safwat Hassan University of Toronto Pre-print | ||
12:20 5mTalk | Finding Important Stack Frames in Large Systems Industry Track Aleksandr Khvorov JetBrains; Constructor University Bremen, Yaroslav Golubev JetBrains Research, Denis Sushentsev JetBrains | ||
12:25 5mTalk | Stop Comparing Apples and Oranges: Matching for Better Results in Mining Software Repositories Studies Technical Papers Sabato Nocera University of Salerno, Nyyti Saarimäki University of Luxembourg, Valentina Lenarduzzi University of Southern Denmark, Davide Taibi University of Southern Denmark and University of Oulu, Sira Vegas Universidad Politecnica de Madrid | ||