World of Logs: A Dataset of Logs from Online Documents
Software logs are valuable resources for understanding the runtime behavior of systems and are used extensively in software maintenance tasks. As software systems grow more complex and log volumes increase, a good log dataset is fundamental to developing automated log analysis tools. However, current log datasets are limited in three respects: they are narrow in scope, lack contextual information, and are outdated. To bridge this gap, we extract software logs from online resources (e.g., JIRA issue reports, GitHub repositories, and Stack Overflow discussions), which cover various types of software systems and provide context for the logs, such as observed and expected behaviors. This work introduces WoL, a dataset of real-world logs together with their contextual information. WoL currently contains over 2.5 million log messages or logging statements from diverse online resources. It can be used for various log-related tasks, including understanding logging intent and quality, anomaly detection, and linking logs with software artifacts for contextual analysis. WoL is publicly available on Zenodo to facilitate reproducible research and will be continuously updated. Furthermore, based on WoL, we develop a search engine, LogSearch, to support user queries.