Assessing Task-based Chatbots: Snapshot and Curated Datasets for Dialogflow (MSR 2026 - Data and Tool Showcase Track)

Mon 13 - Tue 14 April 2026 Rio de Janeiro, Brazil

co-located with ICSE 2026

Who

Elena Masserini, Diego Clerissi, Daniela Micucci, Leonardo Mariani

Track

MSR 2026 Data and Tool Showcase Track

Abstract

In recent years, chatbots have gained widespread adoption thanks to their ability to assist users at any time and across diverse domains. However, the lack of large-scale curated datasets limits research on their quality and reliability.

This paper presents TOFU-D, a snapshot of 1,788 Dialogflow chatbots from GitHub, and COD, a curated subset of TOFU-D including 185 validated chatbots. The two datasets capture a wide range of domains, languages, and implementation patterns, offering a sound basis for empirical studies on chatbot quality and security. A preliminary assessment using the Botium testing framework and the Bandit static analyzer revealed gaps in test coverage and frequent security vulnerabilities in several chatbots, underscoring the need for systematic, multi-platform research on chatbot quality and security.

Elena Masserini

University of Milano-Bicocca

Italy

Diego Clerissi

University of Milano-Bicocca

Italy

Daniela Micucci

University of Milano-Bicocca

Italy

Leonardo Mariani

University of Milano-Bicocca

Italy
