Automated Testing of Task-based Chatbots: How Far Are We? (MSR 2026 - Registered Reports)

Who

Diego Clerissi, Elena Masserini, Daniela Micucci, Leonardo Mariani

Track

MSR 2026 Registered Reports

Time Zone

The program is currently displayed in (GMT-03:00) Brasilia, Distrito Federal, Brazil.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Mon 13 Apr 2026 17:10 - 17:15 at Oceania V - Session 3-A: Tutorial + Registered reports talks. Chair(s): Taher A. Ghaleb

Abstract

Task-based chatbots are software, typically embedded in real-world applications, that assist users in completing tasks through a conversational interface. As chatbots are gaining popularity, effectively assessing their quality has become crucial. Whereas traditional testing techniques fail to systematically exercise the conversational space of chatbots, several approaches specifically targeting chatbots have emerged from both industry and research. Although these techniques have shown advancements over the years, they still exhibit limitations, such as simplicity of the generated test scenarios and weakness in implemented oracles.

In this paper, we conduct a confirmatory study to investigate such limitations by evaluating the effectiveness of state-of-the-art chatbot testing techniques on a curated selection of task-based chatbots from GitHub, developed using the most popular commercial and open-source platforms.

Link to Preprint

https://doi.org/10.48550/arXiv.2602.13072

DOI

https://doi.org/10.48550/arXiv.2602.13072

Diego Clerissi

University of Milano-Bicocca

Italy

Elena Masserini

University of Milano-Bicocca

Italy

Daniela Micucci

University of Milano-Bicocca

Italy

Leonardo Mariani

University of Milano-Bicocca

Italy

Session Program

Mon 13 Apr
Displayed time zone: Brasilia, Distrito Federal, Brazil

16:00 - 17:30	Session 3-A: Tutorial + Registered reports talks (Registered Reports / Tutorials / MSR Program) at Oceania V. Chair(s): Taher A. Ghaleb (Trent University)

16:00 45m Talk		Selecting the Data Source that Matter: Fine-Tuning Domain-Specific Ecosystem Studies with MARIN | Tutorials | Johannes Düsing (Technische Universität Dortmund), Ben Hermann (University of Stuttgart)
16:45 5m Talk		Ask, Then Think: Enhancing LLM Performance with Socratic Reasoning | Registered Reports | Antonio Della Porta (University of Salerno), Jonan Richards (Radboud University), Lucageneroso Cammarota (University of Salerno), Stefano Lambiase (Department of Computer Science, Aalborg University, Denmark), Fabio Palomba (University of Salerno), Mairieli Wessel (Radboud University)
16:50 5m Talk		Beyond the Prompt: Assessing Domain Knowledge Strategies for High-Dimensional LLM Optimization in Software Engineering | Registered Reports | Srinath Srinivasan (North Carolina State University), Tim Menzies (North Carolina State University)
16:55 5m Talk		Does Impact Analysis Support the Review of Changes to Build Specifications? | Registered Reports | Mattie Nejati (Ubisoft Montréal), Mahmoud Alfadel (University of Calgary), Shane McIntosh (University of Waterloo)
17:00 5m Talk		Parameterized Tests in Practice: Adoption, Styles, and Impact in Apache Java Projects | Registered Reports | Xinyi Li (Stevens Institute of Technology), Lu Xiao (Stevens Institute of Technology), Gengwu Zhao (Stevens Institute of Technology), Sunny Wong (Envestnet)
17:05 5m Talk		Causal Inference for the Effect of Code Coverage on Bug Introduction | Registered Reports | Lukas Schulte (University of Passau), Gordon Fraser (University of Passau), Steffen Herbold (University of Passau)
17:10 5m Talk		Automated Testing of Task-based Chatbots: How Far Are We? | Registered Reports | Diego Clerissi (University of Milano-Bicocca), Elena Masserini (University of Milano-Bicocca), Daniela Micucci (University of Milano-Bicocca), Leonardo Mariani (University of Milano-Bicocca)
17:15 5m Talk		The Influence of Code Smells in Efferent Neighbors on Class Stability | Registered Reports | Zushuai Zhang (University of Auckland), Elliott Wen (The University of Auckland), Ewan Tempero (The University of Auckland)
17:20 5m Talk		How Does Experience Influence Developer Perceptions of Atoms of Confusion? | Registered Reports | Guoshuai Shi (University of Waterloo), Farshad Kazemi (University of Waterloo), Shane McIntosh (University of Waterloo), Michael W. Godfrey (University of Waterloo, Canada)