Bridging Design and Implementation: A Study of Multi-Agent LLM Architectures for Automated Front-End Generation
Automating front-end development directly from design artifacts and textual requirements could accelerate iteration cycles and reduce implementation errors, yet most prior work addresses only a single modality (either design-to-code or text-to-code generation) without integrating complementary specifications. We propose a multi-agent framework that jointly reasons over user stories and Figma designs to synthesize complete React applications. The framework coordinates generation, validation, and repair through three architectural strategies: Supervisor (tool-calling) for centralized routing, Hierarchical for decomposed supervision, and Custom for deterministic workflow execution. Evaluated on four real-world projects (75 user stories) using six generator–judge model pairs (Claude, Gemini, GPT), the system achieves 54% full functional coverage and 58% full visual fidelity; including partial matches raises success rates to 77% and 85%, respectively. Architectural choice modestly affects quality (3–5 percentage-point variation) but substantially impacts cost: the Custom architecture reduces generator token usage by 21–65% compared to the Hierarchical and Supervisor (tool-calling) configurations, while judge models consistently dominate overall cost (5.9× more tokens on average than generators). To further enhance pipeline stability and reduce manual intervention, we introduce a lightweight repair toolkit, comprising automated refusal retries, JSX sanitization, and template scaffolding, that resolves the majority of generation-stage failures without regeneration. Overall, these results demonstrate that multimodal, agentic frameworks can reliably automate front-end synthesis, though achieving full production-grade quality still requires human refinement and improved handling of complex interaction behaviors.
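To illustrate the repair toolkit's control flow, the sketch below shows one plausible ordering of the three mechanisms named above: retry on model refusal, sanitize the returned JSX, and fall back to a template scaffold so the build never fails on a missing component. All function names, refusal markers, and sanitization rules here are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of the abstract's repair toolkit: refusal retries,
# JSX sanitization, and template scaffolding (names and rules assumed).
import re

REFUSAL_MARKERS = ("i cannot", "i'm sorry", "as an ai")  # assumed heuristics

def looks_like_refusal(output: str) -> bool:
    # Check only the opening of the response for refusal phrasing.
    head = output.strip().lower()[:120]
    return any(marker in head for marker in REFUSAL_MARKERS)

def sanitize_jsx(src: str) -> str:
    # Strip markdown code fences and rewrite HTML attributes that are
    # invalid in JSX (class= -> className=, for= -> htmlFor=).
    src = re.sub(r"^```[a-zA-Z]*\n|```$", "", src.strip(), flags=re.M)
    src = re.sub(r"\bclass=", "className=", src)
    src = re.sub(r"\bfor=", "htmlFor=", src)
    return src

def scaffold(component_name: str) -> str:
    # Last-resort template so the app still compiles without regeneration.
    return (f"export default function {component_name}() {{\n"
            f"  return <div>{component_name} placeholder</div>;\n}}")

def repair(generate, component_name: str, max_retries: int = 2) -> str:
    # Retry the generator on refusals; sanitize the first usable output;
    # otherwise fall back to the scaffold instead of a full regeneration.
    for _ in range(max_retries + 1):
        out = generate()
        if not looks_like_refusal(out):
            return sanitize_jsx(out)
    return scaffold(component_name)
```

A deterministic post-processing stage like this is cheap relative to re-invoking the generator, which is consistent with the abstract's finding that most generation-stage failures are resolvable without regeneration.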