Offline

Robot Holmes and the Silenced Witness: A Noir Guide to Real-Time Voice AI

Track: Machine Learning, NLP and CV
Type: Talk (long session)
Level: intermediate
Duration: 45 minutes

Abstract

The hardships of building End-to-End Voice Assistants in the Wild

Robot Holmes is back in the mist-choked streets of MLington, but he isn’t working solo.

Meet Zintia, an intern from the Voice Assistant district. She’s helpful, hyper-efficient, and incredibly annoying, providing Holmes with data before he can lift a finger. But Zintia has a secret. The longer she’s on the case, the more of her "dark side" emerges. She’s not just hearing the truth; she’s deciding which parts Holmes is allowed to hear.

This is a story-driven, practical session for anyone tired of "Hello World" chatbots. We will move past the hype to look at what it actually takes to make End-to-End Voice Assistants work in the real world.

Our Investigation Includes:

  • The Gear: How to use E2E speech models like gpt-realtime and integrate them into a production voice interface using FreeSWITCH and Pipecat.
  • The Interrogation: Navigating the hardships of instruction-following, ensuring the underlying LLMs stay on track through defined states and agentic flows.
  • The Double-Cross: Identifying and mitigating "hidden agendas": the hallucinations and safety guardrails that can make a voice assistant turn on its user.

Expect live demos, hard-won production lessons, a detective noir story, and a blueprint for building voice agents that are fast, fluid, and (mostly) law-abiding.