Realtime Models vs Traditional Pipelines
Are realtime audio-to-audio models ready to replace your VAD→STT→LLM→TTS pipeline? As context windows grow and models get better at reading intent, users are shifting to vibe interaction. But like physics engines in games, these models still need guardrails and flow control.
 
OpenAI's Realtime API
I imagine that many in conversational AI, like me, have been trying to wrap their minds around the recent release of OpenAI's Realtime API. Inflexible turn-taking has been a major thorn affecting the usability of spoken dialog systems, and the announcement of a production-quality real-time audio-to-audio API (aka full-duplex, aka continuous recognition and generation) is somewhat world-shaking.
First post
Vision
What's your startup about? That's not a question any founder is expected to fumble. Yet here I am, three years and countless pivots after founding, pondering this question more often than I can remember.