susuROBO blog: access to AI through advanced UI

2025-09-04

Realtime Models vs Traditional Pipelines

Are realtime audio-to-audio models ready to replace your VAD→STT→LLM→TTS pipeline? As context windows grow and models get better at reading intent, users are shifting to vibe interaction. But like physics engines in games, realtime models still need guardrails and flow control.

Read more →

2024-10-05

OpenAI's Realtime API

I imagine that many in conversational AI, like me, have been trying to wrap their minds around the recent release of OpenAI's Realtime API. Inflexible turn-taking has been a major thorn in the side of spoken dialog systems and a drag on their usability, so the announcement of a production-quality real-time audio-to-audio API (aka full-duplex, aka continuous recognition and generation) is somewhat world-shaking.

Read more →

First post

2024-09-22

Vision

What's your startup about? That's not a question any founder is expected to fumble. Yet here I am, three years and countless pivots after founding, pondering this question more often than I can remember.

Read more →