KugelAudio

Real-time text-to-speech model you can self-host on your own cluster

About

KugelAudio is a real-time TTS model with sub-60ms latency, purpose-built for voice agents that need to feel like a live conversation.

It covers 24 European languages, clones a voice from a 30-60 second sample, and, critically, can be self-hosted in your own cluster instead of sitting behind KugelAudio's API, which matters whenever the audio data has to stay on your network.

It ships with adapters for LiveKit, Pipecat, and Vapi, plus SDKs in Python, JS, and Java. Built by a Berlin-based team in Y Combinator's Spring 2026 batch.