What are the best options for local conversational voice agents?

sobchak@programming.dev · 7 days ago

What are the best options for local conversational voice agents?

Sims@lemmy.ml · 18 hours ago

I haven’t followed that closely lately, but KoljaB on GitHub had some interesting repos. Also, ‘LiveKit’ is new and, AFAIK, creates an audio agent. No personal experience though…

hendrik@palaver.p3x.de · 12 hours ago

Nice, that’s quite an assortment of random stuff, from speaker diarization to full-blown virtual assistants with avatars. Probably worth having a look when tinkering around with Python and AI. By the way, LiveKit is a WebRTC video conferencing toolkit. I didn’t know they had AI agents in there. Guess I can build my own call center now. Or the AI granny (YouTube); I’ve always wanted something like that to answer my landline calls. 😆

SmokeyDope@lemmy.world · 7 days ago

KoboldCpp has pretty good TTS model integration. I used the OuteTTS model when I played around with it, but there’s also API integration with other TTS engines like Kokoro.

However, I’m not sure if it’s able to stream to a TTS model while the LLM is generating. When I tried it, it just waited until the output was finished before sending it to the voice model. You may need to do some documentation reading to see if real-time streaming is possible if you go that route.
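If the frontend can't do it for you, one common workaround is to cut the LLM's streamed text into sentences yourself and hand each one to the TTS as soon as it's complete. Here's a minimal sketch of that idea, assuming you already get the reply as an iterable of text chunks. The `sentences_from_stream` helper is hypothetical (not part of KoboldCpp or any TTS library), and the sentence-boundary regex is a naive assumption that will mis-split things like abbreviations.

```python
import re
from typing import Iterable, Iterator

# Naive sentence boundary: '.', '!' or '?' followed by whitespace (assumption).
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def sentences_from_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Buffer streamed text and yield each sentence once it is complete."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = _BOUNDARY.split(buffer)
        # All parts except the last end at a boundary; the last may be unfinished.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        buffer = parts[-1]
    # Flush whatever is left when the stream ends.
    if buffer.strip():
        yield buffer.strip()


if __name__ == "__main__":
    # Stand-in for a token stream coming from the LLM backend.
    fake_stream = ["Hello", " there. How", " are you? I am", " fine"]
    for sentence in sentences_from_stream(fake_stream):
        # Replace print() with a call to whatever TTS engine you use.
        print(sentence)
```

The TTS can then start speaking the first sentence while the model is still generating the rest, which is usually where most of the perceived latency goes.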

hendrik@palaver.p3x.de · 7 days ago

I got a bonus question… Is there a good end-to-end voice conversation solution? I’d like to try something that directly processes the audio and returns audio, rather than the whole pipeline with VAD -> STT -> LLM -> TTS.
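For reference, the cascaded pipeline I mean looks roughly like the sketch below. Everything in it is a placeholder: the `VoicePipeline` class, the callables for STT, LLM and TTS, and the simple energy-threshold VAD are assumptions for illustration, not any particular project's API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class VoicePipeline:
    """Cascaded voice agent: VAD -> STT -> LLM -> TTS (all stages pluggable)."""
    stt: Callable[[List[float]], str]   # audio samples -> transcript
    llm: Callable[[str], str]           # transcript -> reply text
    tts: Callable[[str], List[float]]   # reply text -> audio samples
    energy_threshold: float = 0.01      # assumed value, tune for your microphone

    def has_speech(self, samples: List[float]) -> bool:
        """Very crude VAD: mean signal energy above a fixed threshold."""
        if not samples:
            return False
        energy = sum(s * s for s in samples) / len(samples)
        return energy >= self.energy_threshold

    def respond(self, samples: List[float]) -> Optional[List[float]]:
        """Run one audio chunk through the pipeline; None if it was silence."""
        if not self.has_speech(samples):
            return None
        transcript = self.stt(samples)
        reply = self.llm(transcript)
        return self.tts(reply)
```

Each arrow is a separate model and a separate hand-off, which is why latency adds up and why an audio-in, audio-out model is appealing.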

lynx@sh.itjust.works · 7 days ago

There are not many models that support any-to-any. Currently the best seems to be Qwen3-Omni, but the audio quality is not great and it is not supported by llama.cpp: https://github.com/ggml-org/llama.cpp/issues/16186

hendrik@palaver.p3x.de · 7 days ago

Thanks! If anyone has more (good) alternatives or something like a curated list, I’d have a look at that as well… It’s always a bit complicated to stay up to date and go through the myriad of options myself…

TheLeadenSea@sh.itjust.works · 7 days ago

Alpaca on Flathub seems ok

kata1yst@sh.itjust.works · 7 days ago

Open WebUI has TTS and STT.