Hi, while preparing for live transcription I have a couple of questions.

First of all: is the Vosk local AI the only way to do live transcription? Or is it possible to plug the Talk HPB into other AI systems?

Also: does live transcription include live captions, i.e. subtitles? I'm asking because Live Closed Captions · Issue #6915 · nextcloud/spreed · GitHub is still open, although Advanced Talk features — Nextcloud latest User Manual latest documentation seems to provide subtitles.

Also, there seems to be no mention whatsoever of system requirements for the live_transcription stuff, which feels a bit odd, as I suspect they are significant.

(I realise this is on the verge of a technical "support" question. I'm not asking for support, I'm asking about the situation.)
Hello,
Is the Vosk local AI the only way to do live transcription? Or is it possible to plug the Talk HPB into other AI systems?
Yes, that's the only way for now.

It's actually the other way round: the live_transcription app plugs into the Talk HPB, but I'm not aware of any other transcription app.
Also: does live transcription include live captions, i.e. subtitles? I'm asking because Live Closed Captions · Issue #6915 · nextcloud/spreed · GitHub is still open, although Advanced Talk features — Nextcloud latest User Manual latest documentation seems to provide subtitles.
They have slight differences, but in this context I suppose they are the same thing. Maybe drop a message in the issue asking whether it can be closed now.
The PR that added the feature on the Talk side: https://github.com/nextcloud/spreed/pull/15696
Also, there seems to be no mention whatsoever of system requirements for the live_transcription stuff, which feels a bit odd, as I suspect they are significant.
Yeah, we're testing it out a bit before publishing them, but they will be available in the admin docs soon. For now, around 16 GiB of RAM/VRAM and 2 CPU threads should be plenty for one call, with a few GiB more RAM and one extra CPU thread for each additional call. For GPU acceleration, only NVIDIA is supported.
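For anyone doing capacity planning from those figures, here is a rough back-of-the-envelope sketch. The base numbers (16 GiB, 2 threads, +1 thread per extra call) come straight from the answer above; the 4 GiB per-call RAM increment is my own assumption standing in for "a few GiBs", so adjust it once the official admin docs land:

```python
def estimate_resources(concurrent_calls: int,
                       base_ram_gib: int = 16,
                       ram_per_extra_call_gib: int = 4,  # assumption for "a few GiBs"
                       base_threads: int = 2,
                       threads_per_extra_call: int = 1) -> tuple[int, int]:
    """Rough sizing estimate for live_transcription, per the figures above.

    Returns (RAM or VRAM in GiB, CPU threads) for the given number of
    simultaneous calls being transcribed.
    """
    extra = max(concurrent_calls - 1, 0)
    ram = base_ram_gib + extra * ram_per_extra_call_gib
    threads = base_threads + extra * threads_per_extra_call
    return ram, threads

if __name__ == "__main__":
    for calls in (1, 2, 4):
        ram, threads = estimate_resources(calls)
        print(f"{calls} call(s): ~{ram} GiB RAM/VRAM, {threads} CPU threads")
```

This is only linear extrapolation from the numbers in this thread, not an official sizing guide; real usage will depend on the model and hardware.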