[G] GenAI subtitles for Korean BW content

The latest Gemini model by Google can handle video input and works surprisingly well for generating English subtitles for Korean Brood War videos. It still makes mistakes here and there and sometimes hallucinates, but it's a big step up from the gibberish you get from YouTube's auto-subtitles. If I had to guesstimate, I’d say it gets >80% right, which feels pretty impressive.

Workflow below.

I'm using Gemini 2.5 Pro Preview (05-06) at https://aistudio.google.com/ with default settings. The model is currently free to test. It supports up to 1 million tokens of context, and one minute of video is roughly 20k tokens, so the videos I tested ended up at ~160k–170k tokens each. That also caps a single request at roughly 50 minutes of video, so long-form content like daily Proleague or KCM broadcasts would not work, since it exceeds the context limit. Maybe chopping it up somehow could work?
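The context arithmetic works out like this: at ~20k tokens per minute, 1M tokens is about 50 minutes of video, minus whatever the prompt, the summary step and the .srt output need. As a rough sketch of the chopping idea, the Python below plans overlapping chunks that each fit in one request. The token rate is the rough figure from above; the 200k-token headroom and the 30-second overlap are hypothetical values I picked for illustration, not tested settings.

```python
# Rough planner for splitting a long video into chunks that fit one request.
# TOKENS_PER_MINUTE is the rough observed rate from the post (not official);
# RESERVED_TOKENS is a hypothetical headroom for prompt, summary and .srt output.

TOKENS_PER_MINUTE = 20_000
CONTEXT_LIMIT = 1_000_000
RESERVED_TOKENS = 200_000


def max_minutes_per_chunk(tokens_per_minute: int = TOKENS_PER_MINUTE,
                          context_limit: int = CONTEXT_LIMIT,
                          reserved: int = RESERVED_TOKENS) -> int:
    """Whole minutes of video that fit in one request after reserving headroom."""
    return (context_limit - reserved) // tokens_per_minute


def plan_chunks(total_minutes: float, chunk_minutes: int,
                overlap_minutes: float = 0.5) -> list[tuple[float, float]]:
    """Return (start, end) ranges in minutes that cover the whole video.

    Consecutive chunks overlap slightly so a sentence cut at a boundary
    shows up in both chunks.
    """
    if chunk_minutes <= overlap_minutes:
        raise ValueError("chunk must be longer than the overlap")
    chunks = []
    start = 0.0
    while start < total_minutes:
        end = min(start + chunk_minutes, total_minutes)
        chunks.append((start, end))
        if end >= total_minutes:
            break
        # Step back by the overlap so the next chunk re-covers the boundary.
        start = end - overlap_minutes
    return chunks
```

With these assumptions, `plan_chunks(180, max_minutes_per_chunk())` splits a three-hour broadcast into five requests of at most 40 minutes each. Whether the subtitles stay as good without the full-video context is an open question.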

Basically, I just pass it the YouTube link and ask it to generate English subtitles.

I've found it works better if I do this in two steps. First, I give it the link and just ask, "What is happening here?"

It will take a while and output a summary.

Interestingly, this summary often contains hallucinations and doesn't accurately describe the video. Still, I noticed that when I skip this step and ask for subtitles right away, the results are worse. It seems like preloading the context window with Brood War jargon actually helps when it comes time to generate the subtitles, and the summary itself being wrong doesn't seem to have any effect on subtitle quality.

After that, I ask it to create the subtitles. The prompt I use looks like this:

create english subtitles (.srt)

Quick sanity checklist for SRT files:
- Sequential numbers starting at 1.
- Timestamp line exactly HH:MM:SS,mmm --> HH:MM:SS,mmm.
- The video is less than 1 hour long, so all timestamps must start with 00 for HH.
- One subtitle text line.
- A blank line after every cue.

This should give you subtitles you can copy, save as an .srt file, and use with the video.
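The checklist in the prompt is specific enough to check mechanically before loading the file into a player. Here is a minimal sketch using only the Python standard library; `validate_srt` is a hypothetical helper of mine, not part of any tool mentioned in this thread, and it only checks the rules listed in the prompt.

```python
import re

# Checks a generated .srt against the prompt's checklist: sequential numbers
# from 1, "HH:MM:SS,mmm --> HH:MM:SS,mmm" timing, HH == 00 (video under an
# hour), exactly one text line per cue, and a blank line between cues.

TIMESTAMP_RE = re.compile(
    r"^(\d{2}):([0-5]\d):([0-5]\d),(\d{3}) --> (\d{2}):([0-5]\d):([0-5]\d),(\d{3})$"
)


def to_ms(h: str, m: str, s: str, ms: str) -> int:
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def validate_srt(text: str) -> list[str]:
    """Return human-readable problems; an empty list means the file passed."""
    problems = []
    # Normalize Windows line endings, then split into cues on blank lines.
    normalized = text.replace("\r\n", "\n").strip()
    blocks = [b for b in re.split(r"\n\s*\n", normalized) if b.strip()]
    for expected, block in enumerate(blocks, start=1):
        lines = block.split("\n")
        if len(lines) < 3:
            problems.append(f"cue {expected}: expected number, timestamp and text lines")
            continue
        if lines[0].strip() != str(expected):
            problems.append(f"cue {expected}: numbered {lines[0].strip()!r}")
        match = TIMESTAMP_RE.match(lines[1].strip())
        if not match:
            problems.append(f"cue {expected}: bad timestamp line {lines[1].strip()!r}")
        else:
            g = match.groups()
            if g[0] != "00" or g[4] != "00":
                problems.append(f"cue {expected}: hour field is not 00")
            if to_ms(*g[:4]) >= to_ms(*g[4:]):
                problems.append(f"cue {expected}: end time is not after start time")
        if len(lines) != 3:
            # A missing blank line merges cues and shows up here as extra text lines.
            problems.append(f"cue {expected}: {len(lines) - 2} text lines instead of 1")
    return problems
```

An empty result means the file follows the checklist; otherwise each message names the cue to re-generate or fix by hand.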

The resulting .srt file sometimes has errors that cause missing text, usually because the generated formatting is wrong. Most of the time I found it best to just re-run until it worked. Alternatively, you could adjust the prompt or fix the .srt yourself. I found the browser add-on Substital useful because it lets you use a local .srt file with YouTube videos, and it surfaced error messages caused by badly formatted .srt files faster than captionfy.
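For the "fix the .srt yourself" route, a couple of common slips can be repaired by a script instead of a re-run. This is a sketch under my own assumptions about what typically goes wrong (wrong or missing cue numbers, "." instead of "," before the milliseconds, text split over several lines); `repair_srt` is a hypothetical helper, and it deliberately drops cues that have no usable timestamp.

```python
import re

# Loose timestamp pattern: accepts 1-2 digit hours, "." or "," before the
# milliseconds, and 1-3 millisecond digits (e.g. "0:00:01.5 --> 00:00:02.000").
LOOSE_TS = re.compile(
    r"(\d{1,2}):(\d{2}):(\d{2})[.,](\d{1,3})\s*-->\s*(\d{1,2}):(\d{2}):(\d{2})[.,](\d{1,3})"
)


def _fmt(h: str, m: str, s: str, ms: str) -> str:
    # Rewrite as strict HH:MM:SS,mmm; "5" after the separator means 500 ms.
    return f"{int(h):02d}:{m}:{s},{ms.ljust(3, '0')}"


def repair_srt(text: str) -> str:
    """Renumber cues from 1, normalize timing lines, join text onto one line.

    Cues without a recognizable timestamp or without text are dropped.
    """
    blocks = re.split(r"\n\s*\n", text.replace("\r\n", "\n").strip())
    out = []
    for block in blocks:
        lines = [line for line in block.split("\n") if line.strip()]
        # Find the timing line wherever it is (the cue number may be missing).
        idx = next((i for i, line in enumerate(lines) if LOOSE_TS.search(line)), None)
        if idx is None:
            continue
        g = LOOSE_TS.search(lines[idx]).groups()
        timing = f"{_fmt(*g[:4])} --> {_fmt(*g[4:])}"
        text_lines = lines[idx + 1:]
        if not text_lines:
            continue
        joined = " ".join(t.strip() for t in text_lines)
        out.append(f"{len(out) + 1}\n{timing}\n{joined}")
    return "\n\n".join(out) + "\n"
```

Anything it cannot recognize is dropped rather than guessed at, so for badly broken output a re-run is still the simpler fix.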

I’m still figuring out the best way to share these or upload them to YouTube. I found captionfy, which seems pretty easy to use: you sign up and can create a shareable overlay for any YouTube video. The good thing is that traffic still goes to the original creator, and anyone can upload subtitles that are then available to everyone.

I guess the end goal would be to automate the full pipeline and translate a lot of stuff? It seems captionfy does not have an API, so something else might be better suited.
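If the chopping idea for long videos became part of an automated pipeline, each chunk's subtitles would start at 00:00:00 and would need to be shifted by the chunk's start time before being stitched into one file. A minimal sketch, assuming each chunk comes back as valid SRT text and its start offset is known; `merge_chunks` is a hypothetical name, and it does not de-duplicate cues from overlapping regions.

```python
import re

TS = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def _shift(match: re.Match, offset_ms: int) -> str:
    """Add offset_ms to one HH:MM:SS,mmm timestamp and re-format it."""
    h, m, s, ms = (int(x) for x in match.groups())
    total = ((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms
    h, rest = divmod(total, 3_600_000)
    m, rest = divmod(rest, 60_000)
    s, ms = divmod(rest, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def merge_chunks(chunks: list[tuple[int, str]]) -> str:
    """Merge (offset_ms, srt_text) pairs, in order, into one renumbered SRT."""
    cues = []
    for offset_ms, srt in chunks:
        for block in re.split(r"\n\s*\n", srt.replace("\r\n", "\n").strip()):
            lines = block.split("\n")
            if len(lines) < 3:
                continue  # skip malformed cues rather than guess
            timing = TS.sub(lambda m: _shift(m, offset_ms), lines[1])
            cues.append((timing, lines[2:]))
    out = [f"{i}\n{timing}\n" + "\n".join(text)
           for i, (timing, text) in enumerate(cues, start=1)]
    return "\n\n".join(out) + "\n"
```

Because the merged file can run past an hour, the hour field is allowed to grow here; the "HH must be 00" rule only applies to each chunk's prompt.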

Also, the Gemini model likely won't be free forever, but at current pricing it should cost about 6 cents per minute of content (for videos of similar length), which seems cheap enough? The price scales with (video) input length, so longer videos will be more expensive.
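As a back-of-the-envelope budget, the ~6 cents per minute figure above can be turned into a quick estimate. The rate is the observed figure for videos of roughly 8 minutes (~160k tokens), not an official price, so treat the output as a ballpark.

```python
# Back-of-the-envelope cost using the observed ~6 cents per minute of video
# (for videos around 8 minutes long); not an official price list.
CENTS_PER_MINUTE = 6


def estimate_cost_usd(minutes: float, cents_per_minute: float = CENTS_PER_MINUTE) -> float:
    """Estimated cost in dollars for one video of the given length."""
    return round(minutes * cents_per_minute / 100, 2)


def estimate_batch_usd(video_minutes: list[float],
                       cents_per_minute: float = CENTS_PER_MINUTE) -> float:
    """Estimated total cost in dollars for a batch of videos."""
    return round(sum(video_minutes) * cents_per_minute / 100, 2)
```

For example, `estimate_cost_usd(8)` gives 0.48, and ten 8-minute videos would come to about $4.80 at this rate.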

Last.Midnight

Australia, 898 Posts

13 hours ago

I was curious about doing this. Surely there are models/n8n setups that can automatically replace/overdub the voice too?

Thanks for sharing, man, this is great.

Last.Midnight

Australia, 898 Posts

12 hours ago

Recall (https://www.getrecall.ai/) provides written translations and app.vozo.ai apparently does voiceover dubs, but I'm not sure how accurate they are and it's expensive.

Simplistik

1956 Posts

11 hours ago

I feel like there is a web service niche for automating this workflow, if anyone has the patience to make it work.

Last.Midnight

Australia, 898 Posts

11 hours ago

Most likely yt-dlp for the download, then ElevenLabs for the overdub. The only problem is the ElevenLabs credits.

Possibly with a specialised Kor>Eng model in between.
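For the download step, yt-dlp can pull just the audio track. A small sketch that builds and runs the command from Python: `-f bestaudio`, `-x`, `--audio-format` and `-o` are standard yt-dlp options, audio extraction needs ffmpeg installed, and the helper names are hypothetical. The ElevenLabs step is not shown.

```python
import subprocess


def ytdlp_audio_cmd(url: str, out_template: str = "%(id)s.%(ext)s",
                    audio_format: str = "mp3") -> list[str]:
    """Build a yt-dlp command that downloads only the audio of `url`."""
    return [
        "yt-dlp",
        "-f", "bestaudio",               # pick the best audio-only stream
        "-x",                            # extract audio (needs ffmpeg)
        "--audio-format", audio_format,  # convert to this format
        "-o", out_template,              # output filename template
        url,
    ]


def download_audio(url: str) -> None:
    # Runs yt-dlp as a subprocess; raises CalledProcessError if it fails.
    subprocess.run(ytdlp_audio_cmd(url), check=True)
```

Keeping the command builder separate from the call makes it easy to log or tweak the flags before anything is downloaded.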

rtyrt7

46 Posts

7 hours ago

The free models on OpenRouter might also be helpful via their API:
https://openrouter.ai/models?max_price=0

Note that models whose ID ends in ":free" have these limits:
- Per-Minute Limit: 20 requests per minute
- Daily Limit: 50 requests per day per account
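When batch-translating through those free models, it helps to throttle on the client side so requests are not rejected. A minimal sketch of a limiter for the two limits above; the class name is hypothetical, and treating "per day" as a rolling 24 hours is my assumption (the provider may reset at a fixed time instead).

```python
import time
from collections import deque


class FreeTierLimiter:
    """Client-side throttle: 20 requests per rolling minute, 50 per day.

    The daily window is a rolling 24 hours here, which is an assumption.
    """

    def __init__(self, per_minute: int = 20, per_day: int = 50, clock=time.monotonic):
        self.per_minute = per_minute
        self.per_day = per_day
        self.clock = clock      # injectable so the logic can be tested
        self.recent = deque()   # timestamps of requests in the last 24 h

    def _prune(self, now: float) -> None:
        # Forget requests older than 24 hours.
        while self.recent and now - self.recent[0] >= 86_400:
            self.recent.popleft()

    def try_acquire(self) -> bool:
        """Record a request and return True if both limits allow it, else False."""
        now = self.clock()
        self._prune(now)
        last_minute = sum(1 for t in self.recent if now - t < 60)
        if last_minute >= self.per_minute or len(self.recent) >= self.per_day:
            return False
        self.recent.append(now)
        return True
```

In a batch job you would call `try_acquire()` before each API request and wait and retry whenever it returns False.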

prosatan

Romania, 7761 Posts

7 hours ago

Thank you, Kraekkling!

Kraekkling

Romania, 382 Posts

3 hours ago

On May 07 2025 10:53 Last.Midnight wrote:
Surely there are models/n8n setups that can automatically replace/overdub the voice too?

This is likely not feasible yet. What you're talking about is basically a different piece of technology.

You're right, though, that there are models that can translate audio and output sound in a voice similar to the speaker's. However, those models are several orders of magnitude smaller than what we have here and work purely audio-to-audio. They can't handle long-term context. Also, there just isn't much training data for these models to handle BW jargon properly.

The advantage of the Gemini model is that we're using information from the video itself (not only the audio) and also tapping into its "general intelligence", which comes from the very large model size. Additionally, we have inference-time scaling here, which means the model internally generates an ensemble of chain-of-thought threads in which it discusses the best way to translate a given passage of the video in light of the overall context, before giving an answer to the user.

However, I think we might not be too far away from models that could do what you suggested; give it 1-2 years at most and we'll be there. The next iteration of OpenAI's omni series might already do it.
