ASL RO8 spoilers below!!
g1
g2
g3
g4
g5
g6
https://www.captionfy.com/video/youtube/YZWpi_IUi94?c=en-Ntb
g7
https://www.captionfy.com/video/youtube/YZWpi_IUi94?c=en-Ntb
The latest Gemini model by Google can handle video input and works surprisingly well for generating English subtitles for Korean Brood War videos. It still makes mistakes here and there and sometimes hallucinates, but it's a big step up from the gibberish you get from YouTube's auto-subtitles. If I had to guesstimate, I’d say it gets >80% right, which feels pretty impressive.
Workflow below.
I'm using Gemini 2.5 Pro Preview (05-06) at https://aistudio.google.com/ with default settings. The model is currently free to test. It supports up to 1 million tokens of context, and one minute of video costs roughly 20k tokens, so the videos above ended up at around 160k–170k tokens each. This also means that long videos like daily proleague or KCM won't work, because they exceed the context limit. Maybe chopping them up into shorter segments could work?
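Here is a rough sketch of that token math in Python, in case someone wants to check whether a video fits before trying. The 20k tokens per minute and the 1 million token limit are the ballpark numbers from above, not official figures. The `reserve_tokens` headroom for the prompts and the generated subtitles is purely my assumption, and `plan_chunks` is a hypothetical helper that only computes where you might cut a long video; it does not cut anything itself.

```python
import math

TOKENS_PER_MINUTE = 20_000  # rough estimate from this post, not an official number
CONTEXT_LIMIT = 1_000_000   # context window mentioned in this post


def estimate_tokens(minutes):
    """Estimate how many tokens a video of the given length uses."""
    return round(minutes * TOKENS_PER_MINUTE)


def plan_chunks(total_minutes, reserve_tokens=200_000):
    """Split a video into equal segments that should each fit in the context.

    reserve_tokens is an assumed safety margin for prompts and model output.
    Returns a list of (start_minute, end_minute) pairs.
    """
    budget = CONTEXT_LIMIT - reserve_tokens
    max_minutes = budget / TOKENS_PER_MINUTE
    count = max(1, math.ceil(total_minutes / max_minutes))
    length = total_minutes / count
    return [(i * length, (i + 1) * length) for i in range(count)]


if __name__ == "__main__":
    print(estimate_tokens(8.5))  # one of the shorter videos
    print(plan_chunks(120))      # a two hour broadcast
```

If you did translate the segments separately, every segment's subtitles would start again at 00:00:00, so the timestamps of the later segments would have to be shifted by the segment start before merging the files.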
Basically, I just pass it the YouTube link and ask it to generate English subtitles.
I've found it works better if I do this in two steps. First, I give it the link and just ask, "What is happening here?"
It takes a while and then outputs a summary.
![[image loading]](https://i.ibb.co/hxMGvj1z/33t.png)
Interestingly, this summary frequently contains hallucinations and doesn't describe the video accurately. Still, I noticed that when I skip this step and ask for subtitles right away, the results are worse. It seems that preloading the context window with Brood War jargon helps when it comes time to generate the subtitles. The summary being wrong doesn't seem to hurt the quality of the subtitles.
After that, I ask it to create the subtitles. The prompt I use looks like this:
create english subtitles (.srt)
Quick sanity checklist for SRT files:
Sequential numbers starting at 1.
Timestamp line exactly HH:MM:SS,mmm --> HH:MM:SS,mmm.
The video is less than 1 hour long so all timestamps must start with 00 for HH.
One subtitle text line.
A blank line after every cue.
This should give you subtitles you can copy, save as an .srt file, and use with the video.
![[image loading]](https://i.ibb.co/hRVJD2TV/3535nt.png)
The resulting .srt file sometimes has errors that lead to missing text, usually because the generated formatting is wrong. Most of the time I found it best to just re-run the prompt until it worked. Alternatively, you could adjust the prompt or fix the .srt yourself. I found the browser add-on Substital useful because it lets you use a local .srt file with YouTube videos, and it reported errors caused by bad .srt formatting faster than captionfy did.
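If you'd rather catch formatting problems before loading the file anywhere, the checklist from the prompt can be turned into a small script. This is only a minimal sketch: `check_srt` is a name I made up, it checks exactly the rules from the prompt (including the "hours must be 00" rule, which only holds for videos under one hour), and it is stricter than the SRT format itself, which allows more than one text line per cue.

```python
import re

# Timestamp line exactly HH:MM:SS,mmm --> HH:MM:SS,mmm
TIMESTAMP = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})$"
)


def check_srt(text):
    """Return a list of problems found; an empty list means the file passed."""
    problems = []
    text = text.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    blocks = [b for b in re.split(r"\n\s*\n", text) if b.strip()]
    for expected, block in enumerate(blocks, start=1):
        lines = block.split("\n")
        if lines[0].strip() != str(expected):
            problems.append(
                f"cue {expected}: expected index {expected}, found {lines[0]!r}"
            )
        if len(lines) < 2:
            problems.append(f"cue {expected}: missing timestamp line")
            continue
        match = TIMESTAMP.match(lines[1].strip())
        if match is None:
            problems.append(f"cue {expected}: malformed timestamp {lines[1]!r}")
        elif match.group(1) != "00" or match.group(5) != "00":
            problems.append(f"cue {expected}: hours are not 00")
        if len(lines) != 3:
            # Also triggers when the blank line between two cues is missing.
            problems.append(
                f"cue {expected}: expected one text line, found {len(lines) - 2}"
            )
    return problems


if __name__ == "__main__":
    sample = "1\n00:00:01,000 --> 00:00:03,500\nHello\n\n2\n00:00:04,000 --> 00:00:06,000\nGG\n"
    for problem in check_srt(sample):
        print(problem)
```

An empty result doesn't mean the translation is good, only that the file follows the format rules from the prompt, so a player or overlay should at least accept it.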
I’m still figuring out the best way to share these or upload them for YouTube. I found captionfy, which seems pretty easy to use. You sign up and can create a shareable overlay for any YouTube video. The good thing is that traffic still goes to the original creator, and anyone can upload subtitles that are then available for everyone.
I guess the end goal would be to automate the full pipeline and translate a lot of content. It seems captionfy does not have an API, so maybe something else would be better suited?
Also, the Gemini model likely won't be free forever, but with current pricing it should cost about 6 cents per minute of content (for videos of similar length), which seems cheap enough. The price scales with the (video) input length, so longer videos will be more expensive.
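To get a feeling for what a bigger batch would cost, here is the same back-of-the-envelope estimate as a few lines of Python. The 6 cents per minute is just my rough figure from above, and the flat per-minute rate is an assumption that probably only holds for videos of about the length I tested; `estimate_cost_usd` is a made-up helper, not part of any pricing API.

```python
CENTS_PER_MINUTE = 6  # rough figure from this post, assumed flat per minute


def estimate_cost_usd(minutes):
    """Estimate the cost in US dollars for the given minutes of video."""
    return round(minutes * CENTS_PER_MINUTE / 100, 2)


if __name__ == "__main__":
    video_lengths = [8, 8.5, 9]  # minutes, example values only
    print(estimate_cost_usd(sum(video_lengths)))
```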