[G] GenAI subtitles for Korean BW content

Kraekkling
Profile Blog Joined June 2007
555 Posts
Last Edited: 2025-05-07 04:12:04
May 07 2025 01:42 GMT
#1
ASL RO8, Soulkey vs Rush, from Flash/Shuttle stream

ASL RO8 spoilers below!!

g1: https://www.captionfy.com/video/youtube/Ixu6V3pCQf8?c=en

g2: https://www.captionfy.com/video/youtube/p7l6c5qzoDw?c=en

g3: https://www.captionfy.com/video/youtube/p_rvWNRKhgw?c=en

g4: https://www.captionfy.com/video/youtube/a478rarEBTY?c=en

g5: https://www.captionfy.com/video/youtube/YZWpi_IUi94?c=en-Ntb

g6: https://www.captionfy.com/video/youtube/YZWpi_IUi94?c=en-Ntb

g7: https://www.captionfy.com/video/youtube/YZWpi_IUi94?c=en-Ntb


The latest Gemini model by Google can handle video input and works surprisingly well for generating English subtitles for Korean Brood War videos. It still makes mistakes here and there and sometimes hallucinates, but it's a big step up from the gibberish you get from YouTube's auto-subtitles. If I had to guesstimate, I’d say it gets >80% right, which feels pretty impressive.

Workflow below.

I'm using Gemini 2.5 Pro Preview (05-06) at https://aistudio.google.com/ with default settings. The model is currently free to test. It supports up to 1 million tokens of context; one minute of video is roughly 20k tokens, so the videos above ended up at 160k–170k tokens each. However, this means long videos such as Daily Proleague or KCM broadcasts won't work, since they exceed the context limit. Maybe chopping them up somehow could work?
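
As a rough illustration of the "chopping up" idea, here is a minimal Python sketch that estimates the token count from the video length and splits a long video into equal time ranges. It only uses the two figures quoted above (about 20k tokens per minute, 1 million tokens of context). The function names and the `budget_fraction` safety margin are my own assumptions, not anything Gemini or AI Studio provides.

```python
import math

TOKENS_PER_MINUTE = 20_000   # rough figure quoted in the post
CONTEXT_LIMIT = 1_000_000    # context size quoted in the post


def estimate_tokens(duration_s: float) -> int:
    """Rough token estimate for a video of the given length in seconds."""
    return math.ceil(duration_s / 60 * TOKENS_PER_MINUTE)


def plan_chunks(duration_s: float, budget_fraction: float = 0.5) -> list[tuple[float, float]]:
    """Split [0, duration_s] into equal time ranges that each fit the budget.

    budget_fraction is a hypothetical safety margin: it leaves room in the
    context window for the prompt, the summary and the generated subtitles.
    """
    max_tokens = CONTEXT_LIMIT * budget_fraction
    max_chunk_s = max_tokens / TOKENS_PER_MINUTE * 60
    n_chunks = max(1, math.ceil(duration_s / max_chunk_s))
    step = duration_s / n_chunks
    return [(i * step, (i + 1) * step) for i in range(n_chunks)]


if __name__ == "__main__":
    print(estimate_tokens(8 * 60))      # an 8-minute game
    print(plan_chunks(3 * 60 * 60))     # a 3-hour broadcast
```

Each chunk would produce its own .srt with timestamps relative to the chunk start, so the start offset of each range would still have to be added back when merging the files.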

Basically, I just pass it the YouTube link and ask it to generate English subtitles.

I've found it works better if I do this in two steps. First, I give it the link and just ask, "what is happening here?".

It will take a while and output a summary.

Interestingly, this summary often has hallucinations and often doesn’t accurately describe the video. Still, I noticed that when I skip this step and instead ask for subtitles right away, the results are worse. It seems like preloading the context window with Brood War jargon actually helps when it comes time to generate the subtitles. The summary itself being wrong doesn't seem to have any effect on the quality of the subtitles.


After that, I ask it to create the subtitles. The prompt I use looks like this:


create english subtitles (.srt)

Quick sanity checklist for SRT files:

Sequential numbers starting at 1.

Timestamp line exactly HH:MM:SS,mmm --> HH:MM:SS,mmm.

The video is less than 1 hour long so all timestamps must start with 00 for HH.

One subtitle text line.

A blank line after every cue.


This should give you subtitles you can copy, save as an .srt file, and use with the video.


The resulting .srt file sometimes has errors that lead to missing text, usually because the generated formatting is wrong. Most of the time I found it best to just re-run until it worked. Alternatively, you could adjust the prompt or fix the .srt yourself. I found the browser add-on Substital useful because it lets you use a local .srt file with YouTube videos, and it surfaced the error messages caused by badly formatted .srt files faster than captionfy.
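
If you'd rather catch formatting problems before uploading, the checklist from the prompt can also be checked locally. This is a minimal Python sketch under my own assumptions (the `check_srt` name and the error messages are made up). It enforces exactly the rules from the prompt, including the "one text line per cue" rule, which is stricter than what SRT players generally accept.

```python
import re

# Timestamp line as required by the checklist: HH:MM:SS,mmm --> HH:MM:SS,mmm
TIMESTAMP = re.compile(r"\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def check_srt(text: str) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    normalized = text.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    blocks = [b for b in re.split(r"\n\s*\n", normalized) if b.strip()]
    for expected, block in enumerate(blocks, start=1):
        lines = [line.strip() for line in block.split("\n")]
        if len(lines) != 3:
            problems.append(f"cue {expected}: expected 3 lines, found {len(lines)}")
            continue
        number, timing, _subtitle = lines
        if number != str(expected):
            problems.append(f"cue {expected}: number line is {number!r}")
        if not TIMESTAMP.fullmatch(timing):
            problems.append(f"cue {expected}: bad timestamp line {timing!r}")
    return problems


if __name__ == "__main__":
    sample = "1\n00:00:01,000 --> 00:00:03,500\nHello\n\n2\n00:00:04,000 --> 00:00:05,000\nWorld\n"
    print(check_srt(sample) or "no problems found")
```

A missing blank line between two cues shows up as a cue with too many lines, which is the kind of error that otherwise just makes text silently disappear.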

I’m still figuring out the best way to share these or upload them for YouTube. I found captionfy, which seems pretty easy to use. You sign up and can create a shareable overlay for any YouTube video. The good thing is that traffic still goes to the original creator, and anyone can upload subtitles that are then available for everyone.

I guess the end goal would be to automate the full pipeline and translate a lot of stuff? It seems captionfy does not have an API, so maybe something else would be better suited?

Also, the Gemini model likely won't be free forever, but with current pricing it should be possible at about 6 cents per minute of content (for videos of similar length), which seems cheap enough? The price scales with (video) input length, so longer videos will be more expensive.
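
For anyone who wants to redo this estimate when prices change, the arithmetic is simple enough to sketch in Python. Everything here is an assumption for illustration: the function name, the split into input and output tokens, and especially the prices in the example call, which are placeholders and not actual Gemini pricing. Only the 20k tokens per minute figure comes from the post above.

```python
def cost_per_video(
    duration_min: float,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
    output_tokens: int,
    tokens_per_minute: int = 20_000,  # rough figure from the post
) -> float:
    """Estimated cost for one video, in the same currency unit as the prices.

    Prices are given per 1 million tokens. output_tokens should cover whatever
    the model generates and bills for, which has to be measured, not guessed.
    """
    input_tokens = duration_min * tokens_per_minute
    total = input_tokens * input_price_per_mtok + output_tokens * output_price_per_mtok
    return total / 1_000_000


if __name__ == "__main__":
    # Placeholder numbers only: 8-minute video, hypothetical prices.
    total = cost_per_video(8, input_price_per_mtok=1.0,
                           output_price_per_mtok=10.0, output_tokens=30_000)
    print(f"{total:.2f} per video, {total / 8:.3f} per minute")
```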
(*^^)(^*)
Last.Midnight
Profile Blog Joined July 2006
Australia906 Posts
May 07 2025 01:53 GMT
#2
I was curious about doing this. Surely there are models/n8n setups that can automatically replace/overdub the voice too?

Thanks for sharing man this is great.
Last.Midnight
Profile Blog Joined July 2006
Australia906 Posts
May 07 2025 02:49 GMT
#3
Recall (https://www.getrecall.ai/) provides written translations and app.vozo.ai apparently does voiceover dubs, but I'm not sure how accurate they are and it's expensive.
Simplistik
Profile Blog Joined November 2007
2093 Posts
May 07 2025 03:34 GMT
#4
I feel like there is a web service niche for automating this workflow if anyone has the patience to make it work.
Dear BW Gods, it IS now autumn, so...
Last.Midnight
Profile Blog Joined July 2006
Australia906 Posts
Last Edited: 2025-05-07 04:25:05
May 07 2025 04:24 GMT
#5
yt-dlp for download into ElevenLabs overdub most likely. Only problem is the EL credits.

Possibly with a specialised Eng>Kor model in between.
rtyrt7
Profile Joined August 2018
48 Posts
May 07 2025 07:34 GMT
#6
Maybe the free models over here would also be helpful, as API:
https://openrouter.ai/models?max_price=0

But it has these limits for models whose ID ends in ":free":
- Per-Minute Limit: 20 requests per minute
- Daily Limit: 50 requests per day per account
prosatan
Profile Joined September 2009
Romania8495 Posts
May 07 2025 07:57 GMT
#7
Thank you Kraekkling !
Lee JaeDong Fighting! The only church that illuminates is the one that burns.
Kraekkling
Profile Blog Joined June 2007
555 Posts
May 07 2025 11:56 GMT
#8
On May 07 2025 10:53 Last.Midnight wrote:
Surely there are models/n8n setups that can automatically replace/overdub the voice too?


This is likely not feasible yet. What you're talking about is basically a different piece of technology.

You're right though that there are models that are able to translate audio and output sound in a voice similar to the speaker. However, those models are several orders of magnitude smaller than what we have here and do purely audio-to-audio. They can't handle long-term context. Also, there just isn't much training data for these models to be able to properly handle BW jargon.

The advantage of the Gemini model is that we're using information from the video itself (not only the audio) and also tapping into its "general intelligence", which is due to the very big model size. Additionally, here we have inference-time scaling, which means the model internally outputs an ensemble of chain-of-thought threads in which it discusses the best way to translate a given passage of video given the overall context, before giving an answer to the user.

However, I think we might not be too far away from having models that could do what you suggested. Give it 1-2 years at most and we'll be there. The next iteration of OpenAI's omni-series might already do it.

(*^^)(^*)
yubo56
Profile Joined May 2014
690 Posts
May 07 2025 20:31 GMT
#9
On May 07 2025 20:56 Kraekkling wrote:
On May 07 2025 10:53 Last.Midnight wrote:
Surely there are models/n8n setups that can automatically replace/overdub the voice too?


This is likely not feasible yet. What you're talking about is basically a different piece of technology.

You're right though that there are models that are able to translate audio and output sound in a voice similar to the speaker. However, those models are several orders of magnitude smaller than what we have here and do purely audio-to-audio. They can't handle long-term context. Also, there just isn't much training data for these models to be able to properly handle BW jargon.

The advantage of the Gemini model is that we're using information from the video itself (not only the audio) and also tapping into its "general intelligence", which is due to the very big model size. Additionally, here we have inference-time scaling, which means the model internally outputs an ensemble of chain-of-thought threads in which it discusses the best way to translate a given passage of video given the overall context, before giving an answer to the user.

However, I think we might not be too far away from having models that could do what you suggested. Give it 1-2 years at most and we'll be there. The next iteration of OpenAI's omni-series might already do it.


Wait, but you're describing the difficulty of direct audio-audio translation. If you already can do audio -> translated text though, can't you just slap a text-to-speech and have a (basic) audio-audio translation?

I guess you'd have trouble matching the duration of the sentences, but with some simple squeezing and stretching of audio bytes it's still surely quite feasible compared to direct audio-to-audio translation...
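
To make the squeezing and stretching concrete: per subtitle cue it comes down to one ratio, the length of the synthesized clip divided by the length of the cue's time slot. Here is a minimal Python sketch of just that arithmetic, assuming SRT-style timestamps. The function names are made up, and the actual time-stretching of the audio would be done by whatever audio tool you use, which is not shown.

```python
def srt_time_to_seconds(timestamp: str) -> float:
    """Convert an SRT timestamp 'HH:MM:SS,mmm' to seconds."""
    hms, millis = timestamp.split(",")
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds + int(millis) / 1000


def speed_factor(cue_start: str, cue_end: str, tts_seconds: float) -> float:
    """Playback speed needed so a TTS clip fits exactly into the cue's slot.

    A value above 1 means the clip must be sped up (squeezed),
    a value below 1 means it must be slowed down (stretched).
    """
    slot = srt_time_to_seconds(cue_end) - srt_time_to_seconds(cue_start)
    if slot <= 0:
        raise ValueError("cue end must be after cue start")
    return tts_seconds / slot


if __name__ == "__main__":
    # A 3-second synthesized clip that has to fit into a 2-second cue.
    print(speed_factor("00:00:01,000", "00:00:03,000", 3.0))
```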
Jung Yoon Jong fighting, even after retirement! Feel better soon.
prion_
Profile Joined September 2022
78 Posts
Last Edited: 2025-05-07 22:10:15
May 07 2025 22:08 GMT
#10
The problem is that it would sound like TikTok caption voice. I mean, not exactly that, but you wouldn't be able to keep the rhythm and modulation of their voices by going audio->text->audio, even if you adjusted for time.
IntoTheWow
Profile Blog Joined May 2004
is awesome32277 Posts
May 08 2025 02:26 GMT
#11
This is really cool!

Do you think that adding some keywords in the prompt could help the model? Like units, BW jargon, etc? Or are errors due to other factors?
Moderator<:3-/-<
Last.Midnight
Profile Blog Joined July 2006
Australia906 Posts
May 08 2025 03:36 GMT
#12
I tried ElevenLabs' dubbing feature and it works pretty great. Of course I can't speak to the accuracy of the translation, but it's certainly more accurate than "translate to English" on Chrome. The only funny thing is that it also dubs the unit sounds, so whenever the player isn't speaking he'll repeat SCV commands etc. haha
Lorch
Profile Joined June 2011
Germany3686 Posts
May 08 2025 13:00 GMT
#13
This is completely useless if you don't speak Korean.
You can never know which parts of the translation are accurate and which aren't. You thinking that it sounds reasonable/makes sense is not a great heuristic, especially with how AIs tend to hallucinate.

Would probably need a dedicated bw ai model trained under the supervision of someone who speaks korean + english and is knowledgeable in starcraft to create something worth using.
Kraekkling
Profile Blog Joined June 2007
555 Posts
May 08 2025 14:32 GMT
#14
On May 08 2025 11:26 IntoTheWow wrote:
This is really cool!

Do you think that adding some keywords in the prompt could help the model? Like units, BW jargon, etc? Or are errors due to other factors?


We're pre-filling the prompt with BW jargon by asking for a video summary first. As to why there are errors - I guess the easiest answer is that the technology is not 100% there yet. Machine translation generally got useful only in the last decade or so... Additionally, BW is a niche domain - one needs a sufficient world model to make sense of the meaning behind words. Koreans often use abbreviations, for example they'd say "zildra" for a zealot/dragoon army; or "sam-hat" (삼햇) for a 3-hatchery opening, etc. I've also tried older models but this one by far is the best one to make sense of stuff like this.

To me, the fact that any of this works at all is pretty crazy.

On May 08 2025 22:00 Lorch wrote:
Would probably need a dedicated bw ai model trained under the supervision of someone who speaks korean + english and is knowledgeable in starcraft to create something worth using.


Unfortunately this won't happen, so for now it's either YouTube auto-subs or this. (Also, this is not how models are trained.)


This is completely useless if you don't speak Korean.
You can never know which parts of the translation are accurate and which aren't. You thinking that it sounds reasonable/makes sense is not a great heuristic, especially with how AIs tend to hallucinate.


Maybe someone who speaks Korean could comment? I'm only comparing this to yt auto-subs, and it felt like even with some obvious hallucinations the overall commentary was pretty easy to grasp?
(*^^)(^*)
Last.Midnight
Profile Blog Joined July 2006
Australia906 Posts
Last Edited: 2025-05-08 21:25:04
May 08 2025 21:24 GMT
#15
On May 08 2025 22:00 Lorch wrote:
This is completely useless if you don't speak Korean.
You can never know which parts of the translation are accurate and which aren't. You thinking that it sounds reasonable/makes sense is not a great heuristic, especially with how AIs tend to hallucinate.

Would probably need a dedicated bw ai model trained under the supervision of someone who speaks korean + english and is knowledgeable in starcraft to create something worth using.


Not useless, but not optimal either. Some phrases are lost, but things like "focus fire the tank here" when he's also clicking a tank are pretty clear. Hallucinations don't happen as much when models draw from source material; they tend to happen when the parameters, trained on a massive dataset, misinterpret a request.

That's why for enterprise integration RAG is all the rage, since the "database" the models link to is the company's data.
Original banner artwork: Jim Warren
The contents of this webpage are copyright © 2025 TLnet. All Rights Reserved.