AI Captions
Generate accurate captions with word-level timestamps and 6 professional presets.
AI Caption Generator
Captions aren’t optional anymore. They’re how most people watch video. ChatCut generates accurate, styled captions with word-level timing and gives you real control over how they look.
Skip the menus. Type what you need. Tell the AI “Add Netflix-style captions” and they’re on your timeline in seconds, perfectly synced.
Dual-Engine Transcription
ChatCut runs two transcription engines so each language is handled by the engine best suited to it:
- AssemblyAI – optimized for English and European languages, high accuracy on conversational speech
- Huoshan – purpose-built for Chinese (Mandarin, Cantonese), handles tonal languages and CJK character segmentation correctly
The right engine is selected automatically based on your content’s language. You’ll get accurate transcription without configuring anything.
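As a rough illustration of language-based routing, the sketch below maps a detected language code to an engine name. The function name `pick_engine`, the set of language codes, and the fallback to AssemblyAI for everything else are assumptions made for this example, not ChatCut's actual selection logic.

```python
# Hypothetical routing sketch; the real selection logic is not documented here.
CHINESE_CODES = {"zh", "cmn", "yue"}  # assumed codes for Chinese, Mandarin, Cantonese


def pick_engine(language_code: str) -> str:
    """Return the engine name assumed to suit the detected language."""
    # Reduce tags such as "zh-CN" or "ZH_tw" to their primary subtag.
    primary = language_code.replace("_", "-").split("-")[0].lower()
    return "huoshan" if primary in CHINESE_CODES else "assemblyai"
```

In this sketch, `pick_engine("zh-CN")` routes to the Chinese engine and `pick_engine("en-US")` falls through to the default.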
6 Professional Presets
Start with a look that works, then customize from there:
- Netflix – clean white text, semi-transparent background, industry-standard positioning
- Minimal – no background, subtle drop shadow, stays out of the way
- Vox – bold, colorful word-by-word highlights (Vox Media style)
- Focus – highlights the current word, dims surrounding text
- TikTok – large, centered, high-contrast, built for vertical video
- YouTube – readable at any size, optimized for 16:9 content
Each preset is a starting point, and every visual property is adjustable.
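One way to picture "preset as a starting point" is a style record that is copied with a few fields overridden. The sketch below is a minimal model of that idea; the `CaptionStyle` class, its field names, and every preset value are illustrative assumptions rather than ChatCut's real data model.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CaptionStyle:
    # Field names and defaults are assumptions made for this example.
    font_family: str = "Inter"
    font_size: int = 48
    text_color: str = "#FFFFFF"
    background_opacity: float = 0.0
    animation: str = "fade"


# Hypothetical preset values, loosely based on the descriptions above.
PRESETS = {
    "netflix": CaptionStyle(background_opacity=0.6),
    "minimal": CaptionStyle(font_size=40),
    "tiktok": CaptionStyle(font_size=72, animation="pop"),
}


def customize(preset_name: str, **overrides) -> CaptionStyle:
    """Start from a preset and override only the properties passed in."""
    return replace(PRESETS[preset_name], **overrides)


# The preset itself stays untouched; only the copy changes.
style = customize("netflix", font_size=56, text_color="#FFD400")
```

Keeping presets immutable and copying on customization means a preset can always be reapplied cleanly later.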
- Add your video – import footage or use content already on your timeline
- Generate captions – the AI transcribes with word-level timestamps and speaker identification
- Pick a preset – choose from 6 professional styles: Netflix, Minimal, Vox, Focus, TikTok, or YouTube
- Customize anything – adjust 20+ properties: font, size, color, position, animation, background, and more
20+ Customizable Properties
This is where ChatCut pulls ahead of basic caption tools. You’re not limited to font and color, and you can pair captions with AI voiceover narration or text-based editing for a complete spoken-word workflow. The full property list includes:
- Font family, weight, and size
- Text color, stroke color, stroke width
- Background color and opacity
- Position (x, y) and alignment
- Line height and letter spacing
- Word highlight color and animation
- Shadow properties
- Maximum lines and characters per line
- Animation style (fade, pop, slide)
Every property updates in real time on your preview. There’s no re-rendering and no guessing.
Captions generated with word-level timestamps, Netflix styling applied, font size increased, active word highlighted in blue, all synced to timeline
Word-Level Timestamps
ChatCut doesn’t just timestamp sentences; it timestamps every word. This enables:
- Per-word highlighting – the active word lights up as it’s spoken
- Precise trimming – cut to the exact word boundary
- Text-based editing – delete a word from the transcript, and the corresponding video is removed
- Accurate sync – captions never drift, even in fast speech
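To make the list above concrete, here is a minimal sketch of two operations that word-level timestamps make possible: finding the word to highlight at a given playback time, and working out which time ranges survive when words are deleted from the transcript. The `Word` record and both function names are hypothetical; this is not ChatCut's internal API.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


def active_word(words: list[Word], t: float) -> Optional[Word]:
    """Return the word being spoken at time t, or None during a pause."""
    starts = [w.start for w in words]
    i = bisect_right(starts, t) - 1  # last word that starts at or before t
    if i >= 0 and t < words[i].end:
        return words[i]
    return None


def keep_ranges(words: list[Word], deleted: set[int]) -> list[tuple[float, float]]:
    """Time ranges to keep after deleting the words at the given indices."""
    ranges: list[tuple[float, float]] = []
    for i, w in enumerate(words):
        if i in deleted:
            continue
        if ranges and (i - 1) not in deleted:
            # Previous word was kept too: extend its range through the pause.
            ranges[-1] = (ranges[-1][0], w.end)
        else:
            ranges.append((w.start, w.end))
    return ranges
```

Deleting a filler word in the middle of a sentence then yields two ranges: one that ends at the word before it and one that begins at the word after it.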
Speaker Identification
Multi-speaker content is handled automatically. The transcription engine identifies different speakers and labels them, which matters because, according to Wistia’s research, captioned videos see significantly higher engagement. This means:
- Interview captions show who’s talking
- Podcast episodes with multiple hosts are properly attributed
- Panel discussions don’t get confusing
- You can style different speakers with different colors
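Per-speaker styling can be thought of as a mapping from speaker label to color. The sketch below assigns colors in order of first appearance; the segment shape (a dict with a `speaker` key), the palette, and the function name are assumptions made for illustration.

```python
from itertools import cycle

# Assumed palette; any list of hex colors would work.
PALETTE = ["#FFFFFF", "#FFD400", "#4DA3FF", "#FF6B6B"]


def assign_speaker_colors(segments: list[dict]) -> dict[str, str]:
    """Map each speaker label to a color, in order of first appearance."""
    colors = cycle(PALETTE)  # wraps around if there are more speakers than colors
    mapping: dict[str, str] = {}
    for seg in segments:
        speaker = seg["speaker"]
        if speaker not in mapping:
            mapping[speaker] = next(colors)
    return mapping
```

Because the mapping is built once per transcript, a speaker keeps the same color every time they reappear.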
CJK Language Support
Most caption tools treat Chinese, Japanese, and Korean as afterthoughts. ChatCut doesn’t. The Huoshan engine provides:
- Proper character segmentation (no mid-word breaks)
- Intelligent line breaking that respects grammar
- Correct punctuation handling
- Natural reading flow for vertical and horizontal text
If you’re creating content in Chinese or for Chinese-speaking audiences, this is the caption tool that actually works.
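For a sense of what "no mid-word breaks" involves, the sketch below wraps text that has already been segmented into words, never splitting a word and never starting a line with closing punctuation. It assumes the transcription engine supplies the segmented tokens; the function name and the small punctuation set are illustrative, and real line-breaking rules are considerably richer.

```python
# Punctuation that should not begin a line (a small, assumed subset).
NO_LINE_START = set("，。！？、；：）」』")


def wrap_tokens(tokens: list[str], max_chars: int) -> list[str]:
    """Greedily wrap pre-segmented tokens without splitting any token."""
    lines: list[str] = []
    current = ""
    for tok in tokens:
        fits = len(current) + len(tok) <= max_chars
        if current and not fits and tok not in NO_LINE_START:
            lines.append(current)
            current = tok
        else:
            # The token fits, the line is empty, or it is punctuation that
            # may overhang the limit rather than start a new line.
            current += tok
    if current:
        lines.append(current)
    return lines
```

The design choice here is to let closing punctuation overhang the character limit by one position, since a line that opens with "。" reads as an error.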
| Feature | ChatCut | Descript |
|---|---|---|
| Customizable properties | 20+ visual properties | Basic font, color, position |
| Style presets | 6 professional presets | Limited preset options |
| CJK language support | Dedicated engine with intelligent line breaking | Basic support, frequent segmentation issues |
| Word-level timestamps | Yes, with per-word highlighting | Yes |
| Speaker identification | Automatic with color coding | Automatic |
Describe What You Want in Plain English. ChatCut Handles the Rest.
You don’t need to manually position text boxes or fiddle with timing. Tell the AI agent what style you want, and it configures everything. Want to change the look later? Just describe the change.
“Make the captions bigger, move them to the top third, and use a bold font.” Done.
“Switch to TikTok style but keep my custom colors.” Done.
The AI understands context and applies changes across all caption segments at once.
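A request like “Switch to TikTok style but keep my custom colors” can be modeled as applying a new preset to every segment while carrying over a chosen set of properties. The sketch below shows one way to do that; the preset tables, property names, and `switch_preset` function are hypothetical and only illustrate the idea.

```python
# Hypothetical preset tables and property names, for illustration only.
PRESETS = {
    "netflix": {"font_size": 48, "text_color": "#FFFFFF",
                "highlight_color": "#FFFFFF", "position": "bottom"},
    "tiktok": {"font_size": 72, "text_color": "#FFFFFF",
               "highlight_color": "#FFE600", "position": "center"},
}
COLOR_KEYS = {"text_color", "highlight_color"}


def switch_preset(segments: list[dict], new_preset: str, keep: set[str]) -> list[dict]:
    """Apply a new preset to every segment, preserving the listed properties."""
    updated = []
    for seg in segments:
        kept = {k: v for k, v in seg["style"].items() if k in keep}
        # Preset values come first so the kept properties win on conflict.
        updated.append({**seg, "style": {**PRESETS[new_preset], **kept}})
    return updated
```

The function returns new segment records instead of mutating the originals, which makes a change like this easy to undo.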
Filler words ('um', 'uh', 'like', 'you know') removed from transcript and timeline, Focus preset applied with yellow word highlights
When to Use AI Captions
- Social media – much social video is watched on mute, so captions are essential
- YouTube – burned-in captions improve watch time and accessibility
- Interviews and podcasts – speaker identification keeps talking-head and multi-person content easy to follow
- Educational content – word-level highlighting aids comprehension
- International content – dual-engine transcription handles English and Chinese natively