FramePack With LipSync: Create AI Talking Avatar – Let's Push the Limits for Local AI!

2025-05-09
📌 Introduction
Ever dreamed of creating your own AI talking avatar that actually speaks, sings, and reacts? With this easy-to-follow ComfyUI workflow, you’ll learn how to go from a static image to a fully lip-synced, voice-cloned avatar. No need for paid cloud tools—we’re going local and open-source!

Whether you're making YouTube Shorts, building virtual influencers, or telling creative stories—this guide will walk you through every step.

🧠 What You’ll Learn
  • How to use FramePack to animate avatars frame-by-frame
  • Integrate Flux LoRA (ACE++) for expressive face styles
  • Use LatentSync 1.5 for accurate lip synchronization
  • Apply F5TTS for realistic voice cloning
  • Enhance your output with upscaling and frame blending
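Conceptually, these pieces form a single pipeline where each stage consumes the previous stage's output. A minimal Python sketch of that data flow (the function names here are illustrative placeholders, not real ComfyUI node calls):

```python
# Illustrative pipeline order only -- these functions are placeholders,
# not actual ComfyUI node APIs. Each stage consumes the previous output.

def generate_base_image(prompt):   # Flux + ACE++ LoRA: prompt -> still image
    return f"image({prompt})"

def animate(image):                # FramePack F1: still image -> motion frames
    return f"frames({image})"

def clone_voice(script):           # F5TTS: text script -> cloned-voice audio
    return f"audio({script})"

def lip_sync(frames, audio):       # LatentSync 1.5: align mouth to audio
    return f"synced({frames}, {audio})"

audio = clone_voice("Hello, world!")
video = lip_sync(animate(generate_base_image("neutral frontal portrait")), audio)
print(video)
```

The point is the ordering: voice audio is produced independently, then fed into the lip-sync stage alongside the animated frames.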

🎥 Step-by-Step Workflow

⚠️ Note: A beginner-friendly ComfyUI .json workflow file is attached at the end. You must be logged in to download.

  1. Generate the Base Character Image
    Use your preferred model in ComfyUI (e.g., Flux ACE++) to create a clean, frontal face with a neutral expression.
  2. Load the FramePack F1 Node
    Install FramePack F1 to animate the static image using motion vectors. You can adjust "expression frames" for subtle emotion control.
  3. Add Lip Sync Using LatentSync 1.5
    Feed in your audio (TTS or recorded). LatentSync aligns mouth shapes with the waveform for realistic speech matching.
  4. Use F5TTS for Voice Cloning (Optional)
    Clone voices using text-to-speech models like Bark or F5TTS. Paste your script, and export audio to sync with LatentSync.
  5. Export Video & Upscale
    Once you're happy with the animation, export the frames with VideoCombine, then upscale the result and use FlowFrames (RIFE) to interpolate it to 60 FPS.
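Two practical checks help at the end of the steps above: the animation must contain enough frames to cover the audio track, and the final upscale/interpolation can also be done with a plain ffmpeg call if you prefer the command line. A small sketch (file paths are placeholders; ffmpeg's `scale` and `minterpolate` filters are standard):

```python
# Sketch: estimate how many frames the animation needs to cover the audio,
# then build an ffmpeg command that upscales and interpolates to 60 FPS.
# Paths are placeholders; scale and minterpolate are standard ffmpeg filters.
import math

def frames_needed(audio_seconds: float, fps: int = 25) -> int:
    """Frames the animation must contain to cover the full audio track."""
    return math.ceil(audio_seconds * fps)

def ffmpeg_finalize_cmd(src: str, dst: str, width: int = 1920,
                        target_fps: int = 60) -> list[str]:
    """Upscale to `width` (height auto, kept even) and interpolate frames."""
    vf = f"scale={width}:-2,minterpolate=fps={target_fps}"
    return ["ffmpeg", "-y", "-i", src, "-vf", vf, dst]

print(frames_needed(10.0))  # a 10 s clip at 25 fps needs 250 frames
print(" ".join(ffmpeg_finalize_cmd("avatar_raw.mp4", "avatar_final.mp4")))
```

If the generated animation is shorter than `frames_needed` for your audio, LatentSync has nothing to align at the tail end, so pad the animation first.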



📦 Mentioned AI Models & Workflows

  • FramePack F1 – frame-by-frame avatar animation
  • Flux LoRA (ACE++) – expressive face styles
  • LatentSync 1.5 – lip synchronization
  • F5TTS – voice cloning
👥 Who Is This For?

  • 🎬 Content Creators: Wanting to build custom AI avatars for video content.
  • 🧩 ComfyUI Users: Curious about multi-model animation pipelines.
  • 💻 Developers: Exploring local voice cloning and syncing tools.
  • 🧪 AI Experimenters: Looking for local, cost-effective solutions.


🔥 Why It Matters

This tutorial bridges the gap between static image generation and dynamic, realistic AI avatars. You can now create 10+ second talking or singing videos—entirely offline, with stunning results. No API keys. No hidden fees.

Perfect for prototyping before using paid services—or going full local for creative independence.



💬 Got Questions?
Reply below or join the discussion to share your avatar results and ask for help. Let’s keep pushing the boundaries of local AI together!

💡 Bonus Tip: Try combining this with SadTalker or RVC (Retrieval-Based Voice Conversion) for deeper storytelling control.
Author: Supto AI