Local AI Studio — Part 4: Local Video With LTX-Video and Wan 2.1

This is Part 4 of the series, and the most surprising chapter: your Mac can generate video.
It's slow (minutes per clip, not seconds), but it works, entirely offline, with open models.
After the image work in Part 3, I wanted to see how far the Studio could go. Two models, two
very different experiences.
LTX-Video: the fast(er) one
LTX-Video is built for speed, and on Apple Silicon that
makes it the friendlier starting point. It's a single all-in-one checkpoint, and ComfyUI has
native nodes for it. The graph is the familiar shape with video-specific pieces:
EmptyLTXVLatentVideo for the latent, LTXVConditioning, an LTXVScheduler feeding a
SamplerCustom, and a SaveWEBM at the end.
LTX-Video 2B, 768×512, 97 frames (≈4s @ 24fps), 30 steps → ~7 min
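To put that benchmark line in perspective, here is a small sketch of the arithmetic behind it. The numbers (97 frames, 24 fps, roughly 7 minutes) are the ones from my run above; the function names are just illustrative helpers, not part of ComfyUI or LTX-Video.

```python
def clip_seconds(frames: int, fps: float) -> float:
    """Length of the finished clip in seconds."""
    return frames / fps


def slowdown_factor(render_minutes: float, frames: int, fps: float) -> float:
    """Seconds of rendering spent per second of finished video."""
    return render_minutes * 60 / clip_seconds(frames, fps)


# Figures from the LTX-Video run above (the render time is approximate).
frames, fps, render_minutes = 97, 24, 7

print(f"clip length:   {clip_seconds(frames, fps):.2f} s")
print(f"per frame:     {render_minutes * 60 / frames:.1f} s of render time")
print(f"vs real time:  about {slowdown_factor(render_minutes, frames, fps):.0f}x slower")
```

The same two helpers are handy for budgeting a batch: plug in a different frame count and your own measured render time before you queue anything long.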
The output is a real, coherent clip — in my test, a jaguar moving through a misty,
golden-lit rainforest. Seven minutes for four seconds of video isn't fast, but it's your
machine, for free, with nothing leaving the building.
Wan 2.1: higher quality, and a trap
Wan 2.1 is the quality leader of the two. I used the
1.3B text-to-video variant — the one sized sensibly for a Mac (the 14B would be
punishing). Its native ComfyUI recipe is a little different: a UNETLoader for the
diffusion model, a CLIPLoader with the umt5 text encoder, EmptyHunyuanLatentVideo
for the latent, a ModelSamplingSD3 shift, then a standard KSampler.
I followed the canonical workflow to the letter — including its default uni_pc
sampler — and got this:
A four-second clip of pulsing neon-rainbow soup. No jaguar. No ruins. Just psychedelic mush.
It rendered without a single error, which is the worst kind of failure. Sixteen minutes of
GPU time for garbage.
The one-word fix
Rainbow output that renders "successfully" is the signature of a sampler diverging — the
math going unstable and producing nonsense the VAE then dutifully decodes. On Apple Silicon
this has a known culprit: the uni_pc multistep sampler is finicky on MPS. The fix is
boring and total:
- "sampler_name": "uni_pc"
+ "sampler_name": "euler"
Same model, same prompt, same everything else — and euler produced a clean, coherent
scene: a misty Mayan temple in golden morning light. One word turned 16 wasted minutes into
a real video.
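If you drive ComfyUI from code, the same swap can be applied to a workflow before it is queued. The sketch below assumes the workflow is in ComfyUI's API export format, a dict of node IDs mapping to a class_type and an inputs dict. The function name, the node ID "3", and the tiny example graph are hypothetical; a real Wan 2.1 graph has many more nodes.

```python
import copy


def swap_sampler(workflow: dict, bad: str = "uni_pc", good: str = "euler") -> dict:
    """Return a copy of the workflow with every `bad` sampler replaced by `good`.

    Assumes the API-format layout: {node_id: {"class_type": ..., "inputs": {...}}}.
    The original dict is left untouched.
    """
    patched = copy.deepcopy(workflow)
    for node in patched.values():
        if not isinstance(node, dict):
            continue
        inputs = node.get("inputs")
        if isinstance(inputs, dict) and inputs.get("sampler_name") == bad:
            inputs["sampler_name"] = good
    return patched


# Hypothetical, heavily trimmed example: only the field that matters here.
example = {
    "3": {"class_type": "KSampler", "inputs": {"sampler_name": "uni_pc"}},
}

fixed = swap_sampler(example)
print(fixed["3"]["inputs"]["sampler_name"])
```

Patching a copy rather than the loaded dict keeps the canonical workflow file intact, which makes it easy to rerun the original later and compare.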
If a local video model gives you rainbow noise on a Mac, suspect the sampler before the prompt.
euler is the safe default on MPS.
What I'd tell you before you start
A few honest takeaways from living with this for a bit:
- Video on a Mac is for patience, not real-time. Queue clips and walk away. This is
exactly where the API approach from Part 2 earns its keep: fire a batch, come back later.
- Unified memory is the enabler. The 1.3B Wan and the LTX checkpoint sit in the
Studio's 64 GB without drama. This is the thing a Mac does that a 12 GB consumer GPU
can't.
- Keep clips short and modest. 2–4 seconds at 480–512p is the comfortable zone for the
M1 Max. Push frame count and resolution and the minutes pile up fast.
- A still frame is a free bonus. Pull any frame out of the clip with ffmpeg and you've
got a poster image for it.
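For the poster-frame tip, one way to do it is a thin Python wrapper around ffmpeg. This is a minimal sketch that assumes ffmpeg is installed and on your PATH; the function names, file names, and the one-second default timestamp are placeholders of mine, not anything from the workflows above.

```python
import subprocess


def poster_frame_command(clip: str, out_image: str, at_seconds: float = 1.0) -> list[str]:
    """Build the ffmpeg command that grabs a single frame from `clip`.

    -y          overwrite the output file if it already exists
    -ss         seek to the given timestamp (seconds) before decoding
    -frames:v 1 stop after writing one video frame
    """
    return [
        "ffmpeg", "-y",
        "-ss", str(at_seconds),
        "-i", clip,
        "-frames:v", "1",
        out_image,
    ]


def save_poster_frame(clip: str, out_image: str, at_seconds: float = 1.0) -> None:
    """Run ffmpeg and raise CalledProcessError if it exits with an error."""
    subprocess.run(poster_frame_command(clip, out_image, at_seconds), check=True)
```

Calling save_poster_frame("clip.webm", "poster.png", at_seconds=2.0) would write the frame at the two-second mark. Keeping the command builder separate from the call that runs it makes it easy to print and inspect the exact command first.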
The whole studio, in review
Across this series we went from an empty ~/ComfyUI folder to a real local studio:
- Part 1 — installed on Apple Silicon, GPU confirmed.
- Part 2 — driven from code as a plain function.
- Part 3 — SDXL vs FLUX, and a myth measured into the ground.
- Part 4 — video, with the MPS sampler trap defused.
- Part 5 — a real 15-second reel of the Salvadoran coast, and the case for not using a video model to make it.
No cloud, no API keys, no per-image bill — and every picture in the series painted by the
machine the series is about. If you've got an Apple Silicon Mac with a healthy amount of
memory, you already own a capable little generative studio. Go turn it on.
And if you want to see what you can actually make with it, that's Part 5.