Local AI Studio — Part 4: Local Video With LTX-Video and Wan 2.1

This is Part 4 of the series, and the most surprising chapter: your Mac can generate
video.

It's slow — minutes per clip, not seconds — but it works, entirely offline, with open
models. After the image work in Part 3,
I wanted to see how far the Studio could go. Two models, two very different experiences.

LTX-Video: the fast(er) one

LTX-Video is built for speed, and on Apple Silicon that
makes it the friendlier starting point. It's a single all-in-one checkpoint, and ComfyUI has
native nodes for it. The graph has the familiar shape, plus a few video-specific pieces:
EmptyLTXVLatentVideo for the latent, LTXVConditioning to attach the frame rate to the prompt
conditioning, an LTXVScheduler feeding sigmas to a SamplerCustom, and a SaveWEBM at the end.

LTX-Video 2B, 768×512, 97 frames (≈4s @ 24fps), 30 steps  →  ~7 min

The output is a real, coherent clip — in my test, a jaguar moving through a misty,
golden-lit rainforest. Seven minutes for four seconds of video isn't fast, but it's your
machine, for free, with nothing leaving the building.
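The numbers in that run are easy to sanity-check before you queue your own clips. The sketch below does the arithmetic in Python. The 8n + 1 frame-count rule and the multiple-of-32 size rule are my reading of the LTX-Video model card, so treat them as assumptions and check the card for your checkpoint; the helper names are hypothetical.

```python
# Back-of-envelope helpers for planning LTX-Video clips.
# ASSUMPTION: LTX-Video wants frame counts of the form 8*n + 1 and
# width/height that are multiples of 32 (my reading of the model card).


def clip_seconds(frames: int, fps: float = 24.0) -> float:
    """Playback length of a clip in seconds."""
    return frames / fps


def is_valid_ltx_frames(frames: int) -> bool:
    """True if `frames` has the 8*n + 1 shape (97 = 8*12 + 1)."""
    return frames >= 1 and (frames - 1) % 8 == 0


def is_valid_ltx_size(width: int, height: int) -> bool:
    """True if both dimensions are multiples of 32 (768x512 qualifies)."""
    return width % 32 == 0 and height % 32 == 0


def frames_for(seconds: float, fps: float = 24.0) -> int:
    """Smallest 8*n + 1 frame count that covers at least `seconds`."""
    wanted = max(1, round(seconds * fps))
    n = -(-(wanted - 1) // 8)  # integer ceiling division
    return 8 * n + 1


def seconds_per_frame(render_minutes: float, frames: int) -> float:
    """Average wall-clock render cost per output frame."""
    return render_minutes * 60 / frames


if __name__ == "__main__":
    print(f"{clip_seconds(97):.2f} s of video")           # 97 frames at 24 fps
    print(is_valid_ltx_frames(97), is_valid_ltx_size(768, 512))
    print(frames_for(4.0))                                 # frames for ~4 s
    print(f"{seconds_per_frame(7, 97):.1f} s per frame")   # the ~7 min run
```

For the run above that works out to roughly 4.3 seconds of render time per frame. Treat that as an optimistic figure for longer or larger clips, since the cost does not stay flat as frame count and resolution grow.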

Wan 2.1: higher quality, and a trap

Wan 2.1 is the quality leader of the two. I used the
1.3B text-to-video variant, the one sized sensibly for a Mac. The 14B would be punishing:
roughly 28 GB of weights at fp16, versus about 2.6 GB for the 1.3B, before the text encoder
even loads. Its native ComfyUI recipe is a little different: a UNETLoader for the
diffusion model, a CLIPLoader with the umt5 text encoder, EmptyHunyuanLatentVideo
for the latent, a ModelSamplingSD3 node to set the sampling shift, then a standard KSampler.
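To make that recipe concrete, here is a hedged sketch of the same chain in ComfyUI's API format (node id mapped to a class_type plus inputs, with links written as [source_id, output_index]). The model, encoder, and VAE filenames, plus the size, length, steps, cfg, and shift values, are assumptions and placeholders; match them to whatever you downloaded. The sampler defaults to euler rather than the example's uni_pc, which is where the trouble in the next section starts. The output node is omitted for brevity, and ComfyUI will refuse a graph with no output node, so add one before queueing.

```python
import json

# Hypothetical builder for a Wan 2.1 1.3B text-to-video graph in ComfyUI's
# API format. ASSUMPTIONS: the filenames and every numeric value below are
# placeholders, not settings copied from the article.


def build_wan_t2v(prompt: str, negative: str = "", seed: int = 0,
                  sampler_name: str = "euler") -> dict:
    return {
        "1": {"class_type": "UNETLoader",
              "inputs": {"unet_name": "wan2.1_t2v_1.3B_fp16.safetensors",
                         "weight_dtype": "default"}},
        "2": {"class_type": "CLIPLoader",
              "inputs": {"clip_name": "umt5_xxl_fp8_e4m3fn_scaled.safetensors",
                         "type": "wan"}},
        "3": {"class_type": "VAELoader",
              "inputs": {"vae_name": "wan_2.1_vae.safetensors"}},
        "4": {"class_type": "CLIPTextEncode",   # positive prompt
              "inputs": {"text": prompt, "clip": ["2", 0]}},
        "5": {"class_type": "CLIPTextEncode",   # negative prompt
              "inputs": {"text": negative, "clip": ["2", 0]}},
        "6": {"class_type": "EmptyHunyuanLatentVideo",
              "inputs": {"width": 832, "height": 480, "length": 33,
                         "batch_size": 1}},
        "7": {"class_type": "ModelSamplingSD3",  # applies the shift to the model
              "inputs": {"model": ["1", 0], "shift": 8.0}},
        "8": {"class_type": "KSampler",
              "inputs": {"model": ["7", 0], "positive": ["4", 0],
                         "negative": ["5", 0], "latent_image": ["6", 0],
                         "seed": seed, "steps": 30, "cfg": 6.0,
                         "sampler_name": sampler_name,
                         "scheduler": "simple", "denoise": 1.0}},
        "9": {"class_type": "VAEDecode",
              "inputs": {"samples": ["8", 0], "vae": ["3", 0]}},
        # An output node (a video or animated-image saver) would read from
        # node "9"; it is left out to keep the sketch short.
    }


if __name__ == "__main__":
    graph = build_wan_t2v("a jaguar prowling through misty jungle ruins")
    print(json.dumps(graph["8"], indent=2))
```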

I followed the canonical workflow to the letter — including its default uni_pc
sampler — and got this:

A four-second clip of pulsing neon-rainbow soup. No jaguar. No ruins. Just psychedelic mush.

It rendered without a single error, which is the worst kind of failure. Sixteen minutes of
GPU time for garbage.

The one-word fix

Rainbow output that renders "successfully" is the signature of a sampler diverging: the
math goes numerically unstable and produces nonsense that the VAE then dutifully decodes. On
Apple Silicon this has a known culprit: the uni_pc multistep sampler is prone to instability
on MPS, PyTorch's Metal backend. The fix is boring, and it is complete:

- "sampler_name": "uni_pc"
+ "sampler_name": "euler"

Same model, same prompt, same everything else — and euler produced a clean, coherent
scene: a misty Mayan temple in golden morning light. One word turned 16 wasted minutes into
a real video.

If a local video model gives you rainbow noise on a Mac, suspect the sampler before the prompt. On MPS, euler is the safe default.
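If you have a pile of exported workflows, a small patcher saves hunting for the setting by hand. This sketch assumes the workflow is in ComfyUI's API format (the JSON you send to the server), not the UI's save format, which lays nodes out differently. It rewrites any string-valued sampler_name input, so it covers both KSampler and the KSamplerSelect node used with SamplerCustom; `force_sampler` is a hypothetical name.

```python
import copy
import json

# Sketch: swap risky samplers for euler in a ComfyUI API-format workflow.
# Only string values are touched; a linked input (a [node_id, index] list)
# is left alone.


def force_sampler(workflow: dict, risky=frozenset({"uni_pc"}),
                  replacement: str = "euler") -> tuple[dict, list[str]]:
    """Return a patched deep copy plus the ids of nodes that changed."""
    patched = copy.deepcopy(workflow)  # never mutate the caller's workflow
    changed = []
    for node_id, node in patched.items():
        inputs = node.get("inputs", {})
        value = inputs.get("sampler_name")
        if isinstance(value, str) and value in risky:
            inputs["sampler_name"] = replacement
            changed.append(node_id)
    return patched, changed


if __name__ == "__main__":
    wf = json.loads('{"3": {"class_type": "KSampler",'
                    ' "inputs": {"sampler_name": "uni_pc", "steps": 30}}}')
    fixed, ids = force_sampler(wf)
    print(ids, fixed["3"]["inputs"]["sampler_name"])
```

Returning the changed node ids makes it easy to log which workflows actually needed the fix.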

What I'd tell you before you start

A few honest takeaways from living with this for a bit:

Video on a Mac is for patience, not real-time. Queue clips and walk away. This is
exactly where the API approach from Part 2
earns its keep — fire a batch, come back later.
Unified memory is the enabler. Because the CPU and GPU share one pool, the 1.3B Wan and
the LTX checkpoint, text encoders included, sit in the Studio's 64 GB without drama. That
headroom is the thing a Mac has that a 12 GB consumer GPU can't match.
Keep clips short and modest. 2–4 seconds at 480–512p is the comfortable zone for the
M1 Max. Push frame count and resolution and the minutes pile up fast.
A still frame is a free bonus. Pull any frame out of the clip with ffmpeg and you've
got a poster image for it.
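The first and last tips combine into one routine: queue a batch, walk away, then pull a poster frame from each finished clip. The sketch below assumes ComfyUI is listening on its default local address (http://127.0.0.1:8188) and uses its POST /prompt endpoint, which accepts an API-format workflow and replies with a prompt_id. The helper names and the example filename are hypothetical, and ffmpeg must be on your PATH.

```python
import json
import subprocess
import urllib.request
import uuid

# ASSUMPTION: ComfyUI's server is running locally on its default port.
COMFY_URL = "http://127.0.0.1:8188"


def prompt_payload(workflow: dict, client_id: str) -> bytes:
    """JSON body for ComfyUI's POST /prompt endpoint."""
    return json.dumps({"prompt": workflow, "client_id": client_id}).encode("utf-8")


def queue_batch(workflows: list[dict], base_url: str = COMFY_URL) -> list[str]:
    """Queue every workflow and return the prompt ids; does not wait for renders."""
    client_id = uuid.uuid4().hex
    prompt_ids = []
    for wf in workflows:
        req = urllib.request.Request(
            f"{base_url}/prompt",
            data=prompt_payload(wf, client_id),  # supplying data makes it a POST
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            prompt_ids.append(json.load(resp)["prompt_id"])
    return prompt_ids


def poster_cmd(clip: str, out_png: str, at_seconds: float = 2.0) -> list[str]:
    """ffmpeg argv that writes the single frame at `at_seconds` as a PNG."""
    return ["ffmpeg", "-y", "-ss", str(at_seconds), "-i", clip,
            "-frames:v", "1", out_png]


def extract_poster(clip: str, out_png: str, at_seconds: float = 2.0) -> None:
    """Run ffmpeg; raises CalledProcessError if extraction fails."""
    subprocess.run(poster_cmd(clip, out_png, at_seconds), check=True)


if __name__ == "__main__":
    # Hypothetical filename; use whatever your save node actually wrote.
    print(" ".join(poster_cmd("output/jaguar_00001_.webm", "jaguar_poster.png")))
```

`queue_batch` returns as soon as the jobs are accepted; ComfyUI then works through its queue one job at a time, which is exactly the walk-away pattern. Once a clip lands on disk, `extract_poster` turns it into a still.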

The whole studio, in review

Across this series we went from an empty ~/ComfyUI folder to a real local studio:

Part 1 — installed on Apple Silicon, GPU confirmed.
Part 2 — driven from code as a plain function.
Part 3 — SDXL vs FLUX, and a myth measured into the ground.
Part 4 — video, with the MPS sampler trap defused.
Part 5 — a real 15-second reel of the Salvadoran coast, and the case for not using a video model to make it.

No cloud, no API keys, no per-image bill — and every picture in the series painted by the
machine the series is about. If you've got an Apple Silicon Mac with a healthy amount of
memory, you already own a capable little generative studio. Go turn it on.

And if you want to see what you can actually make with it, that's
Part 5.