Jun 2, 2026 · 4 min read

Local AI Studio — Part 4: Local Video With LTX-Video and Wan 2.1

This is Part 4 of the series, and the most surprising chapter: your Mac can generate
video.

It's slow — minutes per clip, not seconds — but it works, entirely offline, with open
models. After the image work in Part 3, I wanted to see how far the Studio could go. Two models, two very different experiences.

LTX-Video: the fast(er) one

LTX-Video is built for speed, and on Apple Silicon that
makes it the friendlier starting point. It's a single all-in-one checkpoint, and ComfyUI has
native nodes for it. The graph is the familiar shape with video-specific pieces:
EmptyLTXVLatentVideo for the latent, LTXVConditioning, an LTXVScheduler feeding a
SamplerCustom, and a SaveWEBM at the end.

LTX-Video 2B, 768×512, 97 frames (≈4s @ 24fps), 30 steps  →  ~7 min

The output is a real, coherent clip — in my test, a jaguar moving through a misty,
golden-lit rainforest. Seven minutes for four seconds of video isn't fast, but it's your
machine, for free, with nothing leaving the building.
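The figures in that benchmark line are easy to sanity-check: 97 frames at 24 fps is about 4.04 seconds. My understanding of LTX-Video's guidance is that frame counts should be of the form 8n + 1 and that width and height should be multiples of 32, which would explain the odd-looking 97. Treat that as an assumption and confirm it against the model card. The helpers below are hypothetical (they are not ComfyUI nodes); they are a minimal sketch for catching bad settings before you commit seven minutes to a render.

```python
"""Sanity-check LTX-Video clip settings before queueing a long render.

Assumption: LTX-Video expects frame counts of the form 8n + 1 and
width/height divisible by 32. These helpers are hypothetical, not ComfyUI API.
"""


def clip_seconds(frames: int, fps: float) -> float:
    """Duration of a clip in seconds."""
    return frames / fps


def nearest_ltx_frames(target_seconds: float, fps: float) -> int:
    """Smallest 8n + 1 frame count that covers target_seconds at fps."""
    needed = target_seconds * fps
    n = 0
    while 8 * n + 1 < needed:
        n += 1
    return 8 * n + 1


def check_ltx_settings(width: int, height: int, frames: int) -> list[str]:
    """Return a list of problems; an empty list means the settings look valid."""
    problems = []
    if width % 32 or height % 32:
        problems.append(f"{width}x{height} is not divisible by 32")
    if (frames - 1) % 8:
        problems.append(f"{frames} frames is not of the form 8n + 1")
    return problems


if __name__ == "__main__":
    # The run from this post: 768x512, 97 frames at 24 fps.
    print(check_ltx_settings(768, 512, 97))  # []
    print(round(clip_seconds(97, 24), 2))    # 4.04
    print(nearest_ltx_frames(2.0, 24))       # 49
```

For a shorter 2-second test clip at 24 fps, `nearest_ltx_frames` suggests 49 frames, roughly half the frames (and, as a rough expectation, noticeably less time) of the run above.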

Wan 2.1: higher quality, and a trap

Wan 2.1 is the quality leader of the two. I used the
1.3B text-to-video variant — the one sized sensibly for a Mac (the 14B would be
punishing). Its native ComfyUI recipe is a little different: a UNETLoader for the
diffusion model, a CLIPLoader with the umt5 text encoder, EmptyHunyuanLatentVideo
for the latent, a ModelSamplingSD3 shift, then a standard KSampler.
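The ModelSamplingSD3 shift is the least self-explanatory piece of that graph. As I understand it, it remaps the sampler's timesteps so that more of the step budget is spent at high noise, using the SD3-style formula t' = shift·t / (1 + (shift − 1)·t). The sketch below only illustrates that formula under this assumption; it is not ComfyUI's implementation, and the function names are hypothetical.

```python
"""Illustrate the timestep shift applied by a ModelSamplingSD3-style node.

Assumption: a flow-matching timestep t in [0, 1] is remapped as
    t' = shift * t / (1 + (shift - 1) * t)
This is a sketch of the formula, not ComfyUI's code.
"""


def shift_timestep(t: float, shift: float) -> float:
    """Remap t in [0, 1]; shift > 1 pushes values toward 1 (more noise)."""
    if shift == 1.0:
        return t
    return shift * t / (1 + (shift - 1) * t)


def shifted_schedule(steps: int, shift: float) -> list[float]:
    """Evenly spaced timesteps from 1 down to 0, then shifted."""
    return [shift_timestep(1 - i / steps, shift) for i in range(steps + 1)]


if __name__ == "__main__":
    # The endpoints stay fixed; the middle of the schedule moves toward high noise.
    print(shift_timestep(0.5, 1.0))  # 0.5
    print(shift_timestep(0.5, 3.0))  # 0.75
    print([round(s, 3) for s in shifted_schedule(4, 3.0)])  # [1.0, 0.9, 0.75, 0.5, 0.0]
```

With shift = 1 the schedule is unchanged. Larger shifts keep more of the steps at the noisy end, where the overall composition and motion are decided.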

I followed the canonical workflow to the letter — including its default uni_pc
sampler — and got this:

A four-second clip of pulsing neon-rainbow soup. No jaguar. No ruins. Just psychedelic mush.

It rendered without a single error, which is the worst kind of failure. Sixteen minutes of
GPU time for garbage.

The one-word fix

Rainbow output that renders "successfully" is the signature of a sampler diverging — the
math going unstable and producing nonsense the VAE then dutifully decodes. On Apple Silicon
this has a known culprit: the uni_pc multistep sampler is finicky on MPS. The fix is
boring and total:

- "sampler_name": "uni_pc"
+ "sampler_name": "euler"

Same model, same prompt, same everything else — and euler produced a clean, coherent
scene: a misty Mayan temple in golden morning light. One word turned 16 wasted minutes into
a real video.

If a local video model gives you rainbow noise on a Mac, suspect the sampler before the prompt. euler is the safe default on MPS.
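If you drive ComfyUI from code as in Part 2, you can apply the same one-word fix to the exported workflow instead of editing it by hand. The sketch below assumes ComfyUI's API-format JSON: a dict that maps node ids to a `class_type` and an `inputs` dict. `force_sampler` is a hypothetical helper, and covering `KSamplerAdvanced` as well is my own addition.

```python
"""Force every sampler node in an API-format ComfyUI workflow to one sampler.

Assumption: the workflow is an API-format export, i.e. a dict mapping node
ids to {"class_type": ..., "inputs": {...}}. `force_sampler` is hypothetical.
"""
import copy
import json

# Node types that carry a `sampler_name` input.
SAMPLER_NODES = {"KSampler", "KSamplerAdvanced"}


def force_sampler(workflow: dict, sampler: str = "euler") -> tuple[dict, list[str]]:
    """Return a patched copy of the workflow and the ids of the nodes changed."""
    patched = copy.deepcopy(workflow)  # leave the caller's workflow untouched
    changed = []
    for node_id, node in patched.items():
        inputs = node.get("inputs")
        if (
            node.get("class_type") in SAMPLER_NODES
            and isinstance(inputs, dict)
            and inputs.get("sampler_name") != sampler
        ):
            inputs["sampler_name"] = sampler
            changed.append(node_id)
    return patched, changed


if __name__ == "__main__":
    # A trimmed-down stand-in for the Wan 2.1 graph: just the sampler node.
    wan = {"3": {"class_type": "KSampler", "inputs": {"sampler_name": "uni_pc"}}}
    patched, changed = force_sampler(wan)
    print(changed)                                # ['3']
    print(json.dumps(patched["3"]["inputs"]))     # {"sampler_name": "euler"}
```

Returning the changed ids makes it easy to log when a patch actually happened. That is useful because the failure mode here is silent.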

What I'd tell you before you start

A few honest takeaways from living with this for a bit:

  • Video on a Mac is for patience, not real-time. Queue clips and walk away. This is
    exactly where the API approach from Part 2
    earns its keep — fire a batch, come back later.
  • Unified memory is the enabler. The 1.3B Wan and the LTX checkpoint sit in the
    Studio's 64 GB without drama. This is the thing a Mac does that a 12 GB consumer GPU
    can't.
  • Keep clips short and modest. 2–4 seconds at 480–512p is the comfortable zone for the
    M1 Max. Push frame count and resolution and the minutes pile up fast.
  • A still frame is a free bonus. Pull any frame out of the clip with ffmpeg and you've
    got a poster image for it.
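For the "queue clips and walk away" pattern, here is a sketch of firing a batch of seed variations at a local ComfyUI server. Assumptions: the server listens on ComfyUI's default `http://127.0.0.1:8188`, and its `/prompt` endpoint accepts a JSON body of the form `{"prompt": <API-format workflow>}`. `SEED_NODE` is a hypothetical node id, so look up the real one in your own export. Payload building is kept separate from the HTTP call so you can check it without a running server.

```python
"""Queue a batch of seed variations on a local ComfyUI server, then walk away.

Assumptions: ComfyUI listens on its default http://127.0.0.1:8188 and accepts
POST /prompt with {"prompt": <API-format workflow>}. SEED_NODE is hypothetical.
"""
import copy
import json
import urllib.request

COMFY_URL = "http://127.0.0.1:8188/prompt"
SEED_NODE = "3"  # hypothetical: id of the KSampler node in your export


def build_batch(workflow: dict, seeds: list[int]) -> list[dict]:
    """One /prompt payload per seed, each with its own copy of the graph."""
    payloads = []
    for seed in seeds:
        graph = copy.deepcopy(workflow)
        graph[SEED_NODE]["inputs"]["seed"] = seed  # KSampler calls this input "seed"
        payloads.append({"prompt": graph})
    return payloads


def queue(payload: dict, url: str = COMFY_URL) -> dict:
    """POST one payload and return the server's JSON reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

In use, you load your exported workflow with `json.load`, then call `queue(p)` for each `p` in `build_batch(workflow, [1, 2, 3])` and go do something else. The server works through its queue one job at a time.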
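Pulling that poster frame is a single ffmpeg call. This is a small sketch that assumes ffmpeg is installed and on your PATH (for example via Homebrew). The file names and the 2-second default are illustrative, and the command is built separately from running it so you can inspect it first.

```python
"""Pull a single still out of a rendered clip to use as a poster image.

Assumption: ffmpeg is installed and on PATH. File names and the default
timestamp are illustrative.
"""
import shutil
import subprocess


def poster_cmd(clip: str, out: str, at_seconds: float = 2.0) -> list[str]:
    """Build an ffmpeg command that writes the frame at `at_seconds` to `out`."""
    return [
        "ffmpeg",
        "-y",                    # overwrite `out` if it already exists
        "-ss", f"{at_seconds}",  # seek on the input before decoding
        "-i", clip,
        "-frames:v", "1",        # write exactly one video frame
        out,
    ]


def extract_poster(clip: str, out: str, at_seconds: float = 2.0) -> None:
    """Run the command, failing loudly if ffmpeg is missing or errors out."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(poster_cmd(clip, out, at_seconds), check=True)
```

`extract_poster("jaguar.webm", "poster.png")` is equivalent to running `ffmpeg -y -ss 2.0 -i jaguar.webm -frames:v 1 poster.png` in a terminal.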

The whole studio, in review

Across this series we went from an empty ~/ComfyUI folder to a real local studio:

  • Part 1 — installed on Apple Silicon, GPU confirmed.
  • Part 2 — driven from code as a plain function.
  • Part 3 — SDXL vs FLUX, and a myth measured into the ground.
  • Part 4 — video, with the MPS sampler trap defused.
  • Part 5 — a real 15-second reel of the Salvadoran coast, and the case for not using a video model to make it.

No cloud, no API keys, no per-image bill — and every picture in the series painted by the
machine the series is about. If you've got an Apple Silicon Mac with a healthy amount of
memory, you already own a capable little generative studio. Go turn it on.

And if you want to see what you can actually make with it, that's Part 5.