nobody is building for the instinct
People think building AI films is simple. You prompt, you get a video, you're good to go. But that is not the reality.
If you want a good output, you have to constantly retry. Different models. Different compositions. Different camera angles. And when you finally land on something that feels right, you still have to pray that the video model adheres to your vision.
If it doesn't, you go back and run the entire loop again. Unlike traditional filmmaking, where you shoot once and an editor fixes things later, here the whole pipeline repeats until something worthwhile emerges.
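To make the shape of that loop concrete, here is a minimal sketch in Python. Every name in it is hypothetical: `generate_clip` stands in for whichever video model API you happen to be calling, and `adheres_to_vision` is really a human watching the output, stubbed here with a trivial placeholder so the example runs.

```python
from itertools import product

# Hypothetical placeholders for the knobs you actually retry over.
MODELS = ["model_a", "model_b"]
COMPOSITIONS = ["wide", "close_up"]
CAMERA_ANGLES = ["eye_level", "low_angle"]

def generate_clip(model: str, prompt: str, composition: str, angle: str) -> str:
    """Stand-in for a call to some video generation API."""
    return f"{model} | {composition} | {angle} | {prompt}"

def adheres_to_vision(clip: str) -> bool:
    """The lizard-brain check. In reality this is a person watching
    the clip; stubbed so the sketch executes."""
    return "close_up" in clip and "low_angle" in clip

def make_shot(prompt: str) -> str | None:
    # Every combination is a full rerun of the pipeline, not a tweak in post.
    for model, composition, angle in product(MODELS, COMPOSITIONS, CAMERA_ANGLES):
        clip = generate_clip(model, prompt, composition, angle)
        if adheres_to_vision(clip):
            return clip  # something worthwhile finally emerged
    return None  # exhausted the grid; back to rewriting the prompt itself

print(make_shot("a face lit by a single candle"))
```

The point of the sketch is the shape, not the code: the judgment call sits inside the loop, so every "no" from your instinct costs a full regeneration rather than a quick fix in the edit.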
The people who survive this, we call them technical creatives.
On one hand, they have to be really good at systems thinking. Understanding model limitations, knowing what to prompt and what not to prompt, structuring the pipeline so every iteration moves them forward.
On the other hand, they have to trust their lizard brain. That ancient instinct that fires the moment a face is slightly off, the physics of a scene don't add up, or an environment has quietly broken its own rules.
Nobody is building tools that let these technical creatives translate the lizard brain's instinct into systems-level action fast enough to keep up with it.
Model providers do try to cater to the lizard brain, but they can only do it in a templatized way. If a model looks great on faces, it might mess up the environment or graphics. So you end up needing multiple models to build an AI film.
The aggregators, seeing that fragmentation, just pull everything together and drop it in front of the technical creative. That is where they stop.
That is a huge gap. And shrinking the response time between the lizard brain and the systems brain would fundamentally change what it means to make films with AI.