How I made an AI that can generate motion graphics from a text prompt (insane prompt engineering)

First things first: I did NOT code a full LLM, that's far beyond what I can do alone, so I had to find a way around it. There were also no APIs I could use, because motion graphics is still completely new territory for AI, and what does exist is often closed to the public or extremely expensive. So I had to find a way out, and of course I found it, but to explain it, I first need to explain how the editor works:

The editor, believe it or not, is actually web based, because it was way easier to build it for the web. A shape is created with a CSS canvas, like a div, but better, and those canvases have fixed properties such as color, position, and so on. So to create an animation, I have to make an AI that is able to create those canvases and wire them in correctly. And while the solution seems simple, it's way, way harder than you think. Let me explain.

The solution itself is simple: create a JSON file, which in non-nerdy terms is a "list of properties". When I apply this JSON to a shape's properties, they get changed based on the JSON. So... JSON = shape, with all its settings. Now we need an AI that creates this JSON, which brings us to point number one: the Assistant.
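To make this concrete, here is a rough sketch of what such a shape preset could look like. The property names (`type`, `fill`, `position`, `size`) are my own guesses for illustration, not the actual FlashFX format:

```typescript
// Hypothetical shape preset, illustrating "JSON = shape with all its settings".
// Property names are illustrative guesses, not the real FlashFX schema.
interface ShapePreset {
  type: "circle" | "rectangle" | "line";
  fill: string;                       // CSS color, e.g. "#ff0000"
  position: { x: number; y: number }; // canvas coordinates
  size: { width: number; height: number };
}

const redCircle: ShapePreset = {
  type: "circle",
  fill: "#ff0000",
  position: { x: 1920, y: 1080 },     // centre of a 3840x2160 canvas
  size: { width: 400, height: 400 },
};
```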

An assistant is a pre-trained large AI, like ChatGPT or Claude, configured to perform a single task. Now let me take you to the real behind-the-scenes of the software, so you can see the assistant yourself.

We are on the OpenAI platform, which is the official ChatGPT API page. Here, under Assistants, you can see I already have many created, but let's focus on this one, FlashFX Shape Gen. Here I can find the assistant's instructions, which I won't spend too much time reading, but you can see that I told it to generate a JSON file matching specific requests. We can test this by clicking on Playground here.

Now I'll give it a shape to create: let's create a red circle. You can see that the assistant returned a block of code with all the settings for a red circle, so let's put this into action. To do it, I will use an older version of FlashFX (it got a very nice glow-up, didn't it?), but here the editing is more raw. Let's create a file, I'll call it circle, paste the code, and save the file. Now I'll change its extension from .txt to .flashfx. Here I can find an Import Preset button; I click it, add the preset to my shape, and wow, a red circle. Who expected that? Now let's get off this clunky UI and back to our better one. So how do we turn a red circle into a full-blown animation? Because right now I can generate one shape, which, guess what, you could call single shape generation.
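For the curious, calling an assistant from code looks roughly like this with the OpenAI Node SDK. The assistant ID is a placeholder and the exact prompt is just an example, but the flow (thread, message, run, read the reply) is the standard Assistants API pattern:

```typescript
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Ask the shape-generation assistant for a shape preset.
// "asst_..." is a placeholder for the real FlashFX Shape Gen assistant ID.
async function generateShapeJson(prompt: string): Promise<string> {
  const thread = await openai.beta.threads.create();
  await openai.beta.threads.messages.create(thread.id, {
    role: "user",
    content: prompt,
  });
  const run = await openai.beta.threads.runs.createAndPoll(thread.id, {
    assistant_id: "asst_...",
  });
  if (run.status !== "completed") throw new Error(`Run ended as ${run.status}`);

  // Messages come back newest-first, so the first entry is the assistant's reply.
  const messages = await openai.beta.threads.messages.list(thread.id);
  const reply = messages.data[0].content[0];
  return reply.type === "text" ? reply.text.value : "";
}

// e.g. generateShapeJson("a red circle").then(console.log);
```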

Here comes step 2: the agent.

The key difference between an assistant and an agent is that the assistant does a single thing, while the agent can coordinate multiple assistants together. I tried many agent builders, but none of them worked and I couldn't connect them to my app, so I had to code my own agent inside the app. Here is how it works.

The user writes a prompt describing their animation. A first assistant breaks it down into "more prompts", and each prompt gets a number from 1 to 999,999 (if you need more prompts than that, something has gone very wrong). Then the app splits all the prompts from a single file into multiple files, which get fed one by one to another assistant that generates the shape prompts. If you're not following anything, don't worry, I'll do an example. Then all the shape prompts get separated into different files, once again by the app, and one by one it asks the shape generator assistant for all the shapes, which the app places onto the canvas. Nice, now we have a good pile of nothing, because we have all the necessary components, but the animation isn't assembled yet. It's like a LEGO set: we have all the pieces, but they are all over the place. So here comes step 3, composing the design. This is by far the hardest part, so follow along with maximum focus (there's a rough sketch of the pipeline just below).
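Here is that pipeline sketched as code. The assistant IDs and the `runAssistant` helper (which would wrap the Assistants API call shown earlier) are placeholders, and the real app is not necessarily structured this way:

```typescript
// Hypothetical orchestration of the "agent" pipeline described above.
// runAssistant(assistantId, prompt) is assumed to wrap the Assistants API call from earlier.
declare function runAssistant(assistantId: string, prompt: string): Promise<string>;

const PROMPT_SPLITTER = "asst_splitter"; // breaks the user prompt into numbered sub-prompts
const SHAPE_PROMPTER  = "asst_prompter"; // turns each sub-prompt into shape prompts
const SHAPE_GENERATOR = "asst_shapes";   // emits the shape JSON presets

async function buildShapes(userPrompt: string): Promise<string[]> {
  // Step 1: one user prompt becomes many sub-prompts (the app splits them into files).
  const subPrompts = (await runAssistant(PROMPT_SPLITTER, userPrompt))
    .split("\n")
    .filter(Boolean);

  const shapeJsons: string[] = [];
  for (const subPrompt of subPrompts) {
    // Step 2: each sub-prompt becomes one or more shape prompts.
    const shapePrompts = (await runAssistant(SHAPE_PROMPTER, subPrompt))
      .split("\n")
      .filter(Boolean);

    // Step 3: each shape prompt becomes a shape preset the app drops onto the canvas.
    for (const shapePrompt of shapePrompts) {
      shapeJsons.push(await runAssistant(SHAPE_GENERATOR, shapePrompt));
    }
  }
  return shapeJsons;
}
```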

The objective is to place the objects at the correct coordinates to make a design, and all we have is a prompt and a bunch of shapes. So another assistant takes all this information and starts thinking: given the dimensions of the shapes, where do I need to place them, on a 3840 by 2160 canvas, to create this animation? It calculates the center of the composition and starts building around it: if I place this at x = 1000, and it has a height of 200, I have 800 left. Based on the requirements, the user asked for a steps animation similar to this image; I notice this white line, so let's do that one first.

After all the calculations, it checks and refines the layout, adjusting the positions of the shapes, and finally it gives the coordinates for each shape, and the app places them in the correct position.
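As a toy illustration of the kind of arithmetic involved (my simplification, not the assistant's actual reasoning), centering a shape on the 3840 by 2160 canvas and stacking another one below it looks like this:

```typescript
// Toy layout helpers: centre a shape on the canvas, then stack further shapes below it.
// A simplification of the placement step, not the real composition logic.
const CANVAS = { width: 3840, height: 2160 };

function centreOnCanvas(size: { width: number; height: number }) {
  return {
    x: (CANVAS.width - size.width) / 2,   // e.g. (3840 - 400) / 2 = 1720
    y: (CANVAS.height - size.height) / 2, // e.g. (2160 - 200) / 2 = 980
  };
}

function stackBelow(previousY: number, previousHeight: number, gap = 40) {
  return previousY + previousHeight + gap; // y coordinate for the next shape
}
```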

So, the result: an AI that can actually generate motion graphics. But for now we have only created an image; it's static, it has no keyframes. So now we are entering the first step of a whole new level: we are going from creating to animating.

To understand this, I need to introduce the concept of "context", or "input context": simply put, it's what the AI keeps in mind about the animation while building it, because if it does everything in isolation, we end up with a mess. Imagine that you and 500 other workers are building a house: if you place your brick and don't give any context (or plan) to the other builders, you will end up with a messy home. Same thing here.

Context gets measured in tokens, and the original ChatGPT model has a context window of about 4,096 tokens, with the output capped inside that same window. That's why you can't just keep sending it infinite messages.

Now I'll show you the code of a simple keyframe animation, because yes, even keyframes and clips are JSON files in the end. Wow, it seems to never end. Now imagine giving 20 shapes (that's about the average for an animation) to ChatGPT and asking it to output that many animations; we can't use our everyday tool, we need something way, WAY more powerful, and I chose Anthropic's Claude Sonnet 4. Why? Because it has over two hundred THOUSAND context tokens and an output limit far beyond ChatGPT's, way more than what I need. But here comes a problem: the AI needs to know every possible feature in the editor, and sure, I could just put them all in the input context, right? Well, tokens have a cost, and while text tokens are very, very cheap, code generation is way more expensive. If I give it all the features and how to use them, I burn way too much money, about $1.31 per prompt, which means that by making one video I can easily burn 42 dollars. Don't ask me how I know.
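Since the actual file isn't reproduced here, this is just a guess at what a keyframe clip might look like as JSON inside the editor; the field names are mine, not FlashFX's:

```typescript
// Hypothetical keyframe clip, illustrating the "keyframes are JSON too" idea.
// Field names are illustrative, not the real FlashFX format.
interface Keyframe {
  time: number;                 // milliseconds from the start of the clip
  property: "position.x" | "position.y" | "opacity" | "scale";
  value: number;
  easing: "linear" | "ease-in" | "ease-out";
}

const slideLeft: Keyframe[] = [
  { time: 0,   property: "position.x", value: 1920, easing: "ease-out" },
  { time: 500, property: "position.x", value: 1200, easing: "ease-out" },
  { time: 500, property: "opacity",    value: 1,    easing: "linear"   },
];
```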

So I had to find a better solution, and it's always him, our good old ChatGPT; I'm going to marry him, he has saved me so many times.

So I created a training file with all the specifications of the software, and this thing turned out so thorough that I think I will use it for the documentation too.

So when the design is generated, the last thing the app does is ask ChatGPT: hey, what tools should we use to animate this? And it gives back the list of tools we might need.
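A minimal sketch of that step, reusing the same hypothetical `runAssistant` helper; the assistant ID, tool names, and prompt wording are all placeholders of mine:

```typescript
// Hypothetical "tool selection" step: ask a cheap model which editor features
// the animation will need, so only those parts of the training file are sent later.
declare function runAssistant(assistantId: string, prompt: string): Promise<string>;

const TOOL_PICKER = "asst_tool_picker"; // placeholder assistant ID
const AVAILABLE_TOOLS = [
  "position keyframes", "opacity keyframes", "scale keyframes", "masks", "path animation",
];

async function pickTools(animationBrief: string): Promise<string[]> {
  const reply = await runAssistant(
    TOOL_PICKER,
    `Animation brief: ${animationBrief}\n` +
    `Available tools: ${AVAILABLE_TOOLS.join(", ")}\n` +
    `Reply with a comma-separated list of the tools needed.`
  );
  return reply.split(",").map((t) => t.trim()).filter(Boolean);
}
```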

At this point the process is basically the same, except that it's totally different.

So, to animate, Claude takes as context all the code of the shapes plus the instructions for the animation. From there, shape by shape, it writes keyframes based on the instructions, and it keeps the new keyframes in context as it goes. What's good about this is that the AI first thinks through the animation before applying it: for example, if the whole animation needs to move to the left, all shapes will move to the left. This saves a lot of input tokens.
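Roughly, a single "keyframe this shape" request to Claude could be assembled like this with the Anthropic TypeScript SDK; the system prompt, prompt structure, and how the app actually batches shapes are my assumptions, not the real implementation:

```typescript
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// Ask Claude to keyframe one shape, given the whole design as context.
// Prompt structure is illustrative; the real app may batch or structure this differently.
async function keyframeShape(
  allShapeJson: string,          // every shape in the design, so Claude sees the full picture
  animationInstructions: string, // the animation brief plus the tools picked earlier
  targetShapeId: string
): Promise<string> {
  const response = await anthropic.messages.create({
    model: "claude-sonnet-4-20250514",
    max_tokens: 4096,
    system: "You keyframe shapes for the FlashFX editor. Reply with keyframe JSON only.",
    messages: [
      {
        role: "user",
        content:
          `Design (all shapes):\n${allShapeJson}\n\n` +
          `Animation instructions:\n${animationInstructions}\n\n` +
          `Write the keyframes for shape "${targetShapeId}".`,
      },
    ],
  });
  const block = response.content[0];
  return block.type === "text" ? block.text : "";
}
```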

Now that we've seen what the AI is and how it works, will this be free? Well, the design mode will be free forever. This means you can create designs with AI, and you get 50 cents' worth of tokens each month, which is roughly 40 animations like this one, enough for about 5 videos a month. I had to put a limit just to avoid abuse.

The animating part will have around another 50 cents of free credits, which is enough for about 8 animations. I know, it's not much, but I have to pay for it on my side; the only source of income for this software is actually social media, so if you like the video, you'll help me a lot.

So when will the software release? In about 4 months: by the end of 2025 the software will be done, at least in its MVP phase. If you have ideas for the software, just comment down below; if you want some sort of feature or button that will save you hours of editing, just comment it. See you next time, bolts.
