Lise Yellen's Portfolio

    Generating AI Video from a Photo Reference

    My two Gen Z daughters tell me they will never approve of AI art.

    In their view, art should be made by people, not computers. I agree. But some kinds of art are made by people using computers.

    The way I see it, a computer is an innovation, like the printing press, the camera, and the ballpoint pen. It’s just one of the ways to make art. You don’t have to like it. It didn’t replace the fine arts: painting, drawing, ballet… but it did become a tool that helps you get your work done a lot faster if you work in advertising.

    Along the way, there has always been an outraged and existentially threatened establishment objecting to the crass and unworthy innovations putting them out of a job. With a camera you can make a portrait in one click of the shutter. What is that next to a master painter? Many of the early professional photographers were women, partly because photography wasn’t considered “real” art.

    Now we are in the era of Generative AI, and like it or not, it’s a huge driver of our economy and the biggest change in how we (designers, creative people) work since the personal computer. And people are worried! Does this mean the art is no longer made by us? Have we crossed a threshold that removes human effort from the making process now that AI puts art making in the hands of anybody?

    I think that is the heart of the problem our culture is struggling with. To some, the very existence of generative AI is a five-alarm existential threat. My brother, a musician and songwriter with a loyal 30-year following, loses fans by just suggesting he MIGHT use AI video in his music videos. It’s extremely political now. AI servers are polluting the environment and bringing an end to gainful employment for all beings. It’s ending democracy and capitalism at the same time. My daughters truly believe it is evil. People are doxed if their Substack posts display a trace of AI intervention. Scary!

    With that kind of potential blowback, and between the doxers and the stern scorn of my teenage daughters, I’m wary of what could happen if I jump fully into this. I don’t want to contribute to the downfall of civilization, obviously! But, with apologies to the end of gainful employment of all knowledge workers, I’m genuinely curious.

    As a designer and digital artist, what can generative AI do? It’s fascinating. I’ve been dabbling for the past five years, trying out tools here and there. And, I work at a creative agency that touts itself proudly as the first “AI agency”, and since I don’t have a Substack, or any fans to speak of, I’m giving myself permission to go for it. While keeping it on the DL from my daughters for now.

    Yes, there are a lot of crappy AI ads all over the socialsphere, but there is also some good stuff here and there. I’m obsessed with the work of AI video artist Kelly Boesch. I’ve been following her on YouTube and other platforms for a few months, and her hypnotic AI creations, which blend surreal imagery with true vibes, are unmistakable. She has an ownable style, and her creations take AI to the level of surreal art, inspired by Salvador Dalí. She achieves what I think is the truly interesting promise of generative AI: making art that is beautiful, not trying to be real, not trying to fool anyone. It’s what it is. Generative AI. Unapologetically.

    I found an interview where she described her process: with finely-honed prompts she generates imagery with Midjourney, then brings it to life as a video clip and edits it with her original AI-generated songs.

    So, just to see if I can, I’m going to try to make my own AI video, an homage to Kelly Boesch. To paraphrase Song Sung Blue, “I’m not a Kelly Boesch impersonator, I’m a Kelly Boesch interpreter.”

    For my first clip, I’m going to see which of the three platforms I regularly use can generate a video based on my specific vision.

    I’ll start with this image I created in Midjourney as an avatar for my Suno channel, Ione:

    Image generated by Midjourney.

    The meditating woman is still, but the painting reflects the inner breadth of her expanding consciousness and the ability of art to create rich worlds. My goal is to create a clip that I can use in a longer video, one that shows this exact image, where the woman holds still and the clouds animate in the painting in front of her.

    A simple first exercise.

    First, let’s let Midjourney try. It has a handy button right there that says “Animate,” so easy peasy? I added:

    The clouds in the painting are churning as though blown by the wind

    I got FOUR videos, in about 30 seconds. This one was the most interesting but for the wrong reason:

    Nightmare Fodder. Bodies don’t do that—aahhhhh! #No.

    The color, the values, the aesthetic are all great, but… maybe it was my prompt? Why would I expect a computer to know what “Churning” means? Yikes.

    I’ve had good results with ChatGPT generating very specific images based on references. So I uploaded the original image and the following prompt:

    Animate this image. The girl in lotus pose is still but the painting in front of her is animating. The clouds are billowing as if blown by a strong wind in the painted environment. The girl's hair softly flutters.
    This is an animated GIF. It’s easy to miss at first, but it’s moving. Wobble.

    Wow, very disappointing results from my favorite robot therapist.

    I thought I was more specific with my prompt, but I guess “Billowing” is also a pretty esoteric word for a computer.

    So now I’m turning to Veo3, Google’s video tool. I did some research to see what I really wanted from those clouds and whether I could articulate it better. I found a time-lapse video of clouds doing what I had in mind, so I used the term “time-lapse” in my next prompt in the Gemini AI studio.

    The woman is still as she meditates in front of a painting of a large cloud that is moving as though a time-lapse video of clouds billowing in the wind.

    Veo3 returned four videos, two of which, I think, nailed it:

    The woman is still as she meditates in front of a painting of a large cloud that is moving as though a time-lapse video of clouds billowing in the wind.

    Wow, not only were my results much more accurate, it also gave me multiple takes with subtle differences. But has it lost some of the Midjourney aesthetic that I love? Maybe. So I went back to Midjourney for a second try. Erasing the first prompt, I entered simply:

    “The woman is still as she meditates in front of a painting of a large cloud that is moving as though a time-lapse video of clouds billowing in the wind.”
    Now the clouds are moving but stiffly, but… she is not still or meditating. The tornado is coming for her and her twitchy movements are a wee bit creepy.

    I love how Midjourney kept the painted, animated quality of the clouds rather than making them photorealistic. Now they feel like an animatic, or anime motion, holding the integrity of the painted world, whereas Veo3 made them much more like real clouds until the end, when they resolved into the painting. Midjourney’s clouds are stiffer but somehow keep their artistic integrity.

    But the woman’s reaction is right out of a tornado movie, and she is definitely NOT meditating. No acting!

    So, this was an interesting experiment!

    My takeaway is that prompt writing really is a tricky art, as overly simplistic as that sounds. The nuance of using generative AI is more about learning a vocabulary for describing motion and imagery (something that may not come easily to artists) and knowing which tools to tap for what. Both require a lot of research and trial and error.

    The skills are different from those of a painter or videographer, but in the end, making something that can be considered art, skillful, professional, or just not “AI slop” falls comfortably into the domain of the art director: a person who basically makes creative decisions without necessarily having the narrow skill verticals of a painter, photographer, or videographer. It is an art form for the generalist with an idea, rather than the specialist with a skill. But it should never replace the work of the actual artist or videographer.