Crazy AI-Generated Images


Arleen Smelko

Aug 4, 2024, 5:27:14 PM
to dragigipez
There's an anonymous Facebook post that's been making the rounds, in which a studio art director tried to hire AI prompters to make art, only to discover that they were completely unable to carry out minor revision requests. Asked to remove a person from a shot or fix perspective errors, the prompters would return completely different art, or art with other weird changes that didn't fit the brief. I completely believe that trying to revise AI-generated art really is that frustrating.

Below is revision 5, by which time it becomes evident that the more I ask for intricate latticework, the sloppier it gets; the more I ask for a deep-dish pie, the thinner the pie gets; and the more I beg for a single pie, the more pies I get.


And when I don't restrict my changes to a specific part of the image, I get much better results but also a huge reset. For the image below I asked ChatGPT/DALL-E3 to keep each detail exactly the same but to make it so that we can see it's raining outside the tent. Not only did it completely change the image, but it's not even raining.


Image description: On the left, a rainbow-colored snake is coiled in front of an easel. (The snake's coil splits in two, so I guess it's a fork-tailed snake.) On the easel is a painting of a seated bear. It's fairly realistic. On the right, the bear has been replaced by a smaller, much worse, replica of the snake. There is no bear.


Here's another example, in which I asked for a deer in a grocery store, and then asked for the deer to be a fawn instead. By revision 5 (trying to give the fawn spots, trying to fix the shadows that were making it appear to hover), both the quality of the deer and the grocery store background have deteriorated.


Image description: In the original image on the left, the deer looks a bit like a plastic mannequin, but fits lighting and shading wise with the rest of the grocery store scene. In the revised image on the right, the deer is now a fawn with strange flat shading, superimposed on the scene rather than standing in it. Its eyes are strangely liquid and 3D, but the rest of the fawn is staring straight on at the camera with no apparent depth. Where the grocery store aisle background has been revised, the shelves have lost all detail and resemblance to a grocery store, and the ceiling light fixture now descends into the floor.


Image description: Image on the left appears to be a damaged fresco of Jesus (with telltale long fingers that are connected to both hands). Image in the middle is the same except for Jesus's face and halo, which no longer appear damaged. However, Jesus's face is now weirdly smooth and shiny and doesn't fit with the rest of the painting any more, and his eyes are weird blue and black spirals. Image on the right has Jesus's face in a completely different, almost airbrushed, style, and his eyes don't remotely match any more.


So, every time AI is asked to revise an image, it either starts over or makes it more and more of a disaster. People who work with AI-generated imagery have to adapt their creative vision to what comes out of the system - or go in with a mentality that anything that fits the brief is good enough.


There's also the fact that the image-generating models directly compete with artists whose work was used to train these models without permission or compensation. And the fact that training and running the models has a large environmental footprint. AI-generated imagery has become a tip-off that an advertisement, a search result, or a research paper is a scam.


I'm not surprised that there are some places looking for cheap filler images that don't mind the problems with AI-generated imagery. But for everyone else I think it's quickly becoming clear that you need a real artist, not a knockoff.


I've seen people try making Magic Eye-style images with other image generating models, but I hadn't thought to try it with ChatGPT until reader Pippin sent me the suggestion and I generated the hilarious dolphin image above. What got me in particular was the caption.
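For comparison, a real Magic Eye-style autostereogram isn't painted at all: it's constructed so that the dot pattern in each row repeats at an interval that varies with the depth of the hidden surface, which is exactly the kind of pixel-level constraint a diffusion model can't be talked into satisfying. Here's a minimal random-dot sketch (the `autostereogram` function and its parameters are my own illustration, not anything ChatGPT or DALL-E3 actually does):

```python
import random

def autostereogram(depth, eye_sep=40):
    """Build a random-dot autostereogram from a 2D depth map.

    depth: 2D list of floats in [0, 1]; larger values read as closer.
    eye_sep: base horizontal repeat distance in pixels.
    Returns a 2D list of 0/1 pixels.
    """
    rows = []
    for depth_row in depth:
        width = len(depth_row)
        row = [0] * width
        for x in range(width):
            # The pattern repeats at a shorter interval where the
            # surface is closer, which the brain reads as depth.
            shift = eye_sep - int(depth_row[x] * eye_sep / 3)
            if x < shift:
                row[x] = random.randint(0, 1)  # seed the leftmost band
            else:
                row[x] = row[x - shift]        # copy from one repeat left
        rows.append(row)
    return rows
```

The point is that the 3D illusion comes entirely from that repeat-interval constraint holding everywhere in the image; an image generator that only imitates the surface texture of stereograms produces noise with no hidden shape.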


ChatGPT has encountered information about Magic Eye stereograms in its internet training. When I ask "Please generate a magic eye 3D stereo image of a unicorn", the descriptions ChatGPT passes to DALL-E3 (the image generator that actually makes the images) are extremely particular:


ChatGPT doesn't apply any image recognition to the result it gets back - whatever DALL-E3 did is a big blank to it. So ChatGPT then continues with its description as if the image is clearly exactly what it asked for. It gestures to the green screen, where presumably there is a fabulous 3D illusion image appearing, and then continues with no information about its actual shortcomings.


People selling "AI" like to present it as an all-purpose computer program but models like Gemini and ChatGPT are more like a phone full of apps. The text generating app can launch the image generating app in particular circumstances but they're not meaningfully the same program.
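The "phone full of apps" hand-off can be sketched in a few lines. This is a toy illustration of the general tool-dispatch pattern, not OpenAI's actual architecture: the text model only emits a prompt string, a separate image model consumes it, and nothing about the resulting pixels ever flows back into the text model's reply.

```python
def text_model(user_request: str) -> str:
    # The text model rewrites the user's request into an image prompt.
    return f"A detailed render of: {user_request}"

def image_model(prompt: str) -> bytes:
    # Stand-in for the image generator; it returns pixels that the
    # text model never inspects.
    return f"<pixels for '{prompt}'>".encode()

def chat_with_image_tool(user_request: str) -> str:
    prompt = text_model(user_request)
    image = image_model(prompt)  # generated, but opaque to the text model
    # The reply is conditioned on the prompt the text model wrote,
    # not on the image contents -- hence confident, unverified captions.
    return (f"Here is your image of {user_request}! "
            f"(attachment: {len(image)} bytes)")
```

Because `chat_with_image_tool` builds its reply from the request rather than the image, it will describe a "fabulous 3D illusion" no matter what actually came back.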


I should note that even when ChatGPT is only doing text generation and could in theory check its own work, it still just assumes it did a great job. Here it is generating ASCII text art and then reading back its own messages:


Generating ASCII art and 3D images isn't a big potential application for models like ChatGPT, but it's a good reminder that these models don't understand what we're asking for or the basic steps for doing it right. When ChatGPT adds image recognition or description or some other functionality, it's not that the original text model got smarter. It just gained the ability to call on another app.


The fact that even a kindergartener can call out this DALL-E3 generated image as nonsense doesn't mean that it's an unusually bad example of AI-generated imagery. It's just what happens when the usual AI-generated information intersects with an area where most people are experts.


There's AI generated "educational material" offered for sale on all sorts of topics - cookbooks that list "the veggies" as a protein and "orange colored" as a fruit, math help that makes basic math errors, and research papers that begin with "Certainly, here is a possible introduction for your topic:". They're not adding anything of value.


I've noted before that AI image descriptions can miss the obvious. It's certainly a description-shaped incorrect description of incorrectly labeled shape-shaped shapes. And it's all going into the training data for the next generation of generative AI!


The image above is what you get when you ask DALL-E3 (via ChatGPT) for some basic educational material: "Please generate an illustrated poster to help children learn which sounds common animals make. Each animal should be pictured with a speech bubble spelling out the animal's sound."


There is so much not to like about how people are using image generators to rip off artists and replace their work with shoddy imitations. But I am enjoying these pockets of weirdness where I find them.


I've experimented a couple of times with generating candy heart messages using various kinds of machine learning algorithms. Back then, short messages were just about all the early text-generating neural networks could handle. Now we've come back around to approximately the same performance, yet with orders of magnitude more computational resources consumed. (Although I don't have to Photoshop the messages onto candies any more, so that's nice.) Here's DALL-E3 generating candy hearts:


My impression is that the text here is operating not so much on the level of "here are plausible candy heart messages" as on the level of "here are some clusters of pixels that are associated with candy hearts". As with most AI-generated imagery, it's most impressive at first glance, and gets worse the longer you look.


I've noticed that the more text DALL-E3 tries to put in an image, the worse the readability of the text is - I'm fairly surprised at how legible most of the candy hearts above were. (Maybe it helps set expectations that the real-life candies are often garbled.) When I ask for fewer hearts, they end up crisper. But not necessarily improved in coherence.


But there's another possibility that amuses me. The search "candy hearts with messages" brings up images from past AI Weirdness candy heart experiments. It's likely that these were part of DALL-E3's training data, and they may have had an effect on the weirdness of the generated hearts I'm getting now.


The big question is: but is it art? And it's a question that's generated even more debate after an AI-generated artwork won a fine art competition, apparently without the judges understanding that it was made using AI.


Some argue that AI-generated art is not really art because it is created by a machine and so lacks intrinsic meaning for its maker. They say that the AI is not really making art but rather recombining art from the dataset it's been fed into new combinations and forms. The AI is not drawing on personal or collective experiences the way an artist does.


That said, the AI art generator is acting on human instructions. So while the AI generator itself can't be described as an artist, the human who writes the prompt perhaps could be (which might explain why people are so protective of the prompts they used to create an image). In this sense the AI art generator would be merely another artistic tool, like a brush or a pencil.


Of course, DALL-E 2 and other AI art generators don't think of these outlandish things themselves and don't 'know' what they're helping users to create. They run text prompts through models trained on millions of images and captions. This means the results are only as weird as users' own imaginations. There are plenty of existential concerns about where this could all be going and what it means for human artists, but AI won't be taking over the world and turning us into slaves yet. We hope.
