I Can’t Believe Your Eyes


Text was never trustworthy, and images, although they were mostly believable for a while, can be trivially altered with software that long predates AI (e.g., Photoshop). ChatGPT and DALL-E add a scale factor to that untrustworthiness, but we have long been wary of the media ecosystem they belong to (and allegedly threaten). Videos, however, are (were?) a source of truth.

I promised my analysis of Sora would be divided into two parts: one, published last week, about Sora the AI model (what it is and what it can do), and a second about the broader second-order cultural and social implications of the technology.

As I dove deeper into this second part, I realized it should be further divided into two. Both deal with how video-generation AI (not just Sora but in general) affects our culture, but I approach them from opposite perspectives: the part you’re reading now is concerned with what we’ll lose as AI-generated videos become indistinguishable from human-made ones. The next, which I’ve yet to write, is about what we’ll gain.

Although this distinction appears to obey a negative-positive dichotomy, that’s not my intention. We can gain something we don’t want or need, which doesn’t make it better than losing something we held dear, or at least something we took for granted, whose loss we’ll have to grudgingly deal with.

Alternatively, out of fairness to a position I can’t support at this time but that I acknowledge some people hold, I concede that adaptation to seemingly unneeded changes is at least as powerful as resistance to unwanted ones. Worlds are built out of this paradox, and most people don’t complain that much after the fact (or do you know anyone who, like Socrates, condemns writing as a forgetfulness-enabling tool?). I’ll try my best to reflect this truth in my articles as well.

We’re at the critical time window when, like a newborn kitten, we must open our eyes or go blind forever. It’s the brief period between the moment we see the storm appear on the horizon and the moment it hits us with all its power and wrath. That’s us, right in the middle.

The blow will hit us, true, but thanks to OpenAI’s heads-up it won’t take us off guard this time. We have a precedent: ChatGPT (or GPT-4) was the natural continuation of GPT-3. The scaling laws that predicted GPT-4 from GPT-3 seem to apply to text-to-video as they do to language models: more compute, better data, and more parameters will eventually lead to performance breakthroughs.
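To make the “scaling laws” idea concrete: these are empirical power-law fits relating a model’s loss to compute, data, and parameters. Here is a toy Python sketch of the compute side; the functional form follows Kaplan-style power laws, but the exponent and constant are illustrative assumptions, not fitted values from any real model family.

```python
# Toy power-law scaling curve: loss falls as a power of training compute.
# Alpha and c are made-up, illustrative constants (assumptions, not data).

def predicted_loss(compute: float, alpha: float = 0.05, c: float = 2.6) -> float:
    """L(C) = c * C^(-alpha): predicted loss shrinks as compute grows."""
    return c * compute ** (-alpha)

# Each 10x jump in compute buys a roughly constant multiplicative improvement,
# which is why a successor model can be budgeted from its predecessor's curve.
for exponent in range(20, 26):
    compute = 10.0 ** exponent
    print(f"compute = 1e{exponent}: predicted loss = {predicted_loss(compute):.3f}")
```

If a curve like this holds for text-to-video as it did for language models, predicting a much stronger Sora 2 from Sora becomes a budgeting exercise rather than a surprise.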

However, Sora isn’t the first AI tool that can alter, modify, or create fake videos. Deepfakes have been improving since they first appeared in 2017, conquering audio, images, and video. The recent Taylor Swift fake-porn incident, and the $25 million missing from the bank account of a Hong Kong multinational, suggest Sora will be, at most, a new way of doing a similar kind of harm.

But, as OpenAI generously advises us, we should look forward, to the incoming storm. Sora is okay, but Sora 2 will be much more powerful: a kind of improvement we wouldn’t get from linearly improving existing tech. Sora (the idea, the breakthrough) is not the same as common deepfake tech. It’s not the same kind of threat to video as a source of truth.

What appears to be a “small,” quantitative technical step (e.g., going from editing real videos to make a deepfake to generating deepfake videos with Sora) can entail a drastic qualitative jump once its effects are translated to the socio-cultural landscape.

I know the consequences will be drastic because it’s not the first time this has happened. As I said, pictures were mostly believable for a while (once a trusted source of information, comparable to video today) but we lost that.

Photography wasn’t reliable at first. At the time it was invented, we could rely only on reason and our senses to access the truths of the world. Cameras were originally thought of as an artist’s tool, not a device for capturing reality. 19th-century photographers didn’t hesitate to change a detail or remove an object here and there, as long as it favored their artistic desires (or other, more obscure motives); if it helped them escape the “tyranny of the lens,” as Henry Peach Robinson called it.

Only a century later did photography become a trusted medium. But even then, in the naive pre-Photoshop era, most people granted pictures a high, albeit at times undeserved, epistemic value. Photoshop and other editing software weren’t the first inventions to challenge the camera’s role as a source of truth, but the scale of the threat they posed was remarkable. Suddenly, anyone could twist reality at will, eroding a means of grounding truth that we had taken for granted.

What will happen when we start to use text-to-video AI to create educational videos that have subtle but critical mistakes? What will happen when the deepfakes malicious actors create aren’t constrained to existing videos but unbounded in style, setting, and character, capable of generating seemingly real “counterfeit people”?

As the historian Daniel Immerwahr writes in The New Yorker: “It’s possible to take comfort from the long history of photographic manipulation, in an ‘It was ever thus’ way. Today’s alarm pullers, however, insist that things are about to get worse. With A.I., a twenty-first-century Hoxha would not stop at awkwardly scrubbing individuals from the records; he could order up a documented reality à la carte.”

Immerwahr goes on to say that perhaps we’re underestimating humans’ ability to avoid being deceived. I agree. Deepfakes, even high-quality ones or from-scratch ones like Sora’s, are not reality-bending in the way most people believe. Perhaps the “epistemic apocalypse” we’re terrified of is no such thing.

And why would it be? We evolved without a “trust first, check later” kind of natural mechanism. For most of our history (and I mean hundreds of thousands of years), knowing with such certainty what we could or couldn’t believe just wasn’t a thing. We had to “check first and trust later,” as Kevin Kelly writes in his recent essay “The Trust Flip.” Photography and video cameras were a fleeting, albeit welcome, detour from an eternal state of epistemic uncertainty.

In Kelly’s words: “The arrival of generative AI has flipped the polarity of truthfulness back to what it was in old times. Now when we see a photograph we assume it is fake, unless proven otherwise. When we see video, we assume it has been altered, generated, special effected, unless claimed otherwise. The new default for all images, including photographic ones, is that they are fiction, unless they expressly claim to be real.”

To this, I say that fear might not be warranted, but something else is: asking why. Why would I want to go back to epistemically brittle pre-photography times just because someone found it worthwhile to make a fake-video-generating tool? Something should be provided in exchange to merit such an unnecessary concession.

Kelly doesn’t pass judgment, but that’s the part that matters, right? It’s deeply undesirable to sacrifice the common good of having a shared ground truth for… the promise that the future will be better. Because that requires trust. And in an ironic twist of fate, my long-evolved mechanism for detecting who deserves trust tells me not to trust OpenAI.

But I said I’d be fair, so here’s the natural response to my argument (not really evidence-based, but a history-doesn’t-repeat-but-it-rhymes kind of prediction): we adapt. We always do. And once we do, we realize the world is better off.

I won’t challenge this notion because I agree with it. Technology happens and, a few generations later, it’s not only taken for granted but often seen as an irreplaceable reality. How many times have you thought, “I couldn’t live without [insert literally anything]”? Well, that thing, whatever it is, is the product of technological progress.

This is true, and so is the other side of the coin: technology is always a trade-off with the customs of the times it disrupts. While it’s happening, it’s hard to see the benefits of a technology. After it’s happened, what becomes hard is, instead, remembering the part of life that was better before.

As Kelly says, we came from chaos; we evolved to thrive in a “check first and trust later” kind of world. We’ll do fine there this time, too. But losing that calmness and adapting to chaos will be a slow, painful process. The trust flip was never without a cost.

Having gotten out of that wild, misty, uncertain, and unforgiving world for a while, only to be forced to go back once again, is just not something I had on my “This is the future I want” bingo card.
