Something very strange and disturbing happened to me this week. If it was just relevant to me, it wouldn’t be that important (except perhaps to me), and I wouldn’t be writing this column about it. But it’s something that is likely more important and more ominous than we can even imagine.
There are already common fraudulent schemes being perpetrated by both telephone and internet. One known as the “Grandparent Scam” is particularly reprehensible, first because it is perpetrated on elderly people who are, in general, more susceptible to tech-savvy criminals and second because it is based on the manipulation of familial love, trust, and compassion. The criminal running the Grandparent Scam calls or emails the victim, pretending to represent a grandchild who is now in trouble with the law or who needs money for a hospital bill for an injury that can’t be discussed, say, with parents, because of the moral trouble that might ensue. They generally call late at night (say, at four in the morning) because that adds to the confusion. The preferred mechanism of money movement is wire transfer, and that’s a warning: don’t transfer money by wire without knowing for certain who is receiving it, because once it’s gone, it’s not coming back.
Now what if it was possible to conduct such a scam using the actual voice of the hypothetical victim? Worse, what if it was possible to do so with voice and video image, indistinguishable from the real thing? If we’re not at that point now (and we probably are), we will be within months.
In April of this year, a company called Coding Elite exposed an artificial intelligence (AI) program to a substantial sample of my voice, which is easily accessible on the YouTube lectures and podcasts that I have posted over the last few years. In consequence, they were able to duplicate my manner of speaking with exceptional precision, starting out by producing versions of me rapping Eminem songs such as Lose Yourself (which has now garnered 250,000 views) and Rap God (which has only garnered 17,000) as well as Rock Lobster (1400 views). They have done something similar with Bernie Sanders (singing Dancing Queen), Donald Trump (Sweet Dreams) and Ben Shapiro, who also delivered Rap God. The company has a model, the address of which you can find on their YouTube channel, which allows the user to make Trump, Obama, Clinton or Sanders say anything whatsoever.
I happen to think Rap God is an amazing piece of work, and when I first encountered my verbal avatar belting out the lyrics I thought that it was cool, in a teenage tech-geek sort of way. And I suppose it was. This caused quite a stir on the net in April, with media companies such as Forbes and Motherboard (a division of Vice) noting that the machine learning technology only required six hours of original audio (that is, actually generated by me) to produce its credible fakes, matching rhythm, stress, sound and prose intonation.
This week, however, a company called notjordanpeterson.com put an AI engine online that allows anyone to type anything and have it reproduced in my voice. It’s hard to get access to or use the site at the moment, presumably because it is currently attracting more traffic than its servers can handle. A variety of sites that pass themselves off as news portals (and sometimes are) have either reported this story straight (Sputnik News) or had a field day (Gizmodo) having me read, for example, the SCUM Manifesto (hypothetically an acronym for Society for Cutting Up Men), a radical feminist rant by Valerie Solanas published in 1967. Solanas, by the way, later shot the artist Andy Warhol, an act driven by her developing paranoia. He was seriously wounded, requiring a surgical corset to hold his organs in place for the rest of his life. TNW takes a middle path, reporting the facts of the situation with little bias but using the system to have me voice very vulgar phrases.
Some of you might know, and those of you who don’t should, that similar technology has also been developed for video. This was reported, for example, by the BBC as far back as July of 2017, when it broadcast a speech delivered by an AI Obama that was essentially indistinguishable from the real thing. Similar technology has been used, equally notoriously, to superimpose the faces of famous actresses on porn stars while they perform their various sexual exploits (you can find this story covered, for example, on The Verge, Jan 24, 2018). Movies have also been reshot so that the main actor is transformed from someone unknown to someone with real box office draw. This has happened, for example, to Nicolas Cage, primarily on a YouTube site known as Derpfakes, a play on the phrase “Deep Fakes,” which is what the video recordings created fraudulently by AI have come to be called. More recently Ctrl Shift Face, a YouTube channel, posted a video showing Bill Hader transforming very subtly into Tom Cruise as he performs an impression of the latter on Dave Letterman’s show. It’s picked up four million views in a week. It’s important to note, by the way, that this ability is available to amateurs. I don’t mean people with no tech knowledge whatsoever, obviously; I mean that the electronic machinery that makes such things possible will soon be within the reach of everyone.
It’s hard to imagine a technology with more power to disrupt. I’m already in the position (as many of you soon will be as well) where anyone can produce a believable audio and perhaps video of me saying absolutely anything they want me to say. How can that possibly be fought? More to the point: how are we going to trust anything electronically mediated in the very near future (say, during the next Presidential election)? We’re already concerned, rightly or wrongly, with “fake news,” and that’s only news that has been slanted, arguably, by the bias of the reporter or editor or news organization. What do we do when “fake news” is just as real as “real news”? What do we do when anyone can imitate anyone else, for any reason that suits them?
And what of the legality of this process? It seems to me that active and aware lawmakers would take immediate steps to make the unauthorized production of AI Deep Fakes a felony offense, at least in the case where the fake is being used to defame, damage or deceive. And it seems to me that we should perhaps throw caution to the wind, and make this an exceptionally wide-ranging law. We need to seriously consider the idea that someone’s voice is an integral part of their identity, of their reality, of their person, and that stealing that voice is a genuinely criminal act, regardless (perhaps) of intent. What’s the alternative? Are we entering a future where the only credible source of information will be direct personal contact? What’s that going to do to mass media, of all types? Why should we not assume that the noise to signal ratio will creep so high that all political and economic information disseminated broadly will be rendered completely untrustworthy?
I can tell you from personal experience, for what that’s worth, that it is far from comforting to discover an entire website devoted to allowing whoever is inspired to do so to produce audio clips imitating my voice delivering whatever content the user chooses, for serious, comic or malevolent purposes. I can’t imagine what the world will be like when we will truly be unable to distinguish the real from the unreal, or exercise any control whatsoever over what videos reveal about behaviors we never engaged in, or over audio avatars broadcasting any opinion at all about anything at all. I see no defense, and a tremendously expanded opportunity for unscrupulous troublemakers to warp our personal and collective reality in any manner they see fit.
Wake up. The sanctity of your voice, and your image, is at serious risk. It’s hard to imagine a more serious challenge to the sense of shared, reliable reality that keeps us linked together in relative peace. The Deep Fake artists need to be stopped, using whatever legal means are necessary, as soon as possible.