AI is coming for your jobs, creatives! Or… it will be, eventually. Not this year. But coming soon.
The last two years have seen surprising and indeed almost shocking advances in AI creative work. Image generators like DALL-E, Midjourney, and Stable Diffusion, which use diffusion models to work from text prompts, are capable of some amazingly high-quality work at times, though admittedly their understanding(1) of fingers still leaves something to be desired. The even more astonishing ChatGPT can sometimes not only hold conversations that pass the Turing Test, but also create working computer programs, compose poetry, and write coherent short stories, apparently from scratch.
Some may reasonably argue the work of such AI systems isn’t creative at all, but is a mere regurgitation or combination of human efforts compiled from their training sets. On the other hand, much of human art is based on pastiche and emulation, not to mention plagiarism, and both writers and artists have to spend years of training learning the techniques passed on by their instructors.
The State of the Art
Caption: Protagonist Bunny Häschen from my novel in progress. Generated by Stable Diffusion 2.1 using the text prompt “intersex bunny detective”. A generally rather appealing caricature, but note the odd left hand, the characteristic nonsense characters in the sign, the content oddity of them holding a sign at all, and the mysterious flap rising out of their cravat.
Source: Stable Diffusion
Recent developments are truly astonishing given previous struggles to get AI systems to create art, prose, and other works. It’s now possible to type a few words and get some art back in a minute that might have taken a human hours or even days to produce from scratch. The quality of the AI-generated art is quite variable; from one request to another you might get back commercially usable material or you might get worthless trash. But even considering that you may have to make quite a number of requests of the system to obtain a usable image, the results are still much faster and cheaper than using a human artist for the equivalent level of work.
It’s arguable that if created by a human, these AI-generated images would sometimes be direct plagiarism. Even when not clearly a direct copy of an existing work, it’s often the case that the AI system builders failed to obtain artist permission to incorporate their work into the system’s training data. The legal consequences of this failure will work themselves out in the courts, but for now there’s nothing stopping people from making use of these tools.
At present, complete high-quality AI art is generally unobtainable no matter how much time is spent playing with prompts. Still, commercial-grade AI art is often quite acceptable, and by using the AI’s art as a basis, a skilled digital artist can create a high-quality composite of human and AI art much more quickly than by working from scratch. This, for example, is what Tor did with Christopher Paolini’s recent book cover.
In the world of prose, with suitable guidance ChatGPT can generate a coherent and consistent story, but not one of very high quality. Without substantial revision, the current quality of such text is typically mediocre, but at present it’s not inconceivable for some exceptional AI-generated story to make it past a slush reader’s quality threshold and at least be held for review. That is quite an achievement, considering the high level of competition contributors face at many magazines. This is already motivating many AI-generated submissions, though it’s unclear if they are meant for prestige and profit for their pseudonymous submitters or to count coup. Clarkesworld, for example, has reported a recent spike in AI submissions despite their guidelines to the contrary.
The Future of the Art
Because these recent developments in AI art generation have seemed to come almost from nowhere, it’s hard to say how rapidly they will improve to match or even exceed human capabilities. We should keep in mind that autonomous cars were expected to be widely available by now on the basis of work done in the 2010s, and yet after a promising beginning, only slow progress has been made and such vehicles are still unsafe on real-world roads.
On the other hand, given the eagerness of many extremely well-funded tech groups to work in this area, including not just OpenAI but Microsoft and Google among others, it’s reasonable to expect further progress up to a point. What’s the limit? For current system architectures, I expect that limit is based on these systems’ lack of explicit real-world knowledge. Just as ChatGPT is capable of absolutely authoritative but totally incorrect and easily refutable statements, these systems will be hamstrung by the inability to reason until such capabilities are combined with their generative models.
Let’s establish three bars, which I’ll refer to as stages 1 through 3:
- Acceptable low-grade commercial art. The kind of graphic art that creative freelancers currently make to order for ad campaigns and other commercial applications without much funding, but which will still pay their rent. Equivalent prose would be acceptable for publication in some token and semipro magazines and for routine copywriting assignments.
- Acceptable high-grade commercial art. Commercial art produced for highly funded campaigns by major agencies and design firms, or prose and content that can be published in the better magazines and journals.
- Superior-to-human fine art. Masterpieces that would displace human efforts from galleries and museums and bookstore racks and win awards if AI and humans competed on a level playing field.
We’re currently just entering stage 1 for certain applications. This means that soon some semipro freelancers may experience serious competition from AI systems, and after a while some agencies and design firms may reduce their staff because their routine low-end work can be done by machine. There will still be plenty of human work involved in revising, incorporating, or compositing AI work into larger and higher quality artistic achievements, but the amount of human touch in the overall process will diminish.
At stage 2, entire industries will be turned upside down and the effects will shake up whole economies. For example, design firms may no longer require creatives, or else the few creatives they retain will largely be employed managing requests to AI systems. There will be no more need for human touch in photoshopping or otherwise compositing AI art with human-created elements as in stage 1; the AI will do all the work on demand. (Note that while some very high-quality AI art has been created already, it typically requires considerable manual effort from a human to achieve such a level.)
The good news is that my average-case guess for stage 2 is that it is at least ten years away, and it may well be generations before this level is achieved. The bad news is that I could be wrong and it could be next year. I was shocked by ChatGPT’s capabilities in 2022 after years of crappy chat bots that wouldn’t fool Turing for a minute, and so I may easily be wrong again. Still, I do think that without explicit real-world knowledge and reasoning abilities these tools will be unable to really excel for quite some time. At present no one has any idea how to give a generative AI system that kind of understanding.
At stage 3, humans start wondering why they should even bother creating art, especially if such superior work can be routinely requested of AI systems by anyone without incurring much cost.
Similar considerations apply for stage 3, which to my mind requires Artificial General Intelligence to achieve. Such a system is not absolutely out of the question. We just don’t know how to build one today, nor do we even have a good idea of a direction to follow to achieve such a goal.
Still, even stage 1 is problematic enough for human creatives. I’d guess full stage 1 will be achieved within five years, and whatever the courts decide about existing systems and their theft of artist work without permission, one way or another AI-generated art will become ubiquitous for inexpensive applications.
(1) Indeed, the reason that such systems don’t know how to draw fingers properly is that they don’t even understand the concept of fingers. These systems don’t have old-fashioned knowledge bases that contain explicit facts or relations. The appearance of a hand is emergent from mysterious mathematical features derived automatically from the digitized training set. Discrete counts of things that characteristically appear in fixed numbers in nature but are often hidden from view in individual images can be problematic for such systems. Human images very frequently show two clearly identifiable arms and legs, and so AI images get these right more often than not (not always, however!), but hands in photographs less frequently show all five fingers clearly, and so generated images often don’t do so either.
BIO: Laurence Raphael Brothers is a writer and a technologist with five patents and a background in AI and Internet R&D. He has published over 50 short stories in such magazines as Nature, PodCastle, and Galaxy’s Edge. His noir urban fantasy novellas The Demons of Wall Street, The Demons of the Square Mile, and The Demons of Chiyoda are available from Mirror World Publishing, while his new standalone novel The World’s Shattered Shell has just been published by Water Dragon.