My thoughts on the short and long-term
Generated by DALL•E 2
There’s been a lot of Twitter discourse around AI art in recent weeks and months. I’ve mostly refrained from weighing in, partially due to an incredibly packed work and travel schedule, but also because it’s a discussion that requires a level of nuance and length that Twitter is ill-suited for. After considerable research and many discussions with other professionals, I’m going to try to collect my thoughts on the subject in this post.
I draw pictures and hate numbers, but I nonetheless think it’s important to have an understanding of how something works before forming an opinion on it. I’m going to attempt a simplified description of how modern AI algorithms generate images through a process called diffusion. I want to preface this by saying I’m obviously not a computer scientist; this is just my takeaway from my own research.
Imagine we create a sequence of 50 pictures, starting with a simple line drawing of a triangle, and progressively add noise to each image until we reach a final image that is only noise.
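For anyone who would rather see that as code, here is a minimal sketch of the 50-picture sequence. It is an illustration under stated assumptions, not how any real image generator is written: the 8x8 canvas, the function names, and the straight-line blend toward one fixed noise sample are all simplifications made up for this example. Real systems typically add fresh noise at every step.

```python
import random

SIZE = 8     # hypothetical tiny canvas, 8x8 pixels
STEPS = 50   # the 50-picture sequence described above

def triangle_image(size=SIZE):
    """Return a size x size grid holding a filled right triangle (1.0 = ink, 0.0 = blank)."""
    return [[1.0 if col <= row else 0.0 for col in range(size)] for row in range(size)]

def noising_sequence(image, steps=STEPS, seed=0):
    """Blend the drawing toward Gaussian noise, a little more with each frame."""
    rng = random.Random(seed)
    noise = [[rng.gauss(0.0, 1.0) for _ in row] for row in image]
    frames = []
    for t in range(steps):
        mix = t / (steps - 1)  # 0.0 = clean drawing, 1.0 = only noise
        frames.append([[(1.0 - mix) * p + mix * n for p, n in zip(prow, nrow)]
                       for prow, nrow in zip(image, noise)])
    return frames

frames = noising_sequence(triangle_image())
print(len(frames), "frames, from a clean triangle to pure noise")
```

The first frame is the untouched triangle and the last frame contains nothing but noise; every frame in between is slightly noisier than the one before it.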
Next, we tell a computer to look at each image we’ve generated and come up with the equation needed to produce the next in our sequence. Since the only difference between a given image and the image after it is the small amount of noise we added, the computer is actually just learning how to generate the noise needed to produce the next picture, not how to generate the entire image itself.
This process is called training, which is a fancy way of saying you provide the computer with some input data and it keeps tweaking its metaphorical knobs until it can produce something that best approximates that data. The important thing to note here is that the computer does not actually keep the images we provide as input; it only creates an equation that can be used to roughly regenerate the noise in the images it was provided.
Now we’re going to spend thousands of hours doing this process for every image of a triangle we can come up with. We’re talking right triangles. Isosceles triangles. Gay triangles. Gay isosceles triangles. Any kind of image containing a triangle we can find, it’s going in our training set. And with each image set we run through our program, the computer gradually refines its process for producing the noise needed to transform the current image into the next image in its sequence.
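The "tweaking knobs" idea can be sketched in a few lines. This is a toy under loud assumptions: the "model" here has only two knobs (a weight and a bias) and looks at one pixel at a time, whereas real image generators use neural networks with millions or billions of knobs. All names and numbers below are hypothetical. What it does show is the shape of training: guess the noise, measure how wrong the guess was, and nudge the knobs to be a little less wrong.

```python
import random

def make_example(rng, noise_level=0.5):
    """One training pair: a noisy pixel and the noise that was added to it."""
    clean = rng.choice([0.0, 1.0])  # a pixel is either blank or ink
    noise = rng.gauss(0.0, 1.0)
    return clean + noise_level * noise, noise

def average_error(weight, bias, examples):
    """Mean squared gap between the guessed noise and the real noise."""
    return sum((weight * x + bias - n) ** 2 for x, n in examples) / len(examples)

def train(examples, rounds=200, learning_rate=0.05):
    """Nudge two knobs so that weight * noisy_pixel + bias tracks the added noise."""
    weight, bias = 0.0, 0.0
    for _ in range(rounds):
        grad_w = grad_b = 0.0
        for noisy, noise in examples:
            gap = weight * noisy + bias - noise
            grad_w += 2 * gap * noisy / len(examples)
            grad_b += 2 * gap / len(examples)
        # Move each knob a small step in the direction that reduces the error.
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weight, bias

rng = random.Random(0)
examples = [make_example(rng) for _ in range(500)]
weight, bias = train(examples)
print("error before training:", average_error(0.0, 0.0, examples))
print("error after training: ", average_error(weight, bias, examples))
```

Note that what survives training is the pair of knob settings, not the list of examples; the error after training should come out lower than the error before it.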
Congratulations, we’ve just built the world’s worst program. 🌞 We can pass it an image of a triangle and get an image of pure noise in return.
But if we can train our computer to take an image of a triangle and turn it into noise, is it possible to take an image of pure noise and turn it into a triangle? The short answer is yes: using the power of math I don’t understand, we can reverse the process. Now, instead of adding noise with each step, we’re asking the computer to remove it gradually. It doesn’t “know” what a triangle is, but the noise removal process it taught itself always produces a triangular shape as a result. If we provide it with a large selection of training images, we can cause it to denoise into almost any pattern we want, including art-shaped ones…
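The reverse direction can be sketched the same way: start from pure noise and repeatedly subtract a bit of whatever the model guesses is noise. One big assumption to flag: `predict_noise` below is a hand-written stand-in for a trained model, and it cheats by comparing against a fixed pattern. A real network holds what it learned in its weights rather than in a stored picture, and real samplers also rescale the image and re-inject a little noise at each step. The constants and names are hypothetical.

```python
import random

SIZE = 8
STEPS = 50

# Stand-in for a trained model (an assumption for illustration only).
LEARNED_PATTERN = [[1.0 if col <= row else 0.0 for col in range(SIZE)]
                   for row in range(SIZE)]

def predict_noise(image):
    """Toy predictor: treat whatever differs from the triangular pattern as noise."""
    return [[p - target for p, target in zip(prow, trow)]
            for prow, trow in zip(image, LEARNED_PATTERN)]

def denoise(steps=STEPS, strength=0.2, seed=0):
    """Start from pure noise and remove a fraction of the guessed noise each step."""
    rng = random.Random(seed)
    image = [[rng.gauss(0.0, 1.0) for _ in range(SIZE)] for _ in range(SIZE)]
    for _ in range(steps):
        guess = predict_noise(image)
        image = [[p - strength * g for p, g in zip(prow, grow)]
                 for prow, grow in zip(image, guess)]
    return image

result = denoise()
for row in result:
    print("".join("#" if p > 0.5 else "." for p in row))
```

Running it prints a small text picture in which `#` marks ink; whichever random noise it starts from, the loop settles on the same triangular shape.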
Now that I’ve thoroughly frustrated any AI-focused computer scientist who may be reading this, I want to underscore that modern, diffusion-based AI is doing more than just collaging bits of pre-existing images; it’s developing a process whereby it can produce new images based on the characteristics it’s identified as belonging to the giant set of categorized images it’s been trained on.
A more in-depth thread that details some intricacies about diffusion that I’ve excluded for brevity.
With a basic understanding of how it works out of the way, let’s discuss the consequences. In my view, these fall into two buckets: the philosophical ramifications and the economic ones. The philosophical questions—are the resulting images even art? is the person who prompted the generation considered an artist? and so on—have already been discussed at great length by other professionals, so this line of questioning feels unnecessary for me to spend much time wading into personally. Instead, I’d like to consider the second set of concerns: the economic ones.
Adult behavior can often be understood through one question: how will this affect my livelihood? And when your profession is so associated with precarious employment that the word most commonly prefixed to it is “starving”, anything that introduces additional uncertainty is going to be a sore subject.
The pending automation of labor has been a topic of discussion for as long as I can remember. Those discussions often center around its inevitability, but more notably ascribe an order with which it will progress: it will be the “unskilled” or “low skill” jobs that are the first casualties. Truck drivers, garbage collectors, warehouse workers—of course, it was only a matter of time before self-driving vehicles and automated forklifts did away with their necessity, right? “What do they expect?” one professional might say to another, before suggesting these workers move to an urban area and retrain themselves with a modern, irreplaceable skillset.
This imagined order in which automation unfolds reveals the fable many in the modern professional class have internalized regarding the worker-employer relationship. The concept of “skilled labor” has grown from economic jargon into a powerful cultural narrative, in which we’re incentivized to believe in a hierarchy of workers that share no common concern and, more importantly, no common fate. This is, of course, false. In my opinion, this is where mixing philosophical critiques with economic ones becomes problematic, because it continues a tradition of artists viewing our labor as a fundamentally unique pursuit, distinct from other forms. Almost everyone would agree that a delivery is a delivery, whether it’s a postal worker or a drone that drops the package on your doorstep. But art? Commercial artists tell ourselves that at the end of the day, our employers can never replace us, because you cannot, by definition, create art without an artist.
While this has historically been true (and I believe will largely remain so, even in the era of AI generated imagery), the following is the only truth that ultimately matters: under the economic system of capitalism, an employer’s job is to generate the greatest amount of profit using the fewest employees paid at the lowest acceptable rate.
That doesn’t mean the individual people that employ you are bad humans (we’re all functioning under the same system, after all); it just means it’s important to remember that it’s not their job to ensure you retain yours. If there’s a machine they can use to achieve the same outcome at a lower cost, it’s in their interest to do so. To a balance sheet, you are not an artist, you are an expense, and under capitalism your labor does not create art, it produces a product.
Given this, the question is not “will AI replace human artists?”, rather it’s “will our employers?”
The atomization caused by the competition required to succeed in creative fields, on social media, and in art school has broken our brains. While it’s understandable to fixate on how this may hurt our individual career opportunities, we’re currently in the narrow window of time where AI art and the culture around it are still malleable enough that we can actually contribute to shaping it. We cannot afford to simply take a principled stance against it and then refuse to interact with it or ignore it. As long as we remain under an economic system which prioritizes profit, all workers will remain in competition with technology looking to make them redundant; that is fundamental to our system’s design. The struggle is not against a specific technology, nor is it specific to our industry. The struggle is a generalized one against the way companies wield and profit from our labor and the anti-democratic nature of our current economic model.
My hope is that this will encourage some of us to begin exploring the creation of small art collectives, co-ops, and employee-owned studios, so that we can collaborate with one another to bring our ideas to life—not as self-contained “brands” who are in competition with each other, but as teams that can complement each other’s unique skill sets. We can advocate for policies and regulations that stop exploitation in creative fields, but nobody is immune to where this is ultimately heading and we have to see ourselves as part of a larger struggle all workers face for economic agency. We must participate in advocacy and action to support generalized rights for workers and a democratic economy.
Even if automation is inevitable, neo-feudalism is not.
Thoughts on the Near-Term
I think the immediate effects on commercial artists won’t be quite as dire as some have predicted. Yes, these programs can generate some impressive images right now. Yes, it’s getting better every day. But there’s a huge cloud of legal ambiguity over AI generated imagery at the moment and that will remain the case for quite some time. Most of these legal questions are not going to be preemptively answered through legislation, especially here in the U.S. It’s going to be a long process of litigation and appeals, likely producing no broadly applicable ruleset, only loose guidelines based on legal precedent. There will certainly be projects that use AI imagery in lieu of hiring a human artist, but these are likely projects that would’ve opted for stock imagery or attempted to hire a gig worker for offensively low wages. I don’t think it will be the companies primarily responsible for employing many of us, as their decisions will not be made by executives, but instead be mandated by their legal departments.
That’s not to say that the job landscape won’t change. What I believe is more likely to happen in the next few years is AI tools integrating into existing workflows to aid in rapid conceptualization of mood and tone, before being refined by in-house artists. However, the practically limitless and immediate nature of diffusion-based AI means that whatever time is saved conceptualizing is traded for time spent on curation and refinement, as choice fatigue becomes a very real problem.
It may surprise some to know that many tools and features that have become mainstays, such as Photoshop’s Content-Aware Fill, are either driven by or have been improved using the same sort of machine learning algorithms that have created diffusion-based image generators.
Dynamically relighting a static image. An example of the way AI assisted tooling can be used to the benefit of visual artists.
There will be new positions created within the industry that center around knowing how to skillfully interact with the AI to produce usable results. Even if a manager or director can technically prompt the AI themselves, that doesn’t mean they will. Many can attest to the boss who insists on walking over to your workstation and explaining things face-to-face that would’ve been better done by email.
That human factor is something a lot of professional artists have glossed over when discussing the likelihood of our obsolescence: draftsmanship, technical skill, taste and talent—all the factors that come to mind when we think about “what makes a good artist”—are only 50% of what most creative jobs actually require. The other 50% is purely soft-skills: how much of a team player you are, how well you take direction, how effective your problem-solving skills are, whether you can make positive contributions during stressful periods, and frankly, whether or not you can form bonds with your colleagues/collaborators by being trustworthy and likable. All extremely *human* factors, all much more important than “how good of an artist” you are. (In my opinion, anyway.)
The reality is that the fallout of this technology will not be equally distributed. How automated image generation affects you is going to vary a lot based on your age, location, specialization and professional relationships. Although we speak of artists as a collective, the same stratification present in the broader economy is present in our own creative industries. What this technology means for students versus entry-level artists versus established professionals is very different, further compounded by whether you’re working an in-house job or as a freelancer.
Established professionals are the least at risk of immediate turmoil, in my opinion. These are artists that entered the industry in a different era, even if it was only 3-5 years ago, and whose professional relationships have been built and maintained through an extended period of work. When you’ve reached this level in your career, your influence is enough that there are likely already multiple artists working in a style similar to or inspired by yours, so if a client’s goal is simply to have the aesthetic rather than the artist at the lowest possible price, there are already options.
Professionals at this level, and I include myself in this category, really need to inspect the way we engage with aspiring and emerging artists. Condemning those who experiment with new technology or demanding they refuse employment/define rigid contractual language comes across as very out-of-touch. It also places the blame on the wrong people and frankly gives me flashbacks to the professors and professionals who told my graduating class all of the things to never do and the jobs to never take, ignoring the incredible amount of leverage and influence it takes to make any demands of a client.
That’s not to say seasoned pros aren’t making valid points—they’re often coming from a place of genuine concern, informed by years of exploitation themselves. But when you’re at this point in your career, you’re no longer actively looking for an “on-ramp” and the landscape has shifted so fast that we’re bound to give advice that is outdated, if not insensitive.
For aspiring artists and students: don’t be afraid to experiment with new technology, even if your seniors approach it with skepticism or derision. There are clearly tons of ethical questions raised by the rapid emergence of AI generated imagery, but that shouldn’t prevent you from finding interesting ways to work new technology into your workflows.
Thoughts on the Long-Term
In terms of what I think this means for the future of creative storytelling, visual arts, media, entertainment, etc.: we need to recognize that AI is, at this stage, still fundamentally a tool. And more than a tool, it’s an incredible advancement in technology that will make the act of creating visual media more accessible to everyone, which I think will ultimately prove to be a good thing in the long term.
It’s worth noting that art has historically been a profession that only the wealthy could pursue. Setting aside the upfront financial costs of supplies/materials and access to dedicated space, the time needed to train and develop the prerequisite skills practically necessitated access to wealthy patrons or family. The introduction of commercial art produced for mass consumption made it possible for a wider range of people to work in the arts (myself included), but it also meant the artists in these fields were not working in service of personal vision, but executing other people’s ideas for increasingly low budgets.
And as media becomes more complex and its audience more sophisticated, production of interdisciplinary media like blockbuster films and AAA video games requires teams of hundreds of people working exploitative hours for years at a time. This requires a financial investment only a handful of conglomerates can manage and is reliant on a global audience to generate a profit, which translates to incredible risk aversion by decision makers. Even on smaller productions, character design and aesthetic decisions are made around what’s merchandisable, as recouping expenses and generating a profit always takes priority over creative vision.
AI generated/assisted imagery has the potential to fundamentally change this dynamic. For as much uncertainty as individual illustrators face, media companies will reckon with a new world freed from their creative monopoly, where a small group of artists can make incredibly complex media that would’ve previously been impossible without their resources. Consider a few creators being able to make a completely self-funded feature-length animated film, a co-operative indie studio developing an enormous AAA open world game, or a lone teenager creating cinematic short films without leaving their bedroom. Large, profit-focused production houses will be making media in a world where equally elaborate projects can be developed and distributed independently at minimal cost. While I don’t expect this to result in their demise, it may fundamentally alter the balance of power.
There will always be low-effort media. This predates AI images (think “Pop Art” filters imitating Warhol), and will continue to exist after. But just because a technology can be used to produce low/no-effort images doesn’t mean it can’t be used to create legitimately compelling art when put into the right hands. The antagonism of some in the AI art community is saddening and frustrating, but these types are just extensions of those that have been tracing images, removing watermarks, and reposting uncredited artwork for years. We cannot let the attitudes of users be conflated with the value of the tools; we should not hold curious creators responsible for the ethical failings of the companies involved; and we must not buy into the idea of a caste system of labor, where artists exist as an insulated class of worker detached from the larger whole.
I’m proud of the artists who have put effort into directly engaging with the companies and teams developing this tech and with the lawyers and policymakers who may help design guardrails. But we should be careful that our guardrails do not become gates, and remind ourselves that there exists the potential for a radically different future, where artists can create elaborate works independent of the profit-focused systems we must work within today.