Generative Style

The Farm Security photographs revealed enduring stories of American life. As a dataset, what can they tell us about artificial intelligence?

WINTER 2024 Noam M. Elcott, Tim Trombley

Generative Style Noam M. Elcott, Tim Trombley WINTER 2024

Generative Style

The Farm Security photographs revealed enduring stories of American life. As a dataset, what can they tell us about artificial intelligence?

Noam M. Elcott

Tim Trombley

RUGGED CLOTHES, SKIN WRINKLED BEFORE ITS time, eyes draped in steeply raked shadows—the unnamed farmer on the previous page aggregates all the features of a vintage Dorothea Lange photograph. The off-kilter image, cream-colored mat, dark-brown card stock crowned with faint black type convey an origin in the files of the Library of Congress Farm Security Administration (FSA) collection. This description is true, intuitively and factually. But it is not the whole truth. The image is entirely the product of generative AL Its appearance is vintage Lange; its essence is data visualization.

The striking features of this image have nothing to do with the prompt we wrote to generate it. As with all the images in the first batch generated for this article, our prompt was simply “a photo of a farmer by Dorothea Lange.” But it is the underlying model, not the prompt, that produces iterations; the row of three farmers below show the outputs of the same prompt interpreted through three different models.

The image on the far right was produced with a commercial web interface and represents the most generic and impoverished products of generative AI today (an impressive feat all the same). It is a blandly attractive composition with no direct relation to Lange and would be more at home on a mass-market wine label from Sonoma than in the files of the FSA. The center image was produced with Stable Diffusion—specifically Stable Diffusion XL or SDXL—one of the state-of-the-art base models for AI image generation. Although this image demonstrates a basic conception of the world of Lange, it is a superficial resemblance emblematic of the blending of her work within the larger and unfocused context of the model. These two images belong in the files of stock photography and have little purchase—for good or ill—on the qualities we might associate with the work of a specific photographer. Rather than generate “a photo of a farmer by Dorothea Lange,” the base models generate something closer to a prompt for “a generic photo of a farmer during the Great Depression.”

As specialists in photography, media, and technology, it was clear to us that AI could do better. So we fine-tuned our own models to generate images that captured many of the peculiarities of the Lange photographs in the FSA archive. Like aZZthe images in this article—unless otherwise specified—the “Lange photograph” discussed in the opening paragraph is neither by Lange, nor is it a photograph. It is the product of our customized models. And we can produce infinite variations from them.

Today’s deep neural networks mark a rupture in the history of photography.

Consider, finally, the image on the far left on this spread. It parades the visible edge of the film and mat, overexposure at the bottom, strong sunlight and deep shadow, frayed edges of a hat, and a flat dusty landscape, among numerous other features found in Lange’s actual FSA photographs. We did not name any of these features in our prompt; instead, they were embedded in our model and integrated into this image. The consistent results can be seen in the images on page 62, four of which were generated by our model, two of which are Lange originals. Study them. Can you distinguish Lange from “Lange”? (Answers at the end of the article.)

Even if the advancement of AI technologies slows or stalls, today’s deep neural networks mark a rupture in the history of photography. And insofar as photography claims an ontological link to the world, AI fundamentally alters our relation not only to images but also to the visual world mediated by images. Stated bluntly, AI cannot generate photographs. So-called photographs, like all other images produced through generative AI, are data visualizations of highly abstracted surface patterns learned by deep neural networks.

Take the band of pictures on the next spread that depict a woman in the forest. The one at the far left appears like an early twentieth-century portrait photograph. At the far right, we recognize a computer-generated image of the same subject. But both are data visualizations. Indeed, they were produced with the exact same prompt: “a photo of a young woman seated in a forest by August Sander.” The only difference is that the image on the far right was produced using the default configuration of Stable Diffusion, while the image at the far left was produced with one of our fine-tuned models. The intermediate pictures, moving left to right, mark the waning influence of our custom model in favor of the more generic style rendered by Stable Diffusion. It is essential to understand that there are no fundamental distinctions between any of these images— they are all data visualizations. At no point does the image (if we pretend it is a single image) become more or less “photographic” in any substantive sense of the word.

Just as Walker Evans understood long ago that there is no such thing as documentary photography, only “documentary style,” today we must confront the fact that in AI imaging there is no photography, only “photographic style.” Or rather, styles. As virtuosos of surface pattern, generative models can digitally mimic, more or less successfully (depending on their underlying data and training), the appearances of nearly every photographic technique, from daguerreotypes and gelatin-silver prints through Polaroids and smartphone snapshots. From the perspective of a generative AI model, the choice between “daguerreotype” and “gelatin silver” is no different from a broader choice between “photograph” and “anime drawing.” AI “photography” is a suite of styles.

An Ai-generated “photograph” is closer to a bar graph produced in PowerPoint than it is to a photograph produced by a digital SLR camera. This fact may be difficult to accept when an AI “photograph” and a digital photograph appear indistinguishable. But it is a fact nonetheless. Other facts are more nebulous, and thus more troubling. In the new AI regime, a smartphone snapshot posted to a major social-media site lies uncomfortably between traditional photography and algorithmic images synthesized from data without any optical, chemical, or camera-based input. Consider the recent controversy over Instagram’s haphazardly applied “Made with AI” label. The dysfunction of this well-intentioned label masks the fact that nearly every photograph taken with a smartphone and uploaded to social-media sites such as Instagram undergoes heavy algorithmic intervention from the moment it is snapped through the moment it is shared.

We must confront the fact that in AI imaging there is no photography, only “photographic style.”

Finally, the notion that “prompting”—the process of providing natural-language written instructions to an AI model to elicit a desired output—has replaced the camera as the locus of photographic creativity is bunk. No doubt, prompting is already proving to be a significant commercial utility and a delightful amateur practice. But placed in its proper historical and technological context, prompting is the Kodak Brownie of AI imaging, and “You write a prompt, we do the rest” means the “we” are corporate hegemons and “the rest” is generic kitsch. One need not succumb to the pseudomorphological oxymoron that is “AI photography” or indulge in corporate creative fantasies (or nightmares) tied to prompting.

Although the imaging models released by OpenAI, Google, Stability AI, and other AI behemoths can readily generate images that are indistinguishable from photographs, those images are rarely mistaken for the output of individual photographers. This disability is unlikely to be remedied soon, for two independent reasons. First, copyright concerns have led most major AI companies to limit rather than deepen the capacity to generate pictures that follow the styles or images of specific artists. Second, the creators of generative image models must weigh technological efficiency against visual specificity. As a rule, individuality is stripped to facilitate the most economically viable and blandly proficient output. Models can mimic the look of different media (animation, watercolor, photography, et cetera), different photographic media (from daguerreotypes through digital SLR), and even different cameras. The amount of data and training required to plausibly render particular photographic styles appears to be greater than the currently available resources.

A technological solution is to “fine-tune” a base model to perform specialized tasks, such as the generation of images in the style of a known photographer. We opted for LoRA (LowRank Adaptation). We trained several discrete LoRAs on public -domain images scraped from the Library of Congress, specifically its FSA collection. Between 1935 and 1944, the FSA and related offices commissioned extensive documentation of American life by many of the now canonical documentary photographers: Walker Evans, Dorothea Lange, Arthur Rothstein, Gordon Parks, and others. We trained our two FSA LoRAs on roughly a thousand images by Lange and more than fourteen hundred by Parks. Notably, Parks was the only African American photographer employed as part of the FSA project, and his photographs include his most famous picture: the image of Ella Watson known as American Gothic (1942).

Although no more “authentic” or “photographic” than any other Ai-generated outputs, the images produced through a well-trained LoRA exhibit the features and qualities of the original photographs to an unprecedented degree. Stated more simply, they are (nearly) indistinguishable from the images produced by the source photographers. We call this “latent specificity.”

The specificity generated by our models is latent in at least two respects. On a technical level, our Lange LoRA, for example, does not add more Lange images to the Stable Diffusion training data but, instead, intervenes in the navigation of its latent space, the hidden layers of a neural network, which contain no images, only data about the images’ abstract patterns and relations. Second, the specificity that we generate is latent in the archives and must be learned by our models. It is not a feature that we find but rather a feature we produce through the selection, labeling, and processing of Lange’s images.

Latent specificity not only enables the creation of highly engaging images, it sheds light on peculiarities—and thus photographic practices—not evident in the few iconic photographs with which we are all familiar, such as Parks’s American Gothic. The color image of a construction worker on the previous spread is the output of Stable Diffusion in response to the prompt “a photo of a worker at a construction site by Gordon Parks.” What initially appears like a compelling image can be shown to be a generic iteration of his style, which is rendered with far more specificity in the black-and-white image to its right, generated by our Parks LoRA.

The full series of images from this prompt (above) comprises iterations ranging from no LoRA (far left) to full-strength LoRA (far right), with io percent stepwise intensifications in between (the middle image shows the LoRA at 50 percent strength). Comparing the Stable Diffusion and LoRA images reveals several dimensions of Parks’s distinctive style during his time with the FSA. In place of the perfect exposure and Life magazine hues of Stable Diffusion’s generalized rendering, we witness the emergence of texture mimicking grainy black-andwhite film, imperfect exposure, and, most intriguingly, a compositional shift as the LoRA reaches full application. Our Parks LoRA regularly produced subjects at a heroic elevation—a composition evident throughout the photographer’s FSA images used for training but unaddressed in his nearly contemporaneous reflections on photographic portraiture, Camera Portraits: The Techniques and Principles of Documentary Portraits (1948). When plotted along this image axis, the generated construction worker rises as we assume the position of “Parks’s” camera, moving from parallel with the worker to, eventually, monumentalizing him against the skyline from below.

In the absence of race and gender specifications, most models, Stable Diffusion included, tend to default to white, male subjects (or stereotypically attractive women). Bias remains a major unsolved problem for generative AI, except for those companies that have abandoned the struggle altogether. Our LoRA models demonstrate that the problem is solvable, at least with small-scale datasets. As the author and scholar Kate Crawford has argued, “improvements in AI will require putting a lot more care and thought into how data is collected and curated.” FSA photographs represent one such carefully and thoughtfully curated dataset. For all its inadequacies, the FSA aimed “to introduce America to Americans,” an ambition that included systematic efforts to photograph a diverse and inclusive range of its inhabitants. These efforts are especially pronounced in the photographs taken by Parks and Lange. The six Lange/“Lange” images on page 62 include two photographs by Lange and four generated with our Lange LoRA while using our purposefully vague prompt for “a photo of a farmer by Dorothea Lange.” The results include interesting, historically consistent representations of women and African Americans to a degree rarely achieved through standard image models. Compare these with the images on this page generated by Stable Diffusion from the prompt “a person sitting in a chair,” which overwhelmingly produced images of white men. Corporate AI defaults typically yield white men and cartoonishly dollish women in generic bland images. If photography and AI are to have a compelling future, we need to chart alternate courses.

The fine-tuning of base models on select photographic archives marks one alternate course. LoRAs can anchor AI image generation in specific archives and shift at least some of Al’s new powers away from the formulaic approach of giant AI corporations and toward the nuances of individual photographers. Nonetheless, a ghost haunts our Ai-generated LoRA images. The ghost of photography. We can mimic past photographers with ever greater precision; yet each generated image is a monument to what will be lost in the age of generative AI imaging. Generative AI is generic by design. Our hope is that fine-tuning models can inject specificity and visual complexity into systems designed to be blandly attractive. But the successful pursuit of specificity runs the risk of further marginalizing actual photographic practices and practitioners. The capacity to generate “photographic” images with newfound ease may prove to be the coup de grace for the medium: Why invest the often significant time and effort required of good photographs when a good-enough “photograph” is just a prompt away? Generative AI imaging is at once a risk and an opportunity.

On page 62, the center images are original photographs by Dorothea Lange. The other images were generated with our LoRA using the prompt “a photo of a farmer by Dorothea Lange.”

Noam M. Elcott is a professor of modern art history at Columbia University. His next book is Photography, identity, Status: August Sander's People of the Twentieth Century.

Tim Trombley is the senior educational technologist at the Media Center for Art History at Columbia University.