
Photo: Excavating AI.

Image Database Purges 600K Photos After Trevor Paglen Project Reveals Biases

ImageNet, a database of more than fourteen million images and one of the most widely used sources for training machine learning systems to recognize people and objects, has announced it will remove more than 600,000 photos of people from its system. The news comes five days after artist Trevor Paglen and Kate Crawford, distinguished professor at New York University and cofounder of the AI Now Institute, unveiled ImageNet Roulette, a viral project illustrating the biases and fallibility in how the program identifies people.

Currently on view in Paglen and Crawford’s “Training Humans” exhibition, which opened at the Fondazione Prada museum in Milan last week, ImageNet Roulette allows users to upload selfies to see how AI might classify them. The human categories have been adapted from ImageNet’s actual labels, which range from “enchantress,” “flutist,” “preterm baby,” “microeconomist,” “skier,” and “mediator” to more damning labels, including racist and misogynistic slurs, such as “slut,” “rapist,” “Negroid,” and “criminal.”

Established in 2009, ImageNet was created by Stanford and Princeton University researchers, who pulled the millions of photos in the database from the internet. The researchers then enlisted fifty thousand low-paid workers through Amazon’s crowdsourcing labor platform, Amazon Mechanical Turk, to apply labels to the images. The laborers’ biases were ultimately embedded into the project, and as Paglen and Crawford’s application reveals, the prejudices of that labor pool are reflected in the AI technologies drawing from the data. Since AI is used not only by tech giants and academic labs, but also by state and federal governments and law enforcement agencies, flaws in its data sets can have wide-ranging impact.

“As we have shown, ImageNet contains a number of problematic, offensive, and bizarre categories,” Paglen and Crawford write in their accompanying research paper. “The results ImageNet Roulette returns often draw upon those categories. That is by design: we want to shed light on what happens when technical systems are trained using problematic data. AI classifications of people are rarely made visible to the people being classified. ImageNet Roulette provides a glimpse into that process—and to show how things can go wrong.”

Days after the digital art project went viral, ImageNet released a statement saying that it will remove 438 people categories and 600,040 associated images that it has labeled as unsafe, though it did not cite ImageNet Roulette as the reason. “As AI technology advances from research lab curiosities into people’s daily lives, ensuring that AI systems produce appropriate and fair results has become an important scientific question,” the statement reads. The database is also planning to update its website so that future users may report offensive terminology.

In response, Paglen and Crawford have announced they will take ImageNet Roulette offline on September 27, as it has now “made its point—it has inspired a long-overdue public conversation about the politics of training data, and we hope it acts as a call to action for the AI community to contend with the potential harms of classifying people.”