Disclaimer: These opinions are my own and do not reflect the views of my employer. While I did apply some UX Research methods to inform this piece, this is not a formal study.
“Do you know someone who could make an anime portrait of me?” a friend asked months ago.
I connected her to an artist who’d done a portrait for another friend of mine, whom she then paid to paint her current social media avatar. I feel a little joy every time I see it pop up on Instagram; there’s something about how social media networks can bring people together and even inspire art. The artist captured her likeness and stylized her features in a way that brought out the best in them. Until last week, this was a talent I assumed only a human could possess.
Now, Lensa AI has promised to do the same with its Magic Avatars feature, which launched in late November. Instead of finding and paying an artist (I found out too late that Lensa does not pay artists), you can download an app, upload some selfies, and pay a small fee for hundreds of portraits on demand. No human contact required. I’ve seen tons of sci-fi portraits in my social media feeds, and they’re surprisingly good.
Of course, Lensa has been criticized as an obvious biometric data grab that capitalizes on our social media-driven vanity. I get it, but I had also happily handed over $30 during the CapCut craze to superimpose my face onto a Megan Thee Stallion music video, and I had no qualms about doing the same now. To me, that ship has sailed.
So I started the free trial and fed the AI 16 selfies, which it would then use to generate 100 images in 10 different styles ranging from Cosmic to Focus to Anime.
Strangely, the app asked whether I was female, male, or “other.” This confused me; what difference would it make if it was just going to analyze my face? As a non-binary person whose presentation leans femme, I wavered on whether to choose the “female” option or the more mysterious “other” option. Assuming the former would be a more fully developed product and because I was paying $6 for said product, I decided to start with female.
(Upon further research, I discovered that male avatars get to be astronauts and female avatars get to be fairy princesses with huge boobs. It is unusual that rather than allowing users to choose their desired styles, Lensa assumes your preferences based on gender, but that’s a topic for another day.)
Before running the generator, I was warned by a “What to expect” screen:
“The type of AI we utilise for Magic Avatars may generate artefacts, inaccuracies, and defects in output images — it’s out of our control. So please acknowledge and accept that risk before continue [sic].”
I took this to mean that the AI would probably spit out a few incoherent, maybe even nightmarish images, as AIs are prone to do. Having played around in NightCafe a bit, I expected to see an extra arm here, a misshapen ear there.
What I did not expect was 100 images of different East Asian women.
Here are two images from the “Kawaii” style:
It was as if rather than analyzing my unique facial features, Lensa had slotted me into the broader category of “Asian” and pulled from random images of Chinese, Japanese, and Korean women, whose facial features were treated as interchangeable.
Lensa seemed to be using the same neural network as the elementary school kids who asked me “Are you Chinese?” (I am not). Or maybe the neural network of my predominantly white ex-coworkers who would mistake me for other East Asian femmes at the company even though we looked nothing alike.
For an app whose entire value proposition is generating images in one’s likeness, it couldn’t even generate faces that were like each other.
Unlike Kawaii and Focus, Cosmic held onto my more Southeast Asian features and darker skin tone but still gave me weird results:
And as a cherry on top, Lensa also exaggerated an emerging insecurity I hadn’t even fully admitted to myself. Can you guess what it is?
I felt like I had paid Lensa to tell me “All Asian people look the same.” And then roast me a little bit as a treat.
I’m writing about this now from a reflective place but when I initially got my photoset after 20 minutes of anticipation (generation takes a lot of computational power, the app reminded me), I couldn’t stop laughing for hours. I was in hysterics.
In fact, I had to tell my personal trainer (for safety reasons) why I couldn’t stop laughing while doing kettlebell squats. When I showed her my pictures, she almost choked.
Can you imagine waiting expectantly for your royal portrait only to have it turn out like this?
When Googling to see if others had experienced something similar, I came across this incredible YouTube video called “This AI App is Racist” by RubidiumMoon (aka Linnea), a Korean-American creator. While scrolling through her photoset, she also couldn’t stop cracking up. Among other bizarre interpretations, her face was superimposed on Rose Tico, who she noted is the only Asian character in Star Wars.
“Everyone on Twitter is showing themselves as supermodels or cool heroes, and I have absolutely nothing,” she said. “A majority of these look nothing like me. They’re just stereotypical Asian. Some of these aren’t even the right ethnicity, I’m pretty sure.”
Like me, Linnea paid another $6 fee to generate 100 more images hoping for better results, which still ended up “mostly racist.” On my second try, I took great care to include clearer, more direct headshots, as many of my followers had suggested. Somehow, they came out even worse.
To better understand my results, I put all of them into FigJam, then labeled each image with one of three categories: “Looks like me”, “Has some of my features”, and “Not even my cousin.”
Of the 200 images…
- Looks like me: 12 (6%)
- Has some of my features: 30 (15%)
- Not even my cousin: 158 (79%)
I also scored accuracy across styles. For each “Looks like me”, I awarded 2 points and for each “Has some of my features”, I awarded 1.
Ranked from most to least accurate:
- Cosmic: 13
- Anime: 9
- Fairy Princess: 8
- Light: 7
- Iridescent: 7
- Stylish: 5
- Pop: 3
- Fantasy: 1
- Kawaii: 0
- Focus: 0
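For transparency, the ranking above is just a weighted tally, which can be sketched in a few lines of Python. This is a hypothetical reconstruction: the label lists below are made up for illustration, and the assumption that each style received 20 images across my two photosets (10 per set) is mine.

```python
# Hypothetical sketch of the per-style scoring: 2 points per "Looks like me",
# 1 point per "Has some of my features", 0 per "Not even my cousin".
POINTS = {
    "Looks like me": 2,
    "Has some of my features": 1,
    "Not even my cousin": 0,
}

def style_score(labels):
    """Sum the points for one style's list of image labels."""
    return sum(POINTS[label] for label in labels)

# Illustrative, made-up label lists (20 images per style is an assumption):
kawaii = ["Not even my cousin"] * 20
cosmic = (["Looks like me"] * 5
          + ["Has some of my features"] * 3
          + ["Not even my cousin"] * 12)

print(style_score(kawaii))  # 0
print(style_score(cosmic))  # 13
```

Nothing fancy, but it makes the weighting explicit: a style full of strangers scores zero no matter how many images it produced.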
Looking for more data points, I posted to my Instagram story asking if any of my Asian friends had also received a flip book of other Asians from Lensa.
I had a mixed friend tell me that many of her images amounted to “digital yellowface.” A Vietnamese woman had already shared her results with the caption: “This one is clearly another Asian woman.” A Chinese friend said that after seeing their white partner’s eerily accurate results, they couldn’t forgive mine.
At the same time, I had another Thai friend who had mixed feelings about her results. She loved many of her images but also said some looked like “prettier” versions of herself that made her feel insecure. In the example she showed me, Lensa seemed to have put her through a skin-lightening K-pop filter (“I look like the 5th member of Blackpink lmao”).
My friend, Alyss Noland, shared a Twitter thread from a Black woman named Rizèl who only received 4 accurate images out of 100. In the thread, Rizèl also mentioned having her skin lightened and features Anglicized, which brought up the point that accuracy varied greatly from style to style, presumably because of the models used to train each one.
Alyss replied: “It seems like the cosmic and kawaii ones were built with training data containing people of Asian descent and the fairy princess ones had more Mucha/European descent, skewing the output.”
I had also been lightened in a vast majority of images, which felt like an unsurprising reminder of how fairness is frequently linked to femininity and desirability. Perhaps this was the result of the training models, but I couldn’t help but wonder if it was also giving me the beauty filter treatment. This conflation of skin color and gender seems to be echoed by Amazon’s Rekognition software, which misgenders dark-skinned women 31% of the time. Perhaps my initial choice of “female” influenced these results.
This also made me think about how I used to only share photos where I appeared lighter or edited myself lighter because I, too, had been programmed to prize this trait. How much of the training data was self-selected in this way? Or edited to be this way?
It is no secret that the accuracy of face-matching technology still varies widely based on demographics. When standard algorithmic training databases are predominantly white and male, we end up with technology that misidentifies Asian and Black people up to 100 times more often than their white male counterparts (research shows that the group with the lowest accuracy is Black, female, 18–30). More than bruising egos, inequities in AI can result in FaceID that can’t tell its Chinese customers apart and law enforcement technology that supercharges existing racial biases against Black and Latinx people to the point of being banned in several cities.
Acknowledging my confirmation bias in only courting opinions from those with similar experiences, I posted another story asking my followers — regardless of race or experience — to rate the accuracy of their Lensa results on a scale of 1–7 (7 being “most accurate”).
After 24 hours, 17 people responded with the following averages, yielding a modest 4% advantage favoring white participants:
- 9 white: 4.7
- 8 POC: 4.5
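(In case the “4%” reads as opaque: it is simply the relative gap between the two group averages. A throwaway sketch, where only the two averages are real numbers from my survey:)

```python
# Relative advantage of the white group's average rating over the POC group's.
# The two averages come from my Instagram survey; the rest is arithmetic.
white_avg = 4.7
poc_avg = 4.5

advantage = (white_avg - poc_avg) / poc_avg  # ~0.044
print(f"{advantage:.0%}")  # prints "4%"
```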
What I found most interesting was that the only perfect 7 scores I received on the survey came from a mixed Black/Asian woman and an East Asian woman. Despite this, the latter had captioned her carousel post of 8 gorgeous images:
“I am ashamed to say that I paid this AI TWICE to learn how to recognize my Asian features… but now I get to share my favorite video game/anime ones hehe.”
Is Lensa selling POC an inferior product? And charging us what amounts to a tax for retries?
This quick and dirty Instagram survey didn’t provide enough data to draw firm conclusions, but it did illuminate how we adapt even when we suspect a technology is working against us. When I launched this survey, I expected people to rate Lensa’s accuracy on its own, not combined with their own efforts. Linnea mentioned how AI “doesn’t like Asians”, but that didn’t stop her from trying again. At least two Asian participants mentioned having to run the app several times to get accurate results but still rated it highly. My Filipino friend, Josh Delson, had iterated with a beard and without, zoomed in and zoomed out; he learned the algorithm as it learned him.
Maybe I should’ve asked about the first try. Or maybe, as a fellow UX Researcher suggested, I should’ve asked how closely the app aligned with one’s expectations. POC participants may have rated higher because they didn’t expect the app to work for them and vice versa. But who knows? The question of identifying one’s own likeness defies easy quantification.
Based on what I saw advertised across social media, I expected the app to perform a scientific, unbiased analysis of my face and apprehend it as precisely as I have over the course of my life. If biometrics can be used to bypass airport security, surely they can simulate my likeness on an anime character. But alas, there is bias in everything.
In day to day life, I often experience micro-aggressions, or small reminders that the systems governing our lives are not made for me. When people commit micro-aggressions, they reveal the centuries of biased programming all of our minds undergo. Sometimes, they shock and upset me. Other times, they make me laugh. Now, here is an app — similarly programmed — that can unflinchingly assail me with 100 micro-aggressions on command for the low price of $6.
Moving about my daily life, I often forget how others perceive me. Whether they see a unique face or an interchangeable “other.” What are we feeding our algorithms? And who do they think we are?
Of the 12 total images that looked like me, I only liked 2, for a 1% overall success rate: one winner from each photoset.
Here they are!