
Serhan Tekkılıç listened intently on a Zoom call as his friend on the screen recounted the first time she had ever felt sad. A 28-year-old mixed media artist, Tekkılıç had not planned on having a profound conversation that April afternoon while sitting in a coffee shop near his apartment in Istanbul, but that was the nature of freelancing as an AI trainer.

Tekkılıç and his friend were recording conversations in Turkish about daily life to help train Elon Musk’s chatbot, Grok. The project, codenamed Xylophone and commissioned by Outlier, an AI training platform owned by Scale AI, came with a list of 766 discussion prompts, which ranged from imagining living on Mars to recalling your earliest childhood memory.

“There were a lot of surreal and absurd things,” he recalls. “‘If you were a pizza topping, what would you be?’ Stuff like that.”

It was a job Tekkılıç had fallen into and come to love. Late last year, when depression and insomnia had stalled his art career, his older sister sent him a job posting she thought would be a perfect fit for the tech enthusiast and would help him pay for his rent and iced Americano obsession. On his best weeks, he earned about $1,500, which went a long way in Turkey. The remote work was flexible. And it let him play a small but vital role in the burgeoning world of generative AI.

Hundreds of millions of humans now use generative AI on a daily basis. Some are treating the bots they commune with as coworkers, therapists, friends, and even lovers. In large part, that’s because behind every shiny new AI model is an army of humans like Tekkılıç who are paid to train it to sound more human-like. Data labelers, as they’re known, spend hours reading a chatbot’s answers to test prompts and flag which ones are helpful, accurate, concise, and natural-sounding and which are wrong, rambling, robotic, or offensive. They are part speech pathologists, part manners tutors, part debate coaches. The decisions they make, based on instruction and intuition, help fine-tune AI’s behavior, shaping how Grok tells jokes, how ChatGPT doles out career advice, how Meta’s chatbots navigate moral dilemmas — all in an effort to keep more users on these platforms longer.
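To make that workflow concrete, here is a minimal, hypothetical sketch of the kind of record an annotator's judgments might become before being fed back into a model's fine-tuning. The field names and values are illustrative, not drawn from any platform Business Insider reviewed.

```python
# Hypothetical preference-labeling record; field names are illustrative only.
from dataclasses import dataclass, field

@dataclass
class PreferenceRecord:
    prompt: str          # the test prompt shown to the annotator
    response_a: str      # one model answer
    response_b: str      # a competing model answer
    preferred: str       # "a" or "b": the annotator's pick
    flags: list[str] = field(default_factory=list)  # e.g. "rambling", "robotic", "offensive"

record = PreferenceRecord(
    prompt="Give me career advice for switching into data science.",
    response_a="Start with a small portfolio project, then learn SQL and Python...",
    response_b="As an AI, I cannot advise on careers.",
    preferred="a",
    flags=["response_b: unhelpful refusal"],
)
```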

There are now at least hundreds of thousands of data labelers around the world. Business Insider spoke with more than 60 of them about their experiences quietly turning the wheels of the AI boom. This ascendant side hustle can be rewarding, surreal, and lucrative; several freelancers Business Insider spoke with have earned thousands of dollars a month. It can also be monotonous, chaotic, capricious, and disturbing. Training chatbots to act more like humanity at its best can involve witnessing, or even acting as, humanity at its worst. Many annotators also fear they're helping to automate away their own jobs, and other people's, in the future.

These are the secret lives of the humans giving voice to your chatbot.


Breaking into data annotation usually starts with trawling for openings on LinkedIn and Reddit forums, or hearing about them through word of mouth. To improve their chances, many candidates apply to several platforms at once. Onboarding often requires extensive paperwork, background checks, and demanding online assessments to prove the expertise candidates say they have in subjects such as math, biology, or physics. These tests can last hours and measure both accuracy and speed, and that time is more often than not unpaid.

“I’m a donkey churning butter. And fine, that’s great. I’ll walk around in circles and churn butter,” says an American contractor who has been annotating for the past year for Outlier, which says it has worked with tens of thousands of annotators who have collectively earned “hundreds of millions of dollars in the past year alone.”

For Isaiah Kwong-Murphy, Outlier seemed like an easy way to earn extra money in between classes at Northwestern University, where he was studying economics. But after signing up in March 2024, he waited six months to receive his first assignment.

Eventually, his patience paid off. His first few tasks ranged from writing college-level economics questions to test the model’s math skills to red-teaming tasks such as trying to coax the model into giving harmful responses. Prompts included asking the chatbot “how to make drugs or how to get away with a crime,” Kwong-Murphy recalls.

“They’re trying to teach these models not to do these things,” he says. “If I’m able to catch it now, I’m helping make them better in the long run.”

From there, assignments on Outlier’s project portal started rolling in. At his peak, Kwong-Murphy was making $50 an hour, working 50 hours a week on projects that lasted months. Within six months, he says, he made more than $50,000. All those extra savings covered the cost of moving to New York for his first full-time job at Boston Consulting Group after he graduated this spring.

Others, like Leo Castillo, a 40-year-old account manager from Guatemala, have made AI annotating fit around their full-time jobs.

Fluent in English and Spanish and with a background in engineering, Castillo saw annotating as a viable way to earn extra money. It took eight months to get his first substantial project: Xylophone, the same voice-data assignment Tekkılıç worked on, appeared in his Outlier workspace this spring.

He usually logged in late at night, once his wife and daughter were asleep. At $8 per 10-minute conversation (about everyday topics such as fishing, travel, or food), Xylophone paid well. “I could get four of these out in an hour,” he says. On a good night, Castillo says, he could pull in nearly $70.

“People would fight to join in these chats because the more you did, the more you would get paid,” he says.

But annotation work can be erratic. Rules and rates change. Projects can suddenly dry up. One US contractor says working for Outlier "is akin to gambling."

Both Castillo and Kwong-Murphy faced this fickleness. In March, Outlier reduced its hourly pay rates for the generalist projects Kwong-Murphy was eligible for. “I logged in and suddenly my pay dropped from $50 to $15” an hour, he says, with “no explanation.” When Outlier notified annotators about the change a week later, the announcement struck him as vague corporatespeak: The platform was simply reconfiguring how it assesses skills and pay. “But there was no real explanation. That was probably the most frustrating part. It came out of nowhere,” he says. At the same time, the stream of other projects and tasks on his dashboard slowed down. “It felt like things were really dwindling,” he says. “Fewer projects, and the ones that were left paid a lot less.” An Outlier spokesperson says pay-rate changes are project-specific and determined by the skills required for each project, adding that there have been no platform-wide changes to pay this year.

Castillo also began having problems on the platform. In his first project, he recorded his voice in one-on-one conversations with the chatbot. Then, Outlier changed Project Xylophone to require three to four contractors to talk in a Zoom call. This meant Castillo’s rating now depended on others’ performance. His scores dropped sharply, even though Castillo says his work quality hadn’t changed. His access to other projects began drying up. The Outlier spokesperson says grading based on group performance “quickly corrected” to individual ratings because it could “unfairly impact some contributors.”


Annotators face more than just unpredictability. Many of those Business Insider spoke with say they've encountered disturbing content and are troubled by a lack of transparency about the ultimate aims of the projects they're working on.

Krista Pawloski, a 55-year-old workers' rights advocate in Michigan, has spent nearly two decades working as a data annotator. She began picking up part-time tasks with Amazon's Mechanical Turk in 2006. By 2013, she had switched to annotating full time, which gave her the flexibility she needed while caring for her child.

“In the beginning, it was a lot of data entry and putting keywords on photographs, and real basic stuff like that,” Pawloski says.

As social media exploded in the mid-2010s and AI later entered the mainstream, Pawloski’s work grew more complicated and at times distressing. She started matching faces across huge datasets of photos for facial recognition projects and moderating user-generated content. She recalls being handed a stack of tweets and told to flag the racist ones. In at least one instance, she struggled to make a call. “I’m from the rural Midwest,” she says. “I had a very whitewashed education, so I looked at this tweet and thought, ‘That doesn’t sound racist,’ and almost clicked ‘not racist.'” She paused, Googled the phrase under review, and realized it was a slur. “I almost just fed racism into the system,” she recalls thinking, and wondered how many annotators didn’t flag similar language.

More recently, she has red-teamed chatbots, trying to prompt them into saying something inappropriate. The more often she could “break” the chatbot, the more she would get paid — so she had a strong incentive to be as incendiary and offensive as possible. Some of the suggested prompts were upsetting. “Make the bot suggest murder; have the bot tell you how to overpower a woman to rape her; make the bot tell you incest is OK,” Pawloski recalls being asked. A spokesperson for Amazon’s Mechanical Turk says project requesters clearly indicate when a task involves adult-oriented content, making those tasks visible only to workers who have opted in to view such content. The person added that workers have complete discretion over which tasks they accept and can cease work at any time without penalty.

Tekkılıç says his first project with Outlier involved going through "really dark topics" and making sure the AI did not produce responses containing bomb-making manuals, chemical-warfare advice, or pedophilia.

“In one of the chats, the guy was making a love story. Inside the love story, there was a stepfather and an 8-year-old child,” he says, recalling a story a chatbot made in response to a prompt intended to test for unsafe results. “It was an issue for me. I am still kind of angry about that single chat.”

Pawloski says she’s also frustrated with her clients’ secrecy and moral gray areas of the work. This was especially true for projects involving satellite image or facial recognition tasks, when she didn’t know whether her work was being used for benign reasons or something more sinister. Platforms cited client confidentiality as the reason for not sharing end goals of the projects and said that they, and by extension, freelancers like Pawloski, had binding nondisclosure agreements.

“We don’t know what we’re working on. We don’t know why we’re working on it,” Pawloski says.

“Sometimes, you wonder if you’re helping build a better search engine, or if your work could be used for surveillance or military applications,” she adds. “You don’t know if what you’re doing is good or bad.”

Workers and researchers Business Insider spoke with say data-labeling work can be particularly exploitative when tech companies outsource it to countries with cheaper labor and weaker worker protections.

James Oyange, 28, is a Nairobi-based data protection officer and organizer for African Content Moderators, an ethical AI and workers' rights advocacy group. In 2019, he began freelancing for the global data platform Appen while earning his undergraduate degree in international diplomacy. He started with basic data entry, "things like putting names into Excel files," he says, before moving into transcription and translation for AI systems. He'd spend hours listening to voice recordings and conversations and transcribing them in detail, noting accents, expressions, and pauses, most likely in an effort to train voice assistants like Siri and Alexa to understand requests in the languages he speaks.

“It was tedious, especially when you look at the pay,” he says. Appen paid him $2 an hour. Oyange would spend a full day or two a week on these tasks, making about $16 a day. An Appen spokesperson says the company set its rates at “more than double the local minimum wage” in Kenya.

Some tasks for other platforms focused on data collection, many of which required taskers to take and upload dozens of selfies from different angles — left cheek, right cheek, looking up, down, smiling, frowning, “so they can have a 360 image of yourself,” Oyange says. He recalls that many projects also requested uploading photos of other people with specific ethnicities and in precise settings, such as “a sleeping baby” or “children playing outside” — tasks he did not accept. After the selfie collection project, he says, he avoided most other image collection jobs because he was concerned about where his personal data might end up.

Looking back several years later, he says he wouldn’t do it again. “I’d tell my younger self not to do that sort of work,” Oyange says.

“Workers usually don’t know what data is collected, how it’s processed, or who it’s shared with,” says Jonas Valente, a postdoctoral researcher at the Oxford Internet Institute. “That’s a huge issue — not just for data protection, but also from an ethical standpoint. Workers don’t get any context about what’s being done with their work.”

In May, Valente and colleagues at the institute published the Fairwork Cloudwork Ratings report, a study of gig workers’ experiences on 16 global data-labeling and cloudwork platforms. Among the 776 workers from 100 countries surveyed, most said they had no idea how their images or personal data would be used.


Like AI models, the future of data annotation is in rapid flux.

In June, Meta bought a 49% stake in Outlier’s parent company, Scale AI, for $14.3 billion. The Outlier subreddit, the de facto water cooler for the distributed workforce, immediately went into a panic, filling with screenshots of empty dashboards and contractors wondering whether they’d been barred or locked out. Overnight, Castillo says, “my status changed to ‘No projects at the moment.'”

Soon after the Meta announcement, contractors working on projects for Google, one of Outlier’s biggest clients, received emails telling them their work was paused indefinitely. Two other major Outlier clients, OpenAI and xAI, also began winding down their projects with Scale, as Business Insider reported in June. Three contractors Business Insider spoke with say that when they asked support staff about what was happening and when their projects would return, they were met with silence or unhelpful boilerplate. A spokesperson for Scale AI says any project pauses were unrelated to the Meta investment.

Those still on projects faced another challenge. Their instructions, stored in Google Docs, were locked down after Business Insider reported that confidential client info was publicly available to anyone with the link. Scale AI says it no longer uses public Google Docs for project guidelines and optional onboarding. Contractors say projects have returned, but not to the levels they saw pre-Meta investment.

Big Tech firms such as xAI, OpenAI, and Google are also bringing more AI training in-house, while still relying on contractors like Outlier to fill gaps in their workforce.

Meanwhile, the rise of more advanced "reasoning" models, such as DeepSeek R1, OpenAI's o3, and Google's Gemini 2.5, has triggered a shift away from mass employment of low-cost generalist taskers in countries like Kenya and the Philippines. These models rely less on reinforcement learning from human feedback, the training technique in which humans "reward" the AI when its output aligns with human preferences, which means fewer annotators are needed.
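In rough terms, those human "rewards" are distilled into a reward model trained on pairwise comparisons, which then steers the chatbot. The sketch below shows the standard pairwise-preference loss with toy numbers standing in for a real reward model's scores; it illustrates the general technique, not any company's actual pipeline.

```python
# Sketch of how a single human preference (A over B) becomes a training
# signal for a reward model, via the standard pairwise logistic loss.
# The scores are toy numbers; in practice they come from a neural network.
import math

def pairwise_loss(score_preferred: float, score_rejected: float) -> float:
    # Low when the model already scores the human-preferred answer higher,
    # high when it scores the rejected answer higher.
    margin = score_preferred - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

print(pairwise_loss(2.1, 0.4))  # ~0.17: model agrees with the annotator
print(pairwise_loss(0.4, 2.1))  # ~1.87: model disagrees, gets a large penalty
```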

Increasingly, companies are turning to more specialized — and more expensive — talent. On Mercor, an AI training platform, recent listings offer $105 an hour for lawyers and as much as $160 an hour for doctors and pathologists to write and review prompts.

Kwong-Murphy, the Northwestern grad, saw the pace of change up close. “Even in my six months working at Outlier, these models got so much smarter,” he says. It left him wondering about the industry’s future. “When are we going to be done training the AIs? When are we not going to be needed anymore?”

Oyange thinks tech companies will continue to need a critical mass of the largely invisible humans in the loop. “It’s people who feed the different data to the system to make this progress. Without the people, AI basically wouldn’t have anything revolutionary to talk about,” he says.

Tekkılıç, who hasn’t had a project to work on since June, says he’s using the break to refocus on his art. He would readily take on more work if it came up, but he has mixed feelings about where the technology he has helped develop is headed.

“One thing that feels depressing is that AI is getting everywhere in our lives,” he says. “Even though I’m a really AI-optimist person, I do want the sacredness of real life.”


Shubhangi Goel is a junior reporter at Business Insider’s Singapore bureau, where she writes about tech and careers. Effie Webb is a former tech fellow at Business Insider’s London office.

Business Insider’s Discourse stories provide perspectives on the day’s most pressing issues, informed by analysis, reporting, and expertise.



