An Anthropic project is using feedback from about 1,000 human software engineers to improve the performance of Claude Code, the AI coding tool whose recent advancements have disrupted the vibe-coding industry.
The project, known internally at Snorkel AI as “Marlin,” focuses on fine-tuning Claude Code’s answers so that it can mimic what a professional developer would do.
AI companies like Anthropic often outsource data work to third parties like Snorkel, which hire contractors to teach AI a variety of specialist subjects and do other tasks to improve models. Contractor interviews and training material from these projects provide a look into how this unseen army operates across the world.
Two contractors working on the Anthropic project told Business Insider they are being paid $280 per task to create prompts and review code. They said each task takes about an hour, although some submissions needed more back-and-forth with Snorkel’s approval layer.
Project Marlin’s freelancers, who have software engineering backgrounds, were directed to A/B test code written by two different models: they compared the two outputs and chose which they preferred, according to project guidelines from Snorkel reviewed by Business Insider. One contractor said that the project aimed to ensure the model could achieve the level of detail expected in the prompt, essentially training Claude Code to write simplified, easier-to-maintain code.
The project is ongoing. The contractors did not know what version of the models they were evaluating.
As AI gets smarter and more capable, data-labeling platforms have shifted from generalist work to increasingly specialized tasks that require field expertise or postgraduate degrees. Snorkel’s website says that it works with people with advanced degrees, such as Ph.D.s, MDs, and JDs, or equivalent experience. The company says that top experts earn over $3,000 a week.
The industry’s transition to this specialization includes software training by computer engineers. Besides Snorkel, platforms like Scale AI and Mercor also offer up to $110 an hour for software engineers’ work.
Neither Anthropic nor Snorkel responded to requests for comment from Business Insider.
Clean and reliable code
Project Marlin instructed contractors to create a series of scenarios in which software developers might use Claude Code.
Contractors were instructed to select a GitHub repository from a list of thousands. They then had to create a pull request, a step where a developer proposes changes such as new features or bug fixes. Contractors also had to create a prompt: a series of questions explaining what is expected of the model.
In one task, the contractor prompted the model to reorganize how the system stores and handles “execution metadata,” extra information about how things are run. The goal was to make the code clearer and easier for developers to work with, without changing how the product or feature actually works.
The model returned two sets of code. Then, the contractor chose which they thought was more efficient. Contractors were also instructed to give follow-up prompts to “test how models handle conversation context,” according to the project’s directions.
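As an illustration only, a side-by-side comparison of this kind could be recorded roughly as follows. This is a minimal sketch, not Snorkel's or Anthropic's actual data format: the class name `PairwiseComparison` and all of its fields are hypothetical, chosen to mirror the steps described above (one prompt, two model outputs, one preference, optional follow-up prompts).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PairwiseComparison:
    """One hypothetical A/B judgment: a prompt, two outputs, and a preference."""

    prompt: str
    response_a: str
    response_b: str
    preferred: str  # "a" or "b"; labels are an assumption for this sketch
    follow_ups: list[str] = field(default_factory=list)  # later prompts in the same conversation

    def __post_init__(self) -> None:
        # Reject anything other than the two known labels.
        if self.preferred not in ("a", "b"):
            raise ValueError(f"preferred must be 'a' or 'b', got {self.preferred!r}")

    def chosen(self) -> str:
        """Return the output the reviewer preferred."""
        return self.response_a if self.preferred == "a" else self.response_b

    def rejected(self) -> str:
        """Return the output the reviewer did not prefer."""
        return self.response_b if self.preferred == "a" else self.response_a
```

A record like this keeps the preferred and rejected outputs tied to the same prompt, which is the basic shape of preference data. How the real project stores its judgments is not described in the guidelines reviewed here.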
In another task, the contractor prompted the model for a security fix focused on how MLflow, an open-source machine learning platform, downloads Python packages when loading certain models.
The task’s set of instructions to the contractor read: “evaluate production-ready code based on correctness, security, reliability, and maintainability. The fix must properly block command injection attempts while still allowing all legitimate whitelisted pip options.”
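To make the requirement concrete, here is a minimal sketch of the general technique: checking pip options against an allowlist and building an argument list that never passes through a shell. It is not MLflow's code or the fix produced in the task. The function `build_pip_command` and the two option sets are hypothetical and deliberately short.

```python
from __future__ import annotations

import sys

# Hypothetical allowlists for this sketch; the project's real list is not public.
ALLOWED_FLAGS = {"--no-deps", "--upgrade", "--pre"}             # options with no value
ALLOWED_VALUE_OPTIONS = {"--index-url", "--extra-index-url"}    # options with one value


def build_pip_command(requirement: str, options: list[str]) -> list[str]:
    """Build an argv list for `pip install`, raising ValueError on anything unexpected."""
    argv = [sys.executable, "-m", "pip", "install"]
    remaining = iter(options)
    for opt in remaining:
        name, sep, value = opt.partition("=")
        if name in ALLOWED_FLAGS and not sep:
            argv.append(name)
        elif name in ALLOWED_VALUE_OPTIONS:
            if not sep:
                # Form "--index-url URL": the value is the next list item.
                value = next(remaining, "")
            if not value or value.startswith("-"):
                raise ValueError(f"missing or suspicious value for {name!r}")
            argv.extend([name, value])
        else:
            raise ValueError(f"pip option not allowed: {opt!r}")
    # A requirement that starts with "-" would be parsed by pip as an option.
    if not requirement or requirement.startswith("-"):
        raise ValueError(f"invalid requirement: {requirement!r}")
    argv.append(requirement)
    return argv
```

The returned list is meant to be passed to `subprocess.run(argv)` without `shell=True`, so characters such as `;` or `|` are never interpreted by a shell. The allowlist then handles the second risk, which is an attacker smuggling in an extra pip option.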
Snorkel AI, founded in 2019 by Stanford researchers, creates datasets to improve AI models and creates tests for AI companies’ chatbots. The Silicon Valley-based startup touts top labs, including Google, Mistral, and Anthropic, on its customer list. It raised $100 million in Series D funding at a $1.3 billion valuation in May 2025. Snorkel cut 13% of its workforce in September, Business Insider previously reported.
The company is part of a slew of startups, including Scale AI, Mercor, and Handshake, that pay hundreds of thousands of contractors around the world to filter, rank, and train AI responses for the world’s largest tech companies. This data work helps improve everything from self-driving cars to OpenAI’s and Meta’s chatbots.