LLMs Can Unmask Anonymous Users: The New Online Privacy Threat

By Covertly AI

A long-standing assumption of online life is that pseudonyms provide a workable layer of protection: if you avoid posting your real name, you can participate in sensitive discussions, ask honest questions, or share opinions without immediately tying your identity to your words. New research suggests that comfort may be fading fast. A study from researchers affiliated with ETH Zurich and Anthropic argues that modern large language models (LLMs) can “strip” pseudonymity at scale, linking burner-style accounts back to real people more cheaply and effectively than older deanonymization techniques (Goodin; Ramesh; Saarinen).

What makes this wave different is how little structured data is needed. Traditional deanonymization often depended on carefully prepared datasets, hand-crafted features, or labor-intensive investigation. In contrast, the new approach works directly from unstructured text, including posts, comments, and writing style, then uses LLMs to extract identity signals, search candidate pools, reason over likely matches, and calibrate confidence to control false positives (Goodin; Saarinen). The researchers describe this as a pipeline that can infer clues such as demographics, interests, incidental disclosures, niche hobbies, locations, conferences attended, job titles, and even linguistic patterns: details that seem harmless alone but become identifying when combined (Saarinen).
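To make the shape of that pipeline concrete, here is a minimal structural sketch in Python. It is illustrative only: the function names, the "key:value" signal format, and the overlap-based scoring are stand-ins invented for this post, whereas the actual system described in the reporting uses LLM prompts at each stage.

```python
# Structural sketch of the described pipeline:
# extract signals -> search/rank candidates -> calibrated match decision.
# Every function body here is a toy stand-in, not the researchers' method.
from dataclasses import dataclass, field

@dataclass
class Profile:
    handle: str
    signals: set[str] = field(default_factory=set)  # e.g. {"city:zurich", "job:sre"}

def extract_signals(posts: list[str]) -> set[str]:
    """Stand-in for the LLM step that mines demographics, hobbies,
    locations, job titles, and style cues from raw text."""
    return {w for post in posts for w in post.lower().split() if ":" in w}

def score(anon_signals: set[str], candidate: Profile) -> float:
    """Stand-in for LLM reasoning over one candidate; here, plain overlap."""
    return len(anon_signals & candidate.signals) / max(len(anon_signals), 1)

def link(posts: list[str], pool: list[Profile], threshold: float = 0.8) -> Profile | None:
    """Assert a match only when the best candidate clears the calibrated
    threshold; raising the threshold trades recall for precision."""
    anon = extract_signals(posts)
    best = max(pool, key=lambda c: score(anon, c), default=None)
    return best if best and score(anon, best) >= threshold else None
```

The part worth noticing is the final threshold check: calibration decides how confident the system must be before it asserts a match, which is exactly the precision-versus-recall trade-off the results below quantify.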

In one of the headline experiments, researchers linked Hacker News accounts to LinkedIn profiles. After removing direct identifiers such as names, URLs, and social handles, the system still correctly matched 67% of Hacker News users to their real LinkedIn profiles from a candidate pool of 89,000 people, at a total cost under $2,000, or roughly $1 to $4 per account (Ramesh). At stricter thresholds, results remained striking: the pipeline achieved 45.1% recall at a 99% precision threshold in the same Hacker News to LinkedIn setting, compared to just 0.1% recall for prior automated baselines at that precision level (Saarinen). The key implication is that even if an attacker must be very confident to avoid accusing the wrong person, LLM-based techniques can still unmask large numbers of users (Saarinen; Goodin).
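A quick back-of-the-envelope calculation shows why that combination of numbers matters. The population size below is invented for illustration; the recall and precision figures are the reported ones.

```python
# What "45.1% recall at 99% precision" works out to in practice.
# The targeted population is a hypothetical figure chosen for illustration.
targeted_users = 10_000
recall, precision = 0.451, 0.99

unmasked = targeted_users * recall   # users correctly linked
flagged = unmasked / precision       # total identity assertions made
wrong = flagged - unmasked           # misidentifications among them

print(f"unmasked: {unmasked:.0f}, flagged: {flagged:.0f}, wrong: {wrong:.0f}")
# unmasked: 4510, flagged: 4556, wrong: 46
```

In other words, an attacker who insists on being right 99% of the time still correctly unmasks roughly 4,500 of 10,000 users while misidentifying only a few dozen.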


The study also explored how deanonymization scales as candidate pools grow. According to reporting, the method reached 68% true matches at 90% precision in a 1,000-candidate setting and still achieved 55% when the pool expanded to 89,000 candidates (Ramesh). Even when the odds of a real match existing in the candidate pool fell to one in 10,000, the approach reportedly achieved around 9% true matches at 90% precision, showing that the technique remains effective in messy, real-world conditions (Ramesh).
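To see what the one-in-10,000 figure implies, consider an illustrative reading of it. The scan size below is invented, and "9% true matches" is interpreted here as the share of accounts with a genuine counterpart that get correctly linked, which is one plausible reading of the reported number.

```python
# Illustrative arithmetic for the one-in-10,000 base-rate result.
accounts_scanned = 1_000_000     # hypothetical scan size
base_rate = 1 / 10_000           # chance an account's match is in the pool
true_match_rate = 0.09           # reported rate at 90% precision

real_pairs = accounts_scanned * base_rate   # 100 accounts with a real match
linked = real_pairs * true_match_rate       # ~9 correctly unmasked
false_links = linked * (1 / 0.90 - 1)       # ~1 wrong assertion at 90% precision

print(f"real pairs: {real_pairs:.0f}, linked: {linked:.0f}, false: {false_links:.0f}")
```

Even with needle-in-a-haystack odds, the attacker walks away with real identifications and only about one error per nine successes.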


Other experiments highlight how everyday online behavior can become identifying over time. In Reddit film communities, the more movies a person discussed, the easier it was to pinpoint them; users who shared more than 10 movies became far more identifiable than those who mentioned only one (Goodin). In a separate test involving a partially redacted Anthropic dataset of interviews, the pipeline identified nine of 33 anonymized scientists, demonstrating that even generalized biographical details can be enough to narrow the field to a single real individual (Saarinen; Goodin). The broader message from the researchers is blunt: the “implicit threat model” many people rely on, that deanonymization is too hard to do at scale, no longer holds when LLM agents can automate much of the work (Goodin).
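The movie result reflects a simple multiplicative effect: each additional shared detail filters the set of people who could plausibly be you. A toy calculation makes the point; the pool size and per-movie match probability are invented, and mentions are assumed to filter candidates independently.

```python
# Why 10 shared movies are far more identifying than one.
# Pool size and per-movie probability are invented; independence is assumed.
pool = 1_000_000    # hypothetical film-community population
p = 0.05            # hypothetical fraction who also discussed any given movie

for k in (1, 3, 10):
    lookalikes = pool * p ** k
    print(f"{k:>2} movies shared -> expected lookalikes: {lookalikes:.4g}")
# 1 movie leaves ~50,000 lookalikes; 3 leave ~125; 10 leave far fewer
# than one, i.e. the combination is effectively unique.
```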

This shift carries obvious risks: doxxing, stalking, surveillance of journalists and dissidents, employer retaliation against workers using pseudonyms, hyper-targeted advertising that links anonymous posts to customer profiles, and personalized social engineering scams built from mined identity clues (Goodin; Saarinen). Researchers also warn that safety guardrails are not a reliable defense. Some commercial models refused at times, but small prompt changes often bypassed refusals, and splitting the task into benign-looking steps can make misuse detection harder (Saarinen). Open-source models raise the stakes further because guardrails can be removed and there is little to no monitoring (Saarinen).

Mitigations are possible, but they require action upstream. The researchers recommend platforms enforce rate limits, detect automated scraping, and restrict bulk data exports, while LLM providers monitor and block deanonymization-style misuse (Goodin; Saarinen). For individuals, the uncomfortable reality is that “post less” and “delete more” may become the only practical habits in a world where your writing can be a fingerprint (Goodin).
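As one concrete illustration of the platform-side advice, here is a minimal token-bucket rate limiter in Python. The reports name rate limiting as a recommendation but do not prescribe an implementation, so this is a sketch of the general idea with placeholder numbers.

```python
import time

class TokenBucket:
    """Simple token bucket: each request spends a token; tokens refill
    at a fixed rate, capped at a burst size."""
    def __init__(self, rate_per_sec: float, burst: int):
        self.rate, self.capacity = rate_per_sec, burst
        self.tokens, self.last = float(burst), time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# Hypothetical policy: 2 profile reads per second per client, burst of 10.
bucket = TokenBucket(rate_per_sec=2.0, burst=10)
if not bucket.allow():
    print("429 Too Many Requests")  # reject, log, and feed scraper detection
```

A limit like this is invisible to ordinary browsing, but the bulk scraping that makes per-account costs of a few dollars possible becomes dramatically slower and noisier.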

Works Cited

Goodin, Dan. “LLMs Can Unmask Pseudonymous Users at Scale with Surprising Accuracy.” Ars Technica, 3 Mar. 2026, arstechnica.com/security/2026/03/llms-can-unmask-pseudonymous-users-at-scale-with-surprising-accuracy/.

Ramesh, Rashmi. “AI Can Unmask Anonymous Users at Scale.” BankInfoSecurity, 2 Mar. 2026, www.bankinfosecurity.com/ai-unmask-anonymous-users-at-scale-a-30868.

Saarinen, Juha. “AI Can Unmask Online Users for Just a Few Dollars Each.” ITnews, 27 Feb. 2026, www.itnews.com.au/news/ai-can-unmask-online-users-for-just-a-few-dollars-each-623888.

