Robots are getting better at seeing people. They can track a worker in a warehouse aisle, recognise a visitor at a reception desk, match a face to a delivery ticket, or pull up a profile of a customer before a sales rep walks into the meeting.
A growing number of automation systems also reach beyond the camera feed. They query language models to enrich what they see with context: who this person is, what they do, where they have appeared online, and whether their public footprint matches the record on file.
This shift is part of a wider pattern that Robotics & Automation News has described as robotics becoming a branch of artificial intelligence rather than a separate engineering discipline.
That second step is where things quietly go wrong. A camera is only as useful as the identity it attaches to a face, and a language model asked to summarise a person from public data will often invent details, mix up people with similar names, or present a confident profile of someone who does not exist.
For robotics and automation teams building systems that touch HR screening, access control, customer service, or any workflow where a human is involved, single-model identity lookups are becoming a serious reliability problem.
Where robots and automation meet identity data
Identity-aware automation is not limited to airports and border systems. It now sits inside routine commercial workflows. Humanoid robots in reception and exhibition spaces greet visitors by name.
Service robots in hotels and hospitals pair a face with a room number. HR platforms built on top of vision systems cross-reference public profiles before an interview.
Coverage of how AI agents are streamlining factory HR tasks shows how quickly this identity layer has moved from pilot to production in manufacturing environments.
Field-service dispatch tools profile the technician and the customer before a job is assigned. Even warehouse and logistics automation increasingly touch identity at the handover step, where a package meets a person.
The pattern behind most of these systems is the same. A robot, a camera, or a scheduling engine detects a signal tied to a person, and then a downstream AI service is asked to interpret it. The interpretation layer is almost always a large language model or a pipeline built on top of one.
These models have well-documented hallucination rates. Stanford researchers found hallucination rates between 69% and 88% on legal queries across major models, and a more recent multilingual benchmark published at EMNLP 2025 found that average rates across 30 languages and 11 models remain substantial even on routine knowledge tasks.
When the task is identifying a person, those numbers become a design risk rather than an academic curiosity.
Why a single-model lookup is the weakest link
Public-data identity summarisation is a surprisingly hard problem for language models. Three failure modes dominate.
First, common names. A single model asked about a “John Rodriguez, software engineer” will happily merge five different people into one confident biography. There is no internal check that the LinkedIn profile, the conference talk, and the patent filing belong to the same person.
Second, speculative filling. When the public record is thin, models fill the gap. They invent employers, credentials, locations, and publications. The output reads clean, which is the worst possible property for a safety-critical identity step.
The NIST Generative AI Profile refers to this behaviour as confabulation and flags it as a distinct class of risk, especially when users are prone to automation bias and accept plausible-sounding outputs without verification.
Third, stale public data. A model whose training data or retrieval cache is six months old will not know that a person has changed roles, deleted accounts, or updated their credentials.
This is especially relevant for robots placed in executive offices, medical settings, or client-facing environments where the wrong background brief is worse than no brief at all.
The common thread is that one model is being asked to do the job of several. It is retrieval, disambiguation, and synthesis at the same time, with no second opinion.
A University of Michigan study reported by Robotics & Automation News found that humans stop trusting robots after three mistakes, and that no repair strategy fully restores the lost trust.
For any robot that speaks a person’s name or cites a personal fact, a hallucinated identity is exactly the kind of mistake that compounds.
Consensus as a design pattern
Robotics teams already use consensus on the hardware side. Sensor fusion combines lidar, radar, and vision because no single sensor is trustworthy in every environment.
The same logic applies to AI-driven identity work. If one model is unreliable on any given query, the defensible pattern is to ask several and keep only the parts they agree on.
This matches the “valid and reliable” trustworthiness characteristic defined in the NIST AI Risk Management Framework, which treats reliability as the baseline condition for any other trustworthy AI property.
This is the approach behind a free tool developed by Tomedes, a translation company that has been building consensus-based AI infrastructure for several years. The tool, What AI Knows About Me, accepts a name, email, username, or URL and returns a public-footprint summary generated by a feature called SMART.
SMART sends the input to multiple leading AI models at the same time, breaks each response into segments, and keeps only the segment versions that the majority of models agree on. Low-agreement claims are filtered out before the summary is assembled.
The result is a shorter, more conservative profile than any single model would produce on its own. For robotics and automation contexts, that trade is exactly the right one. A brief, confidence-scored answer is easier to act on than a long, plausible-sounding one that might be fabricated.
What it looks like in practice
A brief walk-through of the tool is useful because it shows what a consensus-filtered identity lookup actually looks like at the output layer.
A user enters a single input, such as a full name or a LinkedIn URL. The tool sends the query to several leading models in parallel. Each model returns its best guess at the person’s public footprint.
The SMART layer then compares the outputs segment by segment. Where most models agree, the segment is kept. Where they disagree or speculate, the segment is dropped. What the user sees is a reassembled summary made only of the agreed-upon parts.
For a robotics team thinking about this as a design reference rather than a consumer product, the interesting elements are the interface choices.
The tool is free, requires no sign-up, and is explicit about its limits. Tomedes states clearly that it reflects only public signals and should not be used as the sole basis for hiring, security, or compliance decisions.
That framing matters. It is a reminder that consensus-based identity data is a support layer, not an authority layer, and that the same caveat belongs in any automated system that acts on it.
Implications for robotics and automation teams
A few practical takeaways follow from treating identity as a multi-model problem rather than a single-model one.
Treat single-model identity calls as a liability. If a robot, chatbot, or workflow quotes a personal detail about a named human, the sentence should be traceable to more than one source. Otherwise it is a hallucination waiting to happen.
Expose confidence, not just content. Consumer users can accept a fuzzy summary, but industrial systems cannot. Whatever identity layer sits behind a robot needs a confidence score on each claim, and the robot needs a policy for what to do when that score drops below a threshold.
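One way to act on per-claim confidence is to route each claim by threshold: speak it, log it for a human operator, or drop it. The structure, threshold values, and names below are assumptions for illustration, not any vendor's API.

```python
from dataclasses import dataclass

@dataclass
class Claim:
    text: str
    confidence: float  # e.g. the fraction of models that agreed on it

def brief_for_robot(claims, speak_threshold=0.8, log_threshold=0.5):
    """Split identity claims into what the robot may say aloud,
    what stays in the operator log, and what is discarded."""
    speak, log_only = [], []
    for c in claims:
        if c.confidence >= speak_threshold:
            speak.append(c.text)
        elif c.confidence >= log_threshold:
            log_only.append(c.text)
        # Below log_threshold the claim is dropped entirely.
    return {"speak": speak, "log_only": log_only}

claims = [
    Claim("visitor is J. Smith, scheduled 10:00 meeting", 0.95),
    Claim("works at Initech", 0.6),
    Claim("previously CTO of a startup", 0.3),
]
policy = brief_for_robot(claims)
print(policy["speak"])     # high-confidence claims only
print(policy["log_only"])  # mid-confidence claims, for a human to review
```

The exact thresholds are a deployment decision; the design point is that the policy exists at all, so a drop in agreement changes the robot's behaviour instead of its script.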
Separate the sensor from the interpreter. Vision systems detect and match; language models interpret. Blurring these two is how a warehouse robot ends up introducing a visitor by the wrong job title.
The same discipline that robotics cybersecurity frameworks already apply to data-in-transit should apply to the AI layer that interprets that data on arrival.
Design for refusal. The most important behaviour of any identity system is knowing when to say nothing. A tool that filters out low-agreement claims is demonstrating that behaviour at the content level. Robots and automation flows need the same option at the action level.
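At the action level, refusal can be as simple as a fallback branch: when the identity layer cannot return a verified name, the robot says nothing person-specific. A minimal sketch, with hypothetical names and a made-up identity structure:

```python
def greet(identity):
    """identity: dict with optional 'name' and 'verified' keys, or None."""
    if identity and identity.get("verified") and identity.get("name"):
        return f"Welcome back, {identity['name']}."
    # Refusal path: a neutral greeting instead of a guessed name.
    return "Welcome. How can I help you today?"

print(greet({"name": "Dana Lee", "verified": True}))   # personalised
print(greet({"name": "Dana Lee", "verified": False}))  # neutral fallback
print(greet(None))                                     # neutral fallback
```

The unverified case and the no-data case deliberately produce the same output: from the visitor's point of view, a robot that stays generic is indistinguishable from one that has no data, which is exactly the failure mode you want.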
The broader shift
Robotics has spent the last few years absorbing the generative AI stack at speed. Foundation models for perception, vision-language-action systems, and synthetic data pipelines have all become standard parts of a modern robotics roadmap, as ongoing coverage of physical AI and industrial deployment regularly shows.
The next phase is less glamorous but more consequential. It is about reliability, not capability. Readers who follow the industry's reporting on physical AI will already be familiar with the pattern: systems that make decisions about people, or in front of people, need to earn trust the same way any other safety-relevant component earns it, through redundancy and cross-checking.
Rachelle, AI Lead at Tomedes, frames it this way: “A single model’s answer about a person is a starting guess, not a verified fact. The only reliable signal you can get from the public web today is the one that several models independently converge on. Everything else is a plausible story.”
For teams building the next generation of robots and automation systems, the design implication is direct. Consensus is not a luxury feature. It is the minimum viable reliability standard for any AI layer that touches a human being.
