Anaxi Labs co-founder warns robotics industry to address worker data rights before physical AI scales

As artificial intelligence continues its rapid expansion into the physical world, much of the industry’s attention has focused on increasingly capable robots, larger AI models, and the vast datasets required to train them.

But a growing number of observers are asking a different question: who owns the data that makes physical AI possible, and who should benefit from it?

Kate Shen, co-founder of Anaxi Labs, is among those pushing the debate into the spotlight. Her company is developing infrastructure for what it describes as a global AI and robotics data supply chain, with a particular emphasis on worker consent, data ownership, compensation, and regulatory compliance.

As robotics companies race to collect the real-world data needed to train humanoid robots and other autonomous systems, Shen argues that the industry risks creating future legal, ethical, and economic problems if it ignores the people whose actions and expertise generate that data.

In this interview, Shen discusses why she believes worker-generated data should be treated as a valuable economic asset rather than a free byproduct of industrial operations.

She explains how new approaches to data valuation could allow contributors to be compensated based on the measurable impact their data has on AI performance, and why she sees such frameworks as increasingly necessary as physical AI systems become more widespread.

The conversation also explores broader issues facing the robotics industry, including the concentration of value among a small number of platform providers, the growing importance of data infrastructure, and the likelihood that future regulations will require greater transparency around AI training pipelines.

Perhaps most interestingly, Shen argues that the industry’s biggest long-term challenge may not be robot hardware or AI algorithms, but the creation of sustainable economic incentives that encourage workers, companies, and data providers to continue contributing the diverse, high-quality data needed to develop truly general-purpose robotic systems.

Interview with Kate Shen

Robotics & Automation News: You’ve argued that robotics and physical AI companies need to think more seriously about worker consent and data ownership. Why do you believe this issue has been largely overlooked in the current AI and robotics boom?

Kate Shen: The current boom prioritizes technological velocity and scaling deployment, often structurally neglecting the economic and ethical layers.

Physical AI relies on high-fidelity, tacit knowledge embedded in worker actions and operational workflows – data that is expensive to generate and difficult to standardize.

Capturing this valuable, ego-centric input without a formalized economic layer or explicit consent is treated as a tactical shortcut.

However, this oversight weakens the integrity of the data supply chain and risks future industrial friction, making proactive compliance and fairness a critical deployment strategy.

R&AN: Many AI systems are trained using enormous amounts of operational and behavioral data collected from workers. Should warehouse staff, factory workers, or robot operators be compensated if their actions help train future automation systems?

KS: Yes, compensation is mandatory. Workers generate the high-fidelity, tacit knowledge indispensable for generalizable embodied AI.

Our collaborating researchers at Carnegie Mellon also showed that rewarding contributors based on the data’s measurable value improves fairness and incentivizes higher-quality contributions. This fair economic layer accelerates the acquisition of diverse, essential inputs, creating a win-win.

Furthermore, structurally integrating this compensation avoids future friction risk, preempting ethical issues and regulatory backlash associated with scaling automation without shared value.

R&AN: Carnegie Mellon’s research discusses measuring the “value” of data based on how much it improves AI performance. How practical is it to build an economic model where contributors are rewarded according to the measurable value of their data?

KS: We think it is quite practical. Our framework builds on existing data valuation tools that estimate how much a dataset improves an AI model’s performance, then provides a guideline for turning those contribution estimates into real prices, so contributors can be compensated based on measurable value rather than non-transparent or exploitative pricing practices.

The goal is not perfect pricing, but a transparent and workable procedure for fairer and more sustainable data markets.
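To make the idea of paying for measurable value concrete, here is a minimal sketch in Python. It is not Anaxi Labs' framework or the Carnegie Mellon method, neither of which is detailed in this interview. It substitutes leave-one-out valuation, a simple and well-known technique: a contributor's value is how much validation accuracy drops when their data is removed. The toy 1-nearest-neighbour model, the contributor names, the data points, and the 600-unit budget are all hypothetical.

```python
from typing import Dict, List, Tuple

Sample = Tuple[float, int]  # (feature, label); a hypothetical toy data format


def accuracy(train: List[Sample], val: List[Sample]) -> float:
    """Validation accuracy of a 1-nearest-neighbour classifier (0.0 if no training data)."""
    if not train:
        return 0.0
    correct = 0
    for x, y in val:
        nearest = min(train, key=lambda s: abs(s[0] - x))
        if nearest[1] == y:
            correct += 1
    return correct / len(val)


def leave_one_out_values(
    contributions: Dict[str, List[Sample]], val: List[Sample]
) -> Dict[str, float]:
    """Value of each contributor = accuracy with everyone minus accuracy without them."""
    everyone = [s for data in contributions.values() for s in data]
    full_score = accuracy(everyone, val)
    values = {}
    for name in contributions:
        rest = [
            s
            for other, data in contributions.items()
            if other != name
            for s in data
        ]
        values[name] = full_score - accuracy(rest, val)
    return values


def payouts(values: Dict[str, float], budget: float) -> Dict[str, float]:
    """Split a fixed budget in proportion to each contributor's non-negative value."""
    positive = {name: max(v, 0.0) for name, v in values.items()}
    total = sum(positive.values())
    if total == 0:
        return {name: 0.0 for name in values}
    return {name: budget * v / total for name, v in positive.items()}


# Hypothetical contributors, each supplying one labelled example.
contributions = {
    "alice": [(0.0, 0)],
    "bob": [(5.0, 1)],
    "carol": [(10.0, 2)],
}
# Hypothetical validation set: three class-0 points, two class-1, one class-2.
validation = [(-0.5, 0), (0.5, 0), (1.0, 0), (4.5, 1), (5.5, 1), (9.5, 2)]

values = leave_one_out_values(contributions, validation)
pay = payouts(values, budget=600.0)
print(values)
print(pay)
```

In this toy setup, the contributor whose data supports the most validation cases receives the largest share: roughly 300 units for alice, 200 for bob, and 100 for carol. Leave-one-out scoring undervalues redundant data, since two contributors with near-identical examples can each score zero. Shapley-style estimates address this at higher computational cost.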

R&AN: Physical AI and robotics companies are increasingly dependent on large-scale datasets for robot training and simulation. Do you think the industry risks a backlash if companies scale deployment without addressing questions around consent, fairness, and data rights?

KS: The risk is substantial. Scaling deployment without formalizing consent and compensation creates a public narrative where workers are seen as training their own replacements, fueling public and regulatory tension.

From a systems perspective, neglecting contributor incentives erodes the integrity of the data supply chain, making it difficult to sustain the diverse, high-quality, and long-tail data that physical AI needs to generalize robustly.

We view a proactive, GDPR-native global compliance framework as differentiated infrastructure for mitigating this risk at scale.

R&AN: Anaxi Labs describes itself as building infrastructure for a global AI and robotics data supply chain. What does that actually mean in practice, particularly for robotics and automation companies?

KS: We are architecting the compliant logistical and technical stack for a world-scale physical AI data supply chain. We aggregate high-fidelity real-world human training videos through strategic B2B partnerships and a decentralized expert network, with cross-embodiment and generalization robustness trends in mind.

Our petabyte-scale backend implements atomic action annotation, meticulously labeling four cognitive dimensions: observation, causal intent, execution methodology, and physical effect.

The data is validated by a team with hands-on experience running petabyte-scale data-to-robot closed loops, underpinned by a GDPR-native global compliance framework to mitigate privacy, data sovereignty and cross-border friction risks.

R&AN: You’ve spoken about AI evolving into an “ecosystem” rather than a standalone product. How do you see that changing the relationship between robotics companies, AI developers, data providers, and workers over the next decade?

KS: The relationship shifts toward shared economic participation, just like the hardware ecosystem. AI systems become composable components, necessitating infrastructure to track, meter, and monetize specialized assets like datasets and software components/agents.

This change is underpinned by automated revenue sharing that compensates contributors for downstream value. This fair economic layer sustains the high-quality human expertise and data contributions required to accelerate the deployment of generalized physical skills.

R&AN: One concern around AI automation is that value may become concentrated among a small number of platform providers. Do you think the robotics industry faces a similar risk as physical AI systems become more capable and widely deployed?

KS: Yes, the risk of value concentration is acute. Long-term competitive advantage is tied to organizations capable of building large-scale embodied data infrastructure.

If economic gains centralize among a few platform providers, the necessary data contributions – high-quality, diverse, and long-tail – will cease to flow, stalling progress toward generalizable robotics.

Our strategy is to try to build a marketplace that structurally distributes value to data creators and contributors, sustaining a healthy, competitive data supply.

R&AN: Looking ahead, do you think future robotics and AI regulations will eventually require formal frameworks around worker consent, data compensation, or transparency in AI training pipelines?

KS: Formal frameworks are inevitable, driven by both ethical and distinct physical AI safety imperatives. Since robots act in the physical world, regulation will demand transparency and verifiable data to ensure semantic safety – understanding if an instruction is appropriate.

We anticipate strong requirements for explicit worker consent, fair data compensation, and auditable pipelines. Anaxi Labs is proactively preparing for this with a GDPR-native global compliance framework that addresses data sovereignty and industrial secret concerns.

R&AN: Humanoid robots and autonomous systems are increasingly moving into logistics, manufacturing, and service work. What are the biggest ethical or economic questions the robotics industry still isn’t discussing openly enough?

KS: The industry under-discusses the Evaluation Dilemma, which has both economic and ethical dimensions. Economically, the lack of standardized benchmarking – the inability to objectively measure robot performance independently of the hardware body – creates opacity, hindering market verification and slowing competitive R&D cycles.

Ethically, this same dilemma makes it difficult to ensure semantic safety and assign liability, as performance cannot be verifiably guaranteed for deployment in the physical world.

The combination of opaque performance evaluation and centralized economic value erodes the contributor incentives essential for maintaining the data ecosystem needed for generalization.
