Prometu News
© 2026 Prometu News. Powered by Prometu, Inc.

New AI Metrics: The HAIC Paradigm for Realistic Evaluation


AI evaluation needs a radical shift, moving from isolated tests to performance analysis within human teams and workflows.

OMNI
#AI #artificial intelligence #metrics #HAIC #evaluation

For decades, artificial intelligence has been primarily evaluated by comparing its performance to humans in isolated tasks, generating rankings and headlines.

However, this methodology has a fundamental problem: AI is rarely used in the way it is tested. Current evaluations do not consider how AI interacts with complex human teams and workflows, where its real performance emerges over time. This disconnect leads to an underestimation of systemic risks and a misreading of the economic and social consequences of AI.

To address these deficiencies, a different approach is proposed: HAIC metrics (Human-AI, Context-Specific Evaluation). This framework, studied since 2022 in various organizations in the UK, the United States, and Asia, seeks to evaluate the performance of AI within human teams and workflows.

The HAIC approach is based on four pillars: shifting from individual performance to team performance, expanding the time horizon, evaluating organizational outcomes, and considering systemic effects.
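The four pillars can be pictured as the fields of a single evaluation record. The sketch below is a minimal, hypothetical Python illustration; the field names and example values are assumptions for clarity, not a published HAIC schema.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the four HAIC pillars as one evaluation record.
# Field names and example values are illustrative, not a standard schema.
@dataclass
class HAICEvaluation:
    team_performance: dict = field(default_factory=dict)         # team, not individual, metrics
    time_horizon_days: int = 90                                  # months of use, not single runs
    organizational_outcomes: dict = field(default_factory=dict)  # e.g. turnaround times, error rates
    systemic_effects: list = field(default_factory=list)         # e.g. deskilling, trust erosion

eval_record = HAICEvaluation(
    team_performance={"decisions_per_hour": 12.5},
    time_horizon_days=180,
    organizational_outcomes={"avg_report_turnaround_hours": 6.2},
    systemic_effects=["longer multidisciplinary deliberation"],
)
print(eval_record.time_horizon_days)
```

The point of the structure is that no single benchmark score appears anywhere in it: each pillar is measured where the AI is actually used.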

AI benchmark scores, although seemingly objective, can be misleading when determining the viability of an application in the real world. One example is FDA-approved AI models for reading medical scans: despite high benchmark scores, they can increase interpretation time in hospital settings because of reporting standards and regulatory requirements.

When current metrics do not predict actual performance, AI models can be abandoned, wasting resources and eroding trust in the technology.

HAIC metrics redefine the evaluation of AI, changing the unit of analysis from the individual to the team, expanding the time horizon, and broadening the outcome measures to organizational results.

For example, a hospital in the UK evaluated how a medical AI application affected coordination and deliberation in multidisciplinary teams, considering metrics such as the influence of AI on collective reasoning and risk management.

Long-term evaluation allows for the identification of systemic effects that short-term metrics miss. For example, an AI application may outperform a doctor in a specific task, but not improve multidisciplinary decision-making, or even introduce inefficiencies.
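That divergence between a strong standalone score and a flat team outcome can be made concrete with a small comparison function. This is a hypothetical sketch: the function name, the 0.9 threshold, and the sample numbers are illustrative assumptions, not measured data.

```python
# Hypothetical sketch: compare an isolated benchmark score with the change
# in a team-level outcome after deployment (higher team metric = better).
def deployment_verdict(benchmark_accuracy: float,
                       team_metric_before: float,
                       team_metric_after: float) -> str:
    """Flag cases where a strong benchmark fails to improve team outcomes."""
    delta = team_metric_after - team_metric_before
    if benchmark_accuracy >= 0.9 and delta <= 0:
        return "benchmark-outcome gap"  # strong in isolation, no gain in the team
    return "improved" if delta > 0 else "no benefit"

# A model with 95% standalone accuracy whose team decision-quality
# score did not move:
print(deployment_verdict(0.95, 0.72, 0.71))  # -> benchmark-outcome gap
```

Only long-horizon, team-level measurement produces the before/after numbers this comparison needs; a leaderboard score alone supplies just the first argument.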

The HAIC approach acknowledges that this kind of evaluation is more complex and costly, but holds that it is crucial for understanding what AI can truly achieve in real-world environments: measuring not only what a model can do alone, but what it enables or undermines when humans work with it.
Editorial Note

This content has been synthesized and optimized to ensure clarity and neutrality. Based on: MIT Technology Review