
Episode 43: How to evaluate AI in tech products?

  • Writer: Embedded IT
  • Jun 13, 2025
  • 3 min read

Updated: Jan 16


AI is now woven into countless tools that procurement professionals use every day, often without them realising it. Automated invoice processing, data extraction from PDFs, and other long-standing features are all examples of machine learning working quietly in the background. Understanding this can take away some of the apprehension about buying AI-powered tools. But when it comes to procuring new AI products, there is a clear set of considerations that can help avoid risk and ensure the technology genuinely supports business processes.


If you want a simple explanation of the mechanics behind AI outputs (and why they can go wrong), see how AI works.


Understanding the type of AI behind a product


Before evaluating any AI tool, it is important to understand what kind of AI sits behind it. Some tools are powered by conventional machine learning, where outcomes are driven by patterns learned from historical data. Others rely on generative AI or large language models, which behave differently and introduce more complexity.


Knowing what type of AI is being used helps clarify how decisions are made and what level of transparency you can expect. It also informs the right questions to ask the supplier.


Reviewing how the model was trained


AI behaviour is shaped by the data it has been trained on. If a model has been trained on large parts of the public internet, then it inherits common digital biases, often reflecting predominantly Western or American viewpoints. That may or may not matter depending on the application, but procurement teams need clarity.


Asking what data was used, how it was prepared, and how representative it is provides essential insight into how the tool might behave in real-world situations.


Proving that the AI will work in context


A common pitfall is buying AI tools simply because they are labelled as “AI.” Instead, running a proof of concept or trial is essential. Internal data quality varies widely, and a model trained on the internet may not behave correctly when exposed to a business’s specific data sets.


Testing ensures the AI can be trained, adapted, or used effectively in the organisation's own environment.
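
As a rough illustration of what such a trial can look like in practice, the sketch below scores a hypothetical vendor model against a small labelled sample of an organisation's own records. The `vendor_classify` function, the file name, and the column names are illustrative assumptions, not any specific supplier's interface.

```python
# Minimal proof-of-concept sketch: score a vendor model against a small
# labelled sample of your own data before committing to purchase.
# vendor_classify is a placeholder for whatever interface the supplier provides.

import csv


def vendor_classify(text: str) -> str:
    """Placeholder for the supplier's real API or SDK call."""
    raise NotImplementedError("Replace with the vendor's actual interface")


def run_trial(sample_path: str) -> float:
    """Compare the vendor's outputs with labels agreed by your own team."""
    total = 0
    correct = 0
    with open(sample_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):  # expects columns: text, expected_label
            prediction = vendor_classify(row["text"])
            correct += prediction == row["expected_label"]
            total += 1
    return correct / total if total else 0.0


if __name__ == "__main__":
    accuracy = run_trial("trial_sample.csv")
    # Judge the result against a threshold agreed with the business, not the vendor.
    print(f"Accuracy on our own data: {accuracy:.0%}")
```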


Understanding bias, accuracy and human control


Even well-trained AI models make mistakes. It is crucial to understand how the product identifies and handles errors, and how the organisation will learn from incorrect outputs.


It is also important that humans remain able to override AI decisions. While automation helps remove repetitive tasks, businesses still carry responsibility for outcomes, and liabilities sit with the organisation, not the AI vendor.


Clear boundaries around what AI can and cannot decide help limit risk.
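
To make the idea of boundaries concrete, here is a minimal, hypothetical sketch of a human-in-the-loop gate: anything below an agreed confidence threshold is routed to a reviewer, and every decision remains overridable. The threshold, field names, and functions are assumptions for illustration, not a description of any particular product.

```python
# Illustrative sketch of a simple human-in-the-loop boundary: the AI only acts
# on its own above an agreed confidence threshold, and a named reviewer can
# always override the outcome.

from dataclasses import dataclass
from typing import Optional

CONFIDENCE_THRESHOLD = 0.90  # assumption: set by the business, not the vendor


@dataclass
class Decision:
    label: str
    confidence: float
    needs_review: bool = False
    overridden_by: Optional[str] = None


def gate(label: str, confidence: float) -> Decision:
    """Route anything below the threshold to a human instead of auto-applying it."""
    return Decision(label, confidence, needs_review=confidence < CONFIDENCE_THRESHOLD)


def override(decision: Decision, new_label: str, reviewer: str) -> Decision:
    """Humans keep the final say; record who changed the outcome for audit purposes."""
    return Decision(new_label, decision.confidence, needs_review=False, overridden_by=reviewer)
```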


Considering model transparency


Transparency is a core requirement when evaluating AI products. Businesses need to understand, at least at a high level, how the model works and what approaches were used to build it.


Large AI providers share some information publicly but cannot offer detailed explanations tailored to every customer. This is why smaller or open source providers can sometimes offer greater visibility. Transparency supports better decision making and reduces the likelihood of unexpected model behaviour.


Learning from real examples of AI in practice


One example of AI being used for social good is the Be My Eyes application, originally created to connect blind users with volunteers who could describe what their phone camera was showing. It has since evolved to use AI to interpret scenes and offer guidance.


While the benefits are significant, the application highlights why procurement teams must think carefully about risk. When AI gives advice to a vulnerable user, incorrect outputs could have serious consequences. Understanding use cases, risks, and responsibilities is essential before deploying similar tools.


A practical checklist for evaluating AI products


A simple checklist can help structure procurement decisions:


  • What type of AI is it – machine learning, generative AI, or an LLM?

  • What data was it trained on, and does that raise any concerns?

  • Does it work with your organisation’s data? Prove it before buying.

  • How does it handle bias and accuracy?

  • Can humans override decisions?

  • How are errors identified, escalated, and corrected?

  • How transparent is the model?


With clarity on these points, procurement professionals can assess AI products confidently and surface any risks before committing to investment.


For organisations looking to procure AI tools with confidence, get in touch.



