Contextualizing AI and How It Affects You as a Product Manager
Part 1 of the AI for Product Manager Series, where we unpack AI hallucinations and the product risks they create.
Let’s face it: AI is everywhere. It’s in your apps, your workflows, and probably your to-do list (if you’re not already using AI to manage that, too). But here’s the thing: AI isn’t perfect. In fact, it’s prone to something called hallucinations, where it confidently produces nonsense that sounds plausible. And if you’re a Product Manager, this isn’t just a technical glitch; it’s a product risk waiting to explode in your face.
So, let’s talk about what AI hallucinations are, why they matter, and how you, as a PM, can stay ahead of the curve.
AI Hallucinations & Product Risk
In 2024, researchers from Cornell, the Universities of Washington and Waterloo, and the nonprofit research institute AI2 evaluated several AI platforms by fact-checking models like GPT-4o against authoritative sources on topics ranging from law and health to history and geography. The upshot: none of the models answered reliably across the board, and the ones that scored higher often did so largely because they refused to answer rather than risk getting it wrong.
Now, this isn’t the only time AI hallucinations made the news. Back in February 2024, Google’s Gemini AI faced backlash for generating historically inaccurate and biased images when users asked it to depict historical figures: America’s founding fathers appeared as Black women, and ancient Greek warriors as Asian women and men. The launch of the new image-generation feature sent social media into a mix of intrigue and confusion, because when users prompted Gemini for images of people, the results overwhelmingly featured people of colour regardless of historical context. The criticism was loud enough that Google paused the image-generation feature.
Another hallucination case made headlines when U.S. District Judge P. Kevin Castel in Manhattan ordered lawyers Steven Schwartz and Peter LoDuca and their law firm, Levidow, Levidow & Oberman, to pay a total of $5,000 in fines. The lawyers had used ChatGPT to help with a personal injury case against the airline Avianca and submitted a legal brief that included six AI-generated, fictitious case citations.
So what do these all have in common?
For one, you can’t take what AI gives you at face value, and that directly affects user trust and product adoption. Left unchecked, this erosion of trust in your product can lead to the following:
Legal & Ethical Concerns: Imagine your AI gives bad medical advice or makes up financial data. Lawsuits, regulatory fines, and PR disasters could follow.
Brand Damage & PR Nightmares: One viral hallucination (like Google’s Gemini incident) can tank user trust and send your brand into crisis mode.
Business Risks for Clients & Partners: If your clients or partners rely on your AI for decision-making, a hallucination could cost them millions, and they’ll blame you.
Okay, enough doom and gloom. Let’s talk solutions. Here’s how you can mitigate AI hallucinations and build a product users can trust:
1. Build Real-Time Accuracy Checks into the Core Product Experience
AI hallucinations thrive in ambiguity. So, don’t let your AI wing it. Instead, design systems that validate its outputs in real time.
Leverage External Knowledge Sources: Integrate APIs or databases (e.g., Wikipedia, financial data, scientific journals) to cross-check AI-generated facts. For example, if your AI claims a stock price, validate it against live market data (see the sketch after this list).
Flag Low-Confidence Responses: Teach your AI to say, “I don’t know,” when it’s unsure. Better to admit uncertainty than to make something up.
Use Multi-Model Validation: Run outputs through multiple AI models to catch inconsistencies. If one model disagrees, flag the response for review.
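Here’s a minimal sketch of what that first bullet could look like in practice, in Python. Everything in it is illustrative: `get_live_quote` is a hypothetical stand-in for whatever market-data API you already use, and the 2% tolerance is an arbitrary placeholder your team would tune.

```python
# Sketch: cross-check an AI-claimed stock price against live market data.
# `get_live_quote` is a hypothetical wrapper around your market-data API.

TOLERANCE = 0.02  # accept up to 2% drift between the claim and the live quote


def validate_price_claim(ticker: str, claimed_price: float, get_live_quote) -> dict:
    """Compare an AI-generated price claim against a trusted external source."""
    live_price = get_live_quote(ticker)
    drift = abs(claimed_price - live_price) / live_price
    return {
        "ticker": ticker,
        "claimed": claimed_price,
        "live": live_price,
        "verified": drift <= TOLERANCE,
    }


# Usage: annotate or block the AI output when verification fails.
result = validate_price_claim("ACME", 195.00, get_live_quote=lambda _: 187.42)
if not result["verified"]:
    print(f"Flagging response: claimed {result['claimed']}, live {result['live']}")
```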
Product Feature Idea: Embed these accuracy checks into the response pipeline itself, so validation happens before users ever see an answer and the experience stays seamless.
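To make the multi-model idea a bit more concrete, here’s a rough sketch of the pattern: ask two models the same question and only pass the answer through when they agree. `ask_model_a` and `ask_model_b` are placeholders for whatever model clients you actually run, and the exact-match comparison is deliberately naive; in a real product you’d likely swap in a semantic-similarity check, but the shape of the decision is the same.

```python
# Sketch: multi-model cross-validation with an explicit "I don't know" path.
# ask_model_a / ask_model_b stand in for your real model clients.

def cross_validate(question: str, ask_model_a, ask_model_b) -> dict:
    answer_a = ask_model_a(question)
    answer_b = ask_model_b(question)

    # Naive agreement check; use embedding similarity in production.
    if answer_a.strip().lower() == answer_b.strip().lower():
        return {"answer": answer_a, "status": "agreed"}

    # The models disagree: don't guess. Admit uncertainty or route to review.
    return {
        "answer": "I'm not confident in this answer; it needs verification.",
        "status": "needs_review",
        "candidates": [answer_a, answer_b],
    }
```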
2. Add Human Oversight for High-Stakes Scenarios
We’ve established that AI is a powerful tool, but it’s not infallible. You want to minimize risk wherever you can, especially in legally sensitive domains. That means building checks and balances that put humans in the loop for high-risk outputs, like medical advice, legal interpretations, or financial recommendations.
Require Human Review for Critical Outputs: Build workflows where experts review critical AI outputs before they reach users.
Enable Active Learning Loops: Use human corrections to train the AI over time. Every time a human fixes an error, the AI should learn from it.
Crowdsource Verification: For community-driven products, let users flag or correct AI outputs. This creates a self-improving system where the AI learns from its mistakes.
How to start: Map out user journeys to identify high-risk touchpoints. Then, design workflows that add human oversight where it matters most.
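As a rough illustration of that workflow, the gate itself can be very small: anything tagged as a high-risk domain goes into a review queue instead of straight to the user. The `HIGH_RISK_DOMAINS` set and `REVIEW_QUEUE` list below are hypothetical stand-ins for whatever classification logic and ticketing tools your team already has.

```python
# Sketch: hold high-stakes AI outputs for human review before they reach users.

HIGH_RISK_DOMAINS = {"medical", "legal", "financial"}
REVIEW_QUEUE: list[dict] = []  # stand-in for your real review or ticketing system


def release_or_hold(output: str, domain: str) -> dict:
    """Release low-risk outputs immediately; queue high-risk ones for review."""
    if domain in HIGH_RISK_DOMAINS:
        item = {"output": output, "domain": domain, "status": "pending_review"}
        REVIEW_QUEUE.append(item)
        return item
    return {"output": output, "domain": domain, "status": "released"}


def approve(item: dict, corrected_output: str | None = None) -> dict:
    """A reviewer approves a held item, optionally correcting it first.

    Each (original, corrected) pair is exactly the signal you would feed
    back into the active-learning loop described above.
    """
    if corrected_output:
        item["output"] = corrected_output
    item["status"] = "released"
    return item
```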
3. Prioritize Transparency and Explainability in AI Outputs
Expanding on the idea of flagging low-confidence responses: users make better use of AI-generated information when they understand how the AI arrived at its conclusions.
Show Confidence Scores: Instead of presenting AI outputs as absolute truth, display a confidence level (e.g., “80% certain based on available data”).
Provide Source References: Whenever possible, cite the sources behind AI-generated content. For example, “This summary is based on [Source X] data.”
Allow User Verification: Let users edit or fact-check AI outputs. This reduces blind trust and encourages critical engagement.
Product Feature Idea: Add confidence scores and source references to your AI outputs. It’s a small change that can make a big difference.
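One lightweight way to make that stick is to bake confidence and sources into the response schema itself, so the UI can’t ship an answer without them. A hypothetical sketch:

```python
# Sketch: a response schema that carries confidence and sources with every answer.
from dataclasses import dataclass, field


@dataclass
class AIResponse:
    answer: str
    confidence: float                            # 0.0-1.0, shown as a percentage
    sources: list[str] = field(default_factory=list)
    user_editable: bool = True                   # let users correct or fact-check

    def render(self) -> str:
        cited = ", ".join(self.sources) if self.sources else "no sources cited"
        return f"{self.answer}\n({self.confidence:.0%} certain, based on: {cited})"


print(AIResponse(
    answer="Quarterly revenue grew 12% year over year.",
    confidence=0.8,
    sources=["Q3 earnings report"],
).render())
```

Even if your UI only surfaces the confidence figure as a small label next to the answer, it changes how users read the output.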
AI hallucinations are inevitable, but they don’t have to be catastrophic. By implementing real-time validation, human oversight, and transparent, explainable design, you can turn AI’s weaknesses into opportunities to build trust and deliver exceptional user experiences.
Let me know your thoughts. If you have any war stories about AI gone wrong, I’d love to hear them.