The AI hallucination problem is real. Here's how I'm thinking about it.

Let me tell you about a mistake I made.

I was preparing a customer presentation about Microsoft Entra ID, which had recently rebranded from Azure AD, and I wanted to make sure my terminology was current. I asked ChatGPT to summarise the key differences and what had changed. It gave me a confident, well-structured response. I used it as the basis for a section of the presentation.

One of my bullet points was wrong. Not completely wrong, just close enough to plausible that I didn't catch it in my review. The customer's technical contact did.

That was my fault, not ChatGPT's. The tool didn't claim to be a source of truth. I treated it like one.

What hallucination actually means

When people talk about AI "hallucination," they mean the tendency of large language models to generate confident, fluent, plausible-sounding text that is factually incorrect. It's not a bug in the traditional sense; it's a consequence of how these models work. They're predicting likely next tokens based on training data, not retrieving verified facts from a database.

The insidious thing is that hallucinated content often looks indistinguishable from accurate content. The formatting is right. The tone is authoritative. The surrounding context is correct. But one specific claim (a date, a product name, a feature behaviour, a regulatory requirement) is wrong.

Why this is a specific problem for IT professionals

In our domain, specific factual claims matter in ways they might not in others. If ChatGPT tells you that Conditional Access policies work a certain way, you configure your tenant on that basis, and it turns out to be wrong, the consequences are real. If it gives you incorrect licensing information and you quote it to a customer, the consequences are real. If it hallucinates a PowerShell cmdlet parameter, your script fails.

The hallucination problem is not evenly distributed. Model accuracy is higher for well-documented topics that appeared frequently in training data, and lower for:

Recent events or releases (the model's training data has a cutoff)
Niche or highly specific technical questions
Questions that require precise numbers, dates, or version information
Questions where the correct answer has changed over time

That profile covers a lot of what we deal with in IT.

How I'm actually managing this

I haven't stopped using these tools. The utility is too high. But I've changed how I use them:

1. I categorise tasks by verification cost

For tasks where being wrong is cheap (drafting, brainstorming, summarising publicly available high-level information), I use AI outputs more liberally. For tasks where being wrong is expensive (specific technical guidance, anything with compliance implications, customer-facing technical claims), I treat AI as a starting point and verify against authoritative sources before I rely on it.
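As a rough illustration, that triage could be sketched in Python like this. The category names and the `verification_policy` function are hypothetical, lifted from the examples above rather than from any real tool, and anything unrecognised defaults to the cautious path.

```python
# Hypothetical task categories, taken from the examples in the text.
CHEAP_TO_GET_WRONG = {"drafting", "brainstorming", "high-level summary"}
EXPENSIVE_TO_GET_WRONG = {
    "technical guidance",
    "compliance",
    "customer-facing claim",
}


def verification_policy(task_type: str) -> str:
    """Return how much checking an AI output needs before it is relied on."""
    if task_type in CHEAP_TO_GET_WRONG:
        return "light review"
    # Expensive tasks and anything unrecognised both take the cautious
    # path: treat the output as a starting point and verify it.
    return "verify against authoritative sources"


if __name__ == "__main__":
    for task in ("brainstorming", "compliance", "licensing quote"):
        print(f"{task}: {verification_policy(task)}")
```

Defaulting unknown tasks to verification is a deliberate choice here: an unnecessary check costs far less than an unchecked error in front of a customer.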

2. I ask for reasoning, not just answers

I ask "Explain how Conditional Access policies evaluate device compliance" rather than "Does this policy configuration work?" When the model explains its reasoning, I can spot where the logic breaks down. It can also prompt more careful generation: a model that has to justify its answer is often less likely to confabulate.

3. I watch for confidence without evidence

Hallucinated claims are often stated without any hedging. If an AI answer sounds very certain about something specific (a date, a version number, a precise product behaviour), I apply more scepticism, not less.
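One way to make that habit mechanical is to scan an answer for the kinds of specifics worth checking. The sketch below is a crude heuristic under my own assumptions: the patterns and the `flag_specific_claims` name are illustrative, and it cannot tell a correct claim from a wrong one. It only lists what to look up.

```python
import re

# Illustrative patterns for the kinds of specifics mentioned above.
# They are deliberately simple and will miss or over-match some cases.
PATTERNS = {
    "year": re.compile(r"\b(?:19|20)\d{2}\b"),
    "version": re.compile(r"\bv?\d+\.\d+(?:\.\d+)*\b"),
    # Verb-Noun shape used by PowerShell cmdlets, e.g. Get-Something.
    "cmdlet": re.compile(r"\b[A-Z][a-z]+-[A-Z][A-Za-z]+\b"),
}


def flag_specific_claims(text: str) -> list[tuple[str, str]]:
    """Return (label, matched text) pairs for specifics worth verifying."""
    findings = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((label, match.group(0)))
    return findings


if __name__ == "__main__":
    answer = "Use Get-MgUser from module version 2.1, released in 2023."
    for label, value in flag_specific_claims(answer):
        print(f"check {label}: {value}")
```

Every flagged item is a prompt to open the vendor documentation, not a verdict on whether the answer is right.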

4. I keep source documents in the context

Where possible, I paste in the actual documentation I want the model to work with, rather than relying on its training data. "Here's the Microsoft documentation on Intune compliance policies. Summarise what it says about X." That's a different task to "Tell me about Intune compliance policies."
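If you script your prompts, the same idea might look like the following minimal sketch. The function name, the marker strings, and the instruction wording are my own assumptions, not part of any particular chat API. The output is a plain string you would paste or send to whichever model you use.

```python
def build_grounded_prompt(source_text: str, question: str) -> str:
    """Wrap pasted documentation and a question into a single prompt.

    The instruction wording and the BEGIN/END markers are illustrative.
    """
    if not source_text.strip():
        raise ValueError("source_text is empty; paste the documentation first")
    return (
        "Answer using only the documentation between the markers below. "
        "If the documentation does not cover the question, say so "
        "instead of guessing.\n\n"
        "--- BEGIN DOCUMENTATION ---\n"
        f"{source_text.strip()}\n"
        "--- END DOCUMENTATION ---\n\n"
        f"Question: {question.strip()}"
    )


if __name__ == "__main__":
    docs = "(paste the relevant documentation text here)"
    print(build_grounded_prompt(docs, "Summarise what it says about X."))
```

Refusing an empty source is intentional: without the pasted documentation, the prompt quietly falls back to the "tell me about" task this step is meant to avoid.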

Using these tools intelligently

The hallucination problem is real, and it will probably shrink as models improve. GPT-4 already hallucinates noticeably less than GPT-3.5. But "less" is not "none," and the improvement doesn't change the underlying nature of what these models are doing.

The responsible position is to use these tools for what they're genuinely good at, maintain appropriate verification processes for consequential claims, and not pretend the limitations don't exist.

That's not a reason to avoid them. It's a reason to use them intelligently.
