Jochen Deister, the founder and VP of innovation and strategy at Privacy Solutions, is sounding the alarm on the limitations of AI models. He says an AI tool can give you an answer that's well-structured, clearly reasoned, and properly sourced, but, he warns, that doesn't mean it's accurate. The problem isn't that AI tools can't produce good answers; it's that they can't guarantee those answers are correct.
You've probably had this experience. You ask an AI a question, and it gives you a response that sounds right, but you have a nagging feeling that you should verify it. The problem is that verifying the answer would take almost as long as doing the work without AI. So you don't. You move on and hope the answer was correct, which isn't a good strategy.
Deister says the root of the problem lies in how models are trained. AI training pipelines are robust and filter aggressively for text quality, removing spam, duplicates, and poorly written content, but those filters don't assess domain-specific accuracy. Large language models (LLMs) are notorious for hallucinating, producing believable but incorrect or misleading text: they're trained on vast amounts of data, and that data isn't always accurate.
Deister believes that companies are incorporating AI into high-risk applications without verifying the data it was trained on. He is concerned about a domino effect, in which inaccurate information produces more inaccurate information. For instance, a well-crafted regulatory guidance document may score highly on every quality metric while its legal analysis is wrong, and in a high-risk application, an error like that can have serious consequences.
AI training pipelines are excellent at identifying well-written text, but they have no mechanism for verifying whether that text is true. Deister points out that AI is trained on material written by people whose understanding was shaped by their knowledge, their perspective, and the state of their field at the time they wrote. As a result, a model reflects the aggregate of what humans have written, not the current state of knowledge, and companies need to understand that limitation.
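To make the gap between "well-written" and "true" concrete, here is a minimal Python sketch of a heuristic quality filter. The scoring function, its three signals (length, lexical diversity, sentence form), and the 0.7 threshold are hypothetical illustrations, not the filters any real training pipeline uses. Because none of the signals looks at what a sentence claims, an accurate statement and a factually wrong one receive exactly the same score.

```python
import re

def quality_score(text: str) -> float:
    """Toy quality score in [0, 1]. Hypothetical; not any real pipeline's filter."""
    words = re.findall(r"[A-Za-z']+", text)  # digits are ignored entirely
    if not words:
        return 0.0
    # Signal 1: enough words to look like substantive content
    length_ok = min(len(words) / 20, 1.0)
    # Signal 2: lexical diversity; spam tends to repeat the same words
    diversity = len({w.lower() for w in words}) / len(words)
    # Signal 3: share of sentences that start with a capital and end with punctuation
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    well_formed = sum(1 for s in sentences if s[0].isupper() and s[-1] in ".!?") / len(sentences)
    return round((length_ok + diversity + well_formed) / 3, 2)

def passes_filter(text: str, threshold: float = 0.7) -> bool:
    # Nothing here checks whether the claims in `text` are true.
    return quality_score(text) >= threshold

# GDPR has applied since May 2018; the second sentence gets the year wrong.
accurate = ("The GDPR applies to the processing of personal data. It took effect across the "
            "European Union in May 2018 and replaced the earlier Data Protection Directive.")
inaccurate = ("The GDPR applies to the processing of personal data. It took effect across the "
              "European Union in May 2021 and replaced the earlier Data Protection Directive.")

print(passes_filter(accurate), passes_filter(inaccurate))  # True True
```

Real pipelines use far more sophisticated signals, such as classifiers and deduplication, but the sketch reflects the structural point Deister makes: a quality score measures form, so a wrong date or a misread legal provision passes as easily as a correct one.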
Earlier this year, Deister published an analysis of an official questionnaire issued by a German data protection authority. Of the document's 38 questions, he found that at least 19 contained significant legal errors. The authority confused EU Charter of Fundamental Rights protections with internal market freedoms from a different treaty altogether.
Deister warns that every version of this document, along with every commentary that treats it as settled guidance, is now likely to enter LLM training data as high-authority content. Each uncritical repetition reinforces the errors and makes it more likely that models will keep reproducing them.
He stresses that this is not only a problem for the legal sector: it applies to every expert domain where the training material reflects human interpretation rather than verified data. Medical guidelines, for example, get updated, but earlier versions, written by physicians whose understanding reflected the evidence available at the time, persist in training data and continue to shape model outputs.
When evaluating AI for any domain where accuracy has consequences, Deister thinks one question matters more than all others: What is the verified source of truth behind this tool? A good answer, he says, describes a curated knowledge layer that domain practitioners build and maintain, anchored to primary sources and updated whenever the underlying reality changes. That is what companies should be looking for.
The model is not the product. The knowledge layer is. And for most AI deployments in expert domains, that layer hasn't been built yet, which means companies can't simply rely on the model; they need to put the knowledge layer in place first.
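Deister describes what such a knowledge layer should do rather than how it is built. The minimal Python sketch below assumes one simple design: each verified claim records its primary source, the practitioner who reviewed it, and the date it was last checked; the lookup returns nothing rather than guessing when no verified entry exists; and a review check flags entries that may have fallen out of date. The class and function names, the one-year review window, and the sample entry's reviewer and date are all hypothetical; the sample claim paraphrases GDPR Article 33(1).

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class VerifiedEntry:
    claim: str           # the statement the tool is allowed to rely on
    primary_source: str  # citation of the primary source (statute, guideline, ...)
    reviewed_by: str     # domain practitioner who checked the claim (hypothetical)
    last_verified: date  # when the claim was last checked against the source

def lookup(layer: dict[str, VerifiedEntry], topic: str) -> Optional[VerifiedEntry]:
    # Only return what practitioners have verified; no entry means no answer.
    return layer.get(topic)

def needs_review(entry: VerifiedEntry, today: date, max_age_days: int = 365) -> bool:
    # Flag entries whose last check against the primary source is too old.
    return (today - entry.last_verified).days > max_age_days

layer = {
    "gdpr_breach_notification": VerifiedEntry(
        claim=("Controllers must notify the supervisory authority of a personal data breach "
               "without undue delay and, where feasible, within 72 hours of becoming aware of it."),
        primary_source="GDPR, Article 33(1)",
        reviewed_by="hypothetical reviewer",
        last_verified=date(2024, 1, 15),  # hypothetical review date
    ),
}

entry = lookup(layer, "gdpr_breach_notification")
print(entry.primary_source if entry else "no verified answer")
print(lookup(layer, "unknown_topic"))              # None: the tool declines to answer
print(needs_review(entry, today=date(2025, 6, 1)))  # True: last check is over a year old
```

The design choice worth noting is the refusal path: when the layer has no verified entry, the sketch returns `None` instead of falling back to whatever a model might generate, which is the behavior a "verified source of truth" implies.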
Deister's warning is timely, given the increasing use of AI across industries. He's not saying that AI is inherently bad, but that its limitations have to be understood, especially in high-risk applications.
For companies incorporating AI into their operations, that means verifying both the information AI tools provide and the accuracy of the data used to train the underlying models, rather than assuming a model's output is correct. This isn't easy: it requires a deeper understanding of how AI works and where its risks lie. But companies that use AI tools without thinking through the potential consequences are taking exactly the risk Deister describes.
Key Facts
- Jochen Deister is the founder and VP of innovation and strategy at Privacy Solutions.
- AI training pipelines filter for text quality, not domain-specific accuracy.
- LLMs are notorious for hallucinating and producing believable but incorrect or misleading text.
- According to Deister, companies are incorporating AI into high-risk applications without verifying their training data.
- The AI training pipeline has no mechanism for verifying whether well-written text is true.