Why Most People Use LLMs Badly

Dec 06, 2025
By Ryan Flanagan

TLDR: LLMs already come trained. They only work well when you give them clear instructions, relevant examples and a quick accuracy check before you put them into real work. Most people skip all three and then blame the tool. This piece explains what an LLM is, why it behaves the way it does and how to stop it creating more work than it saves.

Most of the complaints I hear about AI tools have nothing to do with the model itself. People blame the system when the real problem is how they use it. They expect it to behave like a colleague who knows their context, their preferences, their standards and the internal politics that shape every task. Then they hand it a vague instruction and wonder why the output wobbles.

Let me be blunt.

An LLM is not confused.

It is responding to unclear inputs.

If you don’t tell it what you want, it guesses. If you give it vague direction, it improvises. If you never check its work, it keeps repeating the same mistakes. And if you think switching to a different model will magically fix this, you’re just moving the problem somewhere else.

The core issue is simple: most people have never learned how these systems actually behave.

An LLM learns from seeing patterns in millions of examples.

That’s all it does.

It doesn’t “understand” your sector or your internal processes or your quality bar unless you tell it.

When someone says, “It keeps making things up,” my first question is, “What instructions did you give it?”

The follow-up is usually, “Show me the examples you used.” Nine times out of ten, there were none.

This is why quality swings so widely. The model can produce impressive work one minute, then collapse on a basic request the next. It isn’t being inconsistent. You are. The tool is doing whatever the surrounding structure allows it to do. If the structure is loose, the output is loose. If the structure is clear, the output stabilises quickly.

The part no one likes hearing is that this has nothing to do with technical skill. You don’t need a data scientist.

You don’t need to “train your own model”. What you need is a basic level of process discipline. Define the task properly. Give it a couple of examples. Check whether the answers hold up. That’s it. It’s the same quality control you’d apply to any new hire.
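To make that concrete, here is a minimal sketch of what those three steps look like when someone actually writes them down. The task, the labels and the `call_llm` placeholder are illustrative only, not a prescription; swap in your own workflow and your own model client.

```python
# A minimal sketch: a defined task, a couple of examples, and a quick check.
# call_llm() is a placeholder; wire it to whatever model client you already use.

def call_llm(prompt: str) -> str:
    """Stand-in for your actual model API call."""
    raise NotImplementedError("Connect this to your own model client.")

TASK = (
    "Classify the customer email below as one of: complaint, query, compliment.\n"
    "Respond with the label only, in lowercase."
)

# A couple of worked examples so the model sees the expected shape of the answer.
EXAMPLES = [
    ("My order arrived two weeks late and nobody replied to my emails.", "complaint"),
    ("Could you tell me whether the Pro plan includes invoicing?", "query"),
]

VALID_LABELS = {"complaint", "query", "compliment"}

def build_prompt(email: str) -> str:
    """Combine the task definition, the examples and the new input into one prompt."""
    shots = "\n\n".join(f"Email: {text}\nLabel: {label}" for text, label in EXAMPLES)
    return f"{TASK}\n\n{shots}\n\nEmail: {email}\nLabel:"

def checked_label(email: str) -> str:
    """Run the model, then verify the answer before anything downstream trusts it."""
    answer = call_llm(build_prompt(email)).strip().lower()
    if answer not in VALID_LABELS:
        raise ValueError(f"Model returned an unexpected label: {answer!r}")
    return answer
```

That is the whole trick: a written task, two examples and one guard clause. The same checklist you would hand a new hire on day one.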

Where things get messy is inside organisations:

  • Everyone assumes someone else has done the checking.
  • People pass around prompts they wrote on the fly.
  • Someone updates a template without telling anyone.
  • A workflow gets deployed because “the demo looked good”.
  • No one tracked accuracy.
  • No one checked edge cases.

Then a senior stakeholder receives nonsense output and the whole initiative stalls.

I’ve watched teams swear off AI because of problems that were entirely preventable. A little structure would have avoided the rework, the user frustration and the political blowback. But people see AI as a finished product instead of a tool that needs boundaries. Treat it casually and it behaves casually.

If you want value from these systems, the starting point is operational:

Write a clear instruction. Spell out the expectations. Provide the examples. Evaluate the output before it hits a live workflow. Professionals already know how to do this; they just don’t apply the habit to AI tools because the interface looks simple.

Simple interface does not mean simple behaviour.

The good news is that once you tighten the process, everything improves fast. Output becomes more predictable. Review time drops. People stop arguing about which model is “better” and start focusing on whether their own instructions are any good. Trust increases because the results are consistent. And suddenly the tool moves from being a novelty to something that supports real work.

If you ignore this step, the consequences aren’t abstract. They show up as errors, reputational risk, duplicated effort and staff who quietly stop using the tool because they don’t have time to fix its mistakes. Once that happens, it’s very difficult to recover momentum.

FAQs

Q: How do I tell whether a poor output is my fault or the model’s fault?
A: Check your inputs first. If the task, context and examples were weak, the model followed them. If the inputs were clean and the model still wanders, then you have a genuine model limitation. Ninety percent of issues fall into the first category.

Q: What is the minimum evidence I need before I let staff rely on an AI workflow?
A: A documented prompt template, three tested examples, and a short evaluation log showing the workflow performs consistently across realistic edge cases. Anything less is gambling.
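For a concrete picture, here is a minimal sketch of that evidence, assuming a simple classification workflow like the one sketched earlier. The `my_workflow` module, the test cases and the file name are placeholders for your own.

```python
# Sketch of a short evaluation log: run the documented prompt template
# against known cases and record whether each one held up.
import csv
from datetime import datetime

from my_workflow import checked_label  # hypothetical module holding the earlier sketch

# Your three tested examples, plus the edge cases the workflow realistically sees.
TEST_CASES = [
    {"email": "My order arrived broken and support ignored me.", "expected": "complaint"},
    {"email": "Does the Pro plan include invoicing?", "expected": "query"},
    {"email": "Refund sorted in an hour, thank you!", "expected": "compliment"},
]

def run_evaluation(log_path: str = "eval_log.csv") -> float:
    """Append one row per test case to a CSV log and return the pass rate."""
    passed = 0
    with open(log_path, "a", newline="") as f:
        writer = csv.writer(f)
        for case in TEST_CASES:
            try:
                actual = checked_label(case["email"])
            except ValueError as exc:
                actual = f"error: {exc}"
            ok = actual == case["expected"]
            passed += ok
            writer.writerow(
                [datetime.now().isoformat(), case["email"], case["expected"], actual, ok]
            )
    return passed / len(TEST_CASES)
```

Nothing sophisticated, but it is the difference between “the demo looked good” and a number you can show a stakeholder.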

Q: How do I stop different staff from “freestyling” and producing chaotic, unpredictable AI output?
A: Standardise the prompt templates and store them centrally. Version them. Lock them. If everyone rewrites their own instructions, you have no quality control.

Q: What if the person using the AI tool isn’t skilled enough to evaluate the output properly?
A: Pair them with someone who can. LLMs amplify poor judgment as much as good judgment. Evaluation is a capability, not a button press.

Q: How do I measure improvement so I’m not relying on vibes?
A: Track reduction in rework, time saved per task, consistency across repeated outputs and error rates identified through sampling. These tell you whether the workflow is improving or drifting.
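As one hedged illustration of the consistency metric, reusing the hypothetical helper from the earlier sketches: run the same input several times and measure how often the workflow agrees with itself.

```python
# Sketch: put a number on "consistency across repeated outputs" by running
# the same input several times and measuring agreement.
from collections import Counter

from my_workflow import checked_label  # hypothetical helper from the earlier sketches

def consistency_score(email: str, runs: int = 5) -> float:
    """Fraction of runs that agree with the most frequent answer (1.0 = fully stable)."""
    answers = []
    for _ in range(runs):
        try:
            answers.append(checked_label(email))
        except ValueError:
            answers.append("invalid")
    top_count = Counter(answers).most_common(1)[0][1]
    return top_count / runs
```

Error rates work the same way: sample a handful of live outputs each week, have a reviewer mark them right or wrong, and watch the ratio over time.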

Q: If we don’t own any proprietary data, can we still get value from LLMs?
A: Yes. Value comes from structure, not ownership. Proprietary data improves relevance, but good prompts, examples and evaluation improve reliability regardless of data maturity.

Q: What is the fastest way to expose weak prompts in my organisation?
A: Stress-test them with high-ambiguity scenarios. Poor prompts collapse instantly when you remove ideal conditions.

Q: Who in the organisation should be accountable for prompt quality?
A: The business owner of the workflow, not IT. IT manages access. The workflow owner manages accuracy, consistency and consequences.

Q: Should we ban staff from using LLMs until we “figure this out”?
A: No. Bans push AI use underground, which is worse. Govern it openly with templates, guidance and evaluation, or you will lose visibility and control.

Q: How do I know when a workflow is mature enough to automate further?
A: When the outputs are stable across varied cases, the reviewers stop finding meaningful errors and the downstream teams stop escalating issues. Until then, keep a human in the loop.

If you want predictable, defensible AI outputs and a process your team can trust, the AI Strategy Blueprint gives you the structure to get there without the usual missteps.