Q2 2025 Gaming Industry Report Released,
View Here

Konvoy’s Weekly Newsletter:

Your go-to for the latest industry insights and trends. Learn more here.

Newsletter

|

Jun 19, 2026

How Sure is Your AI?

How do you know what is right and what might be right?

Copy Link

No items found.

Copy Link

Looks Right versus Is Right

When you ask an LLM (large language model) the same question twice, you might know you will probably get two different answers. What is important is that this is not a bug, it is actually fundamental to how the models work.

A major reason for this is that these models need to be fast, fluent, and sound right. At its core, they are designed to give the end user the answer they are most confident in. Each time an LLM generates a response, it is sampling from a range of possibilities and is based on the temperature (turn it down and the model plays it safe; turn it up and it gets a little more creative) of the model. The dial is why the same question can produce different replies. These models are probabilistic and stochastic.

Deterministic models, on the other hand, will always produce the same output with a given set of inputs. Most LLMs can approach being deterministic by turning the temperature to 0 (though true non-deterministic LLMs can be difficult to achieve in the real world). But even if you have deterministic models, it does not mean the responses are always correct. For example, if you have a buggy calculator that computes 2+2=5 100% of the time, it is by definition deterministic, but is certainly wrong.

To get truly correct answers, you need models with proofs and formal verification. While we would all want provably correct responses to all our queries; in practice, much of what we discuss with LLMs has no verifiable truth. The experience of interacting with deterministic models would be extremely frustrating for your average user who is using it to ask opinions or just have a conversation and would constantly run into the model politely, but infuriatingly, providing a non-response of “sorry, I cannot provide a provable answer to that.”.  Models with formal verification are not built for the way humans are accustomed to communicating. They are slow and narrow, but when you get the answer, it is guaranteed that it was verified against formal proofs.

Stochastic Outputs

LLMs are the most common and widely recognized form of AI today. You interact with them directly through ChatGPT, Claude, and Gemini. Every answer you are given is the LLM predicting the most likely response based on the patterns built into its training data.

What is interesting is that each LLM's training data has verifiable information. Information such as proven mathematical theorems, peer-reviewed scientific research, documented historical facts, and correct code. A huge share of what a model trains on is actually true and checkable.

This is different from the output the model produces.  LLMs do not store facts the way a database does,  instead they reconstruct answers by predicting likely word sequences. There is always a risk that it blends two facts into a false combination or recalls a detail incorrectly. Because these models always answer with their most confident guess, they can state even wrong answers with total confidence. These confidence mistakes are what people mean when they say models “hallucinate.”

To take this a layer deeper, when an LLM picks the next word, it is not picking a single word; it is producing a probability distribution over all possible next words. Then it samples from that distribution (the sampling severity depends on the temperature) instead of just choosing the top choice. A lower temperature would mean it leans more heavily towards the top choice, while a higher temperature gives you more varied responses. With any temperature above 0, nondeterminism is designed into the model.

LLMs, on their own, are great for writing, brainstorming, summarizing, and conversation. Essentially anything where a strong, well-formed answer matters more than verifiable correctness. Writing software with LLMs today is a great example of nondeterministic models working with a verification system, where the LLM produces unverified code and the code is verified against a test suite or compiler in an agent loop.

Models with Verification

Verifiable AI systems are generally hybrid systems in which an LLM proposes an answer, and that answer is passed through a deterministic engine (for example, Lean 4) that checks the LLM's output before it ever reaches the end user. These are systems built around a verification layer.

The way this works, structurally, is best understood as a pipeline. A natural language problem piped in, the problem is translated into a formal language, the neural model searches for  proofs, each proof is checked by the proof assistant (the deterministic engine), and when a complete, verified proof exists, the answer is delivered to the user. Without a proof, no answer is provided. It will respond that the problem was unsolved, instead of ever giving you a “confident”, but potentially wrong, answer.

The engines themselves are actually not AI. They’re rule-bound and deterministic proof engines. Examples include Lean 4 (most widely used in the space right now), Coq, Isabelle, Metamath, and even products many of us would recognize, like Wolfram.

The companies truly exploring this space include Google DeepMind (AlphaProof and AlphaGeometry 2) and Harmonic (Aristotle). Other, less notable but still worth mentioning, include ByteDance (Seed-Prover), DeepSeek (DeepSeek-Prover), Princeton (Goedel-Prover), and Moonshot AI (Kimina-Prover).

At the core of all of these is the belief that when an answer absolutely has to be right, you make the AI show its work (“prove it”).

Takeaway: As we all adopted AI solutions in the earliest stages of innovation, being mostly correct was something we could accept. As these models advance and move into fields where the stakes are real, “probably right” is not good enough. We will see more robust verification layers leveraged alongside LLM “guesses” to provide reliable output answers. An LLM will tell you what is likely true; a verified system tells you what is certainly true. These models advancing will be core to how AI is leveraged to innovate in industries like cyber, biotechnology, and hardware manufacturing.

From the newsletters

View more
Left Arrow
Right Arrow

Interested in our Newsletters?

Click

here

to see them all