Konvoy is a thesis-driven investment firm.


Newsletter | Jun 19, 2026

How Sure is Your AI?

How do you know what is right and what might be right?


Looks Right versus Is Right

Ask an LLM (large language model) the same question twice and you will probably get two different answers. This is not a bug; it is fundamental to how the models work.

A major reason is that these models are built to be fast and fluent and to sound right. At their core, they are designed to give the end user the answer the model is most confident in. Each time an LLM generates a response, it samples from a range of possibilities, and how widely it samples is governed by a setting called temperature: turn it down and the model plays it safe; turn it up and it gets a little more creative. That dial is why the same question can produce different replies. These models are probabilistic, or stochastic.

Deterministic models, on the other hand, always produce the same output for a given set of inputs. Most LLMs can approach determinism by turning the temperature down to 0 (though truly deterministic LLMs can be difficult to achieve in the real world). But even a deterministic model is not always correct. For example, a buggy calculator that computes 2+2=5 100% of the time is by definition deterministic, but it is certainly wrong.
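The calculator example can be made concrete with a few lines of Python. This is only an illustrative sketch: the function name `buggy_add` and its off-by-one bug are invented here to show that determinism and correctness are separate properties.

```python
def buggy_add(a: int, b: int) -> int:
    """A hypothetical calculator with a fixed bug: every sum is off by one."""
    return a + b + 1


# Deterministic: 100 calls with the same inputs give exactly one distinct output.
outputs = {buggy_add(2, 2) for _ in range(100)}
is_deterministic = len(outputs) == 1

# Correct: the output matches the true sum.
is_correct = buggy_add(2, 2) == 4

print(is_deterministic, is_correct)  # True False
```

Repeating the call never changes the answer, so repetition alone cannot expose the bug; only a check against the true sum does.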

To get provably correct answers, you need models paired with proofs and formal verification. We would all like provably correct responses to every query, but in practice much of what we discuss with LLMs has no verifiable truth. Interacting with a verification-only model would be extremely frustrating for the average user, who wants to ask for opinions or just have a conversation and would constantly run into the model politely, but infuriatingly, giving a non-response: “Sorry, I cannot provide a provable answer to that.” Models with formal verification are not built for the way humans are accustomed to communicating. They are slow and narrow, but when you do get an answer, it is guaranteed to have been checked against a formal proof.

Stochastic Outputs

LLMs are the most common and widely recognized form of AI today. You interact with them directly through ChatGPT, Claude, and Gemini. Every answer you are given is the LLM predicting the most likely response based on the patterns built into its training data.

What is interesting is that each LLM's training data contains verifiable information, such as proven mathematical theorems, peer-reviewed scientific research, documented historical facts, and correct code. A huge share of what a model trains on is actually true and checkable.

The output the model produces is a different matter. LLMs do not store facts the way a database does; instead, they reconstruct answers by predicting likely word sequences. There is always a risk that a model blends two facts into a false combination or recalls a detail incorrectly. Because these models always answer with their most confident guess, they can state even wrong answers with total confidence. These confident mistakes are what people mean when they say models “hallucinate.”
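A deliberately tiny stand-in shows how true training data can still yield a false output. The sketch below is not how a real LLM works internally; it assumes a toy "model" that only remembers which word has followed which (a bigram table), trained on two true sentences chosen for illustration.

```python
from collections import defaultdict

# Two true sentences serve as the entire (toy) training set.
corpus = [
    "Paris is the capital of France",
    "Berlin is the capital of Germany",
]

# Record which words have been seen following each word.
follows = defaultdict(set)
for sentence in corpus:
    words = sentence.split()
    for current, nxt in zip(words, words[1:]):
        follows[current].add(nxt)


def completions(prefix: list[str]) -> list[str]:
    """Enumerate every sentence the bigram table can produce from a prefix."""
    options = follows.get(prefix[-1])
    if not options:  # no known continuation, so the sentence ends here
        return [" ".join(prefix)]
    results = []
    for word in sorted(options):
        results.extend(completions(prefix + [word]))
    return results


generated = completions(["Paris"])
print(generated)
# ['Paris is the capital of France', 'Paris is the capital of Germany']
```

Every individual word transition was learned from a true sentence, yet one of the two possible outputs is false. The table stores patterns, not facts.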

To take this a layer deeper, when an LLM picks the next word (strictly, the next token), it does not start from a single choice; it produces a probability distribution over all possible next words. It then samples from that distribution, with the temperature controlling how spread out the sampling is, instead of just taking the top choice. A lower temperature leans more heavily toward the top choice, while a higher temperature gives more varied responses. With any temperature above 0, nondeterminism is designed into the model.
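The sampling step can be sketched as follows. This is a minimal illustration, assuming the common formulation in which raw scores (logits) are divided by the temperature before a softmax; the three candidate words and their scores are made up, and the function names are hypothetical.

```python
import math
import random


def softmax_with_temperature(logits: dict[str, float], temperature: float) -> dict[str, float]:
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def sample_next_token(logits: dict[str, float], temperature: float, rng: random.Random) -> str:
    """Take the top token at temperature 0, otherwise sample from the distribution."""
    if temperature == 0:
        return max(logits, key=logits.get)  # greedy decoding, no randomness
    probs = softmax_with_temperature(logits, temperature)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


# Hypothetical scores for the word after "The sky is".
logits = {"blue": 3.0, "clear": 2.0, "falling": 0.5}

cold = softmax_with_temperature(logits, 0.2)  # top choice dominates
hot = softmax_with_temperature(logits, 2.0)   # probability is spread out
print(round(cold["blue"], 2), round(hot["blue"], 2))

rng = random.Random(0)
print([sample_next_token(logits, 1.0, rng) for _ in range(5)])  # may vary by word
print([sample_next_token(logits, 0, rng) for _ in range(5)])    # always "blue"
```

With these made-up scores, "blue" holds roughly 99% of the probability at temperature 0.2 and only a little over half at temperature 2.0, which is the dial the text describes.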

LLMs, on their own, are great for writing, brainstorming, summarizing, and conversation: essentially anything where a strong, well-formed answer matters more than verifiable correctness. Writing software with LLMs today is a great example of a nondeterministic model working with a verification system: the LLM produces unverified code, and that code is checked against a test suite or compiler in an agent loop.
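A toy version of that loop might look like the sketch below. The LLM is replaced by a fixed list of candidate programs (an assumption made so the example runs without any model), and the names `CANDIDATES`, `TESTS`, `passes_tests`, and `agent_loop` are hypothetical.

```python
from typing import Callable, Optional

# Stand-in for an LLM: candidate implementations of absolute value,
# in the order a model might propose them (hypothetical).
CANDIDATES = [
    "def solve(x):\n    return x",
    "def solve(x):\n    return -x",
    "def solve(x):\n    return x if x >= 0 else -x",
]

# The verifier: a small test suite of (input, expected output) pairs.
TESTS = [(3, 3), (-4, 4), (0, 0)]


def passes_tests(source: str) -> bool:
    """Run one candidate against every test; any failure or error rejects it."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # acceptable for a toy; never exec untrusted code
        solve: Callable[[int], int] = namespace["solve"]
        return all(solve(arg) == expected for arg, expected in TESTS)
    except Exception:
        return False


def agent_loop(candidates: list[str]) -> Optional[str]:
    """Return the first candidate the test suite accepts, or None."""
    for attempt, source in enumerate(candidates, start=1):
        if passes_tests(source):
            print(f"accepted on attempt {attempt}")
            return source
    return None


accepted = agent_loop(CANDIDATES)  # prints "accepted on attempt 3"
```

Note that a test suite only vouches for the cases it checks, which makes it a weaker guarantee than a formal proof.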

Models with Verification

Verifiable AI systems are generally hybrid systems in which an LLM proposes an answer, and that answer is passed through a deterministic engine (for example, Lean 4) that checks the LLM's output before it ever reaches the end user. These are systems built around a verification layer.

Structurally, this is best understood as a pipeline. A natural language problem is piped in, the problem is translated into a formal language, the neural model searches for proofs, each candidate proof is checked by the proof assistant (the deterministic engine), and once a complete, verified proof exists, the answer is delivered to the user. Without a proof, no answer is provided: the system reports that the problem is unsolved instead of giving you a “confident” but potentially wrong answer.
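The shape of that pipeline, including the refusal to answer, can be sketched with a toy problem. Everything here is an assumption for illustration: the "formal language" is just an integer target, the "neural search" is a hard-coded list of guesses, and the checker is simple arithmetic rather than a proof assistant.

```python
import re
from typing import Iterable, Optional


def formalize(problem: str) -> int:
    """Translate the text into a formal target: find x with x * x == n."""
    match = re.search(r"\d+", problem)
    if match is None:
        raise ValueError("no number found in problem")
    return int(match.group())


def propose(n: int) -> Iterable[int]:
    """Stand-in for the neural search: plausible but unverified guesses."""
    estimate = round(n ** 0.5)
    return [estimate + 1, estimate - 1, estimate]


def check(n: int, x: int) -> bool:
    """Deterministic checker: accept a candidate only if it satisfies the spec."""
    return x >= 0 and x * x == n


def solve(problem: str) -> Optional[int]:
    """Deliver a verified answer, or None to signal 'unsolved'."""
    n = formalize(problem)
    for candidate in propose(n):
        if check(n, candidate):
            return candidate
    return None


print(solve("Which whole number squared gives 144?"))  # 12
print(solve("Which whole number squared gives 150?"))  # None
```

The second call returns nothing rather than the nearest plausible guess, which is the behavior that separates a verified pipeline from a bare LLM.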

The engines themselves are not AI. They are rule-bound, deterministic proof engines. Examples include Lean 4 (the most widely used in the space right now), Coq (recently renamed Rocq), Isabelle, Metamath, and even products many of us would recognize, like Wolfram.
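To give a feel for what such an engine does, here is a minimal Lean 4 sketch built on the 2+2 arithmetic used earlier in this piece. The theorem names are arbitrary; the point is only that the checker accepts a statement when a valid proof is supplied and rejects it otherwise.

```lean
-- Accepted: the checker evaluates both sides and finds them identical.
theorem two_add_two : 2 + 2 = 4 := rfl

-- Also accepted: a proof that the buggy calculator's answer is wrong.
theorem two_add_two_ne_five : 2 + 2 ≠ 5 := by decide

-- Rejected if uncommented: no proof of a false statement type-checks.
-- theorem buggy : 2 + 2 = 5 := rfl
```

No amount of confident phrasing gets the last line past the checker, which is why these engines are paired with LLMs rather than replaced by them.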

The companies most visibly exploring this space include Google DeepMind (AlphaProof and AlphaGeometry 2) and Harmonic (Aristotle). Others, less prominent but still worth mentioning, include ByteDance (Seed-Prover), DeepSeek (DeepSeek-Prover), Princeton (Goedel-Prover), and Moonshot AI (Kimina-Prover).

At the core of all of these is the belief that when an answer absolutely has to be right, you make the AI show its work (“prove it”).

Takeaway: When we all adopted AI solutions in the earliest stages of innovation, being mostly correct was something we could accept. As these models advance and move into fields where the stakes are real, “probably right” is not good enough. We will see more robust verification layers used alongside LLM “guesses” to provide reliable answers. An LLM will tell you what is likely true; a verified system tells you what is certainly true. Advances in these verified systems will be core to how AI is used to innovate in industries like cybersecurity, biotechnology, and hardware manufacturing.

