there is no truth machine


a common complaint people make about current large language models is that they lie. i get the complaint, of course; having Google confidently make up a lie in response to your search query is nothing short of ridiculous. you might, in response, wish for it to simply stop lying. can't we just make it tell the truth instead? maybe if we throw more and more data at it, eventually it will run out of ways to be wrong about things?

well, no, not really. in fact, the more you think about it, the more you feel like an idiot for even asking that kind of question in the first place.


so, for the sake of argument, let's imagine that we're trying to create some sort of machine learning system that tells the truth. in order to train it, we need to score how truthful each output it gives actually is. and in order to do that, we need a scoring function which, given a piece of text, can tell whether that text is the truth or a lie.
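
just to make the shape of the problem concrete, here's a rough sketch in Python of what that setup would have to look like. none of these names (truth_oracle, generate, update) correspond to anything real; the point is just where the impossible part sits:

```python
# hypothetical sketch only: none of this exists.
# the whole scheme hinges on truth_oracle(), the scoring function.

def truth_oracle(text: str) -> float:
    """return 1.0 if `text` is completely true, 0.0 if it's a lie.
    this is the part nobody knows how to write."""
    raise NotImplementedError("an omniscient truth machine")

def train_truth_machine(model, prompts):
    for prompt in prompts:
        output = model.generate(prompt)   # made-up method: produce some text
        score = truth_oracle(output)      # requires already knowing the truth
        model.update(reward=score)        # made-up method: reinforce truthful outputs
    return model
```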

okay, so how do we do that? well, first you need to be able to answer every possible question that could ever be asked with 100% accuracy. now, if you know anything about knowing things, you'd know that's pretty obviously impossible. there are countless questions we don't know the answers to, and countless more that we don't even know to ask yet. worse yet, if your scoring function makes mistakes, the hypothetical AI will probably just learn to tell it what it wants to hear instead of the truth.

even if we assume that we've somehow managed to create an omniscient truth machine, we'd just end up with another problem: sometimes telling people the truth is kinda the wrong thing to do. as an example, if a wannabe criminal asked it for detailed instructions on how to commit their crime, it would do exactly as asked, leaving us (as the operator) on the hook for aiding and abetting that crime.


now, over here in the Real World, large language models don't "think" about the "truth"; they simply predict a sensible continuation of an input text. if that input text is the start of a conversation asking a question, the model may output an answer which looks reasonably like the truth. combine this with a slick UI that reminds the user of talking to a person, and your mind will work hard to fill in the gaps.
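
if that sounds hand-wavy, here's a toy illustration in Python. it's nothing like a real transformer, but the objective has the same shape: look at what tends to follow what, and keep appending the likeliest continuation. notice that nothing in it ever asks whether the output is true:

```python
from collections import Counter, defaultdict

# toy "language model": count which word follows which in the training text,
# then repeatedly append the most common continuation. real models are far
# more sophisticated, but they optimize for plausible continuation, not truth.

training_text = "the cat sat on the mat because the cat was sleepy"

follow_counts = defaultdict(Counter)
words = training_text.split()
for current, nxt in zip(words, words[1:]):
    follow_counts[current][nxt] += 1

def continue_text(prompt: str, n_words: int = 6) -> str:
    out = prompt.split()
    for _ in range(n_words):
        candidates = follow_counts.get(out[-1])
        if not candidates:
            break
        out.append(candidates.most_common(1)[0][0])  # likeliest next word, true or not
    return " ".join(out)

print(continue_text("the"))  # -> "the cat sat on the cat sat": fluent-ish, meaningless
```

scale that idea way, way up and the output gets a lot more convincing, but the objective never changed.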

it's no wonder then, when you think about how an LLM works and what it was actually programmed to do, that they end up spitting out truthy lies so often. if anything, it's a miracle of statistics that sometimes they don't. from the perspective of modeling natural language, telling a lie is just as valid a behavior as telling the truth. if the training data is full of people lying to each other, why wouldn't the model start lying?

Google claimed that you can add glue to your pizza sauce to keep the cheese from sliding off, parroting something originally said a decade prior by a Reddit user going by the name of "fucksmith".
fucksmith said it, so why shouldn't we?

call me a curmudgeon, but i just don't think generating text with a language model will ever be more than a parlor trick. just because you can make it do that doesn't mean that you should. statistical models should be used to analyze data, not synthesize it! why should anyone want a machine that just turns text into more text? haven't we all written enough bullshit already anyways?