Fact-checking AI

In January, DeepSeek seemingly changed everything in AI: mind-blowing speed at dramatically lower costs. As Lucas Mearian writes, DeepSeek sent "shock waves" through the AI community, but its impact likely won't last. Soon there will be something faster and cheaper. But will there be something that provides what we most need, namely more accuracy and truth? We can't solve that problem by making AI more open. It's deeper than that.

"Every week there's a better AI model that gives better answers," Benedict Evans notes. "But a lot of questions don't have better answers, only right answers, and these models can't do that."

This isn't to say performance and cost improvements aren't needed. DeepSeek, for example, makes genAI models more affordable for enterprises that want to build them into applications. And, as investor Martin Casado and former Microsoft executive Steven Sinofsky suggest, the application layer, not infrastructure, is the most interesting and important area for genAI development.

The problem, however, is that many applications depend on right-or-wrong answers, not "probabilistic … outputs based on patterns they have observed in the training data," as I've covered before. As Evans expresses it, "There are some tasks where a better model produces better, more accurate results, but other tasks where there's no such thing as a better result and no such thing as more accurate, only right or wrong." Absent the ability to speak truth rather than probabilities, the models may be worse than useless for many tasks.

The problem is that these models can be exceptionally confident and wrong at the same time. It's worth quoting an Evans example at length. In trying to find the number of elevator operators in the United States in 1980 (a number clearly identified in a U.S. Census report), he gets a range of answers:

"First, I try [the question] cold, and I get an answer that's specific, unsourced, and wrong. Then I try helping it with the primary source, and I get a different wrong answer with a list of sources, that are indeed the U.S. Census, and the first link goes to the correct PDF … but the number is still wrong. Hmm. Let's try giving it the actual PDF? Nope. Explaining exactly where in the PDF to look? Nope. Asking it to browse the web? Nope, nope, nope…. I don't need an answer that's perhaps more likely to be right, especially if I can't tell. I need an answer that is right."

Just wrong enough

But what about questions that don't require a single right answer? For the particular purpose for which Evans was trying to use genAI, the system will always be just wrong enough to never give the right answer. Maybe, just maybe, better models will fix this over time and become consistently correct in their output. Maybe.

The more interesting question Evans poses is whether there are "places where [generative AI's] error rate is a feature, not a bug." It's hard to think of how being wrong could be an asset, but as an industry (and as humans) we tend to be really bad at predicting the future. Today we're trying to retrofit genAI's non-deterministic approach onto deterministic systems, and we're getting hallucinating machines in response.
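To make that deterministic-versus-probabilistic contrast concrete, here is a minimal Python sketch. It compares a deterministic lookup (reading a figure from a table, the way Evans tried to read the Census PDF) with sampling an answer from a probability distribution over candidate numbers, which is roughly how a generative model produces output. Every value and weight below is invented for illustration; none is the real Census figure.

    import random

    # Deterministic lookup: a figure read from a table.
    # (The value is a hypothetical placeholder, not the real Census number.)
    CENSUS_TABLE = {("elevator operators", 1980): 21_000}

    def lookup(occupation: str, year: int) -> int:
        """Deterministic: the same query returns the same answer every time."""
        return CENSUS_TABLE[(occupation, year)]

    # Probabilistic "model": samples from a distribution over
    # plausible-looking answers. Candidates and weights are invented.
    CANDIDATES = [21_000, 18_500, 24_300, 30_000]
    WEIGHTS = [0.4, 0.3, 0.2, 0.1]

    def model_answer() -> int:
        """Non-deterministic: a confident-sounding number, drawn by chance."""
        return random.choices(CANDIDATES, weights=WEIGHTS, k=1)[0]

    if __name__ == "__main__":
        print("lookup:", [lookup("elevator operators", 1980) for _ in range(3)])
        print("model: ", [model_answer() for _ in range(3)])

Run it a few times: the lookup returns the same value on every call, while the sampler returns a different but equally confident-looking number from run to run, and nothing in the output tells you which runs were wrong. That is the gap between an answer that is "perhaps more likely to be right" and one that is right.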