Discussion about this post

Paul Brinkley:

I often express the same concern about LLMs that you mention here, as a take-home lesson: when you ask an LLM a question, ask yourself whether the answer might already have been in its training data.

This matters for whether you should be amazed at the LLM's ability to "think" when it was really just reading off its storage device, but also for whether you can trust its answer to a more mundane question. I often use LLMs to find the answer to some obscure programming problem, such as an openssh error message I'm not used to seeing, or a typical way to declare an advanced data structure in an unfamiliar programming language. I figure it's a pretty safe bet that o4 had this or that manual shoved into it. (It also helps that I can check the answer on my own computer.)

Another way LLMs (or other AIs) could theoretically be improved, and get closer to AGI, is prediction power. I bat around a benchmark in which an AI is handed a complex task to run many times, iteratively, and asked to give me the result. I may tell it explicitly (supervised) or leave it to work out on its own (unsupervised) that it doesn't have to actually perform every run; I just want to know the result. Does it try to predict the result without performing the task? Does it perform the task a few times to test a prediction? In general, does it try to save time and resources by shortcutting the task, and can it still deliver the correct result? For multiple types of tasks? And can it tell a predictable task from an unpredictable one?

This doesn't seem _that_ farfetched, and could produce enormous value. Whether that constitutes AGI is a philosophical question, but if it saves people a lot of time, it might not matter.
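Here's a minimal sketch of the kind of harness I have in mind, in Python. Everything in it is a hypothetical stand-in, not any real API: the `Agent` class, `run_trial`, and the toy add-3 task are just illustrations, and a real version would swap `Agent.solve` for calls to the model under test and score it on how few steps it executes while staying correct.

```python
# Sketch of a "prediction power" benchmark: can an agent shortcut a
# predictable iterative task instead of grinding through every step?
# All names here are hypothetical illustrations, not a real library.

from dataclasses import dataclass
from typing import Callable


def ground_truth(step: Callable[[int], int], start: int, n: int) -> int:
    """Actually run the task: apply `step` to `start`, n times."""
    value = start
    for _ in range(n):
        value = step(value)
    return value


@dataclass
class TrialResult:
    answer: int          # what the agent reported
    steps_executed: int  # how many iterations it actually ran
    correct: bool


class Agent:
    """Stand-in for an LLM-backed agent; replace solve() with real model calls."""

    def solve(self, step: Callable[[int], int], start: int, n: int) -> tuple[int, int]:
        # Naive baseline: execute everything. A "predictive" agent would
        # sample a few iterations, spot the pattern, and return early.
        value, executed = start, 0
        for _ in range(n):
            value = step(value)
            executed += 1
        return value, executed


def run_trial(agent: Agent, step: Callable[[int], int], start: int, n: int) -> TrialResult:
    answer, executed = agent.solve(step, start, n)
    truth = ground_truth(step, start, n)
    return TrialResult(answer, executed, answer == truth)


if __name__ == "__main__":
    # Predictable task: add 3 each step, so the result is start + 3 * n.
    # An unpredictable task (say, a chaotic map) should force real execution.
    result = run_trial(Agent(), lambda x: x + 3, start=7, n=10_000)
    print(f"correct={result.correct}, steps_executed={result.steps_executed}")
```

The score would trade correctness against `steps_executed`: on predictable tasks a good agent stays correct while executing far fewer than n steps, and on unpredictable ones it recognizes that it has no choice but to run the whole thing.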
