Thinking About AI


I am writing this post to organize and share my thoughts about the extraordinary progress in artificial intelligence over the last few years and especially the last few months (link to a lot of my prior writing). First, I want to come right out and say that anyone still dismissing what we are now seeing as a “parlor trick” or a “statistical parrot” is engaging in the most epic goal post moving ever. We are not talking about a few extra yards here. The goal posts are not in the stadium anymore, they are in a faraway city.

Growing up I was extremely fortunate that my parents supported my interest in computers by buying an Apple II for me and that a local computer science student took me under his wing. Through him I found two early AI books: one in German by Stoyan and Goerz (I don’t recall the title) and Winston and Horn’s “Artificial Intelligence.” I still have both of these, although locating them among the thousand or more books in our home will require a lot of time or, hopefully soon, a highly intelligent robot (ideally running the VIAM operating system – shameless plug for a USV portfolio company). I am bringing this up here as a way of saying that I have spent a lot of time not just thinking about AI but also coding on early versions and have been following closely ever since.

I also pretty early on developed a conviction that computers would be better than humans at a great many things. For example, I told my Dad right after I first learned about programming around age 13 that I didn’t really want to spend a lot of time learning how to play chess because computers would certainly beat us at this hands down. This was long before a chess program was actually good enough to beat the best human players. As an aside, I have changed my mind on this as follows: Chess is an incredible board game and if you want to learn it to play other humans (or machines) by all means do so as it can be a lot of fun (although I still suck at it). Much of my writing both here on Continuations and in my book is also based on the insight that much of what humans do is a type of computation and hence computers will eventually do it better than humans. Despite that there will still be many situations where we want a human instead exactly because they are a human. Sort of the way we still go to concerts instead of just listening to recorded music.

As I studied computer science both as an undergraduate and graduate student, one of the things that fascinated me was the history of trying to use brain-like structures to compute. I don’t want to rehash all of it here, but to understand where we are today, it is useful to understand where we have come from. The idea of modeling neurons in a computer as a way to build intelligence is quite old. Early electromechanical and electrical computers started getting built in the 1940s (e.g. ENIAC was completed in 1946) and the early papers on modeling neurons can be found from the same time in work by McCulloch and Pitts.

But almost as soon as people started working on neural networks more seriously, the naysayers emerged as well. Famously, Marvin Minsky and Seymour Papert wrote a book titled “Perceptrons” that showed that certain types of relatively simple neural networks had severe limitations, e.g. in expressing the XOR function. This was taken by many at the time as evidence that neural networks would never amount to much when it came to building computer intelligence, helping to usher in the first artificial intelligence winter.
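
To make the XOR limitation concrete, here is a minimal Python sketch. It is my own illustration, not code from “Perceptrons,” and all of the names in it (threshold_unit, reachable_tables, xor_two_layer, GRID) are made up for this example. It searches a small grid of weights for a single threshold neuron and records which two-input truth tables it can produce. A grid search is only a demonstration, not a proof, but it matches the known result: AND and OR show up, XOR never does. Adding one hidden layer of two such neurons is enough to get XOR.

```python
from itertools import product

# The four possible inputs to a two-input boolean function.
INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Target truth tables, listed in the same order as INPUTS.
AND = (0, 0, 0, 1)
OR = (0, 1, 1, 1)
XOR = (0, 1, 1, 0)


def threshold_unit(w1, w2, bias):
    """Build a single threshold neuron: fires (1) if the weighted sum is positive."""
    def unit(x1, x2):
        return 1 if w1 * x1 + w2 * x2 + bias > 0 else 0
    return unit


def truth_table(fn):
    """Evaluate fn on all four inputs and return the outputs as a tuple."""
    return tuple(fn(x1, x2) for x1, x2 in INPUTS)


def reachable_tables(grid):
    """Collect every truth table that some single neuron with weights on the grid produces."""
    return {
        truth_table(threshold_unit(w1, w2, b))
        for w1, w2, b in product(grid, repeat=3)
    }


# Illustrative grid (an assumption of this sketch): -4.0 to 4.0 in steps of 0.5.
GRID = [x / 2 for x in range(-8, 9)]
tables = reachable_tables(GRID)


def xor_two_layer(x1, x2):
    """XOR built from three threshold neurons: (x1 OR x2) AND NOT (x1 AND x2)."""
    h_or = threshold_unit(1, 1, -0.5)(x1, x2)
    h_and = threshold_unit(1, 1, -1.5)(x1, x2)
    return threshold_unit(1, -1, -0.5)(h_or, h_and)


if __name__ == "__main__":
    print("AND reachable by one neuron:", AND in tables)
    print("OR reachable by one neuron:", OR in tables)
    print("XOR reachable by one neuron:", XOR in tables)
    print("Two-layer network computes XOR:", truth_table(xor_two_layer) == XOR)
```

The reason the single neuron fails is geometric: it can only separate the inputs with a straight line, and no straight line puts (0, 1) and (1, 0) on one side and (0, 0) and (1, 1) on the other.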

And so it went for several cycles. People would build bigger networks and make progress and others would point out the limitations of these networks. At one time people were so disenchanted that very few researchers were left in the field altogether. The most notable of these was Geoffrey Hinton who kept plugging away at finding new training algorithms and building bigger networks.

But then a funny thing happened. Computation kept getting cheaper and faster and memory became unfathomably large (my Apple II for reference had 48KB of storage on the motherboard and an extra 16KB in an extension card). That made it possible to build and train much larger networks. And all of a sudden some tasks that had seemed out of reach, such as deciphering handwriting or recognizing faces, started to work pretty well. Of course the goal post moving set in immediately, with people arguing that those are not examples of intelligence. I am not trying to repeat any of the arguments here because they were basically silly. We had taken tasks that previously only humans could do and built machines that could do them. To me that’s, well, artificial intelligence.

The next thing that we discovered is that while humans have big brains with lots of neurons in them, we can use only a tiny subset of our brain on highly specific tasks, such as playing the game of Go. With another turn of size and some further algorithmic breakthroughs, all of a sudden we were able to build networks large enough to beat the best human player at Go. And not just beat the player but do so by making moves that were entirely novel. Or, as we would have said if a human had made those moves, “creative.” Let me stay with this point of brain and network size for a moment as it will turn out to be crucial shortly. A human Go player not only can use just a small part of their brain to play the game, but the rest of their brain is actually a hindrance. It comes up with pesky thoughts at just the wrong time, such as “Did I leave the stove on at home?” or “What is wrong with me that I didn’t see this move coming, I am really bad at this,” and all sorts of other interference that a neural network just trained to play Go does not have to contend with. The same is true for many other tasks such as reading radiology images to detect signs of cancer.

The other thing that should have probably occurred to us by then is that there is a lot of structure in the world. This is of course a good thing. Without structure, such as DNA, life wouldn’t exist and you wouldn’t be reading this text right now. Structure is an emergent property of systems and that’s true for all systems, so structure is everywhere we look including in language. A string of random letters means nothing. The strings that mean something are a tiny subset of all the possible letter strings and so unsurprisingly that tiny subset contains a lot of structure. As we make neural networks bigger and train them better they uncover that structure. And of course that’s exactly what that big brain of ours does too.

So I was not all that surprised when large language models were able to produce text that sounded highly credible (even when it was hallucinated). Conversely, I found the criticism from some people that making language models larger would simply be a waste of time confounding. After all, it seems pretty obvious that more intelligent species have larger brains than less intelligent ones (this is obviously not perfectly correlated). I am using the word intelligence here loosely in a way that I think is accessible but also hides the fact that we don’t actually have a good definition of what intelligence is, which is what has made the goal post moving possible.

Now we find ourselves confronted with the clear reality that our big brains are using only a fraction of their neurons for most language interactions. The word “most” is doing a lot of work here but bear with me. The biggest language models today are still a lot smaller than our brain but damn are they good at language. So the latest refuge of the goal post movers is the “but they don’t understand what the language means.” But is that really true?

As is often the case with complex material, Sabine Hossenfelder has a great video that helps us think about what it means to “understand” something. Disclosure: I have been supporting Sabine for some time via Patreon. Further disclosure: Brilliant, which is a major advertiser on Sabine’s channel, is a USV portfolio company. With this out of the way I encourage you to watch the following video.

So where do I think we are? We are at a place where, in fields where language and/or two dimensional images let you build a good model, AI is rapidly performing at a level that exceeds that of many humans. That’s because the structure it uncovers from the language is the model. We can see this simply by looking at tests in those domains. I really liked Bryan Caplan’s post, in which he was at first skeptical because an earlier version performed poorly on his exams, but then the latest version did better than many of his students. But when building the model requires input that goes beyond language and two dimensional images, such as understanding three dimensional shapes from three dimensional images (instead of inferring them from two dimensional ones), then the currently inferred models are still weak or incomplete. It seems pretty clear though that progress in filling in those will happen at a breathtaking pace from here.

Since this is getting rather long, I will separate out my thoughts on where we are going next into more posts. As a preview, I believe we are now at the threshold to artificial general intelligence, or what I call “neohumans” in my book The World After Capital. And even if that takes a bit longer, artificial domain specific intelligence will be outperforming humans in a great many fields, especially ones that do not require manipulating the world with that other magic piece of equipment we have: hands with opposable thumbs. No matter what, the stakes are now extremely high and we have to get our act together quickly on the implications of artificial intelligence.