OpenAI Unveils GPT-4, Months After ChatGPT Stunned Silicon Valley

Four months ago, a small San Francisco company became the talk of the technology industry when it introduced a new online chatbot that could answer complex questions, write poetry and even mimic human emotions.

Now the company is back with a new version of the technology that powers its chatbots. The system will up the ante in Silicon Valley’s race to embrace artificial intelligence and decide who will be the next generation of leaders in the technology industry.

OpenAI, which has around 375 employees but has been backed with billions of dollars of investment from Microsoft and industry celebrities, said on Tuesday that it had released a technology that it calls GPT-4. It was designed to be the underlying engine that powers chatbots and all sorts of other systems, from search engines to personal online tutors.

Most people will use this technology through a new version of the company’s ChatGPT chatbot, while businesses will incorporate it into a wide variety of systems, including business software and e-commerce websites. The technology already drives the chatbot available to a limited number of people using Microsoft’s Bing search engine.

OpenAI’s progress has, within just a few months, landed the technology industry in one of its most unpredictable moments in decades. Many industry leaders believe developments in A.I. represent a fundamental technological shift, as important as the creation of web browsers in the early 1990s. The rapid improvement has stunned computer scientists.

Photo: Jakub Pachocki, left, and Nick Ryder, researchers at OpenAI. Credit: Jim Wilson/The New York Times

GPT-4, which learns its skills by analyzing huge amounts of data culled from the internet, improves on what powered the original ChatGPT in several ways. It is more precise. It can, for example, ace the Uniform Bar Exam, instantly calculate someone’s tax liability and provide detailed descriptions of images.

But OpenAI’s new technology still has some of the strangely humanlike shortcomings that have vexed industry insiders and unnerved people who have worked with the newest chatbots. It is an expert on some subjects and a dilettante on others. It can do better on standardized tests than most people and offer precise medical advice to doctors, but it can also mess up basic arithmetic.

Companies that bet their futures on the technology may — at least for now — have to put up with imprecision, which was long taboo in an industry built from the ground up on the notion that computers are more exacting than their human creators.

“I don’t want to make it sound like we have solved reasoning or intelligence, which we certainly have not,” Sam Altman, OpenAI’s chief executive, said in an interview. “But this is a big step forward from what is already out there.”

Other tech companies are likely to include GPT-4’s features in an array of products and services, including Microsoft’s software for performing business tasks and e-commerce sites that want to give customers new ways of virtually trying out their products. A number of industry giants like Google and Facebook’s parent company, Meta, are also working on their own chatbots and A.I. technology.

ChatGPT and similar technologies are already shifting the behavior of students and educators who are trying to understand whether the tools should be embraced or banned. Because the systems can write computer programs and perform other business tasks, they are also on the cusp of changing the nature of work.

Even the most impressive systems tend to complement skilled workers rather than replace them. The systems cannot be used in lieu of doctors, lawyers or accountants. Experts are still needed to spot their mistakes. But they could soon replace some paralegals (whose work is reviewed and edited by trained lawyers), and many A.I. experts believe they will replace workers who moderate content on the internet.

“There is definitely disruption, which means some jobs go away and some new jobs get created,” said Greg Brockman, OpenAI’s president. “But I think the net effect is that barriers to entry go down, and the productivity of the experts goes up.”

On Tuesday, OpenAI started selling access to GPT-4 so that businesses and other software developers could build their own applications on top of it. The company has also used the technology to build a new version of its popular chatbot, which is available to anyone who purchases access to ChatGPT Plus — a subscription service priced at $20 a month.

A handful of companies are already working with GPT-4. Morgan Stanley Wealth Management is building a system that will instantly retrieve information from company documents and other records, and serve it up to financial advisers in conversational prose. Khan Academy, an online education company, is using the technology to build an automated tutor.

“This new technology can act more like a tutor,” said Khan Academy’s chief executive and founder, Sal Khan. “We want it to teach the student new techniques while the student does most of the work.”

Like similar technologies, the new system sometimes “hallucinates.” It generates completely false information without warning. Asked for websites that lay out the latest in cancer research, it might give several internet addresses that do not exist.

GPT-4 is a neural network, a type of mathematical system that learns skills by analyzing data. It is the same technology that digital assistants like Siri use to recognize spoken commands and self-driving cars use to identify pedestrians.

Around 2018, companies like Google and OpenAI began building neural networks that learned from enormous amounts of digital text, including books, Wikipedia articles, chat logs and other information posted to the internet. They are called large language models, or L.L.M.s.

By pinpointing billions of patterns in all that text, the L.L.M.s learn to generate text on their own, including tweets, poems and computer programs. OpenAI threw more and more data into its L.L.M. More data, the company hoped, would mean better answers.
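
To make the idea of learning patterns from text concrete, here is a deliberately tiny sketch. It is not how GPT-4 or any OpenAI system works: real L.L.M.s are neural networks trained on enormous data sets, while this toy simply counts which word follows which in one sentence and then strings words together from those counts. The function names, the sample sentence and the seed are all assumptions made up for illustration.

```python
import random
from collections import defaultdict


def train_bigram_model(text):
    """Record, for every word in the text, the words that followed it."""
    words = text.split()
    followers = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        followers[current].append(nxt)
    return followers


def generate(followers, start, max_words=10, seed=0):
    """Build new text by repeatedly picking a word seen after the last one."""
    rng = random.Random(seed)  # fixed seed so repeated runs match
    output = [start]
    for _ in range(max_words - 1):
        options = followers.get(output[-1])
        if not options:
            break  # the last word was never followed by anything
        output.append(rng.choice(options))
    return " ".join(output)


# Hypothetical one-sentence "training set"
corpus = "the cat sat on the mat and the dog sat on the rug"
model = train_bigram_model(corpus)
print(generate(model, "the"))
```

Every pair of neighboring words in the output appeared somewhere in the sample sentence, yet the sentence as a whole may be new. Scaled up by many orders of magnitude, and with far richer patterns than word pairs, that is the intuition behind more data producing better answers.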

OpenAI also refined this technology using feedback from human testers. As people tested ChatGPT, they rated the chatbot’s responses, separating those that were useful and truthful from those that were not. Then, using a technique called reinforcement learning, the system spent months analyzing those ratings and gaining a better understanding of what it should and should not do.

“Humans rate which stuff they like to see and which stuff they don’t like to see,” said Luke Metz, an OpenAI researcher.

The original ChatGPT was based on a large language model called GPT-3.5. OpenAI’s GPT-4 learned from significantly larger amounts of data.

OpenAI executives declined to disclose just how much data the new chatbot had learned from, but Mr. Brockman said the data set was “internet scale,” meaning it spanned enough websites to provide a representative sample of all English speakers on the internet.

GPT-4’s new capabilities may not be obvious to the average person first using the technology. But they are likely to quickly come into focus as laypeople and experts continue to use the service.

Given a lengthy article from The New York Times and asked to summarize it, the bot will give a precise summary nearly every time. Add a few random sentences to that summary and ask the chatbot if the revised summary is accurate, and it will point to the added sentences as the only inaccuracies.

Mr. Altman described the behavior as “reasoning.” But the technology cannot duplicate human reasoning. It is good at analyzing, summarizing and answering complex questions about a book or news article. It is far less adept if asked about events that have not yet happened.

It can write a joke, but it does not show that it understands what will actually make someone laugh. “It doesn’t grasp the nuance of what is funny,” said Oren Etzioni, the founding chief executive of the Allen Institute for AI, a prominent lab in Seattle.

As with similar technologies, users may find ways of coaxing the system into strange and creepy behavior. Asked to imitate another person or playact, this kind of bot sometimes veers into areas it was designed to stay away from.

GPT-4 can also respond to images. Given a photograph, chart or diagram, the technology can provide a detailed, paragraphs-long description of the image and answer questions about its contents. It could be a useful technology for people who are visually impaired.

On a recent afternoon, Mr. Brockman showed how the system reacted to images. He gave the new chatbot an image from the Hubble Space Telescope and asked it to describe the photo “in painstaking detail.” It responded with a four-paragraph description, which included an explanation of the ethereal white line that stretched across the photo. A “trail from a satellite or shooting star,” the chatbot wrote.

OpenAI executives said the company was not immediately releasing the image description part of the technology because they were unsure how it could be misused.

Building and serving up chatbots is enormously expensive. Because it is trained on even larger amounts of data, OpenAI’s new chatbot will increase the company’s costs. Mira Murati, OpenAI’s chief technology officer, said the company could curtail access to the service if it generated too much traffic.

But in the long term, OpenAI plans to build and deploy systems that can juggle multiple types of media, including sound and video as well as text and images.

“We can take all these general-purpose knowledge skills and spread them across all sorts of different areas,” Mr. Brockman said. “This takes the technology into a whole new domain.”