Why VR/AR Gets Farther Away as It Comes Into Focus


Modern efforts to build extended reality (XR) devices, meaning dedicated virtual reality (VR), dedicated augmented reality (AR), and hybrid mixed reality (MR), began more than a decade ago. Magic Leap was founded in 2010, the same year Microsoft started development on its HoloLens platform, which released its first model in 2016, with the second coming in 2019. The first Google Glass prototype came in 2011, with the first Explorer Edition coming in 2013 and the Enterprise Edition 2 launching as recently as 2019; a reconceived model was field-tested in 2022. Google’s Cardboard VR platform and software development kit (SDK) came in 2014, with the Daydream VR platform coming two years later. Sony PlayStation began development of its VR platform in 2011, which then debuted in 2016. Oculus was founded in 2012, with Facebook acquiring the company in 2014, and the Oculus Rift coming to market in 2016, followed by another four models through 2022. In 2014, Snap acquired Vergence Labs, an AR glasses start-up founded in 2011 that served as the foundation for the Snap Spectacles, which premiered in 2016 and have seen three updates. Despite the failure of the Fire Phone, a 3D-enabled smartphone that had four front-facing cameras at a time when smartphones had one or at most two, Amazon began development of its Alexa-based AR glasses sometime in 2016 or 2017. The first Echo Frames was released in 2019, with the second edition coming two years later.

As we observe the state of XR in 2023, it’s fair to say the technology has proved harder than many of the best-informed and most financially endowed companies expected. When it unveiled Google Glass, Google suggested that annual sales could reach the tens of millions by 2015, with the goal of appealing to the nearly 80% of people who wear glasses daily. Though Google continues to build AR devices, Glass was an infamous flop, with sales in the tens of thousands (the company’s 2022 AR device no longer uses the Glass brand). Throughout 2015 and 2016, Mark Zuckerberg repeated his belief that within a decade, “normal-looking” AR glasses might be a part of daily life, replacing the need to bring out a smartphone to take a call, share a photo, or browse the web, while a big-screen TV would be transformed into a $1 AR app. Now it looks like Facebook won’t launch a dedicated AR headset by 2025, let alone an edition that hundreds of millions might want.

In 2016, Epic Games founder/CEO Tim Sweeney predicted that within five to seven years, we would not only have PC-grade VR devices but that these devices would have shrunk down into Oakley-style sunglasses. Seven years later, this still seems at best seven years away. Recent reporting says Apple’s AR glasses, which were once targeted for a 2023 debut and then pushed to 2025, have been delayed indefinitely. Snap’s Spectacles launched to long lines and much fanfare, with another three editions launched by 2021. In 2022, the division was largely shuttered, with the company refocusing on smartphone-based AR. Amazon has yet to launch any Echo Frames with a screen, rather than just onboard Alexa. Google’s head of VR/AR is a direct report to CEO Sundar Pichai, though the company’s next (i.e., fourth) swing at XR is expected no sooner than 2024. In 2019, Magic Leap raised $300 million at a $7 billion post-money valuation. Two years later, the company raised $500MM at a $2.5B post-money valuation, a drop of roughly 64% that also meant the company was worth about 30% less than the $3.5B in cash it had raised life-to-date. In January 2022, reports emerged that Saudi Arabia’s sovereign wealth fund had taken majority control of the company following a $450MM equity-and-debt deal, suggesting that the company’s valuation had fallen to less than a billion dollars, possibly down to even half a billion.

Over the past 13 or so years, there has been some technical progress. And we do see growing deployment. XR is selectively used in civil engineering, in film production, and in industrial design. Some schools use VR some of the time in some classes. VR is also increasingly popular for workplace safety training, especially in high-risk environments such as oil rigs; teaching personnel how, when, and where people look is already having life-saving applications. And on the topic of saving lives, Johns Hopkins has been using XR devices for live patient surgery for more than a year, beginning with the removal of cancerous spinal tumors. If you use a high-end VR headset such as the Varjo Aero (which also requires a physical tether to a gaming-grade PC) to play a title such as Microsoft Flight Simulator (which operates a 500,000,000 square kilometer reproduction of the earth, with two trillion individual rendered trees, 1.5 billion buildings, and nearly every road, mountain, and city globally), there is the unmistakable feeling the future is near.

Though the above examples are technically impressive and meaningful, it’s difficult to say that XR has “a killer app” in market today, or that these devices can substitute for the devices we use today. There are some games with strong sales (a few titles have done over $100MM) but none where one might argue that, if only graphics were to improve by X%, large swaths of the population would use it regularly. I strongly prefer doing VR-based presentations to those on Zoom, where I spend 30–60 minutes staring at a camera as though no one else is there. But the experience remains fraught; functionality is limited; onboarding other individuals is rarely worth the effort because the benefits are both rare and small. When the iPhone launched, Steve Jobs touted that it did three distinct things (MP3 player, phone, internet communicator) better than the single-use devices then on the market. The following year, the iPhone launched its App Store and “There’s an App for That” proliferated, with tens of millions doing everything they could on the device.

The Meta Quest 2 has sold well, with an estimated 20MM+ units since November 2020. This is broadly comparable to the sales of the Xbox Series X and PlayStation 5, which were released at the same time, but had an easier path to adoption. For example, the first four PlayStations had sold nearly half a billion units since 1994, and the PS5 launched with the sequel to the PS4’s best-selling title. But while the install bases are comparable, there is little evidence of comparable active user bases, let alone comparable usage per user. As of March 2022, the average PlayStation 5 owner used the device 50 hours per month, or roughly two hours a day (15% more than the PlayStation 4 at the same point in its life cycle). Annual sales of the Xbox and PlayStation also continue to grow in their third year, while Meta Quest 2 sales declined in its second year. To be clear, the Meta Quest 2 does not have to be to XR what the iPhone was to smartphones. But even pre-iPhone smartphones demonstrated clear product-market fit at mass scale, not just potential. Smartphones also had a comparatively easier path to adoption. For most people in the world, buying a smartphone meant owning their first and only computer (and if not, it was their second or third computer, and their only portable one). For most VR buyers today, the device is their fourth or fifth computer after a PC/Mac, smartphone, tablet, and console. Unless an XR device can replace one of these devices, people are unlikely to adopt it.

The path to replacing an existing device category will not happen suddenly. Hundreds of millions will first use VR/AR alongside their consoles, PCs, and smartphones before tens of millions drop one of the latter for the former, and hundreds of millions will continue to use both long after (this essay is written on a PC, for example). Return to my Johns Hopkins example. After completing the surgery, Dr. Timothy Witham, who is also director of the hospital’s Spinal Fusion Laboratory, likened the experience to driving a car with GPS. I love this analogy because it shows how XR can complement existing devices and behaviors rather than displace them (it also complements reality, rather than disconnecting us from it). Put another way, we drive a car with GPS; we don’t drive GPS instead of a car, and GPS doesn’t replace the onboard computer either. What’s more, many of us travel more often because GPS exists. Dr. Witham also provides a framework through which we can evaluate the utility of XR devices. To exist, they need not upend convention, just deliver better and/or faster and/or cheaper and/or more reliable outcomes. But even under these more moderated measures, the future seems far off. GPS began to see non-military adoption in the 1990s, but it took another two decades to mature in cost and quality to become a part of daily life. Furthermore, the mainstream value in GPS was not only in improving commutes but in enabling applications as diverse as Tinder, Siri, Yelp, Spotify, and many others.

This delay is not for lack of investment; the past decade and a half has seen a lot of it. Meta has been spending $10–12B per year on its XR initiatives for several years. Lifetime spending is estimated at $40–50B and growing. Yet there is widespread mockery of the Quest line’s flagship first-party titles, such as Horizon Workrooms or Horizon Worlds, or its features, such as its legless avatars. We don’t know Apple’s spend, but it is a top 20 patent filer in the United States, and for years, 30–50% of these patents have been attributed to XR-related functionality (though not exclusively VR/AR/MR). According to Alex Heath at The Verge, Apple has thousands of employees working on standalone XR devices, and has had for years (Apple’s VR/MR device was originally scheduled for 2019). In 2021, Microsoft signed a contract with the U.S. Army worth up to $22B by 2031, and for only 120,000 HoloLens headsets. This sum, which equates to roughly $183,000 per device (though the price tag includes software, repairs, and data center services), doubtless financed extensive R&D. But in January 2023, Congress denied the Army’s request to draw $400MM of the roughly $21.5B in unspent funds to buy another 7,000 units, having previously spent $350MM on 5,000 units. This first batch fell short in many field tests throughout 2022, with the military finding 80% to lead to “mission-affecting physical impairments” including headaches, eyestrain, and nausea. Microsoft was granted $40MM, or 10% of the Army’s request, to develop a new model. A week later, Microsoft announced 10,000 layoffs as part of a broad cost-cutting measure. According to Bloomberg, the HoloLens division was disproportionately affected.
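The Army contract figures above reduce to simple per-unit arithmetic. The minimal Python sketch below uses only the dollar amounts and unit counts quoted in the paragraph; the function and variable names are illustrative, and the averages say nothing about how the money was actually split across hardware, software, repairs, and services.

```python
def cost_per_unit(total_dollars: float, units: int) -> float:
    """Average dollars per headset for a given budget and unit count."""
    if units <= 0:
        raise ValueError("units must be positive")
    return total_dollars / units


# Figures quoted in the text (MM = millions, B = billions).
contract_ceiling = cost_per_unit(22e9, 120_000)  # full contract through 2031
first_batch = cost_per_unit(350e6, 5_000)        # units already purchased
denied_request = cost_per_unit(400e6, 7_000)     # request Congress turned down
granted_share = 40e6 / 400e6                     # new-model grant vs. the request

if __name__ == "__main__":
    print(f"Contract ceiling: ${contract_ceiling:,.0f} per headset")
    print(f"First batch:      ${first_batch:,.0f} per headset")
    print(f"Denied request:   ${denied_request:,.0f} per headset")
    print(f"Granted share:    {granted_share:.0%} of the request")
```

On these numbers, the headsets actually bought or requested average well under the contract-wide figure, which is consistent with the text's note that the ceiling also covers software, repairs, and data center services.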

Patent Office publishes 2nd Apple Ring patent, this time using "Self-Mixing Interferometry" for Sensor-based Gesture System. Applications span solo or multi-ring use, with or without Apple Pencil, supporting AR, VR, and MR. pic.twitter.com/Uf95HsutwN

Many entrepreneurs, developers, executives, and technologists still believe XR is the future. In particular, these groups believe in AR glasses that will eventually replace most of our personal computers and TV screens. And history does show that over time, our devices get closer to our face while also becoming more natural and immersive in interface, leading to increased usage too. But why is this future so far behind? How many XR winters must come and go before a spring actually leads to summer?

“It Looks Like Wii Sports”

More than half of all households in the United States own a video game console. In almost all cases, this console is the most powerful computing device owned, used, or even seen by the members of that household. This includes those households who own the most recent model of iPad Pro or work in an office with a high-end enterprise PC or Mac. Regardless of which one they choose, that video game console is also more affordable than most other consumer or even professional-grade computing devices. It typically costs more, for example, to purchase a comparably powered gaming PC or even to replace the graphics card on an existing PC. This is because consoles benefit from substantial economies of scale, with their manufacturers shipping 50–150MM mostly standardized units over a decade. Purchasing individual components, each one individually packaged, marked up, and retailed, often with new models released annually, is expensive. Video game consoles are also subsidized, typically by $100–$200, as their manufacturers pursue a razor-and-blades model whereby subsequent software purchases eventually recoup the money lost selling the hardware. No graphics card or monitor manufacturer gets a cut of your Robux or V-Bucks.

Compared to everyday devices, the computing power of a video game console is so great that in 2000, Japan even placed export limitations on its own beloved giant, Sony, and its signature PlayStation 2 console. The government feared that the PS2 could be used for terrorism on a global scale, for instance to process missile guidance systems. The following year, in touting the importance of the consumer electronics industry, U.S. Secretary of Commerce Don Evans stated that “yesterday’s supercomputer is today’s PlayStation.” Evans’s pronouncement was powerful—even though it was arguably backwards. In 2010, the U.S. Air Force Research Laboratory built the 33rd-largest supercomputer in the world using 1,760 Sony PlayStation 3s. The project’s director estimated that the “Condor Cluster” was 5% to 10% the cost of equivalent systems and used 10% of the energy. The supercomputer was used for radar enhancement, pattern recognition, satellite imagery processing, and artificial intelligence research. In other words, today’s PlayStation is tomorrow’s supercomputer.

Yet in many ways, video game consoles have it easy. Consider the PlayStation 5 or Xbox Series X, both top-of-the-line video game consoles released in 2020. These devices are nearly ten pounds and larger than a shoebox—brutal in comparison to other consumer electronics devices, but fine given that these devices are placed inside a media shelving console and never moved. In fact, it’s not fine—it’s an advantage! Because these devices can be large, unsightly, and stationary, Sony and Microsoft get to place large and loud fans inside their consoles, which keep these consoles cool as they perform their intensive calculations, and aid these fans with large intake and exhaust vents. Sony and Microsoft can also keep component costs down because they don’t need to prioritize their size the way a smartphone manufacturer must. And while Sony’s and Microsoft’s consoles are heavy, they, unlike most consumer devices, never need a battery. Instead, they receive constant power from the electrical grid. This reduces the size of the device, as well as the heat it generates, which in turn means that the fan can be smaller, too, and means they can run indefinitely, rather than just a few hours.

Consoles typically support two forms of connectivity, Wi-Fi and Ethernet, but have no need for mobile network support, which is great because the latter requires another chipset and is particularly rough on battery life. Video game consoles also don’t need to bring their own display; they hook into a television. This shifts costs to another purchase (no one thinks a PlayStation is $500 plus a TV) while also enabling users to choose their own quality (e.g., 1K v. 4K definition, LED v. OLED). Most TVs are also 6–10 feet from the viewer, which means more than 2K resolution is often pointless unless your TV is over 70 inches diagonal. These televisions also work in optimal or at least relatively stable environments: in a dimly lit living room, rather than in an office or, worse, on the street, where lighting is uncontrollable and variable.

Video game consoles can also offload a good portion of their work to other devices, such as a standalone controller for input or a separately purchased headset for audio (the PlayStation 5 doesn’t even include Bluetooth, so headset purchases include Bluetooth dongles to add the functionality). Video game consoles don’t need to know or figure out very much. They don’t need to track your motion, let alone your eyes, or scan and process your environment. And of course, video game consoles are leisure devices. It’s not important that they simulate physics precisely (in fact, most game engines are nondeterministic, meaning that a given action will not always produce the same outcome), be “photo real,” or support “work.” These devices primarily just carry out pre-set activities in pre-set environments, with a small array of items and only a few other players in highly defined functions (e.g. when playing Fortnite, you’re playing Fortnite, whereas your PC might be running many applications and processes concurrently), and lots of simplification (the cars in Fortnite don’t need to drive like real cars, the bushes can be walked through without impacting velocity, etc).

This context around consoles is important to keep in mind as we consider VR/AR/MR. It’s common to hear the critique that the experiences produced by these devices look worse than those produced by the consoles of a decade ago that cost half as much at the time. When it comes to visually rendering a virtual environment, VR/AR/MR devices will always fall short of a modern video game console. Always. This is because the “work” performed by these devices is far, far harder while the constraints are far, far greater.

A simple starting point is weight. Any XR device that aspires to regular use or long session times needs to be lightweight. There is broad consensus that the tolerable range for a VR/MR headset is about 300–700 grams, depending on purpose (consumer v. enterprise) and ideal session length (you’re more likely to play for three hours with 300 grams on your head than 700). So a VR device needs to be roughly 90% lighter than a PlayStation 5 or Xbox Series X, 60% lighter than a Nintendo Wii, and a bit smaller than the Nintendo Switch. Of course, AR glasses have to be even lighter. While no one would reasonably insist that these devices must weigh the 15–40 grams of the average “dumb” glasses worn today, they still have to be under about 150 grams if they’re going to be worn all day. This weight constraint has natural, and unforgiving, consequences on processing power.

But remember, these devices also need to carry their own battery. This makes the device larger, for one, as well as more expensive, while also limiting how long they can be used. Furthermore, not only are batteries quite heavy relative to their space requirements, they also generate significant heat. And unlike a console, which always sits a few feet from the user, these batteries are placed on the user’s head. All of this means that XR device batteries must be small and lightweight while also being powerful and efficient!

XR devices have additional burdens, too. For example, onboard speakers and microphones are required, as well as a multitude of cameras (most believe a minimum of 12) that can track the user’s eyes, face, fingers, and local environment. There are also sensors (e.g., gyroscope, light, heat, and so on). And AR devices need a mobile networking chip, which again adds weight and bulk while also draining battery and generating lots of heat.

Finally, there’s the display. XR devices need to bring their own screen, which produces still more heat (again, an inch from the user’s face) and still more weight. Because of their proximity to the user’s eyes, these screens also need to be dense. The minimum spec is typically defined as 8K pixels, although many believe 16K to be optimal. This means several times as many pixels as the average TV, which yet again translates into more cost, weight, and battery drain.

So now we’ve discussed how much more stuff an XR device needs, even though it needs to be a fraction of the weight of a console as well as aesthetically pleasing and comfortable. Then there’s the work that the device needs to do, which is far more complex than anything a video game console does. For example, XR devices need to understand their environment, not just capture images of it. This means determining the placement, color, and shape of objects, in some cases their materials and texture, as well as what the object actually is, what it’s doing, and what it is likely to do. They must also manage complex materials, such as reflective and transparent surfaces, and warn the user of key boundaries and dangers. Hands, eyebrows, lips, and other body parts must be precisely and quickly measured for reproduction. Often, these devices are asked to support video calling (that is, displaying “real-world” video inside a simulated environment) or to reproduce the screen of a “real-world” device, such as a PC or iPhone, inside the virtual environment. Consoles, of course, are asked to do none of this. Instead they can focus their relatively abundant computing power on “graphics.”

Then there’s the rendering issue. I mentioned earlier that XR devices require their own screens and aspire to at least 8K (optimally 16K) pixels due to their proximity to the eye. The portable Nintendo Switch, as a point of contrast, is only 720p. This means that it is responsible for rendering less than 10% of the pixels per second that an XR device will have to render. The Xbox Series X and PlayStation 5, which don’t include their own display, come from a later generation of consoles than the Switch, are stationary and constantly powered, weigh more than three times as much, and support a maximum of 4K (25–50% of the XR target). Every additional pixel requires an increase in computing power.

Yet the number of pixels is only half of the equation. The other half is how frequently they are updated. Most video games target a 60 Hz refresh rate (meaning 60 updates per pixel per second; to simplify, this is commonly called “frames per second” or FPS). Some titles will support 120 Hz or more, though this typically means reducing the definition that is rendered. Fortnite, for example, supports 4K at 60 Hz, but if a player wants 120 Hz (which helps with competitive play), the resolution drops to 1440p. On the Nintendo Switch, a relatively low-powered device, Fortnite is available only at a max of 30 Hz. Furthermore, many titles prioritize the complexity of the game (which needs lots of computing power) over visual specifications. The AAA video game Gotham Knights, released in 2022, only operates at 30 Hz, for example. Because of the requirement of reproducing visual reality, there is broad consensus that an XR headset must refresh at 120 frames per second, and ideally 240, in order to avoid nausea. This stems from the fact that interaction in XR comes from your head, not your thumb, and any lag between input and output is more noticeable when the screen sits on your face versus across the room, which results in a feeling of sensory disconnect. Add it all up, and an XR device has to render many, many, many more pixels per second than a traditional device. This (again) generates a lot more heat, uses a lot more battery, requires a lot more space, and most important, devours horsepower that could otherwise be used to make a more high-fidelity or more complex render.
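The combined effect of resolution and refresh rate can be made concrete with a back-of-the-envelope sketch. The pixel grids below are assumptions based on standard 16:9 definitions (the essay only names “720p,” “4K,” “8K,” and “16K”), and the pairings of resolution and refresh rate are illustrative rather than the specifications of any particular device. Real headsets use per-eye panels with different aspect ratios, so these are order-of-magnitude figures only.

```python
# Pixel throughput = pixels per frame x frames per second.
# Grids are assumed 16:9 standards, not the specs of any real headset.
RESOLUTIONS = {
    "720p": (1280, 720),
    "4K": (3840, 2160),
    "8K": (7680, 4320),
    "16K": (15360, 8640),
}


def pixels_per_second(resolution: str, hz: int) -> int:
    """Return pixel updates per second for a named resolution and refresh rate."""
    width, height = RESOLUTIONS[resolution]
    return width * height * hz


switch = pixels_per_second("720p", 30)     # e.g., Fortnite on the Switch
console = pixels_per_second("4K", 60)      # e.g., Fortnite on a current console
xr_minimum = pixels_per_second("8K", 120)  # the text's minimum XR target
xr_ideal = pixels_per_second("16K", 240)   # the text's ideal XR target

if __name__ == "__main__":
    rows = [
        ("Switch, 720p @ 30 Hz", switch),
        ("Console, 4K @ 60 Hz", console),
        ("XR minimum, 8K @ 120 Hz", xr_minimum),
        ("XR ideal, 16K @ 240 Hz", xr_ideal),
    ]
    for label, value in rows:
        multiple = value / console
        print(f"{label:<26}{value:>16,} px/s  ({multiple:g}x console)")
```

Note that this sketch counts total pixels, so a 4K frame is one quarter of an 8K frame and one sixteenth of a 16K frame; percentages based on horizontal resolution alone will look more forgiving than these.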

There are some additional and particularly brutal optical constraints when we’re talking about AR devices. Most laptops support between 300 and 600 nits of brightness, with enterprise-level laptops typically peaking at 1,000, while TVs run from 600 to 1,500. Smartphones need 2,000 nits to be seen in the sun and, as we all know, they will still wash out on a moderately sunny day. For this reason, AR devices have to crank out way more brightness in order to be widely usable, which is another huge battery drain. AR devices also need to let light through their displays in order to augment what’s behind them. This leads to one of two options. Either the display must selectively broadcast even more nits to overcome the light being passed through, or it needs to selectively shade down what shouldn’t be highlighted. Both of these options are difficult on a pixel-by-pixel basis (no one has really solved this yet), not to mention computationally intensive.

There are ways to overcome some of the aforementioned issues. For example, some XR devices rely on secondary units that provide additional computing power while also reducing weight. XR devices for military personnel typically include a cord-connected mini “bus” that might be worn inside a backpack, with the headset physically cabled to the bus. When surgeons use XR devices, as is now the case at Johns Hopkins, most of the “stuff” is placed under the operating table, with the headset mostly limited to cameras, headphones, display, and mic. Sony’s $600 PlayStation VR2 requires the owner to own and physically connect to the $500 PlayStation 5, which is large, heavy, powerful, more easily cooled, connected to the electrical grid, and the like. Some patents suggest that Meta’s AR glasses will include a pacemaker-like device that will hang off the wearer’s belt or sit in their pocket. Some rumors suggest Apple will use “relay stations” situated around the user’s living room or office. Think about these like stationary pacemakers or consoles designed only to support the XR device. Enterprise AR headsets, such as those used in a factory line, need not be individually owned, nor do they need general-purpose components, wireless chips, long-lasting batteries, or an attractive appearance. They can just be swapped in and out through company-owned charging stations, the same way a police car might be. Enterprise-owned devices can also be far more expensive because they’re shared and drive revenue, not just generate fun.

But Does It Play Better

All consumer tech faces tradeoffs and hard problems. But XR devices require so many points of optimization: heat, weight, battery life, resolution, frame rate, cameras, sensors, cost, size, and so on. Zuckerberg’s belief in this device category, placed alongside these problems, explains how it’s possible he’s spending $10B+ year after year after year. That money is being sunk into optics, LEDs, batteries, processors, cameras, software, operating systems, and the like. And if Zuckerberg can crack this, with nearly all of his competitors years behind (if they’re bothering at all), the financial returns may be extraordinary. In early 2021, Zuckerberg said “The hardest technology challenge of our time may be fitting a supercomputer into the frame of normal-looking glasses. But it's the key to bringing our physical and digital worlds together.”

"...I wouldn't be against him... I think there's good odds for it, but whether it works out or not commercially, it's an incredible advance in technology... and they're really moving it ahead and it will really benefit all of us"

The immense difficulty of XR also explains why “the graphics look like they’re from the Wii” is actually a compliment. It’s a bit like saying an adult ran 100 meters as fast as a 12-year-old, even though the adult was wearing a 50-pound backpack and solving math problems at the same time. This defense is, of course, separate from whether Meta’s art style is good relative to its constraints. There’s pretty widespread consensus it’s bad. However, it’s not quite fair to compare the graphics of Meta’s avatars or signature products, such as Horizon Workrooms, to those of third-party titles such as VRChat or RecRoom. This fidelity is available to Meta, but only selectively; as we know, “graphics” are just one part of the computing equation. For example, a two-person meeting in Horizon Workrooms that expands to eight might require a halving of the frame rate or avatar definition or accuracy in eye reproduction, while also draining batteries far faster. Or your avatar, intended to be a representation of you, could look better or worse, more detailed or generic, legged or legless, depending on which application you’re using it in. This gets eerie, distracting, and annoying.

And there has been progress. Consider the Oculus line, which originated as a VR-only device but has since expanded to VR with a “mixed reality” mode. 2016’s Oculus Rift had a 1K display that was capable of 90 frames per second (as a rough index, 1K × 90 Hz, or 90,000 “pixel updates” per second), supported two to three hours of usage, weighed 500 grams, and cost $400 but had no external cameras (and was thus VR-only). 2020’s Oculus Quest 2 had the same battery life, weight, and price despite quadrupling the resolution to 4K and supporting up to 120 frames per second (480,000 on the same index), while also adding four external cameras (enabling the otherwise VR-only device to “see” and understand parts of the real room). Compared to the Quest 2, 2022’s Quest Pro increased the resolution to roughly 5.25K (the color contrast increased 75%) while adding color to the previously black-and-white external cameras, while the overall camera count tripled to 12 (substantially improving hand tracking while also adding facial and eye tracking). The Quest Pro was also the first model to use “foveated rendering,” which requires eye-tracking cameras. The human eye has foveal vision, which means that our eyesight is sharpest at the center and blurrier at the periphery. By tracking the user’s eye, it’s possible for a headset to concentrate its computing power where the user is looking and reduce fidelity elsewhere. An 8–16K display is still needed, as the user’s eyes might be directed at any point on the screen, but 90% of the screen might only be rendered in 2–4K at any one point, thereby saving battery and computing power.
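To get a feel for how much foveated rendering might save, here is a minimal sketch of the arithmetic. It assumes the 10%/90% split described above, standard 16:9 pixel grids for “2K” through “16K” (the essay does not define them), and a hard boundary between the sharp and reduced regions. The function names are hypothetical, and the sketch ignores the cost of eye tracking itself and any blending between regions.

```python
# Assumed 16:9 grids; real headset panels differ.
GRIDS = {
    "2K": (1920, 1080),
    "4K": (3840, 2160),
    "8K": (7680, 4320),
    "16K": (15360, 8640),
}


def relative_density(peripheral: str, native: str) -> float:
    """Pixel count of the peripheral grid as a fraction of the native grid."""
    pw, ph = GRIDS[peripheral]
    nw, nh = GRIDS[native]
    return (pw * ph) / (nw * nh)


def foveated_workload(native: str, peripheral: str, foveal_share: float = 0.10) -> float:
    """Fraction of full-resolution pixel work that remains with foveation.

    foveal_share is the share of screen area drawn at native density;
    the remainder is drawn at the peripheral grid's density.
    """
    if not 0.0 <= foveal_share <= 1.0:
        raise ValueError("foveal_share must be between 0 and 1")
    density = relative_density(peripheral, native)
    return foveal_share + (1.0 - foveal_share) * density


if __name__ == "__main__":
    for native, peripheral in [("8K", "4K"), ("8K", "2K"), ("16K", "4K"), ("16K", "2K")]:
        share = foveated_workload(native, peripheral)
        print(f"{native} panel, {peripheral} periphery: {share:.1%} of full pixel work")
```

Under these assumptions, an 8K panel with a 4K-density periphery would still need about a third of the full pixel work, so foveation narrows the gap considerably without closing it.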

At the same time, not all of the Quest 2 or Quest Pro’s advances were “cost-free.” The Quest 2’s 120 Hz mode is only selectively supported; more complex games are limited to 90 or even 72 Hz. The Quest Pro has a maximum frame rate of 90 Hz, and compared to the Quest 2, battery life is fully a third shorter (two hours, not three), weight jumped 40% to 700 grams, and cost more than tripled to $1,500. More broadly, 90 Hz at 4K is still literal multiples away from 120–240 Hz at 8–16K. And that’s just the number of pixels rendered, not their fidelity and not the sophistication of the simulation behind them. The external cameras, meanwhile, still can’t richly diagnose the world around them. And so on.

While VR/MR devices still have a ways to go, with time this gap will close. The unveiling of Apple’s mixed-reality device this spring will be an important time check. Apple is unmatched in its ability to produce world-class hardware, even when it is mostly reliant upon third-party components. Equally impressive is how Apple’s hardware works in harmony with a bespoke operating system and interface, which will present an advantage in a category that so far lacks best practices and clear answers. This has enabled Apple to routinely crack open long-stagnant or slowly developing computing models, from the GUI itself to the MP3 player, smartphone, tablet, and smart watch. Yet when Apple did this in the early to mid-2000s with the iPod and iPhone, its two “big” innovations, it was mostly using outside “stuff” (especially computing chips). The company also had modest manufacturing scale and expertise as well as a comparatively tiny user base and small developer ecosystem. The Apple of 2023 is very, very different. For example, the company produces the most powerful miniaturized systems-on-a-chip in the world and produces more phones, tablets, and smart watches than any other device maker in the world, which reflects both its manufacturing prowess and the desirability of its products. Apple has the most lucrative developer ecosystem in the world as well as one of its most beloved brands. The company’s XR device has the option of tapping into other in-market devices: leveraging the user’s iPhone, for example, in lieu of a standalone pacemaker computer, or an Apple Watch to supplement hand tracking, and so on. As such, it’s not unreasonable to assume Apple’s mixed-reality device will be the most desirable, will yield the most performance per dollar cost, and will come equipped with the best interface, and the most applications, too.

More importantly, this device is likely to be, well, “different.” To quote Apple expert John Gruber, “Outsiders inevitably base expectations on the current state of the art. But the iPhone was not an iPod phone. Apple Watch was not a Fitbit with a higher price. If Apple is still Apple, this first headset should be much more than a slightly nicer version of VR headsets as we know them.” That said, reports suggest the device will be at least $2,000, and more likely $3,000, which will limit its appeal and suggests that the device is primarily for those who use software to design 3D objects, such as film animators or architects, at least in the short to medium term.

Limited, but ever-expanding use is the necessary arc of new technology. At their start, both computers and internetworking were effectively limited to mega-corporations, public research labs, and government. No other groups could afford either, let alone put them to good use. VR looks likely to start with 3D-centric enterprises, as well as game-centric children and young adults. Technology, in other words, is not a “when” – let alone “when will X be mainstream” – but a “when is what, used by whom, why, and to what end.” And this is as much about the component cost, functionality, and retail price of XR devices as the software which makes these devices worth buying in the first place. This is particularly important for Meta, which loses $100–200 per unit in order to keep its price low and drive adoption, and thus needs significant usage/platform revenue in order to finance growth (note that this also means Meta has lost around $3B selling 20MM Meta Quest 2s, even after R&D is excluded). But even if Apple books its typical 35% gross profit margin on its XR devices, they are only viable if consumers—by the tens if not hundreds of millions—come to find them useful, and it matters when they do.
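The $3B figure follows directly from the per-unit loss and the unit count. A minimal sketch of that arithmetic, assuming (for illustration only) that every unit carries the same loss and that the true average sits somewhere in the stated $100–200 range:

```python
def subsidy_total(units_sold: int, loss_per_unit: float) -> float:
    """Total hardware loss if every unit is sold at the same per-unit loss."""
    return units_sold * loss_per_unit


UNITS_SOLD = 20_000_000  # 20MM Meta Quest 2s, per the text

low = subsidy_total(UNITS_SOLD, 100)   # low end of the $100-200 range
high = subsidy_total(UNITS_SOLD, 200)  # high end of the $100-200 range
midpoint = (low + high) / 2            # assumes the average loss is ~$150

print(f"Low estimate:  ${low / 1e9:.0f}B")
print(f"High estimate: ${high / 1e9:.0f}B")
print(f"Midpoint:      ${midpoint / 1e9:.0f}B")
```

The range works out to $2–4B, with the midpoint matching the roughly $3B cited above.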

These use cases can come from anywhere. Watching movies in a VR theater or watching VR movies. Watching sports in a VR stadium. Virtual meetings or classrooms. VR productivity software. VR games. But it’s far from clear whether any of these categories are thus far resonating with users. And this creates the chicken-and-egg problem. Without compelling experiences, users won’t buy VR devices at scale, but without a large and active install base, developers won’t focus on these platforms. And to this end, many developers focus on 3D applications, but few on VR.

Some of this impediment is probably hardware: we’re not yet at the VR MVP (“minimum viable product”). It’s fair to argue that for VR to take off, we first need a device with an 8K display running at 120 Hz, thereby avoiding nausea for a substantial portion of users, that includes a dozen cameras, weighs less than 500 grams, and costs less than $1,000, or perhaps even less than $500. Today’s user experience seems substandard, too. In October, longtime Oculus CTO (and then Consulting CTO) John Carmack said “The basic usability of Quest really does need to get better,” with user sessions often “aborted in frustration”; additionally, “app startup times are slow, our transitions are glitchy.” Carmack also admitted that some of Meta’s own staff got stuck in a “20 minute” and “multi-reboot” process to join the company’s 2022 Connect event. But a broader issue seems to be how VR competes with its substitutes. And if so, the MVP for VR is likely to be much higher than the aforementioned “min spec.”

Consider, for example, the primary use case for VR devices today: video games. While many mock Meta’s VR graphics, graphics don’t really matter; gameplay and fun do. Gamers, of course, know this, even if they still delight at shading Meta. This is why we’ve seen hit games as early as 1962 (Spacewar!). In 1993, an estimated 10% of Internet traffic was for text-based MUDs (“multi-user dungeons”)! When Pokémon Go launched in 2016, it was barely an AR experience. Pokémon could be seen in the real world, sure, but only because they were rendered on top of your camera’s feed. They didn’t hide behind a tree, sit on top of the grass (versus be buried into it), and so on. Still, the title amassed hundreds of millions of players. Six years later, with many true AR features now in place, its lifetime revenues exceed $5B. And today, the most popular games in the world are Roblox, Minecraft, Free Fire, League of Legends, Candy Crush, and so forth, none of which are, or even aspire to look, photo-real. And when it comes to non-gamers, history tells us two important lessons. First, graphical improvements never lead non-gamers to become gamers. Second, intuitive (and typically motion) interactivity does convert this exact demo, even when the graphics are rudimentary (Wii Sports, Guitar Hero, and so on).

To drive adoption, VR games need to be better than the alternatives, such as TV, reading, board games, Dungeons & Dragons, video games, and whatever else. At least part of the time. But for the most part, VR loses the leisure war. Yes, it offers greater immersion, more intuitive inputs, and more precise (or at least complex) controls. But the downsides are many. The install base for VR is roughly 25–30MM, whereas the AAA device base (Switch, PlayStation, Xbox, PC) is roughly 350MM. Furthermore, most of the most popular games in the world are available on the latter platforms, not the former. As a result, the average VR user can only play with a subsection of their friends—a significant drawback given the nature of VR’s applications. Metcalfe’s Law implies that games become better as the number of your friends that play the game increases. Thus even if Player A prefers to play a VR game to a non-VR game, they have to prefer that title so strongly that it compensates for playing without their friends, and/or its VR benefits have to beat the social ones.
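As a rough illustration of why the gap in install base matters more than the headline numbers suggest, the sketch below applies one common formulation of Metcalfe’s Law, which counts the possible pairwise connections in a network. The midpoint of 27.5MM for the VR base is an assumption taken from the 25–30MM range above, and treating the whole install base as one network is a simplification: what matters to any one player is their own friends, not every owner.

```python
def potential_connections(players: int) -> int:
    """Possible pairwise connections among `players` (n * (n - 1) / 2)."""
    return players * (players - 1) // 2


VR_BASE = 27_500_000    # assumed midpoint of the 25-30MM VR install base
AAA_BASE = 350_000_000  # AAA device base (Switch, PlayStation, Xbox, PC)

size_ratio = AAA_BASE / VR_BASE
connection_ratio = potential_connections(AAA_BASE) / potential_connections(VR_BASE)

print(f"Install base ratio:         {size_ratio:.1f}x")
print(f"Potential connection ratio: {connection_ratio:.0f}x")
```

On these assumptions, an install base about 13 times larger supports on the order of 160 times as many potential player-to-player connections.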

And given that we know the most popular games are not on VR, it’s likely the most beloved aren’t, either. These issues drive upstream problems, too. It’s already tough to spend $400–500 on a VR console after buying a non-VR console for a similar price; it’s harder still when it lacks many of your best friends and favorite titles. And due to the computational limitations of XR devices, there are gameplay-related constraints, rather than just limitations around battery life, comfort, weight, resolution, and so on. VR battle royale games, for example, are currently limited to two dozen players per match, rather than 50–150, as is the case on smartphones and consoles.

Because of the dynamic described above, the largest game developers are largely passing on VR game development. The player base is just too small, while these same device owners typically concentrate their play time and spend on non-VR platforms. Yes, the install base, usage, and spend are growing, but not fast enough to suggest that within three to five years, the opportunity will be large enough for these developers to prioritize over non-VR platforms. Sometimes developers will port a non-VR title to VR, but this doesn’t work for most games. Gran Turismo is a good fit because it’s a first-person title with relatively predictable behaviors and only a few other players (and fewer still are visible at any point). As such, it’s not too difficult to adapt the game’s controls or experiences, and a lower-powered VR device can still compute its gameplay. Resident Evil 7 is less structured than a racing game, but still has a first-person perspective, and unlike Gran Turismo, is limited to single-player and is played offline. Titles such as Fortnite or Call of Duty simply cannot be ported, as they’re specifically designed for non-VR platforms. This focus spans technical decisions (large maps, high resolutions, and 100+ players) and sometimes perspective (Fortnite is third-person, which would cause motion sickness in VR) and especially gameplay (from the way building interiors are designed to the mechanics of jumping or shooting). It’s difficult enough to make titles that play well across both PC and console, let alone Switch and then mobile.

Some online multiplayer tentpoles do support a VR build, even though they were designed for 2D interfaces, such as Roblox. But the very fact that these titles are “also VR,” not “only VR,” means they are limited in how they can express, or even leverage, the unique capabilities of VR. Even hit social VR titles such as Rec Room or VRChat no longer require VR—and as many as 90% of their regular users are believed to use “2D” devices when accessing the title. For sure, these titles are better in VR, but their popularity outside of VR speaks to the incentives of their developers, as well as the limited incentives of users to invest in a VR device. All of this explains why Meta is so focused on buying VR developers. Until its ecosystem can attract a critical mass of large and incipient studios, Meta needs to prime it itself. Many of these developers are profitable and have the option of staying independent, of course, but as subsidiaries, they can invest more aggressively, as their business case will stem from both direct revenue (game sales) and indirect value (driving overall platform adoption).

A related issue for developers stems from the number of different VR SKUs in market today as well as their rapid improvements. The latter helps the industry progress towards MVP, but also means that there is no single “VR install base” but rather many different ones. The Meta Quest Pro is a great example. It is far more capable than the Meta Quest 2 but likely has not even 5% of its users. Most developers will therefore produce titles that run on both platforms, which limits how much those titles can use the Pro’s distinctive capabilities, while also reducing the benefit of buying the Meta Quest Pro. The Meta Quest 3, which is expected to release in 2023, will also be constrained by the success of the Meta Quest 2, and as it’s likely to be a better seller than the Meta Quest Pro (which is three times as expensive), that will further discourage Pro-only developers. It’s for this reason that video game console generations run six to eight years—compare smartphones, which update annually—with mid-cycle updates (e.g., the PlayStation 4 Pro) typically marginal and primarily price- and size-focused.

Now, games are a bit of a distraction. They’re a small market overall, generating less than $200B a year in revenue, with fewer than 350MM so-called “AAA” players. The market is niche, in other words, and not sufficient to drive XR devices into the hands of hundreds of millions, let alone billions, globally. However, the same software arguments endure for all categories. Indeed, on the enterprise side, it’s even harder. Enterprises are particularly reluctant to embrace new platforms, especially those with physical hardware. It’s costly, laborious, and slow to deploy new devices, train employees, adopt new processes and software models, and then drive the iterations required to deliver consistent (and better) net results. Accordingly, most of the market waits until the business cases clarify—and they are far from clarified today. In fact, most enterprise software has no “VR mode,” and there’s almost no VR-only software, either.

Much of this essay has focused on VR, rather than AR glasses – the XR device category that most consider to be the big opportunity, one that might one day replace smartphones as the dominant computing platform globally, with billions of daily users. The focus on VR reflects the fact that many VR (or, more precisely, MR) investments are laying the groundwork for AR. Batteries and optics are great examples here. But as you might expect, this also means that AR is even farther from “MVP” than VR/MR—and thus also farther behind schedule. Consider Microsoft’s enterprise-focused HoloLens devices. Compared to the first model, released in 2016, 2019’s HoloLens 2 had three times the resolution and augmented 60% more of the user’s vision, with four times as many processor cores, all at the same weight and size. However, the price grew from an already rough $3,000 to $3,500, and the device remained the size of a helmet. Furthermore, barely 20% of the user’s eyesight was augmented—and at a 2K resolution and 60 Hz. And when it came to environmental analysis, the HoloLens could recognize little and analyze even less. On the consumer side, Snap’s Spectacles are instructive. The original Spectacles, released in 2016, weighed 45 grams, augmented less than 10% of a user’s field of view, used a 0.5K display at 60 Hz, lacked GPS, and could not produce a 3D render. The 2021 model, Spectacles 4, added GPS and 3D rendering, but while the resolution doubled to 1K, the frame rate halved, coverage remained at 10%, and the device’s weight nearly tripled to 135 grams (typical glasses are less than 50 grams)—as did the price, to $350. And the battery supported less than 30 minutes of use. The benefits of all this were modest—basic, contextualized AR overlays (e.g., butterflies on a field), an in-ear speaker far worse than any AirPod, and eyeline-based cameras that didn’t require taking out your phone.

Many people I know believe that absent extraordinary advances in battery technology and wireless power and optics and computer processing, we simply cannot achieve the XR devices that many of us imagine and that would conceivably replace the smartphone or merely (a smaller ask) engage a few hundred million people on a daily basis. Just last December, six years after he told VentureBeat that such devices were five to seven years away, Tim Sweeney told Alex Heath, “Well, I think that augmented reality is the platform of the future. But it’s very clear from the efforts of Magic Leap and others that we need not just new technology but, to some extent, new science in order to build an augmented reality platform that’s a substitute for smartphones. And it’s not clear to me whether that’s coming in 10 years or in 30 years. I hope we’ll see it in my lifetime, but I’m actually not sure about that.” Rough!

Okay, the M-Word

So what does this all mean for the so-called “Metaverse”? Often, the topic is confused with the very idea or experience of VR/AR/XR. It’s important to recognize that the devices are exactly that—devices. They may come to be the best, most popular, or preferred way to access the Metaverse, but they are all just ways to access it. A good analogy might be the touchscreen smartphone and its relationship to the mobile internet. These devices have doubtless expanded who uses the mobile internet, for what, and how often. But the mobile internet is not a smartphone, nor does the mobile internet need a touchscreen. In fact, to access the mobile internet, you don’t even need a visual interface.

With this in mind, I think it’s valid to say that the difficulty in producing XR hardware will slow the emergence and thus the growth of the Metaverse. At the same time, VR isn’t fundamentally different from today’s rendering in a video game such as The Legend of Zelda or Fortnite or Rec Room. It’s harder because of device limitations, and it uses and requires different inputs, but the same processes are taking place. AR is far more different from today’s virtual experiences because it involves rendering objects on top of reality while also scanning and, ideally, interacting with it. Yet this functionality is being rapidly embraced by smartphones. Yes, they’re less intuitive and seamless than glasses hope to be, but they’re already in use by billions globally, and they are improving annually. This is as unexpected and consequential as the challenges of XR proved to be. Consider the following 2022 quote, which comes from Neal Stephenson, who coined the term “Metaverse”:

“The assumption that the Metaverse is primarily an AR/VR thing isn’t crazy. In my book [Snow Crash, published in 1992] it’s all VR. And I worked for an AR company [Magic Leap]--one of several that are putting billions of dollars into building headsets. But I didn’t see video games coming when I wrote Snow Crash… Thanks to games, billions of people are now comfortable navigating 3D environments on flat 2D screens. The UIs that they’ve mastered [keyboard and mouse for navigation and camera] are not what most science fiction writers would have predicted. But that’s how path dependency in tech works. We fluently navigate and interact with extremely rich 3D environments using keyboards that were designed for mechanical typewriters. It’s steampunk made real. A Metaverse that left behind those users and the devs who build those experiences would be getting off on the wrong foot… My expectation is that a lot of Metaverse content will be built for screens (where the market is) while keeping options open for the future growth of affordable headsets”

For what it’s worth, the future AR/VR world that Snow Crash was set in? It took place in the early 2010s :).

Matthew Ball (@ballmatthew)
