Most people conceive of [software] as lists of instructions that tell a computer what to do. Examine the [bottom] side of a compact disk: you are looking at millions of microscopic bits of information. The information on the disk is a formula for constructing an event in time. Software is the same. A software machine is a real machine, and independent of the electronic box that decodes it, in the same sense that the music coming out of a CD player is an independent reality.
-- David Gelernter, Mirror Worlds (1993)
Investors today are surrounded by businesses whose competitive advantage purports to be data, or its derivative, network effects. But what is data and what is software? Is data the new oil? Is it something else entirely? If competitive advantages derive from possession and acquisition of data, then what prevents a competitor from simply copying the data? Isn’t data today both abundant and frictionless? Could competitive advantage arise from AI / ML algorithms, when they are largely unpatentable? If our world is composed of data, then where do profit opportunities come from and where might they emerge?
Understanding data and software, and the competitive advantages they confer, seems to require reconsidering their essence: what data and software truly are. This essay explores these ideas in an investing context, but first some definitions are required:
1) Data is unprocessed raw facts; 2) Information is summarized and compressed data; 3) Knowledge contextualizes Information and leads to decisions (not discussed); 4) Software is computation which transforms Data-to-Information-to-Knowledge.
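A minimal Python sketch can make definitions (1), (2) and (4) concrete. It shows software compressing raw, invoice-level Data into summarized Information, much as an Excel financial model rolls individual invoices up into company-level profitability. The record schema (`Record`, with `division`, `kind` and `amount` fields) and the figures are hypothetical. Knowledge (3), deciding what to do with the result, is deliberately left out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One raw fact (Data): a single invoice or cost line. Hypothetical schema."""
    division: str
    kind: str  # "revenue" or "cost"
    amount: float


def summarize(records: list[Record]) -> dict[str, float]:
    """Software (4): compress raw Data into Information (profit per division plus a total)."""
    profit: dict[str, float] = defaultdict(float)
    for r in records:
        sign = 1.0 if r.kind == "revenue" else -1.0
        profit[r.division] += sign * r.amount
    result = dict(profit)
    result["TOTAL"] = sum(profit.values())
    return result


# Four hypothetical raw records, summarized into three numbers.
data = [
    Record("retail", "revenue", 120.0),
    Record("retail", "cost", 80.0),
    Record("online", "revenue", 60.0),
    Record("online", "cost", 25.0),
]
print(summarize(data))  # {'retail': 40.0, 'online': 35.0, 'TOTAL': 75.0}
```

The output is smaller than the input, which is the point: Information is a lossy compression of Data that is easier to reason about than the raw facts.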
Credit: data-information-knowledge-humanities [figshare]
Data (1) encodes for (describes) physical reality; we consume it as compressed Information (2), which summarizes a specific state of the physical world. An Excel financial model is summarized information describing the combined profitability of all the constituent parts of a company. Messaging and email compress in-person communication into something portable. Virtual reality may eventually encode for physical experience at a quality indistinguishable from reality itself. Less obviously, Salesforce’s Customer Relationship Management (CRM) software contains information representing every minute and second of customer interaction occurring across the company over a period of time.
Software (4) is an information machine that converts Data into Information, and Knowledge (3), through computation. While often incomplete, an Excel financial model allows a financial analyst sitting on the other side of the world to understand the overall profitability of a company as if they had collected every piece of financial data themselves, invoice by invoice and paystub by paystub, and could see the cohesive whole at once. David Gelernter referred to this as “topsight”: an intuitive understanding of the whole, enabled by an ability to summarize and experience reality from a new perspective.
This essay is structured as follows:
- Why information describes the physical world around us, separating hardware from software;
- Re-imagining the business of real estate brokerage in terms of endpoints, data, software and networks, to re-frame competitive moats of the physical world as information-problems;
- Why the collapsing cost of information frames how to think about where competitive moats might diminish and where they are likely to reappear;
- Why software businesses today might be both more predictable and more investable than they’ve been in the recent past.
By reducing information problems to their underlying essence, we may be able not only to describe the new business models we encounter but also to predict where profitability might emerge in as yet unforeseen places.
A Thought Experiment of Atoms and Bytes
Imagine a world that can be encoded entirely in terms of information. Store shelves are data-sets of products and services, with price and availability (in-store, available now): the equivalent of an Amazon.com search that tells you price, availability, and delivery times. Coca-Cola’s brand is not a red and white wave, but information that describes all the positive (and negative) interactions you’ve had with the brand over the last 10 years: the polar bear, advertisements of a cold beverage on a hot summer day, your memories, the news articles on the health risks of sugar.
Your local real estate broker is not a person standing opposite you, but rather the accumulated information of buyers, sellers, apartments, neighborhoods and prices. He or she combines “hardware” (physically visiting apartments) with “software” (memories of comparable prices from recent transactions, plus intuition) to improve your real estate purchase.
It is as if the world around you can be described entirely in the 1’s and 0’s of information:
Credit: The Matrix, Warner Brothers Films (1999)
Our focus here is on the intangible economy: finance, marketplaces, entertainment (in digital form), and services. These conclusions apply less to the tangible economy of commodities, energy, capital machinery, and infrastructure, whose problems are confined to atoms, not bytes.
Our thought experiment involves re-framing widely understood competitive moats of the physical world in terms of their underlying information phenomena. To do so, we use the language of information, software and networks to describe the competitive moats of atoms in terms of bytes. Our goal is to consider the information of software independently of the physical “hardware” through which we experience it.
Real Estate Brokerage as an Information Problem
What may ultimately be true is that many businesses, at their core, revolve around facilitating information: consumer brands convey quality information, capital markets convey asset & cash-flow information, sales executives convey customer information, real estate brokers convey location information, and distribution conveys price / availability information. If you are looking for an underlying truth that broadly explains the disruption now occurring, it may be this: the marginal cost of acquiring information has plummeted (The Investing Meta-Game).
Let’s consider the business of real estate brokerage as a simple information problem. Information marketplaces of this kind exist today in the guise of eBay, Etsy, Leboncoin, and Craigslist.
Real estate brokerage can be described in its digital building blocks:
- Endpoints: where the physical world meets the digital world, where “hardware + software” convert atoms-to-bytes, in telephones, smartphones, desktop PCs, and the like.
- Data/Information: bytes that summarize tangible (appearance, size, neighborhood, schools) and intangible (prices, preferences, utility) features of the physical world.
- Software: computation that transforms (atoms-to-data-to-information) and summarizes data (recommendations, advice, transaction-matching).
- Networks: connections between discrete endpoints, data-sets or software, through protocols, application programming interfaces (APIs), and the like.
Endpoints: Biological (humans: hardware + software) and digital (smartphones, desktop, voice) endpoints are costly to maintain and access. Humans require continual caloric intake and respiration. We must transport ourselves over long distances, encode information imperfectly (our preferences are hard to summarize), and continually convert data to information (intuition). We are also creatures of habit whose behaviors are difficult to change. Digital endpoints are costly to acquire too, whether through tied services (Alexa), hardware distribution, or customer acquisition costs. At the end of every digital endpoint, smartphone or otherwise, there is another human.
Individual buyer: physically acquires costly information (driving across town, scheduling meetings, paying for automobiles / gasoline) and compresses / summarizes information (the Upper West Side apartment is nicer than the one in Harlem).
Individual seller: physically encodes and transmits information (photos, asking price, seller preferences) to one or many brokers, or to listing services.
Desktop / smartphones: new digital endpoints that replace the physical activity of visiting apartments by summarizing the world (information) on the screen in front of you.
Data/Information: access is costly or restricted; encoding is imperfect and noisy, with poor predictive power; and duplication is costly (each buyer must visit an apartment). There are decreasing returns to scale: more data doesn’t always create better inference (predictions). Data is not valuable in isolation: its value is crystallized by interactions with other data. Profits from data seem to be a proxy for time saved, costs avoided, the (capital) cost of duplication, and, ultimately, value created through improved outcomes.
Apartment listings: describe tangible features such as size, rooms, sunlight, building, noise, local schools and amenities. Access to high-quality information often substitutes for visiting the apartment itself: we skim hundreds of listings and visit a few.
Buyer preferences: price, location, renovations, white goods, neighborhoods, colors, style, etc. This data is difficult to summarize and costly to access: what we’re looking for is hard to define and takes many viewings to discover.
Seller preferences: price, expected sale date, other offers, ancillary financial considerations. A broker often tries to access and summarize this data, but doing so is costly in time and yields poor predictions.
Prior transactions: exist in databases or in the memory of neighborhood brokers (“human computation”) who can recall prior transactions; used to compute a fair price.
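A broker’s “human computation” of fair price can be approximated by a comparable-sales calculation. This is a minimal sketch under illustrative assumptions, not how any particular broker or listing service prices. It takes the median price per square foot of hypothetical prior transactions and scales it to the apartment in question. A second helper shows one reason for decreasing returns to data. Assuming independent observations, the standard error of an average shrinks only with the square root of the sample size, so quadrupling the data merely halves the error.

```python
import math
import statistics


def fair_price(comps: list[tuple[float, float]], size_sqft: float) -> float:
    """Hypothetical comparable-sales estimate.

    comps: (sale_price, size_sqft) pairs for recent nearby transactions.
    Takes the median price per square foot (less sensitive to one odd sale
    than the mean) and scales it to the apartment being priced.
    """
    per_sqft = [price / size for price, size in comps]
    return statistics.median(per_sqft) * size_sqft


def standard_error(stdev: float, n: int) -> float:
    """Standard error of a mean of n independent observations: stdev / sqrt(n)."""
    return stdev / math.sqrt(n)


# Four hypothetical prior transactions, at 1000, 1200, 900 and 1250 $/sqft.
comps = [(1_000_000, 1000), (1_200_000, 1000), (900_000, 1000), (1_500_000, 1200)]
print(fair_price(comps, size_sqft=800))  # 880000.0

# Each quadrupling of the data only halves the error.
# Prints: 25 20.0 / 100 10.0 / 400 5.0 / 1600 2.5
for n in (25, 100, 400, 1600):
    print(n, standard_error(stdev=100.0, n=n))
```

The square-root relationship is a textbook result for independent samples. Real transaction data is rarely that clean, which only strengthens the point that more data does not translate proportionally into better inference.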
Software: Biological or digital, software uses human cognition, machine learning or artificial intelligence to infer probabilistic models of the relationships between discrete data, based on current and past experience.
Real estate broker: “human computation” comprising access to pricing data (neighborhood transactions) and connections to buyers / sellers (preferences, financing, price information); interprets and matches real estate transactions through inference. Also responsible for identity, security, and compliance: identifies you by name & face (identity), ensures you have the necessary funds for purchase and escrow / title (security), and may implicitly consider KYC/AML issues (compliance). This last point is not essential here, but it is important in a software context.
Networks: Complex interactions, or inter-dependencies, between discrete endpoints, data and software. Point-to-point connections are easier to form and replicate; complex networks are harder to replicate and replace (“switching costs”).
Multiple Listing Services (MLS): middleware, data graph, or closed protocol. May be a data-set or connections between endpoints (brokers, buyers, sellers). Access is often restricted or subject to licensing requirements. Monetized through open forms (advertising, commissions) or closed forms (subscriptions, fees).
Informal or Social Networks: comprise informal many-to-many connections between other brokers (gossip), proprietary information (institutionalized learning, internal data), and diverse data-sets (school information). Access can be restricted by licensing requirements, social standing, or job availability.
Information Moats of Restriction, Scale or Complexity
Jerry Neumann [reactionwheel.net], a VC and professor at Columbia University’s Engineering School, crowd-sourced a comprehensive list of competitive moats. While not a perfect analogue, one way to consider them in a digital context is in terms of their underlying information phenomena: Information Restriction, Information Scale, and Information Networks / Complexity:
Credit: Jerry Neumann - A Taxonomy of Moats [reactionwheel.net]
Our real estate brokerage benefited from features of the physical world that were competitive moats of information. Restriction might be preferential access to listings or apartment viewings; Scale might be visiting many apartments to serve many buyers as a full-time job; Networks might be the information interactions of licensing, market gossip, trust / reputation, and access to many buyers and sellers.
Each moat also has its inverse, which creates fragility. For instance, tying/bundling to a legacy (high) cost structure can be a source of vulnerability when marginal costs dramatically change. The business of music stores was to distribute bytes with expensive atoms: then distributing bytes became practically free.
Diminished Marginal Costs of Information
Re-framing physical reality into an information-problem allows us to do something novel: understand the economic implications of vastly reduced costs of information in terms of replication, transmission, and inference.
Three important marginal costs changed:
The costs of data-replication plummeted. Sensors are ubiquitous, from our smartphones to software that converts the physical reality of atoms (data) into the compressed information of bytes (machine vision, translation, IoT, 5G).
The costs of data-transmission plummeted. Information compression improved, as did our ability to transmit large amounts of information over long distances via fiber or wireless spectrum.
The costs of inference (prediction) plummeted. Software computation replaced many hardware/software functions previously performed by machines or human workers.
The first-order effects are well understood: the free flow of information overwhelmed many pre-existing points of information friction. Our local real estate broker no longer controlled any of the following:
- the endpoints (smartphones, desktops, or human participants);
- demand data (buyer preferences);
- supply data (seller preferences);
- inventory data (apartments);
- the matching algorithms (human judgement);
- access to networks (MLS, informal social networks, licensing, to some extent);
- scale economies (the costs of acquiring data).
It is mostly regulation and licensing requirements that sustain the real estate brokerage business today. When marginal costs in a value chain change drastically, the places where scale and profitability accrue are rearranged.
The second-order effect is that bytes replaced atoms: endpoints, data, software, and networks allowed us to separate the hardware of the physical world from the information it represented. Hotel brand franchises scale operational information (branding, trust, operational know-how) while commoditizing hardware (hotel real estate), creating a capital-light “software” business. Owning a hotel becomes an exercise involving a cap-rate calculation, local zoning information and an executed franchise management contract.
Uber / Lyft, in their idealized forms, turned local-scale, capital-intensive hardware endpoints (taxis), which collectively encoded information (availability, location, price, quality), into a regional-scale software problem (computation, data, demand acquisition, supply acquisition, market liquidity). This shifted economies of scale away from the aggregation of hardware (taxi fleets) and toward scale economies of information and digital endpoint acquisition.
Technology investors likely understand these concepts, as they seem to describe how higher levels of software abstraction (the Windows operating system) commoditized the simpler hardware components they operated on top of (PC-compatible hardware).
Higher Level Abstractions: Building Better Moats
Our thought experiment allows us one last trick. If our real estate broker was the neighborhood oligopolist of the physical world, what does this say about where competitive moats might next emerge in a digital space?
New competitive moats seem likely to emerge at higher levels of networks, complexity and inter-dependency. In physical terms, the profitability of a hub-and-spoke network airline seems to require careful balancing of feeder flights, hub-to-hub connections, and international point-to-point routes. To displace this network of flights, a competitor must replicate its cost and availability in a specific region or country. Networks must be defeated at all points simultaneously (by other network airlines) or from new vectors against which they are disadvantaged, such as low-cost carriers (LCCs) flying point-to-point city pairs. Network profitability plummeted, however, when all competitors operated at similar complexity: many similarly sized network airlines competing brutally in a high fixed cost, low marginal cost business.
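A toy count illustrates why a hub-and-spoke network is hard to replicate piecemeal. The assumptions are stylized: ten hypothetical cities, every city pair weighted equally, and capacity, schedules and costs ignored. Under them, a hub serves every city pair with at most one connection while flying only n-1 routes. A point-to-point carrier needs n(n-1)/2 routes to cover the same pairs non-stop. It can still attack any single pair with just one route, which is the low-cost carriers’ vector.

```python
from itertools import combinations


def hub_and_spoke(cities: list[str], hub: str) -> set[frozenset]:
    """Routes flown by a hypothetical hub carrier: one non-stop route from the hub to every other city."""
    return {frozenset((hub, c)) for c in cities if c != hub}


def point_to_point(cities: list[str]) -> set[frozenset]:
    """Routes flown if every city pair gets its own non-stop route."""
    return {frozenset(pair) for pair in combinations(cities, 2)}


def served_pairs(cities: list[str], routes: set[frozenset], max_legs: int = 2) -> int:
    """Count city pairs reachable within max_legs flights (supports 1 or 2 legs)."""
    served = 0
    for a, b in combinations(cities, 2):
        if frozenset((a, b)) in routes:
            served += 1  # non-stop
        elif max_legs >= 2 and any(
            frozenset((a, h)) in routes and frozenset((h, b)) in routes for h in cities
        ):
            served += 1  # one connection, e.g. through the hub
    return served


cities = [f"C{i}" for i in range(10)]
hub_network = hub_and_spoke(cities, hub="C0")
p2p_network = point_to_point(cities)

print(len(hub_network), served_pairs(cities, hub_network))              # 9 45
print(len(p2p_network), served_pairs(cities, p2p_network, max_legs=1))  # 45 45
```

With ten cities, the hub flies 9 routes and still connects all 45 pairs. Matching that coverage non-stop takes 45 routes, while picking off one lucrative pair takes only one.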
It is now common to describe many businesses as circular “flywheels” of positive-feedback effects: curiously, many seem to describe global-scale organizations created by pervasive low-cost communications networks:
Credit: Amazon Flywheel [Futureblind.com]
Amazon’s flywheel around its merchandising algorithm [Zach Kanter] would not be possible in a world lacking low-cost, seamless communications. Displacing Amazon requires replicating the data inter-dependencies that make its products and services great. A competitor needs access to billions of endpoints and trillions of data points on products and services. It also needs efficient software pricing algorithms and seamless interconnection between merchandising, logistics, compute, storage, marketing and HR. Finally, it must be able to evolve rapidly with customer feedback. All of this is combined with some very traditional scale economies of atoms, like global-scale logistics / fulfillment and cloud computing (AWS).
It seems that no single loop in isolation creates a competitive moat; it is the complexity of their interactions, and the difficulty of replacing them, that creates abnormal profit opportunities. Adding to the challenge, the rapid iteration of software delivered via the internet creates programs (and services) that evolve rapidly in their environments.
And yet any optimization of hardware or software features for a specific environment can be enormously fragile under different conditions: price and uncertainty still matter in investing, with price paid the primary calibration tool for uncertainty.
But how do we value these networks of complex inter-dependency?
For all their fragility, cities are remarkably robust over long periods of time, and seem to exhibit similar stacked inter-dependencies of roads, avenues, sewers, schools, services, people, social groups, etc. These features seem resilient except during paradigm-shifts (wars, pestilence, famine, environment-scale catastrophe), when inter-dependencies are violently re-arranged in ways which disrupt many of these features simultaneously.
The disruption of AOL Instant Messenger by the mobile phone violently re-arranged the complex networks of inter-dependencies of endpoints, data, software and networks, allowing new messaging paradigms to emerge.
Creating Abstractions: When Computers Started Communicating
Network effects may be an emergent phenomenon arising from complex information inter-dependencies enabled by low-cost communications. They first emerged with the landline telephone, which drastically reduced the cost of communications and thus created low-cost information networks. Each subsequent step down in the cost of communication created a more complex tapestry of information networks.
In a famous email exchange, Warren Buffett explained why he would not invest in technology or software: he found the outcomes too hard to predict. That may still be true, but something unmistakably changed when our computers started communicating; there is a reason biologist E.O. Wilson’s famous book is titled “The Social Conquest of Earth”:
“Your analysis of Microsoft, why I should invest in it, and why I don’t could not be more on the money. In effect the company has a royalty on a communications stream that can do nothing but grow… [AT&T] should have anticipated [Microsoft / software] and let someone else put in the phone infrastructure [hardware] while [they] collected the by the minute and distance (and even importance of the call if [they] could have figured a [way] to monitor it) in perpetuity… but I don’t feel I am capable of assessing probabilities [], except to the extent that with a gun to my head and forced to make a guess”
-- Warren Buffett’s email to Jeff Raikes of Microsoft (1997)
Perhaps we are now entering an investment world defined by platforms whose dominance is predictable in S-curves, resulting in rapid, dominant growth over shorter, predictable periods of time, until the paradigm changes? Or perhaps corporations are becoming “cities”, whose manageable complexity creates more robust businesses which are harder to displace, as our Amazon example might suggest? Could the computing-cloud be the next city built in the digital-world, laying the sewers, roads, legal codes, and institutions of complex networks, by leveraging the scale of atoms, into the complexity of bytes, enveloping the physical world in a seamless computing fabric? Will new regulations recognize that bytes are really atoms, and entirely change the rules of the game?
The investment solution seems to revolve around acknowledging these new realities:
- software over hardware;
- a focus on either hyper-scale businesses or hyper-local “information” businesses that are not easily digitized;
- because information asymmetries have narrowed, investing that is less about finding a cheap multiple and more about finding what is not obvious to a computer.
Yet many investors may inadvertently be betting on businesses of software and information hidden behind a facade of hardware. It seems the bricks and mortar of many businesses are really information problems in disguise.
Thanks to H.S. and N.L. who helped me think through many economic problems of the (internet) cloud; K.P., the bean-counting actuary, who kindly pointed out that prediction is really inference, algorithms are a class of computation, and data, information and knowledge are all different things; and R.D. for extensive feedback and edits.