Almost 20 years ago, a startup named Jigsaw pioneered a new crowdsourcing model where users would contribute data to a platform in exchange for access to its services. Jigsaw is largely forgotten today, but its so-called “give-to-get” model could be perfect for AI startups that need to obtain rich proprietary datasets to train their models. These datasets are crucial to improve accuracy and performance of AI models, to provide a competitive advantage over rivals, to allow customization and specialization for industry-specific needs, and to reduce reliance on third-party data sources. This post will discuss the Jigsaw model, its applicability to AI, the challenges in obtaining proprietary training datasets, and the industry verticals where it could be applied.
Jigsaw and the Give-to-Get Model
Jigsaw Data Corporation was founded in 2004 by Jim Fowler and Garth Moulton. The company’s primary product was a large, crowdsourced, searchable database containing millions of business contacts across various industries. In the days before everyone had a LinkedIn profile, this was particularly valuable to sales people looking for prospects.
Jigsaw’s crowdsourcing model revolved around a points system. Users could create a free account on Jigsaw’s platform by contributing their own business contact information. They could also add new contacts to the database to earn points, which they could then spend to see contacts posted by others. Users who didn’t want to contribute their own data could purchase points. Jigsaw also encouraged users to verify the accuracy of contact information in the database by rewarding them with points for each correction made.
In 2010, Salesforce.com acquired Jigsaw for $142 million, rebranded it as “Data.com” and integrated it with the Salesforce.com ecosystem. This allowed users to access updated business contact information within their CRM system directly.
The Give-to-Get Model for AI
The give-to-get model, where users earn points by contributing data and spend points to access services based on that data, could be an effective approach for AI startups. In many industry verticals, obtaining a rich proprietary dataset will be the key challenge to producing a differentiated AI model. By incentivizing professionals in that industry to share the necessary data, AI startups can rapidly train and improve their models to serve those professionals.
For example, an “AI architect” startup could give users points for contributing architectural plans and CAD drawings. Users could then spend points by asking the AI to design new plans. This approach could be used across a variety of industries where users have proprietary data and would be willing to contribute some of it in exchange for harnessing AI capabilities.
Incentivizing users to crowdsource could be a cost-effective way to acquire large amounts of data, as it leverages the efforts of a community rather than relying on paid data collection services. As users contribute more data and use the AI’s services, the model can be iteratively improved, leading to better performance and more valuable insights.
There will be some important issues to work out. Ensuring the quality and accuracy of the contributed data is critical. Startups may need to implement verification processes, such as peer review or expert validation, to maintain data quality. Handling proprietary data also comes with privacy and intellectual property concerns that will need to be addressed. Startups may need to ensure that certain data will be used for training purposes only and be transparent in how the contributed data is used. Compliance with industry-specific regulations is also crucial.
Finally, the need to monetize must be balanced against the points-based system; otherwise users may prefer to use the platform for free forever by contributing data rather than paying for services. Points could be limited so that users earn a discount or are awarded more queries as opposed to receiving the service completely for free.
Opportunities in Diverse Industries
A give-to-get, crowdsourced data collection approach could be applied to a number of industry verticals where the target users are in possession of the training data. Here are some examples of where this approach could be useful:
- Medical and health data: AI models can greatly benefit from access to diverse patient data, such as electronic health records, medical imaging, and genomic data. Users (patients or healthcare professionals) might be willing to share anonymized data in exchange for points, which could then be used to access AI-driven health insights, personalized treatment suggestions, or early disease detection.
- Legal document analysis: Law firms and legal professionals often have access to large collections of legal documents, such as contracts, court rulings, or patent filings. By sharing these documents, users could contribute to training AI models for legal document analysis, and in return, gain access to AI-driven legal research tools or contract review services.
- Art and creative work: Artists and designers may possess large collections of their own artwork, sketches, or designs. Sharing this data could help train AI models for artistic style transfer, generative art, or design assistance. Users could then access AI-driven creative tools or personalized design suggestions.
- Finance and investment: Financial professionals and investors may have access to proprietary trading algorithms, portfolio data, or market analysis reports. By sharing this data, they could contribute to AI models for financial analysis and predictions. In return, users could gain access to AI-driven investment advice, risk assessment, or market forecasting tools.
- Scientific research data: Researchers in various fields might have access to valuable datasets generated through experiments or simulations. By sharing this data, they can help train AI models for data analysis, pattern recognition, or predictive modeling in their respective domains. Users could then access AI-driven research tools or personalized research recommendations.
- Manufacturing and production data: Companies involved in manufacturing and production may possess proprietary data on production processes, quality control, and equipment performance. Sharing this data could improve AI models for predictive maintenance, process optimization, and quality assurance. Users could then gain access to AI-driven optimization suggestions or equipment monitoring services.
Obtaining rich proprietary training datasets will be the key challenge for startups looking to create AI models for industry verticals. Crowdsourcing this data from the professionals who work in these industries may be an excellent approach to solve this problem. Moreover, crowdsourcing should create a flywheel: as users contribute data to the model, the model gets smarter and more capable, which draws in the next set of users, who provide the next set of data. This data network effect should create a strong moat around the business. That said, potential risks or downsides related to the give-to-get model, such as data quality, privacy, and intellectual property concerns, must be addressed proactively by startups in order to ensure the long-term success of their AI models.
I used ChatGPT-4 to help me write this blog post. You can see how we worked together here. I look forward to your comments.