This article was originally published with The McCarthy Institute for the 2022 IP-Con in March 2022.
To ensure an AI system’s protection as IP, attorneys must work with data scientists to understand which component(s) of their AI system are the key factors driving differentiation and, critically, why.
With the rise of artificial intelligence (AI), protecting the value of AI innovations is becoming an increasingly important topic in IP law. Complicating this is the advent of data-centric AI, a new movement focusing on data as the main value driver for AI, with the code, algorithms, and models secondary.
In a first-of-its-kind case, Health Discovery Corp. v. Intel Corp., the U.S. District Court for the Western District of Texas issued a ruling with implications for how AI can be protected. Specifically, it found that “an inventive concept could not be found in inventions that simply improved data quality, reduced error rate and yielded more accurate data.” Ensuring the protection of an AI system’s components (data and algorithms) is challenging because IP law regards these components as mathematical methods and facts, which are not protected works. To solve this, the AI system’s components need to be linked to an improvement in a task under patent law, a unique arrangement under copyright law, and a commercial advantage under trade secret law.
Data as the Value Driver for AI
Historically, AI development has been more model-centric. The value of an AI system has been attributed to its algorithms: code, models, and methods were the key components for improving a task’s accuracy. However, over the past decade, immense innovation in algorithms, and in the open-source libraries that implement them, has commoditized models for most practical use cases. This has led to a data-centric AI movement, in which data scientists regard the data used to train the algorithm (“training data”) as the primary value driver for an AI system.
AI systems learn to solve tasks by first “observing” many examples of training data which describe the domain, problem, and solution for the given task. Specifically, for supervised machine learning, the training data is composed such that each example has features that the AI system studies and an outcome that the AI system learns to predict. Because the system learns from examples, an increase in the volume of data collected generally results in a more predictive AI system, particularly for deep learning-based systems.
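For readers less familiar with the mechanics, the feature-and-outcome structure described above can be sketched in a few lines of Python. This is only an illustrative sketch: the `Example` type, the customer-churn data, and the nearest-neighbor rule are hypothetical choices made for brevity, not a method discussed in this article or in the case law.

```python
from dataclasses import dataclass
from math import dist
from typing import List, Tuple


@dataclass(frozen=True)
class Example:
    features: Tuple[float, ...]  # what the system studies
    outcome: str                 # what the system learns to predict


def predict(training_data: List[Example], features: Tuple[float, ...]) -> str:
    """Return the outcome of the closest training example (1-nearest-neighbor)."""
    nearest = min(training_data, key=lambda ex: dist(ex.features, features))
    return nearest.outcome


# Hypothetical toy data: (hours of use per week, support tickets filed) -> outcome
training_data = [
    Example((1.0, 5.0), "churn"),
    Example((2.0, 4.0), "churn"),
    Example((9.0, 0.0), "stay"),
    Example((8.0, 1.0), "stay"),
]

# Predictions for two customers the system has not seen before
print(predict(training_data, (1.5, 4.5)))
print(predict(training_data, (8.5, 0.5)))
```

The prediction logic here is generic and freely available; everything the system "knows" comes from the examples it was given, which is the intuition behind treating training data as the value driver.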
Research has shown, however, that volume is not the most important characteristic of the data used to train an AI system. Many other characteristics associated with data quality, such as correctness, completeness, and relevance, play an important role in ensuring an AI system’s success. That is, the way in which data is collected, linked, modified, adapted, and weighted to train an AI system (“data formation”) is critical to the system’s outcome. In fact, treating data collection and curation as a process linked to improvements in AI outcomes may be key to ensuring the patentability of future AI innovations.
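Quality characteristics such as completeness and correctness can, at least in simple cases, be measured directly. The sketch below is a hypothetical illustration: the two checks and the toy rows are assumptions chosen for clarity, not standard definitions and not a framework proposed in this article.

```python
from collections import defaultdict
from typing import List, Optional, Tuple

# Each row is (features, outcome); None marks a missing value.
Row = Tuple[Tuple[Optional[float], ...], Optional[str]]


def completeness(rows: List[Row]) -> float:
    """Fraction of rows with no missing feature and no missing outcome."""
    if not rows:
        return 0.0
    complete = sum(
        1
        for features, outcome in rows
        if outcome is not None and all(v is not None for v in features)
    )
    return complete / len(rows)


def conflicting_features(rows: List[Row]) -> List[tuple]:
    """Feature tuples recorded with more than one outcome (a correctness warning sign)."""
    outcomes = defaultdict(set)
    for features, outcome in rows:
        if outcome is not None:
            outcomes[features].add(outcome)
    return [f for f, seen in outcomes.items() if len(seen) > 1]


# Hypothetical toy rows
rows = [
    ((1.0, 5.0), "churn"),
    ((1.0, 5.0), "stay"),   # same features, different outcome: one label is likely wrong
    ((9.0, None), "stay"),  # missing feature value
    ((8.0, 1.0), "stay"),
]

print(completeness(rows))          # 0.75
print(conflicting_features(rows))  # [(1.0, 5.0)]
```

Checks of this kind are one way a data formation process could be documented and tied to a measurable improvement, rather than described only as "better data."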
Defining Unique Properties of Data to Support IP Protection
The literature has discussed data value in the context of measuring AI task contributions, determining market equilibria, and assessing business competition, among others. Critically, however, a framework for assessing the value of data does not exist in IP law. Such a framework could serve as a standard tool in IP law’s case-by-case assessment of the value, originality, and contributions of a work.
High-quality data can be thought of as an intermediate product in the generation of AI outcomes. Collecting the data, curating the data, and training the model on the data are all pieces of an AI innovation that independently and collectively improve outcomes. To describe how an improvement in data formation leads to an improvement in an AI system’s outcome, a framework for data value and its associated impact on AI outcomes is needed.
All data has a shelf life, and the world’s most successful technology companies focus on their data formation processes and flows of data, rather than on a specific stock of data at any point in time. By shifting industry thinking towards data formation processes as protectible works, and by understanding how an improved data collection process produces better data and, in turn, better AI outcomes, the value of innovations can be better captured.
 “Time Dependency, Data Flow, and Competitive Advantage,” Harvard Business School