Insight

COPIED Act of 2024: Protecting Creative Works in the AI Era

Executive Summary 

  • In July 2024, a bipartisan group of senators introduced the Content Origin Protection and Integrity from Edited and Deepfake Media Act (COPIED Act) to promote transparency in artificial intelligence (AI)-generated content and protect creators from having their digital representations used or altered by AI.
  • The COPIED Act would direct the National Institute of Standards and Technology (NIST) to develop standards that facilitate the detection of synthetic content and would prohibit the use of protected material to train AI models or generate AI content, giving creators more control over their images, likenesses, and copyrighted material.
  • As Congress debates the bill, it should address unresolved questions about copyright protections for AI training data, as well as the bill's potential impacts on free speech and AI innovation; without this information, NIST could struggle to establish effective guidelines.

Introduction  

As artificial intelligence (AI) takes on a larger role in content creation, distinguishing between human-created works and AI-generated content grows more difficult, raising concerns within the creative industries about originality and copyright. In response to this challenge, a bipartisan group of senators introduced the Content Origin Protection and Integrity from Edited and Deepfake Media Act (COPIED Act) to promote transparency in AI-generated content and protect creators from having their works and digital representations used or altered by AI.

The COPIED Act would direct the National Institute of Standards and Technology (NIST) to develop standards that facilitate the detection of synthetic content and would prohibit the use of protected material to train AI models or generate AI content, giving creators more control over their material. Because these provisions would give creators and other users greater control over their images, likenesses, and copyrighted works, the bill has received broad support from major stakeholders in the creative industries, who have particularly praised the legislation's provisions allowing owners of copyrighted works and state attorneys general to take legal action against those who misuse copyrighted content.

The bill is not without concerns, however. First, by essentially prohibiting the training of AI models on data bearing content provenance marks, the bill would undermine copyright law and could raise First Amendment issues, as it effectively eliminates the fair use doctrine that permits certain uses of copyrighted material. Second, by making it illegal to knowingly alter or disable content provenance information, the bill could inadvertently discourage legitimate and valuable forms of expression, such as parody and satire, as well as other speech that builds on existing works. Finally, by increasing potential liability for AI developers, the bill could harm innovation and the development of new models, as firms would no longer have access to the training data needed to improve their systems. Many of these issues are currently being litigated in the courts and have generated legislative and academic interest. To ensure that NIST does not struggle to establish effective guidelines, Congress should not overlook concerns with the bill in its current form.

The COPIED Act 

The COPIED Act contains two main components. First, the synthetic content detection component would foster the development of standards and research advancing the detection of synthetic content, watermarking, and content provenance information; require AI developers to give users the ability to attach provenance information indicating the nature of the content; and prohibit the removal, alteration, or disabling of this origin information. Second, the copyright component would allow creators to prohibit the use of content bearing provenance information for AI training or for generating synthetic outputs without consent or compensation.

The bill is designed to increase transparency and safeguard the originality of creative works, protecting the creative industry from AI theft. By developing standards that facilitate the detection and labeling of synthetic or synthetically modified content and by promoting research that advances these practices, the bill would help the creative industry better manage its images, voices, and likenesses, and give general users the tools to distinguish original, legitimate works from synthetically created pieces.

Further, by requiring AI developers to equip users with the ability to include provenance information and prohibiting the use of such content for AI training, the bill would seemingly preempt potential fair use arguments currently being considered by courts. In theory, reproducing a copyrighted work to train an AI model would infringe the copyright, but the fair use doctrine permits the use of copyrighted works for purposes such as commentary, news reporting, teaching, and research. The debate centers on whether training AI with copyrighted materials qualifies as fair use or constitutes copyright infringement.

Finally, the bill could also help consumers identify AI-generated content. AI tools can digitally represent anyone and alter depictions of public figures to create new content that deceives or spreads misinformation. This has led to increased calls for tools that facilitate the identification of AI-generated content and trace content provenance: the origin and history of a piece of digital content, including who created or edited it and how, when, and where. By improving access to content provenance, the bill could make it easier to identify mis- and disinformation.
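To make the idea of content provenance more concrete, the sketch below shows what a minimal, machine-readable provenance record might look like and how a cryptographic hash can bind that record to a specific piece of content. This is a hypothetical illustration only: the field names and structure are assumptions for this example, not the C2PA specification or any format the COPIED Act or NIST would actually mandate.

    import hashlib
    import json
    from datetime import datetime, timezone

    def make_provenance_record(content: bytes, creator: str, tool: str) -> dict:
        """Build a minimal, hypothetical provenance record for a piece of content.

        All field names here are illustrative assumptions, not a mandated standard.
        """
        return {
            "creator": creator,      # who created the content
            "tool": tool,            # the software used to create or edit it
            "created_at": datetime.now(timezone.utc).isoformat(),  # when it was made
            "edits": [],             # a history of later modifications
            # The hash ties this record to the exact bytes of the content,
            # so undisclosed changes to the content become detectable.
            "content_sha256": hashlib.sha256(content).hexdigest(),
        }

    def verify_provenance(content: bytes, record: dict) -> bool:
        """Check that the content still matches the hash stored in its record."""
        return hashlib.sha256(content).hexdigest() == record["content_sha256"]

    if __name__ == "__main__":
        original = b"raw bytes of an original photograph would go here"
        record = make_provenance_record(original, creator="Jane Doe", tool="ExampleCam 1.0")
        print(json.dumps(record, indent=2))

        # A synthetic modification made without updating the record breaks the binding.
        altered = original + b" (synthetically modified)"
        print("original verifies:", verify_provenance(original, record))  # True
        print("altered verifies:", verify_provenance(altered, record))    # False

In this simplified model, stripping or rewriting the record is what severs the link between a work and its origin; the detection and watermarking standards the bill directs NIST to develop are aimed at making that link harder to break unnoticed.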

Concerns With the Bill  

The COPIED Act raises several concerns regarding copyright, free speech, and innovation in AI. First, courts are currently considering whether AI training on copyrighted material constitutes fair use, and by dismissing fair use arguments entirely, the bill could undermine copyright law and the First Amendment. AI training data consists of information used to teach models to make accurate predictions or decisions. This data ranges from text, such as literary works and academic papers, which helps AI models understand and generate human language, to images and videos. Some of these sources may be copyrighted, but AI developers rely on the fair use doctrine, which permits specific uses of copyrighted works. The bill's prohibition on training AI models with data marked with content provenance raises significant concerns about using existing works to train models and the free speech rights involved. Declaring that content bearing provenance information may not be used for new creations or AI model training without consent challenges the fair use principle under copyright law and could intensify concerns about AI development.

Second, by limiting the ability to remove content provenance or watermarks, the bill could restrict free expression in outputs, specifically transformative works. Copyright may limit speech by restricting the reproduction of creative works without permission. Fair use, however, exists to safeguard against these restrictions, focusing on whether the use is "transformative," meaning it adds new expression, meaning, or a different purpose to the original work. By making it illegal to use digital representations of copyrighted works with content provenance to train AI or create synthetic content, the bill could therefore inadvertently discourage legitimate and valuable forms of expression such as parody and satire. These creative forms sometimes use existing content in a transformative way, which might involve changing the original information in a manner that could be seen as altering provenance.

Third, beyond addressing the specific challenges AI presents to various fields, policymakers face the broader task of mitigating AI's risks while ensuring its continued development. Data is the foundation for creating and improving AI models, and the quality and quantity of that data are important to ensure systems make correct predictions and are free of bias. Content provenance information, which tracks the origin and history of content, combined with the ambiguity of what constitutes "fair use," can restrict access to data, increase uncertainty for AI developers about whether they can legally use certain datasets, and potentially hinder the development of AI. By increasing potential liability for AI developers, the bill could harm innovation and the development of new models, as firms would no longer have access to the data needed to develop their systems.

Finally, the bill may be premature, as a growing number of lawsuits against AI companies may provide direction for some of the bill's provisions. Additionally, earlier this year, the U.S. Copyright Office issued a letter highlighting its commitment to addressing key copyright concerns related to AI. It would therefore be wise for Congress to hold off on legislation until such legal, private, and academic actions can provide additional guidance. Clear, well-informed legislation will be critical to ensure that agencies such as NIST can create effective guidelines that do not unnecessarily stifle AI innovation.
