
Training artificial intelligence with copyright-protected images

Published on June 30, 2021

The Copyright Information Point (CIP) receives many interesting questions that could easily be on an exam about Dutch Copyright Law. In the current and upcoming WUR Library Newsletters, we will highlight some of these because they're too interesting to keep to ourselves. Today's question is about training artificial intelligence and text and data mining.

The question

Can you use copyrighted images to train artificial intelligence (AI)?

The CIP's answer

Since 7 June 2021, members of the WUR community have been allowed to use copyright-protected material to train AI systems. Let’s have a look at what has changed and what this means for you in terms of text and data mining.

Training artificial intelligence with image datasets

To train an AI with images, you need to create a training dataset that contains the images you want to feed into the algorithm. These images may have been scraped from the internet and will likely include works that are protected by copyright. Downloading the images and compiling the training dataset counts as reproducing those works. This is not allowed unless you have permission for all copyright-protected materials or can rely on a statutory exception.

New statutory exception from 7 June 2021

On 7 June 2021, a statutory exception came into effect for research organisations and cultural heritage institutions: they are allowed to use text and data mining for scientific research without asking permission from or compensating the right holders. This exception was created through the implementation of the European Digital Single Market Directive.

This means that since that date, WUR has been allowed to use copyright-protected images to train AI without infringing copyright. The exception, however, is subject to the following restrictions and limitations:

The exception only applies to reproducing the protected works. The dataset may not be made publicly available or shared beyond WUR researchers.
The materials are used for scientific research purposes.
The materials are properly secured by appropriate technical measures, including access on a need-to-know basis. Furthermore, the materials may be retained only as long as they are needed for the research and validation.
The materials were legally obtained. Scraping datasets is not allowed if you know or should know that the dataset contains materials that were put online without the owner's consent.
In some cases, the right holders (e.g. publishers) may attach terms to the use of their data, such as asking you to consult them before scraping so that you don't overload their servers.
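
As a rough illustration, the conditions listed above could be recorded per dataset and checked before a training run. The Python sketch below is only a memory aid under stated assumptions, not a legal test: the class TdmDataset, its fields, and the function unmet_conditions are hypothetical names and not part of any WUR tool. Terms set by right holders differ per source, so they are not modelled here.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class TdmDataset:
    """Hypothetical record describing an image training dataset."""
    name: str
    scientific_research: bool   # used for scientific research purposes
    lawfully_obtained: bool     # no known unlawfully uploaded material
    access_restricted: bool     # secured, access on a need-to-know basis
    shared_outside_wur: bool    # published or shared beyond WUR researchers
    retain_until: date          # assumed end of research and validation


def unmet_conditions(ds: TdmDataset, today: date) -> List[str]:
    """Return a description of each listed condition the record fails."""
    problems = []
    if ds.shared_outside_wur:
        problems.append("dataset is shared beyond WUR researchers")
    if not ds.scientific_research:
        problems.append("use is not for scientific research")
    if not ds.access_restricted:
        problems.append("materials are not adequately secured")
    if today > ds.retain_until:
        problems.append("retention period has passed")
    if not ds.lawfully_obtained:
        problems.append("materials were not lawfully obtained")
    return problems


# Example with made-up values: a dataset that meets every modelled condition.
example = TdmDataset(
    name="crop-images",
    scientific_research=True,
    lawfully_obtained=True,
    access_restricted=True,
    shared_outside_wur=False,
    retain_until=date(2022, 6, 30),
)
print(unmet_conditions(example, today=date(2021, 6, 30)))  # prints []
```

An empty list only means that none of the modelled conditions is flagged; whether the exception applies in a specific case remains a question for the CIP.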

If you have any questions about copyright law, please feel free to send them to copyright@wur.nl. Don't worry! We will only write about your question with your permission.

Note: WUR is currently working on a Software IP policy and this will include text and data mining. Stay tuned!