Data archiving: a double interview

In 2013 the Wageningen UR Library began to support the archiving of research data. Frits van Evert, researcher at Agrosystems Research, supports archiving research data in a sustainable way. For this reason, he works as a kind of ambassador, visiting a number of PSG chair groups and business units to explain the advantages and the ins and outs of a good data archive. Library employee Anne-Marie Patist studies the archiving processes and the information they require. Together they’ve carried out a few pilots. In a double interview, they discuss the how and why of data archiving.

Why data archiving?

Frits: “The research is finished, the article is written. What happens to the research data then? The raw data is saved on a computer or a CD-ROM and can only be used with extra information. If the researcher leaves, the research data is as good as useless. That’s a pity, since a lot of material has been collected but no one can do anything with it!”

Why should a researcher archive data?

According to Frits, data archiving offers the researcher several advantages:

You’ve organized your own measurement data.
The data come from scientific research, so scientific integrity is involved: others have to be able to repeat your research (reproducibility), or at least be able to check the analysis of the data.
Your data can be used in other research, for instance in a meta-analysis or in building a model.
An article with the data added inspires more trust, and that has been shown to increase citations of your work.
It helps you provide evidence if you become involved in legal proceedings.

What support does the Library offer? Does it reduce the researcher’s work?

Anne-Marie: “Whether we reduce a researcher’s work is still difficult to say. Archiving research material does, of course, cost the researcher time. We email or call back and forth because we need specific information. A ‘Read me’ text describing the research data has to be put together. And the methods used have to be written down.” Frits: “This extra time investment depends, of course, on the extent to which the data is already described. For example, how are the columns in an Excel file defined? Does the dataset contain field-specific terms and, if so, are these explained somewhere? Do the data contain a lot of ‘non-information’, and so on?”

What does data archiving entail?

Frits has experience with depositing datasets in the Library: “I wrote a paper on processing images to recognize a certain type of weed. Then I offered the accompanying data, the images that I had on my computer, to the Library in a zip file, and now those images are in the E-depot. It wasn’t a lot of work because all the accompanying information was already nicely detailed in the article. The publisher’s website now has a link next to my article that goes to the visual material in the E-depot. For another article, on extra fertilization of potatoes, I placed the data as a text dump in the E-depot before the article was published. Because I did this, the E-depot link could immediately be attached to the article, and the link is now not only on the publisher’s website but also in the printed version of my article. You help yourself by saving material this way: I still use these data, and because I’ve properly archived them, I can always access them.”
Anne-Marie adds: “The E-depot does now hold datasets, but they won’t stay there! We are working on sustainably archiving datasets, accompanied by a ‘read me file’ and a ‘methodology file’, in national archives like DANS (Data Archiving and Network Services) and 3TU Datacenter Delft. The data have to remain available over the long term. That’s why the data are converted into sustainable formats before they are sent to the national archives. These formats are independent of specific versions of software. After the datasets are checked by DANS or 3TU Datacenter employees, they are published. The persistent link received from the archive centre is then the only thing that is stored in the E-depot. We use this link in our systems (Staff Publications and E-depot). This entire process takes time, but it must not come at the expense of the time dedicated to publishing. If researchers want to refer to the data in an article, the E-depot link can be created and given to the author before the paper is published.”
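
As a rough illustration of what such a package could look like, the sketch below writes a dataset as plain CSV together with a read me file and a methodology file. It is only a minimal sketch under assumptions: the function name, the file names, the read me layout and the example columns are all hypothetical, and none of them are prescribed by the Library, DANS or 3TU Datacenter.

```python
import csv
from pathlib import Path


def write_archive_package(out_dir, rows, column_descriptions, methodology):
    """Write data.csv, README.txt and METHODOLOGY.txt into out_dir.

    File names and README layout are illustrative assumptions,
    not requirements of any archive.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    columns = list(column_descriptions)

    # Plain UTF-8 CSV does not depend on a specific software version.
    # DictWriter raises ValueError if a row has a column with no description.
    with open(out_dir / "data.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)

    # The read me file defines every column, so the data remain usable
    # even when the researcher is no longer around to explain them.
    readme_lines = ["Columns in data.csv:"]
    for name in columns:
        readme_lines.append(f"- {name}: {column_descriptions[name]}")
    (out_dir / "README.txt").write_text(
        "\n".join(readme_lines) + "\n", encoding="utf-8"
    )

    # The methodology file records how the data were collected.
    (out_dir / "METHODOLOGY.txt").write_text(methodology + "\n", encoding="utf-8")

    return sorted(p.name for p in out_dir.iterdir())
```

Calling the function with a list of row dictionaries and a dictionary that maps each column name to its description produces the three files in one folder. Requiring a description for every column is a deliberate choice here: it forces the questions Frits raises about column definitions to be answered at the moment the data are packaged.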

How do I retrieve my material?

In the registration system Metis and in the database Staff Publications, you can find a compound document in which the article and its datasets are combined. The individual dataset is also visible in the publication list. From Staff Publications, the information on datasets also goes to other places, such as the national system Narcis, as well as to Google and Google Scholar.

And what if I don’t want to make my data publicly accessible?

Frits: “Of course, there are reasons not to place datasets in the public domain. You may still want to use your data in your own research, or you may not want to show your data to the competition. In that case, you can place your datasets under embargo. Once you’ve done this, people who want to access the data will have to contact you to get access. You determine who sees your data!”

In your opinion, what do you need for good data archiving?

Considering their experience with archiving ‘old’ datasets, Frits and Anne-Marie have clear thoughts on this: “At the start of your research, think about what data you’re going to collect and draw up a plan. This is called data management planning. Along with the Wageningen Graduate Schools, the Library organizes courses on data management planning. The courses also offer practical tips on archiving your data. Following these tips makes reusing data and the data archiving process easier!”