Data Documentation

It is essential to systematically document your data during research and when depositing your dataset into an online repository. This allows you, as well as current and future users, to find, understand and reuse your data.

Detailed and clear data documentation (i.e. proper description, annotation and contextual information) improves the quality of your data and is key to making your data publishable, discoverable, citable and reusable. In general, your documentation should answer the following questions:

1.    What does my dataset contain? What abbreviations were used and what do they mean?
2.    How were my data collected, and how much was collected? Who collected the data? When were they collected?
3.    How were my data processed? What software is needed to read them?

The most commonly required form of data documentation is a README.txt file added to your dataset/data package.
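
As an illustration, the three questions above can be turned into a README.txt skeleton. The sketch below is a minimal example, not a prescribed template: the section names (CONTENTS, COLLECTION, PROCESSING) and the function `build_readme` are hypothetical and chosen only to mirror the questions.

```python
# Hypothetical README skeleton: the section names mirror the three
# documentation questions; they are not an official or required format.
README_SECTIONS = [
    ("CONTENTS", "What the dataset contains; abbreviations and their meaning."),
    ("COLLECTION", "How, how much, by whom and when the data were collected."),
    ("PROCESSING", "How the data were processed; software needed to read them."),
]


def build_readme(title, answers=None):
    """Return README text with one section per documentation question."""
    answers = answers or {}
    lines = [title, "=" * len(title), ""]
    for name, prompt in README_SECTIONS:
        lines.append(name)
        # Use the supplied answer, or leave the prompt as a TODO marker.
        lines.append(answers.get(name, "TODO: " + prompt))
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder title and answer, for illustration only.
    print(build_readme("Example dataset", {"CONTENTS": "Two CSV files."}))
```

Sections that have not been filled in remain visible as TODO items, which makes gaps in the documentation easy to spot before the dataset is deposited.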

Keeping digital research notes
Another way to document your data is to keep digital research notes, for which several types of software are available. A wide range of so-called 'Electronic Laboratory Notebooks' (ELNs) exists. For an overview, see for example Dirnagl and Przesdzing (2016), A pocket guide to electronic laboratory notebooks in the academic life sciences. Note that an ELN is not the same as a 'Laboratory Information Management System' (LIMS). A LIMS is meant to track all the measurements, samples and protocols that are processed day by day in a laboratory. An ELN, by contrast, might store a note recording that certain samples were sent to the laboratory for a specific research project on a specific date, when the results came back and where the resulting files can be found.

A number of groups have developed a workable way of using general-purpose note-taking applications. Here we share several "Tips & tricks" and examples for OneNote that were kindly made available by the Food Process Engineering Group:

Metadata: machine-readable data documentation
A key form of data documentation is metadata, in other words “data about data”. Metadata are characteristics that describe the data and so facilitate cataloguing and discovery. When you deposit your data in a trusted data repository, the repository generates machine-readable metadata. This makes it easier to search for and find, for example, documents written by a certain author. Metadata include, among other things, the dataset title, temporal and spatial coverage, creator(s) and contributor(s) with affiliation(s), terms of use and access conditions.
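
To make "machine-readable" concrete, the sketch below writes the elements listed above as a JSON record. It is an illustration under stated assumptions: the field names, the function `make_metadata` and all values are hypothetical placeholders, and a real repository will apply its own metadata schema.

```python
import json

# Field names are illustrative assumptions, not a repository schema.
REQUIRED_FIELDS = (
    "title",
    "creators",
    "temporal_coverage",
    "spatial_coverage",
    "terms_of_use",
    "access_conditions",
)


def make_metadata(**fields):
    """Return a JSON metadata record, refusing incomplete descriptions."""
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError("missing metadata fields: " + ", ".join(missing))
    # Contributors are optional; default to an empty list.
    fields.setdefault("contributors", [])
    return json.dumps(fields, indent=2, sort_keys=True)


# Placeholder values, for illustration only.
EXAMPLE = make_metadata(
    title="Example dataset",
    creators=[{"name": "A. Researcher", "affiliation": "Example University"}],
    temporal_coverage="2020-01-01/2020-12-31",
    spatial_coverage="Example region",
    terms_of_use="CC BY 4.0",
    access_conditions="open",
)

if __name__ == "__main__":
    print(EXAMPLE)
```

Because the record is structured, software can search or filter on individual fields such as the creator or the coverage period, which is what makes catalogued datasets discoverable.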

Preparing data documentation for publication
The WUR provides data publishing services; we can assist you in preparing dataset documentation and compiling relevant metadata. Contact the Datadesk for support.

For PhD candidates and postdocs, the Graduate Schools offer a Research Data Management course, which is organised by WUR Library and given four times a year. The course covers various aspects of data management, including data documentation and metadata. More background information can be found here.