Last week, the Dutch research world was startled by the revelation of a serious case of fraud. The data used in psychological research turned out to be fictitious. My position is that the widespread use of the computer leads to ‘creative research,’ but that it can also contribute to the prevention of fraud. In fact, the good use of digital media can improve the quality of research.
The tendency to polish up research data or even to fabricate it arises primarily as a result of the great pressure placed on researchers to constantly produce new and striking results. Standing out within the world of science and academia as well as within society as a whole is a precondition in today’s world for receiving recognition and funding. The impact of the research thus becomes more important than the methodology.
At the same time, the computer has penetrated right into the capillaries of research practice. This was made possible particularly by the fact that the computer allows the fast analysis of data. It has become very easy to carry out a few statistical analyses on a set of measurement points, to produce some attractive graphs to accompany the analysis and to identify links. It is no surprise that tools such as Excel, MATLAB and SPSS are so popular. The computer is therefore used primarily as a number cruncher. The problem is that the descriptive side of research is overlooked as a result. Moreover, the process of data collection is invisible to outsiders. Research data is stored in individual computer files, spread over different systems. If the data can be retrieved at all, it is often little more than numbers in tables, with no explanation of the parameters, methods, materials, etc. that were used. There is no careful and cohesive description of the complete set of observations, as would have been the case in a traditional lab journal.
Due to recent developments within the world of information science, it is becoming possible to recover the descriptive element of good research. In this regard, we have to make a distinction between (1) the careful making of notes, in such a manner that other people would in principle be able to interpret and use the data – in other words, basic good research practice; and (2) making the data accessible, if necessary subject to restrictions. The distinction is important, as an objection to the one is often used as an argument not to do the other. In practice, researchers show considerable resistance to producing well-annotated and easily retrievable datasets. ‘That takes up too much of my time,’ ‘other people might steal my data,’ ‘the experiment failed,’ ‘my data wouldn’t be any use to anyone else,’ ‘no one does that, so why should I?’ – these are a few of the standard responses.
New computer applications could remove many of these objections, however. They help to describe the data, its coherence and its origin. They are ideal for laying claim to data and results at an early stage and for making these known. It becomes possible to trace and verify the material on which publications are based. As a result, two essential principles of scientific research resurface, namely verifiability and reproducibility. Standard publications do not offer this option at a satisfactory level. Research institutes invest heavily in attractive and expensive project management tools. They now need to take the step towards looking after and using the research data itself; after all, that is the actual ‘treasure chest’.
Can fraud be combated in this way? No; ultimately, people will always be able to find ways of distorting data. However, the temptation to do so can be reduced by demanding a careful description of research materials and by making them accessible in principle. This also boosts the quality of the research and promotes cooperation!
Jan Top, Senior researcher at Wageningen UR Food & Biobased Research and professor of Knowledge Management in Agrifood at the VU University Amsterdam.