Publication: Fast probabilistic file fingerprinting for big data

Published on
18 March 2013

Next generation sequencing and other data acquisition technologies are generating terabytes of data.

Even the mundane task of transferring and comparing files in storage and over computer networks is time-consuming. The paper presents a method that significantly reduces the computational resources needed to compare big data files. We show statistically that sampling a subset of the biological data inside a big data file is enough to identify that file uniquely.
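
The sampling idea can be illustrated with a minimal sketch. This is not the paper's algorithm: the function name `sample_fingerprint`, its parameters (`n_samples`, `chunk_size`, `seed`), and the choice of SHA-256 are all assumptions made for illustration. The sketch hashes the file size together with a fixed pseudo-random selection of short byte ranges, so the cost depends on the number of samples rather than on the size of the file.

```python
import hashlib
import os
import random


def sample_fingerprint(path, n_samples=64, chunk_size=16, seed=0):
    """Hash the file size plus a fixed pseudo-random sample of byte ranges.

    Hypothetical helper for illustration; parameter defaults are arbitrary.
    """
    size = os.path.getsize(path)
    digest = hashlib.sha256()
    # Including the size means files of different length never match.
    digest.update(str(size).encode("ascii"))
    if size == 0:
        return digest.hexdigest()

    # A fixed seed gives the same offsets for every file of the same size,
    # so two copies of one file are sampled at identical positions.
    rng = random.Random(seed)
    last_start = max(size - chunk_size, 0)
    offsets = sorted(rng.randint(0, last_start) for _ in range(n_samples))

    with open(path, "rb") as handle:
        for offset in offsets:
            handle.seek(offset)
            digest.update(handle.read(chunk_size))
    return digest.hexdigest()
```

Two files can then be compared by fingerprint instead of byte by byte, for example `sample_fingerprint("a.fastq") == sample_fingerprint("b.fastq")` (file names hypothetical). In this sketch a change that falls entirely outside the sampled ranges goes undetected, which is why such a fingerprint is probabilistic rather than exact.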

