Thesis subject

Predictive Analytics in Data Mining of Chemical Profiles


Research area/discipline: Data Science

Prerequisites: Programming in Python, Big Data, and Machine Learning

Short description:

In Europe, the use of growth promoters in meat production is forbidden. Our research question is how to predict whether hormonal growth-promoting compounds (anabolics) have been used in cattle.

The research focuses on urine profiles, which are affected by several factors such as food intake, physiology, sex, the use of medicines, and growth-promoting compounds. The RIKILT research institute on the Wageningen University campus holds hundreds to thousands of relevant chemical profiles produced with LC/Orbitrap mass spectrometry (liquid chromatography coupled to high-resolution full-scan Orbitrap mass spectrometry). RIKILT has software to preprocess the raw data (200-500 MB per sample), which produces files of around 400-500 KB.

Machine learning techniques should be applied to these small preprocessed files instead of the raw data, and the predictive model should determine whether a urine profile is abnormal (i.e., whether a growth promoter was used or not). From a machine learning perspective, this is a classification problem. This thesis requires the expertise of two groups (INF and RIKILT); therefore, the student will visit both groups periodically.
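To give a rough idea of the classification framing described above, the following minimal sketch trains a simple classifier on synthetic data. Everything in it is an assumption made for illustration: the profiles are randomly generated vectors (not RIKILT data), the number of features and the size of the "treatment" shift are invented, the function names are hypothetical, and the nearest-centroid rule is only a stand-in for whatever method the thesis ends up using.

```python
import math
import random


def make_profile(rng, treated, n_features=20):
    """Generate one synthetic profile: a list of feature intensities (made-up data)."""
    profile = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
    if treated:
        # Assumption for illustration only: treatment shifts the first 3 features.
        for i in range(3):
            profile[i] += 3.0
    return profile


def centroid(rows):
    """Column-wise mean of a list of equal-length profiles."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def fit(profiles, labels):
    """Return one centroid per class label (0 = normal, 1 = abnormal)."""
    return {
        label: centroid([p for p, y in zip(profiles, labels) if y == label])
        for label in set(labels)
    }


def predict(model, profile):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(model, key=lambda label: math.dist(model[label], profile))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
labels = [i % 2 for i in range(200)]
profiles = [make_profile(rng, treated=bool(y)) for y in labels]

# Hold out the last 50 profiles to estimate performance on unseen samples.
train_X, train_y = profiles[:150], labels[:150]
test_X, test_y = profiles[150:], labels[150:]

model = fit(train_X, train_y)
predictions = [predict(model, p) for p in test_X]
accuracy = sum(p == y for p, y in zip(predictions, test_y)) / len(test_y)
print(f"held-out accuracy: {accuracy:.2f}")
```

Real urine profiles would first have to be converted into fixed-length feature vectors, and confounders such as food intake, sex, and medicine use would make the two classes far less separable than in this toy setup.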

This thesis is framed as a collaboration between the INF chair group of Wageningen University and the RIKILT research institute.

For more information: (INF Chair Group) (RIKILT)