INF-33806 Big Data

Profile of the course

The term Big Data usually refers to data sets whose size is beyond the ability of commonly used software tools to capture, curate, manage, and process within a tolerable elapsed time. In general, Big Data can be characterized by four V's: Volume (amount of data), Velocity (speed of data), Variety (range of data types and sources), and Veracity (reliability of data). The realization of Big Data relies on disruptive technologies such as Cloud Computing, the Internet of Things and Data Analytics. Big Data has become an important driver of innovation and growth in various industries, including the Life Sciences.

This course covers both theoretical and practical aspects of understanding, developing and using Big Data systems, focusing on applications and needs in the Life Sciences. We discuss the general concepts of Big Data systems, their principles and limitations. We introduce concepts related to Big Data system architecture and the Map-Reduce framework, and show how they are made available through cutting-edge technologies such as the Hadoop Distributed File System and Apache Spark. Students will practice with these tools in individual tutorials and gain hands-on experience by deploying Big Data ecosystem tools in a group project in a Wageningen UR area of expertise.

The course is designed to be accessible to students from a diverse range of disciplines at Wageningen University and Research.

Learning outcomes:

After successful completion of this course, students are expected to be able to:

  • discuss the basic concepts related to Big Data
  • show insight into the value of data-driven innovation, and associate this with their own course of studies
  • demonstrate the basic architecture of Big Data systems, and use it for designing applications in their own areas of interest
  • discuss and apply the Map-Reduce principles
  • discuss the role of various tools in the Big Data ecosystem and have hands-on experience with some of them
  • explore data analytics for discovery and communication of meaningful patterns in data
  • develop knowledge of and experience with the governance and reuse of open data
  • develop a Big Data system using Big Data tools
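
To give an impression of the Map-Reduce principles named in the outcomes above, the following is a minimal single-machine sketch in plain Python. It is not taken from the course material and does not use Hadoop or Spark; the function names (map_phase, shuffle, reduce_phase, map_reduce) and the sample records are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle and reduce in sequence on a list of records."""
    mapped = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input: each string stands for one record of a larger data set.
records = ["wheat maize wheat", "maize rice", "wheat"]
counts = map_reduce(records)
print(counts)  # {'wheat': 3, 'maize': 2, 'rice': 1}
```

In a real Big Data system the map and reduce steps would run in parallel on many machines and the shuffle would move data between them; here everything runs in one process so that the three phases are easy to follow.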

Assumed knowledge:

Fundamentals of programming (e.g. INF-22306 Programming in Python). Specifically, you should be acquainted with the following concepts and techniques:

  • variables, assignment, expressions, operators
  • functions (and/or procedures, subroutines, methods) and parameters; also making your own functions
  • control structures, at least: if, for, while
  • objects and their properties (fields, variables) and operations (methods)
  • arrays, including standard algorithms to traverse arrays (searching, summing, finding the largest element, etc.)
  • data structures (lists, tuples, dictionaries)
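
As a rough self-check of the level implied by the list above, the short Python sketch below touches each listed concept once. It is an illustration only, not an official entry test; the names (largest, Sample, plot_a, plot_b) and the numbers are hypothetical.

```python
def largest(values):
    """Return the largest element of a non-empty list by traversing it."""
    biggest = values[0]
    for value in values[1:]:
        if value > biggest:
            biggest = value
    return biggest


class Sample:
    """An object with properties (name, readings) and an operation (mean)."""

    def __init__(self, name, readings):
        self.name = name
        self.readings = readings

    def mean(self):
        # Sum the readings with an explicit loop, then divide by their count.
        total = 0
        for reading in self.readings:
            total += reading
        return total / len(self.readings)


# A list of objects, traversed with a while loop into a dictionary.
samples = [Sample("plot_a", [4, 8, 6]), Sample("plot_b", [10, 4])]
means = {}
i = 0
while i < len(samples):
    sample = samples[i]
    means[sample.name] = sample.mean()
    i += 1

# A tuple holding the highest mean and the number of samples.
summary = (largest(list(means.values())), len(samples))
print(means, summary)
```

If you can read this code and predict what it prints without running it, you are likely to have the programming background the course assumes.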

Familiarity with relational databases (e.g. INF-21306 Data Management).