Course
Bioinformatics with Linux and Python
Scope
The combination of Linux and Python, a dynamic and readable programming language, is popular for all types of bioinformatics work, from simple one-off scripts to large, complex software projects. This workshop is aimed at complete beginners and assumes no prior programming experience. It gives an overview of Linux and Python with an emphasis on practical problem-solving, using examples and exercises drawn from various areas of bioinformatics work. The workshop is structured so that the parts of Linux and Python most useful for bioinformatics are introduced as early as possible, and students can start writing plausibly useful programs after the first few sessions.
Learning goals
After completing the workshop, students should be in a position to (1) apply the skills they have learned to tackling problems in their own research and (2) continue their Linux and Python education in a self-directed way.
Assumed knowledge
Entry-level programming. No modelling skills are needed. However, this course is heavy on bioinformatics and programming, so a genuine interest in improving your modelling and programming skills is required.
Course programme (subject to small changes):
- Session 1 – connecting to the server and basic Linux commands
In the first session we briefly cover the design of Linux: how it differs from Windows/OS X and how it is best used. We’ll then jump straight onto the command line and learn about the layout of the Linux filesystem and how to navigate it. We’ll describe Linux’s file permission system (which often trips up beginners), how paths work, and how we actually run programs on the command line. We’ll learn a few tricks for using the command line more efficiently, and how to deal with programs that misbehave. We’ll finish this session by looking at the built-in help system and how to read and interpret manual pages.
- Session 2 – assembling Linux commands into pipelines
Many data types we want to work with in bioinformatics are stored as tabular plain-text files, and here we learn all about manipulating tabular data on the command line. We’ll start with simple tasks such as extracting columns, filtering, sorting and searching for text, before moving on to more complex tasks such as finding duplicated values, summarizing large files, and combining simple tools into long commands. Aliases, shell redirection, pipes, and shell scripting will all be introduced here.
- Session 3 – introduction to bash scripting and variables
In this session we will introduce the idea of a script: a text file that combines commands to be run as a batch. We will get to grips with the basic idea by converting some of the complex command lines that we composed in the previous session into scripts. This gives us an opportunity to discuss the pros and cons of scripting. An important idea introduced in this session is that of a variable: a named piece of information that can be passed into scripts. Variables can hold file names, or lists of file names, which allows us to build our own custom command-line tools.
- Session 4 – biological pipelines and data formats
In this session we will apply the approaches learned in the previous three sessions to biology-specific tools, looking at Eutils for sequence retrieval and EMBOSS for manipulating biological data files. This requires a discussion of file formats, focussing on the FASTA and GenBank formats.
- Session 5 – introduction to Python, text and files
In this session students learn to write very simple programs that produce output to the terminal, and in doing so become comfortable with editing and running Python code. This session also introduces many of the technical terms that we’ll rely on in future sessions. I run through some examples of tools for working with text and show how they work in the context of biological sequence manipulation. We also cover different types of errors and error messages, and learn how to go about fixing them methodically. We’ll finish by looking at how to get data in and out of our programs using files.
- Session 6 – lists and loops in Python
A discussion of the limitations of the techniques learned in session 5 quickly reveals that flow control is required to write more sophisticated file-processing programs, and I introduce the concept of loops. We look at the way in which Python loops work, and how they can be used in a variety of contexts. We explore the use of loops and lists together to tackle some more difficult problems.
- Session 7 – conditions in Python
I use the idea of decision-making as a way to introduce conditional tests, and outline the different building blocks of conditions before showing how conditions can be combined in an expressive way. We look at the different ways that we can use conditions to control program flow, and how we can structure conditions to keep programs readable.
- Session 8 – writing functions in Python
We discuss functions that we’d like to see in Python before considering how we can add to our computational toolbox by creating our own. We examine the nuts and bolts of writing functions before looking at best-practice ways of making them usable. We also look at a couple of advanced features of Python – named arguments and defaults.
- Session 9 – paired data and dicts in Python
We discuss a few examples of key-value data and see how the problem of storing them is a common one across bioinformatics and programming in general. We learn about the syntax for dictionary creation and manipulation before talking about the situations in which dictionaries are a better fit than the data structures we have learned about thus far.
- Session 10 – programming workshop
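As a hypothetical illustration of how the Python topics of sessions 5 to 9 (files, loops, conditions, functions with default arguments, and dictionaries) can fit together in a small bioinformatics program, the sketch below parses FASTA-formatted text and reports the length, GC content and base counts of each record. It is not taken from the course materials; the function names `read_fasta`, `gc_content` and `count_bases` and the example sequences are invented for illustration, and the parser assumes well-formed input with no validation.

```python
import io


def read_fasta(handle):
    """Parse FASTA text into a dict mapping record IDs to sequences (sketch: no validation)."""
    records = {}
    current_id = None
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: assume the first word after '>' is the record ID
            current_id = line[1:].split()[0]
            records[current_id] = []
        elif current_id is not None:
            # Sequence line: store upper-case fragments under the current ID
            records[current_id].append(line.upper())
    # Join the fragments collected for each record into one string
    return {seq_id: "".join(parts) for seq_id, parts in records.items()}


def gc_content(sequence, decimals=2):
    """Return the fraction of G and C bases, rounded to `decimals` places (default 2)."""
    if not sequence:
        return 0.0  # avoid dividing by zero for an empty sequence
    gc = sequence.count("G") + sequence.count("C")
    return round(gc / len(sequence), decimals)


def count_bases(sequence):
    """Count how often each character occurs, using a dict as a tally."""
    counts = {}
    for base in sequence:
        counts[base] = counts.get(base, 0) + 1
    return counts


# Invented example data; a real script would use open("some_file.fasta") instead
example = """>seq1 example record
ATGCGC
ATTA
>seq2
GGGCCC
"""

if __name__ == "__main__":
    seqs = read_fasta(io.StringIO(example))
    for seq_id, seq in seqs.items():
        # Named argument overrides the default number of decimals
        print(seq_id, len(seq), gc_content(seq, decimals=3), count_bases(seq))
```

Storing the records in a dictionary lets each sequence be looked up by its ID, which is the key-value pattern discussed in session 9. The in-memory `io.StringIO` object stands in for a file so that the sketch runs on its own; because it is iterated line by line, an open file handle can be passed to `read_fasta` in the same way.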
General information
Registration
- Early bird registration deadline: 11 February 2024
- Regular registration deadline: 25 February 2024
N.B.: This course gives priority to VLAG and WIMEK PhD candidates due to the limited space available.
Course duration
11-18 March 2024, mornings 09:00-12:30
Credit points
1.5 ECTS
Language
English
Group size
12-15 participants
Frequency
Yearly.
Fee
Role | Early bird fee in € (until 11 February 2024) | Regular fee in € (after 11 February 2024) |
WUR PhDs with TSP | 310 | 360 |
SENSE PhDs with TSP | 620 | 670 |
Other PhDs | 700 | 750 |
Staff of WUR graduate schools | 700 | 750 |
Others/non-academic | 740 | 790 |
The course fee includes coffee, tea and lunch on all 5 days, and drinks on day 5.
The fee does not include accommodation, breakfast or dinner. There are, however, several accommodation options in Wageningen. For information on B&Bs and hotels in Wageningen, please visit proefwageningen.nl/overnachten. Another option is Short Stay Wageningen, and Airbnb also offers several rooms in the area. Besides the restaurants in Wageningen, there are also options for dinner at Wageningen Campus.
Cancellation conditions
- Up to 4 (four) weeks prior to the start of the course, cancellation is free of charge.
- Up to 2 (two) weeks prior to the start of the course, a fee of €360,- will be charged.
- In case of cancellation ten or fewer days prior to the start of the course, or if you do not show up at all, a fee of €790,- will be charged.
Note: if you would like to cancel your registration, ALWAYS inform us. NOT paying the participation fee does NOT automatically cancel your registration, and the cancellation conditions above still apply.
Also note that we may cancel the course if there are not enough participants. If this is the case, we will inform you one week after the early bird deadline. Please take this into account when arranging your trip to the course (i.e. check the reimbursement policies).