Structural Equation Modelling


While much of statistics focusses on associations between variables and on making predictions, the aim of structural equation modelling is to establish causal relationships between variables. Despite the common belief that any causal statement requires randomized experiments, a growing body of theory, methodology and software enables scientists to draw certain types of causal conclusions from observational data. This is especially valuable when randomized experiments are not feasible. Notably, causal models allow the quantification of intervention effects, that is, the response of the system when one of your variables is set to a certain value (e.g. a gene knock-out, rainfall). This new course will explain the key concepts underlying causal inference, the assumptions they require, and how the interpretation of results differs from that of randomized experiments.

To ensure that you learn from the best, we have invited Prof. Bill Shipley of the Université de Sherbrooke, Canada, to come to Wageningen to teach this course. Prof. Shipley is the author of "Cause and correlation in biology: A user’s guide to path analysis, structural equations, and causal inference", which many regard as the standard guide to path analysis and structural equation models.

The focus will be on classical structural equation models with a small number of (latent) variables, but we will also give an introduction to recent methodological developments for high-dimensional data. Throughout the course we will discuss applications in ecology, the social sciences and genetics. Depending on the backgrounds and interests of the participants, we may put a stronger emphasis on some of these applications. Participants are therefore encouraged to bring their own data.

  • Day 1: Introduction to, and background of, structural equation models (SEMs): causation versus correlation, causal inference versus ‘classical’ statistics. Identifiability and estimation for models without latent variables.
  • Day 2: Testing and selecting your model: goodness-of-fit tests, model comparison, confirmatory and exploratory models.
  • Day 3: Adding latent (unmeasured) variables to your model: the concept of latent variables, estimating SEMs with latent variables.
  • Day 4: The estimation of causal effects revisited: causal graphs, directed acyclic graphs and conditional independence, d-separation and faithfulness. The d-sep test.
  • Day 5: Applications of SEMs in ecology and genomics, e.g. causal inference for high-dimensional data with the R packages pcalg and qtlnet.