Optimising realism of synthetic images using cycle generative adversarial networks for improved part segmentation

Barth, R.; Hemming, J.; Henten, E.J. Van


In this paper we report on improving part segmentation performance for robotic vision using convolutional neural networks by optimising the visual realism of synthetic agricultural images. In Part I, a cycle-consistent generative adversarial network was applied to synthetic and empirical images with the objective of generating more realistic synthetic images by translating them to the empirical domain. We hypothesise that plant part image features (e.g. color, texture) become more similar to the empirical domain after translation of the synthetic images. Results confirm this: the mean color distribution correlation with the empirical data improved from 0.62 prior to translation to 0.90 post translation. Furthermore, the mean image features of contrast, homogeneity, energy and entropy moved closer to the empirical mean post translation. In Part II, 7 experiments were performed using convolutional neural networks with different combinations of synthetic images, synthetic images translated to the empirical domain, and empirical images. We hypothesise that (i) the translated images can be used for improved learning of empirical images, and (ii) learning without any fine-tuning with empirical images is improved by bootstrapping with translated images over bootstrapping with synthetic images. Results confirm both hypotheses in Part II. First, a maximum intersection-over-union (IOU) performance of 0.52 was achieved when bootstrapping with translated images and fine-tuning with empirical images; an 8% increase compared to only using synthetic images. Second, training without any empirical fine-tuning resulted in an average IOU of 0.31; a 55% performance increase over previous methods that only used synthetic images.
The key contribution of this paper to robotic vision is supporting evidence that domain adaptation can successfully translate synthetic data to the empirical domain and improve its realism, resulting in improved segmentation learning whilst lowering the dependency on manually annotated data.
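
The two headline metrics above, intersection-over-union and color distribution correlation, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names (mean_iou, histogram, pearson), the bin count, and the choice of Pearson correlation between normalised histograms are assumptions made here for illustration; the abstract does not specify how the correlation was computed.

```python
from math import sqrt


def mean_iou(pred, truth, num_classes):
    """Mean intersection-over-union across classes for two label maps.

    pred and truth are equally sized 2D lists of integer class labels.
    Classes absent from both maps are skipped (an assumption of this sketch).
    """
    ious = []
    for c in range(num_classes):
        inter = union = 0
        for row_p, row_t in zip(pred, truth):
            for p, t in zip(row_p, row_t):
                in_p, in_t = p == c, t == c
                inter += in_p and in_t   # pixel labelled c in both maps
                union += in_p or in_t    # pixel labelled c in either map
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


def histogram(values, bins=8, max_value=256):
    """Normalised histogram of integer channel values in [0, max_value)."""
    counts = [0] * bins
    for v in values:
        counts[v * bins // max_value] += 1
    total = len(values)
    return [c / total for c in counts]


def pearson(a, b):
    """Pearson correlation between two equally long sequences.

    Assumes neither sequence is constant (non-zero variance).
    """
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


if __name__ == "__main__":
    # Hypothetical 2x2 label maps with two part classes.
    pred = [[0, 0], [1, 1]]
    truth = [[0, 1], [1, 1]]
    print("mean IOU:", mean_iou(pred, truth, num_classes=2))

    # Hypothetical channel values for one synthetic and one empirical image.
    synthetic = [10, 20, 200, 210, 220, 90]
    empirical = [15, 30, 190, 205, 100, 95]
    print("color correlation:",
          pearson(histogram(synthetic), histogram(empirical)))
```

Under these assumptions, a correlation close to 1 would indicate that the color distribution of a translated image closely matches that of the empirical images, while IOU measures per-class overlap between predicted and annotated part labels.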