Deep learning has revolutionised the field of image segmentation and recognition.
The potential for applications in agriculture and food science is massive. Two main obstacles hamper widespread application:
- the need for very large numbers of annotated training instances;
- the inherent black-box character of the method, which makes fine-tuning a cumbersome and highly subjective process.
To address these challenges, Wageningen University & Research (WUR) has investigated methods to generate synthetic training data, and has assessed whether networks trained on these data can be applied in real-world settings.
Our approach: generating and processing data
A training set is only useful if it reflects reality: models trained on synthetic sets need to be evaluated against real-life data to verify that the characteristics captured by the network are valid. Several heuristic approaches have been compared, and valuable routes for including synthetic data in the training process have been identified.
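One such heuristic is to train on a fixed mix of real and synthetic samples, so that the synthetic renders augment rather than dominate the real data. The sketch below illustrates this idea only; the function and file names are illustrative assumptions, not WUR's actual pipeline.

```python
import random

def mix_training_sets(real, synthetic, synthetic_fraction=0.5, seed=0):
    """Compose a training list in which roughly `synthetic_fraction`
    of the samples are synthetic.

    All real samples are kept; synthetic samples are subsampled to
    reach the requested fraction (or all used if there are too few).
    """
    if not 0.0 <= synthetic_fraction < 1.0:
        raise ValueError("synthetic_fraction must be in [0, 1)")
    rng = random.Random(seed)
    # Solve n_syn / (n_real + n_syn) == synthetic_fraction for n_syn.
    n_syn = int(len(real) * synthetic_fraction / (1.0 - synthetic_fraction))
    n_syn = min(n_syn, len(synthetic))
    mixed = list(real) + rng.sample(synthetic, n_syn)
    rng.shuffle(mixed)
    return mixed

# Example: 100 real images, aim for a 50/50 mix with synthetic renders.
real_images = [f"real_{i:03d}.png" for i in range(100)]
synthetic_images = [f"render_{i:03d}.png" for i in range(500)]
train_set = mix_training_sets(real_images, synthetic_images, 0.5)
```

Evaluating the resulting model on held-out real images, while sweeping `synthetic_fraction`, is one simple way to check whether the characteristics learned from synthetic data transfer to reality.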
A three-dimensional plant model has been generated, allowing a virtual camera to obtain synthetic images from any angle and under any lighting condition. This has led, for example, to a large, published data set of images of a set of bell pepper plants (available on ScienceDirect). What is more, this set comes complete with all ground-truth information (number, colour, and size of peppers, size of leaves, etc.) - information that is crucial for training networks and can often be obtained only with great effort. This set is, in short, an example of how to generate synthetic data for training deep learning networks.
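Because the rendering pipeline controls the scene, every annotation the text mentions (camera pose, lighting, pepper count, colour and size, leaf sizes) is known exactly at render time and can be stored alongside each image. A minimal sketch of such a per-image ground-truth record is shown below; the field names and units are assumptions for illustration, not the published data set's actual schema.

```python
from dataclasses import dataclass, field, asdict

@dataclass
class PepperAnnotation:
    colour: str          # e.g. "red", "green", "yellow"
    diameter_mm: float   # fruit size

@dataclass
class RenderGroundTruth:
    image_file: str
    camera_azimuth_deg: float
    camera_elevation_deg: float
    light_intensity: float               # relative, 0..1
    peppers: list = field(default_factory=list)
    leaf_areas_cm2: list = field(default_factory=list)

    @property
    def pepper_count(self):
        return len(self.peppers)

# One record, as it might be emitted next to a rendered frame.
gt = RenderGroundTruth(
    image_file="plant_017_az045_el30.png",
    camera_azimuth_deg=45.0,
    camera_elevation_deg=30.0,
    light_intensity=0.8,
    peppers=[PepperAnnotation("red", 82.5), PepperAnnotation("green", 64.0)],
    leaf_areas_cm2=[31.2, 28.7, 40.1],
)
record = asdict(gt)  # plain nested dict, ready to dump as JSON with the image
```

Storing the ground truth as structured records like this is what makes the synthetic set directly usable for supervised training, without the manual annotation effort real images would require.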
(Expected) impact of the approach
This research is a necessary step towards applying deep learning in high-throughput phenotyping; such phenotyping is undoubtedly the future, and will open up many opportunities for smart farming and more efficient use of resources.
Currently, training deep learning models depends heavily on the knowledge and expertise of the scientists involved. This research should lead to more automated training protocols, enabling widespread use across very different applications.
Imaging is a powerful tool to obtain information, but many other information sources are available as well. It is as yet unclear how to combine these sources in meaningful ways. Experience and expertise in a wider range of phenotyping applications, e.g. in robotics, needs to be built. This will definitely be a theme for the coming five to ten years.