Rangers in game reserves are often confronted with the issue of poaching. In order to minimise this problem, it is crucial to count and localise the animals living in these reserves. Counting from the ground level is an arduous task, which requires a great number of rangers. The more often helicopter flights are used for covering transects and the populations are then extrapolated. In order to increase the area that can be observed at one time, a semi-automatic approach based on drones (or Unmanned Aerial Vehicles, UAVs) is ideal.
The needle in a haystack
In times of affordable UAV’s, one could easily obtain one to take images from above. These images are then inspected by rangers to look for any visible animals. This means sifting through many images on which an animal is very hard to spot, even for experienced rangers. Figure 1 shows the image as taken by an UAV and Figure 2 shows the zoomed in area encircled in red.
As seen in examples mentioned above, finding animals on aerial imagery is hard, it is similar to finding the needle in a haystack. At the image resolution, animals are small (on average size of 30x18 pixels), scarce and heterogeneous in appearance. Moreover, their surrounding landscape is similar to the animals in appearance (e.g. tree trunks) and diverse (soil composition, illumination etc.). These characteristics make identification of animals from aerial imagery very challenging. A ranger would therefore take days to spot all the animals.
The main question to this issue is how to obtain annotations of good quality in ranger-useful time. The solution is a combination of using the power of crowds and Geospatial Computer Vision.
Calculating animal presence probabilities
First the reserve was imaged in a terrestrial campaign and the acquired images were subsequently cut into 224 x 224 pixels subframes. The ground truth was crowdsourced by volunteers on Micromappers. A deep learning model is then used to predict animal probabilities in each image. However, as most images do not contain animals and the variability of backgrounds is huge, such a model is difficult to train. The consequence is that the CNN (Convolutional Neural Network) - and any other baseline - detect animals everywhere.
Training the CNN can be achieved by taking into account the following characteristics:
- Curriculum learning: Only start with images containing animals, subsequently use the full dataset.
- Class weights: Errors count more if done on the animal class.
- Hard negative mining: After 100 epochs, penalize more the most confident mistakes.
The results are far less false positives for the CNN during the detection phase (Figure 3), therefore users need to screen only a portion of the tiles (Figure 4), thus saving valuable time and resources.
Figure 3. Results are far less false positives in the detection phase (870 for the CNN).
Figure 4. False positives of the Baseline (red) and CNN (blue) show less false positives for the CNN.