Open Data Science Europe Workshop 2021

Semi-supervised learning for predicting soil properties at a national scale of Germany
2021-09-09, 10:40–11:00, HUGOTech

Spatial soil information denotes maps and associated databases that provide explicit, quantitative expressions of soil property variation for a given area. Modern land evaluation, land suitability analysis, and land resource management require detailed and accurate soil maps that describe soil properties and their spatial distribution. Digital soil mapping (DSM) can accurately predict soil properties. DSM techniques rest on the relationship between soil properties and geospatial environmental covariates, obtained e.g. from terrain attributes and satellite imagery. Based on such relationships, DSM can produce and quantify spatial soil functions by implementing different machine learning (ML) algorithms. Although ML algorithms are routinely applied throughout the world to map soil properties at a wide range of spatial scales and extents, their application in DSM still faces unresolved issues when labeled training samples are limited. This is especially the case at the national scale, where collecting soil data over such a large area can be cumbersome. To overcome this drawback, semi-supervised learning approaches might be an alternative to supervised learning techniques. The general idea behind semi-supervised learning is that the ML model combines a small amount of labeled data with a large amount of unlabeled data during training.
In this research, we applied semi-supervised learning approaches to predict soil properties at the national scale of Germany. Specifically, we used 2,000 labeled samples and 20,000 unlabeled samples randomly distributed across Germany to train the ML algorithms. For comparison, a random forest model was trained to relate the labeled data to 170 geospatial environmental covariates at 30-meter resolution, and its results were compared with those of the semi-supervised learning approach.
On the testing data set, the semi-supervised learning approach performed better than the supervised learning approach. The advantage was especially clear when we reduced the number of labeled samples. Overall, our findings support the use of semi-supervised learning approaches for digital soil mapping with limited data.
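
The abstract does not state which semi-supervised method was used. As an illustration only, the sketch below shows one common option, self-training (pseudo-labeling), wrapped around a scikit-learn random forest regressor. The function name `self_training_rf`, the parameters `n_rounds` and `keep_fraction`, and the use of the spread among trees as a confidence measure are all assumptions made for this example, not details taken from the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def self_training_rf(X_lab, y_lab, X_unlab, n_rounds=3, keep_fraction=0.2, seed=0):
    """Hypothetical self-training loop around a random forest regressor.

    In each round the forest is fitted on the current training set, the
    unlabeled pool is predicted, and the pool points on which the individual
    trees agree most (smallest standard deviation) are moved into the
    training set with the forest mean as their pseudo-label.
    """
    X_train, y_train = np.array(X_lab, dtype=float), np.array(y_lab, dtype=float)
    pool = np.array(X_unlab, dtype=float)
    model = RandomForestRegressor(n_estimators=50, random_state=seed)

    for _ in range(n_rounds):
        if len(pool) == 0:
            break
        model.fit(X_train, y_train)
        # Per-tree predictions, shape (n_trees, n_pool_points)
        per_tree = np.stack([tree.predict(pool) for tree in model.estimators_])
        spread = per_tree.std(axis=0)
        n_keep = max(1, int(keep_fraction * len(pool)))
        keep = np.argsort(spread)[:n_keep]  # most confident pool points
        X_train = np.vstack([X_train, pool[keep]])
        y_train = np.concatenate([y_train, per_tree[:, keep].mean(axis=0)])
        pool = np.delete(pool, keep, axis=0)

    # Final fit on labeled samples plus all accepted pseudo-labeled samples
    model.fit(X_train, y_train)
    return model, len(y_train)


if __name__ == "__main__":
    # Synthetic stand-in data; NOT the soil data set described in the abstract.
    rng = np.random.default_rng(0)
    X_lab = rng.normal(size=(100, 10))
    y_lab = X_lab[:, 0] + 0.1 * rng.normal(size=100)
    X_unlab = rng.normal(size=(1000, 10))
    model, n_train = self_training_rf(X_lab, y_lab, X_unlab)
    print("training samples after self-training:", n_train)
```

A supervised baseline in the same setting would be a `RandomForestRegressor` fitted on `X_lab` and `y_lab` alone, so that both models can be scored on the same held-out test set. Whether pseudo-labels help depends on how reliable the confidence measure is; poorly chosen pseudo-labels can also reinforce the errors of the initial model.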


Ruhollah Taghizadeh-Mehrjardi and Thomas Scholten

Department of Geosciences, Soil Science and Geomorphology, University of Tübingen, Tübingen, Germany

I am currently a Postdoc at the Department of Geosciences, University of Tübingen. My research interests include digital soil mapping and soil-landscape modeling.