Data mining

Sometimes the available data is huge and shows no evident relationship between variables, so traditional modeling and data interpretation techniques fail. This can be due to the high dimensionality of the dataset, to influential factors missing from the model, or to noisy data and the presence of outliers. In such cases, unsupervised clustering methods can help uncover hidden relationships within the data and identify outliers. Unsupervised methods, machine learning algorithms that analyze and cluster unlabeled datasets, bypass the need for labels and have shown great potential in revealing unknown patterns in data. K-means, self-organizing maps (SOM) [1], principal component analysis (PCA) [2], and hierarchical clustering are among the unsupervised algorithms frequently applied to remote sensing data. Although unsupervised algorithms might not perform as well as supervised methods [3], [4], their independence from labels makes them a reasonable choice for many problems, especially clustering and noise removal. Moreover, combining supervised and unsupervised techniques might increase model performance [5]. These methods can also be applied in sequence with modeling.
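As a minimal sketch of how clustering can expose both group structure and outliers, the following Python example runs plain K-means (Lloyd's algorithm) on a small synthetic two-dimensional dataset and then flags points that lie unusually far from their assigned centroid. The data, the function names (`assign`, `kmeans`, `flag_outliers`), and the cutoff rule (three times the median point-to-centroid distance) are illustrative assumptions and are not taken from the cited works. The initial centroids are passed in explicitly only to keep the sketch deterministic; in practice they are usually chosen at random or with a seeding scheme such as k-means++.

```python
import math
import statistics


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]


def kmeans(points, centroids, max_iter=100):
    """Lloyd's k-means starting from caller-supplied initial centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its previous centroid.
            updated.append(
                tuple(sum(axis) / len(members) for axis in zip(*members))
                if members else old
            )
        if updated == centroids:  # centroids stopped moving: converged
            break
        centroids = updated
    return assign(points, centroids), centroids


def flag_outliers(points, labels, centroids, factor=3.0):
    """Indices of points unusually far from their own centroid.

    The cutoff (factor * median distance) is an illustrative heuristic,
    not a rule taken from the cited literature.
    """
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    cutoff = factor * statistics.median(dists)
    return [i for i, d in enumerate(dists) if d > cutoff]


# Synthetic data (assumed for illustration): two tight groups and one stray point.
points = [
    (0.0, 0.0), (0.2, 0.1), (-0.1, 0.2), (0.1, -0.2), (-0.2, -0.1),
    (5.0, 5.0), (5.2, 5.1), (4.9, 5.2), (5.1, 4.8), (4.8, 4.9),
    (9.0, 0.0),
]

labels, centroids = kmeans(points, centroids=[points[0], points[5]])
outliers = flag_outliers(points, labels, centroids)

print(labels)    # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
print(outliers)  # [10]
```

Note that the stray point is still assigned to a cluster and pulls that cluster's centroid toward itself, which is one reason a separate distance-based check (or a more robust method) may be needed when the goal is noise removal rather than grouping alone.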


[1] X. E. Pantazi, D. Moshou, T. Alexandridis, R. L. Whetton, and A. M. Mouazen, “Wheat yield prediction using machine learning and advanced sensing techniques,” Computers and Electronics in Agriculture, vol. 121, pp. 57–65, 2016.

[2] J. Zabalza et al., “Novel folded-PCA for improved feature extraction and data reduction with hyperspectral imaging and SAR in remote sensing,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 93, pp. 112–122, 2014.

[3] Z. Chang et al., “Landslide Susceptibility Prediction Based on Remote Sensing Images and GIS: Comparisons of Supervised and Unsupervised Machine Learning Models,” Remote Sensing, vol. 12, no. 3, Jan. 2020, doi: 10.3390/rs12030502.

[4] I. Mohd Hasmadi, H. Z. Pakhriazad, and M. F. Shahrin, “Evaluating supervised and unsupervised techniques for land cover mapping using remote sensing data,” Geografia: Malaysian Journal of Society and Space, vol. 5, no. 1, Jan. 2009.

[5] N. Alajlan, Y. Bazi, F. Melgani, and R. R. Yager, “Fusion of supervised and unsupervised learning for improved classification of hyperspectral images,” Information Sciences, vol. 217, pp. 39–55, Dec. 2012, doi: 10.1016/j.ins.2012.06.031.