Feature engineering and feature selection

Data gathered in a remote sensing process contain specific information called features. The average reflectance of a canopy in a particular band and the maximum height of a canopy are examples of features. However, raw features are often inadequate, or redundant, for use in the modeling process. As a result, feature engineering and feature selection methods have gained attention. Feature engineering (sometimes called feature extraction) is the technique of creating new, more meaningful features from the original features [1]. For example, NDVI can be considered an engineered feature in agriculture, since the reflectance values of two bands (near-infrared and red) are combined to create a more tangible feature. Any other transformation that creates new features from the raw features, such as normalization or standardization, is also feature extraction. Note that an infinite number of transformations are possible for any given dataset, so an unlimited number of features can be generated. However, too many features cannot be used for modeling, even with machine learning techniques, especially when the number of data points is not very large, which is the case in most agricultural applications because sampling is expensive. In many instances, even the raw features are so numerous that they cause a dimensionality problem: many variables relative to only a few samples [2]. Therefore, variable reduction is usually inevitable, either through projection methods, feature selection, or a combination of both.
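To make the NDVI example concrete, the sketch below computes NDVI with its standard definition, (NIR - Red) / (NIR + Red), and then applies z-score standardization as a second kind of engineered feature. The reflectance values and the function names `ndvi` and `standardize` are hypothetical, chosen only for illustration, and the standardization uses the population standard deviation.

```python
import numpy as np

def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red), computed element-wise.

    Assumes strictly positive reflectances so the denominator is never zero.
    """
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red)

def standardize(features):
    """Z-score each column: subtract its mean, divide by its population standard deviation."""
    features = np.asarray(features, dtype=float)
    return (features - features.mean(axis=0)) / features.std(axis=0)

# Hypothetical reflectance values (fractions) of three canopies.
red = np.array([0.05, 0.08, 0.20])
nir = np.array([0.45, 0.40, 0.30])

v = ndvi(nir, red)                          # one engineered feature built from two raw bands
features = np.column_stack([red, nir, v])   # raw + engineered features, one row per canopy
z = standardize(features)                   # rescaled (further engineered) features
print(np.round(v, 3))
print(np.round(z, 2))
```

The first canopy, with high near-infrared and low red reflectance, receives the highest NDVI (0.8), which is the kind of single, more tangible value the engineered feature is meant to provide.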

Projection methods work on the feature engineering principle: they transform data from a higher dimension to a lower dimension using either linear transformations, as in principal component analysis (PCA), or non-linear ones. In an effective transform, a few new variables (sometimes called latent variables or components) explain most of the variation in the data. Although these techniques are prevalent, they have a serious drawback: each new variable is a combination of the original variables, so the direct link to those variables is lost, which hampers interpretability [3].
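As a minimal sketch of a linear projection method, the code below performs PCA through the singular value decomposition of mean-centered data. The 30-sample, 50-band "spectra" are synthetic (two hidden factors plus a little noise), and the function name `pca` is hypothetical; libraries such as scikit-learn provide an equivalent `PCA` class.

```python
import numpy as np

def pca(X, n_components):
    """Project X (samples x features) onto its first principal components.

    Returns the component scores, the loadings (one row per component, one
    column per original feature) and the fraction of total variance that
    each retained component explains.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                         # PCA operates on centered data
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)                 # variance share of each component
    scores = Xc @ Vt[:n_components].T               # coordinates in the reduced space
    return scores, Vt[:n_components], explained[:n_components]

# Synthetic data: 30 samples x 50 bands driven by two hidden factors plus small noise.
rng = np.random.default_rng(0)
factors = rng.normal(size=(30, 2))
mixing = rng.normal(size=(2, 50))
X = factors @ mixing + 0.01 * rng.normal(size=(30, 50))

scores, loadings, explained = pca(X, n_components=2)
print(scores.shape, explained.sum())
# Number of bands with a non-negligible weight in each component.
print(np.count_nonzero(np.abs(loadings) > 1e-6, axis=1))
```

Two components capture almost all of the variance of the 50 bands, but each loading row assigns weight to essentially every band, so a component cannot be read as a single wavelength. This is the loss of interpretability that motivates feature selection.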

On the other hand, feature selection is an approach in which a subset of the input features (either raw or engineered) is selected based on their significance and contribution to the model, and redundant or irrelevant features are removed [3]. In addition to eliminating redundancy, feature selection improves model interpretability and the understanding of the relationship between the explanatory variables and the response [2]. Feature selection should typically lead to a performance improvement, or at least to dimensionality reduction with minimal performance degradation. For example, in a yield prediction study, removing 40% of a total of 65 features, including soil information, crop varieties, and fertilization practices, through feature selection led to an increase of less than 3% in the mean absolute error [1]. Although feature selection techniques may receive little attention when the explanatory variables are few, feature selection is essential in most remote sensing studies, especially those involving many spectral bands, such as hyperspectral data. For instance, a study showed that less than 0.45% of the features (6 out of 1339 bands) in a hyperspectral dataset of grapevine leaves' nutrient content are informative [4]. In [3], feature selection algorithms are broadly categorized as filter, wrapper, and hybrid algorithms, and their advantages and disadvantages are discussed. Therefore, suitable feature engineering and feature selection approaches must be chosen based on the input variables and the modeling problem, to ensure that the correct data are used in the modeling process [5].
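The difference between filter and wrapper algorithms can be sketched on synthetic data. The filter below ranks features by the absolute Pearson correlation with the response; the wrapper performs greedy forward selection, scoring each candidate subset by the hold-out mean squared error of an ordinary least squares model. These are generic textbook variants, not the MRmMC method of [3] or the procedures used in [1], [4], or [5]; the function names, data, and train/validation split are hypothetical.

```python
import numpy as np

def filter_select(X, y, k):
    """Filter method: rank features by |Pearson correlation| with y and keep the top k."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return [int(j) for j in np.argsort(-np.abs(r))[:k]]

def validation_mse(X_tr, y_tr, X_va, y_va, cols):
    """Fit least squares (with intercept) on the chosen columns; return validation MSE."""
    A_tr = np.column_stack([np.ones(len(X_tr)), X_tr[:, cols]])
    A_va = np.column_stack([np.ones(len(X_va)), X_va[:, cols]])
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    return float(np.mean((A_va @ coef - y_va) ** 2))

def forward_select(X_tr, y_tr, X_va, y_va, k):
    """Wrapper method: greedily add the feature that most reduces validation error."""
    selected = []
    remaining = list(range(X_tr.shape[1]))
    for _ in range(k):
        best = min(remaining,
                   key=lambda j: validation_mse(X_tr, y_tr, X_va, y_va, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Synthetic data: 100 samples, 20 features; only features 3 and 7 drive the response.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 20))
y = 2.0 * X[:, 3] - 1.5 * X[:, 7] + 0.1 * rng.normal(size=100)
X_tr, X_va, y_tr, y_va = X[:70], X[70:], y[:70], y[70:]

print(sorted(filter_select(X, y, k=2)))
print(sorted(forward_select(X_tr, y_tr, X_va, y_va, k=2)))
```

The filter scores each feature once and independently of any model, so it is cheap; the wrapper refits the model for every candidate at every step, which costs more but accounts for how features act together. A hybrid scheme could, for instance, use the filter ranking to shortlist candidates before running the wrapper.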

References

[1] F. F. Bocca and L. H. A. Rodrigues, “The effect of tuning, feature engineering, and feature selection in data mining applied to rainfed sugarcane yield modelling,” Computers and Electronics in Agriculture, vol. 128, pp. 67–76, 2016.

[2] T. Mehmood, K. H. Liland, L. Snipen, and S. Sæbø, “A review of variable selection methods in partial least squares regression,” Chemometrics and Intelligent Laboratory Systems, vol. 118, pp. 62–69, 2012.

[3] A. Senawi, H.-L. Wei, and S. A. Billings, “A new maximum relevance-minimum multicollinearity (MRmMC) method for feature selection and ranking,” Pattern Recognition, vol. 67, pp. 47–61, 2017.

[4] F. Vanegas, D. Bratanov, J. Weiss, K. Powell, and F. Gonzalez, “Multi and hyperspectral UAV remote sensing: Grapevine phylloxera detection in vineyards,” in IEEE Aerospace Conference Proceedings, vol. 2018-March, Jun. 2018, pp. 1–9, doi: 10.1109/AERO.2018.8396450.

[5] R. Omidi, A. Moghimi, A. Pourreza, M. El-Hadedy, and A. S. Eddin, “Ensemble hyperspectral band selection for detecting nitrogen status in grape leaves,” arXiv preprint arXiv:2010.04225, 2020.