A COMPARISON OF DATA MINING ALGORITHMS FOR IMPROVING NIR MODELS OF CANE QUALITY MEASURES
By J SEXTON; Y EVERINGHAM; D DONALD
NEAR INFRARED (NIR) analysis systems are used to estimate cane quality measures such as brix in juice, pol in juice and apparent purity. Within the Australian sugarcane industry, partial least squares regression (PLSR) has been used to build NIR models of cane quality measures in the lab, on-line and in the field. PLSR relies on a linear relationship between sample constituents and electromagnetic absorption at NIR wavelengths. In practice, this linear relationship often breaks down, resulting in more complex relationships. Recently, machine learning techniques have become popular for their skill with complex data and their ability to produce robust calibrations. The objective of this paper was to compare PLSR with the machine learning technique support vector regression (SVR). The two techniques were used to estimate three cane quality parameters: brix in juice, pol in juice and (apparent) purity. Results from the PLSR models were consistent with previous industry studies and justified the use of PLSR as a baseline against which to compare more sophisticated approaches. The SVR models slightly reduced prediction error compared with the PLSR models for brix and pol in juice, but slightly increased prediction error for purity. The marginal improvement in model skill using SVR was not considered sufficient to recommend SVR over PLSR, given the relative ease of use and interpretability of PLSR. However, this study showed that certain samples were difficult to model with either approach. Future research should consider other machine learning algorithms as well as techniques to identify difficult-to-model samples. This will allow researchers to seek greater improvements in NIR modelling of cane quality.
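The comparison described above can be illustrated with a minimal Python sketch. It assumes the conventional definition of apparent purity as 100 × pol / brix, and it assumes root mean square error (RMSE) as the prediction error measure; the abstract does not name the metric used. The function names and all numeric values are hypothetical and chosen only for illustration; they are not results from the paper.

```python
from math import sqrt


def apparent_purity(brix, pol):
    """Apparent purity (%) under the conventional definition 100 * pol / brix."""
    if brix <= 0:
        raise ValueError("brix must be positive")
    return 100.0 * pol / brix


def rmse(observed, predicted):
    """Root mean square error between laboratory values and model predictions."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return sqrt(sum(squared) / len(squared))


def compare_models(observed, predictions_by_model):
    """Return a dict mapping each model name to its RMSE against the observed values."""
    return {name: rmse(observed, preds) for name, preds in predictions_by_model.items()}


# Hypothetical laboratory brix values and predictions from two models
# (made-up numbers for illustration only, not data from the study).
lab_brix = [20.0, 22.0, 18.0, 21.0]
predictions = {
    "PLSR": [20.5, 21.5, 18.5, 20.5],
    "SVR": [20.2, 21.8, 18.4, 20.9],
}

errors = compare_models(lab_brix, predictions)
best_model = min(errors, key=errors.get)

print(errors)
print("Lowest RMSE:", best_model)
print("Example purity:", apparent_purity(brix=20.0, pol=17.0))
```

In a real calibration study the predictions would come from fitted PLSR and SVR models evaluated on held-out samples, and a small difference in error would need to be weighed against ease of use and interpretability, as the abstract notes.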