Thursday 9 January 2014

Chemometrics under Mnova 9 - PCA

(Note: This entry has been written by Dr. Silvia Mari, from R4R, who has helped us to design and implement this module.)

Background: spectroscopy and chemometrics

“For many years, there was the prevailing view that if one needed fancy data analyses, then the experiment was not planned correctly, but now it is recognized that most systems are multivariate in nature and univariate approaches are unlikely to result in optimum solutions.”
      Hopke, P. K. (2003). The evolution of chemometrics. Analytica Chimica Acta, 500(1-2), 365–377
  
Whether we apply analytical chemistry to quality control or attempt a more “systems biology” approach in our R&D, we need advanced methods to design experiments, calibrate instruments, and analyze the resulting data. Indeed, the “emergence of chemometrics thinking came from the realization that traditional univariate statistics is not sufficient to describe and model chemical experiments”
       Geladi, P. (2003). Chemometrics in spectroscopy. Part 1. Classical chemometrics. Spectrochimica Acta Part B, 58, 767–782


With this in mind, Mnova 9 now offers its users a module called PCA, which can be found under the main menu “Advanced”. It is the result of our first efforts to include chemometric tools in Mnova, and it is meant to give spectroscopists the possibility to work interactively on both the stacked spectra and their corresponding statistical plots.

Since the mid-1970s, when the first paper with chemometrics in the title appeared (1975) [1], chemometrics has grown and is now considered an established research area in the chemical sciences. It has expanded widely from its beginnings into a variety of other areas, including multivariate calibration, pattern recognition, and mixture resolution, and today there are several applications of interest to NMR spectroscopists [2-5].


PCA module

Principal Component Analysis (PCA) is a procedure that uses an orthogonal transformation to convert a set of observations of correlated variables into a set of values of linearly uncorrelated variables (named principal components) [6].
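
To make the definition concrete, here is a minimal Python sketch of PCA computed through a singular value decomposition of the mean-centered data matrix. It is only an illustration of the general technique, not the code used inside Mnova; the function name `pca_svd` and the small synthetic data set are assumptions made for this example.

```python
import numpy as np

def pca_svd(X, n_components=2):
    """PCA of a (samples x variables) matrix via SVD of the mean-centered data."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # mean-center each variable (column)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # coordinates of each sample
    loadings = Vt[:n_components].T                     # weight of each variable
    explained = (s ** 2) / np.sum(s ** 2)              # variance fraction per component
    return scores, loadings, explained[:n_components]

# Hypothetical data: 6 "spectra" x 4 "bins" whose columns are strongly correlated
rng = np.random.default_rng(0)
t = rng.normal(size=(6, 1))
X = np.hstack([t, 2 * t, -t, 0.5 * t]) + 0.01 * rng.normal(size=(6, 4))

scores, loadings, explained = pca_svd(X, n_components=2)
```

The columns of `scores` are mutually uncorrelated and the columns of `loadings` are orthonormal, which is what the orthogonal transformation in the definition refers to. The score values are what a score plot displays (one point per spectrum), and the loading values are what a loading plot displays (one point per variable).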

The PCA module under the Advanced menu works in two consecutive steps: (1) matrix generation and (2) principal component analysis. The overall workflow is represented in the following illustration, where general steps available in Mnova are highlighted in blue whilst functionalities specific to this new PCA module are highlighted in yellow.


To help the spectroscopist refine and optimize the data matrix to be used for advanced analysis, PCA in Mnova makes it very easy to detect and remove outlier spectra and to reveal problems in spectral alignment, phase, or baseline. Once the user has corrected the affected regions, the PCA module allows the analysis to be re-run, either replacing the previous analysis or creating a new one for comparison.

Interaction with the stacked spectra

The main effort during design and development can be summarized in one word: SYNCHRONIZATION. PCA plots, PCA tables, and the stacked plot are always synchronized, so selecting a point in the score plot selects the corresponding spectrum in the stacked plot.


In the same way, selecting a point in the loading plot (hence a variable of the matrix) generates a shaded region in the stacked plot according to the bin position and size.
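
The link between a variable of the matrix and a region of the stacked plot can be illustrated with a short Python sketch. It assumes the matrix is generated by summing intensities in equal-width ppm bins, which is one common choice and not necessarily what Mnova does internally. The names `bin_spectra` and `bin_region`, the default bin width, and the flat test spectra are all hypothetical.

```python
import numpy as np

def bin_spectra(ppm, spectra, bin_width=0.04):
    """Sum intensities within equal-width ppm bins.

    ppm     : 1-D array of chemical shifts, one per data point
    spectra : 2-D array, one row per spectrum, one column per data point
    Returns (matrix, edges); column j of matrix covers edges[j] <= ppm < edges[j + 1].
    """
    ppm = np.asarray(ppm, dtype=float)
    spectra = np.asarray(spectra, dtype=float)
    n_bins = max(1, int(np.ceil((ppm.max() - ppm.min()) / bin_width)))
    edges = ppm.min() + bin_width * np.arange(n_bins + 1)
    # Assign each point to a bin; clip so the end points fall in the outer bins
    idx = np.clip(np.digitize(ppm, edges) - 1, 0, n_bins - 1)
    matrix = np.zeros((spectra.shape[0], n_bins))
    for j in range(n_bins):
        matrix[:, j] = spectra[:, idx == j].sum(axis=1)
    return matrix, edges

def bin_region(edges, j):
    """ppm interval covered by variable (bin) j, e.g. to shade it on a stacked plot."""
    return edges[j], edges[j + 1]

# Hypothetical stack: 3 flat spectra sampled at 1001 points between 0 and 10 ppm
ppm = np.linspace(0, 10, 1001)
spectra = np.ones((3, 1001))
matrix, edges = bin_spectra(ppm, spectra, bin_width=0.5)
low, high = bin_region(edges, 2)   # region behind the third variable
```

Because every variable keeps its bin edges, a point picked in a loading plot can always be traced back to a ppm interval of a given position and width.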



Colors and graphics

When dealing with large datasets, color coding plays a very important, and at times essential, role. Even though PCA is an unsupervised method and does not use class definitions in its algorithm, the kind of pattern expected is generally known.
The driving concept here is that colors are assigned on the basis of class membership. As in the previous section, colors are always synchronized across PCA tables, PCA plots, and stacked spectra.


Moreover, in the loading plot, the user can select more than one bin (see the flag option in the loading plot table, or select multiple table entries using the Shift or Ctrl key). A bin region is visualized as a colored box superimposed on the stacked plot. The user can associate different colors with different bin regions.





Data filtering and scaling


The results of the analysis depend on the type of filtering and scaling that the user applies to the matrix, which must therefore be specified. Both factors can greatly affect the outcome of the data analysis and thus the ranking of the most important variables. The PCA module includes several options for data cleaning and scaling.


There is no general rule for selecting the type of scaling. For guidance, we recommend the paper by van den Berg et al. [7], which describes extensively how these transformations can improve the information content of the data matrix. Finally, bear in mind that visual inspection and assessment is ultimately one of the most important steps in chemometrics.
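
As an illustration of how much the choice matters, the following Python sketch implements three of the pretreatments discussed by van den Berg et al. [7]: mean centering, autoscaling (unit variance), and Pareto scaling. It is a sketch only; the function name `scale_matrix` and its `method` strings are assumptions for this example and do not describe the options offered in the Mnova dialog.

```python
import numpy as np

def scale_matrix(X, method="auto"):
    """Column-wise pretreatment of a (samples x variables) matrix.

    method: "center" (mean centering), "auto" (divide by the standard
    deviation), or "pareto" (divide by the square root of the standard deviation).
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    if method == "center":
        return centered
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0   # constant columns stay at zero instead of dividing by zero
    if method == "auto":
        return centered / std
    if method == "pareto":
        return centered / np.sqrt(std)
    raise ValueError(f"unknown scaling method: {method!r}")

# Hypothetical matrix: the second variable is 100 times more intense than the first
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])
centered = scale_matrix(X, "center")
auto = scale_matrix(X, "auto")
pareto = scale_matrix(X, "pareto")
```

After mean centering alone, the intense variable dominates the variance and therefore the first principal component. Autoscaling gives both variables equal weight, and Pareto scaling sits in between, which is why the ranking of the most important variables can change with the pretreatment.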

Conclusion

We have introduced in Mnova 9 a chemometric module called PCA (Principal Component Analysis). PCA has been shown to be very effective in compressing large volumes of noisy, correlated data into a subspace of much lower dimension than the original data set. The data pretreatment method is crucial to the outcome of the analysis. The resulting low-dimensional representation of the data set has been shown to be of great utility for analyzing or monitoring the system under study, as well as for selecting variables for control or markers of the expected pattern.
The possibility to work interactively with PCA plots and spectra at the same time, together with the user-friendly interface provided by Mnova, will also be of great advantage to spectroscopists who are not familiar with multivariate analysis but would like to learn more and try it.
As has always been the case for the Mnova community, the future of this first step in chemometrics will be driven by user requirements. For that reason, we look forward to receiving feedback, criticism, suggestions, comments, and lots of requests for future development. So, play with it and have fun looking at your own datasets from a different perspective!

References

[1] B.R. Kowalski, Chemometrics: views and propositions, J. Chem. Inf. Comp. Sci. 15 (1975) 201–203
[2] Chemometrics in bioreactor monitoring. Lourenço, N. D., Lopes, J. A., Almeida, C. F., Sarraguça, M. C., & Pinheiro, H. M. (2012). Bioreactor monitoring with spectroscopy and chemometrics: a review. Analytical and Bioanalytical Chemistry, 404(4), 1211–37. doi:10.1007/s00216-012-6073-9
[3] Metabonomics and chemometrics in food science and nutrition. Kuang, H., Li, Z., Peng, C., Liu, L., Xu, L., Zhu, Y., Wang, L., et al. (2012). Metabonomics approaches and the potential application in food safety evaluation. Critical reviews in food science and nutrition, 52(9), 761–74. doi:10.1080/10408398.2010.508345
[4] Pharmaco-metabonomic phenotyping and chemometrics. Robertson, D. G., Reily, M. D., & Baker, J. D. (2007). Metabonomics in Pharmaceutical Discovery and Development, 526–539.
[5] Metabonomics and chemometrics in drug safety and toxicology. Griffin, J. (2004). The potential of metabonomics in drug safety and toxicology. Drug Discovery Today Technologies, 1(3), 285–293. doi:10.1016/j.ddtec.2004.10.011
[6] Principal component analysis. Svante Wold, Kim Esbensen, Paul Geladi. Chemometrics and Intelligent Laboratory Systems, Volume 2, Issues 1–3, August 1987, Pages 37–52
[7] Van den Berg, R. A., Hoefsloot, H. C. J., Westerhuis, J. A., Smilde, A. K., & van der Werf, M. J. (2006). Centering, scaling, and transformations: improving the biological information content of metabolomics data. BMC genomics, 7(1), 142. doi:10.1186/1471-2164-7-142

