Tuesday, 18 June 2013

A Tutorial On Principal Component Analysis with the Accord.NET Framework


Principal Component Analysis (PCA) is a technique for exploratory data analysis with many successful applications in several research fields. It is often used in image processing, data analysis, data pre-processing and visualization, and it serves as a basic building block in many more complex algorithms.





One of the most popular resources for learning about PCA is the excellent tutorial by Lindsay I Smith. In her tutorial, Lindsay gives an example application of PCA, presenting and discussing the steps involved in the analysis.










Souza, C. R. "A Tutorial on Principal Component Analysis with the Accord.NET Framework".

Department of Computing, Federal University of São Carlos, Technical Report, 2012.





That said, the above technical report aims to introduce the reader to Principal Component Analysis while also reproducing all of Lindsay's example calculations using the Accord.NET Framework. The report comes with complete C# source code listings and a companion Visual Studio solution file containing all the sample source code, ready to be tinkered with inside a debug application.





While the text leaves the more detailed discussion of the underlying concepts to Lindsay and does not address them in full detail, it presents practical examples that reproduce most of the calculations given by Lindsay in her tutorial using solely Accord.NET.
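For readers who want a feel for the kind of calculation involved before opening the report, here is a minimal sketch of the usual PCA steps (subtract the mean, build the covariance matrix, find its eigenvalues and eigenvectors, project the data). It is written in plain Python rather than C#, and it does not use Accord.NET or reproduce the report's listings; the function names and the small 2-D dataset are assumptions made only for illustration, and the closed-form eigen-decomposition only works for the 2-D case.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(a, b):
    """Sample covariance (divides by n - 1)."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def pca_2d(xs, ys):
    """Return (eigenvalues, eigenvectors, scores) for 2-D data.

    Hypothetical helper for this sketch; not part of any library.
    """
    # Step 1: subtract the mean of each variable.
    mx, my = mean(xs), mean(ys)
    centered = [(x - mx, y - my) for x, y in zip(xs, ys)]

    # Step 2: entries of the symmetric 2x2 covariance matrix.
    sxx, syy, sxy = covariance(xs, xs), covariance(ys, ys), covariance(xs, ys)

    # Step 3: eigenvalues of a symmetric 2x2 matrix in closed form.
    half_trace = (sxx + syy) / 2.0
    det = sxx * syy - sxy * sxy
    root = math.sqrt(max(half_trace * half_trace - det, 0.0))
    l1, l2 = half_trace + root, half_trace - root  # l1 >= l2

    # Step 4: unit eigenvector for the largest eigenvalue; the second
    # component is perpendicular to it because the matrix is symmetric.
    if abs(sxy) > 1e-12:
        vx, vy = sxy, l1 - sxx
        norm = math.hypot(vx, vy)
        v1 = (vx / norm, vy / norm)
    else:
        # Covariance matrix is already diagonal: the axes are the components.
        v1 = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    v2 = (-v1[1], v1[0])

    # Step 5: project the centered data onto the components.
    scores = [(cx * v1[0] + cy * v1[1], cx * v2[0] + cy * v2[1])
              for cx, cy in centered]
    return (l1, l2), (v1, v2), scores

# Small illustrative dataset (an assumption for this sketch).
xs = [2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1]
ys = [2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9]

eigenvalues, eigenvectors, scores = pca_2d(xs, ys)
explained = eigenvalues[0] / sum(eigenvalues)  # share of variance on the first component
```

The variance of the data along the first component equals the largest eigenvalue, which is why the eigenvalues are commonly read as the amount of variance each component explains.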





If you would like a practical example of how to perform matrix operations in C#, this tutorial may help you get started.


Monday, 3 June 2013

Sequence Classifiers in C#: Hidden Conditional Random Fields

After a preliminary article on hidden Markov models, some months ago I finally posted the article on Hidden Conditional Random Fields (HCRF) on CodeProject. The HCRF is a discriminative model, forming a generative-discriminative pair with the hidden Markov model classifiers.









This CodeProject article is the second in a series of articles about sequence classification, the first being about Hidden Markov Models. I've used this opportunity to write a little about generative versus discriminative models, and also to provide a brief discussion of how Vapnik's ideas apply to these learning paradigms.
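To make the generative side of that pair concrete, here is a minimal sketch of a hidden Markov model classifier: one discrete HMM per class, scored with the forward algorithm and combined with class priors through Bayes' rule. It is plain Python, not the Accord.NET implementation or the code from the articles, and the two toy models, their parameters and the function names are assumptions made up for illustration. A discriminative model such as the HCRF would instead learn the class posterior directly, without modelling how each class generates its sequences.

```python
def forward_likelihood(obs, start, trans, emit):
    """p(obs | model) for a discrete HMM, via the forward algorithm.

    Assumes a non-empty sequence of integer symbols.
    """
    n = len(start)
    # Initialisation: probability of starting in each state and emitting obs[0].
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    # Induction: propagate through the transitions, then emit the next symbol.
    for symbol in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][symbol]
                 for j in range(n)]
    return sum(alpha)

def classify(obs, models, priors):
    """Generative decision rule: argmax over classes of p(obs | class) * p(class)."""
    joint = [forward_likelihood(obs, *model) * prior
             for model, prior in zip(models, priors)]
    total = sum(joint)
    posteriors = [j / total for j in joint]  # Bayes' rule
    best = max(range(len(posteriors)), key=posteriors.__getitem__)
    return best, posteriors

# Two hypothetical 2-state models over the symbols {0, 1}:
# class 0 mostly emits 0, class 1 mostly emits 1.
start = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
model_0 = (start, trans, [[0.9, 0.1], [0.8, 0.2]])
model_1 = (start, trans, [[0.1, 0.9], [0.2, 0.8]])
models, priors = [model_0, model_1], [0.5, 0.5]

label_a, post_a = classify([0, 0, 0, 0], models, priors)
label_b, post_b = classify([1, 1, 1, 1], models, priors)
```

For long sequences the raw forward probabilities underflow, so a practical implementation would work with scaled or log probabilities; that is left out here to keep the decision rule visible.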



All the code in those articles is also available within the Accord.NET Framework. The articles provide good examples of how to use the framework and can be regarded as a practical guide to using those models with it.



Complete framework documentation can be found live at Google Code. The framework has now been cited in 30+ publications over the years, and several more are already in the works, by me and by users around the world.




Academic publications


Speaking of publications, the framework has been used within my own research on Computer Vision. If you need help understanding the inner workings of the HCRF, a more visual explanation of the HCRF derivation can also be found in the presentation I gave at Iberamia 2012 about Fingerspelling Recognition with Support Vector Machines and Hidden Conditional Random Fields [pdf].



An application to a more interesting problem, namely the recognition of natural words from Sign Languages using a Microsoft Kinect, has also been accepted for publication at the 9th International Conference on Machine Learning and Data Mining, MLDM 2013, and will be publicly available shortly.




As usual, I hope you find it interesting!