Monday, 29 November 2010

Machine Learning Books and Lectures

Jordan, one of the readers of the blog, wrote to point out some cool references for machine learning, mathematics and artificial intelligence.

Thanks again for all your contributions. I'm a .NET programmer coming from a background of studying politics and Latin American studies so when the moment came that I realized I was interested in A.I., I didn't have many resources to turn to.

I have a collection of books and links that I found very, very helpful when I was trying to learn about these concepts. I was thinking they could be useful to some of your readers who were in my shoes two years ago. So without further ado:


Handbook of Normal Frames and Coordinates

"Handbook of Normal Frames and Coordinates," Iliev, Bozhidar Z. (2006)

“This can be a heavy text, but if you understand the concept, it can take you to the next level - I would categorize this as topology, but it has applications in quantum physics, theoretical mathematics, theoretical physics, etc. - in the context of your blog - granted, this is a dedicated read - I think readers will be able to appreciate and truly understand what a Hilbert space is after reading this book.”

Linear Algebra

"Linear Algebra," Jim Heffron (2008)

“I liked this one because it was free and is still highly rated - I have also heard great reviews about David Poole's book on linear algebra, but I couldn't get myself to shell out the money when this was free.”

Complex To Real

http://www.complextoreal.com/tutorial.htm

“There are a ton of great articles here - I have personally read the ones on Fourier transforms and wavelet transforms - great stuff”

Stanford Lectures - Fourier Analysis series

(free through Youtube or iTunesU)

“Email the Professor for the companion book. At this point it may have been published - but well worth shelling out the dough in any case.”

Bio-Inspired Artificial Intelligence

"Bio-Inspired Artificial Intelligence," Floreano and Mattiussi (2008)

“Excellent reference - fairly in depth for the number of topics it covers, lots of illustrations (illustrations are always a good thing :) and I've also found it to be a useful source for inspiration. I bought this book while I was looking into fuzzy swarm intelligence - it doesn't have all the answers, but I am simply in love with this book.”

Video lectures on Machine Learning

http://videolectures.net/pascal/

“A collection of video lectures featuring very brilliant people - let's face it... if you're doing or are interested in mathematics this complex... you probably don't know too many people who you can talk to / learn from on the subject unless you're in a University studying this kind of thing - these are some great lectures on machine learning - I just recently found this site but wish I had found it sooner - it's great if you're just exploring machine learning or are very well versed in it - however, I fall somewhere in the middle of that distribution so take it for what it's worth!”

Fearless Symmetry

"Fearless Symmetry," Ash and Gross (2006)

Another book accessible to those without heavy training in math - a great intro to Galois Theory, the Riemann Hypothesis and several other concepts.

Zero: The Biography of a Dangerous Idea

    "Zero: The Biography of a Dangerous Idea," Charles Seife (2000)

This one is more historical and conceptual than technical but it's a captivating read and will help get you through those hard times when you want to put down that book on K-Dimensional Manifolds, but still need to satisfy your mathematical mind (some say it's never at rest once you catch that "learning bug").

Khan Academy

http://www.khanacademy.org/

Finally, when you get lost, go here! The Khan Academy is a not-for-profit 501(c)(3) with the mission of providing a world-class education to anyone, anywhere.
Thanks Jordan! I hope readers can enjoy those resources as much as I did. Cheers!

Monday, 18 October 2010

Gaussian Mixture Models and Expectation-Maximization

Like K-Means, Gaussian Mixture Models (GMM) can be regarded as a type of unsupervised learning or clustering method. They are among the most statistically mature methods for clustering. But unlike K-Means, GMMs are able to build soft clustering boundaries, i.e., points in space can belong to any cluster with a given probability.





The code presented here is also part of the Accord.NET Framework. The Accord.NET Framework is a framework for developing machine learning, computer vision, computer audition, statistics and math applications. It is based on the already excellent AForge.NET Framework. Please see the starting guide for more details. The latest version of the framework includes the latest version of this code plus many other statistics and machine learning tools.

Contents





  1. Introduction

  2. Source code

    1. Normal Distributions

    2. Mixture Distributions

    3. Expectation-Maximization

    4. Gaussian Mixture Models


  3. Using the code

  4. Sample application

  5. Conclusion

  6. References

  7. See also



Introduction


In statistics, a mixture model is a probabilistic model which assumes the underlying data belong to a mixture distribution. In a mixture distribution, the density function is a convex combination (a linear combination in which all coefficients or weights are non-negative and sum to one) of other probability density functions:

p(x) = w1 p1(x) + w2 p2(x) + … + wn pn(x),   with w1 + w2 + … + wn = 1

The individual pi(x) density functions that are combined to make the mixture density p(x) are called the mixture components, and the weights w1, w2, …, wn associated with each component are called the mixture weights or mixture coefficients.



The most common mixture distribution is the Gaussian (Normal) density function, in which each of the mixture components is a Gaussian distribution, each with its own mean and variance parameters.
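
Written out, this gives a weighted sum of Gaussian densities (standard notation; the Gaussian density itself is given in the Normal distributions section below):

p(x) = w1 N(x; μ1, Σ1) + w2 N(x; μ2, Σ2) + … + wn N(x; μn, Σn)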






To give a more solid and concrete understanding of Gaussian mixture models, we will jump directly into how to represent those abstractions in source code through the use of class diagrams. Hopefully, class diagrams can give a better and more direct overview of the subject than a lengthy theoretical discussion.



Source code


First we will discuss the characteristics of any probability distribution. The IDistribution interface (depicted below) states that any probability distribution has a probability function and a distribution function. It also states that probability distributions can be fitted to sets of observations through a parameter estimation method.



Class diagram for the IDistribution interface.



Since distribution functions can be either univariate or multivariate, the methods above accept any number of scalar values as input through the use of the params keyword. The Fit method for parameter estimation also accepts a general System.Array because we could be given either an array of univariate observations in the form of a double[] or an array of multivariate observations in the form of a jagged double[][] array. Since each observation may have a different weight in the estimation process, the weights parameter can be used to pass those weights to the fitting method.
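
For illustration, a minimal sketch of this contract could look like the following; the member names mirror the discussion above and the code later in this post, but are not necessarily identical to the framework's actual declarations:

// Minimal sketch of the distribution contract described above. The member
// names are assumptions for illustration, not the framework's exact API.
public interface IDistribution
{
    // Distribution function; univariate or multivariate input via params.
    double DistributionFunction(params double[] x);

    // Probability function (a density for continuous distributions,
    // a mass function for discrete ones).
    double ProbabilityFunction(params double[] x);

    // Parameter estimation: observations may be a double[] (univariate)
    // or a double[][] (multivariate), with optional per-sample weights.
    void Fit(System.Array observations, double[] weights);
}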



Probability distributions may also have some associated measures, mainly the Expectation and the Variance of the modeled random variable. Depending on whether the distribution is univariate or multivariate, the expected values and variances can be either single scalars or vectors of scalars. For multivariate distributions, the variance can also be represented by a symmetric, positive-definite matrix known as the variance-covariance matrix.
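
As a rough sketch, a multivariate distribution could expose those measures as read-only properties (names assumed for illustration):

// Illustrative sketch of the measures discussed above; property names
// are assumptions, not necessarily the framework's actual interface.
public interface IMultivariateDistribution : IDistribution
{
    double[] Mean { get; }        // the expectation vector
    double[] Variance { get; }    // per-dimension variances (the covariance diagonal)
    double[,] Covariance { get; } // the full variance-covariance matrix
}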



Class diagram for the distribution interfaces.



Moreover, probability distributions can also be continuous or discrete. In continuous distributions the probability function is known as the Probability Density Function (pdf), and in discrete distributions the probability function is known as the Probability Mass Function (pmf). However, since our distribution of interest, the Gaussian, is a continuous distribution, we will be focusing only on continuous multivariate distributions:



Class diagram for the continuous multivariate distribution interfaces.



The Normal (Gaussian) distribution and the mixture distributions fall under the multivariate continuous distributions category and are implemented as such.




Normal distributions



The Normal (or Gaussian) multivariate distribution is a multivariate distribution whose parameters are the mean vector μ and a variance-covariance matrix Σ. Its probability density function is given by:

p(x; μ, Σ) = (2π)^(-k/2) |Σ|^(-1/2) exp( -(1/2) (x - μ)^T Σ^(-1) (x - μ) )

where k is the dimension of the vector x.

The more detailed class diagram for the normal distribution shows the multivariate characteristics of most of its methods, in particular of the Fit method for estimating Gaussian parameters from a multivariate set of data. Each sample, given in the form of a double[], may also have different associated weights. In this case, the weight associated with each of the samples can be passed through the weights parameter.



Class diagram for the NormalDistribution class.

To estimate the parameters of a Gaussian distribution all we have to do is compute the mean and the variance-covariance matrix out of the given data. This is a very straightforward procedure and can be done by using the standard methods of the Tools class in the Accord.Statistics namespace. The Fit method returns the fitted distribution as a return value and leaves the current distribution untouched.
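
As a sketch, such a Fit could reduce to little more than the following; Tools.Covariance is used with this same signature in the K-Means code later in this post, while Tools.Mean is an assumed analogous helper:

// Sketch: fitting a Gaussian is just computing the sample mean and the
// sample covariance matrix (Tools.Mean is assumed; Tools.Covariance
// appears with this signature in the K-Means code below).
double[][] samples =
{
    new double[] { 1.0, 2.0 },
    new double[] { 1.2, 1.8 },
    new double[] { 0.8, 2.1 }
};

double[] mean = Accord.Statistics.Tools.Mean(samples);
double[,] cov = Accord.Statistics.Tools.Covariance(samples, mean);

// The fitted distribution, using the constructor shown later in this post
NormalDistribution fitted = new NormalDistribution(mean, cov);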



Mixture distributions


Mixture distributions are just convex combinations of other probability distributions. As such, a mixture distribution has an array of component distributions and a coefficient array containing the weights of each of the component distributions.



Class diagram for the Mixture&lt;T&gt; class.



In the generic implementation above, T is the type of the distributions used in the mixture. The most common mixture distribution, the Gaussian Mixture Distribution, could then be created by instantiating a Mixture<NormalDistribution> object passing the initial Normal distributions as constructor parameters.
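
For instance, a two-component Gaussian mixture over two dimensions could be created along the following lines (a sketch with illustrative parameters; the exact constructor overloads may differ):

// Two initial Gaussian components with illustrative parameters
var g1 = new NormalDistribution(
    new double[] { 0, 0 },
    new double[,] { { 1, 0 }, { 0, 1 } });

var g2 = new NormalDistribution(
    new double[] { 5, 5 },
    new double[,] { { 1, 0 }, { 0, 1 } });

// The Gaussian mixture built from the initial components
var mixture = new Mixture<NormalDistribution>(g1, g2);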



Alternatively, the parameters of the mixture distributions could be estimated from a set of observations by calling the Fit method. To estimate the parameters of a mixture distribution we will be using a common technique known as the Expectation-Maximization algorithm.




Expectation-Maximization


Expectation-Maximization (EM) is a well-established maximum likelihood algorithm for fitting a mixture model to a set of training data. It should be noted that EM requires an a priori selection of model order, namely, the number M of components to be incorporated into the model.



The general E-M algorithm comprises the following simple steps:




  1. Initialization

    Initialize the distribution parameters, such as the means, covariances and mixing coefficients, and evaluate the initial value of the log-likelihood (the goodness of fit of the current distribution against the observation dataset);

  2. Expectation

    Evaluate the responsibilities (i.e. weight factors of each sample) using the current parameter values;

  3. Maximization

    Re-estimate the parameters using the responsibilities found in the previous step;

  4. Repeat

    Re-evaluate the log-likelihood and check if it has changed; if it has changed less than a given threshold, the algorithm has converged.
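
For mixture distributions, the expectation and maximization steps take a standard closed form, matching the gamma (responsibilities) and pi (coefficients) variables in the code below. Assuming unit sample weights, the expectation step evaluates

γk(xi) = wk pk(xi) / ( w1 p1(xi) + w2 p2(xi) + … + wK pK(xi) )

for each component k and observation xi, and the maximization step re-estimates each mixture coefficient as the average responsibility of its component,

wk = ( γk(x1) + γk(x2) + … + γk(xN) ) / N

while the component parameters are re-estimated by fitting each pk to the observations weighted by its responsibilities.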


 



Below is the actual realization of the algorithm as implemented in the Fit method of the Mixture<T> class.





/// <summary>
/// Fits the underlying distribution to a given set of observations.
/// </summary>
///
/// <param name="observations">The array of observations to fit the model against. The array
/// elements can be either of type double (for univariate data) or
/// type double[] (for multivariate data).</param>
/// <param name="weights">The weight vector containing the weight for each of the samples.</param>
/// <param name="options">Optional arguments which may be used during fitting, such
/// as regularization constants and additional parameters.</param>
///
/// <remarks>
/// Although both double[] and double[][] arrays are supported,
/// providing a double[] for a multivariate distribution or a
/// double[][] for a univariate distribution may have a negative
/// impact on performance.
/// </remarks>
///
public override void Fit(double[] observations, double[] weights, IFittingOptions options)
{
    // Estimation parameters
    double threshold = 1e-3;
    IFittingOptions innerOptions = null;

    if (options != null)
    {
        // Process optional arguments
        MixtureOptions o = (MixtureOptions)options;
        threshold = o.Threshold;
        innerOptions = o.InnerOptions;
    }


    // 1. Initialize means, covariances and mixing coefficients
    //    and evaluate the initial value of the log-likelihood

    int N = observations.Length;
    int K = components.Length;

    double weightSum = weights.Sum();

    // Initialize responsibilities
    double[] norms = new double[N];
    double[][] gamma = new double[K][];
    for (int k = 0; k < gamma.Length; k++)
        gamma[k] = new double[N];


    // Clone the current distribution values
    double[] pi = (double[])coefficients.Clone();
    T[] pdf = new T[components.Length];
    for (int i = 0; i < components.Length; i++)
        pdf[i] = (T)components[i].Clone();

    // Prepare the iteration
    double likelihood = logLikelihood(pi, pdf, observations, weights);
    bool converged = false;

    // Start
    while (!converged)
    {
        // 2. Expectation: Evaluate the component distributions'
        //    responsibilities using the current parameter values.
        Array.Clear(norms, 0, norms.Length);

        for (int k = 0; k < gamma.Length; k++)
            for (int i = 0; i < observations.Length; i++)
                norms[i] += gamma[k][i] = pi[k] * pdf[k].ProbabilityFunction(observations[i]);

        for (int k = 0; k < gamma.Length; k++)
            for (int i = 0; i < weights.Length; i++)
                if (norms[i] != 0) gamma[k][i] *= weights[i] / norms[i];

        // 3. Maximization: Re-estimate the distribution parameters
        //    using the previously computed responsibilities
        for (int k = 0; k < gamma.Length; k++)
        {
            double sum = gamma[k].Sum();

            for (int i = 0; i < gamma[k].Length; i++)
                gamma[k][i] /= sum;

            pi[k] = sum / weightSum;
            pdf[k].Fit(observations, gamma[k], innerOptions);
        }

        // 4. Evaluate the log-likelihood and check for convergence
        double newLikelihood = logLikelihood(pi, pdf, observations, weights);

        if (Double.IsNaN(newLikelihood) || Double.IsInfinity(newLikelihood))
            throw new ConvergenceException("Fitting did not converge.");

        if (Math.Abs(likelihood - newLikelihood) < threshold * Math.Abs(likelihood))
            converged = true;

        likelihood = newLikelihood;
    }

    // Become the newly fitted distribution.
    this.initialize(pi, pdf);
}


 



Gaussian Mixture Models



Gaussian mixture models are among the most commonly used examples of mixture distributions. The GaussianMixtureModel class encompasses a Mixture<NormalDistribution> object and provides methods to learn from data and to perform actual classification through a simplified interface.



Class diagram for Gaussian Mixture Model



Moreover, a common problem which arises in mixture model fitting through E-M is the proper initialization of the mixture parameters. One popular approach is to initialize the mixture parameters using the centroids detected by the K-Means algorithm. The code below shows how the mixture parameters are computed by first creating a KMeans object and then passing the clusters to the mixture model for further processing.







/// <summary>
///   Divides the input data into K clusters modeling each
///   cluster as a multivariate Gaussian distribution.
/// </summary>
public double Compute(double[][] data, double threshold)
{
    int components = this.gaussians.Count;

    // Create a new K-Means algorithm
    KMeans kmeans = new KMeans(components);

    // Compute the K-Means
    kmeans.Compute(data, threshold);

    // Initialize the Mixture Model with data from K-Means
    NormalDistribution[] distributions = new NormalDistribution[components];
    double[] proportions = kmeans.Clusters.Proportions;
    for (int i = 0; i < components; i++)
    {
        double[] mean = kmeans.Clusters.Centroids[i];
        double[,] covariance = kmeans.Clusters.Covariances[i];
        distributions[i] = new NormalDistribution(mean, covariance);
    }

    // Fit a multivariate Gaussian mixture distribution
    model = Mixture<NormalDistribution>.Estimate(data, threshold, proportions, distributions);

    // Return the log-likelihood as a measure of goodness-of-fit
    return model.LogLikelihood(data);
}


 



Using the code


Code usage is rather simple. First, instantiate a new GaussianMixtureModel object passing the desired number of components in the Gaussian mixture. Then, call the Compute(double[][], double) method passing the learning samples and a desired convergence threshold.








// Create a new Gaussian Mixture Model with 2 components
GaussianMixtureModel gmm = new GaussianMixtureModel(2);

// Compute the model (estimate)
gmm.Compute(samples, 0.0001);

// Classify a single sample
int c = gmm.Classify(sample);





After training has been completed, a new sample can be classified by calling the Classify(double[]) method.




 



Sample application


The accompanying sample application demonstrates how Gaussian Mixture Models can be used to cluster normally-distributed samples of data. By clicking the “Create random data” button, a given number of Normal distributions will be created in the two-dimensional space using random mean and covariance matrices.



After the data has been created, you can use the “Fit a Gaussian Mixture Model” button to fit a mixture of Gaussians to the data. Each of the Gaussians will receive a random color and the samples which have the greatest probability of belonging to any of the Gaussians will be colored accordingly.



 






Left: a random dataset containing three Gaussian clusters. Right: the clusters as identified by the Gaussian Mixture Model.



 






Left: a random dataset containing 10 Gaussian clusters. Right: the clusters as identified by the Gaussian Mixture Model.




Conclusion


In this article we saw how Gaussian Mixture Models can be successfully used to create soft clustering boundaries around data. Such soft boundaries are possible because, in a mixture model, each sample belongs to each cluster only with a certain probability.



A main drawback of GMMs is that the number of Gaussian mixture components, as in K-Means, is assumed to be known a priori, so they cannot be considered a totally unsupervised clustering method. To aid in this situation, one could use additional criteria such as the Minimum Message Length (MML) criterion.



Another problem arises with the correct initialization of the mixture components. A poor choice of initial parameters will invariably lead to poor results, as the E-M algorithm converges only to a local optimum. A commonly used solution is initialization by randomly sampling from the mixture data. Another common solution, covered by the implementation presented here, is to use K-Means to perform initialization. However, K-Means itself may also converge to a poor solution and negatively impact the mixture model fitting.




References



  • Raja, Y. and Gong, S. Gaussian Mixture Models. Department of Computer Science, Queen Mary and Westfield College, England.



  • Bishop, C. M. (2007), Pattern Recognition and Machine Learning (Information Science and Statistics), Springer.



  • Wikipedia contributors, "Normal distribution," Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/w/index.php?title=Normal_distribution (accessed October 18, 2010).



  • Wikipedia contributors, "Probability density function," Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/w/index.php?title=Probability_density_function (accessed October 18, 2010).



  • Wikipedia contributors, "Mixture density," Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/w/index.php?title=Mixture_density (accessed October 18, 2010).






Tuesday, 5 October 2010

K-Means Clustering

K-Means is one of the most popular, classic and simple approaches to clustering. Clustering is a method of unsupervised learning, and a common technique for statistical data analysis used in many fields, including machine learning, data mining, pattern recognition, image analysis and bioinformatics [3].





The code presented here is also part of the Accord.NET Framework. The Accord.NET Framework is a framework for developing machine learning, computer vision, computer audition, statistics and math applications. It is based on the already excellent AForge.NET Framework. Please see the starting guide for more details. The latest version of the framework includes the latest version of this code plus many other statistics and machine learning tools.

Contents



  1. Introduction

    1. Lloyd's K-Means algorithm



  2. Source code

  3. Using the code

  4. Sample application

  5. Conclusion

  6. Acknowledgements

  7. References

  8. See also



Introduction


In statistics and machine learning, k-means clustering is a method of cluster analysis which aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean [4]. In its most common form, the algorithm is an iterative greedy algorithm which converges to a local optimum after a certain number of iterations.




Illustration of the K-Means algorithm.

The algorithm works by first selecting k locations at random to be the initial centroids for the clusters. Each observation is then assigned to the cluster with the nearest centroid, and the centroids are recalculated as the mean of the observations assigned to them. The algorithm then repeats this process until the cluster centroids no longer change, or until the change is less than a given threshold.



There are other refinements and extensions of the algorithm. The version depicted above is its most common form, also referred to as Lloyd's algorithm.



Lloyd's K-Means algorithm



  1. Place K points into the space represented by the objects that are being clustered. These points represent initial group centroids.

  2. Assign each object to the group that has the closest centroid.

  3. When all objects have been assigned, recalculate the positions of the K centroids.

  4. Repeat Steps 2 and 3 until the centroids no longer move. This produces a separation of the objects into groups from which the metric to be minimized can be calculated.
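
The metric minimized in step 4 is the within-cluster sum of squared distances; for the default squared Euclidean distance it reads

J = ||x1 − μc(1)||² + ||x2 − μc(2)||² + … + ||xn − μc(n)||²

where c(i) denotes the cluster to which observation xi is assigned and μk is the centroid of cluster k. Steps 2 and 3 can each only decrease J, which is why the procedure converges, although only to a local optimum.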


 



Source code


The source code has been implemented using Accord.NET and is now part of the framework. In the current version (2.1.2), the following classes related to K-Means are contained inside the Accord.MachineLearning namespace. The source code available in this page contains only the parts of the framework actually needed to support the algorithm.



Class diagram for the K-Means algorithm.



The KMeans class is the main class representing the K-Means algorithm. The algorithm itself is implemented in the Compute(double[][] data, double threshold) method, which accepts a set of observations and a convergence threshold to determine when the method should stop.








/// <summary>
///   Divides the input data into K clusters.
/// </summary>
public int[] Compute(double[][] data, double threshold)
{
    int k = this.K;
    int rows = data.Length;
    int cols = data[0].Length;

    // pick K unique random indexes in the range 0..n-1
    int[] idx = Accord.Statistics.Tools.Random(rows, k);

    // assign centroids from data set
    this.centroids = data.Submatrix(idx);

    // initial variables
    int[] count = new int[k];
    int[] labels = new int[rows];
    double[][] newCentroids;

    do // main loop
    {
        // Reset the centroids and the
        // cluster member counters
        newCentroids = new double[k][];
        for (int i = 0; i < k; i++)
        {
            newCentroids[i] = new double[cols];
            count[i] = 0;
        }

        // First we will accumulate the data points
        // into their nearest clusters, storing this
        // information into the newCentroids variable.

        // For each point in the data set,
        for (int i = 0; i < data.Length; i++)
        {
            // Get the point
            double[] point = data[i];

            // Compute the nearest cluster centroid
            int c = labels[i] = Nearest(data[i]);

            // Increase the cluster's sample counter
            count[c]++;

            // Accumulate in the corresponding centroid
            double[] centroid = newCentroids[c];
            for (int j = 0; j < centroid.Length; j++)
                centroid[j] += point[j];
        }

        // Next we will compute each cluster's new centroid
        // by dividing the accumulated sums by the number of
        // samples in each cluster, thus averaging its members.
        for (int i = 0; i < k; i++)
        {
            double[] mean = newCentroids[i];
            double clusterCount = count[i];

            for (int j = 0; j < cols; j++)
                mean[j] /= clusterCount;
        }

        // The algorithm stops when there is no further change in
        // the centroids (difference is less than the threshold).
        if (centroids.IsEqual(newCentroids, threshold))
            break;

        // go to next generation
        centroids = newCentroids;

    } while (true);

    // Compute cluster information (optional)
    for (int i = 0; i < k; i++)
    {
        // Extract the data for the current cluster
        double[][] sub = data.Submatrix(labels.Find(x => x == i));

        // Compute the current cluster variance
        covariances[i] = Statistics.Tools.Covariance(sub, centroids[i]);

        // Compute the proportion of samples in the cluster
        proportions[i] = (double)sub.Length / data.Length;
    }

    // Return the classification result
    return labels;
}


 



The implementation is quite straightforward and does not use additional techniques to avoid convergence problems. More refined techniques may be added to the implementation in the future, so please make sure to download the latest version of Accord.NET Framework for the most up-to-date revision of the code.



Using the code



To use the code, create a new instance of the KMeans class passing the desired number of clusters to its constructor. Additionally, you may also pass a distance function to be used as a distance metric during clustering (see the second snippet below). The default is the squared Euclidean distance.





// Declare some observations
double[][] observations =
{
    new double[] { -5, -2, -1 },
    new double[] { -5, -5, -6 },
    new double[] {  2,  1,  1 },
    new double[] {  1,  1,  2 },
    new double[] {  1,  2,  2 },
    new double[] {  3,  1,  2 },
    new double[] { 11,  5,  4 },
    new double[] { 15,  5,  6 },
    new double[] { 10,  5,  6 },
};

// Create a new K-Means algorithm with 3 clusters
KMeans kmeans = new KMeans(3);

// Compute the algorithm, retrieving an integer array
// containing the labels for each of the observations
int[] labels = kmeans.Compute(observations);

// As a result, the first two observations should belong to the
// same cluster (thus having the same label). The same should
// happen to the next four observations and to the last three.
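
To use a custom metric instead, pass the distance function to the constructor. The sketch below assumes a Distance.Manhattan function analogous to the Distance.SquareEuclidean delegate used in the sample application later in this post:

// Sketch: K-Means with a custom distance function. Distance.Manhattan
// is assumed to exist alongside Distance.SquareEuclidean (used below).
KMeans manhattan = new KMeans(3, Distance.Manhattan);
int[] manhattanLabels = manhattan.Compute(observations);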


 


Sample application


The k-means clustering algorithm is commonly used in computer vision as a form of image segmentation. The results of the segmentation are often used to aid border detection and object recognition. The sample application performs image segmentation using the standard squared Euclidean distance over the RGB pixel color space. There are, however, better distance metrics for image segmentation, such as weighted distances and distances in other color spaces, which will not be addressed in this example.






Original image (from Ossi Petruska Flickr page*).



To perform image segmentation, we will first translate our image into an array of pixel values. The image will be read, pixel by pixel, into a jagged array in which each element is a double array of length 3. Each of those double arrays will contain the three RGB values of a pixel, scaled to the interval [-1, 1].
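
As a rough sketch of this conversion (the sample application actually relies on the framework's ToDoubleArray extension, shown in the handler below), the scaling could be done as follows using System.Drawing:

// Illustrative sketch of the image-to-array conversion described above;
// the framework's ToDoubleArray extension performs this job in practice.
Bitmap img = new Bitmap("input.png");
double[][] values = new double[img.Width * img.Height][];

for (int y = 0; y < img.Height; y++)
{
    for (int x = 0; x < img.Width; x++)
    {
        Color c = img.GetPixel(x, y);
        values[y * img.Width + x] = new double[]
        {
            c.R / 127.5 - 1.0, // scale each channel from [0, 255]
            c.G / 127.5 - 1.0, //   to the interval [-1, 1]
            c.B / 127.5 - 1.0
        };
    }
}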



After we perform clustering on this array of pixel values, each pixel will have an associated cluster label. Each pixel value will then be replaced by its corresponding cluster centroid. The source code below is called when one clicks the Run button in the application.




private void btnRun_Click(object sender, EventArgs e)
{
    // Retrieve the number of clusters
    int k = (int)numClusters.Value;

    // Load original image
    Bitmap image = Properties.Resources.leaf;

    // Transform the image into an array of pixel values
    double[][] pixels = image.ToDoubleArray();

    // Create a K-Means algorithm using the given k and a
    // squared Euclidean distance as distance metric.
    KMeans kmeans = new KMeans(k, Distance.SquareEuclidean);

    // Compute the K-Means algorithm until the difference in
    // cluster centroids between two iterations is below 0.05
    int[] idx = kmeans.Compute(pixels, 0.05);

    // Replace every pixel with its corresponding centroid
    pixels.ApplyInPlace((x, i) => kmeans.Clusters.Centroids[idx[i]]);

    // Show resulting image in the picture box
    pictureBox.Image = pixels.ToBitmap(image.Width, image.Height);
}




After segmentation, the following resulting images can be obtained:






Same image after K-Means clustering with k = 5.






Image after K-Means clustering with k = 10.



 



* The sample image used above has been licensed by Ossi Petruska in a Creative Commons Attribution-NonCommercial-ShareAlike 2.0 Generic license.



Conclusion


K-Means is a very simple and popular approach to clustering. The implementation presented here is the same implementation used in Accord.NET. As can be seen, the method can easily be extended with custom distance functions through delegates or lambda expressions, and can be used in different contexts, such as image segmentation, without further modifications. As a suggestion for improvement, the method can be further sped up by using the triangle inequality, as suggested in the paper "Using the triangle inequality to accelerate k-means" by Charles Elkan.



In the next article, we will see how we can use K-Means to initialize the Expectation-Maximization algorithm for estimating Gaussian mixture densities in Gaussian Mixture Models. Those articles will form the basis for continuous-density Hidden Markov Models.



Acknowledgements


To Antonino Porcino, who provided the first version of the code and valuable information about many other methods and algorithms.




Monday, 13 September 2010

Handwritten Digits Recognition with C# SVMs in Accord.NET

I've posted a new article on CodeProject, entitled “Handwriting Recognition Revisited: Kernel Support Vector Machines”. It is a continuation of a previous article on handwritten digit recognition, but this time using SVMs instead of KDA.


The code uses the SVM library in the Accord.NET Framework. The framework, which runs on .NET and is written mostly in C#, supports standard and multiclass support vector machines for either classification or regression, with more than 20 kernel functions available to choose from.

In the article, the same data set and the same number of training and testing samples have been used as in the previous Kernel Discriminant Analysis article, for comparison purposes. The experimental results show that SVMs can outperform KDA both in terms of efficiency and accuracy, requiring much less processing time and memory while still producing robust results.

Tuesday, 10 August 2010

Matrix manipulation using Accord.NET


Matrix manipulation in Accord.NET is very straightforward: just add a new using directive on top of your class to have (literally) about a hundred new extension methods that operate directly on .NET multi-dimensional arrays.






Introduction




Accord.NET uses a somewhat different approach to matrix manipulation than other libraries. By using C# 3.0 extension methods, Accord adds several of the standard methods you would expect from a matrix library, such as linear system solving, matrix algebra and numerical decompositions, directly to the standard double[,] (or, more generally, T[,]) matrices of the framework.



This approach offers some advantages, since it avoids mismatches when one is using multiple libraries, each with its own and often incompatible Matrix implementation. Extending multi-dimensional arrays makes the use of matrices much more natural in the .NET world, as no specialized Matrix classes have to be used and no code has to be rewritten just to use a particular Matrix implementation.



 



Please note, however, that most methods implemented by Accord are not equivalent, from a performance standpoint, to the heavily optimized versions found in more specialized numerical packages such as BLAS. Their real power comes when prototyping, that is, when realizing algorithms into code. Once an algorithm is written, several unit tests can be created to test the correctness of the code. After that, because we have an early working prototype, it will be much easier to perform optimizations as needed. The key point is to have a working version early, which can (if actually necessary) be optimized later.



Using extension methods



The first step is to include a new using directive at the top of your source file.




title="accord-matrix3" border="0" alt="accord-matrix3" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgpo_b_J445BwyGME6LhJeH5ub3pRV4uJ5zYb0sSgXhXXdqC4m6HkzlvYv_8ZCEfJc_RssdRmbM_NbgE245V4a8yZgMu9Xohx8R6cEdKbO4D-7GRVciWE3fzE7_Kx4TBzg7llTX1PaZkzT0/?imgmax=800"
width="377" height="137" />



This is all that is necessary after you have referenced the Accord.Math library in your project. Now you can use all of the methods and operations described below. The following list is nowhere near complete, but shows some basics and highlights from the current version of Accord.



Declaring matrices



Using standard .NET declarations



To declare matrices, no special code is required. Just create multi-dimensional arrays as usual.











double[,] A =
{
    { 1, 2, 3 },
    { 6, 2, 0 },
    { 0, 0, 1 }
};

double[,] B =
{
    { 2, 0, 0 },
    { 0, 2, 0 },
    { 0, 0, 2 }
};



Using Accord.NET extensions



Additionally, one may wish to use one of the convenience methods of Accord.NET to create specialized matrices such as the identity matrix, multiples of the identity matrix, or diagonal matrices.




Accord.NET (C#)                                 MATLAB®
double[,] B = Matrix.Identity(3).Multiply(2);   B = eye(3)*2;
double[,] B = Matrix.Diagonal(3, 2.0);          B = eye(3)*2;




Using Accord.NET extensions with implicit typing



By using implicit type variable declaration, the code acquires a certain lightweight feeling and gets closer to its MATLAB®/Octave counterpart (MATLAB is a registered trademark of The MathWorks, Inc.).




Accord.NET (C#)               MATLAB®
var I = Matrix.Identity(3);   I = eye(3)




A much closer alternative will be possible by using Algorithm Environments, an upcoming feature in Accord. When available, this feature will allow construction of mathematical code using

    var I = eye(3)

directly. The code will be based on current Octave syntax. Octave is a high-level language, primarily intended for numerical computations, that is mostly compatible with MATLAB.



Matrix operations



All standard matrix operations such as transpose, inverse, and column and row manipulations are available in the extension methods. In the example below, A is the same standard double[,] matrix declared in the first section of this article. All methods return a new double[,] matrix as a result, leaving the original matrix A untouched.




Operation        Accord.NET (C#)                  MATLAB®
Transpose        var At = A.Transpose();          At = A'
Inverse          var invA = A.Inverse();          invA = inv(A)
Pseudo-Inverse   var pinvA = A.PseudoInverse();   pinvA = pinv(A)




Matrix Algebra



All common algebraic operations for matrices are also implemented, such as addition, subtraction and multiplication. Here, division is actually a shortcut for multiplying by the inverse.




Operation        Accord.NET (C#)          MATLAB®
Addition         var C = A.Add(B);        C = A + B
Subtraction      var C = A.Subtract(B);   C = A - B
Multiplication   var C = A.Multiply(B);   C = A * B
Division         var C = A.Divide(B);     C = A / B




The above also works with vectors and scalars.




Operation                        Accord.NET (C#)               MATLAB®
Multiplying by a scalar          var H = A.Multiply(3.14);     H = A * 3.14;
Multiplying by a column vector   double[] u = { 1, 2, 3 };
                                 double[] v = A.Multiply(u);   v = A * u';



Special element-wise operations



Element-wise operations are operations performed on each element of the matrix or vector. In MATLAB, they are sometimes known as the dot operations, since they are usually denoted by prefixing a dot to the common operators.




Operation        Accord.NET (C#)                     MATLAB®
Multiplication   var C = A.ElementwiseMultiply(B);   C = A .* B
Division         var C = A.ElementwiseDivide(B);     C = A ./ B
Power            var C = A.ElementwisePower(B);      C = A .^ B




Vector operations



Accord.NET can also perform many vector operations. Among them are the many flavors of products between vectors, such as the inner, the outer and the Cartesian products.




Operation                                   Accord.NET (C#)                  MATLAB®
Inner product (a.k.a. the scalar product)   var w = u.InnerProduct(v);       w = u*v'
Outer product (a.k.a. the matrix product)   var w = u.OuterProduct(v);       w = u'*v
Cartesian product                           var w = u.CartesianProduct(v);
Euclidean norm                              double n = u.Norm();             n = norm(u)
Sum                                         double s = u.Sum();              s = sum(u)
Product                                     double p = u.Product();          p = prod(u)



Matrix characteristics



Some common matrix characteristics, such as the determinant and trace, are readily available.




Operation     Accord.NET (C#)    MATLAB®
Determinant   A.Determinant();   det(A)
Trace         A.Trace();         trace(A)



Other characteristics



Other available characteristics are the summation and product of vector and matrix elements.




Operation           Accord.NET (C#)               MATLAB®
Sum vector          double[] sum = A.Sum();       sum(A)
Sum of elements     double sum = A.Sum().Sum();   sum(sum(A))
Sum along columns   double[] sum = A.Sum(0);      sum(A, 1)
Sum along rows      double[] sum = A.Sum(1);      sum(A, 2)



Linear Algebra



Linear algebra is certainly one of the most important fields of mathematics, if not the most important one. Accord includes many methods for numerical linear algebra, such as matrix inversion and matrix decompositions. Most of them were originally based on JAMA and MAPACK, but today Accord has some additions from routines translated from EISPACK (mainly the Generalized Eigenvalue Decomposition, which is currently absent from JAMA).




Operation               Accord.NET (C#)       MATLAB®
Solve a linear system   var x = A.Solve(B);   x = A \ B


Eigenvalue Decomposition


Standard decomposition

    Accord.NET (C#):
        var evd = new EigenvalueDecomposition(A);
        var V = evd.Eigenvectors;
        var D = evd.DiagonalMatrix;

    MATLAB®:
        [V, D] = eig(A)

Generalized decomposition

    Accord.NET (C#):
        var gevd = new GeneralizedEigenvalueDecomposition(A, B);
        var V = gevd.Eigenvectors;
        var D = gevd.DiagonalMatrix;

    MATLAB®:
        [V, D] = eig(A, B)



Singular Value Decomposition


Economy decomposition

    Accord.NET (C#):
        var svd = new SingularValueDecomposition(A);
        var U = svd.LeftSingularVectors;
        var S = svd.Diagonal;
        var V = svd.RightSingularVectors;

    MATLAB®:
        [U, S, V] = svd(A, 0)



QR Decomposition


Standard decomposition

    Accord.NET (C#):
        var qr = new QrDecomposition(A);
        var Q = qr.OrthogonalFactor;
        var R = qr.UpperTriangularFactor;

    MATLAB®:
        [Q, R] = qr(A)



Cholesky Decomposition


Standard decomposition

    Accord.NET (C#):
        var chol = new CholeskyDecomposition(A);
        var R = chol.LeftTriangularFactor;

    MATLAB®:
        R = chol(A)



LU Decomposition


Standard decomposition

    Accord.NET (C#):
        var lu = new LuDecomposition(A);
        var L = lu.LowerTriangularFactor;
        var U = lu.UpperTriangularFactor;

    MATLAB®:
        [L, U] = lu(A)



Special operators



There are some other useful operators available in Accord.NET. There are facilities to create index vectors (very common in MATLAB for accessing sub-portions of a matrix), select elements, find elements matching a selection criterion, and so on.




Create a vector of indices

    Accord.NET (C#):
        var idx = Matrix.Indices(0, 10);

    MATLAB®:
        idx = 1:9

Selecting elements

    Accord.NET (C#):
        var B = A.Submatrix(idx);

    MATLAB®:
        B = A(idx)

Finding elements matching a certain criterion (for example, finding x ∈ v such that x > 2)

    Accord.NET (C#):
        double[] v = { 5, 2, 2, 7, 1, 0 };
        var idx = v.Find(x => x > 2);

    MATLAB®:
        v = [ 5 2 2 7 1 0 ];
        idx = find(v > 2)

Reshaping a vector to a matrix

    Accord.NET (C#):
        double[] m = { 1, 2, 3, 4 };
        double[,] M = Matrix.Reshape(m, 2, 2);

    MATLAB®:
        m = [1 2 3 4];
        reshape(m, 2, 2)


 



More than just matrices



Accord.NET also offers many other standard mathematical functions, such as the Gamma function, log-Gamma, Digamma, Bessel functions, incomplete Beta integrals, and so on.






For a complete listing of the framework features, please check the project page at Origo. In particular, don't forget to check out the class diagram for the Accord.Math namespace.



Cheers!



 



Legal notice: MATLAB is a registered trademark of The MathWorks, Inc. All other trademarks are the property of their respective owners.