Kernel (Fisher) discriminant analysis (kernel FDA) is a non-linear generalization of linear discriminant analysis (LDA) using techniques of kernel methods. Using a kernel, the originally linear operations of LDA are done in a reproducing kernel Hilbert space with a non-linear mapping.
- Download source code
- Download sample application
This code has also been incorporated in Accord.NET Framework, which includes the latest version of this code plus many other statistics and machine learning tools.
Analysis Overview
KDA is an extension of LDA to non-linear distributions, just as KPCA is to PCA. For information about LDA, please check the previous post, Linear Discriminant Analysis in C#. The algorithm presented here is a multi-class generalization of the original algorithm by Mika et al. in Fisher discriminant analysis with kernels (1999).
The objective of KDA is to find a transformation maximizing the between-class variance and minimizing the within-class variance. It can be shown that, with kernels, the original objective function can be expressed as:

J(α) = (αᵀ M α) / (αᵀ N α)

with:

M = (u₁ − u₂)(u₁ − u₂)ᵀ    and    N = Σc Kc (I − 1lc) Kcᵀ

where Kc is the kernel matrix for class c, uc is the column means vector of Kc, I is the identity matrix, lc is the number of samples in class c, and 1lc is an lc × lc matrix with all entries equal to 1/lc.
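The reason this rewrite works (sketched here in the notation of Mika et al., 1999) is that any direction w in the feature space induced by the mapping φ can be expanded over the mapped training samples, so that every projection reduces to kernel evaluations:

w = Σi αi φ(xi)    ⟹    wᵀ φ(x) = Σi αi K(xi, x)

Substituting this expansion into the Fisher criterion written in feature space yields the quotient in α shown above.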
The Kernel Trick
The kernel trick can be applied to any algorithm that depends solely on dot products between vectors: wherever a dot product is used, it is replaced with the kernel function.
K(x,y) = <φ(x),φ(y)>
Thus, a linear algorithm can easily be transformed into a non-linear one, and the algorithm is effectively carried out in a higher-dimensional space without ever explicitly mapping the input points into that space. For more information and a cool video showing how the kernel trick works, please see the introductory text in the previous post, Kernel Principal Component Analysis in C#.
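As a quick illustration (a standalone sketch, not part of the analysis code below), the snippet below compares a second-degree polynomial kernel with its explicit feature mapping for two-dimensional points; both routes give the same value, but the kernel evaluation never has to build the mapped vectors.

using System;

class KernelTrickDemo
{
    // Explicit feature map for the homogeneous polynomial kernel of degree 2:
    // phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2), so that <phi(x), phi(y)> = <x, y>^2.
    static double[] Phi(double[] x)
    {
        return new double[] { x[0] * x[0], Math.Sqrt(2) * x[0] * x[1], x[1] * x[1] };
    }

    static double Dot(double[] a, double[] b)
    {
        double s = 0;
        for (int i = 0; i < a.Length; i++) s += a[i] * b[i];
        return s;
    }

    static void Main()
    {
        double[] x = { 1.0, 2.0 };
        double[] y = { 3.0, -1.0 };

        double viaKernel  = Math.Pow(Dot(x, y), 2);  // kernel evaluation: K(x, y) = <x, y>^2
        double viaMapping = Dot(Phi(x), Phi(y));     // explicit mapping, then dot product

        Console.WriteLine("{0} == {1}", viaKernel, viaMapping);  // both print 1
    }
}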
Adding regularization
As the problem proposed above is ill-posed (we are estimating l-dimensional covariance structures from l samples), the Sw matrix may become singular, and computing the objective function may become problematic. To avoid singularities, we can add a multiple of the identity matrix to the Sw matrix:

Sw ← Sw + λI

The λ term adds numerical stability, since for sufficiently large λ the matrix Sw becomes positive definite. It can also be seen as a regularization term, favoring solutions with small expansion coefficients [Mika et al., 1999].
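In code, this step amounts to adding λ to the main diagonal of Sw. The helper below is a minimal, hypothetical sketch of that idea (the actual implementation further down uses Matrix.Diagonal for the same purpose):

// Hypothetical helper illustrating the regularization step: Sw + lambda * I.
static double[,] Regularize(double[,] Sw, double lambda)
{
    int n = Sw.GetLength(0);
    double[,] result = (double[,])Sw.Clone();
    for (int i = 0; i < n; i++)
        result[i, i] += lambda; // add lambda to the main diagonal only
    return result;
}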
Source Code
The code below implements a generalization of the original Kernel Discriminant Analysis algorithm by Mika et al. in Fisher discriminant analysis with kernels (1999).
/// <summary>
/// Computes the Multi-Class Kernel Discriminant Analysis algorithm.
/// </summary>
public override void Compute()
{
    // Get some initial information. Note that in the kernel space the
    // dimension of the problem is given by the number of training samples.
    int dimension = Source.GetLength(0);
    double[,] source = Source;
    double total = dimension;

    // Create the Gram (Kernel) matrix by evaluating the kernel function
    // for every pair of training samples (the matrix is symmetric).
    double[,] K = new double[dimension, dimension];
    for (int i = 0; i < dimension; i++)
    {
        for (int j = i; j < dimension; j++)
        {
            double s = kernel.Kernel(source.GetRow(i), source.GetRow(j));
            K[i, j] = s;
            K[j, i] = s;
        }
    }

    // Compute entire data set measures
    Means = Tools.Mean(K);
    StandardDeviations = Tools.StandardDeviation(K, Means);

    // Initialize the kernel analogues of the scatter matrices
    double[,] Sb = new double[dimension, dimension];
    double[,] Sw = new double[dimension, dimension];

    // For each class
    for (int c = 0; c < Classes.Count; c++)
    {
        // Get the class subset
        double[,] subset = Classes[c].Subset;
        int count = subset.GetLength(0);

        // Create the kernel matrix for the class and get its column means
        double[,] Kc = new double[dimension, count];
        double[] mean = new double[dimension];
        for (int i = 0; i < dimension; i++)
        {
            double sum = 0.0;
            for (int j = 0; j < count; j++)
            {
                double s = kernel.Kernel(source.GetRow(i), subset.GetRow(j));
                Kc[i, j] = s;
                sum += s;
            }
            mean[i] = sum / count;
        }

        // Construct the kernel equivalent of the within-class scatter matrix
        double[,] Swi = Kc.Multiply(Matrix.Centering(count)).Multiply(Kc.Transpose());
        Sw = Sw.Add(Swi);

        // Construct the kernel equivalent of the between-class scatter matrix
        double[] d = mean.Subtract(Means);
        double[,] Sbi = d.Multiply(d.Transpose()).Multiply(total / count);
        Sb = Sb.Add(Sbi);

        // Store additional information
        ClassScatter[c] = Swi;
        ClassCount[c] = count;
        ClassMeans[c] = mean;
        ClassStandardDeviations[c] = Tools.StandardDeviation(Kc, mean);
    }

    // Add regularization to avoid a singular within-class scatter matrix
    double[,] Swu = Sw.Add(Matrix.Diagonal(dimension, regularization));

    // Compute the eigenvalue decomposition of inv(Sw + lambda*I) * Sb
    EigenValueDecomposition evd = new EigenValueDecomposition(Matrix.Inverse(Swu).Multiply(Sb));

    // Get the eigenvalues and corresponding eigenvectors
    double[] evals = evd.RealEigenValues;
    double[,] eigs = evd.EigenVectors;

    // Sort eigenvalues and eigenvectors in descending order
    eigs = Matrix.Sort(evals, eigs, new GeneralComparer(ComparerDirection.Descending, true));

    // Calculate translation bias
    this.DiscriminantBias = eigs.Transpose().Multiply(Means).Multiply(-1.0);

    // Store information
    base.EigenValues = evals;
    base.DiscriminantMatrix = eigs;
    base.ScatterBetweenClass = Sb;
    base.ScatterWithinClass = Sw;

    // Computes additional information about the analysis and creates the
    // object-oriented structure to hold the discriminants found.
    createDiscriminants();
}
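For context, the snippet below is a hedged sketch of how this Compute method might be driven, assuming an Accord.NET-style constructor KernelDiscriminantAnalysis(double[,] inputs, int[] outputs, IKernel kernel); the toy data, the chosen sigma and the exact constructor signature are illustrative assumptions, not taken from the code above.

// Illustrative usage only; constructor signature assumed from the Accord.NET version.
using Accord.Statistics.Analysis;
using Accord.Statistics.Kernels;

double[,] inputs =
{
    // a small toy data set: two classes, three samples each
    {  0.0,  0.1 }, {  0.1, -0.1 }, { -0.1,  0.0 },   // class 0
    {  2.0,  2.0 }, { -2.0,  2.1 }, {  2.1, -2.0 },   // class 1
};
int[] outputs = { 0, 0, 0, 1, 1, 1 };

// Create the analysis with a Gaussian kernel (sigma = 3.6, as in the sample application)
var kda = new KernelDiscriminantAnalysis(inputs, outputs, new Gaussian(3.6));

// Run the Compute method shown above to find the discriminants
kda.Compute();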
After completing the analysis, we may wish to project new data into discriminant space. The following code projects a data matrix into this space using the basis found during the analysis.
/// <summary>Projects a given matrix into the kernel discriminant space.</summary>
/// <param name="data">The matrix to be projected.</param>
/// <param name="dimension">The number of discriminant dimensions to use in the projection.</param>
public override double[,] Transform(double[,] data, int dimension)
{
    // Get some information
    int rows = data.GetLength(0);
    int cols = data.GetLength(1);
    int N = Source.GetLength(0);

    // Create the kernel matrix between the new data and the training set
    double[,] K = new double[rows, N];
    for (int i = 0; i < rows; i++)
    {
        for (int j = 0; j < N; j++)
        {
            K[i, j] = kernel.Kernel(Source.GetRow(j), data.GetRow(i));
        }
    }

    // Project into the kernel discriminant space
    double[,] result = new double[rows, dimension];
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < dimension; j++)
            for (int k = 0; k < N; k++)
                result[i, j] += K[i, k] * DiscriminantMatrix[k, j];

    return result;
}
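Continuing the earlier usage sketch (again with purely illustrative data), projecting new points onto the first two discriminants might look like this:

// Two new points to be projected (illustrative values)
double[,] newData =
{
    { 0.05, 0.05 },
    { 1.90, 2.05 },
};

// Project onto the first two kernel discriminant axes
double[,] projection = kda.Transform(newData, 2);
// Each row of 'projection' holds the coordinates of the corresponding
// point in the space spanned by the first two discriminants.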
Sample Application
The sample application shows how Kernel Discriminant Analysis works. The application can load Excel worksheets containing tables in the same format as the included sample worksheet.
The first image shows the Wikipedia example for Kernel Principal Component Analysis. The second image shows the analysis carried out with a Gaussian kernel with sigma = 3.6. The third image shows the kernel equivalents of the between-class and within-class scatter matrices. The last image shows the subset formed by each class and its kernel scatter matrix.
The picture on the left shows the projection of the original data set onto the first two discriminant dimensions. Note that a single dimension would already be sufficient to linearly separate the classes. On the right, the picture shows how a point in input space (given by the mouse cursor) gets mapped into feature space.
Linear Discriminant Analysis as a special case
We can also check that a projection equivalent to Linear Discriminant Analysis can be obtained by using a Linear kernel in the Kernel Discriminant Analysis.
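Assuming a Linear kernel class is available (as in the Accord.NET kernel namespace), the check could look like the following sketch, reusing the toy data from the earlier usage example:

// Using a linear kernel (plain dot product, assumed Accord.NET-style class) makes
// the kernel discriminant projection equivalent to an ordinary LDA projection.
var lda = new KernelDiscriminantAnalysis(inputs, outputs, new Linear());
lda.Compute();

double[,] linearProjection = lda.Transform(inputs, 2);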
See also
- Kernel Functions for Machine Learning Applications
- Principal Component Analysis (PCA)
- Kernel Principal Component Analysis (KPCA)
- Linear Discriminant Analysis (LDA)
- Kernel Support Vector Machines (kSVMs)
- Logistic Regression Analysis in C#
References
Sebastian Mika, G. Rätsch, J. Weston, B. Schölkopf, K.-R. Müller; Fisher discriminant analysis with kernels (1999).
Available in: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.35.9904
Yongmin Li, Shaogang Gong and Heather Liddell; Kernel Discriminant Analysis.
Available in: http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/LI1/kda/index.html