Thursday, 28 January 2010

Kernel Discriminant Analysis in C#

Kernel (Fisher) discriminant analysis (kernel FDA) is a non-linear generalization of linear discriminant analysis (LDA) using techniques of kernel methods. Using a kernel, the originally linear operations of LDA are done in a reproducing kernel Hilbert space with a non-linear mapping.





This code has also been incorporated into the Accord.NET Framework, which includes the latest version of this code plus many other statistics and machine learning tools.

Analysis Overview


KDA is an extension of LDA to non-linear distributions, just as KPCA is to PCA. For information about LDA, please check the previous post, Linear Discriminant Analysis in C#. The algorithm presented here is a multi-class generalization of the original algorithm by Mika et al. in Fisher discriminant analysis with kernels (1999).



The objective of KDA is to find a transformation maximizing the between-class variance and minimizing the within-class variance. It can be shown that, with Kernels, the original objective function can be expressed as:


 J(\alpha) = \frac{\alpha^T S_B \alpha}{\alpha^T S_W \alpha}

with:


S_B = \sum_{c=1}^C (\mu_c - \bar{x}) (\mu_c - \bar{x})^T

S_W = \sum_{c=1}^C K_c (I - 1_{l_c}) K_c^T



where Kc is the Kernel matrix for class c, μc is the vector of column means of Kc, x̄ is the total mean vector (the column means of the full Kernel matrix K), I is the identity matrix, lc is the number of samples in class c, and 1lc is an lc × lc matrix with all entries equal to 1/lc.



The Kernel Trick


The Kernel trick transforms any algorithm that solely depends on the dot product between two vectors. Wherever a dot product is used, it is replaced with the kernel function.




K(x,y) = \langle \varphi(x), \varphi(y) \rangle




Thus, a linear algorithm can easily be transformed into a non-linear one: the algorithm is effectively carried out in a higher-dimensional space without ever explicitly mapping the input points into that space. For more information and a cool video showing how the kernel trick works, please see the introductory text in the previous post, Kernel Principal Component Analysis in C#.
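As a minimal illustration of this idea, the sketch below defines a small kernel abstraction consistent with the kernel.Kernel(x, y) calls made by the Compute() method later in this post, together with a Gaussian kernel. The interface and class names here are illustrative and may differ from the ones actually used in the framework.

using System;

public interface IKernel
{
    // Computes the kernel function between two input vectors
    double Kernel(double[] x, double[] y);
}

public class GaussianKernel : IKernel
{
    private readonly double sigma;

    public GaussianKernel(double sigma)
    {
        this.sigma = sigma;
    }

    public double Kernel(double[] x, double[] y)
    {
        // k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))
        double norm = 0.0;
        for (int i = 0; i < x.Length; i++)
        {
            double d = x[i] - y[i];
            norm += d * d;
        }
        return Math.Exp(-norm / (2.0 * sigma * sigma));
    }
}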



Adding regularization


As the problem proposed above is ill-posed (we are estimating l-dimensional covariance structures from l samples), the Sw matrix may become singular, and computing the objective function may become problematic. To avoid singularities, we can add a multiple of the identity matrix to the Sw matrix:



S_W := S_W + \lambda I



The λ term adds numerical stability, since for large λ the matrix Sw becomes positive definite. It can also be seen as a regularization term, favoring solutions with small expansion coefficients [Mika et al., 1999].
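As a quick standalone sketch of what this means in code (assuming Sw is stored as a plain double[,] matrix, as in the implementation below), the regularization simply adds λ to each diagonal entry:

// Returns Sw + lambda * I without modifying the original matrix
static double[,] Regularize(double[,] Sw, double lambda)
{
    int n = Sw.GetLength(0);
    double[,] result = (double[,])Sw.Clone();
    for (int i = 0; i < n; i++)
        result[i, i] += lambda;
    return result;
}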


 



Source Code


The code below implements a generalization of the original Kernel Discriminant Analysis algorithm by Mika et al. in Fisher discriminant analysis with kernels (1999).



/// <summary>
/// Computes the Multi-Class Kernel Discriminant Analysis algorithm.
/// </summary>
public override void Compute()
{
    // Get some initial information
    int dimension = Source.GetLength(0); // number of training samples
    double[,] source = Source;
    double total = dimension;

    // Create the Gram (Kernel) Matrix
    double[,] K = new double[dimension, dimension];
    for (int i = 0; i < dimension; i++)
    {
        for (int j = i; j < dimension; j++)
        {
            double s = kernel.Kernel(source.GetRow(i), source.GetRow(j));
            K[i, j] = s; // the Kernel matrix is symmetric
            K[j, i] = s;
        }
    }

    // Compute entire data set measures
    Means = Tools.Mean(K);
    StandardDeviations = Tools.StandardDeviation(K, Means);

    // Initialize the kernel analogues of the scatter matrices
    double[,] Sb = new double[dimension, dimension];
    double[,] Sw = new double[dimension, dimension];

    // For each class
    for (int c = 0; c < Classes.Count; c++)
    {
        // Get the class subset
        double[,] subset = Classes[c].Subset;
        int count = subset.GetLength(0);

        // Create the Kernel matrix for the class and get its column means
        double[,] Kc = new double[dimension, count];
        double[] mean = new double[dimension];
        for (int i = 0; i < dimension; i++)
        {
            double sum = 0.0;
            for (int j = 0; j < count; j++)
            {
                double s = kernel.Kernel(source.GetRow(i), subset.GetRow(j));
                Kc[i, j] = s;
                sum += s;
            }
            mean[i] = sum / count;
        }

        // Construct the Kernel equivalent of the Within-Class Scatter matrix
        double[,] Swi = Kc.Multiply(Matrix.Centering(count)).Multiply(Kc.Transpose());
        Sw = Sw.Add(Swi);

        // Construct the Kernel equivalent of the Between-Class Scatter matrix
        double[] d = mean.Subtract(Means);
        double[,] Sbi = d.Multiply(d.Transpose()).Multiply(total / count);
        Sb = Sb.Add(Sbi);

        // Store additional information
        ClassScatter[c] = Swi;
        ClassCount[c] = count;
        ClassMeans[c] = mean;
        ClassStandardDeviations[c] = Tools.StandardDeviation(Kc, mean);
    }

    // Add regularization to avoid a singular within-class scatter matrix
    double[,] Swu = Sw.Add(Matrix.Diagonal(dimension, regularization));

    // Compute the eigenvalue decomposition of inv(Sw) * Sb
    EigenValueDecomposition evd = new EigenValueDecomposition(Matrix.Inverse(Swu).Multiply(Sb));

    // Get the eigenvalues and corresponding eigenvectors
    double[] evals = evd.RealEigenValues;
    double[,] eigs = evd.EigenVectors;

    // Sort eigenvalues and eigenvectors in descending order
    eigs = Matrix.Sort(evals, eigs, new GeneralComparer(ComparerDirection.Descending, true));

    // Calculate translation bias
    this.DiscriminantBias = eigs.Transpose().Multiply(Means).Multiply(-1.0);

    // Store information
    base.EigenValues = evals;
    base.DiscriminantMatrix = eigs;
    base.ScatterBetweenClass = Sb;
    base.ScatterWithinClass = Sw;

    // Computes additional information about the analysis and creates the
    // object-oriented structure to hold the discriminants found.
    createDiscriminants();
}




After completing the analysis, we may wish to project new data into discriminant space. The following code projects a data matrix into this space using the basis found during the analysis.



/// <summary>Projects a given matrix into discriminant space.</summary>
/// <param name="data">The matrix to be projected.</param>
/// <param name="dimension">The number of discriminant dimensions to use.</param>
public override double[,] Transform(double[,] data, int dimension)
{
    // Get some information
    int rows = data.GetLength(0);
    int N = Source.GetLength(0);

    // Create the Kernel matrix between the new data and the training data
    double[,] K = new double[rows, N];
    for (int i = 0; i < rows; i++)
    {
        for (int j = 0; j < N; j++)
        {
            K[i, j] = kernel.Kernel(Source.GetRow(j), data.GetRow(i));
        }
    }

    // Project into the kernel discriminant space
    double[,] result = new double[rows, dimension];
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < dimension; j++)
            for (int k = 0; k < N; k++)
                result[i, j] += K[i, k] * DiscriminantMatrix[k, j];

    return result;
}
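To tie these pieces together, here is a hypothetical usage sketch. The KernelDiscriminantAnalysis constructor signature and the GaussianKernel class are assumed by analogy with the LDA usage example later in this post and with the kernel sketch shown earlier; the exact names in the framework may differ.

// Toy data set: two classes in two dimensions
double[,] data =
{
    { 0.0, 0.1 },
    { 0.2, 0.0 },
    { 5.0, 5.1 },
    { 5.2, 4.9 }
};
int[] labels = { 0, 0, 1, 1 };

// Create the analysis with a Gaussian kernel (sigma = 3.6);
// the constructor signature is assumed, not guaranteed
var kda = new KernelDiscriminantAnalysis(data, labels, new GaussianKernel(3.6));

// Compute the analysis
kda.Compute();

// Project the data onto the first two discriminant dimensions
double[,] projection = kda.Transform(data, 2);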




Sample Application


The sample application shows how Kernel Discriminant Analysis works. The application can load Excel worksheets containing tables in the same format as the included sample worksheet.












[Figures: Kernel Discriminant Analysis (KDA) sample application screenshots]


The first image shows the Wikipedia example for Kernel Principal Component Analysis. The second image shows the analysis carried out with a Gaussian kernel with sigma = 3.6. The third image shows the Kernel equivalents of the between-class and within-class scatter matrices. The last image shows the subset spanned by each class and its Kernel scatter matrix.









[Figures: Kernel Discriminant Analysis (KDA) projection and feature-space mapping]


The picture on the left shows the projection of the original data set onto the first two discriminant dimensions. Note that just one dimension would already be sufficient to linearly separate the classes. On the right, the picture shows how a point in input space (given by the mouse cursor) gets mapped into feature space.



Linear Discriminant Analysis as a special case








[Figures: Linear Discriminant Analysis obtained as a special case of Kernel Discriminant Analysis]


We can also check that a projection equivalent to Linear Discriminant Analysis can be obtained by using a Linear kernel in the Kernel Discriminant Analysis.
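As a sketch of why this works, a linear kernel is just the ordinary dot product, k(x, y) = <x, y>, so no implicit non-linear mapping takes place and the analysis reduces to ordinary LDA. Reusing the illustrative IKernel interface from earlier:

public class LinearKernel : IKernel
{
    public double Kernel(double[] x, double[] y)
    {
        // Plain dot product: no implicit feature-space mapping
        double sum = 0.0;
        for (int i = 0; i < x.Length; i++)
            sum += x[i] * y[i];
        return sum;
    }
}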





Wednesday, 27 January 2010

On Google Account Security, Trustfulness and Dependence

Today I received a very strange email at my personal gmail address, from gmail itself:




O seu endereço secundário, "my-personal-address"@gmail.com, está associado a:



"xxxxxx"@gmail.com

"yyyyy"@gmail.com

"zzzzzzz"@gmail.com





Para efetuar login, clique no link abaixo.



http://www.google.com/accounts/

Se você clicar no link acima e ele não funcionar, copie e cole o URL em uma nova janela do navegador. Obrigado por usar o Google.



Em caso de dúvidas ou preocupações em relação à sua conta, visite as Perguntas freqüentes (FAQs) do Google no endereço

http://www.google.com/support/accounts/





Esse correio é apenas para envio de mensagens. As respostas para essa mensagem não serão monitoradas nem respondidas.



Which roughly translates to:





Your secondary email, "my-personal-address"@gmail.com, is associated with:



"xxxxxx"@gmail.com

"yyyyy"@gmail.com

"zzzzzzz"@gmail.com





To login, click on the link below.

http://www.google.com/accounts/




However, only the first address is known to me. I remember creating it to test a web project for a university class last year. But for the others, I have to ask: How did those people add MY personal email address as their secondary email?



 



I'm pretty sure my personal account is safe and has not been compromised. I switch (strong) passwords often and always take care when and where I insert my credentials. Could this be an error within GMail itself? I always thought GMail required secondary email addresses to be confirmed before they could be linked with an account.



 



I'm getting worried because I just realized GMail accounts often aren't just email accounts. They are much more. They form the online identity for a lot of people. Through Google services, they hold financial, health, social and content information. They are often the primary identity on sites such as Twitter, Facebook, LinkedIn and many others. Gosh, I know businesses which run on GMail addresses. For some people, losing a Google ID would mean starting life over.



I think Google has acquired a lot of responsibility these days. And I hope they handle it well and wisely, as I (and almost everyone else I know) trust Google with a lot of personal data.



 



On a final note, I strongly suggest that Google start researching ways to identify people other than just asking for passwords. I have to use 2048-bit cryptographic keys with unlocking pass-phrases to access my remote hosts; why should I keep using passwords - even strong passwords travelling over SSL by default - to access my much more important personal data?

Monday, 25 January 2010

Linear Discriminant Analysis in C#

Linear discriminant analysis (LDA) is a method used in statistics and machine learning to find a linear combination of features which best characterizes or separates two or more classes of objects or events. The resulting combination may be used as a linear classifier, or, more commonly, for dimensionality reduction before later classification.





The code presented here is also part of the Accord.NET Framework. The Accord.NET Framework is a framework for developing machine learning, computer vision, computer audition, statistics and math applications. It is based on the already excellent AForge.NET Framework. Please see the starting guide for more details. The latest version of the framework includes the latest version of this code plus many other statistics and machine learning tools.

Motivations


The goals of LDA are somewhat similar to those of PCA. Unlike LDA, however, PCA is an unsupervised technique and as such does not use the label information of the data, effectively ignoring this often useful information. For instance, consider two cigar-like clusters in 2 dimensions, one cigar having y = c and the other y = –c (with c being an arbitrary constant), as the image below suggests:



[Figure: two cigar-shaped clusters]

This example was adapted from the
note on Fisher Linear Discriminant Analysis by Max Welling.



 



The cigars are positioned in parallel and very closely together, such that the variance in the total dataset, ignoring the labels, is in the direction of the cigars. For classification, this would be a terrible projection, because all labels will get mixed and we will destroy any useful information:



[Figures: PCA projection of the cigars data set]


 



A much more useful projection is orthogonal to the cigars, which would perfectly separate the data-cases:



[Figures: LDA projection of the cigars data set]


 



The first row of images was obtained by performing PCA on the original data. The left image is the PCA projection using the first two components. However, since the analysis reports that the first component accounts for 96% of the information, one could infer that all other components may be safely discarded. The result is the image on the right, clearly not a very useful projection.


The second row of images was obtained by performing LDA on the original data. The left image is the LDA projection using the first two dimensions. However, as the analysis says the first dimension accounts for 100% of the information, all other dimensions could be safely discarded. This is indeed true, as the result in the right image shows.



 



Analysis Overview


Linear Discriminant Analysis is closely related to principal component analysis (PCA) and factor analysis in that all three look for linear combinations of variables which best explain the data. LDA explicitly attempts to model the difference between the classes of data. PCA, on the other hand, does not take into account any difference in class, and factor analysis builds the feature combinations based on differences rather than similarities. Linear Discriminant Analysis considers maximizing the following objective:



 J(w) = \frac{w^T S_B w}{w^T S_W w}



where



S_B = \sum_{c=1}^C (\mu_c - \bar{x}) (\mu_c - \bar{x})^T

S_W = \sum_{c=1}^C \sum_{i \in c} (x_i - \mu_c) (x_i - \mu_c)^T


are the Between-Class Scatter Matrix and Within-Class Scatter Matrix, respectively. The optimal solution can be found by computing the eigenvalues of S_W^{-1} S_B and taking the eigenvectors corresponding to the largest eigenvalues to form a new basis for the data. Equivalently, these can be obtained by computing the generalized eigenvalue decomposition of S_B and S_W.
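In other words, the discriminant directions w are the solutions of the generalized eigenvalue problem below, which reduces to an ordinary eigenvalue problem for S_W^{-1} S_B whenever S_W is invertible:

S_B w = \lambda S_W w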



 



Source Code


The source code below shows the main algorithm for Linear Discriminant Analysis.



/// <summary>
/// Computes the Multi-Class Linear Discriminant Analysis algorithm.
/// </summary>
public virtual void Compute()
{
    // Get some initial information
    int rows = source.GetLength(0);      // number of samples
    int dimension = source.GetLength(1); // number of variables
    double total = rows;

    // Compute entire data set measures
    Means = Tools.Mean(source);
    StandardDeviations = Tools.StandardDeviation(source, Means);

    // Initialize the scatter matrices
    this.Sw = new double[dimension, dimension];
    this.Sb = new double[dimension, dimension];

    // For each class
    for (int c = 0; c < Classes.Count; c++)
    {
        // Get the class subset and the number of samples in it
        double[,] subset = Classes[c].Subset;
        int count = subset.GetLength(0);

        // Get the class mean
        double[] mean = Tools.Mean(subset);

        // Continue constructing the Within-Class Scatter Matrix
        double[,] Swi = Tools.Scatter(subset, mean);
        Sw = Sw.Add(Swi);

        // Continue constructing the Between-Class Scatter Matrix
        double[] d = mean.Subtract(Means);
        double[,] Sbi = d.Multiply(d.Transpose()).Multiply(total / count);
        Sb = Sb.Add(Sbi);

        // Store some additional information
        this.classScatter[c] = Swi;
        this.classCount[c] = count;
        this.classMeans[c] = mean;
        this.classStdDevs[c] = Tools.StandardDeviation(subset, mean);
    }

    // Compute the eigenvalue decomposition of inv(Sw) * Sb
    EigenValueDecomposition evd = new EigenValueDecomposition(Matrix.Inverse(Sw).Multiply(Sb));

    // Get the eigenvalues and corresponding eigenvectors
    double[] evals = evd.RealEigenValues;
    double[,] eigs = evd.EigenVectors;

    // Sort eigenvalues and eigenvectors in descending order
    eigs = Matrix.Sort(evals, eigs, new GeneralComparer(ComparerDirection.Descending, true));

    // Calculate translation bias
    this.bias = eigs.Transpose().Multiply(Means).Multiply(-1.0);

    // Store information
    this.EigenValues = evals;
    this.DiscriminantMatrix = eigs;

    // Create the projection of the source data into discriminant space
    this.result = new double[rows, dimension];
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < dimension; j++)
            for (int k = 0; k < dimension; k++)
                result[i, j] += source[i, k] * eigs[k, j];

    // Computes additional information about the analysis and creates the
    // object-oriented structure to hold the discriminants found.
    createDiscriminants();
}


 



/// <summary>Projects a given matrix into discriminant space.</summary>
/// <param name="data">The matrix to be projected.</param>
/// <param name="dimensions">The number of dimensions to consider.</param>
public virtual double[,] Transform(double[,] data, int dimensions)
{
    int rows = data.GetLength(0);
    int cols = data.GetLength(1);

    double[,] r = new double[rows, dimensions];

    // Multiply the data matrix by the selected eigenvectors
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < dimensions; j++)
            for (int k = 0; k < cols; k++)
                r[i, j] += data[i, k] * DiscriminantMatrix[k, j];

    return r;
}


 



Using The Code


Code usage is simple, as it follows the same object model used in the previous PCA example.





// Creates the Linear Discriminant Analysis for the given source data and labels
var lda = new LinearDiscriminantAnalysis(data, labels);

// Compute the Linear Discriminant Analysis
lda.Compute();

// Perform the transformation of the data using two dimensions
double[,] result = lda.Transform(data, 2);


 



Considerations


Although linear discriminants are useful in many situations, they are not suitable in many practical cases. A nice insight is that LDA can be extended to non-linear classification via the kernel trick. Using kernels, the original observations can be effectively mapped into a higher-dimensional non-linear space. Linear classification in this non-linear space is then equivalent to non-linear classification in the original space.



While the code available here already contains code for Kernel Discriminant Analysis, this is something I'll address in the next post. If you have any suggestions or questions, please leave me a comment.



Finally, as usual, I hope someone finds this information useful.



 


