Saturday 10 April 2010

Partial Least Squares Analysis and Regression in C#


Partial Least Squares Regression (PLS) is a technique that generalizes and combines
features from principal component analysis and (multivariate) multiple regression.
It has been widely adopted in chemometrics and in the social sciences.




The code presented here is also part of the Accord.NET Framework, a framework for developing machine learning, computer vision, computer audition, statistics and math applications. It is based on the already excellent AForge.NET Framework. Please see the starting guide for more details. The latest version of the framework includes the latest version of this code plus many other statistics and machine learning tools.

Contents



  1. Introduction

  2. Overview

    1. Multivariate Linear Regression in Latent Space

    2. Algorithm 1: NIPALS

    3. Algorithm 2: SIMPLS



  3. Source Code

    1. Class Diagram

    2. Performing PLS using NIPALS

    3. Performing PLS using SIMPLS

    4. Multivariate Linear Regression



  4. Using the code

  5. Sample application

  6. See also

  7. References



Introduction



Partial least squares regression (PLS-regression) is a statistical method that bears
some relation to principal components regression. Its goal is to find a linear
regression model by projecting the predicted variables and the observable variables
to new, latent variable spaces. It was developed in the 1960s by Herman Wold to be
used in econometrics. Today it is most commonly used for regression in the field
of chemometrics.



In statistics, latent variables (as opposed to observable variables) are variables
that are not directly observed but are rather inferred (through a mathematical model)
from other variables that are observed (directly measured). Mathematical models that
aim to explain observed variables in terms of latent variables are called latent
variable models.



A PLS model will try to find the multidimensional direction in the X space
that explains the maximum multidimensional variance direction in the Y space.
PLS-regression is particularly suited when the matrix of predictors has more
variables than observations, and when there is multicollinearity among the X
values. By contrast, standard regression will fail in these cases.



Overview



Multivariate Linear Regression in Latent Space




Multiple Linear Regression is a generalization of simple linear regression for
multiple inputs. In turn, Multivariate Linear Regression is a generalization of
Multiple Linear Regression for multiple outputs.



Multivariate linear regression is a general linear model that can map an input
space of arbitrary dimension into an output space of arbitrary dimension using
only linear relationships. In the context of PLS, it is used to map the latent
variable space for X into the latent variable space for Y.



Those latent variable spaces are spanned by the loadings matrices for X and Y,
commonly denoted P and Q, respectively. To compute those matrices, we can use
two different algorithms: NIPALS and SIMPLS.
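

In matrix form, and following the notation used throughout the code below, the
decompositions sought by both algorithms can be summarized as

$$X = T P^\top + E, \qquad Y = U Q^\top + F, \qquad U \approx T B,$$

where E and F are residual matrices and B is the diagonal matrix of the
coefficients bi, so that predictions are ultimately obtained through
$Y \approx T B Q^\top$.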



Algorithm 1: NIPALS



The following is one of the most common formulations of the NIPALS algorithm.
There are, however, many variations of the algorithm which normalize or do not
normalize certain vectors.




Algorithm:



  • Let X be the mean-centered input matrix,

  • Let Y be the mean-centered output matrix,

  • Let P be the loadings matrix for X, and let pi
    denote the i-th column of P;

  • Let Q be the loadings matrix for Y, and let qi
    denote the i-th column of Q;

  • Let T be the score matrix for X, and ti
    denote the i-th column of T;

  • Let U be the score matrix for Y, and ui
    denote the i-th column of U;

  • Let W be the PLS weight matrix, and let wi
    denote the i-th column of W; and

  • Let B be a diagonal matrix of diagonal coefficients bi



Then:



  1. For each factor i to be calculated:

    1. Initially choose ui as the largest column vector in
      Y (having the largest sum of squares)

    2. While (ti has not converged to a desired precision)

      1. wi ∝ X'ui     (estimate X weights)

      2. ti ∝ Xwi      (estimate X factor scores)

      3. qi ∝ Y'ti     (estimate Y weights)

      4. ui = Yqi      (estimate Y scores)

    3. bi = ti'ui      (compute the prediction coefficient b)

    4. pi = X'ti       (estimate X factor loadings)

    5. X = X – tipi'   (deflate X)

    6. Y = Y – bitiqi' (deflate Y)



For the outputs, the amount of variance explained by each factor can be
computed as bi². For the predictor variables, it can be computed as the sum of
the squared entries of the corresponding P column, i.e. as sum(pi²).
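

In code, these quantities become proportions of explained variance after a
division by the total sum of squares of the mean-centered data. The SIMPLS
implementation below does exactly this; a minimal sketch:

// varX[i] = sum(pi²) and varY[i] = bi², as computed for each factor;
// sumX and sumY are the total sums of squares of the centered X and Y.
for (int i = 0; i < factors; i++)
{
    componentProportionX[i] = varX[i] / sumX;
    componentProportionY[i] = varY[i] / sumY;
}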



Algorithm 2: SIMPLS



In SIMPLS, the components are derived by truly maximizing the covariance criterion.
Because the construction of the weight vectors used by SIMPLS is based on the empirical
variance–covariance matrix of the joint input and output variables, outliers present
in the data will severely impact its performance.







Algorithm:




  • Let X be the mean-centered input matrix,


  • Let Y be the mean-centered output matrix,


  • Let P be the loadings matrix for X, and let pi denote the i-th column of P;


  • Let C be the loadings matrix for Y, and let ci denote the i-th column of C;


  • Let T be the score matrix for X, and ti denote the i-th column of T;


  • Let U be the score matrix for Y, and ui denote the i-th column of U; and


  • Let W be the PLS weight matrix, and wi denote the i-th column of W.



Then:




  1. Create the covariance matrix S = X'Y

  2. For each factor i to be calculated:

    1. Perform SVD on the covariance matrix S and store the first left
      singular vector in wi, and the first right singular vector multiplied
      by the first singular value in ci.

    2. ti ∝ X*wi            (estimate X factor scores)

    3. pi = X'*ti           (estimate X factor loadings)

    4. ci = ci/norm(ti)     (estimate Y weights)

    5. wi = wi/norm(ti)     (estimate X weights)

    6. ui = Y*ci            (estimate Y scores)

    7. vi = pi              (form the basis vector vi)

    8. Make vi orthogonal to the previous loadings V

    9. Make ui orthogonal to the previous scores T

    10. Deflate the covariance matrix S

      1. S = S - vi*(vi'*S)
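

A note on step 1: if the covariance matrix has the singular value decomposition
$S = U \Sigma V^\top$ (where U and V here denote the SVD factors, not the score
matrices above), the dominant left singular vector maximizes the covariance
criterion, and

$$w_i = u_1, \qquad c_i = S^\top w_i = \sigma_1 v_1.$$

This is why the implementation below can obtain both w and c from a single SVD
of S, without ever forming the product S'S.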











Source Code



This section presents the realization of both algorithms in C#. The models have
also been carried to an object-oriented structure and are very suitable for
direct binding into Windows.Forms (or WPF) controls, as the sketch below
illustrates.
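

As a minimal sketch of such binding (the bindable Factors collection property
used below is an assumption for illustration; the actual member exposed by the
framework may differ):

using System.Windows.Forms;

// Compute the analysis first; the predictors and dependent matrices
// are the ones defined in the "Using the code" section below.
var pls = new PartialLeastSquaresAnalysis(predictors, dependent,
    AnalysisMethod.Covariance, PartialLeastSquaresAlgorithm.SIMPLS);
pls.Compute();

// The DataGridView generates its columns automatically from the
// public properties of the items in the bound collection.
var grid = new DataGridView { Dock = DockStyle.Fill };
grid.DataSource = pls.Factors; // hypothetical property name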



Class Diagram






Class diagram for the Partial Least Squares Analysis.



Performing PLS using NIPALS



Source code for computing PLS using the NIPALS algorithm:





/// <summary>
///   Computes PLS parameters using the NIPALS algorithm.
/// </summary>
///
private void nipals(double[,] X, double[,] Y, int factors, double tolerance)
{
    // References:
    //  - Hervé Abdi, http://www.utdallas.edu/~herve/abdi-wireCS-PLS2010.pdf

    // Initialize and prepare the data
    int rows = X.GetLength(0);
    int xcols = X.GetLength(1);
    int ycols = Y.GetLength(1);

    // Initialize storage variables
    double[,] E = (double[,])X.Clone();
    double[,] F = (double[,])Y.Clone();

    double[,] T = new double[rows, factors];
    double[,] U = new double[rows, factors];
    double[,] P = new double[xcols, factors];
    double[,] C = new double[ycols, factors];
    double[,] W = new double[xcols, factors];
    double[] B = new double[factors];

    double[] varX = new double[factors];
    double[] varY = new double[factors];

    // Initialize the algorithm
    bool stop = false;

    #region NIPALS
    for (int factor = 0; factor < factors && !stop; factor++)
    {
        // Select t as the largest column from X.
        double[] t = E.GetColumn(largest(E));

        // Select u as the largest column from Y.
        double[] u = F.GetColumn(largest(F));

        // Will store weights for X and Y
        double[] w = new double[xcols];
        double[] c = new double[ycols];

        double norm_t = Norm.Euclidean(t);

        #region Iteration
        while (norm_t > 1e-14)
        {
            // Store the initial t to check convergence
            double[] t0 = (double[])t.Clone();

            // Step 1. Estimate w (X weights): w ∝ E'*u
            //   (in Abdi's paper, X is referred to as E).

            // 1.1. Compute w = E'*u
            w = new double[xcols];
            for (int j = 0; j < w.Length; j++)
                for (int i = 0; i < u.Length; i++)
                    w[j] += E[i, j] * u[i];

            // 1.2. Normalize w (w = w/norm(w))
            w = w.Divide(Norm.Euclidean(w));

            // Step 2. Estimate t (X factor scores): t ∝ E*w

            // 2.1. Compute t = E*w
            t = new double[rows];
            for (int i = 0; i < t.Length; i++)
                for (int j = 0; j < w.Length; j++)
                    t[i] += E[i, j] * w[j];

            // 2.2. Normalize t (t = t/norm(t))
            t = t.Divide(norm_t = Norm.Euclidean(t));

            // Step 3. Estimate c (Y weights): c ∝ F'*t
            //   (in Abdi's paper, Y is referred to as F).

            // 3.1. Compute c = F'*t
            c = new double[ycols];
            for (int j = 0; j < c.Length; j++)
                for (int i = 0; i < t.Length; i++)
                    c[j] += F[i, j] * t[i];

            // 3.2. Normalize c (c = c/norm(c))
            c = c.Divide(Norm.Euclidean(c));

            // Step 4. Estimate u (Y scores): u = F*c

            // 4.1. Compute u = F*c
            u = new double[rows];
            for (int i = 0; i < u.Length; i++)
                for (int j = 0; j < c.Length; j++)
                    u[i] += F[i, j] * c[j];

            // Recalculate the norm of the difference between
            // the current and the previous t vectors
            norm_t = 0.0;
            for (int i = 0; i < t.Length; i++)
            {
                double d = (t0[i] - t[i]);
                norm_t += d * d;
            }

            norm_t = Math.Sqrt(norm_t);
        }
        #endregion

        // Compute the value of b which is used to
        //   predict Y from t as b = t'u [Abdi, 2010]
        double b = t.InnerProduct(u);

        // Compute factor loadings for X as p = E'*t [Abdi, 2010]
        double[] p = new double[xcols];
        for (int j = 0; j < p.Length; j++)
            for (int i = 0; i < rows; i++)
                p[j] += E[i, j] * t[i];

        // Perform deflation of X and Y
        for (int i = 0; i < t.Length; i++)
        {
            // Deflate X as X = X - t*p'
            for (int j = 0; j < p.Length; j++)
                E[i, j] -= t[i] * p[j];

            // Deflate Y as Y = Y - b*t*c'
            for (int j = 0; j < c.Length; j++)
                F[i, j] -= b * t[i] * c[j];
        }

        // Calculate explained variances
        varY[factor] = b * b;
        varX[factor] = p.InnerProduct(p);

        // Save iteration results
        T.SetColumn(factor, t);
        P.SetColumn(factor, p);
        U.SetColumn(factor, u);
        C.SetColumn(factor, c);
        W.SetColumn(factor, w);
        B[factor] = b;

        // Check the column norms of the residual matrices,
        // stopping early once both are within the tolerance
        double[] norm_x = Norm.Euclidean(E);
        double[] norm_y = Norm.Euclidean(F);

        stop = true;
        for (int i = 0; i < norm_x.Length && stop; i++)
            if (norm_x[i] > tolerance) stop = false;
        for (int i = 0; i < norm_y.Length && stop; i++)
            if (norm_y[i] > tolerance) stop = false;
    }
    #endregion
}




Performing PLS using SIMPLS



Source code for computing PLS using the SIMPLS algorithm:






/// <summary>
///   Computes PLS parameters using the SIMPLS algorithm.
/// </summary>
///
private void simpls(double[,] X, double[,] Y, int factors)
{
    // References:
    //  - Martin Andersson, "A comparison of nine PLS1 algorithms". Journal of
    //    Chemometrics, 2009. Available on: http://onlinelibrary.wiley.com/doi/10.1002/cem.1248/pdf
    //  - Hervé Abdi, http://www.utdallas.edu/~herve/abdi-wireCS-PLS2010.pdf
    //  - Statsoft, http://www.statsoft.com/textbook/partial-least-squares/#SIMPLS
    //  - Sijmen de Jong, "SIMPLS: an alternative approach to partial least squares regression"
    //  - N.M. Faber and J. Ferré, "On the numerical stability of two widely used PLS
    //    algorithms". J. Chemometrics, 22, pp. 101-105, 2008.

    // Initialize and prepare the data
    int rows = X.GetLength(0);
    int xcols = X.GetLength(1);
    int ycols = Y.GetLength(1);

    // Initialize storage variables
    double[,] P = new double[xcols, factors];  // loading matrix P, the loadings for X such that X = TP' + E
    double[,] C = new double[ycols, factors];  // loading matrix C, the loadings for Y such that Y = TC' + F
    double[,] T = new double[rows, factors];   // factor score matrix T
    double[,] U = new double[rows, factors];   // factor score matrix U
    double[,] W = new double[xcols, factors];  // weight matrix W

    double[] varX = new double[factors];
    double[] varY = new double[factors];

    // Orthogonal loadings
    double[,] V = new double[xcols, factors];

    // Create the covariance matrix S = X'Y
    double[,] covariance = X.TransposeAndMultiply(Y);

    #region SIMPLS
    for (int factor = 0; factor < factors; factor++)
    {
        // Step 1. Obtain the dominant eigenvector w of S'S. However, we
        //   can avoid computing the matrix multiplication by using the
        //   singular value decomposition instead, which is also more
        //   stable: the first weight vector w is the first left singular
        //   vector of S = X'Y [Abdi, 2007].

        var svd = new SingularValueDecomposition(covariance,
            computeLeftSingularVectors: true,
            computeRightSingularVectors: false,
            autoTranspose: true);

        double[] w = svd.LeftSingularVectors.GetColumn(0);
        double[] c = covariance.TransposeAndMultiply(w);

        // Step 2. Estimate X factor scores: t ∝ X*w
        //   Similarly to NIPALS, the T factor of SIMPLS
        //   is computed as T = X*W [Statsoft] [Abdi, 2010].

        // 2.1. Estimate t (X factor scores): t = X*w [Abdi, 2010]
        double[] t = new double[rows];
        for (int i = 0; i < t.Length; i++)
            for (int j = 0; j < w.Length; j++)
                t[i] += X[i, j] * w[j];

        // 2.2. Normalize t (X factor scores): t = t/norm(t)
        double norm_t = Norm.Euclidean(t);
        t = t.Divide(norm_t);

        // Step 3. Estimate p (X factor loadings): p = X'*t
        double[] p = new double[xcols];
        for (int i = 0; i < p.Length; i++)
            for (int j = 0; j < t.Length; j++)
                p[i] += X[j, i] * t[j];

        // Step 4. Estimate the X and Y weights. Actually, the weights have
        //   been computed in the first step during the SVD. However, since
        //   the X factor scores have been normalized, we also have to
        //   normalize the weights accordingly: w = w/norm(t), c = c/norm(t)
        w = w.Divide(norm_t);
        c = c.Divide(norm_t);

        // Step 5. Estimate u (Y factor scores): u = Y*c [Abdi, 2010]
        double[] u = new double[rows];
        for (int i = 0; i < u.Length; i++)
            for (int j = 0; j < c.Length; j++)
                u[i] += Y[i, j] * c[j];

        // Step 6. Initialize the orthogonal loadings
        double[] v = (double[])p.Clone();

        // Step 7. Make v orthogonal to the previous loadings, and u
        //   orthogonal to the previous scores, using a modified
        //   Gram-Schmidt process:
        //   http://en.wikipedia.org/wiki/Gram%E2%80%93Schmidt_process

        if (factor > 0)
        {
            // 7.1. MGS for v [Martin Andersson, 2009]
            for (int j = 0; j < factor; j++)
            {
                double proj = 0.0;
                for (int k = 0; k < v.Length; k++)
                    proj += v[k] * V[k, j];

                for (int k = 0; k < v.Length; k++)
                    v[k] -= proj * V[k, j];
            }

            // 7.2. MGS for u [Martin Andersson, 2009]
            for (int j = 0; j < factor; j++)
            {
                double proj = 0.0;
                for (int k = 0; k < u.Length; k++)
                    proj += u[k] * T[k, j];

                for (int k = 0; k < u.Length; k++)
                    u[k] -= proj * T[k, j];
            }
        }

        // 7.3. Normalize the orthogonal loadings
        v = v.Divide(Norm.Euclidean(v));

        // Step 8. Deflate the covariance matrix as S = S - v*(v'*S),
        //   as shown in simpls1 in the [Martin Andersson, 2009] appendix.
        double[,] cov = (double[,])covariance.Clone();
        for (int i = 0; i < v.Length; i++)
        {
            for (int j = 0; j < v.Length; j++)
            {
                double d = v[i] * v[j];

                for (int k = 0; k < ycols; k++)
                    cov[i, k] -= d * covariance[j, k];
            }
        }
        covariance = cov;

        // Save iteration
        W.SetColumn(factor, w);
        U.SetColumn(factor, u);
        C.SetColumn(factor, c);
        T.SetColumn(factor, t);
        P.SetColumn(factor, p);
        V.SetColumn(factor, v);

        // Compute explained variance
        varX[factor] = p.InnerProduct(p);
        varY[factor] = c.InnerProduct(c);
    }
    #endregion

    // Set class variables
    this.scoresX = T;    // factor score matrix T
    this.scoresY = U;    // factor score matrix U
    this.loadingsX = P;  // loading matrix P, the loadings for X such that X = TP' + E
    this.loadingsY = C;  // loading matrix C, the loadings for Y such that Y = TC' + F
    this.weights = W;    // the columns of W are the weight vectors
    this.coeffbase = W;

    // Calculate the variance explained proportions
    this.componentProportionX = new double[factors];
    this.componentProportionY = new double[factors];

    double sumX = 0.0, sumY = 0.0;
    for (int i = 0; i < rows; i++)
    {
        // Sum of squares for matrix X
        for (int j = 0; j < xcols; j++)
            sumX += X[i, j] * X[i, j];

        // Sum of squares for matrix Y
        for (int j = 0; j < ycols; j++)
            sumY += Y[i, j] * Y[i, j];
    }

    // Calculate variance proportions
    for (int i = 0; i < factors; i++)
    {
        componentProportionY[i] = varY[i] / sumY;
        componentProportionX[i] = varX[i] / sumX;
    }
}



Multivariate Linear Regression



Multivariate Linear Regression is computed in a similar manner to Multiple Linear
Regression. The only difference is that, instead of having a weight vector and an
intercept, we have a weight matrix and an intercept vector.




/// <summary>
///   Computes the Multivariate Linear Regression for an input vector.
/// </summary>
/// <param name="input">The input vector.</param>
/// <returns>The calculated output.</returns>
public double[] Compute(double[] input)
{
    int N = input.Length;
    int M = coefficients.GetLength(1);

    double[] result = new double[M];
    for (int i = 0; i < M; i++)
    {
        result[i] = intercepts[i];

        for (int j = 0; j < N; j++)
            result[i] += input[j] * coefficients[j, i];
    }

    return result;
}
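

A short usage sketch for the method above, constructing a small model by hand
with the MultivariateLinearRegression(B, A, true) constructor that appears later
in CreateRegression (the matrices here are arbitrary illustration values):

// Two inputs mapped to three outputs: B is 2x3 and A holds 3 intercepts.
double[,] B =
{
    { 0.5, 1.0, -0.2 },
    { 2.0, 0.0,  0.7 },
};
double[] A = { 1.0, -1.0, 0.0 };

var regression = new MultivariateLinearRegression(B, A, true);

// y[i] = A[i] + sum over j of x[j] * B[j, i]
double[] y = regression.Compute(new double[] { 1.0, 2.0 });
// y = { 5.5, 0.0, 1.2 }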



The weight matrix and the intercept vector are computed in the PartialLeastSquaresAnalysis
class by the CreateRegression method. If the analyzed data was already
mean-centered before being fed to the analysis, the constructed intercept vector
will consist only of zeros.




/// <summary>
///   Creates a Multivariate Linear Regression model using
///   coefficients obtained by the Partial Least Squares.
/// </summary>
public MultivariateLinearRegression CreateRegression(int factors)
{
    int xcols = sourceX.GetLength(1);
    int ycols = sourceY.GetLength(1);

    // Compute regression coefficients B of Y on X as B = R*Q'
    double[,] B = new double[xcols, ycols];
    for (int i = 0; i < xcols; i++)
        for (int j = 0; j < ycols; j++)
            for (int k = 0; k < factors; k++)
                B[i, j] += coeffbase[i, k] * loadingsY[j, k];

    // Divide by the standard deviations if X has been normalized
    if (analysisMethod == AnalysisMethod.Correlation)
        for (int i = 0; i < xcols; i++)
            for (int j = 0; j < ycols; j++)
                B[i, j] = B[i, j] / stdDevX[i];

    // Compute the regression intercepts A as A = meanY - meanX'*B
    double[] A = new double[ycols];
    for (int i = 0; i < ycols; i++)
    {
        double sum = 0.0;
        for (int j = 0; j < xcols; j++)
            sum += meanX[j] * B[j, i];
        A[i] = meanY[i] - sum;
    }

    return new MultivariateLinearRegression(B, A, true);
}





Using the code



As an example, let's consider the example data from Hervé Abdi, where the goal
is to predict the subjective evaluation of a set of 5 wines. The dependent
variables that we want to predict for each wine are its likeability, and how
well it goes with meat or dessert (as rated by a panel of experts). The
predictors are the price, sugar, alcohol, and acidity content of each wine.



double[,] dependent =
{
    // Wine | Hedonic | Goes with meat | Goes with dessert
    { 14, 7, 8 },
    { 10, 7, 6 },
    {  8, 5, 5 },
    {  2, 4, 7 },
    {  6, 2, 4 },
};

double[,] predictors =
{
    // Wine | Price | Sugar | Alcohol | Acidity
    {  7, 7, 13, 7 },
    {  4, 3, 14, 7 },
    { 10, 5, 12, 5 },
    { 16, 7, 11, 3 },
    { 13, 3, 10, 3 },
};


Next, we proceed to create the Partial Least Squares Analysis using the Covariance
method (the data will only be mean-centered, but not normalized) and using the
SIMPLS algorithm.



// Create the analysis using the covariance
// method and the SIMPLS algorithm
PartialLeastSquaresAnalysis pls = new PartialLeastSquaresAnalysis(
    predictors, dependent,
    AnalysisMethod.Covariance, PartialLeastSquaresAlgorithm.SIMPLS);

// Compute the analysis
pls.Compute();

After the analysis has been computed, we can proceed and create the regression model.
// Create the Multivariate Linear Regression model
MultivariateLinearRegression regression = pls.CreateRegression();

// Compute the regression outputs for the predictor variables
double[][] aY = regression.Compute(predictors.ToArray());


Now that the regression has been computed, we can check how well it performs.
The coefficient of determination r² for the variables Hedonic, Goes with Meat
and Goes with Dessert can be computed by the CoefficientOfDetermination method
of the MultivariateLinearRegression class and will be, respectively, 0.9999,
0.9999 and 0.8750 - the closer to one, the better.
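

In code, this check could look like the sketch below, assuming the
CoefficientOfDetermination method accepts the input and the expected output
arrays (the exact signature may differ):

double[] r2 = regression.CoefficientOfDetermination(
    predictors.ToArray(), dependent.ToArray());

// r2[0] ≈ 0.9999  (Hedonic)
// r2[1] ≈ 0.9999  (Goes with meat)
// r2[2] ≈ 0.8750  (Goes with dessert)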






Sample application



The accompanying sample application performs Partial Least Squares Analysis and
Regression in Excel worksheets. The predictors and dependent variables can be selected
once the data has been loaded in the application.













Left: Wine example from Hervé Abdi. Right: Variance explained by PLS using the
SIMPLS algorithm.
















Left: Partial Least Squares Analysis results and regression coefficients for the
full regression model. Right: projection of the dependent and predictor variables
using the first three factors.









Results from the Multivariate Linear Regression performed in Latent Space using
three factors from PLS.
















Left: Example data from Geladi and Kowalski. Right: Analysis results for the
data using NIPALS. We can see that just two factors are enough to explain the
whole variance of the set.
















Left: Projections to the latent spaces. Right: Loadings and coefficients for the
two factor Multivariate Linear Regression model.






Results from the Partial Least Squares Regression showing a perfect fit using only
the first two factors.






See also







References


  • Abdi, H. "Partial least squares regression and projection on latent structure regression (PLS Regression)". Wiley Interdisciplinary Reviews: Computational Statistics, 2010. Available on: http://www.utdallas.edu/~herve/abdi-wireCS-PLS2010.pdf

  • Andersson, M. "A comparison of nine PLS1 algorithms". Journal of Chemometrics, 2009. Available on: http://onlinelibrary.wiley.com/doi/10.1002/cem.1248/pdf

  • de Jong, S. "SIMPLS: an alternative approach to partial least squares regression". Chemometrics and Intelligent Laboratory Systems, 1993.

  • Faber, N.M., and Ferré, J. "On the numerical stability of two widely used PLS algorithms". Journal of Chemometrics, 22, pp. 101-105, 2008.

  • Geladi, P., and Kowalski, B. "Partial least-squares regression: a tutorial". Analytica Chimica Acta, 1986.

  • StatSoft. "Partial Least Squares (PLS)". Available on: http://www.statsoft.com/textbook/partial-least-squares/#SIMPLS