Wednesday 10 February 2010

Logistic Regression in C#

Logistic regression (sometimes called the logistic model or logit model) is used to predict the probability of occurrence of an event by fitting data to a logistic curve. Logistic regression is used extensively in the medical and social sciences, as well as in marketing applications such as predicting a customer's propensity to purchase a product or cancel a subscription.





This code has also been incorporated into the Accord.NET Framework, which includes the latest version of this code plus many other statistics and machine learning tools.

Overview


Logistic regression is a generalized linear model used for binomial regression. Like many forms of regression analysis, it makes use of several predictor variables that may be either numerical or categorical.



Logistic Sigmoid Function


The logistic sigmoid function is given by

g(z) = 1 / (1 + Exp(-z))

where, in the context of logistic regression, z is called the logit.
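
As a quick illustration, the sigmoid can be written as a one-line method in C#. This is just a minimal sketch; the Logistic method name below is illustrative and is not part of the library code shown later in this post.

// Minimal sketch: the logistic sigmoid function g(z) = 1 / (1 + exp(-z)).
public static double Logistic(double z)
{
    return 1.0 / (1.0 + System.Math.Exp(-z));
}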



Logistic Regression Model


The logistic regression model is a generalized linear model. This means that it is just a linear model whose output is given as input to a non-linear link function. The linear model has the form

z = c1x1 + c2x2 + … + cnxn + i = c'x + i

where c is the coefficient vector, i is the intercept value and x is the observation vector for n variables; in the context of logistic regression, z is called the logit. The logit is then given as input to the nonlinear logistic sigmoid function g(z), producing a probability as a result.

So, in a binomial problem where we are trying to determine whether an observation belongs to class C1 or class C2, the logistic model tells us that:

p(C1|x) = g(c'x + i)

p(C2|x) = 1 - p(C1|x)

where p(C1|x) denotes the probability of class C1 given the observation x.
In other words, it denotes the probability of x belonging to class C1.
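
To make the model concrete, here is a minimal C# sketch that evaluates p(C1|x) for a single observation. The method and parameter names are illustrative and are not part of the library code presented later.

// Sketch: evaluates p(C1|x) = g(c'x + i) for one observation x, given a
// coefficient vector c and an intercept i. The complementary probability
// p(C2|x) is simply one minus the returned value.
public static double Probability(double[] c, double intercept, double[] x)
{
    double z = intercept;                  // start from the intercept i
    for (int j = 0; j < c.Length; j++)
        z += c[j] * x[j];                  // accumulate c1*x1 + ... + cn*xn

    return 1.0 / (1.0 + System.Math.Exp(-z));   // apply the sigmoid g(z)
}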



Coefficients


The coefficients for the logistic regression are the values that multiply each variable of an input observation vector. The intercept is analogous to the independent term in a polynomial linear regression model, or to the threshold or bias value of a neuron in a neural network model.



Odds Ratio


After computation of the logistic regression model, every coefficient will have an associated measure called the odds ratio. The odds ratio is a measure of effect size, describing the strength of association or non-independence between two binary data values, and can be obtained by raising Euler's number to the coefficient value:

odds ratio_c = e^c
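
In code this is a single call to the exponential function. A minimal sketch, with an illustrative method name:

// Sketch: the odds ratio associated with a fitted coefficient c is e^c.
public static double OddsRatio(double coefficient)
{
    return System.Math.Exp(coefficient);
}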



Standard Error


The standard error for the coefficients can be retrieved from the inverse Hessian matrix calculated during the model fitting phase and can be used to give confidence intervals for the odds ratio. The standard error for the i-th coefficient of the regression can be obtained as:

SE_i = sqrt(diag(H^-1)_i)



Confidence Intervals


The confidence interval around a logistic regression coefficient is plus or minus 1.96 * SE_i, where SE_i is the standard error for coefficient i. We can then define:


95% C.I._i = <lower, upper> = <coefficient_i - 1.96 * SE_i,  coefficient_i + 1.96 * SE_i>
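
A minimal sketch of this computation (names are illustrative). Exponentiating the resulting bounds gives the corresponding confidence interval for the odds ratio.

// Sketch: 95% confidence interval for a fitted coefficient, given its
// standard error taken from the diagonal of the inverse Hessian.
public static void ConfidenceInterval95(double coefficient, double standardError,
    out double lower, out double upper)
{
    lower = coefficient - 1.96 * standardError;
    upper = coefficient + 1.96 * standardError;
}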



Wald Statistic and Wald’s Test


The Wald statistic is the ratio of the logistic coefficient to its standard error. A Wald test is used to test the statistical significance of each coefficient in the model. A Wald test calculates a Z statistic, which is:

z = coefficient / standard error

This z value can be squared, yielding a Wald statistic with a chi-square distribution, or, alternatively, it can be taken as is and compared directly with a Normal distribution.

The Wald test outputs a p-value indicating the significance of individual independent variables. If the value is below a chosen significance threshold (typically 0.05), then the variable plays some role in determining the outcome variable that most likely is not the result of chance alone. However, there are some problems with the use of the Wald statistic, and the likelihood-ratio test is often considered a better alternative to the Wald test.
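
Below is a minimal, self-contained sketch of this computation. Since System.Math offers no Normal CDF, the sketch approximates it with the Abramowitz & Stegun polynomial approximation of erf (formula 7.1.26, absolute error below 1.5e-7); a statistics library would normally provide this function, and the method names here are illustrative.

// Sketch: the Wald z statistic and its two-sided p-value under the
// standard Normal distribution.
public static double WaldPValue(double coefficient, double standardError)
{
    double z = coefficient / standardError;       // the Wald statistic
    return 2.0 * (1.0 - NormalCdf(System.Math.Abs(z)));
}

private static double NormalCdf(double x)
{
    // Phi(x) = 0.5 * (1 + erf(x / sqrt(2))), with erf computed by the
    // Abramowitz & Stegun 7.1.26 polynomial approximation.
    double z = System.Math.Abs(x) / System.Math.Sqrt(2.0);
    double t = 1.0 / (1.0 + 0.3275911 * z);
    double erf = 1.0 - ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
        - 0.284496736) * t + 0.254829592) * t * System.Math.Exp(-z * z);
    return x >= 0 ? 0.5 * (1.0 + erf) : 0.5 * (1.0 - erf);
}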



Likelihood-Ratio and Chi-Square Test


The likelihood-ratio is the ratio of the likelihood of the full model over the likelihood of a simpler nested model. When compared to the null model the likelihood-ratio test gives an overall model performance measure. When compared to nested models, each with one variable omitted, it tests the statistical significance of each coefficient in the model. These significance tests are considered to be more reliable than the Wald significance test.

The likelihood-ratio is a chi-square statistic given by:

D = -2\ln\left( \frac{\text{likelihood for first model}}{\text{likelihood for second model}} \right)

The model with more parameters will always fit at least as well (have an equal or greater log-likelihood). Whether it fits significantly better, and should thus be preferred, can be determined by deriving the probability or p-value of the obtained difference D. In many cases, the probability distribution of the test statistic can be approximated by a chi-square distribution with (df1 − df2) degrees of freedom, where df1 and df2 are the degrees of freedom of models 1 and 2, respectively.
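
As a minimal sketch, the statistic can be computed directly from the two log-likelihoods (names are illustrative):

// Sketch: likelihood-ratio statistic D between a simpler nested model and
// the full model, D = -2 ln(L_nested / L_full) = -2 (ln L_nested - ln L_full).
// D approximately follows a chi-square distribution whose degrees of freedom
// equal the difference in the number of parameters between the two models.
public static double LikelihoodRatio(double logLikelihoodNested, double logLikelihoodFull)
{
    return -2.0 * (logLikelihoodNested - logLikelihoodFull);
}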

 

Regression to a Logistic Model

If we consider the mapping

φ(<x1, x2, …, xn>) = <x1, x2, …, xn, 1>

then the logistic regression model can be rewritten as

p(C1|φ) = g(w'φ) = f(φ, w)

so that w contains all the coefficients and the intercept value in a single weight vector. Doing so allows the logistic regression model to be expressed as a common optimization problem in the form f(φ, w), so that many standard non-linear optimization algorithms can be directly applied in the search for the parameters w that best fit the probability of a class C1 given an input vector φ.
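
A minimal sketch of this augmentation in C# (the Augment name is illustrative):

// Sketch: the mapping phi(x) = <x1, ..., xn, 1>, which appends a constant
// term so the intercept can live inside the weight vector w.
public static double[] Augment(double[] x)
{
    double[] phi = new double[x.Length + 1];
    for (int j = 0; j < x.Length; j++)
        phi[j] = x[j];
    phi[x.Length] = 1.0;    // constant input multiplying the intercept
    return phi;
}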



Likelihood function


The likelihood function for the logistic model is given by:

p(t|w) = \prod_{n=1}^N y_n^{t_n} \{ 1 - y_n \}^{1-t_n}

where t is the vector of observed outcomes (with t_n ∈ {0, 1}) and y_n = p(C1|φ_n) is the model output for the n-th observation. As the log of a product equals the sum of the logs, taking the log of the likelihood function results in the log-likelihood function in the form:

\ln p(t|w) = \sum_{n=1}^N \{ t_n \ln y_n + (1-t_n) \ln (1 - y_n) \}

Furthermore, if we take the negative of the log-likelihood, we will have an error function called the cross-entropy error function:

E(w) = -\ln p(t|w) = - \sum_{n=1}^N \{ t_n \ln y_n + (1-t_n) \ln (1 - y_n) \}

which gives better numerical accuracy and enables us to write the error gradient in the same form as the gradient of the sum-of-squares error function for linear regression models (Bishop, 2006).
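
As a minimal sketch, the cross-entropy error can be computed directly from the model outputs y_n and the observed outcomes t_n (names are illustrative):

// Sketch: cross-entropy error E(w) for N observations, where y[n] is the
// model output p(C1|x_n) and t[n] is the observed 0/1 outcome.
public static double CrossEntropyError(double[] y, double[] t)
{
    double error = 0.0;
    for (int n = 0; n < y.Length; n++)
        error -= t[n] * System.Math.Log(y[n])
               + (1.0 - t[n]) * System.Math.Log(1.0 - y[n]);
    return error;
}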


Another important detail is that the cross-entropy error surface is convex, so it has no local minima. Any local maximum of the likelihood is also the global maximum, so one does not need to worry about getting trapped in a local valley when walking down the error function.



Iterative Reweighted Least-Squares


The method of Iterative Reweighted Least-Squares is commonly used to find the maximum likelihood estimates of a generalized linear model. In most common applications, an iterative Newton-Raphson algorithm is used to calculate the maximum likelihood values of the parameters. This algorithm uses second order information, represented in the form of a Hessian matrix, to guide incremental coefficient changes. This is also the algorithm used in this implementation.
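
Concretely, each Newton-Raphson step updates the weight vector as

w^{(new)} = w^{(old)} - H^{-1} \nabla E(w^{(old)})

where, for the cross-entropy error above, the gradient and Hessian take the simple forms (Bishop, 2006):

\nabla E(w) = \Phi^T (y - t), \qquad H = \Phi^T R \Phi

with R a diagonal matrix of weights R_{nn} = y_n (1 - y_n). These are exactly the gradient, hessian and R quantities computed in the source code below.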



 



Source Code


Below is the main code segment for the logistic regression, performing one pass of the Iterative Reweighted Least-Squares algorithm.




/// <summary>
/// Iterates one pass of the optimization algorithm trying to find
/// the best regression coefficients for the logistic model.
/// </summary>
/// <remarks>
/// An iterative Newton-Raphson algorithm is used to calculate
/// the maximum likelihood values of the parameters. This procedure
/// uses the partial second derivatives of the parameters in the
/// Hessian matrix to guide incremental parameter changes in an effort
/// to maximize the log likelihood value for the likelihood function.
/// </remarks>
/// <returns>
/// The absolute value of the largest parameter change.
/// </returns>
public double Regress(double[][] input, double[] output)
{
    // Regress using Iterative Reweighted Least Squares estimation.

    // Initial definitions and memory allocations
    int N = input.Length;
    int M = this.Coefficients.Length;
    double[,] regression = new double[N, M];
    double[,] hessian = new double[M, M];
    double[] gradient = new double[M];
    double[] errors = new double[N];
    double[] R = new double[N];
    double[] deltas;


    // Compute the regression matrix, errors and diagonal
    for (int i = 0; i < N; i++)
    {
        double y = this.Compute(input[i]);
        double o = output[i];

        // Calculate error vector
        errors[i] = y - o;

        // Calculate R diagonal
        R[i] = y * (1.0 - y);

        // Compute the regression matrix (with a leading
        // column of ones for the intercept term)
        regression[i, 0] = 1;
        for (int j = 1; j < M; j++)
            regression[i, j] = input[i][j - 1];
    }


    // Compute error gradient and "Hessian" matrix (with diagonal R)
    for (int i = 0; i < M; i++)
    {
        // Compute error gradient (regression' * errors)
        for (int j = 0; j < N; j++)
            gradient[i] += regression[j, i] * errors[j];

        // Compute "Hessian" matrix (regression' * R * regression)
        for (int j = 0; j < M; j++)
            for (int k = 0; k < N; k++)
                hessian[j, i] += regression[k, i] * (R[k] * regression[k, j]);
    }


    // Decompose to solve the linear system
    LuDecomposition lu = new LuDecomposition(hessian);
    double[,] inverse;

    if (lu.NonSingular)
    {
        // Solve using LU decomposition
        deltas = lu.Solve(gradient);
        inverse = lu.Inverse();
    }
    else
    {
        // Hessian matrix is singular, try a pseudo-inverse solution
        SingularValueDecomposition svd = new SingularValueDecomposition(hessian);
        deltas = svd.Solve(gradient);
        inverse = svd.Inverse();
    }

    // Update coefficients using the calculated deltas
    for (int i = 0; i < coefficients.Length; i++)
        this.coefficients[i] -= deltas[i];

    // Calculate the coefficients' standard errors from the
    // diagonal of the inverse Hessian matrix
    for (int i = 0; i < standardErrors.Length; i++)
        standardErrors[i] = System.Math.Sqrt(inverse[i, i]);


    // Return the absolute value of the largest parameter change
    return Matrix.Max(Matrix.Abs(deltas));
}


Furthermore, as the likelihood function is convex, the logistic regression analysis can be performed without having to experiment with different starting points. Below is the code to compute a logistic regression analysis. The algorithm iterates until the largest absolute parameter change between two iterations becomes less than a given limit, or the maximum number of iterations is reached.




/// <summary>
/// Computes the Logistic Regression Analysis.
/// </summary>
/// <remarks>The likelihood surface for the
/// logistic regression learning is convex, so there will be only one
/// peak. Any local maxima will be also a global maxima.
/// </remarks>
/// <param name="limit">
/// The difference between two iterations of the regression algorithm
/// when the algorithm should stop. If not specified, the value of
/// 10e-4 will be used. The difference is calculated based on the largest
/// absolute parameter change of the regression.
/// </param>
/// <returns>
/// True if the model converged, false otherwise.
/// </returns>
public bool Compute(double limit, int maxIterations)
{
    double delta;
    int iteration = 0;

    do // learning iterations until convergence
    {
        delta = regression.Regress(inputData, outputData);
        iteration++;

    } while (delta > limit && iteration < maxIterations);


    // Store additional information
    this.deviance = regression.GetDeviance(inputData, outputData);
    this.logLikelihood = regression.GetLogLikelihood(inputData, outputData);
    this.chiSquare = regression.ChiSquare(inputData, outputData);

    for (int i = 0; i < regression.Coefficients.Length; i++)
    {
        this.waldStatistics[i] = regression.GetWaldStatistic(i);
        this.standardErrors[i] = regression.GetStandardError(i);
        this.pvalues[i] = regression.GetPValue(i);
        this.coefficients[i] = regression.Coefficients[i];
        this.confidences[i] = regression.GetConfidenceInterval(i);
        this.oddsRatios[i] = regression.GetOddsRatio(i);
    }

    return delta < limit;
}


 



Using the code


Code usage is very simple. To perform a logistic regression analysis, simply create an object of type LogisticRegressionAnalysis and pass the input and output data as constructor parameters. Optionally, you can pass the variable names as well.




// Creates the Logistic Regression Analysis of the given source
var lra = new LogisticRegressionAnalysis(input, output, independentNames, dependentName);


Then just call Compute() to compute the analysis.




// Compute the Logistic Regression Analysis
lra.Compute();


After that, you can display information about the regression coefficients by binding the CoefficientCollection to a DataGridView.




// Populates coefficient overview with analysis data
dgvLogisticCoefficients.DataSource = lra.Coefficients;


 



Sample application


The sample application creates Logistic Regression Analyses by reading Excel workbooks. It can also perform standard Linear Regressions, although there aren’t many options available to specify linear models.










[Screenshots of the sample application]


Example data set adapted from http://www.statsdirect.co.uk/help/regression_and_correlation/logi.htm.



 








[Screenshots of the logistic regression analysis results]


The image on the left shows information about the analysis performed, such as the coefficient values, standard errors, the p-value for the Wald statistic, the odds ratio, and the 95% confidence intervals. It also reports the log-likelihood, deviance and likelihood-ratio chi-square test for the final model. The newest available version can also compute likelihood-ratio tests for each coefficient, although this is not shown in the image.



 



Final Considerations


The logistic regression model can be seen as exactly equivalent to a single-layer neural network with one sigmoid output neuron, trained by a Newton's method learning algorithm. Thus we can say that logistic regression is just a special case of neural networks. However, one possible (and perhaps unique) advantage of logistic regression is that it allows a simpler interpretation of its results, in the form of odds ratios and statistical hypothesis testing. Statistical analysis using neural networks is also possible, but some may argue it is not as straightforward as with ordinary logistic regression.



 



References


Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.