RANSAC is an iterative method to build robust estimates for parameters of a mathematical model from a set of observed data which is known to contain outliers. The RANSAC algorithm is often used in computer vision, e.g., to simultaneously solve the correspondence problem and estimate the fundamental matrix related to a pair of stereo cameras.
This code has also been incorporated in Accord.NET Framework, which includes the latest version of this code plus many other statistics and machine learning tools.Introduction
RANSAC is an abbreviation for "RANdom SAmple Consensus". It is an iterative method to estimate parameters of a mathematical model from a set of observed data which may contains outliers. It is a non-deterministic algorithm in the sense that it produces a reasonable result only with a certain probability, with this probability increasing as more iterations are allowed. The algorithm was first published by Fischler and Bolles in 1981.
The basic assumption is that the data consists of "inliers", i.e., data whose distribution can be explained by some mathematical model, and "outliers" which are data that do not fit the model. Outliers could be considered points which come from noise, erroneous measurements or simply incorrect data. RANSAC also assumes that, given a set of inliers, there exists a procedure which can estimate the parameters of a model that optimally explains or fits this data.
Example: Fitting a simple linear regression
We can use RANSAC to robustly fit a linear regression model using noisy data. Consider the example below, in which we have a cloud of points that seems to belong to a line. These are the inliers of the data. The other points, which can be seem as measurement errors or extreme noise values, are points expected to be considered outliers.
Linear structure contained in noisy data.
RANSAC is able to automatically distinguish the inliers from the outliers through the evaluation of the linear regression model. To do so, it randomly selects subsets from the data and attempts to fit linear regression models using them. The model which best explains most of the data will then be returned as the most probably correct model fit.
The image below shows the result of fitting a linear regression directly (as shown by the red line) and using RANSAC (as shown by the blue line). We can see that the red line represents poorly the data structure because it considers all points in order to fit the regression model. The blue line seems to be a much better representation of the linear relationship hidden inside the overall noisy data.
Hidden linear structure inside noisy data. The red line shows the fitting of a linear regression model directly considering all data points. The blue line shows the same result using RANSAC.
Source code
The code below implements RANSAC using a generic approach. Models are considered to be of the reference type TModel and the type of data manipulated by this model is considered to be of the type TData. This approach allows for the creation of a general purpose RANSAC algorithm which can be used in very different contexts, be it the fitting of linear regression models or the estimation of homography matrices from pair of points in different images.
/// <summary>
/// Computes the model using the RANSAC algorithm.
/// </summary>
public TModel Compute(TData[] points, out int[] inliers)
{
// We are going to find the best model (which fits
// the maximum number of inlier points as possible).
TModel bestModel = null;
int[] bestInliers = null;
int maxInliers = 0;
// For this we are going to search for random samples
// of the original points which contains no outliers.
int count = 0; // Total number of trials performed
double N = maxEvaluations; // Estimative of number of trials needed.
// While the number of trials is less than our estimative,
// and we have not surpassed the maximum number of trials
while (count < N && count < maxEvaluations)
{
int[] idx;
TModel model = null;
int samplings = 0;
// While the number of samples attempted is less
// than the maximum limit of attempts
while (samplings < maxSamplings)
{
// Select at random s datapoints to form a trial model.
idx = Statistics.Tools.Random(points.Length, s);
TData[] sample = points.Submatrix(idx);
// If the sampled points are not in a degenerate configuration,
if (!degenerate(sample))
{
// Fit model using the random selection of points
model = fitting(sample);
break; // Exit the while loop.
}
samplings++; // Increase the samplings counter
}
// Now, evaluate the distances between total points and the model returning the
// indices of the points that are inliers (according to a distance threshold t).
idx = distances(model, points, t);
// Check if the model was the model which highest number of inliers:
if (idx.Length > maxInliers)
{
// Yes, this model has the highest number of inliers.
maxInliers = idx.Length; // Set the new maximum,
bestModel = model; // This is the best model found so far,
bestInliers = idx; // Store the indices of the current inliers.
// Update estimate of N, the number of trials to ensure we pick,
// with probability p, a data set with no outliers.
double pInlier = (double)idx.Length / (double)points.Length;
double pNoOutliers = 1.0 - System.Math.Pow(pInlier, s);
N = System.Math.Log(1.0 - probability) / System.Math.Log(pNoOutliers);
}
count++; // Increase the trial counter.
}
inliers = bestInliers;
return bestModel;
}
Besides the generic parameters, the class utilizes three delegated functions during execution.
- The Fitting function, which should accept a subset of the data and use it to fit a model of the chosen type, which should be returned by the function;
- The Degenerate function, which should check if a subset of the training data is already known to result in a poor model, to avoid unnecessary computations; and
- The Distance function, which should accept a model and a subset of the training data to compute the distance between the model prediction and the expected value for a given point. It should return the indices of the points only whose predicted and expected values are within a given threshold of tolerance apart.
Using the code
In the following example, we will fit a simple linear regression of the form x→y using RANSAC. The first step is to create a RANSAC algorithm passing the generic type parameters of the model to be build, i.e. SimpleLinearRegression and of the data to be fitted, i.e. a double array.
In this case we will be using a double array because the first position will hold the values for the input variable x. The second position will be holding the values for the output variables y. If you are already using .NET 4 it is possible to use the Tuple type instead.
// Create a RANSAC algorithm to fit a simple linear regression
var ransac = new RANSAC<SimpleLinearRegression, double[]>(minSamples);
ransac.Probability = probability;
ransac.Threshold = errorThreshold;
ransac.MaxEvaluations = maxTrials;
After the creation of the RANSAC algorithm, we should set the delegate functions which will tell RANSAC how to fit a model, how to tell if a set of samples is degenerate and how to check for inliers in data.
// Set the RANSAC functions to evaluate and test the model
ransac.Fitting = // Define a fitting function
delegate(double[][] sample)
{
// Retrieve the training data
double[] inputs = sample.GetColumn(0);
double[] outputs = sample.GetColumn(1);
// Build a Simple Linear Regression model
var r = new SimpleLinearRegression();
r.Regress(inputs, outputs);
return r;
};
ransac.Degenerate = // Define a check for degenerate samples
delegate(double[][] sample)
{
// In this case, we will not be performing such checkings.
return false;
};
ransac.Distances = // Define a inlier detector function
delegate(SimpleLinearRegression r, double[][] sample, double threshold)
{
List<int> inliers = new List<int>();
for (int i = 0; i < sample.Length; i++)
{
// Compute error for each point
double input = sample[i][0];
double output = sample[i][1];
double error = r.Compute(input) - output;
// If the squared error is below the given threshold,
// the point is considered to be an inlier.
if (error * error < threshold)
inliers.Add(i);
}
return inliers.ToArray();
};
Finally, all we have to do is call the Compute method passing the data. The best model found will be returned by the function, while the given set of inliers indices for this model will be returned as an out parameter.
// Finally, try to fit the regression model using RANSAC
int[] idx; SimpleLinearRegression rlr = ransac.Compute(data, out idx);
Sample application
The accompanying source application demonstrates the fitting of the simple linear regression model with and without using RANSAC. The application accepts Excel worksheets containing the independent values in the first column and the dependent variables in the second column.
References