Many real-world datasets are corrupted by noise and outliers. Traditional methods for model fitting, such as least squares, are highly susceptible to these outliers, as even a single outlier can significantly distort the estimated parameters. Techniques like RANSAC help in fitting the model accurately despite large number of outliers with robustness efficiently.

RANSAC, an acronym for RANdom SAmple Consensus, is a powerful iterative algorithm used in computer vision and computational geometry to estimate parameters of a mathematical model from a set of observed data containing outliers. It's a non-deterministic algorithm, meaning it produces a reasonable result only with a certain probability, which increases as more iterations are allowed.

Working of RANSAC:

Instead of using all data points to estimate a model, RANSAC intelligently samples subsets of the data:

Random Sampling: In each iteration, a minimum number of data points required to define the model are randomly selected from the input dataset. This small subset is assumed to be "inliers" (i.e., not outliers).

For example, if we are estimating a line model and we have 10 points available in the dataset, then out of ten, only two are randomly sampled, since two points are enough to estimate a line.
Model Estimation: A candidate model is then estimated using only these randomly selected points.
Consensus Check: All other data points in the original dataset are then tested against this candidate model. Points that fit the model within a predefined tolerance are considered inliers for this specific model.
Inlier Count: The number of inliers for this model is counted.
Iteration and Selection: Steps 1-4 are repeated for a predetermined number of iterations. The model that achieves the highest number of inliers across all iterations is chosen as the best estimate.

The intuition behind this is that if the sampled points are all inliers, the resulting model will be accurate and will likely have a large number of other inliers supporting it. If outliers are sampled, the resulting model will likely be poor and attract few inliers.

Now it’s time to implement this algorithm for the equation of a plane.

Plane Model

First we need to create a constructor to estimate the parameters of a plane from three given points (because we already know that three random unique non-collinear points are enough define a plane).

For that, let’s say the points are $(x_1, y_1, z_1)$, $(x_2, y_2, z_2)$ and $(x_3, y_3, z_3)$, first we will find the difference vector between two pair of points.

$$\begin{align} V_1 &= (x_2 - x_1, \space y_2 - y_1, \space z_2 - z_1) \\ V_2 &= (x_3 - x_1, \space y_3 - y_1, \space z_3 - z_1) \end{align}$$

Then we will have to calculate the normal vector, which is the cross product of these two vectors :

$$(\text{A}, \text{B}, \text{C}) = V_1 \times V_2$$

Then substitute one of the points (e.g., (x1,y1,z1)) and the normal vector (A,B,C) into the general plane equation to solve for D:

$$\text{D} = - ( \text{A} x_1 \space + \space \text{B}y_1 \space + \space \text{C}z_1 )$$

which can be also written as the dot product of normal vector with the first point.

PlaneModel(const Point3d &pt1, const Point3d &pt2, const Point3d &pt3){
            Point3d V1 = pt2 - pt1;
            Point3d V2 = pt3 - pt1;

            Point3d cross_product = V1.cross(V2); 

            if (cross_product.norm() < 1e-9) { 
                a = b = c = d = 0; 
                return; }

            normal = cross_product.normalized();

            a = normal.x();
            b = normal.y();
            c = normal.z();
            d = -normal.dot(pt1);
        }

Now we need to find a way to calculate error which defines how good a fit is against the predefined tolerance.

For a plane, we can take the perpendicular distance of a sampled point from the estimated plane. If the distance is small and within tolerance, then the point is more of an inlier and can be added to the consensus set.

The formula for perpedicular distance of a pt $(x_0, y_0, z_0) $ is :

$$d = \frac{\left| \text{A} x_0 \space + \space \text{B}y_0 \space + \space \text{C}z_0 \space + \space \text{D} \right |}{\sqrt{\text{A}^2 + \text{B}^2 + \text{C}^2}}$$

where the denominator is responsible for normalization. Since we will be using normalized values from the start, the denominator is going to be 1 already, so we will omit it.

double computeDistance(const Point3d &pt) const {
            if ( a*a + b*b + c*c < 1e-18) return 1e10; 
            return std::abs(a * pt.x() + b * pt.y() + c * pt.z() + d); }

(Note: As you can see we have still used the denominator expression to calculate whether it is too less or approaching to zero to avoid error and ensure numerical stability)

We will also slide in a Validity check function for clarity and readability instead of manually checking with condition every time. This checks if the normal is too small of a value (or approaching zero) and declares the model invalid if it is so.

bool isValid() const { return normal.norm() > 1e-9; }

The next step is to implement a RANSAC class. First we will declare the required variables in private, i.e. error threshold, maximum iterations and minimum number of points in consensus set and the total dataset:

One more thing which we have to implement in the private part of the class is a method of fitting the model. This method will take the consensus set as input and account those points to estimate the model parameters.

RANSAC class

Model Fitting Function

Our aim is to estimate the parameters for the equation of the plane best fitting to the current consensus set. For that, we have to undergo the following steps:

Calculate the Centroid (or the mean) of the data.

 Point3d centroid = Point3d::Zero();
 for (const auto& pt : consensus_set) centroid += pt;
 centroid /= consensus_set.size();

Center the Data Points according to the centroid by making a centered data matrix.

 Eigen::MatrixXd centered_data(consensus_set.size(), 3);
 for (size_t i = 0; i < consensus_set.size(); ++i) 
     centered_data.row(i) = consensus_set[i] - centroid;

Perform Singular Value Decomposition (SVD) on the Centered Data Matrix (M).

Yes, our beloved technique has returned to solve Linear equations once again.

We know that the relation between SVD and eigenvalue decomposition is:

The eigenvalues of $\mathbf{A}^\top\mathbf{A}$ are the squares of the singular values of M.
The eigenvectors of $\mathbf{A}^\top\mathbf{A}$ are the right singular vectors (columns of V) of M.

Let's say the SVD yields the following (hypothetical) V matrix on solving the centered data matrix (M):

$$V = \begin{pmatrix} 0.43 & -0.54 & 0.71 \\ 0.28 & 0.84 & 0.46 \\ 0.86 & -0.06 & -0.53 \end{pmatrix}$$

The last column of V (which corresponds to the smallest singular value due to the design of SVD) will also correspond to the eigenvector with minimal variance, that is the normal vector which corresponds to the plane with best fit.

$$(\text{A}, \text{B}, \text{C}) = (0.71, \space 0.46, -0.53)$$

now we can use it to find $\mathbf{D}$ and hence, the equation of the plane.

(Note: The eigenvectors corresponding to the largest singular values represent maximum variance or ‘spread’ and hence also represent the principal component. That is why techniques like SVD are also used instead of PCA since they are more efficient in time and computational complexity and numerically robust)

In case you don’t get what is happening here, you can refer to one of our previous blogs about using SVD for Feature Extraction.

RANSAC Loop

Now that our spices are ready, it’s time to cook the main dish. The algorithm for RANSAC has been discussed before, we just have to follow the instructions as it is. First we will initialize the constructor using a member initialization list in the public section:

 public:
        RANSAC(Vec<Point3d> points, double error_tolerance, int max_iterations, int min_consensus) 
        : data(points), error_tolerance(error_tolerance), max_iterations(max_iterations), 
          min_consensus(min_consensus), rng(std::random_device{}()) {}

Now we have will create the loop. First we need to do the Random Sampling. Instead of shuffling the original data array, we will shuffle by index. So for that we will need to create and index array and fill it using iota function. Then we will shuffle it using the random number generator:

Vec<int> indices(data.size());
std::iota(indices.begin(), indices.end(), 0);
std::shuffle(indices.begin(), indices.end(), rng);

Next step is to find three non-collinear points. Now due to the randomness of shuffle and how data might be provided we might encounter colilnear points again and again, so we will limit the attempts to shuffle to either 10 or 1 less than the size of data vector.

PlaneModel currentModel;
bool found_valid_sample = false;
for (int attempt = 0; attempt < 
    std::min(10, (int)data.size() - 2); attempt++) {
    if (indices.size() < 3) break;

    Point3d p1 = data[indices[attempt]];
    Point3d p2 = data[indices[attempt + 1]];
    Point3d p3 = data[indices[attempt + 2]];

    if (!areCollinear(p1, p2, p3)) {
        currentModel = PlaneModel(p1, p2, p3);
        if (currentModel.isValid()) {
            found_valid_sample = true;
            break;
        }
    }                                          
}

if (!found_valid_sample) continue;

(Note : Since all this is part of a big RANSAC loop running for maximum iterations (a parameter we set before) if we don’t find any set of collinear points for this iteration, we will move on to the next one. The picture will get clearer once you see the overall code)

Now we will get the consensus set using the pre-defined function.

 Vec<Point3d> currentConsensusSet = getConsensusSet(currentModel);

Now here’s the catch. The algorithm asks us to fit the model every time we get a new consensus set. But we know that would cause redundant operations of centring the data again and again. Therefore, we will take the cheap way out and only fit the model once we get the best consensus set.

For early termination in case of failure, we will keep count of the attempts it undergoes without any improvement

if (static_cast<int>(currentConsensusSet.size()) > bestInliersCount) {
    bestInliersCount = currentConsensusSet.size();
    bestConsensusSet = currentConsensusSet;
    attempts_without_improvement = 0;
} else {
    attempts_without_improvement++;
}

// Early termination conditions
if (bestInliersCount >= min_consensus && 
    (attempts_without_improvement > max_attempts_without_improvement)) {
    break; 
    }

//the max_iterations loop will end here

// Final model fitting with best consensus set 
if (!bestConsensusSet.empty() && bestConsensusSet.size() >= 3) {
    PlaneModel finalModel = fitModel(bestConsensusSet);
    if (finalModel.isValid()) {
        std::cout << "RANSAC converged with " << bestInliersCount << " inliers out of " << data.size() << " points." << std::endl;
        return finalModel;
    }
}

Therefore, the complete RANSAC implementation becomes this :

PlaneModel run() {
            if (data.size() < 3) {
                std::cerr << "Insufficient data points for plane fitting." << std::endl;
                return PlaneModel();
            }

            int bestInliersCount = 0;
            Vec<Point3d> bestConsensusSet;
            int attempts_without_improvement = 0;
            const int max_attempts_without_improvement = max_iterations / 4;

            for (int i = 0; i < max_iterations; i++) {
                // Random Sampling with improved strategy
                Vec<int> indices(data.size());
                std::iota(indices.begin(), indices.end(), 0);
                std::shuffle(indices.begin(), indices.end(), rng);

                // Find three non-collinear points
                PlaneModel currentModel;
                bool found_valid_sample = false;

                for (int attempt = 0; attempt < std::min(10, (int)data.size() - 2); attempt++) {
                    if (indices.size() < 3) break;

                    Point3d p1 = data[indices[attempt]];
                    Point3d p2 = data[indices[attempt + 1]];
                    Point3d p3 = data[indices[attempt + 2]];

                    if (!areCollinear(p1, p2, p3)) {
                        currentModel = PlaneModel(p1, p2, p3);
                        if (currentModel.isValid()) {
                            found_valid_sample = true;
                            break;
                        }
                    }
                }

                if (!found_valid_sample) continue;

                // Get consensus set for current model 
                Vec<Point3d> currentConsensusSet = getConsensusSet(currentModel);

                // Only update if we found more inliers 
                if (static_cast<int>(currentConsensusSet.size()) > bestInliersCount) {
                    bestInliersCount = currentConsensusSet.size();
                    bestConsensusSet = currentConsensusSet;
                    attempts_without_improvement = 0;
                } else {
                    attempts_without_improvement++;
                }

                // Early termination conditions
                if (bestInliersCount >= min_consensus && 
                    (attempts_without_improvement > max_attempts_without_improvement)) {
                    break;
                }
            }

            // Final model fitting with best consensus set 
            if (!bestConsensusSet.empty() && bestConsensusSet.size() >= 3) {
                PlaneModel finalModel = fitModel(bestConsensusSet);
                if (finalModel.isValid()) {
                    std::cout << "RANSAC converged with " << bestInliersCount << " inliers out of " << data.size() << " points." << std::endl;
                    return finalModel;
                }
            }

            std::cerr << "RANSAC failed to find a valid consensus set." << std::endl;
            return PlaneModel();
        }

Testing the Implementation

To test the implementation, we will select some random points with the known equation of an arbitrary plane. For robustness, we will also find the plane’s distance from some points which we know beforehand to be near/far from the plane. As a collective metric, we can report the distance from computeDistance function as error for a single point and average it out for all inliers.

double evaluateModel(const PlaneModel& model) const {
            if (!model.isValid()) return 1e10;

            double total_error = 0.0;
            int inlier_count = 0;

            for (const auto& pt : data) {
                double dist = model.computeDistance(pt);
                if (dist < error_tolerance) {
                    total_error += dist;
                    inlier_count++;
                }
            }

            if (inlier_count == 0) return 1e10;
            return total_error / inlier_count; 
        }

This is the main function covering rest of the testing :

int main() {

    Vec<Point3d> points;

    // Inlier points (z = 2x + 0.5y + 1, or 2x + 0.5y - z + 1 = 0)
    points.push_back(Point3d(1.0, 1.0, 3.5)); 
    points.push_back(Point3d(2.0, 1.0, 5.5)); 
    points.push_back(Point3d(1.0, 2.0, 4.0)); 
    points.push_back(Point3d(3.0, 2.0, 8.0)); 
    points.push_back(Point3d(0.0, 0.0, 1.0)); 
    points.push_back(Point3d(2.5, 1.5, 7.25)); 
    points.push_back(Point3d(1.5, 0.5, 4.25));  
    points.push_back(Point3d(0.5, 1.5, 2.75));  

    // Outliers
    points.push_back(Point3d(10.0, 10.0, 10.0)); 
    points.push_back(Point3d(10.0, 20.0, 10.0)); 
    points.push_back(Point3d(5.0, 5.0, 100.0));    
    points.push_back(Point3d(-5.0, -5.0, -5.0));   
    points.push_back(Point3d(50.0, 1.0, 1.0));     
    points.push_back(Point3d(20.0, 20.0, 5.0));    
    points.push_back(Point3d(1.0, 1.0, -50.0));    
    points.push_back(Point3d(-10.0, 10.0, 10.0));

    double tolerance = 0.4; 
    int iterations = 2000;  
    int min_pts_for_consensus = 0.6 * points.size(); 

    RANSAC ransac_solver(points, tolerance, iterations, min_pts_for_consensus);
    PlaneModel best_fitted_plane = ransac_solver.run();

    std::cout << "\n--- RANSAC Results ---" << std::endl;
    if (best_fitted_plane.isValid()) { 
        std::cout << "Best fitted plane equation: "
                  << best_fitted_plane.a << "x + "
                  << best_fitted_plane.b << "y + "
                  << best_fitted_plane.c << "z + "
                  << best_fitted_plane.d << " = 0" << std::endl;
        std::cout << "Normal vector: (" << best_fitted_plane.a << ", " << best_fitted_plane.b << ", " << best_fitted_plane.c << ")" << std::endl;

        // Test multiple points
        Vec<Point3d> test_points = {
            Point3d(1.0, 1.0, 3.5),   
            Point3d(2.0, 1.0, 5.5),   
            Point3d(10.0, 10.0, 10.0) 
        };

        for (const auto& test_pt : test_points) {
            std::cout << "Distance from (" << test_pt.x() << "," << test_pt.y() << "," << test_pt.z() 
                      << ") to fitted plane: " << best_fitted_plane.computeDistance(test_pt) << std::endl;
        }

        // Evaluate model quality
        std::cout << "Average inlier error: " << ransac_solver.evaluateModel(best_fitted_plane) << std::endl;

        // Count inliers
        int inlier_count = 0;
        for (const auto& pt : points) {
            if (best_fitted_plane.computeDistance(pt) < tolerance) {
                inlier_count++;
            }
        }
        std::cout << "Total inliers: " << inlier_count << " out of " << points.size() << " points" << std::endl;

    } else {
        std::cout << "RANSAC failed to find a valid plane model." << std::endl;
    }

    return 0;
}

As you can see the equation of the arbitrarily selected plane here is $2x \space + \space 0.5y \space - \space z \space + \space 1 \space = \space 0 \space $ which has normal vector $(2, 0.5, -1)$ which when normalized becomes $(0.872871, 0.218218, -0.436436)$ and this is what our results show (approximately):

Complete Code File : RANSAC Github

Also Watch : SVD Visualized by Visual Kernel

Guide to Implementing RANSAC in C++ Programming