Advanced Edge Refinement results #4

Open
s-trinh opened this issue Jun 18, 2021 · 8 comments
s-trinh commented Jun 18, 2021

Hi,

I am interested in the "Advanced Edge Refinement" function, since the paper states a median error of 0.017 px.
However, I am not able to get good results using the refineCornerPointsByDirectEdgeOptimization function.

For instance, I am testing the refinement code with 0019.png from the random.zip dataset.

Please check the code I am using, in case there is an obvious error I have made:

  • used image is 0019.png
  • ground truth file is tags.csv
  • code:
C++ code:

#include <ceres/ceres.h>
#include <Eigen/Dense>

#include <opencv2/opencv.hpp>

#include <apriltags/TagFamily.h>
#include <apriltags/Tag36h11.h>
#include <apriltags/TagDetector.h>

#include <algorithm>
#include <array>
#include <cmath>
#include <fstream>
#include <iostream>
#include <limits>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

namespace
{
void read_ground_truth(std::map<std::string, std::map<int, std::vector<cv::Point2d>>>& ground_truth)
{
    const std::string filename = "tags.csv";
    std::ifstream file(filename);
    if (!file.is_open()) {
        throw std::invalid_argument("Cannot open ground truth file tags.csv.");
    }

    std::string line;
    file >> line;

    while (file >> line) {
        std::istringstream ss(line);
        std::string token;

        std::getline(ss, token, ',');
        std::string img = token;

        std::getline(ss, token, ',');
        int tag_id = std::atoi(token.c_str());

        std::getline(ss, token, ',');
        std::getline(ss, token, ',');
        double ground_truth_x = std::atof(token.c_str());

        std::getline(ss, token, ',');
        double ground_truth_y = std::atof(token.c_str());

        ground_truth[img][tag_id].push_back(cv::Point2d(ground_truth_x, ground_truth_y));
    }
}

double point_error(double u1, double v1, double u2, double v2)
{
    double u_err = u1 - u2, v_err = v1 - v2;
    return sqrt(u_err*u_err + v_err*v_err);
}

class EdgeCostFunctor
{
  public:
    EdgeCostFunctor(const cv::Mat& rawImg, const cv::Mat& maskImg)
        : raw_img(rawImg), mask_img(maskImg)
    {
        template_img.create(1, 2, CV_8U);
        template_img.data[0] = 0;
        template_img.data[1] = 255;
    }

    template<typename T>
    T getSubPixel(const cv::Mat& img, cv::Point_<T> pt) const
    {
        const T x = floor(pt.x);
        const T y = floor(pt.y);

        const int x0 = cv::borderInterpolate(x, img.cols, cv::BORDER_REPLICATE);
        const int x1 = cv::borderInterpolate(x + 1, img.cols, cv::BORDER_REPLICATE);
        const int y0 = cv::borderInterpolate(y, img.rows, cv::BORDER_REPLICATE);
        const int y1 = cv::borderInterpolate(y + 1, img.rows, cv::BORDER_REPLICATE);

        const T a = pt.x - T(x);
        const T c = pt.y - T(y);

        return (T(img.at<uint8_t>(y0, x0)) * (T(1.0) - a) + T(img.at<uint8_t>(y0, x1)) * a) * (T(1.0) - c) +
               (T(img.at<uint8_t>(y1, x0)) * (T(1.0) - a) + T(img.at<uint8_t>(y1, x1)) * a) * c;
    }

    template<typename T>
    bool operator()(const T* const edge_normal_ptr, const T* const edge_offset_ptr, T* residual) const
    {
        const Eigen::Map<const Eigen::Matrix<T, 2, 1>> edge_normal(edge_normal_ptr);
        const double& edge_offset = *edge_offset_ptr;

        const Eigen::Hyperplane<T, 2> line(edge_normal, edge_offset);

        int residual_index = 0;
        for (int row = 0; row < raw_img.rows; row++)
        {
            const auto raw_img_row_ptr = raw_img.ptr<uint8_t>(row);
            const auto mask_row_ptr = mask_img.ptr<uint8_t>(row);

            for (int col = 0; col < raw_img.cols; col++)
            {
                const auto eval = mask_row_ptr[col];

                if (eval)
                {
                    const Eigen::Vector2d raw_img_pos(col, row);
                    const double dist_to_line = line.signedDistance(raw_img_pos);

                    const ::cv::Point2d template_pt(dist_to_line + 0.5, 0);

                    const double pred_pixel_value = getSubPixel<double>(template_img, template_pt);
                    const uint8_t current_pixel_values = raw_img_row_ptr[col];
                    residual[residual_index] = pred_pixel_value - current_pixel_values;
                    residual_index++;
                }
            }
        }

        return true;
    }

  private:
    cv::Mat raw_img;
    cv::Mat mask_img;
    cv::Mat template_img;
};

void removeBadTags(std::vector<AprilTags::TagDetection>& tag_detections) noexcept
{
    tag_detections.erase(remove_if(begin(tag_detections),
                                   end(tag_detections),
                                   [](AprilTags::TagDetection const& tag) { return tag.good == false; }),
                         end(tag_detections));
}

void refineCornerPointsByDirectEdgeOptimization(
    cv::Mat& img, std::vector<AprilTags::TagDetection>& tag_detections) noexcept
{
    for (AprilTags::TagDetection& tag : tag_detections)
    {
        std::pair<float, float> x_range(std::numeric_limits<float>::max(), std::numeric_limits<float>::lowest());
        std::pair<float, float> y_range(std::numeric_limits<float>::max(), std::numeric_limits<float>::lowest());
        for (std::pair<float, float>& corner_point : tag.p)
        {
            x_range.first = std::min(x_range.first, corner_point.first);
            x_range.second = std::max(x_range.second, corner_point.first);
            y_range.first = std::min(y_range.first, corner_point.second);
            y_range.second = std::max(y_range.second, corner_point.second);
        }

        x_range.first = std::max(x_range.first - 10, static_cast<float>(0));
        x_range.second = std::min(x_range.second + 10, static_cast<float>(img.cols - 1));
        y_range.first = std::max(y_range.first - 10, static_cast<float>(0));
        y_range.second = std::min(y_range.second + 10, static_cast<float>(img.rows - 1));

        try
        {
            cv::Rect roi(cv::Point(x_range.first, y_range.first), cv::Point(x_range.second, y_range.second));

            cv::Mat cropped_img = img(roi);

            std::array<Eigen::Vector2d, 4> estimated_edge_normals;
            std::array<double, 4> estimated_edge_offsets;
            std::array<cv::Mat, 4> mask_images;

            int line_thickness = 5;

            auto nextCornerIndex = [](const int i) {
                if (i <= 2)
                {
                    return i + 1;
                }
                else
                {
                    return 0;
                }
            };

            for (int i = 0; i < 4; i++)
            {
                const int next_corner_index = nextCornerIndex(i);

                const Eigen::Vector2d roi_offset_vector(roi.x, roi.y);

                const Eigen::Vector2d corner(tag.p[i].first, tag.p[i].second);
                const Eigen::Vector2d next_corner(tag.p[next_corner_index].first, tag.p[next_corner_index].second);

                const Eigen::Hyperplane<double, 2> edge_line =
                    Eigen::Hyperplane<double, 2>::Through(corner - roi_offset_vector, next_corner - roi_offset_vector);

                estimated_edge_normals[i] = edge_line.normal();
                estimated_edge_offsets[i] = edge_line.offset();

                mask_images[i].create(cropped_img.rows, cropped_img.cols, CV_8U);
                mask_images[i].setTo(0);

                const cv::Point2i current_corner_point(std::round(tag.p[i].first - roi.x),
                                                       std::round(tag.p[i].second - roi.y));

                const cv::Point2i next_corner_point(std::round(tag.p[next_corner_index].first - roi.x),
                                                    std::round(tag.p[next_corner_index].second - roi.y));

                cv::line(mask_images[i], current_corner_point, next_corner_point, cv::Scalar(255), line_thickness);

                cv::rectangle(mask_images[i], current_corner_point, current_corner_point, cv::Scalar(0), 10);
                cv::rectangle(mask_images[i], next_corner_point, next_corner_point, cv::Scalar(0), 10);
            }

            ceres::Problem optimization_problem;

            // Set up the cost functions (residuals). This uses numeric
            // differentiation to obtain the derivative (Jacobian).

            ceres::NumericDiffOptions numeric_diff_options;

            auto addEdgeResidualBlocks = [&optimization_problem,
                                          &mask_images,
                                          &cropped_img,
                                          &estimated_edge_normals,
                                          &estimated_edge_offsets,
                                          &numeric_diff_options,
                                          &nextCornerIndex](const int i) {
                const int pixel_count = cv::countNonZero(mask_images[i]);

                ceres::CostFunction* cost_function =
                    new ceres::NumericDiffCostFunction<EdgeCostFunctor, ceres::CENTRAL, ceres::DYNAMIC, 2, 1>(
                        new EdgeCostFunctor(cropped_img, mask_images[i]),
                        ceres::TAKE_OWNERSHIP,
                        pixel_count,
                        numeric_diff_options);
                optimization_problem.AddResidualBlock(
                    cost_function, nullptr, estimated_edge_normals[i].data(), &estimated_edge_offsets[i]);

                optimization_problem.SetParameterization(estimated_edge_normals[i].data(),
                                                         new ceres::HomogeneousVectorParameterization(2));
            };

            addEdgeResidualBlocks(0);
            addEdgeResidualBlocks(1);
            addEdgeResidualBlocks(2);
            addEdgeResidualBlocks(3);

            ceres::Solver::Options solve_options;
            solve_options.linear_solver_type = ceres::DENSE_QR;
            solve_options.max_num_iterations = 100;

            ceres::Solver::Summary summary;
            ceres::Solve(solve_options, &optimization_problem, &summary);

            for (int edge_index = 0; edge_index < 4; edge_index++)
            {
                const int next_edge_index = nextCornerIndex(edge_index);
                const int corner_index = next_edge_index;

                const Eigen::Hyperplane<double, 2> edge_A(estimated_edge_normals[edge_index],
                                                          estimated_edge_offsets[edge_index]);
                const Eigen::Hyperplane<double, 2> edge_B(estimated_edge_normals[next_edge_index],
                                                          estimated_edge_offsets[next_edge_index]);

                const Eigen::Vector2d estimated_corner_pos_roi = edge_A.intersection(edge_B);

                tag.p[corner_index].first = estimated_corner_pos_roi.x() + roi.x;
                tag.p[corner_index].second = estimated_corner_pos_roi.y() + roi.y;
            }
        }
        catch (const std::exception& /*e*/)
        {
            tag.good = false;
        }
    }
    // removeBadTags(tag_detections);
}

struct less_than_key
{
    inline bool operator() (const AprilTags::TagDetection& det1, const AprilTags::TagDetection& det2)
    {
        return (det1.id < det2.id);
    }
};
}

int main(int argc, char** argv)
{
    AprilTags::TagDetector apriltag_cpp_detector(AprilTags::TagCodes(AprilTags::tagCodes36h11));

    cv::Mat img_gray = cv::imread("0019.png", cv::IMREAD_GRAYSCALE);

    std::vector<AprilTags::TagDetection> tag_detections = apriltag_cpp_detector.extractTags(img_gray);
    std::sort(tag_detections.begin(), tag_detections.end(), less_than_key());
    std::cout << "tag_detections: " << tag_detections.size() << std::endl;

    std::vector<AprilTags::TagDetection> tag_detections_refined = tag_detections;
    refineCornerPointsByDirectEdgeOptimization(img_gray, tag_detections_refined);

    std::map<std::string, std::map<int, std::vector<cv::Point2d>>> ground_truth;
    read_ground_truth(ground_truth);
    const std::string imageFilename = "0019.png";
    std::cout << "ground_truth: " << (ground_truth.find(imageFilename) != ground_truth.end())
              << " ; ground_truth: " << ground_truth.size() << std::endl;

    for (size_t i = 0; i < tag_detections.size(); i++) {
        std::cout << "\nTag " << tag_detections[i].id << ":" << std::endl;
        for (size_t j = 0; j < 4; j++) {
            cv::Point2d corner = ground_truth[imageFilename][tag_detections[i].id][j];
            double error_detection = point_error(tag_detections[i].p[j].first, tag_detections[i].p[j].second, corner.x, corner.y);
            std::cout << "detection: " << tag_detections[i].p[j].first << ", " << tag_detections[i].p[j].second << " ; error="
                      << error_detection << std::endl;

            double error_detection_refined = point_error(tag_detections_refined[i].p[j].first, tag_detections_refined[i].p[j].second, corner.x, corner.y);
            std::cout << "detection refined: " << tag_detections_refined[i].p[j].first << ", " << tag_detections_refined[i].p[j].second << " ; error="
                      << error_detection_refined << std::endl;

            std::cout << "ground truth: " << corner.x << ", " << corner.y << std::endl;
        }
    }

    return 0;
}
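For reference, the residual model in EdgeCostFunctor boils down to a simple ramp: the 1x2 template {0, 255} is sampled at x = signedDistance + 0.5 with bilinear interpolation and replicated borders, so the predicted intensity rises linearly from 0 to 255 as the signed distance goes from -0.5 to +0.5 and saturates outside that band. A stdlib-only sketch of that 1-D lookup (no OpenCV; `predicted_intensity` is my name, and the clamping is my assumption of what BORDER_REPLICATE does on a 1x2 image):

```cpp
#include <algorithm>
#include <cmath>

// Predicted intensity for a pixel at signed distance d from the edge,
// mimicking getSubPixel() on the 1x2 template {0, 255} sampled at
// x = d + 0.5 with BORDER_REPLICATE: a linear ramp from 0 to 255 over
// d in [-0.5, 0.5], saturating outside that interval.
double predicted_intensity(double d)
{
    const double x = d + 0.5;                              // sample position in the template
    const int xf = static_cast<int>(std::floor(x));
    const int x0 = std::clamp(xf, 0, 1);                   // replicate left/right border
    const int x1 = std::clamp(xf + 1, 0, 1);
    const double a = x - std::floor(x);                    // interpolation weight in [0, 1)
    const double tmpl[2] = {0.0, 255.0};
    return tmpl[x0] * (1.0 - a) + tmpl[x1] * a;
}
```

So the optimizer is fitting a one-pixel-wide linear edge transition to the masked pixels around each quad side.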

This is mostly copy-pasted code, but comparing the corner coordinates between the refined detections and the ground truth data does not show good results. For instance:

Tag 8:
detection: 2033.96, 1260.67 ; error=0.698578
detection refined: 2033.88, 1260.73 ; error=0.706687
ground truth: 2034.38, 1261.23
detection: 2373.62, 1283.37 ; error=0.522154
detection refined: 2373.38, 1283.32 ; error=0.711717
ground truth: 2373.88, 1283.82
detection: 2367.81, 1005.07 ; error=0.869125
detection refined: 2368.05, 1005.02 ; error=0.708416
ground truth: 2368.55, 1005.53
detection: 2028.88, 1011.25 ; error=0.804983
detection refined: 2028.97, 1011.3 ; error=0.705023
ground truth: 2029.47, 1011.8

Tag 9:
detection: 1665.5, 1236.2 ; error=0.68357
detection refined: 1665.45, 1236.2 ; error=0.707975
ground truth: 1665.95, 1236.71
detection: 1921.41, 1253.23 ; error=0.6572
detection refined: 1921.34, 1253.25 ; error=0.697219
ground truth: 1921.83, 1253.74
detection: 1916.5, 1013.47 ; error=0.697531
detection refined: 1916.57, 1013.37 ; error=0.704689
ground truth: 1917.07, 1013.87
detection: 1660.98, 1017.99 ; error=0.826231
detection refined: 1661.03, 1018.1 ; error=0.712545
ground truth: 1661.54, 1018.6

What did I miss, since it should be mostly copy-pasted code? Is there any issue or subtlety with the ground truth data?
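One thing that stands out in the numbers above: the refined errors all cluster around 0.70 ≈ sqrt(0.5), and the ground truth differs from the refined corners by roughly (+0.5, +0.5) in every case (e.g. 2033.88 vs 2034.38 and 1260.73 vs 1261.23 for the first corner of Tag 8). That pattern looks like a half-pixel coordinate-convention mismatch (pixel-center vs pixel-corner origin) rather than a failure of the optimization itself — though this is only my reading of the numbers, not a confirmed explanation. A quick stdlib check using the Tag 8 values printed above:

```cpp
#include <cmath>

double point_error(double u1, double v1, double u2, double v2)
{
    const double du = u1 - u2, dv = v1 - v2;
    return std::sqrt(du * du + dv * dv);
}

// Tag 8, first corner: refined detection vs ground truth.
// Raw error is ~0.707 px; shifting the detection by (+0.5, +0.5)
// reduces it to essentially zero.
double raw_err = point_error(2033.88, 1260.73, 2034.38, 1261.23);
double shifted_err = point_error(2033.88 + 0.5, 1260.73 + 0.5, 2034.38, 1261.23);
```

If that holds across all corners, applying a constant half-pixel shift before comparing against tags.csv would be worth trying.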

s-trinh commented Jun 23, 2021

Updated the code to compute statistics for the 0019.png image:

#include <ceres/ceres.h>
#include <Eigen/Dense>

#include <opencv2/opencv.hpp>

#include <apriltags/TagFamily.h>
#include <apriltags/Tag36h11.h>
#include <apriltags/TagDetector.h>

#include <algorithm>
#include <array>
#include <cmath>
#include <fstream>
#include <iostream>
#include <limits>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

namespace
{
void read_ground_truth(const std::string& filepath, std::map<std::string, std::map<int, std::vector<cv::Point2d>>>& ground_truth)
{
    std::ifstream file(filepath);
    if (!file.is_open()) {
        throw std::invalid_argument("Cannot open ground truth file tags.csv.");
    }

    std::string line;
    file >> line;

    while (file >> line) {
        std::istringstream ss(line);
        std::string token;

        std::getline(ss, token, ',');
        std::string img = token;

        std::getline(ss, token, ',');
        int tag_id = std::atoi(token.c_str());

        std::getline(ss, token, ',');
        std::getline(ss, token, ',');
        double ground_truth_x = std::atof(token.c_str());

        std::getline(ss, token, ',');
        double ground_truth_y = std::atof(token.c_str());

        ground_truth[img][tag_id].push_back(cv::Point2d(ground_truth_x, ground_truth_y));
    }
}

double point_error(double u1, double v1, double u2, double v2)
{
    double u_err = u1 - u2, v_err = v1 - v2;
    return sqrt(u_err*u_err + v_err*v_err);
}

class EdgeCostFunctor
{
  public:
    EdgeCostFunctor(const cv::Mat& rawImg, const cv::Mat& maskImg)
        : raw_img(rawImg), mask_img(maskImg)
    {
        template_img.create(1, 2, CV_8U);
        template_img.data[0] = 0;
        template_img.data[1] = 255;
    }

    template<typename T>
    T getSubPixel(const cv::Mat& img, cv::Point_<T> pt) const
    {
        const T x = floor(pt.x);
        const T y = floor(pt.y);

        const int x0 = cv::borderInterpolate(x, img.cols, cv::BORDER_REPLICATE);
        const int x1 = cv::borderInterpolate(x + 1, img.cols, cv::BORDER_REPLICATE);
        const int y0 = cv::borderInterpolate(y, img.rows, cv::BORDER_REPLICATE);
        const int y1 = cv::borderInterpolate(y + 1, img.rows, cv::BORDER_REPLICATE);

        const T a = pt.x - T(x);
        const T c = pt.y - T(y);

        return (T(img.at<uint8_t>(y0, x0)) * (T(1.0) - a) + T(img.at<uint8_t>(y0, x1)) * a) * (T(1.0) - c) +
               (T(img.at<uint8_t>(y1, x0)) * (T(1.0) - a) + T(img.at<uint8_t>(y1, x1)) * a) * c;
    }

    template<typename T>
    bool operator()(const T* const edge_normal_ptr, const T* const edge_offset_ptr, T* residual) const
    {
        const Eigen::Map<const Eigen::Matrix<T, 2, 1>> edge_normal(edge_normal_ptr);
        const double& edge_offset = *edge_offset_ptr;

        const Eigen::Hyperplane<T, 2> line(edge_normal, edge_offset);

        int residual_index = 0;
        for (int row = 0; row < raw_img.rows; row++)
        {
            const auto raw_img_row_ptr = raw_img.ptr<uint8_t>(row);
            const auto mask_row_ptr = mask_img.ptr<uint8_t>(row);

            for (int col = 0; col < raw_img.cols; col++)
            {
                const auto eval = mask_row_ptr[col];

                if (eval)
                {
                    const Eigen::Vector2d raw_img_pos(col, row);
                    const double dist_to_line = line.signedDistance(raw_img_pos);

                    const ::cv::Point2d template_pt(dist_to_line + 0.5, 0);

                    const double pred_pixel_value = getSubPixel<double>(template_img, template_pt);
                    const uint8_t current_pixel_values = raw_img_row_ptr[col];
                    residual[residual_index] = pred_pixel_value - current_pixel_values;
                    residual_index++;
                }
            }
        }

        return true;
    }

  private:
    cv::Mat raw_img;
    cv::Mat mask_img;
    cv::Mat template_img;
};

void removeBadTags(std::vector<AprilTags::TagDetection>& tag_detections) noexcept
{
    tag_detections.erase(remove_if(begin(tag_detections),
                                   end(tag_detections),
                                   [](AprilTags::TagDetection const& tag) { return tag.good == false; }),
                         end(tag_detections));
}

void refineCornerPointsByDirectEdgeOptimization(
    cv::Mat& img, std::vector<AprilTags::TagDetection>& tag_detections) noexcept
{
    for (AprilTags::TagDetection& tag : tag_detections)
    {
        std::pair<float, float> x_range(std::numeric_limits<float>::max(), std::numeric_limits<float>::lowest());
        std::pair<float, float> y_range(std::numeric_limits<float>::max(), std::numeric_limits<float>::lowest());
        for (std::pair<float, float>& corner_point : tag.p)
        {
            x_range.first = std::min(x_range.first, corner_point.first);
            x_range.second = std::max(x_range.second, corner_point.first);
            y_range.first = std::min(y_range.first, corner_point.second);
            y_range.second = std::max(y_range.second, corner_point.second);
        }

        x_range.first = std::max(x_range.first - 10, static_cast<float>(0));
        x_range.second = std::min(x_range.second + 10, static_cast<float>(img.cols - 1));
        y_range.first = std::max(y_range.first - 10, static_cast<float>(0));
        y_range.second = std::min(y_range.second + 10, static_cast<float>(img.rows - 1));

        try
        {
            cv::Rect roi(cv::Point(x_range.first, y_range.first), cv::Point(x_range.second, y_range.second));

            cv::Mat cropped_img = img(roi);

            std::array<Eigen::Vector2d, 4> estimated_edge_normals;
            std::array<double, 4> estimated_edge_offsets;
            std::array<cv::Mat, 4> mask_images;

            int line_thickness = 5;

            auto nextCornerIndex = [](const int i) {
                if (i <= 2)
                {
                    return i + 1;
                }
                else
                {
                    return 0;
                }
            };

            for (int i = 0; i < 4; i++)
            {
                const int next_corner_index = nextCornerIndex(i);

                const Eigen::Vector2d roi_offset_vector(roi.x, roi.y);

                const Eigen::Vector2d corner(tag.p[i].first, tag.p[i].second);
                const Eigen::Vector2d next_corner(tag.p[next_corner_index].first, tag.p[next_corner_index].second);

                const Eigen::Hyperplane<double, 2> edge_line =
                    Eigen::Hyperplane<double, 2>::Through(corner - roi_offset_vector, next_corner - roi_offset_vector);

                estimated_edge_normals[i] = edge_line.normal();
                estimated_edge_offsets[i] = edge_line.offset();

                mask_images[i].create(cropped_img.rows, cropped_img.cols, CV_8U);
                mask_images[i].setTo(0);

                const cv::Point2i current_corner_point(std::round(tag.p[i].first - roi.x),
                                                       std::round(tag.p[i].second - roi.y));

                const cv::Point2i next_corner_point(std::round(tag.p[next_corner_index].first - roi.x),
                                                    std::round(tag.p[next_corner_index].second - roi.y));

                cv::line(mask_images[i], current_corner_point, next_corner_point, cv::Scalar(255), line_thickness);

                cv::rectangle(mask_images[i], current_corner_point, current_corner_point, cv::Scalar(0), 10);
                cv::rectangle(mask_images[i], next_corner_point, next_corner_point, cv::Scalar(0), 10);
            }

            ceres::Problem optimization_problem;

            // Set up the cost functions (residuals). This uses numeric
            // differentiation to obtain the derivative (Jacobian).

            ceres::NumericDiffOptions numeric_diff_options;

            auto addEdgeResidualBlocks = [&optimization_problem,
                                          &mask_images,
                                          &cropped_img,
                                          &estimated_edge_normals,
                                          &estimated_edge_offsets,
                                          &numeric_diff_options,
                                          &nextCornerIndex](const int i) {
                const int pixel_count = cv::countNonZero(mask_images[i]);

                ceres::CostFunction* cost_function =
                    new ceres::NumericDiffCostFunction<EdgeCostFunctor, ceres::CENTRAL, ceres::DYNAMIC, 2, 1>(
                        new EdgeCostFunctor(cropped_img, mask_images[i]),
                        ceres::TAKE_OWNERSHIP,
                        pixel_count,
                        numeric_diff_options);
                optimization_problem.AddResidualBlock(
                    cost_function, nullptr, estimated_edge_normals[i].data(), &estimated_edge_offsets[i]);

                optimization_problem.SetParameterization(estimated_edge_normals[i].data(),
                                                         new ceres::HomogeneousVectorParameterization(2));
            };

            addEdgeResidualBlocks(0);
            addEdgeResidualBlocks(1);
            addEdgeResidualBlocks(2);
            addEdgeResidualBlocks(3);

            ceres::Solver::Options solve_options;
            solve_options.linear_solver_type = ceres::DENSE_QR;
            solve_options.max_num_iterations = 100;

            ceres::Solver::Summary summary;
            ceres::Solve(solve_options, &optimization_problem, &summary);

            for (int edge_index = 0; edge_index < 4; edge_index++)
            {
                const int next_edge_index = nextCornerIndex(edge_index);
                const int corner_index = next_edge_index;

                const Eigen::Hyperplane<double, 2> edge_A(estimated_edge_normals[edge_index],
                                                          estimated_edge_offsets[edge_index]);
                const Eigen::Hyperplane<double, 2> edge_B(estimated_edge_normals[next_edge_index],
                                                          estimated_edge_offsets[next_edge_index]);

                const Eigen::Vector2d estimated_corner_pos_roi = edge_A.intersection(edge_B);

                tag.p[corner_index].first = estimated_corner_pos_roi.x() + roi.x;
                tag.p[corner_index].second = estimated_corner_pos_roi.y() + roi.y;
            }
        }
        catch (const std::exception& /*e*/)
        {
            tag.good = false;
        }
    }
    // removeBadTags(tag_detections);
}

struct less_than_key
{
    inline bool operator() (const AprilTags::TagDetection& det1, const AprilTags::TagDetection& det2)
    {
        return (det1.id < det2.id);
    }
};
}

int main(int argc, char** argv)
{
    AprilTags::TagDetector apriltag_cpp_detector(AprilTags::TagCodes(AprilTags::tagCodes36h11));

    cv::Mat img_gray = cv::imread("0019.png", cv::IMREAD_GRAYSCALE);

    std::vector<AprilTags::TagDetection> tag_detections = apriltag_cpp_detector.extractTags(img_gray);
    std::sort(tag_detections.begin(), tag_detections.end(), less_than_key());
    std::cout << "tag_detections: " << tag_detections.size() << std::endl;

    std::vector<AprilTags::TagDetection> tag_detections_refined = tag_detections;
    refineCornerPointsByDirectEdgeOptimization(img_gray, tag_detections_refined);

    std::map<std::string, std::map<int, std::vector<cv::Point2d>>> ground_truth;
    read_ground_truth("tags.csv", ground_truth);
    const std::string imageFilename = "0019.png";
    std::cout << "ground_truth: " << (ground_truth.find(imageFilename) != ground_truth.end())
              << " ; ground_truth: " << ground_truth.size() << std::endl;

    cv::Mat img = cv::imread("0019.png");

    double error_detection_all = 0.0, error_detection_refined_all = 0.0;
    int nb_corners = 0;
    for (size_t i = 0; i < tag_detections.size(); i++) {
        if (tag_detections[i].good) {
            std::cout << "\nTag " << tag_detections[i].id << ":" << std::endl;

            for (size_t j = 0; j < 4; j++, nb_corners++) {
                cv::line(img, cv::Point(tag_detections[i].p[j].first, tag_detections[i].p[j].second),
                         cv::Point(tag_detections[i].p[(j+1) % 4].first, tag_detections[i].p[(j+1) % 4].second),
                         cv::Scalar(0,0,255));
                cv::line(img, cv::Point(tag_detections_refined[i].p[j].first, tag_detections_refined[i].p[j].second),
                         cv::Point(tag_detections_refined[i].p[(j+1) % 4].first, tag_detections_refined[i].p[(j+1) % 4].second),
                         cv::Scalar(0,255,0));

                cv::Point2d corner = ground_truth[imageFilename][tag_detections[i].id][j];
                double error_detection = point_error(tag_detections[i].p[j].first, tag_detections[i].p[j].second, corner.x, corner.y);
                error_detection_all += error_detection;
                std::cout << "detection: " << tag_detections[i].p[j].first << ", " << tag_detections[i].p[j].second << " ; error="
                        << error_detection << std::endl;

                double error_detection_refined = point_error(tag_detections_refined[i].p[j].first, tag_detections_refined[i].p[j].second, corner.x, corner.y);
                error_detection_refined_all += error_detection_refined;
                std::cout << "detection refined: " << tag_detections_refined[i].p[j].first << ", " << tag_detections_refined[i].p[j].second << " ; error="
                        << error_detection_refined << std::endl;

                std::cout << "ground truth: " << corner.x << ", " << corner.y << std::endl;
            }
        }
    }

    std::cout << "\nMean error detection: " << (error_detection_all / nb_corners) << std::endl;
    std::cout << "Mean error detection refined: " << (error_detection_refined_all / nb_corners) << std::endl;

    cv::imshow("Detections", img);
    cv::waitKey();

    return 0;
}

It gives:

Mean error detection: 0.707245
Mean error detection refined: 0.70299
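As a side note, the corner-recovery step at the end of refineCornerPointsByDirectEdgeOptimization intersects adjacent refined edges via Eigen::Hyperplane::intersection. The same computation can be done as a plain 2x2 solve (lines stored as n·x + c = 0, matching Eigen's normal/offset convention), which may help to sanity-check the refined lines independently of Eigen — `Line2D` and `intersect` here are my own names for this sketch:

```cpp
#include <cmath>

struct Line2D { double nx, ny, c; };  // line: nx*x + ny*y + c = 0

// Intersection of two lines by Cramer's rule; assumes they are not parallel
// (det != 0), which holds for adjacent edges of a detected quad.
void intersect(const Line2D& a, const Line2D& b, double& x, double& y)
{
    const double det = a.nx * b.ny - a.ny * b.nx;
    x = (-a.c * b.ny + b.c * a.ny) / det;
    y = (-a.nx * b.c + b.nx * a.c) / det;
}
```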

With my custom image, I am also not getting good results:

C++ code:

#include <ceres/ceres.h>
#include <Eigen/Dense>

#include <opencv2/opencv.hpp>

#include <apriltags/TagFamily.h>
#include <apriltags/Tag36h11.h>
#include <apriltags/TagDetector.h>

#include <algorithm>
#include <array>
#include <cmath>
#include <iostream>
#include <limits>
#include <string>
#include <vector>

namespace
{
class EdgeCostFunctor
{
public:
    EdgeCostFunctor(const cv::Mat& rawImg, const cv::Mat& maskImg)
        : raw_img(rawImg), mask_img(maskImg)
    {
        template_img.create(1, 2, CV_8U);
        template_img.data[0] = 0;
        template_img.data[1] = 255;
    }

    template<typename T>
    T getSubPixel(const cv::Mat& img, cv::Point_<T> pt) const
    {
        const T x = floor(pt.x);
        const T y = floor(pt.y);

        const int x0 = cv::borderInterpolate(x, img.cols, cv::BORDER_REPLICATE);
        const int x1 = cv::borderInterpolate(x + 1, img.cols, cv::BORDER_REPLICATE);
        const int y0 = cv::borderInterpolate(y, img.rows, cv::BORDER_REPLICATE);
        const int y1 = cv::borderInterpolate(y + 1, img.rows, cv::BORDER_REPLICATE);

        const T a = pt.x - T(x);
        const T c = pt.y - T(y);

        return (T(img.at<uint8_t>(y0, x0)) * (T(1.0) - a) + T(img.at<uint8_t>(y0, x1)) * a) * (T(1.0) - c) +
               (T(img.at<uint8_t>(y1, x0)) * (T(1.0) - a) + T(img.at<uint8_t>(y1, x1)) * a) * c;
    }

    template<typename T>
    bool operator()(const T* const edge_normal_ptr, const T* const edge_offset_ptr, T* residual) const
    {
        const Eigen::Map<const Eigen::Matrix<T, 2, 1>> edge_normal(edge_normal_ptr);
        const double& edge_offset = *edge_offset_ptr;

        const Eigen::Hyperplane<T, 2> line(edge_normal, edge_offset);

        int residual_index = 0;
        for (int row = 0; row < raw_img.rows; row++)
        {
            const auto raw_img_row_ptr = raw_img.ptr<uint8_t>(row);
            const auto mask_row_ptr = mask_img.ptr<uint8_t>(row);

            for (int col = 0; col < raw_img.cols; col++)
            {
                const auto eval = mask_row_ptr[col];

                if (eval)
                {
                    const Eigen::Vector2d raw_img_pos(col, row);
                    const double dist_to_line = line.signedDistance(raw_img_pos);

                    const ::cv::Point2d template_pt(dist_to_line + 0.5, 0);

                    const double pred_pixel_value = getSubPixel<double>(template_img, template_pt);
                    const uint8_t current_pixel_values = raw_img_row_ptr[col];
                    residual[residual_index] = pred_pixel_value - current_pixel_values;
                    residual_index++;
                }
            }
        }

        return true;
    }

private:
    cv::Mat raw_img;
    cv::Mat mask_img;
    cv::Mat template_img;
};

void removeBadTags(std::vector<AprilTags::TagDetection>& tag_detections) noexcept
{
    tag_detections.erase(std::remove_if(std::begin(tag_detections),
                                        std::end(tag_detections),
                                        [](const AprilTags::TagDetection& tag) { return !tag.good; }),
                         std::end(tag_detections));
}

void refineCornerPointsByDirectEdgeOptimization(
    cv::Mat& img, std::vector<AprilTags::TagDetection>& tag_detections) noexcept
{
    for (AprilTags::TagDetection& tag : tag_detections)
    {
        std::pair<float, float> x_range(std::numeric_limits<float>::max(), std::numeric_limits<float>::lowest());
        std::pair<float, float> y_range(std::numeric_limits<float>::max(), std::numeric_limits<float>::lowest());
        for (std::pair<float, float>& corner_point : tag.p)
        {
            x_range.first = std::min(x_range.first, corner_point.first);
            x_range.second = std::max(x_range.second, corner_point.first);
            y_range.first = std::min(y_range.first, corner_point.second);
            y_range.second = std::max(y_range.second, corner_point.second);
        }

        x_range.first = std::max(x_range.first - 10, static_cast<float>(0));
        x_range.second = std::min(x_range.second + 10, static_cast<float>(img.cols - 1));
        y_range.first = std::max(y_range.first - 10, static_cast<float>(0));
        y_range.second = std::min(y_range.second + 10, static_cast<float>(img.rows - 1));

        try
        {
            cv::Rect roi(cv::Point(x_range.first, y_range.first), cv::Point(x_range.second, y_range.second));

            cv::Mat cropped_img = img(roi);

            std::array<Eigen::Vector2d, 4> estimated_edge_normals;
            std::array<double, 4> estimated_edge_offsets;
            std::array<cv::Mat, 4> mask_images;

            int line_thickness = 5;

            auto nextCornerIndex = [](const int i) {
                if (i <= 2)
                {
                    return i + 1;
                }
                else
                {
                    return 0;
                }
            };

            for (int i = 0; i < 4; i++)
            {
                const int next_corner_index = nextCornerIndex(i);

                const Eigen::Vector2d roi_offset_vector(roi.x, roi.y);

                const Eigen::Vector2d corner(tag.p[i].first, tag.p[i].second);
                const Eigen::Vector2d next_corner(tag.p[next_corner_index].first, tag.p[next_corner_index].second);

                const Eigen::Hyperplane<double, 2> edge_line =
                    Eigen::Hyperplane<double, 2>::Through(corner - roi_offset_vector, next_corner - roi_offset_vector);

                estimated_edge_normals[i] = edge_line.normal();
                estimated_edge_offsets[i] = edge_line.offset();

                mask_images[i].create(cropped_img.rows, cropped_img.cols, CV_8U);
                mask_images[i].setTo(0);

                const cv::Point2i current_corner_point(std::round(tag.p[i].first - roi.x),
                                                       std::round(tag.p[i].second - roi.y));

                const cv::Point2i next_corner_point(std::round(tag.p[next_corner_index].first - roi.x),
                                                    std::round(tag.p[next_corner_index].second - roi.y));

                cv::line(mask_images[i], current_corner_point, next_corner_point, cv::Scalar(255), line_thickness);

                cv::rectangle(mask_images[i], current_corner_point, current_corner_point, cv::Scalar(0), 10);
                cv::rectangle(mask_images[i], next_corner_point, next_corner_point, cv::Scalar(0), 10);
            }

            ceres::Problem optimization_problem;

            // Set up the cost functions (residuals). Numeric differentiation
            // is used to obtain the derivatives (Jacobians).

            ceres::NumericDiffOptions numeric_diff_options;

            auto addEdgeResidualBlocks = [&optimization_problem,
                                          &mask_images,
                                          &cropped_img,
                                          &estimated_edge_normals,
                                          &estimated_edge_offsets,
                                          &numeric_diff_options,
                                          &nextCornerIndex](const int i) {
                const int pixel_count = cv::countNonZero(mask_images[i]);

                ceres::CostFunction* cost_function =
                    new ceres::NumericDiffCostFunction<EdgeCostFunctor, ceres::CENTRAL, ceres::DYNAMIC, 2, 1>(
                        new EdgeCostFunctor(cropped_img, mask_images[i]),
                        ceres::TAKE_OWNERSHIP,
                        pixel_count,
                        numeric_diff_options);
                optimization_problem.AddResidualBlock(
                    cost_function, nullptr, estimated_edge_normals[i].data(), &estimated_edge_offsets[i]);

                optimization_problem.SetParameterization(estimated_edge_normals[i].data(),
                                                         new ceres::HomogeneousVectorParameterization(2));
            };

            addEdgeResidualBlocks(0);
            addEdgeResidualBlocks(1);
            addEdgeResidualBlocks(2);
            addEdgeResidualBlocks(3);

            ceres::Solver::Options solve_options;
            solve_options.linear_solver_type = ceres::DENSE_QR;
            solve_options.max_num_iterations = 100;

            ceres::Solver::Summary summary;
            ceres::Solve(solve_options, &optimization_problem, &summary);

            for (int edge_index = 0; edge_index < 4; edge_index++)
            {
                const int next_edge_index = nextCornerIndex(edge_index);
                const int corner_index = next_edge_index;

                const Eigen::Hyperplane<double, 2> edge_A(estimated_edge_normals[edge_index],
                                                          estimated_edge_offsets[edge_index]);
                const Eigen::Hyperplane<double, 2> edge_B(estimated_edge_normals[next_edge_index],
                                                          estimated_edge_offsets[next_edge_index]);

                const Eigen::Vector2d estimated_corner_pos_roi = edge_A.intersection(edge_B);

                tag.p[corner_index].first = estimated_corner_pos_roi.x() + roi.x;
                tag.p[corner_index].second = estimated_corner_pos_roi.y() + roi.y;
            }
        }
        catch (const std::exception& e)
        {
            std::cerr << "Exception: " << e.what() << std::endl;
            tag.good = false;
        }
    }
    // removeBadTags(tag_detections);
}

struct less_than_key
{
    inline bool operator() (const AprilTags::TagDetection& det1, const AprilTags::TagDetection& det2)
    {
        return (det1.id < det2.id);
    }
};

void loadGroundTruth(const std::string& path, int idx, cv::Matx31d& rvec, cv::Matx31d& tvec)
{
    char buffer[256];
    snprintf(buffer, sizeof(buffer), path.c_str(), idx);
    std::string filename = buffer;
    std::ifstream file(filename);

    cv::Matx44d T;
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
            file >> T(i,j);
        }
    }

    cv::Matx33d R;
    for (int i = 0; i < 3; i++) {
        for (int j = 0; j < 3; j++) {
            R(i,j) = T(i,j);
        }
        tvec(i) = T(i,3);
    }
    cv::Rodrigues(R, rvec);
}

std::vector<cv::Point3d> buildTagPoints(double size)
{
    std::vector<cv::Point3d> pts;
    double tag_size_2 = (size * 8/10.0) / 2.0;
    pts.push_back(cv::Point3d(-tag_size_2, -tag_size_2, 0));
    pts.push_back(cv::Point3d( tag_size_2, -tag_size_2, 0));
    pts.push_back(cv::Point3d( tag_size_2,  tag_size_2, 0));
    pts.push_back(cv::Point3d(-tag_size_2,  tag_size_2, 0));
    return pts;
}

double pointError(double u1, double v1, double u2, double v2)
{
    double u_err = u1 - u2, v_err = v1 - v2;
    return sqrt(u_err*u_err + v_err*v_err);
}
}

int main(int argc, char** argv)
{
    AprilTags::TagDetector apriltag_cpp_detector(AprilTags::TagCodes(AprilTags::tagCodes36h11));

    std::string imageFilepath = "scene_36h11.png";
    cv::Mat img = cv::imread(imageFilepath);
    cv::Mat img_gray;
    cv::cvtColor(img, img_gray, cv::COLOR_BGR2GRAY);

    std::vector<AprilTags::TagDetection> tag_detections = apriltag_cpp_detector.extractTags(img_gray);
    std::sort(tag_detections.begin(), tag_detections.end(), less_than_key());
    std::cout << "tag_detections: " << tag_detections.size() << std::endl;

    cv::Matx33d cameraMatrix = cv::Matx33d::eye();
    cameraMatrix(0,0) = 2100;
    cameraMatrix(1,1) = 2100;
    cameraMatrix(0,2) = 960;
    cameraMatrix(1,2) = 540;

    std::vector<std::vector<cv::Point3d>> tagsPoints;
    const double scale = 8.0/10.0;
    double tagSize[5] = {0.25, 0.25, 0.25, 0.125, 0.125};
    tagsPoints.push_back(buildTagPoints(0.25));
    tagsPoints.push_back(buildTagPoints(0.25));
    tagsPoints.push_back(buildTagPoints(0.25));
    tagsPoints.push_back(buildTagPoints(0.125));
    tagsPoints.push_back(buildTagPoints(0.125));

    std::vector<cv::Matx31d> rvecs, tvecs;
    std::vector<std::vector<cv::Point2d>> tagsImagePoints;
    for (size_t i = 0; i < 5; i++) {
        cv::Matx31d rvec, tvec;
        loadGroundTruth("cMo%d.txt", i+1, rvec, tvec);
        rvecs.push_back(rvec);
        tvecs.push_back(tvec);

        std::vector<cv::Point2d> imagePoints;
        cv::projectPoints(tagsPoints[i], rvec, tvec, cameraMatrix, cv::noArray(), imagePoints);
        tagsImagePoints.push_back(imagePoints);
    }

    // for (size_t i = 0; i < tag_detections.size(); i++) {
    //     for (size_t j = 0; j < 4; j++) {
    //         cv::line(img, cv::Point(tag_detections[i].p[j].first, tag_detections[i].p[j].second),
    //                  cv::Point(tag_detections[i].p[(j+1) % 4].first, tag_detections[i].p[(j+1) % 4].second),
    //                  cv::Scalar(0, 0, 255));
    //     }
    // }

    std::vector<AprilTags::TagDetection> tag_detections_refined = tag_detections;
    refineCornerPointsByDirectEdgeOptimization(img_gray, tag_detections_refined);

    for (size_t i = 0; i < tag_detections_refined.size(); i++) {
        std::cout << "\nDetection: " << tag_detections[i].id << std::endl;

        for (size_t j = 0; j < 4; j++) {
            std::cout << j << ") " << "pt: " << tag_detections[i].p[j].first << " " << tag_detections[i].p[j].second << std::endl;
            std::cout << j << ") " << "pt refined: " << tag_detections_refined[i].p[j].first << " "
                      << tag_detections_refined[i].p[j].second << std::endl;
            std::cout << j << ") " << "true pt: " << tagsImagePoints[i][j].x << " " << tagsImagePoints[i][j].y << std::endl;

            std::cout << j << ") " << "error: " << pointError(tag_detections[i].p[j].first, tag_detections[i].p[j].second,
                tagsImagePoints[i][j].x, tagsImagePoints[i][j].y) << std::endl;
            std::cout << j << ") " << "error refined: " << pointError(tag_detections_refined[i].p[j].first, tag_detections_refined[i].p[j].second,
                tagsImagePoints[i][j].x, tagsImagePoints[i][j].y) << std::endl;

            cv::line(img, cv::Point(tag_detections_refined[i].p[j].first, tag_detections_refined[i].p[j].second),
                     cv::Point(tag_detections_refined[i].p[(j+1) % 4].first, tag_detections_refined[i].p[(j+1) % 4].second),
                     cv::Scalar(0, 255, 0));
        }
    }

    cv::imshow("Image", img);
    cv::waitKey();
    cv::imwrite("detections_refine_direct_edge.png", img);

    return 0;
}

Ground truth poses:

Results:

Detection: 0
0) pt: 666.195 404.992
0) pt refined: 666.209 405.139
0) true pt: 666.74 405.696
0) error: 0.889887
0) error refined: 0.769782
1) pt: 862.924 345.698
1) pt refined: 863.059 345.799
1) true pt: 863.697 346.321
1) error: 0.992241
1) error refined: 0.824402
2) pt: 769.292 236.181
2) pt refined: 769.296 236.059
2) true pt: 769.869 236.499
2) error: 0.658831
2) error refined: 0.722184
3) pt: 581.085 287.513
3) pt refined: 580.972 287.414
3) true pt: 581.434 287.878
3) error: 0.504522
3) error refined: 0.65411

Detection: 1
0) pt: 1730.19 337.565
0) pt refined: 1730.35 337.659
0) true pt: 1730.99 338.155
0) error: 0.996222
0) error refined: 0.811692
1) pt: 1614.72 232.837
1) pt refined: 1614.67 232.706
1) true pt: 1615.25 233.159
1) error: 0.62197
1) error refined: 0.738124
2) pt: 1477.53 254.158
2) pt refined: 1477.36 254.068
2) true pt: 1477.83 254.572
2) error: 0.512063
2) error refined: 0.68806
3) pt: 1589.71 364.202
3) pt refined: 1589.78 364.345
3) true pt: 1590.29 364.893
3) error: 0.900668
3) error refined: 0.749383

Detection: 2
0) pt: 1176.97 243.39
0) pt refined: 1177.01 247.563
0) true pt: 1177.55 243.931
0) error: 0.793492
0) error refined: 3.6718
1) pt: 1278.75 139.505
1) pt refined: 1282.13 140.906
1) true pt: 1279.31 140.027
1) error: 0.765969
1) error refined: 2.95309
2) pt: 1181.81 42.5991
2) pt refined: 1180.88 39.6576
2) true pt: 1182.37 43.0165
2) error: 0.702135
2) error refined: 3.67418
3) pt: 1083.19 146.575
3) pt refined: 1078.83 146.935
3) true pt: 1083.73 147.031
3) error: 0.708302
3) error refined: 4.89479

Detection: 3
0) pt: 280.855 254.537
0) pt refined: 280.936 254.213
0) true pt: 281.398 254.906
0) error: 0.657437
0) error refined: 0.833526
1) pt: 160.759 296.928
1) pt refined: 160.477 296.745
1) true pt: 161.196 297.353
1) error: 0.609835
1) error refined: 0.942034
2) pt: 218.207 392.315
2) pt refined: 218.095 392.634
2) true pt: 218.754 392.94
2) error: 0.83018
2) error refined: 0.726124
3) pt: 338.896 346.515
3) pt refined: 339.278 346.659
3) true pt: 339.558 347.082
3) error: 0.87223
3) error refined: 0.507625

Detection: 4
0) pt: 1209.37 323.736
0) pt refined: 1209.56 323.19
0) true pt: 1209.92 324.066
0) error: 0.637798
0) error refined: 0.945356
1) pt: 1109.74 389.021
1) pt refined: 1109.32 388.845
1) true pt: 1110.21 389.422
1) error: 0.617286
1) error refined: 1.05806
2) pt: 1123.84 484.098
2) pt refined: 1123.59 484.719
2) true pt: 1124.34 484.785
2) error: 0.848558
2) error refined: 0.74948
3) pt: 1227.88 419.265
3) pt refined: 1228.34 419.385
3) true pt: 1228.56 419.828
3) error: 0.879426
3) error refined: 0.494116

It is even worse with a badly illuminated scene, where the optimization can diverge (look at the topmost tag):

detections_refine_direct_edge


s-trinh commented Jun 23, 2021

Any idea why I could not reproduce the paper results?

I have used the provided code without trying to fine-tune the parameters. The Ceres Solver version is 2.0.0. Is there any constraint, such as a specific tag corner order?

@j-kallwies

Hi!

Thank you for your detailed description of the problem.

Unfortunately, I am extremely busy this week and have no time to look into this.

I will try to find some time and have a closer look at it next week.


s-trinh commented Jul 22, 2021

Any chance you could confirm or disprove my findings?


forkel commented Oct 5, 2021

Hi,

sorry for the delay, I'll jump in here.

The inconsistencies you experience result from different pixel definitions, as described in Section III.A of our paper. Unfortunately, the ground truth and the detection algorithm follow different conventions here.

You need to subtract 0.5 from the x and y positions of the ground truth (e.g., in lines 32 and 35 of your updated code) to evaluate the error correctly. As a result, your first example will give:

Tag 8:
detection: 2033.97, 1260.67 ; error=0.103988
detection refined: 2033.88, 1260.73 ; error=0.00137797
ground truth: 2033.88, 1260.73
detection: 2373.64, 1283.37 ; error=0.267335
detection refined: 2373.38, 1283.32 ; error=0.00471463
ground truth: 2373.38, 1283.32
detection: 2367.78, 1005.08 ; error=0.275723
detection refined: 2368.05, 1005.02 ; error=0.00631705
ground truth: 2368.05, 1005.03
detection: 2028.88, 1011.25 ; error=0.104791
detection refined: 2028.97, 1011.3 ; error=0.00247119
ground truth: 2028.97, 1011.3

Tag 9:
detection: 1665.53, 1236.21 ; error=0.0836221
detection refined: 1665.45, 1236.2 ; error=0.00681968
ground truth: 1665.45, 1236.21
detection: 1921.43, 1253.22 ; error=0.103313
detection refined: 1921.34, 1253.25 ; error=0.00981711
ground truth: 1921.33, 1253.24
detection: 1916.47, 1013.47 ; error=0.143925
detection refined: 1916.57, 1013.37 ; error=0.00253767
ground truth: 1916.57, 1013.37
detection: 1660.95, 1017.99 ; error=0.137362
detection refined: 1661.03, 1018.1 ; error=0.00688903
ground truth: 1661.04, 1018.1

So the refined tag poses are significantly more accurate than the ones returned by AprilTags C++.


s-trinh commented Oct 15, 2021

Thanks for your answer.

I will try to test it on my data when I find some free time.


s-trinh commented Oct 20, 2021

Thanks, subtracting 0.5 did the trick.

But what about my Blender test case? I am using Blender to generate the images. I am retrieving the ground truth camera pose and the camera intrinsic parameters.

How should I compare the tag coordinates from the AprilTag algorithm plus your refinement strategy with the ground truth corner coordinates computed by perspective projection?

What about the pose computation step? Do we have to add an offset to the pixel coordinates before computing the pose (e.g., with solvePnP() in OpenCV)?


ziyou1987912 commented Aug 22, 2022

From your implementation, it seems that you assume a black-and-white image for the refinement method to work well. The refinement may fail completely on dark images. Also, the straight-line edge assumption is not valid for images with large distortion.
