How we measure quality in 2D-3D Linking

What is a Dataset?

A dataset has two components:

  1. LiDAR data streams along with camera streams

  2. Annotations representing the objects present in the LiDAR data and the camera data.

For the scope of this article, we define dataset quality based only on the quality of the ground truth Sensor Fusion annotations in a particular dataset.

What Does the Quality of an Annotation Mean?

A dataset may have millions of objects that need to be annotated and tracked along a sequence. An annotation has the following key parameters (a minimal data-structure sketch follows the list).

  1. Object Type or Class: The annotation should correctly classify the object it represents. e.g. A car shouldn't be annotated as a truck.

  2. Annotation Position and Dimensions: The annotation should be as tight as possible while ensuring no portion of the object is outside the annotation. This is evaluated on six parameters: length, width, height, roll, pitch, and yaw. Getting this right is one of the most time-consuming aspects of data labeling. Usually, a buffer zone of a couple of pixels is allowed while verifying the dimensions.

  3. Object Attributes: The annotation should include correct values for attributes as per the project requirements. e.g. if every car also needs to carry information about its direction of travel (same or opposite), the annotation for every car should have the correct value for direction.

  4. Object Linking: If an object is present in multiple sensors, the annotations in all the sensors should be correctly linked.

  5. Object Tracking Along a Sequence (if applicable): If an object is present in multiple frames of a sequence, all the annotations representing that object should share a common tracking ID. If an object is represented by two different tracking IDs, this is termed an ID-switching error.
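To make these parameters concrete, here is a minimal sketch of how a single 3D annotation could be represented. The field names are illustrative only, not Playment's actual schema.

```python
# Minimal sketch (hypothetical field names) of the parameters a single
# sensor-fusion annotation carries, following the five points above.
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CuboidAnnotation:
    object_class: str        # 1. object type, e.g. "car", "truck"
    # 2. position and dimensions of the 3D cuboid
    x: float
    y: float
    z: float
    length: float
    width: float
    height: float
    roll: float
    pitch: float
    yaw: float
    # 3. project-specific attributes, e.g. {"direction": "same"}
    attributes: Dict[str, str] = field(default_factory=dict)
    # 4. IDs of linked annotations in other sensors (e.g. 2D boxes in cameras)
    linked_annotation_ids: List[str] = field(default_factory=list)
    # 5. tracking ID shared by this object across all frames of a sequence
    tracking_id: Optional[str] = None
```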

How to Measure the Quality of a Dataset Using Quantifiable Metrics?

The quality of a ground truth dataset is good if most of the annotations in it are true positives and there are very few false positives and false negatives. To measure quality with quantifiable metrics, we experimented with a few different universally accepted methods. We have found that many of our clients view false positive (FP) and false negative (FN) errors differently, with different costs attached to them. Hence standard and simple "precision" and "recall" (computed separately for 2D and 3D annotations) give a good representation of the quality of the dataset.

3D-Metrics

True Positive: When all the parameters of an annotation in the LiDAR data are correct, that annotation is called a true positive. This means the object has been correctly annotated.

False Positive: When at least one of the parameters of an annotation in the LiDAR data is incorrect, that annotation is called a false positive.

False Negative: If an object can be identified by human eyes but doesn't have an annotation in the LiDAR data, it is counted as a false negative.

$$Precision = \frac{\sum_{t} TP_t}{\sum_{t} (TP_t + FP_t)}$$

$$Recall = \frac{\sum_{t} TP_t}{\sum_{t} (TP_t + FN_t)}$$
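As a rough illustration of how these sums work in practice, the sketch below aggregates per-frame TP/FP/FN counts from a QC sample. The same function applies unchanged to the 2D metrics in the next section; the example counts are made up.

```python
# Minimal sketch: precision and recall aggregated over all frames t of a
# QC sample, matching the formulas above.
from typing import Sequence, Tuple


def precision_recall(tp: Sequence[int], fp: Sequence[int],
                     fn: Sequence[int]) -> Tuple[float, float]:
    """tp[t], fp[t], fn[t] are the counts for frame t of the QC sample."""
    total_tp, total_fp, total_fn = sum(tp), sum(fp), sum(fn)
    precision = total_tp / (total_tp + total_fp) if (total_tp + total_fp) else 0.0
    recall = total_tp / (total_tp + total_fn) if (total_tp + total_fn) else 0.0
    return precision, recall


# Example: hypothetical counts from a 3-frame sample
p, r = precision_recall(tp=[18, 20, 17], fp=[1, 0, 2], fn=[1, 1, 0])
print(f"precision={p:.3f}, recall={r:.3f}")
```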

2D-Metrics

True Positive: When all the parameters of an annotation in the camera data are correct, that annotation is called a true positive. This means the object has been correctly annotated.

False Positive: When at least one of the parameters of an annotation in the camera data is incorrect, that annotation is called a false positive.

False Negative: If an object can be identified by human eyes but doesn't have an annotation in the camera data, it is counted as a false negative.

$$Precision = \frac{\sum_{t} TP_t}{\sum_{t} (TP_t + FP_t)}$$

$$Recall = \frac{\sum_{t} TP_t}{\sum_{t} (TP_t + FN_t)}$$

How Does Playment Check the Quality of a Submission?

Playment's Quality Assurance Process:

For Playment, quality is always the top priority. We have multiple checks and balances in place during the execution phase to ensure the quality of the output is best in class. Each annotation is checked multiple times before it is ready for submission.

Importance of Sampling in Determining The Quality of Large Ground Truth Dataset:

As we have seen so far, to calculate the quality of a ground truth dataset you need to manually check all the parameters of all the annotations present in it. However, when dealing with millions of annotations this becomes prohibitively expensive. Hence we create a statistically significant random sample that is a good representation of the dataset.

There are two ways to generate a sample (both sketched in code after the list):

  1. Long, High-FPS Sequences: If the dataset contains very long sequences, it's better to scale down the FPS by a certain factor. e.g. if the dataset has 100 sequences with 1,000 frames each, i.e. a total of 100,000 frames, we can scale down the FPS by a factor of 10 and pick 100 equidistant frames from each sequence. This gives a sample size of 100 sequences * 100 frames/sequence = 10,000 frames, which is 10% of the original dataset.

  2. Short, Low-FPS Sequences: If the dataset contains a large number of short, low-FPS sequences, it's better to randomly pick a certain percentage of the sequences at the original FPS. e.g. if the dataset has 1,000 sequences with 100 frames each, i.e. a total of 100,000 frames, we randomly pick 10% of the sequences, i.e. 100 sequences at the original FPS with 100 frames each. This gives a sample size of 100 sequences * 100 frames/sequence = 10,000 frames, which is 10% of the original dataset.
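The two strategies can be sketched roughly as follows. The representation of sequences as lists of frame IDs, and the factor of 10 and fraction of 10%, are placeholders taken from the examples above, not fixed values.

```python
# Minimal sketch of the two sampling strategies described above.
import random
from typing import Dict, List


def downsample_fps(sequences: Dict[str, List[str]], factor: int = 10) -> Dict[str, List[str]]:
    """Strategy 1: keep every `factor`-th frame of each long, high-FPS sequence."""
    return {seq_id: frames[::factor] for seq_id, frames in sequences.items()}


def sample_sequences(sequences: Dict[str, List[str]], fraction: float = 0.10,
                     seed: int = 0) -> Dict[str, List[str]]:
    """Strategy 2: randomly pick a fraction of short, low-FPS sequences at full FPS."""
    rng = random.Random(seed)
    k = max(1, round(len(sequences) * fraction))
    chosen = rng.sample(sorted(sequences), k)
    return {seq_id: sequences[seq_id] for seq_id in chosen}
```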

We usually consider an unbiased sample with at least 5% of the total frames to be statistically significant.

A Glimpse of Playment's Quality Assurance Tool:

Playment has built a quality checking tool which allows us to comprehensively check all the parameters of all the annotations present in a sample.

QC Results:

Upon completion of QC on a sample set, the tool generates a result in the following format, which clearly tells us the quality of the dataset.

A customer can perform a QC to evaluate the quality of a batch. If the precision and recall numbers are below the contractually agreed-upon benchmarks, the customer can request that Playment rework the batch to iron out the issues.
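As an illustration only, such an acceptance check might look like the sketch below; the benchmarks are contract-specific, and the 0.95 thresholds here are placeholders.

```python
# Hypothetical acceptance check: compare QC results for a batch against
# contractually agreed benchmarks (the 0.95 defaults are placeholders).
def batch_needs_rework(precision: float, recall: float,
                       min_precision: float = 0.95, min_recall: float = 0.95) -> bool:
    """Return True if the batch falls below either benchmark and should be reworked."""
    return precision < min_precision or recall < min_recall


print(batch_needs_rework(precision=0.97, recall=0.93))  # True -> request rework
```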
