Data Review

Visualization#

When data is uploaded, hundreds of computer vision and NLP models process the catalog and project the products onto a 2D canvas. This lets you see your data spread across the canvas and quickly spot similarities and differences between products, based on their relative distances from each other. To avoid clutter, only a sample of the data points is visualized at a time.

You can view all data points along with their predicted values, and give feedback on those predictions.

Review Interaction#

Labeling data points#

You can review and correct a data point in two ways:
- Drag and drop products into their respective classes below
- Hover on a product and choose a label from the dropdown
Labeled data point - Assigning a class value to a data point marks it as a ‘labeled data point’. Once a data point is labeled, it disappears from the working area. To view labeled data points, click on a class to open all data points labeled as that class.

Predictions and Clusters#

A cluster is a group of similar products that’s predicted by the system. A cluster boundary represents a prediction confidence of 80%.
Predicted datapoint - A datapoint that falls within a cluster is a predicted datapoint. Data points predicted with a confidence over 80% will fall inside the cluster.
The data points with lower confidence in predictions will fall outside the cluster. These are considered unlabeled and unpredicted data, and can be referred to as outliers.
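The 80% rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described, not the product's implementation: the `DataPoint` class, its field names, and the `CONFIDENCE_THRESHOLD` constant are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumption: the cluster boundary sits at 80% confidence, as described above.
CONFIDENCE_THRESHOLD = 0.80


@dataclass
class DataPoint:
    """Hypothetical record for one product on the canvas."""
    product_id: str
    predicted_class: Optional[str]  # None when the system made no prediction
    confidence: float               # prediction confidence between 0.0 and 1.0


def is_predicted(point: DataPoint) -> bool:
    """A point falls inside a cluster only when its confidence is over the threshold."""
    return point.predicted_class is not None and point.confidence > CONFIDENCE_THRESHOLD


def split_points(points: List[DataPoint]) -> Tuple[List[DataPoint], List[DataPoint]]:
    """Separate points that fall inside a cluster from the outliers."""
    inside = [p for p in points if is_predicted(p)]
    outliers = [p for p in points if not is_predicted(p)]
    return inside, outliers
```

Under these assumptions, a product predicted as ‘Dresses’ with a confidence of 0.93 would land inside the ‘Dresses’ cluster, while one predicted with a confidence of 0.55 would be treated as an outlier.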
You can continue to drag and drop data points within the clusters to convert predicted data points into labeled data points. Labeled data points disappear from the cluster/working area.
- Confirm prediction - Drag a data point from within a cluster and drop it within the same cluster. This confirms the system’s prediction and marks the data point as ‘labeled’.
- Correct mispredictions - Drag a data point from one cluster and drop it inside another cluster. This corrects the misprediction and marks the data point as ‘labeled’ with the new cluster’s class.
- Label unpredicted data - Drag a data point from outside any cluster and drop it inside a cluster. This is the same as dropping the data point into the class below: it marks the data point as ‘labeled’ with that cluster’s class.
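The three drag-and-drop outcomes differ only in where the data point came from. A minimal sketch of that decision, assuming a hypothetical `drop_outcome` function and outcome names that are not part of the product:

```python
from typing import Optional


def drop_outcome(source_cluster: Optional[str], target_cluster: str) -> str:
    """Name the review action performed by dropping a point on `target_cluster`.

    `source_cluster` is the cluster the point was dragged from, or None when
    it was dragged from outside every cluster (an unpredicted outlier).
    In all three cases the point ends up labeled as `target_cluster`.
    """
    if source_cluster is None:
        return "label_unpredicted"
    if source_cluster == target_cluster:
        return "confirm_prediction"
    return "correct_misprediction"


# Example drops (class names are illustrative only).
print(drop_outcome("Dresses", "Dresses"))  # same cluster
print(drop_outcome("Tops", "Dresses"))     # different cluster
print(drop_outcome(None, "Dresses"))       # from outside any cluster
```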

Labeling data points in bulk#

Another powerful option is to review data in bulk.

Here the user can scan through a grid of similar products, and quickly select and label tens of data points at a time.

You can sort data by prediction confidence, and filter it by user-labeled or system-predicted status. Predicted data points have a grey tag, while labeled data points have a blue tag.

The bulk page also includes an icon enlargement control to size icons up or down as required.
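As a rough mental model, the sort and filter options behave like the following sketch. The `filter_and_sort` function, the dictionary keys, and the sample product IDs are all hypothetical and chosen for illustration only.

```python
# Tag colours as described above: grey for predicted, blue for labeled.
TAG_COLOR = {"predicted": "grey", "labeled": "blue"}


def filter_and_sort(points, status=None, descending=True):
    """Return points with the given status, ordered by prediction confidence.

    `points` is a list of dicts with hypothetical keys "id", "status"
    ("predicted" or "labeled") and "confidence" (0.0 to 1.0).
    Passing status=None keeps every point.
    """
    selected = [p for p in points if status is None or p["status"] == status]
    return sorted(selected, key=lambda p: p["confidence"], reverse=descending)


grid = [
    {"id": "sku-1", "status": "predicted", "confidence": 0.97},
    {"id": "sku-2", "status": "labeled", "confidence": 0.88},
    {"id": "sku-3", "status": "predicted", "confidence": 0.82},
]

# Show system-predicted points only, lowest confidence first.
to_review = filter_and_sort(grid, status="predicted", descending=False)
```

Sorting predicted points from lowest to highest confidence is one way to bring the least certain predictions to the top of the grid for review.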

Switching working levels - Once review at the category level is 99 - 100% complete, you can switch to reviewing a different level of the taxonomy. The review steps are very similar to those at the category level.

Switching taxonomy levels#

Once review at the current level is complete, it is recommended to switch to a child taxonomy level or a peer attribute to continue review.

In order to switch the taxonomy level, click the taxonomy level indicator to open up the taxonomy screen, and select the taxonomy level to switch to. When switching to a deeper level in the taxonomy, only the data points that have been labeled as the parent class will show up for review.

The video shows how to switch the taxonomy level from a parent node to a child node once the category has been reviewed. For example, if you switch the taxonomy level to the ‘Sleeve Length’ attribute under the ‘Dress’ class, only the data points labeled or predicted as ‘Dresses’ will show up for review for ‘Sleeve Length’.
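The effect of switching to a child level can be pictured as a filter on the parent class. The sketch below is an illustration under stated assumptions, not the product's logic: the function name and dictionary keys are hypothetical, and it assumes that a user label, when present, takes precedence over the system prediction.

```python
def points_for_child_level(points, parent_class):
    """Return the points that would show up for review under `parent_class`.

    Each point is a dict with hypothetical keys "label" (user-assigned class
    or None) and "prediction" (system-predicted class or None).
    Assumption: a user label, when present, overrides the prediction.
    """
    def effective_class(point):
        return point["label"] if point["label"] is not None else point["prediction"]

    return [p for p in points if effective_class(p) == parent_class]


catalog = [
    {"id": "sku-1", "label": "Dresses", "prediction": "Dresses"},
    {"id": "sku-2", "label": None, "prediction": "Dresses"},
    {"id": "sku-3", "label": "Tops", "prediction": "Dresses"},
    {"id": "sku-4", "label": None, "prediction": None},
]

# Only these points would be reviewed for 'Sleeve Length' under 'Dresses'.
sleeve_length_queue = points_for_child_level(catalog, "Dresses")
```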

Menubar/Toolbar#

Zoom - The zoom tool offers Zoom In, Zoom Out, Rectangular area selection, and Zoom Reset.

Image Size - Icon enlargement to size icons up or down on the 2D canvas for better visibility.

Labeling in Detailed view of a datapoint#

Hover over an image and click the detail icon to access all attributes and values for that image and review its predictions.

From here you can review one image at a time across all taxonomy levels and accept the changes to its predictions.

Metrics#

A number of metrics provide insight into the data review.

Let’s start with some basic counts of data points. Each of these counts is available at the class level (when hovering on the class at the bottom of the screen) and at the current taxonomy level (when hovering on the completion rate indicator).

System predicted data - The number of data points that the system could predict with a confidence higher than 80%
User labeled data - The number of data points labeled by the user, either in single or bulk edit mode. Predicted data points, once labeled by the user, move from the predicted count to the labeled count. To provide more granular information on user labeled data points, this count is broken down into accepted vs corrected data points. These ratios give a sense of how well the AI understands and organizes the data.
- User Accepted - Data points that the AI predicted correctly and the user accepted
- User Corrected - Mispredicted data points, or data points without any prediction, that the user labeled
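To make the relationship between these counts concrete, here is a minimal sketch of how they could be tallied. The `review_metrics` function and the dictionary keys are hypothetical. The sketch assumes "prediction" holds a class only when the system predicted it with a confidence higher than 80%, and is `None` otherwise.

```python
def review_metrics(points):
    """Tally the review counts described above for a list of data points.

    Each point is a dict with hypothetical keys "label" (user-assigned class
    or None) and "prediction" (class predicted with confidence over 80%, or
    None when there is no such prediction).
    """
    labeled = [p for p in points if p["label"] is not None]
    # Predicted points stop counting as "system predicted" once the user labels them.
    system_predicted = sum(
        1 for p in points if p["label"] is None and p["prediction"] is not None
    )
    # Accepted: the user's label matches the system's prediction.
    accepted = sum(1 for p in labeled if p["label"] == p["prediction"])
    # Corrected: mispredicted points, or points without a prediction, that were labeled.
    corrected = len(labeled) - accepted
    return {
        "system_predicted": system_predicted,
        "user_labeled": len(labeled),
        "user_accepted": accepted,
        "user_corrected": corrected,
        # Share of user labels that simply confirmed the prediction.
        "accepted_ratio": accepted / len(labeled) if labeled else None,
    }
```

With this breakdown, `user_accepted + user_corrected` always equals `user_labeled`, so a high accepted ratio suggests that most predictions needed no correction.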