Building Blox.ai Image Moderation

AI Beyond The Buzzword: Building An AI-Powered Image Moderation Solution

February 7, 2022

Machine Learning Engineer, Blox.ai

88% of shoppers think product images are critically important for making a purchase decision. As a result, retailers invest significant resources in ensuring that product images are clear and adhere to internal quality guidelines. A big part of the challenge in maintaining this quality, however, comes from the vendor onboarding process. This is especially true for marketplaces: every single image that a marketplace onboards needs to be checked for guideline adherence and product credibility.

Traditionally, this process of moderating images for quality is carried out manually. However, this approach results in three fundamental issues:

  • Slow onboarding time
  • Errors and inaccuracies in results due to human error and biases
  • Substantial investments in the resources and effort required

With over 25 million eCommerce sites around the world, the volume of products entering the market, and the level of competition, the speed at which retailers go to market is critical. To address these issues, we introduced an AI-powered solution to help businesses automate the tedious process of moderating images for quality.

By bringing A.I. and automation into this process, businesses have been able to significantly improve throughput, accelerate go-to-market timelines and revenue generation, and free up personnel to focus on other parts of their business.

Here’s how we built our AI-powered image moderation solution for retailers:

The Approach

When it comes to evaluating product images, there is no hard and fast rule that qualifies an image for use. Every business has its own custom guidelines – what is acceptable for certain use cases might not hold true for others. This generally depends on the type of business and the use case being considered. For instance, a blurry image is universally unacceptable, whereas an image with a logo on it may be acceptable only for certain categories.

Nuances like these vary from case to case. Thus, we decided to design a customizable solution that gave the retailer control over what kind of images were acceptable to them. For every image submitted, we provide ACCEPT or REJECT tags against a list of relevant guidelines along with a confidence score that denotes the confidence of our models in generating the tag.
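
As an illustration of the output described above, here is a minimal sketch of how per-guideline tags and confidence scores could be represented, and how a retailer could select only the guidelines relevant to them. The class name, field names, guideline names, and scores are hypothetical and are not taken from the actual Blox.ai API.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class GuidelineResult:
    """One tag for one guideline on one image (hypothetical schema)."""
    guideline: str     # e.g. "blur", "logo", "watermark"
    tag: str           # "ACCEPT" or "REJECT"
    confidence: float  # model confidence in the tag, from 0 to 1

    def __post_init__(self) -> None:
        if self.tag not in ("ACCEPT", "REJECT"):
            raise ValueError(f"unknown tag: {self.tag!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")


def relevant_results(results: Iterable[GuidelineResult],
                     enabled: Set[str]) -> List[GuidelineResult]:
    """Keep only the guidelines this retailer has switched on."""
    return [r for r in results if r.guideline in enabled]


# Made-up results for a single image.
image_results = [
    GuidelineResult("blur", "ACCEPT", 0.97),
    GuidelineResult("logo", "REJECT", 0.88),
    GuidelineResult("watermark", "ACCEPT", 0.91),
]

# A retailer whose category allows logos leaves "logo" out of its set.
kept = relevant_results(image_results, {"blur", "watermark"})
print([(r.guideline, r.tag) for r in kept])
```

Keeping one result per guideline, rather than a single verdict per image, is what lets each retailer decide which guidelines matter for their catalogue.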

The Tech Behind It

The Input: Product Image Library
The Output: ACCEPT/REJECT tags 

Convolutional Neural Networks (CNNs) are well suited to this use case. Working as classifiers, detectors, and segmentors alongside image processing functions, CNNs play a critical role in our solution.

Every image submitted is first evaluated to determine whether it is blurry or grainy. By using edge statistics from spatial filters at multiple resolutions, our models can determine the clarity of every image submitted and establish whether or not it matches the guidelines.
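
To make the idea of edge statistics at multiple resolutions concrete, here is a minimal sketch. It assumes a grayscale image stored as a list of rows, uses the variance of a 4-neighbour Laplacian response as the edge statistic, and halves the resolution by averaging 2x2 blocks. The choice of filter, the number of levels, the threshold, and the decision rule are all assumptions made for illustration; the production models are not described at this level of detail.

```python
from statistics import pvariance


def laplacian_variance(img):
    """Variance of a 4-neighbour Laplacian response over interior pixels."""
    h, w = len(img), len(img[0])
    responses = [
        img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1]
        - 4 * img[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return pvariance(responses)


def downsample(img):
    """Halve the resolution by averaging each 2x2 block of pixels."""
    h, w = len(img) // 2 * 2, len(img[0]) // 2 * 2
    return [
        [(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4
         for x in range(0, w, 2)]
        for y in range(0, h, 2)
    ]


def edge_stats(img, levels=3):
    """Edge statistic at each resolution, finest level first."""
    stats = []
    for _ in range(levels):
        if len(img) < 3 or len(img[0]) < 3:
            break  # too small to have interior pixels
        stats.append(laplacian_variance(img))
        img = downsample(img)
    return stats


def is_blurry(img, threshold=100.0, levels=3):
    """Placeholder rule (assumption): flag images with little fine-scale
    edge energy. A trained model could consume every level as a feature."""
    return edge_stats(img, levels)[0] < threshold


# Synthetic test images: a sharp checkerboard and a smooth ramp with no edges.
sharp = [[255 if (x // 2 + y // 2) % 2 == 0 else 0 for x in range(16)]
         for y in range(16)]
smooth = [[x * 8 for x in range(16)] for y in range(16)]
print(is_blurry(sharp), is_blurry(smooth))
```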

Features such as watermark, logo, and price are evaluated with the help of a binary CNN classifier. The positive class implies the presence of these features, while the negative class implies the absence of them in the image.

For features such as the presence of unwanted text within the image, we use pre-trained text detectors. These object detector models are trained to only locate irrelevant text that might be written over the product in the image and not that which contains essential product information.

Another feature that is common to most guidelines – especially in retail – is the type of background in the image. Objects that are placed against plain or solid color backgrounds are preferred as they are less distracting and easier to see. For this, we leverage foreground-background segmentation models. The segmented background is subjected to multiple rounds of image processing before it is cleared for use.
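
One of the checks that could follow segmentation is sketched below. It assumes a segmentation model has already produced a foreground mask (True where the product is), and it treats a background as plain when the standard deviation of its pixel intensities is small. The function name, the grayscale simplification, and the tolerance are hypothetical; this is a single illustrative step, not the full processing described above.

```python
from statistics import pstdev


def background_is_plain(img, fg_mask, max_std=10.0):
    """Return True when the background pixels are nearly uniform.

    img:     grayscale image as a list of rows of intensities (0-255)
    fg_mask: same shape, True where a pixel belongs to the product
    max_std: tolerance on background intensity spread (assumed value)
    """
    background = [
        pixel
        for row, mask_row in zip(img, fg_mask)
        for pixel, is_foreground in zip(row, mask_row)
        if not is_foreground
    ]
    if not background:
        return False  # no visible background to judge
    return pstdev(background) <= max_std


# A dark product in the centre of a 4x4 image.
mask = [[False, False, False, False],
        [False, True, True, False],
        [False, True, True, False],
        [False, False, False, False]]
plain = [[250, 250, 250, 250],
         [250, 40, 60, 250],
         [250, 50, 45, 250],
         [250, 250, 250, 250]]
cluttered = [[0, 255, 0, 255],
             [255, 40, 60, 0],
             [0, 50, 45, 255],
             [255, 0, 255, 0]]
print(background_is_plain(plain, mask), background_is_plain(cluttered, mask))
```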

Certain retailers also require us to detect the presence of children/minors in the images submitted. In this case, models such as simple binary classifiers may result in a high number of false positives. Hence, we utilize a hierarchical solution wherein irrelevant data is discarded at each level. This allows us to reduce the diversity in data and focus the learning on the leaf levels. 
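
The hierarchical idea can be sketched as a cascade in which each level filters out images that later levels do not need to see. The two stages below (is a person present, and if so, is that person a minor) and their cut-offs are hypothetical, and the scorers are stand-ins that read precomputed scores rather than real models.

```python
from typing import Callable, List, Optional, Tuple

# (stage name, scoring function, minimum score needed to continue)
Stage = Tuple[str, Callable[[dict], float], float]


def run_hierarchy(image: dict, stages: List[Stage]) -> Tuple[bool, Optional[str]]:
    """Pass the image down the levels; stop at the first level it fails.

    Returns (flagged, stopped_at). An image is flagged only if it clears
    every level, so unrelated images are discarded early.
    """
    for name, scorer, min_score in stages:
        if scorer(image) < min_score:
            return False, name
    return True, None


# Stand-in scorers: real stages would run models on the image pixels.
stages: List[Stage] = [
    ("person_present", lambda img: img["person_score"], 0.5),
    ("is_minor", lambda img: img["minor_score"], 0.8),
]

shoe_photo = {"person_score": 0.05, "minor_score": 0.0}
adult_model = {"person_score": 0.95, "minor_score": 0.10}
child_model = {"person_score": 0.97, "minor_score": 0.93}

for photo in (shoe_photo, adult_model, child_model):
    print(run_hierarchy(photo, stages))
```

Because the leaf stage only ever sees images that contain a person, it can be trained on a narrower and more relevant distribution than a single flat classifier.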

With our proprietary image moderation solution, our systems can process upwards of 30k requests per minute – with zero data loss!

Evaluation Criteria

Retailers that are looking to automate this part of their workflow typically have two challenges that they are looking to solve:

  • To reduce the man-hours spent in manual moderation of seller data
  • To reduce the number of falsely rejected images

Our AI-powered image moderation solution plays a significant role in achieving the first goal by evaluating image guidelines at scale. When we compare our automated image moderation against manual image qualification, we see a reduction in the time taken, in some cases exceeding 70%.

For each guideline, we typically provide a tag and a numeric value between 0 and 1 – that indicates the confidence of our model in providing the ACCEPT/REJECT tag. In this case, 0 represents the state of least confidence while 1 represents full confidence. This equips retailers with the ability to create their internal workflow for automatically accepting or rejecting the reviewed images based on their custom thresholds. If the confidence value of the image is greater than the set threshold, the workflow automatically accepts the images. If not, the image is passed along to moderators for manual verification. 
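
A retailer-side workflow of this kind could look like the sketch below. It assumes one threshold per guideline and assumes that a tag whose confidence clears the threshold is applied automatically, whether it is ACCEPT or REJECT, while everything else goes to a moderator. The function name, outcome labels, and threshold values are hypothetical.

```python
def route(tag: str, confidence: float, threshold: float) -> str:
    """Decide what to do with one guideline result.

    Confident tags are applied automatically; anything at or below the
    retailer's threshold is sent for manual verification.
    """
    if tag not in ("ACCEPT", "REJECT"):
        raise ValueError(f"unknown tag: {tag!r}")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    if confidence > threshold:
        return "AUTO_ACCEPT" if tag == "ACCEPT" else "AUTO_REJECT"
    return "MANUAL_REVIEW"


# Hypothetical per-guideline thresholds chosen by a retailer.
thresholds = {"blur": 0.90, "watermark": 0.80}

decisions = {
    "blur": route("ACCEPT", 0.97, thresholds["blur"]),
    "watermark": route("REJECT", 0.65, thresholds["watermark"]),
}
print(decisions)
```

Raising a threshold sends more images to moderators and lowers the risk of a wrong automatic decision; lowering it does the opposite.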

When it comes to evaluating the second goal, it’s important to remember that there is an inherent data skew towards acceptable images. For this reason, the accuracy of the models isn’t an ideal measure for performance. Instead, we consider precision & recall. Our models are optimized for high precision for REJECT tags. 
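
A small worked example shows why accuracy misleads on skewed data. The counts below are invented for illustration: 1,000 images, of which 950 are acceptable and 50 should be rejected. REJECT is treated as the positive class.

```python
def reject_metrics(tp: int, fp: int, fn: int, tn: int):
    """Accuracy, precision and recall with REJECT as the positive class.

    tp: correctly rejected      fp: falsely rejected
    fn: missed rejections       tn: correctly accepted
    Precision and recall are reported as 0.0 when undefined (a convention
    chosen for this sketch).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# A model that accepts everything still looks 95% accurate.
always_accept = reject_metrics(tp=0, fp=0, fn=50, tn=950)

# A model tuned for high-precision rejections: 32 rejections, 30 correct.
high_precision = reject_metrics(tp=30, fp=2, fn=20, tn=948)

print(always_accept)
print(high_precision)
```

The first model never rejects anything, so its 95% accuracy hides a recall of zero. The second rejects rarely but is right 30 times out of 32, which is the behaviour that keeps falsely rejected images, and the vendor complaints they cause, to a minimum.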

With our Image Moderation solution, retailers have seen a significant reduction in the number of inaccurately tagged images and fewer contact tickets raised by their vendors due to incorrectly tagged images. 

Feedback & Retraining

Deploying our models to evaluate the submitted images and generating the relevant tags isn’t the last step in the pipeline. It’s important to remember that data in production is constantly evolving and it is critical that our A.I. models also evolve with them. 

For this, our teams regularly sample rejected images from our datasets and analyze the AI-generated tags. The QA-verified classifications are then fed back into the system, and the models are retrained on them so that they continue to get better with time.
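
The sampling and feedback step could be sketched as follows. The record layout, function names, and sample size are hypothetical, and the retraining itself is out of scope here; the sketch only shows how QA verdicts on sampled rejections might be turned into labelled examples.

```python
import random


def sample_rejected(predictions, k, seed=0):
    """Draw up to k REJECT-tagged predictions for QA review."""
    rejected = [p for p in predictions if p["tag"] == "REJECT"]
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    return rng.sample(rejected, min(k, len(rejected)))


def build_retraining_set(sampled, qa_labels):
    """Pair each reviewed image with its QA label.

    qa_labels maps image_id -> "ACCEPT" or "REJECT" as judged by QA.
    Images QA has not reviewed yet are skipped.
    """
    return [
        {
            "image_id": p["image_id"],
            "label": qa_labels[p["image_id"]],
            "model_was_wrong": qa_labels[p["image_id"]] != p["tag"],
        }
        for p in sampled
        if p["image_id"] in qa_labels
    ]


# Made-up production predictions and QA verdicts.
predictions = [
    {"image_id": "a", "tag": "REJECT"},
    {"image_id": "b", "tag": "ACCEPT"},
    {"image_id": "c", "tag": "REJECT"},
]
qa_labels = {"a": "REJECT", "c": "ACCEPT"}

sampled = sample_rejected(predictions, k=10)
retraining_set = build_retraining_set(sampled, qa_labels)
print(sorted(r["image_id"] for r in retraining_set if r["model_was_wrong"]))
```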

Currently, our systems are capable of processing standard guidelines such as borders, text, watermark, resolution, quantity of objects, and presence of minors, as well as custom guidelines based on individual business needs. We’ve been able to substantially reduce the time, cost, and manual effort involved in assessing images at scale. Because we can provide immediate feedback, retailers also see other benefits such as improved user engagement and productivity.

Just like our data, our models are also continuously learning, evolving, and adapting to suit your needs. Going forward, we’re working towards supporting more guidelines such as NSFW and offensive content, facial detection, landmarks, and more. With our infinitely scalable infrastructure, intuitive workflow, and highly flexible stack, we look forward to playing a part in your A.I. journey.

While this article explores the scope of our AI-powered image moderation solution for the retail industry, it is important to understand that our A.I. can be customized to solve image moderation use cases across a variety of industries. This could be for quicker user onboarding on apps, claims processing, qualification of UGC images, and much more.

Today, given any industry, we can map the value of clean, standard, qualified images to its business value chain. For example, faster patient onboarding directly impacts patient acquisition rates, while faster claims processing can impact payouts and the end-user experience, and thus retention.