

AI-Driven Moderation

Scaling Trust & Safety with Automation and Precision

February 19, 2025

Introduction

Many companies see AI tools as a key component of their Trust and Safety programs. As UGC and regulatory scrutiny increases and budgets tighten, companies look to AI as a cost-effective way to achieve compliance, safeguard their community, and protect their brand.

Now that the hype around AI is calming, it's time to examine where we are: what do we make of its current capabilities and benefits, and how well do they scale? As one of the true content moderation companies in the CX industry (we are the Mods!), with nearly two decades of experience leading trust and safety services, we have a unique perspective on the role of technology in moderation.

This article dives into the evolving role of AI in content moderation and explores how to harness its full potential. You'll learn:

  • How today's AI tools go beyond keyword detection to assess sentiment, context, and intent.
  • The strategic value of combining AI with human oversight to enhance accuracy and build trust.
  • How to scale AI-driven moderation effectively while balancing cost, compliance, and community well-being.

Uses of AI in Moderation

AI has a wide variety of trust and safety uses.

Assessment and Categorization

Content moderation primarily involves evaluating and categorizing all types of user-generated content – and AI is particularly well-suited for this task. The predominant use case for AI-enabled tools is to provide that initial screening, review, and categorization of content, tagging and flagging it for further action.

The use of technology on the moderation front lines is not new. For example, moderation programs have used black and white lists to protect communities from harmful words and phrases for over two decades. But AI's capabilities are much more impressive.

Text and Audio

Today's large language models (LLMs) and natural language processing (NLP) capabilities enable AI to evaluate language in a much more nuanced way than those early moderation tools. The better implementations go beyond merely identifying banned language. They can detect sentiment and tone and identify subtle undertones such as sarcasm and implied threats. The key is that LLMs can evaluate context. Speech-to-text conversion enables audio content to be processed as text.
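
To make that concrete, a text-moderation step built on an LLM typically asks for a structured judgment about sentiment, intent, and policy relevance rather than a simple keyword match. The sketch below is a minimal illustration, assuming a hypothetical llm_complete() client; a real integration would go through a specific vendor's SDK or API.

```python
import json

def llm_complete(prompt: str) -> str:
    """Hypothetical LLM client; wire this to your provider's SDK or API."""
    raise NotImplementedError

MODERATION_PROMPT = """Assess the following user post in context. Reply with JSON:
  "sentiment": "positive" | "neutral" | "negative",
  "signals": list drawn from ["sarcasm", "implied_threat", "harassment", "self_harm"],
  "violates_policy": true or false,
  "rationale": one short sentence

Post: {post}"""

def assess_text(post: str) -> dict:
    """Ask the model for a structured, context-aware judgment on one post."""
    raw = llm_complete(MODERATION_PROMPT.format(post=post))
    return json.loads(raw)
```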


Image and Video

Image and video moderation relies on vision-based models rather than language models, using technology such as convolutional neural networks (CNNs) and object detection models that identify patterns at the pixel level. These models can detect content that is violent, gory, or explicit, and can identify objects such as weapons, drugs, and hate symbols. With optical character recognition (OCR), they can also read text embedded in images, which can then be processed as language.
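
A vision-based pass often produces a set of labels with confidence scores, plus any text the OCR layer extracts, which is then handed to the same language checks described above. A minimal sketch, with vision_labels() and ocr_text() as hypothetical stand-ins for a vendor's object-detection and OCR services:

```python
def vision_labels(image_bytes: bytes) -> dict:
    """Hypothetical stand-in for a CNN / object-detection service.
    Returns label -> confidence, e.g. {"weapon": 0.91, "crowd": 0.40}."""
    raise NotImplementedError

def ocr_text(image_bytes: bytes) -> str:
    """Hypothetical stand-in for an OCR service."""
    raise NotImplementedError

VISUAL_VIOLATIONS = {"violence", "gore", "explicit", "weapon", "drugs", "hate_symbol"}

def assess_image(image_bytes: bytes, threshold: float = 0.8) -> dict:
    labels = vision_labels(image_bytes)
    flagged = {k: v for k, v in labels.items() if k in VISUAL_VIOLATIONS and v >= threshold}
    # Text found inside the image is routed through the text-moderation step.
    return {"flagged_labels": flagged, "embedded_text": ocr_text(image_bytes)}
```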

Behavior, Intent, and Fraud

Beyond identifying inappropriate language and imagery, today's more sophisticated AI tools can detect—to some degree—intent and behavior, including self-harm, trolling, bullying, and harassment. AI can also detect propaganda and terrorism-related content and aid in trust and safety tasks such as user verification.

Tool Stacking

With the wide variety of types of UGC and different kinds of violations to detect, it's no surprise that one trend we see on larger and more complex moderation projects is stacking AI tools. Moderation teams can wring the most from their technology by combining tools with different capabilities, features, and focus. For example, we have clients who use one AI tool as a preliminary filter. That content is categorized and passed to the primary AI tool for another round of filtering and assessment involving human oversight. From there, certain types of content could be sent through a further AI tool specializing in fraud detection, for example.
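
In code terms, a stacked setup behaves like a short pipeline in which each stage either clears an item or hands it to a more specialized check. The sketch below is illustrative only; prefilter(), primary_review(), and fraud_check() are hypothetical stand-ins for separate tools in the stack.

```python
def prefilter(item: dict) -> str:       # cheap first-pass screen
    raise NotImplementedError

def primary_review(item: dict) -> str:  # main AI assessment, human oversight on flags
    raise NotImplementedError

def fraud_check(item: dict) -> str:     # specialist tool for fraud signals
    raise NotImplementedError

def moderate(item: dict) -> str:
    """Run content through the stack, cheapest tool first.
    Each stage returns 'allow', 'block', 'review', or 'suspect_fraud'."""
    verdict = prefilter(item)
    if verdict == "allow":
        return "allow"

    verdict = primary_review(item)
    if verdict == "suspect_fraud":
        verdict = fraud_check(item)      # escalate only suspicious items

    return verdict
```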

Review and Action

Once content has been evaluated and categorized (typically tagged), the next step is to take action.

It's helpful to think of evaluated content as falling into three buckets: white, black, and gray. White content has passed moderation without being flagged. Black content contains clear violations. Gray content might be in violation and requires further review.

How that content is treated varies, as the routing sketch after this list illustrates:

  • White content is often published or allowed to remain published.
  • Gray content is typically queued for human review.
  • Black content, at least in the past, was automatically blocked, hidden, or deleted.
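
A minimal routing sketch of that logic (the action names are illustrative, and, as discussed below, many teams now send black content to human confirmation rather than deleting it outright):

```python
def route(item: dict) -> str:
    """Route content by bucket; item looks like {"id": 123, "bucket": "gray"}.
    Action names are illustrative."""
    bucket = item["bucket"]
    if bucket == "white":
        return "publish"                          # or leave published
    if bucket == "gray":
        return "queue_for_human_review"
    if bucket == "black":
        return "hide_and_queue_for_confirmation"  # increasingly confirmed by a human
    raise ValueError(f"unknown bucket: {bucket}")
```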

These days, we're seeing a different trend: human moderators spend more time evaluating black content that, in the past, would have been automatically hidden or deleted. One reason for this trend is regulatory accountability around banned content, particularly in Europe. Some companies have elected to review banned content, or to require human review before actioning it, to ensure the ban is justified.


The second reason applies to all types of content initially reviewed by AI, not just black content: because these tools are black boxes by nature, developing and maintaining confidence in them requires ongoing human review and feedback.

Other Uses

There are numerous potential use cases for AI in moderation and trust and safety, but two are worth mentioning here.

Analysis and Reporting

AI is a powerful tool for a moderation team that uses data to optimize operations, improve accuracy, or detect broader trends. It's also highly capable of helping ensure compliance with the growing number of regulations regarding logging, reporting, and follow-up to moderation actions.
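
On the compliance side, the foundation is usually a structured audit trail of every moderation action that can later be aggregated into reports. A minimal sketch (the field names are illustrative; actual requirements depend on the regulations in scope):

```python
import datetime
import json

def log_action(item_id: str, decision: str, reason: str, reviewer: str,
               path: str = "moderation_audit.jsonl") -> None:
    """Append one audit record per moderation action to a JSON-lines log."""
    record = {
        "item_id": item_id,
        "decision": decision,    # e.g. "removed", "restored", "no_action"
        "reason": reason,        # policy or legal basis cited
        "reviewer": reviewer,    # "ai:<model-name>" or a human moderator ID
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```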

Engagement Moderation

Policing content and community behavior is only one aspect of a robust trust and safety program. It's not just about removing harmful content and banning bad behavior and actors—it's about encouraging and inspiring engagement and building a vibrant community. Generative AI has proven to be a popular tool for content creation, particularly on some social platforms and Discord communities.

The Promise of AI

Clearly, AI has many roles to play in moderation and trust and safety more broadly.

AI's most significant benefit is the speed and efficiency with which it can review UGC. This allows a moderation team to focus on higher-level issues like review, QA, and strategy, and, in many cases, it protects moderators from seeing the most horrific content on the internet.

Consistency is another benefit. While AI may not always be accurate or bias-free, it reviews content consistently. Based on QA and feedback from human mods, accuracy tends to increase over time.

AI makes regulatory compliance more manageable, especially at scale.

Finally, AI can provide insights into a community's sentiments, topics of conversation, and behavior.

The promise of AI ticks all the boxes: greater protection, deeper insights, and cost savings.

But promise is one thing. Making AI work and making it scale is another.

Scaling AI: Requirements, Considerations, & Iteration

By processing content at such a rapid pace, AI has the potential to help companies cut costs, reduce risk (by reducing the time harmful content is left unactioned), and scale when new opportunities present themselves (such as adding new languages to support international expansion).

Elements of Success

The Right Strategy and Plan

The success of an AI tool or toolset starts with the right strategy and plan. While some of these tools have impressive capabilities out of the box, tuning them to your specific use cases—whether that means your brand policies, desired community character, or regional legal requirements—is critical.

Laws and policies tend to provide broad guidelines for what's acceptable and what is not. However, details matter, especially when it comes to training, operating, and QAing AI agents. One effective tool we use is our proprietary Behavior Matrix. This tool allows us to map these laws and policies onto specific behaviors that more precisely inform moderation and the degree to which specific items should be enforced.
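
The Behavior Matrix itself is proprietary, but the underlying idea of translating broad policy language into specific behaviors and enforcement levels can be illustrated with a simple mapping. The behaviors and actions below are hypothetical examples, not ModSquad's actual matrix.

```python
# Hypothetical illustration only; not ModSquad's actual Behavior Matrix.
BEHAVIOR_MAP = {
    # behavior               (policy source,            enforcement)
    "targeted_slur":         ("hate speech policy",     "remove_and_warn"),
    "heated_disagreement":   ("civility guidelines",    "allow"),
    "doxxing_attempt":       ("privacy policy / law",   "remove_and_escalate"),
    "unsolicited_promotion": ("commercial content",     "remove"),
}

def enforcement_for(behavior: str) -> str:
    """Look up how strictly a specific behavior should be enforced."""
    _policy, action = BEHAVIOR_MAP.get(behavior, ("unmapped", "queue_for_human_review"))
    return action
```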

The Right Tool(s)

Choosing the right AI tool or stack can greatly affect the effectiveness and efficiency of your moderation program.

Of course, the first step is to ensure you have the right model for the type of content under review.

Then there's the question of access. Moderation on proprietary platforms, for example, will require connection via APIs. This will likely involve custom code, which means development and maintenance. If you're reviewing content on a public platform such as a social media network, or on a more traditional community platform like Discord, you'll likely use tools, bots, or plugins specifically designed for that application.
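
As a rough sketch of what that custom integration looks like: pull pending content over the platform's API, send it to the moderation tool, and write the verdict back. The endpoints below are hypothetical; a real build depends on the platform's and vendor's actual APIs.

```python
import requests

# Hypothetical endpoints for illustration.
PLATFORM_API = "https://platform.example.com/api/v1"
MODERATION_API = "https://moderation-vendor.example.com/v1/assess"

def moderate_new_posts(api_token: str) -> None:
    headers = {"Authorization": f"Bearer {api_token}"}

    # 1. Pull content awaiting review from the proprietary platform.
    posts = requests.get(f"{PLATFORM_API}/posts?status=pending", headers=headers).json()

    for post in posts:
        # 2. Send it to the moderation tool for assessment.
        verdict = requests.post(MODERATION_API, json={"text": post["body"]}).json()

        # 3. Write the decision back to the platform.
        requests.post(f"{PLATFORM_API}/posts/{post['id']}/moderation",
                      headers=headers, json=verdict)
```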

Finally, you'll want to determine whether your tools require data storage and, if so, how much for how long. Data retention, whether for training or compliance, is often a relatively large unexpected cost.


Proper QA and Review

Scaling AI requires oversight and quality assurance. Even if AI can consistently apply moderation standards, that doesn't guarantee accuracy or freedom from bias. The model needs feedback on what is and isn't acceptable. After all, AI is only as good as the data on which it is trained and the subsequent direction it receives.

Scaling AI with precision means having the right processes and people in place who can evaluate the tremendous amount of content processed by AI. Challenges here might include staffing a more technical team familiar with AI. It might involve staffing a senior team more familiar with reviewing and QAing flagged content than simply identifying it. And if you're using AI to quickly scale internationally, it might require building a team that speaks those languages so they can adequately evaluate whether localized content is appropriately categorized.
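
A common pattern here is to sample a slice of the AI's decisions each day, have senior moderators re-review them, and track agreement over time. A minimal sketch, assuming each record carries both the AI's label and the human reviewer's label:

```python
import random

def sample_for_qa(ai_decisions: list[dict], rate: float = 0.05) -> list[dict]:
    """Pull a random sample of AI decisions for human re-review."""
    if not ai_decisions:
        return []
    k = max(1, int(len(ai_decisions) * rate))
    return random.sample(ai_decisions, k)

def agreement_rate(reviewed: list[dict]) -> float:
    """Share of sampled items where the human reviewer agreed with the AI.
    Each record is assumed to look like:
      {"id": 1, "ai_label": "block", "human_label": "allow"}"""
    if not reviewed:
        return 0.0
    return sum(r["ai_label"] == r["human_label"] for r in reviewed) / len(reviewed)
```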

Balancing Costs

At this point, it is easy to see all the variables determining the total costs of scaling AI. Tool costs include purchase, implementation, operation, maintenance, and ongoing data storage. People costs involve ensuring you have higher-level moderators for review, appeals, QA, and ongoing training. Whether a particular AI solution will result in cost savings depends heavily on the specifics of your project. And, as mentioned above, data storage can get expensive. That said, in general, we're seeing savings of approximately 20% on larger projects.
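
As a purely illustrative calculation (the figures below are hypothetical, not client data), the savings only show up once tool, storage, and senior-review costs are netted against the reduction in frontline review hours:

```python
# Hypothetical monthly figures, for illustration only.
baseline_all_human_cost = 100_000

tool_licensing      = 15_000
data_storage        = 5_000
senior_review_qa    = 25_000   # QA, appeals, ongoing training
remaining_frontline = 35_000   # human review of gray/flagged content

ai_program_cost = tool_licensing + data_storage + senior_review_qa + remaining_frontline
savings = 1 - ai_program_cost / baseline_all_human_cost
print(f"Net savings: {savings:.0%}")   # -> Net savings: 20%
```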


Continual Reassessment

It's obvious – but still worth emphasizing – that you'll need to continually reassess as you go, especially if you're scaling quickly. You have to be ready to adapt to changing variables to keep your program effective and efficient. There are two questions to regularly ask yourself.

Is it working?

Spikes in false positives or increases in appeals indicate potential problems – possibly a problem with the model or a change in your community. Online memes and slang can evolve quickly and erode accuracy. Community sentiment can shift, requiring tweaks to how you enforce and adjudicate policies. Is the community engaged or disengaged? And with the rapid pace at which regulations are increasing and evolving, AI tools will inevitably need updating.
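
In practice, "is it working?" can be backed by a few simple health checks on moderation metrics, such as flagging when false-positive or appeal rates drift well above their recent baseline. A minimal sketch with an arbitrary illustrative threshold:

```python
from statistics import mean

def drift_alert(recent_rates: list[float], today: float, tolerance: float = 1.5) -> bool:
    """Flag when today's rate is well above the recent baseline.
    recent_rates might be the last 30 days of false-positive or appeal rates;
    the 1.5x tolerance is an arbitrary illustrative threshold."""
    if not recent_rates:
        return False
    return today > tolerance * mean(recent_rates)

# Example: the appeal rate jumps from ~2% to 5% -> worth investigating.
print(drift_alert([0.020, 0.021, 0.019, 0.020], 0.05))  # True
```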

And it's not just that AI can fail to provide sufficient moderation. AI can sometimes act like an overzealous gatekeeper, removing legitimate content or silencing critical voices because it misinterprets context, humor, or cultural nuances. This "collateral damage" reveals a harsh reality: while AI scales moderation, it sometimes undermines the very communities it's meant to protect. You may have to rein it in.

At the end of the day, the real question is whether trust and safety are improving or declining, and whether your program meets the needs of your community and the standards of your brand.


Is it efficient and cost-effective?

Obviously, AI and the moderation tools that utilize it are evolving rapidly – it's still early days. Recently, we've started to see signs that the underlying models may be becoming more commoditized and less expensive. So checking in with vendors on potential savings and evaluating new competitors is a smart way to keep costs in check, and it should be part of your strategic plan.

Even if your technology remains cost-effective, you'll want to ensure your workflows and processes are efficient as well. Top moderation teams focus on ongoing optimization to reduce time on task and cut costs.

Conclusion

As AI and the technology supporting it continue to improve, scaling moderation will become easier, more efficient, more accurate, and more cost-effective.

But today, we're still a long way from fully automated moderation. If anything, we're currently seeing a greater demand for human moderators to proactively QA AI results and handle the proportional increase in assistance and appeals as AI churns through greater quantities of UGC. Unlike other industries partly replaced by AI, Trust & Safety demands human involvement to handle context, nuance, and cultural complexity.

Yet most companies are still not fully leveraging AI tools, due to gaps in their tech stack and ineffective human-in-the-loop processes, leaving significant potential untapped.

If you want to ensure you get the most AI has to offer for trust and safety, we're here to help with all aspects, from strategy to implementation to operations.

Start using AI to scale your trust & safety program today.
