Creating & Managing custom blocklists in the Azure Content Moderator service
In my previous blog post I explained how to use the Azure Content Moderator service to identify personal information and explicit content in a given text, and how to use that for moderating comments. In this post we will look at another feature of the Azure Content Moderator service: term lists.
The fact that the feature is called a term list and not a blocklist is intentional. The service checks whether the text we send contains any words from the specified list and returns the result. Whether we use it as a blocklist or an allowlist is completely up to us.
The lists are fully manageable through the .NET SDK. This includes operations like:
- Creating a list
- Deleting a list
- Adding words to a list
- Deleting a word from the list
- Emptying a list
- Getting all the words in a list
Once you create a list you get an ID for it, which is then used for all other operations.
Here is the sample code from my POC console application that performs these operations:
#region Managing Lists

// Creates a term list and returns its ID, which every other operation needs.
public string CreateTermList(string listName, string listDesc)
{
    var body = new Body(listName, listDesc);
    var list = client.ListManagementTermLists.Create("application/json", body);
    var listId = list.Id.Value.ToString();
    return listId;
}

// Adds the terms one at a time, pausing between calls.
// throttleRate is the pause in milliseconds.
public void AddToTermsList(string listId, List<string> terms, int throttleRate = 3000, string lang = "eng")
{
    foreach (var term in terms)
    {
        client.ListManagementTerm.AddTerm(listId, term, lang);
        Thread.Sleep(throttleRate);
    }
}

// Returns every term currently stored in the list.
public List<string> GetAllTerms(string listId, string lang = "eng")
{
    var termsData = client.ListManagementTerm.GetAllTerms(listId, lang).Data;
    List<string> termsList = new List<string>();
    foreach (var term in termsData.Terms)
    {
        termsList.Add(term.Term);
    }
    return termsList;
}

// Removes a single term from the list.
public void DeleteTerm(string listId, string term, string lang = "eng")
    => client.ListManagementTerm.DeleteTerm(listId, term, lang);

// Empties the list but keeps the list itself.
public void DeleteAllTerms(string listId, string lang = "eng")
    => client.ListManagementTerm.DeleteAllTerms(listId, lang);

// Deletes the list itself.
public void DeleteTermList(string listId)
    => client.ListManagementTermLists.Delete(listId);

#endregion Managing Lists
The complete console application is hosted on GitHub.
Now that we have seen how to create and manage a term list, the next step is to learn how to use it during text moderation. As mentioned in my previous blog post, the function we need to call to analyse the text via Azure is ContentModeratorClient.TextModeration.ScreenText. The only change we need to make is to pass the listId value to the ScreenText method. My modified method looks like this:
private Screen CallAzureModerator(string text, string listId)
{
    var textBytes = Encoding.UTF8.GetBytes(text);
    using var stream = new MemoryStream(textBytes);
    return client.TextModeration.ScreenText("text/plain", stream, language: "eng", autocorrect: false, pII: true, listId: listId, classify: true);
}
In the response we check the Terms field. If it is null, the given text contains no words from the specified list. If the text does contain terms from the list, the field holds an array of objects, one per match. Here is a serialized JSON version of a sample response.
[
    {
        "Index": 16,
        "ListId": 1,
        "OriginalIndex": -1,
        "Term": "Batman"
    }
]
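The null check described above can be sketched roughly as follows. The MatchedTerm record and the TermsCheck class are hypothetical stand-ins that mirror the fields of the sample response; they are not the SDK's own model types, and System.Text.Json is used here only to parse a serialized Terms array for illustration.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical stand-in for one entry of the Terms array. It mirrors the
// fields in the sample response and is not the SDK's own model class.
public record MatchedTerm(int Index, int ListId, int OriginalIndex, string Term);

public static class TermsCheck
{
    // True when at least one listed term was matched.
    // A null value means the text contained no words from the list.
    public static bool HasListedTerms(IReadOnlyList<MatchedTerm>? terms)
        => terms != null && terms.Count > 0;

    // Parses a serialized Terms array; the JSON literal "null" yields null.
    public static List<MatchedTerm>? Parse(string json)
        => JsonSerializer.Deserialize<List<MatchedTerm>>(json);
}
```

When the list is used as a blocklist, a comment would be rejected whenever HasListedTerms returns true; for an allowlist the decision would simply be inverted.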
Some of the cases in which these lists can be used are:
- As a blocklist, to block comments or other user-generated content (UGC) that contains certain words.
- To redact information from UGC. The response contains Index and Term (which gives us the term length), and these can be used to redact certain words from the content.
- To check content against a list of sensitive topics and flag it for manual review.
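The redaction case could look something like the sketch below. It assumes that Index is a zero-based character offset into the text that was sent for screening; the Redactor class and its Redact method are hypothetical helpers for illustration and are not part of the SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class Redactor
{
    // Overwrites each matched term with mask characters, using the match's
    // start index and the term's length. Assumes Index is a zero-based
    // offset into the screened text; matches outside the text are skipped.
    public static string Redact(
        string text,
        IEnumerable<(int Index, string Term)> matches,
        char mask = '*')
    {
        var buffer = new StringBuilder(text);
        foreach (var (index, term) in matches)
        {
            if (index < 0 || index + term.Length > buffer.Length)
            {
                continue;
            }
            for (int i = 0; i < term.Length; i++)
            {
                buffer[index + i] = mask;
            }
        }
        return buffer.ToString();
    }
}
```

For example, calling Redact on "My favourite is Batman, not Robin." with a single match at index 16 for the term "Batman" returns "My favourite is ******, not Robin.".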