Creating & Managing custom blocklists in the Azure Content Moderator service

Apr 10, 2021 by Kolappan N

In my previous blog post, I explained how to use the Azure Content Moderator service to identify personal information and explicit content in a given text, and how to use it for moderating comments. In this post we will look at another feature of the Azure Content Moderator service: term lists.

The fact that the feature is called a term list and not a blocklist is intentional. The service checks whether the text we send contains any words from the specified list and returns the result. Whether we use it as a blocklist or an allowlist is completely up to us.
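To make the blocklist/allowlist distinction concrete, here is a minimal sketch of how the same match result could be read either way. The TermListMode enum and TermListPolicy class are hypothetical helpers written for this illustration and are not part of the Content Moderator SDK. The allowlist rule used here (accept only text that mentions a listed term) is just one possible interpretation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper types for illustration; not part of the SDK.
public enum TermListMode { Blocklist, Allowlist }

public static class TermListPolicy
{
    // matchedTerms: the terms the service reported as found in the text
    // (null or empty when nothing from the list was found).
    public static bool IsAccepted(IReadOnlyCollection<string> matchedTerms, TermListMode mode)
    {
        bool hasMatch = matchedTerms != null && matchedTerms.Count > 0;

        // Blocklist: any match rejects the text.
        // Allowlist (as assumed here): the text is accepted only if it mentions a listed term.
        return mode == TermListMode.Blocklist ? !hasMatch : hasMatch;
    }
}
```

For example, TermListPolicy.IsAccepted(new[] { "Batman" }, TermListMode.Blocklist) returns false, while the same match in TermListMode.Allowlist returns true.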

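The lists are fully manageable through the .NET SDK. This includes operations such as creating a list, adding terms to it, retrieving all of its terms, deleting a single term or all terms, and deleting the list itself.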

Once you create a list, you get an ID for that list, which is then used for all other operations.

Here is sample code from my POC console application that performs these operations:

#region Managing Lists

public string CreateTermList(string listName, string listDesc)
{
    var body = new Body(listName, listDesc);
    var list = client.ListManagementTermLists.Create("application/json", body);
    var listId = list.Id.Value.ToString();
    return listId;
}

public void AddToTermsList(string listId, List<string> terms, int throttleRate = 3000, string lang = "eng")
{
    foreach (var term in terms)
    {
        client.ListManagementTerm.AddTerm(listId, term, lang);
        // Pause between calls (throttleRate is in milliseconds) to avoid hitting the service's rate limit.
        Thread.Sleep(throttleRate);
    }
}

public List<string> GetAllTerms(string listId, string lang = "eng")
{
    var termsData = client.ListManagementTerm.GetAllTerms(listId, lang).Data;
    List<string> termsList = new List<string>();
    foreach (var term in termsData.Terms)
    {
        termsList.Add(term.Term);
    }
    return termsList;
}

public void DeleteTerm(string listId, string term, string lang = "eng")
    => client.ListManagementTerm.DeleteTerm(listId, term, lang);

public void DeleteAllTerms(string listId, string lang = "eng")
    => client.ListManagementTerm.DeleteAllTerms(listId, lang);

public void DeleteTermList(string listId)
    => client.ListManagementTermLists.Delete(listId);

#endregion Managing Lists

The complete console application is hosted on GitHub.

Now that we have seen how to create and manage a term list, the next step is to learn how to use it during text moderation. As mentioned in my previous blog post, the method we call to analyse text via Azure is ContentModeratorClient.TextModeration.ScreenText. The only change needed is to pass the listId value to the ScreenText method. My modified method looks like this:

private Screen CallAzureModerator(string text, string listId)
{
    var textBytes = Encoding.UTF8.GetBytes(text);
    using var stream = new MemoryStream(textBytes);
    return client.TextModeration.ScreenText("text/plain", stream, language: "eng", autocorrect: false, pII: true, listId:listId, classify: true);
}

In the response, we check the Terms field. If it is null, the given text contains no words from the specified list. If the text contains any terms from the list, the field holds an array of objects. Here is the serialized JSON for the Terms field of a sample response.

[
    {
        "Index": 16,
        "ListId": 1,
        "OriginalIndex": -1,
        "Term": "Batman"
    }
]
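The null check described above can be sketched against the serialized form without referencing the SDK at all. MatchedTerm and TermMatchReader are hypothetical names used only for this illustration. The class simply mirrors the four fields in the sample, and the SDK has its own model type for detected terms.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical stand-in that mirrors the fields of the sample Terms entry.
public sealed class MatchedTerm
{
    public int Index { get; set; }
    public int ListId { get; set; }
    public int OriginalIndex { get; set; }
    public string Term { get; set; } = "";
}

public static class TermMatchReader
{
    // Returns the matched terms, or an empty list when the Terms value is null.
    public static List<string> ReadTerms(string termsJson)
    {
        var matches = JsonSerializer.Deserialize<List<MatchedTerm>>(termsJson);
        var terms = new List<string>();

        // A JSON null means no word from the list was found in the text.
        if (matches == null) return terms;

        foreach (var match in matches)
        {
            terms.Add(match.Term);
        }
        return terms;
    }
}
```

Passing an array containing the Batman entry shown above yields a list with the single item "Batman", and passing the string "null" yields an empty list. Returning an empty list instead of null keeps the calling code free of repeated null checks.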

Some of the cases in which these lists can be used are