Automated Social Tagging

A working proof-of-concept for a machine learning model that could be used to automatically classify Greenpeace's social media posts worldwide.

The Need for the Product

- Target audiences: global & NRO campaign analysts, GPI analysts

- Pain points of target audiences:

  • Difficulty filtering and extracting all Facebook posts belonging to a specific campaign

  • Difficulty understanding the performance of a social media campaign

  • Difficulty reporting without a workable dataset

  • Large amount of unstructured and uncategorised post data

  • Categorising posts involves a lot of manual work and is very time-consuming; this approach does not scale to our 55 Facebook pages.

Making use of new methods of data access: the Facebook data warehousing project (led by Mathias Schuh) has made Facebook data from all offices available in a single database.

Possible Solutions

While categorising Facebook posts manually is time-consuming, the need to understand social media performance by campaign remains. This need has also been strengthened by the development of logframes for campaign evaluation, as many campaigns choose social media metrics as part of their evaluation. GPI Comms Hubs had developed a tagging system for Facebook posts, but it was applied manually and covered only one NRO, thereby excluding the performance of other NRO channels.

Based on these considerations, any possible solution for tagging should be:

a) scalable to all NROs;

b) less reliant on manual work;

c) time-saving.

The Facebook data warehouse is built on Google services, so Google services were also considered for automating post tagging. After researching various options, Google’s AutoML Natural Language product was chosen for a proof of concept.

Creating a Prototype and Testing

Prototype

The Natural Language API discovers syntax, entities, and sentiment in text, and classifies text into a predefined set of categories. For the purpose of prototyping, in the case of Facebook posts those categories are tags identifying whether a post belongs to a certain campaign topic: for example, oceans, forests, or climate. The current scope of prototyping is English-language Facebook pages, as NLP capabilities are most advanced for this language. Other languages will be considered later in the process, depending on the results of testing.

Model of a functioning prototype (step-by-step)

  1. Step 1: FB post caption data is ingested from the database;

  2. Step 2: the trained machine learning model is applied via custom code obtained from Google’s AutoML;

  3. Step 3: the machine learning model predicts a category of campaigning and automatically adds a “tag” column to every post;

  4. Step 4: the tagged data is either saved in a new document or written back to the database.
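The four steps above can be sketched as a small pipeline. This is a hypothetical illustration only: `fetch_post_captions` and `predict_tag` are stand-ins for the real BigQuery query and the AutoML prediction call, which are not shown here.

```python
# Hypothetical sketch of the prototype pipeline (steps 1-4).
# Function names and the keyword rule inside predict_tag are
# placeholders, not the actual AutoML model.

def fetch_post_captions():
    """Step 1: ingest FB post caption data (stubbed with sample rows).
    A real implementation would query the BigQuery warehouse."""
    return [
        {"post_id": 1, "caption": "Save our oceans from plastic pollution"},
        {"post_id": 2, "caption": "Protect the Amazon rainforest"},
    ]

def predict_tag(caption):
    """Steps 2-3: stand-in for the trained model's prediction.
    A real implementation would call the AutoML prediction endpoint."""
    keywords = {"ocean": "oceans", "forest": "forests", "climate": "climate"}
    for keyword, tag in keywords.items():
        if keyword in caption.lower():
            return tag
    return "untagged"

def tag_posts(posts):
    """Step 3: add a 'tag' column to every post."""
    return [{**post, "tag": predict_tag(post["caption"])} for post in posts]

tagged = tag_posts(fetch_post_captions())
# Step 4: `tagged` would then be saved to a new document
# or written back to the database.
```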

Testing

Testing mainly consists of iteratively feeding and improving the training model to achieve a high precision metric for every category (>95%), while keeping a good balance with recall (>50%).

Several important metrics are used when evaluating the training model:

Score confidence threshold

The score threshold refers to the level of confidence the model must have to assign a category to a test item. Testing currently uses a confidence threshold of 0.9.
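The effect of the threshold can be illustrated as follows: a predicted category is assigned only when the model's confidence score meets the threshold, and lower-confidence posts are left untagged. The scores below are made up for illustration.

```python
# Illustration of applying a score confidence threshold.
# (category, score) pairs are hypothetical model outputs, one per post.

CONFIDENCE_THRESHOLD = 0.9

def apply_threshold(predictions, threshold=CONFIDENCE_THRESHOLD):
    """Return the category for each post, or None if the model's
    confidence is below the threshold (post stays untagged)."""
    return [category if score >= threshold else None
            for category, score in predictions]

scores = [("oceans", 0.97), ("forests", 0.62), ("climate", 0.91)]
print(apply_threshold(scores))  # only high-confidence tags survive
```

Raising the threshold trades recall for precision: fewer posts get tagged, but those that do are more likely to be correct.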

Precision

Precision tells us, of all the test examples that were assigned a label, how many were actually supposed to be categorised with that label. A high-precision model is likely to label only the most relevant examples, which is useful for cases where your category is common in the training data. To be able to test with a 0.9 confidence threshold, the model is optimised for precision rather than recall.

Recall

Recall tells us, from all the test examples that should have had the label assigned, how many were actually assigned the label. A high-recall model is likely to label marginally relevant examples, which is useful for cases where your category has scarce training data.
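The precision and recall targets (>95% and >50% per category) can be checked from test-set predictions as below. The `predicted` and `actual` lists are hypothetical; `None` means the model abstained because its confidence fell below the threshold.

```python
# Per-category precision and recall from test-set predictions.
# predicted/actual are hypothetical tag lists, aligned per test post.

def precision_recall(predicted, actual, category):
    """Compute (precision, recall) for one category."""
    true_positives = sum(1 for p, a in zip(predicted, actual)
                         if p == category and a == category)
    predicted_positives = sum(1 for p in predicted if p == category)
    actual_positives = sum(1 for a in actual if a == category)
    precision = true_positives / predicted_positives if predicted_positives else 0.0
    recall = true_positives / actual_positives if actual_positives else 0.0
    return precision, recall

predicted = ["oceans", "oceans", None, "forests", None]
actual    = ["oceans", "oceans", "oceans", "forests", "oceans"]
p, r = precision_recall(predicted, actual, "oceans")
# precision = 2/2 = 1.0 (every assigned "oceans" tag was correct)
# recall    = 2/4 = 0.5 (half of the true "oceans" posts were tagged)
```

This pairing shows why the targets are asymmetric: a high threshold keeps precision near 1.0 at the cost of leaving some true posts untagged.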

The next level of testing will explore the model’s connection with the database: data ingestion and writing results back.

When is the testing finalised?

The testing will be finalised once the trained model meets the above-mentioned requirements and is connected to the active Facebook database in BigQuery. If all the steps in the model of the functioning prototype can be implemented, the first testing phase will end.

Documented feedback

After the initial testing, open communication will follow to gather feedback and collect users’ experiences. Depending on the direction given and the feedback collected, the project may either be scaled to more languages and categories, or be discontinued.
