Exclude Spam Bot Hits

In 2020, some Planet 4 websites were hit by a high number of searches generated by a Korean spam bot. This inflated traffic and distorted Google Analytics reports for a few offices.

The solution below was proposed by Greenpeace Nordic and aims to exclude the Korean bot hits from being recorded in Google Analytics. This setup can also be adapted for similar issues in the future.

The solution is based on three main steps:

  • Send a fake page view when traffic from a spam bot is identified

  • Set a specific value for a user-level custom dimension

  • Exclude hits that match the value used in the custom dimension from Google Analytics views

Instructions:

Setup in Google Analytics

Prerequisites: You’ll need a user-scoped custom dimension in your Google Analytics property. In the Global Property, we are using the existing custom dimension no. 1, which is called “Internal Traffic”.

1) In your Google Analytics account, navigate to Admin > View > Filters. Create a custom filter called "Exclude Spam Bot". Set the filter type to Custom, choose Exclude, and select the user-scoped custom dimension as the filter field. The filter pattern should match the value spam.
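The effect of this exclude filter can be pictured with a small Python sketch. It is only an illustration and not how Google Analytics works internally: the `Hit` class, the `exclude_spam` helper and the sample hits are hypothetical, and dimension index 1 is assumed because that is the index used in the Global Property.

```python
import re
from dataclasses import dataclass, field


# Hypothetical, simplified model of a hit; this is not a Google Analytics API.
@dataclass
class Hit:
    page: str
    custom_dimensions: dict = field(default_factory=dict)  # index -> value


SPAM_DIMENSION_INDEX = 1               # assumed: "Internal Traffic" dimension
FILTER_PATTERN = re.compile("spam")    # same value as the filter pattern


def exclude_spam(hits):
    """Keep only hits whose custom dimension value does not match the pattern."""
    return [
        h for h in hits
        if not FILTER_PATTERN.search(h.custom_dimensions.get(SPAM_DIMENSION_INDEX, ""))
    ]


hits = [
    Hit("/search/?s=climate"),
    Hit("/visit-from-spam-bot", {1: "spam"}),
]
kept = exclude_spam(hits)
print([h.page for h in kept])
```

Only the hit without the spam value remains, which is the behaviour the view filter is meant to produce.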

Setup in Google Tag Manager:

You can download the recipe for Google Tag Manager here if you want to skip a few steps. Import the .json file in a new container and choose the method merge. Don't forget to review the settings in the virtual pageview tag (see below).

1) Create a variable to extract the search term from the URL query. In your Google Tag Manager container, go to Variables > User-defined variables > New and select the URL type. Set the component type to Query and the Query Key to s.

2) Create a trigger to match the search term patterns. We observed that the Korean bot used special characters in its search queries, which is uncommon for normal users. The solution is therefore to create a trigger that matches the regex pattern below:
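To illustrate what this variable extracts, here is a minimal Python sketch that reads the `s` query parameter from a URL. It roughly mirrors the idea of the URL variable and is not Tag Manager code; the `search_term` helper and the example.org URLs are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs


def search_term(url: str) -> str:
    """Return the decoded value of the `s` query parameter, or "" if absent."""
    query = urlsplit(url).query
    values = parse_qs(query).get("s", [])
    return values[0] if values else ""


# Hypothetical URLs, for illustration only.
print(search_term("https://www.example.org/?s=climate+change"))
print(search_term("https://www.example.org/?s=%E2%98%85shop.com"))
```

The second URL decodes to a term that starts with a star character, the kind of query the trigger in the next step is meant to catch.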

\♥|\❥|\★|\■|\♬|\▶|\➡|\◤|\◐|\㋛|\◈|\❣|\●|\☆|\☼|\♤|\♡|\▷|\△|\▽|\↘|\[|\。|\.|\{|\≪|\⊀|\『|\┖|\【|\《|\〘|\.com|\.net|\.me|\.shop

Set the trigger type to Page View and have it fire when the URL query variable for s (the variable you created in the previous step) matches RegEx the pattern above. See the preview below:

3) Create a fake (virtual) pageview tag called “UA - VPV - Search Spam”

  • Create a new Universal Analytics tag for Page Views

  • Connect to your Google Analytics settings

  • Enable overriding settings in this tag

  • Define the field name page as /visit-from-spam-bot

  • Define the field name location as https://{{Page Hostname}}{{Page Path}}visit-from-spam-bot

  • Select the same custom dimension that you used to create the filter in Google Analytics. We need to store the spam value under the exact same index.

See example below:
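As a rough illustration of the values this tag sends, the Python sketch below builds the overridden fields using Universal Analytics Measurement Protocol parameter names (`t` for hit type, `dp` for page, `dl` for location, `cd<index>` for a custom dimension). It is not the original tag configuration: the `spam_pageview_fields` helper and the example hostname are hypothetical, and dimension index 1 is assumed from the Global Property.

```python
def spam_pageview_fields(hostname: str, page_path: str, dimension_index: int = 1) -> dict:
    """Build the overridden fields of the virtual pageview (illustration only)."""
    return {
        "t": "pageview",                                            # hit type
        "dp": "/visit-from-spam-bot",                               # page field
        "dl": f"https://{hostname}{page_path}visit-from-spam-bot",  # location field
        f"cd{dimension_index}": "spam",                             # custom dimension value
    }


# Hypothetical hostname and path, standing in for {{Page Hostname}} and {{Page Path}}.
fields = spam_pageview_fields("www.example.org", "/")
print(fields["dl"])
```

The custom dimension key must use the same index as the filter in Google Analytics, otherwise the exclude filter has nothing to match.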

4) Debug

One way to debug this setup is to use Tag Manager's preview mode on the search page and run a search query containing any of the special characters above. The VPV tag should then fire.
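Sample search terms can also be checked offline before testing in preview mode. The Python sketch below is a minimal approximation under stated assumptions: it uses only a subset of the characters from the trigger pattern, treats each token as a literal, and the `is_spam` helper and sample terms are hypothetical. It does not replace testing the real trigger in Tag Manager.

```python
import re

# Subset of the tokens from the trigger pattern, for illustration only.
# The literal "." also covers domain-like terms such as ".com" or ".shop".
SPAM_TOKENS = ["♥", "★", "■", "▶", "【", "。", "."]

# re.escape makes every token match literally.
SPAM_PATTERN = re.compile("|".join(re.escape(token) for token in SPAM_TOKENS))


def is_spam(term: str) -> bool:
    """Return True if the search term contains any of the spam tokens."""
    return SPAM_PATTERN.search(term) is not None


# Hypothetical search terms.
for term in ["climate change", "★best shop★", "cheap-deals.shop", "welcome"]:
    print(term, is_spam(term))
```

Terms flagged here are the ones that should make the VPV tag fire in preview mode; ordinary terms such as "climate change" should not.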

Monitoring:

Add this custom report to your Google Analytics view to monitor this solution. It only includes searches that match the same RegEx rule used above. If everything is working, the number of searches from the bot should decrease.

You can also monitor it against a raw view in Google Analytics.
