Reporting is one of the key aspects of an inbound-focused company, and good data is the key to reporting. The other day I was checking New Breed’s Google Analytics and I saw an interesting language on the dashboard. Where I'd normally see something like en-us (English) or fr (French), this time our 45th President’s name caught my eye...
That was under the Language segment of the Audience Overview. I don’t know about you, but I don’t want to learn that language! I immediately decided I needed to investigate further. I then noticed it carried a small percentage of our visitors and knew it had to be a spam bot.
I did a little more digging and discovered it wasn’t the only spam traffic bot we were getting hit by. I also ran a quick check of my clients' portals and realized we weren’t the only people affected. I decided it was high time to clean out our Google Analytics. Here's how I went about it, along with a complete list of the spam sources I found along the way. Let's get started!
The first step
Note: Before you make any major adjustments to your Google Analytics dashboard, be sure to test out each filter or adjustment on a test view before implementing across your entire dashboard.
To start a new test view, hop into the Admin > View Settings of your GA dashboard and click Create New View under your other views. Once you have the new test view you’ll be good to start test blocking that pesky spam traffic.
Now, there are a lot of different methods for blocking spam traffic bots from messing with your data, but this is by far the simplest option. First off, let Google do most of the work for you. Head over to the Admin section of your Google Analytics dashboard and open View Settings. Scroll down to Bot Filtering and check that box (highlighted in green below) if it isn't checked already.
This will exclude all hits from all of the bots and spiders that Google is aware of and remove those hits from appearing in your dashboard going forward. Google is constantly updating those filtered bots and spiders so the major ones should be taken care of. Now it’s time to analyze our current data sets and start to identify other major spam sources.
What is spam traffic and how can I identify it?
Spam traffic will mess with your data by sending fake hits, which show up in your reports as fake traffic, fake events, fake page views and fake transactions. This spam traffic is sent by spambots that sketchy individuals create to attempt hacks into websites and steal a company’s information, so this is a serious problem we need to eliminate. Let's first find where this spam traffic is affecting your data.
There are a few common ways to tell the fake traffic from the real traffic. I always start with the Referrals report under the Acquisition section of your Google Analytics dashboard. Change the date range to the last 3 months and sort in descending order. Look for referrals with a 100% or 0% bounce rate and 10 or more sessions; those usually indicate spam traffic.
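If you export the Referrals report, the rule of thumb above can be sketched in a few lines of Python. This is only an illustration, not something Google Analytics provides: the ReferralRow type, the flag_suspicious function and the sample sources (all on the reserved .example domain) are made up for this sketch, and I'm assuming the bounce rate is expressed as a percentage.

```python
from dataclasses import dataclass


@dataclass
class ReferralRow:
    """One row of a hypothetical Referrals export."""
    source: str
    sessions: int
    bounce_rate: float  # assumed to be a percentage from 0 to 100


def flag_suspicious(rows, min_sessions=10):
    """Return sources with at least min_sessions and a bounce rate of exactly 0% or 100%."""
    return [
        row.source
        for row in rows
        if row.sessions >= min_sessions and row.bounce_rate in (0.0, 100.0)
    ]


# Made-up sample data for illustration only.
rows = [
    ReferralRow("example-partner.example", 42, 37.5),
    ReferralRow("free-traffic.example", 25, 100.0),
    ReferralRow("seo-offers.example", 12, 0.0),
    ReferralRow("tiny-blog.example", 3, 100.0),  # too few sessions to flag
]

suspects = flag_suspicious(rows)
print(suspects)
```

Anything this flags is only a candidate. You still want to eyeball the names before adding them to your list, since a small legitimate referrer can also land on 0% or 100%.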
Also, just reading some of the names makes it pretty easy to figure out. Vice Motherboard referring traffic to a Vermont-based inbound marketing agency? I wish we were that cool! Document all of the sketchy sources you see. If you want to confirm that a site really isn't legit you can visit the link, but make sure you have security protection in place before visiting these sites. Put each confirmed site in an empty document for you to reference later.
Next, head to Audience > Technology > Network > Hostname. This shows you the hostnames your hits are being recorded against. Ideally you should only see your own hostnames, meaning your website and any other places where you’ve configured your GA tracking ID. If you see hostnames such as www.foxnews.com and lifehacker.com and you don’t work for these organizations, you can be pretty sure they’re spam sources.
The ones you need to watch out for are the Google-affiliated hostnames. If you see Googleweblight with your hostname in front, that’s a good one! It's a service from Google that helps serve your page on mobile networks in some parts of the world. However, translate.googleusercontent.com is used by spammers to bypass people’s filters, so be wary of that one. When documenting these hostnames, put a vertical bar ( | ) between them, with no spaces.
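Here's a rough sketch of how that bar-separated list can be put together and sanity-checked before you paste it anywhere. The function names are hypothetical, the two hostnames are the ones I use as an example later in this post, and escaping the dots is my own assumption (the pattern is treated as a regular expression, where an unescaped dot matches any character). Python's regex engine is only a stand-in here and may not behave identically to the one in Google Analytics.

```python
import re


def build_hostname_pattern(hostnames):
    """Join hostnames with '|' (no spaces), escaping dots so they match literally."""
    return "|".join(h.strip().replace(".", r"\.") for h in hostnames)


def is_included(hostname, pattern):
    """Rough local check: would a pattern like this keep the given hostname?"""
    return re.search(pattern, hostname) is not None


# Swap in the hostnames you actually own.
valid_hostnames = ["www.newbreedmarketing.com", "newbreedmarketing.com"]
pattern = build_hostname_pattern(valid_hostnames)

print(pattern)
print(is_included("www.newbreedmarketing.com", pattern))  # a hostname we own
print(is_included("www.foxnews.com", pattern))            # a hostname we don't own
```

Running a few known-good and known-bad hostnames through a check like this is a cheap way to catch a stray space or a missing bar.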
How to block spam traffic
An easy way to block these from messing with your reports is to create a filter that only includes the domains you trust and consider to be valid. Head to Admin > Filters > Create new Filter. Name it hostnames or something similar so it can easily be identified later on. Set the filter type to Custom, choose Include with Hostname as the filter field, and then input your relevant hostnames in the Filter Pattern. Always verify a filter before you implement it so that you can make sure it actually works. If it gives you a prompt like this:
"This filter would not have changed your data. Either the filter configuration is incorrect, or the set of sampled data is too small."
Then you know you've probably input the filter incorrectly and you’ll have to adjust it. Once your specific hostnames are the only ones being included, the spam sources will automatically be filtered out, giving you much cleaner data collection going forward.
Next we’re going to take that list of spam referrers you wrote down earlier and put it into an Advanced Segment so you can explore historical data without that pesky spam traffic messing up your decision-making. Add a new segment within your GA dashboard, label it something along the lines of All Sessions (No Spam) and start with a filter on sessions that includes Hostname with matches regex. Then add all of the valid hostnames that your domain uses.
For example, I added www.newbreedmarketing.com, newbreedmarketing.com, and our test sites for new HubSpot websites. Be sure to include all of the hostnames affiliated with your domain, including ecommerce apps, Content Delivery Networks (CDN), and any Google products you are connected to such as YouTube.
Now add a second filter on Source, set it to exclude, and add all of the spam sources you jotted down earlier. As always, preview those changes to make sure you’re doing it correctly, then once everything looks good, save the segment. Whenever you look back at historical data, use this segment; it hides the spam traffic and helps you make better data-driven decisions.
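To make the logic of the No Spam segment concrete, here's a small Python sketch of what it is meant to do: keep sessions that landed on one of your valid hostnames and drop any whose source is on your spam list. The session dictionaries, the no_spam_sessions function and the .example spam sources are all made up for illustration. This is not how Google Analytics evaluates segments internally, just the same idea applied to sample data.

```python
import re


def no_spam_sessions(sessions, hostname_pattern, spam_source_pattern):
    """Keep sessions on a valid hostname whose source does not match the spam list."""
    host_re = re.compile(hostname_pattern)
    spam_re = re.compile(spam_source_pattern)
    return [
        s for s in sessions
        if host_re.search(s["hostname"]) and not spam_re.search(s["source"])
    ]


# Valid hostnames from the example in this post; spam sources are hypothetical.
hostname_pattern = r"www\.newbreedmarketing\.com|newbreedmarketing\.com"
spam_source_pattern = r"free-traffic\.example|seo-offers\.example"

# Made-up sessions for illustration only.
sessions = [
    {"hostname": "www.newbreedmarketing.com", "source": "google"},
    {"hostname": "www.newbreedmarketing.com", "source": "free-traffic.example"},
    {"hostname": "www.foxnews.com", "source": "(direct)"},
    {"hostname": "newbreedmarketing.com", "source": "seo-offers.example"},
]

clean = no_spam_sessions(sessions, hostname_pattern, spam_source_pattern)
print([s["source"] for s in clean])
```

Only the first session survives here: the second and fourth come from listed spam sources, and the third was recorded against a hostname that isn't ours.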
Below is the complete list of common spam sources I found while cleaning out our GA. Copy each line individually and add them into your Advanced Segment. And be sure to add a filter (as per above) so you can look at historical data without the worry of spam traffic affecting your reporting.
So now you know how to remove spam traffic, block it from coming back, and hide historical spam traffic from your reports in Google Analytics. If you have any other methods of blocking Google Analytics spam traffic or other reporting tips for GA, feel free to drop us a comment or hit us up on our social media channels!