QZ qz thoughts
a blog from Eli the Bearded

Bot Traffic, Again


One of the annoying things that happened the last time this blog was in active use was getting hammered by a rogue bot. It has happened again.

blog hits from 12am March 1st to 2pm March 9th: 35121
blog hits in that time not from bots: 528
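
To put those two numbers in proportion, here is a small arithmetic sketch. It uses only the two totals quoted above; the variable names are made up for illustration.

```python
# Totals taken from the two lines above.
total_hits = 35121    # all blog hits, 12am March 1st to 2pm March 9th
non_bot_hits = 528    # hits in that window not from bots

# Everything that is not a non-bot hit is counted as a bot hit.
bot_hits = total_hits - non_bot_hits
bot_share = bot_hits / total_hits

print(f"bot hits:  {bot_hits}")
print(f"bot share: {bot_share:.1%}")
```

In other words, roughly 98.5% of the traffic in that window came from bots.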

Hits by bot:

count User-Agent
27543 "Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)"
4998 "Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)"
1001 "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
449 "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
216 "Mozilla/5.0 (compatible; AhrefsBot/6.1; +http://ahrefs.com/robot/)"
114 "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.92 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
110 "istellabot/t.1.13"
74 "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65 "Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, help@moz.com)"
37 "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
34 "Mozilla/5.0 (compatible; SemrushBot/1.0~bm; +http://www.semrush.com/bot.html)"
32 "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.75 Safari/537.36 (compatible; SMTBot/1.0; +http://www.similartech.com/smtbot)"
22 "Mozilla/5.0 (compatible; YandexImages/3.0; +http://yandex.com/bots)"
16 "PHP-Curl-Class/8.0.1 (+https://github.com/php-curl-class/php-curl-class) PHP/7.0.33-0ubuntu0.16.04.12 curl/7.47.0"
16 "Mozilla/5.0 (compatible; SemrushBot/6~bl; +http://www.semrush.com/bot.html)"
16 "Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11A465 Safari/9537.53 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
16 "SearchAtlas.com SEO Crawler"
13 "Mozilla/5.0 (compatible; Linux x86_64; Mail.RU_Bot/2.0; +http://go.mail.ru/help/robots)"
12 "Mozilla/5.0 (compatible; Linespider/1.1; +https://lin.ee/4dwXkTH)"
11 "Jigsaw/2.3.0 W3C_CSS_Validator_JFouffa/2.0 (See <http://validator.w3.org/services>)"
10 "Validator.nu/LV http://validator.w3.org/services"
10 "Mozilla/5.0 (Linux; Android 7.0;) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; AspiegelBot)"
10 "Mozilla/5.0 (compatible;Linespider/1.1;+https://lin.ee/4dwXkTH)"
9 "Mozilla/5.0 (compatible; SEOkicks; +https://www.seokicks.de/robot.html)"
7 "Googlebot-Image/1.0"
6 "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106"
4 "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534+ (KHTML, like Gecko) BingPreview/1.0b"
4 "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"
2 "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Applebot/0.1; +http://www.apple.com/go/applebot)"
2 "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebit/53.7.36 (KHTML, like Gecko) Chrome/63.0.3239.0 Safari/537.36 (compatible; Linespider/1.1; +https://lin.ee/4dwXkTH)"
2 "Mozilla/5.0 (compatible; Pinterestbot/1.0; +http://www.pinterest.com/bot.html)"
2 "Mozilla/5.0 (compatible;AspiegelBot)"
2 "Mozilla/5.0 (iPhone; CPU iPhone OS 8_1 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B411 Safari/600.1.4 (compatible; YandexMobileBot/3.0; +http://yandex.com/bots)"
2 "ltx71 - (http://ltx71.com/)"
1 "W3C_Validator/1.3 http://validator.w3.org/services"
1 "DomainStatsBot/1.0 (https://domainstats.com/pages/our-bot)"
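
A tally like the one above can be produced in a few lines. This is a minimal sketch, not necessarily how the numbers in this post were generated: it assumes the web server writes the common "combined" log format, where the User-Agent is the last double-quoted field on each line. The function name and the sample log lines are invented for illustration.

```python
import re
from collections import Counter

# Assumption: combined log format, where the User-Agent is the last
# double-quoted field on each line.
LAST_QUOTED_FIELD = re.compile(r'"([^"]*)"\s*$')


def count_user_agents(lines):
    """Return a Counter mapping User-Agent string -> number of hits."""
    counts = Counter()
    for line in lines:
        match = LAST_QUOTED_FIELD.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up sample lines (documentation IP addresses, invented timestamps).
SAMPLE_LOG = [
    '192.0.2.1 - - [01/Mar/2020:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)"',
    '192.0.2.2 - - [01/Mar/2020:00:00:05 +0000] "GET /a.png HTTP/1.1" 200 90 "-" '
    '"Googlebot-Image/1.0"',
    '192.0.2.1 - - [01/Mar/2020:00:00:09 +0000] "GET /b HTTP/1.1" 200 734 "-" '
    '"Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)"',
]

counts = count_user_agents(SAMPLE_LOG)

# Print in the same 'count "User-Agent"' layout as the table above,
# most frequent first.
for agent, hits in counts.most_common():
    print(f'{hits} "{agent}"')
```

Separating bots from non-bots is a further step that this sketch leaves out; any rule for it (matching "bot", "spider", "crawler" and so on) would be a heuristic.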

Hits by non-bots: 38 unique User-Agents (across ~500 hits)

One user agent really stands out. And one other is suspicious. I'm talking about the two that hit my site more than world-famous Google.

I don't know everything MJ12bot does, but I do know one thing it does is power paid access to "incoming" links reports via "Majestic Site Explorer": "Access raw exports from £79.99 a month". So let me get this straight: you crawl sites to sell people lists of who links to them? Why should I waste my bandwidth giving you pages?

But clearly it is Megaindex that is abusive. At the .com version of the site I read "MegaIndex is a powerful and versatile competitive intelligence suite for online marketing, from SEO and PPC to social media and advertising research." Again, this is a bullshit use of my resources (bandwidth, web server CPU) for some commercial enterprise that cannot benefit me.

So: another new plugin is born, browser_block. Goodbye Megaindex. Goodbye Majestic.
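
The idea behind blocking by User-Agent can be sketched briefly. This is not the browser_block plugin itself, whose code is not shown here; it is a hypothetical illustration written as Python WSGI middleware. The names `BLOCKED_AGENT_SUBSTRINGS`, `is_blocked`, `block_agents`, and `hello_app` are all made up, and the two substrings are taken from the User-Agent strings in the table above.

```python
# Hypothetical blocklist: lowercase substrings of the two User-Agents
# singled out in this post.
BLOCKED_AGENT_SUBSTRINGS = ("megaindex", "mj12bot")


def is_blocked(user_agent):
    """True if the User-Agent contains any blocked substring (case-insensitive)."""
    agent = (user_agent or "").lower()
    return any(fragment in agent for fragment in BLOCKED_AGENT_SUBSTRINGS)


def block_agents(app):
    """Wrap a WSGI app so that blocked User-Agents get a 403 instead of a page."""
    def middleware(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT", "")):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everyone else is passed through to the wrapped application.
        return app(environ, start_response)
    return middleware


def hello_app(environ, start_response):
    """Stand-in for the real site, used only to demonstrate the wrapper."""
    body = b"hello\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


app = block_agents(hello_app)
```

Matching on a substring rather than the full string means a version bump in the crawler's User-Agent does not quietly unblock it. A bot that lies about its User-Agent would still get through, so this only stops crawlers that identify themselves.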