QZ qz thoughts
a blog from Eli the Bearded

Post Filter

So I started working on a method to compose posts better, for my idea of better at least. Version one of qzpostfilt (and a README and some test code) is available in the git repo and browsable here.

It's a mishmash of Markdown and nroff/troff style commands. I picked the ways of composing posts that I thought would be easiest to remember and to type on a phone keyboard. Basically you have Markdown style inline formatting for bold, italic, and code, and *roff style formatting for more block level stuff. As a general rule: .foo creates <foo> and you need to explicitly close it with ./foo.

[Screenshot: my xterm composing this post]

It ends up looking more like *roff than markdown.
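
As a rough illustration of the .foo / ./foo rule, here is a sed filter for the block-level commands. It is a sketch only, not the actual qzpostfilt code: the function name roff_blocks_to_html is made up here, and the Markdown style inline formatting is left out entirely.

```shell
# Hypothetical sketch, not the real qzpostfilt.
# A line holding only "./foo" becomes "</foo>",
# and a line holding only ".foo" becomes "<foo>".
# Everything else passes through untouched.
roff_blocks_to_html() {
  sed -E \
    -e 's#^\./([A-Za-z][A-Za-z0-9]*)[[:space:]]*$#</\1>#' \
    -e 's#^\.([A-Za-z][A-Za-z0-9]*)[[:space:]]*$#<\1>#'
}
```

Run as a filter (for example from vi, or with a draft file on standard input), it would turn a line reading .blockquote into <blockquote> and a line reading ./blockquote into </blockquote>.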

I see this as the first step towards a composing tool. I'll also need a CGI wrapper for phone use and a command line wrapper to help with tags. As of today, it's more of a :% !qzpostfilt in vi sort of thing.

Also in blosxom blog news, I've changed the html flavor templates slightly and made changes to the CSS file mostly to better support reading on small screens, but also for a <ul> class to use in recipe posts.

Lastly, I added another two dozen logos to the mix. Previously there had been 146, so it's up to 170 now. I don't recall how I created the first logos. This time my method was to type "QZ" in LibreOffice, change the font size to 180, and then go through the fonts I have installed, screenshotting all the interesting ones. Next I very roughly cropped the images so that each QZ was alone on a white background. From there I started scripting the work.

# for every png file, convert to pnm (ppm, RGB color, for these screenshots),
# auto-crop a white border, convert to pgm (grayscale), rescale so ysize
# (height) is 100 pixels, convert to png making white (and only exactly
# white) transparent, saving that result in the "new" directory
for f in *.png ; do pngtopnm "$f" | pnmcrop -white | ppmtopgm |
   pnmscale -ysize 100 | pnmtopng -transparent =white > "new/$f" ; done

(That's something I need to add to qzpostfilt: a <pre> handler. Todo. Hand fix for now.)

A second, messier, pass with identify got me the files renamed to look like "linux-biolinum-keys-h100-w205.png" instead of "linux-biolinum-keys.png". My randomlogo plugin uses the height and width information when available.
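
The renaming pass isn't shown, so here is a guess at what a tidier version could look like. It assumes ImageMagick's identify is the one in use, with its %h and %w format escapes giving height and width. The function names sized_name and rename_with_size are invented for this sketch.

```shell
# Build "name-hHEIGHT-wWIDTH.png" from a file name plus its dimensions.
# $1 = original file name, $2 = height, $3 = width
sized_name() {
  printf '%s-h%s-w%s.png\n' "${1%.png}" "$2" "$3"
}

# Rename every PNG in the current directory (assumes ImageMagick's identify).
rename_with_size() {
  for f in *.png; do
    [ -e "$f" ] || continue                  # glob matched nothing
    case $f in
      *-h[0-9]*-w[0-9]*.png) continue ;;     # looks already renamed
    esac
    dims=$(identify -format '%h %w' "$f") || continue
    height=${dims%% *}
    width=${dims##* }
    mv -- "$f" "$(sized_name "$f" "$height" "$width")"
  done
}
```

Calling rename_with_size inside the "new" directory would do the whole batch in one go. Skipping names that already carry a size keeps a second run from renaming them twice.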

Tags Plugin, first version

Several things have become clear writing this plugin.

  1. Some sort of method to standardize tags will be helpful. Did I use "plugin" or "plugins" last time?
  2. Getting tags working and getting tags complete are different tasks. I have tags working now; complete will include non-ASCII tags working (currently they display, but won't search) and more advanced searches like TAG or TAG, TAG and TAG, TAG without TAG, and no tags at all. Even if the automatically generated links won't include those, I'll want those searches for my own use.
  3. The limits of the default interpolation become more obvious. I can set a string to be included, but I can't have a template that includes a loop over an array, to e.g. have the top twenty tags listed in desktop view but only five in mobile view.

I don't plan to fix interpolation any time soon, but the other two I can see in my near future. I will probably code up a post composer of some sort to help standardize tags. With that I can foresee also making the post composer do preliminary HTML formatting from a simple markup language, and hooking the composer up to a cellphone friendly CGI page.

Fixing the tags plugin to move beyond "MVP" — minimum viable product — particularly for non-ASCII tags will also be a relatively high priority. Right now tags can be (roughly) /^[a-zA-Z0-9_-]+$/, with the caveat that I block leading hyphens (will probably want that available for tag negation searches) and I allow whitespace in tags, but map that to _ (underscore) for searches, so there will be no way to distinguish between this_tag and this tag.
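
To make those rules concrete, here is a small shell sketch of the check. It is not the plugin's code, and the function names tag_search_form and tag_is_valid are invented. It maps spaces to underscores first and then applies the rough pattern above, with the leading hyphen blocked.

```shell
# Map a tag to its search form: each space becomes "_".
tag_search_form() {
  printf '%s\n' "$1" | tr ' ' '_'
}

# Succeed only if the search form fits the rough rule:
# ASCII letters, digits, "_" and "-", with no leading hyphen.
# LC_ALL=C keeps the ranges ASCII-only, so non-ASCII tags fail.
tag_is_valid() {
  tag_search_form "$1" |
    LC_ALL=C grep -Eq '^[a-zA-Z0-9_][a-zA-Z0-9_-]*$'
}
```

Note that tag_search_form gives the same result for "this tag" and "this_tag", which is exactly the ambiguity described above.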

But here it is: aaa_tags.

First off, tags for any post are saved in a file with the same name but a different (configurable) extension. One tag per line. Lines starting with # are comments. Leading and trailing whitespace is removed, and internal whitespace is normalized to a single space.
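
The file format is simple enough to read with sed. This is a sketch of the rules just listed, not the plugin's own parser. The function name read_tags is made up, and skipping blank lines is an assumption on my part.

```shell
# Print the cleaned-up tags from one tag file, one per line.
# $1 = path to the tag file
read_tags() {
  sed -E \
    -e '/^#/d' \
    -e 's/^[[:space:]]+//' \
    -e 's/[[:space:]]+$//' \
    -e 's/[[:space:]]+/ /g' \
    -e '/^$/d' \
    "$1"
}
```

The expressions run in order: drop comment lines, trim both ends, squeeze internal whitespace to one space, then drop anything left empty (the assumed blank-line rule).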

It creates an interpolation variable for a configurable number of top tags; I'm going to be using that instead of the old categories. (And I used the categories and subcategories to seed the tags on all the old posts.) It also creates a tags variable for interpolation just before each story is processed. Both of those make the tags shown into search links.
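
Picking the top tags is a classic count-and-sort job. The sketch below shows the idea in shell and is not how the plugin does it. The name top_tags is invented, and it assumes the tags arrive already cleaned up, one per line on standard input.

```shell
# Print the N most used tags, most frequent first.
# $1 = how many tags to print; tags are read from standard input.
top_tags() {
  sort |
    uniq -c |
    sort -k1,1nr -k2 |
    head -n "$1" |
    sed -E 's/^[[:space:]]*[0-9]+[[:space:]]//'   # strip the count column
}
```

Feeding it every post's tags and asking for, say, twenty gives the list a top tags variable would need. Ties on count fall back to tag order.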

I've named it starting with "aaa" because I want it to have access to the %files and %others lists before any filter() edits them. This way I can build a complete list of unfiltered tags. This also means that tag search filters happen with a very high priority, which isn't as necessary. Just running that any time before pagination kicks in would have been fine.

With this plugin, I'm retiring my use of the categories, prettycategory, menu, and breadcrumbs plugins. That's leaving me pretty close to 100% plugins I wrote (or *cough*paginateqz*cough* rewrote).

A new pagination plugin

I found my "round tuit" and rewrote the old paginate plugin to include the features of cooluri, thus superseding my modified versions of both of those (paginateqz-v0.10, nowcooluri-v0.2).

The new paginateqz handles both at once allowing me to have the "newer" and "older" pagination links on permalink pages, and those links go through posts a single entry at a time. This is the feature I really wanted. It works just like a real blog now!

At the same time I added some features. There's a new interpolation variable $paginateqz::np_sep which is a separator only set when next and previous links are set. The template filler in paginateqz now is respectful of how you may have a plugin to find templates (spoiler, I do) and how you may have a plugin to do interpolation (I do not).

Next up will be a tagging plugin. I have a draft which can find and display tags, but it does not let one search by tags, yet. I also need to actually go and tag posts for it to be useful. After that, I'll have all the features I find essential. Non-essential, but nice to have, will include searching and comments. The comments one is hard because of spammers. Searchable tags get me to 80% of the value of complete search, I think.

Bot Traffic, Again

One of the annoying things I had happen last time this blog was in active use was getting hammered by a rogue bot. It has happened again.

blog hits from 12am March 1st to 2pm March 9th: 35121
blog hits in that time not from bots: 528

Hits by bot:

27543 "Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)"
4998 "Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)"
1001 "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
449 "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
216 "Mozilla/5.0 (compatible; AhrefsBot/6.1; +http://ahrefs.com/robot/)"
114 "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.92 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
110 "istellabot/t.1.13"
74 "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65 "Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, help@moz.com)"
37 "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
34 "Mozilla/5.0 (compatible; SemrushBot/1.0~bm; +http://www.semrush.com/bot.html)"
32 "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.75 Safari/537.36 (compatible; SMTBot/1.0; +http://www.similartech.com/smtbot)"
22 "Mozilla/5.0 (compatible; YandexImages/3.0; +http://yandex.com/bots)"
16 "PHP-Curl-Class/8.0.1 (+https://github.com/php-curl-class/php-curl-class) PHP/7.0.33-0ubuntu0.16.04.12 curl/7.47.0"
16 "Mozilla/5.0 (compatible; SemrushBot/6~bl; +http://www.semrush.com/bot.html)"
16 "Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11A465 Safari/9537.53 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
16 "SearchAtlas.com SEO Crawler"
13 "Mozilla/5.0 (compatible; Linux x86_64; Mail.RU_Bot/2.0; +http://go.mail.ru/help/robots)"
12 "Mozilla/5.0 (compatible; Linespider/1.1; +https://lin.ee/4dwXkTH)"
11 "Jigsaw/2.3.0 W3C_CSS_Validator_JFouffa/2.0 (See <http://validator.w3.org/services>)"
10 "Validator.nu/LV http://validator.w3.org/services"
10 "Mozilla/5.0 (Linux; Android 7.0;) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; AspiegelBot)"
10 "Mozilla/5.0 (compatible;Linespider/1.1;+https://lin.ee/4dwXkTH)"
9 "Mozilla/5.0 (compatible; SEOkicks; +https://www.seokicks.de/robot.html)"
7 "Googlebot-Image/1.0"
6 "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106"
4 "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534+ (KHTML, like Gecko) BingPreview/1.0b"
4 "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"
2 "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Applebot/0.1; +http://www.apple.com/go/applebot)"
2 "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebit/53.7.36 (KHTML, like Gecko) Chrome/63.0.3239.0 Safari/537.36 (compatible; Linespider/1.1; +https://lin.ee/4dwXkTH)"
2 "Mozilla/5.0 (compatible; Pinterestbot/1.0; +http://www.pinterest.com/bot.html)"
2 "Mozilla/5.0 (compatible;AspiegelBot)"
2 "Mozilla/5.0 (iPhone; CPU iPhone OS 8_1 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B411 Safari/600.1.4 (compatible; YandexMobileBot/3.0; +http://yandex.com/bots)"
2 "ltx71 - (http://ltx71.com/)"
1 "W3C_Validator/1.3 http://validator.w3.org/services"
1 "DomainStatsBot/1.0 (https://domainstats.com/pages/our-bot)"

Hits by non-bots: 38 unique User-Agents (across ~500 hits)
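
Counts like the ones above can be produced with a short pipeline. This sketch assumes the web server writes Apache's "combined" log format, where the User-Agent is the last double-quoted field on each line. The function name hits_by_agent is made up, and the log file path is whatever your server uses.

```shell
# Count hits per User-Agent, busiest first.
# Arguments are access log files in "combined" format (an assumption).
# Splitting on double quotes makes the User-Agent field number 6:
#   1: host/date  2: request  3: status/size  4: referer  5: space  6: agent
hits_by_agent() {
  awk -F'"' '{ print "\"" $6 "\"" }' "$@" |
    sort |
    uniq -c |
    sort -rn
}
```

A request line or referer containing an escaped double quote would throw the field count off, so treat the output as a good estimate rather than an audit.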

One user agent really stands out. And one other is suspicious. I'm talking about the two that hit my site more than world-famous Google.

I don't know everything MJ12bot does, but I do know one thing it does is power paid access to "incoming" links reports via "Majestic Site Explorer": "Access raw exports from £79.99 a month". So let me get this straight: you crawl sites to sell people lists of who links to them? Why should I waste my bandwidth giving you pages?

But clearly it is Megaindex that is abusive. At the .com version of the site I read "MegaIndex is a powerful and versatile competitive intelligence suite for online marketing, from SEO and PPC to social media and advertising research." Again, this is a bullshit use of my resources (bandwidth, web server CPU) for some commercial enterprise that cannot benefit me.

So: another new plugin is born, browser_block. Goodbye Megaindex. Goodbye Majestic.
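
The plugin itself isn't shown in this post. Just to illustrate the idea in shell terms, here is a sketch of the matching step. It is not browser_block's code; the function name is_blocked_agent is invented, and the two patterns are simply taken from the User-Agent strings in the list above.

```shell
# Succeed (return 0) when the User-Agent string belongs to a blocked bot.
# $1 = the User-Agent string, e.g. the CGI variable HTTP_USER_AGENT
is_blocked_agent() {
  case $1 in
    *MegaIndex.ru*|*MJ12bot*) return 0 ;;   # the two bots named above
    *) return 1 ;;
  esac
}
```

A CGI wrapper could call is_blocked_agent "$HTTP_USER_AGENT" before doing any real work, and answer with a short 403 response instead of a page when it succeeds.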