Sitemap Plugin
About a month ago
I posted about finding a lot of Blosxom plugins
on github. I've been looking at some of them. There is a family of them, one
original and a few modifications of it, for enabling comments. I have not
gotten those to work: the best I've managed is that I can leave comments but
not see them on pages. I may end up writing my own, so that it follows my
idea of what is needed for comments.
But also in that batch of plugins was one for Google Sitemap. The
documentation is non-existent in the repo. Searching the web I did find
blog posts from the author, in Japanese. From those I gather that the version
in the github repo is the old, memory-intensive way to build a sitemap.
I didn't find the new, improved version.
I decided to make do. The
gsitemap
plugin
is dead simple. It just sets a few variables for use in templates and then,
when the sitemap flavour is requested, disables pagination. The rest of the
magic happens in the
flavour templates.
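To give a feel for how little such a plugin needs to do, here is a minimal
sketch. This is not the original gsitemap code: the story() date handling
follows stock Blosxom plugin conventions, but $pagination::off is a made-up
knob, since how you actually switch pagination off depends on which
pagination plugin you run.

    # gsitemap -- illustrative sketch, not the original plugin
    package gsitemap;
    use POSIX qw(strftime);

    # set per story; flavour templates interpolate it as $gsitemap::lastmod
    our $lastmod = '';

    sub start { 1 }

    sub head {
        if ($blosxom::flavour eq 'sitemap') {
            # hypothetical knob; the real one depends on your
            # pagination plugin
            $pagination::off = 1;
        }
        1;
    }

    sub story {
        my ($pkg, $path, $filename, $story_ref, $title_ref, $body_ref) = @_;
        # look up this entry's timestamp for use as <lastmod>
        my $file = "$blosxom::datadir$path/$filename.$blosxom::file_extension";
        $lastmod = strftime('%Y-%m-%d',
                            localtime($blosxom::files{$file} || time));
        1;
    }

    1;

With a plugin like that loaded, a story.sitemap flavour template along these
lines turns each post into one <url> element ($url, $path, and $fn are stock
Blosxom template variables; the .html extension here stands in for whatever
your permalink flavour is):

    <url>
    <loc>$url$path/$fn.html</loc>
    <lastmod>$gsitemap::lastmod</lastmod>
    </url>

Leaving the head.sitemap and foot.sitemap templates empty is what makes the
output a fragment rather than a complete sitemap file.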
As part of making do, I'm not going to reference or link to the URL that
generates the output (if you are reading this and want to use it for your own
site, the gsitemap plugin in the original configuration would generate it
at https://example.com/blosxom/index.xml, assuming /blosxom/ is the root
of your Blosxom blog). Instead I've reconfigured the templates to generate
a fragment of a sitemap XML file, changed the flavour from xml to sitemap,
and have scripted up a sitemap builder for the whole qaz.wtf site that
curls the proper URL and includes the XML fragment. Blosxom can then
remain the source of truth for blog permalinks while find and some
per-directory configuration build URLs for other parts of the site.
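The real builder isn't published; the sketch below just shows the shape of
the idea, with an assumed document root, the example.com URL from above, and
the per-directory configuration reduced to a single find exclusion.

    #!/usr/bin/perl
    # hypothetical whole-site sitemap builder, shape only
    use strict;
    use warnings;

    my $root = '/var/www/htdocs';        # assumed document root
    my $site = 'https://example.com';

    open my $out, '>', "$root/sitemap.xml" or die "sitemap.xml: $!";
    print $out qq{<?xml version="1.0" encoding="UTF-8"?>\n};
    print $out qq{<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n};

    # Blosxom stays the source of truth for blog permalinks
    print $out scalar `curl -s $site/blosxom/index.sitemap`;

    # everything else comes from find(1); the real per-directory
    # configuration (skip lists, etc.) is omitted here
    for my $file (`find $root -name '*.html' -not -path '$root/blosxom/*'`) {
        chomp $file;
        (my $loc = $file) =~ s/^\Q$root\E/$site/;
        print $out "<url><loc>$loc</loc></url>\n";
    }

    print $out "</urlset>\n";
    close $out or die $!;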
I decided I should run that
script from SAVE-DATES.sh
under the theory that any time I save post
timestamps is a likely time I want to rebuild the sitemap. This works for
qaz.wtf because the blog is the only thing updating more frequently than
monthly, and I typically run SAVE-DATES.sh
shortly after posting an entry.
This is all prompted by looking (again) at just how much
bot traffic
the site gets. I figure a sitemap will stop well-behaved bots from crawling
as much or as frequently. And for badly behaved bots, I've gone
belt-and-suspenders by adding entries to
robots.txt
and more user-agents to my
browser_block
plugin.
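Advertising the sitemap to well-behaved bots is one line in robots.txt, and
blocking a misbehaving crawler is a couple more. These are generic examples,
not my actual entries:

    Sitemap: https://example.com/sitemap.xml

    User-agent: SomeBadBot
    Disallow: /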
Similarly, in the name of improving search engine interaction, I've got a new
(trivial) plugin called
extrameta
that gets used by other plugins, namely the newly modified
tags plugin
and
pagination plugin,
to add a <meta name="robots" content="noindex">
header (in a naive way)
to search result pages, to avoid duplicated content.
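I'm not quoting the plugin here, but one plausible shape for that kind of
glue is a package variable that other plugins append to and that the head
flavour template interpolates; a sketch under that assumption:

    # extrameta -- sketch of one plausible design, not the real code
    package extrameta;

    # head templates interpolate this as $extrameta::meta
    our $meta = '';

    sub start { 1 }

    1;

A cooperating plugin would then do something like

    # $showing_search_results is a hypothetical condition
    $extrameta::meta .= qq{<meta name="robots" content="noindex">\n}
        if $showing_search_results;

when it detects it is rendering a search result page.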