QZ qz thoughts
a blog from Eli the Bearded

Cut-and-paste as a scripting language


I'd say one, perhaps controversial, technique that has been very successful for me in computer system administration is scripts that are not intended to be run with a shell or other single tool, but as a series of things to cut-and-paste. I frequently use such things in situations like:

  1. Drafting proper scripts to be run from Puppet, Chef, Ansible, cron, etc.
  2. Massaging existing systems from one state into another before Puppet, Chef, or Ansible takes over management. This includes bootstrapping.
  3. Changing state in things not under Puppet, Chef, Ansible, etc., control because, e.g., it's a database, not an OS.
  4. The code is to be used as a runbook for what should be done in the case of an error.

In many cases the cut-and-paste script is something that should become a proper script if it is going to be done often enough, but see point one. A number of systems that need to be brought into a consistent state, but for whatever reason are starting from a diverse set of existing states, might need a script with a lot of branching. The cut-and-paste script has a human running it, and that human can perform error recovery and handle new branch configurations with relative ease.

To point two: in some cases there will be enough systems to perform this on that some script is required, but the test to know which state should apply is a complicated one to script, and it's much simpler to let a human decide the steps needed, after which the test becomes simple and automation can take over.

And for point three: you will always have some aspect of the system that is supposed to be above the level of the automation, but for which a degree of control is sometimes needed.

Lastly, point four: a runbook that provides as exact a set of steps as possible allows for more error-free recovery from issues when they do arise. Sometimes this set of steps can be shown to be reliable enough to create scripts (point one again) that perform autorecovery.

I think at this point it becomes clear that the cut-and-paste script is a useful developmental tool for creating robust automation. At least, I frequently find it to be so.

Some examples of situations where I've used cut-and-paste scripts:

  • I first developed this method managing a fleet of web servers in the early '00s. The site and the config were all in source code control, but triggering syncs and Apache config reload were manual actions. A loop like for host in $(< hostlistfile); do ssh $host "cd /webroot && p4 sync"; done would work, but it wouldn't be nearly as close to simultaneous as opening sixteen xterms each sshed in to a web server (easy to locally script) and then pasting the commands as fast as possible in each. Later (much later) that company started to use Chef, where the knife command could do a good job of replacing such stuff.
  • Using a web browser to connect to the "console" of a newly installed system, and using xdotool to type the commands to bootstrap networking onto the system. That "console" was some weird javascript app that didn't accept "paste", hence getting creative with xdotool to emulate typing in it. That network had no DHCP and needed a static IP on the correct VLAN before it could connect. I broke the work down into several xdotool commands for three reasons: (a) to keep the command lines from getting unreasonably long (i.e., personal taste), (b) to avoid "typeahead" when starting programs like a file editor, and (c) to not have to encode which static IP to use, instead just getting right to that point, entering it by hand, then continuing with the script (see the sketch after this list). Finally the script ended with rebooting, and then I could take over with ansible, the config management tool of choice there.
  • Filling out a number of web forms where an API should be used, but there is resistance to making the API usable. Specifically, of late, that has been managing "silences" during system updates in Prometheus Alertmanager. Because the login is behind Okta, command line tools can't authenticate. There is a proposed fix for this, but it hasn't been prioritized yet. In the meantime, I'll just open vi in an xterm and put the field values to use on separate lines for quick triple-click copying. Typically I'll have two files open: one with the things that are the same for every "new silence" and one with the hostnames that change between them.
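
To make the xdotool item above concrete, here is a minimal sketch of the pattern, not the actual commands used there; the interface name, VLAN id, gateway, and delays are all hypothetical:

# click into the browser "console" window first, then run this from a terminal
sleep 5                     # time to move focus to the console window
xdotool type --delay 120 'ip link add link eth0 name eth0.42 type vlan id 42'
xdotool key Return
xdotool type --delay 120 'ip link set eth0.42 up'
xdotool key Return
# stop here: type the host-specific static IP by hand, something like
#   ip addr add <address>/24 dev eth0.42
# then continue with the rest (the real script ended with a reboot)
xdotool type --delay 120 'ip route add default via 10.0.42.1'
xdotool key Return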

One thing that has helped with all of this is the standard X11 paradigm of "select is copy" and "mouse middle button is paste". I can double-click on a word in one window, move the mouse to another window, and paste it, with no keyboard. Or triple-click for a line, and then paste it in as many windows as I want with just a single click each. It becomes the opposite of "hands never leave the keyboard": the script run is done entirely with the hand never leaving the mouse (at least until the script needs an edit). This cut-and-paste paradigm never caught on outside of X11, and it makes me less productive on Macs and Windows. Macs are in some ways the worst, because it's Unix but it's not X11: almost, but not quite. (And to add to the pain, Mac keyboard shortcuts are nothing like xterm ones.)

Of course, if you do need to type some part of the cut-and-paste script, learning and using keyboard shortcuts for that, particularly paste, is the way to go. Consider this simple script:

for host in $(< hostlist ) ; do
        # need to set EDITOR environment variable after sudo
	ssh -t $host "sudo env EDITOR=ex crontab -e"
done
# paste to ex:
#	:g /day$/ s, \* /, 1-5 /, | x 

One can copy that global command, which finds lines ending in "day" and edits the * before the / to be 1-5. For a crontab like:

MAILTO=cron-list@example.com
3 9 * * * /usr/local/bin/start-day
10 2 * * * /usr/local/bin/run-backup

This will change the start-day program, which runs every day at 9:03 am, to run only on weekdays: 3 9 * * 1-5 /usr/local/bin/start-day, and save the file. The for loop will log in to each host in the list of hosts, possibly prompting for an ssh password (depending on how you have that set up) and probably prompting for a sudo password (depending on how you have that set up). It would be quite reasonable to run the loop, enter passwords by hand, then hit <shift-insert> to use the default xterm keybinding for pasting the primary selection; the paste updates the crontab, you are logged out, and the loop moves on to the next host. So you end up just typing password<enter><shift-insert> password<enter><shift-insert> password<enter><shift-insert> ....
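
For clarity, here is that crontab after the paste; only the start-day line changes:

MAILTO=cron-list@example.com
3 9 * * 1-5 /usr/local/bin/start-day
10 2 * * * /usr/local/bin/run-backup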

Some tricks of the trade that I've picked up over the years of doing this:

  • Many things that don't immediately look cut-and-pastable can be made so by rethinking them. File edits with ex or ed are easier to script than vi; xdotool can be used to control things in windows.
  • Whole lines are easier to copy than fragments; a triple-click will select a whole line.
  • Learn exactly how to include or not include trailing newlines in your copies; it can make a lot of difference.
  • Use variables like you would in a regular script loop for things that will vary from run to run, even if each run is on a separate host. This lets you separate variable-setting pastes from command-run pastes (see the sketch after this list).
  • Setting variables with shell commands instead of manually makes for a smoother experience. grep, head, and tail are good for selecting lines from output. cut, sed, and grep -o are good for selecting part of a line from output:
       host=$(hostname -f)
       homeuse=$(df /home | tail -1 | grep -o '[0-9]*%' | cut -f 1 -d %) 
  • Some shell line break methods are more fragile than others. Consider:
    # Pasted as a group, both of these run, even if you only want the
    # second to run if the first succeeds.
    /bin/false
    /bin/true
    
    # Pasted as a group, the second will run only if the first succeeds
    /bin/true &&
    /bin/false
    
    # The backslash will escape the following whitespace character.
    # If you have a trailing space, that's the whitespace that will
    # be escaped, not the newline.
    [ -f "$somefile" ] && 
    	head "$somefile" \ 
    	| next command
    
    # Written like this, trailing whitespace is ignored, the pipe
    # alone is enough to keep the shell looking for more to do.
    [ -f "$somefile" ] && 
    	head "$somefile" | 
    	next command
    
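To illustrate the variables tip above, a minimal sketch with hypothetical names and values; the first paste gets edited per host, the second paste is identical everywhere:

# paste 1: varies per host, edit before pasting (hypothetical values)
logdir=/var/log/myapp
keepdays=14

# paste 2: identical on every host
[ -d "$logdir" ] &&
    sudo find "$logdir" -type f -mtime +"$keepdays" -print -delete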

And remember, cut-and-paste scripts are a good way to work up to a real script, slowly dealing with all the error cases to handle and the best ways to make the code universal. Use them to compose your startup scripts, your cron jobs, your configuration management tooling, your runbooks, and your self-healing systems.

slowcat.c and signature.c


A discussion in the Usenet group comp.sys.raspberry-pi about browsers turned to ASCII art when I brought up concerns about showing ASCII art in web pages (based on the recent notint screenshots here). And that discussion had one Charlie Gibbs reminisce about the "Andalusian Video Snail".

I found the file for him at https://grox.net/misc/vt/ — it's an ASCII art animation with VT100 escape codes so viewing it is tricky. Those animations are typically highly dependent on viewing on slow terminals, and modern hardware is anything but a slow terminal.

At grox.net there is also a slowcat.c which easily compiles. That code is essentially a loop of getchar();nanosleep();putchar(); to delay output. It worked okay for a few videos I tried, but the snails did not look so good. And it would not slow down STDIN, which would be handy for a program I wrote (more later).

So the biggest problem with the Grox slowcat.c is that while it adds delays, it doesn't disable output buffering, so output ends up being large chunks shown without delay and then a delay before the next large chunk. That doesn't suit that particular animation well. Other problems include a "cat" that only shows exactly one file instead of an arbitrary number, an awkward interface for delays (how many nanoseconds do I really want? and why does the code say usecs when it presumably means microseconds?), and arg parsing oddness.

I decided to rewrite slowcat. I disabled buffering, switched to usleep(), employed getopt() and an arbitrary number of files, and added an option to have the program calculate a delay to emulate a particular baud rate, which is how terminal speeds were measured when these animations were made. Quite likely a baud of 1200 or 9600 was assumed by the animator. Baud is a measure of bits per second, so at eight bits per character 1200 baud works out to 150 characters per second, or about 13 seconds for an 80x24 terminal screen. When I test, time slowcat -b 9600 snails.tv takes 3m33.23s while time cat snails.tv takes 0.47s. On my system stty tells me 38400 baud, which would be four times faster than 9600, but since plain cat actually runs 453(ish) times faster, the baud equivalent is closer to 4,500,000 than 38,400. For most purposes, that faster speed is much appreciated.
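
As a sanity check on that arithmetic, here is a quick sketch assuming eight bits per character and ignoring start and stop bits, which is a guess about how -b is interpreted:

for baud in 1200 9600 38400; do
    awk -v b="$baud" 'BEGIN {
        cps = b / 8
        printf "%6d baud: %4.0f chars/sec, %5.1f sec per 80x24 screen\n",
               b, cps, (80 * 24) / cps
    }'
done

At 1200 baud that works out to 150 characters per second and about 13 seconds per screen; at 9600 baud, 1200 characters per second.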

So back to signature.c, by which I mean this program:

Elijah
------                                         /* gcc signature.c -lm */
main(){int i,j,k,l;for(i=-12;i<13;i++,printf("\033[H")){for(l=61968*457+
1,l*=5;l>8;l>>=4)p(97-(l>>22)+(l&15));for(j=-12;j<12;){for(k=j+12?-12:-6
;12>k;)p((l=k*k+++i*i+j*j)<12*12?(l=sqrt(l))[".,:;iIJYVSOM"]:32);j++<11?
p(10):0;}}puts("Elijah ");}p(int m){printf("%c",m);}

I use it in place of a .signature on select Usenet posts. I wrote it in approximately 1995. Over the years it's been used perhaps one or two dozen times, and I have made at least five variants with different email addresses, each of which has been reformatted to keep the lines the same length.

(An aside on .signatures: I have never used a proper .signature, that is to say a block of standard text appended automatically to the end of my posts or email. I have always done it manually, and, for Usenet posts, 99.9% of them are unique entries composed for that particular post. I have about 30 programs, most in Perl, that I have used and reused from time to time. Since I don't consider them true signatures, and do consider them part of the post, I do not use proper signature cut lines — dash dash space on a line by itself — which annoys some people, sometimes.)

The origins of that program are interesting. The program came to me in a dream where I visualized an ASCII art movie of cross-sections of a sphere that changes "color" closer to the core. The colors are represented by the characters in the string ".,:;iIJYVSOM" and the code uses some basic IOCCC obfuscation tricks, like using C arrays backwards: (offset)["array to index"], but is mostly pretty readable once reformatted. The use of +++ is very classic me, since I enjoy testing how far I can push a parser to accept the same character over and over. My extreme is perhaps this Perl one:

sub S(){@s=caller($/);$s[3]=~s s\w+:+ss&&print$s[3].q. .}$/=$^=~s/\S+/\n/;
$_="Just (eli) Another (the) Perl (bearded) Hacker";sub s($){eval$_[0];$/}
while(s&&&&& &s(qq&sub$^&.$&.q&{\&S}&)&& &{$&}&&s&&&){$/}$\=$^;print"\b,";

Five ampersands in a row, then a space, because the parser choked on ampersand six. s&&& runs a substitution using the previous regexp: the RE field between s& and the middle & is empty, and the replacement field between the middle & and the third & is also empty, so the match is replaced with an empty string. If that succeeds (&&), call subroutine s: &s. I needed a space on one side or the other of the && "and" operator as a hint to the parser about what's going on, and I chose to put it after, to get five in a row. Then I squeezed as many extra ampersands as I could into the rest of that line. I'm pleased to say that since I wrote that, the Perl parser has improved to the point that the six-ampersands-in-a-row version now works: a fragment like (s&&&&&&s( is either a nightmare or a wondrous sight depending on your tastes.

But I digress. The dream version of sphere.c did not include the use of the number 28319377 to represent "Elijah" printed on every frame, I added that when I decided that sphere.c should become signature.c. It's obfuscated enough that the average schmoe won't be able to change it, but could perhaps remove it.

But like those other ASCII art movies, signature suffered from the speed-up of terminals. The visualization cannot be appreciated at such lightning rates. The fix to have slowcat work on STDIN was added with my signature program in mind: signature | slowcat -b 9600

My slowcat and the latest version of signature.c, now fixed to have a more circular sphere, are available at github, along with various animations (the animations from grox.net and textfiles.com).

Game Tools


Here, some discussion of two game tool programs I have in game-tools on github.

asciimapper

In the mid-1990s, I knew an admin of the Tsunami MUD and played the game a bit. Fast-forward a decade and I decided to give it a try again. At (then) about fifteen years old (now closer to thirty), it was one of the older MUDs around, which meant it had a very long time to expand. There were vast areas of the game to explore, and I set out to see as much as I could.

Over the course of several months, I visited huge swaths of the game, and got myself on the explorer leaderboard, where I was one of the lowest level characters there. (Accounts automatically delete after some time if you don't log in, so I can't know if others had done better than me before then, and you won't be able to find me there now.) Eventually I started to run into diminishing returns on time-to-new-area payoff and stopped playing.

While I was playing I drew myself a lot of maps. At first these were on paper, but eventually I developed an ASCII art shorthand. This let me have text files I could grep for noteworthy items or places. From there, I wrote a tool that could take my ASCII art maps and convert them into nice printable maps. asciimapper worked by converting my ASCII art into config files for ifm, the "Interactive Fiction Mapper", which was designed for Infocom and similar games. The crossover to MUD maps was trivial. Some of the maps I printed and would hand-annotate for further details, but most I kept only in ASCII file form.

I have all my ASCII art maps for Tsunami somewhere; I could probably dig them out and put them on the web. I haven't played in at least a decade now, though, and there's more than zero chance some of them are obsolete. Some became inaccurate while I was playing. In particular I recall the entrance to Toyland moving, to be friendlier to low level players.

I've been thinking about asciimapper again as I play "Andor's Trail" (previously discussed here about a month ago). In "Andor's Trail", there are perhaps 520ish visitable areas, most of which show up on the World Map, but about 20% are indoors, underground, or otherwise not visible there. How to get to those, plus the inventories of stores in particular spots, has been something I've been mulling over. The ASCII art needed for the World Map would be doable, but something of a challenge.

The maps are text form already though, just not very clear text form. Here's an excerpt from AndorsTrail/res/xml/woodsettlement0.tmx, an XML file apparently created by Tiled:

 <objectgroup name="Mapevents">
  <object name="east" type="mapchange" x="928" y="224" width="32" height="64">
   <properties>
    <property name="map" value="roadbeforecrossroads2"/>
    <property name="place" value="west"/>
   </properties>
  </object>
  <object name="woodhouse1" type="mapchange" x="608" y="288" width="32" height="32">
   <properties>
    <property name="map" value="woodhouse1"/>
    <property name="place" value="south"/>
   </properties>
  </object>
  <object name="woodhouse2" type="mapchange" x="640" y="128" width="32" height="32">
   <properties>
    <property name="map" value="woodhouse2"/>
    <property name="place" value="south"/>
   </properties>
  </object>
  <object name="woodhouse0" type="mapchange" x="224" y="256" width="32" height="32">
   <properties>
    <property name="map" value="woodhouse0"/>
    <property name="place" value="south"/>
   </properties>
  </object>
  <object name="sign_wdsetl0" type="sign" x="800" y="256" width="32" height="32"/>
  <object name="sign_wdsetl0_grave1" type="sign" x="128" y="160" width="32" height="32"/>
  <object name="sign_wdsetl0_grave2" type="sign" x="128" y="224" width="32" height="32"/>
 </objectgroup>

You can easily see how the map pieces connect together, including ones like woodhouse0, woodhouse1, and woodhouse2 that don't show up on the World Map. In woodhouse2.tmx we find Lowyna:

<objectgroup name="Spawn">
  <object height="96" name="smuggler1" type="spawn" width="96" x="32" y="96"/>
  <object height="128" name="smuggler2" type="spawn" width="96" x="128" y="96"/>
  <object height="32" name="lowyna" type="spawn" width="96" x="288" y="96"/>
[...]

With a little bit of work, we can connect that to the shop "droplist", in this case in AndorsTrail/res/raw/droplists_v070_shops.json, to get the items she stocks.

A map.tmx to IFM format converter might be handy, but I haven't put any serious thought into it.
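
As a first step toward that, a rough sketch of pulling the connection graph out of the .tmx files could look like the following. It assumes the layout shown above, where every exit is an <object type="mapchange"> whose "map" property names the destination, and the path is taken from the excerpt, so it may not match the current source tree:

for f in AndorsTrail/res/xml/*.tmx; do
    from=$(basename "$f" .tmx)
    # each mapchange object carries a "map" property naming the destination
    grep -A 3 'type="mapchange"' "$f" |
        grep -o 'name="map" value="[^"]*"' |
        sed 's/.*value="\([^"]*\)"/'"$from"' -> \1/'
done | sort -u

Each output line is an edge like woodsettlement0 -> woodhouse1, which is the raw material a converter would need, though IFM also wants directions and room placement.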

stat-timer

I have thought about game play efficiency with "Andor's Trail". In particular, while playing I thought it would be useful to have a way to see how fast I'm earning in-game rewards like XP, game currency, and item drops, and how fast I'm using consumables while doing so. I imagined a tool I could tell what I have at a particular time, and it would work out how much that changes over time.

Those imaginings led to stat-timer, a CLI with a very old school interrogation interface. You can use the command line to give it starting stats, or just start it and it will ask for them. Then you can update as many or as few stats as you want each round and it gives updates. The design requires that you name stats for the initial state; after that, if they are entered in the same order, you can omit the names. Thus the most important things being measured should be first, and the least important last. Or the least changing last.

In practice this means I've been putting XP first, then common area item drop and/or gold, then health potion count, and then rare drops, and finally — sometimes — constants I want for annotations. As I play, I update XP frequently and other columns less frequently. To update just the first two columns is a matter of just entering the first two numbers. To update first and third requires labeling the number for the third column. After each entry it gives a snapshot of how things are doing on a per-second basis. When done, I can <ctrl-d> out or put a ! at the end of the numbers to indicate final update. It then gives a final update with total changes, per-hour and per-second rate of changes. This makes it easier to compare play style one to play style two even if they are on different days and for different lengths of play.

If I update it further, things I've been thinking about for improving it include: a curses interface with data at particular screen locations, a sophisticated "pause timer while entering data" mode, realtime per-second updates, and perhaps a more sophisticated state model for the command line, for better continuation after an interruption.


Web Log Tools


As in tools for web server logs, not the web logs commonly called "blogs".

In the early 2000s, I was doing a lot of very specific log analysis. At the time I was "webmaster" for a site with ads. To justify ad sales, the company paid for a web server log audit service. This provided the main log reports looked at by the company, but sometimes I'd be called on to explain things. So I had to dive into the logs and examine them myself.

Enter logprint. Today this tool is not going to be widely useful; instead people will use an ELK stack and define a grok rule in the logstash part of ELK. But the initial release of logstash was in 2010, long after I wrote logprint.

What logprint does is parse log files of various formats — I defined four that I've had to work with; adding more is an exercise in regular expression writing, same as with grok — into columns. Some of those columns can be sub-parsed. For example, the Apache request line column can be broken down into a method ("GET", "POST", "HEAD", etc), a URI (the actual requested resource), and an optional protocol (not present for HTTP 0.9, or present as "HTTP/1.0" or "HTTP/1.1"). After parsing the line, it can be filtered: only consider requests that succeeded (2xx) and were over 200,000 bytes; then selectively print some of the columns for that entry, say date, referer, and URI.

# Apache "combined" has referer as a column ("common" does not)
# status >= 200 and status <= 299 is any 2xx response
# @uri will only be the local file name, discarding a full hostname
#      on the request line and CGI parameters
logprint -f combined \
	-F 'status>=200' -F 'status<=299' -F 'bytes>200000' \
	-c date,referer,@uri

Things like parsing the file part into a URI when you get a request with the full URL on the GET line is an unusual need, but I needed it then and it is still useful now. The same parsing rules for a full URL are also available for parsing Referer: headers, which was once useful for pulling out the search terms used on referring search engines.

So logprint is a very handy slice and dice tool for web logs. It can be combined with another tool I wrote, adder, which aims to be a full-featured "add values up" tool. You can feed in columns of numbers and get columns of sums. You can feed in columns of numbers and get a sum per line. You can feed in value and keyword pairs and get sums per keyword. That last one is rather useful in combination with logprint.

# using Apache "common" format, find lines with status 200,
# print bytes used and the first directory component of the URI file part
#   pipe that to adder,
#       suppress column headers, 
#	use column 0 as value to add,
#	 and column 1 as label
logprint -f common --filter status=200 -c bytes,file:@path1 $log |
    adder -n -b 1 -r 0

That gets output like this (although this was sorted):

/u      14415354750
/favicon.ico    3311323662
/i      655750249
/qz     272329622
/apple-touch-icon.png   218913277
/index.html     62583501
/jpeg   49580565
/qz.css 38188009

So simple to see where the bytes are coming from. Looking at that, I decided I really should better compress the "apple-touch-icon.png". I'm not sure I can get "favicon.ico" smaller, at least not with the features it has now. And the CSS and other icons in /i/ also got some compression.

Then I looked at bytes per day to see if adding a sitemap helped. It does, but the difference is slight, easy to lose in the weekly cycle. Usage really picked up in April, didn't it?

[graph: bytes per day]
$ cat by-day-usage
#!/bin/sh
log="$1"
if [ ! -f "$log" ] ; then echo "$0: usage: by-day-usage LOGFILE[.gz]" >&2; exit 2 ; fi
shift
logprint -f common --filter status=200 -c bytes,date "$log" "$@" |
    adder -n -b 1 -r 0

And graphed with gnuplot
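
For the record, the gnuplot side can be about this small. This is a sketch: it assumes the adder output was saved to a file named by-day-usage.out as two whitespace-separated columns, a DD/Mon/YYYY date label and a byte count, which is a guess at how logprint prints the date column:

gnuplot <<'EOF'
set xdata time
set timefmt "%d/%b/%Y"
set format x "%b %d"
set ylabel "bytes per day"
set terminal png size 800,400
set output "by-day-usage.png"
plot "by-day-usage.out" using 1:2 with linespoints notitle
EOF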

So I'm publishing these log tool scripts for anyone interested in similar simple log slicing and dicing. It's not awstats or webalizer but it's not trying to be either.