Matched Pairs
In vi (and vim), there's a "motion" command, %, that
moves you to an enclosing symbol. In the previous
sentence, with the cursor on the "v" of vim, using %
will move the cursor to the "(", which is the
start of the enclosed sequence. On that "(", the %
motion moves to the matching ")".
Out of the box, vi knows the pairs "()", "[]", and "{}".
You can change the pairs with the configuration
variable matchpairs, and people frequently do, to add
"<>" for XML or HTML work:
set matchpairs=(:),[:],{:},<:>
But there are a lot more, like quoting angles "«»"
and smart quotes. And vim happily accepts UTF-8
characters for each half of a pair. So I could think
up some Unicode pairs and stick them in there. Or I
could look for all pairs that exist in Unicode.
Here's a stab at doing just that.
First off, we need the list of characters in Unicode.
This is surprisingly easy to get. The Unicode Consortium
provides an easy-to-parse list of characters in plain
ASCII(!).
$ curl -sO http://www.unicode.org/Public/UNIDATA/UnicodeData.txt
The list is not fixed; new stuff gets added with each
version of Unicode. Releases happen every 12 to 18
months. Refreshing that file is the major change
needed to update my Unicode Toys for a new version.
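Each line of that file is one semicolon-separated record, with the code point (in hex) first and the character name second. A minimal sketch of pulling those two fields out of a single record follows; the sample line is assumed to match the U+0028 entry in the real file, which remains the authority.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample record, assumed to match the U+0028 line of UnicodeData.txt.
my $line = '0028;LEFT PARENTHESIS;Ps;0;ON;;;;;Y;OPENING PARENTHESIS;;;;';

# Fields are separated by semicolons; only the first two are needed here.
my ($code, $name) = split /;/, $line;

binmode(STDOUT, ':utf8');

# hex() turns the code point string into a number, %c prints the character.
printf "U+%s\t%c\t%s\n", $code, hex($code), $name;
```

The same hex() and %c combination is what turns a code point string into a printable character.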
So how about a quick script to find "LEFT" characters
that have a matching "RIGHT" version?
#!/usr/bin/perl
# Read the UnicodeData.txt file to create a vim
# matchpair list.
#
# May 2022 "Eli the Bearded"
use strict;
use warnings;
my $in = 'UnicodeData.txt';
my %found;
my %pool;
my %check;
my $id;
my $pid;
my $name;
my $count = 1; # first pair is the hardcoded one
# Tag characters are an obsolete invisible set of
# ASCII for hidden metadata. Modifiers and Combining
# should not be used on their own. Arabic are not
# left-to-right text, so I decided I don't need them.
# You may decide otherwise. The others, by inspection,
# don't have anything I'd want as a pair. (Some
# are part of larger sets, like up/down/left/right
# quads, or parts of multicharacter pictures.) This
# still leaves some unlikely pairs including box drawing
# stuff. It's a quick list.
#
# These are checked with word boundaries, so CIRCLE
# will not skip CIRCLED.
my @skip = qw[ TAG MODIFIER COMBINING ARABIC IDEOGRAPH
ARROWHEAD ARROW
AFFIX CIRCLE HALF
UP UPWARDS
DOWN DOWNWARDS
];
my $skip = join('|', @skip);
my $skip_re = qr/\b(?:$skip)\b/; # \b for boundary
# no boundary check, allow "leftfacing" and the like
my $keep_re = qr/(?:LEFT|RIGHT)/;
binmode(STDOUT, ':utf8');
open(STDIN, '<', $in) or die "Cannot open $in: $!\n";
while(<>) {
    # keep code point and name only
    /^([^;]+);([^;]+);/ or next;
    $id = $1;
    $name = $2;
    # Stop checking if on skip list
    next if /$skip_re/;
    # if left or right, keep, but separately
    if (/$keep_re/) {
        if (/RIGHT/) {
            $check{$name} = $id;
        } else {
            $pool{$name} = $id;
        }
    }
}
close STDIN;
for $name (keys %check) {
    my $pair = $name;
    # %check has RIGHTs, see if there is a matching left
    $pair =~ s/RIGHT/LEFT/g;
    if (length( $pid = $pool{$pair} )) {
        $id = $check{$name};
        $found{$pid} = $id;
        # In .exrc or .vimrc " is used to begin a comment.
        # These three printf()s just document the pairs.
        printf(qq{" U+%s\t%c\t%s\n}, $pid, hex($pid), $pair);
        printf(qq{" U+%s\t%c\t%s\n}, $id, hex($id), $name);
        printf "\"\n";
        $count ++;
    }
}
print STDERR "Found $count pairs\n";
# Unfortunately < and > are not named with LEFT and RIGHT
# so hardcode that.
printf "set matchpairs=<:>";
for $id (sort { $a cmp $b } (keys %found)) {
    $pid = $found{$id};
    printf ",%c:%c", hex($id), hex($pid);
}
printf "\n";
__END__
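The matching step boils down to rewriting a RIGHT name into a LEFT name and looking that up in a hash. A minimal sketch of just that step, separate from the script and run on a few hand-picked entries (the names and code points are assumed to match UnicodeData.txt):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Tiny stand-ins for the name => code point hashes.
my %left  = ('LEFT PARENTHESIS'    => '0028',
             'LEFT CURLY BRACKET'  => '007B');
my %right = ('RIGHT PARENTHESIS'   => '0029',
             'RIGHT CURLY BRACKET' => '007D',
             'RIGHT-TO-LEFT MARK'  => '200F');

my @pairs;
for my $name (sort keys %right) {
    # Rewrite the RIGHT name into the LEFT name it would pair with.
    (my $want = $name) =~ s/RIGHT/LEFT/g;
    my $lid = $left{$want};
    # "LEFT-TO-LEFT MARK" does not exist, so that entry is dropped.
    next unless defined $lid;
    push @pairs, sprintf("%c:%c", hex($lid), hex($right{$name}));
}

binmode(STDOUT, ':utf8');
print join(',', @pairs), "\n";
```

Names that contain both words, like the mark above, fall out naturally because the rewritten name has no counterpart.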
Saved as matchmaker, with the Unicode data file in
the same directory, let's try it.
$ perl matchmaker >> .vimrc
Found 186 pairs
$ tail -1 .vimrc
set matchpairs=<:>,(:),[:],{:},«:»,֎:֍,܆:܇,࿖:࿕,࿘:࿗,𐡷:𐡸,𝄆:𝄇,𝅊:𝅌,𝅋:𝅍,👈:👉,🔍:🔎,🕃:🕄,🕻:🕽,🖉:✎,🖘:🖙,🖚:🖛,🖜:🖝,🗦:🗧,🗨:🗩,🗬:🗭,🗮:🗯,🙬:🙮,🤛:🤜,🫲:🫱,🭪:🭨,🭬:🭮,🭼:🭿,🭽:🭾,🮜:🮝,🮟:🮞,🮠:🮡,🮢:🮣,🮤:🮥,🯇:🯈,‘:’,“:”,‹:›,⁅:⁆,⁌:⁍,⁽:⁾,₍:₎,⇇:⇉,⊣:⊢,⋉:⋊,⋋:⋌,⌈:⌉,⌊:⌋,⌍:⌌,⌏:⌎,⌜:⌝,⌞:⌟,〈:〉,⌫:⌦,⍅:⍆,⎛:⎞,⎜:⎟,⎝:⎠,⎡:⎤,⎢:⎥,⎣:⎦,⎧:⎫,⎨:⎬,⎩:⎭,⎸:⎹,⏋:⎾,⏌:⎿,⏪:⏩,⏮:⏭,⏴:⏵,┤:├,┥:┝,┨:┠,┫:┣,╡:╞,╢:╟,╣:╠,╴:╶,╸:╺,▉:🮋,▊:🮊,▋:🮉,▍:🮈,▎:🮇,▏:▕,▖:▗,▘:▝,◀:▶,◁:▷,◂:▸,◃:▹,◄:►,◅:▻,◜:◝,◟:◞,◣:◢,◤:◥,◰:◳,◱:◲,◸:◹,◺:◿,☚:☛,☜:☞,⚟:⚞,⛦:⛥,❨:❩,❪:❫,❬:❭,❮:❯,❰:❱,❲:❳,❴:❵,⟅:⟆,⟕:⟖,⟞:⟝,⟢
:⟣,⟤:⟥,⟦:⟧,⟨:⟩,⟪:⟫,⟬:⟭,⟮:⟯,⥼:⥽,⦃:⦄,⦅:⦆,⦇:⦈,⦉:⦊,⦋:⦌,⦍:⦐,⦏:⦎,⦑:⦒,⦗:⦘,⧘:⧙,⧚:⧛,⧼:⧽,⫍:⫎,⫥:⊫,⬱:⇶,⮄:⮆,⮐:⮑,⮒:⮓,⯇:⯈,⸂:⸃,⸄:⸅,⸉:⸊,⸌:⸍,⸜:⸝,⸠:⸡,⸦:⸧,⸨:⸩,⸶:⸷,⹑:⹐,⹕:⹖,⹗:⹘,⿸:⿹,〈:〉,《:》,「:」,『:』,【:】,〔:〕,〖:〗,〘:〙,〚:〛,꧁:꧂,﴾:﴿,︵:︶,︷:︸,︹:︺,︻:︼,︽:︾,︿:﹀,﹁:﹂,﹃:﹄,﹇:﹈,﹙:﹚,﹛:﹜,﹝:﹞,(:),[:],{:},⦅:⦆,「:」
$
There are a lot of good pairs in that.
But some pairs might need to be switched for taste.
(Looking at those hands.)
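Switching a pair means swapping the two characters around the colon. A minimal sketch of one way to do that on the generated line; the %flip list is hypothetical, the choice of which hand to reverse is arbitrary, and the short $pairs string stands in for the full value:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # this source contains literal UTF-8 characters

binmode(STDOUT, ':utf8');

# A short stand-in for the generated option value.
my $pairs = '<:>,👈:👉,🫲:🫱';

# Hypothetical list of openers whose pair should be reversed.
my %flip = map { $_ => 1 } ('🫲');

my @out;
for my $pair (split /,/, $pairs) {
    my ($open, $close) = split /:/, $pair;
    # Swap the halves only for the pairs named in %flip.
    push @out, $flip{$open} ? "$close:$open" : "$open:$close";
}

my $line = 'set matchpairs=' . join(',', @out);
print "$line\n";
```

Pairs not listed in %flip pass through unchanged, so the same loop could be run over the full generated line.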