Chess Tiebreaks
SuperDB isn’t limited to JSON — it can ingest and transform plain text just as easily. This tutorial demonstrates parsing raw text input using grok patterns, reshaping records, and aggregating results to answer a real question.
PGN (Portable Game Notation) is a plain text format for recording chess games. Each game has metadata in bracketed lines like [White "LastName, FirstName"] followed by the moves. Here we’ll parse a tournament’s PGN file to find players who faced each other more than once (indicating tie-break games).
The data comes from the Tata Steel Masters 2024 broadcast on Lichess, the tie-break games can be seen here.
The Complete Solution
curl -sS https://lichess.org/api/broadcast/ycy5D2r8.pgn |
super -i line -s -c "
--
-- parse the pgn file
--
where this != ''
| grok('.*White \"%{WORD:last_name_white}|.*Black \"%{WORD:last_name_black}', this)
| where not is_error(this)
--
-- pair up each player in each game
--
| count
| put game_id:=((count - 1) / 2)::int64
| values {...that, game_id}
| aggregate
last_name_white:=max(last_name_white),
last_name_black:=max(last_name_black)
by game_id
| drop game_id
--
-- re-organize the data by player and opponent regardless of piece color
--
| [
{player: last_name_white, opponent: last_name_black},
{player: last_name_black, opponent: last_name_white}
]
| unnest this
--
-- count each match-up of player and opponent and find any with counts > 1
--
-- this should be players who participated in tie-breaks, as the regular
-- tournament was only a single round-robin
--
| count(this) by player, opponent
| where count > 1
| sort player, opponent
" -
{player:"Abdusattorov",opponent:"Wei",count:3}
{player:"Giri",opponent:"Gukesh",count:4}
{player:"Gukesh",opponent:"Giri",count:4}
{player:"Gukesh",opponent:"Wei",count:3}
{player:"Wei",opponent:"Abdusattorov",count:3}
{player:"Wei",opponent:"Gukesh",count:3}
Walkthrough
Line Input and Filtering
-i line tells super to treat each line as a separate string record. We filter out empty lines with where this != ''.
Grok Parsing
The grok pattern '.*White \"%{WORD:last_name_white}|.*Black \"%{WORD:last_name_black}' uses alternation (|) to match either a White or Black player line. The %{WORD:...} capture extracts just the last name (the first word after the opening quote).
Lines that don’t match (like move notation or other metadata) produce errors, which we filter with where not is_error(this).
Pairing Records
PGN files list White and Black players on consecutive lines for each game. We need to combine them into single records.
The count operator adds a sequential count field to each record (1, 2, 3, …). Integer division (count - 1) / 2 maps pairs of rows to the same game_id: rows 1,2 get game_id 0; rows 3,4 get game_id 1, etc. The count operator wraps the input in a that field, so {...that, game_id} spreads the original fields back out alongside the game_id.
Then aggregate ... by game_id with max() picks up the non-null value from each field within each pair.
Reshaping with Arrays and Unnest
Each game record has {last_name_white, last_name_black}. To count matchups regardless of color, we create an array with both perspectives:
| [{player: last_name_white, opponent: last_name_black},
{player: last_name_black, opponent: last_name_white}]
| unnest this
This doubles our records — each game now appears twice, once from each player’s perspective.
Final Aggregation
count(this) by player, opponent counts how many times each matchup occurred. where count > 1 filters to only matchups that happened more than once - these are the tie-break games.
Nested unnest ... into Example
Here’s a minimal dataset to demonstrate this next technique, based on similar data we were just working with. Four players in a single round-robin (6 games), plus one tie-breaker between Gukesh and Wei:
{pairing:"Giri Gukesh"}
{pairing:"Giri Wei"}
{pairing:"Giri Abdusattorov"}
{pairing:"Gukesh Wei"}
{pairing:"Gukesh Abdusattorov"}
{pairing:"Wei Abdusattorov"}
{pairing:"Gukesh Wei"}
super -s -c "
split(pairing, ' ')
| unnest {pairing:this, player:this} into (
unnest {player: this.player, opponent: pairing} into (
where player != opponent
| cut player, opponent
)
)
" pairings.sup
{player:"Giri",opponent:"Gukesh"}
{player:"Gukesh",opponent:"Giri"}
{player:"Giri",opponent:"Wei"}
{player:"Wei",opponent:"Giri"}
{player:"Giri",opponent:"Abdusattorov"}
{player:"Abdusattorov",opponent:"Giri"}
{player:"Gukesh",opponent:"Wei"}
{player:"Wei",opponent:"Gukesh"}
{player:"Gukesh",opponent:"Abdusattorov"}
{player:"Abdusattorov",opponent:"Gukesh"}
{player:"Wei",opponent:"Abdusattorov"}
{player:"Abdusattorov",opponent:"Wei"}
{player:"Gukesh",opponent:"Wei"}
{player:"Wei",opponent:"Gukesh"}
How Nested Unnest Works
The unnest ... into syntax is powerful but takes some unpacking:
-
split(pairing, ' ')turns"Giri Gukesh"into["Giri", "Gukesh"] - First unnest:
unnest {pairing:this, player:this} into (...)- Keeps the full array as
pairing - Unnests each element as
player - For
["Giri", "Gukesh"]this produces two records:{pairing: ["Giri", "Gukesh"], player: "Giri"}{pairing: ["Giri", "Gukesh"], player: "Gukesh"}
- Keeps the full array as
- Second unnest:
unnest {player: this.player, opponent: pairing} into (...)- Preserves
playerfrom the outer record - Unnests
pairingarray elements asopponent - For
{pairing: ["Giri", "Gukesh"], player: "Giri"}this produces:{player: "Giri", opponent: "Giri"}{player: "Giri", opponent: "Gukesh"}
- Preserves
where player != opponentfilters out self-matches
The result: each pairing expands into both player/opponent perspectives.
Finding Tie-Breaks
Now we can append the same count query to find players who faced each other more than once:
super -s -c "
split(pairing, ' ')
| unnest {all_pairings:this, player:this} into (
unnest {player: this.player, opponent: all_pairings} into (
where player != opponent
| cut player, opponent
)
)
| count(this) by player, opponent
| where count > 1
| sort player, opponent
" pairings.sup
{player:"Gukesh",opponent:"Wei",count:2}
{player:"Wei",opponent:"Gukesh",count:2}
as of versions
super --version
Version: v0.2.0