Kristina Chodorow's Blog
Hacking Chess: Data Munging
This is a supplement to Hacking Chess with the MongoDB Pipeline. It has instructions for rolling your own data sets from chess games.
Download a collection of chess games you like. I’m using 1132 wins in less than 10 moves, but any of them should work.
These files are in a format called portable game notation (.PGN), which is a human-readable notation for chess games. For example, the first game in TEN.PGN (helloooo 80s filenames) looks like:
[Event "?"]
[Site "?"]
[Date "????.??.??"]
[Round "?"]
[White "Gedult D"]
[Black "Kohn V"]
[Result "1-0"]
[ECO "B33/09"]

1.e4 c5 2.Nf3 Nc6 3.d4 cxd4 4.Nxd4 Nf6 5.Nc3 e5 6.Ndb5 d6 7.Nd5 Nxd5 8.exd5 Ne7 9.c4 a6 10.Qa4 1-0
This represents a 10-turn win at an unknown event. The “ECO” field shows which opening was used (a Sicilian in the game above).
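If you want to poke at the bracketed tag pairs before converting anything, something like the following should do. It is only a sketch: pgn_tag is a made-up helper name, and it assumes each tag pair sits on its own line, which is how PGN files are normally laid out.

```shell
# Hypothetical helper (not part of any tool mentioned in this post):
# print the value of one PGN tag, e.g. ECO or White, for every game in a file.
# Assumes one [Name "value"] tag pair per line.
pgn_tag() {
    tag="$1"
    file="$2"
    # Keep only the text between the quotes; the trailing .* also swallows
    # a DOS-style carriage return if the file has one.
    sed -n "s/^\[$tag \"\(.*\)\"].*/\1/p" "$file"
}
```

For example, pgn_tag ECO TEN.PGN | sort | uniq -c | sort -rn | head should give a rough idea of which openings show up most often in the collection.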
Unfortunately for us, MongoDB doesn’t import PGNs in their native format, so we’ll need to convert them to JSON. I found a PGN->JSON converter in PHP that did the job here. Scroll down to the “download” section to get the .zip.
It’s one of those zips that vomits its contents into whatever directory you unzip it in, so create a new directory for it.
So far, we have:
$ mkdir chess
$ cd chess
$
$ ftp ftp://ftp.pitt.edu/group/student-activities/chess/PGN/Collections/ten-pg.zip ./
$ unzip ten-pg.zip
$
$ wget http://www.dhtmlgoodies.com/scripts/dhtml-chess/dhtml-chess.zip
$ unzip dhtml-chess.zip
Now, create a simple script, say parse.php, to run through the chess matches and output them in JSON, one per line:
<?php

require("PgnParser.class.php");

$parser = new PgnParser("/path/to/chess/TEN.PGN");
$total = $parser->getNumberOfGames();
for ($i=0; $i<$total; $i++) {
    echo $parser->getGameDetailsAsJson($i)."\n";
}

?>
Run parse.php and dump the results into a file:
$ php parse.php > games.json

Now you’re ready to import games.json.
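Before importing, it may be worth checking that the converter produced one line per game. Here is a rough sketch; the function names are invented for illustration, and it assumes every game in the PGN file starts with an [Event ...] tag, which holds for the sample above but might not for every collection.

```shell
# Hypothetical sanity check: compare the number of games in the PGN file
# with the number of non-empty lines in the JSON output.

# Count games, assuming each one begins with an [Event "..."] tag line.
count_pgn_games() {
    grep -c '^\[Event ' "$1"
}

# Count non-empty lines, i.e. one per JSON document.
count_json_lines() {
    grep -c . "$1"
}

# Print both counts and succeed only if they match.
check_counts() {
    pgn_count=$(count_pgn_games "$1")
    json_count=$(count_json_lines "$2")
    echo "PGN games: $pgn_count, JSON lines: $json_count"
    [ "$pgn_count" -eq "$json_count" ]
}
```

Running check_counts TEN.PGN games.json prints both numbers and returns non-zero if they differ. If they match, the file should be in the one-document-per-line shape that mongoimport reads by default.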
Back to the original “hacking” post
This entry was posted by Kristina Chodorow on January 27, 2012 at 8:33 am, and is filed under MongoDB.