Earlier this year, Spawning Tool added co-op replay parsing and build orders on the Spawning Tool website. As usual with programming, the approach was clear, but I ran into many minor issues, and in the end, it required an entire code rewrite.
So this post will dive into that process. Although the content is technical, it will hopefully make sense to non-programmers as well. For the programmers out there, you can see the actual code in this pull request.
Adding Co-op Game Data
The first step to parsing replays is turning the 1s and 0s of the .SC2Replay file into something meaningful to humans. Thankfully, the open source sc2reader library does that for me: the data comes out structured as game events and units. However, it still takes some massaging to turn it into a build order. That work happens in the open source spawningtool replay parsing library, which is separate from the Spawning Tool website that uses the results of that parsing.
The biggest obstacle is that the game events are slightly different from how we write build orders. The cleanest data in the replay is when units appear on the screen and when upgrades are done. However, we don’t write build orders with timings of when the Marine or Metabolic Boost is done: we write build orders for when you started building something. To account for that, we need to code in the build time for all units. Then, we just do the math to figure out when it started. Before co-op parsing, the data looked like:
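A sketch of one such entry (the field names are my assumption of spawningtool's format and may not match the actual file):

```python
# Sketch of a pre-co-op build-data entry; field names are assumed.
BUILD_DATA = {
    'WarpPrism': {
        'build_time': 36,                    # seconds to build
        'built_from': ['RoboticsFacility'],  # producing structure
    },
}
```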
That says that a Warp Prism takes 36 seconds and can be built from the Robotics Facility. The building is important because we also need to know if the building was Chronoboosted.
We have entries like this for every unit and upgrade available on the ladder, which takes about 900 lines of code for LotV. A fun fact about the data is that we don’t need build times for buildings, because buildings appear on the screen when they start building (unlike, say, Zerglings, which only appear after the egg finishes hatching).
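With the build time in hand, the math for the start time is just subtraction; a minimal sketch (the helper name here is mine, not the library's):

```python
def started_at(finished_at, build_time):
    """Back out when construction began from when the unit appeared on screen."""
    return finished_at - build_time

# e.g. a Warp Prism seen finishing at 2:00 (120s) was started at 1:24 (84s)
```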
Similarly, we had to add build data for every Co-op unit and upgrade as well, and with 15 commanders, there was a lot of it. In addition to gathering the same data, I added more relevant info to arrive at entries that look like this:
```python
# Reconstructed sketch of a co-op entry; the unit and values here are
# illustrative, and some field names are assumed.
'Adept': {
    'build_time': 30,
    'built_from': ['Gateway', 'WarpGate'],
    'display_name': 'Adept',
    'race': 'Protoss',
    'type': 'unit',
    'is_morph': False,
},
```
The “display_name” converts the internal game unit name into what you see in-game. The entry also marks the race, the type, and whether it is a morph. Without getting into details, we previously maintained this data only in the Spawning Tool website, which did another processing step to turn spawningtool parsing output into readable build orders. It made more sense to centralize all of that conversion into a single step in the parsing library.
Also, we split these entries out per Co-op commander: instead of having a global list of units, each commander gets their own list, because the same unit in the data can actually be a different unit. For example, Alarak’s “Stalker” is actually called a “Slayer.” Different commanders also have different build times for the same unit: Swann’s mech units, for example, build faster than everyone else’s.
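A hedged sketch of what that per-commander structure looks like (the nesting and field names are assumed, and the build times are illustrative):

```python
# Per-commander unit data: the same internal unit name can map to a
# different display name and build time depending on the commander.
COMMANDER_BUILD_DATA = {
    'Alarak': {
        'Stalker': {'display_name': 'Slayer', 'build_time': 23},
    },
    'Swann': {
        'SiegeTank': {'display_name': 'Siege Tank', 'build_time': 23},
    },
}
```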
After these changes and accumulating the data, the co-op unit data took about 4200 lines of code! That wasn’t all written by hand, but there was a lot of manual testing required. I never figured out a good way to pull game data directly from the StarCraft client program, so I played a lot of test games where I had to make exactly one of each unit, structure, and upgrade (and trigger every ability), then look at the parsing output and compare it to watching the replay.
But by the end, I had all of the raw data put together and just needed to use it correctly.
Rewriting the Replay Parsing Logic
Even after I had the game data in, there was a lot more work to properly parse the replays. For some rough statistics, the parsing file was about 650 lines of code. To add co-op support, I wrote roughly another 650 lines of code and deleted 450 lines of code, so it was a big change.
Conceptually, the biggest change was rewriting the code from a series of instructions (known as “imperative programming”) to a structure that tracks state and modifies it at different steps (known as “object-oriented programming”).
For the technical readers: I had to do this because co-op requires maintaining a lot more state that I didn’t want to pass around from function to function, so I put everything into a class with instance variables so that the state was always in scope.
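As a sketch of that shift (the class and method names here are mine, not the library's actual API):

```python
class CoopReplayParser:
    """Shared state lives on the instance, visible to every parsing step."""

    def __init__(self, commander, frames_per_second=16):
        self.commander = commander                   # chosen per player
        self.frames_per_second = frames_per_second   # co-op uses Blizzard time
        self.build_order = []

    def frame_to_seconds(self, frame):
        # Any step can read shared settings without threading arguments.
        return frame // self.frames_per_second

    def record_unit(self, name, frame):
        self.build_order.append((self.frame_to_seconds(frame), name))
```

Each parsing step reads and writes the shared instance state instead of receiving it as a pile of arguments.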
For the non-technical, let me explain by analogy.
Let’s say you have an assembly line to build two bicycles: a men’s bike and a women’s bike. On the assembly line, you have specialized workers doing different tasks: the first guy welds the triangle of the frame, the second guy attaches the handlebars, the third guy grabs a wheel and puts on a tube, and so on. If you only make two types of bicycle, each person can tell the next guy whether it’s a men’s or women’s bike, and they will know what to do. This is roughly what it was like parsing LotV replays: each part of the program could depend on most replays being mostly the same, so it was easy to just work with the data passed from part to part.
However, let’s say that instead of making 2 types of bikes, you start making, say, 16 types, and each one is customized by the buyer. Now, each worker on the assembly line has to know a lot more about the final outcome and what building decisions have been made so far. The first guy figures out the measurements for the height of the bike to make the frame. The second guy has to use different-length handlebars depending on the possible accessories and adjust them for the frame. And so forth. Now, if each worker just passed along a note with the customizations, it would be a pain to write all of that down and hand along an ever-growing note.
This is roughly what we have to do for Co-op replays, because Co-op replays are very different not only from each other but also from LotV replays. For example, they have different data sets depending on the commander (see above) and commander masteries that can vary by game and player. So instead of passing notes, we can put all of this knowledge on, say, a big whiteboard in front of the entire assembly line, where everyone can see it and add notes as they go.
And that unfortunately took a lot of rewriting, but it worked, and the code is a lot easier to read. You can see the changes here.
Other Technical Challenges
Those two projects were the bulk of the work. There were some changes to the website to use the new data, but that actually wasn’t too bad overall. I’ll call out just a few more interesting challenges.
First, and perhaps most frustrating, was that co-op games still use Blizzard time. With LotV, Blizzard fixed the clock so that “fastest” corresponds to real-time, and that is 22.4 frames per second (FPS). However, co-op still runs on Blizzard time of 16 frames per second. I was clever enough to use a constant for the FPS in the codebase, so when I swapped HotS for LotV, I only had to change one line of code to get times to work right. However, I wasn’t clever enough to treat FPS as a variable, so I had to rewrite a lot of code to get that to work.
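In the rewrite, the frame rate becomes a per-replay value instead of a baked-in constant; a sketch of the idea (the names here are mine):

```python
LOTV_FPS = 22.4  # LotV "faster" speed matches real time
COOP_FPS = 16    # co-op still runs on Blizzard time

def game_fps(is_coop):
    # Pick the frame rate per replay instead of relying on one constant.
    return COOP_FPS if is_coop else LOTV_FPS

def frames_to_seconds(frames, is_coop):
    return frames / game_fps(is_coop)
```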
Second, similarly, Co-op games still use an older version of chronoboost that was only temporarily in LotV, so we had to incorporate that logic again.
Third, .SC2Replay files have a “cooperative” flag to tell you whether the game is a co-op game. Oddly, this flag doesn’t work. I reported the bug to Blizzard, but I’m not sure whether it has been addressed, and even if it has, I still have a lot of old co-op replays that need to be marked correctly. As such, the library checks both that flag (in case it starts working) and whether any player has a commander.
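The resulting check boils down to this logic (a sketch; this is not the library's actual function or sc2reader's API):

```python
def is_coop_game(cooperative_flag, player_commanders):
    # Trust the flag in case it is ever fixed, but fall back to
    # checking whether any player picked a commander.
    return bool(cooperative_flag) or any(player_commanders)
```

So a replay with a broken flag is still detected as co-op as long as some player has a commander set.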
Help Out The Open Source StarCraft Community!
If you’re interested, I invite you to contribute to open source StarCraft code. There are a few related projects for parsing replays, including:
I have to admit that the StarCraft replay parsing community actually isn’t very big or active: we keep it going for new versions, but we’re not adding or changing much.
However, the StarCraft AI community is quite active, so you could check out their work as well. They’re a great community to get involved with.
I hope you enjoyed reading about how we got Co-op to work. It took longer than it should have, but we’re quite pleased with the result and hope it’s useful to the community.