At Spawning Tool, we’re all about labeling StarCraft replays with build orders. We track tens of millions of unit build events, and build order labels are a good way to simplify that data and make it more accessible. Spawning Tool collects build order labels from several sources, including:
- machine learning – given many examples and counter-examples of Bio Mine builds, which unlabeled builds look the closest?
- programmatic logic – if there are 2 Hatcheries built before the first Spawning Pool, it’s a “3 Hatch before Pool”
- community-approved suggestions – the yes/no you sometimes see on replays
- community-contributed labels – anything else the community punches in
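To make the "programmatic logic" source concrete, here is a minimal sketch of what a rule like "3 Hatch before Pool" could look like. It assumes a hypothetical replay format where each build event is a `(supply, unit_name)` pair in build order, excluding the Hatchery every Zerg player starts with. The function name and data format are illustrative, not Spawning Tool's actual code.

```python
from typing import List, Tuple

# Hypothetical event format: (supply, unit_name), in the order things were built.
# The starting Hatchery is NOT included, so "3 Hatch" means 2 built Hatcheries.
BuildEvent = Tuple[int, str]


def is_three_hatch_before_pool(events: List[BuildEvent]) -> bool:
    """Return True if exactly 2 Hatcheries are built before the first Spawning Pool."""
    hatcheries = 0
    for _supply, unit in events:
        if unit == "SpawningPool":
            # The opening is decided as soon as the first Spawning Pool appears.
            return hatcheries == 2
        if unit == "Hatchery":
            hatcheries += 1
    # No Spawning Pool in the replay: the rule does not apply (an assumption).
    return False


example = [(13, "Overlord"), (16, "Hatchery"), (17, "Hatchery"), (17, "SpawningPool")]
print(is_three_hatch_before_pool(example))  # → True
```

Rules like this are deterministic, which is why their labels can be trusted more than a "closest match" from machine learning.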
Our certainty about the labels follows the same order as the list above: we are relatively uncertain of labels from machine learning but highly certain of those contributed by hand. The gray area, however, has always been the sources in the middle: how certain do we have to be to add a build order straight into the system rather than just posing it as a suggestion?
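One way to picture that decision is as a routing step: each candidate label either becomes a tag directly or is posed to the community as a yes/no suggestion. The sketch below is purely illustrative. The `LabelSource` names, the `route_label` function, and the 0.95 cutoff are hypothetical choices, not Spawning Tool's real policy.

```python
from enum import Enum


class LabelSource(Enum):
    # Ordered from least to most certain, mirroring the list of sources.
    MACHINE_LEARNING = "machine_learning"
    PROGRAMMATIC = "programmatic"
    COMMUNITY_APPROVED = "community_approved"
    COMMUNITY_CONTRIBUTED = "community_contributed"


# Hypothetical cutoff: an ML label is applied directly only above this score.
AUTO_APPLY_CUTOFF = 0.95


def route_label(source: LabelSource, confidence: float = 1.0) -> str:
    """Return "tag" to add the label straight away, or "suggestion" to ask the community."""
    if source is not LabelSource.MACHINE_LEARNING:
        # Rule-based and hand-entered labels are trusted as-is in this sketch.
        return "tag"
    return "tag" if confidence >= AUTO_APPLY_CUTOFF else "suggestion"


print(route_label(LabelSource.MACHINE_LEARNING, 0.7))  # → suggestion
print(route_label(LabelSource.PROGRAMMATIC))  # → tag
```

Where exactly to set such a cutoff is the open question the paragraph above describes.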
After going through the data, we determined that openings are relatively standard, so we have moved most of them to automatic, programmatic labeling. This cleanup added ~9900 new build order tags and removed ~5000 suggestions that were relatively obvious. The build orders currently labeled programmatically are:
- Protoss: Forge Fast Expand, Nexus First, 1 Gate MSC Expand, 1 Gate Expand
- Terran: Proxy 2 Rax, 8/8/8 Proxy Reaper, 14 CC, 15 CC, Reaper Expand, 1 Rax Expand, 1/1/1
- Zerg: 3 Hatch Before Pool, X Pool X Hatch, X Hatch X Pool, X Pool (10 and below)
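The Zerg labels above encode supply counts in their names. As a rough sketch of how such names might be generated, the function below reads the supply at which the first Spawning Pool and first built Hatchery appear. It reuses the hypothetical `(supply, unit_name)` event format; the precedence (an early pool at 10 supply or below wins over the Pool/Hatch ordering) is an assumption for illustration, not the documented behavior.

```python
from typing import List, Optional, Tuple

# Hypothetical event format: (supply, unit_name), starting Hatchery excluded.
BuildEvent = Tuple[int, str]


def first_index(events: List[BuildEvent], unit: str) -> Optional[int]:
    """Index of the first event that builds `unit`, or None if it never appears."""
    for i, (_supply, name) in enumerate(events):
        if name == unit:
            return i
    return None


def zerg_opening_label(events: List[BuildEvent]) -> Optional[str]:
    """Return a name like "9 Pool", "14 Pool 16 Hatch", or "17 Hatch 17 Pool"."""
    pool_i = first_index(events, "SpawningPool")
    if pool_i is None:
        return None
    pool_supply = events[pool_i][0]
    if pool_supply <= 10:
        # Assumed precedence: very early pools get the "X Pool" label.
        return f"{pool_supply} Pool"
    hatch_i = first_index(events, "Hatchery")
    if hatch_i is None:
        return None
    hatch_supply = events[hatch_i][0]
    # Order by position in the build, since supply counts can tie.
    if pool_i < hatch_i:
        return f"{pool_supply} Pool {hatch_supply} Hatch"
    return f"{hatch_supply} Hatch {pool_supply} Pool"


print(zerg_opening_label([(9, "SpawningPool")]))  # → 9 Pool
print(zerg_opening_label([(17, "Hatchery"), (18, "Extractor"), (17, "SpawningPool")]))  # → 17 Hatch 17 Pool
```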
Overall, we hope that this greatly increases the quality and accessibility of our build order data. However, the data can always be better, and we would appreciate suggestions for more build orders that we can add to the system. Our goal is a broad taxonomy of build orders that lets us label all of them and understand how they relate to one another.
And even if you don’t have any new ideas, we would appreciate help approving existing suggestions. We still have over 10,000 undecided suggestions, and all of you should feel welcome to adjudicate them.