Spawning Tool: big features, backend changes, product decisions

The past few days have been very productive for me on Spawning Tool on several fronts. Not only did I work through several features, I also made some important backend changes. Consequently, I have also thought about future development, product, and business concerns that I think might be of interest to the community.

Let’s start with the backend changes. First, Graylin updated sc2reader to 0.6.0, which brought 2 important changes for Spawning Tool: support for patch 2.0.10 replays and support for GameHeart replays. See my last post for what GameHeart parsing entailed. Importantly, this brought in ASUS ROG Summer 2013 replays and corrected data from various other replays. Second, I migrated from a MySQL to a PostgreSQL database tonight. This change won’t impact end-users much (other than some load times), but it makes me feel better about the data integrity.

Next up are the new features. First, you can now view counts of the abilities cast in a game, so you can see that Protoss players typically use Psionic Storm about 2.67 times a game, whereas they can expect 1.72 EMPs and 1.37 Snipes against them. The data is pretty rough when summarized from so many replays like this, but hopefully more specific queries will yield more interesting results.

You can also specify times to capture particular counts, so you can see that on average, 12.32 SCVs are built in the first 5 minutes of a game. You can now also filter replays by the date played so you can be sure to only consider the latest and greatest replays from your favorite players.
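
For the curious, here is a minimal sketch of how a time-limited average like that can be computed. The event layout (seconds, unit name) and the sample numbers are made up for illustration; this is not the actual Spawning Tool code or data format.

```python
from statistics import mean

# Hypothetical build-order events per replay: (seconds into the game, unit name).
# Both the layout and the values are assumptions for illustration only.
replays = [
    [(0, "SCV"), (17, "SCV"), (120, "Marine"), (290, "SCV"), (310, "SCV")],
    [(0, "SCV"), (17, "SCV"), (34, "SCV"), (51, "SCV"), (400, "SCV")],
]


def count_before(events, unit, cutoff_seconds):
    """Count how many times `unit` appears strictly before the cutoff."""
    return sum(1 for t, name in events if name == unit and t < cutoff_seconds)


def average_count(replays, unit, cutoff_seconds):
    """Average the per-replay counts across all replays."""
    return mean(count_before(events, unit, cutoff_seconds) for events in replays)


# Average number of SCVs built in the first 5 minutes across the sample replays.
avg_scvs = average_count(replays, "SCV", 5 * 60)
print(avg_scvs)
```

The same two functions cover ability counts as well: treat each cast as an event and pass a cutoff longer than any game.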

So all of these changes have me looking forward at the direction of Spawning Tool. Over the past few months, I have been talking a lot with ChanmanV, who has provided product direction for Spawning Tool. The current feature set represents what we consider a Minimum Viable Product (MVP) for showing what basic statistics we can pull out of replays. It’s been a strange road, but Spawning Tool today is far more developed than I imagined when I first sat down pulling an all-nighter to just build something. I originally envisioned it as a proof of concept and technical achievement for the community to get excited about. Today, it’s a product with potential value, and I need to re-evaluate it in that context.

The primary change is that the Spawning Tool site is no longer open source. It was originally open source because it was supposed to be a demonstration for the community, and I hoped to pick up collaborators along the way. Well, I didn’t find many collaborators (though I’m still open to it; email me if you’re interested!), and between forums, code contributions, and the still open source spawningtool parser, I hope I’m doing enough. There are lots of great reasons to open source things, but today, the site doesn’t quite fit.

The other aspect is that I need to start looking forward towards a business plan. Before I scare you off, know this: I intend for Spawning Tool to always be free for public, individual use. The big picture goal of the site is to make quantitative thinking pervasive in the StarCraft community, and I don’t want to change that. I thankfully have a day job that I really like, and this is something that I’m doing for fun and for the community on the side. That said, there are server and development costs to cover. Those are currently trivial (~$30/month), but they will increase with scale. I’m thinking about a lot of different ideas, so let me know if you have any thoughts on it. Hopefully we can get bigger, commercial entities interested and have them throw us some change, but nothing is locked down.

As usual, leave a comment or email me (kevin@kevinleung.com) if you have any thoughts. And if you’re here for my StarCraft strategy content, I’m working on updating my Protoss strategy guide right now. I had forgotten how much cheese you get from Zerg in real laddering, like this 8 Pool. I’ll try to cater the content to this type of play.

Spawning Tool updates: units killed, split win rates

3 big updates for Spawning Tool this week.

First, you can see units killed and units lost in the research tab. For example, you might be curious to know how many Marines a certain player killed during Dreamhack Valencia or how badly their workers are harassed on average. These stats, of course, are very dependent on how long a game goes and what the unit compositions are, but they’re in there and maybe fun to consider.

Second, I split out the win rates data onto its own page from research, so for any of your favorite filters, you can see how the games go. The most obvious use is to see how a player does in different matchups. It may be unsurprising how good JvZ is (the other numbers may be a bit off because of GameHeart replays; more on that next paragraph), but you might be surprised to see that TLO wins 80% of his games from 21 to 25 minutes long, but only wins 16.7% (1-5) of games longer than that. How strange.
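
If you want to try this kind of split on your own results, here is a rough sketch of bucketing win rates by game length. The records are invented for the example (they are not TLO’s games), and the 5-minute bucket width is just an assumption that matches the "21 to 25 minutes" range above.

```python
from collections import defaultdict

# Hypothetical records: (game length in minutes, whether the player won).
games = [(22, True), (24, True), (23, True), (21, True), (25, False),
         (28, False), (31, True), (35, False)]


def bucket_label(minutes, width=5):
    """Label buckets like '21-25' so that minutes 21..25 fall together."""
    start = ((minutes - 1) // width) * width + 1
    return f"{start}-{start + width - 1}"


def win_rates(games, width=5):
    """Return {bucket label: fraction of games won} for each non-empty bucket."""
    tally = defaultdict(lambda: [0, 0])  # label -> [wins, total games]
    for minutes, won in games:
        entry = tally[bucket_label(minutes, width)]
        entry[0] += int(won)
        entry[1] += 1
    return {label: wins / total for label, (wins, total) in tally.items()}


rates = win_rates(games)
print(rates)
```

Small buckets get noisy fast (a 1-5 record is only 6 games), so it is worth printing the game counts next to the percentages.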

Third, GameHeart support is coming soon. GameHeart games are unusual for a few reasons. First, the players and teams aren’t reflective of the actual matchup because the actual players are only picked in the in-game lobby, so the observers are mixed into the data and races may not be accurate. Second, all events are mistimed because the actual game doesn’t start until the in-game lobby is set. Finally, because the teams are misaligned, the winner may not be marked correctly.

So the fix is coming soon, and I can explain the process. I mentioned before that Spawning Tool is actually built on top of an open source replay parser. Well, Spawning Tool is actually built on top of sc2reader, an amazing open source library that literally turns 1s and 0s into comprehensible data structures. Since many people may be interested in having GameHeart replays normalized to work like standard replays, this functionality should be added to sc2reader (not spawningtool) since it will have better reach that way. sc2reader was cleverly built to allow for plugins for specific functionality. Since sc2reader is an open source project, I wrote up a plugin to normalize GameHeart replays and submitted it to the maintainers of sc2reader to be incorporated into the primary project. If they like it, they can accept the request, and it will be available for everyone (including me) to use. Anyways, get excited about that because there are a lot of Spawning Tool replays that aren’t quite working correctly.
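
To make the idea of normalizing concrete, here is a toy sketch of two of the fixes described above: dropping observers and re-basing event times to the real start of the game. The `Participant` and `Replay` classes, the `is_observer` flag, and the `real_start` field are all hypothetical stand-ins; this is not the sc2reader API or the plugin I submitted, and a real replay needs heuristics to find both the observers and the real start.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    name: str
    is_observer: bool  # assumed to be known already; detecting it is the hard part


@dataclass
class Replay:
    participants: list
    events: list       # (seconds, description) pairs
    real_start: int    # second at which the actual game began (assumed known)


def normalize(replay):
    """Return a new Replay with observers dropped and event times re-based."""
    players = [p for p in replay.participants if not p.is_observer]
    events = [(t - replay.real_start, desc)
              for t, desc in replay.events
              if t >= replay.real_start]  # discard lobby-phase events
    return Replay(participants=players, events=events, real_start=0)


raw = Replay(
    participants=[Participant("PlayerA", False), Participant("Caster", True),
                  Participant("PlayerB", False)],
    events=[(5, "lobby setup"), (40, "SCV"), (57, "SCV")],
    real_start=40,
)
clean = normalize(raw)
print([p.name for p in clean.participants], clean.events)
```

Fixing the winner would follow from the same cleanup: once only the real players remain, the result can be matched against them rather than against the lobby teams.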

And on that note, if you’re interested in working with StarCraft data (whether replays, profiles, or anything else), I recommend you check out the new API and Data Analysis Forum that Blizzard made. As you might have guessed, the StarCraft developer community has some amazing people at the forefront, and I think it’s a supportive and generous group of people. If you’re interested in contributing or have an idea, chime in there or check out and contact someone else with a project in this thread. No matter where you’re coming from, we could always use a helping hand.

Some interesting stats before Dreamhack

Dreamhack Valencia is coming up this weekend, and in preparation for that, I fired up Spawning Tool for some analysis. Check out a few observations I made in a TL post http://www.teamliquid.net/forum/viewmessage.php?topic_id=421974

Because Dreamhack releases replays after their events, we actually have a ton of data available for each of these players. I went through the players seeded into Group Stage #2 and checked their performance at Dreamhack Summer. Here’s what I found:

1. Lucifron beat Sonder 2-0 using almost the exact same opening in both games. He opened with 2 Reapers into 3 CC and Hellions. The builds only start to diverge around 7:00 (ref)

2. Harstem went 4-4 (50%) in PvZ. He cannon rushed in 3 of his 4 wins, and in none of his losses (ref)

3. TheStC only played TvT in 1 series, going mech in both games against Morrow in by far his longest games (24 and 31 minutes) (ref)

4. Hyun went 7-1 (87.5%) in ZvT (ignore the issues parsing GameHeart games). The games lasted between 9 and 19 minutes, with his 1 loss coming against TheStC in the longest game. His 3 fastest wins, unsurprisingly, came off of Roach-Zergling all-ins (ref)

5. StarNaN went 0-6 in PvZ. Granted, it was against Hyun, Life, and Pig, but YugiOh isn’t going to be much easier (ref)

6. Stephano went 6-2 (75%) in ZvZ, losing only to Hyun. He doesn’t build very many Roaches but does build a lot of Zerglings (ref)

7. Tefel went 8-1 (89%) in ZvP, losing only to Harstem’s cannon rush. He went Roach/Hydra in ALL of these games (ref)

8. NightEnD went 3-1 (75%) in PvZ, using a Gateway Expand build in all of them (ref)

I thought those were pretty interesting, but they also show off some of what you can do with the research tool. You might have noticed that this looks a lot different from the old research tool. After getting some feedback about the valuable parameters and developing an approachable interface, I scaled back the page a bit. The fancier bits about searching over the build orders are still around in advanced search, though I’ll need to revisit that code, too, probably.

Oddly, my work recently has been at 2 extremes. There was a lot of parsing stuff I needed to go through that basically just resulted in more accurate data and low-level extraction of data and tags. On the other extreme, I worked with Julie to get a splash of purple and revised interfaces. The result is that the actual core functionality of the site (somewhere between those 2 ends) didn’t change much, so not too many new features to hawk. Despite that, the site is in much better shape than before.

Anyways, enjoy Dreamhack. I think we’re on the cusp of some really exciting developments with Spawning Tool as we get more replays and hit the core functionality.

Slowing down on content

You might have noticed that I recently have been posting less frequently. I have a few reasons for this.

First, my personal motivation to ladder has dried up. I have fizzled on laddering in many past seasons, and the release of HotS only slightly prolonged my interest. Without actually laddering, I can’t claim to be any sort of authority here.

Second, the HotS meta-game has somewhat stabilized. When I restarted this blog, I was really trying to make some accessible guides in a chaotic landscape and help new players get into multiplayer. Since then, we have seen a few meta-game shifts, and while that will continue, it at least means that there’s a coherent meta-game.

Third, I figured that my StarCraft time is better spent working on Spawning Tool. Truth be told, I don’t know if I was ever authoritative enough to be worth listening to, but Spawning Tool is much more up my alley, and I hope it becomes just as useful.

Of course, this blog isn’t completely abandoned. If I see something really cool, I’ll write it up. I’ll also be putting Spawning Tool updates here. To get your fill, you should check out http://imbabuilds.com/. NoseKnowsAll is a great guy and has been putting together a lot of valuable content.

And a quick Spawning Tool update with 2 big features that I haven’t shared yet. First, the research tool. With it, you can put together more advanced queries for replays based on actual timings from build orders. For example, you might be curious to know how effective DT rushes are in PvP (answer: enough), or what the dangerous timings are in TvZ (answer: Roach all-ins from 12-15 minutes, but oddly enough, not Hellion-Marauder play). So play around with that and see what you learn. And please let me know if you have other criteria you would like to see there. I welcome any enhancements to make this tool very powerful.

Of course, that is all limited by the amount of data available, so the other big improvement is that you can now upload replay packs, and it’ll unzip and upload all of the contents. Spawning Tool is also hosted on its own server now instead of piggybacking on my personal server, so it should be better able to handle the load. Keep that coming, and if you hit any server errors, come back to try again later. I receive emails every time there’s a server error, and I do my best to fix them immediately.

So that’s it for now. As always, feel free to reach out to me with questions, suggestions, and feedback. I’m always down to listen. In the meantime, keep laddering in my place.

Spawning Tool upgrades: authentication and an improved schema

For all Americans, I hope you had a good Memorial Day weekend. If you’re unfamiliar with it, Memorial Day is a federal holiday for honoring departed members of the US armed forces. Given the time of year, many people will go out to grill and enjoy the weather. I did some of that, but I also got started on the new season of Arrested Development and built a number of new features for the Spawning Tool.

First, I added basic authentication so you can log in and upload replays. This mechanism isn’t particularly interesting, but it’s necessary to restrict who can edit the tags on a replay. I once thought it best if anyone could edit any tags, but I think that’s inviting vandalism. If anyone feels strongly about adding tags to other people’s replays, we’ll come up with a mechanism to enable that.

Second, I improved the Browse Replays interface. You can now do text search and toggle a few options. The actual list of replays is now much more readable thanks to built-in bootstrap styles.

Third, I actually incorporated more parsed data from the replay into the actual database schema for the site itself. The full explanation requires some context.

The Spawning Tool is primarily composed of 2 parts. First, there’s the spawningtool tool, which actually goes through the replay data and pulls out the build order. The result of this tool is basically just a bunch of structured text. Second, there’s the spawningtool site, which gets this text and turns it into a presentable website.

Previously, I was just taking the raw build order data as a whole and re-rendering it from scratch every time. It was easy and was sufficient for displaying builds. The downside was that the spawningtool site was pretty blind to what the data was. I re-implemented it so that the build order is stored in the database itself: instead of a raw build order in text, the build order itself is stored with times, units, and supply counts in the database.

The biggest benefit is coming soon: more advanced filters and statistics from replays. With this data, we can make database queries that translate naturally into questions like, “If you build a Robotics Facility before 7:00 in PvT, how often will you win?” It’s just a matter of finding all replays with builds that fit the criteria, then counting the wins and losses.
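
Here is a minimal sketch of what that kind of query could look like, using an in-memory SQLite database. The table and column names (`replay`, `build`, `build_event`, and so on) and the sample rows are made up for the example and are not the real Spawning Tool schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per replay, per player's build, and per build event.
conn.executescript("""
CREATE TABLE replay (id INTEGER PRIMARY KEY, matchup TEXT);
CREATE TABLE build (
    id INTEGER PRIMARY KEY,
    replay_id INTEGER REFERENCES replay(id),
    race TEXT,
    won INTEGER              -- 1 for a win, 0 for a loss
);
CREATE TABLE build_event (
    build_id INTEGER REFERENCES build(id),
    seconds INTEGER,         -- game time of the event
    supply INTEGER,
    unit TEXT
);
""")
conn.executemany("INSERT INTO replay VALUES (?, ?)",
                 [(1, "PvT"), (2, "PvT"), (3, "PvT"), (4, "PvZ")])
conn.executemany("INSERT INTO build VALUES (?, ?, ?, ?)",
                 [(1, 1, "Protoss", 1), (2, 2, "Protoss", 0),
                  (3, 3, "Protoss", 1), (4, 4, "Protoss", 1)])
conn.executemany("INSERT INTO build_event VALUES (?, ?, ?, ?)",
                 [(1, 390, 36, "Robotics Facility"),
                  (2, 410, 38, "Robotics Facility"),
                  (3, 500, 50, "Robotics Facility"),   # too late to count
                  (4, 380, 35, "Robotics Facility")])  # wrong matchup


def robo_record(conn, cutoff_seconds=7 * 60):
    """Return (wins, games) for Protoss PvT builds with a Robo before the cutoff."""
    return conn.execute("""
        SELECT SUM(b.won), COUNT(*)
        FROM build b JOIN replay r ON r.id = b.replay_id
        WHERE r.matchup = 'PvT' AND b.race = 'Protoss'
          AND EXISTS (SELECT 1 FROM build_event e
                      WHERE e.build_id = b.id
                        AND e.unit = 'Robotics Facility'
                        AND e.seconds < ?)
    """, (cutoff_seconds,)).fetchone()


wins, total = robo_record(conn)
print(wins, total)
```

The point is that once times, units, and supply counts live in their own rows, the question becomes an ordinary filter plus a count.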

I’m very excited about the next steps for the Spawning Tool, and I think that advanced filters are a big part of that. Additionally, I’m working with ChanmanV to make the Spawning Tool a replay archive for practice games. Tune in at http://youtu.be/vstWWo0Gmao?t=29m30s for more about that.

One last thing: I’m in the process of moving the Spawning Tool off of my personal server and onto Amazon Web Services. As such, there may be some downtime in the near future. Deployment might be rough, but I think this will ultimately lead to a much more stable environment.

Tagging Added to the Spawning Tool

It’s funny when StarCraft personalities mention their league promotions since they inevitably are lower than you would expect them to be. I believe TotalBiscuit recently made Platinum, and Husky made Master, though for all I have heard them talk about StarCraft in the past, I always thought they were at least Masters. Well, you all should know by this point that I’m happily in Diamond, and between content for this blog, following Proleague and Fantasy StarCraft, and the Spawning Tool, I’m beginning to see why these personalities don’t play much. There just isn’t that much time.

The most recent task (and first major feature added to the Spawning Tool since launch) was adding tags to replays, which I completed Monday night. If you look at a replay, you can see a few gray boxes with red text, and those give context and metadata for the replay. Moreover, you can now browse replays by filtering by tags, so if you’re interested in finding all PvT games, you can easily narrow in on those replays.

Moving forward, I hope that tags become the primary method of organizing replays on the site. There is a lot of metadata present in the replays, which I could have (and still may) spit out into specific fields in the database, but I think tags encompass that feature and allow for more flexibility. Note that tags are filed under various categories, which I think should also improve organization. It’s easy to label a replay based on the map it was on, the event it was from, the players in the game, the build orders used, and so forth. I’m not exactly sure what tags are most useful for searching, but it should hopefully be self-organizing.

The strengths of flexibility and community input are also its biggest weaknesses. Because tags can be anything, we could see a lot of strange, esoteric, and unnormalized tags out there. For example, “My Favorites” isn’t helpful because it’s specific to a user, “1 Barracks Expand into Medivac Drops into Early 4th into Late Game Reaper” isn’t helpful because it’s just too specific, and having both “1 Rax Expand” and “1 rax expand” leads to fragmentation*. Currently, the best solution I have is to include auto-complete in inputting tags so that you are guided to the right result.
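
As a rough illustration of the auto-complete idea, here is a sketch that matches what you type case-insensitively but always suggests tags in the form they were originally stored. The function name `suggest_tags` and the sample tag list are made up for the example; the site’s actual implementation may differ.

```python
def suggest_tags(prefix, existing_tags, limit=5):
    """Case-insensitive prefix match that returns tags in their stored form."""
    needle = prefix.casefold()
    matches = [tag for tag in existing_tags if tag.casefold().startswith(needle)]
    return sorted(matches)[:limit]


# Hypothetical tags already in the database.
existing = ["1 Rax Expand", "Roach/Hydra", "HerO", "herO", "1 Gate Expand"]

print(suggest_tags("1 rax", existing))  # guides the user to the existing spelling
print(suggest_tags("hero", existing))   # both players remain distinct choices
```

Matching loosely while storing tags untouched avoids rewriting anyone’s input, so punctuation and capitalization that carry meaning survive.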

Because I’m counting on community input, tags are also open to all users, registered and anonymous. I would like users to be able to upload replays anonymously, and they should be able to tag them as well. This does lead to an odd asymmetry in that I haven’t yet built deletion or editing of tags since that could lead to massive vandalism. I’m toying around with ways to split up permissions based on whether you’re registered or not, but I’m open to ideas.

Anyways, please take a look at the site, play around with tags, and let me know what you think. After the big hit on reddit, traffic has died down a lot, and I’m okay with that. The proof of concept came quickly, but I imagine it’ll be about a month of development before I make another big push for people to start using the site.

In the meantime, I was wondering whether I should put the word “Beta” on the site somewhere. Notice that the front page has a big disclaimer on it for basically the same purpose, but I think “Beta” has been overused and overextended heavily, and I kind of want to fight my little fight. On the other hand, it’s exactly the right description for the Spawning Tool while it’s still in heavy development. Vote below if you have an opinion.

Should I slap "Beta" on Spawning Tool?

  • Yes (75%, 3 Votes)
  • No (25%, 1 Votes)

Total Voters: 4

* I considered normalizing all input by removing punctuation and reducing it to lowercase, but 2 counter-examples came up. First, “Roach/Hydra” is much easier to read than “roachhydra”, and second, “HerO” and “herO” are 2 different players.

Flash build orders from Proleague (TvZ Hellbat drops!)

At some point, my friend George told me something like, “It just doesn’t feel like real StarCraft unless it’s Proleague.” Well, thanks to Fantasy Proleague, I have really gotten into Proleague, and it’s just about all I watch.

The timing is not terrible. Friday and Saturday nights work well, and I’m willing to stay up Sunday/Monday night until 1 watching games. The gaps between games give me time to take care of chores and get ready for bed. But of course, the real draw is that these are the best players playing high stake games and are willing to do all sorts of crazy all-ins, map-specific strategies, and builds tailored for the matchup.

This past weekend was the start of round 5, which is an all-kill round. Instead of each team picking a full lineup of 7 players, the winner of each match stays on while the losing team picks another player. There’s more variance for Fantasy StarCraft, and you don’t get as much exposure to new players, but the aces come out for more games. Innovation looked good in an all-kill of EG-TL, but I think the bigger bounce-back was Flash, who did poorly last round and got knocked out of his GSL Code S group of death (Life, Innovation, Parting, and Flash).

I have a few Flash builds written up. Let’s jump into them.

Flash’s Standard TvP

(v. Terminator on Neo Planet) (Twitch, Youtube)

  • 10 Supply Depot
  • 12 Barracks, Refinery
  • 15 Reaper, Orbital
  • 17 Command Center, Reactor
  • 18 Supply Depot
  • 19 Bunker
  • 20 Engineering Bay
  • 20 Marine x2 (continuous)
  • 27 5:20 +1 attack
  • 30 5:45 Barracks
  • 6:20 Barracks
  • 6:35 Refinery
  • 6:50 Tech Lab
  • 7:10 Factory
  • 7:15 Refinery
  • 7:20 Stimpack
  • 7:30 Tech Lab
  • 7:55 +1 armor
  • 8:15 Reactor (on Factory), Starport
  • 8:35 Combat Shield
  • 9:15 Medivac x2
  • 9:40 Armory
  • 9:45 pushing out, delaying Protoss 3rd
  • 10:00 Command Center

This build should look pretty normal: Reaper opening into Bio. The funny part of this build is the very early Engineering Bay. This lets him get out +1/+1 upgrades for the usual 10-11 minute push. Otherwise, this should be reassuring since it looks like Apollo’s progression.

Flash’s TvZ Hellbat Drops into Mech
(v. Roro on Akilon Wastes) (Twitch, Youtube)

  • 10 Supply Depot
  • 14 Command Center
  • 15 Barracks
  • 16 Refinery
  • 19 Marine, Orbital Command
  • 22 Refinery, Factory
  • 23 Reactor
  • 26 Supply Depot
  • 27 Bunker
  • 29 Starport
  • 6:00 Swap, Hellion x2 (+ a few more), Marine
  • 6:10 Armory
  • 6:35 Medivac (continuous)
  • 7:15 Hellbat x2 (continuous)
  • 7:30 Factory, Tech Lab (will swap onto Factory)
  • 8:00 First drop out, Refinery
  • 8:45 Siege Tanks start
  • 9:00 Command Center
  • 9:20 2nd drop arrives, Refinery
  • 9:35 +1 Attack
  • 10:05 Factory

I myself have been looking for a Hellbat drop build, and it looks roughly like what I should have expected. CC first if you like, or maybe don’t: I certainly won’t. Otherwise, it looks like Polt’s drops, but you add in an Armory for the Hellbats. Disclaimer: none of us can micro like Flash, but Flash was continuously dropping at 2 locations. The Starport and Reactored Factory are an endless stream of drops, and even Roro was losing tons of workers. This is a great way to punish quick 3 bases from Zerg.

Flash’s TvT Mech
(v. Reality on Whirlwind) (Twitch, Youtube)

  • 10 Supply Depot
  • 12 Barracks
  • 15 Refinery
  • 16 Marine, Orbital Command
  • 17 Supply Depot
  • 17 Marine
  • 19 Marine
  • 20 Command Center, Reactor
  • 22 4:20 Factory, Supply Depot
  • 23 4:50 Marine x2
  • 5:20 Starport, Refinery, Swap Reactor, Hellion x2 (continuous)
  • Orbital Command
  • 6:10 Viking (defend drops)
  • 6:50 Tech Lab (Barracks)
  • 7:20 swap onto Starport, Raven
  • 7:50 fends off a drop and run-by (takes some damage)
  • ? 9:00 Hellion pressure
  • 8:30 Command Center
  • 9:00 Armory x2
  • 9:45 Refinery x2
  • 9:50 Factory x2
  • 10:30 +1/+1 vehicle
  • 11:00 Siege Tanks x2, Hellbats x2 start
  • 11:20 Engineering Bay
  • 12:00 Medivacs, Missile Turrets

I don’t have much to say about this build: it kind of speaks for itself. I have just 2 things to draw attention to. First, there’s the early Raven. I have been looking at ways to integrate a single Raven into my own builds (great in TvZ for clearing creep, right?), so I guess the answer is just to go for it. Second, Siege Tanks start really late for mech play. Barring any signs that you need the defense, I guess it’s okay to go up to 3 bases before getting Tanks. I guess it worked in this game because Reality revealed his hand in the 7:50 Marine/Hellion pressure, but even so, that’s really late.

I hope you like the builds. To be honest, all Terran play is starting to look very similar to me, but I guess that makes it easier to put together a guide. On that note, I heard that Apollo will be doing another set of videos, so you should look forward to those. On a related note, I’m feeling good enough to write up a Terran guide at this point. Maybe you can look forward to that.

Finally, Blizzard is getting serious about replay analysis, and I’m very excited. They apparently have enhanced the data provided and released an open source library for parsing that data. I have already put together a tool for extracting build orders that I’m calling the Spawning Tool, so check that out when you get a chance. Note that this is primarily just a proof of concept, so there are a lot of bugs. I would go into more detail, but that’s not really what this post is about, so expect more news about that soon!

How quantitative analysis could change StarCraft

The community likes to think that eSports is on the cutting edge of competitive play, but we still have much to learn from conventional sports. I don’t know much about the production and marketing side of sports, but I do know some statistics, and StarCraft, at least, is lagging behind conventional sports tremendously in quantitative analysis.

The only significant statistics I see from StarCraft are 1) win percentages in various circumstances and matchups and 2) Actions Per Minute (APM).* Win percentages are very broad metrics and not particularly instructive. APM is generally regarded as misleading at best and irrelevant at worst. Granted, StarCraft is a complicated game: the sides are often asymmetric, and game length varies. Sc2gears tracks many more statistics, but these haven’t become standard for broadcasting and analysis, whereas conventional sports broadcasts almost always feature statistics. Even Fantasy StarCraft discussions are pretty fuzzy, whereas fantasy football and baseball really are sports fans geeking out over numbers. Generally, StarCraft analysis is qualitative.

One of the coolest advances in conventional sports is computational, normative analysis. Today, games are tracked with better equipment, and by combining that data with advanced statistics, we can make predictions about what players should do in various circumstances. Because baseball is basically turn-based, it already has advanced sabermetric analysis (link to a reddit discussion about this). Basketball, however, has also been making strides in this area, according to this recent story from Grantland.

Hopefully you’re familiar with basketball, but if you’re not, it’s a 5 on 5 sport played on a (usually) indoor court. On opposite ends of the court, there are hoops, and each team’s goal is to shoot the basketball into their target hoop. On offense, teams design specific plays, and execution is key. On defense, however, teams have general schemes and react to what the other team is doing. Given that, it’s always been assumed that defensive skill is all about experience, “smarts”, and other intangibles.

Well, new analysis is starting to give us more concrete ways to understand defense. A new camera-tracking system in the NBA called SportVU can track where players are, and that data is turned into X-Y coordinates for clean video footage. And it gets even better. With significant computational analysis, the Toronto Raptors have come up with the “ideal” defense that minimizes the expected point value of a play**. You can watch the videos in the Grantland article where there are 2 sets of defenses super-imposed on the play: the actual defenders on the play, and the “ghost” defenders of where the players should be.

Hopefully you’re beginning to see how this analysis can impact StarCraft. Fortunately, we already have all of the relevant data for unit positions in replays. If we can figure out how to parse expected outcomes from a large number of these replays, then we can begin to see general trends. Watching professional play, big deathball fights often come down to positioning. Is it safe to fight in this open area? Can you safely attack this base without getting trapped? How should you position your army to get the best engagement? Which units should be in front? These are similar questions to what the Raptors are answering in basketball.

It’ll be a lot of work to make this work. Specifically, it’s very difficult to parse meaningful actions out of a stream of data. The Raptors managed to recognize a pick and roll (one offensive player stands beside another defender, allowing the ball carrier to run around them. The first offensive player then goes in the opposite direction, hopefully resulting in confusion between the 2 defenders and leaving 2 open players). It may sound simple, but that’s darn hard, and I find that amazing.

Anyways, I think there’s a huge opportunity here for growth in eSports and a way for us to remain at the cutting edge of sports analysis, and even Artificial Intelligence at that. And there’s a tremendous amount of really interesting stuff that I would have to investigate and share, if you guys are interested. So before you head off, let me know in the poll below if you would be interested in me writing any of the following.

Which of the following topics should I elaborate on?

  • Just stick with the build orders, buddy (40%, 4 Votes)
  • Machine learning for event parsing and predictions (30%, 3 Votes)
  • Speculation on useful statistics for StarCraft (20%, 2 Votes)
  • Training AI to play StarCraft (this is tangentially related to this post) (10%, 1 Votes)
  • Advanced statistics and sabermetrics from baseball (0%, 0 Votes)

Total Voters: 8

* If you know of more, please let me know. I’m interested.

** I’m not 100% sure how they do this in basketball, but I can explain how this is done in baseball in another post if you want