Visualizing the Skill Curve

Recently in my Discord, one user, Hambone, linked a study on skill rankings in Blitz Chess, standard Chess, and David Sirlin’s card game Yomi, and on how well those rankings correlate with win ratio. You can read the full study here. From it emerges this amazing chart:

This chart depicts a game’s consistency across skill levels, and it spectacularly illustrates that there are certain bottlenecks where consistency goes up and down. (For those with color blindness: Player 2 winning is shown in yellow and losing in dark blue. Slight wins are orange, slight losses are light blue, and 50:50 is teal.) It’s worth noting that the skill rating of an average player is 1200.

From this chart we can intuitively draw a number of conclusions, but first let’s make some observations. Yomi is less consistent across all skill levels than either variant of chess. Chess has a short stretch near the bottom skill levels where better players very consistently beat worse ones, then a free-for-all around the mid-low levels, and another bottleneck from the higher to the top levels of skill. We see a mild version of this trend even in the Yomi chart.

From this we can conclude that aspects of these games make them more or less consistent. From personal experience, I’m going to put forward that the big thing that makes a game consistent is execution testing, a style of game that I call an “efficiency race” (e.g. racing games, or any game that directly compares a skill that depends exclusively on you and nothing else). The things that make a game less consistent are randomness and unweighted rock paper scissors (e.g. games of chance, and games with hidden information where you directly interact with your opponent). At the extreme, there are games where you cannot become consistent at all, such as a pure coin toss. The graph for such a game would be teal (50:50 odds) across the entire chart. A hypothetical perfectly consistent game, where the better player always wins, would be a clean split of yellow and dark blue along the diagonal center line, with almost no teal.
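To make those two extremes concrete, here’s a toy model. It is only an illustration and is not taken from the study: the function names, the `randomness` knob, and the choice of Player 1 on the rows and Player 2 on the columns are all my own assumptions. It mixes a fair coin flip with a “higher rating always wins” rule and builds the grid of Player 2 win chances that a chart like this would color in.

```python
def p2_win_probability(rating_p1, rating_p2, randomness):
    """Chance that Player 2 wins under a toy model (an assumption, not the study's).

    With probability `randomness` the match is a fair coin flip;
    otherwise the higher-rated player wins outright (equal ratings split evenly).
    """
    if rating_p2 > rating_p1:
        skill_result = 1.0
    elif rating_p2 < rating_p1:
        skill_result = 0.0
    else:
        skill_result = 0.5
    return randomness * 0.5 + (1.0 - randomness) * skill_result


def win_grid(ratings, randomness):
    """Rows are Player 1 ratings, columns are Player 2 ratings."""
    return [[p2_win_probability(r1, r2, randomness) for r2 in ratings]
            for r1 in ratings]


ratings = [800, 1200, 1600]                   # hypothetical rating buckets
coin_toss = win_grid(ratings, randomness=1.0)  # pure coin toss
perfect = win_grid(ratings, randomness=0.0)    # better player always wins
```

With `randomness=1.0` every cell is 0.5 (all teal). With `randomness=0.0` the grid is 1.0 on one side of the diagonal and 0.0 on the other, and 0.5 only on the diagonal itself. Real games would sit somewhere in between.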


So what do the bottlenecks represent? I believe they represent points where execution skills become more important, where the game is less about randomness and direct interaction, and more about who has stronger execution. It’s a long-standing observation that high-level Chess is more about memorizing board positions than about out-thinking your opponent. Bobby Fischer created Fischer random chess to help alleviate this problem. We can conclude from this that in turn-based games of perfect information, memorization eventually wins out, but a larger possibility space (the state size of the game) staves off this tendency for longer. In other words, depth helps create inconsistency for this style of game. If we had a similar chart for Go rankings versus match results, we could probably test this prediction.

At this point I’m going into conjecture, but I think we see specific bottlenecks like this in a lot of games. For example, in Smash Bros Melee, I believe there is a bottleneck around whether or not a player has learned to L-cancel. L-canceling is a skill that people learn roughly when they start playing competitively, and not before. Many lower level players at tournaments can’t L-cancel. It provides a reasonably big advantage to players who can do it, and it’s hard enough to create a clear boundary between the cans and cannots. An additional observation we can infer is that games with low skill ceilings don’t become very consistent at the highest skill ratings.

Traditional fighting games have a similar barrier in the form of special move inputs. If you can’t do special move inputs, you’re at a massive disadvantage versus people who can, and this comes up right when you begin to play. This barrier comes in early enough to create a significant disparity between cans and cannots, and it is a large part of what puts players off of traditional fighters. New players consistently losing to friends who aren’t much better, but who can do special move inputs, is frustrating! This is regrettable, because special move inputs add a lot of depth to traditional fighting games. To counter this, we should probably do a better job of teaching the skill, so that more lower level players pick it up.

An analog to these skill bottlenecks in Chess is Openings. Openings are sets of moves you can memorize and play that are highly effective in the early game and set you up to get an advantage. This might explain some of the lower skill ratings having such a harsh bottleneck in the chart above.


Of course, the question comes up: how much consistency do we actually want? There are two opposite extremes: a perfectly inconsistent game that no one can improve in, and a perfectly consistent game where a worse player can never hope to outcompete a better player. The principle of fun (players have fun when something is inconsistent, but also when they can raise their consistency) is a good rule of thumb here. We want something in the Goldilocks zone of inconsistency, where the game is consistent enough that a player can improve at it, but not so consistent that every match is decided the moment you sit down to play.

Different games might aim for different levels of consistency. For example, games meant to introduce a player to a genre might have a low skill ceiling, so that the best players cannot be much better than lower ranking players, or a lot of randomness, so that even when they are better, the results even out. Serious competitive games need a certain level of consistency, otherwise players don’t have much fun improving at the game, because their efforts never translate into actually winning more often. This can create the feeling of working really hard but getting nothing for it. There’s no perfect answer, and different designers will have to consider what actually feels right for their game, but greater awareness of consistency curves (skill curves) adds another tool to a game designer’s toolkit.

A factor worth considering in consistency is how much reward something gives you. How many victory points do you get for passing a certain execution test (and how hard is that execution test)? How many do you get for winning a certain rock paper scissors toss? How many do you get from the result of a random drop? And how large is each of those rewards relative to the total you need to win the game? The more victory points you get from difficult execution tests, the more consistent the game will be. The more victory points you get from other sources, the less consistent the game will be. Rock paper scissors is in a weird place where it’s significantly less consistent than execution tests, but more consistent than pure randomness, because it’s more a product of chaos than of randomness.
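The question of how big a single reward is relative to the total needed to win can be put into numbers with a simplified model. This is a sketch under my own assumptions, not a description of any real game: every exchange is worth exactly one victory point, the slightly better player wins each exchange independently with the same probability, and the first player to reach `points_to_win` takes the match. The function name and parameters are hypothetical.

```python
from math import comb


def match_win_probability(p_exchange, points_to_win):
    """Chance of reaching `points_to_win` first when each exchange is won
    independently with probability `p_exchange` and awards one point."""
    q = 1.0 - p_exchange
    # The winner takes the final point; the loser holds k < points_to_win
    # points, scattered among the exchanges before that final one.
    return sum(
        comb(points_to_win - 1 + k, k) * p_exchange ** points_to_win * q ** k
        for k in range(points_to_win)
    )


# A player with a 55% edge per exchange, at different match lengths.
for points in (1, 3, 10, 50):
    print(points, round(match_win_probability(0.55, points), 3))
```

When one exchange decides everything, the better player wins exactly as often as they win that exchange. As each exchange shrinks relative to the total needed, the same small edge translates into a larger chance of winning the match, which is the sense in which smaller rewards per exchange make a game more consistent.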

Even a game that is purely execution tests can be more or less consistent depending on the varying level of difficulty versus reward versus risk of particular execution tests. An extremely difficult and high risk execution test will make players’ performance in the game less consistent overall, whereas a large number of medium difficulty, medium reward execution tests will create a very consistent game. It’s worth considering how much reward particular skills get you in various games, and how much risk there is when you fuck up. There’s no hard rule, but having extremely difficult techniques with a low reward and a high risk will lead to players not bothering (slashback is frequently this in Guilty Gear AC+R). A game with a lot of extremely difficult techniques that don’t provide a lot of reward might have players work really hard to get better, but ultimately not become any more consistent (we saw this happen in Smash 4, where top players got knocked out in pools all the time, with the bizarre exception of Zero, who was the most consistent player in esports history, winning over 50 tournaments in a row).

Practical Measures

What are some tricks we can use to decrease or increase consistency? Randomness is an obvious one. Randomness decreases the consistency of a game. Skilled or serious players tend not to like the introduction of randomness into their games, and tend to applaud the removal or mitigation of random elements in the rules. Good players like winning. An increase in consistency is the reward players like to see when they improve at a game. Being unable to improve at a game is sometimes frustrating.

Another one that decreases consistency is negative feedback, a.k.a. comeback factors. Negative feedback reduces the actual reward for winning exchanges, or opens up options that let you earn a lot of victory points only when you’re behind in victory points (for example, Mario Kart’s blue shell only goes to racers who are trailing). Negative feedback decreases the consistency of a game by handing more victory points to whichever player is currently behind. In other words, in games with negative feedback, a lead is not really a lead; you might even technically be behind your opponent.

This can create bizarre cases where falling behind might be the best route to victory, such as in Tekken 7, where rage arts (a powerful super move that can absorb hits and deals 55 damage out of 180 health) become available at 25% of your life (45 health). Since rage arts deal 55 damage but you get them at 45 health, if your opponent is under 55 health and you’re under 45, you’re technically ahead of them, because you have access to a powerful option that they don’t. Rage drives, which also become available at 45 health, have higher damage potential, but they don’t have super armor, so they’re less likely to win in a neutral situation. There are mitigating factors that keep rage arts from running rampant in high level play: jabs recover fast enough to block before a rage art can come out, and the long super flash gives a clear warning in neutral, so rage arts are useless outside of trades or punishes once people know to block when they see the flash. This means rage arts create inconsistency mostly at the mid to low level, while leaving higher level play mostly unaffected.
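The arithmetic behind that “technically ahead” claim can be written out directly. The numbers (180 health, rage at 25%, 55 damage) are the ones quoted above; the function names are hypothetical, and treating the thresholds as inclusive is my assumption rather than a statement about the game’s exact rules.

```python
MAX_HEALTH = 180
RAGE_THRESHOLD = MAX_HEALTH * 25 // 100   # 25% of 180 = 45
RAGE_ART_DAMAGE = 55


def has_rage(health):
    """Assumes rage is active at or below the threshold while still alive."""
    return 0 < health <= RAGE_THRESHOLD


def rage_art_would_finish(attacker_health, defender_health):
    """True if the attacker has rage and one rage art covers the defender's health."""
    return has_rage(attacker_health) and defender_health <= RAGE_ART_DAMAGE


# The player at 40 health is behind on life but holds the round-ending option.
print(rage_art_would_finish(40, 50))  # True
# The player at 50 health is ahead on life but has no rage yet.
print(rage_art_would_finish(50, 40))  # False
```

The window between 45 and 55 health is where the nominal leader has no rage and can still be finished by a single rage art.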

Rage drives, however, do affect the high level, because getting big damage off of them takes skill. We also see this with X-factor in Marvel 3 and V-trigger in Street Fighter V. All of these mechanics allow skilled players to make a comeback in a way that a less skilled player can’t with the same mechanic, because the damage you get off of them scales with your skill at combos. Having the reward of negative feedback scale with skill certainly makes the game feel more fair, but it also introduces more inconsistency into high level play, which is something that’s frequently called out in Street Fighter V.

TowerFall Ascension, however, presents an example of negative feedback that is incapable of creating this type of inconsistency. When you’re behind by a large number of rounds, TowerFall gives you a boost to help you win rounds and catch up, but this boost goes away once you’re closer to your opponent in wins. If comeback factors were proportional to the size of the deficit like this, we’d see fewer disruptive scenarios where losing actually puts you ahead, and less reduction of consistency in the game overall (and it would probably feel fairer to the players as well).

An example of the opposite appears in tennis. To win a game you must be ahead by at least two points, and to win a set (outside of a tiebreak) by at least two games. The serving player is considered to be at an advantage, so the rules effectively require you to break your opponent’s serve while holding your own in order to win a set. This win-by-two rule means that you must not only edge out your opponent, you must build and hold a lead over them in order to win. The consequence is that tennis is a significantly more consistent game than it would be otherwise. Keeping a contest going until one side wins by a clear margin is a great way to exaggerate small gaps in skill, making the game more consistent.
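How much a two-point margin exaggerates a small skill gap can be worked out exactly for a simplified case. The sketch below assumes every point is won independently with the same probability, which ignores the serve advantage entirely, and the function name is hypothetical. It compares a single sudden-death point against having to win by two from a tied score.

```python
def win_from_deuce(p_point):
    """Chance of winning a win-by-two contest from a tied score.

    From a tie, two points in a row for either side ends it, and one point
    each returns to the same tie, so P = p^2 + 2pq * P, which rearranges
    to P = p^2 / (p^2 + q^2).
    """
    q = 1.0 - p_point
    return p_point ** 2 / (p_point ** 2 + q ** 2)


# Per-point edge versus the resulting chance of winning from a tie.
for p in (0.50, 0.55, 0.60):
    print(p, round(win_from_deuce(p), 3))
```

Under these assumptions an even matchup stays even, a 55% edge per point becomes roughly a 60% chance of taking the game from a tie, and a 60% edge becomes roughly 69%.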

Positive feedback (snowballing) can have a good or bad effect on consistency, depending on which actions generate it. If those actions are execution tests, then consistency will rise a lot. If they’re related to rock paper scissors or randomness, then positive feedback can decrease consistency, because small wins early on can compound into much bigger wins later. Super meter in most fighting games is an example of positive feedback that increases consistency. The same goes for destroying your opponent’s units in most RTS games (the number of units you have determines combat effectiveness, and that’s determined by your execution of macro). Cheese strategies in StarCraft bet on exactly this kind of compounding early win.

Heavy snowballing will make games very inconsistent, because early wins are effectively worth a lot more victory points. The more victory points are required to win a game (in proportion to how many you earn per exchange), the more consistent the game will become. Heavy snowballing means you earn a disproportionate share of victory points from the early exchanges, so you effectively have fewer exchanges over the course of the game, even if you actually have a ton of them. If whoever lands the first punch wins 95% of the time, then the game is really about who lands the first punch, even if 500 more punches follow it. This type of scenario is called a lame duck scenario. In StarCraft, this gets resolved by having players forfeit voluntarily when they know they’re done in. In fighting games, even those without meters, you have the same ability to win exchanges on no health as on a ton of health, so fighting games have fewer lame duck scenarios (except for getting chipped out when you have no life, but many newer fighting games won’t let you die from chip damage).
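The first-punch example reduces to one line of arithmetic. In this sketch the 95% figure is the hypothetical one from the paragraph above, the 55% chance of the better player landing the first punch is a number I made up for illustration, and the function name is hypothetical too.

```python
def match_win_with_lock_in(p_first, lock_in=0.95):
    """Chance the better player wins when the first exchange nearly decides the match.

    p_first: chance the better player wins the first exchange.
    lock_in: chance that whoever wins the first exchange goes on to win the match.
    """
    return p_first * lock_in + (1.0 - p_first) * (1.0 - lock_in)


print(round(match_win_with_lock_in(0.55), 3))       # 0.545
print(round(match_win_with_lock_in(0.55, 1.0), 3))  # 0.55
```

In this model the better player’s chance of winning the match can never exceed their chance of winning the first exchange, no matter how many exchanges come afterward. All of those later punches contribute nothing to consistency.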

Suppose there’s a highly rewarding but highly difficult technique, such as a Perfect Electric Wind God Fist (possibly the hardest technique in Tekken) that instead killed the opponent instantly. You’d see a corresponding rise in consistency for players who can pull it off over those who can’t, but very low consistency among players above that threshold (since they’re all using it). Making a consistent game relies on having a lot of techniques of varying effectiveness that all have varyingly difficult execution tests, so that a good player can master more of them and have a real advantage over a less skilled player.

Admittedly, a lot of this article is my own speculation. I’d like to see further research performed across different games to see if we have more clearly observable patterns. An obvious followup study would be Go, since the statistics and rankings already exist for Go.
