Elo Rating System.

image

NOTE: Medium doesn’t display subscripts or superscripts and makes formulas hard to read. You might want to take a look at this text as PDF which is better formatted.

In game design, the term matchmaking denotes the process of selecting two or more opponents to take part in a single match or a round of competition. In PvP games, any player who joins the game, and wants to take part in a match, is placed into a pool of eligible contestants. The system tries to find a suitable opponent for each eligible player, according to some criterion. The objective is to provide an entertaining experience to both players. This means finding a suitable opponent with just enough skill to provide just the right amount of challenge. Matching a noob with a pro would result only in frustration for the noob and boredom for the pro. Skill is quality. It is often intangible. Still, computers work with numbers. We need to associate some quantitative numeric value with the skill of each player and use this as a matchmaking criterion.

The Elo rating system is designed to do exactly this, i.e. to associate a numeric value to each player that somehow reflects the level of the player’s skill. What is more important it provides a way to constantly update this value to reflect the improvements in player’s performance. Despite the common misconception, Elo is not an abbreviation. It is actually the last name of Arpad Elo, a Hungarian-American physicist, mathematician, and chess player who created this system.

The system he designed was meant originally specifically to be applied to chess. Chess is, of course, a highly skill-based game. It is also what is known as a PvP 1:1 zero-sum game, meaning that each chess match involves exactly two players, presumably human and that there can be only one winner of each match. This makes the Elo system applicable to any other similar games. There are plenty of texts on the Elo rating system around. Some of those texts are more theoretical than others. There are also many modifications and extensions of this system for various particular other types of games.

What follows is my personal interpretation of the mathematics behind this system. I tried to provide practical step-by-step instructions in building this system, providing just enough theoretic considerations needed for understanding its workings. I also tried to reflect on actual issues and problems that I have encountered when trying to apply this system to games and game features I was working on. As you will notice, some of the presented values are somewhat arbitrary. They are mostly chosen empirically and I have picked them up from the existing implementations of the system.

Basic Implementation

Imagine that two players, we shall denote them as player A and player B are playing a PvP match against each other. These two players will have their corresponding Elo ratings, Ra, and Rb respectively. Before the match, the difference between the Elo ratings of these two players should be small enough for them to be selected as good competitors. In other words |Ra — Rb| ≤ M where M is the matchmaking distance of the players.

From the perspective of player A, this will result either in a victory, loss, or a draw.

The Elo rating of each player should reflect the player’s relative skill, meaning it should reflect the player’s past performance in matches against other players. Therefore, the Elo scores of both players need to be updated after the match in order to reflect its outcome.

Obviously, not all victories or losses have equal importance. Mercilessly beating a much weaker opponent doesn’t count as quite the same kind of achievement as carving a victory against a superior adversary. On the flip side, there is no shame in losing a battle against a much stronger opponent, which can’t be said for losing against someone with a much smaller rating. These should be reflected in the updated ratings of both players. Elo formula is designed to do precisely that.

After each match, the Elo rating of player A is updated according to the following formula R’a = Ra + K*(Sa — Ea), where R’a represents the new, updated, Elo rating of the player after the match, Ra represents the Elo rating that the player had before the match, Ea denotes the expected outcome of the match and SA is the actual outcome of the match. Variable K denotes a scaling factor that determines how much influence each particular match can have on the overall Elo rating of the player. There are several ways to derive this value. In practice, the factor K has a different value in different levels of competition, for example in different leagues of the competition. A good starting value is K = 32 with which you can start to experiment.

As we mentioned before, each match can have only three possible outcomes from the perspective of player A. This is reflected in the values that the variable SA takes, in the case of victory Sa = 1, in the case of loss, Sa = 0, and in the case of a tie Sa = 0.5.

The true magic of the Elo formula comes from the second part, i.e. from the way how the value Ea is calculated. Since 0 ≤ Sa ≤ 1 it follows that it would be nice that likewise 0 ≤ Ea ≤ 1. Elo formula maps the expectation of the outcome of the match onto the interval (0,1). It is the logistic curve with base 10. It can be written as Ea = Qa /(Qa + Qb), where Qa = 10^(Ra/c) and Qb = 10^(Rb/c).

The factor c is another of the magic numbers in this calculation and is usually taken to have the value of c = 400. You can select any other value you want but if you are uncertain, 400 is a good start.

Now we have all the ingredients for the barebones implementation of Elo rating system:

  • The current Elo rating of a player Ra
  • The expected outcome of a match Ea = Qa /(Qa + Qb), Qa = 10^(Ra/c), Qb = 10^(Rb/c), 0 ≤ Ea ≤ 1.
  • The actual outcome of the match Sa = 1 in the case of a win, Sa = 0 and in the case of a loss, and Sa = 0.5 in the case of a draw,
  • The formula how to update the Elo score of the player after the match R’a = Ra + K*(Sa — Ea),
  • Two free scaling parameters K and c, with their typical values K = 32 and c = 400, and
  • The matchmaking distance between players M.

Notice that since both 0 ≤ Sa ≤ 1 and 0 ≤ Ea ≤ 1 the biggest value by which the Elo rating of the player can be updated equals K.

The formula by which the Elo rating of the other player, player B, is updated is, of course, exactly the same, only substituting Rb for Ra when appropriate.

One last thing remains that should be taken into account. This is the initial Elo rating of a new player that has just started playing the game and has not yet won any matches. Let’s denote it as Ra0. A good value is Ra0 = 1500.

Actually, any value can be chosen for this as long as it is commensurate to the maximal rate of change, i.e. K. Keep in mind that in theory, a new player can start and furthermore can continue losing matches immediately after beginning to play. Fu Therefore the new value of the Elo rating can be RA0 < RA. This is something that we should try to prevent with other aspects of our game design, but our mathematical system needs to be able to handle this case. Finally, we do not want a player to have a negative Elo rating. The initial value Ra0 needs to be significantly larger than the scaling factor K, to allow for the player to play, and lose enough matches before eventually winning one so that his Elo rating would finally start to rise up.

Just to be on the safe side, and to limit negative user experience it is good to introduce a minimal Elo rating value, Rmin below which the Elo rating of an individual player cannot drop. In other words, if Ra = Rmin the player’s rating will be updated only upwards!

Keeping Score

Are you still with me? Good! We are going to complicate things even more. The original Elo formula was designed for chess games. Chess is a game where players really do not keep score. Each match really can end only with three possible outcomes, win, loss, or a tie. Chances are that this is not enough for the game you are designing. Chances are that in your game players will actually score during the match. Consider for example a game of football, the European version known as Soccer in the US. A victory in which one team scores 5 goals against a team that scored 0 is surely a greater achievement than a victory of 1:0. The same applies to our game. What if we want to take this into account when updating the Elo rating of players after each match? There are at least two ways of doing this. One way is elegant however, the other one is less elegant but offers more control.

The elegant way involves modifying the way that the numeric value for the actual outcome of the match SA is calculated. Previously we have used a very simplistic of calculating SA by assigning fixed values for victory, loss, and draw. We can make this more sophisticated and take into account the scores achieved by both players during the match. Remember, 0 ≤ Sa ≤ 1 should remain. Let Pa be the number of points scored by player A, and Pb be the number of points scored by player B during the match. We can calculate the desired numeric value as Sa = Pa/(Pa +Pb), i.e. as the number of points scored by player A divided by the total number of points scored by both players during the match. Obviously, in the extreme case, if player A has scored 0 points Sa = 0/Pb = 0 if player A has won and player B scored 0 points, SA = Pa/Pb = 1, and if the match ended in a draw, Sa = Pa/(2*Pa) = 0.5. Be careful of division by 0. In the case that no player scored any points Sa=00 should still produce S0 = 0.5. In any other case, SA should result in a more granular value. For example, a 3:2 victory produces a value of Sa = 3/(3+2) = 3/5 = 0.6, a value a bit bigger than 0.5 but still not quite as big as 1. On the other hand, a near loss of 2:3 will result in Sa = 0.4, a value a bit smaller than 0.5. This new way of calculating Sa can be neatly plugged into our original formula R’a = Ra + K*(Sa — Ea)

However, this approach has some disadvantages. Above all the results can be somewhat unpredictable. As game designers, we might want to retain a bit more control over the way that the Elo rating gets updated. Instead of plugging in the player score directly into the original formula, we might want to extend it in the following way:

R’a = Ra + K*(Sa— Ea) + L*Pa/(Pa +Pb),

Where again Pa/(Pa +Pb) is the contribution of points scored by both players during the match and L is a new scaling factor that we are free to control independently from the rest of the calculation.

Rewarding Victory

There is one more final modification that I would do to the traditional Elo formula, and this one is not for the faint of heart. The Elo rating system is a piece of delicate math, well balanced and adjusted for the purpose it was designed for. However, this is also its biggest disadvantage when it comes to its use for game design.

The rating of an individual player actually reflects the player’s skill level remarkably well. It has a tendency to grow at about the same rate the player’s actual skill grows. Unfortunately, as we know, players’ skills tend to follow a familiar S-shaped learning curve. This means that eventually, both player’s skill level and his Elo rating will reach a plateau, i.e. they will reach a point beyond which they will not grow significantly. This is all well and fine if we are talking about something like chess. However, we are not dealing with the grandmaster level of competitive chess here. We are most likely designing a game that is supposed to be, above all, entertaining for each individual player.

Quite often in games, we tend to use the player’s level as an unlocking criterion for all sorts of things, from new game features to new characters, unit types, cards, arenas, etc. If we chose to use Elo rating as a proxy for a player’s level we are running a risk of having the majority of players getting permanently stuck at some mediocre level with very little hope of progressing beyond that point. Their true skill would simply reach the plateau.

In order to avoid this we need to modify the basic Elo formula. I am going to take that sophisticated piece of mathematics and hammer it a bit with a sledgehammer to make it fit my purpose.

Let’s analyze a bit the problem that we are dealing with. Consider a basic 1:1 PvP scenario. Each match has exactly two participants. Each match can have exactly one winner and one loser. In a well-balanced matchmaking system, such as the one based on Elo ratings, on average, any individual player can expect to win exactly 50% of matches. Since the Elo formula is symmetric, after a certain point, the player’s Elo rating is expected to grow as much as it is expected to fall. Thus the player is stuck hovering at about the same level. To help players break out of this deadlock and progress further in the metagame, we need to break this symmetry. We can do this by adding another factor to the Elo rating update formula,

R’a = Ra + K*(Sa — Ea) + L*Pa/(Pa +Pb) + Sa*V.

The factor V is sort of a bonus that is added to the player’s Elo rating with each victory, actually with each match that doesn’t end up in a defeat. To ensure this I multiply it with SA the variable that is anyway equal to 0 in the case that the player has lost the match. The exact value for the variable V can be chosen freely.

This addition has some important implications for the behavior of the whole system. The maximal value by which Ra can be reduced is K + L. However, the maximal value by which Ra can be increased is now equal to K + L + V. This means that on average, all Elo scores for all players will gradually creep up over time. This allows players to eventually progress through the metagame and unlock new content without the danger of being stuck. Furthermore, this rate can be calculated with respect to the number of matches that the player plays, as ∆Ran*V/2, where n is the number of played matches.

But what about the fairness of the matchmaking? Doesn’t this break the delicate balance? Actually, what this adds is a certain velocity by which players’ Elo ratings creep upwards. If the same V factor is applied uniformly to all the players, the relative differences in their Elo scores will remain roughly the same. Thus matchmaking will also remain relatively fair.

The really fun part about all this is that you are actually free to play around with the values for K, L, and V factors. If you are building a system of arenas or leagues, you can assign different sets of values for these factors for each arena, creating a more dynamic or more static environment, making the actual score have more influence on overall player rating, or over-reward victories.

For example, in many league-based systems, the lowest, entry-level league, tends to be populated by a mixed crowd of players with very diverse skill levels. You might want to over-reward any victories that a player achieves in this league, to make them move to the higher leagues faster.

Links