Tl;dr: Graphs with the calculated champion similarities can be found in the middle of this post. A player is likely to play champions that are close to each other in these graphs.

Hello everyone

I calculated champion similarities using neural net embeddings based on champion mastery points and wanted to share the results with you all. The method determines which champions tend to be played by the same person, so the output can be used as a recommendation engine. I did this because I work as a data scientist at a fairly large retailer and wanted to learn more about neural net embeddings, both to improve some neural nets I built in the past to predict various things and to deepen my knowledge of recommendation engines. We get some time every week for continued education at work, so this was an educational project I did in that time.

I wrote this post as I hope it will be an interesting read for you.

1. Introduction
2. Data collection
3. Neural net embedding and loss function for champion similarity
4. Graphical representation of the calculated champion embeddings

Introduction

The idea behind this project is to calculate champion similarity based on the champion mastery points of many player accounts. The reasoning behind using mastery points is that if a player likes some champions, those champions are likely to be similar in some way (e.g. I personally like to play enchanters and tanks), so the player will have high mastery points on them compared to the rest of the roster. This is the same idea that underlies a recommendation engine: I want to calculate the similarity of champions across players, and could then recommend champions close to a player's main champion for them to try out.

To accomplish this I used neural net embeddings following this article. An embedding is a representation of each champion in a (fairly) high-dimensional space, with the dot product as the similarity measure between individual champions. The dot product is 1 for similar champions and 0 for champions which are not similar (strictly speaking, the dot product would go to -1 for maximally dissimilar champions, but using 0 works better with the similarity function introduced later). This sounds rather complicated, so I will give an example:

Assume we have 5 champions and embed them into a 3-dimensional space. This will give us a matrix like the following one:

| Champion | Dimension 1 | Dimension 2 | Dimension 3 |
|---|---|---|---|
| Amumu | 1 | 0 | 0 |
| Ahri | 0.1 | 0.9 | 0 |
| Syndra | 0 | 1 | 0 |
| Yasuo | 0 | 0.1 | 0.9 |
| Zed | 0 | 0 | 1 |

The dot product is calculated by multiplying two vectors element-wise and then summing up the result, e.g.

``dot(Amumu, Ahri) = 1*0.1 + 0*0.9 + 0*0 = 0.1 ``

For the example matrix above, this would result in a similarity table like this:

|  | Amumu | Ahri | Syndra | Yasuo | Zed |
|---|---|---|---|---|---|
| Amumu | – | 0.1 | 0 | 0 | 0 |
| Ahri | 0.1 | – | 0.9 | 0.09 | 0 |
| Syndra | 0 | 0.9 | – | 0.1 | 0 |
| Yasuo | 0 | 0.09 | 0.1 | – | 0.9 |
| Zed | 0 | 0 | 0 | 0.9 | – |

So for this example embedding, there is a high similarity between Ahri/Syndra and Yasuo/Zed.
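The whole similarity table can be reproduced with a few lines of plain Python; the embedding values below are the ones from the example matrix above:

```python
# Example embeddings from the matrix above (3 dimensions per champion).
embeddings = {
    "Amumu":  [1.0, 0.0, 0.0],
    "Ahri":   [0.1, 0.9, 0.0],
    "Syndra": [0.0, 1.0, 0.0],
    "Yasuo":  [0.0, 0.1, 0.9],
    "Zed":    [0.0, 0.0, 1.0],
}

def dot(a, b):
    # Dot product: multiply element-wise, then sum up the result.
    return sum(x * y for x, y in zip(a, b))

# Pairwise similarity table (off-diagonal entries only).
similarity = {
    (c1, c2): round(dot(e1, e2), 2)
    for c1, e1 in embeddings.items()
    for c2, e2 in embeddings.items()
    if c1 != c2
}

print(similarity[("Ahri", "Syndra")])  # 0.9
print(similarity[("Ahri", "Yasuo")])   # 0.09
```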

After learning the embeddings, they are reduced to two dimensions using a dimensionality reduction technique. I will use t-SNE, and also provide the result from MDS, as you will see that the choice of dimensionality reduction has a visible effect on the output.
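As a minimal sketch of this reduction step, assuming scikit-learn as the library (the actual code may use something different), with random numbers standing in for the trained embeddings:

```python
import numpy as np
from sklearn.manifold import MDS, TSNE

# Stand-in for the trained embedding matrix: 40 random "champions" in a
# 15-dimensional embedding space (the real matrix is 150 x 15).
rng = np.random.default_rng(0)
emb = rng.normal(size=(40, 15))

# t-SNE: preserves local neighbourhoods, tends to form tight clusters.
emb_tsne = TSNE(n_components=2, perplexity=10, init="random",
                random_state=0).fit_transform(emb)

# MDS: tries to preserve all pairwise distances, spreads points more evenly.
emb_mds = MDS(n_components=2, random_state=0).fit_transform(emb)

print(emb_tsne.shape, emb_mds.shape)  # (40, 2) (40, 2)
```

Both reducers take the same 15-dimensional input and return 2-dimensional coordinates, which is what gets plotted below.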

The code for this can be found here. If you want to use the code yourself, note that I used a locally installed SQL Server 2019 Express database to store the data, since I still have one on my computer from a work project. You also need a Riot API key in order to access the Riot API and download the necessary data. You can also contact me and I will help you get it running if you want.

Data collection

A lot of data is usually necessary to reliably train neural networks. To collect it, I accessed the Riot API and downloaded the mastery points for 120'000 accounts on both EUW and NA (I might also do Korea for comparison in the future). To get the account names, I started with my own account (somewhere in gold, I think) and collected the account names from my last 100 played games. I repeated this for ~500 randomly selected accounts, which gave me a pool of ~300'000 accounts from which I randomly selected the 120'000.
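For orientation, the champion mastery request roughly looks like this; the endpoint path and header name are my reading of the Riot API documentation from around that time, not taken from the post's code:

```python
# Sketch of the request URL involved; endpoint path and header name are
# assumptions based on the Riot champion-mastery-v4 API, not the post's code.
BASE = {
    "euw": "https://euw1.api.riotgames.com",
    "na": "https://na1.api.riotgames.com",
}

def mastery_url(region, encrypted_summoner_id):
    # Champion mastery scores for one account.
    return (BASE[region]
            + "/lol/champion-mastery/v4/champion-masteries/by-summoner/"
            + encrypted_summoner_id)

# The API key goes into the request headers, e.g. with the requests library:
#   requests.get(mastery_url("euw", summoner_id),
#                headers={"X-Riot-Token": api_key})

print(mastery_url("na", "abc123"))
```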

For each of these accounts I downloaded the mastery points for all but the newest champions (the newest champion considered is Yone), i.e. a total of 150 champions, and saved them to the local database.

Neural net embedding and loss function for champion similarity

After downloading the data, I excluded one-trick accounts (remember that I want to recommend you another champion, not turn you into a one-trick), which I defined as accounts where a single champion holds more than 50% of all the champion mastery points. I also excluded accounts with fewer than 100'000 champion mastery points in total.
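In code, this filter boils down to two checks per account (a sketch; function and parameter names are mine):

```python
def keep_account(mastery_points, min_total=100_000, one_trick_ratio=0.5):
    # Filter sketch: drop one-tricks and low-playtime accounts.
    # mastery_points: list of per-champion mastery points for one account.
    total = sum(mastery_points)
    if total < min_total:
        return False               # not enough playtime overall
    if max(mastery_points) > one_trick_ratio * total:
        return False               # one champion dominates -> one-trick
    return True

print(keep_account([60_000, 50_000, 40_000]))  # True
print(keep_account([120_000, 10_000]))         # False (one-trick)
print(keep_account([30_000, 20_000]))          # False (too few points)
```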

For anyone interested, the neural net including the dot product is defined as

```python
import mxnet as mx

# nChamps and nDimEmbedding are defined earlier in the script.
X = mx.sym.Variable('data')
y = mx.sym.Variable('label')
# One embedding vector per champion.
symEmb = mx.sym.Embedding(data=X, input_dim=nChamps, output_dim=nDimEmbedding)
# Split the two champions of each pair into separate tensors.
symEmbChamp1 = mx.sym.slice_axis(symEmb, 1, 0, 1)
symEmbChamp2 = mx.sym.slice_axis(symEmb, 1, 1, 2)
symEmbReshape1 = mx.sym.reshape(symEmbChamp1, (-1, nDimEmbedding))
symEmbReshape2 = mx.sym.reshape(symEmbChamp2, (-1, nDimEmbedding))
# Dot product of the two embeddings = predicted similarity.
symSkalarProdWinkel = mx.sym.sum(symEmbReshape1 * symEmbReshape2, axis=1, keepdims=True)
# Squared-error regression against the target similarity.
symFehler = mx.sym.LinearRegressionOutput(symSkalarProdWinkel, y)
```

together with a custom data iterator. The embedding layer has 15 dimensions. For a random account, the data iterator randomly selects 15 champions (skewed towards the more played ones) and calculates the geometric means of the champion mastery point ratios for all pairs; these are the target variables for the neural net:

``geom(Champ_1, Champ_2) = sqrt((Mastery_1 / sum(All mastery points)) * (Mastery_2 / sum(All mastery points))) ``

The loss function between the dot products and the target variables is the standard mean squared error (MSE).
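The target formula and the loss are simple enough to write out directly (a sketch in plain Python; the function names are mine):

```python
import math

def geom_target(mastery_1, mastery_2, total_mastery):
    # Geometric mean of the two champions' mastery-point ratios; this is
    # the value the dot product of their embeddings is trained to match.
    return math.sqrt((mastery_1 / total_mastery) * (mastery_2 / total_mastery))

def mse(dot_products, targets):
    # Standard mean squared error between predicted and target similarities.
    return sum((d - t) ** 2 for d, t in zip(dot_products, targets)) / len(targets)

# A champion with 20% of the account's mastery paired with one at 5%:
print(geom_target(40_000, 10_000, 200_000))  # ~0.1
```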

Each epoch of the neural net training consists of the pair combinations of the 15 randomly selected champions for 10'000 randomly selected accounts, repeated over 100 epochs.
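An epoch therefore looks roughly like the following (a simplification: the real iterator skews the champion sampling towards the more played champions, plain `random.sample` here does not):

```python
import random
from itertools import combinations

def epoch_pairs(accounts, n_champs_per_account=15, n_accounts=10_000):
    # One training epoch: for each sampled account, pick 15 champions and
    # emit every champion pair as one training example.
    pairs = []
    for account in random.sample(accounts, min(n_accounts, len(accounts))):
        champs = random.sample(account, n_champs_per_account)
        pairs.extend(combinations(champs, 2))
    return pairs

# 15 champions give 15*14/2 = 105 pairs per account.
demo = epoch_pairs([list(range(150))], n_accounts=1)
print(len(demo))  # 105
```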

Graphical representation of the calculated champion embeddings (not mobile friendly, but what can you do with 150 champion-icons, sorry)

The calculated embedding after neural net training is a 150 by 15 matrix, which is nearly impossible to visualize directly. To overcome this, we need to reduce the dimensionality to a displayable amount (namely 2 dimensions), for which I used t-SNE. The results look as follows (if you miss your champion, it can happen that two champions are so close together that one icon is completely covered by the other, e.g. Ornn is behind Urgot for EUW):

EUW:

Graphical representation of the champion similarities for EUW calculated with neural net embeddings followed by t-SNE. Champions close to each other are more likely to be both played by the same player.

We can nicely see the ADC cluster (with Ziggs) at the bottom and the supports to the bottom left, separated into enchanters, tanks and catchers, with Zyra (my old main) hanging in there somewhere. Lux is also more support than midlane mage. There is also an assassin/edgelord cluster top left. Interestingly, LeBlanc is located with the other mages, not the other assassins. In the center and top right we have tanks and junglers, with fighters/juggernauts on the right, except Irelia, which sits in the edgelord cluster.

We can also see champions like Lillia, Nidalee, Qiyana, Quinn, Ivern, Aurelion Sol and Yorick sitting far from other champions. This is to be expected, as they have unique playstyles or attract one-trick players, since there are no other champions that are played similarly.

NA:

Graphical representation of the champion similarities for NA calculated with neural net embeddings followed by t-SNE. Champions close to each other are more likely to be both played by the same player.

While NA looks fairly similar to EUW, there are some differences where the clusters are located relative to each other. We can discuss individual champions or clusters further in the comments.

As a comparison of the effect the choice of dimensionality reduction technique has, I also want to show the results from applying MDS on the trained neural net embeddings.

EUW:

Graphical representation of the champion similarities for EUW calculated with neural net embeddings followed by MDS. Champions close to each other are more likely to be both played by the same player.

Compared to t-SNE, the champions are more evenly spread out. Overall the same clusters seen with t-SNE still exist, but they are much less visible. On the other hand, you can see the champion icons better here, as they overlap less.

NA:

Graphical representation of the champion similarities for NA calculated with neural net embeddings followed by MDS. Champions close to each other are more likely to be both played by the same player.

As I downloaded all the champion mastery points anyway, I also want to show some more information on them which I found interesting.

Here is a table of the ratio of accounts which had no mastery points for a given champion (out of the 120'000 accounts):

Ratio of accounts with no mastery points for the specified champion for both EUW and NA for the 120'000 used accounts.

Not surprisingly, a lot of players have not played the newer champions, but Ivern and Skarner are also up there. On the other end, almost everyone has played at least a single game of Ashe or Lux. In addition, it seems that EUW players tend to try out a somewhat wider variety of champions than NA players.
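The statistic behind that table is a simple per-champion count (a sketch; the data layout is my assumption, the actual code reads from the SQL database):

```python
def zero_mastery_ratio(accounts, champion):
    # Share of accounts that have no mastery points on `champion`.
    # `accounts` is a list of dicts mapping champion name -> mastery points.
    n_zero = sum(1 for a in accounts if a.get(champion, 0) == 0)
    return n_zero / len(accounts)

accounts = [
    {"Ashe": 12_000},
    {"Ashe": 3_000, "Ivern": 500},
    {"Lux": 9_000},
]
print(zero_mastery_ratio(accounts, "Ivern"))  # 2 of 3 accounts -> ~0.667
```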

Here is a table of the total sum of all mastery points for each champion (over the 120'000 accounts), together with the ratio of these mastery points to the total sum of mastery points across all champions:

Sum of all mastery points per champion for the 120'000 used accounts together with the ratio of the sum to the total amount of mastery points over all champions for both EUW and NA.

I don't think we have to discuss who will get the next skins, as these are clearly the most popular champions.

Thanks for reading this far. It was a really interesting small project for me and I hope you found something in here that got you thinking. Again, I did this as a personal continued education project and have no affiliation with Riot. If you have questions, I'll try to answer them in the comments. The tl;dr is at the beginning.

Have a good day 🙂