As an Economics major, I have used statistical analysis in every course I have taken. While I know some Excel and some STATA, I felt the need to improve my skills, so I took it upon myself to learn to code and parse through data better. The result is the shittiest statistical analysis possible. Being a Hearthstone fanatic, I wanted to mess around with Hearthstone data. First, I found my Hearthstone Deck Tracker backup, which was stored in an extremely large XML file. After converting it to a STATA-readable format and cleaning up the data (removing missing names, keeping only Ranked game modes, etc.), I was left with a sample of 3,722 games, almost all of which were played between Rank 4 and top 500 Legend on NA.
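The cleaning step boils down to a filter over parsed games. Below is a minimal Python sketch that assumes each game has already been pulled out of the XML into a simple record. The `Game` fields and the `"Ranked"` mode label are hypothetical and do not reflect Hearthstone Deck Tracker's actual backup schema.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical record layout: the real HDT backup is XML, and these field
# names are illustrative only, not HDT's actual schema.
@dataclass
class Game:
    opponent_name: Optional[str]
    game_mode: str
    won: bool
    opponent_class: str


def clean(games: List[Game]) -> List[Game]:
    """Keep only Ranked games whose opponent name is present and non-blank."""
    return [
        g
        for g in games
        if g.game_mode == "Ranked"
        and g.opponent_name is not None
        and g.opponent_name.strip() != ""
    ]
```

For example, a list holding one Ranked game against "Sam#1234", one Ranked game with no name, and one Casual game would be cut down to just the game against "Sam#1234".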
From here, I generated a new variable representing the first letter of my opponent’s battletag. I then counted the total number of times I faced each letter and the number of times I won, and calculated a win rate. For those curious, my highly inefficient code can be found here. I repeated this for all 26 letters of the alphabet, plus one extra category for other characters. A general overview of the data is here. Now, I don’t expect you to read that at all; we all know how useless this is. Therefore, I made some neat-looking charts.
The first chart shows how often each letter appears. The top 5 letters, in order of popularity, are “S”, “M”, “D”, “B”, and “T”. The bottom 5 are “Q”, “U”, Other, “X”, and “Y”. Next, I charted my win percentage against each letter. Apparently, I am good against people whose battletags start with “F”, “N”, and “P”. On the other hand, I cannot beat people whose battletags start with “Q”, “V”, or non-English characters. So what conclusions can we draw from this data? Well, we can see the most and least popular first letters for battletags, but my win rates against different letters are pretty much useless.
Lastly, I looked at classes by letter. This ultimately turned out to be a measure of how popular each class was within the data; the letter distribution did not change meaningfully from class to class. I believe my sample is large enough to assume that if Blizzard looked at their own data for the same time period, they would see a similar first-letter distribution of battletags on the NA server. There would presumably be different distributions on CN or EU, and I would be curious to see the character distribution on those servers.
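The per-letter tallies can be sketched in Python. The original analysis was done in STATA, so this is an illustrative translation, not the author's code. It assumes the cleaned games are available as hypothetical `(opponent_name, won)` pairs, and it puts anything that is not an ASCII letter A through Z (digits, accented or non-Latin characters) into an “Other” bucket.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def first_letter_bucket(name: str) -> str:
    """Map a battletag to its uppercase first letter A-Z, or 'Other'."""
    if not name:
        return "Other"  # empty names should already have been cleaned out
    first = name[0]
    # Only plain ASCII letters get their own bucket.
    return first.upper() if first.isascii() and first.isalpha() else "Other"


def win_rates(games: Iterable[Tuple[str, bool]]) -> Dict[str, Tuple[int, int, float]]:
    """Return {bucket: (games_faced, wins, win_rate)} for (name, won) pairs."""
    faced: Counter = Counter()
    wins: Counter = Counter()
    for name, won in games:
        bucket = first_letter_bucket(name)
        faced[bucket] += 1
        wins[bucket] += int(won)
    return {b: (faced[b], wins[b], wins[b] / faced[b]) for b in faced}


def most_popular(stats: Dict[str, Tuple[int, int, float]], n: int = 5) -> List[str]:
    """Buckets sorted by how many games were played against them."""
    return sorted(stats, key=lambda b: stats[b][0], reverse=True)[:n]
```

Sorting the same dictionary by the third tuple element instead of the first gives the win-percentage-by-letter ranking.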
I also ran a regression, which had an R-squared of 0. I regressed whether the game was a win on a set of first-letter dummy variables (dropping the other-characters category to avoid perfect collinearity). Running least squares on a win/loss outcome like this violates several of its assumptions, which is another reason this data is useless. TLDR: The first character of your opponent’s battletag has no effect on win rate (shocker, right?). Also, your opponent’s battletag does not influence which class they will play in any given game.
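Because the only regressors are letter dummies, the least squares fit is saturated. Each fitted value is just the win rate for that letter, the intercept is the win rate against the omitted “Other” group, and each letter's coefficient is its win rate minus that base rate. The sketch below uses this shortcut instead of a full matrix solve. It illustrates the idea under those assumptions and is not the STATA command the author ran; `dummy_regression` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def dummy_regression(
    groups: List[str], outcomes: List[int], base: str = "Other"
) -> Tuple[float, Dict[str, float], float]:
    """OLS of a 0/1 outcome on an intercept plus one dummy per group,
    omitting `base` (which must appear in `groups`) as the reference category.

    With only category dummies on the right-hand side, every fitted value
    equals its group's mean outcome, so no matrix algebra is needed."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for g, y in zip(groups, outcomes):
        totals[g] += y
        counts[g] += 1
    means = {g: totals[g] / counts[g] for g in counts}

    intercept = means[base]
    coefs = {g: means[g] - intercept for g in means if g != base}

    overall = sum(outcomes) / len(outcomes)
    sst = sum((y - overall) ** 2 for y in outcomes)  # total sum of squares
    ssr = sum((y - means[g]) ** 2 for g, y in zip(groups, outcomes))  # residual
    # R-squared is undefined if every game had the same result.
    r_squared = 1 - ssr / sst if sst > 0 else float("nan")
    return intercept, coefs, r_squared
```

An R-squared of 0 under this setup means every letter group has the same win rate as the sample overall, so knowing the letter explains none of the variation in wins.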
Source: Original link
© Post "Does the First Letter of your Opponent’s Battletag have any Effect on Win Rate? A Shitty Statistical Analysis" for game HearthStone.