The Myth of the High-Skill Cap

Hey all, J_Alexander_HS back again today to talk about an idea that gets thrown around somewhat regularly in discussions of specific deckbuilding choices: the idea that a card or deck might look like it performs poorly when examining the statistics surrounding it, but actually has great potential to be an amazing performer when piloted by the right player. The idea that something has a high skill cap.

This idea is not necessarily incorrect. There have been several examples of decks that look like they perform poorly-to-decent, on average, but turn out to be true powerhouses in the hands of seasoned players. Patron Warrior is the go-to example here, but there are others, including the recent, yet brief flirtations with Nomi Priest. Sometimes there truly are decks that take time and patience to build, refine, and learn before they start performing well. Even good players might need dozens or hundreds of games of practice before they start to get a sense for how things operate.

The myth part I wanted to discuss today is an issue related to this real phenomenon: once people get the idea in their head that this is true of some cards or decks, they begin to apply it too broadly, without enough evidence to justify the claim for a given list. It’s easy to claim that something is high-skill and difficult to back that claim up confidently.

To demonstrate this you want to show: (a) that deck performance increases as familiarity with the deck does – so bad players show bad results against other bad players, average players show average results against other average players, and good players show good results against other good players, or some similar pattern – (b) that the deck shows a steeper learning curve than its alternatives, as all decks take time to master but some take a lot more, and (c) that at high skill levels the deck tends to outperform its alternatives. This last part is important for making a high-skill claim, as we’ll see in a bit.

The issue with making these claims is that drawing such a conclusion requires more data than any one, two, or even 10 players can provide. To truly understand how a deck or card performs, we want to see several thousand games; not a few dozen or even a few hundred of them. We want to see the deck in the hands of bad, decent, excellent, and elite players. We want to know how well decks perform not only when the pilot knows how to play them, but when their opponents know how to play against them.
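To put rough numbers on "a few dozen" versus "several thousand" games, here is a minimal sketch using the standard normal approximation for a proportion. The function name and the specific game counts are my own illustrative assumptions, not anything pulled from HSReplay or VS.

```python
from math import sqrt

def win_rate_margin(games, observed_win_rate=0.5, z=1.96):
    """Approximate 95% margin of error for a win rate (normal approximation)."""
    return z * sqrt(observed_win_rate * (1 - observed_win_rate) / games)

# Hypothetical sample sizes: a few dozen, a few hundred, several thousand games.
for games in (50, 300, 5000):
    margin = win_rate_margin(games)
    print(f"{games:>5} games: 50% win rate, give or take {margin:.1%}")
```

Under these assumptions, 50 games leaves you uncertain by roughly 14 percentage points in either direction, while 5,000 games narrows that to a bit over 1 point. A 50-game sample cannot tell a 45% deck from a 55% one.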

Let’s look at these ideas in a bit more detail and understand this problem.

Before the last round of nerfs, I played a ton of Myracle Rogue. I had been building and refining the deck since the Leeching Poison nerf and had clocked hundreds of games with it. While I had a wealth of knowledge about how the deck functioned and had been playing it at top legend for months, I was still a bit unsure about how to fill in the last few card slots. Right up until the day the deck got nerfed, even though we had tons of data on how it worked, people were still playing versions varying in their use of Nomi, Shadowstep, Spirit of the Shark, Cold Blood, Crystallizer, Faerie Dragon, Wisp, Sap, Ooze, Bloodsail Corsair, Zilliax, Togwaggle, Cable Rat, and even a few other choices. All the decks were posting impressive results because the core was strong, but many players were giving up percentage points here and there with their last few inclusions. They had to be, since not all those choices could be correct.

The reason for that jumble of different choices boils down to the following issue: there are many moving parts that interact with each other within the deck and against the meta, and changes from one variant to the other might only yield tiny improvements in overall win rate. To accurately and reliably detect the presence of such tiny changes you need a massive sample size of games. For the purposes of perspective, even a dedicated team of ten players trying different variants of the deck all month and doing nothing else would be unlikely to provide reliable data that answers the question of how to optimize the list. Finding out how all the different cards perform against all the different matchups and alternatives is too much to ask of the data they could provide. This problem gets much worse when we’re talking about individual players trying to figure it out while not dedicating themselves to the issue fully.
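For a sense of scale, here is a rough sketch of the textbook sample-size formula for comparing two proportions (two-sided test, normal approximation). The 55% and 56% win rates are hypothetical numbers I picked to stand in for "a tiny improvement"; the 5% significance level and 80% power are conventional defaults, not anything specific to Hearthstone.

```python
from math import ceil
from statistics import NormalDist

def games_needed_per_variant(p_a, p_b, alpha=0.05, power=0.8):
    """Games each variant needs so a two-sided test can tell p_a from p_b."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_beta = NormalDist().inv_cdf(power)           # desired detection power
    variance = p_a * (1 - p_a) + p_b * (1 - p_b)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_a - p_b) ** 2)

# Hypothetical: does swapping one card move the deck from 55% to 56%?
print(games_needed_per_variant(0.55, 0.56))
# Hypothetical: a much larger 10-point gap is far easier to detect.
print(games_needed_per_variant(0.50, 0.60))
```

With these assumptions, a one-point difference takes on the order of tens of thousands of games per variant to detect, and that is for a single card swap in a single meta. A ten-point difference needs only a few hundred, which is why big effects get noticed quickly and the last few slots stay contested.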

Thankfully, we have other tools at our disposal, like HSReplay and VS data. These sites can aggregate data from thousands of players, rapidly accelerating this process of discovery. From the best available data we had at the time, there were some clearly sub-optimal choices in the deck like Wisp, Shadowstep, and Spirit of the Shark. Including them in your deck tended to do little more than decrease your win rate, yet people persisted in including them anyway and sometimes achieved top-legend finishes with them present. They weren’t achieving those ranks because the cards were present, however; they achieved those ranks in spite of those cards being present.

Then again, maybe those cards only looked bad because most players didn’t have the proper amount of experience playing them. Maybe those cards were great if the deck was being played properly. That was the argument being tossed around a lot of the time to explain the poor statistics, and here’s the problem with it: if you think most players aren’t competent enough to play the cards properly (which is what people typically say when the data doesn’t agree with their personal intuitions), you can justify discounting all that negative data. The result, however, is that you’re usually not left with enough data to draw the conclusion you want either. Rather than being able to confidently call the card high skill and good enough to include, the remaining judgment should be one of ambivalence in the face of small (and unreliable) sample sizes.


The other problem with that approach is that you might just be wrong about the high skill part and are now ignoring all the data that doesn’t fit your existing idea to justify a bad choice. If you can’t support your ideas well and instead focus on discounting contradictory data to make your point, you might be on this path.

This is related to what is known as a file drawer effect. This comes up a lot in academic research where some groundbreaking (and perhaps unexpected) result gets published. Another few papers publish similar results in the coming years and people start to accept the effect as true and well-replicated. That’s all well and good, until a deeper understanding of the issues surrounding publication begins to arise: all the people who tried and failed to replicate those results never published their papers because journals don’t publish failures often. The failed experiments just end up sitting unpublished in a file drawer somewhere and people only see the attempts that worked, even if the research is otherwise identical in terms of methodology. This gives people a biased sense of the data and it can make things that don’t exist look real. It’s not until later, when serious replication attempts with substantially larger samples fail to find these effects, that some people begin to say, “OK, maybe we were wrong.” Others might simply say that something was wrong with the new experiment with the larger sample and so the contradictory data should be dismissed (the “research is high-skill” argument).

To put that in simple Hearthstone terms, if I post a screenshot of myself sitting at rank 1 legend with a deck, people will pay attention. If I post a screenshot of myself dumpstering with that same deck, it will draw much less interest. The big, initial results get the attention. Larger sample sizes on that deck that get acquired afterwards might get ignored as the win rate falls from that peak.
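A toy simulation makes the effect concrete. Everything here is a made-up assumption for illustration: 1,000 players each play 50 games with a deck whose true win rate is exactly 50%, and only those who finish at 60% or better "post a screenshot."

```python
import random

def simulate_posted_results(true_win_rate=0.5, players=1000, games_each=50,
                            brag_threshold=0.6, seed=0):
    """Return (every player's win rate, only the win rates worth posting)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    all_rates, posted_rates = [], []
    for _ in range(players):
        wins = sum(rng.random() < true_win_rate for _ in range(games_each))
        rate = wins / games_each
        all_rates.append(rate)
        if rate >= brag_threshold:  # only the hot streaks get shared
            posted_rates.append(rate)
    return all_rates, posted_rates

all_rates, posted_rates = simulate_posted_results()
print(f"average over everyone:    {sum(all_rates) / len(all_rates):.1%}")
if posted_rates:
    print(f"average over posted runs: {sum(posted_rates) / len(posted_rates):.1%}")
```

The deck is a coin flip by construction, yet anyone who only sees the posted runs sees a deck winning 60% or more. The rest of the results are sitting in the file drawer.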

There are other issues to think about with respect to claiming a card is high-skill that we can run through quickly.

  • “High-skill” doesn’t equal “good.” A card might have a 30% win rate in the hands of a bad player and a 45% win rate in the hands of a pro. That’s a huge skill cap and still a bad card, even for the best players. When it comes to your typical argument in favor of a deck or card being good because it’s high skill, people are usually interested in convincing others that a deck will win a lot compared to its alternatives; not just that it’s hard to play. To really make the argument in favor of a “high skill” choice, you’ll need data showing both that something performs better at high levels of experience and that it outperforms its competition when those options are also piloted by a comparably high skill player. This requires a certain type of data that few are able to provide, but that doesn’t stop them from making the claim.
  • In practical terms, a “high-skill” card does equal “bad” for most players. Unless you’re among the elites in skill, you will not be able to draw out the power of those high skill cards or decks consistently. You’re better off avoiding them if the stats say they’re bad on average. Fancying yourself a higher-skill player than you are and choosing a deck accordingly won’t do you any favors. There’s always room to play with those cards to try and learn and improve, but many players may never improve enough to tap the real potential, and they miss out on wins because of it.
  • A card is likely to be perceived as high skill when it functions in combination with other cards. To use the above Rogue example (as I’m familiar with it), this means people might latch onto cards like Shadowstep or Spirit of the Shark as high skill, as they have many possible applications and big moments. Because the variance of such cards tends to be high and they can result in big moments which are memorable, it’s easy to lose track of all the times those cards are losing you the game because they are sitting dead in the hand when they could be doing something better were they a more consistent card instead (see the File Drawer effect). However, just because this card has a lot of potential uses, that doesn’t mean it’s high skill. Such cards are often conditional and the game may dictate their use more than your skill.
  • Related to the last point, even a card or deck perceived to be “Low-Skill” can still be impacted plenty by player skill. For instance, there’s not a ton of skill in playing Captain Hooktusk. You can often just play the card and have it be good without requiring any kind of special combo or setup. Players should (in my humble opinion) almost always be keeping it in the mulligan of the decks that play it, as it does have close to, if not the, highest mulligan win rate in those lists. She’s the entire reason for the deck existing. Despite that, HSReplay shows that the majority of players throw Hooktusk away in the mulligan if they have the option. If my perceptions are correct and it should be kept, this means these Hooktusk decks’ win rates are being lowered by this fairly “low skill” decision (keep or not keep in mulligan when the answer is almost always “keep”). If you want to throw out the data of “bad” players, you’ll end up throwing out basically all the data on HSReplay for many decks; not just the high-skill ones.
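The first point in that list, that “high-skill” and “good” are two separate questions, can be sketched in a few lines. The 30% and 45% figures are the ones from the example above; the alternative’s win rates and the tier names are hypothetical placeholders I made up for illustration.

```python
def assess_card(card_by_skill, alternative_by_skill, low_tier, top_tier):
    """Separate 'rewards skill' from 'worth playing at the top'."""
    skill_gain = card_by_skill[top_tier] - card_by_skill[low_tier]
    beats_alternative = card_by_skill[top_tier] > alternative_by_skill[top_tier]
    return {"skill_gain": skill_gain, "worth_playing_at_top": beats_alternative}

# Win rates from the example above: 30% for a bad player, 45% for a pro.
pet_card = {"bad": 0.30, "pro": 0.45}
# Hypothetical alternative that is easier to play and simply better.
alternative = {"bad": 0.48, "pro": 0.55}

result = assess_card(pet_card, alternative, low_tier="bad", top_tier="pro")
print(f"gain from skill: {result['skill_gain']:.0%}")
print(f"worth playing at the top: {result['worth_playing_at_top']}")
```

The pet card gains 15 points from skill and still loses to the alternative in a pro’s hands. Both numbers have to come out in its favor before “it’s just high skill” is an argument for playing it.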

TL;DR: It’s easy to call a card high skill and hard to support that claim with lots of data. If you’re only one player, it’s near impossible. You need to show that a deck/card performs better as player skill increases and – to be truly convincing – that it outperforms other alternatives when also piloted by a high-skill player. Discounting all the data that doesn’t confirm your personal intuitions can mislead you into forcing reality to fit your conclusion, rather than the other way around.

Be wary of falling into the trap of thinking a pet card or deck is better than it is and hiding that mistake under the guise of “it’s just high skill.”

Source: Original link
