Hello everyone, since people on here liked my earlier post about analyzing the value per coin of top players, I thought why not do another similar post. In this one I wanted to look at the goal contributions of top strikers in the game and try to figure out what stats contribute the most to it.

The Results

Top 5 most important attributes before I bore you with the rest of the post.

https://i.redd.it/iryugazdser31.png
1. Finishing
2. Aggression
3. Positioning
5. Strength

I calculated the importance of each of those attributes by training a bunch of random forest models (more on that below). Higher importance doesn't mean that having a high value in that attribute is good; it just means that certain ranges of that attribute tend to go with better goal contribution.

Aggression Distribution of Top Strikers vs. Aggression Distribution for Everyone

Aggression was what stood out the most to me, to be honest, so I decided to take a deeper look at it by filtering for top goal contributors (more than 1.5 goal contributions per game).

Top Goal Scorers Aggression Box Plot

Non Top Goal Scorers Aggression Box Plot

Interestingly, lower aggression correlates with better goal scoring contribution even though aggression is one of the stand out attributes!
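
For anyone who wants to redo the comparison above without plotting, here is a rough sketch of the same split: players above the 1.5 cut-off versus everyone else, each summarised by the five numbers a box plot draws. The `players` rows are made-up placeholder values, not numbers from the Futbin data, and `five_number_summary` is just a helper name I picked for this example.

```python
import statistics

# Hypothetical rows: (goal contributions per game, aggression). Placeholder
# numbers for illustration only, not values from the Futbin data.
players = [
    (1.8, 45), (1.6, 52), (1.7, 60), (1.9, 48), (2.1, 55),
    (1.2, 70), (1.0, 82), (1.4, 65), (0.9, 77), (1.3, 58),
]

TOP_THRESHOLD = 1.5  # cut-off used in the post


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the numbers a box plot draws."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Split the aggression values by whether the player clears the cut-off.
top = [aggr for contrib, aggr in players if contrib > TOP_THRESHOLD]
rest = [aggr for contrib, aggr in players if contrib <= TOP_THRESHOLD]

print("top :", five_number_summary(top))
print("rest:", five_number_summary(rest))
```

Swap in the real goal contribution and aggression columns to get the actual summaries; the same helper works for composure or any other attribute.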

Is Composure Placebo?

This is why I was looking at the data in the first place. I wanted to figure out if composure does actually influence goal scoring or not.

Top Goal Scorers Composure Box Plot

Non Top Goal Scorers Composure Box Plot

The distributions seem more or less the same. I'd say composure is mostly placebo when it comes to goal scoring.

The Data

I used Python to pull the data. You can use the following code to pull the same data that I did.

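```python
import pandas
import requests
from lxml import html

# NOTE: the post's formatting stripped everything in square brackets (list
# indices, .iloc positions, XPath predicates). Spots marked "lost" below must
# be restored before this runs end to end.

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko)"
    + " Chrome/50.0.2661.75 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest",
}


def construct_request(min_games=1000, count=600):
    # The code assumes 30 players per results page.
    pages = int(count / 30)
    dataframes = []
    for page in range(pages):
        r = requests.get(
            f"https://www.futbin.com/20/pgp?page={page + 1}&sort=goals&order=desc&games={min_games}",
            headers=headers,
        )
        # read_html returns a list of tables; the index picking one was lost.
        dataframes.append(pandas.read_html(r.text))
    return pandas.concat(dataframes)


def get_futbin_id(player_name):
    # The argument to find() was lost; "..." marks the gap.
    clean_player_name = player_name[: player_name.find(...)]
    # str.replace returns a new string, so the result has to be reassigned.
    clean_player_name = clean_player_name.replace(" ", "+")
    try:
        r = requests.get(
            f"https://www.futbin.com/search?year=20&extra=1&term={clean_player_name}",
            headers=headers,
        )
        return r.json()  # indexing into the JSON, if any, was lost
    except Exception:
        return None


def get_futbin_stats(player_name):
    player_id = get_futbin_id(player_name)
    try:
        r = requests.get(f"https://www.futbin.com/20/player/{player_id}", headers=headers)
        df = pandas.read_html(r.text)  # table index lost
        tree = html.fromstring(r.content)
        return {
            # The row/column positions after each .iloc were lost.
            **dict(
                attack_wr=df.iloc,
                defend_wr=df.iloc,
                foot=df.iloc,
                skills=df.iloc,
                weak_foot=df.iloc,
            ),
            # The XPath predicates selecting stat names and values were lost.
            **{
                k: v
                for k, v in zip(tree.xpath("//span/text()"), tree.xpath("//div/text()"))
            },
        }
    except Exception:
        return None


content = construct_request()
rows = content.to_dict("records")
for record in rows:
    # record is a dict here; the key that selected the player name was lost.
    stats = get_futbin_stats(record)
    if stats:
        for key, value in stats.items():
            record[key] = value
content = pandas.DataFrame(rows)
```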

This essentially pulls the top 600 players by goal contributions who have played at least 1000 games. The minimum games requirement is there to keep squad battles and other game modes from skewing the results.

It also loads every single player page on Futbin and pulls their in-game stats as well as attributes such as WF, WR, etc.

If you don't want to run the code yourself, I have uploaded the final

Methodology

I calculated the feature importance of most of the stats for every player.

Generally, importance provides a score that indicates how useful or valuable each feature was in the construction of the boosted decision trees within the model. The more an attribute is used to make key decisions with decision trees, the higher its relative importance.

This importance is calculated explicitly for each attribute in the dataset, allowing attributes to be ranked and compared to each other.

Importance is calculated for a single decision tree by the amount that each attribute split point improves the performance measure, weighted by the number of observations the node is responsible for. The performance measure may be the purity (Gini index) used to select the split points or another more specific error function.

The feature importances are then averaged across all of the decision trees within the model.

https://stats.stackexchange.com/questions/162162/relative-variable-importance-for-boosting
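
To make the description above a bit more concrete, here is a minimal pure-Python sketch of impurity-based importance: the Gini decrease of each split, weighted by how many samples reach the node, summed per attribute, normalised per tree, then averaged over trees. This is not the code I ran. The two toy "trees" are hand-written lists of splits, the 1/0 labels (top contributor or not) are an assumption made so that Gini applies, and all function names are made up for the example.

```python
from collections import defaultdict


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def split_importance(parent, left, right, n_total):
    """Impurity decrease of one split, weighted by the node's share of samples."""
    n = len(parent)
    children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return (n / n_total) * (gini(parent) - children)


def tree_importance(splits, n_total):
    """Sum the weighted decreases per feature, then normalise to sum to 1."""
    totals = defaultdict(float)
    for feature, parent, left, right in splits:
        totals[feature] += split_importance(parent, left, right, n_total)
    norm = sum(totals.values()) or 1.0
    return {f: v / norm for f, v in totals.items()}


def forest_importance(trees, n_total):
    """Average the per-tree importances across all trees."""
    features = {feature for splits in trees for feature, *_ in splits}
    per_tree = [tree_importance(splits, n_total) for splits in trees]
    return {f: sum(t.get(f, 0.0) for t in per_tree) / len(per_tree) for f in features}


# Toy labels: 1 = top contributor, 0 = not (hypothetical, 8 players).
# Each split is (feature, labels at the node, labels going left, labels going right).
tree_1 = [
    ("finishing", [1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0], [1, 0, 0, 0]),
    ("aggression", [1, 1, 1, 0], [1, 1, 1], [0]),
]
tree_2 = [
    ("aggression", [1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0]),
]

importances = forest_importance([tree_1, tree_2], n_total=8)
print(importances)
```

Note that an attribute scores high here whenever its splits separate the groups well, regardless of which side the top players end up on. That is why a high importance for aggression says nothing about whether high or low aggression is the good direction.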

Conclusion

Composure is placebo