2018 World Cup

2025-12-01 07:07:41

Basic statistics for the 2018 men's FIFA World Cup (64 matches, 128 rows: one row per team per match)

Full data analysis report: https://github.com/adi0229/ML-DL/blob/master/fifa2018.ipynb

The dataset contains the following features:

Index(['Date', 'Team', 'Opponent', 'Goal Scored', 'Ball Possession %',
       'Attempts', 'On-Target', 'Off-Target', 'Blocked', 'Corners', 'Offsides',
       'Free Kicks', 'Saves', 'Pass Accuracy %', 'Passes',
       'Distance Covered (Kms)', 'Fouls Committed', 'Yellow Card',
       'Yellow & Red', 'Red', 'Man of the Match', '1st Goal', 'Round', 'PSO',
       'Goals in PSO', 'Own goals', 'Own goal Time'],
      dtype='object')

Random forest classifier (baseline) and feature importance

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

path = './'  # directory containing FIFA_2018_Statistics.csv (adjust to your setup)
data = pd.read_csv(path + 'FIFA_2018_Statistics.csv')

# Target: did this team's row produce the Man of the Match?
y = (data['Man of the Match'] == "Yes")

# Feature engineering -> use only the integer (int64) columns as training features
feature_names = [i for i in data.columns if data[i].dtype in [np.int64]]
X = data[feature_names]

# Default split: 75% training, 25% validation
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)

rf = RandomForestClassifier(random_state=0).fit(train_X, train_y)

predictions = rf.predict(val_X)
print("accuracy_score: " + str(accuracy_score(val_y, predictions)))

accuracy_score: 0.59375
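The baseline keeps only columns whose dtype is int64. A minimal sketch on a hypothetical toy frame (column names taken from the feature list, values invented) shows what that filter keeps: text columns such as Team and the label column are dropped, and so is any numeric column that pandas stores as float64, which happens whenever an integer column contains missing values (presumably the case for '1st Goal' when a team did not score). A side effect is that no NaN values reach the training matrix.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame mimicking a few columns of FIFA_2018_Statistics.csv
# (values are invented for illustration only).
toy = pd.DataFrame({
    "Team": ["Team A", "Team B"],
    "Goal Scored": [2, 0],
    "Ball Possession %": [40, 60],
    "Man of the Match": ["Yes", "No"],
    "1st Goal": [12.0, np.nan],  # missing value forces float64
})

# The same filter as in the baseline: keep only int64 columns
int_features = [c for c in toy.columns if toy[c].dtype in [np.int64]]
print(int_features)  # ['Goal Scored', 'Ball Possession %']

# Equivalent pandas idiom using select_dtypes
via_select_dtypes = toy.select_dtypes(include="int64").columns.tolist()
print(via_select_dtypes)  # ['Goal Scored', 'Ball Possession %']
```

Using `select_dtypes(include="number")` instead would also pull in float columns such as '1st Goal', which would then need their missing values handled before training.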

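import eli5
from eli5.sklearn import PermutationImportance

# Permutation importance on the validation set: shuffle one feature at a time
# and measure how much the score drops
perm = PermutationImportance(rf, random_state=1).fit(val_X, val_y)
# Renders an HTML table of weights in a Jupyter notebook
eli5.show_weights(perm, feature_names=val_X.columns.tolist())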
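The eli5 `PermutationImportance` call above is based on a simple idea: shuffle one feature column, re-score the fitted model, and treat the drop in score as that feature's importance. A minimal hand-written sketch of that idea follows; it uses synthetic data (an assumption, not the FIFA CSV) in which only column 0 determines the label, and `permutation_importance_manual` is a hypothetical helper, not part of eli5. scikit-learn also ships a ready-made version as `sklearn.inspection.permutation_importance`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Synthetic data (assumption): only column 0 determines the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
y = (X[:, 0] > 0).astype(int)
X_train, X_val, y_train, y_val = X[:300], X[300:], y[:300], y[300:]

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)


def permutation_importance_manual(model, X, y, n_repeats=5, seed=1):
    """Mean drop in validation accuracy when each column is shuffled on its own."""
    rng = np.random.default_rng(seed)
    base = accuracy_score(y, model.predict(X))
    drops = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling column j breaks its link to y but keeps its distribution
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops[j] += base - accuracy_score(y, model.predict(X_perm))
    return drops / n_repeats


importances = permutation_importance_manual(model, X_val, y_val)
print(importances.round(3))  # column 0 should dominate; columns 1 and 2 stay near 0
```

Because the score is measured on held-out data, a feature the model merely memorised on the training set gets little credit, which is why the article fits `PermutationImportance` on `val_X`/`val_y`.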

Random forest classifier (tuned) and the change in feature importance

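rf = RandomForestClassifier(random_state=0, n_estimators=500).fit(train_X, train_y)
predictions = rf.predict(val_X)
print("accuracy_score: " + str(accuracy_score(val_y, predictions)))

# Recompute permutation importance for the tuned model
perm = PermutationImportance(rf, random_state=1).fit(val_X, val_y)
eli5.show_weights(perm, feature_names=val_X.columns.tolist())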

accuracy_score: 0.71875

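Analysis: after the random forest's accuracy improved (from about 59% to about 72%), the importance of saves, pass accuracy, and shots on target rose, while the importance of corners and total distance covered fell. This is consistent with common tactical sense in football.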
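The two accuracies are easier to judge as counts. A short calculation, assuming the 128-row CSV and `train_test_split`'s default 25% validation share, converts them into correct predictions and adds a rough noise estimate. It also assumes exactly one Man of the Match per match, so about half of all rows are positives and always guessing one class would score around 0.5.

```python
import math

n_rows = 128        # one row per team per match: 64 matches x 2 teams
test_share = 0.25   # train_test_split's default test_size
n_val = math.ceil(n_rows * test_share)  # scikit-learn rounds the test count up

correct_baseline = 0.59375 * n_val  # accuracy reported for the baseline forest
correct_tuned = 0.71875 * n_val     # accuracy reported for n_estimators=500
print(n_val, correct_baseline, correct_tuned)  # 32 19.0 23.0

# Assumption: one Man of the Match per match, so half the rows are positives
positive_share = 64 / n_rows

# Rough binomial standard error of an accuracy measured on n_val samples
se = math.sqrt(0.59375 * (1 - 0.59375) / n_val)
print(positive_share, round(se, 3))  # 0.5 0.087
```

The improvement amounts to 4 more correct rows out of 32, and the standard error of a single accuracy on 32 rows is roughly 0.09, so the 59% to 72% jump is indicative rather than conclusive.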

XGBoost random-forest classifier (XGBRFClassifier, tuned) and feature importance

from xgboost import XGBRFClassifier

# XGBRFClassifier trains a random forest (not boosted trees) with XGBoost
xgb = XGBRFClassifier(verbosity=1,  # replaces the deprecated `silent` flag
                      scale_pos_weight=1,
                      learning_rate=0.01,
                      colsample_bytree=0.4,
                      subsample=0.8,
                      n_estimators=1000,
                      reg_alpha=0.3,
                      max_depth=4,
                      gamma=10).fit(train_X, train_y)

predictions = xgb.predict(val_X)
print("accuracy_score: " + str(accuracy_score(val_y, predictions)))

accuracy_score: 0.71875

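According to its permutation importance, the XGBoost model treats goals scored (Goal Scored) as the only important feature.

Simple and blunt, and closer to football common sense: the team that scores more goals is more likely to win, and the Man of the Match usually comes from the winning side. The other statistics contribute little in this model.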
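The claim that the Man of the Match usually comes from the winning side can be checked directly from the data. A minimal pandas sketch, assuming each match appears as two mirrored rows (Team and Opponent swapped) that share the same Date; team names, dates and numbers below are invented for illustration. In the real data, knockout matches level after extra time were settled by penalty shoot-out (the PSO columns), which this sketch ignores by counting every draw as "not won".

```python
import pandas as pd

# Hypothetical rows in the layout of FIFA_2018_Statistics.csv (invented values):
# every match appears twice, once from each team's point of view.
df = pd.DataFrame({
    "Date": ["d1", "d1", "d2", "d2", "d3", "d3"],
    "Team": ["A", "B", "C", "D", "E", "F"],
    "Opponent": ["B", "A", "D", "C", "F", "E"],
    "Goal Scored": [2, 0, 1, 3, 1, 1],
    "Man of the Match": ["Yes", "No", "No", "Yes", "Yes", "No"],
})

# Look up the opponent's goals by joining each row to its mirror row
opp_goals = df[["Date", "Team", "Goal Scored"]].rename(
    columns={"Team": "Opponent", "Goal Scored": "Goals Conceded"}
)
merged = df.merge(opp_goals, on=["Date", "Opponent"], how="left")

# A draw counts as "not won" here
merged["Won"] = merged["Goal Scored"] > merged["Goals Conceded"]
mom_rate = (merged["Man of the Match"] == "Yes").groupby(merged["Won"]).mean()
print(mom_rate)
```

On the real CSV, a Man of the Match rate far higher for winners than for non-winners would support the explanation above.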

perm_xgb = PermutationImportance(xgb, random_state=1).fit(val_X, val_y)
eli5.show_weights(perm_xgb, feature_names=val_X.columns.tolist())
