sklearn roundup

Commonly used parts of sklearn

Preprocessing

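1. Imputer: handling missing values

The imputer originally lived in preprocessing (from sklearn.preprocessing import Imputer). It has since moved to its own module: from sklearn.impute import SimpleImputer, alongside other imputers. The old preprocessing.Imputer was deprecated in scikit-learn 0.20 and removed in 0.22, so only the sklearn.impute form works on current versions.

For the former, see https://blog.csdn.net/kancy110/article/details/75041923

For the latter, see https://scikit-learn.org/stable/modules/impute.html#impute
and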
https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html#sklearn.impute.SimpleImputer
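
A minimal sketch of mean imputation with SimpleImputer. The toy array is made up for illustration, and "mean" is only one of the available strategies ("median", "most_frequent" and "constant" are the others).

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy data (made up for illustration); np.nan marks the missing entries.
X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, np.nan]])

# Replace each missing value with the mean of the observed values in its column.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X_filled = imputer.fit_transform(X)

print(X_filled)
```

The column means are computed during fit, so calling imputer.transform on new data reuses the statistics learned from the training data.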

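2. LabelEncoder and OneHotEncoder

LabelEncoder maps categorical values to integer codes. The official documentation intends it for target labels (y); for input features, OrdinalEncoder is the equivalent.

OneHotEncoder converts categorical features into one-hot vectors, which makes them suitable for training.
(Why one-hot encoding is needed: integer codes imply an order and a distance between categories that do not actually exist, whereas one-hot vectors keep every pair of categories equally far apart. See https://www.imooc.com/article/35900)

Usage notes for LabelEncoder and OneHotEncoder: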
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html#sklearn.preprocessing.OneHotEncoder
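
A small sketch contrasting the two encoders on the same column. The color values are made up for illustration; .toarray() is used because OneHotEncoder returns a sparse matrix by default.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical categorical column, used only for illustration.
colors = np.array(["red", "green", "blue", "green"])

# LabelEncoder: one integer per category, assigned in sorted order of the classes.
label_encoder = LabelEncoder()
codes = label_encoder.fit_transform(colors)

# OneHotEncoder expects a 2D array of shape (n_samples, n_features).
onehot_encoder = OneHotEncoder(handle_unknown="ignore")
onehot = onehot_encoder.fit_transform(colors.reshape(-1, 1)).toarray()

print(label_encoder.classes_)  # categories in sorted order
print(codes)                   # integer code per sample
print(onehot)                  # one column per category
```

With handle_unknown="ignore", a category that was not seen during fit is encoded as an all-zero row instead of raising an error.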

Converting labels to one-hot encoding:
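
One possible way to do this conversion is LabelBinarizer, sketched below. The labels are made up for illustration, and this is an assumption about a reasonable approach rather than the only one (OneHotEncoder on a reshaped label column also works).

```python
from sklearn.preprocessing import LabelBinarizer

# Hypothetical target labels, used only for illustration.
y = ["cat", "dog", "bird", "dog"]

# With three or more classes, LabelBinarizer yields one column per class.
binarizer = LabelBinarizer()
y_onehot = binarizer.fit_transform(y)

print(binarizer.classes_)  # column order of the one-hot matrix
print(y_onehot)

# inverse_transform maps the one-hot rows back to the original labels.
y_back = binarizer.inverse_transform(y_onehot)
```

Note that with exactly two classes LabelBinarizer returns a single 0/1 column rather than two columns.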


3. Numeric preprocessing: binarization, standardization, normalization

Binarizer (binarization):

sklearn.preprocessing.Binarizer(threshold=0.0, copy=True)
Official documentation:
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.Binarizer.html#sklearn.preprocessing.Binarizer
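
A minimal sketch of Binarizer with the default threshold. The input matrix is made up for illustration. Values strictly greater than the threshold become 1; everything else becomes 0.

```python
import numpy as np
from sklearn.preprocessing import Binarizer

# Toy data, made up for illustration.
X = np.array([[1.0, -1.0, 2.0],
              [2.0, 0.0, 0.0],
              [0.0, 1.0, -1.0]])

# threshold=0.0: values > 0 map to 1, values <= 0 map to 0.
binarizer = Binarizer(threshold=0.0)
X_bin = binarizer.fit_transform(X)

print(X_bin)
```

Binarizer is stateless, so fit does nothing beyond validating the input; it exists so the transformer can be used inside a Pipeline.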

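StandardScaler (standardization):

sklearn.preprocessing.StandardScaler(copy=True, with_mean=True, with_std=True)

Transformation: z = (x - u) / s, where u is the mean and s is the standard deviation of the training samples, computed separately for each feature.

Official documentation: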
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html#sklearn.preprocessing.StandardScaler
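
A short sketch that applies StandardScaler and checks it against the formula z = (x - u) / s computed by hand. The data is made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data, made up for illustration: two features on very different scales.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# The same transformation by hand: per-column mean u and standard deviation s.
# StandardScaler uses the population standard deviation (ddof=0), as np.std does by default.
u = X.mean(axis=0)
s = X.std(axis=0)
X_manual = (X - u) / s

print(X_scaled)
print(np.allclose(X_scaled, X_manual))
```

After fitting, the learned statistics are available as scaler.mean_ and scaler.scale_, and scaler.transform applies them to unseen data.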


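Normalizer (normalization; rescales each sample, i.e. each row, to unit norm; not to be confused with regularization):

sklearn.preprocessing.Normalizer(norm='l2', copy=True)

Official documentation: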
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.Normalizer.html#sklearn.preprocessing.Normalizer
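
A minimal sketch of Normalizer with the L2 norm. The rows are made up for illustration and chosen so the result is easy to verify by hand. Unlike StandardScaler, which works per column, Normalizer works per row.

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Toy data, made up for illustration.
X = np.array([[3.0, 4.0],
              [1.0, 0.0],
              [0.0, 5.0]])

# Divide each row by its own L2 norm so that every row has length 1.
normalizer = Normalizer(norm="l2")
X_norm = normalizer.fit_transform(X)

print(X_norm)
print(np.linalg.norm(X_norm, axis=1))  # L2 norm of each row after scaling
```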

Compare the effect of different scalers on data with outliers

Official documentation:
https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html#sphx-glr-auto-examples-preprocessing-plot-all-scaling-py
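
A much smaller sketch of the same idea as the linked example, assuming a single made-up feature with one obvious outlier. It only compares three scalers and prints the results instead of plotting them.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

# Toy feature, made up for illustration: four inliers and one large outlier.
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

scalers = {
    "StandardScaler": StandardScaler(),  # centers on the mean, scales by the std
    "MinMaxScaler": MinMaxScaler(),      # maps the min to 0 and the max to 1
    "RobustScaler": RobustScaler(),      # centers on the median, scales by the IQR
}

# Fit each scaler on the same data and flatten the result for easy printing.
results = {name: scaler.fit_transform(X).ravel() for name, scaler in scalers.items()}

for name, values in results.items():
    print(name, np.round(values, 3))
```

In this toy case MinMaxScaler squeezes the four inliers into roughly the bottom 3% of the [0, 1] range because the outlier sets the maximum, while RobustScaler keeps them spread around zero because the median and interquartile range are barely affected by the outlier.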

Model selection

1. Splitting a dataset: train_test_split

sklearn.model_selection.train_test_split(*arrays, **options)

Official documentation:
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html#sklearn.model_selection.train_test_split
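
A minimal sketch of an 80/20 split. The arrays are made up for illustration, and random_state=42 is an arbitrary seed chosen only to make the split reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data, made up for illustration: 10 samples with 2 features each.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Hold out 20% of the samples for testing; X and y are shuffled together.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)
print(y_train, y_test)
```

For classification tasks with imbalanced classes, passing stratify=y keeps the class proportions roughly equal in both subsets.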

Metrics

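This module measures how well a trained model performs; it includes a range of loss functions and scoring criteria.

Official documentation: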
https://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics
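
A small sketch using three of the many functions in sklearn.metrics: two classification scores and one regression loss. The true and predicted values are made up for illustration.

```python
from sklearn.metrics import accuracy_score, f1_score, mean_squared_error

# Hypothetical classification results (binary labels).
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

accuracy = accuracy_score(y_true, y_pred)  # fraction of correct predictions
f1 = f1_score(y_true, y_pred)              # harmonic mean of precision and recall for class 1

# Hypothetical regression results.
y_true_reg = [3.0, -0.5, 2.0, 7.0]
y_pred_reg = [2.5, 0.0, 2.0, 8.0]

mse = mean_squared_error(y_true_reg, y_pred_reg)  # mean of the squared errors

print(accuracy, f1, mse)
```

Functions whose names end in _score are higher-is-better, while those ending in _error or _loss are lower-is-better.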

Wu's Blog