sklearn miscellany

A collection of commonly used sklearn utilities

Preprocessing

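1. Imputer: handling missing values

Imputer originally lived in the preprocessing module (from sklearn.preprocessing import Imputer). It has since moved to its own module: from sklearn.impute import SimpleImputer (alongside other imputers). The old class was deprecated in scikit-learn 0.20 and removed in 0.22, so the two imports are interchangeable only on versions before 0.22. On current versions, use SimpleImputer.

For the former, see https://blog.csdn.net/kancy110/article/details/75041923

For the latter, see https://scikit-learn.org/stable/modules/impute.html#impute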

https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html#sklearn.impute.SimpleImputer
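
As a rough illustration of the newer import, here is a minimal sketch that fills missing values with the column mean. The toy matrix is made up for this example, and "mean" is just one of the available strategies.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy feature matrix (made-up data); missing entries are marked with np.nan.
X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, np.nan]])

# Replace each missing value with the mean of the observed values in its column.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X_filled = imputer.fit_transform(X)

print(X_filled)
# Column means are (1 + 7) / 2 = 4.0 and (2 + 3) / 2 = 2.5,
# so the two gaps become 4.0 and 2.5.
```

Other values for strategy include "median", "most_frequent", and "constant".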

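2. LabelEncoder and OneHotEncoder

LabelEncoder: category encoding. It converts categorical values into integer codes.

OneHotEncoder: one-hot encoding. It converts categorical features into one-hot vectors, which are convenient for training.
(Why one-hot encoding is needed: it gives a better measure of the difference between discrete features, since integer codes imply an ordering and distance between categories that usually do not exist. https://www.imooc.com/article/35900)

Usage notes for LabelEncoder and OneHotEncoder: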
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html#sklearn.preprocessing.OneHotEncoder
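
The difference between the two encoders is easier to see side by side. The sketch below uses a made-up list of color names. It calls toarray() on the encoder output rather than passing a dense-output flag, because the name of that flag differs between scikit-learn versions.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Made-up categorical values for illustration.
colors = ["red", "green", "blue", "green"]

# LabelEncoder maps each distinct value to an integer in 0..n_classes-1.
# The classes are sorted, so here: blue -> 0, green -> 1, red -> 2.
label_encoder = LabelEncoder()
color_ids = label_encoder.fit_transform(colors)

# OneHotEncoder expects a 2-D array of shape (n_samples, n_features).
X = np.array(colors).reshape(-1, 1)
onehot_encoder = OneHotEncoder()
# The default output is a sparse matrix; toarray() makes it dense.
X_onehot = onehot_encoder.fit_transform(X).toarray()

print(color_ids)   # integer codes, one per sample
print(X_onehot)    # one column per category, a single 1 per row
```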

Converting labels to one-hot encoding:

[Screenshot: TIM截图20190118142506]
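
A minimal sketch of one way to turn target labels into one-hot vectors, assuming LabelBinarizer is an acceptable route. It is a swapped-in choice for illustration, not necessarily the method these notes had in mind, and the labels are made up.

```python
from sklearn.preprocessing import LabelBinarizer

# Made-up target labels with three distinct classes.
y = ["cat", "dog", "bird", "dog"]

# LabelBinarizer learns the sorted class list and emits one column per class.
binarizer = LabelBinarizer()
y_onehot = binarizer.fit_transform(y)

print(binarizer.classes_)  # sorted classes: bird, cat, dog
print(y_onehot)

# inverse_transform maps one-hot rows back to the original labels.
y_back = binarizer.inverse_transform(y_onehot)
```

With only two classes, LabelBinarizer returns a single 0/1 column rather than two columns, so the one-hot shape above holds for three or more classes.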

3. Numeric preprocessing: binarization, standardization, normalization

Binarizer: binarization

sklearn.preprocessing.Binarizer(threshold=0.0, copy=True)
Official documentation:
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.Binarizer.html#sklearn.preprocessing.Binarizer
[Screenshot: TIM截图20190118162643]
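
A short sketch of Binarizer with the default threshold, on made-up data. Values strictly greater than the threshold become 1, and everything else becomes 0.

```python
import numpy as np
from sklearn.preprocessing import Binarizer

# Made-up numeric features.
X = np.array([[1.0, -1.0, 2.0],
              [2.0, 0.0, 0.0],
              [0.0, 1.0, -1.0]])

# threshold=0.0: values > 0 map to 1, values <= 0 map to 0.
binarizer = Binarizer(threshold=0.0)
X_binary = binarizer.fit_transform(X)

print(X_binary)
```

Binarizer is stateless, so fit does nothing beyond validating the input; calling transform directly gives the same result.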

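StandardScaler: standardization

sklearn.preprocessing.StandardScaler(copy=True, with_mean=True, with_std=True)

Transformation formula: z = (x - u) / s, where u is the mean of the training samples and s is their standard deviation, both computed per feature (column).

Official documentation: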
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html#sklearn.preprocessing.StandardScaler

[Screenshot: TIM截图20190118164334]
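
To check the formula z = (x - u) / s against the library, the sketch below standardizes a made-up matrix with StandardScaler and again by hand. Note that the hand computation uses the population standard deviation (ddof=0), which is what StandardScaler uses.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data: two features on very different scales.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# The same transformation by hand, column by column.
u = X.mean(axis=0)   # per-feature mean
s = X.std(axis=0)    # per-feature population standard deviation (ddof=0)
X_manual = (X - u) / s

print(np.allclose(X_scaled, X_manual))
```

In practice the scaler is usually fitted on the training split only and then applied to the test split with transform, so that test data does not leak into u and s.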

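Normalizer: normalization (not regularization). It rescales each sample (row) independently so that it has unit norm.

sklearn.preprocessing.Normalizer(norm='l2', copy=True)

Official documentation: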
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.Normalizer.html#sklearn.preprocessing.Normalizer
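
Unlike StandardScaler, which works column by column, Normalizer works row by row. A minimal sketch with made-up rows chosen so the L2 norms are easy to verify by hand:

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Made-up samples; the first row has L2 norm sqrt(3^2 + 4^2) = 5.
X = np.array([[3.0, 4.0],
              [1.0, 0.0],
              [0.0, 5.0]])

# Divide every row by its own L2 norm.
normalizer = Normalizer(norm="l2")
X_l2 = normalizer.fit_transform(X)

print(X_l2)
# After the transform, every row has L2 norm 1.
print(np.linalg.norm(X_l2, axis=1))
```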

Compare the effect of different scalers on data with outliers

Official documentation:
https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html#sphx-glr-auto-examples-preprocessing-plot-all-scaling-py
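
The linked example uses a real dataset and plots. As a much smaller stand-in, the sketch below runs three scalers over a made-up column that contains one large outlier. The choice of these three scalers is an assumption made for brevity; the official example covers more.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

# Made-up single feature: five ordinary values and one outlier (100).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 100.0]).reshape(-1, 1)

results = {}
for scaler in (StandardScaler(), MinMaxScaler(), RobustScaler()):
    name = type(scaler).__name__
    results[name] = scaler.fit_transform(x).ravel()
    print(name, np.round(results[name], 2))
```

StandardScaler and MinMaxScaler both let the outlier dominate: the mean, standard deviation, minimum, and maximum are all pulled by the value 100, so the five ordinary points end up squeezed close together. RobustScaler centers on the median and scales by the interquartile range, so the ordinary points stay spread out and only the outlier lands far away.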

Model selection

1. Splitting a dataset: train_test_split

sklearn.model_selection.train_test_split(*arrays, **options)

Official documentation:
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html#sklearn.model_selection.train_test_split
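
A minimal sketch of a typical call, using made-up arrays. The values of test_size and random_state are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 8 samples with 2 features each, and one target per sample.
X = np.arange(16).reshape(8, 2)
y = np.arange(8)

# Hold out 25% of the samples for testing; random_state makes the shuffle repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

print(X_train.shape, X_test.shape)  # 6 training rows and 2 test rows
```

All arrays passed in are split along the same shuffled indices, so each row of X stays paired with its entry in y. For classification tasks, passing stratify=y keeps the class proportions similar in both splits.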

Metrics

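Measures how well a trained model performs. The module includes a variety of loss functions and scoring criteria.

Official documentation: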
https://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics

1. Regression metrics

2. Classification metrics

3. Multilabel ranking metrics

4. Clustering metrics
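
To give each of the four groups above a concrete face, the sketch below calls one or two functions from each. The functions picked are only representatives, and all inputs are made-up toy values.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    adjusted_rand_score,
    confusion_matrix,
    label_ranking_average_precision_score,
    mean_squared_error,
    r2_score,
)

# 1. Regression metrics: compare continuous predictions with true values.
y_true_reg = [3.0, -0.5, 2.0, 7.0]
y_pred_reg = [2.5, 0.0, 2.0, 8.0]
mse = mean_squared_error(y_true_reg, y_pred_reg)  # (0.25 + 0.25 + 0 + 1) / 4
r2 = r2_score(y_true_reg, y_pred_reg)

# 2. Classification metrics: compare predicted class labels with true labels.
y_true_cls = [0, 1, 2, 2, 1]
y_pred_cls = [0, 2, 2, 2, 1]
acc = accuracy_score(y_true_cls, y_pred_cls)     # 4 of 5 correct
cm = confusion_matrix(y_true_cls, y_pred_cls)    # rows: true, columns: predicted

# 3. Multilabel ranking metrics: score how highly the true labels are ranked.
y_true_rank = np.array([[1, 0, 0], [0, 0, 1]])
y_score = np.array([[0.75, 0.5, 1.0], [1.0, 0.2, 0.1]])
lrap = label_ranking_average_precision_score(y_true_rank, y_score)

# 4. Clustering metrics: compare two labelings regardless of label names.
labels_true = [0, 0, 1, 1]
labels_pred = [1, 1, 0, 0]   # same grouping, label names swapped
ari = adjusted_rand_score(labels_true, labels_pred)

print(mse, r2, acc, lrap, ari)
print(cm)
```

Error-style metrics such as mean_squared_error are better when lower, while score-style metrics such as r2_score, accuracy_score, and adjusted_rand_score are better when higher.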
