4-german-creditcard-logistics-reg

Introduction

This notebook tunes and fits a logistic-regression classifier on the German credit-card default data using the MLJ workflow: load packages, prepare the data, define and tune the model, then evaluate predictions on the held-out test set.

1. load packages

Code
include("../utils.jl")
import MLJ: predict, fit!, predict_mode, range
using DataFrames, MLJ, CSV, MLJModelInterface, GLMakie

2. data processing

Code
Xtrain, Xtest, ytrain, ytest, cat = load_german_creditcard();
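The helper `load_german_creditcard` is defined in `../utils.jl` and is not shown in this notebook. A minimal sketch of what such a loader could look like, assuming a local CSV of the UCI German credit data, a `:class` target column, and a 70/30 split (file path, column names and split ratio are assumptions, not the actual implementation):

using CSV, DataFrames, MLJ

# Hypothetical sketch of the helper in ../utils.jl: read the raw table, coerce
# integer-coded categoricals to OrderedFactor (matching the scitypes seen in the
# warning output below), and split into train/test sets.
function load_german_creditcard(path = "../data/german_credit.csv")
    df  = CSV.read(path, DataFrame)
    y   = coerce(df.class, OrderedFactor)                       # binary target (assumed column name)
    X   = coerce(select(df, Not(:class)), Count => OrderedFactor)
    cat = [c for c in names(X) if elscitype(X[!, c]) <: OrderedFactor]  # categorical feature names
    train, test = partition(eachindex(y), 0.7; shuffle = true, rng = 1234)
    return X[train, :], X[test, :], y[train], y[test], cat
end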

3. MLJ workflow

3.1 define model

Code
# load three candidate classifiers; only the logistic model is tuned below
LogisticClassifier = @load LogisticClassifier pkg=MLJLinearModels
model = LogisticClassifier()
NuSVC = @load NuSVC pkg=LIBSVM
model2 = NuSVC()
KNNClassifier = @load KNNClassifier pkg=NearestNeighborModels
model3 = KNNClassifier(weights = NearestNeighborModels.Inverse())

"定义 几个 tune 参数的区间 "
k1 =range(model, :gamma, lower=0.1, upper=1.2);
k2 =range(model, :lambda, lower=0.1, upper=1.2);
k3 =range(model, :penalty, values=([:l2, :l1,:en,:none]));
k4 =range(model, :fit_intercept, values=([true, false]));

tuning_logistic = TunedModel(model = model,
                             resampling = CV(nfolds=4, rng=1234),
                             tuning = Grid(resolution=8),
                             range = [k1, k2],
                             measure = accuracy)
mach = machine(tuning_logistic, Xtrain, ytrain; scitype_check_level=0) |> fit!
[ Info: For silent loading, specify `verbosity=0`. 
[ Info: For silent loading, specify `verbosity=0`. 
[ Info: For silent loading, specify `verbosity=0`. 
[ Info: Training machine(ProbabilisticTunedModel(model = LogisticClassifier(lambda = 2.220446049250313e-16, …), …), …).
[ Info: Attempting to evaluate 64 models.
┌ Warning: The number and/or types of data arguments do not match what the specified model
│ supports. Suppress this type check by specifying `scitype_check_level=0`.
│ 
│ Run `@doc MLJLinearModels.LogisticClassifier` to learn more about your model's requirements.
│ 
│ Commonly, but non exclusively, supervised models are constructed using the syntax
│ `machine(model, X, y)` or `machine(model, X, y, w)` while most other models are
│ constructed with `machine(model, X)`.  Here `X` are features, `y` a target, and `w`
│ sample or class weights.
│ 
│ In general, data in `machine(model, data...)` is expected to satisfy
│ 
│     scitype(data) <: MLJ.fit_data_scitype(model)
│ 
│ In the present case:
│ 
│ scitype(data) = Tuple{Table{Union{AbstractVector{Continuous}, AbstractVector{OrderedFactor{33}}, AbstractVector{OrderedFactor{10}}, AbstractVector{OrderedFactor{5}}, AbstractVector{OrderedFactor{53}}, AbstractVector{OrderedFactor{3}}, AbstractVector{OrderedFactor{4}}, AbstractVector{OrderedFactor{2}}}}, AbstractVector{OrderedFactor{2}}}
│ 
│ fit_data_scitype(model) = Tuple{Table{<:AbstractVector{<:Continuous}}, AbstractVector{<:Finite}}
└ @ MLJBase ~/.julia/packages/MLJBase/fEiP2/src/machines.jl:230
Evaluating over 64 metamodels: 100%[=========================] Time: 0:00:14
import MLJLinearModels ✔
import MLJLIBSVMInterface ✔
import NearestNeighborModels ✔
trained Machine; does not cache data
  model: ProbabilisticTunedModel(model = LogisticClassifier(lambda = 2.220446049250313e-16, …), …)
  args: 
    1:  Source @007 ⏎ Table{Union{AbstractVector{Continuous}, AbstractVector{OrderedFactor{33}}, AbstractVector{OrderedFactor{10}}, AbstractVector{OrderedFactor{5}}, AbstractVector{OrderedFactor{53}}, AbstractVector{OrderedFactor{3}}, AbstractVector{OrderedFactor{4}}, AbstractVector{OrderedFactor{2}}}}
    2:  Source @757 ⏎ AbstractVector{OrderedFactor{2}}
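After fitting, the winning hyperparameters and the cross-validation history can be read back from the machine. A minimal sketch using the standard MLJTuning accessors (field names may differ slightly across MLJ versions):

best = fitted_params(mach).best_model   # LogisticClassifier with the selected gamma/lambda
rep  = report(mach)
rep.best_history_entry.measurement      # cross-validated accuracy of the best model
rep.plotting                            # grid coordinates and measurements, handy for plotting the search

The scitype warning above is raised because `LogisticClassifier` expects purely `Continuous` features while the table still contains `OrderedFactor` columns; one way to satisfy `fit_data_scitype` would be to tune a `Pipeline(ContinuousEncoder(), model)` instead of the bare classifier (the nested range names would then change accordingly).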

3.2 predict test results

Code
yhat = predict_mode(mach, Xtest) |> Array
@info "german-creditcard default-prediction accuracy" => accuracy(ytest, yhat) |> (d -> round(d, digits=3))
[ Info: "german-creditcard default-prediction accuracy" => 0.74