9-poisson-reg

Author

math4mad

Introduction

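Poisson regression is a special class of regression model in which the response variable is count data (non-negative integers) and is assumed to follow a Poisson distribution.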
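For reference, the usual textbook form of the model with a single predictor and a log link can be written as below. The symbols (y_i for the response, x_i for the predictor, mu_i for the conditional mean, beta_0 and beta_1 for the coefficients) are notation introduced here for illustration and do not come from the original notebook.

```latex
\begin{aligned}
y_i \mid x_i &\sim \mathrm{Poisson}(\mu_i), \\
P(y_i = k \mid x_i) &= \frac{\mu_i^{k}\, e^{-\mu_i}}{k!}, \quad k = 0, 1, 2, \dots \\
\log \mu_i &= \beta_0 + \beta_1 x_i
\end{aligned}
```

When the model is fitted without an intercept, as in section 3.1, the last line reduces to log mu_i = beta_1 x_i.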

Dataset

The dataset has two variables: the predictor is the math score (MathScore) and the response is the scholarship award level (Awards, 0-6).

1. load package

    include("../utils.jl")
    import MLJ:fit!,fitted_params,coerce
    using GLMakie,MLJ,CSV,DataFrames,ScientificTypes

2. load data

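    # Helper that coerces column scitypes (defined here but not called below).
    to_ScienceType(d) = coerce(d, :Awards => Multiclass, :MathScore => Continuous)
    df = CSV.File("./data/competition_awards_data.csv") |> DataFrame |> dropmissing

    # Predictor: MathScore (column 2) as a one-column table; response: Awards (column 1).
    X = MLJ.table(reshape(df[:, 2], :, 1))
    y = Vector(df[:, 1])
    # 80/20 train/test split with a fixed seed for reproducibility.
    (Xtrain, Xtest), (ytrain, ytest) = partition((X, y), 0.8, rng=123, multi=true)
    first(df, 10)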

10×2 DataFrame

Row	Awards	MathScore
	Int64	Int64
1	0	43
2	0	38
3	0	41
4	0	33
5	0	39
6	0	43
7	0	35
8	0	41
9	0	36
10	0	38

3. MLJ Workflow

3.1 load model

    CountRegressor = @load LinearCountRegressor pkg=GLM
    model = CountRegressor(fit_intercept=false)
    mach = machine(model, Xtrain, ytrain)
    fit!(mach)

import MLJGLMInterface ✔

[ Info: For silent loading, specify `verbosity=0`. 
┌ Warning: The number and/or types of data arguments do not match what the specified model
│ supports. Suppress this type check by specifying `scitype_check_level=0`.
│ 
│ Run `@doc GLM.LinearCountRegressor` to learn more about your model's requirements.
│ 
│ Commonly, but non exclusively, supervised models are constructed using the syntax
│ `machine(model, X, y)` or `machine(model, X, y, w)` while most other models are
│ constructed with `machine(model, X)`.  Here `X` are features, `y` a target, and `w`
│ sample or class weights.
│ 
│ In general, data in `machine(model, data...)` is expected to satisfy
│ 
│     scitype(data) <: MLJ.fit_data_scitype(model)
│ 
│ In the present case:
│ 
│ scitype(data) = Tuple{Table{AbstractVector{Count}}, AbstractVector{Count}}
│ 
│ fit_data_scitype(model) = Union{Tuple{Table{<:AbstractVector{<:Continuous}}, AbstractVector{Count}}, Tuple{Table{<:AbstractVector{<:Continuous}}, AbstractVector{Count}, AbstractVector{<:Union{Continuous, Count}}}}
└ @ MLJBase ~/.julia/packages/MLJBase/fEiP2/src/machines.jl:230
[ Info: Training machine(LinearCountRegressor(fit_intercept = false, …), …).

trained Machine; caches model-specific representations of data
  model: LinearCountRegressor(fit_intercept = false, …)
  args: 
    1:  Source @955 ⏎ Table{AbstractVector{Count}}
    2:  Source @555 ⏎ AbstractVector{Count}
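
The warning in the log above reports that the feature table has scitype Count, while the model's fit_data_scitype expects Continuous features and a Count target. One possible way to avoid the warning is to coerce the columns before building the machine. The sketch below uses a few hypothetical stand-in rows (the names demo, X_demo and y_demo are made up for illustration) rather than the real CSV file, and it keeps Awards as Count instead of Multiclass, since that is what the warning says the model accepts.

```julia
using MLJ, DataFrames

# Hypothetical stand-in rows with the same column names as the dataset.
demo = DataFrame(Awards = [0, 0, 1, 2], MathScore = [43, 38, 55, 61])

# Keep the target as Count and make the predictor Continuous (Int -> Float64).
demo = coerce(demo, :Awards => Count, :MathScore => Continuous)

X_demo = demo[:, [:MathScore]]   # one-column table of features
y_demo = demo.Awards             # integer count target

# Both checks should hold after coercion.
features_ok = scitype(X_demo) <: Table(Continuous)
target_ok   = scitype(y_demo) <: AbstractVector{Count}
```

With data prepared this way, machine(model, X_demo, y_demo) should pass the scitype check without needing scitype_check_level=0.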

3.2 predict model results

    # Point predictions: the mode of each predicted distribution.
    yhat = predict_mode(mach, Xtest) |> Array
    @info "rms" => rms(yhat, ytest)

    report(mach)

[ Info: "rms" => 0.9486832980505138
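
As a reminder of what the reported number measures, the root-mean-square error is the square root of the mean squared difference between predictions and observed values. The following is a minimal sketch using only the Statistics standard library; the function name rmse and the demo vectors are hypothetical and are not the values from the run above. To my understanding it matches what MLJ's rms computes, but treat that as an assumption.

```julia
using Statistics

# Root-mean-square error between predictions and observed values.
rmse(yhat, y) = sqrt(mean(abs2, yhat .- y))

# Hypothetical predictions and targets, for illustration only.
yhat_demo = [1, 1, 1, 1]
y_demo    = [0, 1, 2, 3]

err = rmse(yhat_demo, y_demo)   # squared errors 1, 0, 1, 4 -> mean 1.5
```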

(stderror = [0.0013746169531615926],
 dof_residual = 160.0,
 vcov = [1.88957176791926e-6;;],
 deviance = 254.53389416397937,
 coef_table = ───────────────────────────────────────────────────────────────────
         Coef.  Std. Error     z  Pr(>|z|)    Lower 95%   Upper 95%
───────────────────────────────────────────────────────────────────
x1  0.00104856  0.00137462  0.76    0.4456  -0.00164564  0.00374276
───────────────────────────────────────────────────────────────────,)
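
To read the coefficient table, it can help to turn the single coefficient back into predicted counts. The sketch below assumes the model uses a log link (which I believe is the default for LinearCountRegressor) and no intercept, so the predicted mean is exp(beta * x). The score range and the helper names poisson_mean and poisson_mode are hypothetical, chosen only for illustration.

```julia
# Coefficient copied from the coef_table above (x1, no intercept).
beta = 0.00104856

# Mean of the fitted Poisson model under a log link: mu = exp(beta * x).
poisson_mean(x) = exp(beta * x)

# For a non-integer mean mu, the mode of a Poisson distribution is floor(mu).
poisson_mode(mu) = floor(Int, mu)

# Hypothetical math scores spanning a plausible range.
scores = 30:10:80
means  = poisson_mean.(scores)
modes  = poisson_mode.(means)
```

Under these assumptions every predicted mean stays just above 1, so predict_mode would return 1 for every score in this range. That is in line with the non-significant coefficient (p = 0.4456) and suggests that refitting with fit_intercept=true would be a natural next thing to try.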