include("../utils.jl")
import MLJ:fit!,fitted_params,coerce
using GLMakie,MLJ,CSV,DataFrames,ScientificTypes
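Poisson regression is a special class of regression model in which the response variable is count data (non-negative integers) and is assumed to follow a Poisson distribution.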
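For reference, the standard formulation of this model can be written as follows. The symbols $y$ (response), $x$ (predictor), $\mu(x)$ (expected count) and $\beta_0, \beta_1$ (coefficients) are notation introduced here, not taken from the original text.

```latex
\begin{aligned}
P(Y = y \mid x) &= \frac{\mu(x)^{y}\, e^{-\mu(x)}}{y!}, \qquad y = 0, 1, 2, \dots \\
\log \mu(x) &= \beta_0 + \beta_1 x
\end{aligned}
```

Under this log link, a one-unit increase in $x$ multiplies the expected count by $e^{\beta_1}$.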
Dataset
The dataset has two variables: the predictor is the math score (MathScore) and the response is the award level (Awards, ranging from 0 to 6).
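# Helper that coerces the columns to scientific types (defined here but not used below)
to_ScienceType(d) = coerce(d, :Awards => Multiclass, :MathScore => Continuous)
df = CSV.File("./data/competition_awards_data.csv") |> DataFrame |> dropmissing
X = MLJ.table(reshape(df[:,2], 200, 1))  # MathScore as a one-column table
y = Vector(df[:,1])                      # Awards
(Xtrain, Xtest), (ytrain, ytest) = partition((X, y), 0.8, rng=123, multi=true)  # 80/20 split
first(df, 10)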
| Row | Awards (Int64) | MathScore (Int64) |
|---|---|---|
| 1 | 0 | 43 |
| 2 | 0 | 38 |
| 3 | 0 | 41 |
| 4 | 0 | 33 |
| 5 | 0 | 39 |
| 6 | 0 | 43 |
| 7 | 0 | 35 |
| 8 | 0 | 41 |
| 9 | 0 | 36 |
| 10 | 0 | 38 |
CountRegressor = @load LinearCountRegressor pkg=GLM
model = CountRegressor(fit_intercept=false)
mach = machine(model, Xtrain, ytrain)
fit!(mach)
import MLJGLMInterface ✔
[ Info: For silent loading, specify `verbosity=0`.
┌ Warning: The number and/or types of data arguments do not match what the specified model
│ supports. Suppress this type check by specifying `scitype_check_level=0`.
│
│ Run `@doc GLM.LinearCountRegressor` to learn more about your model's requirements.
│
│ Commonly, but non exclusively, supervised models are constructed using the syntax
│ `machine(model, X, y)` or `machine(model, X, y, w)` while most other models are
│ constructed with `machine(model, X)`. Here `X` are features, `y` a target, and `w`
│ sample or class weights.
│
│ In general, data in `machine(model, data...)` is expected to satisfy
│
│ scitype(data) <: MLJ.fit_data_scitype(model)
│
│ In the present case:
│
│ scitype(data) = Tuple{Table{AbstractVector{Count}}, AbstractVector{Count}}
│
│ fit_data_scitype(model) = Union{Tuple{Table{<:AbstractVector{<:Continuous}}, AbstractVector{Count}}, Tuple{Table{<:AbstractVector{<:Continuous}}, AbstractVector{Count}, AbstractVector{<:Union{Continuous, Count}}}}
└ @ MLJBase ~/.julia/packages/MLJBase/fEiP2/src/machines.jl:230
[ Info: Training machine(LinearCountRegressor(fit_intercept = false, …), …).
trained Machine; caches model-specific representations of data
model: LinearCountRegressor(fit_intercept = false, …)
args:
1: Source @955 ⏎ Table{AbstractVector{Count}}
2: Source @555 ⏎ AbstractVector{Count}
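The warning above reports that `X` has scitype `Table{AbstractVector{Count}}`, whereas the model expects `Continuous` features. One possible way to remove the mismatch, sketched here on a small hypothetical stand-in for the MathScore column rather than the real CSV, is to coerce the `Count` column to `Continuous` before building the machine.

```julia
using MLJ  # provides coerce, schema, and the Count / Continuous scitypes

# Hypothetical stand-in for df[:, 2]: integer math scores, so the scitype is Count.
scores = [43, 38, 41, 33, 39]
X = MLJ.table(reshape(scores, :, 1))   # one column, auto-named :x1

# Convert every Count column in the table to Continuous (Int -> Float64).
Xc = coerce(X, Count => Continuous)

println(schema(X).scitypes)    # scitypes before coercion
println(schema(Xc).scitypes)   # scitypes after coercion
```

With the real data, the same `coerce` call would presumably be applied to `X` before `partition`, so that both `Xtrain` and `Xtest` carry `Continuous` columns.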
yhat = predict_mode(mach, Xtest) |> Array
@info "rms" => rms(yhat, ytest)
report(mach)
[ Info: "rms" => 0.9486832980505138
(stderror = [0.0013746169531615926],
dof_residual = 160.0,
vcov = [1.88957176791926e-6;;],
deviance = 254.53389416397937,
coef_table = ───────────────────────────────────────────────────────────────────
Coef. Std. Error z Pr(>|z|) Lower 95% Upper 95%
───────────────────────────────────────────────────────────────────
x1 0.00104856 0.00137462 0.76 0.4456 -0.00164564 0.00374276
───────────────────────────────────────────────────────────────────,)
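To read the fitted coefficient, it may help to turn it back into expected counts. The sketch below uses only Base Julia and assumes the usual log link with no intercept, so that log(μ) = β·x. The helper names `expected_count` and `poisson_mode` are illustrative and not part of MLJ or GLM.

```julia
# Coefficient for x1 copied from the coef_table above (fit_intercept = false).
β = 0.00104856

# Without an intercept, the log link gives log(μ) = β * x, hence μ = exp(β * x).
expected_count(x) = exp(β * x)

# For a Poisson distribution with non-integer mean μ, the mode is floor(μ).
poisson_mode(μ) = floor(Int, μ)

# 33 and 43 are the smallest and largest MathScore values among the ten rows shown earlier.
for x in (33, 43)
    μ = expected_count(x)
    println("MathScore = $x  mean = $(round(μ, digits=4))  mode = $(poisson_mode(μ))")
end
```

Under these assumptions the expected count is only slightly above 1 for both scores, so the mode is 1 in each case. This is consistent with the large p-value (0.4456) reported for `x1`: without an intercept, the model barely distinguishes between students.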