Simpson’s paradox on penguins

Author

math4mad

include("utils.jl")
[ Info: loading success

Load data

df = @pipe CSV.File("./data/palmerpenguins.csv") |> DataFrame |> dropmissing
first(df,10)
10×7 DataFrame

| Row | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
|---|---|---|---|---|---|---|---|
| | String15 | String15 | Float64 | Float64 | Int64 | Int64 | String7 |
| 1 | Adelie | Torgersen | 39.1 | 18.7 | 181 | 3750 | male |
| 2 | Adelie | Torgersen | 39.5 | 17.4 | 186 | 3800 | female |
| 3 | Adelie | Torgersen | 40.3 | 18.0 | 195 | 3250 | female |
| 4 | Adelie | Torgersen | 36.7 | 19.3 | 193 | 3450 | female |
| 5 | Adelie | Torgersen | 39.3 | 20.6 | 190 | 3650 | male |
| 6 | Adelie | Torgersen | 38.9 | 17.8 | 181 | 3625 | female |
| 7 | Adelie | Torgersen | 39.2 | 19.6 | 195 | 4675 | male |
| 8 | Adelie | Torgersen | 41.1 | 17.6 | 182 | 3200 | female |
| 9 | Adelie | Torgersen | 38.6 | 21.2 | 191 | 3800 | male |
| 10 | Adelie | Torgersen | 34.6 | 21.1 | 198 | 4400 | male |
describe(df)
7×7 DataFrame

| Row | variable | mean | min | median | max | nmissing | eltype |
|---|---|---|---|---|---|---|---|
| | Symbol | Union… | Any | Union… | Any | Int64 | DataType |
| 1 | species | | Adelie | | Gentoo | 0 | String15 |
| 2 | island | | Biscoe | | Torgersen | 0 | String15 |
| 3 | bill_length_mm | 43.9928 | 32.1 | 44.5 | 59.6 | 0 | Float64 |
| 4 | bill_depth_mm | 17.1649 | 13.1 | 17.3 | 21.5 | 0 | Float64 |
| 5 | flipper_length_mm | 200.967 | 172 | 197.0 | 231 | 0 | Int64 |
| 6 | body_mass_g | 4207.06 | 2700 | 4050.0 | 6300 | 0 | Int64 |
| 7 | sex | | female | | male | 0 | String7 |
```{julia}
#| label: fig-simpson-paradox
#| fig-cap: Simpson's paradox on palmerpenguins
#| fig-align: center
#| warning: false
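# Shared axis size for the figure.
axis = (width = 300, height = 300)
# Bill measurements rescaled from mm to cm (divide by 10).
penguin_bill = data(df) * mapping(
    :bill_length_mm => (t -> t / 10) => "bill_length",
    :bill_depth_mm => (t -> t / 10) => "bill_depth",
)
# Regression lines fitted separately for each species.
pipeline1 = penguin_bill * linear() * mapping(color = :species)
# Scatter points colored by species, with a black outline.
pipeline2 = penguin_bill * mapping(color = :species) * visual(Scatter; strokewidth = 1, strokecolor = :black)
# Single regression line fitted to all species pooled together.
pipeline3 = penguin_bill * linear()

plt = (pipeline1 + pipeline2 + pipeline3) * visual(alpha = 0.5)
draw(plt; axis = axis)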
```
Figure 1: Simpson's paradox on palmerpenguins
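
The figure contrasts a pooled regression line with per-species lines. One way to check the same pattern numerically is to compare the pooled correlation with the correlation inside each group. The sketch below is only an illustration: `groupwise_cor` is a hypothetical helper name (it is not defined in `utils.jl`), and the small vectors are synthetic toy data, not penguin measurements.

```julia
using Statistics  # standard library, provides `cor`

# Hypothetical helper: Pearson correlation on all observations pooled,
# plus one correlation per distinct group label.
function groupwise_cor(groups, x, y)
    pooled = cor(x, y)
    within = Dict(
        g => cor(x[groups .== g], y[groups .== g]) for g in unique(groups)
    )
    return pooled, within
end

# Synthetic toy data (an assumption for illustration only):
# inside each group y rises with x, but group "b" sits at larger x and smaller y.
groups = ["a", "a", "a", "b", "b", "b"]
x = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
y = [5.0, 6.0, 7.0, 1.0, 2.0, 3.0]

pooled, within = groupwise_cor(groups, x, y)
println("pooled correlation: ", pooled)
println("within-group correlations: ", within)
```

On the toy data the pooled correlation is negative while both within-group correlations are positive, which is the sign reversal that defines Simpson's paradox. Assuming `df` has been loaded as above, the same helper could be applied to the penguins with `groupwise_cor(df.species, df.bill_length_mm, df.bill_depth_mm)`.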