Hearthstone: Mana cost
When evaluating the base cost of a card, you might be tempted to say that most of the cost lives in its base attributes (Attack and Health). So how would you check whether that statement is actually true?
Start with a linear regression of mana cost on Attack and Health, fit to all minions with a printed cost greater than 0:
First, just look at the quantization in the residuals-vs-fitted plot. Pretty, isn't it? It suggests that the mechanics on these cards have clear, distinguishable values; presumably that's Blizzard's own statisticians at work.
Next is the normal Q-Q plot of the residuals. It's not bad, and, as you'd expect, the outliers are the cards whose mechanics strongly influence mana cost (in either direction).
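The normal-distribution check is usually drawn as a normal Q-Q plot. That plot pairs each sorted residual with the standard-normal quantile expected at its rank. Below is a minimal Python sketch that assumes the residuals are already in a list. `qq_points` and `largest_residuals` are hypothetical helpers, and the residual values are made up for illustration. The plotting position (i - 0.5)/n is one common choice; R's `qqnorm()` uses a slightly different formula for samples of ten or fewer.

```python
from statistics import NormalDist


def qq_points(residuals):
    """Pair each sorted residual with the standard-normal quantile at the
    plotting position (i - 0.5) / n; points close to a straight line suggest
    roughly normal residuals."""
    n = len(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(residuals)))


def largest_residuals(residuals, k=3):
    """Indices of the k residuals with the largest absolute value, i.e. the
    cards the Attack/Health model misprices most, in either direction."""
    order = sorted(range(len(residuals)), key=lambda i: abs(residuals[i]), reverse=True)
    return order[:k]


# Hypothetical residuals, for illustration only.
residuals = [0.2, -0.4, 0.1, 3.5, -0.3, -2.8]
print(qq_points(residuals)[:2])
print(largest_residuals(residuals, k=2))  # [3, 5]
```

Sorting by absolute value treats overcosted and undercosted cards the same way. That matches the "in either direction" reading of the outliers.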
Residuals:
    Min      1Q  Median      3Q     Max
-4.7648 -0.5829 -0.1133  0.4963 11.3935
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.02670    0.14274  -0.187    0.852
Attack       0.54873    0.04468  12.282  < 2e-16 ***
Health       0.53042    0.04092  12.962  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.21 on 270 degrees of freedom
Multiple R-squared: 0.7617, Adjusted R-squared: 0.76
F-statistic: 431.6 on 2 and 270 DF, p-value: < 2.2e-16
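The summary above has the format R prints for a linear model, that is, `summary()` of an `lm` fit. For readers outside R, here is a minimal Python sketch of the same ordinary least-squares model, Mana = b0 + b1*Attack + b2*Health, solved through the normal equations. The function names and the data are hypothetical. The data is synthetic, generated so the true coefficients are known in advance. This is not the dataset or code behind the numbers above.

```python
def fit_ols(attack, health, mana):
    """Ordinary least squares for mana = b0 + b1*attack + b2*health.

    Builds the normal equations (X^T X) b = X^T y and solves them by
    Gauss-Jordan elimination with partial pivoting; returns [b0, b1, b2].
    """
    rows = [[1.0, float(a), float(h)] for a, h in zip(attack, health)]
    p = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, mana)) for i in range(p)]
    aug = [xtx[i] + [xty[i]] for i in range(p)]  # augmented matrix [X^T X | X^T y]
    for col in range(p):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from every other row.
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


def fitted_and_residuals(coefs, attack, health, mana):
    """Fitted values and residuals (observed minus fitted) for the model."""
    b0, b1, b2 = coefs
    fitted = [b0 + b1 * a + b2 * h for a, h in zip(attack, health)]
    return fitted, [y - f for y, f in zip(mana, fitted)]


# Synthetic stats generated from mana = 0.5*attack + 0.5*health (not real cards).
attack = [1, 2, 3, 4, 2, 5]
health = [1, 3, 2, 5, 2, 4]
mana = [0.5 * a + 0.5 * h for a, h in zip(attack, health)]
coefs = fit_ols(attack, health, mana)
print(coefs)  # approximately [0.0, 0.5, 0.5]
```

Solving the 3x3 normal equations directly keeps the sketch dependency-free. With many predictors or near-collinear stats, a QR-based solver (which is what R's `lm()` uses) is numerically safer.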
So a basic LM fit is surprisingly expressive, far more so than I was expecting, and it matches Trump's view that a card's base stats are a very important factor in its cost. In fact, even without filtering out the cards that represent more unusual cases, it explains more than 76% of the variance in the dataset (R-squared of 0.7617).
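The share of variance explained is R-squared, computed as 1 - SS_res/SS_tot. Adjusted R-squared additionally penalises each extra predictor. The sketch below uses hypothetical function names. It checks the adjusted value against both summaries, working backwards from their degrees of freedom. That gives n = 270 + 3 = 273 cards for the full fit and 37 + 3 = 40 for the vanilla-only fit, with p = 2 predictors (Attack, Health) in each.

```python
def r_squared(observed, fitted):
    """Share of the variance in `observed` explained by the fit: 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    ss_tot = sum((y - mean) ** 2 for y in observed)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Penalise R^2 for the number of predictors p (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Full minion fit: n = 270 residual df + 3 coefficients = 273, p = 2.
print(round(adjusted_r_squared(0.7617, 273, 2), 4))  # 0.7599, reported by R as 0.76
```

The check uses the rounded R-squared from the summary, so it only agrees with R's reported value to about three decimal places.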
We can do better, though. If we're looking to fit a model for base cost, let's restrict the data to minions that don't have any mechanics at all.
In other words, let's go build a linear model that fits only the relationship between Mana, Attack, and Health for minions with no other mechanics.
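Filtering down to vanilla minions is a one-liner once the card data is in a structured form. The sketch below assumes a hypothetical `Card` record with a `mechanics` list. Real card-data exports use their own field names, and the example cards are placeholders, not real cards.

```python
from dataclasses import dataclass, field


@dataclass
class Card:
    """Hypothetical card record; real card-data exports use different field names."""
    name: str
    card_type: str
    cost: int
    attack: int = 0
    health: int = 0
    mechanics: list = field(default_factory=list)


def vanilla_minions(cards):
    """Minions with a positive mana cost and no mechanics beyond Attack/Health."""
    return [
        c for c in cards
        if c.card_type == "Minion" and c.cost > 0 and not c.mechanics
    ]


cards = [
    Card("Example A", "Minion", 2, 2, 3),
    Card("Example B", "Minion", 3, 3, 2, ["Taunt"]),
    Card("Example C", "Spell", 1),
    Card("Example D", "Minion", 0, 1, 1),
]
print([c.name for c in vanilla_minions(cards)])  # ['Example A']
```

The same `cost > 0` filter as the first model is kept, so the two fits differ only in whether cards with mechanics are included.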
The resulting fit is better, too: the model now explains about 93% of the variance in the data.
Residuals:
    Min      1Q  Median      3Q     Max
-1.9916 -0.2755  0.2203  0.2287  0.7730
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.17189    0.13566  -1.267    0.213
Attack       0.50416    0.05960   8.460 3.56e-10 ***
Health       0.43905    0.05918   7.419 7.90e-09 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4879 on 37 degrees of freedom
Multiple R-squared: 0.9338, Adjusted R-squared: 0.9302
F-statistic: 261.1 on 2 and 37 DF, p-value: < 2.2e-16
Of course, the catch is that we're now down to 37 residual degrees of freedom (just 40 cards), and there's still a fair amount of scatter in the residuals.
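The degrees of freedom in these summaries are the number of cards minus the number of estimated coefficients (intercept, Attack, Health). So 37 residual df means the vanilla subset holds 40 minions. The residual standard error is the square root of the residual sum of squares divided by those df. Here is a minimal sketch with a hypothetical helper:

```python
import math


def residual_standard_error(residuals, n_coefficients):
    """Return (sqrt(SS_res / df), df) where df = observations - coefficients."""
    df = len(residuals) - n_coefficients
    ss_res = sum(r * r for r in residuals)
    return math.sqrt(ss_res / df), df


# Working backwards from the vanilla-minion summary (3 coefficients):
n_cards = 37 + 3                   # 40 minions
implied_ss_res = 0.4879 ** 2 * 37  # roughly 8.81 squared mana
print(n_cards, round(implied_ss_res, 2))
```

With only 40 cards, a handful of unusual vanilla minions can move the coefficients noticeably. That is the limitation the paragraph above points at.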
In fact, if you take this model and use it to predict the mana cost of every minion in the dataset as if it had no mechanics beyond its base stats, you get something like this:
It bodes well for my tuning of cost evaluation models for Deckalytics.
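The prediction step described above amounts to plugging each minion's stats into the vanilla-only fit. Below is a minimal sketch using the coefficient estimates from the second summary. `vanilla_cost` and `mechanic_premium` are hypothetical names. Reading the difference as a "mechanic premium" is an interpretation, not something the model itself establishes.

```python
# Coefficient estimates copied from the vanilla-minion summary above.
INTERCEPT = -0.17189
ATTACK_COEF = 0.50416
HEALTH_COEF = 0.43905


def vanilla_cost(attack, health):
    """Mana the vanilla-only model predicts for a minion with these stats."""
    return INTERCEPT + ATTACK_COEF * attack + HEALTH_COEF * health


def mechanic_premium(printed_cost, attack, health):
    """Printed cost minus the vanilla prediction. A positive value suggests mana
    spent on mechanics; a negative one suggests stats cheaper than a vanilla
    minion's (for example, a card with a drawback)."""
    return printed_cost - vanilla_cost(attack, health)


# A hypothetical 5-mana 4/5 minion:
print(round(vanilla_cost(4, 5), 2))         # 4.04
print(round(mechanic_premium(5, 4, 5), 2))  # 0.96
```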