Hearthstone: Mana cost

When evaluating the cost of a card, you might be tempted to say that most of it lives in the base attributes (Attack and Health). So how would you check whether that statement is actually true?

Start with a linear regression fit of mana cost against Attack and Health, over all Minions with a stated cost > 0:

Linear regression model of all Minion cards with mana cost > 0

First, just look at the quantization in the residuals-vs-fitted plot. Pretty, isn't it? That suggests that the mechanics associated with these cards have clear, distinguishable values; this is Blizzard's own statisticians at work.

Next is the fit of the residuals to a normal distribution; not bad, and as you'd expect, the outliers are the cards whose mechanics strongly influence mana cost (in either direction).

Residuals:
    Min      1Q  Median      3Q     Max 
-4.7648 -0.5829 -0.1133  0.4963 11.3935 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.02670    0.14274  -0.187    0.852    
Attack       0.54873    0.04468  12.282   < 2e-16 ***
Health       0.53042    0.04092  12.962   < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.21 on 270 degrees of freedom
Multiple R-squared:  0.7617,    Adjusted R-squared:   0.76 
F-statistic: 431.6 on 2 and 270 DF,  p-value: < 2.2e-16

So a basic LM fit is surprisingly expressive - more so by far than I was expecting, and it matches Trump's view that the base cost of a card is a very important factor. In fact, even without filtering out the cards that represent more unusual cases, it explains about 76% of the variance in mana cost (R-squared of 0.7617).
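As a quick arithmetic check on the summary above, the adjusted R-squared and the F-statistic can be recomputed from the reported R-squared and degrees of freedom. This is a minimal sketch in plain Python; the function names are illustrative, and the inputs are copied from the printed summary, so small differences come from the rounding of 0.7617.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R-squared: penalizes R-squared for the number of predictors."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


def f_statistic(r2, n_obs, n_predictors):
    """Overall F-statistic of a linear model, expressed through R-squared."""
    resid_df = n_obs - n_predictors - 1
    return (r2 / n_predictors) / ((1.0 - r2) / resid_df)


# Values copied from the all-minions summary.
R2 = 0.7617
RESID_DF = 270
N_PREDICTORS = 2                      # Attack and Health
N_OBS = RESID_DF + N_PREDICTORS + 1   # +1 for the intercept

adj = adjusted_r2(R2, N_OBS, N_PREDICTORS)
f = f_statistic(R2, N_OBS, N_PREDICTORS)
print(f"observations: {N_OBS}")
print(f"adjusted R-squared: {adj:.4f}")
print(f"F-statistic: {f:.1f}")
```

Both values land close to the reported 0.76 and 431.6, which is a cheap way to confirm that the summary is internally consistent and that the fit used 273 minions.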

We can do better, though - if we're looking to fit a model for base cost, let's restrict the model to those minions that don't actually express any other mechanics.

In other words, let's go build a linear model that fits only the relationship between Mana, Attack, and Health for minions with no other mechanics.
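To make the fitting step concrete, here is a minimal sketch of an ordinary least squares fit of Mana on Attack and Health, solved through the normal equations in plain Python. It is not the code behind the summaries in this post (those look like standard statistical-package output); `fit_ols` is a hypothetical helper, and `toy_minions` is synthetic data constructed so that mana is exactly half of Attack plus Health, purely so the result is easy to check.

```python
def fit_ols(rows):
    """Least-squares fit of mana ~ 1 + attack + health.

    rows: iterable of (attack, health, mana) tuples.
    Returns [intercept, attack_coef, health_coef].
    """
    X = [[1.0, float(a), float(h)] for a, h, _ in rows]
    y = [float(m) for _, _, m in rows]
    k = 3

    # Normal equations: (X^T X) beta = X^T y, stored as an augmented matrix.
    M = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * yi for x, yi in zip(X, y))]
        for i in range(k)
    ]

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        if abs(M[col][col]) < 1e-12:
            raise ValueError("Attack and Health are collinear in this sample")
        for r in range(k):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]

    return [M[i][k] / M[i][i] for i in range(k)]


# Synthetic (attack, health, mana) rows, NOT real card data:
# mana is set to exactly 0.5 * attack + 0.5 * health.
toy_minions = [
    (1, 1, 1), (3, 1, 2), (1, 3, 2), (2, 4, 3),
    (3, 3, 3), (5, 3, 4), (4, 6, 5), (6, 6, 6),
]

intercept, attack_coef, health_coef = fit_ols(toy_minions)
print(f"intercept={intercept:.3f} attack={attack_coef:.3f} health={health_coef:.3f}")
```

On real card data the filter would come first: keep only minions with a cost above 0 and no other mechanics, then pass their (Attack, Health, Mana) rows to the fit.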

Linear regression model of all Minion cards with mana cost > 0 and no other mechanics.

The resulting fit is better, too - we're at 93% of the variance in the data explained by the model.

Residuals:
    Min      1Q  Median      3Q     Max 
-1.9916 -0.2755  0.2203  0.2287  0.7730 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.17189    0.13566  -1.267    0.213    
Attack       0.50416    0.05960   8.460 3.56e-10 ***
Health       0.43905    0.05918   7.419 7.90e-09 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4879 on 37 degrees of freedom
Multiple R-squared:  0.9338,    Adjusted R-squared:  0.9302 
F-statistic: 261.1 on 2 and 37 DF,  p-value: < 2.2e-16

Of course, the problem with this is that we're now down to 37 residual degrees of freedom (only 40 cards), and there's still quite a bit of scatter in the residuals around the fit.

In fact, if you take this model and use it to predict the mana cost each card in the dataset would have if it had no mechanics beyond its base Attack and Health, you get something like this:

It bodes well for my tuning of cost evaluation models for Deckalytics.
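The prediction step described above can be sketched as follows: plug each card's Attack and Health into the no-other-mechanics fit, then read the gap between the printed cost and the predicted base cost as what the card's mechanics are "worth". The coefficients are copied from the second summary; the function names and the stat lines in `cards` are hypothetical and are not taken from the dataset.

```python
# Coefficients from the no-other-mechanics fit reported above.
INTERCEPT = -0.17189
ATTACK_COEF = 0.50416
HEALTH_COEF = 0.43905


def base_cost(attack, health):
    """Predicted mana cost if the card had nothing but its stats."""
    return INTERCEPT + ATTACK_COEF * attack + HEALTH_COEF * health


def mechanics_premium(mana, attack, health):
    """Printed cost minus predicted base cost.

    Positive: the card pays mana for something beyond its stats.
    Negative: the card is discounted, typically for a drawback.
    """
    return mana - base_cost(attack, health)


# Hypothetical (label, mana, attack, health) rows for illustration only.
cards = [
    ("plain 4/5 for 4", 4, 4, 5),
    ("3/3 for 4 with an extra effect", 4, 3, 3),
    ("7/7 for 4 with a drawback", 4, 7, 7),
]

for label, mana, attack, health in cards:
    predicted = base_cost(attack, health)
    premium = mechanics_premium(mana, attack, health)
    print(f"{label}: base {predicted:.2f}, premium {premium:+.2f}")
```

Under this reading, a premium near zero means the stats alone account for the cost, which is the sense in which the restricted model is useful as a baseline for pricing mechanics.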