Definition
The goal is to predict the value of "Loves Troll 2" and to find which predictors provide the most information for that prediction.
- In theory we could use $R^2$, but $R^2$ only works well with continuous data.
- Mutual information (MI) lets us quantify the relationship between a mixture of discrete and continuous variables.
- To handle continuous data, MI first groups the values into bins, turning them into discrete data.
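The binning step can be sketched as follows. This is a minimal illustration that assumes equal-width bins; the function name `equal_width_bins`, the sample values, and the choice of 4 bins are all hypothetical, and other binning schemes (for example equal-frequency bins) are just as valid.

```python
def equal_width_bins(values, n_bins):
    """Map each continuous value to a bin index in 0..n_bins-1 (equal-width bins)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so everything falls into a single bin.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical continuous predictor (e.g. heights in cm).
heights = [150, 155, 165, 170, 180, 190]
binned = equal_width_bins(heights, 4)
print(binned)
```

After this step the predictor is a list of discrete bin labels and can be treated like any other categorical variable.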
Steps in calculating MI
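The individual steps are not reproduced in these notes, so what follows is only a sketch based on the standard definition of mutual information for discrete variables, $MI = \sum_{x}\sum_{y} p(x, y)\,\log\frac{p(x, y)}{p(x)\,p(y)}$, where $p(x, y)$ is the joint probability and $p(x)$, $p(y)$ are the marginal probabilities. The function name `mutual_information` and the use of the natural logarithm are assumptions made for illustration; probabilities are estimated from simple counts.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Empirical MI (in nats) between two equal-length sequences of discrete labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        # p(x, y) / (p(x) * p(y)) simplifies to c * n / (count_x * count_y).
        mi += p_xy * log(c * n / (px[x] * py[y]))
    return mi


# Hypothetical toy data: a predictor that matches the target exactly,
# and one that is statistically independent of it.
target = [0, 0, 1, 1]
print(mutual_information([0, 0, 1, 1], target))  # dependent predictor
print(mutual_information([0, 1, 0, 1], target))  # independent predictor
```

Only the $(x, y)$ pairs that actually occur in the data enter the sum, which avoids taking the logarithm of zero.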
Property
- $MI=0$ when the predictor has only one value (it never changes, so it provides no information).
- A larger MI means the predictor provides more information about the target variable. However, MI also depends on the dataset, so MI values from different datasets are not comparable.
- MI behaves like a sign-agnostic correlation: a predictor that is perfectly positively correlated with the target variable gives the same MI as one that is perfectly negatively correlated.
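The first and third properties can be checked numerically. The sketch below assumes the standard count-based MI estimate with the natural logarithm; the helper `mutual_information` and the toy data are hypothetical and exist only for this illustration.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Empirical MI (in nats) between two sequences of discrete labels."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum(c / n * log(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


target = [1, 1, 0, 0, 1, 0]            # hypothetical binary target

constant = [7] * len(target)           # predictor that never changes
same = list(target)                    # perfectly positively related predictor
flipped = [1 - t for t in target]      # perfectly negatively related predictor

mi_constant = mutual_information(constant, target)
mi_same = mutual_information(same, target)
mi_flipped = mutual_information(flipped, target)

print(mi_constant, mi_same, mi_flipped)
```

The constant predictor yields an MI of 0, while the identical and the flipped predictors yield the same value ($\log 2 \approx 0.693$ for this balanced target), since MI depends only on how the values co-occur and not on the direction of the relationship.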