Mr.Ba.Code: Three Learning Principles

Sunday, November 23, 2014

Occam’s Razor: trimming down unnecessary explanation

The simplest model that fits the data is also the most plausible.
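As a rough illustration of that idea, here is a minimal sketch that picks the least complex candidate whose error is close enough to the best one. The function name `simplest_adequate_model`, the `tolerance` parameter and the error numbers are all made up for this example; they are not from the lecture.

```python
from typing import Dict


def simplest_adequate_model(errors: Dict[int, float], tolerance: float) -> int:
    """Return the lowest complexity whose error is within `tolerance` of the best.

    `errors` maps a complexity measure (e.g. polynomial degree) to the
    validation error of that model. Both are hypothetical inputs.
    """
    if not errors:
        raise ValueError("errors must not be empty")
    best = min(errors.values())
    # Keep every candidate that fits "well enough", then take the simplest.
    adequate = [c for c, e in errors.items() if e <= best + tolerance]
    return min(adequate)


# Made-up validation errors for models of complexity 1, 2, 3 and 8.
validation_error = {1: 0.30, 2: 0.12, 3: 0.11, 8: 0.10}
chosen = simplest_adequate_model(validation_error, tolerance=0.03)
```

With these numbers the complexity-8 model has the lowest error, but the complexity-2 model is within the tolerance, so the razor prefers it.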

Sampling Bias: If the data is sampled in a biased way, learning will produce a similarly biased outcome.

Training data and test data should both be drawn i.i.d. from the same distribution P.

Example: validation used randomly sampled records, while the test set consisted of the last records. That mismatch is why the contest was still lost.

Consider using the same distribution (sampling) in all of the training, validation and testing phases:

1. Emphasize some examples with extra weight in training if needed

2. Match validation with test scenario as much as possible
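The two points above can be sketched as follows, assuming the test set is made of the last records, as in the contest example. The helper names `chronological_split` and `recency_weights`, the linear weighting and the default `ratio` are illustrative choices, not something the lecture prescribes.

```python
from typing import List, Sequence, Tuple


def chronological_split(records: Sequence, val_fraction: float = 0.2) -> Tuple[list, list]:
    """Hold out the LAST records for validation instead of a random sample.

    Assumes `records` is already sorted from oldest to newest.
    """
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    n_val = max(1, int(round(len(records) * val_fraction)))
    cut = len(records) - n_val
    return list(records[:cut]), list(records[cut:])


def recency_weights(n: int, ratio: float = 3.0) -> List[float]:
    """Linearly increasing weights: the newest example counts `ratio` times the oldest."""
    if n == 1:
        return [1.0]
    return [1.0 + (ratio - 1.0) * i / (n - 1) for i in range(n)]


records = list(range(10))  # stand-in for 10 time-ordered records
train, val = chronological_split(records, val_fraction=0.2)
weights = recency_weights(len(train))  # one weight per training record
```

Here validation mirrors the test scenario (the most recent records), and the weights can be passed to any learner that accepts per-example weights.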

Data Snooping

Red: using the entire 8 years of data for training. The performance looks good, but only because of snooping.

Blue: using 6 years for training and 2 years for testing. The result is even negative.
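The notes do not say how exactly the 8 years of data leaked into training. As one hypothetical case, the sketch below assumes the leak happens through preprocessing: scaling statistics computed from all 8 years instead of only the 6 training years. The series values and the helper names `fit_scaler` and `apply_scaler` are made up for illustration.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def fit_scaler(values: List[float]) -> Tuple[float, float]:
    """Compute mean and standard deviation from the given values only."""
    mu = mean(values)
    sigma = pstdev(values)
    # Guard against a constant series, which would give sigma == 0.
    return mu, (sigma if sigma > 0 else 1.0)


def apply_scaler(values: List[float], mu: float, sigma: float) -> List[float]:
    """Standardize values using previously fitted statistics."""
    return [(v - mu) / sigma for v in values]


series = [float(i) for i in range(8)]  # one made-up value per year, 8 years
train, test = series[:6], series[6:]   # first 6 years train, last 2 years test

# Snooping: the statistics already "see" the 2 test years.
mu_all, sigma_all = fit_scaler(series)

# Clean: statistics come from the 6 training years and are reused on the test years.
mu_train, sigma_train = fit_scaler(train)
train_scaled = apply_scaler(train, mu_train, sigma_train)
test_scaled = apply_scaler(test, mu_train, sigma_train)
```

The two sets of statistics differ, which is the point: anything fitted on all 8 years carries information about the test period into training.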

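Interrogate a prisoner long enough and anyone will confess!!!

1. Avoid deciding on the model after peeking at the data

2. Stay skeptical at all times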
