1. Occam’s Razor
2. Sampling Bias
3. Data Snooping
Occam’s Razor: trimming down unnecessary explanation
The simplest model that fits the data is also the most plausible.
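One way to read this principle in practice is to try candidate models from simplest to most complex and stop at the first one that fits. The sketch below assumes only two hypothetical candidates (a constant and a straight line, both fitted by least squares) and an arbitrary tolerance `tol`; none of these names come from the notes.

```python
def fit_constant(xs, ys):
    """Least-squares constant model: always predicts the mean of ys."""
    c = sum(ys) / len(ys)
    return lambda x: c


def fit_line(xs, ys):
    """Least-squares straight line; assumes at least two distinct x values."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def mse(model, xs, ys):
    """Mean squared error of a fitted model on (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def simplest_fitting(candidates, xs, ys, tol=1e-9):
    """Return the name of the first (simplest) candidate whose error is within tol.

    `candidates` must be ordered from simplest to most complex.
    Returns None if no candidate fits.
    """
    for name, fit in candidates:
        if mse(fit(xs, ys), xs, ys) <= tol:
            return name
    return None


# Hypothetical candidate list, ordered by complexity.
CANDIDATES = [("constant", fit_constant), ("line", fit_line)]

xs = [0, 1, 2, 3]
chosen_flat = simplest_fitting(CANDIDATES, xs, [4, 4, 4, 4])
chosen_linear = simplest_fitting(CANDIDATES, xs, [1, 3, 5, 7])
```

On the flat data the constant model already fits, so nothing more complex is considered; on the linear data the constant fails and the line is the simplest model that fits.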
Sampling Bias: If the data is sampled in a biased way, learning will produce a similarly biased outcome.
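Training data and test data should both be drawn i.i.d. from the same distribution P.
Example: validation used randomly sampled records while the test set used the last records; that mismatch is why the contest was still lost.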
Consider using the same distribution (sampling scheme) in all of the training, validation, and testing phases:
1. Emphasize (up-weight) the relevant examples in training if needed
2. Match validation with test scenario as much as possible
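A minimal sketch of both points, assuming the records are already in time order and the test set is made of the last records. The function names and the geometric `decay` weighting are illustrative assumptions, not something the notes prescribe.

```python
def chronological_split(records, n_val, n_test):
    """Split time-ordered records into train / validation / test.

    Validation is taken from the records just before the test block,
    so it resembles a "last records" test set more than a random sample would.
    """
    if n_val < 0 or n_test < 0 or n_val + n_test > len(records):
        raise ValueError("n_val and n_test must be non-negative and fit in records")
    n_train = len(records) - n_val - n_test
    train = records[:n_train]
    val = records[n_train:n_train + n_val]
    test = records[n_train + n_val:]
    return train, val, test


def recency_weights(n, decay=0.9):
    """Training weights that emphasize later examples.

    The newest example gets weight 1.0 and each older one is scaled by `decay`.
    """
    return [decay ** (n - 1 - i) for i in range(n)]


records = list(range(10))  # stand-in for 10 time-ordered records
train, val, test = chronological_split(records, n_val=2, n_test=2)
weights = recency_weights(len(train), decay=0.5)
```

The weights could then be passed to any learner that accepts per-example weights; how much to emphasize later examples is a tuning choice.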
Data Snooping
Red: the entire 8 years of data is used for training; the performance looks good, but only because of snooping.
Blue: 6 years are used for training and 2 years are held out for testing; the result is even negative.
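The gap between the two curves can be imitated with a toy experiment. This is a sketch under stated assumptions, not the setup behind the figure: the returns are pure random noise, the "models" are 500 random long/short strategies, and the 96/72 period counts (8 vs. 6 years of monthly data) are hypothetical.

```python
import random


def profit(strategy, returns):
    """Total profit of a +1/-1 (long/short) strategy over a sequence of returns."""
    return sum(s * r for s, r in zip(strategy, returns))


def best_strategy(strategies, returns, start, stop):
    """Pick the strategy with the highest profit on periods start..stop-1."""
    return max(strategies, key=lambda s: profit(s[start:stop], returns[start:stop]))


rng = random.Random(0)
n_periods, n_train = 96, 72  # hypothetical: 8 years vs. 6 years of monthly data
returns = [rng.choice([-1, 1]) for _ in range(n_periods)]  # pure noise, nothing to learn
strategies = [[rng.choice([-1, 1]) for _ in range(n_periods)] for _ in range(500)]

# Snooping ("red"): select on all periods and report profit on those same periods.
snooped = best_strategy(strategies, returns, 0, n_periods)
snooped_profit = profit(snooped, returns)

# Honest ("blue"): select on the training periods, report on the held-out periods only.
honest = best_strategy(strategies, returns, 0, n_train)
honest_test_profit = profit(honest[n_train:], returns[n_train:])

print("snooped:", snooped_profit, "held-out:", honest_test_profit)
```

Since the returns are noise, no strategy has real skill. The snooped figure is nevertheless the maximum over 500 tries and so tends to look impressive, while the held-out figure tends to hover around zero and can be negative.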
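Interrogate a prisoner long enough and anyone will confess!!!
1. Avoid choosing the model after peeking at the data
2. Stay suspicious at all times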