DataSciences board - Joining the fun by reposting a blog post I wrote; go easy on me
g*********n (posts: 119) #1
An introduction to the AdaBoost/AdaCost algorithms, and a metaphor for life
(I won't paste the slides here.)
Today I gave my fellow data scientists at the Kingstowne office an introductory presentation on AdaCost. The feedback was surprisingly positive, so I thought it might be a good idea to highlight some of the interesting ideas in a blog post to share with a larger audience. The full slide deck is attached to the post.
The purpose of the presentation was to describe a classification algorithm,
which could be boring by nature. Fortunately we are talking about AdaCost
and AdaBoost, which I see as a perfect metaphor for our life. Here I'll
explain why.
First, some layman description of the algorithm.
Adaboost is short for Adaptive Boosting, an approach to classification. By classification, I mean labeling an observation as 1 or -1, in the simplest form. For example: an image shows a dog or a cat, the stock market goes up or down, a truck shipment contains drugs or not, and so on. Basically, each act of classification (labeling an observation) is an attempt to make the best decision based on the information available at the moment. Adaboost was invented by two brilliant computer scientists, Yoav Freund and Robert Schapire, based on some heuristic reasoning. Their reasoning is the following:
First, each observation in the training data is assigned an equal weight.
Second, they start with a humble choice of tool to help a model user make the decision. Such humble tools are called weak classifiers, and they usually assume the simplest structure or configuration: for example, a one-node decision tree or a one-parameter linear regression model. They are called weak because they are inflexible and usually deliver mediocre performance at best.
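A one-node decision tree (a "decision stump") is the canonical weak classifier. Here is a minimal sketch in Python, with my own illustrative function names (not from any particular library):

```python
# A decision stump: classify by thresholding a single feature.
# Minimal, illustrative sketch -- brute-force search over thresholds.

def fit_stump(X, y, weights):
    """Find the (feature, threshold, polarity) minimizing weighted error."""
    best, best_err = None, float("inf")
    n_features = len(X[0])
    for j in range(n_features):
        for thresh in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                preds = [polarity if row[j] >= thresh else -polarity
                         for row in X]
                # weighted error: sum the weights of misclassified points
                err = sum(w for w, p, t in zip(weights, preds, y) if p != t)
                if err < best_err:
                    best_err, best = err, (j, thresh, polarity)
    return best, best_err

def predict_stump(stump, X):
    j, thresh, polarity = stump
    return [polarity if row[j] >= thresh else -polarity for row in X]
```

Even this crude rule is enough for boosting, as long as it does slightly better than a coin flip on the weighted data.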
Third, they fit the chosen weak classifier to the known data and compare the results to the ground truth. All the errors, or misclassifications, are identified and, through a re-weighting mechanism, emphasized by being assigned a heavier weight. Now each observation has a new weight, and the misclassified ones have larger weights.
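In standard AdaBoost, this re-weighting mechanism is exponential. Sketched in the usual notation (h_t is the round-t weak classifier, Z_t a normalizer):

```latex
% weighted error of the round-t weak classifier h_t
\epsilon_t = \sum_{i\,:\,h_t(x_i) \neq y_i} w_i
% its vote weight: more accurate classifiers speak louder
\alpha_t = \tfrac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}
% re-weighting: misclassified points (y_i h_t(x_i) = -1) grow
w_i \leftarrow \frac{w_i\,\exp\!\left(-\alpha_t\, y_i\, h_t(x_i)\right)}{Z_t}
```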
Fourth, a new weak classifier is fit and generates outputs. During this round of training, the misclassified data from the last round get more attention, so they are more likely to be correctly classified.
The iteration goes on until it reaches the maximum number of iterations specified by the user. The final output is a weighted sum of all the weak classifiers produced during the iterations.
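The four steps above can be sketched end to end. This is a from-scratch illustration over one-dimensional data with threshold stumps as the weak classifiers; it is a teaching sketch, not production code:

```python
import math

def stump_predict(thresh, polarity, x):
    # the weak classifier: threshold a single value
    return polarity if x >= thresh else -polarity

def adaboost(xs, ys, n_rounds):
    n = len(xs)
    w = [1.0 / n] * n                              # step 1: equal weights
    ensemble = []                                  # (alpha, thresh, polarity)
    for _ in range(n_rounds):
        # step 2: fit the weak classifier minimizing the weighted error
        best, best_err = None, float("inf")
        for thresh in sorted(set(xs)):
            for polarity in (1, -1):
                err = sum(wi for wi, x, y in zip(w, xs, ys)
                          if stump_predict(thresh, polarity, x) != y)
                if err < best_err:
                    best_err, best = err, (thresh, polarity)
        eps = max(best_err, 1e-12)                 # avoid log(0)
        if eps >= 0.5:                             # no better than chance
            break
        alpha = 0.5 * math.log((1 - eps) / eps)    # vote of this classifier
        ensemble.append((alpha,) + best)
        # step 3: re-weight -- misclassified points get heavier weights
        w = [wi * math.exp(-alpha * y * stump_predict(best[0], best[1], x))
             for wi, x, y in zip(w, xs, ys)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    # final model: sign of the weighted sum of the weak classifiers
    score = sum(a * stump_predict(t, p, x) for a, t, p in ensemble)
    return 1 if score >= 0 else -1
```

On a pattern like (+, -, +) along a line, no single stump can be right everywhere, but a few boosting rounds combine stumps into a correct classifier, which is the whole point.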
Adaboost works astonishingly well for classification. Setting aside its solid mathematical justification, I attribute its success to the perfect sense it makes as a picture of human life.
Adaboost is essentially a weighted sum of weak classifiers, or dumb classifiers. The word "dumb" somehow makes a direct connection to myself, reminding me of how limited my wisdom was and is. In that sense, we are all individually weak classifiers, trying to make life decisions based on whatever information we have in front of us: which college to go to, which job to take, whom to marry, and so on. Just as Adaboost gets a large number of iterations for decision making, we are given many chances in our lives to make decisions, some of which we get right and others we get wrong. The younger we are, the more likely we are to make mistakes, just as in earlier iterations a weak classifier is more prone to errors.
Then Adaboost says "It's O.K.," because you will have future chances to make things right. The embarrassing part is that the errors you made in the last round are highlighted and put right in front of your face, urging you to correct them. Instead of avoiding them, you had better deal with them. In Adaboost, each iteration assigns a different weight to the weak classifier it generates. Likewise, the decisions we make in our lives have different importance: a wrong marriage is presumably more harmful or costly than going to a less-than-ideal college, for instance.
As in Adaboost, you can only look forward in your life. We all make mistakes, and whatever has happened has happened. At the next chance, we try to get it right. This is the spirit of Adaboost, and this is the spirit of human life. You are given no chance to correct past mistakes, but you may have plenty of chances to make the right decisions in the future. So don't cry over spilt milk; buy a new bottle.
Last, the final model of Adaboost is a weighted sum of all the weak classifiers, which suggests a naive but simple measure of a life's value: how much a person's life is worth at the end can be viewed as the sum of the values of every decision made along the way.
I admit the metaphor for life is not perfect, but it did help my co-workers (a group of very smart young professionals) better understand the essence of Adaboost. As a variant, AdaCost does not weight mistakes equally (as Adaboost does) but differentiates them by type: one type of mistake receives a heavier penalty than another. In terms of the metaphor, moral mistakes would generally be more costly than honest mistakes.
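AdaCost's actual update uses a cost-adjustment function applied inside the exponent; the following is only a simplified illustration of the core idea that different mistake types carry different penalties. The cost values and function name are invented for the sketch:

```python
import math

# Cost-sensitive re-weighting in the spirit of AdaCost (simplified):
# a mistake on a high-cost observation inflates its weight more than
# a mistake on a low-cost one. Costs here are illustrative.
def cost_sensitive_update(weights, labels, preds, alpha, costs):
    new_w = []
    for w, y, p, c in zip(weights, labels, preds, costs):
        if p != y:
            w *= math.exp(alpha * c)   # heavier penalty for costly mistakes
        else:
            w *= math.exp(-alpha)      # correct points shrink as usual
        new_w.append(w)
    total = sum(new_w)
    return [w / total for w in new_w]  # renormalize to a distribution
```

With equal costs this reduces to the ordinary AdaBoost update; with unequal costs, the expensive mistakes dominate the next round's attention.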
That's about it. For more mathematical fun, please read the attached slides,
and feel free to ask me questions.
o**a (posts: 1315) #2
You really can write. I haven't finished it yet; it made me think of ANNs. I'll keep reading tomorrow.
r*****d (posts: 346) #3
Nice work!

T*****u (posts: 7103) #4
Nice one.