DataSciences board - Joining the fun by reposting a blog post I wrote; go easy on me
g*********n (posts: 119) #1
An introduction to the AdaBoost/AdaCost algorithms, and a metaphor for life
(I won't paste the slides here.)
Today I gave my fellow data scientists at the Kingstowne office an introductory presentation on AdaCost. The feedback was surprisingly positive, so I thought it might be a good idea to highlight some of the interesting ideas in a blog post to share with a larger audience. The full slide deck is attached to the post.
The purpose of the presentation was to describe a classification algorithm,
which could be boring by nature. Fortunately we are talking about AdaCost
and AdaBoost, which I see as a perfect metaphor for our life. Here I'll
explain why.
First, some layman description of the algorithm.
Adaboost is short for Adaptive Boosting, an approach to classification. By classification, I mean labeling an observation as 1 or -1, in the simplest form. For example: an image shows a dog or a cat, the stock market goes up or down, a truck shipment contains drugs or not, and so on. Basically, each act of classification (labeling an observation) is an attempt to make the best decision based on the information available at the moment. Adaboost was invented by two brilliant computer scientists, Yoav Freund and Robert Schapire, based on some heuristic reasoning. Their reasoning is the following:
First, each observation in the training data is assigned an equal weight.
Second, they start with a humble choice of tool to help a model user make the decision. Such humble tools are called weak classifiers, and they usually assume the simplest structure or configuration: for example, a one-node decision tree or a one-parameter linear regression model. They are called weak because they are inflexible and usually deliver mediocre performance at best.
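A one-node decision tree (a "decision stump") is the canonical weak classifier. Here is a minimal sketch in Python, with my own illustrative function names (not from any particular library):

```python
# A decision stump: classify by thresholding a single feature.
# Minimal, illustrative sketch -- brute-force search over thresholds.

def fit_stump(X, y, weights):
    """Find the (feature, threshold, polarity) minimizing weighted error."""
    best, best_err = None, float("inf")
    n_features = len(X[0])
    for j in range(n_features):
        for thresh in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                preds = [polarity if row[j] >= thresh else -polarity
                         for row in X]
                # weighted error: sum the weights of misclassified points
                err = sum(w for w, p, t in zip(weights, preds, y) if p != t)
                if err < best_err:
                    best_err, best = err, (j, thresh, polarity)
    return best, best_err

def predict_stump(stump, X):
    j, thresh, polarity = stump
    return [polarity if row[j] >= thresh else -polarity for row in X]
```

Even this crude rule is enough for boosting, as long as it does slightly better than a coin flip on the weighted data.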
Third, they fit the chosen weak classifier to the known data and compare the results to the ground truth. All the errors, or misclassifications, are identified and, through a re-weighting mechanism, emphasized by being assigned a heavier weight. Now each observation has a new weight, and the misclassified ones have larger weights.
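In standard AdaBoost, this re-weighting mechanism is exponential. Sketched in the usual notation (h_t is the round-t weak classifier, Z_t a normalizer):

```latex
% weighted error of the round-t weak classifier h_t
\epsilon_t = \sum_{i\,:\,h_t(x_i) \neq y_i} w_i
% its vote weight: more accurate classifiers speak louder
\alpha_t = \tfrac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}
% re-weighting: misclassified points (y_i h_t(x_i) = -1) grow
w_i \leftarrow \frac{w_i\,\exp\!\left(-\alpha_t\, y_i\, h_t(x_i)\right)}{Z_t}
```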
Fourth, a new weak classifier is fit and generates outputs. During this round of training, the misclassified data from the last round get more attention, so they are more likely to be correctly classified.
The iteration goes on until it reaches the maximum number of iterations specified by the user. The final output is a weighted sum of all the weak classifiers produced during the iterations.
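The four steps above can be sketched end to end. This is a from-scratch illustration over one-dimensional data with threshold stumps as the weak classifiers; it is a teaching sketch, not production code:

```python
import math

def stump_predict(thresh, polarity, x):
    # the weak classifier: threshold a single value
    return polarity if x >= thresh else -polarity

def adaboost(xs, ys, n_rounds):
    n = len(xs)
    w = [1.0 / n] * n                              # step 1: equal weights
    ensemble = []                                  # (alpha, thresh, polarity)
    for _ in range(n_rounds):
        # step 2: fit the weak classifier minimizing the weighted error
        best, best_err = None, float("inf")
        for thresh in sorted(set(xs)):
            for polarity in (1, -1):
                err = sum(wi for wi, x, y in zip(w, xs, ys)
                          if stump_predict(thresh, polarity, x) != y)
                if err < best_err:
                    best_err, best = err, (thresh, polarity)
        eps = max(best_err, 1e-12)                 # avoid log(0)
        if eps >= 0.5:                             # no better than chance
            break
        alpha = 0.5 * math.log((1 - eps) / eps)    # vote of this classifier
        ensemble.append((alpha,) + best)
        # step 3: re-weight -- misclassified points get heavier weights
        w = [wi * math.exp(-alpha * y * stump_predict(best[0], best[1], x))
             for wi, x, y in zip(w, xs, ys)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    # final model: sign of the weighted sum of the weak classifiers
    score = sum(a * stump_predict(t, p, x) for a, t, p in ensemble)
    return 1 if score >= 0 else -1
```

On a pattern like (+, -, +) along a line, no single stump can be right everywhere, but a few boosting rounds combine stumps into a correct classifier, which is the whole point.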
Adaboost works astonishingly well for classification. Setting aside its solid mathematical justification, I attribute its success to the perfect sense it makes as a picture of human life.
Adaboost is essentially a weighted sum of weak classifiers, or dumb classifiers. The word "dumb" somehow makes a direct connection to myself, reminding me of how limited my wisdom was and is. In that sense, we are all individually weak classifiers, trying to make life decisions based on whatever information we have in front of us: which college to go to, which job to take, whom to marry, and so on. Just as Adaboost gets a large number of iterations for decision making, we are given many chances in our lives to make decisions, some of which we get right and others we get wrong. The younger we are, the more likely we are to make mistakes, just as in earlier iterations a weak classifier is more prone to errors.
Then Adaboost says "It's O.K.," because you will have future chances to make things right. The embarrassing part is that the errors you made in the last round are highlighted and put right in front of your face, urging you to correct them. Instead of avoiding them, you had better deal with them. In Adaboost, each iteration assigns a different weight to the weak classifier it generates. Likewise, the decisions we make in our lives have different importance: a wrong marriage is presumably more harmful or costly than going to a less-than-ideal college, for instance.
As in Adaboost, you can only look forward in your life. We all make mistakes, and whatever has happened has happened. At the next chance, we try to get it right. This is the spirit of Adaboost, and this is the spirit of human life. You are given no chance to correct past mistakes, but you may have plenty of chances to make the right decisions in the future. So don't cry over spilt milk; buy a new bottle.
Last, the final model of Adaboost is a weighted sum of all the weak classifiers, which suggests a naive but simple measure of a life's value: how much a person's life is worth at the end can be viewed as the sum of the values of every decision made along the way.
I admit the metaphor for life is not perfect, but it did help my co-workers (a group of very smart young professionals) better understand the essence of Adaboost. As a variant, AdaCost does not weight mistakes equally (as Adaboost does) but differentiates them by type: one type of mistake receives a heavier penalty than another. In terms of the metaphor, moral mistakes would generally be more costly than honest mistakes.
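AdaCost's actual update uses a cost-adjustment function applied inside the exponent; the following is only a simplified illustration of the core idea that different mistake types carry different penalties. The cost values and function name are invented for the sketch:

```python
import math

# Cost-sensitive re-weighting in the spirit of AdaCost (simplified):
# a mistake on a high-cost observation inflates its weight more than
# a mistake on a low-cost one. Costs here are illustrative.
def cost_sensitive_update(weights, labels, preds, alpha, costs):
    new_w = []
    for w, y, p, c in zip(weights, labels, preds, costs):
        if p != y:
            w *= math.exp(alpha * c)   # heavier penalty for costly mistakes
        else:
            w *= math.exp(-alpha)      # correct points shrink as usual
        new_w.append(w)
    total = sum(new_w)
    return [w / total for w in new_w]  # renormalize to a distribution
```

With equal costs this reduces to the ordinary AdaBoost update; with unequal costs, the expensive mistakes dominate the next round's attention.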
That's about it. For more mathematical fun, please read the attached slides,
and feel free to ask me questions.
o**a (posts: 1315) #2
You really can write. I haven't finished it yet; it made me think of ANNs. I'll keep reading tomorrow.
r*****d (posts: 346) #3
Nice work!

T*****u (posts: 7103) #4
Nice one.