Classified board - DA/DS Job-Hunting and Interview Prep Guide (Part 1) - Includes Referral Opportunities
S******1
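Recently many people have asked me about DA/DS positions, and I have noticed that misconceptions about these roles are common. Many people assume that solid SQL and Python are enough to get through the interviews. In practice, many students are unclear about which skills data science requires, which areas to prepare, what interviewers ask, and what the day-to-day work looks like. This post covers the areas to prepare if you want to apply for DA/DS positions.
Data Scientist/Data Analyst candidates usually need to focus their preparation on the following areas: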
* Machine Learning
* Statistics, probability, and A/B testing
* Online coding(Python + R)
* SQL
* Product sense
* Project
* Extra Skills
I. Machine Learning
1. Common interview questions
* What is overfitting? / Please briefly describe bias vs. variance.
* How do you overcome overfitting? Please list 3-5 approaches from practical experience. / What is the "curse of dimensionality"? How do you prevent it?
* Please briefly describe the Random Forest classifier. How does it work? Any pros and cons in practical implementation?
* Please describe the difference between a GBM tree model and Random Forest.
* What is SVM? Which parameters do you need to tune during model training? How do different kernels change the classification result?
* Briefly explain PCA in your own words. How does it work? Name some of its pros and cons.
* Why doesn't logistic regression use R^2?
* When would you use L1 regularization rather than L2?
* List at least 4 metrics you would use to evaluate model performance and give the advantage of each. (F1 score, ROC curve, recall, etc.)
* What would you do if an important field has > 30% missing values before building the model?
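For the model-evaluation question above, a minimal plain-Python sketch may help make the metric definitions concrete. The function name `classification_metrics` and the 0/1 label encoding are assumptions chosen for illustration; in practice a library such as scikit-learn would normally be used instead.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary 0/1 labels.

    Hypothetical helper for illustration only; assumes 1 is the positive class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    # Guard against empty denominators (e.g. no predicted positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

Precision answers "of the items flagged positive, how many were right", recall answers "of the true positives, how many were found", and F1 is their harmonic mean, which is why it is often preferred over accuracy on imbalanced data.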
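2. Recommended resources
* Andrew Ng's Machine Learning course on Coursera: https://www.coursera.org/learn/machine-learning . It is practically an archaeological relic by now; the content is somewhat dated but classic, and it suits business-school BA majors who need to build up an ML background from zero.
* [15 hours of expert ML videos]: https://www.dataschool.io/15-hours-of-expert-machine-learning-videos/
* "ISLR" (free direct link), a superb introductory book
* "Practical Statistics for Data Scientists: 50 Essential Concepts", a very practical book that focuses on small details; it does not go deep, but you come away understanding ML a bit better.
* The Towards Data Science publication on Medium, for example the short Machine Learning 101 series (https://medium.com/machine-learning-101), which is very accessible and helps beginners understand abstract algorithms in concrete terms.
* StackOverflow (https://stackoverflow.com/) of course cannot be left out. Learning data work or programming always throws up very fine-grained problems that articles generally do not cover, so you need the community's help.
* DataCamp: Machine Learning A-Z https://lnkd.in/gXqdBsQ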
II. Statistics, Probability, and A/B Testing
1. Common interview questions
* What is a p-value? What is a confidence interval? Explain them to a product manager or a non-technical person. (Clearly they do not want you to answer "draw a normal distribution and cut off 5% on each side".)
* How do you understand the "power" of a statistical test?
* If a distribution is right-skewed, what is the relationship between the median, mode, and mean?
* When do you use a t-test instead of a z-test? List some differences between the two.
* Coin problem 1: How will you test whether a coin is fair? How will you design the process (sometimes you are asked to implement it in code)? What test would you use?
* Coin problem 2: How do you simulate a fair coin with one unfair coin?
* The three-door question (the Monty Hall problem). (Google it yourself; it is one of the classics.)
* Bayes questions: Tom takes a cancer test and the test is advertised as being 99% accurate: if you have cancer you will test positive 99% of the time, and if you don't have cancer, you will test negative 99% of the time. If 1% of all people have cancer and Tom tests positive, what is the probability that Tom has the disease? (The classic cancer-screening question; once you can do this one, the rest are no problem.)
* How do you calculate the sample size for an A/B test?
* After running an A/B test, you find that the desired metric (e.g., click-through rate) is going up while another metric (e.g., clicks) is decreasing. How would you make a decision?
* Now assume your A/B test result is somewhat negative (e.g., p-value ~= 20%). How will you communicate this to the product manager?
* If, given the above 20% p-value, the product manager still decides to launch the new feature, how would you present your suggestions and warnings?
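Three of the questions above have short, well-known answers that can be sketched in code: the cancer-screening question (Bayes' rule), simulating a fair coin from an unfair one (von Neumann's trick: flip twice and keep the first flip only when the two differ), and the A/B test sample size (here using the common normal-approximation formula for comparing two proportions). All function names, the `make_biased_coin` helper, and the default `alpha=0.05` and `power=0.8` are assumptions for illustration, not part of the original questions.

```python
import random
from math import ceil
from statistics import NormalDist


def posterior_given_positive(prior, sensitivity, specificity):
    """Bayes' rule: P(disease | positive test)."""
    true_pos = sensitivity * prior                # P(positive and disease)
    false_pos = (1 - specificity) * (1 - prior)   # P(positive and no disease)
    return true_pos / (true_pos + false_pos)


def make_biased_coin(p_heads, seed=None):
    """Hypothetical helper: returns a flip() function that is True with prob p_heads."""
    rng = random.Random(seed)
    return lambda: rng.random() < p_heads


def fair_flip(biased_flip):
    """Von Neumann's trick: HT and TH are equally likely, so keep only those."""
    while True:
        first, second = biased_flip(), biased_flip()
        if first != second:
            return first  # discard HH and TT, retry


def sample_size_per_group(p1, p2, alpha=0.05, power=0.8):
    """Approximate per-group sample size for a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)
```

For Tom, `posterior_given_positive(0.01, 0.99, 0.99)` comes out at 0.5 (up to floating-point rounding): with a 1% base rate, true positives and false positives are equally common, so a "99% accurate" test still leaves a 50% chance of disease.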
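2. Recommended resources
* For A/B testing, the top recommendation is the free A/B testing course on Udacity (by Google). Students rate it well, and it gives a well-rounded overview of A/B testing.
* Most of the other A/B testing material comes from good articles on Medium. A/B testing has to be tied to real industry use cases; knowing only the theory is little different from not understanding it at all. So read articles on A/B testing written by people in industry; only by working through cases will you understand how to answer the interview questions.
* Also, if you know senior students, older colleagues, or others who are already working, do not hesitate to ask them about A/B testing. Ten or twenty minutes of their time can save you a lot of time digging for material, for the same reason as the previous point.
* For stats, a very quick way to pick up the fundamentals is the intro to stats and prob course on Coursera; it is short and can be finished in an afternoon.
* Udemy course: Data Science Career Guide - Interview Preparation, which is quite good. It is lightweight and easy to get through.
* Probability questions are no problem for most Chinese students, since the material was covered in high school and only needs a little review. The Udemy course can help with that.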
III. Online coding (Python + R)
1. Interview questions (these vary widely, so I would not call them the most common)
* Report the largest sum of 3 consecutive numbers in a list, along with the corresponding index.
* Dynamic programming problem: You have 5 types of coins (1, 2, 3, 5, 8) and a total sum (a big number, say 589). How many different combinations of coins add up to this total?
* Please write a function to reverse the keys and values in a dictionary. When values are repeated, keep only the first key as the new value.
* Similar to the "gather" and "spread" functions in the tidyr package, write one yourself and test it using XXX dataset.
* Given a log file with rows featuring a date, a number, and then a string of names, parse the log file and return the count of unique names aggregated by month. (Mine was not this exact question, but the idea was very similar.)
* Use Python to calculate a 30-day rolling profit. (Essentially, write a rolling window in Python.)
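Several of the questions above can be sketched in a few lines of plain Python. The versions below are one possible approach each, not reference answers: the function names are made up for illustration, the coin question is read as "unordered combinations with an unlimited supply of each coin", and the rolling profit is read as a simple sum over a fixed-size window of daily values.

```python
def max_window_sum(nums, k=3):
    """Return (largest sum of k consecutive numbers, start index of that window)."""
    if len(nums) < k:
        raise ValueError("list is shorter than the window")
    window = sum(nums[:k])
    best_sum, best_start = window, 0
    for start in range(1, len(nums) - k + 1):
        # Slide the window: add the entering element, drop the leaving one.
        window += nums[start + k - 1] - nums[start - 1]
        if window > best_sum:
            best_sum, best_start = window, start
    return best_sum, best_start


def count_coin_combinations(coins, total):
    """Count unordered coin combinations (unlimited supply) that sum to total."""
    ways = [1] + [0] * total  # ways[0] = 1: the empty combination
    for coin in coins:        # looping over coins first avoids counting orderings
        for amount in range(coin, total + 1):
            ways[amount] += ways[amount - coin]
    return ways[total]


def invert_dict(d):
    """Swap keys and values; for repeated values keep the first key seen."""
    inverted = {}
    for key, value in d.items():
        inverted.setdefault(value, key)  # later duplicates are ignored
    return inverted


def rolling_sum(values, window=30):
    """Sum of each consecutive `window`-length run, e.g. 30-day rolling profit."""
    if window <= 0 or len(values) < window:
        return []
    current = sum(values[:window])
    sums = [current]
    for i in range(window, len(values)):
        current += values[i] - values[i - window]  # O(1) update per step
        sums.append(current)
    return sums
```

For the question as posed, the call would be `count_coin_combinations([1, 2, 3, 5, 8], 589)`. Both sliding-window functions update a running sum instead of re-summing each window, which keeps them linear in the length of the input.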
2. Recommended resources
* For algorithms there is no escaping Leetcode; working through Easy and Medium problems can only help.
* Algorithm videos on Youtube
* Important: before any online coding round, be sure to check glassdoor to see whether anyone has shared questions. If you do not get the same question it is still good practice, and if you do, so much the better. (glassdoor --> a company --> interview question --> title)
* DataCamp:
Intro to Python https://lnkd.in/grCsv8v
Intro to R https://lnkd.in/gKFiDZn
Data Wrangling Pydata (90min) https://lnkd.in/gEhF3-W
EDA (20min video) https://lnkd.in/gT8_RKh
Stats/Prob (Khan Academy) https://lnkd.in/gsyGpVu
* Two Udemy courses: Data Analysis with Pandas and Python, and Python for Data Science and Machine Learning Bootcamp. Both are very easy to follow and get you hands-on quickly.
* A good website: real python
* Having a few books on hand is even better; here are some options: https://realpython.com/best-python-books/
### Remaining content to follow in the next part...
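[Current referral opportunities - New Grads Friendly!]
1. A smart electric vehicle ("EV") company in the South Bay of Silicon Valley. It designs, develops, manufactures, and sells smart electric vehicles seamlessly integrated with advanced internet, artificial intelligence, and autonomous driving technologies. It is committed to in-house R&D and smart manufacturing to create a better mobility experience for customers, and to transforming smart electric vehicles through technology and data to shape the future of mobility.
Hiring Entry Level [Data Platform Engineer] | [Machine Learning Infrastructure Engineer]
Full-time starting salary $75000, Sponsor OPT/Ext/H1b
2. A Southern California banking app developer committed to creating financial opportunity that advances America's collective potential. Its financial tools, including a debit card and spending account, help more than 8 million customers bank, budget, avoid overdraft fees, find work, and build credit. Partners include Mark Cuban, Norwest Venture Partners, Section 32, and Financial Venture Studios.
Hiring Entry Level [Data Engineer]
Full-time starting salary $70000, Sponsor OPT/Ext/H1b
3. A one-stop SaaS platform built for partner merchants, with access to millions of influencers on Instagram, YouTube, TikTok, and other platforms, described as the most effective tool for brands to raise awareness. Using artificial intelligence, big-data analytics, and extensive influencer marketing experience, it tailors dedicated influencer marketing plans for hundreds of brands.
Hiring Entry Level [Data Analyst]
Full-time starting salary $72000, Sponsor OPT/Ext/H1b/GC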
[Job Descriptions/Requirements]
Data Engineer
* Provide seamless and timely data access for your users;
* Build reliable and dependable ETL;
* Build and maintain production machine learning infrastructure;
* Troubleshoot complex issues in distributed systems;
* Debate data processing philosophies and methodologies with your team;
* Familiar with Python, Java, SQL
Machine Learning Engineer
* Profile large-scale training jobs and identify/resolve bottlenecks;
* Increase training speed by mixed-precision, faster database design and
preprocess optimization;
* Work with Infra to build hyper-parameter tuning pipeline and experiments
database;
* Work with Infra to build release pipeline that includes model-pruning,
model to car release and writing GPU.
Data Analyst
* Assist product manager to deal with daily product development, delivery,
and client communication duties
* Conduct research on different business issues in a group based on data in
Google Analytics
* Work in a group to improve the marketing copy used to introduce new
products directly to customers
* Put together pages of competitive product analysis independently by
collecting and analyzing required information from financial annual reports
and official websites, and present reports at the weekly meetings
* Work in Business Processing Re-Engineering group to remove redundancy and
optimize current sales, marketing processes by using ARIS Express
* Prepare Dashboards using calculations, parameters, calculated fields,
groups, sets, and hierarchies in Tableau
* Publish Tableau dashboards on Tableau Server or Tableau Online and embed
them into the portal.
### If you are interested in the study resources or referral opportunities, send a private message to inquire!