最近面的两道题，求解答 - JobHunting版

本页内容为未名空间相应帖子的节选和存档，一周内的贴子最多显示50字，超过一周显示500字访问原贴

JobHunting版 - 最近面的两道题，求解答

相关主题
● 讨论两道L家的设计题	● 这个题怎么答？
● 职位和 candidate 数量的关系	● 这是什么数据结构？
● 【工作机会】Startup Shape Security 招 data scientist	● Amazon Interview: algorithm for 2*LOG(N) up bound for search
● fb电面面经	● Facebook 笔试 User Operations Analyst
● 新鲜RocketFuels电面	● 请教一个C++试题！
● 电面被问到hadoop了	● 用一个C++面试题challenge一下大家
● CS 面试题总结(6)	● bloomberg 几题
● bloomberg 面经 - 挂面了	● 请教几个unix的面试题目。

相关话题的讨论汇总
话题: log话题: job话题: logs话题: strings话题: bin

进入JobHunting版参与讨论

(共1页)

j******s
发帖数: 48

Q1. Bin-String Distance
You are only given strings made of by '1's and '0's (no empty strings),
such as "1", "101", "0000". Let's call them bin-strings.
The term "strange-distance" of two bin-strings is defined as follows:
Rip off their common prefix, the "strange-distance" is therefore sum
of length of remaining strings. E.g., "111001001" and "1110101001"
after taking off common prefix "1110", the remains are "01001" and
"101001". So their strange distance is 11.
Your program input will be a series of bin-strings sent from STDIN. Each
bin-string makes one single line. Test cases will be written in a text
file and piped into your program by executing:
cat bin-strings.txt |
You output shall print out on screen the value of maximal strange-distance
made by input set.
For example, given test case:
1000
1101
111
Your program shall print number 6 on screen.
Note: O(n^2) brutal force algorithm is not accepted.
No comments are needed in your program. Write your code succinctly but
pay attention to coding style.
Q2. Log Processing
This is a design question. State your solution in two or three paragraphs
and possibly list technical difficulties that you might be facing. No code
or pseudo code are required. The question is describe as follows:
In ** we own data clusters and run various routine data processing jobs on
it. These jobs are executed remotely on a non-deterministic location,
controlled by a centralized job scheduler, in a manner that you can simply
imagine that the job scheduler is doing location transparent RPC calls. All
jobs consume data on HDFS and put results on HDFS. During their running time
, these jobs have no communication with scheduler. In another words, if a
job wants to leave status or statistics information, the only way is through
logging.
Now we have a need to be analyze job statistics, to answer the questions
such as “how long would one job usually take”, “how frequent would a kind
of job fail”, “for a particular date, did all jobs succeed”, “what is
the usual delay time between two jobs that have dependency relation”, etc.
We need to dig information from job logs and answer all those question. Note
that these log analytic questions cannot be predefined at any time.
Please propose an architecture solution of the logging system and describe
what the data structure of logs would likely to be, for the convenience for
above analytic question. This is a very open question, so don’t bother to
go into too much detail.
第一题我的算法不怎么样，第二题简直就毫无头绪，诚心请教系统设计类的题目应该如
何准备，如何训练自己这方面的能力？
谢谢各位大神的回答

J**9
发帖数: 835

Homework?

【在 j******s 的大作中提到】

: Q1. Bin-String Distance
: You are only given strings made of by '1's and '0's (no empty strings),
: such as "1", "101", "0000". Let's call them bin-strings.
: The term "strange-distance" of two bin-strings is defined as follows:
: Rip off their common prefix, the "strange-distance" is therefore sum
: of length of remaining strings. E.g., "111001001" and "1110101001"
: after taking off common prefix "1110", the remains are "01001" and
: "101001". So their strange distance is 11.
: Your program input will be a series of bin-strings sent from STDIN. Each
: bin-string makes one single line. Test cases will be written in a text

j******s
发帖数: 48

offline coding test. EA的

l*********8
发帖数: 4642

第一题，trie tree变体

【在 j******s 的大作中提到】

l*********8
发帖数: 4642

两道题目给多长时间？

【在 j******s 的大作中提到】

: offline coding test. EA的

j******s
发帖数: 48

两个小时

l*********8
发帖数: 4642

那不容易

【在 j******s 的大作中提到】

: 两个小时

j******s
发帖数: 48

继续坐等对第二题的看法

e***s
发帖数: 799

1000
1101
111
3 个 bin-string? 为什么是6？

j******s
发帖数: 48

1000
1101
common prefix 1
rip it off
000
101, total length is 6

【在 e***s 的大作中提到】

: 1000
: 1101
: 111
: 3 个 bin-string? 为什么是6？

相关主题
● 电面被问到hadoop了	● 这个题怎么答？
● CS 面试题总结(6)	● 这是什么数据结构？
● bloomberg 面经 - 挂面了	● Amazon Interview: algorithm for 2*LOG(N) up bound for search
进入JobHunting版参与讨论

e***s
发帖数: 799

所以，第三行的 111 是 TYPO？
这里只是两个 bin-string， 1000， 1101 ？

【在 j******s 的大作中提到】

: 1000
: 1101
: common prefix 1
: rip it off
: 000
: 101, total length is 6

j******s
发帖数: 48

好吧，自己回答一下思路，虚心求各路大神拍，给点意见
第一题应该是建立一个binary tree,然后recursive求每一个节点的左右子树的最高高
度,他们和就是在这一位上的largest distance，可以online做，但是需要在每个节点
保存左右子树的高度并且动态更新。
o(n) time + O(n) space
第二题,
Assumption:
1 Log file is typically very small, operations are append, delete and read.
2 Cluster is built on Hadoop, which has a chunk size of 64MB, roughly, and
it might be too large and inefficient if we use this chunk size for the log
file.
3 Log information is typically not very important, less effort is needed for
redundancy/reliability.
System Participants:
1 Job/user program
2 Data Cluster(File System), actually store the logs.
3 Log Analyzer
4 Logging System
Components of Logging System(Master Node):
1 Log Mapper, map job id/log id into log's handler id, and place where it
stores.
2 Namespace Mapper, transform log/job namespace, it should be persistent
3 Log Space Allocator, allocate space for job and add entry at log mapper
and namespace mapper.
4 Log indexer, build some index on logs if needed, like start time, end time
, etc..
Components of Log Analyzer:
1 Query parser, parse the query into some request of logs
2 Log Fetcher, fetch logs according to job id/ log id/ or some other
identification information.
3 Response Generator: generate statistical response according to the query.
A typical work flow for appending logs:
1 A job is running on the cluster, it ask the master(logging system) to
allocate space for logging, it should know roughly how much it need to use
to allocate enough space.
2 Master return back the log handler and its location, job connect the
corresponding data node(It might need to write two or more for reliability)
to write logs.
A typical work flow for reading logs:
1 Analyzer parses the log query and generate a serious of log request.
2 Analyzer asks master to locate the logs
3 Analyzer generates response according to the logs information.
Log data structure:
{Job Id,Job Status, Parent Id, Start Time, End Time, Exceptions, Run Time
Information}
Potential Problems:
1 Determine the chunk size of logs is very important, we hope to store all
the mapping of logs at master node in memory.
2 Caching could enhance analyzer's speed, and since we do not update the
logs, we do not need to worry about cache refresh
3 The degree of redundant depends how accurate we want the statistics of
logs to be.
4 Redundant logging will cause consistence problem.
5 Generating useful index is the key point of speeding up performance.

c***s
发帖数: 192

第一道题我同学面EA的时候碰到过。
要证明两个定理：
(1). 包含有最长距离的两个字符串s1和s2中至少有一个是最长的字符串。
(2). 任意一个最长的字符串总是可以找到另外一个字符串组成最长距离。
这道题的复杂度就是O(N).

【在 j******s 的大作中提到】

j******s
发帖数: 48

就是面EA的题.

【在 c***s 的大作中提到】

: 第一道题我同学面EA的时候碰到过。
: 要证明两个定理：
: (1). 包含有最长距离的两个字符串s1和s2中至少有一个是最长的字符串。
: (2). 任意一个最长的字符串总是可以找到另外一个字符串组成最长距离。
: 这道题的复杂度就是O(N).

j******s
发帖数: 48

假设这个两个定理都成立，那么请问有别的解决方法么？求大神指引

【在 c***s 的大作中提到】

b*******e
发帖数: 217

For problem (1), using a TRIE data structure. Each node track the current
max sum below that node. keep updating it when a new string is entering the
TRIE.

【在 j******s 的大作中提到】

p****e
发帖数: 3548

第一题的输入如果是
0
00
000
0000
...
之类的话，最后的结果是1，但你的算法估计得不到这个结果

.
log

【在 j******s 的大作中提到】

: 好吧，自己回答一下思路，虚心求各路大神拍，给点意见
: 第一题应该是建立一个binary tree,然后recursive求每一个节点的左右子树的最高高
: 度,他们和就是在这一位上的largest distance，可以online做，但是需要在每个节点
: 保存左右子树的高度并且动态更新。
: o(n) time + O(n) space
: 第二题,
: Assumption:
: 1 Log file is typically very small, operations are append, delete and read.
: 2 Cluster is built on Hadoop, which has a chunk size of 64MB, roughly, and
: it might be too large and inefficient if we use this chunk size for the log

c***s
发帖数: 192

这两个定理你很容易证明的。没明白你说的别的解决方法是什么？
如果用这两个定理，编程就很简单了：
扫一遍输入字符串，找出一个最长的，O(N).
然后用这个字符串跟其它的字符串计算距离，找出最大的一个。O(N).
这个解法应该就是EA想要的答案。

【在 j******s 的大作中提到】

: 假设这个两个定理都成立，那么请问有别的解决方法么？求大神指引

j*****y
发帖数: 1071

不错

【在 c***s 的大作中提到】

j******s
发帖数: 48

嗯，那应该是对于每一个节点(其实不需要）做Math.min(leftHeight,rightHeight)*2+
2？

【在 p****e 的大作中提到】

: 第一题的输入如果是
: 0
: 00
: 000
: 0000
: ...
: 之类的话，最后的结果是1，但你的算法估计得不到这个结果
:
: .
: log

相关主题
● Facebook 笔试 User Operations Analyst	● bloomberg 几题
● 请教一个C++试题！	● 请教几个unix的面试题目。
● 用一个C++面试题challenge一下大家	● 这个题目的比较好的方法是什么？
进入JobHunting版参与讨论

j******s
发帖数: 48

这是一个很好的答案，比用trie简单多了。谢谢！

【在 c***s 的大作中提到】

: 这两个定理你很容易证明的。没明白你说的别的解决方法是什么？
: 如果用这两个定理，编程就很简单了：
: 扫一遍输入字符串，找出一个最长的，O(N).
: 然后用这个字符串跟其它的字符串计算距离，找出最大的一个。O(N).
: 这个解法应该就是EA想要的答案。

j******s
发帖数: 48

Because it only contains 0-1, so it is actually a binary tree.

the

【在 b*******e 的大作中提到】

: For problem (1), using a TRIE data structure. Each node track the current
: max sum below that node. keep updating it when a new string is entering the
: TRIE.

j******s
发帖数: 48

但是这个定理，在面试的时候，真有人观察出来么，如果没有任何提示..

【在 c***s 的大作中提到】

G****A
发帖数: 4160

nice!
但是证明了（1）,还需要证明(2）么？

【在 c***s 的大作中提到】

c***s
发帖数: 192

我同学真的自己想出来了。
代码就50来行，第二天就拿到onsite。
不过他已经有facebook和linkedin的offer了，就没去面试。
他是搞算法的，可能会先往这个方面想吧。

【在 j******s 的大作中提到】

: 但是这个定理，在面试的时候，真有人观察出来么，如果没有任何提示..

c********t
发帖数: 5706

如果最长字符串有很多个，怎么办？
感觉还是binary trie靠谱

【在 c***s 的大作中提到】

c***s
发帖数: 192

需要吧，因为可能有很多最长字符串。。
不过也可以在1的证明里加一句说明即可。

【在 G****A 的大作中提到】

: nice!
: 但是证明了（1）,还需要证明(2）么？

j******s
发帖数: 48

就是遍历多一边的问题吧.

【在 c********t 的大作中提到】

: 如果最长字符串有很多个，怎么办？
: 感觉还是binary trie靠谱

c***s
发帖数: 192

这就是定理2的作用啊。
任意一个最长字符串都可以找到最长距离的字符串。
这样只需要任选一个最长字符串就可以了。

【在 c********t 的大作中提到】

: 如果最长字符串有很多个，怎么办？
: 感觉还是binary trie靠谱

j******s
发帖数: 48

好吧，看来路漫漫其修远兮啊.

【在 c***s 的大作中提到】

: 我同学真的自己想出来了。
: 代码就50来行，第二天就拿到onsite。
: 不过他已经有facebook和linkedin的offer了，就没去面试。
: 他是搞算法的，可能会先往这个方面想吧。

相关主题
● 一道程序题	● 职位和 candidate 数量的关系
● 回报本版，付A家面经	● 【工作机会】Startup Shape Security 招 data scientist
● 讨论两道L家的设计题	● fb电面面经
进入JobHunting版参与讨论

E****U
发帖数: 59

Not sure why we need trees for Problem 1. Please see the code [O(n)] below.
Did I misunderstand any thing?
int distance(const string& a, const string& b)
{
int n = a.length();
int m = b.length();

int min = n < m ? n : m;
int i = 0;
for (; i < min; ++i)
{
if (a[i] != b[i])
{
break;
}
}
return n - i + m - i;
}

s*******e
发帖数: 1630

第二题不就是著名startup Splunk正在做的么？就是针对machine log的BI软件

j******s
发帖数: 48

You only calculate the distance of two string.
There are n strings, and we need to calculate the max distance of any two
strings in these n strings.

.

【在 E****U 的大作中提到】

: Not sure why we need trees for Problem 1. Please see the code [O(n)] below.
: Did I misunderstand any thing?
: int distance(const string& a, const string& b)
: {
: int n = a.length();
: int m = b.length();
:
: int min = n < m ? n : m;
: int i = 0;
: for (; i < min; ++i)

j******s
发帖数: 48

看了下Splunk的主页，感觉他们家做的好像比这个题目高端很多的样子，而且也没找到
他们的infrastructure,估计是不公开的吧

【在 s*******e 的大作中提到】

: 第二题不就是著名startup Splunk正在做的么？就是针对machine log的BI软件

c********t
发帖数: 5706

哦，高，确实很好。没有数学脑根本想不到，更别说面试时想到了。你同学是个大牛啊。
我刚才理解歧义了，还在想2有啥用，难道不是任意一个字符串都可以找到“与其”距
离最长的字符串。
改写一下：
定理二任意一个最长字符串都可以找到另一个字符串，他们的距离是字符串组的最大
距离

【在 c***s 的大作中提到】

: 这就是定理2的作用啊。
: 任意一个最长字符串都可以找到最长距离的字符串。
: 这样只需要任选一个最长字符串就可以了。

(共1页)

进入JobHunting版参与讨论

相关主题
● 请教几个unix的面试题目。	● 新鲜RocketFuels电面
● 这个题目的比较好的方法是什么？	● 电面被问到hadoop了
● 一道程序题	● CS 面试题总结(6)
● 回报本版，付A家面经	● bloomberg 面经 - 挂面了
● 讨论两道L家的设计题	● 这个题怎么答？
● 职位和 candidate 数量的关系	● 这是什么数据结构？
● 【工作机会】Startup Shape Security 招 data scientist	● Amazon Interview: algorithm for 2*LOG(N) up bound for search
● fb电面面经	● Facebook 笔试 User Operations Analyst

相关话题的讨论汇总
话题: log话题: job话题: logs话题: strings话题: bin

#	版面	帖数(主题数)
-	全站	4871 (796)
1	Military	3777 (569)
2	Stock	341 (51)
3	Joke	117 (17)
4	History	116 (3)
5	Automobile	100 (9)
6	USANews	55 (9)
7	Midlife	45 (1)
8	Headline	41 (41)
9	Dreamer	33 (13)
10	FleaMarket	32 (20)
11	Living	30 (7)

boards

未名新帖统计// 7月16日

历史上的今天