JobHunting board - Asking for help with a phone-interview algorithm problem
g*****n
Posts: 21
1
There's a book, and each sentence consists of several words. How do we find the
minimal set of words that the reader needs to know in order to understand half of
the total number of sentences? A sentence is readable if the reader knows
all of its words.
I can only think of a brute-force solution -- do word counting on every
possible collection of half of the sentences and choose the minimal one. But
that's exponential: C(n, n/2) collections.
Any better solution?
Thanks
g*********s
Posts: 1782
2
Which company is so hardcore (bt) on a phone interview?

[Quoting g*****n's post:]
: There's a book and each sentence consists of several words. How to find the
: minimal set of words which can used by the reader to understand half of
: total number of the sentences. Each sentence is readable if the reader know
: all the words.
: I can only think of brute force solution -- do word counting on every
: possible collection of half number of sentences and choose the minimal. But
: that's exponential C(n,n/2).
: Any better solution?
: Thanks

f****g
Posts: 313
3
Sounds like an NLP problem. Can you explain how these words get picked?
g*********s
Posts: 1782
4
i think it's a graph algorithm...

[Quoting f****g's post:]
: Sounds like an NLP problem. Can you explain how these words get picked?
f****g
Posts: 313
5
I don't think I quite understand this problem. Could someone set up a model for it?

[Quoting g*********s's post:]
: i think it's a graph algorithm...
P*******b
Posts: 1001
6
R

[Quoting g*****n's post:]
: There's a book and each sentence consists of several words. How to find the
: minimal set of words which can used by the reader to understand half of
: total number of the sentences. Each sentence is readable if the reader know
: all the words.
: I can only think of brute force solution -- do word counting on every
: possible collection of half number of sentences and choose the minimal. But
: that's exponential C(n,n/2).
: Any better solution?
: Thanks

g*********s
Posts: 1782
7
Each word is a type A node; each sentence is a type B node. Word w appearing in
sentence s is denoted as an edge.
By doing so, we get a bipartite graph G.
Delete as many type A nodes from G as possible, such that at least half of the
type B nodes still have all of their words present; the remaining type A nodes are the answer.
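A minimal Python sketch of this model (the function names and the whitespace-tokenized input are my own illustrative assumptions, not part of the original post):

from collections import defaultdict

def build_word_index(sentences):
    # map each word (type A node) to the indices of the sentences (type B nodes) containing it
    word_to_sents = defaultdict(set)
    for i, sent in enumerate(sentences):
        for w in set(sent.split()):
            word_to_sents[w].add(i)
    return word_to_sents

def readable_count(sentences, kept_words):
    # a sentence stays readable only if every one of its words was kept
    return sum(1 for s in sentences if set(s.split()) <= kept_words)

With this index, deleting a type A node corresponds to dropping a key from word_to_sents, and the goal is to keep readable_count at or above half the number of sentences while keeping as few words as possible.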

[Quoting f****g's post:]
: I don't think I quite understand this problem. Could someone set up a model for it?
g*****n
Posts: 21
8
a hedge fund :(

[Quoting g*********s's post:]
: which company so bt on phone interview?

j**w
Posts: 382
9
feature selection.
http://en.wikipedia.org/wiki/Feature_selection
A simple way.
(1) Find the shortest sentence. Put all of its words into a set S, and mark that sentence.
(2) Among all remaining unmarked sentences, pick the one with the fewest words not in S. Add those "new" words to S and mark that sentence.
(3) Repeat (2) until half of the sentences are marked.
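A rough Python sketch of this greedy heuristic (my own reading of steps (1)-(3); the function name and the rounding of "half" are assumptions, and it is a heuristic, not an exact solver):

def greedy_half_cover(sentences):
    # greedily pick sentences that add the fewest new words until half are marked
    sents = [set(s.split()) for s in sentences]
    target = (len(sents) + 1) // 2           # "half", rounded up here
    known = set()                            # the word set S
    marked = set()
    while len(marked) < target:
        # step (2); with S empty this also covers step (1), since the
        # fewest-new-words pick is then simply the shortest sentence
        i = min((j for j in range(len(sents)) if j not in marked),
                key=lambda j: len(sents[j] - known))
        known |= sents[i]
        marked.add(i)
    return known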
n*****y
发帖数: 361
10
It's interesting to think it as feature selection.
But your solution is only a heuristic, not guaranteed to find the minimum
set.
Notice the optimal solution not only prefers shorter sentence, but also
prefers words that are shared among multiple sentences.
Image there are 10 sentences, and 8 words as the following, the solution
will keep the longer sentences.
楼上的 bipartite graph 是正解. just my 2 cents.
S1 (W1, W2)
S2 (W1, W3)
S3 (W1, W4)
S4 (W2, W3)
S5 (W2, W4)
S6 (W3, W4)
S7 (W5)
S8 (W6)
S9 (W7)
S10 (W8)
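Plugging this example into the greedy_half_cover sketch above (a hypothetical quick check, reusing that earlier sketch):

sentences = ["W1 W2", "W1 W3", "W1 W4", "W2 W3", "W2 W4", "W3 W4",
             "W5", "W6", "W7", "W8"]
print(greedy_half_cover(sentences))
# greedy starts from the one-word sentences and returns 6 words,
# e.g. {W5, W6, W7, W8, W1, W2}, whereas {W1, W2, W3, W4} alone already
# makes S1-S6 readable, i.e. 4 words suffice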

[Quoting j**w's post:]
: feature selection.
: http://en.wikipedia.org/wiki/Feature_selection
: A simple way.
: (1) Find the shortest sentence. Put all of its word into a set S, and marked
: this sentence.
: (2) Among all remaining unmarked sentences, pick the sentence such that the
: number of words which are not in S is the minimal. Add those "new" words
: into S and mark the related sentence.
: (3) Repeat (2) until half of the sentences are marked.

b*****e
Posts: 474
11
A* works (of course, only for very small problem sizes):
Each word is a 0/1 variable x_i.
A solution is (x_0, x_1, ..., x_n).
Start state: (0, 0, 0, ...).
Step: change one 0 to a 1; cost: 1.
Heuristic: the median of the per-sentence counts of unknown words;
it becomes 0 once half of the sentences are fully recognized.
If the number of sentences is small (say 20), then I think the brute-force
method is not bad, as it requires hardly any space.
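For completeness, a sketch of the C(n, n/2) brute force described at the top of the thread (my own illustration with made-up names; only practical for very small n):

from itertools import combinations

def brute_force_min_words(sentences):
    # try every half-sized collection of sentences; return the smallest word union
    sents = [set(s.split()) for s in sentences]
    k = (len(sents) + 1) // 2
    best = None
    for combo in combinations(range(len(sents)), k):
        words = set().union(*(sents[i] for i in combo))
        if best is None or len(words) < len(best):
            best = words
    return best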

[Quoting n*****y's post:]
: It's interesting to think it as feature selection.
: But your solution is only a heuristic, not guaranteed to find the minimum
: set.
: Notice the optimal solution not only prefers shorter sentence, but also
: prefers words that are shared among multiple sentences.
: Image there are 10 sentences, and 8 words as the following, the solution
: will keep the longer sentences.
: 楼上的 bipartite graph 是正解. just my 2 cents.
: S1 (W1, W2)
: S2 (W1, W3)

f***g
Posts: 214
12
Does everyone think DP could work here?
Let d[k][j] mean: among sentences 1 through j, k sentences are understood. Each cell
stores the number of unique words in those k chosen sentences.
d[k][j] = min{ d[k][j-1], d[k-1][x] + # of new unique words in sentence j } (x = 0 ... j-1)
The problem is that every cell has to maintain a set holding the unique words of the
chosen k sentences, so that whenever a new sentence is considered we can compute how
many additional unique words it would add.
j**w
Posts: 382
13

Yes, if there is a tie in some step, more work is needed to break it.
The appropriate solution depends on the question itself. If the input
data are very large (e.g., all webpages on the Internet), we need
heuristic approaches because we can't fit all the words into memory.
However, if the input size is modest, the graph-based one should be the
better choice and DP might be overkill.

[Quoting n*****y's post:]
: It's interesting to think it as feature selection.
: But your solution is only a heuristic, not guaranteed to find the minimum
: set.
: Notice the optimal solution not only prefers shorter sentence, but also
: prefers words that are shared among multiple sentences.
: Image there are 10 sentences, and 8 words as the following, the solution
: will keep the longer sentences.
: 楼上的 bipartite graph 是正解. just my 2 cents.
: S1 (W1, W2)
: S2 (W1, W3)

c******c
Posts: 2
14

Two questions regarding this problem:
1. Is the question asking for EXACTLY half of the sentences to be readable, or just AT LEAST half? The latter seems to make more sense.
2. For the bipartite-graph solution, how do we choose which Type A nodes to delete from graph G? I'm thinking of using a greedy approach, but I'm not sure it finds the minimum set of Type A nodes:
a. Find the node n in A whose degree is minimum
b. For each edge (n, b) where b is in B
   b.1 Delete edge (n, b)
   b.2 For each edge (a, b) where a is in A
       b.2.1 Delete edge (a, b)
c. Delete node n from A
d. Go to a. if more than half of the nodes in B still have degree > 0
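A rough Python rendering of steps a-d (my own sketch with made-up names; I added a guard so a deletion never drops the readable count below half, which the steps above do not spell out, and it remains a greedy heuristic, not a proven optimum):

def greedy_delete_words(sentences):
    sents = [set(s.split()) for s in sentences]          # type B nodes
    words = set().union(*sents) if sents else set()      # type A nodes
    alive = set(range(len(sents)))                       # sentences with all words intact
    half = (len(sents) + 1) // 2
    while words and len(alive) > half:
        # step a: word of minimum degree among still-intact sentences
        degree = {w: sum(1 for i in alive if w in sents[i]) for w in words}
        n = min(words, key=lambda w: degree[w])
        hit = {i for i in alive if n in sents[i]}
        if len(alive) - len(hit) < half:                  # added guard: do not overshoot
            break
        alive -= hit                                      # steps b/b.2: those sentences die
        words.remove(n)                                   # step c
    # keep only the words that some surviving sentence still uses
    return {w for w in words if any(w in sents[i] for i in alive)}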

[Quoting g*********s's post:]
: each word is a type A node; each sentence is a type B node. word w in
: sentence s denoted as an edge.
: by doing so, we get a bi-partite graph G.
: delete type A node from G as many as possible such that the graph has at
: least half type B nodes left.

n*****y
Posts: 361
15
DP won't work, because there is no ordering for those sentences.
The problem is combinatorial in nature.
Given the simple description of the problem, I thought it must be a well
known problem or equivalent to some of the well known set covering or
bipartite graph matching problems, but I cannot find the answer - I kind of
suck at these algorithms.
As far as the interview goes, I believe that if you can formulate it as a bipartite
graph and provide some search heuristics (greedy, branch and bound), it
should be good enough.

[Quoting f***g's post:]
: Does everyone think DP could work here?
: Let d[k][j] mean: among sentences 1 through j, k sentences are understood. Each cell stores the number of unique words in those k chosen sentences.
: d[k][j] = min{ d[k][j-1], d[k-1][x] + # of new unique words in sentence j } (x = 0 ... j-1)
: The problem is that every cell has to maintain a set holding the unique words of the chosen k sentences, so that whenever a new sentence is considered we can compute how many additional unique words it would add.

S*******e
Posts: 118
16
The 0-1 knapsack problem can be reduced to this problem once some of this problem's
constraints are removed. For each item in the knapsack problem with weight w and
value v, construct w words and v sentences that contain exactly those w words.
Then deciding whether we can cover n sentences with at most m words is
equivalent to the knapsack problem.
The major difference is that w and v need to be integers in this problem,
and w is no larger than a constant. This is exactly the condition under which 0-1
knapsack can be solved in polynomial time; please refer to http://en.wikipedia.org/wiki/Knapsack_problem, section "Dynamic programming solution".
The same algorithm should work for this problem (probably with some minor
modification, since no minimum number of words is given in this problem).

[Quoting g*****n's post:]
: There's a book and each sentence consists of several words. How to find the
: minimal set of words which can used by the reader to understand half of
: total number of the sentences. Each sentence is readable if the reader know
: all the words.
: I can only think of brute force solution -- do word counting on every
: possible collection of half number of sentences and choose the minimal. But
: that's exponential C(n,n/2).
: Any better solution?
: Thanks

g*********s
Posts: 1782
17
wow, i didn't think deeply about the details, but the outline seems
promising.

[Quoting S*******e's post:]
: The 0-1 knapsack can be reduced to this problem when some constraints in
: this problem are removed. For each item in the knapsack problem with weight
: w and value v, construct w words and v sentences that contain exactly these
: w words. Then whether we can cover n sentences with max-weight m is
: equivalent to knapsack problem.
: The major difference is that w and v need to be integers in this problem,
: and w is no larger than a constant. This is exactly the condition where 0-1
: knapsack can be solved in polynomial time. Please refer to http://en.wikipedia.org/wiki/Knapsack_problem, section "Dynamic programming solution".
: Same algorithm should work for this problem (probably with some minor
: modification since no minimum number of words is given in this problem).

P********l
Posts: 452
18
Knapsack won't work, because the weights of later sentences depend on the
words already collected.
It seems it could be solved by linear programming, but I don't know how to set up
the equations right now.
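For what it's worth, one way the equations might be set up is as a 0-1 integer program rather than a plain LP (my own sketch, not the poster's formulation; it still needs an integer solver or a rounding argument):

\min \sum_{w} x_w
\text{s.t. } y_s \le x_w \quad \text{for every word } w \text{ appearing in sentence } s,
\sum_{s} y_s \ge \lceil n/2 \rceil,
x_w \in \{0,1\}, \quad y_s \in \{0,1\}

Here x_w = 1 means word w is kept and y_s = 1 means sentence s counts as understood; the constraints force every word of a counted sentence to be kept.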
c******n
Posts: 4965
19
I don't think that's right; in knapsack, the weights all mean the same thing.
Here a 3-word sentence could be abc or def, which of course are not the same.


[Quoting S*******e's post:]
: The 0-1 knapsack can be reduced to this problem when some constraints in
: this problem are removed. For each item in the knapsack problem with weight
: w and value v, construct w words and v sentences that contain exactly these
: w words. Then whether we can cover n sentences with max-weight m is
: equivalent to knapsack problem.
: The major difference is that w and v need to be integers in this problem,
: and w is no larger than a constant. This is exactly the condition where 0-1
: knapsack can be solved in polynomial time. Please refer to http://en.wikipedia.org/wiki/Knapsack_problem, section "Dynamic programming solution".
: Same algorithm should work for this problem (probably with some minor
: modification since no minimum number of words is given in this problem).
