JobHunting board - Asking for help with a phone-interview algorithm problem
g*****n
Posts: 21
1
There's a book, and each sentence consists of several words. How do we find the
minimal set of words that the reader needs to know in order to understand half of
the total number of sentences? A sentence is readable if the reader knows
all of its words.
I can only think of a brute-force solution -- do word counting on every
possible collection of half of the sentences and choose the minimal one. But
that's exponential: C(n, n/2) collections.
Any better solution?
Thanks
g*********s
Posts: 1782
2
Which company is so hardcore (bt) on a phone interview?

[Quoting g*****n's post:]
: There's a book and each sentence consists of several words. How to find the
: minimal set of words which can used by the reader to understand half of
: total number of the sentences. Each sentence is readable if the reader know
: all the words.
: I can only think of brute force solution -- do word counting on every
: possible collection of half number of sentences and choose the minimal. But
: that's exponential C(n,n/2).
: Any better solution?
: Thanks

f****g
Posts: 313
3
Sounds like an NLP problem. Can you explain how these words get picked?
g*********s
Posts: 1782
4
i think it's a graph algorithm...

[Quoting f****g's post:]
: Sounds like an NLP problem. Can you explain how these words get picked?
f****g
Posts: 313
5
I don't think I quite understand this problem. Could someone set up a model for it?

[Quoting g*********s's post:]
: i think it's a graph algorithm...
P*******b
Posts: 1001
6
R

[Quoting g*****n's post:]
: There's a book and each sentence consists of several words. How to find the
: minimal set of words which can used by the reader to understand half of
: total number of the sentences. Each sentence is readable if the reader know
: all the words.
: I can only think of brute force solution -- do word counting on every
: possible collection of half number of sentences and choose the minimal. But
: that's exponential C(n,n/2).
: Any better solution?
: Thanks

g*********s
Posts: 1782
7
Each word is a type A node; each sentence is a type B node. Word w appearing in
sentence s is denoted as an edge.
By doing so, we get a bipartite graph G.
Delete as many type A nodes from G as possible, such that at least half of the
type B nodes still have all of their words present; the remaining type A nodes are the answer.
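A minimal Python sketch of this model (the function names and the whitespace-tokenized input are my own illustrative assumptions, not part of the original post):

from collections import defaultdict

def build_word_index(sentences):
    # map each word (type A node) to the indices of the sentences (type B nodes) containing it
    word_to_sents = defaultdict(set)
    for i, sent in enumerate(sentences):
        for w in set(sent.split()):
            word_to_sents[w].add(i)
    return word_to_sents

def readable_count(sentences, kept_words):
    # a sentence stays readable only if every one of its words was kept
    return sum(1 for s in sentences if set(s.split()) <= kept_words)

With this index, deleting a type A node corresponds to dropping a key from word_to_sents, and the goal is to keep readable_count at or above half the number of sentences while keeping as few words as possible.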

[Quoting f****g's post:]
: I don't think I quite understand this problem. Could someone set up a model for it?
g*****n
Posts: 21
8
a hedge fund :(

[Quoting g*********s's post:]
: which company so bt on phone interview?

j**w
Posts: 382
9
feature selection.
http://en.wikipedia.org/wiki/Feature_selection
A simple way.
(1) Find the shortest sentence. Put all of its words into a set S, and mark that sentence.
(2) Among all remaining unmarked sentences, pick the one with the fewest words not in S. Add those "new" words to S and mark that sentence.
(3) Repeat (2) until half of the sentences are marked.
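A rough Python sketch of this greedy heuristic (my own reading of steps (1)-(3); the function name and the rounding of "half" are assumptions, and it is a heuristic, not an exact solver):

def greedy_half_cover(sentences):
    # greedily pick sentences that add the fewest new words until half are marked
    sents = [set(s.split()) for s in sentences]
    target = (len(sents) + 1) // 2           # "half", rounded up here
    known = set()                            # the word set S
    marked = set()
    while len(marked) < target:
        # step (2); with S empty this also covers step (1), since the
        # fewest-new-words pick is then simply the shortest sentence
        i = min((j for j in range(len(sents)) if j not in marked),
                key=lambda j: len(sents[j] - known))
        known |= sents[i]
        marked.add(i)
    return known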
n*****y
发帖数: 361
10
It's interesting to think it as feature selection.
But your solution is only a heuristic, not guaranteed to find the minimum
set.
Notice the optimal solution not only prefers shorter sentence, but also
prefers words that are shared among multiple sentences.
Image there are 10 sentences, and 8 words as the following, the solution
will keep the longer sentences.
楼上的 bipartite graph 是正解. just my 2 cents.
S1 (W1, W2)
S2 (W1, W3)
S3 (W1, W4)
S4 (W2, W3)
S5 (W2, W4)
S6 (W3, W4)
S7 (W5)
S8 (W6)
S9 (W7)
S10 (W8)
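Plugging this example into the greedy_half_cover sketch above (a hypothetical quick check, reusing that earlier sketch):

sentences = ["W1 W2", "W1 W3", "W1 W4", "W2 W3", "W2 W4", "W3 W4",
             "W5", "W6", "W7", "W8"]
print(greedy_half_cover(sentences))
# greedy starts from the one-word sentences and returns 6 words,
# e.g. {W5, W6, W7, W8, W1, W2}, whereas {W1, W2, W3, W4} alone already
# makes S1-S6 readable, i.e. 4 words suffice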

[Quoting j**w's post:]
: feature selection.
: http://en.wikipedia.org/wiki/Feature_selection
: A simple way.
: (1) Find the shortest sentence. Put all of its word into a set S, and marked
: this sentence.
: (2) Among all remaining unmarked sentences, pick the sentence such that the
: number of words which are not in S is the minimal. Add those "new" words
: into S and mark the related sentence.
: (3) Repeat (2) until half of the sentences are marked.

b*****e
Posts: 474
11
A* works (of course, only for very small problem sizes):
Each word is a 0/1 variable x_i.
A solution is (x_0, x_1, ..., x_n).
Start state: (0, 0, 0, ...).
Step: change one 0 to a 1; cost: 1.
Heuristic: the median of the per-sentence counts of unknown words;
it becomes 0 once half of the sentences are fully recognized.
If the number of sentences is small (say 20), then I think the brute-force
method is not bad, as it requires hardly any space.
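For completeness, a sketch of the C(n, n/2) brute force described at the top of the thread (my own illustration with made-up names; only practical for very small n):

from itertools import combinations

def brute_force_min_words(sentences):
    # try every half-sized collection of sentences; return the smallest word union
    sents = [set(s.split()) for s in sentences]
    k = (len(sents) + 1) // 2
    best = None
    for combo in combinations(range(len(sents)), k):
        words = set().union(*(sents[i] for i in combo))
        if best is None or len(words) < len(best):
            best = words
    return best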

[Quoting n*****y's post:]
: It's interesting to think it as feature selection.
: But your solution is only a heuristic, not guaranteed to find the minimum
: set.
: Notice the optimal solution not only prefers shorter sentence, but also
: prefers words that are shared among multiple sentences.
: Image there are 10 sentences, and 8 words as the following, the solution
: will keep the longer sentences.
: 楼上的 bipartite graph 是正解. just my 2 cents.
: S1 (W1, W2)
: S2 (W1, W3)

f***g
Posts: 214
12
Does everyone think DP could work here?
Let d[k][j] mean: among sentences 1 through j, k sentences are understood. Each cell
stores the number of unique words in those k chosen sentences.
d[k][j] = min{ d[k][j-1], d[k-1][x] + # of new unique words in sentence j } (x = 0 ... j-1)
The problem is that every cell has to maintain a set holding the unique words of the
chosen k sentences, so that whenever a new sentence is considered we can compute how
many additional unique words it would add.
j**w
Posts: 382
13

Yes, if there is a tie in some step, more work is needed to break it.
The appropriate solution depends on the question itself. If the input
data are very large (e.g., all webpages on the Internet), we need
heuristic approaches because we can't fit all the words into memory.
However, if the input size is modest, the graph-based one should be the
better choice and DP might be overkill.

[Quoting n*****y's post:]
: It's interesting to think it as feature selection.
: But your solution is only a heuristic, not guaranteed to find the minimum
: set.
: Notice the optimal solution not only prefers shorter sentence, but also
: prefers words that are shared among multiple sentences.
: Image there are 10 sentences, and 8 words as the following, the solution
: will keep the longer sentences.
: 楼上的 bipartite graph 是正解. just my 2 cents.
: S1 (W1, W2)
: S2 (W1, W3)

c******c
Posts: 2
14

Two questions regarding this problem:
1. Is the question asking for EXACTLY half of the sentences to be readable, or just AT LEAST half? The latter seems to make more sense.
2. For the bipartite-graph solution, how do we choose which Type A nodes to delete from graph G? I'm thinking of using a greedy approach, but I'm not sure it finds the minimum set of Type A nodes:
a. Find the node n in A whose degree is minimum
b. For each edge (n, b) where b is in B
   b.1 Delete edge (n, b)
   b.2 For each edge (a, b) where a is in A
       b.2.1 Delete edge (a, b)
c. Delete node n from A
d. Go to a. if more than half of the nodes in B still have degree > 0
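A rough Python rendering of steps a-d (my own sketch with made-up names; I added a guard so a deletion never drops the readable count below half, which the steps above do not spell out, and it remains a greedy heuristic, not a proven optimum):

def greedy_delete_words(sentences):
    sents = [set(s.split()) for s in sentences]          # type B nodes
    words = set().union(*sents) if sents else set()      # type A nodes
    alive = set(range(len(sents)))                       # sentences with all words intact
    half = (len(sents) + 1) // 2
    while words and len(alive) > half:
        # step a: word of minimum degree among still-intact sentences
        degree = {w: sum(1 for i in alive if w in sents[i]) for w in words}
        n = min(words, key=lambda w: degree[w])
        hit = {i for i in alive if n in sents[i]}
        if len(alive) - len(hit) < half:                  # added guard: do not overshoot
            break
        alive -= hit                                      # steps b/b.2: those sentences die
        words.remove(n)                                   # step c
    # keep only the words that some surviving sentence still uses
    return {w for w in words if any(w in sents[i] for i in alive)}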

[Quoting g*********s's post:]
: each word is a type A node; each sentence is a type B node. word w in
: sentence s denoted as an edge.
: by doing so, we get a bi-partite graph G.
: delete type A node from G as many as possible such that the graph has at
: least half type B nodes left.

n*****y
Posts: 361
15
DP won't work, because there is no ordering for those sentences.
The problem is combinatorial in nature.
Given the simple description of the problem, I thought it must be a well
known problem or equivalent to some of the well known set covering or
bipartite graph matching problems, but I cannot find the answer - I kind of
suck at these algorithms.
As far as the interview goes, I believe that if you can formulate it as a bipartite
graph and provide some search heuristics (greedy, branch and bound), it
should be good enough.

[Quoting f***g's post:]
: Does everyone think DP could work here?
: Let d[k][j] mean: among sentences 1 through j, k sentences are understood. Each cell stores the number of unique words in those k chosen sentences.
: d[k][j] = min{ d[k][j-1], d[k-1][x] + # of new unique words in sentence j } (x = 0 ... j-1)
: The problem is that every cell has to maintain a set holding the unique words of the chosen k sentences, so that whenever a new sentence is considered we can compute how many additional unique words it would add.

S*******e
Posts: 118
16
The 0-1 knapsack problem can be reduced to this problem once some of this problem's
constraints are removed. For each item in the knapsack problem with weight w and
value v, construct w words and v sentences that contain exactly those w words.
Then deciding whether we can cover n sentences with at most m words is
equivalent to the knapsack problem.
The major difference is that w and v need to be integers in this problem,
and w is no larger than a constant. This is exactly the condition under which 0-1
knapsack can be solved in polynomial time; please refer to http://en.wikipedia.org/wiki/Knapsack_problem, section "Dynamic programming solution".
The same algorithm should work for this problem (probably with some minor
modification, since no minimum number of words is given in this problem).

[Quoting g*****n's post:]
: There's a book and each sentence consists of several words. How to find the
: minimal set of words which can used by the reader to understand half of
: total number of the sentences. Each sentence is readable if the reader know
: all the words.
: I can only think of brute force solution -- do word counting on every
: possible collection of half number of sentences and choose the minimal. But
: that's exponential C(n,n/2).
: Any better solution?
: Thanks

g*********s
Posts: 1782
17
wow, i didn't think deeply about the details, but the outline seems
promising.

[Quoting S*******e's post:]
: The 0-1 knapsack can be reduced to this problem when some constraints in
: this problem are removed. For each item in the knapsack problem with weight
: w and value v, construct w words and v sentences that contain exactly these
: w words. Then whether we can cover n sentences with max-weight m is
: equivalent to knapsack problem.
: The major difference is that w and v need to be integers in this problem,
: and w is no larger than a constant. This is exactly the condition where 0-1
: knapsack can be solved in polynomial time. Please refer to http://en.wikipedia.org/wiki/Knapsack_problem, section "Dynamic programming solution".
: Same algorithm should work for this problem (probably with some minor
: modification since no minimum number of words is given in this problem).

P********l
Posts: 452
18
Knapsack won't work, because the weights of later sentences depend on the
words already collected.
It seems it could be solved by linear programming, but I don't know how to set up
the equations right now.
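For what it's worth, one way the equations might be set up is as a 0-1 integer program rather than a plain LP (my own sketch, not the poster's formulation; it still needs an integer solver or a rounding argument):

\min \sum_{w} x_w
\text{s.t. } y_s \le x_w \quad \text{for every word } w \text{ appearing in sentence } s,
\sum_{s} y_s \ge \lceil n/2 \rceil,
x_w \in \{0,1\}, \quad y_s \in \{0,1\}

Here x_w = 1 means word w is kept and y_s = 1 means sentence s counts as understood; the constraints force every word of a counted sentence to be kept.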
c******n
Posts: 4965
19
I don't think that's right; in knapsack, the weights all mean the same thing.
Here a 3-word sentence could be abc or def, which of course are not the same.


[Quoting S*******e's post:]
: The 0-1 knapsack can be reduced to this problem when some constraints in
: this problem are removed. For each item in the knapsack problem with weight
: w and value v, construct w words and v sentences that contain exactly these
: w words. Then whether we can cover n sentences with max-weight m is
: equivalent to knapsack problem.
: The major difference is that w and v need to be integers in this problem,
: and w is no larger than a constant. This is exactly the condition where 0-1
: knapsack can be solved in polynomial time. Please refer to http://en.wikipedia.org/wiki/Knapsack_problem, section "Dynamic programming solution".
: Same algorithm should work for this problem (probably with some minor
: modification since no minimum number of words is given in this problem).
