由买买提看人间百态

boards

本页内容为未名空间相应帖子的节选和存档,一周内的贴子最多显示50字,超过一周显示500字 访问原贴
JobHunting版 - 某大公司面试题
相关主题
most clicked urls in the last 5 mins, 1hr, 24 hrs?问一道面试题,现在好像很流行这种题
新鲜面试题面试题分享:把数字翻译成字符串
在线紧急求助一道system design面试题,面经内附面试题
贡献一下:本版上搜集的 Google 面试题贡献两道google面试题
请教一个面试题,在大量URL中找一个有wildcard的patternData scientist / Machine Learning Engineer 相关面试题
面试题,大规模url求重复 讨论CS 面试题总结(1)
请问一个面试题:"System described. Blank page returned. How would you test it?"Amazon Second phone
G面试题请教贴华人版程序员简历,大家帮忙拍砖成印度版
相关话题的讨论汇总
话题: results话题: click话题: search话题: url话题: search2
进入JobHunting版参与讨论
1 (共1页)
a********r
发帖数: 218
1
Open google.com, you type some words in the edit box to search something, it
will return lots of search results. Among these returning results (that is
, website link), you may only CLICK some results that are interesting to you
. The system will record the "CLICK"action. Finally, you will have the
search results (i.e. url) and "CLICK" informatin at hand.
Question: how do you find the similarity of these searching?
That's exactly what the interviewer told me, he didn't want to supply any
additional information.
请大牛指点, 多谢!
k****n
发帖数: 369
2
so you have a set of (query => list of search results with "Click" info),
and he want to find the similarities between queries, right?
The first straightforward one is, if Qa => Cx, Qb => Cx, then Qa is similar
to Qb.
Furthermore, the Query and URLs can form a graph, with edges defined as
E(Q, URL) where Query lead to a URL. The URLs can be connected if they
are returned in one query using certain weighting strategy.
After this simplification, we can run an all-source shortest path
in this graph in O(n^3) time, and the distance between Qs can be
treated as the (reverse of) similarities of the queries.
Another way. To vectorize the queries in a big "URL" space, each URL is
a dimension in the vector space, define some weighting strategy. Then calcu
-late the similaries between the URL vectors.
There must also exist other solutions, since this is a very open problem.

it
is
you

【在 a********r 的大作中提到】
: Open google.com, you type some words in the edit box to search something, it
: will return lots of search results. Among these returning results (that is
: , website link), you may only CLICK some results that are interesting to you
: . The system will record the "CLICK"action. Finally, you will have the
: search results (i.e. url) and "CLICK" informatin at hand.
: Question: how do you find the similarity of these searching?
: That's exactly what the interviewer told me, he didn't want to supply any
: additional information.
: 请大牛指点, 多谢!

a********r
发帖数: 218
3
@Kevinn,
Thanks for your solution.
The interviewer particularly mentioned: some search results clicked, some
not. That is, for each query, the search results can be divided into two
groups: "clicked" and "non-clicked". How do you handle this? assign more
weight to"clicked"?

similar

【在 k****n 的大作中提到】
: so you have a set of (query => list of search results with "Click" info),
: and he want to find the similarities between queries, right?
: The first straightforward one is, if Qa => Cx, Qb => Cx, then Qa is similar
: to Qb.
: Furthermore, the Query and URLs can form a graph, with edges defined as
: E(Q, URL) where Query lead to a URL. The URLs can be connected if they
: are returned in one query using certain weighting strategy.
: After this simplification, we can run an all-source shortest path
: in this graph in O(n^3) time, and the distance between Qs can be
: treated as the (reverse of) similarities of the queries.

k*j
发帖数: 153
4
i think i was asked the same question long time ago when i interviewed with
them
my answer was like:
for two searches, you get something like the following, each is a link, and
they are in order.
search 1: ABCDEF
search 2 : FABEDG
and for each link in the search1, you find its position in search2, and then
calculate the absolute difference of two positions. for example, for link "A", it is
position 0 in search 1, and position 1 in search2. the diff is abs(1-0)=1.
another example, for link "F", in seach1, it is position5, for search2, it is position0,
so diff is abs(0-5) = 5. you compute the abs diff for each link in seach1, if can't find it in search2,
you put N, N is the length of the array results.(in this case is 6). You then sum
up all the differences to get a total score to indicate the similarity between these two searches.
The smaller the more similar they are.
we can also improve the results by using the "click" information, we can put
more weight on the "clicked" links when summing up the score. you can tune
this parameter using some training sets
1 (共1页)
进入JobHunting版参与讨论
相关主题
贴华人版程序员简历,大家帮忙拍砖成印度版请教一个面试题,在大量URL中找一个有wildcard的pattern
g电面,新鲜面经面试题,大规模url求重复 讨论
一道大数据的题,讨论一下请问一个面试题:"System described. Blank page returned. How would you test it?"
面试: Take home projectG面试题请教
most clicked urls in the last 5 mins, 1hr, 24 hrs?问一道面试题,现在好像很流行这种题
新鲜面试题面试题分享:把数字翻译成字符串
在线紧急求助一道system design面试题,面经内附面试题
贡献一下:本版上搜集的 Google 面试题贡献两道google面试题
相关话题的讨论汇总
话题: results话题: click话题: search话题: url话题: search2