
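This page is an excerpt and archive of the corresponding MITBBS (未名空间) thread. Posts less than a week old show at most 50 characters; older posts show up to 500 characters.
Statistics board - How to pre-index 100,000+ text files for keyword search?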
S******y
Posts: 1123
1
I have 100,000+ text files. The total size of those files is about 30 GB.
I would like to pre-index those files against a set of keywords so that they
can be searched quickly.
For example, if I type "cat" + "dog", the Python program would return snippets
of text from those files (just like Google search), sorted by the distance
between the two words.
Is there a smart algorithm to do that?
I am thinking:
for 'cat', search all files, and record in which file and at which location
the word appears;
for 'dog', search all files, and re
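
The plan described in the post above (record, for each word, which file it appears in and where) is usually called a positional inverted index. Below is a minimal in-memory sketch of that idea, not the poster's actual program. The names `build_index`, `min_distance`, and `search`, the tokenizer regex, and the sample documents are all assumptions made for illustration. For 30 GB of files, the index would have to be built once and stored on disk rather than held in a dict.

```python
import re
from collections import defaultdict

# Assumed tokenizer: lowercase runs of letters and digits count as words.
TOKEN = re.compile(r"[a-z0-9]+")

def build_index(docs):
    """Map each word to {doc_id: [token positions]} in one pass per document."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, match in enumerate(TOKEN.finditer(text.lower())):
            index[match.group()].setdefault(doc_id, []).append(pos)
    return index

def min_distance(a, b):
    """Smallest |x - y| for x in a, y in b; both lists are sorted and non-empty."""
    i = j = 0
    best = abs(a[0] - b[0])
    while i < len(a) and j < len(b):
        best = min(best, abs(a[i] - b[j]))
        # Advance the pointer at the smaller position to try to close the gap.
        if a[i] < b[j]:
            i += 1
        else:
            j += 1
    return best

def search(index, w1, w2):
    """Return (distance, doc_id) for documents containing both words, closest first."""
    p1, p2 = index.get(w1, {}), index.get(w2, {})
    hits = [(min_distance(p1[d], p2[d]), d) for d in p1.keys() & p2.keys()]
    return sorted(hits)

# Hypothetical sample data standing in for the real files.
docs = {
    "a.txt": "The cat sat near the dog.",
    "b.txt": "A cat and many other animals, but far away a dog barked.",
    "c.txt": "Only a cat here.",
}
index = build_index(docs)
print(search(index, "cat", "dog"))  # [(4, 'a.txt'), (9, 'b.txt')]
```

Distances here are counted in tokens. To return text snippets as well, one option is to store `match.start()` (the character offset) next to each token position and slice the file around it at query time.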
D******n
Posts: 2836
2
This would be better asked on a CS board; it is an information retrieval (IR) problem.


z**k
Posts: 378
3
Use a trie; that is the most basic approach.
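
The reply does not say how the trie would be used. One plausible reading is a trie as the keyword dictionary, where each word's final node holds the set of files containing it. The sketch below assumes that reading; the `TrieNode` and `Trie` classes and their method names are hypothetical. For exact keyword lookups a plain dict would behave the same, so the trie mainly pays off for prefix queries.

```python
class TrieNode:
    __slots__ = ("children", "postings")

    def __init__(self):
        self.children = {}     # next character -> TrieNode
        self.postings = set()  # ids of files containing the word that ends here


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word, doc_id):
        """Record that `word` occurs in file `doc_id`."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.postings.add(doc_id)

    def lookup(self, word):
        """Return the set of files containing exactly `word`."""
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return set()
        return set(node.postings)

    def words_with_prefix(self, prefix):
        """Return all indexed words starting with `prefix`, sorted."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        found, stack = [], [(node, prefix)]
        while stack:
            node, word = stack.pop()
            if node.postings:  # a complete indexed word ends at this node
                found.append(word)
            for ch, child in node.children.items():
                stack.append((child, word + ch))
        return sorted(found)


# Hypothetical usage with made-up file names.
trie = Trie()
trie.insert("cat", "a.txt")
trie.insert("cat", "b.txt")
trie.insert("catalog", "c.txt")
trie.insert("dog", "a.txt")
print(trie.lookup("cat"))
print(trie.words_with_prefix("cat"))
```

To support the distance ranking asked for in the original post, the `postings` set could be replaced by a mapping from file id to a list of word positions.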

