This page is an excerpt and archive of the corresponding thread on 未名空间 (MITBBS).
Java board - Another Lucene question
t*g
Posts: 1758
1
I need to dump the tokens out of a Lucene index, possibly for a large amount
of data. How do I do that? Do I need to use Term? I'm a newbie. Thanks!
t*******e
Posts: 684
2
Depending on how index files are created in the first place, Lucene may
store a full copy of the original text to be indexed, such that you can
restore the text from the query results. Otherwise, you only get other
fields like IDs from the Hit Documents.
t*g
Posts: 1758
3
We did store the original text. I don't have problems dumping the
original text; I can dump it through the Hit Documents. However, what I
need is to dump the tokenized text, which doesn't exist in the Hit Documents.
It looks like I need to go into the indices to get the tokenized documents,
but I'm new to Lucene and can't find a way to do it. Need help! Thx.

[t*******e wrote:]
: Depending on how index files are created in the first place, Lucene may
: store a full copy of the original text to be indexed, such that you can
: restore the text from the query results. Otherwise, you only get other
: fields like IDs from the Hit Documents.

t*******e
Posts: 684
4
This is impossible. The inverted index in a search engine stores terms
(tokens) in a term index file as the search keys, which map to Document IDs,
and it returns the matched Documents as the query results, but not the other
way around. The terms you specified in your query are the tokens you can use
to highlight the original text.
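
To make the term-to-document direction concrete, here is a minimal, library-free sketch of an inverted index in plain Java. It is not Lucene's implementation or API: the class name `InvertedIndexSketch`, the `tokenize` helper, and the lowercase/split analysis are all assumptions made up for illustration. In this toy structure the term dictionary can be walked from start to end, but the mapping does not record the order in which tokens appeared in each document.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy inverted index (hypothetical, NOT Lucene): term -> IDs of documents containing it.
class InvertedIndexSketch {
    private final TreeMap<String, TreeSet<Integer>> postings = new TreeMap<>();

    // Hypothetical analyzer: lowercase, then split on anything that is not a-z or 0-9.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Index one document: each of its terms points back to the document ID.
    void add(int docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Query direction: look up a term, get the matching document IDs.
    SortedSet<Integer> search(String term) {
        TreeSet<Integer> hits = postings.get(term.toLowerCase(Locale.ROOT));
        return hits == null ? new TreeSet<Integer>() : Collections.unmodifiableSortedSet(hits);
    }

    // Read-only view of the whole term dictionary, in sorted term order.
    SortedMap<String, TreeSet<Integer>> dump() {
        return Collections.unmodifiableSortedMap(postings);
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Lucene stores terms");
        index.add(2, "Terms map to documents");
        System.out.println("hits for 'terms': " + index.search("terms"));
        for (Map.Entry<String, TreeSet<Integer>> e : index.dump().entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

The sorted `TreeMap` is only a convenience so that the dump comes out in a stable order; a real index would use its own on-disk term dictionary.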

[t*g wrote:]
: We did store the original text. I don't have problems dumping the
: original text; I can dump it through the Hit Documents. However, what I
: need is to dump the tokenized text, which doesn't exist in the Hit Documents.
: It looks like I need to go into the indices to get the tokenized documents,
: but I'm new to Lucene and can't find a way to do it. Need help! Thx.

b******y
Posts: 9224
5
You will need to store the terms in the Lucene index. But I don't see why
you want to do that.
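
One way to read "store the terms" is to keep each document's analyzed tokens at indexing time, so they can be read back per document later. Lucene has a term vector feature aimed at this kind of need, but its API differs between versions and is not shown here. The sketch below is plain Java, not Lucene, and the names `TokenStoreSketch`, `analyze`, and `tokensOf` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy per-document token store (hypothetical, NOT Lucene's term vector API).
class TokenStoreSketch {
    // docId -> tokens in their original order, captured when the document is indexed
    private final Map<Integer, List<String>> tokensByDoc = new LinkedHashMap<>();

    // Hypothetical analyzer: lowercase, then split on anything that is not a-z or 0-9.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Store the analyzed tokens alongside the document ID.
    void index(int docId, String text) {
        tokensByDoc.put(docId, Collections.unmodifiableList(analyze(text)));
    }

    // Read back the tokenized form of one document (empty list if the ID is unknown).
    List<String> tokensOf(int docId) {
        return tokensByDoc.getOrDefault(docId, Collections.<String>emptyList());
    }

    public static void main(String[] args) {
        TokenStoreSketch store = new TokenStoreSketch();
        store.index(1, "Dump the tokenized text.");
        System.out.println(store.tokensOf(1));
    }
}
```

The trade-off is extra storage: the tokens are kept in addition to whatever the inverted index already holds.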