StartUp board - Nutch vs Lucene
I*****y
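Posts: 6402
1
If you wanted to build a site like indeed.com or iloveOPT's myvisajobs.com, which search engine
would be the better choice? indeed.com has apparently said publicly that it uses Lucene.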
b******y
Posts: 9224
2
Yes, indeed.com uses Lucene.
g********g
Posts: 2172
3
Lucene is only an index engine. Nutch is a web crawler whose crawled results
are indexed with Lucene, so they are different products. Indeed uses
Lucene as its index engine but built its own crawler. Nutch is a general-
purpose search engine crawler, and it is too much work to modify it into a
vertical search engine crawler.
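
To make the crawler/index split concrete, here is a minimal Python sketch of what the "index engine" half does. It is a toy inverted index, not Lucene's actual API, and the `crawled_pages` dict is a hypothetical stand-in for whatever a crawler (Nutch or a custom one) would hand over.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Toy inverted index: maps each token to the set of document ids containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for token in tokenize(text):
            self.postings[token].add(doc_id)

    def search(self, query):
        """Return ids of documents that contain every query token (AND semantics)."""
        tokens = tokenize(query)
        if not tokens:
            return set()
        result = set(self.postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self.postings.get(token, set())
        return result


# Hypothetical crawler output: URL -> already-fetched page text.
crawled_pages = {
    "http://example.com/job/1": "Senior Java developer, Lucene experience required",
    "http://example.com/job/2": "Python developer for web crawler team",
}

index = TinyIndex()
for url, text in crawled_pages.items():
    index.add(url, text)

matches = index.search("lucene developer")
```

The index never fetches anything itself; it only sees text that something else collected, which is why the crawler can be swapped out independently.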

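[Quoting I*****y's post]
: If you wanted to build a site like indeed.com or iloveOPT's myvisajobs.com, which search engine
: would be the better choice? indeed.com has apparently said publicly that it uses Lucene.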

b******y
Posts: 9224
4
Good write-up.
Nutch is not good at all for a production environment. It is good for playing
with.
To build a truly scalable crawler for a vertical market, you have to do it
yourself.
z********s
Posts: 22
6
Nutch is a crawler based on Lucene.
Here is my search engine based on Nutch:
http://malachi.thechristianlife.com/
It works pretty well.
Here is a tutorial I wrote:
http://peterpuwang.googlepages.com/NutchGuideForDummies.htm
Hope it helps.
I*****y
Posts: 6402
7
Thanks, Peter.

[Quoting z********s's post]
: Nutch is a crawler based on Lucene.
: Here is my search engine based on Nutch:
: http://malachi.thechristianlife.com/
: It works pretty well.
: Here is a tutorial I wrote:
: http://peterpuwang.googlepages.com/NutchGuideForDummies.htm
: Hope it helps.

b******y
Posts: 9224
8
Good, thanks for the info.
w****n
Posts: 48
9
Enterprise search engine: Solr, which is based on Lucene.
Good crawler: Heritrix.
So far these are the best tools for building a search engine. Many commercial
sites, including some big companies, use the two in combination.
b******y
Posts: 9224
10
Unfortunately, when building a search engine, the crawler is the hardest part.
Search is relatively easy.
You get all sorts of crappy HTML pages and all sorts of crappy websites
to handle...
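
As a small illustration of the messy-HTML problem, here is a hedged Python sketch of one defensive step a crawler needs: pulling links out of a page without choking on unclosed tags, uppercase attributes, or unquoted values. It uses only the standard library's `html.parser`; the `LinkExtractor` class, the `extract_links` helper, and the sample page are made up for illustration and are not taken from Nutch or Heritrix.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects absolute http(s) links from <a href=...> tags, tolerating sloppy markup."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL and drop any #fragment.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value.strip()))
                is_web_link = absolute.startswith(("http://", "https://"))
                if is_web_link and absolute not in self.links:
                    self.links.append(absolute)


def extract_links(base_url, html):
    """Return de-duplicated absolute links found in html, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Deliberately sloppy sample page: uppercase tag, unclosed <a> and <p>, unquoted href.
messy = (
    '<html><body><A HREF="/jobs?id=1">one<p>'
    '<a href=jobs/2>two</a>'
    '<a href="mailto:x@example.com">mail</a>'
    '<a>no href</a>'
)
links = extract_links("http://example.com/list/", messy)
```

This covers only link extraction. Real crawling also has to deal with encodings, redirects, robots.txt, politeness delays, and duplicate content, which is where most of the work the post complains about comes from.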
g********g
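Posts: 2172
11
Another approach is to use data from Yahoo or Alexa. Otherwise, unless you are targeting a
narrow domain, you could not even afford the crawler's bandwidth costs.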

[Quoting w****n's post]
: Enterprise search engine: Solr, which is based on Lucene.
: Good crawler: Heritrix.
: So far these are the best tools for building a search engine. Many commercial
: sites, including some big companies, use the two in combination.
