
Build a "status" Page for the Nutch Search Engine (funds held in escrow; online project, offline negotiation, arranged by 智城)

Client: Ronald clark Contractor: Iphonevogue Status: Completed
Project ID: 91468
Budget: more than $100
Development period: 7 days
Skills: Java JEE JSP Apache Search
Posted: 2009-11-09

Description


Nutch is a Java-based web search engine. While it can run on clusters of hundreds of machines, it can also run on a single host and serve search results through a few JSP pages that ship with Nutch.



Crawling is done with something like <code>./bin/nutch crawl starturls.txt -dir crawl -depth 2 -topN 30000</code>, and the HTML interface is set up by dropping <code>nutch-1.0.war</code> into your favorite servlet container (I use Jetty).



Your task is to build a single JSP page that shows statistics about the current search index. For that you need to use the Lucene API. Studying the source code of the tool "Luke" will probably show you exactly how to query the index (see http://www.getopt.org/luke/#).



The page should display

* number of documents

* number of terms

* index last-modified time, as a date in RFC 3339 format (http://www.faqs.org/rfcs/rfc3339.html)

* any statistics you can get on the crawldb; http://is.gd/4Q7Jp, http://issues.apache.org/jira/browse/NUTCH-558, and http://is.gd/4Q7Ny might provide pointers
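
For the RFC 3339 requirement in the list above, a minimal sketch of the formatting step might look like the following. It assumes the last-modified time is available as milliseconds since the Unix epoch (however it is obtained from the index) and that the servlet container runs Java 8 or later; the class name `Rfc3339` and its `format` method are hypothetical helpers, not part of Nutch or Lucene. Output is in UTC with second precision.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper (not part of Nutch or Lucene): renders an epoch-millisecond
// timestamp as an RFC 3339 date-time in UTC, e.g. 2009-11-09T00:00:00Z.
final class Rfc3339 {
    // 'T' and 'Z' are literals; the formatter is pinned to UTC so 'Z' is accurate.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss'Z'").withZone(ZoneOffset.UTC);

    private Rfc3339() {
    }

    // Assumption: epochMillis is a last-modified time in milliseconds since 1970-01-01T00:00:00Z.
    static String format(long epochMillis) {
        return FORMAT.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        // Example: print the current time in RFC 3339 form.
        System.out.println(format(System.currentTimeMillis()));
    }
}
```

In a JSP, the returned string could simply be written into the page next to the document and term counts. On an older Java runtime, `java.text.SimpleDateFormat` with the same pattern and a UTC time zone would be the likely substitute.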



We will use this page to monitor whether the Nutch instance is "healthy" (still adding pages, etc.). Nutch runs on an intranet, spidering about two dozen hosts.
