
java spider (funds held in escrow; online project, offline negotiation, arranged by 智城)

Client: Anne    Contractor: Internationalcoders    Status: Completed
Project ID: 94620
Budget: $1,000-5,000
Development period: 7 days
Skills: MySQL, Java, PHP, Search
Posted: 2010-01-04

Description

I want to have a spider modified or built, whichever is easier. You can use existing open-source libraries or anything else; it doesn't matter as long as it achieves the tasks.


I want to be able to run the spider as an applet and from the command line, so that it can be executed as a cron job.



The spider must be able to accept command-line arguments, e.g.


public static void main(String[] args) { String var = args[0]; }



and the applet should have a simple GUI.
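A minimal sketch of what the command-line entry point could look like. It assumes the domain is the first argument and that two hypothetical flags, --leave-domain and --reindex, switch on the optional behaviours. The class names and flag names are illustrative only and are not part of the brief.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical options holder; the flag names are assumptions for illustration.
final class SpiderOptions {
    final String domain;
    final boolean leaveDomain;
    final boolean reindex;

    SpiderOptions(String domain, boolean leaveDomain, boolean reindex) {
        this.domain = domain;
        this.leaveDomain = leaveDomain;
        this.reindex = reindex;
    }

    // Parses: <domain> [--leave-domain] [--reindex]
    static SpiderOptions parse(String[] args) {
        if (args.length == 0 || args[0].startsWith("--")) {
            throw new IllegalArgumentException(
                "usage: java SpiderCli <domain> [--leave-domain] [--reindex]");
        }
        List<String> flags = Arrays.asList(args).subList(1, args.length);
        return new SpiderOptions(args[0],
            flags.contains("--leave-domain"),
            flags.contains("--reindex"));
    }
}

final class SpiderCli {
    public static void main(String[] args) {
        SpiderOptions opts = SpiderOptions.parse(args);
        System.out.println("Crawling " + opts.domain
            + " (leaveDomain=" + opts.leaveDomain
            + ", reindex=" + opts.reindex + ")");
        // The crawl itself would start here.
    }
}
```

From cron, a nightly run might then look like "0 3 * * * java -cp spider.jar SpiderCli example.com --reindex", where spider.jar is a placeholder name.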



The spider should take a domain name and crawl that domain only, unless the option is chosen for the spider to leave the domain. It must have an option to re-index a page if its HTML has changed.
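One possible way to express these two rules, as a sketch only. It assumes subdomains count as part of the domain and that "changed" means the SHA-256 digest of the HTML differs from a previously stored digest. Both are assumptions, and the class and method names are hypothetical.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

final class CrawlRules {
    // True if the link's host equals the seed domain or is a subdomain of it (assumption).
    static boolean inDomain(String domain, URI link) {
        String host = link.getHost();
        if (host == null) return false; // e.g. mailto: links have no host
        host = host.toLowerCase(Locale.ROOT);
        String d = domain.toLowerCase(Locale.ROOT);
        return host.equals(d) || host.endsWith("." + d);
    }

    // Follow every link when leaving the domain is allowed, otherwise only in-domain links.
    static boolean shouldFollow(String domain, URI link, boolean leaveDomain) {
        return leaveDomain || inDomain(domain, link);
    }

    // SHA-256 hex digest of the page HTML, to be stored alongside the indexed page.
    static String contentHash(String html) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(html.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is required on every Java platform
        }
    }

    // Re-index when the page was never stored or its digest no longer matches.
    static boolean needsReindex(String storedHash, String html) {
        return storedHash == null || !storedHash.equals(contentHash(html));
    }
}
```

Hashing the raw HTML flags any change at all, including rotating ads or timestamps; hashing only the extracted fields would be a stricter alternative.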



It should check the HTTP status of a page and not index it unless the page is available (status 200, etc.).
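A sketch of the status check using the JDK's HttpURLConnection. It reads "status 200 etc." narrowly as 200 OK only and uses a HEAD request with 5-second timeouts. Those choices, and the class name, are assumptions.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

final class StatusCheck {
    // Only 200 OK counts as available here; widen this if other codes should be indexed.
    static boolean isIndexable(int status) {
        return status == HttpURLConnection.HTTP_OK;
    }

    // Sends a HEAD request and returns the response code.
    // Expects an http or https URL; other schemes fail the cast below.
    static int fetchStatus(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
            conn.setRequestMethod("HEAD");
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

Some servers answer HEAD differently from GET, so a crawler may prefer to read the status from the same GET request that downloads the page.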



--------------- Specs ---------------------------



Spider gets the full HTML page contents.

If the HTML tag I want to check for (e.g. <object></object>) is found, then:

    Parse all HTML tags.

    Get the array of tags I specify, for example String getTags[]={"title","keyword"}

    If keyword is empty/missing and description or title is not empty, then:

        Split the description at every word.

        Return an array of keywords, limited to 250.

    Else, if title is empty:

        Attempt to extract keywords from the HTML body, up to 250 words.

If the HTML tag I checked for is not found, then do not parse the page; just get all the links from the page and continue crawling.
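The keyword rules above could be sketched roughly as follows. This is one reading of the spec, not a definitive one. It uses meta keywords when present, otherwise splits the description (falling back to the title when the description is empty), and otherwise takes up to 250 words from the body. It assumes the standard meta name "keywords" and uses simple regular expressions that expect name before content in the meta tag. A real HTML parser would be more robust. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class KeywordExtractor {
    static final int MAX_KEYWORDS = 250;

    // True if the page contains an opening tag with the given name, e.g. "object".
    static boolean hasTag(String html, String tag) {
        return Pattern.compile("<" + Pattern.quote(tag) + "[\\s>/]",
                Pattern.CASE_INSENSITIVE).matcher(html).find();
    }

    // Text of the first <title> element, or "" if there is none.
    static String title(String html) {
        Matcher m = Pattern.compile("<title[^>]*>(.*?)</title>",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL).matcher(html);
        return m.find() ? m.group(1).trim() : "";
    }

    // Content of <meta name="..." content="...">, or "" if missing.
    static String meta(String html, String name) {
        Matcher m = Pattern.compile("<meta\\s+name=[\"']" + Pattern.quote(name)
                + "[\"']\\s+content=[\"'](.*?)[\"']",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL).matcher(html);
        return m.find() ? m.group(1).trim() : "";
    }

    // Body markup with scripts, styles and tags replaced by spaces.
    static String bodyText(String html) {
        Matcher m = Pattern.compile("<body[^>]*>(.*?)</body>",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL).matcher(html);
        String body = m.find() ? m.group(1) : html;
        return body.replaceAll("(?is)<(script|style)[^>]*>.*?</\\1>", " ")
                   .replaceAll("<[^>]+>", " ");
    }

    // Splits on whitespace and keeps at most MAX_KEYWORDS words.
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        for (String w : text.trim().split("\\s+")) {
            if (!w.isEmpty() && out.size() < MAX_KEYWORDS) out.add(w);
        }
        return out;
    }

    static List<String> keywords(String html) {
        String kw = meta(html, "keywords");
        if (!kw.isEmpty()) return words(kw.replace(',', ' '));
        String desc = meta(html, "description");
        String pageTitle = title(html);
        if (!desc.isEmpty() || !pageTitle.isEmpty()) {
            return words(desc.isEmpty() ? pageTitle : desc);
        }
        return words(bodyText(html));
    }
}
```

A crawler would call hasTag first and skip keywords entirely when the gate tag is absent, collecting only the links in that case.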



The crawler needs to be able to return the values of the HTML tags and attributes that I specify.



I'd like the values to be returned in an associative array/map, so that



myObject['title'] will contain the title


myObject['keyword'] will contain an array of keywords


myObject['tagName']['Attribute'] will contain the attribute value of that HTML tag, for example


myObject['embed']['src']
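Java has no myObject['embed']['src'] syntax, but a nested Map gives the same shape. The sketch below is an illustration under assumptions: it reads only the first occurrence of each requested tag, handles quoted attribute values only, and uses hypothetical class and method names.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TagAttributes {
    // name="value" or name='value'; unquoted values are not handled in this sketch.
    private static final Pattern ATTR =
        Pattern.compile("([\\w-]+)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')");

    // Attributes of the first <tag ...> occurrence, keyed by lower-cased attribute name.
    static Map<String, String> firstTag(String html, String tag) {
        Map<String, String> attrs = new LinkedHashMap<>();
        Matcher open = Pattern.compile("<" + Pattern.quote(tag) + "\\b([^>]*)>",
                Pattern.CASE_INSENSITIVE).matcher(html);
        if (open.find()) {
            Matcher a = ATTR.matcher(open.group(1));
            while (a.find()) {
                String value = a.group(2) != null ? a.group(2) : a.group(3);
                attrs.put(a.group(1).toLowerCase(Locale.ROOT), value);
            }
        }
        return attrs;
    }

    // Builds tagName -> (attribute -> value) for each requested tag that has attributes.
    static Map<String, Map<String, String>> collect(String html, String... tagNames) {
        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        for (String tag : tagNames) {
            Map<String, String> attrs = firstTag(html, tag);
            if (!attrs.isEmpty()) result.put(tag, attrs);
        }
        return result;
    }
}
```

With this shape, the equivalent of myObject['embed']['src'] is TagAttributes.collect(html, "embed").get("embed").get("src").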



Lastly, I want the data to be inserted/indexed in my MySQL database, but only if the HTML tag I checked for was found.
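A sketch of the conditional insert using plain JDBC. The table name pages, its columns, and the unique key on url are all hypothetical, since the brief does not describe the schema. Storing keywords as one comma-separated string is likewise an assumption.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class PageIndexer {
    // Hypothetical table: pages(url, title, keywords, content_hash), unique key on url.
    static final String UPSERT_SQL =
        "INSERT INTO pages (url, title, keywords, content_hash) VALUES (?, ?, ?, ?) "
      + "ON DUPLICATE KEY UPDATE title = VALUES(title), "
      + "keywords = VALUES(keywords), content_hash = VALUES(content_hash)";

    // Keywords are stored as a single comma-separated column in this sketch.
    static String joinKeywords(List<String> keywords) {
        return String.join(",", keywords);
    }

    // Writes the page only when the gate tag was found; returns true if a row was written.
    static boolean index(Connection conn, boolean gateTagFound, String url, String title,
                         List<String> keywords, String contentHash) throws SQLException {
        if (!gateTagFound) return false;
        try (PreparedStatement ps = conn.prepareStatement(UPSERT_SQL)) {
            ps.setString(1, url);
            ps.setString(2, title);
            ps.setString(3, joinKeywords(keywords));
            ps.setString(4, contentHash);
            ps.executeUpdate();
            return true;
        }
    }
}
```

The ON DUPLICATE KEY UPDATE clause lets the same statement serve both the first insert and a later re-index of a changed page.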




Please make sure you read and understand the requirements. This will be integrated into one of my projects, and it needs to be built according to my specs.



The spider can be a modification to the one found here


http://www.developer.com/java/other/article.php/1573761/Programming-a-Spider-in-Java.htm



or here


http://www.javaworld.com/javaworld/jw-11-2004/jw-1101-spider.html



or anything from the net


http://www.google.com/search?hl=en&source=hp&q=java +web+spider&aq=0&oq=java+web+spid&aqi=g2



or if you already have a class or library that does this.



It doesn't matter; I just want a spider customized to do the above.



Escrow payment only... No automated bids please.

