Web Crawler and Search Feature (funds held in escrow; online project, offline negotiation arranged by Zhicheng)

Client: Paul thatcher
Contractor: Loveisp
Status: Completed
Project ID: 91810
Budget: $1,000-5,000
Development period: 20 days
Skills: JavaScript, PHP
Posted: 2009-11-17

Description

This is a web crawler/search combination tool that can be used on any site on the Internet. It should be able to select a website by specific criteria and then look for a certain item within the content. This is not a keyword search; it is more of a link checker for the content.

This should work on most large websites. For example, http://www.wikihow.com/Find-a-Low-Airfare is one of the thousands of articles on that site.

I would like to crawl each category page by page and find the domains of anchor links that are neither http://www.wikihow.com nor advertising links. In this case, we would want to populate a text file with:
http://www.airfarewatchdog.com
http://southwest.com
http://www.airbank-travel.com
http://www.priceline.com

Then I would like to check whether these domains are available. The newer the article, the less likely its linked domains are to be available, so I would rather it search each category starting with the oldest articles.
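
Ordering a category's articles oldest-first could look like the following minimal Python sketch. It assumes each article's publication date can be scraped from its page (how to do that depends on the site and is not specified here); the Article record and the oldest_first function are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Article:
    url: str
    category: str
    published: date  # assumed to be scraped from the article page


def oldest_first(articles):
    """Group articles by category, then sort each group by publication date.

    Categories keep the order in which they were first seen; within a
    category the oldest article comes first.
    """
    by_category = {}
    for art in articles:
        by_category.setdefault(art.category, []).append(art)
    for group in by_category.values():
        group.sort(key=lambda a: a.published)
    return by_category
```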

Let me know if this makes sense. There are several sites like this, so I would want it to work on all of them.

My goal is to go through an extremely large website with tons of pages of content without having to do it manually. The sites are not mine; I just want to access them. Almost every page on this site has links within the content. I'm not talking about all the links on the page, just the 1-3 links within the body of the article. I then want it to put the links into a text file or, better yet, check whether each domain is available for purchase.
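
To keep only the few links in the body of an article, a parser can ignore every anchor outside the element that holds the article text. The minimal sketch below uses Python's standard html.parser; the container <div id="bodycontents"> is a hypothetical placeholder, since the actual markup of wikiHow or any other target site would need to be inspected first.

```python
from html.parser import HTMLParser


class BodyLinkParser(HTMLParser):
    """Collect href values of <a> tags inside one container element.

    The container is matched by tag name and id attribute. Nesting is tracked
    by counting only tags with the container's own name (e.g. <div>), which
    assumes that tag is always explicitly closed.
    """

    def __init__(self, container_tag="div", container_id="bodycontents"):
        super().__init__()
        self.container_tag = container_tag
        self.container_id = container_id  # hypothetical id, site-specific
        self.depth = 0  # > 0 while inside the container
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == self.container_tag:
            if self.depth:
                self.depth += 1  # nested tag of the same name
            elif attrs.get("id") == self.container_id:
                self.depth = 1  # entering the container
        elif tag == "a" and self.depth and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == self.container_tag and self.depth:
            self.depth -= 1


def body_links(html, **kwargs):
    """Return the hrefs of anchors found inside the container element."""
    parser = BodyLinkParser(**kwargs)
    parser.feed(html)
    parser.close()
    return parser.links
```

Tracking only the container's own tag name keeps the nesting count correct even when tags such as <p> or <br> are left unclosed, as long as the container tag itself is always closed.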

The crawl needs to be split by category so that it doesn't search the whole site at once; the work can be broken down. I should be able to do this on multiple sites. Is this something you can do?
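
Splitting the work by category and by site might be organized as a list of jobs processed one at a time. In this minimal Python sketch, fetch_listing is a hypothetical caller-supplied function that downloads one category listing page and returns the article URLs on it plus the URL of the next listing page (or None); max_pages caps how many listing pages a single category may consume.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple

# Hypothetical: fetch_listing(url) -> (article URLs on that page, next page URL or None)
FetchListing = Callable[[str], Tuple[Iterable[str], Optional[str]]]


def iter_category_articles(first_page: str, fetch_listing: FetchListing,
                           max_pages: int = 50) -> Iterator[str]:
    """Page through one category's listing, yielding each article URL once."""
    seen_pages, seen_articles = set(), set()
    url: Optional[str] = first_page
    # Stop at the last page, on a pagination loop, or when the page cap is hit.
    while url and url not in seen_pages and len(seen_pages) < max_pages:
        seen_pages.add(url)
        article_urls, url = fetch_listing(url)
        for article in article_urls:
            if article not in seen_articles:
                seen_articles.add(article)
                yield article


def crawl_jobs(jobs: Iterable[Tuple[str, str]], fetch_listing: FetchListing,
               max_pages: int = 50) -> Iterator[Tuple[str, str, str]]:
    """Process (site, category listing URL) jobs one after another."""
    for site, category_url in jobs:
        for article in iter_category_articles(category_url, fetch_listing, max_pages):
            yield site, category_url, article
```

Because each job is finished before the next one starts, a run can be restricted to a few categories at a time instead of the whole site.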

I want to be able to scan existing articles on sites like wikihow.com and pull out the links inside the articles that do not point to wikiHow itself or to ads. The sites I listed above are examples of what would be pulled out.

Then I want to check whether airfarewatchdog.com is available, and so on for each of the other domains.
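
A quick way to shortlist candidates is to check whether each domain still resolves in DNS. This is only a heuristic: a registered domain can have no DNS records, so any candidate still has to be confirmed with a WHOIS/RDAP lookup or a registrar before being treated as available. The minimal Python sketch below assumes that stripping a leading "www." is enough to get the registrable domain, which is not true for suffixes such as .co.uk; the function names are hypothetical, and the resolver is a parameter so it can be swapped out for testing.

```python
import socket


def registrable_guess(host: str) -> str:
    """Crude guess at the registrable domain: drop a leading 'www.'.

    A real tool would use the Public Suffix List instead.
    """
    return host[4:] if host.startswith("www.") else host


def maybe_available(host: str, resolve=socket.gethostbyname) -> bool:
    """Heuristic pre-filter: True if the domain gets no DNS answer.

    A True result only marks a candidate; registered domains can lack DNS
    records, so candidates still need a WHOIS/RDAP or registrar check.
    """
    try:
        resolve(registrable_guess(host))
    except socket.gaierror:
        return True
    return False
```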

Project Bids

Contractor · Country/Region
Real-name verified · Has portfolio cases
5
Buzhidao
2
Early-software
1
Loveisp (winning bid)
0
Wenlovejob
