Web Scraping Software (funds held in escrow; online project, offline negotiation, arranged by 智城 (Zhicheng))

Client : Carol oates   Contractor : Rsglobalinfotech   Status : Completed
Project ID : 97693
Budget : $1,000-5,000
Development period : 7 days
Skills : Search
Posted : 2010-02-24

Description

I am looking to have a marketing research tool created.



This tool will take a domain name and search it with Yahoo's Site Explorer (siteexplorer.search.yahoo.com). The goal is to find every page of the website that Yahoo has indexed. For each search, Yahoo returns up to 1,000 indexed URLs for the site, and Site Explorer also offers the results as a downloadable TSV file.



This program will need to automatically perform the search and then download the TSV file that is provided.



Once the program has downloaded the TSV file, it must extract the URLs from it. The file contains other information as well, but the program should extract only the URLs.
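A minimal sketch of the URL-extraction step, assuming the downloaded TSV is plain tab-separated text and that URL cells are the only ones beginning with `http://` or `https://`. The exact Site Explorer column layout is not described here, so the sample header (`Title`, `URL`, `Size`) is hypothetical.

```python
import csv
import io
from typing import List


def extract_urls(tsv_text: str) -> List[str]:
    """Return every cell that looks like an http(s) URL, in file order, without duplicates."""
    urls: List[str] = []
    seen = set()
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        for cell in row:
            value = cell.strip()
            # Assumption: URL cells start with a scheme; other columns (titles, sizes) do not.
            if value.lower().startswith(("http://", "https://")) and value not in seen:
                seen.add(value)
                urls.append(value)
    return urls


if __name__ == "__main__":
    # Hypothetical sample; the real TSV columns may differ.
    sample = "Title\tURL\tSize\nHome\thttp://about.com/\t12KB\nNews\thttp://about.com/news\t8KB\n"
    print(extract_urls(sample))
```

Matching on the scheme rather than a fixed column index keeps the sketch working even if the column order is not known in advance.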



The program must use proxies, because only a limited number of searches can be made from the same IP address.
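One simple way to spread searches across IP addresses is round-robin rotation over a proxy list. The sketch below is only an illustration: the `ProxyRotator` class name and the proxy addresses (taken from the 192.0.2.0/24 documentation range) are hypothetical, and it does not detect or drop blocked proxies.

```python
import itertools
import threading
import urllib.request
from typing import Iterable


class ProxyRotator:
    """Hand out proxies in round-robin order so consecutive searches use different IPs."""

    def __init__(self, proxies: Iterable[str]) -> None:
        proxy_list = list(proxies)
        if not proxy_list:
            raise ValueError("at least one proxy is required")
        self._cycle = itertools.cycle(proxy_list)
        self._lock = threading.Lock()  # lets several worker threads share one rotator

    def next_proxy(self) -> str:
        with self._lock:
            return next(self._cycle)

    def next_opener(self) -> urllib.request.OpenerDirector:
        # Route both http and https requests through the next proxy in the cycle.
        proxy = self.next_proxy()
        handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        return urllib.request.build_opener(handler)
```

Each search would call `next_opener()` and then `opener.open(url, timeout=30)`, so successive requests leave through different proxies.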



The program should also be built for maximum speed.
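A common way to gain speed is to run several site searches concurrently, for example one worker per available proxy, while still reporting results in the order the sites were listed. This is a sketch under that assumption: `fetch_site_urls` is a hypothetical stand-in for "search one domain and return its URLs", and the worker count of 8 is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def process_all(
    domains: List[str],
    fetch_site_urls: Callable[[str], List[str]],
    max_workers: int = 8,
) -> List[Tuple[str, List[str]]]:
    """Run fetch_site_urls for every domain concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `domains`, even if later
        # domains finish first, so the output follows the user's list.
        results = pool.map(fetch_site_urls, domains)
        return list(zip(domains, results))
```

Threads suit this workload because the time is spent waiting on the network rather than on computation; an exception raised by one search surfaces when its result is read.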



Once gathered, the URLs will be written to a CSV file. A template will be provided showing which information the CSV file should contain.



The program must be a desktop application compatible with Windows.



The program must use a wizard (step-by-step) interface.



The information entered in the wizard is what will be used to create the CSV file in the final step.
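Because the CSV template is not included in this posting, the column layout below is hypothetical: one row per URL, with the values entered in the wizard (passed in as a plain `wizard_fields` dictionary) repeated on every row. It is a sketch of the idea, not the actual template.

```python
import csv
import io
from typing import Dict, List, TextIO


def write_url_csv(out: TextIO, urls: List[str], wizard_fields: Dict[str, str]) -> None:
    """Write one CSV row per URL, repeating the values entered in the wizard on each row."""
    # Hypothetical column layout: the real template was not included in the posting.
    fieldnames = ["url", *wizard_fields.keys()]
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for url in urls:
        writer.writerow({"url": url, **wizard_fields})


if __name__ == "__main__":
    buf = io.StringIO()
    write_url_csv(buf, ["http://about.com/", "http://about.com/news"], {"campaign": "example"})
    print(buf.getvalue())
```

When writing to disk, open the file with `open(path, "w", newline="", encoding="utf-8")` so the `csv` module controls line endings, which avoids blank rows on Windows.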



Once the URLs have been extracted from the TSV file, there must be an option to save the list of URLs.
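A minimal sketch of the "save the URL list" option, assuming a plain-text file with one URL per line is an acceptable format; the function name `save_url_list` and the file name in the demo are illustrative only.

```python
import tempfile
from pathlib import Path
from typing import Iterable


def save_url_list(path: str, urls: Iterable[str]) -> int:
    """Save URLs one per line (UTF-8 text) and return how many were written."""
    lines = list(urls)
    Path(path).write_text("\n".join(lines) + ("\n" if lines else ""), encoding="utf-8")
    return len(lines)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "about.com-urls.txt"
        count = save_url_list(str(target), ["http://about.com/", "http://about.com/news"])
        print(count, target.read_text(encoding="utf-8"))
```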



The program must be able to run these searches for multiple websites, up to 1,000 at a time. That is, I want to load a list of up to 1,000 websites and have the program process each of them in turn, starting with the first, then the second, and so on.
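Loading the site list could look like the sketch below, assuming the list is plain text with one domain per line. Blank lines are skipped, the user's order is kept, and entries are used exactly as typed apart from surrounding whitespace (no `http://www.` is added). The 1,000-site limit comes from the requirement; the function name is hypothetical.

```python
from typing import List

MAX_SITES = 1000  # limit stated in the requirements


def load_domains(text: str, limit: int = MAX_SITES) -> List[str]:
    """Parse one domain per line, keeping the user's order and the entries as typed."""
    domains = [line.strip() for line in text.splitlines() if line.strip()]
    if len(domains) > limit:
        raise ValueError(f"{len(domains)} sites given; at most {limit} are allowed")
    return domains
```

Rejecting an oversized list up front, rather than silently truncating it, tells the user immediately which limit was exceeded.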



The final step is generating the CSV files, and there must be an option to save them to the computer.



The program must search without the http://www. prefix; for example, it will use about.com, not http://www.about.com. The software does not need to add http://www. to the URL; it should simply search the URLs exactly as they are entered.


