
Text Parsing Program
Funds held in escrow. Online project with offline negotiation, arranged by Zhicheng (智城).

Client: Polly
Contractor: Iphonelab
Status: Completed
Project ID: 96418
Budget: $1,000-5,000
Development period: 7 days
Skills: Mac OS X
Posted: 2010-02-01


I am looking for a developer to write a program with the following functions:

Parse Word and PDF files and extract their sentences.
Parse a specified webpage and extract its sentences and words. Anything longer than x words counts as a sentence.
Sentences must be extracted, and an option to extract phrases should also be available.
The program needs to handle English, Chinese, Spanish and Japanese (all UTF-8 encoded), as well as other languages.
Two or more languages may appear in the same document, and every character must still be recognized.
If possible, the program should be written in a scripting language that runs on a web server, Mac or PC (feel free to propose what you believe is the best approach).
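A minimal sketch of the extraction side, assuming Python 3.7+ and only its standard library. It reads the text of a .docx file from word/document.xml inside the ZIP container, pulls the visible text out of an HTML page, splits text into sentences on Western (. ! ?) and CJK (。！？) terminators and at paragraph breaks, and keeps pieces with more than x words. Several points are assumptions not found in the brief: a "word" in Chinese or Japanese is counted as one character (those scripts do not separate words with spaces), "phrases" are taken to be the pieces between commas, semicolons and colons, and every function name is hypothetical. Legacy .doc files and PDFs are not covered; for PDFs a third-party library such as pypdf (`PdfReader(path).pages[i].extract_text()`) would typically supply the text, and the page itself could be fetched with `urllib.request.urlopen`.

```python
import re
import zipfile
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Namespace of WordprocessingML, the XML stored in word/document.xml of a .docx.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# Kana (U+3040-30FF) plus CJK ideographs (Extension A and the main block).
_CJK = r"\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff"
# Assumption: a "word" is one CJK/kana character, or a run of other
# letters/digits (English, Spanish, ...), so mixed-language text still counts.
_WORD = re.compile("[" + _CJK + "]|[^\\W" + _CJK + "]+")
# Sentence breaks: . ! ? followed by whitespace, right after 。！？, or at line breaks.
_SENT_END = re.compile(r"(?<=[.!?])\s+|(?<=[。！？])|\n+")
# Hypothetical phrase separators: Western and full-width , ; : plus 、
_PHRASE_SEP = re.compile(r"[,;:，；：、]")
# HTML tags treated as paragraph boundaries.
_BLOCK_TAGS = {"p", "div", "br", "li", "tr", "td", "th", "h1", "h2", "h3",
               "h4", "h5", "h6", "title", "blockquote", "pre"}


class _VisibleText(HTMLParser):
    """Collects page text, skipping <script>/<style> and breaking at block tags."""

    def __init__(self):
        super().__init__()
        self.parts, self._skip = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag in _BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif tag in _BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip:
            # Collapse source whitespace; real breaks come from block tags.
            self.parts.append(re.sub(r"\s+", " ", data))


def html_text(html):
    """Visible text of an HTML document, one block element per line."""
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


def docx_text(path_or_file):
    """Text of a .docx file, one Word paragraph per line."""
    with zipfile.ZipFile(path_or_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    return "\n".join(
        "".join(t.text or "" for t in p.iter(W_NS + "t"))
        for p in root.iter(W_NS + "p"))


def count_words(text):
    return len(_WORD.findall(text))


def sentences(text, x, phrases=False):
    """Sentences (or phrases, if requested) containing more than x words."""
    pieces = _SENT_END.split(text)
    if phrases:
        pieces = [p for s in pieces for p in _PHRASE_SEP.split(s)]
    pieces = [" ".join(p.split()) for p in pieces]  # trim and normalize spaces
    return [p for p in pieces if count_words(p) > x]
```

For example, `sentences(docx_text("report.docx"), 5)` would return the sentences of more than five words, and adding `phrases=True` splits them further. Counting each CJK character as a word keeps one threshold x meaningful across languages, but it means a Chinese or Japanese sentence needs roughly as many characters as an English one needs words; a per-language threshold or a proper segmenter would be the obvious refinement.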

Data Processing
The target file should have the same name as the file that was read, but with the extension changed to .txt.
The target file should contain one sentence per line.
The target .txt file should be written to a specified path on a local or remote server (possibly via FTP).
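The output rules above can be sketched with Python 3's standard library as well. The target name keeps the input's base name with the extension swapped for .txt, the file holds one sentence per line in UTF-8 (matching the brief's encoding), and it is either written to a local directory or uploaded with `ftplib`. The function names and the FTP login defaults are hypothetical, and the brief does not say whether plain FTP, FTPS (`ftplib.FTP_TLS`) or SFTP is expected; this sketch only covers plain FTP.

```python
import ftplib
import io
from pathlib import Path, PurePosixPath


def target_name(source):
    """Input file name with its extension replaced by .txt (report.docx -> report.txt)."""
    return Path(source).with_suffix(".txt").name


def render(sentences):
    """One sentence per line; line breaks inside a sentence become spaces."""
    return "".join(" ".join(s.split()) + "\n" for s in sentences)


def write_local(sentences, source, out_dir):
    """Write the UTF-8 .txt into out_dir (created if missing); return its path."""
    out = Path(out_dir) / target_name(source)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(render(sentences), encoding="utf-8")
    return out


def write_ftp(sentences, source, host, remote_dir, user="anonymous", passwd=""):
    """Upload the UTF-8 .txt to remote_dir on a plain FTP server; return its remote path."""
    remote_path = str(PurePosixPath(remote_dir) / target_name(source))
    data = io.BytesIO(render(sentences).encode("utf-8"))
    with ftplib.FTP(host) as ftp:
        ftp.login(user, passwd)
        # Binary transfer sends the UTF-8 bytes unchanged (storlines would use CRLF).
        ftp.storbinary("STOR " + remote_path, data)
    return remote_path
```

For instance, `write_ftp(["First sentence.", "Second sentence."], "report.pdf", "ftp.example.com", "/incoming")` would store the two lines as /incoming/report.txt. Uploading from an in-memory buffer avoids writing a temporary file when the target is remote.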

