
Sr. Java Developer | Funds held in escrow | Online project, offline negotiation arranged by 智城 (Zhicheng)

Client: Michelle Taylor    Status: Bidding closed
Project ID: 563
Budget: $1,000-5,000
Development period: 7 days
Posted: 2008-08-27

Description

<br /><br /><h1>Introduction</h1>&nbsp; <p class="MsoNormal">The Bidder shall develop a web crawling and reporting tool using Nutch (http://lucene.apache.org/nutch/) distributed over a series of Linux workstations.<span>  </span>The tool shall index and search the World Wide Web to identify sites that contain certain keywords and URLs (specified via regular expressions).<span>  </span>These sites shall be cross-referenced with information from a database and a public web service to join in additional site information, such as traffic rank.<span>  </span>The database shall be provided by the Requester.<span>  </span>The tool shall generate a report displaying a categorized list of all sites containing the specified keywords/URLs, sorted by traffic rank or by other attributes assigned to the identified sites.<span>  </span>The keywords and URLs to be displayed in the report shall be entered into a configuration file as regular expressions.</p>&nbsp; <p class="MsoNormal">The tool shall run continuously, producing a new report daily based on the information most recently generated by the tool.</p>&nbsp; <p class="MsoNormal">The report shall follow the structure of the example below:</p>&nbsp; <pre>
--------------------------------------------------------------------------
Site                      Rank    Other Attribute #1    Other Attribute #2
--------------------------------------------------------------------------
"auto|car manufacturers"
www.site1.com             500     103                   +7
www.site2.com             400     214                   +9
www.site3.com             300     120                   +77
www.site4.com             200     121                   +13
www.site5.com             100     210                   -8
"fast cars"
www.site4.com             700     20                    +4
www.site7.com             600     53                    -10
www.site8.com             500     11                    -8
www.site9.com             400     4                     +5
www.site10.com            300     52                    +1
--------------------------------------------------------------------------
Total                     998     908                   90
--------------------------------------------------------------------------
</pre>&nbsp; <p class="MsoFootnoteText"><span class="MsoSubtleEmphasis"><span style="font-family:Calibri;">Source:</span></span> Fictitious data, for illustration purposes only.</p>&nbsp; <p class="MsoNormal" style="text-align:center;">Figure 1 – Example Report</p>&nbsp; <p class="MsoNormal">The Bidder shall work at least four hours during U.S. Eastern Standard Time to facilitate communication with the Requester.</p>&nbsp; <p class="MsoNormal">The system shall be delivered in four phases, described below:</p>&nbsp; <h1>Phase 1 – Proof of Concept</h1>&nbsp; <p class="MsoNormal">The Phase 1 system shall operate on a single Linux workstation and will be limited to crawling and searching for specified keyword sets.<span>  </span>The report generated shall resemble that shown in Figure 1, except that sites will be ordered by a rank based on link counts rather than by attributes selected from the database.<span>  </span>Phase 1 shall be designed to scale to multiple servers, as required by Phase 3 below.</p>&nbsp; <h1>Phase 2 – Incorporate Site Attributes from Database and Web Service</h1>&nbsp; <p class="MsoNormal">In Phase 2, the attributes from the provided MySQL database shall be incorporated into the report, along with attributes from Amazon’s Alexa web service.<span>  </span>The account for accessing the Alexa web service shall be provided by the Requester.<span>  </span>Attributes shall include Alexa rank and Nielsen volume estimate.</p>&nbsp; <h1>Phase 3 – Distributed Search and Algorithmic Tests</h1>&nbsp; <p class="MsoNormal">In Phase 3, the system shall be distributed across a family of servers, which will provide much more extensive search and reporting capabilities.<span>  </span>Phase 3 also includes certain tests on pages that match the search criteria, for example, testing whether certain URLs and scripts embedded in the page match algorithmic criteria.<span>  </span>These tests shall be specified as configuration parameters, for example, by using a scripting language such as Groovy, Jython, or similar (to be specified by the Requester). The test results (e.g., pass/fail) shall be displayed in the report.</p>&nbsp; <h1>Phase 4 – Site Characterization</h1>&nbsp; <p class="MsoNormal">In Phase 4, Site Characterization Information (SCI) shall be included for each site.<span>  </span>Initially, the SCI shall be the top twenty index words, based on word frequency in the site and excluding words specified in an index exclusion configuration file.<span>  </span>The index exclusion configuration file shall be a line-oriented text file in which each line is a regular expression.</p>&nbsp; <h1>Feature Summary (End of Phase 4)</h1>&nbsp; <p class="MsoListParagraphCxSpFirst" style="text-indent:-.25in;"><span style="font-family:Symbol;"><span>·<span style="font-family:'Times New Roman';font-style:normal;font-variant:normal;font-weight:normal;font-size:7pt;line-height:normal;">         </span></span></span>Crawls the WWW and captures sites containing keywords/URLs specified as regular expressions in a text-based configuration file.</p>&nbsp; <p class="MsoListParagraphCxSpMiddle" style="text-indent:-.25in;"><span style="font-family:Symbol;"><span>·<span style="font-family:'Times New Roman';font-style:normal;font-variant:normal;font-weight:normal;font-size:7pt;line-height:normal;">         </span></span></span>Adds Site Characterization Information (SCI), i.e., metadata, for each site matching the keyword criteria.<span>  </span>Initially, the SCI shall be the twenty most frequent index words found in the site, excluding words or phrases that match regular expressions in a configurable exclusion file.</p>&nbsp; <p class="MsoListParagraphCxSpMiddle" style="text-indent:-.25in;"><span style="font-family:Symbol;"><span>·<span style="font-family:'Times New
Roman';font-style:normal;font-variant:normal;font-weight:normal;font-size:7pt;line-height:normal;">         </span></span></span>Runs continuously, generating a daily report based on the most recent information captured by the crawler.<span>  </span>The report and system shall run continuously and reliably, without requiring any manual maintenance activities on the part of the Requester.</p>&nbsp; <p class="MsoListParagraphCxSpMiddle" style="text-indent:-.25in;"><span style="font-family:Symbol;"><span>·<span style="font-family:'Times New Roman';font-style:normal;font-variant:normal;font-weight:normal;font-size:7pt;line-height:normal;">         </span></span></span>Adds information to the site report from the following sources:</p>&nbsp; <p class="MsoListParagraphCxSpMiddle" style="margin-left:1in;text-indent:-.25in;"><span style="font-family:'Courier New';"><span>o<span style="font-family:'Times New Roman';font-style:normal;font-variant:normal;font-weight:normal;font-size:7pt;line-height:normal;">   </span></span></span>MySQL database provided by the Requester</p>&nbsp; <p class="MsoListParagraphCxSpMiddle" style="margin-left:1in;text-indent:-.25in;"><span style="font-family:'Courier New';"><span>o<span style="font-family:'Times New Roman';font-style:normal;font-variant:normal;font-weight:normal;font-size:7pt;line-height:normal;">   </span></span></span>Amazon Alexa web service (the Bidder shall implement the interface)</p>&nbsp; <p class="MsoListParagraphCxSpMiddle" style="margin-left:1in;text-indent:-.25in;"><span style="font-family:'Courier New';"><span>o<span style="font-family:'Times New Roman';font-style:normal;font-variant:normal;font-weight:normal;font-size:7pt;line-height:normal;">   </span></span></span>Algorithmic tests run against pages in the sites that match the search criteria.<span>  </span>The algorithmic tests shall be specified in a configuration file using a scripting language.</p>&nbsp; <p class="MsoListParagraphCxSpMiddle" style="text-indent:-.25in;"><span
style="font-family:Symbol;"><span>·<span style="font-family:'Times New Roman';font-style:normal;font-variant:normal;font-weight:normal;font-size:7pt;line-height:normal;">         </span></span></span>Generates reports similar in format to the one shown in Figure 1.<span>  </span>The report shall be in ASCII text format.</p>&nbsp; <p class="MsoListParagraphCxSpLast" style="text-indent:-.25in;"><span style="font-family:Symbol;"><span>·<span style="font-family:'Times New Roman';font-style:normal;font-variant:normal;font-weight:normal;font-size:7pt;line-height:normal;">         </span></span></span>The Bidder shall provide supplementary tools to aid in maintaining and monitoring the system.<span>  </span>However, the system shall operate reliably 24x7 without requiring manual maintenance activities.</p>&nbsp; <h1>Deliverables</h1>&nbsp; <p class="MsoListParagraphCxSpFirst" style="text-indent:-.25in;"><span><span>1.<span style="font-family:'Times New Roman';font-style:normal;font-variant:normal;font-weight:normal;font-size:7pt;line-height:normal;">       </span></span></span>Source code, libraries, and scripts needed to run and administer the application.</p>&nbsp; <p class="MsoListParagraphCxSpMiddle" style="text-indent:-.25in;"><span><span>2.<span style="font-family:'Times New Roman';font-style:normal;font-variant:normal;font-weight:normal;font-size:7pt;line-height:normal;">       </span></span></span>Demonstration of the system running continuously, generating reports as specified.</p>&nbsp; <p class="MsoListParagraphCxSpLast" style="text-indent:-.25in;"><span><span>3.<span style="font-family:'Times New Roman';font-style:normal;font-variant:normal;font-weight:normal;font-size:7pt;line-height:normal;">       </span></span></span>Documentation describing the use and maintenance of the system, including examples of the reports generated.</p>&nbsp; <h1>Development Environment</h1>&nbsp; <p class="MsoNormal">The development environment is:</p>&nbsp; <p
class="MsoListParagraphCxSpFirst" style="text-indent:-.25in;"><span style="font-family:Symbol;"><span>·<span style="font-family:'Times New Roman';font-style:normal;font-variant:normal;font-weight:normal;font-size:7pt;line-height:normal;">         </span></span></span>Linux OS, multiple workstations, accessed via SSH</p>&nbsp; <p class="MsoListParagraphCxSpMiddle" style="text-indent:-.25in;"><span style="font-family:Symbol;"><span>·<span style="font-family:'Times New Roman';font-style:normal;font-variant:normal;font-weight:normal;font-size:7pt;line-height:normal;">         </span></span></span>Java language (Nutch is written in Java)</p>&nbsp; <p class="MsoListParagraphCxSpMiddle" style="text-indent:-.25in;"><span style="font-family:Symbol;"><span>·<span style="font-family:'Times New Roman';font-style:normal;font-variant:normal;font-weight:normal;font-size:7pt;line-height:normal;">         </span></span></span>Scripting via Bourne shell and Perl</p>&nbsp; <p class="MsoListParagraphCxSpMiddle" style="text-indent:-.25in;"><span style="font-family:Symbol;"><span>·<span style="font-family:'Times New Roman';font-style:normal;font-variant:normal;font-weight:normal;font-size:7pt;line-height:normal;">         </span></span></span>MySQL database, loaded via CSV files</p>&nbsp; <p class="MsoListParagraphCxSpLast" style="text-indent:-.25in;"><span style="font-family:Symbol;"><span>·<span style="font-family:'Times New Roman';font-style:normal;font-variant:normal;font-weight:normal;font-size:7pt;line-height:normal;">         </span></span></span>Scripting language (Groovy, Jython, or similar)</p>
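The keyword/URL matching described in the Introduction and the Feature Summary could be sketched in Java, the project's stated language, roughly as follows. This is a minimal sketch under assumptions, not part of the specification: the class name KeywordMatcher, the one-regular-expression-per-line file format with '#' comment lines (borrowed from the line-oriented index exclusion file described for Phase 4), and case-insensitive matching are all hypothetical choices.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical helper: tests crawled page text against keyword/URL regular expressions. */
final class KeywordMatcher {
    private final List<Pattern> patterns = new ArrayList<>();

    /** Compiles one regex per line; blank lines and '#' comment lines are skipped (assumed format). */
    KeywordMatcher(List<String> configLines) {
        for (String line : configLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            // Case-insensitive matching is an assumption, not a stated requirement.
            patterns.add(Pattern.compile(trimmed, Pattern.CASE_INSENSITIVE));
        }
    }

    /** Reads the configuration file (decoded as UTF-8) and compiles its patterns. */
    static KeywordMatcher load(Path configFile) throws IOException {
        return new KeywordMatcher(Files.readAllLines(configFile));
    }

    /** Returns, in configuration order, every pattern found anywhere in the given text. */
    Set<String> matchingPatterns(String pageText) {
        Set<String> hits = new LinkedHashSet<>();
        for (Pattern p : patterns) {
            if (p.matcher(pageText).find()) {
                hits.add(p.pattern());
            }
        }
        return hits;
    }
}
```

For example, KeywordMatcher.load(Path.of("keywords.conf")).matchingPatterns(pageText), with keywords.conf a hypothetical file name, returns the patterns a crawled page satisfies. One point worth checking when writing the configuration: alternation has the lowest precedence in Java regular expressions, so the Figure 1 label auto|car manufacturers matches any text containing "auto" (including "automatic"), whereas (auto|car) manufacturers requires the word "manufacturers".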
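Phase 1 orders sites by a rank based on link counts. One simple reading of that requirement, sketched below, scores each site by the number of distinct other sites that link to it. The class and method names, the use of host names as site identifiers, and the decision to ignore links within a site are assumptions; this is a standalone illustration and does not use Nutch's own link database or scoring APIs.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical Phase 1 ranking: a site's score is the number of distinct other sites linking to it. */
final class LinkCountRanker {

    /** Maps each target host to the count of distinct source hosts that link to it. */
    static Map<String, Integer> inboundSiteCounts(Map<String, List<String>> outlinksByPage) {
        Map<String, Set<String>> linkingSites = new HashMap<>();
        for (Map.Entry<String, List<String>> page : outlinksByPage.entrySet()) {
            String fromHost = host(page.getKey());
            for (String target : page.getValue()) {
                String toHost = host(target);
                // Skip unparseable URLs and links that stay within the same site.
                if (fromHost == null || toHost == null || toHost.equals(fromHost)) {
                    continue;
                }
                linkingSites.computeIfAbsent(toHost, k -> new HashSet<>()).add(fromHost);
            }
        }
        Map<String, Integer> counts = new HashMap<>();
        linkingSites.forEach((site, sources) -> counts.put(site, sources.size()));
        return counts;
    }

    /** Orders sites by descending link count, breaking ties alphabetically. */
    static List<String> rank(Map<String, Integer> counts) {
        List<String> sites = new ArrayList<>(counts.keySet());
        Comparator<String> byCountDesc = Comparator.comparing((String s) -> counts.get(s)).reversed();
        sites.sort(byCountDesc.thenComparing(Comparator.<String>naturalOrder()));
        return sites;
    }

    /** Lower-cased host name of a URL, or null if it cannot be parsed or has no host. */
    private static String host(String url) {
        try {
            String h = URI.create(url).getHost();
            return h == null ? null : h.toLowerCase(Locale.ROOT);
        } catch (IllegalArgumentException e) {
            return null;
        }
    }
}
```

Counting distinct linking sites rather than raw links keeps one heavily cross-linked site from inflating another's rank; whether that matches the Requester's intent for "link counts" would need confirming.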
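The Figure 1 layout (one block per keyword pattern, rows sorted by rank, fixed-width ASCII columns) might be produced with String.format padding, as in the sketch below. The column widths, the Row type, the descending rank order (matching the fictitious Figure 1 data), and the explicit sign on the second attribute are assumptions drawn from the figure; the Total row is left out because the specification does not say how each column should be totalled.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical renderer for a Figure 1 style report: one block per keyword pattern. */
final class AsciiReport {

    /** One report row; attr1 and attr2 stand in for "Other Attribute #1/#2". */
    static final class Row {
        final String site;
        final long rank;
        final long attr1;
        final long attr2;

        Row(String site, long rank, long attr1, long attr2) {
            this.site = site;
            this.rank = rank;
            this.attr1 = attr1;
            this.attr2 = attr2;
        }
    }

    // Left-justified columns of 26, 8 and 22 characters (widths chosen for illustration).
    private static final String ROW_FORMAT = "%-26s%-8s%-22s%s%n";
    private static final String RULE = "-".repeat(74) + System.lineSeparator();

    /** Renders groups in map iteration order, sorting each group's rows by descending rank. */
    static String render(Map<String, List<Row>> rowsByPattern) {
        StringBuilder out = new StringBuilder(RULE);
        out.append(String.format(ROW_FORMAT, "Site", "Rank", "Other Attribute #1", "Other Attribute #2"));
        out.append(RULE);
        for (Map.Entry<String, List<Row>> group : rowsByPattern.entrySet()) {
            out.append('"').append(group.getKey()).append('"').append(System.lineSeparator());
            List<Row> rows = new ArrayList<>(group.getValue());
            rows.sort(Comparator.comparingLong((Row r) -> r.rank).reversed());
            for (Row r : rows) {
                // Show an explicit sign on the second attribute, as in Figure 1 (+7, -8).
                String signed = (r.attr2 >= 0 ? "+" : "") + r.attr2;
                out.append(String.format(ROW_FORMAT, r.site, r.rank, r.attr1, signed));
            }
        }
        out.append(RULE);
        return out.toString();
    }
}
```

Passing a LinkedHashMap keeps the keyword groups in configuration-file order. Site names of 26 characters or more will run into the Rank column, since %-26s pads but never truncates.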
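The Phase 4 Site Characterization Information (the twenty most frequent index words, minus anything matching the line-oriented exclusion file) can be sketched as a word-frequency count. Assumptions, all hypothetical: SiteCharacterizer is an illustrative name; a word is a lower-cased run of Unicode letters and digits; an exclusion line must match a whole word (Matcher.matches) rather than part of one; ties are broken alphabetically; and only single words, not multi-word phrases, are handled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical Phase 4 helper: top-N word frequencies for a site, minus excluded words. */
final class SiteCharacterizer {
    // A "word" is taken to be a run of Unicode letters and digits (an assumption).
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}]+");
    private final List<Pattern> exclusions = new ArrayList<>();

    /** One regular expression per non-blank line, as described for the index exclusion file. */
    SiteCharacterizer(List<String> exclusionLines) {
        for (String line : exclusionLines) {
            if (!line.isBlank()) {
                exclusions.add(Pattern.compile(line.trim(), Pattern.CASE_INSENSITIVE));
            }
        }
    }

    /** Returns up to n words ordered by descending frequency across all pages, ties broken alphabetically. */
    List<String> topWords(Iterable<String> pageTexts, int n) {
        Map<String, Integer> freq = new HashMap<>();
        for (String text : pageTexts) {
            Matcher m = WORD.matcher(text.toLowerCase(Locale.ROOT));
            while (m.find()) {
                String word = m.group();
                if (!isExcluded(word)) {
                    freq.merge(word, 1, Integer::sum);
                }
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(freq.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    /** An exclusion applies only when it matches the whole word (Matcher.matches). */
    private boolean isExcluded(String word) {
        for (Pattern p : exclusions) {
            if (p.matcher(word).matches()) {
                return true;
            }
        }
        return false;
    }
}
```

For a site's crawled pages, new SiteCharacterizer(Files.readAllLines(exclusionFile)).topWords(pageTexts, 20) would yield the SCI list, where exclusionFile is a hypothetical Path to the index exclusion configuration file.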
