FAST Search – RSS Feed Crawling


Here are the steps to crawl an RSS feed using the FAST Search Web Crawler:

1) Locate the RSS feed URL

2) Configure FAST Web Crawler

3) Search your RSS Content
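For step 1, it can help to confirm that the URL you found really returns an RSS document before handing it to the crawler. The following is a minimal sketch, assuming an RSS 2.0 feed (root element `rss` with a `channel` child); the function name `summarize_rss` and the sample feed are hypothetical and used only for illustration.

```python
import xml.etree.ElementTree as ET


def summarize_rss(xml_text):
    """Return (channel_title, item_links) for an RSS 2.0 document.

    Raises ValueError if the document is XML but does not look like RSS 2.0.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "rss":
        raise ValueError(f"not an RSS 2.0 feed: root element is <{root.tag}>")
    channel = root.find("channel")
    if channel is None:
        raise ValueError("RSS document has no <channel> element")
    title = (channel.findtext("title") or "").strip()
    links = [(item.findtext("link") or "").strip()
             for item in channel.findall("item")]
    return title, links


# Hypothetical sample feed, standing in for the content behind your feed URL.
SAMPLE = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First post</title><link>http://www.contoso.com/first</link></item>
    <item><title>Second post</title><link>http://www.contoso.com/second</link></item>
  </channel>
</rss>"""

if __name__ == "__main__":
    feed_title, feed_links = summarize_rss(SAMPLE)
    print(feed_title)
    for link in feed_links:
        print(" ", link)
```

To check a live feed, download the document first (for example with `urllib.request.urlopen`) and pass the response body to `summarize_rss`. The item links it returns are the pages the crawler is expected to fetch, so their domain is the one to list under include_domains.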

Configure FAST Web Crawler:

Locate the template file C:\FASTSearch\etc\crawlerconfigtemplate-rss.xml, make a copy of it, and save the copy as C:\FASTSearch\bin\rss.xml

Before making the following changes, check that the DomainSpecification names the correct collection (example: sp):

    <section name="rss">
            <!-- List of start (seed) URIs pointing to RSS feeds. -->
            <attrib name="start_uris" type="list-string">
              <member> http://yourRSSfeed URL </member>
              <!-- <member> http://www.contoso.com/feed.rss </member> -->
            </attrib>

  <!-- Delay in seconds between requests to a single site. You can specify 5 or 10 seconds -->
        <attrib name="delay" type="real"> 5 </attrib>

  <!-- Length of crawl cycle expressed in minutes -->
        <attrib name="refresh" type="real"> 30 </attrib>

<!-- Include your domain so the web crawler can download the content -->

<section name="include_domains">
            <attrib name="exact" type="list-string">
                <member> sathishtk.com </member>
            </attrib>
        </section>

<!-- Authenticate FAST Search web crawler to access SharePoint content. -->

<section name="passwd">
 <attrib name="http://www.sathishtk.com" type="string"> username:password:sathishtk:auto </attrib>
 </section>
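Config snippets copied from a web page often pick up typographic quotes and dashes, which make the XML unparseable. Before loading the file into the crawler, a quick check along these lines can catch that. This is a minimal sketch, not part of FAST Search: the function name `check_crawler_config` is hypothetical, and the sample wraps the rss section in a `<DomainSpecification name="sp">` element on the assumption that this matches the layout of your template.

```python
import xml.etree.ElementTree as ET

# Curly double/single quotes and en/em dashes that break XML syntax
# when they replace plain ASCII quotes or the "--" in comment delimiters.
SMART_CHARS = "\u201c\u201d\u2018\u2019\u2013\u2014"


def check_crawler_config(xml_text):
    """Return a list of problems found in the config text (empty if none)."""
    problems = []
    for lineno, line in enumerate(xml_text.splitlines(), start=1):
        if any(ch in SMART_CHARS for ch in line):
            problems.append(f"line {lineno}: typographic quote or dash")
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        problems.append(f"not well-formed XML: {exc}")
        return problems
    # Look at every start_uris list and sanity-check its members.
    for attrib in root.iter("attrib"):
        if attrib.get("name") != "start_uris":
            continue
        uris = [(m.text or "").strip() for m in attrib.findall("member")]
        if not uris:
            problems.append("start_uris has no <member> entries")
        for uri in uris:
            if not uri.startswith(("http://", "https://")) or " " in uri:
                problems.append(f"suspicious start URI: {uri!r}")
    return problems


# Assumed layout, for illustration only.
GOOD = """<DomainSpecification name="sp">
  <section name="rss">
    <attrib name="start_uris" type="list-string">
      <member> http://www.contoso.com/feed.rss </member>
    </attrib>
  </section>
</DomainSpecification>"""

if __name__ == "__main__":
    for problem in check_crawler_config(GOOD):
        print(problem)
```

To check the real file, read C:\FASTSearch\bin\rss.xml as text and pass its contents to the function. A start URI that still contains the placeholder text is reported as suspicious because of the embedded space. Typographic characters inside comments are also reported, even though they are harmless there.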

Save the changes, then load the new configuration into the crawler by running the following command in PowerShell on the FAST server:

PS C:\FASTSearch\bin> crawleradmin -f rss.xml

 

Open the QR Server interface page (example: http://localhost:13280) and search for some words that appear in your RSS feed. If the steps above were configured correctly, you should see matching search results.