Wednesday, October 14, 2015

SharePoint Crawl DB size increasing @20GB per day


Recently, one of my SharePoint farms hit an issue where volume utilization on the DB server (hosting the SharePoint crawl database) climbed to 99% within 10 days.

Analysis:
- The crawl log in SharePoint showed the incremental crawl, scheduled roughly every 2 hours, completing with 0 successes and 100% failures.

- Each incremental crawl logged about 961K errors.

- The crawler stages files in a temporary folder on the server. The path is C:\Users\"SPAdminAccount"\AppData\Local\Temp\gthrsvc_OSearch14\

- I found that this temporary folder had somehow gone missing or been deleted.

Solution:

After creating the above-mentioned folder manually, the crawl success rate shot up and failures dropped to very low numbers, as expected.
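The fix above can be made self-healing. As an illustrative sketch only (not the commands I ran on the farm), the idempotent "create the folder if it is missing" step looks like this; the real server path is the Windows one shown in the comment, and the demo path under /tmp is just so the snippet is portable:

```shell
# Illustrative sketch: recreate the crawler's temp folder if it is missing.
# On the real server the folder is the Windows path
#   C:\Users\<SPAdminAccount>\AppData\Local\Temp\gthrsvc_OSearch14
# Here we use a demo path under /tmp so the snippet runs anywhere.
DEMO_DIR="${TMPDIR:-/tmp}/gthrsvc_OSearch14_demo"
if [ ! -d "$DEMO_DIR" ]; then
  mkdir -p "$DEMO_DIR"   # -p is idempotent: no error if it already exists
fi
echo "folder present: $DEMO_DIR"
```

Running a check like this on a schedule would catch the folder disappearing again before the crawl database starts filling up with failure rows.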

After this, space utilization grew by only about 10 MB a day, which is expected during crawls.

Wednesday, October 7, 2015

SharePoint: detect when the crawl account is accessing your page


Many times we put code in our SharePoint pages to add an entry to a list whenever a particular page is accessed. This is usually done to get a page-visit count or to record the latest visited page/document per user. However, you may not want to add entries when the search account crawls your content source, as this can cause serious performance issues. We hit exactly that: we had multiple site collections, and the custom document set home pages in each of them implemented this same logic of updating a list on every page access.

In order to fix this, I had to detect whether the page was being accessed by a normal user or by the SharePoint crawl account. The best signal I found was the "User-Agent" request header. When the crawler accesses a page, it sends the following string as the user agent:

MS Search 6.0 Robot
User-Agent: Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 6.0 Robot)

I could then tell normal users apart from the SharePoint crawler with a simple check:

string userAgent = HttpContext.Current.Request.UserAgent; // can be null
if (userAgent == null || !userAgent.Contains("MS Search 6.0 Robot"))
{
    // Add your logic to add/update the list
}

This works for both SharePoint 2010 and 2013, since the crawler sends the same robot user-agent string in both versions.
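For quick experimentation outside ASP.NET, the same substring test can be sketched in plain shell. The first user-agent value below is the crawler string shown above; the marker is the same one the C# check looks for:

```shell
# Classify a request by its User-Agent header: the SharePoint crawler
# includes the marker "MS Search 6.0 Robot" in its UA string.
classify_ua() {
  case "$1" in
    *"MS Search 6.0 Robot"*) echo "crawler" ;;
    *)                       echo "user" ;;
  esac
}

classify_ua 'Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 6.0 Robot)'  # prints "crawler"
classify_ua 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'                             # prints "user"
```

The substring match (rather than an exact comparison) is deliberate: it keeps working even if the surrounding browser-compatibility tokens in the crawler's user agent change.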