Wednesday, October 14, 2015

SharePoint Crawl DB size increasing @20GB per day


Recently, one of my SharePoint farms hit an issue where volume utilization on the DB server (hosting the SharePoint crawl database) climbed to 99% within 10 days.

Analysis:
- The crawl log in SharePoint showed the incremental crawl, scheduled roughly every 2 hours, completing with 0 successes and 100% failures.

- Each incremental crawl logged about 961K errors.

- The crawler stages files in a temporary folder on the server. The path is C:\Users\"SPAdminAccount"\AppData\Local\Temp\gthrsvc_OSearch14\

- I found that this temporary folder had somehow gone missing or been deleted.

Solution:

After creating the above-mentioned folder manually, the crawl success rate shot up and failures dropped to very low numbers, as expected.
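The fix above can be made self-healing. As an illustrative sketch only (not the commands I ran on the farm), the idempotent "create the folder if it is missing" step looks like this; the real server path is the Windows one shown in the comment, and the demo path under /tmp is just so the snippet is portable:

```shell
# Illustrative sketch: recreate the crawler's temp folder if it is missing.
# On the real server the folder is the Windows path
#   C:\Users\<SPAdminAccount>\AppData\Local\Temp\gthrsvc_OSearch14
# Here we use a demo path under /tmp so the snippet runs anywhere.
DEMO_DIR="${TMPDIR:-/tmp}/gthrsvc_OSearch14_demo"
if [ ! -d "$DEMO_DIR" ]; then
  mkdir -p "$DEMO_DIR"   # -p is idempotent: no error if it already exists
fi
echo "folder present: $DEMO_DIR"
```

Running a check like this on a schedule would catch the folder disappearing again before the crawl database starts filling up with failure rows.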

After this, space utilization grew by only about 10 MB a day, which is expected during crawls.

Wednesday, October 7, 2015

SharePoint: detect when the crawl account is accessing your page


Many times we put code in our SharePoint pages to add an entry to a list whenever a particular page is accessed. This is usually done to get a page-visit count or to record the latest visited page/document per user. However, you may not want to add entries when the search account crawls your content source, as this can cause serious performance issues. We hit exactly that: we had multiple site collections, and the custom document set home pages in each of them implemented this same logic of updating a list on every page access.

In order to fix this, I had to detect whether the page was being accessed by a normal user or by the SharePoint crawl account. The best signal I found was the "User-Agent" request header. When the crawler accesses a page, it sends the following string as the user agent:

MS Search 6.0 Robot
User-Agent: Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 6.0 Robot)

I could then tell normal users apart from the SharePoint crawler with a simple check:

string userAgent = HttpContext.Current.Request.UserAgent; // can be null
if (userAgent == null || !userAgent.Contains("MS Search 6.0 Robot"))
{
    // Add your logic to add/update the list
}

This works for both SharePoint 2010 and 2013, since the crawler sends the same robot user-agent string in both versions.
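For quick experimentation outside ASP.NET, the same substring test can be sketched in plain shell. The first user-agent value below is the crawler string shown above; the marker is the same one the C# check looks for:

```shell
# Classify a request by its User-Agent header: the SharePoint crawler
# includes the marker "MS Search 6.0 Robot" in its UA string.
classify_ua() {
  case "$1" in
    *"MS Search 6.0 Robot"*) echo "crawler" ;;
    *)                       echo "user" ;;
  esac
}

classify_ua 'Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 6.0 Robot)'  # prints "crawler"
classify_ua 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'                             # prints "user"
```

The substring match (rather than an exact comparison) is deliberate: it keeps working even if the surrounding browser-compatibility tokens in the crawler's user agent change.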