Wednesday, 15 April 2015

Sitecore Pipelines - ExcludeRobots

This post documents Sitecore's ExcludeRobots pipeline and is part of a series providing information on all the pipelines and processors involved in a Sitecore 8 request. It's not a line-by-line account, but all the key logic is described.

I'd like the post to develop over time, so if you find any inaccuracies, would like to contribute more information, or have useful links, then please leave a comment. Better still, contact me on Twitter.

All information has been written with reference to Sitecore 8 (rev. 141212).

Sitecore's analytics system should be reserved for genuine visits from humans. The ExcludeRobots pipeline helps to ensure this by identifying whether the current visitor is in fact a known robot.
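As a rough mental model, the processors described in this post run in order over one shared args object, and any processor can abort the pipeline so that later processors are skipped. The Python sketch below illustrates that flow. It is not Sitecore's code (the real processors are .NET classes in Sitecore.Analytics), and the names ExcludeRobotsArgs, abort_pipeline and run_pipeline are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ExcludeRobotsArgs:
    """Hypothetical stand-in for the args object shared by every processor."""
    user_agent: Optional[str]
    ip_address: Optional[str]
    session: Optional[dict] = None  # None models a missing HttpContext.Session
    is_in_exclude_list: bool = False
    aborted: bool = False

    def abort_pipeline(self) -> None:
        # Marks the pipeline as finished; run_pipeline checks this flag.
        self.aborted = True


Processor = Callable[[ExcludeRobotsArgs], None]


def run_pipeline(processors: List[Processor], args: ExcludeRobotsArgs) -> ExcludeRobotsArgs:
    """Run each processor in order, stopping as soon as one aborts the pipeline."""
    for processor in processors:
        processor(args)
        if args.aborted:
            break
    return args


# Usage: two toy processors, the second of which must never run.
def flag_as_robot(args: ExcludeRobotsArgs) -> None:
    args.is_in_exclude_list = True
    args.abort_pipeline()


def should_not_run(args: ExcludeRobotsArgs) -> None:
    raise RuntimeError("processors after an abort are skipped")


result = run_pipeline([flag_as_robot, should_not_run],
                      ExcludeRobotsArgs(user_agent="ExampleBot", ip_address="192.0.2.1"))
print(result.is_in_exclude_list, result.aborted)  # True True
```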



TryObtainCachedResult

Namespace: Sitecore.Analytics.Pipelines.ExcludeRobots
Assembly: Sitecore.Analytics
The TryObtainCachedResult processor is responsible for checking whether the ExcludeRobots pipeline has already successfully determined the status of the visitor during this session. It first makes the following checks:

  • The current HttpContext's Session property must not be null.
  • The Session property must contain the key "SC_ANALYTICS_EXCLUDE_REQUEST".

If these criteria are both met, then args.IsInExcludeList is set to the value obtained from the session object, and the pipeline is aborted.

If no cached value is found, the remaining processors run: they determine whether the visitor is a robot and then store that result in session.
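The checks performed by TryObtainCachedResult can be sketched in Python as follows. This is an illustration under stated assumptions, not the decompiled processor: the session is modelled as a plain dict (None meaning HttpContext.Session is null), and ExcludeRobotsArgs and try_obtain_cached_result are hypothetical names. Only the session key comes from the processor itself.

```python
from dataclasses import dataclass
from typing import Optional

SESSION_KEY = "SC_ANALYTICS_EXCLUDE_REQUEST"


@dataclass
class ExcludeRobotsArgs:
    # Hypothetical stand-in; session models HttpContext.Session (None = no session).
    session: Optional[dict] = None
    is_in_exclude_list: bool = False
    aborted: bool = False


def try_obtain_cached_result(args: ExcludeRobotsArgs) -> None:
    """Reuse a result stored earlier in this session and stop the pipeline."""
    if args.session is None:
        return  # no session: let the remaining processors decide
    if SESSION_KEY not in args.session:
        return  # nothing cached yet for this visitor
    args.is_in_exclude_list = bool(args.session[SESSION_KEY])
    args.aborted = True  # mirrors aborting the pipeline


# Usage: a cached "not a robot" result is reused and the pipeline stops.
cached = ExcludeRobotsArgs(session={SESSION_KEY: False})
try_obtain_cached_result(cached)
print(cached.is_in_exclude_list, cached.aborted)  # False True
```

Note that the pipeline is aborted whenever a cached value exists, whether that value is true or false.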



CheckUserAgent

Namespace: Sitecore.Analytics.Pipelines.ExcludeRobots
Assembly: Sitecore.Analytics
The CheckUserAgent processor is responsible for looking up the current request's user agent in an exclusion list specified in the "analyticsExcludeRobots/excludedUserAgents" node of Sitecore's configuration.

In order for the processor to run, the current request object's UserAgent property must not be null. If it isn't null, the processor looks up its value in the user agent exclusion list (AnalyticsSettings.Robots.ExcludeList.ContainsUserAgent). If the value is found in the exclusion list, args.IsInExcludeList is set to true.

n.b. This processor is essentially the same as CheckIpAddress, but it looks for user agents in the ExcludeList class instead of IP addresses.
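A minimal Python sketch of the CheckUserAgent logic is shown below. UserAgentExcludeList is a hypothetical stand-in for ExcludeList, and its matching rule (a case-insensitive exact match) is an assumption; the real ContainsUserAgent implementation may compare user agents differently.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ExcludeRobotsArgs:
    user_agent: Optional[str] = None  # stands in for the request's UserAgent
    is_in_exclude_list: bool = False


class UserAgentExcludeList:
    """Hypothetical stand-in for the user agent part of ExcludeList.

    Assumes a case-insensitive exact match; Sitecore's own rules may differ.
    """

    def __init__(self, user_agents: Iterable[str]) -> None:
        self._agents = {ua.strip().lower() for ua in user_agents if ua.strip()}

    def contains_user_agent(self, user_agent: str) -> bool:
        return user_agent.strip().lower() in self._agents


def check_user_agent(args: ExcludeRobotsArgs, exclude_list: UserAgentExcludeList) -> None:
    if args.user_agent is None:
        return  # the processor only runs when a user agent is present
    if exclude_list.contains_user_agent(args.user_agent):
        args.is_in_exclude_list = True  # only ever set to True, never reset


# Usage
robots = UserAgentExcludeList(["ExampleBot/1.0"])
args = ExcludeRobotsArgs(user_agent="examplebot/1.0")
check_user_agent(args, robots)
print(args.is_in_exclude_list)  # True
```

Because the processor only ever sets the flag to true, a positive result from an earlier processor is never overwritten by a later miss.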



CheckIpAddress

Namespace: Sitecore.Analytics.Pipelines.ExcludeRobots
Assembly: Sitecore.Analytics
The CheckIpAddress processor is responsible for looking up the current request's IP address in an exclusion list specified in the "analyticsExcludeRobots/excludedIPAddresses" node of Sitecore's configuration.

In order for the processor to run, the current request object's UserHostAddress property must not be null. If it isn't null, the processor looks up its value in the IP address exclusion list (AnalyticsSettings.Robots.ExcludeList.ContainsIpAddress). If the value is found in the exclusion list, args.IsInExcludeList is set to true.

n.b. This processor is essentially the same as CheckUserAgent, but looks for IP addresses in the ExcludeList class instead of user agents.
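The same pattern applied to IP addresses might look like the Python sketch below. IpExcludeList is hypothetical, and modelling each exclusion entry as an inclusive (start, end) address range is an assumption rather than a description of how ExcludeList.ContainsIpAddress stores its data. The sketch uses Python's standard ipaddress module.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class ExcludeRobotsArgs:
    user_host_address: Optional[str] = None  # stands in for the request's UserHostAddress
    is_in_exclude_list: bool = False


class IpExcludeList:
    """Hypothetical stand-in for the IP address part of ExcludeList.

    Assumes each entry is an inclusive (start, end) range of addresses.
    """

    def __init__(self, ranges: Iterable[Tuple[str, str]]) -> None:
        self._ranges = [(ipaddress.ip_address(lo), ipaddress.ip_address(hi)) for lo, hi in ranges]

    def contains_ip_address(self, address: str) -> bool:
        try:
            ip = ipaddress.ip_address(address)
        except ValueError:
            return False  # unparseable addresses are treated as "not excluded"
        # Compare only like with like: IPv4 and IPv6 addresses cannot be ordered together.
        return any(lo.version == ip.version and lo <= ip <= hi for lo, hi in self._ranges)


def check_ip_address(args: ExcludeRobotsArgs, exclude_list: IpExcludeList) -> None:
    if args.user_host_address is None:
        return  # the processor only runs when an address is present
    if exclude_list.contains_ip_address(args.user_host_address):
        args.is_in_exclude_list = True


# Usage
excluded = IpExcludeList([("192.0.2.0", "192.0.2.255")])
args = ExcludeRobotsArgs(user_host_address="192.0.2.10")
check_ip_address(args, excluded)
print(args.is_in_exclude_list)  # True
```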



AddResultToCache

Namespace: Sitecore.Analytics.Pipelines.ExcludeRobots
Assembly: Sitecore.Analytics

The AddResultToCache processor is responsible for registering the result of the pipeline in session. Storing the value means that the pipeline does not need to run again during subsequent requests from the visitor. For the processor to run, the args.HttpContext.Session object must not be null.

If the session is not null, args.HttpContext.Session["SC_ANALYTICS_EXCLUDE_REQUEST"] is set to the boolean value of args.IsInExcludeList.
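For completeness, here is a Python sketch of AddResultToCache under the same assumptions as the earlier sketches: the session is a plain dict (None meaning no session), and ExcludeRobotsArgs and add_result_to_cache are hypothetical names. The stored key is the one TryObtainCachedResult reads, which is what lets later requests in the same session skip the lookups.

```python
from dataclasses import dataclass
from typing import Optional

SESSION_KEY = "SC_ANALYTICS_EXCLUDE_REQUEST"


@dataclass
class ExcludeRobotsArgs:
    session: Optional[dict] = None  # None models a missing HttpContext.Session
    is_in_exclude_list: bool = False


def add_result_to_cache(args: ExcludeRobotsArgs) -> None:
    """Store the pipeline's result so later requests in this session can reuse it."""
    if args.session is None:
        return  # nothing to cache into
    args.session[SESSION_KEY] = bool(args.is_in_exclude_list)


# Usage
session = {}
add_result_to_cache(ExcludeRobotsArgs(session=session, is_in_exclude_list=True))
print(session)  # {'SC_ANALYTICS_EXCLUDE_REQUEST': True}
```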