Friday, 24 January 2014

Handling Duplicate Media Paths in Sitecore.

When Sitecore receives a media request, it goes through a few key steps to convert the URL into a form it can use:
  1. Sitecore identifies that the request is for a media item because the path begins with a media prefix defined in the web.config.
    www.website.com/~/media/Images/kitten.jpg

  2. The prefix is removed and replaced with the root of the media library.
    /media library/Images/kitten.jpg

  3. The file extension is removed.
    /media library/Images/kitten

Sitecore now has a valid path for resolving an item and serves it to the user. However, because the file extension gets stripped during this process, Sitecore is quite greedy in terms of what it will accept as a valid URL. For example, the following URLs will all work:

www.website.com/~/media/Images/kitten.jpg
www.website.com/~/media/Images/kitten.png
www.website.com/~/media/Images/kitten.sausages
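
To make the greediness concrete, here is a rough sketch of the three steps above. This is not Sitecore's actual code: the class name is made up, and the media prefix and library root are hard-coded assumptions rather than being read from web.config.

```csharp
using System;

public static class MediaPathSketch
{
    // Assumed values for illustration only; Sitecore reads the prefix from web.config.
    private const string MediaPrefix = "/~/media";
    private const string MediaLibraryRoot = "/media library";

    // Returns the item path for a media URL path, or null if it isn't a media URL.
    public static string ToItemPath(string urlPath)
    {
        // Step 1: a media request is identified by its prefix.
        if (!urlPath.StartsWith(MediaPrefix + "/", StringComparison.OrdinalIgnoreCase))
            return null;

        // Step 2: swap the prefix for the root of the media library.
        var itemPath = MediaLibraryRoot + urlPath.Substring(MediaPrefix.Length);

        // Step 3: drop the file extension (only if the dot is in the last segment).
        var lastSlash = itemPath.LastIndexOf('/');
        var lastDot = itemPath.LastIndexOf('.');
        if (lastDot > lastSlash)
            itemPath = itemPath.Substring(0, lastDot);

        return itemPath;
    }
}
```

Passing any of the three URL paths above into ToItemPath gives the same item path, which is why the extension in the URL makes no difference to which item is served.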

Marek Musielak has written an excellent article on this behaviour. He provides a solution which checks that the resolved item has the requested extension, and returns a 404 response if it doesn't.

Reading the article got me thinking about a situation where you might reasonably expect duplicate Sitecore media paths with different extensions. I've tried to find a solution to this and am using Marek's code as the starting point.


The scenario

It's not uncommon for a website to make the same file available in different formats. Let's say we have these files: Memo.docx, Memo.pdf, Memo.pages.

When I upload them to the same location in the media library, the extension and MIME type are stored in the items, but the Sitecore path is identical for each.

When Sitecore receives a request to this path, it resolves the first one it finds. Regardless of the URL's file extension, the other two items always get ignored.

This is the point at which Marek's code intervenes, comparing the resolved item's extension with the URL's. If they're different then the request is deemed to be invalid and a 404 is returned. But in this scenario, that assumption may be incorrect.

My approach to the solution

  1. Allow Sitecore to resolve an item as usual (let's call it the 'first guess' item).
  2. Check to see if the first guess item has the correct value in its Extension field.
  3. If the first guess item doesn't have the correct extension, then look to see if any sibling items have the same name and the correct extension.
  4. If a sibling item meets the criteria, then instruct Sitecore to use that instead.
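
Stripped of anything Sitecore-specific, the selection logic in those four steps looks roughly like this. MediaEntry and SiblingResolver are hypothetical stand-ins I've made up for illustration; the real implementation works with Sitecore items.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a media item: just a name and an Extension field.
public class MediaEntry
{
    public string Name { get; set; }
    public string Extension { get; set; }
}

public static class SiblingResolver
{
    // Returns the entry that should be served, or null if nothing matches.
    public static MediaEntry Resolve(
        MediaEntry firstGuess, IEnumerable<MediaEntry> siblings, string urlExtension)
    {
        // Step 2: the first guess is fine if its extension matches the URL.
        if (SameText(firstGuess.Extension, urlExtension))
            return firstGuess;

        // Steps 3 and 4: otherwise look for a sibling with the same name
        // and the requested extension.
        return siblings.FirstOrDefault(s =>
            SameText(s.Name, firstGuess.Name) && SameText(s.Extension, urlExtension));
    }

    private static bool SameText(string a, string b)
    {
        return string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
    }
}
```

With the three Memo files from the scenario, a first guess of Memo (docx) and a URL ending in .pdf would resolve to the Memo (pdf) entry, while an extension that none of the siblings has returns null.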

Implementation

We need to create a new class that overrides the behaviour of Sitecore's MediaRequestHandler:
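using System.IO;
using System.Linq;
using Sitecore;
using Sitecore.Diagnostics;
using Sitecore.Resources.Media;

public class ExtendedMediaRequestHandler : MediaRequestHandler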
{
    protected override bool DoProcessRequest(System.Web.HttpContext context)
    {
        Assert.ArgumentNotNull(context, "context");

        // If we're in a Sitecore internal site (admin, shell etc),
        // then use the base MediaRequestHandler
        if(Sitecore.Context.Domain.Name == "sitecore")
            return base.DoProcessRequest(context);

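        MediaRequest request = MediaManager.ParseMediaRequest(context.Request);

        // Nothing to do if the request could not be parsed as a media request.
        if (request == null)
            return false;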

        // By default Sitecore will get the first matching 
        // item regardless of the file extension.
        var mediaPath = request.MediaUri.MediaPath;
        var firstGuess = Context.Database.GetItem(mediaPath);

        if (firstGuess == null)
            return false;

        var urlExtension = GetFileExtension(context.Request.RawUrl).ToLower();

        // if the extension of the first guess item is 
        // incorrect, then we have some work to do.
        if (firstGuess["Extension"].ToLower() != urlExtension)
        {
            // Check if any of the matched item's siblings have
            // a matching path and file extension.
            var siblings = firstGuess.Parent.Children;

            var matches = from sibling in siblings
                          where firstGuess.Paths.Path == sibling.Paths.Path &&
                                sibling["Extension"].ToLower() == urlExtension
                          select sibling;

            if (!matches.Any())
                return false;

            var selectedItem = matches.First();

            // Now we've identified the correct item, we can
            // reference it by ID rather than path.
            request.MediaUri.MediaPath = selectedItem.ID.ToString();
        }

        Media media = MediaManager.GetMedia(request.MediaUri);

        if (media == null)
            return false;
           
        return DoProcessRequest(context, request, media);
    }

    private string GetFileExtension(string rawUrl)
    {
        // Strip the query string first, otherwise a '.' in a parameter
        // value (e.g. ?v=1.2) would be mistaken for the file extension.
        var queryIndex = rawUrl.IndexOf('?');
        var path = queryIndex >= 0 ? rawUrl.Substring(0, queryIndex) : rawUrl;

        return Path.GetExtension(path).TrimStart('.');
    }
}

With that in place, you just need to replace any references to Sitecore.Resources.Media.MediaRequestHandler in the web.config with your own class.
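
As a rough guide, the handler entry ends up looking something like the fragment below. The namespace and assembly name (MyProject) are placeholders, and the other attributes are what I'd expect in a typical web.config; they may differ between Sitecore versions, so copy them from your existing entry rather than from here.

```xml
<!-- Assumed layout: keep the attributes from your own web.config and change only the type. -->
<handlers>
  <!-- MyProject.ExtendedMediaRequestHandler, MyProject is a hypothetical type and assembly name. -->
  <add verb="*" path="sitecore_media.ashx"
       type="MyProject.ExtendedMediaRequestHandler, MyProject"
       name="Sitecore.MediaRequestHandler" />
</handlers>
```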

Some additional points

I'm not sure if this is the most elegant solution to the problem, and I welcome any other ideas on how to achieve it. When I was investigating the issue, I found relatively few intervention points for working with media requests. For reference, these were the ones I found:

Sitecore.Resources.Media.MediaRequestHandler, Sitecore.Kernel
Sitecore.Resources.Media.MediaProvider, Sitecore.Kernel
Sitecore.Resources.Media.MediaRequest, Sitecore.Kernel

I've only used this code in a test environment. Going through all the sibling items is quite an expensive process, and doing it for every single media request would probably be impractical. If you plan on using this approach in a production environment, I would restrict it to an area of the media library where it would be most useful (e.g. /Files/).
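
One way to apply that restriction would be a simple path check before doing the sibling lookup. This is only a sketch: MediaPathFilter is a made-up helper and the folder path is an assumption, so adjust it to wherever your duplicate-named files actually live.

```csharp
using System;

public static class MediaPathFilter
{
    // Hypothetical folder; change this to suit your own media library structure.
    private const string RestrictedRoot = "/media library/Files/";

    // True if the item path sits under the folder where sibling checks are worthwhile.
    public static bool ShouldCheckSiblings(string itemPath)
    {
        return itemPath != null &&
               itemPath.StartsWith(RestrictedRoot, StringComparison.OrdinalIgnoreCase);
    }
}
```

In the handler, you could call this with the first guess item's path and fall back to base.DoProcessRequest(context) when it returns false, so that requests outside that folder never pay for the sibling lookup.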

The Media.IncludeExtensionsInItemNames setting in web.config allows you to append the file extension of uploaded files to the item name. This overcomes the issue of getting duplicate paths, but it isn't suitable if you want to keep your original file names.


Here are some other blog posts that deal with peculiarities of media requests:
HTTP 304 response handling in Sitecore MediaRequestHandler
Use Different Media Prefixes for Sites Managed with the Sitecore ASP.NET CMS
Resolve Media Library Items Linked in Sitecore Aliases
Encode Names in Media URLs with the Sitecore ASP.NET CMS
Sitecore Idiosyncrasies: Media URLs

Kamruz Jaman has written an alternative version of the code in this post.