Dedicated image server in Sitecore – part 2

I have earlier blogged about setting up a dedicated image server for Sitecore here. The implementation required some customization of the LinkProvider, but from Sitecore 6.3 it is possible to achieve the same just using configuration.

A dedicated media server can be used to serve media on a separate URL, so that you can optimize performance due to limitations on the amount of connections the browser can open to the domain, which I also described in the last post. Further it can be used to optimize your server for handling media and supporting a CDN.

As mentioned it is now possible to have a separate server for media by configuration. In this scenario you might have a dedicated publishing target for media, and this should just be a standard Sitecore installation, which then only handles media by directing the DNS for your image URL to this server. For instance I have setup three Sitecore installations – one as a content manager server, one as the dedicated image server and one as content delivery server. The domain for my content management server is http://sitecore63/, the one for my image server is http://sitecore63imageserver/ and the content delivery server has the domain http://sitecore63cd/. What I then want to achieve is:

  1. That my content management server serves images in the normal way, so that the editors can insert, alter and preview images without having to publish.
  2. On my content delivery servers I want my images to be served by the image server.

The first goal is easy. I just leave every configuration to the standard. This means that the content management server serves media itself.
The second goal requires some configuration changes. On the content delivery servers I must make all media references point to the image server. This can be done by setting the Media.MediaLinkPrefix in the web.config to the domain name of the image server plus the media prefix I want. For instance like this:

<setting name=”Media.MediaLinkPrefix” value=”http://sitecore63imageserver/~/media/” />

This means that all media items on the content delivery server(s) will be prefixed with the above url:

<img width=”128″ height=”128″ alt=”" src=”http://sitecore63imageserver/~/media/Images/user.ashx?w=128&amp;h=128&amp;as=1“>

So in this scenario the html is delivered by http://sitecore63cd and the image is served by http://sitecore63imageserver. In the content management environment, the media is served locally. Now this is a lot easier, than the customization you needed in older versions of Sitecore.

dedicated image server in sitecore

If you want another media prefix then the standard /~/media, you also need to make the media server handle these requests as media. This is done by adding the prefix to the <mediaPrefix> element in the web.config and add the prefix to the customHandlers section in the web.config.

Hidden functionality: The XPath Builder

Maybe it is just me, who never found it, but after a support request I was made aware of a feature in Sitecore, I have missed for years. I thought I’d just share it, if there were any others like me, who wasn’t aware of the great Sitecore application XPath Builder.

Whenever I create a template which uses a Sitecore Query to populate a multilist field with items, I always spend a little too much time figuring out the correct syntax for the Sitecore Query. But no longer… now that I have found the XPath Builder!

The XPath Builder is triggered in the Developer Center through the Tools menu. The application helps you build the right query and shows the result of the query:

Sitecore XPath Builder

The XPath Builder in Sitecore

Yay! What a great tool!

What TODO – in Sitecore 6.3

Sitecore 6.3 was recently released. A lot of great improvements was included in this version – for instance support for multiple backend web servers and Windows Azure support. Read more about it in Marco Tana’s blog post and read more about the event queue on Adam Conn’s blog.

Besides these new features  a huge amount of bug fixes has been included. Some of them quite important; some of them quite funny…

Almost two years ago I wrote the blog post What TODO? In short I described the somewhat weird way of marking TODO’s by the kernel develepors at Sitecore. Well guess what… In Sitecore 6.3 they thought they would finaly do something about it. It has now…. been marked as obsolete (?!).

“The Sitecore.Todo class has been marked as obsolete. (323278)”

I must admit it made me laugh again. Why not just remove it, when you decide to do something about it?

5 worst code smells in Sitecore

Last year I wrote a blog post on doing code reviews on Sitecore. While dealing with conceptual issues it doesn’t really say what to look for in code, when evaluating the quality or to evaluate if any wrong decisions have been made when developing the solution.  Therefore I wanted to wright a post about the top 5 code smells I look for, when I am trying to evaluate or trouble shoot a Sitecore solution. These code smells often indicate that the developers have gone down the wrong road and there is something fundamentally wrong in the architecture.

So here goes:

1) Using the descendant axes in xpath expressions:

This one I especially look for, when I am looking at a site which doesn’t perform. It may be used in some cases, but very often I find that a developer has used this to generate relations between content and thereby iterating over the complete content tree. This works fine in development where there are a few hundred content items. However in production where thousands of items are created this simply breaks the performance of the site – especially after publishing which clears the HTML cache.

The descendants can be accessed in multiple ways. In XSLT it normally looks like this:
<xsl:for-each select=”$sc_currentitem/descendant-or-self::item”>
</xsl:for-each>Or like this:
<xsl:for-each select=”$sc_currentitem//item”>
</xsl:for-each>

In C# using the Sitecore API it is most often accessed like this:

item.Axes.GetDescendants()

2) XSLT Import:

This is related to a point in the post about code reviews on Sitecore, where I argue that overcomplicated functionality and logic in XSLTs, is a problem. If you need to do an import of another XSLT in your rendering, you probably do it to call a method or in another way try and reuse functionality. If you want to do this, don’t use XSLTs! XSLTs should only – if you really want to use them at all – hold simple presentations.
XSLTs can be imported like this:

<xsl:import href=”YourOtherXslt.xslt” />

3) Explicitly accessing the master database

Although this can be needed in some cases and especially in scheduled tasks or other backend functionality, most often it is because the developer haven’t thought the issue through or if they have made a wrong decision of how data entered by the user should be stored.
The master database is most often used like this in c#

Database masterDatabase = Database.GetDatabase(“master”);

If you see a call like this, your solution is most likely not ready for staging, where there is no access to the master database from the content delivery server. You should generally use something like this:

Database database = Sitecore.Context.Database ?? Sitecore.Context.ContentDatabase;

This means you operate on the context database and on the content delivery databases this will be the web database and in preview you will use the master database, which is probably what you want.
If you need to write something to the master database, you first need to decide if this is a good idea at all. If you write to the master database from your frontend, you make yourself vulnerable for malicious attack, as people from the outside can enter data into the master database. Normally we handle user entered data in a custom database or parse all user created content to ensure that the master database won’t be flooded with data.
If you really need to access the master database, you should create a service that handles access to the master database, so that your solution is ready for staged environments.

4) XSLT extension over 100 lines of code:

This point is closely related to point number 2, but I really see a lot of issues with XSLTs which have been used wrong and this creates a lot of issues later on in the project lifecycle. If you have an XSLT extension with over a hundred lines of code, chances are, that the developer made a wrong decision when creating the XSLT; instead he/she should have created a sublayout.
XSLT extensions are functional programming and you should use an object oriented approach, so my advice is not to use XSLTs at all, unless you are doing something really simple. Otherwise your solution will become harder and harder to maintain, as utility methods like the XSLT extensions are very hard to understand and couplings in the renderings themself are difficult to map.
You can find all registered XSLT extensions in the web.config under xslExtension and it looks something like this:

<xslExtensions>
  <extension mode=”on” type=”MyNamespace.XslHelper, MyAssembly” namespace=”http://www.example.net/helper” singleInstance=”true”/>

5) Clearing of cache

This may seem obvious to some, but I have seen surprisingly many methods, which cleared the entire Sitecore cache including the data and item cache. If you see this in your code and it is needed, chances are, that you have an architectural issue in your solution. It should most rarely be needed to clear the cache, unless it is after a publish and Sitecore handles this automatically. I have heard many reasons for clearing the cache including an issue, where the developer argued that it was necessary, as he wrote directly to the Sitecore database with a SQL statement, and then Sitecores cache wasn’t updated.
If you experience something like this, the issue is of cause not the cache, but that you don’t use the Sitecore API to read and write data from the database.
The caching is most often cleared like this:

CacheManager.ClearAllCaches();

This is just some of the things I have seen many times, but I know there are many other bad code smells when developing Sitecore solution. Do you have any other code smells to share?

Sitecore Lucene index viewer 1.2 released

Last week I was contacted by the Sitecore Shared Source coordinator Jimmie Overby and he told me that a guy in the Sitecore Ukraine office had made some changes to the Sitecore Lucene IndexViewer, I released some time ago. At first I was a bit sceptic about it, so I asked to review the code changes and see the new functionality. So I installed the new package and looked through the code, and I discovered that the Sitecore Ukranian Vyacheslav Tronko (don’t ask me to pronounce it:)) had done a fantastic job!

Not only had he added some very nice features but he had rewritten some of the code so the application is more stable and extendable. So I really appreciate his work!

The new features are:

  • Possibility to browse indexes defined in the new search section in the web.config
  • Rebuild indexes through the IndexViewer (which is really nice, as you can’t rebuild the new type of indexes any other way)
  • Optimize indexes.
  • Improved usability and look-and-feel of the application

So go take a look at the changes at http://trac.sitecore.net/IndexViewer 

If you want to see some screenshots of the new features, take a look here

What I would like to add next, is extended search functionality.

Sitecore podcast launched

Over at Learn Sitecore my coworker Jimmi Lyhne Andersen and I have just launched a brand new Sitecore podcast. Hopefully you will all like it and hopefully it will bring more article contributors to Learn Sitecore.

In the first episode we interview Sitecore CTO John West. We talk about Dreamcore and Sitecore in general. Check it out at: http://learnsitecore.cmsuniverse.net/en/podcasts.aspx

Enjoy!

Will Sitecore OMS end strictly hierarchical sites?

Since the release of Sitecore OMS there have been much hype about the product and many people have speculated on what impact it will have on the CMS industry. However – in my opinion – the product has been more or less misunderstood or at least underestimated. Partly the product has been considered as a (somewhat limited) analytics tool and partly as a marketing tool allowing multivariate testing, conditional renderings etc. This is the result of several factors; one of them being that people review the product, without having understood the concept; another being that the product has been marketed for marketers. Therefore attention has been given on presentation and conditional renderings, when discussing personalization or behavioral targeting. Every demo I have seen have revolved around presentations and how they can vary depending on the profile of the website user or how a non technical marketer can act upon analytics data directly without the developer being involved.

To be honest I think this is somewhat unambitious considering the power of the product. Sitecore has always excellented in focusing on keeping presentation as layer on top of the content. The demos I have seen have focused on the presentation layer strictly. This is all fine and I find the options given to the non-technical personal impressive, but in my opinion the product has a lot more to offer than that. I think that we in the future will see the analytics, personalization etc. applied to content as well as the presentation.

The past – hierarchical sites

So what do I mean by this? Well, most of the sites I work on, the real resource on the site is good and relevant content. The issue is then how do we present and structure this best for the user of the site? (Well most of the time this is the issue; I have actually worked with clients who think the most important job is to structure the content well for the content authors.) Anyway the focus on the information architecture and structure has always been inwards and out and how the content authors can deliver the content in the right structure.

This is the case of a hierarchical site. In these types of sites content authors try to force the content into a hierarchy, which dimensions often are dictated by the design of the navigation parts on the frontend. However content is rarely strictly hierarchical. Content is often more relational and is relevant in multiple places in the hierarchy and should be presented several places. If this doesn’t happen automatically the content authors will have a lot of maintenance and will have to consider where content should be presented on the frontend each time they create it. Further the hierarchy often resembles something that the website users don’t know anything about – such as the organizational structure of the company. So hierarchical sites tend to be hard to maintain and the users of the website are more often than not, going to have problems finding the content they are looking for. (A great example is the SDN. Who can find something on that site without using the search functionality?)

The present – Taxonomy driven websites

To accommodate this issue we have seen more and more request for taxonomy driven sites and have implemented them with success. In these types of sites the navigation of the site is controlled by tags. Content is created centrally by the content authors and they will have to tag the content with terms indicating what the content is about. The content is then automatically shown relevant and often multiple places on the site, depending on how the hierarchy of the tags is structured.

The limitation of this is still that the approach is inwards and out. The content authors and the information architects still have to create the hierarchy based on the presumptions about the users.

The future – behavior driven websites

And this is where OMS comes into the picture. Imagine if we could construct sites, which are completely outwards and in. The information shown on the websites is structured by how the users behave and not by what the content authors think is a good structure. The information OMS has gathered about the each user can be used and he will automatically be presented for relevant content based on previous experiences with similar profiles. Further imagine that the web analysts and marketers can create goals in OMS and then the content structure on the site will change, so that it will present the structure which gives the most conversions based on the behavior of other users or profiles, which have made a conversion. I call this a behavior-driven website and I would love to be given the opportunity to implement it.

So what is the difference from what we have already heard about the product? In my opinion personalization, behavioral targeting etc. should be based at content level and not only at presentation level. For instance take the functionality for A/B testing. This is based on different presentations, but in a behavior-driven website the A/B testing is applied to the content – in Sitecore that would be an Item. Imagine creating two versions of an item and then test the different content to see which brings most conversions. Imagine that the hierarchy of the site isn’t based on organization structure or tags defined by the authors, but from whatever structure bring most conversions for the profile the user matches. This can’t be handled in presentations – only at content level. Imagine that content authors never need to worry about where content should be placed, it is automatically placed where analytics from real users says is the best place. Now this would really be cool and I am sure it can be achieved with OMS and that is why having live analytics data together with your CMS is going to change how we build sites… Or so I hope :)

Does this make sense outside my brain?

Publishing strategies

Publishing can be quite an issue for content authors and make an impact on performance. However if you use the right strategy, there really isn’t any issues.

One of the issues with publishing can be performance. When you publish one or more items the HTML cache is cleared, which means that if you have some performance heavy presentations, it will increase the loading time of the page, until the cache has been rebuild. Further if you use the staging module and it is configured to it, all the data caches are cleared as well. This can cause the site to take up to 30 seconds to reload your pages – especially if they are data heavy. You can read more about Sitecore Caching here and read about the issues with cache and the staging module here.

Note that Sitecore recently released a staging module which supports partial item caching, but the HTML cache is still cleared. You can read more about the new module on Alex Shybas blog.

I don’t know if performance impact when publishing is specific to Sitecore or is generally a problem in most CMS’, but I reckon it is the later. However Sitecore has a special problem with regards to caching. As Sitecore isn’t page based as many other CMS’ (this is also one of its biggest advantages), it makes it difficult to have a partial cache clearing – especially with regards to HTML cache. A page can be constructed by several content items, and there are no immediate binding between a page and the items, which it is constructed of, so all HTML caches needs to be cleared even though you just publish one item.

Another issue is the usability for content authors. Some of our clients were complaining about being queued all the time when they initiated a publish and it could take a very long time before they got their content published. What often happened was that when they initiated a publish, the modal dialog reported, that they were queued. Instead of waiting for it, they closed the window and continued editing. A few minutes later they would initiate another publish, but was once again told that they were queued. When this process repeated itself for the 50 active content authors, you can imagine the queue could get quite big.

Another thing we stumbled upon was, that some content authors used publishing instead of preview. Every time they wanted to view their work in progress to check if it looked alright frontend-wise, they published the item and viewed the page as if it was already completed and ready for actual publish.
I know we are not the only Sitecore partner experiencing this, as I have talked to other implementation partners and received questions about it on Learn Sitecore.

So why do I suggest it isn’t a problem? Well… we aren’t really experience any issues with our clients or with performance issues caused by publishing anymore. This is a result of more than one thing, but the primary solution was to use scheduled publish.

I don’t know why, but using this publishing strategy this can be a really big issue for clients. We often heard remarks like: “We need to publish content immediately”… But common! I really haven’t worked with many organizations, where this is an actual issue. What content do these organizations have, which is so important, that they can’t wait 15 minutes (at tops) to be published? I have consulted clients with huge sites, which is absolutely critical to the business and the funny part is that these companies rarely have a need to have immediate publishing. They are normally used to having a longer waiting period for the rest of their web services due to replications tasks between 100s of servers and rebuilding of caching. So why do the smaller companies have a bigger need for immediate publishing? The short answer – they don’t!

So what do we do to convince our client to use scheduled publish? We do more than one thing: First of all we explain to them, what impact publishing has. Then we allow a single or a few admins to have publish rights, so that they can have immediate publishing. Further we set the “do not publish” mark on all newly created items, so that they won’t be accidently published while they’re not finished. The authors will have to unclick the setting, when they think the item is ready for publishing.

Finally we have developed something, which in my opinion is rather cool! We acknowledged that people respond best to waiting time, if they have an indication of when the waiting will come to an end. This fact is also one of the reasons why they have timers on traffic lights in Copenhagen:

Traffic light with timer

7 seconds to green light :)

We decided to use the same strategy for scheduled publishing. If the content authors would know, when the next scheduled publish is due, they would be much more satisfied with the idea of not having immediate publishing. So we – in Pentia – developed a timer which is shown in the task bar in Sitecore: 

scheduled publish timer

The scheduled publish timer in action

The “Trinvis udgivelse” is Danish and mean incremental publish. So this timer tells the content author, that there is 12 minutes and 48 seconds to the next incremental publish. The functionality also supports multiple timers, so on some solutions we also use it to indicate, when an import from an external database is due and so forth. Cool, right?

These things combined we really don’t have any issues, with convincing our clients to use scheduled publishing.

Dedicated image server and Sitecore

A lot of sites which have a large amount of visitors use a dedicated image server.  There are several reasons for this, but mostly it is because that most browsers only allow a few simultaneous connections to the same domain, which stalls the download of a page that has more than two resources. For instance imagine you have a page consisting of multiple images, stylesheets and javascript files. As the clients only allows a few simultaneous downloads, the browser won’t download all the resources at the same time, but wait for each download to complete and first then start the next download.

The reason of this behavior is that the HTTP standard says that only two simultaneous connections between a client and a server should be allowed. This was specified to save servers from heavy IO load, when files where requested. Browsers like IE7 and Firefox 2 lives up to this standard and only allow 2 connections, while new browsers like IE8 and Firefox 3 allows 6 connections.
To ensure that the HTML can be fully loaded independently and without waiting for other downloads and to remove the load from the primary web server; a dedicated image server can be used. This server holds all files and uses a different domain (for instance images.mydomain.com). All files used on the page can then be fully referenced and thereby be downloaded from a different domain. Eg <img src=”http://images.mydomain.com/image1.jpg alt=”image1”/>. As the images are now on a different domain, these can be requested independently of the server providing the HTML and more simultaneous connections can be opened.

But how is this possible in Sitecore where the media library controls all the images?

One of the easiest ways is to have another publishing target for the media. In that way you will have two frontend servers: One serving the normal requests and one serving all media items. You can easily set up another publishing target and for instance use the staging module to clear the cache. Read more on SDN if you want to know how to set it up.

The remaining problem is: How do we ensure that all media items gets prefixed with another domain? Unfortunately there isn’t a simple setting for this (it will probably come in a future a release of Sitecore – I hope :) ). However there is a simple solution to the problem as links are expanded by the LinkProvider, which can be overridden. More precisely the links are expanded in the LinkProvider.ExpandDynamicLink method, so we need to override this replacing the media path:

namespace TestApplication
{
  public class CustomLinkProvider : LinkProvider
  {
    public const string MEDIA_PATH = “~/media/”;

    public override string ExpandDynamicLinks(string text, bool resolveSites)
    {
      string baseExpands = base.ExpandDynamicLinks(text, resolveSites);
      return baseExpands.Replace(MEDIA_PATH, “http://images.pentia.dk/” + MEDIA_PATH);
    }
  }
}

Here I override the ExpandDynamicLinks in the inherited class CustomLinkProvider. I replace all media paths (identified by the prefix ~/media/) and replace it with the full path to the image server. The remaining thing to do is to replace the LinkProvider type in the web.config:

<linkManager defaultProvider=”sitecore”>
  <providers>
    <clear />
    <add name=”sitecore” type=”TestApplication.CustomLinkProvider, TestApplication” addAspxExtension=”true” alwaysIncludeServerUrl=”false” encodeNames=”true” languageEmbedding=”asNeeded” languageLocation=”filePath” shortenUrls=”true” useDisplayName=”false” />
  </providers>
</linkManager>

Here I just point the LinkProvider type to my own implementation.

Now that this is set up, all requests for media items will be handled by the image server. This does not just allow faster load times for the client, but will put less pressure on the main server. Further you can tune the media server to handling media items by increasing the media cache, setting the prefetch cache etc.

Enjoy your new Sitecore media server :)

Why can’t I download the latest version of Sitecore?

I have earlier questioned Sitecores way of releasing new versions. You can read the full discussion here: http://sdn.sitecore.net/SDN5/Forum/ShowPost.aspx?PostID=17750. The whole notion of Sitecore having “recommended” and “not recommended” releases are to me rather strange. Especially considering the frequency of when new versions gets to the “recommended” level. For instance it is at the moment not possible to use a recommended version of Sitecore which supports Sitecore OMS!

In Pentia we finally got to that point, where we just decided to ignore the recommended label and upgrade and use the latest released version, as it seems rather arbitrary whether a version is recommended or not. This means that I wanted to download the latest version of Sitecore… But guess what. That is not possible. You have to download the release 090630; install it and then afterward download and install two updates.

So I couldn’t help but wonder… Why can’t I download a package containing the latest release of Sitecore?