Overcoming message size limits on the Windows Azure Platform with NServiceBus

When using any of the windows azure queuing mechanisms for communication between your web and worker roles, you will quickly run into their size limits for certain common online use cases.

Consider for example a very traditional use case, where you allow your users to upload a picture and you want to resize it into various thumbnail formats for use throughout your website. You do not want the resizing to be done on the web role as this implies that the user will be waiting for the result if you do it synchronously, or that the web role will be using its resources for doing something else than serving users if you do it asynchronously. So you most likely want to offload this workload to a worker role, allowing the web role to happily continue to serve customers.

Sending this image as part of a message through the traditional queuing mechanism to a worker is not easy to do. For example it is not easily implemented by means of queue storage as this mechanism is limited to 8K messages, neither is it by means of AppFabric queues as they can only handle messages up to 256K messages, and as you know image sizes far outreach these numbers.

To work your way around these limitations you could perform the following steps:

  1. Upload the image to blob storage
  2. Send a message, passing in the images Uri and metadata to the workers requesting a resize.
  3. The workers download the image blob based on the provided Uri and performs the resizing operation
  4. When all workers are done, cleanup the original blob

This all sounds pretty straight forward, until you try to do it, then you run into quite a lot of issues. Among others:

To avoid long latency and timeouts you want to upload and/or download very large images in parallel blocks. But how many blocks should you upload at once and how large should these blocks be? How do you maintain byte order while uploading in parallel? What if uploading one of the blocks fails?

To avoid paying too much for storage you want to remove the large images. But when do you remove the original blob? How do you actually know that all workers have successfully processed the resizing request? Turns out you actually can’t know this, for example due to the built in retry mechanisms in the queue the message may reappear at a later time.

Now the good news is, I’ve gone through the ordeal of solving these questions and implemented this capability for windows azure into NServiceBus. It is known as the databus and on windows azure it uses blob storage to store the images or other large properties (FYI on-premises it is implemented using a file share).

How to use the databus

When using the regular azure host, the databus is not enabled by default. In order to turn it on you need to request custom initialization and call the AzureDataBus method on the configuration instance.

internal class SetupDataBus : IWantCustomInitialization
{
     public void Init()
     {
         Configure.Instance.AzureDataBus();
     }
}

As the databus is implemented using blob storage, you do need to provide a connection string to your storage account in order to make it work properly (it will point to development storage if you do not provide this setting)

<Setting name="AzureDatabusConfig.ConnectionString" value="DefaultEndpointsProtocol=https;AccountName={yourAccountName};AccountKey={yourAccountKey} />

Alright, now the databus has been set up, using it is pretty simple. All you need to do is specify which of the message properties are too large to be sent in a regular message, this is done by wrapping the property type by the DataBusProperty<T> type. Every property of this type will be serialized independently and stored as a BlockBlob in blob storage.

Furthermore you need to specify how long the associated blobs are allowed to stay alive in blob storage. As I said before there is no way of knowing when all the workers are done processing the messages, therefore the best approach to not flooding your storage account is providing a background cleanup task that will remove the blobs after a certain time frame. This time frame is specified using the TimeToBeReceived attribute which must be specified on every message that exposes databus properties.

In my image resizing example, I created an ImageUploaded event that has an Image property of type DataBusProperty<byte[]> which contains the bytes of the uploaded image. Furthermore it contains some metadata like the original filename and content type. The TimeToBeReceived value has been set to an hour, assuming that the message will be processed within an hour.

[TimeToBeReceived("01:00:00")]
public class ImageUploaded : IMessage
{
    public Guid Id { get; set; }
    public string FileName { get; set; }
    public string ContentType { get; set; }
    public DataBusProperty<byte[]> Image { get; set; }
}

That’s it, besides this configuration there is no difference with a regular message handler. It will appear as if the message has been sent to the worker as a whole, all the complexity regarding sending the image through blob storage is completely hidden for you.

public class CreateSmallThumbnail : IHandleMessages<ImageUploaded>
{
    private readonly IBus bus;

    public CreateSmallThumbnail(IBus bus)
    {
        this.bus = bus;
    }

    public void Handle(ImageUploaded message)
    {
        var thumb = new ThumbNailCreator().CreateThumbnail(message.Image.Value, 50, 50);

        var uri = new ThumbNailStore().Store(thumb, "small-" + message.FileName, message.ContentType);

        var response = bus.CreateInstance<ThumbNailCreated>(x => { x.ThumbNailUrl = uri; x.Size = Size.Small; });
        bus.Reply(response);
    }
}

Controlling the databus

In order to control the behavior of the databus, I’ve provided you with some optional configuration settings.

AzureDataBusConfig.BlockSize allows you to control the size in bytes of each uploaded block. The default setting is 4MB, which is also the maximum value.

AzureDataBusConfig.NumberOfIOThreads allows you to set the number of threads that will upload blocks in parallel. The default is 5

AzureDataBusConfig.MaxRetries allows you to specify how many times the databus will try to upload a block before giving up and failing the send. The default is 5 times.

AzureDataBusConfig.Container specifies the container in blob storage to use for storing the message parts, by default this container is named ‘databus’, note that it will be created automatically for you.

AzureDataBusConfig.BasePath allows you to add a base path to each blob in the container, by default there is no basepath and all blobs will be put directly in the container. Note that using paths in blob storage is purely a naming convention that is being adhered to, it has no other effects as blob storage is actually a pretty flat store.

Wrapup

With the databus, your messages are no longer limited in size. At least the limit has become so big that you probably don’t care about it anymore, in theory you can use 200GB per property and 100TB per message. The real limit however has now become the amount of memory available on either the machine generating or receiving the message, you cannot exceed those amounts for the time being… Furthermore you need to keep latency in mind, uploading a multi mega or gigabyte file takes a while, even from the cloud.

That’s it for today, please give the databus a try and let me know if you encounter any issues. You can find the sample used in this article in the samples repository, it’s named AzureThumbnailCreator and shows how you can create thumbnails of various sizes (small, medium, large) from a single uploaded image using background workers.

Have fun with it…

Advertisements

3 Responses to Overcoming message size limits on the Windows Azure Platform with NServiceBus

  1. Pingback: Windows Azure and Cloud Computing Posts for 6/27/2011+ - Windows Azure Blog

  2. Yves, do you actually check what the size of DataBusProperty is? If I want to have one message class that might or might not have a large amount of data in DataBusProperty, I would like that small messages are just sent over and big messages are sent using the DataBus. Is this possible or I need two different classes?

    • Hi Alexey,

      No, we do not check that, when you use the databus property that property will always be sent through the databus implementation even if it is only 1 character in size.
      So you would need 2 different classes to do so.

      Kind regards,
      Yves

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: