Hosting options for NServiceBus on Azure – Shared Hosting

Yesterday, I discussed the dedicated hosting model for NServiceBus on Windows Azure. Today I would like to introduce the second model, which allows you to host multiple processes in the same role.

In order to set up shared hosting, you start by creating a dedicated Azure role that represents the host’s controller.

public class Host : RoleEntryPoint{}

The NServiceBus role that you need to specify is a special one, called AsA_Host.

public class EndpointConfiguration : IConfigureThisEndpoint, AsA_Host { }

This role will not start a UnicastBus; instead it will load other roles from Azure blob storage and spin up child processes to host them.

The only profiles that make sense to specify in this case are Production and Development, which control where the logging output is sent. The other behaviors belong strictly to the UnicastBus and its parts, so they cannot be used in this context. Besides these profiles, you can also set a few additional configuration settings that control the behavior of the host in more detail (see the example after the list below).

  • DynamicHostControllerConfig.ConnectionString – specifies the connection string to the storage account containing the roles to load; defaults to development storage.
  • DynamicHostControllerConfig.Container – specifies the name of the container that holds the assemblies of the roles to load, as .zip files; defaults to ‘endpoints’.
  • DynamicHostControllerConfig.LocalResource – specifies the name of the local resource folder on the Windows Azure instance where the assemblies of the hosted roles are dropped; defaults to a folder called ‘endpoints’.
  • DynamicHostControllerConfig.RecycleRoleOnError – specifies how the host should behave when an error occurs in one of the child processes; by default it will not recycle.
  • DynamicHostControllerConfig.AutoUpdate – specifies whether the host should poll the storage account for changes to its child processes; defaults to false.
  • DynamicHostControllerConfig.UpdateInterval – specifies how often the host should check the storage account for updates, expressed in milliseconds; defaults to every 10 minutes.
  • DynamicHostControllerConfig.TimeToWaitUntilProcessIsKilled – when there are updates, the host first kills the original process before provisioning the new assemblies and running the updated process. I noticed that it can take a while before a process dies, which can be troublesome when provisioning the new assemblies. This setting specifies how long the host is prepared to wait for the process to terminate before it requests a recycle.
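For reference, these settings live in the host role’s service configuration file like any other setting; a sketch with example values (the setting names come from the list above, the values are placeholders):

<Setting name="DynamicHostControllerConfig.ConnectionString" value="DefaultEndpointsProtocol=https;AccountName={account};AccountKey={key}" />
<Setting name="DynamicHostControllerConfig.Container" value="endpoints" />
<Setting name="DynamicHostControllerConfig.AutoUpdate" value="true" />
<Setting name="DynamicHostControllerConfig.UpdateInterval" value="600000" />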

Changes to the hosted roles

Any dedicated role can be used as a child process given a few minor changes.

The first change is that all configuration has to be specified in the app.config file. This is intentional: I have removed the ability for the child processes to read from the service configuration file. Why? Otherwise every child process would be configured the same way, as they would share configuration settings: same storage account, same queue, same everything… and that’s not what you want. Hence the only option is to limit yourself to the role’s app.config. Note that the RoleEnvironment is available from the child processes, so if you want to read the configuration settings from it you can specify the AzureConfigurationSource yourself, but I advise against it.
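To make that concrete, each child process ships its own app.config carrying its own transport settings; a sketch (the AzureQueueConfig section name, type and attributes shown here are assumptions on my part, use whatever config sections your transport actually defines):

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <configSections>
    <section name="AzureQueueConfig" type="NServiceBus.Config.AzureQueueConfig, NServiceBus.Azure" />
  </configSections>
  <AzureQueueConfig QueueName="childendpoint" ConnectionString="DefaultEndpointsProtocol=https;AccountName={account};AccountKey={key}" />
</configuration>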

The second change is that you need to add a reference to NServiceBus.Hosting.Azure.HostProcess.exe to the worker’s project. This process will be started by the host, and your role will run inside its process space. Why does NServiceBus force you to specify the host process, can’t it do so itself? Well, it could, but that could get you into trouble in the long run. If it decided on the host process for you, you would be tied to a specific version of NServiceBus, and future upgrades might become challenging. When you specify it yourself, you can continue to run any version, or customize it if you like; only the name of the process matters to the host.

The provided host process is a .NET 4.0 process, but it still uses some older assemblies, so you need to add an NServiceBus.Hosting.Azure.HostProcess.exe.config file to your project which specifies the useLegacyV2RuntimeActivationPolicy attribute:

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
    <startup useLegacyV2RuntimeActivationPolicy="true">
      <supportedRuntime version="v4.0"/>
      <requiredRuntime version="v4.0.20506"/>
    </startup>
</configuration>

The final step is to put the output of the role into a zip file and upload that zip file to the container being monitored by the host.
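If you want to script that last step, a minimal sketch using the Windows Azure storage client (the v1.x API) could look like this; file and account values are placeholders, and ‘endpoints’ matches the host’s default container:

// upload the zipped role output to the container monitored by the host
var account = CloudStorageAccount.Parse(
    "DefaultEndpointsProtocol=https;AccountName={account};AccountKey={key}");
var client = account.CreateCloudBlobClient();

var container = client.GetContainerReference("endpoints");
container.CreateIfNotExist();

var blob = container.GetBlobReference("MyWorker.zip");
blob.UploadFile(@"c:\build\MyWorker.zip");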

If the host process is running and its AutoUpdate setting is true, it will automatically pick up your zip file, extract it and run it in a child process. If the setting is false, you will have to recycle the instances yourself.

Up until now I’ve only discussed worker roles and how to host message handlers inside them. Next time I’ll take a closer look at hosting in a web role and show you how you can create a so-called webworker, where both a website and a worker are hosted in the same Windows Azure instance.

See you next time.

Hosting options for NServiceBus on Azure – Dedicated Hosting

One of the main concerns people have regarding the Windows Azure platform is cost, especially hosting costs, as those make up the majority of the equation.

This is mostly due to a discrepancy between the model imposed by Visual Studio, where individual projects map to roles, and the way a lot of people develop their solutions, using many small projects. NServiceBus developers in particular follow the guidance to build their solutions using autonomous components: individual, standalone processes per message type. Can you imagine having a role, with a minimum of 2 instances, per message type?

I can… for certain types of apps. I deliberately use the short term ‘app’ to represent a system that offers a limited number of features to a large number of users. In that case it does make sense to use a dedicated role for individual message types.

But for traditional applications, like most of us create on-premises today, it doesn’t make sense. Often these applications offer a lot of features to a limited set of (enterprise) users. For this type of application, Visual Studio for Azure imposes a very costly model on you.

The good news is, NServiceBus gives you a solution for both models… and it allows you to switch between them when required. Very interesting if you want to save a few euros when starting up a new online service, but still want to be able to spread out when traffic picks up.

Dedicated hosting

The first model that I would like to discuss is the dedicated host, which I have already used quite a lot in past samples, so you’re probably familiar with it. This one is best suited for ‘apps’ that have already picked up some traffic. Every message handler, or small group of handlers, is hosted in a dedicated role.

The first thing to do is inherit from NServiceBus’s RoleEntryPoint.

public class Host : RoleEntryPoint{}

And add a class that specifies the role that this endpoint needs to perform.

public class EndpointConfiguration : IConfigureThisEndpoint, AsA_Worker { }

Furthermore, you need to specify, through profiles in the service configuration file, how the role should behave.

NServiceBus.Production NServiceBus.OnAzureTableStorage NServiceBus.WithAzureStorageQueues

Depending on this behavior you may need to specify some additional configuration settings as well.
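For the Azure host, these profiles typically end up as a setting in the service configuration file; a sketch (the exact setting name, assumed here to be AzureProfileConfig.Profiles, may vary per version):

<Setting name="AzureProfileConfig.Profiles" value="NServiceBus.Production NServiceBus.OnAzureTableStorage NServiceBus.WithAzureStorageQueues" />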

Roles & Profiles

The dedicated host can perform one of multiple roles, but only one at a time. The roles that are supported today are:

  • AsA_Listener – This role is the most basic of them all; it will accept messages and can reply to them, but that’s it.
  • AsA_Worker – This role is probably the most used; it will accept messages, support sagas and publish events if needed.
  • AsA_TimeoutManager – A specialized role that handles timeout messages sent by sagas.
  • AsA_Host – A specialized role that allows hosting multiple NServiceBus processes; we’ll discuss this one in more depth next time.

Using profiles, you can control which internal implementations are used by the roles to do their magic.

  • Development – Specifies where NServiceBus should log to, in this case the console.
  • Production – Specifies where NServiceBus should log to, in this case Windows Azure table storage.
  • OnAzureTableStorage – Specifies where NServiceBus should store subscriptions and saga state, in this case Azure table storage.
  • OnSqlAzure – Specifies where NServiceBus should store subscriptions and saga state, in this case SQL Azure.
  • WithAzureStorageQueues – Specifies which communication mechanism to use, in this case Azure storage queues.
  • WithAppFabricQueues – Specifies which communication mechanism to use, in this case AppFabric queues.

Next time we’ll take a closer look at the second hosting model, shared hosting…

Stay tuned

AppFabric queue support for NServiceBus

Last week at the //BUILD conference, Microsoft announced the public availability of the AppFabric Queues, Topics and Subscriptions. It’s a release that I have been looking forward to for quite some time now, as the queues especially are a very valuable addition to the NServiceBus toolset.

These new queues have several advantages over azure storage queues:

  • Maximum message size is up to 256K, in contrast to 8K
  • Throughput is a lot higher, as these queues are not throttled on the number of messages per second
  • Support for TCP, for lower latency
  • You can enable exactly-once delivery semantics
  • For the time being they are free!

For a complete comparison between AppFabric queues and Azure storage queues, this blog post offers a comprehensive overview: http://preps2.wordpress.com/2011/09/17/comparison-of-windows-azure-storage-queues-and-service-bus-queues/

How to configure NServiceBus to use AppFabric queues

As always, we try to make it as simple as possible. If you want to initialize the bus manually using the Configure class, you can just call the following extension method.

.AppFabricMessageQueue()
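In context, that call slots into the usual fluent configuration chain; a sketch (the surrounding calls are illustrative assumptions, only AppFabricMessageQueue() comes from this release):

var bus = Configure.With()
    .DefaultBuilder()
    .AppFabricMessageQueue() // use AppFabric queues as the transport
    .UnicastBus()
    .CreateBus()
    .Start();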

If you use the generic role entry point, you can enable AppFabric queue support by specifying the following profile in the service configuration file:

NServiceBus.WithAppFabricQueues

Besides one of these enablers, you also need to provide configuration for both your service namespace and your issuer key. These can be specified in the AppFabricQueueConfig section and are required, as they cannot be defaulted by NServiceBus: there is no local development equivalent to fall back to.

<Setting name="AppFabricQueueConfig.IssuerKey" value="yourkey" />
<Setting name="AppFabricQueueConfig.ServiceNamespace" value="yournamespace" />

That’s it, you’re good to go… this is all you need to do to make it work.

Additional settings

If you want to further control the behavior of the queue there are a couple more settings:

  • AppFabricQueueConfig.IssuerName – specifies the name of the issuer, defaults to ‘owner’
  • AppFabricQueueConfig.LockDuration – specifies the duration of the message lock in milliseconds, defaults to 30 seconds
  • AppFabricQueueConfig.MaxSizeInMegabytes – specifies the maximum size of the queue, defaults to 1024 (1GB); allowed values are 1, 2, 3, 4 and 5 GB
  • AppFabricQueueConfig.RequiresDuplicateDetection – specifies whether exactly-once delivery is enabled, defaults to false
  • AppFabricQueueConfig.RequiresSession – specifies whether sessions are required, defaults to false (I’m not sure sessions make sense in any NServiceBus use case either)
  • AppFabricQueueConfig.DefaultMessageTimeToLive – specifies the time that a message can stay on the queue, defaults to Int64.MaxValue, which is roughly 10,000 days
  • AppFabricQueueConfig.EnableDeadLetteringOnMessageExpiration – specifies whether messages should be moved to a dead letter queue upon expiration
  • AppFabricQueueConfig.DuplicateDetectionHistoryTimeWindow – specifies the amount of time in milliseconds during which the queue should perform duplicate detection, defaults to 1 minute
  • AppFabricQueueConfig.MaxDeliveryCount – specifies the number of times a message can be delivered before being put on the dead letter queue, defaults to 5
  • AppFabricQueueConfig.EnableBatchedOperations – specifies whether batching is enabled, defaults to false (this may change)
  • AppFabricQueueConfig.QueuePerInstance – specifies whether NServiceBus should create a queue per instance instead of a queue per role, defaults to false
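Put together, a few of these settings in the service configuration file could look like this (example values only; anything omitted falls back to the defaults above):

<Setting name="AppFabricQueueConfig.LockDuration" value="30000" />
<Setting name="AppFabricQueueConfig.RequiresDuplicateDetection" value="true" />
<Setting name="AppFabricQueueConfig.DuplicateDetectionHistoryTimeWindow" value="60000" />
<Setting name="AppFabricQueueConfig.MaxDeliveryCount" value="5" />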

Now it’s up to you

If you want to give NServiceBus on AppFabric queues a try, you can start by running either the fullduplex or pubsub sample and experiment from there.

Any feedback is welcome of course!

Azure Tip – Demystifying Table Service Exceptions

A quick tip for when you’re trying to identify exceptions coming from the Windows Azure storage services. Usually these look something like ‘The remote server returned an error: (400) Bad Request.’

Stating, well… not much, just that you did something bad. Now how do you go about identifying what happened?

First of all, make sure you catch the exception at its origin, even if this is not in your code. You can do this by enabling Visual Studio’s debug-on-throw for System.Net.WebException.

Now you can use your immediate window to extract the response body from the HTTP response by issuing the following command:

new System.IO.StreamReader(((System.Net.WebException) $exception).Response.GetResponseStream()).ReadToEnd()

The response you get back contains the error message indicating what is wrong. In my case one of the values I passed in was out of range (sadly it does not say which one).

"<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"yes\"?>\r\n<error xmlns=\"http://schemas.microsoft.com/ado/2007/08/dataservices/metadata\">\r\n <code>OutOfRangeInput</code>\r\n <message xml:lang=\"en-US\">One of the request inputs is out of range.\nRequestId:ba01ae00-6736-4118-ab7f-2793a8504595\nTime:2011-07-28T12:36:21.2970257Z</message>\r\n</error>"

Obviously there are other errors as well; you can find a list of these error types here: http://msdn.microsoft.com/en-us/library/dd179438.aspx

But the one I got is pretty common; the following docs can help you identify which of the inputs is wrong: http://msdn.microsoft.com/en-us/library/dd179338.aspx
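If you’d rather capture this in code than in the immediate window, a small helper along these lines does the same thing (a sketch, wrap your storage calls with it as needed):

// read the raw error body that the storage service sent back
public static string GetStorageErrorBody(System.Net.WebException exception)
{
    if (exception.Response == null)
        return null;

    using (var stream = exception.Response.GetResponseStream())
    using (var reader = new System.IO.StreamReader(stream))
    {
        return reader.ReadToEnd();
    }
}

Call it from a catch (WebException ex) block and log the result before rethrowing.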

May this post save you some time 🙂

Overcoming message size limits on the Windows Azure Platform with NServiceBus

When using any of the Windows Azure queuing mechanisms for communication between your web and worker roles, you will quickly run into their size limits for certain common online use cases.

Consider for example a very traditional use case, where you allow your users to upload a picture and you want to resize it into various thumbnail formats for use throughout your website. You do not want the resizing to be done on the web role, as this implies that the user will be waiting for the result if you do it synchronously, or that the web role will be using its resources for something other than serving users if you do it asynchronously. So you most likely want to offload this workload to a worker role, allowing the web role to happily continue to serve customers.

Sending this image as part of a message through the traditional queuing mechanisms is not easy to do. It is not easily implemented by means of queue storage, as that mechanism is limited to 8K messages; neither is it by means of AppFabric queues, as they can only handle messages up to 256K. And as you know, image sizes far exceed these numbers.

To work your way around these limitations you could perform the following steps:

  1. Upload the image to blob storage
  2. Send a message, passing the image’s Uri and metadata to the workers, requesting a resize.
  3. The workers download the image blob based on the provided Uri and perform the resizing operation
  4. When all workers are done, clean up the original blob

This all sounds pretty straightforward, until you try to do it; then you run into quite a lot of issues. Among others:

To avoid long latency and timeouts you want to upload and/or download very large images in parallel blocks. But how many blocks should you upload at once and how large should these blocks be? How do you maintain byte order while uploading in parallel? What if uploading one of the blocks fails?

To avoid paying too much for storage you want to remove the large images. But when do you remove the original blob? How do you actually know that all workers have successfully processed the resizing request? It turns out you can’t know this; for example, due to the built-in retry mechanisms of the queue, the message may reappear at a later time.

Now the good news is, I’ve gone through the ordeal of solving these questions and implemented this capability for Windows Azure in NServiceBus. It is known as the databus, and on Windows Azure it uses blob storage to store the images or other large properties (FYI, on-premises it is implemented using a file share).

How to use the databus

When using the regular Azure host, the databus is not enabled by default. In order to turn it on you need to request custom initialization and call the AzureDataBus method on the configuration instance.

internal class SetupDataBus : IWantCustomInitialization
{
    public void Init()
    {
        Configure.Instance.AzureDataBus();
    }
}

As the databus is implemented using blob storage, you do need to provide a connection string to your storage account for it to work properly (it will point to development storage if you do not provide this setting):

<Setting name="AzureDatabusConfig.ConnectionString" value="DefaultEndpointsProtocol=https;AccountName={yourAccountName};AccountKey={yourAccountKey} />

Alright, now that the databus has been set up, using it is pretty simple. All you need to do is specify which of the message properties are too large to be sent in a regular message; this is done by wrapping the property type in the DataBusProperty<T> type. Every property of this type will be serialized independently and stored as a block blob in blob storage.

Furthermore, you need to specify how long the associated blobs are allowed to stay alive in blob storage. As I said before, there is no way of knowing when all the workers are done processing the messages; therefore the best approach to avoid flooding your storage account is a background cleanup task that removes the blobs after a certain time frame. This time frame is specified using the TimeToBeReceived attribute, which must be specified on every message that exposes databus properties.

In my image resizing example, I created an ImageUploaded event that has an Image property of type DataBusProperty<byte[]> which contains the bytes of the uploaded image. Furthermore it contains some metadata like the original filename and content type. The TimeToBeReceived value has been set to an hour, assuming that the message will be processed within an hour.

[TimeToBeReceived("01:00:00")]
public class ImageUploaded : IMessage
{
    public Guid Id { get; set; }
    public string FileName { get; set; }
    public string ContentType { get; set; }
    public DataBusProperty<byte[]> Image { get; set; }
}

That’s it; besides this configuration there is no difference from a regular message handler. It will appear as if the message has been sent to the worker as a whole; all the complexity of sending the image through blob storage is completely hidden from you.

public class CreateSmallThumbnail : IHandleMessages<ImageUploaded>
{
    private readonly IBus bus;

    public CreateSmallThumbnail(IBus bus)
    {
        this.bus = bus;
    }

    public void Handle(ImageUploaded message)
    {
        var thumb = new ThumbNailCreator().CreateThumbnail(message.Image.Value, 50, 50);

        var uri = new ThumbNailStore().Store(thumb, "small-" + message.FileName, message.ContentType);

        var response = bus.CreateInstance<ThumbNailCreated>(x => { x.ThumbNailUrl = uri; x.Size = Size.Small; });
        bus.Reply(response);
    }
}

Controlling the databus

In order to control the behavior of the databus, I’ve provided you with some optional configuration settings.

AzureDataBusConfig.BlockSize allows you to control the size in bytes of each uploaded block. The default setting is 4MB, which is also the maximum value.

AzureDataBusConfig.NumberOfIOThreads allows you to set the number of threads that will upload blocks in parallel. The default is 5.

AzureDataBusConfig.MaxRetries allows you to specify how many times the databus will try to upload a block before giving up and failing the send. The default is 5 times.

AzureDataBusConfig.Container specifies the container in blob storage used for storing the message parts; by default this container is named ‘databus’. Note that it will be created automatically for you.

AzureDataBusConfig.BasePath allows you to add a base path to each blob in the container; by default there is no base path and all blobs are put directly in the container. Note that using paths in blob storage is purely a naming convention; it has no other effects, as blob storage is actually a pretty flat store.
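As with the connection string, these settings go into the service configuration file; a sketch with the defaults written out explicitly (omit a setting to keep its default):

<Setting name="AzureDataBusConfig.BlockSize" value="4194304" />
<Setting name="AzureDataBusConfig.NumberOfIOThreads" value="5" />
<Setting name="AzureDataBusConfig.MaxRetries" value="5" />
<Setting name="AzureDataBusConfig.Container" value="databus" />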

Wrapup

With the databus, your messages are no longer limited in size. Or at least the limit has become so big that you probably don’t care about it anymore: in theory you can use 200GB per property and 100TB per message. The real limit has now become the amount of memory available on the machine generating or receiving the message; you cannot exceed those amounts for the time being… Furthermore, you need to keep latency in mind: uploading a multi-megabyte or gigabyte file takes a while, even from within the cloud.

That’s it for today. Please give the databus a try and let me know if you encounter any issues. You can find the sample used in this article in the samples repository; it’s named AzureThumbnailCreator and shows how you can create thumbnails of various sizes (small, medium, large) from a single uploaded image using background workers.

Have fun with it…

Improving throughput with NServiceBus on Windows Azure

One of the things that has always bothered me personally about the ‘NServiceBus – Azure queue storage’ relationship is throughput: the number of messages that I could transfer from one role to the other per second was rather limited.

This is mainly due to the fact that Windows Azure storage throttles you at the HTTP level: every queue only accepts 500 HTTP requests per second and will queue up the remaining requests. Given that you need 3 requests per message (the sending role performs 1 POST, the receiving role performs 1 GET and 1 DELETE), you can see that throughput is quite limited: the theoretical ceiling is around 166 messages per second, and in practice you often see less than a hundred.

One of the first things that you can do to increase throughput is use the SendMessages() operation on the unicast bus. This operation groups all messages passed into it into 1 single message and sends that across the wire. Mind that queue storage also limits message size to 8KB, so in effect you can achieve a maximum improvement of a factor of 10, given that you have reasonably small messages and use binary formatting.
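As a sketch of what that looks like in code (the message type is a hypothetical placeholder, and the exact SendMessages overloads may differ per NServiceBus version):

// group many small messages into a single 8KB transport message
var messages = new IMessage[]
{
    new PriceUpdated { ProductId = 1, Price = 10.0m },
    new PriceUpdated { ProductId = 2, Price = 12.5m },
    new PriceUpdated { ProductId = 3, Price = 9.99m }
};
unicastBus.SendMessages(messages); // one wire message instead of three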

Secondly, I’ve added support to the queue for reading in batches, using the GetMessages operation on the cloud queue client. By default the queue reads 10 messages at a time, but you can use a new configuration setting called BatchSize to control the number of messages to be read. Mind that the BatchSize setting also influences the MessageInvisibleTime: I multiply that value by the batch size to define how long the messages stay invisible, as overall processing time may now take longer.
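As an illustration, that could look as follows in the service configuration file (the exact setting name and prefix are an assumption on my part):

<Setting name="AzureQueueConfig.BatchSize" value="32" />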

In the future I may consider even more improvements to increase the throughput of queue storage, like using multiple queues at a time to overcome the 500-requests-per-second limit. But as Rinat Abdullin already pointed out to me on Twitter, this might have grave consequences for both overall latency and costs. So before I continue with this improvement I have a question for you: do you think the additional latency and costs are warranted?

But even then, there is another throttle in place at the storage account level, which limits all storage operation requests to 5000 requests per second (this includes table storage and blob storage requests). To work around this limit you can specify a separate connection string for every destination queue, using the following format: “queuename@connectionstring”.
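As an illustration, such a destination can be expressed in the standard endpoint mapping configuration; a sketch (message assembly, queue name and account values are placeholders):

<UnicastBusConfig>
  <MessageEndpointMappings>
    <add Messages="MyMessages" Endpoint="orders@DefaultEndpointsProtocol=https;AccountName={account};AccountKey={key}" />
  </MessageEndpointMappings>
</UnicastBusConfig>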

Building Global Web Applications With the Windows Azure Platform – Dynamic Work Allocation and Scale out

Today I would like to finish the discussion on ‘understanding capacity’ for my ‘Building Global Web Applications With the Windows Azure Platform’ series by talking about the holy grail of cloud capacity management: dynamic work allocation and scale-out.

The basic idea is simple: keep all roles at full utilization before scaling out.

To make optimal use of the capacity that you’re renting from your cloud provider, you could design your system in such a way that it is aware of its own usage patterns and acts upon them. For example, if role 3 is running too many CPU-intensive jobs and role 1 has excess capacity, it could decide to move some CPU-intensive workloads off of role 3 to role 1. The system repeats these steps for all workload types and tries to maintain a balance below 80% overall capacity before deciding to scale out.

It turns out, though, that implementing this is not so straightforward…

First of all, you need to be able to move workloads around at runtime. Every web and worker role needs to be designed in such a way that it can dynamically load workloads from some medium and start executing them, but it also needs to be able to unload them again. In effect, your web or worker role becomes nothing more than an agent that administers the workloads on the machine instead of executing them itself.

In the .NET environment this means that you need to start managing separate appdomains or processes for each workload. Here you can find a sample where I implemented a worker role that can load other workloads dynamically from blob storage into a separate appdomain, in response to a command that you can send from a console application. This sort of proves that moving workloads around is technically possible.
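The core of such an agent boils down to loading a workload assembly into its own appdomain so it can later be unloaded again; a minimal sketch (the IWorkload contract and all names here are hypothetical):

// hypothetical contract shared between the agent and its workloads
public interface IWorkload
{
    void Start();
}

// load a workload into a dedicated appdomain so it can be unloaded later
public static AppDomain StartWorkload(string assemblyPath, string typeName)
{
    var setup = new AppDomainSetup
    {
        ApplicationBase = System.IO.Path.GetDirectoryName(assemblyPath)
    };
    var domain = AppDomain.CreateDomain("workload-" + Guid.NewGuid(), null, setup);

    // the workload type must derive from MarshalByRefObject to cross the boundary
    var workload = (IWorkload)domain.CreateInstanceFromAndUnwrap(assemblyPath, typeName);
    workload.Start();
    return domain;
}

// moving a workload away is then simply unloading its appdomain
public static void StopWorkload(AppDomain domain)
{
    AppDomain.Unload(domain);
}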

Even though it is technically quite feasible to move workloads around, the hardest part is the business logic that decides which workloads should be moved, when, and where to. You need to take quite a few things into account!

  • Every workload consumes a certain amount of CPU, memory and bandwidth, but these metrics cannot be derived from traditional monitoring information, as that only shows overall usage. So you need to define and compute additional metrics for each individual workload in order to know what the impact of moving that specific workload would be.
  • Workloads tend to be rather temporal as well: heavy CPU usage right now does not mean the workload will consume the same amount in 5 seconds. So simply moving workloads around when you detect a problem is not going to cut it.
  • In other words, you need to find ways to accurately predict future usage based on past metrics and user-supplied information.
  • You need to ensure a workload is moved well before it actually starts consuming resources, as moving the workload itself takes time as well.
  • These same problems repeat themselves on the target side, where you would move the workload to, as that role’s utilization is in continuous flux as well.
  • And I’m only touching the tip of the iceberg here; there is much more to it…

Lots of hard work… but in time you will have to go through it. Please keep in mind that this is the way most utility companies make their (enormous amounts of) money: by continuously looking for more accurate ways to use and resell excess capacity.

Alright, now that you understand the concept of capacity and how it can help you keep your costs down, it is time to move on to the next section of this series: how to make your application globally available.