Access Control Service – Why not to rely on the nameidentifier claim

Over the past week, while preparing http://www.goeleven.com/ for a migration to our production account on Windows Azure, I learned an important lesson that I would like to share with you, so that you don’t have to make the same mistake I did.

The web front end outsources authentication to various identity providers using the Windows Azure Access Control Service. All of these identity providers provide a common claim that can be used to identify the user: the nameidentifier claim. The value of the nameidentifier is unique for every user and each application. But in a scenario where there is a man in the middle, such as the Access Control Service, the value of the nameidentifier is actually unique per user per Access Control Service instance (as the identity provider considers that instance to be the application).

This prevents you from performing a number of operations on your Access Control Service namespace, because the value of the nameidentifier changes when you switch to another instance. In other words, you lose your customers.

Things you can no longer provide are:

  • Migration of your namespace
  • Disaster recovery
  • High availability
  • Geographic proximity for travelling users

Therefore it’s better to correlate the user’s information with an identity on another claim, email address for example, which remains stable across different access control service instances.
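As a minimal sketch of that idea, assuming the claims arrive as simple type/value pairs, you could pick the correlation key as shown here. The UserKey class and its FromClaims method are hypothetical names made up for illustration; they are not part of the Access Control Service or WIF. The two claim type URIs are the standard ones.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of ACS or WIF: picks the claim used to correlate a user.
public static class UserKey
{
    // Standard claim type URIs (the same values as ClaimTypes.NameIdentifier and ClaimTypes.Email).
    public const string NameIdentifier = "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/nameidentifier";
    public const string EmailAddress = "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress";

    // Returns the normalized e-mail address when the provider supplied one, otherwise null.
    // The nameidentifier is deliberately never used as a fallback, because its value
    // changes when the access control namespace changes.
    public static string FromClaims(IDictionary<string, string> claims)
    {
        string email;
        if (claims.TryGetValue(EmailAddress, out email) && !string.IsNullOrWhiteSpace(email))
        {
            return email.Trim().ToLowerInvariant();
        }
        return null;
    }
}
```

A null result corresponds to a provider that only supplies the nameidentifier, so such a user cannot be correlated in a stable way.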

The Live ID provider, however, does not provide any additional information besides the nameidentifier, so I’m sad to report that I will have to stop supporting it!

And to make matters worse, I did not save any of the other claims, so now I have to go beg all of my users to help me upgrade their accounts 😦

So if you are a user of http://www.goeleven.com/ , please help me update your account:

  • For LiveId users: Log in with your account, navigate to profile > identities and associate any of the other providers.
  • For non-LiveId users: Just log in with your account, this will automatically fix your identity.

Thanks in advance…

Hosting options for NServiceBus on Azure – Web roles & web workers

Understanding webroles

Up until now I’ve only discussed worker roles, deliberately, as they are relatively straightforward to use in combination with NServiceBus.

Webroles however are a bit different when it comes to NServiceBus hosting, not that it is per se more difficult to do so, but you need to understand how webroles work to avoid weird effects when running NServiceBus inside them.

There are 2 flavors of webroles, Hosted Web Core and Full IIS, and it makes a difference to host NServiceBus depending on which one you use as their process model differs quite a lot.

Web role models

If your webrole uses HWC there will be only 1 process hosting your assemblies, representing the web role. In this scenario there is no difference between a web role and a worker role, and you can simply host an NServiceBus endpoint inside of it the same way.

But in Full IIS mode, which is the default, you need to know that 1 webrole project will result in 2 processes which share the same assemblies. One of the processes is the role host, the other is an IIS website which will actually serve requests. But as they share assemblies, combined with the automagical nature of the NServiceBus hosting model, this would result in 2 processes with identical messagehandlers and configuration, not what you want.

Therefore we only support the hosting model for the role hosts (represented by webrole.cs in a webrole project). For the IIS website you have to configure the bus manually, using the Configure class and its extension methods.

Configuring a role manually has also been shown a lot in previous posts, but for completeness’ sake I’ll show it again.

Configure.WithWeb() // With allows you to control the assemblies to load
    .Log4Net()
    .DefaultBuilder()
    .AzureConfigurationSource()
    .AzureMessageQueue()
            .JsonSerializer()
    .AzureSubcriptionStorage()
    .UnicastBus()
            .LoadMessageHandlers()
            .IsTransactional(true)
    .CreateBus()
    .Start();

Web workers

Now that you know that you can control the behavior of NServiceBus in both processes, you can create what we call ‘web workers’ in azure lingo. These are Windows Azure instances that host both webroles and worker roles. They are very interesting from a cost saving perspective: you can in fact host multiple websites and workers when combining webworkers with NServiceBus’ dynamic azure host.

To set this up, you need to add a webrole.cs file to your webrole project in which you inherit from NServiceBus’ RoleEntryPoint and configure the endpoint AsA_Host. This will set up the role host to spin off child processes that act as workers; see my previous post for the details of how to do this. The host itself will not run a bus, to avoid loading the website’s messagehandlers in its process space, as these processes do share assemblies. (Remember! Selecting another role, like AsA_Worker, will load the messagehandlers)

In global.asax you can then configure the bus manually to be hosted in the website using the Configure class as stated above. All message handlers referenced from the webrole will be hosted in the IIS process and, as manual configuration ignores any IConfigureThisEndpoint implementation, you also avoid running a second dynamic host.

IIS 7.5

One small caveat when running in IIS version 7.5, which is installed on the latest azure OS families based on Windows Server 2008 R2, is that you can access neither RoleEnvironment nor HttpContext early in the request lifecycle. Trying to configure NServiceBus at Application_Start will result in an exception, as it uses these constructs internally.

An easy way to resolve this is to use .Net’s Lazy<T> to postpone initialization of the bus to the first Application_BeginRequest occurrence in a thread safe manner:

private static readonly Lazy<IBus> StartBus = new Lazy<IBus>(ConfigureNServiceBus);

private static IBus ConfigureNServiceBus()
{
   var bus = Configure.WithWeb()
                  ...;

   return bus;
}

protected void Application_BeginRequest()
{
   Bus = StartBus.Value;
}
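
To see why Lazy<T> is enough here, the following self-contained sketch hammers a lazily initialized value from many threads and counts how often the factory runs. The LazyStartDemo class and its members are made up for illustration, and a string stands in for the bus. With the default thread-safety mode of Lazy<T>, the factory should run only once:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo class; a string stands in for the IBus instance.
public static class LazyStartDemo
{
    private static int factoryCalls;

    // Lazy<T>(Func<T>) defaults to LazyThreadSafetyMode.ExecutionAndPublication,
    // so only one thread executes the factory and all others wait for its result.
    private static readonly Lazy<string> StartBus = new Lazy<string>(CreateFakeBus);

    private static string CreateFakeBus()
    {
        Interlocked.Increment(ref factoryCalls);
        return "bus";
    }

    // Number of times the factory has actually been executed.
    public static int FactoryCalls
    {
        get { return Volatile.Read(ref factoryCalls); }
    }

    // Simulates many concurrent first requests hitting Application_BeginRequest.
    public static void SimulateRequests(int count)
    {
        Parallel.For(0, count, i => { var bus = StartBus.Value; });
    }
}
```

After calling LazyStartDemo.SimulateRequests(100), FactoryCalls reports a single initialization, which is the behavior you want for a bus that may only be started once.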

Want to get started with your own webworker?

Then check out the AzureHost sample in the NServiceBus trunk and build from there.

Happy coding!

Hosting options for NServiceBus on Azure – Shared Hosting

Yesterday, I discussed the dedicated hosting model for NServiceBus on Windows Azure. Today I would like to introduce to you the second model, which allows you to host multiple processes on the same role.

In order to set up shared hosting you start by creating a dedicated azure role which represents the host’s controller.

public class Host : RoleEntryPoint{}

The NServiceBus role that you need to specify is a special role, called AsA_Host.

public class EndpointConfiguration : IConfigureThisEndpoint, AsA_Host { }

This role will not start a UnicastBus, but instead it will load other roles from azure blob storage and spin off child processes in order to host them.

The only profile that makes sense to specify in this case is the Production or Development profile, which controls where the logging output is sent to. Other behaviors belong strictly to the UnicastBus and its parts, so they cannot be used in this context. Besides this profile, you can also set a few additional configuration settings that control the behavior of the host in more detail.

  • DynamicHostControllerConfig.ConnectionString – specifies the connection string to a storage account containing the roles to load, it defaults to development storage.
  • DynamicHostControllerConfig.Container – specifies the name of the container that holds the assemblies of the roles to load, as .zip files, it defaults to ‘endpoints’.
  • DynamicHostControllerConfig.LocalResource – specifies the name of the local resource folder on the windows azure instance that will be used to drop the assemblies of the hosted roles, it defaults to a folder called ‘endpoints’
  • DynamicHostControllerConfig.RecycleRoleOnError – specifies how the host should behave in case there is an error in one of the child processes – by default it will not recycle when an error occurs.
  • DynamicHostControllerConfig.AutoUpdate – specifies whether the host should poll the storage account for changes to its child processes, it defaults to false.
  • DynamicHostControllerConfig.UpdateInterval – specifies how often the host should check the storage account for updates expressed in milliseconds, it defaults to every 10 minutes.
  • DynamicHostControllerConfig.TimeToWaitUntilProcessIsKilled – if there are updates, the host will first kill the original process before provisioning the new assemblies and run the updated process. I noticed that it might take a while before the process dies, which could be troublesome when trying to provision the new assemblies. This setting allows you to specify how long the host is prepared to wait for the process to terminate before it requests a recycle.

Changes to the hosted roles

Any dedicated role can be used as a child process given a few minor changes.

The first change is that all configuration has to be specified in the app.config file. This is intentional, I have removed the ability for the childprocesses to read from the service configuration file. Why? Well, otherwise every childprocess would have been configured the same way as they would share configuration settings, same storage account, same queue, same everything… that’s not what you want. Hence the only option is to limit yourself to using the role’s app.config. Note that the RoleEnvironment is available from the child processes so if you want to read the configuration settings from it, just specify the AzureConfigurationSource yourself, but I advise against it.

The second change is that you need to add a reference to NServiceBus.Hosting.Azure.HostProcess.exe to the worker’s project. This process will be started by the host and your role will run inside its process space. Why does NServiceBus force you to specify the host process, can’t it do it itself? Well it could, but that could get you into trouble in the long run. If it decided on the host process for you, you would be tied to a specific version of NServiceBus and future upgrades might become challenging. When you specify it yourself you can just continue to run any version, or customize it if you like; only the name of the process matters to the host.

The provided host process is a .net 4.0 process, but it still uses some older assemblies, so you need to add an NServiceBus.Hosting.Azure.HostProcess.exe.config file to your project, which specifies the useLegacyV2RuntimeActivationPolicy attribute:

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
    <startup useLegacyV2RuntimeActivationPolicy="true">
      <supportedRuntime version="v4.0"/>
      <requiredRuntime version="v4.0.20506"/>
    </startup>
</configuration>

The final step is to put the output of the role into a zip file and upload the zip file to the container which is being monitored by the host.

If the host process is running and its autoupdate property is set to true, then it will automatically pick up your zip file, extract it and run it in a child process. If the property is set to false, you will have to recycle the instances yourself.

Up until now I’ve only discussed worker roles and how to host message handlers inside them, next time I’ll take a closer look at hosting in a webrole and show you how you can create a so called webworker, where both a website and worker role are hosted in the same windows azure instance.

See you next time.

Hosting options for NServiceBus on Azure – Dedicated Hosting

One of the main concerns people have regarding the windows azure platform is that of costs, especially hosting costs as that makes up the majority of the equation.

This is mostly due to a discrepancy between the model imposed by Visual Studio, where individual projects map to roles, and the way a lot of people develop their solutions, using many small projects. NServiceBus developers in particular follow the guidance to build their solutions using autonomous components, individual and standalone processes per message type. Can you imagine having a role, with a minimum of 2 instances, per message type?

I can… for certain types of apps. I deliberately use the short term ‘app’ to represent a system that offers a limited number of features to a large number of users. In this case it does make sense to use a dedicated role for individual message types.

But for traditional applications, like most of us create on-premises today, it doesn’t make sense. Often these applications offer a lot of features to a limited set of (enterprise) users. For this type of application visual studio for azure imposes a very costly model onto you.

The good news is, NServiceBus gives you a solution for both models… and it allows you to switch between them when required. Very interesting if you want to save a few euros when starting up a new online service, but still want to be able to spread out when traffic picks up.

Dedicated hosting

The first model that I would like to discuss is the dedicated host, which I have used in the past samples quite a lot already, so you’re probably familiar with it. This one is most suited for ‘apps’ that have already picked up some traffic. Every messagehandler, or small group of them, is hosted in a dedicated role.

First thing to do is inherit from NServiceBus’s RoleEntryPoint.

public class Host : RoleEntryPoint{}

And add a class that specifies the role that this endpoint needs to perform.

public class EndpointConfiguration : IConfigureThisEndpoint, AsA_Worker { }

Furthermore you need to specify, through profiles specified in the config, how the role should behave.

NServiceBus.Production NServiceBus.OnAzureTableStorage NServiceBus.WithAzureStorageQueues

Depending on this behaviour you may need to specify some additional configuration settings as well.

Roles & Profiles

The dedicated host supports multiple roles, but it can only perform one of them at a time. The roles that are supported today are:

  • AsA_Listener – This role is the most basic of them all, it will accept messages and could potentially reply to them, but that’s it.
  • AsA_Worker – This role is probably the most used role, it will accept messages, support sagas and publish events if needed.
  • AsA_TimeoutManager – A specialised role that handles timeout messages sent by sagas
  • AsA_Host – A specialised role that allows hosting multiple nservicebus processes, we’ll discuss this one in more depth next time.

Using profiles you can control which internal implementations are used by the roles to do their magic.

  • Development – Specifies where nservicebus should log to, in this case the console.
  • Production – Specifies where nservicebus should log to, in this case windows azure tablestorage.
  • OnAzureTableStorage – Specifies where nservicebus should store subscriptions and saga state, in this case azure table storage
  • OnSqlAzure – Specifies where nservicebus should store subscriptions and saga state, in this case sql azure.
  • WithAzureStorageQueues – Specifies what communication mechanism to use, in this case azure storage queues.
  • WithAppFabricQueues – Specifies what communication mechanism to use, in this case appfabric queues.

Next time we’ll take a closer look at the second hosting model, shared hosting…

Stay tuned

AppFabric queue support for NServiceBus

Last week at the //BUILD conference, Microsoft announced the public availability of the AppFabric Queues, Topics and Subscriptions. A release that I have been looking forward to for quite some time now, as the queues especially are a very valuable addition to the NServiceBus toolset.

These new queues have several advantages over azure storage queues:

  • Maximum message size is up to 256K in contrast to 8K
  • Throughput is a lot higher as these queues are not throttled on number of messages/second
  • Supports TCP for lower latency
  • You can enable exactly once delivery semantics
  • For the time being they are free!

For a complete comparison between appfabric queues and azure storage queues, this blog post seems to be a very comprehensive and complete overview: http://preps2.wordpress.com/2011/09/17/comparison-of-windows-azure-storage-queues-and-service-bus-queues/

How to configure NServiceBus to use AppFabric queues

As always, we try to make it as simple as possible. If you manually want to initialize the bus using the Configure class, you can just call the following extension method.

.AppFabricMessageQueue()

If you use the generic role entrypoint, you can enable appfabric queue support by specifying the following profile in the service configuration file

NServiceBus.WithAppFabricQueues

Besides one of these enablers you also need to specify a configuration for both your service namespace and your issuer key. These can be specified in the AppFabricQueueConfig section and are required as they cannot be defaulted by nservicebus because there is no local development equivalent to default to.

<Setting name="AppFabricQueueConfig.IssuerKey" value="yourkey" />
<Setting name="AppFabricQueueConfig.ServiceNamespace" value="yournamespace" />

That’s it, you’re good to go. This is all you need to do to make it work.

Additional settings

If you want to further control the behavior of the queue there are a couple more settings:

  • AppFabricQueueConfig.IssuerName – specifies the name of the issuer, defaults to owner
  • AppFabricQueueConfig.LockDuration – specifies the duration of the message lock in milliseconds, defaults to 30 seconds
  • AppFabricQueueConfig.MaxSizeInMegabytes – specifies the size, defaults to 1024 (1GB), allowed values are 1, 2, 3, 4, 5 GB
  • AppFabricQueueConfig.RequiresDuplicateDetection – specifies whether exactly once delivery is enabled, defaults to false
  • AppFabricQueueConfig.RequiresSession – specifies whether sessions are required, defaults to false (not sure if sessions makes sense in any NServiceBus use case either)
  • AppFabricQueueConfig.DefaultMessageTimeToLive – specifies the time that a message can stay on the queue, defaults to int64.MaxValue which is roughly 10.000 days
  • AppFabricQueueConfig.EnableDeadLetteringOnMessageExpiration – specifies whether messages should be moved to a dead letter queue upon expiration
  • AppFabricQueueConfig.DuplicateDetectionHistoryTimeWindow – specifies the amount of time in milliseconds that the queue should perform duplicate detection, defaults to 1 minute
  • AppFabricQueueConfig.MaxDeliveryCount – specifies the number of times a message can be delivered before being put on the dead letter queue, defaults to 5
  • AppFabricQueueConfig.EnableBatchedOperations – specifies whether batching is enabled, defaults to no (this may change)
  • AppFabricQueueConfig.QueuePerInstance – specifies whether NServiceBus should create a queue per instance instead of a queue per role, defaults to false

Now it’s up to you

If you want to give NServiceBus on AppFabric queues a try, you can start by running either the fullduplex or pubsub sample and experiment from there.

Any feedback is welcome of course!

Overcoming message size limits on the Windows Azure Platform with NServiceBus

When using any of the windows azure queuing mechanisms for communication between your web and worker roles, you will quickly run into their size limits for certain common online use cases.

Consider for example a very traditional use case, where you allow your users to upload a picture and you want to resize it into various thumbnail formats for use throughout your website. You do not want the resizing to be done on the web role as this implies that the user will be waiting for the result if you do it synchronously, or that the web role will be using its resources for doing something else than serving users if you do it asynchronously. So you most likely want to offload this workload to a worker role, allowing the web role to happily continue to serve customers.

Sending this image as part of a message through the traditional queuing mechanism to a worker is not easy to do. For example, it is not easily implemented by means of queue storage as this mechanism is limited to 8K messages, nor by means of AppFabric queues as they can only handle messages up to 256K, and as you know image sizes far exceed these numbers.

To work your way around these limitations you could perform the following steps:

  1. Upload the image to blob storage
  2. Send a message, passing in the image’s Uri and metadata, to the workers requesting a resize.
  3. The workers download the image blob based on the provided Uri and perform the resizing operation
  4. When all workers are done, clean up the original blob

This all sounds pretty straightforward, until you try to do it; then you run into quite a lot of issues. Among others:

To avoid long latency and timeouts you want to upload and/or download very large images in parallel blocks. But how many blocks should you upload at once and how large should these blocks be? How do you maintain byte order while uploading in parallel? What if uploading one of the blocks fails?

To avoid paying too much for storage you want to remove the large images. But when do you remove the original blob? How do you actually know that all workers have successfully processed the resizing request? Turns out you actually can’t know this, for example due to the built in retry mechanisms in the queue the message may reappear at a later time.

Now the good news is, I’ve gone through the ordeal of solving these questions and implemented this capability for windows azure into NServiceBus. It is known as the databus and on windows azure it uses blob storage to store the images or other large properties (FYI on-premises it is implemented using a file share).

How to use the databus

When using the regular azure host, the databus is not enabled by default. In order to turn it on you need to request custom initialization and call the AzureDataBus method on the configuration instance.

internal class SetupDataBus : IWantCustomInitialization
{
     public void Init()
     {
         Configure.Instance.AzureDataBus();
     }
}

As the databus is implemented using blob storage, you do need to provide a connection string to your storage account in order to make it work properly (it will point to development storage if you do not provide this setting)

<Setting name="AzureDatabusConfig.ConnectionString" value="DefaultEndpointsProtocol=https;AccountName={yourAccountName};AccountKey={yourAccountKey}" />

Alright, now the databus has been set up, using it is pretty simple. All you need to do is specify which of the message properties are too large to be sent in a regular message, this is done by wrapping the property type by the DataBusProperty<T> type. Every property of this type will be serialized independently and stored as a BlockBlob in blob storage.

Furthermore you need to specify how long the associated blobs are allowed to stay alive in blob storage. As I said before there is no way of knowing when all the workers are done processing the messages, therefore the best approach to not flooding your storage account is providing a background cleanup task that will remove the blobs after a certain time frame. This time frame is specified using the TimeToBeReceived attribute which must be specified on every message that exposes databus properties.
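
The cleanup rule described above boils down to a simple age check. A minimal sketch, assuming the creation time of the blob is known (the DataBusCleanup class and its IsExpired method are hypothetical names for illustration, not the databus API):

```csharp
using System;

// Hypothetical helper illustrating the time-based cleanup rule.
public static class DataBusCleanup
{
    // A blob may be removed once it is older than the message's TimeToBeReceived.
    public static bool IsExpired(DateTime createdUtc, TimeSpan timeToBeReceived, DateTime nowUtc)
    {
        return nowUtc - createdUtc > timeToBeReceived;
    }
}
```

A background task could call DataBusCleanup.IsExpired(created, TimeSpan.Parse("01:00:00"), DateTime.UtcNow) for each blob and delete the ones for which it returns true.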

In my image resizing example, I created an ImageUploaded event that has an Image property of type DataBusProperty<byte[]> which contains the bytes of the uploaded image. Furthermore it contains some metadata like the original filename and content type. The TimeToBeReceived value has been set to an hour, assuming that the message will be processed within an hour.

[TimeToBeReceived("01:00:00")]
public class ImageUploaded : IMessage
{
    public Guid Id { get; set; }
    public string FileName { get; set; }
    public string ContentType { get; set; }
    public DataBusProperty<byte[]> Image { get; set; }
}

That’s it, besides this configuration there is no difference with a regular message handler. It will appear as if the message has been sent to the worker as a whole, all the complexity regarding sending the image through blob storage is completely hidden from you.

public class CreateSmallThumbnail : IHandleMessages<ImageUploaded>
{
    private readonly IBus bus;

    public CreateSmallThumbnail(IBus bus)
    {
        this.bus = bus;
    }

    public void Handle(ImageUploaded message)
    {
        var thumb = new ThumbNailCreator().CreateThumbnail(message.Image.Value, 50, 50);

        var uri = new ThumbNailStore().Store(thumb, "small-" + message.FileName, message.ContentType);

        var response = bus.CreateInstance<ThumbNailCreated>(x => { x.ThumbNailUrl = uri; x.Size = Size.Small; });
        bus.Reply(response);
    }
}

Controlling the databus

In order to control the behavior of the databus, I’ve provided you with some optional configuration settings.

AzureDataBusConfig.BlockSize allows you to control the size in bytes of each uploaded block. The default setting is 4MB, which is also the maximum value.

AzureDataBusConfig.NumberOfIOThreads allows you to set the number of threads that will upload blocks in parallel. The default is 5

AzureDataBusConfig.MaxRetries allows you to specify how many times the databus will try to upload a block before giving up and failing the send. The default is 5 times.

AzureDataBusConfig.Container specifies the container in blob storage to use for storing the message parts, by default this container is named ‘databus’, note that it will be created automatically for you.

AzureDataBusConfig.BasePath allows you to add a base path to each blob in the container, by default there is no basepath and all blobs will be put directly in the container. Note that using paths in blob storage is purely a naming convention that is being adhered to, it has no other effects as blob storage is actually a pretty flat store.
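
To illustrate how BlockSize relates to the number of blocks, and how byte order can survive parallel uploads, here is a small sketch. It is not the databus implementation; the BlockPlanner class and its members are hypothetical. It relies on two properties of block blobs: block ids within one blob must be base64 strings of equal length, and the order of the ids in the committed block list determines the byte order, regardless of the order in which the blocks were uploaded.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch, not the databus implementation.
public static class BlockPlanner
{
    public sealed class Block
    {
        public string Id;    // base64 block id, equal length for every block
        public long Offset;  // position of the first byte in the payload
        public int Length;   // number of bytes in this block
    }

    // Splits a payload of the given length into blocks of at most blockSize bytes.
    public static List<Block> Plan(long payloadLength, int blockSize)
    {
        if (payloadLength < 0) throw new ArgumentOutOfRangeException("payloadLength");
        if (blockSize <= 0) throw new ArgumentOutOfRangeException("blockSize");

        var blocks = new List<Block>();
        int index = 0;
        for (long offset = 0; offset < payloadLength; offset += blockSize)
        {
            blocks.Add(new Block
            {
                // Zero-padded index, so every id has the same length.
                Id = Convert.ToBase64String(Encoding.UTF8.GetBytes(index.ToString("d6"))),
                Offset = offset,
                Length = (int)Math.Min(blockSize, payloadLength - offset)
            });
            index++;
        }
        return blocks;
    }
}
```

With the default 4MB block size, a 10MB image yields three blocks (4MB, 4MB and 2MB). The blocks can be uploaded by any number of threads in any order, as long as the final block list is committed in the order returned by Plan.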

Wrapup

With the databus, your messages are no longer limited in size. At least the limit has become so big that you probably don’t care about it anymore, in theory you can use 200GB per property and 100TB per message. The real limit however has now become the amount of memory available on either the machine generating or receiving the message, you cannot exceed those amounts for the time being… Furthermore you need to keep latency in mind, uploading a multi mega or gigabyte file takes a while, even from the cloud.

That’s it for today, please give the databus a try and let me know if you encounter any issues. You can find the sample used in this article in the samples repository, it’s named AzureThumbnailCreator and shows how you can create thumbnails of various sizes (small, medium, large) from a single uploaded image using background workers.

Have fun with it…

Improving throughput with NServiceBus on Windows Azure

One of the things that has always bothered me personally on the ‘NServiceBus – Azure queue storage’ relationship is throughput, the amount of messages that I could transfer from one role to the other per second was rather limited.

This is mainly due to the fact that windows azure storage throttles you at the http level, every queue only accepts 500 http requests per second and will queue up the remaining requests. Given that you need 3 requests per message, you can see that throughput is quite limited, you can transfer less than a hundred messages per second. (Sending role performing 1 post request, receiving role performing 1 get and 1 delete request)
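
The arithmetic behind this limit can be written down as a back-of-the-envelope calculation. The QueueThroughput class below is a hypothetical helper for illustration only. It computes the theoretical ceiling implied by the numbers above, and the throughput you see in practice, as noted, stays below that ceiling.

```csharp
using System;

// Hypothetical helper: theoretical ceiling on messages per second for one queue.
public static class QueueThroughput
{
    public static int MaxMessagesPerSecond(int requestLimitPerSecond, int requestsPerMessage)
    {
        if (requestsPerMessage <= 0) throw new ArgumentOutOfRangeException("requestsPerMessage");
        // Integer division: partial messages do not count.
        return requestLimitPerSecond / requestsPerMessage;
    }
}
```

QueueThroughput.MaxMessagesPerSecond(500, 3) returns 166, so even in the ideal case a single queue stays well below two hundred messages per second.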

One of the first things that you can do to increase throughput is using the SendMessages() operation on the unicast bus. This operation will group all messages passed into it into 1 single message and send it across the wire. Mind that queue storage also limits message size to 8KB, so in effect you can achieve a maximum improvement of factor 10, given that you have reasonably small messages and use binary formatting.

Secondly I’ve added support to the queue for reading in batches, using the GetMessages operation on the cloud queue client. By default the queue reads 10 messages at a time, but you can use a new configuration setting called BatchSize to control the amount of messages to be read. Mind that the BatchSize setting also influences the MessageInvisibleTime, as I multiply this number by the batchsize to define how long the messages have to stay invisible as overall process time may now take longer.
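
The effect of BatchSize can be sketched with two small calculations. The BatchMath class is hypothetical and only mirrors the reasoning above: the invisibility time is multiplied by the batch size, and one get request is shared by a whole batch. The sketch assumes that every message is still sent and deleted with its own request, and the 30 seconds used in the usage note is just an example value.

```csharp
using System;

// Hypothetical helper mirroring the batching arithmetic described in the text.
public static class BatchMath
{
    // How long a batch of messages stays invisible while it is being processed.
    public static TimeSpan EffectiveInvisibleTime(TimeSpan messageInvisibleTime, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException("batchSize");
        return TimeSpan.FromTicks(messageInvisibleTime.Ticks * batchSize);
    }

    // Average http requests per message: 1 post, 1 delete, and a get shared by the batch.
    public static double RequestsPerMessage(int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException("batchSize");
        return 1.0 + 1.0 / batchSize + 1.0;
    }
}
```

With an invisibility time of 30 seconds and the default batch size of 10, messages stay invisible for 5 minutes, and the average cost drops from 3 to roughly 2.1 requests per message.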

In the future I may consider even more improvements to increase throughput of queue storage. Like for example using multiple queues at a time to overcome the 500 requests per second limit. But as Rinat Abdullin already pointed out to me on twitter this might have grave consequences on both overall latency and costs. So before I continue with this improvement I have a question for you, do you think this additional latency and costs are warranted?

But even then, there is another throttle in place at the storage account level, which limits all storage operation requests to 5000 requests per second (this includes table storage and blob storage requests). In order to work around this limit you can specify a separate connection string for every destination queue using the following format: “queuename@connectionstring”.
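
To make the address format concrete, here is a sketch of how such an address could be split into its two parts. The QueueAddress class is hypothetical and does not show how NServiceBus parses addresses internally. It assumes that queue names never contain an ‘@’, so the first ‘@’ is the separator.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical parser for the "queuename@connectionstring" format.
public static class QueueAddress
{
    // Returns the queue name as Key and the connection string as Value.
    // Addresses without '@' fall back to the supplied default connection string.
    public static KeyValuePair<string, string> Parse(string address, string defaultConnectionString)
    {
        if (address == null) throw new ArgumentNullException("address");

        int at = address.IndexOf('@');
        if (at < 0)
        {
            return new KeyValuePair<string, string>(address, defaultConnectionString);
        }
        return new KeyValuePair<string, string>(address.Substring(0, at), address.Substring(at + 1));
    }
}
```

Spreading destination queues over several storage accounts this way gives each account its own request budget.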

Building Global Web Applications With the Windows Azure Platform – Dynamic Work Allocation and Scale out

Today I would like to finish the discussion on ‘understanding capacity’ for my ‘Building Global Web Applications With the Windows Azure Platform’ series, by talking about the holy grail of cloud capacity management: Dynamic work allocation and scale out.

The basic idea is simple: keep all roles at full utilization before scaling out.

To make optimal use of the capacity that you’re renting from your cloud provider you could design your system in such a way that it is aware of its own usage patterns and acts upon these patterns. For example, if role 3 is running too many cpu intensive jobs and role 1 has excess capacity, it could decide to move some cpu intensive workloads off of role 3 to role 1. The system repeats these steps for all workload types and tries to maintain a balance below 80% overall capacity before deciding to scale out.

Turns out though that implementing this is not so straightforward…
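
Reduced to its bare minimum, the decision described above could look like the following sketch. The CapacityPlanner class, the CapacityAction enum and the single utilization number per role are simplifying assumptions made for illustration. A real system would track cpu, memory and bandwidth separately.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum CapacityAction { None, MoveWorkload, ScaleOut }

// Hypothetical decision rule: rebalance first, scale out only when everything is busy.
public static class CapacityPlanner
{
    public const double Threshold = 0.80;

    // roleUtilization holds one value between 0.0 and 1.0 per role instance.
    public static CapacityAction Decide(IList<double> roleUtilization)
    {
        if (roleUtilization.Count == 0) return CapacityAction.None;

        // Overall capacity is exhausted: moving work around no longer helps.
        if (roleUtilization.Average() >= Threshold) return CapacityAction.ScaleOut;

        // At least one role is overloaded while others still have room.
        if (roleUtilization.Max() >= Threshold) return CapacityAction.MoveWorkload;

        return CapacityAction.None;
    }
}
```

For utilizations of 95%, 30% and 50% the rule answers MoveWorkload, because the average is still well below 80%. Only when the average itself reaches the threshold does it answer ScaleOut.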

First of all you need to be able to move workloads around at runtime. Every web and worker role needs to be designed in such a way that it can dynamically load workloads from some medium, and start executing it. But it also needs to be able to unload the workload, in effect your web or worker role becomes nothing more than an agent that is able to administer the workloads on the machine instead of executing them itself.

In the .net environment this means that you need to start managing separate appdomains or processes for each workload. Here you can find a sample where I implemented a worker role that can load other workloads dynamically from blob storage into a separate appdomain in response to a command that you can send from a console application. This sort of proves that moving workloads around should be technically possible.

Even though it is technically quite feasible to move workloads around, the hardest part is the business logic that decides what workloads should be moved, when and where to. You need to take quite a few things into account!

  • Every workload consumes a certain amount of cpu, memory and bandwidth, but these metrics cannot be derived from traditional monitoring information as that only shows overall usage. So you need to define and compute additional metrics for each individual workload in order to know what the impact of moving that specific workload would be.
  • Workloads tend to be rather temporal as well, so a heavy cpu usage right now, does not mean it will consume the same amount in 5 seconds. So just simply moving workloads around when you detect a problem is not going to cut it.
  • In other words, you need to find ways to accurately predict future usage based on past metrics and user supplied information.
  • You need to ensure a workload is moved well before it actually would start consuming resources as moving the workload itself takes time as well.
  • These same problems repeat themselves on the target side, where you would move the workload to as that role’s utilization is in continuous flux as well.
  • I’m only touching the tip of the iceberg here; there is much more to it…
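As a small illustration of predicting usage from past metrics rather than reacting to the latest sample, here is a deliberately naive sketch in Python that uses an exponentially weighted moving average. The function names, the `alpha` smoothing factor and the sample values are assumptions for the example. A real predictor would also need the user-supplied information and the time it takes to move a workload, both of which are ignored here.

```python
def forecast(samples, alpha=0.5):
    """Exponentially weighted moving average of usage samples (0.0 to 1.0)."""
    if not samples:
        raise ValueError("need at least one sample")
    estimate = samples[0]
    for value in samples[1:]:
        # Recent samples weigh more, but a single spike cannot dominate.
        estimate = alpha * value + (1 - alpha) * estimate
    return estimate


def should_move(samples, threshold=0.80, alpha=0.5):
    """Flag a workload when the smoothed usage crosses the threshold."""
    return forecast(samples, alpha) >= threshold
```

A short spike such as `[0.1, 0.1, 0.95]` does not trigger a move, while a sustained load such as `[0.9, 0.9, 0.9]` does.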

Lots of hard work… but in time you will have to go through it. Please keep in mind that this is the way most utility companies make their (enormous amounts of) money: by continuously looking for more accurate ways to use and resell excess capacity.

Alright, now that you understand the concept of capacity and how it can help you keep your costs down, it is time to move to the next section of this series: how to make your application globally available.

Building Global Web Applications With the Windows Azure Platform – Monitoring

In the fourth installment of the series on building global web applications I want to dive a bit deeper into monitoring your instances, as measuring and monitoring is key to efficient capacity management. The goal of capacity management should be to optimally use the instances that you have. Ideally, all aspects of your instances are utilised for about 80% before you decide to pay more and scale out.

Windows Azure offers a wide range of capabilities when it comes to monitoring, by means of the WAD (Windows Azure Diagnostics) service, which can be configured to expose all kinds of information about your instances, including event logs, trace logs, IIS logs, performance counters and many more. The WAD can be configured both from code and by means of a configuration file that can be included in your deployment. See http://msdn.microsoft.com/en-us/library/gg604918.aspx for more details on this configuration file.

Personally I prefer using the configuration file for anything that is not specific to my code, like machine-level performance counters, but I do use code for things like trace logs. To enable a specific performance counter on all your instances, specify it in the PerformanceCounters element, including the rate at which the counter should be sampled.

<PerformanceCounters bufferQuotaInMB="512" scheduledTransferPeriod="PT1M">
    <PerformanceCounterConfiguration counterSpecifier="\Processor(_Total)\% Processor Time" sampleRate="PT5S" />
    <PerformanceCounterConfiguration counterSpecifier="\Memory\% Committed Bytes In Use" sampleRate="PT5S" />
</PerformanceCounters>

Note that I only collect processor time and memory consumption from the instances. Bandwidth throttling is performed at the network level, not the instance level, so you cannot collect any valuable data for this metric.

The diagnostics manager will transfer this information to the storage account that you specified in your service configuration file under the key Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString, at the rate set in the scheduledTransferPeriod attribute of the PerformanceCounters element.
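The sampleRate and scheduledTransferPeriod values are ISO 8601 durations, so PT5S means every 5 seconds and PT1M means every minute. The following Python sketch estimates how many samples end up in storage per transfer for a given configuration. The helper names are hypothetical, the parser only handles the simple PT…H…M…S form, and it assumes that every collected sample is transferred.

```python
import re

# Matches simple time-only durations such as PT1M, PT5S or PT1H30M.
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def duration_seconds(value):
    """Convert a simple ISO 8601 time duration to seconds."""
    match = _DURATION.match(value)
    if not match or not any(match.groups()):
        raise ValueError(f"unsupported duration: {value!r}")
    hours, minutes, seconds = (int(g or 0) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def samples_per_transfer(sample_rate, transfer_period, counters=1):
    """Estimated number of samples written on each scheduled transfer."""
    period = duration_seconds(transfer_period)
    return counters * period // duration_seconds(sample_rate)
```

For the configuration above, `samples_per_transfer("PT5S", "PT1M", counters=2)` gives 24 samples per instance per minute, which is worth keeping in mind as it translates into storage transactions and table rows.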

Now, I admit, today the Windows Azure management tooling offered by MS is a bit lacking in terms of visualising diagnostics and monitoring information. But there is a third party product, Diagnostics Manager by Cerebrata, that covers this gap very well. Here you can see how Diagnostics Manager visualises the memory and cpu usage in my instance.

Note that the consumption rates are very low now: only 20% of memory and just a few percent of CPU is effectively used at the time of measurement. This is because I upscaled to a small web role in the meantime and wasn’t executing any tests when monitoring the instance.

So, now that you know how to monitor your instances efficiently, it is time to start filling up the free capacity that is sitting idle in your machines. But that is for next time, when I will discuss the holy grail of capacity management: dynamic workload allocation.

Building Global Web Applications With the Windows Azure Platform – Offloading static content to blob storage or the CDN

In this third post on building global web applications, I will show you what the impact of offloading images to blob storage or the CDN is in contrast to scaling out to an additional instance. Remember from the first post in this series that I had an extra small instance that started to show signs of fatigue as soon as more than 30 people came over to visit at once. Let’s see how this will improve by simply moving the static content.

In a first stage I’ve moved all images over to blob storage and ran the original test again, resulting in a nice scale up in terms of the number of users the single instance can handle. Notice that the increase in users has nearly no impact on our role. I lost about 50ms in minimum response time though, in comparison to the initial test, but I would be happy to pay that price in order to handle more users. If you need faster response times than the ones delivered by blob storage, you really should consider enabling the CDN.

And I’ll prove it with this second test: I enabled the CDN for my storage account. A CDN (or Content Delivery Network) brings files to a datacenter closer to the surfer, resulting in a much better overall experience when visiting your site. As you can see in the following test result, the page response times decrease dramatically, down to 30 percent:

But I can hear you think: what if I had scaled out instead? If you compare the above results to the test results of simply scaling out to 2 extra small instances, you can see that 2 instances only moved the tipping point from 30 users to 50 users, so doubling the instances did not even double the number of users we can handle. Offloading the images, on the other hand, gives us a far more serious increase for a much lower cost ($0.01 per 10.000 requests).
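To get a feeling for what that request price means, here is a small Python sketch of the arithmetic. Only the $0.01 per 10.000 requests figure comes from this post. The function names are hypothetical, any instance price you pass in is your own assumption, and bandwidth and storage charges are ignored.

```python
from decimal import Decimal

PRICE_PER_10K_REQUESTS = Decimal("0.01")  # figure quoted in the post


def request_cost(requests):
    """Transaction cost in dollars for a number of storage requests."""
    return PRICE_PER_10K_REQUESTS * Decimal(requests) / Decimal(10_000)


def breakeven_requests(instance_price_per_hour, hours):
    """Requests you can serve from storage for the price of one instance."""
    budget = Decimal(str(instance_price_per_hour)) * Decimal(str(hours))
    return int(budget / PRICE_PER_10K_REQUESTS * 10_000)
```

One million image requests cost `request_cost(1_000_000)`, which is one dollar. With a purely hypothetical instance price of $0.05 per hour, `breakeven_requests(0.05, 24)` shows that a day of one extra instance buys 1.2 million storage requests.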

Note that the most probable next bottleneck will be memory, as most of the 768 MB is already being used by the operating system. To be honest, I do not consider extra small instances good candidates for deploying web roles on, as they are pretty limited in 2 important aspects for serving content: bandwidth and memory. I do consider them ideal for hosting worker roles though, as they have quite a lot of CPU relative to the other resources and their price.

For web roles intended to serve rather static content, I default to small instances, as they have about 1GB of usable memory and 20 times the bandwidth of an extra small role for only a little more than twice the price. Still, the bandwidth is not excessive, so you still want to offload your images to blob storage and the CDN.

Please remember, managing the capacity of your roles is the secret to benefiting from the cloud. Ideally you manage to use each resource for 80% without ever hitting the limit… Another smart thing to do is to host background workloads on the same machine as the web role, to use the CPU cycles that are often not required when serving relatively static content.

Next time, we’ll have a look at how to intelligently monitor your instances, which is a prerequisite to being able to manage the capacity of your roles…