EPAM Cloud: Scaling Out Real Project Database With SQL Azure Federations

Application scaling

Scaling application in cloud is a burning issue. The very concept of cloud technology implicates on-demand application scaling. Any decent cloud provider supports respective functions.

Why is scaling out so important? We have several options regarding to increasing application performance. We can buy new server hardware, add more RAM, and so on. This is vertical scaling, or scaling up. However, this option can be pretty costly, time-consuming, and has its limits. You can go for cutting-edge technology, but it can still be insufficient for your application.

The second option is horizontal scaling, or scaling out. It implies extending the computing power of the application by increasing the number of servers where the application is located, or application instances in case of PaaS. In other words, if the application is located on one server, and at some point it stops "pulling its weight", we can simply buy another server, identical to the first one, and install our application on it. This way, the queries to the application will be divided between the two servers.

This is the fundamental principle of scaling out applications hosted in a "cloud", but instead of real physical servers, we have the notion of a virtual machine. When one copy of a virtual machine becomes insufficient for our application, we can increase the performance by distributing the workload between several virtual machines.

The capabilities of Microsoft cloud platform are quite broad. They include auto-scaling and on-demand scaling, which are available through a user interface as well as through SDK, REST API and PowerShell.

Although scaling applications (PaaS) or virtual machines (IaaS) is simple – just define the necessary number of instances and you got them – more questions will arise if your application uses MS SQL databases. The first thing that comes to mind is to create a cluster of virtual machines with SQL Server. This solution is relatively simple and widely known. But what if the application uses the database as a service (SaaS)? And what if we don't want to fiddle with setting up an SQL Server cluster?

Since we are dealing with Windows Azure, the SQL database should be SQL Azure. This database supports a scaling out technology (sharding) called SQL Azure Federations. It is very simple: logically independent lines of one table are stored in separate databases. A simple example:

This is one and the same table, but its data is stored in separate copies of the database (shards). The account information with ID 1 is stored in the first database, with ID 2 - in the second one, and so on.

What do we get from all of this? First of all, data isolation. There is no connection between data from one account and data from another. Thereafter, sharding allows us not only to scale out databases, but also to implement multi-tenant scenarios. Since the amount of data stored in a specific shard is significantly smaller than the total size of the database, data retrieval is much faster.

SQL Azure limitations

Every technology has its drawbacks, and sharding is not an exception. Let's consider what drawbacks we may encounter with the sharding technology, using the database described in the example above.

As specified by the SQL Azure architecture, the service does not support data retrieval from several databases simultaneously. One database equals one connection. The shards are not an exception. We would have to run a separate query to return the number of clients in the database for every shard.

Initial query:

SELECT Count(*) FROM Account

Query for a specific shard:

USE FEDERATION Accounts(AccountId = 4) WITH RESET, FILTERING = OFF
GO
SELECT Count(*) FROM Account

The logic of summing up the values returned by this query must be implemented in the application. As a result of using federations, part of the code will "go into" the application, as some of the database-level capabilities of regular SQL Server are limited.

Foreword

SQL Azure Federations is not a silver bullet, of course. You can always develop and implement your own principle for scaling out your database. Let's assume that the multi-tenant approach is a kind of database scaling out, as one user’s data is separated from the data of another not only "logically", but also "physically".

If we need to add new users, we configure separate databases for them. The problem is that the application logic must contain a "routing" mechanism, meaning the application must know which database it is currently working with.

But let's return to SQL Azure Federations.

The idea offered by Microsoft idea itself is worthy to be praised. It would be great to have a tool for scaling out databases quickly and easily. Add to that auto-scaling based on the results of specific queries (but that would be pure sci-fi)...

As a rule, before you make the final decision to start using SQL Azure Federations, you must perform a detailed analysis of the existing database (because you usually port existing databases), or plan out in minute detail the architecture of the database that the application will be using, as well as the logic of the application that will be working with the database.

That’s all for now with the theory, but when it comes to practice, there is usually a significant number of obstacles. Instead of learning how easy it is to start using SQL Azure Federations from scratch, we will try to migrate an existing SQL Azure database to using federations. We will observe the steps performed by the DBA, as well as the problems that can be encountered during the migration.

So stay tuned and have a great day!

Wednesday, April 30, 2014

Scaling Out Real Project Database With SQL Azure Federations

1 comment: