Greetings to all the Internet
dwellers out there, and a happy beginning of a new week! We continue migrating
the database using SQL Azure Federations. As you may remember, previously we
selected the table and the field we will be using to divide our database into
shards. Let’s do it!
Migration
We will perform the partitioning of our database using the Account table, becausethe data records stored in and logically connected to it do not overlap. Since we have the script for creating a database, let’s try to adapt it to be used in SQL Azure Federations.
Let’s assume that the database has been already created in the Windows Azure Management Portal or with the help of the SQL Server Management Studio.
Open the script for creating objects in the database.
USE xPenses
GO
IF EXISTS (SELECT name FROM sysobjects where name = N'Operation') DROP TABLE Operation
...
The first thing we must get rid of is the USE operation, as this is one of the main limitations of SQL Azure. One database – one connection. Instead, add requests for creating a federation:
CREATE FEDERATION Accounts(AccountId BIGINT RANGE)
GO
USE FEDERATION Accounts(AccountId = 1) WITH RESET, FILTERING = OFF
GO
Pay attention that if you connect to the SQL Azure Server via SSMS to run a query, you must select the database from a drop-down list (see below).

This way you create a new federation, the data records of which are distributed by the Account ID value. Keep in mind that no tables exist in the database at this point, meaning the AccountId field is not yet bound to any data set from real tables. The field name may also differ from the name of the field in the table that will be used for the partition.
Here you can see another logical limitation of SQL Azure Federations. The field used for the distribution must have the INT, BIGINT, UNIQUEIDENTIFIER and VARBINARY type.
After creating the federation, you must select the first shard to begin inserting data, in other words, the shard that will store the data of the first account (AccountId = 1).
Now back to the script. You need to modify the creation of the Account table, so that SQL Azure knows that the data from this specific table will be distributed among the shards using the field Id.
GO
USE FEDERATION Accounts(AccountId = 1) WITH RESET, FILTERING = OFF
GO
Pay attention that if you connect to the SQL Azure Server via SSMS to run a query, you must select the database from a drop-down list (see below).

This way you create a new federation, the data records of which are distributed by the Account ID value. Keep in mind that no tables exist in the database at this point, meaning the AccountId field is not yet bound to any data set from real tables. The field name may also differ from the name of the field in the table that will be used for the partition.
Here you can see another logical limitation of SQL Azure Federations. The field used for the distribution must have the INT, BIGINT, UNIQUEIDENTIFIER and VARBINARY type.
After creating the federation, you must select the first shard to begin inserting data, in other words, the shard that will store the data of the first account (AccountId = 1).
Now back to the script. You need to modify the creation of the Account table, so that SQL Azure knows that the data from this specific table will be distributed among the shards using the field Id.
CREATE TABLE Account (
[Id] INTEGER NOT NULL PRIMARY KEY IDENTITY(1,1),
[EntityId] INTEGER NOT NULL FOREIGN KEY REFERENCES Entity(Id),
[Currency] NVARCHAR(3)
)
This way, a table creation script transforms into the following:
CREATE TABLE Account (
[Id] BIGINT NOT NULL,
[EntityId] INTEGER NOT NULL FOREIGN KEY REFERENCES Entity(Id),
[Currency] NVARCHAR(3),
CONSTRAINT [PK_Account] PRIMARY KEY CLUSTERED
(
[Id] ASC
)
) FEDERATED ON (AccountId= Id)
So, what has changed? The ID
field type is now BIGINT. In addition to that, you lose the ability to automatically
generate a value for this field when inserting a new record. This is another SQL
Azure Federations limitation. But you can still use the DEFAULT keyword. This
can be useful if, for example, the ID field type is UNIQUEIDENTIFIER. In this case,
you can declare the field as follows:
[Id] UNIQUEIDENTIFIERNOT NULL DEFAULT NEWID()
Now, when you insert new records
into the table, you don’t need to define the ID of the created record. When
working with other field types, this logic must be implemented on the
application level.
The next thing you should
pay attention to is declaring the table’s primary key. You must explicitly define
that the created key will be a clustered one.
The last thing you must do is
use the FEDERATED ON keyword to define that the current table will be federated.
The data in it will be split by the ID field.
The Account table is
created. Moving along! As you can see from the database scheme, the Account table
is a parent for the Credit Card and Bank Account tables.
In other words, the Bank Account
and Credit Card tables have the foreign key referencing the Account table. As the
Account table is now federated, you cannot ensure referential integrity from one
table to another.
This is yet another SQL
Azure Federations limitation. Tables that are used for
data partitioning (federated tables) cannot be referenced in other tables. You
will need to remove all foreign keys from tables referencing the Account table.
Consequently, a script for
creating, for example, a Credit Card table, will change from something like
this:
CREATE
TABLE CreditCard (
[Id] INTEGER NOT NULL PRIMARY KEY IDENTITY(1,1),
[AccountId] INTEGER NOT NULL FOREIGN KEY REFERENCES Account(id),
[Type] NVARCHAR(MAX)
CONSTRAINT CreditCardType CHECK (
[Type] = 'Visa'
OR [Type] = 'MasterCard'
OR [Type] = 'JCB'
OR [Type] = 'AmericanExpress'),
[Number] NVARCHAR(MAX)
)
[Id] INTEGER NOT NULL PRIMARY KEY IDENTITY(1,1),
[AccountId] INTEGER NOT NULL FOREIGN KEY REFERENCES Account(id),
[Type] NVARCHAR(MAX)
CONSTRAINT CreditCardType CHECK (
[Type] = 'Visa'
OR [Type] = 'MasterCard'
OR [Type] = 'JCB'
OR [Type] = 'AmericanExpress'),
[Number] NVARCHAR(MAX)
)
into something like this:
CREATE
TABLE CreditCard (
[Id] INTEGER NOT NULL PRIMARY KEY IDENTITY(1,1),
[AccountId] BIGINT NOT NULL,
[Type] NVARCHAR(MAX)
CONSTRAINT CreditCardType CHECK (
[Type] = 'Visa'
OR [Type] = 'MasterCard'
OR [Type] = 'JCB'
OR [Type] = 'AmericanExpress'),
[Number] NVARCHAR(MAX)
)
[Id] INTEGER NOT NULL PRIMARY KEY IDENTITY(1,1),
[AccountId] BIGINT NOT NULL,
[Type] NVARCHAR(MAX)
CONSTRAINT CreditCardType CHECK (
[Type] = 'Visa'
OR [Type] = 'MasterCard'
OR [Type] = 'JCB'
OR [Type] = 'AmericanExpress'),
[Number] NVARCHAR(MAX)
)
The consistency of the
references from the Credit Card table records to the Account table falls
entirely on the application logic.
If you try running the database
creation script at this point, all the tables will be created successfully, but
in addition to the tables, the database also includes two procedures. The first
procedure is for adding a new category and it does not require any changes,
because its logic remains within the bounds of one shard.
However, the procedure for
adding new accounts (AddAccount) requires some minor adjustments. See the
source code of the procedure:
CREATE
PROCEDURE AddAccount(
@Name NVARCHAR(MAX),
@Description NVARCHAR(MAX),
@Currency NVARCHAR(3),
@Instrument NVARCHAR(MAX),
@Type NVARCHAR(MAX),
@Number NVARCHAR(MAX)
)
AS
INSERT INTO Entity VALUES (@Name, @Description)
DECLARE @EntityId INTEGER = (SELECT Id FROM Entity WHERE Name = @Name AND Description = @Description)
INSERT INTO Account VALUES (@EntityId, @Currency)
IF (@Instrument = 'BankAccount')
BEGIN
INSERT INTO BankAccount VALUES (
(SELECT Id FROM Account WHERE Account.EntityId = @EntityId),
@Type,
@Number
) END
IF (@Instrument = 'CreditCard')
BEGIN
INSERT INTO CreditCard VALUES (
(SELECT Id FROM Account WHERE Account.EntityId = @EntityId),
@Type,
@Number
) END
GO
@Name NVARCHAR(MAX),
@Description NVARCHAR(MAX),
@Currency NVARCHAR(3),
@Instrument NVARCHAR(MAX),
@Type NVARCHAR(MAX),
@Number NVARCHAR(MAX)
)
AS
INSERT INTO Entity VALUES (@Name, @Description)
DECLARE @EntityId INTEGER = (SELECT Id FROM Entity WHERE Name = @Name AND Description = @Description)
INSERT INTO Account VALUES (@EntityId, @Currency)
IF (@Instrument = 'BankAccount')
BEGIN
INSERT INTO BankAccount VALUES (
(SELECT Id FROM Account WHERE Account.EntityId = @EntityId),
@Type,
@Number
) END
IF (@Instrument = 'CreditCard')
BEGIN
INSERT INTO CreditCard VALUES (
(SELECT Id FROM Account WHERE Account.EntityId = @EntityId),
@Type,
@Number
) END
GO
In reality, the changes you
need to make are pretty obvious. Since you lost the ability to generate values
for the ID field automatically, this logic becomes part of the application
logic. You must change the procedure header:
CREATE PROCEDURE AddAccount(
@AccountId BIGINT,
@Name NVARCHAR(MAX),
@Description NVARCHAR(MAX),
@Currency NVARCHAR(3),
@Instrument NVARCHAR(MAX),
@Type NVARCHAR(MAX),
@Number NVARCHAR(MAX)
)
AS
INSERT INTO Entity VALUES (@Name, @Description)
DECLARE @EntityId INTEGER = (SELECT Id FROM Entity WHERE Name = @Name AND Description = @Description)
INSERT INTO Account VALUES (@AccountId, @EntityId, @Currency)
...
GO
CREATE PROCEDURE AddAccount(
@AccountId BIGINT,
@Name NVARCHAR(MAX),
@Description NVARCHAR(MAX),
@Currency NVARCHAR(3),
@Instrument NVARCHAR(MAX),
@Type NVARCHAR(MAX),
@Number NVARCHAR(MAX)
)
AS
INSERT INTO Entity VALUES (@Name, @Description)
DECLARE @EntityId INTEGER = (SELECT Id FROM Entity WHERE Name = @Name AND Description = @Description)
INSERT INTO Account VALUES (@AccountId, @EntityId, @Currency)
...
GO
Consequently, by adding records
to the Account table, instead of the following code:
EXEC
AddAccount 'Cash', 'Everyday cash account', 'USD', NULL, NULL, NULL
we will now receive one
more parameter (ID of the created account):
EXEC AddAccount 1, 'Cash', 'Everyday cash account', 'USD', NULL, NULL, NULL
Possibly, the next logical question is whether it is possible to add a command to use the necessary federation (USE FEDERATION) to the body of the procedure. As we got the ID of the account to be created, we know which federation we are going to work with, and can directly proceed to use the necessary shard.
EXEC AddAccount 1, 'Cash', 'Everyday cash account', 'USD', NULL, NULL, NULL
Possibly, the next logical question is whether it is possible to add a command to use the necessary federation (USE FEDERATION) to the body of the procedure. As we got the ID of the account to be created, we know which federation we are going to work with, and can directly proceed to use the necessary shard.
USE FEDERATION Accounts(AccountId = @AcccountId) WITH RESET, FILTERING = OFF
GO
Unfortunately, if you do this,
SSMS will display an error message. The problem is that the AddAccount
procedure is stored on a specific shard, and this means that you cannot use the
USE FEDERATION operation. Apart from that, using the USE FEDERATION statement is
just impossible in procedures. The code for switching federations must be
placed at a “higher level.”
This concludes the changes
that must be made in the database creation script. You can apply them without
errors. As a result, you create a single root database (federation root) and
one shard (federation member).
Sharding
One last thing we have left
to do is actually scaling out the database – separating the data of one account
from the data of another account.
Create a separate script
for this purpose:
--
Scaling out the federation
USE FEDERATION ROOT WITH RESET
GO
ALTER FEDERATION Accounts SPLIT AT (AccountId = 2)
GO
The first thing this script does is it starts using the federation root, meaning it starts working within the bounds of the xPenses database, since the information about federations (metadata) is stored there.
USE FEDERATION ROOT WITH RESET
GO
ALTER FEDERATION Accounts SPLIT AT (AccountId = 2)
GO
The first thing this script does is it starts using the federation root, meaning it starts working within the bounds of the xPenses database, since the information about federations (metadata) is stored there.
Next, define that you want
to partition the Accounts federation, beginning with the AccountId field with
the value “2”. This way, the data of the first account remains on the first
shard, and the data of all the other accounts is moved to the next. Keep in
mind that you do not need to specify it anywhere that the table used for data
partitioning is the Account table. You work only with the metadata of the
xPenses database!
Well then, you run this
command and… there is a possibility that when you refresh the Object Explorer
window you’ll see the following:
Instead of one new shard
you got… three! There is actually nothing wrong with that. The reason for this is
that the data stored within the bounds of the first shard is copied in
accordance with the AccountId value. In other words, for the account with the ID
value “1” you must copy the data to one shard, for one with the ID value “2” –
to another one. Obviously, this takes time. After SQL Azure redistributes the data
among the shards, you will see that you got a database with the federations metadata
and two shards.
Now, if you need to split the
data into three shards, for example, to move the data of the account with the
ID value “3” to a separate shard, you just need to run the following command:
ALTER
FEDERATION Accounts SPLIT AT (AccountId = 3)
GO
Conclusion
We have examined the process of migrating a database creation script for using it with SQL Azure Federations. As you can see, you can avoid most bottlenecks pretty easily. However, a significant part of the database logic must be moved to a “higher level”. We stayed at the database level. In real projects, we strongly recommend conducting a detailed analysis of the application environment and the database architecture before commencing migration using SQL Azure Federations.
·




No comments:
Post a Comment