Azure SQL Database is one of the most used services in Microsoft Azure, and I use it a lot in my projects. It is basically SQL Server in the cloud, but fully managed and more intelligent. There is another service in Azure that is kind of similar, but not quite: Azure SQL Data Warehouse. Azure SQL Data Warehouse uses a lot of Azure SQL technology, but is different in some profound ways.
In this article, we’ll dive into these differences. You will learn:
Let’s dive in!
Azure SQL Database is SQL Server in the cloud. And because it runs in the cloud, you don’t have to worry about maintaining any infrastructure, tweaking database files, or patching operating systems. You also don’t have to worry about SQL Server licenses, that is all included in the Azure SQL pricing. You just spin it up and use it. These benefits are some of the reasons why running your app in the cloud is incredibly beneficial.
Azure SQL can almost do anything that on-premises SQL Server can do, with a few exceptions. For instance, Azure SQL doesn’t have a SQL agent running, as there are other services in Azure that you can use for that, like Azure Data Factory. There are also some differences in supported T-SQL statements. As an example, you can’t use EXECUTE AS LOGIN in Azure SQL. You can use EXECUTE AS USER instead. Even with these differences, I always go for Azure SQL instead of on-premises SQL Server where I can, because Azure SQL is fully managed and easily scalable.
Additionally, Azure SQL Database offers a lot of intelligent features. Here are some of them:
Dynamic data masking. This enables you to mask sensitive data, like credit card numbers, for certain users, on the fly. The data itself doesn’t change, but is obfuscated when it is retrieved by users that aren’t allowed to see it. This is a great feature for using production data in your test or even development environment.
Geo-replication. This enables you to replicate your complete database to a database in another geographic region. This replication happens live and has a maximum lag of 5 minutes, but is much faster in practice. You can create as many geo-replicas as you want, but you will always have just one databases that you can write to. The other databases are read-only. You can use this to make sure that your application keeps running if one region fails, and to make your application more performant across geographic regions.
Automatic tuning. This analyzes the queries that are executed by Azure SQL Database and determines if something needs to happen to increase performance. It can scale for you, tune the indexes on your database, and tune history logs. It even checks if the action that it performed helped, and if it didn’t, it reverses it. This is much more effective than manual performance tuning in SQL Server.
Azure SQL offers many other features like SQL threat detection, Data encryption at rest, and Azure AD integration. It is a really mature and advanced service.
You now know what Azure SQL Database is, so what is Azure SQL Data Warehouse? Well, it is the SQL Server Data Warehouse feature in the cloud. SQL Server Data Warehouse exists on-premises as a feature of SQL Server. In Azure, it is a dedicated service that allows you to build a data warehouse that can store massive amounts of data, scale up and down, and is fully managed. As with Azure SQL Database, Azure SQL Data Warehouse is something that you just spin up. You don’t have to worry about infrastructure or licenses.
Azure SQL Data Warehouse is often used as a traditional data warehouse solution. This means that you would put massive amounts of data in it, using a data schema of tables and columns that you have designed. Data visualization tools, like PowerBI can than connect to the data warehouse to query the data and answer business questions in reports and graphs.
Azure SQL Data Warehouse has features that are designed for working with big data and serving it for further analysis and visualization. Some of these features are also available for Azure SQL Database. Here are some of them:
Polybase T-SQL queries. These allow you to get data from outside sources, like Hadoop or Azure Blob Storage using regular T-SQL queries. You don’t have to know the details of the system you are querying, since Polybase takes care of it.
Massive parallel processing (MPP). Azure SQL Data Warehouse is designed for data analytics performance when working with massive amounts of data. It can do this because of its MPP architecture. This means that a query is processed by a dedicated node that has its own CPU and Memory.
The ability to pause and resume the service. Unlike running a data warehouse on-premises, you can actually pause Azure SQL Data Warehouse when you don’t need it. When you do this, you don’t pay the costs for the Data Warehouse, but you do still pay the costs for the data in it (at the rate of Azure Premium Storage). When you want to use it again, you can simply resume the service.
In addition, Azure SQL Data Warehouse has similar features to Azure SQL Database, like Data encryption at rest and Azure AD integration.
So you can use both Azure SQL Database and Azure SQL Data Warehouse to store data. Why not use Azure SQL Database as a data warehouse? Well, you could do that, but it is not optimized for it.
Azure SQL Database is optimized for doing CRUD operations (Create, Read, Update and Delete) that you typically perform from an application. This is also called OLTP (Online Transaction Processing). This is reflected by the functionality that it offers, which is typically used when you are building applications. Azure SQL Database also scales for OLTP, as different pricing tiers typically scale to give you more query throughput and not so much data (the current maximum is 1TB, and in some regions 4TB).
Azure SQL Data Warehouse is optimized for performing data analytics tasks, and working with large amounts of data. This is also called OLAP (Online Analytical Processing). Data Warehouse is optimized for OLAP because it is built on top of the MPP (Massive Parallel Processing) architecture, and because it can hold massive amounts of data (currently the maximum is around 1PB) – much more than Azure SQL Database can store in one instance.
That being said, sometimes, you can use Azure SQL Database as a Data Warehouse, as it can provide a lot of performance for when you have many users with a relatively small dataset. Table 1 shows some of the characteristics that might make picking one over the other easier:
Azure SQL Database | Azure SQL Data Warehouse | |
Data type | Relational | Relational |
Active geo-replication | Yes | No |
Dynamic Data Masking | Yes | No |
Data Encryption at rest | Yes | Yes |
Polybase T-SQL queries | Yes | Yes |
Automatic Tuning | Yes | No |
Massive Parallel Processing (MPP) | No | Yes |
Ability to pause and resume | No | Yes |
Max amount of data per database | 4TB | 1PB |
Max concurrent open sessions | 30000 | 1024 |
Max concurrent queries | 6400 | 32 |
Table 1: Azure SQL Database vs. Azure SQL Data Warehouse features and limits
It seems clear what to use when, but is it really? You can also use Azure SQL Database as a Data Warehouse in a certain scenario. Table 2 shows what I think you should use when:
Azure SQL Database | Azure SQL Data Warehouse | |
Use for application database | X | |
Use for data warehouse with large amounts of data and small amounts of users | X | |
Use for data warehouse with data max 4TB and large amount of users | X |
Table 2: when to use Azure SQL Database and when to use Azure SQL Data Warehouse
I hope this article helped you to understand the differences between Azure SQL Database and Azure SQL Data Warehouse and what to use when. If you want to compare more data stores in Azure to each other, you should read “Where to Store Your Data in Azure? Understand Azure Data Storage Options” on the Stackify blog.
With APM, server health metrics, and error log integration, improve your application performance with Stackify Retrace. Try your free two week trial today
If you would like to be a guest contributor to the Stackify blog please reach out to [email protected]