The Story
Follow data on its journey through the Azure data landscape, from raw source to business insight.
The Data Ocean
Data exists everywhere: in databases, files, streams, and APIs. It comes in three forms: structured (tables with fixed schemas), semi-structured (JSON, XML with flexible tags), and unstructured (images, videos, documents). Understanding these forms is the first step.
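The three forms can be sketched in a few lines of Python. This is only an illustration using the standard library; the sample records and the helper field_sets are made up for the example and do not come from any Azure service.

```python
import csv
import io
import json

# Structured: every row follows the same fixed schema (the CSV header).
csv_text = "id,name,city\n1,Ada,London\n2,Lin,Oslo\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: records describe themselves and may differ in shape.
json_text = '[{"id": 1, "name": "Ada", "tags": ["vip"]}, {"id": 2, "name": "Lin"}]'
docs = json.loads(json_text)

# Unstructured: raw bytes with no schema; meaning has to be extracted separately.
blob = bytes([0x89, 0x50, 0x4E, 0x47])  # first four bytes of a PNG file signature


def field_sets(records):
    """Return the distinct sets of field names seen across the records."""
    return {frozenset(r.keys()) for r in records}


print(len(field_sets(rows)))  # structured rows share a single shape
print(len(field_sets(docs)))  # semi-structured documents can have several shapes
```

The structured rows all share one set of columns, while the JSON documents carry different fields from record to record. That difference is what "flexible tags" means in practice.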
The Foundations
Before data can flow, it needs a home. Relational databases store structured data in normalised tables with ACID guarantees. Non-relational databases offer flexibility: documents, key-value pairs, graphs, and column families each solve different problems.
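To make the ACID guarantee concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a relational engine. The account table and the transfer helper are hypothetical names chosen for the example. The point is atomicity: either both updates apply or neither does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move funds atomically: both updates apply, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was rolled back too.
        return False


failed = transfer(conn, 1, 2, 500)   # would overdraw account 1
after_failed = dict(conn.execute("SELECT id, balance FROM account"))
succeeded = transfer(conn, 1, 2, 30)
after_ok = dict(conn.execute("SELECT id, balance FROM account"))
print(failed, after_failed, succeeded, after_ok)
```

The rejected transfer leaves both balances untouched, even though the credit to the second account had already been executed inside the transaction.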
The SQL Fortress
Azure SQL stands as a fortress for relational data. Three deployment options serve different needs: SQL Database for cloud-native apps, Managed Instance for lift-and-shift migrations with near-full SQL Server compatibility, and SQL Server on VMs for those who need complete control.
The Global Network
Azure Cosmos DB spans the globe. Its multi-model engine speaks many languages through its APIs: NoSQL, MongoDB, Cassandra, Gremlin, and Table. With single-digit millisecond reads and five tuneable consistency levels, it lets you balance performance against data freshness.
The Pipeline Factory
Data rarely stays where it was born. Azure Data Factory builds pipelines that extract data from 90+ sources, transform it in mapping data flows, and load it into analytical stores. Triggers schedule the work: on a timer, in tumbling windows, or when events fire.
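The extract, transform, load pattern behind such a pipeline can be sketched in plain Python. This is not the Data Factory API. The functions extract, transform, and load, the sample CSV, and the in-memory warehouse list are all hypothetical stand-ins for a source, a data flow, and an analytical store.

```python
import csv
import io


def extract(csv_text):
    """Read raw rows from a source (here, CSV text)."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Clean rows: tidy customer names, cast amounts, drop rows with no amount."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue  # skip incomplete records
        cleaned.append(
            {
                "customer": row["customer"].strip().title(),
                "amount": float(row["amount"]),
            }
        )
    return cleaned


def load(rows, store):
    """Append the cleaned rows to the target store and report how many were loaded."""
    store.extend(rows)
    return len(rows)


source = "customer,amount\n ada ,10.5\nlin,\nbo,4\n"
warehouse = []
loaded = load(transform(extract(source)), warehouse)
print(loaded, warehouse)
```

In a real pipeline each stage would be a configured activity, and a trigger would decide when the chain runs.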
The Stream
Some data can't wait for batch processing. Event Hubs ingests millions of events per second, and Stream Analytics processes them in real time using windowing functions: tumbling windows for fixed, non-overlapping intervals and sliding windows for intervals that overlap.
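The difference between non-overlapping and overlapping windows can be seen in a small Python sketch. It is not Stream Analytics query syntax. The function names and the event timestamps (in seconds) are assumptions made for the example, and the overlapping variant advances in fixed steps, which is closer to what Stream Analytics calls a hopping window than to its sliding window.

```python
from collections import Counter


def tumbling_counts(timestamps, size):
    """Count events per fixed, non-overlapping window of `size` seconds."""
    counts = Counter((t // size) * size for t in timestamps)
    return dict(sorted(counts.items()))


def overlapping_counts(timestamps, size, hop):
    """Count events in windows of `size` seconds that start every `hop` seconds."""
    if not timestamps:
        return {}
    result = {}
    start = 0
    while start <= max(timestamps):
        result[start] = sum(1 for t in timestamps if start <= t < start + size)
        start += hop
    return result


events = [1, 3, 9, 10, 14, 21]  # hypothetical event times in seconds
tumbling = tumbling_counts(events, 10)
overlapping = overlapping_counts(events, 10, 5)
print(tumbling)
print(overlapping)
```

Each event falls into exactly one tumbling window. With overlapping windows the same event can be counted more than once, so the overlapping counts add up to more than the number of events.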
The Data Lake
Azure Data Lake Storage Gen2 is the vast reservoir where raw data accumulates. Built on Blob Storage with a hierarchical namespace, it stores petabytes of data in formats like Parquet and Avro, ready for Spark and Synapse to analyse.
The Analytics Engine
Azure Synapse Analytics is the great unifier: SQL pools for data warehousing, Spark pools for big data processing, and built-in pipelines for orchestration. The serverless SQL pool lets you query files in the data lake without provisioning any dedicated compute.
The Dashboard Gallery
Power BI transforms data into stories. Desktop for authoring, Service for sharing, Mobile for on-the-go consumption. Reports tell detailed multi-page stories; dashboards pin the most critical tiles on a single page for at-a-glance monitoring.
The Catalogue
Microsoft Purview brings order to the data estate. It scans sources automatically, builds a Data Map of all assets, traces lineage from source to dashboard, and applies sensitivity labels for compliance, ensuring data is discoverable, trustworthy, and governed.