Master the DP-900

An interactive study guide built on 7 memory techniques to help you pass the Microsoft Azure Data Fundamentals exam.

Exam weighting: Core 25-30%, Relational 20-25%, Non-Relational 15-20%, Analytics 25-30%

What it covers

Core data concepts (structured, semi-structured, and unstructured data), relational databases (Azure SQL), non-relational databases (Cosmos DB), analytics workloads (Synapse, Power BI), and data governance (Purview).

Ideal for

IT professionals new to data, business analysts, project managers, and anyone wanting a foundational understanding of Azure data services.

Aspire to this if

You're starting a career in data, moving from infrastructure to analytics, or want to validate your Azure data knowledge before pursuing associate-level certifications.

Section 1 / Spatial Memory

The Map

An architecture diagram of Azure data services, grouped by exam domain and weighting.

Core Data Concepts: 25-30%
Relational Data: 20-25%
Non-Relational Data: 15-20%
Analytics Workload: 25-30%

Section 2 / Narrative Memory

The Story

Follow data on its journey through the Azure data landscape — from raw source to business insight.

🌊

The Data Ocean

Data exists everywhere — in databases, files, streams, and APIs. It comes in three forms: structured (tables with fixed schemas), semi-structured (JSON, XML with flexible tags), and unstructured (images, videos, documents). Understanding these forms is the first step.

Exam Intel Three data classifications: Structured (fixed schema, tables), Semi-structured (tags/keys but flexible — JSON, XML), Unstructured (no schema — images, video, audio). Know which Azure services handle each.
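
As a rough illustration of the three classifications, the sketch below holds similar customer data in each form. The field names and sample values are made up for the example and are not taken from any Azure service.

```python
import csv
import io
import json

# Structured: every row follows the same fixed schema (id, name, city).
structured = list(
    csv.DictReader(io.StringIO("id,name,city\n1,Ana,Perth\n2,Raj,Sydney\n"))
)

# Semi-structured: keys describe the data, but each record may differ.
semi_structured = json.loads(
    '[{"id": 1, "name": "Ana"}, {"id": 2, "name": "Raj", "phones": ["0400 000 000"]}]'
)

# Unstructured: raw bytes with no schema (a stand-in for image data).
unstructured = bytes([0x89, 0x50, 0x4E, 0x47])

# Do all records share one set of fields?
same_columns = len({tuple(row.keys()) for row in structured}) == 1
same_keys = len({tuple(sorted(rec.keys())) for rec in semi_structured}) == 1
```

Here `same_columns` holds for the structured rows, while `same_keys` does not hold for the semi-structured records, because the second record carries an extra `phones` key.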

🏗️

The Foundations

Before data can flow, it needs a home. Relational databases store structured data in normalised tables with ACID guarantees. Non-relational databases offer flexibility — documents, key-value pairs, graphs, and column families each solve different problems.

Exam Intel ACID = Atomicity (all or nothing), Consistency (valid state), Isolation (concurrent safety), Durability (committed = permanent). Normalisation reduces redundancy. Denormalisation improves read performance.
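
Atomicity is easiest to see in a failed transfer. The sketch below uses Python's built-in `sqlite3` module as a stand-in for a relational database; the `account` table and the `transfer` helper are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts; both updates succeed or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, 1, 2, 30)       # valid transfer
failed = transfer(conn, 1, 2, 500)  # would overdraw, so the credit is rolled back too
balances = dict(conn.execute("SELECT id, balance FROM account"))
```

After both calls, the balances reflect only the first transfer: the second one credited account 2 and then hit the `CHECK` constraint on account 1, so the whole transaction was undone.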

🗄️

The SQL Fortress

Azure SQL stands as a fortress for relational data. Three tiers serve different needs: SQL Database for cloud-native apps, Managed Instance for lift-and-shift migrations with near-full SQL Server compatibility, and SQL Server on VMs for those who need complete control.

Exam Intel Azure SQL Database = PaaS (Microsoft manages infra). Managed Instance = PaaS with near-100% SQL Server compat. SQL VM = IaaS (you manage OS + SQL). Choose PaaS first, VM only when you need OS access or unsupported features.

🌐

The Global Network

Azure Cosmos DB spans the globe. Its multi-model engine speaks many languages: NoSQL, MongoDB, Cassandra, Gremlin, and Table. With single-digit millisecond reads and five tuneable consistency levels, it lets you trade data freshness against performance.

Exam Intel Cosmos DB 5 consistency levels (strongest to weakest): Strong → Bounded Staleness → Session (default) → Consistent Prefix → Eventual. Request Units (RUs) = throughput currency. APIs determine data model.
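
One way to keep the ordering straight is to model it as an ordered enumeration. This is only a memory aid written for this guide; the `Consistency` class and its numeric values are assumptions and not part of any Cosmos DB SDK.

```python
from enum import IntEnum


class Consistency(IntEnum):
    """The five levels; a higher value means a stronger guarantee."""
    EVENTUAL = 1
    CONSISTENT_PREFIX = 2
    SESSION = 3
    BOUNDED_STALENESS = 4
    STRONG = 5


DEFAULT = Consistency.SESSION  # Session is the default level


def strongest_first():
    """Return level names ordered from strongest to weakest."""
    return [level.name for level in sorted(Consistency, reverse=True)]
```

Calling `strongest_first()` walks the list in the same order as the exam note, with `SESSION` sitting in the middle.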

🏭

The Pipeline Factory

Data rarely stays where it was born. Azure Data Factory builds pipelines that extract data from 90+ sources, transform it in mapping data flows, and load it into analytical stores. Triggers schedule the work — on a timer, in tumbling windows, or when events fire.

Exam Intel ETL = Extract, Transform, Load (transform before storing). ELT = Extract, Load, Transform (load raw, transform in place — preferred for big data). Data Factory handles both. Mapping data flows = code-free visual transforms.
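
The difference between the two patterns is where the transform runs. The sketch below contrasts them using an in-memory SQLite database as a stand-in for the analytical store; the table names and `raw_rows` sample are hypothetical, and this is not how Data Factory itself is programmed.

```python
import sqlite3

raw_rows = [("ana", "12.50"), ("raj", "7.25")]  # hypothetical source extract


def etl(rows):
    """ETL: transform in application code, then load the cleaned rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
    cleaned = [(name.upper(), float(amount)) for name, amount in rows]
    db.executemany("INSERT INTO sales VALUES (?, ?)", cleaned)
    return db.execute("SELECT customer, amount FROM sales ORDER BY customer").fetchall()


def elt(rows):
    """ELT: load raw rows untouched, then transform inside the store with SQL."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE raw_sales (customer TEXT, amount TEXT)")
    db.executemany("INSERT INTO raw_sales VALUES (?, ?)", rows)
    db.execute(
        "CREATE TABLE sales AS "
        "SELECT UPPER(customer) AS customer, CAST(amount AS REAL) AS amount "
        "FROM raw_sales"
    )
    return db.execute("SELECT customer, amount FROM sales ORDER BY customer").fetchall()
```

Both functions end with the same `sales` table. The ELT version also keeps `raw_sales`, which is why it suits big data: the raw copy stays available for later, different transforms.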

⚡

The Stream

Some data can't wait for batch processing. Event Hubs ingests millions of events per second, and Stream Analytics processes them in real time using windowing functions: tumbling windows for fixed, non-overlapping intervals, hopping windows for fixed intervals that overlap, and sliding windows that are triggered by the events themselves.

Exam Intel Batch = process accumulated data at intervals. Streaming = process as it arrives. Windowing: Tumbling (fixed, non-overlapping), Hopping (fixed, overlapping), Sliding (event-triggered), Session (grouped by activity gap).
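
The tumbling and hopping definitions can be sketched in a few lines of plain Python. This is a simplified model that treats each window as the half-open interval from its start time to start plus size; Stream Analytics has its own boundary rules, and the function names and `events` sample are assumptions for the example.

```python
from collections import defaultdict


def tumbling_counts(timestamps, size):
    """Count events per fixed, non-overlapping window [start, start + size)."""
    counts = defaultdict(int)
    for t in timestamps:
        counts[(t // size) * size] += 1
    return dict(sorted(counts.items()))


def hopping_counts(timestamps, size, hop):
    """Count events per fixed-size window that starts every `hop` seconds.

    When hop < size the windows overlap, so one event lands in several windows.
    """
    counts = defaultdict(int)
    for t in timestamps:
        # Start from the latest window opening at or before t,
        # then step back while the window still covers t.
        start = (t // hop) * hop
        while start > t - size and start >= 0:
            counts[start] += 1
            start -= hop
    return dict(sorted(counts.items()))


events = [1, 4, 11, 12, 19]  # hypothetical event times in seconds
```

With 10-second tumbling windows each event is counted once. With 10-second windows hopping every 5 seconds, the event at second 11 is counted in both the window starting at 5 and the one starting at 10.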

🏔️

The Data Lake

Azure Data Lake Storage Gen2 is the vast reservoir where raw data accumulates. Built on Blob Storage with a hierarchical namespace, it stores petabytes of data in formats like Parquet and Avro, ready for Spark and Synapse to analyse.

Exam Intel Data Lake Gen2 = Blob Storage + hierarchical namespace. Blob tiers: Hot (frequent access), Cool (30+ days), Archive (180+ days, offline). Parquet = columnar (analytics). Avro = row-based (streaming). CSV/JSON = human-readable.

🔬

The Analytics Engine

Azure Synapse Analytics is the great unifier — SQL pools for data warehousing, Spark pools for big data processing, and built-in pipelines for orchestration. Serverless SQL lets you query Data Lake files without provisioning anything.

Exam Intel Synapse dedicated SQL pool = MPP data warehouse (provisioned, pay for compute). Serverless SQL pool = pay-per-query on files. Spark pool = big data notebooks. Synapse Link = HTAP (analytical queries on operational data without ETL).

📊

The Dashboard Gallery

Power BI transforms data into stories. Desktop for authoring, Service for sharing, Mobile for on-the-go consumption. Reports tell detailed multi-page stories; dashboards pin the most critical tiles on a single page for at-a-glance monitoring.

Exam Intel Power BI Dashboard vs Report: Dashboard = single page, pins from multiple reports, no filters. Report = multi-page, interactive filters, built from one dataset. Paginated reports = pixel-perfect, optimised for printing.

🔍

The Catalogue

Microsoft Purview brings order to the data estate. It scans sources automatically, builds a Data Map of all assets, traces lineage from source to dashboard, and applies sensitivity labels for compliance — ensuring data is discoverable, trustworthy, and governed.

Exam Intel Purview Data Catalog = discover/search data. Data Map = automated metadata scanning. Data Lineage = source-to-report tracing. Data Estate Insights = usage and health dashboards. Sensitivity labels from Microsoft 365 apply to data assets.

Section 3 / Acronym Memory

Mnemonic Wall

Memorable acronyms and phrases to anchor key exam concepts in your memory.

💎

ACID

Atomicity, Consistency, Isolation, Durability

Transaction guarantees. All-or-nothing, valid state, concurrent safety, committed = permanent.

📊

OLTP vs OLAP

Online Transaction Processing vs Online Analytical Processing

OLTP = many small writes (point of sale). OLAP = complex reads (data warehouse). Know which services serve which.

🗄️

SMA

SQL Database, Managed Instance, Azure VM

"SMA": the Azure SQL family from most managed to least. PaaS → PaaS → IaaS.

🌐

SBSCE

Strong, Bounded Staleness, Session, Consistent Prefix, Eventual

Cosmos DB consistency levels from strongest to weakest. Session is the default.

📄

DKCG

Document, Key-value, Column-family, Graph

Four types of non-relational data stores. Cosmos DB supports all four via different APIs.

🔥

THSS

Tumbling, Hopping, Sliding, Session

"THE windows HOP, SLIDE, and SESSION." Stream Analytics windowing functions.

🏔️

HCCA

Hot, Cool, Cold, Archive

Blob Storage access tiers. Hot = frequent. Cool = 30+ days. Cold = 90+ days. Archive = 180+ days, offline, must rehydrate.
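
Read as a rule of thumb, the thresholds on this card become a small decision function. The sketch below is one interpretation of the card, not an Azure API: `suggest_tier` and its parameters are hypothetical, and real tier choices also depend on pricing and retrieval needs.

```python
def suggest_tier(days_untouched, needs_instant_read=True):
    """Map expected idle time (in days) to a tier using the card's thresholds."""
    if days_untouched >= 180 and not needs_instant_read:
        return "Archive"  # offline: data must be rehydrated before it can be read
    if days_untouched >= 90:
        return "Cold"
    if days_untouched >= 30:
        return "Cool"
    return "Hot"
```

For example, data that sits untouched for a year but must stay readable on demand lands in Cold rather than Archive, because Archive is offline.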

📊

DRD

Dataset → Report → Dashboard

Power BI building blocks in order. Dashboards pin tiles from reports. Reports built from datasets.

🔑

PK-FK

Primary Key — Foreign Key

PK uniquely identifies a row. FK references another table's PK. This creates relationships in relational databases.
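
A small sketch of both keys at work, using Python's built-in `sqlite3` module as a stand-in for Azure SQL. The `customer` and `orders` tables are made up for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
db.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
db.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(customer_id))"
)
db.execute("INSERT INTO customer VALUES (1, 'Ana')")
db.execute("INSERT INTO orders VALUES (100, 1)")  # valid: customer 1 exists

try:
    db.execute("INSERT INTO orders VALUES (101, 99)")  # FK points at no customer
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

try:
    db.execute("INSERT INTO customer VALUES (1, 'Duplicate')")  # PK already used
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

The database rejects both bad inserts: the foreign key stops an order from pointing at a missing customer, and the primary key stops two customers from sharing an ID.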

📦

ELT

Extract, Load, Transform

Modern approach: load raw data first, transform in place. Preferred for big data. ETL transforms before loading.

🔬

DSP

Dedicated pool, Serverless pool, Spark Pool

Synapse Analytics three compute options. Dedicated = provisioned MPP. Serverless = pay-per-query. Spark = big data.

📁

PJCA

Parquet, JSON, CSV, Avro

Key file formats. Parquet = columnar (analytics). JSON = semi-structured. CSV = flat text. Avro = row-based (streaming).
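
The row-based versus columnar distinction can be pictured with plain Python structures. This is only a conceptual sketch with made-up sample data; it does not read or write real Parquet or Avro files.

```python
# Row layout (the Avro idea): each record is stored together.
rows = [
    {"id": 1, "city": "Perth", "amount": 10.0},
    {"id": 2, "city": "Sydney", "amount": 25.0},
    {"id": 3, "city": "Perth", "amount": 5.0},
]

# Columnar layout (the Parquet idea): each column is stored together.
columns = {name: [record[name] for record in rows] for name in rows[0]}

# An analytical query such as SUM(amount) only needs one column.
total_from_columns = sum(columns["amount"])

# With rows, the same query has to visit every whole record.
total_from_rows = sum(record["amount"] for record in rows)
```

Both totals match, but the columnar version touches only the `amount` list, which is why columnar formats suit analytics, while appending one complete record is simpler in the row layout, which suits streaming.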

🌐

NMCGT

NoSQL, MongoDB, Cassandra, Gremlin, Table

Cosmos DB's 5 APIs. NoSQL is the native/recommended API.

🛡️

AAA

Authentication, Authorisation, Auditing

Three pillars of data security. Who are you, what can you do, what did you do.

🔍

CMLL

Catalog, Map, Lineage, Labels

Microsoft Purview capabilities. Catalog (discover), Map (scan metadata), Lineage (trace flow), Labels (classify).

Section 4 / Contrast Memory

API	Data Model	Best For
NoSQL	JSON documents	New apps (recommended)
MongoDB	BSON documents	MongoDB migrations
Cassandra	Column-family	Cassandra workloads
Gremlin	Graph (vertices/edges)	Relationship-heavy data
Table	Key-value	Azure Table migrations


Section 5 / Visual Grouping

The Cheat Sheet

A dense four-column reference grid — one column per exam domain.

Core Data Concepts

25-30%

Data Classification

Structured: fixed schema, tables, rows/columns
Semi-structured: flexible tags (JSON, XML, YAML)
Unstructured: no schema (images, video, audio, docs)

Data Roles

Database Admin: backups, access, performance, security
Data Engineer: pipelines, integration, data prep
Data Analyst: reports, dashboards, visualisations

File Formats

CSV: plain text, comma-separated, human-readable
JSON: key-value pairs, semi-structured
Parquet: columnar, compressed, analytics-optimised
Avro: row-based, schema evolution, streaming

Processing Patterns

Batch: scheduled, large volumes, high latency OK
Streaming: real-time, event-driven, low latency
ETL: transform before load (traditional)
ELT: load raw, transform in place (modern/cloud)

Relational Data

20-25%

Core Concepts

Tables, rows (records), columns (fields)
Primary Key: unique row identifier
Foreign Key: references another table's PK
Normalisation: reduce redundancy (1NF, 2NF, 3NF)
ACID transactions: all-or-nothing guarantees

SQL Commands

DDL: CREATE, ALTER, DROP (structure)
DML: SELECT, INSERT, UPDATE, DELETE (data)
DCL: GRANT, REVOKE (permissions)
Views: virtual tables from queries
Indexes: speed up queries, slow down writes
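
The categories above can be tried out with Python's built-in `sqlite3` module. The `product` table is a made-up example, and DCL is left out because SQLite has no user accounts to grant permissions to.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# DDL: define structure.
db.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
db.execute("CREATE INDEX idx_product_name ON product (name)")

# DML: work with the rows.
db.executemany(
    "INSERT INTO product (name, price) VALUES (?, ?)",
    [("Widget", 9.5), ("Gadget", 20.0)],
)
db.execute("UPDATE product SET price = price * 2 WHERE name = 'Widget'")
db.execute("DELETE FROM product WHERE name = 'Gadget'")

# View: a stored query that behaves like a virtual table.
db.execute("CREATE VIEW expensive AS SELECT name, price FROM product WHERE price > 10")
rows = db.execute("SELECT name, price FROM expensive").fetchall()
```

Querying the view returns the one remaining product at its updated price, without the view storing any data of its own.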

Azure SQL Family

SQL Database: PaaS, elastic pools, serverless
Managed Instance: near-full SQL Server compat
SQL VM: IaaS, full OS access, any SQL version
PostgreSQL/MySQL: open-source managed PaaS

Security

Microsoft Entra ID (formerly Azure AD) authentication
Firewall rules (IP-based)
TDE: Transparent Data Encryption (at rest)
Always Encrypted: client-side encryption

Non-Relational Data

15-20%

Data Store Types

Key-value: simple lookups (Redis, Table Storage)
Document: JSON docs (Cosmos DB NoSQL)
Column-family: wide columns (Cosmos DB Cassandra)
Graph: vertices + edges (Cosmos DB Gremlin)

Azure Cosmos DB

Global distribution with multi-region writes
5 APIs: NoSQL, MongoDB, Cassandra, Gremlin, Table
5 consistency levels: Strong → Eventual
Request Units (RUs) = throughput measure
Partition key determines data distribution
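
Why the partition key matters can be shown with a toy hash-partitioning sketch. Cosmos DB's real hashing is internal to the service; the `partition_for` function, the SHA-256 choice, and the four-partition setup below are all assumptions made for illustration.

```python
import hashlib
from collections import Counter


def partition_for(key, partitions=4):
    """Deterministically map a partition key value to a partition number."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


# High-cardinality key: 1000 distinct (hypothetical) user IDs.
by_user = Counter(partition_for(f"user-{i}") for i in range(1000))

# Low-cardinality key: every item shares the same country value.
by_country = Counter(partition_for("AU") for _ in range(1000))
```

The user ID key spreads items across all partitions, while the single country value sends every item to one partition, which is the "hot partition" problem a poor key choice creates.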

Azure Storage

Blob: unstructured (Hot/Cool/Archive tiers)
File: SMB/NFS file shares (lift-and-shift)
Queue: messaging between components
Table: key-value NoSQL (simple, cheap)

When to Use

Key-value: session state, caching, config
Document: content management, catalogues
Column-family: IoT telemetry, time series
Graph: social networks, fraud detection

Analytics Workload

25-30%

Azure Synapse

Dedicated SQL pool: provisioned MPP warehouse
Serverless SQL pool: pay-per-query on files
Spark pool: big data notebooks
Synapse Link: HTAP (no-ETL analytics on ops data)

Power BI

Desktop: author reports locally
Service: publish, share, collaborate
Mobile: consume on devices
Dashboard = single page, pins from reports
Report = multi-page, interactive, one dataset
Paginated = pixel-perfect, print-ready

Data Warehousing

Star schema: fact table + dimension tables
Snowflake schema: normalised dimensions
Fact tables: measurements/events (numeric)
Dimension tables: descriptive attributes
Slowly Changing Dimensions (SCD) types
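
A minimal star schema, sketched with Python's built-in `sqlite3` module rather than a Synapse dedicated SQL pool. The table names, keys, and sales figures are invented for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE fact_sales  (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key    INTEGER REFERENCES dim_date(date_key),
    amount      REAL
);
INSERT INTO dim_product VALUES (1, 'Bikes'), (2, 'Helmets');
INSERT INTO dim_date    VALUES (20240101, 2024), (20250101, 2025);
INSERT INTO fact_sales  VALUES (1, 20240101, 500.0), (1, 20250101, 700.0),
                               (2, 20250101, 50.0);
""")

# The fact table holds the numbers; the dimensions supply the labels to group by.
totals = db.execute("""
    SELECT p.category, d.year, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    JOIN dim_date    AS d ON d.date_key    = f.date_key
    GROUP BY p.category, d.year
    ORDER BY p.category, d.year
""").fetchall()
```

Each dimension joins directly to the fact table, which gives the schema its star shape. A snowflake schema would split `dim_product` further, for example into separate product and category tables.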

Governance

Microsoft Purview: unified data governance
Data Catalog: discover and classify assets
Data Map: automated metadata scanning
Data Lineage: trace data flow end-to-end
Sensitivity labels for compliance

Section 6 / Method of Loci

Memory Palace

Walk through five rooms, each representing a key area of Azure data.

The Foundation Hall

Core Data Concepts — Where understanding begins

📊

Structured Data

Tables with rows and columns. Fixed schema. SQL databases

📄

Semi-Structured

JSON, XML, YAML. Flexible tags/keys. No fixed table format

🖼️

Unstructured Data

Images, video, audio, documents. No inherent schema

💎

ACID Properties

Atomicity (all/nothing), Consistency, Isolation, Durability

📦

ETL vs ELT

Transform then load (ETL) vs load then transform (ELT). ELT preferred for cloud

⚡

Batch vs Stream

Batch = scheduled bulk. Stream = real-time. Different tools for each

The SQL Chamber

Relational Data — Where structure reigns

🗄️

Azure SQL Database

Fully managed PaaS. Elastic pools for multi-tenant. Serverless option

🔄

Managed Instance

Near 100% SQL Server compat. Lift-and-shift migrations. VNet integration

🖥️

SQL VM

Full IaaS control. Any SQL Server version. OS-level access

🔑

Keys & Relations

Primary Key = unique ID. Foreign Key = reference. Creates table relationships

📐

Normalisation

1NF → 2NF → 3NF. Reduces redundancy. Improves data integrity

🐘

Open Source DBs

Azure Database for PostgreSQL and MySQL. Fully managed PaaS

The Cosmos Chamber

Non-Relational Data — Where flexibility thrives

🌐

Cosmos DB

Global distribution. <10ms latency. 5 APIs. 5 consistency levels

📄

Document Store

JSON documents. Flexible schema. Each doc can differ. Cosmos DB NoSQL API

🔑

Key-Value Store

Simplest model. Fast lookups by key. Azure Table Storage, Redis Cache

📊

Column-Family

Wide columns. Good for time-series, IoT. Cosmos DB Cassandra API

🕸️

Graph Store

Vertices + edges. Model relationships. Cosmos DB Gremlin API

📦

Blob Storage

Unstructured data. Hot/Cool/Archive tiers. Block, append, page blobs

The Pipeline Room

Data Integration — Where data flows

🏭

Data Factory

90+ connectors. Pipelines. Mapping data flows. Schedule/event triggers

⚡

Stream Analytics

SQL-like queries on streams. 4 windowing functions. Real-time output

🔥

Azure Databricks

Apache Spark platform. Notebooks. Python/Scala/SQL/R. Delta Lake

🔬

Synapse Analytics

Unified: SQL pools + Spark pools + Pipelines. Serverless & dedicated

🏔️

Data Lake Gen2

Blob + hierarchical namespace. Parquet/Avro/Delta. Big data optimised

📡

Event Hubs

Millions of events/second. Kafka compatible. Streaming ingestion

The Insight Gallery

Analytics & Governance — Where value emerges

📊

Power BI Desktop

Author reports. Connect to data. DAX calculations. Publish to Service

📋

Power BI Service

Share reports. Create dashboards. Schedule refresh. Collaborate

⭐

Star Schema

Fact table (measures) + dimension tables (attributes). Analytics standard

🔍

Microsoft Purview

Data Catalog + Map + Lineage. Sensitivity labels. Unified governance

🛡️

Data Security

Microsoft Entra ID (formerly Azure AD) auth. Firewall rules. TDE at rest. Always Encrypted

📈

Paginated Reports

Pixel-perfect. Print-optimised. Multi-page tables. SSRS-based

Section 7 / Pattern Recognition

Pattern Spotter

Decision flowcharts and trigger-answer pattern cards for common exam questions.

Which Azure Database Service?

What type of data?
  ├── Relational + new cloud app → Azure SQL Database
  ├── Relational + migrating SQL Server → SQL Managed Instance
  ├── Relational + need OS access → SQL Server on VM
  ├── NoSQL documents + global scale → Cosmos DB
  ├── Simple key-value lookups → Azure Table Storage
  ├── PostgreSQL or MySQL → Azure Database for PostgreSQL/MySQL
  └── Unstructured files/blobs → Azure Blob Storage
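
For practice, the flowchart above can be written as a function and quizzed against. The input labels such as `"os_access"` are hypothetical shorthand invented for this sketch, not Azure terminology.

```python
def pick_database_service(data_type, need=None):
    """Follow the flowchart above and return the matching service name."""
    if data_type == "relational":
        if need == "os_access":
            return "SQL Server on VM"
        if need == "migrate_sql_server":
            return "SQL Managed Instance"
        if need in ("postgresql", "mysql"):
            return "Azure Database for PostgreSQL/MySQL"
        return "Azure SQL Database"  # new cloud app
    if data_type == "document":
        return "Cosmos DB"
    if data_type == "key_value":
        return "Azure Table Storage"
    if data_type == "unstructured":
        return "Azure Blob Storage"
    raise ValueError(f"no branch for data type: {data_type!r}")
```

Try covering the return values and predicting the answer for each input before running it, for example `pick_database_service("relational", "migrate_sql_server")`.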

Which Analytics Service?

What do you need to do?
  ├── Full data warehouse (provisioned) → Synapse Dedicated SQL Pool
  ├── Ad-hoc queries on Data Lake files → Synapse Serverless SQL Pool
  ├── Big data processing with Spark → Synapse Spark Pool / Databricks
  ├── Real-time stream processing → Azure Stream Analytics
  ├── Business intelligence dashboards → Power BI
  └── Data governance & cataloguing → Microsoft Purview

Which Cosmos DB API?

What's the data model / migration source?
  ├── New app, JSON documents → NoSQL API (recommended)
  ├── Migrating from MongoDB → MongoDB API
  ├── Wide-column / Cassandra workload → Cassandra API
  ├── Graph / relationship-heavy data → Gremlin API
  └── Migrating from Azure Table Storage → Table API

Which File Format?

What's the use case?
  ├── Analytics / columnar queries → Parquet
  ├── Streaming / schema evolution → Avro
  ├── API / semi-structured exchange → JSON
  ├── Simple tabular / human-readable → CSV
  └── Delta Lake / ACID on data lake → Delta (Parquet + transaction log)

Trigger → Answer Patterns

"globally distributed" or "multi-region writes"

→ Azure Cosmos DB

"single-digit millisecond latency"

→ Cosmos DB

"Request Units" or "RU/s"

→ Cosmos DB throughput

"ACID transactions" on relational data

→ Azure SQL Database

"lift-and-shift" SQL Server migration

→ SQL Managed Instance

"elastic pool" for multi-tenant

→ Azure SQL Database

"hierarchical namespace"

→ Data Lake Storage Gen2

"Hot, Cool, Archive" tiers

→ Azure Blob Storage

"tumbling window" or "hopping window"

→ Stream Analytics windowing

"mapping data flow" or "90+ connectors"

→ Azure Data Factory

"serverless SQL pool" or "pay-per-query"

→ Synapse Analytics

"Synapse Link" or "HTAP"

→ No-ETL analytics on operational data

"star schema" or "fact and dimension"

→ Data Warehouse design

"Data Catalog" or "data lineage"

→ Microsoft Purview

"dashboard vs report"

→ Dashboard = single page, Report = multi-page

"paginated report" or "pixel-perfect"

→ Power BI Paginated Reports

"partition key" for NoSQL

→ Cosmos DB data distribution

"graph database" or "vertices and edges"

→ Cosmos DB Gremlin API

Ready to certify?

Train with practitioners, not presenters

Lucid Labs delivers Microsoft certification training grounded in real-world project experience. We adapt every session to your team's environment, data stack, and business objectives — because the best exam prep comes from engineers who build these solutions every day.

🎯

Tailored Content

Training built around your actual data, your tools, and your use cases — not generic slides.

🛠️

Hands-On Labs

Work through real scenarios in your own environment with expert guidance at every step.

📈

Exam + Capability

Pass the exam and build lasting skills your team can apply from day one.

Talk to us about Azure Data Fundamentals training

Custom training for teams & individuals — remote or on-site across Australia

Keith Oak

Director & Principal Consultant — Lucid Labs

Microsoft Solutions Partner architect specialising in Fabric, Azure Data & AI, and GitHub Enterprise. 18+ years delivering data platforms for Australian businesses — building the systems these exams test every day.

lucidlabs.com.au | Published 29-03-2026