Master the DP-700

An interactive study guide built on 7 memory techniques to help you pass the Microsoft Fabric Data Engineer Associate exam.

Implement & Manage 30-35% · Ingest & Transform 30-35% · Monitor & Optimise 30-35%
What it covers

Lakehouses, Spark notebooks (PySpark/SQL), data pipelines, mirroring, Delta Lake, medallion architecture (bronze/silver/gold), and monitoring & optimisation.

Ideal for

Data engineers, ETL developers, platform engineers, and anyone building data pipelines and lakehouses in Microsoft Fabric.

Aspire to this if

You're a SQL developer or data analyst wanting to move into data engineering, or an Azure data engineer transitioning to Fabric.

The Map

An interactive architecture diagram of Microsoft Fabric data engineering. Click any node to explore key exam facts.

๐Ÿข OneLake Unified Storage ยท ADLS Gen2 ๐Ÿ”ง Data Pipelines Copy ยท Dataflows Gen2 ๐Ÿ““ Spark Notebooks PySpark ยท Spark SQL ยท Scala ๐Ÿ”„ Mirroring Near real-time replication ๐Ÿฅ‰ Bronze Raw / Landing ๐Ÿฅˆ Silver Cleansed / Conformed ๐Ÿฅ‡ Gold Business-Ready ๐Ÿ—๏ธ Warehouse / Lakehouse SQL Analytics Endpoint ยท T-SQL + Spark โšก Eventhouse Streaming ยท KQL ๐Ÿงฎ Semantic Model Direct Lake ยท Power BI ๐Ÿ›ก๏ธ Governance Domains ยท Lineage ยท Security
30-35% Implement & Manage
30-35% Ingest & Transform
30-35% Monitor & Optimise

The Story

Follow data on its engineering journey through Microsoft Fabric, from raw ingestion to business-ready gold.

๐Ÿข

The Foundation

Everything begins with OneLake, the unified data lake that underpins all of Microsoft Fabric. Built on ADLS Gen2, it is where every piece of data automatically lands. Shortcuts extend its reach across clouds without copying a single byte.

Exam Intel OneLake = one lake per tenant. Built on ADLS Gen2. All Fabric items store data automatically. Shortcuts: zero-copy access to internal/external sources (ADLS, S3, GCS, Dataverse).
🔧

The Intake

Data flows in through multiple channels. Pipelines orchestrate bulk movements with Copy Activity and its 170+ connectors. Dataflows Gen2 provide no-code Power Query transformations. For heavy engineering, Spark notebooks run PySpark at scale.

Exam Intel Pipeline activities: Copy, Dataflow, Notebook, Stored Procedure, ForEach, If Condition, Web. Triggers: schedule, tumbling window, event-based. Dataflows Gen2 = Power Query Online with lakehouse staging.
🔄

The Mirror

Some data doesn't need pipelines at all. Mirroring creates near-real-time replicas of external databases (Azure SQL, Cosmos DB, Snowflake) as read-only Delta tables in OneLake. Change Data Capture ensures only differences flow, not full copies.

Exam Intel Mirroring = near real-time CDC replication. No ETL needed. Creates read-only Delta tables. Sources: Azure SQL, Cosmos DB, Snowflake, PostgreSQL, MySQL. Accessible via SQL endpoint + Spark.
🥉

The Bronze Landing

Raw data touches down in the Bronze layer, the first stop in the medallion architecture. Data arrives as-is from source systems. Engineers add metadata columns (ingestion time, source, batch ID) but resist the urge to transform. Schema-on-read rules here.

Exam Intel Bronze = raw landing zone. Minimal transformation. Add metadata columns: _ingestion_timestamp, _source_system, _batch_id. Use Delta tables for ACID + time travel. Files section for unstructured data.
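The Bronze pattern above can be sketched in Spark SQL. The table and column names here (`bronze_sales_raw`, `staging_sales`, the `_source_system` value) are illustrative placeholders, not fixed exam syntax:

```sql
-- Hypothetical Bronze table: raw columns plus ingestion metadata, no transformation.
CREATE TABLE IF NOT EXISTS bronze_sales_raw (
  order_id STRING,
  payload  STRING,                     -- raw source record, kept as-is
  _ingestion_timestamp TIMESTAMP,
  _source_system       STRING,
  _batch_id            STRING
) USING DELTA;

-- Land a batch with metadata stamped at write time.
INSERT INTO bronze_sales_raw
SELECT order_id,
       payload,
       current_timestamp(),            -- _ingestion_timestamp
       'erp_orders',                   -- _source_system (illustrative)
       'batch-2024-06-01-001'          -- _batch_id (illustrative)
FROM staging_sales;
```

Writing to a Delta table (rather than raw files) gives Bronze ACID guarantees and time travel from the very first hop.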
🥈

The Silver Refinery

In the Silver layer, data gets scrubbed. Nulls are handled, types are cast, duplicates are removed. Column names are standardised, dates normalised. Related tables join together. This is where data quality rules enforce consistency.

Exam Intel Silver = cleansed, conformed. Key operations: null handling, type casting, deduplication, column rename, date standardisation, FK resolution. Use watermark/change tracking for incremental loads.
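A typical Silver load combines several of these operations in one statement. This is a sketch only; every table and column name is made up for illustration:

```sql
-- Hypothetical Silver load: cast types, standardise columns, deduplicate.
CREATE OR REPLACE TABLE silver_sales AS
SELECT order_id,
       CAST(order_total AS DECIMAL(18, 2))     AS order_total,   -- type casting
       to_date(order_date_raw, 'yyyy-MM-dd')   AS order_date     -- date standardisation
FROM (
  SELECT *,
         ROW_NUMBER() OVER (
           PARTITION BY order_id                 -- business key
           ORDER BY _ingestion_timestamp DESC    -- keep the latest record
         ) AS rn
  FROM bronze_sales_raw
)
WHERE rn = 1                                     -- deduplication
  AND order_id IS NOT NULL;                      -- basic null handling
```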
🥇

The Gold Vault

Gold is where data becomes a product. Aggregations are pre-computed, star schemas are formed, and partitioning strategies align with downstream query patterns. Z-Order optimisation ensures filter columns are read-efficient.

Exam Intel Gold = business-ready, aggregated. Star schema (facts + dimensions). Pre-computed aggregates. Partition by date/region. Z-Order for filter columns. Serves semantic models, dashboards, APIs.
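The Gold pattern might look like this in Spark SQL: a pre-aggregated table partitioned for pruning, then Z-Ordered on a common filter column. Names are hypothetical:

```sql
-- Hypothetical Gold fact: pre-aggregated, partitioned by date for pruning.
CREATE TABLE IF NOT EXISTS gold_daily_sales
USING DELTA
PARTITIONED BY (order_date)
AS
SELECT order_date,
       region,
       COUNT(*)         AS order_count,
       SUM(order_total) AS revenue
FROM silver_sales
GROUP BY order_date, region;

-- Compact files and co-locate the most frequently filtered column.
OPTIMIZE gold_daily_sales ZORDER BY (region);
```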
⚡

The Spark Engine

Under the hood, Spark does the heavy lifting. V-Order optimisation on writes ensures downstream reads are fast. mssparkutils handles file operations and credential management. High Concurrency mode shares sessions across notebooks for efficiency.

Exam Intel V-Order = write-time optimisation for faster reads. mssparkutils: fs, notebook.run, credentials. High Concurrency mode = shared Spark session. Autoscale pools. Runtime versions matter – pin for stability.
📊

The Serving Layer

Data reaches consumers through two paths: the Lakehouse (Spark + SQL analytics endpoint) and the Warehouse (full T-SQL). Direct Lake semantic models read Delta tables without copying, while the SQL analytics endpoint provides familiar T-SQL access.

Exam Intel Lakehouse: Spark + auto SQL analytics endpoint (read-only). Warehouse: full T-SQL DML, multi-table transactions. Direct Lake: reads Delta directly, no import copy needed. Cross-database queries in warehouse.
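Cross-database querying on the serving side can be sketched in T-SQL using three-part names to reach other items in the same workspace. The warehouse and lakehouse names here are hypothetical:

```sql
-- T-SQL from a Fabric warehouse: join warehouse and lakehouse tables
-- via three-part names (illustrative item names).
SELECT d.region,
       SUM(f.revenue) AS total_revenue
FROM   SalesWarehouse.dbo.FactDailySales AS f
JOIN   SalesLakehouse.dbo.DimRegion      AS d
       ON f.region_key = d.region_key
GROUP BY d.region;
```

Remember the asymmetry: against a lakehouse this runs through the read-only SQL analytics endpoint, while the warehouse side supports full DML.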
🔍

The Monitor's Tower

Engineers watch from the monitoring tower. The Monitoring Hub tracks pipeline runs, Spark jobs, and refresh operations. Capacity metrics reveal compute usage. Apache Spark Advisor suggests performance improvements.

Exam Intel Monitoring Hub: pipeline runs, Spark jobs, dataflow refreshes. Capacity metrics: CU usage, throttling, overload. Spark Advisor: performance recommendations. Spark UI: stage details, DAG, executors.
🛡️

The Governance Gate

Before data leaves, governance ensures it's trustworthy. Fabric domains organise workspaces by business area. Sensitivity labels flow downstream automatically. OneLake RBAC controls who accesses what, while Purview catalogues everything.

Exam Intel Domains: organise workspaces by business area. Sensitivity labels propagate downstream. OneLake RBAC: item-level + folder-level permissions. Purview integration for data catalogue + classification. Data lineage end-to-end.

Mnemonic Wall

Memorable acronyms and phrases to anchor key exam concepts in your memory.

🥇
BSG
Bronze, Silver, Gold
Medallion architecture layers. Bronze = raw. Silver = cleansed. Gold = business-ready.
🔧
CDNS
Copy, Dataflow, Notebook, Stored Proc
The 4 main pipeline activities for data movement and transformation.
📓
VOOM
V-Order, OPTIMIZE, Order by Z, Maintain
Delta table performance stack. V-Order on write, OPTIMIZE to compact, Z-Order to sort, maintain regularly.
🔄
CDC
Change Data Capture
How mirroring works – only syncs changes, not full copies. Also used in watermark patterns for incremental loads.
🏛️
LW
Lakehouse (Spark) vs Warehouse (T-SQL)
Two storage options. Lakehouse = schema-on-read, Spark-first. Warehouse = schema-on-write, T-SQL, multi-table tx.
⚡
VOD
VACUUM, OPTIMIZE, DESCRIBE HISTORY
Delta maintenance trinity. VACUUM cleans old files. OPTIMIZE compacts small files. DESCRIBE HISTORY for audit.
🔗
SAGE
Shortcuts: ADLS, GCS, External S3
OneLake shortcut sources for cross-cloud zero-copy access. Also internal + Dataverse.
📊
DL
Direct Lake
The optimal semantic model mode for Fabric. Reads Delta directly from OneLake. No data copy. Near-real-time.
🔍
MSA
Monitoring Hub, Spark UI, Advisor
Three monitoring tools. Hub = pipeline/job status. Spark UI = stage DAG. Advisor = perf recommendations.
🛡️
DSLP
Domains, Sensitivity labels, Lineage, Purview
Governance pillars. Domains organise. Labels protect. Lineage traces. Purview catalogues.
📦
MIST
Metadata columns, Ingestion timestamp, Source system, batch Tag
What to add in the Bronze layer. These columns enable lineage and debugging.
🔧
SET
Schedule, Event, Tumbling window
Pipeline trigger types. Schedule = cron-like. Event = storage events. Tumbling window = fixed intervals.
🧹
QFCD
Quality, Folding, Connectors, Destination
Dataflows Gen2 key concepts. Column Quality/Distribution/Profile. Query folding. 170+ connectors. Multiple outputs.
🔐
RBAC
Role Based Access Control
OneLake RBAC: item-level and folder-level permissions. Workspace roles: Admin > Member > Contributor > Viewer.
🔬
HCS
High Concurrency Session
Shares Spark session across notebooks. Reduces startup time. Good for interactive development.

Versus Arena

Click any card to flip and reveal a detailed comparison table.

๐Ÿ›๏ธ vs ๐Ÿ—๏ธ
Lakehouse vs Warehouse
Click to flip

Lakehouse vs Warehouse

AspectLakehouseWarehouse
SchemaSchema-on-readSchema-on-write
LanguageSpark (PySpark/SQL)T-SQL
Data typesStructured + semi + unstructuredStructured only
TransactionsSingle table (Delta)Multi-table
SQL accessSQL analytics endpoint (read-only)Full DML (read-write)
Best forData engineering, ML, data scienceBI analytics, T-SQL workloads
Files sectionYes (unstructured storage)No
Click to flip back
🔧 vs 🧹 vs 📓
Pipeline vs Dataflow vs Notebook
Click to flip

Pipeline vs Dataflow vs Notebook

| Aspect | Pipeline | Dataflow Gen2 | Notebook |
|---|---|---|---|
| Purpose | Orchestration | No-code transform | Code-based transform |
| Language | Activities (visual) | Power Query (M) | PySpark/SQL/Scala |
| Best for | Workflow coordination | Simple ETL | Complex engineering |
| Compute | Pipeline engine | Dataflow engine | Spark cluster |
| Reusability | Template pipelines | Reusable dataflows | Parameterised notebooks |
| Monitoring | Pipeline runs | Refresh history | Spark UI + jobs |
Click to flip back
🔗 vs 📋 vs 🔄
Shortcuts vs Copy vs Mirror
Click to flip

Shortcuts vs Copy vs Mirror

| Aspect | Shortcuts | Copy Activity | Mirroring |
|---|---|---|---|
| Data movement | Zero-copy (pointer) | Full copy | CDC replication |
| Latency | Real-time | Batch (scheduled) | Near real-time |
| Storage cost | None (source only) | Duplicated | Delta tables in OneLake |
| Transformation | None | In pipeline | None (read-only) |
| Sources | ADLS, S3, GCS, internal | 170+ connectors | Azure SQL, Cosmos, Snowflake |
| Best for | Cross-workspace access | ETL workflows | Always-fresh replicas |
Click to flip back
🥉 vs 🥈 vs 🥇
Bronze vs Silver vs Gold
Click to flip

Bronze vs Silver vs Gold

| Layer | Bronze | Silver | Gold |
|---|---|---|---|
| Data state | Raw, as-is | Cleansed, conformed | Business-ready |
| Schema | Schema-on-read | Validated schema | Star schema / aggregated |
| Transformations | Metadata only | Clean, deduplicate, join | Aggregate, model, partition |
| Quality | Source fidelity | Data quality rules | Business rules applied |
| Users | Data engineers | Data engineers/analysts | Analysts, semantic models |
| Format | Delta (or raw files) | Delta tables | Delta tables (optimised) |
Click to flip back
📊 vs ⚡
Batch vs Streaming Ingestion
Click to flip

Batch vs Streaming Ingestion

| Aspect | Batch | Streaming |
|---|---|---|
| Timing | Scheduled intervals | Continuous / real-time |
| Tools | Pipelines, Notebooks | Eventstream, Mirroring |
| Latency | Minutes to hours | Seconds |
| Volume | Large bulk loads | Event-by-event |
| Complexity | Simpler, predictable | Stateful, complex |
| Use case | Nightly ETL, reports | IoT, fraud detection, alerts |
Click to flip back
โš™๏ธ vs โš™๏ธ
OPTIMIZE vs VACUUM vs Z-Order
Click to flip

OPTIMIZE vs VACUUM vs Z-Order

CommandPurposeWhen to Use
OPTIMIZECompact small files into larger onesAfter many small writes/appends
VACUUMRemove old files no longer referencedRegularly to reclaim storage
Z-ORDER BYCo-locate related data for faster readsOn frequently filtered columns
V-OrderWrite-time optimisation (Fabric default)Automatic in Fabric notebooks
DESCRIBE HISTORYView table version historyAuditing, debugging, time-travel
RESTORERoll back to a previous versionRecovering from bad writes
Click to flip back
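A maintenance pass over a Delta table combines these commands. The table name below is a hypothetical example, and the retention window shown is just the 7-day default expressed in hours:

```sql
-- Illustrative maintenance pass on a hypothetical Delta table.
OPTIMIZE gold_daily_sales;                          -- compact small files
OPTIMIZE gold_daily_sales ZORDER BY (region);       -- co-locate a filter column
VACUUM gold_daily_sales RETAIN 168 HOURS;           -- drop unreferenced files (7 days)
DESCRIBE HISTORY gold_daily_sales;                  -- audit versions and operations
RESTORE TABLE gold_daily_sales TO VERSION AS OF 3;  -- roll back a bad write
```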
💾 vs 💾
Delta vs Parquet vs CSV
Click to flip

Delta vs Parquet vs CSV

| Aspect | Delta | Parquet | CSV |
|---|---|---|---|
| ACID transactions | Yes | No | No |
| Schema evolution | Yes | Limited | No |
| Time travel | Yes (versioned) | No | No |
| Compression | Excellent | Excellent | Poor |
| Read performance | Best (with V-Order) | Good (columnar) | Slow |
| Default in Fabric | Yes (tables) | Files area | Import only |
Click to flip back
🔒 vs 🔒
Workspace Roles vs OneLake RBAC
Click to flip

Workspace Roles vs OneLake RBAC

| Aspect | Workspace Roles | OneLake RBAC |
|---|---|---|
| Scope | Entire workspace | Item/folder level |
| Granularity | Coarse (4 roles) | Fine-grained |
| Roles | Admin, Member, Contributor, Viewer | Custom per item |
| Controls | Create, edit, delete items | Read/write data in OneLake |
| Use case | Team access management | Data-level security |
| Combined with | OneLake RBAC for data access | Workspace roles for item access |
Click to flip back
🔄 vs 🔄
Full Load vs Incremental Load
Click to flip

Full Load vs Incremental Load

| Aspect | Full Load | Incremental Load |
|---|---|---|
| Data scope | All data every time | Only new/changed records |
| Performance | Slower, resource-heavy | Faster, efficient |
| Complexity | Simple implementation | Needs watermark/CDC tracking |
| Storage | Replace or overwrite | Merge/upsert (MERGE INTO) |
| Use case | Small tables, initial load | Large tables, ongoing sync |
| Fabric pattern | Overwrite mode | Watermark + merge pattern |
Click to flip back
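The watermark + merge pattern can be sketched in Spark SQL. Every table and column name here (`source_orders`, `load_watermarks`, `modified_at`) is invented for illustration:

```sql
-- Illustrative watermark + MERGE pattern (hypothetical table/column names).
-- 1. Pull only rows newer than the last recorded watermark.
CREATE OR REPLACE TEMP VIEW changed_orders AS
SELECT *
FROM   source_orders
WHERE  modified_at > (SELECT MAX(watermark)
                      FROM load_watermarks
                      WHERE table_name = 'orders');

-- 2. Upsert the changes into the target Delta table.
MERGE INTO silver_orders AS t
USING changed_orders     AS s
  ON t.order_id = s.order_id
WHEN MATCHED THEN UPDATE SET *     -- update changed rows
WHEN NOT MATCHED THEN INSERT *;    -- insert new rows
```

After a successful merge, the pipeline would also advance the stored watermark to `MAX(modified_at)` of the processed batch.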

The Cheat Sheet

A dense four-column reference grid, one column per exam topic area.

Implement & Manage

30-35%

Lakehouse & Warehouse

  • Lakehouse: schema-on-read, Spark + SQL analytics endpoint (read-only)
  • Warehouse: schema-on-write, full T-SQL DML, multi-table transactions
  • Both store Delta Parquet in OneLake
  • Cross-database queries supported in warehouse
  • Default semantic model auto-created on lakehouse tables

Medallion Architecture

  • Bronze: raw landing, metadata columns, schema-on-read
  • Silver: cleansed, conformed, deduplicated, joined
  • Gold: aggregated, star schema, partition-optimised
  • Each layer = separate lakehouse (recommended)

Lifecycle & DevOps

  • Git integration: Azure DevOps or GitHub
  • Deployment pipelines: Dev → Test → Prod
  • Notebooks as source files (.py/.sql default)
  • Database projects for warehouse schema management
  • Environment variables for stage-specific config

Security & Governance

  • Workspace roles: Admin > Member > Contributor > Viewer
  • OneLake RBAC: item-level + folder-level permissions
  • RLS, CLS, OLS for data-level security
  • Dynamic data masking in warehouse
  • Sensitivity labels propagate downstream
  • Fabric domains: group workspaces by business area
  • Endorsement: Promoted → Certified

Ingest & Transform

30-35%

Pipeline Activities

  • Copy Activity: 170+ connectors, bulk data movement
  • Dataflow Gen2: Power Query Online, no-code ETL
  • Notebook Activity: run Spark notebooks in pipelines
  • Stored Procedure: execute SQL in warehouse
  • ForEach / If Condition: control flow logic

Spark Notebooks

  • PySpark, Spark SQL, Scala, R
  • mssparkutils: fs, notebook.run, credentials
  • V-Order: auto write-time optimisation
  • High Concurrency: shared Spark session
  • Broadcast: small DataFrame to all nodes

Ingestion Patterns

  • Full load: overwrite, simple, resource-heavy
  • Incremental: watermark + MERGE INTO (upsert)
  • CDC: change data capture for deltas only
  • Mirroring: near real-time database replication
  • Shortcuts: zero-copy cross-source access

Dataflows Gen2

  • Power Query Online, query folding (web UI icons)
  • Column Quality / Distribution / Profile
  • Multiple output destinations (vs Gen1 PBI only)
  • Lakehouse staging for better performance

Design & Build

30-35%

Lakehouse Design

  • Tables (managed): Delta format, SQL analytics endpoint
  • Files (unstructured): PDFs, images, raw CSVs
  • Shortcuts: virtual folders pointing to external data
  • Default semantic model auto-created on tables

Warehouse Design

  • Full T-SQL DML: INSERT, UPDATE, DELETE, MERGE
  • Multi-table transactions supported
  • Cross-database queries across warehouses
  • Visual Query Editor for no-code querying

Delta Lake

  • Parquet + transaction log = ACID + time travel
  • Schema evolution: mergeSchema, overwriteSchema
  • OPTIMIZE: compact small files
  • VACUUM: clean up old versions
  • Z-ORDER BY: co-locate data for filter columns

Monitor & Optimise

30-35%

Monitoring Tools

  • Monitoring Hub: pipeline runs, Spark jobs, dataflow refreshes
  • Spark UI: stage DAG, tasks, executors, storage, shuffle
  • Apache Spark Advisor: skew detection, optimisation hints
  • Capacity Metrics: CU usage, throttling, overload alerts

Delta Maintenance

  • OPTIMIZE: compact small files (bin-packing)
  • VACUUM: remove unreferenced files (default 7 days retention)
  • Z-ORDER BY: co-locate data for filtered column reads
  • V-Order: automatic write-time optimisation (Fabric default)
  • DESCRIBE HISTORY: audit table versions
  • RESTORE: roll back to previous version

Performance Tuning

  • Partition pruning: read only needed partitions
  • Cache: Spark cache() for iterative queries
  • Broadcast joins: small DataFrame to all nodes
  • Avoid collect() on large datasets
  • Repartition for data skew (check Spark UI)

Troubleshooting

  • Small file problem: too many appends → OPTIMIZE
  • Data skew: uneven partition sizes → repartition/salting
  • Shuffle spill: insufficient memory → increase executors
  • Pipeline failures: check activity output, retry policies
  • Eventstream errors: check DLQ (dead letter queue)

Memory Palace

Walk through five rooms, each representing a key area of Fabric data engineering. Objects fade in as you scroll.

The Landing Dock

Ingestion – Where data first arrives

🏢
OneLake
Unified data lake per tenant. ADLS Gen2. All Fabric data lands here
🔧
Copy Activity
170+ connectors. Bulk data movement. Full or incremental load
🧹
Dataflows Gen2
Power Query Online. No-code ETL. Query folding. Column profiling
🔄
Mirroring
Near real-time CDC replication. Azure SQL, Cosmos DB, Snowflake
🔗
Shortcuts
Zero-copy access. ADLS, S3, GCS, Dataverse, internal OneLake
📡
Eventstream
Streaming ingestion. Event Hubs, IoT Hub, custom apps

The Medallion Hall

Architecture – Where data is layered

🥉
Bronze Layer
Raw landing. Metadata columns. Schema-on-read. Delta tables
🥈
Silver Layer
Cleansed, conformed. Null handling, dedup, joins, type casting
🥇
Gold Layer
Business-ready. Star schema. Aggregates. Z-Order optimised
📓
Spark Notebooks
PySpark/SQL/Scala. mssparkutils. V-Order. High Concurrency
💧
Incremental Load
Watermark pattern. MERGE INTO for upsert. CDC for deltas
📊
Star Schema
Fact tables (measures) + Dimension tables (attributes). Gold layer pattern

The Engine Room

Compute – Where transformation happens

⚡
Spark Pools
Auto-scaling. Runtime versions. Executor memory. Broadcast joins
🔧
Pipeline Engine
Orchestration activities. ForEach, If Condition, Web, Wait
📦
Delta Operations
OPTIMIZE, VACUUM, Z-ORDER, DESCRIBE HISTORY, RESTORE
🏗️
V-Order
Write-time optimisation. Automatic in Fabric. Faster downstream reads
📁
Partitioning
Partition by date/region. Pruning skips irrelevant files. Align with queries
🔄
Schema Evolution
mergeSchema: add new columns. overwriteSchema: replace schema entirely

The Control Tower

Monitoring – Where engineers watch

🔍
Monitoring Hub
Pipeline runs, Spark jobs, dataflow refreshes. Status and duration
📊
Spark UI
Stage DAG, tasks, executors, storage, shuffle metrics
💡
Spark Advisor
Performance recommendations. Skew detection. Optimisation hints
📈
Capacity Metrics
CU usage, throttling, overload alerts. Capacity management
🐛
Small File Problem
Too many small writes → OPTIMIZE to compact. Common in streaming
⚖️
Data Skew
Uneven partitions → repartition() or salting. Check Spark UI

The Governance Vault

Security – Where trust is enforced

👥
Workspace Roles
Admin > Member > Contributor > Viewer. Coarse-grained access
🔒
OneLake RBAC
Item-level + folder-level permissions. Fine-grained data access
🏷️
Sensitivity Labels
Propagate downstream. Classify data. Compliance enforcement
🗂️
Fabric Domains
Group workspaces by business area. Organisational data mesh
🔀
Git Integration
Azure DevOps or GitHub. Notebooks as .py/.sql source files
🚀
Deployment Pipelines
Dev → Test → Prod. Deployment rules. Environment variables

Pattern Spotter

Decision flowcharts and trigger-answer pattern cards for common exam questions.

Which Ingestion Method?

What's the data source / requirement?
  ├── External DB, need always-fresh replica → Mirroring
  ├── Cross-cloud, no data copy → OneLake Shortcuts
  ├── Bulk ETL from 170+ sources → Pipeline + Copy Activity
  ├── No-code Power Query transforms → Dataflows Gen2
  ├── Complex Spark transformations → Notebook Activity
  └── Real-time events / IoT → Eventstream

Which Storage Option?

What's the workload?
  ├── Spark / data engineering / ML → Lakehouse
  ├── T-SQL / multi-table transactions → Warehouse
  ├── Real-time events / KQL → Eventhouse
  ├── Unstructured files (PDFs, images) → Lakehouse Files section
  └── Cross-database T-SQL queries → Warehouse

Which Delta Maintenance Command?

What do you need to do?
  ├── Too many small files → OPTIMIZE
  ├── Reclaim storage from old versions → VACUUM
  ├── Speed up filtered reads → Z-ORDER BY (column)
  ├── Check table version history → DESCRIBE HISTORY
  ├── Roll back a bad write → RESTORE TABLE ... TO VERSION
  └── Update query planner stats → ANALYZE TABLE

Which Loading Pattern?

How should data be loaded?
  ├── Small reference table → Full load (overwrite)
  ├── Large fact table, ongoing sync → Incremental (watermark + MERGE)
  ├── DB with change tracking → CDC-based incremental
  ├── External DB, always-fresh needed → Mirroring (automatic)
  └── Cross-cloud, zero storage cost → Shortcuts (no load needed)

Trigger โ†’ Answer Patterns

"medallion architecture" or "bronze/silver/gold"
โ†’ Lakehouse layered design
"V-Order" or "write-time optimisation"
โ†’ Fabric auto read optimisation
"mssparkutils" or "notebook.run"
โ†’ Spark notebook utilities
"High Concurrency mode"
โ†’ Shared Spark session across notebooks
"MERGE INTO" or "upsert"
โ†’ Incremental load pattern
"near real-time replication" from external DB
โ†’ Mirroring (CDC-based)
"zero-copy" or "shortcuts"
โ†’ OneLake Shortcuts
"small file problem"
โ†’ OPTIMIZE (compact files)
"Z-ORDER" or "data skipping"
โ†’ Column co-location for filter perf
"VACUUM" or "reclaim storage"
โ†’ Remove old Delta file versions
"schema evolution" or "mergeSchema"
โ†’ Delta Lake schema flexibility
"Eventstream" or "real-time ingestion"
โ†’ Streaming data into Eventhouse
"Fabric domains"
โ†’ Organise workspaces by business area
"OneLake RBAC" or "folder-level security"
โ†’ Fine-grained data access control
"deployment pipelines" for Fabric
โ†’ Dev โ†’ Test โ†’ Prod promotion
"watermark pattern"
โ†’ Incremental load tracking
"Data Activator"
โ†’ Alert triggers on real-time patterns
"time travel" or "DESCRIBE HISTORY"
โ†’ Delta Lake version history
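Time travel is worth having at your fingertips as runnable syntax. The table name and version/timestamp values below are hypothetical:

```sql
-- Illustrative Delta time travel on a hypothetical table.
DESCRIBE HISTORY silver_orders;                             -- list versions and operations
SELECT * FROM silver_orders VERSION AS OF 5;                -- read a specific older version
SELECT * FROM silver_orders TIMESTAMP AS OF '2024-06-01';   -- or read as of a point in time
RESTORE TABLE silver_orders TO VERSION AS OF 5;             -- roll the table back
```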

Train with practitioners, not presenters

Lucid Labs delivers Microsoft certification training grounded in real-world project experience. We adapt every session to your team's environment, data stack, and business objectives, because the best exam prep comes from engineers who build these solutions every day.

🎯
Tailored Content
Training built around your actual data, your tools, and your use cases, not generic slides.
🛠️
Hands-On Labs
Work through real scenarios in your own environment with expert guidance at every step.
📈
Exam + Capability
Pass the exam and build lasting skills your team can apply from day one.
Talk to us about Fabric Data Engineering training

Custom training for teams & individuals – remote or on-site across Australia

Keith Oak
Director & Principal Consultant – Lucid Labs

Microsoft Solutions Partner architect specialising in Fabric, Azure Data & AI, and GitHub Enterprise. 18+ years delivering data platforms for Australian businesses, building the systems these exams test every day.