The Microsoft Certified: Azure Data Fundamentals credential, earned by passing the DP-900 exam, provides foundational knowledge of core data concepts and how they are implemented using Azure data services. It serves as the entry point into Microsoft's data certification track, which extends through associate-level credentials in data engineering (DP-203), data science (DP-100), and database administration (DP-300).
Data literacy has become a competitive requirement across industries. Organizations running analytics on Azure, building data pipelines, or modernizing their databases need teams that understand the data platform at least conceptually. The DP-900 addresses this gap, providing a vendor-specific but accessible introduction to relational data, non-relational data, analytics, and the Azure services that underpin each.
According to the 2023 Databricks State of Data + AI report, 87% of data teams using cloud platforms work in multi-service environments. Professionals who understand how relational databases, NoSQL stores, and analytics platforms relate to each other -- and which Azure service serves each purpose -- contribute meaningfully to architectural discussions and avoid costly mismatches between workload requirements and chosen technology.
DP-900 Exam Overview
The DP-900 exam contains 40-60 questions with 60 minutes allowed. The passing score is 700 out of 1000. All questions are knowledge-based. Question formats include multiple choice, multiple select, and drag-and-drop matching.
| Domain | Approximate Weight |
|---|---|
| Describe core data concepts | 25-30% |
| Identify considerations for relational data on Azure | 20-25% |
| Describe considerations for working with non-relational data on Azure | 15-20% |
| Describe an analytics workload on Azure | 25-30% |
Verify current objectives at learn.microsoft.com/certifications/azure-data-fundamentals before scheduling.
The DP-900 occupies the same tier as the AZ-900 and MS-900 -- a fundamentals credential testing conceptual understanding rather than hands-on configuration. The exam has no performance-based lab questions. Questions test whether candidates can recognize the right Azure data service for a described scenario, understand basic data processing concepts, and differentiate between data categories.
Domain 1: Core Data Concepts (25-30%)
Data Formats and Classification
Data exists in three fundamental formats, each with different storage and processing requirements:
Structured data -- data organized into a predefined schema with consistent fields and types. Traditional relational databases store structured data in tables with rows and columns. Examples: customer records with consistent fields (name, email, address), financial transactions, inventory systems.
Semi-structured data -- data with some organizational structure but allowing for variation between records. Common formats include JSON, XML, and CSV. Examples: product catalogs where items have different attributes, social media profile data, IoT sensor readings.
Unstructured data -- data without a predefined format or schema. Examples: documents, images, audio files, video files, email bodies, log files. Unstructured data represents the majority of enterprise data by volume.
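The three categories above can be illustrated with a minimal Python sketch. The sample records, field names, and the `field_sets` helper are invented for illustration; they do not come from any Azure service.

```python
import json

# Structured: every record has the same fields in the same order.
structured_rows = [
    ("C001", "Ada Lovelace", "ada@example.com"),
    ("C002", "Alan Turing", "alan@example.com"),
]

# Semi-structured: records describe themselves and may carry different fields.
semi_structured = json.loads("""
[
  {"sku": "TV-55", "screen_inches": 55, "hdmi_ports": 3},
  {"sku": "SHIRT-M", "size": "M", "colour": "blue"}
]
""")

# Unstructured: raw bytes with no schema the storage layer can interpret.
unstructured_blob = b"\x89PNG\r\n\x1a\n...image bytes..."


def field_sets(records):
    """Return the distinct sets of field names seen across the records."""
    return {frozenset(record.keys()) for record in records}


# One shared shape suggests a fixed schema; several shapes suggest a flexible one.
distinct_shapes = len(field_sets(semi_structured))
print(distinct_shapes)
```

Here the two catalog items share only the `sku` field, which is the kind of variation a fixed table schema handles poorly.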
"The fundamental error in data architecture is assuming all your data needs to go into a relational database. Matching data format to storage technology is the first, most consequential decision in any data project. The DP-900 exists precisely because this decision is made wrong so often." -- Rohan Kumar, Corporate Vice President of Azure Data, from Microsoft Ignite 2022
Data Processing Approaches
OLTP (Online Transaction Processing) -- databases optimized for high-frequency, low-latency read/write operations. Characteristics: many small transactions, real-time data entry, ACID compliance (Atomicity, Consistency, Isolation, Durability), normalized schema design. Examples: order processing systems, banking transactions, point-of-sale systems.
OLAP (Online Analytical Processing) -- databases optimized for complex queries over large volumes of historical data. Characteristics: fewer, larger queries, denormalized schema (star or snowflake schema), aggregation-focused, batch data loading. Examples: business intelligence dashboards, sales trend analysis, financial reporting.
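The contrast between the two workload styles can be sketched with Python's built-in `sqlite3` module. SQLite stands in for a real database engine here, and the `orders` table and `place_order` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")


def place_order(region, amount):
    """OLTP style: one small write wrapped in its own transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT INTO orders (region, amount) VALUES (?, ?)", (region, amount)
        )
        if amount <= 0:
            # Raising here undoes the insert above (atomicity).
            raise ValueError("amount must be positive")


place_order("EU", 20.0)
place_order("EU", 30.0)
place_order("US", 50.0)
try:
    place_order("US", -5.0)  # rejected, so nothing is stored
except ValueError:
    pass

# OLAP style: one larger query that aggregates the accumulated history.
totals = dict(conn.execute("SELECT region, SUM(amount) FROM orders GROUP BY region"))
row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(totals, row_count)
```

In practice the two styles usually run on separate systems, with transactional data periodically loaded into an analytical store.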
Batch processing -- data is collected over a period and processed in batches at scheduled intervals. Appropriate for large-scale transformations where real-time results are not required.
Stream processing (real-time processing) -- data is processed as it arrives, enabling immediate insights from continuously generated data. Appropriate for fraud detection, real-time analytics dashboards, and alerting scenarios.
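As a rough illustration of the difference, the sketch below computes the same average in both styles. The readings and function names are made up, and a plain list stands in for a live event source.

```python
from typing import Iterable, Iterator, List

readings = [3, 5, 4, 10, 2]  # illustrative sensor values


def batch_average(collected: List[float]) -> float:
    """Batch: wait until the whole set is collected, then process it once."""
    return sum(collected) / len(collected)


def streaming_average(events: Iterable[float]) -> Iterator[float]:
    """Stream: update and emit the result as each event arrives."""
    total, count = 0.0, 0
    for value in events:
        total += value
        count += 1
        yield total / count


batch_result = batch_average(readings)
stream_results = list(streaming_average(readings))
print(batch_result, stream_results)
```

The streaming version produces a usable answer after every event, while the batch version produces a single answer once collection ends.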
Data Roles
The DP-900 tests understanding of three core data roles:
| Role | Primary Responsibility | Common Tools |
|---|---|---|
| Database Administrator | Manage and secure databases, ensure availability | Azure SQL, Azure Data Studio |
| Data Engineer | Build and maintain data pipelines | Azure Data Factory, Synapse Analytics |
| Data Analyst | Analyze data and create business insights | Power BI, SQL queries |
Domain 2: Relational Data on Azure (20-25%)
Relational Database Concepts
Relational databases organize data into tables with defined relationships between them. Core concepts the exam tests:
Normalization -- organizing database tables to reduce data redundancy and improve data integrity, typically through a series of normal forms (1NF, 2NF, 3NF). The DP-900 tests recognition of normalized versus denormalized design, not the ability to perform normalization.
Primary keys uniquely identify each row in a table. Foreign keys create relationships between tables by referencing the primary key of another table.
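A small sketch shows how keys protect integrity, using SQLite through Python's `sqlite3` module as a stand-in for an Azure relational service. The table and column names are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this setting is enabled.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        total       REAL NOT NULL
    )
""")

conn.execute("INSERT INTO customers VALUES (1, 'Contoso Ltd')")
conn.execute("INSERT INTO orders VALUES (100, 1, 250.0)")  # valid: customer 1 exists

try:
    # Customer 99 does not exist, so the foreign key rejects the row.
    conn.execute("INSERT INTO orders VALUES (101, 99, 10.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

try:
    # customer_id 1 is already taken, so the primary key rejects the row.
    conn.execute("INSERT INTO customers VALUES (1, 'Duplicate')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(orphan_rejected, duplicate_rejected)
```

Storing the customer name once in `customers` and referencing it by key from `orders` is also a simple instance of the normalized design described above.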
SQL (Structured Query Language) -- the standard language for interacting with relational databases. The DP-900 tests basic familiarity with SQL statement types:
- SELECT -- retrieves data from one or more tables
- INSERT -- adds new rows to a table
- UPDATE -- modifies existing rows
- DELETE -- removes rows from a table
- CREATE TABLE -- defines a new table structure
- JOIN -- combines rows from multiple tables based on a related column (a clause used inside a query rather than a standalone statement)
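The statement types listed above can be tried without an Azure subscription. The following sketch runs them against an in-memory SQLite database through Python's `sqlite3` module; the tables and rows are invented, and SQLite's dialect differs from T-SQL in details not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE TABLE: define the structures.
cur.execute("CREATE TABLE categories (id INTEGER PRIMARY KEY, label TEXT)")
cur.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER)"
)

# INSERT: add rows.
cur.executemany("INSERT INTO categories VALUES (?, ?)", [(1, "Audio"), (2, "Video")])
cur.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(10, "Headphones", 1), (11, "Projector", 2), (12, "Speaker", 1)],
)

# UPDATE: modify an existing row.
cur.execute("UPDATE products SET name = 'Wireless Headphones' WHERE id = 10")

# DELETE: remove a row.
cur.execute("DELETE FROM products WHERE id = 12")

# SELECT with JOIN: combine rows from both tables on the related column.
rows = cur.execute("""
    SELECT p.name, c.label
    FROM products AS p
    JOIN categories AS c ON c.id = p.category_id
    ORDER BY p.id
""").fetchall()
print(rows)
```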
Azure Relational Database Services
| Service | Use Case | Key Feature |
|---|---|---|
| Azure SQL Database | New cloud-native SQL workloads | Fully managed PaaS, auto-scaling |
| Azure SQL Managed Instance | SQL Server lift-and-shift | Near 100% SQL Server compatibility |
| SQL Server on Azure VMs | Full OS access required | IaaS, maximum compatibility |
| Azure Database for PostgreSQL | PostgreSQL workloads | Open-source managed |
| Azure Database for MySQL | MySQL workloads | LAMP stack applications |
| Azure Database for MariaDB | MariaDB workloads | MySQL fork compatibility |
Azure SQL Database is the most commonly referenced service on the DP-900. Key features: automatic backups, automatic patching, built-in high availability, elastic pools for cost-efficient management of multiple databases with variable workloads.
The exam tests the distinction between purchasing models: the DTU model (predefined bundles of compute, memory, and I/O) versus the vCore model (compute and storage scale independently; supports Reserved Capacity pricing). The vCore model is recommended for new deployments.
Domain 3: Non-Relational Data on Azure (15-20%)
Non-Relational (NoSQL) Data Concepts
Non-relational databases store data in formats other than tabular rows and columns. They are designed to scale horizontally, handle semi-structured and unstructured data, and serve specific access patterns at high throughput.
Document databases store data as JSON-like documents, each with a flexible schema. Suitable for product catalogs, user profiles, and content management.
Key-value databases store data as key-value pairs optimized for simple lookups by key. Suitable for session data, cache layers, and shopping cart scenarios.
Column-family databases store data in columns grouped into families, optimized for sparse data and wide rows. Suitable for IoT telemetry and time-series data.
Graph databases represent data as nodes (entities) and edges (relationships), optimized for relationship-heavy queries. Suitable for social networks, recommendation engines, and fraud detection networks.
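To make the four models concrete, the sketch below represents each one with plain Python data structures. This is a conceptual illustration only: the identifiers and values are invented, and real NoSQL engines add indexing, partitioning, and query languages on top of these shapes.

```python
# Document: one self-contained, nested record per entity.
document = {"id": "u1", "name": "Ada", "orders": [{"sku": "TV-55", "qty": 1}]}

# Key-value: an opaque value that is retrieved only by its key.
key_value = {"session:u1": '{"cart": ["TV-55"]}'}

# Column-family: rows keyed by ID, columns grouped into named families.
# Rows may populate different columns (sparse data).
column_family = {
    "device-1": {"telemetry": {"temp": 21.5, "humidity": 40}, "meta": {"site": "A"}},
    "device-2": {"telemetry": {"temp": 19.0}},
}

# Graph: nodes plus edges that describe the relationships between them.
nodes = {"u1", "u2", "u3"}
edges = [("u1", "follows", "u2"), ("u2", "follows", "u3")]


def followed_by(user):
    """Return the users that the given user follows."""
    return {dst for src, rel, dst in edges if src == user and rel == "follows"}


# A relationship-heavy question: whom do the people u1 follows follow?
second_degree = {fof for friend in followed_by("u1") for fof in followed_by(friend)}
print(second_degree)
```

The graph query walks relationships directly, which is the access pattern that becomes expensive when modeled as repeated joins in a relational schema.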
Azure Non-Relational Services
Azure Cosmos DB is Microsoft's globally distributed, multi-model NoSQL database. The DP-900 covers it as the central non-relational offering:
- Supports multiple APIs: Core (SQL) (since renamed API for NoSQL), MongoDB, Cassandra, Gremlin, and Table
- Configurable consistency levels from eventual (fastest) to strong (most consistent)
- Global distribution with multi-region writes
- SLA of 99.999% availability for multi-region deployments
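Because each Cosmos DB API corresponds to one of the data models described earlier, a lookup table can serve as a revision aid. The dictionary below is a study sketch, not part of any Azure SDK, and the `apis_for` helper is hypothetical. The consistency list includes the three intermediate levels that sit between the two extremes named above.

```python
# Which data model each Cosmos DB API exposes.
COSMOS_API_DATA_MODEL = {
    "Core (SQL)": "document",
    "MongoDB": "document",
    "Cassandra": "column-family",
    "Gremlin": "graph",
    "Table": "key-value",
}

# Consistency levels ordered from strongest guarantee to weakest.
CONSISTENCY_LEVELS = [
    "Strong",
    "Bounded staleness",
    "Session",
    "Consistent prefix",
    "Eventual",
]


def apis_for(model):
    """Return the APIs that expose the given data model, sorted by name."""
    return sorted(api for api, m in COSMOS_API_DATA_MODEL.items() if m == model)


print(apis_for("graph"))
print(apis_for("document"))
```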
Azure Table Storage -- a NoSQL key-value store for storing structured, non-relational data. Simpler and less expensive than Cosmos DB Table API, appropriate for simple key-value lookups without the need for global distribution or advanced query capabilities.
Azure Blob Storage -- while primarily an object storage service, it is classified as non-relational storage for unstructured data including images, documents, audio, video, and binary files.
Azure File Storage -- managed file shares for structured file data accessible via SMB and NFS, providing cloud storage with file system semantics.
"The question is never whether to use SQL or NoSQL -- it is what access pattern your application requires. NoSQL is not better or worse than relational, it is optimized differently. The data fundamentals certification teaches candidates to match workload requirements to technology strengths rather than defaulting to familiar choices." -- Rimma Nehme, Principal Research Manager at Microsoft Research, from the VLDB 2023 conference
Domain 4: Analytics Workloads on Azure (25-30%)
Analytics workloads represent a growing proportion of Azure spending and employment. This domain covers the technologies that enable organizations to derive insights from data at scale.
Azure Synapse Analytics
Azure Synapse Analytics -- a unified analytics platform that combines data warehousing, big data analytics, and data integration in a single service. It is the central service for this domain.
Key components:
- Dedicated SQL pools (formerly Azure SQL Data Warehouse): Massively parallel processing data warehouse for complex analytical queries over petabytes of structured data
- Serverless SQL pool: Ad-hoc querying of data stored in Azure Data Lake Storage without provisioning infrastructure
- Apache Spark pools: Distributed computing for data engineering and machine learning workloads using Python, Scala, R, or .NET
- Data integration (pipelines): Built on Azure Data Factory technology for data movement and transformation
PolyBase and the COPY INTO command enable loading large datasets into Synapse dedicated SQL pools from Azure Data Lake Storage without moving data through an external staging area.
Azure Data Factory
Azure Data Factory (ADF) -- Microsoft's cloud ETL (Extract, Transform, Load) service for data integration. It orchestrates data movement and transformation across cloud and on-premises sources using a code-free visual interface (pipelines, datasets, linked services) or code-based approaches (ARM templates, PowerShell).
DP-900 tests conceptual understanding:
- Pipelines: Logical groupings of activities that perform data movement and transformation
- Activities: Individual steps within a pipeline (Copy Data activity for data movement, Data Flow for transformation)
- Datasets: Named views of data structures pointing to the data to be used in activities
- Linked services: Connection strings to external data sources and compute services
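How the four concepts relate can be sketched as a small object model. The classes below are a hypothetical teaching aid written with Python dataclasses; they are not the Azure Data Factory SDK, and the names and connection strings are placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LinkedService:
    """Connection information for a data store or compute service."""
    name: str
    connection: str


@dataclass
class Dataset:
    """A named reference to data reachable through a linked service."""
    name: str
    linked_service: LinkedService
    location: str


@dataclass
class Activity:
    """One step in a pipeline, such as copying data from a source to a sink."""
    name: str
    kind: str
    source: Dataset
    sink: Dataset


@dataclass
class Pipeline:
    """A logical grouping of activities."""
    name: str
    activities: List[Activity] = field(default_factory=list)

    def linked_service_names(self) -> List[str]:
        """List every connection the pipeline depends on."""
        names = set()
        for activity in self.activities:
            names.add(activity.source.linked_service.name)
            names.add(activity.sink.linked_service.name)
        return sorted(names)


on_prem_sql = LinkedService("OnPremSql", "Server=example;Database=sales")
data_lake = LinkedService("DataLake", "https://example.dfs.core.windows.net")
raw_orders = Dataset("RawOrders", on_prem_sql, "dbo.Orders")
lake_orders = Dataset("LakeOrders", data_lake, "raw/orders/")
pipeline = Pipeline(
    "IngestOrders", [Activity("CopyOrders", "Copy Data", raw_orders, lake_orders)]
)
service_names = pipeline.linked_service_names()
print(service_names)
```

The dependency chain runs in one direction: a pipeline contains activities, activities read and write datasets, and datasets reach their data through linked services.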
Azure Databricks
Azure Databricks -- a fast, easy, and collaborative Apache Spark-based analytics platform optimized for Azure. It provides managed Spark clusters, collaborative notebooks, MLflow for machine learning lifecycle management, and Delta Lake for reliable data storage.
Power BI
Power BI -- Microsoft's business intelligence platform for creating interactive dashboards and reports from data across multiple sources. For the DP-900, candidates need to understand:
- Power BI Desktop: Authoring tool for creating reports (runs locally on Windows)
- Power BI Service: Cloud platform for sharing and consuming reports (app.powerbi.com)
- Power BI datasets: Published data models that can power multiple reports
- DirectQuery versus Import: Import loads data into Power BI's in-memory engine (faster queries, not real-time). DirectQuery queries the source database for each report interaction (real-time but dependent on source performance)
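The trade-off between the two connectivity modes can be simulated in a few lines. This is a conceptual sketch only: SQLite stands in for the source database, a Python list stands in for Power BI's in-memory engine, and the function names are hypothetical.

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (amount REAL)")
source.execute("INSERT INTO sales VALUES (100.0)")

# Import-style: copy the data once, then answer questions from the local copy.
imported_rows = source.execute("SELECT amount FROM sales").fetchall()


def import_total():
    """Fast and independent of the source, but only as fresh as the last refresh."""
    return sum(amount for (amount,) in imported_rows)


def directquery_total():
    """Always current, but every call costs a round trip to the source."""
    return source.execute("SELECT SUM(amount) FROM sales").fetchone()[0]


# The source changes after the import was taken.
source.execute("INSERT INTO sales VALUES (50.0)")

stale_total = import_total()
live_total = directquery_total()
print(stale_total, live_total)
```

The imported copy keeps reporting the old total until it is refreshed, while the direct query reflects the new row immediately.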
| Azure Analytics Service | Primary Use |
|---|---|
| Azure Synapse Analytics | Enterprise data warehouse + analytics platform |
| Azure Data Factory | ETL/ELT orchestration and data integration |
| Azure Databricks | Apache Spark analytics and machine learning |
| Azure Stream Analytics | Real-time stream processing |
| Power BI | Business intelligence visualization |
| Azure Data Explorer | Log analytics and time-series exploration |
Real-Time Analytics
Azure Stream Analytics -- a real-time analytics service that processes streaming data from IoT devices, application logs, clickstreams, and social media feeds. It uses a SQL-like query language to filter, aggregate, and route streaming data.
Azure Event Hubs -- a highly scalable event streaming platform that ingests millions of events per second from multiple publishers, acting as the entry point for real-time analytics pipelines.
The DP-900 tests the end-to-end real-time analytics pattern: event generation (IoT devices, applications) → Event Hubs ingestion → Stream Analytics processing → output to storage or dashboards.
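The processing stage of that pattern often groups events into fixed time windows before writing results. The sketch below imitates a tumbling-window average in plain Python. It is not Stream Analytics query syntax; the event tuples and the `tumbling_window_avg` function are invented, and a finished list stands in for events arriving from an ingestion service.

```python
from collections import defaultdict

# (timestamp in seconds, device, temperature) -- illustrative events.
events = [
    (1, "sensor-a", 20.0),
    (4, "sensor-a", 22.0),
    (7, "sensor-b", 30.0),
    (12, "sensor-a", 25.0),
    (14, "sensor-b", 31.0),
]


def tumbling_window_avg(stream, window_seconds):
    """Average values per device over fixed, non-overlapping time windows."""
    sums = defaultdict(lambda: [0.0, 0])  # (window_start, device) -> [total, count]
    for timestamp, device, value in stream:
        window_start = (timestamp // window_seconds) * window_seconds
        accumulator = sums[(window_start, device)]
        accumulator[0] += value
        accumulator[1] += 1
    return {key: total / count for key, (total, count) in sorted(sums.items())}


averages = tumbling_window_avg(events, 10)
for (window_start, device), average in averages.items():
    print(window_start, device, average)
```

Each event belongs to exactly one window, which is what distinguishes a tumbling window from overlapping window types.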
Preparation Guide
Study Approach
The DP-900 is accessible to candidates with minimal data background. Microsoft Learn's free learning path for DP-900 requires approximately 10-15 hours and covers all exam objectives. Supplementing it with an Azure free account provides hands-on access to Azure Synapse Analytics (serverless queries are free within limits) and Cosmos DB (the free tier permanently provides 1,000 RU/s and 25 GB of storage at no charge).
Common Study Mistakes
- Confusing Cosmos DB APIs -- candidates frequently mix up which API handles which data model
- Overlooking Power BI -- some candidates treat Power BI as outside the scope of a "data fundamentals" exam; it is 10-15% of the analytics domain
- Misidentifying processing types -- memorize the distinction between OLTP (transactional, many small operations) and OLAP (analytical, few large queries)
- Ignoring Stream Analytics -- real-time processing is distinctly tested and commonly neglected in study plans
Frequently Asked Questions
What background do you need for the DP-900 exam?
The DP-900 has no formal prerequisites. It is designed for candidates who are beginning to work with data in the cloud, including business analysts, data enthusiasts, database administrators new to Azure, and developers who want to understand Azure's data portfolio. Basic familiarity with data concepts is helpful but not required.
How does DP-900 differ from DP-203 (Data Engineer)?
The DP-900 is a fundamentals credential testing conceptual knowledge of Azure data services at an awareness level. The DP-203 (Azure Data Engineer Associate) tests hands-on skills in designing and implementing data pipelines, data lake solutions, and streaming analytics. DP-203 requires significant hands-on experience and is substantially more difficult, targeting professionals who engineer data solutions as part of their job.
Is the DP-900 useful for Power BI developers?
Yes, modestly. Power BI developers who lack familiarity with the broader Azure data platform benefit from the DP-900's overview of Azure Synapse Analytics, Azure Data Factory, and Cosmos DB -- services they will encounter as data sources. However, if Power BI development is your primary focus, the PL-300 (Microsoft Power BI Data Analyst) certification directly validates those skills.
References
- Microsoft. "Exam DP-900: Microsoft Azure Data Fundamentals." Microsoft Learn, 2024.
- Microsoft. "Azure Data Services Documentation." Microsoft Learn, 2024.
- Microsoft. "Azure Cosmos DB documentation." Microsoft Learn, 2024.
- Microsoft. "Azure Synapse Analytics documentation." Microsoft Learn, 2024.
- Databricks. "State of Data + AI 2023." Databricks Research, 2023.
- Kumar, Rohan. "The Future of Azure Data." Microsoft Ignite keynote, November 2022.
- Microsoft. "Power BI documentation." learn.microsoft.com/power-bi, 2024.
