Azure Synapse Analytics — Enterprise Data Warehouse

Analyze petabytes of data with Azure Synapse — dedicated SQL pools, Spark analytics, and unified data platform.

Slide 1 / 8

Azure Synapse Analytics

Unified Analytics — Data Warehouse + Spark + Pipelines
Azure Data & Analytics — Episode 28

Speaker Script

“Welcome back. Today we explore Azure Synapse Analytics — Microsoft's enterprise analytics platform that unifies data warehousing, big data processing, and data integration in a single workspace. Traditional architectures required separate tools for SQL-based analytics, Spark-based processing, and data movement. Synapse brings all of these together with a unified interface, shared governance, and seamless data movement between engines.”

Slide 2 / 8

Azure Synapse Workspace

Unified workspace for all analytics workloads
Dedicated SQL Pool — data warehouse (formerly SQL DW)
Serverless SQL Pool — query data lake without provisioning
Apache Spark Pool — distributed big data processing
Synapse Pipelines — data integration and orchestration
Azure Data Lake Storage Gen2 — underlying storage

Speaker Script

“The Synapse workspace is your analytics hub. A Dedicated SQL Pool is a provisioned, massively parallel processing data warehouse, optimized for complex analytical queries on structured data. The Serverless SQL Pool lets you query files directly in your data lake using standard SQL without provisioning any compute; you pay only for the data each query processes. Spark Pools provide Apache Spark for machine learning, data transformation, and streaming. Synapse Pipelines orchestrate data movement between the sources and destinations its connectors support.”

Slide 3 / 8

Dedicated SQL Pool

Massively Parallel Processing (MPP) architecture
Data split into 60 distributions, spread across 1 to 60 compute nodes
Columnar storage — optimized for analytics
Distributions: hash (large tables), replicate (small tables), round-robin
Pause when idle — don't pay for unused compute
Best for: structured data, complex SQL, BI workloads

Speaker Script

“The Dedicated SQL Pool is a classic enterprise data warehouse built for analytical queries. Each table is split into 60 distributions, which are spread across anywhere from 1 to 60 compute nodes depending on the performance level you provision. A control node breaks each query into pieces that run in parallel on every distribution and then merges the results. Columnar storage compresses data significantly and accelerates analytical queries that touch only specific columns. The pause/resume feature is critical for cost control: pause the warehouse overnight or over weekends when it's not being used and you stop paying for compute, although storage is still billed.”
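
The three distribution options on this slide can be illustrated with a toy model. The sketch below is plain Python, not Synapse code: it assumes the fixed count of 60 distributions, uses `zlib.crc32` as a stand-in for the service's internal hash function, and all function names are hypothetical.

```python
import zlib
from collections import Counter
from itertools import count

NUM_DISTRIBUTIONS = 60  # a dedicated SQL pool splits every table into 60 distributions


def hash_distribution(key: str) -> int:
    """Hash distribution: the same key always lands in the same distribution.

    crc32 is only a stand-in; the real hash function is internal to the service.
    """
    return zlib.crc32(key.encode("utf-8")) % NUM_DISTRIBUTIONS


def round_robin_assigner():
    """Round-robin distribution: rows are dealt out in turn, ignoring their content."""
    counter = count()

    def assign(_row) -> int:
        return next(counter) % NUM_DISTRIBUTIONS

    return assign


def replicate(rows: list, num_nodes: int) -> dict:
    """Replicated table: every compute node keeps its own full copy of the rows."""
    return {node: list(rows) for node in range(num_nodes)}


if __name__ == "__main__":
    # Hypothetical data: 10,000 order rows keyed by 500 distinct customers.
    order_keys = [f"customer-{i % 500}" for i in range(10_000)]

    hash_counts = Counter(hash_distribution(k) for k in order_keys)
    assign = round_robin_assigner()
    rr_counts = Counter(assign(k) for k in order_keys)

    print("hash: rows in fullest / emptiest distribution:",
          max(hash_counts.values()), "/", min(hash_counts.values()))
    print("round-robin: rows in fullest / emptiest distribution:",
          max(rr_counts.values()), "/", min(rr_counts.values()))
```

Hash distribution keeps rows that share a key together, which helps joins and aggregations on that key but can leave distributions uneven when a few keys dominate. Round-robin is always even but gives no co-location, and replication trades extra storage for local joins against small tables.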

Slide 4 / 8

Serverless SQL Pool

Query data lake files with SQL — no compute to manage
Supports CSV, Parquet, Delta Lake, JSON formats
Pay per TB of data scanned
Create views over data lake files
External tables for schema-on-read
Best for: data exploration, ad-hoc queries, data lake analytics

Speaker Script

“Serverless SQL Pool is well suited to data exploration. Point it at a Parquet file in your data lake and run SQL queries, with no provisioning, no data loading, and no indexing. Synapse plans and scales the query execution for you, and you pay only for the data the query processes. This is a good fit for data engineers exploring new datasets, data scientists running ad-hoc analyses, or teams building lightweight reporting on raw data lake files without the overhead of loading into a warehouse.”
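
As a rough illustration of "point it at a Parquet file and run SQL", the sketch below builds an `OPENROWSET` query string in Python and does the pay-per-scan arithmetic. The storage URL, the helper names, and the price per TB are placeholders chosen for this example, not values from the talk, and the sketch treats 1 TB as 10**12 bytes; check current pricing before relying on the numbers.

```python
def build_openrowset_query(file_url: str, file_format: str = "PARQUET", top: int = 10) -> str:
    """Return a T-SQL query that reads data lake files in place via OPENROWSET.

    Plain string formatting is used for illustration only; do not build
    queries this way from untrusted input.
    """
    return (
        f"SELECT TOP {top} *\n"
        f"FROM OPENROWSET(\n"
        f"    BULK '{file_url}',\n"
        f"    FORMAT = '{file_format}'\n"
        f") AS [result];"
    )


def estimate_scan_cost(bytes_scanned: int, price_per_tb: float) -> float:
    """Pay-per-scan arithmetic: TB scanned multiplied by the price per TB.

    Assumes 1 TB = 10**12 bytes; price_per_tb is whatever the current rate is.
    """
    return bytes_scanned / 10**12 * price_per_tb


if __name__ == "__main__":
    # Hypothetical storage account, container, and path.
    url = "https://<account>.dfs.core.windows.net/<container>/sales/*.parquet"
    print(build_openrowset_query(url))

    # Hypothetical: a query that scans 250 GB at an assumed price of 5.0 per TB.
    print(estimate_scan_cost(250 * 10**9, price_per_tb=5.0))
```

Because the bill follows bytes scanned, columnar formats such as Parquet and queries that select only the needed columns tend to cost less than scanning whole CSV files.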

Slide 5 / 8

Apache Spark in Synapse

Managed Spark clusters — start in minutes
PySpark, Scala, Spark SQL, .NET
Native integration with Azure Data Lake
Delta Lake support — ACID transactions on data lake
Shared metadata — Spark tables visible in the serverless SQL pool
Best for: ML, complex transformations, unstructured data

Speaker Script

“Synapse's Apache Spark pools are fully managed: you define the size and Azure handles cluster provisioning, configuration, and maintenance. Write PySpark notebooks for data transformation, machine learning, or streaming workloads. Delta Lake support adds ACID transactions to your data lake, so you can update, delete, and merge data lake files with full transaction guarantees. The shared metadata store means tables created in Spark over supported file formats such as Parquet become queryable from the Serverless SQL Pool after a short synchronization delay, breaking down the traditional wall between Spark and SQL.”
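
To make the "merge with transaction guarantees" idea concrete, here is a toy in-memory model of upsert semantics. It is not the Delta Lake API: `merge_upsert` is a hypothetical helper, the table is a plain dict, and the all-or-nothing behaviour is imitated by building a new table and leaving the original untouched if any row fails.

```python
def merge_upsert(target: dict, updates: list, key: str) -> dict:
    """Return a new table with MERGE-style upsert applied.

    Rows whose key already exists are updated; rows with a new key are inserted.
    The target is never modified, so a failure part-way through (for example a
    row missing the key column) leaves the original table exactly as it was.
    """
    merged = {k: dict(row) for k, row in target.items()}  # work on a copy
    for row in updates:
        row_key = row[key]  # raises KeyError if the key column is missing
        merged[row_key] = {**merged.get(row_key, {}), **row}
    return merged


if __name__ == "__main__":
    # Hypothetical inventory table keyed by "id".
    inventory = {1: {"id": 1, "qty": 5}}
    changes = [{"id": 1, "qty": 7}, {"id": 2, "qty": 3}]

    inventory = merge_upsert(inventory, changes, key="id")  # swap in the new version
    print(inventory)
```

A real Delta table gets the same effect through its transaction log: readers see either the old version or the new one, never a half-applied merge.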

Slide 6 / 8

Synapse Pipelines

Data integration and orchestration engine
90+ built-in connectors: databases, SaaS apps, files, streaming
Copy Activity — high-throughput data movement
Data Flow — visual ETL transformation designer
Triggers: schedule, event, tumbling window
Same engine as Azure Data Factory

Speaker Script

“Synapse Pipelines is the data movement engine, built on the same underlying technology as Azure Data Factory but integrated directly into Synapse. You design pipelines by drag and drop, with more than 90 built-in connectors covering common databases, SaaS applications, and file stores. Copy Activity moves data at high throughput between supported sources and destinations. Data Flows let you design ETL transformations visually without writing code; the service runs them on managed Spark clusters. Pipelines orchestrate your entire data lifecycle: ingest from source systems, transform, and load to the warehouse.”
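
Of the trigger types on this slide, the tumbling window is the least self-explanatory: it fires once for each fixed-size, non-overlapping, back-to-back time interval. The sketch below only models how those intervals are cut; `tumbling_windows` is a hypothetical helper, not part of any Synapse SDK.

```python
from datetime import datetime, timedelta


def tumbling_windows(start: datetime, end: datetime, size: timedelta):
    """Yield contiguous, non-overlapping (window_start, window_end) pairs.

    Each window is half-open: it includes window_start and excludes window_end.
    Only complete windows that fit before `end` are produced.
    """
    if size <= timedelta(0):
        raise ValueError("window size must be positive")
    window_start = start
    while window_start + size <= end:
        yield window_start, window_start + size
        window_start += size


if __name__ == "__main__":
    # Hypothetical schedule: hourly windows over the first three hours of a day.
    for w_start, w_end in tumbling_windows(
        datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 1, 3, 0), timedelta(hours=1)
    ):
        # A pipeline run would typically load only rows with a timestamp
        # inside [w_start, w_end), which is what makes incremental loads safe.
        print(w_start.isoformat(), "->", w_end.isoformat())
```

Because every window has explicit start and end times, a pipeline can use them as filter parameters, and past windows can be rerun individually for backfills.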

Slide 7 / 8

Live Azure Demo

Create Synapse workspace and Data Lake
Load CSV data into Dedicated SQL Pool
Run analytical SQL queries
Query the same data with Serverless SQL Pool
Run a PySpark notebook on the data lake

Speaker Script

“Let me show Synapse in action. I'll create a workspace, load a dataset into both a Dedicated SQL Pool and the data lake, then run the same analytical query against both, comparing the dedicated warehouse's performance with the serverless approach. I'll also open a PySpark notebook and process the same data with Spark, demonstrating how the serverless and Spark engines read the same files in the data lake while the dedicated pool works on its loaded copy.”

Slide 8 / 8

Summary & What's Next

✅ Synapse = data warehouse + Spark + pipelines in one workspace
✅ Dedicated SQL Pool — MPP warehouse for complex analytics
✅ Serverless SQL — query data lake with SQL, no provisioning
✅ Spark Pool — big data and ML on the same data
✅ Synapse Pipelines — 90+ built-in connectors, visual ETL design
Next: Azure Data Factory →

Speaker Script

“Azure Synapse Analytics is the enterprise data platform for organizations that need to analyze data at any scale, in any format, using any processing engine. Next we look at Azure Data Factory as a standalone deep dive — building complex data integration pipelines, incremental loads, and data transformations at enterprise scale.”

🖥️ Azure Demo Steps

1. Create Azure Synapse workspace
2. Create a dedicated SQL pool
3. Load data from Azure Data Lake using COPY INTO
4. Run analytical SQL queries
5. Create a Spark pool and run a PySpark notebook
6. Create a pipeline with copy activity
7. Create a Power BI linked service and build a report
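
For the COPY INTO load step, the statement might look roughly like the one this sketch generates. The table name, storage URL, and the `build_copy_into` helper are illustrative assumptions, and authentication options such as `CREDENTIAL` are left out; the actual demo may use different settings.

```python
def build_copy_into(table: str, source_url: str, first_row: int = 2,
                    field_terminator: str = ",") -> str:
    """Return a T-SQL COPY INTO statement that loads CSV files into a table.

    first_row=2 skips a single header line. String formatting is for
    illustration only; do not build statements this way from untrusted input.
    """
    return (
        f"COPY INTO {table}\n"
        f"FROM '{source_url}'\n"
        f"WITH (\n"
        f"    FILE_TYPE = 'CSV',\n"
        f"    FIRSTROW = {first_row},\n"
        f"    FIELDTERMINATOR = '{field_terminator}'\n"
        f");"
    )


if __name__ == "__main__":
    # Hypothetical target table and data lake path.
    print(build_copy_into(
        "dbo.Sales",
        "https://<account>.dfs.core.windows.net/<container>/sales/*.csv",
    ))
```

The generated statement would be run against the dedicated SQL pool, for example from a SQL script in Synapse Studio.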