Microsoft Defender for Office 365 includes a solid set of built-in reports for Attack Simulation. The portal gives you simulation coverage, training completion rates, repeat-offender tracking, and even a predicted compromise rate based on historical Microsoft 365 data. For day-to-day operations, these reports cover a lot of ground.
Where it gets harder is when you need to take that data somewhere else. Maybe you need to build a consolidated security dashboard that combines attack simulation metrics with other security signals. Maybe your CISO wants phishing resilience trends broken down by department, business unit, or geography over the last twelve months. Or your compliance team needs to correlate training completion with specific simulation campaigns in a format they can audit independently.
The built-in reports weren't designed for that level of customization. The data is there, but it's locked inside the Defender XDR portal. Exporting CSVs works for one-off analysis, but it breaks down when you're running monthly campaigns across thousands of users and need the data to stay current.
This project bridges that gap. It pulls the simulation data out through the Microsoft Graph API and lands it in Power BI, where you can build whatever reporting your organization needs.
What it does
It's an automated pipeline that pulls attack simulation data from the Microsoft Graph API, transforms it, and writes it as Parquet files to Azure Data Lake Storage Gen2. Power BI reads those files on a scheduled refresh. Once it's set up, the data flows without any manual steps.
A timer-triggered Azure Function fires on a schedule (hourly by default). It authenticates through Managed Identity and Key Vault, then paginates through nine Graph API endpoints with retry logic. The raw JSON gets flattened into tabular records, strings are sanitized, and everything is written as date-partitioned Parquet files. Power BI picks them up on the next refresh.
The whole pipeline runs in under two minutes.
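The pagination-with-retry pattern at the heart of the sync loop looks roughly like this. This is a simplified sketch: the HTTP call is stubbed out (the real function uses authenticated aiohttp requests), and the function and error names are illustrative, not the project's actual API.

```python
import asyncio
from typing import Awaitable, Callable

async def fetch_all_pages(
    fetch: Callable[[str], Awaitable[dict]],
    url: str,
    max_retries: int = 3,
) -> list[dict]:
    """Follow @odata.nextLink until the collection is exhausted,
    retrying each page with exponential backoff."""
    records: list[dict] = []
    next_url = url
    while next_url:
        for attempt in range(max_retries):
            try:
                page = await fetch(next_url)
                break
            except ConnectionError:
                if attempt == max_retries - 1:
                    raise
                await asyncio.sleep(2 ** attempt)  # back off 1s, 2s, 4s...
        records.extend(page.get("value", []))
        next_url = page.get("@odata.nextLink")
    return records

# Demo with a stubbed fetch standing in for an authenticated Graph call.
async def _stub_fetch(url: str) -> dict:
    pages = {
        "page1": {"value": [{"id": 1}], "@odata.nextLink": "page2"},
        "page2": {"value": [{"id": 2}]},
    }
    return pages[url]

records = asyncio.run(fetch_all_pages(_stub_fetch, "page1"))
```

Graph collection endpoints return results in pages linked by `@odata.nextLink`, so every endpoint gets the same loop regardless of result size.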
What you get
The Power BI report template includes seven pre-built pages. These were designed by a security engineer, not a data analytics specialist, so treat them as a starting point. The real value is having all the data in Power BI where you can build whatever views, filters, and dashboards fit your organization's needs.
The data
The solution pulls nine tables that cover the full scope of an Attack Simulation Training program:
| Table | API | Description |
|---|---|---|
| repeatOffenders | v1.0 | Users who fell for multiple simulations |
| simulationUserCoverage | v1.0 | Per-user simulation statistics |
| trainingUserCoverage | v1.0 | Per-user training completion |
| simulations | beta | Campaign definitions and metrics |
| simulationUsers | beta | Per-user results for each simulation |
| simulationUserEvents | beta | User events: clicks, credential submissions, reports |
| trainings | beta | Training module definitions |
| payloads | beta | Phishing payload templates |
| users | v1.0 | Entra ID enrichment (department, city, country) |
The first three tables and the users table use the stable v1.0 Graph API. The remaining five use the beta API and are enabled by default through the SYNC_SIMULATIONS setting.
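The table-to-API-version mapping can be expressed as a small lookup used when building request URLs. This is a sketch that mirrors the table above; the project's actual configuration layout may differ.

```python
GRAPH_BASE = "https://graph.microsoft.com"

# API version per table, mirroring the table above.
TABLE_API_VERSION = {
    "repeatOffenders": "v1.0",
    "simulationUserCoverage": "v1.0",
    "trainingUserCoverage": "v1.0",
    "simulations": "beta",
    "simulationUsers": "beta",
    "simulationUserEvents": "beta",
    "trainings": "beta",
    "payloads": "beta",
    "users": "v1.0",
}

def base_url(table: str) -> str:
    """Resolve the Graph base URL for a given table."""
    return f"{GRAPH_BASE}/{TABLE_API_VERSION[table]}"
```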
Design decisions
A few things that shaped how this was built:
Fully async. The function uses aiohttp for Graph API calls and the Azure SDK async client for storage writes. This keeps the function fast and avoids blocking on I/O. A full sync across all nine endpoints finishes in under two minutes.
Parquet with explicit schemas. Each table has a PyArrow schema defined in code, with Snappy compression and INT64 timestamps. This gives Power BI clean column types on import without any guessing.
Incremental sync. After the initial full sync, the function uses a 7-day lookback window to only process recent data. This cuts API calls by 70-80% and reduces the risk of hitting Graph API throttling limits.
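The lookback window translates into an OData `$filter` on each request. A sketch of the idea, where the `lastModifiedDateTime` field name is an assumption for illustration:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

LOOKBACK_DAYS = 7  # the pipeline's default incremental window

def build_filter(full_sync: bool, now: Optional[datetime] = None) -> Optional[str]:
    """Build an OData $filter limiting results to the lookback window.
    Returns None for the initial full sync.
    NOTE: 'lastModifiedDateTime' is an illustrative field name."""
    if full_sync:
        return None
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=LOOKBACK_DAYS)
    return f"lastModifiedDateTime ge {cutoff:%Y-%m-%dT%H:%M:%SZ}"
```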
Security first. The function authenticates with Managed Identity. Secrets live in Key Vault. The infrastructure supports private endpoints and VNet integration. All RBAC assignments follow least-privilege.
Modular storage layer. The ADLS writer is a separate module. If you want to send data to Microsoft Fabric, Azure SQL, or Dataverse instead, you can swap the writer without touching the rest of the pipeline.
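The swap point can be pictured as a small interface the sync logic codes against. This is a sketch under assumed method names, not the project's actual writer API:

```python
import asyncio
from typing import Protocol

class TableWriter(Protocol):
    """Minimal contract the pipeline needs from a storage backend."""
    async def write(self, table_name: str, records: list[dict], sync_date: str) -> str: ...

class InMemoryWriter:
    """Stand-in backend (e.g. for tests); an ADLS, Fabric, or Azure SQL
    implementation would satisfy the same Protocol."""
    def __init__(self) -> None:
        self.store: dict[str, list[dict]] = {}

    async def write(self, table_name: str, records: list[dict], sync_date: str) -> str:
        # Mimic the date-partitioned layout used for Parquet output.
        path = f"{table_name}/date={sync_date}/part-000.parquet"
        self.store[path] = records
        return path

writer: TableWriter = InMemoryWriter()
path = asyncio.run(writer.write("simulations", [{"id": "s1"}], "2024-01-01"))
```

Since the sync code only ever calls `write(...)`, replacing the backend is a one-class change.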
Getting started
The repo includes full deployment guides for three methods: GitHub Actions, Azure CLI, and manual portal setup. Here's the short version:
1. Deploy the infrastructure. Run the included Bicep templates to create the Azure Function, ADLS Gen2 account, Key Vault, and Application Insights.
2. Register the app in Entra ID. Create an app registration with the AttackSimulation.Read.All and User.Read.All application permissions.
3. Deploy the function code. Push via GitHub Actions, Azure CLI, or manually. The timer starts syncing data automatically.
4. Connect Power BI. Open the included report template, point it at your storage account, and refresh.
Tech stack
The pipeline is written in Python on Azure Functions, using aiohttp for async Graph API calls, PyArrow for Parquet output, the Azure SDK async client for storage writes, and Bicep for infrastructure as code, with ADLS Gen2 for storage and Power BI for reporting. It's an open-source project released under the MIT license. Contributions, issues, and feature requests are welcome on GitHub.
Get the source code
Full documentation, deployment guides, and the Power BI report template are on GitHub.