
Bringing Microsoft Defender Attack Simulation Data to Power BI

Carlos Suarez | March 2026
Senior Security Solution Engineer at Microsoft
Disclaimer: This is a personal open-source project. Not an official Microsoft product or supported by Microsoft. Use at your own risk.

Microsoft Defender for Office 365 includes a solid set of built-in reports for Attack Simulation. The portal gives you simulation coverage, training completion rates, repeat offender tracking, and even a predicted compromise rate based on historical Microsoft 365 data. For day-to-day operations, these reports cover a lot of ground.

Where it gets harder is when you need to take that data somewhere else. Maybe you need to build a consolidated security dashboard that combines attack simulation metrics with other security signals. Maybe your CISO wants phishing resilience trends broken down by department, business unit, or geography over the last twelve months. Or your compliance team needs to correlate training completion with specific simulation campaigns in a format they can audit independently.

The built-in reports weren't designed for that level of customization. The data is there, but it's locked inside the Defender XDR portal. Exporting CSVs works for one-off analysis, but it breaks down when you're running monthly campaigns across thousands of users and need the data to stay current.

This project bridges that gap. It pulls the simulation data out through the Microsoft Graph API and lands it in Power BI, where you can build whatever reporting your organization needs.

What it does

It's an automated pipeline that pulls attack simulation data from the Microsoft Graph API, transforms it, and writes it as Parquet files to Azure Data Lake Storage Gen2. Power BI reads those files on a scheduled refresh. Once it's set up, the data flows without any manual steps.

Source: Defender (nine Graph endpoints) → Compute: Azure Function (Python 3.11) → Storage: ADLS Gen2 (Parquet + JSON) → Consume: Power BI (dashboards)

A timer-triggered Azure Function fires on a schedule (hourly by default). It authenticates through Managed Identity and Key Vault, then paginates through nine Graph API endpoints with retry logic. The raw JSON gets flattened into tabular records, strings are sanitized, and everything is written as date-partitioned Parquet files. Power BI picks them up on the next refresh.
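The pagination-with-retry pattern can be sketched roughly like this. This is an illustrative sketch, not the project's actual code: the fetch_json callable stands in for an authenticated aiohttp GET, and the retry policy shown (exponential backoff, three attempts) is an assumption.

```python
import asyncio

async def fetch_all_pages(fetch_json, first_url, max_retries=3):
    """Follow @odata.nextLink until the last page, retrying transient failures.

    fetch_json: async callable that takes a URL and returns the decoded JSON
    page (in the real pipeline this would be an authenticated aiohttp call).
    """
    records, url = [], first_url
    while url:
        for attempt in range(max_retries):
            try:
                page = await fetch_json(url)
                break
            except Exception:
                if attempt == max_retries - 1:
                    raise
                await asyncio.sleep(2 ** attempt)  # exponential backoff
        records.extend(page.get("value", []))
        url = page.get("@odata.nextLink")  # absent on the final page
    return records
```

Graph API collection endpoints return results in pages, each carrying an @odata.nextLink pointer to the next one, which is why the loop keys off that field.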

The whole pipeline runs in under two minutes.

What you get

The Power BI report template includes seven pre-built pages. These were designed by a security engineer, not a data analytics specialist, so treat them as a starting point. The real value is having all the data in Power BI where you can build whatever views, filters, and dashboards fit your organization's needs.

The data

The solution pulls nine tables that cover the full scope of an Attack Simulation Training program:

Table                   API    Description
repeatOffenders         v1.0   Users who fell for multiple simulations
simulationUserCoverage  v1.0   Per-user simulation statistics
trainingUserCoverage    v1.0   Per-user training completion
simulations             beta   Campaign definitions and metrics
simulationUsers         beta   Per-user results for each simulation
simulationUserEvents    beta   User events: clicks, credential submissions, reports
trainings               beta   Training module definitions
payloads                beta   Phishing payload templates
users                   v1.0   Entra ID enrichment (department, city, country)
The first three tables use the stable v1.0 Graph API. The rest use the beta API and are enabled by default through the SYNC_SIMULATIONS setting.
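The nested JSON these endpoints return has to be flattened before it can land in columnar Parquet. A minimal sketch of that kind of flattening (the field names in the example are hypothetical, and the real pipeline may handle nested lists differently):

```python
def flatten(record, parent_key="", sep="_"):
    """Flatten one nested JSON record into a single-level dict,
    so each key can become a Parquet column."""
    out = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            out.update(flatten(value, full_key, sep))  # recurse into nesting
        else:
            out[full_key] = value
    return out

# flatten({"id": "1", "attackType": {"name": "phishing"}})
# → {"id": "1", "attackType_name": "phishing"}
```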

Design decisions

A few things that shaped how this was built:

Fully async. The function uses aiohttp for Graph API calls and the Azure SDK async client for storage writes. This keeps the function fast and avoids blocking on I/O. A full sync across all nine endpoints finishes in under two minutes.
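The concurrency pattern behind that speed can be sketched with asyncio.gather. This is a simplified stand-in, not the project's code: fetch replaces the real aiohttp session calls so the sketch stays self-contained.

```python
import asyncio

async def sync_endpoint(name, fetch):
    # In the real function this would page through one Graph endpoint
    # via an aiohttp session; here fetch is a stand-in coroutine.
    return name, await fetch(name)

async def sync_all(endpoints, fetch):
    # Run all endpoint syncs concurrently: total wall time is roughly
    # the slowest endpoint, not the sum of all of them.
    results = await asyncio.gather(*(sync_endpoint(n, fetch) for n in endpoints))
    return dict(results)
```

This is why nine endpoints can finish in a couple of minutes: the function is waiting on network I/O most of the time, and gather overlaps those waits.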

Parquet with explicit schemas. Each table has a PyArrow schema defined in code, with Snappy compression and INT64 timestamps. This gives Power BI clean column types on import without any guessing.

Incremental sync. After the initial full sync, the function uses a 7-day lookback window to only process recent data. This cuts API calls by 70-80% and reduces the risk of hitting Graph API throttling limits.
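The lookback window itself is simple to reason about. A sketch of the client-side check, assuming UTC timestamps (the actual project may apply the window differently, e.g. as a server-side filter):

```python
from datetime import datetime, timedelta, timezone

LOOKBACK_DAYS = 7  # the project's default incremental window

def in_window(record_time, now=None, lookback_days=LOOKBACK_DAYS):
    """Return True if a record falls inside the incremental sync window."""
    now = now or datetime.now(timezone.utc)
    return record_time >= now - timedelta(days=lookback_days)
```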

Security first. The function authenticates with Managed Identity. Secrets live in Key Vault. The infrastructure supports private endpoints and VNet integration. All RBAC assignments follow least-privilege.

Modular storage layer. The ADLS writer is a separate module. If you want to send data to Microsoft Fabric, Azure SQL, or Dataverse instead, you can swap the writer without touching the rest of the pipeline.
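The swap works because the pipeline only depends on a narrow writer interface. A minimal sketch of that shape (names are illustrative, not the project's actual classes):

```python
from typing import Protocol

class StorageWriter(Protocol):
    """The interface the pipeline depends on; any backend that
    implements write() can be dropped in."""
    def write(self, table_name: str, data: bytes) -> None: ...

class InMemoryWriter:
    # Stand-in for the ADLS Gen2 writer; a Fabric, Azure SQL, or
    # Dataverse writer would implement the same single method.
    def __init__(self):
        self.store = {}

    def write(self, table_name: str, data: bytes) -> None:
        self.store[table_name] = data

def run_pipeline(writer: StorageWriter, tables: dict) -> None:
    # The sync logic never knows which backend it is writing to.
    for name, payload in tables.items():
        writer.write(name, payload)
```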

Getting started

The repo includes full deployment guides for three methods: GitHub Actions, Azure CLI, and manual portal setup. Here's the short version:

Deploy the infrastructure

Run the included Bicep templates to create the Azure Function, ADLS Gen2, Key Vault, and Application Insights.

Register the app in Entra ID

Create an app registration with AttackSimulation.Read.All and User.Read.All application permissions.

Deploy the function code

Push via GitHub Actions, Azure CLI, or manually. The timer starts syncing data automatically.

Connect Power BI

Open the included report template, point it at your storage account, and refresh.

Tech stack

Python 3.11 · Azure Functions v4 · Microsoft Graph API · ADLS Gen2 · Power BI · aiohttp · PyArrow · Bicep · Key Vault · Managed Identity

This is an open-source project released under the MIT license. Contributions, issues, and feature requests are welcome on GitHub.

Get the source code

Full documentation, deployment guides, and the Power BI report template are on GitHub.

View on GitHub