Automating MongoDB Atlas Alert Configuration with Excel and CLI
Managing alerts across multiple MongoDB Atlas projects manually is tedious and error-prone. I built an automation script that reads alert configurations from an Excel spreadsheet and deploys them via the MongoDB Atlas CLI, making it easy to maintain consistent alerting across environments.
The Problem
Atlas provides excellent default alerts, but production environments often need customized thresholds. When managing multiple projects or clusters, manually configuring 20+ alert types per project becomes a maintenance nightmare. Changes need to be tracked, documented, and applied consistently.
The Solution
The script takes an Excel file defining alert configurations and translates them into Atlas CLI commands. This provides:
- Version-controlled configuration - Excel files can be tracked in git
- Bulk operations - Create dozens of alerts in one command
- Dry-run mode - Generate JSON configs without creating alerts
- Safe deletion - Track automation-created alerts separately from defaults
How It Works
The automation handles three main workflows:
1. Threshold Parsing
Excel cells contain human-readable thresholds that need translation to Atlas API format:
"> 4000 for 2 minutes" → operator: GREATER_THAN, threshold: 4000, units: MINUTES, value: 2
"< 24h for 5 minutes" → operator: LESS_THAN, threshold: 24, units: HOURS, value: 5
"> 90%" → operator: GREATER_THAN, threshold: 90, units: PERCENTAGE
The parser handles various formats including milliseconds, gigabytes, percentages, and plain durations.
2. Alert Generation
Each row in the Excel file can generate multiple alerts:
- If low and high priority thresholds differ, two separate alerts are created
- Event-based alerts (like backup failures) trigger on any occurrence
- Metric-based alerts use the parsed threshold values
3. Alert Tracking
The script maintains .automation_alert_ids.json to track which alerts it created. This enables:
- Deleting only automation-created alerts while preserving Atlas defaults
- Re-running safely without creating duplicates
- Auditing what was deployed
Alert Types Covered
The configuration supports 20+ alert types across categories:
| Category | Examples |
|---|---|
| Replication | Oplog window, replication lag, primary elections |
| Performance | Disk IOPS, latency, page faults, queue depths |
| Resources | CPU usage, disk space, swap usage |
| Availability | Host down, no primary |
| Backup | Failed/successful snapshots, schedule behind |
Usage
Basic deployment:
./run_alerts.sh --project-id YOUR_PROJECT_ID
Preview without creating:
./run_alerts.sh --project-id YOUR_PROJECT_ID --dry-run
Remove automation alerts only:
./run_alerts.sh --project-id YOUR_PROJECT_ID --delete-existing
Implementation Notes
The script uses the Atlas CLI rather than the REST API directly. This provides:
- Simpler authentication (reuses
atlas auth login) - Consistent JSON format for alert configs
- Built-in validation before submission
Generated JSON files are saved to ./alerts/ for review. Each alert config follows the Atlas Alert Configuration File format.
Practical Considerations
- Low vs High Priority - Separate thresholds create separate alerts, useful for different notification channels
- Notification Roles - Defaults to
GROUP_OWNERbut can target specific emails - Idempotency - The tracking file prevents duplicate alerts on re-runs
- Atlas Defaults - The
--delete-existingflag preserves Atlas default alerts
The code is available at research/atlas-alerts-creation.