You Can't Watch It All Day
Your trading bot runs 24/7. You can't. You need systems that watch for you and alert you when something needs attention.
What to Monitor
System Health:
- Process status (running, crashed, restarting)
- Memory and CPU usage
- Disk space
- Network connectivity
Trading Metrics:
- P&L (daily, weekly, by strategy)
- Win rate (rolling)
- Drawdown (current, maximum)
- Position exposure
Execution Quality:
- Order success rate
- Average slippage
- Fill time
- Rejected orders
Data Quality:
- Data freshness
- Missing data points
- Anomalous values
Monitoring Architecture
The pattern: Your Trading System --> Metrics Collector --> Time Series Database --> Dashboard + Alerting
Metrics Collector: Your code emits metrics (counters, gauges, histograms).
Time Series Database: Stores metrics over time (InfluxDB, Prometheus, TimescaleDB).
Dashboard: Visualizes metrics (Grafana is the standard).
Alerting: Triggers notifications when thresholds are breached.
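As a concrete sketch of the collector-to-database hop: time series databases like InfluxDB accept points in a simple line protocol (`measurement,tags fields timestamp`). The measurement and tag names below are illustrative, and escaping of special characters is omitted for brevity:

```python
import time

def to_line_protocol(measurement, tags, fields, ts_ns=None):
    """Format one metric point as InfluxDB line protocol:
    measurement,tag=val field=val timestamp(ns)."""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(
        f'{k}="{v}"' if isinstance(v, str) else f"{k}={v}"
        for k, v in sorted(fields.items())
    )
    ts = ts_ns if ts_ns is not None else time.time_ns()
    return f"{measurement},{tag_str} {field_str} {ts}"

line = to_line_protocol("pnl", {"strategy": "dpo_pvol"}, {"daily": -42.5},
                        ts_ns=1700000000000000000)
# "pnl,strategy=dpo_pvol daily=-42.5 1700000000000000000"
```

In practice you would use the database's client library, but seeing the wire format makes the pipeline less mysterious.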
Key Metrics to Track
Counters (things that only go up):
- Total signals generated
- Total orders placed
- Total errors
Gauges (point-in-time values):
- Current P&L
- Current position size
- Account balance
- Open order count
Histograms (distributions):
- Order fill time
- Slippage amounts
- Signal processing latency
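The three metric types can be captured with a minimal in-process collector like the sketch below. In production you would use a client library for your time series database instead; the metric names here are illustrative:

```python
from collections import defaultdict

class Metrics:
    """Minimal illustrative metrics collector: counters, gauges, histograms."""
    def __init__(self):
        self.counters = defaultdict(int)     # only go up
        self.gauges = {}                     # point-in-time values
        self.histograms = defaultdict(list)  # raw observations

    def incr(self, name, amount=1):
        self.counters[name] += amount

    def set_gauge(self, name, value):
        self.gauges[name] = value

    def observe(self, name, value):
        self.histograms[name].append(value)

m = Metrics()
m.incr("orders_placed")          # counter
m.set_gauge("open_positions", 2) # gauge
m.observe("fill_time_ms", 140.0) # histogram observation
```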
Building Dashboards
A good trading dashboard shows at a glance:
Top Row: Overall system health (green/yellow/red indicators)
Second Row: P&L chart (daily, cumulative), current positions, account balance
Third Row: Signal and execution metrics, error rates, latency
Fourth Row: Per-strategy breakdown
Keep it simple. If you need to scroll to find critical information, redesign.
Alerting Rules
Good alerts are actionable. Bad alerts are ignored.
Good Alert: "Position size exceeds 2x normal. Current: 0.5 BTC. Expected max: 0.25 BTC"
Bad Alert: "Error occurred" (Which error? Where? What impact?)
Alert Fatigue: Too many alerts = all alerts ignored. Be selective about what triggers notifications.
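One way to make actionable alerts the default is to force every alert through a template that requires the observed value, the expected value, and a suggested action (the field names and action text below are illustrative):

```python
def build_alert(what, current, expected, action):
    """An actionable alert states what happened, observed vs expected,
    and what to do about it."""
    return (f"ALERT: {what}. Current: {current}. Expected max: {expected}. "
            f"Action: {action}")

msg = build_alert("Position size exceeds 2x normal", "0.5 BTC", "0.25 BTC",
                  "check for duplicate entries; flatten if unintended")
```

If you can't fill in all four fields, the alert probably isn't worth sending.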
Notification Channels
Different urgency needs different channels:
SMS/Phone Call: Critical issues requiring immediate action. Position mismatch, system down, unusual P&L.
Telegram/Discord: Important but not emergency. Execution issues, high error rates, risk limit warnings.
Email: Daily summaries, reports, non-urgent information.
Dashboard: Everything else. Details available when you look, no push notification.
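The urgency-to-channel mapping above can be encoded as simple configuration so the routing logic stays out of your trading code. Severity and channel names here are illustrative:

```python
# Illustrative urgency -> channel mapping, mirroring the tiers above.
SEVERITY_CHANNELS = {
    "critical":  ["sms"],        # position mismatch, system down, unusual P&L
    "important": ["telegram"],   # execution issues, risk limit warnings
    "info":      ["email"],      # daily summaries, reports
    "debug":     ["dashboard"],  # visible on demand, no push notification
}

def channels_for(severity):
    # Unknown severities fall back to the dashboard: no push, but not lost.
    return SEVERITY_CHANNELS.get(severity, ["dashboard"])
```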
Building a Notification System
Simple architecture: Event --> Notification Router --> Channel-Specific Senders --> Telegram/Discord/Email/SMS
Router Logic:
- Classify event severity
- Apply throttling (don't send 100 alerts per minute)
- Route to appropriate channel(s)
Throttling is Critical: If an error occurs 1000 times per minute, you don't need 1000 notifications. Aggregate and summarize.
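A throttling layer can be as simple as suppressing repeats of the same alert key within a window and counting what was dropped, so the next notification summarizes the burst. A minimal sketch (class and parameter names are illustrative):

```python
import time
from collections import defaultdict

class ThrottledNotifier:
    """Suppress repeats of the same alert key within a window; count the
    suppressed ones so the next send can summarize them."""
    def __init__(self, send, window_s=60):
        self.send = send              # channel-specific sender callback
        self.window_s = window_s
        self.last_sent = {}           # key -> timestamp of last send
        self.suppressed = defaultdict(int)

    def notify(self, key, message, now=None):
        now = now if now is not None else time.time()
        last = self.last_sent.get(key)
        if last is not None and now - last < self.window_s:
            self.suppressed[key] += 1  # drop, but remember it happened
            return False
        n = self.suppressed.pop(key, 0)
        suffix = f" ({n} similar suppressed)" if n else ""
        self.send(message + suffix)
        self.last_sent[key] = now
        return True

sent = []
tn = ThrottledNotifier(sent.append, window_s=60)
for _ in range(100):  # error storm: 100 identical alerts at once
    tn.notify("order_reject", "Order rejected by exchange", now=0.0)
tn.notify("order_reject", "Order rejected by exchange", now=61.0)
```

The storm of 100 alerts becomes one immediate notification plus one summary after the window expires.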
What Notifications to Send
Always Notify:
- System start/stop
- Trade executions (entry and exit)
- Risk limit breaches
- Position mismatches
- Significant P&L changes
Conditionally Notify:
- Signal generation (optional, can be noisy)
- Minor errors (aggregate into digest)
- Performance metrics (daily summary)
Never Notify:
- Routine operations
- Debug information
- Expected errors that self-resolve
Signal Notifications
For each trade signal, include:
- Direction (LONG/SHORT)
- Asset and exchange
- Entry price
- Stop loss level
- Position size
- Strategy/edge that generated it
Example: "LONG BTC @ $95,000 | Stop: $93,100 | Size: 0.1 BTC | Edge: DPO_PVOL_2h"
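A formatter like the sketch below keeps every signal notification in that same compact shape (the function name is illustrative; exchange is omitted here for brevity):

```python
def format_signal(direction, asset, entry, stop, size, edge):
    """Render a trade signal in the compact one-line form shown above."""
    return (f"{direction} {asset} @ ${entry:,.0f} | Stop: ${stop:,.0f} "
            f"| Size: {size} {asset} | Edge: {edge}")

msg = format_signal("LONG", "BTC", 95_000, 93_100, 0.1, "DPO_PVOL_2h")
# "LONG BTC @ $95,000 | Stop: $93,100 | Size: 0.1 BTC | Edge: DPO_PVOL_2h"
```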
Daily Summary Reports
Send daily at a consistent time:
- P&L for the day
- Win/loss count
- Current positions
- Notable events
- System health summary
Automate these. Manual reporting means it won't happen consistently.
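Assembling the report is the easy part to automate; a plain-text builder like this (field names illustrative) can be called from a daily scheduled job:

```python
def daily_summary(date, pnl, wins, losses, positions, events):
    """Assemble the daily report as plain text."""
    lines = [
        f"Daily Summary {date}",
        f"P&L: {pnl:+.2f}",
        f"Wins/Losses: {wins}/{losses}",
        f"Open positions: {', '.join(positions) or 'none'}",
        f"Notable events: {'; '.join(events) or 'none'}",
    ]
    return "\n".join(lines)

report = daily_summary("2025-01-15", 123.4, 3, 1, ["BTC long 0.1"], [])
```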
Monitoring Your Monitoring
Meta, but important: What happens if your monitoring system fails?
- Have a heartbeat: If you don't receive a "system healthy" message every hour, something's wrong
- Use external monitoring: A third-party service that checks if your systems are reachable
- Redundant channels: If Telegram is down, alerts should fall back to email
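The heartbeat check itself is a few lines; the important part is that it runs somewhere *other* than the trading host. A sketch, assuming an hourly heartbeat:

```python
import time

HEARTBEAT_INTERVAL_S = 3600  # expect a "system healthy" message every hour

def heartbeat_stale(last_heartbeat_ts, now=None, interval_s=HEARTBEAT_INTERVAL_S):
    """True if no heartbeat has arrived within the expected interval.
    Run this on the external monitor, not on the machine being watched."""
    now = now if now is not None else time.time()
    return now - last_heartbeat_ts > interval_s
```

When this returns True, the monitor raises a critical alert, through a channel that doesn't depend on the failed system.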
Takeaway
Good monitoring is invisible when things work and invaluable when they don't. Invest in dashboards that show system health at a glance and alerts that tell you exactly what's wrong and what to do about it.