
Notifications and Monitoring

Keeping eyes on your system 24/7

You Can't Watch It All Day

Your trading bot runs 24/7. You can't. You need systems that watch for you and alert you when something needs attention.

What to Monitor

System Health:

  • Process status (running, crashed, restarting)
  • Memory and CPU usage
  • Disk space
  • Network connectivity

Trading Metrics:

  • P&L (daily, weekly, by strategy)
  • Win rate (rolling)
  • Drawdown (current, maximum)
  • Position exposure

Execution Quality:

  • Order success rate
  • Average slippage
  • Fill time
  • Rejected orders

Data Quality:

  • Data freshness
  • Missing data points
  • Anomalous values

Monitoring Architecture

The pattern: Your Trading System --> Metrics Collector --> Time Series Database --> Dashboard + Alerting

Metrics Collector: Your code emits metrics (counters, gauges, histograms).

Time Series Database: Stores metrics over time (InfluxDB, Prometheus, TimescaleDB).

Dashboard: Visualizes metrics (Grafana is the de facto standard).

Alerting: Triggers notifications when thresholds are breached.
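
To make the "your code emits metrics" step concrete, here is a minimal Python sketch that pushes one trading metric into InfluxDB 2.x using the influxdb-client package. The URL, token, org, bucket, and the record_pnl helper are placeholder assumptions, not part of a prescribed stack:

```python
# Minimal sketch: pushing a gauge-style metric into InfluxDB 2.x.
# URL, token, org, and bucket are placeholders for your own setup.
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

client = InfluxDBClient(url="http://localhost:8086", token="MY_TOKEN", org="trading")
write_api = client.write_api(write_options=SYNCHRONOUS)

def record_pnl(strategy: str, pnl: float) -> None:
    """Write the current P&L for a strategy as a time-series point."""
    point = Point("trading_metrics").tag("strategy", strategy).field("pnl", pnl)
    write_api.write(bucket="trading", record=point)

record_pnl("DPO_PVOL_2h", 142.50)
```

Prometheus inverts this flow: instead of pushing, your process exposes a metrics endpoint that the Prometheus server scrapes on a schedule.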

Key Metrics to Track

Counters (things that only go up):

  • Total signals generated
  • Total orders placed
  • Total errors

Gauges (point-in-time values):

  • Current P&L
  • Current position size
  • Account balance
  • Open order count

Histograms (distributions):

  • Order fill time
  • Slippage amounts
  • Signal processing latency
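
A minimal sketch of all three metric types with the prometheus_client package; the metric names, labels, and port are illustrative assumptions:

```python
# Counters, gauges, and histograms with prometheus_client.
# Prometheus would scrape http://localhost:8000/metrics once this runs.
from prometheus_client import Counter, Gauge, Histogram, start_http_server

ORDERS_PLACED = Counter("orders_placed_total", "Total orders sent to the exchange")
ERRORS = Counter("errors_total", "Total errors", ["component"])

CURRENT_PNL = Gauge("current_pnl_usd", "Current P&L in USD")
OPEN_ORDERS = Gauge("open_order_count", "Number of open orders")

FILL_TIME = Histogram("order_fill_seconds", "Time from order placement to fill")

start_http_server(8000)  # expose /metrics for Prometheus to scrape

# Inside the trading loop:
ORDERS_PLACED.inc()                  # counter: only goes up
CURRENT_PNL.set(1234.56)             # gauge: point-in-time value
OPEN_ORDERS.set(3)
with FILL_TIME.time():               # histogram: observe a duration
    pass  # place the order and wait for the fill here
ERRORS.labels(component="executor").inc()
```

Grafana then charts whatever Prometheus has scraped from this endpoint.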

Building Dashboards

A good trading dashboard shows at a glance:

Top Row: Overall system health (green/yellow/red indicators)

Second Row: P&L chart (daily, cumulative), current positions, account balance

Third Row: Signal and execution metrics, error rates, latency

Fourth Row: Per-strategy breakdown

Keep it simple. If you need to scroll to find critical information, redesign.

Alerting Rules

Good alerts are actionable. Bad alerts are ignored.

Good Alert: "Position size exceeds 2x normal. Current: 0.5 BTC. Expected max: 0.25 BTC"

Bad Alert: "Error occurred" (Which error? Where? What impact?)

Alert Fatigue: Too many alerts = all alerts ignored. Be selective about what triggers notifications.
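
One way to keep alerts actionable is to build the message from the same values the rule checks, so the observed value, the limit, and a suggested action travel with the alert. A small sketch; the function, threshold, and suggested action are hypothetical:

```python
# An actionable alert names the metric, the observed value, the limit,
# and what to do next. All names and numbers here are illustrative.
from typing import Optional

def check_position_size(current_btc: float, max_btc: float) -> Optional[str]:
    """Return an alert message if the position breaches its limit, else None."""
    if current_btc <= max_btc:
        return None
    return (
        f"Position size exceeds limit. "
        f"Current: {current_btc} BTC. Expected max: {max_btc} BTC. "
        f"Action: check for duplicate fills or a missed exit."
    )

alert = check_position_size(current_btc=0.5, max_btc=0.25)
if alert:
    print(alert)  # in practice, hand this to the notification router
```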

Notification Channels

Different urgency needs different channels:

SMS/Phone Call: Critical issues requiring immediate action. Position mismatch, system down, unusual P&L.

Telegram/Discord: Important but not emergency. Execution issues, high error rates, risk limit warnings.

Email: Daily summaries, reports, non-urgent information.

Dashboard: Everything else. Details available when you look, no push notification.

Building a Notification System

Simple architecture: Event --> Notification Router --> Channel-Specific Senders --> Telegram/Discord/Email/SMS

Router Logic:

  • Classify event severity
  • Apply throttling (don't send 100 alerts per minute)
  • Route to appropriate channel(s)

Throttling is Critical: If an error occurs 1000 times per minute, you don't need 1000 notifications. Aggregate and summarize.
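
A minimal sketch of that router in Python. The severity names, routing table, one-minute throttle window, and the send stub are assumptions for illustration; real senders would wrap the Telegram, Discord, email, or SMS APIs:

```python
# Notification router: classify by severity, throttle duplicates, route to channels.
import time
from collections import defaultdict

ROUTES = {
    "critical": ["sms", "telegram"],
    "warning": ["telegram"],
    "info": ["email"],
}

def send(channel: str, message: str) -> None:
    print(f"[{channel}] {message}")  # stand-in for the real channel APIs

class NotificationRouter:
    def __init__(self, min_interval_s: float = 60.0):
        self.min_interval_s = min_interval_s  # at most one alert per key per window
        self.last_sent = {}
        self.suppressed = defaultdict(int)    # count of throttled duplicates

    def notify(self, severity: str, key: str, message: str) -> None:
        now = time.time()
        if now - self.last_sent.get(key, 0.0) < self.min_interval_s:
            self.suppressed[key] += 1         # aggregate instead of spamming
            return
        if self.suppressed[key]:
            message += f" ({self.suppressed[key]} similar alerts suppressed)"
            self.suppressed[key] = 0
        self.last_sent[key] = now
        for channel in ROUTES.get(severity, ["email"]):
            send(channel, message)

router = NotificationRouter()
router.notify("critical", "position_mismatch", "Position mismatch on BTC-USDT")
```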

What Notifications to Send

Always Notify:

  • System start/stop
  • Trade executions (entry and exit)
  • Risk limit breaches
  • Position mismatches
  • Significant P&L changes

Conditionally Notify:

  • Signal generation (optional, can be noisy)
  • Minor errors (aggregate into digest)
  • Performance metrics (daily summary)

Never Notify:

  • Routine operations
  • Debug information
  • Expected errors that self-resolve

Signal Notifications

For each trade signal, include:

  • Direction (LONG/SHORT)
  • Asset and exchange
  • Entry price
  • Stop loss level
  • Position size
  • Strategy/edge that generated it

Example: "LONG BTC @ $95,000 | Stop: $93,100 | Size: 0.1 BTC | Edge: DPO_PVOL_2h"

Daily Summary Reports

Send daily at a consistent time:

  • P&L for the day
  • Win/loss count
  • Current positions
  • Notable events
  • System health summary

Automate these. Manual reporting means it won't happen consistently.
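
One way to automate it, sketched with the schedule package; the 17:00 send time, load_daily_stats, and send_report are placeholders for your own stats query and sender:

```python
# Automated daily summary on a fixed schedule.
import time
import schedule

def load_daily_stats() -> dict:
    # Placeholder: pull today's numbers from your trade log / database.
    return {"pnl": 142.50, "wins": 3, "losses": 1, "positions": "0.1 BTC long", "errors": 0}

def send_report(body: str) -> None:
    print(body)  # replace with your email or Telegram sender

def send_daily_summary() -> None:
    s = load_daily_stats()
    send_report(
        f"P&L today: {s['pnl']:+.2f} USD\n"
        f"Wins/Losses: {s['wins']}/{s['losses']}\n"
        f"Open positions: {s['positions']}\n"
        f"Errors: {s['errors']}"
    )

schedule.every().day.at("17:00").do(send_daily_summary)

while True:                  # run alongside (or inside) your bot's main loop
    schedule.run_pending()
    time.sleep(60)
```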

Monitoring Your Monitoring

Meta, but important: What happens if your monitoring system fails?

  • Have a heartbeat: If you don't receive a "system healthy" message every hour, something's wrong (sketched after this list)
  • Use external monitoring: A third-party service that checks if your systems are reachable
  • Redundant channels: If Telegram is down, alerts should fall back to email
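
A minimal sketch of the heartbeat idea, assuming the bot calls send_heartbeat from its main loop and a separate watcher (ideally on different infrastructure) checks for missed beats; the names and intervals are illustrative:

```python
# Heartbeat: the bot emits "system healthy" on an interval; an independent
# watcher escalates if a beat is missed.
import time

HEARTBEAT_INTERVAL_S = 3600
last_heartbeat = 0.0

def send_heartbeat(notify) -> None:
    """Call from the bot's main loop; sends at most one beat per interval."""
    global last_heartbeat
    now = time.time()
    if now - last_heartbeat >= HEARTBEAT_INTERVAL_S:
        notify("system healthy")   # goes out via your notification router
        last_heartbeat = now

def heartbeat_overdue(last_seen: float, grace_s: float = 300.0) -> bool:
    """Run this check somewhere independent: True means escalate."""
    return time.time() - last_seen > HEARTBEAT_INTERVAL_S + grace_s

# In the bot's main loop, for example:
#   send_heartbeat(lambda msg: router.notify("info", "heartbeat", msg))
```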

Takeaway

Good monitoring is invisible when things work and invaluable when they don't. Invest in dashboards that show system health at a glance and alerts that tell you exactly what's wrong and what to do about it.