2026-02-07 13:59:39 +01:00


CloudWatch

Observability service with functions for logging, monitoring and alerting.

  1. TL;DR
  2. Queries of interest
  3. Stream logs
  4. Cost-saving measures
  5. Further readings
    1. Sources

TL;DR

Metrics are whatever needs to be monitored (e.g. CPU usage).
Data points are the values of a metric over time.
Namespaces are containers for metrics.

Metrics only exist in the region in which they are created.
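This is a minimal sketch of how metrics, namespaces and regions fit together on the CLI. The function below publishes one data point of a custom metric with `put-metric-data`. The namespace `MyApp`, the metric name `QueueDepth` and the dimension `Environment=dev` are hypothetical placeholders. Setting `DRY_RUN=1` prints the command instead of calling AWS.

```shell
#!/usr/bin/env bash

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Publish one data point of a custom metric into the given region.
# Namespace, metric name and dimension are hypothetical placeholders.
# $1: region, $2: value
publish_queue_depth() {
  local region="$1" value="$2"
  run aws cloudwatch put-metric-data \
    --region "$region" \
    --namespace 'MyApp' \
    --metric-name 'QueueDepth' \
    --dimensions 'Environment=dev' \
    --unit 'Count' \
    --value "$value"
}
```

Metrics are regional. A data point published with `publish_queue_depth 'eu-west-1' 42` therefore shows up in `aws cloudwatch list-metrics --region 'eu-west-1' --namespace 'MyApp'`. It does not show up when the same namespace is listed in any other region.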

Many AWS services offer basic monitoring by publishing a default set of metrics to CloudWatch at no charge.
This is enabled automatically as soon as one starts using one of these services.

CloudWatch API calls are charged, and this includes the calls that send logs and metrics to it.
Refer to Which log group is causing a sudden increase in my CloudWatch Logs bill? to get an idea of what changed in a given time frame.

It's best practice to distribute ListMetrics calls over time to avoid throttling.
The default limit for ListMetrics is 25 transactions per second.
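This is a minimal sketch of spreading calls over time. It assumes a self-imposed budget, `MAX_TPS`, which defaults to a hypothetical 10 calls per second, below the 25 TPS default limit. The script pauses between `list-metrics` invocations. This only bounds the rate of CLI invocations: the CLI paginates automatically, so a single invocation can still issue several API requests.

```shell
#!/usr/bin/env bash

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Seconds to wait between calls to stay at or below a given calls-per-second budget.
delay_for_tps() {
  awk -v tps="$1" 'BEGIN { printf "%.3f\n", 1 / tps }'
}

# List metric names for each given namespace, pausing between CLI invocations.
# MAX_TPS is an assumed budget kept below the default ListMetrics limit of 25 TPS.
list_metrics_spaced() {
  local delay namespace
  delay="$(delay_for_tps "${MAX_TPS:-10}")"
  for namespace in "$@"; do
    run aws cloudwatch list-metrics --namespace "$namespace" --query 'Metrics[].MetricName'
    sleep "$delay"
  done
}
```

The CLI's own retry handling can also absorb occasional throttling errors. For example, export `AWS_RETRY_MODE=adaptive` and `AWS_MAX_ATTEMPTS=10` before running `list_metrics_spaced 'AWS/EC2' 'AWS/Logs'`.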

The CloudWatch console offers some useful predefined queries.

Logs in Log Groups can be streamed elsewhere.

CloudWatch retains metrics' data as follows:

  • Data points with a period of less than 60 seconds are available for 3 hours.
    These are high-resolution custom metrics.
  • Data points with a period of 60 seconds (1 minute) are available for 15 days.
  • Data points with a period of 300 seconds (5 minutes) are available for 63 days.
  • Data points with a period of 3600 seconds (1 hour) are available for 455 days (15 months).

Data points are aggregated together for long-term storage after the initial period.
E.g., data using a period of 1 minute remains available for 15 days with 1-minute resolution, then it is aggregated and made available with a resolution of 5 minutes; after 63 days, it is further aggregated and made available with a resolution of 1 hour for 15 months.
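The schedule above can be turned into a small helper. Given how many days ago the queried range starts, it picks the finest period still available for a standard-resolution (1-minute) metric. Per the `GetMetricStatistics` API reference, requests for older ranges must use a `--period` that is a multiple of that value, otherwise no data points are returned. Treating the day boundaries as exclusive is an assumption of this sketch.

```shell
#!/usr/bin/env bash

# Finest period (seconds) still retained for a standard-resolution metric,
# given the age in whole days of the start of the queried range.
# Boundaries mirror the retention schedule: 15, 63 and 455 days.
# Prints nothing once the data is past the 455-day retention.
finest_period_for_age_days() {
  local age_days="$1"
  if [ "$age_days" -lt 15 ]; then
    echo 60
  elif [ "$age_days" -lt 63 ]; then
    echo 300
  elif [ "$age_days" -lt 455 ]; then
    echo 3600
  fi
}

# Succeeds if a requested period (seconds) can still return data for that age.
period_ok_for_age_days() {
  local period="$1" age_days="$2" finest
  finest="$(finest_period_for_age_days "$age_days")"
  [ -n "$finest" ] && [ $((period % finest)) -eq 0 ]
}
```

For example, `period_ok_for_age_days 2592000 30` succeeds. This is consistent with the 30-day (2592000-second) period used in the Queries of interest section. `period_ok_for_age_days 60 30` fails.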

CLI commands
# List available metrics
aws cloudwatch list-metrics --namespace 'AWS/EC2'
aws cloudwatch list-metrics --namespace 'AWS/EC2' --metric-name 'CPUUtilization'
aws cloudwatch list-metrics --namespace 'AWS/EC2' --dimensions 'Name=InstanceId,Value=i-01234567890abcdef' \
  --query 'Metrics[].MetricName'

# Show alarms information
aws cloudwatch describe-alarms-for-metric --metric-name 'CPUUtilization' --namespace 'AWS/EC2' \
  --dimensions 'Name=InstanceId,Value=i-01234567890abcdef'

# Toggle alarm actions
aws cloudwatch disable-alarm-actions --alarm-names 'SomeServer_SystemStatusCheck'
aws cloudwatch enable-alarm-actions --alarm-names 'SomeServer_SystemStatusCheck' 'SomeServer_InstanceStatusCheck'

Queries of interest

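| What                               | Section     | Tab             | How to visualize                                       |
| ---------------------------------- | ----------- | --------------- | ------------------------------------------------------ |
| Top 10 log groups by written bytes | All metrics | Graphed metrics | Add query > Logs > Top 10 log groups by written bytes |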
Get a dashboard of how much data a small set of log groups ingested in the last 30 days

This graph works only with the Absolute time period option.
Should you choose Relative, the graph returns incorrect data.

  1. CloudWatch console > All metrics (navigation pane on the left).
  2. Choose Logs, Log group metrics.
  3. Select the individual IncomingBytes metrics of each log group of interest.
  4. Choose the Graphed metrics tab.
  5. For each metric:
    • Change Statistic to Sum.
    • Change Period to 30 Days.
  6. Choose the Graph options tab.
  7. Choose the Number option group.
  8. At the top right of the graph, choose Custom as the time range.
  9. Choose Absolute.
  10. Select start and end dates that span the last 30 days.
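Roughly the same numbers can be fetched from the CLI with `get-metric-statistics`, which also takes an absolute time range. This is a sketch: the end date and the log group names are placeholders. The date arithmetic assumes GNU `date`, since BSD/macOS `date` uses different flags.

```shell
#!/usr/bin/env bash

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Sum of IncomingBytes for each given log group over the 30 days ending at $1.
# $1: end date (e.g. 2026-02-01, read as midnight UTC); other args: log group names.
# Requires GNU date.
ingested_bytes_30d() {
  local period=$((30 * 24 * 60 * 60))   # 2592000 seconds, i.e. the 30-day period
  local end_epoch start end group
  end_epoch="$(date -u -d "$1" '+%s')"
  end="$(date -u -d "@${end_epoch}" '+%Y-%m-%dT%H:%M:%SZ')"
  start="$(date -u -d "@$((end_epoch - period))" '+%Y-%m-%dT%H:%M:%SZ')"
  shift
  for group in "$@"; do
    run aws cloudwatch get-metric-statistics \
      --namespace 'AWS/Logs' --metric-name 'IncomingBytes' \
      --dimensions "Name=LogGroupName,Value=${group}" \
      --statistics 'Sum' --period "$period" \
      --start-time "$start" --end-time "$end" \
      --query 'Datapoints[].Sum' --output 'text'
  done
}
```

Each iteration prints the summed bytes for one log group. Because the window is exactly one period long, one data point per group is expected.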
Get a dashboard of how much data all log groups ingested in the last 30 days

This graph works only with the Absolute time period option.
Should you choose Relative, the graph returns incorrect data.

  1. CloudWatch console > All metrics (navigation pane on the left).

  2. Choose the Graphed metrics tab.

  3. From the Add math dropdown list, choose Start with an empty expression.

  4. Paste the following as the math expression:

    SORT(REMOVE_EMPTY(SEARCH('{AWS/Logs,LogGroupName} MetricName="IncomingBytes"', 'Sum', 2592000)),SUM, DESC)
    
  5. At the top right of the graph, choose Custom as the time range.

  6. Choose Absolute.

  7. Select start and end dates that span the last 30 days.
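This is a sketch of evaluating the same expression outside the console with `get-metric-data`, which accepts metric math and `SEARCH` expressions. It assumes GetMetricData accepts the console's expression unchanged. The query `Id` `ingested` and the output `--query` are illustrative choices.

```shell
#!/usr/bin/env bash

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Evaluate the console's math expression through the GetMetricData API.
# $1, $2: absolute start and end timestamps (ISO 8601, UTC).
ingested_bytes_all_log_groups() {
  local start="$1" end="$2" queries
  # Quoted heredoc: no shell expansion, so the JSON escapes stay as written.
  queries="$(cat <<'EOF'
[
  {
    "Id": "ingested",
    "Expression": "SORT(REMOVE_EMPTY(SEARCH('{AWS/Logs,LogGroupName} MetricName=\"IncomingBytes\"', 'Sum', 2592000)), SUM, DESC)",
    "ReturnData": true
  }
]
EOF
)"
  run aws cloudwatch get-metric-data \
    --metric-data-queries "$queries" \
    --start-time "$start" --end-time "$end" \
    --query 'MetricDataResults[].[Label, Values[0]]' --output 'text'
}
```

Usage: `ingested_bytes_all_log_groups '2026-01-02T00:00:00Z' '2026-02-01T00:00:00Z'`. As with the console graph, the range is absolute. The exact label format of each returned series may differ from what the console displays.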

Stream logs

Refer to Real-time processing of log data with subscriptions.
Also refer to Streaming CloudWatch Logs data to Amazon OpenSearch Service to stream to AWS-managed OpenSearch domains.

Logs in CloudWatch Log Groups can be streamed to Kinesis, Firehose or Lambda by means of subscription filters.
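This is a sketch of creating such a subscription from the CLI with `put-subscription-filter`. The filter name `forward-all` and any ARNs passed in are placeholders. Kinesis and Firehose destinations need an IAM role that CloudWatch Logs can assume. Lambda destinations instead need a resource-based permission, granted with `aws lambda add-permission`, that allows `logs.amazonaws.com` to invoke the function.

```shell
#!/usr/bin/env bash

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Subscribe a log group to a destination, forwarding every event.
# $1: log group name
# $2: destination ARN (Kinesis stream, Firehose delivery stream or Lambda function)
# $3: optional IAM role ARN that CloudWatch Logs assumes to write to Kinesis or Firehose
subscribe_log_group() {
  local group="$1" destination="$2" role="${3:-}"
  # 'forward-all' is a hypothetical filter name; an empty pattern matches every event.
  local args=(
    aws logs put-subscription-filter
    --log-group-name "$group"
    --filter-name 'forward-all'
    --filter-pattern ''
    --destination-arn "$destination"
  )
  if [ -n "$role" ]; then
    args+=(--role-arn "$role")
  fi
  run "${args[@]}"
}
```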

Cost-saving measures

  • Configure an appropriate log retention period for every log group.
    Log groups containing development logs usually need no more than one week (7 days) of retention.
  • When in doubt, still configure a long default retention period for all log groups (e.g. 10 years, i.e. 3653 days) instead of leaving them on the default setting, which never expires logs.
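This is a sketch of applying a retention period from the CLI with `put-retention-policy`. CloudWatch Logs only accepts a fixed set of values, such as 7 days for one week and 3653 days for ten years. The list in the code reflects the documented values at the time of writing and may change.

```shell
#!/usr/bin/env bash

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Succeeds if the number of days is one that PutRetentionPolicy accepts.
valid_retention_days() {
  case "$1" in
    1|3|5|7|14|30|60|90|120|150|180|365|400|545|731|1096|1827|2192|2557|2922|3288|3653) return 0 ;;
    *) return 1 ;;
  esac
}

# Set the same retention on each given log group.
# $1: retention in days; other args: log group names.
set_retention() {
  local days="$1" group
  shift
  valid_retention_days "$days" || { echo "unsupported retention: $days" >&2; return 1; }
  for group in "$@"; do
    run aws logs put-retention-policy --log-group-name "$group" --retention-in-days "$days"
  done
}
```

To give the long default to log groups that currently never expire, run `set_retention 3653 $(aws logs describe-log-groups --query 'logGroups[?!retentionInDays].logGroupName' --output text)`. The unquoted expansion is intentional, because log group names cannot contain whitespace.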

Further readings

Sources