diff --git a/knowledge base/cloud computing/aws/ecs.md b/knowledge base/cloud computing/aws/ecs.md index 199c509..b8e6dc0 100644 --- a/knowledge base/cloud computing/aws/ecs.md +++ b/knowledge base/cloud computing/aws/ecs.md @@ -702,7 +702,7 @@ One must delete namespaces in AWS Cloud Map themselves. -
+
Requirements - Tasks running in Fargate **must** use the Fargate Linux platform version 1.4.0 or higher. @@ -730,11 +730,75 @@ One must delete namespaces in AWS Cloud Map themselves.
+Procedure: + +1. Configure the ECS cluster to use the desired AWS Cloud Map namespace. + +
+ Simplified process + + Create the cluster with the desired name for the AWS Cloud Map namespace, and specify that name for the namespace + when asked.
+ ECS will create a new HTTP namespace with the necessary configuration.
+ As a reminder, Service Connect doesn't use or create DNS hosted zones in Amazon Route 53. FIXME: check this + +
+ +1. Configure port names in the server services' task definitions for all the port mappings that the services will expose + in Service Connect. + +
+ + ```json + "containerDefinitions": [ + { + "portMappings": [{ + "name": "postgres", + "protocol": "tcp", + "containerPort": 5432 + }] + }, + … + ] + ``` + +
+ +1. Configure the server services to create Service Connect endpoints within the namespace. + +
+ + ```json + "serviceConnectConfiguration": { + "enabled": true, + "namespace": "ecs-dev-cluster", + "services": [ + { + "portName": "postgres", + "discoveryName": "postgres", + "clientAliases": [{ + "port": 5432, + "dnsName": "pgsql" + }] + } + ] + }, + ``` + +
+ +1. Deploy the services.
+ This will create the endpoints in the AWS Cloud Map namespace used by the cluster.
+ ECS also injects the Service Connect proxy container in each task. +1. Deploy the client applications as ECS services.
+ ECS connects them to the Service Connect endpoints through the Service Connect proxy in each task. +1. Applications only use the proxy to connect to Service Connect endpoints.
+ No additional configuration is required to use the proxy. +1. \[optionally] Monitor traffic through the Service Connect proxy in Amazon CloudWatch. + ### ECS service discovery Service discovery helps manage HTTP and DNS namespaces for ECS services. -ECS syncs the list of launched tasks to AWS Cloud Map.
+ECS automatically registers launched tasks in AWS Cloud Map, and de-registers them once they stop.
Cloud Map maintains DNS records that resolve to the internal IP addresses of one or more tasks from registered services.
Other services in the **same** VPC can use such DNS records to send traffic directly to containers using their internal @@ -756,12 +820,31 @@ configured. Service discovery supports only the `A` and `SRV` DNS record types.
DNS records are automatically added or removed as tasks start or stop for ECS services. +Until ECS registers the tasks, containers in them might complain about being unable to resolve the services they are +using. + DNS records have a TTL and it might happen that tasks died before this ended.
One **must** implement extra logic in one's applications, so that they can handle retries and deal with connection failures when the records are not yet updated. See also [Use service discovery to connect Amazon ECS services with DNS names]. +Procedure: + +1. Create the desired AWS Cloud Map namespace. +1. Create the desired Cloud Map service in the namespace. +1. Configure ECS services to use the Cloud Map service. + +
+ + ```json + "serviceRegistries": [{ + "registryArn": "arn:aws:servicediscovery:eu-west-1:012345678901:service/srv-uuf33b226vw93biy" + }], + ``` + +
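
The retry logic this section asks applications to implement could look roughly like the following. It is a minimal TypeScript sketch, not a mechanism provided by ECS or Cloud Map: `backoffDelays` and `withRetries` are hypothetical helper names, and the delay values used with them are arbitrary assumptions.

```typescript
// Hypothetical helpers, for illustration only.

/** Compute capped exponential backoff delays in milliseconds, one per retry. */
function backoffDelays(retries: number, baseMs: number, capMs: number): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < retries; attempt++) {
    delays.push(Math.min(capMs, baseMs * Math.pow(2, attempt)));
  }
  return delays;
}

/**
 * Run `operation` once, then retry it after each delay in `delaysMs` while it keeps failing.
 * Rethrows the last error when all attempts are exhausted.
 */
async function withRetries<T>(operation: () => Promise<T>, delaysMs: number[]): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= delaysMs.length; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < delaysMs.length) {
        const delay = delaysMs[attempt];
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```

A client would wrap whatever call depends on the DNS record, for example `withRetries(() => connectToService(), backoffDelays(5, 200, 5000))`, where `connectToService` stands for the application's own connection function. Keeping the first delays shorter than the record's TTL lets the client pick up updated records soon after they change.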
+ ### VPC Lattice Managed application networking service that customers can use to observe, secure, and monitor applications built across @@ -799,7 +882,7 @@ Solutions: This **will** cost money. - Target a lambda that returns a [308 Permanent Redirect] code with the current IP addresses of the requested tasks. -- Use dynamic service discovery mechanisms like AWS Cloud Map.
+- Use dynamic service discovery mechanisms like [AWS Cloud Map][What Is AWS Cloud Map?].
Refer [Metrics collection from Amazon ECS using Amazon Managed Service for Prometheus] and [aws-cloudmap-prometheus-sd]. @@ -838,6 +921,7 @@ Specify a supported value for the task CPU and memory in your task definition. - [EFS] - [Amazon ECS Exec Checker] - [ECS Execute-Command proposal] +- [What Is AWS Cloud Map?] ### Sources @@ -864,6 +948,7 @@ Specify a supported value for the task CPU and memory in your task definition. - [Scraping Prometheus metrics from applications running in AWS ECS] - [How can I allow the tasks in my Amazon ECS services to communicate with each other?] - [Interconnect Amazon ECS services] +- [Amazon ECS Service Discovery] [amazon ecs exec checker]: https://github.com/aws-containers/amazon-ecs-exec-checker +[Amazon ECS Service Discovery]: https://aws.amazon.com/blogs/aws/amazon-ecs-service-discovery/ [amazon ecs services]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs_services.html [amazon ecs standalone tasks]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/standalone-tasks.html [amazon ecs task definition differences for the fargate launch type]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/fargate-tasks-services.html @@ -914,6 +1000,7 @@ Specify a supported value for the task CPU and memory in your task definition. 
[Use service discovery to connect Amazon ECS services with DNS names]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/service-discovery.html [using amazon ecs exec to access your containers on aws fargate and amazon ec2]: https://aws.amazon.com/blogs/containers/new-using-amazon-ecs-exec-access-your-containers-fargate-ec2/ [What is Amazon VPC Lattice?]: https://docs.aws.amazon.com/vpc-lattice/latest/ug/what-is-vpc-lattice.html +[What Is AWS Cloud Map?]: https://docs.aws.amazon.com/cloud-map/latest/dg/what-is-cloud-map.html [`aws ecs execute-command` results in `TargetNotConnectedException` `The execute command failed due to an internal error`]: https://stackoverflow.com/questions/69261159/aws-ecs-execute-command-results-in-targetnotconnectedexception-the-execute diff --git a/knowledge base/mimir.md b/knowledge base/mimir.md index 6e39112..7487042 100644 --- a/knowledge base/mimir.md +++ b/knowledge base/mimir.md @@ -12,6 +12,7 @@ and set up alerting rules across multiple tenants to leverage tenant federation. 1. [Setup](#setup) 1. [Monolithic mode](#monolithic-mode) 1. [Microservices mode](#microservices-mode) + 1. [Run on AWS ECS Fargate](#run-on-aws-ecs-fargate) 1. [Storage](#storage) 1. [Object storage](#object-storage) 1. [Authentication and authorization](#authentication-and-authorization) @@ -31,8 +32,8 @@ Scrapers (like Prometheus or Grafana's Alloy) need to send metrics data to Mimir Mimir will **not** scrape metrics itself. Mimir listens by default on port `8080` for HTTP and on port `9095` for GRPC.
-It also internally advertises data or actions to members in the cluster using the [gossip protocol]. This uses port -`7946` by default and **must** be reachable by all members in the cluster to work. +It also internally advertises data or actions to members in the cluster using [hashicorp/memberlist], which implements a +[gossip protocol]. This uses port `7946` by default, and **must** be reachable by all members in the cluster to work. Mimir stores time series in TSDB blocks, that are uploaded to an object storage bucket.
Such blocks are the same that Prometheus and Thanos use, though each application stores blocks in different places and @@ -79,21 +80,28 @@ helm --namespace 'mimir-test' upgrade --install --create-namespace 'mimir' 'graf mimir -help mimir -help-all -# Validate configuration files +# Validate configuration files. mimir -modules -config.file 'path/to/config.yaml' -# See the current configuration of components +# Run tests once (smoke test), on ports that do not collide with the running instance. +# Refer . +mimir -target='continuous-test' \ + -tests.write-endpoint='http://localhost:8080' -tests.read-endpoint='http://localhost:8080' \ + -tests.smoke-test \ + -server.http-listen-port='18080' -server.grpc-listen-port='19095' + +# See the current configuration of components. GET /config GET /runtime_config -# See changes in the runtime configuration from the default one +# See changes in the runtime configuration from the default one. GET /runtime_config?mode=diff -# Check the service is ready -# A.K.A. readiness probe +# Check the service is ready. +# A.K.A. readiness probe. GET /ready -# Get metrics +# Get metrics. GET /metrics ``` @@ -305,6 +313,27 @@ Recommended using Kubernetes and the [`mimir-distributed` Helm chart][helm chart Each component scales up independently.
This allows for greater flexibility and more granular failure domains. +### Run on AWS ECS Fargate + +See also [AWS ECS] and [Mimir on AWS ECS Fargate]. + +Things to consider: + +- Go for [ECS service discovery] instead of [ECS Service Connect]. + +
+ + > This needs to be confirmed, but it is how it worked for me. + + Apparently, at the time of writing, Service Connect _prefers_ answering in IPv6 for ECS-related queries.
+ There seems to be no way to customize this for now. + + At the same time, [hashicorp/memberlist] seems to only use IPv4 unless explicitly required to listen on an IPv6 + address.
+ One would have no way to programmatically set such an address **before** creating the resources. + +
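
Since the IPv6 preference above is still to be confirmed, one way to check it is to look at which address families a name actually resolves to from inside a task. The following is a minimal TypeScript sketch under stated assumptions: `addressFamily` and `countFamilies` are hypothetical helpers, the address check is deliberately simplistic, and the list of addresses is assumed to come from a resolver call made elsewhere (for instance Node's `dns.promises.lookup(name, { all: true })`).

```typescript
// Hypothetical helpers, for illustration only.

type AddressFamily = 'ipv4' | 'ipv6' | 'unknown';

/** Classify a textual address. Simplistic on purpose: it does not fully validate addresses. */
function addressFamily(address: string): AddressFamily {
  if (/^(\d{1,3}\.){3}\d{1,3}$/.test(address)) {
    return 'ipv4';
  }
  if (address.indexOf(':') !== -1) {
    return 'ipv6';
  }
  return 'unknown';
}

/** Count how many of the given addresses belong to each family. */
function countFamilies(addresses: string[]): Record<AddressFamily, number> {
  const counts: Record<AddressFamily, number> = { ipv4: 0, ipv6: 0, unknown: 0 };
  for (const address of addresses) {
    counts[addressFamily(address)] += 1;
  }
  return counts;
}
```

If the answers for the memberlist join address only contain IPv6 entries while the Mimir tasks listen on IPv4, the gossip ring would not form, which would match the behaviour described above.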
+ ## Storage Mimir supports the `s3`, `gcs`, `azure`, `swift`, and `filesystem` backends.
@@ -488,6 +517,8 @@ ingester: - [Codebase] - [Prometheus] - [Grafana] +- [hashicorp/memberlist] +- [Gossip protocol] - [Ceiling Function] Alternatives: @@ -518,6 +549,8 @@ Alternatives: [aws ecs]: cloud%20computing/aws/ecs.md [aws efs]: cloud%20computing/aws/efs.md [cortex]: cortex.md +[ecs service connect]: cloud%20computing/aws/ecs.md#ecs-service-connect +[ecs service discovery]: cloud%20computing/aws/ecs.md#ecs-service-discovery [grafana]: grafana.md [prometheus]: prometheus/README.md [thanos]: thanos.md @@ -540,5 +573,7 @@ Alternatives: [website]: https://grafana.com/oss/mimir/ -[Gossip protocol]: https://en.wikipedia.org/wiki/Gossip_protocol [Ceiling Function]: https://www.geeksforgeeks.org/ceiling-function/ +[Gossip protocol]: https://en.wikipedia.org/wiki/Gossip_protocol +[hashicorp/memberlist]: https://github.com/hashicorp/memberlist +[Mimir on AWS ECS Fargate]: https://github.com/grafana/mimir/discussions/3807#discussioncomment-4602413 diff --git a/snippets/pulumi/aws/run mimir in monolithic mode on ecs fargate.ts b/snippets/pulumi/aws/run mimir in monolithic mode on ecs fargate.ts new file mode 100644 index 0000000..3d2135a --- /dev/null +++ b/snippets/pulumi/aws/run mimir in monolithic mode on ecs fargate.ts @@ -0,0 +1,509 @@ +import * as pulumi from '@pulumi/pulumi'; +import * as aws from '@pulumi/aws'; + +const dnsZone: pulumi.Output<aws.route53.GetZoneResult> = aws.route53.getZoneOutput({ name: 'example.com.' 
}); +const ecsCluster: pulumi.Output<aws.ecs.GetClusterResult> = aws.ecs.getClusterOutput({ clusterName: 'development' }); +const ecsTaskExecutionRole: pulumi.Output<aws.iam.GetRoleResult> = aws.iam.getRoleOutput({ name: 'DefaultEcsTaskExecutionRole' }); +const privateSubnets: pulumi.Output<aws.ec2.GetSubnetsResult> = aws.ec2.getSubnetsOutput({ + filters: [{ + name: 'tag:Name', + values: [ + 'private-a', + 'private-b', + 'private-c', + ], + }], +}); +const vpc: pulumi.Output<aws.ec2.GetVpcResult> = aws.ec2.getVpcOutput({ default: true }); + +// FIXME: check before use + +const securityGroup: aws.ec2.SecurityGroup = new aws.ec2.SecurityGroup( + 'mimir', + { + name: 'mimir', + description: 'Manage access to and from the Mimir ECS service', + tags: { + Environment: 'Development', + Application: 'Mimir', + Component: 'Networking', + Name: 'Mimir', + }, + + vpcId: vpc.id, + }, +); +new aws.vpc.SecurityGroupIngressRule( + 'mimir-internalTraffic', + { + securityGroupId: securityGroup.id, + description: 'Traffic within the Security Group', + tags: { + Name: 'Intra-SG traffic', + }, + + referencedSecurityGroupId: securityGroup.id, + ipProtocol: '-1', + }, + { + deleteBeforeReplace: true, + parent: securityGroup, + }, +); +new aws.vpc.SecurityGroupIngressRule( + 'mimir-VPC:IPv4-httpServer', + { + securityGroupId: securityGroup.id, + tags: { + Name: 'VPC IPv4 to HTTP server', + }, + + description: 'Access the Mimir HTTP server from resources in the VPC via IPv4', + cidrIpv4: vpc.cidrBlock, + ipProtocol: 'tcp', + fromPort: 8080, + toPort: 8080, + }, + { + deleteBeforeReplace: true, + parent: securityGroup, + }, +); +new aws.vpc.SecurityGroupIngressRule( + 'mimir-VPC:IPv6-httpServer', + { + securityGroupId: securityGroup.id, + tags: { + Name: 'VPC IPv6 to HTTP server', + }, + + description: 'Access the Mimir HTTP server from resources in the VPC via IPv6', + cidrIpv6: vpc.ipv6CidrBlock, + ipProtocol: 'tcp', + fromPort: 8080, + toPort: 8080, + }, + { + deleteBeforeReplace: true, + parent: securityGroup, + }, +); +new aws.vpc.SecurityGroupIngressRule( + 
'mimir-VPC:IPv4-gRPCServer', + { + securityGroupId: securityGroup.id, + tags: { + Name: 'VPC IPv4 to gRPC server', + }, + + description: 'Access the Mimir gRPC server from resources in the VPC via IPv4', + cidrIpv4: vpc.cidrBlock, + ipProtocol: 'tcp', + fromPort: 9095, + toPort: 9095, + }, + { + deleteBeforeReplace: true, + parent: securityGroup, + }, +); +new aws.vpc.SecurityGroupIngressRule( + 'mimir-VPC:IPv6-gRPCServer', + { + securityGroupId: securityGroup.id, + tags: { + Name: 'VPC IPv6 to gRPC server', + }, + + description: 'Access the Mimir gRPC server from resources in the VPC via IPv6', + cidrIpv6: vpc.ipv6CidrBlock, + ipProtocol: 'tcp', + fromPort: 9095, + toPort: 9095, + }, + { + deleteBeforeReplace: true, + parent: securityGroup, + }, +); +new aws.vpc.SecurityGroupEgressRule( + 'mimir-allowAllIPv4', + { + securityGroupId: securityGroup.id, + tags: { + Name: 'All IPv4', + }, + + description: 'Connect everywhere from Mimir on IPv4', + cidrIpv4: '0.0.0.0/0', + ipProtocol: '-1', + }, + { + deleteBeforeReplace: true, + parent: securityGroup, + }, +); +new aws.vpc.SecurityGroupEgressRule( + 'mimir-allowAllIPv6', + { + securityGroupId: securityGroup.id, + tags: { + Name: 'All IPv6', + }, + + description: 'Connect everywhere from Mimir on IPv6', + cidrIpv6: '::/0', + ipProtocol: '-1', + }, + { + deleteBeforeReplace: true, + parent: securityGroup, + }, +); + +const bucket: aws.s3.BucketV2 = new aws.s3.BucketV2( + 'mimir', + { + bucket: 'mimir', + tags: { + Environment: 'Development', + Application: 'Mimir', + Component: 'Storage', + }, + }, +); + +const ecsTaskRole: aws.iam.Role = new aws.iam.Role( + 'mimir-ecsTask', + { + name: 'Mimir-ECSTask', + description: 'Allow Mimir ECS tasks to access the resources they need', + tags: { + Environment: 'Development', + Application: 'Mimir', + Component: 'Server', + }, + + assumeRolePolicy: JSON.stringify({ + Version: '2012-10-17', + Statement: [{ + Effect: 'Allow', + Principal: { + Service: 
'ecs-tasks.amazonaws.com', + }, + Action: 'sts:AssumeRole', + }], + }), + }, +); +new aws.iam.RolePolicy( + 'mimir-ecsTask-allowRoleFunctions', + { + role: ecsTaskRole, + name: 'AllowRoleFunctions', + + policy: pulumi.jsonStringify({ + Version: '2012-10-17', + Statement: [{ + Sid: 'AllowUsingS3BucketsForData', + Effect: 'Allow', + Action: [ + 's3:ListBucket', + 's3:PutObject', + 's3:GetObject', + 's3:DeleteObject', + ], + Resource: [ + pulumi.interpolate `${bucket.arn}`, + pulumi.interpolate `${bucket.arn}/*`, + ], + }], + }), + }, +); + +const cloudMap_namespace = new aws.servicediscovery.PrivateDnsNamespace( + 'mimir', + { + name: 'mimir.dev.ecs.local', + description: 'Mimir Development', + tags: { + Environment: 'Development', + Application: 'Mimir', + Component: 'Networking', + }, + + vpc: vpc.id, + }, +); +const cloudMap_service = new aws.servicediscovery.Service( + 'mimir-memberlist', + { + name: 'memberlist', + description: 'Gossip ring for ingesters in Mimir', + tags: { + Environment: 'Development', + Application: 'Mimir', + Component: 'Networking', + }, + + namespaceId: cloudMap_namespace.id, + dnsConfig: { + namespaceId: cloudMap_namespace.id, + dnsRecords: [{ + type: 'A', + ttl: 10, + }], + routingPolicy: 'MULTIVALUE', + }, + }, +); + +const logGroup: aws.cloudwatch.LogGroup = new aws.cloudwatch.LogGroup( + 'mimir', + { + name: '/ecs/dev/mimir', + tags: { + Environment: 'Development', + Application: 'Mimir', + Component: 'Server', + }, + + retentionInDays: 7, + }, +); + +const taskDefinition: aws.ecs.TaskDefinition = new aws.ecs.TaskDefinition( + 'mimir', + { + family: 'mimir', + tags: { + Environment: 'Development', + Application: 'Mimir', + Component: 'Server', + }, + + networkMode: 'awsvpc', + requiresCompatibilities: [ 'FARGATE' ], + cpu: '512', // Fargate requirement. See . + memory: '1024', // Fargate requirement. See . 
+ executionRoleArn: ecsTaskExecutionRole.arn, // logging requirement + taskRoleArn: ecsTaskRole.arn, + containerDefinitions: pulumi.jsonStringify([ + { + name: 'mimir', + image: '012345678901.dkr.ecr.eu-west-1.amazonaws.com/cache/docker-hub/grafana/mimir:2.15.2', + essential: true, + command: [ + '-auth.multitenancy-enabled=false', + pulumi.interpolate `-memberlist.join=dns+${cloudMap_service.name}.${cloudMap_namespace.name}:7946`, + '-common.storage.backend=s3', + '-common.storage.s3.endpoint=s3.eu-west-1.amazonaws.com', // required + pulumi.interpolate `-common.storage.s3.bucket-name=${bucket.bucket}`, + '-alertmanager-storage.storage-prefix=alertmanager', + '-blocks-storage.storage-prefix=blocks', + '-ruler-storage.storage-prefix=ruler', + '-ingester.max-global-series-per-user=300000', + '-ingester.out-of-order-time-window=5m', + '-ingester.ring.replication-factor=1', // required when using less than 3 replicas + ], + // healthCheck: { + // // FIXME: the image uses `blobs` as base, which has no binaries but `mimir` + // // FIXME: mimir -target='continuous-test' -tests.write-endpoint='http://localhost:8080' -tests.read-endpoint='http://localhost:8080' -tests.smoke-test -server.http-listen-port='18080' -server.grpc-listen-port='19095' ?? 
+ // command: [ + // 'CMD-SHELL', + // 'wget -qO- localhost:8080/ready || exit 1', + // ], + // startPeriod: 60, // it takes a while + // retries: 10, + // }, + portMappings: [ + { + name: 'memberlist', + protocol: 'tcp', + appProtocol: 'http', + containerPort: 7946, + hostPort: 7946, + }, + { + name: 'api', + protocol: 'tcp', + appProtocol: 'http', + containerPort: 8080, + hostPort: 8080, + }, + { + name: 'grpc', + protocol: 'tcp', + appProtocol: 'http', + containerPort: 9095, + hostPort: 9095, + }, + ], + logConfiguration: { + logDriver: 'awslogs', + options: { + 'awslogs-group': logGroup.name, + 'awslogs-region': 'eu-west-1', + 'awslogs-stream-prefix': 'ecs/dev', + }, + }, + + // explicitly specified to avoid showing changes on every run + environment: [], + mountPoints: [], + systemControls: [], + volumesFrom: [], + }, + ]), + }, +); + +const alb_targetGroup_http = new aws.alb.TargetGroup( + 'mimir-http', + { + name: 'mimir-http', + tags: { + Environment: 'Development', + Application: 'Mimir', + Component: 'Networking', + }, + + vpcId: vpc.id, + targetType: 'ip', + ipAddressType: 'ipv4', + protocol: 'HTTP', + port: 8080, + healthCheck: { + path: '/ready', + }, + }, +); +const alb_targetGroup_grpc = new aws.alb.TargetGroup( + 'mimir-grpc', + { + name: 'mimir-grpc', + tags: { + Environment: 'Development', + Application: 'Mimir', + Component: 'Networking', + }, + + vpcId: vpc.id, + targetType: 'ip', + ipAddressType: 'ipv4', + protocol: 'HTTP', // FIXME + port: 9095, + // healthCheck: { + // // FIXME + // path: '/ready', + // }, + }, +); +const alb = new aws.alb.LoadBalancer( + 'mimir', + { + name: 'mimir', + tags: { + Environment: 'Development', + Application: 'Mimir', + Component: 'Networking', + }, + + internal: true, + ipAddressType: 'ipv4', + subnets: privateSubnets.ids, + securityGroups: [ securityGroup.id ], + accessLogs: { + bucket: bucket.bucket, + }, + }, +); +new aws.route53.Record( + 'mimir', + { + zoneId: dnsZone.id, + name: pulumi.interpolate 
`mimir.dev.${dnsZone.name}`, + type: 'A', + aliases: [{ + name: alb.dnsName, + zoneId: alb.zoneId, + evaluateTargetHealth: true, + }], + }, +); +new aws.alb.Listener( + 'mimir-http', + { + tags: { + Environment: 'Development', + Application: 'Mimir', + Component: 'Networking', + }, + + loadBalancerArn: alb.arn, + port: 8080, + protocol: 'HTTP', + defaultActions: [{ + order: 1, + targetGroupArn: alb_targetGroup_http.arn, + type: 'forward', + }], + }, +); +new aws.alb.Listener( + 'mimir-grpc', + { + tags: { + Environment: 'Development', + Application: 'Mimir', + Component: 'Networking', + }, + loadBalancerArn: alb.arn, + port: 9095, + protocol: 'HTTP', // FIXME? + defaultActions: [{ + order: 1, + targetGroupArn: alb_targetGroup_grpc.arn, + type: 'forward', + }], + }, +); + +new aws.ecs.Service( + 'mimir', + { + name: 'mimir', + tags: { + Environment: 'Development', + Application: 'Mimir', + Component: 'Server', + }, + + cluster: ecsCluster.arn, + taskDefinition: taskDefinition.arn, + desiredCount: 1, // requires mimir to start with the '-ingester.ring.replication-factor=1' option + launchType: 'FARGATE', + networkConfiguration: { + subnets: privateSubnets.ids, + securityGroups: [ securityGroup.id ], + }, + loadBalancers: [ + { + containerName: 'mimir', + containerPort: 8080, + targetGroupArn: alb_targetGroup_http.arn, + }, + { + containerName: 'mimir', + containerPort: 9095, + targetGroupArn: alb_targetGroup_grpc.arn, + }, + ], + // enableExecuteCommand: true, // FIXME: the image uses `blobs` as base, which has no binaries but `mimir` + serviceRegistries: { + registryArn: cloudMap_service.arn, + }, + }, + { deleteBeforeReplace: true }, +);