chore(opensearch): review storage sections

Michele Cereda
2024-06-15 01:20:10 +02:00
parent 80b0192c3f
commit 67bd2b40b5
3 changed files with 87 additions and 19 deletions


@@ -198,6 +198,8 @@ From [Using service-linked roles]:
> Service-linked roles appear in your AWS account and are owned by the service. An IAM administrator can view, but not
> edit the permissions for service-linked roles.
Check [aws.permissions.cloud] for a community-driven source of truth for AWS identity.
### IAM policies
IAM does not expose policies' `Sid` element in the IAM API, so it can't be used to retrieve statements.
@@ -269,6 +271,7 @@ Examples:
- [Working with DB instance read replicas]
- AWS' [CLI]
- [Configuring EC2 Disk alert using Amazon CloudWatch]
- [aws.permissions.cloud]
### Sources
@@ -350,6 +353,7 @@ Examples:
[automating dns-challenge based letsencrypt certificates with aws route 53]: https://johnrix.medium.com/automating-dns-challenge-based-letsencrypt-certificates-with-aws-route-53-8ba799dd207b
[aws config tutorial by stephane maarek]: https://www.youtube.com/watch?v=qHdFoYSrUvk
[aws icons]: https://aws-icons.com/
[aws.permissions.cloud]: https://aws.permissions.cloud/
[configuring ec2 disk alert using amazon cloudwatch]: https://medium.com/@chandinims001/configuring-ec2-disk-alert-using-amazon-cloudwatch-793807e40d72
[date & time policy conditions at aws - 1-minute iam lesson]: https://www.youtube.com/watch?v=4wpKP1HLEXg
[introduction to aws iam assumerole]: https://aws.plainenglish.io/introduction-to-aws-iam-assumerole-fbef3ce8e90b


@@ -17,20 +17,28 @@ Amazon offering for managed OpenSearch clusters.
## Storage
Clusters can be set up to use the [hot-warm architecture].
_Hot_ storage provides the fastest possible performance for indexing and searching **new** data.
_Data_ nodes use **hot** storage in the form of instance stores or EBS volumes attached to each node.
Indexes that are **not** actively written to (e.g., immutable data like logs), that are queried less frequently, or that
don't need the hot storage's performance can be moved to _warm_ storage.
Warm indexes are **read-only** unless returned to hot storage.<br/>
Aside from that, they behave like any other hot index.
_UltraWarm_ nodes use **warm** storage in the form of S3 and caching.
AWS' managed OpenSearch service also offers _cold_ storage.<br/>
It is meant for data accessed only occasionally or no longer in active use.<br/>
Cold indexes are normally detached from nodes and stored in S3, meaning one **can't** read from or write to cold
indexes by default.<br/>
Should one need to query them, one needs to selectively attach them to UltraWarm nodes.
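The requests below sketch how an index could be moved from UltraWarm to cold storage and later attached back to
UltraWarm nodes, going by the migration API AWS documents for its managed service.<br/>
The index names are hypothetical placeholders.

```plaintext
# Move a warm index to cold storage.
POST _ultrawarm/migration/prometheus-logs-20231205/_cold

# Attach cold indexes back to UltraWarm nodes so they can be queried again.
POST _cold/migration/_warm
{
  "indices": "prometheus-logs-20231205,prometheus-logs-20231206"
}
```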
Use [Index State Management][index state management in amazon opensearch service] to automate the migration of indexes
to lower storage tiers after they meet specific conditions.
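A minimal ISM policy sketch for Amazon OpenSearch Service, assuming indexes should move to UltraWarm after 30 days and
to cold storage after 90 days.<br/>
The policy name, ages, index pattern and `@timestamp` field are hypothetical; `warm_migration` and `cold_migration` are
the AWS-specific ISM actions. Domains still running Elasticsearch use the `_opendistro/_ism` path instead of
`_plugins/_ism`.

```plaintext
PUT _plugins/_ism/policies/hot-warm-cold
{
  "policy": {
    "description": "Move indexes to UltraWarm after 30 days, then to cold storage after 90 days",
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [],
        "transitions": [
          { "state_name": "warm", "conditions": { "min_index_age": "30d" } }
        ]
      },
      {
        "name": "warm",
        "actions": [
          { "warm_migration": {} }
        ],
        "transitions": [
          { "state_name": "cold", "conditions": { "min_index_age": "90d" } }
        ]
      },
      {
        "name": "cold",
        "actions": [
          { "cold_migration": { "timestamp_field": "@timestamp" } }
        ],
        "transitions": []
      }
    ]
  }
}

# Attach the policy to already existing indexes.
POST _plugins/_ism/add/prometheus-logs-*
{
  "policy_id": "hot-warm-cold"
}
```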
### UltraWarm storage
@@ -54,8 +62,7 @@ Considerations:
to the amount of storage each instance type can address and the maximum number of warm nodes supported by Domains.
- Amazon recommends a maximum shard size of 50 GiB.
- Upon enablement, UltraWarm might not be available to use for several hours even if the domain state is _Active_.
- The minimum number of UltraWarm instances allowed by AWS is 2.
> Before disabling UltraWarm, one **must** either delete **all** warm indexes or migrate them back to hot storage.<br/>
> After warm storage is empty, wait five minutes before attempting to disable UltraWarm.
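Emptying warm storage before disabling UltraWarm could look like the following, going by AWS' UltraWarm migration
API.<br/>
The index name is a hypothetical placeholder.

```plaintext
# Migrate a warm index back to hot storage.
POST _ultrawarm/migration/prometheus-logs-20231205/_hot

# Check the migration's progress.
GET _ultrawarm/migration/prometheus-logs-20231205/_status

# Alternatively, delete the warm index altogether.
DELETE /prometheus-logs-20231205
```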
@@ -69,6 +76,10 @@ Requirements:
- OpenSearch/Elasticsearch >= v7.9.
- [UltraWarm storage] enabled for the same domain.
Considerations:
- One **can't** read from or write to cold indexes.
## Operations
### Migrate indexes to UltraWarm storage
@@ -203,6 +214,10 @@ can manage.
- Choose appropriate [instance types and sizes][supported instance types in amazon opensearch service].<br/>
Pick them to tailor the service offering to one's needs.
> [OR1 instances][or1 storage for amazon opensearch service] **cannot** (currently?) be selected as master nodes.<br/>
> They must also be selected **at domain creation**.
- Consider using reserved instances for long-term savings.
- Enable index-level compression to save storage space and reduce I/O costs.
- Use Index State Management policies to move old data to lower storage tiers.
@@ -216,17 +231,18 @@ can manage.
- Optimize data ingestion.
- Optimize indexes' mappings and settings.
- Optimize the JVM heap size.
- Summarize and compress historical data using [index rollups].
- Leverage caching (e.g., the shard request cache and the query cache).
- Reduce the number of requests using throttling and rate limiting.
- Move to Single-AZ deployments.
- Filter out and compress source data before sending it to OpenSearch to reduce the storage footprint and data transfer
costs.
- Share a single OpenSearch cluster with multiple accounts to reduce the overall number of instances and resources.
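The index-level compression mentioned above can be set when creating an index.<br/>
A minimal sketch, assuming an OpenSearch version that supports the `zstd` codec (2.9 or later); the index name and
compression level are illustrative. `index.codec` is a static setting, so changing it later requires closing the index
first.

```plaintext
PUT /prometheus-logs-20231206
{
  "settings": {
    "index": {
      "codec": "zstd",
      "codec.compression_level": 3
    }
  }
}
```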
## Further readings
- [OpenSearch]
- [Hot-warm architecture]
- [Supported instance types in Amazon OpenSearch Service]
### Sources
@@ -243,6 +259,7 @@ can manage.
- [Dedicated master nodes in Amazon OpenSearch Service]
- [Best practices for configuring your Amazon OpenSearch Service domain]
- [Operational best practices for Amazon OpenSearch Service]
- [OR1 storage for Amazon OpenSearch Service]
<!--
Reference
@@ -255,6 +272,7 @@ can manage.
<!-- Knowledge base -->
[dedicated master nodes]: #dedicated-master-nodes
[hot-warm architecture]: ../../opensearch.md#hot-warm-architecture
[opensearch]: ../../opensearch.md
[s3]: s3.md
@@ -267,11 +285,13 @@ can manage.
[index state management in amazon opensearch service]: https://docs.aws.amazon.com/opensearch-service/latest/developerguide/ism.html
[lower your amazon opensearch service storage cost with gp3 amazon ebs volumes]: https://aws.amazon.com/blogs/big-data/lower-your-amazon-opensearch-service-storage-cost-with-gp3-amazon-ebs-volumes/
[operational best practices for amazon opensearch service]: https://docs.aws.amazon.com/opensearch-service/latest/developerguide/bp.html
[or1 storage for amazon opensearch service]: https://docs.aws.amazon.com/opensearch-service/latest/developerguide/or1.html
[supported instance types in amazon opensearch service]: https://docs.aws.amazon.com/opensearch-service/latest/developerguide/supported-instance-types.html
[ultrawarm storage for amazon opensearch service]: https://docs.aws.amazon.com/opensearch-service/latest/developerguide/ultrawarm.html
<!-- Others -->
[cost-saving strategies for aws opensearch(finops): optimize performance without breaking the bank]: https://ramchandra-vadranam.medium.com/cost-saving-strategies-for-aws-opensearch-finops-optimize-performance-without-breaking-the-bank-f87f0bb2ce37
[index rollups]: https://opensearch.org/docs/latest/im-plugin/index-rollups/index/
[opensearch cost optimization: 12 expert tips]: https://opster.com/guides/opensearch/opensearch-capacity-planning/how-to-reduce-opensearch-costs/
[reducing amazon opensearch service costs: our journey to over 60% savings]: https://medium.com/kreuzwerker-gmbh/how-we-accelerate-financial-and-operational-efficiency-with-amazon-opensearch-6b86b41d50a0
[right-size amazon opensearch instances to cut costs by 50% or more]: https://cloudfix.com/blog/right-size-amazon-opensearch-instances-cut-costs/


@@ -18,6 +18,7 @@ Use cases: application search, log analytics, data observability, data ingestion
1. [Tuning](#tuning)
1. [The split brain problem](#the-split-brain-problem)
1. [APIs](#apis)
1. [Hot-warm architecture](#hot-warm-architecture)
1. [Further readings](#further-readings)
1. [Sources](#sources)
@@ -152,6 +153,9 @@ Enormous documents should still be indexed individually.
When indexing documents, the document's `_id` must be 512 bytes or less in size.
_Static_ index settings can only be updated on **closed** indexes.<br/>
_Dynamic_ index settings can be updated at any time through the [APIs].
## Requirements
| Port number | Component |
@@ -193,13 +197,14 @@ Use docker compose.
## Tuning
- Disable swapping.<br/>
If kept enabled, it can dramatically **decrease** performance and stability.
- Avoid using network file systems for node storage in production.<br/>
Using those can cause performance issues due to network conditions (e.g., latency, limited throughput) or slow
read/write speeds.
- Use solid-state drives (SSDs) on the hosts for node storage where possible.
- Set the size of the Java heap.<br/>
The recommended size is **half** of the system's RAM.
- Set up a [hot-warm architecture].
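A minimal sketch of the swap and heap recommendations above for a self-managed node, assuming a Linux host with 8 GiB
of RAM (hence the 4 GiB heap).<br/>
File locations and values are illustrative.

```plaintext
# On the host: disable swapping until the next reboot.
sudo swapoff -a

# opensearch.yml: lock the process' memory so it cannot be swapped out.
bootstrap.memory_lock: true

# config/jvm.options: use the same value for the minimum and maximum heap.
-Xms4g
-Xmx4g

# Alternatively, pass the heap size through the environment (e.g., in docker compose).
OPENSEARCH_JAVA_OPTS=-Xms4g -Xmx4g
```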
## The split brain problem
@@ -207,7 +212,40 @@ TODO
## APIs
FIXME: expand
- Close indexes.<br/>
Disables read and write operations.
```plaintext
POST /prometheus-logs-20231205/_close
```
- (Re)Open closed indexes.<br/>
Enables read and write operations.
```plaintext
POST /prometheus-logs-20231205/_open
```
- Update indexes' settings.<br/>
_Static_ settings can only be updated on **closed** indexes.
```plaintext
PUT /prometheus-logs-20231205/_settings
{
"index": {
"codec": "zstd_no_dict",
"codec.compression_level": 3,
"refresh_interval": "2s"
}
}
```
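Dynamic settings, by contrast, can be changed on open indexes without closing them first.<br/>
A sketch using `number_of_replicas` and `refresh_interval`, both dynamic; the values are arbitrary.

```plaintext
PUT /prometheus-logs-20231205/_settings
{
  "index": {
    "number_of_replicas": 1,
    "refresh_interval": "30s"
  }
}

# Verify the applied settings.
GET /prometheus-logs-20231205/_settings
```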
## Hot-warm architecture
Refer to
[Set up a hot-warm architecture](https://opensearch.org/docs/latest/tuning-your-cluster/#advanced-step-7-set-up-a-hot-warm-architecture).
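In self-managed clusters, the hot-warm setup relies on custom node attributes and shard allocation filtering.<br/>
A minimal sketch, assuming a node attribute named `temp` (any name works as long as nodes and indexes agree) and a
hypothetical index.

```plaintext
# opensearch.yml on hot nodes (e.g., fast SSDs).
node.attr.temp: hot

# opensearch.yml on warm nodes (e.g., larger, slower disks).
node.attr.temp: warm

# Create new indexes on hot nodes.
PUT /prometheus-logs-20231206
{
  "settings": {
    "index.routing.allocation.require.temp": "hot"
  }
}

# Later, relocate the index's shards to warm nodes.
PUT /prometheus-logs-20231206/_settings
{
  "index.routing.allocation.require.temp": "warm"
}
```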
## Further readings
@@ -218,6 +256,7 @@ TODO
- [Okapi BM25]
- [`fsync`][fsync]
- [AWS' managed OpenSearch] offering
- [Setting up Hot-Warm architecture for ISM in OpenSearch]
### Sources
@@ -227,6 +266,7 @@ TODO
- [Avoiding the Elasticsearch split brain problem, and how to recover]
- [Index templates in OpenSearch - how to use composable templates]
- [Index management]
- [Index settings]
<!--
Reference
@@ -234,6 +274,8 @@ TODO
-->
<!-- In-article sections -->
[apis]: #apis
[hot-warm architecture]: #hot-warm-architecture
[refresh operations]: #refresh-operations
[translog]: #translog
@@ -246,6 +288,7 @@ TODO
[documentation]: https://opensearch.org/docs/latest/
[github]: https://github.com/opensearch-project
[index management]: https://opensearch.org/docs/latest/dashboards/im-dashboards/index-management/
[index settings]: https://opensearch.org/docs/latest/install-and-configure/configuring-opensearch/index-settings/
[website]: https://opensearch.org/
[what is opensearch?]: https://aws.amazon.com/what-is/opensearch/
@@ -256,3 +299,4 @@ TODO
[index templates in opensearch - how to use composable templates]: https://opster.com/guides/opensearch/opensearch-data-architecture/index-templating-in-opensearch-how-to-use-composable-templates/
[lucene]: https://lucene.apache.org/
[okapi bm25]: https://en.wikipedia.org/wiki/Okapi_BM25
[setting up hot-warm architecture for ism in opensearch]: https://opster.com/guides/opensearch/opensearch-data-architecture/setting-up-hot-warm-architecture-for-ism/