Amazon OpenSearch Service

Amazon's offering for managed OpenSearch clusters.

  1. Storage
    1. UltraWarm storage
    2. Cold storage
  2. Operations
    1. Migrate indexes to UltraWarm storage
    2. Return warm indexes to hot storage
    3. Migrate indexes to Cold storage
  3. Best practices
    1. Dedicated master nodes
  4. Cost-saving measures
  5. Further readings
    1. Sources

Storage

Clusters can be set up to use the hot-warm architecture.

Hot storage provides the fastest possible performance for indexing and searching new data.

Data nodes use hot storage in the form of instance stores or EBS volumes attached to each node.

Indexes that are not actively written to (e.g., immutable data like logs), that are queried less frequently, or that don't need the hot storage's performance can be moved to warm storage.

Warm indexes are read-only unless returned to hot storage.
Aside from that, they behave like any other hot index.

UltraWarm nodes use warm storage in the form of S3 and caching.

AWS's managed OpenSearch service also offers Cold storage.
It is meant for data accessed only occasionally or no longer in active use.
Cold indexes are normally detached from nodes and stored in S3, meaning one can't read from nor write to cold indexes by default.
Should one need to query them, they must first be selectively attached to UltraWarm nodes.

Use Index State Management to automate the migration of indexes to lower storage tiers once they meet specific conditions.
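
A minimal sketch of such a policy, assuming the ISM plugin endpoint exposed by the managed service and the warm_migration action available on domains with UltraWarm enabled; the policy name, index pattern, and age threshold are made up for illustration:

PUT _plugins/_ism/policies/move-to-warm
{
  "policy": {
    "description": "Migrate indexes to UltraWarm storage 7 days after creation",
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [],
        "transitions": [
          {
            "state_name": "warm",
            "conditions": { "min_index_age": "7d" }
          }
        ]
      },
      {
        "name": "warm",
        "actions": [ { "warm_migration": {} } ],
        "transitions": []
      }
    ],
    "ism_template": {
      "index_patterns": [ "logs-*" ]
    }
  }
}

The policy applies automatically to matching indexes created after it exists; pre-existing indexes must be attached to it explicitly.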

UltraWarm storage

Refer to UltraWarm storage for Amazon OpenSearch Service.

Requirements:

  • OpenSearch or Elasticsearch >= v6.8.
  • Dedicated master nodes.
  • No t2 or t3 instance types as data nodes.
  • When using a Multi-AZ with Standby architecture, the number of warm nodes must be a multiple of the number of Availability Zones in use.
  • Others.

Considerations:

  • When calculating UltraWarm storage requirements, consider only the size of the primary shards (see the example after this list).
    S3 removes the need for replicas and abstracts away any operating system or service considerations.
  • Dashboards and _cat/indices still report an UltraWarm index's size as the total of all its primary and replica shards.
  • There are limits to the amount of storage each instance type can address and to the maximum number of warm nodes a domain supports.
  • Amazon recommends a maximum shard size of 50 GiB.
  • Upon enablement, UltraWarm might not be available to use for several hours, even if the domain state is Active.
  • The minimum number of UltraWarm instances allowed by AWS is 2.
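
As an example of the sizing consideration above, an index's primary-only footprint can be read from the pri.store.size column of the _cat API; the index name is illustrative, and store.size includes the replicas:

GET _cat/indices/my-index?v&h=index,pri,rep,pri.store.size,store.size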

Before disabling UltraWarm, one must either delete all warm indexes or migrate them back to hot storage.
After warm storage is empty, wait five minutes before attempting to disable UltraWarm.

Cold storage

Refer to Cold storage for Amazon OpenSearch Service.

Requirements:

Considerations:

  • One can't read from nor write to cold indexes.

Operations

Migrate indexes to UltraWarm storage

Indexes' health must be green to perform migrations.
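
A quick way to verify this before queueing a migration is the cluster health API scoped to the index in question (the index name is illustrative); the status field in the response must be green:

GET _cluster/health/my-index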

Migrations are executed one index at a time, sequentially.
There can be up to 200 migrations in the queue.
Any request that exceeds the limit will be rejected.

Index migrations to UltraWarm storage require a force merge operation, which purges documents that were marked for deletion.
By default, UltraWarm merges indexes into one segment. One can set this value up to 1000.
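
The segment count is controlled per index; a minimal sketch, assuming the index.ultrawarm.migration.force_merge.max_num_segments setting name from the AWS UltraWarm documentation:

PUT my-index/_settings
{
  "index": {
    "ultrawarm": {
      "migration": {
        "force_merge": {
          "max_num_segments": 1
        }
      }
    }
  }
}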

Migrations might fail during snapshots, shard relocations, or force merges.
Failures during snapshots or shard relocation are typically due to node failures or S3 connectivity issues.
Lack of disk space is usually the underlying cause of force merge failures.

Start migration:

POST _ultrawarm/migration/my-index/_warm

Check the migration's status:

GET _ultrawarm/migration/my-index/_status

Example response:
{
  "migration_status": {
    "index": "my-index",
    "state": "RUNNING_SHARD_RELOCATION",
    "migration_type": "HOT_TO_WARM",
    "shard_level_status": {
      "running": 0,
      "total": 5,
      "pending": 3,
      "failed": 0,
      "succeeded": 2
    }
  }
}

If a migration is in the queue but has not yet started, it can be removed from the queue:

POST _ultrawarm/migration/_cancel/my-index

Return warm indexes to hot storage

Migrate them back to hot storage:

POST _ultrawarm/migration/my-index/_hot

There can be up to 10 queued migrations from warm to hot storage at a time.
Migration requests are processed one at a time, in the order they were queued.

Indexes return to hot storage with one replica.
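
Should more replicas be needed after the return, the count can be raised again with the standard settings API; the value below is just an example:

PUT my-index/_settings
{
  "index": {
    "number_of_replicas": 2
  }
}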

Migrate indexes to Cold storage

The process is the same as for UltraWarm storage; just change the endpoints accordingly:

POST _ultrawarm/migration/my-index/_cold
GET _ultrawarm/migration/my-index/_status
POST _ultrawarm/migration/_cancel/my-index

List cold indexes:

GET _cold/indices/_search

Migrate cold indexes back to warm storage, check the migration's status, or cancel it:

POST _cold/migration/_warm
GET _cold/migration/my-index/_status
POST _cold/migration/my-index/_cancel
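
Note that the cold-to-warm migration endpoint takes the target indexes in the request body rather than in the path; a sketch, assuming the body format from the AWS cold storage documentation:

POST _cold/migration/_warm
{
  "indices": "my-index"
}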

Best practices

Refer to Operational best practices for Amazon OpenSearch Service and Best practices for configuring your Amazon OpenSearch Service domain.

Dedicated master nodes

Refer to Dedicated master nodes in Amazon OpenSearch Service.

They increase cluster stability by performing cluster management tasks.
They hold no data and do not respond to data upload requests.

Only one of the dedicated master nodes is active, while the others wait as backup in case the active dedicated master node fails.

All data upload requests are served by the data nodes, while all cluster management tasks are offloaded to the active dedicated master node. Cluster management tasks are:

  • Tracking all nodes in the cluster.
  • Maintaining routing information for nodes in the cluster.
  • Tracking the number of indexes in the cluster.
  • Tracking the number of shards belonging to each index.
  • Updating the cluster state after state changes.
    E.g., creating an index, or adding or removing nodes from the cluster.
  • Replicating changes to the cluster state across all nodes in the cluster.
  • Monitoring the health of all cluster nodes by sending heartbeat signals.

Using Multi-AZ with Standby adds three dedicated master nodes to each OpenSearch Service domain it is enabled for.

Even when deploying in Single-AZ mode, three dedicated master nodes are recommended for stability.
In any case, never choose an even number of dedicated master nodes, to avoid split-brain problems.

If a cluster has an even number of master-eligible nodes, OpenSearch and Elasticsearch versions 7.x and later ignore one of them so that the voting configuration is always an odd number.
As such, an even number of dedicated master nodes is essentially equivalent to that number minus one: e.g., four dedicated master nodes provide no more resilience than three.

If a cluster doesn't have the necessary quorum to elect a new master node, write and read requests to the cluster will both fail.
This behavior differs from the OpenSearch default.

The required size of the master nodes is highly correlated with the size of the data instances and with the number of instances, indexes, and shards they need to manage.

Cost-saving measures

  • Choose appropriate instance types and sizes.
    Leverage the ability to select them to tailor the service offering to one's needs.

    OR1 instances cannot (currently?) be selected as master nodes.
    They must also be selected at domain creation.

  • Consider using reserved instances for long-term savings.

  • Enable index-level compression to save storage space and reduce I/O costs (see the sketch after this list).

  • Use Index State Management policies to move old data to lower storage tiers.

  • Consider using S3 as a data store for infrequently accessed or archived data.

  • Consider adjusting the frequency and retention period of snapshots.
    By default, AWS OpenSearch takes daily snapshots and retains them for 14 days.

  • If using gp2 EBS volumes, move to gp3.

  • Enable autoscaling (serverless only).

  • Optimize indexes' sharding and replication.

  • Optimize queries.

  • Optimize data ingestion.

  • Optimize indexes' mapping and settings.

  • Optimize the JVM heap size.

  • Summarize and compress historical data using index rollups.

  • Leverage the available caches.

  • Reduce the number of requests using throttling and rate limiting.

  • Move to Single-AZ deployments.

  • Filter out and compress source data before sending it to OpenSearch to reduce the storage footprint and data transfer costs.

  • Share a single OpenSearch cluster with multiple accounts to reduce the overall number of instances and resources.
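
A sketch of the index-level compression point above: the codec can only be set at index creation time (or on a closed index), and best_compression trades some CPU for a smaller on-disk footprint. The index name and shard counts are illustrative:

PUT logs-2024.06
{
  "settings": {
    "index.codec": "best_compression",
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}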

Further readings

Sources