OpenSearch

Search and analytics suite forked from Elasticsearch by Amazon.
Makes it easy to ingest, search, visualize, and analyze data.

Use cases: application search, log analytics, data observability, data ingestion, others.

  1. TL;DR
  2. Node types
  3. Indexes
  4. Setup
    1. The split brain problem
    2. Tuning
    3. Hot-warm architecture
  5. Index templates
    1. Composable index templates
  6. Ingest data
    1. Bulk indexing
  7. Re-index data
  8. Data streams
  9. Index patterns
  10. APIs
  11. Further readings
    1. Sources

TL;DR

Documents are the unit storing information, consisting of text or structured data.
Documents are stored in the JSON format, and returned when related information is searched for.
Documents are immutable. However, they can be updated by retrieving them, updating the information in them, and re-indexing them using the same document IDs.
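The retrieve-update-reindex cycle can be sketched with a toy in-memory store (all names here are hypothetical, not the OpenSearch client API):

```python
# Toy model of updating an immutable document: fetch a copy,
# change it, and re-index the new version under the same ID.
index = {}  # document ID -> stored document snapshot

def index_doc(doc_id, doc):
    """Re-indexing under an existing ID replaces the old version."""
    index[doc_id] = dict(doc)  # store a snapshot, never mutate in place

def update_doc(doc_id, changes):
    doc = dict(index[doc_id])  # 1. retrieve a copy of the document
    doc.update(changes)        # 2. update the information in it
    index_doc(doc_id, doc)     # 3. re-index it using the same ID

index_doc("1", {"name": "John Doe", "gpa": 3.89})
update_doc("1", {"gpa": 3.91})
print(index["1"]["gpa"])  # 3.91
```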

Indexes are collections of documents.
Their contents are queried when information is searched for.

Nodes are servers that store data and process search requests.
OpenSearch is designed to run on one or more nodes.

Multiple nodes can be aggregated into clusters.
Clusters allow nodes to specialize in different responsibilities depending on their types.

Each cluster elects a cluster manager node.
Manager nodes orchestrate cluster-level operations (e.g., creating indexes).

Nodes in clusters communicate with each other.
When a request is routed to any node, that node sends requests to the others, gathers their responses, and returns the final response.

Indexes are split into shards, each of them storing a subset of all documents in an index.
Shards are evenly distributed across nodes in a cluster.
Each shard is effectively a full Lucene index. Since each instance of Lucene is a running process consuming CPU and memory, having more shards is not necessarily better.

Shards may be either primary (the original ones) or replicas (copies of the originals).
By default, one replica shard is created for each primary shard.

OpenSearch distributes replica shards to different nodes than the ones hosting their corresponding primary shards, so that replica shards can act as backups in the event of node failures.
Replicas also improve the speed at which the cluster processes search requests, encouraging the use of more than one replica per index for search-heavy workloads.

Indexes use a data structure called an inverted index. It maps words to the documents in which they occur.
When searching, OpenSearch matches the words in the query to the words in the documents. Each document is assigned a relevance score indicating how well the document matched the query.
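A minimal sketch of the inverted index idea, mapping each word to the set of documents it occurs in:

```python
# Minimal inverted index: word -> set of document IDs containing it.
from collections import defaultdict

docs = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "the quick dog",
}

inverted = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.lower().split():
        inverted[word].add(doc_id)

# A query term maps straight to the documents that contain it.
print(sorted(inverted["quick"]))  # [1, 3]
print(sorted(inverted["dog"]))    # [2, 3]
```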

Individual words in a search query are called search terms.
Each term is scored according to the following rules:

  • Search terms that occur more frequently in a document will tend to be scored higher.
    This is the term frequency component of the score.
  • Search terms that occur in more documents will tend to be scored lower.
    This is the inverse document frequency component of the score.
  • Matches on longer documents should tend to be scored lower than matches on shorter documents.
    This corresponds to the length normalization component of the score.

OpenSearch uses the Okapi BM25 ranking algorithm to calculate document relevance scores, then returns the results sorted by relevance.
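The three scoring rules above all appear as terms in the BM25 formula. The following is a simplified sketch of the Lucene-style BM25 computation for a single term, not the exact implementation OpenSearch ships:

```python
import math

def bm25(tf, df, n_docs, doc_len, avg_len, k1=1.2, b=0.75):
    """Okapi BM25 score of one term in one document (simplified).

    tf: term frequency in the document       -> higher tf, higher score
    df: number of documents containing term  -> higher df, lower score (IDF)
    doc_len/avg_len: length normalization    -> longer docs score lower
    """
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    norm = k1 * (1 - b + b * doc_len / avg_len)
    return idf * tf * (k1 + 1) / (tf + norm)

# Rarer terms score higher than common ones, all else being equal.
rare = bm25(tf=2, df=1, n_docs=100, doc_len=50, avg_len=50)
common = bm25(tf=2, df=90, n_docs=100, doc_len=50, avg_len=50)
assert rare > common

# Longer documents score lower for the same term frequency.
assert bm25(2, 10, 100, doc_len=200, avg_len=50) < bm25(2, 10, 100, doc_len=50, avg_len=50)
```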

Update operations consist of the following steps:

  1. An update is received by a primary shard.
  2. The update is written to the shard's transaction log (translog).
  3. The translog is flushed to disk and followed by an fsync before the update is acknowledged to guarantee durability.
  4. The update is passed to the Lucene index writer, which adds it to an in-memory buffer.
  5. On a refresh operation, the Lucene index writer flushes the in-memory buffers to disk.
    Each buffer becomes a new Lucene segment.
  6. A new index reader is opened over the resulting segment files.
    The updates are now visible for search.
  7. On a flush operation, the shard fsyncs the Lucene segments.
    Because the segment files are a durable representation of the updates, the translog is no longer needed to provide durability. The updates can be purged from the translog.

Transaction logs (translogs) make updates durable.
Indexing or bulk calls respond once the documents have been written to the translog and the translog is flushed to disk. Updates will not be visible to search requests until a refresh operation takes place.

Refresh operations are performed periodically to write the documents from the in-memory Lucene index to files.
These files are not guaranteed to be durable, because no flush operation is yet performed at this point.

A refresh operation makes documents available for search.

Flush operations persist files to disk using fsync, ensuring durability.
Flushing ensures that the data stored only in the translog is recorded in the Lucene index.

Flushes are performed as needed to ensure that the translog does not grow too large.
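The translog / refresh / flush life cycle above can be modeled with a toy shard (a conceptual sketch, not how the real implementation is structured):

```python
# Toy model of the translog / refresh / flush life cycle.
class Shard:
    def __init__(self):
        self.translog = []   # durable once fsynced; replayed on crash recovery
        self.buffer = []     # in-memory Lucene buffer, not yet searchable
        self.segments = []   # segments on disk, searchable after a refresh

    def index(self, doc):
        self.translog.append(doc)  # acknowledged after the translog fsync
        self.buffer.append(doc)

    def refresh(self):
        # The buffer becomes a new segment: docs are now visible to search,
        # but the segment files are not yet guaranteed to be durable.
        if self.buffer:
            self.segments.append(list(self.buffer))
            self.buffer = []

    def flush(self):
        # Segments are fsynced to disk; the translog can be purged.
        self.refresh()
        self.translog = []

    def search(self):
        return [d for seg in self.segments for d in seg]

s = Shard()
s.index({"id": 1})
assert s.search() == []      # not visible before a refresh
s.refresh()
assert len(s.search()) == 1  # visible after the refresh
s.flush()
assert s.translog == []      # purged once segments are durable
```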

Shards are Lucene indexes, which consist of segments (or segment files).
Segments store the indexed data and are immutable.

Merge operations merge smaller segments into larger ones periodically.
This reduces the overall number of segments on each shard, frees up disk space, and improves search performance.

Eventually, segments reach a maximum allowed size and are no longer merged into larger segments.
Merge policies specify the segments' maximum size and how often merge operations are performed.

Interaction with the cluster is done via REST APIs.

If indexes do not already exist, OpenSearch automatically creates them while ingesting data.

Typical setup order of operations
  1. [optional] Create index templates.
  2. [optional] Create data streams.
  3. [optional] Create indexes.
  4. Ingest data.
  5. Create index patterns for the search dashboard to use.

Node types

Node type Description Best practices for production
Cluster manager Manages the overall operation of a cluster and keeps track of the cluster state.
This includes creating and deleting indexes, keeping track of the nodes that join and leave the cluster, checking the health of each node in the cluster (by running ping requests), and allocating shards to nodes.
Three dedicated cluster manager nodes in three different availability zones ensure the cluster never loses quorum.
Two of them will be idle most of the time, except when one node goes down or needs maintenance.
Cluster manager eligible Elects one node among them as the cluster manager node through a voting process. Make sure to have dedicated cluster manager nodes by marking all other node types as not cluster manager eligible.
Data Stores and searches data.
Performs all data-related operations (indexing, searching, aggregating) on local shards.
These are the worker nodes and need more disk space than any other node type.
Keep them balanced between zones.
Storage and RAM-heavy nodes are recommended.
Ingest Pre-processes data before storing it in the cluster.
Runs an ingest pipeline that transforms data before adding it to an index.
Use dedicated ingest nodes if you plan to ingest a lot of data and run complex ingest pipelines.
Optionally offload your indexing from the data nodes so that they are used exclusively for searching and aggregating.
Coordinating Delegates client requests to the shards on the data nodes, collects and aggregates the results into one final result, and sends this result back to the client. Prevent bottlenecks for search-heavy workloads using a couple of dedicated coordinating-only nodes.
Use CPUs with as many cores as you can.
Dynamic Delegates specific nodes for custom work (e.g.: machine learning tasks), preventing the consumption of resources from data nodes and therefore not affecting functionality.
Search Provides access to searchable snapshots.
Incorporates techniques like caching frequently used segments and removing the least used segments in order to access the searchable snapshot index (stored in a remote long-term storage source, for example, Amazon S3 or Google Cloud Storage).
Use nodes with more compute (CPU and memory) than storage capacity (hard disk).

Each node is by default a cluster-manager-eligible, data, ingest, and coordinating node.

The number of nodes, the assignment of node types, and the choice of hardware for each node type should depend on one's own use case.
One should take into account factors like the amount of time to hold on to data, the average size of documents, the typical workload (indexing, searches, aggregations), the expected price-performance ratio, risk tolerance, and so on.

After assessing all requirements, it is suggested to use benchmark testing tools like OpenSearch Benchmark.
Provision a small sample cluster and run tests with varying workloads and configurations. Compare and analyze the system and query metrics for these tests to improve upon the architecture.

Indexes

Refer to Managing indexes.

Indexes are collections of documents that one wants to make searchable.
They organize the data for fast retrieval.

To maximise one's ability to search and analyse documents, one can define how documents and their fields are stored and indexed.

Before one can search through one's data, documents must be ingested and indexed.
Data is ingested and indexed using the APIs.
There are two indexing APIs:

  • The Index API, which adds documents individually as they arrive.
    It is intended for situations in which new data arrives incrementally (e.g., logs, or customer orders from a small business).

  • The _bulk API, which takes multiple requests lumped together in one file.
    It offers superior performance for situations where the flow of data is less frequent and/or can be aggregated into a generated file.

    Enormous documents are still better indexed individually.

Within indexes, OpenSearch identifies each document using a unique document ID.
The document's _id can be at most 512 bytes in size.
Should one not provide an ID for the document during ingestion, OpenSearch generates a document ID itself.

Upon receiving indexing requests, OpenSearch:

  1. Creates an index if it does not exist already.
  2. Stores the ingested document in that index.

Indexes must follow these naming restrictions:

  • All letters must be lowercase.
  • Index names cannot begin with underscores (_) or hyphens (-).
  • Index names cannot contain spaces, commas, or the following characters: :, ", *, +, /, \, |, ?, #, >, or <.
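The restrictions above can be expressed as a simple validator. This is a simplified sketch covering only the rules listed here; the real implementation enforces additional limits (such as a maximum name length):

```python
# Characters an index name may not contain, per the restrictions above.
FORBIDDEN = set(' ,:"*+/\\|?#><')

def valid_index_name(name):
    """Check the naming restrictions listed above (simplified sketch)."""
    if not name or name != name.lower():
        return False                    # all letters must be lowercase
    if name[0] in "_-":
        return False                    # no leading underscore or hyphen
    return not (set(name) & FORBIDDEN)  # no forbidden characters

assert valid_index_name("logs-2023")
assert not valid_index_name("Logs")       # uppercase letter
assert not valid_index_name("_internal")  # leading underscore
assert not valid_index_name("a,b")        # forbidden comma
```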

Indexes are configured with mappings and settings:

  • Mappings are collections of fields and the types of those fields.
  • Settings include index data (e.g., the index name, creation date, and number of shards).

One can specify both mappings and settings in a single request.

Unless an index's type mapping is explicitly defined, OpenSearch infers the field types from the JSON types submitted in the documents for the index (dynamic mapping).

Fields mapped to text are analyzed (lowercased and split into terms) and can be used for full-text search.
Fields mapped to keyword are used for exact term search.

Numbers are usually dynamically mapped to long.
Should one want to map them to the date type instead, one will need to delete the index, then recreate it by explicitly specifying the mappings.

Static index settings can only be updated on closed indexes.
Dynamic index settings can be updated at any time through the APIs.

Setup

Requirements
Port number Component
443 OpenSearch Dashboards in AWS OpenSearch Service with encryption in transit (TLS)
5601 OpenSearch Dashboards
9200 OpenSearch REST API
9300 Node communication and transport (internal), cross cluster search
9600 Performance Analyzer

For Linux hosts:

  • vm.max_map_count must be set to 262144 or higher.
Quickstart

Just use the provided Docker Compose file.

  1. Disable memory paging and swapping to improve performance.

    Linux
    sudo swapoff -a
    
  2. [on Linux hosts] Increase the number of memory maps available to the service.

    Linux
    # temporarily
    echo '262144' | sudo tee '/proc/sys/vm/max_map_count'
    
    # permanently
    echo 'vm.max_map_count=262144' | sudo tee -a '/etc/sysctl.conf'
    sudo sysctl -p
    
  3. Get the sample compose file:

    # pick one
    curl -O 'https://raw.githubusercontent.com/opensearch-project/documentation-website/2.19/assets/examples/docker-compose.yml'
    wget 'https://raw.githubusercontent.com/opensearch-project/documentation-website/2.19/assets/examples/docker-compose.yml'
    
  4. Adjust the compose file as you see fit, then run it.

    The setup requires a strong password to be configured, or it will quit midway:
    at least 8 characters, including 1 uppercase letter, 1 lowercase letter, 1 digit, and 1 special character.

    OPENSEARCH_INITIAL_ADMIN_PASSWORD='someCustomStr0ng!Password' docker compose up -d
    
    Confirm that the containers are running
    $ docker compose ps
    NAME                    COMMAND                  SERVICE                 STATUS    PORTS
    opensearch-dashboards   "./opensearch-dashbo…"   opensearch-dashboards   running   0.0.0.0:5601->5601/tcp
    opensearch-node1        "./opensearch-docker…"   opensearch-node1        running   0.0.0.0:9200->9200/tcp, 9300/tcp, 0.0.0.0:9600->9600/tcp, 9650/tcp
    opensearch-node2        "./opensearch-docker…"   opensearch-node2        running   9200/tcp, 9300/tcp, 9600/tcp, 9650/tcp
    
  5. Query the APIs to verify that the service is running.
    Disable hostname checking, since the default security configuration uses demo certificates.

    curl 'https://localhost:9200' -ku 'admin:someCustomStr0ng!Password'
    
    {
        "name" : "opensearch-node1",
        "cluster_name" : "opensearch-cluster",
        "cluster_uuid" : "Cp2VzzkjR4eqlzK5H5ERJw",
        "version" : {
            "distribution" : "opensearch",
            "number" : "2.19.1",
            "build_type" : "tar",
            "build_hash" : "2e4741fb45d1b150aaeeadf66d41445b23ff5982",
            "build_date" : "2025-02-27T01:22:24.665339607Z",
            "build_snapshot" : false,
            "lucene_version" : "9.12.1",
            "minimum_wire_compatibility_version" : "7.10.0",
            "minimum_index_compatibility_version" : "7.0.0"
        },
        "tagline" : "The OpenSearch Project: https://opensearch.org/"
    }
    
  6. Explore OpenSearch Dashboards by opening http://localhost:5601/ in a web browser from the same host that is running the OpenSearch cluster.
    The default username is admin, and the password is the one configured in the steps above.

  7. Continue interacting with the cluster from the Dashboards, or using the APIs or clients.

Tutorial requests
PUT /students/_doc/1
{
    "name": "John Doe",
    "gpa": 3.89,
    "grad_year": 2022
}

GET /students/_mapping

GET /students/_search
{
    "query": {
        "match_all": {}
    }
}

PUT /students/_doc/1
{
    "name": "John Doe",
    "gpa": 3.91,
    "grad_year": 2022,
    "address": "123 Main St."
}

POST /students/_update/1/
{
    "doc": {
        "gpa": 3.92,
        "address": "123 Main St."
    }
}

DELETE /students/_doc/1

DELETE /students

PUT /students
{
    "settings": {
        "index.number_of_shards": 1
    },
    "mappings": {
        "properties": {
            "name": {
                "type": "text"
            },
            "grad_year": {
                "type": "date"
            }
        }
    }
}

PUT /students/_doc/1
{
    "name": "John Doe",
    "gpa": 3.89,
    "grad_year": 2022
}

GET /students/_mapping

POST _bulk
{ "create": { "_index": "students", "_id": "2" } }
{ "name": "Jonathan Powers", "gpa": 3.85, "grad_year": 2025 }
{ "create": { "_index": "students", "_id": "3" } }
{ "name": "Jane Doe", "gpa": 3.52, "grad_year": 2024 }

The split brain problem

TODO

Refer to Elasticsearch Split Brain and Avoiding the Elasticsearch split brain problem, and how to recover.

Tuning

  • Disable swapping.
    If kept enabled, it can dramatically decrease performance and stability.
  • Avoid using network file systems for node storage in production workflows.
    Using them can cause performance issues due to network conditions (e.g., latency, limited throughput) or read/write speeds.
  • Use solid-state drives (SSDs) on the hosts for node storage where possible.
  • Properly set the size of the Java heap.
    Recommended to use half of the host's RAM.
  • Set up a hot-warm architecture.

Hot-warm architecture

Refer to Set up a hot-warm architecture.

Index templates

Refer to Index templates.

Index templates allow one to initialize new indexes with predefined mappings and settings.

Composable index templates

Composable index templates can be used to overcome challenges from index template management like index template duplication and changes across all index templates.

Composable index templates abstract common settings, mappings, and aliases into reusable building blocks.
Those are called component templates.

One can combine component templates to compose index templates.

Settings and mappings specified directly in the create index request will override any settings or mappings specified in an index template and its component templates.

Ingest data

One can ingest data by:

  • Ingesting individual documents.
  • Indexing multiple documents in bulk.
  • Using Data Prepper, a server-side data collector.
  • Using other ingestion tools.

Bulk indexing

Leverage the _bulk API endpoint to index documents in bulk.

POST _bulk
{ "create": { "_index": "students", "_id": "2" } }
{ "name": "Jonathan Powers", "gpa": 3.85, "grad_year": 2025 }
{ "create": { "_index": "students", "_id": "3" } }
{ "name": "Jane Doe", "gpa": 3.52, "grad_year": 2024 }
{
    "took": 7,
    "errors": false,
    "items": [
        {
            "create": {
                "_index": "students",
                "_id": "2",
                "_version": 1,
                "result": "created",
                "_shards": {
                    "total": 2,
                    "successful": 2,
                    "failed": 0
                },
                "_seq_no": 1,
                "_primary_term": 1,
                "status": 201
            }
        },
        {
            "create": {
                "_index": "students",
                "_id": "3",
                "_version": 1,
                "result": "created",
                "_shards": {
                    "total": 2,
                    "successful": 2,
                    "failed": 0
                },
                "_seq_no": 2,
                "_primary_term": 1,
                "status": 201
            }
        }
    ]
}

If any one of the actions in the _bulk request fails, OpenSearch continues to execute the other actions.

Examine the items array in the response to figure out what went wrong.
The entries in the items array are in the same order as the actions specified in the request.
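Building the newline-delimited bulk body and matching failures back to actions can be sketched as follows (the sample response below is illustrative, not captured from a real cluster):

```python
import json

# Build the newline-delimited _bulk body: an action line followed by
# the document line, one pair per document, terminated by a newline.
students = {
    "2": {"name": "Jonathan Powers", "gpa": 3.85, "grad_year": 2025},
    "3": {"name": "Jane Doe", "gpa": 3.52, "grad_year": 2024},
}
lines = []
for doc_id, doc in students.items():
    lines.append(json.dumps({"create": {"_index": "students", "_id": doc_id}}))
    lines.append(json.dumps(doc))
body = "\n".join(lines) + "\n"

# The items array mirrors the request order, so failed actions can be
# matched back to the requests that caused them by position.
response = {"errors": True, "items": [
    {"create": {"_id": "2", "status": 201}},
    {"create": {"_id": "3", "status": 409,
                "error": {"type": "version_conflict_engine_exception"}}},
]}
failed = [item["create"]["_id"]
          for item in response["items"]
          if item["create"]["status"] >= 300]
print(failed)  # ['3']
```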

Re-index data

Refer to Reindex data.

The _reindex operation copies documents that one selects through a query from one index to another.

When needing to make an extensive change (e.g., adding a new field to every document, moving documents between indexes, or combining multiple indexes into a new one), one can use the _reindex operation instead of deleting the old indexes, making the change offline, and then indexing the data again.

Re-indexing can be an expensive operation depending on the size of the source index.
It is recommended to disable replicas in the destination index by setting its number_of_replicas to 0, and re-enable them once the re-indexing process is complete.

_reindex is a POST operation.
In its most basic form, it requires specifying a source index and a destination index.

Should the destination index not exist, _reindex creates a new index with default configurations.
If the destination index requires field mappings or custom settings, (re)create the destination index beforehand with the desired ones.

Reindex all documents

Copy all documents from one index to another.

POST _reindex
{
    "source": {
        "index": "sourceIndex"
    },
    "dest": {
        "index": "destinationIndex"
    }
}
{
    "took": 1350,
    "timed_out": false,
    "total": 30,
    "updated": 0,
    "created": 30,
    "deleted": 0,
    "batches": 1,
    "version_conflicts": 0,
    "noops": 0,
    "retries": {
        "bulk": 0,
        "search": 0
    },
    "throttled_millis": 0,
    "requests_per_second": -1,
    "throttled_until_millis": 0,
    "failures": []
}
Reindex only unique documents

Copy only documents missing from a destination index by setting the op_type option to create.

If a document with the same ID already exists, the operation ignores the one from the source index.
To ignore all version conflicts of documents, set the conflicts option to proceed.

For some reason, it seems to work better if the conflicts option is at the start of the request's data.

POST _reindex
{
    "conflicts": "proceed",
    "source": {
        "index": "sourceIndex"
    },
    "dest": {
        "index": "destinationIndex",
        "op_type": "create"
    }
}
Combine indexes

Combine all documents from one or more indexes into another by adding the source indexes as a list.

The number of shards for your source and destination indexes must be the same.

POST _reindex
{
    "source": {
        "index": [
            "sourceIndex_1",
            "sourceIndex_2"
        ]
    },
    "dest": {
        "index": "destinationIndex"
    }
}

Data streams

Data streams are managed indices that are highly optimised for time-series and append-only data (typically, logs and observability data in general).

They work like any other index, but OpenSearch simplifies some management operations (e.g., rollovers) and stores them in a more efficient way.

They are internally composed of multiple backing indexes.
Search requests are routed to all backing indexes, while indexing requests are routed only to the latest write index.

ISM policies allow one to automatically handle index rollover or deletion.
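The routing behavior of data streams can be modeled with a toy class (a conceptual sketch of the write-index/backing-index split, not the real implementation):

```python
# Toy model of data stream routing: writes go only to the latest
# backing index, searches fan out to all backing indexes.
class DataStream:
    def __init__(self, name):
        self.name = name
        self.backing = [[]]  # backing indexes; the last one is the write index

    def rollover(self):
        self.backing.append([])  # new write index; old ones stop receiving docs

    def index(self, doc):
        self.backing[-1].append(doc)  # only the write index receives documents

    def search(self):
        # Search requests are routed to all backing indexes.
        return [d for idx in self.backing for d in idx]

ds = DataStream("logs-nginx")
ds.index({"@timestamp": "t1"})
ds.rollover()
ds.index({"@timestamp": "t2"})
assert len(ds.backing[0]) == 1 and len(ds.backing[-1]) == 1
assert len(ds.search()) == 2
```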

Create data streams
  1. Create an index template containing index_patterns: [] and data_stream: {}.
    This template will configure all indexes matching the defined patterns as data streams.

    Specifying the data_stream object causes the template to create data streams, and not just regular indexes.

    PUT _index_template/logs-template
    {
        "data_stream": {},
        "index_patterns": [
            "logs-*"
        ]
    }
    
    {
        "acknowledged": true
    }
    

    From here on, all indexes created with a name starting with logs- will be data streams instead.

    By default, documents need to include a @timestamp field.
    One can define one's own custom timestamp field as a property of the data_stream object to customize this.

    -"data_stream": {},
    +"data_stream": {
    +    "timestamp_field": {
    +        "name": "request_time"
    +    }
    +},
    

    One can also add index mappings and other settings just as for regular index templates.

  2. [optional] Explicitly create the data stream.
    Since indexes are created with the first document they ingest if they do not already exist, the data stream can be created just by starting to ingest documents for the indexes matching its patterns.

    PUT _data_stream/logs-example
    
    {
        "acknowledged": true
    }
    
  3. Start indexing documents.
    If it does not already exist, the data stream is created together with the index when the first document is ingested.

    POST logs-example/_doc
    {
        "message": "login attempt failed",
        "@timestamp": "2013-03-01T00:00:00"
    }
    
    {
        "_index": ".ds-logs-example-000001",
        "_id": "T_Zq9ZUBf2S2KQCqEc-d",
        "_version": 1,
        "result": "created",
        "_shards": {
            "total": 2,
            "successful": 1,
            "failed": 0
        },
        "_seq_no": 0,
        "_primary_term": 1
    }
    
Create templates for data streams
PUT _index_template/logs-template
{
    "data_stream": {},
    "index_patterns": [
        "logs-*"
    ]
}
{
    "acknowledged": true
}
Explicitly create data streams
PUT _data_stream/logs-nginx
{
    "acknowledged": true
}
Get information about data streams
GET _data_stream/logs-nginx
{
    "data_streams": [
        {
            "name": "logs-nginx",
            "timestamp_field": {
                "name": "@timestamp"
            },
            "indices": [
                {
                    "index_name": ".ds-logs-nginx-000002",
                    "index_uuid": "UjUVr7haTWePKAfDz2q4Xg"
                },
                {
                    "index_name": ".ds-logs-nginx-000004",
                    "index_uuid": "gi372IUBSDO-pkaj7klLiQ"
                },
                {
                    "index_name": ".ds-logs-nginx-000005",
                    "index_uuid": "O60_VDzBStCaVGl8Sud2BA"
                }
            ],
            "generation": 5,
            "status": "GREEN",
            "template": "logs-template"
        }
    ]
}
Get statistics about data streams
GET _data_stream/logs-nginx/_stats
{
    "_shards": {
        "total": 2,
        "successful": 2,
        "failed": 0
    },
    "data_stream_count": 1,
    "backing_indices": 1,
    "total_store_size_bytes": 416,
    "data_streams": [
        {
            "data_stream": "logs-nginx",
            "backing_indices": 1,
            "store_size_bytes": 416,
            "maximum_timestamp": 0
        }
    ]
}
Delete data streams
DELETE _data_stream/logs-nginx
{
    "acknowledged": true
}

Index patterns

Index patterns reference one or more indexes, data streams, or index aliases.
They are mostly used in dashboards and in the Discover tab to filter the indexes to gather data from.

They require data to be indexed before creation.

Create index patterns
  1. Go to OpenSearch Dashboards.

  2. In the Management section of the side menu, select Dashboards Management.

  3. Select Index patterns, then Create index pattern.

  4. Define the pattern by entering a name in the Index pattern name field.
    Dashboards automatically adds a wildcard (*). It will make the pattern match multiple sources or indexes.

  5. Specify the time field to use when filtering documents on a time base.
    Unless otherwise specified in the source or index properties, @timestamp will pop up in the dropdown menu.

    Should one not want to use a time filter, select that option from the dropdown menu.
    This will make OpenSearch return all the data in all the indexes that match the index pattern.

  6. Select Create index pattern.

APIs

OpenSearch clusters offer a REST API.
It allows almost everything: changing most settings, modifying indexes, checking cluster health, getting statistics, etc.

One can interact with the API using any tool that can send HTTP requests.
One can also send HTTP requests from the Dev Tools console in OpenSearch Dashboards, which uses a simpler syntax to format REST requests compared to tools like cURL.

OpenSearch returns responses in flat JSON format by default.
Provide the pretty query parameter to obtain response bodies in human-readable form:

curl --insecure --user 'admin:someCustomStr0ng!Password' 'https://localhost:9200/_cluster/health?pretty'
awscurl --service 'es' 'https://search-domain.eu-west-1.es.amazonaws.com/_cluster/health?pretty'

Requests that contain a body must specify the Content-Type header, and provide the request's payload:

curl … \
  -H 'Content-Type: application/json' \
  -d '{"query":{"match_all":{}}}'

REST API reference

Cluster

/_cluster endpoint.

Get clusters' status
GET _cluster/health
{
    "cluster_name": "opensearch-cluster",
    "status": "green",
    "timed_out": false,
    "number_of_nodes": 2,
    "number_of_data_nodes": 2,
    "discovered_master": true,
    "discovered_cluster_manager": true,
    "active_primary_shards": 9,
    "active_shards": 18,
    "relocating_shards": 0,
    "initializing_shards": 0,
    "unassigned_shards": 0,
    "delayed_unassigned_shards": 0,
    "number_of_pending_tasks": 0,
    "number_of_in_flight_fetch": 0,
    "task_max_waiting_in_queue_millis": 0,
    "active_shards_percent_as_number": 100.0
}
Documents
Index documents

Add a JSON document to an OpenSearch index by sending a PUT HTTP request to the /indexName/_doc endpoint.

PUT /students/_doc/1
{
    "name": "John Doe",
    "gpa": 3.89,
    "grad_year": 2022
}

In the example, the document ID is specified as the student ID (1).
Once such a request is sent, OpenSearch creates an index called students if it does not exist already, and stores the ingested document in that index.

{
    "_index": "students",
    "_id": "1",
    "_version": 1,
    "result": "created",
    "_shards": {
        "total": 2,
        "successful": 2,
        "failed": 0
    },
    "_seq_no": 0,
    "_primary_term": 1
}
Search for documents

Specify the index to search for, plus a query that will be used to match documents.
The simplest query is the match_all query, which matches all documents in an index. If no search parameters are given, the match_all query is assumed.

GET /students/_search

GET /students/_search
{
    "query": {
        "match_all": {}
    }
}
{
    "took": 11,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 1,
        "hits": [
            {
                "_index": "students",
                "_id": "1",
                "_score": 1,
                "_source": {
                    "name": "John Doe",
                    "gpa": 3.89,
                    "grad_year": 2022
                }
            }
        ]
    }
}
Update documents

Completely, by re-indexing them:

PUT /students/_doc/1
{
    "name": "John Doe",
    "gpa": 3.91,
    "grad_year": 2022,
    "address": "123 Main St."
}

Only parts, by calling the _update endpoint:

POST /students/_update/1/
{
    "doc": {
        "gpa": 3.91,
        "address": "123 Main St."
    }
}
{
    "_index": "students",
    "_id": "1",
    "_version": 2,
    "result": "updated",
    "_shards": {
        "total": 2,
        "successful": 2,
        "failed": 0
    },
    "_seq_no": 1,
    "_primary_term": 1
}
Delete documents
DELETE /students/_doc/1
{
    "_index": "students",
    "_id": "1",
    "_version": 4,
    "result": "deleted",
    "_shards": {
        "total": 2,
        "successful": 2,
        "failed": 0
    },
    "_seq_no": 3,
    "_primary_term": 1
}
Indexes
View the inferred field types in indexes

Send GET requests to the _mapping endpoint:

GET /students/_mapping
{
    "students": {
        "mappings": {
            "properties": {
                "gpa": {
                    "type": "float"
                },
                "grad_year": {
                    "type": "long"
                },
                "name": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                }
            }
        }
    }
}
Create indexes specifying their mappings
PUT /students
{
    "settings": {
        "index.number_of_shards": 1
    },
    "mappings": {
        "properties": {
            "name": {
                "type": "text"
            },
            "grad_year": {
                "type": "date"
            }
        }
    }
}
{
    "acknowledged": true,
    "shards_acknowledged": true,
    "index": "students"
}
Close indexes

Disables read and write operations on the impacted indexes.

POST /prometheus-logs-20231205/_close
(Re)Open closed indexes

Enables read and write operations on the impacted indexes.

POST /prometheus-logs-20231205/_open
Update indexes' settings

Static settings can only be updated on closed indexes.

PUT /prometheus-logs-20231205/_settings
{
    "index": {
        "codec": "zstd_no_dict",
        "codec.compression_level": 3,
        "refresh_interval": "2s"
    }
}
Delete indexes
DELETE /students
{
    "acknowledged": true
}

Further readings

Sources