Elasticsearch InOut Plugin

This Elasticsearch plugin provides the ability to export data by query on the server side, writing the data out directly on the node that holds it. The export can run over all indexes, over a specific index, or over a specific document type.

The data is exported as one JSON object per line:

{"_id":"id1","_source":{"type":"myObject","value":"value1"},"_version":1,"_index":"myIndex","_type":"myType"}
{"_id":"id2","_source":{"type":"myObject","value":"value2"},"_version":2,"_index":"myIndex","_type":"myType"}

The exported data can also be imported into Elasticsearch again. The import happens on each Elasticsearch node by processing the files located in the specified directory.

Examples

Below are some examples demonstrating what can be done with the Elasticsearch InOut plugin. The example commands require a UNIX system. The plugin may also work, with adapted commands, on other operating systems that support Elasticsearch, but this has not been tested yet.

Export data to files in the node's file system. The file names are expanded with the index and shard names (e.g. /tmp/dump-myIndex-0):

curl -X POST 'http://localhost:9200/_export' -d '{
    "fields": ["_id", "_source", "_version", "_index", "_type"],
    "output_file": "/tmp/es-data/dump-${index}-${shard}"
}
'

Do GZIP compression on file exports:

curl -X POST 'http://localhost:9200/_export' -d '{
    "fields": ["_id", "_source", "_version", "_index", "_type"],
    "output_file": "/tmp/es-data/dump-${index}-${shard}.gz",
    "compression": "gzip"
}
'

Pipe the export data through a single argumentless command, such as cat, on the corresponding node. Since cat echoes its input, this command returns the export data in the stdout field of the JSON response:

curl -X POST 'http://localhost:9200/_export' -d '{
    "fields": ["_id", "_source", "_version", "_index", "_type"],
    "output_cmd": "cat"
}
'

Pipe the export data through a command with arguments (e.g. a shell one-liner, or your own sophisticated script provided on the node). This command transforms the data to lower case and writes the result to a file on the node's file system:

curl -X POST 'http://localhost:9200/_export' -d '{
    "fields": ["_id", "_source", "_version", "_index", "_type"],
    "output_cmd": ["/bin/sh", "-c", "tr [A-Z] [a-z] > /tmp/outputcommand.txt"]
}
'

Limit the exported data with a query. The same query syntax as for search can be used:

curl -X POST 'http://localhost:9200/_export' -d '{
    "fields": ["_id", "_source", "_version", "_index", "_type"],
    "output_file": "/tmp/es-data/query-${index}-${shard}",
    "query": {
        "match": {
            "someField": "someValue"
        }
    }
}
'

Export only objects of a specific index:

curl -X POST 'http://localhost:9200/myIndex/_export' -d '{
    "fields": ["_id", "_source", "_version", "_type"],
    "output_file": "/tmp/es-data/dump-${index}-${shard}"
}
'

Export only objects of a specific type of an index:

curl -X POST 'http://localhost:9200/myIndex/myType/_export' -d '{
    "fields": ["_id", "_source", "_version"],
    "output_file": "/tmp/es-data/dump-${index}-${shard}"
}
'

Import previously exported data into Elasticsearch, for example into a newly set up Elasticsearch server with empty indexes. Take care to prepare the indexes with the correct mappings. The files must reside on the file system of the Elasticsearch node(s):

curl -X POST 'http://localhost:9200/_import' -d '{
    "directory": "/tmp/es-data"
}
'

Import data of gzipped files:

curl -X POST 'http://localhost:9200/_import' -d '{
    "directory": "/tmp/es-data",
    "compression": "gzip"
}
'

Import data into a specific index. This can be used if no _index is given in the export data, or to force data from other indexes to be imported into a specific index:

curl -X POST 'http://localhost:9200/myNewIndex/_import' -d '{
    "directory": "/tmp/es-data"
}
'

Import data into a specific type of an index:

curl -X POST 'http://localhost:9200/myNewIndex/myType/_import' -d '{
    "directory": "/tmp/es-data"
}
'

Use a regular expression to filter imported file names (e.g. for specific indexes):

curl -X POST 'http://localhost:9200/_import' -d '{
    "directory": "/tmp/es-data",
    "file_pattern": "dump-myindex-(\\d).json"
}
'

Exports

Elements of the request body

fields

A list of fields to export. Describes which data is exported for every object. A field name can be any property that is defined in the index/type mapping with "store": "yes" or one of the following special fields (prefixed with _):

  • _id: Delivers the ID of the object
  • _index: Delivers the index of the object
  • _routing: Delivers the routing value of the object
  • _source: Delivers the stored JSON values of the object
  • _timestamp: Delivers the time stamp when the object was created (or the externally provided timestamp). Works only if the _timestamp field is enabled and set to "store": "yes" in the index/type mapping of the object.
  • _ttl: Delivers the expiration time stamp of the object if the _ttl field is enabled in the index/type mapping.
  • _type: Delivers the document type of the object
  • _version: Delivers the current version of the object

Example assuming that the properties name and address are defined in the index/type mapping with the property "store": "yes":

"fields": ["_id", "name", "address"]

The fields element is required in the POST data of the request.

output_cmd

"output_cmd": "cat"

"output_cmd": ["/location/yourcommand", "argument1", "argument2"]

The command to execute. May be defined as a string or as an array. The content to export is piped to the stdin of the executed command. Some variable substitution is possible (see Variable Substitution).

  • Required (if output_file has been omitted)

output_file

"output_file": "/tmp/dump"

A path to the resulting output file. The containing directory of the given output_file has to exist. The given output_file MUST NOT exist, unless the parameter force_overwrite is set to true.

If the path of the output file is relative, the files will be stored relative to each node's first node data location, which is usually a subdirectory of the configured data location. This absolute path can be seen in the JSON response of the request. If you don't know where this location is, you can do a dry-run with the explain element set to true to find out.

Some variable substitution in the output_file's name is also possible (see Variable Substitution).

  • Required (if output_cmd has been omitted)

force_overwrite

"force_overwrite": true

Boolean flag to force overwriting an existing output_file. This option only makes sense if output_file has been defined.

  • Optional (defaults to false)
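
For example, re-running one of the exports above over already existing dump files (a sketch; without force_overwrite the request would fail because the files exist):

curl -X POST 'http://localhost:9200/_export' -d '{
    "fields": ["_id", "_source", "_version", "_index", "_type"],
    "output_file": "/tmp/es-data/dump-${index}-${shard}",
    "force_overwrite": true
}
'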

explain

"explain": true

Option to evaluate the command or file that would be used, like a dry run, without actually exporting any data.

  • Optional (defaults to false)
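
Because relative output paths are resolved against each node's first node data location, a dry run is a convenient way to discover the effective absolute file names before exporting anything (a sketch):

curl -X POST 'http://localhost:9200/_export' -d '{
    "fields": ["_id", "_source"],
    "output_file": "dump-${index}-${shard}",
    "explain": true
}
'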

compression

"compression": "gzip"

Option to activate compression of the output. Works whether output_file or output_cmd has been defined. Currently only the gzip compression type is available. Omitting the option results in uncompressed output to files or processes.

  • Optional (default is no compression)

query

The query element within the export request body allows a query to be defined using the Query DSL. See http://www.elasticsearch.org/guide/reference/query-dsl/

  • Optional

settings

"settings": true

Option to generate an index settings file next to the data files on all corresponding shards. The settings file takes the generated name of the output file with a .settings extension appended. This option is only available if output_file has been defined.

  • Optional (defaults to false)

mappings

"mappings": true

Option to generate an index mapping file next to the data files on all corresponding shards. The mapping file takes the generated name of the output file with a .mapping extension appended. This option is only available if output_file has been defined.

  • Optional (defaults to false)
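
For example, a request combining both options (a sketch, assuming the /tmp/es-data directory exists on every node) would write dump-myIndex-0, dump-myIndex-0.settings and dump-myIndex-0.mapping for shard 0:

curl -X POST 'http://localhost:9200/myIndex/_export' -d '{
    "fields": ["_id", "_source"],
    "output_file": "/tmp/es-data/dump-${index}-${shard}",
    "settings": true,
    "mappings": true
}
'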

GET parameters

The endpoint supports the general behavior of the Elasticsearch REST API. See http://www.elasticsearch.org/guide/reference/api/

Preference

Controls which shard replicas the export request is executed on. Unlike in the search API, preference is set to "_primary" by default. See http://www.elasticsearch.org/guide/reference/api/search/preference/
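
For example, to run the export on locally allocated shards instead of primaries, the standard _local preference value of the search API can be passed as a GET parameter (a sketch):

curl -X POST 'http://localhost:9200/_export?preference=_local' -d '{
    "fields": ["_id", "_source"],
    "output_cmd": "cat"
}
'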

Variable Substitution

The following placeholders in the output_file or output_cmd fields are replaced with their actual values:

  • ${cluster}: The name of the cluster
  • ${index}: The name of the index
  • ${shard}: The id of the shard
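
For example, an output_file value like the following (a sketch) would expand to /tmp/es-data/elasticsearch/dump-myIndex-0 for shard 0 of myIndex on a cluster named elasticsearch:

"output_file": "/tmp/es-data/${cluster}/dump-${index}-${shard}"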

JSON Response

The _export request returns a JSON response with information about the export status. The output differs slightly depending on whether an output command or an output file is given in the request body.

Output file JSON response

The JSON response may look like this if an output file is given in the request body:

{
    "exports" : [
        {
            "index" : "myIndex",
            "shard" : 0,
            "node_id" : "the_node_id",
            "numExported" : 5,
            "output_file" : "/tmp/dump-myIndex-0"
        }
    ],
    "totalExported" : 5,
    "_shards" : {
        "total" : 2,
        "successful" : 1,
        "failed" : 1,
        "failures" : [
            {
                "index" : "myIndex",
                "shard" : 1,
                "reason" : "..."
            }
        ]
    }
}

Output command JSON response

The JSON response may look like this if an output command is given in the request body:

{
    "exports" : [
        {
            "index" : "myIndex",
            "shard" : 0,
            "node_id" : "the_node_id",
            "numExported" : 5,
            "output_cmd" : [
                "/bin/sh",
                "-c",
                "tr [A-Z] [a-z] > /tmp/outputcommand.txt"
            ],
            "stderr" : "",
            "stdout" : "",
            "exitcode" : 0
        }
    ],
    "totalExported" : 5,
    "_shards" : {
        "total" : 2,
        "successful" : 1,
        "failed" : 1,
        "failures": [
            {
                "index" : "myIndex",
                "shard" : 1,
                "reason" : "..."
            }
        ]
    }
}

Hint

  • exports: List of successful exports
  • totalExported: Number of total exported objects
  • _shards: Shard information
  • index: The name of the exported index
  • shard: The number of the exported shard
  • node_id: The node id where the export happened
  • numExported: The number of exported objects in the shard
  • output_file: The file name of the output file with substituted variables
  • failures: List of failing shard operations
  • reason: The error report of a specific shard failure
  • output_cmd: The executed command on the node with substituted variables
  • stderr: The first 8K of the standard error log of the executed command
  • stdout: The first 8K of the standard output log of the executed command
  • exitcode: The exit code of the executed command

Imports

Import data

The import expects the same format as generated by the export: every line in an import file represents one object in JSON format.

The _source field is required for a successful import of an object. If the _id field is not given, a random id is generated for the object. The _index and _type fields are also required, unless they are given in the request URI (e.g. http://localhost:9200/<index>/<type>/_import).

Further optional fields are _routing, _timestamp, _ttl and _version. See the fields section of the export documentation for more details on these fields.
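
A single line of a minimal import file may therefore look like the export output shown at the top of this document; only _source is strictly required if _index and _type come from the request URI:

{"_id":"id1","_source":{"type":"myObject","value":"value1"},"_index":"myIndex","_type":"myType"}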

Elements of the request body

directory

Specifies the directory where the files to be imported reside. Every node of the cluster imports files from that directory on its file system.

If the directory is a relative path, it is based on the absolute path of each node's first node data location. See output_file in export documentation for more information.
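
For example, a sketch using a relative path, which each node resolves under its first node data location:

curl -X POST 'http://localhost:9200/_import' -d '{
    "directory": "es-data"
}
'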

compression

"compression": "gzip"

Option to activate decompression on the import files. Currently only the gzip compression type is available.

  • Optional (default is no decompression)

file_pattern

"file_pattern": "index-(.*)-(\d).json"

Option to import only files with a given regular expression. Take care of double escaping, as the JSON is decoded too in the process. For more information on regular expressions visit http://www.regular-expressions.info/

  • Optional (default is no filtering)

settings

"settings": true

Option to import index settings. A settings file is handled if a data file with the same name (minus the .settings extension) is present in the import directory. Use the file_pattern option to narrow down which settings files are imported. The format of a settings file is the same as the JSON output of a _settings GET request.

  • Optional (defaults to false)

mappings

"mappings": true

Option to import index mappings. A mapping file is handled if a data file with the same name (minus the .mapping extension) is present in the import directory. Use the file_pattern option to narrow down which mapping files are imported. The format of a mapping file is the same as the JSON output of a _mapping GET request.

  • Optional (defaults to false)
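
For example, importing data files together with their settings and mapping files (a sketch, assuming the files were produced by an export run with settings and mappings enabled):

curl -X POST 'http://localhost:9200/_import' -d '{
    "directory": "/tmp/es-data",
    "settings": true,
    "mappings": true
}
'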

JSON Response

The JSON response of an import may look like this:

{
    "imports" : [
        {
            "node_id" : "7RKUKxNDQlq0OzeOuZ02pg",
            "took" : 61,
            "imported_files" : [
                {
                    "file_name" : "dump-myIndex-1.json",
                    "successes" : 150,
                    "failures" : 0
                },
                {
                    "file_name" : "dump-myIndex-2.json",
                    "successes" : 149,
                    "failures" : 1,
                    "invalidated" : 1
                }
            ]
        },
        {
            "node_id" : "IrMCOlKCTtW4aDhjXiYzTw",
            "took" : 63,
            "imported_files" : [
                {
                    "file_name" : "dump-myIndex-3.json",
                    "successes" : 150,
                    "failures" : 0
                }
            ]
        }
    ],
    "failures" : [
        {
            "node_id" : "OATwHz48TEOshAISZlepcA",
            "reason" : "..."
        }
    ]
}

Hint

  • imports: List of successful imports
  • node_id: The node id where the import happened
  • took: Operation time of all imports on the node in milliseconds
  • imported_files: List of imported files in the import directory of the node's file system
  • file_name: File name of the handled file
  • successes: Number of successfully imported objects per file
  • failures (in imported_files list): Number of not imported objects because of a failure
  • invalidated: Number of not imported objects because of invalidation (time to live exceeded)
  • failures (in root): List of failing node operations
  • reason: The error report of a specific node failure

Dump

The idea behind dump is to export all relevant data to recreate the cluster as it was at the time of the dump.

The basic usage of the endpoint is:

curl -X POST 'http://localhost:9200/_dump'

All data (including settings and mappings) is saved to a subfolder within each node's data directory.

It is possible to call _dump at the root level, at the index level, or at the type level.

Elements of the request body

directory

The directory option defines where to store the exported files. If the directory is a relative path, it is based on the absolute path of each node's first node data location. See output_file in the export documentation for more information. If directory is omitted, the default location dump within the node data location is used.

force_overwrite

"force_overwrite": true

Boolean flag to force overwriting existing output files. This option is identical to the force_overwrite option of the _export endpoint.
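
For example, dumping a single index to an assumed target directory, overwriting the files of a previous dump (a sketch):

curl -X POST 'http://localhost:9200/myIndex/_dump' -d '{
    "directory": "/tmp/es-backup",
    "force_overwrite": true
}
'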

Restore

Dumped data is intended to be restored. This can be done with the _restore endpoint:

curl -X POST 'http://localhost:9200/_restore'

It is possible to call _restore at the root level, at the index level, or at the type level.

Elements of the request body

directory

Specifies the directory where the files to be restored reside. See directory in the import documentation for more details. If directory is omitted, the default location dump within the node data location is used.
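
For example, restoring from the same (assumed) directory used in the dump sketch above:

curl -X POST 'http://localhost:9200/_restore' -d '{
    "directory": "/tmp/es-backup"
}
'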

settings and mappings

Both settings and mappings default to true on restore. See the Import documentation for more details.

Reindex

The _reindex endpoint reindexes the documents matching a given search query.

Reindex all indexes:

curl -X POST 'http://localhost:9200/_reindex'

Reindex a specific index:

curl -X POST 'http://localhost:9200/myIndex/_reindex'

Reindex the documents matching a specific query:

curl -X POST 'http://localhost:9200/myIndex/aType/_reindex' -d '{
    "query": {"text": {"name": "tobereindexed"}}
}'

An example can be found in the Reindex DocTest.

Search Into

Via the _search_into endpoint it is possible to put the result of a given query directly into an index. In the example below, the _index field of each result is set to the literal value newindex, so the matching documents are written into that index:

curl -X POST 'http://localhost:9200/oldindex/_search_into' -d '{
    "fields": ["_id", "_source", ["_index", "'newindex'"]]
}'

An example can be found in the Search Into DocTest.

Installation

  • Clone this repo with git clone git@github.com:crate/elasticsearch-inout-plugin.git
  • Check out the tag you want to build (list available tags via git tag); master may not match your Elasticsearch version
  • Run mvn clean package -DskipTests=true. This skips the unit tests, as they take some time; to run them, use mvn clean package instead
  • Install the plugin: /path/to/elasticsearch/bin/plugin -install elasticsearch-inout-plugin -url file:///$PWD/target/elasticsearch-inout-plugin-$version.jar

