Validation Failed: 1: no requests added in Elasticsearch bulk indexing

I have a JSON file that I need to index on an Elasticsearch server.

The JSON file looks like this:

{
    "sku": "1",
    "vbid": "1",
    "created": "Sun, 05 Oct 2014 03:35:58 +0000",
    "updated": "Sun, 06 Mar 2016 12:44:48 +0000",
    "type": "Single",
    "downloadable-duration": "perpetual",
    "online-duration": "365 days",
    "book-format": "ePub",
    "build-status": "In Inventory",
    "description": "On 7 August 1914, a week before the Battle of Tannenburg and two weeks before the Battle of the Marne, the French army attacked the Germans at Mulhouse in Alsace. Their objective was to recapture territory which had been lost after the Franco-Prussian War of 1870-71, which made it a matter of pride for the French. However, after initial success in capturing Mulhouse, the Germans were able to reinforce more quickly, and drove them back within three days. After forty-three years of peace, this was the first test of strength between France and Germany. In 1929 Karl Deuringer wrote the official history of the battle for the Bavarian Army, an immensely detailed work of 890 pages; First World War expert and former army officer Terence Zuber has translated this study and edited it down to more accessible length, to produce the first account in English of the first major battle of the First World War.",
    "publication-date": "07/2014",
    "author": "Deuringer, Karl",
    "title": "The First Battle of the First World War: Alsace-Lorraine",
    "sort-title": "First Battle of the First World War: Alsace-Lorraine",
    "edition": "0",
    "sampleable": "false",
    "page-count": "0",
    "print-drm-text": "This title will only allow printing of 2 consecutive pages at a time.",
    "copy-drm-text": "This title will only allow copying of 2 consecutive pages at a time.",
    "kind": "book",
    "fro": "false",
    "distributable": "true",
    "subjects": {
      "subject": [
        {
          "-schema": "bisac",
          "-code": "HIS027090",
          "#text": "World War I"
        },
        {
          "-schema": "coursesmart",
          "-code": "cs.soc_sci.hist.milit_hist",
          "#text": "Social Sciences -> History -> Military History"
        }
      ]
    },
    "pricelist": {
      "publisher-list-price": "0.0",
      "digital-list-price": "7.28"
    },
    "publisher": {
      "publisher-name": "The History Press",
      "imprint-name": "The History Press Ireland"
    },
    "aliases": {
      "eisbn-canonical": "1",
      "isbn-canonical": "1",
      "print-isbn-canonical": "9780752460864",
      "isbn13": "1",
      "isbn10": "0750951796",
      "additional-isbns": {
        "isbn": [
          {
            "-type": "print-isbn-10",
            "#text": "0752460862"
          },
          {
            "-type": "print-isbn-13",
            "#text": "97807524608"
          }
        ]
      }
    },
    "owner": {
      "company": {
        "id": "1893",
        "name": "The History Press"
      }
    },
    "distributor": {
      "company": {
        "id": "3658",
        "name": "asc"
      }
    }
}

But when I try to index this JSON file with the command

curl -XPOST 'http://localhost:9200/_bulk' -d @1.json

I get this error:

{"error":{"root_cause":[{"type":"action_request_validation_exception","reason":"Validation Failed: 1: no requests added;"}],"type":"action_request_validation_exception","reason":"Validation Failed: 1: no requests added;"},"status":400}

I don't know where I'm going wrong.

Answer 1

The Elasticsearch bulk API uses a special syntax: the request body is made of JSON documents, each written on its own single line. Take a look at the documentation.

The syntax is quite simple. For index, create and update you need two single-line JSON documents: the first line describes the action, the second gives the document to index/create/update. To delete a document, only the action line is needed. For example (from the documentation):

{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "index1"} }   
{ "doc" : {"field2" : "value2"} }
{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
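Applied to the question's 1.json, this means putting an action line in front of the document and collapsing the pretty-printed document onto one line. Below is a minimal shell sketch, assuming the file holds exactly one strictly valid JSON document. The function name `to_bulk`, the index name `books`, the type `book` and the use of the `sku` value `1` as `_id` are hypothetical placeholders; `_type` only mirrors the documentation example, and newer Elasticsearch versions no longer use mapping types.

```bash
# Hypothetical helper: print a one-document bulk body for the pretty-printed
# JSON file $1, using index $2, type $3 and document ID $4 (all placeholders).
to_bulk() {
    local src="$1" index="$2" doc_type="$3" doc_id="$4"
    # Line 1: the action/metadata line
    printf '{ "index" : { "_index" : "%s", "_type" : "%s", "_id" : "%s" } }\n' \
        "$index" "$doc_type" "$doc_id"
    # Line 2: the document itself. JSON does not allow literal line breaks
    # inside strings, so deleting every CR/LF only removes layout whitespace.
    tr -d '\r\n' < "$src"
    # The body, including its last line, must end with a newline
    printf '\n'
}

# Usage (not run here):
#   to_bulk 1.json books book 1 > requests
#   curl -s -XPOST localhost:9200/_bulk --data-binary "@requests"
```

The result is a two-line file that already ends with a newline, so it can be sent as is with `--data-binary`.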

Don't forget to end your file with a newline. Then, to call the bulk API, use the command:

curl -s -XPOST localhost:9200/_bulk --data-binary "@requests"

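From the documentation:

If you're providing text file input to curl, you must use the --data-binary flag instead of plain -d

(When plain -d reads data from a file, curl strips carriage returns and newlines, which collapses the line-based bulk body into a single unterminated line. That is the likely cause of the "no requests added" error in the question.)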
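Because both the trailing newline and `--data-binary` are easy to forget, a small pre-flight check before sending can help. This is a sketch only: the helper names `ends_with_newline` and `send_bulk` are hypothetical, the default URL is an assumption, and the `Content-Type: application/x-ndjson` header is what newer Elasticsearch versions expect for bulk requests.

```bash
# Hypothetical helper: succeed only if file $1 is non-empty and its last byte
# is a newline. $(...) strips a trailing "\n", so an empty result means the
# last byte was a newline.
ends_with_newline() {
    local file="$1"
    [ -s "$file" ] && [ -z "$(tail -c 1 "$file")" ]
}

# Hypothetical helper: refuse to send a malformed file, otherwise POST it.
send_bulk() {
    local file="$1" url="${2:-http://localhost:9200/_bulk}"   # default URL is an assumption
    if ! ends_with_newline "$file"; then
        echo "error: $file is empty or does not end with a newline" >&2
        return 1
    fi
    # --data-binary sends the file byte for byte; plain -d would strip CR/LF
    curl -s -XPOST "$url" -H 'Content-Type: application/x-ndjson' --data-binary "@$file"
}
```

For example, `send_bulk requests` either reports the problem locally or passes the file to curl unchanged.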

Answer 2

I had a similar problem: I wanted to delete specific documents of a particular type, and with the help of the answer above I managed to get my simple bash script working!

I have a file with one document_id per line (document_id.txt), and with the bash script below I can delete the documents of a given type that have those IDs.

Here is what the file looks like:

c476ce18803d7ed3708f6340fdfa34525b20ee90
5131a30a6316f221fe420d2d3c0017a76643bccd
08ebca52025ad1c81581a018febbe57b1e3ca3cd
496ff829c736aa311e2e749cec0df49b5a37f796
87c4101cb10d3404028f83af1ce470a58744b75c
37f0daf7be27cf081e491dd445558719e4dedba1

The bash script looks like this:

#!/bin/bash

es_cluster="http://localhost:9200"
index="some-index"
doc_type="some-document-type"

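# One document ID per line in document_id.txt
for doc_id in `cat document_id.txt`
do
    # A delete action needs only the action/metadata line, no source line
    request_string="{\"delete\" : { \"_type\" : \"${doc_type}\", \"_id\" : \"${doc_id}\" } }"
    # echo -e turns \r\n into real line breaks; --data-binary @- reads the body from stdin unchanged
    echo -e "${request_string}\r\n\r\n" | curl -s -XPOST "${es_cluster}/${index}/${doc_type}/_bulk" --data-binary @-
    # Print a newline after each JSON response
    echo
done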

The trick, after a lot of frustration, was to use the -e option of echo and to append newlines (\r\n\r\n in the script) to echo's output before piping it into curl.

Then in curl I used the --data-binary option to stop it from stripping the newlines that _bulk requires, and the @- argument to read the data from stdin!
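The script above sends one HTTP request per ID. Since one bulk body can carry many delete actions, an alternative is to build the whole body first and send it once. A printf format ending in `\n` also guarantees the trailing newline without relying on `echo -e`. This sketch assumes the same placeholder index, type and ID file as the script; `build_delete_body` is a hypothetical name. Because `_index` and `_type` are written into every action line, the body can go to the plain `/_bulk` endpoint.

```bash
# Hypothetical helper: print one delete action line per ID in file $3,
# targeting index $1 and type $2. IDs are inserted verbatim (fine for the
# hex IDs above; quotes inside an ID would not be escaped).
build_delete_body() {
    local index="$1" doc_type="$2" ids_file="$3"
    # Strip Windows CRs, then read one ID per line; "|| [ -n ... ]" keeps a
    # last line that has no trailing newline
    tr -d '\r' < "$ids_file" | while IFS= read -r doc_id || [ -n "$doc_id" ]; do
        # Skip blank lines
        [ -z "$doc_id" ] && continue
        # The format ends in \n, so every action line (the last one too) is terminated
        printf '{ "delete" : { "_index" : "%s", "_type" : "%s", "_id" : "%s" } }\n' \
            "$index" "$doc_type" "$doc_id"
    done
}

# Usage (not run here): pipe the body straight into curl via stdin
#   build_delete_body some-index some-document-type document_id.txt |
#     curl -s -XPOST "http://localhost:9200/_bulk" -H 'Content-Type: application/x-ndjson' --data-binary @-
```

For very long ID lists you would probably split the body into several batches, since a single oversized request can exceed the cluster's HTTP request size limit.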