⚡ ElasticsearchBook is crafted by Jozef Sorocin and powered by:
- Spatialized.io (Elasticsearch & Google Maps consulting)
- in cooperation with Garages-Near-Me.com (Effortless parking across Germany)
Once you've established a solid mapping, you'll want to index multiple documents at once using the Bulk API. A typical payload to the _bulk endpoint is sent as newline-delimited JSON (ndjson). Because this format is verbose and easy to get wrong, it's helpful to use a client library's bulk helpers instead. Nonetheless, we'll cover the raw ndjson format too in case you don't plan on using any client library.

There's no "optimal" payload chunk size, since it depends on many factors (document size, cluster hardware, mapping complexity, and so on). A good starting point is 1,000 documents, or 5MB, per request.
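In the ndjson format, each document becomes two lines: an action line followed by the source line. The whole payload must end with a newline. The sketch below shows this and also splits the documents into chunks that respect the starting limits mentioned above.

It assumes the input is a list of dicts in which an optional `_id` key holds the document ID. The helper names `action_and_source` and `ndjson_chunks` are hypothetical, not part of any library, and the limits are only starting values to tune.

```python
import json
from typing import Iterable, Iterator

# Starting-point limits from the text; tune them for your own cluster.
MAX_DOCS_PER_CHUNK = 1_000
MAX_BYTES_PER_CHUNK = 5 * 1024 * 1024  # 5MB


def action_and_source(doc: dict) -> tuple[str, str]:
    """Return the two ndjson lines (action line + source line) for one document."""
    doc = dict(doc)  # copy so the caller's dict is not mutated
    doc_id = doc.pop('_id', None)  # `_id` is metadata, not part of the source
    action = {'index': {} if doc_id is None else {'_id': doc_id}}
    return json.dumps(action) + '\n', json.dumps(doc) + '\n'


def ndjson_chunks(docs: Iterable[dict],
                  max_docs: int = MAX_DOCS_PER_CHUNK,
                  max_bytes: int = MAX_BYTES_PER_CHUNK) -> Iterator[str]:
    """Yield ndjson payloads with at most `max_docs` documents each and, unless a
    single document is larger than the limit on its own, at most `max_bytes` bytes."""
    lines: list[str] = []
    size = 0
    count = 0
    for doc in docs:
        pair = action_and_source(doc)
        pair_size = sum(len(line.encode('utf-8')) for line in pair)
        # flush the current chunk if adding this document would exceed a limit
        if lines and (count >= max_docs or size + pair_size > max_bytes):
            yield ''.join(lines)
            lines, size, count = [], 0, 0
        lines.extend(pair)
        size += pair_size
        count += 1
    if lines:
        yield ''.join(lines)
```

For example, `ndjson_chunks([{'_id': 1, 'title': 'a'}, {'title': 'b'}], max_docs=1)` yields two payloads. The second one has an empty action line (`{"index": {}}`), so Elasticsearch generates its ID. Each yielded string can be sent as the body of a single _bulk request.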
Use Case 1: JSON Files in Python
How do I index a JSON file consisting of an array of objects in Python?
Approach
Using no Python client library, only requests
import json

import requests

index_name = 'my_index'
# we can either put the index name in the URL path (/<index>/_bulk) like here
# or add it as an `_index` attribute of each action line
endpoint = 'http://localhost:9200/' + index_name + '/_bulk'
data = []
with open('file.json', 'r') as json_in:
    docs = json.load(json_in)
for doc in docs:
    # `_id` is a metadata field and must not stay in the document body;
    # if it's missing or None, Elasticsearch auto-generates an ID
    doc_id = doc.pop('_id', None)
    action = {'index': {}}
    if doc_id is not None:
        action['index']['_id'] = doc_id
    data.append(action)
    data.append(doc)
# rudimentary conversion into ndjson; the payload must end with a newline
payload = '\n'.join(json.dumps(line) for line in data) + '\n'
r = requests.post(endpoint,
                  # `data` instead of `json`: the body is ndjson, not a single JSON document
                  data=payload,
                  headers={
                      # the bulk API requires the ndjson content type
                      'Content-Type': 'application/x-ndjson'
                  })
print(r.json())
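The bulk endpoint typically returns HTTP 200 even when individual documents were rejected, for example because of a mapping conflict. Printing `r.json()` is therefore only the first step.

The sketch below inspects the parsed response, assuming it is a dict such as the one returned by `r.json()`. The name `failed_items` is hypothetical. It relies on the bulk response shape: a top-level `errors` boolean, plus an `items` list with one entry per action, keyed by the action type.

```python
from typing import Any


def failed_items(bulk_response: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the per-document results of a bulk response that contain an error."""
    # the top-level `errors` flag says whether any action failed at all
    if not bulk_response.get('errors'):
        return []
    failures = []
    for item in bulk_response.get('items', []):
        # each item is keyed by its action type: 'index', 'create', 'update' or 'delete'
        for action, result in item.items():
            if 'error' in result:
                failures.append({'action': action, **result})
    return failures
```

Used right after the request, `for failure in failed_items(r.json()): print(failure['_id'], failure['error'])` lists every rejected document together with its error reason.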
Using the elasticsearch-py library
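import json

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch('http://localhost:9200')
index_name = 'my_index'
with open('file.json', 'r') as json_in:
    data = json.load(json_in)
actions = []
for doc in data:
    action = {'_index': index_name, '_source': doc}
    # move `_id` out of the source; if it's missing, the ID will be auto-generated
    doc_id = doc.pop('_id', None)
    if doc_id is not None:
        action['_id'] = doc_id
    actions.append(action)
# raises BulkIndexError if any document fails; otherwise returns (success_count, errors)
bulk(es, actions)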
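For larger files, it can help to pass a generator instead of a prebuilt list and to set the chunk limits explicitly. The whole file is still loaded by `json.load`; the generator only avoids building a second list of actions.

This sketch assumes elasticsearch-py's `helpers.bulk`, which forwards `chunk_size`, `max_chunk_bytes` and `raise_on_error` to `streaming_bulk`. The names `generate_actions` and `index_file` are hypothetical.

```python
import json
from typing import Any, Iterator


def generate_actions(docs: list[dict[str, Any]], index_name: str) -> Iterator[dict[str, Any]]:
    """Yield one bulk action per document, moving `_id` out of the source."""
    for doc in docs:
        source = dict(doc)  # copy so the caller's dicts are not mutated
        action: dict[str, Any] = {'_index': index_name, '_source': source}
        doc_id = source.pop('_id', None)
        if doc_id is not None:
            action['_id'] = doc_id  # otherwise Elasticsearch generates one
        yield action


def index_file(es, path: str, index_name: str) -> tuple[int, list]:
    """Bulk-index a JSON array file; `es` is an `elasticsearch.Elasticsearch` client."""
    # imported here so `generate_actions` can be used without the client installed
    from elasticsearch.helpers import bulk

    with open(path, 'r') as json_in:
        docs = json.load(json_in)
    # chunk_size / max_chunk_bytes mirror the 1,000 docs / 5MB starting point;
    # raise_on_error=False returns failures instead of raising BulkIndexError
    return bulk(es, generate_actions(docs, index_name),
                chunk_size=1000,
                max_chunk_bytes=5 * 1024 * 1024,
                raise_on_error=False)
```

A call such as `index_file(Elasticsearch('http://localhost:9200'), 'file.json', 'my_index')` returns a `(successes, errors)` tuple. Inspect the errors list rather than assuming every document was indexed.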