ElasticsearchBook is crafted by Jozef Sorocin (Book a consulting hour) and powered by:
- Spatialized.io (Elasticsearch & Google Maps consulting)
- in cooperation with Garages-Near-Me.com (Effortless parking across Germany)
An interesting application of partial text matching is an autocomplete with a dropdown:
People are unlikely to input the whole tag and will want only the tags that contain their input. These days we even expect auto-correct and natural language understanding, but that's a topic for a different chapter, maybe even a whole new (hand)book.
Use Case
Each of my documents contains an array of tags:
{
...
"tags": ["Race", "Racing", "Mountain Bike", "Horizontal"],
...
},
{
...
"tags": ["Tracey Chapman", "Silverfish", "Blue"],
...
},
{
...
"tags": ["Surfing", "Race", "Disgrace"],
...
}
If the user types race, I only want to return Race, Tracey Chapman, and Disgrace, i.e. only the tags that contain my query string. What's more, I want this to be as fast as possible because it's fed through an autocomplete.
Approach
Option #1: Regex
Since there's no dedicated array data type in ES (any field can hold multiple values), we'll apply a simple keyword mapping:
PUT tags_index
{
"mappings": {
"properties": {
"tags": {
"type": "keyword"
}
}
}
}
and then index the above docs:
POST tags_index/_doc
{"tags":["Race","Racing","Mountain Bike","Horizontal"]}
POST tags_index/_doc
{"tags":["Tracey Chapman","Silverfish","Blue"]}
POST tags_index/_doc
{"tags":["Surfing","Race","Disgrace"]}
Now, extracting and grouping the contents of textual arrays is usually done through a terms aggregation:
POST tags_index/_search
{
"size": 0,
"aggs": {
"topics": {
"terms": {
"field": "tags",
"size": 10
}
}
}
}
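To make the behaviour of the terms aggregation more concrete, here is a minimal plain-Python sketch that models it over the three sample documents. It is only an illustration and not how Elasticsearch computes buckets internally: the docs list and the terms_buckets helper are hypothetical names, and breaking ties by key is an assumption made to keep the output deterministic.

```python
from collections import Counter

# The three sample documents from the use case above.
docs = [
    {"tags": ["Race", "Racing", "Mountain Bike", "Horizontal"]},
    {"tags": ["Tracey Chapman", "Silverfish", "Blue"]},
    {"tags": ["Surfing", "Race", "Disgrace"]},
]


def terms_buckets(documents, field, size=10):
    """Count in how many documents each distinct value of `field` occurs."""
    counts = Counter()
    for doc in documents:
        # A value is counted once per document, even if repeated in the array.
        counts.update(set(doc.get(field, [])))
    # Sort by document count (descending), then by key (assumed tie-breaker).
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered[:size]]


buckets = terms_buckets(docs, "tags")
for bucket in buckets:
    print(bucket)
```

Each array element ends up as its own bucket, and Race is the only tag shared by two documents.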
The aggregation gives us all tags separately. But we want to filter them and, luckily, there's the include parameter, which supports regex:
POST tags_index/_search
{
"size": 0,
"aggs": {
"topics": {
"terms": {
"field": "tags",
"include": ".*[Rr][Aa][Cc][Ee].*",
"size": 10
}
}
}
}
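In an autocomplete, the include pattern has to be generated from whatever the user typed. The following is a minimal sketch of how that could be done on the client side. The contains_pattern helper and the RESERVED set are hypothetical names, and the list of reserved characters is an assumption based on the Elasticsearch regexp syntax documentation. The local check at the end uses Python's re module, whose syntax differs from Lucene's in general but agrees for a pattern this simple.

```python
import re

# Characters treated as reserved here (assumption: taken from the
# Elasticsearch regexp syntax docs, optional operators included to be safe).
RESERVED = set('.?+*|{}[]()"\\#@&<>~')


def contains_pattern(user_input):
    """Build an include regex matching tags that contain user_input, ignoring case."""
    parts = []
    for ch in user_input:
        upper, lower = ch.upper(), ch.lower()
        if upper != lower and len(upper) == 1 and len(lower) == 1:
            # Cased letters become a two-character class, e.g. "r" -> "[Rr]".
            parts.append(f"[{upper}{lower}]")
        elif ch in RESERVED:
            # Escape regex operators so they are matched literally.
            parts.append("\\" + ch)
        else:
            parts.append(ch)
    # The pattern must match the whole term, hence the surrounding ".*".
    return ".*" + "".join(parts) + ".*"


pattern = contains_pattern("race")

# Local sanity check against the sample tags; fullmatch mimics whole-term matching.
all_tags = ["Race", "Racing", "Mountain Bike", "Horizontal", "Tracey Chapman",
            "Silverfish", "Blue", "Surfing", "Disgrace"]
matches = [tag for tag in all_tags if re.fullmatch(pattern, tag)]
print(pattern)
print(matches)
```

The generated string can be dropped into the include parameter of the request above. Escaping matters because a user typing something like c++ would otherwise produce an invalid or unintended pattern.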
There are two problems with this regex-based approach though: