Relevant Tags Only

🏑 Home πŸ“– Chapter Home πŸ‘ˆ Prev πŸ‘‰ Next
⚑  ElasticsearchBook is crafted by Jozef Sorocin (🟒 Book a consulting hour) and powered by:
Β 
An interesting application of partial text matching is an autocomplete with a dropdown:
[Image: autocomplete input with a dropdown of suggestions. Courtesy of Ant Design]
People are unlikely to input the whole tag and will want only the tags that contain their input. These days we even expect auto-correct and natural language understanding, but that's a topic for a different chapter, maybe even a whole new (hand)book 😉.

Use Case

Each of my documents contains an array of tags:
{
  ...
  "tags": ["Race", "Racing", "Mountain Bike", "Horizontal"],
  ...
},
{
  ...
  "tags": ["Tracey Chapman", "Silverfish", "Blue"],
  ...
},
{
  ...
  "tags": ["Surfing", "Race", "Disgrace"],
  ...
}
If the user types race, I only want to return Race, Tracey Chapman, and Disgrace, i.e. only the tags that contain my query string, regardless of case. What's more, I want this to be as fast as possible because it feeds an autocomplete.
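To pin down the expected behavior before touching Elasticsearch, here is a minimal plain-Python sketch of the filtering I'm after. The function name `matching_tags` is my own, and this only illustrates the semantics (case-insensitive substring match, duplicates removed); it is not how the search engine does it.

```python
def matching_tags(docs, query):
    """Return the distinct tags that contain `query`, ignoring case."""
    needle = query.lower()
    found = []
    for doc in docs:
        for tag in doc.get("tags", []):
            # Keep a tag once, in the order it was first seen.
            if needle in tag.lower() and tag not in found:
                found.append(tag)
    return found


docs = [
    {"tags": ["Race", "Racing", "Mountain Bike", "Horizontal"]},
    {"tags": ["Tracey Chapman", "Silverfish", "Blue"]},
    {"tags": ["Surfing", "Race", "Disgrace"]},
]

print(matching_tags(docs, "race"))  # ['Race', 'Tracey Chapman', 'Disgrace']
```

Note that Racing is not returned: it shares the prefix "rac" but does not contain the full string "race".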

Approach

Option #1: Regex

ES has no dedicated array data type: any field can hold one or more values of its mapped type. So we'll apply a simple keyword mapping:
PUT tags_index
{
  "mappings": {
    "properties": {
      "tags": {
        "type": "keyword"
      }
    }
  }
}
and then index the above docs:
POST tags_index/_doc
{"tags":["Race","Racing","Mountain Bike","Horizontal"]}

POST tags_index/_doc
{"tags":["Tracey Chapman","Silverfish","Blue"]}

POST tags_index/_doc
{"tags":["Surfing","Race","Disgrace"]}
Now, extracting and grouping the contents of textual arrays is usually done through a terms aggregation:
POST tags_index/_search
{
  "size": 0,
  "aggs": {
    "topics": {
      "terms": {
        "field": "tags",
        "size": 10
      }
    }
  }
}
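On the client side, the tags come back as buckets under `aggregations.topics.buckets`, each with a `key` and a `doc_count`. A small sketch of pulling them out of an already-parsed response follows. The helper name `bucket_keys` is hypothetical, and the `sample` dict is hand-written and heavily trimmed for illustration; it is not captured output.

```python
def bucket_keys(response, agg_name="topics"):
    """Extract (tag, doc_count) pairs from a terms aggregation response."""
    buckets = (
        response.get("aggregations", {})
        .get(agg_name, {})
        .get("buckets", [])
    )
    return [(bucket["key"], bucket["doc_count"]) for bucket in buckets]


# Hand-written, trimmed stand-in for a parsed search response (assumption).
sample = {
    "aggregations": {
        "topics": {
            "buckets": [
                {"key": "Race", "doc_count": 2},
                {"key": "Blue", "doc_count": 1},
            ]
        }
    }
}

print(bucket_keys(sample))  # [('Race', 2), ('Blue', 1)]
```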
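The plain terms aggregation returns every distinct tag as its own bucket. But we only want the matching ones and luckily, there's the include parameter which accepts a regex. The pattern has to match the whole tag, hence the leading and trailing .* around the case-insensitive character classes: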
POST tags_index/_search
{
  "size": 0,
  "aggs": {
    "topics": {
      "terms": {
        "field": "tags",
        "include": ".*[Rr][Aa][Cc][Ee].*", 
        "size": 10
      }
    }
  }
}
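In an autocomplete, that pattern has to be built from whatever the user typed. The following is a rough sketch of one way to do it; the helper names `include_pattern` and `tags_request` are my own, and the `RESERVED` set is my reading of the characters that Lucene's regexp syntax treats as special, so check it against your ES version.

```python
# Characters assumed to be special in Lucene regexp syntax (incl. optional operators).
RESERVED = set('.?+*|{}[]()"\\#@&<>~')


def include_pattern(query):
    """Turn user input into a case-insensitive 'contains' regex for `include`."""
    parts = []
    for ch in query:
        upper, lower = ch.upper(), ch.lower()
        if upper != lower and len(upper) == 1 and len(lower) == 1:
            parts.append(f"[{upper}{lower}]")  # e.g. r -> [Rr]
        elif ch in RESERVED:
            parts.append("\\" + ch)  # escape regex operators typed by the user
        else:
            parts.append(ch)
    return ".*" + "".join(parts) + ".*"


def tags_request(query, size=10):
    """Build the search body used above, with the pattern derived from `query`."""
    return {
        "size": 0,
        "aggs": {
            "topics": {
                "terms": {
                    "field": "tags",
                    "include": include_pattern(query),
                    "size": size,
                }
            }
        },
    }


print(include_pattern("race"))  # .*[Rr][Aa][Cc][Ee].*
```

The resulting dict can be serialized to JSON and sent as the body of `POST tags_index/_search`.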
There are two problems with this regex-based approach though:
