De-Duplicated Autocomplete

ElasticsearchBook.com is crafted by Jozef Sorocin.
 
As hinted in the previous chapter, there's a new kid on the block for search-as-you-type use cases: a field data type aptly named search_as_you_type.
Let's explore it in the common context of multi-word autocomplete queries where the order of the terms does not matter.

Use Case

I'm building an autocomplete for my catalogue of medical articles. Each article has a title:
{ "id": 1, "title": "Complete Blood Count" }
{ "id": 2, "title": "Preventing Blood Clots" }
{ "id": 3, "title": "What Is the CEA Test?" }
{ "id": 4, "title": "Coughing up Blood" }
...
When I search for blood c, I want docs #1, #2, and #4 to be suggested. Doc #4 should be included even though the word blood is not directly followed by a word starting with c.
On top of that, since some articles have duplicate titles, I only want to show unique titles in my autocomplete.
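To make the requirement concrete, here is a minimal in-memory sketch in Python of the behaviour being asked for. It does not reflect how Elasticsearch evaluates queries. The function names (tokenize, suggest) and the crude tokenizer, which lowercases and splits on non-alphanumeric characters, are assumptions made purely for illustration. Every query term except the last must appear as a whole word, the last term is treated as a prefix, word order is ignored, and duplicate titles are collapsed.

```python
import re

def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def suggest(query, titles):
    """Return unique titles that contain every complete query term as a word
    and at least one word starting with the last (possibly unfinished) term."""
    terms = tokenize(query)
    if not terms:
        return []
    *complete, prefix = terms
    seen, hits = set(), []
    for title in titles:
        words = tokenize(title)
        has_all_terms = all(term in words for term in complete)
        has_prefix = any(word.startswith(prefix) for word in words)
        if has_all_terms and has_prefix and title not in seen:
            seen.add(title)  # remember the title so duplicates are skipped
            hits.append(title)
    return hits

titles = [
    "Complete Blood Count",
    "Preventing Blood Clots",
    "What Is the CEA Test?",
    "Coughing up Blood",
    "Complete Blood Count",  # hypothetical duplicate, added for illustration
]
print(suggest("blood c", titles))
# ['Complete Blood Count', 'Preventing Blood Clots', 'Coughing up Blood']
```

The rest of the chapter is about getting Elasticsearch to produce this same list, ranked by relevance.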

Approach

Let's set up an index where the title is a multi-field covering search_as_you_type to query on and keyword to aggregate on:
PUT articles
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "as_you_type": {
            "type": "search_as_you_type"
          },
          "keyword": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
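For intuition about what the as_you_type subfield adds: according to the Elasticsearch documentation, a search_as_you_type field also indexes shingle subfields (._2gram and ._3gram by default) plus an ._index_prefix subfield for prefix matching. The Python snippet below is only a rough sketch of what the shingle subfields contain. The helper name shingles and the whitespace tokenization are assumptions, and the real analysis chain is more involved.

```python
def shingles(tokens, size):
    """All runs of `size` consecutive tokens, joined by a single space."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]

# Simplified stand-in for the analyzed title "Complete Blood Count".
tokens = "complete blood count".split()

print(shingles(tokens, 2))  # ['complete blood', 'blood count']
print(shingles(tokens, 3))  # ['complete blood count']
```

With the mapping above, these subfields would be addressed as title.as_you_type._2gram and title.as_you_type._3gram.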
Let's then index a few medical article titles, including one deliberate duplicate:
POST articles/_doc
{"title":"Complete Blood Count"}

POST articles/_doc
{"title":"Beyond COVID-19: Can mRNA Treat Diseases, Too?"}

POST articles/_doc
{"title":"Complete Blood Count"}

POST articles/_doc
{"title":"Preventing Blood Clots"}

POST articles/_doc
{"title":"What Is the CEA Test?"}

POST articles/_doc
{"title":"Coughing up Blood"}
 
Now, assuming that people type their search terms in order, we could start with a match_phrase_prefix query followed by a terms aggregation on title.keyword, with the buckets ordered by the maximum score of the documents in each bucket:
POST articles/_search
{
  "size": 0, 
  "query": {
    "match_phrase_prefix": {
      "title": "blood c"
    }
  },
  "aggs": {
    "by_unique_titles": {
      "terms": {
        "field": "title.keyword",
        "order": {
          "max_score": "desc"
        }
      },
      "aggs": {
        "max_score": {
          "max": {
            "script": "_score"
          }
        }
      }
    }
  }
}
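The in-order assumption matters. As a rough illustration, the Python sketch below models phrase-prefix matching in memory: all query terms but the last must appear consecutively and in order, and the word that follows must start with the last term. It is a simplified model, not Elasticsearch's implementation. It ignores slop, max_expansions, and scoring, and the names tokenize and phrase_prefix_match are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def phrase_prefix_match(query, title):
    """True if the complete query terms occur consecutively, in order, and
    are directly followed by a word starting with the last query term."""
    terms = tokenize(query)
    words = tokenize(title)
    if not terms:
        return False
    *complete, prefix = terms
    n = len(terms)
    for start in range(len(words) - n + 1):
        window = words[start:start + n]
        if window[:-1] == complete and window[-1].startswith(prefix):
            return True
    return False

titles = [
    "Complete Blood Count",
    "Preventing Blood Clots",
    "What Is the CEA Test?",
    "Coughing up Blood",
]
matches = [t for t in titles if phrase_prefix_match("blood c", t)]
print(matches)
# ['Complete Blood Count', 'Preventing Blood Clots']
```

Under this model, "Coughing up Blood" is not matched because no word starting with c follows blood, which is exactly the case the use case asks to cover.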
