Homepage Instant Search

🏑 Home πŸ“– Chapter Home πŸ‘‰ Next
⚑  ElasticsearchBook.com is powered by Notion-Paywall.com

Terminology

First off, let's clear up some commonly misused terms:
  • Autofill pre-fills form input fields such as addresses and credit card details
  • Autocomplete suggests complete words or strings without the user needing to type them in full (think typing a few characters into the Wikipedia search input field and then pressing the ↓ and tab keys)
  • Instant search quickly reacts to user queries and suggests matching entries in a dropdown. There's a very fine line between an instant search and an autocomplete.
Given current trends, offering a fast and relevant instant search experience to your users is not only advisable, it's a must. With that said, let's break down what's happening on the MIT homepage, extract some lessons from it, and replicate something similar with Elasticsearch.

Use Case

I'd like to implement an instant, multi-category homepage search just like MIT's, such that as I start typing the word department, only relevant suggestions are presented.
[Screenshot: web.mit.edu captured on Jan 30, 2021]

Observations

I chose MIT's homepage because it already uses Elasticsearch. Many sites do, but this one is particularly easy to explore because its HTTP calls are unmasked.
Let's inspect what's happening under the hood using Chrome Dev Tools:

Observation #1: No typeahead debouncing

If you look closely, each time the user types a new character, a new XHR network request is performed. This happens for the tokens de, dep, depa, depar, etc. This gives the impression that the search is super-reactive but in turn costs network bandwidth and ES processing resources.
What's worse, the outdated requests are not cancelled. By the time the results for depa arrive, the results for dep are irrelevant, yet the dep request still runs to completion.
Then again, why worry about request cancellation when we can prevent the unnecessary requests from being sent in the first place?
🔑 In reactive programming, the technique for preventing multiple sequential typeahead requests is called debouncing. If you're using Angular, this wonderful egghead.io course explains it succinctly. There's a lesson for React devotees too.
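To make the idea concrete outside of any frontend framework, here is a minimal debouncing sketch in Python with asyncio. The Debouncer class, the search stand-in, and the 0.3 second wait are all assumptions made for illustration; nothing here reflects how MIT's frontend works. Each new keystroke cancels the previously scheduled call, so only the last prefix typed within the quiet period triggers a request.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional


class Debouncer:
    """Run `func` only after `wait` seconds have passed without a new call."""

    def __init__(self, func: Callable[[str], Awaitable[None]], wait: float) -> None:
        self._func = func
        self._wait = wait
        self._pending: Optional[asyncio.Task] = None

    def call(self, value: str) -> None:
        # A newer keystroke makes the previously scheduled call obsolete.
        # Note: this also aborts an obsolete call that has already started.
        if self._pending is not None and not self._pending.done():
            self._pending.cancel()
        self._pending = asyncio.create_task(self._fire_later(value))

    async def _fire_later(self, value: str) -> None:
        await asyncio.sleep(self._wait)  # the quiet period
        await self._func(value)


async def demo() -> List[str]:
    sent: List[str] = []

    async def search(prefix: str) -> None:
        # Hypothetical stand-in for the HTTP request to the search endpoint.
        sent.append(prefix)

    debounced = Debouncer(search, wait=0.3)
    for prefix in ["de", "dep", "depa", "depar"]:
        debounced.call(prefix)
        await asyncio.sleep(0.01)  # the user types faster than the wait
    await asyncio.sleep(0.6)  # give the last scheduled call time to fire
    return sent


if __name__ == "__main__":
    print(asyncio.run(demo()))  # only the final prefix is searched
```

With the four prefixes typed in quick succession, only depar reaches the search function. The wait value is a trade-off: a longer wait saves more requests but makes the search feel less instant.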

Observation #2: Querying multiple indices

As we expand the nodes of the JSON response, we see familiar keywords like took, hits, max_score, _source, and so on. What's somewhat striking is the structure of the response.
There are multiple groups of hits instead of just one:
GET https://tlesearch.mit.edu/q/department?count=3
{
  "responses": [
    {
      "hits": {
        "total": 35,
        "hits": [ { "_source": { ... } }, { "_source": { ... } }, { "_source": { ... } } ]
      },
      "status": 200
    },
    {
      "hits": {
        "total": 7,
        "hits": [ { "_source": { ... } }, { "_source": { ... } }, { "_source": { ... } } ]
      },
      "status": 200
    },
    {
      "hits": {
        "total": 0,
        "hits": [ ]
      },
      "status": 200
    }
  ]
}
This is not an arbitrary format that the site builders came up with. It's a typical response of the so-called Multi search API, which is used to query multiple indices in one go!
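As a rough illustration of how a client might consume such a response, the Python sketch below walks the responses array and collects the total and the _source documents of each group. The function name summarize_msearch and the sample _source contents are hypothetical; the response above elides them. The sketch also accepts the object form of total that newer Elasticsearch versions return.

```python
from typing import Any, Dict, List


def summarize_msearch(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one summary per entry of the multi search `responses` array."""
    summary = []
    for position, response in enumerate(payload.get("responses", [])):
        hits = response.get("hits", {})
        total = hits.get("total", 0)
        # Elasticsearch 7+ reports total as {"value": ..., "relation": ...}.
        if isinstance(total, dict):
            total = total.get("value", 0)
        summary.append({
            "position": position,  # same order as the searches in the request
            "status": response.get("status"),
            "total": total,
            "sources": [hit.get("_source", {}) for hit in hits.get("hits", [])],
        })
    return summary


# Sample mirroring the shape shown above; the _source values are made up.
sample = {
    "responses": [
        {"hits": {"total": 35, "hits": [{"_source": {"title": "placeholder"}}]}, "status": 200},
        {"hits": {"total": 7, "hits": [{"_source": {"title": "placeholder"}}]}, "status": 200},
        {"hits": {"total": 0, "hits": []}, "status": 200},
    ]
}

if __name__ == "__main__":
    for group in summarize_msearch(sample):
        print(group["position"], group["status"], group["total"], len(group["sources"]))
```

Because the groups come back in the same order as the searches were sent, the position is enough to map each group to its section of the dropdown.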
🔑 In accordance with the notion of typeless indices, it's customary to store different object types in different indices.
So MIT's resources were split between the indices:
  • mit-tle-links covering the section Top Links
  • mit-tle-locations for the section Locations
  • and so on.
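A request against several indices could be assembled along the following lines. The Multi search API expects a newline-delimited JSON body in which each search is a header line naming the index followed by a line with the query, and the body must end with a newline. This is only a sketch: the helper name build_msearch_body, the title field, and the match_phrase_prefix query are assumptions, since the actual mappings and queries behind MIT's search are not visible from the outside.

```python
import json
from typing import List


def build_msearch_body(indices: List[str], text: str, size: int = 3) -> str:
    """Build an NDJSON body for POST /_msearch, one search per index."""
    lines = []
    for index in indices:
        # Header line: which index this search targets.
        lines.append(json.dumps({"index": index}))
        # Body line: the query itself. The "title" field is hypothetical.
        lines.append(json.dumps({
            "size": size,
            "query": {"match_phrase_prefix": {"title": {"query": text}}},
        }))
    # The Multi search API requires a trailing newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    body = build_msearch_body(["mit-tle-links", "mit-tle-locations"], "department")
    print(body, end="")
```

The resulting string would be sent with the Content-Type header set to application/x-ndjson, and the groups in the response come back in the same order as the indices passed in.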