ElasticsearchBook is crafted by Jozef Sorocin (Book a consulting hour) and powered by:
- Spatialized.io (Elasticsearch & Google Maps consulting)
- in cooperation with Garages-Near-Me.com (Effortless parking across Germany)
Terminology
First off, let's clear up some commonly misused terms:
- Autofill pre-fills form input fields such as addresses and credit card details
- Autocomplete suggests complete words or strings without the user needing to type them in full (think typing a few characters into the Wikipedia search input field and then pressing the arrow and Tab keys)
- Instant search quickly reacts to user queries and suggests matching entries in a dropdown. There's a very fine line between an instant search and an autocomplete.
Given current trends, offering a fast and relevant instant search experience to your users is not only advisable, it's a must. With that said, let's break down what's happening on the MIT homepage, extract some lessons from it, and replicate something similar with Elasticsearch.
Use Case
I'd like to implement an instant, multi-category homepage search just like MIT's, such that as I start typing the word department, only relevant suggestions are presented.
Observations
I chose MIT's homepage because it already uses Elasticsearch. Many sites do, but this one is particularly easy to explore because its HTTP calls are unmasked.
Let's inspect what's happening under the hood using Chrome Dev Tools:
Observation #1: No typeahead debouncing
If you look closely, each time the user types a new character, a new XHR network request is performed. This happens for the tokens de, dep, depa, depar, etc. This gives the impression that the search is super-reactive but in turn costs network bandwidth and ES processing resources.
What's worse, the "outdated" requests are not cancelled. This means that the by-now-irrelevant search results for the token dep will have been superseded by the results for depa, but the dep request will have finished nonetheless. Now, why worry about request cancellation when we can prevent the unnecessary requests from being sent in the first place?
In reactive programming, the technique for preventing multiple sequential typeahead requests is called debouncing. If you're using Angular, this wonderful egghead.io course explains it succinctly. There's a lesson for React devotees too.
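To make the idea concrete, here is a minimal, framework-agnostic debouncing sketch in TypeScript. It is not what MIT's site or the linked lessons use; the helper names (debounce, sendSearchRequest, sentQueries) and the 250 ms delay are assumptions chosen for illustration, and the network call is replaced by a stub that only records the term.

```typescript
// Minimal debounce helper: `fn` runs only after `waitMs` milliseconds
// have passed without another call. Earlier pending calls are dropped.
function debounce<Args extends unknown[]>(
  fn: (...args: Args) => void,
  waitMs: number,
): (...args: Args) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: Args): void => {
    if (timer !== undefined) {
      clearTimeout(timer); // the user typed again: discard the pending call
    }
    timer = setTimeout(() => {
      timer = undefined;
      fn(...args);
    }, waitMs);
  };
}

// Hypothetical stand-in for the real XHR/fetch call; it only records the term.
const sentQueries: string[] = [];
function sendSearchRequest(term: string): void {
  sentQueries.push(term);
}

// Assumed delay of 250 ms; tune it to your users' typing speed.
const debouncedSearch = debounce(sendSearchRequest, 250);

// Simulate a user quickly typing "depar" one character at a time.
["de", "dep", "depa", "depar"].forEach((term) => debouncedSearch(term));
```

With this in place, the four keystrokes above would trigger a single request for depar once the user pauses, instead of four separate requests.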
Observation #2: Querying multiple indices
As we expand the nodes of the JSON response, we see familiar keywords like took, hits, max_score, _source, and so on. What's somewhat striking is the structure of the response. There are multiple groups of hits instead of just one:
GET https://tlesearch.mit.edu/q/department?count=3
{
  "responses": [
    {
      "hits": {
        "total": 35,
        "hits": [ { "_source": { ... } }, { "_source": { ... } }, { "_source": { ... } } ]
      },
      "status": 200
    },
    {
      "hits": {
        "total": 7,
        "hits": [ { "_source": { ... } }, { "_source": { ... } }, { "_source": { ... } } ]
      },
      "status": 200
    },
    {
      "hits": {
        "total": 0,
        "hits": [ ]
      },
      "status": 200
    }
  ]
}
This is not an arbitrary format that the site builders came up with. It's a typical response of the so-called Multi search API, which is used to query multiple indices in one go!
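On the client side, a response of this shape could be turned into dropdown sections roughly as follows. This is a sketch under assumptions: the type and function names are made up for illustration, the _source contents are placeholders because the excerpt above elides them, and the third category label is hypothetical. It relies on the fact that the Multi search API returns its responses in the same order as the searches were sent, and it accepts hits.total both as a plain number (as in the excerpt) and as the object form used by newer Elasticsearch versions.

```typescript
// Shape of one entry in the "responses" array, modelled on the excerpt above.
type HitsTotal = number | { value: number; relation: string };

interface MsearchResponseItem {
  hits?: { total: HitsTotal; hits: Array<{ _source: Record<string, unknown> }> };
  status: number;
}

interface MsearchResponse {
  responses: MsearchResponseItem[];
}

interface SuggestionGroup {
  category: string;
  total: number;
  items: Array<Record<string, unknown>>;
}

// Pairs each response with its category label (same order as the searches)
// and drops failed or empty groups so the dropdown shows only relevant sections.
function toSuggestionGroups(response: MsearchResponse, categories: string[]): SuggestionGroup[] {
  const groups: SuggestionGroup[] = [];
  response.responses.forEach((item, i) => {
    if (item.status !== 200 || !item.hits || item.hits.hits.length === 0) {
      return;
    }
    const total = typeof item.hits.total === "number" ? item.hits.total : item.hits.total.value;
    groups.push({
      category: categories[i] ?? `Group ${i + 1}`,
      total,
      items: item.hits.hits.map((hit) => hit._source),
    });
  });
  return groups;
}

// Placeholder data mirroring the totals in the excerpt (35, 7, 0).
const placeholderHits = [1, 2, 3].map((n) => ({ _source: { placeholder: `hit ${n}` } }));
const sampleResponse: MsearchResponse = {
  responses: [
    { hits: { total: 35, hits: placeholderHits }, status: 200 },
    { hits: { total: 7, hits: placeholderHits }, status: 200 },
    { hits: { total: 0, hits: [] }, status: 200 },
  ],
};

// "Third section" is a hypothetical label; the excerpt does not name it.
const suggestionGroups = toSuggestionGroups(sampleResponse, ["Top Links", "Locations", "Third section"]);
```

The empty third group is filtered out, which matches the goal of presenting only relevant suggestions.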
In accordance with the notion of typeless indices, it's customary to store different object types in different indices.
So MIT's resources were split between the indices:
- mit-tle-links covering the section Top Links
- mit-tle-locations for the section Locations
- and so on.
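A request to the Multi search API is a newline-delimited JSON body: one header line naming the index, followed by one line with the search itself, repeated per index. The sketch below builds such a body for the two indices named above. The function name, the title field, and the match_phrase_prefix query are assumptions for illustration only; the queries MIT actually runs are not visible from the outside.

```typescript
// Builds an NDJSON body for Elasticsearch's _msearch endpoint:
// a header line ({"index": ...}) followed by a search line, per index.
function buildMsearchBody(term: string, indices: string[], count: number): string {
  const lines: string[] = [];
  indices.forEach((index) => {
    lines.push(JSON.stringify({ index }));
    lines.push(
      JSON.stringify({
        size: count,
        // Hypothetical field and query type, chosen only for illustration.
        query: { match_phrase_prefix: { title: term } },
      }),
    );
  });
  // The body must end with a newline character.
  return lines.join("\n") + "\n";
}

const msearchBody = buildMsearchBody("department", ["mit-tle-links", "mit-tle-locations"], 3);
```

The resulting string would be sent as a POST to the _msearch endpoint with the Content-Type header set to application/x-ndjson, and the responses array comes back in the same order as the index list.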