Missing vs. Empty Fields

🏡 Home 📖 Chapter Home 👉 Next

Empty fields?

The topic of empty-ish fields comes up often because:
  • input fields weren't required and were skipped (e.g. optional input fields in signup forms)
  • because of human error (e.g. data entry "interns" forgot to fill out a form field)
  • and last but not least, because of purposefully set empty-ish values (null, [], {}, "")

Disambiguation

Existent Fields

In Elasticsearch, these values would indicate that a given field does exist:
  • Empty strings, such as "" or "-"
  • Arrays containing null and another value, such as [null, "foo"]
  • A custom null-value, defined in field mapping (discussed later on)

Non-Existent Fields

When a doc is inserted into ES, its fields are usually indexed. The indexed value of one of those fields may not exist due to a variety of reasons:
  • The field is simply not present in the source JSON
  • The field in the source JSON is nullnull or [] or [null, null, ...?] or {}
  • The field value was malformed and the ignore_malformed parameter was defined in the index mapping
 
These statements might still be confusing so let's look at a concrete example.

Use case: Detecting Absent Fields

Given the following documents that contain different data types, any of which can be emptyish or even absent (non-existent):
POST myindex/_doc/1
{
	"id": 1
                                  
}

POST myindex/_doc/2
{  
	"id": 2,
  "potentially_absent_field": null
}

POST myindex/_doc/3
{
	"id": 3,
  "potentially_absent_field": 123
}
I want to find documents that strictly lack the potentially_absent_field, i.e. only doc#1.

Approaches

At the first glance it looks like we could reverse the exists query and thus target only those docs that do not contain the potentially_absent_field:
POST myindex/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "exists": {
            "field": "potentially_absent_field"
          }
        }
      ]
    }
  }
}
Somewhat unexpectedly, this returns both #1 and #2! The reason for that is that doc#2's potentially_absent_field cannot be indexed or searched. It is treated as though it has no values and is, for all intents and purposes, considered a non-existent field — just like the one in doc#1.

Already purchased? Sign in here.