Table Cell Metadata


Sub-Aggregation Sanity

As discussed in Aggregation Data Tables, it's convenient to model tables of different columnar metrics as sets of row-based sub-aggregations.
The downside of this approach is having to deal with lots of iterations, lookups, and path accessors in the response JSON when trying to extract the values you actually want to present.
The URL parameter filter_path lets you reduce the response and thus limit which parts of the default JSON are returned. Applying it to the products aggregation from the previous chapter would look like this:
POST products/_search?filter_path=aggregations.by_range.*.*.value
{
  "aggs": {
    "by_range": {
      "filter": {
         ...
yielding the reduced response:
{
  "aggregations" : {
    "by_range" : {
      "1.row|0_to_25" : {
        "2.revenue" : {
          "value" : 20.0
        },
        "1.sold_count" : {
          "value" : 1
        }
      },
      ...
    }
  }
}
💡
filter_path can in fact be used in almost any ES request that returns JSON or YAML (?format=yaml), thus somewhat emulating the widely used command-line utility jq.
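For instance, to peek at a single setting of the products index as YAML, you could run:
GET products/_settings?filter_path=*.settings.index.number_of_replicas&format=yaml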

Response filtering is, however, the last step of the request-making process, so let's first talk about the "optimal" aggregation request structure.

Use Case: Context-Aware Sub-Aggregations

Building on top of the products example, I'd like to label row- and column-based aggregations according to how they'll be rendered in the frontend, and to keep track of the final aggregation ordering so that I don't have to rely on alphanumerically prefixed aggregation names.

Approach

Any frontend-relevant attributes can be stored as key-value pairs right within each sub-aggregation's meta object. What's relevant for us are the row and column indices, plus any frontend post-processing functions that we'd want to apply to the values.
In the following sample, notice the row-level filter aggregation, the column-level metric sub-aggregations, and the meta block attached to each of them.
POST products/_search
{
  "size": 0,
  "aggs": {
    "by_range": {
      "filter": {
        "term": {
          "status": "sold"
        }
      },
      "aggs": {
        "1.row|0_to_25": {
          "filter": {
            "range": {
              "price": {
                "gt": 0,
                "lte": 25
              }
            }
          },
          "meta": {
            "column": {
              "index": 0,
              "name": "Range in USD"
            },
            "row": {
              "index": 0,
              "name": "0 to 25"
            }
          },
          "aggs": {
            "1.sold_count": {
              "cardinality": {
                "field": "_id"
              },
              "meta": {
                "column": {
                  "index": 1,
                  "name": "Sold Count"
                },
                "post_processing": {
                  "formatter": "FUNCTIONS.IDENTITY",
									"args": []
                }
              }
            },
            "2.revenue": {
              "sum": {
                "field": "price"
              },
              "meta": {
                "column": {
                  "index": 2,
                  "name": "Revenue in USD"
                },
                "post_processing": {
                  "formatter": "FUNCTIONS.PPRINT_CURRENCY",
                  "args": ["USD"]
                }
              }
            }
          }
        },
        ...
      }
    }
  }
}
Such a configuration may appear unnecessarily verbose, but in fact it contains just the right amount of information. More importantly, it's highly scalable: my team and I have built battle-tested BI dashboards with dozens of highly nested aggregations using this exact approach, and it's worked wonders. Granted, our production metadata labelling is slightly more complex than the above example, but the principle largely stays the same.
Plus, thanks to the standardized request structure, debugging becomes easier and, if you're working with a strongly typed programming language, more predictable.
🔑
If you're indeed working with a strongly-typed language, you may find quicktype.io very useful as it allows you to convert JSON into typesafe code.
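
To show how the metadata can be consumed, here's a minimal TypeScript sketch that walks the by_range aggregation response and turns each metric sub-aggregation into a positioned, formatted table cell. It assumes the response was fetched without the value-only filter_path from above (so the meta blocks are echoed back alongside the results); the FUNCTIONS registry and extractCells are hypothetical names standing in for your own frontend code:

// Hypothetical formatter registry; the "formatter" strings in the meta blocks
// are looked up here (the names are placeholders, not an Elasticsearch feature).
const FUNCTIONS: Record<string, (value: number, ...args: string[]) => string> = {
  "FUNCTIONS.IDENTITY": (value) => String(value),
  "FUNCTIONS.PPRINT_CURRENCY": (value, currency) =>
    value.toLocaleString("en-US", { style: "currency", currency }),
};

interface CellMeta {
  column: { index: number; name: string };
  row?: { index: number; name: string };
  post_processing?: { formatter: string; args: string[] };
}

interface TableCell {
  rowIndex: number;
  columnIndex: number;
  label: string;
  value: string;
}

// Walks response.aggregations.by_range and emits one cell per metric sub-aggregation.
function extractCells(byRange: Record<string, any>): TableCell[] {
  const cells: TableCell[] = [];
  for (const rowAgg of Object.values(byRange)) {
    const rowMeta = (rowAgg as any)?.meta as CellMeta | undefined;
    if (!rowMeta?.row) continue; // skip doc_count and other non-row keys
    for (const metricAgg of Object.values(rowAgg as Record<string, any>)) {
      const meta = (metricAgg as any)?.meta as CellMeta | undefined;
      if (!meta?.post_processing) continue; // skip doc_count and the row's own meta
      const { formatter, args } = meta.post_processing;
      const format = FUNCTIONS[formatter] ?? FUNCTIONS["FUNCTIONS.IDENTITY"];
      cells.push({
        rowIndex: rowMeta.row.index,
        columnIndex: meta.column.index,
        label: meta.column.name,
        value: format((metricAgg as any).value, ...args),
      });
    }
  }
  return cells;
}

Sorting the resulting cells by rowIndex and then columnIndex reproduces the table exactly as laid out in the metadata.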
