Scripted Metrics

ElasticsearchBook.com is crafted by Jozef Sorocin.

Metrics Aggregations

The aggregations in this family compute numeric metrics from values extracted from the matched documents, typically from document fields but also from scripts. Examples include `avg`, `sum`, `min`, `max`, `stats`, `cardinality`, and `percentiles`.
Generally speaking, ES aggregations are run in a distributed, map/reduce-like fashion whereby the intermediate results are collected from all available shards and combined into the final result.
Scripted metric aggregations let us hook into each phase of this process and compute a metric of our choice, be it a single numeric value, a hash map of key-metric pairs, a sorted list of values, etc.
Scripted metrics are composed of the following scripts:
  1. init_script (optional): executed prior to any collection of documents. Allows the aggregation to set up an initial state. It's the only one of the four scripts that may be omitted.
  2. map_script: executed once per collected document.
  3. combine_script: executed once on each shard after document collection is complete. Allows the aggregation to consolidate the state collected on that shard before it is returned.
  4. reduce_script: executed once on the coordinating node after all shards have returned their results. It receives the combine_script results of all shards as an array named `states`.
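To make the four phases concrete, here is a minimal Python model of how they fit together. It is a sketch, not Elasticsearch's implementation: the "shards" are hypothetical in-memory lists of documents, and the Python callables stand in for the Painless scripts. As in the real aggregation, `init` and `map` mutate a fresh per-shard `state`, `combine` turns that state into one value per shard, and `reduce` receives the list of per-shard values (the `states` array).

```python
from typing import Any, Callable, Dict, List, Optional

Doc = Dict[str, Any]
State = Dict[str, Any]


def run_scripted_metric(
    shards: List[List[Doc]],
    map_fn: Callable[[State, Doc], None],
    combine_fn: Callable[[State], Any],
    reduce_fn: Callable[[List[Any]], Any],
    init_fn: Optional[Callable[[State], None]] = None,
) -> Any:
    """Simulate the init/map/combine/reduce phases of a scripted metric."""
    states: List[Any] = []  # one combined result per shard
    for shard_docs in shards:
        state: State = {}            # fresh state for every shard
        if init_fn is not None:      # init_script is optional
            init_fn(state)
        for doc in shard_docs:       # map_script: once per collected document
            map_fn(state, doc)
        states.append(combine_fn(state))  # combine_script: once per shard
    return reduce_fn(states)         # reduce_script: once, on the coordinating node


# Example: a plain sum of `cost` over two hypothetical shards.
shards = [
    [{"id": "1", "cost": 42}, {"id": "2", "cost": 67}],
    [{"id": "4", "cost": 11}],
]


def init(state: State) -> None:
    state["costs"] = []


def map_doc(state: State, doc: Doc) -> None:
    state["costs"].append(doc["cost"])


def combine_state(state: State) -> int:
    return sum(state["costs"])


def reduce_states(states: List[int]) -> int:
    return sum(states)


total = run_scripted_metric(shards, map_doc, combine_state, reduce_states, init_fn=init)
print(total)  # 120
```

Keeping `init_fn` as an optional keyword argument mirrors the fact that `init_script` is the only script that may be left out.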

Use Case: Distinct Sum & Average

Given this denormalized data where the cost is always the same for a given id:
{ "id" : "1", "cost" : 42 }

{ "id" : "2", "cost" : 67 }
{ "id" : "2", "cost" : 67 }
{ "id" : "2", "cost" : 67 }

{ "id" : "4", "cost" : 11 }
{ "id" : "4", "cost" : 11 }

{ "id" : "5", "cost" : 10 }
{ "id" : "6", "cost" : 99 }
How can I get the AVERAGE of the cost, but DISTINCT by id? In other words, the equivalent of this SQL:
SELECT AVG(T.cost)
FROM 
    (SELECT DISTINCT id, cost
     FROM records_table) as T
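One way to express this with the four phases is to collect an `id -> cost` map per shard in the map phase, return that map from the combine phase, and merge the maps by `id` in the reduce phase before summing and averaging. Because the cost is always the same for a given id, deduplicating by id matches `SELECT DISTINCT id, cost`. Merging on the coordinating node matters because documents with the same id can live on different shards. The following is a Python sketch of that idea, not Painless and not necessarily the book's own solution; the two-shard split of the sample documents is an illustrative assumption.

```python
from typing import Any, Dict, List

Doc = Dict[str, Any]

# The sample documents from above, split across two hypothetical shards.
# id "2" and id "4" appear on both shards.
shards: List[List[Doc]] = [
    [{"id": "1", "cost": 42}, {"id": "2", "cost": 67},
     {"id": "2", "cost": 67}, {"id": "4", "cost": 11}],
    [{"id": "2", "cost": 67}, {"id": "4", "cost": 11},
     {"id": "5", "cost": 10}, {"id": "6", "cost": 99}],
]


def map_phase(docs: List[Doc]) -> Dict[str, int]:
    """Map: record each document's cost under its id (duplicates collapse)."""
    state: Dict[str, int] = {}
    for doc in docs:
        state[doc["id"]] = doc["cost"]
    return state


def combine_phase(state: Dict[str, int]) -> Dict[str, int]:
    """Combine: return the per-shard id -> cost map unchanged."""
    return state


def reduce_phase(states: List[Dict[str, int]]) -> Dict[str, float]:
    """Reduce: merge the shard maps by id, then sum and average."""
    merged: Dict[str, int] = {}
    for shard_state in states:
        merged.update(shard_state)  # an id seen on several shards is kept once
    total = sum(merged.values())
    avg = total / len(merged) if merged else 0.0
    return {"distinct_sum": total, "distinct_avg": avg}


states = [combine_phase(map_phase(docs)) for docs in shards]
result = reduce_phase(states)
print(result)  # {'distinct_sum': 229, 'distinct_avg': 45.8}

# For comparison, a plain average over all 8 documents counts id "2" three times:
all_costs = [doc["cost"] for docs in shards for doc in docs]
print(sum(all_costs) / len(all_costs))  # 46.75
```

The comparison at the end shows why the built-in `avg` alone is not enough here: it weights each id by how many times it was denormalized.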
