ElasticsearchBook is crafted by Jozef Sorocin and powered by:
- Spatialized.io (Elasticsearch & Google Maps consulting)
- in cooperation with Garages-Near-Me.com (Effortless parking across Germany)
In the previous chapter we discussed the usefulness of bucket_script aggregations, which allow for per-bucket computations. When combined with a scripted_metric aggregation, other practical applications arise. Let me illustrate.
Use Case: Device Fleet Health
I have a fleet of devices, each of which posts a message to ES every 10 minutes in the form of:
{
  "deviceId": "unique-device-id",
  "timestamp": "2021-01-19 06:54:00",
  "message": "morning ping at 06:54 AM"
}
I'm trying to get a sense of the health of this fleet by finding devices that haven't reported anything in a given period of time. What I dream of is getting:
- the total count of distinct deviceIds seen in the last 7 days
- the total count of deviceIds NOT seen in the last hour
- the IDs of the devices that stopped reporting (i.e. reported in the last 2 hours but not in the last hour)
Approach: Bucket Scripts & Scripted Metrics
Let's first set up our fleet_messages index:
PUT fleet_messages
{
  "mappings": {
    "properties": {
      "timestamp": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss"
      },
      "message": {
        "type": "text"
      },
      "deviceId": {
        "type": "keyword"
      }
    }
  }
}
and then ingest a few messages that occurred between Jan 13, 2021 and Jan 20, 2021.
POST fleet_messages/_doc
{
  "deviceId": "device#1",
  "timestamp": "2021-01-14 10:00:00",
  "message": "device#1 in the last week"
}

POST fleet_messages/_doc
{
  "deviceId": "device#1",
  "timestamp": "2021-01-20 15:40:00",
  "message": "device#1 in the last 2 hours"
}

POST fleet_messages/_doc
{
  "deviceId": "device#1",
  "timestamp": "2021-01-20 16:52:00",
  "message": "device#1 in the last hour"
}

POST fleet_messages/_doc
{
  "deviceId": "device#2",
  "timestamp": "2021-01-15 09:00:00",
  "message": "device#2 in the last week"
}

POST fleet_messages/_doc
{
  "deviceId": "device#2",
  "timestamp": "2021-01-20 15:58:00",
  "message": "device#2 in the last 2hrs"
}
After that, let's assume it's exactly 5 PM on Jan 20, 2021.
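Before writing any aggregations, it can help to know what answers to expect. The following is a minimal Python sketch that computes the three wished-for metrics directly from the five sample messages, with no Elasticsearch involved. The names MESSAGES, seen_since and fleet_health are made up for this illustration, and it assumes that "NOT seen in the last hour" is counted among the devices seen in the last 7 days, using rolling windows that end at 5 PM.

```python
from datetime import datetime, timedelta

# Python equivalent of the mapping's "yyyy-MM-dd HH:mm:ss" date format.
FMT = "%Y-%m-%d %H:%M:%S"

# The five sample messages ingested above (message text omitted).
MESSAGES = [
    ("device#1", "2021-01-14 10:00:00"),
    ("device#1", "2021-01-20 15:40:00"),
    ("device#1", "2021-01-20 16:52:00"),
    ("device#2", "2021-01-15 09:00:00"),
    ("device#2", "2021-01-20 15:58:00"),
]

# "Exactly 5 PM on Jan 20, 2021".
NOW = datetime(2021, 1, 20, 17, 0, 0)


def seen_since(cutoff):
    """Return the set of deviceIds with at least one message in [cutoff, NOW]."""
    return {
        device_id
        for device_id, ts in MESSAGES
        if cutoff <= datetime.strptime(ts, FMT) <= NOW
    }


def fleet_health():
    """Compute the three fleet health metrics (hypothetical helper)."""
    last_7d = seen_since(NOW - timedelta(days=7))
    last_2h = seen_since(NOW - timedelta(hours=2))
    last_1h = seen_since(NOW - timedelta(hours=1))
    return {
        # 1. distinct devices seen in the last 7 days
        "seen_last_7d": len(last_7d),
        # 2. devices seen in the last 7 days but NOT in the last hour (assumed reading)
        "not_seen_last_1h": len(last_7d - last_1h),
        # 3. devices seen in the last 2 hours but not in the last hour
        "stopped_reporting": sorted(last_2h - last_1h),
    }


print(fleet_health())
# {'seen_last_7d': 2, 'not_seen_last_1h': 1, 'stopped_reporting': ['device#2']}
```

So for this sample data the aggregations should report 2 devices seen in the last week, 1 device missing in the last hour, and device#2 as the one that stopped reporting.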
1. The total count of distinct deviceIds seen in the last 7 days
We're going to use a range filter to restrict the timestamp, plus a cardinality aggregation to obtain the unique device count. In pseudo-code:
"last7d": {
  "filter":
    "range": "2021-01-13 <= timestamp <= 2021-01-20"
  "aggs":
    "cardinality": "on the field deviceId"
}
Translated to query DSL:
"last7d": {
  "filter": {
    "range": {
      "timestamp": {
        "gte": "2021-01-13 00:00:00",
        "lte": "2021-01-20 17:00:00"
      }
    }
  },
  "aggs": {
    "uniq_device_count": {
      "cardinality": {
        "field": "deviceId"
      }
    }
  }
}
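The snippet above is only a fragment; it has to sit under a top-level aggs key of a search request. As a rough sketch of how that could look from a script, the Python below wraps the fragment in a full request body and shows where the count would be found in the response. The helper names build_last7d_request and read_unique_device_count are hypothetical, and setting size to 0 is a choice made here (to skip the hits), not something the fragment itself requires.

```python
import json


def build_last7d_request(gte, lte):
    """Wrap the last7d aggregation in a full _search request body."""
    return {
        "size": 0,  # assumption: only the aggregation is needed, not the hits
        "aggs": {
            "last7d": {
                "filter": {
                    "range": {"timestamp": {"gte": gte, "lte": lte}}
                },
                "aggs": {
                    "uniq_device_count": {"cardinality": {"field": "deviceId"}}
                },
            }
        },
    }


def read_unique_device_count(response):
    """Pull the cardinality value out of a search response dictionary."""
    return response["aggregations"]["last7d"]["uniq_device_count"]["value"]


# Same bounds as in the query DSL above, in the index's date format.
body = build_last7d_request("2021-01-13 00:00:00", "2021-01-20 17:00:00")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of POST fleet_messages/_search. Keep in mind that cardinality is an approximate aggregation, although for a fleet as small as the two sample devices one would expect it to return 2.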