An introduction to Elasticsearch Query Language (ES|QL) – Part 2

In this blog post, we’ll explore some advanced features that make querying in ES|QL both flexible and efficient. Histograms aggrege data over time ranges, which can reveal trends and patterns in your datasets. The WHERE command, which allows you to filter data precisely, enhancing the relevance of your queries. We’ll cover field type conversions, which ensures your data types are correctly interpreted and manipulated within your queries. Lastly, we’ll work with multivalued fields, a common scenario in Elasticsearch that requires special handling to extract meaningful insights.

Histograms – Similar to the ‘timechart’ command widely used in Splunk, in ES|QL the STATS by BUCKET command can be used to produce similar results. The below command counts all events over a 24 hour period and splits the results into 24 buckets. In this example some hours contain no events so only 20 buckets are created.

from kibana_sample_data_logs
| STATS count = COUNT(*) BY bucket(@timestamp, 24, "2024-06-09T00:00:00.000", "2024-06-09T23:59:59.000")

WHERE command – This can be used to query the data that has been returned by the source command ‘FROM’. In the first example the ‘>’ quantifier is used to return hosts with an average bytes values greater than 5500. In the 2nd example the ‘like’ operator is used to return hosts containing ‘.org’ in their name (the visualisation style has been changed from ‘Bar verticle’ to ‘Donut’ in the visualisation editor).

from kibana_sample_data_logs
| stats avg_bytes = avg(bytes) by host
| WHERE avg_bytes>5500
from kibana_sample_data_logs
| stats avg_bytes = avg(bytes) by host
| WHERE host LIKE "*.org*"

Field type conversions – In the previous ES|QL blog we used the DISSECT command to extract fields from the ‘message’ field. These new fields are returned as ‘keyword’ type fields. In this example the eval command uses ‘to_date’ and ‘to_ip’ functions to change the field types of ‘date’ and ‘clientip’. This allows CIDR matching for the ‘clientip’ field and aggregation of events using time functions on ‘date’.

from kibana_sample_data_logs
| keep message
| dissect message "%{clientip} - - [%{date}] \"%{request_type} "
| eval date = to_datetime(date), clientip = to_ip(clientip)

Working with multivalued fields (numbers) – The elastic demo data does not contain multivalued (mv) fields so the ROW command is used to generate a mv field containing 3 numbers. The EVAL command uses the MV_AVG function to return a new field conatining the average of the 3 values. MV_COUNT/MAX/MEDIAN/MIN/SUM can also be used to manipulate the number values.

ROW EventDurationMS=[12, 16, 75]
| EVAL avg_EventDurationMS = MV_AVG(EventDurationMS)

Working with multivalued fields (strings) – Again, the ROW command is used to create a MV field containing 4 names as strings.

ROW username=["Dave", "Mike", "Keith", "Ben"]

If you have mv fields in your data and they are not usable, the MV_EXPAND command can be used to split these out into 4 seperate values.

ROW username=["Dave", "Mike", "Keith", "Ben"]
| MV_EXPAND username

MV_FIRST/LAST can be used to return the earliest or latest value in the mv field, MV_DEDUPE can be used to remove duplicates and MV_CONCAT can used to changed a MV field into a single value column containing all values.

ROW username=["Dave", "Mike", "Keith", "Ben"]
| EVAL combinedusername = MV_CONCAT(username, ", ")

All commands within this blog are covered in the ES|QL reference documentation, HERE.

Leave a Reply

Discover more from Planned Link

Subscribe now to keep reading and get access to the full archive.

Continue reading