KQL Fundamentals – Summarize
Level: Beginner | Reading time: 5 minutes
Let’s continue our series on KQL with a focus on Cyber Security. Let’s talk today about how to use the Summarize operator. Other posts can be seen in our KQL category.
We can think of Summarize as an aggregator, as it produces a table that groups (or summarizes) the contents of the input table. In an analogy with SQL commands, it can be compared to GROUP BY.
In the following example, I am listing in Azure Sentinel the SecurityEvent table and listing with Summarize using count() function and grouping by activity. We can see that we have the events and activities and a count_ column that aggregates the value of each one of them.
I could also reference another column to compose the final result, as in the example below:
Notice that our column doesn’t look very interesting with the name count_.
Here the tip is to rename the column to make the result easier to understand, using the example below:
I can also make a call to an aggregation function like count() shown above and to other aggregators like avg() and similar ones with column names as arguments, examples of aggregators can be seen at the following link:
summarize operator – Azure Data Explorer | Microsoft Docs
For example, I can search for the min and max timestamp of all records in the Activities table. There is no group-by clause, so there is only one line in the output:
Activities | summarize Min = min(Timestamp), Max = max(Timestamp)
Let’s see another example from the Azure Sentinel GitHub repository:
In this example, the summarize is being used with the make_set() aggregator in the first example, and in the second part, the column is renamed to FailureCount.
let ExcludedIP = dynamic ([
'172.24.1.4'
]);
let PreviousFailures = SecurityEvent
| where TimeGenerated between (ago(60m) .. ago(10m))
| where EventID == 4625
| where SubStatus != "0xc0000064"
| where AccountType != 'Machine'
| where IpAddress !in (ExcludedIP)
| summarize FailureCount=count() by TargetAccount, IpAddress, bin(TimeGenerated, 50m)
| where FailureCount >= 50
| summarize make_set(strcat(TargetAccount, ' ', IpAddress));
SecurityEvent
| where TimeGenerated > ago(10m)
| where EventID == 4625
| where SubStatus != "0xc0000064"
| where AccountType != 'Machine'
| where IpAddress !in (ExcludedIP)
| summarize FailureCount=count() by TargetAccount, IpAddress, bin(TimeGenerated, 10m)
| where FailureCount >= 10
| where strcat(TargetAccount, ' ', IpAddress) !in (PreviousFailures)
That’s it for today, hope it helps!