Aggregators¶
Aggregators are the different ways rare counts and outputs data as it is processed. Each aggregator takes the matched and extracted expression in a particular format and either counts or analyzes the values.
More Examples
More examples of each aggregator can be found in examples. For CLI documentation, run rare help.
Filter¶
rare help filter
Summary¶
Filter is a command used to match and (optionally) extract that match without
any aggregation. It's effectively a grep, or a combination of grep, awk, and/or sed.
Example¶
Extract two numbers from access.log:
$ rare filter -n 4 -m "(\d{3}) (\d+)" -e "{1} {2}" access.log
404 169
404 169
404 571
404 571
Matched: 4 / 4
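The match-then-extract idea can be sketched in a few lines of Python. This is only an illustration of the concept, not how rare is implemented; the function name filter_lines and the sample log lines are hypothetical, and the template handling only understands plain {N} group references.

```python
import re

def filter_lines(lines, pattern, template):
    """Yield the template filled in with match groups for each matching line."""
    regex = re.compile(pattern)
    for line in lines:
        match = regex.search(line)
        if match is None:
            continue  # non-matching lines are skipped, much like grep
        # Replace each "{N}" placeholder with capture group N.
        yield re.sub(r"\{(\d+)\}", lambda m: match.group(int(m.group(1))), template)

# Hypothetical sample data, loosely shaped like an access log.
sample = [
    '10.0.0.1 - - "GET /a HTTP/1.1" 404 169',
    'no status here',
    '10.0.0.2 - - "GET /b HTTP/1.1" 404 571',
]
extracted = list(filter_lines(sample, r"(\d{3}) (\d+)", "{1} {2}"))
print(extracted)
```

Only the two lines that match the pattern produce output; the middle line is dropped.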
Histogram¶
rare help histogram
Summary¶
The histogram format outputs an aggregation by counting the occurrences of an extracted match. That is, on every line a regex is matched (or not), and the matched groups can be used to extract and build a key that acts as the bucket name.
Example¶
Extract HTTP verb, URL and status code. Key off of status code and verb.
Tip
Use -x to display percentages and a simple bargraph.
$ rare histo -m '"(\w{3,4}) ([A-Za-z0-9/.]+).*" (\d{3})' -e '{3} {1}' access.log
200 GET 160663
404 GET 857
304 GET 53
200 HEAD 18
403 GET 14
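Conceptually, the histogram is a counter keyed by the extracted string. The Python sketch below illustrates that idea under stated assumptions: the histogram function and the sample lines are hypothetical, and the key is built by joining capture groups with a space rather than by evaluating a rare expression.

```python
import re
from collections import Counter

def histogram(lines, pattern, key_groups):
    """Count how often each key (built from the given capture groups) occurs."""
    regex = re.compile(pattern)
    counts = Counter()
    for line in lines:
        match = regex.search(line)
        if match:
            # Build the bucket name from the selected groups, e.g. "200 GET".
            key = " ".join(match.group(g) for g in key_groups)
            counts[key] += 1
    return counts

# Hypothetical sample data.
sample = [
    '"GET /index.html HTTP/1.1" 200',
    '"GET /missing HTTP/1.1" 404',
    '"GET /index.html HTTP/1.1" 200',
    '"HEAD /index.html HTTP/1.1" 200',
]
counts = histogram(sample, r'"(\w{3,4}) ([A-Za-z0-9/.]+).*" (\d{3})', (3, 1))
for key, count in counts.most_common():
    print(key, count)
```

Passing (3, 1) as the key groups mirrors the '{3} {1}' expression in the example above: status code first, then verb.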
Bar Graph¶
rare help bargraph
Summary¶
Similar to histogram or table, bargraph can generate a stacked or grouped bargraph by one or two keys.
Example¶
Color Coded Keys
When run in a terminal, the keys below will be color-coded. Alternatively, you can leave
off -s (stacking) to see each key displayed vertically.
$ rare bars -sz -m "\[(.+?)\].*\" (\d+)" \
-e "{$ {buckettime {1} year nginx} {bucket {2} {multi 10 10}}}" \
testdata/*
0 200 1 300 2 400
2019 000000000222222222222222222222222222222 458,136
2020 0000000000000000002222222222222222222222222222222 576,030
Matched: 1,034,166 / 1,034,166
Numerical Analysis¶
rare help analyze
Summary¶
This command extracts a number from the match and runs basic analysis on that number (such as mean, median, mode, and quantiles).
Example¶
Note
-x or --extra captures more information (Median, Mode, and Percentiles),
but dramatically slows down the analysis.
$ rare analyze --extra \
-m '"(\w{3,4}) ([A-Za-z0-9/.@_-]+).*" (\d{3}) (\d+)' \
-e "{4}" testdata/access.log
Samples: 161,622
Mean: 2,566,283.9616
Min: 0.0000
Max: 1,198,677,592.0000
Median: 1,021.0000
Mode: 1,021.0000
P90: 19,506.0000
P99: 64,757,808.0000
P99.9: 395,186,166.0000
Matched: 161,622 / 161,622
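The statistics reported above can be illustrated with Python's standard library. This is a sketch under assumptions, not rare's implementation: the analyze and percentile helpers are hypothetical, and the nearest-rank percentile definition is chosen for simplicity and may differ from the one rare uses. In this sketch the basic statistics need only a single pass, while the extra ones need the values kept and sorted.

```python
import math
import statistics

def percentile(sorted_values, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]

def analyze(values, extra=False):
    """Return basic statistics; with extra=True also median, mode, and P90."""
    result = {
        "samples": len(values),
        "mean": statistics.fmean(values),
        "min": min(values),
        "max": max(values),
    }
    if extra:
        ordered = sorted(values)  # the extra statistics need the sorted values
        result["median"] = statistics.median(ordered)
        result["mode"] = statistics.mode(ordered)
        result["p90"] = percentile(ordered, 90)
    return result

# Hypothetical sample: one large outlier pulls the mean far above the median.
stats = analyze([10, 20, 20, 30, 1000], extra=True)
print(stats)
```

As in the example output above, a few very large values can push the mean well above the median.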
Table¶
rare help table
Summary¶
Create a 2D view (table) of data extracted from a file. The expression needs to
yield two dimensions, separated either by \x00 or with the {$ a b} helper. The first
element is the column name, followed by the row name.
Example¶
$ rare tabulate -m "(\d{3}) (\d+)" \
-e "{$ {1} {bucket {2} 100000}}" -sk access.log
          200      404  304  403  301  206
0         153,271  860  53   14   12   2
1000000   796      0    0    0    0    0
2000000   513      0    0    0    0    0
7000000   262      0    0    0    0    0
4000000   257      0    0    0    0    0
6000000   221      0    0    0    0    0
5000000   218      0    0    0    0    0
9000000   206      0    0    0    0    0
3000000   202      0    0    0    0    0
10000000  201      0    0    0    0    0
11000000  190      0    0    0    0    0
21000000  142      0    0    0    0    0
15000000  138      0    0    0    0    0
8000000   137      0    0    0    0    0
22000000  123      0    0    0    0    0
14000000  121      0    0    0    0    0
16000000  110      0    0    0    0    0
17000000  99       0    0    0    0    0
34000000  91       0    0    0    0    0
Matched: 161,622 / 161,622
Rows: 223; Cols: 6
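The two-dimensional counting can be sketched as a counter keyed by a (column, row) pair. The Python below is illustrative only: tabulate and bucket are hypothetical helpers, and bucket is assumed to round a value down to the nearest multiple of the bucket size.

```python
import re
from collections import Counter

def bucket(value, size):
    """Round value down to the nearest multiple of size (assumed behavior)."""
    return (value // size) * size

def tabulate(lines, pattern, bucket_size):
    """Count matches per (column, row) cell: column is group 1, row is bucketed group 2."""
    regex = re.compile(pattern)
    cells = Counter()
    for line in lines:
        match = regex.search(line)
        if match:
            column = match.group(1)
            row = bucket(int(match.group(2)), bucket_size)
            cells[(column, row)] += 1
    return cells

# Hypothetical sample data: "<status> <bytes>".
sample = ["200 150", "200 99", "404 120", "200 180", "404 20"]
cells = tabulate(sample, r"(\d{3}) (\d+)", 100)

columns = sorted({c for c, _ in cells})
rows = sorted({r for _, r in cells})
print("\t" + "\t".join(columns))
for r in rows:
    print(str(r) + "\t" + "\t".join(str(cells[(c, r)]) for c in columns))
```

Cells that never occurred read as 0, which is why sparse tables show mostly zeros.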
Heatmap¶
rare help heatmap
Summary¶
Create a dense, color-coded version of table data by using cells to display
the strength of a value. The expression needs to yield two dimensions, separated
either by \x00 or with the {$ a b} helper. The first
element is the column name, followed by the row name.
Example¶
$ rare heatmap -m '\[(.+?)\].*" (\d+)' \
-e "{timeattr {time {1}} yearweek}" -e "{2}" access.log
- 0 5 22,602 9 45,204
2019-34..2019-41..2019-50..2020-15..2020-23..2020-31...2020-9
200 1111111111111111111111111111111111111111111111111111111-11111
206 -------------------------------------------------------------
301 -------------------------------------------------------------
304 -------------------------------------------------------------
400 -------------------------------------------------------------
404 33516265914153253212111-1511-13-141-1412-132111--14-1-1-13211
405 -------------------------------------------------------------
408 -------------------------------------------------------------
Matched: 1,035,666 / 1,035,666 (R: 8; C: 61)
Reduce¶
rare help reduce
Summary¶
Create a set of values or a table based on an expression that accumulates results. It is more powerful than table or histogram because it can interpret data in multiple ways in a single output. It can also group and sort results.
Usage:
- Extract data from a regex using --match (-m)
- Optionally group the data into buckets using --group (-g). Can use key=value format to give the column a name
- Specify one or more "accumulators", which are expressions. {.} represents the current value, so for example, {sumi {.} {1}} adds the match {1} to the current value.
  - Can specify only expression, key=expression format or key:initial=expression.
  - Can reference past accumulators by key, eg {divi {key1} {key2}}
- Optionally --sort the data based on an expression or reference to an accumulator. eg. --sort {key1}. Can reverse with the --sort-reverse flag
Example¶
$ rare reduce -m "(\d{3}) (\d+)" -g "http={1}" -a "total={sumi {.} {2}}" \
-a "count={sumi {.} 1}" -a "avg={divi {total} {count}}" \
--sort="-{avg}" access.log
http total count avg
206 260976 6 43496
200 1481979390 318003 4660
404 275969007 686541 401
405 279264 708 394
301 3885 21 185
400 5027469 30165 166
304 0 48 0
408 0 174 0
Matched: 1,035,666 / 1,035,666
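The accumulator model in the example above can be sketched in Python. This is an illustration under assumptions, not rare's implementation: reduce_lines is a hypothetical function, the three accumulators are hard-coded rather than parsed from expressions, and divi is assumed to be integer division (which is consistent with the whole-number averages in the output above).

```python
import re

def reduce_lines(lines, pattern):
    """Group by capture group 1 and accumulate total, count, and avg per group."""
    regex = re.compile(pattern)
    groups = {}
    for line in lines:
        match = regex.search(line)
        if not match:
            continue
        # Group rows by the first capture group, like -g "http={1}".
        acc = groups.setdefault(match.group(1), {"total": 0, "count": 0, "avg": 0})
        acc["total"] += int(match.group(2))        # total={sumi {.} {2}}
        acc["count"] += 1                          # count={sumi {.} 1}
        acc["avg"] = acc["total"] // acc["count"]  # avg={divi {total} {count}}
    # Order groups by avg, largest first, as in the example output.
    return sorted(groups.items(), key=lambda item: item[1]["avg"], reverse=True)

# Hypothetical sample data: "<status> <bytes>".
sample = ["200 100", "404 10", "200 300", "404 20", "304 0"]
table = reduce_lines(sample, r"(\d{3}) (\d+)")
for key, acc in table:
    print(key, acc["total"], acc["count"], acc["avg"])
```

Because avg refers to total and count, it has to be evaluated after them on each line, which is why it is listed last.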
Sorting¶
Many of the aggregators support changing the order in which the data is displayed. You
can change this from the default by setting either the --sort flag or, for tables, the
--sort-rows and --sort-cols flags.
These are the supported sorters:
- text -- Pure alphanumeric sort. Fastest, but can sort numbers oddly (eg. would sort 1, 11, 2, ...)
- numeric -- Attempts to parse the value as numeric. If unable to parse, falls back to alphanumeric (Default)
- contextual -- Tries to use context to be smart about sorting, eg if it sees a month or weekday name, will sort by that. Falls back to numeric
- date -- Parses the value as if it were a date. Falls back to contextual
- value -- Orders the results based on their aggregated value. eg. would put the most frequent item at the top. Defaults to descending order
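The difference between the text and numeric sorters can be shown with a short Python sketch. The key functions below are hypothetical and only illustrate the idea; in particular, placing unparseable values after the numbers is an assumption, not documented rare behavior.

```python
def text_key(value):
    """Pure string comparison: "11" sorts before "2"."""
    return value

def numeric_key(value):
    """Compare parseable numbers by value; fall back to text for everything else."""
    try:
        return (0, float(value), "")
    except ValueError:
        # Assumption: non-numeric keys are placed after numeric ones.
        return (1, 0.0, value)

values = ["2", "11", "1", "abc"]
by_text = sorted(values, key=text_key)
by_numeric = sorted(values, key=numeric_key)
print(by_text)
print(by_numeric)
```

The text ordering shows the "1, 11, 2" oddity mentioned above, while the numeric ordering puts 2 before 11.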
Modifiers¶
In addition to the sorting method, you can also modify the sort by adding a colon and the modifier, eg: numeric:desc
These are the supported modifiers:
- :reverse -- Reverse of the "default"
- :asc -- Ascending order
- :desc -- Descending order
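How a sorter name and modifier combine can be sketched as follows. This is a hypothetical helper, not rare's parser: resolve_direction and its default_descending parameter are invented for illustration, with the parameter standing in for each sorter's own default direction.

```python
def resolve_direction(spec, default_descending=False):
    """Return (sorter_name, descending) for a spec such as "numeric:desc"."""
    name, _, modifier = spec.partition(":")
    if modifier == "":
        descending = default_descending
    elif modifier == "asc":
        descending = False
    elif modifier == "desc":
        descending = True
    elif modifier == "reverse":
        descending = not default_descending  # flip whatever the default is
    else:
        raise ValueError(f"unknown sort modifier: {modifier!r}")
    return name, descending

name, descending = resolve_direction("numeric:desc")
ordered = sorted([3, 1, 2], reverse=descending)
print(name, ordered)
```

With a sorter whose default is descending, such as value, :reverse would yield ascending order in this sketch.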