Regex Parser Transform
The Vector regex_parser transform accepts log events and allows you to parse a log field's value with a Regular Expression.
Configuration
[transforms.my_transform_id]
# REQUIRED - General
type = "regex_parser" # example, must be: "regex_parser"
inputs = ["my-source-id"] # example
regex = "^(?P<timestamp>[\\w\\-:\\+]+) (?P<level>\\w+) (?P<message>.*)$" # example

# OPTIONAL - General
drop_field = true # default
field = "message" # default

# OPTIONAL - Types
[transforms.my_transform_id.types]
status = "int"
Options
drop_field
Whether the specified field should be dropped (removed) after parsing.
Default: true
field
The log field to parse. See Failed Parsing for more info.
Default: "message"
regex
The Regular Expression to apply. Do not include the leading or trailing /. See Failed Parsing and Regex Debugger for more info.
types
Key/Value pairs representing mapped log field types. See Regex Syntax for more info.
[field-name]
A definition of log field type conversions. The key is the log field name and the value is the type. strptime specifiers are supported for the timestamp type.
Accepted values: "bool" "float" "int" "string" "timestamp"
Output
Given the following log line:
{"message": "5.86.210.12 - zieme4647 5667 [19/06/2019:17:20:49 -0400] \"GET /embrace/supply-chains/dynamic/vertical\" 201 20574"}
And the following configuration:
[transforms.<transform-id>]
type = "regex_parser"
field = "message"
regex = '^(?P<host>[\w\.]+) - (?P<user>[\w]+) (?P<bytes_in>[\d]+) \[(?P<timestamp>.*)\] "(?P<method>[\w]+) (?P<path>.*)" (?P<status>[\d]+) (?P<bytes_out>[\d]+)$'

[transforms.<transform-id>.types]
bytes_in = "int"
timestamp = "timestamp|%d/%m/%Y:%H:%M:%S %z"
status = "int"
bytes_out = "int"
A log event will be output with the following structure:
{
  // ... existing fields
  "bytes_in": 5667,
  "host": "5.86.210.12",
  "user": "zieme4647",
  "timestamp": <19/06/2019:17:20:49 -0400>,
  "method": "GET",
  "path": "/embrace/supply-chains/dynamic/vertical",
  "status": 201,
  "bytes_out": 20574
}
Things to note about the output:
- The message field was removed because drop_field defaults to true.
- The bytes_in, timestamp, status, and bytes_out fields were coerced.
How It Works
Environment Variables
Environment variables are supported throughout Vector's configuration.
Simply add ${MY_ENV_VAR} in your Vector configuration file and the variable
will be replaced before being evaluated.
You can learn more in the Environment Variables section.
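As a minimal sketch of the substitution described above, the fragment below reads the field name from an environment variable. PARSE_FIELD is a hypothetical variable name chosen for illustration; the transform and source IDs are placeholders as well.

```toml
[transforms.my_transform_id]
type = "regex_parser"
inputs = ["my-source-id"]
# PARSE_FIELD is a hypothetical environment variable. Its value replaces
# ${PARSE_FIELD} before the configuration is evaluated.
field = "${PARSE_FIELD}"
regex = '^(?P<level>\w+) (?P<text>.*)$'
```

Starting Vector with PARSE_FIELD=message in the environment would make this equivalent to writing field = "message" directly.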
Failed Parsing
If the field value fails to parse against the provided regex then an error
will be logged and the event will be kept or discarded
depending on the drop_failed value.
A failure includes any event that does not successfully parse against the
provided regex. This includes bad values as well as events missing the
specified field.
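The fragment below sketches how the drop_failed value mentioned above might be set. It assumes drop_failed is a boolean placed next to drop_field on the transform, and that true discards failed events; this document states neither its type nor its default, so treat both as assumptions.

```toml
[transforms.my_transform_id]
type = "regex_parser"
inputs = ["my-source-id"]
field = "message"
regex = '^(?P<level>\w+) (?P<text>.*)$'
# Assumption: drop_failed is a boolean. With true, an event whose "message"
# field is missing or does not match the regex is discarded after the error
# is logged. With false, the event is kept.
drop_failed = true
```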
Performance
The regex_parser transform has been involved in the following performance tests:
Learn more in the Performance sections.
Regex Debugger
To test the validity of the regex option, we recommend the Rust
Regex Tester. Note, you must use
named captures in your regex to map the results to fields.
Regex Syntax
Because Vector is written in Rust, it follows the documented Rust Regex syntax. This is a Perl-style regular expression syntax that lacks a few features, such as look-around and backreferences.
Named Captures
You can name Regex captures with the (?P<name>...) syntax. For example:
^(?P<timestamp>\w*) (?P<level>\w*) (?P<message>.*)$
This captures timestamp, level, and message. All values are extracted as
strings; use the types table to coerce them to other types.
More info can be found in the Regex grouping and flags documentation.
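To illustrate how named captures and the types table fit together, here is a small sketch. The capture names (status, duration, path) and the log format they imply are made up for this example.

```toml
[transforms.my_transform_id]
type = "regex_parser"
inputs = ["my-source-id"]
field = "message"
# A single-quoted TOML literal string, so backslashes need no escaping.
regex = '^(?P<status>\d+) (?P<duration>[\d\.]+) (?P<path>.*)$'

[transforms.my_transform_id.types]
status = "int"     # the captured string "200" becomes the integer 200
duration = "float" # the captured string "0.25" becomes the float 0.25
# path has no entry here, so it stays a string
```

Each key in the types table must match a capture name exactly. An entry with a misspelled name would leave the intended capture as a string.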
Flags
Regex flags can be toggled with the (?flags) syntax. The available flags are:
| Flag | Description |
|---|---|
| i | case-insensitive: letters match both upper and lower case |
| m | multi-line mode: ^ and $ match begin/end of line |
| s | allow . to match \n |
| U | swap the meaning of x* and x*? |
| u | Unicode support (enabled by default) |
| x | ignore whitespace and allow line comments (starting with #) |
For example, to enable the case-insensitive flag you can write:
(?i)Hello world
More info can be found in the Regex grouping and flags documentation.
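Inside a transform configuration, a flag is simply part of the regex string. The sketch below uses hypothetical capture names and enables the case-insensitive flag, so that INFO, Info, and info all match the level capture.

```toml
[transforms.my_transform_id]
type = "regex_parser"
inputs = ["my-source-id"]
field = "message"
# (?i) at the start applies case-insensitive matching to the whole pattern.
regex = '(?i)^(?P<level>info|warn|error) (?P<text>.*)$'
```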