Tokenizer Transform
The Vector tokenizer transform accepts log events and allows you to tokenize a field's value by splitting on whitespace, ignoring special wrapping characters, and zipping the tokens into ordered field names.
Configuration
[transforms.my_transform_id]
# REQUIRED - General
type = "tokenizer" # example, must be: "tokenizer"
inputs = ["my-source-id"] # example
field_names = ["timestamp", "level", "message"] # example

# OPTIONAL - General
drop_field = true # default
field = "message" # default

# OPTIONAL - Types
[transforms.my_transform_id.types]
status = "int"
Options
drop_field
If true, the field will be dropped after parsing.
Default: true
field
The log field to tokenize.
Default: "message"
field_names
The log field names assigned to the resulting tokens, in order.
types
Key/value pairs representing mapped log field types.
[field-name]
A definition of log field type conversions. The key is the log field name and the value is the type. strptime specifiers are supported for the timestamp type.
Valid values: "bool", "float", "int", "string", "timestamp"
Output
Given the following log line:
{"message": "5.86.210.12 - zieme4647 [19/06/2019:17:20:49 -0400] \"GET /embrace/supply-chains/dynamic/vertical\" 201 20574"}
And the following configuration:
[transforms.<transform-id>]
type = "tokenizer"
field = "message"
field_names = ["remote_addr", "ident", "user_id", "timestamp", "message", "status", "bytes"]
A log event will be output with the following structure:
{
  // ... existing fields
  "remote_addr": "5.86.210.12",
  "user_id": "zieme4647",
  "timestamp": "19/06/2019:17:20:49 -0400",
  "message": "GET /embrace/supply-chains/dynamic/vertical",
  "status": "201",
  "bytes": "20574"
}
A few things to note about the output:
- The message field was overwritten.
- The ident field was dropped since it contained a "-" value.
- All values are strings; we have plans to add type coercion.
- Special wrapper characters were dropped, such as wrapping [...] and "..." characters.
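To illustrate what coercing the string values above to the types listed under Options could involve, here is a minimal Python sketch. It is not Vector's implementation: the function name coerce and its timestamp_format parameter are hypothetical, and the accepted boolean spellings are an assumption made for the example.

```python
from datetime import datetime

# Assumption: only these spellings count as booleans in this sketch.
_BOOLS = {"true": True, "false": False}


def coerce(value, type_name, timestamp_format=None):
    """Convert a token string to one of the documented type names.

    Hypothetical helper; timestamp_format is a strptime format string.
    """
    if type_name == "string":
        return value
    if type_name == "int":
        return int(value)
    if type_name == "float":
        return float(value)
    if type_name == "bool":
        return _BOOLS[value.strip().lower()]
    if type_name == "timestamp":
        if timestamp_format is None:
            raise ValueError("this sketch needs an explicit strptime format")
        return datetime.strptime(value, timestamp_format)
    raise ValueError(f"unknown type: {type_name}")


# Coerce three of the string fields from the example output.
event = {
    "status": "201",
    "bytes": "20574",
    "timestamp": "19/06/2019:17:20:49 -0400",
}
event["status"] = coerce(event["status"], "int")
event["bytes"] = coerce(event["bytes"], "int")
event["timestamp"] = coerce(
    event["timestamp"], "timestamp", "%d/%m/%Y:%H:%M:%S %z"
)
```

After this runs, status and bytes hold integers and timestamp holds a timezone-aware datetime. A value that does not parse raises an error here; how the transform itself treats such values is not covered by this sketch.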
How It Works
Blank Values
Both " " and "-" are considered blank values and their mapped field will
be set to null.
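The blank-value rule can be sketched in Python as follows. This is an illustration under stated assumptions, not Vector's code: the names BLANK_VALUES and map_tokens are hypothetical, and null is represented by Python's None. Whether a null field is then kept in or omitted from the emitted event is outside the scope of the sketch.

```python
# The two blank values named in the text above.
BLANK_VALUES = {" ", "-"}


def map_tokens(field_names, tokens):
    """Pair tokens with field names, replacing blank values with None."""
    return {
        name: (None if token in BLANK_VALUES else token)
        for name, token in zip(field_names, tokens)
    }


fields = map_tokens(
    ["remote_addr", "ident", "user_id"],
    ["5.86.210.12", "-", "zieme4647"],
)
```

Here the ident entry ends up as None while the other two entries keep their string values.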
Environment Variables
Environment variables are supported throughout Vector's configuration.
Simply add ${MY_ENV_VAR} to your Vector configuration file and the variable
will be replaced before the configuration is evaluated.
You can learn more in the Environment Variables section.
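As a rough picture of what replacing ${MY_ENV_VAR} before evaluation means, the following Python sketch substitutes variables in the raw configuration text. The function name interpolate is hypothetical, and replacing an unset variable with an empty string is an assumption of this sketch, not a statement about Vector's behavior.

```python
import os
import re

# Matches ${NAME}, where NAME is a typical environment variable name.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def interpolate(config_text, env=None):
    """Replace each ${NAME} in config_text with its value from env."""
    env = os.environ if env is None else env
    # Assumption: an unset variable is replaced with an empty string.
    return _VAR.sub(lambda match: env.get(match.group(1), ""), config_text)


rendered = interpolate('field = "${MY_ENV_VAR}"', {"MY_ENV_VAR": "message"})
```

Because the substitution works on text, the configuration parser only ever sees the resolved value.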
Special Characters
In order to extract raw values and remove wrapping characters, we must treat certain characters as special. These characters will be discarded:
- "..." - Quotes are used to wrap phrases. Spaces are preserved, but the wrapping quotes will be discarded.
- [...] - Brackets are used to wrap phrases. Spaces are preserved, but the wrapping brackets will be discarded.
- \ - Can be used to escape the above characters; Vector will treat them as literal.