Configuration
This section covers configuring Vector and creating
pipelines like the example below.
Vector is configured with a TOML file, which must be passed via the
--config flag when starting Vector:
vector --config /etc/vector/vector.toml
Example
# Set global options
data_dir = "/var/lib/vector"

# Ingest data by tailing one or more files
[sources.apache_logs]
  type = "file"
  include = ["/var/log/apache2/*.log"] # supports globbing
  ignore_older = 86400 # 1 day

# Structure and parse the data
[transforms.apache_parser]
  inputs = ["apache_logs"]
  type = "regex_parser" # fast/powerful regex
  regex = '^(?P<host>[\w.]+) - (?P<user>[\w]+) (?P<bytes_in>[\d]+) \[(?P<timestamp>.*)\] "(?P<method>[\w]+) (?P<path>.*)" (?P<status>[\d]+) (?P<bytes_out>[\d]+)$'

# Sample the data to save on cost
[transforms.apache_sampler]
  inputs = ["apache_parser"]
  type = "sampler"
  rate = 50 # only keep 50%

# Send structured data to a short-term storage
[sinks.es_cluster]
  inputs = ["apache_sampler"] # only take sampled data
  type = "elasticsearch"
  host = "http://79.12.221.222:9200" # local or external host
  index = "vector-%Y-%m-%d" # daily indices

# Send structured data to a cost-effective long-term storage
[sinks.s3_archives]
  inputs = ["apache_parser"] # don't sample for S3
  type = "aws_s3"
  region = "us-east-1"
  bucket = "my-log-archives"
  key_prefix = "date=%Y-%m-%d" # daily partitions, hive friendly format
  batch_size = 10000000 # 10mb uncompressed
  compression = "gzip" # compress final objects
  encoding = "ndjson" # new line delimited JSON
Quick Start
At a minimum, a Vector configuration file must contain a source and a sink; transforms are optional. To get started:
Choose a source
To begin, you'll need to ingest data into Vector. This happens through one or more sources. For example:

vector.toml
[sources.nginx_logs]
  type = "file"
  include = ["/var/log/nginx*.log"]

Optionally choose a transform
Next, you'll want to choose a transform. Transforms are optional, but most configurations include at least one, since they improve your data through parsing, structuring, and enriching. For example, let's use the regex_parser transform to parse and structure our data:

vector.toml
[sources.nginx_logs]
  type = "file"
  include = ["/var/log/nginx*.log"]

[transforms.nginx_parser]
  inputs = ["nginx_logs"] # <--- connect the transform to our source
  type = "regex_parser"
  regex = '^(?P<host>[\w.]+) - (?P<user>[\w]+) (?P<bytes_in>[\d]+) \[(?P<timestamp>.*)\] "(?P<method>[\w]+) (?P<path>.*)" (?P<status>[\d]+) (?P<bytes_out>[\d]+)$'

Notice how we connected the new transform to our source via the inputs option.

Choose a sink
Finally, you'll want to choose a sink. Sinks are responsible for emitting data out of Vector. For this example, we'll use the console sink, which simply writes the data to STDOUT:

vector.toml
[sources.nginx_logs]
  type = "file"
  include = ["/var/log/nginx*.log"]

[transforms.nginx_parser]
  inputs = ["nginx_logs"]
  type = "regex_parser"
  regex = '^(?P<host>[\w.]+) - (?P<user>[\w]+) (?P<bytes_in>[\d]+) \[(?P<timestamp>.*)\] "(?P<method>[\w]+) (?P<path>.*)" (?P<status>[\d]+) (?P<bytes_out>[\d]+)$'

[sinks.print]
  inputs = ["nginx_parser"] # <--- connect the sink to our transform
  type = "console"

Again, notice how we connect the new sink via the inputs option.

Next steps
This serves as a basic example of how to build a minimal Vector configuration file. You'll likely want to build more advanced pipelines, which are covered in the guides section.
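
Since transforms are optional, the smallest valid pipeline is a source wired directly to a sink. The fragment below is a sketch of that case, reusing the nginx_logs source and print sink names from the walkthrough above; it assumes the console sink needs no options beyond those already shown.

```toml
# Minimal pipeline: one source connected directly to one sink, no transform.
[sources.nginx_logs]
  type = "file"
  include = ["/var/log/nginx*.log"]

[sinks.print]
  inputs = ["nginx_logs"] # the sink reads straight from the source
  type = "console"
```

Adding a transform later only requires pointing the sink's inputs at the transform instead of the source.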
How It Works
Config File Location
The location of your Vector configuration file depends on your installation
method. For most Linux-based systems, the file can be
found at /etc/vector/vector.toml.
Environment Variables
Vector will interpolate environment variables within your configuration file with the following syntax:
[transforms.add_host]
  type = "add_fields"

  [transforms.add_host.fields]
    host = "${HOSTNAME}"
    environment = "${ENV:-development}" # default value when not present
Interpolation is done before the configuration file is parsed. As such, the
entire ${ENV_VAR} expression is replaced with the variable's value as raw
text, which is why the expression must be wrapped in quotes.
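
To make the order of operations concrete, here is a sketch of what the TOML parser would receive for the add_host example, assuming HOSTNAME is set to web-01 (a hypothetical value) and ENV is not set:

```toml
# Text seen by the TOML parser after interpolation, assuming
# HOSTNAME=web-01 (hypothetical) and ENV unset.
[transforms.add_host]
  type = "add_fields"

  [transforms.add_host.fields]
    host = "web-01"             # ${HOSTNAME} replaced by its value
    environment = "development" # ENV unset, so the default is used
```

Without the surrounding quotes, the first line would become host = web-01, which is not a valid TOML value.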
Environment Variable Escaping
You can escape environment variables by preceding them with a $ character. For
example, $${HOSTNAME} will be treated literally in the above environment
variable example.
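
As an illustration, the fragment below escapes one variable and interpolates the other. It is a sketch based only on the rule stated above; the exact text that ends up in the host field is an assumption (presumably the literal ${HOSTNAME}).

```toml
[transforms.add_host]
  type = "add_fields"

  [transforms.add_host.fields]
    host = "$${HOSTNAME}"               # escaped: not replaced by the HOSTNAME value
    environment = "${ENV:-development}" # still interpolated as usual
```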
Field Interpolation
Select configuration options support Vector's field interpolation syntax to produce dynamic values derived from the event's data. Two syntaxes are supported for fields that support field interpolation:
- Strptime specifiers. Ex: date=%Y/%m/%d
- Event fields. Ex: {{ field_name }}
For example:
[sinks.es_cluster]
  type = "elasticsearch"
  index = "user-{{ user_id }}-%Y-%m-%d"
The above index value will be calculated for each event. For example, given
the following event:
{
  "timestamp": "2019-05-02T00:23:22Z",
  "message": "message",
  "user_id": 2
}
The resulting index value will be:
index = "user-2-2019-05-02"
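
Because the value is computed per event, events with different field values or dates land in different indices. The sketch below annotates the same sink with a few hypothetical events; it assumes the strptime specifiers are evaluated against each event's timestamp field in UTC, as the example above suggests.

```toml
[sinks.es_cluster]
  type = "elasticsearch"
  index = "user-{{ user_id }}-%Y-%m-%d"

# Hypothetical events and the index each would resolve to:
#   {"timestamp": "2019-05-02T23:59:59Z", "user_id": 2}  ->  user-2-2019-05-02
#   {"timestamp": "2019-05-03T00:00:01Z", "user_id": 2}  ->  user-2-2019-05-03
#   {"timestamp": "2019-05-03T00:00:01Z", "user_id": 7}  ->  user-7-2019-05-03
```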
Syntax
The Vector configuration file follows the TOML syntax for its simplicity, explicitness, and relaxed white-space parsing. For more information, please refer to the TOML documentation.
Types
All TOML value types are supported. For convenience, these include: