Configuration

This section covers configuring Vector and creating pipelines like the example below. Vector's configuration uses the TOML syntax, and the configuration file must be passed via the --config flag when starting Vector:

vector --config /etc/vector/vector.toml

Example

vector.toml

# Set global options
data_dir = "/var/lib/vector"
# Ingest data by tailing one or more files
[sources.apache_logs]
  type         = "file"
  include      = ["/var/log/apache2/*.log"]    # supports globbing
  ignore_older = 86400                         # 1 day
# Structure and parse the data
[transforms.apache_parser]
  inputs       = ["apache_logs"]
  type         = "regex_parser"                # fast/powerful regex
  regex        = '^(?P<host>[\w.]+) - (?P<user>[\w]+) (?P<bytes_in>[\d]+) \[(?P<timestamp>.*)\] "(?P<method>[\w]+) (?P<path>.*)" (?P<status>[\d]+) (?P<bytes_out>[\d]+)$'
# Sample the data to save on cost
[transforms.apache_sampler]
  inputs       = ["apache_parser"]
  type         = "sampler"
  rate         = 50                            # keep roughly 1 in 50 events (rate is 1/N)
# Send structured data to a short-term storage
[sinks.es_cluster]
  inputs       = ["apache_sampler"]            # only take sampled data
  type         = "elasticsearch"
  host         = "http://79.12.221.222:9200"   # local or external host
  index        = "vector-%Y-%m-%d"             # daily indices
# Send structured data to a cost-effective long-term storage
[sinks.s3_archives]
  inputs       = ["apache_parser"]             # don't sample for S3
  type         = "aws_s3"
  region       = "us-east-1"
  bucket       = "my-log-archives"
  key_prefix   = "date=%Y-%m-%d"               # daily partitions, hive friendly format
  batch_size   = 10000000                      # 10mb uncompressed
  compression  = "gzip"                        # compress final objects
  encoding     = "ndjson"                      # new line delimited JSON

Quick Start

At a minimum, a Vector configuration file must contain one source and one sink. Transforms are optional. To get started:

Choose a source

To begin, you'll need to ingest data into Vector. This happens through one or more sources. For example:

vector.toml

[sources.nginx_logs]
  type = "file"
  include = ["/var/log/nginx*.log"]

Optionally choose a transform

Next, you'll want to choose a transform. Transforms are optional, but most configurations include at least one, since they improve your data through parsing, structuring, and enriching. For example, let's use the regex_parser transform to parse and structure our data:

vector.toml

[sources.nginx_logs]
  type = "file"
  include = ["/var/log/nginx*.log"]
[transforms.nginx_parser]
  inputs  = ["nginx_logs"] # <--- connect the transform to our source
  type    = "regex_parser"
  regex   = '^(?P<host>[\w.]+) - (?P<user>[\w]+) (?P<bytes_in>[\d]+) \[(?P<timestamp>.*)\] "(?P<method>[\w]+) (?P<path>.*)" (?P<status>[\d]+) (?P<bytes_out>[\d]+)$'

Notice how we connected the new transform to our source via the inputs option.

Choose a sink

Finally, you'll want to choose a sink. Sinks are responsible for emitting data out of Vector. For this example, we'll use the console sink, which simply writes the data to STDOUT:

vector.toml

[sources.nginx_logs]
  type = "file"
  include = ["/var/log/nginx*.log"]
[transforms.nginx_parser]
  inputs  = ["nginx_logs"]
  type    = "regex_parser"
  regex   = '^(?P<host>[\w.]+) - (?P<user>[\w]+) (?P<bytes_in>[\d]+) \[(?P<timestamp>.*)\] "(?P<method>[\w]+) (?P<path>.*)" (?P<status>[\d]+) (?P<bytes_out>[\d]+)$'
[sinks.print]
  inputs = ["nginx_parser"] # <--- connect the sink to our transform
  type   = "console"

Again, notice how we connect the new sink via the inputs option.
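
Because transforms are optional, the same pipeline can also be reduced to a source wired directly to a sink. The following is a minimal sketch that reuses the names from the steps above; it only illustrates the wiring and is not a recommended production setup:

```toml
# Minimal pipeline: one source connected directly to one sink.
[sources.nginx_logs]
  type    = "file"
  include = ["/var/log/nginx*.log"]

[sinks.print]
  inputs = ["nginx_logs"]   # read straight from the source, no transform in between
  type   = "console"
```

In this form the console sink would print the raw, unparsed log lines, since nothing structures them along the way.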

Next steps
This is a basic example of how to build a minimal Vector configuration file. You will likely want to build more advanced pipelines, which are covered in the guides section.

How It Works

Config File Location

The location of your Vector configuration file depends on your installation method. On most Linux-based systems, the file can be found at /etc/vector/vector.toml.

Environment Variables

Vector will interpolate environment variables within your configuration file with the following syntax:

vector.toml

[transforms.add_host]
  type = "add_fields"
    
  [transforms.add_host.fields]
    host = "${HOSTNAME}"
    environment = "${ENV:-development}" # default value when not present

Interpolation happens before the configuration file is parsed, so the entire ${ENV_VAR} token is replaced with the variable's value. This is why the token must be wrapped in quotes.
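
To make the substitution concrete, here is a sketch of what the parser would effectively see for the snippet above. It assumes, purely for illustration, that HOSTNAME is set to web-01 and that ENV is not set:

```toml
# Effective configuration after interpolation.
# Assumes HOSTNAME=web-01 and ENV unset (both values are hypothetical).
[transforms.add_host]
  type = "add_fields"

  [transforms.add_host.fields]
    host        = "web-01"        # substituted from ${HOSTNAME}
    environment = "development"   # ENV is unset, so the default after :- applies
```

Without the surrounding quotes, the first value would become the bare token web-01, which is not a valid TOML value.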

Environment Variable Escaping

You can escape an environment variable by preceding it with an additional $ character. For example, $${HOSTNAME} will be treated literally in the above environment variable example.
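
As a small sketch, the two forms can be placed side by side. The field name literal is hypothetical and exists only to show the contrast:

```toml
[transforms.add_host]
  type = "add_fields"

  [transforms.add_host.fields]
    host    = "${HOSTNAME}"    # interpolated: replaced with the value of HOSTNAME
    literal = "$${HOSTNAME}"   # escaped: not interpolated, treated literally
```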

Field Interpolation

Select configuration options support Vector's field interpolation syntax, which produces dynamic values derived from the event's data. Options that support field interpolation accept two syntaxes:

Strptime specifiers. Ex: date=%Y/%m/%d
Event fields. Ex: {{ field_name }}

For example:

vector.toml

[sinks.es_cluster]
  type  = "elasticsearch"
  index = "user-{{ user_id }}-%Y-%m-%d"

The above index value will be calculated for each event. For example, given the following event:

{
  "timestamp": "2019-05-02T00:23:22Z",
  "message": "message",
  "user_id": 2
}

The index value will result in:

index = "user-2-2019-05-02"
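
The same idea could apply to other options. The sketch below reuses the aws_s3 sink from the example at the top of this page and assumes that key_prefix accepts event fields in addition to the strptime specifiers shown there; check the sink's documentation before relying on this:

```toml
[sinks.s3_archives]
  type       = "aws_s3"
  region     = "us-east-1"
  bucket     = "my-log-archives"
  # Assumption: key_prefix supports {{ field }} interpolation.
  # For the event above, this would be expected to resolve to
  # "user_id=2/date=2019-05-02".
  key_prefix = "user_id={{ user_id }}/date=%Y-%m-%d"
```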

Syntax

The Vector configuration file follows the TOML syntax for its simplicity, explicitness, and relaxed white-space parsing. For more information, please refer to the TOML documentation.

Types

All TOML value types are supported. For convenience this includes:
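
A short sketch of the standard TOML value types follows. Keys prefixed with example_ are hypothetical and are not Vector options; the remaining keys are borrowed from the examples earlier on this page:

```toml
data_dir        = "/var/lib/vector"            # string
ignore_older    = 86400                        # integer
example_ratio   = 0.5                          # float (hypothetical key)
example_enabled = true                         # boolean (hypothetical key)
example_since   = 2019-05-02T00:23:22Z         # datetime (hypothetical key)
include         = ["/var/log/apache2/*.log"]   # array

[example_table]                                # table (hypothetical name)
  key = "value"
```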
