Data Enrichment

Preparing data for better correlation

Unryo ingests data from disparate sources, so it is important to normalize that data and enrich it (both metrics and logs) with common tags.

Common tags provide several key benefits:

simplify grouping, searching and filtering when querying data
give context to users
drive correlation

Considerations

You can add your own dimensions, called tags, to metrics. For example, you might want to add a tag named Region, Customer, or Business Unit, or any other tag specific to your environment. You can add an arbitrary number of tags before the data is stored. You define the tagging rules (key/value pairs) in a simple CSV file.

Tags are indexed, which means queries on tags are fast.
Use tags for commonly queried metadata that you plan to group, search, or filter on.
Avoid tags that contain highly variable values such as UUIDs, hashes, or random strings. They can create a very large number of series in the database, a condition known as high series cardinality, which is a primary driver of high memory usage in many database workloads.
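To see why, note that in line-protocol databases such as InfluxDB each unique combination of measurement and tag set is its own series. The Python sketch below is illustrative only: it assumes a simplified line-protocol parser (no escaping, no quoted strings) and uses a hypothetical request_id tag to show how a per-point UUID multiplies the series count.

```python
from __future__ import annotations

import uuid


def series_key(line: str) -> str:
    """Measurement plus sorted tag set of a simplified line-protocol point.
    Escaped characters and quoted strings are not handled."""
    head = line.split(" ", 1)[0]            # "measurement,tag1=v1,tag2=v2"
    measurement, *tags = head.split(",")
    return ",".join([measurement, *sorted(tags)])


def count_series(lines: list[str]) -> int:
    """Number of distinct series in a batch of points."""
    return len({series_key(line) for line in lines})


routers = ["R1", "R2", "R3"]

# 100 reports per router with stable tags: the tag sets repeat.
stable = [
    f"measurement,resource_type=Router,resource={r} cpu=33"
    for _ in range(100)
    for r in routers
]

# The same reports plus a hypothetical request_id tag holding a fresh UUID:
# every point now carries a unique tag set.
unstable = [
    f"measurement,resource_type=Router,resource={r},request_id={uuid.uuid4()} cpu=33"
    for _ in range(100)
    for r in routers
]

print(count_series(stable))    # 3
print(count_series(unstable))  # 300
```

With stable tags there is one series per router; with the UUID tag there is one series per point, and that number keeps growing as data arrives.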

How the tagger works

The tagger adds tags or string fields to metrics based on the values of their existing tags or fields. For example, consider these input metrics:

measurement,resource_type=Router,resource=R1 cpu=33,memory=42
measurement,resource_type=Host,resource=H1 cpu=12,memory=1
measurement,resource_type=Router,resource=R2 cpu=65,memory=12

The tagger could be configured to add city and country tags, based on the values of the resource_type and resource tags, transforming the metrics into:

measurement,resource_type=Router,resource=R1,city=Montreal,country=Canada cpu=33,memory=42
measurement,resource_type=Host,resource=H1,city=Paris,country=France cpu=12,memory=1
measurement,resource_type=Router,resource=R2,city=Naples,country=USA cpu=65,memory=12

For convenience and maximum flexibility, the tagger is configured from the Unryo UI, using a CSV file along with some metadata describing its columns. For the example above, the CSV file could look like this:

Router,R1,Montreal,Canada
Host,H1,Paris,France
Router,R2,Naples,USA

The accompanying metadata specifies that the first two columns hold the resource_type and resource tag values to match, and that the next two columns hold the values of the new city and country tags to add.
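The matching step can be modeled in a few lines of Python. This is a minimal sketch, not Unryo's implementation: it assumes simplified line protocol (no escaping, no quoted strings), hardcodes the key and value tag names that the metadata would declare, and, as its own assumption, lets a matched value overwrite an existing tag of the same name. Each CSV row becomes a lookup-table entry keyed by its key-column values.

```python
from __future__ import annotations

import csv
import io

KEY_TAGS = ["resource_type", "resource"]   # columns to match (assumed from the example)
VALUE_TAGS = ["city", "country"]           # columns to add (assumed from the example)

CSV_TEXT = """Router,R1,Montreal,Canada
Host,H1,Paris,France
Router,R2,Naples,USA
"""


def load_table(text: str) -> dict[tuple[str, ...], dict[str, str]]:
    """Map each tuple of key-column values to the tags to add."""
    n = len(KEY_TAGS)
    table: dict[tuple[str, ...], dict[str, str]] = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue                        # skip blank lines
        table[tuple(row[:n])] = dict(zip(VALUE_TAGS, row[n:]))
    return table


def enrich(line: str, table: dict[tuple[str, ...], dict[str, str]]) -> str:
    """Append the matching value tags to a point; unmatched points pass through."""
    head, rest = line.split(" ", 1)         # series part, then fields and timestamp
    measurement, *pairs = head.split(",")
    tags = dict(pair.split("=", 1) for pair in pairs)
    key = tuple(tags.get(name, "") for name in KEY_TAGS)
    tags.update(table.get(key, {}))         # add (or overwrite) value tags on a match
    series = ",".join([measurement, *(f"{k}={v}" for k, v in tags.items())])
    return f"{series} {rest}"


table = load_table(CSV_TEXT)
for point in [
    "measurement,resource_type=Router,resource=R1 cpu=33,memory=42",
    "measurement,resource_type=Host,resource=H1 cpu=12,memory=1",
]:
    print(enrich(point, table))
# measurement,resource_type=Router,resource=R1,city=Montreal,country=Canada cpu=33,memory=42
# measurement,resource_type=Host,resource=H1,city=Paris,country=France cpu=12,memory=1
```

Keying the table on the full tuple of key values means a lookup costs one dictionary access per point, regardless of how many rows the CSV holds.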

Add tags

Go to Configuration Management.

Go to Tagging.

Create your metadata file (for example, tagger.conf) and the associated CSV file (for example, tagger.csv).

Configure the metadata file as follows:

[[processors.tagger]]
  ## the path to the CSV file
  file = "/etc/telegraf/telegraf.d/tagger.csv"

  ## the list of keys in the CSV file
  [[processors.tagger.key]]
    tag = "resource_type"
  [[processors.tagger.key]]
    tag = "resource"

  ## the list of values in the CSV file
  [[processors.tagger.value]]
    tag = "city"
  [[processors.tagger.value]]
    tag = "country"

The CSV file must contain one column per declared key, followed by one column per declared value, in the same order as they are declared in the metadata file:

Router,R1,Montreal,Canada
Host,H1,Paris,France
Router,R2,Naples,USA
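Because the columns are positional, every row needs exactly as many columns as there are declared keys plus declared values. A quick check before uploading can catch mistakes. The sketch below is a hypothetical helper, not part of Unryo; it hardcodes the keys and values declared in the metadata file above and reports rows with the wrong column count.

```python
from __future__ import annotations

import csv
import io

KEYS = ["resource_type", "resource"]   # [[processors.tagger.key]] entries, in order
VALUES = ["city", "country"]           # [[processors.tagger.value]] entries, in order


def check_rows(text: str) -> list[str]:
    """Return one error message per row whose column count is wrong."""
    expected = len(KEYS) + len(VALUES)
    errors = []
    for lineno, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not row:
            continue                   # ignore blank lines
        if len(row) != expected:
            errors.append(f"line {lineno}: expected {expected} columns, got {len(row)}")
    return errors


def describe_row(row: list[str]) -> dict[str, str]:
    """Label each column of a valid row with the tag it corresponds to."""
    return dict(zip(KEYS + VALUES, row))


good = "Router,R1,Montreal,Canada\nHost,H1,Paris,France\nRouter,R2,Naples,USA\n"
bad = "Router,R1,Montreal,Canada\nHost,H1,Paris\n"

print(check_rows(good))  # []
print(check_rows(bad))   # ['line 2: expected 4 columns, got 3']
print(describe_row(["Router", "R1", "Montreal", "Canada"]))
# {'resource_type': 'Router', 'resource': 'R1', 'city': 'Montreal', 'country': 'Canada'}
```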

That's it! New data points will be enriched with the new tags.

Copy tags

You can create a new tag from an existing tag. The new tag takes the same value as the existing tag. A typical use case is to regroup similar tags under a single tag to unify the data model.

Example:

Incoming data:

system,region=us-midwest temperature=82 1465839830100400200

With this tagging configuration:

[[processors.tagger]]
   ...
   [[processors.tagger.value]]
      tag = "resource"
      type = "copy-tag"

And this CSV configuration, where the column for the copy-tag value holds the name of the tag to copy from:

...,region

The result is:

system,region=us-midwest,resource=us-midwest temperature=82 1465839830100400200
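The copy-tag behavior in this example can be modeled as follows. This is an illustrative Python sketch, not Unryo's code. It assumes, as the example suggests, that the CSV cell for a copy-tag value holds the name of the source tag, and that the row's key columns (elided as ... above) already matched the point. The function name apply_value is hypothetical, and point tags are passed as a plain dict to keep the sketch short.

```python
from __future__ import annotations


def apply_value(tags: dict[str, str], new_tag: str, cell: str, value_type: str | None) -> None:
    """Set one value column on a point's tags (mutates tags in place)."""
    if value_type == "copy-tag":
        # The cell names an existing tag; copy its value.
        # Assumption: if the source tag is missing, nothing is added.
        if cell in tags:
            tags[new_tag] = tags[cell]
    else:
        # Default value column: the cell content itself becomes the tag value.
        tags[new_tag] = cell


tags = {"region": "us-midwest"}
apply_value(tags, "resource", "region", "copy-tag")
print(tags)  # {'region': 'us-midwest', 'resource': 'us-midwest'}
```

Treating the cell as a tag name rather than a literal is the only difference from an ordinary value column, which is why both cases fit in one small function.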

Out-of-the-box Tagging Configurations

The tagger also comes with predefined configurations covering popular tagging use cases:

Add a simple tag (city, customer name, ...)
Add Maintenance Periods
Add well-known TCP ports
Add SNMP Vendor and Model
Unify naming
and more.