
Analytics and Alerting#

Introduction#

The Unryo platform processes millions of metrics and events as they stream in, so that it can detect anomalies in real time.

To determine whether alert conditions are met, Unryo uses numerous built-in alert configurations that cover the major technologies and ship with best-practice thresholds. When an anomaly is detected, Unryo performs root cause analysis (to determine the probable cause) and impact analysis (to identify the impacted resources). If configured, it also sends notifications, such as an email, an SNMP trap, or a Microsoft Teams message.

Current and past alerts are visible in the Alert Console and are also displayed in context in dashboards and the topology map.

Section Content:

| Topic | Description |
| --- | --- |
| The Alert Console | Displays current and past alerts. |
| Alert Life Cycle | Alerts in Unryo have a state (active or inactive) and a set of normalized metadata to facilitate triage and filtering. Users can act on them, for example by acknowledging them or taking ownership. |
| Predefined Alerts | Unryo ships with dozens of default alert rules, configured with best-practice thresholds and covering the major technologies. |
| Customizing Alerts | Users can create their own alerts using the Alert Editor. |
| Notification Channels | Unryo supports multiple notification channels, such as email, Slack messages, and Microsoft Teams notifications. |
| Root Cause Analysis | The Unryo Correlation Engine determines whether a resource is the root cause or an impact, based on the topology. |
| Impact Analysis | Unryo lets you define business elements and logical groups, then calculates the impact based on propagation. |
| ChatGPT Integration | Provides users with insights and potential solutions for their alerts. |

Alert Life Cycle#

Events are processed and grouped into alerts, which users can view in the Alert Console. Events have a set of tags (or properties), some required and some optional. The Alert Engine uses event tags to create alerts, update them, and maintain their state (active or inactive).

Event Properties#

| Event Property | Type | Description |
| --- | --- | --- |
| resource | Required Tag | Resource associated with the event. |
| resource_type | Required Tag | Resource type of the resource. |
| technology | Required Tag | Technology of the resource. |
| level | Required Tag | The level (or severity) of the event. Possible values: WARNING, CRITICAL, OK, INFO. |
| category | Required Tag | The category of the event. Can be any string, or one of the normalized categories listed below this table. |
| eventname | Required Tag | A string that acts as the title of the event, e.g. "High CPU Utilization". |
| eventtext | Required Tag | A string that provides a short description of the event, e.g. "Linux Server is experiencing a high CPU utilization". |
| message | Optional Field | A string that provides a longer description of the event, e.g. "CPU utilization is high: 83.4%" or "CPU utilization is now back to normal: 72.0%". |
| eventtype | Required Tag | Either durable (Unryo knows the state of the event and can send a new event if the state changes) or momentary (a problem occurred at a point in time, for example an SNMP trap notification). |
| value | Required Field | The value associated with the current event. It has to be a float. |
| unit | Optional Tag | |
| resource_component | Optional Tag | |
| resource_component_type | Optional Tag | |
| measurement | Required Tag | Time series measurement that stores the metric. |
| alertname | Required Tag | The alert policy that triggered the event, e.g. "Linux CPU". |
| alertID | Required Tag | The unique alert identifier. Set automatically. Can be customized to control how events are matched to alerts. |

Normalized categories:

  • Availability (Down, Not Responding, Cluster at Risk, Cluster Degraded, Unavailable, and more): indicates that a resource is unavailable or at risk of becoming unavailable.
  • Reachability (No Data Received): indicates a connectivity problem with the monitored resources, such as a network communication failure, an unavailable cloud API, or unavailable EMS access.
  • Errors (Failed Status Check, Failed Health Check, Authentication Failed, and more): error-related events (e.g. a request that fails) or increased error rates (e.g. traffic errors).
  • Saturation (High Processor Usage, High Memory Usage, High Queue Length, High Disk Reads, Disk almost full, High Load, and more): a measure of resource utilization that indicates how "full" a resource is.
  • Latency (Degraded Disk IO, Query Slowdown, Long Response Time, and more): indicates slowdowns, such as increased times for query executions or user transactions.
  • Custom: user-defined event type.
  • Informational: purely informational event with no impact.

If you customize your own alert policies, make sure that all the required tags and fields are set on the resulting events. Otherwise, the event cannot be interpreted and converted into an alert. You can also add your own custom tags and fields.
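
As an illustration of the required tags and fields, the following minimal Python sketch checks an event before it is emitted. The function name `validate_event`, the split into separate `tags` and `fields` dictionaries, and the sample values (`server01`, `cpu`, and so on) are assumptions made for this example; they are not part of the Unryo API.

```python
# Hypothetical helper, not part of Unryo: checks an event against the
# required tags and fields listed in the Event Properties table.

REQUIRED_TAGS = (
    "resource", "resource_type", "technology", "level", "category",
    "eventname", "eventtext", "eventtype", "measurement", "alertname",
    "alertID",  # normally set automatically; checked here for completeness
)
REQUIRED_FIELDS = ("value",)
LEVELS = {"WARNING", "CRITICAL", "OK", "INFO"}
EVENT_TYPES = {"durable", "momentary"}


def validate_event(tags, fields):
    """Return a list of problems; an empty list means the event looks valid."""
    problems = []
    for name in REQUIRED_TAGS:
        if not tags.get(name):
            problems.append(f"missing required tag: {name}")
    for name in REQUIRED_FIELDS:
        if name not in fields:
            problems.append(f"missing required field: {name}")
    if tags.get("level") and tags["level"] not in LEVELS:
        problems.append(f"invalid level: {tags['level']}")
    if tags.get("eventtype") and tags["eventtype"] not in EVENT_TYPES:
        problems.append(f"invalid eventtype: {tags['eventtype']}")
    # The value field has to be a float.
    if "value" in fields and not isinstance(fields["value"], float):
        problems.append("value must be a float")
    return problems


# Sample event; every value below is made up for illustration.
example_tags = {
    "resource": "server01",
    "resource_type": "server",
    "technology": "linux",
    "level": "WARNING",
    "category": "Saturation",
    "eventname": "High CPU Utilization",
    "eventtext": "Linux Server is experiencing a high CPU utilization",
    "eventtype": "durable",
    "measurement": "cpu",
    "alertname": "Linux CPU",
    "alertID": "linux-cpu-server01",
}
example_fields = {"value": 83.4, "message": "CPU utilization is high: 83.4%"}

print(validate_event(example_tags, example_fields))
```

Returning a list of problems, rather than raising on the first one, makes it easier to report every missing tag of a custom alert policy in one pass.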

Alert Properties#

| Alert Property | Description |
| --- | --- |
| Event State | Active: the problem is active and requires attention. This is the initial state when an event occurs. Inactive: the problem no longer requires attention because it has been cleared or has expired. Inactive alerts are displayed in white in the Alert Console. A durable alert becomes inactive when the corresponding clear is received. A momentary alert becomes inactive when it is acknowledged (if clearOnAck is enabled) or when its expiration is reached (default 24 hours). Deleted: inactive alerts are automatically deleted after a period of time (default 3 days). Deleted alerts no longer appear in the Alert Console. |
| Acknowledge | Acknowledging an alert tells other operators that you are aware of the issue and are working on it, and assigns ownership to you. Acknowledging a momentary alert sets its state to inactive (if clearOnAck is enabled). Momentary alerts are automatically acknowledged after a period of time (default 24 hours). Unacknowledging an alert does not release ownership. |
| Ownership | Take Ownership tells other users that you are working on the alert. Release Ownership releases the ownership. |
| Root Cause | Boolean set by the Correlation Engine. Yes means that the Correlation Engine determined that the alert is a root-cause alert. No means that the alert is an impact. |
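
The state rules described for Event State can be summarized in a short Python sketch. The `Alert` class, the `evaluate_state` function, and the constant names are hypothetical and only model the documented defaults (24-hour expiration for momentary alerts, 3-day retention for inactive alerts). This is not how Unryo is implemented, and the automatic acknowledgment of momentary alerts is not modeled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Defaults quoted in the documentation; the constant names are hypothetical.
MOMENTARY_EXPIRATION = timedelta(hours=24)
INACTIVE_RETENTION = timedelta(days=3)


@dataclass
class Alert:
    eventtype: str                  # "durable" or "momentary"
    created_at: datetime
    clear_on_ack: bool = False
    acknowledged: bool = False
    clear_received: bool = False    # a clear event arrived (durable alerts)
    inactive_since: Optional[datetime] = None


def evaluate_state(alert: Alert, now: datetime) -> str:
    """Return "active", "inactive", or "deleted" for the alert at time `now`.

    Records the moment the alert first becomes inactive so that the
    retention period can be measured from it.
    """
    if alert.inactive_since is None:
        if alert.eventtype == "durable":
            # A durable alert becomes inactive when its clear is received.
            became_inactive = alert.clear_received
        else:
            # A momentary alert becomes inactive on expiration, or when
            # acknowledged with clear-on-ack enabled.
            expired = now - alert.created_at >= MOMENTARY_EXPIRATION
            became_inactive = expired or (alert.acknowledged and alert.clear_on_ack)
        if not became_inactive:
            return "active"
        alert.inactive_since = now
    if now - alert.inactive_since >= INACTIVE_RETENTION:
        return "deleted"
    return "inactive"
```

For example, a momentary alert created at midnight is reported as active one hour later, inactive once 24 hours have passed, and deleted three days after it became inactive.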

Possible actions on alerts:

  • Alert Details: Shows information such as alert occurrences, tags, alert policy information, resource information, etc.
  • Acknowledge/Unacknowledge
  • Take Ownership/Release Ownership
  • Force Clear: Forces an alert to clear. Forcibly cleared alerts behave slightly differently from naturally cleared alerts: they stay cleared for the remainder of the current occurrence, whereas naturally cleared alerts become active again on any re-notification.
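
The difference between a forced clear and a natural clear can be sketched as follows. The class `AlertOccurrence` is hypothetical, and the sketch assumes that an occurrence ends when a clear (OK) event is received; the documentation does not state exactly where an occurrence ends, so treat that boundary as an assumption.

```python
class AlertOccurrence:
    """Hypothetical model of one alert and its clear behavior.

    Assumption: an occurrence ends when an OK (clear) event arrives.
    """

    def __init__(self):
        self.active = True
        self.force_cleared = False

    def force_clear(self):
        # Operator action: the alert stays cleared until the occurrence ends.
        self.active = False
        self.force_cleared = True

    def on_event(self, level):
        if level == "OK":
            # Natural clear: ends the occurrence and resets the forced flag.
            self.active = False
            self.force_cleared = False
        elif level in ("WARNING", "CRITICAL") and not self.force_cleared:
            # Re-notification reactivates a naturally cleared alert only.
            self.active = True


alert = AlertOccurrence()
alert.on_event("OK")          # natural clear
alert.on_event("CRITICAL")    # re-notification: active again
print(alert.active)

alert.force_clear()
alert.on_event("CRITICAL")    # ignored for the rest of this occurrence
print(alert.active)
```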

View Alerts#

Go to the Alerts section to view all the active alerts in your organization.


Using the Alert Console, users can:

  • Search, sort, and filter alerts.
  • Acknowledge, force clear, or set ownership on alerts.
  • Open the details of an alert to see all event occurrences and properties.
  • Execute Action Tools such as ping, incident system integration, or any executable.
  • Click on a resource to access the dashboard or topology map.
  • Customize their Alert Console by adding panels and columns.

Configure Alerts#

Predefined Alert Configurations#

Unryo ships with out-of-the-box, best-practice alert definitions for common devices and applications. These alerts are enabled by default, so from day one you are informed about the most common problems, for example if your AWS VMs are in trouble, if your Kubernetes pods are running out of memory, or if your users have a degraded experience.

Setting up Alerts#

Alert definitions are managed centrally from the Configuration UI. You can also use the Unryo API to manage alert definitions programmatically.

Go to Configuration Management.


Click on the Alert Definitions panel to list all the alert configurations. Dozens of configurations are available and ready to use. Each one comes with best-practice thresholds and settings for monitoring a particular technology.


From there, you can:

  • Enable, disable, delete, or duplicate an alert configuration.
  • Edit an alert configuration to adapt its settings to your requirements. Typically, you may want to change thresholds, monitoring time windows, or formulas, filter the stream of data to analyze (by device or any other criteria), or add a notification channel such as email or Slack.
  • Add a new configuration. Numerous predefined alert templates cover the most common alerting needs.

Create your own Alert Configuration#

To add an alert definition, choose a template to start from. Templates are designed to work out of the box and cover many analytics cases, such as simple threshold, forecast, deviation, no-data detection, and combo-metrics KPIs. You can use them as is or adjust the thresholds and other settings.

Click the + button to open the Alert Editor.
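
To make three of these template families more concrete, here is a minimal Python sketch of a simple threshold check, a deviation check, and a no-data check. The function names, the thresholds (80 and 90), and the 3-sigma rule are illustrative assumptions. They do not describe Unryo's actual templates or analytics engine.

```python
from statistics import mean, stdev


def threshold_level(value, warning, critical):
    """Simple threshold: map a metric value to an event level."""
    if value >= critical:
        return "CRITICAL"
    if value >= warning:
        return "WARNING"
    return "OK"


def deviates(history, value, sigmas=3.0):
    """Deviation: True if `value` lies more than `sigmas` sample standard
    deviations away from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough data to estimate a spread
    spread = stdev(history)
    if spread == 0:
        return value != mean(history)
    return abs(value - mean(history)) > sigmas * spread


def no_data(last_seen, now, max_silence):
    """No-data detection: True if nothing was received for longer than
    `max_silence` (all arguments in seconds)."""
    return now - last_seen > max_silence


# Hypothetical CPU utilization thresholds: warning at 80%, critical at 90%.
print(threshold_level(83.4, warning=80, critical=90))
print(deviates([10, 11, 9, 10, 10], 30))
print(no_data(last_seen=100.0, now=500.0, max_silence=300.0))
```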

Specify the following:

  • The alert template you want to use.
  • The analytics engine on which you want this configuration to be deployed.
  • A Configuration Name that is meaningful to you.
  • A Description (optional).


Configure the alert definition according to your requirements by specifying the stream of data to analyze, the alert conditions, and which notifications to fire, if any.

You can either use the Alert UI or switch to edit mode to display the configuration file.

Once done, click Apply to save, then Enable the configuration to start the analysis.

Configure Notifications#

Add a Notification Channel#

Now that you are familiar with how to configure alerts, you may want them to trigger notifications, such as an email or a Slack message. Unryo can automatically notify users about a serious problem or a condition that requires quick resolution.

| Channel | Description |
| --- | --- |
| Discord | Sends out notifications to Discord. |
| HipChat | Sends out notifications to your HipChat room. |
| HTTP Post | Sends out notifications to an HTTP endpoint. |
| Kafka | Sends out notifications to a Kafka consumer. |
| Microsoft Teams | Sends out notifications in Microsoft Teams. |
| MQTT | Sends out notifications to MQTT. |
| OpsGenie | Sends out notifications in OpsGenie. |
| PagerDuty | Sends out notifications in PagerDuty. |
| Pushover | Sends out notifications to Pushover. |
| ServiceNow | Sends out notifications to ServiceNow. |
| Slack | Sends out notifications to your Slack channel. |
| SMTP | Sends out notifications to email recipients. |
| SNMP Traps | Sends out notifications as SNMP traps to an SNMP trap receiver. |
| Telegram | Sends out notifications to Telegram. |
| VictorOps | Sends out notifications in VictorOps. |

Notification rate limiting & silencing#

Notification Channels can be configured to limit the notification rate and to prevent the same notification from being fired repeatedly in some situations.
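
The general idea of rate limiting can be sketched in a few lines of Python. The class `NotificationLimiter`, its parameters, and the choice of `(alert_id, level)` as the key that identifies "the same notification" are assumptions for illustration. Unryo's own rate limiting and silencing options are set in the Notification Channel configuration and may work differently.

```python
import time


class NotificationLimiter:
    """Suppress repeats of the same notification within `min_interval` seconds.

    Hypothetical sketch: a notification is considered "the same" when it has
    the same alert identifier and level.
    """

    def __init__(self, min_interval, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock          # injectable so the logic can be tested
        self._last_sent = {}        # (alert_id, level) -> time of last send

    def should_send(self, alert_id, level):
        key = (alert_id, level)
        now = self.clock()
        last = self._last_sent.get(key)
        if last is not None and now - last < self.min_interval:
            return False            # too soon: silence this repeat
        self._last_sent[key] = now
        return True


limiter = NotificationLimiter(min_interval=300)
if limiter.should_send("linux-cpu-server01", "CRITICAL"):
    print("notification sent")
if not limiter.should_send("linux-cpu-server01", "CRITICAL"):
    print("repeat suppressed")
```

Keying on both the alert identifier and the level means that an escalation from WARNING to CRITICAL is still delivered immediately, while identical repeats are held back.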