About Events and Alerts#

Overview#

Events come from two sources:

Generated by Unryo when it detects anomalies in the data stream.
Ingested from your external systems (monitoring tools, SNMP traps, logs, and so on).

Events are filtered, deduplicated, and mapped into alerts, which are visible in the Alert Console and in the topology maps.

Alert Life Cycle#

Events are processed and grouped into alerts, which are visible to users in the Alert Console. Each event has a set of tags (or properties), some required and some optional. The Alert Engine uses event tags to create alerts, update them, and maintain their state (active or inactive).
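
As a rough illustration of how events might be folded into alerts, the sketch below keys alerts on the `alertID` tag and switches their state between active and inactive. It is not Unryo's implementation: the `Alert` class, the `apply_event` function, the sample `alertID` value, and the rule that an event with level `OK` acts as the clear are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    """Hypothetical in-memory alert record; not Unryo's data model."""
    alert_id: str
    level: str
    active: bool = True
    event_count: int = 1


def apply_event(alerts: dict[str, Alert], event: dict) -> Alert:
    """Create or update the alert that matches the event's alertID tag."""
    alert_id = event["alertID"]
    # Assumption: an event with level OK is the clear for its alert.
    is_clear = event["level"] == "OK"
    alert = alerts.get(alert_id)
    if alert is None:
        alert = Alert(alert_id=alert_id, level=event["level"], active=not is_clear)
        alerts[alert_id] = alert
    else:
        # Repeated events with the same alertID update the existing alert
        # instead of creating a duplicate.
        alert.level = event["level"]
        alert.active = not is_clear
        alert.event_count += 1
    return alert


# Three events for the same (hypothetical) alertID end up in a single alert.
alerts: dict[str, Alert] = {}
for level in ("WARNING", "CRITICAL", "OK"):
    apply_event(alerts, {"alertID": "linux-cpu/server01", "level": level})
```

Because the three events share one `alertID`, `alerts` holds a single entry, which ends up inactive once the `OK` event is applied.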

Event Properties#

All events in Unryo are standardized with a set of common properties.

Event Property	Type	Description
resource	Required Tag	Resource associated with the event.
resource_type	Required Tag	Type of the resource.
technology	Required Tag	Technology of the resource.
level	Required Tag	The level (or severity) of the event. Possible values: `WARNING`, `CRITICAL`, `OK`, `INFO`
category	Required Tag	The category of the event. Can be any string or one of the normalized categories. `Availability` (Down, Not Responding, Cluster at Risk, Cluster Degraded, Unavailable, and more): a resource is unavailable or at risk of becoming unavailable. `Reachability` (No Data Received): a connectivity problem with the monitored resources, such as a network communication failure, an unavailable cloud API, or unavailable EMS access. `Errors` (Failed Status Check, Failed Health Check, Authentication Failed, and more): error-related events (e.g. a request that fails) or increased error rates (e.g. traffic errors). `Saturation` (High Processor Usage, High Memory Usage, High Queue Length, High Disk Reads, Disk almost full, High Load, and more): a measure of resource utilization that indicates how "full" a resource is. `Latency` (Degraded Disk IO, Query Slowdown, Long Response Time, and more): slowdowns, such as increased times for query executions or user transactions. `Custom`: user-defined event type. `Informational`: purely informational event with no impact. `Notification`: events (such as SNMP traps or Kubernetes events) received from external systems that are unknown or not mapped.
eventname	Required Tag	A string that acts as a title of the event. e.g. "High CPU Utilization"
eventtext	Required Tag	A string that provides a short description about the event, e.g. "Linux Server is experiencing a high CPU utilization"
message	Optional Field	A string that provides a longer description about the event, e.g. "CPU utilization is high: 83.4%" or "CPU utilization is now back to normal: 72.0%"
eventtype	Required Tag	Either `durable` (Unryo knows the state of the event and can send a new event if the state changes) or `momentary` (a problem occurred at a point in time, for example, an SNMP trap notification).
value	Required Field	The value associated with the current event. Must be a float.
unit	Optional Tag
resource_component	Optional Tag
resource_component_type	Optional Tag
measurement	Required Tag	Timeseries measurement that stores the metric
alertname	Required Tag	The alert policy that triggered the event, e.g. "Linux CPU"
alertID	Required Tag	The unique alert identifier. Set automatically. Can be customized to control how events are matched to alerts.

If you customize your own alert policies, make sure all the required tags and fields are set on the resulting events. Otherwise, the event cannot be interpreted and converted into an alert. You can also add your own custom tags and fields.
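
One way to catch incomplete events before they reach the Alert Engine is a small pre-flight check like the sketch below. It only encodes the required tags, the allowed `level` and `eventtype` values, and the float `value` field listed in the table above. The `validate_event` function and the sample tag values are hypothetical, and Unryo may apply different or additional checks.

```python
# Required tags, taken from the Event Properties table.
REQUIRED_TAGS = (
    "resource", "resource_type", "technology", "level", "category",
    "eventname", "eventtext", "eventtype", "measurement", "alertname", "alertID",
)
VALID_LEVELS = {"WARNING", "CRITICAL", "OK", "INFO"}
VALID_EVENT_TYPES = {"durable", "momentary"}


def validate_event(tags: dict, fields: dict) -> list[str]:
    """Return a list of problems; an empty list means the event looks complete."""
    problems = [f"missing tag: {name}" for name in REQUIRED_TAGS if not tags.get(name)]
    if tags.get("level") and tags["level"] not in VALID_LEVELS:
        problems.append(f"invalid level: {tags['level']}")
    if tags.get("eventtype") and tags["eventtype"] not in VALID_EVENT_TYPES:
        problems.append(f"invalid eventtype: {tags['eventtype']}")
    # The value field is required and has to be a float.
    if not isinstance(fields.get("value"), float):
        problems.append("field 'value' must be a float")
    return problems


# Sample event; the resource, technology, measurement and alertID values are made up.
sample_tags = {
    "resource": "server01",
    "resource_type": "server",
    "technology": "linux",
    "level": "WARNING",
    "category": "Saturation",
    "eventname": "High CPU Utilization",
    "eventtext": "Linux Server is experiencing a high CPU utilization",
    "eventtype": "durable",
    "measurement": "cpu",
    "alertname": "Linux CPU",
    "alertID": "linux-cpu/server01",
}
sample_problems = validate_event(sample_tags, {"value": 83.4})
```

Custom tags and fields pass through untouched, since the check only looks at the names it knows about.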

Alert Properties#

Alert Property	Description
Event State	`Active` indicates that the problem is active and requires attention. This is the initial state when an event occurs. `Inactive` indicates that the problem no longer requires attention because it has been cleared or has expired. Inactive alerts are displayed in white in the Alerts Console. A durable alert becomes inactive when the corresponding clear is received. A momentary alert becomes inactive when it is acknowledged, if its `clear on ack` option is enabled, or when its expiration is reached (default 24 hours). `Deleted` indicates that the alert has been removed: inactive alerts are automatically deleted after a period of time (default 3 days) and no longer appear in the Alerts Console.
Acknowledge	`Acknowledging an alert` tells other operators that you are aware of the issue and are working on it, and assigns ownership of the alert to you. Acknowledging a momentary alert also sets its state to inactive (if `clear on ack` is enabled). Momentary alerts are automatically acknowledged after a period of time (default 24 hours). `Unacknowledging an alert` does not release the ownership.
Ownership	`Take Ownership` tells other users you are working on it. `Release Ownership` releases the ownership.
Root Cause	Boolean set by the Correlation Engine. `Yes` means that the Correlation Engine determined that the alert is a root-cause alert. `No` means that the alert is an impact.
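
The state rules in the table can be summarized as a small function of time. The sketch below is only a reading of those rules, not Unryo's implementation. The `alert_state` function and its parameters are hypothetical, the 24-hour and 3-day values are the documented defaults, and it assumes that the deletion delay is counted from the moment the alert became inactive.

```python
from datetime import datetime, timedelta

EXPIRATION = timedelta(hours=24)  # default expiration for momentary alerts
RETENTION = timedelta(days=3)     # default delay before inactive alerts are deleted


def alert_state(eventtype, created, now, cleared_at=None, acked_at=None, clear_on_ack=False):
    """Return 'Active', 'Inactive', or 'Deleted' for an alert at time `now`."""
    if eventtype == "durable":
        # A durable alert becomes inactive when the corresponding clear is received.
        inactive_since = cleared_at
    else:
        # A momentary alert becomes inactive on acknowledgement (if clear on ack
        # is enabled) or when its expiration is reached, whichever comes first.
        candidates = [created + EXPIRATION]
        if clear_on_ack and acked_at is not None:
            candidates.append(acked_at)
        inactive_since = min(candidates)
    if inactive_since is None or now < inactive_since:
        return "Active"
    # Assumption: the retention period starts when the alert became inactive.
    if now < inactive_since + RETENTION:
        return "Inactive"
    return "Deleted"


t0 = datetime(2024, 1, 1)
state_after_expiry = alert_state("momentary", t0, t0 + timedelta(hours=25))
```

In this reading, a durable alert with no clear stays active indefinitely, while a momentary alert always ends up inactive and then deleted.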