Unrecovered Events¶
The unrecovered events Explorer displays all event records in the current workspace that are at the alert level, giving you the full context of each alert event so that you can understand it faster. Because each event is associated with its monitor and alert strategy, the view also helps reduce alert fatigue.
The unrecovered events data source queries event data, aggregates it using df_fault_id as the unique identifier, and displays the most recent result for each identifier. Each event is presented with its key data: event level, duration, alert notifications, monitor, event content, and a historical trigger trend chart with the trigger threshold baseline. Together, these give you a comprehensive view for analyzing an event from different angles and making better-informed response decisions.
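The aggregation described above can be pictured with a minimal sketch. It assumes events arrive as dictionaries and that each carries a timestamp in a field named `time`; that field name and the sample values are hypothetical, and only `df_fault_id` and `df_status` come from this page.

```python
def latest_per_fault(events):
    """Keep only the most recent event record for each df_fault_id."""
    latest = {}
    for event in events:
        fault_id = event["df_fault_id"]
        current = latest.get(fault_id)
        # "time" is a hypothetical timestamp field (epoch seconds).
        if current is None or event["time"] > current["time"]:
            latest[fault_id] = event
    return latest


# Hypothetical sample data: two records share the same df_fault_id.
events = [
    {"df_fault_id": "a", "df_status": "critical", "time": 100},
    {"df_fault_id": "a", "df_status": "major", "time": 160},
    {"df_fault_id": "b", "df_status": "minor", "time": 120},
]
cards = latest_per_fault(events)
```

Here `cards` holds one entry per `df_fault_id`, which mirrors the idea of one event card per detection object.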
Event Card¶
Event Level¶
Statistics are shown for the following states, based on the monitor's trigger condition configuration: Unrecovered (df_status != ok), Critical, Major, Minor, and No Data.
In the unrecovered events Explorer, each event's level is the level at the last trigger of the detection object.
For more details, refer to Event Level Description.
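As a rough illustration of how such statistics could be derived, the sketch below counts events per status and treats every status other than `ok` as unrecovered, following the `df_status != ok` condition above. The lowercase status strings are assumptions made for the example, not confirmed field values.

```python
from collections import Counter


def level_statistics(cards):
    """Count events per df_status, plus a total of unrecovered events."""
    cards = list(cards)
    counts = Counter(card["df_status"] for card in cards)
    # Unrecovered means any status other than "ok".
    counts["unrecovered"] = sum(1 for card in cards if card["df_status"] != "ok")
    return counts


# Hypothetical status values used only for this example.
sample = [
    {"df_status": "critical"},
    {"df_status": "major"},
    {"df_status": "nodata"},
    {"df_status": "ok"},
]
stats = level_statistics(sample)
```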
Event Title¶
The event title shown in the unrecovered events Explorer comes from the title set when configuring the monitor rule. It is the title used at the last trigger of the detection object.
Duration¶
The time from when the current detection object first triggered an anomaly event until the end time of the current time widget, for example 5 minutes (08/20 17:53:00 ~ 17:57:38).
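The arithmetic behind the example can be sketched as follows. The year is not given in the example, so 2024 is used as a placeholder, and rounding to the nearest minute is an assumption that happens to reproduce the "5 minutes" shown for an interval of 4 minutes 38 seconds.

```python
from datetime import datetime


def duration_minutes(first_trigger, widget_end):
    """Duration from the first anomaly trigger to the time widget's end time."""
    seconds = (widget_end - first_trigger).total_seconds()
    # Rounding to the nearest minute is an assumption, not documented behavior.
    return round(seconds / 60)


# Placeholder year; the month, day, and times are taken from the example above.
start = datetime(2024, 8, 20, 17, 53, 0)
end = datetime(2024, 8, 20, 17, 57, 38)
minutes = duration_minutes(start, end)
```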
Alert Notifications¶
The alert notification status for the last trigger of the current detection object. It mainly includes the following three states:
- Mute: Indicates that the current event is affected by mute rules but no external alert notifications have been sent;
- Identifier for the actual sent notification targets: Includes DingTalk bots, WeCom bots, Lark bots, etc.;
- : No external alert notifications were triggered.
Monitor Detection Type¶
The type of the monitor that generated the event.
Detection Object¶
When configuring monitor rules, if a by grouping is used in the detection metric query, the event card displays the corresponding filter conditions, such as source:kodo-servicemap.
Event Content¶
The event content comes from the content preset when configuring the monitor rule. It is the content used at the last trigger of the current detection object.
Historical Trigger Trend Chart¶
The trend is displayed using window functions and shows the historical trend of 60 detection results.
Based on the detection results of the current unrecovered event, the chart shows how the anomaly has developed over time. The trigger threshold set in the monitor's detection rule is drawn as a reference line. The detection result of the last trigger of the current detection object is specially marked, and a vertical line in the chart lets you quickly locate the exact time of the trigger. The detection interval corresponding to the detection result is also displayed, so you can evaluate how the event developed and what impact it had.
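To make the idea of a fixed 60-point window with a threshold reference line concrete, here is a small sketch. It is not the product's query; it only keeps the last 60 values of a series and flags those at or above a threshold. Treating "at or above" as the trigger condition is an assumption, since the real condition depends on the monitor rule.

```python
from collections import deque

WINDOW_SIZE = 60  # number of detection results shown in the trend


def build_trend(results, threshold):
    """Return (value, crossed_threshold) pairs for the last 60 results."""
    window = deque(results, maxlen=WINDOW_SIZE)  # drops older values automatically
    # Assumption: a result counts as abnormal when it is >= threshold.
    return [(value, value >= threshold) for value in window]


# Hypothetical series of 100 detection results with a threshold of 90.
trend = build_trend(range(100), threshold=90)
abnormal_points = [value for value, crossed in trend if crossed]
```

The last element of `trend` corresponds to the most recent detection result, which is the point the chart marks specially.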
Management Card¶
Display Items¶
The unrecovered events list supports the following display styles:
- Standard: Displays the event title, detection dimensions, and event content.
- Extended: In addition to the standard information, displays the historical trend of the detection results for unrecovered events.
- List: Displays event data in list form.
Show Only Related Issue Events¶
Check this option to show, with one click, only the events in the current list that are associated with Issues.
For a single event with an association relationship, clicking the icon on the right side of the event data allows you to jump directly to the view:
Issue & Create Issue¶
You can create an Issue for an unrecovered event to notify the relevant members so that they can handle it promptly.
- List mode:
- Standard/Extended mode:
- Event Details:
Mute Event¶
In large-scale monitoring scenarios, handling a large number of similar alerts manually is tedious, time-consuming, and prone to omissions. To avoid this, you can mute rules directly on the current page:
- Hover over a single event and click Mute on the right side;
- Select mute time type;
- Confirm.
Mute Time Types¶
You can customize the start and end times for muting, or quickly set the duration to 1 hour, 6 hours, 12 hours, 1 day, or 1 week.
- Select the start time and duration for muting;
- Choose the mute cycle starting from a certain moment;
- Choose the expiration time for muting. The mute can repeat forever on the schedule above, or repeat until a specific moment.
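The quick presets above map naturally onto fixed durations. The sketch below shows one way to compute a mute window from a start time and a preset, and to check whether a given moment falls inside it. The preset keys and function names are hypothetical, and repeating cycles are left out for brevity.

```python
from datetime import datetime, timedelta

# Hypothetical keys for the quick presets listed above.
QUICK_DURATIONS = {
    "1h": timedelta(hours=1),
    "6h": timedelta(hours=6),
    "12h": timedelta(hours=12),
    "1d": timedelta(days=1),
    "1w": timedelta(weeks=1),
}


def mute_window(start, preset):
    """Return the (start, end) of a mute period for a quick preset."""
    return start, start + QUICK_DURATIONS[preset]


def is_muted(moment, window):
    """True if the moment falls inside the mute window (end exclusive)."""
    begin, end = window
    return begin <= moment < end


# Placeholder date used only for the example.
window = mute_window(datetime(2024, 8, 20, 18, 0), "6h")
```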
Recover Events¶
Events with a status of normal (df_sub_status = ok) are considered recovered events.
- To recover a single rule, use the button on the right side of the rule, go to the Monitors settings, or recover it manually.
- Click "Recover All" to recover all abnormal events in the current list, with the option to associate Issues.
Recovered events are divided into four types:
Name | df_status | Description |
---|---|---|
Recovered | ok | If an event previously detected as "Critical", "Major", or "Minor" is not triggered again within N detections, it is considered recovered. |
Data Gap Recovery | ok | Data reporting stopped and then resumed, so the event is judged as recovered. |
Data Gap Considered as Recovery | ok | A gap in the detection data is treated as a normal state. |
Manual Recovery | ok | The user manually recovers the event; single and batch recovery are supported. |
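The "Recovered" rule in the first row can be sketched as a check over a sequence of detection results. This is a simplified reading of the rule: it assumes each detection yields a boolean (True for abnormal), ordered oldest first, and the function name is hypothetical.

```python
def is_recovered(abnormal_flags, n):
    """True if an earlier anomaly exists and the last n detections were all normal."""
    if n <= 0 or len(abnormal_flags) <= n:
        return False
    earlier, recent = abnormal_flags[:-n], abnormal_flags[-n:]
    # Recovery requires a prior anomaly and no new trigger in the last n detections.
    return any(earlier) and not any(recent)


# One anomaly followed by three normal detections, with N = 3.
recovered = is_recovered([True, False, False, False], 3)
# The anomaly re-triggers inside the last three detections.
still_abnormal = is_recovered([True, False, True, False], 3)
```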