Health Checks and Alarms

Health Checks

Health checks within the MobiledgeX platform can run automatically or be configured manually. For example, you can configure health checks so that they do not run for app instances. Each health check verifies the performance of a specific component and, where possible, confirms that the component is functioning within its designated normal tolerances.

Health checks and QoS checks are built into the MobiledgeX platform. Once health check results are available, the MobiledgeX team is notified so that appropriate action can be taken to address any errors or warning conditions the checks detect.

For example, periodic tests are performed between the MobiledgeX Global Controller and the regional controllers and cloudlets. These tests can confirm that the cloudlets and controllers are active and responding. Additionally, the latency between these components is recorded and monitored for performance.
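The periodic controller and cloudlet tests described above can be pictured with the minimal Python sketch below. It is an illustration under stated assumptions, not MobiledgeX code: the `probe` callable, the target names, the `ProbeResult` record, and the latency threshold are all hypothetical, and a real check would issue a network request instead of calling a local function.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ProbeResult:
    """Outcome of one health-check probe against a single target (hypothetical record)."""
    target: str
    active: bool
    latency_ms: Optional[float]  # None when the target did not respond


def run_health_checks(targets: List[str],
                      probe: Callable[[str], bool]) -> List[ProbeResult]:
    """Probe each target once, recording whether it responded and how long it took.

    `probe` is a hypothetical callable that returns True if the target responded.
    """
    results = []
    for target in targets:
        start = time.perf_counter()
        try:
            ok = probe(target)
        except Exception:
            ok = False  # treat any probe error as "not responding"
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        results.append(ProbeResult(target, ok, elapsed_ms if ok else None))
    return results


def unhealthy(results: List[ProbeResult], max_latency_ms: float) -> List[str]:
    """Targets that did not respond, or responded slower than the threshold."""
    return [r.target for r in results
            if not r.active or r.latency_ms > max_latency_ms]


# Simulated round: "cloudlet-b" is down, every other target answers immediately.
results = run_health_checks(
    ["regional-controller-eu", "cloudlet-a", "cloudlet-b"],
    probe=lambda t: t != "cloudlet-b",
)
print(unhealthy(results, max_latency_ms=250.0))  # ['cloudlet-b']
```

Treating a probe exception as a failed check keeps one unreachable target from aborting the whole round, so each component's status and latency are still recorded independently.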

Alerts generated by health checks are treated the same as any other alert; that is, they can be sent to any defined alert receiver, provided that the RBAC requirements are satisfied. Managing alerts this way allows Operators to receive notifications of possible upstack issues that could affect their cloudlet(s).

Alarms

Within MobiledgeX’s platform, an alarm is triggered by any abnormal system behavior or unexpected result.

Alarms are classified into one of four severity levels based on the nature of the condition and its impact on the component's performance.

  • Critical: Requires immediate attention and reflects conditions that may affect an appliance's performance or signal the loss of a broad category of service. An example would be a network failure taking an entire cloudlet offline.

  • Major: Indicates conditions that should be addressed within 24 hours of the notification. An example would be an unexpected traffic class error.

  • Minor: Denotes conditions that can be addressed at your convenience. Examples include a user who has not changed their account's default password, or a degraded disk.

  • Warning: Signifies conditions that may develop into an issue over time. For example, a software version mismatch.

Alerts

Alerts notify users of alarms that indicate irregular performance so that issues can be mitigated proactively. For every Critical and Major alert, a notification is sent to the user through Slack or email, depending on the delivery method the user has configured. Once the issue or condition is resolved, another notification is sent indicating that the issue has been resolved.
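The notification rule described above can be summarized in a short Python sketch. The four severities and the two delivery methods come from this section; the message format is hypothetical, and the sketch assumes that Minor and Warning alarms do not notify, since the text only states that Critical and Major ones do.

```python
from enum import Enum
from typing import Optional


class Severity(Enum):
    """The four alarm severity levels described in this section."""
    CRITICAL = "critical"
    MAJOR = "major"
    MINOR = "minor"
    WARNING = "warning"


class DeliveryMethod(Enum):
    """The two delivery channels named in this section."""
    SLACK = "slack"
    EMAIL = "email"


# Assumption: only these severities notify; the text states this for Critical
# and Major but does not say what happens to Minor and Warning alarms.
NOTIFYING_SEVERITIES = {Severity.CRITICAL, Severity.MAJOR}


def notification_for(severity: Severity, resolved: bool,
                     method: DeliveryMethod) -> Optional[str]:
    """Describe the notification to send, or return None if none is sent.

    The message format is hypothetical, not the platform's actual wording.
    A resolved condition produces a second, "resolved" notification.
    """
    if severity not in NOTIFYING_SEVERITIES:
        return None
    state = "resolved" if resolved else "firing"
    return f"[{method.value}] {severity.value} alert {state}"


print(notification_for(Severity.MAJOR, False, DeliveryMethod.SLACK))  # [slack] major alert firing
print(notification_for(Severity.MAJOR, True, DeliveryMethod.SLACK))   # [slack] major alert resolved
print(notification_for(Severity.MINOR, False, DeliveryMethod.EMAIL))  # None
```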

AlertManager

The AlertManager is a global component of the MobiledgeX product and is responsible for distributing alerts to cloud operators. Alarms are consolidated at the regional level, where each regional controller receives alarms via notifications.

The image below illustrates the AlertManager workflow. A cloud operator can create an alert receiver and set up their preferred notification method through the Edge-Cloud Console. Once an alert receiver is created, it is pushed to the MobiledgeX platform. When an alarm is triggered, the AlertManager within the platform sends an alert notification to the cloud operator for mitigation.

Alert Receiver Workflow

Alert Management

The MobiledgeX platform provides a flexible alerting interface that includes the following:

* RBAC support for users, roles, and organizations that control access to alerts. Any user who can view a resource that generates an alert can create or delete an alert receiver for that resource. However, because alerts are raised and cleared by the platform, users cannot create custom alerts.

* Flexibility to manage the delivery of alerts to different “alert receivers” based on user configuration. We currently support the delivery of alerts to your Slack or email account.
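The RBAC rule in the list above can be expressed as a one-line permission check. This is a minimal sketch, not the platform's RBAC implementation: the `User` type and its `viewable_resources` set are hypothetical simplifications of user, role, and organization grants.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class User:
    name: str
    # Hypothetical flattening of RBAC grants (users, roles, organizations)
    # into the set of resources this user is allowed to view.
    viewable_resources: Set[str] = field(default_factory=set)


def can_manage_alert_receiver(user: User, resource: str) -> bool:
    """Viewing rights on a resource are enough to create or delete its alert receivers."""
    return resource in user.viewable_resources


# Alerts themselves are raised and cleared only by the platform, so this
# sketch deliberately has no operation for creating a custom alert.

alice = User("alice", {"cloudlet-a"})
print(can_manage_alert_receiver(alice, "cloudlet-a"))  # True
print(can_manage_alert_receiver(alice, "cloudlet-b"))  # False
```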

AlertManager and MobiledgeX APIs

The AlertManager is designed to be configurable via the MeX APIs, both directly and through the mcctl utility program, providing flexibility for users integrating with their existing monitoring systems.

Action                      API Route
Create an Alert Receiver    api/v1/auth/alertreceiver/create
Delete an Alert Receiver    api/v1/auth/alertreceiver/delete
Show all Alert Receivers    api/v1/auth/alertreceiver/show

For detailed AlertReceiver API examples, please refer to the MCCTL Reference Guide.
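For readers integrating directly with the API, the following Python sketch builds, but does not send, a request for each alert receiver route listed in this section. Several details are assumptions rather than documented behavior: the `BASE_URL` placeholder, the use of JSON POST for every route, bearer-token authentication, and the body field names in the usage example. Consult the MCCTL Reference Guide for the real request schema.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Hypothetical placeholder; substitute the address of your MobiledgeX console.
BASE_URL = "https://console.example.com"

# Routes listed in this section.
ROUTES = {
    "create": "api/v1/auth/alertreceiver/create",
    "delete": "api/v1/auth/alertreceiver/delete",
    "show": "api/v1/auth/alertreceiver/show",
}


def build_alert_receiver_request(action: str, token: str,
                                 body: Optional[Dict[str, Any]] = None) -> urllib.request.Request:
    """Build (but do not send) a request for one alert receiver route.

    Assumptions: JSON POST for every route and a bearer token in the
    Authorization header.
    """
    if action not in ROUTES:
        raise ValueError(f"unknown action: {action!r}")
    data = json.dumps(body or {}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{ROUTES[action]}",
        data=data,
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
        method="POST",
    )


req = build_alert_receiver_request(
    "create", token="<your-token>",
    body={"name": "ops-slack", "type": "slack"},  # hypothetical field names
)
print(req.full_url)  # https://console.example.com/api/v1/auth/alertreceiver/create
# urllib.request.urlopen(req) would send it; omitted here.
```

Keeping request construction separate from sending makes the route and payload easy to inspect before anything reaches the platform.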

To set up alert receivers and notification methods through the console

While you can use the mcctl tool and its commands to set up your alert receivers and notification preferences, we recommend using the Edge-Cloud Console for ease of use.

Step 1. Navigate to the Alert Receivers sub-menu and click the + (plus) icon. The Create Receiver screen opens.

Create Alert Receiver screen

Step 2. Additional fields appear depending on your selections. Populate all the required fields.

Step 3. Your new Alert Receiver will appear on the Alert Receivers page. When you click the Alert icon, information about the alert is displayed.

Resolving Alerts

Every alert received by the user has an expiration time. If an alert expires without being refreshed, that is, the platform fires no further alerts for that condition, the alert is considered resolved.
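The expiration rule above can be modeled with a small Python class. This is a minimal sketch under stated assumptions: the `ActiveAlert` type, timestamps in seconds, and the TTL value are hypothetical, since the text does not specify how long an alert lives or how its expiration is refreshed.

```python
from dataclasses import dataclass


@dataclass
class ActiveAlert:
    """An alert that expires unless the platform keeps refreshing it (hypothetical model)."""
    name: str
    expires_at: float  # timestamp in seconds; the unit and TTL are assumptions

    def refresh(self, now: float, ttl: float) -> None:
        """The alarm fired again: push the expiration time forward."""
        self.expires_at = now + ttl

    def is_resolved(self, now: float) -> bool:
        """Resolved once the expiration time passes without a refresh."""
        return now >= self.expires_at


alert = ActiveAlert("cloudlet-offline", expires_at=100.0)
alert.refresh(now=90.0, ttl=30.0)    # still firing at t=90, so it now expires at t=120
print(alert.is_resolved(now=110.0))  # False
print(alert.is_resolved(now=125.0))  # True: no refresh since t=90
```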

View Audit Logs

Historical activities performed by you and others within your organization are logged and can be viewed from the Edge-Cloud Console. Each activity is logged with its date and time, and the logs can be used for diagnostics and error correction. You can trace an event through three sections: Raw Viewer, Request, and Response. These sections provide valuable information if you require support from MobiledgeX: copy the traceid from the Raw Viewer section and email it to MobiledgeX Support.

To view the audit logs, go to the Organizations page and select Audit from the Actions menu.
On this page, you can:

  • Filter logs by region

  • Filter logs by time range

Contact Support

You can email MobiledgeX Support for help resolving product issues. To help expedite your request, copy and paste the traceid, which can be found on the Audit logs page, into your email along with a brief description of your issue.