Solr Service Alerting - SearchStax


Overview

SearchStax® from Measured Search® provides two kinds of real-time email alerts: heartbeat alerts and threshold alerts.

Both types of alerts create an "incident" report that you can inspect in the SearchStax dashboard.

Heartbeat Alerts

Both Zookeeper and Solr send reports of system metrics to SearchStax at least once per minute. You can set up a "heartbeat" alarm to notify you if these reports should be delayed or interrupted. The system also notifies you when the updates resume.

You can configure the heartbeat alert to trigger when a configurable number of reports have been missed within a configurable amount of time. These settings let you guard against false alarms caused by transient network delays.
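The way the two settings interact can be sketched in a few lines of Python. This is only an illustration of the idea, not SearchStax's implementation: the function name, the parameters `max_missed` and `window_seconds`, their default values, and the fixed 60-second reporting interval are all assumptions made for the example.

```python
def heartbeat_alarm(report_times, now, max_missed=3, window_seconds=300, interval=60):
    """Return True if at least `max_missed` expected reports are absent
    from the window that ends at `now`.

    report_times: timestamps (in seconds) at which reports actually arrived.
    interval: assumed reporting period (one report per minute).
    All names and defaults here are hypothetical.
    """
    window_start = now - window_seconds
    # Number of reports we would expect to see in a healthy window.
    expected = window_seconds // interval
    # Reports that actually arrived inside the window.
    received = sum(1 for t in report_times if window_start < t <= now)
    missed = max(0, expected - received)
    return missed >= max_missed
```

With these assumed defaults, a single delayed report within five minutes does not raise the alarm, while three missing reports do. Raising `max_missed` or widening the window makes the alert more tolerant of brief network hiccups.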

Set up a Heartbeat Alert

To set up a heartbeat alert, open the SearchStax dashboard and navigate to a specific deployment.

Scroll the left-side menu down until you see the Alerting node. Expand this node and select Heartbeat. Click the New Heartbeat button.

Receive a Heartbeat Alert

A heartbeat email notification resembles this one:

Subject: Host 52.57.163.20 DOWN SearchStax notification
From: alert@measuredsearch.com
Date: 11/1/2016 4:46 PM
To: you@gmail.com

Hi there,

This is a notification sent by SearchStax.

Host 52.57.163.20 is DOWN

Log in to your account at https://searchstax.measuredsearch.com/admin/deployment/xxx/threshold/incident/update/528 to see further details and take the necessary actions.

Best regards,
Measured Search Team

You will receive a similar "UP" notification when the heartbeat is again detected.

View the Heartbeat Incident Report

Click the URL in the email to view the incident report. (Or use the SearchStax dashboard menu to visit Alerting > Incidents. Choose the current incident from the list.)

Screenshot

You'll see a brief description of the incident followed by a timeline of events. Read the timeline from the bottom up.

   

You may Close or Open each incident as many times as needed.

Threshold Alerts

A "threshold" alert watches a specific system metric and sends you an email when the metric meets or exceeds a value that you specify.

SearchStax allows you to monitor the following system metrics:

SearchStax can also issue alerts on the following Solr metrics:

Set up a Threshold Alert

To set up a threshold alert, open the SearchStax dashboard and navigate to a specific deployment.

Scroll the left-side menu down until you see the Alerting node. Expand this node and select Threshold. Click the Create New Check button.

Screenshot

Receive a Threshold Alert

A threshold email notification resembles this one:

Subject: OPENED SearchStax incident #523 for System Load Average
From: alert@measuredsearch.com
Date: 10/31/2016 2:58 PM
To: you@gmail.com

Hi there,

This is a notification sent by SearchStax.

Incident 523, System Load Average, has been opened.

Log in to your account at https://searchstax.measuredsearch.com/admin/deployment/xxx/threshold/incident/update/523 to see further details and take the necessary actions.

Best regards,
Measured Search Team
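If you route these notifications through a mail filter or script, the subject line is enough to tell the two alert types apart. The following Python sketch is based only on the two sample subjects shown in this document. The function name is hypothetical, and two details are assumptions: that the "UP" subject mirrors the "DOWN" subject, and that states other than OPENED appear as a single uppercase word.

```python
import re

# Pattern taken from the sample heartbeat subject; the "UP" variant is assumed.
HEARTBEAT = re.compile(
    r"^Host (?P<host>\S+) (?P<state>DOWN|UP) SearchStax notification$"
)
# Pattern taken from the sample threshold subject; only "OPENED" is documented.
THRESHOLD = re.compile(
    r"^(?P<action>[A-Z]+) SearchStax incident #(?P<incident>\d+) for (?P<metric>.+)$"
)

def classify_subject(subject):
    """Return a dict describing the alert, or None if the subject is unrecognized."""
    subject = subject.strip()
    match = HEARTBEAT.match(subject)
    if match:
        return {
            "kind": "heartbeat",
            "host": match.group("host"),
            "state": match.group("state"),
        }
    match = THRESHOLD.match(subject)
    if match:
        return {
            "kind": "threshold",
            "action": match.group("action"),
            "incident": int(match.group("incident")),
            "metric": match.group("metric"),
        }
    return None
```

For example, `classify_subject("OPENED SearchStax incident #523 for System Load Average")` yields a threshold record for incident 523. Check the patterns against your own notifications before relying on them.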

View the Threshold Incident Report

Threshold incidents appear in the same incident list as the Heartbeat incidents.

Click the URL in the email to view the incident report.

Alerting Tips and Tricks

Here are a few notes about setting up specific types of alerts.

CPU Utilization / System Load

Set up a threshold alert monitoring the System Load Average. Here's an example:

Screenshot

This alert will trigger when the System Load Average metric is greater than 0.5 for more than one minute. It will send five emails at two-minute intervals.
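The timing of this example can be worked through in a short Python sketch. It is not the SearchStax implementation. The function names, the once-per-minute sampling, and the strict "more than" comparison for the delay are assumptions chosen to match the wording above.

```python
def first_trigger_time(samples, threshold=0.5, delay_seconds=60):
    """Return the time of the first sample at which the metric has stayed
    above `threshold` for more than `delay_seconds`, or None if it never does.

    samples: iterable of (timestamp_seconds, value) pairs in time order.
    """
    breach_start = None
    for t, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = t  # the breach begins with this sample
            if t - breach_start > delay_seconds:
                return t
        else:
            breach_start = None  # metric recovered; start over
    return None

def email_schedule(trigger_time, count=5, interval_seconds=120):
    """Times at which the notifications would go out: five emails, two minutes apart."""
    return [trigger_time + i * interval_seconds for i in range(count)]
```

Under these assumptions, a load average that rises above 0.5 at second 60 and stays there triggers at second 180, and `email_schedule(180)` gives sends at 180, 300, 420, 540, and 660 seconds. A spike that lasts only one sample never triggers.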

Free Memory

There is no direct metric for free memory, but you can monitor Used Physical Memory plus a selection of more specific usage stats (JVM, etc.).

Average Search Latency

There is no direct metric for search latency. You can monitor Solr Search 5-minute Rate, setting alerts for both high and low rates, to alert you when search behavior becomes atypical.

Commits per Minute

There is no metric that reports commits per time unit. The information is present in your solr.log file.

$ grep "start commit" solr.log

The time stamps on the matching lines show how often commits occur.
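If the log is large, counting the matching lines per minute is easier than reading time stamps by eye. The Python sketch below assumes each log line begins with a time stamp of the form `YYYY-MM-DD HH:MM:SS.mmm`. That layout depends on your Solr logging configuration, so adjust the pattern if your lines look different. The function name is hypothetical.

```python
import re
from collections import Counter

# Assumed layout at the start of each line: "2016-11-01 16:46:03.123 ..."
# The capture group keeps the time stamp down to the minute.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}")

def commits_per_minute(lines):
    """Count "start commit" lines per minute.

    lines: any iterable of log lines, such as an open file.
    Returns a dict mapping "YYYY-MM-DD HH:MM" to the number of commits.
    """
    counts = Counter()
    for line in lines:
        if "start commit" not in line:
            continue
        match = TIMESTAMP.match(line)
        if match:
            counts[match.group(1)] += 1
    return dict(sorted(counts.items()))
```

To use it, open the log with `with open("solr.log") as log:` and print the items of `commits_per_minute(log)`. Lines that lack a recognizable time stamp are skipped rather than guessed at.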

Cache Warm Up Time

There are four cache warmup metrics.

Five-Minute Requests per Second

Monitor the Solr Search 5-minute Rate metric. Excessively high (or low) rates may mean that you need to add (or remove) servers.

Search Errors per Minute

Monitor Solr Search Errors and set the threshold and delay values to create an appropriate rate-per-minute.

Indexing Errors per Minute

Monitor Solr Indexing Errors and set the threshold and delay values to create an appropriate rate-per-minute.