Monitor the status of SQL Server Agent jobs with Nagios XI

I am looking into monitoring the status of  SQL Server Agent jobs and reporting them with Nagios with the following requirements

  • For each step report the ‘run_status’.
  • Provide date arithmetic to determine a time span for when the step was last run in days.
  • Return a status of Critical if the run_status is  reported as ‘0’ and Warning if the run_status is reported as ‘1’ and the ‘run_date’ is greater than the time span.

The powershell script will require two parameters for both the SQL Agent step name and the number of days to compare the ‘run_date’ agaisnt a calculated time span object.

Param ([string] $StepName,[string] $Days)

As I will be querying the SQL Server database engine instance and in this case a majority of those are SQL Server 2008 R2 I will need to import the SQLPS module into my session.

Import-Module SQLPS -DisableNameChecking

Below is my the query I will be invoking to return the information for the last run where the step_name is specified as a parameter.

$Query = Invoke-SQLCmd - ServerInstance "." -Query ("SELECT TOP 1 * from msdb..sysjobhistory h WHERE step_name = " + $StepName + " order by instance_id desc")

Once the information has been returned I will need to use DateTime.ParseExact method in order to convert  the ‘run_date’ value to a date string of ‘dd/MM/yyyy’ and store this as a variable.

$LastRun = [datetime]::parseexact(($Query.run_date),"yyyyMMdd",$null).toString('dd/MM/yyyy')

The run_duration value returns the amount of time the step was running in seconds, I want to return this as a string ‘dd:hh:mm:ss’ by using the New-TimeSpan cmdlet and formatting the output.

$Duration = (New-TimeSpan -Seconds $Query.run_duration).ToString()

In order to provide the date arithmetic I will subtract the  number of days specified in the parameter from the current date.

$Timespan = (Get-Date) - (New-TimeSpan -Days $Days -Hours 0 -Minutes 0)

In order to configure the service status and the status information conditional logic will be used to determine the return code. As per my requirements, I require the to report the service status to be ‘OK’ if the run_status is equal to ‘1’ and the ‘run_date’ is greater than the TimeSpan variable generated from the date arithmetic.

If ($Query.run_status -eq "1" -and [datetime]::parse($LastRun ) -ge [datetime]::parse($TimeSpan) )
{ 
$returncode = 0
"Completed succesfully on " + $LastRun + " with run duration " + $Duration
}

We will report a service status of ‘Warning if the ‘run_status’ is equal to ‘1’ and the ‘run_date’ is less then the TimeSpan variable generated from the date arithmetic.

ElseIf ($Query.run_status -eq "1" -and $LastRun -lt $Timespan )
{ 
$returncode = 1
"Completed succesfully on " + $LastRun + " with run duration " + $Duration
}

Finally, if the ‘run_status’ is equal to ‘0’ the service status will be reported as ‘Critical.

ElseIf ($Query.run_status -eq "0")
{ 
$returncode = 2
"Completed with failure on " + $LastRun  + " with run duration " + $Duration
}

Once the status information has been generated the script will exit the powershell session returning the exit code.

exit $returncode

While the script was created to be executed as an external script within Nagios, this can be run standalone from Windows Powershell as below.

./Check-SQLServerAgentJob,ps11 -StepName <step_name> -Days <number of days>

If your are looking to add external scripts to Nagios such as this one see the below link for more information;

https://deangrant.wordpress.com/2013/09/12/creating-and-running-external-scripts-within-nagios-xi/

The full Windows Powershell script can be downloaded from the below link:

https://app.box.com/s/ph6h3qmzaytw5oiel1eb


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s