December 12, 2013

Jenkins Slave Monitoring

Ever wanted to monitor the status of your Jenkins slaves? As I’ve said before, Jenkins is one of my favorite tools.

At SEP, we program all the things, so we need many different build environments. Jenkins slaves to the rescue!

We have several slaves that run the following environments:

  • Mountain Lion
  • Windows 8
  • Windows 7
  • Ubuntu

Occasionally there is a hiccup and one will go down, or not stay connected to the master. Jenkins doesn’t necessarily tell you that a node is down, so we cooked up a way for Jenkins to tell us.

There are typically two types of node failures:

  1. The machine is actually not on
  2. The slave is not connected to the master

Addressing the “not on” problem

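ping is really good at telling me whether a machine is on or not. It exits with a non-zero status when no replies come back, and a non-zero exit from a shell build step fails the build. So, I created a Jenkins job based on ping to tell me whether the slave is up: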

  1. Create a free-style Jenkins job.
  2. (no source control necessary)
  3. Make a build step for “execute shell” with the following contents (replace slave-hostname with the hostname of your slave):

      ping -c 4 slave-hostname
    
  4. Save.
  5. Bask in the glory of your newly monitored slave.
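
If you have several slaves, one job could check them all. Here is a minimal sketch of that idea, not what we actually run: check_hosts and the PING_CMD override are names I made up for illustration, and the hostnames in the example call are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: ping every host given as an argument and
# report which ones answer. PING_CMD is an assumed override hook
# (handy for dry runs); it defaults to the real ping.
PING_CMD="${PING_CMD:-ping}"

check_hosts() {
  failed=0
  for host in "$@"; do
    # ping exits non-zero when no replies come back
    if "$PING_CMD" -c 4 "$host" > /dev/null 2>&1; then
      echo "UP $host"
    else
      echo "DOWN $host"
      failed=1
    fi
  done
  # Non-zero return fails the Jenkins build step if any host is down
  return "$failed"
}

# Example call (placeholder hostnames):
# check_hosts slave-mac slave-win8 slave-win7 slave-ubuntu
```

The trade-off is that a single job tells you that something is down, but you have to read the console output to see which machine it was. One job per slave keeps the failure obvious.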

Addressing the “not connected to master” problem

Jenkins obviously knows that a slave isn’t connected… but doesn’t give us a great way to monitor it.

This information is available in the Jenkins computer API.

So…

  1. Create a free-style Jenkins job.
  2. (again, no source control necessary)
  3. Make a build step for “execute shell” with the following contents (replace slave-hostname with the hostname of your slave):

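      # Ask the Jenkins computer API whether the slave is offline.
      # The backslashes keep curl from treating [ ] as a URL glob range.
      ENDPOINT="http://jenkins.sep.com/computer/api/xml?xpath=computerSet/computer\[displayName='slave-hostname'\]/offline"
      ENDPOINT_RESULT=$(curl "$ENDPOINT" 2> /dev/null)

      # Quote the variable so an empty response fails the comparison cleanly.
      if [ "$ENDPOINT_RESULT" = "<offline>false</offline>" ] ; then
        exit 0
      fi
      exit 1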
    
  4. Save.
  5. Enjoy the peace of mind of knowing that your slave status is now monitored by a Jenkins job.
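
The same check can be split into small functions so the node name becomes a parameter. This is only a sketch under a few assumptions: offline_url, is_online, and check_node are hypothetical names, JENKINS_URL defaults to the address used above, and the node name is assumed to contain no characters that need URL encoding. It uses curl's -g flag to turn off URL globbing (so the brackets need no backslashes) and -s to silence progress output.

```shell
#!/bin/sh
# JENKINS_URL is an assumed override; the default is the server used in this post.
JENKINS_URL="${JENKINS_URL:-http://jenkins.sep.com}"

# Build the computer API URL for one node's <offline> element.
offline_url() {
  echo "$JENKINS_URL/computer/api/xml?xpath=computerSet/computer[displayName='$1']/offline"
}

# Succeed only when the API response says the node is online.
# An empty response (curl failed, node unknown) counts as not online.
is_online() {
  [ "$1" = "<offline>false</offline>" ]
}

# Fetch and evaluate the status of one node.
check_node() {
  response=$(curl -g -s "$(offline_url "$1")")
  is_online "$response"
}

# Example call (placeholder node name):
# check_node slave-hostname || exit 1
```

Keeping is_online separate from the curl call means the comparison can be tried against canned responses without touching the Jenkins server.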

Now what?

Now that I have Jenkins jobs monitoring my slaves, I can hook them up to any notification system I want. For example, I have the Twilio plugin set up to send me a text message any time a build node goes down, so I can go hook it back up.