Bookmarks: Clustered Filesystems for CentOS

Excellent resources….

Clustered Filesystem with DRBD and GFS2 on CentOS 5.4

…a short walk-through of how to set up a filesystem, which replicates across two web nodes, and allows concurrent access from both nodes. This scenario is particularly useful, when you intend to load-balance or automatically fail-over two web nodes…

Clustered Filesystem with DRBD and OCFS2 on CentOS 5.5

…OCFS2 works very similar to GFS2, except that it doesn’t use RedHat’s Cluster Manager, but instead ships with O2CB, Oracle’s own cluster manager. As far as the filesystem is concerned, it does the same thing.

I’ve been playing with both solutions in VirtualBox, with a plan to roll out to EC2 and solve my CPU issues.

GFS won’t be happening in EC2 as it requires multicast. I’ve played with IPSEC and GRE, and the RedHat clustering stuff just won’t bind to the tunnel interfaces.

OCFS2 looks like it will work and I’ll be testing it on a micro-instance later, but it doesn’t support SELINUX so I’ll need to review my security config.

More posts no doubt as testing continues!

CentOS/Redhat IPSEC and EC2

So it turns out my 5 minute vpn doesn’t work in EC2 because the ESP/AH protocols (50 and 51) are blocked on the AWS network.

This is no big deal though, as NAT-T allows one to tunnel IPSEC over UDP… however, getting it to work on CentOS required a bit of a hack.

If you have already tried setting up an IPSEC vpn, shut it down with ifdown ipsec1 and remove your /etc/racoon/192.168.56.101.conf (or whatever IP yours is).

To start the hack on BOTH boxes, you need to edit /etc/sysconfig/network-scripts/ifup-ipsec. Around line 215 you need to insert nat_traversal force;… like this….

BEFORE:

        case "$IKE_METHOD" in
           PSK)
              cat >> /etc/racoon/$DST.conf << EOF
        my_identifier address;
        proposal {
                encryption_algorithm $IKE_ENC;
                hash_algorithm $IKE_AUTH;
                authentication_method pre_shared_key;
                dh_group $IKE_DHGROUP;
        }
}

AFTER:

        case "$IKE_METHOD" in
           PSK)
              cat >> /etc/racoon/$DST.conf << EOF
        my_identifier address;
        nat_traversal force;
        proposal {
                encryption_algorithm $IKE_ENC;
                hash_algorithm $IKE_AUTH;
                authentication_method pre_shared_key;
                dh_group $IKE_DHGROUP;
        }
}
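
If you'd rather not hand-edit the script, the same insertion can be scripted. This is only a sketch: the function name add_nat_traversal is my own invention, it assumes the line "my_identifier address;" appears in your copy of ifup-ipsec exactly as shown above, and it prints the patched copy to stdout so you can diff it before replacing anything.

```shell
#!/bin/bash
# Hypothetical helper (not part of initscripts): print a copy of the given
# file with "nat_traversal force;" added after every line that contains
# "my_identifier address;". A file that already has the directive is
# printed unchanged, so running it twice is harmless.
add_nat_traversal() {
    local file=$1
    if grep -q 'nat_traversal force;' "$file"; then
        cat "$file"
        return 0
    fi
    awk '{ print }
         /my_identifier address;/ { print "        nat_traversal force;" }' "$file"
}

# Example (as root, on BOTH boxes):
#   f=/etc/sysconfig/network-scripts/ifup-ipsec
#   add_nat_traversal "$f" > /tmp/ifup-ipsec.new
#   diff -u "$f" /tmp/ifup-ipsec.new
```

Keep a copy of the patched file somewhere safe, as a later initscripts update may well put the stock script back.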

Again, on both boxes update your /etc/sysconfig/network-scripts/ifcfg-ipsec1 files so that AH is disabled… because AH doesn’t like NAT… like this….


[root@CentOS2 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ipsec1 
DST=192.168.56.101
TYPE=IPSEC
ONBOOT=yes
IKE_METHOD=PSK
AH_PROTO=none
[root@CentOS2 ~]#

On your iptables policy make sure that UDP 500 and UDP 4500 are permitted and voilà.
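
For the iptables step, something along these lines should do it. It's a sketch, not my actual policy: the function name and the IPT override variable are made up for illustration (IPT=echo lets you preview the rules), and it assumes your rules hang off the INPUT chain.

```shell
#!/bin/bash
# Sketch: permit IKE (UDP 500) and NAT-T (UDP 4500) from the peer.
# IPT is a hypothetical override so the rules can be previewed safely.
allow_ipsec_natt() {
    local ipt=${IPT:-iptables}
    local peer=$1                      # the other box, e.g. 192.168.56.101
    "$ipt" -I INPUT -p udp -s "$peer" --dport 500 -j ACCEPT    # IKE
    "$ipt" -I INPUT -p udp -s "$peer" --dport 4500 -j ACCEPT   # NAT-T
}

# Preview first, then run for real as root and save the policy:
#   IPT=echo allow_ipsec_natt 192.168.56.101
#   allow_ipsec_natt 192.168.56.101 && service iptables save
```

With the ports open, the capture below shows ESP wrapped in UDP.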

# tcpdump -n -i eth1 port not 22
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 96 bytes
20:26:49.257590 IP 192.168.56.101.ipsec-nat-t > 192.168.56.102.ipsec-nat-t: UDP-encap: ESP(spi=0x08de7c32,seq=0xa), length 116
20:26:49.261076 IP 192.168.56.102.ipsec-nat-t > 192.168.56.101.ipsec-nat-t: UDP-encap: ESP(spi=0x03787bd0,seq=0xa), length 116
20:26:50.260942 IP 192.168.56.101.ipsec-nat-t > 192.168.56.102.ipsec-nat-t: UDP-encap: ESP(spi=0x08de7c32,seq=0xb), length 116
20:26:50.262939 IP 192.168.56.102.ipsec-nat-t > 192.168.56.101.ipsec-nat-t: UDP-encap: ESP(spi=0x03787bd0,seq=0xb), length 116
20:26:51.261298 IP 192.168.56.101.ipsec-nat-t > 192.168.56.102.ipsec-nat-t: UDP-encap: ESP(spi=0x08de7c32,seq=0xc), length 116
20:26:51.264974 IP 192.168.56.102.ipsec-nat-t > 192.168.56.101.ipsec-nat-t: UDP-encap: ESP(spi=0x03787bd0,seq=0xc), length 116
20:26:52.262289 IP 192.168.56.101.ipsec-nat-t > 192.168.56.102.ipsec-nat-t: UDP-encap: ESP(spi=0x08de7c32,seq=0xd), length 116
20:26:52.265488 IP 192.168.56.102.ipsec-nat-t > 192.168.56.101.ipsec-nat-t: UDP-encap: ESP(spi=0x03787bd0,seq=0xd), length 116
20:26:53.264008 IP 192.168.56.101.ipsec-nat-t > 192.168.56.102.ipsec-nat-t: UDP-encap: ESP(spi=0x08de7c32,seq=0xe), length 116
20:26:53.267003 IP 192.168.56.102.ipsec-nat-t > 192.168.56.101.ipsec-nat-t: UDP-encap: ESP(spi=0x03787bd0,seq=0xe), length 116
20:26:54.265655 IP 192.168.56.101.ipsec-nat-t > 192.168.56.102.ipsec-nat-t: UDP-encap: ESP(spi=0x08de7c32,seq=0xf), length 116
20:26:54.267264 IP 192.168.56.102.ipsec-nat-t > 192.168.56.101.ipsec-nat-t: UDP-encap: ESP(spi=0x03787bd0,seq=0xf), length 116
20:26:55.267459 IP 192.168.56.101.ipsec-nat-t > 192.168.56.102.ipsec-nat-t: UDP-encap: ESP(spi=0x08de7c32,seq=0x10), length 116
20:26:55.269678 IP 192.168.56.102.ipsec-nat-t > 192.168.56.101.ipsec-nat-t: UDP-encap: ESP(spi=0x03787bd0,seq=0x10), length 116
14 packets captured
14 packets received by filter
0 packets dropped by kernel
#

IPSEC VPN Tunnelling over UDP…. done!

RedHat Cluster – How to Disable Fencing

I’ve spent far too long googling how to disable fencing…. I can only guess that because you shouldn’t really disable fencing, no-one wants to post a how-to… so, for the hard of hearing:

Do NOT disable fencing on your RedHat Cluster unless you really know what you’re doing! Fencing is designed to protect your data from corruption; if you disable fencing your data is at RISK. You have been warned!

I, however, am building a GFS DRBD cluster. As far as I can gather DRBD doesn’t need fencing, and the bottom line is that my data is personal, not mission critical; if my website goes down because I disabled fencing then it’s no big deal.

Rant over, here we go….. To disable fencing, create a custom fence agent.

Fence agents are simply scripts in /sbin. I’ve created /sbin/myfence, and here are the contents.

#!/bin/bash
echo "success: myfence $2"
exit 0
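
The script needs to be executable before the cluster can call it. A small sketch that writes the file and sets the mode in one go; the function name is hypothetical, and the path is an argument so you can try it somewhere harmless before touching /sbin.

```shell
#!/bin/bash
# Sketch: write the dummy fence agent shown above and make it executable.
# Defaults to /sbin/myfence (the path used in this post).
install_dummy_fence() {
    local agent=${1:-/sbin/myfence}
    cat > "$agent" << 'EOF'
#!/bin/bash
echo "success: myfence $2"
exit 0
EOF
    chmod 755 "$agent"
}

# Try it out somewhere harmless first:
#   install_dummy_fence /tmp/myfence && /tmp/myfence a b
```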

Next, change your cluster.conf…

<?xml version="1.0"?>
<cluster alias="linickx" config_version="41" name="linickx">
        <cman expected_votes="1" two_node="1" />

        <clusternodes>
                <clusternode name="CentOS1" nodeid="1" votes="1">
                         <fence>
                                <method name="1">
                                        <device nodename="CentOS1" name="myfence"/>
                                </method>
                        </fence>
                </clusternode>

                <clusternode name="CentOS2" nodeid="2" votes="1">
                        <fence>
                                <method name="2">
                                        <device nodename="CentOS2" name="myfence"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>

        <fencedevices>
                <fencedevice agent="myfence" name="myfence"/>
        </fencedevices>
        <rm/>
</cluster>

If you’re running SELINUX don’t forget to update that! … start with restorecon /sbin/myfence then update your policy.

This is the policy I’ve created…

module fenced 1.0;

require {
        type fenced_t;
        type shell_exec_t;
        class file { read execute };
}

#============= fenced_t ==============
allow fenced_t shell_exec_t:file { read execute };

If you save the above as fenced.te, then run this to install it..

checkmodule -M -m -o fenced.mod fenced.te
semodule_package -o fenced.pp -m fenced.mod
semodule -i fenced.pp

You should now be able to start cman. Fencing will start, but it will return success for any fencing request without actually doing anything!

Happy non-fencing!

GRE example for CentOS/RHEL

I’m not sure why GRE isn’t in RedHat’s Documentation, but setting up a GRE tunnel between two RedHat boxes is quite straightforward…

On Host1 (192.168.56.101)…

[root@CentOS1 ~]# cat /etc/sysconfig/network-scripts/ifcfg-tun0 
DEVICE=tun0
BOOTPROTO=none
ONBOOT=no
TYPE=GRE
PEER_OUTER_IPADDR=192.168.56.102
PEER_INNER_IPADDR=192.168.168.2
MY_INNER_IPADDR=192.168.168.1
[root@CentOS1 ~]#

On host2 (192.168.56.102) ….

[root@CentOS2 ~]# cat /etc/sysconfig/network-scripts/ifcfg-tun0 
DEVICE=tun0
BOOTPROTO=none
ONBOOT=no
TYPE=GRE
PEER_OUTER_IPADDR=192.168.56.101
PEER_INNER_IPADDR=192.168.168.1
MY_INNER_IPADDR=192.168.168.2
[root@CentOS2 ~]#
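
The two files are mirror images of each other, which makes it easy to swap an address by accident. Here is a sketch of a generator; gre_ifcfg is a made-up helper name, and the argument order is my own choice.

```shell
#!/bin/bash
# Hypothetical helper: print an ifcfg-tun0 for one end of the tunnel.
#   $1 = the peer's real (outer) address
#   $2 = this host's tunnel (inner) address
#   $3 = the peer's tunnel (inner) address
gre_ifcfg() {
    cat << EOF
DEVICE=tun0
BOOTPROTO=none
ONBOOT=no
TYPE=GRE
PEER_OUTER_IPADDR=$1
PEER_INNER_IPADDR=$3
MY_INNER_IPADDR=$2
EOF
}

# Host1: gre_ifcfg 192.168.56.102 192.168.168.1 192.168.168.2
# Host2: gre_ifcfg 192.168.56.101 192.168.168.2 192.168.168.1
# Redirect the output to /etc/sysconfig/network-scripts/ifcfg-tun0 as root.
```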

Bring the interfaces up….

[root@CentOS1 ~]# ifup tun0

.. on host2…

[root@CentOS2 ~]# ifup tun0

And we’re done! … see the proof in the pudding below….

[root@CentOS1 ~]# ifconfig tun0
tun0      Link encap:UNSPEC  HWaddr 00-00-00-00-05-08-80-3C-00-00-00-00-00-00-00-00  
          inet addr:192.168.168.1  P-t-P:192.168.168.2  Mask:255.255.255.255
          UP POINTOPOINT RUNNING NOARP  MTU:1476  Metric:1
          RX packets:2 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:168 (168.0 b)  TX bytes:756 (756.0 b)

[root@CentOS1 ~]# ping 192.168.168.2
PING 192.168.168.2 (192.168.168.2) 56(84) bytes of data.
64 bytes from 192.168.168.2: icmp_seq=1 ttl=64 time=1.51 ms
64 bytes from 192.168.168.2: icmp_seq=2 ttl=64 time=2.13 ms
64 bytes from 192.168.168.2: icmp_seq=3 ttl=64 time=2.12 ms

--- 192.168.168.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2004ms
rtt min/avg/max/mdev = 1.511/1.921/2.132/0.289 ms
[root@CentOS1 ~]#

The other end…

[root@CentOS2 ~]# ifconfig tun0
tun0      Link encap:UNSPEC  HWaddr 00-00-00-00-05-08-80-4C-00-00-00-00-00-00-00-00  
          inet addr:192.168.168.2  P-t-P:192.168.168.1  Mask:255.255.255.255
          UP POINTOPOINT RUNNING NOARP  MTU:1476  Metric:1
          RX packets:42 errors:0 dropped:0 overruns:0 frame:0
          TX packets:42 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:3528 (3.4 KiB)  TX bytes:4536 (4.4 KiB)

[root@CentOS2 ~]# ping 192.168.168.1
PING 192.168.168.1 (192.168.168.1) 56(84) bytes of data.
64 bytes from 192.168.168.1: icmp_seq=1 ttl=64 time=4.39 ms
64 bytes from 192.168.168.1: icmp_seq=2 ttl=64 time=1.41 ms
64 bytes from 192.168.168.1: icmp_seq=3 ttl=64 time=2.57 ms

--- 192.168.168.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2005ms
rtt min/avg/max/mdev = 1.419/2.795/4.393/1.224 ms
[root@CentOS2 ~]# 

Here we show the tunnelled packets…

[root@CentOS1 ~]# tcpdump -n -i eth1 proto 47
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 96 bytes
13:45:59.429315 IP 192.168.56.102 > 192.168.56.101: GREv0, length 88: IP 192.168.168.2 > 192.168.168.1: ICMP echo request, id 55053, seq 7, length 64
13:45:59.429315 IP 192.168.56.101 > 192.168.56.102: GREv0, length 88: IP 192.168.168.1 > 192.168.168.2: ICMP echo reply, id 55053, seq 7, length 64
13:46:00.530528 IP 192.168.56.102 > 192.168.56.101: GREv0, length 88: IP 192.168.168.2 > 192.168.168.1: ICMP echo request, id 55053, seq 8, length 64
13:46:00.530686 IP 192.168.56.101 > 192.168.56.102: GREv0, length 88: IP 192.168.168.1 > 192.168.168.2: ICMP echo reply, id 55053, seq 8, length 64
13:46:01.418447 IP 192.168.56.102 > 192.168.56.101: GREv0, length 88: IP 192.168.168.2 > 192.168.168.1: ICMP echo request, id 55053, seq 9, length 64
13:46:01.418526 IP 192.168.56.101 > 192.168.56.102: GREv0, length 88: IP 192.168.168.1 > 192.168.168.2: ICMP echo reply, id 55053, seq 9, length 64

6 packets captured
6 packets received by filter
0 packets dropped by kernel
[root@CentOS1 ~]#

Since we can see the ICMP packets inside the GRE tunnel, that shows us that GRE is in clear text… to add some security, set up a simple IPSEC VPN :)

Reference: http://juliano.info/en/Blog:Memory_Leak/Bridges_and_tunnels_in_Fedora

5 Minute CentOS/RHEL VPN

I’m looking at running two servers on EC2; as we all know the most important thing about running services in the cloud is encryption!

Whilst googling how to set up a host-to-host IPSEC VPN I was surprised at how easy it is…

On Host1 (192.168.56.101)…

[root@CentOS1 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ipsec1 
DST=192.168.56.102
TYPE=IPSEC
ONBOOT=no
IKE_METHOD=PSK
[root@CentOS1 ~]#
[root@CentOS1 ~]# cat /etc/sysconfig/network-scripts/keys-ipsec1 
IKE_PSK=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
[root@CentOS1 ~]#
[root@CentOS1 ~]# ifup ipsec1

On host2 (192.168.56.102)…

[root@CentOS2 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ipsec1 
DST=192.168.56.101
TYPE=IPSEC
ONBOOT=no
IKE_METHOD=PSK
[root@CentOS2 ~]#
[root@CentOS2 ~]# cat /etc/sysconfig/network-scripts/keys-ipsec1 
IKE_PSK=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
[root@CentOS2 ~]#
[root@CentOS2 ~]# ifup ipsec1

… done!!!

[root@CentOS1 ~]# tcpdump -n -i eth1 host 192.168.56.102
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 96 bytes
09:46:37.306292 IP 192.168.56.101 > 192.168.56.102: AH(spi=0x0aff2b10,seq=0x203): ESP(spi=0x00a0a3cc,seq=0x203), length 84
09:46:37.310197 IP 192.168.56.102 > 192.168.56.101: AH(spi=0x09f82154,seq=0x203): ESP(spi=0x098f0ff9,seq=0x203), length 68
09:46:38.175048 IP 192.168.56.101 > 192.168.56.102: AH(spi=0x0aff2b10,seq=0x204): ESP(spi=0x00a0a3cc,seq=0x204), length 84
09:46:38.179017 IP 192.168.56.102 > 192.168.56.101: AH(spi=0x09f82154,seq=0x204): ESP(spi=0x098f0ff9,seq=0x204), length 68
09:46:39.313583 IP 192.168.56.101 > 192.168.56.102: AH(spi=0x0aff2b10,seq=0x205): ESP(spi=0x00a0a3cc,seq=0x205), length 84
09:46:39.316427 IP 192.168.56.102 > 192.168.56.101: AH(spi=0x09f82154,seq=0x205): ESP(spi=0x098f0ff9,seq=0x205), length 68

6 packets captured
6 packets received by filter
0 packets dropped by kernel
[root@CentOS1 ~]#
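
The keys-ipsec1 files above hold the pre-shared key in plain text, so it's worth using a random key and keeping the file readable by root only. A sketch follows; the function name is hypothetical, and the 64 hex character key length is just my pick, not a requirement I've seen documented.

```shell
#!/bin/bash
# Sketch: build the IKE_PSK line for keys-ipsec1 from 32 random bytes,
# written out as 64 lowercase hex characters.
make_psk_line() {
    local psk
    psk=$(head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n')
    echo "IKE_PSK=$psk"
}

# As root on Host1 (the subshell keeps the umask change local):
#   ( umask 077; make_psk_line > /etc/sysconfig/network-scripts/keys-ipsec1 )
# then copy the same file to Host2 over ssh; both ends need the same key.
```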

Now this is a simple IKE pre-shared key VPN. You might want to google for using certificates for stronger authentication, and you can also edit /etc/racoon/racoon.conf to change your IPSEC parameters.

Reference: http://www.centos.org/docs/5/html/Deployment_Guide-en-US/ch-vpn.html

UPDATE: To make this work in EC2, you need to enable NAT-T see my hack here!

SELINUX and OSSEC IPTables error

OSSEC is my favourite linux HIDS however now that I’m running a SELINUX secured web server I noticed that my active responses were not working after a reboot.

After enabling SELINUX, I started getting alerts about the following problem in my messages file….

Nov 11 12:16:22 amy kernel: type=1400 audit(1289477782.569:8): avc:  denied  { read write } for  pid=2551 comm="iptables" path="socket:[5261]" dev=sockfs ino=5261 scontext=system_u:system_r:iptables_t:s0 tcontext=system_u:system_r:initrc_t:s0 tclass=unix_dgram_socket

This appears to be ossec trying to update iptables, but failing as they’re in different contexts… now I’m no selinux expert but this CentOS Wiki Page helped… run the following command, which will create osseciptables.pp in the current directory…

root@amy# grep iptable /var/log/messages | audit2allow -M osseciptables

This creates a new binary module that can be installed with….

/usr/sbin/semodule -i osseciptables.pp 

You can view current selinux modules with …

/usr/sbin/semodule -l

If you want to see what is being created by audit2allow, try the following…

root@amy# grep iptable /var/log/messages | audit2allow -m osseciptables

module osseciptables 1.0;

require {
        type iptables_t;
        type initrc_t;
        class unix_dgram_socket { read write };
}

#============= iptables_t ==============
allow iptables_t initrc_t:unix_dgram_socket { read write };
root@amy#

I hope this helps some future googler!

Smolt RPM for CentOS, RHEL, etc

After installing Fedora 7 I thought I’d take a look at the stats the project had gathered. I saw some centos devices, but couldn’t find an rpm.

I’ve had a go at building one, and it mostly works (this is my nagios box). It’s a rebuild of the f7 source; I had to frig about with the spec file, so I’ve published my source rpm here. Search for Nick in the .spec file and you’ll see my bodge.

The smolt rpms are in my yum repo, feel free to download the packages and have a go.

Extra Packages for Enterprise Linux… CentOS !

Why has it taken me so long to spot this? Looks like this draft was written on the 13th May; if I hadn’t been just about to download FC7 then I’d have missed it!

EPEL – Fedora Project Wiki
EPEL is a volunteer-based community effort from the Fedora project to create a repository of high-quality add-on packages that complement the Fedora-based Red Hat Enterprise Linux (RHEL) and its compatible spinoffs like CentOS or Scientific Linux.

About time, and thank you redhat/fedora! Want fedora extra packages in centos? Then install this epel-release .rpm, frickin’ sweet!

Intel 3945ABG Wireless / WiFi Card on CentOS 5

I’ve taken to using CentOS on my servers, and fedora on my Laptop. New job means new laptop, and to avoid fedora update hell, I thought I’d try CentOS on my laptop.

All seems good other than my wifi card not being detected, and for some reason googling for “centos 5 intel 3945” didn’t provide a working answer. Actually I found the answer by googling for “supplementary disk centos 5”, which finds this thread that says…

Install dag’s repo (this rpm), and then install dkms-ipw3945 (yum will pick up the dependencies)

yum install dkms-ipw3945

Next enable network manager…

chkconfig --level 345 NetworkManager on
chkconfig --level 345 NetworkManagerDispatcher on

Reboot (seriously), and when you next log in you’ll get a little icon in your system tray where you can manage your WiFi :)

How to Migrate from White Box Linux 4 to CentOS 4.4

There are some things that you just never get round to. My nagios box was still running whitebox linux, and I’ve finally got round to “upgrading” it to CentOS… yeah ok, upgrade is arguable, but you get my point.

First off a warning: Don’t do this! All the documentation for CentOS, RHEL, Fedora, any redhat linux, says clean installs are the best way and upgrades are not advised…. therefore I offer no support or warranty that this will work. In fact, I advise you to read this post, but step away from your consoles!

But, if you think it might be a laugh, the centos documentation is a bit old, and not 100% correct, so here is what I did. First up (as root, obviously), clear out your yum cache and install the CentOS gpg key.

yum clean all
rpm --import http://mirror.centos.org/centos/RPM-GPG-KEY-CentOS-4

Next, install some base centos packages; take note that some need to be forced on.

rpm -Uvh --nodeps http://mirror.centos.org/centos/4.4/os/i386/CentOS/RPMS/centos-release-4-4.2.i386.rpm
rpm -ivh http://mirror.centos.org/centos/4.4/os/i386/CentOS/RPMS/python-elementtree-1.2.6-4.2.1.i386.rpm
rpm -ivh http://mirror.centos.org/centos/4.4/os/i386/CentOS/RPMS/python-sqlite-1.1.7-1.2.i386.rpm
rpm -ivh http://mirror.centos.org/centos/4.4/os/i386/CentOS/RPMS/sqlite-3.3.3-1.2.i386.rpm
rpm -Uvh --force http://mirror.centos.org/centos/4.4/os/i386/CentOS/RPMS/python-urlgrabber-2.9.8-2.noarch.rpm
rpm -Uvh --nodeps http://mirror.centos.org/centos/4.4/os/i386/CentOS/RPMS/yum-2.4.3-1.c4.noarch.rpm

Finally, remove the whitebox rpm db.

rpm -ev rpmdb-whitebox

Move any “whitebox” mirrors still in /etc/yum.repos.d out of the way and

yum install rpmdb-CentOS
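
For the "move any whitebox mirrors" step (do it before the yum install above), this is roughly what I mean. It's a sketch: the function name and the default backup directory are made up, and it simply looks for the word "whitebox" inside each .repo file, which may or may not match how your mirrors are named.

```shell
#!/bin/bash
# Sketch: move every .repo file that mentions "whitebox" (any case) out
# of the yum repo directory into a backup directory. Both paths are
# arguments so it can be tried on a copy first.
stash_whitebox_repos() {
    local repo_dir=${1:-/etc/yum.repos.d}
    local backup_dir=${2:-/root/whitebox-repos}
    local f
    mkdir -p "$backup_dir" || return 1
    for f in "$repo_dir"/*.repo; do
        [ -f "$f" ] || continue            # directory had no .repo files
        if grep -qi 'whitebox' "$f"; then
            mv "$f" "$backup_dir"/ && echo "moved $f"
        fi
    done
}

# As root:  stash_whitebox_repos
```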

Once you have that sorted, you can complete the upgrade with

yum update
reboot

& cross your fingers ;)

If you come across the following warning while using yum: Warning, could not load sqlite, falling back to pickle, I found…

yum install python-sqlite

Fixed the problem. And there we have it, all my boxes are now running CentOS – yay – just in time to look at the CentOS 5 upgrade ;)

Dependency Problems ?
If a whitebox rpm is newer than the CentOS one, it won’t get upgraded, and this might cause problems when installing new packages via yum. To solve the problem download the rpm manually from http://www.centos.org/modules/tinycontent/index.php?id=13 and force an upgrade

rpm --force -Uvh Something-CentOS.rpm

UPDATE: If you’re using something like Root Kit Hunter, you will notice a load of md5 hashes fail. These are whitebox rpms that didn’t need upgrading; to correct the problem you need to replace them with the CentOS versions. Example rkhunter output:

/sbin/init  [ BAD ]

Find which rpm init belongs to

# rpm -q --whatprovides /sbin/init
SysVinit-2.85-34.3

and upgrade it

wget http://www.mirrorservice.org/sites/mirror.centos.org/4.4/os/i386/CentOS/RPMS/SysVinit-2.85-34.3.i386.rpm
rpm --force -Uvh SysVinit-2.85-34.3.i386.rpm

Service Recovery Scripts & Error Page Tips.

A couple of weeks ago, I was proper ill with flu. The problem with looking after your own server is that only you can fix it; it’s well and good having monitoring systems (nagios) telling you about faults, but if you can’t read or see the alerts the fault won’t get resolved.

While I was ill, for an unknown reason the mySQL process on my server died, and as such my website (and others I look after) was down for 8 hours. The fix was simple: one command to restart the service and normal service was resumed (excuse the pun).

This led me to the conclusion that there must be a way to get the server to fix itself. After all, why do a job when you can get a computer to do it for you! Fortunately I had a light bulb moment and realised that I could use the init scripts that are provided by redhat. The code below will restart apache (httpd) and mySQL on a redhat based system in the event that the service was not stopped cleanly. (In fact this config has only been tested on CentOS; your mileage may vary on anything else.)

#!/bin/bash

# taken from redhat default scripts - /etc/rc.d/init.d/functions

# Set up a default search path.
PATH="/sbin:/usr/sbin:/bin:/usr/bin:/usr/X11R6/bin"
export PATH

status() {
        local base=${1##*/}
        local pid

        # Test syntax.
        if [ "$#" = 0 ] ; then
                echo $"Usage: status {program}"
                return 1
        fi

        # First try "pidof"
        pid=`pidof -o $$ -o $PPID -o %PPID -x $1 ||
             pidof -o $$ -o $PPID -o %PPID -x ${base}`
        if [ -n "$pid" ]; then
# Uncomment this if you want OK messages
#               echo $"${base} (pid $pid) is running..."
                return 0
        fi

        # Next try "/var/run/*.pid" files
        if [ -f /var/run/${base}.pid ] ; then
                read pid < /var/run/${base}.pid
                if [ -n "$pid" ]; then
                        echo $"${base} dead but pid file exists"
                        /etc/init.d/${base} restart
                        return 1
                fi
        fi
        # See if /var/lock/subsys/${base} exists
        if [ -f /var/lock/subsys/${base} ]; then
                echo $"${base} dead but subsys locked"
                /etc/init.d/${base} restart
                return 2
        fi
        echo $"${base} is stopped"
        return 3
}

# found in /etc/init.d/httpd
httpd=${HTTPD-/usr/sbin/httpd}

status mysqld
status $httpd

If you save this as /etc/cron.hourly/auto_recovery.sh, then do chmod +x /etc/cron.hourly/auto_recovery.sh, then (assuming you’ve not changed the default cron setup) every hour mySQL & httpd will be checked. If they have died they’ll be restarted and root will get an e-mail about what happened.
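
The script above leans on pidof. A different and simpler check, which is my own variation and not what the redhat functions file does, is to ask whether the PID recorded in a pid file still belongs to a live process. A sketch, with a made-up function name:

```shell
#!/bin/bash
# Variation (not the redhat approach): succeed if the PID stored in the
# given pid file belongs to a running process, fail otherwise.
# kill -0 sends no signal; it only checks that the process can be
# signalled, so run this as root to be able to see other users' daemons.
pidfile_alive() {
    local pidfile=$1 pid=""
    [ -f "$pidfile" ] || return 1
    read -r pid < "$pidfile"
    [ -n "$pid" ] || return 1
    kill -0 "$pid" 2>/dev/null
}

# Example; the pid file path is an assumption, check where yours lives:
#   pidfile_alive /var/run/mysqld/mysqld.pid || /etc/init.d/mysqld restart
```

Unlike the original, this restarts the service whenever the pid file is missing too, which includes a service you stopped on purpose, so treat it as a starting point.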

Cool eh !

A final finishing touch: I wanted to change the default “Database Down” error messages on my two most popular applications.

  • Melvin Rivera has written a tutorial on how to customize the wordpress error page, note that it involves editing a file outside of wp-content, that means you’ll have to re-do this “hack” every time you upgrade wordpress.
  • PHPBB: Setting a custom error page on that is really easy, first create a php page displaying your message. Then at the bottom of /path/to/phpbb-install/includes/db.php you’ll see
    // Make the database connection.
    $db = new sql_db($dbhost, $dbuser, $dbpasswd, $dbname, false);
    if(!$db->db_connect_id)
    {
    message_die(CRITICAL_ERROR, "Could not connect to the database");
    }

    change it to

     // Make the database connection.
    $db = new sql_db($dbhost, $dbuser, $dbpasswd, $dbname, false);
    if(!$db->db_connect_id)
    {
     include("/path/to/my-custom-error-page.php");
            die();
    }

Now if your database dies, for the time it’s down (before cron fixes it) wordpress & phpbb sites will get a much prettier error message. Obviously there’s no solution for apache as there’s nothing to serve the pages, but hopefully this kind of thing doesn’t happen too often :D

Cacti & Nagios – Missing Favicons

Recently I decided to re-organise my bookmarks toolbar, and added links to my nagios and cacti installations. I noticed that the favicons were missing.

For cacti, there’s a how-to, but I found it a little overkill. I didn’t need step 2, as my cacti install is an rpm from dag, and I didn’t bother with step 4, as it worked without it, but hey ymmv!

Nagios was simpler. How you installed nagios will affect file permissions, owners, directories etc. Again, I’ve got another dag rpm, so for me I logged in as root,

cd /usr/share/nagios/
wget http://www.nagios.org/images/favicon.ico

then edit index.html. Just before </head>, insert

<link rel="shortcut icon" href="/nagios/favicon.ico" type="image/x-icon" />

refresh your browser (delete the cache if necessary), and job done ! :D
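
If you have several boxes to do, the index.html edit can be scripted. A sketch only: add_favicon_link is a made-up name, and it assumes a lowercase </head> sitting on a line of its own.

```shell
#!/bin/bash
# Sketch: print the given HTML file with the favicon <link> added on the
# line before the first line containing </head>. A file that already
# mentions favicon.ico is printed unchanged.
add_favicon_link() {
    local file=$1
    local link='<link rel="shortcut icon" href="/nagios/favicon.ico" type="image/x-icon" />'
    if grep -q 'favicon.ico' "$file"; then
        cat "$file"
        return 0
    fi
    awk -v link="$link" '
        !seen && /<\/head>/ { print link; seen = 1 }
        { print }
    ' "$file"
}

# As root:
#   cd /usr/share/nagios/
#   add_favicon_link index.html > index.html.new && mv index.html.new index.html
```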

SNMP v3 on Redhat Linux

I think it’s safe to say, if you can’t get something to work then the manual is rubbish or the user is stupid. With setting up snmp v3 on linux, the user is me, so the fault probably lies there.

SNMPv3 moves away from the community string idea of older versions, and onto a username & password combo. The correct tool for creating users is snmpusm, but no matter how many times I read the man page I can’t work it out. I get that you copy a user from the initial user, but how do you create the initial user? If I try on my box I just get an “snmp timeout” error.

I found a workaround for my stupidity: on redhat based boxes (RHEL, CENTOS, WHEL, FEDORA) there is a development package to do the job, so to get the snmp v3 encrypted goodness going run,

yum install net-snmp-utils net-snmp-devel 

Yum will pick up the dependencies you need. Now as root, run (make sure snmpd is stopped first)

/usr/bin/net-snmp-config --create-snmpv3-user -a PASSWORD MYUSERNAME

You’ll get the following output…

adding the following line to /var/net-snmp/snmpd.conf:
   createUser MYUSERNAME MD5 "PASSWORD" DES
adding the following line to /usr/share/snmp/snmpd.conf:
   rwuser MYUSERNAME

Before testing make sure that UDP 161 is permitted through iptables, and start snmpd again

/etc/init.d/snmpd start

Now from another box you can test. snmpwalk is the command: if it works your screen will fill up with loads of interesting snmp stuff, if it fails you’ll get an error. Timeout usually means UDP 161 is blocked or they can’t ping each other, and you’ll get authentication failure messages if there is a problem with your snmp accounts.

snmpwalk -v 3 -a md5 -A PASSWORD -x des -X PASSWORD -u MYUSERNAME IP.ADD.RE.SS
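
If the walk comes back with an authentication or authorization error even though the password is right, it may be worth spelling out the security level with -l authPriv rather than relying on whatever default your snmp.conf gives you. A sketch of a wrapper; the function name and the SNMPWALK override are made up, and it reuses one password for both authentication and privacy, which matches the createUser line above.

```shell
#!/bin/bash
# Sketch: run snmpwalk with SNMPv3 authentication and encryption, with
# the security level given explicitly.
# SNMPWALK is a hypothetical override; SNMPWALK=echo previews the command.
snmp3_walk() {
    local walk=${SNMPWALK:-snmpwalk}
    local user=$1 pass=$2 host=$3
    "$walk" -v 3 -l authPriv -u "$user" \
            -a MD5 -A "$pass" -x DES -X "$pass" "$host"
}

# Preview, then run for real:
#   SNMPWALK=echo snmp3_walk MYUSERNAME PASSWORD IP.ADD.RE.SS
#   snmp3_walk MYUSERNAME PASSWORD IP.ADD.RE.SS
```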

good luck !