Another day in IT

“Can you please look at the pascoe13 server? It’s completely unresponsive. I can’t even ping it.”

Half an hour later the system administrator comes back: “Er, we’re going to have to restore from last night’s backup. Most of the file system seems to be missing!?”

And so our IT personnel dutifully restored the server to its original state. To their horror, however, the same failure occurred again the following day.

Now, who would maliciously delete everything from a production server? Only a handful of people had access, and they all assured us the server was happy as Larry when they went home for the night. It had to be a bot.

A brief investigation revealed the following script running as a cron job:

#!/bin/bash
# cleanup.sh - script that clears out empty log files
#
#
cd /$APP_HOME
... <do some other stuff here>
cd /usr/local/apps/system/apox-314-1.7/apox/WEB-INF/classes/data/logs/
rm -rf *

Now it seems apox-314-1.7 was an old application, long since retired and replaced with apox2-15-2.03. As disk space was paramount, one of the admins removed the old application directory, unaware that a cron job was still dutifully running every night to clean up the long-unused log directory…

Consequently, the ‘cd’ failed, but the script carried on regardless, and the ‘rm -rf *’ worked as well as ever. It just happened to run in the directory the script was still sitting in: ‘/’.

Oops.
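
For anyone who wants to see the failure mode without risking a server, here is a minimal sketch that reproduces it in a throwaway directory. The names sandbox, important-file and no-such-dir are made up for the demonstration, and nothing is deleted; it only shows where the script ends up after the ‘cd’ fails.

```shell
#!/bin/bash
# Safe demonstration: everything happens inside a temporary directory.
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1
touch important-file

# Try to cd into a directory that does not exist (hypothetical name).
cd_status=0
cd "$sandbox/no-such-dir" 2>/dev/null || cd_status=$?

# The script keeps running, and it is still in $sandbox.
echo "cd exit status: $cd_status"
echo "current directory: $(pwd)"

# A bare "rm -rf *" at this point would match everything listed here.
ls
```

The non-zero exit status from ‘cd’ is the only warning the script gets, and by default bash ignores it and moves on to the next line.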

Moral of the story: never, ever, ever use ‘rm -rf *’ on its own. Specify the directory to delete in full, or chain it to the ‘cd’ with ‘&&’ so that it only runs if the ‘cd’ succeeded, e.g.:

cd /usr/local/apps/system/apox-314-1.7/apox/WEB-INF/classes/data/logs \
&& rm -rf *
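
The “specify the directory in full” option could look something like the sketch below. The clean_logs function is a hypothetical helper, not part of the original cleanup.sh: it refuses to do anything unless its argument is an existing directory, and it never relies on the current directory at all.

```shell
#!/bin/bash
# clean_logs DIR: remove the contents of DIR, and nothing else.
# (Hypothetical helper for illustration.)
clean_logs() {
    local dir=${1:?usage: clean_logs DIR}

    # Refuse to run unless the target is an existing directory.
    if [ ! -d "$dir" ]; then
        echo "cleanup: $dir is not a directory, skipping" >&2
        return 1
    fi

    # ${dir:?} makes the shell abort if dir is ever empty,
    # so this line can never expand to "rm -rf /*".
    rm -rf -- "${dir:?}"/*
}
```

The nightly job would then call clean_logs with the full path to the log directory. Once that directory has been removed, the call just prints a warning and returns a non-zero status instead of deleting whatever happens to be nearby.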

(Note: names have been changed and this post is fictional. Any similarities to actual events are purely coincidental.)
