If you are experiencing performance problems on your web server, dylan can help you by providing a snapshot of critical data, like the number of requests and active sessions.

It is not a tool for producing weekly or monthly reports with bar graphs and pie charts. If that is what you are looking for, I strongly recommend AWStats.

DYnamic Log ANalyzer

Real-time analysis of your web server log file.

I wanted an overview of the web server activity, right here, right now, not just a monthly 20-page report.

The answer was a simple Perl program that listens in on the access log, using "tail -f" as its input. (-f means follow, i.e. continue reading from the file as it grows.)
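
To illustrate the idea of reading a growing log through a pipe, here is a minimal sketch in Python rather than Perl. It is not the code in dylan.pl. The Common Log Format, the regular expression and the names parse_line and follow are all assumptions made for this example; the page does not say which log format dylan expects.

```python
import re
from datetime import datetime

# Assumption: the access log uses the Common Log Format.
CLF = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_line(line):
    """Return a dict for one log line, or None if the line does not match."""
    m = CLF.match(line)
    if m is None:
        return None
    return {
        "ip": m.group("ip"),
        "time": datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        "path": m.group("path"),
        "status": int(m.group("status")),
        # A size of "-" means no body was sent; count it as 0 bytes.
        "size": 0 if m.group("size") == "-" else int(m.group("size")),
    }

def follow(stream):
    """Yield parsed records from a stream that keeps growing,
    for example sys.stdin fed by "tail -f access_log"."""
    for line in stream:
        record = parse_line(line)
        if record is not None:
            yield record
```

Saved as a script that loops over follow(sys.stdin), this could be started as tail -f access_log | python3 sketch.py (the file names are placeholders). Lines that do not match the pattern are skipped silently.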

Every 10 minutes the program generates an HTML page, which shows the number of active users, the average session length, the amount and type of data transferred, and the number of errors. A user session is considered active as long as the server keeps receiving requests from the same IP address, no more than 10 minutes apart.
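
The session rule can be sketched as follows. This is again a Python illustration under assumptions, not dylan's Perl code: the Session class, the function names and the use of plain seconds for timestamps are invented for the example. Only the 10-minute gap comes from the text.

```python
SESSION_TIMEOUT = 10 * 60  # seconds; the 10-minute gap described in the text

class Session:
    """Bookkeeping for one IP address (hypothetical structure)."""
    def __init__(self, start):
        self.start = start      # time of the first request
        self.last = start       # time of the most recent request
        self.requests = 1

def record_request(sessions, ip, now):
    """Update sessions (a dict mapping ip -> Session) for a request at time now."""
    s = sessions.get(ip)
    if s is None or now - s.last > SESSION_TIMEOUT:
        # First request from this address, or the previous session has expired.
        sessions[ip] = Session(now)
    else:
        s.last = now
        s.requests += 1

def active_sessions(sessions, now):
    """Sessions whose last request is no more than 10 minutes old."""
    return {ip: s for ip, s in sessions.items()
            if now - s.last <= SESSION_TIMEOUT}

def average_length(sessions):
    """Mean session length in seconds, 0.0 if there are no sessions."""
    if not sessions:
        return 0.0
    return sum(s.last - s.start for s in sessions.values()) / len(sessions)
```

A report generator would call active_sessions and average_length once every 10 minutes and write the figures into the HTML page.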

One day, as I was paging my way through the log file, I noticed unusual activity from a single IP address. The visitor was fetching a new document every other second, but showed no interest in the images on those pages. The address traced back to a search engine. The log analyzer needed a few adjustments anyway, so this was the perfect time to add some robot detection rules.

Match any of the following rules, and you are definitely a robot/web crawler (in case you didn't know that):

  1. Requesting /robots.txt
  2. More than 10 HTML files per GIF/JPEG file
  3. A total of more than 25 HTML files AND more than 6 files per minute
  4. A session length exceeding 10 hours
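
The four rules above can be written down as a single predicate. The following is a Python sketch, not the Perl code in dylan, and it has to guess where the rules are silent. It assumes that a session with no images is treated as having one image for rule 2 (otherwise the very first HTML request would count as robot behaviour), and that "files per minute" in rule 3 means HTML files averaged over the session length. The SessionStats fields are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class SessionStats:
    """Per-session counters (hypothetical field names)."""
    html_files: int = 0
    image_files: int = 0          # GIF and JPEG requests
    length_seconds: float = 0.0
    fetched_robots_txt: bool = False

def is_robot(s):
    # Rule 1: asked for /robots.txt.
    if s.fetched_robots_txt:
        return True
    # Rule 2: more than 10 HTML files per image.
    # Assumption: zero images is treated as one image.
    if s.html_files > 10 * max(s.image_files, 1):
        return True
    # Rule 3: more than 25 HTML files in total AND more than 6 per minute.
    # Assumption: the rate is HTML files over the whole session;
    # written as a product to avoid dividing by a zero-length session.
    if s.html_files > 25 and s.html_files * 60 > 6 * s.length_seconds:
        return True
    # Rule 4: session longer than 10 hours.
    if s.length_seconds > 10 * 3600:
        return True
    return False
```

For instance, a session with 30 HTML files and 10 images in two minutes trips rule 3, while the same counts spread over an hour do not trip any rule.
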
For quite some time now, the percentage of data handed out to non-human readers has been pretty stable at approximately 8%. (Remember that robots usually ignore images and such, so their share of HTML files is greater than that.) I'm not quite sure whether this spells victory or defeat. Of course: The CBS web is a major success - we have dozens of web spiders crawling all over us at any time. Or *pout*: Nobody likes me, all my guests are brainless robots. And they don't even bother to sign the guestbook.

Another thing: Sending the program a USR1 signal will cause it to dump its records to a text file. The file contains the IP addresses of active users, their entry point, last page visited, total pages requested etc. The example text has been anonymized.
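
A signal-triggered dump of this kind might look roughly like the Python sketch below. It is an illustration only: the handler merely sets a flag and the main loop does the writing, which is a common pattern but not necessarily how dylan.pl does it. The tab-separated layout, the dictionary keys and all function names are assumptions. SIGUSR1 is only available on Unix-like systems.

```python
import signal

# Shared flag, kept in a dict so the handler can flip it in place.
state = {"dump_requested": False}

def request_dump(signum, frame):
    """Signal handler: only note that a dump was requested."""
    state["dump_requested"] = True

def format_dump(sessions):
    """One tab-separated line per session: IP, entry page, last page, page count."""
    return "".join(
        f"{ip}\t{s['entry']}\t{s['last']}\t{s['pages']}\n"
        for ip, s in sorted(sessions.items())
    )

def dump_if_requested(sessions, path):
    """Call from the main loop; writes the file only after a USR1 has arrived."""
    if not state["dump_requested"]:
        return False
    state["dump_requested"] = False
    with open(path, "w") as out:
        out.write(format_dump(sessions))
    return True

def install_handler():
    """Register the handler for USR1 (Unix only)."""
    signal.signal(signal.SIGUSR1, request_dump)
```

With the handler installed, kill -USR1 followed by the process ID would make the next pass through the main loop write the file.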


Installing and running the program. dylan.pl runs with no special privileges, provided that your log file is world-readable.
However, it must be detached from the controlling terminal. Rather than daemonizing the whole thing, I prefer to start it as an "at" job. dylan.pl will fork a control program, dylanctl.pl, that monitors the log file. If the size of the log decreases (as it does when the log is rotated or truncated), dylanctl will kill the main program, then execute a fresh copy of it in its own place.
Caution: Use ps -ef | grep 'dylan' to make sure you only have one instance of dylan and dylanctl running.
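
The size check that dylanctl performs can be sketched like this in Python. It is not the code in dylanctl.pl. The polling interval, the function names and the max_polls parameter (there only so the loop can be bounded when trying it out) are assumptions for the example.

```python
import os
import time

def has_shrunk(previous_size, current_size):
    """A smaller size than last time suggests the log was rotated or truncated."""
    return current_size < previous_size

def watch(log_path, on_shrink, poll_seconds=60, max_polls=None):
    """Poll the size of log_path and call on_shrink() whenever it decreases.

    Assumption: a 60-second interval; the page does not say how often
    dylanctl checks the file.
    """
    previous = os.path.getsize(log_path)
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(poll_seconds)
        current = os.path.getsize(log_path)
        if has_shrunk(previous, current):
            on_shrink()
        previous = current
        polls += 1
```

To mimic the behaviour described above, on_shrink could send the main process a termination signal with os.kill and then replace the watcher with a new main program using os.execv, so that at most one pair of processes is running.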