There are many Apache log analyzers to choose from these days, but most of them are tedious or confusing to install. I wanted a simple log analyzer that just does its work from a cron job. Visitors seems to fit the bill!
We'll also use ip2host to resolve the IP addresses into domain names.
All of this will be run daily by a cronjob.
[Screenshot: a report generated by Visitors]
Requirements
Here's what you need to get going:
- Visitors: homepage
- ip2host: (I couldn't find the homepage) it can be downloaded from here
- cron, Apache logs, and the rest of the obvious!
Instructions
First, we need to create a folder to store the ip2host DNS cache file.
sudo mkdir /var/cache/ip2host/
Then create a new file, /etc/cron.daily/visitors, make it executable (cron skips scripts that are not), and put your own variant of the following script in it:
#!/bin/bash

MYIP="99.99.99.99"             # I want to exclude my home IP from the logs
SERVERIP="222.222.222.222"     # my server's IP
REPORTDIR="/var/www/webstats"  # folder where the reports are stored; it must already exist
ALOGDIR="/var/log/apache2"     # folder containing the logs
# I exclude some files from the reports
VISITORS="/usr/bin/visitors -A --exclude wp-cron.php --exclude robots.txt"
IP2H="ip2host --cache=/var/cache/ip2host/cache.db"
# exclude my home IP and my server's IP from the logs
GREPOPTIONS="-hv -e ^$MYIP -e ^$SERVERIP"

# create a temporary file that will hold the logs
TMPFILE=$(mktemp)
if [[ ! -f "$TMPFILE" ]]; then
    echo "tmpfile doesn't exist."
    exit 1
fi

# Variant 1: you only have one site, or you want all the logs in a single report.
# Collect the logs into the temporary file (note the GREPOPTIONS variable,
# which is left unquoted on purpose so that it splits into separate options).
/bin/grep $GREPOPTIONS $ALOGDIR/access*.log{.1,} 2>/dev/null > "$TMPFILE"
# Resolve all the IPs and generate the report. "--trails --prefix http://www.domain.com"
# is optional; it is only needed for generating the trails stats.
($IP2H < "$TMPFILE") | $VISITORS --trails --prefix http://www.domain.com - > $REPORTDIR/stats.html

# -OR-
# Variant 2: you have multiple vhosts/prefixes and want separate reports.
# Replace "www prefix1 prefix2 prefix3" with your own prefixes (as in http://PREFIX.domain.com).
# Keep whichever variant suits you and delete the other.
for name in www prefix1 prefix2 prefix3; do
    /bin/grep $GREPOPTIONS $ALOGDIR/access-$name.log{.1,} 2>/dev/null > "$TMPFILE"
    ($IP2H < "$TMPFILE") | $VISITORS --trails --prefix http://$name.domain.com - > $REPORTDIR/stats-$name.html
done

rm -f "$TMPFILE"
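One thing to watch in the GREPOPTIONS line: the patterns ^$MYIP and ^$SERVERIP are regular expressions, so each dot matches any character and nothing marks where the address ends. With the sample addresses in the script this does no harm, but a shorter address can also exclude its neighbours. The sketch below shows the effect and one possible stricter pattern. The address 10.0.0.1 and the two log lines are made up for the demonstration, and the stricter pattern assumes the client address is the first field of each line, followed by a space, as in Apache's common and combined log formats.

```shell
#!/bin/sh
# Hypothetical address chosen to show the problem; substitute your own.
MYIP="10.0.0.1"

# Escape the dots so they match literally, and require the space that
# follows the client address at the start of each log line.
escaped=$(printf '%s' "$MYIP" | sed 's/\./\\./g')
pattern="^${escaped} "

# Two made-up log lines: only the first one comes from MYIP.
sample='10.0.0.1 - - [01/Jan/2024:00:00:00 +0000] "GET / HTTP/1.1" 200 512
10.0.0.15 - - [01/Jan/2024:00:00:01 +0000] "GET /about HTTP/1.1" 200 1024'

# Count the lines that survive each filter (-v inverts, -c counts).
loose=$(printf '%s\n' "$sample" | grep -c -v -e "^$MYIP")
strict=$(printf '%s\n' "$sample" | grep -c -v -e "$pattern")

echo "lines kept with the loose pattern:  $loose"    # 0, the visitor from 10.0.0.15 is dropped too
echo "lines kept with the strict pattern: $strict"   # 1, only 10.0.0.1 is excluded
```

If you adopt the stricter pattern in the cron script, note that it contains a space, so it can no longer travel inside the unquoted GREPOPTIONS string; it would have to be passed to grep as its own quoted argument.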
If you use logrotate or another tool to rotate your log files, this cron job reads the two most recent files (access*.log and access*.log.1). With weekly rotation, this means you get statistics for the current week and the previous week combined, and the report is refreshed every day.
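To see which files the access*.log{.1,} pattern picks up, you can try it against a scratch directory. This is only an illustration: the file names are invented, and the pattern is evaluated through bash explicitly because brace expansion is a bash feature that plain sh may not support.

```shell
#!/bin/sh
# Build a throwaway directory that looks like a rotated log folder.
demo=$(mktemp -d)
touch "$demo/access.log" "$demo/access.log.1" "$demo/access.log.2.gz" "$demo/error.log"

# bash first expands the braces into "access*.log.1 access*.log",
# then matches each of those two patterns against the directory.
matched=$(cd "$demo" && bash -c 'printf "%s " access*.log{.1,}')
echo "$matched"    # access.log.1 access.log

rm -rf "$demo"
```

Older rotations such as access.log.2.gz are ignored, which is why the report only covers the current and the previous rotation period.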
The first run might take some time as the ip2host cache needs to be built, but then it's very quick.
By pointing REPORTDIR at a folder inside your web root, you can make the reports reachable over the web, for example at http://www.domain.com/webstats. Note that you will probably want to secure this folder, but this is left as an exercise! (hint: htpasswd!)
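For readers who want a starting point for that exercise, here is a minimal sketch of the htpasswd route. It is not the only way to do it, and it assumes that Apache honours .htaccess files in the report folder (AllowOverride AuthConfig or broader). The function name write_htaccess, the password file path and the user name are all placeholders invented for this example.

```shell
#!/bin/sh
# write_htaccess DIR PASSFILE
# Writes a Basic-auth .htaccess into DIR that points at PASSFILE.
# The function name and both arguments are illustrative only.
write_htaccess() {
    dir=$1
    passfile=$2
    cat > "$dir/.htaccess" <<EOF
AuthType Basic
AuthName "Web statistics"
AuthUserFile $passfile
Require valid-user
EOF
}

# Typical use, as root (paths and user name are placeholders):
#   htpasswd -c /etc/apache2/webstats.htpasswd alice
#   write_htaccess /var/www/webstats /etc/apache2/webstats.htpasswd
```

Keeping the password file outside the web root, as in the commented usage, avoids serving it by accident.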