Apache log to DB

My company used to download the Apache log file daily, parse it and then put it into the database. A script did this but as we added more servers, it became hard to maintain. I ran across Apache piped logs and found a way to store the log directly into DB.

This is what the httpd.conf file looks like:

# Set some variables to exclude certain cases

SetEnvIf Remote_Addr “123\.123\.0\.11″ dontlog
SetEnvIf Host “donotlog.com” dontlog
SetEnvIf Request_URI “\.gif|jpg|png|css|js$” dontlog
SetEnvIf Request_URI “/dontlog_folder” dontlog

# Log data: IP bytes protocol method host file uri status session_id user_agent referrer response_time datetime cookie
# delimiter [|]

LogFormat “%a[|]%B[|]%H[|]%m[|]%{Host}i[|]%f[|]%U%q[|]%>s[|]%{PHPSESSID}C[|]%{User-Agent}i[|]%{Referer}i[|]%D[|]%t[|]%{Cookie}i” chong

CustomLog “|/etc/httpd/logs/log2db.php” chong env=!dontlog
CustomLog “/etc/httpd/logs/access_excluded.log” chong env=donglog

Notice | in the last line. That pipe does the magic. This configuration will do these:

  1. It will log requesting IP, bytes, protocol and so forth as listed above. And pass that data to log2db.php file.
  2. It will exclude IP address of 123.123.0.11 from being logged.
  3. It will exclude donotlog.com. This will be useful if you want to exclude certain virtual hosts.
  4. it will exclude image files, style sheet and javascript file.
  5. It will exclude request that contains ‘/dontlog_folder’ directory
  6. Just in case, it will log all the excluded log data into access_excluded.log file.

Now, let’s create a simple script to insert these data to MySql db.

Create a table something like this:

CREATE TABLE `log_200707` (
`id` int(11) NOT NULL auto_increment,
`ip` varchar(32) NOT NULL default '',
`bytes` int(11) NOT NULL default '0',
`protocol` varchar(32) NOT NULL default '',
`method` varchar(8) NOT NULL default '',
`host` varchar(32) NOT NULL default '',
`file` varchar(255) NOT NULL default '',
`uri` varchar(255) NOT NULL default '',
`ssid` varchar(128) NOT NULL default '',
`useragent` varchar(255) NOT NULL default '',
`referrer` varchar(255) NOT NULL default '',
`restime` int(11) NOT NULL default '0',
`datetime` datetime NOT NULL default '0000-00-00 00:00:00',
`cookie_user_tr_id` int(11) NOT NULL default '0',
PRIMARY KEY  (`id`),
KEY `host` (`host`),
KEY `cookie_user_tr_id` (`cookie_user_tr_id`)
) TYPE=MyISAM;

Finally, here is a code snippet of log2db.php script:

#!/usr/bin/php
<?php

$fp = @fopen(’php://stdin’, ‘r’);
$line = @fgets($fp, 1024);

$data = explode(’[|]‘, $line);
if( 13 > count($data) ) exit(’Insufficient data. Check the log data’);

$link = mysql_pconnect(’123.123.0.000′, ‘weblog’, ‘password’);
if( ! $link ) exit(’DB connection failed’);

$db = mysql_select_db(’weblog’, $link);

$table = ‘log_’ . date(’Ym’);

// Our case, we needed to capture user tracking id

$cookie = ”;

if( $data[13] != ” && strpos($data[13], ‘user_trk_id’) !== false ) {
preg_match(’/user_trk_id=(d+)/’, $data[13], $match);
$cookie = $match[1];
}

// You may add a query to create a table. Monthly table is suffice for us. We decided to pre-create tables for better performance

$qry = ‘INSERT INTO ‘ . $table .
‘(ip, bytes, protocol, method, host, file, uri, status, ssid, useragent, referrer, restime, datetime, ut_uid)’ .
‘ VALUES (” . $data[0] . ”, ” . $data[1] . ”, ” . $data[2] . ”, ” . $data[3] . ”, ” . $data[4] . ”, ” . $data[5] . ”, ” . $data[6] . ”, ” . $data[7] . ”, ” . $data[8] . ”, ” . $data[9] . ”, ” . $data[10] . ”, ” . $data[11] . ”, NOW(), ”. $cookie .”)’;

$rs = mysql_unbuffered_query($qry, $link);

?>

You may implement this per-domain basis usting VirtualHost setting or be fancy with DB part but you get the idea. Another nagging headache has gone.

*Updated

1. Make sure logging PHP script’s permission. It should be executable by web server.

2. The script should live as long as http request. Imagine, a request to a URI can contain multiple files that needs to be logged. Implement while statement and make sure the script doesn’t miss out anything.

admin on July 6th 2007 in Life, Work-around

Trackback URI | Comments RSS

Leave a Reply

*
To prove you're a person (not a spam script), type the security text shown in the picture. Click here to regenerate some new text.
Click to hear an audio file of the anti-spam word