Apache log to DB
My company used to download the Apache log file daily, parse it and then put it into the database. A script did this but as we added more servers, it became hard to maintain. I ran across Apache piped logs and found a way to store the log directly into DB.
This is what the httpd.conf file looks like:
# Set some variables to exclude certain cases
SetEnvIf Remote_Addr “123\.123\.0\.11″ dontlog
SetEnvIf Host “donotlog.com” dontlog
SetEnvIf Request_URI “\.gif|jpg|png|css|js$” dontlog
SetEnvIf Request_URI “/dontlog_folder” dontlog# Log data: IP bytes protocol method host file uri status session_id user_agent referrer response_time datetime cookie
# delimiter [|]LogFormat “%a[|]%B[|]%H[|]%m[|]%{Host}i[|]%f[|]%U%q[|]%>s[|]%{PHPSESSID}C[|]%{User-Agent}i[|]%{Referer}i[|]%D[|]%t[|]%{Cookie}i” chong
CustomLog “|/etc/httpd/logs/log2db.php” chong env=!dontlog
CustomLog “/etc/httpd/logs/access_excluded.log” chong env=donglog
Notice | in the last line. That pipe does the magic. This configuration will do these:
- It will log requesting IP, bytes, protocol and so forth as listed above. And pass that data to log2db.php file.
- It will exclude IP address of 123.123.0.11 from being logged.
- It will exclude donotlog.com. This will be useful if you want to exclude certain virtual hosts.
- it will exclude image files, style sheet and javascript file.
- It will exclude request that contains ‘/dontlog_folder’ directory
- Just in case, it will log all the excluded log data into access_excluded.log file.
Now, let’s create a simple script to insert these data to MySql db.
Create a table something like this:
CREATE TABLE `log_200707` ( `id` int(11) NOT NULL auto_increment, `ip` varchar(32) NOT NULL default '', `bytes` int(11) NOT NULL default '0', `protocol` varchar(32) NOT NULL default '', `method` varchar(8) NOT NULL default '', `host` varchar(32) NOT NULL default '', `file` varchar(255) NOT NULL default '', `uri` varchar(255) NOT NULL default '', `ssid` varchar(128) NOT NULL default '', `useragent` varchar(255) NOT NULL default '', `referrer` varchar(255) NOT NULL default '', `restime` int(11) NOT NULL default '0', `datetime` datetime NOT NULL default '0000-00-00 00:00:00', `cookie_user_tr_id` int(11) NOT NULL default '0', PRIMARY KEY (`id`), KEY `host` (`host`), KEY `cookie_user_tr_id` (`cookie_user_tr_id`) ) TYPE=MyISAM;
Finally, here is a code snippet of log2db.php script:
#!/usr/bin/php<?php$fp = @fopen(’php://stdin’, ‘r’);
$line = @fgets($fp, 1024);$data = explode(’[|]‘, $line);
if( 13 > count($data) ) exit(’Insufficient data. Check the log data’);$link = mysql_pconnect(’123.123.0.000′, ‘weblog’, ‘password’);
if( ! $link ) exit(’DB connection failed’);$db = mysql_select_db(’weblog’, $link);
$table = ‘log_’ . date(’Ym’);
// Our case, we needed to capture user tracking id
$cookie = ”;
if( $data[13] != ” && strpos($data[13], ‘user_trk_id’) !== false ) {
preg_match(’/user_trk_id=(d+)/’, $data[13], $match);
$cookie = $match[1];
}// You may add a query to create a table. Monthly table is suffice for us. We decided to pre-create tables for better performance
$qry = ‘INSERT INTO ‘ . $table .
‘(ip, bytes, protocol, method, host, file, uri, status, ssid, useragent, referrer, restime, datetime, ut_uid)’ .
‘ VALUES (” . $data[0] . ”, ” . $data[1] . ”, ” . $data[2] . ”, ” . $data[3] . ”, ” . $data[4] . ”, ” . $data[5] . ”, ” . $data[6] . ”, ” . $data[7] . ”, ” . $data[8] . ”, ” . $data[9] . ”, ” . $data[10] . ”, ” . $data[11] . ”, NOW(), ”. $cookie .”)’;$rs = mysql_unbuffered_query($qry, $link);
?>
You may implement this per-domain basis usting VirtualHost setting or be fancy with DB part but you get the idea. Another nagging headache has gone.
*Updated
1. Make sure logging PHP script’s permission. It should be executable by web server.
2. The script should live as long as http request. Imagine, a request to a URI can contain multiple files that needs to be logged. Implement while statement and make sure the script doesn’t miss out anything.
admin on July 6th 2007 in Life, Work-around