Apache2 Version Analysis
During a recent attempt at answering the Honeynet Log Mysteries Challenge, I wrote a series of reasoned analyses for the supplied Honeynet logging data. Unfortunately, teaching workloads stopped me from submitting any realistic challenge answer.
Inspired by the idea of applying the Scientific Method to Digital Forensics (see Casey2009 and Carrier2006), I set about attempting to apply the same principles to analysing the Log Mysteries data sets.
Using just the apache2/www-* logs from the Log Mysteries Honeynet challenge, this blog post demonstrates how we can define an upper bound on the version of Apache2 used and, more interestingly, recover data about Apache's worker threads. We are also able to establish how to obtain the log events with microsecond (instead of just second) timestamp accuracy.
These surprising results (well, they were to me!) arise because the Apache2 LogFormat directive had been customised to include the contents of the environment variable UNIQUE_ID (which in turn has had its value set by the Apache2 module mod_unique_id). By examining source code changes to the underlying module, one is then able to deduce that Apache2 is at revision 420983 (i.e. release version 2.2.2) or below.
Our Apache2 revision number estimate now allows us to correctly decode the UNIQUE_ID value to extract:
- the Apache worker thread ID (as present in the Apache2 score board data structure) that handled the request
- the web server process ID for the worker thread that handled the request
- a 4 byte timestamp value that is derived from the time that the request was received
- a 2 byte counter value that is initialised (when the worker thread runs for the first time) from the current time in microseconds and then incremented whenever the worker thread handles a new request
- and the IP address of the web server handling the request.
From the apache2/www-* files, we can now determine that the largest recorded UNIQUE_ID timestamp value is 4281. If Apache2 were at a revision number > 420983 then these timestamp values would be close (i.e. we'd expect them to be within ±1 second) to the logged event's observed timestamp value (expressed as seconds from the UNIX Epoch).
As this is not what we observe, we may estimate 420983 (i.e. release version 2.2.2 - see the Apache2 tags link) as our upper bound on the Apache2 revision number.
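The comparison behind this bound can be sketched in a few lines of Python. This is only an illustration: the function name and the tolerance parameter are my own, and the logged epoch value used here is the one quoted for the sample log line later in this post.

```python
def looks_like_epoch_seconds(stamp, logged_epoch, tolerance=1):
    """Return True if stamp is within +/- tolerance seconds of the logged time."""
    return abs(stamp - logged_epoch) <= tolerance


# 4281 is the largest UNIQUE_ID timestamp value reported above;
# 1271684175 is a logged event time in seconds from the UNIX Epoch.
print(looks_like_epoch_seconds(4281, 1271684175))  # False
```

A value of 4281 is nowhere near an April 2010 epoch time, which is the observation the upper bound rests on.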
If we take the apache2/www-access.log log line:
10.0.1.2 - - [19/Apr/2010:06:36:15 -0700] "GET /feed/ HTTP/1.1" 200 16605 "-" "Apple-PubSub/65.12.1" C4nE4goAAQ4AAEP1Dh8AAAAA 3822005
then its UNIQUE_ID value is C4nE4goAAQ4AAEP1Dh8AAAAA, which in turn provides us with the following data regarding the Apache2 worker thread that handled the request:
- PID is 17397 and the (scoreboard) thread index is 0
- a timestamp value of 193
- a counter value of 3615
- the web server that handled the request is at 10.0.1.14.
This decoding assumes that the UNIQUE_ID value has been encoded using a revision of mod_unique_id.c prior to 420983.
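As a rough sketch, the UNIQUE_ID can be pulled out of such a log line by splitting from the right. The helper name is hypothetical, and the code assumes (based only on the sample line above) that the customised LogFormat places the UNIQUE_ID second from the end of each line.

```python
SAMPLE_LINE = (
    '10.0.1.2 - - [19/Apr/2010:06:36:15 -0700] "GET /feed/ HTTP/1.1" '
    '200 16605 "-" "Apple-PubSub/65.12.1" C4nE4goAAQ4AAEP1Dh8AAAAA 3822005'
)


def extract_unique_id(log_line):
    """Return the second-to-last whitespace-separated field of a log line."""
    # Split at most twice from the right: [rest of line, UNIQUE_ID, final field].
    _rest, unique_id, _final_field = log_line.rsplit(None, 2)
    return unique_id


print(extract_unique_id(SAMPLE_LINE))  # C4nE4goAAQ4AAEP1Dh8AAAAA
```

Splitting from the right avoids having to parse the quoted request and user agent fields, which may themselves contain spaces.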
If the mod_unique_id.c code present in revision 420983 was used to generate our UNIQUE_ID values, then we may additionally estimate our observed timestamp values to microsecond accuracy.
We now assume that revision 420983 of mod_unique_id.c was used. Thus, 193 is the number of microseconds past 1271684175 seconds at which our log event was observed. In other words, our log event was actually received at 1271684175.000193 seconds past the UNIX Epoch.
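For readers who want to reproduce the arithmetic, here is a small Python sketch. The function names are mine, and the second step simply follows the working assumption stated above (that the decoded value is a microsecond offset); it is not an independent check of that assumption.

```python
from datetime import datetime


def log_time_to_epoch(log_time):
    """Convert an Apache log timestamp such as '19/Apr/2010:06:36:15 -0700'
    into whole seconds from the UNIX Epoch."""
    parsed = datetime.strptime(log_time, "%d/%b/%Y:%H:%M:%S %z")
    return int(parsed.timestamp())


def with_microseconds(epoch_seconds, microseconds):
    """Format seconds plus a microsecond offset without floating point rounding."""
    return f"{epoch_seconds}.{microseconds:06d}"


epoch = log_time_to_epoch("19/Apr/2010:06:36:15 -0700")
print(epoch)                          # 1271684175
print(with_microseconds(epoch, 193))  # 1271684175.000193
```

The result is formatted as a string because a double precision float cannot reliably hold sixteen significant digits.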
At this point, the reader may be interested to know that the previous estimates of a version for Apache2 actually introduce a subtle error! In the next blog post, we'll rework our logical reasoning (with a dash of data visualisation) to locate and fix this error. In the meantime, the reader is invited to try and locate that reasoning error. Future blog posts will focus on using data visualisation and statistical analysis techniques to further analyse the Honeynet logging data.
Some Technical Details
The mod_unique_id documentation informs us that the tuple ( ip_addr, pid, time_stamp, counter ) is encoded, via an algorithm similar to MIME base64, as a 19 character string using the characters [A-Za-z0-9+/]. The resulting value is placed in the UNIQUE_ID environment variable.
Viewing the subversion revision log for mod_unique_id.c, we see that revision 596448 is the latest version of code that our server could have used (this is based on the revision log timestamps and the fact that the last Apache log entry timestamp, in the apache2/www-* log files, is 01:52:24 on the 25th April 2010 UTC).
In viewing revision 596448 of the mod_unique_id source code, we notice (see lines 56 to 103) that the tuple ( time_stamp, in_addr, pid, counter, thread_index ) is in fact used to generate our UNIQUE_ID value. This explains why the UNIQUE_ID value is in fact 24 characters in length and not 19 (BTW, revision 981084 fixes the incorrectly commented code).
According to the source code (see function unique_id_global_init), the size of the UNIQUE_ID string is:
((sum of the sizes in bytes of the five tuple fields) * 8 + 5) div 6
= ((4+4+4+2+4) * 8 + 5) div 6
= 24 characters
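The same arithmetic can be checked with a short Python sketch. The helper name is hypothetical, and the field sizes are the byte counts used in the sum above. Applying the formula to the four field tuple from the documentation also reproduces the documented 19 characters.

```python
def encoded_length(field_sizes):
    """Number of 6-bit characters needed for the given field sizes (in bytes)."""
    total_bits = sum(field_sizes) * 8
    return (total_bits + 5) // 6


# time_stamp, in_addr, pid, counter, thread_index
print(encoded_length([4, 4, 4, 2, 4]))  # 24

# ip_addr, pid, time_stamp, counter (the tuple described in the documentation)
print(encoded_length([4, 4, 4, 2]))     # 19
```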
From the source code (see function gen_unique_id) we also find that a standard MIME base64 encoding is used, followed by translating the '+' character to '@' and the '/' character to '-'. This allows us to easily reverse engineer our UNIQUE_ID values and so extract the original input tuple.
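A minimal Python sketch of that reversal is given below. It assumes the five fields are packed back to back in network (big-endian) byte order with sizes 4, 4, 4, 2 and 4 bytes, in the order ( time_stamp, in_addr, pid, counter, thread_index ). The function name and dictionary keys are my own. The stamp is returned as the raw 32 bit integer; how it should be interpreted or scaled is left to the discussion in this post.

```python
import base64
import socket
import struct


def decode_unique_id(unique_id):
    """Decode a 24 character UNIQUE_ID into its five raw fields."""
    # Undo the character translation to get back to standard base64.
    standard = unique_id.replace("@", "+").replace("-", "/")
    raw = base64.b64decode(standard)  # 24 characters -> 18 bytes
    # Assumed layout: network byte order, no padding between fields.
    stamp, in_addr, pid, counter, thread_index = struct.unpack("!I4sIHI", raw)
    return {
        "stamp": stamp,
        "ip": socket.inet_ntoa(in_addr),
        "pid": pid,
        "counter": counter,
        "thread_index": thread_index,
    }


fields = decode_unique_id("C4nE4goAAQ4AAEP1Dh8AAAAA")
print(fields["ip"], fields["pid"], fields["counter"], fields["thread_index"])
# 10.0.1.14 17397 3615 0
```

The IP address, PID, counter and thread index recovered this way agree with the values quoted for the sample log line.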
In examining the revision log for mod_unique_id.c we have that:
- revision 596448 differs from revision 420983 in the way in which it handles the timestamp value within the function
- revision 420983 differs from revision 596059 by changes to the license comments and in the way that it handles calculating UNIQUE_ID over frequent process restarts (i.e. within < 1 second of the previous restart)
- revisions prior to 596059 (all of which are dated July 2002 or earlier) alter the code in multiple ways.
- revisions 420983 and 596448 both use the C expression r->request_time to extract the apr_time_t struct that encodes the time at which the request was received
- revision 596448 uses the C function apr_time_sec to extract (from the apr_time_t struct) the time at which the request was received in seconds as our timestamp value (modulo 2^32)
- revision 420983 uses the first 4 bytes of the apr_time_t struct (i.e. the tm_usec field or the microseconds component of the time at which the request was received) as our timestamp value.
The timestamp value therefore lets us determine whether the code that generated our UNIQUE_ID value is at a revision > 420983 or not.