Analyzing Cloud Foundry Access Logs

September 23, 2017 4 mins read

The gorouter component of Cloud Foundry routes all incoming HTTP requests to their target containers and writes every request to an access log. Each entry contains much of the information you would find in the access logs of other web servers. This blog post shows how to obtain the logs with either the BOSH CLI or the Cloud Foundry CLI and how to analyze them with goaccess.

Obtaining Cloud Foundry Access Logs

If you have administrative access to Cloud Foundry you can use the BOSH CLI to obtain the access logs for all apps running on Cloud Foundry. In case you are just interested in the access logs of a single application, you can use the CF CLI to obtain the logs.

Obtaining Access Logs with the BOSH CLI

Using the BOSH CLI you need to download the logs of each router instance separately:

# put all access logs into a separate folder
mkdir -p ./access-logs
# JOB is the name of the router job; repeat for every instance index
INDEX=0
bosh logs JOB "$INDEX" --dir ./access-logs
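
If there are several router instances, the download can be wrapped in a loop. The following is a minimal sketch that reuses the command from above. The job name router, the instance count of 3, and the BOSH override variable are assumptions for illustration and need to be adapted to your deployment.

```shell
# Download the logs of several router instances in one go.
# Assumptions (not from the original post): the job is called "router",
# there are 3 instances, and BOSH can be set to another command
# (for example BOSH=echo) to print the calls instead of running them.
download_router_logs() {
    job=${1:-router}            # hypothetical default job name
    instances=${2:-3}           # hypothetical number of router instances
    target=${3:-./access-logs}  # folder that receives the archives
    mkdir -p "$target"
    index=0
    while [ "$index" -lt "$instances" ]; do
        ${BOSH:-bosh} logs "$job" "$index" --dir "$target"
        index=$((index + 1))
    done
}
```

Calling `BOSH=echo download_router_logs router 2` prints the two commands without contacting the director, which is a cheap way to check the arguments first.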

Put all the obtained router logs into the same directory so that you can easily unpack them and concatenate them into a single file per router with the following script:

# note: the $(find ...) loops assume that no path contains whitespace
for archive in $(find . -name 'router*.job.tgz'); do
    # extract all the logs
    destination="./logs/logs-$(basename "$archive")"
    echo "Unpacking $archive to $destination"
    mkdir -p "$destination"
    tar -xzvf "$archive" -C "$destination"

    # gunzip all individual access logs
    for accessLog in $(find "$destination" -name "access*.gz"); do
        echo "Unpacking $accessLog in place"
        gunzip "$accessLog"
    done

    # concatenate all access logs of the router
    accessLogCombined="$destination-combined.log"
    for accessLog in $(find "$destination" -name "access.log*"); do
        echo "Concatenating $accessLog to $accessLogCombined"
        cat "$accessLog" >> "$accessLogCombined"
    done
done

The above script creates a combined access log for each router instance in ./logs/logs-<ARCHIVE>-combined.log, where <ARCHIVE> is the file name of the downloaded router archive. The combined log files serve as input for the next step, the report generation with goaccess.
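
The goaccess command further down reads a single file called access-log.log, so the per-router files still have to be merged if you want one report for all routers. A minimal sketch, assuming the combined logs live in ./logs and end in -combined.log as produced above; the function name is made up for this example:

```shell
# Merge every per-router combined log into one input file for goaccess.
# merge_router_logs is a hypothetical helper, not part of any CLI.
merge_router_logs() {
    logs_dir=${1:-./logs}
    output=${2:-access-log.log}
    : > "$output"                       # start from an empty output file
    for combined in "$logs_dir"/*-combined.log; do
        [ -f "$combined" ] || continue  # glob matched nothing
        cat "$combined" >> "$output"
    done
}
```

Running `merge_router_logs ./logs access-log.log` leaves the entries grouped per router rather than in chronological order.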

Obtaining Access Logs with the CF CLI

You can use the CF CLI to access the individual logs of an application. As the application logs contain various types of log messages, we need to make sure to grab only those that are produced by the router component:

cf logs <APP_NAME> --recent \
    | grep RTR \
    | sed -E 's/^.*(OUT|ERR)\s*(.*)$/\2/' \
    | awk 'NF' \
    > access-log-<APP_NAME>.log

The grep part keeps only the lines produced by the routers, the sed part strips the prefix that cf logs puts in front of each line (timestamp, log source and output stream) and keeps only the plain access log entry, and the awk part removes empty lines. The access-log-<APP_NAME>.log can now be used in the next step to generate a report with goaccess.
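
To try the filter without a running app, it can be wrapped in a function that reads from stdin. This is only a sketch: the function name is made up, and [[:space:]] is used instead of \s so that it does not depend on GNU sed. Note that the leading .* is greedy, so a request whose URL or user agent itself contains OUT or ERR would be cut at the last occurrence.

```shell
# Reduce `cf logs` output on stdin to plain router access log entries.
# filter_router_lines is a hypothetical helper name.
filter_router_lines() {
    grep 'RTR' \
        | sed -E 's/^.*(OUT|ERR)[[:space:]]*(.*)$/\2/' \
        | awk 'NF'      # drop lines that are empty after stripping the prefix
}
```

It is used like the original pipeline: `cf logs <APP_NAME> --recent | filter_router_lines > access-log-<APP_NAME>.log`.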

Generating a Report with goaccess

goaccess is a handy CLI tool that analyzes access logs and renders the result as a graphical HTML report (or in an ncurses interface if you want!). It ships with a set of predefined log formats and provides CLI options to define a custom one. To analyze the access logs generated in the previous steps, the following goaccess options are used:

../goaccess-1.2/goaccess.exe \
    access-log.log \
    --log-format '%v - [%dT%t.%^] "%m %U %H" %s %^ %b "%R" "%u" %^ %^ x_forwarded_for:~h{," } %^ %^ response_time:%T %^ %^ %^ %^ %^' \
    --time-format '%H:%M:%S' \
    --date-format '%Y-%m-%d' \
    --date-spec hr \
    --hour-spec min \
    --invalid-requests invalid-requests.log \
    -o report.html
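
Because the log format is hand-written, it is worth checking how many lines goaccess could not parse. The sketch below compares the line counts of the input file and of invalid-requests.log; it assumes that goaccess writes one rejected entry per line to that file, and the function name is made up for this example.

```shell
# Print how many input lines ended up in the invalid-requests file.
# report_invalid_share is a hypothetical helper; it only counts lines.
report_invalid_share() {
    input=${1:-access-log.log}
    invalid=${2:-invalid-requests.log}
    total=$(wc -l < "$input")
    rejected=$(wc -l < "$invalid")
    # $(( )) also strips the padding some wc implementations add
    echo "$((rejected))/$((total)) lines could not be parsed"
}
```

A high share of rejected lines usually points to a mismatch between the --log-format string and the actual log entries.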

The command writes the interactive report to report.html. Depending on the time range you are analyzing, it may be useful to adjust the date-spec or hour-spec settings (as per the docs):

  • --hour-spec=<hour|min>: Set the time specificity to either hour (default) or min to display the tenth of an hour appended to the hour. This is used in the time distribution panel. It's useful for tracking peaks of traffic on your server at specific times.
  • --date-spec=<date|hr>: Set the date specificity to either date (default) or hr to display hours appended to the date. This is used in the visitors panel. It's useful for tracking visitors at the hour level. For instance, an hour specificity would yield to display traffic as 18/Dec/2010:19

And now, explore the interactive report with all its charts and options.
