(site name from Band Name Generator)

Welcome to the site

This is Jeff's personal blog / photoblog / link repository. See About for a little more information.

Latest

XCP (Xen Cloud Platform) lesson learned

How to PXE boot a paravirtualized guest in XCP 1.6.

1) Make an Etherboot ISO (etherboot.org)
2) Add the ISO to your XCP ISO store
3) Create a new VM, using whatever template you want and selecting the etherboot ISO (gpxe-something.iso) as the install CD.
4) UNCHECK “Start after provisioned” (or whatever it’s called)
5) Start the image from VM -> Start/Shut down -> Start in Recovery Mode

This forces Xen to boot the guest with some sort of crazy hardware-virtualized (HVM) BIOS that actually boots up Etherboot properly (otherwise, pygrub chokes on it).

Saving flash video in Windows

We ran into an issue where we needed to save the video streams from the BBC website (because our internet was slow, etc.). This was a bit of a new experience for me, and in the end the easiest free way I found to do it was:

1) Get RTMPDump for Windows
2) Get RTMPDumpHelper

Unzip these two archives into the same folder, then run rtmpdumphelper.exe. This will hook into any running browsers and have them (via a Windows DLL shim) run rtmpsuck to proxy RTMP connections and save off any video. Your mileage may vary, but for me this created a .mp4 file in the application folder for each video, as soon as the video was started on the page. You might need to reload the webpage if the video was started before you ran rtmpdumphelper.

The current state of media playing on the PS3

So, if you’re me, then you use Twonky on a WD Live as a poor-man’s media server to play stuff on a PS3. Twonky is a capable piece of software (although overkill for what I use it for), and the PS3 is pretty decent at playing media. A problem arises because many HD videos today are in MKV (Matroska) format. The PS3 doesn’t handle that format very well, despite the fact that most MKV videos use H.264 video and AC3 audio, which the PS3 handles without trouble. So! On Linux, you need to do the following:


avconv -i SomeMediaFile.mkv -acodec copy -vcodec copy SomeMediaFile.mp4

This will (rather quickly) repack the AC3 and H.264 streams from the MKV file into the MP4 container format, which the PS3 handles without issue. Because it’s just a container format change, there should be no loss of quality and it should be comparatively speedy.
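
If you have a whole folder of MKV files, the same command can be wrapped in a loop. This is only a sketch: mp4_name and remux_dir are names I made up for illustration, the avconv flags are exactly the ones above, and skipping files that already have an .mp4 next to them is my own assumption about what you'd want.

```shell
# Hypothetical helper: turn "Foo.mkv" into "Foo.mp4" (string manipulation only)
mp4_name() {
    printf '%s\n' "${1%.mkv}.mp4"
}

# Hypothetical helper: repack every .mkv in a directory (default: current dir)
remux_dir() {
    dir=${1:-.}
    for f in "$dir"/*.mkv; do
        # If nothing matched, the glob stays literal; skip it
        [ -e "$f" ] || continue
        out=$(mp4_name "$f")
        # Assumption: leave existing .mp4 files alone instead of overwriting them
        if [ -e "$out" ]; then
            continue
        fi
        # Same stream-copy invocation as above: no re-encoding, container change only
        avconv -i "$f" -acodec copy -vcodec copy "$out"
    done
}
```

Calling remux_dir /path/to/videos would then walk that directory once; files whose streams aren't H.264/AC3 may still need a real transcode.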

Fixing the NVidia 313.18 driver for Linux 3.7.6

I was running Linux 3.6 under Debian, when it got EOL’d from underneath me (apparently Linux now has extended support and short lifetime branches, who knew). I decided to upgrade to 3.7.6 instead of reverting to an extended support kernel because, well, science.

I’d been manually running NVidia’s driver installer vice attempting to use any Debian packaging wrappers, because while there is totally a “right way” to do things, in this case the “right way” kept me from having up-to-date drivers and was just shy of physically painful to do.

With 3.7.6 (this problem was not apparent on 3.7.5) the NVidia kernel module failed to build. I tracked this down to the creation and population of the /usr/src/linux/include/generated/uapi directory. Previously (3.6 branch and earlier) version.h would be generated, but stuffed under /usr/src/linux/include.

Here is the patch to conftest.sh in the NVidia kernel module source that fixes the issue. To use this patch, do a driver install with the --no-kernel-module flag (this will extract the kernel module source but not try to compile it). Apply the patch to conftest.sh (manually is fine, it’s two lines) and then, in the kernel source directory, run “dkms install -m nvidia -v 313.18” to build and install the module. It should be installed to /lib/modules/3.7.6/updates/dkms, and should be loadable with “modprobe nvidia”.

The conftest.sh changes add /usr/src/linux/include/generated/uapi to the include file path for cc, and add /usr/src/linux/include/generated/uapi/linux to the list of locations searched for a generated version.h (which NVidia’s script uses to determine whether the kernel source has been properly configured).
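
To check by hand where your kernel tree actually put version.h, a search over the candidate locations can be sketched like this. This is an illustration only and not NVidia’s conftest.sh code: find_version_h is a made-up name, and include/linux as the pre-3.7 location is my assumption about older trees.

```shell
# Hypothetical helper, NOT taken from NVidia's conftest.sh.
# Prints the first version.h found under a kernel source tree and returns 0,
# or prints nothing and returns 1 if none of the candidate locations has it.
find_version_h() {
    src=${1:-/usr/src/linux}
    # Candidate directories, relative to the kernel source root:
    #   include/linux                 (assumed location for 3.6 and earlier)
    #   include/generated/uapi/linux  (the location added by the patch)
    for sub in include/linux include/generated/uapi/linux; do
        if [ -f "$src/$sub/version.h" ]; then
            printf '%s\n' "$src/$sub/version.h"
            return 0
        fi
    done
    return 1
}
```

If find_version_h /usr/src/linux prints nothing, the tree probably hasn’t been configured yet, which is the same conclusion the installer would draw.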

Logging from Apache to Cassandra via Flume

I needed to debug a WordPress plugin problem on ConDFW’s site, and while doing so I discovered my Apache 2.2 instance on FreeBSD 8 was not logging properly. Well, says I, why don’t I log to a database again? And instead of using mod_log_sql like a sane human, and like I’ve done in the past, I decided to peek into the exciting world of NoSQL.

First, choices. You need a NoSQL database. See http://en.wikipedia.org/wiki/NoSQL for a big long list, but the main choices are: key/value, graph, document, object, multivalue, and table. I went with Cassandra, a key/value DB. In retrospect, HBase might’ve been a better choice. http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis describes all of the common databases and their strengths/weaknesses.

Then you need a pipe. Two ways I found were Flume and Scribe. Scribe is a log aggregation system created by the Facebook team, while Flume was a tool created by Cloudera and then picked up by the Apache team. Scribe hasn’t seen commits to its GitHub repository in forever (maybe it’s super stable?) and Flume allows some pretty robust manipulation of how your logs are picked up, modifications that are made in-flight, and where they end up. Honestly, I didn’t even read about Scribe until I was well into the Flume configuration.

OK, now you can get concerned with point A (Apache webserver, in my case) to point B (your NoSQL db). I decided to do things the “right way”. It’s entirely possible to have an exec source in Flume that runs “tail -F” on your Apache logfile (note the -F, which keeps following the log by name after the old inode is rotated off), but that does not provide any means for Flume to signal that it is, for instance, unable to keep up. I chose to use Flume’s spooling directory source, which makes all sorts of promises about consistency, reliability, etc. The source expects files to appear in the spooldir atomically, that is, as the result of a mv operation, and it expects unique filenames. After it’s done ingesting the logfile, it renames it with .COMPLETED at the end, for easier collection. I decided to have Apache use its “rotatelogs” utility to generate a logfile at a short interval (I don’t want to wait forever for logs to appear in the db) that I then move to the spooldir for Flume.
Apache httpd.conf:

# logs for Flume ingestion. Custom log format for easier regex matching
LogFormat "%{UNIQUE_ID}e\t%v\t%h\t%l\t%u\t%{%Y%m%d%H%M%S}t\t%r\t%>s\t%b\t%{Referer}i\t%{User-Agent}i" vfcombined
CustomLog "|/usr/local/sbin/rotatelogs /www/logs/al_flume.%Y%m%d%H%M%S 60" vfcombined

(In the LogFormat, the fields are separated by tabs, not spaces. It’s important(ish).) So you see, I added UNIQUE_ID, an environment variable generated by mod_unique_id, a module that seems to be built by default with Apache 2.2 (at least it was in my install, and I don’t remember adding it), and I changed the timestamp format to something more machine parseable. I also could’ve used epoch time, but this timestamp style allowed easier human verification that everything was working properly. The CustomLog directive pipes the log out to rotatelogs (part of Apache), with a particular filename template. I rotate the file every 60 seconds.
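
Since everything downstream depends on those tabs, it’s worth checking a rotated file before pointing Flume at it. A minimal sketch, assuming the 11-field vfcombined format above; count_fields and bad_lines are names I made up for illustration:

```shell
# count_fields: print the number of tab-separated fields on each input line
count_fields() {
    awk -F '\t' '{ print NF }'
}

# bad_lines: print the line number of every input line that does not have
# exactly 11 tab-separated fields (11 = number of directives in vfcombined)
bad_lines() {
    awk -F '\t' 'NF != 11 { print NR }'
}
```

Running bad_lines < /www/logs/al_flume.SOMETIMESTAMP (substitute a real rotated file) should print nothing if the format is being written the way the regex later expects.
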
Movement into spooldir, flume_rot.sh:

#!/bin/sh

# current time minus 61 seconds (latest file we want to move)
# used as a filename and timestamp for a file (used for a find(8) comparison)

STAMP=`date -j -v-61S "+%Y%m%d%H%M.%S"`

# where rotatelogs is writing
SDIR=/www/logs

# where flume is picking up
DDIR=/www/logs.flume

# debugging
# echo "Date: " `date`
# echo "Using: $STAMP"

# Create our comparison file
touch -t $STAMP $SDIR/$STAMP

# move older logs to flume spooldir
find $SDIR -name 'al_flume.*' '!' -newer $SDIR/$STAMP -exec mv "{}" $DDIR/ \;

# cleanup touchfile
rm $SDIR/$STAMP

This code is called from cron every minute.

After I verified that was dropping logs and moving them properly, it was on to configuring Flume. A quick note: if you’re building 1.4.0 prerelease from source on FreeBSD (like I chose to do for some weird reason), then this might still be an issue for you. It’s literally a single space in a single file that keeps Flume from building, so seriously fuck everyone who only tests their Maven code on Linux (well no, I understand why you don’t, but it’s hurtful). Also make sure you use -DskipTests at the end of your Maven build line with Flume to skip the tests (which broke for me in 1.4.0-pre).

Here’s a decent guide on getting started with Flume. After that, you need a “sink” from Flume to your database. I used a Cassandra sink available on GitHub, and if you use it, definitely read the README like five or six times, because it explains most of the issues I had to debug with the sink. You can build it with:
mvn clean package -P assemble-artifacts

then untar the .tar.gz file in target/ (in my case flume-ng-cassandra-sink-1.0.0-SNAPSHOT-dist.tar.gz) into your flume/lib/ directory (wherever you installed Flume to).

Next up, your database. Nothing fancy here, I just installed Cassandra from the FreeBSD ports (/usr/ports/databases/cassandra). You’ll need to make your cluster, keyspace, and column family names match what is being used by the Cassandra sink. You can most easily change your cluster name by editing /usr/local/share/cassandra/conf/cassandra.yaml. I recommend avoiding any periods or spaces; I ended up going with CepheidORG. Copy src/main/resources/speed4j.properties from the flume-ng-cassandra source to flume/conf (in your Flume install directory). Edit it, changing Logging to whatever you named your cluster. Run “zip -d lib/flume-ng-cassandra-sink-1.0.0-SNAPSHOT.jar speed4j.properties” to remove that configuration file from the jarfile. Go ahead and start Cassandra (/usr/local/etc/rc.d/cassandra start, after you’ve enabled it in /etc/rc.conf). Log in and create a keyspace and column family for your logs.

On to Flume configuration! Here’s the relevant bit from my flume.conf:

# Agent
webserver.sources = spoolDirSrc
webserver.channels = fileChannel
webserver.sinks = cassandraSink

# Source spoolDirSrc
webserver.sources.spoolDirSrc.type = spooldir
webserver.sources.spoolDirSrc.channels = fileChannel
webserver.sources.spoolDirSrc.spoolDir = /www/logs.flume/
webserver.sources.spoolDirSrc.interceptors = logDeserial host src ts
# interceptor for log format parsing
# LogFormat "%{UNIQUE_ID}e\t%v\t%h\t%l\t%u\t%{%Y%m%d%H%M%S}t\t%r\t%>s\t%b\t%{Referer}i\t%{User-Agent}i" vfcombined
webserver.sources.spoolDirSrc.interceptors.logDeserial.type = regex_extractor
webserver.sources.spoolDirSrc.interceptors.logDeserial.regex = ^([^\\t]*)\\t([^\\t]*)\\t([^\\t]*)\\t([^\\t]*)\\t([^\\t]*)\\t([^\\t]*)\\t([^\\t]*)\\t([^\\t]*)\\t([^\\t]*)\\t([^\\t]*)\\t([^\\t]*)
webserver.sources.spoolDirSrc.interceptors.logDeserial.serializers = s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 s11
webserver.sources.spoolDirSrc.interceptors.logDeserial.serializers.s1.name = key
webserver.sources.spoolDirSrc.interceptors.logDeserial.serializers.s2.name = hostname
webserver.sources.spoolDirSrc.interceptors.logDeserial.serializers.s3.name = remotehost
webserver.sources.spoolDirSrc.interceptors.logDeserial.serializers.s4.name = remoteident
webserver.sources.spoolDirSrc.interceptors.logDeserial.serializers.s5.name = remoteuser
webserver.sources.spoolDirSrc.interceptors.logDeserial.serializers.s6.type = org.apache.flume.interceptor.RegexExtractorInterceptorMillisSerializer
webserver.sources.spoolDirSrc.interceptors.logDeserial.serializers.s6.name = timestamp
webserver.sources.spoolDirSrc.interceptors.logDeserial.serializers.s6.pattern = yyyyMMddHHmmss
webserver.sources.spoolDirSrc.interceptors.logDeserial.serializers.s7.name = request
webserver.sources.spoolDirSrc.interceptors.logDeserial.serializers.s8.name = requeststatus
webserver.sources.spoolDirSrc.interceptors.logDeserial.serializers.s9.name = responsebytes
webserver.sources.spoolDirSrc.interceptors.logDeserial.serializers.s10.name = referrer
webserver.sources.spoolDirSrc.interceptors.logDeserial.serializers.s11.name = agent
# interceptor to insert host field (static to this source/spooldir)
webserver.sources.spoolDirSrc.interceptors.host.type = static
webserver.sources.spoolDirSrc.interceptors.host.preserveExisting = false
webserver.sources.spoolDirSrc.interceptors.host.key = host
webserver.sources.spoolDirSrc.interceptors.host.value = www.cepheid.org
# interceptor to insert src field (static to this application)
webserver.sources.spoolDirSrc.interceptors.src.type = static
webserver.sources.spoolDirSrc.interceptors.src.preserveExisting = false
webserver.sources.spoolDirSrc.interceptors.src.key = src
webserver.sources.spoolDirSrc.interceptors.src.value = apacheWWW

# last chance timestamp
webserver.sources.spoolDirSrc.interceptors.ts.type = timestamp
webserver.sources.spoolDirSrc.interceptors.ts.preserveExisting = true

# Sink definitions
# loggerSink (testing)
webserver.sinks.loggerSink.type = logger
webserver.sinks.loggerSink.channel = fileChannel
# cassandraSink
webserver.sinks.cassandraSink.type = com.btoddb.flume.sinks.cassandra.CassandraSink
webserver.sinks.cassandraSink.channel = fileChannel
webserver.sinks.cassandraSink.hosts = localhost
webserver.sinks.cassandraSink.cluster-name = CepheidORG
webserver.sinks.cassandraSink.keyspace-name = ApacheLogs
webserver.sinks.cassandraSink.records-colfam = Requests

# Channel definitions
# durable channel
webserver.channels.fileChannel.type = file
webserver.channels.fileChannel.checkpointDir = /var/spool/flume/fchans/fchan1/checkpoint
webserver.channels.fileChannel.dataDirs = /var/spool/flume/fchans/fchan1/data
# non-durable channel
webserver.channels.memoryChannel.type = memory
webserver.channels.memoryChannel.capacity = 100

Things of note: First, read through the Flume configuration guide to get a basic idea of the format. Second, while I used a file channel, it very much wasn’t necessary. The configuration has a memory channel defined that would work as well, but to use my file channel you need to “mkdir -p /var/spool/flume/fchans/fchan1/checkpoint; mkdir -p /var/spool/flume/fchans/fchan1/data”. Third, I used the regex_extractor interceptor to pull out the UNIQUE_ID and timestamp of the request, but the other fields I populate are not used by the Cassandra sink, in the end.

If Apache’s throwing logs in the spool directory, and Cassandra seems to be up and responding to the CLI, it’s time to hook them together. Start Flume with /usr/local/flume/bin/flume-ng agent -c /usr/local/flume/conf/ -f /usr/local/flume/conf/flume.conf -n webserver -Dflume.root.logger=DEBUG,console > logs/flume.log 2>&1 and you will be able to pull up logs/flume.log for extensive debugging information. If everything is going well, you should see your spool files change their names to .COMPLETED, and you should be able to run cqlsh and see log entries with:

use ApacheLogs;
select * from Requests;

(substituting whatever your keyspace and column family names are)
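
To watch the spool files flip over without squinting at ls output, a small counter can be sketched as below. spool_status is a made-up name; the .COMPLETED suffix and the /www/logs.flume default come from the setup above, and it only looks at regular, non-hidden files directly inside the directory.

```shell
# Hypothetical helper: count ingested vs. still-waiting files in the spooldir
spool_status() {
    dir=${1:-/www/logs.flume}
    completed=0
    pending=0
    for f in "$dir"/*; do
        # Skip subdirectories and the literal glob left over from an empty dir
        [ -f "$f" ] || continue
        case $f in
            *.COMPLETED) completed=$((completed + 1)) ;;  # renamed by Flume when done
            *)           pending=$((pending + 1)) ;;      # not picked up (or not finished) yet
        esac
    done
    echo "completed=$completed pending=$pending"
}
```

If pending keeps growing while completed stays put, Flume probably isn’t reading the directory, and logs/flume.log is the place to look.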

Good luck!

What’s been going on?

I haven’t really touched this blog much since restarting it back in 2011. Nature of the beast, really.

I got out of the Marines in October of this year, so I’m looking for work. Got some resumes out, so we’ll see. Alicia’s still in the Air Force, which puts some limits on what jobs I go after.

Got a cheap desktop computer (pretty close to the specs of the Tom’s Hardware $500 gaming box) and put Debian on it. Been messing with VirtualBox and getting the Linux running back in the fingers.

I took a ton of photos while Alicia and I were on our cruise of the South Pacific. They’re up in separate folders on https://picasaweb.google.com/112184154457467328981

World of Coca-Cola World

There’s some confusion about what the proper name is for the Coca-Cola World shrine to cola in Atlanta, GA.

We visited around early April, and I finally got around to uploading the photos to my Picasaweb site. Enjoy!

World of Coca-Cola, 2011

These pipes are clean!

Reactivated the www.cepheid.org/~jeff site after a long hiatus. Truthfully I hadn’t been updating it much, and when I used a subdirectory to store some files that got indexed by Google, well, it became time to take the site down.

But recently I noticed that anything published on Facebook just gets lost. There’s no great way to go back to your old Facebook posts, tag them by category, and look at information / links / photos you put up in the past.

So, the recreation of my blag. Narcissistic in the extreme, the purpose is really just to keep stuff around for my failing memory. See the About page if you’re curious about me…although if you’re curious about me, I don’t know how you came across this site.