As the year draws to a close, I think it might be fun to use a couple of online tools to look back at this year of blogging.
Wordle is a Java app that combines statistics and typography to give one a visual representation of word frequency in a sample of text. The app can download an RSS feed as its sample, or text can be pasted in directly. Since I wanted a representation of the entire year, I exported my posts from WordPress and wrote a very basic C# program to extract the text and strip out the HTML tags. (Yes, I could have done it in Perl in about ten minutes, but it’s important to keep improving one’s skills.) I then pasted the output into Wordle and was rewarded with a word cloud.
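The C# program itself isn’t shown here, so what follows is only a rough illustration of the same steps, written in Python rather than C#. It is a minimal sketch that assumes a standard WordPress WXR export, where each post body sits in a `<content:encoded>` element. It strips the HTML with the standard library’s `html.parser` and then tallies word frequencies, which is roughly what a word cloud visualizes. The function names and the tiny stopword list are hypothetical, and the sketch does not filter out pages or attachments.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter
from html.parser import HTMLParser

# Namespace WXR exports use for the post body element <content:encoded>.
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "it", "i", "in"}


class TextExtractor(HTMLParser):
    """Collects text nodes and discards the tags around them."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes entities like &amp;
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(html: str) -> str:
    """Return the visible text of an HTML fragment."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Join with spaces so text on either side of a tag doesn't fuse together.
    return " ".join(parser.parts)


def post_texts(wxr_xml: str) -> list[str]:
    """Extract plain text from every <item> body in a WXR export string."""
    root = ET.fromstring(wxr_xml)
    texts = []
    for item in root.iter("item"):
        body = item.find(f"{{{CONTENT_NS}}}encoded")
        if body is not None and body.text:
            texts.append(strip_html(body.text))
    return texts


def word_frequencies(texts, stopwords=STOPWORDS) -> Counter:
    """Count lowercase words across all texts, skipping stopwords."""
    counts = Counter()
    for text in texts:
        # Straight apostrophes only; curly ones split a word in two.
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in stopwords:
                counts[word] += 1
    return counts


# A two-post stand-in for a real export file.
SAMPLE = """<?xml version="1.0"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <item><content:encoded><![CDATA[<p>Well, now, <b>think</b> something, people!</p>]]></content:encoded></item>
    <item><content:encoded><![CDATA[<p>Well, think again.</p>]]></content:encoded></item>
  </channel>
</rss>"""

if __name__ == "__main__":
    counts = word_frequencies(post_texts(SAMPLE))
    print(counts.most_common(3))
```

For a real export, you would read the XML file from disk and pass its contents to `post_texts`. The joined output of `strip_html` is the kind of plain text that can be pasted into a tool like Wordle.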
What does this tell us? I suppose it tells us that I use an unfortunate number of filler words. Also, I’m likely to say something like, “Well, now, think something, people!” I may also be interested in “another Christmas challenge,” subjects such as “Doctor Space,” and may agree that “Big GOOD!”
Fortunately, another site called Urlai can perform a more meaningful (if less beautiful) analysis of blog posts. Last time I visited that site, it guessed my posts to have been written by an elderly female. Let’s see if it’s changed its opinion!
electronic-replicant.com is probably written by a female somewhere between 66-100 years old. The writing style is personal and happy most of the time.
The analysis is 54% sure that I am female, which is both wrong and only slightly more confident than a totally random guess. It further calculates a 30% chance that I am in the 66-100 age group. The correct age group, 36-55, scored only a 15% chance.
I’m curious how the site arrives at an age estimate. Is it based on vocabulary, sentence structure, or keyword frequency? One might assume that someone who says “one might assume” would find oneself toward the far end of the age continuum.
As for the happiness score, it is specific: electronic-replicant.com is the 9756th happiest blog of 17663 ranked. That puts my apparent happiness a little below the median, roughly 55% of the way down the list. A good thing?
A new feature on the site is an interactive tree that shows which words influenced the site’s decision.
Notable masculine words are: Ayn, android, atlas, Gibson, goggles, leaks, recursively, and sleuth.
Notable feminine words are: blankets, baking, catsup, mania, museum, shampoo, Victorian, and whitening.
I don’t mean to accuse Urlai of being sexist, but it seems to believe that men are more likely to discuss Objectivist Cyberpunk detective stories, and women are more likely to discuss household products.
I wonder about the overall accuracy of this site’s algorithm. For example, the words it identifies as happy are: honor, peace, and gift. That sounds about right. However, the words it identifies as upset are: sleep, doubt, air, if, and to.
So, maybe Urlai doesn’t tell me anything I don’t already know. There’s one more tool I’d like to try on my extracted text: I Write Like. Last time I tried it, it told me I wrote like Douglas Adams, but that’s probably because I said “Zaphod Beeblebrox” a few times.
This year, it says I write like H.P. Lovecraft.
Ugh. Mention but a single time the closed timelike curve of non-Euclidean geometry under my bed, and now it’s apparently all shoggoths all the time.