Thingamablog Recovery

Okay, dear readers, here’s the scoop. As you will see to the right someplace, I use Thingamablog to publish The Electronic Replicant. It’s a Java program that stores your blog entries in a local database, then publishes them via FTP as static HTML files. This means that the resulting site is portable and universal, making it easy to move from server to server (which I plan to do soon, but that’s another story). I also happen to like it because, as a Java program, it can live on a FAT32 partition on my PC and run in both Windows and Linux. And if I share that partition over the network, I can also use Thingamablog on my notebook elsewhere in the house, such as on the couch in front of the TV.

That’s where my trouble began. I don’t know exactly what happened, but after completing and uploading last Sunday’s post, I closed Thingamablog and told Windows to disconnect the mapped drive. Windows gave a warning about files being open, but since Thingamablog (and every other program, for that matter) was closed, I assumed that Windows was simply being overprotective, and so I clicked OK. Later that night, I reopened Thingamablog on my PC, only to receive this nastygram:

Error Details...
java.lang.Exception
Unable to connect to database
net.sf.thingamablog.backend.HSQLDatabaseBackend.connectToDB(Unknown Source)
net.sf.thingamablog.gui.app.ThingamablogFrame$2.construct(Unknown Source)
net.sf.thingamablog.SwingWorker$2.run(Unknown Source)
java.lang.Thread.run(Unknown Source)

Apparently, that means the database is toast. I was faced with the tedious task of copying text from my previously uploaded pages, pasting it into the Thingamablog editor, and backdating the replacement posts to approximately the correct date and time. That is a job for a computer, not a human! And, since Thingamablog can import and export all of a blog’s posts via RSS, I decided to write a Perl script to munge my archives into an RSS feed. I am presenting this script here in its entirety, in the hope that it may be even a little bit useful to someone else in this position.

use File::Find;
use XML::RSS;
use Date::Manip;
&Date_Init("TZ=PST");
# Where are my files?
my $archive_dir = "c:\\erik\\archives";
my $rss_file = "c:\\erik\\rss.xml";
my $tempfile = "c:\\erik\\temp.txt";
# create an RSS 2.0 file
my $rss = new XML::RSS (version => '2.0');
$rss->channel(title => 'recovered blog', description => 'recovered blog' );
# enter directory containing downloaded archive folders
chdir ($archive_dir);
my @files_found = <*>;
my @search_dirs = ();
# make a list of the "year" folders
foreach $file (@files_found)
{
print $file . "\n";
if ($file =~ /\d{4}/)
{
push (@search_dirs, $file);
}
}
# Now process the files in there
find(\&wanted, @search_dirs);
# save RSS to a file
$rss->save($rss_file);
# finished; print a message and let the script exit normally
print "End\n";

sub wanted {
if ($_ =~ /html$/i)
{
my $file = $_;
print "opening $file . \n";

# How to find the data...
# The publication date is found within <h2 class="date">...</h2>; the time is in the <div class="posted">, after "at".
my $pubdate = "01-01-1980";
my $pubdate_open = "<h2 class=\"date\">";
my $pubdate_close = "</h2>";
# Title will be found within <h3 class="title">...</h3>
my $title = "default_title";
my $title_open = "<h3 class=\"title\">";
my $title_close = "</h3>";
# Description: everything after <h3 class="title">...</h3> and before <div class="posted">
my $description = "";
my $description_close = "<div class=\"posted\">";
# Category will be found within <div class="posted"> after Categories: inside <a href...>...</a>
my $category = "default_category";
my $permalink = "";

# Convert the raw HTML into something a little easier to parse
open (INFILE, $file) or die "Cannot open $file: $!";
open (OUTFILE, ">" . $tempfile) or die "Cannot write $tempfile: $!";
while (<INFILE>)
{
#go thru file converting all whitespace characters to ' '
chomp;
s/\s+/ /g;
#if a '<' is encountered, print an "\n" then '<' then continue
s/</\n</g;
#if a '>' is encountered, print '>' then an "\n" then continue
s/>/>\n/g;
print OUTFILE $_;
}
close OUTFILE;
close INFILE;

#now go thru easy-to-read temp file searching for desired tags, which will be on individual lines
open (INFILE, $tempfile);

# get date
print "getting date... ";
do {$input = <INFILE>;}
until ((eof) or (uc($input) eq uc($pubdate_open) . "\n"));
$pubdate = <INFILE>;
print "$pubdate \n";
do {$input = <INFILE>; }
until ((eof) or (uc($input) eq uc($pubdate_close) . "\n"));

# get title
print "getting title... ";
do {$input = <INFILE>; }
until ((eof) or (uc($input) eq uc($title_open) . "\n"));
$title = <INFILE>;
print "$title \n";
do {$input = <INFILE>; }
until ((eof) or (uc($input) eq uc($title_close) . "\n"));

#get description
print "getting description...";
$input = "";
until ((eof) or (uc($input) eq uc($description_close) . "\n"))
{
$description .= $input;
$input = <INFILE>;
print ".";
}
print "\n";

#Get time
print "getting time...";
do {$input = <INFILE>; }
until ((eof) or (uc($input) eq uc(" Posted By ") . "\n"));
do {$input = <INFILE>; }
until ((eof) or (uc($input) eq uc("</a>") . "\n"));
$pubdate .= <INFILE>;
$date = ParseDate($pubdate);
#Put date/time into this format: Wed, 21 Feb 2007 24:57:14 -0800
$pubdate = UnixDate ($date, "%g");
print $pubdate . "\n";

#Get Category
print "getting category...";
do {$input = <INFILE>; }
until ((eof) or ($input =~ /Categories/i));
#discard href (filehandle names are case-sensitive, so these reads must use INFILE)
$input = <INFILE>;
$category = <INFILE>;

print "\n";
close (INFILE);
# the hash key is the bare word category, not the variable $category
$rss->add_item(title => $title, permaLink => $permalink, description => $description, pubDate => $pubdate, category => $category);
}
}

I’m sure the real Perl wizards will scoff at this script, but it worked well enough for me to get my posts back. When I first ran it, it wouldn’t pull in the category, but I didn’t mind, since I’d been meaning to recategorize everything anyway. So I did. The next thing I had to do was to edit the default layout templates to resemble the ones I’d had before, but I didn’t mind that either, as I’d been wanting to make a few changes here and there.
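
For anyone adapting the script, the one-tag-per-line step does not strictly need a temp file. Here is a minimal sketch of the same idea done in memory, using only core Perl. The sub names tokenize_html and text_after_tag and the sample fragment are made up for illustration; they are not part of the script above or of Thingamablog, and the sketch assumes the same page markup the script looks for.

```perl
use strict;
use warnings;

# Split raw HTML into tokens: each tag and each run of text becomes one element.
# Whitespace runs are collapsed to a single space first, as in the script above.
sub tokenize_html {
    my ($html) = @_;
    $html =~ s/\s+/ /g;
    # The capturing group makes split() return the tags as well as the text between them.
    return grep { /\S/ } split /(<[^>]*>)/, $html;
}

# Return the token that directly follows the given opening tag
# (compared case-insensitively), or nothing if the tag is not present.
sub text_after_tag {
    my ($open_tag, @tokens) = @_;
    for my $i (0 .. $#tokens - 1) {
        return $tokens[$i + 1] if lc($tokens[$i]) eq lc($open_tag);
    }
    return;
}

# Hypothetical sample page fragment, for illustration only.
my $sample = qq{<h2 class="date">Sunday, February 18, 2007</h2>\n<h3 class="title">Hello\n  World</h3>};
my @tokens = tokenize_html($sample);
print text_after_tag('<h3 class="title">', @tokens), "\n";
```

Because the tokens live in a list, the date, title, and category lookups could each be a single call rather than a read loop, at the cost of holding the whole page in memory.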

Finally, since I now really wanted to make sure that I would never have to go through any of this again, I wrote a batch file that will launch Thingamablog, and then automatically back up the database afterward:

@ECHO OFF
set zip="c:\program files\7-zip\7z.exe"
set tambdir="c:\erik\tamb2\"
set archivedir="c:\erik\archive\"
set options=a -r -tzip -mx9
set thingamablog="c:\program files\thingamablog1\thingamablog.jar"

java -jar %thingamablog%

for /f "tokens=1-3 delims=/ " %%a in ("%DATE%") do (
set dd=%%a
set mm=%%b
set yyyy=%%c)
set filename=%dd%%mm%%yyyy%.zip

cd %archivedir%
%zip% %options% %filename% %tambdir%
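
One caveat with the batch file: the layout of %DATE% depends on the Windows regional settings, so the three tokens may not come out as day, month, and year on every machine. Below is a minimal Perl sketch of a file name that does not depend on those settings. The helper name backup_filename is hypothetical, and the year-month-day order is my assumption for the sake of sorting, not what the batch file produces.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Build a zip file name such as 2007-02-25.zip from an epoch time.
# Year-month-day order makes the archives sort chronologically by name.
sub backup_filename {
    my ($epoch) = @_;
    return strftime('%Y-%m-%d', localtime($epoch)) . '.zip';
}

# Print the name for today's backup.
print backup_filename(time), "\n";
```

The printed name could then stand in for %filename% on the 7-Zip command line.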


2 thoughts on “Thingamablog Recovery”

  1. Hi, I came here through the Thingamablog Forums.
    I have 2 questions:
    Is Perl compatible with HTML (is the only thing I know! :D)?
    Where do I put all this code?

  2. Very! It used to be the scripting language of choice for Web servers. You can save the Perl script as a text file to your PC, but you’ll need a Perl interpreter to execute it. If you are a Windows user, check out ActiveState.
