Dan Knight
- 2003.02.27
It's been a long time since I've added any automation to the
site, apart from a few tweaks to existing automation. This week we
added one more significant piece to our ongoing site automation
project: an automatically generated RSS news feed.
What's RSS?
An RSS (Rich Site Summary) news feed is a text file that others
can use to find out what's new on your website. There's a standard
format for these documents, which includes things like the domain
name and URL, email for the webmaster, when the latest update took
place, along with article titles, their URL, and usually a brief
description or teaser.
There's a good introduction to them in Using RSS News
Feeds.
I've been generating RSS news feeds for Low End Mac for a long
time. Quite frankly, it's a tedious business. I've been slacking
off on doing them because of the time involved. What time I have,
I'd rather spend writing, editing, designing, and trying to keep up
on email.
And after about an hour or so of coding and debugging, I don't
have to manually create an RSS feed any longer. PHP uses the same
MySQL database that already tracks site content to generate a news
feed.
Here's a snippet of the last one I created manually:
<?xml version="1.0"?>
<!DOCTYPE rss SYSTEM "http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91">
<channel>
<copyright>Copyright 1997-2004 Cobweb Publishing Inc</copyright>
<pubDate>Fri, 14 Feb 2003 14:00:00 EDT</pubDate>
<description>Resources for Mac users.</description>
<link>http://lowendmac.com/</link>
<title>Low End Mac</title>
<image>
<link>http://lowendmac.com/</link>
<title>Low End Mac</title>
<url>http://lowendmac.com/88x31.gif</url>
<height>31</height>
<width>88</width>
</image>
<webMaster>webmaster@lowendmac.com (Dan Knight)</webMaster>
<managingEditor>webmaster@lowendmac.com (Dan Knight)</managingEditor>
<language>en-us</language>
<skipHours>
<hour>1</hour>
<hour>24</hour>
</skipHours>
<skipDays>
<day>Saturday</day>
<day>Sunday</day>
</skipDays>
<item>
<title>Windows, Macs, OS X, and Real World Performance, Adam Robert Guha,
Apple Archive</title>
<link>http://lowendmac.com/archive/03/0214.html</link>
<description>As Mac OS X improves and hardware gets faster, complaints of
sluggishness will become nothing but a memory.</description>
</item>
<item>
<title>Safari Update, Mac OS X 10.2.4, a Neat Haxie, and How Mail Can
Better Fight Spam, Dan Knight, 10 Forward</title>
<link>http://lowendmac.com/2003/safari-update-mac-os-x-10-2-4-a-neat-haxie-and-how-mail-can-better-fight-spam/</link>
<description>Safari mostly improved but adds a glitch, 10.2.4 seems just
fine, a better CPU monitor, and ways Apple can leverage Mail to better
fight spam.</description>
</item>
</channel>
</rss>
The header only has one tricky part - the date. Everything else
up there remains the same. The items change each day that we add
new content. The tags pretty much explain the parts, which is a
very nice feature of an RSS news feed.
The Program
I'm not going to give you the whole program. Program code will
be set in monospace type and comments will be in regular type.
<?php
$db_server = "yourserver.com";
$db_username = "userID";
$db_password = "password";
$db_name = "mydata";
$connection = @mysql_connect($db_server,$db_username,$db_password);
mysql_select_db($db_name,$connection);
$filepointer = fopen("/usr/local/apache/sites/mydomain.com/htdocs/rss.txt", "w");
This was one of the tricky parts. Since the RSS newsfeed file,
rss.txt, is in the root directory, I had to find out from our
system administrator just how to write the file to the correct
spot. During testing, I wrote it to the same directory I was
writing the script in. (In the code that follows, \r indicates a
return.)
We also had to have things set up on the server so we had
permission to write the file. Without that, none of this works.
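A script can check for write permission before it opens the file. The following is only a minimal sketch: the helper name feed_is_writable() and the temporary-directory path are assumptions for illustration, not part of our script or our server setup.

```php
<?php
// Hypothetical path for illustration; the real location depends on the server.
$feed_path = sys_get_temp_dir() . "/rss.txt";

// Returns true if the feed file can be written. If the file does not
// exist yet, the directory that would hold it is checked instead.
function feed_is_writable($path) {
    if (file_exists($path)) {
        return is_writable($path);
    }
    return is_writable(dirname($path));
}

if (feed_is_writable($feed_path)) {
    $filepointer = fopen($feed_path, "w");
    fputs($filepointer, "<?xml version='1.0'?>\r");
    fclose($filepointer);
} else {
    echo "Cannot write to $feed_path; check the file permissions.\n";
}
```

Failing with a clear message is easier to debug than a feed that silently stops updating.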
fputs ($filepointer, "<?xml version='1.0'?>\r<!DOCTYPE rss SYSTEM
'http://my.netscape.com/publish/formats/rss-0.91.dtd'>\r<rss version='0.91'>\r
<channel>\r <copyright>Copyright 1997-2004 Cobweb Publishing Inc</copyright>
\r <pubDate>");
The above line starts writing the text file with our news feed.
This part only changes once a year as we update the copyright
notice.
The next chunk of code is used to post the correct date and time
in our file:
$rightnow = date("U");
$latestdate = mysql_fetch_array(mysql_query("SELECT * FROM links
WHERE timestamp <= $rightnow ORDER BY timestamp DESC LIMIT 1"));
// H is the 24-hour clock; h would repeat the same value morning and afternoon
fputs ($filepointer, date("D, d M Y H:i:s T", $latestdate['timestamp']));
The first line gets the current time. The second finds the
latest timestamp in our MySQL database that's no later than now.
(We do this to ensure that content scheduled for later release
doesn't get posted in the news feed early.)
We then post the day, date, time, and time zone as the
publication date. This will always match the moment when the most
recent article went live.
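RSS dates use a 24-hour clock, so the hour character in the format string matters. Here is a small sketch of how h and H differ. The timestamp is an illustrative value, and gmdate() is used instead of date() only so the result doesn't depend on the server's time zone.

```php
<?php
// 14 Feb 2003 14:00:00 GMT as a Unix timestamp (illustrative value).
$timestamp = 1045231200;

// h is the 12-hour clock, H is the 24-hour clock.
$twelve_hour = gmdate("D, d M Y h:i:s", $timestamp) . " GMT";
$twenty_four_hour = gmdate("D, d M Y H:i:s", $timestamp) . " GMT";

echo $twelve_hour . "\n";       // Fri, 14 Feb 2003 02:00:00 GMT
echo $twenty_four_hour . "\n";  // Fri, 14 Feb 2003 14:00:00 GMT
```

Without an am/pm marker, the 12-hour version makes 2:00 in the afternoon look like 2:00 in the morning.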
- UPDATE: The format we originally presented was incorrect and
included am/pm between the time and time zone - h:i:s a T
- which produces an invalid RSS feed. Please use <http://feeds.archive.org/validator/>
to validate your feed before sharing it with the world.
Next we close the publication date, describe the site, provide
the base URL for Low End Mac as well as the site's name, and
provide the URL and dimensions for a site graphic.
fputs ($filepointer, "</pubDate>\r <description>Resources for Mac users.</description>\r
<link>http://lowendmac.com/</link>\r <title>Low End Mac</title>\r
<image>\r <link>http://lowendmac.com/</link>\r <title>Low End Mac</title>\r
<url>http://lowendmac.com/88x31.gif</url>\r <height>31</height>\r
<width>88</width>\r</image>\r
Here we provide the email address for the webmaster and managing
editor, which in our case is the same person. Someone having a
problem with the news feed could use this to contact us and let us
know about the problem.
<webMaster>webmaster@lowendmac.com (Dan Knight)</webMaster>\r
<managingEditor>webmaster@lowendmac.com (Dan Knight)</managingEditor>\r
Low End Mac is published in English, and most of our writers use
US English. So we cover that next.
<language>en-us</language>\r
We can tell feed readers which hours and days the site isn't
updated. The hours are given in GMT, which for those outside the UK
can involve some head scratching. Basically we don't publish new
articles between 9:00 p.m. and 6:00 a.m. (We try to have our
publishing day done by noon Eastern Time to make time for lunch,
email, and my other job.)
<skipHours>\r
<hour>1</hour>\r <hour>2</hour>\r <hour>3</hour>\r <hour>4</hour>\r
<hour>5</hour>\r <hour>6</hour>\r <hour>7</hour>\r <hour>8</hour>\r
<hour>23</hour>\r <hour>24</hour>\r </skipHours>\r <skipDays>\r
<day>Saturday</day>\r <day>Sunday</day>\r </skipDays>\r");
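The hour list is repetitive enough that it could also be built in a loop. This is a minimal sketch, assuming a hypothetical helper named build_skip_hours() that is not part of our script. It uses the same hour numbers listed in the feed.

```php
<?php
// Builds a <skipHours> block from a list of hour numbers.
function build_skip_hours($hours) {
    $xml = "<skipHours>\r";
    foreach ($hours as $hour) {
        $xml .= " <hour>$hour</hour>\r";
    }
    return $xml . "</skipHours>\r";
}

// The same hours listed in the feed: 1 through 8, plus 23 and 24.
$quiet_hours = array_merge(range(1, 8), array(23, 24));
echo build_skip_hours($quiet_hours);
```

Changing the publishing schedule then means editing one array instead of a long string.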
The Meat
That's all header. Except for the time stamp, it doesn't change.
From here on, we're creating content based on information in our
MySQL database.
First up, get today's articles, starting with the most recently
published:
$get_links = mysql_query("SELECT * FROM links
WHERE pubdate = '$latestdate[pubdate]' and timestamp <= $rightnow
ORDER BY rank DESC");
while ($array = mysql_fetch_array($get_links))
{
fputs ($filepointer, "<item>\r<title>$array[linktext]");
if ($array['author'] <> "")
{fputs ($filepointer, ", $array[author]");}
if ($array['columnname'] <> "")
{fputs ($filepointer, ", $array[columnname]");}
fputs ($filepointer, "</title>\r<link>http://lowendmac.com$array[path]$array[html]</link>\r
<description>$array[description]</description>\r</item>\r\r");
}
This is pretty similar to the code we use to display pages
throughout the site. Instead of using just the title of the article
as a link, we also include the name of the author and the column
name as part of the title.
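One thing to watch when building items this way is that characters such as & and < in a title or description will produce an invalid feed unless they are escaped. The following sketch shows one way to handle that. The helper build_item() and the sample row are assumptions for illustration; only the column names come from our database.

```php
<?php
// Builds one <item> from a database row, escaping text for XML.
// The array keys match the column names used in the script.
function build_item($row) {
    $title = $row['linktext'];
    if ($row['author'] != "") {
        $title .= ", " . $row['author'];
    }
    if ($row['columnname'] != "") {
        $title .= ", " . $row['columnname'];
    }
    $link = "http://lowendmac.com" . $row['path'] . $row['html'];

    return "<item>\r<title>" . htmlspecialchars($title) . "</title>\r"
         . "<link>" . htmlspecialchars($link) . "</link>\r"
         . "<description>" . htmlspecialchars($row['description'])
         . "</description>\r</item>\r\r";
}

// A made-up row for illustration, not real site data.
$row = array(
    'linktext'    => "Macs & PCs",
    'author'      => "Jane Doe",
    'columnname'  => "",
    'path'        => "/archive/03/",
    'html'        => "0214.html",
    'description' => "Speed is < half the story.",
);
echo build_item($row);
```

Because the same function would serve today's articles and the previous days' links, it would also remove the need to copy and modify the loop for each day.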
// This Date in LEM History
$thisdate = date("md");
$thisrecord = date("nd");
$today = date("F j");
$get_links = mysql_fetch_array(mysql_query("SELECT * FROM lemhistory
WHERE datefield = '$thisrecord'"));
if ($get_links <> "")
{fputs ($filepointer, "<item>\r<title>$today in LEM history</title>
\r<link>http://lowendmac.com/arc/$thisdate.html</link>\r<description>
$get_links[stories]</description>\r</item>\r\r");}
Following the new articles, we check our database to see if we
have an archive covering this date in Mac and LEM History. Most
days we do, but about once a month there's a date we still haven't
posted new content on. Except for holidays, most of those will
disappear as we add content almost every weekday.
// Previous Day's Links
$previous_date = mysql_fetch_array(mysql_query("SELECT * FROM links
WHERE pubdate < '$latestdate[pubdate]' ORDER BY timestamp DESC LIMIT 1"));
$previous_links = mysql_query("SELECT * FROM links
WHERE pubdate = '$previous_date[pubdate]' ORDER BY clicks DESC");
while ($previous_array = mysql_fetch_array($previous_links))
{
fputs ($filepointer, "<item>\r<title>$previous_array[linktext]");
if ($previous_array['author'] <> "")
{fputs ($filepointer, ", $previous_array[author]");}
if ($previous_array['columnname'] <> "")
{fputs ($filepointer, ", $previous_array[columnname]");}
fputs ($filepointer, "</title>\r<link>http://lowendmac.com$previous_array[path]$previous_array[html]</link>\r
<description>$previous_array[description]</description>\r</item>\r\r");
}
The above code finds the most recent earlier publication date in
our database, giving it the flexibility to work around holidays and
weekends. It then lists that day's content ranked by popularity.
The article that received the most traffic is listed first, working
down to the best deals of the day, where we don't bother counting
clicks.
We modify the above code once more to provide a third day in our
feed, and then we close the text file.
fputs ($filepointer, "</channel>\r</rss>");
fclose ($filepointer);
?>
To read the current rss.txt file, click
here. And if you're using OS X, download NetNewsWire Lite, add Low End
Mac, and let software tell you when your favorite websites have new
content.
As I write this, we're running the script manually, but the
folks at BackBeat will be setting this up as a cron job that will
automatically run every 5 minutes. This is the kind of tedium
computers were designed for!
Other Updates
A couple weeks ago we changed the way we list our new content on
the LEM home page. Instead of a bullet list, we use a definition
list (<dl>). The article title is marked up as a term (<dt>) in
bold type, followed by the author, column name, and publication
date in italic and a description in plain text. The latter two
lines are styled as definitions (<dd>).
We've received very positive feedback on the change. Here's an
example from yesterday:
- Switching from Mac to Windows,
really clearing a hard drive, OS X and viruses, RAM disks, and
more
- Dan Knight, Low End Mac Mailbag, 02.26
- More on replacing Claris Home Page, printing from a classic
Mac, getting a CD-RW drive to work with a Performa, SpamBouncer,
and why some G4s won't work in G3s.
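Markup along those lines could be generated like this. It is only a sketch: the helper render_entry() and the sample title and teaser are made up for illustration, and the exact tags on the home page may differ.

```php
<?php
// Renders one article as a definition-list entry:
// a bold term, an italic byline, and a plain description.
function render_entry($title, $byline, $description) {
    return "<dt><b>" . htmlspecialchars($title) . "</b></dt>\n"
         . "<dd><i>" . htmlspecialchars($byline) . "</i></dd>\n"
         . "<dd>" . htmlspecialchars($description) . "</dd>\n";
}

echo "<dl>\n";
echo render_entry(
    "Sample Article Title",
    "Dan Knight, Low End Mac Mailbag, 02.26",
    "A short teaser for the article."
);
echo "</dl>\n";
```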
We also ran into a problem one Sunday afternoon when we noticed
Monday content being displayed below Friday's content. Looking over
the PHP script, we found that we were looking for the most recent
date != (PHP's way of saying "not equal") to the most recent date
matching or before today's date. Since Monday's date is not equal
to Friday's, the scheduled Monday content qualified. Changing that
to < solved the problem.
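The difference between the two comparisons is easy to see with a few dates in a plain array. This sketch only illustrates the logic. The dates, the helper most_recent(), and the use of an array instead of MySQL are all assumptions made for the example.

```php
<?php
// Publication dates on file: Thursday, Friday, and Monday's
// content, which was entered ahead of time.
$pubdates = array("2003-02-20", "2003-02-21", "2003-02-24");
$today = "2003-02-23"; // a Sunday

// Returns the most recent date accepted by the $keep test, or null.
function most_recent($dates, $keep) {
    $matches = array_filter($dates, $keep);
    return $matches ? max($matches) : null;
}

// Most recent date matching or before today: Friday.
$latest = most_recent($pubdates, function ($d) use ($today) {
    return $d <= $today;
});

// Buggy test: any date that is merely different from Friday.
$buggy = most_recent($pubdates, function ($d) use ($latest) {
    return $d != $latest;
});

// Fixed test: only dates strictly earlier than Friday.
$fixed = most_recent($pubdates, function ($d) use ($latest) {
    return $d < $latest;
});

echo "latest: $latest\n"; // latest: 2003-02-21
echo "buggy:  $buggy\n";  // buggy:  2003-02-24
echo "fixed:  $fixed\n";  // fixed:  2003-02-20
```

With != the future Monday date wins because it is the newest date that isn't Friday. With < only Thursday qualifies.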
I'm still far from an expert on PHP, and I'd be lost without
some assistance from Dave Hamilton at BackBeat Media (they handle
our ads and our servers) and two of my sons. Brian and Stephen's
help has been invaluable in finding the typos and other problems
that crop up so easily. You'll be able to check out their
programming abilities soon when they launch their third virtual
pet, second robotic virtual pet, and ultimate virtual robot pet
site.
Next project: Using PHP and MySQL to generate our daily news
releases.