Improved RSS feeds for Joost content updates are now available

May 1, 2007

Hal Schechner has updated his already nice RSS feed with a lot of nice new features. The new feeds can be found here:

Note that you can change the “lastN=” value in the feed address to get more or fewer updates. The default value is currently 10.
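If you pull the feed from a script, you can rewrite the lastN value programmatically. Here is a minimal Python sketch. It assumes lastN is an ordinary query-string parameter, and the example.com address is a hypothetical placeholder because the real feed address isn't reproduced here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def with_last_n(feed_url: str, last_n: int) -> str:
    """Return feed_url with its lastN query parameter set to last_n.

    Assumes lastN is a plain query-string parameter; other parameters
    already present in the URL are kept in their original order.
    """
    if last_n < 1:
        raise ValueError("lastN must be a positive integer")
    parts = urlsplit(feed_url)
    params = parse_qs(parts.query, keep_blank_values=True)
    params["lastN"] = [str(last_n)]
    return urlunsplit(parts._replace(query=urlencode(params, doseq=True)))


# Hypothetical feed address; substitute the real one from the link above.
print(with_last_n("http://example.com/joost/rss.cgi?lastN=10", 25))
```

Parsing and re-encoding the query string leaves any other parameters in the URL untouched, which is safer than a plain text substitution.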

Hal has also made available an API to query his DB directly over the web. Here is what he has to say about it:

First off: “I have no affiliation with Joost, etc.” I’m just here for the fun.

In addition to the raw data below (which will be going away shortly, since there’s not much need for it now), you can now query my local database. This is still in the “experimental” stage, so if you see something strange and it _doesn’t_ go away after a few minutes, let me know.

I’ll document this further later, but look here for the table definitions.

You’ll note that not all of the content descriptions are all that clean, and as a result some won’t parse properly in some XML parsers (hence my calling the output not-quite-XML below). I know about this, and it’ll be fixed. I just opted to make this available sooner with issues, rather than perfect and later (potentially much later, since I’m only doing this for fun). I’ve stripped out HTML tags to make things friendlier, though not more correct: the georestriction data is still available, but as a result it is kind of slapped on the end of the description. Some bad characters (& and friends) still exist. I’ll be fixing this soon(ish).
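Until that cleanup lands, a client can escape stray ampersands before it hands the text to an XML parser. Below is a minimal Python sketch. It assumes bare & characters are the only problem. Any & that does not start one of the five predefined XML entities or a numeric character reference gets escaped, so HTML-style entities such as &nbsp; come through as literal text instead of breaking the parse.

```python
import re
import xml.etree.ElementTree as ET

# Matches an "&" that does NOT begin a predefined XML entity
# (&amp; &lt; &gt; &quot; &apos;) or a numeric reference (&#38; / &#x26;).
_BARE_AMP = re.compile(r"&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(text: str) -> str:
    """Escape stray '&' characters so the text becomes parseable XML."""
    return _BARE_AMP.sub("&amp;", text)


raw = "<Description>Tom & Jerry &nbsp;(AT&T)</Description>"
fixed = escape_bare_ampersands(raw)
print(ET.fromstring(fixed).text)
```

The sketch does nothing about other malformed markup. It only targets the "& and friends" case described above.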

To query the database, pass an SQL query to query.cgi using the ‘query’ variable, for example:

example query
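The same kind of request can be built from a script. Here is a minimal Python sketch of a GET request that passes the SQL in the ‘query’ parameter. The query.cgi location below is a hypothetical placeholder, and the table name channel in the sample SQL is an assumption based on the table names Hal mentions further down.

```python
from urllib.parse import urlencode

# Hypothetical location of query.cgi; substitute the real address.
QUERY_CGI = "http://example.com/joost/query.cgi"


def query_url(sql: str, base: str = QUERY_CGI) -> str:
    """Build a GET URL that passes `sql` in the 'query' parameter."""
    # urlencode percent-encodes the SQL (spaces become '+', '*' becomes %2A).
    return f"{base}?{urlencode({'query': sql})}"


# Assumes a table named "channel"; check the table definitions first.
print(query_url("SELECT * FROM channel"))
```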


More examples (added as I come up with useful ones), as well as a place to enter your own queries and play around, can be found here.

The results will be returned in not-quite-XML (unless there’s an error, in which case the result is text/plain and contains the error string). The not-quite-XML is a single “Query” element containing zero or more “Result” elements. Each “Result” element contains one element per column selected.
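Reading that format from Python could look like the sketch below. It assumes each column element is named after the selected column, and that the reply's content type tells errors (text/plain) apart from results. As Hal notes above, descriptions with stray & characters may still need escaping before they parse.

```python
import xml.etree.ElementTree as ET


def parse_query_response(body: str, content_type: str) -> list[dict[str, str]]:
    """Turn a <Query><Result>...</Result></Query> reply into a list of rows.

    Each row maps a column element's tag to its text. A text/plain reply
    is treated as an error message, per the description above.
    """
    if content_type.split(";")[0].strip().lower() == "text/plain":
        raise RuntimeError(f"query failed: {body.strip()}")
    root = ET.fromstring(body)
    if root.tag != "Query":
        raise ValueError(f"expected a <Query> root element, got <{root.tag}>")
    # One dict per <Result>; one key per column element inside it.
    return [
        {column.tag: (column.text or "") for column in result}
        for result in root.findall("Result")
    ]


sample = "<Query><Result><id>1</id><name>Foo</name></Result></Query>"
print(parse_query_response(sample, "text/xml"))
```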

Both POST and GET are supported, but POST may do a better job for some queries. I tossed together a quick interface for querying here.
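For the POST route, here is a minimal sketch using only Python's standard library. It sends the same ‘query’ field form-encoded in the request body. The query.cgi address is a hypothetical placeholder, and the function returns the content type together with the body so the caller can spot a text/plain error reply.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical location of query.cgi; substitute the real address.
QUERY_CGI = "http://example.com/joost/query.cgi"


def build_post(sql: str, base: str = QUERY_CGI) -> Request:
    """Build a form-encoded POST request carrying `sql` as 'query'."""
    data = urlencode({"query": sql}).encode("ascii")
    return Request(
        base,
        data=data,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def run_query(sql: str, base: str = QUERY_CGI, timeout: float = 30.0) -> tuple[str, str]:
    """Send the query and return (content_type, decoded_body)."""
    with urlopen(build_post(sql, base), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.headers.get_content_type(), resp.read().decode(charset, errors="replace")
```

Because the SQL travels in the request body, long queries don't run into URL length limits. That is one plausible reason POST might behave better here, though the post doesn't say.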

Drop me an email at the address below with any questions/comments.

Some notes on how the database is managed:
Every so often (automatically every other hour, plus whenever I need to test the loading code) I scrape the Joost search site and parse out the data. Before touching the database, I “unvalidate” all programs and content (set the last_validated date to zero). Then I go through the content pieces, updating or adding them where necessary, and in all cases (even if the content hasn’t changed) I set the last_validated date to the current date/time. When everything is done, I assume any content piece that still has a last_validated of zero is gone, and I move it out of the channel/program tables into deleted_channel/deleted_program.

Conversely, when a channel/program is _added_, I drop all mentions of it from the deleted_* tables.

I don’t drop references to programs/channels from the mapping tables, since IDs are preserved in the deleted_* tables and these mappings can be used to look at channels/programs that no longer exist in Joost. (Is this even useful?)
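The refresh cycle Hal describes can be sketched with Python's built-in sqlite3 module. This is not Hal's code. The database engine, the (id, title, last_validated) column layout and the integer timestamps are all assumptions. Only the table names and the last_validated idea come from the notes above. The channel tables would be handled the same way, and mapping tables are deliberately left alone.

```python
import sqlite3
import time

# Hypothetical schema: only the table names and last_validated come from the post.
SCHEMA = """
CREATE TABLE program (id TEXT PRIMARY KEY, title TEXT, last_validated INTEGER);
CREATE TABLE deleted_program (id TEXT PRIMARY KEY, title TEXT, last_validated INTEGER);
"""


def refresh_programs(conn, scraped, now=None):
    """Apply one scrape; `scraped` is an iterable of (id, title) pairs."""
    now = int(time.time()) if now is None else now
    with conn:  # run the whole refresh as a single transaction
        # 1. "Unvalidate" every program.
        conn.execute("UPDATE program SET last_validated = 0")
        for pid, title in scraped:
            # 2. Update a known program and stamp it as seen in this scrape.
            cur = conn.execute(
                "UPDATE program SET title = ?, last_validated = ? WHERE id = ?",
                (title, now, pid),
            )
            if cur.rowcount == 0:
                # 3. Otherwise add it, and drop any mention in deleted_program.
                conn.execute(
                    "INSERT INTO program (id, title, last_validated) VALUES (?, ?, ?)",
                    (pid, title, now),
                )
                conn.execute("DELETE FROM deleted_program WHERE id = ?", (pid,))
        # 4. Anything still at zero was not seen this time: move it out.
        conn.execute(
            "INSERT OR REPLACE INTO deleted_program "
            "SELECT * FROM program WHERE last_validated = 0"
        )
        conn.execute("DELETE FROM program WHERE last_validated = 0")
```

For example, if you refresh with [("a", "Show A"), ("b", "Show B")] and then with only [("a", "Show A")], "b" ends up in deleted_program with a last_validated of zero. If "b" shows up in a later scrape, it is re-added and removed from deleted_program again.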

Say thanks to Hal for bringing this nice interface to you! With it you can keep an eye on what is added to Joost daily!

Enjoy!

Comments

One Response to “Improved RSS feeds for Joost content updates are now available”

  1. Fix for Joost “Tvprunner has encountered problems and needs to close.” | JoostTeam.com on May 8th, 2007 7:00 pm

    [...] that brought the “new Joost content” RSS feed, is at it again. This time Hal is bringing a little application to help those poor [...]
