Improved RSS feeds for Joost content updates are now available
May 1, 2007
Hal Schechner has updated his already nice RSS feed with a lot of new features. The new feeds can be found here:
Note that you can change the “lastN=” value to get more or fewer updates. The default is set to 10 for now.
Hal has also made available an API to query his DB directly over the web. Here is what he has to say about it:
First off, “I have no affiliation with Joost etc…”. I’m just here for the fun.
In addition to the raw data below (which will be going away shortly, there’s not much need for it now), there is the ability to query my local database. This is still in the “experimental” stage, so if you see something strange and it _doesn’t_ go away after a few minutes, let me know.
I’ll document this further later, but look here for the table definitions.
You’ll note that not all of the content descriptions are all that clean, and as a result some won’t parse properly in some XML parsers (hence my calling it not-quite-xml below). I know about this, and it’ll be fixed. I just opted to make this available sooner with issues, rather than perfect and later (potentially much later, since I’m only doing this for fun). I’ve stripped out HTML tags to make things friendlier (not more correct… the georestriction data is available and kind of slapped on the end of the description as a result), but some bad characters (& and friends) still exist. I’ll be fixing this soon(ish).
To query the database, pass an SQL query to query.cgi using the ‘query’ variable, for example:
Some more examples as I come up with useful ones (as well as a place to enter your own for playing around) can be found here.
The results will be returned in not-quite-XML (unless there’s an error, in which case the result is text/plain and is the error string). The not-quite-xml is a single “Query” element containing zero or more “Result” elements. Each “Result” element contains one element per column selected.
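To make the result format concrete, here is a minimal Python sketch of reading such a response. It is not Hal's code: the column names (`name`, `id`) and the sample data are invented for illustration, and the ampersand clean-up is just one possible workaround for the stray “&” characters mentioned above.

```python
import re
import xml.etree.ElementTree as ET

# Matches an ampersand that does not already start an entity like &amp; or &#38;
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z]+|#[0-9]+|#x[0-9A-Fa-f]+);)")

def parse_results(text):
    """Turn a Query/Result document into a list of {column: value} dicts."""
    # Escape bare ampersands so a strict XML parser accepts the text.
    cleaned = BARE_AMP.sub("&amp;", text)
    root = ET.fromstring(cleaned)
    # One dict per "Result" element, one key per selected column.
    return [
        {col.tag: (col.text or "") for col in result}
        for result in root.findall("Result")
    ]

# Hypothetical response: the column names are assumptions, not the real schema.
sample = """<Query>
  <Result><name>Tom & Jerry</name><id>42</id></Result>
  <Result><name>News</name><id>7</id></Result>
</Query>"""

rows = parse_results(sample)
for row in rows:
    print(row)
```

An error response arrives as plain text rather than XML, so `ET.fromstring` would raise `ParseError` on it; real code would probably want to check the content type first. Other problem characters besides “&” may still trip the parser.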
Both POST and GET are supported, but POST may do a better job for some queries. I tossed together a quick interface for querying here.
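As a rough sketch of what the GET and POST forms could look like from Python: the base URL below is a placeholder (the real address of query.cgi is not reproduced here), and the `program` table name is taken from the database notes further down, so treat both as assumptions. Only the `query` variable name comes from Hal's description.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder address: the real location of query.cgi is not given here.
BASE_URL = "http://example.invalid/query.cgi"

def build_get_url(sql, base_url=BASE_URL):
    """Return a GET URL carrying the SQL text in the 'query' variable."""
    return base_url + "?" + urlencode({"query": sql})

def build_post_request(sql, base_url=BASE_URL):
    """Return a POST request with the SQL text form-encoded in the body."""
    body = urlencode({"query": sql}).encode("utf-8")
    return Request(base_url, data=body, method="POST")

# Assumed table name; check the table definitions before relying on it.
sql = "SELECT * FROM program"

print(build_get_url(sql))
request = build_post_request(sql)
print(request.get_method(), request.data)
```

Passing `request` to `urllib.request.urlopen` would actually send it. POST keeps long queries out of the URL, which may be why it behaves better for some of them.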
Drop me an email at the address below with any questions/comments.
Some notes on how the database is managed:
Every so often (every other hour automatically, plus whenever I need to test the loading code) I scrape the Joost search site and parse out the data. Before touching the database, I “unvalidate” all programs and content (set the last_validated date to zero). Then I go through the content pieces, updating where necessary, adding where necessary, and in all cases (even if content hasn’t changed) I set the last_validated date to the current date/time. When done with everything, I assume any content piece that still has a last_validated of zero is gone, and I move it out of the channel/program table into deleted_channel/deleted_program.
Conversely, when a channel/program is _added_, I drop all mentions of it in the deleted_* tables.
I don’t drop references to programs/channels in the mapping tables since IDs are preserved in the deleted_* tables, and these mappings can be used to look at channels/programs that no longer exist in Joost (Is this even useful?)
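The unvalidate, refresh, and sweep cycle described in these notes can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: Hal's actual loader and schema are not shown here, so the `id` and `title` columns, the `sync_programs` helper, and the use of SQLite are all invented. Only the `program` and `deleted_program` table names and the `last_validated` column come from the description.

```python
import sqlite3
import time

def sync_programs(conn, scraped, now=None):
    """Apply one scrape to the database; scraped maps program id -> title."""
    now = int(time.time()) if now is None else now
    cur = conn.cursor()
    # Step 1: "unvalidate" every known program.
    cur.execute("UPDATE program SET last_validated = 0")
    for prog_id, title in scraped.items():
        # A program that shows up again no longer counts as deleted
        # (a harmless no-op for programs that were never deleted).
        cur.execute("DELETE FROM deleted_program WHERE id = ?", (prog_id,))
        # Step 2: add or update, stamping last_validated even if unchanged.
        cur.execute(
            "INSERT OR REPLACE INTO program (id, title, last_validated) "
            "VALUES (?, ?, ?)",
            (prog_id, title, now),
        )
    # Step 3: anything still at zero was not seen in this scrape, so move it.
    cur.execute(
        "INSERT OR REPLACE INTO deleted_program "
        "SELECT id, title, last_validated FROM program WHERE last_validated = 0"
    )
    cur.execute("DELETE FROM program WHERE last_validated = 0")
    conn.commit()

# Hypothetical schema: only last_validated is named in the notes above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE program (id INTEGER PRIMARY KEY, title TEXT, last_validated INTEGER);
    CREATE TABLE deleted_program (id INTEGER PRIMARY KEY, title TEXT, last_validated INTEGER);
""")

sync_programs(conn, {1: "Show A", 2: "Show B"}, now=100)
sync_programs(conn, {2: "Show B", 3: "Show C"}, now=200)

live = [r[0] for r in conn.execute("SELECT id FROM program ORDER BY id")]
gone = [r[0] for r in conn.execute("SELECT id FROM deleted_program ORDER BY id")]
print("live:", live, "deleted:", gone)
```

Because IDs are preserved when a row moves to `deleted_program`, any mapping table keyed on those IDs would still resolve for content that has left Joost, which matches the reasoning in the note above.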
Say thanks to Hal for bringing this nice interface to you! With it you can keep an eye on what is added to Joost each day!
Enjoy!