OPML and work with the Project P

I have been working for some time on something that I plan to release to the public. There is no time frame yet, but progress is good.

Here are a few fruits of this effort:
b19.se/data/sr-pod.opml – OPML for SverigesRadio (Swedish Public Radio) podcast feeds
b19.se/data/nrk-pod.opml – OPML for NRK (Norwegian Public Radio) podcast feeds
b19.se/data/dr-pod.opml – OPML for DR (Danish Public Radio) podcast feeds
b19.se/data/bbc-pod.opml – OPML for BBC (GB/UK Public Radio) podcast feeds
b19.se/data/npr-pod.opml – OPML for NPR (American Public Radio) podcast feeds

The links above point to OPML files containing the podcast feeds provided by SR, NRK, DR, BBC and NPR. All of them have good material, but none of them provides an OPML file of its own podcast feeds, so I’m providing that service for them. OPML files are useful for adding feeds (channels) to modern and spiffy podcast catchers on both mobiles and desktops, with each broadcaster’s whole catalog available to choose from.
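For anyone who wants to read such a file in a script rather than a podcast catcher, here is a minimal sketch in Python. It assumes the files follow the common OPML convention of `outline` elements that carry the feed address in an `xmlUrl` attribute and a label in `text`. I have not shown the actual layout of the files above, so treat that as an assumption. The function name `feeds_from_opml` and the sample data are made up for illustration.

```python
import xml.etree.ElementTree as ET

def feeds_from_opml(opml_text):
    """Return (title, feed_url) pairs for every outline that carries an xmlUrl."""
    root = ET.fromstring(opml_text)
    feeds = []
    for outline in root.iter("outline"):
        url = outline.get("xmlUrl")
        if url:  # grouping outlines have no xmlUrl and are skipped
            title = outline.get("text") or outline.get("title") or ""
            feeds.append((title, url))
    return feeds

# Made-up sample; the example.org URLs are placeholders, not real feeds.
SAMPLE_OPML = """<opml version="2.0">
  <head><title>Example podcasts</title></head>
  <body>
    <outline text="News">
      <outline type="rss" text="Morning show" xmlUrl="https://example.org/morning.xml"/>
      <outline type="rss" text="Science hour" xmlUrl="https://example.org/science.xml"/>
    </outline>
  </body>
</opml>"""

for title, url in feeds_from_opml(SAMPLE_OPML):
    print(f"{title}: {url}")
```

Walking every `outline` regardless of depth means nested categories do not need special handling.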

Some 750,000 podcast feeds have been indexed, sifted, validated and arbitrated, leaving some 300,000 feeds that span more than 90 languages from all over the world. I’m still chasing sources for more podcasts, as I aim to have more.

I have set up a few rules for validating feeds. They are as follows.

  1. Check the availability of the RSS feed over HTTP/HTTPS. On errors, the feed is marked for deletion.
  2. Check the contents of the RSS feed: XML validity, the expected RSS-specific structure, enclosure elements and the content-type declaration of each enclosure. Any MIME type declaration that does not indicate audio counts as invalid. If a feed has zero valid items, it is marked for deletion.
  3. Items with valid enclosures (and MIME types) are checked for “freshness”: items older than 12 months are not indexed, and neither are items dated more than a day into the future (the one-day margin allows for timezones).
  4. If a feed is served over HTTP and HTTPS is available, the feed is upgraded to HTTPS.
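To make the rules a bit more concrete, here is a rough sketch of rules 2 to 4 in Python. It is not the code that runs the index, only an illustration under some assumptions: “12 months” is approximated as 365 days, a `pubDate` without a timezone is treated as UTC, and an item whose date cannot be parsed is treated as not fresh. The names `audio_items`, `is_fresh` and `https_candidate` are made up for this example. The network parts (rule 1, and probing whether the HTTPS address actually answers for rule 4) are left out.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from urllib.parse import urlsplit, urlunsplit

MAX_AGE = timedelta(days=365)     # assumption: "12 months" taken as 365 days
FUTURE_SLACK = timedelta(days=1)  # one-day margin for timezones

def audio_items(rss_text):
    """Rule 2: return (title, enclosure_url, pub_date) for items with an audio enclosure.

    Returns an empty list for malformed XML or a document without <rss><channel>.
    """
    try:
        root = ET.fromstring(rss_text)
    except ET.ParseError:
        return []
    channel = root.find("channel") if root.tag == "rss" else None
    if channel is None:
        return []
    items = []
    for item in channel.findall("item"):
        enclosure = item.find("enclosure")
        if enclosure is None:
            continue
        mime = (enclosure.get("type") or "").strip().lower()
        if mime.startswith("audio/"):
            items.append((item.findtext("title") or "",
                          enclosure.get("url"),
                          item.findtext("pubDate") or ""))
    return items

def is_fresh(pub_date, now):
    """Rule 3: True if pub_date lies within [now - 365 days, now + 1 day].

    `now` must be a timezone-aware datetime.
    """
    try:
        published = parsedate_to_datetime(pub_date)
    except (TypeError, ValueError):
        return False  # assumption: unparseable dates count as not fresh
    if published.tzinfo is None:
        published = published.replace(tzinfo=timezone.utc)
    return now - MAX_AGE <= published <= now + FUTURE_SLACK

def https_candidate(url):
    """Rule 4 (first half): the HTTPS address to try, or None if url is not plain HTTP."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        return None
    return urlunsplit(parts._replace(scheme="https"))
```

A feed for which `audio_items` returns an empty list would be marked for deletion under rule 2, and only the items that also pass `is_fresh` would be indexed.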

The rules above are checked automatically in an attempt to sift through the feeds so that only fresh ones remain, meaning feeds that have had episodes/items posted within the last 12 months. Most podcast directories I have found online carry old and stale feeds that haven’t moved since 2006 or thereabouts. Those are NOT interesting: I use podcasts for learning, and knowledge should be fresh and up to date.

Later on I plan to have a UI built so that you can browse all ~300K feeds and find the ones that interest you, without locking you in or trying to flog you for money for access. API first, and then UI.

Locking in and walled gardens. iTunes and Anchor.fm are good examples of how companies attempt to lock people into their apps or platforms. Unfortunately, most podcasters see only iTunes as a gate to the masses; they usually publish just a link to their iTunes page and leave out the most basic thing, the RSS feed link. Anchor.fm pushes its mobile app and makes it hard to even find a feed. It is there if you know what to look for, but almost nobody outside the Anchor.fm sphere will even know there are podcasts on topics X, Y and Z, as they are not publicly visible outside the app.

I intend to change the podcast landscape a bit, by indexing and exposing podcasts that would not see the light of day if not liberated from these walled gardens.