RSS 3.1, I propose a few podcast-specific tokens

I have fiddled a lot with podcasts lately. I have a little project where I have indexed a list of about 135,000 links to podcast feeds and cleaned it down to about 15,000 at the moment. As I have written scripts to fetch, validate and parse all these feeds, I have also run into all sorts of problems with them.

RSS 2.0 is the current XML-based sort-of-standard, more or less de facto, with a heritage going back to the first specification, RSS 0.90. RSS is XML, and XML is heavy. The RSS specifications have no real official schema, and one would be nearly impossible to use anyway: people fill elements with whatever they think should be there, and in most cases they get it wrong. Today's world of RSS feeds is held together with wire, spit and a good chunk of luck.

I browsed around and found Aaron Swartz's RSS 3.0 draft, in which he put words to my thoughts: let's skip the XML and make it more lightweight and human-readable. XML with namespaces is a mess. I took his notes and extended them with everything needed to carry audio and video podcasting to 2018. The text below is captured and modified from the above link.


This proposal builds upon the draft Aaron Swartz outlined in 2002, which only implements the text variants of RSS.
The text below extends it with tokens for podcasts (RSS plus enclosure, in the RSS 2.0 world).

Format
An item consists of a series of lines separated by "\n".
Each line is a series of letters, numbers, "-", "." or "_" (called the name) followed by ": " followed by a series of characters (called the value). No two lines should start with the same name. If a line starts with a space or tab character, then it is a continuation of the value on the previous line. The newline in between is preserved. UTF-8 encoding is always used.
An item ends at the first blank line (that is, a line with no characters).
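
The rules above are small enough to sketch as a parser. This is only a rough Python illustration, not a reference implementation: the names parse_item and LINE_RE are mine, and I am assuming that names are ASCII, that exactly one leading space or tab is stripped from a continuation line, and that the space after the colon may be missing when the value starts on the next line.

```python
import re

# Name: letters, digits, "-", "." or "_", followed by ": " and the value.
# Assumption: the space after the colon is optional when the value is empty,
# so that a bare "description:" line can start a multi-line value.
LINE_RE = re.compile(r"^([A-Za-z0-9._-]+):(?: (.*))?$")


def parse_item(lines):
    """Parse the lines of one item (no blank lines) into a dict of name -> value."""
    item = {}
    current = None  # name of the field that continuation lines extend
    for line in lines:
        if line[:1] in (" ", "\t"):
            if current is None:
                raise ValueError("continuation line before any name")
            # Drop the single leading space or tab, keep the newline in between.
            item[current] += "\n" + line[1:]
            continue
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError(f"malformed line: {line!r}")
        name = match.group(1).lower()  # names are case-insensitive
        if name in item:
            raise ValueError(f"duplicate name: {name}")
        item[name] = match.group(2) or ""
        current = name
    return item
```

Feeding it the lines of one item returns a plain dict keyed by lower-cased name, and a repeated name raises ValueError.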

Document
An RSS 3.1 document consists of one head item followed by zero or more body items.
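
Going the other way, writing a document is mostly string joining. A minimal sketch, assuming each item is a Python dict of name to value; format_item and format_document are hypothetical helper names, not part of the draft.

```python
def format_item(fields):
    """Render one item (a dict of name -> value) as RSS 3.1 lines."""
    lines = []
    for name, value in fields.items():
        first, *rest = str(value).split("\n")
        lines.append(f"{name}: {first}")
        # Continuation lines start with a single space.
        lines.extend(" " + extra for extra in rest)
    return "\n".join(lines)


def format_document(head, body=()):
    """Join the head item and the body items, ending each item with a blank line."""
    return "".join(format_item(item) + "\n\n" for item in [head, *body])
```

An empty line inside a value comes out as a line holding a single space, so it is not mistaken for the blank line that ends the item.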

Head
The head is an item. Names for the lines are globally assigned. Names are case-insensitive. The assigned names are:

title
description
link
generator
errorsto
creator
created
last-modified
language
rights
license
guid
uri
subject

Most properties refer to the whole feed in addition to the item. For example, last-modified is the last-modified date of the feed.

Body
The body is a series of zero or more items. Names for the lines are globally-assigned and case-insensitive. The assigned names are:

title
description
link
generator
creator
created
last-modified
language
rights
license
guid
uri
subject
enclosure-type
enclosure-length
enclosure-uri
enclosure-episode
enclosure-season
enclosure-explicit
enclosure-duration
enclosure-tags
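
With the two lists of assigned names in hand, a consumer can at least report names it does not know. The draft does not say whether unknown names should be rejected or ignored, so this Python sketch only lists them; the set names and unknown_names are my own.

```python
# Names shared by head and body items, as listed in the draft.
COMMON_NAMES = {
    "title", "description", "link", "generator", "creator", "created",
    "last-modified", "language", "rights", "license", "guid", "uri", "subject",
}
# errorsto is only assigned for the head item.
HEAD_NAMES = COMMON_NAMES | {"errorsto"}
# The enclosure-* tokens are only assigned for body items.
BODY_NAMES = COMMON_NAMES | {
    "enclosure-type", "enclosure-length", "enclosure-uri", "enclosure-episode",
    "enclosure-season", "enclosure-explicit", "enclosure-duration", "enclosure-tags",
}


def unknown_names(item, allowed):
    """Return the names in item that are not assigned, compared case-insensitively."""
    return sorted(name for name in item if name.lower() not in allowed)
```

For example, an errorsto line is accepted in the head but reported when it shows up in a body item.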

Tokens

title
The title of the item.

description
A short description of the item.

link
A link to the item.

generator
The person or program that generated the item.

errorsto
An email address, optionally followed by a space and a name, of the person to send error reports about the feed to.

creator
An email address, optionally followed by a space and a name, of the person who created the item.

created
The date (in W3CDTF format) the item was created.

last-modified
The date (in W3CDTF format) the item was modified.

language
The language of the item, using the language tag format specified in RFC 3066.

rights
The copyright statement for the item.

license
A URI for the copyright license of the item.

guid
A globally unique identifier for the item.

uri
A globally unique identifier in the form of a URI for the item.

subject
The topic of the item.

enclosure-type
A MIME type identifier for the enclosure, for example audio/mpeg or video/mpeg.

enclosure-length
The size of the enclosure in bytes, in base-10 notation.

enclosure-uri
A URI for the enclosure (the media file).

enclosure-episode
A date in W3CDTF format or an episode number, to indicate order.

enclosure-season
A year or season number, to indicate order.

enclosure-explicit
A flag to indicate whether the item is explicit; positive values should be "1", "Yes" or "True".

enclosure-duration
The duration in whole seconds in base-10 notation, or as colon-separated hours, minutes and seconds (hh:mm:ss).

enclosure-tags
A single entry or a comma-separated list of categories (IAB19) and/or keywords.
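
A few of the enclosure tokens accept more than one spelling, so a reader will want to normalize them. Here is a rough Python sketch; the function names are hypothetical, and I am assuming that the explicit flag is compared case-insensitively and that whitespace around tags is not significant, neither of which the draft states.

```python
def parse_duration(value):
    """Return the duration in whole seconds from "493" or "hh:mm:ss"."""
    parts = value.strip().split(":")
    valid = all(p.isascii() and p.isdigit() for p in parts)
    if len(parts) not in (1, 3) or not valid:
        raise ValueError(f"bad duration: {value!r}")
    if len(parts) == 1:
        return int(parts[0])
    hours, minutes, seconds = (int(p) for p in parts)
    return hours * 3600 + minutes * 60 + seconds


def parse_explicit(value):
    """True for "1", "Yes" or "True" (case-insensitive here), otherwise False."""
    return value.strip().lower() in {"1", "yes", "true"}


def parse_tags(value):
    """Split a comma-separated list of categories or keywords, dropping empty entries."""
    return [tag.strip() for tag in value.split(",") if tag.strip()]
```

With these, a duration of 00:08:13 and a duration of 493 both come out as 493 seconds.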

Example
title: RSS 3.0 News
description: Latest updates on RSS 3.0.
link: http://www.aaronsw.com/2002/rss30
creator: me@aaronsw.com Aaron Swartz
errorsTo: me@aaronsw.com Aaron Swartz
language: en-US

title: Spec Introduced
created: 2002-09-06
guid: 00795648-C1E0-11D6-9AA6-003065F376B6
description:
 The spec was introduced to the world.
 
 A few people noticed.

Title: Zooko Likes It
Created: 2002-09-06
GUID: 0894CB2F-C1E0-11D6-9649-003065F376B6
Description: Zooko says he likes the spec.

title: A podcast extension?
created: 2018-04-08 19:19:26
last-modified: 2018-04-08 19:21:34
description: A possible extension of Aaron's RSS 3.0 draft with the aim of adding podcast capabilities.
 New tokens added to accommodate values used for podcasts of audio and video content.
 enclosure-type
 enclosure-length
 enclosure-uri
 enclosure-episode
 enclosure-season
 enclosure-explicit
 enclosure-duration
 enclosure-tags
 The above list would be able to carry enough information to replace RSS 2.0 feeds.
enclosure-type: audio/mpeg
enclosure-length: 10485760
enclosure-uri: http://content.example.com/podcast/audio/does-not-exist.mp3
enclosure-explicit: No
enclosure-duration: 00:08:13
enclosure-tags: rss 3.1, podcast, tokens, random blab, horses are insects

WordPress and my current theme do not allow me to present the draft in a format-correct way. If you would like a copy of my draft, please comment with your email address and I will send it to you. I don't really know where to submit the draft to make it official. I will code up something that produces an RSS 3.1 feed of blog posts and podcasts (whenever my project goes live and public).
