How to build a universal feed reader

We describe in detail the steps for building a feed reader that recognizes all formats, using the XML capabilities of PHP 5.
To follow this study, you need to know the structure of an RSS file.

This extensible page is built with Ajax Extensible Page from Xul.fr and the Anaa framework.

RSS file structure

Any syndication file contains a list of items (articles, posts, or other documents) and a description of the site they come from, which is called the channel.
For the channel, as for the items, a title, a description, and a URL are given.

Articles or documents

All formats carry the same basic data: the link to the article, its title, and a summary.

<item> 
    <title>Tutoriels RSS</title>
    <link>https://www.iqlevsha.ru/lecteur-de-flux.php</link>     
    <description>Tutoriels sur la construction et l'utilisation de flux RSS </description> 
</item>

Tag names differ depending on the format. Other data may be presented, such as release date, author, logo, etc.

Channel or site providing content

The feed includes a description of the source, that is, the site where the documents are published: its URL, the title of its home page, and a description of the site.

<channel>
    <title></title>
    <link>https://www.iqlevsha.ru/</link>     
    <description></description> 
</channel>

Here too, the tag names depend on the format used.

Items representing articles are placed after the description of the channel, as shown below in various formats.

Differences in formats

The main difference between RSS 2.0 and Atom is that RSS wraps the channel in an rss container, while Atom uses the channel alone as the root. The other differences are in the tag names.
As for RSS 1.0, which is based on RDF, its syntax differs noticeably from the other two formats.

RSS 2.0 format

The example is based on the Harvard RSS 2.0 specification.

<?xml version="1.0"?>
<rss version="2.0">
   <channel>
      <title>Xul News</title>
      <link>https://www.iqlevsha.ru/</link>
      <description>Réaliser un lecteur de flux.</description>
      <language>fr-FR</language>
      <pubDate>Tue, 10 Jun 2003 04:00:00 GMT</pubDate>
      <item>
         <title>Tutoriel</title>        
         <link>https://www.iqlevsha.ru/rss/</link>
         <description></description>
         <pubDate>Fri, 28 Sep 2007 09:39:21 GMT</pubDate>
      </item>
   </channel>
</rss>

RDF-based RSS 1.0 format

The 1.0 format uses the same tag names as 2.0, which makes it easier to build a universal reader. However, the structure differs. First, the rdf container belongs to the namespace of the same name. The structure is declared in the channel tag, but the item descriptions are placed after this tag.

The example below is based on the RSS 1.0 specification.

<?xml version="1.0"?>
<rdf:RDF 
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="http://purl.org/rss/1.0/"
>
 <channel rdf:about="http://www.xml.com/xml/news.rss">
    <title>iqlevsha.ru</title>
    <link>https://www.iqlevsha.ru</link>
    <description>   </description>
    <image rdf:resource="https://www.iqlevsha.ru/images/logo.gif" />
    <items>
        <rdf:Seq>
            <rdf:li resource="https://www.iqlevsha.ru/rss/" />
             ...other articles...
         </rdf:Seq>
    </items>
  </channel>
  <image rdf:about="https://www.iqlevsha.ru/images/logo.gif">
       <title>iqlevsha.ru</title>
       <link>https://www.iqlevsha.ru</link>
       <url>https://www.iqlevsha.ru/universal/images/logo.gif</url>
 </image>
 <item rdf:about="https://www.iqlevsha.ru/rss/l">
      <title>RSS</title>
      <link>https://www.iqlevsha.ru/rss/</link>
      <description>   </description>
 </item>
 ...other items...
</rdf:RDF>

Although the format is more complex, it remains simple to process with PHP's DOM functions.

Atom format, general structure

The Atom format uses the channel directly as the root container. The channel tag is feed and the elements are entry.

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Exemple de flux</title> 
  <link href="https://www.iqlevsha.ru/rss/"/>
  <updated></updated>
  <author> 
    <name>Denis Sureau</name>
  </author> 
  <entry>
    <title>Réaliser un lecteur de flux</title>
    <link href="https://www.iqlevsha.ru/rss/lecteur-de-flux.php"/>
    <updated></updated>
    <summary>Une description.</summary>
  </entry>
</feed>

Atom uses tag names of its own, while the two RSS formats share the same names, which will help us identify the format of a feed file.

Using DOM with PHP 5

The Document Object Model (DOM) lets you extract tags from an XML or HTML document. The getElementsByTagName function returns the list of tags whose name is given as a parameter. This list is a DOMNodeList containing items of type DOMNode.
It can be applied to the whole document or to a DOMElement, which makes it possible to take one part of the file, the channel or an item, and find a list of tags within that part.
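
As a minimal sketch of these two ways of calling getElementsByTagName, the snippet below loads a small RSS fragment from a string and searches it first from the document, then from one item. Loading from a string with loadXML is an assumption made so that the example runs without network access, and the variable names are illustrative.

```php
<?php
// Sample feed held in a string so the example needs no network access.
$xml = <<<XML
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Xul News</title>
    <item><title>Tutoriel</title></item>
    <item><title>RSS</title></item>
  </channel>
</rss>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

// Applied to the whole document: every <title>, channel and items alike.
$allTitles = $doc->getElementsByTagName("title");

// Applied to one element: only the <title> tags inside the first <item>.
$firstItem  = $doc->getElementsByTagName("item")->item(0);
$itemTitles = $firstItem->getElementsByTagName("title");

echo $allTitles->length, " titles in the document\n";
echo $itemTitles->length, " title in the first item: ";
echo $itemTitles->item(0)->textContent, "\n";
```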

Extracting the RSS channel
$doc = new DOMDocument("1.0");                     // DOMDocument
$doc->load("https://www.iqlevsha.ru/rss.xml");     // load the feed before querying it
$canal = $doc->getElementsByTagName("channel");    // DOMNodeList

The "feed" parameter is used instead for an Atom feed. The class names in the comments are given for information only; PHP code does not declare them.

Retrieving the first element
$element = $canal->item(0);      // DOMElement

The item() method is declared as returning a DOMNode, but for a tag the object obtained is a DOMElement.
The advantage is that DOMElement has attributes and methods that give access to the content of the element.

Retrieving all items
for($i = 0; $i < $canal->length; $i++)
{
    $element = $canal->item($i);
}

Using element data

For each item, as for the channel, the components are extracted with the same method and with the firstChild property. For example, for the title:

$title = $element->getElementsByTagName("title");   // get the list of title tags
$title = $title->item(0);  // get the tag
$title = $title->firstChild->textContent;  // get the content of the tag

Since there is no way to retrieve a single element directly, getElementsByTagName is used to retrieve a list that in practice contains only one element, and that element is retrieved with the item method.
In the DOM, the text inside a tag is treated as a child node, so the firstChild property is used to reach that node and textContent to read its text.
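
The chained calls above stop with an error if a tag is missing or empty, because item(0) or firstChild is then null. A hedged sketch of a more defensive variant follows; tagText is a hypothetical name chosen for this example, not a function from the original reader.

```php
<?php
// Hypothetical helper: text of the first $tag inside $node, or "" if absent.
function tagText(DOMElement $node, string $tag): string
{
    $list = $node->getElementsByTagName($tag);
    if ($list->length === 0) {
        return "";   // tag not present
    }
    // textContent read on the element itself is "" for an empty tag,
    // so there is no need to go through firstChild.
    return trim($list->item(0)->textContent);
}

$doc = new DOMDocument();
$doc->loadXML("<item><title> Tutoriel </title><description></description></item>");
$item = $doc->documentElement;

echo "title: [", tagText($item, "title"), "]\n";
echo "description: [", tagText($item, "description"), "]\n";
echo "link: [", tagText($item, "link"), "]\n";
```

Returning an empty string rather than null keeps the calling code free of extra checks when the result is only displayed.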

It then remains to apply these methods to the channel and to each item of the feed to obtain its content.

To make the feed reader more general, the function we implement returns the content in a two-dimensional array. We can then choose how to use it: display it directly on the web page or process the array further.

How to identify the format

Identifying the format is easy once you know that RSS 1.0 and 2.0 use the same tags, so the same functions can be applied to both. Atom is recognized by its feed container, whereas RSS 2.0 uses rss and RSS 1.0 uses rdf:RDF.
Since both versions of RSS use the channel tag, it is the presence of the feed tag that identifies Atom.

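$doc = new DOMDocument("1.0");
$doc->load("https://www.iqlevsha.ru/rss.xml");     // load the feed before querying it
$canal = $doc->getElementsByTagName("feed");       // DOMNodeList, never false
$isAtom = ($canal->length > 0);                    // true only if a feed tag was found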

We try to extract the channel using the "feed" tag. If the parser finds this tag, the DOMNodeList contains one element. The $isAtom flag is set to true when the list is not empty, because the tag is present; otherwise the feed is treated as RSS, without distinguishing between versions.
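
If the two RSS versions ever need to be told apart as well, one possible approach is to look at the name of the root element. This is only a sketch that goes slightly beyond the test above; detectFormat and its return values are hypothetical names chosen for this example.

```php
<?php
// Hypothetical helper: classify a loaded feed by its root element.
function detectFormat(DOMDocument $doc): string
{
    $root = $doc->documentElement;
    if ($root === null) {
        return "unknown";   // nothing was loaded
    }
    switch ($root->localName) {
        case "feed": return "atom";   // Atom: <feed> is the root
        case "RDF":  return "rss1";   // RSS 1.0: <rdf:RDF>
        case "rss":  return "rss2";   // RSS 2.0: <rss>
        default:     return "unknown";
    }
}

$atom = new DOMDocument();
$atom->loadXML('<feed xmlns="http://www.w3.org/2005/Atom"><title>t</title></feed>');

$rss = new DOMDocument();
$rss->loadXML('<rss version="2.0"><channel><title>t</title></channel></rss>');

echo detectFormat($atom), "\n";
echo detectFormat($rss), "\n";
```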

Read channel data

You now know how to extract the channel. The same function can be used with the string "feed" or "channel" as its parameter.
The document pointer is assumed to be a global variable, $doc.

function extractChannel($chan)
{
   global $doc;
   $canal = $doc->getElementsByTagName($chan);
   return $canal->item(0);
}

Then the following function, called with the name of each tag as its parameter, reads the title and each descriptive element of the channel.

function getTag($tag)
{
   global $canal;   // the channel element returned by extractChannel()
   $content = $canal->getElementsByTagName($tag);
   $content = $content->item(0);
   return($content->firstChild->textContent);
}

The function is then called in turn with the parameters "title", "link", "description", and so on.

The names depend on the format: the description is "summary" or "subtitle" for Atom and "description" for the others.
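
The per-format names can be kept in a small lookup table. The sketch below covers the channel only and also handles one point specific to Atom: the URL of a link tag is stored in its href attribute, not in the tag content. The names $channelTags and channelField are hypothetical, chosen for this example.

```php
<?php
// Channel-level tag names per format (Atom calls the feed description "subtitle").
$channelTags = [
    "rss"  => ["title" => "title", "link" => "link", "description" => "description"],
    "atom" => ["title" => "title", "link" => "link", "description" => "subtitle"],
];

// Hypothetical helper: read one logical field from a channel or feed element.
function channelField(DOMElement $chan, array $names, string $field): string
{
    $list = $chan->getElementsByTagName($names[$field]);
    if ($list->length === 0) {
        return "";
    }
    $node = $list->item(0);
    // Atom stores the URL in the href attribute, not in the tag content.
    if ($field === "link" && $node->hasAttribute("href")) {
        return $node->getAttribute("href");
    }
    return trim($node->textContent);
}

$doc = new DOMDocument();
$doc->loadXML('<feed xmlns="http://www.w3.org/2005/Atom">'
    . '<title>Exemple de flux</title>'
    . '<link href="https://www.iqlevsha.ru/rss/"/>'
    . '<subtitle>Une description.</subtitle></feed>');
$feed = $doc->getElementsByTagName("feed")->item(0);

echo channelField($feed, $channelTags["atom"], "title"), "\n";
echo channelField($feed, $channelTags["atom"], "link"), "\n";
echo channelField($feed, $channelTags["atom"], "description"), "\n";
```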

Reading item data

The principle is the same, except that we have to loop over the list of items, whereas there is only one channel.

Keep in mind also that RSS 1.0 places the item descriptions outside the channel tag, while the other formats place them inside it. Items are contained in feed for Atom and in channel for RSS 2.0, but in rdf:RDF for RSS 1.0.
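
Because getElementsByTagName searches all descendants, calling it on the document finds the items wherever they are placed. The following sketch compares a search from the document with a search from the channel on two reduced fragments; countItems is a hypothetical name and the fragments are shortened versions of the examples above.

```php
<?php
$rss2 = '<rss version="2.0"><channel><title>c</title>'
      . '<item><title>a</title></item></channel></rss>';
$rss1 = '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"'
      . ' xmlns="http://purl.org/rss/1.0/"><channel><title>c</title></channel>'
      . '<item><title>a</title></item></rdf:RDF>';

// Count the <item> tags found from the document and from the channel element.
function countItems(string $xml): array
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    $channel = $doc->getElementsByTagName("channel")->item(0);
    return [
        "document" => $doc->getElementsByTagName("item")->length,
        "channel"  => $channel->getElementsByTagName("item")->length,
    ];
}

print_r(countItems($rss2));  // both searches find the item
print_r(countItems($rss1));  // only the document-level search finds it
```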

The extractItems function extracts the list of items; it is passed the parameter "item" for RSS and "entry" for Atom:

function extractItems($tag)
{
   global $doc;
   $dnl = $doc->getElementsByTagName($tag);
   return $dnl;
}

The returned list gives access to each item, which is added to the array $a.
Example with the RSS format:

$a = array();
$items = extractItems("item");
for($i = 0; $i < $items->length; $i++)
{
    array_push($a, $items->item($i));
}

You can also build an array of the tags of each item (title, link, description) directly and place it in a two-dimensional array.
To do this, we use a more generic version of the getTag function defined earlier:

function getTag($item, $tag)
{
   $content = $item->getElementsByTagName($tag);
   $content = $content->item(0);
   return($content->firstChild->textContent);
}

$FeedArray = array();
for($i = 0; $i < $items->length; $i++)
{
    $a = array();
    $item = $items->item($i);
    array_push($a,  getTag($item, "title"));
   ... same thing for each tag of an item ...

    array_push($FeedArray, $a);
}

Each article is thus placed in a two-dimensional array that can simply be displayed or used as we see fit. The above loop will be placed in the getTags function.
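
As an illustration of the "simply displayed" case, here is a sketch that turns such an array into an HTML list. It assumes each row holds the title, link, and description in that order; displayFeed is a hypothetical name and is not the display function shipped in the archive.

```php
<?php
// Hypothetical renderer: each row is [title, link, description].
function displayFeed(array $feedArray): string
{
    $html = "<ul>\n";
    foreach ($feedArray as $row) {
        list($title, $link, $description) = $row;
        // Escape the feed text: it comes from a third party and may contain markup.
        $html .= '<li><a href="' . htmlspecialchars($link, ENT_QUOTES, "UTF-8") . '">'
               . htmlspecialchars($title, ENT_QUOTES, "UTF-8") . "</a> "
               . htmlspecialchars($description, ENT_QUOTES, "UTF-8") . "</li>\n";
    }
    return $html . "</ul>\n";
}

$FeedArray = [
    ["Tutoriels RSS", "https://www.iqlevsha.ru/lecteur-de-flux.php", "Flux <RSS> & Atom"],
];
echo displayFeed($FeedArray);
```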

Functions of the complete reader

We now have the list of all the functions needed for the universal reader.

With the appropriate parameters, these functions work for all formats.
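
To show how the pieces could fit together, here is a compact sketch that takes the XML of a feed as a string, detects Atom by the presence of the feed tag, and returns the channel title with a two-dimensional array of items. readFeed is a hypothetical function written for this example; it is not the code of the archive, and reading from a string rather than a URL is an assumption that keeps the example self-contained.

```php
<?php
// Hypothetical all-in-one reader; returns ["format", "title", "items"].
function readFeed(string $xml): array
{
    $empty = ["format" => "unknown", "title" => "", "items" => []];
    libxml_use_internal_errors(true);   // collect parse errors instead of printing warnings
    $doc = new DOMDocument();
    if ($xml === "" || !$doc->loadXML($xml)) {
        return $empty;
    }
    // Atom is identified by the presence of <feed>; anything else is read as RSS.
    $isAtom  = $doc->getElementsByTagName("feed")->length > 0;
    $chanTag = $isAtom ? "feed" : "channel";
    $itemTag = $isAtom ? "entry" : "item";
    $descTag = $isAtom ? "summary" : "description";

    $channel = $doc->getElementsByTagName($chanTag)->item(0);
    if ($channel === null) {
        return $empty;                  // neither <feed> nor <channel> found
    }
    // Text of the first $tag under $node, or "" when the tag is missing.
    $text = function (DOMElement $node, string $tag): string {
        $list = $node->getElementsByTagName($tag);
        return $list->length ? trim($list->item(0)->textContent) : "";
    };

    $items = [];
    // Search from the document so RSS 1.0 items outside <channel> are found.
    foreach ($doc->getElementsByTagName($itemTag) as $item) {
        $linkNode = $item->getElementsByTagName("link")->item(0);
        $link = "";
        if ($linkNode !== null) {
            // Atom keeps the URL in href; RSS keeps it in the tag content.
            $link = $isAtom ? $linkNode->getAttribute("href") : trim($linkNode->textContent);
        }
        $items[] = [$text($item, "title"), $link, $text($item, $descTag)];
    }
    return [
        "format" => $isAtom ? "atom" : "rss",
        "title"  => $text($channel, "title"),
        "items"  => $items,
    ];
}

$feed = readFeed('<rss version="2.0"><channel><title>Xul News</title>'
    . '<item><title>Tutoriel</title><link>https://www.iqlevsha.ru/rss/</link>'
    . '<description>d</description></item></channel></rss>');
echo $feed["title"], ": ", count($feed["items"]), " item(s)\n";
```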

Loading the feed

In the simplest case, the feed is meant to be integrated into a web page, either when the page loads or later at the user's request.

Regardless of the format, and especially for feeds in French, the encodings must be compatible: the feed is most often in UTF-8, while the page that displays it is sometimes in ISO-8859-1 or windows-1252.
It is better to give the page the UTF-8 encoding, to avoid incorrect display of accented characters.

The encoding is declared with a content-type meta tag of the following form:

<meta http-equiv="content-type" content="text/html; charset=UTF-8">
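
If the page cannot be served as UTF-8, the feed text can be converted instead. PHP's DOM functions return strings in UTF-8, so a possible sketch, assuming here a page declared as ISO-8859-1 and the iconv extension being available, is:

```php
<?php
// Text as returned by the DOM functions, in UTF-8 ("Réaliser un lecteur de flux").
$utf8 = "R\xC3\xA9aliser un lecteur de flux";

// Convert for a page declared as ISO-8859-1 (an assumption for this example).
$latin1 = iconv("UTF-8", "ISO-8859-1", $utf8);

// The accented letter takes two bytes in UTF-8 and one in ISO-8859-1.
echo strlen($utf8), " bytes in UTF-8, ", strlen($latin1), " bytes in ISO-8859-1\n";
```

Characters that have no equivalent in the target encoding make the conversion fail, which is one more reason to prefer a UTF-8 page.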

Loading with the page

To display the feed as part of the page, insert the following code into the HTML:

<?php
include("universal-reader.php");
Universal_Reader("https://www.iqlevsha.ru/rss.xml");
echo Universal_Display();
?>

See demonstration below.

Loading on demand

This case arises when a visitor selects a feed from a list or types in the feed address.
The feed can then be loaded with Ajax for asynchronous display, or with PHP alone by reloading the page.
A form with a text input field is used to enter the feed URL, or else a simple link (or a choice of links) that is clicked to display the feed.
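
When the address comes from a form, it is worth checking it before passing it to the reader. The following is a minimal sketch under stated assumptions: validFeedUrl is a hypothetical name, and "feed" is an assumed name for the text field of the form.

```php
<?php
// Hypothetical check: accept only well-formed http(s) URLs from the form.
function validFeedUrl($input): ?string
{
    if (!is_string($input)) {
        return null;
    }
    $url = trim($input);
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return null;
    }
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    // Reject file:// and similar schemes that DOMDocument::load would also open.
    return in_array($scheme, ["http", "https"], true) ? $url : null;
}

// "feed" is an assumed name for the form's text field.
$url = validFeedUrl($_GET["feed"] ?? null);
echo $url === null ? "Invalid feed address\n" : "Loading $url\n";
```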

Download

Download the Universal Reader functions as a Zip archive.

It contains two demos.

Further information