XML API Rework

  • Status: draft
  • Created: 2010-04-11 11:18:34+0200
  • Categories: xml
  • Target: 5.8
  • Author: friebe

Scope of Change

The XML API will be extended to fully support markup-style XML documents.


Rationale

At the moment, certain information is lost when an XML tree is loaded and then emitted again:

  $xml= Tree::fromString('<html>Hello<br/>World</html>')->getSource();

  // $xml now contains:
  // <html>
  // World
  // <br/>
  // </html>



Functionality

This loss of information is due to the fact that the above is transformed to the following internal structure by the parser:

  Node(name = 'html', content= 'World') {
    Node(name = 'br')
  }

The "Hello" is lost because the parser stores any text it encounters as the enclosing node's content, while it adds any element as a child. Since a node has only one content slot, the second text run, "World", overwrites "Hello". When the document is output again, the content is written first and the children after it, so the original order is not preserved either.
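To make the failure mode concrete, here is a minimal PHP sketch that replays the parser's events for the example document against a stand-in for the old node model. The class `LegacyNode` and its `toXml()` method are hypothetical simplifications, not XP Framework code, and the output omits the indentation that getSource() adds.

```php
<?php
// Hypothetical model of the old behaviour (not the XP Framework source):
// text goes into a single "content" slot, elements go into a children
// list, and output writes the content before the children.
final class LegacyNode
{
    public ?string $content = null;
    /** @var LegacyNode[] */
    public array $children = [];

    public function __construct(public string $name) {}

    public function toXml(): string
    {
        if ($this->content === null && !$this->children) {
            return '<' . $this->name . '/>';
        }
        // Content first, children second: this is where the order is lost.
        $xml = '<' . $this->name . '>' . htmlspecialchars($this->content ?? '', ENT_XML1);
        foreach ($this->children as $child) {
            $xml .= $child->toXml();
        }
        return $xml . '</' . $this->name . '>';
    }
}

// Replaying the parser events for <html>Hello<br/>World</html>:
$html = new LegacyNode('html');
$html->content = 'Hello';                 // first text run
$html->children[] = new LegacyNode('br'); // element becomes a child
$html->content = 'World';                 // second text run overwrites "Hello"

echo $html->toXml(), "\n";                // <html>World<br/></html>
```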

Addressing this issue

Several new classes will be added to the xml package. They will all implement the xml.Element interface:

  * xml.Text
  * xml.Comment
  * xml.ProcessingInstruction

The addChild() methods of the xml.Node and xml.Tree classes will be changed to accept any xml.Element implementation.

Finally, the parser will turn the markup from the example above into the following internal structure:

  Node(name = 'html') {
    Text(content = 'Hello'),
    Node(name = 'br'),
    Text(content = 'World')
  }

The new implementation builds on the existing children model. The "content" member of xml.Node and its accessors, getContent() and setContent(), will be deprecated.
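A minimal, self-contained PHP sketch of the proposed model follows. Only the names Element, Text, Comment, ProcessingInstruction, Node and addChild() come from this RFC; the `toXml()` method, the constructor signatures and the escaping rules are assumptions made for illustration.

```php
<?php
// Sketch of the proposed model: every child is an Element, and text is a
// child like any other. toXml() and the constructors are assumptions.
interface Element
{
    public function toXml(): string;
}

final class Text implements Element
{
    public function __construct(private string $content) {}
    public function toXml(): string { return htmlspecialchars($this->content, ENT_XML1); }
}

final class Comment implements Element
{
    public function __construct(private string $text) {}
    public function toXml(): string { return '<!--' . $this->text . '-->'; }
}

final class ProcessingInstruction implements Element
{
    public function __construct(private string $target, private string $data) {}
    public function toXml(): string { return '<?' . $this->target . ' ' . $this->data . '?>'; }
}

final class Node implements Element
{
    /** @var Element[] */
    private array $children = [];

    public function __construct(private string $name) {}

    // Accepts any Element implementation, not just other nodes.
    public function addChild(Element $child): Element
    {
        $this->children[] = $child;
        return $child;
    }

    public function toXml(): string
    {
        if (!$this->children) {
            return '<' . $this->name . '/>';
        }
        $xml = '<' . $this->name . '>';
        foreach ($this->children as $child) {  // document order is preserved
            $xml .= $child->toXml();
        }
        return $xml . '</' . $this->name . '>';
    }
}

$html = new Node('html');
$html->addChild(new Text('Hello'));
$html->addChild(new Node('br'));
$html->addChild(new Text('World'));

echo $html->toXml(), "\n";   // <html>Hello<br/>World</html>
```

Because text is stored as an ordinary child, serialization only has to walk the children in order; no special case for a content member is needed.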

BC: Constructor parameter "content"

A Node can be constructed with the content passed in its second parameter:

  $n= new Node('a', 'Click', array('href' => 'http://example.com/'));

The old behaviour would set the content member:

  Node(name = 'a', content = 'Click', @{href = 'http://example.com/'})

The constructor will be changed to add a Text child node:

  Node(name = 'a', @{href = 'http://example.com/'}) {
    Text(content = 'Click')
  }

Both structures yield the same XML when getSource() is called.
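The constructor change can be sketched as follows. These are hypothetical stand-ins for xml.Node and xml.Text rather than the framework classes; the simplified getSource() emits no indentation and always writes an explicit end tag.

```php
<?php
// Hypothetical stand-ins: a non-null second constructor argument becomes a
// Text child instead of filling a "content" member.
final class Text
{
    public function __construct(public string $content) {}
}

final class Node
{
    /** @var array<Text|Node> */
    public array $children = [];

    public function __construct(
        public string $name,
        ?string $content = null,
        public array $attributes = []
    ) {
        if ($content !== null) {
            $this->children[] = new Text($content); // old: $this->content = $content
        }
    }

    public function getSource(): string
    {
        $attr = '';
        foreach ($this->attributes as $key => $value) {
            $attr .= ' ' . $key . '="' . htmlspecialchars($value, ENT_XML1 | ENT_QUOTES) . '"';
        }
        $inner = '';
        foreach ($this->children as $child) {
            $inner .= $child instanceof Text
                ? htmlspecialchars($child->content, ENT_XML1)
                : $child->getSource();
        }
        return '<' . $this->name . $attr . '>' . $inner . '</' . $this->name . '>';
    }
}

$n = new Node('a', 'Click', ['href' => 'http://example.com/']);
echo $n->getSource(), "\n"; // <a href="http://example.com/">Click</a>
```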

CData

The xml.CData class will also implement the Element interface.

PCData vs. Fragment

The xml.PCData class will also implement the Element interface. It will, however, be deprecated and replaced by a new class, xml.Fragment.

I/O

The xml.Tree's fromFile() and fromString() methods will be deprecated in favor of the xml.parser.TreeParser class.

  // Deprecated
  $t= Tree::fromFile('payload.xml');

  // New
  $parser= new TreeParser();
  $t= $parser->parse(new StreamInputSource(new FileInputStream('payload.xml')));

The getSource() method will be deprecated in favor of the asXml() method which will accept an xml.io.OutputFormat instance.

  // Deprecated
  $xml= $tree->getSource(INDENT_DEFAULT);

  // New
  $xml= $tree->asXml(OutputFormat::$DEFAULT);

  // Writing to any stream
  $writer= new TreeWriter(new ...OutputStream(), OutputFormat::$DEFAULT);
  $writer->write($tree);
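The separation between rendering and output can be sketched like this. OutputFormat, asXml() and TreeWriter are the names proposed above, but the `indent` and `newline` fields, the `$COMPACT` format, the use of a Node as the root in place of a Tree, and the element-only Node are assumptions; a real writer would also handle text and the other Element types.

```php
<?php
// Hypothetical sketch: an OutputFormat value object decides the layout,
// asXml() renders to a string, and TreeWriter sends it to any stream.
final class OutputFormat
{
    public static OutputFormat $DEFAULT;
    public static OutputFormat $COMPACT;

    public function __construct(public string $indent, public string $newline) {}
}
OutputFormat::$DEFAULT = new OutputFormat('  ', "\n");
OutputFormat::$COMPACT = new OutputFormat('', '');

// Element-only node, standing in for the root of a tree.
final class Node
{
    /** @var Node[] */
    public array $children = [];

    public function __construct(public string $name) {}

    public function asXml(OutputFormat $format, int $depth = 0): string
    {
        $pad = str_repeat($format->indent, $depth);
        if (!$this->children) {
            return $pad . '<' . $this->name . '/>' . $format->newline;
        }
        $xml = $pad . '<' . $this->name . '>' . $format->newline;
        foreach ($this->children as $child) {
            $xml .= $child->asXml($format, $depth + 1);
        }
        return $xml . $pad . '</' . $this->name . '>' . $format->newline;
    }
}

final class TreeWriter
{
    /** @param resource $stream any writable PHP stream */
    public function __construct(private $stream, private OutputFormat $format) {}

    public function write(Node $root): void
    {
        fwrite($this->stream, $root->asXml($this->format));
    }
}

$root = new Node('list');
$root->children[] = new Node('item');

echo $root->asXml(OutputFormat::$COMPACT), "\n"; // <list><item/></list>

$out = fopen('php://memory', 'w+');
(new TreeWriter($out, OutputFormat::$DEFAULT))->write($root);
rewind($out);
echo stream_get_contents($out);                  // indented, one element per line
```

Keeping the format a separate value object means the same rendering code serves both the string-returning asXml() and the stream-based writer.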



Security considerations

n/a

Speed impact

(TODO: Test)

Dependencies

None.

Related documents

Inspiration:

Implementing patch: Overlay:

Comments


