PYTHON xml_parse_into_struct

Python replacement for PHP's xml_parse_into_struct




PHP xml_parse_into_struct

PHP original manual for xml_parse_into_struct

xml_parse_into_struct

(PHP 4, PHP 5)

xml_parse_into_struct: Parse XML data into an array structure

Description

int xml_parse_into_struct ( resource $parser , string $data , array &$values [, array &$index ] )

This function parses an XML string into two parallel array structures: values, which holds the parsed elements, and index, which contains pointers to the location of the corresponding entries in the values array. These last two parameters must be passed by reference.

Parameters

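parser

The XML parser resource, as created by xml_parser_create().

data

A string containing the XML data to parse.

values

An array, passed by reference, that receives the parsed elements.

index

An optional array, passed by reference, that receives for each tag name the positions of its entries in the values array.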

Return Values

xml_parse_into_struct() returns 0 for failure and 1 for success. These are integers, not the booleans FALSE and TRUE, so be careful with strict operators such as ===.

Examples

Below is an example that illustrates the internal structure of the arrays being generated by the function. We use a simple note tag embedded inside a para tag, and then we parse this and print out the structures generated:

Example #1 xml_parse_into_struct() example

<?php
$simple = "<para><note>simple note</note></para>";
$p = xml_parser_create();
xml_parse_into_struct($p, $simple, $vals, $index);
xml_parser_free($p);
echo "Index array\n";
print_r($index);
echo "\nVals array\n";
print_r($vals);
?>

When we run that code, the output will be:

Index array
Array
(
    [PARA] => Array
        (
            [0] => 0
            [1] => 2
        )

    [NOTE] => Array
        (
            [0] => 1
        )

)

Vals array
Array
(
    [0] => Array
        (
            [tag] => PARA
            [type] => open
            [level] => 1
        )

    [1] => Array
        (
            [tag] => NOTE
            [type] => complete
            [level] => 2
            [value] => simple note
        )

    [2] => Array
        (
            [tag] => PARA
            [type] => close
            [level] => 1
        )

)
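
A possible Python counterpart is sketched below. It is not an official port: it uses the standard xml.parsers.expat module and reproduces only the tag, type, level, value and attributes keys shown above. Two simplifications are assumptions of this sketch: whitespace-only text is always skipped (similar to setting XML_OPTION_SKIP_WHITE), and malformed input raises xml.parsers.expat.ExpatError instead of returning 0. The case_folding parameter is a hypothetical stand-in for XML_OPTION_CASE_FOLDING.

```python
import xml.parsers.expat


def xml_parse_into_struct(data, case_folding=True):
    """Return (values, index), loosely modelled on PHP's parallel arrays."""
    values = []  # one dict per open/complete/close/cdata entry
    index = {}   # tag name -> positions of its entries in `values`
    stack = []   # positions in `values` of the currently open elements

    def fold(name):
        # PHP upper-cases tag names unless case folding is switched off.
        return name.upper() if case_folding else name

    def add(entry):
        index.setdefault(entry["tag"], []).append(len(values))
        values.append(entry)

    def start(name, attrs):
        entry = {"tag": fold(name), "type": "open", "level": len(stack) + 1}
        if attrs:
            entry["attributes"] = {fold(k): v for k, v in attrs.items()}
        stack.append(len(values))
        add(entry)

    def text(chunk):
        if not stack or not chunk.strip():
            return  # assumption: whitespace-only text is ignored
        opened = stack[-1]
        if opened == len(values) - 1:
            # No child element seen yet: the text is the element's value.
            entry = values[opened]
            entry["value"] = entry.get("value", "") + chunk
        else:
            # Text following a child element becomes a separate cdata entry.
            add({"tag": values[opened]["tag"], "value": chunk,
                 "type": "cdata", "level": len(stack)})

    def end(name):
        opened = stack.pop()
        if opened == len(values) - 1:
            # Nothing was recorded since the opening tag: a complete element.
            values[opened]["type"] = "complete"
        else:
            add({"tag": fold(name), "type": "close", "level": len(stack) + 1})

    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # deliver contiguous text in one piece
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = text
    parser.Parse(data, True)
    return values, index


vals, idx = xml_parse_into_struct("<para><note>simple note</note></para>")
print(idx)  # {'PARA': [0, 2], 'NOTE': [1]}
for entry in vals:
    print(entry)
```

Returning the two structures as a tuple replaces PHP's by-reference parameters, which have no direct Python equivalent.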

Event-driven parsing (based on the expat library) can get complicated when the XML document is complex. This function does not produce a DOM-style object, but it generates structures that can be traversed in a tree fashion. Thus, we can easily create objects representing the data in the XML file. Let's consider the following XML file representing a small database of amino acid information:

Example #2 moldb.xml - small database of molecular information

<?xml version="1.0"?>
<moldb>

  <molecule>
      <name>Alanine</name>
      <symbol>ala</symbol>
      <code>A</code>
      <type>hydrophobic</type>
  </molecule>

  <molecule>
      <name>Lysine</name>
      <symbol>lys</symbol>
      <code>K</code>
      <type>charged</type>
  </molecule>

</moldb>

And some code to parse the document and generate the appropriate objects:

Example #3 parsemoldb.php - parses moldb.xml into an array of molecular objects

<?php

class AminoAcid {
    var $name;    // aa name
    var $symbol;  // three letter symbol
    var $code;    // one letter code
    var $type;    // hydrophobic, charged or neutral

    function AminoAcid ($aa)
    {
        foreach ($aa as $k=>$v)
            $this->$k = $aa[$k];
    }
}

function readDatabase($filename)
{
    // read the XML database of aminoacids
    $data = implode("", file($filename));
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, 1);
    xml_parse_into_struct($parser, $data, $values, $tags);
    xml_parser_free($parser);

    // start empty so the function still returns an array
    // when the file contains no molecule elements
    $tdb = array();

    // loop through the structures
    foreach ($tags as $key=>$val) {
        if ($key == "molecule") {
            $molranges = $val;
            // each contiguous pair of array entries are the
            // lower and upper range for each molecule definition
            for ($i=0; $i < count($molranges); $i+=2) {
                $offset = $molranges[$i] + 1;
                $len = $molranges[$i + 1] - $offset;
                $tdb[] = parseMol(array_slice($values, $offset, $len));
            }
        } else {
            continue;
        }
    }
    return $tdb;
}

function parseMol($mvalues)
{
    // map each child tag name to its text value
    $mol = array();
    for ($i=0; $i < count($mvalues); $i++) {
        $mol[$mvalues[$i]["tag"]] = $mvalues[$i]["value"];
    }
    return new AminoAcid($mol);
}

$db = readDatabase("moldb.xml");
echo "** Database of AminoAcid objects:\n";
print_r($db);

?>

After executing parsemoldb.php, the variable $db contains an array of AminoAcid objects, and the output of the script confirms that:

** Database of AminoAcid objects:
Array
(
    [0] => aminoacid Object
        (
            [name] => Alanine
            [symbol] => ala
            [code] => A
            [type] => hydrophobic
        )

    [1] => aminoacid Object
        (
            [name] => Lysine
            [symbol] => lys
            [code] => K
            [type] => charged
        )

)
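
In Python, the same list of objects can usually be built without slicing parallel arrays. The sketch below uses a different technique from the PHP example: it walks the tree with the standard xml.etree.ElementTree module. The MOLDB_XML constant and the read_database function are hypothetical names chosen for this illustration, and the XML is embedded as a string so that the example is self-contained.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Contents of moldb.xml, embedded so the sketch runs on its own.
MOLDB_XML = """<?xml version="1.0"?>
<moldb>
  <molecule>
      <name>Alanine</name>
      <symbol>ala</symbol>
      <code>A</code>
      <type>hydrophobic</type>
  </molecule>
  <molecule>
      <name>Lysine</name>
      <symbol>lys</symbol>
      <code>K</code>
      <type>charged</type>
  </molecule>
</moldb>
"""


@dataclass
class AminoAcid:
    name: str    # aa name
    symbol: str  # three letter symbol
    code: str    # one letter code
    type: str    # hydrophobic, charged or neutral


def read_database(xml_text):
    """Build one AminoAcid per <molecule> element."""
    root = ET.fromstring(xml_text)
    database = []
    for molecule in root.iter("molecule"):
        # Each child tag name (name, symbol, code, type) becomes a field.
        fields = {child.tag: child.text for child in molecule}
        database.append(AminoAcid(**fields))
    return database


db = read_database(MOLDB_XML)
print("** Database of AminoAcid objects:")
for amino_acid in db:
    print(amino_acid)
```

The dataclass assumes that every molecule element carries exactly the four child tags shown in moldb.xml; a missing or extra tag raises a TypeError when the object is constructed.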