Python xml namespace parsing with libxml2

The goal of this tinkering was simple: parse a KML file with Python and libxml2 for further interpretation and use.

First test

<pre lang="python">#!/usr/bin/env python
import libxml2, sys

doc = libxml2.parseFile(sys.argv[1])
ctxt = doc.xpathNewContext()
ctxt.xpathRegisterNs('kml', "http://www.opengis.net/kml/2.2")
root = doc.getRootElement()
res = ctxt.xpathEval("//kml:Placemark")
for e in res:
    # query relative to the current Placemark node
    nodes = e.xpathEval('kml:name')
    if len(nodes) > 0:
        if nodes[0].content.strip().startswith('cp'):
            coord = e.xpathEval('kml:Point/kml:coordinates')
            if len(coord) > 0:
                print(coord[0].content.split(','))

doc.freeDoc()
ctxt.xpathFreeContext()
</pre>

This looked fine according to the scarce documentation, but it resulted in the following error:

<pre lang="python">Undefined namespace prefix
xmlXPathEval: evaluation failed
Traceback (most recent call last):
  File "./test1.py", line 11, in <module>
    nodes = e.xpathEval('kml:name')
  File "/usr/lib/pymodules/python2.6/libxml2.py", line 436, in xpathEval
    res = ctxt.xpathEval(expr)
  File "/usr/lib/pymodules/python2.6/libxml2.py", line 4894, in xpathEval
    if ret is None:raise xpathError('xmlXPathEval() failed')
libxml2.xpathError: xmlXPathEval() failed
</pre>

Second test

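After some searching on the net I found that many people have reported namespace problems with XPath. Removing the default namespace declaration before parsing did the trick, since the elements then belong to no namespace and the XPath expressions can drop the kml: prefix: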

<pre lang="python">import re

# removes the default namespace declaration (xmlns="...") from the raw XML
def clean_data(path):
    with open(path, 'r') as f:
        data = f.read()
    data = re.sub(r'xmlns="[^"]*"\s*', '', data)
    return data

doc = libxml2.parseDoc(clean_data(sys.argv[1]))
</pre>
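To see what the substitution actually does, here is a minimal sketch that applies the same regular expression to an inline string instead of a file. The sample document and the name strip_default_ns are made up for illustration; only the pattern comes from clean_data above.

```python
import re

def strip_default_ns(xml_text):
    # Same substitution as clean_data, applied to a string instead of a file
    return re.sub(r'xmlns="[^"]*"\s*', '', xml_text)

# Hypothetical one-line KML document, just for demonstration
sample = ('<kml xmlns="http://www.opengis.net/kml/2.2">'
          '<Placemark><name>cp1</name></Placemark></kml>')
cleaned = strip_default_ns(sample)
print(cleaned)
```

Only the unprefixed xmlns="…" declaration is matched; prefixed declarations such as xmlns:gx="…" are left alone, so elements using those prefixes still need them registered.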

Third test

I could have easily left it at that, but it’s a bit ugly, isn’t it? The solution?

<pre lang="python">#!/usr/bin/env python
import libxml2, sys

doc = libxml2.parseFile(sys.argv[1])
ctxt = doc.xpathNewContext()
ctxt.xpathRegisterNs('kml', "http://www.opengis.net/kml/2.2")
root = doc.getRootElement()
res = ctxt.xpathEval("//kml:Placemark")
for e in res:
    # reuse the context that has the kml prefix registered,
    # but evaluate relative to the current Placemark
    ctxt.setContextNode(e)
    nodes = ctxt.xpathEval('kml:name')
    if len(nodes) > 0:
        if nodes[0].content.strip().startswith('cp'):
            coord = ctxt.xpathEval('kml:Point/kml:coordinates')
            if len(coord) > 0:
                print(coord[0].content.split(','))

doc.freeDoc()
ctxt.xpathFreeContext()
</pre>

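Notice the ctxt.setContextNode(e) followed by ctxt.xpathEval(…) instead of e.xpathEval(…). As the traceback above shows, e.xpathEval(…) evaluates through a context of its own, which apparently does not know the kml prefix registered on ctxt; going through ctxt keeps the registration. Solved. In practice, though, it seems that not all KML files declare the same default namespace, so the ugly method turned out to be simpler and better :)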
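If the varying default namespace is the only obstacle, one possible middle ground is to read the namespace URI from the root element and register that instead of a hard-coded string. The sketch below is an assumption on my part, not something from the tests above: it uses xml.etree.ElementTree from the standard library rather than libxml2 so that it runs without the bindings, and the helper name root_namespace and both sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET

def root_namespace(xml_text):
    """Return the namespace URI of the root element, or None if it has none."""
    root = ET.fromstring(xml_text)
    # ElementTree spells namespaced tags as "{uri}localname"
    if root.tag.startswith('{'):
        return root.tag[1:].split('}', 1)[0]
    return None

# Hypothetical documents declaring different default namespaces
sample_a = '<kml xmlns="http://www.opengis.net/kml/2.2"><Document/></kml>'
sample_b = '<kml xmlns="http://earth.google.com/kml/2.1"><Document/></kml>'
uri_a = root_namespace(sample_a)
uri_b = root_namespace(sample_b)
print(uri_a)
print(uri_b)
```

The returned URI could then be passed to ctxt.xpathRegisterNs('kml', uri) in place of the fixed "http://www.opengis.net/kml/2.2" string. Note that this reports the namespace of the root element, which matches the default namespace only when the root itself is unprefixed.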

Comments:

Python xml namespace parsing with libxml2 solution | Ilya Kharlamov - Aug 3, 2011

[…] people reported namespace parsing problems with xpath, Python (libxslt) for example here or here I have created an easy solution for that: def xpath(xml, xpathexpression): […]
