Python xml namespace parsing with libxml2

The goal of this tinkering was simple: parse a KML file with Python and libxml2 so that its contents could be interpreted and used further on.

First test

#!/usr/bin/env python
import libxml2, sys

doc = libxml2.parseFile(sys.argv[1])
ctxt = doc.xpathNewContext()
# bind the prefix "kml" to the KML 2.2 namespace URI in this context
ctxt.xpathRegisterNs('kml', "http://www.opengis.net/kml/2.2")
res = ctxt.xpathEval("//kml:Placemark")
for e in res:
    # evaluate relative to each Placemark node (this is the call that fails)
    nodes = e.xpathEval('kml:name')
    if len(nodes) > 0:
        if nodes[0].content.strip().startswith('cp'):
            coord = e.xpathEval('kml:Point/kml:coordinates')
            if len(coord) > 0:
                print(coord[0].content.split(','))

doc.freeDoc()
ctxt.xpathFreeContext()

This looked right according to the scarce documentation, but it failed with the following error:

Undefined namespace prefix
xmlXPathEval: evaluation failed
Traceback (most recent call last):
  File "./test1.py", line 11, in <module>
    nodes = e.xpathEval('kml:name')
  File "/usr/lib/pymodules/python2.6/libxml2.py", line 436, in xpathEval
    res = ctxt.xpathEval(expr)
  File "/usr/lib/pymodules/python2.6/libxml2.py", line 4894, in xpathEval
    if ret is None:raise xpathError('xmlXPathEval() failed')
libxml2.xpathError: xmlXPathEval() failed
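Two XPath rules are at play here: a prefix only resolves where its mapping is known, and an unprefixed name matches only elements in no namespace, which is why a default xmlns="..." trips up plain queries. The sketch below illustrates both rules with the standard library's xml.etree.ElementTree instead of libxml2 (a swap made only so it runs without extra packages); the tiny KML string and the NS mapping are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal KML document using a default namespace.
KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark><name>cp1</name></Placemark>
  </Document>
</kml>"""

# Prefix map: 'kml' is an arbitrary local alias for the KML namespace URI.
NS = {'kml': 'http://www.opengis.net/kml/2.2'}

root = ET.fromstring(KML)

# 1. An unprefixed name only matches elements in *no* namespace,
#    so nothing is found even though <Placemark> is in the document.
print(root.findall('.//Placemark'))  # []

# 2. A prefix with no mapping for this query is an error,
#    much like libxml2's "Undefined namespace prefix".
try:
    root.findall('.//kml:Placemark')
except SyntaxError as err:
    print('error:', err)  # error: prefix 'kml' not found in prefix map

# 3. Supplying the mapping with the query makes the prefix resolvable.
placemarks = root.findall('.//kml:Placemark', NS)
print([p.find('kml:name', NS).text for p in placemarks])  # ['cp1']
```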

Second test

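After some searching on the net I found that many people have reported problems with namespaces in XPath queries. Removing the default namespace declaration (the xmlns="..." attribute) before parsing did the trick; the queries can then use plain names such as //Placemark instead of //kml:Placemark: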

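import libxml2, re, sys

# Remove the default namespace declaration (xmlns="...") so that the
# elements end up in no namespace and plain XPath names match them.
def clean_data(filename):
    with open(filename, 'r') as f:
        data = f.read()
    return re.sub(r'xmlns="[^"]*"\s*', '', data)

doc = libxml2.parseDoc(clean_data(sys.argv[1]))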

Third test

I could easily have left it at that, but it's a bit ugly, isn't it? The solution:

#!/usr/bin/env python
import libxml2, sys

doc = libxml2.parseFile(sys.argv[1])
ctxt = doc.xpathNewContext()
# bind the prefix "kml" to the KML 2.2 namespace URI in this context
ctxt.xpathRegisterNs('kml', "http://www.opengis.net/kml/2.2")
res = ctxt.xpathEval("//kml:Placemark")
for e in res:
    # reuse the same context (which knows "kml"), moved onto this Placemark
    ctxt.setContextNode(e)
    nodes = ctxt.xpathEval('kml:name')
    if len(nodes) > 0:
        if nodes[0].content.strip().startswith('cp'):
            coord = ctxt.xpathEval('kml:Point/kml:coordinates')
            if len(coord) > 0:
                print(coord[0].content.split(','))

doc.freeDoc()
ctxt.xpathFreeContext()

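Notice the ctxt.setContextNode(e) followed by ctxt.xpathEval(…) instead of e.xpathEval(…). As far as I can tell from libxml2.py (see the "res = ctxt.xpathEval(expr)" frame in the traceback), a node's xpathEval() evaluates the expression through a context of its own, and that context does not know the kml prefix registered on ctxt. Reusing ctxt keeps the registration. Solved.

Actually, it turns out that not all KML files use the same default namespace, so the ugly method was simpler and better after all :)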
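A possible middle ground is to read the namespace from the document itself instead of hard-coding it, so files with a different default namespace work without a regex. The sketch below uses the standard library's xml.etree.ElementTree rather than libxml2 (a swap, chosen so it runs without extra packages); the placemark_coords name, the sample document and the 'cp' filter mirroring the script above are assumptions for illustration. With libxml2 the equivalent idea would be to pass the root element's namespace URI to xpathRegisterNs() instead of the fixed string.

```python
import xml.etree.ElementTree as ET

def placemark_coords(xml_text, name_prefix='cp'):
    """Coordinates of Placemarks whose <name> starts with name_prefix.

    The namespace is taken from the root element, so KML files using a
    different default namespace (or none at all) need no preprocessing.
    """
    root = ET.fromstring(xml_text)
    # ElementTree spells a namespaced tag as '{uri}local', e.g.
    # '{http://www.opengis.net/kml/2.2}kml'.
    uri = root.tag[1:].split('}', 1)[0] if root.tag.startswith('{') else None

    def q(local):
        # Qualify a local name with the document's own namespace, if any.
        return '{%s}%s' % (uri, local) if uri else local

    results = []
    for placemark in root.iter(q('Placemark')):
        name = placemark.find(q('name'))
        if name is None or not (name.text or '').strip().startswith(name_prefix):
            continue
        coord = placemark.find(q('Point') + '/' + q('coordinates'))
        if coord is not None and coord.text:
            # strip surrounding whitespace before splitting lon,lat,alt
            results.append(coord.text.strip().split(','))
    return results


SAMPLE = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>cp1</name>
      <Point><coordinates>12.5,41.9,0</coordinates></Point>
    </Placemark>
    <Placemark>
      <name>other</name>
      <Point><coordinates>1,2,0</coordinates></Point>
    </Placemark>
  </Document>
</kml>"""

print(placemark_coords(SAMPLE))  # [['12.5', '41.9', '0']]
```

Unlike the scripts above, the coordinates text is stripped before splitting, since KML files may put whitespace or newlines around it.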

