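Python offers many ways to work with XML. For simple XML, the xml.dom.minidom module that ships with Python is enough. Below we look at how to use minidom to generate and read a simple XML document.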

First, here is how to generate an XML document:

Python 2.6.6 (r266:84297, Aug 24 2010, 18:46:32) [MSC v.1500 32 bit (Intel)] on win32
>>> from xml.dom import minidom
>>> doc = minidom.getDOMImplementation().createDocument(None, 'pagelist', None)
>>> rootdoc = doc.documentElement
>>> page = doc.createElement('page')
>>> rootdoc.appendChild(page)
<DOM Element: page at 0x1247df0>
>>> url = doc.createElement('url')
>>> urltext = doc.createTextNode('http://quke.org')
>>> url.appendChild(urltext)
<DOM Text node "'http://quk'...">
>>> doc.toxml()
'<?xml version="1.0" ?><pagelist><page/></pagelist>'
>>> page.appendChild(url)
<DOM Element: url at 0x1247eb8>
>>> doc.toxml()
'<?xml version="1.0" ?><pagelist><page><url>http://quke.org</url></page></pagelist>'
>>> title = doc.createElement('title')
>>> titledata = doc.createCDATASection('趣客')
>>> title.appendChild(titledata)
<DOM CDATASection node "'\xc8\xa4\xbf\xcd'">
>>> page.appendChild(title)
<DOM Element: title at 0x124eeb8>
>>> doc.toxml()
'<?xml version="1.0" ?><pagelist><page><url>http://quke.org</url><title><![CDATA[\xc8\xa4\xbf\xcd]]></title></page></pagelist>'
>>> rootdoc.appendChild(page)
<DOM Element: page at 0x1247df0>
>>> doc.toxml()
'<?xml version="1.0" ?><pagelist><page><url>http://quke.org</url><title><![CDATA[\xc8\xa4\xbf\xcd]]></title></page></pagelist>'
>>> page = doc.createElement('page')
>>> rootdoc.appendChild(page)
<DOM Element: page at 0x124edf0>
>>> doc.toxml()
'<?xml version="1.0" ?><pagelist><page><url>http://quke.org</url><title><![CDATA[\xc8\xa4\xbf\xcd]]></title></page><page/></pagelist>'
>>>
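
The session above is Python 2.6 at the IDLE prompt. As a rough equivalent, here is a minimal sketch that builds the same document in a single script, assuming Python 3. In Python 3, '趣客' is already a Unicode str. In the Python 2 session it was a byte string, presumably in the Windows console's GBK encoding, which is why it appears as '\xc8\xa4\xbf\xcd'. The variable names are only illustrative.

```python
from xml.dom import minidom

# Create an empty document whose root element is <pagelist>
impl = minidom.getDOMImplementation()
doc = impl.createDocument(None, 'pagelist', None)
root = doc.documentElement

# First <page>: attach it to the root
page = doc.createElement('page')
root.appendChild(page)

# <url> holds an ordinary text node
url = doc.createElement('url')
url.appendChild(doc.createTextNode('http://quke.org'))
page.appendChild(url)

# <title> holds a CDATA section; the str is Unicode in Python 3
title = doc.createElement('title')
title.appendChild(doc.createCDATASection('趣客'))
page.appendChild(title)

# Second, empty <page>
root.appendChild(doc.createElement('page'))

xml_text = doc.toxml()
print(xml_text)
# <?xml version="1.0" ?><pagelist><page><url>http://quke.org</url><title><![CDATA[趣客]]></title></page><page/></pagelist>
```

For human-readable output, doc.toprettyxml(indent='  ') indents the tree; doc.toxml(encoding='utf-8') returns bytes and adds an encoding declaration.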

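Note: an XML document is a tree. The document holds a root element; an element can contain child nodes (text nodes, CDATA sections, or further nested elements), and it can also carry attributes.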
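
The session above never sets an attribute, so here is a small sketch, assuming Python 3, of the structure described in the note. The attribute name 'id' and the nested <meta> element are hypothetical examples, not part of the original document. It also shows that attributes are not stored among an element's child nodes.

```python
from xml.dom import minidom, Node

doc = minidom.getDOMImplementation().createDocument(None, 'pagelist', None)
root = doc.documentElement

# <page> carries an attribute (the name 'id' is only an example)
page = doc.createElement('page')
page.setAttribute('id', '1')
root.appendChild(page)

# <meta> is an element nested inside <page>; its content is a text node
meta = doc.createElement('meta')
meta.appendChild(doc.createTextNode('example'))
page.appendChild(meta)

print(page.toxml())             # <page id="1"><meta>example</meta></page>
print(page.getAttribute('id'))  # 1

# Attributes are not children: page.childNodes holds only <meta>
kinds = [child.nodeType for child in page.childNodes]
print(kinds == [Node.ELEMENT_NODE])                # True
print(meta.firstChild.nodeType == Node.TEXT_NODE)  # True
```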

XML can be read as follows:

>>> page = doc.getElementsByTagName("page")[0]
>>> page.toxml()
'<page><url>http://quke.org</url><title><![CDATA[\xc8\xa4\xbf\xcd]]></title></page>'
>>> url = page.getElementsByTagName("url")[0]
>>> url.toxml()
'<url>http://quke.org</url>'
>>> url.firstChild.data
'http://quke.org'
>>> title = page.getElementsByTagName("title")[0]
>>> title.firstChild.data
'\xc8\xa4\xbf\xcd'

That covers the basic usage for now.