python - xpath to select from child element to end of parent
I'm trying to do this using lxml, but this is really a question about proper XPath. I want to select from the <pgBreak> element until the end of its parent, in this case a <p>.

Input XML:

    <root>
    <pgBreak pgId="1"/>
    <p>some text to fill out a para
    <pgBreak pgId="2"/>
    some more text <quote> a quoted block </quote> remainder of para</p>
    </root>

Desired output XML:

    <root>
    <pgBreak pgId="1"/>
    <p>some text to fill out a para</p>
    <pgBreak pgId="2"/>
    <p>some more text <quote> a quoted block </quote> remainder of para</p>
    </root>

What you are trying to do is not trivial: not only do you want to match the <pgBreak> elements and all of their following siblings, you then want to move them out of the original parent and wrap the siblings in a new <p> element. Fun stuff.

The following code should give you an idea of how to do it (DISCLAIMER: example only; it needs cleaning up, and edge cases have probably not been handled). I have extended the input XML slightly so the behaviour is easier to follow.

    import lxml.etree

    text = """<root>
    <pgBreak pgId="1"/>
    <p>some text to fill out a para
    <pgBreak pgId="2"/>
    some more text <quote> a quoted block </quote> remainder of para
    <pgBreak pgId="3"/>
    <p>blurb</p>
    </p>
    </root>
    """

    root = lxml.etree.fromstring(text)

    # xpath() returns a list up front, so moving nodes inside the loop is safe.
    for pgbreak in root.xpath('//pgBreak'):
        inner = pgbreak.getparent()
        if inner is root:
            # Already a direct child of <root>: nothing to split.
            continue
        outer = inner.getparent()
        pgbreak_index = inner.index(pgbreak)
        inner_index = outer.index(inner) + 1
        # Everything after the break, up to the end of its parent.
        siblings = inner[pgbreak_index + 1:]

        # Hoist the break up one level, right after its old parent.
        # lxml moves the element's tail text along with it.
        inner.remove(pgbreak)
        outer.insert(inner_index, pgbreak)

        if siblings and siblings[0].tag == 'p':
            # The remainder already starts with a <p>: hoist the siblings as-is.
            for node in siblings:
                inner_index += 1
                outer.insert(inner_index, node)
        else:
            # Wrap the break's tail text and the remaining siblings in a new <p>.
            p = lxml.etree.Element('p')
            p.text = pgbreak.tail
            pgbreak.tail = None
            for node in siblings:
                p.append(node)
            outer.insert(inner_index + 1, p)

    print(lxml.etree.tostring(root, pretty_print=True, encoding='unicode'))

Output (modulo whitespace):

    <root>
    <pgBreak pgId="1"/>
    <p>some text to fill out a para</p>
    <pgBreak pgId="2"/>
    <p>some more text <quote> a quoted block </quote> remainder of para</p>
    <pgBreak pgId="3"/>
    <p>blurb</p>
    </root>
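Since the question is framed as an XPath problem, it may help to see what XPath on its own can do. The sketch below is a minimal illustration, assuming lxml and the question's input XML: evaluated from the inner <pgBreak>, the expression following-sibling::node() selects exactly the range "from the break to the end of its parent", including the loose text nodes. The //p/pgBreak expression used to locate that break is an illustrative choice, not part of the original answer.

```python
import lxml.etree

# The question's input XML (element names are case-sensitive in XPath).
text = """<root>
<pgBreak pgId="1"/>
<p>some text to fill out a para
<pgBreak pgId="2"/>
some more text <quote> a quoted block </quote> remainder of para</p>
</root>"""

root = lxml.etree.fromstring(text)

# Illustrative choice: pick the break that sits inside a <p>,
# i.e. not the one that is a direct child of <root>.
pgbreak = root.xpath('//p/pgBreak')[0]

# following-sibling::node() selects every node after the break up to the end
# of its parent. lxml returns text nodes as str subclasses ("smart strings")
# and elements as _Element objects; children of <quote> are not siblings,
# so they are not included.
rest = pgbreak.xpath('following-sibling::node()')

for item in rest:
    if isinstance(item, str):
        print('text:', repr(str(item)))
    else:
        print('element:', item.tag)
```

XPath only selects nodes; it cannot move them or create the wrapping <p>, which is why the answer falls back to tree manipulation. Note that the first text node in the result is the same string lxml exposes as pgbreak.tail, which is the value the answer copies into the new <p>'s text.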