Extracting HTML data fields with Python
Please forgive my lack of knowledge, but given HTML in the following format, what is the best way to extract the person's data fields? Please keep in mind that the tags will not always all be present: some, or all, of them may be missing.
<div class="profile-section" id="about a bit more">
<dl><dt>Name:</dt><dd><span class="given-name">Claim</span> <span class="family-name">Cuddles</span></dd></dl>
<!-- <span class="realname">/<span class="fn n"><span class="given-name">Claim</span> <span class="family-name">Kadlepler</span></span></span> -->
<dl><dt>Included:</dt><dd>September 1910</dd></dl>
<div class="sep"></div>
<dl><dt>Hometown:</dt><dd>Cool Balance Maximum Security Twilight House</dd></dl>
<dl><dt>Currently:</dt><dd><span class="adr"><span class="locality">They give me</span>, <span class="country-name">Zimbabwe</span></span></dd></dl>
<div class="sep"></div>
Use a third-party module such as Beautiful Soup or lxml, or the built-in html.parser module. For example, with Beautiful Soup:

from bs4 import BeautifulSoup

# Parse the document with the standard-library parser backend
soup = BeautifulSoup('<html><body><a>BBB</a></body></html>', 'html.parser')

# Returns the first <a> element, or None if there is none
soup.find('a')
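If you would rather avoid a third-party dependency, the built-in html.parser route can be sketched roughly as follows. This is only a minimal illustration: the names ProfileParser, extract_profile and SAMPLE are made up for this example, and SAMPLE is a shortened copy of the markup from the question. It assumes each field is a dt label followed by a dd value. A field that is absent from the markup is simply absent from the resulting dict, and commented-out markup is ignored because the parser reports comments separately from text.

```python
from html.parser import HTMLParser


class ProfileParser(HTMLParser):
    """Collect <dt>label</dt><dd>value</dd> pairs into a dict (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None  # "dt" or "dd" while inside one of those tags
        self._label = None    # most recent <dt> text, waiting for its <dd>
        self._buffer = []     # text chunks seen inside the current tag

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        # Text inside nested <span> elements also arrives here.
        if self._current:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._current:
            return  # closing tag of a nested element such as </span>
        # Join the chunks and collapse runs of whitespace.
        text = " ".join("".join(self._buffer).split())
        if tag == "dt":
            self._label = text.rstrip(":")
        elif self._label is not None:
            self.fields[self._label] = text
            self._label = None
        self._current = None


def extract_profile(html):
    """Return a dict mapping each label to its value text."""
    parser = ProfileParser()
    parser.feed(html)
    parser.close()
    return parser.fields


# Shortened copy of the markup from the question (no "Included" field here).
SAMPLE = """
<div class="profile-section">
<dl><dt>Name:</dt><dd><span class="given-name">Claim</span> <span class="family-name">Cuddles</span></dd></dl>
<!-- <span class="given-name">Claim</span> <span class="family-name">Kadlepler</span> -->
<dl><dt>Hometown:</dt><dd>Cool Balance Maximum Security Twilight House</dd></dl>
<dl><dt>Currently:</dt><dd><span class="adr"><span class="locality">They give me</span>, <span class="country-name">Zimbabwe</span></span></dd></dl>
</div>
"""

fields = extract_profile(SAMPLE)
print(fields.get("Name"))      # field present in the sample
print(fields.get("Included"))  # field missing from the sample, so None
```

Using dict.get() on the result means a missing field comes back as None instead of raising an error, which suits markup where some of the tags may be left out.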
Or, if you prefer, you can use a regular expression when the target is small and well defined.
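As a rough sketch of the regex approach, the snippet below pulls the text out of a single span by its class attribute. The function name span_text and the HTML constant are made up for this illustration, and the pattern assumes the span has exactly one class, written in double quotes, and contains no nested tags. Unlike a real parser, it would also match markup that sits inside an HTML comment, so it is only suitable for very narrow targets.

```python
import re

# One line of the markup from the question, used as sample input.
HTML = (
    '<dd><span class="adr"><span class="locality">They give me</span>, '
    '<span class="country-name">Zimbabwe</span></span></dd>'
)


def span_text(html, css_class):
    """Return the text of the first <span class="css_class">, or None if absent."""
    pattern = rf'<span\s+class="{re.escape(css_class)}"\s*>([^<]*)</span>'
    match = re.search(pattern, html, flags=re.IGNORECASE)
    return match.group(1).strip() if match else None


print(span_text(HTML, "locality"))
print(span_text(HTML, "country-name"))
print(span_text(HTML, "given-name"))  # not in this sample, so None
```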