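With beautifulsoup4, we can use the `select_one()` method to extract data from an HTML document. It takes a CSS selector as a string and returns the first element that matches it, or `None` if nothing matches. Some common selectors: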
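Here is a minimal sketch of basic usage. It assumes beautifulsoup4 is installed (recent releases pull in the soupsieve package, which provides the CSS selector support) and uses a made-up HTML string. It shows that `select_one()` returns only the first match, and that it returns `None` when nothing matches.

```python
from bs4 import BeautifulSoup

# A small, made-up HTML document used only for illustration.
html = """
<html>
  <body>
    <a href="/first">First link</a>
    <a href="/second">Second link</a>
  </body>
</html>
"""

# "html.parser" is Python's built-in parser, so no extra parser install is needed.
soup = BeautifulSoup(html, "html.parser")

# select_one() returns the FIRST element matching the CSS selector...
first_link = soup.select_one("a")
print(first_link.get_text())  # First link
print(first_link["href"])     # /first

# ...or None when nothing matches, so check the result before using it.
missing = soup.select_one("table")
print(missing is None)        # True
```

If you need every match rather than just the first, `select()` takes the same selector and returns a list.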
- To get a tag such as `<a></a>` or `<body></body>`, use the tag's bare name. E.g. `select_one('a')` gets the first anchor/link element, and `select_one('body')` gets the body element.
- `.temp` gets an element with a class of `temp`. E.g. to get `<a class="temp"></a>`, use `select_one('.temp')`.
- `#temp` gets an element with an id of `temp`. E.g. to get `<a id="temp"></a>`, use `select_one('#temp')`.
- `.temp.example` gets an element that has both classes `temp` and `example`. E.g. to get `<a class="temp example"></a>`, use `select_one('.temp.example')`. Note that there is no space between `.temp` and `.example` here.
- `.temp a` gets an anchor element nested anywhere inside an element with class `temp`. E.g. to get `<div class="temp"><a></a></div>`, use `select_one('.temp a')`. Note the space between `.temp` and `a`.
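- `.temp .example` gets an element with class `example` nested inside an element with class `temp`. E.g. to get `<div class="temp"><a class="example"></a></div>`, use `select_one('.temp .example')`. Again, note the space between `.temp` and `.example`. The space is the descendant combinator. It tells the selector that the element matched after the space must be a descendant (a child, grandchild, or deeper) of the element matched before the space. To match only direct children, use `>` instead, e.g. `.temp > .example`.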
- ids, such as `<a id="one"></a>`, are meant to be unique within a page. That means you can usually use the id selector by itself, e.g. `select_one('#one')`, to get the right element, with no need for nested selectors.
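The selectors in the list above can be tried against a small document. This sketch uses hypothetical markup containing one example of each pattern. Because `select_one()` returns only the first match in document order, the markup is arranged so that each call picks out a distinct element. The last lookups contrast the descendant combinator (a space) with the child combinator (`>`).

```python
from bs4 import BeautifulSoup

# Hypothetical markup with one example of each selector pattern.
html = """
<body>
  <a class="temp">class temp</a>
  <a id="temp">id temp</a>
  <a class="temp example">both classes</a>
  <div class="temp"><a>nested anchor</a></div>
  <div class="temp"><span><a class="example">deep example</a></span></div>
  <a id="one">unique id</a>
</body>
"""
soup = BeautifulSoup(html, "html.parser")

# Tag names: the first <a> in the document, and the <body> element.
print(soup.select_one("a").get_text())               # class temp
print(soup.select_one("body").name)                  # body

# Class, id, and compound (both classes, no space) selectors.
print(soup.select_one(".temp").get_text())           # class temp
print(soup.select_one("#temp").get_text())           # id temp
print(soup.select_one(".temp.example").get_text())   # both classes

# A space is the descendant combinator: it matches at any depth below .temp.
print(soup.select_one(".temp a").get_text())         # nested anchor
print(soup.select_one(".temp .example").get_text())  # deep example

# ">" is the child combinator: .example must be a DIRECT child of .temp.
# Here a <span> sits in between, so nothing matches.
print(soup.select_one(".temp > .example"))           # None

# An id on its own is enough to find a unique element.
print(soup.select_one("#one").get_text())            # unique id
```

Note that `<a class="temp example">` is not matched by `.temp .example` or `.temp a`. It carries the `temp` class itself, but it has no ancestor with that class, and the space requires one.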