Helping my subscriber extract data from BR tags in Scrapy shell, 2 ways

Published 2020/06/05
Hey, what's up guys! In this video we're going to learn two ways of extracting data from BR tags.

The first, easy way is to extract ALL the text from the parent DIV tag and then take the last two items, which hold the BR tags' text. This approach has a limitation: it only works when the parent DIV always contains the same number of text elements. Otherwise you'll silently get the wrong items, or an IndexError if you access the two positions individually (e.g. `texts[-2]`) and there are fewer than two.

locations = [list(filter(None, [text.strip() for text in card.css('div[class="grid__unit grid__unit--1-2-l grid__unit--1-4-1"]').css(' *::text').getall()]))[-2:] for card in response.css('div[class="grid"]')[0:-1]]


The second way is a bit trickier but much more flexible, because it works for any number of BR-separated lines. The idea is to use Scrapy's Selector to parse one specific sub-DIV on its own and then extract all of its text nodes recursively.

from scrapy import Selector  # not predefined in the Scrapy shell namespace
loc_selector = [list(filter(None, [text.strip() for text in Selector(text=card.css('div[class="grid__unit grid__unit--1-2-l grid__unit--1-4-1"]').getall()[-1]).css(' *::text').getall()]))[1:] for card in response.css('div[class="grid"]')[0:-1]]