python - Can't scrape HTML table using BeautifulSoup
I'm trying to scrape data off a table on a web page using Python, BeautifulSoup, and Requests, as well as Selenium to log into the site. Here's the table I'm looking to get data from:
<div class="sastrupp-class">
  <table>
    <tbody>
      <tr>
        <td class="key">thing dont want 1</td> <td class="value money">$1.23</td>
        <td class="key">thing dont want 2</td> <td class="value">99,999,999</td>
        <td class="key">target</td> <td class="money value">$1.23</td>
        <td class="key">thing dont want 3</td> <td class="money value">$1.23</td>
        <td class="key">thing dont want 4</td> <td class="value percentage">1.23%</td>
        <td class="key">thing dont want 5</td> <td class="money value">$1.23</td>
      </tr>
    </tbody>
  </table>
</div>
Here's what I tried:
output = soup.find('td', {'class':'key'})
print(output)
But it doesn't return anything.
Important notes:
1. The <td>s inside the table have the same class name as the one I want. If I can't separate them out, I'm OK with that, although I'd rather return only the one I want.
2. There are other <div>s with class="sastrupp-class" on the site.
3. I'm a beginner at this, so let me know if I can help you help me. Any help/pointers would be appreciated.
1) First off, to get 'target' you need find_all, not find. Then, assuming you know which position the target is in (in the example you gave it is index=2), the solution can be reached like this:
from bs4 import BeautifulSoup

html = """(your html)"""
soup = BeautifulSoup(html, 'html.parser')
# find() returns the first <div> with this class
table = soup.find('div', {'class': 'sastrupp-class'})
# find_all() returns every matching <td> as a list
all_keys = table.find_all('td', {'class': 'key'})
my_key = all_keys[2]
print(my_key.text)  # prints 'target'
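If the position of the target cell is not known in advance, one possible alternative is to match on the cell text instead of a fixed index, and then step to the neighbouring value cell. This is only a minimal sketch: it assumes the label text is exactly 'target' and that each key cell is followed by its value cell. The helper name value_for_key is hypothetical, and the sample HTML is a shortened version of the table with made-up values.

```python
from bs4 import BeautifulSoup

# Shortened sample table; the values are illustrative, not from the real page.
html = """
<div class="sastrupp-class"><table><tbody><tr>
<td class="key">thing dont want 1</td><td class="value money">$1.23</td>
<td class="key">target</td><td class="money value">$4.56</td>
</tr></tbody></table></div>
"""

soup = BeautifulSoup(html, "html.parser")


def value_for_key(soup, label):
    """Return the text of the value cell that follows the key cell named `label`."""
    for key_cell in soup.find_all("td", class_="key"):
        if key_cell.get_text(strip=True) == label:
            # class_="value" matches any <td> whose class list contains "value"
            value_cell = key_cell.find_next_sibling("td", class_="value")
            return value_cell.get_text(strip=True) if value_cell else None
    return None  # no key cell with that label was found


target_value = value_for_key(soup, "target")
print(target_value)
```

Matching on text avoids breaking when the site adds or reorders rows, at the cost of depending on the exact label wording.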
2)
"There are other <div>s with class="sastrupp-class" on the site"
Again, you need to select the one you need by using find_all and then picking the correct index.
Example HTML:
<body>
  <div class="sastrupp-class"> don't need this</div>
  <div class="sastrupp-class"> don't need this</div>
  <div class="sastrupp-class"> don't need this</div>
  <div class="sastrupp-class"> target</div>
</body>
To extract the target, you can just do:
all_divs = soup.find_all('div', {'class':'sastrupp-class'})
target = all_divs[3]  # assuming you know the index
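When the index of the right <div> is not known or may change, a possible variation is to pick the <div> by what it contains. The sketch below assumes the wanted <div> is the only one whose text includes the word 'target'. find_div_by_text is a hypothetical helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

html = """
<body>
<div class="sastrupp-class"> don't need this</div>
<div class="sastrupp-class"> don't need this</div>
<div class="sastrupp-class"> don't need this</div>
<div class="sastrupp-class"> target</div>
</body>
"""

soup = BeautifulSoup(html, "html.parser")


def find_div_by_text(soup, css_class, wanted):
    """Return the first <div> with `css_class` whose text contains `wanted`."""
    for div in soup.find_all("div", class_=css_class):
        if wanted in div.get_text():
            return div
    return None  # nothing matched


target = find_div_by_text(soup, "sastrupp-class", "target")
print(target.get_text(strip=True))
```

On the real page the same check could look for a nested key cell instead of plain text, since the table lives inside one of these <div>s.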