python - Can't scrape HTML table using BeautifulSoup


I'm trying to scrape data off a table on a web page using Python, BeautifulSoup, and Requests, as well as Selenium to log into the site. Here's the table I'm looking to get data from...

    <div class="sastrupp-class">
        <table>
            <tbody>
                <tr>
                    <td class="key">thing dont want 1</td>
                    <td class="value money">$1.23</td>

                    <td class="key">thing dont want 2</td>
                    <td class="value">99,999,999</td>

                    <td class="key">target</td>
                    <td class="money value">$1.23</td>

                    <td class="key">thing dont want 3</td>
                    <td class="money value">$1.23</td>

                    <td class="key">thing dont want 4</td>
                    <td class="value percentage">1.23%</td>

                    <td class="key">thing dont want 5</td>
                    <td class="money value">$1.23</td>
                </tr>
            </tbody>
        </table>
    </div>

I can find the "sastrupp-class" div just fine, but I don't know how to get through it to the part of the table I want. I figured I could search for the class like this...

    output = soup.find('td', {'class':'key'})
    print(output)

But that doesn't return anything.

Important notes:

  1. The other <td>s inside the table have the same class name as the one I want. If I can't separate them out, I'm OK with that, although I'd rather return only the one I want.

  2. There are other <div>s with class="sastrupp-class" on the site.

  3. I'm a beginner at this, so let me know if there is anything I can clarify. Any help/pointers are appreciated.

1) First off, to get 'target' you need find_all, not find. Then, assuming you know which position the target is in (in the example you gave, it is index=2), the solution can be reached like this:

    from bs4 import BeautifulSoup

    html = """(your html)"""

    soup = BeautifulSoup(html, 'html.parser')
    # Narrow the search to the wrapping div first
    table = soup.find('div', {'class': 'sastrupp-class'})
    # find_all returns every matching <td>, in document order
    all_keys = table.find_all('td', {'class': 'key'})
    my_key = all_keys[2]

    print(my_key.text)  # prints 'target'
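If the position of the cell is not known ahead of time, one possible alternative is to match the key cell by its text and then read the neighbouring value cell. This is only a sketch under a few assumptions: the label text is exactly `target`, the value sits in the next `<td>` after the key, and the HTML below is a shortened copy of the table from the question. The names `key_cell` and `value_cell` are illustrative.

```python
from bs4 import BeautifulSoup

# Shortened copy of the table from the question (assumption: same structure).
html = """
<div class="sastrupp-class">
  <table><tbody><tr>
    <td class="key">thing dont want 2</td>
    <td class="value">99,999,999</td>
    <td class="key">target</td>
    <td class="money value">$1.23</td>
  </tr></tbody></table>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

# Match the key cell by its exact text instead of by position.
key_cell = soup.find('td', class_='key', string='target')

# Assumption: the value is the next <td> sibling of the key cell.
value_cell = key_cell.find_next_sibling('td')

print(key_cell.text)
print(value_cell.text)
```

This avoids hard-coding an index, at the cost of depending on the label text staying the same. If the label is not found, `soup.find` returns `None`, so real code would want to check for that before calling `find_next_sibling`.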

2)

You noted that there are other <div>s with class="sastrupp-class" on the site.

Again, you need to select the one you need by using find_all and then picking the correct index.

Example HTML:

    <body>
    <div class="sastrupp-class"> don't need this</div>
    <div class="sastrupp-class"> don't need this</div>
    <div class="sastrupp-class"> don't need this</div>
    <div class="sastrupp-class"> target</div>
    </body>

To extract the target, you can just do:

    all_divs = soup.find_all('div', {'class':'sastrupp-class'})
    target = all_divs[3]  # assuming you know the index
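When the index is not known, the same list can be filtered by content instead. This is a rough sketch, assuming the wanted div can be recognised by its text (here the literal string `target` from the example HTML above); the `matches` list is an illustrative name.

```python
from bs4 import BeautifulSoup

# Example HTML from above.
html = """
<body>
<div class="sastrupp-class"> don't need this</div>
<div class="sastrupp-class"> don't need this</div>
<div class="sastrupp-class"> don't need this</div>
<div class="sastrupp-class"> target</div>
</body>
"""

soup = BeautifulSoup(html, 'html.parser')
all_divs = soup.find_all('div', {'class': 'sastrupp-class'})

# Keep only the divs whose text, with surrounding whitespace stripped,
# equals the text we are looking for (assumption: the text is known).
matches = [div for div in all_divs if div.get_text(strip=True) == 'target']

# Fall back to None when nothing matches, rather than raising IndexError.
target = matches[0] if matches else None
print(target)
```

On a real page the distinguishing feature may be something other than the text, such as a child element or an attribute, in which case the condition inside the list comprehension would change accordingly.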
