javascript - How to automate selecting certain code in an HTML file?
Hi, I have a question about automating the selection of content in HTML. If we save a webpage as HTML only, we get the HTML code along with the stylesheets and JavaScript code. However, I only want to extract the HTML code between <div class='post-content' itemprop='articlebody'> and </div>, and create a new HTML file that contains just the extracted HTML code. Is there a possible way to do it? Example code is below:
    <html>
      <script src='.....'> </script>
      <style> ... </style>
      <div class='header-outer'>
        <div class='header-title'>
          <div class='post-content' itemprop='articlebody'>
            <p>content I want</p>
          </div>
        </div>
      </div>
      <div class='footer'> </div>
    </html>
While I'm typing this, I'm thinking of JavaScript, since it seems to be able to manipulate HTML DOM elements. Is Ruby able to do that? Can I generate a new clean HTML file that contains only the content between <div class='post-content' itemprop='articlebody'> and </div> using JavaScript or Ruby? However, I don't have a clue how to write the actual code.
So, does anyone have an idea how to do it? Thank you very much!
I'm not quite sure what you're asking, but I'll take a crack at it.
Can Ruby modify the DOM on a webpage?
Short answer: no. Browsers don't know how to run Ruby. They do know how to run JavaScript, so that's what is used for real-time DOM manipulation.
Can I generate a new clean HTML?
Yes? At the end of the day, HTML is just a formatted string. If you want to download the source of a page and find everything in the <div class='post-content' itemprop='articlebody'> tag, there are a couple of ways to go about that. The best is probably the Nokogiri gem, which is a Ruby HTML parser. You'll be able to feed it a string (from a file or otherwise) that represents the old page and strip out what you want. You'd do something like this:
    require 'nokogiri'
    require 'open-uri'  # needed so the page can be fetched over HTTP

    page = Nokogiri::HTML(URI.open("https://googleblog.blogspot.com"))
    # finds the first element with class "post-content" and takes its text
    text = page.css('.post-content')[0].text
I believe this gives you the text you're looking for. More detailed Nokogiri instructions can be found here.