Beautiful Soup is indeed beautiful!
I wanted to parse an HTML page containing a table and import it into a MySQL table in an automated way. On my friend Kumar’s advice, I looked into Beautiful Soup, and today was the day to explore it. Being new to Python, I had to do a bit of Python reading alongside. In the end, I was able to pass an HTML file to my script and get CSV output.
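from bs4 import BeautifulSoup  # Beautiful Soup 4; in 2008 this was "from BeautifulSoup import BeautifulSoup"

f = open("input_file.html", "r")
g = open("outfile_file.csv", "w")
soup = BeautifulSoup(f, "html.parser")
t = soup.findAll('table')
for table in t:
    rows = table.findAll('tr')
    for tr in rows:
        cols = tr.findAll('td')
        for td in cols:
            # first text node in the cell; this is None for an empty cell
            g.write(td.find(text=True))
            g.write(",")
        # one line of CSV per table row
        g.write("\n")
f.close()
g.close()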
This script parses a simple HTML table without looking for any special tags or attributes. Now that this is working, I have to make it more robust so it can parse an uglier table. That is my task for tomorrow.
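For what it's worth, here is a rough sketch of what a more robust version might look like. It is not the script from this post: it assumes Beautiful Soup 4 (the bs4 package), and the function names table_rows and html_to_csv as well as the sample HTML are made up for illustration. It uses get_text() so that an empty cell yields an empty string rather than None (which write() would reject), also picks up header cells, and lets the standard csv module quote any cell that contains a comma. Nested tables are not handled specially.

```python
import csv
import io

from bs4 import BeautifulSoup  # assumption: Beautiful Soup 4 is installed


def table_rows(html):
    """Yield each table row in the HTML as a list of cell strings."""
    soup = BeautifulSoup(html, "html.parser")
    for table in soup.find_all("table"):
        for tr in table.find_all("tr"):
            # take header cells as well as data cells
            cells = tr.find_all(["th", "td"])
            if cells:
                # get_text() gives "" for an empty cell instead of None
                yield [cell.get_text(" ", strip=True) for cell in cells]


def html_to_csv(html, out):
    """Write every table row found in html to the file-like object out."""
    # csv.writer quotes cells that contain commas or quotes
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(table_rows(html))


# hypothetical sample input, only for demonstration
sample = (
    "<table>"
    "<tr><th>Name</th><th>Note</th></tr>"
    "<tr><td>Kumar</td><td>soup, <b>beautiful</b></td></tr>"
    "<tr><td>Naveen</td><td></td></tr>"
    "</table>"
)
buf = io.StringIO()
html_to_csv(sample, buf)
print(buf.getvalue())
```

To use it on real files, one could pass an open HTML file's contents as html and a file opened with open("outfile_file.csv", "w", newline="") as out.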
May 28, 2008 at 9:28 pm
Awesome… Beautiful soup is really amazing!
A very nice idea too…
Continue writing such posts…
May 28, 2008 at 10:14 pm
Amazing! I didn’t know it would be THAT simple. I like the for…in funda of python a lot for its simplicity.
May 30, 2008 at 6:31 pm
I’m glad you’ve also joined the “Python for making life simple” bandwagon!
May 31, 2008 at 4:41 pm
@Naveen
Thanks
@Akarsh
Yes. That’s true
@Kumar
Thank you for making me dive in
July 16, 2008 at 8:43 pm
Got an error with the above code for the line “g.write(td.find(text=True))”