JnPnote

Scrape practice with Beautiful Soup – Part 2

This post is a note to myself on how to scrape/crawl web pages using Beautiful Soup.
It continues from the previous post.
The main references are the video "SUB) Crawling text and images with Python" by JoCoding (in Korean) and the Beautiful Soup documentation.
I am working on Ubuntu 20.04.4 LTS, using PyCharm Community Edition.

1. Save scraped data as CSV

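I did this part several weeks ago, so I will just share the code. It downloads the Premier League table page, extracts each club's position, name, and points, and writes them to pl_standings.csv.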

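from bs4 import BeautifulSoup
import requests
import csv

# The 'lxml' parser must be installed (pip install lxml), but it does not need to be imported.

url = 'https://www.premierleague.com/tables'
page = requests.get(url)
page.raise_for_status()  # stop early if the request failed
soup = BeautifulSoup(page.content, 'lxml')

# [1::2] keeps rows 1, 3, 5, ... so the header row (index 0) and every other row are skipped.
standings = soup.find('div', attrs={'data-ui-tab': 'First Team'}).find_all('tr')[1::2]
print(standings)

# newline='' lets the csv module control line endings (avoids blank lines between rows on Windows).
# The with block closes the file automatically.
with open('pl_standings.csv', 'w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerow(['position', 'club_name', 'points'])

    for standing in standings:
        position = standing.find('span', attrs={'class': 'value'}).text.strip()
        club_name = standing.find('span', {'class': 'long'}).text
        points = standing.find('td', {'class': 'points'}).text

        print(position, club_name, points)
        writer.writerow([position, club_name, points])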
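Because a live page can change its markup at any time, it can help to test the same find/find_all logic offline first. The sketch below is only an illustration under stated assumptions: SAMPLE_HTML is a hypothetical snippet that imitates the structure the selectors above expect (a div with data-ui-tab="First Team", a header row, then club rows alternating with extra rows); it is not the real premierleague.com markup. parse_standings is a hypothetical helper name, and it uses the built-in "html.parser" so lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML that only imitates the structure the selectors expect.
# It is NOT the real premierleague.com markup.
SAMPLE_HTML = """
<div data-ui-tab="First Team">
  <table>
    <tr><th>Pos</th><th>Club</th><th>Pts</th></tr>
    <tr>
      <td><span class="value"> 1 </span></td>
      <td><span class="long">Club A</span></td>
      <td class="points">30</td>
    </tr>
    <tr class="expandable"><td>details for Club A</td></tr>
    <tr>
      <td><span class="value"> 2 </span></td>
      <td><span class="long">Club B</span></td>
      <td class="points">27</td>
    </tr>
    <tr class="expandable"><td>details for Club B</td></tr>
  </table>
</div>
"""


def parse_standings(html):
    """Return (position, club_name, points) tuples from the sample markup."""
    soup = BeautifulSoup(html, "html.parser")
    # Rows 1, 3, 5, ...: skips the header row and the extra row after each club.
    rows = soup.find("div", attrs={"data-ui-tab": "First Team"}).find_all("tr")[1::2]
    standings = []
    for row in rows:
        position = row.find("span", attrs={"class": "value"}).text.strip()
        club_name = row.find("span", {"class": "long"}).text
        points = row.find("td", {"class": "points"}).text
        standings.append((position, club_name, points))
    return standings


if __name__ == "__main__":
    for entry in parse_standings(SAMPLE_HTML):
        print(*entry)
```

Running it prints "1 Club A 30" and "2 Club B 27". If the real page stops matching, saving page.content to a file and feeding it to the same function is a quick way to see which selector broke.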
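The Python csv documentation recommends opening the file with newline='' so that the writer controls line endings; without it, an extra \r can be written on platforms that use \r\n, which shows up as blank lines between rows. Here is a minimal sketch of just the saving step, with a read-back check. write_standings and read_standings are hypothetical helper names, and the rows are made-up sample values, not real league data.

```python
import csv
import os
import tempfile


def write_standings(path, rows):
    """Write a header plus (position, club_name, points) rows to a CSV file."""
    # newline="" lets the csv module control line endings, as its docs recommend.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["position", "club_name", "points"])
        writer.writerows(rows)


def read_standings(path):
    """Read the file back as a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


if __name__ == "__main__":
    # Made-up sample rows, not real league data.
    sample = [("1", "Club A", "30"), ("2", "Club B", "27")]
    path = os.path.join(tempfile.mkdtemp(), "pl_standings.csv")
    write_standings(path, sample)
    for row in read_standings(path):
        print(row["position"], row["club_name"], row["points"])
```

Reading the file back with csv.DictReader is a cheap way to confirm that the header and every row were written as expected.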

2. References

While working on this, I ran into a lot of problems, and there was one I could not solve on my own, so I asked on Stack Overflow and got it solved. I will share the link to that question and to my GitHub repository for the source code. On GitHub you can also see what I tried along the way, including a lot of failed code.
