This post is a note to myself on how to scrape/crawl with Beautiful Soup. It continues from the previous post.
The main references are the video SUB) Crawling text and images with Python from JoCoding (in Korean) and the Beautiful Soup documentation.
I am doing this on Ubuntu 20.04.4 LTS, using PyCharm Community Edition.
1. Save scraped data as CSV
I did this part several weeks ago, so I will just share the code.
from bs4 import BeautifulSoup
import requests
import csv

# Download the Premier League table page and parse it with the lxml parser
url = 'https://www.premierleague.com/tables'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'lxml')

# Each club appears as a pair of <tr> rows (the second one is an expandable
# detail row), so take every other row, which also skips the header row
standings = soup.find('div', attrs={'data-ui-tab': 'First Team'}).find_all('tr')[1::2]
print(standings)

# Open the output file; newline='' avoids blank lines between rows on some platforms
file = open("pl_standings.csv", 'w', newline='')
writer = csv.writer(file)
writer.writerow(['position', 'club_name', 'points'])

for standing in standings:
    position = standing.find('span', attrs={'class': 'value'}).text.strip()
    club_name = standing.find('span', {'class': 'long'}).text
    points = standing.find('td', {'class': 'points'}).text
    print(position, club_name, points)
    writer.writerow([position, club_name, points])

file.close()
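The same write can also be done with a with block so the file is closed automatically even if one of the find() calls fails. This is just a minimal sketch of that variant, reusing the standings list and column names from the code above, with a quick read-back at the end to check the file.

import csv

# Minimal sketch: assumes the `standings` list from the code above is already built
with open("pl_standings.csv", 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(['position', 'club_name', 'points'])
    for standing in standings:
        position = standing.find('span', attrs={'class': 'value'}).text.strip()
        club_name = standing.find('span', {'class': 'long'}).text
        points = standing.find('td', {'class': 'points'}).text
        writer.writerow([position, club_name, points])

# Quick check: read the CSV back and print each row
with open("pl_standings.csv") as file:
    for row in csv.reader(file):
        print(row)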
2. References
While I was doing this, I ran into a problem I could not solve on my own, so I asked on Stack Overflow and got an answer. I will share the link to that question and to my GitHub repository with the source code. On my GitHub you can see what I tried; there is a lot of failed code.