Added a top 100 movies project that scrapes the movie list from a website

Muhammad Ibrahim 2024-11-22 21:43:50 +05:30
parent 189b4b260c
commit e3a4a0f4e9
2 changed files with 47 additions and 0 deletions

@ -0,0 +1,29 @@
## 100 Movies that You Must Watch
# Objective
Scrape the titles of the top 100 movies of all time from a website, then generate a text file called `movies.txt` that lists them in ascending rank order, starting from 1.
The result should look something like this:
```
1) The Godfather
2) The Empire Strikes Back
3) The Dark Knight
4) The Shawshank Redemption
... and so on
```
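On the archived Empire page the list runs from 100 down to 1, so the scraped titles arrive in descending order and have to be reversed before they are written out. If the titles you scrape don't already carry their rank, a minimal sketch of the reverse-and-number step could look like this. `to_ascending_lines` is a hypothetical helper, and the sample list is made up for illustration.

```python
def to_ascending_lines(titles_descending: list[str]) -> list[str]:
    """Turn titles scraped in countdown order (100 -> 1) into numbered lines 1..N.

    Assumes the titles carry no rank prefix of their own (hypothetical input).
    """
    ascending = list(reversed(titles_descending))  # last entry on the page becomes rank 1
    return [f"{rank}) {title}" for rank, title in enumerate(ascending, start=1)]


if __name__ == "__main__":
    # Hypothetical sample: the final four entries of the countdown, in page order.
    scraped = [
        "The Shawshank Redemption",
        "The Dark Knight",
        "The Empire Strikes Back",
        "The Godfather",
    ]
    for line in to_ascending_lines(scraped):
        print(line)
```

Using `enumerate(..., start=1)` keeps the numbering in one place instead of tracking a counter by hand.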
The goal of this project is to practise using BeautifulSoup to extract data, such as movie titles, from a website like Empire's (or from sites such as Timeout or Stacker, which have curated similar lists).
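To see what BeautifulSoup's `find_all` returns without making a network request, you can run it against a small HTML string. The markup below is made up to imitate the `<h3 class="title">` elements the solution script targets; the real archived page contains far more. `extract_titles` and `SAMPLE_HTML` are hypothetical names used only for this sketch.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down markup imitating the archived Empire page.
SAMPLE_HTML = """
<div>
  <h3 class="title">The Empire Strikes Back</h3>
  <p>Some description</p>
  <h3 class="title">The Godfather</h3>
  <h3 class="subtitle">Not a movie title</h3>
</div>
"""


def extract_titles(html: str) -> list[str]:
    """Return the text of every <h3 class="title"> element, in page order."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ (with a trailing underscore) filters on the CSS class,
    # because `class` is a reserved word in Python.
    headings = soup.find_all(name="h3", class_="title")
    return [tag.get_text(strip=True) for tag in headings]


if __name__ == "__main__":
    print(extract_titles(SAMPLE_HTML))
```

The `subtitle` heading is skipped because `class_="title"` matches the class value exactly rather than as a substring.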
### ⚠️ Important: Use the Internet Archive's URL
Since websites change frequently, **use this link** from the Internet Archive's Wayback Machine:
```
URL = "https://web.archive.org/web/20200518073855/https://www.empireonline.com/movies/features/best-movies-2/"
```
That way, your work will match the solution video.
(Do *not* use https://www.empireonline.com/movies/features/best-movies-2/, which I used in the screen recording.)
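Wayback Machine snapshot URLs follow the pattern `https://web.archive.org/web/<timestamp>/<original URL>`, where the 14-digit timestamp reads as `YYYYMMDDhhmmss`. The link above is the Empire page with the timestamp `20200518073855` (18 May 2020, 07:38:55). A minimal sketch that rebuilds it; `wayback_url` is a hypothetical helper, not part of any library.

```python
from datetime import datetime


def wayback_url(original_url: str, snapshot: datetime) -> str:
    """Build a Wayback Machine snapshot URL for original_url at the given time."""
    timestamp = snapshot.strftime("%Y%m%d%H%M%S")  # 14 digits: YYYYMMDDhhmmss
    return f"https://web.archive.org/web/{timestamp}/{original_url}"


if __name__ == "__main__":
    print(wayback_url(
        "https://www.empireonline.com/movies/features/best-movies-2/",
        datetime(2020, 5, 18, 7, 38, 55),
    ))
```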
# Solution
You can find the code from my walkthrough and solution as a downloadable .zip file in the course resources for this lesson.

@ -0,0 +1,18 @@
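import requests
from bs4 import BeautifulSoup

URL = "https://web.archive.org/web/20200518073855/https://www.empireonline.com/movies/features/best-movies-2/"

# Write your code below this line 👇

# Download the archived page and stop early if the request failed.
response = requests.get(URL)
response.raise_for_status()
website_html = response.text

# Each movie title sits in an <h3 class="title"> element.
soup = BeautifulSoup(website_html, "html.parser")
all_movies = soup.find_all(name="h3", class_="title")
movie_titles = [movie.getText() for movie in all_movies]

# The page counts down from 100 to 1, so reverse the list to start at 1.
movies = movie_titles[::-1]

# Write one title per line; UTF-8 keeps any non-ASCII characters intact.
with open("movies.txt", mode="w", encoding="utf-8") as file:
    for movie in movies:
        file.write(f"{movie}\n")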