Added a top 100 movies list project that scrapes movie titles from a website

2024-11-22 21:43:50 +05:30 · 2024-11-22 21:43:50 +05:30 · e3a4a0f4e9
commit e3a4a0f4e9
parent 189b4b260c
2 changed files with 47 additions and 0 deletions
--- a/Projects/Starting
+++ b/Projects/Starting
@@ -0,0 +1,29 @@
+## 100 Movies that You Must Watch
+
+# Objective
+
+Scrape the top 100 movies of all time from a website, then generate a text file called `movies.txt` that lists the movie titles in ascending rank order (starting from 1).
+The result should look something like this:
+
+```
+1) The Godfather
+2) The Empire Strikes Back
+3) The Dark Knight
+4) The Shawshank Redemption
+... and so on
+```
+The central idea behind this project is to use BeautifulSoup to extract data, such as movie titles, from a website like Empire's (or from, say, Timeout or Stacker, which have curated similar lists).
+
+### ⚠️ Important: Use the Internet Archive's URL
+
+Since websites change very frequently, **use this link** from the Internet Archive's Wayback Machine:
+```
+URL = "https://web.archive.org/web/20200518073855/https://www.empireonline.com/movies/features/best-movies-2/"
+```
+That way your work will match the solution video.
+
+(Do *not* use https://www.empireonline.com/movies/features/best-movies-2/, which I used in the screen recording.)
+
+# Solution
+
+You can find the code from my walkthrough and solution as a downloadable .zip file in the course resources for this lesson. 
--- a/Projects/Starting
+++ b/Projects/Starting
@@ -0,0 +1,18 @@
+import requests
+from bs4 import BeautifulSoup
+
+URL = "https://web.archive.org/web/20200518073855/https://www.empireonline.com/movies/features/best-movies-2/"
+
+# Write your code below this line 👇
+
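+response = requests.get(URL)
+response.raise_for_status()  # stop early if the archived page could not be fetched
+
+soup = BeautifulSoup(response.text, "html.parser")
+all_movies = soup.find_all(name="h3", class_="title")  # each movie heading is an <h3 class="title">
+movie_titles = [movie.get_text() for movie in all_movies]
+movies = movie_titles[::-1]  # reverse the page order so the list starts from 1
+
+with open("movies.txt", mode="w", encoding="utf-8") as file:  # UTF-8 keeps accented titles intact
+    for movie in movies:
+        file.write(f"{movie}\n")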
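
The script above gets ascending order by reversing the scraped list, which works only while the page lists the movies from 100 down to 1. A slightly more defensive option is to sort on the rank embedded in each heading. The following is a minimal sketch of that idea, not part of the committed solution. It assumes every heading starts with a rank followed by `)` or `:` (an assumption about the page's heading text), and the names `RANK_PATTERN`, `order_by_rank`, and `write_movies` are hypothetical helpers introduced only for illustration. It works on a plain list of strings, so it can be tried without fetching the page.

```python
import re
from pathlib import Path

# Assumed heading shape: a rank, then ")" or ":", then the title.
RANK_PATTERN = re.compile(r"^\s*(\d+)\s*[):]\s*(.+?)\s*$")


def order_by_rank(headings):
    """Return 'rank) title' lines sorted by the rank parsed from each heading."""
    ranked = []
    for heading in headings:
        match = RANK_PATTERN.match(heading)
        if match is None:
            # Fail loudly rather than silently dropping an unexpected heading.
            raise ValueError(f"no leading rank in heading: {heading!r}")
        ranked.append((int(match.group(1)), match.group(2)))
    return [f"{rank}) {title}" for rank, title in sorted(ranked)]


def write_movies(headings, path="movies.txt"):
    """Write the ordered lines to path, one per line, encoded as UTF-8."""
    lines = order_by_rank(headings)
    Path(path).write_text("".join(f"{line}\n" for line in lines), encoding="utf-8")
    return lines


if __name__ == "__main__":
    # Hypothetical sample in page order (highest rank number first).
    sample = ["3) The Dark Knight", "2: The Empire Strikes Back", "1) The Godfather"]
    print(order_by_rank(sample))
```

In the script above, `write_movies(movie_titles)` could stand in for the reverse-and-write step. Sorting on the parsed rank also normalizes the separator, so every line in `movies.txt` comes out in the `1) Title` form shown in the Objective.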