CS50 Week 7 SQL — Spotify Songs, IMDb Movies and the Fiftyville Mystery
- Parsa Dev
- Feb 4
- 3 min read

Week 7 of CS50x introduces SQL, and it does so through three very different databases. Songs is the warm-up: a SQLite database of the top 100 streamed Spotify tracks from 2018, with eight queries to write ranging from simple SELECT statements to subqueries joining the songs and artists tables. The queries cover ordering by tempo and duration, filtering by audio features like danceability and valence, computing averages, and using LIKE with a wildcard to find songs featuring collaborating artists. Query 6 — finding Post Malone's songs without hardcoding his artist ID — is where subqueries start to feel genuinely useful: a WHERE artist_id = (SELECT id FROM artists WHERE name = 'Post Malone') pattern that would still work correctly if the entire database were renumbered. The answers.txt reflection on Spotify's Audio Aura is an interesting exercise too — thinking about why averaging energy and danceability across 100 songs might flatten out the nuance of a listener's actual taste. Movies is a significant step up. Thirteen queries across a five-table IMDb database covering movies, people, ratings, stars, and directors — and the complexity increases steadily from a simple year filter in query 1 to multi-table JOIN chains, nested subqueries, DISTINCT, INTERSECT-style logic, and LIMIT for ranking. Notable queries include finding every actor who has starred in a film with Kevin Bacon (born 1958 specifically, to avoid name collision), the top five Chadwick Boseman films by rating, and — a personal twist on the spec — films featuring both Johnny Depp and Helena Bonham Carter instead of the default Bradley Cooper and Jennifer Lawrence pair.
Fiftyville is the standout problem of the week and probably one of the most enjoyable in the entire CS50x course. The setup is simple: the CS50 duck has been stolen from Fiftyville on 28 July, and the only tool available to solve the crime is a SQLite database full of town records. The investigation starts with the crime scene report, then fans out through witness interviews, bakery CCTV logs, ATM transaction records, phone call logs, airport flight data, and passenger manifests — each query narrowing the field of suspects. The approach taken here is to build up intersecting sets: first finding who left the bakery car park within ten minutes of the theft, then cross-referencing with who withdrew cash from the ATM on Leggett Street that morning, then cross-checking against passengers on the first flight out of Fiftyville the next day. The phone call records — filtering for calls under sixty seconds made on the day of the theft — narrow the final suspects to three, and from there it is a matter of tracing who called whom and who purchased the flight ticket. The thief is Bruce, the accomplice is Robin, and the destination is New York City. The log.sql file documents every query run along the way, with comments explaining the reasoning at each step — which is the whole point of the exercise.
What Week 7 teaches is that SQL is not just a query language — it is a way of thinking about data in terms of sets, relationships, and joins. Writing the Fiftyville investigation as a series of intersecting subqueries made that concrete in a way that no amount of theory would have. The jump from simple SELECT statements in Songs to multi-level nested joins in Movies to a genuine investigative query chain in Fiftyville happens across a single week, which is a testament to how well the problem sets are sequenced. The full code for this week is on GitHub — all three problem sets are in separate folders: Songs, Movies, and Fiftyville. You can also browse the Week 7 folder to see everything together, or explore the entire CS50x repository to follow the course week by week.




Comments