CS50 Week 6 Python — Rewriting C Problems in Python and DNA Profiling with STR Matching

Parsa Dev
Feb 1
Week 6 of CS50x is where the language changes entirely. After five weeks of writing C (managing memory manually, counting bytes, handling pointers) you switch to Python, and the contrast is immediate. The first four problems this week are sentimental re-implementations: Hello, Mario, Credit, and Readability are all problems from earlier weeks, rebuilt in Python. Hello becomes four lines of clean, readable code using an f-string. Mario's double pyramid shrinks down to a while loop for input validation and a single for loop with string multiplication, " " * a + "#" * b + " " + "#" * b, replacing what took three nested loops in C. Credit uses Python's math.log10 to count digits, a nested function to extract individual digits as a list, and slices to pull the first one or two digits for card type detection; the Luhn checksum logic becomes a compact list comprehension pattern that is genuinely easier to read than the C equivalent. Readability iterates over the text using string.ascii_letters to count letters, tracks words through a careful space-counting condition that handles both the first word and subsequent ones, and applies the same Coleman-Liau formula as before. Doing the same problems again in a different language is not busywork: it makes you see exactly how little C was doing for you and how much Python handles instead.
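
To make the Credit description a little more concrete, here is a minimal sketch of that style of solution. It is not the code from the repository: the names digits_of, luhn_valid, and card_type are hypothetical, and it counts digits with len() on the digit list rather than math.log10, because floating-point rounding can miscount the digits of values just below a power of ten. The card prefixes and lengths follow the CS50 problem specification.

```python
def digits_of(number):
    """Return the decimal digits of a non-negative integer, most significant first."""
    return [int(ch) for ch in str(number)]


def luhn_valid(digits):
    """Check a list of digits against Luhn's algorithm."""
    reversed_digits = digits[::-1]
    # Digits in the 1st, 3rd, 5th... position from the right are added as they are.
    total = sum(reversed_digits[0::2])
    # Every second digit from the right is doubled; divmod splits a two-digit
    # product (such as 14) into its digits (1 and 4) so they can be summed.
    total += sum(sum(divmod(2 * d, 10)) for d in reversed_digits[1::2])
    return total % 10 == 0


def card_type(number):
    """Classify a card number as AMEX, MASTERCARD, VISA, or INVALID."""
    digits = digits_of(number)
    if not luhn_valid(digits):
        return "INVALID"
    length = len(digits)
    prefix = digits[:2]  # slice: the first one or two digits
    if length == 15 and prefix in ([3, 4], [3, 7]):
        return "AMEX"
    if length == 16 and prefix[0] == 5 and 1 <= prefix[1] <= 5:
        return "MASTERCARD"
    if length in (13, 16) and prefix[0] == 4:
        return "VISA"
    return "INVALID"
```

Calling card_type(4111111111111111) on a common Visa test number should classify it as VISA, while changing its last digit breaks the checksum and yields INVALID.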

DNA is the standout problem of the week and the most technically interesting one. The program takes two command-line arguments — a CSV database of STR counts per person, and a text file containing a DNA sequence — and identifies who the DNA belongs to, or prints "No match" if no one in the database fits. The CSV is read using Python's csv.reader, with the header row parsed separately to extract the STR names and the remaining rows stored as lists with numeric values. For each STR in the header, the longest_match function scans the entire DNA sequence to find the longest consecutive run of that STR — a sliding window approach that checks every position in the sequence and counts how many times the subsequence repeats consecutively from that point. Once all STR counts are computed, they are compared against each person's row in the database. An exact match on every STR identifies the person. The implementation uses plain lists rather than DictReader for the database rows, converting STR count strings to integers during the initial parse so the comparison works cleanly without any type coercion later.
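
The pipeline described above can be sketched roughly as follows. This is an illustration under assumptions, not the repository's code: longest_match is named in the post, but its body here is one possible way to write the sliding window, and parse_database and identify are hypothetical helper names. To keep the sketch testable without files, parse_database takes any iterable of CSV lines (an open file works too) instead of reading sys.argv directly.

```python
import csv


def longest_match(sequence, subsequence):
    """Return the longest run of consecutive repeats of subsequence in sequence."""
    if not subsequence:
        return 0
    step = len(subsequence)
    longest = 0
    for start in range(len(sequence)):
        count = 0
        # Keep stepping forward one repeat at a time while the slice still matches.
        while sequence[start + count * step:start + (count + 1) * step] == subsequence:
            count += 1
        longest = max(longest, count)
    return longest


def parse_database(lines):
    """Parse CSV lines into (STR names, rows of [name, count, count, ...])."""
    reader = csv.reader(lines)
    header = next(reader)
    str_names = header[1:]  # the first column holds the person's name
    # Convert counts to int once here so later comparisons need no coercion.
    rows = [[row[0]] + [int(value) for value in row[1:]] for row in reader]
    return str_names, rows


def identify(str_names, rows, sequence):
    """Return the name whose STR counts all match the sequence, else 'No match'."""
    profile = [longest_match(sequence, name) for name in str_names]
    for person, *counts in rows:
        if counts == profile:
            return person
    return "No match"
```

With made-up data, usage might look like this:

```python
lines = ["name,AGATC,AATG,TATC", "Alice,2,8,3", "Bob,4,1,5"]
str_names, rows = parse_database(lines)
sequence = "AGATC" * 4 + "G" + "AATG" + "C" + "TATC" * 5
print(identify(str_names, rows, sequence))  # prints "Bob"
```

In a full program, the two command-line arguments would supply the file paths, and sys.exit would handle a wrong argument count.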

What Week 6 makes very clear is that Python is not just easier to write than C — it is a different way of thinking about problems. String multiplication, list slicing, csv.reader, sys.exit, f-strings, and the math module all let you express solutions more directly, with less scaffolding around the actual logic. The DNA problem is a good example: the sliding window in longest_match is the same algorithmic idea regardless of language, but Python lets you write sequence[start:end] == subsequence instead of managing character arrays and loop bounds manually. That said, having learned C first means you understand what is happening underneath those abstractions — and that matters more than it might seem. The full code for this week is on GitHub — browse the Week 6 folder directly, or explore the entire CS50x repository to follow the full course progression.
