The 90s rewind: Exploring Netflix's movie data chronicles

Sharon Nguyai
5 min read

Remember the magic of 90s movies? They left a lasting impact on culture and continue to evoke nostalgia. The 90s were a pivotal time for cinema, introducing iconic films that shaped genres and influenced filmmakers for years to come.

Many people say, "Don't do Netflix projects. Everyone does them." And they're right, to an extent. What truly matters is starting with a project that helps you understand key concepts, because the skills you gain are likely to carry over to future projects. Instead of just following trends, focus on projects that have a purpose and align with your learning goals.

In this project, I analyzed a dataset provided by DataCamp that contains Netflix movie data, focusing on movies released in the 1990s, a decade of classics spanning action, drama, and comedy. This beginner-friendly guide walks through key data analysis techniques, including filtering DataFrames, using Matplotlib for visualization, and applying loops and conditionals.

To begin, let's set up the data.

SETTING UP THE DATA

I loaded the Netflix dataset and explored its structure:

import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset
netflix_df = pd.read_csv("netflix_data.csv")

# Display the first few rows
print(netflix_df.head())

Loading and inspecting the data is an exciting first step in any project! Discovering the available columns helps you decide what kind of analysis or filtering you can explore.

This dataset includes columns like show_id, type, title, director, cast, country, date_added, release_year, duration, description, and genre.

  • import pandas as pd: This imports the pandas library, which is used for handling and analyzing data in Python.

  • import matplotlib.pyplot as plt: This imports matplotlib.pyplot, which is used for creating visualizations like charts and graphs.

  • netflix_df = pd.read_csv("netflix_data.csv"): This loads a dataset called "netflix_data.csv" into a variable named netflix_df. The function pd.read_csv() reads CSV (Comma-Separated Values) files and converts them into a structured table (DataFrame), making it easier to analyze.

  • print(netflix_df.head()): This prints the first few rows of the dataset. It helps you quickly check what kind of data you’re working with, including column names and example values.
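
Beyond head(), a few other standard pandas calls help when exploring a dataset's structure. The snippet below is only a sketch: it reads a made-up two-row CSV from memory (the rows and the name sample_df are hypothetical stand-ins, not real entries from netflix_data.csv) so that it runs without the file.

```python
import io

import pandas as pd

# Hypothetical sample standing in for netflix_data.csv (invented rows)
sample_csv = io.StringIO(
    "show_id,type,title,release_year,duration,genre\n"
    "s1,Movie,Example Film,1995,101 min,Action\n"
    "s2,Movie,Another Film,2001,95 min,Comedies\n"
)
sample_df = pd.read_csv(sample_csv)

print(sample_df.columns.tolist())  # column names
print(sample_df.dtypes)            # data type pandas inferred for each column
print(sample_df.shape)             # (number of rows, number of columns)
print(sample_df.isna().sum())      # count of missing values per column
```

With the real file, the same calls on netflix_df would show which columns are text and which are numeric before any filtering.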

FILTERING DATAFRAMES: EXTRACTING 1990S MOVIES

Now that we have set up the data, let's move on to filtering DataFrames.

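# .copy() makes movies_90s an independent DataFrame, so changing its columns later does not trigger SettingWithCopyWarning
movies_90s = netflix_df[(netflix_df["type"] == "Movie") & (netflix_df["release_year"].between(1990, 1999))].copy()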
  • netflix_df["type"] == "Movie": This ensures we select only movies and exclude TV shows.

  • netflix_df["release_year"].between(1990, 1999): This checks if the movie's release year is between 1990 and 1999 (inclusive).

  • Combining Conditions: The & (AND operator) ensures that both conditions must be true. The result is stored in movies_90s, which now contains only 90s movies.
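
To see the two conditions at work, here is a small sketch on invented data (toy_df and its rows are hypothetical, chosen to sit on the edges of the decade). Naming each mask separately can make the filter easier to read and debug.

```python
import pandas as pd

# Hypothetical rows chosen to test the boundaries of the filter
toy_df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "type": ["Movie", "Movie", "TV Show", "Movie"],
    "release_year": [1989, 1990, 1995, 1999],
})

is_movie = toy_df["type"] == "Movie"
in_90s = toy_df["release_year"].between(1990, 1999)  # inclusive on both ends by default

# Both masks must be True for a row to be kept
toy_90s = toy_df[is_movie & in_90s].copy()
print(toy_90s["title"].tolist())  # ['B', 'D']
```

When the conditions are written inline instead, each one needs its own parentheses, because & binds more tightly than == in Python.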

ANALYZING MOVIE DURATIONS

I wanted to find the most frequent movie duration in this decade:

# Convert duration to numeric format
movies_90s["duration"] = movies_90s["duration"].str.replace(" min", "", regex=True).astype(float)

# Find the most common duration
duration = int(movies_90s["duration"].mode()[0])
print(duration)
  • str.replace(" min", "", regex=True): This removes the text " min" from each value in the "duration" column, leaving only the numeric part. For example, "120 min" becomes "120".

  • .astype(float): Converts the cleaned string values into floating-point numbers. Now, "120" becomes 120.0, making it easier to perform numerical analysis.

  • The original "duration" column is stored as text (e.g., "120 min"), which prevents mathematical operations like calculating averages or creating histograms. Converting it to numeric format allows for easier analysis, filtering, and visualization.

  • The .mode() function returns the most frequently occurring value(s) in the duration column. [0] selects the first mode from the returned Series.

  • int(...) converts the result (a float such as 94.0) to an integer.
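
One detail worth seeing in action is that .mode() can return more than one value when there is a tie. The sketch below uses invented durations (toy_durations is hypothetical) to show the text-to-number conversion and a tied mode.

```python
import pandas as pd

# Hypothetical duration strings; 94 and 101 each appear twice
toy_durations = pd.Series(["94 min", "101 min", "94 min", "101 min", "120 min"])

# Strip the literal " min" suffix, then convert the remaining text to numbers
minutes = toy_durations.str.replace(" min", "", regex=False).astype(float)

modes = minutes.mode()      # all tied values, sorted in ascending order
print(modes.tolist())       # [94.0, 101.0]
print(int(modes.iloc[0]))   # 94 (the smallest of the tied modes)
```

Taking the first element is a reasonable choice when only one answer is needed, but checking len(modes) first shows whether a tie was silently dropped.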

COUNTING SHORT ACTION MOVIES

For this analysis, a movie counts as short if it runs for less than 90 minutes. I counted the number of short action movies:

short_movie_count = movies_90s[(movies_90s["genre"].str.contains("Action", na=False)) & (movies_90s["duration"] < 90)].shape[0]
print(short_movie_count)
  • movies_90s["genre"].str.contains("Action", na=False): This checks if the "genre" column contains the word "Action". na=False ensures that missing values (NaN) are treated as False instead of causing an error.

  • movies_90s["duration"] < 90: This filters movies that have a duration of less than 90 minutes.

  • movies_90s[...].shape[0]: The & operator ensures both conditions (Action genre and duration < 90) are met, and .shape[0] returns the number of rows that remain after filtering.
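
As a quick check of the counting logic, the sketch below runs the same two conditions on invented rows (toy_movies is hypothetical), including one row with a missing genre. It also shows that summing the boolean mask gives the same count as .shape[0].

```python
import pandas as pd

# Hypothetical rows: only "A" is both an Action movie and under 90 minutes
toy_movies = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "genre": ["Action", "Action", None, "Comedies"],
    "duration": [85.0, 110.0, 80.0, 88.0],
})

is_action = toy_movies["genre"].str.contains("Action", na=False)  # missing genre -> False
is_short = toy_movies["duration"] < 90

short_action_mask = is_action & is_short
print(int(short_action_mask.sum()))             # 1 (True counts as 1 when summed)
print(toy_movies[short_action_mask].shape[0])   # 1
```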

MATPLOTLIB CUSTOMIZATION: VISUALIZING MOVIE DURATIONS

A histogram helps visualize movie durations:

plt.hist(movies_90s["duration"], bins=15, color='lightblue', edgecolor='black')
plt.xlabel("Duration (minutes)")
plt.ylabel("Number of Movies")
plt.title("Distribution of Movie Durations in the 1990s")
plt.show()
  • plt.hist(movies_90s["duration"], bins=15, color='lightblue', edgecolor='black') Plots a histogram of the "duration" column.

  • bins=15 groups the movie durations into 15 intervals.

  • color='lightblue' fills the bars with light blue.

  • edgecolor='black' adds black edges to the bars for better visibility.

  • plt.xlabel("Duration (minutes)"): Labels the x-axis as "Duration (minutes)".

  • plt.ylabel("Number of Movies"): Labels the y-axis as "Number of Movies".

  • plt.title("Distribution of Movie Durations in the 1990s"): Adds a title to the plot.

  • plt.show(): Displays the histogram.
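
To make the effect of bins=15 concrete, the sketch below computes the bins directly with numpy.histogram, which is the function Matplotlib's hist relies on for binning. The durations are invented for illustration (toy_durations is hypothetical).

```python
import numpy as np

# Hypothetical durations in minutes
toy_durations = np.array([72, 85, 90, 94, 94, 101, 101, 105, 118, 120, 135, 150, 162])

# bins=15 splits the range from the minimum to the maximum into 15 equal-width intervals
counts, edges = np.histogram(toy_durations, bins=15)

print(len(counts))            # 15 bars
print(len(edges))             # 16 edges (one more than the number of bars)
print(edges[1] - edges[0])    # width of each bin in minutes
print(int(counts.sum()))      # every movie lands in exactly one bin
```

Here the range is 162 - 72 = 90 minutes, so each of the 15 bins is 6 minutes wide. Fewer bins give a smoother but coarser picture, and more bins show finer detail at the cost of noisier bars.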

USING LOOPS AND IF-ELSE STATEMENTS

I iterated over the movies and categorized them:

for index, row in movies_90s.iterrows():
    if row["duration"] < 90:
        category = "Short"
    elif row["duration"] <= 150:
        category = "Medium"
    else:
        category = "Long"
    print(f"{row['title']} is a {category} movie.")
  • for index, row in movies_90s.iterrows(): Iterates through each row of the DataFrame using iterrows(), which returns the index and row data.

  • Categorization Logic (if-elif-else):

    • If the movie duration is less than 90 minutes, it's classified as "Short".

    • If the movie duration is between 90 and 150 minutes, it's classified as "Medium".

    • Otherwise, it’s classified as "Long" (more than 150 minutes).

  • Printing the result: Uses an f-string to print the movie title along with its category. The output will look something like this:

    187 is a Medium movie.

    A Dangerous Woman is a Medium movie.

    A Night at the Roxbury is a Short movie.
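
The loop above is a good way to practice conditionals. As an alternative sketch (not the approach used in this project), the same rules can be applied to the whole column at once with numpy.select, which stores the category in a new column instead of printing it. The rows and the column name category are hypothetical and chosen to sit on the boundaries.

```python
import numpy as np
import pandas as pd

# Hypothetical durations on either side of the 90 and 150 minute cut-offs
toy_movies = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "duration": [89.0, 90.0, 150.0, 151.0],
})

# Conditions are checked in order, like if / elif; the first match wins
conditions = [
    toy_movies["duration"] < 90,
    toy_movies["duration"] <= 150,
]
toy_movies["category"] = np.select(conditions, ["Short", "Medium"], default="Long")

print(toy_movies["category"].tolist())  # ['Short', 'Medium', 'Medium', 'Long']
```

Keeping the category as a column makes follow-up steps such as toy_movies["category"].value_counts() straightforward.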

CONCLUSION

Through this analysis, I uncovered some interesting insights into 1990s movie trends! The most common movie duration was around 100 minutes, which suggests that the 90s movies in this dataset cluster around a standard runtime. The histogram showed most movies falling between 80 and 120 minutes, with only a few outliers. Plus, short action movies (less than 90 minutes) were quite rare, hinting that 90s action films usually went for longer thrills!

The skills I've learned here—filtering data, applying conditions, and visualizing trends—are essential in any data-driven field. Isn't it fascinating how versatile data analysis is? The same techniques can be used in sports analytics, business intelligence, or even predicting future movie trends.

So, if you're just beginning, don't worry about creating the perfect project. Just dive in, learn, and keep growing. Every dataset has a story—aren't you curious to discover and tell it?
