Handling missing data is a crucial part of data preprocessing, and pandas provides several methods for detecting, removing, and filling missing values. Here are some common techniques:
isna() and isnull(): These methods return a Boolean DataFrame of the same shape as the original, where each cell is True if the corresponding value is missing (NaN, None, or NaT) and False otherwise. isnull() is an alias of isna().
print(df.isna()) # Boolean DataFrame indicating missing values
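A minimal sketch of detecting missing values, assuming a small hypothetical DataFrame with columns "a" and "b". It also shows that summing the Boolean mask is a quick way to count missing values per column.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one gap in "a" (NaN) and one in "b" (None)
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [None, "x", "y"],
})

mask = df.isna()    # Boolean DataFrame, same shape as df
same = df.isnull()  # isnull() is an alias, so the result is identical

print(mask)
print(mask.sum())   # True counts as 1, giving missing values per column
```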
dropna(): This method removes rows (by default) or columns (with axis=1) that contain missing values and returns a new DataFrame.
df.dropna() # Drops rows with any missing values
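The sketch below contrasts the default row-wise drop with a column-wise drop using axis=1. The two-column DataFrame is hypothetical; note that the original df is left unchanged because dropna() returns a new object.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column "a" has one gap, column "b" is complete
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [4.0, 5.0, 6.0],
})

rows_kept = df.dropna()        # default axis=0: drop any row containing NaN
cols_kept = df.dropna(axis=1)  # axis=1: drop any column containing NaN

print(rows_kept)  # rows 0 and 2 remain
print(cols_kept)  # only column "b" remains
```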
Threshold: The thresh parameter keeps only the rows (or columns, with axis=1) that have at least that many non-null values.
df.dropna(thresh=2) # Drops rows with fewer than 2 non-null values
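To make the threshold concrete, here is a sketch with a hypothetical DataFrame whose rows contain 3, 2, 1, and 0 non-null values. With thresh=2, only the first two rows survive.

```python
import numpy as np
import pandas as pd

# Hypothetical data: rows have 3, 2, 1 and 0 non-null values respectively
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, np.nan],
    "b": [4.0, 5.0, np.nan, np.nan],
    "c": [7.0, np.nan, np.nan, np.nan],
})

print(df.notna().sum(axis=1))  # non-null count per row

kept = df.dropna(thresh=2)     # keep rows with at least 2 non-null values
print(kept)
```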
fillna(): This method replaces missing values with a specified value.
df.fillna(0) # Fill missing values with 0
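fillna() also accepts a dict that maps column names to their own fill values, which is useful when columns have different types. In this sketch the columns "score" and "city" are hypothetical, and filling "score" with its mean is just one common choice of imputation.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a numeric and a text column
df = pd.DataFrame({
    "score": [10.0, np.nan, 30.0],
    "city": ["Oslo", None, "Lima"],
})

# Per-column fill values: mean for the numeric column, a label for the text one
filled = df.fillna({"score": df["score"].mean(), "city": "unknown"})
print(filled)
```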
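Forward Fill (ffill) and Backward Fill (bfill): ffill() propagates the last valid value forward into the gaps that follow it, and bfill() propagates the next valid value backward into the gaps before it.
df.ffill() # Forward fill
df.bfill() # Backward fill
These are the preferred spellings; the method= argument of fillna() (as in df.fillna(method='ffill')) is deprecated as of pandas 2.1.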
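A short sketch on a hypothetical Series shows the edge behavior: a leading NaN has no earlier value to carry forward, a trailing NaN has no later value to carry backward, and the limit parameter caps how many consecutive gaps are filled.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with gaps at the start, middle and end
s = pd.Series([np.nan, 1.0, np.nan, np.nan, 4.0, np.nan])

forward = s.ffill()         # [NaN, 1, 1, 1, 4, 4]: leading NaN stays
backward = s.bfill()        # [1, 1, 4, 4, 4, NaN]: trailing NaN stays
limited = s.ffill(limit=1)  # fill at most 1 consecutive NaN per gap

print(forward, backward, limited, sep="\n\n")
```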
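To fill a single column, assign the result back to it; calling fillna('unknown', inplace=True) on df['Column_Name'] is chained assignment and may not update df under pandas' Copy-on-Write behavior.
df['Column_Name'] = df['Column_Name'].fillna('unknown')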
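A self-contained sketch of filling one column by assigning the result back, rather than relying on inplace=True. "Column_Name" is the placeholder from the text, and the extra "other" column is hypothetical, included only to show that the rest of the DataFrame is untouched.

```python
import pandas as pd

# Hypothetical DataFrame using the placeholder column name
df = pd.DataFrame({"Column_Name": ["a", None, "c"], "other": [1, 2, 3]})

# Assign the filled column back; this updates df regardless of Copy-on-Write
df["Column_Name"] = df["Column_Name"].fillna("unknown")
print(df)
```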