A Pandas DataFrame is a two-dimensional labeled data structure, similar to a table or a spreadsheet. It consists of rows and columns, and each column can hold a different data type. Pandas provides powerful tools for manipulating and analyzing data in DataFrames. Here's a guide to working with Pandas DataFrames:
You can create DataFrames from various data sources, such as dictionaries, lists, NumPy arrays, CSV or Excel files, and SQL databases.
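The sketch below builds DataFrames from three of the sources listed above: a NumPy array, CSV data and a SQL query. To keep it self-contained, it reads the CSV from an in-memory string via `io.StringIO` instead of a real file, and it queries an in-memory SQLite database with a hypothetical `people` table. With a real file, you would pass its path to `pd.read_csv`. `pd.read_excel` works in a similar way but needs an extra engine such as openpyxl installed.

```python
import io
import sqlite3

import numpy as np
import pandas as pd

# From a NumPy array: column labels are supplied separately
arr = np.array([[1, 2], [3, 4], [5, 6]])
df_np = pd.DataFrame(arr, columns=['x', 'y'])

# From CSV text; io.StringIO stands in for a file path such as 'people.csv' (hypothetical)
csv_text = """Name,Age,City
Alice,25,New York
Bob,30,Los Angeles
Charlie,35,Chicago
"""
df_csv = pd.read_csv(io.StringIO(csv_text))

# From a SQL query against an in-memory SQLite database (hypothetical 'people' table)
conn = sqlite3.connect(':memory:')
df_csv.to_sql('people', conn, index=False)
df_sql = pd.read_sql_query('SELECT Name, Age FROM people WHERE Age > 26 ORDER BY Age', conn)
conn.close()

print(df_np)
print(df_csv)
print(df_sql)
```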
From Dictionary:
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'Los Angeles', 'Chicago']}
df = pd.DataFrame(data)
From Lists:
data = [['Alice', 25, 'New York'],
        ['Bob', 30, 'Los Angeles'],
        ['Charlie', 35, 'Chicago']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'City'])
head() and tail(): View the first or last few rows of the DataFrame (five by default).
print(df.head())  # View the first few rows
print(df.tail())  # View the last few rows
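Both methods accept an optional row count `n`. The sketch below uses a small illustrative DataFrame with ten rows so the difference is visible.

```python
import pandas as pd

# Illustrative DataFrame with ten rows (values 0..9)
df = pd.DataFrame({'n': range(10)})

first_five = df.head()    # default n=5: rows 0-4
first_two = df.head(2)    # first two rows
last_three = df.tail(3)   # last three rows

print(first_two)
print(last_three)
```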
info(): Get a concise summary of the DataFrame, including column names, data types, and non-null counts.
df.info()  # info() prints the summary itself and returns None, so it needs no print()
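If you need the pieces of that summary as values rather than printed text, the attributes and methods below return them directly. This is a minimal sketch. It assumes a small example DataFrame with one missing `Age` value, so the non-null count differs from the row count.

```python
import numpy as np
import pandas as pd

# Example data with one missing Age (np.nan), which makes the column float64
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, np.nan, 35]})

df.info()                   # printed summary
print(df.shape)             # (rows, columns)
print(df.columns.tolist())  # column names
print(df.dtypes)            # data type of each column
print(df.count())           # non-null values per column
```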
describe(): Generate descriptive statistics for numerical columns.
print(df.describe())
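`describe()` returns a DataFrame, so you can look up individual statistics with `loc`. Passing `include='all'` adds text columns, which get `unique`, `top` and `freq` rows. The sketch below reuses the example data from the dictionary section.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35],
                   'City': ['New York', 'Los Angeles', 'Chicago']})

stats = df.describe()                    # numeric columns only (here: Age)
print(stats.loc['mean', 'Age'])          # mean of Age
print(stats.loc['std', 'Age'])           # sample standard deviation of Age

all_stats = df.describe(include='all')   # also summarizes the text columns
print(all_stats.loc['unique', 'City'])   # number of distinct cities
```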
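Selecting Columns: You can select one or more columns using square brackets. Dot notation also works for a single column, but only if the column name is a valid Python identifier and does not clash with a DataFrame attribute or method.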
print(df['Name'])
print(df.Name)  # Alternative syntax
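The number of brackets determines what you get back. A single label returns a Series, while a list of labels returns a DataFrame. The sketch below uses a hypothetical column named `Home City` to show a case where dot notation cannot work because of the space.

```python
import pandas as pd

# 'Home City' is a hypothetical column name containing a space
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35],
                   'Home City': ['New York', 'Los Angeles', 'Chicago']})

names = df['Name']            # single label -> Series
subset = df[['Name', 'Age']]  # list of labels -> DataFrame with those columns
cities = df['Home City']      # brackets accept any label; dot notation cannot express this name

print(type(names).__name__, type(subset).__name__)
```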
Selecting Rows: Use iloc[] to select rows by integer position, or loc[] to select them by index label.
print(df.iloc[0])  # Select the first row by integer position
print(df.loc[0])   # Select the row whose index label is 0 (the first row with the default index)
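With the default index (0, 1, 2, ...), positions and labels coincide, which hides the difference between the two. The sketch below makes the labels names via `set_index('Name')`, so `iloc` and `loc` take different arguments. It also shows that `iloc` slices exclude the end position, while `loc` slices include the end label.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35],
                   'City': ['New York', 'Los Angeles', 'Chicago']})
by_name = df.set_index('Name')   # row labels are now names, not 0, 1, 2

row_pos = by_name.iloc[1]        # second row by position (Bob)
row_label = by_name.loc['Bob']   # the same row by label
age = by_name.loc['Bob', 'Age']  # a single cell: row label, column label

# Slicing: iloc excludes the end position, loc includes the end label
print(by_name.iloc[0:2].index.tolist())
print(by_name.loc['Alice':'Bob'].index.tolist())
```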
Adding/Removing Columns: You can add or remove columns from a DataFrame.
df['Gender'] = ['Female', 'Male', 'Male']  # Adding a new column (one value per row)
df.drop(columns=['Gender'], inplace=True)  # Removing a column
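New columns are often computed from existing ones instead of typed in. The sketch below shows an element-wise calculation, `assign()` (which returns a new DataFrame), and `drop()` without `inplace=True`, which also returns a new DataFrame and leaves the original unchanged. The column names `AgeInMonths` and `Senior` are hypothetical and used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35]})

# Hypothetical derived columns, computed element-wise from Age
df['AgeInMonths'] = df['Age'] * 12
df = df.assign(Senior=df['Age'] >= 30)  # assign() returns a new DataFrame

# Without inplace=True, drop() returns a new DataFrame and df keeps the column
trimmed = df.drop(columns=['AgeInMonths'])
print(trimmed.columns.tolist())
```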
Filtering Data: You can filter rows based on conditions.
print(df[df['Age'] > 25]) # Filter rows where Age > 25
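You can combine several conditions with `&` (and), `|` (or) and `~` (not). Each condition needs its own parentheses, because these operators bind more tightly than comparisons. The sketch below also shows `isin()` for membership tests and `query()`, which expresses a filter as a string. It reuses the example data from above.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35],
                   'City': ['New York', 'Los Angeles', 'Chicago']})

# Both conditions must hold; each comparison is wrapped in parentheses
older_not_la = df[(df['Age'] > 25) & (df['City'] != 'Los Angeles')]

# isin() keeps rows whose City is one of the listed values
coastal = df[df['City'].isin(['New York', 'Los Angeles'])]

# query() evaluates a boolean expression over column names
young = df.query('Age < 30')

print(older_not_la)
print(coastal)
print(young)
```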
Grouping and Aggregating: You can group data using groupby() and perform aggregations such as sum(), mean(), and count().
print(df.groupby('City')['Age'].mean())  # Mean Age for each city; selecting the numeric column avoids errors from text columns like Name
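In the three-row example, every city appears once, so each group is trivial. The sketch below uses hypothetical data where two people share each city. It computes several aggregations at once with `agg()`, first from a list of function names and then as named aggregations that control the output column names. `groupby()` sorts the group keys by default.

```python
import pandas as pd

# Hypothetical data: two people per city
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
                   'Age': [25, 30, 35, 45],
                   'City': ['New York', 'Chicago', 'Chicago', 'New York']})

# Several aggregations of one column
summary = df.groupby('City')['Age'].agg(['mean', 'count', 'max'])

# Named aggregation: output_name=(column, function)
named = df.groupby('City').agg(avg_age=('Age', 'mean'),
                               people=('Name', 'count'))

print(summary)
print(named)
```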