Pandas is a Python library for data manipulation and analysis. One of its primary data structures is the DataFrame, a two-dimensional table of labeled rows and columns. This article explores how to create and manipulate DataFrames in Pandas with examples.
Before using Pandas, you need to import the library:
import pandas as pd
You can create a DataFrame from various data sources such as dictionaries, lists, or CSV files.
# Creating a DataFrame from a dictionary
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"]
}
df = pd.DataFrame(data)
print(df)

# Creating a DataFrame from a list of lists
data = [
    ["Alice", 25, "New York"],
    ["Bob", 30, "Los Angeles"],
    ["Charlie", 35, "Chicago"]
]
df = pd.DataFrame(data, columns=["Name", "Age", "City"])
print(df)
# Creating a DataFrame from a CSV file
# (assumes a file named data.csv exists in the working directory)
df = pd.read_csv("data.csv")
print(df)
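Since data.csv may not exist on your machine, the following minimal sketch reads the same kind of data from an in-memory string instead. The CSV text and the names csv_text and csv_df are illustrative assumptions, not part of the original example; io.StringIO simply stands in for a file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file such as "data.csv"
csv_text = """Name,Age,City
Alice,25,New York
Bob,30,Los Angeles
Charlie,35,Chicago
"""

# read_csv accepts a file path or any file-like object
csv_df = pd.read_csv(io.StringIO(csv_text))

print(csv_df.shape)   # (number of rows, number of columns)
print(csv_df.dtypes)  # column types inferred by pandas
```

Checking shape and dtypes right after loading is a quick way to confirm that the file was parsed the way you expected.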
Once a DataFrame is created, you can perform various operations on it.
# Accessing a single column
print(df["Name"])

# Accessing multiple columns
print(df[["Name", "City"]])

# Adding a new column
df["Salary"] = [50000, 60000, 70000]
print(df)

# Deleting a column
df = df.drop("Salary", axis=1)
print(df)
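A new column can also be computed from an existing one rather than typed in by hand. The sketch below is illustrative only: the column name AgeInMonths and the variable trimmed are hypothetical. It also shows that drop returns a new DataFrame and leaves the original untouched unless you reassign the result.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
})

# Derive a column from an existing one (applied element-wise)
df["AgeInMonths"] = df["Age"] * 12

# drop returns a new DataFrame; df itself still has the column
trimmed = df.drop(columns=["AgeInMonths"])

print(df.columns.tolist())
print(trimmed.columns.tolist())
```

The columns= keyword is equivalent to passing axis=1 and tends to read more clearly.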
# Accessing a single row by integer position (0-based)
print(df.iloc[1])

# Accessing multiple rows (positions 0 and 1; the end of the slice is excluded)
print(df.iloc[0:2])
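With the default index, positions and labels happen to coincide, which can hide the difference between iloc and loc. The following sketch uses a hypothetical string index ("a", "b", "c") to make that difference visible; the variable names are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]},
    index=["a", "b", "c"],  # hypothetical custom row labels
)

by_position = df.iloc[1]  # second row, regardless of its label
by_label = df.loc["b"]    # row whose index label is "b"

# iloc slices exclude the end position; loc slices include the end label
first_two_by_position = df.iloc[0:2]
first_two_by_label = df.loc["a":"b"]

print(by_position)
print(first_two_by_label)
```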
# Filtering rows based on a condition
filtered_df = df[df["Age"] > 25]
print(filtered_df)
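Filters can also combine several conditions. The sketch below assumes the same sample data as the earlier examples; the names mask and selected are illustrative. Each condition is wrapped in parentheses and joined with & (and) or | (or), because Python's and/or keywords do not operate element-wise on a Series.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

# Rows where Age is over 25 AND City is not Chicago
mask = (df["Age"] > 25) & (df["City"] != "Chicago")

# loc takes the boolean mask for rows and a list of labels for columns
selected = df.loc[mask, ["Name", "Age"]]
print(selected)
```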
# Updating a value in the DataFrame (row label 1, column "City")
df.loc[1, "City"] = "San Francisco"
print(df)
You can perform aggregation and statistical operations on DataFrames.
# Summary statistics (count, mean, std, min, quartiles, max) for numeric columns
print(df.describe())
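# Grouping data and calculating the mean
# Select the numeric column first: recent pandas versions raise an error
# when asked to average a text column such as "Name"
grouped = df.groupby("City")["Age"].mean()
print(grouped)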
# Sorting by a column (ascending by default)
sorted_df = df.sort_values("Age")
print(sorted_df)
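In the sample data every city appears only once, so each group holds a single row. The sketch below uses hypothetical data with repeated cities (the people DataFrame and the fourth person are assumptions for illustration) to show grouping, several aggregates at once, and a descending sort working together.

```python
import pandas as pd

# Hypothetical data with repeated cities so that grouping is meaningful
people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Dana"],
    "Age": [25, 30, 35, 45],
    "City": ["New York", "Chicago", "Chicago", "New York"],
})

summary = (
    people.groupby("City")["Age"]          # group rows by city, keep the Age column
    .agg(["mean", "max"])                  # one output column per aggregate
    .sort_values("mean", ascending=False)  # highest average age first
)
print(summary)
```

Passing a list to agg produces one column per statistic, which keeps related summaries in a single table.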
DataFrames are a fundamental feature of Pandas, allowing you to store and manipulate structured data easily. By mastering DataFrame creation and manipulation, you can perform efficient data analysis and preprocessing in Python.