
Pandas - Removing Duplicates




Removing duplicate rows is a common data-cleaning step in Pandas. You can treat rows as duplicates when they match on one or more chosen columns, or only when every column in the row matches. Here's how you can do it:

Removing Duplicates Based on Columns

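You can use the drop_duplicates() method with the subset argument to remove duplicate rows based on specific columns. Rows that match in the listed columns are treated as duplicates, even if their other columns differ.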


        import pandas as pd

        # Example DataFrame with duplicate rows
        data = {'Name': ['Alice', 'Bob', 'Alice', 'Charlie', 'Bob'],
                'Age': [25, 30, 25, 35, 30],
                'City': ['New York', 'Los Angeles', 'New York', 'Chicago', 'Los Angeles']}
        df = pd.DataFrame(data)
        
        # Removing duplicates based on the 'Name' column
        df_no_duplicates = df.drop_duplicates(subset=['Name'])
        print(df_no_duplicates)
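
Since duplicates can also be judged on more than one column, the following is a minimal sketch of passing several column names to subset. The data is a hypothetical variation of the example above (the second 'Alice' row is given the city 'Boston') so that the two calls produce different results.

```python
import pandas as pd

# Hypothetical variation of the example data: the second 'Alice' lives in Boston
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Alice', 'Charlie', 'Bob'],
    'Age': [25, 30, 25, 35, 30],
    'City': ['New York', 'Los Angeles', 'Boston', 'Chicago', 'Los Angeles'],
})

# Duplicates judged on 'Name' only: the second 'Alice' and second 'Bob' are dropped
by_name = df.drop_duplicates(subset=['Name'])

# Duplicates judged on 'Name' and 'City' together: only the second 'Bob' is dropped
by_name_city = df.drop_duplicates(subset=['Name', 'City'])

print(by_name)
print(by_name_city)
```

The more columns you list in subset, the stricter the match, so fewer rows are usually removed.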
      

Removing Complete Duplicate Rows

If you want to remove rows where all columns have identical values, you can use the drop_duplicates() method without specifying any subset.


        # Removing complete duplicate rows
        df_no_duplicates = df.drop_duplicates()
        print(df_no_duplicates)
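
Before dropping anything, it can help to see which rows would be removed. The sketch below uses the companion duplicated() method, which returns a boolean Series that marks each repeated row; it rebuilds the example DataFrame so it can run on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Alice', 'Charlie', 'Bob'],
    'Age': [25, 30, 25, 35, 30],
    'City': ['New York', 'Los Angeles', 'New York', 'Chicago', 'Los Angeles'],
})

# True for every row that repeats an earlier row in all columns
mask = df.duplicated()

# The rows that drop_duplicates() would remove
print(df[mask])

# Filtering with the inverted mask gives the same result as drop_duplicates()
print(df[~mask])
```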
      

Inplace Operation

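By default, drop_duplicates() returns a new DataFrame with duplicates removed and leaves the original unchanged. If you want to modify the existing DataFrame in place, you can use the inplace=True argument. In that case the method returns None, so do not assign its result back to a variable.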


        # Removing duplicates in place
        df.drop_duplicates(inplace=True)
        print(df)
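
A short sketch of what an in-place call returns, again rebuilding the example DataFrame so it runs on its own. The variable name result is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Alice', 'Charlie', 'Bob'],
    'Age': [25, 30, 25, 35, 30],
    'City': ['New York', 'Los Angeles', 'New York', 'Chicago', 'Los Angeles'],
})

# With inplace=True the DataFrame itself is modified and the call returns None
result = df.drop_duplicates(inplace=True)
print(result)   # None
print(len(df))  # 3

# Pitfall: writing df = df.drop_duplicates(inplace=True) would replace df with None
```

If you prefer to keep the original data untouched, assign the returned DataFrame instead, as in the earlier examples.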
      

Keeping the Last Occurrence

By default, drop_duplicates() keeps the first occurrence of a duplicate row and removes the rest. If you want to keep the last occurrence instead, you can use the keep='last' argument.

        
        # Keeping the last occurrence of duplicates
        df_no_duplicates = df.drop_duplicates(keep='last')
        print(df_no_duplicates)
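
The keep argument also accepts False, which drops every row that has a duplicate instead of keeping one copy. The sketch below compares the three settings on the example data by looking at which index labels survive.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Alice', 'Charlie', 'Bob'],
    'Age': [25, 30, 25, 35, 30],
    'City': ['New York', 'Los Angeles', 'New York', 'Chicago', 'Los Angeles'],
})

first = df.drop_duplicates(keep='first')  # default: keep the earliest copy
last = df.drop_duplicates(keep='last')    # keep the latest copy
none = df.drop_duplicates(keep=False)     # drop all rows that have a duplicate

print(list(first.index))  # [0, 1, 3]
print(list(last.index))   # [2, 3, 4]
print(list(none.index))   # [3]
```

Note that the original index labels are preserved, so the result of keep='last' starts at 2 rather than 0.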
      

These are the basic ways to remove duplicates in Pandas. The subset, keep, and inplace arguments can be combined in a single call to fit your specific use case.
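
As one possible combination, the sketch below keeps only the most recent row per name. The records DataFrame is hypothetical data made up for illustration, and the ignore_index=True argument (available in pandas 1.0 and later) is an addition not covered above; it renumbers the result from 0.

```python
import pandas as pd

# Hypothetical data: 'Alice' appears twice, with the newer city listed later
records = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Alice', 'Charlie'],
    'City': ['New York', 'Los Angeles', 'Boston', 'Chicago'],
})

# Match on 'Name' only, keep the last row for each name, and reset the index
latest = records.drop_duplicates(subset=['Name'], keep='last', ignore_index=True)
print(latest)
```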


