Merging two Dataframes and removing duplicates

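Merging two dataframes is straightforward in pandas with DataFrame.merge().

In the example below, I inner-join df_rpt and df_sj on their shared domain column; indicator=True adds a _merge column recording which frame each row came from: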

df_merged = df_rpt.merge(df_sj, how="inner", on="domain", indicator=True)
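To see what that call produces, here is a minimal sketch on made-up data. The original frames are not shown, so the visits and owner columns and all values below are assumptions for illustration only. Because a.com appears twice in df_sj, the merged result contains two a.com rows, and with how="inner" the _merge column is always "both".

```python
import pandas as pd

# Hypothetical sample data: only the "domain" column comes from the original
# note; "visits", "owner" and every value here are invented for illustration.
df_rpt = pd.DataFrame(
    {"domain": ["a.com", "b.com", "c.com"], "visits": [10, 20, 30]}
)
df_sj = pd.DataFrame(
    {"domain": ["a.com", "a.com", "b.com", "d.com"], "owner": ["x", "y", "z", "w"]}
)

# Inner join: keep only domains present in both frames.
# indicator=True adds a "_merge" column describing each row's origin.
df_merged = df_rpt.merge(df_sj, how="inner", on="domain", indicator=True)

print(df_merged)
```

c.com and d.com are dropped because each appears in only one frame.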

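Merging can produce duplicate rows, for example when a domain appears more than once in either data frame. I removed these using DataFrame.drop_duplicates(), which by default keeps the first row for each value in subset.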

Here:

df_merged_no_dupes = df_merged.drop_duplicates(subset=["domain"]).reset_index(drop=True)

Full example of merging two data frames and then removing duplicates:

df_merged = df_rpt.merge(df_sj, how="inner", on="domain", indicator=True)
df_merged_no_dupes = df_merged.drop_duplicates(subset=["domain"]).reset_index(
    drop=True
)
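For a version that can be run end to end, the sketch below wraps the same two calls around made-up input. As before, the visits and owner columns and their values are assumptions, not the original data. It also shows keep="last" as one alternative in case the first row per domain is not the one you want to keep.

```python
import pandas as pd

# Hypothetical sample data: only "domain" is taken from the original note.
df_rpt = pd.DataFrame(
    {"domain": ["a.com", "b.com", "c.com"], "visits": [10, 20, 30]}
)
df_sj = pd.DataFrame(
    {"domain": ["a.com", "a.com", "b.com", "d.com"], "owner": ["x", "y", "z", "w"]}
)

df_merged = df_rpt.merge(df_sj, how="inner", on="domain", indicator=True)

# Default keep="first": the first row seen for each domain survives.
# reset_index(drop=True) renumbers the rows 0..n-1 and discards the old index.
df_merged_no_dupes = df_merged.drop_duplicates(subset=["domain"]).reset_index(
    drop=True
)

# Alternative: keep the last row seen for each domain instead.
df_keep_last = df_merged.drop_duplicates(
    subset=["domain"], keep="last"
).reset_index(drop=True)

print(df_merged_no_dupes)
print(df_keep_last)
```

Note that subset=["domain"] compares only the domain column, so rows that differ in other columns are still treated as duplicates and all but one are discarded.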