pandas: Remove NaN (missing values) with dropna() | note.nkmk.me (2024)

You can remove NaN from pandas.DataFrame and pandas.Series with the dropna() method.

Contents

  • Remove rows/columns where all elements are NaN: how='all'
  • Remove rows/columns that contain at least one NaN: how='any' (default)
  • Remove rows/columns according to the number of non-missing values: thresh
  • Remove based on specific rows/columns: subset
  • Update the original object: inplace
  • For pandas.Series

While this article primarily deals with NaN (Not a Number), it's important to note that in pandas, None is also treated as a missing value.

  • Missing values in pandas (nan, None, pd.NA)

See the following article on extracting, replacing, and counting missing values.

  • pandas: Find rows/columns with NaN (missing values)
  • pandas: Replace NaN (missing values) with fillna()
  • pandas: Detect and count NaN (missing values) with isnull(), isna()

The sample code in this article uses pandas version 2.0.3. As an example, read a CSV file with missing values.

import pandas as pdprint(pd.__version__)# 2.0.3df = pd.read_csv('data/src/sample_pandas_normal_nan.csv')print(df)# name age state point other# 0 Alice 24.0 NY NaN NaN# 1 NaN NaN NaN NaN NaN# 2 Charlie NaN CA NaN NaN# 3 Dave 68.0 TX 70.0 NaN# 4 Ellen NaN CA 88.0 NaN# 5 Frank 30.0 NaN NaN NaN

Remove rows/columns where all elements are NaN: how='all'

By setting how='all', rows where all elements are NaN are removed.

print(df.dropna(how='all'))# name age state point other# 0 Alice 24.0 NY NaN NaN# 2 Charlie NaN CA NaN NaN# 3 Dave 68.0 TX 70.0 NaN# 4 Ellen NaN CA 88.0 NaN# 5 Frank 30.0 NaN NaN NaN

If axis is set to 1 or 'columns', columns where all elements are NaN are removed.

print(df.dropna(how='all', axis=1))# name age state point# 0 Alice 24.0 NY NaN# 1 NaN NaN NaN NaN# 2 Charlie NaN CA NaN# 3 Dave 68.0 TX 70.0# 4 Ellen NaN CA 88.0# 5 Frank 30.0 NaN NaN

Note that if axis is set to 0 or 'index', rows are removed. Since the default value of axis is 0, rows are removed if omitted, as shown in the first example.

In former versions, both rows and columns were removed with axis=[0, 1], but since version 1.0.0, axis can no longer be specified with a list or tuple. If you want to remove both rows and columns, you can repeatedly apply dropna().

# print(df.dropna(how='all', axis=[0, 1]))# TypeError: supplying multiple axes to axis is no longer supported.print(df.dropna(how='all').dropna(how='all', axis=1))# name age state point# 0 Alice 24.0 NY NaN# 2 Charlie NaN CA NaN# 3 Dave 68.0 TX 70.0# 4 Ellen NaN CA 88.0# 5 Frank 30.0 NaN NaN

Remove rows/columns that contain at least one NaN: how='any' (default)

To use as an example, remove rows and columns where all values are NaN.

df2 = df.dropna(how='all').dropna(how='all', axis=1)print(df2)# name age state point# 0 Alice 24.0 NY NaN# 2 Charlie NaN CA NaN# 3 Dave 68.0 TX 70.0# 4 Ellen NaN CA 88.0# 5 Frank 30.0 NaN NaN

By setting how='any', rows that contain at least one NaN are removed. Since the default value of how is 'any', the result is the same even if omitted.

print(df2.dropna(how='any'))# name age state point# 3 Dave 68.0 TX 70.0print(df2.dropna())# name age state point# 3 Dave 68.0 TX 70.0

If axis is set to 1 or 'columns', columns that contain at least one NaN are removed.

print(df2.dropna(axis=1))# name# 0 Alice# 2 Charlie# 3 Dave# 4 Ellen# 5 Frank

Remove rows/columns according to the number of non-missing values: thresh

With the thresh argument, you can remove rows and columns according to the number of non-missing values.

For example, if thresh=3, the rows that contain more than three non-missing values remain, and the other rows are removed.

print(df.dropna(thresh=3))# name age state point other# 0 Alice 24.0 NY NaN NaN# 3 Dave 68.0 TX 70.0 NaN# 4 Ellen NaN CA 88.0 NaN

If axis is set to 1 or 'columns', columns are removed.

print(df.dropna(thresh=3, axis=1))# name age state# 0 Alice 24.0 NY# 1 NaN NaN NaN# 2 Charlie NaN CA# 3 Dave 68.0 TX# 4 Ellen NaN CA# 5 Frank 30.0 NaN

Remove based on specific rows/columns: subset

If you want to remove based on specific rows and columns, specify a list of rows/columns labels (names) to the subset argument of dropna(). Even if you want to set only one label, you need to specify it as a list, like subset=['name'].

Since the default is how='any' and axis=0, rows with NaN in the columns specified by subset are removed.

print(df.dropna(subset=['age']))# name age state point other# 0 Alice 24.0 NY NaN NaN# 3 Dave 68.0 TX 70.0 NaN# 5 Frank 30.0 NaN NaN NaNprint(df.dropna(subset=['age', 'state']))# name age state point other# 0 Alice 24.0 NY NaN NaN# 3 Dave 68.0 TX 70.0 NaN

If how is set to 'all', rows with NaN in all specified columns are removed.

print(df.dropna(subset=['age', 'state'], how='all'))# name age state point other# 0 Alice 24.0 NY NaN NaN# 2 Charlie NaN CA NaN NaN# 3 Dave 68.0 TX 70.0 NaN# 4 Ellen NaN CA 88.0 NaN# 5 Frank 30.0 NaN NaN NaN

If axis is set to 1 or 'columns', columns are removed.

print(df.dropna(subset=[0, 4], axis=1))# name state# 0 Alice NY# 1 NaN NaN# 2 Charlie CA# 3 Dave TX# 4 Ellen CA# 5 Frank NaNprint(df.dropna(subset=[0, 4], axis=1, how='all'))# name age state point# 0 Alice 24.0 NY NaN# 1 NaN NaN NaN NaN# 2 Charlie NaN CA NaN# 3 Dave 68.0 TX 70.0# 4 Ellen NaN CA 88.0# 5 Frank 30.0 NaN NaN

An error is raised if a non-existent row or column name is specified. An error is also raised if you set axis=1 but specify column names or set axis=0 (default) but specify row names.

# print(df.dropna(subset=['age', 'state', 'xxx']))# KeyError: ['xxx']# print(df.dropna(subset=['age', 'state'], axis=1))# KeyError: ['age', 'state']

Update the original object: inplace

As shown in the examples above, by default, a new object is returned, and the original object is not changed, but if inplace=True, the original object itself is updated.

df.dropna(subset=['age'], inplace=True)print(df)# name age state point other# 0 Alice 24.0 NY NaN NaN# 3 Dave 68.0 TX 70.0 NaN# 5 Frank 30.0 NaN NaN NaN

For pandas.Series

The only valid argument for dropna() of pandas.Series is inplace. Since it is one-dimensional data, the elements with NaN are simply removed.

s = pd.read_csv('data/src/sample_pandas_normal_nan.csv')['age']print(s)# 0 24.0# 1 NaN# 2 NaN# 3 68.0# 4 NaN# 5 30.0# Name: age, dtype: float64print(s.dropna())# 0 24.0# 3 68.0# 5 30.0# Name: age, dtype: float64s.dropna(inplace=True)print(s)# 0 24.0# 3 68.0# 5 30.0# Name: age, dtype: float64
pandas: Remove NaN (missing values) with dropna() | note.nkmk.me (2024)
Top Articles
Latest Posts
Article information

Author: Dong Thiel

Last Updated:

Views: 6183

Rating: 4.9 / 5 (59 voted)

Reviews: 90% of readers found this page helpful

Author information

Name: Dong Thiel

Birthday: 2001-07-14

Address: 2865 Kasha Unions, West Corrinne, AK 05708-1071

Phone: +3512198379449

Job: Design Planner

Hobby: Graffiti, Foreign language learning, Gambling, Metalworking, Rowing, Sculling, Sewing

Introduction: My name is Dong Thiel, I am a brainy, happy, tasty, lively, splendid, talented, cooperative person who loves writing and wants to share my knowledge and understanding with you.