pandas: Remove NaN (missing values) with dropna()

You can remove NaN from pandas.DataFrame and pandas.Series with the dropna() method.

Contents

Remove rows/columns where all elements are NaN: how='all'
Remove rows/columns that contain at least one NaN: how='any' (default)
Remove rows/columns according to the number of non-missing values: thresh
Remove based on specific rows/columns: subset
Update the original object: inplace
For pandas.Series

While this article primarily deals with NaN (Not a Number), it's important to note that in pandas, None is also treated as a missing value.

Missing values in pandas (nan, None, pd.NA)
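
As a quick check of the point above, the following sketch builds a small DataFrame that mixes np.nan, None, and pd.NA, and shows that dropna() treats all three as missing. The data and the names df_missing and cleaned are made up for illustration and are not part of the article's sample file.

```python
import numpy as np
import pandas as pd

# Hypothetical data: rows 1, 2, and 3 each hold a different missing-value marker.
df_missing = pd.DataFrame({
    'name': ['Alice', None, 'Charlie', 'Dave'],
    'age': [24.0, 30.0, np.nan, 68.0],
    'state': ['NY', 'CA', 'TX', pd.NA],
})

# isna() flags None, np.nan, and pd.NA alike: one missing value per column.
print(df_missing.isna().sum().tolist())
# [1, 1, 1]

# Only row 0 has no missing value, so it is the only row kept.
cleaned = df_missing.dropna()
print(cleaned.index.tolist())
# [0]
```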

See the following articles on extracting, replacing, and counting missing values.

pandas: Find rows/columns with NaN (missing values)
pandas: Replace NaN (missing values) with fillna()
pandas: Detect and count NaN (missing values) with isnull(), isna()

The sample code in this article uses pandas version 2.0.3. As an example, read a CSV file with missing values.

sample_pandas_normal_nan.csv

import pandas as pd

print(pd.__version__)
# 2.0.3

df = pd.read_csv('data/src/sample_pandas_normal_nan.csv')
print(df)
#       name   age state  point  other
# 0    Alice  24.0    NY    NaN    NaN
# 1      NaN   NaN   NaN    NaN    NaN
# 2  Charlie   NaN    CA    NaN    NaN
# 3     Dave  68.0    TX   70.0    NaN
# 4    Ellen   NaN    CA   88.0    NaN
# 5    Frank  30.0   NaN    NaN    NaN

source: pandas_nan_dropna.py

Remove rows/columns where all elements are `NaN`: `how='all'`

By setting how='all', rows where all elements are NaN are removed.

print(df.dropna(how='all'))
#       name   age state  point  other
# 0    Alice  24.0    NY    NaN    NaN
# 2  Charlie   NaN    CA    NaN    NaN
# 3     Dave  68.0    TX   70.0    NaN
# 4    Ellen   NaN    CA   88.0    NaN
# 5    Frank  30.0   NaN    NaN    NaN

source: pandas_nan_dropna.py
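
One way to read how='all' is that a row is kept as long as it has at least one non-missing value. The sketch below checks this against an equivalent boolean mask. The inline DataFrame is an assumed stand-in for the sample CSV so that the snippet runs without the file, and the names dropped and masked are illustrative.

```python
import numpy as np
import pandas as pd

# Inline stand-in for the sample CSV (assumed values, so no file is needed).
df = pd.DataFrame({
    'name': ['Alice', np.nan, 'Charlie', 'Dave', 'Ellen', 'Frank'],
    'age': [24, np.nan, np.nan, 68, np.nan, 30],
    'state': ['NY', np.nan, 'CA', 'TX', 'CA', np.nan],
    'point': [np.nan, np.nan, np.nan, 70, 88, np.nan],
    'other': [np.nan] * 6,
})

dropped = df.dropna(how='all')

# Keep rows in which any element is non-missing.
masked = df[df.notna().any(axis=1)]

print(dropped.equals(masked))
# True
```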

If axis is set to 1 or 'columns', columns where all elements are NaN are removed.

print(df.dropna(how='all', axis=1))
#       name   age state  point
# 0    Alice  24.0    NY    NaN
# 1      NaN   NaN   NaN    NaN
# 2  Charlie   NaN    CA    NaN
# 3     Dave  68.0    TX   70.0
# 4    Ellen   NaN    CA   88.0
# 5    Frank  30.0   NaN    NaN

source: pandas_nan_dropna.py

Remove rows/columns that contain at least one `NaN`: `how='any'` (default)

To prepare the example data for this section, first remove the rows and columns in which all values are NaN.

df2 = df.dropna(how='all').dropna(how='all', axis=1)
print(df2)
#       name   age state  point
# 0    Alice  24.0    NY    NaN
# 2  Charlie   NaN    CA    NaN
# 3     Dave  68.0    TX   70.0
# 4    Ellen   NaN    CA   88.0
# 5    Frank  30.0   NaN    NaN

source: pandas_nan_dropna.py

By setting how='any', rows that contain at least one NaN are removed. Since the default value of how is 'any', the result is the same even if omitted.

print(df2.dropna(how='any'))
#    name   age state  point
# 3  Dave  68.0    TX   70.0

print(df2.dropna())
#    name   age state  point
# 3  Dave  68.0    TX   70.0

source: pandas_nan_dropna.py
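
Because how='any' can remove most of a DataFrame, it may help to count the affected rows before dropping them. The following is a minimal sketch of that check. The inline df2 is an assumed copy of the data above, and has_nan is an illustrative name.

```python
import numpy as np
import pandas as pd

# Inline stand-in for df2 above (assumed values, so no CSV file is needed).
df2 = pd.DataFrame(
    {
        'name': ['Alice', 'Charlie', 'Dave', 'Ellen', 'Frank'],
        'age': [24, np.nan, 68, np.nan, 30],
        'state': ['NY', 'CA', 'TX', 'CA', np.nan],
        'point': [np.nan, np.nan, 70, 88, np.nan],
    },
    index=[0, 2, 3, 4, 5],
)

# True for every row that dropna(how='any') would remove.
has_nan = df2.isna().any(axis=1)
print(int(has_nan.sum()), 'of', len(df2), 'rows would be dropped')
# 4 of 5 rows would be dropped

# The rows that survive are exactly those not flagged.
print(df2[~has_nan].equals(df2.dropna()))
# True
```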

If axis is set to 1 or 'columns', columns that contain at least one NaN are removed.

print(df2.dropna(axis=1))
#       name
# 0    Alice
# 2  Charlie
# 3     Dave
# 4    Ellen
# 5    Frank

source: pandas_nan_dropna.py

Remove rows/columns according to the number of non-missing values: `thresh`

With the thresh argument, you can remove rows and columns according to the number of non-missing values.

For example, if thresh=3, the rows that contain at least three non-missing values remain, and the other rows are removed.

print(df.dropna(thresh=3))
#     name   age state  point  other
# 0  Alice  24.0    NY    NaN    NaN
# 3   Dave  68.0    TX   70.0    NaN
# 4  Ellen   NaN    CA   88.0    NaN

source: pandas_nan_dropna.py
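
To see why these three rows remain, it can help to count the non-missing values per row and compare the counts with thresh. The sketch below does that. The inline DataFrame is an assumed stand-in for the sample CSV, and counts and kept are illustrative names.

```python
import numpy as np
import pandas as pd

# Inline stand-in for the sample CSV (assumed values, so no file is needed).
df = pd.DataFrame({
    'name': ['Alice', np.nan, 'Charlie', 'Dave', 'Ellen', 'Frank'],
    'age': [24, np.nan, np.nan, 68, np.nan, 30],
    'state': ['NY', np.nan, 'CA', 'TX', 'CA', np.nan],
    'point': [np.nan, np.nan, np.nan, 70, 88, np.nan],
    'other': [np.nan] * 6,
})

# Number of non-missing values in each row.
counts = df.notna().sum(axis=1)
print(counts.tolist())
# [3, 0, 2, 4, 3, 2]

# thresh=3 keeps the rows whose count is at least 3.
kept = df.dropna(thresh=3)
print(kept.index.tolist())
# [0, 3, 4]

print(kept.equals(df[counts >= 3]))
# True
```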

If axis is set to 1 or 'columns', columns that contain fewer than thresh non-missing values are removed.

print(df.dropna(thresh=3, axis=1))
#       name   age state
# 0    Alice  24.0    NY
# 1      NaN   NaN   NaN
# 2  Charlie   NaN    CA
# 3     Dave  68.0    TX
# 4    Ellen   NaN    CA
# 5    Frank  30.0   NaN

source: pandas_nan_dropna.py
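
Since thresh is an absolute count, a ratio such as "at least half of the values present" has to be converted to a count first. The following sketch shows one way to do that for columns. The 50% rule and the names min_count and filtered are assumptions for illustration, and the inline DataFrame stands in for the sample CSV.

```python
import math

import numpy as np
import pandas as pd

# Inline stand-in for the sample CSV (assumed values, so no file is needed).
df = pd.DataFrame({
    'name': ['Alice', np.nan, 'Charlie', 'Dave', 'Ellen', 'Frank'],
    'age': [24, np.nan, np.nan, 68, np.nan, 30],
    'state': ['NY', np.nan, 'CA', 'TX', 'CA', np.nan],
    'point': [np.nan, np.nan, np.nan, 70, 88, np.nan],
    'other': [np.nan] * 6,
})

# Assumed rule for illustration: keep columns that are at least 50% filled.
min_count = math.ceil(0.5 * len(df))
print(min_count)
# 3

filtered = df.dropna(thresh=min_count, axis=1)
print(filtered.columns.tolist())
# ['name', 'age', 'state']
```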

Remove based on specific rows/columns: `subset`

To decide removal based only on specific rows or columns, pass a list of row or column labels (names) to the subset argument of dropna(). A single label can be given as a one-element list, such as subset=['name']; recent versions of pandas also accept a bare label.

Since the default is how='any' and axis=0, rows with NaN in the columns specified by subset are removed.

print(df.dropna(subset=['age']))
#     name   age state  point  other
# 0  Alice  24.0    NY    NaN    NaN
# 3   Dave  68.0    TX   70.0    NaN
# 5  Frank  30.0   NaN    NaN    NaN

print(df.dropna(subset=['age', 'state']))
#     name   age state  point  other
# 0  Alice  24.0    NY    NaN    NaN
# 3   Dave  68.0    TX   70.0    NaN

source: pandas_nan_dropna.py
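
Put differently, subset limits the NaN check to the listed columns while the whole row is still kept or removed. The sketch below compares dropna() with an equivalent boolean mask built from those columns only. The inline DataFrame is an assumed stand-in for the sample CSV, and by_dropna and by_mask are illustrative names.

```python
import numpy as np
import pandas as pd

# Inline stand-in for the sample CSV (assumed values, so no file is needed).
df = pd.DataFrame({
    'name': ['Alice', np.nan, 'Charlie', 'Dave', 'Ellen', 'Frank'],
    'age': [24, np.nan, np.nan, 68, np.nan, 30],
    'state': ['NY', np.nan, 'CA', 'TX', 'CA', np.nan],
    'point': [np.nan, np.nan, np.nan, 70, 88, np.nan],
    'other': [np.nan] * 6,
})

by_dropna = df.dropna(subset=['age', 'state'])

# Same selection by hand: both 'age' and 'state' must be non-missing.
by_mask = df[df[['age', 'state']].notna().all(axis=1)]

print(by_dropna.equals(by_mask))
# True

# NaN in 'point' and 'other' is ignored, so rows 0 and 3 are kept.
print(by_dropna.index.tolist())
# [0, 3]
```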

If how is set to 'all', rows with NaN in all specified columns are removed.

print(df.dropna(subset=['age', 'state'], how='all'))
#       name   age state  point  other
# 0    Alice  24.0    NY    NaN    NaN
# 2  Charlie   NaN    CA    NaN    NaN
# 3     Dave  68.0    TX   70.0    NaN
# 4    Ellen   NaN    CA   88.0    NaN
# 5    Frank  30.0   NaN    NaN    NaN

source: pandas_nan_dropna.py

If axis is set to 1 or 'columns', subset takes row labels instead, and the columns with NaN in the specified rows are removed.

print(df.dropna(subset=[0, 4], axis=1))
#       name state
# 0    Alice    NY
# 1      NaN   NaN
# 2  Charlie    CA
# 3     Dave    TX
# 4    Ellen    CA
# 5    Frank   NaN

print(df.dropna(subset=[0, 4], axis=1, how='all'))
#       name   age state  point
# 0    Alice  24.0    NY    NaN
# 1      NaN   NaN   NaN    NaN
# 2  Charlie   NaN    CA    NaN
# 3     Dave  68.0    TX   70.0
# 4    Ellen   NaN    CA   88.0
# 5    Frank  30.0   NaN    NaN

source: pandas_nan_dropna.py

An error is raised if a non-existent row or column name is specified. An error is also raised if you set axis=1 but specify column names or set axis=0 (default) but specify row names.

# print(df.dropna(subset=['age', 'state', 'xxx']))
# KeyError: ['xxx']

# print(df.dropna(subset=['age', 'state'], axis=1))
# KeyError: ['age', 'state']

source: pandas_nan_dropna.py
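
When the list of labels comes from elsewhere (for example, a configuration shared across several files), one possible guard is to keep only the labels that exist before calling dropna(). This is a sketch of that idea, not a pandas feature. The names wanted, existing, and result are illustrative, and the inline DataFrame is an assumed stand-in for the sample CSV.

```python
import numpy as np
import pandas as pd

# Inline stand-in for the sample CSV (assumed values, so no file is needed).
df = pd.DataFrame({
    'name': ['Alice', np.nan, 'Charlie', 'Dave', 'Ellen', 'Frank'],
    'age': [24, np.nan, np.nan, 68, np.nan, 30],
    'state': ['NY', np.nan, 'CA', 'TX', 'CA', np.nan],
    'point': [np.nan, np.nan, np.nan, 70, 88, np.nan],
    'other': [np.nan] * 6,
})

wanted = ['age', 'state', 'xxx']  # 'xxx' is deliberately not a column

# Passing the list as-is raises KeyError because of 'xxx'.
raised = False
try:
    df.dropna(subset=wanted)
except KeyError as e:
    raised = True
    print('KeyError:', e)

# Guard: keep only the labels that are actual column names.
existing = [col for col in wanted if col in df.columns]
print(existing)
# ['age', 'state']

result = df.dropna(subset=existing)
print(result.index.tolist())
# [0, 3]
```

Silently ignoring unknown labels can hide typos, so whether to filter or to let the KeyError surface depends on the use case.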

Update the original object: `inplace`

As the examples above show, dropna() returns a new object by default and leaves the original unchanged. If inplace=True, the original object itself is updated.

df.dropna(subset=['age'], inplace=True)
print(df)
#     name   age state  point  other
# 0  Alice  24.0    NY    NaN    NaN
# 3   Dave  68.0    TX   70.0    NaN
# 5  Frank  30.0   NaN    NaN    NaN

source: pandas_nan_dropna.py
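
One consequence worth noting is that dropna() returns None when inplace=True, so assigning the result back to a variable (as in df = df.dropna(inplace=True)) would replace the DataFrame with None. The sketch below illustrates the return value. The inline DataFrame is an assumed stand-in for the sample CSV, and result is an illustrative name.

```python
import numpy as np
import pandas as pd

# Inline stand-in for the sample CSV (assumed values, so no file is needed).
df = pd.DataFrame({
    'name': ['Alice', np.nan, 'Charlie', 'Dave', 'Ellen', 'Frank'],
    'age': [24, np.nan, np.nan, 68, np.nan, 30],
    'state': ['NY', np.nan, 'CA', 'TX', 'CA', np.nan],
})

# With inplace=True the method modifies df and returns None.
result = df.dropna(subset=['age'], inplace=True)
print(result)
# None

# df itself has been updated.
print(df.index.tolist())
# [0, 3, 5]
```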

For `pandas.Series`

For pandas.Series, dropna() does not take thresh or subset, and of the arguments covered above only inplace has an effect. Since a Series is one-dimensional, the elements that are NaN are simply removed.

s = pd.read_csv('data/src/sample_pandas_normal_nan.csv')['age']
print(s)
# 0    24.0
# 1     NaN
# 2     NaN
# 3    68.0
# 4     NaN
# 5    30.0
# Name: age, dtype: float64

print(s.dropna())
# 0    24.0
# 3    68.0
# 5    30.0
# Name: age, dtype: float64

s.dropna(inplace=True)
print(s)
# 0    24.0
# 3    68.0
# 5    30.0
# Name: age, dtype: float64

source: pandas_nan_dropna.py
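
As the output above shows, dropna() keeps the original index labels (0, 3, 5). If consecutive labels are wanted afterwards, one option is to chain reset_index(drop=True). The sketch below uses an inline Series with assumed values in place of the CSV column, and compact is an illustrative name.

```python
import numpy as np
import pandas as pd

# Inline stand-in for the 'age' column (assumed values, so no file is needed).
s = pd.Series([24, np.nan, np.nan, 68, np.nan, 30], name='age')

# Drop NaN, then renumber the index from 0.
compact = s.dropna().reset_index(drop=True)

print(compact.tolist())
# [24.0, 68.0, 30.0]

print(compact.index.tolist())
# [0, 1, 2]
```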

pandas: Remove NaN (missing values) with dropna() | note.nkmk.me (2024)
