Python Study Notes: Converting a Pandas DataFrame to a Dictionary and Filling NaN by Condition

Data file: Fish.csv

Goal: fill each NaN in the Length1 column with the mean Length1 of the same species.

In [1]:
import pandas as pd
import numpy as np

file = 'Fish.csv'

df = pd.read_csv(file)
print(len(df))       # number of rows
print(df.columns)    # column names
df.head()            # first five rows
159
Index(['Species', 'Weight', 'Length1', 'Length2', 'Length3', 'Height',
       'Width'],
      dtype='object')
Out[1]:
   Species  Weight  Length1  Length2  Length3   Height   Width
0    Bream   242.0     23.2     25.4     30.0  11.5200  4.0200
1    Bream   290.0     24.0     26.3     31.2  12.4800  4.3056
2    Bream   340.0     23.9     26.5     31.1  12.3778  4.6961
3    Bream   363.0     26.3     29.0     33.5  12.7300  4.4555
4    Bream   430.0      NaN     29.0     34.0  12.4440  5.1340
In [2]:
# Mean Length1 per species; mean() skips NaN by default
df_mean = df[['Species', 'Length1']].groupby('Species').mean()
df_mean
Out[2]:
             Length1
Species
Bream      30.417647
Parkki     18.727273
Perch      25.735714
Pike       42.476471
Roach      20.645000
Smelt      11.257143
Whitefish  28.800000
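
A small sketch of how the groupby step treats missing values: mean() ignores NaN by default, so each species average is computed from the known lengths only. The toy frame and the names toy and toy_mean are made up for illustration and are not taken from Fish.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not rows from Fish.csv
toy = pd.DataFrame({
    'Species': ['A', 'A', 'A', 'B', 'B'],
    'Length1': [10.0, np.nan, 20.0, 5.0, 7.0],
})

# NaN is skipped, so group A averages only 10.0 and 20.0
toy_mean = toy.groupby('Species')['Length1'].mean()
print(toy_mean)
```

Here group A comes out as 15.0 rather than NaN, which is why the averages can be used directly as fill values.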
In [3]:
# squeeze() turns the one-column DataFrame into a Series; to_dict() maps species -> mean
df_dict = df_mean.squeeze().to_dict()
df_dict
Out[3]:
{'Bream': 30.41764705882353,
 'Parkki': 18.727272727272727,
 'Perch': 25.735714285714288,
 'Pike': 42.476470588235294,
 'Roach': 20.645,
 'Smelt': 11.257142857142856,
 'Whitefish': 28.8}
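
squeeze() works here because df_mean has one column and several rows. As a possible alternative, selecting the column by name also gives a Series, and it keeps working when only one species is present, a case where squeeze() collapses the frame to a single scalar. The sketch below uses hypothetical toy values, not the Fish.csv results.

```python
import pandas as pd

# Hypothetical one-column frame shaped like df_mean
toy_mean = pd.DataFrame(
    {'Length1': [15.0, 6.0]},
    index=pd.Index(['A', 'B'], name='Species'),
)

# Both routes give the same dictionary when there are several rows
by_squeeze = toy_mean.squeeze().to_dict()
by_column = toy_mean['Length1'].to_dict()

# With a single row, squeeze() returns a scalar, not a Series
single = toy_mean.loc[['A']]
collapsed = single.squeeze()
single_dict = single['Length1'].to_dict()

print(by_column)
print(single_dict)
```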
In [4]:
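# Fill missing Length1 values with the mean of the same species.
# Looping over the dictionary visits each species once instead of once per row.
for species, mean_length in df_dict.items():
    df.loc[(df['Species'] == species) & (df['Length1'].isna()), 'Length1'] = mean_length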

df
Out[4]:
     Species  Weight    Length1  Length2  Length3   Height   Width
  0    Bream   242.0  23.200000     25.4     30.0  11.5200  4.0200
  1    Bream   290.0  24.000000     26.3     31.2  12.4800  4.3056
  2    Bream   340.0  23.900000     26.5     31.1  12.3778  4.6961
  3    Bream   363.0  26.300000     29.0     33.5  12.7300  4.4555
  4    Bream   430.0  30.417647     29.0     34.0  12.4440  5.1340
...      ...     ...        ...      ...      ...      ...     ...
154    Smelt    12.2  11.500000     12.2     13.4   2.0904  1.3936
155    Smelt    13.4  11.700000     12.4     13.5   2.4300  1.2690
156    Smelt    12.2  12.100000     13.0     13.8   2.2770  1.2558
157    Smelt    19.7  13.200000     14.3     15.2   2.8728  2.0672
158    Smelt    19.9  13.800000     15.0     16.2   2.9322  1.8792

159 rows × 7 columns
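
The same per-species fill can probably be done without an explicit loop. The sketch below shows two vectorized options on hypothetical toy data (the frame and the names toy, mean_by_species, filled_map and filled_transform are assumptions for illustration): mapping each row's species through a dictionary of means, or letting groupby().transform('mean') broadcast the group mean back to every row.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not rows from Fish.csv
toy = pd.DataFrame({
    'Species': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Length1': [10.0, np.nan, 20.0, 5.0, 7.0, np.nan],
})

# Option 1: look up each row's species mean through a dictionary
mean_by_species = toy.groupby('Species')['Length1'].mean().to_dict()
filled_map = toy['Length1'].fillna(toy['Species'].map(mean_by_species))

# Option 2: transform returns the group mean aligned to the original rows
group_mean = toy.groupby('Species')['Length1'].transform('mean')
filled_transform = toy['Length1'].fillna(group_mean)

print(filled_map.tolist())  # [10.0, 15.0, 20.0, 5.0, 7.0, 6.0]
```

Both options only touch the NaN entries, because fillna leaves existing values unchanged. The dictionary route is closer to the approach in this note, while transform skips the intermediate dictionary.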
