Python Study Notes: Converting a Pandas DataFrame to a Dictionary and Filling NaN Conditionally with fillna

Data file: Fish.csv

Goal: fill each NaN in Length1 with the mean Length1 of the same species.

In [1]:

import pandas as pd
import numpy as np

file = 'Fish.csv'

df = pd.read_csv(file)
print(len(df))       # number of rows
print(df.columns)    # column names
df.head()

159
Index(['Species', 'Weight', 'Length1', 'Length2', 'Length3', 'Height',
       'Width'],
      dtype='object')

Out[1]:

	Species	Weight	Length1	Length2	Length3	Height	Width
0	Bream	242.0	23.2	25.4	30.0	11.5200	4.0200
1	Bream	290.0	24.0	26.3	31.2	12.4800	4.3056
2	Bream	340.0	23.9	26.5	31.1	12.3778	4.6961
3	Bream	363.0	26.3	29.0	33.5	12.7300	4.4555
4	Bream	430.0	NaN	29.0	34.0	12.4440	5.1340

In [2]:

# Mean Length1 per species; mean() skips NaN by default
df_mean = df[['Species', 'Length1']].groupby('Species').mean()
df_mean

Out[2]:

	Length1
Species
Bream	30.417647
Parkki	18.727273
Perch	25.735714
Pike	42.476471
Roach	20.645000
Smelt	11.257143
Whitefish	28.800000
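
The group means above are computed from the non-missing values only, because mean() skips NaN by default. A minimal sketch of that behavior, using a made-up toy DataFrame (the name `toy` and its values are assumptions for illustration, not rows from Fish.csv):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not taken from Fish.csv
toy = pd.DataFrame({
    "Species": ["Bream", "Bream", "Bream", "Smelt", "Smelt"],
    "Length1": [20.0, np.nan, 30.0, 10.0, 12.0],
})

# The NaN row is ignored, so Bream averages only 20.0 and 30.0
toy_mean = toy.groupby("Species")["Length1"].mean()
print(toy_mean)
```

Selecting the column before calling mean() returns a Series directly, which is one way to avoid the squeeze() step later on.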

In [3]:

# squeeze() turns the one-column DataFrame into a Series, so to_dict() maps species -> mean
df_dict = df_mean.squeeze().to_dict()
df_dict

Out[3]:

{'Bream': 30.41764705882353,
 'Parkki': 18.727272727272727,
 'Perch': 25.735714285714288,
 'Pike': 42.476470588235294,
 'Roach': 20.645,
 'Smelt': 11.257142857142856,
 'Whitefish': 28.8}
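
Why squeeze() is needed here can be seen by comparing the dictionaries that a DataFrame and a Series produce. The sketch below uses a hypothetical two-species table (`toy_mean` and its values are assumptions, not the real means):

```python
import pandas as pd

# Hypothetical one-column DataFrame shaped like df_mean
toy_mean = pd.DataFrame(
    {"Length1": [25.0, 11.0]},
    index=pd.Index(["Bream", "Smelt"], name="Species"),
)

# DataFrame.to_dict() nests the values under the column name
nested = toy_mean.to_dict()

# A Series gives the flat species -> mean mapping
flat = toy_mean["Length1"].to_dict()
squeezed = toy_mean.squeeze().to_dict()

print(nested)
print(flat)
```

Selecting the column by name is probably the safer of the two flat forms: squeeze() on a DataFrame with a single row and a single column returns a scalar, which has no to_dict() method.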

In [4]:

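# Loop once per species (not once per row) and fill that species' missing Length1 with its mean
for species, mean_length in df_dict.items():
    df.loc[(df['Species'] == species) & (df['Length1'].isna()), 'Length1'] = mean_length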

df

Out[4]:

	Species	Weight	Length1	Length2	Length3	Height	Width
0	Bream	242.0	23.200000	25.4	30.0	11.5200	4.0200
1	Bream	290.0	24.000000	26.3	31.2	12.4800	4.3056
2	Bream	340.0	23.900000	26.5	31.1	12.3778	4.6961
3	Bream	363.0	26.300000	29.0	33.5	12.7300	4.4555
4	Bream	430.0	30.417647	29.0	34.0	12.4440	5.1340
…	…	…	…	…	…	…	…
154	Smelt	12.2	11.500000	12.2	13.4	2.0904	1.3936
155	Smelt	13.4	11.700000	12.4	13.5	2.4300	1.2690
156	Smelt	12.2	12.100000	13.0	13.8	2.2770	1.2558
157	Smelt	19.7	13.200000	14.3	15.2	2.8728	2.0672
158	Smelt	19.9	13.800000	15.0	16.2	2.9322	1.8792

159 rows × 7 columns
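
The same fill can likely be done without an explicit loop. The following is a sketch on made-up toy data (the names `toy`, `toy_dict`, `filled_by_map`, and `filled_by_transform` are assumptions for illustration), showing one variant that reuses a species-to-mean dictionary and one that skips the dictionary entirely:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not taken from Fish.csv
toy = pd.DataFrame({
    "Species": ["Bream", "Bream", "Bream", "Smelt", "Smelt"],
    "Length1": [20.0, np.nan, 30.0, 10.0, 12.0],
})

# Species -> mean dictionary, analogous to df_dict
toy_dict = toy.groupby("Species")["Length1"].mean().to_dict()

# Option A: map each row's species to its mean, then fill only the NaN slots
filled_by_map = toy["Length1"].fillna(toy["Species"].map(toy_dict))

# Option B: transform("mean") returns a Series aligned row by row with toy
filled_by_transform = toy["Length1"].fillna(
    toy.groupby("Species")["Length1"].transform("mean")
)

print(filled_by_map.tolist())
```

Both variants leave existing values untouched and only replace NaN, which matches what the loop does.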
