Python Study Notes: Regular Expressions (Regex), Part 2

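import pandas as pd

# Load the sample student data (deliberately non-normalized) from Excel
file_location = '0000_學生資料範例檔_非正規.xlsx'
df = pd.read_excel(file_location)

print(len(df))       # number of rows
print(df.columns)    # column names
df.info()            # dtypes and non-null counts
df                   # display the DataFrame (notebook cell output)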
# Find student IDs (學號) that contain at least one word character (\w: letter, digit, or underscore)
df[df['學號'].str.contains(r'\w')]['學號']
0      a 1002252
1     U100  7128
2      U_1017113
3      U1017117 
4       U1017146
5      U102九7140
6       U1027153
7       U1017123
8       U1027130
9       U1027156
10      U1027119
11      U1037130
12      U1037131
13      U1047102
14      U1047113
15      U1047120
16      U1047124
Name: 學號, dtype: object
# Find student IDs (學號) that contain at least one non-word character (\W), such as a space
df[df['學號'].str.contains(r'\W')]['學號']
0     a 1002252
1    U100  7128
3     U1017117 
Name: 學號, dtype: object
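Only three rows match `\W` because `\w` covers more than ASCII letters and digits: it also matches the underscore and, for Python 3 string patterns, Unicode letters such as 九. The sketch below uses the standard `re` module on a few values copied from the output above to list exactly which characters `\W` catches. The list `samples` is a stand-in for the column and is not part of the original notes.

```python
import re

# Sample values copied from the output above (stand-in for df['學號'])
samples = ['a 1002252', 'U100  7128', 'U_1017113', 'U1017117 ', 'U102九7140']

for s in samples:
    # findall returns every single character in s that \W matches
    non_word = re.findall(r'\W', s)
    print(repr(s), non_word)

# Keep only the strings that contain at least one non-word character
non_word_hits = [s for s in samples if re.search(r'\W', s)]
print(non_word_hits)
```

In this data the only non-word characters are spaces, so 'U_1017113' and 'U102九7140' pass the `\W` check untouched.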
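# Remove non-word characters from 學號 (replace with ""); underscores and CJK characters count as word characters, so they remain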
df['學號'].str.replace(r'\W+', '', regex=True)
0      a1002252
1      U1007128
2     U_1017113
3      U1017117
4      U1017146
5     U102九7140
6      U1027153
7      U1017123
8      U1027130
9      U1027156
10     U1027119
11     U1037130
12     U1037131
13     U1047102
14     U1047113
15     U1047120
16     U1047124
Name: 學號, dtype: object
# Remove everything except ASCII letters and digits from 學號 (replace with "")
df['學號'].str.replace(r'[^a-zA-Z0-9]', '', regex=True)
0     a1002252
1     U1007128
2     U1017113
3     U1017117
4     U1017146
5     U1027140
6     U1027153
7     U1017123
8     U1027130
9     U1027156
10    U1027119
11    U1037130
12    U1037131
13    U1047102
14    U1047113
15    U1047120
16    U1047124
Name: 學號, dtype: object
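After cleaning, a natural follow-up is to check whether each ID has the expected shape. The sketch below assumes a valid ID is `U` followed by exactly seven digits. That format is inferred from the sample output and is not stated in these notes. The Series `ids` is a hand-built stand-in for `df['學號']`.

```python
import pandas as pd

# Hypothetical stand-in for df['學號'], using values from the output above
ids = pd.Series(['a 1002252', 'U100  7128', 'U_1017113',
                 'U1017117 ', 'U102九7140', 'U1027153'])

# Same cleaning step as above: drop everything except ASCII letters and digits
cleaned = ids.str.replace(r'[^a-zA-Z0-9]', '', regex=True)

# Assumed format (not stated in the source): 'U' followed by exactly 7 digits
is_valid = cleaned.str.fullmatch(r'U\d{7}')

# Rows that still fail the assumed format after cleaning
print(cleaned[~is_valid])
```

Here only row 0 (`a1002252`) fails, because stripping characters cannot fix a wrong prefix. It needs a separate correction rule.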
# Keep only the digits of 學號 by removing every non-digit character
print(df['學號'].str.replace(r'\D', '', regex=True))

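# Same result for this data, using an explicit character class (note: \D keeps non-ASCII Unicode digits, [^0-9] does not)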
print(df['學號'].str.replace(r'[^0-9]', '', regex=True))
0     1002252
1     1007128
2     1017113
3     1017117
4     1017146
5     1027140
6     1027153
7     1017123
8     1027130
9     1027156
10    1027119
11    1037130
12    1037131
13    1047102
14    1047113
15    1047120
16    1047124
Name: 學號, dtype: object
0     1002252
1     1007128
2     1017113
3     1017117
4     1017146
5     1027140
6     1027153
7     1017123
8     1027130
9     1027156
10    1027119
11    1037130
12    1037131
13    1047102
14    1047113
15    1047120
16    1047124
Name: 學號, dtype: object
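The two patterns give identical output here, but they are not strictly interchangeable. For Python 3 string patterns, `\d` matches any Unicode decimal digit, so `\D` leaves full-width digits in place while `[^0-9]` removes them. The sketch below shows the difference with the standard `re` module. The string `s` is a made-up ID containing full-width digits and does not come from the sample file.

```python
import re

# Hypothetical ID containing the full-width digits '１２' (not in the source data)
s = 'U10１２345'

unicode_digits = re.sub(r'\D', '', s)                # full-width digits survive
ascii_digits = re.sub(r'[^0-9]', '', s)              # only ASCII 0-9 survive
ascii_flag = re.sub(r'\D', '', s, flags=re.ASCII)    # re.ASCII restricts \d to 0-9

print(unicode_digits)
print(ascii_digits)
print(ascii_flag)

# The CJK numeral 九 is not a decimal digit, so both patterns remove it
print(re.sub(r'\D', '', 'U102九7140'))
```

If the data may contain full-width input, `[^0-9]` (or `\D` with `re.ASCII`) is the safer choice when the goal is ASCII-only digits.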
