Python學習筆記: 批次刪除word docx 檔之頁首及頁尾(頁碼)

在實做本篇前，如果你的word檔案是舊版doc附檔名，
要先參考前一篇，將所有word舊版doc檔轉換成docx檔：

https://kuo.us.to/wordpress/python%e5%ad%b8%e7%bf%92%e7%ad%86%e8%a8%98/823/

在Python批次處理Word檔案的過程，可以使用docx這個模組，

例如收了20幾個word檔案，想要將word個別檔案的頁首和頁碼等資料去除後，
再來批次轉成pdf檔案編頁碼。

但這個模組只能處理Word的docx副檔名。

以下為示範檔案的下載連結：

https://drive.google.com/file/d/189VuhPyHImQV88592M9uwjSMcJgm6Bwu/view?usp=share_link

先看一下資料夾中有5個示範檔案，有些有頁首，有些有頁碼，也有些二者皆有：

打開「附件2-1_示範檔案3_有頁首_有頁碼.docx」檔案，
看一下頁首是一張相片，頁碼是P.1，P.2。

接下來用程式處理這些示範檔案：

import os
from docx import Document

# 設定目標資料夾
file_path = "d:\\temp\\test"
 
# 找出所有doc檔做成list
files = [x for x in os.listdir(file_path) if x.endswith(".docx")]

for file in files:
    docx_file = os.path.join(file_path, file)
    edited_docx_file = os.path.join(file_path, os.path.splitext(file)[0] + "_edited" + ".docx")
    document = Document(docx_file)
    for section in document.sections:
        section.different_first_page_header_footer = False
        section.header.is_linked_to_previous = True # 如果設定為False，則保留頁首
        section.footer.is_linked_to_previous = True # 如果設定為False，則保留頁尾
        print(f'正在處理{docx_file}')
        print('----------------------')
        
    document.save(edited_docx_file)
     
print('所有word檔移除頁首、頁尾：已完成')

程式處理檔案後，會將原檔名再加_edited後存檔：

打開「附件2-1_示範檔案3_有頁首_有頁碼_edited.docx」檔案，
原先的相片頁首和P.1，P.2的頁尾已被移除：

其他的檔案就留給大家自行查看結果。

但這個模組只能處理Word的docx副檔名。

發佈留言 取消回覆

發佈留言取消回覆