import docx  # provided by the python-docx package


def read_word_document(file_path):
    # Load the .docx file and return the text of each paragraph
    doc = docx.Document(file_path)
    paragraphs = [p.text for p in doc.paragraphs]
    return paragraphs


def remove_extra_newlines(paragraphs):
    # Replace line breaks inside each paragraph with a space
    cleaned_paragraphs = []
    for paragraph in paragraphs:
        cleaned_paragraph = paragraph.replace("\n", " ")
        cleaned_paragraphs.append(cleaned_paragraph)
    return cleaned_paragraphs


def save_modified_document(paragraphs, output_file_path):
    # Write the cleaned text into a new document, one paragraph per entry
    doc = docx.Document()
    for paragraph in paragraphs:
        doc.add_paragraph(paragraph)
    doc.save(output_file_path)


input_file = "input.docx"
output_file = "output.docx"

paragraphs = read_word_document(input_file)
cleaned_paragraphs = remove_extra_newlines(paragraphs)
save_modified_document(cleaned_paragraphs, output_file)
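One detail worth noting about remove_extra_newlines: because it swaps each "\n" for a single space, a run of several consecutive line breaks turns into several consecutive spaces. The following is a minimal sketch of an alternative, not part of the original script, that collapses each run of line breaks into one space. The name collapse_newlines and the sample strings are hypothetical and used only for illustration; it works on plain strings, so it can be tried without python-docx installed.

```python
import re


def collapse_newlines(paragraphs):
    """Hypothetical variant: turn each run of line breaks into one space."""
    cleaned = []
    for text in paragraphs:
        # Match one or more \r or \n characters together with any
        # whitespace around them, and replace the whole run with a space.
        collapsed = re.sub(r"\s*[\r\n]+\s*", " ", text)
        # Drop the leading or trailing space left by a break at either end.
        cleaned.append(collapsed.strip())
    return cleaned


# Illustrative input only, not taken from a real document
sample = ["first line\nsecond line", "a\n\n\nb", "no breaks here"]
print(collapse_newlines(sample))
# prints ['first line second line', 'a b', 'no breaks here']
```

If this behaviour is preferred, the function could be passed the list returned by read_word_document in place of remove_extra_newlines, since both take and return a list of strings.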