File Encoding Conversion


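Legacy projects tend to come with plenty of problems, and file encoding is usually the first one you hit. GBK and UTF-8 are not compatible with each other, so files open as garbled text under IDEA's default settings. Converting files one at a time is slow and tedious, so I put together a small tool that batch-converts everything to UTF-8, the internationally used encoding.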

import os
import chardet
import codecs


def write_file(file_path, content, encoding="utf-8"):
    # Overwrite the file with the given text in the target encoding.
    with codecs.open(file_path, "w", encoding) as f:
        f.write(content)


def convert_to_utf8(src_path):
    # Read the raw bytes so chardet can guess the current encoding.
    with open(src_path, "rb") as f:
        raw_data = f.read()
    detected = chardet.detect(raw_data)
    original_encoding = detected["encoding"]

    if original_encoding is None:
        print(f"[SKIP] {src_path}: encoding not detected")
        return

    if original_encoding.lower() != "utf-8":
        try:
            # Decode fully before writing, so a decode failure leaves the file untouched.
            with codecs.open(src_path, "r", original_encoding) as f:
                content = f.read()
            write_file(src_path, content, encoding="utf-8")
            print(f"[OK] {src_path}: {original_encoding} → utf-8")
        except Exception as e:
            print(f"[ERROR] {src_path}: failed to convert ({original_encoding}) - {e}")
    else:
        print(f"[SKIP] {src_path}: already utf-8")


def process_directory(root_dir):
    # Walk the tree recursively and convert every matching file in place.
    for parent, dirnames, filenames in os.walk(root_dir):
        for filename in filenames:
            if filename.endswith((".java", ".jsp")):
                full_path = os.path.join(parent, filename)
                convert_to_utf8(full_path)


if __name__ == "__main__":
    src_path = "C:/Users/File"
    process_directory(src_path)

The script mainly converts .java and .jsp files. If you need other file types, add their extensions to the tuple in the line if filename.endswith((".java", ".jsp")).
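As a rough illustration of extending the extension list, the sketch below pulls the tuple out into a setting. The names TARGET_EXTENSIONS, should_convert and iter_target_files are hypothetical and not part of the script above, and the extra .xml and .properties entries are only examples of what you might add.

```python
import os

# Hypothetical setting: extensions to convert, each with its leading dot.
# ".xml" and ".properties" are example additions, not part of the original script.
TARGET_EXTENSIONS = (".java", ".jsp", ".xml", ".properties")


def should_convert(filename, extensions=TARGET_EXTENSIONS):
    """Return True if filename ends with one of the extensions (case-insensitive)."""
    lowered = tuple(ext.lower() for ext in extensions)
    return filename.lower().endswith(lowered)


def iter_target_files(root_dir, extensions=TARGET_EXTENSIONS):
    """Yield the full path of every matching file under root_dir, recursively."""
    for parent, _dirnames, filenames in os.walk(root_dir):
        for filename in filenames:
            if should_convert(filename, extensions):
                yield os.path.join(parent, filename)
```

Unlike the original check, this version lowercases the file name first, so INDEX.JSP would also match. Whether that is desirable depends on your project.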

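Prerequisite: a working Python environment (3.6 or later, since the script uses f-strings).

  1. First, copy the code and save it as a .py file. Any name works, for example: convert.py

  2. Replace the path with the directory you want to convert. The root directory is enough, since subdirectories are processed recursively.

    src_path = "C:/Users/File"

    # Note: use forward slashes (/) as the path separator

  3. Open a terminal in the directory where you saved the file and run

    pip install chardet

  4. Finally, run

    python convert.py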
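Because the script overwrites files in place, it may be worth previewing what would change before running it (and backing up or committing the project first). The sketch below is a dependency-free check, not what the script above does: instead of asking chardet, it only tests whether each file already decodes as strict UTF-8. The names is_valid_utf8 and preview are hypothetical.

```python
import os


def is_valid_utf8(path):
    """Return True if the file's bytes decode as strict UTF-8 (pure ASCII counts too)."""
    with open(path, "rb") as f:
        raw = f.read()
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def preview(root_dir, extensions=(".java", ".jsp")):
    """Return paths under root_dir that are not valid UTF-8, without modifying anything."""
    pending = []
    for parent, _dirnames, filenames in os.walk(root_dir):
        for filename in filenames:
            if filename.endswith(extensions):
                full_path = os.path.join(parent, filename)
                if not is_valid_utf8(full_path):
                    pending.append(full_path)
    return pending
```

Calling preview("C:/Users/File") and printing each returned path lists the files that still need converting. The check says nothing about which encoding those files actually use, so it complements chardet rather than replacing it.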

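Lines starting with [OK] mean the conversion succeeded. Also, don't forget to change the pageEncoding at the top of each JSP file to UTF-8 as well; otherwise the deployed pages will be full of garbled text like 锟斤拷.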
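If there are many JSP files, the declaration can also be updated programmatically. The following is a minimal sketch under a few assumptions: it only touches <%@ page ... %> directives, it sets both pageEncoding and the charset inside contentType to UTF-8, and it works on text that has already been read as a string. The function name fix_jsp_declarations is hypothetical, and HTML meta tags or other places that declare a charset are not handled.

```python
import re

# A JSP page directive such as <%@ page contentType="..." pageEncoding="GBK" %>.
DIRECTIVE_RE = re.compile(r"<%@\s*page\b.*?%>", re.DOTALL)
# pageEncoding="GBK" or pageEncoding='GBK' (any current value).
PAGE_ENCODING_RE = re.compile(r"(pageEncoding\s*=\s*)([\"'])[^\"']*\2")
# The charset part of contentType="text/html; charset=GBK".
CHARSET_RE = re.compile(r"(charset\s*=\s*)[\w-]+", re.IGNORECASE)


def _fix_directive(match):
    """Rewrite the encoding declarations inside a single page directive."""
    directive = match.group(0)
    directive = PAGE_ENCODING_RE.sub(r"\1\2UTF-8\2", directive)
    return CHARSET_RE.sub(r"\1UTF-8", directive)


def fix_jsp_declarations(text):
    """Return text with every page directive declaring UTF-8."""
    return DIRECTIVE_RE.sub(_fix_directive, text)
```

One possible way to use it is to call fix_jsp_declarations on the decoded content of .jsp files before writing them back as UTF-8. Review the resulting diff before deploying, since a regular expression cannot cover every way a page may declare its encoding.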

Finally, here is a video for anyone unfamiliar with encodings. It is the same video on both platforms.

YouTube:

BiliBili: