A 7B folder is created here to hold the 7B model; create folders to match whichever model sizes you need.
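The folder setup can be sketched in Python (equivalent to `mkdir -p` for each size; only create the folders for the sizes you actually plan to download):

```python
import os

# One sub-folder per model size; exist_ok=True behaves like
# `mkdir -p`, so re-running the script is harmless.
for size in ("7B", "13B", "30B", "65B"):
    os.makedirs(size, exist_ok=True)
```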
2. Download the models with the wget command
Download commands for the 7B, 13B, 30B, and 65B models are shown below; download whichever sizes you need. In every case, tokenizer.model and tokenizer_checklist.chk are required.
wget https://agi.gpt4.org/llama/LLaMA/tokenizer.model -O ./tokenizer.model
wget https://agi.gpt4.org/llama/LLaMA/tokenizer_checklist.chk -O ./tokenizer_checklist.chk
wget https://agi.gpt4.org/llama/LLaMA/7B/consolidated.00.pth -O ./7B/consolidated.00.pth
wget https://agi.gpt4.org/llama/LLaMA/7B/params.json -O ./7B/params.json
wget https://agi.gpt4.org/llama/LLaMA/7B/checklist.chk -O ./7B/checklist.chk
wget https://agi.gpt4.org/llama/LLaMA/13B/consolidated.00.pth -O ./13B/consolidated.00.pth
wget https://agi.gpt4.org/llama/LLaMA/13B/consolidated.01.pth -O ./13B/consolidated.01.pth
wget https://agi.gpt4.org/llama/LLaMA/13B/params.json -O ./13B/params.json
wget https://agi.gpt4.org/llama/LLaMA/13B/checklist.chk -O ./13B/checklist.chk
wget https://agi.gpt4.org/llama/LLaMA/30B/consolidated.00.pth -O ./30B/consolidated.00.pth
wget https://agi.gpt4.org/llama/LLaMA/30B/consolidated.01.pth -O ./30B/consolidated.01.pth
wget https://agi.gpt4.org/llama/LLaMA/30B/consolidated.02.pth -O ./30B/consolidated.02.pth
wget https://agi.gpt4.org/llama/LLaMA/30B/consolidated.03.pth -O ./30B/consolidated.03.pth
wget https://agi.gpt4.org/llama/LLaMA/30B/params.json -O ./30B/params.json
wget https://agi.gpt4.org/llama/LLaMA/30B/checklist.chk -O ./30B/checklist.chk
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.00.pth -O ./65B/consolidated.00.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.01.pth -O ./65B/consolidated.01.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.02.pth -O ./65B/consolidated.02.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.03.pth -O ./65B/consolidated.03.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.04.pth -O ./65B/consolidated.04.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.05.pth -O ./65B/consolidated.05.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.06.pth -O ./65B/consolidated.06.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.07.pth -O ./65B/consolidated.07.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/params.json -O ./65B/params.json
wget https://agi.gpt4.org/llama/LLaMA/65B/checklist.chk -O ./65B/checklist.chk
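The command list above follows a fixed pattern, so the URLs for one size can be generated with a small helper. This is an illustrative sketch: the shard counts (7B: 1, 13B: 2, 30B: 4, 65B: 8) are read directly off the commands above, and the base URL is the same mirror.

```python
BASE = "https://agi.gpt4.org/llama/LLaMA"
# Number of consolidated.*.pth shards per model size, per the wget list above.
SHARDS = {"7B": 1, "13B": 2, "30B": 4, "65B": 8}

def download_urls(size: str) -> list[str]:
    """Return every per-size file URL (tokenizer files are shared and not included)."""
    urls = [f"{BASE}/{size}/consolidated.{i:02d}.pth" for i in range(SHARDS[size])]
    urls += [f"{BASE}/{size}/params.json", f"{BASE}/{size}/checklist.chk"]
    return urls
```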
Method 3: magnet link
On Windows you can download via the magnet link below, using a client such as Thunder (Xunlei).
magnet:?xt=urn:btih:CDEE3052D85C697B84F4C1192F43A2276C0DAEA0
2. Convert the model
Once the download finishes, the files should be laid out as follows:
├── 7B
│ ├── checklist.chk
│ ├── consolidated.00.pth
│ └── params.json
├── tokenizer.model
└── tokenizer_checklist.chk
These files are PyTorch checkpoints (.pth format) and cannot be loaded by Hugging Face transformers directly; they must first be converted to the Hugging Face format.
Organize the files into the layout shown above so they can be converted into the model format Hugging Face training expects.
【【【 The files must be arranged in exactly this directory layout 】】】
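Before converting, it is worth sanity-checking the layout against the tree above. A minimal sketch (the file list mirrors the 7B tree shown; `check_layout` is just an illustrative helper name):

```python
import os

def check_layout(root: str, size: str = "7B") -> list[str]:
    """Return the required files that are missing under root, per the tree above."""
    required = [
        "tokenizer.model",
        "tokenizer_checklist.chk",
        os.path.join(size, "checklist.chk"),
        os.path.join(size, "consolidated.00.pth"),
        os.path.join(size, "params.json"),
    ]
    return [f for f in required if not os.path.exists(os.path.join(root, f))]
```

An empty return value means the layout matches and conversion can proceed.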
1. Install the transformers library
pip install transformers
2. Convert the model with the convert_llama_weights_to_hf.py script
References:
https://huggingface.co/docs/transformers/main/en/model_doc/llama
https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/convert_llama_weights_to_hf.py
python -m transformers.models.llama.convert_llama_weights_to_hf \
--input_dir wget-llama \
--model_size 7B \
--output_dir path_to_llama_model
input_dir: directory containing the original weights (the folder holding tokenizer.model and the 7B/13B/... subfolders)
model_size: which model to convert (7B, 13B, 30B, or 65B)
output_dir: directory to write the converted Hugging Face model to
1. Killed
Fetching all parameters from the checkpoint at wget-llama/7B.
Killed
The process was killed by the operating system, which typically means it ran out of RAM while loading the checkpoint; free up memory or use a machine with more RAM. (The conversion runs on the CPU and does not require a GPU.)
2. protoc or protobuf
TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
1. Downgrade the protobuf package to 3.20.x or lower.
2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).
The installed protobuf version is too new; downgrade it:
pip install protobuf==3.19.0 -U
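The version rule stated in the error message (protobuf 3.20.x or lower avoids the Descriptors error) can be captured in a tiny check; `protobuf_ok` is just an illustrative helper name, not part of any library:

```python
def protobuf_ok(version: str) -> bool:
    """True if this protobuf version is 3.20.x or lower, per the error message above."""
    major, minor = (int(x) for x in version.split(".")[:2])
    return (major, minor) <= (3, 20)
```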
On a machine with a V100 GPU (32 GB VRAM), the conversion completes successfully in roughly ten minutes.
(sciengu_llama) root@autodl-container-9bb9119a00-f69936e8:~/autodl-tmp# python -m transformers.models.llama.convert_llama_weights_to_hf \
> --input_dir wget-llama \
> --model_size 7B \
> --output_dir path_to_llama_model
Fetching all parameters from the checkpoint at wget-llama/7B.
Loading the checkpoint in a Llama model.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 33/33 [00:43<00:00, 1.31s/it]
Saving in the Transformers format.
Saving a LlamaTokenizerFast to path_to_llama_model2.
Screenshot of the converted model (the path_to_llama_model folder)
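Once conversion finishes, the output folder can be loaded like any Hugging Face model. A minimal sketch, assuming the --output_dir used above and a transformers version recent enough to include the Llama classes (4.28 or later):

```python
def load_converted(model_dir: str):
    """Load the HF-format LLaMA weights and tokenizer from model_dir."""
    # Imported lazily so this module can be inspected without transformers installed.
    from transformers import LlamaForCausalLM, LlamaTokenizer
    tokenizer = LlamaTokenizer.from_pretrained(model_dir)
    model = LlamaForCausalLM.from_pretrained(model_dir)
    return model, tokenizer

if __name__ == "__main__":
    # "path_to_llama_model" is the --output_dir from the conversion command above.
    model, tokenizer = load_converted("path_to_llama_model")
```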