
A 7B folder is created here to hold the 7B model; adjust the folders yourself to match the model size(s) you need.
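For example, the per-size folders that the download commands below write into can be created like this (the folder names simply mirror the layout used in this article):

```shell
# One folder per model size; -p makes this a no-op if a folder already exists
mkdir -p 7B 13B 30B 65B
```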

2. Download the models with the wget command

The download commands for the 7B, 13B, 30B, and 65B models are listed below; download whichever sizes you need.

tokenizer.model and tokenizer_checklist.chk must be downloaded in every case.

wget https://agi.gpt4.org/llama/LLaMA/tokenizer.model -O ./tokenizer.model
wget https://agi.gpt4.org/llama/LLaMA/tokenizer_checklist.chk -O ./tokenizer_checklist.chk
wget https://agi.gpt4.org/llama/LLaMA/7B/consolidated.00.pth -O ./7B/consolidated.00.pth
wget https://agi.gpt4.org/llama/LLaMA/7B/params.json -O ./7B/params.json
wget https://agi.gpt4.org/llama/LLaMA/7B/checklist.chk -O ./7B/checklist.chk
wget https://agi.gpt4.org/llama/LLaMA/13B/consolidated.00.pth -O ./13B/consolidated.00.pth
wget https://agi.gpt4.org/llama/LLaMA/13B/consolidated.01.pth -O ./13B/consolidated.01.pth
wget https://agi.gpt4.org/llama/LLaMA/13B/params.json -O ./13B/params.json
wget https://agi.gpt4.org/llama/LLaMA/13B/checklist.chk -O ./13B/checklist.chk
wget https://agi.gpt4.org/llama/LLaMA/30B/consolidated.00.pth -O ./30B/consolidated.00.pth
wget https://agi.gpt4.org/llama/LLaMA/30B/consolidated.01.pth -O ./30B/consolidated.01.pth
wget https://agi.gpt4.org/llama/LLaMA/30B/consolidated.02.pth -O ./30B/consolidated.02.pth
wget https://agi.gpt4.org/llama/LLaMA/30B/consolidated.03.pth -O ./30B/consolidated.03.pth
wget https://agi.gpt4.org/llama/LLaMA/30B/params.json -O ./30B/params.json
wget https://agi.gpt4.org/llama/LLaMA/30B/checklist.chk -O ./30B/checklist.chk
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.00.pth -O ./65B/consolidated.00.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.01.pth -O ./65B/consolidated.01.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.02.pth -O ./65B/consolidated.02.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.03.pth -O ./65B/consolidated.03.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.04.pth -O ./65B/consolidated.04.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.05.pth -O ./65B/consolidated.05.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.06.pth -O ./65B/consolidated.06.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.07.pth -O ./65B/consolidated.07.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/params.json -O ./65B/params.json
wget https://agi.gpt4.org/llama/LLaMA/65B/checklist.chk -O ./65B/checklist.chk
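Before converting, it is worth verifying the downloads against the bundled checklists. Assuming the checklist.chk files are standard md5sum manifests (as in the original LLaMA release), the check for the real files would be `md5sum -c tokenizer_checklist.chk` and `(cd 7B && md5sum -c checklist.chk)`; here is a self-contained demo of the same flow with a stand-in file:

```shell
# checklist.chk files are md5 manifests; `md5sum -c` verifies files against them
printf 'demo weights' > weights.bin
md5sum weights.bin > demo_checklist.chk
md5sum -c demo_checklist.chk    # prints "weights.bin: OK" if the file is intact
```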

Method 3: magnet link

On Windows, you can download via the magnet link below using a client such as Thunder (Xunlei).

magnet:?xt=urn:btih:CDEE3052D85C697B84F4C1192F43A2276C0DAEA0

2. Convert the model

Once the model files are downloaded, the directory layout should look like this:

├── 7B
│   ├── checklist.chk
│   ├── consolidated.00.pth
│   └── params.json
├── tokenizer.model
└── tokenizer_checklist.chk

These files are in PyTorch .pth format and cannot be loaded by HuggingFace Transformers directly; they must first be converted to the HuggingFace format.

Arrange the files into the layout shown above before converting them into the model format that HuggingFace training expects.

【【【 The files must be arranged in exactly this directory layout 】】】
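A quick sanity check of the layout (the file list below assumes the 7B model; extend it for other sizes):

```shell
# Report any file the converter expects but cannot find
for f in tokenizer.model tokenizer_checklist.chk \
         7B/consolidated.00.pth 7B/params.json 7B/checklist.chk; do
  [ -f "$f" ] && echo "OK       $f" || echo "MISSING  $f"
done
```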

1. Install transformers

pip install transformers

2. Convert the model with the convert_llama_weights_to_hf.py script

References:

  • https://huggingface.co/docs/transformers/main/en/model_doc/llama
  • https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/convert_llama_weights_to_hf.py

python -m transformers.models.llama.convert_llama_weights_to_hf \
    --input_dir wget-llama \
    --model_size 7B \
    --output_dir path_to_llama_model


    input_dir: path to the downloaded original weights (the folder containing 7B/, tokenizer.model, ...)

    model_size: model size to convert (7B, 13B, 30B, or 65B)

    output_dir: path where the HuggingFace-format model is written

    1. Killed

    Fetching all parameters from the checkpoint at wget-llama/7B.
    Killed
    

    This "Killed" message means the process was terminated by the Linux OOM killer because it ran out of system RAM while loading the checkpoint, not because of a GPU problem; switch to a machine with more memory.

    2. protoc or protobuf

    TypeError: Descriptors cannot not be created directly.
    If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
    If you cannot immediately regenerate your protos, some other possible workarounds are:
     1. Downgrade the protobuf package to 3.20.x or lower.
     2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).
    

    The installed protobuf version is too new; downgrade it:

    pip install protobuf==3.19.0 -U
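After downgrading, it may help to confirm which protobuf version Python actually picks up (a quick check, assuming you run it in the same environment as the conversion):

```shell
# Print the protobuf version visible to this Python environment
python -c "import google.protobuf as pb; print(pb.__version__)"
```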
    

    On a V100 instance (32 GB VRAM), the conversion completed successfully in roughly 10 minutes.

    (sciengu_llama) root@autodl-container-9bb9119a00-f69936e8:~/autodl-tmp# python -m transformers.models.llama.convert_llama_weights_to_hf \
    >   --input_dir wget-llama \
    >   --model_size 7B \
    >   --output_dir path_to_llama_model
    Fetching all parameters from the checkpoint at wget-llama/7B.
    Loading the checkpoint in a Llama model.
    Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 33/33 [00:43<00:00,  1.31s/it]
    Saving in the Transformers format.
    Saving a LlamaTokenizerFast to path_to_llama_model2.
    

    Screenshot of the converted model (the path_to_llama_model folder):

     