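The Trainer class provides an API for feature-complete training in PyTorch for most standard use cases. Before instantiating a Trainer, create a TrainingArguments object to hold the training configuration. The API supports distributed training on multiple GPUs/TPUs, as well as mixed-precision training for PyTorch through NVIDIA Apex and native AMP.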
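The order described above (build the arguments first, then the trainer) can be sketched as follows. This is a minimal sketch, not a definitive setup: `training_kwargs` and `build_trainer` are hypothetical names introduced here, the values are illustrative, and the argument names are taken from the `TrainingArguments` signature listed later in this document.

```python
# Keyword arguments for TrainingArguments. Every key appears in the
# TrainingArguments signature; the values are illustrative assumptions.
training_kwargs = {
    "output_dir": "./trainer_output",  # hypothetical checkpoint directory
    "per_device_train_batch_size": 8,
    "num_train_epochs": 3.0,
    "learning_rate": 5e-5,
    "fp16": False,  # set to True on supported CUDA hardware for mixed precision
}


def build_trainer(model, train_dataset):
    """Create TrainingArguments first, then pass them to Trainer."""
    # Imported lazily so the configuration above can be inspected
    # even where transformers is not installed.
    from transformers import Trainer, TrainingArguments

    args = TrainingArguments(**training_kwargs)
    return Trainer(model=model, args=args, train_dataset=train_dataset)
```

Calling `build_trainer(model, train_dataset).train()` would then start training, with `model` and `train_dataset` supplied by the caller.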
The Trainer class contains the basic training loop. To inject custom behavior, subclass Trainer and override the following methods:
</div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"/><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">get_train_dataloader</span>; <span class="cm-variable">get_eval_dataloader</span>; <span class="cm-variable">get_test_dataloader</span>; <span class="cm-variable">log</span>; <span class="cm-variable">create_optimizer_and_scheduler</span>;</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">create_optimizer</span>; <span class="cm-variable">create_scheduler</span>; <span class="cm-variable">compute_loss</span>; <span class="cm-variable">training_step</span>; <span class="cm-variable">prediction_step</span>; <span class="cm-variable">evaluate</span>; <span class="cm-variable">predict</span></span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 44px;"/><div class="CodeMirror-gutters" style="display: none; height: 
44px;"/></div></div></pre><p><span>Below is an example of subclassing </span><code>Trainer</code><span> to use a weighted loss function:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><span><span></span>x</span></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">from</span> <span class="cm-variable">torch</span> <span class="cm-keyword">import</span> <span class="cm-variable">nn</span></span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">import</span> <span class="cm-variable">torch</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">from</span> <span class="cm-variable">transformers</span> <span class="cm-keyword">import</span> <span 
class="cm-variable">Trainer</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">class</span> <span class="cm-def">CustomTrainer</span>(<span class="cm-variable">Trainer</span>):</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">    <span class="cm-keyword">def</span> <span class="cm-def">compute_loss</span>(<span class="cm-variable-2">self</span>, <span class="cm-variable">model</span>, <span class="cm-variable">inputs</span>, <span class="cm-variable">return_outputs</span><span class="cm-operator">=</span><span class="cm-keyword">False</span>):</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">        <span class="cm-variable">labels</span> <span class="cm-operator">=</span> <span class="cm-variable">inputs</span>.<span class="cm-property">get</span>(<span class="cm-string">"labels"</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">        <span class="cm-comment"># forward pass</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">        <span class="cm-variable">outputs</span> <span class="cm-operator">=</span> <span class="cm-variable">model</span>(<span class="cm-operator">**</span><span class="cm-variable">inputs</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">        <span class="cm-variable">logits</span> <span class="cm-operator">=</span> <span class="cm-variable">outputs</span>.<span class="cm-property">get</span>(<span class="cm-string">"logits"</span>)</span></pre><pre 
class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">        <span class="cm-comment"># compute custom loss (suppose one has 3 labels with different weights)</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">        <span class="cm-variable">loss_fct</span> <span class="cm-operator">=</span> <span class="cm-variable">nn</span>.<span class="cm-property">CrossEntropyLoss</span>(<span class="cm-variable">weight</span><span class="cm-operator">=</span><span class="cm-variable">torch</span>.<span class="cm-property">tensor</span>([<span class="cm-number">1.0</span>, <span class="cm-number">2.0</span>, <span class="cm-number">3.0</span>]))</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">        <span class="cm-variable">loss</span> <span class="cm-operator">=</span> <span class="cm-variable">loss_fct</span>(<span class="cm-variable">logits</span>.<span class="cm-property">view</span>(<span class="cm-operator">-</span><span class="cm-number">1</span>, <span class="cm-variable-2">self</span>.<span class="cm-property">model</span>.<span class="cm-property">config</span>.<span class="cm-property">num_labels</span>), <span class="cm-variable">labels</span>.<span class="cm-property">view</span>(<span class="cm-operator">-</span><span class="cm-number">1</span>))</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">        <span class="cm-keyword">return</span> (<span class="cm-variable">loss</span>, <span class="cm-variable">outputs</span>) <span class="cm-keyword">if</span> <span class="cm-variable">return_outputs</span> <span class="cm-keyword">else</span> <span class="cm-variable">loss</span></span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; 
border-bottom-color: transparent; top: 286px;"/><div class="CodeMirror-gutters" style="display: none; height: 286px;"/></div></div></pre><p><span>In </span><code>PyTorch</code><span>, another way to customize </span><code>training loop</code><span> behavior is to use </span><code>callbacks</code><span>, which can inspect the </span><code>training loop state</code><span> (for progress reporting or for logging to </span><code>TensorBoard</code><span>) and make decisions (such as </span><code>early stopping</code><span>).</span></p></li><li><p><code>Trainer</code><span> is optimized for </span><code>Transformers</code><span> models. If you use </span><code>Trainer</code><span> with other models, you may see unexpected behavior. When using it with your own model, make sure that:</span></p><ul><li><span>Your model always returns a tuple or a subclass of </span><code>ModelOutput</code><span>.</span></li><li><span>If a </span><code>labels</code><span> argument is provided, your model can compute the loss, and that loss is returned as the first element of the tuple (if your model returns a tuple).</span></li><li><span>Your model can accept multiple </span><code>label</code><span> arguments (use </span><code>label_names</code><span> in </span><code>TrainingArguments</code><span> to tell </span><code>Trainer</code><span> their names), but none of them should be named </span><code>"label"</code><span>.</span></li></ul></li></ol><h3 id="11-api"><span>1.1 API</span></h3><h4 id="a-trainingarguments"><span>a. 
TrainingArguments</span></h4><ol start=""><li><p><code>class transformers.TrainingArguments</code><span>: the arguments used by </span><code>Trainer</code><span> (those related to the </span><code>training loop</code><span>).</span></p><p><span>With </span><code>class transformers.HfArgumentParser</code><span>, the fields of </span><code>TrainingArguments</code><span> can be turned into </span><code>argparse</code><span> arguments that can be specified on the command line.</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python" style="break-inside: unset;"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">class</span> <span class="cm-def">transformers</span>.<span 
class="cm-property">TrainingArguments</span>(</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">output_dir</span>: <span class="cm-builtin">str</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">overwrite_output_dir</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">do_train</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">do_eval</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">do_predict</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">evaluation_strategy</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Union</span>[<span class="cm-variable">transformers</span>.<span class="cm-property">trainer_utils</span>.<span class="cm-property">IntervalStrategy</span>, <span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-string">'no'</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span 
class="cm-variable">prediction_loss_only</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">per_device_train_batch_size</span>: <span class="cm-builtin">int</span> <span class="cm-operator">=</span> <span class="cm-number">8</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">per_device_eval_batch_size</span>: <span class="cm-builtin">int</span> <span class="cm-operator">=</span> <span class="cm-number">8</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">per_gpu_train_batch_size</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">int</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">per_gpu_eval_batch_size</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">int</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">gradient_accumulation_steps</span>: <span class="cm-builtin">int</span> <span class="cm-operator">=</span> <span class="cm-number">1</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">eval_accumulation_steps</span>: <span class="cm-variable">typing</span>.<span 
class="cm-property">Optional</span>[<span class="cm-builtin">int</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">eval_delay</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">float</span>] <span class="cm-operator">=</span> <span class="cm-number">0</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">learning_rate</span>: <span class="cm-builtin">float</span> <span class="cm-operator">=</span> <span class="cm-number">5e-05</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">weight_decay</span>: <span class="cm-builtin">float</span> <span class="cm-operator">=</span> <span class="cm-number">0.0</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">adam_beta1</span>: <span class="cm-builtin">float</span> <span class="cm-operator">=</span> <span class="cm-number">0.9</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">adam_beta2</span>: <span class="cm-builtin">float</span> <span class="cm-operator">=</span> <span class="cm-number">0.999</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">adam_epsilon</span>: <span class="cm-builtin">float</span> <span class="cm-operator">=</span> <span class="cm-number">1e-08</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 
0.1px;"> <span class="cm-variable">max_grad_norm</span>: <span class="cm-builtin">float</span> <span class="cm-operator">=</span> <span class="cm-number">1.0</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">num_train_epochs</span>: <span class="cm-builtin">float</span> <span class="cm-operator">=</span> <span class="cm-number">3.0</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">max_steps</span>: <span class="cm-builtin">int</span> <span class="cm-operator">=</span> <span class="cm-operator">-</span><span class="cm-number">1</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">lr_scheduler_type</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Union</span>[<span class="cm-variable">transformers</span>.<span class="cm-property">trainer_utils</span>.<span class="cm-property">SchedulerType</span>, <span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-string">'linear'</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">warmup_ratio</span>: <span class="cm-builtin">float</span> <span class="cm-operator">=</span> <span class="cm-number">0.0</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">warmup_steps</span>: <span class="cm-builtin">int</span> <span class="cm-operator">=</span> <span class="cm-number">0</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">log_level</span>: <span 
class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-string">'passive'</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">log_level_replica</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-string">'passive'</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">log_on_each_node</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">True</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">logging_dir</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">logging_strategy</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Union</span>[<span class="cm-variable">transformers</span>.<span class="cm-property">trainer_utils</span>.<span class="cm-property">IntervalStrategy</span>, <span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-string">'steps'</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">logging_first_step</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span 
class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">logging_steps</span>: <span class="cm-builtin">int</span> <span class="cm-operator">=</span> <span class="cm-number">500</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">logging_nan_inf_filter</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">True</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">save_strategy</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Union</span>[<span class="cm-variable">transformers</span>.<span class="cm-property">trainer_utils</span>.<span class="cm-property">IntervalStrategy</span>, <span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-string">'steps'</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">save_steps</span>: <span class="cm-builtin">int</span> <span class="cm-operator">=</span> <span class="cm-number">500</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">save_total_limit</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">int</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">save_on_each_node</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span 
class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">no_cuda</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">use_mps_device</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">seed</span>: <span class="cm-builtin">int</span> <span class="cm-operator">=</span> <span class="cm-number">42</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">data_seed</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">int</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">jit_mode_eval</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">use_ipex</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">bf16</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span 
class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">fp16</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">fp16_opt_level</span>: <span class="cm-builtin">str</span> <span class="cm-operator">=</span> <span class="cm-string">'O1'</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">half_precision_backend</span>: <span class="cm-builtin">str</span> <span class="cm-operator">=</span> <span class="cm-string">'auto'</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">bf16_full_eval</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">fp16_full_eval</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">tf32</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">bool</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">local_rank</span>: <span class="cm-builtin">int</span> <span 
class="cm-operator">=</span> <span class="cm-operator">-</span><span class="cm-number">1</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">xpu_backend</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">tpu_num_cores</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">int</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">tpu_metrics_debug</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">debug</span>: <span class="cm-builtin">str</span> <span class="cm-operator">=</span> <span class="cm-string">''</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">dataloader_drop_last</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">eval_steps</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">int</span>] <span class="cm-operator">=</span> <span 
class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">dataloader_num_workers</span>: <span class="cm-builtin">int</span> <span class="cm-operator">=</span> <span class="cm-number">0</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">past_index</span>: <span class="cm-builtin">int</span> <span class="cm-operator">=</span> <span class="cm-operator">-</span><span class="cm-number">1</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">run_name</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">disable_tqdm</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">bool</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">remove_unused_columns</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">bool</span>] <span class="cm-operator">=</span> <span class="cm-keyword">True</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">label_names</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-variable">typing</span>.<span 
class="cm-property">List</span>[<span class="cm-builtin">str</span>]] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">load_best_model_at_end</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">bool</span>] <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">metric_for_best_model</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">greater_is_better</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">bool</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">ignore_data_skip</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">sharded_ddp</span>: <span class="cm-builtin">str</span> <span class="cm-operator">=</span> <span class="cm-string">''</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">fsdp</span>: <span class="cm-builtin">str</span> 
<span class="cm-operator">=</span> <span class="cm-string">''</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">fsdp_min_num_params</span>: <span class="cm-builtin">int</span> <span class="cm-operator">=</span> <span class="cm-number">0</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">fsdp_transformer_layer_cls_to_wrap</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">deepspeed</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">label_smoothing_factor</span>: <span class="cm-builtin">float</span> <span class="cm-operator">=</span> <span class="cm-number">0.0</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">optim</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Union</span>[<span class="cm-variable">transformers</span>.<span class="cm-property">training_args</span>.<span class="cm-property">OptimizerNames</span>, <span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-string">'adamw_hf'</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span 
class="cm-variable">optim_args</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">adafactor</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">group_by_length</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">length_column_name</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-string">'length'</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">report_to</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-variable">typing</span>.<span class="cm-property">List</span>[<span class="cm-builtin">str</span>]] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">ddp_find_unused_parameters</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">bool</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" 
CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">ddp_bucket_cap_mb</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">int</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">dataloader_pin_memory</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">True</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">skip_memory_metrics</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">True</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">use_legacy_prediction_loop</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">push_to_hub</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">resume_from_checkpoint</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span 
class="cm-variable">hub_model_id</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">hub_strategy</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Union</span>[<span class="cm-variable">transformers</span>.<span class="cm-property">trainer_utils</span>.<span class="cm-property">HubStrategy</span>, <span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-string">'every_save'</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">hub_token</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">hub_private_repo</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">gradient_checkpointing</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">include_inputs_for_metrics</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " 
role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">fp16_backend</span>: <span class="cm-builtin">str</span> <span class="cm-operator">=</span> <span class="cm-string">'auto'</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">push_to_hub_model_id</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">push_to_hub_organization</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">push_to_hub_token</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">mp_parameters</span>: <span class="cm-builtin">str</span> <span class="cm-operator">=</span> <span class="cm-string">''</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">auto_find_batch_size</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" 
style="padding-right: 0.1px;"> <span class="cm-variable">full_determinism</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">torchdynamo</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">ray_scope</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-string">'last'</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">ddp_timeout</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">int</span>] <span class="cm-operator">=</span> <span class="cm-number">1800</span> </span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 2288px;"/><div class="CodeMirror-gutters" style="display: none; height: 2288px;"/></div></div></pre><p><span>参数:</span></p><ul><li><p><code>output_dir</code><span>:一个字符串,指定 </span><code>model prediction</code><span> 和 </span><code>model checkpoint</code><span> 输出的目录。</span></p></li><li><p><code>overwrite_output_dir</code><span>:一个布尔值,如果为 </span><code>True</code><span> 则覆盖 </span><code>output_dir</code><span> 的内容。如果 
</span><code>output_dir</code><span> 指向一个 </span><code>checkpoint</code><span> 目录,则使用该参数来继续训练。</span></p></li><li><p><code>do_train</code><span>:一个布尔值,指定是否执行训练。该参数不是由 </span><code>Trainer</code><span> 直接使用,而是由你的 </span><code>training/evaluation</code><span> 脚本来使用。</span></p></li><li><p><code>do_eval</code><span>:一个布尔值,指定是否在验证集上执行评估。如果 </span><code>evaluation_strategy</code><span> 不是 </span><code>"no"</code><span>,那么该参数将被设置为 </span><code>True</code><span> 。该参数不是由 </span><code>Trainer</code><span> 直接使用,而是由你的 </span><code>training/evaluation</code><span> 脚本来使用。</span></p></li><li><p><code>do_predict</code><span>:一个布尔值,指定是否在测试集上执行预测。该参数不是由 </span><code>Trainer</code><span> 直接使用,而是由你的 </span><code>training/evaluation</code><span> 脚本来使用。</span></p></li><li><p><code>evaluation_strategy</code><span>:一个字符串、或 </span><code>IntervalStrategy</code><span> ,指定训练过程中要采用的评估策略。可以为:</span></p><ul><li><code>"no"</code><span>:训练期间不进行评估。</span></li><li><code>"steps"</code><span>:每隔 </span><code>eval_steps</code><span> 训练步进行评估(并且记录日志)。</span></li><li><code>"epoch"</code><span>:在每个 </span><code>epoch</code><span> 结束时进行评估。</span></li></ul></li><li><p><code>prediction_loss_only</code><span>:一个布尔值,指定当执行评估和生成预测时是否仅返回 </span><code>loss</code><span> 。</span></p></li><li><p><code>per_device_train_batch_size</code><span>:一个整数,指定用于训练的每个 </span><code>GPU/TPU core/CPU</code><span> 的 </span><code>batch size</code><span> 。</span></p></li><li><p><code>per_device_eval_batch_size</code><span>:一个整数,指定用于评估的每个 </span><code>GPU/TPU core/CPU</code><span> 的 </span><code>batch size</code><span> 。</span></p></li><li><p><code>gradient_accumulation_steps</code><span>:一个整数,指定在进行反向传播(即,梯度更新)之前,用于累积梯度的 </span><code>updates steps</code><span> 的数量。当使用 </span><code>gradient accumulation</code><span> 时,一个 </span><code>step</code><span> 指的是执行一次反向传播。因此,</span><code>logging, evaluation, 
save</code><span> 将在每隔 </span><code>gradient_accumulation_steps * xxx_step</code><span> 的训练样本之后进行。</span></p><blockquote><p><span>它相当于扩大了 </span><code>batch size</code><span> 。</span></p></blockquote></li><li><p><code>eval_accumulation_steps</code><span>:一个整数,指定在将结果移动到 </span><code>CPU</code><span> 之前,对输出张量进行累积的 </span><code>predictions steps</code><span> 的数量。如果未设置,则整个预测结果在移动到 </span><code>CPU</code><span> 之前会在 </span><code>GPU/TPU</code><span> 上累积(速度更快,但是需要更多的显存)。</span></p></li><li><p><code>eval_delay</code><span>:一个浮点数,指定在执行第一次评估之前需要等待多少个 </span><code>epochs</code><span> 或 </span><code>steps</code><span>(根据 </span><code>evaluation_strategy</code><span> 的不同来选择 </span><code>epochs</code><span> 或 </span><code>steps</code><span>)。</span></p></li><li><p><code>learning_rate</code><span>:一个浮点数,指定 </span><code>AdamW</code><span> 优化器的初始学习率。</span></p></li><li><p><code>weight_decay</code><span>:一个浮点数,指定 </span><code>AdamW</code><span> 优化器中适用于所有层的权重衰减,除了所有的 </span><code>bias</code><span>、以及 </span><code>LayerNorm weights</code><span> 。</span></p></li><li><p><code>adam_beta1</code><span>:一个浮点数,指定 </span><code>AdamW</code><span> 优化器的 </span><code>beta1</code><span> 超参数。</span></p></li><li><p><code>adam_beta2</code><span>:一个浮点数,指定 </span><code>AdamW</code><span> 优化器的 </span><code>beta2</code><span> 超参数。</span></p></li><li><p><code>adam_epsilon</code><span>:一个浮点数,指定 </span><code>AdamW</code><span> 优化器的 </span><code>epsilon</code><span> 超参数。</span></p></li><li><p><code>max_grad_norm</code><span>:一个浮点数,指定最大梯度范数(用于梯度裁剪)。</span></p></li><li><p><code>num_train_epochs</code><span>:一个浮点数,指定训练的 </span><code>epoch</code><span> 数量。如果不是整数,那么在停止训练之前执行最后一个 </span><code>epoch</code><span> 的小数部分的百分比。</span></p></li><li><p><code>max_steps</code><span>:一个整数,如果设置为正数,则指定训练的 </span><code>step</code><span> 总数,它会覆盖 </span><code>num_train_epochs</code><span> 。如果使用有限的可迭代数据集,那么当所有数据耗尽时,可能会在 </span><code>max_steps</code><span> 之前就结束训练。</span></p></li><li><p><code>lr_scheduler_type</code><span>:一个字符串或 
</span><code>SchedulerType</code><span>,指定学习率调度器的类型。</span></p></li><li><p><code>warmup_ratio</code><span>:一个浮点数,指定从 </span><code>0</code><span> 到峰值学习率(通常就是 </span><code>learning_rate</code><span> 指定的)的线性预热所使用的训练步占 </span><code>total training steps</code><span> 的比例。</span></p></li><li><p><code>warmup_steps</code><span>:一个整数,指定从 </span><code>0</code><span> 到峰值学习率(通常就是 </span><code>learning_rate</code><span> 指定的)的线性预热所使用的训练步的数量。它覆盖 </span><code>warmup_ratio</code><span> 。</span></p></li><li><p><code>log_level</code><span>:一个字符串,指定主进程中使用的 </span><code>logger log level</code><span> 。可以为:</span><code>'debug', 'info', 'warning', 'error', 'critical', 'passive'</code><span> 。其中 </span><code>'passive'</code><span> 表示不设置任何级别而是让 </span><code>application</code><span> 来设置。</span></p></li><li><p><code>log_level_replica</code><span>:一个字符串,指定在副本进程中使用的 </span><code>logger log level</code><span> 。参考 </span><code>log_level</code><span> 。</span></p></li><li><p><code>log_on_each_node</code><span>:一个布尔值,指定在多节点分布式训练中,是否在每个节点使用 </span><code>log_level</code><span> 来 </span><code>log</code><span>、或者仅在主节点上 </span><code>log</code><span> 。</span></p></li><li><p><code>logging_dir</code><span>:一个字符串,指定 </span><code>TensorBoard log</code><span> 目录,默认为 </span><code>output_dir/runs/CURRENT_DATETIME_HOSTNAME</code><span> 。</span></p></li><li><p><code>logging_strategy</code><span>:一个字符串或 </span><code>IntervalStrategy</code><span>,指定训练期间的 </span><code>logging</code><span> 策略。可以为:</span></p><ul><li><code>"no"</code><span>:不做任何 </span><code>logging</code><span> 。</span></li><li><code>"epoch"</code><span>:在每个 </span><code>epoch</code><span> 结束时 </span><code>logging</code><span> 。</span></li><li><code>"steps"</code><span>:每隔 </span><code>logging_steps</code><span> 就 </span><code>logging</code><span> 。</span></li></ul></li><li><p><code>logging_first_step</code><span>:一个布尔值,指定是否 </span><code>log</code><span> 和 </span><code>evaluate</code><span> 第一个 </span><code>global_step</code><span> 
。</span></p></li><li><p><code>logging_steps</code><span>:一个整数,指定当 </span><code>logging_strategy="steps"</code><span> 时每两次 </span><code>logging</code><span> 之间的 </span><code>update steps</code><span> 数量。</span></p></li><li><p><code>logging_nan_inf_filter</code><span>:一个布尔值,指定是否要过滤 </span><code>nan</code><span> 和 </span><code>inf</code><span> 的损失从而用于 </span><code>logging</code><span> 。如果为 </span><code>True</code><span>,那么每个 </span><code>step</code><span> 的 </span><code>nan</code><span> 或 </span><code>inf</code><span> 的损失都会被过滤掉从而仅选取当前 </span><code>logging window</code><span> 的平均损失。</span></p><p><span>注意,该参数仅影响 </span><code>logging</code><span>,不影响梯度的计算或模型的预测。</span></p></li><li><p><code>save_strategy</code><span>:一个字符串或 </span><code>IntervalStrategy</code><span>,指定训练时采用的 </span><code>checkpoint save</code><span> 策略。可以为:</span></p><ul><li><code>"no"</code><span>:不做保存。</span></li><li><code>"epoch"</code><span>:在每个 </span><code>epoch</code><span> 结束时保存。</span></li><li><code>"steps"</code><span>:每隔 </span><code>save_steps</code><span> 就保存。</span></li></ul></li><li><p><code>save_steps</code><span>:一个整数,指定当 </span><code>save_strategy="steps"</code><span> 时每两次 </span><code>checkpoint save</code><span> 之间的 </span><code>update steps</code><span> 数量。</span></p></li><li><p><code>save_total_limit</code><span>:一个整数,如果传入一个值,那么指定 </span><code>checkpoint</code><span> 总的保存数量。这可能会删除 </span><code>output_dir</code><span> 中的 </span><code>older checkpoints</code><span> 。</span></p></li><li><p><code>save_on_each_node</code><span>:一个布尔值,指定当进行多节点分布式训练时,是否在每个节点上保存模型和 </span><code>checkpoint</code><span>、还是仅在主节点上保存。</span></p><p><span>当不同的节点使用相同的 </span><code>storage</code><span> 时,这个能力应该禁用,因为每个节点的文件都将被保存为相同的名称。</span></p></li><li><p><code>no_cuda</code><span>:一个布尔值,指定是否禁用 </span><code>CUDA</code><span> 。</span></p></li><li><p><code>seed</code><span>:一个整数,指定随机数种子,它在训练开始时设置。</span></p><p><span>为了确保不同运行的可重复性,如果模型有一些随机初始化的参数,请使用 </span><code>Trainer.model_init()</code><span> 
函数将模型实例化。</span></p></li><li><p><code>data_seed</code><span>:一个整数,指定用于 </span><code>data samplers</code><span> 的随机数种子。如果未设置,数据采样的随机数生成器将使用与 </span><code>seed</code><span> 相同的种子。这可以用来确保数据采样的可重复性,与 </span><code>model seed</code><span> 无关。</span></p></li><li><p><code>jit_mode_eval</code><span>:一个布尔值,指定是否使用 </span><code>PyTorch jit trace</code><span> 进行推断。</span></p></li><li><p><code>use_ipex</code><span>:一个布尔值,指定当 </span><code>Intel extension for PyTorch: IPEX</code><span> 可用时是否使用 </span><code>IPEX</code><span>。如果启用,则要求安装 </span><code>IPEX</code><span> 。</span></p></li><li><p><code>bf16</code><span>:一个布尔值,指定是否使用 </span><code>bf16</code><span> 的 </span><code>16</code><span> 位(混合)精度训练,而不是 </span><code>fp32</code><span> 的 </span><code>32</code><span> 位训练。需要 </span><code>Ampere</code><span> 或更高的 </span><code>NVIDIA</code><span> 架构、或使用 </span><code>CPU</code><span>(</span><code>no_cuda</code><span>)。这是一个实验性的 </span><code>API</code><span>,它可能会发生变化。</span></p></li><li><p><code>fp16</code><span>:一个布尔值,指定是否使用 </span><code>fp16</code><span> 的 </span><code>16</code><span> 位(混合)精度训练而不是 </span><code>fp32</code><span> 的 </span><code>32</code><span> 位训练。</span></p></li><li><p><code>fp16_opt_level</code><span>:一个字符串,指定 </span><code>Apex AMP</code><span> 优化等级(</span><code>'O0', 'O1', 'O2', 'O3'</code><span>)从而用于 </span><code>fp16</code><span> 训练。</span></p></li><li><p><code>fp16_backend</code><span>:目前已被废弃,建议使用 </span><code>half_precision_backend</code><span> 。</span></p></li><li><p><code>half_precision_backend</code><span>:一个字符串,指定用于混合精度训练的后端。必须是 </span><code>"auto", "cuda_amp", "apex", "cpu_amp"</code><span> 中的一个。</span><code>"auto"</code><span> 将根据检测到的 </span><code>PyTorch</code><span> 版本使用 </span><code>CPU/CUDA AMP</code><span> 或 </span><code>APEX</code><span> ,而其他选择将强制使用所对应的后端。</span></p></li><li><p><code>bf16_full_eval</code><span>:一个布尔值,指定是否使用 </span><code>full bfloat16</code><span> 评估,而不是使用 </span><code>fp32</code><span> 的 </span><code>32</code><span> 
位进行评估。这将会更快并节省内存,但会损害评估指标。这是一个实验性的 </span><code>API</code><span>,它可能会改变。</span></p></li><li><p><code>fp16_full_eval</code><span>:一个布尔值,指定是否使用 </span><code>full float16</code><span> 评估,而不是使用 </span><code>fp32</code><span> 的 </span><code>32</code><span> 位进行评估。这将会更快并节省内存,但会损害评估指标。</span></p></li><li><p><code>tf32</code><span>:一个布尔值,指定是否启用 </span><code>TF32</code><span> 模式,在 </span><code>Ampere</code><span> 和较新的 GPU 架构中可用。默认值取决于 </span><code>PyTorch</code><span> 的 </span><code>torch.backends.cuda.matmul.allow_tf32</code><span> 的默认值。这是一个实验性的 </span><code>API</code><span> ,它可能会改变。</span></p></li><li><p><code>local_rank</code><span>:一个整数,指定分布式训练中进程的 </span><code>rank</code><span>。</span></p></li><li><p><code>xpu_backend</code><span>:一个字符串,指定 </span><code>xpu</code><span> 分布式训练要使用的后端。必须是 </span><code>"mpi", "ccl", "gloo"</code><span> 中的一个。</span></p></li><li><p><code>dataloader_drop_last</code><span>:一个布尔值,指定是否放弃最后一个 </span><code>incomplete batch</code><span> (如果数据集的长度不能被 </span><code>batch size</code><span> 所整除)。</span></p></li><li><p><code>eval_steps</code><span>:一个整数,指定如果 </span><code>evaluation_strategy="steps"</code><span> 则两次评估之间的 </span><code>update steps</code><span> 的数量。如果没有设置,将默认为与 </span><code>logging_steps</code><span> 相同。</span></p></li><li><p><code>dataloader_num_workers</code><span>:一个整数,指定用于加载数据集的子进程的数量(仅限 </span><code>PyTorch</code><span> )。</span><code>0</code><span> 表示数据将在主进程中加载。</span></p></li><li><p><code>past_index</code><span>:一个整数,有些模型如 </span><code>TransformerXL</code><span> 或 </span><code>XLNet</code><span> 可以利用 </span><code>past hidden states</code><span> 进行预测。如果这个参数被设置为一个正整数,训练器将使用相应的输出(通常是 </span><code>index 2</code><span> )作为 </span><code>past state</code><span> ,并在下一个 </span><code>training step</code><span> 中通过关键字参数 </span><code>mems</code><span> 将其馈入模型。</span></p></li><li><p><code>run_name</code><span>:一个字符串,指定一个当前 </span><code>run</code><span> 的描述文本,通常用于 </span><code>wandb</code><span> 和 </span><code>mlflow</code><span> 
日志。</span></p></li><li><p><code>disable_tqdm</code><span>:一个布尔值,指定是否禁用 </span><code>tqdm</code><span> 进度条和 </span><code>table of metrics</code><span> 。</span></p><p><span>如果 </span><code>logging level</code><span> 被设置为 </span><code>warn</code><span> 或更低(默认),将默认为 </span><code>True</code><span>,否则为 </span><code>False</code><span>。</span></p></li><li><p><code>remove_unused_columns</code><span>:一个布尔值,指定是否自动删除未被使用的列(指的是在模型的前向传播中未被使用)。</span></p><p><span>注意,这个行为尚未在 </span><code>TFTrainer</code><span> 中实现。</span></p></li><li><p><code>label_names</code><span>:一个关于字符串的列表,指定 </span><code>inputs</code><span> 的字典中对应于 </span><code>label</code><span> 的键的列表。</span></p><p><span>默认值是 </span><code>["labels"]</code><span>,但是如果是 </span><code>XxxForQuestionAnswering</code><span> 模型则默认值是 </span><code>["start_positions", "end_positions"]</code><span> 。</span></p></li><li><p><code>load_best_model_at_end</code><span>:一个布尔值,指定是否在训练结束时加载训练中发现的最佳模型。</span></p><p><span>如果是 </span><code>True</code><span> ,那么 </span><code>save_strategy</code><span> 需要和 </span><code>evaluation_strategy</code><span> 相同;并且如果 </span><code>save_strategy</code><span> 和 </span><code>evaluation_strategy</code><span> 都是 </span><code>"steps"</code><span>,那么 </span><code>save_steps</code><span> 必须是 </span><code>eval_steps</code><span> 的整数倍。</span></p></li><li><p><code>metric_for_best_model</code><span>:一个字符串,指定评估最佳模型的指标,与 </span><code>load_best_model_at_end</code><span> 配合使用。</span></p><p><span>它必须是模型评估所返回的指标的名字。如果 </span><code>load_best_model_at_end=True</code><span> 且该参数未指定,则默认为 </span><code>"loss"</code><span> 。</span></p><p><span>注意,如果设置了该参数,</span><code>greater_is_better</code><span> 将默认为 </span><code>True</code><span>;如果评估指标越低越好,则需要将 </span><code>greater_is_better</code><span> 设置为 </span><code>False</code><span> 。</span></p></li><li><p><code>greater_is_better</code><span>:一个布尔值,指定更好的模型是否具有更大的指标值,与 </span><code>load_best_model_at_end</code><span> 和 </span><code>metric_for_best_model</code><span> 一起使用。</span></p><p><span>如果 
</span><code>metric_for_best_model</code><span> 被设置为既不是 </span><code>"loss"</code><span> 也不是 </span><code>"eval_loss"</code><span>,那么默认为 </span><code>True</code><span> ;如果 </span><code>metric_for_best_model</code><span> 没有被设置、或者是 </span><code>"loss"</code><span>、或者是 </span><code>"eval_loss"</code><span>,那么默认为 </span><code>False</code><span> 。</span></p></li><li><p><code>ignore_data_skip</code><span>:一个布尔值,指定在恢复训练时,是否跳过 </span><code>epochs</code><span> 和 </span><code>batches</code><span> 从而获得与 </span><code>previous training</code><span> 相同阶段的 </span><code>data loading</code><span> 。如果设置为 </span><code>True</code><span> ,训练将更快开始(因为 </span><code>skipping step</code><span> 可能需要很长时间),但不会产生与被中断的训练相同的结果。</span></p></li><li><p><code>sharded_ddp</code><span>:一个布尔值或字符串或关于 </span><code>ShardedDDPOption</code><span> 的列表,指定使用来自 </span><code>FairScale</code><span> 的 </span><code>Sharded DDP</code><span> 训练(仅在分布式训练中)。这是一个实验性的功能。可选的参数为:</span></p><ul><li><code>"simple"</code><span> :使用由 </span><code>fairscale</code><span> 发布的 </span><code>sharded DDP</code><span> 的第一个实例(</span><code>ShardedDDP</code><span>),类似于 </span><code>ZeRO-2</code><span> 。</span></li><li><code>"zero_dp_2"</code><span>:以 </span><code>Zero-2</code><span> (采用 </span><code>reshard_after_forward=False</code><span> )使用由 </span><code>fairscale</code><span> 发布的 </span><code>sharded DDP</code><span> 的第二个实例(</span><code>FullyShardedDDP</code><span>)。</span></li><li><code>"zero_dp_3"</code><span>:以 </span><code>Zero-3</code><span> (采用 </span><code>reshard_after_forward=True</code><span> )使用由 </span><code>fairscale</code><span> 发布的 </span><code>sharded DDP</code><span> 的第二个实例(</span><code>FullyShardedDDP</code><span>)。</span></li><li><code>"offload"</code><span>:添加 </span><code>ZeRO-offload</code><span> (只与 </span><code>"zero_dp_2"</code><span> 和 </span><code>"zero_dp_3"</code><span> 兼容)。</span></li></ul><p><span>如果传递的是一个字符串,它将在空格处被拆分。如果传递的是一个布尔值,如果是 </span><code>True</code><span> 则代表 
</span><code>["simple"]</code><span>,如果是 </span><code>False</code><span> 则代表 </span><code>[]</code><span> 。</span></p></li><li><p><code>fsdp</code><span>:一个布尔值或字符串或关于</span><code>FSDPOption</code><span> 的列表,指定使用 </span><code>PyTorch Distributed Parallel Training</code><span> 训练(仅在分布式训练中)。可选的参数为:</span></p><ul><li><code>"full_shard"</code><span>:对 </span><code>parameters, gradients, optimizer states</code><span> 进行分片。</span></li><li><code>"shard_grad_op"</code><span>:对 </span><code>optimizer states, gradients</code><span> 进行分片。</span></li><li><code>"offload"</code><span>:将 </span><code>parameters, gradients</code><span> 卸载到 </span><code>CPU</code><span> (仅与 </span><code>"full_shard", "shard_grad_op"</code><span> 兼容)。</span></li><li><code>"auto_wrap"</code><span>:使用 </span><code>default_auto_wrap_policy</code><span> 从而利用 </span><code>FSDP</code><span> 来自动递归地 </span><code>wrap layers</code><span> 。</span></li></ul></li><li><p><code>fsdp_min_num_params</code><span>:一个整数,指定 </span><code>FSDP</code><span> 的默认的最少的 </span><code>parameters</code><span> 数量从而用于 </span><code>Default Auto Wrapping</code><span> 。仅当 </span><code>fsdp</code><span> 参数有效时 </span><code>fsdp_min_num_params</code><span> 才有意义。</span></p></li><li><p><code>deepspeed</code><span>:一个字符串或字典,指定使用 </span><code>Deepspeed</code><span> 。</span></p><p><span>这是一个实验性的功能,其 </span><code>API</code><span> 可能会在未来演变。该值是 </span><code>DeepSpeed json</code><span> 配置文件的位置(例如 </span><code>ds_config.json</code><span> )或者是一个代表已加载 </span><code>json</code><span> 文件的字典。</span></p></li><li><p><code>label_smoothing_factor</code><span>:一个浮点数,指定 </span><code>label smoothing</code><span> 因子。零代表没有标签平滑。</span></p><p><span>否则, </span><code>label 0</code><span> 变成:</span><code>label_smoothing_factor/num_labels</code><span>;</span><code>label 1</code><span> 变成 </span><code>1 - label_smoothing_factor + label_smoothing_factor/num_labels</code><span> 。</span></p></li><li><p><code>debug</code><span>:一个字符串或者关于 
</span><code>DebugOption</code><span> 的列表,指定启用一个或多个 </span><code>debug</code><span> 特性。这是一个实验性的功能。可选的参数有:</span></p><ul><li><code>"underflow_overflow"</code><span>:检测模型的 </span><code>input/outputs</code><span> 中的溢出,并报告导致该事件的最后一帧。</span></li><li><code>"tpu_metrics_debug"</code><span>:打印 </span><code>TPU</code><span> 的调试指标。</span></li></ul><p><span>如果是字符串,那么用空格拆分。</span></p></li><li><p><code>optim</code><span>:一个字符串或者 </span><code>training_args.OptimizerNames</code><span>,指定使用的优化器。可以为:</span><code>adamw_hf, adamw_torch, adamw_apex_fused, adamw_anyprecision, adafactor</code><span> 。</span></p></li><li><p><code>optim_args</code><span>:一个字符串,指定提供给 </span><code>AnyPrecisionAdamW</code><span> 的可选参数。</span></p></li><li><p><code>adafactor</code><span>:被废弃,推荐使用 </span><code>optim="adafactor"</code><span> 来代替。</span></p></li><li><p><code>group_by_length</code><span>:一个布尔值,指定是否将训练数据集中长度大致相同的样本分组(从而尽量减少填充的使用,使之更有效)。只有在应用动态填充时才有用。</span></p></li><li><p><code>length_column_name</code><span>:一个字符串,指定预计算的长度的列名。如果该列存在,</span><code>group_by_length</code><span> 将使用这些值,而不是在训练启动时计算它们。除非 </span><code>group_by_length=True</code><span> 并且数据集是 </span><code>Dataset</code><span> 的一个实例,否则会被忽略。</span></p></li><li><p><code>report_to</code><span>:一个字符串或关于字符串的列表,指定要向其报告结果和日志的集成平台的列表。支持的集成平台有:</span><code>"azure_ml", "comet_ml", "mlflow", "neptune", "tensorboard", "clearml", "wandb"</code><span> 。</span><code>"all"</code><span> 表示向所有已安装的集成平台报告;</span><code>"none"</code><span> 表示都不报告。</span></p></li><li><p><code>ddp_find_unused_parameters</code><span>:一个布尔值,指定当使用分布式训练时,传递给 </span><code>DistributedDataParallel</code><span> 的 </span><code>find_unused_parameters</code><span> 的值。</span></p><p><span>如果使用 </span><code>gradient checkpointing</code><span>,那么默认值为 </span><code>False</code><span>;否则默认值为 </span><code>True</code><span> 。</span></p></li><li><p><code>ddp_bucket_cap_mb</code><span>:一个整数,指定当使用分布式训练时,传递给 </span><code>DistributedDataParallel</code><span> 的 </span><code>bucket_cap_mb</code><span> 
的值。</span></p></li><li><p><code>dataloader_pin_memory</code><span>:一个布尔值,指定在 </span><code>data loaders</code><span> 中是否要 </span><code>pin memory</code><span> 。</span></p></li><li><p><code>skip_memory_metrics</code><span>:一个布尔值,指定是否跳过向 </span><code>metrics</code><span> 添加 </span><code>memory profiler</code><span> 报告。默认情况下跳过(即,</span><code>True</code><span>),因为这会减慢训练和评估的速度。</span></p></li><li><p><code>push_to_hub</code><span>:一个布尔值,指定每次保存模型时是否将模型推送到 </span><code>Hub</code><span> 。</span></p><p><span>如果为 </span><code>True</code><span> ,</span><code>output_dir</code><span> 是一个 </span><code>git</code><span> 目录(这个 </span><code>git</code><span> 目录与 </span><code>repo</code><span> 同步),目录内容将在每次触发保存时被推送(取决于你的 </span><code>save_strategy</code><span> )。调用 </span><code>save_model()</code><span>也会触发一次推送。</span> <span>如果 </span><code>output_dir</code><span> 存在,则该目录必须是 </span><code>repo</code><span> 的一个 </span><code>local clone</code><span> ,这个 </span><code>repo</code><span> 就是 </span><code>Trainer</code><span> 将被推送到的地方。</span></p></li><li><p><code>resume_from_checkpoint</code><span>:一个字符串,指定有效 </span><code>checkpoint</code><span> 的文件夹的路径。</span></p><p><span>这个参数不直接被 </span><code>Trainer</code><span> 使用,而是由 </span><code>training/evaluation</code><span> 脚本使用。</span></p></li><li><p><code>hub_model_id</code><span>:一个字符串,指定 </span><code>repo</code><span> 的名字从而与 </span><code>local output_dir</code><span> 保持同步。默认为 </span><code>output_dir</code><span> 的名称。</span></p></li><li><p><code>hub_strategy</code><span>:一个字符串或 </span><code>HubStrategy</code><span>,指定推送到 </span><code>Hub</code><span> 的范围和时间。可以为:</span></p><ul><li><code>"end"</code><span>:在调用 </span><code>save_model()</code><span> 方法时,推送模型、模型配置、</span><code>tokenizer</code><span> (如果传递给 </span><code>Trainer</code><span> )和模型卡的草稿。</span></li><li><code>"every_save"</code><span>:每次有模型保存时,推送模型、模型配置、</span><code>tokenizer</code><span> (如果传递给 </span><code>Trainer</code><span> 
), and a draft model card. Pushes are asynchronous so that they do not block training, and if saves are very frequent, a new push is attempted only after the previous one has finished. A final push is made with the final model once training ends.</span></li><li><code>"checkpoint"</code><span>: like </span><code>"every_save"</code><span>, but the </span><code>latest checkpoint</code><span> is also pushed to a subfolder named </span><code>last-checkpoint</code><span>, so you can easily resume training with </span><code>trainer.train(resume_from_checkpoint="last-checkpoint")</code><span>.</span></li><li><code>"all_checkpoints"</code><span>: like </span><code>"checkpoint"</code><span>, but every </span><code>checkpoint</code><span> is pushed (so in the </span><code>final repo</code><span> you get one </span><code>checkpoint</code><span> folder per folder).</span></li></ul></li><li><p><code>hub_token</code><span>: a string specifying the </span><code>token</code><span> used to push the model to the </span><code>Hub</code><span>. Defaults to the </span><code>token</code><span> in the cache folder obtained by logging in with </span><code>huggingface-cli</code><span>.</span></p></li><li><p><code>hub_private_repo</code><span>: a boolean specifying whether to make the </span><code>Hub repo</code><span> private.</span></p></li><li><p><code>gradient_checkpointing</code><span>: a boolean specifying whether to use </span><code>gradient checkpointing</code><span> to save memory. If </span><code>True</code><span>, the backward pass becomes slower.</span></p></li><li><p><code>include_inputs_for_metrics</code><span>: a boolean specifying whether </span><code>inputs</code><span> are passed to the </span><code>compute_metrics</code><span> function. This is for metrics whose computation needs </span><code>inputs</code><span>, </span><code>predictions</code><span>, and </span><code>references</code><span>.</span></p></li><li><p><code>auto_find_batch_size</code><span>: a boolean specifying whether to automatically find a </span><code>batch size</code><span> that fits in memory through exponential decay, avoiding </span><code>CUDA Out-of-Memory</code><span> errors. Requires </span><code>accelerate</code><span> to be installed (</span><code>pip install accelerate</code><span>).</span></p></li><li><p><code>full_determinism</code><span>: a boolean; if </span><code>True</code><span>, </span><code>enable_full_determinism()</code><span> is called instead of </span><code>set_seed()</code><span> to ensure reproducible results in distributed training.</span></p></li><li><p><code>torchdynamo</code><span>: a string naming the backend compiler for 
</span><code>TorchDynamo</code><span>. Possible values: </span><code>"eager", "aot_eager", "inductor", "nvfuser", "aot_nvfuser", "aot_cudagraphs", "ofi", "fx2trt", "onnxrt", "ipex"</code><span>.</span></p></li><li><p><code>ray_scope</code><span>: a string specifying the scope to use when running hyperparameter search with </span><code>Ray</code><span>. The default is </span><code>"last"</code><span>: </span><code>Ray</code><span> then takes the last </span><code>checkpoint</code><span> of every trial, compares them, and selects the best one. Other options are available; see the </span><code>Ray</code><span> documentation.</span></p></li><li><p><code>ddp_timeout</code><span>: an integer specifying the timeout for </span><code>torch.distributed.init_process_group</code><span> calls.</span></p></li><li><p><code>use_mps_device</code><span>: a boolean specifying whether to use the </span><code>mps</code><span> device on </span><code>Apple Silicon</code><span>.</span></p></li></ul><p><span>Methods:</span></p><ul><li><p><code>get_process_log_level()</code><span>: returns the </span><code>log level</code><span> to use, which depends on whether the process is the main process of </span><code>node 0</code><span>, the main process of a </span><code>node non-0</code><span>, or a non-main process.</span></p><ul><li><span>For the main process, the </span><code>log level</code><span> defaults to </span><code>logging.INFO</code><span> unless overridden by the </span><code>log_level</code><span> argument.</span></li><li><span>For non-main processes, the </span><code>log level</code><span> defaults to </span><code>logging.WARNING</code><span> unless overridden by the </span><code>log_level_replica</code><span> argument.</span></li></ul><p><span>The choice between the main-process and non-main-process </span><code>setting</code><span> is made according to the return value of </span><code>should_log</code><span>.</span></p></li><li><p><code>get_warmup_steps(num_training_steps: int)</code><span>: returns the </span><code>step</code><span> count used for linear warmup.</span></p></li><li><p><code>main_process_first(local = True, desc = 'work')</code><span>: a context manager for </span><code>Torch</code><span> distributed environments where something needs to run on the main process while the replicas </span><code>block</code><span>; once it finishes, it will </span><code>release</code><span> the replicas.</span></p><p><span>One such use is the </span><code>map</code><span> feature of datasets: for efficiency it should run once on the main process, which on completion saves a 
</span><code>cached</code><span> version of the results, which the replicas then load automatically.</span></p><p><span>Parameters:</span></p><ul><li><p><code>local</code><span>: a boolean. If </span><code>True</code><span>, </span><code>first</code><span> means the </span><code>rank 0</code><span> process of each node; if </span><code>False</code><span>, </span><code>first</code><span> means the </span><code>rank 0</code><span> process of </span><code>node rank 0</code><span>.</span></p><p><span>In a multi-node environment with a shared filesystem you will most likely want </span><code>local=False</code><span>, so that only the main process of the first node does the processing. If the filesystem is not shared, however, the main process of each node needs to do the processing, which is the default behavior (that is, the default is </span><code>True</code><span>).</span></p></li><li><p><code>desc</code><span>: a string giving a description of the </span><code>work</code><span> for use in debug logs.</span></p></li></ul></li><li><p><code>to_dict()</code><span>: serializes the instance to a dictionary, replacing </span><code>Enum</code><span> objects with their values.</span></p></li><li><p><code>to_json_string()</code><span>: serializes the instance to a </span><code>json</code><span> string.</span></p></li><li><p><code>to_sanitized_dict()</code><span>: a sanitized serialization of the instance for use with </span><code>TensorBoard</code><span>'s </span><code>hparams</code><span>.</span></p></li></ul></li><li><p><code>class transformers.Seq2SeqTrainingArguments</code><span>:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python" style="break-inside: unset;"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div 
class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">class</span> <span class="cm-def">transformers</span>.<span class="cm-property">Seq2SeqTrainingArguments</span>(</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">output_dir</span>: <span class="cm-builtin">str</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">overwrite_output_dir</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">do_train</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">do_eval</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span 
class="cm-variable">do_predict</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">evaluation_strategy</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Union</span>[<span class="cm-variable">transformers</span>.<span class="cm-property">trainer_utils</span>.<span class="cm-property">IntervalStrategy</span>, <span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-string">'no'</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">prediction_loss_only</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">per_device_train_batch_size</span>: <span class="cm-builtin">int</span> <span class="cm-operator">=</span> <span class="cm-number">8</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">per_device_eval_batch_size</span>: <span class="cm-builtin">int</span> <span class="cm-operator">=</span> <span class="cm-number">8</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">per_gpu_train_batch_size</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">int</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 
0.1px;"> <span class="cm-variable">per_gpu_eval_batch_size</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">int</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">gradient_accumulation_steps</span>: <span class="cm-builtin">int</span> <span class="cm-operator">=</span> <span class="cm-number">1</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">eval_accumulation_steps</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">int</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">eval_delay</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">float</span>] <span class="cm-operator">=</span> <span class="cm-number">0</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">learning_rate</span>: <span class="cm-builtin">float</span> <span class="cm-operator">=</span> <span class="cm-number">5e-05</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">weight_decay</span>: <span class="cm-builtin">float</span> <span class="cm-operator">=</span> <span class="cm-number">0.0</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">adam_beta1</span>: <span 
class="cm-builtin">float</span> <span class="cm-operator">=</span> <span class="cm-number">0.9</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">adam_beta2</span>: <span class="cm-builtin">float</span> <span class="cm-operator">=</span> <span class="cm-number">0.999</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">adam_epsilon</span>: <span class="cm-builtin">float</span> <span class="cm-operator">=</span> <span class="cm-number">1e-08</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">max_grad_norm</span>: <span class="cm-builtin">float</span> <span class="cm-operator">=</span> <span class="cm-number">1.0</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">num_train_epochs</span>: <span class="cm-builtin">float</span> <span class="cm-operator">=</span> <span class="cm-number">3.0</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">max_steps</span>: <span class="cm-builtin">int</span> <span class="cm-operator">=</span> <span class="cm-operator">-</span><span class="cm-number">1</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">lr_scheduler_type</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Union</span>[<span class="cm-variable">transformers</span>.<span class="cm-property">trainer_utils</span>.<span class="cm-property">SchedulerType</span>, <span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span 
class="cm-string">'linear'</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">warmup_ratio</span>: <span class="cm-builtin">float</span> <span class="cm-operator">=</span> <span class="cm-number">0.0</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">warmup_steps</span>: <span class="cm-builtin">int</span> <span class="cm-operator">=</span> <span class="cm-number">0</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">log_level</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-string">'passive'</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">log_level_replica</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-string">'passive'</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">log_on_each_node</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">True</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">logging_dir</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " 
role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">logging_strategy</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Union</span>[<span class="cm-variable">transformers</span>.<span class="cm-property">trainer_utils</span>.<span class="cm-property">IntervalStrategy</span>, <span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-string">'steps'</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">logging_first_step</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">logging_steps</span>: <span class="cm-builtin">int</span> <span class="cm-operator">=</span> <span class="cm-number">500</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">logging_nan_inf_filter</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">True</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">save_strategy</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Union</span>[<span class="cm-variable">transformers</span>.<span class="cm-property">trainer_utils</span>.<span class="cm-property">IntervalStrategy</span>, <span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-string">'steps'</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">save_steps</span>: <span 
class="cm-builtin">int</span> <span class="cm-operator">=</span> <span class="cm-number">500</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">save_total_limit</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">int</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">save_on_each_node</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">no_cuda</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">use_mps_device</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">seed</span>: <span class="cm-builtin">int</span> <span class="cm-operator">=</span> <span class="cm-number">42</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">data_seed</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">int</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span 
role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">jit_mode_eval</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">use_ipex</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">bf16</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">fp16</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">fp16_opt_level</span>: <span class="cm-builtin">str</span> <span class="cm-operator">=</span> <span class="cm-string">'O1'</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">half_precision_backend</span>: <span class="cm-builtin">str</span> <span class="cm-operator">=</span> <span class="cm-string">'auto'</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">bf16_full_eval</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span 
class="cm-variable">fp16_full_eval</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">tf32</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">bool</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">local_rank</span>: <span class="cm-builtin">int</span> <span class="cm-operator">=</span> <span class="cm-operator">-</span><span class="cm-number">1</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">xpu_backend</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">tpu_num_cores</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">int</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">tpu_metrics_debug</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">debug</span>: <span class="cm-builtin">str</span> <span class="cm-operator">=</span> <span class="cm-string">''</span>,</span></pre><pre class=" CodeMirror-line " 
role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">dataloader_drop_last</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">eval_steps</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">int</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">dataloader_num_workers</span>: <span class="cm-builtin">int</span> <span class="cm-operator">=</span> <span class="cm-number">0</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">past_index</span>: <span class="cm-builtin">int</span> <span class="cm-operator">=</span> <span class="cm-operator">-</span><span class="cm-number">1</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">run_name</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">disable_tqdm</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">bool</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" 
style="padding-right: 0.1px;"> <span class="cm-variable">remove_unused_columns</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">bool</span>] <span class="cm-operator">=</span> <span class="cm-keyword">True</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">label_names</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-variable">typing</span>.<span class="cm-property">List</span>[<span class="cm-builtin">str</span>]] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">load_best_model_at_end</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">bool</span>] <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">metric_for_best_model</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">greater_is_better</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">bool</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">ignore_data_skip</span>: <span 
class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">sharded_ddp</span>: <span class="cm-builtin">str</span> <span class="cm-operator">=</span> <span class="cm-string">''</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">fsdp</span>: <span class="cm-builtin">str</span> <span class="cm-operator">=</span> <span class="cm-string">''</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">fsdp_min_num_params</span>: <span class="cm-builtin">int</span> <span class="cm-operator">=</span> <span class="cm-number">0</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">fsdp_transformer_layer_cls_to_wrap</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">deepspeed</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">label_smoothing_factor</span>: <span class="cm-builtin">float</span> <span class="cm-operator">=</span> <span class="cm-number">0.0</span>,</span></pre><pre class=" CodeMirror-line " 
role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">optim</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Union</span>[<span class="cm-variable">transformers</span>.<span class="cm-property">training_args</span>.<span class="cm-property">OptimizerNames</span>, <span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-string">'adamw_hf'</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">optim_args</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">adafactor</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">group_by_length</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">length_column_name</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-string">'length'</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">report_to</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span 
class="cm-variable">typing</span>.<span class="cm-property">List</span>[<span class="cm-builtin">str</span>]] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">ddp_find_unused_parameters</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">bool</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">ddp_bucket_cap_mb</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">int</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">dataloader_pin_memory</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">True</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">skip_memory_metrics</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">True</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">use_legacy_prediction_loop</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">push_to_hub</span>: <span class="cm-builtin">bool</span> <span 
class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">resume_from_checkpoint</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">hub_model_id</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">hub_strategy</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Union</span>[<span class="cm-variable">transformers</span>.<span class="cm-property">trainer_utils</span>.<span class="cm-property">HubStrategy</span>, <span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-string">'every_save'</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">hub_token</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">hub_private_repo</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " 
role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">gradient_checkpointing</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">include_inputs_for_metrics</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">fp16_backend</span>: <span class="cm-builtin">str</span> <span class="cm-operator">=</span> <span class="cm-string">'auto'</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">push_to_hub_model_id</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">push_to_hub_organization</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">push_to_hub_token</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span 
role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">mp_parameters</span>: <span class="cm-builtin">str</span> <span class="cm-operator">=</span> <span class="cm-string">''</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">auto_find_batch_size</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">full_determinism</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">torchdynamo</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">ray_scope</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-string">'last'</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">ddp_timeout</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">int</span>] <span class="cm-operator">=</span> <span class="cm-number">1800</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span 
class="cm-variable">sortish_sampler</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">predict_with_generate</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">generation_max_length</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">int</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">generation_num_beams</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">int</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 2354px;"/><div class="CodeMirror-gutters" style="display: none; height: 2354px;"/></div></div></pre><p><span>Parameters:</span></p><ul><li><code>sortish_sampler</code><span>: a boolean specifying whether to use a </span><code>sortish</code><span> sampler. This is currently only possible when the underlying dataset is a </span><code>Seq2SeqDataset</code><span>, but it will become generally available in the near future.</span> <span>It sorts the inputs by length to minimize the amount of </span><code>padding</code><span>, with a bit of randomness for the training set.</span></li><li><code>predict_with_generate</code><span>: a boolean specifying whether to use </span><code>generate</code><span> to compute generative metrics (</span><code>ROUGE, 
BLEU</code><span>).</span></li><li><code>generation_max_length</code><span>: an integer specifying the maximum length to use in each </span><code>evaluation loop</code><span> when </span><code>predict_with_generate=True</code><span>. Defaults to the </span><code>max_length</code><span> value of the model configuration.</span></li><li><code>generation_num_beams</code><span>: an integer specifying the number of </span><code>beams</code><span> to use in each </span><code>evaluation loop</code><span> when </span><code>predict_with_generate=True</code><span>. Defaults to the </span><code>num_beams</code><span> value of the model configuration.</span></li></ul></li></ol><h4 id="b-trainer"><span>b. Trainer</span></h4><ol start=""><li><p><code>class transformers.Trainer</code><span>: </span><code>Trainer</code><span> is a simple but feature-complete training and evaluation loop for </span><code>PyTorch</code><span>, optimized for </span><code>Transformers</code><span>.</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div 
class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">class</span> <span class="cm-def">transformers</span>.<span class="cm-property">Trainer</span>(</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">model</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Union</span>[<span class="cm-variable">transformers</span>.<span class="cm-property">modeling_utils</span>.<span class="cm-property">PreTrainedModel</span>, <span class="cm-variable">torch</span>.<span class="cm-property">nn</span>.<span class="cm-property">modules</span>.<span class="cm-property">module</span>.<span class="cm-property">Module</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">args</span>: <span class="cm-variable">TrainingArguments</span> <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">data_collator</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-variable">DataCollator</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">train_dataset</span>: 
<span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-variable">torch</span>.<span class="cm-property">utils</span>.<span class="cm-property">data</span>.<span class="cm-property">dataset</span>.<span class="cm-property">Dataset</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">eval_dataset</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-variable">torch</span>.<span class="cm-property">utils</span>.<span class="cm-property">data</span>.<span class="cm-property">dataset</span>.<span class="cm-property">Dataset</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">tokenizer</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-variable">transformers</span>.<span class="cm-property">tokenization_utils_base</span>.<span class="cm-property">PreTrainedTokenizerBase</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">model_init</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Callable</span>[[], <span class="cm-variable">transformers</span>.<span class="cm-property">modeling_utils</span>.<span class="cm-property">PreTrainedModel</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">compute_metrics</span>: <span 
class="cm-variable">typing</span>.<span class="cm-property">Union</span>[<span class="cm-variable">typing</span>.<span class="cm-property">Callable</span>[[<span class="cm-variable">transformers</span>.<span class="cm-property">trainer_utils</span>.<span class="cm-property">EvalPrediction</span>], <span class="cm-variable">typing</span>.<span class="cm-property">Dict</span>], <span class="cm-variable">NoneType</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">callbacks</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-variable">typing</span>.<span class="cm-property">List</span>[<span class="cm-variable">transformers</span>.<span class="cm-property">trainer_callback</span>.<span class="cm-property">TrainerCallback</span>]] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">optimizers</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Tuple</span>[<span class="cm-variable">torch</span>.<span class="cm-property">optim</span>.<span class="cm-property">optimizer</span>.<span class="cm-property">Optimizer</span>, <span class="cm-variable">torch</span>.<span class="cm-property">optim</span>.<span class="cm-property">lr_scheduler</span>.<span class="cm-property">LambdaLR</span>] <span class="cm-operator">=</span> (<span class="cm-keyword">None</span>, <span class="cm-keyword">None</span>),</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">preprocess_logits_for_metrics</span>: <span class="cm-variable">typing</span>.<span 
class="cm-property">Callable</span>[[<span class="cm-variable">torch</span>.<span class="cm-property">Tensor</span>, <span class="cm-variable">torch</span>.<span class="cm-property">Tensor</span>], <span class="cm-variable">torch</span>.<span class="cm-property">Tensor</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 330px;"/><div class="CodeMirror-gutters" style="display: none; height: 330px;"/></div></div></pre><p><span>Parameters:</span></p><ul><li><p><code>model</code><span>: a </span><code>PreTrainedModel</code><span> or </span><code>torch.nn.Module</code><span> object specifying the model to train, evaluate, or use for prediction. If it is not provided, the </span><code>model_init</code><span> argument must be passed.</span></p><p><code>Trainer</code><span> is optimized to work with </span><code>PreTrainedModel</code><span>. You can still use a custom model defined as a </span><code>torch.nn.Module</code><span>, as long as it works the same way as a </span><code>Transformers</code><span> model.</span></p></li><li><p><code>args</code><span>: a </span><code>TrainingArguments</code><span> object specifying the training arguments. If it is not provided, it defaults to a basic instance of </span><code>TrainingArguments</code><span> with </span><code>output_dir</code><span> set to a directory named </span><code>tmp_trainer</code><span> in the current directory.</span></p></li><li><p><code>data_collator</code><span>: a </span><code>DataCollator</code><span> specifying the function used to build a </span><code>batch</code><span> from a list of elements of </span><code>train_dataset</code><span> or </span><code>eval_dataset</code><span>. If no </span><code>tokenizer</code><span> is provided, the default </span><code>DataCollator</code><span> is </span><code>default_data_collator()</code><span>; otherwise the default </span><code>DataCollator</code><span> is a </span><code>DataCollatorWithPadding</code><span> 
instance.</span></p></li><li><p><code>train_dataset</code><span>: a </span><code>torch.utils.data.Dataset</code><span> or </span><code>torch.utils.data.IterableDataset</code><span> specifying the training set. If it is a </span><code>HuggingFace</code><span> </span><code>Dataset</code><span>, columns not accepted by the </span><code>model.forward()</code><span> method are removed automatically.</span></p><p><span>Note that if it is a </span><code>torch.utils.data.IterableDataset</code><span> with some randomization and you are training in a distributed fashion, your iterable dataset should either use an internal attribute </span><code>generator</code><span> that is a </span><code>torch.Generator</code><span> used for the randomization and that must be identical on all processes (the </span><code>Trainer</code><span> will manually set the seed of this </span><code>generator</code><span> at each </span><code>epoch</code><span>), or have a </span><code>set_epoch()</code><span> method that internally sets the seed of the random number generators used.</span></p></li><li><p><code>eval_dataset</code><span>: a </span><code>torch.utils.data.Dataset</code><span> or </span><code>torch.utils.data.IterableDataset</code><span> specifying the validation set. If it is a </span><code>HuggingFace</code><span> </span><code>Dataset</code><span>, columns not accepted by the </span><code>model.forward()</code><span> method are removed automatically.</span></p><p><span>If it is a dictionary (mapping dataset names to datasets), evaluation runs on each dataset and the dataset name is prepended to the metric names as a prefix.</span></p></li><li><p><code>tokenizer</code><span>: a </span><code>PreTrainedTokenizerBase</code><span> specifying the </span><code>tokenizer</code><span> used to preprocess the data. If provided, it is used to automatically pad the inputs to the maximum length when batching them, and it is saved along with the model to make it easier to rerun an interrupted training or reuse the fine-tuned model.</span></p></li><li><p><code>model_init</code><span>: a callable that instantiates the model to be used. If provided, each call to </span><code>train()</code><span> starts from a new instance of the model as given by this function.</span></p><p><span>The function may take zero arguments, or a single argument containing the </span><code>optuna/Ray Tune/SigOpt</code><span> trial object, so that it can choose a different architecture according to the hyperparameters (such as the number of layers, the layer dimensions, the </span><code>dropout rate</code><span>, and so on). The function returns a </span><code>PreTrainedModel</code><span> 
object.</span></p></li><li><p><code>compute_metrics</code><span>: a callable specifying the function used to compute metrics at evaluation.</span></p><p><code>compute_metrics</code><span> must take an </span><code>EvalPrediction</code><span> and return a dictionary of metrics.</span></p></li><li><p><code>callbacks</code><span>: a list of </span><code>TrainerCallback</code><span> objects specifying custom callbacks for the </span><code>training loop</code><span>. </span><code>Trainer</code><span> adds them to the list of default callbacks.</span></p><p><span>To remove one of the default callbacks, use the </span><code>Trainer.remove_callback()</code><span> method.</span></p></li><li><p><code>optimizers</code><span>: a tuple </span><code>Tuple[torch.optim.Optimizer, torch.optim.lr_scheduler.LambdaLR]</code><span> specifying the optimizer and the scheduler to use.</span></p><p><span>Defaults to an </span><code>AdamW</code><span> instance on your model and a scheduler given by </span><code>get_linear_schedule_with_warmup()</code><span>, controlled by </span><code>args</code><span>.</span></p></li><li><p><code>preprocess_logits_for_metrics</code><span>: a callable that preprocesses the </span><code>logits</code><span> right before they are cached at each evaluation </span><code>step</code><span>.</span></p><p><code>preprocess_logits_for_metrics</code><span> must take two tensors, the </span><code>logits</code><span> and the </span><code>labels</code><span>, and return the </span><code>logits</code><span> processed as desired. The modifications made by </span><code>preprocess_logits_for_metrics</code><span> are reflected in the predictions received by </span><code>compute_metrics</code><span>.</span> <span>Note that the </span><code>labels</code><span> argument (the second parameter) is </span><code>None</code><span> if the dataset has no </span><code>labels</code><span>.</span></p></li></ul><p><span>Important attributes:</span></p><ul><li><p><code>model</code><span>: always points to the core model. If you use a </span><code>transformers</code><span> model, it is a </span><code>PreTrainedModel</code><span> subclass.</span></p></li><li><p><code>model_wrapped</code><span>: always points to the outermost model, since one or more other modules may wrap the original model. </span><code>model_wrapped</code><span> 
is the model that should be used for the forward pass. For example, under </span><code>DeepSpeed</code><span>, the inner model is wrapped in </span><code>DeepSpeed</code><span> and then again in </span><code>torch.nn.parallel.DistributedDataParallel</code><span>.</span></p><p><span>If the inner model has not been wrapped, </span><code>self.model_wrapped</code><span> is the same as </span><code>self.model</code><span>.</span></p></li><li><p><code>is_model_parallel</code><span>: whether the model has been switched to model-parallel mode (unlike data parallelism, this means that some of the model layers are split across different </span><code>GPU</code><span> devices).</span></p></li><li><p><code>place_model_on_device</code><span>: whether to automatically place the model on the device. It is set to </span><code>False</code><span> if model parallelism or </span><code>deepspeed</code><span> is used, or if the default </span><code>TrainingArguments.place_model_on_device</code><span> is overridden to return </span><code>False</code><span>.</span></p></li><li><p><code>is_in_train</code><span>: whether the model is currently running training (for example, when evaluation is called during training).</span></p></li></ul><p><span>Methods:</span></p><ul><li><p><code>add_callback(callback: transformers.TrainerCallback)</code><span>: adds a </span><code>callback</code><span> to the current list of </span><code>transformers.TrainerCallback</code><span>.</span></p><p><span>Parameters: </span><code>callback</code><span>: a </span><code>transformers.TrainerCallback</code><span> class or an instance of </span><code>transformers.TrainerCallback</code><span>. If a class is passed, </span><code>Trainer</code><span> instantiates it.</span></p></li><li><p><code>autocast_smart_context_manager( cache_enabled: typing.Optional[bool] = True )</code><span>: a helper </span><code>wrapper</code><span> that creates an appropriate context manager for </span><code>autocast</code><span> while feeding it the desired arguments, depending on the situation.</span></p><p><span>It is used for mixed-precision training, that is, </span><code>torch.cuda.amp.autocast()</code><span>.</span></p></li><li><p><code>compute_loss(model, inputs, return_outputs=False)</code><span>: how the loss is computed by </span><code>Trainer</code><span>. By default, for all models the </span><code>output</code><span> 
holds the loss as its first element.</span></p></li><li><p><code>compute_loss_context_manager()</code><span>: a helper wrapper that groups together the context managers for </span><code>compute_loss</code><span> (for example, mixed-precision training).</span></p></li><li><p><code>create_model_card()</code><span>: creates a draft of a </span><code>model card</code><span>.</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">create_model_card</span>(</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 
0.1px;"> <span class="cm-variable">language</span>: <span class="cm-variable">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">license</span>: <span class="cm-variable">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">tags</span>: <span class="cm-variable">Union</span>[<span class="cm-builtin">str</span>, <span class="cm-variable">List</span>[<span class="cm-builtin">str</span>], <span class="cm-keyword">None</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">model_name</span>: <span class="cm-variable">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">finetuned_from</span>: <span class="cm-variable">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">tasks</span>: <span class="cm-variable">Union</span>[<span class="cm-builtin">str</span>, <span class="cm-variable">List</span>[<span class="cm-builtin">str</span>], <span class="cm-keyword">None</span>] <span class="cm-operator">=</span> <span 
class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">dataset_tags</span>: <span class="cm-variable">Union</span>[<span class="cm-builtin">str</span>, <span class="cm-variable">List</span>[<span class="cm-builtin">str</span>], <span class="cm-keyword">None</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">dataset</span>: <span class="cm-variable">Union</span>[<span class="cm-builtin">str</span>, <span class="cm-variable">List</span>[<span class="cm-builtin">str</span>], <span class="cm-keyword">None</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">dataset_args</span>: <span class="cm-variable">Union</span>[<span class="cm-builtin">str</span>, <span class="cm-variable">List</span>[<span class="cm-builtin">str</span>], <span class="cm-keyword">None</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 242px;"/><div class="CodeMirror-gutters" style="display: none; height: 242px;"/></div></div></pre><p><span>Parameters:</span></p><ul><li><code>language</code><span>: a string specifying the language of the model.</span></li><li><code>license</code><span>: a string specifying the </span><code>license</code><span> of the model.</span></li><li><code>tags</code><span>: a string or a list of strings specifying the model card </span><code>tag</code><span> 
values.</span></li><li><code>model_name</code><span>: a string specifying the name of the model.</span></li><li><code>finetuned_from</code><span>: a string specifying the model from which the current model was fine-tuned.</span></li><li><code>tasks</code><span>: a string or a list of strings specifying the tasks the current model is used for.</span></li><li><code>dataset_tags</code><span>: a string or a list of strings specifying the dataset </span><code>tag</code><span> values.</span></li><li><code>dataset</code><span>: a string or a list of strings specifying the dataset </span><code>identifier</code><span> values.</span></li><li><code>dataset_args</code><span>: a string or a list of strings specifying the dataset arguments.</span></li></ul></li><li><p><code>create_optimizer()</code><span>: creates the optimizer.</span></p><p><span>A reasonable default that works well is provided. To use a custom optimizer, either pass a tuple through the </span><code>optimizers</code><span> argument of the </span><code>Trainer</code><span> </span><code>init</code><span> method or override this method in a subclass.</span></p></li><li><p><code>create_optimizer_and_scheduler(num_training_steps: int )</code><span>: creates the optimizer and the learning rate scheduler.</span></p><p><span>Parameters: </span><code>num_training_steps</code><span>: an integer specifying the total number of training steps.</span></p><p><span>A reasonable default that works well is provided. To use a custom optimizer and scheduler, either pass a tuple through the </span><code>optimizers</code><span> argument of the </span><code>Trainer</code><span> </span><code>init</code><span> method or override this method in a subclass.</span></p></li><li><p><code>create_scheduler( num_training_steps: int, optimizer: Optimizer = None)</code><span>: creates the learning rate scheduler.</span></p><p><span>Parameters: see </span><code>create_optimizer_and_scheduler()</code><span>.</span></p><p><span>A reasonable default that works well is provided. To use a custom scheduler, either pass a tuple through the </span><code>optimizers</code><span> argument of the </span><code>Trainer</code><span> </span><code>init</code><span> method or override this method in a subclass.</span></p></li><li><p><code>evaluate()</code><span>: evaluates the model and returns the evaluation metrics. Note that the user must provide a function to compute the metrics (through the </span><code>compute_metrics</code><span> argument of the </span><code>Trainer</code><span> </span><code>init</code><span> method).</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div 
class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">evaluate</span>(</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">eval_dataset</span>: <span class="cm-variable">Optional</span>[<span class="cm-variable">Dataset</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>, <span class="cm-variable">ignore_keys</span>: <span class="cm-variable">Optional</span>[<span class="cm-variable">List</span>[<span class="cm-builtin">str</span>]] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre 
class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">metric_key_prefix</span>: <span class="cm-builtin">str</span> <span class="cm-operator">=</span> <span class="cm-string">"eval"</span> ) <span class="cm-operator">-></span> <span class="cm-variable">Dict</span>[<span class="cm-builtin">str</span>, <span class="cm-builtin">float</span>]</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 66px;"/><div class="CodeMirror-gutters" style="display: none; height: 66px;"/></div></div></pre><p><span>Parameters:</span></p><ul><li><code>eval_dataset</code><span>: a </span><code>Dataset</code><span> specifying the evaluation set. If it is not </span><code>None</code><span>, it overrides </span><code>self.eval_dataset</code><span>. It must implement the </span><code>__len__()</code><span> method. Columns that the forward pass does not need are removed automatically.</span></li><li><code>ignore_keys</code><span>: a list of strings specifying which entries of the </span><code>model output</code><span> to ignore, by </span><code>key</code><span> (if the </span><code>model output</code><span> is a dictionary).</span></li><li><code>metric_key_prefix</code><span>: a string specifying the prefix added to the metric names. Defaults to </span><code>eval</code><span>.</span></li></ul></li><li><p><code>evaluation_loop()</code><span>: the </span><code>prediction/evaluation</code><span> </span><code>loop</code><span>, used by </span><code>Trainer.evaluate()</code><span> and </span><code>Trainer.predict()</code><span>.</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" 
cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">evaluation_loop</span>(</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">dataloader</span>: <span class="cm-variable">DataLoader</span>, <span class="cm-variable">description</span>: <span class="cm-builtin">str</span>, <span class="cm-variable">prediction_loss_only</span>: <span class="cm-variable">Optional</span>[<span class="cm-builtin">bool</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">ignore_keys</span>: <span class="cm-variable">Optional</span>[<span class="cm-variable">List</span>[<span class="cm-builtin">str</span>]] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>, <span class="cm-variable">metric_key_prefix</span>: <span class="cm-builtin">str</span> <span 
class="cm-operator">=</span> <span class="cm-string">"eval"</span> ) <span class="cm-operator">-></span> <span class="cm-variable">EvalLoopOutput</span></span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 66px;"/><div class="CodeMirror-gutters" style="display: none; height: 66px;"/></div></div></pre><p><span>Parameters:</span></p><ul><li><code>dataloader</code><span>: a </span><code>DataLoader</code><span>.</span></li><li><code>description</code><span>: a string specifying the description text.</span></li><li><code>prediction_loss_only</code><span>: a boolean specifying whether to compute only the loss.</span></li><li><span>For the other parameters, see </span><code>evaluate()</code><span>.</span></li></ul></li><li><p><code>floating_point_ops( inputs: typing.Dict[str, typing.Union[torch.Tensor, typing.Any]] ) -> int</code><span>: returns the number of floating-point operations of the model.</span></p><p><span>Parameters: </span><code>inputs</code><span>: a dictionary with string keys and </span><code>torch.Tensor</code><span> or other objects as values, specifying the model's </span><code>inputs</code><span> and </span><code>targets</code><span>. The floating-point operations are counted on these.</span></p><p><span>For models that inherit from </span><code>PreTrainedModel</code><span>, the method provided by the model is used to compute the number of floating-point operations for each backward + forward pass. With any other model, either implement such a method in the model or override this method in a subclass.</span></p></li><li><p><code>get_eval_dataloader( eval_dataset: Optional[Dataset] = None) -> DataLoader</code><span>: returns the evaluation </span><code>dataloader</code><span>.</span></p><p><span>Parameters: </span><code>eval_dataset</code><span>: a </span><code>torch.utils.data.Dataset</code><span>; if provided, it overrides </span><code>self.eval_dataset</code><span>. If it is a </span><code>datasets.Dataset</code><span>, columns that </span><code>model.forward()</code><span> does not need are removed automatically.</span></p></li><li><p><code>get_test_dataloader( test_dataset: Dataset) -> DataLoader</code><span>: returns the test </span><code>dataloader</code><span>.</span></p><p><span>Parameters: see </span><code>get_eval_dataloader</code><span>.</span></p></li><li><p><code>get_train_dataloader() -> DataLoader</code><span>: returns the training 
</span><code>dataloader</code><span>.</span></p><p><span>If </span><code>train_dataset</code><span> does not implement the </span><code>__len__</code><span> method, no </span><code>sampler</code><span> is used; otherwise a </span><code>random sampler</code><span> is used (adapted to distributed training if necessary).</span></p></li><li><p><code>get_optimizer_cls_and_kwargs(args: TrainingArguments ) -> Tuple[Any, Any]</code><span>: based on the </span><code>training arguments</code><span>, returns the optimizer </span><code>class</code><span> and the optimizer arguments.</span></p></li><li><p><code>hyperparameter_search()</code><span>: launches a hyperparameter search using </span><code>optuna</code><span>, </span><code>Ray Tune</code><span>, or </span><code>SigOpt</code><span>.</span></p><p><span>The quantity being optimized is determined by </span><code>compute_objective</code><span>, which defaults to a function returning the </span><code>evaluation loss</code><span> when no metrics are provided, and the </span><code>sum</code><span> of all metrics otherwise.</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" 
style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">hyperparameter_search</span>(</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">hp_space</span>: <span class="cm-variable">Optional</span>[<span class="cm-variable">Callable</span>[[<span class="cm-string">"optuna.Trial"</span>], <span class="cm-variable">Dict</span>[<span class="cm-builtin">str</span>, <span class="cm-builtin">float</span>]]] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">compute_objective</span>: <span class="cm-variable">Optional</span>[<span class="cm-variable">Callable</span>[[<span class="cm-variable">Dict</span>[<span class="cm-builtin">str</span>, <span class="cm-builtin">float</span>]], <span class="cm-builtin">float</span>]] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">n_trials</span>: <span class="cm-builtin">int</span> <span class="cm-operator">=</span> <span class="cm-number">20</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">direction</span>: <span class="cm-builtin">str</span> <span class="cm-operator">=</span> <span class="cm-string">"minimize"</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" 
style="padding-right: 0.1px;"> <span class="cm-variable">backend</span>: <span class="cm-variable">Optional</span>[<span class="cm-variable">Union</span>[<span class="cm-string">"str"</span>, <span class="cm-variable">HPSearchBackend</span>]] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">hp_name</span>: <span class="cm-variable">Optional</span>[<span class="cm-variable">Callable</span>[[<span class="cm-string">"optuna.Trial"</span>], <span class="cm-builtin">str</span>]] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-operator">**</span><span class="cm-variable">kwargs</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">) <span class="cm-operator">-></span> <span class="cm-variable">BestRun</span></span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 198px;"/><div class="CodeMirror-gutters" style="display: none; height: 198px;"/></div></div></pre><p><span>To use this method, you must provide a </span><code>model_init</code><span> when initializing your </span><code>Trainer</code><span>, because the model has to be reinitialized at every </span><code>new run</code><span>. This is incompatible with the </span><code>optimizers</code><span> argument, so you need to subclass </span><code>Trainer</code><span> and override the </span><code>create_optimizer_and_scheduler()</code><span> method to get a custom </span><code>optimizer/scheduler</code><span>.</span></p><p><span>Parameters:</span></p><ul><li><code>hp_space</code><span>: a callable that defines the hyperparameter search space. Defaults to </span><code>default_hp_space_optuna()</code><span>, </span><code>default_hp_space_ray()</code><span>, or 
</span><code>default_hp_space_sigopt()</code><span>, depending on your backend.</span></li><li><code>compute_objective</code><span>: a callable that computes the objective to be minimized or maximized. Defaults to </span><code>default_compute_objective()</code><span>.</span></li><li><code>n_trials</code><span>: an integer specifying the number of </span><code>trial runs</code><span> to test.</span></li><li><code>direction</code><span>: a string specifying whether to minimize or maximize the objective. Either </span><code>"minimize"</code><span> or </span><code>"maximize"</code><span>.</span></li><li><code>backend</code><span>: a string specifying the hyperparameter search backend. Defaults to </span><code>optuna</code><span>, </span><code>Ray Tune</code><span>, or </span><code>SigOpt</code><span>, depending on which one is installed. If all of them are installed, it defaults to </span><code>optuna</code><span>.</span></li><li><code>hp_name</code><span>: a callable whose return value gives the name of the </span><code>trial/run</code><span>. Defaults to </span><code>None</code><span>.</span></li><li><code>kwargs</code><span>: additional keyword arguments passed to </span><code>optuna.create_study</code><span> or </span><code>ray.tune.run</code><span>.</span></li></ul><p><span>Returns a </span><code>trainer_utils.BestRun</code><span> containing all the information about the </span><code>best run</code><span>.</span></p></li><li><p><code>init_git_repo(at_init: bool = False)</code><span>: initializes a </span><code>git repo</code><span> in </span><code>self.args.hub_model_id</code><span>.</span></p><p><span>Parameters:</span></p><ul><li><code>at_init</code><span>: a boolean specifying whether this function is called before any training. If </span><code>self.args.overwrite_output_dir = True</code><span> and </span><code>at_init=True</code><span>, the path of the </span><code>repo</code><span> (that is, </span><code>self.args.output_dir</code><span>) may be wiped.</span></li></ul></li><li><p><code>is_local_process_zero() -> bool</code><span>: returns whether the current process is the </span><code>local</code><span> main process.</span></p><p><code>local</code><span> refers to the </span><code>local</code><span> machine in a distributed training environment.</span></p></li><li><p><code>is_world_process_zero() -> bool</code><span>: returns whether the current process is the </span><code>global</code><span> main process.</span></p><p><span>When distributed training runs on multiple machines, only one process is the </span><code>global</code><span> main process, but several processes may be 
</span><code>local</code><span> main processes.</span></p></li><li><p><code>log( logs: Dict[str, float])</code><span>: writes a log entry.</span></p><p><span>Parameters: </span><code>logs</code><span>: the values to log.</span></p></li><li><p><code>log_metrics(split: str, metrics: Dict[str, float])</code><span>: logs metrics in a specially formatted way.</span></p><p><span>Parameters:</span></p><ul><li><code>split</code><span>: a string specifying the </span><code>split</code><span> name, such as </span><code>train</code><span>, </span><code>eval</code><span>, or </span><code>test</code><span>.</span></li><li><code>metrics</code><span>: a dictionary of the metric values to log.</span></li></ul><p><span>Note: in a distributed environment, logging is done only by the process with </span><code>rank = 0</code><span>.</span></p><p><span>A note on memory reports: to get memory usage reports you need to install </span><code>psutil</code><span> (</span><code>pip install psutil</code><span>). Then, when </span><code>log_metrics()</code><span> runs, you will see a report like the following:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" 
style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">init_mem_cpu_alloc_delta</span> <span class="cm-operator">=</span> <span class="cm-number">1301</span><span class="cm-variable">MB</span></span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">init_mem_cpu_peaked_delta</span> <span class="cm-operator">=</span> <span class="cm-number">154</span><span class="cm-variable">MB</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">init_mem_gpu_alloc_delta</span> <span class="cm-operator">=</span> <span class="cm-number">230</span><span class="cm-variable">MB</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">init_mem_gpu_peaked_delta</span> <span class="cm-operator">=</span> <span class="cm-number">0</span><span class="cm-variable">MB</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">train_mem_cpu_alloc_delta</span> <span class="cm-operator">=</span> <span class="cm-number">1345</span><span class="cm-variable">MB</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">train_mem_cpu_peaked_delta</span> <span class="cm-operator">=</span> <span class="cm-number">0</span><span class="cm-variable">MB</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" 
style="padding-right: 0.1px;"><span class="cm-variable">train_mem_gpu_alloc_delta</span> <span class="cm-operator">=</span> <span class="cm-number">693</span><span class="cm-variable">MB</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">train_mem_gpu_peaked_delta</span> <span class="cm-operator">=</span> <span class="cm-number">7</span><span class="cm-variable">MB</span></span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 176px;"/><div class="CodeMirror-gutters" style="display: none; height: 176px;"/></div></div></pre><p><span>where:</span></p><ul><li><p><span>The first field (e.g. </span><code>train_</code><span>) tells you which </span><code>stage</code><span> the metric is for. Reports starting with </span><code>init_</code><span> are added to the first </span><code>stage</code><span> that runs. So if only evaluation is run, the </span><code>init_</code><span> memory usage is reported together with the </span><code>eval_</code><span> metrics.</span></p></li><li><p><span>The third field is </span><code>cpu</code><span> or </span><code>gpu</code><span>, and tells you whether this is a general </span><code>RAM</code><span> metric or a </span><code>GPU0</code><span> memory metric.</span></p></li><li><p><code>alloc_delta</code><span> is the difference in the used/allocated memory counter between the end and the start of the </span><code>stage</code><span>. It can be negative if a function released more memory than it allocated.</span></p></li><li><p><code>peaked_delta</code><span> is any extra memory that was consumed and then freed. It is never negative.</span></p><p><span>When you look at the memory metrics of any </span><code>stage</code><span>, add up </span><code>alloc_delta + peaked_delta</code><span> to find out how much memory was needed to complete that </span><code>stage</code><span>.</span></p></li></ul><p><span>Reporting happens only for the process with </span><code>rank = 0</code><span> and for </span><code>gpu 0</code><span> (if there is a </span><code>gpu</code><span>). That is usually enough because the main process does most of the work, but with model parallelism the other </span><code>GPU</code><span>s may use a different amount of </span><code>gpu</code><span> memory. Under </span><code>DataParallel</code><span> 
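the picture differs too: </span><code>gpu0</code><span> may need much more memory than the other </span><code>gpu</code><span>s, because for all participating </span><code>GPU</code><span>s it stores the gradients and </span><code>optimizer states</code><span>. These reports may evolve to measure those as well in the future.</span></p><p><code>CPU RAM</code><span> metrics measure </span><code>RSS</code><span> (</span><code>Resident Set Size</code><span>), which includes both memory unique to the process and memory shared with other processes. Note that it does not include </span><code>swapped out memory</code><span>, so the reports may be imprecise.</span></p><p><code>CPU</code><span> peak memory is measured with a sampling thread. Because of </span><code>python</code><span>'s </span><code>GIL</code><span>, the thread can miss part of the peak if it gets no chance to run while memory usage is at its highest, so this report may be lower than the real value. Using </span><code>tracemalloc</code><span> would report the exact peak memory, but it does not report memory allocations outside of </span><code>python</code><span>, so memory that a </span><code>C++ CUDA extension</code><span> allocates on its own would go unreported. It was therefore dropped in favor of memory sampling, which reads the memory usage of the current process.</span></p><p><code>GPU</code><span> allocated and peak memory are reported through </span><code>torch.cuda.memory_allocated()</code><span> and </span><code>torch.cuda.max_memory_allocated()</code><span>. This metric reports only the </span><code>"deltas"</code><span> of </span><code>pytorch-specific allocation</code><span>, because the </span><code>torch.cuda</code><span> memory management system does not track any memory allocated outside of </span><code>pytorch</code><span>. For example, the very first </span><code>cuda</code><span> call typically loads the </span><code>CUDA kernel</code><span>s, which can take from </span><code>0.5</code><span> to </span><code>2GB</code><span> of </span><code>GPU</code><span> memory.</span></p><p><span>Note that this </span><code>tracker</code><span> does not account for memory allocated outside of the </span><code>Trainer</code><span>'s </span><code>__init__</code><span>, training, evaluation, and prediction calls.</span></p><p><span>Because </span><code>evaluation</code><span> calls may happen during training, nested invocations cannot be handled: </span><code>torch.cuda.max_memory_allocated</code><span> is a single counter, so if a nested </span><code>evaluation</code><span> call resets it, the </span><code>train</code><span> </span><code>tracker</code><span> reports incorrect information. If this </span><code>pytorch</code><span> issue gets resolved, it will be possible to make this class reentrant. Until then, only the outer level of the </span><code>train</code><span>, </span><code>evaluation</code><span>, and </span><code>predict</code><span> methods is tracked. This means that if 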
</span><code>eval</code><span> is called during </span><code>train</code><span>, the </span><code>train</code><span> stage report also accounts for the memory used by </span><code>eval</code><span>.</span></p><p><span>It also means that if any other tool used together with the </span><code>Trainer</code><span> calls </span><code>torch.cuda.reset_peak_memory_stats</code><span>, the </span><code>gpu</code><span> peak memory statistics may be invalid. Likewise, the </span><code>Trainer</code><span> disrupts the normal behavior of any tool that relies on calling </span><code>torch.cuda.reset_peak_memory_stats</code><span> itself.</span></p><p><span>For best performance, consider turning off </span><code>memory profiling</code><span> in production runs.</span></p></li><li><p><code>metrics_format(metrics: Dict[str, float]) -> Dict[str, float]</code><span>: formats the </span><code>Trainer</code><span> metric values into a human-readable format.</span></p><p><span>Parameters: </span><code>metrics</code><span>: a dictionary of the metric values to format.</span></p></li><li><p><code>num_examples(dataloader: DataLoader) -> int</code><span>: returns the number of samples in the dataset.</span></p><p><span>If </span><code>dataloader.dataset</code><span> does not exist, or </span><code>dataloader.dataset</code><span> has no length, the method makes a best-effort estimate.</span></p></li><li><p><code>pop_callback(callback) -> transformers.TrainerCallback</code><span>: removes a </span><code>callback</code><span> from the current list of </span><code>TrainerCallback</code><span> objects and returns it.</span></p><p><span>Parameters: see </span><code>add_callback()</code><span>.</span></p></li><li><p><code>predict(test_dataset: Dataset, ignore_keys: Optional[List[str]] = None, metric_key_prefix: str = "test") -> PredictionOutput</code><span>: runs prediction.</span></p><p><span>Parameters:</span></p><ul><li><code>test_dataset</code><span>: a </span><code>Dataset</code><span> specifying the test set.</span></li><li><span>For the other parameters, see </span><code>evaluate()</code><span>.</span></li></ul><p><span>Returns a named tuple with the following fields:</span></p><ul><li><code>predictions</code><span>: an </span><code>np.ndarray</code><span> containing the predictions on the test set.</span></li><li><code>label_ids</code><span>: an </span><code>np.ndarray</code><span> containing the </span><code>labels</code><span> (if the test set has them).</span></li><li><code>metrics</code><span>: a dictionary of metrics for the predictions on the test set (only if the test set contains 
</span><code>labels</code><span>).</span></li></ul><p><span>If the test set contains </span><code>labels</code><span>, the method also returns metrics, as </span><code>evaluate()</code><span> does.</span></p><p><span>If your </span><code>predictions</code><span> or </span><code>labels</code><span> have different sequence lengths (for example, because of dynamic padding in a </span><code>token</code><span> classification task), the </span><code>predictions</code><span> are padded (on the right) so that they can be concatenated into a single array. The padding uses </span><code>padding index = -100</code><span>.</span></p></li><li><p><code>prediction_loop()</code><span>: the </span><code>prediction/evaluation</code><span> </span><code>loop</code><span>, used by </span><code>Trainer.evaluate()</code><span> and </span><code>Trainer.predict()</code><span>.</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" 
style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">prediction_loop</span>(</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">dataloader</span>: <span class="cm-variable">DataLoader</span>, <span class="cm-variable">description</span>: <span class="cm-builtin">str</span>, <span class="cm-variable">prediction_loss_only</span>: <span class="cm-variable">Optional</span>[<span class="cm-builtin">bool</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">ignore_keys</span>: <span class="cm-variable">Optional</span>[<span class="cm-variable">List</span>[<span class="cm-builtin">str</span>]] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>, <span class="cm-variable">metric_key_prefix</span>: <span class="cm-builtin">str</span> <span class="cm-operator">=</span> <span class="cm-string">"eval"</span> ) <span class="cm-operator">-></span> <span class="cm-variable">PredictionOutput</span></span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 66px;"/><div class="CodeMirror-gutters" style="display: none; height: 66px;"/></div></div></pre><p><span>Parameters: see </span><code>evaluation_loop()</code><span>.</span></p></li><li><p><code>prediction_step()</code><span>: performs a single </span><code>evaluation step</code><span> on the model using </span><code>inputs</code><span>.</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: 
hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">prediction_step</span>(</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">model</span>: <span class="cm-variable">nn</span>.<span class="cm-property">Module</span>, <span class="cm-variable">inputs</span>: <span class="cm-variable">Dict</span>[<span class="cm-builtin">str</span>, <span class="cm-variable">Union</span>[<span class="cm-variable">torch</span>.<span class="cm-property">Tensor</span>, <span class="cm-variable">Any</span>]],<span class="cm-variable">prediction_loss_only</span>: <span class="cm-builtin">bool</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span 
role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">ignore_keys</span>: <span class="cm-variable">Optional</span>[<span class="cm-variable">List</span>[<span class="cm-builtin">str</span>]] <span class="cm-operator">=</span> <span class="cm-keyword">None</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">) <span class="cm-operator">-></span> <span class="cm-variable">Tuple</span>[<span class="cm-variable">Optional</span>[<span class="cm-variable">torch</span>.<span class="cm-property">Tensor</span>], <span class="cm-variable">Optional</span>[<span class="cm-variable">torch</span>.<span class="cm-property">Tensor</span>], <span class="cm-variable">Optional</span>[<span class="cm-variable">torch</span>.<span class="cm-property">Tensor</span>]]</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 88px;"/><div class="CodeMirror-gutters" style="display: none; height: 88px;"/></div></div></pre><p><span>Parameters:</span></p><ul><li><code>model</code><span>: an </span><code>nn.Module</code><span> object specifying the model to use.</span></li><li><code>inputs</code><span>: a dictionary with string keys and </span><code>torch.Tensor</code><span> or other objects as values, specifying the model's </span><code>inputs</code><span> and </span><code>targets</code><span>.</span></li><li><span>For the other parameters, see </span><code>evaluation_loop</code><span>.</span></li></ul><p><span>Returns a tuple of </span><code>loss, logits, labels</code><span> (some of which may be missing).</span></p></li><li><p><code>push_to_hub(commit_message: Optional[str] = "End of training", blocking: bool = True, **kwargs) -> str</code><span>: uploads </span><code>self.model</code><span> and </span><code>self.tokenizer</code><span> to the </span><code>model hub</code><span> </span><code>repo</code><span> named by </span><code>self.args.hub_model_id</code><span> 
.</span></p></li><li><p><code>remove_callback(callback)</code><span>: removes a </span><code>callback</code><span> from the current list of </span><code>TrainerCallback</code><span> objects.</span></p><p><span>Parameters: see </span><code>add_callback()</code><span>.</span></p></li><li><p><code>save_metrics(split, metrics, combined=True)</code><span>: saves the metrics for the given </span><code>split</code><span> to a </span><code>json</code><span> file, such as </span><code>train_results.json</code><span>.</span></p><p><span>Parameters:</span></p><ul><li><code>combined</code><span>: a boolean specifying whether to also write the metrics of every </span><code>split</code><span>, combined, to </span><code>all_results.json</code><span>.</span></li><li><span>For the other parameters, see </span><code>log_metrics()</code><span>.</span></li></ul><p><span>Note: in a distributed environment, saving is done only by the process with </span><code>rank = 0</code><span>.</span></p></li><li><p><code>save_model(output_dir: Optional[str] = None, _internal_call: bool = False)</code><span>: saves the model so that it can later be reloaded with the </span><code>from_pretrained()</code><span> method.</span></p><p><span>Note: saving is done only from the main process. Besides the model, the corresponding </span><code>tokenizer</code><span> is saved as well.</span></p></li><li><p><code>save_state()</code><span>: saves the </span><code>Trainer state</code><span>.</span></p><p><span>Note: in a distributed environment, saving is done only by the process with </span><code>rank = 0</code><span>.</span></p></li><li><p><code>train()</code><span>: trains the model.</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; 
padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">train</span>(</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">resume_from_checkpoint</span>: <span class="cm-variable">Optional</span>[<span class="cm-variable">Union</span>[<span class="cm-builtin">str</span>, <span class="cm-builtin">bool</span>]] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>, </span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">trial</span>: <span class="cm-variable">Union</span>[<span class="cm-variable">optuna</span>.<span class="cm-property">Trial</span>, <span class="cm-variable">Dict</span>[<span class="cm-builtin">str</span>, <span class="cm-variable">Any</span>]] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">ignore_keys_for_eval</span>: <span class="cm-variable">Optional</span>[<span class="cm-variable">List</span>[<span class="cm-builtin">str</span>]] <span class="cm-operator">=</span> 
<span class="cm-keyword">None</span>, <span class="cm-operator">**</span><span class="cm-variable">kwargs</span>)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 88px;"/><div class="CodeMirror-gutters" style="display: none; height: 88px;"/></div></div></pre><p><span>Parameters:</span></p><ul><li><p><code>resume_from_checkpoint</code><span>: a string or a boolean.</span></p><ul><li><span>If it is a string, it is the </span><code>local path</code><span> of a </span><code>checkpoint</code><span> saved by a previous instance of </span><code>Trainer</code><span>. Training resumes from this </span><code>checkpoint</code><span>.</span></li><li><span>If it is a boolean equal to </span><code>True</code><span>, the most recent </span><code>checkpoint</code><span> in </span><code>args.output_dir</code><span> is loaded; this </span><code>checkpoint</code><span> was saved by a previous instance of </span><code>Trainer</code><span>. Training resumes from this </span><code>checkpoint</code><span>.</span></li></ul><p><span>In both cases, training resumes from the </span><code>model/optimizer/scheduler states</code><span> loaded here.</span></p></li><li><p><code>trial</code><span>: an </span><code>optuna.Trial</code><span> or a dictionary, the </span><code>trial run</code><span> or the hyperparameter dictionary used for hyperparameter search.</span></p></li><li><p><code>ignore_keys_for_eval</code><span>: a list of strings, the </span><code>key</code><span> entries of the </span><code>model output</code><span> (if the </span><code>model output</code><span> is a dictionary) to ignore when running </span><code>evaluation</code><span> during training.</span></p></li><li><p><code>kwargs</code><span>: keyword arguments.</span></p></li></ul></li><li><p><code>training_step(model: nn.Module, inputs: Dict[str, Union[torch.Tensor, Any]]) -> torch.Tensor</code><span>: trains on one </span><code>batch</code><span>.</span></p><p><span>Parameters: see </span><code>prediction_step()</code><span>.</span></p><p><span>Returns: the training loss on this </span><code>batch</code><span>.</span></p></li></ul></li><li><p><code>class transformers.Seq2SeqTrainer</code><span>:</span></p><pre class="md-fences md-end-block 
ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">class</span> <span class="cm-def">transformers</span>.<span class="cm-property">Seq2SeqTrainer</span>(</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">model</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Union</span>[<span class="cm-variable">transformers</span>.<span class="cm-property">modeling_utils</span>.<span class="cm-property">PreTrainedModel</span>, <span 
class="cm-variable">torch</span>.<span class="cm-property">nn</span>.<span class="cm-property">modules</span>.<span class="cm-property">module</span>.<span class="cm-property">Module</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">args</span>: <span class="cm-variable">TrainingArguments</span> <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">data_collator</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-variable">DataCollator</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">train_dataset</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-variable">torch</span>.<span class="cm-property">utils</span>.<span class="cm-property">data</span>.<span class="cm-property">dataset</span>.<span class="cm-property">Dataset</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">eval_dataset</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-variable">torch</span>.<span class="cm-property">utils</span>.<span class="cm-property">data</span>.<span class="cm-property">dataset</span>.<span class="cm-property">Dataset</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" 
CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">tokenizer</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-variable">transformers</span>.<span class="cm-property">tokenization_utils_base</span>.<span class="cm-property">PreTrainedTokenizerBase</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">model_init</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Callable</span>[[], <span class="cm-variable">transformers</span>.<span class="cm-property">modeling_utils</span>.<span class="cm-property">PreTrainedModel</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">compute_metrics</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Union</span>[<span class="cm-variable">typing</span>.<span class="cm-property">Callable</span>[[<span class="cm-variable">transformers</span>.<span class="cm-property">trainer_utils</span>.<span class="cm-property">EvalPrediction</span>], <span class="cm-variable">typing</span>.<span class="cm-property">Dict</span>], <span class="cm-variable">NoneType</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">callbacks</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-variable">typing</span>.<span class="cm-property">List</span>[<span class="cm-variable">transformers</span>.<span 
class="cm-property">trainer_callback</span>.<span class="cm-property">TrainerCallback</span>]] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">optimizers</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Tuple</span>[<span class="cm-variable">torch</span>.<span class="cm-property">optim</span>.<span class="cm-property">optimizer</span>.<span class="cm-property">Optimizer</span>, <span class="cm-variable">torch</span>.<span class="cm-property">optim</span>.<span class="cm-property">lr_scheduler</span>.<span class="cm-property">LambdaLR</span>] <span class="cm-operator">=</span> (<span class="cm-keyword">None</span>, <span class="cm-keyword">None</span>),</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">preprocess_logits_for_metrics</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Callable</span>[[<span class="cm-variable">torch</span>.<span class="cm-property">Tensor</span>, <span class="cm-variable">torch</span>.<span class="cm-property">Tensor</span>], <span class="cm-variable">torch</span>.<span class="cm-property">Tensor</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span> </span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 330px;"/><div class="CodeMirror-gutters" style="display: none; height: 330px;"/></div></div></pre><p><span>For the parameters, see </span><code>class transformers.Trainer</code><span>.</span></p><p><span>Methods:</span></p><ul><li><p><code>evaluate(eval_dataset: Optional[Dataset] = 
None, ignore_keys: Optional[List[str]] = None, metric_key_prefix: str = "eval", **gen_kwargs) -> Dict[str, float]</code><span>: evaluates the model and returns the evaluation metrics; see </span><code>Trainer.evaluate()</code><span>.</span></p><p><span>Parameters:</span></p><ul><li><code>max_length</code><span>: an integer, the maximum length of the generated target sequence.</span></li><li><code>num_beams</code><span>: an integer, the </span><code>beam size</code><span> used for </span><code>beam search</code><span>. </span><code>1</code><span> means that </span><code>beam search</code><span> is not used.</span></li><li><span>For the other parameters, see </span><code>Trainer.evaluate()</code><span>.</span></li></ul></li><li><p><code>predict(test_dataset: Dataset, ignore_keys: Optional[List[str]] = None, metric_key_prefix: str = "test", **gen_kwargs) -> PredictionOutput</code><span>: runs prediction; see </span><code>Trainer.predict()</code><span>.</span></p><p><span>Parameters: see </span><code>evaluate()</code><span> and </span><code>Trainer.predict()</code><span>.</span></p></li></ul></li></ol><h3 id="12-注意事项"><span>1.2 Notes</span></h3><h4 id="a-checkpoints"><span>a. 
Checkpoints</span></h4><ol start=""><li><p><span>By default, </span><code>Trainer</code><span> saves all </span><code>checkpoints</code><span> in the </span><code>output_dir</code><span> set in </span><code>TrainingArguments</code><span>. These </span><code>checkpoints</code><span> are placed in subfolders named </span><code>checkpoint-xxx</code><span>, where </span><code>xxx</code><span> is the training </span><code>step</code><span> at which the checkpoint was saved.</span></p><p><span>To resume training from </span><code>checkpoints</code><span>, call </span><code>Trainer.train()</code><span> in one of the following ways:</span></p><ul><li><code>resume_from_checkpoint=True</code><span>: resumes training from the </span><code>latest checkpoint</code><span>.</span></li><li><code>resume_from_checkpoint=checkpoint_dir</code><span>: resumes training from the </span><code>specific checkpoint</code><span> in the given directory.</span></li></ul><p><span>In addition, with </span><code>push_to_hub=True</code><span> you can easily save </span><code>checkpoints</code><span> to the </span><code>Model Hub</code><span>. By default, the models saved at </span><code>intermediate checkpoints</code><span> are stored in separate </span><code>commits</code><span>, without the </span><code>optimizer state</code><span>. You can set the </span><code>hub_strategy</code><span> value of </span><code>TrainingArguments</code><span> to one of the following two options:</span></p><ul><li><code>"checkpoint"</code><span>: the </span><code>latest checkpoint</code><span> is also pushed to a subfolder named </span><code>last-checkpoint</code><span>, so you can easily resume training with </span><code>trainer.train(resume_from_checkpoint="output_dir/last-checkpoint")</code><span>.</span></li><li><code>"all_checkpoints"</code><span>: all </span><code>checkpoints</code><span> are pushed as they appear in the output folder (so you get one </span><code>checkpoint</code><span> folder per folder in the </span><code>final repo</code><span>).</span></li></ul></li></ol><h4 id="b-logging"><span>b. 
Logging</span></h4><ol start=""><li><p><span>By default, </span><code>Trainer</code><span> uses </span><code>logging.INFO</code><span> for the main process and </span><code>logging.WARNING</code><span> for the replicas (if any). These defaults can be overridden through the arguments of </span><code>TrainingArguments</code><span> to use any of the </span><code>5</code><span> values of </span><code>logging level</code><span>:</span></p><ul><li><code>log_level</code><span> argument: the </span><code>logging level</code><span> setting for the main process.</span></li><li><code>log_level_replica</code><span> argument: the </span><code>logging level</code><span> setting for the replica processes.</span></li></ul><p><span>In addition, if </span><code>TrainingArguments</code><span> has </span><code>log_on_each_node = False</code><span>, only the main node uses the </span><code>log level setting</code><span> of its main process, and all other nodes use the replica </span><code>log level setting</code><span>.</span></p><p><span>Note that </span><code>Trainer</code><span> sets, in its </span><code>Trainer.__init__()</code><span>, the </span><code>transformers</code><span> </span><code>log level</code><span> separately for each node. So if you call </span><code>transformers</code><span> functions before creating the </span><code>Trainer</code><span> object, you may want to set the </span><code>log level</code><span> for </span><code>transformers</code><span> before the </span><code>Trainer</code><span> is created. Example:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python" style="break-inside: unset;"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; 
padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">[<span class="cm-operator">...</span>]</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">logger</span> <span class="cm-operator">=</span> <span class="cm-variable">logging</span>.<span class="cm-property">getLogger</span>(<span class="cm-variable">__name__</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-comment"># Setup logging</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">logging</span>.<span class="cm-property">basicConfig</span>(</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-builtin">format</span><span class="cm-operator">=</span><span class="cm-string">"%(asctime)s - %(levelname)s - %(name)s - %(message)s"</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span 
role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">datefmt</span><span class="cm-operator">=</span><span class="cm-string">"%m/%d/%Y %H:%M:%S"</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">handlers</span><span class="cm-operator">=</span>[<span class="cm-variable">logging</span>.<span class="cm-property">StreamHandler</span>(<span class="cm-variable">sys</span>.<span class="cm-property">stdout</span>)],</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-comment"># set the main code and the modules it uses to the same log-level according to the node</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">log_level</span> <span class="cm-operator">=</span> <span class="cm-variable">training_args</span>.<span class="cm-property">get_process_log_level</span>()</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">logger</span>.<span class="cm-property">setLevel</span>(<span class="cm-variable">log_level</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">datasets</span>.<span class="cm-property">utils</span>.<span class="cm-property">logging</span>.<span class="cm-property">set_verbosity</span>(<span class="cm-variable">log_level</span>)</span></pre><pre class=" CodeMirror-line " 
role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">transformers</span>.<span class="cm-property">utils</span>.<span class="cm-property">logging</span>.<span class="cm-property">set_verbosity</span>(<span class="cm-variable">log_level</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">trainer</span> <span class="cm-operator">=</span> <span class="cm-variable">Trainer</span>(<span class="cm-operator">...</span>)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 374px;"/><div class="CodeMirror-gutters" style="display: none; height: 374px;"/></div></div></pre></li><li><p><span>If you only want to see warnings on the main node, and want all other nodes to print none of the warnings, which are most likely duplicates, you can run:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="shell"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="shell"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: 
none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">my_app.py ... <span class="cm-attribute">--log_level</span> warning <span class="cm-attribute">--log_level_replica</span> error</span></pre></div></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 22px;"/><div class="CodeMirror-gutters" style="display: none; height: 22px;"/></div></div></pre><p><span>In a multi-node environment, if you also do not want the logs of each node's main process to be repeated, change the above to:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="shell"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="shell"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div 
class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">my_app.py ... <span class="cm-attribute">--log_level</span> warning <span class="cm-attribute">--log_level_replica</span> error <span class="cm-attribute">--log_on_each_node</span> <span class="cm-number">0</span></span></pre></div></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 22px;"/><div class="CodeMirror-gutters" style="display: none; height: 22px;"/></div></div></pre><p><span>If you need your application to be as quiet as possible, you can do:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="shell"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="shell"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: 
none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">my_app.py ... <span class="cm-attribute">--log_level</span> error <span class="cm-attribute">--log_level_replica</span> error <span class="cm-attribute">--log_on_each_node</span> <span class="cm-number">0</span></span></pre></div></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 22px;"/><div class="CodeMirror-gutters" style="display: none; height: 22px;"/></div></div></pre></li></ol><h4 id="c-随机性"><span>c. Randomness</span></h4><ol start=""><li><p><span>When resuming training from a </span><code>checkpoint</code><span> generated by </span><code>Trainer</code><span>, every effort is made to restore the </span><code>python, numpy, pytorch RNG</code><span> states to what they were when that </span><code>checkpoint</code><span> was saved, which should make </span><code>"stop and resume"</code><span> training as close as possible to </span><code>non-stop training</code><span>.</span></p><p><span>However, because of various default </span><code>non-deterministic pytorch settings</code><span>, this may not work fully. If you want full determinism, see </span><code>https://pytorch.org/docs/stable/notes/randomness</code><span>. As that documentation explains, some of the </span><code>settings</code><span> that make things deterministic (such as </span><code>torch.backends.cudnn.deterministic</code><span>) may slow things down, so they cannot be turned on by default. If needed, you can enable these </span><code>settings</code><span> yourself.</span></p></li></ol><h4 id="d-指定-gpu"><span>d. 
Specifying GPUs</span></h4><ol start=""><li><p><span>This section discusses how to tell your program which </span><code>GPU</code><span> devices to use, and in what order.</span></p></li><li><p><span>When using </span><code>DistributedDataParallel</code><span> with only a subset of the </span><code>GPU</code><span> devices, you only need to specify the number of </span><code>GPU</code><span> devices to use. For example, if you have </span><code>4</code><span> </span><code>GPU</code><span> devices but want to use the first two, you can do:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="shell"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="shell"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">python <span class="cm-attribute">-m</span> torch.distributed.launch <span class="cm-attribute">--nproc_per_node</span><span class="cm-operator">=</span><span 
class="cm-number">2</span> trainer-program.py ...</span></pre></div></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 22px;"/><div class="CodeMirror-gutters" style="display: none; height: 22px;"/></div></div></pre><p><span>If you have </span><code>accelerate</code><span> or </span><code>deepspeed</code><span> installed, you can also achieve the same result with one of the following:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="shell"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="shell"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">accelerate launch <span class="cm-attribute">--num_processes</span> 
<span class="cm-number">2</span> trainer-program.py ...</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">deepspeed <span class="cm-attribute">--num_gpus</span> <span class="cm-number">2</span> trainer-program.py ...</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 44px;"/><div class="CodeMirror-gutters" style="display: none; height: 44px;"/></div></div></pre><p><span>你不需要使用 </span><code>Accelerate</code><span> 或 </span><code>Deepspeed</code><span> 的 </span><code>integration features</code><span> 来使用这些 </span><code>launchers</code><span> 。</span></p></li><li><p><span>到目前为止,我们可以告诉程序要使用多少个 </span><code>GPU</code><span> 。现在讨论一下如何选择特定的 </span><code>GPU</code><span> 并控制其顺序。</span></p><p><code>CUDA_VISIBLE_DEVICES</code><span> 环境变量可以帮助你控制使用哪些</span><code>GPU</code><span> 、以及</span><code>GPU</code><span> 的顺序,方法是:将环境变量 </span><code>CUDA_VISIBLE_DEVICES</code><span> 设置为将要使用的 </span><code>GPU</code><span> 的列表。例如,假设有</span><code>4</code><span>个 </span><code>GPU</code><span>:</span><code>0, 1, 2, 3</code><span>。为了只在物理 </span><code>GPU 0</code><span> 和 </span><code>GPU 2</code><span> 上运行,你可以这样做:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="shell"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="shell"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; 
margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-def">CUDA_VISIBLE_DEVICES</span><span class="cm-operator">=</span><span class="cm-number">0</span>,2 python <span class="cm-attribute">-m</span> torch.distributed.launch trainer-program.py ...</span></pre></div></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 22px;"/><div class="CodeMirror-gutters" style="display: none; height: 22px;"/></div></div></pre><p><span>PyTorch now sees only 2 GPUs, with your physical </span><code>GPU 0</code><span> and </span><code>GPU 2</code><span> mapped to </span><code>cuda:0</code><span> and </span><code>cuda:1</code><span> respectively.</span></p><ul><li><p><span>You can even change their order:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="shell"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="shell"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; 
width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-def">CUDA_VISIBLE_DEVICES</span><span class="cm-operator">=</span><span class="cm-number">2</span>,0 python <span class="cm-attribute">-m</span> torch.distributed.launch trainer-program.py ...</span></pre></div></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 22px;"/><div class="CodeMirror-gutters" style="display: none; height: 22px;"/></div></div></pre><p><span>Now your physical </span><code>GPU 0</code><span> and </span><code>GPU 2</code><span> are mapped to </span><code>cuda:1</code><span> and </span><code>cuda:0</code><span> respectively.</span></p></li><li><p><span>The examples above all use the </span><code>DistributedDataParallel</code><span> pattern, but the same approach also works for </span><code>DataParallel</code><span>:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="shell"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="shell"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" 
CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-def">CUDA_VISIBLE_DEVICES</span><span class="cm-operator">=</span><span class="cm-number">2</span>,0 python trainer-program.py ...</span></pre></div></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 22px;"/><div class="CodeMirror-gutters" style="display: none; height: 22px;"/></div></div></pre></li><li><p><span>To emulate an environment without GPUs, simply set this environment variable to an empty value, like so:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="shell"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="shell"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" 
CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-def">CUDA_VISIBLE_DEVICES</span><span class="cm-operator">=</span> python trainer-program.py ...</span></pre></div></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 22px;"/><div class="CodeMirror-gutters" style="display: none; height: 22px;"/></div></div></pre></li><li><p><span>As with any environment variable, you can also export it instead of adding it to the command line, for example:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="shell"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="shell"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" 
style="padding-right: 0.1px;"><span class="cm-keyword">export</span> <span class="cm-def">CUDA_VISIBLE_DEVICES</span><span class="cm-operator">=</span><span class="cm-number">0</span>,2</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">python <span class="cm-attribute">-m</span> torch.distributed.launch trainer-program.py ...</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 44px;"/><div class="CodeMirror-gutters" style="display: none; height: 44px;"/></div></div></pre><p><span>This approach can be confusing, however, because you may forget that you set the variable earlier and not understand why the wrong GPUs are being used. It is therefore common practice to set the environment variable only for a specific run, on the same command line.</span></p></li></ul></li><li><p><span>An additional environment variable, </span><code>CUDA_DEVICE_ORDER</code><span>, controls how the physical devices are ordered. The two options are:</span></p><ul><li><p><span>Ordered by </span><code>PCIe</code><span> bus ID (matching the order shown by </span><code>nvidia-smi</code><span>). This is the default.</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="shell"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="shell"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div 
class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">export</span> <span class="cm-def">CUDA_DEVICE_ORDER</span><span class="cm-operator">=</span>PCI_BUS_ID</span></pre></div></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 22px;"/><div class="CodeMirror-gutters" style="display: none; height: 22px;"/></div></div></pre></li><li><p><span>Ordered by GPU compute capability.</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="shell"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="shell"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div 
class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">export</span> <span class="cm-def">CUDA_DEVICE_ORDER</span><span class="cm-operator">=</span>FASTEST_FIRST</span></pre></div></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 22px;"/><div class="CodeMirror-gutters" style="display: none; height: 22px;"/></div></div></pre></li></ul><p><span>Most of the time you do not need to care about this environment variable. It is very useful, however, if you have an older, slower GPU card and a newer, faster one, and they are installed in such a way that the slower card appears first.</span></p><p><span>One way to fix this is to swap the cards' slots. Alternatively, setting </span><code>CUDA_DEVICE_ORDER=FASTEST_FIRST</code><span> will always put the newer, faster card first. This can be somewhat confusing, though, because </span><code>nvidia-smi</code><span> will still report the cards in </span><code>PCIe</code><span> order.</span></p><p><span>Another way to swap the order is to use:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="shell"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="shell"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" 
tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">export</span> <span class="cm-def">CUDA_VISIBLE_DEVICES</span><span class="cm-operator">=</span><span class="cm-number">1</span>,0</span></pre></div></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 22px;"/><div class="CodeMirror-gutters" style="display: none; height: 22px;"/></div></div></pre></li></ol><h4 id="e-trainer-集成"><span>e. 
Trainer Integrations</span></h4><ol start=""><li><p><code>Trainer</code><span> has been extended to support libraries that may dramatically improve your training time and let you fit much larger models.</span></p><p><span>Currently it supports the third-party solutions </span><code>DeepSpeed, PyTorch FSDP, FairScale</code><span>, which implement parts of the paper </span><code>ZeRO: Memory Optimizations Toward Training Trillion Parameter Models</code><span>.</span></p><p><span>As of this writing, this support is new and experimental. While support for </span><code>DeepSpeed</code><span> and </span><code>PyTorch FSDP</code><span> is active and issues around them are welcome, the </span><code>FairScale</code><span> integration is no longer supported, because </span><code>FairScale</code><span> has been integrated into </span><code>PyTorch</code><span> main.</span></p></li><li><p><code>CUDA Extension</code><span> installation: as of this writing, both </span><code>FairScale</code><span> and </span><code>Deepspeed</code><span> require </span><code>CUDA C++</code><span> code to be compiled before they can be used.</span></p><p><span>While all installation issues should be handled through the corresponding </span><code>GitHub issue</code><span> trackers of </span><code>FairScale</code><span> and </span><code>Deepspeed</code><span>, there are a few common problems you may run into when building any </span><code>PyTorch extension</code><span> that needs to build a </span><code>CUDA Extension</code><span>. Therefore, if you hit a </span><code>CUDA</code><span>-related </span><code>build issue</code><span> while running either of the following:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="shell"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="shell"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" 
role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">pip install fairscale</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">pip install deepspeed</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 44px;"/><div class="CodeMirror-gutters" style="display: none; height: 44px;"/></div></div></pre><p><span>then please read the following notes. They illustrate what to do when </span><code>pytorch</code><span> has been built with </span><code>CUDA 10.2</code><span>. If your situation is different, remember to adjust the version number to the one you need.</span></p><ul><li><p><span>Possible problem </span><code>1</code><span>: although </span><code>Pytorch</code><span> comes with its own </span><code>CUDA toolkit</code><span>, to build these two projects (</span><code>fairscale, deepspeed</code><span>) you must have the identical version of </span><code>CUDA</code><span> installed system-wide.</span></p><p><span>For example, if you installed </span><code>pytorch</code><span> in your </span><code>Python</code><span> environment with </span><code>cudatoolkit==10.2</code><span>, you also need </span><code>CUDA 10.2</code><span> installed system-wide.</span></p><p><span>The exact location may vary from system to system, but </span><code>/usr/local/cuda-10.2</code><span> is the most common location on many </span><code>Unix</code><span> systems. When </span><code>CUDA</code><span> is correctly set up and added to the </span><code>PATH</code><span> environment variable, you can find the installation location with:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" 
lang="shell"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="shell"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">which nvcc</span></pre></div></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 22px;"/><div class="CodeMirror-gutters" style="display: none; height: 22px;"/></div></div></pre><p><span>If you do not have </span><code>CUDA</code><span> installed system-wide, install it first.</span></p></li><li><p><span>Possible problem </span><code>2</code><span>: you may have more than one </span><code>CUDA toolkit</code><span> installed system-wide, for example:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="shell"><div class="CodeMirror cm-s-inner 
cm-s-null-scroll CodeMirror-wrap" lang="shell"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">/usr/local/cuda-10.2</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">/usr/local/cuda-11.0</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 44px;"/><div class="CodeMirror-gutters" style="display: none; height: 44px;"/></div></div></pre><p><span>In this situation you need to make sure that your </span><code>PATH</code><span> and </span><code>LD_LIBRARY_PATH</code><span> environment variables contain the correct paths to the desired </span><code>CUDA</code><span> version. Typically, package installers set these to point at whichever version was installed most recently. If the </span><code>package</code><span> build fails because it cannot find the right </span><code>CUDA</code><span> version, even though you have it installed system-wide, you need to adjust these two environment variables:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="shell"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="shell"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">export</span> <span class="cm-def">PATH</span><span class="cm-operator">=</span>/usr/local/cuda-10.2/bin:<span class="cm-def">$PATH</span></span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">export</span> <span class="cm-def">LD_LIBRARY_PATH</span><span 
class="cm-operator">=</span>/usr/local/cuda-10.2/lib64:<span class="cm-def">$LD_LIBRARY_PATH</span></span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 44px;"/><div class="CodeMirror-gutters" style="display: none; height: 44px;"/></div></div></pre></li><li><p><span>Possible problem </span><code>3</code><span>: some older </span><code>CUDA</code><span> versions may refuse to build with newer compilers. For example, you have </span><code>gcc-9</code><span> but it wants </span><code>gcc-7</code><span>. There are several ways to address this:</span></p><ul><li><p><span>If you can install the latest </span><code>CUDA toolkit</code><span>, it should typically support the newer compiler.</span></p></li><li><p><span>Alternatively, you can install the older compiler version alongside the one you already have; or you may already have it, but it is not the default, so the build system cannot see it. The following may help:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="shell"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="shell"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div 
class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-builtin">sudo</span> <span class="cm-builtin">ln</span> <span class="cm-attribute">-s</span> /usr/bin/gcc-7 /usr/local/cuda-10.2/bin/gcc</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-builtin">sudo</span> <span class="cm-builtin">ln</span> <span class="cm-attribute">-s</span> /usr/bin/g<span class="cm-operator">++</span><span class="cm-attribute">-7</span> /usr/local/cuda-10.2/bin/g<span class="cm-operator">++</span></span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 44px;"/><div class="CodeMirror-gutters" style="display: none; height: 44px;"/></div></div></pre><p><span>Here we create a symlink at </span><code>/usr/local/cuda-10.2/bin/gcc</code><span> that points to </span><code>gcc-7</code><span>. Since </span><code>/usr/local/cuda-10.2/bin/</code><span> should be in the </span><code>PATH</code><span> environment variable (see the solution to the previous problem), the build should find </span><code>gcc-7</code><span> (and </span><code>g++-7</code><span>) and then succeed.</span></p></li></ul></li></ul></li><li><p><code>PyTorch Fully Sharded Data Parallel: FSDP</code><span>: to accelerate training of huge models with a larger </span><code>batch size</code><span>, we can use a </span><code>fully sharded data parallel model</code><span>. This data-parallel paradigm fits more data and larger models by sharding the </span><code>optimizer states</code><span>, gradients, and </span><code>parameters</code><span>. The latest </span><code>PyTorch’s Fully Sharded Data Parallel: FSDP</code><span> training feature has been integrated; you only need to enable it through your configuration.</span></p><p><span>Note that </span><code>PyTorch 1.12.0</code><span> or later is required to use </span><code>FSDP</code><span>.</span></p><p><span>Usage:</span></p><ul><li><p><span>Make sure you have added the </span><code>distributed launcher</code><span>:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="shell"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="shell"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea
autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">python <span class="cm-attribute">-m</span> torch.distributed.launch <span class="cm-attribute">--nproc_per_node</span><span class="cm-operator">=</span>NUMBER_OF_GPUS_YOU_HAVE ...</span></pre></div></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 22px;"/><div class="CodeMirror-gutters" style="display: none; height: 22px;"/></div></div></pre></li><li><p><span>Sharding strategy:</span></p><ul><li><code>FULL_SHARD</code><span>: shards </span><code>optimizer states + gradients + model parameters</code><span> across </span><code>data parallel workers/GPUs</code><span>. To use it, add </span><code>--fsdp full_shard</code><span> to the command line arguments.</span></li><li><code>SHARD_GRAD_OP</code><span>: shards </span><code>optimizer states + gradients</code><span> across </span><code>data parallel workers/GPUs</code><span>. To use it, add </span><code>--fsdp shard_grad_op</code><span> to the command line arguments.</span></li><li><code>NO_SHARD</code><span>: no sharding. To use it, add </span><code>--fsdp no_shard</code><span> to the command line arguments.</span></li></ul></li><li><p><span>To offload the </span><code>parameters</code><span> and </span><code>gradients</code><span> to the </span><code>CPU</code><span>, add the following to the command line arguments:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="shell"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="shell"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea
autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" 
role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-attribute">--fsdp</span> <span class="cm-string">"full_shard offload"</span> <span class="cm-comment"># or --fsdp "shard_grad_op offload"</span></span></pre></div></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 22px;"/><div class="CodeMirror-gutters" style="display: none; height: 22px;"/></div></div></pre></li><li><p><span>To use </span><code>default_auto_wrap_policy</code><span> so that </span><code>FSDP</code><span> automatically and recursively applies </span><code>wrap layers</code><span>, add the following to the command line arguments:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="shell"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="shell"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; 
padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-attribute">--fsdp</span> <span class="cm-string">"full_shard auto_wrap"</span> <span class="cm-comment"># or --fsdp "shard_grad_op auto_wrap"</span></span></pre></div></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 22px;"/><div class="CodeMirror-gutters" style="display: none; height: 22px;"/></div></div></pre></li><li><p><span>To enable both </span><code>CPU</code><span> offloading and </span><code>auto wrapping</code><span>, add the following to the command line arguments:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="shell"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="shell"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; 
border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-attribute">--fsdp</span> <span class="cm-string">"full_shard offload auto_wrap"</span> <span class="cm-comment"># or --fsdp "shard_grad_op offload auto_wrap"</span></span></pre></div></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 22px;"/><div class="CodeMirror-gutters" style="display: none; height: 22px;"/></div></div></pre></li><li><p><span>If </span><code>auto wrapping</code><span> is enabled, you can use either the </span><code>transformer based auto wrap policy</code><span> or the </span><code>size based auto wrap policy</code><span>.</span></p><ul><li><p><span>For the </span><code>transformer based auto wrap policy</code><span>, add the following to the command line arguments:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="shell"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="shell"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div 
class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-attribute">--fsdp_transformer_layer_cls_to_wrap</span> &lt;value&gt; </span></pre></div></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 22px;"/><div class="CodeMirror-gutters" style="display: none; height: 22px;"/></div></div></pre><p><span>This specifies the </span><code>transformer layer class name</code><span> (case-sensitive) to wrap, for example </span><code>BertLayer, GPTJBlock, T5Block,...</code><span>. This matters because submodules that share weights (for example, the </span><code>embedding layer</code><span>) should not end up in different </span><code>FSDP wrapped units</code><span>.</span></p><p><span>With this policy, wrapping happens for every </span><code>block</code><span> that contains </span><code>Multi-Head Attention followed by couple of MLP layers</code><span>. The remaining layers, including the </span><code>shared embeddings</code><span>, are conveniently wrapped in the same outermost </span><code>FSDP unit</code><span>. This policy is therefore suitable for </span><code>transformer based</code><span> models.</span></p></li><li><p><span>For the </span><code>size based auto wrap policy</code><span>, add the following to the command line arguments:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="shell"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="shell"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea
autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-attribute">--fsdp_min_num_params</span> &lt;number&gt;</span></pre></div></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 22px;"/><div class="CodeMirror-gutters" style="display: none; height: 22px;"/></div></div></pre><p><span>It specifies the minimum number of </span><code>parameters</code><span> for </span><code>FSDP auto wrapping</code><span>.</span></p></li></ul></li><li><p><span>A few caveats:</span></p><ul><li><code>FSDP</code><span> currently does not support mixed precision, as we are waiting for </span><code>PyTorch</code><span> to fix its mixed-precision support.</span></li><li><code>FSDP</code><span> currently does not support </span><code>multiple parameter groups</code><span>.</span></li></ul></li></ul></li></ol><h2
id="二callbacks"><span>2. Callbacks</span></h2><ol start=""><li><p><code>callbacks</code><span> are objects that can customize the </span><code>training loop</code><span> of the </span><code>PyTorch Trainer</code><span> (this feature is not yet implemented in </span><code>TensorFlow</code><span>). For example, they can inspect the </span><code>training loop</code><span> state (for progress reporting, or for </span><code>logging</code><span> to </span><code>TensorBoard</code><span> or other </span><code>ML</code><span> platforms) and take decisions (such as </span><code>early stopping</code><span>).</span></p><p><code>callbacks</code><span> are </span><code>"read only"</code><span> pieces of code: apart from the </span><code>TrainerControl</code><span> object they return, they cannot change anything in the </span><code>training loop</code><span>. If you need to change the </span><code>training loop</code><span>, subclass </span><code>Trainer</code><span> and override the methods you want to change.</span></p></li><li><p><span>By default, </span><code>Trainer</code><span> uses the following </span><code>callbacks</code><span>:</span></p><ul><li><code>DefaultFlowCallback</code><span>: the default </span><code>callback</code><span> that handles </span><code>logging, saving, evaluation</code><span>.</span></li><li><code>PrinterCallback</code><span> or </span><code>ProgressCallback</code><span>: displays training progress or prints the logs. If you disable </span><code>tqdm</code><span> through </span><code>TrainingArguments</code><span>, </span><code>Trainer</code><span> uses </span><code>PrinterCallback</code><span>; otherwise it uses </span><code>ProgressCallback</code><span>.</span></li><li><code>TensorBoardCallback</code><span>: if </span><code>tensorboard</code><span> is available (through </span><code>PyTorch >= 1.4</code><span> or </span><code>tensorboardX</code><span>), </span><code>Trainer</code><span> uses </span><code>TensorBoardCallback</code><span>.</span></li><li><code>WandbCallback</code><span>: if </span><code>wandb</code><span> is installed, </span><code>Trainer</code><span> uses </span><code>WandbCallback</code><span>.</span></li><li><code>CometCallback</code><span>: if </span><code>comet_ml</code><span> is installed, </span><code>Trainer</code><span> uses </span><code>CometCallback</code><span>.</span></li><li><code>MLflowCallback</code><span>: if </span><code>mlflow</code><span> is installed, </span><code>Trainer</code><span> uses </span><code>MLflowCallback</code><span>.</span></li><li><code>NeptuneCallback</code><span>: if </span><code>neptune</code><span> is installed, </span><code>Trainer</code><span> uses </span><code>NeptuneCallback</code><span>.</span></li><li><code>AzureMLCallback</code><span>: if </span><code>azureml-sdk</code><span> is installed, </span><code>Trainer</code><span> uses </span><code>AzureMLCallback</code><span>.</span></li><li><code>CodeCarbonCallback</code><span>: if </span><code>codecarbon</code><span> is installed, </span><code>Trainer</code><span> uses </span><code>CodeCarbonCallback</code><span>.</span></li><li><code>ClearMLCallback</code><span>: if </span><code>clearml</code><span> is installed, </span><code>Trainer</code><span> uses </span><code>ClearMLCallback</code><span>.</span></li></ul></li><li><p><span>The main class that implements </span><code>callbacks</code><span> is </span><code>TrainerCallback</code><span>. It receives the </span><code>TrainingArguments</code><span> used to instantiate the </span><code>Trainer</code><span>, can access the internal state of that </span><code>Trainer</code><span> through </span><code>TrainerState</code><span>, and can take some actions on the </span><code>training loop</code><span> through </span><code>TrainerControl</code><span>.</span></p></li></ol><h3
id="21-api"><span>2.1 API</span></h3><ol start=""><li><p><code>class transformers.TrainerCallback</code><span>: a </span><code>TrainerCallback</code><span>, which inspects the state of the </span><code>training loop</code><span> at certain events and takes some decisions.</span></p><p><span>Parameters (passed to the event methods):</span></p><ul><li><p><code>args</code><span>: a </span><code>TrainingArguments</code><span>, the training arguments used to instantiate the </span><code>Trainer</code><span>.</span></p></li><li><p><code>state</code><span>: a </span><code>TrainerState</code><span>, the current state of the trainer.</span></p></li><li><p><code>control</code><span>: a </span><code>TrainerControl</code><span>, the object returned to the trainer, which can be used to take some decisions.</span></p></li><li><p><code>model</code><span>: a </span><code>PreTrainedModel</code><span> or </span><code>torch.nn.Module</code><span>, the model being trained.</span></p></li><li><p><code>tokenizer</code><span>: a </span><code>PreTrainedTokenizer</code><span>, the </span><code>tokenizer</code><span> used to encode the data.</span></p></li><li><p><code>optimizer</code><span>: a </span><code>torch.optim.Optimizer</code><span>, the optimizer used for training.</span></p></li><li><p><code>lr_scheduler</code><span>: a </span><code>torch.optim.lr_scheduler.LambdaLR</code><span>, the learning rate scheduler used for training.</span></p></li><li><p><code>train_dataloader</code><span>: a </span><code>torch.utils.data.DataLoader</code><span>, the </span><code>training dataloader</code><span>.</span></p></li><li><p><code>eval_dataloader</code><span>: a </span><code>torch.utils.data.DataLoader</code><span>, the </span><code>evaluation dataloader</code><span>.</span></p></li><li><p><code>metrics</code><span>: a dictionary </span><code>Dict[str, float]</code><span>, the metrics computed by the last </span><code>evaluation</code><span> phase.</span></p><p><span>It is only accessible in the </span><code>on_evaluate</code><span> event.</span></p></li><li><p><code>logs</code><span>: a dictionary </span><code>Dict[str, float]</code><span>, the values to </span><code>log</code><span>.</span></p><p><span>It is only accessible in the </span><code>on_log</code><span> event.</span></p></li></ul><p><span>Methods (their arguments are the parameters described above):</span></p><ul><li><p><code>on_epoch_begin(args: TrainingArguments, state: TrainerState, control: TrainerControl, **kwargs)</code><span>: event called at the beginning of an </span><code>epoch</code><span>.</span></p></li><li><p><code>on_epoch_end(args: TrainingArguments, state: TrainerState, control: TrainerControl, **kwargs)</code><span>: event called at the end of an </span><code>epoch</code><span>.</span></p></li><li><p><code>on_evaluate(args: TrainingArguments, state: TrainerState, control: TrainerControl, 
**kwargs)</code><span>: event called after an </span><code>evaluation</code><span> phase.</span></p></li><li><p><code>on_init_end(args: TrainingArguments, state: TrainerState, control: TrainerControl, **kwargs)</code><span>: event called at the end of the initialization of the </span><code>Trainer</code><span>.</span></p></li><li><p><code>on_log(args: TrainingArguments, state: TrainerState, control: TrainerControl, **kwargs)</code><span>: event called after </span><code>logging last logs</code><span>.</span></p></li><li><p><code>on_predict(args: TrainingArguments, state: TrainerState, control: TrainerControl, metrics, **kwargs)</code><span>: event called after a successful prediction.</span></p></li><li><p><code>on_prediction_step(args: TrainingArguments, state: TrainerState, control: TrainerControl, **kwargs)</code><span>: event called after a </span><code>prediction step</code><span>.</span></p></li><li><p><code>on_save(args: TrainingArguments, state: TrainerState, control: TrainerControl, **kwargs)</code><span>: event called after a </span><code>checkpoint save</code><span>.</span></p></li><li><p><code>on_step_begin(args: TrainingArguments, state: TrainerState, control: TrainerControl, **kwargs)</code><span>: event called before a </span><code>training step</code><span>.</span></p><p><span>With </span><code>gradient accumulation</code><span>, one </span><code>training step</code><span> may take several </span><code>inputs</code><span>.</span></p></li><li><p><code>on_step_end(args: TrainingArguments, state: TrainerState, control: TrainerControl, **kwargs)</code><span>: event called after a </span><code>training step</code><span>.</span></p><p><span>With </span><code>gradient accumulation</code><span>, one </span><code>training step</code><span> may take several </span><code>inputs</code><span>.</span></p></li><li><p><code>on_substep_end(args: TrainingArguments, state: TrainerState, control: TrainerControl, **kwargs)</code><span>: event called after each </span><code>training substep</code><span> during </span><code>gradient accumulation</code><span>.</span></p></li><li><p><code>on_train_begin(args: 
TrainingArguments, state: TrainerState, control: TrainerControl, **kwargs)</code><span>: event called at the beginning of training.</span></p></li><li><p><code>on_train_end(args: TrainingArguments, state: TrainerState, control: TrainerControl, **kwargs)</code><span>: event called at the end of training.</span></p></li></ul><p><span>The following applies to the arguments of every event:</span></p><ul><li><code>control</code><span> is the only object that a </span><code>callback</code><span> can change, in which case the event that changes it should return the modified version.</span></li><li><code>args, state, control</code><span> are positional arguments for all events, while all other arguments are grouped in the </span><code>kwargs</code><span> keyword arguments. You can </span><code>unpack</code><span> the keyword arguments you need in the event's signature. For example:</span></li></ul><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " 
role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">class</span> <span class="cm-def">PrinterCallback</span>(<span class="cm-variable">TrainerCallback</span>):</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">    <span class="cm-keyword">def</span> <span class="cm-def">on_log</span>(<span class="cm-variable-2">self</span>, <span class="cm-variable">args</span>, <span class="cm-variable">state</span>, <span class="cm-variable">control</span>, <span class="cm-variable">logs</span><span class="cm-operator">=</span><span class="cm-keyword">None</span>, <span class="cm-operator">**</span><span class="cm-variable">kwargs</span>):</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">        <span class="cm-variable">_</span> <span class="cm-operator">=</span> <span class="cm-variable">logs</span>.<span class="cm-property">pop</span>(<span class="cm-string">"total_flos"</span>, <span class="cm-keyword">None</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">        <span class="cm-keyword">if</span> <span class="cm-variable">state</span>.<span class="cm-property">is_local_process_zero</span>:</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">            <span class="cm-builtin">print</span>(<span class="cm-variable">logs</span>)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 110px;"/><div class="CodeMirror-gutters" style="display: none; height: 110px;"/></div></div></pre></li><li><p><span>An example of registering a custom </span><code>callback</code><span> with the </span><code>PyTorch Trainer</code><span>:</span></p><pre class="md-fences md-end-block ty-contain-cm 
modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">class</span> <span class="cm-def">MyCallback</span>(<span class="cm-variable">TrainerCallback</span>):</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">    <span class="cm-keyword">def</span> <span class="cm-def">on_train_begin</span>(<span class="cm-variable-2">self</span>, <span class="cm-variable">args</span>, <span class="cm-variable">state</span>, <span class="cm-variable">control</span>, <span class="cm-operator">**</span><span 
class="cm-variable">kwargs</span>):</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">        <span class="cm-builtin">print</span>(<span class="cm-string">"Starting training"</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">trainer</span> <span class="cm-operator">=</span> <span class="cm-variable">Trainer</span>(<span class="cm-variable">model</span>,<span class="cm-variable">args</span>,<span class="cm-variable">train_dataset</span><span class="cm-operator">=</span><span class="cm-variable">train_dataset</span>,<span class="cm-variable">eval_dataset</span><span class="cm-operator">=</span><span class="cm-variable">eval_dataset</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">    <span class="cm-variable">callbacks</span><span class="cm-operator">=</span>[<span class="cm-variable">MyCallback</span>], <span class="cm-comment"># accepts either a callback class or a callback instance</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 154px;"/><div class="CodeMirror-gutters" style="display: none; height: 154px;"/></div></div></pre><p><span>A callback can also be registered as follows:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" 
autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">trainer</span> <span class="cm-operator">=</span> <span class="cm-variable">Trainer</span>(<span class="cm-operator">...</span>)</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">trainer</span>.<span class="cm-property">add_callback</span>(<span class="cm-variable">MyCallback</span>) <span class="cm-comment"># or trainer.add_callback(MyCallback())</span></span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 44px;"/><div class="CodeMirror-gutters" style="display: none; height: 44px;"/></div></div></pre></li><li><p><code>library</code><span> currently provides the following 
</span><code>TrainerCallback</code><span>:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python" style="break-inside: unset;"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">class</span> <span class="cm-def">transformers</span>.<span class="cm-property">integrations</span>.<span class="cm-property">CometCallback</span>() <span class="cm-comment"># send logs to CometML</span></span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-tab" role="presentation" cm-text=" "> </span><span 
class="cm-keyword">def</span> <span class="cm-def">setup</span>(<span class="cm-variable">args</span>, <span class="cm-variable">state</span>, <span class="cm-variable">model</span>) </span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">class</span> <span class="cm-def">transformers</span>.<span class="cm-property">DefaultFlowCallback</span>() <span class="cm-comment"># default callback for logging, saving, evaluation</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">class</span> <span class="cm-def">transformers</span>.<span class="cm-property">PrinterCallback</span>() <span class="cm-comment"># just prints the logs</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">class</span> <span class="cm-def">transformers</span>.<span class="cm-property">ProgressCallback</span>() <span class="cm-comment"># displays the progress of training or evaluation</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">class</span> <span class="cm-def">transformers</span>.<span class="cm-property">EarlyStoppingCallback</span>( <span class="cm-comment"># handles early stopping, Use with TrainingArguments metric_for_best_model </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">early_stopping_patience</span>: <span class="cm-builtin">int</span> <span class="cm-operator">=</span> <span class="cm-number">1</span>, </span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">early_stopping_threshold</span>: <span 
class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">float</span>] <span class="cm-operator">=</span> <span class="cm-number">0.0</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">class</span> <span class="cm-def">transformers</span>.<span class="cm-property">integrations</span>.<span class="cm-property">TensorBoardCallback</span>( <span class="cm-variable">tb_writer</span> <span class="cm-operator">=</span> <span class="cm-keyword">None</span> ) <span class="cm-comment"># sends the logs to TensorBoard</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">class</span> <span class="cm-def">transformers</span>.<span class="cm-property">integrations</span>.<span class="cm-property">WandbCallback</span>() <span class="cm-comment"># sends the logs to Weight and Biases</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-keyword">def</span> <span class="cm-def">setup</span>(<span class="cm-variable">args</span>, <span class="cm-variable">state</span>, <span class="cm-variable">model</span>, <span class="cm-operator">**</span><span class="cm-variable">kwargs</span> )</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">class</span> <span class="cm-def">transformers</span>.<span class="cm-property">integrations</span>.<span class="cm-property">MLflowCallback</span>() <span class="cm-comment"># sends the logs to MLflow</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-tab" role="presentation" cm-text=" "> </span><span class="cm-keyword">def</span> <span 
class="cm-def">setup</span>(<span class="cm-variable">args</span>, <span class="cm-variable">state</span>, <span class="cm-variable">model</span>) </span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">class</span> <span class="cm-def">transformers</span>.<span class="cm-property">integrations</span>.<span class="cm-property">AzureMLCallback</span>(<span class="cm-variable">azureml_run</span> <span class="cm-operator">=</span> <span class="cm-keyword">None</span>) <span class="cm-comment"># sends the logs to AzureML</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">class</span> <span class="cm-def">transformers</span>.<span class="cm-property">integrations</span>.<span class="cm-property">CodeCarbonCallback</span>() <span class="cm-comment"># tracks the CO2 emission of training</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">class</span> <span class="cm-def">transformers</span>.<span class="cm-property">integrations</span>.<span class="cm-property">NeptuneCallback</span>( <span class="cm-comment"># sends the logs to Neptune</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">api_token</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">project</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">str</span>] <span 
class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">name</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">base_namespace</span>: <span class="cm-builtin">str</span> <span class="cm-operator">=</span> <span class="cm-string">'finetuning'</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">run</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-variable">ForwardRef</span>(<span class="cm-string">'Run'</span>)] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">log_parameters</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">True</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">log_checkpoints</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-operator">**</span><span class="cm-variable">neptune_run_kwargs</span></span></pre><pre class=" CodeMirror-line " 
role="presentation"><span role="presentation" style="padding-right: 0.1px;">)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">class</span> <span class="cm-def">transformers</span>.<span class="cm-property">integrations</span>.<span class="cm-property">ClearMLCallback</span>() <span class="cm-comment"># sends the logs to ClearML</span></span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 594px;"/><div class="CodeMirror-gutters" style="display: none; height: 594px;"/></div></div></pre></li><li><p><code>class transformers.TrainerState</code><span>: a class holding the internal state of the </span><code>Trainer</code><span>. It is saved along with the model and the optimizer during </span><code>checkpointing</code><span> and is passed to each </span><code>TrainerCallback</code><span>.</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python" style="break-inside: unset;"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div 
style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">class</span> <span class="cm-def">transformers</span>.<span class="cm-property">TrainerState</span>(</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">epoch</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">float</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">global_step</span>: <span class="cm-builtin">int</span> <span class="cm-operator">=</span> <span class="cm-number">0</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">max_steps</span>: <span class="cm-builtin">int</span> <span class="cm-operator">=</span> <span class="cm-number">0</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">num_train_epochs</span>: <span class="cm-builtin">int</span> <span class="cm-operator">=</span> <span class="cm-number">0</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">total_flos</span>: <span class="cm-builtin">float</span> <span class="cm-operator">=</span> <span 
class="cm-number">0</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">log_history</span>: <span class="cm-variable">typing</span>.<span class="cm-property">List</span>[<span class="cm-variable">typing</span>.<span class="cm-property">Dict</span>[<span class="cm-builtin">str</span>, <span class="cm-builtin">float</span>]] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">best_metric</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">float</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">best_model_checkpoint</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">is_local_process_zero</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">True</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">is_world_process_zero</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">True</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">is_hyper_param_search</span>: <span 
class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">trial_name</span>: <span class="cm-builtin">str</span> <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">trial_params</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Dict</span>[<span class="cm-builtin">str</span>, <span class="cm-variable">typing</span>.<span class="cm-property">Union</span>[<span class="cm-builtin">str</span>, <span class="cm-builtin">float</span>, <span class="cm-builtin">int</span>, <span class="cm-builtin">bool</span>]] <span class="cm-operator">=</span> <span class="cm-keyword">None</span> </span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 330px;"/><div class="CodeMirror-gutters" style="display: none; height: 330px;"/></div></div></pre><p><span>Parameters:</span></p><ul><li><code>epoch</code><span>: a float, only set during training. The </span><code>epoch</code><span> the training is currently in (the decimal part is the fraction of the current </span><code>epoch</code><span> already completed).</span></li><li><code>global_step</code><span>: an integer, only set during training. The number of </span><code>update steps</code><span> completed so far.</span></li><li><code>max_steps</code><span>: an integer. The number of </span><code>update steps</code><span> to perform during the current training run.</span></li><li><code>total_flos</code><span>: a float. The total number of floating-point operations performed by the model since the start of training. It is stored as a float to avoid overflow.</span></li><li><code>log_history</code><span>: a list of dictionaries, </span><code>List[Dict[str, float]]</code><span>. The logs produced since the start of training.</span></li><li><code>best_metric</code><span>: a float. When </span><code>tracking best model</code><span>, the best metric value encountered so far.</span></li><li><code>best_model_checkpoint</code><span>: a string. When </span><code>tracking best model</code><span>, the name of the </span><code>checkpoint</code><span> of the best model encountered so far.</span></li><li><code>is_local_process_zero</code><span>: a boolean. Whether the current process is the </span><code>local</code><span> main process (relevant for distributed training).</span></li><li><code>is_world_process_zero</code><span>: a boolean. Whether the current process is the </span><code>global</code><span> main process. When training in a distributed fashion across several machines, this is </span><code>True</code><span> for exactly one process.</span></li><li><code>is_hyper_param_search</code><span>: a boolean. Whether a hyperparameter search is being run with </span><code>Trainer.hyperparameter_search</code><span>. This affects how data is logged in </span><code>TensorBoard</code><span>.</span></li></ul><p><span>Note that in </span><code>TrainerState</code><span>, a </span><code>step</code><span> is to be understood as one </span><code>update step</code><span>. When using </span><code>gradient accumulation</code><span>, one </span><code>update step</code><span> may require several forward and backward passes: if you use </span><code>gradient_accumulation_steps=n</code><span>, then one </span><code>update step</code><span> consumes </span><mjx-container class="MathJax" jax="SVG" style="position: relative;"><svg xmlns="http://www.w3.org/2000/svg" width="1.357ex" height="1.025ex" role="img" focusable="false" viewbox="0 -442 600 453" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" style="vertical-align: -0.025ex;"><defs><path id="MJX-1-TEX-I-1D45B" d="M21 287Q22 293 24 303T36 341T56 388T89 425T135 442Q171 442 195 424T225 390T231 369Q231 367 232 367L243 378Q304 442 382 442Q436 442 469 415T503 336T465 179T427 52Q427 26 444 26Q450 26 453 27Q482 32 505 65T540 145Q542 153 560 153Q580 153 580 145Q580 144 576 130Q568 101 554 73T508 17T439 -10Q392 -10 371 17T350 73Q350 92 386 193T423 345Q423 404 379 404H374Q288 404 229 303L222 291L189 157Q156 26 151 16Q138 -11 108 -11Q95 -11 87 -5T76 7T74 17Q74 30 112 180T152 343Q153 348 153 
366Q153 405 129 405Q91 405 66 305Q60 285 60 284Q58 278 41 278H27Q21 284 21 287Z"/></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="scale(1,-1)"><g data-mml-node="math"><g data-mml-node="mi"><use data-c="1D45B" xlink:href="#MJX-1-TEX-I-1D45B"/></g></g></g></svg><mjx-assistive-mml unselectable="on" display="inline"><math xmlns="http://www.w3.org/1998/Math/MathML"><mi>n</mi></math></mjx-assistive-mml></mjx-container><span> batches.</span></p><p><span>Methods:</span></p><ul><li><code>load_from_json(json_path: str )</code><span>: creates a </span><code>TrainerState</code><span> instance from the content of </span><code>json_path</code><span>.</span></li><li><code>save_to_json(json_path: str )</code><span>: saves the content of the current instance in </span><code>JSON</code><span> format to </span><code>json_path</code><span>.</span></li></ul></li><li><p><code>class transformers.TrainerControl</code><span>: a class that handles the control flow of the </span><code>Trainer</code><span>. A </span><code>TrainerCallback</code><span> uses it to flip switches in the </span><code>training loop</code><span>.</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div 
class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">class</span> <span class="cm-def">transformers</span>.<span class="cm-property">TrainerControl</span>(</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">should_training_stop</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">should_epoch_stop</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">should_save</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">should_evaluate</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">should_log</span>: <span 
class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span> </span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 154px;"/><div class="CodeMirror-gutters" style="display: none; height: 154px;"/></div></div></pre><p><span>Parameters:</span></p><ul><li><code>should_training_stop</code><span>: a boolean. Whether training should be interrupted. If </span><code>True</code><span>, this variable is never set back to </span><code>False</code><span>, because training simply stops.</span></li><li><code>should_epoch_stop</code><span>: a boolean. Whether the current </span><code>epoch</code><span> should be interrupted. If </span><code>True</code><span>, this variable is set back to </span><code>False</code><span> at the beginning of the next </span><code>epoch</code><span>.</span></li><li><code>should_save</code><span>: a boolean. Whether the model should be saved at the current </span><code>step</code><span>. If </span><code>True</code><span>, this variable is set back to </span><code>False</code><span> at the beginning of the next </span><code>step</code><span>.</span></li><li><code>should_evaluate</code><span>: a boolean. Whether the model should be evaluated at the current </span><code>step</code><span>. If </span><code>True</code><span>, this variable is set back to </span><code>False</code><span> at the beginning of the next </span><code>step</code><span>.</span></li><li><code>should_log</code><span>: a boolean. Whether logs should be reported at the current </span><code>step</code><span>. If </span><code>True</code><span>, this variable is set back to </span><code>False</code><span> at the beginning of the next </span><code>step</code><span>.</span></li></ul></li></ol><h2 id="三keras-callbacks"><span>3. Keras callbacks</span></h2><h3 id="31-api"><span>3.1 API</span></h3><ol start=""><li><p><code>class transformers.KerasMetricCallback</code><span>: a </span><code>callback</code><span> for </span><code>keras</code><span> that computes metrics at the end of every </span><code>epoch</code><span>.</span></p><p><span>Unlike normal </span><code>Keras</code><span> metrics, these metrics do not need to be compilable by </span><code>TF</code><span>. This is particularly useful for common </span><code>NLP</code><span> metrics such as </span><code>BLEU</code><span> and </span><code>ROUGE</code><span>, which require string operations or a </span><code>generation loop</code><span> and therefore cannot be compiled. Predictions (or generations) are computed on </span><code>eval_dataset</code><span> and then passed to </span><code>metric_fn</code><span> in </span><code>np.ndarray</code><span> format. </span><code>metric_fn</code><span> should compute the metrics and return a dictionary that maps metric names to metric values.</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">class</span> <span class="cm-def">transformers</span>.<span 
class="cm-property">KerasMetricCallback</span>(</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">metric_fn</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Callable,</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">eval_dataset</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Union</span>[<span class="cm-variable">tensorflow</span>.<span class="cm-property">python</span>.<span class="cm-property">data</span>.<span class="cm-property">ops</span>.<span class="cm-property">dataset_ops</span>.<span class="cm-property">DatasetV2</span>, <span class="cm-variable">numpy</span>.<span class="cm-property">ndarray</span>, <span class="cm-variable">tensorflow</span>.<span class="cm-property">python</span>.<span class="cm-property">framework</span>.<span class="cm-property">ops</span>.<span class="cm-property">Tensor</span>, <span class="cm-builtin">tuple</span>, <span class="cm-builtin">dict</span>]<span class="cm-variable">,</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">output_cols</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-variable">typing</span>.<span class="cm-property">List</span>[<span class="cm-builtin">str</span>]] <span class="cm-operator">=</span> <span class="cm-keyword">None</span><span class="cm-variable">,</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">label_cols</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-variable">typing</span>.<span class="cm-property">List</span>[<span 
class="cm-builtin">str</span>]] <span class="cm-operator">=</span> <span class="cm-keyword">None</span><span class="cm-variable">,</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">batch_size</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">int</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span><span class="cm-variable">,</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">predict_with_generate</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span><span class="cm-variable">,</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">use_xla_generation</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span><span class="cm-variable">,</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">generate_kwargs</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">dict</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 242px;"/><div class="CodeMirror-gutters" style="display: none; height: 
242px;"/></div></div></pre><p><span>Parameters:</span></p><ul><li><p><code>metric_fn</code><span>: a callable, the metric function. </span><code>metric_fn</code><span> is called with two arguments, </span><code>predictions</code><span> and </span><code>labels</code><span>, which hold the model outputs and the </span><code>ground-truth label</code><span> values respectively. </span><code>metric_fn</code><span> must return a dictionary that maps metric names to metric values.</span></p><p><span>Below is an example </span><code>metric_fn</code><span> that computes </span><code>ROUGE</code><span> scores for a summarization model:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">from</span> <span 
class="cm-variable">datasets</span> <span class="cm-keyword">import</span> <span class="cm-variable">load_metric</span></span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">rouge_metric</span> <span class="cm-operator">=</span> <span class="cm-variable">load_metric</span>(<span class="cm-string">"rouge"</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">def</span> <span class="cm-def">rouge_fn</span>(<span class="cm-variable">predictions</span>, <span class="cm-variable">labels</span>):</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">decoded_predictions</span> <span class="cm-operator">=</span> <span class="cm-variable">tokenizer</span>.<span class="cm-property">batch_decode</span>(<span class="cm-variable">predictions</span>, <span class="cm-variable">skip_special_tokens</span><span class="cm-operator">=</span><span class="cm-keyword">True</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">decoded_labels</span> <span class="cm-operator">=</span> <span class="cm-variable">tokenizer</span>.<span class="cm-property">batch_decode</span>(<span class="cm-variable">labels</span>, <span class="cm-variable">skip_special_tokens</span><span class="cm-operator">=</span><span class="cm-keyword">True</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">result</span> <span class="cm-operator">=</span> <span 
class="cm-variable">rouge_metric</span>.<span class="cm-property">compute</span>(<span class="cm-variable">predictions</span><span class="cm-operator">=</span><span class="cm-variable">decoded_predictions</span>, <span class="cm-variable">references</span><span class="cm-operator">=</span><span class="cm-variable">decoded_labels</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-keyword">return</span> {<span class="cm-variable">key</span>: <span class="cm-variable">value</span>.<span class="cm-property">mid</span>.<span class="cm-property">fmeasure</span> <span class="cm-operator">*</span> <span class="cm-number">100</span> <span class="cm-keyword">for</span> <span class="cm-variable">key</span>, <span class="cm-variable">value</span> <span class="cm-keyword">in</span> <span class="cm-variable">result</span>.<span class="cm-property">items</span>()}</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 176px;"/><div class="CodeMirror-gutters" style="display: none; height: 176px;"/></div></div></pre></li><li><p><code>eval_dataset</code><span>: a </span><code>tf.data.Dataset</code><span>, dictionary, tuple, </span><code>np.ndarray</code><span>, or </span><code>tf.Tensor</code><span> that specifies the validation dataset.</span></p></li><li><p><code>output_cols</code><span>: a list of strings specifying which columns of the model output to use as </span><code>predictions</code><span>. Defaults to all columns.</span></p></li><li><p><code>label_cols</code><span>: a list of strings specifying which columns of the validation set to use as </span><code>label</code><span> columns. They are detected automatically if not provided.</span></p></li><li><p><code>batch_size</code><span>: an integer specifying the </span><code>batch size</code><span>. It only takes effect when the validation set is not a </span><code>pre-batched tf.data.Dataset</code><span>.</span></p></li><li><p><code>predict_with_generate</code><span>: a boolean specifying whether to use </span><code>model.generate()</code><span> 
to obtain the model outputs.</span></p></li><li><p><code>use_xla_generation</code><span>: a boolean specifying, when </span><code>generating</code><span>, whether to use </span><code>XLA</code><span> to compile </span><code>model generation</code><span>. This can greatly speed up generation (by up to </span><code>100</code><span> times), but it requires a new </span><code>XLA</code><span> compilation for every </span><code>input shape</code><span>. When using </span><code>XLA generation</code><span>, it is best to pad your inputs to the same size, or to use the </span><code>pad_to_multiple_of</code><span> argument in your </span><code>tokenizer</code><span> or </span><code>DataCollator</code><span>, which reduces how many </span><code>unique input shape</code><span> variants occur and saves a lot of compilation time.</span></p><p><span>This argument has no effect if </span><code>predict_with_generate = False</code><span>.</span></p></li><li><p><code>generate_kwargs</code><span>: keyword arguments that are passed to </span><code>model.generate()</code><span> when </span><code>generating</code><span>.</span></p><p><span>This argument has no effect if </span><code>predict_with_generate = False</code><span>.</span></p></li></ul></li><li><p><code>class transformers.PushToHubCallback</code><span>: a </span><code>callback</code><span> for </span><code>keras</code><span> that periodically saves the model and pushes it to the </span><code>Hub</code><span>.</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" 
role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">class</span> <span class="cm-def">transformers</span>.<span class="cm-property">PushToHubCallback</span>(</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">output_dir</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Union</span>[<span class="cm-builtin">str</span>, <span class="cm-variable">pathlib</span>.<span class="cm-property">Path</span>],</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">save_strategy</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Union</span>[<span class="cm-builtin">str</span>, <span class="cm-variable">transformers</span>.<span class="cm-property">trainer_utils</span>.<span class="cm-property">IntervalStrategy</span>] <span class="cm-operator">=</span> <span class="cm-string">'epoch'</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">save_steps</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">int</span>] <span class="cm-operator">=</span> <span 
class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">tokenizer</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-variable">transformers</span>.<span class="cm-property">tokenization_utils_base</span>.<span class="cm-property">PreTrainedTokenizerBase</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">hub_model_id</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">hub_token</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">checkpoint</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-operator">**</span><span class="cm-variable">model_card_args</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: 
transparent; top: 220px;"/><div class="CodeMirror-gutters" style="display: none; height: 220px;"/></div></div></pre><p><span>Parameters:</span></p><ul><li><code>output_dir</code><span>: a string or path specifying the output directory; </span><code>model predictions</code><span> and </span><code>model checkpoints</code><span> are written to this directory and synced with the </span><code>repo</code><span> on the </span><code>Hub</code><span>.</span></li><li><code>save_strategy/save_steps</code><span>: see </span><code>transformers.TrainingArguments</code><span>.</span></li><li><code>tokenizer</code><span>: a </span><code>PreTrainedTokenizerBase</code><span> specifying the </span><code>tokenizer</code><span> used by the model. If provided, it is uploaded to the </span><code>repo</code><span> together with the model weights.</span></li><li><code>hub_model_id/hub_token</code><span>: see </span><code>transformers.TrainingArguments</code><span>.</span></li><li><code>checkpoint</code><span>: a boolean specifying whether to save full </span><code>training checkpoints</code><span> (including the </span><code>epoch</code><span> and the </span><code>optimizer state</code><span>) so that training can be resumed. Only usable when </span><code>save_strategy="epoch"</code><span>.</span></li></ul><p><span>Example:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: 
none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">from</span> <span class="cm-variable">transformers</span>.<span class="cm-property">keras_callbacks</span> <span class="cm-keyword">import</span> <span class="cm-variable">PushToHubCallback</span></span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">push_to_hub_callback</span> <span class="cm-operator">=</span> <span class="cm-variable">PushToHubCallback</span>(</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">output_dir</span><span class="cm-operator">=</span><span class="cm-string">"./model_save"</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">tokenizer</span><span class="cm-operator">=</span><span class="cm-variable">tokenizer</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">hub_model_id</span><span class="cm-operator">=</span><span class="cm-string">"gpt5-7xlarge"</span>,</span></pre><pre class=" CodeMirror-line " 
role="presentation"><span role="presentation" style="padding-right: 0.1px;">)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">model</span>.<span class="cm-property">fit</span>(<span class="cm-variable">train_dataset</span>, <span class="cm-variable">callbacks</span><span class="cm-operator">=</span>[<span class="cm-variable">push_to_hub_callback</span>])</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 198px;"/><div class="CodeMirror-gutters" style="display: none; height: 198px;"/></div></div></pre></li></ol><h2 id="四logger"><span>4. Logger</span></h2><ol start=""><li><p><code>Transformers</code><span> has a centralized logging system whose default </span><code>verbosity level</code><span> is </span><code>WARNING</code><span>. There are several ways to change the </span><code>verbosity level</code><span>:</span></p><ul><li><p><span>Set it directly in code:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div 
class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">from</span> <span class="cm-variable">transformers</span>.<span class="cm-property">utils</span> <span class="cm-keyword">import</span> <span class="cm-variable">logging</span></span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">logging</span>.<span class="cm-property">set_verbosity_info</span>()</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-comment"># then use the logger</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">logger</span> <span class="cm-operator">=</span> <span class="cm-variable">logging</span>.<span class="cm-property">get_logger</span>(<span class="cm-string">"transformers"</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">logger</span>.<span class="cm-property">info</span>(<span class="cm-string">"INFO"</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">logger</span>.<span 
class="cm-property">warning</span>(<span class="cm-string">"WARN"</span>)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 132px;"/><div class="CodeMirror-gutters" style="display: none; height: 132px;"/></div></div></pre></li><li><p><span>Set it through an environment variable:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="shell"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="shell"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-def">TRANSFORMERS_VERBOSITY</span><span class="cm-operator">=</span>error 
./myprogram.py</span></pre></div></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 22px;"/><div class="CodeMirror-gutters" style="display: none; height: 22px;"/></div></div></pre></li></ul></li><li><p><code>verbosity level</code><span> values, from least to most verbose:</span></p><ul><li><code>transformers.logging.CRITICAL</code><span> or </span><code>transformers.logging.FATAL</code><span> (integer value </span><code>50</code><span>): reports only the most critical errors.</span></li><li><code>transformers.logging.ERROR</code><span> (integer value </span><code>40</code><span>): reports only errors.</span></li><li><code>transformers.logging.WARNING</code><span> or </span><code>transformers.logging.WARN</code><span> (integer value </span><code>30</code><span>): reports only errors and warnings. This is the default level.</span></li><li><code>transformers.logging.INFO</code><span> (integer value </span><code>20</code><span>): reports errors, warnings, and basic information.</span></li><li><code>transformers.logging.DEBUG</code><span> (integer value </span><code>10</code><span>): reports all information.</span></li></ul></li><li><p><span>By default, </span><code>tqdm</code><span> progress bars are shown while a model is being downloaded. </span><code>logging.disable_progress_bar()</code><span> and </span><code>logging.enable_progress_bar()</code><span> turn this behavior off and on.</span></p></li><li><p><span>Getting and setting the </span><code>verbosity level</code><span>:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; 
margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">transformers</span>.<span class="cm-property">utils</span>.<span class="cm-property">logging</span>.<span class="cm-property">set_verbosity_warning</span>()</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">transformers</span>.<span class="cm-property">utils</span>.<span class="cm-property">logging</span>.<span class="cm-property">set_verbosity_info</span>()</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">transformers</span>.<span class="cm-property">utils</span>.<span class="cm-property">logging</span>.<span class="cm-property">set_verbosity_debug</span>()</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">transformers</span>.<span class="cm-property">utils</span>.<span class="cm-property">logging</span>.<span class="cm-property">get_verbosity</span>() <span class="cm-operator">-></span> <span class="cm-builtin">int</span></span></pre><pre class=" CodeMirror-line " 
role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">transformers</span>.<span class="cm-property">utils</span>.<span class="cm-property">logging</span>.<span class="cm-property">set_verbosity</span>(<span class="cm-variable">verbosity</span>: <span class="cm-builtin">int</span> )</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 110px;"/><div class="CodeMirror-gutters" style="display: none; height: 110px;"/></div></div></pre></li><li><p><code>transformers.utils.logging.get_logger(name: typing.Optional[str] = None )</code><span>: returns the </span><code>logger</code><span> with the given name.</span></p><p><span>This function should not be called directly unless you are writing a custom </span><code>transformers module</code><span>.</span></p></li><li><p><span>Enabling and disabling the default </span><code>handler</code><span> of </span><code>HuggingFace Transformers's root logger</code><span>:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div 
class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">transformers</span>.<span class="cm-property">utils</span>.<span class="cm-property">logging</span>.<span class="cm-property">enable_default_handler</span>()</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">transformers</span>.<span class="cm-property">utils</span>.<span class="cm-property">logging</span>.<span class="cm-property">disable_default_handler</span>()</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 44px;"/><div class="CodeMirror-gutters" style="display: none; height: 44px;"/></div></div></pre></li><li><p><code>transformers.utils.logging.enable_explicit_format()</code><span>: enables explicit formatting for every </span><code>HuggingFace Transformers</code><span> </span><code>logger</code><span>. The explicit format is:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="shell"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="shell"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" 
cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">[LEVELNAME|FILENAME|LINE NUMBER] TIME >> MESSAGE</span></pre></div></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 22px;"/><div class="CodeMirror-gutters" style="display: none; height: 22px;"/></div></div></pre><p><span>Every </span><code>handler</code><span> currently bound to the </span><code>root logger</code><span> is affected by this method.</span></p></li><li><p><code>transformers.utils.logging.reset_format()</code><span>: resets the formatting for every </span><code>HuggingFace Transformers</code><span> </span><code>logger</code><span>.</span></p><p><span>Every </span><code>handler</code><span> currently bound to the </span><code>root logger</code><span> is affected by this method.</span></p></li><li><p><span>Enabling and disabling </span><code>tqdm</code><span> progress bars:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea 
autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">transformers</span>.<span class="cm-property">utils</span>.<span class="cm-property">logging</span>.<span class="cm-property">enable_progress_bar</span>()</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">transformers</span>.<span class="cm-property">utils</span>.<span class="cm-property">logging</span>.<span class="cm-property">disable_progress_bar</span>()</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 44px;"/><div class="CodeMirror-gutters" style="display: none; height: 44px;"/></div></div></pre></li></ol><h2 
id="五optimization"><span>5. Optimization</span></h2><ol start=""><li><p><span>The </span><code>optimization</code><span> module provides:</span></p><ul><li><span>An </span><code>optimizer</code><span> with the weight decay fix (decoupled weight decay), which can be used to fine-tune models.</span></li><li><span>Several </span><code>schedule</code><span> implementations, provided as </span><code>schedule</code><span> objects that inherit from </span><code>_LRSchedule</code><span>.</span></li><li><span>A </span><code>gradient accumulation</code><span> class for accumulating gradients over more than one </span><code>batch</code><span>.</span></li></ul></li></ol><h3 id="51-optimizer-api"><span>5.1 Optimizer API</span></h3><ol start=""><li><p><code>class transformers.AdamW</code><span>: </span><code>Adam</code><span> with the </span><code>weight decay</code><span> fix (decoupled weight decay).</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div 
class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">class</span> <span class="cm-def">transformers</span>.<span class="cm-property">AdamW</span>(</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">params</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Iterable</span>[<span class="cm-variable">torch</span>.<span class="cm-property">nn</span>.<span class="cm-property">parameter</span>.<span class="cm-property">Parameter</span>],</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">lr</span>: <span class="cm-builtin">float</span> <span class="cm-operator">=</span> <span class="cm-number">0.001</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">betas</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Tuple</span>[<span class="cm-builtin">float</span>, <span class="cm-builtin">float</span>] <span class="cm-operator">=</span> (<span class="cm-number">0.9</span>, <span class="cm-number">0.999</span>),</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">eps</span>: <span class="cm-builtin">float</span> <span class="cm-operator">=</span> <span class="cm-number">1e-06</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">weight_decay</span>: <span class="cm-builtin">float</span> <span class="cm-operator">=</span> <span class="cm-number">0.0</span>,</span></pre><pre 
class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">correct_bias</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">True</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">no_deprecation_warning</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 198px;"/><div class="CodeMirror-gutters" style="display: none; height: 198px;"/></div></div></pre><p><span>Parameters:</span></p><ul><li><code>params</code><span>: an iterable of </span><code>nn.parameter.Parameter</code><span> objects or dicts, specifying the </span><code>parameters</code><span> or </span><code>parameter groups</code><span> to optimize.</span></li><li><code>lr</code><span>: a float specifying the initial learning rate.</span></li><li><code>betas</code><span>: a </span><code>Tuple[float,float]</code><span> tuple specifying the </span><code>Adam</code><span> parameters </span><mjx-container class="MathJax" jax="SVG" style="position: relative;"><svg xmlns="http://www.w3.org/2000/svg" width="7.303ex" height="2.262ex" role="img" focusable="false" viewbox="0 -750 3227.8 1000" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" style="vertical-align: -0.566ex;"><defs><path id="MJX-2-TEX-N-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"/><path id="MJX-2-TEX-I-1D6FD" d="M29 -194Q23 -188 23 -186Q23 -183 102 
134T186 465Q208 533 243 584T309 658Q365 705 429 705H431Q493 705 533 667T573 570Q573 465 469 396L482 383Q533 332 533 252Q533 139 448 65T257 -10Q227 -10 203 -2T165 17T143 40T131 59T126 65L62 -188Q60 -194 42 -194H29ZM353 431Q392 431 427 419L432 422Q436 426 439 429T449 439T461 453T472 471T484 495T493 524T501 560Q503 569 503 593Q503 611 502 616Q487 667 426 667Q384 667 347 643T286 582T247 514T224 455Q219 439 186 308T152 168Q151 163 151 147Q151 99 173 68Q204 26 260 26Q302 26 349 51T425 137Q441 171 449 214T457 279Q457 337 422 372Q380 358 347 358H337Q258 358 258 389Q258 396 261 403Q275 431 353 431Z"/><path id="MJX-2-TEX-N-31" d="M213 578L200 573Q186 568 160 563T102 556H83V602H102Q149 604 189 617T245 641T273 663Q275 666 285 666Q294 666 302 660V361L303 61Q310 54 315 52T339 48T401 46H427V0H416Q395 3 257 3Q121 3 100 0H88V46H114Q136 46 152 46T177 47T193 50T201 52T207 57T213 61V578Z"/><path id="MJX-2-TEX-N-2C" d="M78 35T78 60T94 103T137 121Q165 121 187 96T210 8Q210 -27 201 -60T180 -117T154 -158T130 -185T117 -194Q113 -194 104 -185T95 -172Q95 -168 106 -156T131 -126T157 -76T173 -3V9L172 8Q170 7 167 6T161 3T152 1T140 0Q113 0 96 17Z"/><path id="MJX-2-TEX-N-32" d="M109 429Q82 429 66 447T50 491Q50 562 103 614T235 666Q326 666 387 610T449 465Q449 422 429 383T381 315T301 241Q265 210 201 149L142 93L218 92Q375 92 385 97Q392 99 409 186V189H449V186Q448 183 436 95T421 3V0H50V19V31Q50 38 56 46T86 81Q115 113 136 137Q145 147 170 174T204 211T233 244T261 278T284 308T305 340T320 369T333 401T340 431T343 464Q343 527 309 573T212 619Q179 619 154 602T119 569T109 550Q109 549 114 549Q132 549 151 535T170 489Q170 464 154 447T109 429Z"/><path id="MJX-2-TEX-N-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"/></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="scale(1,-1)"><g 
data-mml-node="math"><g data-mml-node="mo"><use data-c="28" xlink:href="#MJX-2-TEX-N-28"/></g><g data-mml-node="msub" transform="translate(389,0)"><g data-mml-node="mi"><use data-c="1D6FD" xlink:href="#MJX-2-TEX-I-1D6FD"/></g><g data-mml-node="mn" transform="translate(599,-150) scale(0.707)"><use data-c="31" xlink:href="#MJX-2-TEX-N-31"/></g></g><g data-mml-node="mo" transform="translate(1391.6,0)"><use data-c="2C" xlink:href="#MJX-2-TEX-N-2C"/></g><g data-mml-node="msub" transform="translate(1836.2,0)"><g data-mml-node="mi"><use data-c="1D6FD" xlink:href="#MJX-2-TEX-I-1D6FD"/></g><g data-mml-node="mn" transform="translate(599,-150) scale(0.707)"><use data-c="32" xlink:href="#MJX-2-TEX-N-32"/></g></g><g data-mml-node="mo" transform="translate(2838.8,0)"><use data-c="29" xlink:href="#MJX-2-TEX-N-29"/></g></g></g></svg><mjx-assistive-mml unselectable="on" display="inline"><math xmlns="http://www.w3.org/1998/Math/MathML"><mo stretchy="false">(</mo><msub><mi>β</mi><mn>1</mn></msub><mo>,</mo><msub><mi>β</mi><mn>2</mn></msub><mo stretchy="false">)</mo></math></mjx-assistive-mml></mjx-container><span>.</span></li><li><code>eps</code><span>: a float added for numerical stability.</span></li><li><code>weight_decay</code><span>: a float specifying the weight decay.</span></li><li><code>correct_bias</code><span>: a boolean specifying whether to </span><code>correct bias</code><span> in </span><code>Adam</code><span>. For example, the </span><code>BERT TF repo</code><span> uses </span><code>False</code><span>.</span></li><li><code>no_deprecation_warning</code><span>: a boolean specifying whether to suppress the </span><code>deprecation warning</code><span>.</span></li></ul><p><span>Methods:</span></p><ul><li><p><code>step( closure: typing.Callable = None)</code><span>: performs a single </span><code>optimization step</code><span>.</span></p><p><span>Parameters: </span><code>closure</code><span>: a callable that re-evaluates the model and returns the </span><code>loss</code><span>.</span></p></li></ul></li><li><p><code>class transformers.Adafactor(PyTorch)</code><span>: the </span><code>Adafactor</code><span> optimizer.</span></p><pre class="md-fences 
md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">class</span> <span class="cm-def">transformers</span>.<span class="cm-property">Adafactor</span>(</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">params</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">lr</span> <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">eps</span> <span 
class="cm-operator">=</span> (<span class="cm-number">1e-30</span>, <span class="cm-number">0.001</span>),</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">clip_threshold</span> <span class="cm-operator">=</span> <span class="cm-number">1.0</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">decay_rate</span> <span class="cm-operator">=</span> <span class="cm-operator">-</span><span class="cm-number">0.8</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">beta1</span> <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">weight_decay</span> <span class="cm-operator">=</span> <span class="cm-number">0.0</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">scale_parameter</span> <span class="cm-operator">=</span> <span class="cm-keyword">True</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">relative_step</span> <span class="cm-operator">=</span> <span class="cm-keyword">True</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">warmup_init</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">)</span></pre></div></div></div></div></div><div style="position: absolute; 
; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 242px;"/><div class="CodeMirror-gutters" style="display: none; height: 242px;"/></div></div></pre><p><span>Parameters:</span></p><ul><li><code>params</code><span>: see </span><code>AdamW</code><span>.</span></li><li><code>lr</code><span>: the </span><code>external</code><span> learning rate.</span></li><li><code>eps</code><span>: a </span><code>Tuple[float, float]</code><span> tuple specifying the regularization constants for the </span><code>square gradient</code><span> and the </span><code>parameter scale</code><span>, respectively.</span></li><li><code>clip_threshold</code><span>: a float specifying the root-mean-square threshold of the </span><code>final gradient update</code><span>.</span></li><li><code>decay_rate</code><span>: a float specifying the coefficient used to compute the </span><code>running averages of square</code><span>.</span></li><li><code>beta1</code><span>: a float specifying the coefficient used to compute the </span><code>running averages of gradient</code><span>.</span></li><li><code>weight_decay</code><span>: a float specifying the weight decay (</span><code>L2</code><span> regularization).</span></li><li><code>scale_parameter</code><span>: a boolean; if </span><code>True</code><span>, the learning rate is scaled by the </span><code>root mean square</code><span>.</span></li><li><code>relative_step</code><span>: a boolean; if </span><code>True</code><span>, a </span><code>time-dependent</code><span> learning rate is computed instead of the </span><code>external</code><span> learning rate.</span></li><li><code>warmup_init</code><span>: a boolean specifying whether the </span><code>time-dependent</code><span> learning rate uses </span><code>warm-up</code><span> initialization.</span></li></ul><p><span>Methods: see </span><code>AdamW</code><span>.</span></p></li><li><p><code>class transformers.AdamWeightDecay(TensorFlow)</code><span>: </span><code>Adam</code><span> with </span><code>L2</code><span> weight decay and </span><code>clip_by_global_norm</code><span> enabled on the gradients.</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 
3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">class</span> <span class="cm-def">transformers</span>.<span class="cm-property">AdamWeightDecay</span>(</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">learning_rate</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Union</span>[<span class="cm-builtin">float</span>, <span class="cm-variable">keras</span>.<span class="cm-property">optimizers</span>.<span class="cm-property">schedules</span>.<span class="cm-property">learning_rate_schedule</span>.<span class="cm-property">LearningRateSchedule</span>] <span class="cm-operator">=</span> <span class="cm-number">0.001</span>,</span></pre><pre class=" 
CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">beta_1</span>: <span class="cm-builtin">float</span> <span class="cm-operator">=</span> <span class="cm-number">0.9</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">beta_2</span>: <span class="cm-builtin">float</span> <span class="cm-operator">=</span> <span class="cm-number">0.999</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">epsilon</span>: <span class="cm-builtin">float</span> <span class="cm-operator">=</span> <span class="cm-number">1e-07</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">amsgrad</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">weight_decay_rate</span>: <span class="cm-builtin">float</span> <span class="cm-operator">=</span> <span class="cm-number">0.0</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">include_in_weight_decay</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-variable">typing</span>.<span class="cm-property">List</span>[<span class="cm-builtin">str</span>]] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">exclude_from_weight_decay</span>: <span 
class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-variable">typing</span>.<span class="cm-property">List</span>[<span class="cm-builtin">str</span>]] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">name</span>: <span class="cm-builtin">str</span> <span class="cm-operator">=</span> <span class="cm-string">'AdamWeightDecay'</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-operator">**</span><span class="cm-variable">kwargs</span> </span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 286px;"/><div class="CodeMirror-gutters" style="display: none; height: 286px;"/></div></div></pre><p><span>Parameters:</span></p><ul><li><p><code>learning_rate</code><span>: a float or a </span><code>tf.keras.optimizers.schedules.LearningRateSchedule</code><span>, specifying the learning rate or the learning rate schedule.</span></p></li><li><p><code>beta_1</code><span>: a float specifying the </span><code>Adam</code><span> parameter </span><mjx-container class="MathJax" jax="SVG" style="position: relative;"><svg xmlns="http://www.w3.org/2000/svg" width="2.268ex" height="2.034ex" role="img" focusable="false" viewbox="0 -705 1002.6 899" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" style="vertical-align: -0.439ex;"><defs><path id="MJX-6-TEX-I-1D6FD" d="M29 -194Q23 -188 23 -186Q23 -183 102 134T186 465Q208 533 243 584T309 658Q365 705 429 705H431Q493 705 533 667T573 570Q573 465 469 396L482 383Q533 332 533 252Q533 139 448 65T257 -10Q227 -10 203 -2T165 17T143 40T131 59T126 65L62 -188Q60 -194 42 -194H29ZM353 431Q392 
431 427 419L432 422Q436 426 439 429T449 439T461 453T472 471T484 495T493 524T501 560Q503 569 503 593Q503 611 502 616Q487 667 426 667Q384 667 347 643T286 582T247 514T224 455Q219 439 186 308T152 168Q151 163 151 147Q151 99 173 68Q204 26 260 26Q302 26 349 51T425 137Q441 171 449 214T457 279Q457 337 422 372Q380 358 347 358H337Q258 358 258 389Q258 396 261 403Q275 431 353 431Z"/><path id="MJX-6-TEX-N-31" d="M213 578L200 573Q186 568 160 563T102 556H83V602H102Q149 604 189 617T245 641T273 663Q275 666 285 666Q294 666 302 660V361L303 61Q310 54 315 52T339 48T401 46H427V0H416Q395 3 257 3Q121 3 100 0H88V46H114Q136 46 152 46T177 47T193 50T201 52T207 57T213 61V578Z"/></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="scale(1,-1)"><g data-mml-node="math"><g data-mml-node="msub"><g data-mml-node="mi"><use data-c="1D6FD" xlink:href="#MJX-6-TEX-I-1D6FD"/></g><g data-mml-node="mn" transform="translate(599,-150) scale(0.707)"><use data-c="31" xlink:href="#MJX-6-TEX-N-31"/></g></g></g></g></svg><mjx-assistive-mml unselectable="on" display="inline"><math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>β</mi><mn>1</mn></msub></math></mjx-assistive-mml></mjx-container><span>.</span></p></li><li><p><code>beta_2</code><span>: a float specifying the </span><code>Adam</code><span> parameter </span><mjx-container class="MathJax" jax="SVG" style="position: relative;"><svg xmlns="http://www.w3.org/2000/svg" width="2.268ex" height="2.034ex" role="img" focusable="false" viewbox="0 -705 1002.6 899" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" style="vertical-align: -0.439ex;"><defs><path id="MJX-7-TEX-I-1D6FD" d="M29 -194Q23 -188 23 -186Q23 -183 102 134T186 465Q208 533 243 584T309 658Q365 705 429 705H431Q493 705 533 667T573 570Q573 465 469 396L482 383Q533 332 533 252Q533 139 448 65T257 -10Q227 -10 203 -2T165 17T143 40T131 59T126 65L62 -188Q60 -194 42 -194H29ZM353 431Q392 431 427 419L432 422Q436 426 439 429T449 439T461 453T472 471T484 495T493 524T501 560Q503 569 503 593Q503 
611 502 616Q487 667 426 667Q384 667 347 643T286 582T247 514T224 455Q219 439 186 308T152 168Q151 163 151 147Q151 99 173 68Q204 26 260 26Q302 26 349 51T425 137Q441 171 449 214T457 279Q457 337 422 372Q380 358 347 358H337Q258 358 258 389Q258 396 261 403Q275 431 353 431Z"/><path id="MJX-7-TEX-N-32" d="M109 429Q82 429 66 447T50 491Q50 562 103 614T235 666Q326 666 387 610T449 465Q449 422 429 383T381 315T301 241Q265 210 201 149L142 93L218 92Q375 92 385 97Q392 99 409 186V189H449V186Q448 183 436 95T421 3V0H50V19V31Q50 38 56 46T86 81Q115 113 136 137Q145 147 170 174T204 211T233 244T261 278T284 308T305 340T320 369T333 401T340 431T343 464Q343 527 309 573T212 619Q179 619 154 602T119 569T109 550Q109 549 114 549Q132 549 151 535T170 489Q170 464 154 447T109 429Z"/></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="scale(1,-1)"><g data-mml-node="math"><g data-mml-node="msub"><g data-mml-node="mi"><use data-c="1D6FD" xlink:href="#MJX-7-TEX-I-1D6FD"/></g><g data-mml-node="mn" transform="translate(599,-150) scale(0.707)"><use data-c="32" xlink:href="#MJX-7-TEX-N-32"/></g></g></g></g></svg><mjx-assistive-mml unselectable="on" display="inline"><math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>β</mi><mn>2</mn></msub></math></mjx-assistive-mml></mjx-container><span>.</span></p></li><li><p><code>epsilon</code><span>: a float, the </span><code>Adam</code><span> parameter </span><mjx-container class="MathJax" jax="SVG" style="position: relative;"><svg xmlns="http://www.w3.org/2000/svg" width="0.919ex" height="1ex" role="img" focusable="false" viewbox="0 -431 406 442" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" style="vertical-align: -0.025ex;"><defs><path id="MJX-8-TEX-I-1D716" d="M227 -11Q149 -11 95 41T40 174Q40 262 87 322Q121 367 173 396T287 430Q289 431 329 431H367Q382 426 382 411Q382 385 341 385H325H312Q191 385 154 277L150 265H327Q340 256 340 246Q340 228 320 219H138V217Q128 187 128 143Q128 77 160 52T231 26Q258 26 284 36T326 57T343 68Q350 68 354 
58T358 39Q358 36 357 35Q354 31 337 21T289 0T227 -11Z"/></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="scale(1,-1)"><g data-mml-node="math"><g data-mml-node="mi"><use data-c="1D716" xlink:href="#MJX-8-TEX-I-1D716"/></g></g></g></svg><mjx-assistive-mml unselectable="on" display="inline"><math xmlns="http://www.w3.org/1998/Math/MathML"><mi>ϵ</mi></math></mjx-assistive-mml></mjx-container><span>, used for numerical stability.</span></p></li><li><p><code>amsgrad</code><span>: a boolean specifying whether to use the </span><code>AMSGrad</code><span> variant of the algorithm.</span></p></li><li><p><code>weight_decay_rate</code><span>: a float specifying the weight decay coefficient.</span></p></li><li><p><code>include_in_weight_decay</code><span>: a list of strings specifying which </span><code>parameters</code><span> weight decay is applied to. If it is not passed, weight decay is applied to all </span><code>parameters</code><span> by default (unless they are listed in </span><code>exclude_from_weight_decay</code><span>).</span></p></li><li><p><code>exclude_from_weight_decay</code><span>: a list of strings specifying which </span><code>parameters</code><span> are excluded from weight decay. If a </span><code>parameter name</code><span> appears in both </span><code>include_in_weight_decay</code><span> and </span><code>exclude_from_weight_decay</code><span>, </span><code>include_in_weight_decay</code><span> takes priority.</span></p></li><li><p><code>name</code><span>: a string specifying the name of the operations created when applying gradients.</span></p></li><li><p><code>kwargs</code><span>: keyword arguments, which can be </span><code>clipnorm, clipvalue, lr, decay</code><span>:</span></p><ul><li><code>clipnorm</code><span>: clips gradients by norm.</span></li><li><code>clipvalue</code><span>: clips gradients by value.</span></li><li><code>decay</code><span>: kept for backward compatibility, to allow </span><code>time inverse decay</code><span> of the learning rate.</span></li><li><code>lr</code><span>: kept for backward compatibility; using </span><code>learning_rate</code><span> is recommended.</span></li></ul></li></ul><p><span>Methods:</span></p><ul><li><code>from_config(config)</code><span>: creates an </span><code>AdamWeightDecay</code><span> from its config.</span></li></ul></li><li><p><code>transformers.create_optimizer()</code><span>: creates an </span><code>optimizer</code><span>.</span></p><pre 
class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">transformers</span>.<span class="cm-property">create_optimizer</span>(</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">init_lr</span>: <span class="cm-builtin">float</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">num_train_steps</span>: <span class="cm-builtin">int</span>,</span></pre><pre 
class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">num_warmup_steps</span>: <span class="cm-builtin">int</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">min_lr_ratio</span>: <span class="cm-builtin">float</span> <span class="cm-operator">=</span> <span class="cm-number">0.0</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">adam_beta1</span>: <span class="cm-builtin">float</span> <span class="cm-operator">=</span> <span class="cm-number">0.9</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">adam_beta2</span>: <span class="cm-builtin">float</span> <span class="cm-operator">=</span> <span class="cm-number">0.999</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">adam_epsilon</span>: <span class="cm-builtin">float</span> <span class="cm-operator">=</span> <span class="cm-number">1e-08</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">adam_clipnorm</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">float</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">adam_global_clipnorm</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">float</span>] <span class="cm-operator">=</span> <span 
class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">weight_decay_rate</span>: <span class="cm-builtin">float</span> <span class="cm-operator">=</span> <span class="cm-number">0.0</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">power</span>: <span class="cm-builtin">float</span> <span class="cm-operator">=</span> <span class="cm-number">1.0</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">include_in_weight_decay</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-variable">typing</span>.<span class="cm-property">List</span>[<span class="cm-builtin">str</span>]] <span class="cm-operator">=</span> <span class="cm-keyword">None</span> )</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 286px;"/><div class="CodeMirror-gutters" style="display: none; height: 286px;"/></div></div></pre><p><span>参数:</span></p><ul><li><code>init_lr</code><span>:一个浮点数,指定 </span><code>warmup</code><span> 阶段结束时的期望学习率。</span></li><li><code>num_train_steps</code><span>:一个整数,指定总的训练 </span><code>step</code><span> 数。</span></li><li><code>num_warmup_steps</code><span>:一个整数,指定 </span><code>warmup step</code><span> 数。</span></li><li><code>min_lr_ratio</code><span>:一个浮点数,学习率线性衰减结束时的最终学习率为 </span><code>init_lr * min_lr_ratio</code><span> 。</span></li><li><code>adam_beta1</code><span>:一个浮点数,指定 </span><code>Adam</code><span> 的 </span><mjx-container class="MathJax" jax="SVG" style="position: relative;"><svg xmlns="http://www.w3.org/2000/svg" width="2.268ex" height="2.034ex" role="img" 
focusable="false" viewbox="0 -705 1002.6 899" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" style="vertical-align: -0.439ex;"><defs><path id="MJX-6-TEX-I-1D6FD" d="M29 -194Q23 -188 23 -186Q23 -183 102 134T186 465Q208 533 243 584T309 658Q365 705 429 705H431Q493 705 533 667T573 570Q573 465 469 396L482 383Q533 332 533 252Q533 139 448 65T257 -10Q227 -10 203 -2T165 17T143 40T131 59T126 65L62 -188Q60 -194 42 -194H29ZM353 431Q392 431 427 419L432 422Q436 426 439 429T449 439T461 453T472 471T484 495T493 524T501 560Q503 569 503 593Q503 611 502 616Q487 667 426 667Q384 667 347 643T286 582T247 514T224 455Q219 439 186 308T152 168Q151 163 151 147Q151 99 173 68Q204 26 260 26Q302 26 349 51T425 137Q441 171 449 214T457 279Q457 337 422 372Q380 358 347 358H337Q258 358 258 389Q258 396 261 403Q275 431 353 431Z"/><path id="MJX-6-TEX-N-31" d="M213 578L200 573Q186 568 160 563T102 556H83V602H102Q149 604 189 617T245 641T273 663Q275 666 285 666Q294 666 302 660V361L303 61Q310 54 315 52T339 48T401 46H427V0H416Q395 3 257 3Q121 3 100 0H88V46H114Q136 46 152 46T177 47T193 50T201 52T207 57T213 61V578Z"/></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="scale(1,-1)"><g data-mml-node="math"><g data-mml-node="msub"><g data-mml-node="mi"><use data-c="1D6FD" xlink:href="#MJX-6-TEX-I-1D6FD"/></g><g data-mml-node="mn" transform="translate(599,-150) scale(0.707)"><use data-c="31" xlink:href="#MJX-6-TEX-N-31"/></g></g></g></g></svg><mjx-assistive-mml unselectable="on" display="inline"><math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>β</mi><mn>1</mn></msub></math></mjx-assistive-mml></mjx-container><span> 参数。</span></li><li><code>adam_beta2</code><span>:一个浮点数,指定 </span><code>Adam</code><span> 的 </span><mjx-container class="MathJax" jax="SVG" style="position: relative;"><svg xmlns="http://www.w3.org/2000/svg" width="2.268ex" height="2.034ex" role="img" focusable="false" viewbox="0 -705 1002.6 899" xmlns:xlink="http://www.w3.org/1999/xlink" 
aria-hidden="true" style="vertical-align: -0.439ex;"><defs><path id="MJX-7-TEX-I-1D6FD" d="M29 -194Q23 -188 23 -186Q23 -183 102 134T186 465Q208 533 243 584T309 658Q365 705 429 705H431Q493 705 533 667T573 570Q573 465 469 396L482 383Q533 332 533 252Q533 139 448 65T257 -10Q227 -10 203 -2T165 17T143 40T131 59T126 65L62 -188Q60 -194 42 -194H29ZM353 431Q392 431 427 419L432 422Q436 426 439 429T449 439T461 453T472 471T484 495T493 524T501 560Q503 569 503 593Q503 611 502 616Q487 667 426 667Q384 667 347 643T286 582T247 514T224 455Q219 439 186 308T152 168Q151 163 151 147Q151 99 173 68Q204 26 260 26Q302 26 349 51T425 137Q441 171 449 214T457 279Q457 337 422 372Q380 358 347 358H337Q258 358 258 389Q258 396 261 403Q275 431 353 431Z"/><path id="MJX-7-TEX-N-32" d="M109 429Q82 429 66 447T50 491Q50 562 103 614T235 666Q326 666 387 610T449 465Q449 422 429 383T381 315T301 241Q265 210 201 149L142 93L218 92Q375 92 385 97Q392 99 409 186V189H449V186Q448 183 436 95T421 3V0H50V19V31Q50 38 56 46T86 81Q115 113 136 137Q145 147 170 174T204 211T233 244T261 278T284 308T305 340T320 369T333 401T340 431T343 464Q343 527 309 573T212 619Q179 619 154 602T119 569T109 550Q109 549 114 549Q132 549 151 535T170 489Q170 464 154 447T109 429Z"/></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="scale(1,-1)"><g data-mml-node="math"><g data-mml-node="msub"><g data-mml-node="mi"><use data-c="1D6FD" xlink:href="#MJX-7-TEX-I-1D6FD"/></g><g data-mml-node="mn" transform="translate(599,-150) scale(0.707)"><use data-c="32" xlink:href="#MJX-7-TEX-N-32"/></g></g></g></g></svg><mjx-assistive-mml unselectable="on" display="inline"><math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>β</mi><mn>2</mn></msub></math></mjx-assistive-mml></mjx-container><span> 参数。</span></li><li><code>adam_epsilon</code><span>:一个浮点数,指定 </span><code>Adam</code><span> 中的 </span><mjx-container class="MathJax" jax="SVG" style="position: relative;"><svg xmlns="http://www.w3.org/2000/svg" width="0.919ex" height="1ex" 
role="img" focusable="false" viewbox="0 -431 406 442" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" style="vertical-align: -0.025ex;"><defs><path id="MJX-8-TEX-I-1D716" d="M227 -11Q149 -11 95 41T40 174Q40 262 87 322Q121 367 173 396T287 430Q289 431 329 431H367Q382 426 382 411Q382 385 341 385H325H312Q191 385 154 277L150 265H327Q340 256 340 246Q340 228 320 219H138V217Q128 187 128 143Q128 77 160 52T231 26Q258 26 284 36T326 57T343 68Q350 68 354 58T358 39Q358 36 357 35Q354 31 337 21T289 0T227 -11Z"/></defs><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="scale(1,-1)"><g data-mml-node="math"><g data-mml-node="mi"><use data-c="1D716" xlink:href="#MJX-8-TEX-I-1D716"/></g></g></g></svg><mjx-assistive-mml unselectable="on" display="inline"><math xmlns="http://www.w3.org/1998/Math/MathML"><mi>ϵ</mi></math></mjx-assistive-mml></mjx-container><span> parameter, used for numerical stability.</span></li><li><code>adam_clipnorm</code><span>: a float; if not </span><code>None</code><span>, the gradient of each weight is clipped individually so that its norm does not exceed this value.</span></li><li><code>adam_global_clipnorm</code><span>: a float; if not </span><code>None</code><span>, the gradients of all weights are concatenated and clipped so that the norm of the concatenated result does not exceed this value.</span></li><li><code>weight_decay_rate</code><span>: a float, the weight decay coefficient.</span></li><li><code>power</code><span>: a float, the power of the polynomial decay.</span></li><li><code>include_in_weight_decay</code><span>: see </span><code>AdamWeightDecay</code><span>.</span></li></ul></li></ol><h3 id="52-schedule-api"><span>5.2 Schedule API</span></h3><ol start=""><li><p><code>class transformers.SchedulerType</code><span>: an enumeration type.</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div 
class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">class</span> <span class="cm-def">transformers</span>.<span class="cm-property">SchedulerType</span>(</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">value</span>, <span class="cm-variable">names</span> <span class="cm-operator">=</span> <span class="cm-keyword">None</span>, <span class="cm-variable">module</span> <span class="cm-operator">=</span> <span class="cm-keyword">None</span>, <span class="cm-variable">qualname</span> <span class="cm-operator">=</span> <span class="cm-keyword">None</span>, <span class="cm-builtin">type</span> <span class="cm-operator">=</span> <span class="cm-keyword">None</span>, <span class="cm-variable">start</span> <span class="cm-operator">=</span> <span class="cm-number">1</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 
0.1px;">)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 66px;"/><div class="CodeMirror-gutters" style="display: none; height: 66px;"/></div></div></pre></li><li><p><code>transformers.get_scheduler()</code><span>: a unified </span><code>API</code><span> that, given a </span><code>scheduler name</code><span>, returns the corresponding </span><code>scheduler</code><span>.</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span 
class="cm-variable">transformers</span>.<span class="cm-property">get_scheduler</span>(</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">name</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Union</span>[<span class="cm-builtin">str</span>, <span class="cm-variable">transformers</span>.<span class="cm-property">trainer_utils</span>.<span class="cm-property">SchedulerType</span>],</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">optimizer</span>: <span class="cm-variable">Optimizer</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">num_warmup_steps</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">int</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">num_training_steps</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">int</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span> </span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 132px;"/><div class="CodeMirror-gutters" style="display: none; height: 132px;"/></div></div></pre><p><span>Parameters:</span></p><ul><li><code>name</code><span>: a string or </span><code>SchedulerType</code><span>, the name of the </span><code>scheduler</code><span>.</span></li>
<li><code>optimizer</code><span>: a </span><code>torch.optim.Optimizer</code><span> object, the optimizer.</span></li><li><code>num_warmup_steps</code><span>: an integer, the number of </span><code>warmup steps</code><span> to use. Not every </span><code>scheduler</code><span> needs this parameter (hence it is optional). If it is not set and the chosen </span><code>scheduler</code><span> requires it, an error is raised.</span></li><li><code>num_training_steps</code><span>: an integer, the number of </span><code>training steps</code><span> to run. Not every </span><code>scheduler</code><span> needs this parameter (hence it is optional). If it is not set and the chosen </span><code>scheduler</code><span> requires it, an error is raised.</span></li></ul></li><li><p><code>transformers.get_constant_schedule(optimizer: Optimizer, last_epoch: int = -1)</code><span>: creates a scheduler with a constant learning rate.</span></p><p><span>Parameters:</span></p><ul><li><code>optimizer</code><span>: a </span><code>torch.optim.Optimizer</code><span> object, the optimizer.</span></li><li><code>last_epoch</code><span>: an integer, the index of the </span><code>last epoch</code><span>, used when resuming training.</span></li></ul></li><li><p><code>transformers.get_constant_schedule_with_warmup(optimizer: Optimizer, num_warmup_steps: int, last_epoch: int = -1)</code><span>: creates a scheduler with </span><code>warmup</code><span> followed by a constant learning rate.</span></p><p><span>Parameters:</span></p><ul><li><code>num_warmup_steps</code><span>: an integer, the number of steps in the </span><code>warmup</code><span> phase.</span></li><li><span>For the other parameters, see </span><code>get_constant_schedule()</code><span>.</span></li></ul></li><li><p><code>transformers.get_cosine_schedule_with_warmup(optimizer: Optimizer, num_warmup_steps: int, num_training_steps: int, num_cycles: float = 0.5, last_epoch: int = -1)</code><span>: creates a scheduler with </span><code>warmup</code><span> followed by cosine learning-rate decay.</span></p><p><span>Parameters:</span></p><ul><li><code>num_training_steps</code><span>: an integer, the total number of training steps.</span></li><li><code>num_cycles</code><span>: a float, the number of waves in the cosine schedule. The default, </span><code>0.5</code><span>, is half a cosine (a decrease from the maximum value to zero).</span></li><li><span>For the other parameters, see </span><code>get_constant_schedule_with_warmup()</code><span> 
.</span></li></ul></li><li><p><code>transformers.get_cosine_with_hard_restarts_schedule_with_warmup(optimizer: Optimizer, num_warmup_steps: int, num_training_steps: int, num_cycles: int = 1, last_epoch: int = -1)</code><span>: creates a scheduler with </span><code>warmup</code><span> followed by cosine learning-rate decay with several hard restarts.</span></p><p><span>Parameters:</span></p><ul><li><code>num_cycles</code><span>: an integer, the number of </span><code>hard restarts</code><span>.</span></li><li><span>For the other parameters, see </span><code>get_cosine_schedule_with_warmup()</code><span>.</span></li></ul></li><li><p><code>transformers.get_linear_schedule_with_warmup(optimizer, num_warmup_steps, num_training_steps, last_epoch = -1)</code><span>: creates a scheduler with </span><code>warmup</code><span> followed by linear decay of the learning rate.</span></p><p><span>Parameters: see </span><code>get_constant_schedule_with_warmup()</code><span>.</span></p></li><li><p><code>transformers.get_polynomial_decay_schedule_with_warmup(optimizer, num_warmup_steps, num_training_steps, lr_end = 1e-07, power = 1.0, last_epoch = -1)</code><span>: creates a scheduler with </span><code>warmup</code><span> followed by polynomial decay of the learning rate.</span></p><p><span>Parameters:</span></p><ul><li><code>lr_end</code><span>: a float, the learning rate at the end of the decay.</span></li><li><code>power</code><span>: a float, the power factor of the polynomial.</span></li><li><span>For the other parameters, see </span><code>get_constant_schedule_with_warmup()</code><span>.</span></li></ul></li><li><p><code>class transformers.WarmUp(TensorFlow)</code><span>: applies a </span><code>warmup</code><span> to a given </span><code>learning rate decay schedule</code><span>.</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div 
class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">class</span> <span class="cm-def">transformers</span>.<span class="cm-property">WarmUp</span>(</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">initial_learning_rate</span>: <span class="cm-builtin">float</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">decay_schedule_fn</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Callable</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">warmup_steps</span>: <span class="cm-builtin">int</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">power</span>: <span class="cm-builtin">float</span> <span class="cm-operator">=</span> <span 
class="cm-number">1.0</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">name</span>: <span class="cm-builtin">str</span> <span class="cm-operator">=</span> <span class="cm-keyword">None</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 154px;"/><div class="CodeMirror-gutters" style="display: none; height: 154px;"/></div></div></pre><p><span>Parameters:</span></p><ul><li><code>initial_learning_rate</code><span>: a float, the learning rate at the end of the </span><code>warmup</code><span>.</span></li><li><code>decay_schedule_fn</code><span>: a callable, the </span><code>schedule</code><span> function applied after the </span><code>warmup</code><span>.</span></li><li><code>warmup_steps</code><span>: an integer, the number of steps in the </span><code>warmup</code><span> phase.</span></li><li><code>power</code><span>: a float, the power factor of the polynomial </span><code>warmup</code><span> (the default gives a linear </span><code>warmup</code><span>).</span></li><li><code>name</code><span>: a string, the </span><code>name prefix</code><span> of the tensors returned during the </span><code>schedule</code><span>.</span></li></ul></li></ol><h3 id="53-gradient-strategies-api"><span>5.3 Gradient Strategies API</span></h3><ol start=""><li><p><code>class transformers.GradientAccumulator(TensorFlow)</code><span>: a </span><code>gradient accumulation</code><span> utility class.</span></p><p><span>When used for distributed training, the </span><code>accumulator</code><span> should be called in a replica context. Gradients are accumulated locally on each replica, without synchronization. The user should then call </span><code>.gradients</code><span>, </span><code>scale</code><span> the gradients if required, and pass the result to </span><code>apply_gradients</code><span>.</span></p><p><span>Methods:</span></p><ul><li><code>reset()</code><span>: resets the gradients accumulated on the current </span><code>replica</code><span>.</span>
</li></ul></li></ol><h2 id="六processors"><span>6. Processors</span></h2><ol start=""><li><p><span>Any multimodal model needs an object that encodes or decodes data grouping several modalities (among text, vision, and audio). This is handled by objects called </span><code>processor</code><span>; a </span><code>processor</code><span> groups together two or more </span><code>processing</code><span> objects, such as a </span><code>tokenizer</code><span> (for the text modality), </span><code>image processors</code><span> (for vision), and </span><code>feature extractors</code><span> (for audio).</span></p></li><li><p><code>class transformers.ProcessorMixin(*args, **kwargs)</code><span>: a </span><code>mixin</code><span> for every </span><code>processor</code><span>, providing saving and loading.</span></p><p><span>Methods:</span></p><ul><li><p><code>from_pretrained(pretrained_model_name_or_path, **kwargs)</code><span>: instantiates a </span><code>processor</code><span> from a pretrained model.</span></p><p><span>Parameters: see </span><code>PreTrainedTokenizerBase.from_pretrained()</code><span>.</span></p></li><li><p><code>push_to_hub()</code><span>: uploads the </span><code>processor</code><span> to the </span><code>Model Hub</code><span> (the remote </span><code>repo path</code><span> or </span><code>repo name</code><span> that corresponds to the local </span><code>repo clone</code><span>).</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" 
role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">push_to_hub</span>(<span class="cm-variable">repo_id</span>: <span class="cm-builtin">str</span>, <span class="cm-variable">use_temp_dir</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">bool</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>, <span class="cm-variable">commit_message</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>, <span class="cm-variable">private</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">bool</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>, <span class="cm-variable">use_auth_token</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Union</span>[<span class="cm-builtin">bool</span>, <span class="cm-builtin">str</span>, <span class="cm-variable">NoneType</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>, <span class="cm-variable">max_shard_size</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Union</span>[<span class="cm-builtin">int</span>, <span class="cm-builtin">str</span>, 
<span class="cm-variable">NoneType</span>] <span class="cm-operator">=</span> <span class="cm-string">'10GB'</span>, <span class="cm-variable">create_pr</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>, <span class="cm-operator">**</span><span class="cm-variable">deprecated_kwargs</span> )</span></pre></div></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 88px;"/><div class="CodeMirror-gutters" style="display: none; height: 88px;"/></div></div></pre><p><span>Parameters: see </span><code>PreTrainedTokenizerBase.push_to_hub()</code><span>.</span></p></li><li><p><code>register_for_auto_class(auto_class = 'AutoProcessor')</code><span>: registers this class with the given </span><code>auto class</code><span>.</span></p><p><span>Parameters: see </span><code>PreTrainedTokenizerBase.register_for_auto_class()</code><span>.</span></p></li><li><p><code>save_pretrained(save_directory: typing.Union[str, os.PathLike], push_to_hub: bool = False, **kwargs)</code><span>: saves the </span><code>processor</code><span>.</span></p><p><span>Parameters: see </span><code>PreTrainedTokenizerBase.save_pretrained()</code><span>.</span></p></li></ul></li></ol><h3 id="61-feature-extractor"><span>6.1 Feature Extractor</span></h3><ol start=""><li><p><span>A </span><code>feature extractor</code><span> is responsible for preparing input features for audio or vision models. This includes:</span></p><ul><li><span>Extracting features from sequences (for example, preprocessing audio files into </span><code>Log-Mel Spectrogram</code><span> features).</span></li><li><span>Extracting features from images (for example, cropping image files).</span></li><li><span>As well as </span><code>padding, normalization, conversion to Numpy/PyTorch/TensorFlow tensors</code><span>.</span></li></ul></li><li><p><code>class transformers.FeatureExtractionMixin(**kwargs)</code><span>: a </span><code>feature extraction mixin</code><span> that provides saving and loading for </span><code>sequential and image feature extractors</code><span>.</span>
</p><p><span>Methods:</span></p><ul><li><code>from_pretrained(pretrained_model_name_or_path, **kwargs)</code><span>: see </span><code>ProcessorMixin.from_pretrained()</code><span>.</span></li><li><code>save_pretrained(save_directory: typing.Union[str, os.PathLike], push_to_hub: bool = False, **kwargs)</code><span>: see </span><code>ProcessorMixin.save_pretrained()</code><span>.</span></li></ul></li><li><p><code>class transformers.SequenceFeatureExtractor</code><span>: a generic </span><code>feature extraction</code><span> class for speech recognition.</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" 
style="padding-right: 0.1px;"><span class="cm-keyword">class</span> <span class="cm-def">transformers</span>.<span class="cm-property">SequenceFeatureExtractor</span>(</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">feature_size</span>: <span class="cm-builtin">int</span>, <span class="cm-variable">sampling_rate</span>: <span class="cm-builtin">int</span>, <span class="cm-variable">padding_value</span>: <span class="cm-builtin">float</span>, <span class="cm-operator">**</span><span class="cm-variable">kwargs</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 66px;"/><div class="CodeMirror-gutters" style="display: none; height: 66px;"/></div></div></pre><p><span>Parameters:</span></p><ul><li><code>feature_size</code><span>: an integer, the feature dimension of the extracted features.</span></li><li><code>sampling_rate</code><span>: an integer, the sampling rate at which audio files should be digitized, expressed in hertz (</span><code>Hz</code><span>).</span></li><li><code>padding_value</code><span>: a float, the </span><code>padding value</code><span>.</span></li></ul><p><span>Methods:</span></p><ul><li><p><code>pad()</code><span>: pads </span><code>input values/input vectors</code><span> (or a </span><code>batch</code><span> of them) up to a predefined length or to the maximum sequence length in the </span><code>batch</code><span>.</span></p><p><span>The </span><code>padding side</code><span> (left/right) and </span><code>padding values</code><span> are defined at the </span><code>feature extractor level</code><span> (through </span><code>self.padding_side</code><span> and </span><code>self.padding_value</code><span>).</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; 
position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">pad</span>(</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">processed_features</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Union</span>[<span class="cm-variable">transformers</span>.<span class="cm-property">feature_extraction_utils</span>.<span class="cm-property">BatchFeature</span>, <span class="cm-variable">typing</span>.<span class="cm-property">List</span>[<span class="cm-variable">transformers</span>.<span class="cm-property">feature_extraction_utils</span>.<span class="cm-property">BatchFeature</span>], <span class="cm-variable">typing</span>.<span 
class="cm-property">Dict</span>[<span class="cm-builtin">str</span>, <span class="cm-variable">transformers</span>.<span class="cm-property">feature_extraction_utils</span>.<span class="cm-property">BatchFeature</span>], <span class="cm-variable">typing</span>.<span class="cm-property">Dict</span>[<span class="cm-builtin">str</span>, <span class="cm-variable">typing</span>.<span class="cm-property">List</span>[<span class="cm-variable">transformers</span>.<span class="cm-property">feature_extraction_utils</span>.<span class="cm-property">BatchFeature</span>]], <span class="cm-variable">typing</span>.<span class="cm-property">List</span>[<span class="cm-variable">typing</span>.<span class="cm-property">Dict</span>[<span class="cm-builtin">str</span>, <span class="cm-variable">transformers</span>.<span class="cm-property">feature_extraction_utils</span>.<span class="cm-property">BatchFeature</span>]]],</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">padding</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Union</span>[<span class="cm-builtin">bool</span>, <span class="cm-builtin">str</span>, <span class="cm-variable">transformers</span>.<span class="cm-property">utils</span>.<span class="cm-property">generic</span>.<span class="cm-property">PaddingStrategy</span>] <span class="cm-operator">=</span> <span class="cm-keyword">True</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">max_length</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">int</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span 
class="cm-variable">truncation</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">pad_to_multiple_of</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">int</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">return_attention_mask</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">bool</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">return_tensors</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Union</span>[<span class="cm-builtin">str</span>, <span class="cm-variable">transformers</span>.<span class="cm-property">utils</span>.<span class="cm-property">generic</span>.<span class="cm-property">TensorType</span>, <span class="cm-variable">NoneType</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 286px;"/><div class="CodeMirror-gutters" style="display: none; height: 286px;"/></div></div></pre><p><span>Parameters:</span></p><ul><li><p><code>processed_features</code><span>: the processed features, either a single input or a </span><code>batch</code><span> of inputs.</span>
</p></li><li><p><code>padding/max_length/truncation/pad_to_multiple_of/return_attention_mask</code><span>: see the </span><code>PreTrainedTokenizerBase.__call__()</code><span> method.</span></p></li><li><p><code>return_tensors</code><span>: a string or </span><code>TensorType</code><span>, the type of the returned data. If set, tensors are returned instead of </span><code>Python</code><span> lists of integers.</span></p><ul><li><code>'tf'</code><span>: returns </span><code>TensorFlow tf.constant</code><span> objects.</span></li><li><code>'pt'</code><span>: returns </span><code>PyTorch torch.Tensor</code><span> objects.</span></li><li><code>'np'</code><span>: returns </span><code>Numpy np.ndarray</code><span> objects.</span></li></ul></li></ul></li></ul></li><li><p><code>class transformers.BatchFeature</code><span>: holds the </span><code>output</code><span> of </span><code>pad()</code><span> and of the </span><code>__call__()</code><span> method of a </span><code>feature extractor</code><span>. It derives from the </span><code>Python</code><span> dictionary and can be used as one.</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div 
style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">class</span> <span class="cm-def">transformers</span>.<span class="cm-property">BatchFeature</span>(</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">data</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Union</span>[<span class="cm-variable">typing</span>.<span class="cm-property">Dict</span>[<span class="cm-builtin">str</span>, <span class="cm-variable">typing</span>.<span class="cm-property">Any</span>], <span class="cm-variable">NoneType</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">tensor_type</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Union</span>[<span class="cm-variable">NoneType</span>, <span class="cm-builtin">str</span>, <span class="cm-variable">transformers</span>.<span class="cm-property">utils</span>.<span class="cm-property">generic</span>.<span class="cm-property">TensorType</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span> </span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 88px;"/><div 
class="CodeMirror-gutters" style="display: none; height: 88px;"/></div></div></pre><p><span>参数:</span></p><ul><li><code>data</code><span>:一个字典,是由 </span><code>__call__()/pad()</code><span> 方法返回的值。</span></li><li><code>tensor_type</code><span>:一个字符串或 </span><code>TensorType</code><span>,指定张量类型。</span></li></ul><p><span>方法:</span></p><ul><li><p><code>convert_to_tensors( tensor_type: typing.Union[str, transformers.utils.generic.TensorType, NoneType] = None)</code><span>:将内部内容转换为指定的张量类型。</span></p><p><span>参数:</span><code>tensor_type</code><span>:一个字符串或 </span><code>TensorType</code><span>,指定张量类型。</span></p></li><li><p><code>to(device: typing.Union[str, ForwardRef('torch.device')]) -> BatchFeature</code><span>:将所有的值都移动到指定设备上(仅用于 </span><code>PyTorch</code><span> )。</span></p><p><span>参数:</span><code>device</code><span>:一个字符串或 </span><code>torch.device</code><span>,指定设备。</span></p></li></ul></li><li><p><code>class transformers.ImageFeatureExtractionMixin</code><span>:用于准备图片特征的 </span><code>mixin</code><span> 。</span></p><p><span>方法:</span></p><ul><li><p><code>center_crop(image, size ) -> new_image</code><span>:使用中心裁剪的方式将图像裁剪到指定的尺寸。注意,如果图像太小而无法裁剪到指定的尺寸,它将被填充(所以返回的结果具有指定的尺寸)。</span></p><p><span>参数:</span></p><ul><li><code>image</code><span>:一个 </span><code>PIL.Image.Image</code><span> 或 </span><code>np.ndarray</code><span> 或 </span><code>torch.Tensor</code><span> (形状为 </span><code>(n_channels, height, width) or (height, width, n_channels)</code><span> ),表示输入的图像。</span></li><li><code>size</code><span>:一个整数或 </span><code>Tuple[int, int]</code><span> 元组,指定目标尺寸。</span></li></ul><p><span>返回一个新的图像,类型和 </span><code>image</code><span> 相同。</span></p></li><li><p><code>convert_rgb(image) -> new_image</code><span>:将 </span><code>PIL.Image.Image</code><span> 转换为 </span><code>RGB</code><span> 格式。</span></p><p><span>参数:</span><code>image</code><span>:一个 </span><code>PIL.Image.Image</code><span>,指定被转换的图片。</span></p></li><li><p><code>expand_dims(image) -> 
new_image</code><span>: expands a two-dimensional image to three dimensions.</span></p><p><span>Parameter: </span><code>image</code><span>: a </span><code>PIL.Image.Image</code><span>, </span><code>np.ndarray</code><span>, or </span><code>torch.Tensor</code><span>, the input image.</span></p></li><li><p><code>flip_channel_order(image) -> new_image</code><span>: flips the channel order of </span><code>image</code><span> from </span><code>RGB</code><span> to </span><code>BGR</code><span>, or from </span><code>BGR</code><span> to </span><code>RGB</code><span>. Note that if </span><code>image</code><span> is a </span><code>PIL Image</code><span>, it is converted to a </span><code>numpy array</code><span>.</span></p><p><span>Parameter: </span><code>image</code><span>: a </span><code>PIL.Image.Image</code><span>, </span><code>np.ndarray</code><span>, or </span><code>torch.Tensor</code><span>, the input image.</span></p></li><li><p><code>normalize( image, mean, std, rescale = False ) -> new_image</code><span>: normalizes </span><code>image</code><span> with mean </span><code>mean</code><span> and standard deviation </span><code>std</code><span>. Note that if </span><code>image</code><span> is a </span><code>PIL Image</code><span>, it is converted to a </span><code>numpy array</code><span>.</span></p><p><span>Parameters:</span></p><ul><li><code>image</code><span>: a </span><code>PIL.Image.Image</code><span>, </span><code>np.ndarray</code><span>, or </span><code>torch.Tensor</code><span>, the input image.</span></li><li><code>mean</code><span>: a </span><code>List[float]</code><span>, </span><code>np.ndarray</code><span>, or </span><code>torch.Tensor</code><span> giving the per-channel mean.</span></li><li><code>std</code><span>: a </span><code>List[float]</code><span>, </span><code>np.ndarray</code><span>, or </span><code>torch.Tensor</code><span> giving the per-channel standard deviation.</span></li><li><code>rescale</code><span>: a boolean specifying whether to rescale </span><code>image</code><span> to the range </span><code>0.0 ~ 1.0</code><span>. If </span><code>image</code><span> is a </span><code>PIL Image</code><span>, rescaling is performed automatically.</span></li></ul></li><li><p><code>rescale(image: ndarray, 
scale: typing.Union[float, int] ) -> new_image</code><span>: rescales a </span><code>numpy image</code><span> by the factor </span><code>scale</code><span>.</span></p></li><li><p><code>resize(image, size, resample = None, default_to_square = True, max_size = None) -> new_image</code><span>: resizes the image. The </span><code>image</code><span> is always converted to a </span><code>PIL.Image</code><span>, and the result is a </span><code>PIL.Image</code><span>.</span></p><p><span>Parameters:</span></p><ul><li><p><code>image</code><span>: a </span><code>PIL.Image.Image</code><span>, </span><code>np.ndarray</code><span>, or </span><code>torch.Tensor</code><span>, the input image.</span></p></li><li><p><code>size</code><span>: an integer or a </span><code>Tuple[int, int]</code><span> specifying the target size.</span></p><ul><li><span>If </span><code>size</code><span> is a tuple, the output size matches it.</span></li><li><span>If </span><code>size</code><span> is an integer and </span><code>default_to_square = True</code><span>, the output size is </span><code>(size, size)</code><span>.</span></li><li><span>If </span><code>size</code><span> is an integer and </span><code>default_to_square = False</code><span>, the shorter edge of the image is matched to </span><code>size</code><span>. That is, if </span><code>height > width</code><span>, the image is resized to </span><code>(size * height / width, size)</code><span>.</span></li></ul></li><li><p><code>resample</code><span>: an integer specifying the </span><code>filter</code><span> used for </span><code>resampling</code><span>; defaults to </span><code>PILImageResampling.BILINEAR</code><span>.</span></p></li><li><p><code>default_to_square</code><span>: a boolean specifying whether to resize to a square when </span><code>size</code><span> is an integer.</span></p></li><li><p><code>max_size</code><span>: an integer specifying the maximum allowed length of the </span><code>longer edge</code><span> of the resized image. If the </span><code>longer edge</code><span> exceeds </span><code>max_size</code><span>, the image is resized again so that the </span><code>longer edge</code><span> equals </span><code>max_size</code><span>. Only used when </span><code>default_to_square = False</code><span>.</span></p></li></ul></li><li><p><code>rotate(image, angle, resample = None, expand = 0, center = None, translate = None, fillcolor = None ) -> 
new_image</code><span>: rotates the image and returns a </span><code>PIL.Image.Image</code><span>.</span></p></li><li><p><code>to_numpy_array(image, rescale = None, channel_first = True)</code><span>: converts the image to a </span><code>numpy array</code><span>.</span></p><p><span>Parameters:</span></p><ul><li><code>image</code><span>: a </span><code>PIL.Image.Image</code><span>, </span><code>np.ndarray</code><span>, or </span><code>torch.Tensor</code><span>, the input image.</span></li><li><code>rescale</code><span>: a boolean specifying whether to rescale </span><code>image</code><span> to the range </span><code>0.0 ~ 1.0</code><span>. Defaults to </span><code>True</code><span> if </span><code>image</code><span> is a </span><code>PIL Image</code><span> or an integer </span><code>array/tensor</code><span>.</span></li><li><code>channel_first</code><span>: a boolean specifying whether to put the </span><code>channel dimension first</code><span>.</span></li></ul></li><li><p><code>to_pil_image( image, rescale = None )</code><span>: converts the image to a </span><code>PIL Image</code><span>.</span></p><p><span>Parameters:</span></p><ul><li><code>image</code><span>: a </span><code>PIL.Image.Image</code><span>, </span><code>np.ndarray</code><span>, or </span><code>torch.Tensor</code><span>, the input image.</span></li><li><code>rescale</code><span>: a boolean specifying whether to rescale </span><code>image</code><span> to the range </span><code>0 ~ 255</code><span>. Defaults to </span><code>True</code><span> if </span><code>image</code><span> is a floating-point </span><code>array/tensor</code><span>.</span></li></ul></li></ul></li></ol><h3 id="62-image-processor"><span>6.2 Image Processor</span></h3><ol start=""><li><p><span>An </span><code>image processor</code><span> prepares input features for vision models and post-processes their outputs. This includes </span><code>transformations</code><span> (such as </span><code>resizing</code><span>, </span><code>normalization</code><span>, and conversion to </span><code>PyTorch/TensorFlow/Flax/Numpy</code><span> tensors). It may also include model-specific post-processing, such as converting </span><code>logits</code><span> to </span><code>segmentation masks</code><span>.</span></p></li><li><p><code>class transformers.ImageProcessingMixin(**kwargs)</code><span>: the </span><code>image processor mixin</code><span> 
.</span></p><p><span>Methods:</span></p><ul><li><code>from_pretrained(pretrained_model_name_or_path, **kwargs )</code><span>: see </span><code>ProcessorMixin.from_pretrained()</code><span>.</span></li><li><code>save_pretrained(save_directory: typing.Union[str, os.PathLike], push_to_hub: bool = False, **kwargs )</code><span>: see </span><code>ProcessorMixin.save_pretrained()</code><span>.</span></li></ul></li></ol><h2 id="七分享预训练的模型"><span>7. Sharing Pretrained Models</span></h2><ol start=""><li><p><span>There are three ways to create a new </span><code>model repository</code><span>: with the </span><code>push_to_hub API</code><span>, with the </span><code>huggingface_hub</code><span> </span><code>Python</code><span> library, or through the </span><code>web</code><span> interface.</span></p><p><span>Once the </span><code>repository</code><span> exists, you can upload files to it with </span><code>git</code><span> and </span><code>git-lfs</code><span>.</span></p></li></ol><h3 id="71-使用-pushtohub-api"><span>7.1 Using the push_to_hub API</span></h3><ol start=""><li><p><span>First, log in to </span><code>Hugging Face</code><span>.</span></p><p><span>In a </span><code>notebook</code><span>, you can log in with the following function:</span></p><pre class="md-fences md-end-block" spellcheck="false" lang="python">from huggingface_hub import notebook_login

notebook_login()</pre><p><span>In a terminal, you can run the command:</span></p><pre class="md-fences md-end-block" spellcheck="false" lang="shell">huggingface-cli login</pre><p><span>In both cases, you will be prompted for your credentials: a username and password in older versions of </span><code>huggingface_hub</code><span>, or an access token in more recent ones.</span></p></li><li><p><span>If you train a model with the </span><code>Trainer API</code><span>, the easiest way to upload it to the </span><code>Hub</code><span> is to set </span><code>push_to_hub=True</code><span> when you define your </span><code>TrainingArguments</code><span>:</span></p><pre class="md-fences md-end-block" spellcheck="false" lang="python">from transformers import TrainingArguments

training_args = TrainingArguments(
    "bert-finetuned-mrpc", save_strategy="epoch", push_to_hub=True
)</pre><p><span>When you call </span><code>trainer.train()</code><span>, the </span><code>Trainer</code><span> uploads the model to the </span><code>Hub</code><span> each time it is saved (here, every epoch), in a </span><code>repository</code><span> in your namespace. That </span><code>repository</code><span> is named after the output directory you chose (here </span><code>bert-finetuned-mrpc</code><span>), but you can choose a different name by setting </span><code>hub_model_id = "a_different_name"</code><span>.</span></p><p><span>To upload the model to an organization you belong to, pass </span><code>hub_model_id = "my_organization/my_repo_name"</code><span>.</span></p><p><span>Once training has finished, you should call </span><code>trainer.push_to_hub()</code><span> one last time to upload the latest version of the model. It also generates a model card with all the relevant metadata, reporting the hyperparameters used and the evaluation results. Here is an example of the content you might find in such a model card:</span></p><p><img src="../imgs/transformer_model_card.png" width="500px"/></p></li><li><p><span>At a </span><code>lower level</code><span>, the </span><code>Model Hub</code><span> can be accessed directly through the </span><code>push_to_hub()</code><span> method of models, </span><code>tokenizer</code><span>s, and configuration objects. This method takes care of creating the </span><code>repository</code><span> and pushing the model and </span><code>tokenizer</code><span> files straight to it. For example:</span></p><pre class="md-fences md-end-block" spellcheck="false" lang="python">from transformers import AutoModelForMaskedLM, AutoTokenizer

checkpoint = "camembert-base"

model = AutoModelForMaskedLM.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
...
model.push_to_hub("dummy-model")
tokenizer.push_to_hub("dummy-model")
# tokenizer.push_to_hub("dummy-model", organization="huggingface")</pre><p><span>If you belong to an organization, simply specify the </span><code>organization</code><span> argument.</span></p><p><span>If you want to use a specific </span><code>Hugging Face token</code><span>, you can pass it to the </span><code>push_to_hub()</code><span> method as well:</span></p><pre class="md-fences md-end-block" spellcheck="false" lang="python">tokenizer.push_to_hub("dummy-model", organization="huggingface", use_auth_token="&lt;TOKEN&gt;")</pre><p><span>Note that recent versions of </span><code>transformers</code><span> deprecate the </span><code>organization</code><span> argument in favor of putting the organization in the repository name (for example </span><code>"huggingface/dummy-model"</code><span>), and deprecate </span><code>use_auth_token</code><span> in favor of </span><code>token</code><span>.</span></p></li></ol><h3 id="72-使用-huggingfacehub-的-python-库"><span>7.2 Using the huggingface_hub Python Library</span></h3><ol start=""><li><p><span>As with the </span><code>push_to_hub API</code><span>, you first need your </span><code>API token</code><span> saved in the cache. To do so, run the command </span><code>huggingface-cli login</code><span> in a terminal.</span></p></li><li><p><span>The </span><code>huggingface_hub package</code><span> provides several useful methods and classes. First, there are a few methods for managing repository creation, deletion, and so on:</span></p><pre class="md-fences md-end-block" spellcheck="false" lang="python">from huggingface_hub import (
    # User management
    login,
    logout,
    whoami,

    # Repository creation and management
    create_repo,
    delete_repo,
    update_repo_visibility,

    # And some methods to retrieve/change information about the content
    list_models,
    list_datasets,
    list_metrics,
    list_repo_files,
    upload_file,
    delete_file,
)</pre><p><span>In addition, it provides the powerful </span><code>Repository</code><span> class for managing a local </span><code>repository</code><span>.</span></p><p><code>create_repo</code><span> creates a new </span><code>repository</code><span> on the </span><code>hub</code><span>:</span></p><pre class="md-fences md-end-block" spellcheck="false" lang="python">from huggingface_hub import create_repo

create_repo("dummy-model")
# create_repo("dummy-model", organization="huggingface")</pre><p><span>Once the </span><code>repository</code><span> has been created, we should add files to it.</span></p></li></ol><h3 id="73-使用-web-界面"><span>7.3 Using the Web Interface</span></h3><ol start=""><li><p><span>The </span><code>Web</code><span> interface provides tools for managing a </span><code>repo</code><span> directly on the </span><code>Hub</code><span>. With it you can easily create a </span><code>repo</code><span>, add files (even large ones), explore models, visualize diffs, and more.</span></p><p><span>To create a new </span><code>repo</code><span>, visit </span><code>https://huggingface.co/new</code><span>.</span></p></li></ol><h3 id="74-其它"><span>7.4 Miscellaneous</span></h3><ol start=""><li><p><span>Uploading model files: the file management system on the </span><code>Hugging Face Hub</code><span> is based on </span><code>git</code><span> for regular files and </span><code>git-lfs</code><span> for larger files. Files can be uploaded to the </span><code>Hub</code><span> either through </span><code>huggingface_hub</code><span> or with </span><code>git</code><span> commands.</span></p><ul><li><p><span>Through </span><code>huggingface_hub</code><span>: </span><code>upload_file</code><span> does not require </span><code>git</code><span> or </span><code>git-lfs</code><span> to be installed on your system. It pushes files directly to the </span><code>Hub</code><span> using </span><code>HTTP POST</code><span> requests. One limitation of this approach is that it cannot handle files larger than </span><code>5GB</code><span>.</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div 
class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">from</span> <span class="cm-variable">huggingface_hub</span> <span class="cm-keyword">import</span> <span class="cm-variable">upload_file</span></span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">upload_file</span>(</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-string">"<path_to_file>/config.json"</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">path_in_repo</span><span class="cm-operator">=</span><span class="cm-string">"config.json"</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">repo_id</span><span class="cm-operator">=</span><span class="cm-string">"<namespace>/dummy-model"</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; 
border-bottom-color: transparent; top: 154px;"/><div class="CodeMirror-gutters" style="display: none; height: 154px;"/></div></div></pre><p><span>This uploads the </span><code>config.json</code><span> located under </span><code><path_to_file></code><span> to </span><code>config.json</code><span> at the root of the </span><code>repository</code><span> specified by </span><code><namespace>/dummy-model</code><span>.</span></p><p><span>Other arguments that may be useful:</span></p><ul><li><code>token</code><span>, if you want to override the </span><code>token</code><span> stored in the cache with a given </span><code>token</code><span>.</span></li><li><code>repo_type</code><span>, if you want to upload to a </span><code>dataset</code><span> or a </span><code>space</code><span> instead of a model. Accepted values are </span><code>"dataset"</code><span> and </span><code>"space"</code><span>.</span></li></ul></li><li><p><code>Repository</code><span> class: manages a local </span><code>repository</code><span> in a </span><code>git</code><span>-like manner. It abstracts away most of the pain points one may encounter with </span><code>git</code><span> to provide all the features we need. 
Using this class requires </span><code>git</code><span> and </span><code>git-lfs</code><span> to be installed, so make sure </span><code>git-lfs</code><span> is set up before you begin.</span></p><p><span>We can start by cloning a remote </span><code>repo</code><span> to initialize it in a local folder:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" 
role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">from</span> <span class="cm-variable">huggingface_hub</span> <span class="cm-keyword">import</span> <span class="cm-variable">Repository</span></span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">repo</span> <span class="cm-operator">=</span> <span class="cm-variable">Repository</span>(<span class="cm-string">"<path_to_dummy_folder>"</span>, <span class="cm-variable">clone_from</span><span class="cm-operator">=</span><span class="cm-string">"<namespace>/dummy-model"</span>)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 66px;"/><div class="CodeMirror-gutters" style="display: none; height: 66px;"/></div></div></pre><p><span>This creates the folder </span><code><path_to_dummy_folder></code><span> in the current directory. 
Next, we can run some of the traditional </span><code>git</code><span> methods:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: 
hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">repo</span>.<span class="cm-property">git_pull</span>()</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">repo</span>.<span class="cm-property">git_add</span>()</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">repo</span>.<span class="cm-property">git_commit</span>()</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">repo</span>.<span class="cm-property">git_push</span>()</span></pre><pre class=" 
CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">repo</span>.<span class="cm-property">git_tag</span>()</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 110px;"/><div class="CodeMirror-gutters" style="display: none; height: 110px;"/></div></div></pre></li><li><p><code>git-based</code><span> approach: this is a very bare-bones way to upload files, using </span><code>git</code><span> and </span><code>git-lfs</code><span> directly.</span></p><p><span>Start by initializing </span><code>git-lfs</code><span>:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="shell"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="shell"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" 
style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-builtin">git</span> lfs install</span></pre></div></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 22px;"/><div class="CodeMirror-gutters" style="display: none; height: 22px;"/></div></div></pre><p><span>Once that is done, the first step is to clone your model </span><code>repository</code><span>:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="shell"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="shell"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" 
style="padding-right: 0.1px;"><span class="cm-builtin">git</span> clone https://huggingface.co/<namespace>/<your-model-id></span></pre></div></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 22px;"/><div class="CodeMirror-gutters" style="display: none; height: 22px;"/></div></div></pre><p><span>Next, run the </span><code>Python</code><span> code that saves the model or </span><code>tokenizer</code><span>, then run </span><code>git add, git commit, git push</code><span> to upload the model files.</span></p></li></ul></li><li><p><code>Model Card</code><span>: the model card is a documentation file that is arguably as important as the model and </span><code>tokenizer</code><span> files. It holds the central definition of the model, ensures that community members can reproduce its results, and provides a platform on which other members can build their own artifacts.</span></p><p><span>Documenting the training and evaluation process helps others understand what to expect of the model, and providing sufficient information about the data used and the preprocessing/postprocessing ensures that the model's limitations, </span><code>bias</code><span>, and </span><code>context</code><span> can be identified and understood.</span></p><p><span>A model card is created through the </span><code>README.md</code><span> file. 
It usually begins with a very brief overview of what the model is for, followed by the other information a model card needs: model description, intended uses and limitations, how to use, limitations and </span><code>bias</code><span>, training data, training procedure, and evaluation results.</span></p></li></ol><h2 id="八示例"><span>8. Examples</span></h2><ol start=""><li><p><span>Fine-tune a model with the </span><code>Trainer API</code><span>:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python" style="break-inside: unset;"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; 
padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-comment">### Load the dataset ###</span></span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">from</span> <span class="cm-variable">datasets</span> <span class="cm-keyword">import</span> <span class="cm-variable">load_dataset</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">from</span> <span class="cm-variable">transformers</span> <span class="cm-keyword">import</span> <span class="cm-variable">AutoTokenizer</span>, <span class="cm-variable">DataCollatorWithPadding</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">raw_datasets</span> <span class="cm-operator">=</span> <span class="cm-variable">load_dataset</span>(<span class="cm-string">"glue"</span>, <span class="cm-string">"mrpc"</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" 
style="padding-right: 0.1px;"><span class="cm-variable">checkpoint</span> <span class="cm-operator">=</span> <span class="cm-string">"bert-base-uncased"</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">tokenizer</span> <span class="cm-operator">=</span> <span class="cm-variable">AutoTokenizer</span>.<span class="cm-property">from_pretrained</span>(<span class="cm-variable">checkpoint</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">def</span> <span class="cm-def">tokenize_function</span>(<span class="cm-variable">example</span>):</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-keyword">return</span> <span class="cm-variable">tokenizer</span>(<span class="cm-variable">example</span>[<span class="cm-string">"sentence1"</span>], <span class="cm-variable">example</span>[<span class="cm-string">"sentence2"</span>], <span class="cm-variable">truncation</span><span class="cm-operator">=</span><span class="cm-keyword">True</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">tokenized_datasets</span> <span class="cm-operator">=</span> <span class="cm-variable">raw_datasets</span>.<span class="cm-property">map</span>(<span class="cm-variable">tokenize_function</span>, <span class="cm-variable">batched</span><span class="cm-operator">=</span><span 
class="cm-keyword">True</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">data_collator</span> <span class="cm-operator">=</span> <span class="cm-variable">DataCollatorWithPadding</span>(<span class="cm-variable">tokenizer</span><span class="cm-operator">=</span><span class="cm-variable">tokenizer</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-comment">### Define TrainingArguments ### </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">from</span> <span class="cm-variable">transformers</span> <span class="cm-keyword">import</span> <span class="cm-variable">TrainingArguments</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">training_args</span> <span class="cm-operator">=</span> <span class="cm-variable">TrainingArguments</span>(<span class="cm-string">"test-trainer"</span>, <span class="cm-variable">evaluation_strategy</span><span class="cm-operator">=</span><span class="cm-string">"epoch"</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-comment">### Define the model ###</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">from</span> <span class="cm-variable">transformers</span> <span 
class="cm-keyword">import</span> <span class="cm-variable">AutoModelForSequenceClassification</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">model</span> <span class="cm-operator">=</span> <span class="cm-variable">AutoModelForSequenceClassification</span>.<span class="cm-property">from_pretrained</span>(<span class="cm-variable">checkpoint</span>, <span class="cm-variable">num_labels</span><span class="cm-operator">=</span><span class="cm-number">2</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-comment">### Define the Trainer ###</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">from</span> <span class="cm-variable">transformers</span> <span class="cm-keyword">import</span> <span class="cm-variable">Trainer</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">import</span> <span class="cm-variable">evaluate</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">import</span> <span class="cm-variable">numpy</span> <span class="cm-keyword">as</span> <span class="cm-variable">np</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">def</span> <span class="cm-def">compute_metrics</span>(<span class="cm-variable">eval_preds</span>): <span class="cm-comment"># evaluation function</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">metric</span> <span 
class="cm-operator">=</span> <span class="cm-variable">evaluate</span>.<span class="cm-property">load</span>(<span class="cm-string">"glue"</span>, <span class="cm-string">"mrpc"</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">logits</span>, <span class="cm-variable">labels</span> <span class="cm-operator">=</span> <span class="cm-variable">eval_preds</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">predictions</span> <span class="cm-operator">=</span> <span class="cm-variable">np</span>.<span class="cm-property">argmax</span>(<span class="cm-variable">logits</span>, <span class="cm-variable">axis</span><span class="cm-operator">=-</span><span class="cm-number">1</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-keyword">return</span> <span class="cm-variable">metric</span>.<span class="cm-property">compute</span>(<span class="cm-variable">predictions</span><span class="cm-operator">=</span><span class="cm-variable">predictions</span>, <span class="cm-variable">references</span><span class="cm-operator">=</span><span class="cm-variable">labels</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> </span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">trainer</span> <span class="cm-operator">=</span> <span class="cm-variable">Trainer</span>(</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">model</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" 
style="padding-right: 0.1px;"> <span class="cm-variable">training_args</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">train_dataset</span><span class="cm-operator">=</span><span class="cm-variable">tokenized_datasets</span>[<span class="cm-string">"train"</span>],</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">eval_dataset</span><span class="cm-operator">=</span><span class="cm-variable">tokenized_datasets</span>[<span class="cm-string">"validation"</span>],</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">data_collator</span><span class="cm-operator">=</span><span class="cm-variable">data_collator</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">tokenizer</span><span class="cm-operator">=</span><span class="cm-variable">tokenizer</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">compute_metrics</span><span class="cm-operator">=</span><span class="cm-variable">compute_metrics</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-comment">### Train ###</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span 
class="cm-variable">trainer</span>.<span class="cm-property">train</span>()</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 968px;"/><div class="CodeMirror-gutters" style="display: none; height: 968px;"/></div></div></pre></li><li><p><span>Train without the </span><code>Trainer API</code><span>, using a fully hand-written training loop:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python" style="break-inside: unset;"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-comment">### Load the dataset 
###</span></span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">from</span> <span class="cm-variable">datasets</span> <span class="cm-keyword">import</span> <span class="cm-variable">load_dataset</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">from</span> <span class="cm-variable">transformers</span> <span class="cm-keyword">import</span> <span class="cm-variable">AutoTokenizer</span>, <span class="cm-variable">DataCollatorWithPadding</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">raw_datasets</span> <span class="cm-operator">=</span> <span class="cm-variable">load_dataset</span>(<span class="cm-string">"glue"</span>, <span class="cm-string">"mrpc"</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">checkpoint</span> <span class="cm-operator">=</span> <span class="cm-string">"bert-base-uncased"</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">tokenizer</span> <span class="cm-operator">=</span> <span class="cm-variable">AutoTokenizer</span>.<span class="cm-property">from_pretrained</span>(<span class="cm-variable">checkpoint</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span 
class="cm-keyword">def</span> <span class="cm-def">tokenize_function</span>(<span class="cm-variable">example</span>):</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-keyword">return</span> <span class="cm-variable">tokenizer</span>(<span class="cm-variable">example</span>[<span class="cm-string">"sentence1"</span>], <span class="cm-variable">example</span>[<span class="cm-string">"sentence2"</span>], <span class="cm-variable">truncation</span><span class="cm-operator">=</span><span class="cm-keyword">True</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">tokenized_datasets</span> <span class="cm-operator">=</span> <span class="cm-variable">raw_datasets</span>.<span class="cm-property">map</span>(<span class="cm-variable">tokenize_function</span>, <span class="cm-variable">batched</span><span class="cm-operator">=</span><span class="cm-keyword">True</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">data_collator</span> <span class="cm-operator">=</span> <span class="cm-variable">DataCollatorWithPadding</span>(<span class="cm-variable">tokenizer</span><span class="cm-operator">=</span><span class="cm-variable">tokenizer</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">tokenized_datasets</span> <span class="cm-operator">=</span> <span 
class="cm-variable">tokenized_datasets</span>.<span class="cm-property">remove_columns</span>([<span class="cm-string">"sentence1"</span>, <span class="cm-string">"sentence2"</span>, <span class="cm-string">"idx"</span>]) <span class="cm-comment"># Remove unnecessary columns</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">tokenized_datasets</span> <span class="cm-operator">=</span> <span class="cm-variable">tokenized_datasets</span>.<span class="cm-property">rename_column</span>(<span class="cm-string">"label"</span>, <span class="cm-string">"labels"</span>) <span class="cm-comment"># Rename the column</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">tokenized_datasets</span>.<span class="cm-property">set_format</span>(<span class="cm-string">"torch"</span>) <span class="cm-comment"># Set the dataset format</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">tokenized_datasets</span>[<span class="cm-string">"train"</span>].<span class="cm-property">column_names</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-comment"># ["attention_mask", "input_ids", "labels", "token_type_ids"]</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">from</span> <span class="cm-variable">torch</span>.<span class="cm-property">utils</span>.<span class="cm-property">data</span> <span class="cm-keyword">import</span> <span class="cm-variable">DataLoader</span></span></pre><pre class=" 
CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">train_dataloader</span> <span class="cm-operator">=</span> <span class="cm-variable">DataLoader</span>(</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">tokenized_datasets</span>[<span class="cm-string">"train"</span>], <span class="cm-variable">shuffle</span><span class="cm-operator">=</span><span class="cm-keyword">True</span>, <span class="cm-variable">batch_size</span><span class="cm-operator">=</span><span class="cm-number">8</span>, <span class="cm-variable">collate_fn</span><span class="cm-operator">=</span><span class="cm-variable">data_collator</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">eval_dataloader</span> <span class="cm-operator">=</span> <span class="cm-variable">DataLoader</span>(</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">tokenized_datasets</span>[<span class="cm-string">"validation"</span>], <span class="cm-variable">batch_size</span><span class="cm-operator">=</span><span class="cm-number">8</span>, <span class="cm-variable">collate_fn</span><span class="cm-operator">=</span><span class="cm-variable">data_collator</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 
0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-comment">### Define the model ###</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">from</span> <span class="cm-variable">transformers</span> <span class="cm-keyword">import</span> <span class="cm-variable">AutoModelForSequenceClassification</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">model</span> <span class="cm-operator">=</span> <span class="cm-variable">AutoModelForSequenceClassification</span>.<span class="cm-property">from_pretrained</span>(<span class="cm-variable">checkpoint</span>, <span class="cm-variable">num_labels</span><span class="cm-operator">=</span><span class="cm-number">2</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-comment">### Define the optimizer and scheduler ###</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">from</span> <span class="cm-variable">transformers</span> <span class="cm-keyword">import</span> <span class="cm-variable">AdamW</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">from</span> <span class="cm-variable">transformers</span> <span class="cm-keyword">import</span> <span 
class="cm-variable">get_scheduler</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">optimizer</span> <span class="cm-operator">=</span> <span class="cm-variable">AdamW</span>(<span class="cm-variable">model</span>.<span class="cm-property">parameters</span>(), <span class="cm-variable">lr</span><span class="cm-operator">=</span><span class="cm-number">5e-5</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">num_epochs</span> <span class="cm-operator">=</span> <span class="cm-number">3</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">num_training_steps</span> <span class="cm-operator">=</span> <span class="cm-variable">num_epochs</span> <span class="cm-operator">*</span> <span class="cm-builtin">len</span>(<span class="cm-variable">train_dataloader</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">lr_scheduler</span> <span class="cm-operator">=</span> <span class="cm-variable">get_scheduler</span>(</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-string">"linear"</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">optimizer</span><span class="cm-operator">=</span><span class="cm-variable">optimizer</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> 
<span class="cm-variable">num_warmup_steps</span><span class="cm-operator">=</span><span class="cm-number">0</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">num_training_steps</span><span class="cm-operator">=</span><span class="cm-variable">num_training_steps</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-comment">### Define the training loop ###</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">import</span> <span class="cm-variable">torch</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">device</span> <span class="cm-operator">=</span> <span class="cm-variable">torch</span>.<span class="cm-property">device</span>(<span class="cm-string">"cuda"</span>) <span class="cm-keyword">if</span> <span class="cm-variable">torch</span>.<span class="cm-property">cuda</span>.<span class="cm-property">is_available</span>() <span class="cm-keyword">else</span> <span class="cm-variable">torch</span>.<span class="cm-property">device</span>(<span class="cm-string">"cpu"</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">model</span>.<span 
class="cm-property">to</span>(<span class="cm-variable">device</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">from</span> <span class="cm-variable">tqdm</span>.<span class="cm-property">auto</span> <span class="cm-keyword">import</span> <span class="cm-variable">tqdm</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">progress_bar</span> <span class="cm-operator">=</span> <span class="cm-variable">tqdm</span>(<span class="cm-builtin">range</span>(<span class="cm-variable">num_training_steps</span>))</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">model</span>.<span class="cm-property">train</span>()</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">for</span> <span class="cm-variable">epoch</span> <span class="cm-keyword">in</span> <span class="cm-builtin">range</span>(<span class="cm-variable">num_epochs</span>):</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-keyword">for</span> <span class="cm-variable">batch</span> <span class="cm-keyword">in</span> <span class="cm-variable">train_dataloader</span>:</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">batch</span> <span 
class="cm-operator">=</span> {<span class="cm-variable">k</span>: <span class="cm-variable">v</span>.<span class="cm-property">to</span>(<span class="cm-variable">device</span>) <span class="cm-keyword">for</span> <span class="cm-variable">k</span>, <span class="cm-variable">v</span> <span class="cm-keyword">in</span> <span class="cm-variable">batch</span>.<span class="cm-property">items</span>()}</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">outputs</span> <span class="cm-operator">=</span> <span class="cm-variable">model</span>(<span class="cm-operator">**</span><span class="cm-variable">batch</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">loss</span> <span class="cm-operator">=</span> <span class="cm-variable">outputs</span>.<span class="cm-property">loss</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">loss</span>.<span class="cm-property">backward</span>()</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">optimizer</span>.<span class="cm-property">step</span>()</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">lr_scheduler</span>.<span class="cm-property">step</span>()</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">optimizer</span>.<span class="cm-property">zero_grad</span>()</span></pre><pre class=" CodeMirror-line 
" role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">progress_bar</span>.<span class="cm-property">update</span>(<span class="cm-number">1</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> </span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-comment">### Evaluation ###</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">import</span> <span class="cm-variable">evaluate</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">metric</span> <span class="cm-operator">=</span> <span class="cm-variable">evaluate</span>.<span class="cm-property">load</span>(<span class="cm-string">"glue"</span>, <span class="cm-string">"mrpc"</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">model</span>.<span class="cm-property">eval</span>()</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">for</span> <span class="cm-variable">batch</span> <span class="cm-keyword">in</span> <span class="cm-variable">eval_dataloader</span>:</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">batch</span> <span class="cm-operator">=</span> {<span class="cm-variable">k</span>: <span class="cm-variable">v</span>.<span class="cm-property">to</span>(<span 
class="cm-variable">device</span>) <span class="cm-keyword">for</span> <span class="cm-variable">k</span>, <span class="cm-variable">v</span> <span class="cm-keyword">in</span> <span class="cm-variable">batch</span>.<span class="cm-property">items</span>()}</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-keyword">with</span> <span class="cm-variable">torch</span>.<span class="cm-property">no_grad</span>():</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">outputs</span> <span class="cm-operator">=</span> <span class="cm-variable">model</span>(<span class="cm-operator">**</span><span class="cm-variable">batch</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">logits</span> <span class="cm-operator">=</span> <span class="cm-variable">outputs</span>.<span class="cm-property">logits</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">predictions</span> <span class="cm-operator">=</span> <span class="cm-variable">torch</span>.<span class="cm-property">argmax</span>(<span class="cm-variable">logits</span>, <span class="cm-variable">dim</span><span class="cm-operator">=-</span><span class="cm-number">1</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">metric</span>.<span class="cm-property">add_batch</span>(<span class="cm-variable">predictions</span><span class="cm-operator">=</span><span class="cm-variable">predictions</span>, <span 
class="cm-variable">references</span><span class="cm-operator">=</span><span class="cm-variable">batch</span>[<span class="cm-string">"labels"</span>])</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">metric</span>.<span class="cm-property">compute</span>()</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 1870px;"/><div class="CodeMirror-gutters" style="display: none; height: 1870px;"/></div></div></pre></li><li><p><span>Speeding up training with </span><code>Accelerate</code><span>: with the </span><code>Accelerate</code><span> library, only a few adjustments are needed to enable distributed training on multiple </span><code>GPU</code><span>s or </span><code>TPU</code><span>s. The changed lines are shown in the following diff:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="diff" style="break-inside: unset;"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="diff"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div 
class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-positive">+ from accelerate import Accelerator</span></span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> from transformers import AdamW, AutoModelForSequenceClassification, get_scheduler</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-positive">+ accelerator = Accelerator()</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> optimizer = AdamW(model.parameters(), lr=3e-5)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-negative">- 
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-negative">- model.to(device)</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-positive">+ train_dataloader, eval_dataloader, model, optimizer = accelerator.prepare(</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-positive">+ train_dataloader, eval_dataloader, model, optimizer</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-positive">+ )</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp="">