Full output of a fast run:

-- RUNPOD.IO --
Enjoy your Pod #eg6eikcnopftyk ^_^

root@8510995a57b3:/workspace/axolotl#
root@8510995a57b3:/workspace/axolotl# nvidia-smi
Fri Dec 29 02:01:48 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
| 30%   27C    P8    25W / 350W |      1MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  On   | 00000000:23:00.0 Off |                  N/A |
| 30%   27C    P8    21W / 350W |      1MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  On   | 00000000:41:00.0 Off |                  N/A |
| 30%   28C    P8    19W / 350W |      1MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce ...  On   | 00000000:61:00.0 Off |                  N/A |
| 30%   27C    P8    27W / 350W |      1MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   4  NVIDIA GeForce ...  On   | 00000000:81:00.0 Off |                  N/A |
| 30%   27C    P8    20W / 350W |      1MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   5  NVIDIA GeForce ...  On   | 00000000:A1:00.0 Off |                  N/A |
| 30%   29C    P8    25W / 350W |      1MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   6  NVIDIA GeForce ...  On   | 00000000:C1:00.0 Off |                  N/A |
| 30%   27C    P8    23W / 350W |      1MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   7  NVIDIA GeForce ...  On   | 00000000:E1:00.0 Off |                  N/A |
| 30%   27C    P8    24W / 350W |      1MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
root@8510995a57b3:/workspace/axolotl# accelerate launch -m axolotl.cli.train examples/openllama-3b/lora.yml
The following values were not passed to `accelerate launch` and had defaults used instead:
	`--num_processes` was set to a value of `8`
		More than one GPU was found, enabling multi-GPU training.
		If this was unintended please pass in `--num_processes=1`.
	`--num_machines` was set to a value of `1`
	`--mixed_precision` was set to a value of `'no'`
	`--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:106: UserWarning:

================================================================================
WARNING: Manual override via BNB_CUDA_VERSION env variable detected!
BNB_CUDA_VERSION=XXX can be used to load a bitsandbytes version that is different from the PyTorch CUDA version.
If this was unintended set the BNB_CUDA_VERSION variable to an empty string: export BNB_CUDA_VERSION=
If you use the manual override make sure the right libcudart.so is in your LD_LIBRARY_PATH
For example by adding the following to your .bashrc: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path_to_cuda_dir/lib64
Loading CUDA version: BNB_CUDA_VERSION=118
================================================================================

  warn((f'\n\n{"="*80}\n'
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:106: UserWarning:

================================================================================
WARNING: Manual override via BNB_CUDA_VERSION env variable detected!
BNB_CUDA_VERSION=XXX can be used to load a bitsandbytes version that is different from the PyTorch CUDA version.
If this was unintended set the BNB_CUDA_VERSION variable to an empty string: export BNB_CUDA_VERSION=
If you use the manual override make sure the right libcudart.so is in your LD_LIBRARY_PATH
For example by adding the following to your .bashrc: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path_to_cuda_dir/lib64
Loading CUDA version: BNB_CUDA_VERSION=118
================================================================================

  warn((f'\n\n{"="*80}\n'
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:106: UserWarning:

================================================================================
WARNING: Manual override via BNB_CUDA_VERSION env variable detected!
BNB_CUDA_VERSION=XXX can be used to load a bitsandbytes version that is different from the PyTorch CUDA version.
If this was unintended set the BNB_CUDA_VERSION variable to an empty string: export BNB_CUDA_VERSION=
If you use the manual override make sure the right libcudart.so is in your LD_LIBRARY_PATH
For example by adding the following to your .bashrc: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path_to_cuda_dir/lib64
Loading CUDA version: BNB_CUDA_VERSION=118
================================================================================

  warn((f'\n\n{"="*80}\n'
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:106: UserWarning:

================================================================================
WARNING: Manual override via BNB_CUDA_VERSION env variable detected!
BNB_CUDA_VERSION=XXX can be used to load a bitsandbytes version that is different from the PyTorch CUDA version.
If this was unintended set the BNB_CUDA_VERSION variable to an empty string: export BNB_CUDA_VERSION=
If you use the manual override make sure the right libcudart.so is in your LD_LIBRARY_PATH
For example by adding the following to your .bashrc: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path_to_cuda_dir/lib64
Loading CUDA version: BNB_CUDA_VERSION=118
================================================================================

  warn((f'\n\n{"="*80}\n'
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:106: UserWarning:

================================================================================
WARNING: Manual override via BNB_CUDA_VERSION env variable detected!
BNB_CUDA_VERSION=XXX can be used to load a bitsandbytes version that is different from the PyTorch CUDA version.
If this was unintended set the BNB_CUDA_VERSION variable to an empty string: export BNB_CUDA_VERSION=
If you use the manual override make sure the right libcudart.so is in your LD_LIBRARY_PATH
For example by adding the following to your .bashrc: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path_to_cuda_dir/lib64
Loading CUDA version: BNB_CUDA_VERSION=118
================================================================================

  warn((f'\n\n{"="*80}\n'
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:106: UserWarning:

================================================================================
WARNING: Manual override via BNB_CUDA_VERSION env variable detected!
BNB_CUDA_VERSION=XXX can be used to load a bitsandbytes version that is different from the PyTorch CUDA version.
If this was unintended set the BNB_CUDA_VERSION variable to an empty string: export BNB_CUDA_VERSION=
If you use the manual override make sure the right libcudart.so is in your LD_LIBRARY_PATH
For example by adding the following to your .bashrc: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path_to_cuda_dir/lib64
Loading CUDA version: BNB_CUDA_VERSION=118
================================================================================

  warn((f'\n\n{"="*80}\n'
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:106: UserWarning:

================================================================================
WARNING: Manual override via BNB_CUDA_VERSION env variable detected!
BNB_CUDA_VERSION=XXX can be used to load a bitsandbytes version that is different from the PyTorch CUDA version.
If this was unintended set the BNB_CUDA_VERSION variable to an empty string: export BNB_CUDA_VERSION=
If you use the manual override make sure the right libcudart.so is in your LD_LIBRARY_PATH
For example by adding the following to your .bashrc: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path_to_cuda_dir/lib64
Loading CUDA version: BNB_CUDA_VERSION=118
================================================================================

  warn((f'\n\n{"="*80}\n'
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:106: UserWarning:

================================================================================
WARNING: Manual override via BNB_CUDA_VERSION env variable detected!
BNB_CUDA_VERSION=XXX can be used to load a bitsandbytes version that is different from the PyTorch CUDA version.
If this was unintended set the BNB_CUDA_VERSION variable to an empty string: export BNB_CUDA_VERSION=
If you use the manual override make sure the right libcudart.so is in your LD_LIBRARY_PATH
For example by adding the following to your .bashrc: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path_to_cuda_dir/lib64
Loading CUDA version: BNB_CUDA_VERSION=118
================================================================================

  warn((f'\n\n{"="*80}\n'
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
[2023-12-29 02:02:14,078] [INFO] [datasets.<module>:58] [PID:162] PyTorch version 2.0.1+cu118 available.
[2023-12-29 02:02:14,080] [INFO] [datasets.<module>:58] [PID:168] PyTorch version 2.0.1+cu118 available.
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
[2023-12-29 02:02:14,110] [INFO] [datasets.<module>:58] [PID:164] PyTorch version 2.0.1+cu118 available.
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
[2023-12-29 02:02:14,159] [INFO] [datasets.<module>:58] [PID:166] PyTorch version 2.0.1+cu118 available.
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
[2023-12-29 02:02:14,168] [INFO] [datasets.<module>:58] [PID:163] PyTorch version 2.0.1+cu118 available.
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
[2023-12-29 02:02:14,188] [INFO] [datasets.<module>:58] [PID:161] PyTorch version 2.0.1+cu118 available.
[2023-12-29 02:02:14,209] [INFO] [datasets.<module>:58] [PID:165] PyTorch version 2.0.1+cu118 available.
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
[2023-12-29 02:02:14,278] [INFO] [datasets.<module>:58] [PID:167] PyTorch version 2.0.1+cu118 available.
[2023-12-29 02:02:15,083] [INFO] [axolotl.validate_config:156] [PID:162] [RANK:1] bf16 support detected, but not enabled for this configuration.
[2023-12-29 02:02:15,083] [WARNING] [axolotl.validate_config:176] [PID:162] [RANK:1] `pad_to_sequence_len: true` is recommended when using sample_packing
[2023-12-29 02:02:15,090] [INFO] [axolotl.validate_config:156] [PID:168] [RANK:7] bf16 support detected, but not enabled for this configuration.
[2023-12-29 02:02:15,091] [WARNING] [axolotl.validate_config:176] [PID:168] [RANK:7] `pad_to_sequence_len: true` is recommended when using sample_packing
[2023-12-29 02:02:15,092] [INFO] [axolotl.validate_config:156] [PID:164] [RANK:3] bf16 support detected, but not enabled for this configuration.
[2023-12-29 02:02:15,092] [WARNING] [axolotl.validate_config:176] [PID:164] [RANK:3] `pad_to_sequence_len: true` is recommended when using sample_packing
[2023-12-29 02:02:15,150] [INFO] [axolotl.validate_config:156] [PID:166] [RANK:5] bf16 support detected, but not enabled for this configuration.
[2023-12-29 02:02:15,150] [WARNING] [axolotl.validate_config:176] [PID:166] [RANK:5] `pad_to_sequence_len: true` is recommended when using sample_packing
[2023-12-29 02:02:15,166] [INFO] [axolotl.validate_config:156] [PID:161] [RANK:0] bf16 support detected, but not enabled for this configuration.
[2023-12-29 02:02:15,166] [WARNING] [axolotl.validate_config:176] [PID:161] [RANK:0] `pad_to_sequence_len: true` is recommended when using sample_packing
[2023-12-29 02:02:15,196] [INFO] [axolotl.validate_config:156] [PID:163] [RANK:2] bf16 support detected, but not enabled for this configuration.
[2023-12-29 02:02:15,196] [WARNING] [axolotl.validate_config:176] [PID:163] [RANK:2] `pad_to_sequence_len: true` is recommended when using sample_packing
[2023-12-29 02:02:15,266] [INFO] [axolotl.validate_config:156] [PID:167] [RANK:6] bf16 support detected, but not enabled for this configuration.
[2023-12-29 02:02:15,266] [WARNING] [axolotl.validate_config:176] [PID:167] [RANK:6] `pad_to_sequence_len: true` is recommended when using sample_packing
[2023-12-29 02:02:15,320] [INFO] [axolotl.validate_config:156] [PID:165] [RANK:4] bf16 support detected, but not enabled for this configuration.
[2023-12-29 02:02:15,320] [WARNING] [axolotl.validate_config:176] [PID:165] [RANK:4] `pad_to_sequence_len: true` is recommended when using sample_packing
config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 506/506 [00:00<00:00, 78.3kB/s]
[2023-12-29 02:02:15,389] [INFO] [axolotl.normalize_config:150] [PID:162] [RANK:1] GPU memory usage baseline: 0.000GB (+0.312GB misc)
[2023-12-29 02:02:15,402] [INFO] [axolotl.normalize_config:150] [PID:168] [RANK:7] GPU memory usage baseline: 0.000GB (+0.312GB misc)
[2023-12-29 02:02:15,402] [INFO] [axolotl.normalize_config:150] [PID:164] [RANK:3] GPU memory usage baseline: 0.000GB (+0.312GB misc)
[2023-12-29 02:02:15,413] [INFO] [axolotl.normalize_config:150] [PID:166] [RANK:5] GPU memory usage baseline: 0.000GB (+0.312GB misc)
[2023-12-29 02:02:15,417] [INFO] [axolotl.normalize_config:150] [PID:163] [RANK:2] GPU memory usage baseline: 0.000GB (+0.312GB misc)
[2023-12-29 02:02:15,436] [INFO] [axolotl.normalize_config:150] [PID:167] [RANK:6] GPU memory usage baseline: 0.000GB (+0.312GB misc)
[2023-12-29 02:02:15,436] [INFO] [axolotl.normalize_config:150] [PID:161] [RANK:0] GPU memory usage baseline: 0.000GB (+0.312GB misc)
[2023-12-29 02:02:15,492] [INFO] [axolotl.normalize_config:150] [PID:165] [RANK:4] GPU memory usage baseline: 0.000GB (+0.312GB misc)
[2023-12-29 02:02:15,496] [WARNING] [axolotl.scripts.check_user_token:342] [PID:165] [RANK:4] Error verifying HuggingFace token. Remember to log in using `huggingface-cli login` and get your access token from <https://huggingface.co/settings/tokens> if you want to use gated models or datasets.
[2023-12-29 02:02:15,497] [WARNING] [axolotl.scripts.check_user_token:342] [PID:168] [RANK:7] Error verifying HuggingFace token. Remember to log in using `huggingface-cli login` and get your access token from <https://huggingface.co/settings/tokens> if you want to use gated models or datasets.
[2023-12-29 02:02:15,498] [WARNING] [axolotl.scripts.check_user_token:342] [PID:166] [RANK:5] Error verifying HuggingFace token. Remember to log in using `huggingface-cli login` and get your access token from <https://huggingface.co/settings/tokens> if you want to use gated models or datasets.
[2023-12-29 02:02:15,500] [WARNING] [axolotl.scripts.check_user_token:342] [PID:167] [RANK:6] Error verifying HuggingFace token. Remember to log in using `huggingface-cli login` and get your access token from <https://huggingface.co/settings/tokens> if you want to use gated models or datasets.
                                 dP            dP   dP
                                 88            88   88
      .d8888b. dP.  .dP .d8888b. 88 .d8888b. d8888P 88
      88'  `88  `8bd8'  88'  `88 88 88'  `88   88   88
      88.  .88  .d88b.  88.  .88 88 88.  .88   88   88
      `88888P8 dP'  `dP `88888P' dP `88888P'   dP   dP

[2023-12-29 02:02:15,500] [WARNING] [axolotl.scripts.check_user_token:342] [PID:161] [RANK:0] Error verifying HuggingFace token. Remember to log in using `huggingface-cli login` and get your access token from <https://huggingface.co/settings/tokens> if you want to use gated models or datasets.
[2023-12-29 02:02:15,501] [WARNING] [axolotl.scripts.check_user_token:342] [PID:163] [RANK:2] Error verifying HuggingFace token. Remember to log in using `huggingface-cli login` and get your access token from <https://huggingface.co/settings/tokens> if you want to use gated models or datasets.
[2023-12-29 02:02:15,503] [WARNING] [axolotl.scripts.check_user_token:342] [PID:162] [RANK:1] Error verifying HuggingFace token. Remember to log in using `huggingface-cli login` and get your access token from <https://huggingface.co/settings/tokens> if you want to use gated models or datasets.
[2023-12-29 02:02:15,503] [WARNING] [axolotl.scripts.check_user_token:342] [PID:164] [RANK:3] Error verifying HuggingFace token. Remember to log in using `huggingface-cli login` and get your access token from <https://huggingface.co/settings/tokens> if you want to use gated models or datasets.
tokenizer_config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 593/593 [00:00<00:00, 87.7kB/s]
tokenizer.model: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 512k/512k [00:00<00:00, 47.3MB/s]
special_tokens_map.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 330/330 [00:00<00:00, 284kB/s]
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in <https://github.com/huggingface/transformers/pull/24565>
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in <https://github.com/huggingface/transformers/pull/24565>
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in <https://github.com/huggingface/transformers/pull/24565>
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in <https://github.com/huggingface/transformers/pull/24565>
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in <https://github.com/huggingface/transformers/pull/24565>
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in <https://github.com/huggingface/transformers/pull/24565>
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in <https://github.com/huggingface/transformers/pull/24565>
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in <https://github.com/huggingface/transformers/pull/24565>
[2023-12-29 02:02:16,769] [DEBUG] [axolotl.load_tokenizer:184] [PID:164] [RANK:3] EOS: 2 / </s>
[2023-12-29 02:02:16,770] [DEBUG] [axolotl.load_tokenizer:185] [PID:164] [RANK:3] BOS: 1 / <s>
[2023-12-29 02:02:16,770] [DEBUG] [axolotl.load_tokenizer:186] [PID:164] [RANK:3] PAD: 2 / </s>
[2023-12-29 02:02:16,770] [DEBUG] [axolotl.load_tokenizer:187] [PID:164] [RANK:3] UNK: 0 / <unk>
[2023-12-29 02:02:16,774] [DEBUG] [axolotl.load_tokenizer:184] [PID:165] [RANK:4] EOS: 2 / </s>
[2023-12-29 02:02:16,774] [DEBUG] [axolotl.load_tokenizer:185] [PID:165] [RANK:4] BOS: 1 / <s>
[2023-12-29 02:02:16,774] [DEBUG] [axolotl.load_tokenizer:186] [PID:165] [RANK:4] PAD: 2 / </s>
[2023-12-29 02:02:16,775] [DEBUG] [axolotl.load_tokenizer:187] [PID:165] [RANK:4] UNK: 0 / <unk>
[2023-12-29 02:02:16,805] [DEBUG] [axolotl.load_tokenizer:184] [PID:166] [RANK:5] EOS: 2 / </s>
[2023-12-29 02:02:16,805] [DEBUG] [axolotl.load_tokenizer:185] [PID:166] [RANK:5] BOS: 1 / <s>
[2023-12-29 02:02:16,805] [DEBUG] [axolotl.load_tokenizer:186] [PID:166] [RANK:5] PAD: 2 / </s>
[2023-12-29 02:02:16,805] [DEBUG] [axolotl.load_tokenizer:187] [PID:166] [RANK:5] UNK: 0 / <unk>
[2023-12-29 02:02:16,814] [DEBUG] [axolotl.load_tokenizer:184] [PID:161] [RANK:0] EOS: 2 / </s>
[2023-12-29 02:02:16,814] [DEBUG] [axolotl.load_tokenizer:184] [PID:168] [RANK:7] EOS: 2 / </s>
[2023-12-29 02:02:16,814] [DEBUG] [axolotl.load_tokenizer:185] [PID:161] [RANK:0] BOS: 1 / <s>
[2023-12-29 02:02:16,814] [DEBUG] [axolotl.load_tokenizer:186] [PID:161] [RANK:0] PAD: 2 / </s>
[2023-12-29 02:02:16,814] [DEBUG] [axolotl.load_tokenizer:185] [PID:168] [RANK:7] BOS: 1 / <s>
[2023-12-29 02:02:16,814] [DEBUG] [axolotl.load_tokenizer:186] [PID:168] [RANK:7] PAD: 2 / </s>
[2023-12-29 02:02:16,814] [DEBUG] [axolotl.load_tokenizer:187] [PID:161] [RANK:0] UNK: 0 / <unk>
[2023-12-29 02:02:16,814] [DEBUG] [axolotl.load_tokenizer:187] [PID:168] [RANK:7] UNK: 0 / <unk>
[2023-12-29 02:02:16,815] [INFO] [axolotl.load_tokenized_prepared_datasets:147] [PID:161] [RANK:0] Unable to find prepared dataset in last_run_prepared/f9e5091071bf5ab6f7287bd5565a5f24
[2023-12-29 02:02:16,815] [INFO] [axolotl.load_tokenized_prepared_datasets:148] [PID:161] [RANK:0] Loading raw datasets...
[2023-12-29 02:02:16,815] [INFO] [axolotl.load_tokenized_prepared_datasets:153] [PID:161] [RANK:0] No seed provided, using default seed of 42
[2023-12-29 02:02:16,817] [DEBUG] [axolotl.load_tokenizer:184] [PID:162] [RANK:1] EOS: 2 / </s>
[2023-12-29 02:02:16,817] [DEBUG] [axolotl.load_tokenizer:185] [PID:162] [RANK:1] BOS: 1 / <s>
[2023-12-29 02:02:16,817] [DEBUG] [axolotl.load_tokenizer:186] [PID:162] [RANK:1] PAD: 2 / </s>
[2023-12-29 02:02:16,817] [DEBUG] [axolotl.load_tokenizer:187] [PID:162] [RANK:1] UNK: 0 / <unk>
[2023-12-29 02:02:16,823] [DEBUG] [axolotl.load_tokenizer:184] [PID:163] [RANK:2] EOS: 2 / </s>
[2023-12-29 02:02:16,823] [DEBUG] [axolotl.load_tokenizer:185] [PID:163] [RANK:2] BOS: 1 / <s>
[2023-12-29 02:02:16,823] [DEBUG] [axolotl.load_tokenizer:186] [PID:163] [RANK:2] PAD: 2 / </s>
[2023-12-29 02:02:16,823] [DEBUG] [axolotl.load_tokenizer:187] [PID:163] [RANK:2] UNK: 0 / <unk>
[2023-12-29 02:02:16,824] [DEBUG] [axolotl.load_tokenizer:184] [PID:167] [RANK:6] EOS: 2 / </s>
[2023-12-29 02:02:16,824] [DEBUG] [axolotl.load_tokenizer:185] [PID:167] [RANK:6] BOS: 1 / <s>
[2023-12-29 02:02:16,824] [DEBUG] [axolotl.load_tokenizer:186] [PID:167] [RANK:6] PAD: 2 / </s>
[2023-12-29 02:02:16,824] [DEBUG] [axolotl.load_tokenizer:187] [PID:167] [RANK:6] UNK: 0 / <unk>
Downloading readme: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 501/501 [00:00<00:00, 2.40MB/s]
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/huggingface_hub/repocard.py:105: UserWarning: Repo card metadata block was not found. Setting CardData to empty.
  warnings.warn("Repo card metadata block was not found. Setting CardData to empty.")
Downloading data: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 36.0M/36.0M [00:01<00:00, 31.6MB/s]
Downloading data: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 4.91M/4.91M [00:00<00:00, 18.3MB/s]
Generating train split: 54568 examples [00:00, 113061.60 examples/s]
Map (num_proc=64):  82%|██████████████████████████████████████████████████████████████████▋              | 44909/54568 [00:01<00:00, 36136.89 examples/s]
[2023-12-29 02:02:24,963] [WARNING] [axolotl._tokenize:66] [PID:350] [RANK:0] Empty text requested for tokenization.
Map (num_proc=64): 100%|█████████████████████████████████████████████████████████████████████████████████| 54568/54568 [00:02<00:00, 26150.53 examples/s]
[2023-12-29 02:02:25,698] [INFO] [axolotl.load_tokenized_prepared_datasets:362] [PID:161] [RANK:0] merging datasets
[2023-12-29 02:02:25,704] [INFO] [axolotl.load_tokenized_prepared_datasets:369] [PID:161] [RANK:0] Saving merged prepared dataset to disk... last_run_prepared/f9e5091071bf5ab6f7287bd5565a5f24
Saving the dataset (1/1 shards): 100%|██████████████████████████████████████████████████████████████████| 54568/54568 [00:00<00:00, 549048.67 examples/s]
[2023-12-29 02:02:27,379] [INFO] [axolotl.load_tokenized_prepared_datasets:147] [PID:165] [RANK:4] Unable to find prepared dataset in last_run_prepared/f9e5091071bf5ab6f7287bd5565a5f24
[2023-12-29 02:02:27,379] [INFO] [axolotl.load_tokenized_prepared_datasets:147] [PID:167] [RANK:6] Unable to find prepared dataset in last_run_prepared/f9e5091071bf5ab6f7287bd5565a5f24
[2023-12-29 02:02:27,379] [INFO] [axolotl.load_tokenized_prepared_datasets:147] [PID:162] [RANK:1] Unable to find prepared dataset in last_run_prepared/f9e5091071bf5ab6f7287bd5565a5f24
[2023-12-29 02:02:27,379] [INFO] [axolotl.load_tokenized_prepared_datasets:148] [PID:165] [RANK:4] Loading raw datasets...
[2023-12-29 02:02:27,379] [INFO] [axolotl.load_tokenized_prepared_datasets:153] [PID:165] [RANK:4] No seed provided, using default seed of 42
[2023-12-29 02:02:27,379] [INFO] [axolotl.load_tokenized_prepared_datasets:147] [PID:168] [RANK:7] Unable to find prepared dataset in last_run_prepared/f9e5091071bf5ab6f7287bd5565a5f24
[2023-12-29 02:02:27,379] [INFO] [axolotl.load_tokenized_prepared_datasets:148] [PID:167] [RANK:6] Loading raw datasets...
[2023-12-29 02:02:27,379] [INFO] [axolotl.load_tokenized_prepared_datasets:148] [PID:162] [RANK:1] Loading raw datasets...
[2023-12-29 02:02:27,379] [INFO] [axolotl.load_tokenized_prepared_datasets:147] [PID:163] [RANK:2] Unable to find prepared dataset in last_run_prepared/f9e5091071bf5ab6f7287bd5565a5f24
[2023-12-29 02:02:27,379] [INFO] [axolotl.load_tokenized_prepared_datasets:153] [PID:167] [RANK:6] No seed provided, using default seed of 42
[2023-12-29 02:02:27,379] [INFO] [axolotl.load_tokenized_prepared_datasets:153] [PID:162] [RANK:1] No seed provided, using default seed of 42
[2023-12-29 02:02:27,379] [INFO] [axolotl.load_tokenized_prepared_datasets:148] [PID:168] [RANK:7] Loading raw datasets...
[2023-12-29 02:02:27,379] [INFO] [axolotl.load_tokenized_prepared_datasets:147] [PID:166] [RANK:5] Unable to find prepared dataset in last_run_prepared/f9e5091071bf5ab6f7287bd5565a5f24
[2023-12-29 02:02:27,379] [INFO] [axolotl.load_tokenized_prepared_datasets:148] [PID:163] [RANK:2] Loading raw datasets...
[2023-12-29 02:02:27,379] [INFO] [axolotl.load_tokenized_prepared_datasets:153] [PID:168] [RANK:7] No seed provided, using default seed of 42
[2023-12-29 02:02:27,380] [INFO] [axolotl.load_tokenized_prepared_datasets:153] [PID:163] [RANK:2] No seed provided, using default seed of 42
[2023-12-29 02:02:27,380] [INFO] [axolotl.load_tokenized_prepared_datasets:148] [PID:166] [RANK:5] Loading raw datasets...
[2023-12-29 02:02:27,380] [INFO] [axolotl.load_tokenized_prepared_datasets:153] [PID:166] [RANK:5] No seed provided, using default seed of 42
[2023-12-29 02:02:27,379] [INFO] [axolotl.load_tokenized_prepared_datasets:147] [PID:164] [RANK:3] Unable to find prepared dataset in last_run_prepared/f9e5091071bf5ab6f7287bd5565a5f24
[2023-12-29 02:02:27,380] [INFO] [axolotl.load_tokenized_prepared_datasets:148] [PID:164] [RANK:3] Loading raw datasets...
[2023-12-29 02:02:27,380] [INFO] [axolotl.load_tokenized_prepared_datasets:153] [PID:164] [RANK:3] No seed provided, using default seed of 42
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/huggingface_hub/repocard.py:105: UserWarning: Repo card metadata block was not found. Setting CardData to empty.
  warnings.warn("Repo card metadata block was not found. Setting CardData to empty.")
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/huggingface_hub/repocard.py:105: UserWarning: Repo card metadata block was not found. Setting CardData to empty.
  warnings.warn("Repo card metadata block was not found. Setting CardData to empty.")
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/huggingface_hub/repocard.py:105: UserWarning: Repo card metadata block was not found. Setting CardData to empty.
  warnings.warn("Repo card metadata block was not found. Setting CardData to empty.")
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/huggingface_hub/repocard.py:105: UserWarning: Repo card metadata block was not found. Setting CardData to empty.
  warnings.warn("Repo card metadata block was not found. Setting CardData to empty.")
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/huggingface_hub/repocard.py:105: UserWarning: Repo card metadata block was not found. Setting CardData to empty.
  warnings.warn("Repo card metadata block was not found. Setting CardData to empty.")
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/huggingface_hub/repocard.py:105: UserWarning: Repo card metadata block was not found. Setting CardData to empty.
  warnings.warn("Repo card metadata block was not found. Setting CardData to empty.")
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/huggingface_hub/repocard.py:105: UserWarning: Repo card metadata block was not found. Setting CardData to empty.
  warnings.warn("Repo card metadata block was not found. Setting CardData to empty.")
Filter (num_proc=96):  65%|█████████████████████████████████████████████████▋                           | 34538/53476 [00:00<00:00, 102958.81 examples/s]
[2023-12-29 02:02:30,804] [INFO] [axolotl.load_tokenized_prepared_datasets:362] [PID:167] [RANK:6] merging datasets
[2023-12-29 02:02:30,820] [INFO] [axolotl.load_tokenized_prepared_datasets:362] [PID:162] [RANK:1] merging datasets
[2023-12-29 02:02:30,849] [INFO] [axolotl.load_tokenized_prepared_datasets:362] [PID:164] [RANK:3] merging datasets
Filter (num_proc=96): 100%|██████████████████████████████████████████████████████████████████████████████| 53476/53476 [00:00<00:00, 77709.35 examples/s]
[2023-12-29 02:02:30,893] [INFO] [axolotl.load_tokenized_prepared_datasets:362] [PID:168] [RANK:7] merging datasets
[2023-12-29 02:02:30,994] [INFO] [axolotl.load_tokenized_prepared_datasets:362] [PID:166] [RANK:5] merging datasets
[2023-12-29 02:02:31,001] [INFO] [axolotl.load_tokenized_prepared_datasets:362] [PID:163] [RANK:2] merging datasets
[2023-12-29 02:02:31,092] [INFO] [axolotl.load_tokenized_prepared_datasets:362] [PID:165] [RANK:4] merging datasets
Filter (num_proc=96): 100%|█████████████████████████████████████████████████████████████████████████████████| 1092/1092 [00:00<00:00, 2167.47 examples/s]
Map (num_proc=96): 100%|█████████████████████████████████████████████████████████████████████████████████| 53476/53476 [00:01<00:00, 39921.95 examples/s]
Map (num_proc=96): 100%|████████████████████████████████████████████████████████████████████████████████████| 1092/1092 [00:00<00:00, 1936.55 examples/s]
[2023-12-29 02:02:46,895] [DEBUG] [axolotl.log:60] [PID:161] [RANK:0] total_num_tokens: 188373
[2023-12-29 02:02:46,903] [DEBUG] [axolotl.log:60] [PID:161] [RANK:0] `total_supervised_tokens: 38104`
[2023-12-29 02:02:52,372] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 23546
[2023-12-29 02:02:52,372] [DEBUG] [axolotl.log:60] [PID:161] [RANK:0] data_loader_len: 87
[2023-12-29 02:02:53,430] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 1.0 total_num_tokens per device: 23546
[2023-12-29 02:02:53,468] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 1.0 total_num_tokens per device: 23546
[2023-12-29 02:02:53,473] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 1.0 total_num_tokens per device: 23546
[2023-12-29 02:02:53,473] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 1.0 total_num_tokens per device: 23546
[2023-12-29 02:02:53,475] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 1.0 total_num_tokens per device: 23546
[2023-12-29 02:02:53,599] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 1.0 total_num_tokens per device: 23546
[2023-12-29 02:02:53,832] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 1.0 total_num_tokens per device: 23546
[2023-12-29 02:02:53,873] [INFO] [axolotl.log:60] [PID:161] [RANK:0] sample_packing_eff_est across ranks: [0.9385612607002258, 0.9482371807098389, 0.9482371807098389, 0.9482371807098389, 0.9385612607002258, 0.9385612607002258, 0.9482371807098389, 0.9385612607002258]
[2023-12-29 02:02:53,878] [DEBUG] [axolotl.log:60] [PID:161] [RANK:0] sample_packing_eff_est: None
[2023-12-29 02:02:53,878] [DEBUG] [axolotl.log:60] [PID:161] [RANK:0] total_num_steps: 43
[2023-12-29 02:02:53,920] [DEBUG] [axolotl.log:60] [PID:161] [RANK:0] total_num_tokens: 10733491
[2023-12-29 02:02:54,255] [DEBUG] [axolotl.log:60] [PID:161] [RANK:0] `total_supervised_tokens: 6735490`
[2023-12-29 02:02:54,408] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 1341686
[2023-12-29 02:02:54,408] [DEBUG] [axolotl.log:60] [PID:161] [RANK:0] data_loader_len: 5183
[2023-12-29 02:02:54,470] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 1.0 total_num_tokens per device: 1341686
[2023-12-29 02:02:54,470] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 1.0 total_num_tokens per device: 1341686
[2023-12-29 02:02:54,471] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 1.0 total_num_tokens per device: 1341686
[2023-12-29 02:02:54,473] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 1.0 total_num_tokens per device: 1341686
[2023-12-29 02:02:54,477] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 1.0 total_num_tokens per device: 1341686
[2023-12-29 02:02:54,482] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 1.0 total_num_tokens per device: 1341686
[2023-12-29 02:02:54,484] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 1.0 total_num_tokens per device: 1341686
[2023-12-29 02:02:54,485] [INFO] [axolotl.log:60] [PID:161] [RANK:0] sample_packing_eff_est across ranks: [0.9308992028236389, 0.9320580363273621, 0.9318923354148865, 0.9327215552330017, 0.9300732016563416, 0.9310645461082458, 0.9299081563949585, 0.9308992028236389]
[2023-12-29 02:02:54,490] [DEBUG] [axolotl.log:60] [PID:161] [RANK:0] sample_packing_eff_est: 0.94
[2023-12-29 02:02:54,490] [DEBUG] [axolotl.log:60] [PID:161] [RANK:0] total_num_steps: 2591
[2023-12-29 02:02:54,496] [DEBUG] [axolotl.train.log:60] [PID:161] [RANK:0] loading tokenizer... openlm-research/open_llama_3b_v2
[2023-12-29 02:02:54,821] [DEBUG] [axolotl.load_tokenizer:184] [PID:161] [RANK:0] EOS: 2 / </s>
[2023-12-29 02:02:54,821] [DEBUG] [axolotl.load_tokenizer:185] [PID:161] [RANK:0] BOS: 1 / <s>
[2023-12-29 02:02:54,821] [DEBUG] [axolotl.load_tokenizer:186] [PID:161] [RANK:0] PAD: 2 / </s>
[2023-12-29 02:02:54,821] [DEBUG] [axolotl.load_tokenizer:187] [PID:161] [RANK:0] UNK: 0 / <unk>
[2023-12-29 02:02:54,821] [DEBUG] [axolotl.train.log:60] [PID:161] [RANK:0] loading model and peft_config...
[2023-12-29 02:02:54,827] [DEBUG] [axolotl.load_tokenizer:184] [PID:162] [RANK:1] EOS: 2 / </s>
[2023-12-29 02:02:54,827] [DEBUG] [axolotl.load_tokenizer:185] [PID:162] [RANK:1] BOS: 1 / <s>
[2023-12-29 02:02:54,827] [DEBUG] [axolotl.load_tokenizer:186] [PID:162] [RANK:1] PAD: 2 / </s>
[2023-12-29 02:02:54,827] [DEBUG] [axolotl.load_tokenizer:187] [PID:162] [RANK:1] UNK: 0 / <unk>
[2023-12-29 02:02:54,828] [DEBUG] [axolotl.load_tokenizer:184] [PID:163] [RANK:2] EOS: 2 / </s>
[2023-12-29 02:02:54,828] [DEBUG] [axolotl.load_tokenizer:185] [PID:163] [RANK:2] BOS: 1 / <s>
[2023-12-29 02:02:54,828] [DEBUG] [axolotl.load_tokenizer:186] [PID:163] [RANK:2] PAD: 2 / </s>
[2023-12-29 02:02:54,828] [DEBUG] [axolotl.load_tokenizer:187] [PID:163] [RANK:2] UNK: 0 / <unk>
[2023-12-29 02:02:54,828] [DEBUG] [axolotl.load_tokenizer:184] [PID:168] [RANK:7] EOS: 2 / </s>
[2023-12-29 02:02:54,828] [DEBUG] [axolotl.load_tokenizer:185] [PID:168] [RANK:7] BOS: 1 / <s>
[2023-12-29 02:02:54,828] [DEBUG] [axolotl.load_tokenizer:186] [PID:168] [RANK:7] PAD: 2 / </s>
[2023-12-29 02:02:54,829] [DEBUG] [axolotl.load_tokenizer:187] [PID:168] [RANK:7] UNK: 0 / <unk>
[2023-12-29 02:02:54,837] [DEBUG] [axolotl.load_tokenizer:184] [PID:165] [RANK:4] EOS: 2 / </s>
[2023-12-29 02:02:54,837] [DEBUG] [axolotl.load_tokenizer:185] [PID:165] [RANK:4] BOS: 1 / <s>
[2023-12-29 02:02:54,837] [DEBUG] [axolotl.load_tokenizer:186] [PID:165] [RANK:4] PAD: 2 / </s>
[2023-12-29 02:02:54,837] [DEBUG] [axolotl.load_tokenizer:187] [PID:165] [RANK:4] UNK: 0 / <unk>
[2023-12-29 02:02:54,837] [DEBUG] [axolotl.load_tokenizer:184] [PID:167] [RANK:6] EOS: 2 / </s>
[2023-12-29 02:02:54,837] [DEBUG] [axolotl.load_tokenizer:185] [PID:167] [RANK:6] BOS: 1 / <s>
[2023-12-29 02:02:54,837] [DEBUG] [axolotl.load_tokenizer:186] [PID:167] [RANK:6] PAD: 2 / </s>
[2023-12-29 02:02:54,837] [DEBUG] [axolotl.load_tokenizer:187] [PID:167] [RANK:6] UNK: 0 / <unk>
[2023-12-29 02:02:54,842] [DEBUG] [axolotl.load_tokenizer:184] [PID:166] [RANK:5] EOS: 2 / </s>
[2023-12-29 02:02:54,842] [DEBUG] [axolotl.load_tokenizer:185] [PID:166] [RANK:5] BOS: 1 / <s>
[2023-12-29 02:02:54,842] [DEBUG] [axolotl.load_tokenizer:186] [PID:166] [RANK:5] PAD: 2 / </s>
[2023-12-29 02:02:54,842] [DEBUG] [axolotl.load_tokenizer:187] [PID:166] [RANK:5] UNK: 0 / <unk>
[2023-12-29 02:02:54,923] [DEBUG] [axolotl.load_tokenizer:184] [PID:164] [RANK:3] EOS: 2 / </s>
[2023-12-29 02:02:54,923] [DEBUG] [axolotl.load_tokenizer:185] [PID:164] [RANK:3] BOS: 1 / <s>
[2023-12-29 02:02:54,923] [DEBUG] [axolotl.load_tokenizer:186] [PID:164] [RANK:3] PAD: 2 / </s>
[2023-12-29 02:02:54,923] [DEBUG] [axolotl.load_tokenizer:187] [PID:164] [RANK:3] UNK: 0 / <unk>
[2023-12-29 02:02:54,975] [INFO] [axolotl.load_model:232] [PID:161] [RANK:0] patching with flash attention for sample packing
[2023-12-29 02:02:54,975] [INFO] [axolotl.load_model:278] [PID:161] [RANK:0] patching _expand_mask
[2023-12-29 02:02:54,980] [INFO] [axolotl.load_model:232] [PID:162] [RANK:1] patching with flash attention for sample packing
[2023-12-29 02:02:54,980] [INFO] [axolotl.load_model:278] [PID:162] [RANK:1] patching _expand_mask
[2023-12-29 02:02:54,987] [INFO] [axolotl.load_model:232] [PID:167] [RANK:6] patching with flash attention for sample packing
[2023-12-29 02:02:54,987] [INFO] [axolotl.load_model:278] [PID:167] [RANK:6] patching _expand_mask
[2023-12-29 02:02:54,989] [INFO] [axolotl.load_model:232] [PID:168] [RANK:7] patching with flash attention for sample packing
[2023-12-29 02:02:54,990] [INFO] [axolotl.load_model:278] [PID:168] [RANK:7] patching _expand_mask
[2023-12-29 02:02:54,998] [INFO] [axolotl.load_model:232] [PID:165] [RANK:4] patching with flash attention for sample packing
[2023-12-29 02:02:54,999] [INFO] [axolotl.load_model:278] [PID:165] [RANK:4] patching _expand_mask
[2023-12-29 02:02:55,012] [INFO] [axolotl.load_model:232] [PID:163] [RANK:2] patching with flash attention for sample packing
[2023-12-29 02:02:55,012] [INFO] [axolotl.load_model:278] [PID:163] [RANK:2] patching _expand_mask
[2023-12-29 02:02:55,078] [INFO] [axolotl.load_model:232] [PID:164] [RANK:3] patching with flash attention for sample packing
[2023-12-29 02:02:55,079] [INFO] [axolotl.load_model:278] [PID:164] [RANK:3] patching _expand_mask
[2023-12-29 02:02:55,111] [INFO] [axolotl.load_model:232] [PID:166] [RANK:5] patching with flash attention for sample packing
[2023-12-29 02:02:55,112] [INFO] [axolotl.load_model:278] [PID:166] [RANK:5] patching _expand_mask
pytorch_model.bin: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 6.85G/6.85G [00:17<00:00, 394MB/s]
generation_config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 137/137 [00:00<00:00, 116kB/s]
[2023-12-29 02:03:21,252] [INFO] [axolotl.load_model:503] [PID:161] [RANK:0] GPU memory usage after model load: 3.408GB (+0.334GB cache, +1.303GB misc)
[2023-12-29 02:03:21,255] [INFO] [axolotl.load_model:526] [PID:161] [RANK:0] converting PEFT model w/ prepare_model_for_kbit_training
[2023-12-29 02:03:21,258] [INFO] [axolotl.load_model:538] [PID:161] [RANK:0] converting modules to torch.float16 for flash attention
[2023-12-29 02:03:21,289] [WARNING] [auto_gptq.nn_modules.qlinear.qlinear_cuda.<module>:16] [PID:161] CUDA extension not installed.
[2023-12-29 02:03:21,290] [WARNING] [auto_gptq.nn_modules.qlinear.qlinear_cuda_old.<module>:15] [PID:161] CUDA extension not installed.
[2023-12-29 02:03:21,304] [INFO] [axolotl.load_model:503] [PID:165] [RANK:4] GPU memory usage after model load: 3.408GB (+0.334GB cache, +1.303GB misc)
[2023-12-29 02:03:21,307] [INFO] [axolotl.load_model:526] [PID:165] [RANK:4] converting PEFT model w/ prepare_model_for_kbit_training
[2023-12-29 02:03:21,311] [INFO] [axolotl.load_model:538] [PID:165] [RANK:4] converting modules to torch.float16 for flash attention
[2023-12-29 02:03:21,342] [WARNING] [auto_gptq.nn_modules.qlinear.qlinear_cuda.<module>:16] [PID:165] CUDA extension not installed.
[2023-12-29 02:03:21,343] [WARNING] [auto_gptq.nn_modules.qlinear.qlinear_cuda_old.<module>:15] [PID:165] CUDA extension not installed.
trainable params: 12,712,960 || all params: 3,439,186,560 || trainable%: 0.36965020007521776
[2023-12-29 02:03:21,524] [INFO] [axolotl.load_model:568] [PID:161] [RANK:0] GPU memory usage after adapters: 3.455GB (+1.099GB cache, +1.303GB misc)
[2023-12-29 02:03:21,559] [INFO] [axolotl.train.log:60] [PID:161] [RANK:0] Pre-saving adapter config to ./lora-out
[2023-12-29 02:03:21,562] [INFO] [axolotl.train.log:60] [PID:161] [RANK:0] Starting trainer...
trainable params: 12,712,960 || all params: 3,439,186,560 || trainable%: 0.36965020007521776
[2023-12-29 02:03:21,585] [INFO] [axolotl.load_model:568] [PID:165] [RANK:4] GPU memory usage after adapters: 3.455GB (+1.099GB cache, +1.303GB misc)
[2023-12-29 02:03:21,932] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:21,965] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:21,997] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:22,028] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:22,072] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:22,104] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:22,136] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:22,167] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:22,955] [INFO] [axolotl.load_model:503] [PID:163] [RANK:2] GPU memory usage after model load: 3.408GB (+0.334GB cache, +1.303GB misc)
[2023-12-29 02:03:22,958] [INFO] [axolotl.load_model:526] [PID:163] [RANK:2] converting PEFT model w/ prepare_model_for_kbit_training
[2023-12-29 02:03:22,961] [INFO] [axolotl.load_model:538] [PID:163] [RANK:2] converting modules to torch.float16 for flash attention
[2023-12-29 02:03:22,993] [WARNING] [auto_gptq.nn_modules.qlinear.qlinear_cuda.<module>:16] [PID:163] CUDA extension not installed.
[2023-12-29 02:03:22,993] [WARNING] [auto_gptq.nn_modules.qlinear.qlinear_cuda_old.<module>:15] [PID:163] CUDA extension not installed.
[2023-12-29 02:03:23,072] [INFO] [axolotl.load_model:503] [PID:166] [RANK:5] GPU memory usage after model load: 3.408GB (+0.334GB cache, +1.303GB misc)
[2023-12-29 02:03:23,075] [INFO] [axolotl.load_model:526] [PID:166] [RANK:5] converting PEFT model w/ prepare_model_for_kbit_training
[2023-12-29 02:03:23,079] [INFO] [axolotl.load_model:538] [PID:166] [RANK:5] converting modules to torch.float16 for flash attention
[2023-12-29 02:03:23,100] [INFO] [axolotl.load_model:503] [PID:162] [RANK:1] GPU memory usage after model load: 3.408GB (+0.334GB cache, +1.303GB misc)
[2023-12-29 02:03:23,103] [INFO] [axolotl.load_model:526] [PID:162] [RANK:1] converting PEFT model w/ prepare_model_for_kbit_training
[2023-12-29 02:03:23,107] [INFO] [axolotl.load_model:538] [PID:162] [RANK:1] converting modules to torch.float16 for flash attention
[2023-12-29 02:03:23,110] [WARNING] [auto_gptq.nn_modules.qlinear.qlinear_cuda.<module>:16] [PID:166] CUDA extension not installed.
[2023-12-29 02:03:23,110] [WARNING] [auto_gptq.nn_modules.qlinear.qlinear_cuda_old.<module>:15] [PID:166] CUDA extension not installed.
[2023-12-29 02:03:23,139] [WARNING] [auto_gptq.nn_modules.qlinear.qlinear_cuda.<module>:16] [PID:162] CUDA extension not installed.
[2023-12-29 02:03:23,139] [WARNING] [auto_gptq.nn_modules.qlinear.qlinear_cuda_old.<module>:15] [PID:162] CUDA extension not installed.
trainable params: 12,712,960 || all params: 3,439,186,560 || trainable%: 0.36965020007521776
[2023-12-29 02:03:23,237] [INFO] [axolotl.load_model:568] [PID:163] [RANK:2] GPU memory usage after adapters: 3.455GB (+1.099GB cache, +1.303GB misc)
trainable params: 12,712,960 || all params: 3,439,186,560 || trainable%: 0.36965020007521776
[2023-12-29 02:03:23,354] [INFO] [axolotl.load_model:568] [PID:166] [RANK:5] GPU memory usage after adapters: 3.455GB (+1.099GB cache, +1.303GB misc)
trainable params: 12,712,960 || all params: 3,439,186,560 || trainable%: 0.36965020007521776
[2023-12-29 02:03:23,388] [INFO] [axolotl.load_model:568] [PID:162] [RANK:1] GPU memory usage after adapters: 3.455GB (+1.099GB cache, +1.303GB misc)
[2023-12-29 02:03:23,443] [INFO] [axolotl.load_model:503] [PID:167] [RANK:6] GPU memory usage after model load: 3.408GB (+0.334GB cache, +1.303GB misc)
[2023-12-29 02:03:23,446] [INFO] [axolotl.load_model:526] [PID:167] [RANK:6] converting PEFT model w/ prepare_model_for_kbit_training
[2023-12-29 02:03:23,450] [INFO] [axolotl.load_model:538] [PID:167] [RANK:6] converting modules to torch.float16 for flash attention
[2023-12-29 02:03:23,481] [WARNING] [auto_gptq.nn_modules.qlinear.qlinear_cuda.<module>:16] [PID:167] CUDA extension not installed.
[2023-12-29 02:03:23,482] [WARNING] [auto_gptq.nn_modules.qlinear.qlinear_cuda_old.<module>:15] [PID:167] CUDA extension not installed.
[2023-12-29 02:03:23,597] [INFO] [axolotl.load_model:503] [PID:168] [RANK:7] GPU memory usage after model load: 3.408GB (+0.334GB cache, +1.303GB misc)
[2023-12-29 02:03:23,600] [INFO] [axolotl.load_model:526] [PID:168] [RANK:7] converting PEFT model w/ prepare_model_for_kbit_training
[2023-12-29 02:03:23,604] [INFO] [axolotl.load_model:538] [PID:168] [RANK:7] converting modules to torch.float16 for flash attention
[2023-12-29 02:03:23,635] [WARNING] [auto_gptq.nn_modules.qlinear.qlinear_cuda.<module>:16] [PID:168] CUDA extension not installed.
[2023-12-29 02:03:23,636] [WARNING] [auto_gptq.nn_modules.qlinear.qlinear_cuda_old.<module>:15] [PID:168] CUDA extension not installed.
[2023-12-29 02:03:23,685] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
trainable params: 12,712,960 || all params: 3,439,186,560 || trainable%: 0.36965020007521776
[2023-12-29 02:03:23,717] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:23,720] [INFO] [axolotl.load_model:568] [PID:167] [RANK:6] GPU memory usage after adapters: 3.455GB (+1.099GB cache, +1.303GB misc)
[2023-12-29 02:03:23,742] [INFO] [axolotl.load_model:503] [PID:164] [RANK:3] GPU memory usage after model load: 3.408GB (+0.334GB cache, +1.303GB misc)
[2023-12-29 02:03:23,745] [INFO] [axolotl.load_model:526] [PID:164] [RANK:3] converting PEFT model w/ prepare_model_for_kbit_training
[2023-12-29 02:03:23,749] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:23,750] [INFO] [axolotl.load_model:538] [PID:164] [RANK:3] converting modules to torch.float16 for flash attention
[2023-12-29 02:03:23,781] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:23,786] [WARNING] [auto_gptq.nn_modules.qlinear.qlinear_cuda.<module>:16] [PID:164] CUDA extension not installed.
[2023-12-29 02:03:23,786] [WARNING] [auto_gptq.nn_modules.qlinear.qlinear_cuda_old.<module>:15] [PID:164] CUDA extension not installed.
[2023-12-29 02:03:23,826] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:23,853] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:23,858] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
trainable params: 12,712,960 || all params: 3,439,186,560 || trainable%: 0.36965020007521776
[2023-12-29 02:03:23,878] [INFO] [axolotl.load_model:568] [PID:168] [RANK:7] GPU memory usage after adapters: 3.455GB (+1.099GB cache, +1.303GB misc)
[2023-12-29 02:03:23,886] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:23,890] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:23,917] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:23,924] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:23,949] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
trainable params: 12,712,960 || all params: 3,439,186,560 || trainable%: 0.36965020007521776
[2023-12-29 02:03:24,104] [INFO] [axolotl.load_model:568] [PID:164] [RANK:3] GPU memory usage after adapters: 3.455GB (+1.099GB cache, +1.303GB misc)
[2023-12-29 02:03:24,178] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:24,215] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:24,248] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:24,280] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:24,325] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:24,357] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:24,391] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:24,425] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:24,902] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:24,939] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:24,971] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:25,003] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
  0%|                                                                                                                           | 0/2752 [00:00<?, ?it/s]
[2023-12-29 02:03:25,504] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:25,504] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:25,505] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:25,506] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:25,507] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:25,508] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:25,510] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:25,511] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:25,536] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:25,536] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:25,537] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:25,539] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:25,539] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:25,541] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:25,542] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:03:25,543] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
{'loss': 1.3883, 'learning_rate': 1e-05, 'epoch': 0.0}
  0%|                                                                                                                 | 1/2752 [00:02<1:32:27,  2.02s/it]
[2023-12-29 02:03:27,515] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:27,515] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:27,515] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:27,515] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:27,515] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:27,516] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:27,516] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:27,516] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:27,516] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:27,516] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:27,516] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:27,517] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:27,517] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:27,517] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:27,518] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:27,518] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:27,768] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:27,769] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:27,770] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:27,770] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:28,032] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:28,033] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:28,296] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:28,297] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:28,547] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:28,547] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:28,827] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:28,827] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:29,087] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:29,088] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:29,337] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:29,338] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:29,596] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:29,597] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:29,853] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:29,853] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:30,123] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:30,123] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:30,381] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:30,382] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:30,641] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:03:30,641] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
{'eval_loss': 1.6172605752944946, 'eval_runtime': 3.1392, 'eval_samples_per_second': 347.857, 'eval_steps_per_second': 21.98, 'epoch': 0.0}
[2023-12-29 02:03:31,652] [INFO] [axolotl.callbacks.on_step_end:122] [PID:161] [RANK:0] GPU memory usage while training: 3.553GB (+3.812GB cache, +1.668GB misc)
  0%|                                                                                                                 | 2/2752 [00:06<2:30:21,  3.28s/it]
[2023-12-29 02:03:31,654] [INFO] [axolotl.callbacks.on_step_end:122] [PID:162] [RANK:1] GPU memory usage while training: 3.553GB (+3.812GB cache, +1.668GB misc)
[2023-12-29 02:03:31,655] [INFO] [axolotl.callbacks.on_step_end:122] [PID:166] [RANK:5] GPU memory usage while training: 3.554GB (+3.812GB cache, +1.668GB misc)
[2023-12-29 02:03:31,656] [INFO] [axolotl.callbacks.on_step_end:122] [PID:168] [RANK:7] GPU memory usage while training: 3.553GB (+3.812GB cache, +1.668GB misc)
[2023-12-29 02:03:31,656] [INFO] [axolotl.callbacks.on_step_end:122] [PID:165] [RANK:4] GPU memory usage while training: 3.554GB (+3.812GB cache, +1.668GB misc)
[2023-12-29 02:03:31,657] [INFO] [axolotl.callbacks.on_step_end:122] [PID:163] [RANK:2] GPU memory usage while training: 3.553GB (+3.812GB cache, +1.668GB misc)
[2023-12-29 02:03:31,658] [INFO] [axolotl.callbacks.on_step_end:122] [PID:167] [RANK:6] GPU memory usage while training: 3.553GB (+3.812GB cache, +1.668GB misc)
[2023-12-29 02:03:31,659] [INFO] [axolotl.callbacks.on_step_end:122] [PID:164] [RANK:3] GPU memory usage while training: 3.552GB (+3.813GB cache, +1.668GB misc)
{'loss': 1.2717, 'learning_rate': 2e-05, 'epoch': 0.0}
{'loss': 1.4102, 'learning_rate': 3e-05, 'epoch': 0.0}
{'loss': 1.2598, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 1.4164, 'learning_rate': 5e-05, 'epoch': 0.01}
{'loss': 1.3747, 'learning_rate': 6e-05, 'epoch': 0.01}
{'loss': 1.2655, 'learning_rate': 7e-05, 'epoch': 0.01}
{'loss': 1.3905, 'learning_rate': 8e-05, 'epoch': 0.01}
{'loss': 1.3399, 'learning_rate': 9e-05, 'epoch': 0.01}
{'loss': 1.2916, 'learning_rate': 0.0001, 'epoch': 0.01}
{'loss': 1.292, 'learning_rate': 0.00011000000000000002, 'epoch': 0.02}
{'loss': 1.2863, 'learning_rate': 0.00012, 'epoch': 0.02}
{'loss': 1.4121, 'learning_rate': 0.00013000000000000002, 'epoch': 0.02}
{'loss': 1.2416, 'learning_rate': 0.00014, 'epoch': 0.02}
{'loss': 1.1848, 'learning_rate': 0.00015000000000000001, 'epoch': 0.02}
{'loss': 1.2452, 'learning_rate': 0.00016, 'epoch': 0.02}
{'loss': 1.271, 'learning_rate': 0.00017, 'epoch': 0.02}
{'loss': 1.1541, 'learning_rate': 0.00018, 'epoch': 0.03}
{'loss': 1.1802, 'learning_rate': 0.00019, 'epoch': 0.03}
{'loss': 1.1818, 'learning_rate': 0.0002, 'epoch': 0.03}
{'loss': 1.157, 'learning_rate': 0.00019999993388373499, 'epoch': 0.03}
{'loss': 1.179, 'learning_rate': 0.00019999973553502733, 'epoch': 0.03}
{'loss': 1.2595, 'learning_rate': 0.00019999940495413936, 'epoch': 0.03}
{'loss': 1.1576, 'learning_rate': 0.00019999894214150818, 'epoch': 0.03}
{'loss': 1.1034, 'learning_rate': 0.00019999834709774576, 'epoch': 0.04}
{'loss': 1.1498, 'learning_rate': 0.000199997619823639, 'epoch': 0.04}
{'loss': 1.2154, 'learning_rate': 0.00019999676032014953, 'epoch': 0.04}
{'loss': 1.1614, 'learning_rate': 0.00019999576858841395, 'epoch': 0.04}
{'loss': 1.2372, 'learning_rate': 0.0001999946446297436, 'epoch': 0.04}
{'loss': 1.1168, 'learning_rate': 0.00019999338844562477, 'epoch': 0.04}
{'loss': 1.1751, 'learning_rate': 0.0001999920000377185, 'epoch': 0.05}
{'loss': 1.1843, 'learning_rate': 0.00019999047940786073, 'epoch': 0.05}
{'loss': 1.1122, 'learning_rate': 0.00019998882655806224, 'epoch': 0.05}
{'loss': 1.1952, 'learning_rate': 0.00019998704149050864, 'epoch': 0.05}
{'loss': 1.1519, 'learning_rate': 0.00019998512420756032, 'epoch': 0.05}
{'loss': 1.1951, 'learning_rate': 0.00019998307471175264, 'epoch': 0.05}
{'loss': 1.188, 'learning_rate': 0.00019998089300579558, 'epoch': 0.05}
{'loss': 1.2015, 'learning_rate': 0.0001999785790925742, 'epoch': 0.06}
{'loss': 1.1764, 'learning_rate': 0.00019997613297514816, 'epoch': 0.06}
{'loss': 1.0991, 'learning_rate': 0.00019997355465675205, 'epoch': 0.06}
{'loss': 1.1394, 'learning_rate': 0.0001999708441407952, 'epoch': 0.06}
{'loss': 1.1221, 'learning_rate': 0.00019996800143086188, 'epoch': 0.06}
{'loss': 1.1235, 'learning_rate': 0.000199965026530711, 'epoch': 0.06}
{'loss': 1.2169, 'learning_rate': 0.00019996191944427638, 'epoch': 0.06}
{'loss': 1.1213, 'learning_rate': 0.0001999586801756666, 'epoch': 0.07}
{'loss': 1.1409, 'learning_rate': 0.00019995530872916501, 'epoch': 0.07}
{'loss': 1.2117, 'learning_rate': 0.0001999518051092298, 'epoch': 0.07}
{'loss': 1.137, 'learning_rate': 0.00019994816932049383, 'epoch': 0.07}
{'loss': 1.1152, 'learning_rate': 0.00019994440136776484, 'epoch': 0.07}
{'loss': 1.1461, 'learning_rate': 0.0001999405012560253, 'epoch': 0.07}
{'loss': 1.1633, 'learning_rate': 0.00019993646899043238, 'epoch': 0.07}
{'loss': 1.0886, 'learning_rate': 0.0001999323045763181, 'epoch': 0.08}
{'loss': 1.0954, 'learning_rate': 0.00019992800801918914, 'epoch': 0.08}
{'loss': 1.1329, 'learning_rate': 0.00019992357932472693, 'epoch': 0.08}
{'loss': 1.1167, 'learning_rate': 0.00019991901849878766, 'epoch': 0.08}
{'loss': 1.0483, 'learning_rate': 0.00019991432554740225, 'epoch': 0.08}
{'loss': 1.0399, 'learning_rate': 0.0001999095004767763, 'epoch': 0.08}
{'loss': 1.1457, 'learning_rate': 0.0001999045432932901, 'epoch': 0.08}
{'loss': 1.1898, 'learning_rate': 0.00019989945400349866, 'epoch': 0.09}
{'loss': 1.0845, 'learning_rate': 0.0001998942326141317, 'epoch': 0.09}
{'loss': 1.1339, 'learning_rate': 0.00019988887913209355, 'epoch': 0.09}
{'loss': 1.1488, 'learning_rate': 0.00019988339356446334, 'epoch': 0.09}
{'loss': 1.1876, 'learning_rate': 0.00019987777591849468, 'epoch': 0.09}
{'loss': 1.1047, 'learning_rate': 0.000199872026201616, 'epoch': 0.09}
{'loss': 1.1951, 'learning_rate': 0.00019986614442143023, 'epoch': 0.09}
{'loss': 1.0934, 'learning_rate': 0.00019986013058571504, 'epoch': 0.1}
{'loss': 1.1218, 'learning_rate': 0.00019985398470242268, 'epoch': 0.1}
{'loss': 1.0293, 'learning_rate': 0.00019984770677968, 'epoch': 0.1}
{'loss': 1.1087, 'learning_rate': 0.00019984129682578842, 'epoch': 0.1}
{'loss': 1.1243, 'learning_rate': 0.00019983475484922406, 'epoch': 0.1}
{'loss': 1.1438, 'learning_rate': 0.00019982808085863745, 'epoch': 0.1}
{'loss': 1.1484, 'learning_rate': 0.00019982127486285384, 'epoch': 0.1}
{'loss': 1.0461, 'learning_rate': 0.00019981433687087295, 'epoch': 0.11}
{'loss': 1.142, 'learning_rate': 0.00019980726689186907, 'epoch': 0.11}
{'loss': 1.0223, 'learning_rate': 0.000199800064935191, 'epoch': 0.11}
{'loss': 1.0888, 'learning_rate': 0.0001997927310103621, 'epoch': 0.11}
{'loss': 1.1318, 'learning_rate': 0.00019978526512708013, 'epoch': 0.11}
{'loss': 0.9538, 'learning_rate': 0.00019977766729521753, 'epoch': 0.11}
{'loss': 1.1368, 'learning_rate': 0.000199769937524821, 'epoch': 0.11}
{'loss': 1.1407, 'learning_rate': 0.00019976207582611189, 'epoch': 0.12}
{'loss': 1.1756, 'learning_rate': 0.00019975408220948584, 'epoch': 0.12}
{'loss': 1.1115, 'learning_rate': 0.0001997459566855131, 'epoch': 0.12}
{'loss': 1.0728, 'learning_rate': 0.0001997376992649382, 'epoch': 0.12}
{'loss': 1.1101, 'learning_rate': 0.00019972930995868014, 'epoch': 0.12}
{'loss': 1.0907, 'learning_rate': 0.00019972078877783232, 'epoch': 0.12}
{'loss': 1.1709, 'learning_rate': 0.0001997121357336625, 'epoch': 0.12}
{'loss': 1.08, 'learning_rate': 0.0001997033508376129, 'epoch': 0.13}
{'loss': 1.1698, 'learning_rate': 0.0001996944341012999, 'epoch': 0.13}
{'loss': 1.1646, 'learning_rate': 0.00019968538553651437, 'epoch': 0.13}
{'loss': 1.1041, 'learning_rate': 0.00019967620515522146, 'epoch': 0.13}
{'loss': 1.244, 'learning_rate': 0.00019966689296956064, 'epoch': 0.13}
{'loss': 1.2206, 'learning_rate': 0.0001996574489918456, 'epoch': 0.13}
{'loss': 1.0939, 'learning_rate': 0.00019964787323456436, 'epoch': 0.14}
{'loss': 1.0881, 'learning_rate': 0.00019963816571037923, 'epoch': 0.14}
{'loss': 1.1767, 'learning_rate': 0.00019962832643212667, 'epoch': 0.14}
{'loss': 1.0859, 'learning_rate': 0.00019961835541281746, 'epoch': 0.14}
{'loss': 1.1681, 'learning_rate': 0.00019960825266563648, 'epoch': 0.14}
{'loss': 1.0445, 'learning_rate': 0.00019959801820394285, 'epoch': 0.14}
{'loss': 1.1215, 'learning_rate': 0.00019958765204126987, 'epoch': 0.14}
{'loss': 1.0394, 'learning_rate': 0.00019957715419132498, 'epoch': 0.15}
{'loss': 1.1211, 'learning_rate': 0.00019956652466798978, 'epoch': 0.15}
{'loss': 1.1266, 'learning_rate': 0.00019955576348531994, 'epoch': 0.15}
{'loss': 1.1694, 'learning_rate': 0.00019954487065754518, 'epoch': 0.15}
{'loss': 1.0359, 'learning_rate': 0.00019953384619906945, 'epoch': 0.15}
{'loss': 1.107, 'learning_rate': 0.00019952269012447064, 'epoch': 0.15}
{'loss': 1.0611, 'learning_rate': 0.0001995114024485007, 'epoch': 0.15}
{'loss': 1.0891, 'learning_rate': 0.00019949998318608561, 'epoch': 0.16}
{'loss': 1.093, 'learning_rate': 0.00019948843235232535, 'epoch': 0.16}
{'loss': 1.1103, 'learning_rate': 0.00019947674996249393, 'epoch': 0.16}
{'loss': 1.0403, 'learning_rate': 0.00019946493603203918, 'epoch': 0.16}
{'loss': 1.1186, 'learning_rate': 0.000199452990576583, 'epoch': 0.16}
{'loss': 0.9921, 'learning_rate': 0.0001994409136119212, 'epoch': 0.16}
{'loss': 1.1533, 'learning_rate': 0.00019942870515402345, 'epoch': 0.16}
{'loss': 1.0887, 'learning_rate': 0.00019941636521903321, 'epoch': 0.17}
{'loss': 1.0476, 'learning_rate': 0.00019940389382326802, 'epoch': 0.17}
{'loss': 1.0209, 'learning_rate': 0.00019939129098321904, 'epoch': 0.17}
{'loss': 1.1392, 'learning_rate': 0.00019937855671555132, 'epoch': 0.17}
{'loss': 1.089, 'learning_rate': 0.00019936569103710377, 'epoch': 0.17}
{'loss': 1.1601, 'learning_rate': 0.00019935269396488894, 'epoch': 0.17}
{'loss': 1.1188, 'learning_rate': 0.00019933956551609322, 'epoch': 0.17}
{'loss': 1.0759, 'learning_rate': 0.00019932630570807666, 'epoch': 0.18}
{'loss': 1.0615, 'learning_rate': 0.00019931291455837306, 'epoch': 0.18}
{'loss': 1.0169, 'learning_rate': 0.00019929939208468991, 'epoch': 0.18}
{'loss': 1.0956, 'learning_rate': 0.00019928573830490826, 'epoch': 0.18}
{'loss': 1.1011, 'learning_rate': 0.0001992719532370829, 'epoch': 0.18}
{'loss': 1.0581, 'learning_rate': 0.00019925803689944212, 'epoch': 0.18}
{'loss': 1.2119, 'learning_rate': 0.00019924398931038786, 'epoch': 0.18}
{'loss': 0.9725, 'learning_rate': 0.00019922981048849564, 'epoch': 0.19}
{'loss': 1.1215, 'learning_rate': 0.00019921550045251443, 'epoch': 0.19}
{'loss': 1.069, 'learning_rate': 0.00019920105922136678, 'epoch': 0.19}
{'loss': 1.0962, 'learning_rate': 0.00019918648681414868, 'epoch': 0.19}
{'loss': 1.0503, 'learning_rate': 0.00019917178325012963, 'epoch': 0.19}
{'loss': 1.1766, 'learning_rate': 0.00019915694854875246, 'epoch': 0.19}
{'loss': 1.1211, 'learning_rate': 0.00019914198272963352, 'epoch': 0.19}
{'loss': 1.099, 'learning_rate': 0.00019912688581256248, 'epoch': 0.2}
{'loss': 1.2161, 'learning_rate': 0.00019911165781750237, 'epoch': 0.2}
{'loss': 1.1908, 'learning_rate': 0.00019909629876458954, 'epoch': 0.2}
{'loss': 1.0902, 'learning_rate': 0.00019908080867413368, 'epoch': 0.2}
{'loss': 1.136, 'learning_rate': 0.0001990651875666177, 'epoch': 0.2}
{'loss': 1.0682, 'learning_rate': 0.00019904943546269785, 'epoch': 0.2}
{'loss': 1.0492, 'learning_rate': 0.00019903355238320346, 'epoch': 0.2}
{'loss': 1.0702, 'learning_rate': 0.0001990175383491372, 'epoch': 0.21}
{'loss': 1.1384, 'learning_rate': 0.00019900139338167473, 'epoch': 0.21}
{'loss': 1.0154, 'learning_rate': 0.00019898511750216505, 'epoch': 0.21}
{'loss': 0.9933, 'learning_rate': 0.00019896871073213007, 'epoch': 0.21}
{'loss': 1.109, 'learning_rate': 0.000198952173093265, 'epoch': 0.21}
{'loss': 1.0841, 'learning_rate': 0.00019893550460743788, 'epoch': 0.21}
{'loss': 1.0595, 'learning_rate': 0.0001989187052966899, 'epoch': 0.22}
{'loss': 1.0745, 'learning_rate': 0.0001989017751832352, 'epoch': 0.22}
{'loss': 1.1012, 'learning_rate': 0.00019888471428946094, 'epoch': 0.22}
{'loss': 1.1249, 'learning_rate': 0.00019886752263792714, 'epoch': 0.22}
{'loss': 1.1229, 'learning_rate': 0.00019885020025136677, 'epoch': 0.22}
{'loss': 1.0904, 'learning_rate': 0.00019883274715268564, 'epoch': 0.22}
{'loss': 1.1373, 'learning_rate': 0.00019881516336496243, 'epoch': 0.22}
{'loss': 1.1407, 'learning_rate': 0.00019879744891144864, 'epoch': 0.23}
{'loss': 0.9957, 'learning_rate': 0.0001987796038155685, 'epoch': 0.23}
{'loss': 1.2211, 'learning_rate': 0.00019876162810091908, 'epoch': 0.23}
{'loss': 1.1074, 'learning_rate': 0.00019874352179127014, 'epoch': 0.23}
{'loss': 1.1739, 'learning_rate': 0.00019872528491056405, 'epoch': 0.23}
{'loss': 1.1182, 'learning_rate': 0.0001987069174829159, 'epoch': 0.23}
{'loss': 1.127, 'learning_rate': 0.0001986884195326135, 'epoch': 0.23}
{'loss': 1.0611, 'learning_rate': 0.000198669791084117, 'epoch': 0.24}
{'loss': 1.1211, 'learning_rate': 0.0001986510321620594, 'epoch': 0.24}
{'loss': 1.1486, 'learning_rate': 0.00019863214279124608, 'epoch': 0.24}
{'loss': 1.1382, 'learning_rate': 0.0001986131229966549, 'epoch': 0.24}
{'loss': 1.1212, 'learning_rate': 0.0001985939728034362, 'epoch': 0.24}
{'loss': 1.0609, 'learning_rate': 0.00019857469223691276, 'epoch': 0.24}
{'loss': 1.1094, 'learning_rate': 0.00019855528132257984, 'epoch': 0.24}
{'loss': 1.1383, 'learning_rate': 0.0001985357400861049, 'epoch': 0.25}
{'loss': 1.0304, 'learning_rate': 0.00019851606855332787, 'epoch': 0.25}
{'loss': 1.0512, 'learning_rate': 0.00019849626675026087, 'epoch': 0.25}
{'loss': 1.1373, 'learning_rate': 0.00019847633470308833, 'epoch': 0.25}
  6%|███████                                                                                                          | 172/2752 [02:57<43:30,  1.01s/it]
[2023-12-29 02:06:23,389] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:23,389] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:23,389] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:23,390] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:23,390] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:23,390] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:23,390] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:23,390] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:23,391] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:23,391] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:23,391] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:23,391] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:23,391] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:23,391] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:23,391] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:23,391] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:23,643] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:23,644] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:23,644] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:23,645] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:23,906] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:23,907] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:24,168] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:24,168] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:24,419] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:24,420] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:24,701] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:24,702] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:24,964] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:24,964] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:25,216] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:25,217] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:25,477] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:25,478] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:25,737] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:25,738] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:26,006] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:26,007] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:26,271] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:26,271] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:26,533] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:06:26,533] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
{'eval_loss': 1.08314847946167, 'eval_runtime': 3.155, 'eval_samples_per_second': 346.12, 'eval_steps_per_second': 21.87, 'epoch': 0.25}
{'loss': 1.0455, 'learning_rate': 0.00019845627243816693, 'epoch': 0.25}
{'loss': 1.1284, 'learning_rate': 0.0001984360799820255, 'epoch': 0.25}
{'loss': 1.1126, 'learning_rate': 0.00019841575736136502, 'epoch': 0.25}
{'loss': 1.0239, 'learning_rate': 0.00019839530460305862, 'epoch': 0.26}
{'loss': 1.1152, 'learning_rate': 0.00019837472173415147, 'epoch': 0.26}
{'loss': 1.1221, 'learning_rate': 0.0001983540087818609, 'epoch': 0.26}
{'loss': 1.0601, 'learning_rate': 0.00019833316577357607, 'epoch': 0.26}
{'loss': 1.1294, 'learning_rate': 0.00019831219273685826, 'epoch': 0.26}
{'loss': 1.0331, 'learning_rate': 0.00019829108969944068, 'epoch': 0.26}
{'loss': 1.0731, 'learning_rate': 0.00019826985668922834, 'epoch': 0.26}
{'loss': 1.0465, 'learning_rate': 0.00019824849373429825, 'epoch': 0.27}
{'loss': 1.0436, 'learning_rate': 0.00019822700086289915, 'epoch': 0.27}
{'loss': 1.0638, 'learning_rate': 0.00019820537810345164, 'epoch': 0.27}
{'loss': 1.0589, 'learning_rate': 0.000198183625484548, 'epoch': 0.27}
{'loss': 1.1969, 'learning_rate': 0.0001981617430349523, 'epoch': 0.27}
{'loss': 1.0533, 'learning_rate': 0.00019813973078360025, 'epoch': 0.27}
{'loss': 0.9988, 'learning_rate': 0.0001981175887595992, 'epoch': 0.27}
{'loss': 1.0979, 'learning_rate': 0.0001980953169922281, 'epoch': 0.28}
{'loss': 1.085, 'learning_rate': 0.00019807291551093747, 'epoch': 0.28}
{'loss': 1.1527, 'learning_rate': 0.0001980503843453494, 'epoch': 0.28}
{'loss': 1.1165, 'learning_rate': 0.00019802772352525735, 'epoch': 0.28}
{'loss': 1.0262, 'learning_rate': 0.00019800493308062635, 'epoch': 0.28}
{'loss': 1.052, 'learning_rate': 0.00019798201304159282, 'epoch': 0.28}
{'loss': 1.1547, 'learning_rate': 0.00019795896343846437, 'epoch': 0.28}
{'loss': 1.1672, 'learning_rate': 0.00019793578430172022, 'epoch': 0.29}
{'loss': 1.1193, 'learning_rate': 0.00019791247566201063, 'epoch': 0.29}
{'loss': 1.0842, 'learning_rate': 0.00019788903755015724, 'epoch': 0.29}
{'loss': 1.0412, 'learning_rate': 0.00019786546999715285, 'epoch': 0.29}
{'loss': 1.0645, 'learning_rate': 0.00019784177303416148, 'epoch': 0.29}
{'loss': 1.0191, 'learning_rate': 0.00019781794669251817, 'epoch': 0.29}
{'loss': 1.0915, 'learning_rate': 0.0001977939910037291, 'epoch': 0.3}
{'loss': 1.0264, 'learning_rate': 0.00019776990599947147, 'epoch': 0.3}
{'loss': 1.0597, 'learning_rate': 0.00019774569171159353, 'epoch': 0.3}
{'loss': 1.0705, 'learning_rate': 0.00019772134817211442, 'epoch': 0.3}
{'loss': 1.0304, 'learning_rate': 0.00019769687541322422, 'epoch': 0.3}
{'loss': 1.1003, 'learning_rate': 0.00019767227346728392, 'epoch': 0.3}
{'loss': 1.0781, 'learning_rate': 0.00019764754236682524, 'epoch': 0.3}
{'loss': 1.1305, 'learning_rate': 0.00019762268214455072, 'epoch': 0.31}
{'loss': 1.1008, 'learning_rate': 0.00019759769283333377, 'epoch': 0.31}
{'loss': 1.075, 'learning_rate': 0.00019757257446621827, 'epoch': 0.31}
{'loss': 1.0276, 'learning_rate': 0.0001975473270764189, 'epoch': 0.31}
{'loss': 1.1116, 'learning_rate': 0.000197521950697321, 'epoch': 0.31}
{'loss': 1.107, 'learning_rate': 0.00019749644536248031, 'epoch': 0.31}
{'loss': 1.1041, 'learning_rate': 0.00019747081110562322, 'epoch': 0.31}
{'loss': 1.183, 'learning_rate': 0.00019744504796064653, 'epoch': 0.32}
{'loss': 1.1011, 'learning_rate': 0.00019741915596161756, 'epoch': 0.32}
{'loss': 1.011, 'learning_rate': 0.00019739313514277384, 'epoch': 0.32}
{'loss': 1.0174, 'learning_rate': 0.0001973669855385235, 'epoch': 0.32}
{'loss': 1.0124, 'learning_rate': 0.00019734070718344468, 'epoch': 0.32}
{'loss': 1.0547, 'learning_rate': 0.00019731430011228604, 'epoch': 0.32}
{'loss': 1.0363, 'learning_rate': 0.00019728776435996625, 'epoch': 0.32}
{'loss': 1.1824, 'learning_rate': 0.00019726109996157424, 'epoch': 0.33}
{'loss': 1.0507, 'learning_rate': 0.00019723430695236895, 'epoch': 0.33}
{'loss': 1.1741, 'learning_rate': 0.00019720738536777951, 'epoch': 0.33}
{'loss': 1.0863, 'learning_rate': 0.00019718033524340504, 'epoch': 0.33}
{'loss': 1.1066, 'learning_rate': 0.0001971531566150145, 'epoch': 0.33}
{'loss': 1.1112, 'learning_rate': 0.00019712584951854701, 'epoch': 0.33}
{'loss': 1.0474, 'learning_rate': 0.0001970984139901114, 'epoch': 0.33}
{'loss': 1.0878, 'learning_rate': 0.00019707085006598628, 'epoch': 0.34}
{'loss': 1.1319, 'learning_rate': 0.00019704315778262016, 'epoch': 0.34}
{'loss': 1.1191, 'learning_rate': 0.00019701533717663133, 'epoch': 0.34}
{'loss': 1.0319, 'learning_rate': 0.00019698738828480758, 'epoch': 0.34}
{'loss': 1.0866, 'learning_rate': 0.00019695931114410646, 'epoch': 0.34}
{'loss': 1.0531, 'learning_rate': 0.00019693110579165513, 'epoch': 0.34}
{'loss': 0.9935, 'learning_rate': 0.0001969027722647502, 'epoch': 0.34}
{'loss': 1.0671, 'learning_rate': 0.0001968743106008578, 'epoch': 0.35}
{'loss': 1.0987, 'learning_rate': 0.00019684572083761352, 'epoch': 0.35}
{'loss': 1.1184, 'learning_rate': 0.00019681700301282234, 'epoch': 0.35}
{'loss': 1.1598, 'learning_rate': 0.00019678815716445857, 'epoch': 0.35}
{'loss': 1.0762, 'learning_rate': 0.0001967591833306658, 'epoch': 0.35}
{'loss': 0.9739, 'learning_rate': 0.00019673008154975685, 'epoch': 0.35}
{'loss': 0.9894, 'learning_rate': 0.00019670085186021375, 'epoch': 0.35}
{'loss': 1.0197, 'learning_rate': 0.00019667149430068766, 'epoch': 0.36}
{'loss': 1.0673, 'learning_rate': 0.00019664200890999882, 'epoch': 0.36}
{'loss': 1.0144, 'learning_rate': 0.0001966123957271365, 'epoch': 0.36}
{'loss': 1.0572, 'learning_rate': 0.000196582654791259, 'epoch': 0.36}
{'loss': 1.1283, 'learning_rate': 0.00019655278614169345, 'epoch': 0.36}
{'loss': 1.0792, 'learning_rate': 0.00019652278981793596, 'epoch': 0.36}
{'loss': 1.1433, 'learning_rate': 0.00019649266585965145, 'epoch': 0.36}
{'loss': 1.0699, 'learning_rate': 0.00019646241430667353, 'epoch': 0.37}
{'loss': 1.0948, 'learning_rate': 0.00019643203519900465, 'epoch': 0.37}
{'loss': 1.1015, 'learning_rate': 0.0001964015285768158, 'epoch': 0.37}
{'loss': 1.0478, 'learning_rate': 0.00019637089448044676, 'epoch': 0.37}
{'loss': 1.1033, 'learning_rate': 0.0001963401329504057, 'epoch': 0.37}
{'loss': 1.0759, 'learning_rate': 0.0001963092440273694, 'epoch': 0.37}
{'loss': 1.1169, 'learning_rate': 0.00019627822775218303, 'epoch': 0.38}
{'loss': 1.092, 'learning_rate': 0.00019624708416586021, 'epoch': 0.38}
{'loss': 1.0935, 'learning_rate': 0.00019621581330958295, 'epoch': 0.38}
{'loss': 1.1252, 'learning_rate': 0.0001961844152247014, 'epoch': 0.38}
{'loss': 0.9732, 'learning_rate': 0.00019615288995273412, 'epoch': 0.38}
{'loss': 1.0818, 'learning_rate': 0.0001961212375353677, 'epoch': 0.38}
{'loss': 1.0424, 'learning_rate': 0.000196089458014457, 'epoch': 0.38}
{'loss': 1.0514, 'learning_rate': 0.00019605755143202488, 'epoch': 0.39}
{'loss': 1.1457, 'learning_rate': 0.00019602551783026216, 'epoch': 0.39}
{'loss': 1.1511, 'learning_rate': 0.00019599335725152775, 'epoch': 0.39}
{'loss': 1.2227, 'learning_rate': 0.00019596106973834835, 'epoch': 0.39}
{'loss': 1.0812, 'learning_rate': 0.00019592865533341858, 'epoch': 0.39}
{'loss': 1.147, 'learning_rate': 0.0001958961140796008, 'epoch': 0.39}
{'loss': 1.2058, 'learning_rate': 0.00019586344601992515, 'epoch': 0.39}
{'loss': 1.0079, 'learning_rate': 0.0001958306511975895, 'epoch': 0.4}
{'loss': 1.1207, 'learning_rate': 0.00019579772965595918, 'epoch': 0.4}
{'loss': 1.0082, 'learning_rate': 0.00019576468143856719, 'epoch': 0.4}
{'loss': 1.1827, 'learning_rate': 0.00019573150658911404, 'epoch': 0.4}
{'loss': 1.0825, 'learning_rate': 0.00019569820515146768, 'epoch': 0.4}
{'loss': 1.0575, 'learning_rate': 0.00019566477716966344, 'epoch': 0.4}
{'loss': 1.0498, 'learning_rate': 0.000195631222687904, 'epoch': 0.4}
{'loss': 1.1522, 'learning_rate': 0.00019559754175055925, 'epoch': 0.41}
{'loss': 1.0967, 'learning_rate': 0.0001955637344021664, 'epoch': 0.41}
{'loss': 1.0513, 'learning_rate': 0.00019552980068742977, 'epoch': 0.41}
{'loss': 1.1326, 'learning_rate': 0.0001954957406512207, 'epoch': 0.41}
{'loss': 1.185, 'learning_rate': 0.0001954615543385777, 'epoch': 0.41}
{'loss': 1.183, 'learning_rate': 0.00019542724179470616, 'epoch': 0.41}
{'loss': 1.1387, 'learning_rate': 0.00019539280306497844, 'epoch': 0.41}
{'loss': 1.0883, 'learning_rate': 0.00019535823819493374, 'epoch': 0.42}
{'loss': 1.1656, 'learning_rate': 0.0001953235472302781, 'epoch': 0.42}
{'loss': 1.1537, 'learning_rate': 0.0001952887302168842, 'epoch': 0.42}
{'loss': 1.0822, 'learning_rate': 0.00019525378720079147, 'epoch': 0.42}
{'loss': 1.1122, 'learning_rate': 0.00019521871822820598, 'epoch': 0.42}
{'loss': 1.1109, 'learning_rate': 0.0001951835233455003, 'epoch': 0.42}
{'loss': 1.0737, 'learning_rate': 0.00019514820259921352, 'epoch': 0.42}
{'loss': 0.9906, 'learning_rate': 0.0001951127560360511, 'epoch': 0.43}
{'loss': 0.9589, 'learning_rate': 0.00019507718370288503, 'epoch': 0.43}
{'loss': 1.0871, 'learning_rate': 0.0001950414856467534, 'epoch': 0.43}
{'loss': 1.0871, 'learning_rate': 0.00019500566191486075, 'epoch': 0.43}
{'loss': 0.9745, 'learning_rate': 0.00019496971255457765, 'epoch': 0.43}
{'loss': 1.1144, 'learning_rate': 0.00019493363761344086, 'epoch': 0.43}
{'loss': 1.1213, 'learning_rate': 0.00019489743713915316, 'epoch': 0.43}
{'loss': 1.0394, 'learning_rate': 0.00019486111117958342, 'epoch': 0.44}
{'loss': 0.898, 'learning_rate': 0.0001948246597827663, 'epoch': 0.44}
{'loss': 1.1053, 'learning_rate': 0.00019478808299690247, 'epoch': 0.44}
{'loss': 1.1651, 'learning_rate': 0.0001947513808703583, 'epoch': 0.44}
{'loss': 1.0712, 'learning_rate': 0.00019471455345166595, 'epoch': 0.44}
{'loss': 1.0605, 'learning_rate': 0.00019467760078952325, 'epoch': 0.44}
{'loss': 1.0684, 'learning_rate': 0.00019464052293279363, 'epoch': 0.44}
{'loss': 1.0278, 'learning_rate': 0.00019460331993050609, 'epoch': 0.45}
{'loss': 1.1612, 'learning_rate': 0.00019456599183185507, 'epoch': 0.45}
{'loss': 1.1316, 'learning_rate': 0.0001945285386862005, 'epoch': 0.45}
{'loss': 1.1126, 'learning_rate': 0.00019449096054306763, 'epoch': 0.45}
{'loss': 1.0819, 'learning_rate': 0.00019445325745214695, 'epoch': 0.45}
{'loss': 1.0392, 'learning_rate': 0.00019441542946329422, 'epoch': 0.45}
{'loss': 1.1946, 'learning_rate': 0.0001943774766265304, 'epoch': 0.45}
{'loss': 1.1179, 'learning_rate': 0.00019433939899204142, 'epoch': 0.46}
{'loss': 1.0504, 'learning_rate': 0.0001943011966101783, 'epoch': 0.46}
{'loss': 1.0482, 'learning_rate': 0.00019426286953145704, 'epoch': 0.46}
{'loss': 1.0551, 'learning_rate': 0.0001942244178065585, 'epoch': 0.46}
{'loss': 1.1729, 'learning_rate': 0.00019418584148632836, 'epoch': 0.46}
{'loss': 1.0714, 'learning_rate': 0.00019414714062177712, 'epoch': 0.46}
{'loss': 1.0875, 'learning_rate': 0.00019410831526407984, 'epoch': 0.47}
{'loss': 1.1164, 'learning_rate': 0.0001940693654645763, 'epoch': 0.47}
{'loss': 1.139, 'learning_rate': 0.0001940302912747708, 'epoch': 0.47}
{'loss': 1.037, 'learning_rate': 0.00019399109274633215, 'epoch': 0.47}
{'loss': 1.0673, 'learning_rate': 0.00019395176993109356, 'epoch': 0.47}
{'loss': 1.0884, 'learning_rate': 0.00019391232288105254, 'epoch': 0.47}
{'loss': 1.0848, 'learning_rate': 0.00019387275164837098, 'epoch': 0.47}
{'loss': 1.1198, 'learning_rate': 0.00019383305628537485, 'epoch': 0.48}
{'loss': 1.1119, 'learning_rate': 0.0001937932368445544, 'epoch': 0.48}
{'loss': 1.0566, 'learning_rate': 0.00019375329337856383, 'epoch': 0.48}
{'loss': 1.0883, 'learning_rate': 0.0001937132259402214, 'epoch': 0.48}
{'loss': 1.0596, 'learning_rate': 0.00019367303458250938, 'epoch': 0.48}
{'loss': 1.0067, 'learning_rate': 0.00019363271935857372, 'epoch': 0.48}
{'loss': 1.0583, 'learning_rate': 0.00019359228032172433, 'epoch': 0.48}
{'loss': 1.1095, 'learning_rate': 0.00019355171752543472, 'epoch': 0.49}
{'loss': 1.0321, 'learning_rate': 0.00019351103102334212, 'epoch': 0.49}
{'loss': 1.0552, 'learning_rate': 0.00019347022086924732, 'epoch': 0.49}
{'loss': 0.9958, 'learning_rate': 0.00019342928711711465, 'epoch': 0.49}
{'loss': 1.0808, 'learning_rate': 0.0001933882298210718, 'epoch': 0.49}
{'loss': 1.1052, 'learning_rate': 0.0001933470490354099, 'epoch': 0.49}
{'loss': 1.1816, 'learning_rate': 0.00019330574481458333, 'epoch': 0.49}
{'loss': 1.182, 'learning_rate': 0.00019326431721320973, 'epoch': 0.5}
{'loss': 1.0482, 'learning_rate': 0.0001932227662860698, 'epoch': 0.5}
{'loss': 1.0726, 'learning_rate': 0.00019318109208810746, 'epoch': 0.5}
{'loss': 1.01, 'learning_rate': 0.00019313929467442952, 'epoch': 0.5}
 12%|██████████████▏                                                                                                  | 344/2752 [05:55<40:27,  1.01s/it]
[2023-12-29 02:09:20,839] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:20,839] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:20,840] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:20,840] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:20,840] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:20,840] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:20,840] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:20,840] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:20,840] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:20,840] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:20,841] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:20,841] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:20,841] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:20,841] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:20,841] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:20,842] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:21,095] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:21,096] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:21,096] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:21,097] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:21,358] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:21,358] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:21,618] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:21,618] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:21,869] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:21,870] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:22,152] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:22,152] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:22,417] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:22,418] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:22,670] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:22,670] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:22,931] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:22,932] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:23,193] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:23,194] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:23,462] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:23,463] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:23,725] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:23,726] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:23,989] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:09:23,990] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
{'eval_loss': 1.0399237871170044, 'eval_runtime': 3.1624, 'eval_samples_per_second': 345.306, 'eval_steps_per_second': 21.819, 'epoch': 0.5}
{'loss': 1.0084, 'learning_rate': 0.00019309737410030578, 'epoch': 0.5}
{'loss': 1.1143, 'learning_rate': 0.00019305533042116883, 'epoch': 0.5}
{'loss': 1.0768, 'learning_rate': 0.00019301316369261414, 'epoch': 0.5}
{'loss': 1.0763, 'learning_rate': 0.00019297087397039984, 'epoch': 0.51}
{'loss': 1.0445, 'learning_rate': 0.00019292846131044664, 'epoch': 0.51}
{'loss': 1.0717, 'learning_rate': 0.00019288592576883793, 'epoch': 0.51}
{'loss': 1.0391, 'learning_rate': 0.00019284326740181952, 'epoch': 0.51}
{'loss': 1.2133, 'learning_rate': 0.00019280048626579962, 'epoch': 0.51}
{'loss': 1.0577, 'learning_rate': 0.00019275758241734886, 'epoch': 0.51}
{'loss': 1.1106, 'learning_rate': 0.00019271455591320007, 'epoch': 0.51}
{'loss': 1.1, 'learning_rate': 0.0001926714068102483, 'epoch': 0.52}
{'loss': 0.9612, 'learning_rate': 0.0001926281351655506, 'epoch': 0.52}
{'loss': 1.123, 'learning_rate': 0.00019258474103632625, 'epoch': 0.52}
{'loss': 1.0484, 'learning_rate': 0.00019254122447995645, 'epoch': 0.52}
{'loss': 1.0206, 'learning_rate': 0.00019249758555398412, 'epoch': 0.52}
{'loss': 1.1084, 'learning_rate': 0.0001924538243161142, 'epoch': 0.52}
{'loss': 1.0974, 'learning_rate': 0.00019240994082421326, 'epoch': 0.52}
{'loss': 1.1584, 'learning_rate': 0.0001923659351363096, 'epoch': 0.53}
{'loss': 1.041, 'learning_rate': 0.00019232180731059293, 'epoch': 0.53}
{'loss': 1.1318, 'learning_rate': 0.0001922775574054147, 'epoch': 0.53}
{'loss': 1.087, 'learning_rate': 0.00019223318547928762, 'epoch': 0.53}
{'loss': 1.0942, 'learning_rate': 0.0001921886915908859, 'epoch': 0.53}
{'loss': 1.1136, 'learning_rate': 0.0001921440757990448, 'epoch': 0.53}
{'loss': 1.2016, 'learning_rate': 0.00019209933816276102, 'epoch': 0.53}
{'loss': 1.0318, 'learning_rate': 0.00019205447874119224, 'epoch': 0.54}
{'loss': 1.0923, 'learning_rate': 0.00019200949759365718, 'epoch': 0.54}
{'loss': 0.984, 'learning_rate': 0.00019196439477963556, 'epoch': 0.54}
{'loss': 1.1257, 'learning_rate': 0.00019191917035876798, 'epoch': 0.54}
{'loss': 1.2465, 'learning_rate': 0.00019187382439085586, 'epoch': 0.54}
{'loss': 1.0907, 'learning_rate': 0.00019182835693586127, 'epoch': 0.54}
{'loss': 0.9928, 'learning_rate': 0.00019178276805390703, 'epoch': 0.55}
{'loss': 1.099, 'learning_rate': 0.00019173705780527642, 'epoch': 0.55}
{'loss': 1.1329, 'learning_rate': 0.00019169122625041328, 'epoch': 0.55}
{'loss': 1.0504, 'learning_rate': 0.00019164527344992186, 'epoch': 0.55}
{'loss': 1.1264, 'learning_rate': 0.00019159919946456667, 'epoch': 0.55}
{'loss': 1.051, 'learning_rate': 0.00019155300435527256, 'epoch': 0.55}
{'loss': 1.0525, 'learning_rate': 0.0001915066881831244, 'epoch': 0.55}
{'loss': 1.0701, 'learning_rate': 0.00019146025100936733, 'epoch': 0.56}
{'loss': 0.9844, 'learning_rate': 0.00019141369289540637, 'epoch': 0.56}
{'loss': 1.1251, 'learning_rate': 0.00019136701390280644, 'epoch': 0.56}
{'loss': 1.044, 'learning_rate': 0.00019132021409329242, 'epoch': 0.56}
{'loss': 1.0764, 'learning_rate': 0.00019127329352874886, 'epoch': 0.56}
{'loss': 1.0345, 'learning_rate': 0.00019122625227122002, 'epoch': 0.56}
{'loss': 1.1272, 'learning_rate': 0.00019117909038290974, 'epoch': 0.56}
{'loss': 1.2045, 'learning_rate': 0.00019113180792618132, 'epoch': 0.57}
{'loss': 1.0651, 'learning_rate': 0.00019108440496355767, 'epoch': 0.57}
{'loss': 1.0704, 'learning_rate': 0.00019103688155772082, 'epoch': 0.57}
{'loss': 1.0859, 'learning_rate': 0.00019098923777151222, 'epoch': 0.57}
{'loss': 1.0957, 'learning_rate': 0.00019094147366793243, 'epoch': 0.57}
{'loss': 1.134, 'learning_rate': 0.00019089358931014114, 'epoch': 0.57}
{'loss': 1.1122, 'learning_rate': 0.00019084558476145706, 'epoch': 0.57}
{'loss': 1.1211, 'learning_rate': 0.00019079746008535784, 'epoch': 0.58}
{'loss': 1.0487, 'learning_rate': 0.0001907492153454799, 'epoch': 0.58}
{'loss': 1.0375, 'learning_rate': 0.00019070085060561852, 'epoch': 0.58}
{'loss': 1.0904, 'learning_rate': 0.0001906523659297276, 'epoch': 0.58}
{'loss': 1.1081, 'learning_rate': 0.0001906037613819197, 'epoch': 0.58}
{'loss': 1.0338, 'learning_rate': 0.00019055503702646576, 'epoch': 0.58}
{'loss': 1.1025, 'learning_rate': 0.0001905061929277953, 'epoch': 0.58}
{'loss': 1.0348, 'learning_rate': 0.00019045722915049607, 'epoch': 0.59}
{'loss': 1.1084, 'learning_rate': 0.00019040814575931413, 'epoch': 0.59}
{'loss': 1.0338, 'learning_rate': 0.00019035894281915368, 'epoch': 0.59}
{'loss': 1.0363, 'learning_rate': 0.00019030962039507702, 'epoch': 0.59}
{'loss': 1.0279, 'learning_rate': 0.00019026017855230444, 'epoch': 0.59}
{'loss': 1.0918, 'learning_rate': 0.00019021061735621412, 'epoch': 0.59}
{'loss': 1.0521, 'learning_rate': 0.0001901609368723421, 'epoch': 0.59}
{'loss': 1.0885, 'learning_rate': 0.00019011113716638217, 'epoch': 0.6}
{'loss': 1.1895, 'learning_rate': 0.00019006121830418565, 'epoch': 0.6}
{'loss': 1.226, 'learning_rate': 0.00019001118035176162, 'epoch': 0.6}
{'loss': 1.018, 'learning_rate': 0.00018996102337527648, 'epoch': 0.6}
{'loss': 1.1071, 'learning_rate': 0.00018991074744105407, 'epoch': 0.6}
{'loss': 1.0587, 'learning_rate': 0.00018986035261557552, 'epoch': 0.6}
{'loss': 1.0012, 'learning_rate': 0.0001898098389654792, 'epoch': 0.6}
{'loss': 1.0688, 'learning_rate': 0.0001897592065575606, 'epoch': 0.61}
{'loss': 1.0061, 'learning_rate': 0.0001897084554587722, 'epoch': 0.61}
{'loss': 1.1045, 'learning_rate': 0.0001896575857362235, 'epoch': 0.61}
{'loss': 1.2132, 'learning_rate': 0.0001896065974571808, 'epoch': 0.61}
{'loss': 1.174, 'learning_rate': 0.00018955549068906717, 'epoch': 0.61}
{'loss': 1.0624, 'learning_rate': 0.0001895042654994624, 'epoch': 0.61}
{'loss': 1.1386, 'learning_rate': 0.00018945292195610288, 'epoch': 0.61}
{'loss': 1.0884, 'learning_rate': 0.00018940146012688146, 'epoch': 0.62}
{'loss': 1.0157, 'learning_rate': 0.0001893498800798474, 'epoch': 0.62}
{'loss': 1.0452, 'learning_rate': 0.00018929818188320635, 'epoch': 0.62}
{'loss': 1.1981, 'learning_rate': 0.00018924636560532006, 'epoch': 0.62}
{'loss': 1.0887, 'learning_rate': 0.00018919443131470658, 'epoch': 0.62}
{'loss': 1.0295, 'learning_rate': 0.0001891423790800399, 'epoch': 0.62}
{'loss': 1.0788, 'learning_rate': 0.00018909020897015004, 'epoch': 0.62}
{'loss': 1.0976, 'learning_rate': 0.00018903792105402282, 'epoch': 0.63}
{'loss': 1.0467, 'learning_rate': 0.00018898551540079989, 'epoch': 0.63}
{'loss': 1.0741, 'learning_rate': 0.00018893299207977857, 'epoch': 0.63}
{'loss': 1.1081, 'learning_rate': 0.00018888035116041175, 'epoch': 0.63}
{'loss': 1.1227, 'learning_rate': 0.0001888275927123079, 'epoch': 0.63}
{'loss': 1.0104, 'learning_rate': 0.00018877471680523082, 'epoch': 0.63}
{'loss': 1.1211, 'learning_rate': 0.00018872172350909968, 'epoch': 0.64}
{'loss': 1.0695, 'learning_rate': 0.00018866861289398883, 'epoch': 0.64}
{'loss': 1.0582, 'learning_rate': 0.0001886153850301278, 'epoch': 0.64}
{'loss': 1.0773, 'learning_rate': 0.00018856203998790112, 'epoch': 0.64}
{'loss': 1.0354, 'learning_rate': 0.0001885085778378483, 'epoch': 0.64}
{'loss': 1.0898, 'learning_rate': 0.00018845499865066372, 'epoch': 0.64}
{'loss': 1.1125, 'learning_rate': 0.00018840130249719644, 'epoch': 0.64}
{'loss': 1.0007, 'learning_rate': 0.00018834748944845028, 'epoch': 0.65}
{'loss': 1.0058, 'learning_rate': 0.00018829355957558362, 'epoch': 0.65}
{'loss': 1.1075, 'learning_rate': 0.00018823951294990923, 'epoch': 0.65}
{'loss': 1.1168, 'learning_rate': 0.0001881853496428944, 'epoch': 0.65}
{'loss': 1.0664, 'learning_rate': 0.00018813106972616055, 'epoch': 0.65}
{'loss': 1.0699, 'learning_rate': 0.00018807667327148345, 'epoch': 0.65}
{'loss': 1.0609, 'learning_rate': 0.00018802216035079293, 'epoch': 0.65}
{'loss': 1.0355, 'learning_rate': 0.00018796753103617278, 'epoch': 0.66}
{'loss': 1.0838, 'learning_rate': 0.0001879127853998607, 'epoch': 0.66}
{'loss': 1.1292, 'learning_rate': 0.00018785792351424827, 'epoch': 0.66}
{'loss': 1.2015, 'learning_rate': 0.0001878029454518807, 'epoch': 0.66}
{'loss': 1.0179, 'learning_rate': 0.00018774785128545694, 'epoch': 0.66}
{'loss': 1.0756, 'learning_rate': 0.00018769264108782933, 'epoch': 0.66}
{'loss': 1.0685, 'learning_rate': 0.00018763731493200375, 'epoch': 0.66}
{'loss': 1.1266, 'learning_rate': 0.00018758187289113937, 'epoch': 0.67}
{'loss': 1.0647, 'learning_rate': 0.00018752631503854864, 'epoch': 0.67}
{'loss': 1.1856, 'learning_rate': 0.00018747064144769703, 'epoch': 0.67}
{'loss': 1.0502, 'learning_rate': 0.0001874148521922032, 'epoch': 0.67}
{'loss': 1.0987, 'learning_rate': 0.00018735894734583867, 'epoch': 0.67}
{'loss': 0.9963, 'learning_rate': 0.00018730292698252785, 'epoch': 0.67}
{'loss': 1.1398, 'learning_rate': 0.0001872467911763479, 'epoch': 0.67}
{'loss': 1.0785, 'learning_rate': 0.00018719054000152855, 'epoch': 0.68}
{'loss': 1.0632, 'learning_rate': 0.00018713417353245223, 'epoch': 0.68}
{'loss': 1.0651, 'learning_rate': 0.00018707769184365367, 'epoch': 0.68}
{'loss': 1.0689, 'learning_rate': 0.0001870210950098201, 'epoch': 0.68}
{'loss': 1.0895, 'learning_rate': 0.00018696438310579093, 'epoch': 0.68}
{'loss': 1.0064, 'learning_rate': 0.00018690755620655774, 'epoch': 0.68}
{'loss': 1.0768, 'learning_rate': 0.00018685061438726414, 'epoch': 0.68}
{'loss': 1.0823, 'learning_rate': 0.00018679355772320585, 'epoch': 0.69}
{'loss': 1.0314, 'learning_rate': 0.00018673638628983018, 'epoch': 0.69}
{'loss': 1.0868, 'learning_rate': 0.00018667910016273648, 'epoch': 0.69}
{'loss': 1.0713, 'learning_rate': 0.00018662169941767562, 'epoch': 0.69}
{'loss': 1.1126, 'learning_rate': 0.00018656418413055007, 'epoch': 0.69}
{'loss': 1.1445, 'learning_rate': 0.00018650655437741368, 'epoch': 0.69}
{'loss': 1.0327, 'learning_rate': 0.00018644881023447177, 'epoch': 0.69}
{'loss': 1.0334, 'learning_rate': 0.00018639095177808095, 'epoch': 0.7}
{'loss': 1.0707, 'learning_rate': 0.0001863329790847488, 'epoch': 0.7}
{'loss': 1.1176, 'learning_rate': 0.00018627489223113422, 'epoch': 0.7}
{'loss': 0.9961, 'learning_rate': 0.0001862166912940468, 'epoch': 0.7}
{'loss': 1.1016, 'learning_rate': 0.00018615837635044716, 'epoch': 0.7}
{'loss': 1.1082, 'learning_rate': 0.0001860999474774466, 'epoch': 0.7}
{'loss': 1.1234, 'learning_rate': 0.00018604140475230715, 'epoch': 0.7}
{'loss': 1.0627, 'learning_rate': 0.0001859827482524413, 'epoch': 0.71}
{'loss': 1.0814, 'learning_rate': 0.00018592397805541205, 'epoch': 0.71}
{'loss': 1.0698, 'learning_rate': 0.00018586509423893267, 'epoch': 0.71}
{'loss': 1.0616, 'learning_rate': 0.00018580609688086678, 'epoch': 0.71}
{'loss': 1.0523, 'learning_rate': 0.000185746986059228, 'epoch': 0.71}
{'loss': 1.1709, 'learning_rate': 0.00018568776185218016, 'epoch': 0.71}
{'loss': 1.0101, 'learning_rate': 0.00018562842433803687, 'epoch': 0.72}
{'loss': 1.178, 'learning_rate': 0.00018556897359526162, 'epoch': 0.72}
{'loss': 1.0737, 'learning_rate': 0.0001855094097024677, 'epoch': 0.72}
{'loss': 1.0922, 'learning_rate': 0.00018544973273841784, 'epoch': 0.72}
{'loss': 1.088, 'learning_rate': 0.00018538994278202448, 'epoch': 0.72}
{'loss': 1.0477, 'learning_rate': 0.00018533003991234937, 'epoch': 0.72}
{'loss': 1.0246, 'learning_rate': 0.00018527002420860362, 'epoch': 0.72}
{'loss': 1.0326, 'learning_rate': 0.00018520989575014746, 'epoch': 0.73}
{'loss': 1.0096, 'learning_rate': 0.0001851496546164903, 'epoch': 0.73}
{'loss': 1.1709, 'learning_rate': 0.00018508930088729052, 'epoch': 0.73}
{'loss': 1.0801, 'learning_rate': 0.0001850288346423554, 'epoch': 0.73}
{'loss': 1.0792, 'learning_rate': 0.00018496825596164094, 'epoch': 0.73}
{'loss': 0.9567, 'learning_rate': 0.00018490756492525187, 'epoch': 0.73}
{'loss': 1.0395, 'learning_rate': 0.0001848467616134415, 'epoch': 0.73}
{'loss': 1.0034, 'learning_rate': 0.0001847858461066116, 'epoch': 0.74}
{'loss': 1.1287, 'learning_rate': 0.00018472481848531226, 'epoch': 0.74}
{'loss': 1.0392, 'learning_rate': 0.00018466367883024186, 'epoch': 0.74}
{'loss': 1.0144, 'learning_rate': 0.00018460242722224694, 'epoch': 0.74}
{'loss': 1.1198, 'learning_rate': 0.00018454106374232197, 'epoch': 0.74}
{'loss': 1.0975, 'learning_rate': 0.00018447958847160953, 'epoch': 0.74}
{'loss': 1.0869, 'learning_rate': 0.00018441800149139988, 'epoch': 0.74}
{'loss': 0.9778, 'learning_rate': 0.000184356302883131, 'epoch': 0.75}
{'loss': 1.0177, 'learning_rate': 0.0001842944927283886, 'epoch': 0.75}
{'loss': 1.0183, 'learning_rate': 0.00018423257110890574, 'epoch': 0.75}
{'loss': 1.11, 'learning_rate': 0.00018417053810656302, 'epoch': 0.75}
 19%|█████████████████████▏                                                                                           | 516/2752 [08:53<37:43,  1.01s/it]
[2023-12-29 02:12:18,530] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:18,530] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:18,530] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:18,530] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:18,531] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:18,531] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:18,531] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:18,531] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:18,531] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:18,531] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:18,531] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:18,531] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:18,531] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:18,532] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:18,532] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:18,532] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:18,786] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:18,786] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:18,787] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:18,787] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:19,048] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:19,049] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:19,309] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:19,310] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:19,558] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:19,559] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:19,844] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:19,845] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:20,106] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:20,107] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:20,358] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:20,358] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:20,627] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:20,628] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:20,885] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:20,885] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:21,154] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:21,155] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:21,419] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:21,420] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:21,684] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:12:21,685] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
{'eval_loss': 1.0167537927627563, 'eval_runtime': 3.1655, 'eval_samples_per_second': 344.974, 'eval_steps_per_second': 21.798, 'epoch': 0.75}
{'loss': 1.0633, 'learning_rate': 0.00018410839380338817, 'epoch': 0.75}
{'loss': 1.0491, 'learning_rate': 0.00018404613828155623, 'epoch': 0.75}
{'loss': 1.0262, 'learning_rate': 0.00018398377162338924, 'epoch': 0.75}
{'loss': 1.077, 'learning_rate': 0.0001839212939113562, 'epoch': 0.76}
{'loss': 0.9969, 'learning_rate': 0.00018385870522807295, 'epoch': 0.76}
{'loss': 1.1163, 'learning_rate': 0.00018379600565630213, 'epoch': 0.76}
{'loss': 1.1191, 'learning_rate': 0.00018373319527895294, 'epoch': 0.76}
{'loss': 1.0903, 'learning_rate': 0.00018367027417908117, 'epoch': 0.76}
{'loss': 1.1652, 'learning_rate': 0.0001836072424398889, 'epoch': 0.76}
{'loss': 1.0696, 'learning_rate': 0.0001835441001447247, 'epoch': 0.76}
{'loss': 1.1756, 'learning_rate': 0.0001834808473770832, 'epoch': 0.77}
{'loss': 1.0519, 'learning_rate': 0.00018341748422060503, 'epoch': 0.77}
{'loss': 1.125, 'learning_rate': 0.000183354010759077, 'epoch': 0.77}
{'loss': 0.9652, 'learning_rate': 0.00018329042707643164, 'epoch': 0.77}
{'loss': 1.191, 'learning_rate': 0.0001832267332567473, 'epoch': 0.77}
{'loss': 1.0471, 'learning_rate': 0.00018316292938424787, 'epoch': 0.77}
{'loss': 1.204, 'learning_rate': 0.00018309901554330288, 'epoch': 0.77}
{'loss': 1.1012, 'learning_rate': 0.0001830349918184272, 'epoch': 0.78}
{'loss': 1.0106, 'learning_rate': 0.000182970858294281, 'epoch': 0.78}
{'loss': 1.0811, 'learning_rate': 0.00018290661505566963, 'epoch': 0.78}
{'loss': 1.1186, 'learning_rate': 0.00018284226218754363, 'epoch': 0.78}
{'loss': 1.0943, 'learning_rate': 0.0001827777997749984, 'epoch': 0.78}
{'loss': 1.0597, 'learning_rate': 0.0001827132279032742, 'epoch': 0.78}
{'loss': 1.036, 'learning_rate': 0.00018264854665775605, 'epoch': 0.78}
{'loss': 1.0876, 'learning_rate': 0.00018258375612397365, 'epoch': 0.79}
{'loss': 1.0605, 'learning_rate': 0.00018251885638760105, 'epoch': 0.79}
{'loss': 1.1032, 'learning_rate': 0.00018245384753445693, 'epoch': 0.79}
{'loss': 1.1089, 'learning_rate': 0.0001823887296505041, 'epoch': 0.79}
{'loss': 1.0279, 'learning_rate': 0.00018232350282184956, 'epoch': 0.79}
{'loss': 1.0117, 'learning_rate': 0.0001822581671347444, 'epoch': 0.79}
{'loss': 1.1945, 'learning_rate': 0.0001821927226755837, 'epoch': 0.8}
{'loss': 1.1076, 'learning_rate': 0.00018212716953090624, 'epoch': 0.8}
{'loss': 1.0742, 'learning_rate': 0.0001820615077873947, 'epoch': 0.8}
{'loss': 1.1111, 'learning_rate': 0.0001819957375318752, 'epoch': 0.8}
{'loss': 1.1079, 'learning_rate': 0.00018192985885131743, 'epoch': 0.8}
{'loss': 1.1471, 'learning_rate': 0.00018186387183283443, 'epoch': 0.8}
{'loss': 1.0644, 'learning_rate': 0.00018179777656368253, 'epoch': 0.8}
{'loss': 0.9979, 'learning_rate': 0.00018173157313126114, 'epoch': 0.81}
{'loss': 1.101, 'learning_rate': 0.00018166526162311276, 'epoch': 0.81}
{'loss': 1.0195, 'learning_rate': 0.00018159884212692274, 'epoch': 0.81}
{'loss': 1.0995, 'learning_rate': 0.00018153231473051933, 'epoch': 0.81}
{'loss': 1.0479, 'learning_rate': 0.00018146567952187333, 'epoch': 0.81}
{'loss': 1.0001, 'learning_rate': 0.00018139893658909817, 'epoch': 0.81}
{'loss': 1.0848, 'learning_rate': 0.00018133208602044972, 'epoch': 0.81}
{'loss': 1.0957, 'learning_rate': 0.0001812651279043262, 'epoch': 0.82}
{'loss': 1.0671, 'learning_rate': 0.000181198062329268, 'epoch': 0.82}
{'loss': 1.0095, 'learning_rate': 0.00018113088938395762, 'epoch': 0.82}
{'loss': 1.046, 'learning_rate': 0.00018106360915721956, 'epoch': 0.82}
{'loss': 1.1351, 'learning_rate': 0.00018099622173802014, 'epoch': 0.82}
{'loss': 1.0051, 'learning_rate': 0.00018092872721546754, 'epoch': 0.82}
{'loss': 1.1066, 'learning_rate': 0.00018086112567881137, 'epoch': 0.82}
{'loss': 1.0311, 'learning_rate': 0.0001807934172174429, 'epoch': 0.83}
{'loss': 1.0088, 'learning_rate': 0.0001807256019208947, 'epoch': 0.83}
{'loss': 0.981, 'learning_rate': 0.00018065767987884073, 'epoch': 0.83}
{'loss': 1.0371, 'learning_rate': 0.00018058965118109593, 'epoch': 0.83}
{'loss': 1.1011, 'learning_rate': 0.00018052151591761644, 'epoch': 0.83}
{'loss': 1.0709, 'learning_rate': 0.00018045327417849923, 'epoch': 0.83}
{'loss': 1.075, 'learning_rate': 0.00018038492605398205, 'epoch': 0.83}
{'loss': 1.0475, 'learning_rate': 0.00018031647163444339, 'epoch': 0.84}
{'loss': 1.0734, 'learning_rate': 0.0001802479110104022, 'epoch': 0.84}
{'loss': 1.0387, 'learning_rate': 0.000180179244272518, 'epoch': 0.84}
{'loss': 1.0572, 'learning_rate': 0.00018011047151159052, 'epoch': 0.84}
{'loss': 1.0918, 'learning_rate': 0.00018004159281855974, 'epoch': 0.84}
{'loss': 1.1378, 'learning_rate': 0.0001799726082845057, 'epoch': 0.84}
{'loss': 1.0488, 'learning_rate': 0.00017990351800064834, 'epoch': 0.84}
{'loss': 1.0926, 'learning_rate': 0.00017983432205834755, 'epoch': 0.85}
{'loss': 1.1045, 'learning_rate': 0.00017976502054910286, 'epoch': 0.85}
{'loss': 1.0701, 'learning_rate': 0.00017969561356455336, 'epoch': 0.85}
{'loss': 1.091, 'learning_rate': 0.00017962610119647777, 'epoch': 0.85}
{'loss': 1.0587, 'learning_rate': 0.00017955648353679398, 'epoch': 0.85}
{'loss': 1.1946, 'learning_rate': 0.00017948676067755916, 'epoch': 0.85}
{'loss': 1.1324, 'learning_rate': 0.00017941693271096966, 'epoch': 0.85}
{'loss': 1.0977, 'learning_rate': 0.00017934699972936075, 'epoch': 0.86}
{'loss': 1.1361, 'learning_rate': 0.00017927696182520658, 'epoch': 0.86}
{'loss': 1.0616, 'learning_rate': 0.00017920681909112008, 'epoch': 0.86}
{'loss': 1.0506, 'learning_rate': 0.00017913657161985268, 'epoch': 0.86}
{'loss': 1.0772, 'learning_rate': 0.00017906621950429443, 'epoch': 0.86}
{'loss': 1.1548, 'learning_rate': 0.00017899576283747373, 'epoch': 0.86}
{'loss': 1.0599, 'learning_rate': 0.0001789252017125572, 'epoch': 0.86}
{'loss': 1.0945, 'learning_rate': 0.0001788545362228496, 'epoch': 0.87}
{'loss': 1.0675, 'learning_rate': 0.00017878376646179368, 'epoch': 0.87}
{'loss': 1.0966, 'learning_rate': 0.00017871289252297011, 'epoch': 0.87}
{'loss': 1.1361, 'learning_rate': 0.0001786419145000973, 'epoch': 0.87}
{'loss': 1.0963, 'learning_rate': 0.00017857083248703126, 'epoch': 0.87}
{'loss': 1.0654, 'learning_rate': 0.00017849964657776552, 'epoch': 0.87}
{'loss': 0.9858, 'learning_rate': 0.00017842835686643108, 'epoch': 0.88}
{'loss': 1.0749, 'learning_rate': 0.00017835696344729605, 'epoch': 0.88}
{'loss': 1.0994, 'learning_rate': 0.00017828546641476578, 'epoch': 0.88}
{'loss': 1.0596, 'learning_rate': 0.0001782138658633826, 'epoch': 0.88}
{'loss': 0.9941, 'learning_rate': 0.00017814216188782577, 'epoch': 0.88}
{'loss': 1.0438, 'learning_rate': 0.00017807035458291122, 'epoch': 0.88}
{'loss': 1.1233, 'learning_rate': 0.0001779984440435916, 'epoch': 0.88}
{'loss': 1.0819, 'learning_rate': 0.000177926430364956, 'epoch': 0.89}
{'loss': 1.1729, 'learning_rate': 0.00017785431364222997, 'epoch': 0.89}
{'loss': 1.0372, 'learning_rate': 0.00017778209397077528, 'epoch': 0.89}
{'loss': 1.0326, 'learning_rate': 0.00017770977144608978, 'epoch': 0.89}
{'loss': 1.0828, 'learning_rate': 0.0001776373461638074, 'epoch': 0.89}
{'loss': 1.0522, 'learning_rate': 0.00017756481821969798, 'epoch': 0.89}
{'loss': 1.0151, 'learning_rate': 0.00017749218770966692, 'epoch': 0.89}
{'loss': 1.1541, 'learning_rate': 0.0001774194547297555, 'epoch': 0.9}
{'loss': 1.0455, 'learning_rate': 0.00017734661937614035, 'epoch': 0.9}
{'loss': 1.0421, 'learning_rate': 0.00017727368174513347, 'epoch': 0.9}
{'loss': 1.0451, 'learning_rate': 0.0001772006419331822, 'epoch': 0.9}
{'loss': 1.169, 'learning_rate': 0.00017712750003686883, 'epoch': 0.9}
{'loss': 1.1022, 'learning_rate': 0.00017705425615291084, 'epoch': 0.9}
{'loss': 1.0964, 'learning_rate': 0.00017698091037816042, 'epoch': 0.9}
{'loss': 1.0322, 'learning_rate': 0.00017690746280960454, 'epoch': 0.91}
{'loss': 1.0325, 'learning_rate': 0.0001768339135443648, 'epoch': 0.91}
{'loss': 1.0452, 'learning_rate': 0.00017676026267969728, 'epoch': 0.91}
{'loss': 1.0745, 'learning_rate': 0.0001766865103129923, 'epoch': 0.91}
{'loss': 1.1599, 'learning_rate': 0.00017661265654177454, 'epoch': 0.91}
{'loss': 0.9833, 'learning_rate': 0.00017653870146370267, 'epoch': 0.91}
{'loss': 1.0006, 'learning_rate': 0.00017646464517656943, 'epoch': 0.91}
{'loss': 1.0776, 'learning_rate': 0.0001763904877783013, 'epoch': 0.92}
{'loss': 1.011, 'learning_rate': 0.0001763162293669584, 'epoch': 0.92}
{'loss': 1.0316, 'learning_rate': 0.00017624187004073463, 'epoch': 0.92}
{'loss': 1.0778, 'learning_rate': 0.0001761674098979571, 'epoch': 0.92}
{'loss': 1.0309, 'learning_rate': 0.00017609284903708644, 'epoch': 0.92}
{'loss': 1.0083, 'learning_rate': 0.0001760181875567163, 'epoch': 0.92}
{'loss': 1.1088, 'learning_rate': 0.0001759434255555734, 'epoch': 0.92}
{'loss': 1.014, 'learning_rate': 0.00017586856313251756, 'epoch': 0.93}
{'loss': 1.0578, 'learning_rate': 0.00017579360038654114, 'epoch': 0.93}
{'loss': 1.0651, 'learning_rate': 0.00017571853741676932, 'epoch': 0.93}
{'loss': 1.1492, 'learning_rate': 0.00017564337432245976, 'epoch': 0.93}
{'loss': 1.0673, 'learning_rate': 0.00017556811120300253, 'epoch': 0.93}
{'loss': 1.0684, 'learning_rate': 0.00017549274815791994, 'epoch': 0.93}
{'loss': 1.0571, 'learning_rate': 0.00017541728528686645, 'epoch': 0.93}
{'loss': 1.0359, 'learning_rate': 0.00017534172268962852, 'epoch': 0.94}
{'loss': 1.1768, 'learning_rate': 0.00017526606046612452, 'epoch': 0.94}
{'loss': 1.0608, 'learning_rate': 0.0001751902987164045, 'epoch': 0.94}
{'loss': 1.0877, 'learning_rate': 0.00017511443754065012, 'epoch': 0.94}
{'loss': 1.0676, 'learning_rate': 0.00017503847703917455, 'epoch': 0.94}
{'loss': 1.073, 'learning_rate': 0.0001749624173124223, 'epoch': 0.94}
{'loss': 0.9852, 'learning_rate': 0.00017488625846096904, 'epoch': 0.94}
{'loss': 1.0669, 'learning_rate': 0.00017481000058552156, 'epoch': 0.95}
{'loss': 1.1724, 'learning_rate': 0.0001747336437869176, 'epoch': 0.95}
{'loss': 1.0379, 'learning_rate': 0.00017465718816612563, 'epoch': 0.95}
{'loss': 1.0634, 'learning_rate': 0.00017458063382424488, 'epoch': 0.95}
{'loss': 1.028, 'learning_rate': 0.00017450398086250513, 'epoch': 0.95}
{'loss': 1.0724, 'learning_rate': 0.00017442722938226647, 'epoch': 0.95}
{'loss': 1.0603, 'learning_rate': 0.00017435037948501935, 'epoch': 0.95}
{'loss': 1.0566, 'learning_rate': 0.00017427343127238439, 'epoch': 0.96}
{'loss': 0.9961, 'learning_rate': 0.00017419638484611206, 'epoch': 0.96}
{'loss': 1.0443, 'learning_rate': 0.00017411924030808284, 'epoch': 0.96}
{'loss': 1.0428, 'learning_rate': 0.0001740419977603069, 'epoch': 0.96}
{'loss': 1.0352, 'learning_rate': 0.00017396465730492406, 'epoch': 0.96}
{'loss': 1.0107, 'learning_rate': 0.00017388721904420352, 'epoch': 0.96}
{'loss': 1.0954, 'learning_rate': 0.00017380968308054385, 'epoch': 0.97}
{'loss': 1.129, 'learning_rate': 0.0001737320495164728, 'epoch': 0.97}
{'loss': 1.0696, 'learning_rate': 0.00017365431845464723, 'epoch': 0.97}
{'loss': 1.034, 'learning_rate': 0.0001735764899978529, 'epoch': 0.97}
{'loss': 1.0322, 'learning_rate': 0.0001734985642490043, 'epoch': 0.97}
{'loss': 1.0417, 'learning_rate': 0.00017342054131114465, 'epoch': 0.97}
{'loss': 1.0695, 'learning_rate': 0.00017334242128744568, 'epoch': 0.97}
{'loss': 1.1169, 'learning_rate': 0.0001732642042812074, 'epoch': 0.98}
{'loss': 0.9709, 'learning_rate': 0.00017318589039585816, 'epoch': 0.98}
{'loss': 0.9522, 'learning_rate': 0.00017310747973495446, 'epoch': 0.98}
{'loss': 1.0343, 'learning_rate': 0.00017302897240218065, 'epoch': 0.98}
{'loss': 1.0656, 'learning_rate': 0.00017295036850134893, 'epoch': 0.98}
{'loss': 1.1269, 'learning_rate': 0.0001728716681363993, 'epoch': 0.98}
{'loss': 1.0884, 'learning_rate': 0.00017279287141139918, 'epoch': 0.98}
{'loss': 0.979, 'learning_rate': 0.00017271397843054352, 'epoch': 0.99}
{'loss': 1.0008, 'learning_rate': 0.00017263498929815448, 'epoch': 0.99}
{'loss': 1.0743, 'learning_rate': 0.00017255590411868136, 'epoch': 0.99}
{'loss': 1.0851, 'learning_rate': 0.00017247672299670053, 'epoch': 0.99}
{'loss': 1.0137, 'learning_rate': 0.00017239744603691524, 'epoch': 0.99}
{'loss': 1.0706, 'learning_rate': 0.00017231807334415532, 'epoch': 0.99}
{'loss': 1.1174, 'learning_rate': 0.00017223860502337733, 'epoch': 0.99}
{'loss': 0.9967, 'learning_rate': 0.00017215904117966427, 'epoch': 1.0}
{'loss': 0.9764, 'learning_rate': 0.0001720793819182254, 'epoch': 1.0}
{'loss': 1.021, 'learning_rate': 0.00017199962734439618, 'epoch': 1.0}
{'loss': 1.0658, 'learning_rate': 0.00017191977756363808, 'epoch': 1.0}
 25%|████████████████████████████▎                                                                                    | 688/2752 [11:50<34:55,  1.02s/it]
[2023-12-29 02:15:16,017] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:16,017] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:16,017] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:16,017] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:16,018] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:16,018] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:16,018] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:16,018] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:16,018] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:16,018] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:16,018] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:16,018] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:16,018] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:16,019] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:16,019] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:16,019] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:16,273] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:16,274] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:16,274] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:16,275] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:16,536] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:16,536] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:16,799] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:16,800] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:17,052] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:17,053] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:17,342] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:17,342] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:17,609] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:17,610] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:17,866] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:17,866] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:18,135] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:18,135] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:18,399] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:18,400] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:18,669] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:18,670] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:18,932] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:18,933] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:19,197] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:15:19,198] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
{'eval_loss': 1.0015047788619995, 'eval_runtime': 3.191, 'eval_samples_per_second': 342.217, 'eval_steps_per_second': 21.624, 'epoch': 1.0}
{'loss': 1.153, 'learning_rate': 0.00017183983268153849, 'epoch': 1.0}
{'loss': 1.1034, 'learning_rate': 0.00017175979280381056, 'epoch': 1.0}
{'loss': 1.1087, 'learning_rate': 0.00017167965803629307, 'epoch': 1.0}
{'loss': 1.072, 'learning_rate': 0.00017159942848495025, 'epoch': 1.01}
{'loss': 1.0972, 'learning_rate': 0.00017151910425587162, 'epoch': 1.01}
{'loss': 1.0909, 'learning_rate': 0.00017143868545527196, 'epoch': 1.01}
{'loss': 1.1009, 'learning_rate': 0.00017135817218949108, 'epoch': 1.01}
{'loss': 1.1206, 'learning_rate': 0.00017127756456499372, 'epoch': 1.01}
{'loss': 1.0852, 'learning_rate': 0.0001711968626883694, 'epoch': 1.01}
{'loss': 1.0235, 'learning_rate': 0.00017111606666633225, 'epoch': 1.01}
{'loss': 1.1133, 'learning_rate': 0.00017103517660572087, 'epoch': 1.02}
{'loss': 1.0687, 'learning_rate': 0.0001709541926134982, 'epoch': 1.02}
{'loss': 1.1827, 'learning_rate': 0.00017087311479675147, 'epoch': 1.02}
{'loss': 1.0509, 'learning_rate': 0.00017079194326269194, 'epoch': 1.02}
{'loss': 1.101, 'learning_rate': 0.00017071067811865476, 'epoch': 1.02}
 26%|████████████████████████████▊                                                                                    | 703/2752 [12:09<34:57,  1.02s/it]
[2023-12-29 02:15:34,683] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:15:34,684] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:15:34,684] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:15:34,684] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:15:34,684] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:15:34,684] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:15:34,685] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:15:34,686] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:15:34,715] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:15:34,716] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:15:34,717] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:15:34,717] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:15:34,717] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:15:34,717] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:15:34,719] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:15:34,945] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
{'loss': 1.0668, 'learning_rate': 0.0001706293194720989, 'epoch': 1.0}
{'loss': 0.9812, 'learning_rate': 0.00017054786743060692, 'epoch': 1.0}
{'loss': 1.1106, 'learning_rate': 0.00017046632210188496, 'epoch': 1.0}
{'loss': 0.9909, 'learning_rate': 0.00017038468359376245, 'epoch': 1.01}
{'loss': 1.074, 'learning_rate': 0.00017030295201419206, 'epoch': 1.01}
{'loss': 1.1153, 'learning_rate': 0.0001702211274712495, 'epoch': 1.01}
{'loss': 0.9878, 'learning_rate': 0.00017013921007313348, 'epoch': 1.01}
{'loss': 1.0902, 'learning_rate': 0.00017005719992816546, 'epoch': 1.01}
{'loss': 1.0829, 'learning_rate': 0.00016997509714478944, 'epoch': 1.01}
{'loss': 1.0556, 'learning_rate': 0.00016989290183157206, 'epoch': 1.01}
{'loss': 1.0264, 'learning_rate': 0.0001698106140972023, 'epoch': 1.02}
{'loss': 1.0451, 'learning_rate': 0.00016972823405049124, 'epoch': 1.02}
{'loss': 1.194, 'learning_rate': 0.00016964576180037217, 'epoch': 1.02}
{'loss': 1.0291, 'learning_rate': 0.00016956319745590017, 'epoch': 1.02}
{'loss': 1.0089, 'learning_rate': 0.00016948054112625222, 'epoch': 1.02}
{'loss': 1.0672, 'learning_rate': 0.00016939779292072683, 'epoch': 1.02}
{'loss': 1.1044, 'learning_rate': 0.00016931495294874408, 'epoch': 1.02}
{'loss': 1.0185, 'learning_rate': 0.0001692320213198453, 'epoch': 1.03}
{'loss': 1.0347, 'learning_rate': 0.0001691489981436932, 'epoch': 1.03}
{'loss': 1.0306, 'learning_rate': 0.00016906588353007132, 'epoch': 1.03}
{'loss': 1.022, 'learning_rate': 0.00016898267758888423, 'epoch': 1.03}
{'loss': 1.0474, 'learning_rate': 0.00016889938043015726, 'epoch': 1.03}
{'loss': 1.1231, 'learning_rate': 0.0001688159921640364, 'epoch': 1.03}
{'loss': 1.0324, 'learning_rate': 0.000168732512900788, 'epoch': 1.03}
{'loss': 0.9465, 'learning_rate': 0.00016864894275079882, 'epoch': 1.04}
{'loss': 1.0117, 'learning_rate': 0.0001685652818245758, 'epoch': 1.04}
{'loss': 1.0863, 'learning_rate': 0.0001684815302327459, 'epoch': 1.04}
{'loss': 1.0413, 'learning_rate': 0.00016839768808605594, 'epoch': 1.04}
{'loss': 1.1089, 'learning_rate': 0.00016831375549537252, 'epoch': 1.04}
{'loss': 0.9819, 'learning_rate': 0.00016822973257168186, 'epoch': 1.04}
{'loss': 1.0439, 'learning_rate': 0.00016814561942608957, 'epoch': 1.05}
{'loss': 1.0662, 'learning_rate': 0.00016806141616982059, 'epoch': 1.05}
{'loss': 0.9847, 'learning_rate': 0.00016797712291421904, 'epoch': 1.05}
{'loss': 1.0726, 'learning_rate': 0.00016789273977074797, 'epoch': 1.05}
{'loss': 1.0325, 'learning_rate': 0.00016780826685098942, 'epoch': 1.05}
{'loss': 1.073, 'learning_rate': 0.00016772370426664402, 'epoch': 1.05}
{'loss': 1.0774, 'learning_rate': 0.00016763905212953102, 'epoch': 1.05}
{'loss': 1.0768, 'learning_rate': 0.00016755431055158807, 'epoch': 1.06}
{'loss': 1.0674, 'learning_rate': 0.00016746947964487116, 'epoch': 1.06}
{'loss': 0.9796, 'learning_rate': 0.0001673845595215543, 'epoch': 1.06}
{'loss': 1.029, 'learning_rate': 0.0001672995502939295, 'epoch': 1.06}
{'loss': 1.0137, 'learning_rate': 0.00016721445207440664, 'epoch': 1.06}
{'loss': 1.0265, 'learning_rate': 0.00016712926497551326, 'epoch': 1.06}
{'loss': 1.1064, 'learning_rate': 0.0001670439891098944, 'epoch': 1.06}
{'loss': 1.0185, 'learning_rate': 0.00016695862459031248, 'epoch': 1.07}
{'loss': 1.027, 'learning_rate': 0.00016687317152964718, 'epoch': 1.07}
{'loss': 1.1075, 'learning_rate': 0.00016678763004089527, 'epoch': 1.07}
{'loss': 1.02, 'learning_rate': 0.00016670200023717038, 'epoch': 1.07}
{'loss': 1.0088, 'learning_rate': 0.00016661628223170295, 'epoch': 1.07}
{'loss': 1.0402, 'learning_rate': 0.0001665304761378401, 'epoch': 1.07}
{'loss': 1.0554, 'learning_rate': 0.00016644458206904546, 'epoch': 1.07}
{'loss': 0.9861, 'learning_rate': 0.00016635860013889886, 'epoch': 1.08}
{'loss': 0.9996, 'learning_rate': 0.00016627253046109638, 'epoch': 1.08}
{'loss': 1.0255, 'learning_rate': 0.00016618637314945014, 'epoch': 1.08}
{'loss': 1.0184, 'learning_rate': 0.00016610012831788813, 'epoch': 1.08}
{'loss': 0.948, 'learning_rate': 0.00016601379608045406, 'epoch': 1.08}
{'loss': 0.9442, 'learning_rate': 0.0001659273765513073, 'epoch': 1.08}
{'loss': 1.0535, 'learning_rate': 0.0001658408698447225, 'epoch': 1.08}
{'loss': 1.0937, 'learning_rate': 0.00016575427607508974, 'epoch': 1.09}
{'loss': 0.9786, 'learning_rate': 0.00016566759535691406, 'epoch': 1.09}
{'loss': 1.0295, 'learning_rate': 0.00016558082780481563, 'epoch': 1.09}
{'loss': 1.0469, 'learning_rate': 0.00016549397353352938, 'epoch': 1.09}
{'loss': 1.0665, 'learning_rate': 0.0001654070326579049, 'epoch': 1.09}
{'loss': 1.0111, 'learning_rate': 0.0001653200052929063, 'epoch': 1.09}
{'loss': 1.0803, 'learning_rate': 0.00016523289155361204, 'epoch': 1.09}
{'loss': 0.9923, 'learning_rate': 0.00016514569155521493, 'epoch': 1.1}
{'loss': 1.013, 'learning_rate': 0.0001650584054130216, 'epoch': 1.1}
{'loss': 0.9378, 'learning_rate': 0.00016497103324245282, 'epoch': 1.1}
{'loss': 1.0145, 'learning_rate': 0.00016488357515904295, 'epoch': 1.1}
{'loss': 1.0332, 'learning_rate': 0.0001647960312784401, 'epoch': 1.1}
{'loss': 1.0591, 'learning_rate': 0.0001647084017164057, 'epoch': 1.1}
{'loss': 1.05, 'learning_rate': 0.00016462068658881456, 'epoch': 1.1}
{'loss': 0.9587, 'learning_rate': 0.0001645328860116546, 'epoch': 1.11}
{'loss': 1.0445, 'learning_rate': 0.00016444500010102676, 'epoch': 1.11}
{'loss': 0.9294, 'learning_rate': 0.00016435702897314478, 'epoch': 1.11}
{'loss': 0.9881, 'learning_rate': 0.00016426897274433513, 'epoch': 1.11}
{'loss': 1.0397, 'learning_rate': 0.00016418083153103683, 'epoch': 1.11}
{'loss': 0.8706, 'learning_rate': 0.00016409260544980115, 'epoch': 1.11}
{'loss': 1.0591, 'learning_rate': 0.0001640042946172917, 'epoch': 1.11}
{'loss': 1.0461, 'learning_rate': 0.00016391589915028417, 'epoch': 1.12}
{'loss': 1.0836, 'learning_rate': 0.0001638274191656661, 'epoch': 1.12}
{'loss': 1.0256, 'learning_rate': 0.00016373885478043672, 'epoch': 1.12}
{'loss': 0.9782, 'learning_rate': 0.00016365020611170712, 'epoch': 1.12}
{'loss': 1.024, 'learning_rate': 0.0001635614732766996, 'epoch': 1.12}
{'loss': 0.9892, 'learning_rate': 0.00016347265639274778, 'epoch': 1.12}
{'loss': 1.0796, 'learning_rate': 0.00016338375557729658, 'epoch': 1.12}
{'loss': 0.984, 'learning_rate': 0.00016329477094790168, 'epoch': 1.13}
{'loss': 1.0713, 'learning_rate': 0.00016320570262222983, 'epoch': 1.13}
{'loss': 1.0742, 'learning_rate': 0.00016311655071805822, 'epoch': 1.13}
{'loss': 1.0199, 'learning_rate': 0.00016302731535327474, 'epoch': 1.13}
{'loss': 1.1462, 'learning_rate': 0.00016293799664587755, 'epoch': 1.13}
{'loss': 1.1392, 'learning_rate': 0.00016284859471397503, 'epoch': 1.13}
{'loss': 1.0094, 'learning_rate': 0.00016275910967578558, 'epoch': 1.14}
{'loss': 1.0022, 'learning_rate': 0.00016266954164963763, 'epoch': 1.14}
{'loss': 1.0915, 'learning_rate': 0.00016257989075396916, 'epoch': 1.14}
{'loss': 0.9938, 'learning_rate': 0.00016249015710732785, 'epoch': 1.14}
{'loss': 1.0792, 'learning_rate': 0.00016240034082837078, 'epoch': 1.14}
{'loss': 0.9668, 'learning_rate': 0.00016231044203586422, 'epoch': 1.14}
{'loss': 1.0253, 'learning_rate': 0.00016222046084868373, 'epoch': 1.14}
{'loss': 0.9409, 'learning_rate': 0.00016213039738581362, 'epoch': 1.15}
{'loss': 1.0344, 'learning_rate': 0.00016204025176634712, 'epoch': 1.15}
{'loss': 1.0437, 'learning_rate': 0.0001619500241094861, 'epoch': 1.15}
{'loss': 1.0711, 'learning_rate': 0.00016185971453454078, 'epoch': 1.15}
{'loss': 0.9458, 'learning_rate': 0.0001617693231609299, 'epoch': 1.15}
{'loss': 1.0187, 'learning_rate': 0.00016167885010818017, 'epoch': 1.15}
{'loss': 0.9752, 'learning_rate': 0.00016158829549592647, 'epoch': 1.15}
{'loss': 0.9931, 'learning_rate': 0.0001614976594439114, 'epoch': 1.16}
{'loss': 1.0036, 'learning_rate': 0.00016140694207198534, 'epoch': 1.16}
{'loss': 1.0178, 'learning_rate': 0.00016131614350010614, 'epoch': 1.16}
{'loss': 0.9627, 'learning_rate': 0.00016122526384833907, 'epoch': 1.16}
{'loss': 1.0381, 'learning_rate': 0.00016113430323685658, 'epoch': 1.16}
{'loss': 0.9023, 'learning_rate': 0.00016104326178593818, 'epoch': 1.16}
{'loss': 1.068, 'learning_rate': 0.00016095213961597033, 'epoch': 1.16}
{'loss': 0.9922, 'learning_rate': 0.0001608609368474461, 'epoch': 1.17}
{'loss': 0.9681, 'learning_rate': 0.00016076965360096535, 'epoch': 1.17}
{'loss': 0.9458, 'learning_rate': 0.00016067828999723405, 'epoch': 1.17}
{'loss': 1.0536, 'learning_rate': 0.00016058684615706477, 'epoch': 1.17}
{'loss': 1.0079, 'learning_rate': 0.0001604953222013759, 'epoch': 1.17}
{'loss': 1.0715, 'learning_rate': 0.0001604037182511919, 'epoch': 1.17}
{'loss': 1.0338, 'learning_rate': 0.00016031203442764307, 'epoch': 1.17}
{'loss': 0.9989, 'learning_rate': 0.00016022027085196516, 'epoch': 1.18}
{'loss': 0.9745, 'learning_rate': 0.00016012842764549952, 'epoch': 1.18}
{'loss': 0.9336, 'learning_rate': 0.0001600365049296927, 'epoch': 1.18}
{'loss': 1.0223, 'learning_rate': 0.0001599445028260965, 'epoch': 1.18}
{'loss': 1.0088, 'learning_rate': 0.0001598524214563675, 'epoch': 1.18}
{'loss': 0.9775, 'learning_rate': 0.0001597602609422674, 'epoch': 1.18}
{'loss': 1.1261, 'learning_rate': 0.00015966802140566225, 'epoch': 1.18}
{'loss': 0.8982, 'learning_rate': 0.0001595757029685228, 'epoch': 1.19}
{'loss': 1.0443, 'learning_rate': 0.00015948330575292401, 'epoch': 1.19}
{'loss': 0.9859, 'learning_rate': 0.00015939082988104505, 'epoch': 1.19}
{'loss': 1.0136, 'learning_rate': 0.00015929827547516914, 'epoch': 1.19}
{'loss': 0.9593, 'learning_rate': 0.0001592056426576833, 'epoch': 1.19}
{'loss': 1.0756, 'learning_rate': 0.0001591129315510782, 'epoch': 1.19}
{'loss': 1.0454, 'learning_rate': 0.00015902014227794816, 'epoch': 1.19}
{'loss': 1.0189, 'learning_rate': 0.00015892727496099075, 'epoch': 1.2}
{'loss': 1.1202, 'learning_rate': 0.00015883432972300674, 'epoch': 1.2}
{'loss': 1.1035, 'learning_rate': 0.00015874130668690003, 'epoch': 1.2}
{'loss': 1.0029, 'learning_rate': 0.0001586482059756773, 'epoch': 1.2}
{'loss': 1.0449, 'learning_rate': 0.00015855502771244798, 'epoch': 1.2}
{'loss': 0.9882, 'learning_rate': 0.00015846177202042406, 'epoch': 1.2}
{'loss': 0.9717, 'learning_rate': 0.00015836843902291984, 'epoch': 1.2}
{'loss': 0.9821, 'learning_rate': 0.000158275028843352, 'epoch': 1.21}
{'loss': 1.0507, 'learning_rate': 0.00015818154160523911, 'epoch': 1.21}
{'loss': 0.9342, 'learning_rate': 0.00015808797743220175, 'epoch': 1.21}
{'loss': 0.9054, 'learning_rate': 0.00015799433644796216, 'epoch': 1.21}
{'loss': 1.0236, 'learning_rate': 0.0001579006187763442, 'epoch': 1.21}
{'loss': 1.0042, 'learning_rate': 0.00015780682454127312, 'epoch': 1.21}
{'loss': 0.9812, 'learning_rate': 0.00015771295386677543, 'epoch': 1.22}
{'loss': 0.9981, 'learning_rate': 0.00015761900687697865, 'epoch': 1.22}
{'loss': 1.0162, 'learning_rate': 0.00015752498369611133, 'epoch': 1.22}
{'loss': 1.03, 'learning_rate': 0.0001574308844485026, 'epoch': 1.22}
{'loss': 1.0454, 'learning_rate': 0.00015733670925858237, 'epoch': 1.22}
{'loss': 1.0052, 'learning_rate': 0.00015724245825088086, 'epoch': 1.22}
{'loss': 1.0582, 'learning_rate': 0.0001571481315500285, 'epoch': 1.22}
{'loss': 1.0621, 'learning_rate': 0.00015705372928075594, 'epoch': 1.23}
{'loss': 0.918, 'learning_rate': 0.00015695925156789366, 'epoch': 1.23}
{'loss': 1.1238, 'learning_rate': 0.00015686469853637192, 'epoch': 1.23}
 31%|███████████████████████████████████▎                                                                             | 860/2752 [14:49<31:45,  1.01s/it]
[2023-12-29 02:18:14,763] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:14,763] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:14,764] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:14,764] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:14,764] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:14,764] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:14,764] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:14,764] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:14,765] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:14,765] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:14,765] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:14,765] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:14,765] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:14,765] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:14,766] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:14,766] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:15,019] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:15,020] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:15,021] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:15,021] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:15,285] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:15,286] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:15,547] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:15,548] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:15,798] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:15,798] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:16,081] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:16,082] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:16,349] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:16,350] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:16,601] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:16,602] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:16,862] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:16,863] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:17,128] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:17,129] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:17,394] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:17,394] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:17,656] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:17,656] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:17,918] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:18:17,919] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
{'eval_loss': 0.997471809387207, 'eval_runtime': 3.1658, 'eval_samples_per_second': 344.941, 'eval_steps_per_second': 21.796, 'epoch': 1.23}
{'loss': 1.0215, 'learning_rate': 0.0001567700703112206, 'epoch': 1.23}
{'loss': 1.0901, 'learning_rate': 0.00015667536701756903, 'epoch': 1.23}
{'loss': 1.043, 'learning_rate': 0.00015658058878064573, 'epoch': 1.23}
{'loss': 1.0507, 'learning_rate': 0.00015648573572577839, 'epoch': 1.23}
{'loss': 0.9744, 'learning_rate': 0.0001563908079783935, 'epoch': 1.24}
{'loss': 1.0352, 'learning_rate': 0.00015629580566401657, 'epoch': 1.24}
{'loss': 1.0486, 'learning_rate': 0.0001562007289082715, 'epoch': 1.24}
{'loss': 1.0542, 'learning_rate': 0.0001561055778368807, 'epoch': 1.24}
{'loss': 1.0303, 'learning_rate': 0.00015601035257566478, 'epoch': 1.24}
{'loss': 0.9847, 'learning_rate': 0.00015591505325054258, 'epoch': 1.24}
{'loss': 1.0166, 'learning_rate': 0.00015581967998753082, 'epoch': 1.24}
{'loss': 1.0479, 'learning_rate': 0.00015572423291274393, 'epoch': 1.25}
{'loss': 0.9412, 'learning_rate': 0.00015562871215239402, 'epoch': 1.25}
{'loss': 0.9782, 'learning_rate': 0.00015553311783279055, 'epoch': 1.25}
{'loss': 1.051, 'learning_rate': 0.00015543745008034042, 'epoch': 1.25}
{'loss': 0.9484, 'learning_rate': 0.00015534170902154742, 'epoch': 1.25}
{'loss': 1.0431, 'learning_rate': 0.0001552458947830124, 'epoch': 1.25}
{'loss': 1.0192, 'learning_rate': 0.000155150007491433, 'epoch': 1.25}
{'loss': 0.9414, 'learning_rate': 0.00015505404727360334, 'epoch': 1.26}
{'loss': 1.0227, 'learning_rate': 0.00015495801425641407, 'epoch': 1.26}
{'loss': 1.0344, 'learning_rate': 0.00015486190856685208, 'epoch': 1.26}
{'loss': 0.9855, 'learning_rate': 0.0001547657303320004, 'epoch': 1.26}
{'loss': 1.0361, 'learning_rate': 0.00015466947967903786, 'epoch': 1.26}
{'loss': 0.9577, 'learning_rate': 0.0001545731567352392, 'epoch': 1.26}
{'loss': 0.983, 'learning_rate': 0.00015447676162797465, 'epoch': 1.26}
{'loss': 0.9632, 'learning_rate': 0.00015438029448470991, 'epoch': 1.27}
{'loss': 0.9546, 'learning_rate': 0.00015428375543300599, 'epoch': 1.27}
{'loss': 0.9796, 'learning_rate': 0.00015418714460051875, 'epoch': 1.27}
{'loss': 0.9752, 'learning_rate': 0.0001540904621149993, 'epoch': 1.27}
{'loss': 1.1023, 'learning_rate': 0.0001539937081042933, 'epoch': 1.27}
{'loss': 0.9553, 'learning_rate': 0.00015389688269634098, 'epoch': 1.27}
{'loss': 0.9235, 'learning_rate': 0.00015379998601917704, 'epoch': 1.27}
{'loss': 1.0244, 'learning_rate': 0.00015370301820093042, 'epoch': 1.28}
{'loss': 1.0053, 'learning_rate': 0.00015360597936982416, 'epoch': 1.28}
{'loss': 1.0732, 'learning_rate': 0.0001535088696541751, 'epoch': 1.28}
{'loss': 1.0219, 'learning_rate': 0.0001534116891823939, 'epoch': 1.28}
{'loss': 0.9317, 'learning_rate': 0.00015331443808298473, 'epoch': 1.28}
{'loss': 0.9772, 'learning_rate': 0.00015321711648454524, 'epoch': 1.28}
{'loss': 1.0778, 'learning_rate': 0.00015311972451576618, 'epoch': 1.28}
{'loss': 1.0702, 'learning_rate': 0.00015302226230543147, 'epoch': 1.29}
{'loss': 1.0377, 'learning_rate': 0.00015292472998241778, 'epoch': 1.29}
{'loss': 1.0045, 'learning_rate': 0.00015282712767569463, 'epoch': 1.29}
{'loss': 0.9666, 'learning_rate': 0.00015272945551432398, 'epoch': 1.29}
{'loss': 0.9771, 'learning_rate': 0.00015263171362746026, 'epoch': 1.29}
{'loss': 0.9417, 'learning_rate': 0.00015253390214435, 'epoch': 1.29}
{'loss': 1.0094, 'learning_rate': 0.0001524360211943318, 'epoch': 1.3}
{'loss': 0.939, 'learning_rate': 0.0001523380709068361, 'epoch': 1.3}
{'loss': 0.9674, 'learning_rate': 0.0001522400514113851, 'epoch': 1.3}
{'loss': 0.9809, 'learning_rate': 0.00015214196283759238, 'epoch': 1.3}
{'loss': 0.9492, 'learning_rate': 0.00015204380531516298, 'epoch': 1.3}
{'loss': 1.0156, 'learning_rate': 0.0001519455789738931, 'epoch': 1.3}
{'loss': 1.0039, 'learning_rate': 0.00015184728394366988, 'epoch': 1.3}
{'loss': 1.0555, 'learning_rate': 0.00015174892035447134, 'epoch': 1.31}
{'loss': 1.0223, 'learning_rate': 0.00015165048833636616, 'epoch': 1.31}
{'loss': 0.9875, 'learning_rate': 0.0001515519880195135, 'epoch': 1.31}
{'loss': 0.9453, 'learning_rate': 0.00015145341953416271, 'epoch': 1.31}
{'loss': 1.0224, 'learning_rate': 0.00015135478301065352, 'epoch': 1.31}
{'loss': 1.0238, 'learning_rate': 0.00015125607857941547, 'epoch': 1.31}
{'loss': 1.029, 'learning_rate': 0.0001511573063709679, 'epoch': 1.31}
{'loss': 1.0911, 'learning_rate': 0.0001510584665159198, 'epoch': 1.32}
{'loss': 1.0157, 'learning_rate': 0.00015095955914496965, 'epoch': 1.32}
{'loss': 0.9321, 'learning_rate': 0.00015086058438890508, 'epoch': 1.32}
{'loss': 0.9421, 'learning_rate': 0.00015076154237860304, 'epoch': 1.32}
{'loss': 0.9346, 'learning_rate': 0.00015066243324502918, 'epoch': 1.32}
{'loss': 0.9771, 'learning_rate': 0.00015056325711923808, 'epoch': 1.32}
{'loss': 0.9584, 'learning_rate': 0.00015046401413237282, 'epoch': 1.32}
{'loss': 1.0972, 'learning_rate': 0.00015036470441566488, 'epoch': 1.33}
{'loss': 0.9689, 'learning_rate': 0.00015026532810043407, 'epoch': 1.33}
{'loss': 1.0963, 'learning_rate': 0.00015016588531808816, 'epoch': 1.33}
{'loss': 1.0024, 'learning_rate': 0.00015006637620012286, 'epoch': 1.33}
{'loss': 1.0214, 'learning_rate': 0.00014996680087812165, 'epoch': 1.33}
{'loss': 1.0303, 'learning_rate': 0.00014986715948375542, 'epoch': 1.33}
{'loss': 0.9603, 'learning_rate': 0.00014976745214878256, 'epoch': 1.33}
{'loss': 1.0109, 'learning_rate': 0.00014966767900504856, 'epoch': 1.34}
{'loss': 1.0529, 'learning_rate': 0.00014956784018448603, 'epoch': 1.34}
{'loss': 1.0332, 'learning_rate': 0.00014946793581911428, 'epoch': 1.34}
{'loss': 0.9636, 'learning_rate': 0.00014936796604103948, 'epoch': 1.34}
{'loss': 1.008, 'learning_rate': 0.00014926793098245415, 'epoch': 1.34}
{'loss': 0.9722, 'learning_rate': 0.00014916783077563716, 'epoch': 1.34}
{'loss': 0.9184, 'learning_rate': 0.00014906766555295358, 'epoch': 1.34}
{'loss': 0.9888, 'learning_rate': 0.0001489674354468544, 'epoch': 1.35}
{'loss': 1.0216, 'learning_rate': 0.00014886714058987642, 'epoch': 1.35}
{'loss': 1.0306, 'learning_rate': 0.0001487667811146421, 'epoch': 1.35}
{'loss': 1.0693, 'learning_rate': 0.00014866635715385927, 'epoch': 1.35}
{'loss': 1.0027, 'learning_rate': 0.00014856586884032108, 'epoch': 1.35}
{'loss': 0.8941, 'learning_rate': 0.00014846531630690582, 'epoch': 1.35}
{'loss': 0.9139, 'learning_rate': 0.00014836469968657659, 'epoch': 1.35}
{'loss': 0.9428, 'learning_rate': 0.0001482640191123813, 'epoch': 1.36}
{'loss': 0.9904, 'learning_rate': 0.00014816327471745244, 'epoch': 1.36}
{'loss': 0.924, 'learning_rate': 0.00014806246663500686, 'epoch': 1.36}
{'loss': 0.9746, 'learning_rate': 0.00014796159499834568, 'epoch': 1.36}
{'loss': 1.0442, 'learning_rate': 0.00014786065994085396, 'epoch': 1.36}
{'loss': 0.9975, 'learning_rate': 0.0001477596615960007, 'epoch': 1.36}
{'loss': 1.0596, 'learning_rate': 0.00014765860009733858, 'epoch': 1.36}
{'loss': 0.9934, 'learning_rate': 0.00014755747557850378, 'epoch': 1.37}
{'loss': 1.0132, 'learning_rate': 0.00014745628817321578, 'epoch': 1.37}
{'loss': 1.0168, 'learning_rate': 0.00014735503801527726, 'epoch': 1.37}
{'loss': 0.978, 'learning_rate': 0.00014725372523857386, 'epoch': 1.37}
{'loss': 1.0202, 'learning_rate': 0.00014715234997707404, 'epoch': 1.37}
{'loss': 0.988, 'learning_rate': 0.00014705091236482887, 'epoch': 1.37}
{'loss': 1.0283, 'learning_rate': 0.00014694941253597183, 'epoch': 1.38}
{'loss': 1.0172, 'learning_rate': 0.00014684785062471883, 'epoch': 1.38}
{'loss': 0.999, 'learning_rate': 0.0001467462267653676, 'epoch': 1.38}
{'loss': 1.0348, 'learning_rate': 0.00014664454109229808, 'epoch': 1.38}
{'loss': 0.8903, 'learning_rate': 0.00014654279373997172, 'epoch': 1.38}
{'loss': 0.9988, 'learning_rate': 0.00014644098484293164, 'epoch': 1.38}
{'loss': 0.965, 'learning_rate': 0.0001463391145358023, 'epoch': 1.38}
{'loss': 0.9749, 'learning_rate': 0.00014623718295328944, 'epoch': 1.39}
{'loss': 1.0572, 'learning_rate': 0.00014613519023017974, 'epoch': 1.39}
{'loss': 1.0616, 'learning_rate': 0.00014603313650134075, 'epoch': 1.39}
{'loss': 1.1367, 'learning_rate': 0.00014593102190172067, 'epoch': 1.39}
{'loss': 0.9937, 'learning_rate': 0.00014582884656634827, 'epoch': 1.39}
{'loss': 1.0623, 'learning_rate': 0.0001457266106303326, 'epoch': 1.39}
{'loss': 1.1189, 'learning_rate': 0.00014562431422886272, 'epoch': 1.39}
{'loss': 0.9141, 'learning_rate': 0.0001455219574972079, 'epoch': 1.4}
{'loss': 1.0323, 'learning_rate': 0.00014541954057071692, 'epoch': 1.4}
{'loss': 0.9206, 'learning_rate': 0.0001453170635848183, 'epoch': 1.4}
{'loss': 1.103, 'learning_rate': 0.00014521452667501996, 'epoch': 1.4}
{'loss': 1.0007, 'learning_rate': 0.00014511192997690905, 'epoch': 1.4}
{'loss': 0.981, 'learning_rate': 0.00014500927362615177, 'epoch': 1.4}
{'loss': 0.9768, 'learning_rate': 0.00014490655775849324, 'epoch': 1.4}
{'loss': 1.0751, 'learning_rate': 0.0001448037825097572, 'epoch': 1.41}
{'loss': 1.0242, 'learning_rate': 0.000144700948015846, 'epoch': 1.41}
{'loss': 0.9725, 'learning_rate': 0.00014459805441274028, 'epoch': 1.41}
{'loss': 1.0499, 'learning_rate': 0.00014449510183649886, 'epoch': 1.41}
{'loss': 1.0967, 'learning_rate': 0.00014439209042325856, 'epoch': 1.41}
{'loss': 1.095, 'learning_rate': 0.00014428902030923392, 'epoch': 1.41}
{'loss': 1.0553, 'learning_rate': 0.00014418589163071722, 'epoch': 1.41}
{'loss': 1.01, 'learning_rate': 0.00014408270452407807, 'epoch': 1.42}
{'loss': 1.0895, 'learning_rate': 0.0001439794591257634, 'epoch': 1.42}
{'loss': 1.067, 'learning_rate': 0.00014387615557229726, 'epoch': 1.42}
{'loss': 1.0042, 'learning_rate': 0.00014377279400028053, 'epoch': 1.42}
{'loss': 1.0315, 'learning_rate': 0.00014366937454639078, 'epoch': 1.42}
{'loss': 1.0287, 'learning_rate': 0.0001435658973473822, 'epoch': 1.42}
{'loss': 0.9953, 'learning_rate': 0.00014346236254008537, 'epoch': 1.42}
{'loss': 0.9153, 'learning_rate': 0.00014335877026140688, 'epoch': 1.43}
{'loss': 0.874, 'learning_rate': 0.00014325512064832953, 'epoch': 1.43}
{'loss': 1.0082, 'learning_rate': 0.00014315141383791175, 'epoch': 1.43}
{'loss': 0.9922, 'learning_rate': 0.0001430476499672877, 'epoch': 1.43}
{'loss': 0.903, 'learning_rate': 0.000142943829173667, 'epoch': 1.43}
{'loss': 1.0113, 'learning_rate': 0.00014283995159433444, 'epoch': 1.43}
{'loss': 1.0344, 'learning_rate': 0.00014273601736665, 'epoch': 1.43}
{'loss': 0.9561, 'learning_rate': 0.00014263202662804863, 'epoch': 1.44}
{'loss': 0.8212, 'learning_rate': 0.00014252797951603977, 'epoch': 1.44}
{'loss': 1.02, 'learning_rate': 0.00014242387616820762, 'epoch': 1.44}
{'loss': 1.0819, 'learning_rate': 0.0001423197167222107, 'epoch': 1.44}
{'loss': 0.9885, 'learning_rate': 0.00014221550131578162, 'epoch': 1.44}
{'loss': 0.9739, 'learning_rate': 0.00014211123008672712, 'epoch': 1.44}
{'loss': 0.9883, 'learning_rate': 0.0001420069031729276, 'epoch': 1.44}
{'loss': 0.9477, 'learning_rate': 0.00014190252071233727, 'epoch': 1.45}
{'loss': 1.0646, 'learning_rate': 0.0001417980828429836, 'epoch': 1.45}
{'loss': 1.0487, 'learning_rate': 0.0001416935897029675, 'epoch': 1.45}
{'loss': 1.038, 'learning_rate': 0.00014158904143046286, 'epoch': 1.45}
{'loss': 1.0058, 'learning_rate': 0.0001414844381637165, 'epoch': 1.45}
{'loss': 0.9658, 'learning_rate': 0.00014137978004104802, 'epoch': 1.45}
{'loss': 1.11, 'learning_rate': 0.0001412750672008494, 'epoch': 1.45}
{'loss': 1.0374, 'learning_rate': 0.0001411702997815852, 'epoch': 1.46}
{'loss': 0.9788, 'learning_rate': 0.00014106547792179196, 'epoch': 1.46}
{'loss': 0.9702, 'learning_rate': 0.00014096060176007827, 'epoch': 1.46}
{'loss': 0.952, 'learning_rate': 0.00014085567143512457, 'epoch': 1.46}
{'loss': 1.085, 'learning_rate': 0.00014075068708568284, 'epoch': 1.46}
{'loss': 0.9883, 'learning_rate': 0.00014064564885057657, 'epoch': 1.46}
{'loss': 1.0018, 'learning_rate': 0.0001405405568687005, 'epoch': 1.47}
{'loss': 1.0296, 'learning_rate': 0.00014043541127902037, 'epoch': 1.47}
{'loss': 1.0565, 'learning_rate': 0.00014033021222057283, 'epoch': 1.47}
{'loss': 0.9503, 'learning_rate': 0.00014022495983246534, 'epoch': 1.47}
{'loss': 0.9864, 'learning_rate': 0.00014011965425387573, 'epoch': 1.47}
{'loss': 1.0184, 'learning_rate': 0.00014001429562405225, 'epoch': 1.47}
{'loss': 1.0072, 'learning_rate': 0.00013990888408231333, 'epoch': 1.47}
{'loss': 1.0445, 'learning_rate': 0.00013980341976804726, 'epoch': 1.48}
{'loss': 1.0382, 'learning_rate': 0.00013969790282071217, 'epoch': 1.48}
{'loss': 0.9769, 'learning_rate': 0.00013959233337983582, 'epoch': 1.48}
 38%|██████████████████████████████████████████                                                                      | 1032/2752 [17:46<29:04,  1.01s/it]
[2023-12-29 02:21:12,185] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:12,186] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:12,186] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:12,186] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:12,186] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:12,186] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:12,186] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:12,186] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:12,187] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:12,187] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:12,187] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:12,187] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:12,187] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:12,187] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:12,187] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:12,187] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:12,443] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:12,444] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:12,444] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:12,445] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:12,705] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:12,706] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:12,969] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:12,970] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:13,220] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:13,221] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:13,513] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:13,514] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:13,777] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:13,777] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:14,032] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:14,032] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:14,300] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:14,301] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:14,559] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:14,559] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:14,829] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:14,829] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:15,091] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:15,091] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:15,356] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:21:15,356] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
{'eval_loss': 0.9937575459480286, 'eval_runtime': 3.1818, 'eval_samples_per_second': 343.202, 'eval_steps_per_second': 21.686, 'epoch': 1.48}
{'loss': 1.0046, 'learning_rate': 0.0001394867115850153, 'epoch': 1.48}
{'loss': 0.9755, 'learning_rate': 0.00013938103757591704, 'epoch': 1.48}
{'loss': 0.9396, 'learning_rate': 0.00013927531149227645, 'epoch': 1.48}
{'loss': 0.9849, 'learning_rate': 0.00013916953347389776, 'epoch': 1.48}
{'loss': 1.0334, 'learning_rate': 0.00013906370366065398, 'epoch': 1.49}
{'loss': 0.9628, 'learning_rate': 0.0001389578221924865, 'epoch': 1.49}
{'loss': 0.978, 'learning_rate': 0.00013885188920940506, 'epoch': 1.49}
{'loss': 0.9242, 'learning_rate': 0.0001387459048514876, 'epoch': 1.49}
{'loss': 1.0053, 'learning_rate': 0.00013863986925887983, 'epoch': 1.49}
{'loss': 1.0103, 'learning_rate': 0.00013853378257179535, 'epoch': 1.49}
{'loss': 1.105, 'learning_rate': 0.00013842764493051526, 'epoch': 1.49}
{'loss': 1.083, 'learning_rate': 0.00013832145647538799, 'epoch': 1.5}
{'loss': 0.9747, 'learning_rate': 0.00013821521734682933, 'epoch': 1.5}
{'loss': 0.9911, 'learning_rate': 0.00013810892768532186, 'epoch': 1.5}
{'loss': 0.9411, 'learning_rate': 0.00013800258763141518, 'epoch': 1.5}
{'loss': 0.9309, 'learning_rate': 0.00013789619732572538, 'epoch': 1.5}
{'loss': 1.0339, 'learning_rate': 0.00013778975690893503, 'epoch': 1.5}
{'loss': 1.0039, 'learning_rate': 0.00013768326652179307, 'epoch': 1.5}
{'loss': 1.0032, 'learning_rate': 0.00013757672630511434, 'epoch': 1.51}
{'loss': 0.9655, 'learning_rate': 0.00013747013639977973, 'epoch': 1.51}
{'loss': 0.9827, 'learning_rate': 0.00013736349694673576, 'epoch': 1.51}
{'loss': 0.9483, 'learning_rate': 0.00013725680808699444, 'epoch': 1.51}
{'loss': 1.1289, 'learning_rate': 0.00013715006996163317, 'epoch': 1.51}
{'loss': 0.9843, 'learning_rate': 0.0001370432827117945, 'epoch': 1.51}
{'loss': 1.0217, 'learning_rate': 0.00013693644647868586, 'epoch': 1.51}
{'loss': 1.0234, 'learning_rate': 0.00013682956140357954, 'epoch': 1.52}
{'loss': 0.8829, 'learning_rate': 0.00013672262762781242, 'epoch': 1.52}
{'loss': 1.0308, 'learning_rate': 0.0001366156452927856, 'epoch': 1.52}
{'loss': 0.9609, 'learning_rate': 0.00013650861453996465, 'epoch': 1.52}
{'loss': 0.9404, 'learning_rate': 0.00013640153551087902, 'epoch': 1.52}
{'loss': 1.0322, 'learning_rate': 0.000136294408347122, 'epoch': 1.52}
{'loss': 1.0137, 'learning_rate': 0.00013618723319035056, 'epoch': 1.52}
{'loss': 1.0723, 'learning_rate': 0.00013608001018228512, 'epoch': 1.53}
{'loss': 0.9559, 'learning_rate': 0.00013597273946470937, 'epoch': 1.53}
{'loss': 1.0487, 'learning_rate': 0.0001358654211794701, 'epoch': 1.53}
{'loss': 1.0032, 'learning_rate': 0.000135758055468477, 'epoch': 1.53}
{'loss': 1.0157, 'learning_rate': 0.00013565064247370248, 'epoch': 1.53}
{'loss': 1.0161, 'learning_rate': 0.00013554318233718136, 'epoch': 1.53}
{'loss': 1.1203, 'learning_rate': 0.00013543567520101106, 'epoch': 1.53}
{'loss': 0.9438, 'learning_rate': 0.00013532812120735087, 'epoch': 1.54}
{'loss': 1.0074, 'learning_rate': 0.00013522052049842216, 'epoch': 1.54}
{'loss': 0.9111, 'learning_rate': 0.0001351128732165081, 'epoch': 1.54}
{'loss': 1.0367, 'learning_rate': 0.00013500517950395348, 'epoch': 1.54}
{'loss': 1.1566, 'learning_rate': 0.0001348974395031643, 'epoch': 1.54}
{'loss': 1.0114, 'learning_rate': 0.00013478965335660798, 'epoch': 1.54}
{'loss': 0.9228, 'learning_rate': 0.00013468182120681278, 'epoch': 1.55}
{'loss': 1.0196, 'learning_rate': 0.000134573943196368, 'epoch': 1.55}
{'loss': 1.0479, 'learning_rate': 0.00013446601946792334, 'epoch': 1.55}
{'loss': 0.9756, 'learning_rate': 0.00013435805016418913, 'epoch': 1.55}
{'loss': 1.0486, 'learning_rate': 0.00013425003542793596, 'epoch': 1.55}
{'loss': 0.9684, 'learning_rate': 0.00013414197540199436, 'epoch': 1.55}
{'loss': 0.9797, 'learning_rate': 0.00013403387022925488, 'epoch': 1.55}
{'loss': 0.9891, 'learning_rate': 0.0001339257200526677, 'epoch': 1.56}
{'loss': 0.9018, 'learning_rate': 0.0001338175250152426, 'epoch': 1.56}
{'loss': 1.0394, 'learning_rate': 0.00013370928526004855, 'epoch': 1.56}
{'loss': 0.9698, 'learning_rate': 0.00013360100093021376, 'epoch': 1.56}
{'loss': 0.9922, 'learning_rate': 0.00013349267216892529, 'epoch': 1.56}
{'loss': 0.9479, 'learning_rate': 0.00013338429911942908, 'epoch': 1.56}
{'loss': 1.0514, 'learning_rate': 0.00013327588192502948, 'epoch': 1.56}
{'loss': 1.1081, 'learning_rate': 0.00013316742072908927, 'epoch': 1.57}
{'loss': 0.9938, 'learning_rate': 0.00013305891567502953, 'epoch': 1.57}
{'loss': 0.9916, 'learning_rate': 0.0001329503669063292, 'epoch': 1.57}
{'loss': 1.0116, 'learning_rate': 0.000132841774566525, 'epoch': 1.57}
{'loss': 1.0194, 'learning_rate': 0.0001327331387992114, 'epoch': 1.57}
{'loss': 1.0529, 'learning_rate': 0.0001326244597480402, 'epoch': 1.57}
{'loss': 1.032, 'learning_rate': 0.00013251573755672047, 'epoch': 1.57}
{'loss': 1.0313, 'learning_rate': 0.0001324069723690183, 'epoch': 1.58}
{'loss': 0.9669, 'learning_rate': 0.00013229816432875664, 'epoch': 1.58}
{'loss': 0.9497, 'learning_rate': 0.00013218931357981514, 'epoch': 1.58}
{'loss': 1.0103, 'learning_rate': 0.0001320804202661299, 'epoch': 1.58}
{'loss': 1.0261, 'learning_rate': 0.0001319714845316933, 'epoch': 1.58}
{'loss': 0.9605, 'learning_rate': 0.00013186250652055378, 'epoch': 1.58}
{'loss': 1.0291, 'learning_rate': 0.00013175348637681575, 'epoch': 1.58}
{'loss': 0.959, 'learning_rate': 0.00013164442424463935, 'epoch': 1.59}
{'loss': 1.0226, 'learning_rate': 0.0001315353202682401, 'epoch': 1.59}
{'loss': 0.957, 'learning_rate': 0.00013142617459188899, 'epoch': 1.59}
{'loss': 0.964, 'learning_rate': 0.00013131698735991217, 'epoch': 1.59}
{'loss': 0.9521, 'learning_rate': 0.0001312077587166906, 'epoch': 1.59}
{'loss': 1.0107, 'learning_rate': 0.00013109848880666014, 'epoch': 1.59}
{'loss': 0.9698, 'learning_rate': 0.0001309891777743111, 'epoch': 1.59}
{'loss': 1.0156, 'learning_rate': 0.00013087982576418823, 'epoch': 1.6}
{'loss': 1.1068, 'learning_rate': 0.00013077043292089054, 'epoch': 1.6}
{'loss': 1.1366, 'learning_rate': 0.00013066099938907085, 'epoch': 1.6}
{'loss': 0.9279, 'learning_rate': 0.00013055152531343592, 'epoch': 1.6}
{'loss': 1.0342, 'learning_rate': 0.00013044201083874612, 'epoch': 1.6}
{'loss': 0.9757, 'learning_rate': 0.00013033245610981516, 'epoch': 1.6}
{'loss': 0.9279, 'learning_rate': 0.00013022286127151007, 'epoch': 1.6}
{'loss': 0.9938, 'learning_rate': 0.00013011322646875088, 'epoch': 1.61}
{'loss': 0.9187, 'learning_rate': 0.00013000355184651045, 'epoch': 1.61}
{'loss': 1.0253, 'learning_rate': 0.0001298938375498143, 'epoch': 1.61}
{'loss': 1.1222, 'learning_rate': 0.00012978408372374048, 'epoch': 1.61}
{'loss': 1.0878, 'learning_rate': 0.00012967429051341913, 'epoch': 1.61}
{'loss': 0.9866, 'learning_rate': 0.0001295644580640327, 'epoch': 1.61}
{'loss': 1.0562, 'learning_rate': 0.00012945458652081535, 'epoch': 1.61}
{'loss': 1.0038, 'learning_rate': 0.00012934467602905304, 'epoch': 1.62}
{'loss': 0.9432, 'learning_rate': 0.00012923472673408322, 'epoch': 1.62}
{'loss': 0.9757, 'learning_rate': 0.00012912473878129454, 'epoch': 1.62}
{'loss': 1.1197, 'learning_rate': 0.0001290147123161269, 'epoch': 1.62}
{'loss': 1.012, 'learning_rate': 0.0001289046474840711, 'epoch': 1.62}
{'loss': 0.9468, 'learning_rate': 0.0001287945444306686, 'epoch': 1.62}
{'loss': 0.9955, 'learning_rate': 0.00012868440330151152, 'epoch': 1.62}
{'loss': 1.0147, 'learning_rate': 0.0001285742242422422, 'epoch': 1.63}
{'loss': 0.9599, 'learning_rate': 0.00012846400739855324, 'epoch': 1.63}
{'loss': 1.0064, 'learning_rate': 0.00012835375291618716, 'epoch': 1.63}
{'loss': 1.0272, 'learning_rate': 0.0001282434609409362, 'epoch': 1.63}
{'loss': 1.047, 'learning_rate': 0.00012813313161864228, 'epoch': 1.63}
{'loss': 0.9432, 'learning_rate': 0.00012802276509519666, 'epoch': 1.63}
{'loss': 1.0305, 'learning_rate': 0.00012791236151653973, 'epoch': 1.64}
{'loss': 0.9884, 'learning_rate': 0.00012780192102866098, 'epoch': 1.64}
{'loss': 0.9868, 'learning_rate': 0.00012769144377759866, 'epoch': 1.64}
{'loss': 1.0013, 'learning_rate': 0.00012758092990943962, 'epoch': 1.64}
{'loss': 0.9558, 'learning_rate': 0.00012747037957031916, 'epoch': 1.64}
{'loss': 1.0111, 'learning_rate': 0.00012735979290642076, 'epoch': 1.64}
{'loss': 1.0384, 'learning_rate': 0.000127249170063976, 'epoch': 1.64}
{'loss': 0.9203, 'learning_rate': 0.00012713851118926426, 'epoch': 1.65}
{'loss': 0.9246, 'learning_rate': 0.00012702781642861253, 'epoch': 1.65}
{'loss': 1.0293, 'learning_rate': 0.00012691708592839533, 'epoch': 1.65}
{'loss': 1.0329, 'learning_rate': 0.00012680631983503436, 'epoch': 1.65}
{'loss': 0.9835, 'learning_rate': 0.00012669551829499852, 'epoch': 1.65}
{'loss': 0.9903, 'learning_rate': 0.00012658468145480337, 'epoch': 1.65}
{'loss': 0.9837, 'learning_rate': 0.0001264738094610114, 'epoch': 1.65}
{'loss': 0.9596, 'learning_rate': 0.0001263629024602313, 'epoch': 1.66}
{'loss': 1.0152, 'learning_rate': 0.00012625196059911834, 'epoch': 1.66}
{'loss': 1.065, 'learning_rate': 0.00012614098402437366, 'epoch': 1.66}
{'loss': 1.1273, 'learning_rate': 0.00012602997288274444, 'epoch': 1.66}
{'loss': 0.9468, 'learning_rate': 0.0001259189273210235, 'epoch': 1.66}
{'loss': 1.0009, 'learning_rate': 0.00012580784748604922, 'epoch': 1.66}
{'loss': 0.994, 'learning_rate': 0.00012569673352470523, 'epoch': 1.66}
{'loss': 1.049, 'learning_rate': 0.00012558558558392038, 'epoch': 1.67}
{'loss': 0.9821, 'learning_rate': 0.0001254744038106684, 'epoch': 1.67}
{'loss': 1.1094, 'learning_rate': 0.00012536318835196773, 'epoch': 1.67}
{'loss': 0.9801, 'learning_rate': 0.00012525193935488137, 'epoch': 1.67}
{'loss': 1.0237, 'learning_rate': 0.00012514065696651674, 'epoch': 1.67}
{'loss': 0.9143, 'learning_rate': 0.00012502934133402533, 'epoch': 1.67}
{'loss': 1.065, 'learning_rate': 0.00012491799260460265, 'epoch': 1.67}
{'loss': 0.9975, 'learning_rate': 0.00012480661092548786, 'epoch': 1.68}
{'loss': 0.9822, 'learning_rate': 0.00012469519644396385, 'epoch': 1.68}
{'loss': 0.9908, 'learning_rate': 0.0001245837493073568, 'epoch': 1.68}
{'loss': 0.9924, 'learning_rate': 0.00012447226966303605, 'epoch': 1.68}
{'loss': 1.0077, 'learning_rate': 0.00012436075765841396, 'epoch': 1.68}
{'loss': 0.9311, 'learning_rate': 0.00012424921344094566, 'epoch': 1.68}
{'loss': 1.0039, 'learning_rate': 0.0001241376371581289, 'epoch': 1.68}
{'loss': 0.995, 'learning_rate': 0.0001240260289575039, 'epoch': 1.69}
{'loss': 0.9583, 'learning_rate': 0.00012391438898665287, 'epoch': 1.69}
{'loss': 0.9996, 'learning_rate': 0.00012380271739320027, 'epoch': 1.69}
{'loss': 0.9926, 'learning_rate': 0.00012369101432481224, 'epoch': 1.69}
{'loss': 1.0325, 'learning_rate': 0.00012357927992919657, 'epoch': 1.69}
{'loss': 1.0725, 'learning_rate': 0.00012346751435410248, 'epoch': 1.69}
{'loss': 0.9411, 'learning_rate': 0.00012335571774732044, 'epoch': 1.69}
{'loss': 0.9459, 'learning_rate': 0.0001232438902566819, 'epoch': 1.7}
{'loss': 0.9839, 'learning_rate': 0.0001231320320300592, 'epoch': 1.7}
{'loss': 1.0347, 'learning_rate': 0.0001230201432153653, 'epoch': 1.7}
{'loss': 0.9203, 'learning_rate': 0.00012290822396055355, 'epoch': 1.7}
{'loss': 1.018, 'learning_rate': 0.00012279627441361772, 'epoch': 1.7}
{'loss': 1.0189, 'learning_rate': 0.00012268429472259143, 'epoch': 1.7}
{'loss': 1.0424, 'learning_rate': 0.00012257228503554835, 'epoch': 1.7}
{'loss': 0.9882, 'learning_rate': 0.00012246024550060166, 'epoch': 1.71}
{'loss': 0.9952, 'learning_rate': 0.0001223481762659041, 'epoch': 1.71}
{'loss': 0.9864, 'learning_rate': 0.00012223607747964766, 'epoch': 1.71}
{'loss': 0.9798, 'learning_rate': 0.00012212394929006336, 'epoch': 1.71}
{'loss': 0.9661, 'learning_rate': 0.00012201179184542115, 'epoch': 1.71}
{'loss': 1.0852, 'learning_rate': 0.00012189960529402971, 'epoch': 1.71}
{'loss': 0.942, 'learning_rate': 0.00012178738978423612, 'epoch': 1.72}
{'loss': 1.0777, 'learning_rate': 0.00012167514546442576, 'epoch': 1.72}
{'loss': 0.996, 'learning_rate': 0.00012156287248302219, 'epoch': 1.72}
{'loss': 1.0227, 'learning_rate': 0.00012145057098848673, 'epoch': 1.72}
{'loss': 1.0104, 'learning_rate': 0.00012133824112931858, 'epoch': 1.72}
{'loss': 0.9697, 'learning_rate': 0.00012122588305405434, 'epoch': 1.72}
{'loss': 0.9514, 'learning_rate': 0.00012111349691126785, 'epoch': 1.72}
{'loss': 0.9427, 'learning_rate': 0.00012100108284957028, 'epoch': 1.73}
{'loss': 0.9278, 'learning_rate': 0.0001208886410176095, 'epoch': 1.73}
{'loss': 1.085, 'learning_rate': 0.0001207761715640702, 'epoch': 1.73}
 44%|█████████████████████████████████████████████████                                                               | 1204/2752 [20:44<26:11,  1.02s/it]
[2023-12-29 02:24:09,743] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:09,743] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:09,743] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:09,743] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:09,743] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:09,743] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:09,743] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:09,744] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:09,744] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:09,744] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:09,744] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:09,744] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:09,744] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:09,745] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:09,746] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:09,747] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:09,999] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:09,999] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:10,000] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:10,001] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:10,268] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:10,269] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:10,528] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:10,528] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:10,779] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:10,780] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:11,065] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:11,066] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:11,334] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:11,335] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:11,586] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:11,586] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:11,850] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:11,850] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:12,111] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:12,112] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:12,381] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:12,382] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:12,644] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:12,644] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:12,908] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:24:12,908] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
{'eval_loss': 0.986236572265625, 'eval_runtime': 3.1796, 'eval_samples_per_second': 343.443, 'eval_steps_per_second': 21.701, 'epoch': 1.73}
{'loss': 1.0051, 'learning_rate': 0.00012066367463767361, 'epoch': 1.73}
{'loss': 1.0065, 'learning_rate': 0.00012055115038717722, 'epoch': 1.73}
{'loss': 0.8877, 'learning_rate': 0.00012043859896137472, 'epoch': 1.73}
{'loss': 0.9551, 'learning_rate': 0.00012032602050909574, 'epoch': 1.73}
{'loss': 0.9067, 'learning_rate': 0.00012021341517920555, 'epoch': 1.74}
{'loss': 1.0503, 'learning_rate': 0.00012010078312060504, 'epoch': 1.74}
{'loss': 0.9591, 'learning_rate': 0.00011998812448223049, 'epoch': 1.74}
{'loss': 0.9306, 'learning_rate': 0.00011987543941305321, 'epoch': 1.74}
{'loss': 1.0437, 'learning_rate': 0.00011976272806207956, 'epoch': 1.74}
{'loss': 1.0226, 'learning_rate': 0.00011964999057835055, 'epoch': 1.74}
{'loss': 1.0021, 'learning_rate': 0.00011953722711094189, 'epoch': 1.74}
{'loss': 0.899, 'learning_rate': 0.00011942443780896351, 'epoch': 1.75}
{'loss': 0.9354, 'learning_rate': 0.00011931162282155953, 'epoch': 1.75}
{'loss': 0.9396, 'learning_rate': 0.00011919878229790813, 'epoch': 1.75}
{'loss': 1.0293, 'learning_rate': 0.00011908591638722115, 'epoch': 1.75}
{'loss': 0.9847, 'learning_rate': 0.00011897302523874405, 'epoch': 1.75}
{'loss': 0.9713, 'learning_rate': 0.00011886010900175564, 'epoch': 1.75}
{'loss': 0.9512, 'learning_rate': 0.00011874716782556794, 'epoch': 1.75}
{'loss': 0.9958, 'learning_rate': 0.0001186342018595259, 'epoch': 1.76}
{'loss': 0.918, 'learning_rate': 0.0001185212112530073, 'epoch': 1.76}
{'loss': 1.0313, 'learning_rate': 0.00011840819615542247, 'epoch': 1.76}
{'loss': 1.0363, 'learning_rate': 0.00011829515671621412, 'epoch': 1.76}
{'loss': 1.0067, 'learning_rate': 0.00011818209308485717, 'epoch': 1.76}
{'loss': 1.0817, 'learning_rate': 0.0001180690054108585, 'epoch': 1.76}
{'loss': 0.9825, 'learning_rate': 0.00011795589384375686, 'epoch': 1.76}
{'loss': 1.1017, 'learning_rate': 0.00011784275853312245, 'epoch': 1.77}
{'loss': 0.9762, 'learning_rate': 0.00011772959962855704, 'epoch': 1.77}
{'loss': 1.0414, 'learning_rate': 0.00011761641727969343, 'epoch': 1.77}
{'loss': 0.8992, 'learning_rate': 0.0001175032116361956, 'epoch': 1.77}
{'loss': 1.107, 'learning_rate': 0.00011738998284775815, 'epoch': 1.77}
{'loss': 0.9726, 'learning_rate': 0.00011727673106410642, 'epoch': 1.77}
{'loss': 1.1176, 'learning_rate': 0.00011716345643499608, 'epoch': 1.77}
{'loss': 1.0255, 'learning_rate': 0.00011705015911021307, 'epoch': 1.78}
{'loss': 0.9226, 'learning_rate': 0.00011693683923957328, 'epoch': 1.78}
{'loss': 0.9997, 'learning_rate': 0.00011682349697292245, 'epoch': 1.78}
{'loss': 1.0464, 'learning_rate': 0.00011671013246013596, 'epoch': 1.78}
{'loss': 1.0169, 'learning_rate': 0.0001165967458511185, 'epoch': 1.78}
{'loss': 0.99, 'learning_rate': 0.00011648333729580412, 'epoch': 1.78}
{'loss': 0.9599, 'learning_rate': 0.0001163699069441558, 'epoch': 1.78}
{'loss': 1.0039, 'learning_rate': 0.00011625645494616535, 'epoch': 1.79}
{'loss': 0.9786, 'learning_rate': 0.00011614298145185323, 'epoch': 1.79}
{'loss': 1.0318, 'learning_rate': 0.00011602948661126828, 'epoch': 1.79}
{'loss': 1.0288, 'learning_rate': 0.00011591597057448769, 'epoch': 1.79}
{'loss': 0.9541, 'learning_rate': 0.0001158024334916165, 'epoch': 1.79}
{'loss': 0.9439, 'learning_rate': 0.00011568887551278768, 'epoch': 1.79}
{'loss': 1.115, 'learning_rate': 0.00011557529678816188, 'epoch': 1.8}
{'loss': 1.0287, 'learning_rate': 0.00011546169746792705, 'epoch': 1.8}
{'loss': 1.0036, 'learning_rate': 0.00011534807770229845, 'epoch': 1.8}
{'loss': 1.0283, 'learning_rate': 0.00011523443764151842, 'epoch': 1.8}
{'loss': 1.0294, 'learning_rate': 0.00011512077743585603, 'epoch': 1.8}
{'loss': 1.0725, 'learning_rate': 0.0001150070972356071, 'epoch': 1.8}
{'loss': 0.9904, 'learning_rate': 0.00011489339719109378, 'epoch': 1.8}
{'loss': 0.9253, 'learning_rate': 0.00011477967745266453, 'epoch': 1.81}
{'loss': 1.0266, 'learning_rate': 0.00011466593817069391, 'epoch': 1.81}
{'loss': 0.949, 'learning_rate': 0.00011455217949558217, 'epoch': 1.81}
{'loss': 1.0171, 'learning_rate': 0.00011443840157775527, 'epoch': 1.81}
{'loss': 0.9657, 'learning_rate': 0.00011432460456766471, 'epoch': 1.81}
{'loss': 0.9154, 'learning_rate': 0.00011421078861578709, 'epoch': 1.81}
{'loss': 1.0106, 'learning_rate': 0.00011409695387262416, 'epoch': 1.81}
{'loss': 1.0194, 'learning_rate': 0.00011398310048870247, 'epoch': 1.82}
{'loss': 0.9796, 'learning_rate': 0.00011386922861457319, 'epoch': 1.82}
{'loss': 0.9294, 'learning_rate': 0.00011375533840081202, 'epoch': 1.82}
{'loss': 0.9676, 'learning_rate': 0.00011364142999801887, 'epoch': 1.82}
{'loss': 1.049, 'learning_rate': 0.00011352750355681772, 'epoch': 1.82}
{'loss': 0.9453, 'learning_rate': 0.00011341355922785634, 'epoch': 1.82}
{'loss': 1.0379, 'learning_rate': 0.00011329959716180622, 'epoch': 1.82}
{'loss': 0.9617, 'learning_rate': 0.00011318561750936232, 'epoch': 1.83}
{'loss': 0.9331, 'learning_rate': 0.00011307162042124277, 'epoch': 1.83}
{'loss': 0.9028, 'learning_rate': 0.00011295760604818882, 'epoch': 1.83}
{'loss': 0.9642, 'learning_rate': 0.00011284357454096457, 'epoch': 1.83}
{'loss': 1.0175, 'learning_rate': 0.00011272952605035674, 'epoch': 1.83}
{'loss': 1.003, 'learning_rate': 0.00011261546072717454, 'epoch': 1.83}
{'loss': 1.0012, 'learning_rate': 0.00011250137872224946, 'epoch': 1.83}
{'loss': 0.9744, 'learning_rate': 0.00011238728018643499, 'epoch': 1.84}
{'loss': 1.0007, 'learning_rate': 0.00011227316527060651, 'epoch': 1.84}
{'loss': 0.9524, 'learning_rate': 0.00011215903412566111, 'epoch': 1.84}
{'loss': 0.9723, 'learning_rate': 0.00011204488690251725, 'epoch': 1.84}
{'loss': 1.0132, 'learning_rate': 0.00011193072375211468, 'epoch': 1.84}
{'loss': 1.0582, 'learning_rate': 0.00011181654482541428, 'epoch': 1.84}
{'loss': 0.9719, 'learning_rate': 0.00011170235027339766, 'epoch': 1.84}
{'loss': 1.0145, 'learning_rate': 0.0001115881402470672, 'epoch': 1.85}
{'loss': 1.0237, 'learning_rate': 0.0001114739148974457, 'epoch': 1.85}
{'loss': 0.9978, 'learning_rate': 0.00011135967437557626, 'epoch': 1.85}
{'loss': 1.0125, 'learning_rate': 0.00011124541883252198, 'epoch': 1.85}
{'loss': 0.9764, 'learning_rate': 0.00011113114841936584, 'epoch': 1.85}
{'loss': 1.1135, 'learning_rate': 0.00011101686328721053, 'epoch': 1.85}
{'loss': 1.0534, 'learning_rate': 0.00011090256358717819, 'epoch': 1.85}
{'loss': 1.0163, 'learning_rate': 0.00011078824947041016, 'epoch': 1.86}
{'loss': 1.0513, 'learning_rate': 0.00011067392108806692, 'epoch': 1.86}
{'loss': 0.9866, 'learning_rate': 0.00011055957859132773, 'epoch': 1.86}
{'loss': 0.9761, 'learning_rate': 0.00011044522213139064, 'epoch': 1.86}
{'loss': 1.0012, 'learning_rate': 0.00011033085185947208, 'epoch': 1.86}
{'loss': 1.0798, 'learning_rate': 0.00011021646792680667, 'epoch': 1.86}
{'loss': 0.9703, 'learning_rate': 0.00011010207048464729, 'epoch': 1.86}
{'loss': 1.007, 'learning_rate': 0.00010998765968426449, 'epoch': 1.87}
{'loss': 0.9898, 'learning_rate': 0.00010987323567694661, 'epoch': 1.87}
{'loss': 1.0196, 'learning_rate': 0.00010975879861399938, 'epoch': 1.87}
{'loss': 1.0584, 'learning_rate': 0.00010964434864674584, 'epoch': 1.87}
{'loss': 1.0112, 'learning_rate': 0.00010952988592652611, 'epoch': 1.87}
{'loss': 0.9821, 'learning_rate': 0.00010941541060469712, 'epoch': 1.87}
{'loss': 0.9078, 'learning_rate': 0.00010930092283263243, 'epoch': 1.88}
{'loss': 1.0014, 'learning_rate': 0.00010918642276172218, 'epoch': 1.88}
{'loss': 1.0205, 'learning_rate': 0.00010907191054337269, 'epoch': 1.88}
{'loss': 0.9852, 'learning_rate': 0.00010895738632900636, 'epoch': 1.88}
{'loss': 0.9204, 'learning_rate': 0.00010884285027006147, 'epoch': 1.88}
{'loss': 0.96, 'learning_rate': 0.0001087283025179919, 'epoch': 1.88}
{'loss': 1.0313, 'learning_rate': 0.00010861374322426714, 'epoch': 1.88}
{'loss': 1.006, 'learning_rate': 0.00010849917254037174, 'epoch': 1.89}
{'loss': 1.0996, 'learning_rate': 0.00010838459061780546, 'epoch': 1.89}
{'loss': 0.9643, 'learning_rate': 0.0001082699976080829, 'epoch': 1.89}
{'loss': 0.9693, 'learning_rate': 0.00010815539366273327, 'epoch': 1.89}
{'loss': 0.9897, 'learning_rate': 0.00010804077893330023, 'epoch': 1.89}
{'loss': 0.9793, 'learning_rate': 0.0001079261535713418, 'epoch': 1.89}
{'loss': 0.943, 'learning_rate': 0.00010781151772842993, 'epoch': 1.89}
{'loss': 1.0706, 'learning_rate': 0.00010769687155615055, 'epoch': 1.9}
{'loss': 0.9709, 'learning_rate': 0.00010758221520610321, 'epoch': 1.9}
{'loss': 0.9712, 'learning_rate': 0.00010746754882990082, 'epoch': 1.9}
{'loss': 0.9715, 'learning_rate': 0.00010735287257916972, 'epoch': 1.9}
{'loss': 1.0767, 'learning_rate': 0.00010723818660554913, 'epoch': 1.9}
{'loss': 1.0162, 'learning_rate': 0.00010712349106069131, 'epoch': 1.9}
{'loss': 1.0161, 'learning_rate': 0.00010700878609626102, 'epoch': 1.9}
{'loss': 0.9651, 'learning_rate': 0.00010689407186393552, 'epoch': 1.91}
{'loss': 0.962, 'learning_rate': 0.0001067793485154044, 'epoch': 1.91}
{'loss': 0.9621, 'learning_rate': 0.00010666461620236922, 'epoch': 1.91}
{'loss': 0.9946, 'learning_rate': 0.00010654987507654341, 'epoch': 1.91}
{'loss': 1.0797, 'learning_rate': 0.00010643512528965207, 'epoch': 1.91}
{'loss': 0.9162, 'learning_rate': 0.00010632036699343178, 'epoch': 1.91}
{'loss': 0.9236, 'learning_rate': 0.00010620560033963025, 'epoch': 1.91}
{'loss': 0.9982, 'learning_rate': 0.00010609082548000642, 'epoch': 1.92}
{'loss': 0.9382, 'learning_rate': 0.00010597604256632994, 'epoch': 1.92}
{'loss': 0.9533, 'learning_rate': 0.0001058612517503812, 'epoch': 1.92}
{'loss': 0.9939, 'learning_rate': 0.00010574645318395095, 'epoch': 1.92}
{'loss': 0.9483, 'learning_rate': 0.00010563164701884027, 'epoch': 1.92}
{'loss': 0.9357, 'learning_rate': 0.00010551683340686027, 'epoch': 1.92}
{'loss': 1.0208, 'learning_rate': 0.00010540201249983188, 'epoch': 1.92}
{'loss': 0.9353, 'learning_rate': 0.00010528718444958567, 'epoch': 1.93}
{'loss': 0.9885, 'learning_rate': 0.00010517234940796173, 'epoch': 1.93}
{'loss': 0.9881, 'learning_rate': 0.00010505750752680926, 'epoch': 1.93}
{'loss': 1.0692, 'learning_rate': 0.00010494265895798665, 'epoch': 1.93}
{'loss': 0.9999, 'learning_rate': 0.00010482780385336106, 'epoch': 1.93}
{'loss': 0.9933, 'learning_rate': 0.00010471294236480827, 'epoch': 1.93}
{'loss': 0.9862, 'learning_rate': 0.00010459807464421257, 'epoch': 1.93}
{'loss': 0.9535, 'learning_rate': 0.0001044832008434664, 'epoch': 1.94}
{'loss': 1.1075, 'learning_rate': 0.00010436832111447034, 'epoch': 1.94}
{'loss': 0.9865, 'learning_rate': 0.00010425343560913277, 'epoch': 1.94}
{'loss': 1.0135, 'learning_rate': 0.00010413854447936966, 'epoch': 1.94}
{'loss': 0.9973, 'learning_rate': 0.00010402364787710451, 'epoch': 1.94}
{'loss': 0.9933, 'learning_rate': 0.00010390874595426794, 'epoch': 1.94}
{'loss': 0.9078, 'learning_rate': 0.0001037938388627977, 'epoch': 1.94}
{'loss': 0.9869, 'learning_rate': 0.00010367892675463837, 'epoch': 1.95}
{'loss': 1.0976, 'learning_rate': 0.0001035640097817411, 'epoch': 1.95}
{'loss': 0.9593, 'learning_rate': 0.00010344908809606353, 'epoch': 1.95}
{'loss': 0.9876, 'learning_rate': 0.0001033341618495695, 'epoch': 1.95}
{'loss': 0.9538, 'learning_rate': 0.0001032192311942289, 'epoch': 1.95}
{'loss': 0.996, 'learning_rate': 0.00010310429628201743, 'epoch': 1.95}
{'loss': 0.984, 'learning_rate': 0.00010298935726491648, 'epoch': 1.95}
{'loss': 0.9882, 'learning_rate': 0.00010287441429491274, 'epoch': 1.96}
{'loss': 0.9246, 'learning_rate': 0.0001027594675239983, 'epoch': 1.96}
{'loss': 0.9562, 'learning_rate': 0.00010264451710417011, 'epoch': 1.96}
{'loss': 0.9681, 'learning_rate': 0.00010252956318743006, 'epoch': 1.96}
{'loss': 0.9678, 'learning_rate': 0.0001024146059257846, 'epoch': 1.96}
{'loss': 0.9447, 'learning_rate': 0.00010229964547124464, 'epoch': 1.96}
{'loss': 1.0149, 'learning_rate': 0.0001021846819758253, 'epoch': 1.97}
{'loss': 1.0448, 'learning_rate': 0.0001020697155915457, 'epoch': 1.97}
{'loss': 0.9901, 'learning_rate': 0.0001019547464704288, 'epoch': 1.97}
{'loss': 0.9507, 'learning_rate': 0.00010183977476450117, 'epoch': 1.97}
{'loss': 0.9689, 'learning_rate': 0.00010172480062579287, 'epoch': 1.97}
{'loss': 0.9805, 'learning_rate': 0.000101609824206337, 'epoch': 1.97}
{'loss': 1.002, 'learning_rate': 0.00010149484565816992, 'epoch': 1.97}
{'loss': 1.0156, 'learning_rate': 0.00010137986513333055, 'epoch': 1.98}
{'loss': 0.9005, 'learning_rate': 0.00010126488278386063, 'epoch': 1.98}
{'loss': 0.8677, 'learning_rate': 0.00010114989876180423, 'epoch': 1.98}
 50%|████████████████████████████████████████████████████████                                                        | 1376/2752 [23:41<23:01,  1.00s/it]
[2023-12-29 02:27:06,938] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:27:06,938] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:27:06,938] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:27:06,938] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:27:06,938] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:27:06,938] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:27:06,938] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:27:06,938] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:27:06,938] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:27:06,938] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:27:06,938] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:27:06,939] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:27:06,939] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:27:06,939] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:27:06,939] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:27:06,940] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:27:07,193] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:27:07,193] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:27:07,194] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:27:07,195] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:27:07,455] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:27:07,456] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:27:07,720] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:27:07,721] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:27:07,971] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:27:07,971] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:27:08,256] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:27:08,256] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:27:08,521] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:27:08,522] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:27:08,775] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:27:08,775] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:27:09,036] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:27:09,036] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:27:09,298] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:27:09,299] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:27:09,568] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:27:09,569] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:27:09,830] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:27:09,831] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:27:10,095] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:27:10,096] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
{'eval_loss': 0.9859890937805176, 'eval_runtime': 3.1694, 'eval_samples_per_second': 344.542, 'eval_steps_per_second': 21.77, 'epoch': 1.98}
{'loss': 0.9542, 'learning_rate': 0.00010103491321920757, 'epoch': 1.98}
{'loss': 0.994, 'learning_rate': 0.000100919926308119, 'epoch': 1.98}
{'loss': 1.0406, 'learning_rate': 0.00010080493818058859, 'epoch': 1.98}
{'loss': 1.0105, 'learning_rate': 0.00010068994898866804, 'epoch': 1.98}
{'loss': 0.8996, 'learning_rate': 0.00010057495888441046, 'epoch': 1.99}
{'loss': 0.9243, 'learning_rate': 0.00010045996801987023, 'epoch': 1.99}
{'loss': 0.9922, 'learning_rate': 0.00010034497654710266, 'epoch': 1.99}
{'loss': 1.017, 'learning_rate': 0.00010022998461816389, 'epoch': 1.99}
{'loss': 0.93, 'learning_rate': 0.00010011499238511062, 'epoch': 1.99}
{'loss': 0.9934, 'learning_rate': 0.0001, 'epoch': 1.99}
{'loss': 1.0425, 'learning_rate': 9.988500761488941e-05, 'epoch': 1.99}
{'loss': 0.9271, 'learning_rate': 9.977001538183616e-05, 'epoch': 2.0}
{'loss': 0.8973, 'learning_rate': 9.965502345289733e-05, 'epoch': 2.0}
{'loss': 0.9481, 'learning_rate': 9.954003198012977e-05, 'epoch': 2.0}
{'loss': 0.9896, 'learning_rate': 9.942504111558956e-05, 'epoch': 2.0}
{'loss': 1.0777, 'learning_rate': 9.9310051011332e-05, 'epoch': 2.0}
{'loss': 1.0304, 'learning_rate': 9.919506181941146e-05, 'epoch': 2.0}
{'loss': 1.0172, 'learning_rate': 9.908007369188105e-05, 'epoch': 2.0}
{'loss': 1.0018, 'learning_rate': 9.896508678079244e-05, 'epoch': 2.01}
{'loss': 1.0292, 'learning_rate': 9.88501012381958e-05, 'epoch': 2.01}
{'loss': 1.0116, 'learning_rate': 9.873511721613938e-05, 'epoch': 2.01}
{'loss': 1.0264, 'learning_rate': 9.862013486666947e-05, 'epoch': 2.01}
{'loss': 1.0555, 'learning_rate': 9.850515434183013e-05, 'epoch': 2.01}
{'loss': 1.0142, 'learning_rate': 9.839017579366299e-05, 'epoch': 2.01}
{'loss': 0.9613, 'learning_rate': 9.827519937420716e-05, 'epoch': 2.01}
{'loss': 1.0444, 'learning_rate': 9.816022523549885e-05, 'epoch': 2.02}
{'loss': 0.9935, 'learning_rate': 9.804525352957124e-05, 'epoch': 2.02}
{'loss': 1.0946, 'learning_rate': 9.793028440845434e-05, 'epoch': 2.02}
{'loss': 0.9827, 'learning_rate': 9.781531802417473e-05, 'epoch': 2.02}
{'loss': 1.0292, 'learning_rate': 9.770035452875537e-05, 'epoch': 2.02}
 51%|█████████████████████████████████████████████████████████▏                                                      | 1406/2752 [24:15<22:39,  1.01s/it]
[2023-12-29 02:27:40,645] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:27:40,645] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:27:40,645] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:27:40,645] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:27:40,646] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:27:40,646] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:27:40,646] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:27:40,646] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:27:40,677] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:27:40,677] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:27:40,678] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:27:40,678] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:27:40,678] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:27:40,678] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:27:40,678] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:27:40,679] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
{'loss': 0.9939, 'learning_rate': 9.758539407421542e-05, 'epoch': 2.0}
{'loss': 0.8951, 'learning_rate': 9.747043681256996e-05, 'epoch': 2.0}
{'loss': 1.0295, 'learning_rate': 9.735548289582992e-05, 'epoch': 2.0}
{'loss': 0.9203, 'learning_rate': 9.724053247600175e-05, 'epoch': 2.01}
{'loss': 0.9928, 'learning_rate': 9.712558570508726e-05, 'epoch': 2.01}
{'loss': 1.0479, 'learning_rate': 9.701064273508356e-05, 'epoch': 2.01}
{'loss': 0.9144, 'learning_rate': 9.68957037179826e-05, 'epoch': 2.01}
{'loss': 1.0052, 'learning_rate': 9.678076880577114e-05, 'epoch': 2.01}
{'loss': 1.0168, 'learning_rate': 9.666583815043054e-05, 'epoch': 2.01}
{'loss': 0.9791, 'learning_rate': 9.65509119039365e-05, 'epoch': 2.01}
{'loss': 0.944, 'learning_rate': 9.643599021825892e-05, 'epoch': 2.02}
{'loss': 0.9631, 'learning_rate': 9.632107324536165e-05, 'epoch': 2.02}
{'loss': 1.107, 'learning_rate': 9.620616113720232e-05, 'epoch': 2.02}
{'loss': 0.9464, 'learning_rate': 9.609125404573211e-05, 'epoch': 2.02}
{'loss': 0.9397, 'learning_rate': 9.59763521228955e-05, 'epoch': 2.02}
{'loss': 0.989, 'learning_rate': 9.586145552063035e-05, 'epoch': 2.02}
{'loss': 1.0245, 'learning_rate': 9.574656439086725e-05, 'epoch': 2.02}
{'loss': 0.9392, 'learning_rate': 9.563167888552968e-05, 'epoch': 2.03}
{'loss': 0.9548, 'learning_rate': 9.551679915653362e-05, 'epoch': 2.03}
{'loss': 0.956, 'learning_rate': 9.540192535578748e-05, 'epoch': 2.03}
{'loss': 0.9523, 'learning_rate': 9.528705763519176e-05, 'epoch': 2.03}
{'loss': 0.9789, 'learning_rate': 9.517219614663896e-05, 'epoch': 2.03}
{'loss': 1.0522, 'learning_rate': 9.505734104201336e-05, 'epoch': 2.03}
{'loss': 0.9545, 'learning_rate': 9.494249247319077e-05, 'epoch': 2.03}
{'loss': 0.8591, 'learning_rate': 9.482765059203834e-05, 'epoch': 2.04}
{'loss': 0.9401, 'learning_rate': 9.471281555041432e-05, 'epoch': 2.04}
{'loss': 1.0116, 'learning_rate': 9.459798750016813e-05, 'epoch': 2.04}
{'loss': 0.9633, 'learning_rate': 9.448316659313975e-05, 'epoch': 2.04}
{'loss': 1.031, 'learning_rate': 9.436835298115975e-05, 'epoch': 2.04}
{'loss': 0.9011, 'learning_rate': 9.425354681604907e-05, 'epoch': 2.04}
{'loss': 0.9536, 'learning_rate': 9.413874824961883e-05, 'epoch': 2.05}
{'loss': 0.9879, 'learning_rate': 9.402395743367008e-05, 'epoch': 2.05}
{'loss': 0.8898, 'learning_rate': 9.390917451999359e-05, 'epoch': 2.05}
{'loss': 0.9827, 'learning_rate': 9.379439966036977e-05, 'epoch': 2.05}
{'loss': 0.9504, 'learning_rate': 9.367963300656827e-05, 'epoch': 2.05}
{'loss': 0.9886, 'learning_rate': 9.356487471034796e-05, 'epoch': 2.05}
{'loss': 1.0051, 'learning_rate': 9.34501249234566e-05, 'epoch': 2.05}
{'loss': 0.9898, 'learning_rate': 9.333538379763079e-05, 'epoch': 2.06}
{'loss': 0.9908, 'learning_rate': 9.32206514845956e-05, 'epoch': 2.06}
{'loss': 0.8935, 'learning_rate': 9.310592813606449e-05, 'epoch': 2.06}
{'loss': 0.9603, 'learning_rate': 9.2991213903739e-05, 'epoch': 2.06}
{'loss': 0.9367, 'learning_rate': 9.28765089393087e-05, 'epoch': 2.06}
{'loss': 0.9518, 'learning_rate': 9.276181339445088e-05, 'epoch': 2.06}
{'loss': 1.0235, 'learning_rate': 9.26471274208303e-05, 'epoch': 2.06}
{'loss': 0.9373, 'learning_rate': 9.253245117009919e-05, 'epoch': 2.07}
{'loss': 0.943, 'learning_rate': 9.241778479389683e-05, 'epoch': 2.07}
{'loss': 1.0316, 'learning_rate': 9.230312844384943e-05, 'epoch': 2.07}
{'loss': 0.941, 'learning_rate': 9.218848227157007e-05, 'epoch': 2.07}
{'loss': 0.926, 'learning_rate': 9.207384642865824e-05, 'epoch': 2.07}
{'loss': 0.9663, 'learning_rate': 9.195922106669981e-05, 'epoch': 2.07}
{'loss': 0.9753, 'learning_rate': 9.18446063372668e-05, 'epoch': 2.07}
{'loss': 0.9101, 'learning_rate': 9.173000239191713e-05, 'epoch': 2.08}
{'loss': 0.925, 'learning_rate': 9.161540938219454e-05, 'epoch': 2.08}
{'loss': 0.9318, 'learning_rate': 9.150082745962828e-05, 'epoch': 2.08}
{'loss': 0.9457, 'learning_rate': 9.138625677573289e-05, 'epoch': 2.08}
{'loss': 0.8729, 'learning_rate': 9.127169748200812e-05, 'epoch': 2.08}
{'loss': 0.8718, 'learning_rate': 9.115714972993858e-05, 'epoch': 2.08}
{'loss': 0.9763, 'learning_rate': 9.104261367099365e-05, 'epoch': 2.08}
{'loss': 1.0157, 'learning_rate': 9.092808945662733e-05, 'epoch': 2.09}
{'loss': 0.8974, 'learning_rate': 9.081357723827785e-05, 'epoch': 2.09}
{'loss': 0.9423, 'learning_rate': 9.069907716736761e-05, 'epoch': 2.09}
{'loss': 0.9599, 'learning_rate': 9.058458939530295e-05, 'epoch': 2.09}
{'loss': 0.9702, 'learning_rate': 9.047011407347389e-05, 'epoch': 2.09}
{'loss': 0.9297, 'learning_rate': 9.035565135325414e-05, 'epoch': 2.09}
{'loss': 0.9849, 'learning_rate': 9.024120138600063e-05, 'epoch': 2.09}
{'loss': 0.9112, 'learning_rate': 9.01267643230534e-05, 'epoch': 2.1}
{'loss': 0.9173, 'learning_rate': 9.001234031573553e-05, 'epoch': 2.1}
{'loss': 0.8652, 'learning_rate': 8.989792951535276e-05, 'epoch': 2.1}
{'loss': 0.9371, 'learning_rate': 8.978353207319332e-05, 'epoch': 2.1}
{'loss': 0.9531, 'learning_rate': 8.966914814052796e-05, 'epoch': 2.1}
{'loss': 0.9788, 'learning_rate': 8.955477786860937e-05, 'epoch': 2.1}
{'loss': 0.966, 'learning_rate': 8.944042140867229e-05, 'epoch': 2.1}
{'loss': 0.8829, 'learning_rate': 8.932607891193315e-05, 'epoch': 2.11}
{'loss': 0.9596, 'learning_rate': 8.921175052958985e-05, 'epoch': 2.11}
{'loss': 0.8538, 'learning_rate': 8.909743641282183e-05, 'epoch': 2.11}
{'loss': 0.9055, 'learning_rate': 8.898313671278948e-05, 'epoch': 2.11}
{'loss': 0.9679, 'learning_rate': 8.886885158063416e-05, 'epoch': 2.11}
{'loss': 0.7991, 'learning_rate': 8.875458116747806e-05, 'epoch': 2.11}
{'loss': 0.9862, 'learning_rate': 8.864032562442374e-05, 'epoch': 2.11}
{'loss': 0.9596, 'learning_rate': 8.852608510255429e-05, 'epoch': 2.12}
{'loss': 1.0007, 'learning_rate': 8.841185975293282e-05, 'epoch': 2.12}
{'loss': 0.9466, 'learning_rate': 8.829764972660237e-05, 'epoch': 2.12}
{'loss': 0.8998, 'learning_rate': 8.818345517458576e-05, 'epoch': 2.12}
{'loss': 0.9441, 'learning_rate': 8.806927624788535e-05, 'epoch': 2.12}
{'loss': 0.9134, 'learning_rate': 8.795511309748276e-05, 'epoch': 2.12}
{'loss': 0.9951, 'learning_rate': 8.78409658743389e-05, 'epoch': 2.12}
{'loss': 0.9091, 'learning_rate': 8.772683472939351e-05, 'epoch': 2.13}
{'loss': 0.9919, 'learning_rate': 8.761271981356504e-05, 'epoch': 2.13}
{'loss': 0.9902, 'learning_rate': 8.749862127775058e-05, 'epoch': 2.13}
{'loss': 0.9323, 'learning_rate': 8.738453927282548e-05, 'epoch': 2.13}
{'loss': 1.0634, 'learning_rate': 8.727047394964328e-05, 'epoch': 2.13}
{'loss': 1.0659, 'learning_rate': 8.715642545903546e-05, 'epoch': 2.13}
{'loss': 0.9359, 'learning_rate': 8.704239395181121e-05, 'epoch': 2.14}
{'loss': 0.9258, 'learning_rate': 8.692837957875725e-05, 'epoch': 2.14}
{'loss': 1.0116, 'learning_rate': 8.681438249063767e-05, 'epoch': 2.14}
{'loss': 0.9114, 'learning_rate': 8.670040283819376e-05, 'epoch': 2.14}
{'loss': 1.002, 'learning_rate': 8.658644077214368e-05, 'epoch': 2.14}
{'loss': 0.8892, 'learning_rate': 8.647249644318232e-05, 'epoch': 2.14}
{'loss': 0.9424, 'learning_rate': 8.635857000198114e-05, 'epoch': 2.14}
{'loss': 0.845, 'learning_rate': 8.6244661599188e-05, 'epoch': 2.15}
{'loss': 0.949, 'learning_rate': 8.613077138542684e-05, 'epoch': 2.15}
{'loss': 0.9641, 'learning_rate': 8.601689951129757e-05, 'epoch': 2.15}
{'loss': 0.9768, 'learning_rate': 8.590304612737587e-05, 'epoch': 2.15}
{'loss': 0.8609, 'learning_rate': 8.578921138421294e-05, 'epoch': 2.15}
{'loss': 0.9284, 'learning_rate': 8.567539543233532e-05, 'epoch': 2.15}
{'loss': 0.8856, 'learning_rate': 8.556159842224472e-05, 'epoch': 2.15}
{'loss': 0.9045, 'learning_rate': 8.544782050441785e-05, 'epoch': 2.16}
{'loss': 0.9271, 'learning_rate': 8.533406182930613e-05, 'epoch': 2.16}
{'loss': 0.9371, 'learning_rate': 8.522032254733548e-05, 'epoch': 2.16}
{'loss': 0.8904, 'learning_rate': 8.510660280890625e-05, 'epoch': 2.16}
{'loss': 0.9564, 'learning_rate': 8.499290276439293e-05, 'epoch': 2.16}
{'loss': 0.8299, 'learning_rate': 8.4879222564144e-05, 'epoch': 2.16}
{'loss': 0.9851, 'learning_rate': 8.47655623584816e-05, 'epoch': 2.16}
{'loss': 0.9057, 'learning_rate': 8.465192229770156e-05, 'epoch': 2.17}
{'loss': 0.8881, 'learning_rate': 8.4538302532073e-05, 'epoch': 2.17}
{'loss': 0.8754, 'learning_rate': 8.442470321183817e-05, 'epoch': 2.17}
{'loss': 0.9749, 'learning_rate': 8.43111244872123e-05, 'epoch': 2.17}
{'loss': 0.9256, 'learning_rate': 8.41975665083835e-05, 'epoch': 2.17}
{'loss': 0.9899, 'learning_rate': 8.408402942551234e-05, 'epoch': 2.17}
{'loss': 0.9544, 'learning_rate': 8.397051338873172e-05, 'epoch': 2.17}
{'loss': 0.9213, 'learning_rate': 8.38570185481468e-05, 'epoch': 2.18}
{'loss': 0.9002, 'learning_rate': 8.374354505383467e-05, 'epoch': 2.18}
{'loss': 0.8544, 'learning_rate': 8.363009305584424e-05, 'epoch': 2.18}
{'loss': 0.9532, 'learning_rate': 8.351666270419589e-05, 'epoch': 2.18}
{'loss': 0.9327, 'learning_rate': 8.340325414888152e-05, 'epoch': 2.18}
{'loss': 0.9033, 'learning_rate': 8.328986753986409e-05, 'epoch': 2.18}
{'loss': 1.0464, 'learning_rate': 8.317650302707754e-05, 'epoch': 2.18}
{'loss': 0.8241, 'learning_rate': 8.306316076042673e-05, 'epoch': 2.19}
{'loss': 0.9649, 'learning_rate': 8.294984088978694e-05, 'epoch': 2.19}
{'loss': 0.9017, 'learning_rate': 8.283654356500394e-05, 'epoch': 2.19}
{'loss': 0.9327, 'learning_rate': 8.272326893589362e-05, 'epoch': 2.19}
{'loss': 0.87, 'learning_rate': 8.261001715224188e-05, 'epoch': 2.19}
{'loss': 0.9762, 'learning_rate': 8.249678836380442e-05, 'epoch': 2.19}
{'loss': 0.9656, 'learning_rate': 8.238358272030658e-05, 'epoch': 2.19}
{'loss': 0.9379, 'learning_rate': 8.227040037144297e-05, 'epoch': 2.2}
{'loss': 1.0153, 'learning_rate': 8.215724146687756e-05, 'epoch': 2.2}
{'loss': 1.0173, 'learning_rate': 8.204410615624318e-05, 'epoch': 2.2}
{'loss': 0.9253, 'learning_rate': 8.193099458914148e-05, 'epoch': 2.2}
{'loss': 0.957, 'learning_rate': 8.181790691514284e-05, 'epoch': 2.2}
{'loss': 0.9088, 'learning_rate': 8.17048432837859e-05, 'epoch': 2.2}
{'loss': 0.8977, 'learning_rate': 8.159180384457757e-05, 'epoch': 2.2}
{'loss': 0.898, 'learning_rate': 8.147878874699274e-05, 'epoch': 2.21}
 56%|███████████████████████████████████████████████████████████████                                                 | 1548/2752 [26:38<20:08,  1.00s/it]
[2023-12-29 02:30:03,933] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:30:03,933] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:30:03,933] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:30:03,933] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:30:03,933] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:30:03,933] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:30:03,933] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:30:03,934] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:30:03,934] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:30:03,934] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:30:03,934] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:30:03,934] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:30:03,934] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:30:03,934] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:30:03,938] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:30:03,939] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:30:04,188] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:30:04,189] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:30:04,190] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:30:04,190] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:30:04,452] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:30:04,452] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:30:04,716] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:30:04,717] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:30:04,967] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:30:04,968] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:30:05,254] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:30:05,254] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:30:05,519] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:30:05,519] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:30:05,770] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:30:05,771] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:30:06,031] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:30:06,032] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:30:06,292] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:30:06,293] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:30:06,564] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:30:06,564] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:30:06,823] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:30:06,824] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:30:07,086] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:30:07,087] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
{'eval_loss': 0.9910922646522522, 'eval_runtime': 3.1651, 'eval_samples_per_second': 345.012, 'eval_steps_per_second': 21.8, 'epoch': 2.21}
{'loss': 0.9692, 'learning_rate': 8.136579814047409e-05, 'epoch': 2.21}
{'loss': 0.8631, 'learning_rate': 8.125283217443207e-05, 'epoch': 2.21}
{'loss': 0.8234, 'learning_rate': 8.113989099824438e-05, 'epoch': 2.21}
{'loss': 0.9396, 'learning_rate': 8.102697476125597e-05, 'epoch': 2.21}
{'loss': 0.922, 'learning_rate': 8.091408361277888e-05, 'epoch': 2.21}
{'loss': 0.8982, 'learning_rate': 8.080121770209191e-05, 'epoch': 2.22}
{'loss': 0.9227, 'learning_rate': 8.068837717844047e-05, 'epoch': 2.22}
{'loss': 0.9319, 'learning_rate': 8.057556219103653e-05, 'epoch': 2.22}
{'loss': 0.9389, 'learning_rate': 8.046277288905814e-05, 'epoch': 2.22}
{'loss': 0.9657, 'learning_rate': 8.035000942164947e-05, 'epoch': 2.22}
{'loss': 0.9241, 'learning_rate': 8.023727193792048e-05, 'epoch': 2.22}
{'loss': 0.9862, 'learning_rate': 8.012456058694678e-05, 'epoch': 2.22}
{'loss': 0.9788, 'learning_rate': 8.001187551776952e-05, 'epoch': 2.23}
{'loss': 0.8368, 'learning_rate': 7.989921687939497e-05, 'epoch': 2.23}
{'loss': 1.0245, 'learning_rate': 7.978658482079449e-05, 'epoch': 2.23}
{'loss': 0.9395, 'learning_rate': 7.967397949090431e-05, 'epoch': 2.23}
{'loss': 1.0009, 'learning_rate': 7.956140103862527e-05, 'epoch': 2.23}
{'loss': 0.9677, 'learning_rate': 7.944884961282279e-05, 'epoch': 2.23}
{'loss': 0.9715, 'learning_rate': 7.933632536232642e-05, 'epoch': 2.23}
{'loss': 0.8948, 'learning_rate': 7.922382843592984e-05, 'epoch': 2.24}
{'loss': 0.948, 'learning_rate': 7.911135898239055e-05, 'epoch': 2.24}
{'loss': 0.9526, 'learning_rate': 7.899891715042976e-05, 'epoch': 2.24}
{'loss': 0.9683, 'learning_rate': 7.888650308873213e-05, 'epoch': 2.24}
{'loss': 0.9407, 'learning_rate': 7.87741169459457e-05, 'epoch': 2.24}
{'loss': 0.9093, 'learning_rate': 7.866175887068143e-05, 'epoch': 2.24}
{'loss': 0.9218, 'learning_rate': 7.854942901151328e-05, 'epoch': 2.24}
{'loss': 0.9578, 'learning_rate': 7.843712751697786e-05, 'epoch': 2.25}
{'loss': 0.8599, 'learning_rate': 7.832485453557424e-05, 'epoch': 2.25}
{'loss': 0.909, 'learning_rate': 7.821261021576391e-05, 'epoch': 2.25}
{'loss': 0.9667, 'learning_rate': 7.81003947059703e-05, 'epoch': 2.25}
{'loss': 0.8572, 'learning_rate': 7.798820815457886e-05, 'epoch': 2.25}
{'loss': 0.958, 'learning_rate': 7.787605070993668e-05, 'epoch': 2.25}
{'loss': 0.9316, 'learning_rate': 7.776392252035237e-05, 'epoch': 2.25}
{'loss': 0.8547, 'learning_rate': 7.765182373409591e-05, 'epoch': 2.26}
{'loss': 0.9308, 'learning_rate': 7.753975449939835e-05, 'epoch': 2.26}
{'loss': 0.9408, 'learning_rate': 7.742771496445167e-05, 'epoch': 2.26}
{'loss': 0.9128, 'learning_rate': 7.731570527740856e-05, 'epoch': 2.26}
{'loss': 0.9414, 'learning_rate': 7.720372558638233e-05, 'epoch': 2.26}
{'loss': 0.8834, 'learning_rate': 7.709177603944645e-05, 'epoch': 2.26}
{'loss': 0.8925, 'learning_rate': 7.697985678463476e-05, 'epoch': 2.26}
{'loss': 0.8767, 'learning_rate': 7.686796796994084e-05, 'epoch': 2.27}
{'loss': 0.868, 'learning_rate': 7.675610974331813e-05, 'epoch': 2.27}
{'loss': 0.8945, 'learning_rate': 7.66442822526796e-05, 'epoch': 2.27}
{'loss': 0.8905, 'learning_rate': 7.653248564589751e-05, 'epoch': 2.27}
{'loss': 0.9968, 'learning_rate': 7.642072007080343e-05, 'epoch': 2.27}
{'loss': 0.8581, 'learning_rate': 7.630898567518778e-05, 'epoch': 2.27}
{'loss': 0.8514, 'learning_rate': 7.619728260679975e-05, 'epoch': 2.27}
{'loss': 0.9472, 'learning_rate': 7.608561101334714e-05, 'epoch': 2.28}
{'loss': 0.9261, 'learning_rate': 7.597397104249613e-05, 'epoch': 2.28}
{'loss': 0.9951, 'learning_rate': 7.586236284187106e-05, 'epoch': 2.28}
{'loss': 0.9294, 'learning_rate': 7.575078655905434e-05, 'epoch': 2.28}
{'loss': 0.8349, 'learning_rate': 7.563924234158607e-05, 'epoch': 2.28}
{'loss': 0.8989, 'learning_rate': 7.552773033696398e-05, 'epoch': 2.28}
{'loss': 1.0002, 'learning_rate': 7.541625069264324e-05, 'epoch': 2.28}
{'loss': 0.9754, 'learning_rate': 7.530480355603615e-05, 'epoch': 2.29}
{'loss': 0.9512, 'learning_rate': 7.519338907451215e-05, 'epoch': 2.29}
{'loss': 0.923, 'learning_rate': 7.508200739539739e-05, 'epoch': 2.29}
{'loss': 0.8903, 'learning_rate': 7.49706586659747e-05, 'epoch': 2.29}
{'loss': 0.8961, 'learning_rate': 7.485934303348327e-05, 'epoch': 2.29}
{'loss': 0.8639, 'learning_rate': 7.474806064511863e-05, 'epoch': 2.29}
{'loss': 0.9267, 'learning_rate': 7.46368116480323e-05, 'epoch': 2.3}
{'loss': 0.8483, 'learning_rate': 7.452559618933164e-05, 'epoch': 2.3}
{'loss': 0.8689, 'learning_rate': 7.441441441607964e-05, 'epoch': 2.3}
{'loss': 0.8956, 'learning_rate': 7.43032664752948e-05, 'epoch': 2.3}
{'loss': 0.8652, 'learning_rate': 7.419215251395078e-05, 'epoch': 2.3}
{'loss': 0.9348, 'learning_rate': 7.408107267897651e-05, 'epoch': 2.3}
{'loss': 0.9277, 'learning_rate': 7.397002711725558e-05, 'epoch': 2.3}
{'loss': 0.9744, 'learning_rate': 7.385901597562637e-05, 'epoch': 2.31}
{'loss': 0.9454, 'learning_rate': 7.374803940088171e-05, 'epoch': 2.31}
{'loss': 0.9003, 'learning_rate': 7.36370975397687e-05, 'epoch': 2.31}
{'loss': 0.8666, 'learning_rate': 7.352619053898864e-05, 'epoch': 2.31}
{'loss': 0.9264, 'learning_rate': 7.341531854519664e-05, 'epoch': 2.31}
{'loss': 0.9439, 'learning_rate': 7.33044817050015e-05, 'epoch': 2.31}
{'loss': 0.9582, 'learning_rate': 7.319368016496564e-05, 'epoch': 2.31}
{'loss': 1.0029, 'learning_rate': 7.308291407160472e-05, 'epoch': 2.32}
{'loss': 0.9322, 'learning_rate': 7.29721835713875e-05, 'epoch': 2.32}
{'loss': 0.8516, 'learning_rate': 7.286148881073578e-05, 'epoch': 2.32}
{'loss': 0.8623, 'learning_rate': 7.275082993602402e-05, 'epoch': 2.32}
{'loss': 0.8561, 'learning_rate': 7.264020709357927e-05, 'epoch': 2.32}
{'loss': 0.8975, 'learning_rate': 7.25296204296809e-05, 'epoch': 2.32}
{'loss': 0.8716, 'learning_rate': 7.241907009056039e-05, 'epoch': 2.32}
{'loss': 1.0045, 'learning_rate': 7.230855622240136e-05, 'epoch': 2.33}
{'loss': 0.8901, 'learning_rate': 7.219807897133906e-05, 'epoch': 2.33}
{'loss': 1.0119, 'learning_rate': 7.208763848346029e-05, 'epoch': 2.33}
{'loss': 0.9225, 'learning_rate': 7.197723490480338e-05, 'epoch': 2.33}
{'loss': 0.9379, 'learning_rate': 7.186686838135774e-05, 'epoch': 2.33}
{'loss': 0.949, 'learning_rate': 7.17565390590638e-05, 'epoch': 2.33}
{'loss': 0.8753, 'learning_rate': 7.164624708381285e-05, 'epoch': 2.33}
{'loss': 0.9395, 'learning_rate': 7.153599260144677e-05, 'epoch': 2.34}
{'loss': 0.9719, 'learning_rate': 7.142577575775782e-05, 'epoch': 2.34}
{'loss': 0.9445, 'learning_rate': 7.13155966984885e-05, 'epoch': 2.34}
{'loss': 0.8849, 'learning_rate': 7.120545556933138e-05, 'epoch': 2.34}
{'loss': 0.9214, 'learning_rate': 7.109535251592892e-05, 'epoch': 2.34}
{'loss': 0.8934, 'learning_rate': 7.098528768387311e-05, 'epoch': 2.34}
{'loss': 0.849, 'learning_rate': 7.087526121870548e-05, 'epoch': 2.34}
{'loss': 0.9069, 'learning_rate': 7.076527326591682e-05, 'epoch': 2.35}
{'loss': 0.939, 'learning_rate': 7.065532397094695e-05, 'epoch': 2.35}
{'loss': 0.9456, 'learning_rate': 7.054541347918464e-05, 'epoch': 2.35}
{'loss': 0.982, 'learning_rate': 7.043554193596732e-05, 'epoch': 2.35}
{'loss': 0.9272, 'learning_rate': 7.03257094865809e-05, 'epoch': 2.35}
{'loss': 0.8147, 'learning_rate': 7.021591627625958e-05, 'epoch': 2.35}
{'loss': 0.8368, 'learning_rate': 7.010616245018573e-05, 'epoch': 2.35}
{'loss': 0.8645, 'learning_rate': 6.999644815348956e-05, 'epoch': 2.36}
{'loss': 0.9121, 'learning_rate': 6.988677353124913e-05, 'epoch': 2.36}
{'loss': 0.8353, 'learning_rate': 6.977713872848995e-05, 'epoch': 2.36}
{'loss': 0.8928, 'learning_rate': 6.966754389018487e-05, 'epoch': 2.36}
{'loss': 0.9564, 'learning_rate': 6.955798916125393e-05, 'epoch': 2.36}
{'loss': 0.9131, 'learning_rate': 6.94484746865641e-05, 'epoch': 2.36}
{'loss': 0.9782, 'learning_rate': 6.933900061092919e-05, 'epoch': 2.36}
{'loss': 0.9148, 'learning_rate': 6.92295670791095e-05, 'epoch': 2.37}
{'loss': 0.9299, 'learning_rate': 6.912017423581179e-05, 'epoch': 2.37}
{'loss': 0.9317, 'learning_rate': 6.901082222568895e-05, 'epoch': 2.37}
{'loss': 0.8967, 'learning_rate': 6.890151119333988e-05, 'epoch': 2.37}
{'loss': 0.9378, 'learning_rate': 6.87922412833094e-05, 'epoch': 2.37}
{'loss': 0.9033, 'learning_rate': 6.868301264008785e-05, 'epoch': 2.37}
{'loss': 0.9331, 'learning_rate': 6.857382540811101e-05, 'epoch': 2.38}
{'loss': 0.9384, 'learning_rate': 6.846467973175993e-05, 'epoch': 2.38}
{'loss': 0.9054, 'learning_rate': 6.835557575536071e-05, 'epoch': 2.38}
{'loss': 0.9476, 'learning_rate': 6.824651362318425e-05, 'epoch': 2.38}
{'loss': 0.8062, 'learning_rate': 6.813749347944625e-05, 'epoch': 2.38}
{'loss': 0.9072, 'learning_rate': 6.802851546830674e-05, 'epoch': 2.38}
{'loss': 0.8875, 'learning_rate': 6.791957973387013e-05, 'epoch': 2.38}
{'loss': 0.8945, 'learning_rate': 6.781068642018488e-05, 'epoch': 2.39}
{'loss': 0.9672, 'learning_rate': 6.770183567124337e-05, 'epoch': 2.39}
{'loss': 0.9715, 'learning_rate': 6.759302763098172e-05, 'epoch': 2.39}
{'loss': 1.0443, 'learning_rate': 6.748426244327957e-05, 'epoch': 2.39}
{'loss': 0.9025, 'learning_rate': 6.737554025195984e-05, 'epoch': 2.39}
{'loss': 0.9726, 'learning_rate': 6.726686120078862e-05, 'epoch': 2.39}
{'loss': 1.0293, 'learning_rate': 6.715822543347502e-05, 'epoch': 2.39}
{'loss': 0.8217, 'learning_rate': 6.704963309367083e-05, 'epoch': 2.4}
{'loss': 0.949, 'learning_rate': 6.694108432497048e-05, 'epoch': 2.4}
{'loss': 0.8321, 'learning_rate': 6.683257927091074e-05, 'epoch': 2.4}
{'loss': 1.0148, 'learning_rate': 6.672411807497057e-05, 'epoch': 2.4}
{'loss': 0.9201, 'learning_rate': 6.661570088057097e-05, 'epoch': 2.4}
{'loss': 0.8982, 'learning_rate': 6.65073278310747e-05, 'epoch': 2.4}
{'loss': 0.9065, 'learning_rate': 6.639899906978626e-05, 'epoch': 2.4}
{'loss': 0.9881, 'learning_rate': 6.629071473995147e-05, 'epoch': 2.41}
{'loss': 0.9452, 'learning_rate': 6.618247498475744e-05, 'epoch': 2.41}
{'loss': 0.8942, 'learning_rate': 6.60742799473323e-05, 'epoch': 2.41}
{'loss': 0.9652, 'learning_rate': 6.596612977074515e-05, 'epoch': 2.41}
{'loss': 1.0082, 'learning_rate': 6.585802459800566e-05, 'epoch': 2.41}
{'loss': 1.0078, 'learning_rate': 6.574996457206408e-05, 'epoch': 2.41}
{'loss': 0.9739, 'learning_rate': 6.564194983581089e-05, 'epoch': 2.41}
{'loss': 0.9248, 'learning_rate': 6.553398053207671e-05, 'epoch': 2.42}
{'loss': 1.011, 'learning_rate': 6.542605680363204e-05, 'epoch': 2.42}
{'loss': 0.9858, 'learning_rate': 6.53181787931872e-05, 'epoch': 2.42}
{'loss': 0.9249, 'learning_rate': 6.521034664339204e-05, 'epoch': 2.42}
{'loss': 0.9466, 'learning_rate': 6.510256049683571e-05, 'epoch': 2.42}
{'loss': 0.9436, 'learning_rate': 6.499482049604656e-05, 'epoch': 2.42}
{'loss': 0.9149, 'learning_rate': 6.488712678349189e-05, 'epoch': 2.42}
{'loss': 0.8384, 'learning_rate': 6.477947950157785e-05, 'epoch': 2.43}
{'loss': 0.7928, 'learning_rate': 6.467187879264916e-05, 'epoch': 2.43}
{'loss': 0.9252, 'learning_rate': 6.456432479898897e-05, 'epoch': 2.43}
{'loss': 0.897, 'learning_rate': 6.445681766281863e-05, 'epoch': 2.43}
{'loss': 0.8298, 'learning_rate': 6.434935752629758e-05, 'epoch': 2.43}
{'loss': 0.915, 'learning_rate': 6.4241944531523e-05, 'epoch': 2.43}
{'loss': 0.9462, 'learning_rate': 6.413457882052991e-05, 'epoch': 2.43}
{'loss': 0.8737, 'learning_rate': 6.402726053529065e-05, 'epoch': 2.44}
{'loss': 0.7485, 'learning_rate': 6.391998981771492e-05, 'epoch': 2.44}
{'loss': 0.9372, 'learning_rate': 6.381276680964947e-05, 'epoch': 2.44}
{'loss': 1.002, 'learning_rate': 6.3705591652878e-05, 'epoch': 2.44}
{'loss': 0.9004, 'learning_rate': 6.359846448912099e-05, 'epoch': 2.44}
{'loss': 0.8896, 'learning_rate': 6.349138546003534e-05, 'epoch': 2.44}
{'loss': 0.9036, 'learning_rate': 6.338435470721442e-05, 'epoch': 2.44}
{'loss': 0.8637, 'learning_rate': 6.327737237218765e-05, 'epoch': 2.45}
{'loss': 0.9706, 'learning_rate': 6.317043859642049e-05, 'epoch': 2.45}
{'loss': 0.9648, 'learning_rate': 6.306355352131414e-05, 'epoch': 2.45}
{'loss': 0.9592, 'learning_rate': 6.295671728820553e-05, 'epoch': 2.45}
{'loss': 0.9285, 'learning_rate': 6.284993003836686e-05, 'epoch': 2.45}
{'loss': 0.8922, 'learning_rate': 6.27431919130056e-05, 'epoch': 2.45}
{'loss': 1.0248, 'learning_rate': 6.263650305326429e-05, 'epoch': 2.45}
{'loss': 0.9553, 'learning_rate': 6.252986360022029e-05, 'epoch': 2.46}
 62%|██████████████████████████████████████████████████████████████████████                                          | 1720/2752 [29:34<17:19,  1.01s/it]
[2023-12-29 02:33:00,457] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:33:00,457] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:33:00,457] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:33:00,457] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:33:00,458] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:33:00,458] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:33:00,458] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:33:00,458] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:33:00,458] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:33:00,458] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:33:00,458] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:33:00,458] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:33:00,458] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:33:00,458] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:33:00,458] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:33:00,458] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:33:00,712] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:33:00,713] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:33:00,713] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:33:00,714] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:33:00,977] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:33:00,977] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:33:01,241] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:33:01,242] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
                                                                                                                                                        [2023-12-29 02:33:01,491] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:33:01,492] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
                                                                                                                                                        [2023-12-29 02:33:01,778] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:33:01,779] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
                                                                                                                                                        [2023-12-29 02:33:02,043] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:33:02,044] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
                                                                                                                                                        [2023-12-29 02:33:02,295] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:33:02,296] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
                                                                                                                                                        [2023-12-29 02:33:02,557] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:33:02,557] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
                                                                                                                                                        [2023-12-29 02:33:02,818] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:33:02,819] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
                                                                                                                                                        [2023-12-29 02:33:03,087] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:33:03,088] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
                                                                                                                                                        [2023-12-29 02:33:03,348] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:33:03,349] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
                                                                                                                                                        [2023-12-29 02:33:03,611] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:33:03,612] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
{'eval_loss': 0.9947829246520996, 'eval_runtime': 3.1664, 'eval_samples_per_second': 344.866, 'eval_steps_per_second': 21.791, 'epoch': 2.46}
{'loss': 0.9065, 'learning_rate': 6.242327369488568e-05, 'epoch': 2.46}
{'loss': 0.889, 'learning_rate': 6.231673347820694e-05, 'epoch': 2.46}
{'loss': 0.8549, 'learning_rate': 6.221024309106498e-05, 'epoch': 2.46}
{'loss': 0.9912, 'learning_rate': 6.210380267427467e-05, 'epoch': 2.46}
{'loss': 0.9013, 'learning_rate': 6.199741236858483e-05, 'epoch': 2.46}
{'loss': 0.9185, 'learning_rate': 6.189107231467814e-05, 'epoch': 2.47}
{'loss': 0.9443, 'learning_rate': 6.17847826531707e-05, 'epoch': 2.47}
{'loss': 0.9674, 'learning_rate': 6.167854352461202e-05, 'epoch': 2.47}
{'loss': 0.861, 'learning_rate': 6.15723550694848e-05, 'epoch': 2.47}
{'loss': 0.9042, 'learning_rate': 6.146621742820471e-05, 'epoch': 2.47}
{'loss': 0.9436, 'learning_rate': 6.136013074112018e-05, 'epoch': 2.47}
{'loss': 0.9288, 'learning_rate': 6.125409514851243e-05, 'epoch': 2.47}
{'loss': 0.9631, 'learning_rate': 6.114811079059495e-05, 'epoch': 2.48}
{'loss': 0.9584, 'learning_rate': 6.104217780751353e-05, 'epoch': 2.48}
{'loss': 0.8961, 'learning_rate': 6.0936296339346054e-05, 'epoch': 2.48}
{'loss': 0.9219, 'learning_rate': 6.083046652610224e-05, 'epoch': 2.48}
{'loss': 0.8926, 'learning_rate': 6.072468850772357e-05, 'epoch': 2.48}
{'loss': 0.8685, 'learning_rate': 6.061896242408298e-05, 'epoch': 2.48}
{'loss': 0.901, 'learning_rate': 6.051328841498473e-05, 'epoch': 2.48}
{'loss': 0.9529, 'learning_rate': 6.040766662016424e-05, 'epoch': 2.49}
{'loss': 0.8898, 'learning_rate': 6.0302097179287844e-05, 'epoch': 2.49}
{'loss': 0.8969, 'learning_rate': 6.019658023195276e-05, 'epoch': 2.49}
{'loss': 0.8563, 'learning_rate': 6.009111591768668e-05, 'epoch': 2.49}
{'loss': 0.9268, 'learning_rate': 5.998570437594775e-05, 'epoch': 2.49}
{'loss': 0.9143, 'learning_rate': 5.9880345746124265e-05, 'epoch': 2.49}
{'loss': 1.0232, 'learning_rate': 5.977504016753468e-05, 'epoch': 2.49}
{'loss': 0.9814, 'learning_rate': 5.9669787779427155e-05, 'epoch': 2.5}
{'loss': 0.8978, 'learning_rate': 5.9564588720979655e-05, 'epoch': 2.5}
{'loss': 0.9057, 'learning_rate': 5.945944313129953e-05, 'epoch': 2.5}
{'loss': 0.8697, 'learning_rate': 5.935435114942345e-05, 'epoch': 2.5}
{'loss': 0.8556, 'learning_rate': 5.924931291431719e-05, 'epoch': 2.5}
{'loss': 0.957, 'learning_rate': 5.914432856487544e-05, 'epoch': 2.5}
{'loss': 0.928, 'learning_rate': 5.903939823992174e-05, 'epoch': 2.5}
{'loss': 0.9249, 'learning_rate': 5.8934522078208066e-05, 'epoch': 2.51}
{'loss': 0.8818, 'learning_rate': 5.882970021841483e-05, 'epoch': 2.51}
{'loss': 0.9018, 'learning_rate': 5.8724932799150586e-05, 'epoch': 2.51}
{'loss': 0.861, 'learning_rate': 5.8620219958952e-05, 'epoch': 2.51}
{'loss': 1.0394, 'learning_rate': 5.851556183628348e-05, 'epoch': 2.51}
{'loss': 0.9121, 'learning_rate': 5.8410958569537146e-05, 'epoch': 2.51}
{'loss': 0.9268, 'learning_rate': 5.830641029703254e-05, 'epoch': 2.51}
{'loss': 0.9448, 'learning_rate': 5.82019171570164e-05, 'epoch': 2.52}
{'loss': 0.8035, 'learning_rate': 5.8097479287662756e-05, 'epoch': 2.52}
{'loss': 0.943, 'learning_rate': 5.79930968270724e-05, 'epoch': 2.52}
{'loss': 0.8752, 'learning_rate': 5.788876991327288e-05, 'epoch': 2.52}
{'loss': 0.8614, 'learning_rate': 5.778449868421836e-05, 'epoch': 2.52}
{'loss': 0.9529, 'learning_rate': 5.768028327778932e-05, 'epoch': 2.52}
{'loss': 0.9355, 'learning_rate': 5.757612383179238e-05, 'epoch': 2.52}
{'loss': 0.9904, 'learning_rate': 5.747202048396023e-05, 'epoch': 2.53}
{'loss': 0.8688, 'learning_rate': 5.73679733719514e-05, 'epoch': 2.53}
{'loss': 0.9654, 'learning_rate': 5.726398263335e-05, 'epoch': 2.53}
{'loss': 0.9239, 'learning_rate': 5.71600484056656e-05, 'epoch': 2.53}
{'loss': 0.9396, 'learning_rate': 5.705617082633306e-05, 'epoch': 2.53}
{'loss': 0.9185, 'learning_rate': 5.695235003271231e-05, 'epoch': 2.53}
{'loss': 1.0378, 'learning_rate': 5.684858616208826e-05, 'epoch': 2.53}
{'loss': 0.8558, 'learning_rate': 5.674487935167049e-05, 'epoch': 2.54}
{'loss': 0.9194, 'learning_rate': 5.664122973859313e-05, 'epoch': 2.54}
{'loss': 0.8405, 'learning_rate': 5.653763745991467e-05, 'epoch': 2.54}
{'loss': 0.9421, 'learning_rate': 5.643410265261784e-05, 'epoch': 2.54}
{'loss': 1.0677, 'learning_rate': 5.633062545360925e-05, 'epoch': 2.54}
{'loss': 0.9314, 'learning_rate': 5.622720599971952e-05, 'epoch': 2.54}
{'loss': 0.851, 'learning_rate': 5.6123844427702775e-05, 'epoch': 2.55}
{'loss': 0.94, 'learning_rate': 5.602054087423663e-05, 'epoch': 2.55}
{'loss': 0.9605, 'learning_rate': 5.591729547592195e-05, 'epoch': 2.55}
{'loss': 0.8999, 'learning_rate': 5.5814108369282824e-05, 'epoch': 2.55}
{'loss': 0.9691, 'learning_rate': 5.571097969076611e-05, 'epoch': 2.55}
{'loss': 0.8829, 'learning_rate': 5.5607909576741445e-05, 'epoch': 2.55}
{'loss': 0.9018, 'learning_rate': 5.550489816350113e-05, 'epoch': 2.55}
{'loss': 0.9032, 'learning_rate': 5.540194558725973e-05, 'epoch': 2.56}
{'loss': 0.8197, 'learning_rate': 5.5299051984153995e-05, 'epoch': 2.56}
{'loss': 0.9565, 'learning_rate': 5.5196217490242793e-05, 'epoch': 2.56}
{'loss': 0.8936, 'learning_rate': 5.5093442241506784e-05, 'epoch': 2.56}
{'loss': 0.9086, 'learning_rate': 5.4990726373848243e-05, 'epoch': 2.56}
{'loss': 0.8598, 'learning_rate': 5.488807002309098e-05, 'epoch': 2.56}
{'loss': 0.9722, 'learning_rate': 5.478547332498007e-05, 'epoch': 2.56}
{'loss': 1.005, 'learning_rate': 5.468293641518172e-05, 'epoch': 2.57}
{'loss': 0.924, 'learning_rate': 5.458045942928309e-05, 'epoch': 2.57}
{'loss': 0.9149, 'learning_rate': 5.447804250279213e-05, 'epoch': 2.57}
{'loss': 0.9354, 'learning_rate': 5.437568577113727e-05, 'epoch': 2.57}
{'loss': 0.94, 'learning_rate': 5.4273389369667436e-05, 'epoch': 2.57}
{'loss': 0.967, 'learning_rate': 5.417115343365171e-05, 'epoch': 2.57}
{'loss': 0.9475, 'learning_rate': 5.4068978098279336e-05, 'epoch': 2.57}
{'loss': 0.9334, 'learning_rate': 5.396686349865929e-05, 'epoch': 2.58}
{'loss': 0.8909, 'learning_rate': 5.3864809769820315e-05, 'epoch': 2.58}
{'loss': 0.8644, 'learning_rate': 5.37628170467106e-05, 'epoch': 2.58}
{'loss': 0.9284, 'learning_rate': 5.366088546419771e-05, 'epoch': 2.58}
{'loss': 0.9473, 'learning_rate': 5.3559015157068404e-05, 'epoch': 2.58}
{'loss': 0.89, 'learning_rate': 5.3457206260028324e-05, 'epoch': 2.58}
{'loss': 0.9504, 'learning_rate': 5.3355458907701925e-05, 'epoch': 2.58}
{'loss': 0.8861, 'learning_rate': 5.325377323463239e-05, 'epoch': 2.59}
{'loss': 0.9384, 'learning_rate': 5.315214937528121e-05, 'epoch': 2.59}
{'loss': 0.8827, 'learning_rate': 5.3050587464028136e-05, 'epoch': 2.59}
{'loss': 0.882, 'learning_rate': 5.2949087635171144e-05, 'epoch': 2.59}
{'loss': 0.8757, 'learning_rate': 5.284765002292598e-05, 'epoch': 2.59}
{'loss': 0.9264, 'learning_rate': 5.2746274761426176e-05, 'epoch': 2.59}
{'loss': 0.8916, 'learning_rate': 5.2644961984722796e-05, 'epoch': 2.59}
{'loss': 0.9372, 'learning_rate': 5.254371182678424e-05, 'epoch': 2.6}
{'loss': 1.0207, 'learning_rate': 5.244252442149624e-05, 'epoch': 2.6}
{'loss': 1.0514, 'learning_rate': 5.234139990266143e-05, 'epoch': 2.6}
{'loss': 0.8366, 'learning_rate': 5.224033840399931e-05, 'epoch': 2.6}
{'loss': 0.959, 'learning_rate': 5.213934005914607e-05, 'epoch': 2.6}
{'loss': 0.8958, 'learning_rate': 5.203840500165434e-05, 'epoch': 2.6}
{'loss': 0.8551, 'learning_rate': 5.1937533364993143e-05, 'epoch': 2.6}
{'loss': 0.9172, 'learning_rate': 5.1836725282547585e-05, 'epoch': 2.61}
{'loss': 0.831, 'learning_rate': 5.173598088761874e-05, 'epoch': 2.61}
{'loss': 0.9451, 'learning_rate': 5.163530031342347e-05, 'epoch': 2.61}
{'loss': 1.0276, 'learning_rate': 5.153468369309424e-05, 'epoch': 2.61}
{'loss': 1.0002, 'learning_rate': 5.1434131159678945e-05, 'epoch': 2.61}
{'loss': 0.9118, 'learning_rate': 5.133364284614077e-05, 'epoch': 2.61}
{'loss': 0.9769, 'learning_rate': 5.123321888535795e-05, 'epoch': 2.61}
{'loss': 0.9228, 'learning_rate': 5.113285941012358e-05, 'epoch': 2.62}
{'loss': 0.8696, 'learning_rate': 5.103256455314562e-05, 'epoch': 2.62}
{'loss': 0.9039, 'learning_rate': 5.093233444704641e-05, 'epoch': 2.62}
{'loss': 1.0408, 'learning_rate': 5.083216922436284e-05, 'epoch': 2.62}
{'loss': 0.9339, 'learning_rate': 5.073206901754586e-05, 'epoch': 2.62}
{'loss': 0.86, 'learning_rate': 5.063203395896052e-05, 'epoch': 2.62}
{'loss': 0.9069, 'learning_rate': 5.053206418088572e-05, 'epoch': 2.62}
{'loss': 0.9286, 'learning_rate': 5.043215981551398e-05, 'epoch': 2.63}
{'loss': 0.8741, 'learning_rate': 5.033232099495144e-05, 'epoch': 2.63}
{'loss': 0.933, 'learning_rate': 5.023254785121746e-05, 'epoch': 2.63}
{'loss': 0.9479, 'learning_rate': 5.0132840516244604e-05, 'epoch': 2.63}
{'loss': 0.9658, 'learning_rate': 5.003319912187838e-05, 'epoch': 2.63}
{'loss': 0.8772, 'learning_rate': 4.993362379987716e-05, 'epoch': 2.63}
{'loss': 0.936, 'learning_rate': 4.9834114681911846e-05, 'epoch': 2.64}
{'loss': 0.9086, 'learning_rate': 4.9734671899565955e-05, 'epoch': 2.64}
{'loss': 0.9157, 'learning_rate': 4.963529558433514e-05, 'epoch': 2.64}
{'loss': 0.9237, 'learning_rate': 4.953598586762722e-05, 'epoch': 2.64}
{'loss': 0.8762, 'learning_rate': 4.9436742880761964e-05, 'epoch': 2.64}
{'loss': 0.934, 'learning_rate': 4.933756675497082e-05, 'epoch': 2.64}
{'loss': 0.9644, 'learning_rate': 4.923845762139699e-05, 'epoch': 2.64}
{'loss': 0.8441, 'learning_rate': 4.913941561109493e-05, 'epoch': 2.65}
{'loss': 0.8449, 'learning_rate': 4.904044085503041e-05, 'epoch': 2.65}
{'loss': 0.9557, 'learning_rate': 4.894153348408021e-05, 'epoch': 2.65}
{'loss': 0.9485, 'learning_rate': 4.884269362903212e-05, 'epoch': 2.65}
{'loss': 0.9048, 'learning_rate': 4.874392142058456e-05, 'epoch': 2.65}
{'loss': 0.9076, 'learning_rate': 4.8645216989346466e-05, 'epoch': 2.65}
{'loss': 0.91, 'learning_rate': 4.8546580465837274e-05, 'epoch': 2.65}
{'loss': 0.881, 'learning_rate': 4.844801198048654e-05, 'epoch': 2.66}
{'loss': 0.9445, 'learning_rate': 4.834951166363385e-05, 'epoch': 2.66}
{'loss': 1.0006, 'learning_rate': 4.825107964552864e-05, 'epoch': 2.66}
{'loss': 1.0497, 'learning_rate': 4.815271605633012e-05, 'epoch': 2.66}
{'loss': 0.8729, 'learning_rate': 4.8054421026106913e-05, 'epoch': 2.66}
{'loss': 0.9184, 'learning_rate': 4.7956194684837045e-05, 'epoch': 2.66}
{'loss': 0.9196, 'learning_rate': 4.785803716240767e-05, 'epoch': 2.66}
{'loss': 0.972, 'learning_rate': 4.775994858861492e-05, 'epoch': 2.67}
{'loss': 0.9013, 'learning_rate': 4.7661929093163905e-05, 'epoch': 2.67}
{'loss': 1.0315, 'learning_rate': 4.756397880566823e-05, 'epoch': 2.67}
{'loss': 0.909, 'learning_rate': 4.7466097855650025e-05, 'epoch': 2.67}
{'loss': 0.9478, 'learning_rate': 4.7368286372539775e-05, 'epoch': 2.67}
{'loss': 0.8349, 'learning_rate': 4.727054448567601e-05, 'epoch': 2.67}
{'loss': 0.9896, 'learning_rate': 4.7172872324305395e-05, 'epoch': 2.67}
{'loss': 0.9205, 'learning_rate': 4.7075270017582254e-05, 'epoch': 2.68}
{'loss': 0.9018, 'learning_rate': 4.697773769456859e-05, 'epoch': 2.68}
{'loss': 0.9135, 'learning_rate': 4.688027548423386e-05, 'epoch': 2.68}
{'loss': 0.9179, 'learning_rate': 4.678288351545478e-05, 'epoch': 2.68}
{'loss': 0.9328, 'learning_rate': 4.6685561917015276e-05, 'epoch': 2.68}
{'loss': 0.8566, 'learning_rate': 4.658831081760614e-05, 'epoch': 2.68}
{'loss': 0.9334, 'learning_rate': 4.6491130345824906e-05, 'epoch': 2.68}
{'loss': 0.9076, 'learning_rate': 4.639402063017585e-05, 'epoch': 2.69}
{'loss': 0.8876, 'learning_rate': 4.629698179906958e-05, 'epoch': 2.69}
{'loss': 0.9201, 'learning_rate': 4.6200013980822954e-05, 'epoch': 2.69}
{'loss': 0.9125, 'learning_rate': 4.610311730365904e-05, 'epoch': 2.69}
{'loss': 0.9517, 'learning_rate': 4.600629189570672e-05, 'epoch': 2.69}
{'loss': 0.9998, 'learning_rate': 4.590953788500071e-05, 'epoch': 2.69}
{'loss': 0.8575, 'learning_rate': 4.5812855399481256e-05, 'epoch': 2.69}
{'loss': 0.8623, 'learning_rate': 4.571624456699404e-05, 'epoch': 2.7}
{'loss': 0.8983, 'learning_rate': 4.561970551529008e-05, 'epoch': 2.7}
{'loss': 0.9505, 'learning_rate': 4.5523238372025356e-05, 'epoch': 2.7}
{'loss': 0.8463, 'learning_rate': 4.542684326476082e-05, 'epoch': 2.7}
{'loss': 0.9392, 'learning_rate': 4.533052032096217e-05, 'epoch': 2.7}
{'loss': 0.9328, 'learning_rate': 4.523426966799965e-05, 'epoch': 2.7}
{'loss': 0.9663, 'learning_rate': 4.5138091433147925e-05, 'epoch': 2.7}
{'loss': 0.9191, 'learning_rate': 4.504198574358596e-05, 'epoch': 2.71}
 69%|█████████████████████████████████████████████████████████████████████████████                                   | 1892/2752 [32:31<14:30,  1.01s/it]
[2023-12-29 02:35:57,136] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:35:57,136] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:35:57,136] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:35:57,136] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:35:57,136] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:35:57,136] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:35:57,136] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:35:57,136] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:35:57,137] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:35:57,137] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:35:57,137] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:35:57,137] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:35:57,137] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:35:57,137] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:35:57,137] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:35:57,137] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:35:57,392] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:35:57,393] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:35:57,393] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:35:57,394] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:35:57,654] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:35:57,655] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:35:57,919] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:35:57,920] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:35:58,170] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:35:58,171] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:35:58,456] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:35:58,457] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:35:58,722] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:35:58,723] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:35:58,974] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:35:58,974] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:35:59,234] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:35:59,235] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:35:59,496] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:35:59,497] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:35:59,765] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:35:59,765] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:36:00,026] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:36:00,027] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:36:00,289] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:36:00,290] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
{'eval_loss': 0.9938868880271912, 'eval_runtime': 3.1649, 'eval_samples_per_second': 345.036, 'eval_steps_per_second': 21.802, 'epoch': 2.71}
{'loss': 0.9106, 'learning_rate': 4.4945952726396714e-05, 'epoch': 2.71}
{'loss': 0.9026, 'learning_rate': 4.484999250856706e-05, 'epoch': 2.71}
{'loss': 0.9029, 'learning_rate': 4.475410521698764e-05, 'epoch': 2.71}
{'loss': 0.8842, 'learning_rate': 4.465829097845261e-05, 'epoch': 2.71}
{'loss': 1.003, 'learning_rate': 4.4562549919659625e-05, 'epoch': 2.71}
{'loss': 0.8727, 'learning_rate': 4.4466882167209464e-05, 'epoch': 2.72}
{'loss': 0.9795, 'learning_rate': 4.4371287847606e-05, 'epoch': 2.72}
{'loss': 0.9245, 'learning_rate': 4.427576708725609e-05, 'epoch': 2.72}
{'loss': 0.9606, 'learning_rate': 4.418032001246917e-05, 'epoch': 2.72}
{'loss': 0.9373, 'learning_rate': 4.408494674945739e-05, 'epoch': 2.72}
{'loss': 0.8952, 'learning_rate': 4.3989647424335214e-05, 'epoch': 2.72}
{'loss': 0.879, 'learning_rate': 4.389442216311933e-05, 'epoch': 2.72}
{'loss': 0.8557, 'learning_rate': 4.3799271091728525e-05, 'epoch': 2.73}
{'loss': 0.8478, 'learning_rate': 4.370419433598345e-05, 'epoch': 2.73}
{'loss': 1.0037, 'learning_rate': 4.360919202160648e-05, 'epoch': 2.73}
{'loss': 0.9289, 'learning_rate': 4.351426427422165e-05, 'epoch': 2.73}
{'loss': 0.9311, 'learning_rate': 4.341941121935429e-05, 'epoch': 2.73}
{'loss': 0.8235, 'learning_rate': 4.332463298243099e-05, 'epoch': 2.73}
{'loss': 0.8741, 'learning_rate': 4.3229929688779414e-05, 'epoch': 2.73}
{'loss': 0.8201, 'learning_rate': 4.313530146362809e-05, 'epoch': 2.74}
{'loss': 0.9737, 'learning_rate': 4.304074843210637e-05, 'epoch': 2.74}
{'loss': 0.8822, 'learning_rate': 4.29462707192441e-05, 'epoch': 2.74}
{'loss': 0.8463, 'learning_rate': 4.285186844997154e-05, 'epoch': 2.74}
{'loss': 0.9696, 'learning_rate': 4.275754174911921e-05, 'epoch': 2.74}
{'loss': 0.9475, 'learning_rate': 4.266329074141764e-05, 'epoch': 2.74}
{'loss': 0.9222, 'learning_rate': 4.256911555149742e-05, 'epoch': 2.74}
{'loss': 0.8247, 'learning_rate': 4.247501630388873e-05, 'epoch': 2.75}
{'loss': 0.852, 'learning_rate': 4.2380993123021385e-05, 'epoch': 2.75}
{'loss': 0.8594, 'learning_rate': 4.2287046133224584e-05, 'epoch': 2.75}
{'loss': 0.9492, 'learning_rate': 4.219317545872689e-05, 'epoch': 2.75}
{'loss': 0.9106, 'learning_rate': 4.209938122365579e-05, 'epoch': 2.75}
{'loss': 0.8924, 'learning_rate': 4.200566355203784e-05, 'epoch': 2.75}
{'loss': 0.8769, 'learning_rate': 4.191202256779827e-05, 'epoch': 2.75}
{'loss': 0.9155, 'learning_rate': 4.181845839476091e-05, 'epoch': 2.76}
{'loss': 0.8438, 'learning_rate': 4.172497115664803e-05, 'epoch': 2.76}
{'loss': 0.9502, 'learning_rate': 4.163156097708014e-05, 'epoch': 2.76}
{'loss': 0.954, 'learning_rate': 4.153822797957596e-05, 'epoch': 2.76}
{'loss': 0.9277, 'learning_rate': 4.144497228755203e-05, 'epoch': 2.76}
{'loss': 0.9976, 'learning_rate': 4.1351794024322724e-05, 'epoch': 2.76}
{'loss': 0.8987, 'learning_rate': 4.1258693313099996e-05, 'epoch': 2.76}
{'loss': 1.0284, 'learning_rate': 4.1165670276993254e-05, 'epoch': 2.77}
{'loss': 0.9039, 'learning_rate': 4.1072725039009275e-05, 'epoch': 2.77}
{'loss': 0.9622, 'learning_rate': 4.097985772205186e-05, 'epoch': 2.77}
{'loss': 0.8291, 'learning_rate': 4.088706844892182e-05, 'epoch': 2.77}
{'loss': 1.0228, 'learning_rate': 4.079435734231676e-05, 'epoch': 2.77}
{'loss': 0.9032, 'learning_rate': 4.070172452483091e-05, 'epoch': 2.77}
{'loss': 1.0341, 'learning_rate': 4.0609170118954965e-05, 'epoch': 2.77}
{'loss': 0.9567, 'learning_rate': 4.051669424707602e-05, 'epoch': 2.78}
{'loss': 0.8364, 'learning_rate': 4.042429703147723e-05, 'epoch': 2.78}
{'loss': 0.921, 'learning_rate': 4.033197859433777e-05, 'epoch': 2.78}
{'loss': 0.9741, 'learning_rate': 4.0239739057732614e-05, 'epoch': 2.78}
{'loss': 0.9435, 'learning_rate': 4.014757854363249e-05, 'epoch': 2.78}
{'loss': 0.9182, 'learning_rate': 4.005549717390352e-05, 'epoch': 2.78}
{'loss': 0.8817, 'learning_rate': 3.996349507030731e-05, 'epoch': 2.78}
{'loss': 0.9229, 'learning_rate': 3.987157235450051e-05, 'epoch': 2.79}
{'loss': 0.8952, 'learning_rate': 3.977972914803486e-05, 'epoch': 2.79}
{'loss': 0.9602, 'learning_rate': 3.9687965572356935e-05, 'epoch': 2.79}
{'loss': 0.9466, 'learning_rate': 3.9596281748808086e-05, 'epoch': 2.79}
{'loss': 0.8834, 'learning_rate': 3.950467779862411e-05, 'epoch': 2.79}
{'loss': 0.8743, 'learning_rate': 3.9413153842935255e-05, 'epoch': 2.79}
{'loss': 1.0362, 'learning_rate': 3.9321710002765956e-05, 'epoch': 2.8}
{'loss': 0.9543, 'learning_rate': 3.92303463990347e-05, 'epoch': 2.8}
{'loss': 0.9374, 'learning_rate': 3.9139063152553864e-05, 'epoch': 2.8}
{'loss': 0.9471, 'learning_rate': 3.9047860384029675e-05, 'epoch': 2.8}
{'loss': 0.9507, 'learning_rate': 3.895673821406183e-05, 'epoch': 2.8}
{'loss': 1.0041, 'learning_rate': 3.8865696763143447e-05, 'epoch': 2.8}
{'loss': 0.9205, 'learning_rate': 3.877473615166097e-05, 'epoch': 2.8}
{'loss': 0.8563, 'learning_rate': 3.868385649989388e-05, 'epoch': 2.81}
{'loss': 0.9562, 'learning_rate': 3.859305792801469e-05, 'epoch': 2.81}
{'loss': 0.8823, 'learning_rate': 3.850234055608863e-05, 'epoch': 2.81}
{'loss': 0.9413, 'learning_rate': 3.841170450407358e-05, 'epoch': 2.81}
{'loss': 0.8878, 'learning_rate': 3.832114989181988e-05, 'epoch': 2.81}
{'loss': 0.8403, 'learning_rate': 3.8230676839070134e-05, 'epoch': 2.81}
{'loss': 0.9446, 'learning_rate': 3.814028546545924e-05, 'epoch': 2.81}
{'loss': 0.9467, 'learning_rate': 3.804997589051394e-05, 'epoch': 2.82}
{'loss': 0.8977, 'learning_rate': 3.795974823365287e-05, 'epoch': 2.82}
{'loss': 0.8595, 'learning_rate': 3.7869602614186395e-05, 'epoch': 2.82}
{'loss': 0.8949, 'learning_rate': 3.77795391513163e-05, 'epoch': 2.82}
{'loss': 0.9651, 'learning_rate': 3.768955796413577e-05, 'epoch': 2.82}
{'loss': 0.8827, 'learning_rate': 3.759965917162925e-05, 'epoch': 2.82}
{'loss': 0.9665, 'learning_rate': 3.750984289267217e-05, 'epoch': 2.82}
{'loss': 0.8955, 'learning_rate': 3.7420109246030866e-05, 'epoch': 2.83}
{'loss': 0.8589, 'learning_rate': 3.733045835036241e-05, 'epoch': 2.83}
{'loss': 0.8211, 'learning_rate': 3.724089032421441e-05, 'epoch': 2.83}
{'loss': 0.8969, 'learning_rate': 3.7151405286025e-05, 'epoch': 2.83}
{'loss': 0.9402, 'learning_rate': 3.706200335412248e-05, 'epoch': 2.83}
{'loss': 0.9404, 'learning_rate': 3.6972684646725283e-05, 'epoch': 2.83}
{'loss': 0.9332, 'learning_rate': 3.688344928194181e-05, 'epoch': 2.83}
{'loss': 0.9071, 'learning_rate': 3.6794297377770196e-05, 'epoch': 2.84}
{'loss': 0.9343, 'learning_rate': 3.670522905209832e-05, 'epoch': 2.84}
{'loss': 0.8765, 'learning_rate': 3.661624442270346e-05, 'epoch': 2.84}
{'loss': 0.8928, 'learning_rate': 3.652734360725224e-05, 'epoch': 2.84}
{'loss': 0.9422, 'learning_rate': 3.6438526723300446e-05, 'epoch': 2.84}
{'loss': 0.9866, 'learning_rate': 3.6349793888292915e-05, 'epoch': 2.84}
{'loss': 0.8925, 'learning_rate': 3.626114521956327e-05, 'epoch': 2.84}
{'loss': 0.9413, 'learning_rate': 3.617258083433396e-05, 'epoch': 2.85}
{'loss': 0.9447, 'learning_rate': 3.6084100849715876e-05, 'epoch': 2.85}
{'loss': 0.9265, 'learning_rate': 3.59957053827083e-05, 'epoch': 2.85}
{'loss': 0.9366, 'learning_rate': 3.590739455019888e-05, 'epoch': 2.85}
{'loss': 0.9024, 'learning_rate': 3.581916846896318e-05, 'epoch': 2.85}
{'loss': 1.037, 'learning_rate': 3.573102725566485e-05, 'epoch': 2.85}
{'loss': 0.978, 'learning_rate': 3.564297102685522e-05, 'epoch': 2.85}
{'loss': 0.9384, 'learning_rate': 3.555499989897326e-05, 'epoch': 2.86}
{'loss': 0.9749, 'learning_rate': 3.546711398834543e-05, 'epoch': 2.86}
{'loss': 0.919, 'learning_rate': 3.5379313411185453e-05, 'epoch': 2.86}
{'loss': 0.9048, 'learning_rate': 3.5291598283594316e-05, 'epoch': 2.86}
{'loss': 0.933, 'learning_rate': 3.520396872155992e-05, 'epoch': 2.86}
{'loss': 1.0065, 'learning_rate': 3.5116424840957065e-05, 'epoch': 2.86}
{'loss': 0.8885, 'learning_rate': 3.502896675754722e-05, 'epoch': 2.86}
{'loss': 0.9302, 'learning_rate': 3.494159458697843e-05, 'epoch': 2.87}
{'loss': 0.916, 'learning_rate': 3.485430844478509e-05, 'epoch': 2.87}
{'loss': 0.9507, 'learning_rate': 3.476710844638795e-05, 'epoch': 2.87}
{'loss': 0.9827, 'learning_rate': 3.467999470709373e-05, 'epoch': 2.87}
{'loss': 0.9284, 'learning_rate': 3.459296734209514e-05, 'epoch': 2.87}
{'loss': 0.9059, 'learning_rate': 3.450602646647066e-05, 'epoch': 2.87}
{'loss': 0.8338, 'learning_rate': 3.441917219518438e-05, 'epoch': 2.88}
{'loss': 0.9415, 'learning_rate': 3.433240464308597e-05, 'epoch': 2.88}
{'loss': 0.943, 'learning_rate': 3.4245723924910315e-05, 'epoch': 2.88}
{'loss': 0.92, 'learning_rate': 3.415913015527753e-05, 'epoch': 2.88}
{'loss': 0.8524, 'learning_rate': 3.407262344869272e-05, 'epoch': 2.88}
{'loss': 0.8813, 'learning_rate': 3.3986203919545945e-05, 'epoch': 2.88}
{'loss': 0.946, 'learning_rate': 3.389987168211187e-05, 'epoch': 2.88}
{'loss': 0.9317, 'learning_rate': 3.381362685054987e-05, 'epoch': 2.89}
{'loss': 1.0254, 'learning_rate': 3.3727469538903646e-05, 'epoch': 2.89}
{'loss': 0.8975, 'learning_rate': 3.3641399861101165e-05, 'epoch': 2.89}
{'loss': 0.9077, 'learning_rate': 3.355541793095456e-05, 'epoch': 2.89}
{'loss': 0.9046, 'learning_rate': 3.3469523862159856e-05, 'epoch': 2.89}
{'loss': 0.9072, 'learning_rate': 3.338371776829705e-05, 'epoch': 2.89}
{'loss': 0.8743, 'learning_rate': 3.3297999762829655e-05, 'epoch': 2.89}
{'loss': 0.9907, 'learning_rate': 3.3212369959104774e-05, 'epoch': 2.9}
{'loss': 0.9043, 'learning_rate': 3.312682847035284e-05, 'epoch': 2.9}
{'loss': 0.9034, 'learning_rate': 3.3041375409687526e-05, 'epoch': 2.9}
{'loss': 0.9034, 'learning_rate': 3.295601089010562e-05, 'epoch': 2.9}
{'loss': 0.993, 'learning_rate': 3.287073502448675e-05, 'epoch': 2.9}
{'loss': 0.9328, 'learning_rate': 3.278554792559337e-05, 'epoch': 2.9}
{'loss': 0.9436, 'learning_rate': 3.2700449706070534e-05, 'epoch': 2.9}
{'loss': 0.899, 'learning_rate': 3.2615440478445715e-05, 'epoch': 2.91}
{'loss': 0.8965, 'learning_rate': 3.2530520355128854e-05, 'epoch': 2.91}
{'loss': 0.8862, 'learning_rate': 3.2445689448411934e-05, 'epoch': 2.91}
{'loss': 0.9173, 'learning_rate': 3.236094787046901e-05, 'epoch': 2.91}
{'loss': 1.003, 'learning_rate': 3.2276295733356024e-05, 'epoch': 2.91}
{'loss': 0.8534, 'learning_rate': 3.2191733149010594e-05, 'epoch': 2.91}
{'loss': 0.8589, 'learning_rate': 3.2107260229252036e-05, 'epoch': 2.91}
{'loss': 0.925, 'learning_rate': 3.202287708578097e-05, 'epoch': 2.92}
{'loss': 0.8715, 'learning_rate': 3.193858383017942e-05, 'epoch': 2.92}
{'loss': 0.8844, 'learning_rate': 3.185438057391045e-05, 'epoch': 2.92}
{'loss': 0.9201, 'learning_rate': 3.1770267428318154e-05, 'epoch': 2.92}
{'loss': 0.8743, 'learning_rate': 3.168624450462746e-05, 'epoch': 2.92}
{'loss': 0.8695, 'learning_rate': 3.160231191394407e-05, 'epoch': 2.92}
{'loss': 0.9371, 'learning_rate': 3.151846976725412e-05, 'epoch': 2.92}
{'loss': 0.8625, 'learning_rate': 3.143471817542422e-05, 'epoch': 2.93}
{'loss': 0.9185, 'learning_rate': 3.13510572492012e-05, 'epoch': 2.93}
{'loss': 0.9125, 'learning_rate': 3.1267487099212e-05, 'epoch': 2.93}
{'loss': 0.9906, 'learning_rate': 3.118400783596361e-05, 'epoch': 2.93}
{'loss': 0.9378, 'learning_rate': 3.110061956984275e-05, 'epoch': 2.93}
{'loss': 0.9237, 'learning_rate': 3.10173224111158e-05, 'epoch': 2.93}
{'loss': 0.9199, 'learning_rate': 3.093411646992873e-05, 'epoch': 2.93}
{'loss': 0.8795, 'learning_rate': 3.085100185630685e-05, 'epoch': 2.94}
{'loss': 1.0392, 'learning_rate': 3.0767978680154684e-05, 'epoch': 2.94}
{'loss': 0.9213, 'learning_rate': 3.0685047051255946e-05, 'epoch': 2.94}
{'loss': 0.9464, 'learning_rate': 3.060220707927319e-05, 'epoch': 2.94}
{'loss': 0.9369, 'learning_rate': 3.051945887374782e-05, 'epoch': 2.94}
{'loss': 0.9182, 'learning_rate': 3.0436802544099862e-05, 'epoch': 2.94}
{'loss': 0.8392, 'learning_rate': 3.035423819962785e-05, 'epoch': 2.94}
{'loss': 0.9129, 'learning_rate': 3.027176594950878e-05, 'epoch': 2.95}
{'loss': 1.0331, 'learning_rate': 3.0189385902797705e-05, 'epoch': 2.95}
{'loss': 0.8897, 'learning_rate': 3.0107098168427937e-05, 'epoch': 2.95}
{'loss': 0.9169, 'learning_rate': 3.002490285521059e-05, 'epoch': 2.95}
{'loss': 0.8883, 'learning_rate': 2.9942800071834554e-05, 'epoch': 2.95}
{'loss': 0.9268, 'learning_rate': 2.9860789926866504e-05, 'epoch': 2.95}
{'loss': 0.9142, 'learning_rate': 2.977887252875049e-05, 'epoch': 2.95}
{'loss': 0.9245, 'learning_rate': 2.9697047985807958e-05, 'epoch': 2.96}
 75%|████████████████████████████████████████████████████████████████████████████████████                            | 2064/2752 [35:28<11:32,  1.01s/it]
[2023-12-29 02:38:53,931] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:38:53,931] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:38:53,931] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:38:53,931] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:38:53,932] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:38:53,932] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:38:53,932] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:38:53,932] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:38:53,932] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:38:53,932] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:38:53,932] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:38:53,932] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:38:53,932] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:38:53,933] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:38:53,933] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:38:53,933] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:38:54,188] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:38:54,188] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:38:54,189] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:38:54,190] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:38:54,450] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:38:54,450] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:38:54,715] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:38:54,716] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:38:54,966] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:38:54,966] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:38:55,252] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:38:55,253] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:38:55,518] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:38:55,518] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:38:55,769] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:38:55,770] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:38:56,030] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:38:56,031] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:38:56,292] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:38:56,292] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:38:56,561] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:38:56,561] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:38:56,824] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:38:56,825] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:38:57,088] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:38:57,089] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
{'eval_loss': 0.991502583026886, 'eval_runtime': 3.1683, 'eval_samples_per_second': 344.661, 'eval_steps_per_second': 21.778, 'epoch': 2.96}
{'loss': 0.8612, 'learning_rate': 2.961531640623757e-05, 'epoch': 2.96}
{'loss': 0.8814, 'learning_rate': 2.9533677898115063e-05, 'epoch': 2.96}
{'loss': 0.899, 'learning_rate': 2.9452132569393077e-05, 'epoch': 2.96}
{'loss': 0.9123, 'learning_rate': 2.9370680527901116e-05, 'epoch': 2.96}
{'loss': 0.8835, 'learning_rate': 2.9289321881345254e-05, 'epoch': 2.96}
{'loss': 0.9375, 'learning_rate': 2.9208056737308074e-05, 'epoch': 2.97}
{'loss': 0.9662, 'learning_rate': 2.9126885203248554e-05, 'epoch': 2.97}
{'loss': 0.918, 'learning_rate': 2.904580738650181e-05, 'epoch': 2.97}
{'loss': 0.8745, 'learning_rate': 2.8964823394279174e-05, 'epoch': 2.97}
{'loss': 0.9056, 'learning_rate': 2.888393333366778e-05, 'epoch': 2.97}
{'loss': 0.9258, 'learning_rate': 2.880313731163061e-05, 'epoch': 2.97}
{'loss': 0.9394, 'learning_rate': 2.872243543500629e-05, 'epoch': 2.97}
{'loss': 0.932, 'learning_rate': 2.864182781050895e-05, 'epoch': 2.98}
{'loss': 0.8368, 'learning_rate': 2.856131454472807e-05, 'epoch': 2.98}
{'loss': 0.7922, 'learning_rate': 2.8480895744128422e-05, 'epoch': 2.98}
{'loss': 0.881, 'learning_rate': 2.840057151504979e-05, 'epoch': 2.98}
{'loss': 0.9326, 'learning_rate': 2.832034196370693e-05, 'epoch': 2.98}
{'loss': 0.9605, 'learning_rate': 2.824020719618944e-05, 'epoch': 2.98}
{'loss': 0.9439, 'learning_rate': 2.8160167318461506e-05, 'epoch': 2.98}
{'loss': 0.8284, 'learning_rate': 2.8080222436361934e-05, 'epoch': 2.99}
{'loss': 0.8589, 'learning_rate': 2.8000372655603847e-05, 'epoch': 2.99}
{'loss': 0.9226, 'learning_rate': 2.7920618081774618e-05, 'epoch': 2.99}
{'loss': 0.954, 'learning_rate': 2.784095882033575e-05, 'epoch': 2.99}
{'loss': 0.8516, 'learning_rate': 2.7761394976622658e-05, 'epoch': 2.99}
{'loss': 0.9274, 'learning_rate': 2.768192665584468e-05, 'epoch': 2.99}
{'loss': 0.9747, 'learning_rate': 2.7602553963084776e-05, 'epoch': 2.99}
{'loss': 0.866, 'learning_rate': 2.7523277003299463e-05, 'epoch': 3.0}
{'loss': 0.8241, 'learning_rate': 2.7444095881318656e-05, 'epoch': 3.0}
{'loss': 0.8822, 'learning_rate': 2.736501070184556e-05, 'epoch': 3.0}
{'loss': 0.9198, 'learning_rate': 2.728602156945649e-05, 'epoch': 3.0}
{'loss': 1.0102, 'learning_rate': 2.720712858860083e-05, 'epoch': 3.0}
{'loss': 0.9658, 'learning_rate': 2.712833186360072e-05, 'epoch': 3.0}
{'loss': 0.9391, 'learning_rate': 2.7049631498651085e-05, 'epoch': 3.0}
{'loss': 0.9419, 'learning_rate': 2.69710275978194e-05, 'epoch': 3.01}
{'loss': 0.97, 'learning_rate': 2.6892520265045552e-05, 'epoch': 3.01}
{'loss': 0.9433, 'learning_rate': 2.6814109604141848e-05, 'epoch': 3.01}
{'loss': 0.9582, 'learning_rate': 2.6735795718792646e-05, 'epoch': 3.01}
{'loss': 0.9987, 'learning_rate': 2.665757871255439e-05, 'epoch': 3.01}
{'loss': 0.9493, 'learning_rate': 2.6579458688855362e-05, 'epoch': 3.01}
{'loss': 0.9051, 'learning_rate': 2.6501435750995727e-05, 'epoch': 3.01}
{'loss': 0.984, 'learning_rate': 2.6423510002147113e-05, 'epoch': 3.02}
{'loss': 0.9262, 'learning_rate': 2.6345681545352773e-05, 'epoch': 3.02}
{'loss': 1.0209, 'learning_rate': 2.6267950483527216e-05, 'epoch': 3.02}
{'loss': 0.9266, 'learning_rate': 2.619031691945618e-05, 'epoch': 3.02}
{'loss': 0.9631, 'learning_rate': 2.611278095579651e-05, 'epoch': 3.02}
 77%|█████████████████████████████████████████████████████████████████████████████████████▊                          | 2109/2752 [36:17<10:48,  1.01s/it]
[2023-12-29 02:39:42,702] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:39:42,703] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:39:42,703] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:39:42,703] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:39:42,703] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:39:42,734] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:39:42,931] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:39:42,940] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:39:42,943] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:39:42,964] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:39:42,973] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:39:42,976] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:39:42,990] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:39:42,992] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:39:43,175] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
[2023-12-29 02:39:43,209] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 1341686
{'loss': 0.925, 'learning_rate': 2.6035342695075937e-05, 'epoch': 3.0}
{'loss': 0.8235, 'learning_rate': 2.5958002239693092e-05, 'epoch': 3.0}
{'loss': 0.959, 'learning_rate': 2.588075969191718e-05, 'epoch': 3.0}
{'loss': 0.86, 'learning_rate': 2.5803615153887983e-05, 'epoch': 3.01}
{'loss': 0.9208, 'learning_rate': 2.5726568727615662e-05, 'epoch': 3.01}
{'loss': 0.9867, 'learning_rate': 2.5649620514980644e-05, 'epoch': 3.01}
{'loss': 0.8472, 'learning_rate': 2.5572770617733544e-05, 'epoch': 3.01}
{'loss': 0.9361, 'learning_rate': 2.5496019137494908e-05, 'epoch': 3.01}
{'loss': 0.9576, 'learning_rate': 2.5419366175755145e-05, 'epoch': 3.01}
{'loss': 0.913, 'learning_rate': 2.5342811833874423e-05, 'epoch': 3.01}
{'loss': 0.8749, 'learning_rate': 2.5266356213082433e-05, 'epoch': 3.02}
{'loss': 0.8932, 'learning_rate': 2.518999941447846e-05, 'epoch': 3.02}
{'loss': 1.0326, 'learning_rate': 2.5113741539030987e-05, 'epoch': 3.02}
{'loss': 0.8748, 'learning_rate': 2.503758268757773e-05, 'epoch': 3.02}
{'loss': 0.8779, 'learning_rate': 2.496152296082548e-05, 'epoch': 3.02}
{'loss': 0.9256, 'learning_rate': 2.4885562459349888e-05, 'epoch': 3.02}
{'loss': 0.9557, 'learning_rate': 2.480970128359552e-05, 'epoch': 3.02}
{'loss': 0.866, 'learning_rate': 2.4733939533875472e-05, 'epoch': 3.03}
{'loss': 0.882, 'learning_rate': 2.465827731037147e-05, 'epoch': 3.03}
{'loss': 0.892, 'learning_rate': 2.458271471313357e-05, 'epoch': 3.03}
{'loss': 0.8872, 'learning_rate': 2.4507251842080092e-05, 'epoch': 3.03}
{'loss': 0.9151, 'learning_rate': 2.443188879699747e-05, 'epoch': 3.03}
{'loss': 0.9867, 'learning_rate': 2.4356625677540233e-05, 'epoch': 3.03}
{'loss': 0.8839, 'learning_rate': 2.4281462583230686e-05, 'epoch': 3.03}
{'loss': 0.7848, 'learning_rate': 2.4206399613458875e-05, 'epoch': 3.04}
{'loss': 0.8742, 'learning_rate': 2.413143686748247e-05, 'epoch': 3.04}
{'loss': 0.9463, 'learning_rate': 2.405657444442657e-05, 'epoch': 3.04}
{'loss': 0.8966, 'learning_rate': 2.3981812443283723e-05, 'epoch': 3.04}
{'loss': 0.9582, 'learning_rate': 2.3907150962913584e-05, 'epoch': 3.04}
{'loss': 0.8336, 'learning_rate': 2.3832590102042895e-05, 'epoch': 3.04}
{'loss': 0.8765, 'learning_rate': 2.3758129959265407e-05, 'epoch': 3.05}
{'loss': 0.9209, 'learning_rate': 2.3683770633041613e-05, 'epoch': 3.05}
{'loss': 0.8095, 'learning_rate': 2.3609512221698725e-05, 'epoch': 3.05}
{'loss': 0.9049, 'learning_rate': 2.3535354823430577e-05, 'epoch': 3.05}
{'loss': 0.8767, 'learning_rate': 2.3461298536297328e-05, 'epoch': 3.05}
{'loss': 0.9175, 'learning_rate': 2.33873434582255e-05, 'epoch': 3.05}
{'loss': 0.9369, 'learning_rate': 2.331348968700775e-05, 'epoch': 3.05}
{'loss': 0.9144, 'learning_rate': 2.3239737320302756e-05, 'epoch': 3.06}
{'loss': 0.9236, 'learning_rate': 2.3166086455635218e-05, 'epoch': 3.06}
{'loss': 0.8233, 'learning_rate': 2.3092537190395457e-05, 'epoch': 3.06}
{'loss': 0.9012, 'learning_rate': 2.3019089621839597e-05, 'epoch': 3.06}
{'loss': 0.8681, 'learning_rate': 2.2945743847089174e-05, 'epoch': 3.06}
{'loss': 0.8885, 'learning_rate': 2.2872499963131155e-05, 'epoch': 3.06}
{'loss': 0.951, 'learning_rate': 2.279935806681782e-05, 'epoch': 3.06}
{'loss': 0.8665, 'learning_rate': 2.272631825486653e-05, 'epoch': 3.07}
{'loss': 0.8761, 'learning_rate': 2.2653380623859665e-05, 'epoch': 3.07}
{'loss': 0.9654, 'learning_rate': 2.258054527024451e-05, 'epoch': 3.07}
{'loss': 0.8756, 'learning_rate': 2.2507812290333097e-05, 'epoch': 3.07}
{'loss': 0.8569, 'learning_rate': 2.243518178030206e-05, 'epoch': 3.07}
{'loss': 0.9044, 'learning_rate': 2.2362653836192603e-05, 'epoch': 3.07}
{'loss': 0.9117, 'learning_rate': 2.2290228553910242e-05, 'epoch': 3.07}
{'loss': 0.8473, 'learning_rate': 2.2217906029224757e-05, 'epoch': 3.08}
{'loss': 0.8593, 'learning_rate': 2.2145686357770046e-05, 'epoch': 3.08}
{'loss': 0.8527, 'learning_rate': 2.2073569635044e-05, 'epoch': 3.08}
{'loss': 0.8815, 'learning_rate': 2.2001555956408428e-05, 'epoch': 3.08}
{'loss': 0.8078, 'learning_rate': 2.1929645417088805e-05, 'epoch': 3.08}
{'loss': 0.8098, 'learning_rate': 2.1857838112174267e-05, 'epoch': 3.08}
{'loss': 0.9116, 'learning_rate': 2.178613413661743e-05, 'epoch': 3.08}
{'loss': 0.9506, 'learning_rate': 2.1714533585234244e-05, 'epoch': 3.09}
{'loss': 0.8248, 'learning_rate': 2.164303655270399e-05, 'epoch': 3.09}
{'loss': 0.8722, 'learning_rate': 2.1571643133568964e-05, 'epoch': 3.09}
{'loss': 0.8872, 'learning_rate': 2.1500353422234475e-05, 'epoch': 3.09}
{'loss': 0.8939, 'learning_rate': 2.142916751296876e-05, 'epoch': 3.09}
{'loss': 0.8646, 'learning_rate': 2.1358085499902725e-05, 'epoch': 3.09}
{'loss': 0.9108, 'learning_rate': 2.1287107477029878e-05, 'epoch': 3.09}
{'loss': 0.8471, 'learning_rate': 2.121623353820632e-05, 'epoch': 3.1}
{'loss': 0.8401, 'learning_rate': 2.114546377715042e-05, 'epoch': 3.1}
{'loss': 0.8044, 'learning_rate': 2.107479828744282e-05, 'epoch': 3.1}
{'loss': 0.8743, 'learning_rate': 2.1004237162526296e-05, 'epoch': 3.1}
{'loss': 0.8842, 'learning_rate': 2.093378049570558e-05, 'epoch': 3.1}
{'loss': 0.9091, 'learning_rate': 2.0863428380147344e-05, 'epoch': 3.1}
{'loss': 0.8988, 'learning_rate': 2.079318090887996e-05, 'epoch': 3.1}
{'loss': 0.8222, 'learning_rate': 2.072303817479343e-05, 'epoch': 3.11}
{'loss': 0.8944, 'learning_rate': 2.0653000270639268e-05, 'epoch': 3.11}
{'loss': 0.7921, 'learning_rate': 2.0583067289030335e-05, 'epoch': 3.11}
{'loss': 0.8369, 'learning_rate': 2.0513239322440847e-05, 'epoch': 3.11}
{'loss': 0.9107, 'learning_rate': 2.0443516463206048e-05, 'epoch': 3.11}
{'loss': 0.7424, 'learning_rate': 2.037389880352225e-05, 'epoch': 3.11}
{'loss': 0.9282, 'learning_rate': 2.030438643544663e-05, 'epoch': 3.11}
{'loss': 0.8903, 'learning_rate': 2.0234979450897184e-05, 'epoch': 3.12}
{'loss': 0.9372, 'learning_rate': 2.016567794165246e-05, 'epoch': 3.12}
{'loss': 0.8833, 'learning_rate': 2.0096481999351678e-05, 'epoch': 3.12}
{'loss': 0.8388, 'learning_rate': 2.0027391715494347e-05, 'epoch': 3.12}
{'loss': 0.8807, 'learning_rate': 1.9958407181440286e-05, 'epoch': 3.12}
{'loss': 0.8558, 'learning_rate': 1.988952848840948e-05, 'epoch': 3.12}
{'loss': 0.9237, 'learning_rate': 1.982075572748201e-05, 'epoch': 3.12}
{'loss': 0.8507, 'learning_rate': 1.9752088989597795e-05, 'epoch': 3.13}
{'loss': 0.9246, 'learning_rate': 1.9683528365556637e-05, 'epoch': 3.13}
{'loss': 0.9273, 'learning_rate': 1.961507394601797e-05, 'epoch': 3.13}
{'loss': 0.8661, 'learning_rate': 1.95467258215008e-05, 'epoch': 3.13}
{'loss': 0.9966, 'learning_rate': 1.9478484082383562e-05, 'epoch': 3.13}
{'loss': 1.0066, 'learning_rate': 1.9410348818904078e-05, 'epoch': 3.13}
{'loss': 0.8767, 'learning_rate': 1.9342320121159295e-05, 'epoch': 3.14}
{'loss': 0.8655, 'learning_rate': 1.9274398079105316e-05, 'epoch': 3.14}
{'loss': 0.9439, 'learning_rate': 1.9206582782557136e-05, 'epoch': 3.14}
{'loss': 0.845, 'learning_rate': 1.913887432118866e-05, 'epoch': 3.14}
{'loss': 0.9405, 'learning_rate': 1.9071272784532468e-05, 'epoch': 3.14}
{'loss': 0.8258, 'learning_rate': 1.9003778261979843e-05, 'epoch': 3.14}
{'loss': 0.8778, 'learning_rate': 1.893639084278046e-05, 'epoch': 3.14}
{'loss': 0.7739, 'learning_rate': 1.8869110616042407e-05, 'epoch': 3.15}
{'loss': 0.8811, 'learning_rate': 1.880193767073204e-05, 'epoch': 3.15}
{'loss': 0.902, 'learning_rate': 1.8734872095673817e-05, 'epoch': 3.15}
{'loss': 0.9088, 'learning_rate': 1.86679139795503e-05, 'epoch': 3.15}
{'loss': 0.7956, 'learning_rate': 1.8601063410901852e-05, 'epoch': 3.15}
{'loss': 0.8558, 'learning_rate': 1.853432047812671e-05, 'epoch': 3.15}
{'loss': 0.8145, 'learning_rate': 1.8467685269480705e-05, 'epoch': 3.15}
{'loss': 0.8353, 'learning_rate': 1.8401157873077257e-05, 'epoch': 3.16}
{'loss': 0.8619, 'learning_rate': 1.8334738376887262e-05, 'epoch': 3.16}
{'loss': 0.8716, 'learning_rate': 1.826842686873885e-05, 'epoch': 3.16}
{'loss': 0.83, 'learning_rate': 1.820222343631748e-05, 'epoch': 3.16}
{'loss': 0.8917, 'learning_rate': 1.8136128167165578e-05, 'epoch': 3.16}
{'loss': 0.7738, 'learning_rate': 1.8070141148682584e-05, 'epoch': 3.16}
{'loss': 0.9152, 'learning_rate': 1.80042624681248e-05, 'epoch': 3.16}
{'loss': 0.8365, 'learning_rate': 1.7938492212605306e-05, 'epoch': 3.17}
{'loss': 0.8271, 'learning_rate': 1.787283046909376e-05, 'epoch': 3.17}
{'loss': 0.8139, 'learning_rate': 1.7807277324416338e-05, 'epoch': 3.17}
{'loss': 0.9108, 'learning_rate': 1.7741832865255625e-05, 'epoch': 3.17}
{'loss': 0.8615, 'learning_rate': 1.7676497178150464e-05, 'epoch': 3.17}
{'loss': 0.9262, 'learning_rate': 1.7611270349495924e-05, 'epoch': 3.17}
{'loss': 0.8908, 'learning_rate': 1.7546152465543088e-05, 'epoch': 3.17}
{'loss': 0.8593, 'learning_rate': 1.7481143612398955e-05, 'epoch': 3.18}
{'loss': 0.8398, 'learning_rate': 1.7416243876026396e-05, 'epoch': 3.18}
{'loss': 0.7941, 'learning_rate': 1.735145334224394e-05, 'epoch': 3.18}
{'loss': 0.8993, 'learning_rate': 1.728677209672581e-05, 'epoch': 3.18}
{'loss': 0.8728, 'learning_rate': 1.7222200225001616e-05, 'epoch': 3.18}
{'loss': 0.8441, 'learning_rate': 1.7157737812456386e-05, 'epoch': 3.18}
{'loss': 0.9832, 'learning_rate': 1.7093384944330393e-05, 'epoch': 3.18}
 81%|███████████████████████████████████████████████████████████████████████████████████████████                     | 2236/2752 [38:26<08:43,  1.02s/it]
[2023-12-29 02:41:51,716] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:51,717] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:51,717] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:51,717] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:51,717] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:51,717] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:51,717] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:51,717] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:51,718] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:51,718] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:51,718] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:51,718] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:51,718] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:51,719] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:51,719] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:51,720] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:51,973] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:51,973] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:51,974] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:51,975] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:52,236] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:52,236] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:52,510] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:52,511] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:52,760] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:52,761] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:53,046] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:53,046] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:53,311] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:53,312] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:53,565] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:53,565] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:53,826] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:53,826] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:54,087] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:54,088] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:54,358] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:54,359] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:54,620] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:54,621] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:54,884] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:41:54,884] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
{'eval_loss': 0.9991366267204285, 'eval_runtime': 3.1792, 'eval_samples_per_second': 343.486, 'eval_steps_per_second': 21.704, 'epoch': 3.18}
{'loss': 0.7692, 'learning_rate': 1.7029141705719064e-05, 'epoch': 3.19}
{'loss': 0.9054, 'learning_rate': 1.696500818157284e-05, 'epoch': 3.19}
{'loss': 0.8379, 'learning_rate': 1.6900984456697145e-05, 'epoch': 3.19}
{'loss': 0.8711, 'learning_rate': 1.6837070615752115e-05, 'epoch': 3.19}
{'loss': 0.8025, 'learning_rate': 1.6773266743252703e-05, 'epoch': 3.19}
{'loss': 0.905, 'learning_rate': 1.670957292356835e-05, 'epoch': 3.19}
{'loss': 0.9015, 'learning_rate': 1.6645989240922987e-05, 'epoch': 3.19}
{'loss': 0.8721, 'learning_rate': 1.6582515779394968e-05, 'epoch': 3.2}
{'loss': 0.9382, 'learning_rate': 1.6519152622916843e-05, 'epoch': 3.2}
{'loss': 0.9502, 'learning_rate': 1.6455899855275303e-05, 'epoch': 3.2}
{'loss': 0.8633, 'learning_rate': 1.6392757560111093e-05, 'epoch': 3.2}
{'loss': 0.8858, 'learning_rate': 1.632972582091884e-05, 'epoch': 3.2}
{'loss': 0.8456, 'learning_rate': 1.6266804721047058e-05, 'epoch': 3.2}
{'loss': 0.8411, 'learning_rate': 1.6203994343697882e-05, 'epoch': 3.2}
{'loss': 0.8317, 'learning_rate': 1.6141294771927062e-05, 'epoch': 3.21}
{'loss': 0.9077, 'learning_rate': 1.6078706088643836e-05, 'epoch': 3.21}
{'loss': 0.8067, 'learning_rate': 1.60162283766108e-05, 'epoch': 3.21}
{'loss': 0.7608, 'learning_rate': 1.5953861718443774e-05, 'epoch': 3.21}
{'loss': 0.8801, 'learning_rate': 1.5891606196611843e-05, 'epoch': 3.21}
{'loss': 0.8622, 'learning_rate': 1.5829461893437015e-05, 'epoch': 3.21}
{'loss': 0.833, 'learning_rate': 1.576742889109427e-05, 'epoch': 3.22}
{'loss': 0.865, 'learning_rate': 1.570550727161144e-05, 'epoch': 3.22}
{'loss': 0.8657, 'learning_rate': 1.5643697116869004e-05, 'epoch': 3.22}
{'loss': 0.8733, 'learning_rate': 1.558199850860016e-05, 'epoch': 3.22}
{'loss': 0.9039, 'learning_rate': 1.55204115283905e-05, 'epoch': 3.22}
{'loss': 0.863, 'learning_rate': 1.5458936257678014e-05, 'epoch': 3.22}
{'loss': 0.9268, 'learning_rate': 1.539757277775308e-05, 'epoch': 3.22}
{'loss': 0.9167, 'learning_rate': 1.533632116975814e-05, 'epoch': 3.23}
{'loss': 0.7777, 'learning_rate': 1.527518151468773e-05, 'epoch': 3.23}
{'loss': 0.9511, 'learning_rate': 1.5214153893388405e-05, 'epoch': 3.23}
{'loss': 0.8834, 'learning_rate': 1.51532383865585e-05, 'epoch': 3.23}
{'loss': 0.9316, 'learning_rate': 1.5092435074748146e-05, 'epoch': 3.23}
{'loss': 0.9127, 'learning_rate': 1.5031744038359097e-05, 'epoch': 3.23}
{'loss': 0.9103, 'learning_rate': 1.4971165357644613e-05, 'epoch': 3.23}
{'loss': 0.84, 'learning_rate': 1.491069911270948e-05, 'epoch': 3.24}
{'loss': 0.8856, 'learning_rate': 1.48503453835097e-05, 'epoch': 3.24}
{'loss': 0.8837, 'learning_rate': 1.4790104249852554e-05, 'epoch': 3.24}
{'loss': 0.9049, 'learning_rate': 1.4729975791396411e-05, 'epoch': 3.24}
{'loss': 0.8747, 'learning_rate': 1.4669960087650625e-05, 'epoch': 3.24}
{'loss': 0.8525, 'learning_rate': 1.4610057217975526e-05, 'epoch': 3.24}
{'loss': 0.8555, 'learning_rate': 1.4550267261582173e-05, 'epoch': 3.24}
{'loss': 0.8905, 'learning_rate': 1.4490590297532346e-05, 'epoch': 3.25}
{'loss': 0.8001, 'learning_rate': 1.4431026404738391e-05, 'epoch': 3.25}
{'loss': 0.8561, 'learning_rate': 1.4371575661963143e-05, 'epoch': 3.25}
{'loss': 0.9014, 'learning_rate': 1.4312238147819857e-05, 'epoch': 3.25}
{'loss': 0.7926, 'learning_rate': 1.425301394077201e-05, 'epoch': 3.25}
{'loss': 0.8945, 'learning_rate': 1.4193903119133256e-05, 'epoch': 3.25}
{'loss': 0.8685, 'learning_rate': 1.4134905761067329e-05, 'epoch': 3.25}
{'loss': 0.7927, 'learning_rate': 1.407602194458797e-05, 'epoch': 3.26}
{'loss': 0.8627, 'learning_rate': 1.4017251747558712e-05, 'epoch': 3.26}
{'loss': 0.8743, 'learning_rate': 1.395859524769284e-05, 'epoch': 3.26}
{'loss': 0.8629, 'learning_rate': 1.3900052522553397e-05, 'epoch': 3.26}
{'loss': 0.8783, 'learning_rate': 1.384162364955286e-05, 'epoch': 3.26}
{'loss': 0.8307, 'learning_rate': 1.3783308705953224e-05, 'epoch': 3.26}
{'loss': 0.8257, 'learning_rate': 1.3725107768865787e-05, 'epoch': 3.26}
{'loss': 0.8161, 'learning_rate': 1.3667020915251173e-05, 'epoch': 3.27}
{'loss': 0.8053, 'learning_rate': 1.3609048221919064e-05, 'epoch': 3.27}
{'loss': 0.8356, 'learning_rate': 1.3551189765528217e-05, 'epoch': 3.27}
{'loss': 0.826, 'learning_rate': 1.3493445622586343e-05, 'epoch': 3.27}
{'loss': 0.9188, 'learning_rate': 1.3435815869449964e-05, 'epoch': 3.27}
{'loss': 0.7895, 'learning_rate': 1.3378300582324387e-05, 'epoch': 3.27}
{'loss': 0.7949, 'learning_rate': 1.3320899837263524e-05, 'epoch': 3.27}
{'loss': 0.8875, 'learning_rate': 1.3263613710169831e-05, 'epoch': 3.28}
{'loss': 0.8677, 'learning_rate': 1.3206442276794207e-05, 'epoch': 3.28}
{'loss': 0.9383, 'learning_rate': 1.3149385612735876e-05, 'epoch': 3.28}
{'loss': 0.8617, 'learning_rate': 1.3092443793442277e-05, 'epoch': 3.28}
{'loss': 0.7634, 'learning_rate': 1.303561689420909e-05, 'epoch': 3.28}
{'loss': 0.8413, 'learning_rate': 1.297890499017992e-05, 'epoch': 3.28}
{'loss': 0.9422, 'learning_rate': 1.2922308156346353e-05, 'epoch': 3.28}
{'loss': 0.9099, 'learning_rate': 1.2865826467547825e-05, 'epoch': 3.29}
{'loss': 0.8902, 'learning_rate': 1.2809459998471462e-05, 'epoch': 3.29}
{'loss': 0.867, 'learning_rate': 1.2753208823652141e-05, 'epoch': 3.29}
{'loss': 0.8341, 'learning_rate': 1.269707301747215e-05, 'epoch': 3.29}
{'loss': 0.8372, 'learning_rate': 1.2641052654161333e-05, 'epoch': 3.29}
{'loss': 0.8073, 'learning_rate': 1.2585147807796815e-05, 'epoch': 3.29}
{'loss': 0.8703, 'learning_rate': 1.2529358552302972e-05, 'epoch': 3.3}
{'loss': 0.7844, 'learning_rate': 1.2473684961451381e-05, 'epoch': 3.3}
{'loss': 0.8001, 'learning_rate': 1.2418127108860623e-05, 'epoch': 3.3}
{'loss': 0.8309, 'learning_rate': 1.236268506799625e-05, 'epoch': 3.3}
{'loss': 0.8018, 'learning_rate': 1.2307358912170686e-05, 'epoch': 3.3}
{'loss': 0.8752, 'learning_rate': 1.2252148714543088e-05, 'epoch': 3.3}
{'loss': 0.868, 'learning_rate': 1.2197054548119302e-05, 'epoch': 3.3}
{'loss': 0.9158, 'learning_rate': 1.2142076485751751e-05, 'epoch': 3.31}
{'loss': 0.8873, 'learning_rate': 1.2087214600139308e-05, 'epoch': 3.31}
{'loss': 0.8359, 'learning_rate': 1.2032468963827249e-05, 'epoch': 3.31}
{'loss': 0.8097, 'learning_rate': 1.197783964920709e-05, 'epoch': 3.31}
{'loss': 0.8627, 'learning_rate': 1.1923326728516549e-05, 'epoch': 3.31}
{'loss': 0.8869, 'learning_rate': 1.1868930273839473e-05, 'epoch': 3.31}
{'loss': 0.9054, 'learning_rate': 1.181465035710565e-05, 'epoch': 3.31}
{'loss': 0.9408, 'learning_rate': 1.1760487050090796e-05, 'epoch': 3.32}
{'loss': 0.8753, 'learning_rate': 1.170644042441642e-05, 'epoch': 3.32}
{'loss': 0.7947, 'learning_rate': 1.1652510551549723e-05, 'epoch': 3.32}
{'loss': 0.8062, 'learning_rate': 1.1598697502803568e-05, 'epoch': 3.32}
{'loss': 0.8017, 'learning_rate': 1.1545001349336315e-05, 'epoch': 3.32}
{'loss': 0.843, 'learning_rate': 1.14914221621517e-05, 'epoch': 3.32}
{'loss': 0.8078, 'learning_rate': 1.1437960012098892e-05, 'epoch': 3.32}
{'loss': 0.9372, 'learning_rate': 1.1384614969872221e-05, 'epoch': 3.33}
{'loss': 0.8341, 'learning_rate': 1.1331387106011172e-05, 'epoch': 3.33}
{'loss': 0.9547, 'learning_rate': 1.1278276490900319e-05, 'epoch': 3.33}
{'loss': 0.862, 'learning_rate': 1.1225283194769176e-05, 'epoch': 3.33}
{'loss': 0.884, 'learning_rate': 1.1172407287692099e-05, 'epoch': 3.33}
{'loss': 0.8948, 'learning_rate': 1.1119648839588258e-05, 'epoch': 3.33}
{'loss': 0.8137, 'learning_rate': 1.1067007920221439e-05, 'epoch': 3.33}
{'loss': 0.8918, 'learning_rate': 1.1014484599200125e-05, 'epoch': 3.34}
{'loss': 0.9144, 'learning_rate': 1.0962078945977195e-05, 'epoch': 3.34}
{'loss': 0.8809, 'learning_rate': 1.090979102984998e-05, 'epoch': 3.34}
{'loss': 0.8329, 'learning_rate': 1.085762091996011e-05, 'epoch': 3.34}
{'loss': 0.8637, 'learning_rate': 1.0805568685293422e-05, 'epoch': 3.34}
{'loss': 0.8378, 'learning_rate': 1.0753634394679934e-05, 'epoch': 3.34}
{'loss': 0.7993, 'learning_rate': 1.0701818116793672e-05, 'epoch': 3.34}
{'loss': 0.8492, 'learning_rate': 1.06501199201526e-05, 'epoch': 3.35}
{'loss': 0.8809, 'learning_rate': 1.0598539873118552e-05, 'epoch': 3.35}
{'loss': 0.8883, 'learning_rate': 1.054707804389713e-05, 'epoch': 3.35}
{'loss': 0.9208, 'learning_rate': 1.0495734500537591e-05, 'epoch': 3.35}
{'loss': 0.8741, 'learning_rate': 1.0444509310932848e-05, 'epoch': 3.35}
{'loss': 0.7554, 'learning_rate': 1.0393402542819231e-05, 'epoch': 3.35}
{'loss': 0.7831, 'learning_rate': 1.0342414263776512e-05, 'epoch': 3.35}
{'loss': 0.8131, 'learning_rate': 1.0291544541227804e-05, 'epoch': 3.36}
{'loss': 0.8611, 'learning_rate': 1.0240793442439411e-05, 'epoch': 3.36}
{'loss': 0.7758, 'learning_rate': 1.0190161034520795e-05, 'epoch': 3.36}
{'loss': 0.8395, 'learning_rate': 1.0139647384424477e-05, 'epoch': 3.36}
{'loss': 0.9013, 'learning_rate': 1.008925255894595e-05, 'epoch': 3.36}
{'loss': 0.8523, 'learning_rate': 1.0038976624723539e-05, 'epoch': 3.36}
{'loss': 0.9207, 'learning_rate': 9.988819648238379e-06, 'epoch': 3.36}
{'loss': 0.863, 'learning_rate': 9.938781695814337e-06, 'epoch': 3.37}
{'loss': 0.8753, 'learning_rate': 9.888862833617862e-06, 'epoch': 3.37}
{'loss': 0.8762, 'learning_rate': 9.83906312765791e-06, 'epoch': 3.37}
{'loss': 0.8391, 'learning_rate': 9.789382643785895e-06, 'epoch': 3.37}
{'loss': 0.8798, 'learning_rate': 9.739821447695585e-06, 'epoch': 3.37}
{'loss': 0.8436, 'learning_rate': 9.690379604922983e-06, 'epoch': 3.37}
{'loss': 0.8689, 'learning_rate': 9.641057180846324e-06, 'epoch': 3.38}
{'loss': 0.8863, 'learning_rate': 9.591854240685882e-06, 'epoch': 3.38}
{'loss': 0.8453, 'learning_rate': 9.542770849503946e-06, 'epoch': 3.38}
{'loss': 0.8856, 'learning_rate': 9.493807072204718e-06, 'epoch': 3.38}
{'loss': 0.7519, 'learning_rate': 9.444962973534244e-06, 'epoch': 3.38}
{'loss': 0.8428, 'learning_rate': 9.396238618080322e-06, 'epoch': 3.38}
{'loss': 0.8327, 'learning_rate': 9.347634070272404e-06, 'epoch': 3.38}
{'loss': 0.8393, 'learning_rate': 9.299149394381501e-06, 'epoch': 3.39}
{'loss': 0.9051, 'learning_rate': 9.250784654520106e-06, 'epoch': 3.39}
{'loss': 0.9085, 'learning_rate': 9.202539914642182e-06, 'epoch': 3.39}
{'loss': 0.9799, 'learning_rate': 9.154415238542946e-06, 'epoch': 3.39}
{'loss': 0.8416, 'learning_rate': 9.106410689858857e-06, 'epoch': 3.39}
{'loss': 0.9105, 'learning_rate': 9.058526332067586e-06, 'epoch': 3.39}
{'loss': 0.9668, 'learning_rate': 9.010762228487813e-06, 'epoch': 3.39}
{'loss': 0.7611, 'learning_rate': 8.963118442279205e-06, 'epoch': 3.4}
{'loss': 0.8875, 'learning_rate': 8.915595036442349e-06, 'epoch': 3.4}
{'loss': 0.7758, 'learning_rate': 8.868192073818671e-06, 'epoch': 3.4}
{'loss': 0.9575, 'learning_rate': 8.820909617090289e-06, 'epoch': 3.4}
{'loss': 0.8685, 'learning_rate': 8.773747728780001e-06, 'epoch': 3.4}
{'loss': 0.8455, 'learning_rate': 8.726706471251156e-06, 'epoch': 3.4}
{'loss': 0.8608, 'learning_rate': 8.679785906707582e-06, 'epoch': 3.4}
{'loss': 0.9329, 'learning_rate': 8.632986097193573e-06, 'epoch': 3.41}
{'loss': 0.8921, 'learning_rate': 8.586307104593672e-06, 'epoch': 3.41}
{'loss': 0.8398, 'learning_rate': 8.539748990632701e-06, 'epoch': 3.41}
{'loss': 0.9086, 'learning_rate': 8.493311816875615e-06, 'epoch': 3.41}
{'loss': 0.9483, 'learning_rate': 8.446995644727473e-06, 'epoch': 3.41}
{'loss': 0.9482, 'learning_rate': 8.40080053543334e-06, 'epoch': 3.41}
{'loss': 0.9169, 'learning_rate': 8.354726550078152e-06, 'epoch': 3.41}
{'loss': 0.8638, 'learning_rate': 8.308773749586728e-06, 'epoch': 3.42}
{'loss': 0.9535, 'learning_rate': 8.2629421947236e-06, 'epoch': 3.42}
{'loss': 0.9294, 'learning_rate': 8.217231946092984e-06, 'epoch': 3.42}
{'loss': 0.871, 'learning_rate': 8.171643064138735e-06, 'epoch': 3.42}
{'loss': 0.8875, 'learning_rate': 8.12617560914416e-06, 'epoch': 3.42}
{'loss': 0.8881, 'learning_rate': 8.080829641232013e-06, 'epoch': 3.42}
{'loss': 0.8584, 'learning_rate': 8.03560522036445e-06, 'epoch': 3.42}
{'loss': 0.7881, 'learning_rate': 7.990502406342836e-06, 'epoch': 3.43}
{'loss': 0.742, 'learning_rate': 7.945521258807776e-06, 'epoch': 3.43}
{'loss': 0.8735, 'learning_rate': 7.900661837238977e-06, 'epoch': 3.43}
{'loss': 0.8317, 'learning_rate': 7.8559242009552e-06, 'epoch': 3.43}
{'loss': 0.7784, 'learning_rate': 7.811308409114138e-06, 'epoch': 3.43}
{'loss': 0.8541, 'learning_rate': 7.766814520712384e-06, 'epoch': 3.43}
{'loss': 0.8901, 'learning_rate': 7.72244259458531e-06, 'epoch': 3.43}
 88%|██████████████████████████████████████████████████████████████████████████████████████████████████              | 2408/2752 [41:23<05:49,  1.02s/it]
[2023-12-29 02:44:49,086] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:49,086] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:49,087] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:49,087] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:49,087] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:49,087] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:49,087] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:49,087] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:49,087] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:49,087] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:49,087] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:49,087] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:49,088] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:49,088] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:49,092] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:49,093] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:49,343] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:49,344] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:49,344] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:49,345] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:49,608] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:49,609] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:49,873] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:49,873] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:50,124] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:50,124] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:50,410] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:50,410] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:50,675] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:50,675] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:50,927] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:50,928] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:51,189] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:51,189] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:51,451] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:51,451] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:51,721] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:51,721] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:51,984] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:51,985] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:52,249] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:44:52,249] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
{'eval_loss': 1.0039585828781128, 'eval_runtime': 3.1719, 'eval_samples_per_second': 344.273, 'eval_steps_per_second': 21.754, 'epoch': 3.43}
{'loss': 0.8212, 'learning_rate': 7.678192689407082e-06, 'epoch': 3.44}
{'loss': 0.6998, 'learning_rate': 7.634064863690448e-06, 'epoch': 3.44}
{'loss': 0.8835, 'learning_rate': 7.590059175786746e-06, 'epoch': 3.44}
{'loss': 0.9484, 'learning_rate': 7.546175683885814e-06, 'epoch': 3.44}
{'loss': 0.844, 'learning_rate': 7.502414446015893e-06, 'epoch': 3.44}
{'loss': 0.8346, 'learning_rate': 7.45877552004357e-06, 'epoch': 3.44}
{'loss': 0.8501, 'learning_rate': 7.415258963673732e-06, 'epoch': 3.44}
{'loss': 0.8103, 'learning_rate': 7.371864834449405e-06, 'epoch': 3.45}
{'loss': 0.9139, 'learning_rate': 7.328593189751754e-06, 'epoch': 3.45}
{'loss': 0.9085, 'learning_rate': 7.285444086799942e-06, 'epoch': 3.45}
{'loss': 0.9066, 'learning_rate': 7.2424175826511286e-06, 'epoch': 3.45}
{'loss': 0.8778, 'learning_rate': 7.199513734200369e-06, 'epoch': 3.45}
{'loss': 0.841, 'learning_rate': 7.156732598180505e-06, 'epoch': 3.45}
{'loss': 0.9662, 'learning_rate': 7.114074231162082e-06, 'epoch': 3.45}
{'loss': 0.8999, 'learning_rate': 7.071538689553381e-06, 'epoch': 3.46}
{'loss': 0.8571, 'learning_rate': 7.029126029600197e-06, 'epoch': 3.46}
{'loss': 0.8339, 'learning_rate': 6.986836307385858e-06, 'epoch': 3.46}
{'loss': 0.7903, 'learning_rate': 6.944669578831176e-06, 'epoch': 3.46}
{'loss': 0.9298, 'learning_rate': 6.902625899694237e-06, 'epoch': 3.46}
{'loss': 0.8453, 'learning_rate': 6.860705325570494e-06, 'epoch': 3.46}
{'loss': 0.8637, 'learning_rate': 6.818907911892558e-06, 'epoch': 3.47}
{'loss': 0.8856, 'learning_rate': 6.777233713930198e-06, 'epoch': 3.47}
{'loss': 0.9112, 'learning_rate': 6.7356827867902984e-06, 'epoch': 3.47}
{'loss': 0.8057, 'learning_rate': 6.694255185416687e-06, 'epoch': 3.47}
{'loss': 0.8563, 'learning_rate': 6.652950964590121e-06, 'epoch': 3.47}
{'loss': 0.8958, 'learning_rate': 6.611770178928223e-06, 'epoch': 3.47}
{'loss': 0.8808, 'learning_rate': 6.570712882885355e-06, 'epoch': 3.47}
{'loss': 0.9118, 'learning_rate': 6.529779130752678e-06, 'epoch': 3.48}
{'loss': 0.9069, 'learning_rate': 6.488968976657894e-06, 'epoch': 3.48}
{'loss': 0.844, 'learning_rate': 6.448282474565303e-06, 'epoch': 3.48}
{'loss': 0.8704, 'learning_rate': 6.407719678275703e-06, 'epoch': 3.48}
{'loss': 0.8371, 'learning_rate': 6.3672806414262765e-06, 'epoch': 3.48}
{'loss': 0.8223, 'learning_rate': 6.326965417490638e-06, 'epoch': 3.48}
{'loss': 0.8485, 'learning_rate': 6.286774059778599e-06, 'epoch': 3.48}
{'loss': 0.9, 'learning_rate': 6.246706621436205e-06, 'epoch': 3.49}
{'loss': 0.8453, 'learning_rate': 6.206763155445627e-06, 'epoch': 3.49}
{'loss': 0.8473, 'learning_rate': 6.166943714625173e-06, 'epoch': 3.49}
{'loss': 0.8125, 'learning_rate': 6.127248351629056e-06, 'epoch': 3.49}
{'loss': 0.8782, 'learning_rate': 6.087677118947455e-06, 'epoch': 3.49}
{'loss': 0.853, 'learning_rate': 6.0482300689064466e-06, 'epoch': 3.49}
{'loss': 0.9692, 'learning_rate': 6.008907253667839e-06, 'epoch': 3.49}
{'loss': 0.9187, 'learning_rate': 5.969708725229195e-06, 'epoch': 3.5}
{'loss': 0.8498, 'learning_rate': 5.930634535423696e-06, 'epoch': 3.5}
{'loss': 0.8456, 'learning_rate': 5.891684735920167e-06, 'epoch': 3.5}
{'loss': 0.8237, 'learning_rate': 5.852859378222897e-06, 'epoch': 3.5}
{'loss': 0.8031, 'learning_rate': 5.81415851367163e-06, 'epoch': 3.5}
{'loss': 0.9069, 'learning_rate': 5.77558219344152e-06, 'epoch': 3.5}
{'loss': 0.8812, 'learning_rate': 5.737130468542972e-06, 'epoch': 3.5}
{'loss': 0.8748, 'learning_rate': 5.698803389821728e-06, 'epoch': 3.51}
{'loss': 0.8297, 'learning_rate': 5.6606010079586215e-06, 'epoch': 3.51}
{'loss': 0.8528, 'learning_rate': 5.622523373469635e-06, 'epoch': 3.51}
{'loss': 0.8046, 'learning_rate': 5.58457053670578e-06, 'epoch': 3.51}
{'loss': 0.9836, 'learning_rate': 5.546742547853067e-06, 'epoch': 3.51}
{'loss': 0.8655, 'learning_rate': 5.509039456932385e-06, 'epoch': 3.51}
{'loss': 0.8693, 'learning_rate': 5.471461313799497e-06, 'epoch': 3.51}
{'loss': 0.8959, 'learning_rate': 5.434008168144944e-06, 'epoch': 3.52}
{'loss': 0.7543, 'learning_rate': 5.396680069493953e-06, 'epoch': 3.52}
{'loss': 0.8862, 'learning_rate': 5.359477067206397e-06, 'epoch': 3.52}
{'loss': 0.8258, 'learning_rate': 5.322399210476781e-06, 'epoch': 3.52}
{'loss': 0.8141, 'learning_rate': 5.2854465483340725e-06, 'epoch': 3.52}
{'loss': 0.8996, 'learning_rate': 5.248619129641707e-06, 'epoch': 3.52}
{'loss': 0.8866, 'learning_rate': 5.211917003097544e-06, 'epoch': 3.52}
{'loss': 0.9403, 'learning_rate': 5.175340217233704e-06, 'epoch': 3.53}
{'loss': 0.816, 'learning_rate': 5.1388888204165875e-06, 'epoch': 3.53}
{'loss': 0.9147, 'learning_rate': 5.102562860846827e-06, 'epoch': 3.53}
{'loss': 0.8745, 'learning_rate': 5.066362386559154e-06, 'epoch': 3.53}
{'loss': 0.8936, 'learning_rate': 5.030287445422366e-06, 'epoch': 3.53}
{'loss': 0.862, 'learning_rate': 4.9943380851392604e-06, 'epoch': 3.53}
{'loss': 0.9836, 'learning_rate': 4.958514353246602e-06, 'epoch': 3.53}
{'loss': 0.8044, 'learning_rate': 4.9228162971149846e-06, 'epoch': 3.54}
{'loss': 0.866, 'learning_rate': 4.887243963948895e-06, 'epoch': 3.54}
{'loss': 0.7985, 'learning_rate': 4.851797400786506e-06, 'epoch': 3.54}
{'loss': 0.8844, 'learning_rate': 4.816476654499713e-06, 'epoch': 3.54}
{'loss': 1.0106, 'learning_rate': 4.781281771794033e-06, 'epoch': 3.54}
{'loss': 0.8814, 'learning_rate': 4.746212799208527e-06, 'epoch': 3.54}
{'loss': 0.802, 'learning_rate': 4.7112697831158126e-06, 'epoch': 3.55}
{'loss': 0.8892, 'learning_rate': 4.676452769721917e-06, 'epoch': 3.55}
{'loss': 0.911, 'learning_rate': 4.641761805066258e-06, 'epoch': 3.55}
{'loss': 0.8486, 'learning_rate': 4.607196935021574e-06, 'epoch': 3.55}
{'loss': 0.9191, 'learning_rate': 4.572758205293848e-06, 'epoch': 3.55}
{'loss': 0.8329, 'learning_rate': 4.53844566142232e-06, 'epoch': 3.55}
{'loss': 0.8538, 'learning_rate': 4.504259348779316e-06, 'epoch': 3.55}
{'loss': 0.8515, 'learning_rate': 4.470199312570256e-06, 'epoch': 3.56}
{'loss': 0.7707, 'learning_rate': 4.4362655978336e-06, 'epoch': 3.56}
{'loss': 0.9054, 'learning_rate': 4.4024582494407556e-06, 'epoch': 3.56}
{'loss': 0.8464, 'learning_rate': 4.368777312096006e-06, 'epoch': 3.56}
{'loss': 0.86, 'learning_rate': 4.3352228303365605e-06, 'epoch': 3.56}
{'loss': 0.8042, 'learning_rate': 4.3017948485323255e-06, 'epoch': 3.56}
{'loss': 0.927, 'learning_rate': 4.2684934108859765e-06, 'epoch': 3.56}
{'loss': 0.9424, 'learning_rate': 4.235318561432844e-06, 'epoch': 3.57}
{'loss': 0.8785, 'learning_rate': 4.2022703440408486e-06, 'epoch': 3.57}
{'loss': 0.8664, 'learning_rate': 4.169348802410522e-06, 'epoch': 3.57}
{'loss': 0.8852, 'learning_rate': 4.136553980074842e-06, 'epoch': 3.57}
{'loss': 0.8903, 'learning_rate': 4.10388592039922e-06, 'epoch': 3.57}
{'loss': 0.9148, 'learning_rate': 4.071344666581456e-06, 'epoch': 3.57}
{'loss': 0.9007, 'learning_rate': 4.038930261651674e-06, 'epoch': 3.57}
{'loss': 0.8743, 'learning_rate': 4.006642748472278e-06, 'epoch': 3.58}
{'loss': 0.8454, 'learning_rate': 3.974482169737859e-06, 'epoch': 3.58}
{'loss': 0.8139, 'learning_rate': 3.9424485679751546e-06, 'epoch': 3.58}
{'loss': 0.8817, 'learning_rate': 3.910541985543014e-06, 'epoch': 3.58}
{'loss': 0.9012, 'learning_rate': 3.878762464632313e-06, 'epoch': 3.58}
{'loss': 0.8465, 'learning_rate': 3.847110047265911e-06, 'epoch': 3.58}
{'loss': 0.9034, 'learning_rate': 3.81558477529862e-06, 'epoch': 3.58}
{'loss': 0.8386, 'learning_rate': 3.7841866904170798e-06, 'epoch': 3.59}
{'loss': 0.8854, 'learning_rate': 3.752915834139781e-06, 'epoch': 3.59}
{'loss': 0.8416, 'learning_rate': 3.7217722478169903e-06, 'epoch': 3.59}
{'loss': 0.8349, 'learning_rate': 3.690755972630622e-06, 'epoch': 3.59}
{'loss': 0.8278, 'learning_rate': 3.6598670495943123e-06, 'epoch': 3.59}
{'loss': 0.8793, 'learning_rate': 3.629105519553255e-06, 'epoch': 3.59}
{'loss': 0.847, 'learning_rate': 3.598471423184202e-06, 'epoch': 3.59}
{'loss': 0.891, 'learning_rate': 3.5679648009953935e-06, 'epoch': 3.6}
{'loss': 0.9706, 'learning_rate': 3.537585693326484e-06, 'epoch': 3.6}
{'loss': 0.9981, 'learning_rate': 3.5073341403485727e-06, 'epoch': 3.6}
{'loss': 0.789, 'learning_rate': 3.477210182064039e-06, 'epoch': 3.6}
{'loss': 0.9186, 'learning_rate': 3.447213858306564e-06, 'epoch': 3.6}
{'loss': 0.8503, 'learning_rate': 3.4173452087410187e-06, 'epoch': 3.6}
{'loss': 0.8136, 'learning_rate': 3.3876042728635092e-06, 'epoch': 3.6}
{'loss': 0.8728, 'learning_rate': 3.357991090001189e-06, 'epoch': 3.61}
{'loss': 0.7827, 'learning_rate': 3.3285056993123455e-06, 'epoch': 3.61}
{'loss': 0.9014, 'learning_rate': 3.2991481397862568e-06, 'epoch': 3.61}
{'loss': 0.9747, 'learning_rate': 3.269918450243159e-06, 'epoch': 3.61}
{'loss': 0.9531, 'learning_rate': 3.2408166693342123e-06, 'epoch': 3.61}
{'loss': 0.866, 'learning_rate': 3.211842835541423e-06, 'epoch': 3.61}
{'loss': 0.9305, 'learning_rate': 3.1829969871776555e-06, 'epoch': 3.61}
{'loss': 0.8745, 'learning_rate': 3.1542791623864863e-06, 'epoch': 3.62}
{'loss': 0.8253, 'learning_rate': 3.125689399142229e-06, 'epoch': 3.62}
{'loss': 0.8647, 'learning_rate': 3.0972277352498303e-06, 'epoch': 3.62}
{'loss': 0.993, 'learning_rate': 3.0688942083448967e-06, 'epoch': 3.62}
{'loss': 0.8882, 'learning_rate': 3.0406888558935476e-06, 'epoch': 3.62}
{'loss': 0.8133, 'learning_rate': 3.012611715192437e-06, 'epoch': 3.62}
{'loss': 0.859, 'learning_rate': 2.984662823368689e-06, 'epoch': 3.62}
{'loss': 0.8799, 'learning_rate': 2.9568422173798294e-06, 'epoch': 3.63}
{'loss': 0.8258, 'learning_rate': 2.929149934013742e-06, 'epoch': 3.63}
{'loss': 0.8953, 'learning_rate': 2.901586009888624e-06, 'epoch': 3.63}
{'loss': 0.9015, 'learning_rate': 2.874150481452975e-06, 'epoch': 3.63}
{'loss': 0.9216, 'learning_rate': 2.846843384985476e-06, 'epoch': 3.63}
{'loss': 0.842, 'learning_rate': 2.8196647565949864e-06, 'epoch': 3.63}
{'loss': 0.8822, 'learning_rate': 2.7926146322204914e-06, 'epoch': 3.64}
{'loss': 0.8613, 'learning_rate': 2.7656930476310683e-06, 'epoch': 3.64}
{'loss': 0.8748, 'learning_rate': 2.7389000384257955e-06, 'epoch': 3.64}
{'loss': 0.8796, 'learning_rate': 2.7122356400337667e-06, 'epoch': 3.64}
{'loss': 0.8332, 'learning_rate': 2.6856998877139773e-06, 'epoch': 3.64}
{'loss': 0.8884, 'learning_rate': 2.6592928165553143e-06, 'epoch': 3.64}
{'loss': 0.9207, 'learning_rate': 2.633014461476524e-06, 'epoch': 3.64}
{'loss': 0.8051, 'learning_rate': 2.6068648572261543e-06, 'epoch': 3.65}
{'loss': 0.8006, 'learning_rate': 2.5808440383824796e-06, 'epoch': 3.65}
{'loss': 0.9132, 'learning_rate': 2.554952039353475e-06, 'epoch': 3.65}
{'loss': 0.9026, 'learning_rate': 2.5291888943767992e-06, 'epoch': 3.65}
{'loss': 0.8566, 'learning_rate': 2.5035546375197006e-06, 'epoch': 3.65}
{'loss': 0.8617, 'learning_rate': 2.47804930267902e-06, 'epoch': 3.65}
{'loss': 0.87, 'learning_rate': 2.4526729235810896e-06, 'epoch': 3.65}
{'loss': 0.8373, 'learning_rate': 2.427425533781746e-06, 'epoch': 3.66}
{'loss': 0.9077, 'learning_rate': 2.4023071666662624e-06, 'epoch': 3.66}
{'loss': 0.9626, 'learning_rate': 2.377317855449268e-06, 'epoch': 3.66}
{'loss': 1.0056, 'learning_rate': 2.3524576331747762e-06, 'epoch': 3.66}
{'loss': 0.8332, 'learning_rate': 2.3277265327160904e-06, 'epoch': 3.66}
{'loss': 0.8724, 'learning_rate': 2.3031245867757734e-06, 'epoch': 3.66}
{'loss': 0.8799, 'learning_rate': 2.2786518278855807e-06, 'epoch': 3.66}
{'loss': 0.9257, 'learning_rate': 2.2543082884064815e-06, 'epoch': 3.67}
{'loss': 0.8547, 'learning_rate': 2.2300940005285374e-06, 'epoch': 3.67}
{'loss': 0.9901, 'learning_rate': 2.2060089962709253e-06, 'epoch': 3.67}
{'loss': 0.8688, 'learning_rate': 2.182053307481857e-06, 'epoch': 3.67}
{'loss': 0.904, 'learning_rate': 2.158226965838539e-06, 'epoch': 3.67}
{'loss': 0.7944, 'learning_rate': 2.134530002847146e-06, 'epoch': 3.67}
{'loss': 0.9484, 'learning_rate': 2.1109624498427794e-06, 'epoch': 3.67}
{'loss': 0.8784, 'learning_rate': 2.0875243379893883e-06, 'epoch': 3.68}
{'loss': 0.8592, 'learning_rate': 2.0642156982798144e-06, 'epoch': 3.68}
{'loss': 0.8724, 'learning_rate': 2.0410365615356365e-06, 'epoch': 3.68}
{'loss': 0.8775, 'learning_rate': 2.0179869584072254e-06, 'epoch': 3.68}
{'loss': 0.8888, 'learning_rate': 1.995066919373645e-06, 'epoch': 3.68}
{'loss': 0.8184, 'learning_rate': 1.9722764747426515e-06, 'epoch': 3.68}
{'loss': 0.8942, 'learning_rate': 1.9496156546506274e-06, 'epoch': 3.68}
 94%|█████████████████████████████████████████████████████████████████████████████████████████████████████████       | 2580/2752 [44:21<02:54,  1.01s/it]
[2023-12-29 02:47:46,587] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:46,587] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:46,587] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:46,587] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:46,587] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:46,587] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:46,587] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:46,588] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:46,588] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:46,588] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:46,588] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:46,588] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:46,588] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:46,589] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:46,589] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:46,589] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:46,843] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:46,843] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:46,844] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:46,845] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:47,108] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:47,108] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:47,373] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:47,374] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:47,624] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:47,624] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:47,910] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:47,910] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:48,180] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:48,180] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:48,432] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:48,433] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:48,696] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:48,697] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:48,959] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:48,959] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:49,228] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:49,229] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:49,494] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:49,495] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:49,758] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:47:49,759] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
{'eval_loss': 1.0016659498214722, 'eval_runtime': 3.183, 'eval_samples_per_second': 343.073, 'eval_steps_per_second': 21.678, 'epoch': 3.68}
{'loss': 0.8634, 'learning_rate': 1.927084489062547e-06, 'epoch': 3.69}
{'loss': 0.8513, 'learning_rate': 1.9046830077719236e-06, 'epoch': 3.69}
{'loss': 0.876, 'learning_rate': 1.8824112404008275e-06, 'epoch': 3.69}
{'loss': 0.8708, 'learning_rate': 1.8602692163997681e-06, 'epoch': 3.69}
{'loss': 0.9077, 'learning_rate': 1.8382569650477133e-06, 'epoch': 3.69}
{'loss': 0.9603, 'learning_rate': 1.8163745154520129e-06, 'epoch': 3.69}
{'loss': 0.8131, 'learning_rate': 1.7946218965483763e-06, 'epoch': 3.69}
{'loss': 0.82, 'learning_rate': 1.7729991371008502e-06, 'epoch': 3.7}
{'loss': 0.8551, 'learning_rate': 1.7515062657017632e-06, 'epoch': 3.7}
{'loss': 0.9039, 'learning_rate': 1.7301433107716592e-06, 'epoch': 3.7}
{'loss': 0.8059, 'learning_rate': 1.708910300559341e-06, 'epoch': 3.7}
{'loss': 0.8944, 'learning_rate': 1.6878072631417386e-06, 'epoch': 3.7}
{'loss': 0.8878, 'learning_rate': 1.6668342264239522e-06, 'epoch': 3.7}
{'loss': 0.9237, 'learning_rate': 1.6459912181391312e-06, 'epoch': 3.7}
{'loss': 0.8837, 'learning_rate': 1.6252782658485178e-06, 'epoch': 3.71}
{'loss': 0.8663, 'learning_rate': 1.6046953969413915e-06, 'epoch': 3.71}
{'loss': 0.8583, 'learning_rate': 1.5842426386349917e-06, 'epoch': 3.71}
{'loss': 0.8618, 'learning_rate': 1.5639200179745184e-06, 'epoch': 3.71}
{'loss': 0.8405, 'learning_rate': 1.543727561833086e-06, 'epoch': 3.71}
{'loss': 0.9588, 'learning_rate': 1.5236652969116804e-06, 'epoch': 3.71}
{'loss': 0.8364, 'learning_rate': 1.5037332497391588e-06, 'epoch': 3.72}
{'loss': 0.9264, 'learning_rate': 1.4839314466721599e-06, 'epoch': 3.72}
{'loss': 0.8804, 'learning_rate': 1.4642599138951163e-06, 'epoch': 3.72}
{'loss': 0.9264, 'learning_rate': 1.444718677420176e-06, 'epoch': 3.72}
{'loss': 0.8977, 'learning_rate': 1.4253077630872357e-06, 'epoch': 3.72}
{'loss': 0.8575, 'learning_rate': 1.4060271965638194e-06, 'epoch': 3.72}
{'loss': 0.838, 'learning_rate': 1.3868770033451328e-06, 'epoch': 3.72}
{'loss': 0.812, 'learning_rate': 1.367857208753931e-06, 'epoch': 3.73}
{'loss': 0.8061, 'learning_rate': 1.3489678379405956e-06, 'epoch': 3.73}
{'loss': 0.9596, 'learning_rate': 1.3302089158829912e-06, 'epoch': 3.73}
{'loss': 0.8883, 'learning_rate': 1.3115804673865306e-06, 'epoch': 3.73}
{'loss': 0.8919, 'learning_rate': 1.2930825170840877e-06, 'epoch': 3.73}
{'loss': 0.7877, 'learning_rate': 1.2747150894359738e-06, 'epoch': 3.73}
{'loss': 0.8302, 'learning_rate': 1.256478208729883e-06, 'epoch': 3.73}
{'loss': 0.7751, 'learning_rate': 1.2383718990809146e-06, 'epoch': 3.74}
{'loss': 0.934, 'learning_rate': 1.2203961844315048e-06, 'epoch': 3.74}
{'loss': 0.8424, 'learning_rate': 1.2025510885513847e-06, 'epoch': 3.74}
{'loss': 0.8013, 'learning_rate': 1.1848366350375895e-06, 'epoch': 3.74}
{'loss': 0.9296, 'learning_rate': 1.1672528473143818e-06, 'epoch': 3.74}
{'loss': 0.9107, 'learning_rate': 1.1497997486332513e-06, 'epoch': 3.74}
{'loss': 0.8821, 'learning_rate': 1.1324773620728702e-06, 'epoch': 3.74}
{'loss': 0.7834, 'learning_rate': 1.1152857105390602e-06, 'epoch': 3.75}
{'loss': 0.8085, 'learning_rate': 1.0982248167647923e-06, 'epoch': 3.75}
{'loss': 0.8174, 'learning_rate': 1.0812947033101207e-06, 'epoch': 3.75}
{'loss': 0.9087, 'learning_rate': 1.0644953925621482e-06, 'epoch': 3.75}
{'loss': 0.8706, 'learning_rate': 1.0478269067350166e-06, 'epoch': 3.75}
{'loss': 0.8524, 'learning_rate': 1.0312892678699281e-06, 'epoch': 3.75}
{'loss': 0.8352, 'learning_rate': 1.0148824978349792e-06, 'epoch': 3.75}
{'loss': 0.8716, 'learning_rate': 9.986066183252818e-07, 'epoch': 3.76}
{'loss': 0.8081, 'learning_rate': 9.824616508628315e-07, 'epoch': 3.76}
{'loss': 0.9079, 'learning_rate': 9.664476167965397e-07, 'epoch': 3.76}
{'loss': 0.9137, 'learning_rate': 9.505645373021455e-07, 'epoch': 3.76}
{'loss': 0.8851, 'learning_rate': 9.348124333822706e-07, 'epoch': 3.76}
{'loss': 0.9555, 'learning_rate': 9.191913258663199e-07, 'epoch': 3.76}
{'loss': 0.8582, 'learning_rate': 9.03701235410459e-07, 'epoch': 3.76}
{'loss': 0.989, 'learning_rate': 8.883421824976479e-07, 'epoch': 3.77}
{'loss': 0.864, 'learning_rate': 8.731141874375403e-07, 'epoch': 3.77}
{'loss': 0.9209, 'learning_rate': 8.580172703664846e-07, 'epoch': 3.77}
{'loss': 0.7957, 'learning_rate': 8.430514512475452e-07, 'epoch': 3.77}
{'loss': 0.9754, 'learning_rate': 8.282167498703918e-07, 'epoch': 3.77}
{'loss': 0.8648, 'learning_rate': 8.135131858513223e-07, 'epoch': 3.77}
{'loss': 0.9934, 'learning_rate': 7.989407786332393e-07, 'epoch': 3.77}
{'loss': 0.9181, 'learning_rate': 7.844995474855843e-07, 'epoch': 3.78}
{'loss': 0.7916, 'learning_rate': 7.701895115043822e-07, 'epoch': 3.78}
{'loss': 0.8838, 'learning_rate': 7.560106896121522e-07, 'epoch': 3.78}
{'loss': 0.9393, 'learning_rate': 7.419631005579075e-07, 'epoch': 3.78}
{'loss': 0.9027, 'learning_rate': 7.280467629171339e-07, 'epoch': 3.78}
{'loss': 0.8829, 'learning_rate': 7.142616950917446e-07, 'epoch': 3.78}
{'loss': 0.8453, 'learning_rate': 7.00607915310103e-07, 'epoch': 3.78}
{'loss': 0.8823, 'learning_rate': 6.870854416269334e-07, 'epoch': 3.79}
{'loss': 0.853, 'learning_rate': 6.736942919233436e-07, 'epoch': 3.79}
{'loss': 0.9244, 'learning_rate': 6.604344839068021e-07, 'epoch': 3.79}
{'loss': 0.9075, 'learning_rate': 6.473060351110727e-07, 'epoch': 3.79}
{'loss': 0.8477, 'learning_rate': 6.343089628962462e-07, 'epoch': 3.79}
{'loss': 0.84, 'learning_rate': 6.214432844486861e-07, 'epoch': 3.79}
{'loss': 0.9991, 'learning_rate': 6.087090167809839e-07, 'epoch': 3.8}
{'loss': 0.9178, 'learning_rate': 5.961061767320142e-07, 'epoch': 3.8}
{'loss': 0.903, 'learning_rate': 5.836347809668019e-07, 'epoch': 3.8}
{'loss': 0.9061, 'learning_rate': 5.712948459765887e-07, 'epoch': 3.8}
{'loss': 0.9127, 'learning_rate': 5.590863880788111e-07, 'epoch': 3.8}
{'loss': 0.9721, 'learning_rate': 5.470094234169998e-07, 'epoch': 3.8}
{'loss': 0.8808, 'learning_rate': 5.35063967960836e-07, 'epoch': 3.8}
{'loss': 0.8217, 'learning_rate': 5.232500375060956e-07, 'epoch': 3.81}
{'loss': 0.9174, 'learning_rate': 5.115676476746489e-07, 'epoch': 3.81}
{'loss': 0.8476, 'learning_rate': 5.000168139143946e-07, 'epoch': 3.81}
{'loss': 0.9031, 'learning_rate': 4.885975514993147e-07, 'epoch': 3.81}
{'loss': 0.854, 'learning_rate': 4.773098755293747e-07, 'epoch': 3.81}
{'loss': 0.8018, 'learning_rate': 4.661538009305577e-07, 'epoch': 3.81}
{'loss': 0.9106, 'learning_rate': 4.5512934245481865e-07, 'epoch': 3.81}
{'loss': 0.9109, 'learning_rate': 4.442365146800853e-07, 'epoch': 3.82}
{'loss': 0.8609, 'learning_rate': 4.3347533201022474e-07, 'epoch': 3.82}
{'loss': 0.8284, 'learning_rate': 4.2284580867500976e-07, 'epoch': 3.82}
{'loss': 0.8601, 'learning_rate': 4.1234795873013045e-07, 'epoch': 3.82}
{'loss': 0.9274, 'learning_rate': 4.0198179605716033e-07, 'epoch': 3.82}
{'loss': 0.8526, 'learning_rate': 3.9174733436353475e-07, 'epoch': 3.82}
{'loss': 0.9332, 'learning_rate': 3.8164458718255025e-07, 'epoch': 3.82}
{'loss': 0.8634, 'learning_rate': 3.7167356787332073e-07, 'epoch': 3.83}
{'loss': 0.821, 'learning_rate': 3.6183428962077716e-07, 'epoch': 3.83}
{'loss': 0.7866, 'learning_rate': 3.5212676543563416e-07, 'epoch': 3.83}
{'loss': 0.8669, 'learning_rate': 3.4255100815442365e-07, 'epoch': 3.83}
{'loss': 0.9061, 'learning_rate': 3.3310703043938354e-07, 'epoch': 3.83}
{'loss': 0.9099, 'learning_rate': 3.237948447785466e-07, 'epoch': 3.83}
{'loss': 0.9021, 'learning_rate': 3.146144634856407e-07, 'epoch': 3.83}
{'loss': 0.874, 'learning_rate': 3.0556589870012196e-07, 'epoch': 3.84}
{'loss': 0.901, 'learning_rate': 2.966491623871193e-07, 'epoch': 3.84}
{'loss': 0.8373, 'learning_rate': 2.8786426633747863e-07, 'epoch': 3.84}
{'loss': 0.8567, 'learning_rate': 2.792112221676857e-07, 'epoch': 3.84}
{'loss': 0.9109, 'learning_rate': 2.7069004131987653e-07, 'epoch': 3.84}
{'loss': 0.9547, 'learning_rate': 2.623007350618267e-07, 'epoch': 3.84}
{'loss': 0.8617, 'learning_rate': 2.540433144869292e-07, 'epoch': 3.84}
{'loss': 0.9077, 'learning_rate': 2.4591779051416075e-07, 'epoch': 3.85}
{'loss': 0.9076, 'learning_rate': 2.379241738881377e-07, 'epoch': 3.85}
{'loss': 0.8944, 'learning_rate': 2.300624751790048e-07, 'epoch': 3.85}
{'loss': 0.9022, 'learning_rate': 2.223327047824908e-07, 'epoch': 3.85}
{'loss': 0.8671, 'learning_rate': 2.1473487291986395e-07, 'epoch': 3.85}
{'loss': 1.0033, 'learning_rate': 2.0726898963793205e-07, 'epoch': 3.85}
{'loss': 0.9429, 'learning_rate': 1.9993506480900926e-07, 'epoch': 3.85}
{'loss': 0.9008, 'learning_rate': 1.9273310813093804e-07, 'epoch': 3.86}
{'loss': 0.9401, 'learning_rate': 1.8566312912706718e-07, 'epoch': 3.86}
{'loss': 0.8867, 'learning_rate': 1.78725137146174e-07, 'epoch': 3.86}
{'loss': 0.8705, 'learning_rate': 1.7191914136256427e-07, 'epoch': 3.86}
{'loss': 0.9031, 'learning_rate': 1.6524515077597224e-07, 'epoch': 3.86}
{'loss': 0.9782, 'learning_rate': 1.58703174211583e-07, 'epoch': 3.86}
{'loss': 0.8514, 'learning_rate': 1.5229322032002115e-07, 'epoch': 3.86}
{'loss': 0.8936, 'learning_rate': 1.4601529757732878e-07, 'epoch': 3.87}
{'loss': 0.8864, 'learning_rate': 1.3986941428496548e-07, 'epoch': 3.87}
{'loss': 0.9202, 'learning_rate': 1.3385557856977483e-07, 'epoch': 3.87}
{'loss': 0.9492, 'learning_rate': 1.27973798384029e-07, 'epoch': 3.87}
{'loss': 0.8922, 'learning_rate': 1.2222408150532882e-07, 'epoch': 3.87}
{'loss': 0.8704, 'learning_rate': 1.1660643553668138e-07, 'epoch': 3.87}
{'loss': 0.7991, 'learning_rate': 1.111208679064446e-07, 'epoch': 3.88}
{'loss': 0.9127, 'learning_rate': 1.0576738586831614e-07, 'epoch': 3.88}
{'loss': 0.9113, 'learning_rate': 1.0054599650135555e-07, 'epoch': 3.88}
{'loss': 0.891, 'learning_rate': 9.545670670991769e-08, 'epoch': 3.88}
{'loss': 0.8205, 'learning_rate': 9.049952322370824e-08, 'epoch': 3.88}
{'loss': 0.8465, 'learning_rate': 8.567445259775042e-08, 'epoch': 3.88}
{'loss': 0.91, 'learning_rate': 8.09815012123294e-08, 'epoch': 3.88}
{'loss': 0.8982, 'learning_rate': 7.642067527308116e-08, 'epoch': 3.89}
{'loss': 0.9924, 'learning_rate': 7.199198081087044e-08, 'epoch': 3.89}
{'loss': 0.8688, 'learning_rate': 6.769542368190162e-08, 'epoch': 3.89}
{'loss': 0.8771, 'learning_rate': 6.353100956761893e-08, 'epoch': 3.89}
{'loss': 0.8629, 'learning_rate': 5.949874397470634e-08, 'epoch': 3.89}
{'loss': 0.8763, 'learning_rate': 5.559863223515427e-08, 'epoch': 3.89}
{'loss': 0.8457, 'learning_rate': 5.183067950617071e-08, 'epoch': 3.89}
{'loss': 0.9548, 'learning_rate': 4.819489077021455e-08, 'epoch': 3.9}
{'loss': 0.8712, 'learning_rate': 4.469127083498448e-08, 'epoch': 3.9}
{'loss': 0.872, 'learning_rate': 4.1319824333407864e-08, 'epoch': 3.9}
{'loss': 0.8723, 'learning_rate': 3.808055572362967e-08, 'epoch': 3.9}
{'loss': 0.9563, 'learning_rate': 3.4973469289012465e-08, 'epoch': 3.9}
{'loss': 0.8961, 'learning_rate': 3.199856913813637e-08, 'epoch': 3.9}
{'loss': 0.9116, 'learning_rate': 2.915585920479913e-08, 'epoch': 3.9}
{'loss': 0.8716, 'learning_rate': 2.6445343247982755e-08, 'epoch': 3.91}
{'loss': 0.8707, 'learning_rate': 2.3867024851853546e-08, 'epoch': 3.91}
{'loss': 0.8533, 'learning_rate': 2.142090742580649e-08, 'epoch': 3.91}
{'loss': 0.8833, 'learning_rate': 1.9106994204409755e-08, 'epoch': 3.91}
{'loss': 0.9717, 'learning_rate': 1.6925288247393588e-08, 'epoch': 3.91}
{'loss': 0.823, 'learning_rate': 1.4875792439683623e-08, 'epoch': 3.91}
{'loss': 0.8259, 'learning_rate': 1.2958509491389769e-08, 'epoch': 3.91}
{'loss': 0.8943, 'learning_rate': 1.1173441937772922e-08, 'epoch': 3.92}
{'loss': 0.8425, 'learning_rate': 9.52059213927825e-09, 'epoch': 3.92}
{'loss': 0.8533, 'learning_rate': 7.999962281513006e-09, 'epoch': 3.92}
{'loss': 0.8883, 'learning_rate': 6.61155437524652e-09, 'epoch': 3.92}
{'loss': 0.8401, 'learning_rate': 5.355370256410197e-09, 'epoch': 3.92}
{'loss': 0.8426, 'learning_rate': 4.231411586064216e-09, 'epoch': 3.92}
{'loss': 0.9038, 'learning_rate': 3.2396798504752414e-09, 'epoch': 3.92}
{'loss': 0.8293, 'learning_rate': 2.3801763610165064e-09, 'epoch': 3.93}
{'loss': 0.8884, 'learning_rate': 1.6529022542455252e-09, 'epoch': 3.93}
{'loss': 0.8839, 'learning_rate': 1.0578584918374823e-09, 'epoch': 3.93}
{'loss': 0.9561, 'learning_rate': 5.950458606518439e-10, 'epoch': 3.93}
{'loss': 0.9134, 'learning_rate': 2.6446497266574555e-10, 'epoch': 3.93}
{'loss': 0.8959, 'learning_rate': 6.611626501840107e-11, 'epoch': 3.93}
{'loss': 0.891, 'learning_rate': 0.0, 'epoch': 3.93}
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2752/2752 [47:18<00:00,  1.01s/it]
[2023-12-29 02:50:44,099] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:44,099] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:44,099] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:44,099] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:44,099] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:44,099] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:44,099] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:44,099] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:168] [RANK:7] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:44,099] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:165] [RANK:4] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:44,099] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:162] [RANK:1] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:44,099] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:167] [RANK:6] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:44,100] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:44,100] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:163] [RANK:2] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:44,100] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:166] [RANK:5] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:44,100] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:44,101] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:164] [RANK:3] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:44,354] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:44,355] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:44,355] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:44,356] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:44,619] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:44,619] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:44,889] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:44,889] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:45,134] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:45,135] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:45,419] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:45,420] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:45,687] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:45,688] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:45,939] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:45,940] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:46,200] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:46,201] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:46,462] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:46,463] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:46,732] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:46,733] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:46,994] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:46,995] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:47,258] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
[2023-12-29 02:50:47,259] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:161] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 23546
{'eval_loss': 1.0020508766174316, 'eval_runtime': 3.171, 'eval_samples_per_second': 344.368, 'eval_steps_per_second': 21.76, 'epoch': 3.93}
{'train_runtime': 2842.4239, 'train_samples_per_second': 75.254, 'train_steps_per_second': 0.968, 'train_loss': 0.9769066730947342, 'epoch': 3.93}
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2752/2752 [47:26<00:00,  1.03s/it]
[2023-12-29 02:50:51,874] [INFO] [axolotl.train.log:60] [PID:161] [RANK:0] Training Completed!!! Saving pre-trained model to ./lora-out
root@8510995a57b3:/workspace/axolotl# accelerate launch -m axolotl.cli.train examples/u^C
root@8510995a57b3:/workspace/axolotl# ls -lh
total 88K
-rw-r--r--  1 root root  648 Dec 28 07:07 FAQS.md
-rw-r--r--  1 root root  12K Dec 28 07:07 LICENSE
-rw-r--r--  1 root root  40K Dec 28 07:07 README.md
-rw-r--r--  1 root root  262 Dec 28 07:07 TODO.md
drwxr-xr-x  2 root root  103 Dec 29 01:58 deepspeed
drwxr-xr-x  2 root root   88 Dec 29 01:58 docker
-rw-r--r--  1 root root  701 Dec 28 07:07 docker-compose.yaml
drwxr-xr-x  2 root root   96 Dec 29 01:58 docs
drwxr-xr-x 20 root root 4.0K Dec 29 01:58 examples
drwxr-xr-x  2 root root   95 Dec 29 01:58 image
drwxr-xr-x  3 root root   54 Dec 29 02:02 last_run_prepared
drwxr-xr-x  7 root root  332 Dec 29 02:50 lora-out
-rw-r--r--  1 root root   22 Dec 28 07:07 requirements-dev.txt
-rw-r--r--  1 root root    7 Dec 28 07:07 requirements-tests.txt
-rw-r--r--  1 root root  552 Dec 28 07:07 requirements.txt
drwxr-xr-x  2 root root   65 Dec 29 01:58 scripts
-rw-r--r--  1 root root 1.8K Dec 28 07:07 setup.py
drwxr-xr-x  4 root root   57 Dec 29 01:58 src
drwxr-xr-x  5 root root 4.0K Dec 29 01:58 tests
root@8510995a57b3:/workspace/axolotl# ls ~/..cacache
ls: cannot access '/root/..cacache': No such file or directory
root@8510995a57b3:/workspace/axolotl# ls ~/.cache
conda  huggingface  matplotlib  pip
root@8510995a57b3:/workspace/axolotl# ls ~/.cache/huggingface/
datasets  hub
root@8510995a57b3:/workspace/axolotl# ls ~/.cache/huggingface/datasets
_root_.cache_huggingface_datasets_teknium___gpt4-llm-cleaned_default_0.0.0_b4e7d42750cbc1d81f9b85b98b13b48c88092adb.lock  teknium___gpt4-llm-cleaned
downloads
root@8510995a57b3:/workspace/axolotl# df -h
Filesystem              Size  Used Avail Use% Mounted on
overlay                  20G  6.9G   14G  35% /
tmpfs                    64M     0   64M   0% /dev
tmpfs                   252G     0  252G   0% /sys/fs/cgroup
shm                     251G     0  251G   0% /dev/shm
/dev/mapper/vg0-root     16G  5.4G  9.4G  37% /usr/bin/nvidia-smi
/dev/mapper/vg0-docker  100G  485M  100G   1% /workspace
tmpfs                   252G   12K  252G   1% /proc/driver/nvidia
tmpfs                   252G  4.0K  252G   1% /etc/nvidia/nvidia-application-profiles-rc.d
tmpfs                    51G   25M   51G   1% /run/nvidia-persistenced/socket
tmpfs                   252G     0  252G   0% /proc/asound
tmpfs                   252G     0  252G   0% /proc/acpi
tmpfs                   252G     0  252G   0% /proc/scsi
tmpfs                   252G     0  252G   0% /sys/firmware
root@8510995a57b3:/workspace/axolotl# ls ~/.cache/
conda  huggingface  matplotlib  pip
root@8510995a57b3:/workspace/axolotl# cp -r !$/huggingface /workspace/
cp -r ~/.cache//huggingface /workspace/
root@8510995a57b3:/workspace/axolotl# accelerate launch -m axolotl.cli.train examples/mistral/config.yml
The following values were not passed to `accelerate launch` and had defaults used instead:
	`--num_processes` was set to a value of `8`
		More than one GPU was found, enabling multi-GPU training.
		If this was unintended please pass in `--num_processes=1`.
	`--num_machines` was set to a value of `1`
	`--mixed_precision` was set to a value of `'no'`
	`--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:106: UserWarning:

================================================================================
WARNING: Manual override via BNB_CUDA_VERSION env variable detected!
BNB_CUDA_VERSION=XXX can be used to load a bitsandbytes version that is different from the PyTorch CUDA version.
If this was unintended set the BNB_CUDA_VERSION variable to an empty string: export BNB_CUDA_VERSION=
If you use the manual override make sure the right libcudart.so is in your LD_LIBRARY_PATH
For example by adding the following to your .bashrc: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path_to_cuda_dir/lib64
Loading CUDA version: BNB_CUDA_VERSION=118
================================================================================

  warn((f'\\n\\n{"="*80}\\n'
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:106: UserWarning:

================================================================================
WARNING: Manual override via BNB_CUDA_VERSION env variable detected!
BNB_CUDA_VERSION=XXX can be used to load a bitsandbytes version that is different from the PyTorch CUDA version.
If this was unintended set the BNB_CUDA_VERSION variable to an empty string: export BNB_CUDA_VERSION=
If you use the manual override make sure the right libcudart.so is in your LD_LIBRARY_PATH
For example by adding the following to your .bashrc: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path_to_cuda_dir/lib64
Loading CUDA version: BNB_CUDA_VERSION=118
================================================================================

  warn((f'\\n\\n{"="*80}\\n'
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:106: UserWarning:

================================================================================
WARNING: Manual override via BNB_CUDA_VERSION env variable detected!
BNB_CUDA_VERSION=XXX can be used to load a bitsandbytes version that is different from the PyTorch CUDA version.
If this was unintended set the BNB_CUDA_VERSION variable to an empty string: export BNB_CUDA_VERSION=
If you use the manual override make sure the right libcudart.so is in your LD_LIBRARY_PATH
For example by adding the following to your .bashrc: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path_to_cuda_dir/lib64
Loading CUDA version: BNB_CUDA_VERSION=118
================================================================================

  warn((f'\\n\\n{"="*80}\\n'
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:106: UserWarning:

================================================================================
WARNING: Manual override via BNB_CUDA_VERSION env variable detected!
BNB_CUDA_VERSION=XXX can be used to load a bitsandbytes version that is different from the PyTorch CUDA version.
If this was unintended set the BNB_CUDA_VERSION variable to an empty string: export BNB_CUDA_VERSION=
If you use the manual override make sure the right libcudart.so is in your LD_LIBRARY_PATH
For example by adding the following to your .bashrc: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path_to_cuda_dir/lib64
Loading CUDA version: BNB_CUDA_VERSION=118
================================================================================

  warn((f'\\n\\n{"="*80}\\n'
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:106: UserWarning:

================================================================================
WARNING: Manual override via BNB_CUDA_VERSION env variable detected!
BNB_CUDA_VERSION=XXX can be used to load a bitsandbytes version that is different from the PyTorch CUDA version.
If this was unintended set the BNB_CUDA_VERSION variable to an empty string: export BNB_CUDA_VERSION=
If you use the manual override make sure the right libcudart.so is in your LD_LIBRARY_PATH
For example by adding the following to your .bashrc: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path_to_cuda_dir/lib64
Loading CUDA version: BNB_CUDA_VERSION=118
================================================================================

  warn((f'\\n\\n{"="*80}\\n'
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:106: UserWarning:

================================================================================
WARNING: Manual override via BNB_CUDA_VERSION env variable detected!
BNB_CUDA_VERSION=XXX can be used to load a bitsandbytes version that is different from the PyTorch CUDA version.
If this was unintended set the BNB_CUDA_VERSION variable to an empty string: export BNB_CUDA_VERSION=
If you use the manual override make sure the right libcudart.so is in your LD_LIBRARY_PATH
For example by adding the following to your .bashrc: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path_to_cuda_dir/lib64
Loading CUDA version: BNB_CUDA_VERSION=118
================================================================================

  warn((f'\\n\\n{"="*80}\\n'
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:106: UserWarning:

================================================================================
WARNING: Manual override via BNB_CUDA_VERSION env variable detected!
BNB_CUDA_VERSION=XXX can be used to load a bitsandbytes version that is different from the PyTorch CUDA version.
If this was unintended set the BNB_CUDA_VERSION variable to an empty string: export BNB_CUDA_VERSION=
If you use the manual override make sure the right libcudart.so is in your LD_LIBRARY_PATH
For example by adding the following to your .bashrc: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path_to_cuda_dir/lib64
Loading CUDA version: BNB_CUDA_VERSION=118
================================================================================

  warn((f'\\n\\n{"="*80}\\n'
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:106: UserWarning:

================================================================================
WARNING: Manual override via BNB_CUDA_VERSION env variable detected!
BNB_CUDA_VERSION=XXX can be used to load a bitsandbytes version that is different from the PyTorch CUDA version.
If this was unintended set the BNB_CUDA_VERSION variable to an empty string: export BNB_CUDA_VERSION=
If you use the manual override make sure the right libcudart.so is in your LD_LIBRARY_PATH
For example by adding the following to your .bashrc: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path_to_cuda_dir/lib64
Loading CUDA version: BNB_CUDA_VERSION=118
================================================================================

  warn((f'\\n\\n{"="*80}\\n'
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
[2023-12-29 03:05:46,462] [INFO] [datasets.<module>:58] [PID:5112] PyTorch version 2.0.1+cu118 available.
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
[2023-12-29 03:05:46,491] [INFO] [datasets.<module>:58] [PID:5115] PyTorch version 2.0.1+cu118 available.
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
[2023-12-29 03:05:46,506] [INFO] [datasets.<module>:58] [PID:5110] PyTorch version 2.0.1+cu118 available.
[2023-12-29 03:05:46,528] [INFO] [datasets.<module>:58] [PID:5117] PyTorch version 2.0.1+cu118 available.
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
[2023-12-29 03:05:46,560] [INFO] [datasets.<module>:58] [PID:5114] PyTorch version 2.0.1+cu118 available.
[2023-12-29 03:05:46,566] [INFO] [datasets.<module>:58] [PID:5113] PyTorch version 2.0.1+cu118 available.
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
[2023-12-29 03:05:46,725] [INFO] [datasets.<module>:58] [PID:5116] PyTorch version 2.0.1+cu118 available.
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
[2023-12-29 03:05:46,815] [INFO] [datasets.<module>:58] [PID:5111] PyTorch version 2.0.1+cu118 available.
config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 571/571 [00:00<00:00, 141kB/s]
[2023-12-29 03:05:47,838] [INFO] [axolotl.normalize_config:150] [PID:5115] [RANK:5] GPU memory usage baseline: 0.000GB (+0.312GB misc)
[2023-12-29 03:05:47,853] [INFO] [axolotl.normalize_config:150] [PID:5110] [RANK:0] GPU memory usage baseline: 0.000GB (+0.312GB misc)
[2023-12-29 03:05:47,860] [INFO] [axolotl.normalize_config:150] [PID:5112] [RANK:2] GPU memory usage baseline: 0.000GB (+0.312GB misc)
[2023-12-29 03:05:47,874] [INFO] [axolotl.normalize_config:150] [PID:5117] [RANK:7] GPU memory usage baseline: 0.000GB (+0.312GB misc)
[2023-12-29 03:05:47,874] [INFO] [axolotl.normalize_config:150] [PID:5113] [RANK:3] GPU memory usage baseline: 0.000GB (+0.312GB misc)
[2023-12-29 03:05:47,877] [INFO] [axolotl.normalize_config:150] [PID:5114] [RANK:4] GPU memory usage baseline: 0.000GB (+0.312GB misc)
[2023-12-29 03:05:48,239] [INFO] [axolotl.normalize_config:150] [PID:5116] [RANK:6] GPU memory usage baseline: 0.000GB (+0.312GB misc)
[2023-12-29 03:05:48,274] [INFO] [axolotl.normalize_config:150] [PID:5111] [RANK:1] GPU memory usage baseline: 0.000GB (+0.312GB misc)
[2023-12-29 03:05:48,277] [WARNING] [axolotl.scripts.check_user_token:342] [PID:5111] [RANK:1] Error verifying HuggingFace token. Remember to log in using `huggingface-cli login` and get your access token from <https://huggingface.co/settings/tokens> if you want to use gated models or datasets.
[2023-12-29 03:05:48,280] [WARNING] [axolotl.scripts.check_user_token:342] [PID:5114] [RANK:4] Error verifying HuggingFace token. Remember to log in using `huggingface-cli login` and get your access token from <https://huggingface.co/settings/tokens> if you want to use gated models or datasets.
[2023-12-29 03:05:48,280] [WARNING] [axolotl.scripts.check_user_token:342] [PID:5115] [RANK:5] Error verifying HuggingFace token. Remember to log in using `huggingface-cli login` and get your access token from <https://huggingface.co/settings/tokens> if you want to use gated models or datasets.
                                 dP            dP   dP
                                 88            88   88
      .d8888b. dP.  .dP .d8888b. 88 .d8888b. d8888P 88
      88'  `88  `8bd8'  88'  `88 88 88'  `88   88   88
      88.  .88  .d88b.  88.  .88 88 88.  .88   88   88
      `88888P8 dP'  `dP `88888P' dP `88888P'   dP   dP

[2023-12-29 03:05:48,285] [WARNING] [axolotl.scripts.check_user_token:342] [PID:5112] [RANK:2] Error verifying HuggingFace token. Remember to log in using `huggingface-cli login` and get your access token from <https://huggingface.co/settings/tokens> if you want to use gated models or datasets.
[2023-12-29 03:05:48,285] [WARNING] [axolotl.scripts.check_user_token:342] [PID:5116] [RANK:6] Error verifying HuggingFace token. Remember to log in using `huggingface-cli login` and get your access token from <https://huggingface.co/settings/tokens> if you want to use gated models or datasets.
[2023-12-29 03:05:48,285] [WARNING] [axolotl.scripts.check_user_token:342] [PID:5110] [RANK:0] Error verifying HuggingFace token. Remember to log in using `huggingface-cli login` and get your access token from <https://huggingface.co/settings/tokens> if you want to use gated models or datasets.
[2023-12-29 03:05:48,286] [WARNING] [axolotl.scripts.check_user_token:342] [PID:5113] [RANK:3] Error verifying HuggingFace token. Remember to log in using `huggingface-cli login` and get your access token from <https://huggingface.co/settings/tokens> if you want to use gated models or datasets.
[2023-12-29 03:05:48,287] [WARNING] [axolotl.scripts.check_user_token:342] [PID:5117] [RANK:7] Error verifying HuggingFace token. Remember to log in using `huggingface-cli login` and get your access token from <https://huggingface.co/settings/tokens> if you want to use gated models or datasets.
tokenizer_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 967/967 [00:00<00:00, 172kB/s]
tokenizer.model: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 493k/493k [00:00<00:00, 51.8MB/s]
special_tokens_map.json: 100%|████████████████████████████████████████████████████████████████████████████████████████| 72.0/72.0 [00:00<00:00, 91.1kB/s]
tokenizer.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 1.80M/1.80M [00:00<00:00, 14.8MB/s]
[2023-12-29 03:05:49,783] [DEBUG] [axolotl.load_tokenizer:184] [PID:5114] [RANK:4] EOS: 2 / </s>
[2023-12-29 03:05:49,783] [DEBUG] [axolotl.load_tokenizer:185] [PID:5114] [RANK:4] BOS: 1 / <s>
[2023-12-29 03:05:49,783] [DEBUG] [axolotl.load_tokenizer:186] [PID:5114] [RANK:4] PAD: 2 / </s>
[2023-12-29 03:05:49,783] [DEBUG] [axolotl.load_tokenizer:187] [PID:5114] [RANK:4] UNK: 0 / <unk>
[2023-12-29 03:05:49,799] [DEBUG] [axolotl.load_tokenizer:184] [PID:5113] [RANK:3] EOS: 2 / </s>
[2023-12-29 03:05:49,799] [DEBUG] [axolotl.load_tokenizer:185] [PID:5113] [RANK:3] BOS: 1 / <s>
[2023-12-29 03:05:49,799] [DEBUG] [axolotl.load_tokenizer:186] [PID:5113] [RANK:3] PAD: 2 / </s>
[2023-12-29 03:05:49,799] [DEBUG] [axolotl.load_tokenizer:187] [PID:5113] [RANK:3] UNK: 0 / <unk>
[2023-12-29 03:05:49,805] [DEBUG] [axolotl.load_tokenizer:184] [PID:5115] [RANK:5] EOS: 2 / </s>
[2023-12-29 03:05:49,805] [DEBUG] [axolotl.load_tokenizer:185] [PID:5115] [RANK:5] BOS: 1 / <s>
[2023-12-29 03:05:49,805] [DEBUG] [axolotl.load_tokenizer:186] [PID:5115] [RANK:5] PAD: 2 / </s>
[2023-12-29 03:05:49,805] [DEBUG] [axolotl.load_tokenizer:187] [PID:5115] [RANK:5] UNK: 0 / <unk>
[2023-12-29 03:05:49,807] [DEBUG] [axolotl.load_tokenizer:184] [PID:5110] [RANK:0] EOS: 2 / </s>
[2023-12-29 03:05:49,807] [DEBUG] [axolotl.load_tokenizer:185] [PID:5110] [RANK:0] BOS: 1 / <s>
[2023-12-29 03:05:49,807] [DEBUG] [axolotl.load_tokenizer:186] [PID:5110] [RANK:0] PAD: 2 / </s>
[2023-12-29 03:05:49,807] [DEBUG] [axolotl.load_tokenizer:184] [PID:5112] [RANK:2] EOS: 2 / </s>
[2023-12-29 03:05:49,807] [DEBUG] [axolotl.load_tokenizer:185] [PID:5112] [RANK:2] BOS: 1 / <s>
[2023-12-29 03:05:49,807] [DEBUG] [axolotl.load_tokenizer:187] [PID:5110] [RANK:0] UNK: 0 / <unk>
[2023-12-29 03:05:49,807] [DEBUG] [axolotl.load_tokenizer:186] [PID:5112] [RANK:2] PAD: 2 / </s>
[2023-12-29 03:05:49,807] [DEBUG] [axolotl.load_tokenizer:187] [PID:5112] [RANK:2] UNK: 0 / <unk>
[2023-12-29 03:05:49,807] [INFO] [axolotl.load_tokenized_prepared_datasets:147] [PID:5110] [RANK:0] Unable to find prepared dataset in last_run_prepared/a9d1db4e773009ed6598f0611beb6e88
[2023-12-29 03:05:49,807] [INFO] [axolotl.load_tokenized_prepared_datasets:148] [PID:5110] [RANK:0] Loading raw datasets...
[2023-12-29 03:05:49,807] [INFO] [axolotl.load_tokenized_prepared_datasets:153] [PID:5110] [RANK:0] No seed provided, using default seed of 42
[2023-12-29 03:05:49,812] [DEBUG] [axolotl.load_tokenizer:184] [PID:5117] [RANK:7] EOS: 2 / </s>
[2023-12-29 03:05:49,812] [DEBUG] [axolotl.load_tokenizer:185] [PID:5117] [RANK:7] BOS: 1 / <s>
[2023-12-29 03:05:49,812] [DEBUG] [axolotl.load_tokenizer:186] [PID:5117] [RANK:7] PAD: 2 / </s>
[2023-12-29 03:05:49,812] [DEBUG] [axolotl.load_tokenizer:187] [PID:5117] [RANK:7] UNK: 0 / <unk>
[2023-12-29 03:05:49,812] [DEBUG] [axolotl.load_tokenizer:184] [PID:5116] [RANK:6] EOS: 2 / </s>
[2023-12-29 03:05:49,813] [DEBUG] [axolotl.load_tokenizer:185] [PID:5116] [RANK:6] BOS: 1 / <s>
[2023-12-29 03:05:49,813] [DEBUG] [axolotl.load_tokenizer:186] [PID:5116] [RANK:6] PAD: 2 / </s>
[2023-12-29 03:05:49,813] [DEBUG] [axolotl.load_tokenizer:187] [PID:5116] [RANK:6] UNK: 0 / <unk>
[2023-12-29 03:05:49,837] [DEBUG] [axolotl.load_tokenizer:184] [PID:5111] [RANK:1] EOS: 2 / </s>
[2023-12-29 03:05:49,837] [DEBUG] [axolotl.load_tokenizer:185] [PID:5111] [RANK:1] BOS: 1 / <s>
[2023-12-29 03:05:49,837] [DEBUG] [axolotl.load_tokenizer:186] [PID:5111] [RANK:1] PAD: 2 / </s>
[2023-12-29 03:05:49,837] [DEBUG] [axolotl.load_tokenizer:187] [PID:5111] [RANK:1] UNK: 0 / <unk>
Downloading readme: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 28.0/28.0 [00:00<00:00, 173kB/s]
Downloading data: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 1.76M/1.76M [00:01<00:00, 1.14MB/s]
Generating train split: 2000 examples [00:00, 99498.37 examples/s]
Map (num_proc=64): 100%|████████████████████████████████████████████████████████████████████████████████████| 2000/2000 [00:00<00:00, 3086.84 examples/s]
[2023-12-29 03:05:58,120] [INFO] [axolotl.load_tokenized_prepared_datasets:362] [PID:5110] [RANK:0] merging datasets
[2023-12-29 03:05:58,126] [INFO] [axolotl.load_tokenized_prepared_datasets:369] [PID:5110] [RANK:0] Saving merged prepared dataset to disk... last_run_prepared/a9d1db4e773009ed6598f0611beb6e88
Saving the dataset (1/1 shards): 100%|████████████████████████████████████████████████████████████████████| 2000/2000 [00:00<00:00, 109410.44 examples/s]
[2023-12-29 03:05:59,853] [INFO] [axolotl.load_tokenized_prepared_datasets:147] [PID:5114] [RANK:4] Unable to find prepared dataset in last_run_prepared/a9d1db4e773009ed6598f0611beb6e88
[2023-12-29 03:05:59,853] [INFO] [axolotl.load_tokenized_prepared_datasets:147] [PID:5115] [RANK:5] Unable to find prepared dataset in last_run_prepared/a9d1db4e773009ed6598f0611beb6e88
[2023-12-29 03:05:59,854] [INFO] [axolotl.load_tokenized_prepared_datasets:148] [PID:5114] [RANK:4] Loading raw datasets...
[2023-12-29 03:05:59,854] [INFO] [axolotl.load_tokenized_prepared_datasets:148] [PID:5115] [RANK:5] Loading raw datasets...
[2023-12-29 03:05:59,854] [INFO] [axolotl.load_tokenized_prepared_datasets:153] [PID:5114] [RANK:4] No seed provided, using default seed of 42
[2023-12-29 03:05:59,854] [INFO] [axolotl.load_tokenized_prepared_datasets:153] [PID:5115] [RANK:5] No seed provided, using default seed of 42
[2023-12-29 03:05:59,853] [INFO] [axolotl.load_tokenized_prepared_datasets:147] [PID:5112] [RANK:2] Unable to find prepared dataset in last_run_prepared/a9d1db4e773009ed6598f0611beb6e88
[2023-12-29 03:05:59,854] [INFO] [axolotl.load_tokenized_prepared_datasets:148] [PID:5112] [RANK:2] Loading raw datasets...
[2023-12-29 03:05:59,854] [INFO] [axolotl.load_tokenized_prepared_datasets:153] [PID:5112] [RANK:2] No seed provided, using default seed of 42
[2023-12-29 03:05:59,854] [INFO] [axolotl.load_tokenized_prepared_datasets:147] [PID:5113] [RANK:3] Unable to find prepared dataset in last_run_prepared/a9d1db4e773009ed6598f0611beb6e88
[2023-12-29 03:05:59,854] [INFO] [axolotl.load_tokenized_prepared_datasets:148] [PID:5113] [RANK:3] Loading raw datasets...
[2023-12-29 03:05:59,854] [INFO] [axolotl.load_tokenized_prepared_datasets:153] [PID:5113] [RANK:3] No seed provided, using default seed of 42
[2023-12-29 03:05:59,854] [INFO] [axolotl.load_tokenized_prepared_datasets:147] [PID:5111] [RANK:1] Unable to find prepared dataset in last_run_prepared/a9d1db4e773009ed6598f0611beb6e88
[2023-12-29 03:05:59,854] [INFO] [axolotl.load_tokenized_prepared_datasets:147] [PID:5117] [RANK:7] Unable to find prepared dataset in last_run_prepared/a9d1db4e773009ed6598f0611beb6e88
[2023-12-29 03:05:59,855] [INFO] [axolotl.load_tokenized_prepared_datasets:148] [PID:5111] [RANK:1] Loading raw datasets...
[2023-12-29 03:05:59,855] [INFO] [axolotl.load_tokenized_prepared_datasets:148] [PID:5117] [RANK:7] Loading raw datasets...
[2023-12-29 03:05:59,855] [INFO] [axolotl.load_tokenized_prepared_datasets:153] [PID:5111] [RANK:1] No seed provided, using default seed of 42
[2023-12-29 03:05:59,855] [INFO] [axolotl.load_tokenized_prepared_datasets:153] [PID:5117] [RANK:7] No seed provided, using default seed of 42
[2023-12-29 03:05:59,854] [INFO] [axolotl.load_tokenized_prepared_datasets:147] [PID:5116] [RANK:6] Unable to find prepared dataset in last_run_prepared/a9d1db4e773009ed6598f0611beb6e88
[2023-12-29 03:05:59,855] [INFO] [axolotl.load_tokenized_prepared_datasets:148] [PID:5116] [RANK:6] Loading raw datasets...
[2023-12-29 03:05:59,855] [INFO] [axolotl.load_tokenized_prepared_datasets:153] [PID:5116] [RANK:6] No seed provided, using default seed of 42
Filter (num_proc=96):  46%|█████████████████████████████████████▉                                            | 880/1900 [00:00<00:00, 3558.07 examples/s]
[2023-12-29 03:06:03,178] [INFO] [axolotl.load_tokenized_prepared_datasets:362] [PID:5114] [RANK:4] merging datasets
[2023-12-29 03:06:03,222] [INFO] [axolotl.load_tokenized_prepared_datasets:362] [PID:5117] [RANK:7] merging datasets
[2023-12-29 03:06:03,229] [INFO] [axolotl.load_tokenized_prepared_datasets:362] [PID:5116] [RANK:6] merging datasets
[2023-12-29 03:06:03,235] [INFO] [axolotl.load_tokenized_prepared_datasets:362] [PID:5115] [RANK:5] merging datasets
[2023-12-29 03:06:03,271] [INFO] [axolotl.load_tokenized_prepared_datasets:362] [PID:5111] [RANK:1] merging datasets
Filter (num_proc=96): 100%|█████████████████████████████████████████████████████████████████████████████████| 1900/1900 [00:00<00:00, 3799.02 examples/s]
[2023-12-29 03:06:03,538] [INFO] [axolotl.load_tokenized_prepared_datasets:362] [PID:5112] [RANK:2] merging datasets
[2023-12-29 03:06:03,746] [INFO] [axolotl.load_tokenized_prepared_datasets:362] [PID:5113] [RANK:3] merging datasets
Filter (num_proc=96): 100%|████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 209.73 examples/s]
Map (num_proc=96): 100%|████████████████████████████████████████████████████████████████████████████████████| 1900/1900 [00:01<00:00, 1753.75 examples/s]
[2023-12-29 03:06:18,321] [DEBUG] [axolotl.log:60] [PID:5110] [RANK:0] total_num_tokens: 405259
[2023-12-29 03:06:18,336] [DEBUG] [axolotl.log:60] [PID:5110] [RANK:0] `total_supervised_tokens: 282059`
[2023-12-29 03:06:24,148] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:5110] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 50657
[2023-12-29 03:06:24,149] [DEBUG] [axolotl.log:60] [PID:5110] [RANK:0] data_loader_len: 23
[2023-12-29 03:06:24,621] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:5112] [RANK:2] packing_efficiency_estimate: 1.0 total_num_tokens per device: 50657
[2023-12-29 03:06:24,634] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:5114] [RANK:4] packing_efficiency_estimate: 1.0 total_num_tokens per device: 50657
[2023-12-29 03:06:24,656] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:5111] [RANK:1] packing_efficiency_estimate: 1.0 total_num_tokens per device: 50657
[2023-12-29 03:06:24,749] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:5113] [RANK:3] packing_efficiency_estimate: 1.0 total_num_tokens per device: 50657
[2023-12-29 03:06:24,773] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:5115] [RANK:5] packing_efficiency_estimate: 1.0 total_num_tokens per device: 50657
[2023-12-29 03:06:24,792] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:5117] [RANK:7] packing_efficiency_estimate: 1.0 total_num_tokens per device: 50657
[2023-12-29 03:06:24,995] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:5116] [RANK:6] packing_efficiency_estimate: 1.0 total_num_tokens per device: 50657
[2023-12-29 03:06:25,057] [INFO] [axolotl.log:60] [PID:5110] [RANK:0] sample_packing_eff_est across ranks: [0.9894018769264221, 0.9894018769264221, 0.9894018769264221, 0.9894018769264221, 0.9894018769264221, 0.9894018769264221, 0.9513479471206665, 0.9894018769264221]
[2023-12-29 03:06:25,061] [DEBUG] [axolotl.log:60] [PID:5110] [RANK:0] sample_packing_eff_est: 0.99
[2023-12-29 03:06:25,062] [DEBUG] [axolotl.log:60] [PID:5110] [RANK:0] total_num_steps: 11
[2023-12-29 03:06:25,075] [DEBUG] [axolotl.train.log:60] [PID:5110] [RANK:0] loading tokenizer... mistralai/Mistral-7B-v0.1
[2023-12-29 03:06:25,398] [DEBUG] [axolotl.load_tokenizer:184] [PID:5114] [RANK:4] EOS: 2 / </s>
[2023-12-29 03:06:25,398] [DEBUG] [axolotl.load_tokenizer:185] [PID:5114] [RANK:4] BOS: 1 / <s>
[2023-12-29 03:06:25,399] [DEBUG] [axolotl.load_tokenizer:186] [PID:5114] [RANK:4] PAD: 2 / </s>
[2023-12-29 03:06:25,399] [DEBUG] [axolotl.load_tokenizer:187] [PID:5114] [RANK:4] UNK: 0 / <unk>
[2023-12-29 03:06:25,406] [DEBUG] [axolotl.load_tokenizer:184] [PID:5115] [RANK:5] EOS: 2 / </s>
[2023-12-29 03:06:25,406] [DEBUG] [axolotl.load_tokenizer:185] [PID:5115] [RANK:5] BOS: 1 / <s>
[2023-12-29 03:06:25,406] [DEBUG] [axolotl.load_tokenizer:186] [PID:5115] [RANK:5] PAD: 2 / </s>
[2023-12-29 03:06:25,406] [DEBUG] [axolotl.load_tokenizer:187] [PID:5115] [RANK:5] UNK: 0 / <unk>
[2023-12-29 03:06:25,407] [DEBUG] [axolotl.load_tokenizer:184] [PID:5117] [RANK:7] EOS: 2 / </s>
[2023-12-29 03:06:25,407] [DEBUG] [axolotl.load_tokenizer:185] [PID:5117] [RANK:7] BOS: 1 / <s>
[2023-12-29 03:06:25,407] [DEBUG] [axolotl.load_tokenizer:186] [PID:5117] [RANK:7] PAD: 2 / </s>
[2023-12-29 03:06:25,407] [DEBUG] [axolotl.load_tokenizer:187] [PID:5117] [RANK:7] UNK: 0 / <unk>
[2023-12-29 03:06:25,409] [DEBUG] [axolotl.load_tokenizer:184] [PID:5112] [RANK:2] EOS: 2 / </s>
[2023-12-29 03:06:25,409] [DEBUG] [axolotl.load_tokenizer:185] [PID:5112] [RANK:2] BOS: 1 / <s>
[2023-12-29 03:06:25,409] [DEBUG] [axolotl.load_tokenizer:186] [PID:5112] [RANK:2] PAD: 2 / </s>
[2023-12-29 03:06:25,409] [DEBUG] [axolotl.load_tokenizer:187] [PID:5112] [RANK:2] UNK: 0 / <unk>
[2023-12-29 03:06:25,411] [DEBUG] [axolotl.load_tokenizer:184] [PID:5116] [RANK:6] EOS: 2 / </s>
[2023-12-29 03:06:25,411] [DEBUG] [axolotl.load_tokenizer:185] [PID:5116] [RANK:6] BOS: 1 / <s>
[2023-12-29 03:06:25,411] [DEBUG] [axolotl.load_tokenizer:186] [PID:5116] [RANK:6] PAD: 2 / </s>
[2023-12-29 03:06:25,411] [DEBUG] [axolotl.load_tokenizer:187] [PID:5116] [RANK:6] UNK: 0 / <unk>
[2023-12-29 03:06:25,423] [DEBUG] [axolotl.load_tokenizer:184] [PID:5113] [RANK:3] EOS: 2 / </s>
[2023-12-29 03:06:25,423] [DEBUG] [axolotl.load_tokenizer:185] [PID:5113] [RANK:3] BOS: 1 / <s>
[2023-12-29 03:06:25,423] [DEBUG] [axolotl.load_tokenizer:186] [PID:5113] [RANK:3] PAD: 2 / </s>
[2023-12-29 03:06:25,423] [DEBUG] [axolotl.load_tokenizer:187] [PID:5113] [RANK:3] UNK: 0 / <unk>
[2023-12-29 03:06:25,425] [DEBUG] [axolotl.load_tokenizer:184] [PID:5111] [RANK:1] EOS: 2 / </s>
[2023-12-29 03:06:25,426] [DEBUG] [axolotl.load_tokenizer:185] [PID:5111] [RANK:1] BOS: 1 / <s>
[2023-12-29 03:06:25,426] [DEBUG] [axolotl.load_tokenizer:186] [PID:5111] [RANK:1] PAD: 2 / </s>
[2023-12-29 03:06:25,426] [DEBUG] [axolotl.load_tokenizer:187] [PID:5111] [RANK:1] UNK: 0 / <unk>
[2023-12-29 03:06:25,432] [DEBUG] [axolotl.load_tokenizer:184] [PID:5110] [RANK:0] EOS: 2 / </s>
[2023-12-29 03:06:25,432] [DEBUG] [axolotl.load_tokenizer:185] [PID:5110] [RANK:0] BOS: 1 / <s>
[2023-12-29 03:06:25,432] [DEBUG] [axolotl.load_tokenizer:186] [PID:5110] [RANK:0] PAD: 2 / </s>
[2023-12-29 03:06:25,432] [DEBUG] [axolotl.load_tokenizer:187] [PID:5110] [RANK:0] UNK: 0 / <unk>
[2023-12-29 03:06:25,432] [DEBUG] [axolotl.train.log:60] [PID:5110] [RANK:0] loading model
[2023-12-29 03:06:25,545] [INFO] [axolotl.load_model:256] [PID:5114] [RANK:4] patching with flash attention
[2023-12-29 03:06:25,545] [INFO] [axolotl.load_model:256] [PID:5115] [RANK:5] patching with flash attention
[2023-12-29 03:06:25,548] [INFO] [axolotl.load_model:256] [PID:5112] [RANK:2] patching with flash attention
[2023-12-29 03:06:25,556] [INFO] [axolotl.load_model:256] [PID:5116] [RANK:6] patching with flash attention
[2023-12-29 03:06:25,560] [INFO] [axolotl.load_model:256] [PID:5113] [RANK:3] patching with flash attention
[2023-12-29 03:06:25,569] [INFO] [axolotl.load_model:256] [PID:5111] [RANK:1] patching with flash attention
[2023-12-29 03:06:25,577] [INFO] [axolotl.load_model:256] [PID:5110] [RANK:0] patching with flash attention
[2023-12-29 03:06:25,767] [INFO] [axolotl.load_model:256] [PID:5117] [RANK:7] patching with flash attention
model.safetensors.index.json: 100%|█████████████████████████████████████████████████████████████████████████████████| 25.1k/25.1k [00:00<00:00, 4.31MB/s]
Downloading shards:   0%|                                                                                                          | 0/2 [00:00<?, ?it/s]
model-00001-of-00002.safetensors:   1%|▌                                                                            | 73.4M/9.94G [00:01<03:01, 54.2MB/s]