Configs and Args
This section covers the configuration settings and argument parsing for the training scripts.
config.py
Description:
The file /AttentionLens/attention_lens/train/config.py defines a TrainConfig data class, which holds the default configuration settings for training.
Variables:
lr: Learning rate for the optimizer.
epochs: Number of complete passes through the training set.
max_checkpoint_num: Maximum number of checkpoint files to keep.
batch_size: Number of samples processed before the model is updated.
num_nodes: Number of nodes to use in distributed training.
mixed_precision: Boolean flag indicating whether mixed precision training should be used.
checkpoint_mode: Mode that determines when to save checkpoints, either after a fixed number of steps (step) or based on training loss (loss).
num_steps_per_checkpoint: Number of steps between checkpoints when checkpoint_mode is set to step.
checkpoint_dir: Directory where checkpoint files are saved.
accumulate_grad_batches: Number of steps over which gradients are accumulated before updating model parameters.
reload_checkpoint: Path to a checkpoint file from which to resume training.
stopping_delta: Minimum change in loss that qualifies as an improvement for early stopping.
stopping_patience: Number of checks with no improvement after which training is stopped.
model_name: Name of the model architecture to use.
layer_number: Specific layer number to start training from.
Usage:
config = TrainConfig()
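The snippet below is a minimal sketch of this usage. It assumes TrainConfig is importable from attention_lens.train.config and, being a data class, accepts per-field keyword overrides; the specific values shown are illustrative, not the actual defaults.

from attention_lens.train.config import TrainConfig

# Start from the defaults defined in config.py.
config = TrainConfig()

# Override individual fields via keyword arguments (illustrative values).
config = TrainConfig(lr=1e-3, batch_size=8, checkpoint_mode="loss")
print(config.lr, config.batch_size, config.checkpoint_mode)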
load_args.py
Description:
The file /AttentionLens/load_args.py uses the argparse module to parse command-line arguments. The parameters are the same as those defined in config.py. Note that any value passed on the command line overrides the corresponding default set in config.py and lightning_lens.py.
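As a rough sketch of what this parsing looks like, assuming the flag names mirror the TrainConfig fields (the real load_args.py may define additional options and different defaults):

import argparse

def load_args():
    # Each flag mirrors a field of TrainConfig; defaults shown are illustrative.
    parser = argparse.ArgumentParser(description="Attention Lens training arguments")
    parser.add_argument("--lr", type=float, default=1e-3, help="Learning rate for the optimizer.")
    parser.add_argument("--batch_size", type=int, default=8, help="Samples processed before the model is updated.")
    parser.add_argument("--checkpoint_mode", choices=["step", "loss"], default="step", help="When to save checkpoints.")
    parser.add_argument("--layer_number", type=int, default=0, help="Specific layer number to start training from.")
    return parser.parse_args()

args = load_args()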
Usage:
To modify hyperparameters during the training of Attention Lens, pass the corresponding -- flag on the command line. For example:
python Train.py --lr 1e-3
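Multiple flags can be combined in a single invocation; assuming the flag names match the fields listed above, a run might look like:
python Train.py --lr 1e-3 --batch_size 8 --checkpoint_mode loss --layer_number 5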