Configs and Args
This section covers the configuration settings and argument parsing for the training scripts.
config.py
Description:
The file /AttentionLens/attention_lens/train/config.py defines a TrainConfig data class, which holds the default configuration settings for training.
Variables:
lr: Learning rate for the optimizer.
epochs: Number of complete passes through the training set.
max_checkpoint_num: Maximum number of checkpoint files to keep.
batch_size: Number of samples processed before the model is updated.
num_nodes: Number of nodes to use in distributed training.
mixed_precision: Boolean flag to indicate if mixed precision training should be used.
checkpoint_mode: Mode to determine when to save checkpoints, either after a certain number of steps (step) or based on training loss (loss).
num_steps_per_checkpoint: Number of steps between checkpoints when checkpoint_mode is set to step.
checkpoint_dir: Directory where checkpoint files are saved.
accumulate_grad_batches: Number of steps to accumulate gradients before updating model parameters.
reload_checkpoint: Path to a checkpoint file to resume training from.
stopping_delta: Minimum change in loss to qualify as an improvement for early stopping.
stopping_patience: Number of checks with no improvement after which training is stopped.
model_name: Name of the model architecture to use.
layer_number: Specific layer number to start training from.
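For orientation, a minimal sketch of what this data class might look like is shown below. The field names are taken from the list above, but the types and default values are illustrative assumptions, not the project's actual defaults.

from dataclasses import dataclass
from typing import Optional

@dataclass
class TrainConfig:
    # Optimization (defaults shown here are illustrative only)
    lr: float = 1e-4                         # learning rate for the optimizer
    epochs: int = 1                          # complete passes through the training set
    batch_size: int = 4                      # samples processed per model update
    accumulate_grad_batches: int = 1         # steps to accumulate gradients before updating
    # Checkpointing
    max_checkpoint_num: int = 5              # maximum number of checkpoint files to keep
    checkpoint_mode: str = "step"            # "step" or "loss"
    num_steps_per_checkpoint: int = 1000     # steps between checkpoints in "step" mode
    checkpoint_dir: str = "checkpoints"      # where checkpoint files are saved
    reload_checkpoint: Optional[str] = None  # checkpoint to resume training from
    # Early stopping
    stopping_delta: float = 1e-4             # minimum loss change counted as improvement
    stopping_patience: int = 3               # checks with no improvement before stopping
    # Hardware and model
    num_nodes: int = 1                       # nodes for distributed training
    mixed_precision: bool = True             # use mixed precision training
    model_name: str = "gpt2"                 # model architecture to use
    layer_number: int = 0                    # layer number to start training from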
Usage:
config = TrainConfig()
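Because TrainConfig is a data class, individual settings can also be overridden at construction time (the values below are illustrative):
config = TrainConfig(lr=1e-3, batch_size=8, checkpoint_mode="loss")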
load_args.py
Description:
The file /AttentionLens/load_args.py uses the argparse module to parse command-line arguments. The parameters are the same as those defined in config.py. Note that arguments passed on the command line override the defaults set in config.py and lightning_lens.py.
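The sketch below shows one way such a parser might be structured, assuming the package is importable as attention_lens and that the CLI defaults are pulled from TrainConfig. The function name and the exact set of flags shown are illustrative, not the file's actual contents.

import argparse

from attention_lens.train.config import TrainConfig


def load_args() -> argparse.Namespace:
    # Use TrainConfig so the CLI defaults stay in sync with config.py.
    defaults = TrainConfig()
    parser = argparse.ArgumentParser(description="Train Attention Lens")
    parser.add_argument("--lr", type=float, default=defaults.lr)
    parser.add_argument("--batch_size", type=int, default=defaults.batch_size)
    parser.add_argument("--checkpoint_mode", choices=["step", "loss"],
                        default=defaults.checkpoint_mode)
    parser.add_argument("--reload_checkpoint", type=str,
                        default=defaults.reload_checkpoint)
    # The remaining TrainConfig fields would be exposed in the same way.
    return parser.parse_args()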
Usage:
To modify hyperparameters when training Attention Lens, pass the corresponding command-line flag prefixed with --. For example:
python Train.py --lr 1e-3
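Assuming the flag names match the field names in config.py, several hyperparameters can be overridden in a single invocation, for example:
python Train.py --lr 1e-3 --batch_size 8 --checkpoint_mode loss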