lightning.fabric.plugins.environments.lsf.LSFEnvironment
- class lightning.fabric.plugins.environments.lsf.LSFEnvironment
Bases: ClusterEnvironment
An environment for running on clusters managed by the LSF resource manager.
Any execution using this ClusterEnvironment is expected to have been launched with the Job Step Manager, i.e. jsrun.
This plugin expects the following environment variables:
LSB_JOBID: The LSF-assigned job ID.
LSB_DJOB_RANKFILE: The OpenMPI-compatible rank file for the LSF job.
JSM_NAMESPACE_LOCAL_RANK: The node-local rank for the task. This environment variable is set by jsrun.
JSM_NAMESPACE_SIZE: The world size for the task. This environment variable is set by jsrun.
JSM_NAMESPACE_RANK: The global rank for the task. This environment variable is set by jsrun.
- _get_main_address()
A helper for getting the main address.
The main address is assigned to the first node in the list of nodes used for the job.
- Return type:
- static _get_main_port()
A helper function for accessing the main port.
Uses the LSF job ID so all ranks can compute the main port.
- Return type:
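Because every rank sees the same LSB_JOBID, each one can derive the same port without any communication. The following is a minimal sketch of that idea. The function name `port_from_job_id`, the base port, and the span are illustrative assumptions, not necessarily the formula the library uses.

```python
import os


def port_from_job_id(base: int = 10000, span: int = 1000) -> int:
    """Map LSB_JOBID to a port in [base, base + span).

    `base` and `span` are hypothetical values chosen for this sketch.
    """
    job_id = int(os.environ["LSB_JOBID"])
    # The job ID is identical on all ranks, so every rank gets the same port.
    return base + job_id % span
```

Two jobs whose IDs differ by a multiple of `span` would map to the same port, so a scheme like this trades uniqueness for determinism.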
- _get_node_rank()
A helper method for getting the node rank.
The node rank is determined by the position of the current node in the list of hosts used in the job. This is calculated by reading all hosts from LSB_DJOB_RANKFILE and finding this node’s hostname in the list.
- Return type:
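The lookup described above can be sketched as follows. The helper name `node_rank_from_hosts` is hypothetical, and the sketch assumes that a host may appear several times in the list (once per task slot), so repeats are collapsed before the position is taken.

```python
import socket
from typing import List, Optional


def node_rank_from_hosts(hosts: List[str], hostname: Optional[str] = None) -> int:
    """Return the index of `hostname` among the distinct hosts, in first-seen order."""
    if hostname is None:
        hostname = socket.gethostname()
    # Assumption: hosts can repeat, so keep only the first occurrence of each.
    unique_hosts = list(dict.fromkeys(hosts))
    # Raises ValueError if this node is not part of the job's host list.
    return unique_hosts.index(hostname)
```

For example, with hosts `["a", "a", "b", "b", "c"]`, the node named `b` would get node rank 1.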
- static _read_hosts()
Read compute hosts that are a part of the compute job.
LSF uses the Job Step Manager (JSM) to manage job steps. Job steps are executed by the JSM from “launch” nodes. Each job is assigned a launch node. This launch node will be the first node in the list contained in LSB_DJOB_RANKFILE.
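A rough sketch of reading the rank file is shown here. The function name `read_compute_hosts` is hypothetical, and dropping the first entry is an assumption based on the statement that the launch node comes first in the file. The library's own implementation may differ in details such as error handling.

```python
import os
from typing import List


def read_compute_hosts(env_var: str = "LSB_DJOB_RANKFILE") -> List[str]:
    """Read hostnames from the rank file, excluding the launch node."""
    rankfile = os.environ.get(env_var)
    if not rankfile:
        raise ValueError(f"Environment variable {env_var} is missing or empty")
    with open(rankfile) as f:
        # One hostname per line; blank lines are skipped.
        hosts = [line.strip() for line in f if line.strip()]
    # Assumption: the first entry is the launch node, not a compute host.
    return hosts[1:]
```

Under this assumption, the first entry of the returned list would be the natural candidate for the main address.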
- static detect()
Returns True if the current process was launched using the jsrun command.
- Return type:
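One plausible way to perform such a check is to test for the environment variables this plugin expects. The sketch below assumes that all five documented variables must be present. The actual set checked by the library may be smaller, and the name `launched_with_jsrun` is hypothetical.

```python
import os
from typing import Mapping, Optional

# Names taken from the variables documented for this plugin.
# Treating all five as required is an assumption of this sketch.
REQUIRED_VARS = {
    "LSB_JOBID",
    "LSB_DJOB_RANKFILE",
    "JSM_NAMESPACE_LOCAL_RANK",
    "JSM_NAMESPACE_SIZE",
    "JSM_NAMESPACE_RANK",
}


def launched_with_jsrun(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if every expected variable is present in the environment."""
    env = os.environ if environ is None else environ
    return REQUIRED_VARS.issubset(env.keys())
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise outside an LSF job.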
- global_rank()
The global rank is read from the environment variable JSM_NAMESPACE_RANK.
- Return type:
- local_rank()
The local rank is read from the environment variable JSM_NAMESPACE_LOCAL_RANK.
- Return type:
- node_rank()
The node rank is determined by the position of the current hostname in the OpenMPI host rank file stored in LSB_DJOB_RANKFILE.
- Return type:
- world_size()
The world size is read from the environment variable JSM_NAMESPACE_SIZE.
- Return type:
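The global rank, local rank, and world size all come from variables set by jsrun, so they can be read together. The sketch below illustrates this with a hypothetical `JsmRanks` tuple and `read_jsm_ranks` helper. Neither name is part of the library.

```python
import os
from typing import Mapping, NamedTuple, Optional


class JsmRanks(NamedTuple):
    global_rank: int
    local_rank: int
    world_size: int


def read_jsm_ranks(environ: Optional[Mapping[str, str]] = None) -> JsmRanks:
    """Parse the jsrun-provided rank variables into integers."""
    env = os.environ if environ is None else environ
    # A missing variable raises KeyError, which signals a launch outside jsrun.
    return JsmRanks(
        global_rank=int(env["JSM_NAMESPACE_RANK"]),
        local_rank=int(env["JSM_NAMESPACE_LOCAL_RANK"]),
        world_size=int(env["JSM_NAMESPACE_SIZE"]),
    )
```

For instance, a task with `JSM_NAMESPACE_RANK=5`, `JSM_NAMESPACE_LOCAL_RANK=1`, and `JSM_NAMESPACE_SIZE=8` would yield `JsmRanks(global_rank=5, local_rank=1, world_size=8)`.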
- property creates_processes_externally: bool
LSF creates subprocesses, i.e., PyTorch Lightning does not need to spawn them.