Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog.
[unreleased] - YYYY-MM-DD
[unreleased] - Added
[unreleased] - Removed
[unreleased] - Changed
- Set `_DeviceDtypeModuleMixin._device` from torch's default device function (#21164)
[unreleased] - Fixed
- Fixed `EADDRINUSE` errors in distributed tests with port manager and retry logic (#21309)
[2.5.5] - 2025-09-05
[2.5.5] - Changed
[2.5.5] - Fixed
[2.5.4] - 2025-08-29
[2.5.4] - Changed
- Added support for NVIDIA H200 GPUs in `get_available_flops` (#21119)
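A minimal sketch of how this lookup can be exercised. It assumes `get_available_flops` is importable from `lightning.fabric.utilities.throughput` and accepts a `torch.device` plus a dtype; both are assumptions about the public throughput utilities, not something stated in this entry.

```python
# Hedged usage sketch for the H200 support mentioned above.
# Assumes lightning.fabric.utilities.throughput.get_available_flops(device, dtype)
# exists with this signature; adjust the import if your version differs.
import torch
from lightning.fabric.utilities.throughput import get_available_flops

if torch.cuda.is_available():
    device = torch.device("cuda:0")
    # Returns the theoretical peak FLOP/s for the detected chip (e.g. H200)
    # and the given dtype, or None if the chip is unknown.
    flops = get_available_flops(device, torch.bfloat16)
    print(f"Peak bf16 FLOP/s: {flops}")
```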
[2.5.3] - 2025-08-13
[2.5.3] - Changed
[2.5.3] - Fixed
[2.5.2] - 2025-03-20
[2.5.2] - Changed
- Ensure the correct device is used for autocast when MPS is selected as the Fabric accelerator (#20876)
[2.5.2] - Fixed
- Fixed `TransformerEnginePrecision` conversion for layers with `bias=False` (#20805)
[2.5.1] - 2025-03-18
[2.5.1] - Changed
- Added logging support for a list of dicts without collapsing to a single key (#19957)
[2.5.1] - Removed
- Removed legacy support for `lightning run model`; use `fabric run` instead (#20588)
[2.5.0] - 2024-12-19
[2.5.0] - Added
- Added a `step` parameter to `TensorBoardLogger.log_hyperparams` to visualize changes during training (see the sketch after this list) (#20176)
- Added a timeout to `DeepSpeedStrategy` (#20474)
- Added FP8 + FSDP2 + torch.compile examples for Fabric (#20440)
- Added RTX 4080 Super to the chips dictionary (#20285)
- Added a device property to the lazy-load functionality (#20183)
- Added the `ddp_find_unused_parameters_true` alias in Fabric's `DDPStrategy` (#20125)
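A minimal sketch of the new `step` parameter mentioned above, assuming Fabric's `TensorBoardLogger` from `lightning.fabric.loggers`; the `step=` keyword itself is taken from the changelog entry, the rest is standard logger usage with illustrative values.

```python
from lightning.fabric.loggers import TensorBoardLogger

logger = TensorBoardLogger(root_dir="logs", name="demo")

# Log hyperparameters at different training steps to visualize how they change,
# e.g. a learning-rate schedule (values are illustrative).
for step, lr in enumerate([1e-3, 5e-4, 1e-4]):
    logger.log_hyperparams({"lr": lr}, step=step)

logger.finalize("success")
```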
[2.5.0] - Changed
[2.5.0] - Fixed
- Fixed use of `convert_module` in FSDP to avoid using more memory than necessary during initialization (#20323)
[2.4.0] - 2024-08-06
[2.4.0] - Added
[2.4.0] - Changed
[2.4.0] - Removed
[2.4.0] - Fixed
[2.3.0] - 2024-06-13
[2.3.0] - Added
- Added sanitization for classes before logging them as hyperparameters (#19771)
- Enabled consolidating distributed checkpoints through `fabric consolidate` in the new CLI (#19560)
- Added the ability to explicitly mark forward methods in Fabric via `_FabricModule.mark_forward_method()` (see the sketch after this list) (#19690)
- Added support for PyTorch 2.3 (#19708)
- Added `ModelParallelStrategy` to support 2D parallelism (#19846, #19852, #19870, #19872)
- Added a call to `torch.distributed.destroy_process_group` in the atexit handler if the process group needs destruction (#19931)
- Added support for configuring hybrid sharding by passing a tuple for the `FSDPStrategy(device_mesh=...)` argument (#19504)
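A sketch of the `mark_forward_method()` entry above. The `TinyLM` module and its `generate` method are hypothetical stand-ins for any non-`forward` method that runs submodules; the call on the Fabric-wrapped module follows the entry's description.

```python
import torch
import torch.nn as nn
from lightning.fabric import Fabric


class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(8, 8)

    def forward(self, x):
        return self.layer(x)

    def generate(self, x):  # a custom method that also runs submodules
        return self.forward(x).argmax(dim=-1)


fabric = Fabric(accelerator="cpu", devices=1)
model = fabric.setup(TinyLM())

# Mark `generate` so Fabric treats it like `forward` (strategy wrappers, autocast, etc.)
model.mark_forward_method("generate")
tokens = model.generate(torch.randn(2, 8))
```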
[2.3.0] - Changed
- The `Fabric.rank_zero_first` context manager now uses a barrier without timeout to avoid long-running tasks being interrupted (#19448)
- Fabric now raises an error if you forget to call `fabric.backward()` when it is needed by the strategy or precision selection (see the sketch after this list) (#19447, #19493)
- `_BackwardSyncControl` can now control what to do when gradient accumulation is disabled (#19577)
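A minimal training-step sketch showing the `fabric.backward()` call that Fabric now checks for; the model, optimizer, and data are placeholder choices.

```python
import torch
import torch.nn as nn
from lightning.fabric import Fabric

fabric = Fabric(accelerator="cpu", devices=1)

model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
model, optimizer = fabric.setup(model, optimizer)

x, y = torch.randn(8, 4), torch.randn(8, 1)
loss = nn.functional.mse_loss(model(x), y)

optimizer.zero_grad()
# Use fabric.backward(loss) instead of loss.backward(): strategy and precision
# plugins hook into this call, and Fabric now errors if it is skipped when required.
fabric.backward(loss)
optimizer.step()
```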
[2.3.0] - Removed
- Removed support for PyTorch 1.13 (#19706)
[2.3.0] - Fixed
- Fixed a matrix shape mismatch issue when running a model loaded from a quantized checkpoint (bitsandbytes) (#19886)
[2.2.2] - 2024-04-11
[2.2.2] - Fixed
[2.2.1] - 2024-03-04
[2.2.1] - Fixed
- Fixed an issue with `CSVLogger` trying to append to file from a previous run when the version is set manually (#19446)
[2.2.0] - 2024-02-08
[2.2.0] - Added
- Added `lightning.fabric.utilities.ThroughputMonitor` and `lightning.fabric.utilities.Throughput` to track throughput and log it (#18848)
- Added `lightning.fabric.utilities.AttributeDict` for convenient dict-attribute access to represent state in scripts (see the sketch after this list) (#18943)
- Added support for meta-device initialization and materialization of 4-bit Bitsandbytes layers (#19150)
- Added `TransformerEnginePrecision(fallback_compute_dtype=)` to control the dtype of operations that don't support fp8 (#19082)
- Added support for clipping gradients by value with FSDP (#19236)
- Added a utility function and CLI to consolidate FSDP sharded checkpoints into a single file (#19213)
- Added support for re-compiling the model inside `Fabric.setup()` over the FSDP/DDP wrappers (#19280)
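A minimal sketch of the `AttributeDict` utility mentioned above for keeping script state; the fields used here are illustrative only.

```python
from lightning.fabric.utilities import AttributeDict

# Behaves like a dict but also allows attribute access, which is handy for
# bundling training state (counters, metrics, references) into one object.
state = AttributeDict(step=0, epoch=0, best_metric=float("inf"))

state.step += 1        # attribute-style access
state["epoch"] = 1     # regular dict-style access still works
print(state.step, state.epoch, state.best_metric)
```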
[2.2.0] - Changed
- `seed_everything()` without passing in a seed no longer randomly selects a seed and now defaults to `0` (see the sketch after this list) (#18846)
- Changed the `TransformerEnginePrecision(dtype=)` argument to `weights_dtype` and made it required (#19082)
- The columns in the `metrics.csv` file produced by `CSVLogger` are now sorted alphabetically (#19159)
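A short sketch of the new `seed_everything()` default described above, using the public Fabric import path.

```python
from lightning.fabric import seed_everything

# With no argument, the seed now defaults to 0 (previously a random seed was chosen).
seed_everything()

# Passing an explicit seed still works as before.
seed_everything(42)
```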
[2.2.0] - Removed
- Removed support for PyTorch 1.12 (#19300)
[2.2.0] - Fixed
[2.1.4] - 2024-01-31
[2.1.4] - Fixed
[2.1.3] - 2023-12-21
[2.1.3] - Fixed
[2.1.2] - 2023-11-15
[2.1.2] - Fixed
- Fixed precision default from environment (#18928)
[2.1.1] - 2023-11-06
[2.1.1] - Changed
- Calling a method other than `forward` that invokes submodules is now an error when the model is wrapped (e.g., with DDP) (#18819)
[2.1.1] - Fixed
[2.1.0] - 2023-10-11
[2.1.0] - Added
- Added support for the TPU-v4 architecture (#17227)
- Added support for XLA's new PJRT runtime (#17352)
- Added support for Fully Sharded Data Parallel (FSDP) training with XLA (#18126, #18424, #18430)
- Check for invalid TPU device inputs (#17227)
- Added `XLAStrategy(sync_module_states=bool)` to control whether to broadcast the parameters to all devices (#17522)
- Added support for joint setup of model and optimizer with FSDP (#17305)
- Added support for handling multiple parameter groups in optimizers set up with FSDP (#17305)
- Added support for saving and loading sharded model and optimizer state with `FSDPStrategy` (#17323)
- Added a warning when calling methods on `_FabricModule` that bypass the strategy-specific wrappers (#17424)
- Added the `Fabric.init_tensor()` context manager to instantiate tensors efficiently directly on device and dtype (#17488)
- Added the `Fabric.init_module()` context manager to instantiate large models efficiently, directly on device and dtype, and with sharding support (see the sketch after this list) (#17462)
  - Creates the model parameters in the desired dtype (`torch.float32`, `torch.float64`, `torch.float16`, or `torch.bfloat16`) depending on the 'true' precision choice in `Fabric(precision='32-true'|'64-true'|'16-true'|'bf16-true')`
  - Handles initialization for FSDP models before wrapping and the ZeRO stage 3 initialization for DeepSpeed before sharding
- Added support for empty-weight initialization with `Fabric.init_module(empty_init=True)` for checkpoint loading (#17627)
- Added support for meta-device initialization with `Fabric.init_module(empty_init=True)` in FSDP (#18122)
- Added `lightning.fabric.plugins.Precision.module_init_context()` and `lightning.fabric.strategies.Strategy.module_init_context()` context managers to control model and tensor instantiation (#17462)
- Added the `lightning.fabric.strategies.Strategy.tensor_init_context()` context manager to instantiate tensors efficiently directly on device and dtype (#17607)
- Run the DDP wrapper in a CUDA stream (#17334)
- Added support for true half-precision as `Fabric(precision="16-true"|"bf16-true")` (#17287)
- Added support for mixed 8-bit precision as `Fabric(precision="transformer-engine")` using NVIDIA's Transformer Engine (#17597)
- Added support for linear-layer quantization with `Fabric(plugins=BitsandbytesPrecision())` using bitsandbytes (#18655)
- Added error messaging for a missed `.launch()` when it is required (#17570)
- Added support for saving checkpoints with either a full state dict or a sharded state dict via `FSDPStrategy(state_dict_type="full"|"sharded")` (#17526)
- Added support for loading a full-state checkpoint file into a sharded model (#17623)
- Added support for calling hooks on a LightningModule via `Fabric.call` (#17874)
- Added the parameter `Fabric.load(..., strict=True|False)` to enable non-strict loading of partial checkpoint state (#17645)
- Added the parameter `Fabric.save(..., filter=...)` to enable saving a partial checkpoint state (#17845)
- Added support for loading optimizer states from a full-state checkpoint file (#17747)
- Automatically call `xla_model.mark_step()` before saving checkpoints with XLA (#17882)
- Automatically call `xla_model.mark_step()` after `optimizer.step()` with XLA (#17883)
- Added support for all half-precision modes in the FSDP precision plugin (#17807)
- Added `FSDPStrategy(activation_checkpointing_policy=...)` to customize the layer policy for automatic activation checkpointing (requires torch>=2.1) (#18045)
- Added a callback for spike detection (#18014)
- Added the ability to set the `torch.distributed.fsdp.ShardingStrategy` via string in `FSDPStrategy` (#18087)
- Improved error messages when attempting to load a DeepSpeed checkpoint at an invalid path (#17795)
- Added `Fabric.load_raw()` for loading raw PyTorch state-dict checkpoints for model or optimizer objects (#18049)
- Allowed accessing rank information in the main process before processes are launched when using the `XLAStrategy` (#18194)
- Added automatic process cleanup to avoid zombie child processes and stalls when exceptions are raised (#18218)
- Added validation of user input for `devices` and `num_nodes` when running with SLURM or TorchElastic (#18292)
- Improved the error messaging and instructions when handling custom batch samplers in distributed settings (#18402)
- Added support for saving and loading stateful objects other than modules and optimizers (#18513)
- Enabled the default process-group configuration for FSDP's hybrid sharding (#18583)
- Added `lightning.fabric.utilities.suggested_max_num_workers` to assist with setting a good value in distributed settings (#18591)
- Added the `lightning.fabric.utilities.is_shared_filesystem` utility function to automatically check whether the filesystem is shared between machines (#18586)
- Removed support for PyTorch 1.11 (#18691)
- Added support for passing the argument `.load_state_dict(..., assign=True|False)` on Fabric-wrapped modules in PyTorch 2.1 or newer (#18690)
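Several of the entries above compose naturally. Below is a hedged sketch combining `Fabric.init_module()`, `FSDPStrategy(state_dict_type=...)`, partial saving via `filter=`, and non-strict loading via `strict=False`. The model, checkpoint path, and filter logic are illustrative, the filter format (a dict of per-key callables) is an assumption about the API, and a multi-GPU environment is assumed for FSDP.

```python
import torch
import torch.nn as nn
from lightning.fabric import Fabric
from lightning.fabric.strategies import FSDPStrategy

# Sharded checkpoints, as described in the FSDPStrategy(state_dict_type=...) entry.
strategy = FSDPStrategy(state_dict_type="sharded")
fabric = Fabric(accelerator="cuda", devices=2, precision="bf16-true", strategy=strategy)
fabric.launch()

# init_module() creates parameters directly on device, in the 'true' precision dtype,
# and handles FSDP initialization before wrapping, as described above.
with fabric.init_module():
    model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 1024))

optimizer = torch.optim.AdamW(model.parameters())
model, optimizer = fabric.setup(model, optimizer)

state = {"model": model, "optimizer": optimizer, "step": 0}

# Save only part of the state: the (assumed) per-key callable decides which items to keep.
fabric.save("checkpoint", state, filter={"model": lambda name, param: "weight" in name})

# Non-strict loading tolerates keys that are missing from the checkpoint.
fabric.load("checkpoint", state, strict=False)
```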
[2.1.0] - Changed
- Allow using iterable-style datasets with TPUs (#17331)
- Increased the minimum XLA requirement to 1.13 (#17368)
- Fabric argument validation now only raises an error if conflicting settings are set through the CLI (#17679)
- DataLoader re-instantiation is now only performed when a distributed sampler is required (#18191)
- Improved the formatting of emitted warnings (#18288)
- Broadcast and reduction of tensors with XLA-based strategies now preserve the input's device (#18275)
- Due to lack of reliability, Fabric now only runs on one GPU instead of all GPUs in a Jupyter notebook if `devices="auto"` (default) (#18291)
- Enabled launching via `torchrun` in a SLURM environment; the `TorchElasticEnvironment` now gets chosen over the `SLURMEnvironment` if both are detected (#18618)
- If not set by the user, Lightning will set `OMP_NUM_THREADS` to `num_cpus / num_processes` when launching subprocesses (e.g. when DDP is used) to avoid system overload for CPU-intensive tasks (#18677)
[2.1.0] - Deprecated
- Deprecated the `DDPStrategy.is_distributed` property. This strategy is distributed by definition (#17381)
- Deprecated the `SingleTPUStrategy` (`strategy="single_tpu"`) in favor of `SingleDeviceXLAStrategy` (`strategy="single_xla"`) (#17383)
- Deprecated the `TPUAccelerator` in favor of `XLAAccelerator` (#17383)
- Deprecated the `TPUPrecision` in favor of `XLAPrecision` (#17383)
- Deprecated the `TPUBf16Precision` in favor of `XLABf16Precision` (#17383)
[2.1.0] - Removed
- Removed automatic sharding support with `Fabric.run` or using `fabric.launch(fn)`. This only impacts FSDP and DeepSpeed strategy users. Please instantiate your module under the newly added `fabric.init_module` context manager (#17832)
- Removed the unsupported `checkpoint_io` argument from the `FSDPStrategy` (#18192)
[2.1.0] - Fixed
- Fixed an issue where running on TPUs would select the wrong device index (#17227)
- Removed the need to call `.launch()` when using the DP strategy (`strategy="dp"`) (#17931)
- Fixed FSDP re-applying activation checkpointing when the user had manually applied it already (#18006)
- Fixed FSDP re-wrapping the module root when the user had manually wrapped the model (#18054)
- Fixed an issue where unexpected exceptions would leave the default torch dtype modified when using true precision settings (#18500)
- Fixed redundant input-type casting in FSDP precision (#18630)
- Fixed an issue with `find_usable_cuda_devices(0)` incorrectly returning a list of devices (#18722)
- Fixed redundant file writes in `CSVLogger` (#18567)
[2.0.9] - 2023-09-14
[2.0.9] - Fixed
- Fixed an issue causing the `_FabricOptimizer.state` to remain outdated after loading with `load_state_dict` (#18488)
[2.0.8] - 2023-08-29
[2.0.8] - Changed
- On XLA, avoid setting the global rank before processes have been launched, as this will initialize the PJRT computation client in the main process (#16966)
[2.0.8] - Fixed
- Fixed model parameters getting shared between processes when running with `strategy="ddp_spawn"` and `accelerator="cpu"`; this has a necessary memory impact, as parameters are replicated for each process now (#18238)
- Removed a false-positive warning when using `fabric.no_backward_sync` with XLA strategies (#17761)
- Fixed an issue where Fabric would not initialize the global rank, world size, and rank-zero-only rank after initialization and before launch (#16966)
- Fixed FSDP full-precision `param_dtype` training (`16-mixed`, `bf16-mixed`, and `32-true` configurations) to avoid FSDP assertion errors with PyTorch < 2.0 (#18278)
[2.0.7] - 2023-08-14
[2.0.7] - Changed
- Disabled the auto-detection of the Kubeflow environment (#18137)
[2.0.7] - Fixed
- Fixed an issue where DDP subprocesses that used Hydra would set Hydra's working directory to the current directory (#18145)
- Fixed an issue that would prevent the user from setting the multiprocessing start method after importing lightning (#18177)
- Fixed an issue with `Fabric.all_reduce()` not performing an in-place operation for all backends consistently (#18235)
[2.0.6] - 2023-07-20
[2.0.6] - Fixed
- Fixed `TensorBoardLogger.log_graph` not unwrapping the `_FabricModule` (#17844)
[2.0.5] - 2023-07-07
[2.0.5] - Added
- Added validation against misconfigured device selection when using the DeepSpeed strategy (#17952)
[2.0.5] - Changed
- Avoid an info message when loading 0 entry-point callbacks (#17990)
[2.0.5] - Fixed
- Fixed the emission of a false-positive warning when calling a method on the Fabric-wrapped module that accepts no arguments (#17875)
- Fixed the check for FSDP's flat parameters in all parameter groups (#17914)
- Fixed automatic step tracking in Fabric's CSVLogger (#17942)
- Fixed an issue causing the `torch.set_float32_matmul_precision` info message to show multiple times (#17960)
- Fixed loading model state when `Fabric.load()` is called after `Fabric.setup()` (#17997)
[2.0.4] - 2023-06-22
[2.0.4] - Fixed
[2.0.3] - 2023-06-07
- Added support for `Callback` registration through entry points (#17756)
[2.0.3] - Changed
[2.0.3] - Fixed
[2.0.2] - 2023-04-24
[2.0.2] - Changed
- Enabled precision autocast for LightningModule step methods in Fabric (#17439)
[2.0.2] - Fixed
[2.0.1] - 2023-03-30
[2.0.1] - Changed
- Generalized `Optimizer` validation to accommodate both FSDP 1.x and 2.x (#16733)
[2.0.0] - 2023-03-15
[2.0.0] - Added
- Added `Fabric.all_reduce` (#16459)
- Added support for saving and loading DeepSpeed checkpoints through `Fabric.save/load()` (#16452)
- Added support for automatically calling `set_epoch` on the `dataloader.batch_sampler.sampler` (#16841)
- Added support for writing logs to remote file systems with the `CSVLogger` (#16880)
- Added support for frozen dataclasses in the optimizer state (#16656)
- Added `lightning.fabric.is_wrapped` to check whether a module, optimizer, or dataloader was already wrapped by Fabric (#16953)
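A small sketch of the `lightning.fabric.is_wrapped` helper from the entry directly above, using a toy module.

```python
import torch.nn as nn
from lightning.fabric import Fabric, is_wrapped

fabric = Fabric(accelerator="cpu", devices=1)

model = nn.Linear(2, 2)
print(is_wrapped(model))   # False: a plain nn.Module

model = fabric.setup(model)
print(is_wrapped(model))   # True: now a Fabric-wrapped module
```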
[2.0.0] - Changed
- Fabric now chooses `accelerator="auto", strategy="auto", devices="auto"` as defaults (#16842)
- Checkpoint saving and loading redesign (see the sketch after this list) (#16434)
  - Changed the method signature of `Fabric.save` and `Fabric.load`
  - Changed the method signature of `Strategy.save_checkpoint` and `Fabric.load_checkpoint`
  - `Fabric.save` accepts a state that can contain model and optimizer references
  - `Fabric.load` can now load state in-place onto models and optimizers
  - `Fabric.load` returns a dictionary of objects that weren't loaded into the state
  - `Strategy.save_checkpoint` and `Fabric.load_checkpoint` are now responsible for accessing the state of the model and optimizers
- `DataParallelStrategy.get_module_state_dict()` and `DDPStrategy.get_module_state_dict()` now correctly extract the state dict without keys prefixed with 'module' (#16487)
- "Native" suffix removal (#16490)
  - `strategy="fsdp_full_shard_offload"` is now `strategy="fsdp_cpu_offload"`
  - `lightning.fabric.plugins.precision.native_amp` is now `lightning.fabric.plugins.precision.amp`
- Enabled all shorthand strategy names that can be supported in the CLI (#16485)
- Renamed `strategy='tpu_spawn'` to `strategy='xla'` and `strategy='tpu_spawn_debug'` to `strategy='xla_debug'` (#16781)
- Changed arguments for precision settings (from `64|32|16|bf16` to `"64-true"|"32-true"|"16-mixed"|"bf16-mixed"`) (#16767)
- The selection `Fabric(strategy="ddp_spawn", ...)` no longer falls back to "ddp" when a cluster environment gets detected (#16780)
- Renamed `setup_dataloaders(replace_sampler=...)` to `setup_dataloaders(use_distributed_sampler=...)` (#16829)
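A sketch of the redesigned save/load flow summarized in the list above; the file name and state contents are illustrative.

```python
import torch
import torch.nn as nn
from lightning.fabric import Fabric

fabric = Fabric(accelerator="cpu", devices=1)

model = nn.Linear(4, 4)
optimizer = torch.optim.Adam(model.parameters())
model, optimizer = fabric.setup(model, optimizer)

# The state may contain model/optimizer references plus arbitrary extra objects.
state = {"model": model, "optimizer": optimizer, "step": 100}
fabric.save("checkpoint.ckpt", state)

# Loading is in-place for models/optimizers; anything not consumed is returned.
state = {"model": model, "optimizer": optimizer}
remainder = fabric.load("checkpoint.ckpt", state)
print(remainder)  # e.g. {"step": 100}
```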
[2.0.0] - Removed
[2.0.0] - Fixed
[1.9.4] - 2023-03-01
[1.9.4] - Added
- Added `Fabric(strategy="auto")` support (#16916)
[1.9.4] - Fixed
[1.9.3] - 2023-02-21
[1.9.3] - Fixed
[1.9.2] - 2023-02-15
[1.9.2] - Fixed
- Fixed an attribute error and improved input validation for invalid strategy types being passed to Trainer (#16693)
[1.9.1] - 2023-02-10
[1.9.1] - Fixed
- Fixed error handling for `accelerator="mps"` and `ddp` strategy pairing (#16455)
- Fixed strict availability check for the `torch_xla` requirement (#16476)
- Fixed an issue where PL would wrap DataLoaders with XLA's MpDeviceLoader more than once (#16571)
- Fixed the `batch_sampler` reference for DataLoaders wrapped with XLA's MpDeviceLoader (#16571)
- Fixed an import error when `torch.distributed` is not available (#16658)
[1.9.0] - 2023-01-17
[1.9.0] - Added
- Added `Fabric.launch()` to programmatically launch processes (e.g. in a Jupyter notebook) (#14992)
- Added the option to launch Fabric scripts from the CLI, without the need to wrap the code into the `run` method (#14992)
- Added `Fabric.setup_module()` and `Fabric.setup_optimizers()` to support strategies that need to set up the model before an optimizer can be created (#15185)
- Added support for Fully Sharded Data Parallel (FSDP) training in Lightning Lite (#14967)
- Added the `lightning.fabric.accelerators.find_usable_cuda_devices` utility function (#16147)
- Added basic support for LightningModules (#16048)
- Added support for managing callbacks via `Fabric(callbacks=...)` and emitting events through `Fabric.call()` (#16074)
- Added Logger support (see the sketch after this list) (#16121)
  - Added `Fabric(loggers=...)` to support different Logger frameworks in Fabric
  - Added `Fabric.log` for logging scalars using multiple loggers
  - Added `Fabric.log_dict` for logging a dictionary of multiple metrics at once
  - Added `Fabric.loggers` and `Fabric.logger` attributes to access the individual logger instances
  - Added support for calling `self.log` and `self.log_dict` in a LightningModule when using Fabric
  - Added access to `self.logger` and `self.loggers` in a LightningModule when using Fabric
- Added `lightning.fabric.loggers.TensorBoardLogger` (#16121)
- Added `lightning.fabric.loggers.CSVLogger` (#16346)
- Added support for a consistent `.zero_grad(set_to_none=...)` on the wrapped optimizer regardless of which strategy is used (#16275)
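A sketch of the logger support summarized above, combining `Fabric(loggers=...)`, `fabric.log`, and `fabric.log_dict`; the directories and metric names are illustrative, and the `step=` keyword is assumed to be accepted by both logging calls.

```python
from lightning.fabric import Fabric
from lightning.fabric.loggers import CSVLogger, TensorBoardLogger

loggers = [TensorBoardLogger(root_dir="logs/tb"), CSVLogger(root_dir="logs/csv")]
fabric = Fabric(accelerator="cpu", devices=1, loggers=loggers)

# A scalar goes to every configured logger.
fabric.log("train/loss", 0.25, step=0)
# Several metrics at once.
fabric.log_dict({"train/loss": 0.21, "train/acc": 0.93}, step=1)

print(fabric.logger)   # the first logger
print(fabric.loggers)  # all loggers
```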
[1.9.0] - Changed
- The `Fabric.run()` method is no longer abstract (#14992)
- The `XLAStrategy` now inherits from `ParallelStrategy` instead of `DDPSpawnStrategy` (#15838)
- Merged the implementation of `DDPSpawnStrategy` into `DDPStrategy` and removed `DDPSpawnStrategy` (#14952)
- The dataloader wrapper returned from `.setup_dataloaders()` now calls `.set_epoch()` on the distributed sampler if one is used (#16101)
- Renamed `Strategy.reduce` to `Strategy.all_reduce` in all strategies (#16370)
- When using multiple devices, the strategy now defaults to "ddp" instead of "ddp_spawn" when none is set (#16388)
[1.9.0] - Removed
- Removed support for FairScale's sharded training (`strategy='ddp_sharded'|'ddp_sharded_spawn'`). Use Fully Sharded Data Parallel instead (`strategy='fsdp'`) (#16329)
[1.9.0] - Fixed
[1.8.6] - 2022-12-21
- Minor cleaning
[1.8.5] - 2022-12-15
- Minor cleaning
[1.8.4] - 2022-12-08
[1.8.4] - Fixed
- Fixed `shuffle=False` having no effect when using DDP/DistributedSampler (#15931)
[1.8.3] - 2022-11-22
[1.8.3] - Changed
- Temporarily removed support for Hydra multi-run (#15737)
[1.8.2] - 2022-11-17
[1.8.2] - Fixed
- Fixed the automatic fallback from `LightningLite(strategy="ddp_spawn", ...)` to `LightningLite(strategy="ddp", ...)` when on an LSF cluster (#15103)