BitsandbytesPrecision

class lightning.fabric.plugins.precision.BitsandbytesPrecision(mode, dtype=None, ignore_modules=None)

Plugin for quantizing weights with bitsandbytes.
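As a rough illustration of what quantizing weights means, here is a pure-Python sketch of symmetric absmax int8 quantization. Each weight is scaled so that the largest magnitude maps to 127, rounded to an integer, and later divided by the same scale to recover an approximation. This is a conceptual example only and is not how bitsandbytes implements any of its modes; its kernels operate on GPU tensors and use more elaborate schemes (for example, per-block scaling).

```python
def absmax_quantize(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric absmax quantization of a weight vector to int8-range integers."""
    absmax = max(abs(w) for w in weights)
    # Map the largest-magnitude weight to +/-127; guard against an all-zero vector.
    scale = 127 / absmax if absmax > 0 else 1.0
    # Round to the nearest integer; the clamp is defensive against rounding drift.
    quantized = [max(-127, min(127, round(w * scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from the integers and the shared scale."""
    return [q / scale for q in quantized]


weights = [0.12, -0.5, 0.33, 0.0]
q, scale = absmax_quantize(weights)
restored = dequantize(q, scale)
print(q, scale)
print(restored)
```

Each restored weight differs from the original by at most half a quantization step (0.5 / scale), which is the precision traded for storing one byte per weight instead of four.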

Warning

This is an experimental feature.

Note

The optimizer is not automatically replaced with bitsandbytes.optim.Adam8bit or equivalent 8-bit optimizers.
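Because the optimizer is left untouched, its state keeps its usual size for whatever parameters remain trainable. A back-of-the-envelope sketch, assuming Adam's two state tensors per trainable parameter (exp_avg and exp_avg_sq) stored in float32, versus one byte each for an 8-bit optimizer, and ignoring the small per-block statistics that 8-bit optimizers also keep; the parameter count is hypothetical:

```python
def adam_state_bytes(num_params: int, bytes_per_state: int) -> int:
    """Bytes used by Adam's two per-parameter state tensors (exp_avg, exp_avg_sq)."""
    return 2 * bytes_per_state * num_params


num_params = 1_000_000_000  # hypothetical count of trainable parameters
fp32_states = adam_state_bytes(num_params, bytes_per_state=4)  # float32 optimizer state
int8_states = adam_state_bytes(num_params, bytes_per_state=1)  # 8-bit state, statistics ignored

print(f"32-bit states: {fp32_states / 2**30:.1f} GiB")
print(f" 8-bit states: {int8_states / 2**30:.1f} GiB")
```

If that difference matters, an 8-bit optimizer such as bitsandbytes.optim.Adam8bit has to be constructed explicitly and passed to Fabric like any other optimizer.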

Parameters:

mode (Literal['nf4', 'nf4-dq', 'fp4', 'fp4-dq', 'int8', 'int8-training']) – The quantization mode to use.
dtype (Optional[dtype]) – The compute dtype to use.
ignore_modules (Optional[set[str]]) – The submodules whose Linear layers should not be replaced, for example {"lm_head"}. This can be desirable for numerical stability. Each string is checked as a prefix of the module name, so a value like "transformer.blocks" ignores all Linear layers in all of the transformer blocks.
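The prefix rule for ignore_modules can be sketched in plain Python. This assumes that each Linear layer is identified by its dotted qualified name (the form produced by torch.nn.Module.named_modules()) and that an entry matches through a plain str.startswith check; the function and module names below are hypothetical.

```python
def is_ignored(qualified_name: str, ignore_modules: set[str]) -> bool:
    """Return True if any ignore_modules entry is a prefix of the qualified name."""
    return any(qualified_name.startswith(prefix) for prefix in ignore_modules)


ignore = {"lm_head", "transformer.blocks"}
# Hypothetical qualified names of Linear layers in a model.
names = [
    "transformer.proj_in",
    "transformer.blocks.0.attn.proj",
    "transformer.blocks.11.mlp.fc",
    "lm_head",
]
kept = [name for name in names if not is_ignored(name, ignore)]
print(kept)  # only these would be replaced
```

One consequence of plain string prefixes, if the check works as sketched here, is that "lm_head" would also match a sibling named, say, "lm_head_extra".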

convert_input(data)

Convert model inputs (for the forward pass) to the floating-point precision of this plugin.

This is a no-op in the base precision plugin, since we assume the data already has the desired type (default is torch.float32).
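A plugin that does cast its inputs generally needs to touch only floating-point data, leaving integer data such as token IDs alone. The following pure-Python analogy sketches that idea, using the struct module's half-precision format "e" as a stand-in for casting a tensor to a lower-precision dtype. The real plugin works on torch tensors rather than Python floats, and the helper names here are hypothetical.

```python
import struct
from typing import Any


def to_half(value: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def convert_floats(data: Any) -> Any:
    """Recursively cast floats inside plain dicts, lists and tuples; leave other values as-is."""
    if isinstance(data, float):
        return to_half(data)
    if isinstance(data, dict):
        return {key: convert_floats(value) for key, value in data.items()}
    if isinstance(data, (list, tuple)):
        return type(data)(convert_floats(item) for item in data)
    return data  # ints, strings, etc. pass through untouched


batch = {"features": [0.1, 2.5], "token_ids": (17, 42), "name": "sample-0"}
converted = convert_floats(batch)
print(converted)
```

Values that are exactly representable in half precision (such as 2.5) survive unchanged, while 0.1 is rounded to the nearest half-precision value.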

convert_module(module)

Convert the module parameters to the precision type this plugin handles.

This is optional and depends on the precision limitations during optimization.
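As the ignore_modules parameter indicates, this plugin's conversion replaces Linear layers. The toy sketch below shows how such a traversal can build dotted qualified names as it recurses, so that a single prefix skips a whole subtree. It assumes a nested dict stands in for the module hierarchy and strings stand in for layer classes ("Linear4bit" is only a label); it quantizes nothing and is not Lightning's implementation.

```python
def replace_linear(tree: dict, ignore_modules: set[str], prefix: str = "") -> dict:
    """Return a copy of the toy tree with "Linear" leaves relabelled "Linear4bit",
    except where the qualified name starts with an ignore_modules entry."""
    converted = {}
    for name, child in tree.items():
        qualified = f"{prefix}.{name}" if prefix else name
        if isinstance(child, dict):
            # Inner node: recurse, extending the dotted prefix.
            converted[name] = replace_linear(child, ignore_modules, qualified)
        elif child == "Linear" and not any(qualified.startswith(p) for p in ignore_modules):
            converted[name] = "Linear4bit"
        else:
            converted[name] = child  # non-Linear or ignored layers are kept as-is
    return converted


model = {
    "embed": "Embedding",
    "blocks": {"0": {"attn": "Linear", "norm": "LayerNorm"}},
    "lm_head": "Linear",
}
print(replace_linear(model, ignore_modules={"lm_head"}))
```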

convert_output(data)

Convert outputs to the floating-point precision expected after the model's forward pass.

This is a no-op in the base precision plugin, since we assume the data already has the desired type (default is torch.float32).

forward_context()

A context manager that wraps the model's forward, training_step, evaluation_step, and predict_step calls.

module_init_context()

Instantiate module parameters or tensors in the precision type this plugin handles.

This is optional and depends on the precision limitations during optimization.

tensor_init_context()

Controls how tensors get created (device, dtype).
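Context managers that control how tensors get created commonly follow a set-and-restore pattern. The sketch below models that pattern in pure Python with a hypothetical module-level default standing in for PyTorch's global default dtype (torch.get_default_dtype and torch.set_default_dtype); it illustrates the pattern only and is not this plugin's implementation.

```python
from contextlib import contextmanager
from typing import Iterator

# Hypothetical process-wide default, standing in for PyTorch's global default dtype.
_default_dtype = "float32"


def get_default_dtype() -> str:
    return _default_dtype


@contextmanager
def default_dtype(dtype: str) -> Iterator[None]:
    """Temporarily change the default dtype, restoring the previous one on exit."""
    global _default_dtype
    previous = _default_dtype
    _default_dtype = dtype
    try:
        yield
    finally:
        # Restore even if the body raised, so later code sees the original default.
        _default_dtype = previous


with default_dtype("bfloat16"):
    print(get_default_dtype())  # objects "created" here would use bfloat16
print(get_default_dtype())
```

Restoring in a finally clause keeps an exception raised during initialization from leaking the temporary dtype into the rest of the program.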