xlstm_jax.models.xlstm_pytorch.components.init

Functions

`bias_linspace_init_`(param[, start, end])	Linearly spaced bias init across dimensions.
`small_init_init_`(param, dim)	Fills the input Tensor with values according to the method described in Transformers without Tears: Improving the Normalization of Self-Attention (Nguyen & Salazar, 2019).
`wang_init_`(param, dim, num_blocks)	Adopted from EleutherAI/gpt-neox.

Module Contents

xlstm_jax.models.xlstm_pytorch.components.init.bias_linspace_init_(param, start=3.4, end=6.0)

Linearly spaced bias init across dimensions.

Parameters:

param (torch.Tensor | torch.distributed._tensor.DTensor)
start (float)
end (float)

Return type:

torch.Tensor
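
As a rough illustration of what "linearly spaced" means here, the sketch below computes the values in plain Python, without PyTorch. It assumes that `param` is a 1-D bias vector and that both endpoints are included, as with `torch.linspace`. The helper name `linspace_bias_values` is hypothetical and not part of this module.

```python
def linspace_bias_values(n_dims: int, start: float = 3.4, end: float = 6.0) -> list[float]:
    """Return n_dims evenly spaced values from start to end (both inclusive)."""
    if n_dims < 1:
        raise ValueError("n_dims must be at least 1")
    if n_dims == 1:
        # A single element simply receives the start value.
        return [start]
    step = (end - start) / (n_dims - 1)
    return [start + i * step for i in range(n_dims)]


# Values a 4-element bias would receive with the default start and end.
values = linspace_bias_values(4)
print([round(v, 4) for v in values])
```

Under these assumptions, the first bias entry receives `start`, the last receives `end`, and the entries in between grow by a constant step.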

xlstm_jax.models.xlstm_pytorch.components.init.small_init_init_(param, dim)

Fills the input Tensor with values drawn from a normal distribution, following the method described in "Transformers without Tears: Improving the Normalization of Self-Attention" (Nguyen, T. & Salazar, J., 2019). Adopted from EleutherAI/gpt-neox.

Parameters:

param (torch.Tensor)
dim (int)

Return type:

torch.Tensor
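
The scale of this initialization can be sketched in plain Python. The sketch assumes the function follows the gpt-neox formulation of the method, a zero-mean normal distribution with standard deviation `sqrt(2 / (5 * dim))`; this page does not state the formula. The helper name `small_init_std` is hypothetical, and a plain list stands in for the tensor.

```python
import math
import random


def small_init_std(dim: int) -> float:
    """Standard deviation sqrt(2 / (5 * dim)), assumed from gpt-neox."""
    return math.sqrt(2 / (5 * dim))


dim = 512
std = small_init_std(dim)

# Draw zero-mean samples the way a normal-distribution init would,
# here into a plain list instead of a tensor.
rng = random.Random(0)
samples = [rng.gauss(0.0, std) for _ in range(10_000)]
empirical_std = math.sqrt(sum(s * s for s in samples) / len(samples))
print(f"target std={std:.5f}, empirical std={empirical_std:.5f}")
```

With PyTorch available, a value like `std` would typically be passed to `torch.nn.init.normal_` together with `mean=0.0`.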

xlstm_jax.models.xlstm_pytorch.components.init.wang_init_(param, dim, num_blocks)

Adopted from EleutherAI/gpt-neox.

Parameters:

param (torch.Tensor)
dim (int)
num_blocks (int)
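
How `dim` and `num_blocks` might enter the initialization can be sketched as follows. The sketch assumes the gpt-neox formulation that the docstring points to, a zero-mean normal distribution with standard deviation `2 / (num_blocks * sqrt(dim))`; this page does not confirm the formula. The helper name `wang_init_std` is hypothetical.

```python
import math


def wang_init_std(dim: int, num_blocks: int) -> float:
    """Standard deviation 2 / (num_blocks * sqrt(dim)), assumed from gpt-neox."""
    return 2 / num_blocks / math.sqrt(dim)


# At a fixed width, deeper stacks get a smaller standard deviation.
for num_blocks in (6, 12, 24):
    print(num_blocks, round(wang_init_std(dim=1024, num_blocks=num_blocks), 6))
```

Under this assumption, doubling `num_blocks` halves the standard deviation, which keeps the initial weights smaller as the stack gets deeper.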