xlstm_jax.dataset.hf_tokenizer

Functions

load_tokenizer(tokenizer_path, add_bos, add_eos[, ...])

Loads a Hugging Face tokenizer.

Module Contents

xlstm_jax.dataset.hf_tokenizer.load_tokenizer(tokenizer_path, add_bos, add_eos, hf_access_token=None, cache_dir=None)

Loads a Hugging Face tokenizer from tokenizer_path.

Parameters:
  • tokenizer_path (str) – The path to the tokenizer.

  • add_bos (bool) – Whether to add the beginning-of-sequence (BOS) token.

  • add_eos (bool) – Whether to add the end-of-sequence (EOS) token.

  • hf_access_token (str | None) – The access token for Hugging Face.

  • cache_dir (str | None) – The cache directory for the tokenizer.

Returns:

The tokenizer.

Return type:

transformers.AutoTokenizer
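
Example:

The following is a minimal sketch of how load_tokenizer might be called, based only on the signature documented above. The helper name encode_example and the default tokenizer_path value "gpt2" are hypothetical and chosen for illustration. Whether tokenizer_path accepts a Hugging Face Hub identifier in addition to a local path is an assumption, as is the returned object following the usual transformers call interface (tokenizer(text)["input_ids"]).

```python
from typing import List, Optional


def encode_example(
    text: str,
    tokenizer_path: str = "gpt2",  # hypothetical value; substitute your own tokenizer path
    cache_dir: Optional[str] = None,
) -> List[int]:
    """Load a tokenizer with load_tokenizer and encode a single string."""
    # Imported lazily so this helper can be defined without xlstm_jax installed.
    from xlstm_jax.dataset.hf_tokenizer import load_tokenizer

    tokenizer = load_tokenizer(
        tokenizer_path,
        add_bos=False,         # do not add a beginning-of-sequence token
        add_eos=True,          # add an end-of-sequence token
        hf_access_token=None,  # only needed for gated or private tokenizers
        cache_dir=cache_dir,
    )
    # Assumption: the returned object behaves like a standard transformers
    # tokenizer, so calling it on a string yields a mapping with "input_ids".
    return list(tokenizer(text)["input_ids"])


# Example call (may download tokenizer files if they are not cached):
# token_ids = encode_example("Hello, xLSTM!")
```

The import sits inside the helper so that defining it has no side effects. Tokenizer files are only fetched or read from cache_dir when the helper is actually called.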