Quantize

Submodules

Quantize

Interactive file for quantizing models.

Utils

Utility functions for quantization.

grag.quantize.utils.download_release_asset(download_url: str, root_quantize: Path | str = './grag-quantize') → None

Downloads a file from the given URL and saves it under root_quantize. If the downloaded file is a ZIP archive, the function also attempts to extract it.

Parameters:
  • download_url – The URL of the file to download.

  • root_quantize – Path where the file will be saved.

Returns:

None
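The extract-if-ZIP step described above can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: the helper name extract_if_zip and its dest_dir argument are hypothetical, and the download itself is left out.

```python
import zipfile
from pathlib import Path
from typing import Union


def extract_if_zip(file_path: Union[str, Path], dest_dir: Union[str, Path]) -> bool:
    """Extract file_path into dest_dir if it is a ZIP archive.

    Hypothetical helper for illustration. Returns True when an archive
    was extracted and False when the file is not a ZIP archive.
    """
    file_path = Path(file_path)
    # is_zipfile inspects the file contents, so the extension does not matter.
    if not zipfile.is_zipfile(file_path):
        return False
    with zipfile.ZipFile(file_path) as archive:
        archive.extractall(dest_dir)
    return True
```

Checking the contents with zipfile.is_zipfile rather than the file extension means a non-archive download is simply left in place.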

grag.quantize.utils.fetch_model_repo(repo_id: str, model_path: str | Path = './grag-quantize/models') → str | Path

Downloads a model from huggingface.co/models to a specified directory.

Parameters:
  • repo_id – Repository ID of the model to download (e.g., ‘huggingface/gpt2’).

  • model_path – The local directory where the model should be downloaded.

Returns:

The path to the directory where the model is downloaded.

grag.quantize.utils.get_asset_download_url(asset_name_pattern: str, user: str = 'ggerganov', repo: str = 'llama.cpp') → str | None

Fetches the download URL of the first asset that matches a given name pattern in the latest release of the specified repository.

Parameters:
  • asset_name_pattern – Substring to match in the asset’s name.

  • user – GitHub username or organization of the repository.

  • repo – Repository name.

Returns:

The download URL of the matching asset, or None if no match is found.
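The first-match rule can be illustrated on a release payload without any network access. The sketch below assumes the payload has the shape of a GitHub REST API release object (an "assets" list whose entries carry "name" and "browser_download_url"). The function name first_matching_asset_url and the sample data are made up for illustration, and the source does not state how the library itself obtains or scans the release.

```python
from typing import Any, Mapping, Optional


def first_matching_asset_url(release: Mapping[str, Any], pattern: str) -> Optional[str]:
    """Return the download URL of the first asset whose name contains pattern.

    Hypothetical helper: `release` is assumed to look like a GitHub
    release object. Returns None when no asset name matches.
    """
    for asset in release.get("assets", []):
        if pattern in asset.get("name", ""):
            return asset.get("browser_download_url")
    return None


# Made-up payload for illustration only; names and URLs are not real assets.
sample_release = {
    "tag_name": "v0.0.0",
    "assets": [
        {"name": "tool-bin-macos-arm64.zip",
         "browser_download_url": "https://example.com/macos.zip"},
        {"name": "tool-bin-ubuntu-x64.zip",
         "browser_download_url": "https://example.com/ubuntu.zip"},
    ],
}

ubuntu_url = first_matching_asset_url(sample_release, "ubuntu")
windows_url = first_matching_asset_url(sample_release, "windows")
print(ubuntu_url)   # URL of the ubuntu asset
print(windows_url)  # None, because nothing matches
```

Because the match is a plain substring test, a short pattern can match more than one asset. Only the first match in list order is returned.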

grag.quantize.utils.get_llamacpp_repo(repo_url: str = 'https://github.com/ggerganov/llama.cpp.git', destination_folder: str | Path = './grag-quantize') → None

Clones a GitHub repository into a local directory, creating the directory if it does not exist. If the repository has already been cloned there, the function pulls the latest changes instead.

Parameters:
  • repo_url – The URL of the repository to clone.

  • destination_folder – The local path where the repository should be cloned or updated.

Returns:

None
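The clone-or-update decision can be sketched as a small function that picks the git command to run. This is an assumption-laden illustration rather than the library's code: the name clone_or_pull_command is hypothetical, the presence of a .git directory is used as the test for "already cloned", and destination_folder is treated as the checkout directory itself.

```python
from pathlib import Path
from typing import List, Union


def clone_or_pull_command(repo_url: str, destination_folder: Union[str, Path]) -> List[str]:
    """Return the git command for cloning or updating a checkout.

    Hypothetical helper: a `.git` directory inside destination_folder is
    taken to mean the repository was already cloned there.
    """
    dest = Path(destination_folder)
    if (dest / ".git").is_dir():
        # Existing checkout: pull updates in place.
        return ["git", "-C", str(dest), "pull"]
    # No checkout yet: git clone creates the destination directory itself.
    return ["git", "clone", repo_url, str(dest)]
```

The returned list can be passed to subprocess.run(cmd, check=True). Keeping the decision separate from the execution makes it easy to check without touching the network.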

grag.quantize.utils.inference_quantized_model(root_quantize: str | Path, quantized_model_file: str | Path) → CompletedProcess

Runs inference using a quantized model binary.

Parameters:
  • root_quantize – The root directory containing the compiled inference executable.

  • quantized_model_file – The file path to the quantized model to use for inference.

Returns:

The subprocess.CompletedProcess object containing the inference execution result.
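For readers unfamiliar with subprocess.CompletedProcess, the sketch below shows how such a result is typically inspected. It uses the current Python interpreter as a stand-in command, not the inference executable. Whether stdout and stderr are captured in the object returned by inference_quantized_model depends on how the library invokes the process, which the source does not state.

```python
import subprocess
import sys

# Stand-in command: the current Python interpreter printing one line.
# A real call would run the compiled inference executable instead.
result = subprocess.run(
    [sys.executable, "-c", "print('stand-in output')"],
    capture_output=True,  # collect stdout and stderr instead of inheriting them
    text=True,            # decode the captured bytes to str
)

if result.returncode == 0:
    print("succeeded:", result.stdout.strip())
else:
    print("failed with exit code", result.returncode)
    print(result.stderr.strip())
```

A non-zero returncode signals failure. Calling result.check_returncode() raises subprocess.CalledProcessError in that case.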

grag.quantize.utils.quantize_model(model_dir_path: str | Path, quantization: str, root_quantize: str | Path = './grag-quantize', output_dir: Path | str | None = None) → Tuple[Path, Path]

Quantizes a model at the given quantization level and saves the result. If output_dir is not specified, the output is written to a subdirectory of the model directory. The function also creates any directories it needs and handles certain exceptions raised during the conversion process.

Parameters:
  • model_dir_path – The directory path of the model to be quantized. This path must exist and contain the model files.

  • quantization – The quantization level to apply (e.g., ‘f32’, ‘f16’). This affects the precision and size of the model.

  • root_quantize – The root directory containing the quantization tools and scripts. This directory should have the necessary binary files and scripts for the quantization process.

  • output_dir – Optional directory to save the quantized model. If not specified, the function uses a default directory based on the model directory path.

Returns:

A tuple containing the path to the root of the quantization tools and the path to the quantized model file.

Return type:

Tuple[Path, Path]

Raises:
  • PermissionError – If the function lacks permissions to execute the quantization binaries, it will attempt to modify permissions and retry.

  • TypeError – If there are issues with the provided model directory or quantization parameters.
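The "modify permissions and retry" behaviour noted for PermissionError can be sketched as follows. This is a hedged illustration only: the helper name run_with_permission_retry, the single retry, the choice of the owner-execute bit, and the injectable run argument are all assumptions, not details taken from the library.

```python
import stat
import subprocess
from pathlib import Path
from typing import Callable, Sequence


def run_with_permission_retry(
    cmd: Sequence[str],
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> subprocess.CompletedProcess:
    """Run cmd; on PermissionError, make cmd[0] executable and retry once.

    Hypothetical helper. `run` defaults to subprocess.run and can be
    replaced to exercise the retry path without a real binary.
    """
    try:
        return run(cmd, capture_output=True, text=True)
    except PermissionError:
        binary = Path(cmd[0])
        # Add the owner-execute bit while keeping the existing mode bits.
        binary.chmod(binary.stat().st_mode | stat.S_IXUSR)
        # A second PermissionError propagates to the caller.
        return run(cmd, capture_output=True, text=True)
```

Retrying only once keeps a genuinely unusable binary from causing a loop.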

grag.quantize.utils.repo_id_resolver(repo_url: str) → str

Resolves the HuggingFace repository ID given a full URL to a model or dataset page.

This function parses a HuggingFace URL to extract the repository ID, which typically consists of a user or organization name followed by the repository name. If the URL does not start with the expected HuggingFace URL prefix, it returns the input URL unchanged.

Parameters:

repo_url – The full URL string pointing to a specific HuggingFace repository.

Returns:

The repository ID in the format ‘username/repository_name’ if the URL starts with the expected HuggingFace prefix; otherwise the original input, unchanged.

Examples

Input: “https://huggingface.co/gpt2/models” Output: “gpt2/models”

Input: “https://huggingface.co/facebook/bart-large” Output: “facebook/bart-large”

Input: “some_other_url” Output: “some_other_url”
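The documented behaviour can be reproduced with a few lines of string handling. The sketch below is a re-implementation for illustration, not the library's source: the name resolve_repo_id, the HF_PREFIX constant, and the choice to keep at most the first two path segments are assumptions chosen to agree with the three examples above.

```python
HF_PREFIX = "https://huggingface.co/"  # assumed prefix, taken from the examples


def resolve_repo_id(repo_url: str) -> str:
    """Extract a repository ID from a HuggingFace URL.

    Hypothetical stand-in for repo_id_resolver. Input that does not
    start with HF_PREFIX is returned unchanged.
    """
    if not repo_url.startswith(HF_PREFIX):
        return repo_url
    # Drop the prefix, then keep at most the first two path segments.
    segments = repo_url[len(HF_PREFIX):].strip("/").split("/")
    return "/".join(segments[:2])


for url in (
    "https://huggingface.co/gpt2/models",
    "https://huggingface.co/facebook/bart-large",
    "some_other_url",
):
    print(url, "->", resolve_repo_id(url))
```

Returning non-matching input unchanged lets callers pass either a full URL or an already-resolved repository ID.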

Module contents