Quantize
Submodules
Quantize
Interactive script for quantizing models.
Utils
Utility functions for quantization.
- grag.quantize.utils.download_release_asset(download_url: str, root_quantize: Path | str = './grag-quantize') → None
Downloads a file from a given URL and saves it to a specified path. It also attempts to extract the file if it is a ZIP archive.
- Parameters:
download_url – The URL of the file to download.
root_quantize – Path where the file will be saved.
- Returns:
None
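The download-and-extract behavior described above can be sketched with the standard library alone. This is a minimal sketch, not the grag implementation: it assumes the local file is named after the last segment of the URL and that ZIP archives are extracted directly into root_quantize. Unlike the documented function, it returns the path of the downloaded file so the result can be inspected.

```python
from __future__ import annotations

import shutil
import urllib.request
import zipfile
from pathlib import Path


def download_release_asset(download_url: str, root_quantize: Path | str = "./grag-quantize") -> Path:
    root = Path(root_quantize)
    # Create the target directory if it does not exist yet.
    root.mkdir(parents=True, exist_ok=True)
    # Assumption: name the local file after the last segment of the URL.
    target = root / download_url.rstrip("/").rsplit("/", 1)[-1]
    # Stream the response body to disk.
    with urllib.request.urlopen(download_url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    # Extract only when the downloaded file really is a ZIP archive.
    if zipfile.is_zipfile(target):
        with zipfile.ZipFile(target) as archive:
            archive.extractall(root)
    return target
```

Checking zipfile.is_zipfile on the downloaded bytes, rather than trusting a .zip suffix, keeps the "attempts to extract" step safe for non-archive assets.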
- grag.quantize.utils.fetch_model_repo(repo_id: str, model_path: str | Path = './grag-quantize/models') → str | Path
Downloads a model from huggingface.co/models to a specified directory.
- Parameters:
repo_id – Repository ID of the model to download (e.g., ‘huggingface/gpt2’).
model_path – The local directory where the model should be downloaded.
- Returns:
The path to the directory where the model is downloaded.
- grag.quantize.utils.get_asset_download_url(asset_name_pattern: str, user: str = 'ggerganov', repo: str = 'llama.cpp') → str | None
Fetches the download URL of the first asset that matches a given name pattern in the latest release of the specified repository.
- Parameters:
asset_name_pattern – Substring to match in the asset’s name.
user – GitHub username or organization of the repository.
repo – Repository name.
- Returns:
The download URL of the matching asset, or None if no match is found.
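One way to implement this lookup is through the GitHub REST API endpoint GET /repos/{owner}/{repo}/releases/latest, whose JSON response lists the release assets with name and browser_download_url fields. This is an assumption about how grag fetches the release, not its actual code. The helper match_asset is hypothetical; it isolates the substring matching so that step can be exercised without network access.

```python
from __future__ import annotations

import json
import urllib.request


def match_asset(release: dict, asset_name_pattern: str) -> str | None:
    """Return the download URL of the first asset whose name contains the pattern."""
    for asset in release.get("assets", []):
        if asset_name_pattern in asset.get("name", ""):
            return asset.get("browser_download_url")
    return None


def get_asset_download_url(asset_name_pattern: str, user: str = "ggerganov",
                           repo: str = "llama.cpp") -> str | None:
    # GitHub REST API endpoint for the latest published release.
    api_url = f"https://api.github.com/repos/{user}/{repo}/releases/latest"
    request = urllib.request.Request(api_url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(request) as response:
        release = json.load(response)
    return match_asset(release, asset_name_pattern)
```

Because matching is a plain substring test in list order, a broad pattern returns whichever matching asset the API lists first.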
- grag.quantize.utils.get_llamacpp_repo(repo_url: str = 'https://github.com/ggerganov/llama.cpp.git', destination_folder: str | Path = './grag-quantize') → None
Clones a GitHub repository into a local directory, creating the directory if it does not exist. If the repository has already been cloned there, it pulls the latest updates instead.
- Parameters:
repo_url – The URL of the repository to clone.
destination_folder – The local path where the repository should be cloned or updated.
- Returns:
None
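The clone-or-pull decision can be sketched with the git command-line client called through subprocess. This sketch assumes git is on PATH, that the repository is cloned directly into destination_folder, and that an existing .git directory means the repository is already cloned. The helper git_command is hypothetical; it only builds the command so the decision can be tested without running git.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def git_command(repo_url: str, destination_folder: str | Path) -> list[str]:
    """Build either a clone or a pull command depending on whether .git exists."""
    dest = Path(destination_folder)
    if (dest / ".git").is_dir():
        # Already cloned: update the existing checkout in place.
        return ["git", "-C", str(dest), "pull"]
    return ["git", "clone", repo_url, str(dest)]


def get_llamacpp_repo(repo_url: str = "https://github.com/ggerganov/llama.cpp.git",
                      destination_folder: str | Path = "./grag-quantize") -> None:
    dest = Path(destination_folder)
    # Create the directory if needed; git clone accepts an existing empty directory.
    dest.mkdir(parents=True, exist_ok=True)
    subprocess.run(git_command(repo_url, dest), check=True)
```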
- grag.quantize.utils.inference_quantized_model(root_quantize: str | Path, quantized_model_file: str | Path) → CompletedProcess
Runs the compiled inference executable on a quantized model file.
- Parameters:
root_quantize – The root directory containing the compiled inference executable.
quantized_model_file – The file path to the quantized model to use for inference.
- Returns:
The subprocess.CompletedProcess object containing the inference execution result.
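A hedged sketch of how such a call might look with subprocess.run. The executable name main and the flags -m (model file), -p (prompt) and -n (number of tokens to predict) follow the older llama.cpp example binary; newer llama.cpp builds name the CLI llama-cli. The helper inference_command and the prompt, n_predict and executable_name parameters are hypothetical and are not part of the documented signature.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def inference_command(root_quantize: str | Path, quantized_model_file: str | Path,
                      prompt: str = "Hello", n_predict: int = 32,
                      executable_name: str = "main") -> list[str]:
    """Build a llama.cpp-style command line (hypothetical defaults)."""
    exe = Path(root_quantize) / executable_name
    return [str(exe), "-m", str(quantized_model_file), "-p", prompt, "-n", str(n_predict)]


def inference_quantized_model(root_quantize: str | Path, quantized_model_file: str | Path,
                              **kwargs) -> subprocess.CompletedProcess:
    cmd = inference_command(root_quantize, quantized_model_file, **kwargs)
    # check=False returns the CompletedProcess even when the exit code is non-zero.
    return subprocess.run(cmd, capture_output=True, text=True, check=False)
```

The returned object's returncode and stdout attributes then hold the exit status and the generated text.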
- grag.quantize.utils.quantize_model(model_dir_path: str | Path, quantization: str, root_quantize: str | Path = './grag-quantize', output_dir: Path | str | None = None) → Tuple[Path, Path]
Quantizes the model in model_dir_path to the given quantization level and saves the result to output_dir. If output_dir is not specified, it defaults to a subdirectory under the model directory. The function also creates any directories it needs and handles specific exceptions raised during the conversion process.
- Parameters:
model_dir_path – The directory path of the model to be quantized. This path must exist and contain the model files.
quantization – The quantization level to apply (e.g., ‘f32’, ‘f16’). This affects the precision and size of the model.
root_quantize – The root directory containing the quantization tools and scripts. This directory should have the necessary binary files and scripts for the quantization process.
output_dir – Optional directory to save the quantized model. If not specified, the function uses a default directory based on the model directory path.
- Returns:
Returns a tuple containing the path to the root of the quantization tools and the path to the quantized model file.
- Return type:
Tuple[Path, Path]
- Raises:
PermissionError – If the quantization binaries lack execute permission. The function first attempts to modify their permissions and retry.
TypeError – If there are issues with the provided model directory or quantization parameters.
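The permission-retry behavior listed under Raises can be illustrated with the standard library: run the binary, and if the operating system refuses to execute it, add the owner's execute bit and try once more. This is a sketch of the pattern on a POSIX system, not grag's code. The helper run_with_exec_retry is hypothetical, and a second PermissionError is allowed to propagate.

```python
from __future__ import annotations

import stat
import subprocess
from pathlib import Path


def run_with_exec_retry(cmd: list[str]) -> subprocess.CompletedProcess:
    """Run cmd; on PermissionError, add the owner execute bit to cmd[0] and retry once."""
    try:
        return subprocess.run(cmd, capture_output=True, text=True, check=True)
    except PermissionError:
        binary = Path(cmd[0])
        # Keep the existing mode bits and add owner execute permission.
        binary.chmod(binary.stat().st_mode | stat.S_IXUSR)
        return subprocess.run(cmd, capture_output=True, text=True, check=True)
```

Retrying exactly once keeps the failure visible if the binary is still not executable, for example on a filesystem mounted without exec permission.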
- grag.quantize.utils.repo_id_resolver(repo_url: str) → str
Resolves the HuggingFace repository ID given a full URL to a model or dataset page.
This function parses a HuggingFace URL to extract the repository ID, which typically consists of a user or organization name followed by the repository name. If the URL does not start with the expected HuggingFace URL prefix, it returns the input URL unchanged.
- Parameters:
repo_url – The full URL string pointing to a specific HuggingFace repository.
- Returns:
The repository ID in the format ‘username/repository_name’ if the URL starts with the HuggingFace URL prefix; otherwise, the original URL unchanged.
Examples
Input: “https://huggingface.co/gpt2/models” Output: “gpt2/models”
Input: “https://huggingface.co/facebook/bart-large” Output: “facebook/bart-large”
Input: “some_other_url” Output: “some_other_url”
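The following minimal sketch reproduces the three documented examples. It assumes the prefix is exactly https://huggingface.co/ and that the ID consists of the first two path segments after it. Trimming anything after the second segment (such as a /tree/main suffix) is a choice made for this sketch; the documented function may handle such URLs differently.

```python
def repo_id_resolver(repo_url: str) -> str:
    prefix = "https://huggingface.co/"
    if not repo_url.startswith(prefix):
        # Not a Hugging Face URL: return the input unchanged.
        return repo_url
    # Split the remainder into non-empty path segments and keep owner/name.
    parts = [part for part in repo_url[len(prefix):].split("/") if part]
    return "/".join(parts[:2])
```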