Components#
VectorDB#
Embedding#
Class for embedding.
This module provides:
— Embedding
LLM#
Class for LLM.
- grag.components.llm.LLM(model_name: str, quantization: str, pipeline: str, device_map: str = 'auto', task: str = 'text-generation', max_new_tokens: str = '1024', temperature: str | float = 0.1, n_batch: str | int = 1024, n_ctx: str | int = 6000, n_gpu_layers: str | int = -1, std_out: bool | str = True, base_dir: str | Path = PosixPath('models'), callbacks=None)[source]#
A class for managing and utilizing large language models (LLMs).
This class facilitates the loading and operation of large language models using different pipelines and settings. It supports both local and Hugging Face-based model management, with adjustable parameters for quantization, computational specifics, and output control.
- grag.components.llm.quantization[source]#
Quantization setting for the model, affecting performance and memory usage.
- Type:
str
- grag.components.llm.pipeline[source]#
Type of pipeline (‘llama_cpp’ or ‘hf’) used for model operations.
- Type:
str
- grag.components.llm.device_map[source]#
Device mapping for model execution, defaults to ‘auto’.
- Type:
str
- grag.components.llm.task[source]#
The task for which the model is being used, defaults to ‘text-generation’.
- Type:
str
- grag.components.llm.max_new_tokens[source]#
Maximum number of new tokens to be generated, defaults to 1024.
- Type:
int
- grag.components.llm.temperature[source]#
Sampling temperature for generation, affecting randomness.
- Type:
float
- grag.components.llm.n_batch[source]#
Batch size for llama.cpp: the number of prompt tokens processed in parallel.
- Type:
int
- grag.components.llm.n_ctx[source]#
Context window size, in tokens, for llama.cpp.
- Type:
int
- grag.components.llm.n_gpu_layers[source]#
Number of model layers to offload to the GPU for llama.cpp; the default of -1 offloads all layers.
- Type:
int
- grag.components.llm.std_out[source]#
Flag or descriptor for standard output during operations.
- Type:
bool or str
Retriever#
Class for retriever.
This module provides:
— Retriever
- grag.components.multivec_retriever.Retriever(vectordb: VectorDB | None = None, store_path: str | Path = PosixPath('data/doc_store'), top_k: str | int = 3, id_key: str = 'doc_id', namespace: str = '71e4b558187b270922923569301f1039', client_kwargs: Dict[str, Any] | None = None)[source]#
A class for multi vector retriever.
It connects to a vector database and a local file store. It returns the most similar chunks from the vector store and can additionally return a linked document, chunk, etc.
- grag.components.multivec_retriever.vectordb[source]#
ChromaClient class instance from components.client. Optional; if the user provides it, store_path, id_key, and namespace are not considered.
- grag.components.multivec_retriever.store[source]#
langchain.storage.LocalFileStore object; stores key-value pairs of document ID and parent file
- grag.components.multivec_retriever.retriever[source]#
langchain.retrievers.multi_vector.MultiVectorRetriever class instance, langchain’s multi-vector retriever
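The idea behind a multi-vector retriever can be illustrated without LangChain. The following is a minimal sketch, not grag's implementation: `Chunk`, `top_chunks`, `parent_docs`, the dot-product scoring, and the in-memory `doc_store` dictionary are all hypothetical stand-ins. It shows only how a `doc_id` key on each chunk links a retrieved chunk back to its parent document.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Chunk:
    text: str
    vector: List[float]
    doc_id: str  # plays the role of id_key: links the chunk to its parent


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def top_chunks(query_vec: List[float], chunks: List[Chunk], top_k: int = 3) -> List[Chunk]:
    """Return the top_k chunks ranked by dot-product similarity (assumed metric)."""
    ranked = sorted(chunks, key=lambda c: dot(query_vec, c.vector), reverse=True)
    return ranked[:top_k]


def parent_docs(query_vec: List[float], chunks: List[Chunk],
                doc_store: Dict[str, str], top_k: int = 3) -> List[str]:
    """Return the distinct parent documents of the top_k chunks, best match first."""
    seen, docs = set(), []
    for chunk in top_chunks(query_vec, chunks, top_k):
        if chunk.doc_id not in seen:
            seen.add(chunk.doc_id)
            docs.append(doc_store[chunk.doc_id])
    return docs


# Toy data: two chunks from one document, one chunk from another.
chunks = [
    Chunk("cats purr", [1.0, 0.0], "doc-a"),
    Chunk("cats nap", [0.9, 0.1], "doc-a"),
    Chunk("dogs bark", [0.0, 1.0], "doc-b"),
]
doc_store = {"doc-a": "Full text about cats.", "doc-b": "Full text about dogs."}

print(parent_docs([1.0, 0.0], chunks, doc_store, top_k=2))
```

Both of the best-matching chunks share `doc-a`, so the call returns that parent document once rather than twice.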
Parse PDF#
Classes for parsing files.
This module provides:
— ParsePDF
- grag.components.parse_pdf.ParsePDF(single_text_out: bool = True, strategy: str = 'hi_res', infer_table_structure: bool = True, extract_images: bool = True, image_output_dir: str | None = None, add_captions_to_text: bool = True, add_captions_to_blocks: bool = True, add_caption_first: bool = True, table_as_html: bool = False)[source]#
Parsing and partitioning PDF documents into Text, Table or Image elements.
- grag.components.parse_pdf.single_text_out[source]#
Whether to combine all text elements into a single output document.
- Type:
bool
- grag.components.parse_pdf.strategy[source]#
The strategy for PDF partitioning; default is “hi_res” for better accuracy.
- Type:
str
- grag.components.parse_pdf.infer_table_structure[source]#
Whether to extract tables during partitioning.
- Type:
bool
- grag.components.parse_pdf.image_output_dir[source]#
Directory to save extracted images, if any.
- Type:
str
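- grag.components.parse_pdf.add_captions_to_text[source]#
Whether to include figure captions in text output. Default is True.
- Type:
bool
- grag.components.parse_pdf.add_captions_to_blocks[source]#
Whether to add captions to table and image blocks. Default is True.
- Type:
bool
- grag.components.parse_pdf.add_caption_first[source]#
Whether to place captions before their corresponding image or table in the output. Default is True.
- Type:
bool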
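The effect of the add_caption_first option from the ParsePDF signature can be pictured with a small helper. This is only a sketch under stated assumptions: `attach_caption` is a hypothetical function, and the blank-line separator is an assumption rather than ParsePDF's actual output format.

```python
def attach_caption(block: str, caption: str, caption_first: bool = True) -> str:
    """Join a table/image block with its caption, caption first or last."""
    parts = [caption, block] if caption_first else [block, caption]
    # Skip empty parts so a missing caption leaves the block unchanged.
    return "\n\n".join(p for p in parts if p)


before = attach_caption("<table>...</table>", "Table 1: Results")
after = attach_caption("<table>...</table>", "Table 1: Results", caption_first=False)
```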
Prompt#
Classes for prompts.
This module provides:
— Prompt: for generic prompts
— FewShotPrompt: for few-shot prompts
- class grag.components.prompt.FewShotPrompt(*, name: str = 'custom_prompt', llm_type: str = 'None', task: str = 'QA', source: str = 'NoSource', doc_chain: str = 'stuff', language: str = 'en', filepath: str | None = None, input_keys: List[str], template: str, prompt: PromptTemplate | None = None, output_keys: List[str], examples: List[Dict[str, Any]], prefix: str, suffix: str, example_template: str)[source]#
Bases:
Prompt
A class for few-shot prompts.
- llm_type[source]#
The type of LLM, e.g. llama2 (Optional, defaults to “None”) (Parent Class)
- Type:
str
- source[source]#
The source of the prompt (Optional, defaults to “NoSource”) (Parent Class)
- Type:
str
- doc_chain[source]#
The doc chain for the prompt (“stuff”, “refine”) (Optional, defaults to “stuff”) (Parent Class)
- Type:
str
- examples[source]#
The list of examples, each example is a dictionary with respective keys
- Type:
List[Dict[str, Any]]
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}[source]#
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {}[source]#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields: ClassVar[dict[str, FieldInfo]] = {'doc_chain': FieldInfo(annotation=str, required=False, default='stuff'), 'example_template': FieldInfo(annotation=str, required=True), 'examples': FieldInfo(annotation=List[Dict[str, Any]], required=True), 'filepath': FieldInfo(annotation=Union[str, NoneType], required=False, default=None, exclude=True), 'input_keys': FieldInfo(annotation=List[str], required=True), 'language': FieldInfo(annotation=str, required=False, default='en'), 'llm_type': FieldInfo(annotation=str, required=False, default='None'), 'name': FieldInfo(annotation=str, required=False, default='custom_prompt'), 'output_keys': FieldInfo(annotation=List[str], required=True), 'prefix': FieldInfo(annotation=str, required=True), 'prompt': FieldInfo(annotation=Union[PromptTemplate, NoneType], required=False, default=None, exclude=True, repr=False), 'source': FieldInfo(annotation=str, required=False, default='NoSource'), 'suffix': FieldInfo(annotation=str, required=True), 'task': FieldInfo(annotation=str, required=False, default='QA'), 'template': FieldInfo(annotation=str, required=True)}[source]#
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
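How the prefix, examples, example_template, and suffix fields might combine into one prompt string can be sketched with plain string formatting. This is an illustration only: `build_few_shot_prompt` is a hypothetical helper, the blank-line separator is an assumption, and the class itself may assemble the prompt differently.

```python
from typing import Any, Dict, List


def build_few_shot_prompt(prefix: str, examples: List[Dict[str, Any]],
                          example_template: str, suffix: str, **inputs: Any) -> str:
    """Render each example with example_template, then frame them with prefix and suffix."""
    rendered = [example_template.format(**example) for example in examples]
    # The suffix carries the actual question, filled from the input keys.
    return "\n\n".join([prefix, *rendered, suffix.format(**inputs)])


prompt_text = build_few_shot_prompt(
    prefix="Answer briefly.",
    examples=[{"question": "2+2?", "answer": "4"}],
    example_template="Q: {question}\nA: {answer}",
    suffix="Q: {question}\nA:",
    question="3+3?",
)
```

Each example dictionary must supply every key named in example_template, which is why examples is typed as a list of dictionaries with "respective keys".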
- class grag.components.prompt.Prompt(*, name: str = 'custom_prompt', llm_type: str = 'None', task: str = 'QA', source: str = 'NoSource', doc_chain: str = 'stuff', language: str = 'en', filepath: str | None = None, input_keys: List[str], template: str, prompt: PromptTemplate | None = None)[source]#
Bases:
BaseModel
A class for generic prompts.
- doc_chain[source]#
The doc chain for the prompt (“stuff”, “refine”) (Optional, defaults to “stuff”)
- Type:
str
- template[source]#
The template for the prompt
- Type:
str
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}[source]#
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {}[source]#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields: ClassVar[dict[str, FieldInfo]] = {'doc_chain': FieldInfo(annotation=str, required=False, default='stuff'), 'filepath': FieldInfo(annotation=Union[str, NoneType], required=False, default=None, exclude=True), 'input_keys': FieldInfo(annotation=List[str], required=True), 'language': FieldInfo(annotation=str, required=False, default='en'), 'llm_type': FieldInfo(annotation=str, required=False, default='None'), 'name': FieldInfo(annotation=str, required=False, default='custom_prompt'), 'prompt': FieldInfo(annotation=Union[PromptTemplate, NoneType], required=False, default=None, exclude=True, repr=False), 'source': FieldInfo(annotation=str, required=False, default='NoSource'), 'task': FieldInfo(annotation=str, required=False, default='QA'), 'template': FieldInfo(annotation=str, required=True)}[source]#
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
Text Splitter#
Class for splitting/chunking text.
This module provides:
— TextSplitter
Utils#
Utility functions.
This module provides:
— stuff_docs: concatenates LangChain documents into a string
— load_prompt: loads a JSON prompt into a LangChain prompt
— find_config_path: finds the path of the ‘config.ini’ file by traversing up the directory tree from the current path.
— get_config: retrieves and parses the configuration settings from the ‘config.ini’ file.
— configure_args: a decorator to configure class instantiation arguments from a ‘config.ini’ file.
- grag.components.utils.configure_args(cls)[source]#
Decorator to configure class instantiation arguments from a ‘config.ini’ file, based on the class’s module name.
This function reads configuration specific to a class’s module from ‘config.ini’, then uses it to override or provide defaults for keyword arguments passed during class instantiation.
- Parameters:
cls (class) – The class whose instantiation is to be configured.
- Returns:
A wrapped class constructor that uses modified arguments based on the configuration.
- Return type:
function
- Raises:
TypeError – If there is a mismatch in provided arguments and class constructor requirements.
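The decorator pattern described here can be sketched as follows. This is not the library's code: `configure_args_sketch` and `DemoLLM` are hypothetical, `FAKE_CONFIG` stands in for the parsed 'config.ini', and the sketch assumes that the section name is the last component of the class's module name and that explicitly passed keyword arguments take precedence over configured values.

```python
import functools

# Hypothetical stand-in for the parsed 'config.ini': section name -> options.
FAKE_CONFIG = {"llm": {"model_name": "example-model", "temperature": "0.1"}}


def configure_args_sketch(cls):
    """Fill missing keyword arguments from the config section named after cls's module."""
    section = cls.__module__.split(".")[-1]
    original_init = cls.__init__

    @functools.wraps(original_init)
    def new_init(self, *args, **kwargs):
        merged = dict(FAKE_CONFIG.get(section, {}))
        merged.update(kwargs)  # assumption: explicit keyword arguments win
        original_init(self, *args, **merged)

    cls.__init__ = new_init
    return cls


@configure_args_sketch
class DemoLLM:
    __module__ = "grag.components.llm"  # pretend the class lives in that module

    def __init__(self, model_name, temperature="0.7"):
        self.model_name = model_name
        self.temperature = temperature


configured = DemoLLM()                    # both values come from FAKE_CONFIG
overridden = DemoLLM(temperature="0.9")   # explicit keyword replaces the configured one
```

In this sketch a configured key that the constructor does not accept, or a positional argument that collides with a configured keyword, surfaces as a TypeError from the original constructor.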
- grag.components.utils.find_config_path(current_path: Path)[source]#
Finds the path of the ‘config.ini’ file by traversing up the directory tree from the current path.
This function starts at the current path and moves up the directory tree until it finds a file named ‘config.ini’. If ‘config.ini’ is not found by the time the root of the directory tree is reached, None is returned.
- Parameters:
current_path (Path) – The starting point for the search, typically the location of the script being executed.
- Returns:
The path to the found ‘config.ini’ file, or None if it is not found.
- Return type:
Path or None
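The upward search can be sketched with pathlib alone. The function name `find_config_path_sketch` and its `name` parameter are hypothetical and added only to make the example testable; the library function takes just the starting path.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_config_path_sketch(current_path: Path, name: str = "config.ini") -> Optional[Path]:
    """Walk from current_path up to the filesystem root looking for a file called name."""
    current = current_path.resolve()
    for directory in [current, *current.parents]:
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None  # reached the root without finding the file


# Demo: a config file two levels above the starting directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / "config.ini").write_text("[llm]\n")
    nested = root / "a" / "b"
    nested.mkdir(parents=True)
    found_matches = find_config_path_sketch(nested) == root / "config.ini"
    missing = find_config_path_sketch(nested, name="no-such-file.ini")
```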
- grag.components.utils.get_config(load_env=False)[source]#
Retrieves and parses the configuration settings from the ‘config.ini’ file.
This function locates the ‘config.ini’ file by calling find_config_path using the script’s current location. It initializes a ConfigParser object to read the configuration settings from the located ‘config.ini’ file. Optionally, it can also load environment variables from a .env file specified in the config. If a config file cannot be read, a default dictionary is returned.
- Parameters:
load_env (bool) – If True, load environment variables from the path specified in the ‘config.ini’. Defaults to False.
- Returns:
A parser object containing the configuration settings from ‘config.ini’, or a defaultdict that yields None (behaving like an empty nested dict) if the file is not found.
- Return type:
ConfigParser
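The read-or-fall-back behavior can be sketched with the standard library. This is a simplified illustration, not the library's code: `get_config_sketch` is hypothetical, takes the path directly instead of calling find_config_path, omits the load_env step, and assumes the fallback is a defaultdict whose missing keys yield None. The `[llm]` section and `n_ctx` key are example values.

```python
import tempfile
from collections import defaultdict
from configparser import ConfigParser
from pathlib import Path
from typing import Optional


def get_config_sketch(config_path: Optional[Path]):
    """Parse config_path if it is a file; otherwise return a None-yielding fallback."""
    if config_path is not None and config_path.is_file():
        parser = ConfigParser()
        parser.read(config_path)
        return parser
    # Assumed fallback shape: any missing key yields None.
    return defaultdict(lambda: None)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "config.ini"
    path.write_text("[llm]\nn_ctx = 6000\n")
    n_ctx = get_config_sketch(path)["llm"]["n_ctx"]  # ConfigParser values are strings

fallback = get_config_sketch(None)
```

Because ConfigParser returns every value as a string, `n_ctx` here is `"6000"` rather than an integer, so callers need to convert numeric settings themselves.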