Integrates with DeepDoc OCR and document parsing services to ingest documents into a vector store
The DeepDoc Client Action connects to the JIVAS DeepDoc service for document processing, chunking, and metadata extraction, and adds file serving and document management on top. Users can upload documents or URLs in batches in multiple formats, including PDF, DOCX, Excel, PPT, and TXT. Processing runs asynchronously; once it completes, the results are ingested and indexed in your configured vector store.
The DeepDoc Client Action also provides document handling and management, with built-in file serving integrated with your configured vector store:
File Serving and Integration: Uploaded documents can be served directly and are linked to their stored content in the vector store. Deleting a processed document through this interface also removes all associated references and indexed vector content, so the file store and the vector store stay consistent.
Document Management Interface: A management interface lists all ingested documents and lets users add or remove documents from the repository.
Duplicate Prevention Safeguards: The system checks that each uploaded document is unique within the repository and vector store, so the same document is never processed twice.
This package is a singleton action. It requires the Jivas library (version 2.0.0 or higher), a properly configured vector_store_action, and the Jivas-modified DeepDoc service.
Package: jivas/deepdoc_client_action
Archetype: DeepDocStoreAction
Jivas version: ^2.0.0
The DeepDoc Client Action exposes its operations as walkers, all called through the /action/walker API endpoint. The request fields and a payload example for each walker are given below.
POST /action/walker
agent_id: The ID of the agent
module_root: For this action, the module root is 'actions.jivas.deepdoc_client_action'
walker: The name of the walker (e.g., list_documents, add_documents, delete_documents)
args: The arguments passed to the respective walker
files: A list of files for upload, supplied as a list of dictionaries, each containing name, content, and type
The add_documents walker processes documents (uploaded files or URLs) and ingests them into the vector store.
Payload Example:
{
"agent_id": "12345",
"module_root": "actions.jivas.deepdoc_client_action",
"walker": "add_documents",
"args": {
"urls": ["http://example.com/document.pdf"],
"metadatas": [{"author": "John Doe"}],
"from_page": 0,
"to_page": 100000,
"lang": "english"
},
"files": [
{
"name": "document1.pdf",
"content": "byte data",
"type": "application/pdf"
}
]
}
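As a minimal sketch, the add_documents payload above can be assembled in Python from files on disk. The example only shows "content" as a placeholder ("byte data"), so this sketch assumes the file bytes are sent base64-encoded. That may not match what your JIVAS deployment expects; some deployments may use a multipart upload instead. The helper name build_add_documents_payload is hypothetical and not part of the package.

```python
import base64
import mimetypes
from pathlib import Path


def build_add_documents_payload(agent_id, paths=(), urls=(), metadatas=None,
                                from_page=0, to_page=100000, lang="english"):
    """Hypothetical helper: builds an add_documents request body.

    Assumption: file bytes travel base64-encoded in "content"; verify what
    your deployment actually expects before relying on this.
    """
    files = []
    for path in map(Path, paths):
        # Guess the MIME type from the extension, e.g. .pdf -> application/pdf.
        mime, _ = mimetypes.guess_type(path.name)
        files.append({
            "name": path.name,
            "content": base64.b64encode(path.read_bytes()).decode("ascii"),
            "type": mime or "application/octet-stream",
        })
    return {
        "agent_id": agent_id,
        "module_root": "actions.jivas.deepdoc_client_action",
        "walker": "add_documents",
        "args": {
            "urls": list(urls),
            "metadatas": list(metadatas or []),
            "from_page": from_page,
            "to_page": to_page,
            "lang": lang,
        },
        "files": files,
    }
```

URLs and files can be mixed in one request, as in the JSON example. The page range and language defaults simply mirror the values shown there.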
The list_documents walker lists all documents processed by the DeepDoc service.
Payload Example:
{
"agent_id": "12345",
"module_root": "actions.jivas.deepdoc_client_action",
"walker": "list_documents",
"args": {}
}
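Every walker is called the same way: as a JSON POST to /action/walker. The following is a minimal sketch using only Python's standard library. The default base URL reuses the JIVAS_BASE_URL default from the configuration table. The bearer-token Authorization header is an assumption about how your JIVAS instance authenticates requests. Both make_walker_request and call_walker are hypothetical names.

```python
import json
import urllib.request


def make_walker_request(payload, base_url="http://localhost:8000", token=None):
    """Hypothetical helper: wraps a walker payload in a POST to /action/walker.

    Assumption: authenticated instances accept a bearer token; adjust the
    header to match your deployment.
    """
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        base_url.rstrip("/") + "/action/walker",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def call_walker(payload, **kwargs):
    """Sends the request and decodes the JSON response body."""
    with urllib.request.urlopen(make_walker_request(payload, **kwargs)) as resp:
        return json.loads(resp.read().decode("utf-8"))


# The list_documents payload from the example above.
list_payload = {
    "agent_id": "12345",
    "module_root": "actions.jivas.deepdoc_client_action",
    "walker": "list_documents",
    "args": {},
}
```

Against a running instance, call_walker(list_payload, token="...") returns the decoded response. The response shape is not documented here, so inspect it before depending on specific fields.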
The delete_documents walker removes documents from the vector store and deletes their local file system entries.
Payload Example:
{
"agent_id": "12345",
"module_root": "actions.jivas.deepdoc_client_action",
"walker": "delete_documents",
"args": {
"documents": [
{
"job_id": "67890",
"filename": "document1.pdf"
}
]
}
}
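The delete_documents walker takes a list of job_id/filename pairs. The following sketch builds that payload from pairs you already hold, for example ones noted from a list_documents response, whose exact shape is not documented here. The helper name build_delete_payload is hypothetical. Converting job IDs to strings is an assumption based on the "67890" example.

```python
def build_delete_payload(agent_id, documents):
    """Hypothetical helper: builds a delete_documents request body.

    `documents` is an iterable of (job_id, filename) pairs. Job IDs are
    normalized to strings (assumption, following the example payload), and
    repeated pairs are dropped so each document is listed once.
    """
    seen = set()
    entries = []
    for job_id, filename in documents:
        key = (str(job_id), filename)
        if key in seen:
            continue
        seen.add(key)
        entries.append({"job_id": key[0], "filename": filename})
    return {
        "agent_id": agent_id,
        "module_root": "actions.jivas.deepdoc_client_action",
        "walker": "delete_documents",
        "args": {"documents": entries},
    }
```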
This action can be configured either through environment variables or directly in the action's app settings.
Variable Name | Description | Default Value | Required |
---|---|---|---|
DEEPDOC_API_URL | API endpoint URL of your DeepDoc service | http://localhost:8001 | Yes |
DEEPDOC_API_KEY | Your DeepDoc API authentication token | api-key (replace with a secure value) | Yes |
JIVAS_BASE_URL | Base URL of your Jivas instance; required for the DeepDoc callback to function | http://localhost:8000 | Yes |
VECTOR_STORE_ACTION | Action used to store vector data | TypesenseVectorStoreAction | Yes |
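The following is a minimal sketch of how these variables and their defaults fit together, written as a standalone Python reader. It is not the action's own loading code, and DeepDocEnv is a hypothetical name. Treating an empty variable as unset is a choice made by this sketch.

```python
import os
from dataclasses import dataclass

# Defaults from the configuration table (the API key is only a placeholder).
DEFAULTS = {
    "DEEPDOC_API_URL": "http://localhost:8001",
    "DEEPDOC_API_KEY": "api-key",
    "JIVAS_BASE_URL": "http://localhost:8000",
    "VECTOR_STORE_ACTION": "TypesenseVectorStoreAction",
}


@dataclass(frozen=True)
class DeepDocEnv:
    """Hypothetical container for the four documented variables."""

    api_url: str
    api_key: str
    jivas_base_url: str
    vector_store_action: str

    @classmethod
    def from_env(cls, env=None):
        """Reads each variable; unset or empty values fall back to the default."""
        env = os.environ if env is None else env

        def get(name):
            return env.get(name) or DEFAULTS[name]

        return cls(
            api_url=get("DEEPDOC_API_URL"),
            api_key=get("DEEPDOC_API_KEY"),
            jivas_base_url=get("JIVAS_BASE_URL"),
            vector_store_action=get("VECTOR_STORE_ACTION"),
        )

    def uses_placeholder_key(self):
        """True while DEEPDOC_API_KEY is still the documented placeholder."""
        return self.api_key == DEFAULTS["DEEPDOC_API_KEY"]
```

Checking uses_placeholder_key() at startup is one way to catch a deployment that is still running with the default token.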
You can set these values as environment variables so they are available to your runtime environment.
In your shell configuration (e.g., .bashrc):
export DEEPDOC_API_URL="https://your-custom-deepdoc-url.com"
export DEEPDOC_API_KEY="your-secure-api-key"
export JIVAS_BASE_URL="https://your-jivas-base-url.com"
In Docker Compose (docker-compose.yml):
environment:
  DEEPDOC_API_URL: "https://your-custom-deepdoc-url.com"
  DEEPDOC_API_KEY: "your-secure-api-key"
  JIVAS_BASE_URL: "https://your-jivas-base-url.com"
Alternatively, you can configure these variables within the action app's settings interface in the JIVAS manager. This is useful if you prefer not to use environment variables or want easy adjustments through the GUI.
From within the action's configuration page, enter your values (api_url, api_key, and vector_store_action) into their respective configuration fields.
Note: Configuration values set directly within the action app override those provided via environment variables.
After adjusting settings, restart your service or action to apply your changes.
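The precedence rule in the note above can be modeled in a few lines. This sketch assumes that an empty field in the app counts as unset, which the source does not specify. It also assumes that each app field corresponds to the environment variable with the same purpose. Both FIELD_TO_ENV and resolve_settings are hypothetical names.

```python
# Hypothetical mapping from app-settings fields to environment variables.
FIELD_TO_ENV = {
    "api_url": "DEEPDOC_API_URL",
    "api_key": "DEEPDOC_API_KEY",
    "vector_store_action": "VECTOR_STORE_ACTION",
}


def resolve_settings(app_settings, env):
    """Returns effective values: a non-empty app setting wins over the env var."""
    resolved = {}
    for field, env_name in FIELD_TO_ENV.items():
        app_value = app_settings.get(field)
        resolved[field] = app_value if app_value else env.get(env_name)
    return resolved
```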
When running locally, you must configure either local or Amazon S3 file serving to enable document serving. Set the following variables in your .env file:
# Set Environment
JIVAS_ENVIRONMENT=development
# JIVAS File Serving Config (Local)
JIVAS_FILE_INTERFACE="local"
JIVAS_FILES_ROOT_PATH=".files"
JIVAS_FILES_URL="http://127.0.0.1:9000/files"
# Alternative: Amazon S3 Configuration (uncomment and use instead of the local block above)
# JIVAS_FILE_INTERFACE="s3"
# JIVAS_S3_ENDPOINT="your-s3-endpoint" # optional, typically for custom endpoints
# JIVAS_S3_ACCESS_KEY_ID="your-access-key-id"
# JIVAS_S3_SECRET_ACCESS_KEY="your-secret-access-key"
# JIVAS_S3_REGION="us-east-1"
# JIVAS_S3_BUCKET_NAME="your-bucket-name"
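Before starting, you can run a quick sanity check on these values. The following sketch treats every variable in the example as required for its interface, except JIVAS_S3_ENDPOINT, which the example marks optional. That split is an assumption, not a documented rule. The function name missing_file_serving_vars is hypothetical.

```python
# Variables each file interface needs, based on the .env example (assumption).
REQUIRED_BY_INTERFACE = {
    "local": ["JIVAS_FILES_ROOT_PATH", "JIVAS_FILES_URL"],
    "s3": [
        "JIVAS_S3_ACCESS_KEY_ID",
        "JIVAS_S3_SECRET_ACCESS_KEY",
        "JIVAS_S3_REGION",
        "JIVAS_S3_BUCKET_NAME",
    ],  # JIVAS_S3_ENDPOINT is optional, so it is not checked
}


def missing_file_serving_vars(env):
    """Returns the names of unset variables for the selected file interface."""
    interface = env.get("JIVAS_FILE_INTERFACE", "").strip().lower()
    if interface not in REQUIRED_BY_INTERFACE:
        # No recognized interface selected: that is the first thing to fix.
        return ["JIVAS_FILE_INTERFACE"]
    return [name for name in REQUIRED_BY_INTERFACE[interface] if not env.get(name)]
```

Pass os.environ (after your .env file has been loaded) to check the live configuration.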
Running locally also requires MongoDB and a suitable vector store (we strongly recommend Typesense). Include the following configuration in your .env file:
# MongoDB Config
DATABASE_HOST="mongodb://localhost:27017/?replicaSet=my-rs"
# Typesense Vector Store Config
TYPESENSE_HOST="localhost"
TYPESENSE_PORT=8108
TYPESENSE_PROTOCOL="http"
TYPESENSE_API_KEY="abcd"
TYPESENSE_CONNECTION_TIMEOUT_SECONDS=2
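To confirm that Typesense is reachable with these settings, you can call its GET /health endpoint, which reports {"ok": true} when the node is healthy. The following is a small standard-library sketch. Its fallback defaults mirror the .env example above, and the function names are hypothetical.

```python
import json
import urllib.request


def typesense_base_url(env):
    """Builds the Typesense node URL from the .env values (defaults mirror the example)."""
    protocol = env.get("TYPESENSE_PROTOCOL", "http")
    host = env.get("TYPESENSE_HOST", "localhost")
    port = int(env.get("TYPESENSE_PORT", 8108))
    return f"{protocol}://{host}:{port}"


def typesense_is_healthy(env):
    """Calls GET /health; returns True only if the node reports ok."""
    req = urllib.request.Request(
        typesense_base_url(env) + "/health",
        headers={"X-TYPESENSE-API-KEY": env.get("TYPESENSE_API_KEY", "")},
    )
    timeout = float(env.get("TYPESENSE_CONNECTION_TIMEOUT_SECONDS", 2))
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.loads(resp.read()).get("ok") is True
    except (OSError, ValueError):
        # Connection errors, timeouts, HTTP errors, or a non-JSON body.
        return False
```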
To contribute, fork the repository, then:
git clone https://github.com/YOUR_USERNAME/deepdoc_client_action
cd deepdoc_client_action
git checkout -b feature-xyz
git add .
git commit -m 'feat: add XYZ feature'
git push origin feature-xyz
This project is licensed under the Apache License 2.0. See the LICENSE file for additional licensing information.
jvcli download action jivas/deepdoc_client_action
Last published: 2 months ago
Version: 0.0.5
Downloads: 45
Author: jivas
Type: action
Visibility: Public