01/27/2025 | News release | Distributed by Public on 01/26/2025 19:07
As technologies around large language models (LLMs) evolve rapidly, two key questions have emerged: how to utilize LLM services securely, and how to generate beneficial and accurate answers for each organization. Current LLM services are commonly provided as Software-as-a-Service (SaaS) products, with customer data and queries sent through public clouds. This poses significant obstacles for many organizations aiming to utilize LLMs securely. Additionally, LLMs generate responses based on their training data, making it challenging to produce answers that are relevant to customers' specific use cases without the latest information or access to internal knowledge.
To address the first challenge, LLMs should be hosted within an on-premise environment, where customers have full control over their data. For the second challenge, retrieval-augmented generation (RAG) is widely adopted. RAG utilizes customizable knowledge bases as additional contextual information to enhance the LLMs' responses. Typically, materials containing the additional knowledge (such as internal documents and system log messages) are vectorized and stored in a vector database (vectorDB). When a query is executed, related knowledge content is identified through vector similarity search and added to the context of the LLM prompt to assist the generation process.
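The retrieval and prompt-augmentation steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the DSDL implementation: the `embed` function is a toy bag-of-words stand-in for a real embedding model, the in-memory `store` list stands in for a vectorDB collection, and the sample documents and all function names are made up for the example.

```python
import math
from collections import Counter


def embed(text):
    # Toy stand-in for an embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, store, top_k=2):
    # store is a list of (text, vector) pairs standing in for a vectorDB collection.
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


def build_prompt(query, context_chunks):
    # Prepend the retrieved chunks to the user query as contextual information.
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return f"Context:\n{context}\n\nQuestion: {query}"


# Made-up knowledge base, vectorized once and kept in memory.
documents = [
    "Payment processing error is resolved by retrying the card gateway",
    "Shipping delays are reported to the logistics team",
    "Socket error broken pipe appears when a client disconnects early",
]
store = [(doc, embed(doc)) for doc in documents]

query = "How was the payment processing error resolved"
prompt = build_prompt(query, retrieve(query, store, top_k=1))
```

The resulting `prompt` string is what would be sent to the LLM in place of the bare query, so the model can ground its answer in the retrieved content.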
In this release, we extend the Splunk App for Data Science and Deep Learning (DSDL) with the architecture of an on-premise LLM-RAG system (Fig. 1) and provide a set of commands and dashboards that make its functionalities easy to use from within the Splunk DSDL app. Our system comprises three key components: an Ollama module for the local LLM service, a Milvus module for the local vectorDB, and a DSDL container that uses the LlamaIndex framework to orchestrate the LLM and vectorDB modules for RAG operations.
Fig. 1: on-premise LLM-RAG system
Based on this architecture, we extend Splunk DSDL with the following four use cases:
1. Standalone LLM: using an LLM for zero-shot Q&A or natural language processing tasks.
2. Standalone VectorDB: using VectorDB to encode data from Splunk and conduct similarity search.
3. Document-based LLM-RAG: encoding documents such as internal knowledge bases or past support tickets into VectorDB and using them as contextual information for LLM generation.
4. Function-Calling-based LLM-RAG: defining function tools in the Jupyter notebook for the LLM to execute automatically in order to obtain contextual information for generation.
In the rest of this blog, we provide brief introductions and example outputs of the four use cases.
The standalone LLM can be used to conduct Q&A based on data ingested in Splunk. In the following example, we classify text data ingested in Splunk using the dashboard provided in Splunk DSDL 5.2. First, a Splunk search is run to output the target text messages. Then, on the Setting panel, we choose the LLM and enter the following prompt in the text box:
Categorize the text. Choose from Biology, Math, Computer Science and Physics. Only output the category name in your answer.
When the Run Inference button is clicked, the local LLM receives the text data along with the prompt. When generation finishes, the result and the runtime for each text record are shown on the output panel.
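For readers who want to reproduce this kind of classification outside the dashboard, the sketch below shows one possible way to send the same prompt to a locally running Ollama server through its `/api/generate` REST endpoint. It is not how the DSDL dashboard is implemented. The model name `llama3`, the default address `http://localhost:11434`, and the helper names are assumptions made for illustration.

```python
import json
import urllib.request

CATEGORIES = ["Biology", "Math", "Computer Science", "Physics"]
INSTRUCTION = (
    "Categorize the text. Choose from Biology, Math, Computer Science and Physics. "
    "Only output the category name in your answer."
)


def build_payload(text, model="llama3"):
    # Combine the fixed instruction with one text record.
    # "llama3" is an assumed model name; use whichever model is pulled locally.
    return {"model": model, "prompt": f"{INSTRUCTION}\n\nText: {text}", "stream": False}


def normalize_category(answer):
    # Map the raw LLM answer onto one of the allowed categories, or None if no match.
    cleaned = answer.strip().strip(".").lower()
    for category in CATEGORIES:
        if cleaned == category.lower():
            return category
    return None


def classify(text, url="http://localhost:11434/api/generate"):
    # Send one non-streaming request to the local Ollama server (assumed address).
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return normalize_category(json.loads(response.read())["response"])
```

Calling `classify("some text")` once per record requires a running Ollama instance. Normalizing the answer guards against the model adding whitespace or trailing punctuation despite the instruction to output only the category name.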
In DSDL 5.2, dashboards are provided to encode Splunk data into VectorDB collections and to conduct similarity searches against those collections. In the following example, an error log in the Splunk _internal index is matched with past logs in a VectorDB collection using vector search based on cosine similarity. The log selected from the search result (Socket error … broken pipe) is matched with the four most similar log messages recorded in the collection named Internal_Error_Logs, as configured in the Setting panel at the top.
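To make the ranking step concrete, here is a minimal sketch of a top-k search by cosine similarity over dense vectors. It assumes the embeddings have already been computed; the three-dimensional vectors and log messages are made up, and a vectorDB such as Milvus would perform this search with an index rather than the brute-force scan shown here.

```python
import heapq
import math


def normalize(vector):
    # Scale a vector to unit length so that a dot product equals cosine similarity.
    length = math.sqrt(sum(x * x for x in vector))
    return [x / length for x in vector] if length else list(vector)


def top_k_similar(query_vector, collection, k=4):
    # collection maps a log message to its embedding.
    # Returns up to k (score, message) pairs, highest similarity first.
    q = normalize(query_vector)
    scored = (
        (sum(a * b for a, b in zip(q, normalize(vec))), message)
        for message, vec in collection.items()
    )
    return heapq.nlargest(k, scored)


# Made-up embeddings for three hypothetical past log messages.
collection = {
    "socket write failed": [0.9, 0.1, 0.0],
    "socket connection reset": [0.8, 0.2, 0.1],
    "license quota exceeded": [0.0, 0.1, 0.9],
}
matches = top_k_similar([1.0, 0.0, 0.0], collection, k=2)
```

Because cosine similarity ignores vector length, two log messages score as similar when their embeddings point in the same direction, regardless of magnitude.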
In DSDL 5.2, we provide dashboards for encoding documents residing on the Docker host into VectorDB collections and using the collections in LLM generation. In the following example, we have already encoded documents about the Buttercup online store into a vector collection named Buttercup_store_info. On the LLM-RAG dashboard, we select the collection and the local LLM, and input the following query:
Customer [email protected] had a Payment processing error during checkout for the product page of DB-SG-G01. Answer the following questions: 1.List of employees in charge of this product support 2.What were the resolution notes in the past tickets with the same issue description?
When the Query button is clicked, the query is sent to the container environment, where relevant documents are retrieved automatically from the collection, added to the query as contextual information, and sent to the LLM. The response generated by the LLM is displayed in the output panel, along with a list of the documents referenced.
Another way for the LLM to retrieve additional contextual information is to execute function tools, a technique known as Function Calling. This use case depends heavily on customization, and we strongly recommend that users implement their own tools in Python for their specific use cases. In DSDL 5.2, we provide two example function tools: Search Splunk Event and Search VectorDB.
In the following example, the LLM is prompted with the following query:
Search Splunk events in index _internal and sourcetype splunkd from 510 minutes ago to 500 minutes ago containing keyword Socket error, look for similar records from vectorDB collection phoebe_log_message and tell me what might be going on.
On receiving this query, the LLM automatically executes the two available function tools to obtain results from the Splunk index and the VectorDB collection. The LLM then uses these results to generate the final output. In this sample dashboard, the LLM's response is displayed on the output panel, along with the execution results of the two function tools.
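The general shape of function calling can be illustrated with a small dispatcher. The sketch below is hypothetical and does not reproduce the tools shipped with DSDL: the two functions are stubs that return placeholder strings, the function and argument names are invented for the example, and the LLM's tool-call output is simulated as a JSON string rather than produced by a real model.

```python
import json


def search_splunk_events(index, keyword):
    # Hypothetical stub: a real tool would run a Splunk search and return events.
    return [f"[{index}] sample event containing '{keyword}'"]


def search_vector_db(collection, text):
    # Hypothetical stub: a real tool would run a similarity search in the vectorDB.
    return [f"[{collection}] sample record similar to '{text}'"]


# Registry of tools the LLM is allowed to call, keyed by name.
TOOLS = {
    "search_splunk_events": search_splunk_events,
    "search_vector_db": search_vector_db,
}


def run_tool_calls(tool_calls_json):
    # Execute each tool call requested by the LLM and collect results by tool name.
    results = {}
    for call in json.loads(tool_calls_json):
        tool = TOOLS.get(call["name"])
        if tool is None:
            results[call["name"]] = "error: unknown tool"
            continue
        results[call["name"]] = tool(**call["arguments"])
    return results


# Simulated LLM output requesting both tools; a real model would emit this itself.
llm_tool_calls = json.dumps([
    {"name": "search_splunk_events",
     "arguments": {"index": "_internal", "keyword": "Socket error"}},
    {"name": "search_vector_db",
     "arguments": {"collection": "phoebe_log_message", "text": "Socket error"}},
])
tool_results = run_tool_calls(llm_tool_calls)
```

In a full loop, `tool_results` would be passed back to the LLM as additional context so it can compose the final answer. Restricting execution to a fixed registry keeps the model from invoking arbitrary code.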
In the version 5.2 release of Splunk DSDL, we provide the LLM-RAG functionalities demonstrated above, which are flexible and extensible for specific use cases. For more details, please check the documentation.
Note: This project was conducted during my rotation in the GTS Global Architect team, based on customer requirements. I'd like to acknowledge the work and time of a number of amazing colleagues in researching, developing and refining these ideas together. Thank you Philipp Drieger, Joshua Cowling, Paul Davies and Tatsu Murata.