Build a private ChatGPT on an enterprise knowledge base.

Sujit Udhane
Jun 11, 2023

1. Business scenario: Information at the enterprise level is dispersed across many locations, including corporate wikis, public websites, OneDrive, personal email accounts, SharePoint, and a few more. Multiple contributors, multiple document versions, and the overall complexity of the business make it hard to find the right information in a form that is easy to grasp. This is especially difficult for people who are new to the company and for end users who need pertinent information quickly in order to move forward. In this situation, users either take longer to find the information and act on it, or give up.

2. Solution: A centralised knowledge store that gathers both structured and unstructured material and makes it accessible to end users in question-and-answer form.

3. High-level view of the solution:

Figure: High-level diagram, Private GPT for Enterprise Knowledge Base

4. Prerequisites to try out the solution: Working knowledge of a programming language. No prior expertise in the AI/ML space is required.

5. GitHub URL of the solution: https://github.com/imartinez/privateGPT

6. Follow the steps in the ReadMe file (they are very well documented).

Step-1 : Bring all material into blob or file storage. This step must be repeated whenever you want to add new documents, so a data pipeline is a good way to automate it.
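
A data pipeline for this step can start out very small. The sketch below is one possible shape, not part of the privateGPT repo: it copies new or changed files from a source folder into the folder that ingestion reads from. The function name `sync_documents`, the extension list, and the folder names are assumptions made for illustration.

```python
import hashlib
import shutil
from pathlib import Path

# Assumption: the file types the ingestion step is expected to understand.
SUPPORTED_EXTENSIONS = {".txt", ".pdf", ".csv", ".md", ".docx"}


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def sync_documents(source_dir: Path, target_dir: Path) -> list[Path]:
    """Copy new or changed supported files from source_dir into target_dir.

    Returns the files that were copied, so a scheduler can decide whether
    ingestion needs to run again.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file() or src.suffix.lower() not in SUPPORTED_EXTENSIONS:
            continue
        dst = target_dir / src.name
        # Skip files whose content is already present in the target folder.
        if dst.exists() and file_digest(dst) == file_digest(src):
            continue
        shutil.copy2(src, dst)
        copied.append(dst)
    return copied
```

A scheduled job could call `sync_documents(Path("wiki_export"), Path("source_documents"))` and rerun ingestion only when the returned list is non-empty. Note that this sketch flattens subfolders, so two files with the same name would overwrite each other.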

Step-2 : Data ingestion loads the data into an AI-native embedded database.

If you follow the ReadMe file from the cloned Git repo, this is the step where you run the following on the command line: python ingest.py.
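
Ingestion pipelines of this kind typically split each document into small overlapping chunks before computing embeddings and storing them. The sketch below shows only the splitting idea in plain Python. It is not the code in ingest.py, and the default chunk size and overlap are illustrative assumptions.

```python
def split_into_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks that overlap slightly."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval later.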

Step-3 : Run the Python application to accept question prompts on the CLI.

If you follow the ReadMe file from the cloned Git repo, this is the step where you run the following on the command line: python privateGPT.py.

From that prompt you can begin testing your own GPT against your company’s knowledge base.
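
Behind a question prompt like this, the usual pattern is to retrieve the stored chunks most relevant to the question and hand them to the language model as context. The sketch below illustrates that pattern under simplifying assumptions: plain word overlap stands in for embedding search, and the function names and prompt wording are hypothetical rather than taken from privateGPT.py.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Rank chunks by word overlap with the question (stand-in for embeddings)."""
    question_words = tokenize(question)
    ranked = sorted(
        chunks,
        key=lambda chunk: len(question_words & tokenize(chunk)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble the text that would be handed to the local language model."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )
```

For example, `build_prompt(q, retrieve(q, chunks))` produces a prompt whose context section contains only the chunks that share the most words with the question `q`.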

Step-4 (Optional):

The CLI approach limits how many users can access your solution, making it unsuitable as a long-term strategy. To make the chat program accessible over the internet, you need to create a web application for it.

This GitHub project will help you do that: https://github.com/SamurAIGPT/privateGPT

Once the web application is in place, users beyond the command line can mine your corporate knowledge store for valuable information.
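
To make the idea of wrapping the chat program in a web application concrete, here is a minimal sketch using only the Python standard library. It is not how the linked project is built. The `/ask` endpoint, the JSON shape, and the `answer_question` placeholder are all assumptions for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def answer_question(question: str) -> str:
    """Hypothetical placeholder; a real deployment would call the QA chain here."""
    return f"(no model attached) You asked: {question}"


class AskHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/ask":
            self.send_error(404, "Unknown endpoint")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
            question = str(payload["question"])
        except (ValueError, KeyError, TypeError):
            self.send_error(400, "Expected a JSON body with a 'question' field")
            return
        body = json.dumps({"answer": answer_question(question)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet in this sketch


def make_server(host: str = "127.0.0.1", port: int = 8000) -> ThreadingHTTPServer:
    """Create the server; call serve_forever() on the result to start it."""
    return ThreadingHTTPServer((host, port), AskHandler)
```

Calling `make_server().serve_forever()` starts the service on localhost, after which a client can POST `{"question": "..."}` to `/ask`. Anything exposed beyond localhost would also need authentication and TLS, which this sketch leaves out.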

7. It is helpful to be familiar with a few key GPT terms.

Pre-trained Models: Pre-trained models, such as GPT or BERT, are trained on vast amounts of data to learn patterns and relationships within the data. These models capture general knowledge and can be fine-tuned for specific tasks. GPT and BERT are language models; pre-trained models also exist for tasks such as image recognition and recommendation systems.

AI-Native Embedded Database: An AI-native embedded database is specifically designed to optimize data storage, retrieval, and processing for AI applications. It offers features such as efficient indexing, real-time data processing, scalability, and integration with AI frameworks.
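
As a rough mental model of "embedded" here: the store runs inside your own process and keeps its data in a local file, with no separate database server. The toy sketch below uses SQLite from the standard library and a brute-force nearest-neighbour scan. Real embedding databases add efficient indexing on top of this, and the class name, schema, and two-dimensional vectors are assumptions for illustration.

```python
import json
import sqlite3


class TinyVectorStore:
    """Toy embedded store: runs in-process and keeps vectors in one SQLite file."""

    def __init__(self, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS chunks (text TEXT, vector TEXT)"
        )

    def add(self, text: str, vector: list[float]) -> None:
        """Store a chunk of text together with its embedding vector."""
        self.conn.execute(
            "INSERT INTO chunks (text, vector) VALUES (?, ?)",
            (text, json.dumps(vector)),
        )
        self.conn.commit()

    def query(self, vector: list[float], top_k: int = 1) -> list[str]:
        """Return the texts whose vectors are closest (squared Euclidean distance)."""
        rows = self.conn.execute("SELECT text, vector FROM chunks").fetchall()

        def distance(row):
            stored = json.loads(row[1])
            return sum((a - b) ** 2 for a, b in zip(stored, vector))

        return [text for text, _ in sorted(rows, key=distance)[:top_k]]
```

Passing a file path instead of the default `":memory:"` keeps the data on disk between runs, which is the property that lets ingestion and querying run as separate commands.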

LangChain is a framework for developing applications powered by language models.

SentenceTransformers is a Python framework for state-of-the-art sentence, text, and image embeddings. It can compute sentence and text embeddings for more than 100 languages. These embeddings can then be compared, for instance with cosine similarity, to find sentences with similar meaning. This is useful for semantic textual similarity, semantic search, and paraphrase mining.
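
The cosine-similarity comparison can be illustrated without installing any model. In the sketch below the three-dimensional vectors are made-up stand-ins for real sentence embeddings, which usually have hundreds of dimensions. Only the similarity calculation itself is the real technique.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional "embeddings", invented for this example.
embeddings = {
    "How do I reset my password?": [0.9, 0.1, 0.0],
    "I forgot my login credentials.": [0.8, 0.3, 0.1],
    "Where is the cafeteria?": [0.0, 0.2, 0.9],
}

query_sentence = "How do I reset my password?"
query = embeddings[query_sentence]

# Rank the other sentences from most to least similar to the query.
ranked = sorted(
    (s for s in embeddings if s != query_sentence),
    key=lambda s: cosine_similarity(query, embeddings[s]),
    reverse=True,
)
```

With these made-up vectors the credentials sentence ranks above the cafeteria one. With a real model, the vectors would come from the framework's encoder instead of being written by hand.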

8. Advantages of the Solution

Prevents data leakage. All sensitive data stays within the organisation’s network.

Many pre-trained models are available to choose from, so the emphasis can shift from building or training models to solving business problems.

The privateGPT project is popular, with more than 29k stars on GitHub at the time of writing.

Various models are simple to configure. I ran the experiment with GPT4All, an open-source alternative to OpenAI’s ChatGPT.

9. References

https://medium.com/@imicknl/how-to-create-a-private-chatgpt-with-your-own-data-15754e6378a1

In my next article, I’ll discuss some other popular frameworks for creating ChatGPT-style applications.

I hope the article was helpful to you. If so, give a round of applause.

Sujit Udhane

I am a Lead Platform Architect working in Pune, India. I have 20+ years of experience in technology, the last 10+ of them as an Architect.