Enable AI-powered search on your public websites.
The setup takes three steps.
Step#1 : Google custom search API configuration
Follow the section titled “Setting Up a CSE” in this article: https://thepythoncode.com/article/use-google-custom-search-engine-api-in-python
Completing that process gives you a Google API key and a Google CSE (Custom Search Engine) ID.
The GoogleSearchAPIWrapper class from the LangChain Python package uses these two values. By default it reads them from the GOOGLE_API_KEY and GOOGLE_CSE_ID environment variables.
To learn more, see https://python.langchain.com/docs/integrations/tools/google_search.
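As a rough sketch of how the two values are used, the snippet below checks that they are set and builds a request URL for the Custom Search JSON API using only the standard library. The helper names `missing_google_settings` and `build_search_url` are illustrative and not part of LangChain, and the `num_results` default of 3 simply mirrors the sample in Step#2.

```python
import os
from urllib.parse import urlencode

# Environment variable names that GoogleSearchAPIWrapper reads by default.
REQUIRED_VARS = ("GOOGLE_API_KEY", "GOOGLE_CSE_ID")

# Public endpoint of the Custom Search JSON API.
SEARCH_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def missing_google_settings(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def build_search_url(api_key, cse_id, query, num_results=3):
    """Build (but do not send) a Custom Search JSON API request URL."""
    params = {"key": api_key, "cx": cse_id, "q": query, "num": num_results}
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


# Report which settings still need to be exported before running Step#2.
print("Missing settings:", missing_google_settings())
```

Running the check before starting the app makes a missing key show up as a clear message rather than as a failure on the first search.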
Step#2 : Running private LLM (such as ) and vector store (Chroma DB)
Sample Python code is shown below:
def settings():
    # Imports are kept inside the function, as in the original sample
    from langchain.vectorstores import Chroma
    from langchain.embeddings import HuggingFaceEmbeddings
    from langchain.chat_models.openai import ChatOpenAI
    from langchain.utilities import GoogleSearchAPIWrapper
    from langchain.retrievers.web_research import WebResearchRetriever

    # Embedding model: runs locally on CPU
    model_name = "sentence-transformers/all-MiniLM-L6-v2"
    model_kwargs = {'device': 'cpu'}
    encode_kwargs = {'normalize_embeddings': False}

    # Vectorstore: Chroma, persisted to a local directory
    vectorstore = Chroma(
        embedding_function=HuggingFaceEmbeddings(
            model_name=model_name,
            model_kwargs=model_kwargs,
            encode_kwargs=encode_kwargs
        ),
        persist_directory="./chroma_db_oai"
    )

    # LLM: ChatOpenAI expects OPENAI_API_KEY in the environment;
    # swap in your private model's LangChain chat class here
    llm = ChatOpenAI(temperature=0, streaming=True)

    # Search: reads GOOGLE_API_KEY and GOOGLE_CSE_ID from the environment
    search = GoogleSearchAPIWrapper()

    # Initialize the retriever that searches, loads, and indexes pages
    web_retriever = WebResearchRetriever.from_llm(
        vectorstore=vectorstore,
        llm=llm,
        search=search,
        num_search_results=3
    )
    return web_retriever, llm
To learn more, see https://python.langchain.com/docs/modules/data_connection/retrievers/web_research
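Once the retriever exists, it can be queried directly. The sketch below is one possible way to do that: it calls the retriever's `get_relevant_documents` method and returns a short preview of each page. The helper name `retrieve_pages` and the `max_chars` parameter are illustrative, and the `source` metadata key is an assumption about what the page loader records, so the code falls back to "unknown" when it is absent.

```python
def retrieve_pages(retriever, question, max_chars=200):
    """Run the retriever and summarise each returned document."""
    docs = retriever.get_relevant_documents(question)
    results = []
    for doc in docs:
        results.append({
            # Assumption: the loader stores the page URL under "source".
            "source": doc.metadata.get("source", "unknown"),
            # Keep only the first characters so the output stays readable.
            "preview": doc.page_content[:max_chars],
        })
    return results


# Example usage with the function from this step (needs valid API keys):
# web_retriever, llm = settings()
# for page in retrieve_pages(web_retriever, "What does the pricing page say?"):
#     print(page["source"], "->", page["preview"])
```

Looking at the raw pages first is a cheap way to confirm that the search engine ID points at the right website before an LLM is involved.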
Step#3 : Running a Streamlit-based webapp to take a search query as an input and return the response
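A minimal sketch of such a web app follows. It assumes the `settings()` function from Step#2 is passed in as `settings_fn`, and it uses LangChain's `RetrievalQAWithSourcesChain` to turn the retrieved pages into an answer with sources. The function names `format_response` and `run_app`, the page title, and the input label are illustrative choices, not taken from the original app.

```python
def format_response(result):
    """Turn a chain result dict into display text."""
    answer = (result.get("answer") or "").strip() or "No answer was returned."
    sources = (result.get("sources") or "").strip()
    return f"{answer}\n\nSources: {sources}" if sources else answer


def run_app(settings_fn):
    """Render the search page; settings_fn is the settings() function from Step#2."""
    # Imported here so format_response can be used without these packages.
    import streamlit as st
    from langchain.chains import RetrievalQAWithSourcesChain

    st.title("AI-powered website search")
    question = st.text_input("Enter your search query")
    if not question:
        return  # Nothing to do until the user types a query.

    # Build the retriever and LLM, then chain them into a question-answering step.
    web_retriever, llm = settings_fn()
    qa_chain = RetrievalQAWithSourcesChain.from_chain_type(llm, retriever=web_retriever)

    with st.spinner("Searching..."):
        result = qa_chain({"question": question})
    st.write(format_response(result))
```

Saved as `app.py` (a hypothetical file name) together with the `settings()` function and a final line `run_app(settings)`, this could be launched with `streamlit run app.py`. Because Streamlit reruns the script on every interaction, this version rebuilds the retriever for each query; caching it would be a natural next step.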
Benefits of the solution :
- The LLM and the knowledge base run in infrastructure you manage, which helps keep your data within your own environment.
- The Google Custom Search API handles page discovery, and the retriever loads the returned pages, so you do not have to maintain your own crawler.
- In addition to lexical search (keyword matching), users can run semantic search (search by meaning).
- Building on open-source frameworks of your choice improves observability.
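To illustrate the difference between lexical and semantic search, here is a toy sketch. The three-dimensional vectors are hand-made stand-ins for real embeddings (the solution itself uses the all-MiniLM-L6-v2 model), and all names in the snippet are illustrative.

```python
import math

# Hand-made vectors standing in for real embeddings (illustrative only).
TOY_EMBEDDINGS = {
    "how much does a car cost": [0.9, 0.1, 0.0],
    "automobile pricing guide": [0.8, 0.2, 0.1],
    "how much does a cat sleep": [0.0, 0.1, 0.9],
}


def keyword_overlap(query, text):
    """Lexical score: number of words the two strings share."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_score(query, text):
    """Semantic score: similarity of the toy embedding vectors."""
    return cosine_similarity(TOY_EMBEDDINGS[query], TOY_EMBEDDINGS[text])


def best_match(query, candidates, score):
    """Return the candidate with the highest score for the query."""
    return max(candidates, key=lambda text: score(query, text))


query = "how much does a car cost"
pages = ["automobile pricing guide", "how much does a cat sleep"]
lexical_pick = best_match(query, pages, keyword_overlap)
semantic_pick = best_match(query, pages, semantic_score)
print("lexical:", lexical_pick)
print("semantic:", semantic_pick)
```

Keyword matching favours the page that shares the most words with the query, while the vector comparison favours the page about the same topic even though it shares no words with the query.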
Limitations of the solution :
- Google’s Custom Search API has a daily cap on free usage; requests beyond the cap are billed.
- Google’s Custom Search API cannot search intranet sites, such as internal Confluence or GitHub instances, because they are not publicly reachable.