Enable AI-powered search on your public websites.

Sujit Udhane
2 min read · Aug 28, 2023

Three steps to complete the task

Step#1 : Google custom search API configuration

Follow the section titled “Setting Up a CSE” at https://thepythoncode.com/article/use-google-custom-search-engine-api-in-python

This process gives you a Google API key and a Google CSE ID.

The GoogleSearchAPIWrapper class from the LangChain Python package uses these two values, which it reads from the GOOGLE_API_KEY and GOOGLE_CSE_ID environment variables.

To learn more, visit the website https://python.langchain.com/docs/integrations/tools/google_search.
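To see what the two values are used for, here is a minimal sketch that builds a request URL for the Google Custom Search JSON API with only the standard library. The helper name build_search_url and the placeholder credentials are assumptions made for illustration; in the real setup, GoogleSearchAPIWrapper issues the search for you.

```python
import os
from urllib.parse import urlencode

# Public endpoint of the Custom Search JSON API
CUSTOM_SEARCH_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(query, api_key, cse_id, num_results=3):
    """Return the request URL for one search (hypothetical helper)."""
    # The API accepts between 1 and 10 results per request
    if not 1 <= num_results <= 10:
        raise ValueError("num_results must be between 1 and 10")
    params = {"key": api_key, "cx": cse_id, "q": query, "num": num_results}
    return f"{CUSTOM_SEARCH_ENDPOINT}?{urlencode(params)}"


# Read the same environment variables that GoogleSearchAPIWrapper uses;
# the fallback strings are placeholders, not working credentials.
api_key = os.environ.get("GOOGLE_API_KEY", "your-api-key")
cse_id = os.environ.get("GOOGLE_CSE_ID", "your-cse-id")
print(build_search_url("langchain web research", api_key, cse_id))
```

Sending a GET request to the printed URL with valid credentials returns a JSON document whose items list holds the search hits.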

Step#2 : Running a private LLM (such as ) and a vector store (Chroma DB)

Sample Python code is shown below:

def settings():
    # Imports (kept inside the function, as in the original sample)
    from langchain.vectorstores import Chroma
    from langchain.embeddings import HuggingFaceEmbeddings
    from langchain.chat_models.openai import ChatOpenAI
    from langchain.utilities import GoogleSearchAPIWrapper
    from langchain.retrievers.web_research import WebResearchRetriever

    # Embedding model: a small sentence-transformers model running on CPU
    model_name = "sentence-transformers/all-MiniLM-L6-v2"
    model_kwargs = {'device': 'cpu'}
    encode_kwargs = {'normalize_embeddings': False}

    # Vectorstore: Chroma, persisted to a local directory
    vectorstore = Chroma(
        embedding_function=HuggingFaceEmbeddings(
            model_name=model_name,
            model_kwargs=model_kwargs,
            encode_kwargs=encode_kwargs
        ),
        persist_directory="./chroma_db_oai"
    )

    # LLM (ChatOpenAI reads the OPENAI_API_KEY environment variable)
    llm = ChatOpenAI(temperature=0, streaming=True)

    # Search (reads GOOGLE_API_KEY and GOOGLE_CSE_ID from the environment)
    search = GoogleSearchAPIWrapper()

    # Initialize the retriever that searches the web and indexes the pages it finds
    web_retriever = WebResearchRetriever.from_llm(
        vectorstore=vectorstore,
        llm=llm,
        search=search,
        num_search_results=3
    )

    return web_retriever, llm

To learn more, visit https://python.langchain.com/docs/modules/data_connection/retrievers/web_research

Step#3 : Running a Streamlit-based webapp to take a search query as an input and return the response
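A minimal sketch of such an app follows. It assumes the settings() function from Step 2 is defined in the same file and that the answer comes from LangChain's RetrievalQAWithSourcesChain, as in the web-explorer reference project. The helper format_response, the page title, and the input label are illustrative names, not part of any library.

```python
def format_response(result):
    """Turn a chain result into display text (hypothetical helper).

    Assumes the {"answer": ..., "sources": ...} dictionary shape that
    RetrievalQAWithSourcesChain returns.
    """
    answer = (result.get("answer") or "").strip()
    sources = (result.get("sources") or "").strip()
    if not answer:
        return "No answer found."
    if sources:
        return f"{answer}\n\nSources: {sources}"
    return answer


def main():
    # Imported lazily so format_response works without these packages installed
    import streamlit as st
    from langchain.chains import RetrievalQAWithSourcesChain

    st.header("AI-powered website search")
    question = st.text_input("Ask a question:")
    if question:
        # settings() is the function from Step 2 (assumed to be defined above)
        web_retriever, llm = settings()
        qa_chain = RetrievalQAWithSourcesChain.from_chain_type(
            llm, retriever=web_retriever
        )
        result = qa_chain({"question": question})
        st.info(format_response(result))
```

To try it, add a call to main() as the last line of the file (for example app.py) and start it with: streamlit run app.py. This sketch rebuilds the retriever on every query for simplicity; a real app would likely cache it.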

Benefits of the solution :

  • The LLM and the knowledge base run in infrastructure that you manage, which keeps your data from leaving your environment.
  • The Google custom search API handles data scraping.
  • In addition to lexical searching (keyword matching), users have the ability to use semantic searching (a search with meaning).
  • Building on open-source frameworks of your choice improves observability.
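The difference between lexical and semantic searching can be illustrated with a toy comparison. The vectors below are hand-made stand-ins for real embeddings (an assumption for illustration only); in the solution above, the embeddings come from the sentence-transformers model and Chroma performs the similarity ranking.

```python
import math


def keyword_overlap(query, document):
    """Lexical score: number of lowercase words shared by query and document."""
    return len(set(query.lower().split()) & set(document.lower().split()))


def cosine_similarity(a, b):
    """Semantic score: cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical 3-dimensional "embeddings"; real models use hundreds of dimensions
toy_embeddings = {
    "car repair": [0.9, 0.1, 0.0],
    "automobile maintenance guide": [0.8, 0.2, 0.1],
    "chocolate cake recipe": [0.0, 0.1, 0.9],
}

query = "car repair"
for document in ("automobile maintenance guide", "chocolate cake recipe"):
    lexical = keyword_overlap(query, document)
    semantic = cosine_similarity(toy_embeddings[query], toy_embeddings[document])
    print(f"{document}: shared keywords={lexical}, cosine similarity={semantic:.2f}")
```

Neither document shares a keyword with the query, so keyword matching cannot rank them, while the vector comparison scores the automobile document far closer to the query than the cake recipe.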

Limitations of the solution :

  • Google’s custom search API has a daily cap for free usage; any hits beyond the cap will be charged.
  • Google’s custom search API only searches publicly indexed pages, so it doesn’t work with internally hosted sites such as Confluence or GitHub instances on an intranet.
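To get a feel for the first limitation, the following sketch estimates the daily search bill. All default values are assumptions to verify against Google's current pricing and your own setup: a free quota of 100 API calls per day, a price of 5 USD per 1,000 extra calls, and 3 Google searches generated per user question. The function name estimate_daily_cost is hypothetical.

```python
def estimate_daily_cost(user_queries, searches_per_query=3,
                        free_per_day=100, price_per_1000=5.0):
    """Estimate the daily Custom Search cost in USD (all defaults are assumptions)."""
    # Each user question may trigger several Google searches
    api_calls = user_queries * searches_per_query
    # Only calls beyond the free daily quota are billed
    billable_calls = max(0, api_calls - free_per_day)
    return billable_calls * price_per_1000 / 1000


for queries in (10, 100, 1000):
    print(f"{queries} user queries/day -> about ${estimate_daily_cost(queries):.2f}")
```

Lowering the number of generated searches per question, or caching results for repeated questions, are two ways to stay under the cap.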

References :

  1. https://blog.langchain.dev/automating-web-research/
  2. https://github.com/langchain-ai/web-explorer.git



Sujit Udhane

I am a Lead Platform Architect working in Pune, India. I have 20+ years of experience in technology and have spent the last 10+ years working as an architect.