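Many of us study or deploy LLMs with the same end goal: one or more reliable stand-ins that can do our work for us, right? How great would it be if an Agent, or multiple Agents, could work for us for free! It saves time, effort, and money!
Today I want to introduce a handy helper for master's and PhD students: the "arxiv Agent".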
First, install arxiv (the Wikipedia tool created right after this also needs the wikipedia package):
!pip install arxiv wikipedia
from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper
api_wrapper = WikipediaAPIWrapper(top_k_results=1, doc_content_chars_max=200)
wiki=WikipediaQueryRun(api_wrapper=api_wrapper)
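wiki is now instantiated as a WikipediaQueryRun tool object (not a dictionary).
So entering the following returns the tool's name: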
wiki.name
"wikipedia"
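Next, we need to handle where the documentation reference comes from:
To load a resource from the web, use "WebBaseLoader".
Then use the OpenAIEmbeddings class to convert the data into vectors.
FAISS serves as the vector database that stores these vectors; the Retriever we build later uses it for search.
We also need a library that splits the data into chunks:
# Load the required libraries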
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
# We want to know how to use LangSmith, so load the LangSmith online docs
loader=WebBaseLoader("https://docs.smith.langchain.com/")
docs=loader.load()
documents=RecursiveCharacterTextSplitter(chunk_size=1000,chunk_overlap=200).split_documents(docs)
vectordb=FAISS.from_documents(documents,OpenAIEmbeddings())
retriever=vectordb.as_retriever()
retriever
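To make the split, embed, and search steps above more concrete, here is a minimal pure-Python sketch of the same idea. It is only an illustration under loud assumptions: `split_with_overlap`, `embed`, `cosine`, and `retrieve` are hypothetical helpers, the word-count "embedding" stands in for OpenAIEmbeddings, a sorted list stands in for FAISS, and the fixed-window splitter is simpler than RecursiveCharacterTextSplitter (which splits on separators). The sample sentences are made up.

```python
import math
import re
from collections import Counter


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size windows that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real setup calls an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


# Overlapping windows: each chunk repeats the last 2 characters of the previous one.
windows = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2)
print(windows)

# Made-up chunks, used only to show the ranking step.
toy_chunks = [
    "LangSmith helps you trace and evaluate LLM applications.",
    "FAISS stores vectors and supports similarity search.",
    "arXiv hosts preprints of research papers.",
]
print(retrieve("How do I trace an LLM application?", toy_chunks, k=1))
```

The overlap exists so that a sentence cut at a chunk boundary still appears whole in at least one chunk, which is the usual motivation for setting chunk_overlap above zero.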
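Use "create_retriever_tool" to build the retrieval "tool" that we will hand to the Agent later.
This function takes three required inputs: retriever (which inherits from BaseRetriever), name (a name for the tool), and description (a description of what the tool does). Its definition looks like this: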
def create_retriever_tool(
    retriever: BaseRetriever,
    name: str,
    description: str,
    *,
    document_prompt: Optional[BasePromptTemplate] = None,
    document_separator: str = "\n\n",
) -> Tool:
    """Create a tool to do retrieval of documents.

    Args:
        retriever: The retriever to use for the retrieval
        name: The name for the tool. This will be passed to the language model,
            so should be unique and somewhat descriptive.
        description: The description for the tool. This will be passed to the language
            model, so should be descriptive.

    Returns:
        Tool class to pass to an agent
    """
    document_prompt = document_prompt or PromptTemplate.from_template("{page_content}")
    func = partial(
        _get_relevant_documents,
        retriever=retriever,
        document_prompt=document_prompt,
        document_separator=document_separator,
    )
    afunc = partial(
        _aget_relevant_documents,
        retriever=retriever,
        document_prompt=document_prompt,
        document_separator=document_separator,
    )
    return Tool(
        name=name,
        description=description,
        func=func,
        coroutine=afunc,
        args_schema=RetrieverInput,
    )
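The key trick in the function above is functools.partial: it pre-binds the retriever and the separator, so the resulting tool function only needs the query string. The sketch below shows that pattern in isolation. `SimpleTool`, `get_relevant_text`, `make_retriever_tool`, and `fake_retriever` are hypothetical stand-ins written for this illustration, not LangChain classes.

```python
from dataclasses import dataclass
from functools import partial
from typing import Callable, List


@dataclass
class SimpleTool:
    """Hypothetical, stripped-down stand-in for a tool object."""
    name: str
    description: str
    func: Callable[[str], str]


def get_relevant_text(
    query: str,
    retriever: Callable[[str], List[str]],
    document_separator: str,
) -> str:
    """Run the retriever and join the returned documents into one string."""
    return document_separator.join(retriever(query))


def make_retriever_tool(
    retriever: Callable[[str], List[str]],
    name: str,
    description: str,
    *,
    document_separator: str = "\n\n",
) -> SimpleTool:
    # partial() fixes retriever and document_separator in advance,
    # leaving a function that takes only the query.
    func = partial(
        get_relevant_text,
        retriever=retriever,
        document_separator=document_separator,
    )
    return SimpleTool(name=name, description=description, func=func)


def fake_retriever(query: str) -> List[str]:
    """Made-up retriever that returns two canned documents."""
    return [f"doc A about {query}", f"doc B about {query}"]


demo_tool = make_retriever_tool(fake_retriever, "demo_search", "Toy search tool.")
result = demo_tool.func("LangSmith")
print(result)
```

Because the caller only sees a single-argument function plus a name and description, an agent can treat every tool the same way regardless of what sits behind it.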
Now instantiate the retriever tool:
from langchain.tools.retriever import create_retriever_tool
retriever_tool=create_retriever_tool(retriever,"langsmith_search",
"Search for information about LangSmith. For any questions about LangSmith, you must use this tool!")
So retriever_tool.name outputs: "langsmith_search".
retriever_tool.description outputs: "Search for information about LangSmith. For any questions about LangSmith, you must use this tool!"
Use the same logic to build the arxiv tool and the arxiv bot:
## Arxiv Tool
from langchain_community.utilities import ArxivAPIWrapper
from langchain_community.tools import ArxivQueryRun
arxiv_wrapper=ArxivAPIWrapper(top_k_results=1, doc_content_chars_max=200)
arxiv=ArxivQueryRun(api_wrapper=arxiv_wrapper)
arxiv.name
# Output: 'arxiv'
So by now we have three tools, and we can create a list named "tools" to hold these instantiated tools.
tools=[wiki,arxiv,retriever_tool]
tools
Output: (omitted, it is rather long)
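Since the tool name is what gets passed to the language model (the docstring quoted earlier says it should be unique), it can be worth checking the list for duplicate names before handing it to an Agent. Below is a small sketch of such a check. `index_tools_by_name` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins that only carry a `name` attribute, not real LangChain tools.

```python
from collections import Counter
from types import SimpleNamespace


def index_tools_by_name(tools):
    """Build a name -> tool lookup and refuse duplicate names."""
    counts = Counter(tool.name for tool in tools)
    duplicates = sorted(name for name, count in counts.items() if count > 1)
    if duplicates:
        raise ValueError(f"duplicate tool names: {duplicates}")
    return {tool.name: tool for tool in tools}


# Stand-ins mirroring the three tool names used in this post.
stand_ins = [
    SimpleNamespace(name="wikipedia"),
    SimpleNamespace(name="arxiv"),
    SimpleNamespace(name="langsmith_search"),
]
by_name = index_tools_by_name(stand_ins)
print(list(by_name))
```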
Next, set up the environment: use load_dotenv from dotenv to read the .env file, which holds the API key and other secrets.
from dotenv import load_dotenv
load_dotenv()
import os
os.environ['OPENAI_API_KEY']=os.getenv("OPENAI_API_KEY")
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)  # gpt-3.5 turbo is used here
# Remember to put your own OPENAI_API_KEY in a .env file in the working directory, otherwise load_dotenv cannot read it
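One caveat with the os.environ assignment in this step: if OPENAI_API_KEY is missing, os.getenv returns None, and assigning None to os.environ raises a TypeError that does not say what went wrong. A minimal sketch of a clearer check follows. `require_env` is a hypothetical helper, and the demo uses a plain dict with a fake key so it runs without touching the real environment.

```python
import os
from typing import Mapping, Optional


def require_env(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the variable's value, or fail with a readable message."""
    source = os.environ if env is None else env
    value = source.get(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or shell environment"
        )
    return value


# Demo with a stand-in mapping and a fake key (not a real credential).
demo_env = {"OPENAI_API_KEY": "sk-demo"}
print(require_env("OPENAI_API_KEY", demo_env))
```

In the notebook itself, this would be called as require_env("OPENAI_API_KEY") right after load_dotenv().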
# Use hub to pull the prompt template
from langchain import hub
# Get the prompt to use - you can modify this!
prompt = hub.pull("hwchase17/openai-functions-agent")
prompt.messages
Building the Agent requires an llm instance, the tools definition, and a prompt, as follows:
### Agents
from langchain.agents import create_openai_tools_agent
agent=create_openai_tools_agent(llm,tools,prompt)
We also need AgentExecutor to actually run the Agent:
## Agent Executor
from langchain.agents import AgentExecutor
agent_executor=AgentExecutor(agent=agent,tools=tools,verbose=True)
agent_executor
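Finally, call invoke to get the Agent's output. Because verbose=True is set, the reasoning steps and tool queries are printed, so they are no longer a black box.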
agent_executor.invoke({"input":"Tell me about Langsmith"})
Query a paper's content directly by its arxiv ID:
agent_executor.invoke({"input":"What's the paper 1605.08386 about?"})
I then queried the paper "Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models" (ID 2305.04091).
The output is as follows:
1. The Agent's reasoning process:
> Entering new AgentExecutor chain...
Invoking: `arxiv` with `{'query': '2305.04091'}`
Published: 2023-05-26
Title: Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models
Authors: Lei Wang, Wanyu Xu, Yihuai Lan, Zhiqiang Hu, Yunshi Lan, Roy Ka-
The paper with ID 2305.04091 is titled "Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models." It is authored by Lei Wang, Wanyu Xu, Yihuai Lan, Zhiqiang Hu, Yunshi Lan, and Roy Ka. The paper focuses on improving zero-shot chain-of-thought reasoning by large language models.
> Finished chain.
2. The input and output, collected:
{'input': "What's the paper 2305.04091 about?",
'output': 'The paper with ID 2305.04091 is titled "Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models." It is authored by Lei Wang, Wanyu Xu, Yihuai Lan, Zhiqiang Hu, Yunshi Lan, and Roy Ka. The paper focuses on improving zero-shot chain-of-thought reasoning by large language models.'}
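Notice that the author list in the tool output stops at "Roy Ka" and no abstract is shown. A likely cause is the doc_content_chars_max=200 setting on the arxiv wrapper: the tool result is cut to 200 characters before the LLM ever sees it, so the model can only repeat the truncated name. The sketch below reproduces that cut under the assumption that the fields are joined with newlines and the string is simply sliced. `truncate` is a hypothetical helper, and the text after "Roy Ka-" is a placeholder, since the rest of the record does not appear in the log.

```python
def truncate(text: str, max_chars: int) -> str:
    """Keep only the first max_chars characters of a tool result."""
    return text[:max_chars]


# Fields copied from the log; the final piece is a placeholder,
# because the remainder of the record is not shown in the output.
record = (
    "Published: 2023-05-26\n"
    "Title: Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought "
    "Reasoning by Large Language Models\n"
    "Authors: Lei Wang, Wanyu Xu, Yihuai Lan, Zhiqiang Hu, Yunshi Lan, Roy Ka-"
    "<rest of the author list and the summary, not shown in the log>"
)

shown = truncate(record, 200)
print(shown)
print(len(shown))
```

If you want the Agent to summarize what a paper actually says, raising doc_content_chars_max so the summary fits is probably the first thing to try, at the cost of more tokens per call.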
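That is how to build a basic Agent system.
An Agent can be composed of multiple tools or APIs, and it prints its reasoning steps as it goes. Combined with Retrieval, which grounds answers in fetched documents, this helps reduce hallucinations.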
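To summarize the flow in code, here is a toy sketch of the loop: choose a tool, call it, record the step, and return the input and output together. It is not how AgentExecutor is implemented. In the real Agent the LLM decides which tool to call, whereas here a hand-written rule (`choose_tool`) stands in for that decision, and the tools in `fake_tools` are stubs that return canned strings. All names in this block are hypothetical.

```python
import re
from typing import Callable, Dict, Tuple

# New-style arXiv IDs look like 2305.04091 (four digits, a dot, four or five digits).
ARXIV_ID = re.compile(r"\b\d{4}\.\d{4,5}\b")


def choose_tool(question: str) -> Tuple[str, str]:
    """Rule-based stand-in for the LLM's tool choice: returns (tool name, tool input)."""
    match = ARXIV_ID.search(question)
    if match:
        return "arxiv", match.group(0)
    if "langsmith" in question.lower():
        return "langsmith_search", question
    return "wikipedia", question


def run_agent(question: str, tools: Dict[str, Callable[[str], str]]) -> dict:
    """Pick one tool, call it, and keep a visible trace of the step."""
    tool_name, tool_input = choose_tool(question)
    observation = tools[tool_name](tool_input)
    steps = [(tool_name, tool_input, observation)]
    return {"input": question, "output": observation, "steps": steps}


# Stub tools that only echo their input; no network calls are made.
fake_tools = {
    "arxiv": lambda q: f"[arxiv stub] record for {q}",
    "langsmith_search": lambda q: f"[langsmith stub] docs for {q}",
    "wikipedia": lambda q: f"[wikipedia stub] page for {q}",
}

result = run_agent("What's the paper 2305.04091 about?", fake_tools)
for name, tool_input, observation in result["steps"]:
    print(f"Invoking: {name} with {tool_input!r} -> {observation}")
```

Keeping the steps list next to the final answer is what makes the run inspectable, which is the same benefit verbose=True gives in the real executor.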