Building a Local RAG Application with Chroma and Ollama

Published: 2024-08-24 04:55

> The author of this article is a front-end development engineer on the 360 Qiwu Team (奇舞团).

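In this article we run a large language model (LLM) locally with Ollama and combine it with ChromaDB and LangChain to build a small RAG application that answers questions locally based on the content of a web page.

Concepts

First, a quick look at the terms:

An LLM (large language model) is trained on massive text datasets (books, websites, and so on) and has general-purpose language understanding and generation abilities. Although it can reason about many kinds of content, its knowledge is limited to the training data collected before a specific point in time.

LangChain is a framework for developing applications powered by large language models (LLMs). It provides a rich set of interfaces, components, and capabilities that simplify building LLM applications.

Ollama is a free, open-source framework that makes it easy to run large models on a local computer.

RAG (Retrieval Augmented Generation) is a technique that augments an LLM's knowledge with additional data. It retrieves current or relevant context from an external database and presents it to the LLM when requesting a response, which reduces the problem of generating incorrect or misleading information.

The workflow is illustrated below: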

[Figure: RAG workflow diagram]

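Based on the RAG steps above, we will now implement it in code.

Getting started

1. Follow the Ollama usage guide to download the models locally and run them.

    # LLM
    ollama pull llama3
    # Embedding Model
    ollama pull nomic-embed-text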

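2. Install langchain, langchain-community, and bs4. The later steps also import the Chroma vector store (which needs the chromadb package) and gradio, so install those as well if they are missing.

    pip install langchain langchain-community bs4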

3. Initialize the Ollama object provided by langchain

    from langchain_community.llms import Ollama
    from langchain.callbacks.manager import CallbackManager
    from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

    # 1. Initialize the llm and have it stream its output
    llm = Ollama(
        model='llama3',
        temperature=0.1,
        top_p=0.4,
        callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),
    )

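temperature controls how creative the generated text is. At 0 the response is predictable: the model always picks the most likely next word, which is very useful for answers where facts and accuracy matter. At 1 the model picks from a wider range of words, producing answers that are more creative but less predictable.

top_p, or nucleus sampling, determines how many candidate words are considered during generation. A high top_p means the model considers more candidates, including less likely ones, which makes the generated text more varied.

A lower temperature with a higher top_p can produce coherent text with some creativity. Because the temperature is low, answers tend to be logical and coherent, while the high top_p still allows a rich vocabulary and range of ideas. This combination suits informational text that needs to be clear yet engaging.

A higher temperature with a lower top_p may combine words in unpredictable ways. The generated text is highly creative and can produce unexpected results, which suits creative writing.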
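To make the interaction between the two parameters concrete, here is a minimal sketch of temperature scaling followed by nucleus (top_p) filtering. It is a simplified illustration, not Ollama's actual sampler, which may apply further steps. The vocabulary and scores are made up, and a temperature of exactly 0 (normally treated as "always pick the top word") is simply rejected here.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose cumulative probability reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}  # renormalised over the kept tokens


def sample_next(tokens, logits, temperature, top_p, rng=random):
    """Sample one token after applying temperature and then top_p filtering."""
    candidates = top_p_filter(softmax_with_temperature(logits, temperature), top_p)
    indices = list(candidates)
    weights = [candidates[i] for i in indices]
    return tokens[rng.choices(indices, weights=weights, k=1)[0]]


# Hypothetical vocabulary and scores, for illustration only
tokens = ["framework", "library", "language", "banana"]
logits = [3.0, 2.0, 1.0, -1.0]

cold = softmax_with_temperature(logits, 0.1)  # almost all mass on "framework"
warm = softmax_with_temperature(logits, 1.0)  # mass spread over several words
narrow = top_p_filter(warm, 0.4)              # only the top word survives
wide = top_p_filter(warm, 0.95)               # several words remain candidates
```

With these made-up scores, lowering the temperature concentrates probability on the top word, while raising top_p lets more words into the candidate set. That matches the behaviour described above.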

4. Fetch the content for RAG retrieval and split it into chunks

    # BeautifulSoup parses the page: locate and extract the content you need by tag, class name, ID, etc.
    import bs4
    # Load HTML pages using `urllib` and parse them with `BeautifulSoup`
    from langchain_community.document_loaders import WebBaseLoader
    # Text splitting
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    loader = WebBaseLoader(
        web_paths=('https://vuejs.org/guide/introduction.html#html',),
        bs_kwargs=dict(
            parse_only=bs4.SoupStrainer(
                class_=('content',),
                # id=('article-root',)
            )
        ),
    )
    docs = loader.load()

    # chunk_overlap: the overlap between adjacent chunks
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    splits = text_splitter.split_documents(docs)

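chunk_overlap: the overlap between adjacent chunks. Overlap helps reduce the chance of separating a statement from the important context related to it. chunk_size: the size of each chunk (measured in characters by default). A sensible chunking configuration improves RAG results.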
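The effect of the two parameters can be seen in a minimal sliding-window sketch. It is a simplification: RecursiveCharacterTextSplitter prefers to break on separators such as paragraphs, lines, and spaces, whereas the hypothetical split_with_overlap helper below cuts at fixed character positions.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters; neighbours share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "Vue is a JavaScript framework for building user interfaces."
chunks = split_with_overlap(sample, chunk_size=20, chunk_overlap=5)

for chunk in chunks:
    print(repr(chunk))
```

Each chunk starts with the last 5 characters of the previous one, so a phrase that straddles a boundary still appears whole in at least one chunk as long as it is no longer than the overlap.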

5. Embed the content into a vector database using the local embedding model nomic-embed-text

    # Vector embedding. Note: conda install onnxruntime -c conda-forge
    from langchain_community.vectorstores import Chroma
    # Many embedding models are available
    from langchain_community.embeddings import OllamaEmbeddings

    # Run the embedding model nomic-embed-text through ollama:
    # "A high-performing open embedding model with a large token context window."
    vectorstore = Chroma.from_documents(documents=splits, embedding=OllamaEmbeddings(model='nomic-embed-text'))

    # Similarity search
    # vectorstore.similarity_search('vue')

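The embedding model here could also be another model such as llama3 or mistral, but those run too slowly locally and, like nomic-embed-text, they do not support Chinese word embeddings. If you want to try building a Chinese document store, you can try the herald/dmeta-embedding-zh embedding model, which supports Chinese.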

ollama pull herald/dmeta-embedding-zh:latest
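To picture what a similarity search over the vector store does, here is a minimal sketch that ranks stored texts by cosine similarity to a query vector. Cosine similarity is only one common choice; the metric Chroma actually uses depends on how the collection is configured. The three-dimensional vectors are made up for illustration, since a real embedding model returns far longer ones.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=2):
    """Return the k stored texts whose vectors are most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Hypothetical (text, embedding) pairs, for illustration only
store = [
    ("Vue is a JavaScript framework", [0.9, 0.1, 0.0]),
    ("Single-File Components use the .vue extension", [0.7, 0.3, 0.1]),
    ("A recipe for tomato soup", [0.0, 0.1, 0.9]),
]
query = [1.0, 0.2, 0.0]  # stands in for the embedding of a question about Vue

results = top_k(query, store, k=2)
```

The two Vue-related texts are returned ahead of the unrelated one because their vectors point in nearly the same direction as the query.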

6. Configure the prompt to standardize the output

    from langchain_core.prompts import PromptTemplate

    prompt = PromptTemplate(
        input_variables=['context', 'question'],
        template=(
            "You are an assistant for question-answering tasks. "
            "Use the following pieces of retrieved context to answer the question. "
            "If you don't know the answer, just say you don't know without any explanation "
            "Question: {question} Context: {context} Answer:"
        ),
    )

7. Implement retrieval question answering with langchain

    from langchain.chains import RetrievalQA

    # Retriever backed by the vector database
    retriever = vectorstore.as_retriever()

    qa_chain = RetrievalQA.from_chain_type(
        llm,
        retriever=retriever,
        chain_type_kwargs={'prompt': prompt}
    )

    # what is Composition API?
    question = 'what is vue?'
    result = qa_chain.invoke({'query': question})

    # output
    # I think I know this one! Based on the context,
    # Vue is a JavaScript framework for building user interfaces
    # that builds on top of standard HTML, CSS, and JavaScript.
    # It provides a declarative way to use Vue primarily in
    # low-complexity scenarios or for building full applications with
    # Composition API + Single-File Components.

What does it answer if we ask a question that has nothing to do with the document?

    question = 'what is react?'
    result = qa_chain.invoke({'query': question})

After running, the final output was: I don't know.
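It helps to see roughly what the model receives in both cases. The sketch below assumes the chain's default behaviour of joining the retrieved chunks into a single context string and filling the template with it. The build_prompt helper, the chunk texts, and the blank-line separator are illustrative assumptions rather than LangChain internals.

```python
# Template modelled on the prompt used in this article
PROMPT_TEMPLATE = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer the question. "
    "If you don't know the answer, just say you don't know without any explanation "
    "Question: {question} Context: {context} Answer:"
)


def build_prompt(question, retrieved_chunks):
    """Join the retrieved chunks into one context string and fill the template."""
    context = "\n\n".join(retrieved_chunks)
    return PROMPT_TEMPLATE.format(question=question, context=context)


# Hypothetical retrieved chunks, for illustration only
retrieved = [
    "Vue is a JavaScript framework for building user interfaces.",
    "It builds on top of standard HTML, CSS, and JavaScript.",
]

vue_prompt = build_prompt("what is vue?", retrieved)
react_prompt = build_prompt("what is react?", retrieved)
print(react_prompt)
```

A retriever typically returns the nearest chunks even when none of them is relevant, so the React question is still sent along with Vue text. It is the instruction in the prompt that lets the model answer that it does not know.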

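Building the user interface

Gradio is a Python library for building interactive machine learning interfaces. It is very simple to use: you define a function with inputs and outputs, and Gradio automatically generates an interface for it. Users enter data in the interface and then observe the model's output.

Combining the code above, we build an interactive UI: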

    import gradio as gr
    from langchain_community.llms import Ollama
    from langchain.callbacks.manager import CallbackManager
    from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
    from langchain_community.document_loaders import WebBaseLoader
    from langchain_text_splitters import RecursiveCharacterTextSplitter
    from langchain_community.vectorstores import Chroma
    from langchain_community.embeddings import OllamaEmbeddings
    from langchain.chains import RetrievalQA
    from langchain_core.prompts import PromptTemplate


    def init_ollama_llm(model, temperature, top_p):
        # Create the LLM with streaming output to stdout
        return Ollama(model=model,
                      temperature=temperature,
                      top_p=top_p,
                      callback_manager=CallbackManager([StreamingStdOutCallbackHandler()])
                      )


    def content_web(url):
        # Load the page at the given URL and split it into chunks
        loader = WebBaseLoader(
            web_paths=(url,),
        )
        docs = loader.load()
        # chunk_overlap: the overlap between adjacent chunks; overlap helps reduce the chance
        # of separating a statement from the important context related to it.
        # Setting chunk_overlap gives better results.
        # Sensible chunking improves RAG results.
        text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
        splits = text_splitter.split_documents(docs)
        return splits


    def chroma_retriever_store_content(splits):
        # Run the embedding model nomic-embed-text through ollama:
        # "A high-performing open embedding model with a large token context window."
        vectorstore = Chroma.from_documents(documents=splits,
                                            embedding=OllamaEmbeddings(model='nomic-embed-text'))
        return vectorstore.as_retriever()


    def rag_prompt():
        return PromptTemplate(
            input_variables=['context', 'question'],
            template=(
                "You are an assistant for question-answering tasks. "
                "Use the following pieces of retrieved context to answer the question. "
                "If you don't know the answer, just say you don't know without any explanation "
                "Question: {question} Context: {context} Answer:"
            ),
        )


    def ollama_rag_chroma_web_content(web_url, question, temperature, top_p):
        llm = init_ollama_llm('llama3', temperature, top_p)
        splits = content_web(web_url)
        retriever = chroma_retriever_store_content(splits)
        qa_chain = RetrievalQA.from_chain_type(llm, retriever=retriever, chain_type_kwargs={'prompt': rag_prompt()})
        return qa_chain.invoke({'query': question})['result']


    demo = gr.Interface(
        fn=ollama_rag_chroma_web_content,
        inputs=[gr.Textbox(label='web_url', value='https://vuejs.org/guide/introduction.html', info='URL of the web page to scrape'),
                'text',                     # question
                gr.Slider(0, 1, step=0.1),  # temperature
                gr.Slider(0, 1, step=0.1)], # top_p
        outputs='text',
        title='Ollama+RAG Example',
        description='Enter the URL of a web page, then ask a question to get an answer')
    demo.launch()

After it starts, it prints the page address, Running on local URL: http://127.0.0.1:7860. Opening that address shows the following:

[Figure: screenshot of the Gradio interface]

References

https://github.com/ollama/ollama

https://python.langchain.com/

https://partee.io/2022/08/11/vector-embeddings/

https://jalammar.github.io/illustrated-word2vec/
