win10部署本地大模型langchain+ollama

一、环境

windows10、Python 3.9.18、langchain==0.1.9

二、ollama下载

Download Ollama on Windows

0.1.33版本链接https://objects.githubusercontent.com/github-production-release-asset-2e65be/658928958/35e38c8d-b7f6-48ed-8a9c-f053d04b01a9?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAVCODYLSA53PQK4ZA%2F20240503%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20240503T004753Z&X-Amz-Expires=300&X-Amz-Signature=ead8e1666fde6b2f23c86dec1c46ef2759fa6f05f60de5a506103db53e03478f&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=658928958&response-content-disposition=attachment%3B%20filename%3DOllamaSetup.exe&response-content-type=application%2Foctet-stream

三、ollama安装及迁移到其他盘

1、ollama安装及模型下载

直接运行下载的exe文件即可。安装完成后win+r打开cmd，输入ollama，有显示以下的内容就是安装成功

2、模型下载

详细见library，这里执行ollama run qwen:7b下载了阿里的，4.2G

3、默认位置

ollama及其下载的模型默认在c盘，会占用较大的空间。

ollama默认安装位置在C:\Users\XX\AppData\Local\Programs\Ollama

下载的模型默然安装在C:\Users\XX\.ollama

4、迁移操作

（1）将C:\Users\XX\AppData\Local\Programs\Ollama这个文件夹移动到其他盘（D:\Ollama）

（2）修改环境变量的用户变量，将PATH变量中的C:\Users\XX\AppData\Local\Programs\Ollama修改为步骤（1）的位置

（3）在系统变量中新建一个OLLAMA_MODELS的变量，位置根据其他盘的存储空间去设置，比如在ollama的文件夹下（D:\Ollama\models）

（4）将以下2个文件迁移到新目录

四、ollama的独立使用

1、打印版本号：ollama -v

2、打印已下载的模型：ollama list

3、启动模型：ollama run qwen:7b；（没有的话会先下载）

4、退出会话：crtl+d

5、关闭ollama：任务栏小图标quit

五、ollama+langchain搭建应用服务

1、启动ollama服务

在cmd执行ollama serve，默认占用端口：11434

2、langchain中的封装

miniconda3\envs\py39\Lib\site-packages\langchain_community\llms\ollama.py

class _OllamaCommon(BaseLanguageModel):
    base_url: str = "http://localhost:11434"
    """Base url the model is hosted under."""

    model: str = "llama2"
    """Model name to use."""

    mirostat: Optional[int] = None
    """Enable Mirostat sampling for controlling perplexity.
    (default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0)"""

    mirostat_eta: Optional[float] = None
    """Influences how quickly the algorithm responds to feedback
    from the generated text. A lower learning rate will result in
    slower adjustments, while a higher learning rate will make
    the algorithm more responsive. (Default: 0.1)"""

    mirostat_tau: Optional[float] = None
    """Controls the balance between coherence and diversity
    of the output. A lower value will result in more focused and
    coherent text. (Default: 5.0)"""

    num_ctx: Optional[int] = None
    """Sets the size of the context window used to generate the
    next token. (Default: 2048)	"""

    num_gpu: Optional[int] = None
    """The number of GPUs to use. On macOS it defaults to 1 to
    enable metal support, 0 to disable."""

    num_thread: Optional[int] = None
    """Sets the number of threads to use during computation.
    By default, Ollama will detect this for optimal performance.
    It is recommended to set this value to the number of physical
    CPU cores your system has (as opposed to the logical number of cores)."""

    num_predict: Optional[int] = None
    """Maximum number of tokens to predict when generating text.
    (Default: 128, -1 = infinite generation, -2 = fill context)"""

    repeat_last_n: Optional[int] = None
    """Sets how far back for the model to look back to prevent
    repetition. (Default: 64, 0 = disabled, -1 = num_ctx)"""

    repeat_penalty: Optional[float] = None
    """Sets how strongly to penalize repetitions. A higher value (e.g., 1.5)
    will penalize repetitions more strongly, while a lower value (e.g., 0.9)
    will be more lenient. (Default: 1.1)"""

    temperature: Optional[float] = None
    """The temperature of the model. Increasing the temperature will
    make the model answer more creatively. (Default: 0.8)"""

    stop: Optional[List[str]] = None
    """Sets the stop tokens to use."""

    tfs_z: Optional[float] = None
    """Tail free sampling is used to reduce the impact of less probable
    tokens from the output. A higher value (e.g., 2.0) will reduce the
    impact more, while a value of 1.0 disables this setting. (default: 1)"""

    top_k: Optional[int] = None
    """Reduces the probability of generating nonsense. A higher value (e.g. 100)
    will give more diverse answers, while a lower value (e.g. 10)
    will be more conservative. (Default: 40)"""

    top_p: Optional[float] = None
    """Works together with top-k. A higher value (e.g., 0.95) will lead
    to more diverse text, while a lower value (e.g., 0.5) will
    generate more focused and conservative text. (Default: 0.9)"""

    system: Optional[str] = None
    """system prompt (overrides what is defined in the Modelfile)"""

    template: Optional[str] = None
    """full prompt or prompt template (overrides what is defined in the Modelfile)"""

    format: Optional[str] = None
    """Specify the format of the output (e.g., json)"""

    timeout: Optional[int] = None
    """Timeout for the request stream"""

    headers: Optional[dict] = None
    """Additional headers to pass to endpoint (e.g. Authorization, Referer).
    This is useful when Ollama is hosted on cloud services that require
    tokens for authentication.
    """

3、示例

（1）基于给定的信息回答用户的问题

这里我们给出的信息是【"小明是一位科学家", "小明在balala地区工作"】，要求大模型介绍小明

（2）“阅读”一份文档后回答用户关于文档内容的问题

from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain
from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import MessagesPlaceholder
from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings

model="qwen:7b"
llm = ChatOllama(model=model, temperature=0)
loader = PyPDFLoader('../file/test.pdf')
docs = loader.load()
embeddings = OllamaEmbeddings(model=model)
text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(docs)[:10]#解析前10页
vector = FAISS.from_documents(documents, embeddings)
# vector = FAISS.from_texts(["小明是一位科学家", "小明在balala地区工作"],embeddings)
retriever = vector.as_retriever()
prompt1 = ChatPromptTemplate.from_messages([
    MessagesPlaceholder(variable_name="chat_history"),
    ("user", "{input}"),
    ("user", "在给定上述对话的情况下，生成一个要查找的搜索查询，以获取与对话相关的信息")
])
retriever_chain = create_history_aware_retriever(llm, retriever, prompt1)
prompt2 = ChatPromptTemplate.from_messages([
    ("system", "根据文章内容回答问题:{context}"),
    MessagesPlaceholder(variable_name="chat_history"),
    ("user", "{input}"),
])
document_chain = create_stuff_documents_chain(llm, prompt2)
retrieval_chain = create_retrieval_chain(retriever_chain, document_chain)
chat_history = []
while True:
    question = input('用户：')
    response = retrieval_chain.invoke({
        "chat_history": chat_history,
        "input": question
    })
    answer = response["answer"]
    chat_history.extend([question, answer])
    print('AI：', answer)

本文来自互联网用户投稿，该文观点仅代表作者本人，不代表本站立场。本站仅提供信息存储空间服务，不拥有所有权，不承担相关法律责任。如若转载，请注明出处：http://www.mfbz.cn/a/591202.html

如若内容造成侵权/违法违规/事实不符，请联系我们进行投诉反馈qq邮箱809451989@qq.com，一经查实，立即删除！