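This article walks through building a retrieval-augmented generation (RAG) system with LangChain, ChromaDB, and TinyLlama so that question answering over the Flutter documentation becomes more accurate and hallucinates less. It also shares lessons learned in practice and directions for optimization.
Asking a large language model such as ChatGPT a Flutter question directly runs into three problems: the model's knowledge may be out of date, the documentation changes quickly, and the model may hallucinate. RAG mitigates these by first retrieving relevant documentation passages and then having the model generate an answer grounded in those passages. The architecture is simple: user question → embedding vector → vector database search → Top-K relevant passages → LLM → answer.
The goal is a system that answers Flutter questions on topics such as widgets, navigation, and state management by retrieving the official documentation and letting the LLM generate the answer from it, rather than relying on the model's internal knowledge.
The full pipeline is as follows:
Flutter docs (web pages) → WebBaseLoader (scraping) → text cleaning → text splitting → embedding generation → ChromaDB storage → retriever (Top-3 similarity search) → prompt construction → TinyLlama answer generation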

LangChain's WebBaseLoader scrapes a curated set of pages from the official Flutter documentation, covering installation, the codelab, UI widgets, the cookbook, and state management.
from langchain_community.document_loaders import WebBaseLoader

# Curated pages from the official Flutter documentation
links = [
    "https://docs.flutter.dev/get-started/install",
    "https://docs.flutter.dev/get-started/codelab",
    "https://docs.flutter.dev/ui",
    "https://docs.flutter.dev/ui/widgets",
    "https://docs.flutter.dev/cookbook",
    "https://docs.flutter.dev/cookbook/navigation/navigation-basics",
    "https://docs.flutter.dev/cookbook/networking/fetch-data",
    "https://docs.flutter.dev/data-and-backend/state-mgmt/intro",
]
# Fetch each page and parse it into a LangChain Document
loader = WebBaseLoader(web_paths=links)
docs = loader.load()
print(f"loaded {len(docs)} pages")
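The raw HTML text contains large amounts of whitespace and line breaks and needs to be cleaned with regular expressions. This step matters: dirty data produces dirty chunks, which in turn degrades retrieval quality.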
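The cleaning code itself is not shown here, so the following is only a minimal sketch of what such a step might look like. The function name `clean_text` and the specific regular expressions are assumptions for illustration, not the author's implementation.

```python
import re


def clean_text(text: str) -> str:
    """Collapse whitespace runs left over from HTML extraction (hypothetical helper)."""
    # Normalise Windows and old-Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Squeeze runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Drop the single space that may remain on either side of a newline.
    text = re.sub(r" ?\n ?", "\n", text)
    # Keep at most one blank line so "\n\n" still marks a paragraph break.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


raw = "  Flutter   docs \n\n\n\n  Install\t\tguide  \n"
print(repr(clean_text(raw)))  # 'Flutter docs\n\nInstall guide'
```

Applied to every loaded page, for example by assigning `clean_text(doc.page_content)` back to `doc.page_content`, this would plausibly yield the `clean_docs` list that the splitting step consumes. Preserving double newlines is a deliberate choice in this sketch, because the splitter tries "\n\n" as its first separator.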

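The text is split with RecursiveCharacterTextSplitter, with chunk_size set to 1000 characters and chunk_overlap set to 200 characters. This keeps each chunk semantically focused while preserving information that straddles chunk boundaries.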
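from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,      # maximum characters per chunk
    chunk_overlap=200,    # characters shared between neighbouring chunks
    separators=["\n\n", "\n", ". ", " ", ""],  # tried in order, coarsest first
)
# clean_docs is the output of the cleaning step above
splits = text_splitter.split_documents(clean_docs)
print(f"got {len(splits)} chunks")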
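To see what the overlap buys, here is a deliberately simplified fixed-window chunker. It is not how RecursiveCharacterTextSplitter works internally (that class splits on the separators recursively); `sliding_chunks` is a hypothetical helper that only illustrates how neighbouring chunks share `chunk_overlap` characters.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size windows that overlap by chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


text = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_chunks(text, chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
# abcdefghij
# ghijklmnop
# mnopqrstuv
# stuvwxyz
```

Each chunk starts with the last four characters of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk, provided it is shorter than the overlap.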
The embedding model is the lightweight all-MiniLM-L6-v2, which runs without a GPU and produces embeddings good enough for semantic retrieval.
from langchain_huggingface import HuggingFaceEmbeddings
embedding = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
ChromaDB stores the embeddings, and the retriever is configured to return the Top-3 most relevant chunks.
from langchain_chroma import Chroma
vectorstore = Chroma.from_documents(documents=splits, embedding=embedding)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
print("vector store ready!")
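Conceptually, a Top-3 retriever ranks stored vectors by their similarity to the query vector and keeps the best three. The sketch below shows that idea with plain Python lists and cosine similarity. The vectors are made-up toy data, and cosine similarity is an assumption chosen for illustration; the distance metric and index that Chroma actually uses may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=3):
    """Return the indices of the k vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


# Toy 3-dimensional "embeddings" (hypothetical data)
doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.7, 0.7, 0.0],
]
query = [1.0, 0.0, 0.0]
print(top_k(query, doc_vecs, k=3))  # [0, 1, 4]
```

A real vector store replaces the linear scan with an approximate nearest-neighbour index, but the contract is the same: query vector in, the k closest chunks out.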
The generator is TinyLlama-1.1B-Chat, loaded in float16 to reduce GPU memory usage, with do_sample=False so that answers are deterministic.
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from langchain_huggingface import HuggingFacePipeline
import torch
model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
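model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to cut memory use
    device_map="auto",          # place weights on the available device(s)
)
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=300,
    do_sample=False,            # greedy decoding, so output is repeatable
    return_full_text=False,     # return only the completion, not the prompt
)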
llm = HuggingFacePipeline(pipeline=pipe)
print("model loaded - good to go!")
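A quick back-of-the-envelope calculation shows why float16 matters for a model of this size. The sketch assumes the nominal 1.1 billion parameter count from the model name and counts weights only; activations, the KV cache, and framework overhead come on top. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory needed for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


params = 1.1e9  # nominal parameter count of TinyLlama-1.1B (assumption)
fp32 = weight_memory_gb(params, 4)  # float32 uses 4 bytes per parameter
fp16 = weight_memory_gb(params, 2)  # float16 uses 2 bytes per parameter
print(f"float32: {fp32:.1f} GB, float16: {fp16:.1f} GB")
# float32: 4.4 GB, float16: 2.2 GB
```

Halving the bytes per parameter halves the weight footprint, which can be the difference between fitting on a small GPU and not fitting at all.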
A prompt template ties the retriever and the LLM together:
from langchain_core.prompts import PromptTemplate
prompt = PromptTemplate.from_template("""<|system|>
You are a Flutter documentation assistant. Answer ONLY using the context below. Be concise.
</s>
<|user|>
Context:
{context}
Question: {question}
</s>
<|assistant|>""")
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
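def format_docs(docs):
    # Join the retrieved chunks into one context string, separated by blank lines
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    # The incoming question is sent to the retriever and also passed through unchanged
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)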
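To make concrete what the chain hands to the model, the following standard-library sketch assembles the same prompt text by hand. It mirrors the template above but uses `str.format` instead of LangChain, and the chunk strings are placeholders rather than real retrieved documentation. `build_prompt` is a hypothetical helper.

```python
TEMPLATE = """<|system|>
You are a Flutter documentation assistant. Answer ONLY using the context below. Be concise.
</s>
<|user|>
Context:
{context}
Question: {question}
</s>
<|assistant|>"""


def build_prompt(chunks, question):
    """Fill the template with retrieved chunks joined by blank lines."""
    context = "\n\n".join(chunks)
    return TEMPLATE.format(context=context, question=question)


# Placeholder strings standing in for the Top-3 retrieved chunks
chunks = ["chunk A text", "chunk B text", "chunk C text"]
prompt_text = build_prompt(chunks, "What is Flutter and what can you build with it?")
print(prompt_text)
```

With the real chain, the equivalent call would be `rag_chain.invoke("What is Flutter and what can you build with it?")`, which runs retrieval, prompt construction, and generation in one step. Ending the prompt at the `<|assistant|>` tag is what cues the chat model to produce the answer next.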
Asked "What is Flutter and what can you build with it?", the model answers correctly based on the retrieved documentation.
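This article covered the full RAG workflow: loading, cleaning, chunking, embedding, storage, retrieval, and generation. The same architecture is not limited to the Flutter documentation and carries over to question answering on other documentation sets and knowledge bases.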