当前位置：首页 > news >正文

沃尔玛商城wordpress分类目录seo

news 2026/4/15 9:20:34

沃尔玛商城,wordpress分类目录seo,怎么创办网站,专业平台建设微调Llama VS GPT-4o 别忘了关注作者#xff0c;关注后您会变得更聪明#xff0c;不关注就只能靠颜值了 ^_^。一位年轻的儿科医生与一位经验丰富的医师#xff0c;谁更能有效治疗婴儿的咳嗽#xff1f; 两者都具备治疗咳嗽的能力#xff0c;但儿科医生由于专攻儿童医学… 微调Llama VS GPT-4o 别忘了关注作者关注后您会变得更聪明不关注就只能靠颜值了 ^_^。一位年轻的儿科医生与一位经验丰富的医师谁更能有效治疗婴儿的咳嗽两者都具备治疗咳嗽的能力但儿科医生由于专攻儿童医学或许在诊断婴儿疾病方面更具优势。这也正如小模型在某些特定任务上的表现往往经过微调后能够比大型模型更为出色尽管大型模型号称可以处理任何问题。最近我面临了一个必须在两者之间做出选择的场景。我正在开发一个查询路由系统用于将用户的请求引导至合适的部门然后由人工继续对话。从技术角度看这是一个文本分类任务。虽然GPT-4o及其小版本在这类任务上表现优秀但它的使用成本较高且由于是封闭模型我无法在自己的环境中进行微调。尽管OpenAI提供了微调服务但对我来说成本仍然过于昂贵。每百万个Token的训练费用为25美元而我的训练数据量很快就达到了数百万个Token。再加上微调后的模型使用费用比普通模型高50%这对我的小型项目而言预算无疑是无法承受的。因此我必须寻找一个替代方案。相比之下开源模型在处理分类任务时同样表现不俗且训练成本相对较低尤其是在使用GPU时。经过慎重考虑我决定转向小型模型。小型LLM通过微调可以在有限的预算下实现令人满意的效果这是我目前最为理想的选择。小型模型可以在普通硬件上运行微调所需的GPU也不必过于昂贵。更为重要的是小模型的训练和推理速度远快于大型LLM。经过一番调研我挑选了几款候选模型——Phi3.5、DistillBERT和GPT-Neo但最终选择了Meta Llama 3.2的1B模型。这个选择并非完全理性部分原因可能是最近关于这个模型的讨论较多。不过实践出真知我决定通过实测来检验效果。在接下来的部分我将分享我微调Llama 3.2–1B指令模型与使用少样本提示的GPT-4o的对比结果。微调Llama 3.2 1B模型免费实现微调微调模型的确可能需要较高的成本但如果选择合适的策略还是能够大幅降低开支。针对我的情况我采用了参数优化的微调PEFT策略而不是完全参数微调。完全微调会重新训练模型中的全部1B参数成本太高且可能导致“灾难性遗忘”即模型丢失预训练时学到的部分知识。而PEFT策略则聚焦于仅微调部分参数大大减少了时间和资源的消耗。其中“低秩适应”LORA技术是目前较为流行的微调方法。LORA允许我们仅对某些特定层的部分参数进行微调这样的训练不仅高效且效果明显。此外通过模型量化我们可以将模型的参数压缩为float16甚至更小的格式这不仅减少了内存消耗还能提高计算速度。当然精度可能会有所下降但对于我的任务来说这一折衷是可以接受的。接下来我将在免费的Colab和Kaggle平台上进行了微调。这些平台提供的GPU资源虽然有限但对于像我这样的小模型训练任务已经足够,关键它们免费。 Llama-3.2微调与GPT-4o少样本提示的对比微调Llama 3.2 1B模型的过程相对简单。我参考了Unsloth提供的Colab笔记本并做了部分修改。原笔记本微调的是3B参数的模型而我将其改为1B参数的Llama-3.2–Instruct因为我想测试较小模型在分类任务上的表现。接着我将数据集替换为我自己的数据用于训练。 # Beforefrom unsloth.chat_templates import standardize_sharegptdataset standardize_sharegpt(dataset)dataset dataset.map(formatting_prompts_func, batched True,)# Afterfrom datasets import Datasetdataset Dataset.from_json(/content/insurance_training_data.json)dataset dataset.map(formatting_prompts_func, batched True,) 最稳妥的做法是选择一个与笔记本初始设计相符的数据集例如下面的这个。 {conversations: [{role: user, content: user_query}{role: assistant, content: department}]} 到这里为止这两处调整已经足够让你用自己的数据微调模型了。评估微调后的模型接下来是关键的一步评估测试。评估LLM是一项广泛且富有挑战性的工作也是LLM开发中最为重要的技能之一。我将再出一篇文章在其中详细讨论过如何评估LLM应用别忘了关注作者关注后您会变得更聪明不关注就只能靠颜值了 ^_^。不过为了简洁起见这次我会采用经典的混淆矩阵方式进行评估。只需在笔记本的末尾添加下面的代码即可。 from langchain.prompts import FewShotPromptTemplatefrom langchain_openai import ChatOpenAIfrom langchain_core.prompts import PromptTemplatefrom pydantic import BaseModel# 1. A function to generate response with the fine-tuned modeldef generate_response(user_query):# Enable faster inference for the language modelFastLanguageModel.for_inference(model)# Define the message templatemessages [{role: system, content: You are a helpful assistant who can route the following query to the relevant department.},{role: user, content: user_query},]# Apply the chat template to tokenize the input and prepare for generationtokenized_input tokenizer.apply_chat_template(messages,tokenizeTrue,add_generation_promptTrue, # Required for text generationreturn_tensorspt).to(cuda) # Send input to the GPU# Generate a response using the modelgenerated_output model.generate(input_idstokenized_input,max_new_tokens64,use_cacheTrue, # Enable cache for faster generationtemperature1.5,min_p0.1)# Decode the generated tokens into human-readable textdecoded_response tokenizer.batch_decode(generated_output, skip_special_tokensTrue)[0]# Extract the assistants response (after system/user text)assistant_response decoded_response.split(\n\n)[-1]return assistant_response# 2. Generate Responeses with OpenAI GPT-4o# Define the prompt template for the exampleexample_prompt_template PromptTemplate.from_template(User Query: {user_query}\n{department})# Initialize OpenAI LLM (ensure the OPENAI_API_KEY environment variable is set)llm ChatOpenAI(temperature0, modelgpt-4o)# Define few-shot examplesexamples [{user_query: I recently had an accident and need to file a claim for my vehicle. Can you guide me through the process?, department: Claims},...]# Create a few-shot prompt templatefew_shot_prompt_template FewShotPromptTemplate(examplesexamples,example_promptexample_prompt_template,prefixYou are an intelligent assistant for an insurance company. Your task is to route customer queries to the appropriate department.,suffixUser Query: {user_query},input_variables[user_query])# Define the department model to structure the outputclass Department(BaseModel):department: str# Function to predict the appropriate department based on user querydef predict_department(user_query):# Wrap LLM with structured outputstructured_llm llm.with_structured_output(Department)# Create the chain for generating predictionsprediction_chain few_shot_prompt_template | structured_llm# Invoke the chain with the user query to get the departmentresult prediction_chain.invoke(user_query)return result.department# 3. Read your evaluation dataset and predict departmentsimport jsonwith open(/content/insurance_bot_evaluation_data (1).json, r) as f:eval_data json.load(f)for ix, item in enumerate(eval_data):print(f{ix1} of {len(eval_data)})item[open_ai_response] generate_response(item[user_query])item[llama_response] item[open_ai_response]# 4. Compute the precision, recall, accuracy, and F1 scores for the predictions.# 4.1 Using Open AIfrom sklearn.metrics import precision_score, recall_score, accuracy_score, f1_scoretrue_labels [item[department] for item in eval_data]predicted_labels_openai [item[open_ai_response] for item in eval_data]# Calculate the scores for open_ai_responseprecision_openai precision_score(true_labels, predicted_labels_openai, averageweighted)recall_openai recall_score(true_labels, predicted_labels_openai, averageweighted)accuracy_openai accuracy_score(true_labels, predicted_labels_openai)f1_openai f1_score(true_labels, predicted_labels_openai, averageweighted)print(OpenAI Response Scores:)print(Precision:, precision_openai)print(Recall:, recall_openai)print(Accuracy:, accuracy_openai)print(F1 Score:, f1_openai)# 4.2 Using Fine-tuned Llama 3.2 1B Instructtrue_labels [item[department] for item in eval_data]predicted_labels_llama [item[llama_response] for item in eval_data]# Calculate the scores for llama_responseprecision_llama precision_score(true_labels, predicted_labels_llama, averageweighted, zero_division0)recall_llama recall_score(true_labels, predicted_labels_llama, averageweighted, zero_division0)accuracy_llama accuracy_score(true_labels, predicted_labels_llama)f1_llama f1_score(true_labels, predicted_labels_llama, averageweighted, zero_division0)print(Llama Response Scores:)print(Precision:, precision_llama)print(Recall:, recall_llama)print(Accuracy:, accuracy_llama)print(F1 Score:, f1_llama) 以上代码非常清晰明了。我们编写了一个函数利用微调后的模型进行部门预测。同时也为OpenAI GPT-4o构建了一个类似的函数。接着我们使用这些函数对评估数据集生成预测结果。评估数据集中包含了预期的分类现在我们也获得了模型生成的分类这为接下来的指标计算提供了基础。接下来我们将进行这些计算。以下是结果 OpenAI Response Scores:Precision: 0.9Recall: 0.75Accuracy: 0.75F1 Score: 0.818Llama Response Scores:Precision: 0.88Recall: 0.73Accuracy: 0.79F1 Score: 0.798 结果显示微调后的模型表现几乎接近GPT-4o。对于一个只有1B参数的小型模型来说这已经相当令人满意了。尽管GPT-4o的表现确实更好但差距非常微小。此外如果在少样本提示中提供更多示例GPT-4o的结果可能会进一步提升。不过由于我的示例有时比较长甚至包括几段文字这会显著增加成本毕竟OpenAI是按输入Token计费的。总结我现在对小型LLM非常认可。它们运行速度快成本低而且在大多数使用场景中都能满足需求尤其是在不进行微调的情况下。在这篇文章中我讨论了如何微调Llama 3.2 1B模型。该模型可以在较为普通的硬件上运行而且微调成本几乎为零。我当前的任务是文本分类。当然这并不意味着小型模型能够全面超越像GPT-4o这样的巨型模型甚至也不一定能胜过Meta Llama的8B、11B或90B参数的模型。较大的模型拥有更强的多语言理解能力、视觉指令处理能力以及更加广泛的世界知识。我的看法是如果这些“超级能力”不是你当前的需求为什么不选择一个小型LLM呢”

查看全文

http://www.hkea.cn/news/14272714/