当前位置：首页 > news >正文

南京当的网站网站设计模版免费下载

news 2026/4/16 18:52:49

南京当的网站,网站设计模版免费下载,济南招聘网,网站设计风格有几种欢迎关注我的CSDN#xff1a;https://spike.blog.csdn.net/ 本文地址#xff1a;https://spike.blog.csdn.net/article/details/144304351 免责声明#xff1a;本文来源于个人知识与公开资料#xff0c;仅用于学术交流#xff0c;欢迎讨论#xff0c;不支持转载。 LLaVA-… 欢迎关注我的CSDNhttps://spike.blog.csdn.net/ 本文地址https://spike.blog.csdn.net/article/details/144304351 免责声明本文来源于个人知识与公开资料仅用于学术交流欢迎讨论不支持转载。 LLaVA-CoT是思维链(Chain-of-Thought, CoT) 的视觉语言模型通过自主的多阶段推理提升系统化和结构化推理能力实现深度思考包含总结摘要(SUMMARY)、视觉解释(CAPTION)、逻辑推理(REASONING)、结论生成(CONCLUSION) 等 4 个阶段提出推理时阶段级束搜索(Inference-Time Stage-Level Beam Search) 方法以实现有效的推理时扩展结果表明在多模态推理基准测试中超越基础模型和其他更大、甚至是闭源的模型。 Paper: LLaVA-CoT: Let Vision Language Models Reason Step-by-Step GitHub: https://github.com/PKU-YuanGroup/LLaVA-CoT git clone https://github.com/PKU-YuanGroup/LLaVA-CoT.git准备模型(Llama-3.2V-11B-cot)与训练数据(LLaVA-CoT-100k)以及参考模型(Llama-3.2-11B-Vision-Instruct) Llama-3.2V-11B-cot40GLLaVA-CoT-100k159GLlama-3.2-11B-Vision-Instruct40G cd [your path]/huggingface/# LLaVA-CoT 模型与数据 huggingface-cli download --token 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 Xkev/Llama-3.2V-11B-cot --local-dir Xkev/Llama-3.2V-11B-cot huggingface-cli download --repo-type dataset --token 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 Xkev/LLaVA-CoT-100k --local-dir Xkev/LLaVA-CoT-100k# HuggingFace 需要权限 # huggingface-cli download --token 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 meta-llama/Llama-3.2-11B-Vision-Instruct --local-dir meta-llama/Llama-3.2-11B-Vision-Instruct # ModelScope 模型 modelscope download --model fireicewolf/Llama-3.2-11B-Vision-Instruct --local_dir meta-llama/Llama-3.2-11B-Vision-InstructMeta 的 Llama-3.2-11B-Vision-Instruct 需要权限建议下载 ModelScope 版本即 fireicewolf/Llama-3.2-11B-Vision-Instruct 训练数据处理参考 HuggingFace - Xkev/LLaVA-CoT-100k cat image.zip.part-* image.zip # 速度较慢 unzip image.zip训练集样式 {id: f2813de1-767c-4e27-8ba8-e04fd383209d,image: sqa/train/20839/image.png,conversations: [{from: human,value: Which country is highlighted?\nContext: N/A\nOptions: (A) Solomon Islands (B) Nauru (C) Vanuatu (D) Fiji}, {from: gpt,value: SUMMARY I will determine the highlighted country by examining its location on the map and comparing it with the given options. I will then conclude if this country has any territorial disputes based on the supplied context. /SUMMARY\n\nCAPTION The image depicts a map highlighting a country in the South Pacific region. The highlighted area is in a boxed section between Australia and New Zealand, directly addressing the region known to include several Pacific island countries. /CAPTION\n\nREASONING The highlighted country is within the Pacific Islands region. Based on its position relative to neighboring larger landmasses like Australia and nearby countries such as Papua New Guinea and New Zealand, the highlighted country aligns with the location of Vanuatu. According to the context, Vanuatu has a territorial dispute over Matthew and Hunter Islands, claimed by both Vanuatu and France. Therefore, the presence of a dashed box labeled \Disputed island\ suggests the inclusion of this dispute in the overview of the countrys territories. /REASONING\n\nCONCLUSION The answer is C. /CONCLUSION}] }即我将通过检查地图上的位置并与给定选项进行比较来确定高亮显示的国家。然后我将根据提供的上下文得出这个国家是否有任何领土争端的结论。图片显示了一张地图高亮显示了南太平洋地区的一个国家。高亮区域位于澳大利亚和新西兰之间的一个框选部分直接指向已知包含几个太平洋岛国的地区。高亮显示的国家位于太平洋岛屿地区。根据其相对于邻近较大的陆地如澳大利亚和附近国家如巴布亚新几内亚及新西兰的位置高亮显示的国家与瓦努阿图的位置相符。根据上下文瓦努阿图对马修岛和亨特岛有领土争端这两个岛屿由瓦努阿图和法国声称拥有。因此标有“有争议的岛屿”的虚线框的存在表明将这一争端包含在国家领土的概览中。答案是C。测试模型参考使用 LLaMA-Factory 微调大模型环境配置与训练推理 cd [your path]/llm/LLaMA-Factory conda activate llama_factoryunset https_proxy http_proxy# export GRADIO_ANALYTICS_ENABLEDFalse # 必须添加否则报错 CUDA_VISIBLE_DEVICES0 GRADIO_ANALYTICS_ENABLEDFalse API_PORT7861 llamafactory-cli webchat \ --model_name_or_path [your path]/huggingface/Xkev/Llama-3.2V-11B-cot \ --template mllama# export GRADIO_ANALYTICS_ENABLEDFalse # 必须添加否则报错 CUDA_VISIBLE_DEVICES1 GRADIO_ANALYTICS_ENABLEDFalse API_PORT7862 llamafactory-cli webchat \ --model_name_or_path [your path]/huggingface/meta-llama/Llama-3.2-11B-Vision-Instruct \ --template mllama注意两个模型不能同时启动需要分别启动。 LLaVA-CoT 能够生成 4 个不同的阶段总结(SUMMARY)、描述(CAPTION)、推理(REASONING)、结论(CONCLUSION)即总结(SUMMARY)模型对即将执行的任务进行简要概述。描述(CAPTION)对图像中相关部分的描述如果存在重点介绍与问题相关的元素。推理(REASONING)详细的分析在此过程中模型系统地考虑问题。结论(CONCLUSION)简洁的答案总结基于之前的推理提供最终答复。 LLaMA-Factory 的 ChatBot 比较 LLaVA-CoT 与 Llama-Vision-Instruct 的输出差异注意默认模型都没有使用阶段级的束搜索(stage-level beam search)即只有1次推理 LLaVA-CoT通过 SUMMARY(总结) - CAPTION(描述) - REASONING(推理) - CONCLUSION(结论) 等 4 步进行。Llama-Vision-Instruct通过直接推理流程。 LLaVA-CoT 如下 I will analyze the image by focusing on the elements and composition to provide an accurate description that matches the standard answer. 我将分析图像通过关注图像中的元素和构图来提供准确的描述以匹配标准答案。 The image features a person wearing a white shirt and a white face mask. They are sitting on a stool with a pink cushion. Their feet are adorned with white shoes featuring a decorative pattern. Next to them is a green stool, and a circular object is visible in the foreground. The scene is set against a white wall, creating a minimalist and modern aesthetic. 在图片中人物穿着白色衬衫和白色罩衫。他们坐在带有粉色垫子的凳子上。他们的脚上穿着带有装饰图案的白色鞋子。在他们旁边是一个绿色的凳子前景中可以看到一个圆形物体。场景设置在白色墙壁前营造出简约而现代的美感。 To describe the image, I will note the clothing and accessories, such as the white shirt, face mask, and shoes. I will also observe the seating arrangement, mentioning the stools and their colors. Additionally, I will describe the surrounding objects and the overall setting to capture the essence of the scene. 为了描述这张图片我将注意服装和配饰比如白色衬衫、罩衫和鞋子。我还将观察座位安排提及凳子及其颜色。此外我将描述周围的物体和整体环境以捕捉场景的精髓。 The image captures a serene moment in a minimalist, modern setting. A person, dressed in a crisp white shirt and a white face mask, is seated on a stool with a pink cushion. Their feet, clad in white shoes with a decorative pattern, rest on the ground. The person’s head is tilted back, and their arms are crossed over their chest, adding a sense of relaxation to the scene. The stool they sit on is green, providing a pop of color against the otherwise monochrome backdrop. A circular object, possibly a table or a decoration, sits in the foreground, adding depth to the image. The background is a stark white wall, devoid of any distractions, allowing the viewer’s focus to remain on the person and their surroundings. The overall composition of the image suggests a moment of quiet contemplation in a simple yet elegant environment. 图片捕捉了一个简约现代环境中的宁静时刻。一个穿着整洁白色衬衫和白色罩衫的人坐在带有粉色垫子的凳子上。他们的脚上穿着带有装饰图案的白色鞋子放在地上。这个人的头向后仰双臂交叉在胸前为场景增添了一种放松的感觉。他们坐的凳子是绿色的在其他单色背景中提供了一抹色彩。一个圆形物体可能是桌子或装饰品位于前景为图片增加了深度。背景是一面纯白色的墙壁没有任何干扰使观众的注意力保持在人物及其周围环境上。图片的整体构图暗示了一个在简单而优雅的环境中的沉思时刻。 Llama-Vision-Instruct 如下 This image presents a close-up view of a woman’s legs and feet, dressed in a crisp white outfit and striking white high-heeled shoes with intricate white embellishments. She is seated on a stool, with her legs bent at the knees and her feet crossed at the ankles, as she leans back against the stool with her arms tucked behind her. 这张图片展示了一位女性的腿部和脚部的特写视角她穿着一套清爽的白色服装搭配着引人注目的白色高跟鞋鞋子上装饰着复杂的白色装饰。她坐在一个凳子上双腿在膝盖处弯曲双脚在脚踝处交叉同时她向后靠在凳子上双臂藏在身后。 The stool features a soft pink cushion and a light-colored wooden frame, situated in front of a mint green stool. The room’s white walls and shiny white floor provide a clean and minimalist backdrop, while the sunlight streaming through the window casts a warm glow on the scene. 凳子上有一个柔软的粉色垫子和一个浅色的木制框架位于一个薄荷绿色的凳子前面。房间的白色墙壁和闪亮的白色地板提供了一个干净且简约的背景而透过窗户照射进来的阳光为场景增添了温暖的光辉。 HuggingFace的在线测试网站Llama-3.2V-11B-cot LLaVA-COT 使用阶段级束搜索方法在每个阶段(即总结、描述、推理、结论)中生成多个候选结果选择(使用LLM选择)其中最优的一个来继续生成过程即推理时阶段级的束搜索(Inference-Time Stage-Level Beam Search) 方法需要集成至 VLMEvalKit 使用源码如下 def judge(self, image, prompt, outputs, typesummary) 即 Judge 函数也是通过大模型进行处理。 # 定义一个函数用于生成内部阶段的文本输出 def generate_inner_stage_beam(self, message, datasetNone):# 将传入的消息转换为提示和图片路径prompt, image_path self.message_to_promptimg(message, datasetdataset)# 打开图片文件image Image.open(image_path)# 创建一个包含用户角色和内容的消息列表内容中包含图片和文本messages [{role: user, content: [{type: image},{type: text, text: prompt}]}]# 应用聊天模板并添加生成提示input_text self.processor.apply_chat_template(messages, add_generation_promptTrue)# 将图片和文本输入处理成模型需要的格式并转移到指定的设备如GPUinputs self.processor(image, input_text, return_tensorspt).to(self.device)# 如果不使用自定义提示则根据不同的数据集类型设置最大新令牌数if not self.use_custom_prompt(dataset):if DATASET_TYPE(dataset) MCQ or DATASET_TYPE(dataset) Y/N:self.kwargs[max_new_tokens] 2048else:self.kwargs[max_new_tokens] 2048# 定义不同的阶段和对应的结束标记stages [SUMMARY, CAPTION, REASONING, CONCLUSION]end_markers [/SUMMARY, /CAPTION, /REASONING, /CONCLUSION]# 获取输入ID的初始长度initial_length len(inputs[input_ids][0])# 深拷贝输入IDinput_ids copy.deepcopy(inputs[input_ids])# 遍历每个阶段和结束标记for stage, end_marker in zip(stages, end_markers):# 定义停止条件当生成的文本包含结束标记时停止stop_criteria StoppingCriteriaList([StopOnStrings([end_marker], self.processor.tokenizer)])# 初始化候选生成文本列表candidates []# 生成10个候选文本for _ in range(10): generation_kwargs self.kwargs.copy()generation_kwargs.update({stopping_criteria: stop_criteria})# 将图片和当前输入ID处理成模型需要的格式并转移到指定的设备inputs self.processor(image, input_ids, return_tensorspt).to(self.device)# 使用模型生成文本output self.model.generate(**inputs, **generation_kwargs)# 获取新生成的IDnew_generated_ids output[0]# 解码生成的文本generated_text self.processor.tokenizer.decode(new_generated_ids[initial_length:], skip_special_tokensTrue)# 将生成的ID和文本添加到候选列表candidates.append({input_ids: new_generated_ids.unsqueeze(0),generated_text: generated_text,})# 通过比较和选择从候选列表中选择最佳的文本while(len(candidates) 1):# 随机选择两个候选文本candidate1 candidates.pop(np.random.randint(len(candidates)))candidate2 candidates.pop(np.random.randint(len(candidates)))outputs [candidate1[generated_text], candidate2[generated_text]]# 根据阶段类型和图片、提示、输出文本选择最佳文本best_index self.judge(image, prompt, outputs, typestage[1:-1].lower())if best_index 0:candidates.append(candidate1)else:candidates.append(candidate2)# 更新输入ID为最佳候选的IDinput_ids candidates[0][input_ids]# 解码最终的输出文本final_output self.processor.tokenizer.decode(input_ids[0][initial_length:], skip_special_tokensTrue)# 返回最终输出的文本return final_output参考视频工具视频下载、视频帧、视频去水印GitHub - how to use the inference_demo.pyGitHub - Meta-Llama/llama-recipesGitHub - open-compass/VLMEvalKit

查看全文

http://www.hkea.cn/news/14291379/