Overview

I need to LoRA-fine-tune the Qwen2-72B model on an Ascend server to change its self-cognition (how it describes its own identity and creator). My environment has 8 × 910B1 cards, with roughly 512 GB of device memory in total.

Preparation: install LLaMA-Factory

Install LLaMA-Factory following the official instructions: https://github.com/hiyouga/LLaMA-Factory

One point deserves special emphasis: install deepspeed at exactly the version the documentation requires. A version that is too new or too old will break multi-card training.

Prepare the dataset and training configuration

My working directory looks like this:

```
./
├── data
│   ├── dataset_info.json
│   └── self_cognition.json
├── deepspeed
│   └── ds_z3_config.json
├── models
├── start_train.sh
└── train_config.yaml
```

In the data directory, self_cognition.json is the dataset I prepared, in Alpaca format; dataset_info.json is the dataset registry file that the training run will reference later.

dataset_info.json contains:

```json
{
  "train_data_name": {"file_name": "self_cognition.json"}
}
```

I registered only one dataset here; in practice you can register many.
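The contents of my self_cognition.json are not reproduced here. For readers who have not built such a file before, the snippet below is only a minimal sketch of what an Alpaca-format self-cognition dataset can look like; the model name "MyBot" and creator "MyTeam" are placeholders, not the values actually used in my training run.

```python
import json

# Hypothetical identity strings; replace with your own model name and creator.
records = [
    {
        "instruction": "你是谁?",
        "input": "",
        "output": "我是MyBot,由MyTeam训练的大语言模型。",
    },
    {
        "instruction": "告诉我你的身份和创造者",
        "input": "",
        "output": "我的名字是MyBot,我的创造者是MyTeam。",
    },
]

# Write the records in the Alpaca format that dataset_info.json points to.
with open("data/self_cognition.json", "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)
```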
ds_z3_config.json under the deepspeed directory is the DeepSpeed configuration file; multi-card training requires it. The LLaMA-Factory source tree ships reference files under examples/deepspeed, and I simply copied one over. Its contents:

```json
{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "zero_allow_untested_optimizer": true,
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "loss_scale_window": 1000,
    "initial_scale_power": 16,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "bf16": {
    "enabled": "auto"
  },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "sub_group_size": 1e9,
    "reduce_bucket_size": "auto",
    "stage3_prefetch_bucket_size": "auto",
    "stage3_param_persistence_threshold": "auto",
    "stage3_max_live_parameters": 1e9,
    "stage3_max_reuse_distance": 1e9,
    "stage3_gather_16bit_weights_on_model_save": true
  }
}
```

The models directory is empty; it will hold the models produced by training.

train_config.yaml is the training configuration file; its contents:

```yaml
### model
model_name_or_path: /data/xxx/mindformer_share/Qwen2-72B-Instruct/

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all

### ddp
ddp_timeout: 180000000
deepspeed: ./deepspeed/ds_z3_config.json

### dataset
dataset: train_data_name
template: qwen
cutoff_len: 1024
max_samples: 200
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: ./models/
logging_steps: 10
save_steps: 50
plot_loss: true
overwrite_output_dir: true
#report_to: tensorboard
#logging_dir: /data/xxx/mindformer_share/llamaFactory/tensorboard/

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 2
learning_rate: 1.0e-4
num_train_epochs: 40.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
fp16: true

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
```

Here model_name_or_path specifies the path to the base model and output_dir the output path for the trained model. num_train_epochs is the number of training epochs, max_samples caps the number of samples (adjust it to your situation), and save_steps sets how many steps pass between intermediate checkpoints.

The deepspeed setting deserves special attention. If no DeepSpeed configuration file is specified, plain data parallelism is used by default, which fails as soon as the model cannot be loaded onto a single card. With DeepSpeed (ZeRO-3) configured, the model is sharded, so a large model can be spread evenly across multiple cards.

My training launcher script is start_train.sh:

```sh
#!/bin/sh

source /usr/local/Ascend/ascend-toolkit/set_env.sh

set -x
#export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7

llamafactory-cli train ./train_config.yaml
```

Run the training

Run the start_train.sh script from the command line:

```sh
sh start_train.sh
```

After training finishes, the ./models directory contains the following files:

```
./models/
├── adapter_config.json
├── adapter_model.safetensors
├── added_tokens.json
├── all_results.json
├── checkpoint-100
├── checkpoint-150
├── checkpoint-200
├── checkpoint-50
├── eval_results.json
├── merges.txt
├── README.md
├── runs
├── special_tokens_map.json
├── tokenizer_config.json
├── tokenizer.json
├── trainer_log.jsonl
├── trainer_state.json
├── training_args.bin
├── training_loss.png
├── train_results.json
└── vocab.json
```

I trained for 40 epochs, saving a checkpoint every 50 steps. Looking at trainer_log.jsonl, the loss had already stabilized by checkpoint-150, so I decided to use the checkpoint-150 intermediate result for the remaining steps.

```
{"current_steps": 10, "total_steps": 200, "loss": 4.1278, "lr": 4e-05, "epoch": 2.0, "percentage": 5.0, "elapsed_time": "0:03:03", "remaining_time": "0:58:11"}
{"current_steps": 20, "total_steps": 200, "loss": 2.4438, "lr": 9e-05, "epoch": 4.0, "percentage": 10.0, "elapsed_time": "0:05:48", "remaining_time": "0:52:20"}
{"current_steps": 30, "total_steps": 200, "loss": 1.0016, "lr": 9.951340343707852e-05, "epoch": 6.0, "percentage": 15.0, "elapsed_time": "0:08:35", "remaining_time": "0:48:39"}
{"current_steps": 40, "total_steps": 200, "loss": 0.4434, "lr": 9.755282581475769e-05, "epoch": 8.0, "percentage": 20.0, "elapsed_time": "0:11:18", "remaining_time": "0:45:13"}
{"current_steps": 50, "total_steps": 200, "loss": 0.0837, "lr": 9.414737964294636e-05, "epoch": 10.0, "percentage": 25.0, "elapsed_time": "0:13:59", "remaining_time": "0:41:58"}
{"current_steps": 60, "total_steps": 200, "loss": 0.0096, "lr": 8.940053768033609e-05, "epoch": 12.0, "percentage": 30.0, "elapsed_time": "0:17:36", "remaining_time": "0:41:06"}
{"current_steps": 70, "total_steps": 200, "loss": 0.0059, "lr": 8.345653031794292e-05, "epoch": 14.0, "percentage": 35.0, "elapsed_time": "0:20:22", "remaining_time": "0:37:50"}
{"current_steps": 80, "total_steps": 200, "loss": 0.0019, "lr": 7.649596321166024e-05, "epoch": 16.0, "percentage": 40.0, "elapsed_time": "0:23:07", "remaining_time": "0:34:41"}
{"current_steps": 90, "total_steps": 200, "loss": 0.0026, "lr": 6.873032967079561e-05, "epoch": 18.0, "percentage": 45.0, "elapsed_time": "0:25:51", "remaining_time": "0:31:36"}
{"current_steps": 100, "total_steps": 200, "loss": 0.0011, "lr": 6.0395584540887963e-05, "epoch": 20.0, "percentage": 50.0, "elapsed_time": "0:28:36", "remaining_time": "0:28:36"}
{"current_steps": 110, "total_steps": 200, "loss": 0.0007, "lr": 5.174497483512506e-05, "epoch": 22.0, "percentage": 55.0, "elapsed_time": "0:32:03", "remaining_time": "0:26:13"}
{"current_steps": 120, "total_steps": 200, "loss": 0.001, "lr": 4.3041344951996746e-05, "epoch": 24.0, "percentage": 60.0, "elapsed_time": "0:34:44", "remaining_time": "0:23:09"}
{"current_steps": 130, "total_steps": 200, "loss": 0.0009, "lr": 3.4549150281252636e-05, "epoch": 26.0, "percentage": 65.0, "elapsed_time": "0:37:25", "remaining_time": "0:20:08"}
{"current_steps": 140, "total_steps": 200, "loss": 0.0008, "lr": 2.6526421860705473e-05, "epoch": 28.0, "percentage": 70.0, "elapsed_time": "0:40:08", "remaining_time": "0:17:12"}
{"current_steps": 150, "total_steps": 200, "loss": 0.0009, "lr": 1.9216926233717085e-05, "epoch": 30.0, "percentage": 75.0, "elapsed_time": "0:42:48", "remaining_time": "0:14:16"}
{"current_steps": 160, "total_steps": 200, "loss": 0.0009, "lr": 1.2842758726130283e-05, "epoch": 32.0, "percentage": 80.0, "elapsed_time": "0:46:37", "remaining_time": "0:11:39"}
{"current_steps": 170, "total_steps": 200, "loss": 0.0007, "lr": 7.597595192178702e-06, "epoch": 34.0, "percentage": 85.0, "elapsed_time": "0:49:21", "remaining_time": "0:08:42"}
{"current_steps": 180, "total_steps": 200, "loss": 0.0007, "lr": 3.6408072716606346e-06, "epoch": 36.0, "percentage": 90.0, "elapsed_time": "0:52:05", "remaining_time": "0:05:47"}
{"current_steps": 190, "total_steps": 200, "loss": 0.0007, "lr": 1.0926199633097157e-06, "epoch": 38.0, "percentage": 95.0, "elapsed_time": "0:54:53", "remaining_time": "0:02:53"}
{"current_steps": 200, "total_steps": 200, "loss": 0.0007, "lr": 3.04586490452119e-08, "epoch": 40.0, "percentage": 100.0, "elapsed_time": "0:57:36", "remaining_time": "0:00:00"}
{"current_steps": 200, "total_steps": 200, "epoch": 40.0, "percentage": 100.0, "elapsed_time": "0:58:26", "remaining_time": "0:00:00"}
```
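Rather than eyeballing the raw log, a few lines of Python can pull the loss values out of trainer_log.jsonl and make the plateau easy to spot. This is just a convenience sketch, not part of LLaMA-Factory itself; it assumes the output_dir of ./models/ used above.

```python
import json

# Read LLaMA-Factory's trainer_log.jsonl and print the loss at each logging step.
log_path = "./models/trainer_log.jsonl"  # output_dir from train_config.yaml
with open(log_path, encoding="utf-8") as f:
    entries = [json.loads(line) for line in f if line.strip()]

for e in entries:
    if "loss" in e:  # the final summary entry has no loss field
        print(f"step {e['current_steps']:>4}  loss {e['loss']:.4f}")
```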
Merge the LoRA model into the base model

Merge method 1: llamafactory-cli export

The llamafactory-cli command-line tool has LoRA merging built in; just write a merge configuration file, using the files under examples/merge_lora/ in the source tree as a reference.

First write a merge configuration file, qwen_merge_lora_config.yaml:

```yaml
### Note: DO NOT use quantized model or quantization_bit when merging lora adapters

### model
model_name_or_path: /data/xxx/mindformer_share/Qwen2-72B-Instruct/
adapter_name_or_path: /data/xxx/mindformer_share/llamaFactory/models/checkpoint-150/
template: qwen
finetuning_type: lora

### export
export_dir: /data/xxx/mindformer_share/llamaFactory/export_merge_lora/
export_size: 2
export_device: cpu  # can also be set to npu
export_legacy_format: false
```

In this file, model_name_or_path is the base-model path, adapter_name_or_path is the LoRA training output path, and export_dir is where the merged model is written. template is the model's prompt template; keep it the same as in the LoRA training configuration.

Then run in a terminal:

```sh
llamafactory-cli export qwen_merge_lora_config.yaml
```

When it finishes, the directory given by export_dir holds the merged, complete model.

Merge method 2: peft in Python

You can also merge the LoRA model with Python by calling peft directly.

```python
import torch
import torch_npu
from torch_npu.npu import amp
from torch_npu.contrib import transfer_to_npu
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation.utils import GenerationConfig
from peft import PeftModel

model_path = "/data/xxx/mindformer_share/Qwen2-72B-Instruct/"
lora_path = "/data/xxx/mindformer_share/llamaFactory/models/checkpoint-150/"
merge_path = "./export_python_merge"

print(f"Loading the base model from {model_path}")
tokenizer = AutoTokenizer.from_pretrained(
    model_path, revision="v2.0", use_fast=False, trust_remote_code=True)
base_model = AutoModelForCausalLM.from_pretrained(
    model_path, revision="v2.0", device_map="auto",
    torch_dtype=torch.float16, trust_remote_code=True)
#   trust_remote_code=True).eval().half().npu()

print(f"Loading the LoRA from {lora_path}")
lora_model = PeftModel.from_pretrained(
    base_model, lora_path, torch_dtype=torch.float16)

print("Applying the LoRA")
model = lora_model.merge_and_unload()

print(f"Saving the target model to {merge_path}")
model.save_pretrained(merge_path)
print(f"Saving the tokenizer to {merge_path}")
tokenizer.save_pretrained(merge_path)
```

In the code above, model_path is the base-model path, lora_path is the LoRA model directory, and merge_path is the output path for the merged model.

Comparing the two merge methods

With method 1 and export_device set to cpu, at least about 140 GB of host memory is used (roughly the size of the model), up to about 100 CPU cores are busy, and NPU resources are almost untouched.

With method 1 and export_device set to npu, the overall behaviour is about the same as method 2.

Both methods take roughly 15 minutes to merge the model.

Test the merged model

The merged model is tested the same way as a model downloaded from Hugging Face. This is the code I used to test the merged qwen2-72b model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "npu"  # the device to load the model onto

#model_path = "/data/yuanll/mindformer_share/Qwen2-72B-Instruct/"
#model_path = "./export_merge_lora"
model_path = "./export_python_merge"

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_path)

#{"role": "system", "content": "You are a helpful assistant."},
prompt = "告诉我你的身份和创造者"
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(f"response: {response}")
```

References

在昇腾开发环境合并baichuan2-13B模型的lora文件
Qwen/Qwen2-72B-Instruct
LLaMA-Factory 实战(一):采用 LoRA 方式对 QWen2 做指令微调
LLaMA-Factory 8卡4090 deepspeed zero3 微调 Qwen14B-chat