
news 2026/4/6 18:59:41


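Llama Architecture Analysis from a Code Perspective

Llama Architecture Analysis
- Preface
- Llama Architecture Analysis
  - Tokenization
  - Backbone network
    - DecoderLayer
      - Attention
      - MLP
  - Downstream tasks
    - Causal language modeling
    - Text classification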

Llama Architecture Analysis

Preface

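Meta developed and publicly released the Llama family of large language models (LLMs), a collection of pretrained and fine-tuned generative text models ranging from 7 billion to 70 billion parameters (the 70B size belongs to Llama 2; the original LLaMA went up to 65B).

On most benchmarks, LLaMA-13B outperforms GPT-3 (175B), and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B.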

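Llama Architecture Analysis

Tokenization

The tokenization step uses a text tokenizer to split the input text into tokens and map each token to an integer ID. The resulting input_ids, together with an attention_mask, are what the model consumes.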


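from transformers import AutoTokenizer
# PATH_TO_CONVERTED_TOKENIZER: local path or Hub ID of a Llama tokenizer in Hugging Face format
tokenizer = AutoTokenizer.from_pretrained(PATH_TO_CONVERTED_TOKENIZER)
text = "Hey, are you conscious? Can you talk to me?"
# return_tensors="pt" returns PyTorch tensors; inputs holds input_ids and attention_mask
inputs = tokenizer(text, return_tensors="pt")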
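To make the text-to-input_ids step concrete, here is a minimal sketch of a greedy longest-match subword tokenizer over a tiny, hypothetical vocabulary. It is not Llama's real tokenizer: Llama 1 and 2 use a SentencePiece BPE model with a 32,000-token vocabulary. Only the `<s>` id of 1 and the `▁` word-start marker mirror Llama's conventions. The vocabulary, the other ids, and the `tokenize`/`encode` helpers are assumptions made up for illustration.

```python
# Hypothetical toy vocabulary; "▁" marks the start of a word (SentencePiece-style).
VOCAB = {"<unk>": 0, "<s>": 1, "▁hey": 5, ",": 6, "▁are": 7, "▁you": 8,
         "▁con": 9, "scious": 10, "?": 11}
BOS_ID = VOCAB["<s>"]

def tokenize(text, vocab=VOCAB):
    """Greedy longest-match segmentation of text into vocabulary pieces."""
    # Mark word boundaries the way SentencePiece does: spaces become "▁".
    s = "▁" + text.lower().replace(" ", "▁")
    pieces, i = [], 0
    while i < len(s):
        # Try the longest substring starting at i that is in the vocabulary.
        for j in range(len(s), i, -1):
            if s[i:j] in vocab:
                pieces.append(s[i:j])
                i = j
                break
        else:
            pieces.append("<unk>")  # no match: emit the unknown token, skip one char
            i += 1
    return pieces

def encode(text):
    """Return input_ids (with a leading BOS token) and an all-ones attention_mask."""
    input_ids = [BOS_ID] + [VOCAB[p] for p in tokenize(text)]
    attention_mask = [1] * len(input_ids)  # 1 = real token; no padding here
    return {"input_ids": input_ids, "attention_mask": attention_mask}

print(encode("Hey, are you conscious?"))
# {'input_ids': [1, 5, 6, 7, 8, 9, 10, 11], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1]}
```

The word "conscious" is not in the toy vocabulary, so it is split into the pieces `▁con` + `scious`. This is the same effect a real subword vocabulary has on rare words.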

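Backbone network

The backbone first feeds the tokenizer's input_ids into the embedding layer (embed_tokens), which maps each token ID to a vector. The result is hidden_states, the intermediate representation. hidden_states then passes through the stack of decoder layers (layers) and a final RMSNorm (norm). The output hidden_states is what the downstream task heads use.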


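# token embedding: a vocab_size x hidden_size lookup table
self.embed_tokens = nn.Embedding(config.vocab_size, config.hidden_size, self.padding_idx)
# stack of num_hidden_layers decoder layers
self.layers = nn.ModuleList([LlamaDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)])
# whether the FlashAttention-2 kernel is used
self._use_flash_attention_2 = config._attn_implementation == "flash_attention_2"
# final RMSNorm applied after the last decoder layer
self.norm = LlamaRMSNorm(config.hidden_size, eps=config.rms_norm_eps)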
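The embedding lookup and the final norm can be sketched in plain Python. This minimal illustration uses a tiny, hypothetical embedding table. `rms_norm` follows the RMSNorm formula y = weight · x / sqrt(mean(x²) + eps), which is what LlamaRMSNorm computes. Unlike LayerNorm, it does no mean subtraction and adds no bias. The decoder layers that sit between the two steps are omitted.

```python
import math

def embed(input_ids, table):
    """Embedding lookup: row i of the table is the vector for token id i."""
    return [list(table[i]) for i in input_ids]

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm over one vector: scale x by 1/RMS(x), then by a learned weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [w * v * inv_rms for w, v in zip(weight, x)]

# Hypothetical 4-token vocabulary with hidden_size = 2.
table = [[0.0, 0.0], [3.0, 4.0], [1.0, -1.0], [0.5, 2.0]]
hidden_states = embed([1, 2], table)  # shape (seq_len=2, hidden_size=2)
# ...the decoder layers would transform hidden_states here (omitted in this sketch)...
normed = [rms_norm(h, weight=[1.0, 1.0]) for h in hidden_states]
print(normed)
```

After normalization, each vector has a mean square of (almost exactly) 1, whatever its original scale. The learned `weight` then restores a per-dimension scale.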

DecoderLayer

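The backbone's layers are a stack of DecoderLayers, and num_hidden_layers sets how many there are. Together with hidden_size, this number largely determines the model's scale. The 7B model has 32 DecoderLayers with hidden_size 4096.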
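As a rough check on how the layer count and hidden_size drive model size, the parameter count can be computed from the config. The values below are the published Llama-2-7B settings: vocab_size 32000, hidden_size 4096, intermediate_size 11008, 32 layers, 32 attention heads, no grouped-query attention (32 key/value heads), and untied input and output embeddings. The function is a sketch that counts only this architecture's weight matrices and RMSNorm weights. Its name and signature are made up for illustration.

```python
def llama_param_count(vocab_size, hidden_size, intermediate_size,
                      num_layers, num_heads, num_kv_heads, tie_embeddings=False):
    """Count the weights of a Llama-style decoder-only model (no biases)."""
    head_dim = hidden_size // num_heads
    kv_dim = num_kv_heads * head_dim
    # q_proj and o_proj are hidden x hidden; k_proj and v_proj are hidden x kv_dim
    attn = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim
    # gate_proj and up_proj (hidden -> intermediate), down_proj (intermediate -> hidden)
    mlp = 3 * hidden_size * intermediate_size
    norms = 2 * hidden_size  # input_layernorm + post_attention_layernorm weights
    per_layer = attn + mlp + norms
    embed = vocab_size * hidden_size
    lm_head = 0 if tie_embeddings else vocab_size * hidden_size
    final_norm = hidden_size
    return num_layers * per_layer + embed + lm_head + final_norm

total = llama_param_count(32000, 4096, 11008, 32, 32, 32)
print(f"{total:,}")  # 6,738,415,616, i.e. about 6.7B
```

About 96% of these parameters (32 × 202,383,360) sit in the decoder layers. This is why the layer count is the main lever on model size.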

Each DecoderLayer in turn contains an Attention layer and an MLP layer. Its central design idea is the residual connection.

The layer is split into two parts:

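Part 1

- Keep a copy of hidden_states (the vectorized text) as the residual.
- Normalize it (input_layernorm, an RMSNorm).
- Pass it through the self-attention layer.
- Add the residual back.

Part 2

- Keep a copy of the hidden_states produced by Part 1 as the residual.
- Normalize it (post_attention_layernorm).
- Pass it through the MLP layer.
- Add the residual back.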
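The two parts form a pre-norm residual block. Each sublayer sees a normalized input, and its output is added back to the unnormalized stream. The minimal sketch below operates on a sequence of per-token vectors. `self_attn` and `mlp` are hypothetical stand-ins for the real sublayers: any function that maps a sequence to a same-shaped sequence will do. The RMSNorm weight is fixed to 1 for simplicity.

```python
import math

def rms_norm(x, eps=1e-6):
    # RMSNorm with the learned weight fixed to 1 in this sketch
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * inv_rms for v in x]

def add(a, b):
    # elementwise sum of two sequences of vectors
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def decoder_layer(hidden_states, self_attn, mlp):
    """Pre-norm residual block; hidden_states is a list of per-token vectors."""
    # Part 1: residual copy -> norm -> attention -> add residual
    residual = hidden_states
    h = self_attn([rms_norm(x) for x in hidden_states])
    hidden_states = add(residual, h)
    # Part 2: residual copy -> norm -> MLP -> add residual
    residual = hidden_states
    h = mlp([rms_norm(x) for x in hidden_states])
    return add(residual, h)

# Hypothetical stand-in: sublayers that output zeros leave the input unchanged,
# which shows the identity path that the residual connections provide.
zero = lambda seq: [[0.0] * len(x) for x in seq]
print(decoder_layer([[1.0, 2.0], [3.0, -1.0]], zero, zero))  # [[1.0, 2.0], [3.0, -1.0]]
```

The identity path is the point of the design. Even with 32 stacked layers, the signal (and the gradient) can flow through the additions unchanged, and each sublayer only has to learn a correction.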


# keep a copy of hidden_states for the residual connection
residual = hidden_states
# pre-attention RMSNorm
hidden_states = self.input_layernorm(hidden_states)
# self-attention
hidden_states, self_attn_weights, present_key_value = self.self_attn(
    hidden_states=hidden_states, attention_mask=attention_mask, position_ids=position_ids,
    past_key_value=past_key_value, output_attentions=output_attentions,
    use_cache=use_cache, padding_mask=padding_mask,
)
# add the residual back
hidden_states = residual + hidden_states
# keep a copy for the second residual connection
residual = hidden_states
# post-attention RMSNorm
hidden_states = self.post_attention_layernorm(hidden_states)
# feed-forward MLP
hidden_states = self.mlp(hidden_states)
# add the residual back
hidden_states = residual + hidden_states
outputs = (hidden_states,)
if output_attentions:
    outputs += (self_attn_weights,)
if use_cache:
    outputs += (present_key_value,)
return outputs

Attention

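The attention layer projects hidden_states into queries, keys and values. It applies rotary position embeddings (RoPE) to the queries and keys, so that attention scores reflect the relative positions of tokens. It then computes scaled dot-product attention under a causal mask, which lets each token gather context from itself and the tokens before it.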
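A minimal RoPE sketch in plain Python: each pair of dimensions is rotated by an angle proportional to the token position. Pairing dimension i with i + d/2 follows the `rotate_half` layout of the Hugging Face Llama code, and base = 10000 is Llama's default `rope_theta`. The helper names `rope` and `dot` and the example vectors are made up for illustration. The check at the end shows the property that matters: the query-key dot product depends only on the offset between the two positions, not on their absolute values.

```python
import math

def rope(x, pos, base=10000.0):
    """Apply a rotary position embedding to vector x (even length d) at position pos."""
    d = len(x)
    half = d // 2
    out = [0.0] * d
    for i in range(half):
        # pair i rotates with frequency base^(-2i/d): low i rotate fast, high i slowly
        theta = pos * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        x1, x2 = x[i], x[i + half]
        # standard 2-D rotation of the pair (x[i], x[i + half])
        out[i] = x1 * c - x2 * s
        out[i + half] = x1 * s + x2 * c
    return out

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

q = [0.1, -0.4, 0.7, 0.2]
k = [0.5, 0.3, -0.2, 0.9]
# Same relative offset (m - n = 2) at two different absolute positions.
s1 = dot(rope(q, 5), rope(k, 3))
s2 = dot(rope(q, 12), rope(k, 10))
print(abs(s1 - s2) < 1e-9)  # True
```

Because a rotation preserves vector length, RoPE changes only the angles between queries and keys. This is why it can be applied inside attention rather than added to the embeddings.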


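# linear projections to queries, keys and values
query_states = self.q_proj(hidden_states)
key_states = self.k_proj(hidden_states)
value_states = self.v_proj(hidden_states)
# split into heads: (bsz, q_len, hidden) -> (bsz, num_heads, q_len, head_dim)
query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(1, 2)
value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(1, 2)
kv_seq_len = key_states.shape[-2]
# compute the cos/sin tables and apply rotary position embeddings (RoPE) to q and k
cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
# grouped-query attention: repeat the k/v heads to match the number of query heads
key_states = repeat_kv(key_states, self.num_key_value_groups)
value_states = repeat_kv(value_states, self.num_key_value_groups)
# scaled dot-product attention scores
attn_weights = torch.matmul(query_states, key_states.transpose(2, 3)) / math.sqrt(self.head_dim)
# add the (causal) attention mask
attn_weights = attn_weights + attention_mask
# softmax in float32 for numerical stability, then cast back
attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=torch.float32).to(query_states.dtype)
attn_output = torch.matmul(attn_weights, value_states)
# merge heads: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, hidden_size)
attn_output = attn_output.transpose(1, 2).contiguous()
attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
# output projection
attn_output = self.o_proj(attn_output)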
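The score, mask, softmax and weighted-sum steps can be sketched for a single head in plain Python. This simplified illustration is not the Hugging Face code. It applies the causal mask implicitly by scoring only positions j ≤ i, which has the same effect as adding a large negative value to the masked scores before the softmax. It ignores batching, multiple heads, RoPE and the KV cache, and `causal_attention` is a made-up name.

```python
import math

def causal_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    q, k, v: lists of seq_len vectors, each of length head_dim.
    """
    head_dim = len(q[0])
    out = []
    for i in range(len(q)):
        # Scores against positions 0..i only; future positions are masked out.
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(head_dim)
                  for j in range(i + 1)]
        # Softmax, subtracting the max score for numerical stability.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the visible value vectors.
        out.append([sum(w * v[j][d] for j, w in enumerate(weights))
                    for d in range(head_dim)])
    return out

x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(causal_attention(x, x, x))
```

The first token can only attend to itself, so its output is exactly its own value vector. Appending later tokens never changes the outputs at earlier positions, which is what makes token-by-token generation with a KV cache possible.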

MLP

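The MLP layer applies linear projections and a nonlinear activation to each token's hidden state independently. It is the SwiGLU variant of the feed-forward network:

- The normalized output of the attention block passes through two parallel linear layers, gate_proj and up_proj, each projecting from hidden_size to intermediate_size.
- The gate_proj output goes through the activation function (SiLU for Llama) and is multiplied elementwise by the up_proj output.
- A final linear layer, down_proj, projects the result back to hidden_size.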


# in __init__
self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
self.act_fn = ACT2FN[config.hidden_act]  # hidden_act is "silu" for Llama
# in forward(x): act(gate_proj(x)) * up_proj(x), then projected back to hidden_size
down_proj = self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
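The same computation for a single token vector, in plain Python. This sketch assumes tiny, hypothetical weight matrices with hidden_size = 2 and intermediate_size = 3. Each matrix is stored as (out_features, in_features), like `nn.Linear.weight`, and no biases are used, matching `bias=False` above.

```python
import math

def silu(x):
    # SiLU (swish): x * sigmoid(x), written to avoid overflow in math.exp
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)

def linear(W, x):
    # W has shape (out_features, in_features), like nn.Linear.weight; no bias
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def llama_mlp(x, W_gate, W_up, W_down):
    gate = [silu(g) for g in linear(W_gate, x)]  # act_fn(gate_proj(x))
    up = linear(W_up, x)                         # up_proj(x)
    h = [g * u for g, u in zip(gate, up)]        # elementwise product
    return linear(W_down, h)                     # down_proj back to hidden_size

# Hypothetical weights: hidden_size = 2, intermediate_size = 3.
W_gate = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_up = [[2.0, 0.0], [0.0, 2.0], [1.0, 0.0]]
W_down = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
print(llama_mlp([1.0, 0.0], W_gate, W_up, W_down))
```

The gate branch decides, per intermediate unit, how much of the up branch passes through. A gate pre-activation of 0 (the second unit here) shuts that unit off entirely.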

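Downstream tasks

Causal language modeling

Causal language modeling is autoregressive generation: at every position, the model predicts the next token from the tokens before it. The lm_head layer maps each final hidden state to vocab_size logits, so each prediction is a classification over the vocabulary.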


# lm_head maps each hidden state (hidden_size) to one logit per vocabulary token (vocab_size)
self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)
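Generation runs the model autoregressively: take the lm_head logits at the last position, pick a token, append it, and repeat. Below is a minimal greedy-decoding sketch. `next_token_logits` is a hypothetical stand-in for the full backbone plus lm_head, replaced here by a toy bigram table over a 5-token vocabulary in which id 4 plays the role of an end-of-sequence token.

```python
def greedy_generate(next_token_logits, input_ids, max_new_tokens, eos_id=None):
    """Append the highest-scoring next token until max_new_tokens or eos_id."""
    ids = list(input_ids)
    for _ in range(max_new_tokens):
        logits = next_token_logits(ids)  # one score per vocabulary entry
        next_id = max(range(len(logits)), key=logits.__getitem__)  # argmax
        ids.append(next_id)
        if next_id == eos_id:
            break
    return ids

# Hypothetical toy "model": the next-token scores depend only on the last token.
BIGRAM = {
    0: [0.0, 1.0, 0.0, 0.0, 0.0],
    1: [0.0, 0.0, 2.0, 0.5, 0.0],
    2: [0.1, 0.0, 0.0, 3.0, 0.2],
    3: [0.0, 0.0, 0.0, 0.0, 5.0],
    4: [1.0, 0.0, 0.0, 0.0, 0.0],
}
print(greedy_generate(lambda ids: BIGRAM[ids[-1]], [1], max_new_tokens=10, eos_id=4))
# [1, 2, 3, 4]
```

Greedy argmax is the simplest decoding rule. Sampling strategies (temperature, top-k, top-p) change only the line that picks `next_id`.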

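Text classification

This is a sequence classification task. The score layer maps hidden states to num_labels logits, one per class.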


# score maps each hidden state (hidden_size) to one logit per class (num_labels)
self.score = nn.Linear(config.hidden_size, self.num_labels, bias=False)
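Because the model is causal, only the last real token has attended to the whole input. For this reason, Hugging Face's LlamaForSequenceClassification reads the score logits at the last non-padding token of each sequence. The sketch below shows that pooling step in simplified form. The pad id of 0, the example logits and the helper names are hypothetical.

```python
def pool_last_token(token_logits, input_ids, pad_token_id):
    """Pick the per-token logits at the last position that is not padding."""
    last = max(i for i, t in enumerate(input_ids) if t != pad_token_id)
    return token_logits[last]

def classify(token_logits, input_ids, pad_token_id=0):
    logits = pool_last_token(token_logits, input_ids, pad_token_id)
    return max(range(len(logits)), key=logits.__getitem__)  # predicted label id

# Hypothetical score-head outputs for 4 positions and num_labels = 2;
# the last position is padding, so the logits at position 2 are used.
token_logits = [[0.2, 0.1], [0.0, 0.9], [1.5, -0.3], [0.0, 9.0]]
print(classify(token_logits, input_ids=[1, 57, 88, 0]))  # 0
```

Taking the logits at a padding position would be wrong even when they look confident, as the 9.0 at position 3 shows. Locating the last real token is therefore the essential step.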
