{
    "version": "https://jsonfeed.org/version/1",
    "title": "Yuukoの小屋",
    "description": "Amor che nella mente mi regiona.",
    "home_page_url": "https://yuuko.site",
    "items": [
        {
            "id": "https://yuuko.site/2026/06/15/CS/%E8%AE%A1%E7%BD%91/%E8%AE%A1%E7%AE%97%E6%9C%BA%E7%BD%91%E7%BB%9C%E6%9E%B6%E6%9E%84/",
            "url": "https://yuuko.site/2026/06/15/CS/%E8%AE%A1%E7%BD%91/%E8%AE%A1%E7%AE%97%E6%9C%BA%E7%BD%91%E7%BB%9C%E6%9E%B6%E6%9E%84/",
            "title": "计算机网络架构",
            "date_published": "2026-06-14T16:00:00.000Z",
            "content_html": "<link rel=\"stylesheet\" type=\"text/css\" href=\"https://cdn.jsdelivr.net/hint.css/2.4.1/hint.min.css\"><h1><span id=\"计算机网络相关概念\"> 计算机网络相关概念</span></h1>\n<p>计算机网络是由若干个计算机节点或者相关硬件节点与链接这些节点的链路组成的网状数据通路。</p>\n<p><code>internet</code>和<code>Internet</code>是两个需要区分的名词，前者是泛指的互联网结构，而后者是基于TCP/IP协议的现代 <strong>因特网</strong>。</p>\n<p><strong>以太网</strong>是一种使用有线连接实现局域网连接的技术形式，以太网主要通过“星型拓扑结构 + 交换机 + 网线/光纤 + 物理网卡”的方式来实现。</p>\n<h2><span id=\"计算机网络的分类\"> 计算机网络的分类</span></h2>\n<p>按网络使用者划分可以分为:</p>\n<ul>\n<li>公用网 -- 由电信运用商建设，由互联网服务提供商(ISP)，如中国移动，向普通用户提供互联网接入</li>\n<li>专用网 -- 特定单位的需求下的专门内部网络，不对外使用。</li>\n</ul>\n<p>按照空间拓扑结构可以分为</p>\n<ul>\n<li>总线型 -- 单一互联网总线串联所有计算机。建网结构简单，但是重负载情况下的通信效率低。</li>\n<li>星形网络 -- 中心化的网络架构，所有计算机通过独立线路与中心设备相联。实现成本高且中央设备对故障敏感，优点是便于管理与控制。</li>\n<li>环形网络 -- 去中心化的网络架构，通过环形网络连接所有计算机。例子:<strong>令牌环局域网</strong></li>\n<li>网状网络 -- 去中心化的网络架构，广泛应用于广域网，可靠性较高，但是控制复杂，线路成本较高。</li>\n</ul>\n<p>按照分布范围分类</p>\n<ul>\n<li>广域网(WAN) -- 长距离通信，城市间的互联网结构，用于构成互联网的骨干部分，通过高速链路连接节点交换机。</li>\n<li>城域网(MAN) -- 中距离通信，城市内或者城市群内的互联网结构。城域网通常使用以太网，因此有时也被纳入局域网。</li>\n<li>局域网(LAN) -- 短距离通信，通过主机高速线路互联。</li>\n<li>个人区域网(PAN) -- 使用无线网络在个人工作区域连接计算机与其他消费电子设备。<br>\n按传输技术分类</li>\n<li>广播式网络 -- 联网计算机共享同一个公共通信的信道，所有计算机都能收听到报文信息，根据目的地址的检查决定是否</li>\n<li>点对点网络 -- 每一条物理线路独立连接一对计算机，如果主机间没有直接连接，则分组通过中间件</li>\n</ul>\n<h2><span id=\"计算机网络的功能\"> 计算机网络的功能</span></h2>\n<ul>\n<li>数据通信</li>\n<li>资源共享 -- 硬件/软件/数据的共享， 比如NAS实现云存储</li>\n<li>分布式处理 -- 单一复杂任务的分布式计算</li>\n<li>提高可靠性 -- 通过网络进行数据备份，比如git</li>\n<li>负载均衡 -- 高并发任务的负载分配</li>\n</ul>\n<h2><span id=\"网络的信息交换方式\"> 网络的信息交换方式</span></h2>\n<p>节点间的信息交换方式分为</p>\n<ul>\n<li>电路交换</li>\n<li>报文交换</li>\n<li>分组交换</li>\n</ul>\n<h3><span id=\"电路交换\"> 电路交换</span></h3>\n<p>电路交换是通过电话线+电话交换机形式建立节点间的稳定通信线路资源，建立后通信线路由双方独占直到结束。因此电路交换通常只使用低频、连续且大量的</p>\n<p>电路交换分为三个阶段 -- 建立连接(开启占用通信资源) -&gt; 传输数据(持续占用通信资源) -&gt; 释放连接(释放通信资源)</p>\n<p>电路交换面临的问题有: 建立连接的时间长、线路利用率低与灵活性差、<strong>难以实现差错控制</strong></p>\n<h3><span id=\"报文交换\"> 
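Message Switching</span></h3>\n<p>In message switching, the sending node wraps the data together with control information (such as the source and destination addresses) into a message and sends it as a whole. Using store-and-forward, each intermediate node between the two endpoints receives the message, stores it, and forwards it, until it reaches the final receiver.</p>\n<p>If there are several paths from node A to node B, each message can be routed independently.</p>\n<p>Switching nodes can check messages for errors, which makes error control possible. Message switching does not reserve a line exclusively, so line utilization is high. However, each switching node must buffer the whole message before forwarding it, which adds buffer overhead and forwarding delay. The longer the message, the more a retransmission costs.</p>\n<h3><span id=\"分组交换\"> Packet Switching</span></h3>\n<p>Packet switching further splits a message into smaller data segments of bounded length. Each segment is given a header with the necessary control information (source address, destination address, sequence number, etc.) to form a packet. Packets of the same message do not necessarily follow the same path. Packet switches in the network buffer each packet, inspect its header, and forward it.</p>\n<ul>\n<li>Datagram service -- a datagram is one concrete form of packet. Intermediate routers buffer and forward each datagram as an independent entity, so the datagrams of one message may travel over completely different paths.</li>\n<li>Virtual circuit service -- modeled on circuit switching: a logical connection is set up across all switching devices between the communicating nodes. Once it is established, it is assigned a unique virtual circuit identifier (VCID), and packets are transmitted and forwarded according to that VCID.</li>\n</ul>\n<p>Splitting into packets lowers both the probability that a transmission unit is corrupted and the cost of retransmission: when a packet is damaged, only that packet needs to be resent, not the whole message. On the other hand, every packet carries its own control information, which adds control complexity and lowers the share of useful payload.</p>\n<p>Packets may arrive out of order, be lost, or be duplicated, so control information is usually needed to manage them. In datagram mode, packets may reach the destination host over different paths and must be reordered there, and some may be dropped along the way. A virtual circuit does not reorder packets, but like circuit switching it pays the time and resource cost of setting up and holding a connection.</p>\n<p>For bursty traffic, packet switching usually has the lowest delay of the three and is the most flexible.</p>\n<h2><span id=\"网络的性能指标\"> Network Performance Metrics</span></h2>\n<ul>\n<li>Rate -- the speed at which data is transmitted</li>\n<li>Bandwidth -- the maximum rate a channel supports</li>\n<li>Throughput -- the amount of data that actually passes a given point per unit time</li>\n<li>Delay -- the total time to get from the current node to the destination node, made up of transmission delay, propagation delay, processing delay, and queuing delay</li>\n</ul>\n<p>Transmission delay is the time the sending node needs to push all bits of a packet onto the link. It equals the packet length <em>l</em> divided by the sending rate <em>v</em>:</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mtext>发送时延</mtext><msub><mi>τ</mi><mi>s</mi></msub><mo>=</mo><mfrac><mrow><mtext>分组长度</mtext><mi>l</mi></mrow><mrow><mtext>发送速度</mtext><mi>v</mi></mrow></mfrac></mrow><annotation encoding=\"application/x-tex\">\\text{发送时延} \\tau_s = \\frac{\\text{分组长度} l}{\\text{发送速度} v}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord text\"><span class=\"mord cjk_fallback\">发送时延</span></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.1132em;\">τ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span 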
class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:-0.1132em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">s</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.0574em;vertical-align:-0.686em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3714em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord text\"><span class=\"mord cjk_fallback\">发送速度</span></span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">v</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord text\"><span class=\"mord cjk_fallback\">分组长度</span></span><span class=\"mord mathnormal\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<p>传播耗时为数据在信道中的传输时间</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math 
xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mtext>传输时延</mtext><msub><mi>τ</mi><mi>t</mi></msub><mo>=</mo><mfrac><mrow><mtext>信道长度</mtext><mi>s</mi></mrow><mrow><mtext>电磁波的信道传输速度</mtext><mi>c</mi></mrow></mfrac></mrow><annotation encoding=\"application/x-tex\">\\text{传输时延} \\tau_t = \\frac{\\text{信道长度} s}{\\text{电磁波的信道传输速度}c}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord text\"><span class=\"mord cjk_fallback\">传输时延</span></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.1132em;\">τ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:-0.1132em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.0463em;vertical-align:-0.686em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3603em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord text\"><span class=\"mord cjk_fallback\">电磁波的信道传输速度</span></span><span class=\"mord mathnormal\">c</span></span></span><span style=\"top:-3.23em;\"><span 
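class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord text\"><span class=\"mord cjk_fallback\">信道长度</span></span><span class=\"mord mathnormal\">s</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<p>Processing delay is the time spent on processing inside a switching node.</p>\n<p>Queuing delay is the time data spends waiting in the input and output queues of devices such as routers.</p>\n<p>Other performance metrics of a transmission channel include:</p>\n<ul>\n<li>Bandwidth-delay product -- the number of bits the sender has already put on the link by the time the first bit reaches the receiver (propagation delay × bandwidth).</li>\n<li>Round-trip time (RTT)</li>\n<li>Channel utilization -- the fraction of the measurement window during which data is passing through the channel</li>\n</ul>\n<h1><span id=\"计算机网络的体系架构\"> Computer Network Architecture</span></h1>\n<h2><span id=\"计算机网络分层架构\"> Layered Architecture</span></h2>\n<p>The OSI model is the network architecture standard proposed by ISO, known as the Open Systems Interconnection (OSI) reference model. From top to bottom its layers are</p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-text\"><span class=\"line\"><span>Application -> Presentation -> Session -> Transport -> Network -> Data Link -> Physical </span></span></code></pre>\n<p>The basic principles of layering are:</p>\n<ul>\n<li>Each layer's functions are relatively self-contained and match that layer's technical goal</li>\n<li>Interfaces between layers are clear, with few dependencies</li>\n<li>Layers stay independent of one another: each lower layer provides services to the layer directly above it, and the upper layer hands its data down to the lower layer for delivery</li>\n<li>A layered structure makes standardization easier</li>\n</ul>\n<p>The data blocks handled at each layer are divided into:</p>\n<ul>\n<li>Service data unit (SDU) -- the data handed across the interface between adjacent layers, i.e. the payload a layer is asked to deliver; the SDU of layer <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>n</mi></mrow><annotation encoding=\"application/x-tex\">n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\">n</span></span></span></span> is written <strong>n-SDU</strong></li>\n<li>Protocol control information (PCI) -- the control information the protocol of the current layer operates on; the PCI of layer <span class=\"katex\"><span 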
class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>n</mi></mrow><annotation encoding=\"application/x-tex\">n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\">n</span></span></span></span> is written <strong>n-PCI</strong></li>\n</ul>\n<p>The unit exchanged between peer entities of the same layer is the protocol data unit (PDU), which consists of the SDU plus the PCI. On the sending side, a layer prepends its PCI to its SDU to form its PDU and hands that PDU down, where it becomes the SDU of the layer below; on the receiving side, each layer processes and strips its own PCI and passes the SDU up. The PDU of layer <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>n</mi></mrow><annotation encoding=\"application/x-tex\">n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\">n</span></span></span></span> is written <strong>n-PDU</strong></p>\n<p>Hence</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mo stretchy=\"false\">(</mo><mi>n</mi><mo>−</mo><mn>1</mn><mo stretchy=\"false\">)</mo><mtext>-SDU</mtext><mo>=</mo><mi>n</mi><mtext>-PDU</mtext><mo>=</mo><mi>n</mi><mtext>-SDU</mtext><mo>+</mo><mi>n</mi><mtext>-PCI</mtext></mrow><annotation encoding=\"application/x-tex\">(n-1)\\text{-SDU} = n\\text{-PDU} = n\\text{-SDU} + n\\text{-PCI}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">n</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span 
class=\"mord\">1</span><span class=\"mclose\">)</span><span class=\"mord text\"><span class=\"mord\">-SDU</span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\">n</span><span class=\"mord text\"><span class=\"mord\">-PDU</span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\">n</span><span class=\"mord text\"><span class=\"mord\">-SDU</span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\">n</span><span class=\"mord text\"><span class=\"mord\">-PCI</span></span></span></span></span></span></p>\n<p>A protocol is the set of rules that governs communication between peer entities of the same layer. A protocol has three elements:</p>\n<ul>\n<li>Syntax</li>\n<li>Semantics</li>\n<li>Synchronization (timing)</li>\n</ul>\n<p>An interface is the access point between adjacent layers within a single system.</p>\n<p>A service is the functionality a lower layer exposes to the layer directly above it.</p>\n<p>The three relate as follows:</p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-text\"><span class=\"line\"><span>                   protocol</span></span>\n<span class=\"line\"><span>layer n     entity -------- entity</span></span>\n<span class=\"line\"><span>               |    interface |   service</span></span>\n<span class=\"line\"><span>layer n-1   entity -------- entity</span></span></code></pre>\n<p>Services can be classified by whether they are connection-oriented, reliable, and acknowledged:</p>\n<ul>\n<li>Connection\n<ul>\n<li>Connection-oriented service -- the two parties establish a connection and allocate the required resources before transmitting. Typical cases are circuit switching, virtual-circuit packet switching, and connection-based services such as SSH</li>\n<li>Connectionless service -- 
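the network chooses paths dynamically; no connection is set up in advance.</li>\n</ul>\n</li>\n<li>Reliability\n<ul>\n<li>Reliable service -- provides error detection, error correction, and acknowledgement, so that the transmitted data arrives complete and correct</li>\n<li>Unreliable service -- best-effort delivery with no guarantee that data arrives intact; any reliability must be provided by higher layers</li>\n</ul>\n</li>\n<li>Acknowledgement\n<ul>\n<li>Acknowledged service -- the receiver sends an acknowledgement back to the sender after getting the data. File transfer services such as FTP are acknowledged services</li>\n<li>Unacknowledged service -- the receiver returns no acknowledgement. Web browsing is an example: after the client receives a page (the corresponding html/css/js files), it does not send an acknowledgement back</li>\n</ul>\n</li>\n</ul>\n<h3><span id=\"osi结构\"> The OSI Layers</span></h3>\n<h4><span id=\"物理层\"> Physical Layer</span></h4>\n<p>The physical layer carries the raw bit stream at the bottom of the stack. It defines things such as the interface characteristics of devices, the meaning of signal encodings, and electrical characteristics.</p>\n<p>The transmission medium itself is not part of the physical layer. Bit transmission is defined independently of which medium is used, so the medium belongs to a physical-medium level below the physical layer.</p>\n<h4><span id=\"数据链路层\"> Data Link Layer</span></h4>\n<p>At the data link layer the data starts to carry structure, rather than being the transparent raw bit stream of the physical layer. The data unit of the data link layer is the <strong>frame</strong>.</p>\n<p>The data link layer implements basic error control and flow control, as well as access control for shared channels.</p>\n<p>The data link layer uses MAC addresses for node-to-node communication.</p>\n<h4><span id=\"网络层\"> Network Layer</span></h4>\n<p>The transmission unit of the network layer is the datagram (a form of packet), and its main task is to deliver datagrams to the destination host. Both virtual-circuit and datagram switching take place at the network layer.</p>\n<p>The main functions of the network layer are</p>\n<ul>\n<li>Flow control -- coordinating the overall sending and receiving rates of the source and destination hosts. In modern network stacks this function has in practice been handed over to the data link layer and the transport layer</li>\n<li>Congestion control</li>\n<li>Error control</li>\n</ul>\n<p>The network layer uses IP addresses for host-to-host communication.</p>\n<h4><span id=\"传输层\"> Transport Layer</span></h4>\n<p>The transport layer is responsible for communication between ports on hosts.</p>\n<p>Because a device usually runs more than one communicating process, the transport layer has to provide multiplexing and demultiplexing.</p>\n<p>The transport layer moves data between ports, and a port corresponds to a process's need to communicate with the outside. The transport layer is therefore typically the topmost layer handled in kernel space; the layers above it are implemented in the user space of the process.</p>\n<h4><span id=\"会话层\"> Session Layer</span></h4>\n<p>The session layer manages sessions between processes on different hosts, including establishing, maintaining, and terminating them.</p>\n<p>The session layer also handles session synchronization: it supports ordered data transfer over a connection and introduces a <strong>checkpoint mechanism</strong> used for session recovery and resumable transfers.</p>\n<p>In OSI terms, the handshake phase of TLS/SSL can be placed at the session layer. When visiting an HTTPS site, the client and server first greet each other, exchange certificates, and negotiate the cipher suite to use. This process of establishing and maintaining a secure session is a session-layer function.</p>\n<h4><span id=\"表示层\"> Presentation Layer</span></h4>\n<p>The presentation layer resolves differences in data representation between heterogeneous systems, and handles encryption, decryption, and compression of data.</p>\n<p>A classic case is an SSH session between Windows and macOS/Linux: the systems differ in character encoding and line endings (for example, Chinese-locale Windows uses <code>GBK</code> while Linux/macOS default to <code>UTF-8</code>, and line endings differ between CRLF <code>\\r\\n</code> and LF <code>\\n</code>), so the two sides must agree on a common representation, which is a presentation-layer concern.</p>\n<p>The encryption and decryption of data by TLS/SSL can be placed at the presentation layer. What the lower layers, such as the transport layer, receive is the encrypted byte stream rather than readable, structured data.</p>\n<h4><span id=\"应用层\"> Application Layer</span></h4>\n<p>The interface between the user and the network.</p>\n<h4><span id=\"example\"> 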
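Example</span></h4>\n<p>Consider a Flask + Nginx backend running on a single machine with several network cards.</p>\n<p>Flask listens on the loopback endpoint <code>127.0.0.1:4000</code>. The application layer is the user's backend program itself. The presentation layer shows up in what Flask relies on underneath, such as Python's <code>json</code> for serialization and deserialization and <code>gzip</code> for compression. Session-layer functions, such as keeping a session alive at the network level or the business level, are usually provided by Flask's dependencies: the Werkzeug WSGI library handles requests arriving over TCP sockets, and the Flask session provides signed, cookie-based session state for the application.</p>\n<p>At the transport layer, listening on both loopback and external ports illustrates <strong>multiplexing</strong>. A backend process can listen on several loopback ports; unlike a MAC address, a port does not uniquely identify a device.</p>\n<p>For example, Nginx binds the public endpoint <code>&lt;ip&gt;:80</code> and reverse-proxies external traffic to the backend at <code>127.0.0.1:4000</code>.</p>\n<p>Meanwhile, traffic addressed to <code>127.0.0.1:4000</code> is delivered to the Flask process, which illustrates <strong>demultiplexing</strong>.</p>\n<p>As for the machine's different network cards: at the physical and data link layers they are entirely separate, since their communication paths and MAC addresses differ. At the network layer, different cards are usually assigned different IP addresses, so their network-layer information is independent as well. At the transport layer, external traffic from different cards may be handled by the same process, so this is where the cards start to share something. In the user-space layers, the user can still inspect each card, but libraries such as Flask already wrap the per-card details; the user only cares about process-level communication, and which card is in use no longer matters. <s>Visiting the same website over a wired or a wireless connection makes no difference</s></p>\n<h3><span id=\"tcpip结构\"> The TCP/IP Model</span></h3>\n",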
            "tags": []
        },
        {
            "id": "https://yuuko.site/2026/06/13/control/%E6%BB%91%E6%A8%A1%E6%8E%A7%E5%88%B6/",
            "url": "https://yuuko.site/2026/06/13/control/%E6%BB%91%E6%A8%A1%E6%8E%A7%E5%88%B6/",
            "title": "滑模控制",
            "date_published": "2026-06-12T16:00:00.000Z",
            "content_html": "<link rel=\"stylesheet\" type=\"text/css\" href=\"https://cdn.jsdelivr.net/hint.css/2.4.1/hint.min.css\"><h1><span id=\"滑模控制\"> 滑模控制</span></h1>\n<p>滑模控制是通过将相空间分块的方式实现分段非线性控制的控制方法。</p>\n<h2><span id=\"相图\"> 相图</span></h2>\n<p>对于微分方程</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mfrac><mrow><mi mathvariant=\"normal\">d</mi><mi mathvariant=\"bold\">x</mi></mrow><mrow><mi mathvariant=\"normal\">d</mi><mi>t</mi></mrow></mfrac><mo>=</mo><mi>f</mi><mo stretchy=\"false\">(</mo><mi mathvariant=\"bold\">x</mi><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">\\frac{\\mathrm{d} \\mathbf{x}}{\\mathrm{d} t} = f(\\mathbf{x})\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:2.0574em;vertical-align:-0.686em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3714em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">d</span><span class=\"mord mathnormal\">t</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">d</span><span class=\"mord mathbf\">x</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" 
style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10764em;\">f</span><span class=\"mopen\">(</span><span class=\"mord mathbf\">x</span><span class=\"mclose\">)</span></span></span></span></span></p>\n<p>且 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>f</mi><mo stretchy=\"false\">(</mo><mi mathvariant=\"bold\">x</mi><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">f(\\mathbf{x})</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10764em;\">f</span><span class=\"mopen\">(</span><span class=\"mord mathbf\">x</span><span class=\"mclose\">)</span></span></span></span> 只通过 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi mathvariant=\"bold\">x</mi></mrow><annotation encoding=\"application/x-tex\">\\mathbf{x}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4444em;\"></span><span class=\"mord mathbf\">x</span></span></span></span> 与 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>t</mi></mrow><annotation encoding=\"application/x-tex\">t</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6151em;\"></span><span class=\"mord mathnormal\">t</span></span></span></span> 相关而不直接相关，这样的微分方程称为<strong>自洽微分方程</strong>, 
同时保证，在一个初值下，自洽微分方程的解是存在且唯一的</p>\n<p>微分方程的解为 $x = \\phi(t,x_0) $, 即从初值 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>x</mi><mn>0</mn></msub></mrow><annotation encoding=\"application/x-tex\">x_0</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.5806em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">0</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span> 出发的解。</p>\n<p>我们可以证明: 对于方程的解 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>ϕ</mi><mo stretchy=\"false\">(</mo><mi>t</mi><mo separator=\"true\">,</mo><msub><mi>x</mi><mn>0</mn></msub><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">\\phi(t,x_0)</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\">ϕ</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">t</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span 
style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">0</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span></span></span></span>, <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>ϕ</mi><mo stretchy=\"false\">(</mo><mi>t</mi><mo>+</mo><mi>c</mi><mo separator=\"true\">,</mo><msub><mi>x</mi><mn>0</mn></msub><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">\\phi(t+c,x_0)</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\">ϕ</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">t</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\">c</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">0</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span></span></span></span> 仍然是方程的解，即解是 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>t</mi></mrow><annotation encoding=\"application/x-tex\">t</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6151em;\"></span><span class=\"mord mathnormal\">t</span></span></span></span> 无关的</p>\n<p>方程的解集</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi>D</mi><mo>=</mo><mrow><mo fence=\"true\">{</mo><mi mathvariant=\"bold\">x</mi><mo>∈</mo><mi>D</mi><mo fence=\"false\" stretchy=\"true\" minsize=\"1.2em\" maxsize=\"1.2em\">∣</mo><mi mathvariant=\"bold\">x</mi><mo>=</mo><mi>ϕ</mi><mo stretchy=\"false\">(</mo><mi>t</mi><mo separator=\"true\">,</mo><msub><mi mathvariant=\"bold\">x</mi><mn>0</mn></msub><mo stretchy=\"false\">)</mo><mo separator=\"true\">,</mo><mi>t</mi><mo>∈</mo><mi>J</mi><mo fence=\"true\">}</mo></mrow></mrow><annotation encoding=\"application/x-tex\">D = \\left\\{\\mathbf{x}\\in D\\big| \\mathbf{x} = \\phi(t,\\mathbf{x}_0), t\\in J\\right\\}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">D</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.2em;vertical-align:-0.35em;\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size1\">{</span></span><span class=\"mord 
mathbf\">x</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">∈</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">D</span><span class=\"mord\"><span class=\"delimsizing mult\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.85em;\"><span style=\"top:-2.85em;\"><span class=\"pstrut\" style=\"height:3.2em;\"></span><span style=\"width:0.333em;height:1.200em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.333em\" height=\"1.200em\" viewbox=\"0 0 333 1200\"><path d=\"M145 15 v585 v0 v585 c2.667,10,9.667,15,21,15\nc10,0,16.667,-5,20,-15 v-585 v0 v-585 c-2.667,-10,-9.667,-15,-21,-15\nc-10,0,-16.667,5,-20,15z M188 15 H145 v585 v0 v585 h43z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.35em;\"><span></span></span></span></span></span></span><span class=\"mord mathbf\">x</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord mathnormal\">ϕ</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">t</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathbf\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">0</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span 
class=\"mclose\">)</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\">t</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">∈</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.09618em;\">J</span><span class=\"mclose delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size1\">}</span></span></span></span></span></span></span></p>\n<p>称为方程的相空间，解曲线 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi mathvariant=\"bold\">x</mi><mi>i</mi></msub><mo>=</mo><mi>ϕ</mi><mo stretchy=\"false\">(</mo><mi>t</mi><mo separator=\"true\">,</mo><msub><mi mathvariant=\"bold\">x</mi><mn>0</mn></msub><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">\\mathbf{x}_i = \\phi(t,\\mathbf{x}_0)</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.5944em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathbf\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" 
style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\">ϕ</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">t</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathbf\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">0</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span></span></span></span> 实际是 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>t</mi><mo>∈</mo><mi>J</mi><mo>⊂</mo><mi mathvariant=\"double-struck\">R</mi><mo>→</mo><mi mathvariant=\"bold\">x</mi><mo>∈</mo><mi>D</mi><mo>⊂</mo><msup><mi mathvariant=\"double-struck\">R</mi><mi>n</mi></msup></mrow><annotation encoding=\"application/x-tex\">t\\in J\\subset \\mathbb{R} \\to \\mathbf{x} \\in D\\subset \\mathbb{R}^n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6542em;vertical-align:-0.0391em;\"></span><span class=\"mord mathnormal\">t</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">∈</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7224em;vertical-align:-0.0391em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.09618em;\">J</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span 
class=\"mrel\">⊂</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6889em;\"></span><span class=\"mord mathbb\">R</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">→</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.5782em;vertical-align:-0.0391em;\"></span><span class=\"mord mathbf\">x</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">∈</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7224em;vertical-align:-0.0391em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">D</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">⊂</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6889em;\"></span><span class=\"mord\"><span class=\"mord mathbb\">R</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6644em;\"><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">n</span></span></span></span></span></span></span></span></span></span></span> 的一条曲线，曲线在相空间<span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>D</mi></mrow><annotation encoding=\"application/x-tex\">D</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">D</span></span></span></span> 
上的投影称为相轨迹</p>\n<p><img loading=\"lazy\" src=\"/picture/control/trajectory.png\" alt=\"trajectory\"></p>\n<h2><span id=\"二阶系统滑模控制\"> 二阶系统滑模控制</span></h2>\n<p>考虑二阶系统，其闭环传递函数为</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi mathvariant=\"normal\">Φ</mi><mo stretchy=\"false\">(</mo><mi>s</mi><mo stretchy=\"false\">)</mo><mo>=</mo><mfrac><mi>α</mi><mrow><msup><mi>s</mi><mn>2</mn></msup><mo>+</mo><mi>β</mi><mi>s</mi><mo>+</mo><mi>α</mi></mrow></mfrac></mrow><annotation encoding=\"application/x-tex\">\\Phi(s) = \\frac{\\alpha}{s^2+\\beta s+\\alpha}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\">Φ</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">s</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.988em;vertical-align:-0.8804em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.1076em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\">s</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7401em;\"><span style=\"top:-2.989em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span><span class=\"mspace\" 
style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05278em;\">β</span><span class=\"mord mathnormal\">s</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8804em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<p>对应的微分方程为</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mover accent=\"true\"><mi>c</mi><mo>¨</mo></mover><mo>+</mo><mi>β</mi><mover accent=\"true\"><mi>c</mi><mo>˙</mo></mover><mo>+</mo><mi>α</mi><mi>c</mi><mo>=</mo><mi>α</mi><mi>r</mi></mrow><annotation encoding=\"application/x-tex\">\\ddot{c} + \\beta \\dot{c} + \\alpha c  = \\alpha r \n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7512em;vertical-align:-0.0833em;\"></span><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6679em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord 
mathnormal\">c</span></span><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.1944em;\"><span class=\"mord\">¨</span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8889em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05278em;\">β</span><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6679em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathnormal\">c</span></span><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.0833em;\"><span class=\"mord\">˙</span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"mord mathnormal\">c</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">r</span></span></span></span></span></p>\n<p><img loading=\"lazy\" src=\"/picture/control/sys.png\" alt=\"sys\"></p>\n<p><span class=\"katex\"><span class=\"katex-mathml\"><math 
xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>e</mi></mrow><annotation encoding=\"application/x-tex\">e</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\">e</span></span></span></span>为误差函数，定义为 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>e</mi><mo>=</mo><mi>r</mi><mo>−</mo><mi>c</mi></mrow><annotation encoding=\"application/x-tex\">e = r-c</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\">e</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">r</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\">c</span></span></span></span></p>\n<p>考虑常值输入，此时有</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mtable rowspacing=\"0.25em\" columnalign=\"right left\" columnspacing=\"0em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mi>α</mi><mi>e</mi></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><mi>α</mi><mo stretchy=\"false\">(</mo><mi>r</mi><mo>−</mo><mi>c</mi><mo 
stretchy=\"false\">)</mo><mo>=</mo><mo>−</mo><mover accent=\"true\"><mi>e</mi><mo>¨</mo></mover><mo>−</mo><mi>β</mi><mover accent=\"true\"><mi>e</mi><mo>˙</mo></mover></mrow></mstyle></mtd></mtr></mtable><annotation encoding=\"application/x-tex\">\\begin{aligned}\n\\alpha e &amp; = \\alpha (r-c) = -\\ddot{e} - \\beta\\dot{e}\n\\end{aligned}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.5em;vertical-align:-0.5em;\"></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-r\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1em;\"><span style=\"top:-3.16em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"mord mathnormal\">e</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.5em;\"><span></span></span></span></span></span><span class=\"col-align-l\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1em;\"><span style=\"top:-3.16em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">r</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord mathnormal\">c</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span 
class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord\">−</span><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6679em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathnormal\">e</span></span><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.1944em;\"><span class=\"mord\">¨</span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05278em;\">β</span><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6679em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathnormal\">e</span></span><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.0833em;\"><span class=\"mord\">˙</span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.5em;\"><span></span></span></span></span></span></span></span></span></span></span></span></p>\n<p>因此原系统可以化为关于误差的零输入二阶系统</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mover accent=\"true\"><mi>e</mi><mo>¨</mo></mover><mo>+</mo><mi>β</mi><mover accent=\"true\"><mi>e</mi><mo>˙</mo></mover><mo>+</mo><mi>α</mi><mi>e</mi><mo>=</mo><mn>0</mn></mrow><annotation encoding=\"application/x-tex\">\\ddot{e} + \\beta\\dot{e} +\\alpha e = 0\n</annotation></semantics></math></span><span class=\"katex-html\" 
aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7512em;vertical-align:-0.0833em;\"></span><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6679em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathnormal\">e</span></span><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.1944em;\"><span class=\"mord\">¨</span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8889em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05278em;\">β</span><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6679em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathnormal\">e</span></span><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.0833em;\"><span class=\"mord\">˙</span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"mord mathnormal\">e</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span 
class=\"mord\">0</span></span></span></span></span></p>\n<p>特征方程根</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mi>λ</mi><mn>1</mn></msub><mo>=</mo><mfrac><mrow><mo>−</mo><mi>β</mi><mo>+</mo><msqrt><mrow><msup><mi>β</mi><mn>2</mn></msup><mo>−</mo><mn>4</mn><mi>α</mi></mrow></msqrt></mrow><mn>2</mn></mfrac><mo separator=\"true\">,</mo><msub><mi>λ</mi><mn>2</mn></msub><mo>=</mo><mfrac><mrow><mo>−</mo><mi>β</mi><mo>−</mo><msqrt><mrow><msup><mi>β</mi><mn>2</mn></msup><mo>−</mo><mn>4</mn><mi>α</mi></mrow></msqrt></mrow><mn>2</mn></mfrac></mrow><annotation encoding=\"application/x-tex\">\\lambda_1 = \\frac{-\\beta + \\sqrt{\\beta^2-4\\alpha}}{2}, \\lambda_2 = \\frac{-\\beta - \\sqrt{\\beta^2-4\\alpha}}{2}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8444em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">λ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.3208em;vertical-align:-0.686em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span 
class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.6348em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">2</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">−</span><span class=\"mord mathnormal\" style=\"margin-right:0.05278em;\">β</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord sqrt\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9578em;\"><span class=\"svg-align\" style=\"top:-3.2em;\"><span class=\"pstrut\" style=\"height:3.2em;\"></span><span class=\"mord\" style=\"padding-left:1em;\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.05278em;\">β</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7401em;\"><span style=\"top:-2.989em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\">4</span><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span></span></span><span style=\"top:-2.9178em;\"><span class=\"pstrut\" style=\"height:3.2em;\"></span><span class=\"hide-tail\" style=\"min-width:1.02em;height:1.28em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"1.28em\" viewbox=\"0 0 400000 1296\" 
preserveaspectratio=\"xMinYMin slice\"><path d=\"M263,681c0.7,0,18,39.7,52,119\nc34,79.3,68.167,158.7,102.5,238c34.3,79.3,51.8,119.3,52.5,120\nc340,-704.7,510.7,-1060.3,512,-1067\nl0 -0\nc4.7,-7.3,11,-11,19,-11\nH40000v40H1012.3\ns-271.3,567,-271.3,567c-38.7,80.7,-84,175,-136,283c-52,108,-89.167,185.3,-111.5,232\nc-22.3,46.7,-33.8,70.3,-34.5,71c-4.7,4.7,-12.3,7,-23,7s-12,-1,-12,-1\ns-109,-253,-109,-253c-72.7,-168,-109.3,-252,-110,-252c-10.7,8,-22,16.7,-34,26\nc-22,17.3,-33.3,26,-34,26s-26,-26,-26,-26s76,-59,76,-59s76,-60,76,-60z\nM1001 80h400000v40h-400000z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2822em;\"><span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">λ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.3208em;vertical-align:-0.686em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span 
class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.6348em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">2</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">−</span><span class=\"mord mathnormal\" style=\"margin-right:0.05278em;\">β</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord sqrt\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9578em;\"><span class=\"svg-align\" style=\"top:-3.2em;\"><span class=\"pstrut\" style=\"height:3.2em;\"></span><span class=\"mord\" style=\"padding-left:1em;\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.05278em;\">β</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7401em;\"><span style=\"top:-2.989em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\">4</span><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span></span></span><span style=\"top:-2.9178em;\"><span class=\"pstrut\" style=\"height:3.2em;\"></span><span class=\"hide-tail\" style=\"min-width:1.02em;height:1.28em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" 
height=\"1.28em\" viewbox=\"0 0 400000 1296\" preserveaspectratio=\"xMinYMin slice\"><path d=\"M263,681c0.7,0,18,39.7,52,119\nc34,79.3,68.167,158.7,102.5,238c34.3,79.3,51.8,119.3,52.5,120\nc340,-704.7,510.7,-1060.3,512,-1067\nl0 -0\nc4.7,-7.3,11,-11,19,-11\nH40000v40H1012.3\ns-271.3,567,-271.3,567c-38.7,80.7,-84,175,-136,283c-52,108,-89.167,185.3,-111.5,232\nc-22.3,46.7,-33.8,70.3,-34.5,71c-4.7,4.7,-12.3,7,-23,7s-12,-1,-12,-1\ns-109,-253,-109,-253c-72.7,-168,-109.3,-252,-110,-252c-10.7,8,-22,16.7,-34,26\nc-22,17.3,-33.3,26,-34,26s-26,-26,-26,-26s76,-59,76,-59s76,-60,76,-60z\nM1001 80h400000v40h-400000z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2822em;\"><span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<p>对于方程</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mover accent=\"true\"><mi>e</mi><mo>¨</mo></mover><mo>−</mo><mo stretchy=\"false\">(</mo><msub><mi>λ</mi><mn>1</mn></msub><mo>+</mo><msub><mi>λ</mi><mn>2</mn></msub><mo stretchy=\"false\">)</mo><mover accent=\"true\"><mi>e</mi><mo>˙</mo></mover><mo>+</mo><msub><mi>λ</mi><mn>1</mn></msub><msub><mi>λ</mi><mn>2</mn></msub><mi>e</mi><mo>=</mo><mn>0</mn></mrow><annotation encoding=\"application/x-tex\">\\ddot{e} -(\\lambda_1+\\lambda_2)\\dot{e} + \\lambda_1\\lambda_2 e = 0\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7512em;vertical-align:-0.0833em;\"></span><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.6679em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathnormal\">e</span></span><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.1944em;\"><span class=\"mord\">¨</span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">λ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">λ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6679em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathnormal\">e</span></span><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.0833em;\"><span class=\"mord\">˙</span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8444em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">λ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathnormal\">λ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord 
e</span><span class=\"mspace\"">
mathnormal\">e</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">0</span></span></span></span></span></p>\n<p>写为</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msup><mrow><mo fence=\"true\">[</mo><mtable rowspacing=\"0.16em\" columnalign=\"center\" columnspacing=\"1em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mover accent=\"true\"><mi>e</mi><mo>˙</mo></mover></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mi>e</mi></mstyle></mtd></mtr></mtable><mo fence=\"true\">]</mo></mrow><mo mathvariant=\"normal\" lspace=\"0em\" rspace=\"0em\">′</mo></msup><mo>=</mo><mrow><mo fence=\"true\">[</mo><mtable rowspacing=\"0.16em\" columnalign=\"center center\" columnspacing=\"1em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><msub><mi>λ</mi><mn>1</mn></msub><mo>+</mo><msub><mi>λ</mi><mn>2</mn></msub></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mo>−</mo><msub><mi>λ</mi><mn>1</mn></msub><msub><mi>λ</mi><mn>2</mn></msub></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mn>1</mn></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mn>0</mn></mstyle></mtd></mtr></mtable><mo fence=\"true\">]</mo></mrow><mrow><mo fence=\"true\">[</mo><mtable rowspacing=\"0.16em\" columnalign=\"center\" columnspacing=\"1em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mover accent=\"true\"><mi>e</mi><mo>˙</mo></mover></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mi>e</mi></mstyle></mtd></mtr></mtable><mo fence=\"true\">]</mo></mrow></mrow><annotation 
encoding=\"application/x-tex\">\\begin{bmatrix}\n\\dot e\\\\\ne\n\\end{bmatrix}&#x27; = \n\\begin{bmatrix}\n\\lambda_1+\\lambda_2 &amp; -\\lambda_1\\lambda_2\\\\\n1&amp;0\n\\end{bmatrix}\n\\begin{bmatrix}\n\\dot e\\\\\ne\n\\end{bmatrix}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:2.5418em;vertical-align:-0.95em;\"></span><span class=\"minner\"><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size3\">[</span></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-c\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.45em;\"><span style=\"top:-3.61em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6679em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathnormal\">e</span></span><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.0833em;\"><span class=\"mord\">˙</span></span></span></span></span></span></span></span></span><span style=\"top:-2.41em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">e</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.95em;\"><span></span></span></span></span></span></span></span><span class=\"mclose delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size3\">]</span></span></span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.5918em;\"><span style=\"top:-3.9029em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span 
class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">′</span></span></span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.4em;vertical-align:-0.95em;\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size3\">[</span></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-c\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.45em;\"><span style=\"top:-3.61em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\">λ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">λ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span 
class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-2.41em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.95em;\"><span></span></span></span></span></span><span class=\"arraycolsep\" style=\"width:0.5em;\"></span><span class=\"arraycolsep\" style=\"width:0.5em;\"></span><span class=\"col-align-c\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.45em;\"><span style=\"top:-3.61em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">−</span><span class=\"mord\"><span class=\"mord mathnormal\">λ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathnormal\">λ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span><span 
style=\"top:-2.41em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">0</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.95em;\"><span></span></span></span></span></span></span></span><span class=\"mclose delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size3\">]</span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size3\">[</span></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-c\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.45em;\"><span style=\"top:-3.61em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6679em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathnormal\">e</span></span><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.0833em;\"><span class=\"mord\">˙</span></span></span></span></span></span></span></span></span><span style=\"top:-2.41em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">e</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.95em;\"><span></span></span></span></span></span></span></span><span class=\"mclose delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size3\">]</span></span></span></span></span></span></span></p>\n<p>Therefore, for <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mstyle scriptlevel=\"0\" 
displaystyle=\"true\"><mi>X</mi><mo>=</mo><mrow><mo fence=\"true\">[</mo><mtable rowspacing=\"0.16em\" columnalign=\"center\" columnspacing=\"1em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mover accent=\"true\"><mi>e</mi><mo>˙</mo></mover></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mi>e</mi></mstyle></mtd></mtr></mtable><mo fence=\"true\">]</mo></mrow></mstyle></mrow><annotation encoding=\"application/x-tex\">\\displaystyle X = \\begin{bmatrix}\\dot e\\\\ e\\end{bmatrix}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">X</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.4em;vertical-align:-0.95em;\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size3\">[</span></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-c\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.45em;\"><span style=\"top:-3.61em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6679em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathnormal\">e</span></span><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.0833em;\"><span class=\"mord\">˙</span></span></span></span></span></span></span></span></span><span style=\"top:-2.41em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span 
class=\"mord\"><span class=\"mord mathnormal\">e</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.95em;\"><span></span></span></span></span></span></span></span><span class=\"mclose delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size3\">]</span></span></span></span></span></span>, we have</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mfrac><mrow><mi mathvariant=\"normal\">d</mi><mi>X</mi></mrow><mrow><mi mathvariant=\"normal\">d</mi><mi>t</mi></mrow></mfrac><mo>=</mo><mi>A</mi><mi>X</mi></mrow><annotation encoding=\"application/x-tex\">\\frac{\\mathrm{d} X}{\\mathrm{d} t} = A X\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:2.0574em;vertical-align:-0.686em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3714em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">d</span><span class=\"mord mathnormal\">t</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">d</span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">X</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" 
style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\">A</span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">X</span></span></span></span></span></p>\n<p>The eigenvalues of the coefficient matrix are <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>λ</mi><mn>1</mn></msub><mo separator=\"true\">,</mo><msub><mi>λ</mi><mn>2</mn></msub></mrow><annotation encoding=\"application/x-tex\">\\lambda_1,\\lambda_2</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8889em;vertical-align:-0.1944em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">λ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">λ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span 
class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span></p>\n<p>The phase curves of this equation are exactly the phase curves of the zero-input second-order error system considered here.</p>\n<p>Since</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mfrac><mrow><mi mathvariant=\"normal\">d</mi><msup><mi>e</mi><mn>2</mn></msup></mrow><mrow><mi mathvariant=\"normal\">d</mi><mi>t</mi></mrow></mfrac><mo>=</mo><mn>2</mn><mi>e</mi><mover accent=\"true\"><mi>e</mi><mo>˙</mo></mover></mrow><annotation encoding=\"application/x-tex\">\\frac{\\mathrm{d} e^{2}}{\\mathrm{d} t} = 2e\\dot e \n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:2.1771em;vertical-align:-0.686em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.4911em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">d</span><span class=\"mord mathnormal\">t</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">d</span><span class=\"mord\"><span class=\"mord mathnormal\">e</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8141em;\"><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6679em;\"></span><span class=\"mord\">2</span><span class=\"mord mathnormal\">e</span><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6679em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathnormal\">e</span></span><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.0833em;\"><span 
class=\"mord\">˙</span></span></span></span></span></span></span></span></span></span></span></p>\n<p>Therefore, when the phase curve in the <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mover accent=\"true\"><mi>e</mi><mo>˙</mo></mover><mo separator=\"true\">,</mo><mi>e</mi></mrow><annotation encoding=\"application/x-tex\">\\dot{e}, e</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8623em;vertical-align:-0.1944em;\"></span><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6679em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathnormal\">e</span></span><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.0833em;\"><span class=\"mord\">˙</span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\">e</span></span></span></span> plane converges to the origin, <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi mathvariant=\"normal\">∥</mi><mi>e</mi><mi mathvariant=\"normal\">∥</mi><mo>→</mo><mn>0</mn></mrow><annotation encoding=\"application/x-tex\">\\|e\\| \\to 0</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\">∥</span><span class=\"mord mathnormal\">e</span><span class=\"mord\">∥</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">→</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span 
class=\"mord\">0</span></span></span></span> and the error stops changing, which means the system is stable. Moreover, whenever <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>e</mi><mover accent=\"true\"><mi>e</mi><mo>˙</mo></mover><mo>&lt;</mo><mn>0</mn></mrow><annotation encoding=\"application/x-tex\">e\\dot{e} &lt;0</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.707em;vertical-align:-0.0391em;\"></span><span class=\"mord mathnormal\">e</span><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6679em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathnormal\">e</span></span><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.0833em;\"><span class=\"mord\">˙</span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">&lt;</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">0</span></span></span></span> holds, the magnitude of the error is decreasing.</p>\n<p>Remove the inner negative-feedback path <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>β</mi></mrow><annotation encoding=\"application/x-tex\">\\beta</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8889em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05278em;\">β</span></span></span></span> and modify the second-order system so that its forward path contains a switch that can insert a gain of <code>-1</code>.</p>\n<ul>\n<li>When the switch is in position <code>1</code>, the closed-loop transfer function is</li>\n</ul>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math 
xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi mathvariant=\"normal\">Φ</mi><mo stretchy=\"false\">(</mo><mi>s</mi><mo stretchy=\"false\">)</mo><mo>=</mo><mfrac><mi>α</mi><mrow><msup><mi>s</mi><mn>2</mn></msup><mo>+</mo><mi>α</mi></mrow></mfrac></mrow><annotation encoding=\"application/x-tex\">\\Phi(s) = \\frac{\\alpha}{s^2+\\alpha}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\">Φ</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">s</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.8769em;vertical-align:-0.7693em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.1076em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\">s</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7401em;\"><span style=\"top:-2.989em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span 
class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7693em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<ul>\n<li>When the switch is in position <code>2</code>, the closed-loop transfer function is</li>\n</ul>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi mathvariant=\"normal\">Φ</mi><mo stretchy=\"false\">(</mo><mi>s</mi><mo stretchy=\"false\">)</mo><mo>=</mo><mo>−</mo><mfrac><mi>α</mi><mrow><msup><mi>s</mi><mn>2</mn></msup><mo>−</mo><mi>α</mi></mrow></mfrac></mrow><annotation encoding=\"application/x-tex\">\\Phi(s) = -\\frac{\\alpha}{s^2-\\alpha}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\">Φ</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">s</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.8769em;vertical-align:-0.7693em;\"></span><span class=\"mord\">−</span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.1076em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord 
mathnormal\">s</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7401em;\"><span style=\"top:-2.989em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7693em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<p><img loading=\"lazy\" src=\"/picture/control/sys.png\" alt=\"sliding\"></p>\n<p>Taken on its own, neither of these two systems is asymptotically stable, but with a suitable switching strategy, a finite number of switches between the two systems is enough to drive the phase trajectory to the origin.</p>\n<p>Here the amplitude of the divergent mode is</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi>A</mi><mo>=</mo><mfrac><mrow><msub><mover accent=\"true\"><mi>e</mi><mo>˙</mo></mover><mn>0</mn></msub><mo>+</mo><msqrt><mi>α</mi></msqrt><msub><mi>e</mi><mn>0</mn></msub></mrow><mrow><mn>2</mn><msqrt><mi>α</mi></msqrt></mrow></mfrac><mo>=</mo><mfrac><msub><mi>s</mi><mn>0</mn></msub><mrow><mn>2</mn><msqrt><mi>α</mi></msqrt></mrow></mfrac></mrow><annotation encoding=\"application/x-tex\">A = \\frac{\\dot e_0 +\\sqrt{\\alpha} e_0}{2\\sqrt \\alpha} = 
\\frac{s_0}{2\\sqrt \\alpha}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\">A</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.4073em;vertical-align:-0.93em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.4773em;\"><span style=\"top:-2.3097em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">2</span><span class=\"mord sqrt\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8003em;\"><span class=\"svg-align\" style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;padding-left:0.833em;\">α</span></span><span style=\"top:-2.7603em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"hide-tail\" style=\"min-width:0.853em;height:1.08em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"1.08em\" viewbox=\"0 0 400000 1080\" preserveaspectratio=\"xMinYMin slice\"><path d=\"M95,702\nc-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14\nc0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54\nc44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10\ns173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429\nc69,-144,104.5,-217.7,106.5,-221\nl0 -0\nc5.3,-9.3,12,-14,20,-14\nH400000v40H845.2724\ns-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7\nc-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z\nM834 80h400000v40h-400000z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2397em;\"><span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6679em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathnormal\">e</span></span><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.0833em;\"><span class=\"mord\">˙</span></span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">0</span></span></span></span><span 
class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord sqrt\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8003em;\"><span class=\"svg-align\" style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\" style=\"padding-left:0.833em;\"><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span></span></span><span style=\"top:-2.7603em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"hide-tail\" style=\"min-width:0.853em;height:1.08em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"1.08em\" viewbox=\"0 0 400000 1080\" preserveaspectratio=\"xMinYMin slice\"><path d=\"M95,702\nc-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14\nc0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54\nc44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10\ns173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429\nc69,-144,104.5,-217.7,106.5,-221\nl0 -0\nc5.3,-9.3,12,-14,20,-14\nH400000v40H845.2724\ns-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7\nc-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z\nM834 80h400000v40h-400000z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2397em;\"><span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathnormal\">e</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing 
reset-size6 size3 mtight\"><span class=\"mord mtight\">0</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.93em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.0376em;vertical-align:-0.93em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.1076em;\"><span style=\"top:-2.3097em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">2</span><span class=\"mord sqrt\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8003em;\"><span class=\"svg-align\" style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;padding-left:0.833em;\">α</span></span><span style=\"top:-2.7603em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"hide-tail\" style=\"min-width:0.853em;height:1.08em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"1.08em\" viewbox=\"0 0 400000 1080\" preserveaspectratio=\"xMinYMin slice\"><path d=\"M95,702\nc-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14\nc0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54\nc44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10\ns173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429\nc69,-144,104.5,-217.7,106.5,-221\nl0 
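-0\nc5.3,-9.3,12,-14,20,-14\nH400000v40H845.2724\ns-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7\nc-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z\nM834 80h400000v40h-400000z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2397em;\"><span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\">s</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">0</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.93em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<p>The stable characteristic line of Switch 2 is <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mover accent=\"true\"><mi>e</mi><mo>˙</mo></mover><mo>+</mo><msqrt><mi>α</mi></msqrt><mi>e</mi><mo>=</mo><mn>0</mn></mrow><annotation encoding=\"application/x-tex\">\\dot e + \\sqrt \\alpha e = 0</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7512em;vertical-align:-0.0833em;\"></span><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6679em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathnormal\">e</span></span><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.0833em;\"><span class=\"mord\">˙</span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.04em;vertical-align:-0.2397em;\"></span><span class=\"mord sqrt\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8003em;\"><span class=\"svg-align\" style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;padding-left:0.833em;\">α</span></span><span style=\"top:-2.7603em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"hide-tail\" style=\"min-width:0.853em;height:1.08em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"1.08em\" viewbox=\"0 0 400000 1080\" preserveaspectratio=\"xMinYMin slice\"><path d=\"M95,702\nc-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14\nc0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54\nc44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10\ns173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429\nc69,-144,104.5,-217.7,106.5,-221\nl0 -0\nc5.3,-9.3,12,-14,20,-14\nH400000v40H845.2724\ns-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7\nc-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z\nM834 80h400000v40h-400000z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2397em;\"><span></span></span></span></span></span><span class=\"mord mathnormal\">e</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">0</span></span></span></span>, on which the divergent-mode term of the general solution vanishes.</p>\n<p>When <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>s</mi><mo>&gt;</mo><mn>0</mn></mrow><annotation encoding=\"application/x-tex\">s&gt;0</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.5782em;vertical-align:-0.0391em;\"></span><span class=\"mord 
mathnormal\">s</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">&gt;</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">0</span></span></span></span>, the error diverges in the positive direction; when <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>s</mi><mo>&lt;</mo><mn>0</mn></mrow><annotation encoding=\"application/x-tex\">s&lt;0</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.5782em;vertical-align:-0.0391em;\"></span><span class=\"mord mathnormal\">s</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">&lt;</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">0</span></span></span></span>, the amplitude is first driven toward 0 and then diverges in the negative direction. Therefore, when <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>e</mi><mo>&gt;</mo><mn>0</mn><mo separator=\"true\">,</mo><mi>s</mi><mo>&lt;</mo><mn>0</mn></mrow><annotation encoding=\"application/x-tex\">e&gt; 0,s&lt;0</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.5782em;vertical-align:-0.0391em;\"></span><span class=\"mord mathnormal\">e</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">&gt;</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8389em;vertical-align:-0.1944em;\"></span><span class=\"mord\">0</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord 
mathnormal\">s</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">&lt;</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">0</span></span></span></span> or <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>e</mi><mo>&lt;</mo><mn>0</mn><mo separator=\"true\">,</mo><mi>s</mi><mo>&gt;</mo><mn>0</mn></mrow><annotation encoding=\"application/x-tex\">e&lt;0, s&gt;0</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.5782em;vertical-align:-0.0391em;\"></span><span class=\"mord mathnormal\">e</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">&lt;</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8389em;vertical-align:-0.1944em;\"></span><span class=\"mord\">0</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\">s</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">&gt;</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">0</span></span></span></span>, the divergent term drives the error toward 0.</p>\n<p>When <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>e</mi><mi>s</mi><mo>&gt;</mo><mn>0</mn></mrow><annotation encoding=\"application/x-tex\">es&gt;0</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.5782em;vertical-align:-0.0391em;\"></span><span class=\"mord mathnormal\">es</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">&gt;</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">0</span></span></span></span>, the system switches to Switch 1, whose characteristic roots are a pair of conjugate purely imaginary roots, so its contribution to the trajectory is a rotation.</p>\n<p>Therefore, taking <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>e</mi><mo stretchy=\"false\">(</mo><mover accent=\"true\"><mi>e</mi><mo>˙</mo></mover><mo>+</mo><msqrt><mi>α</mi></msqrt><mi>e</mi><mo stretchy=\"false\">)</mo><mo>=</mo><mn>0</mn></mrow><annotation encoding=\"application/x-tex\">e(\\dot e +\\sqrt \\alpha e ) = 0</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\">e</span><span class=\"mopen\">(</span><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6679em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" 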
style=\"height:3em;\"></span><span class=\"mord mathnormal\">e</span></span><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.0833em;\"><span class=\"mord\">˙</span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.0503em;vertical-align:-0.25em;\"></span><span class=\"mord sqrt\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8003em;\"><span class=\"svg-align\" style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;padding-left:0.833em;\">α</span></span><span style=\"top:-2.7603em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"hide-tail\" style=\"min-width:0.853em;height:1.08em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"1.08em\" viewbox=\"0 0 400000 1080\" preserveaspectratio=\"xMinYMin slice\"><path d=\"M95,702\nc-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14\nc0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54\nc44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10\ns173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429\nc69,-144,104.5,-217.7,106.5,-221\nl0 -0\nc5.3,-9.3,12,-14,20,-14\nH400000v40H845.2724\ns-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7\nc-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z\nM834 80h400000v40h-400000z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2397em;\"><span></span></span></span></span></span><span class=\"mord mathnormal\">e</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span 
class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">0</span></span></span></span> 作为滑模面进行滑模控制</p>\n<p>通常使用开关函数进行开关的切换</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mrow><mi mathvariant=\"normal\">s</mi><mi mathvariant=\"normal\">g</mi><mi mathvariant=\"normal\">n</mi></mrow><mo stretchy=\"false\">(</mo><mi>s</mi><mo stretchy=\"false\">)</mo><mo>=</mo><mrow><mo fence=\"true\">{</mo><mtable rowspacing=\"0.36em\" columnalign=\"left left\" columnspacing=\"1em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mn>1</mn></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mi>s</mi><mo>≥</mo><mn>0</mn></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mo>−</mo><mn>1</mn></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mi>s</mi><mo>&lt;</mo><mn>0</mn></mrow></mstyle></mtd></mtr></mtable></mrow></mrow><annotation encoding=\"application/x-tex\">\\mathrm{sgn}(s) =\\begin{dcases}\n1&amp; s\\geq 0\\\\\n-1 &amp; s&lt;0\n\\end{dcases} \n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">sgn</span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">s</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:3em;vertical-align:-1.25em;\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em;\"><span 
class=\"delimsizing size4\">{</span></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-l\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.69em;\"><span style=\"top:-3.69em;\"><span class=\"pstrut\" style=\"height:3.008em;\"></span><span class=\"mord\"><span class=\"mord\">1</span></span></span><span style=\"top:-2.25em;\"><span class=\"pstrut\" style=\"height:3.008em;\"></span><span class=\"mord\"><span class=\"mord\">−</span><span class=\"mord\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.19em;\"><span></span></span></span></span></span><span class=\"arraycolsep\" style=\"width:1em;\"></span><span class=\"col-align-l\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.69em;\"><span style=\"top:-3.69em;\"><span class=\"pstrut\" style=\"height:3.008em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">s</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">≥</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord\">0</span></span></span><span style=\"top:-2.25em;\"><span class=\"pstrut\" style=\"height:3.008em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">s</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">&lt;</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord\">0</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.19em;\"><span></span></span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<p>实际使用时，会设计一个饱和边界层<span class=\"katex\"><span class=\"katex-mathml\"><math 
xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>γ</mi></mrow><annotation encoding=\"application/x-tex\">\\gamma</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.625em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05556em;\">γ</span></span></span></span> , 使用连续的函数替换开关函数，比如</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mrow><mi mathvariant=\"normal\">S</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">t</mi></mrow><mrow><mo fence=\"true\">(</mo><mi>x</mi><mo fence=\"true\">)</mo></mrow><mo>=</mo><mrow><mo fence=\"true\">{</mo><mtable rowspacing=\"0.36em\" columnalign=\"left left\" columnspacing=\"1em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mn>1</mn></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mi>x</mi><mo>≥</mo><mn>1</mn></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mi>x</mi></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mi>x</mi><mo>∈</mo><mo stretchy=\"false\">(</mo><mo>−</mo><mn>1</mn><mo separator=\"true\">,</mo><mn>1</mn><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mo>−</mo><mn>1</mn></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mi>x</mi><mo>≤</mo><mo>−</mo><mn>1</mn></mrow></mstyle></mtd></mtr></mtable></mrow></mrow><annotation encoding=\"application/x-tex\">\\mathrm{Sat}\\left(x\\right) = \\begin{dcases}\n1 &amp; x\\geq 1\\\\\nx &amp; x\\in (-1,1)\\\\\n-1 &amp; x\\leq -1\n\\end{dcases}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" 
style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">Sat</span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em;\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mclose delimcenter\" style=\"top:0em;\">)</span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:4.32em;vertical-align:-1.91em;\"></span><span class=\"minner\"><span class=\"mopen\"><span class=\"delimsizing mult\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.35em;\"><span style=\"top:-2.2em;\"><span class=\"pstrut\" style=\"height:3.15em;\"></span><span class=\"delimsizinginner delim-size4\"><span>⎩</span></span></span><span style=\"top:-2.192em;\"><span class=\"pstrut\" style=\"height:3.15em;\"></span><span style=\"height:0.316em;width:0.8889em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.8889em\" height=\"0.316em\" style=\"width:0.8889em\" viewbox=\"0 0 888.89 316\" preserveaspectratio=\"xMinYMin\"><path d=\"M384 0 H504 V316 H384z M384 0 H504 V316 H384z\"/></svg></span></span><span style=\"top:-3.15em;\"><span class=\"pstrut\" style=\"height:3.15em;\"></span><span class=\"delimsizinginner delim-size4\"><span>⎨</span></span></span><span style=\"top:-4.292em;\"><span class=\"pstrut\" style=\"height:3.15em;\"></span><span style=\"height:0.316em;width:0.8889em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.8889em\" height=\"0.316em\" style=\"width:0.8889em\" viewbox=\"0 0 888.89 316\" preserveaspectratio=\"xMinYMin\"><path d=\"M384 0 H504 V316 H384z M384 0 H504 V316 H384z\"/></svg></span></span><span style=\"top:-4.6em;\"><span class=\"pstrut\" style=\"height:3.15em;\"></span><span class=\"delimsizinginner 
delim-size4\"><span>⎧</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.85em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-l\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.41em;\"><span style=\"top:-4.41em;\"><span class=\"pstrut\" style=\"height:3.008em;\"></span><span class=\"mord\"><span class=\"mord\">1</span></span></span><span style=\"top:-2.97em;\"><span class=\"pstrut\" style=\"height:3.008em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span></span></span><span style=\"top:-1.53em;\"><span class=\"pstrut\" style=\"height:3.008em;\"></span><span class=\"mord\"><span class=\"mord\">−</span><span class=\"mord\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.91em;\"><span></span></span></span></span></span><span class=\"arraycolsep\" style=\"width:1em;\"></span><span class=\"col-align-l\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.41em;\"><span style=\"top:-4.41em;\"><span class=\"pstrut\" style=\"height:3.008em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">≥</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord\">1</span></span></span><span style=\"top:-2.97em;\"><span class=\"pstrut\" style=\"height:3.008em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">∈</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mopen\">(</span><span class=\"mord\">−</span><span class=\"mord\">1</span><span class=\"mpunct\">,</span><span 
class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\">1</span><span class=\"mclose\">)</span></span></span><span style=\"top:-1.53em;\"><span class=\"pstrut\" style=\"height:3.008em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">≤</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord\">−</span><span class=\"mord\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.91em;\"><span></span></span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<p>或者</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi>tanh</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><mi>x</mi><mo stretchy=\"false\">)</mo><mo>=</mo><mfrac><mrow><msup><mi>e</mi><mi>x</mi></msup><mo>−</mo><msup><mi>e</mi><mrow><mo>−</mo><mi>x</mi></mrow></msup></mrow><mrow><msup><mi>e</mi><mi>x</mi></msup><mo>+</mo><msup><mi>e</mi><mrow><mo>−</mo><mi>x</mi></mrow></msup></mrow></mfrac></mrow><annotation encoding=\"application/x-tex\">\\tanh (x) = \\frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mop\">tanh</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.2177em;vertical-align:-0.7693em;\"></span><span class=\"mord\"><span class=\"mopen 
nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.4483em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\">e</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.5904em;\"><span style=\"top:-2.989em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">x</span></span></span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">e</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6973em;\"><span style=\"top:-2.989em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">−</span><span class=\"mord mathnormal mtight\">x</span></span></span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\">e</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6644em;\"><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span 
class=\"mord mtight\"><span class=\"mord mathnormal mtight\">x</span></span></span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">e</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7713em;\"><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">−</span><span class=\"mord mathnormal mtight\">x</span></span></span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7693em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<p>用于实现跳变的连续化与光滑化。使用 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mrow><mi mathvariant=\"normal\">S</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">t</mi></mrow><mrow><mo fence=\"true\">(</mo><mstyle displaystyle=\"true\" scriptlevel=\"0\"><mfrac><mi>s</mi><mi>γ</mi></mfrac></mstyle><mo fence=\"true\">)</mo></mrow></mrow><annotation encoding=\"application/x-tex\">\\mathrm{Sat}\\left(\\dfrac{s}{\\gamma}\\right)</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:2.4em;vertical-align:-0.95em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">Sat</span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size3\">(</span></span><span 
class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.1076em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.05556em;\">γ</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">s</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8804em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mclose delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size3\">)</span></span></span></span></span></span> 或者 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>tanh</mi><mo>⁡</mo><mrow><mo fence=\"true\">(</mo><mstyle displaystyle=\"true\" scriptlevel=\"0\"><mfrac><mi>s</mi><mi>γ</mi></mfrac></mstyle><mo fence=\"true\">)</mo></mrow></mrow><annotation encoding=\"application/x-tex\">\\tanh\\left(\\dfrac{s}{\\gamma}\\right)</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:2.4em;vertical-align:-0.95em;\"></span><span class=\"mop\">tanh</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size3\">(</span></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:1.1076em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.05556em;\">γ</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">s</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8804em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mclose delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size3\">)</span></span></span></span></span></span> 进行跳变，能将变化区域大致限制在饱和区 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mo stretchy=\"false\">[</mo><mo>−</mo><mi>γ</mi><mo separator=\"true\">,</mo><mi>γ</mi><mo stretchy=\"false\">]</mo></mrow><annotation encoding=\"application/x-tex\">[-\\gamma,\\gamma]</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">[</span><span class=\"mord\">−</span><span class=\"mord mathnormal\" style=\"margin-right:0.05556em;\">γ</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05556em;\">γ</span><span class=\"mclose\">]</span></span></span></span> 中</p>\n<p>对于 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>n</mi></mrow><annotation encoding=\"application/x-tex\">n</annotation></semantics></math></span><span class=\"katex-html\" 
aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\">n</span></span></span></span> 阶高阶系统，滑模面是子空间内的超曲面</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi>s</mi><mo>=</mo><mi>s</mi><mo stretchy=\"false\">(</mo><msub><mi>x</mi><mn>1</mn></msub><mo separator=\"true\">,</mo><mo>⋯</mo><mtext> </mtext><mo separator=\"true\">,</mo><msub><mi>x</mi><mi>m</mi></msub><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">s = s(x_1,\\cdots, x_m)\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\">s</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\">s</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"minner\">⋯</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mspace\" 
style=\"margin-right:0.1667em;\"></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">m</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span></span></span></span></span></p>\n<p>需要更加复杂的切换逻辑</p>\n",
            "tags": []
        },
        {
            "id": "https://yuuko.site/2026/06/12/CS/LLM/CS336/tokenizer/",
            "url": "https://yuuko.site/2026/06/12/CS/LLM/CS336/tokenizer/",
            "title": "BPE-tokenizer",
            "date_published": "2026-06-11T16:00:00.000Z",
            "content_html": "<link rel=\"stylesheet\" type=\"text/css\" href=\"https://cdn.jsdelivr.net/hint.css/2.4.1/hint.min.css\"><h1><span id=\"bpe-tokenizer\"> BPE - Tokenizer</span></h1>\n<h2><span id=\"bpe的基本观念\"> BPE的基本观念</span></h2>\n<p>BPE Tokenizer 是将单个单词拆分为若干个子块进行tokenizer的方式，会训练一个Tokenizer Merge 表用于规定如何进行单个字母合并为子块，并训练一个token string子块与token id空间的映射表<code>vocab</code>以实现将字符转换为<code>int</code>类型的token id</p>\n<p>BPE 需要考虑一些Special token，它们是不可分token string单元，每一个作为整体映射到token id. 比如 <code>&quot;&lt;|endoftext|&gt;&quot;</code></p>\n<h3><span id=\"bpe-merge-表的训练\"> BPE Merge 表的训练</span></h3>\n<p>BPE 基于字符合并而贪心算法进行<code>vocab</code>和 <code>Merge</code>表</p>\n<p>假设Tokenizer 训练集为</p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-text\"><span class=\"line\"><span>We&#x3C;/w> *20</span></span>\n<span class=\"line\"><span>are&#x3C;/w> *12</span></span>\n<span class=\"line\"><span>the&#x3C;/w> *3</span></span>\n<span class=\"line\"><span>world&#x3C;/w> *9</span></span></code></pre>\n<p>拆分为字频</p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-text\"><span class=\"line\"><span>W e &#x3C;/W> *22</span></span>\n<span class=\"line\"><span>a r e &#x3C;/w> *12</span></span>\n<span class=\"line\"><span>t h e &#x3C;/w> *3</span></span>\n<span class=\"line\"><span>w o r l d &#x3C;/w> *9</span></span></code></pre>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-text\"><span class=\"line\"><span>&#x3C;/w> *44</span></span>\n<span class=\"line\"><span>e    *35</span></span>\n<span class=\"line\"><span>w    *29</span></span>\n<span 
class=\"line\"><span>r    *21</span></span>\n<span class=\"line\"><span>a    *12</span></span>\n<span class=\"line\"><span>o    * 9</span></span>\n<span class=\"line\"><span>l    * 9</span></span>\n<span class=\"line\"><span>d    * 9</span></span>\n<span class=\"line\"><span>t    * 3</span></span>\n<span class=\"line\"><span>h    * 3</span></span></code></pre>\n<p>Top-2 字频的字为 <code>w</code> 和 <code>e</code>， 将这两个字进行<strong>有序</strong>的合并</p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-text\"><span class=\"line\"><span>we &#x3C;/w> *22</span></span>\n<span class=\"line\"><span>a r e &#x3C;/w> *12</span></span>\n<span class=\"line\"><span>t h e &#x3C;/w> *3</span></span>\n<span class=\"line\"><span>w o r l d &#x3C;/w> *9</span></span></code></pre>\n<p>更新字频表</p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-text\"><span class=\"line\"><span>&#x3C;/w>  *44</span></span>\n<span class=\"line\"><span>we   *22</span></span>\n<span class=\"line\"><span>r    *21</span></span>\n<span class=\"line\"><span>a    *12</span></span>\n<span class=\"line\"><span>o    * 9</span></span>\n<span class=\"line\"><span>l    * 9</span></span>\n<span class=\"line\"><span>d    * 9</span></span>\n<span class=\"line\"><span>e    * 6</span></span>\n<span class=\"line\"><span>t    * 3</span></span>\n<span class=\"line\"><span>h    * 3</span></span></code></pre>\n<p>此时可合并的Top-2 字频为<code>r</code> 与 <code>a</code></p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-text\"><span class=\"line\"><span>we &#x3C;/w> *22</span></span>\n<span class=\"line\"><span>ar 
e &#x3C;/w> *12</span></span>\n<span class=\"line\"><span>t h e &#x3C;/w> *3</span></span>\n<span class=\"line\"><span>w o r l d &#x3C;/w> *9</span></span></code></pre>\n<p>更新词频表</p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-text\"><span class=\"line\"><span>&#x3C;/w>  *44</span></span>\n<span class=\"line\"><span>we   *22</span></span>\n<span class=\"line\"><span>ar   *12</span></span>\n<span class=\"line\"><span>r    * 9</span></span>\n<span class=\"line\"><span>o    * 9</span></span>\n<span class=\"line\"><span>l    * 9</span></span>\n<span class=\"line\"><span>d    * 9</span></span>\n<span class=\"line\"><span>e    * 6</span></span>\n<span class=\"line\"><span>t    * 3</span></span>\n<span class=\"line\"><span>h    * 3</span></span></code></pre>\n<p>以此方式迭代直到达到设定的合并上限</p>\n<h2><span id=\"正则表达式库re的使用\"> 正则表达式库<code>re</code>的使用</span></h2>\n<h2><span id=\"遇到的问题\"> 遇到的问题</span></h2>\n",
            "tags": []
        },
        {
            "id": "https://yuuko.site/2026/06/11/CS/%E8%AE%A1%E7%BB%84/%E7%A3%81%E7%9B%98/",
            "url": "https://yuuko.site/2026/06/11/CS/%E8%AE%A1%E7%BB%84/%E7%A3%81%E7%9B%98/",
            "title": "磁盘与SSD",
            "date_published": "2026-06-10T16:00:00.000Z",
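            "content_html": "<link rel=\"stylesheet\" type=\"text/css\" href=\"https://cdn.jsdelivr.net/hint.css/2.4.1/hint.min.css\"><h1><span id=\"磁盘存储器\"> Disk storage</span></h1>\n<p>A disk storage device consists of three parts: the <strong>disk drive</strong>, the <strong>disk controller</strong>, and the <strong>platters</strong>.</p>\n<p>From top to bottom, the storage units of a disk are:</p>\n<ul>\n<li>Surface -- the recording surface of the disk. A disk has multiple surfaces, each with its own read/write head, and all surfaces rotate on a common spindle.</li>\n<li>Sector -- a wedge-shaped region of a surface. Each surface is divided evenly by angle into sectors, separated by small gaps called inter-sector gaps.</li>\n<li>Track -- a ring at a given radius that the head can access. If every track uses the same recording density, the addressable and storage space per track grows with the radius; alternatively, the linear density of the outer tracks is lowered so that all tracks hold the same amount of data.</li>\n</ul>\n<p>The geometry of the storage area is described by the following parameters:</p>\n<ul>\n<li>Number of recording surfaces -- number of heads = number of surfaces</li>\n<li>Number of cylinders -- the number of tracks on a single recording surface, i.e. how many parts the surface is divided into along the radius</li>\n<li>Number of sectors -- the number of sectors on a single recording surface, i.e. how many parts the surface is divided into along the direction of rotation</li>\n</ul>\n<p>The design of modern hard disks traces back to the <strong>Winchester disk</strong> introduced by IBM.</p>\n<h3><span id=\"磁记录原理\"> Magnetic recording principle</span></h3>\n<p>Data is read and written through electromagnetic conversion while the head and the magnetic recording medium move relative to each other.</p>\n<h2><span id=\"磁盘的管理\"> Disk management</span></h2>\n<h3><span id=\"磁盘的初始化\"> Disk initialization</span></h3>\n<p>Disk initialization consists of <strong>low-level formatting</strong> and <strong>high-level formatting</strong>.</p>\n<p>Low-level formatting divides the disk into sectors. Each sector has a header, a data area, and a trailer, and uses a CRC field for error checking. Because it only lays out the physical structure, it is also called physical formatting.</p>\n<p>High-level formatting builds a file system on a partition, initializing the root directory and so on, and is therefore also called logical formatting.</p>\n<p>Disk capacity is likewise divided into unformatted and formatted capacity. Unformatted capacity is the theoretical upper bound, the total number of magnetizable cells on the recording surfaces; formatted capacity is the storage actually usable after formatting. Hence unformatted capacity &gt; formatted capacity.</p>\n<h3><span id=\"磁盘的地址编码方式\"> Disk address format</span></h3>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-text\"><span class=\"line\"><span>|cylinder (track) no.|surface (head) no.|sector no.|</span></span></code></pre>\n<h3><span id=\"磁盘阵列\"> Disk arrays</span></h3>\n<p>RAID combines multiple physical disks into a single logical disk. Data is striped across the physical disks and accessed in parallel, giving better storage performance and reliability.</p>\n<h3><span id=\"磁盘的文件系统\"> Disk file system</span></h3>\n<p><a href=\"/2026/06/05/CS/OS/%E6%96%87%E4%BB%B6%E7%AE%A1%E7%90%86/\" title=\"文件管理\">File management</a></p>\n<p>The disk is divided into partitions, and the start position and size of each partition are recorded in the MBR. The boot block then loads the operating system's file blocks to bootstrap the system.</p>\n<p>The BIOS is stored in ROM; after the hardware self-test it loads the boot block to start the operating system.</p>\n<h2><span id=\"磁盘的调度算法\"> 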
磁盘的调度算法</span></h2>\n<p>磁盘读写时间由三个部分构成</p>\n<ul>\n<li>寻道时间</li>\n<li>旋转延迟时间</li>\n<li>数据传输时间</li>\n</ul>\n<p>寻道时间由跨越磁道时间<span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>m</mi></mrow><annotation encoding=\"application/x-tex\">m</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\">m</span></span></span></span>和磁头臂启动时间<span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>s</mi></mrow><annotation encoding=\"application/x-tex\">s</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\">s</span></span></span></span>构成</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mi>T</mi><mi>s</mi></msub><mo>=</mo><mi>m</mi><mi>n</mi><mo>+</mo><mi>s</mi></mrow><annotation encoding=\"application/x-tex\">T_s = mn+s\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">T</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">s</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\">mn</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\">s</span></span></span></span></span></p>\n<p>平均寻道时间是最大寻道时间的一半</p>\n<p>旋转延迟时间为目标扇区旋转到磁头下的耗时，假设盘面转速为<span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>r</mi></mrow><annotation encoding=\"application/x-tex\">r</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">r</span></span></span></span>,平均旋转延迟时间为</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mi>T</mi><mi>r</mi></msub><mo>=</mo><mfrac><mn>1</mn><mrow><mn>2</mn><mi>r</mi></mrow></mfrac></mrow><annotation encoding=\"application/x-tex\">T_r = \\frac{1}{2r}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">T</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span 
style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.02778em;\">r</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.0074em;vertical-align:-0.686em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3214em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">2</span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">r</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<p>数据传输时间为读写 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>b</mi></mrow><annotation encoding=\"application/x-tex\">b</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" 
style=\"height:0.6944em;\"></span><span class=\"mord mathnormal\">b</span></span></span></span> 字节所需的时间。如果每条磁道存储<span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>N</mi></mrow><annotation encoding=\"application/x-tex\">N</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">N</span></span></span></span>字节，则平均数据传输时间为</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mi>T</mi><mi>t</mi></msub><mo>=</mo><mfrac><mi>b</mi><mrow><mi>r</mi><mi>N</mi></mrow></mfrac></mrow><annotation encoding=\"application/x-tex\">T_t= \\frac{b}{rN}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">T</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.0574em;vertical-align:-0.686em;\"></span><span 
class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3714em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">r</span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">N</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">b</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<p>总平均存取时间为</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mi>T</mi><mi>a</mi></msub><mo>=</mo><msub><mi>T</mi><mi>s</mi></msub><mo>+</mo><msub><mi>T</mi><mi>r</mi></msub><mo>+</mo><msub><mi>T</mi><mi>t</mi></msub></mrow><annotation encoding=\"application/x-tex\">T_a = T_s+ T_r+T_t\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">T</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 
size3 mtight\"><span class=\"mord mathnormal mtight\">a</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">T</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">s</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">T</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.02778em;\">r</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
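style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">T</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span></span></p>\n<p>Seek time dominates disk access time, so the main goal of disk scheduling is to reduce seek overhead.</p>\n<p>Response time additionally includes queueing delay and controller overhead.</p>\n<h3><span id=\"磁盘调度算法\"> Disk scheduling algorithms</span></h3>\n<h4><span id=\"fcfs算法\"> FCFS</span></h4>\n<p>The simplest and fairest algorithm: requests are served in arrival order. When many processes compete for the disk, however, the head frequently moves long distances and efficiency drops.</p>\n<h4><span id=\"最短寻道时间优先sstf\"> Shortest Seek Time First (SSTF)</span></h4>\n<p>A greedy algorithm: each step serves the pending request whose track has the shortest seek time from the current head position.</p>\n<p>SSTF may keep moving back and forth within one region of tracks, so requests for distant tracks can suffer &quot;starvation&quot;.</p>\n<h4><span id=\"scan算法\"> SCAN</span></h4>\n<p>The head keeps moving in its current direction, serving requests along the way, until it reaches the last track in that direction; it then reverses and sweeps to the track at the other end.</p>\n<p>SCAN is also called the elevator algorithm. It is unfair to the region that has just been swept, because the head takes longer to return there, so it exploits locality less well than FCFS and SSTF.</p>\n<h4><span id=\"c-scan算法\"> C-SCAN</span></h4>\n<p>When the head reaches the outermost edge of the disk, C-SCAN returns it straight to the innermost track without serving requests on the way back. Requests are served only while the head moves from the innermost track to the outermost one.</p>\n<h4><span id=\"lookc-look-调度\"> LOOK/C-LOOK scheduling</span></h4>\n<p>Refinements of SCAN/C-SCAN: the head does not travel to the physical edge of the disk, but only as far as the farthest pending request in the current direction before turning around.</p>\n<h4><span id=\"nstepscan-amp-fscan-算法\"> NStepSCAN &amp; FSCAN</span></h4>\n<p>SSTF, SCAN, and C-SCAN can all exhibit <strong>arm stickiness</strong>: if some process keeps accessing one track, the head stays on that track and requests for other tracks starve.</p>\n<p>NStepSCAN splits the request queue into sub-queues of length N. The sub-queues are processed in FCFS order, and each sub-queue is served with SCAN. New requests go into a new sub-queue, never into the one currently being served.</p>\n<p>FSCAN simplifies NStepSCAN to the form <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>N</mi><mo>=</mo><mn>2</mn></mrow><annotation encoding=\"application/x-tex\">N=2</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">N</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">2</span></span></span></span>, that is, it keeps just two queues: the one currently being served and one for new requests. Every request that arrives during the current sweep goes into the new-request queue, which lowers the implementation cost.</p>\n<h3><span id=\"减少延迟时间的算法\"> Methods for reducing rotational delay</span></h3>\n<ul>\n<li>Interleaved numbering -- sectors within a surface are numbered in interleaved fashion (odd and even numbers separated), so that processing time between two reads does not make the head miss the next sector and wait a long time for it to come around again</li>\n<li>Staggered numbering -- sector numbers on different surfaces are offset from one another, reducing rotational delay</li>\n</ul>\n<h3><span id=\"提高磁盘io速度的方法\"> Ways to improve disk IO speed</span></h3>\n<ul>\n<li>Improve the directory structure and directory lookup to reduce directory search time</li>\n<li>Choose a good file storage structure</li>\n<li>Speed up disk IO itself, for example with a disk cache, read-ahead, delayed write, a virtual disk, or RAID</li>\n</ul>\n<h1><span id=\"ssd\"> SSD</span></h1>\n<p>An SSD is a storage device based on flash memory, which is a variant of E^2PROM.</p>\n<p>SSD reads and writes operate on pages; erases operate on blocks.</p>\n<p>SSD random writes are relatively slow, erases take a long time, and each block endures only a limited number of erase cycles. Wear-leveling algorithms were therefore introduced:</p>\n<ul>\n<li>Dynamic wear leveling -- when writing, prefer blocks that have been erased less often, spreading the load across different blocks</li>\n<li>Static wear leveling -- the controller periodically migrates data away from heavily worn blocks, so that heavily worn blocks mostly serve reads while less worn blocks take on more of the erase/write load, raising the average lifetime</li>\n</ul>\n<p>In practice, static wear leveling performs better than dynamic wear leveling.</p>\n",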
            "tags": []
        },
        {
            "id": "https://yuuko.site/2026/06/09/CS/OS/IO/",
            "url": "https://yuuko.site/2026/06/09/CS/OS/IO/",
            "title": "IO管理",
            "date_published": "2026-06-08T16:00:00.000Z",
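            "content_html": "<link rel=\"stylesheet\" type=\"text/css\" href=\"https://cdn.jsdelivr.net/hint.css/2.4.1/hint.min.css\"><h1><span id=\"io管理\"> IO management</span></h1>\n<h2><span id=\"io-设备\"> IO devices</span></h2>\n<p>By unit of information exchange, devices are divided into:</p>\n<ul>\n<li>Block devices -- exchange data in blocks, with high transfer rates and support for random access, e.g. disks</li>\n<li>Character devices -- exchange information one character at a time, usually using interrupts for asynchronous communication, e.g. printers</li>\n</ul>\n<p>A printer has to print character by character.</p>\n<p>By transfer rate, devices are divided into:</p>\n<ul>\n<li>Low-speed devices -- keyboards, mice, and the like, with rates from a few bytes to a few hundred bytes per second</li>\n<li>Medium-speed devices -- printers and the like, roughly hundreds to thousands of bytes per second</li>\n<li>High-speed devices -- disks, graphics cards, and the like</li>\n</ul>\n<p>By sharing attribute, devices are divided into:</p>\n<ul>\n<li>Exclusive devices -- only one process may use the device at a time, until it releases it, e.g. printers</li>\n<li>Shared devices -- at any instant only one process accesses the device, but processes are logically concurrent: scheduling lets multiple processes take turns using it, e.g. disks</li>\n<li>Virtual devices -- <strong>SPOOLing</strong> turns an exclusive device into several concurrent logical devices, using a high-speed shared device as a buffer, so that the exclusive device is logically shared.</li>\n</ul>\n<h2><span id=\"io接口\"> IO interface</span></h2>\n<p>The IO interface (device controller) is the bridge between the CPU and IO devices. The device controller connects to the CPU through three logically distinct buses: the data bus, the address bus, and the control bus.</p>\n<p>On the bus side, the interface keeps data registers and control/status registers, which buffer information to bridge the speed mismatch between the bus and the IO device.</p>\n<p>Modern interconnects such as PCIe, used by graphics cards, are high-speed serial links rather than separate parallel buses: the link is serial in its physical structure and divided into segments only in its logical structure.</p>\n<p>An IO interface can have <strong>one or more</strong> device-facing ports, each of which can carry data, control, and status signals.</p>\n<p>The IO interface contains IO logic that controls the device. After the CPU starts a device, it sends the corresponding command and address to the device controller; the IO logic decodes this information and controls what is sent on to the device.</p>\n<p>Main functions of the device controller:</p>\n<ul>\n<li>Receive and decode commands</li>\n<li>Carry out data exchange</li>\n<li>Identify and report device status</li>\n<li>Address recognition and data buffering</li>\n<li>Error control -- detect errors with parity checks and similar methods; correct them with codes such as Hamming codes, or report them to the CPU so that the data is retransmitted.</li>\n</ul>\n<h3><span id=\"io-接口的类型\"> Types of IO interface</span></h3>\n<p>Traditionally, an IO interface is not one large integrated controller but a set of functional interface modules, and these can be classified in several ways.</p>\n<p>By data transfer method: <strong>parallel interfaces</strong> and <strong>serial interfaces</strong>.</p>\n<p>By host access method: <strong>programmed-polling interfaces</strong>, <strong>interrupt interfaces</strong>, and <strong>DMA interfaces</strong>.</p>\n<p>By programmability: <strong>programmable interfaces</strong> and <strong>non-programmable interfaces</strong>. In the more complex cases a programmable interface has its own PC and related registers for executing instructions, and a single IO interface may even be an SoC, a device able to execute instructions on its own.</p>\n<p>By device type: <strong>block device interfaces</strong>, <strong>character device interfaces</strong>, and <strong>network device interfaces</strong>.</p>\n<h3><span id=\"io-接口的编址\"> IO 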
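interface addressing</span></h3>\n<p>There are two schemes: isolated addressing and unified (memory-mapped) addressing.</p>\n<p>With isolated addressing, the address space of the IO interface is separate from main memory; the two do not belong to the same address space, so an IO port address may have the same numeric value as a main-memory address.</p>\n<p>Unified addressing assigns part of the main-memory address space to IO devices. The devices get a large address space and can be managed with the same virtual memory mechanism as memory. The drawback is that fewer physical addresses remain available for memory.</p>\n<h3><span id=\"io控制方式\"> IO control methods</span></h3>\n<p>These are programmed IO (program polling), interrupt-driven IO, and DMA.</p>\n<h4><span id=\"程序直接控制\"> Programmed IO</span></h4>\n<p>In programmed IO, data exchange between the IO device and the host is controlled entirely by the CPU and carried out by executing a program.</p>\n<p>The procedure is:</p>\n<ul>\n<li>The CPU runs initialization code and presets the relevant parameters (for example the UART baud rate, the start address, and so on)</li>\n<li>Send the command word to the IO interface and start the peripheral</li>\n<li>Repeatedly read the peripheral's status register</li>\n<li>If the device is ready, transfer the data</li>\n<li>Update the address and counter</li>\n<li>Stop the transfer when the counter reaches zero</li>\n</ul>\n<p>The status register can be polled in two ways:</p>\n<ul>\n<li>Exclusive polling -- the process holds the CPU and busy-waits</li>\n<li>Timed polling -- the CPU polls periodically and is released between polls</li>\n</ul>\n<h4><span id=\"程序中断方式\"> Interrupt-driven IO</span></h4>\n<p>With interrupts, IO occupies the bus and the CPU only briefly to transfer data and commands. While the device is working, the process that requested the IO is suspended, and the IO device runs in parallel with the other processes on the CPU, which saves CPU time.</p>\n<p>An <strong>interrupt source</strong> is a device or event that can send an interrupt request to the CPU. The interrupt system provides an interrupt request flag flip-flop for each source; when it is set to 1, the source is requesting an interrupt.</p>\n<p>Interrupts raised on the <code>INTR</code> line are maskable; interrupts raised on the <code>NMI</code> line are non-maskable. Non-maskable interrupts have higher priority than maskable ones, and maskable interrupts are not responded to while interrupts are disabled.</p>\n<p>The interrupt response process can be divided into:</p>\n<ul>\n<li>Disable interrupts</li>\n<li>Save the breakpoint -- the program counter PC and the program status word PSW are pushed onto the stack or saved in dedicated registers</li>\n<li>Locate the interrupt service routine -- use the interrupt type to find the entry of the corresponding service routine and load it into the PC, ready for execution</li>\n<li>Enable interrupts</li>\n<li>Execute the interrupt service routine</li>\n<li>Disable interrupts</li>\n<li>Restore the context and PSW</li>\n<li>Enable interrupts and return from the interrupt</li>\n</ul>\n<p>The interrupt implicit instruction (disable interrupts, save the breakpoint, locate the service routine) is the preparatory sequence performed by hardware before the interrupt is serviced: it saves the state of the interrupted program and loads the entry of the service routine.</p>\n<p>The service routine itself has a service phase and a restore phase. The service phase (enable interrupts, execute the service routine, disable interrupts) handles the interrupt; the restore phase (restore the context and PSW, enable interrupts and return) restores the interrupted context and re-enables interrupts.</p>\n<h4><span id=\"多重中断与中断屏蔽\"> Nested interrupts and interrupt masking</span></h4>\n<p>When several interrupt sources compete, a priority mechanism decides which one is served. Maskable interrupts can take part in this competition only while interrupts are enabled.</p>\n<p>Nested interrupts occur when, with several sources active, the handling of a lower-priority interrupt is preempted by a higher-priority one, which is handled first. The breakpoints of nested interrupts are managed with a stack, because handlers complete in FILO order.</p>\n<p><strong>Interrupt handling priority</strong> is the order in which nested interrupts are actually handled, and it can be adjusted dynamically with interrupt masking. Priority is divided into <strong>handling priority</strong> and <strong>response priority</strong>; when the interrupt system does not use dynamic masking, the handling priority equals the response priority.</p>\n<p>The interrupt request register records the pending request of each source, and the interrupt mask register encodes the handling priority. Requests pass from the request register through the mask register, which filters them according to the handling priority in the current request state; the priority arbitration circuit then picks, among the unmasked requests, the one that is actually responded to.</p>\n<p>逻辑上的中断实现需要维护一个 <span class=\"katex\"><span class=\"katex-mathml\"><math 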
xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>n</mi><mo>×</mo><mi>n</mi></mrow><annotation encoding=\"application/x-tex\">n\\times n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\">n</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\">n</span></span></span></span> 的中断屏蔽字表，比如</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mo fence=\"true\">[</mo><mtable rowspacing=\"0.16em\" columnalign=\"center center center center\" columnspacing=\"1em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mn>1</mn></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mn>1</mn></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mn>1</mn></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mn>1</mn></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mn>0</mn></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mn>1</mn></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mn>0</mn></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mn>0</mn></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mn>0</mn></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mn>1</mn></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mn>1</mn></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mn>0</mn></mstyle></mtd></mtr><mtr><mtd><mstyle 
scriptlevel=\"0\" displaystyle=\"false\"><mn>0</mn></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mn>1</mn></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mn>1</mn></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mn>1</mn></mstyle></mtd></mtr></mtable><mo fence=\"true\">]</mo></mrow><annotation encoding=\"application/x-tex\">\\begin{bmatrix}\n1&amp;1&amp;1&amp;1\\\\\n0&amp;1&amp;0&amp;0\\\\\n0&amp;1&amp;1&amp;0\\\\\n0&amp;1&amp;1&amp;1\n\\end{bmatrix}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:4.8em;vertical-align:-2.15em;\"></span><span class=\"minner\"><span class=\"mopen\"><span class=\"delimsizing mult\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.65em;\"><span style=\"top:-4.65em;\"><span class=\"pstrut\" style=\"height:6.8em;\"></span><span style=\"width:0.667em;height:4.800em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.667em\" height=\"4.800em\" viewbox=\"0 0 667 4800\"><path d=\"M403 1759 V84 H666 V0 H319 V1759 v1200 v1759 h347 v-84\nH403z M403 1759 V0 H319 V1759 v1200 v1759 h84z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.15em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-c\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.65em;\"><span style=\"top:-4.81em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">1</span></span></span><span style=\"top:-3.61em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">0</span></span></span><span style=\"top:-2.41em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span 
class=\"mord\">0</span></span></span><span style=\"top:-1.21em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">0</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.15em;\"><span></span></span></span></span></span><span class=\"arraycolsep\" style=\"width:0.5em;\"></span><span class=\"arraycolsep\" style=\"width:0.5em;\"></span><span class=\"col-align-c\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.65em;\"><span style=\"top:-4.81em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">1</span></span></span><span style=\"top:-3.61em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">1</span></span></span><span style=\"top:-2.41em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">1</span></span></span><span style=\"top:-1.21em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.15em;\"><span></span></span></span></span></span><span class=\"arraycolsep\" style=\"width:0.5em;\"></span><span class=\"arraycolsep\" style=\"width:0.5em;\"></span><span class=\"col-align-c\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.65em;\"><span style=\"top:-4.81em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">1</span></span></span><span style=\"top:-3.61em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">0</span></span></span><span style=\"top:-2.41em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span 
class=\"mord\">1</span></span></span><span style=\"top:-1.21em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.15em;\"><span></span></span></span></span></span><span class=\"arraycolsep\" style=\"width:0.5em;\"></span><span class=\"arraycolsep\" style=\"width:0.5em;\"></span><span class=\"col-align-c\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.65em;\"><span style=\"top:-4.81em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">1</span></span></span><span style=\"top:-3.61em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">0</span></span></span><span style=\"top:-2.41em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">0</span></span></span><span style=\"top:-1.21em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.15em;\"><span></span></span></span></span></span></span></span><span class=\"mclose\"><span class=\"delimsizing mult\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.65em;\"><span style=\"top:-4.65em;\"><span class=\"pstrut\" style=\"height:6.8em;\"></span><span style=\"width:0.667em;height:4.800em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.667em\" height=\"4.800em\" viewbox=\"0 0 667 4800\"><path d=\"M347 1759 V0 H0 V84 H263 V1759 v1200 v1759 H0 v84 H347z\nM347 1759 V0 H263 V1759 v1200 v1759 h84z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:2.15em;\"><span></span></span></span></span></span></span></span></span></span></span></span></p>\n<p>This encodes the order <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>A</mi><mo>&gt;</mo><mi>D</mi><mo>&gt;</mo><mi>C</mi><mo>&gt;</mo><mi>B</mi></mrow><annotation encoding=\"application/x-tex\">A&gt;D&gt;C&gt;B</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7224em;vertical-align:-0.0391em;\"></span><span class=\"mord mathnormal\">A</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">&gt;</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7224em;vertical-align:-0.0391em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">D</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">&gt;</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7224em;vertical-align:-0.0391em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">&gt;</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05017em;\">B</span></span></span></span>, which is the static interrupt-response priority table.</p>\n<p>The order actually executed at run time, however, depends on the current interrupt request state and is realized through the vector together with updates to the interrupt request register.</p>\n<p>For example, with the initial request register <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mo stretchy=\"false\">[</mo><mi>A</mi><mo separator=\"true\">,</mo><mi>B</mi><mo separator=\"true\">,</mo><mi>C</mi><mo separator=\"true\">,</mo><mi>D</mi><mo 
stretchy=\"false\">]</mo><mo>=</mo><mo stretchy=\"false\">[</mo><mn>0</mn><mo separator=\"true\">,</mo><mn>1</mn><mo separator=\"true\">,</mo><mn>1</mn><mo separator=\"true\">,</mo><mn>1</mn><mo stretchy=\"false\">]</mo></mrow><annotation encoding=\"application/x-tex\">[A,B,C,D] = [0,1,1,1]</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">[</span><span class=\"mord mathnormal\">A</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05017em;\">B</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">D</span><span class=\"mclose\">]</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">[</span><span class=\"mord\">0</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\">1</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\">1</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\">1</span><span class=\"mclose\">]</span></span></span></span>:</p>\n<ul>\n<li>Round 1 -- interrupt D is serviced and the request register becomes <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mo 
stretchy=\"false\">[</mo><mn>0</mn><mo separator=\"true\">,</mo><mn>1</mn><mo separator=\"true\">,</mo><mn>1</mn><mo separator=\"true\">,</mo><mn>0</mn><mo stretchy=\"false\">]</mo></mrow><annotation encoding=\"application/x-tex\">[0,1,1,0]</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">[</span><span class=\"mord\">0</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\">1</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\">1</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\">0</span><span class=\"mclose\">]</span></span></span></span></li>\n<li>Round 2 -- interrupt C is serviced and the request register becomes <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mo stretchy=\"false\">[</mo><mn>0</mn><mo separator=\"true\">,</mo><mn>1</mn><mo separator=\"true\">,</mo><mn>0</mn><mo separator=\"true\">,</mo><mn>0</mn><mo stretchy=\"false\">]</mo></mrow><annotation encoding=\"application/x-tex\">[0,1,0,0]</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">[</span><span class=\"mord\">0</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\">1</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\">0</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\">0</span><span class=\"mclose\">]</span></span></span></span></li>\n<li>Round 3 -- interrupt B is serviced and the request register becomes <span 
class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mo stretchy=\"false\">[</mo><mn>0</mn><mo separator=\"true\">,</mo><mn>0</mn><mo separator=\"true\">,</mo><mn>0</mn><mo separator=\"true\">,</mo><mn>0</mn><mo stretchy=\"false\">]</mo></mrow><annotation encoding=\"application/x-tex\">[0,0,0,0]</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">[</span><span class=\"mord\">0</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\">0</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\">0</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\">0</span><span class=\"mclose\">]</span></span></span></span>, at which point interrupt handling ends</li>\n</ul>\n<h4><span id=\"dma-方式\"> DMA Mode</span></h4>\n<p>DMA establishes a direct data path between the IO interface and main memory. This raises CPU utilization, because the CPU can run other tasks in parallel with the IO device during both data preparation and data transfer.</p>\n<h5><span id=\"dma控制器组成\"> Components of the DMA Controller</span></h5>\n<ul>\n<li>Main memory address counter -- holds the main memory address of the data to be transferred; it increments after each word to point at the next address</li>\n<li>Transfer length counter -- holds the total length of the data still to be transferred; it decrements after each word, and reaching zero signals the end of the transfer</li>\n<li>Data buffer register -- temporarily holds the data of each transfer</li>\n<li>DMA request flip-flop</li>\n<li>Control/status logic</li>\n<li>Interrupt mechanism -- the source of the DMA interrupt</li>\n</ul>\n<p>The DMA controller must be able to contend for the bus, and its priority for the bus/CPU satisfies</p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-text\"><span class=\"line\"><span>DMA > non-maskable interrupt > maskable interrupt</span></span></code></pre>\n<h5><span id=\"dma-传送方式\"> DMA Transfer Modes</span></h5>\n<p>DMA and the CPU can share main memory in three ways:</p>\n<ul>\n<li>Stopping CPU memory access -- DMA takes over main memory directly, and the CPU suspends all memory accesses until the DMA transfer finishes</li>\n<li>Cycle stealing -- DMA has priority over the CPU when claiming a memory cycle, but it cannot interrupt a CPU access already in progress within a cycle</li>\n<li>Alternating DMA and CPU access -- memory cycles are allocated alternately to DMA and the CPU</li>\n</ul>\n<h5><span 
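id=\"dma传送过程\"> DMA Transfer Process</span></h5>\n<p>A DMA transfer requires the <strong>CPU</strong> to carry out preprocessing and postprocessing around the data transfer itself, giving three phases.</p>\n<ul>\n<li>Preprocessing -- the CPU initializes the DMA controller and starts the IO device; once the device is ready for input/output, it sends a DMA request to the DMA controller</li>\n<li>Data transfer -- the DMA controller accesses main memory directly over the bus while the CPU runs other processes</li>\n<li>Postprocessing -- after responding to the DMA interrupt, the CPU executes the interrupt service routine and performs follow-up work such as data verification.</li>\n</ul>\n<h2><span id=\"io-软件\"> IO Software</span></h2>\n<p>IO software is the software system that runs on the CPU side and manages IO devices and IO interfaces. Upward it interacts with the file system, user processes, virtual memory and similar subsystems; downward it interacts with the underlying IO devices.</p>\n<p>The NVIDIA driver is one example: downward it manages the actual computation on the GPU, and upward it exposes APIs to the operating system and the user so that the GPU's compute capability can be used through them, for example via CUDA or OpenGL.</p>\n<p>From the user side down to the hardware side, IO software is layered as follows:</p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-text\"><span class=\"line\"><span>user - user-level software - device-independent software - device drivers - interrupt handlers - hardware</span></span></code></pre>\n<h3><span id=\"中断处理软件\"> Interrupt Handling Software</span></h3>\n<p>This layer handles the interrupts raised when, for example, an IO device finishes an operation, and carries out the most basic communication between the CPU/bus and IO devices.</p>\n<p>The operating system also manages the interrupt handlers themselves, in three steps:</p>\n<ul>\n<li>Registering the interrupt -- the interrupt number is registered with the kernel, which maps the device's interrupt to the service state recorded by the kernel and sets up the interrupt handshake between the CPU and the device, so that the kernel can handle the interrupt</li>\n<li>Handling the interrupt</li>\n<li>Unregistering the interrupt -- when the driver or device is shut down, the interrupt context is restored, the handler is unregistered, and its resources are released.</li>\n</ul>\n<h3><span id=\"设备驱动程序\"> Device Drivers</span></h3>\n<p>Device drivers are management software specific to each kind of device, such as a printer driver, a mouse driver, or the NVIDIA GPU driver. Different devices expose different APIs to the host and are accessed in different ways; the device driver is the software layer that provides these APIs.</p>\n<p>By analogy with how VFS uniformly manages disks of several formats, a device driver plays the role of a concrete on-disk file system such as <code>NTFS</code>/<code>ext4</code>.</p>\n<h3><span id=\"设备独立层软件\"> Device-Independent Software Layer</span></h3>\n<p>The operating system manages device drivers behind uniform interfaces. One example is the <code>Socket</code> layer, the unified interface for networking: user-level network code does not need to care whether the network is reached over a wired or a wireless link, and simply obtains network resources through the <code>Socket</code> interface exposed by the operating system.</p>\n<p>This layer introduces the distinction between physical and logical devices, decoupling user-level access from the physical hardware. For example, querying network interfaces on Unix usually returns not the real name of the network card but a mapped interface name and related information:</p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-text\"><span class=\"line\"><span>asuna@AsunadeMacBook-Air ~ % ifconfig</span></span>\n<span class=\"line\"><span>lo0: ...</span></span>\n<span class=\"line\"><span>gif0: ...</span></span>\n<span 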
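class=\"line\"><span>stf0: ...</span></span>\n<span class=\"line\"><span>anpi0: ...</span></span>\n<span class=\"line\"><span>anpi1: ...</span></span>\n<span class=\"line\"><span>en3: ...</span></span>\n<span class=\"line\"><span>en4: ...</span></span>\n<span class=\"line\"><span>en1: ...</span></span>\n<span class=\"line\"><span>en2: ...</span></span>\n<span class=\"line\"><span>ap1: ...</span></span>\n<span class=\"line\"><span>en0: ...</span></span>\n<span class=\"line\"><span>utun0: ...</span></span>\n<span class=\"line\"><span>awdl0: ...</span></span>\n<span class=\"line\"><span>llw0: ...</span></span>\n<span class=\"line\"><span>utun1:...</span></span>\n<span class=\"line\"><span>utun2: ...</span></span>\n<span class=\"line\"><span>utun3: ...</span></span></code></pre>\n<p>Here <code>en*</code> are logical interfaces mapped to real physical network cards.<br>\n<code>lo0</code> is the local loopback interface, implemented purely in software and pointing at <code>127.0.0.1</code>.<br>\n<code>anpi*</code> are interfaces for Apple Silicon's internal protocols and communication; they support part of the communication between internal system services and expose this bus-level communication to the user as <code>anpi*</code>.</p>\n<h3><span id=\"用户层软件\"> User-Level Software</span></h3>\n<p>User-level software is the software interface through which the user interacts with the system as a whole; it reaches the hardware by calling user-level IO library functions.</p>\n<h2><span id=\"应用程序io接口\"> Application IO Interface</span></h2>\n<p>The application IO interface is the architecture and programming convention that the OS offers on the user side for managing and accessing IO devices.</p>\n<p>The OS abstracts physical devices into <strong>block devices</strong>, <strong>character devices</strong>, and <strong>network devices</strong>. Network devices are comparable to block devices in speed, but network communication works with packets that cannot be accessed randomly rather than with data blocks, so they are not classed as block devices.</p>\n<ul>\n<li>\n<p>Character device interface -- <code>get</code>/<code>put</code> read from and write to the buffer; <code>in-control</code> implements device-specific control; <code>open</code>/<code>close</code> open and close the device and enforce mutually exclusive access</p>\n</li>\n<li>\n<p>Block device interface -- block devices are usually storage devices, so this interface mostly exposes storage operations. <code>read</code>/<code>write</code>/<code>seek</code> provide the basic functions, with <code>seek</code> selecting the next block to transfer.</p>\n</li>\n</ul>\n<h3><span id=\"阻塞io与非阻塞io\"> Blocking and Non-Blocking IO</span></h3>\n<p>Depending on the device, an IO interface runs in one of two modes: blocking or non-blocking.</p>\n<p>With blocking IO, a process that issues an IO request while the resource is not yet available is blocked and moved to the corresponding wait queue.</p>\n<p>With non-blocking IO, the process polls to check whether the resource is ready and can carry on with other work in the meantime, unaffected by the pending IO request.</p>\n<h1><span id=\"设备独立性软件\"> 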
Device-Independent Software</span></h1>\n<p>Like VFS, device-independent software sits above the device driver layer, implements the operations common to different devices, and exposes a uniform, abstract API. Its functions include:</p>\n<ul>\n<li><strong>Uniform driver interface</strong></li>\n<li>Buffer management -- smooths out the speed difference between the CPU and IO devices by providing a uniform software buffer layer over the hardware.</li>\n<li>Error control</li>\n<li>Allocation and reclamation of exclusive devices</li>\n<li>Uniform logical data blocks</li>\n</ul>\n<h2><span id=\"高速缓存与缓冲区\"> Caches and Buffers</span></h2>\n<p>The disk cache is implemented in main memory: blocks read from disk are kept in memory to speed up disk IO. It can be laid out in memory in two ways:</p>\n<ul>\n<li>A dedicated buffer of fixed size set aside in memory</li>\n<li>Free memory used as a dynamic buffer pool, shared by the paging system and the disk IO subsystem.</li>\n</ul>\n<h3><span id=\"缓冲区\"> Buffers</span></h3>\n<p>In most cases an IO device is far slower than the CPU (a printer, for example), though it can also be far faster (a GPU).</p>\n<p>A buffer can be implemented as:</p>\n<ul>\n<li>Hardware buffer -- costly, so used only at critical points</li>\n<li>Memory buffer -- the common implementation</li>\n</ul>\n<p>At any instant a buffer can be either written or read, not both, so its role is to act as a data reservoir while the CPU is busy processing.</p>\n<h4><span id=\"单缓冲\"> Single Buffering</span></h4>\n<p>With a buffer in place, the time for one IO operation has three parts:</p>\n<ul>\n<li>Time for the IO device to write into the buffer, <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>T</mi></mrow><annotation encoding=\"application/x-tex\">T</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">T</span></span></span></span></li>\n<li>Time to transfer the buffer contents to the CPU, <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>M</mi></mrow><annotation encoding=\"application/x-tex\">M</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">M</span></span></span></span></li>\n<li>CPU processing time, <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>C</mi></mrow><annotation encoding=\"application/x-tex\">C</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span 
class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span></span></span></span></li>\n</ul>\n<p>Within one cycle, once <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>M</mi><mi>i</mi></msub></mrow><annotation encoding=\"application/x-tex\">M_i</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">M</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.109em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span> has finished, <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>T</mi><mrow><mi>i</mi><mo>+</mo><mn>1</mn></mrow></msub></mrow><annotation encoding=\"application/x-tex\">T_{i+1}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8917em;vertical-align:-0.2083em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">T</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span 
class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">i</span><span class=\"mbin mtight\">+</span><span class=\"mord mtight\">1</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2083em;\"><span></span></span></span></span></span></span></span></span></span> can start. If <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>C</mi><mi>i</mi></msub><mo>&gt;</mo><msub><mi>T</mi><mrow><mi>i</mi><mo>+</mo><mn>1</mn></mrow></msub></mrow><annotation encoding=\"application/x-tex\">C_{i}&gt; T_{i+1}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0715em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">&gt;</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8917em;vertical-align:-0.2083em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">T</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span 
class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">i</span><span class=\"mbin mtight\">+</span><span class=\"mord mtight\">1</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2083em;\"><span></span></span></span></span></span></span></span></span></span>, that is, CPU processing takes longer than the IO write, the filled buffer waits for the CPU; otherwise the CPU waits for the buffer to be filled. Usually <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>C</mi><mi>i</mi></msub><mo>&lt;</mo><msub><mi>T</mi><mrow><mi>i</mi><mo>+</mo><mn>1</mn></mrow></msub></mrow><annotation encoding=\"application/x-tex\">C_{i}&lt;T_{i+1}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0715em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">&lt;</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" 
style=\"height:0.8917em;vertical-align:-0.2083em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">T</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">i</span><span class=\"mbin mtight\">+</span><span class=\"mord mtight\">1</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2083em;\"><span></span></span></span></span></span></span></span></span></span>, so the time for one round is</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mi mathvariant=\"script\">T</mi><mi>s</mi></msub><mo>=</mo><mi>max</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><mi>C</mi><mo separator=\"true\">,</mo><mi>T</mi><mo stretchy=\"false\">)</mo><mo>+</mo><mi>M</mi></mrow><annotation encoding=\"application/x-tex\">\\mathcal{T}_s = \\max(C,T)+M\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathcal\" style=\"margin-right:0.25417em;\">T</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:-0.2542em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">s</span></span></span></span><span class=\"vlist-s\">​</span></span><span 
class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mop\">max</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">T</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">M</span></span></span></span></span></p>\n<p>For a job of <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>N</mi></mrow><annotation encoding=\"application/x-tex\">N</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">N</span></span></span></span> rounds of IO writes, the total time is</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mi mathvariant=\"script\">T</mi><mrow><mi>t</mi><mi>o</mi><mi>t</mi><mi>a</mi><mi>l</mi></mrow></msub><mo>=</mo><mo stretchy=\"false\">(</mo><mi>N</mi><mo>−</mo><mn>1</mn><mo stretchy=\"false\">)</mo><mo stretchy=\"false\">(</mo><mi>max</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><mi>C</mi><mo 
separator=\"true\">,</mo><mi>T</mi><mo stretchy=\"false\">)</mo><mo>+</mo><mi>M</mi><mo stretchy=\"false\">)</mo><mo>+</mo><mo stretchy=\"false\">(</mo><mi>T</mi><mo>+</mo><mi>C</mi><mo>+</mo><mi>M</mi><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">\\mathcal{T}_{total} = (N-1)(\\max(C,T)+M)+(T+C+M)\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathcal\" style=\"margin-right:0.25417em;\">T</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.2542em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">t</span><span class=\"mord mathnormal mtight\">o</span><span class=\"mord mathnormal mtight\">t</span><span class=\"mord mathnormal mtight\">a</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">N</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" 
style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\">1</span><span class=\"mclose\">)</span><span class=\"mopen\">(</span><span class=\"mop\">max</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">T</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">M</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">T</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">M</span><span class=\"mclose\">)</span></span></span></span></span></p>\n<p>The last term has no overlap between <span class=\"katex\"><span class=\"katex-mathml\"><math 
xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>C</mi></mrow><annotation encoding=\"application/x-tex\">C</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span></span></span></span> and <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>T</mi></mrow><annotation encoding=\"application/x-tex\">T</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">T</span></span></span></span> on the timeline, so both are counted in full.</p>\n<h4><span id=\"双缓冲\"> Double Buffering</span></h4>\n<p>Double buffering uses two buffers that can be switched between: while buffer 1 is being written, buffer 2 can be transferring its data to the CPU.</p>\n<p>If the IO write time <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>T</mi></mrow><annotation encoding=\"application/x-tex\">T</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">T</span></span></span></span> &gt; the total buffer-to-CPU time <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>M</mi><mo>+</mo><mi>C</mi></mrow><annotation encoding=\"application/x-tex\">M+C</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">M</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" 
style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span></span></span></span>, then buffer 2 sits empty waiting for buffer 1 to be written; otherwise buffer 1 sits full waiting for buffer 2 to be processed. The time for one round is therefore</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mi mathvariant=\"script\">T</mi><mi>s</mi></msub><mo>=</mo><mi>max</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><mi>T</mi><mo separator=\"true\">,</mo><mi>M</mi><mo>+</mo><mi>C</mi><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">\\mathcal{T}_s = \\max(T,M+C)\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathcal\" style=\"margin-right:0.25417em;\">T</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:-0.2542em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">s</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mop\">max</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">T</span><span class=\"mpunct\">,</span><span class=\"mspace\" 
style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">M</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span><span class=\"mclose\">)</span></span></span></span></span></p>\n<p>The total time for N rounds is</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mi mathvariant=\"script\">T</mi><mrow><mi>t</mi><mi>o</mi><mi>t</mi><mi>a</mi><mi>l</mi></mrow></msub><mo>=</mo><mo stretchy=\"false\">(</mo><mi>N</mi><mo>−</mo><mn>1</mn><mo stretchy=\"false\">)</mo><mi>max</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><mi>T</mi><mo separator=\"true\">,</mo><mi>M</mi><mo>+</mo><mi>C</mi><mo stretchy=\"false\">)</mo><mo>+</mo><mo stretchy=\"false\">(</mo><mi>T</mi><mo>+</mo><mi>M</mi><mo>+</mo><mi>C</mi><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">\\mathcal{T}_{total} = (N-1)\\max(T,M+C) +(T+M+C)\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathcal\" style=\"margin-right:0.25417em;\">T</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.2542em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">t</span><span class=\"mord mathnormal mtight\">o</span><span class=\"mord 
mathnormal mtight\">t</span><span class=\"mord mathnormal mtight\">a</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">N</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\">1</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop\">max</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">T</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">M</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" 
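style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">T</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">M</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span><span class=\"mclose\">)</span></span></span></span></span></p>\n<p>Double buffering also enables full-duplex communication: each machine sets up a send buffer and a receive buffer, so data can flow in both directions at once.</p>\n<h4><span id=\"循环缓冲\"> Circular Buffering</span></h4>\n<p>A circular buffer works much like a circular linked list: the buffers form a ring, and two pointers, <code>in</code> and <code>out</code>, mark the buffer currently being written and the buffer currently being passed to the CPU.</p>\n<p>If the <code>in</code> pointer catches up with the <code>out</code> pointer, the CPU is slower than the IO writes; if <code>out</code> catches up with <code>in</code>, the IO writes are slower than the CPU.</p>\n<h4><span id=\"缓冲池\"> Buffer Pool</span></h4>\n<p>A buffer pool is a software mechanism, made up of management data structures and operation functions, that manages multiple buffers and can be shared by multiple processes.</p>\n<p>A buffer pool works in four modes:</p>\n<ul>\n<li>Hold input -- store incoming data in an input buffer</li>\n<li>Extract input -- pass the contents of an input buffer to the CPU and empty the buffer</li>\n<li>Hold output -- store output data from the CPU in an output buffer</li>\n<li>Extract output -- send the contents of an output buffer to the IO device and empty the buffer</li>\n</ul>\n<h2><span id=\"设备分配与回收\"> Device Allocation and Reclamation</span></h2>\n<p>Device-independent software maintains a <strong>System Device Table (SDT)</strong> that lists all physical devices with a global index. Each entry corresponds to one device and holds information used to look it up, such as the device type and identifier.</p>\n<p>Each physical device has its own <strong>Device Control Table (DCT)</strong>, which records the device's attributes: device type, device identifier, device status, a pointer to the controller control table, the retry count and time, the head pointer of the device queue, and so on.</p>\n<p>The <strong>Controller Control Table (COCT)</strong> records the state of a controller and holds a pointer to the <strong>Channel Control Table (CHCT)</strong>.</p>\n<p>The <strong>Channel Control Table (CHCT)</strong> records the state of each channel, together with information about all controllers attached to that channel.</p>\n<p>A channel is the communication path between memory and IO devices. It is an IO management unit at the level of a coprocessor, providing more sophisticated, programmable, DMA-like management of IO devices.</p>\n<p>In order of access, from top to bottom, the tables are:</p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" 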
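style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-text\"><span class=\"line\"><span>SDT -> DCT -> COCT -> CHCT</span></span></code></pre>\n<p>Devices are allocated in the following order:</p>\n<ul>\n<li>Device allocation -- look up the SDT, find the DCT of the requested device, and allocate the device</li>\n<li>Controller allocation -- follow the pointer in the DCT to the COCT and allocate the controller for the device</li>\n<li>Channel allocation -- once the controller is allocated, follow the COCT to the corresponding CHCT and allocate the channel between memory and the I/O device</li>\n</ul>\n<p>Device allocation policies differ in how they treat deadlock:</p>\n<ul>\n<li>Safe allocation -- avoids deadlock: a process blocks as soon as it issues an I/O request and stays blocked until that I/O completes, so it never requests another device while holding one</li>\n<li>Unsafe allocation -- a process blocks only when the requested device is held by another process; otherwise it keeps running, so competition for resources can lead to deadlock</li>\n</ul>\n<p>Deadlock is discussed further in <a href=\"/2026/05/29/CS/OS/%E8%BF%9B%E7%A8%8B%E7%9A%84%E5%90%8C%E6%AD%A5%E4%B8%8E%E4%BA%92%E6%96%A5/\" title=\"进程的同步与互斥\">Process Synchronization and Mutual Exclusion</a> and <a href=\"/2026/06/01/CS/OS/deadlock/\" title=\"死锁\">Deadlock</a>.</p>\n<h3><span id=\"逻辑设备与物理设备的映射\"> Mapping Logical Devices to Physical Devices</span></h3>\n<p>Making devices logical requires a mapping between logical devices and physical devices; the table that holds this mapping is the <strong>Logical Unit Table (LUT)</strong>.</p>\n<p>The system can maintain a single LUT for the whole system or a separate LUT for each user; the former suits single-user systems and the latter suits multi-user systems.</p>\n<h2><span id=\"spooling技术\"> SPOOLing</span></h2>\n<p>SPOOLing eases the speed mismatch between a fast CPU and slow I/O devices. Its core idea is to use a fast disk as a buffer for temporary storage and to simulate offline I/O in software.</p>\n<p>Offline I/O essentially hands the I/O work of different jobs to peripheral machines that run in parallel with the host, which both frees the host CPU and keeps the I/O work running continuously.</p>\n<p>SPOOLing uses <strong>multiprogramming</strong> and disk buffer space in place of the peripheral machines, so that one physical exclusive-use device can be presented as several logical devices. A typical case is a print center shared by multiple users.</p>\n<p>SPOOLing consists of the following components:</p>\n<ul>\n<li>Input well and output well -- buffer areas on disk that temporarily hold files for input/output devices (called well files)</li>\n<li>Input/output buffers -- areas in memory that temporarily hold data for input/output devices; they bridge the gap between slow disk writes and fast memory</li>\n<li>Input and output processes</li>\n<li>Well management system -- controls the exchange of information between jobs and the disk wells</li>\n</ul>\n<h1><span id=\"磁盘\"> Disks</span></h1>\n<p><a href=\"/2026/06/11/CS/%E8%AE%A1%E7%BB%84/%E7%A3%81%E7%9B%98/\" title=\"磁盘与SSD\">Disks and SSDs</a></p>\n",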
            "tags": []
        },
        {
            "id": "https://yuuko.site/2026/06/07/CS/LLM/position_encoding/",
            "url": "https://yuuko.site/2026/06/07/CS/LLM/position_encoding/",
            "title": "Position Encoding",
            "date_published": "2026-06-06T16:00:00.000Z",
            "content_html": "<link rel=\"stylesheet\" type=\"text/css\" href=\"https://cdn.jsdelivr.net/hint.css/2.4.1/hint.min.css\"><h1><span id=\"position-encoding\"> Position Encoding</span></h1>\n<p><a href=\"https://arxiv.org/abs/1706.03762\">Attention Is All You Need</a> 中提出了 <strong>Sinusoidal Position Encoding</strong></p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mtable rowspacing=\"0.25em\" columnalign=\"right left\" columnspacing=\"0em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow><mi mathvariant=\"normal\">P</mi><mi mathvariant=\"normal\">E</mi></mrow><mo stretchy=\"false\">(</mo><mi>p</mi><mi>o</mi><mi>s</mi><mo separator=\"true\">,</mo><mn>2</mn><mi>i</mi><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><mi>sin</mi><mo>⁡</mo><mrow><mo fence=\"true\">(</mo><mfrac><mrow><mi>p</mi><mi>o</mi><mi>s</mi></mrow><msup><mi>f</mi><mrow><mn>2</mn><mi>i</mi><mi mathvariant=\"normal\">/</mi><msub><mi>d</mi><mrow><mi>m</mi><mi>o</mi><mi>d</mi></mrow></msub></mrow></msup></mfrac><mo fence=\"true\">)</mo></mrow></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow><mi mathvariant=\"normal\">P</mi><mi mathvariant=\"normal\">E</mi></mrow><mo stretchy=\"false\">(</mo><mi>p</mi><mi>o</mi><mi>s</mi><mo separator=\"true\">,</mo><mn>2</mn><mi>i</mi><mo>+</mo><mn>1</mn><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><mi>cos</mi><mo>⁡</mo><mrow><mo fence=\"true\">(</mo><mfrac><mrow><mi>p</mi><mi>o</mi><mi>s</mi></mrow><msup><mi>f</mi><mrow><mn>2</mn><mi>i</mi><mi mathvariant=\"normal\">/</mi><msub><mi>d</mi><mrow><mi>m</mi><mi>o</mi><mi>d</mi></mrow></msub></mrow></msup></mfrac><mo 
fence=\"true\">)</mo></mrow></mrow></mstyle></mtd></mtr></mtable><annotation encoding=\"application/x-tex\">\\begin{aligned}\n\\mathrm{PE}(pos,2i) &amp;= \\sin \\left(\\frac{pos}{f^{2i/d_{mod}}}\\right)\\\\\n\\mathrm{PE}(pos,2i+1)&amp; = \\cos \\left(\\frac{pos}{f^{2i/d_{mod}}}\\right)\n\\end{aligned}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:5.4001em;vertical-align:-2.45em;\"></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-r\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.95em;\"><span style=\"top:-4.95em;\"><span class=\"pstrut\" style=\"height:3.45em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathrm\">PE</span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">p</span><span class=\"mord mathnormal\">os</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\">2</span><span class=\"mord mathnormal\">i</span><span class=\"mclose\">)</span></span></span><span style=\"top:-2.25em;\"><span class=\"pstrut\" style=\"height:3.45em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathrm\">PE</span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">p</span><span class=\"mord mathnormal\">os</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\">2</span><span class=\"mord mathnormal\">i</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\">1</span><span class=\"mclose\">)</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.45em;\"><span></span></span></span></span></span><span 
class=\"col-align-l\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.95em;\"><span style=\"top:-4.95em;\"><span class=\"pstrut\" style=\"height:3.45em;\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mop\">sin</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size3\">(</span></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.1076em;\"><span style=\"top:-2.296em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.10764em;\">f</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.814em;\"><span style=\"top:-2.989em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\">i</span><span class=\"mord mtight\">/</span><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">d</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3448em;\"><span style=\"top:-2.3488em;margin-left:0em;margin-right:0.0714em;\"><span class=\"pstrut\" style=\"height:2.5em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">m</span><span class=\"mord mathnormal mtight\">o</span><span class=\"mord mathnormal 
mtight\">d</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1512em;\"><span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">p</span><span class=\"mord mathnormal\">os</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8984em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mclose delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size3\">)</span></span></span></span></span><span style=\"top:-2.25em;\"><span class=\"pstrut\" style=\"height:3.45em;\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mop\">cos</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size3\">(</span></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.1076em;\"><span style=\"top:-2.296em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.10764em;\">f</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.814em;\"><span 
style=\"top:-2.989em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\">i</span><span class=\"mord mtight\">/</span><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">d</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3448em;\"><span style=\"top:-2.3488em;margin-left:0em;margin-right:0.0714em;\"><span class=\"pstrut\" style=\"height:2.5em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">m</span><span class=\"mord mathnormal mtight\">o</span><span class=\"mord mathnormal mtight\">d</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1512em;\"><span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">p</span><span class=\"mord mathnormal\">os</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8984em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mclose delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size3\">)</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:2.45em;\"><span></span></span></span></span></span></span></span></span></span></span></span></p>\n<p>For a token vector</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mo>=</mo></mrow><annotation encoding=\"application/x-tex\">pos = \n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.625em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\">p</span><span class=\"mord mathnormal\">os</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span></span></span></span></span></p>\n",
            "tags": [
                "LLM",
                "encoding"
            ]
        },
        {
            "id": "https://yuuko.site/2026/06/07/CS/LLM/CS336/assignment1/",
            "url": "https://yuuko.site/2026/06/07/CS/LLM/CS336/assignment1/",
            "title": "Assignment1 -- Building a Transformer LM",
            "date_published": "2026-06-06T16:00:00.000Z",
            "content_html": "<link rel=\"stylesheet\" type=\"text/css\" href=\"https://cdn.jsdelivr.net/hint.css/2.4.1/hint.min.css\"><h1><span id=\"assignment1-building-a-transformer-lm\"> Assignment1 -- Building a Transformer LM</span></h1>\n<p>本作业从最基本的全连接层出发，进行一个简单的Transformer LM 的搭建</p>\n<h2><span id=\"torch的一些api的用法\"> <code>torch</code>的一些API的用法</span></h2>\n<h3><span id=\"矩阵转置与对换\"> 矩阵转置与对换</span></h3>\n<p>使用<code>&lt;Tensor&gt;.T</code>可以实现矩阵的转置，同样可以使用更加泛用的<code>&lt;Tensor&gt;.transpose(-2,-1)</code>进行维度对换</p>\n<p><code>&lt;Tensor&gt;.transpose(i,j)</code>相当于第 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>i</mi></mrow><annotation encoding=\"application/x-tex\">i</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6595em;\"></span><span class=\"mord mathnormal\">i</span></span></span></span> 维与第 $j $维进行对换</p>\n<h2><span id=\"核心函数的实现\"> 核心函数的实现</span></h2>\n<h3><span id=\"linear\"> Linear</span></h3>\n<p>简单的线性层的实现，相当于实现了一个张量乘法。</p>\n<p>在<code>torch</code>中实现乘法的方式:<code>torch.matmul</code>或者直接<code>@</code>。 在考虑张量的乘法的时候需要考虑乘法的维度对应问题</p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-python\"><span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">def</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> run_linear</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    d_in</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#998418;--shiki-dark:#B8A965\"> int</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span></span>\n<span class=\"line\"><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    d_out</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#998418;--shiki-dark:#B8A965\"> int</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    weights</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> Float</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Tensor</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\"> d_out d_in</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    in_features</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> Float</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Tensor</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\"> ... 
d_in</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span></span>\n<span class=\"line\"><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\"> -</span><span style=\"color:#999999;--shiki-dark:#666666\">></span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> Float</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Tensor</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\"> ... d_out</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    return</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> in_features </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">@</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> weights</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">T</span></span></code></pre>\n<h3><span id=\"embedding\"> Embedding</span></h3>\n<p>嵌入映射的函数的实现。输入的Token经过Tokenizer转换为token_ids后，通过embedding形成token的特征向量。具体的实现为查表</p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-python\"><span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">def</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> run_embedding</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span></span>\n<span class=\"line\"><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    vocab_size</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#998418;--shiki-dark:#B8A965\"> int</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    d_model</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#998418;--shiki-dark:#B8A965\"> int</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    weights</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> Float</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Tensor</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\"> vocab_size d_model</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    token_ids</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> Int</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Tensor</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\"> ...</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span></span>\n<span 
class=\"line\"><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\"> -</span><span style=\"color:#999999;--shiki-dark:#666666\">></span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> Float</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Tensor</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\"> ... d_model</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    return</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> weights</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">token_ids</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span></code></pre>\n<p><code>torch.Tensor</code> supports Python-style indexing, so indexing <code>weights</code> with <code>token_ids</code> directly returns the embedding tensor.</p>\n<h3><span id=\"swiglu\"> SwiGLU</span></h3>\n<p>This implements the SwiGLU activation. As before, watch the dimensions in the tensor multiplications. SwiGLU is defined as</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mrow><mi mathvariant=\"normal\">S</mi><mi mathvariant=\"normal\">w</mi><mi mathvariant=\"normal\">i</mi><mi mathvariant=\"normal\">G</mi><mi mathvariant=\"normal\">L</mi><mi mathvariant=\"normal\">U</mi></mrow><mo stretchy=\"false\">(</mo><mi>x</mi><mo separator=\"true\">,</mo><msub><mi>W</mi><mn>1</mn></msub><mo separator=\"true\">,</mo><msub><mi>W</mi><mn>2</mn></msub><mo separator=\"true\">,</mo><msub><mi>W</mi><mn>3</mn></msub><mo stretchy=\"false\">)</mo><mo>=</mo><msub><mi>W</mi><mn>2</mn></msub><mo 
stretchy=\"false\">(</mo><mrow><mi mathvariant=\"normal\">S</mi><mi mathvariant=\"normal\">i</mi><mi mathvariant=\"normal\">L</mi><mi mathvariant=\"normal\">U</mi></mrow><mo stretchy=\"false\">(</mo><msub><mi>W</mi><mn>1</mn></msub><mi>x</mi><mo stretchy=\"false\">)</mo><mo>⊙</mo><msub><mi>W</mi><mn>3</mn></msub><mi>x</mi><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">\\mathrm{SwiGLU}(x,W_1,W_2,W_3) = W_2(\\mathrm{SiLU}(W_1x)\\odot W_3x)\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">SwiGLU</span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 
mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">3</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathrm\">SiLU</span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord 
mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⊙</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">3</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">)</span></span></span></span></span></p>\n<p>where <code>SiLU</code>, defined in terms of the <code>Sigmoid</code> function, is</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mrow><mi mathvariant=\"normal\">S</mi><mi mathvariant=\"normal\">i</mi><mi mathvariant=\"normal\">L</mi><mi 
mathvariant=\"normal\">U</mi></mrow><mo>=</mo><mi>x</mi><mi>σ</mi><mo stretchy=\"false\">(</mo><mi>x</mi><mo stretchy=\"false\">)</mo><mo>=</mo><mfrac><mi>x</mi><mrow><mn>1</mn><mo>+</mo><msup><mi>e</mi><mrow><mo>−</mo><mi>x</mi></mrow></msup></mrow></mfrac></mrow><annotation encoding=\"application/x-tex\">\\mathrm{SiLU} = x\\sigma(x) = \\frac{x}{1+e^{-x}}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">SiLU</span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\">x</span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">σ</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.8769em;vertical-align:-0.7693em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.1076em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">1</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">e</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6973em;\"><span 
style=\"top:-2.989em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">−</span><span class=\"mord mathnormal mtight\">x</span></span></span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7693em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-python\"><span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">def</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> run_swiglu</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    d_model</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#998418;--shiki-dark:#B8A965\"> int</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    d_ff</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#998418;--shiki-dark:#B8A965\"> int</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    w1_weight</span><span 
style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> Float</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Tensor</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\"> d_ff d_model</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    w2_weight</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> Float</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Tensor</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\"> d_model d_ff</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    w3_weight</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> Float</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Tensor</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\"> d_ff d_model</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span 
style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    in_features</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> Float</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Tensor</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\"> ... d_model</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span></span>\n<span class=\"line\"><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\"> -</span><span style=\"color:#999999;--shiki-dark:#666666\">></span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> Float</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Tensor</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\"> ... 
d_model</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">    def</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> run_silu</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">in_features</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">        return</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> in_features </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">/</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">+</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> torch</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">exp</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> -</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> in_features</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    tensor_1 </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> in_features </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">@</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> w1_weight</span><span 
style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">T   </span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    tensor_2 </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> in_features </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">@</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> w3_weight</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">T   </span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    return</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">run_silu</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">tensor_1</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> tensor_2</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> @</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> w2_weight</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">T</span></span></code></pre>\n<p>In the computation, the element-wise product <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mo>⊙</mo></mrow><annotation encoding=\"application/x-tex\">\\odot</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6667em;vertical-align:-0.0833em;\"></span><span class=\"mord\">⊙</span></span></span></span> 
(the <strong>Hadamard product</strong>) is written directly as <code>A*B</code>.</p>\n<h3><span id=\"softmax\"> Softmax</span></h3>\n<p><code>Softmax</code> slices the Tensor along <code>dim</code>, processes each slice as a vector, and then merges the results.</p>\n<p>To keep the exponentials from overflowing when the largest entry is big, the maximum is usually subtracted before exponentiating.</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mtable rowspacing=\"0.25em\" columnalign=\"right left\" columnspacing=\"0em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow><mi mathvariant=\"normal\">S</mi><mi mathvariant=\"normal\">o</mi><mi mathvariant=\"normal\">f</mi><mi mathvariant=\"normal\">t</mi><mi mathvariant=\"normal\">m</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">x</mi></mrow><mo stretchy=\"false\">(</mo><mi>x</mi><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><msub><mrow><mo fence=\"true\">(</mo><mfrac><msup><mi>e</mi><msub><mi>x</mi><mi>i</mi></msub></msup><mrow><mo>∑</mo><msup><mi>e</mi><msub><mi>x</mi><mi>j</mi></msub></msup></mrow></mfrac><mo fence=\"true\">)</mo></mrow><mi>i</mi></msub></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><msub><mrow><mo fence=\"true\">(</mo><mfrac><msup><mi>e</mi><mrow><msub><mi>x</mi><mi>i</mi></msub><mo>−</mo><msub><mi>x</mi><mrow><mi>m</mi><mi>a</mi><mi>x</mi></mrow></msub></mrow></msup><mrow><mo>∑</mo><msup><mi>e</mi><mrow><msub><mi>x</mi><mi>j</mi></msub><mo>−</mo><msub><mi>x</mi><mrow><mi>m</mi><mi>a</mi><mi>x</mi></mrow></msub></mrow></msup></mrow></mfrac><mo fence=\"true\">)</mo></mrow><mi>i</mi></msub></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><mrow><mi 
mathvariant=\"normal\">S</mi><mi mathvariant=\"normal\">o</mi><mi mathvariant=\"normal\">f</mi><mi mathvariant=\"normal\">t</mi><mi mathvariant=\"normal\">m</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">x</mi></mrow><mo stretchy=\"false\">(</mo><mi>x</mi><mo>−</mo><msub><mi>x</mi><mrow><mi>m</mi><mi>a</mi><mi>x</mi></mrow></msub><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd></mtr></mtable><annotation encoding=\"application/x-tex\">\\begin{aligned}\n\\mathrm{Softmax}(x)&amp; = \\left(\\frac{e^{x_i}}{\\sum e^{x_j}}\\right)_i\\\\\n&amp;=\\left(\\frac{e^{x_i-x_{max}}}{\\sum e^{x_j - x_{max}}}\\right)_i\\\\\n&amp;=\\mathrm{Softmax}(x-x_{max})\n\\end{aligned}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:6.9995em;vertical-align:-3.2497em;\"></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-r\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:3.7497em;\"><span style=\"top:-5.7497em;\"><span class=\"pstrut\" style=\"height:3.45em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathrm\">Softmax</span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">)</span></span></span><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3.45em;\"></span><span class=\"mord\"></span></span><span style=\"top:-0.8603em;\"><span class=\"pstrut\" style=\"height:3.45em;\"></span><span class=\"mord\"></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:3.2497em;\"><span></span></span></span></span></span><span class=\"col-align-l\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:3.7497em;\"><span style=\"top:-5.7497em;\"><span class=\"pstrut\" style=\"height:3.45em;\"></span><span class=\"mord\"><span 
class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"minner\"><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size3\">(</span></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3414em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mop op-symbol small-op\" style=\"position:relative;top:0em;\">∑</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">e</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6065em;\"><span style=\"top:-3.0051em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3281em;\"><span style=\"top:-2.357em;margin-left:0em;margin-right:0.0714em;\"><span class=\"pstrut\" style=\"height:2.5em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.05724em;\">j</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2819em;\"><span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" 
style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\">e</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6644em;\"><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3281em;\"><span style=\"top:-2.357em;margin-left:0em;margin-right:0.0714em;\"><span class=\"pstrut\" style=\"height:2.5em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.143em;\"><span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.936em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mclose delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size3\">)</span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:-0.5381em;\"><span style=\"top:-1.7003em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.9997em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3.45em;\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"minner\"><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size3\">(</span></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.4483em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mop op-symbol small-op\" style=\"position:relative;top:0em;\">∑</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">e</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7134em;\"><span style=\"top:-3.0051em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3281em;\"><span style=\"top:-2.357em;margin-left:0em;margin-right:0.0714em;\"><span class=\"pstrut\" style=\"height:2.5em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.05724em;\">j</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2819em;\"><span></span></span></span></span></span></span><span class=\"mbin 
mtight\">−</span><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1645em;\"><span style=\"top:-2.357em;margin-left:0em;margin-right:0.0714em;\"><span class=\"pstrut\" style=\"height:2.5em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">ma</span><span class=\"mord mathnormal mtight\">x</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.143em;\"><span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\">e</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7713em;\"><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3281em;\"><span style=\"top:-2.357em;margin-left:0em;margin-right:0.0714em;\"><span class=\"pstrut\" style=\"height:2.5em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.143em;\"><span></span></span></span></span></span></span><span class=\"mbin mtight\">−</span><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1645em;\"><span style=\"top:-2.357em;margin-left:0em;margin-right:0.0714em;\"><span class=\"pstrut\" style=\"height:2.5em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">ma</span><span class=\"mord mathnormal mtight\">x</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.143em;\"><span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.936em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mclose delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size3\">)</span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:-0.5381em;\"><span style=\"top:-1.7003em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9997em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-0.8603em;\"><span class=\"pstrut\" style=\"height:3.45em;\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" 
style=\"margin-right:0.2778em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">Softmax</span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">ma</span><span class=\"mord mathnormal mtight\">x</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:3.2497em;\"><span></span></span></span></span></span></span></span></span></span></span></span></p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-python\"><span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">def</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> run_softmax</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">in_features</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> Float</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Tensor</span><span 
style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\"> ...</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> dim</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#998418;--shiki-dark:#B8A965\"> int</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\"> -</span><span style=\"color:#999999;--shiki-dark:#666666\">></span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> Float</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Tensor</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\"> ...</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    max_val </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> torch</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">max</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">in_features</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">dim</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span 
style=\"color:#B07D48;--shiki-dark:#BD976A\">keepdim</span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">True</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">values</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    sum_val </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> torch</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">sum</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">torch</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">exp</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">in_features </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">-</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> max_val</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">dim</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\">keepdim</span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">True</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    return</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> torch</span><span 
style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">exp</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">in_features </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">-</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> max_val</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> /</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> sum_val</span></span></code></pre>\n<p><code>keepdim</code> ensures the reduced axis is kept as a dimension of size 1 rather than squeezed away, so <code>max_val</code> and <code>sum_val</code> broadcast against the input.<br>\n<code>torch.exp</code> computes the element-wise exponential of a Tensor.</p>\n<h3><span id=\"dot-self-attention-with-scaling\"> Dot Self-Attention with Scaling</span></h3>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mrow><mi mathvariant=\"normal\">A</mi><mi mathvariant=\"normal\">t</mi><mi mathvariant=\"normal\">t</mi><mi mathvariant=\"normal\">e</mi><mi mathvariant=\"normal\">n</mi><mi mathvariant=\"normal\">t</mi><mi mathvariant=\"normal\">i</mi><mi mathvariant=\"normal\">o</mi><mi mathvariant=\"normal\">n</mi></mrow><mo stretchy=\"false\">(</mo><mi>Q</mi><mo separator=\"true\">,</mo><mi>K</mi><mo separator=\"true\">,</mo><mi>V</mi><mo stretchy=\"false\">)</mo><mo>=</mo><mrow><mi mathvariant=\"normal\">S</mi><mi mathvariant=\"normal\">o</mi><mi mathvariant=\"normal\">f</mi><mi mathvariant=\"normal\">t</mi><mi mathvariant=\"normal\">m</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">x</mi></mrow><mrow><mo fence=\"true\">(</mo><mfrac><mrow><mi>Q</mi><msup><mi>K</mi><mi>T</mi></msup></mrow><msqrt><msub><mi>d</mi><mi>k</mi></msub></msqrt></mfrac><mo fence=\"true\">)</mo></mrow><mo>⋅</mo><mi>V</mi></mrow><annotation encoding=\"application/x-tex\">\\mathrm{Attention}(Q,K,V) = 
\\mathrm{Softmax}\\left(\\frac{QK^T}{\\sqrt{d_k}}\\right)\\cdot V\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">Attention</span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">Q</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">K</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.22222em;\">V</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.4684em;vertical-align:-0.95em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">Softmax</span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size3\">(</span></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.5183em;\"><span style=\"top:-2.2528em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord sqrt\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8572em;\"><span class=\"svg-align\" style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\" style=\"padding-left:0.833em;\"><span class=\"mord\"><span class=\"mord mathnormal\">d</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span 
class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03148em;\">k</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-2.8172em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"hide-tail\" style=\"min-width:0.853em;height:1.08em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"1.08em\" viewbox=\"0 0 400000 1080\" preserveaspectratio=\"xMinYMin slice\"><path d=\"M95,702\nc-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14\nc0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54\nc44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10\ns173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429\nc69,-144,104.5,-217.7,106.5,-221\nl0 -0\nc5.3,-9.3,12,-14,20,-14\nH400000v40H845.2724\ns-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7\nc-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z\nM834 80h400000v40h-400000z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1828em;\"><span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">Q</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">K</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.8413em;\"><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.13889em;\">T</span></span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.93em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mclose delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size3\">)</span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.22222em;\">V</span></span></span></span></span></p>\n<p>On top of plain attention, masking of the query-key scores also has to be handled</p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-python\"><span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">def</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> run_scaled_dot_product_attention</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    Q</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> Float</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Tensor</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\"> ... 
queries d_k</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    K</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> Float</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Tensor</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\"> ... keys d_k</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    V</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> Float</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Tensor</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\"> ... 
keys d_v</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    mask</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> Bool</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Tensor</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\"> ... queries keys</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> |</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\"> None</span><span style=\"color:#999999;--shiki-dark:#666666\"> =</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\"> None</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span></span>\n<span class=\"line\"><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\"> -</span><span style=\"color:#999999;--shiki-dark:#666666\">></span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> Float</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Tensor</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\"> ... 
queries d_v</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    mul </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> Q </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">@</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> K</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">transpose</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">-</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">2</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">-</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> /</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> torch</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">sqrt</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">torch</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">tensor</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">K</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">shape</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">[</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">-</span><span 
style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">]</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> device</span><span style=\"color:#999999;--shiki-dark:#666666\"> =</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> K</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">device</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> dtype</span><span style=\"color:#999999;--shiki-dark:#666666\"> =</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> K</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">dtype</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    if</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">mask </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">is</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\"> None</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        mask_mul </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> mul</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    else</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        mask_mul </span><span 
style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> mul</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">masked_fill</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">~</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">mask</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> -</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">torch</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">inf</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    return</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> torch</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">softmax</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">mask_mul</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> dim</span><span style=\"color:#999999;--shiki-dark:#666666\"> =</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> -</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> @</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> V</span></span></code></pre>\n<p>Note that <code>torch.sqrt</code> only operates on Tensors, so the Python int <code>d_k</code> has to be converted to a Tensor before taking the square root</p>\n<p><code>mask</code> is a Bool Tensor that controls which positions attention is allowed to attend to. The code assumes <code>True</code> marks a visible position, so the scores where <code>mask</code> is <code>False</code> are the ones filled with −∞</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" 
display=\"block\"><semantics><mrow><mtext>mask</mtext><mo stretchy=\"false\">(</mo><mi>i</mi><mo separator=\"true\">,</mo><mi>j</mi><mo stretchy=\"false\">)</mo><mo>=</mo><mrow><mo fence=\"true\">{</mo><mtable rowspacing=\"0.36em\" columnalign=\"left left\" columnspacing=\"1em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mo>−</mo><mi mathvariant=\"normal\">∞</mi></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><msub><mi>a</mi><mrow><mi>i</mi><mo separator=\"true\">,</mo><mi>j</mi></mrow></msub><mtext> </mtext><mtext>is masked</mtext></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mn>0</mn></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><msub><mi>a</mi><mrow><mi>i</mi><mo separator=\"true\">,</mo><mi>j</mi></mrow></msub><mtext> </mtext><mtext>isn’t masked</mtext></mrow></mstyle></mtd></mtr></mtable></mrow></mrow><annotation encoding=\"application/x-tex\">\\text{mask}(i,j) = \\begin{dcases}\n-\\infty &amp;a_{i,j}\\,\\text{is masked}\\\\\n0 &amp;a_{i,j}\\,\\text{isn&#x27;t masked}\n\\end{dcases}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord text\"><span class=\"mord\">mask</span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">i</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05724em;\">j</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:3em;vertical-align:-1.25em;\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em;\"><span 
class=\"delimsizing size4\">{</span></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-l\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.69em;\"><span style=\"top:-3.69em;\"><span class=\"pstrut\" style=\"height:3.008em;\"></span><span class=\"mord\"><span class=\"mord\">−</span><span class=\"mord\">∞</span></span></span><span style=\"top:-2.25em;\"><span class=\"pstrut\" style=\"height:3.008em;\"></span><span class=\"mord\"><span class=\"mord\">0</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.19em;\"><span></span></span></span></span></span><span class=\"arraycolsep\" style=\"width:1em;\"></span><span class=\"col-align-l\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.69em;\"><span style=\"top:-3.69em;\"><span class=\"pstrut\" style=\"height:3.008em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\">a</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">i</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.05724em;\">j</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord text\"><span class=\"mord\">is masked</span></span></span></span><span style=\"top:-2.25em;\"><span class=\"pstrut\" style=\"height:3.008em;\"></span><span class=\"mord\"><span 
class=\"mord\"><span class=\"mord mathnormal\">a</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">i</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.05724em;\">j</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord text\"><span class=\"mord\">isn’t masked</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.19em;\"><span></span></span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<p>The mask matrix is added to the matrix produced by the scaled QK product</p>\n<p>After softmax, a masked entry becomes <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msup><mi>e</mi><mrow><mo>−</mo><mi mathvariant=\"normal\">∞</mi></mrow></msup><mo>=</mo><mn>0</mn></mrow><annotation encoding=\"application/x-tex\">e^{-\\infty} = 0</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7713em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">e</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7713em;\"><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span 
class=\"mord mtight\"><span class=\"mord mtight\">−</span><span class=\"mord mtight\">∞</span></span></span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">0</span></span></span></span>, while unmasked entries keep their original values</p>\n<h3><span id=\"multihead-attention\"> MultiHead-Attention</span></h3>\n<p>Multi-head attention splits the hidden dimension of the Q, K and V matrices evenly into <code>num_heads</code> parts, so each head gets <code>d_k = d_model / num_heads</code> dimensions</p>\n<p>The number of heads is always chosen so that it divides the hidden dimension evenly, so the per-head dimension is always an integer</p>\n<p>The attention outputs of the individual heads are concatenated and finally passed through the output projection</p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-python\"><span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">def</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> run_multihead_self_attention</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    d_model</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#998418;--shiki-dark:#B8A965\"> int</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    num_heads</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#998418;--shiki-dark:#B8A965\"> int</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    q_proj_weight</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> Float</span><span 
style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Tensor</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\"> d_model d_model</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    k_proj_weight</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> Float</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Tensor</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\"> d_model d_model</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    v_proj_weight</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> Float</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Tensor</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\"> d_model d_model</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span></span>\n<span 
class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    o_proj_weight</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> Float</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Tensor</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\"> d_model d_model</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    in_features</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> Float</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Tensor</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\"> ... 
sequence_length d_model</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span></span>\n<span class=\"line\"><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\"> -</span><span style=\"color:#999999;--shiki-dark:#666666\">></span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> Float</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Tensor</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\"> ... sequence_length d_model</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    d_k </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> d_model </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">//</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> num_heads</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    Q </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">in_features </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">@</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> q_proj_weight</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">reshape</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">in_features</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">shape</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">-</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">num_heads</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">d_k</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">transpose</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">-</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">2</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> -</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">3</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> </span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    K </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">in_features </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">@</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> k_proj_weight</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">reshape</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">in_features</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">shape</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">-</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">num_heads</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">d_k</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">transpose</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">-</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">2</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> -</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">3</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    V </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">in_features </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">@</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> v_proj_weight</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">reshape</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">in_features</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">shape</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">-</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">num_heads</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">d_k</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">transpose</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">-</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">2</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> -</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">3</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    attn_output </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> run_scaled_dot_product_attention</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Q</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> K</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> V</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    attn_output </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> attn_output</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">transpose</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">-</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">2</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> -</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">3</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span 
style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">reshape</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">in_features</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">shape</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">-</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> d_model</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    return</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> attn_output </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">@</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> o_proj_weight</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">T</span></span></code></pre>\n<h3><span id=\"cross-entropy\"> Cross-Entropy</span></h3>\n<p>Given logits as input and a target Tensor of integer class indices (one index per example rather than a one-hot matrix), Cross-Entropy can be implemented as</p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-python\"><span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">def</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> run_cross_entropy</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    
inputs</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> Float</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Tensor</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\"> batch_size vocab_size</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> targets</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> Int</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Tensor</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\"> batch_size</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span></span>\n<span class=\"line\"><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\"> -</span><span style=\"color:#999999;--shiki-dark:#666666\">></span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> Float</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Tensor</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"\"</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    inputs_ </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> run_softmax</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">inputs</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\">dim</span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">-</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    target_tensor </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> inputs_</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">torch</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">arange</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">inputs</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">shape</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">[</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">0</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">]</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">targets</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    return</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> -</span><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">torch</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">mean</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">torch</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">log</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">target_tensor</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span></code></pre>\n<p>That is, the loss is the batch mean of the negative log-probability that Softmax assigns to each example's target class:</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mrow><mi mathvariant=\"normal\">C</mi><mi mathvariant=\"normal\">E</mi></mrow><mo stretchy=\"false\">(</mo><mi>x</mi><mo stretchy=\"false\">)</mo><mo>=</mo><mo>−</mo><mfrac><mn>1</mn><mi mathvariant=\"script\">B</mi></mfrac><mo>∑</mo><mi>log</mi><mo>⁡</mo><mrow><mi mathvariant=\"normal\">S</mi><mi mathvariant=\"normal\">o</mi><mi mathvariant=\"normal\">f</mi><mi mathvariant=\"normal\">t</mi><mi mathvariant=\"normal\">m</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">x</mi></mrow><mo stretchy=\"false\">(</mo><msub><mi>x</mi><mi>i</mi></msub><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">\\mathrm{CE}(x) = -\\frac{1}{\\mathcal{B}}\\sum \\log \\mathrm{Softmax}(x_i)\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">CE</span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">)</span><span class=\"mspace\" 
style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.0074em;vertical-align:-0.686em;\"></span><span class=\"mord\">−</span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3214em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathcal\" style=\"margin-right:0.03041em;\">B</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop op-symbol large-op\" style=\"position:relative;top:0em;\">∑</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop\">lo<span style=\"margin-right:0.01389em;\">g</span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">Softmax</span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord 
mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span></span></span></span></span></p>\n<h3><span id=\"rope\"> RoPE</span></h3>\n<p>RoPE rotary position encoding; see <a href=\"/2026/06/07/CS/LLM/position_encoding/\" title=\"Position Encoding\">Position Encoding</a>.</p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-python\"><span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">def</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> run_rope</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    d_k</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#998418;--shiki-dark:#B8A965\"> int</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    theta</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#998418;--shiki-dark:#B8A965\"> float</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    max_seq_len</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#998418;--shiki-dark:#B8A965\"> int</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    in_query_or_key</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> Float</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Tensor</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\"> ... sequence_length d_k</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    token_positions</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> Int</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Tensor</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\"> ... sequence_length</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span></span>\n<span class=\"line\"><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\"> -</span><span style=\"color:#999999;--shiki-dark:#666666\">></span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> Float</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Tensor</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\"> ... 
sequence_length d_k</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    x </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> in_query_or_key</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    dim_idx </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> torch</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">arange</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">0</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> d_k</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 2</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> device</span><span style=\"color:#999999;--shiki-dark:#666666\"> =</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> x</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">device</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> dtype</span><span style=\"color:#999999;--shiki-dark:#666666\"> =</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> x</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">dtype</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    feq </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 1.0</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> /</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">theta </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">**</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">dim_idx </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">/</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> d_k</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    angle </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> token_positions</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">to</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">x</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">dtype</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">unsqueeze</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">-</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> 
*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> feq</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    cos_ </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> torch</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">cos</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">angle</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    sin_ </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> torch</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">sin</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">angle</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    x_even </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> x</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#A65E2B;--shiki-dark:#C99076\">...</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">0</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">2</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    x_odd </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> x</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#A65E2B;--shiki-dark:#C99076\">...</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">2</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    out_even </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> x_even </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> cos_ </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">-</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> x_odd </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> sin_</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    out_odd </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> x_even </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> sin_ </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">+</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> x_odd </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> cos_</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    out </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> torch</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">empty_like</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">x</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    out</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#A65E2B;--shiki-dark:#C99076\">...</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">0</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">2</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span><span style=\"color:#999999;--shiki-dark:#666666\"> =</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> out_even</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    out</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#A65E2B;--shiki-dark:#C99076\">...</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">2</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span><span style=\"color:#999999;--shiki-dark:#666666\"> =</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> out_odd</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    return</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> out</span></span></code></pre>\n<h3><span id=\"batch\"> Batch</span></h3>\n<p>Training data is drawn from the dataset in batches: each call samples batch_size windows of context_length tokens and returns two <code>torch.Tensor</code>s, x and y.</p>\n<p><strong>Q</strong>: Why return both x and y, with y being x shifted by one position?</p>\n<p><strong>A</strong>: Because an autoregressive language model is trained on Next Token Prediction: it predicts the next token from the tokens up to the current position, so each element of y is the label for the prefix of x that ends at the same index.</p>\n<pre 
class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-python\"><span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">def</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> run_get_batch</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    dataset</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> npt</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">NDArray</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> batch_size</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#998418;--shiki-dark:#B8A965\"> int</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> context_length</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#998418;--shiki-dark:#B8A965\"> int</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> device</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#998418;--shiki-dark:#B8A965\"> str</span></span>\n<span class=\"line\"><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\"> -</span><span style=\"color:#999999;--shiki-dark:#666666\">></span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> tuple</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">torch</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Tensor</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> torch</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Tensor</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#998418;--shiki-dark:#B8A965\">    max</span><span style=\"color:#999999;--shiki-dark:#666666\"> =</span><span style=\"color:#998418;--shiki-dark:#B8A965\"> len</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">dataset</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> -</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> context_length</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    sample </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> torch</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">randint</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">0</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#998418;--shiki-dark:#B8A965\">max</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">batch_size</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    x </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> torch</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">stack</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">torch</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">tensor</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">dataset</span><span style=\"color:#a13865;--shiki-dark:#d9739f\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">s </span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> s </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">+</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> context_length</span><span style=\"color:#a13865;--shiki-dark:#d9739f\">]</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\"> for</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> s </span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">in</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> sample</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    y </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> torch</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">stack</span><span 
style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">torch</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">tensor</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">dataset</span><span style=\"color:#a13865;--shiki-dark:#d9739f\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">s </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">+</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 1</span><span style=\"color:#999999;--shiki-dark:#666666\"> :</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> s </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">+</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 1</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> +</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> context_length</span><span style=\"color:#a13865;--shiki-dark:#d9739f\">]</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\"> for</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> s </span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">in</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> sample</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    return</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> x</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">to</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">device</span><span 
style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> y</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">to</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">device</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span></code></pre>\n<p><code>torch.randint</code> directly produces a tensor of random integers, so there is no need to generate random integers one at a time and then load them into a tensor.</p>\n<h3><span id=\"gradient-clippiing\"> Gradient Clipping</span></h3>\n<p>Given a threshold M on the gradient norm ‖g‖, the gradient is multiplied by the following scaling factor (the conditions compare ‖g‖ with M):</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mrow><mi mathvariant=\"normal\">S</mi><mi mathvariant=\"normal\">c</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">l</mi><mi mathvariant=\"normal\">_</mi><mi mathvariant=\"normal\">f</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">c</mi><mi mathvariant=\"normal\">t</mi><mi mathvariant=\"normal\">o</mi><mi mathvariant=\"normal\">r</mi></mrow><mo>=</mo><mrow><mo fence=\"true\">{</mo><mtable rowspacing=\"0.36em\" columnalign=\"left left\" columnspacing=\"1em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mn>1</mn></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mi>g</mi><mo>&lt;</mo><mi>M</mi></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mfrac><mi>M</mi><mrow><mi mathvariant=\"normal\">∥</mi><mi>g</mi><mi mathvariant=\"normal\">∥</mi><mo>+</mo><mi>ε</mi></mrow></mfrac></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mi>g</mi><mo>≥</mo><mi>M</mi></mrow></mstyle></mtd></mtr></mtable></mrow></mrow><annotation encoding=\"application/x-tex\">\\mathrm{Scal\\_ factor} = \\begin{dcases}\n1 &amp; 
g&lt;M\\\\\n\\frac{M}{\\|g\\|+\\varepsilon} &amp; g\\geq M\n\\end{dcases}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.0044em;vertical-align:-0.31em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">Scal_factor</span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:3.7363em;vertical-align:-1.6182em;\"></span><span class=\"minner\"><span class=\"mopen\"><span class=\"delimsizing mult\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.05em;\"><span style=\"top:-2.5em;\"><span class=\"pstrut\" style=\"height:3.15em;\"></span><span class=\"delimsizinginner delim-size4\"><span>⎩</span></span></span><span style=\"top:-2.492em;\"><span class=\"pstrut\" style=\"height:3.15em;\"></span><span style=\"height:0.016em;width:0.8889em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.8889em\" height=\"0.016em\" style=\"width:0.8889em\" viewbox=\"0 0 888.89 16\" preserveaspectratio=\"xMinYMin\"><path d=\"M384 0 H504 V16 H384z M384 0 H504 V16 H384z\"/></svg></span></span><span style=\"top:-3.15em;\"><span class=\"pstrut\" style=\"height:3.15em;\"></span><span class=\"delimsizinginner delim-size4\"><span>⎨</span></span></span><span style=\"top:-4.292em;\"><span class=\"pstrut\" style=\"height:3.15em;\"></span><span style=\"height:0.016em;width:0.8889em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.8889em\" height=\"0.016em\" style=\"width:0.8889em\" viewbox=\"0 0 888.89 16\" preserveaspectratio=\"xMinYMin\"><path d=\"M384 0 H504 V16 H384z M384 0 H504 V16 H384z\"/></svg></span></span><span style=\"top:-4.3em;\"><span class=\"pstrut\" style=\"height:3.15em;\"></span><span class=\"delimsizinginner 
delim-size4\"><span>⎧</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.55em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-l\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.1182em;\"><span style=\"top:-4.4705em;\"><span class=\"pstrut\" style=\"height:3.3603em;\"></span><span class=\"mord\"><span class=\"mord\">1</span></span></span><span style=\"top:-2.6782em;\"><span class=\"pstrut\" style=\"height:3.3603em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3603em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">∥</span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">g</span><span class=\"mord\">∥</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord mathnormal\">ε</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">M</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.936em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:1.6182em;\"><span></span></span></span></span></span><span class=\"arraycolsep\" style=\"width:1em;\"></span><span class=\"col-align-l\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.1182em;\"><span style=\"top:-4.4705em;\"><span class=\"pstrut\" style=\"height:3.3603em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">g</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">&lt;</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">M</span></span></span><span style=\"top:-2.6782em;\"><span class=\"pstrut\" style=\"height:3.3603em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">g</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">≥</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">M</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.6182em;\"><span></span></span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-python\"><span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">def</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> run_gradient_clipping</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">parameters</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> 
Iterable</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">torch</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">nn</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Parameter</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> max_l2_norm</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#998418;--shiki-dark:#B8A965\"> float</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\"> -</span><span style=\"color:#999999;--shiki-dark:#666666\">></span><span style=\"color:#1E754F;--shiki-dark:#4D9375\"> None</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">    # Collect the gradients once: parameters may be a generator, and some grads may be None</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    grads </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">p</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">grad </span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">for</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> p </span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">in</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> parameters </span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">if</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> p</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">grad </span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">is</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\"> not</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\"> None</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    norm </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 0.0</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    for</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> g </span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">in</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> grads</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        norm </span><span 
style=\"color:#999999;--shiki-dark:#666666\">+=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> torch</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">norm</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">g</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> **</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 2</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    total_norm </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> norm </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">**</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 0.5</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    if</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> total_norm </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">></span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> max_l2_norm</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">        for</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> g </span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">in</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> grads</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            g </span><span style=\"color:#999999;--shiki-dark:#666666\">*=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> max_l2_norm </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">/</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">total_norm </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">+</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 1e-6</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"></span></code></pre>\n<h3><span id=\"checkpoint-save-amp-load\"> Checkpoint Save &amp; Load</span></h3>\n<p>Checkpointing saves and restores the training state: <code>torch.save</code> serializes a dict that holds the model state, the optimizer state, and the iteration count.</p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-python\"><span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">def</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> run_save_checkpoint</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    model</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> torch</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">nn</span><span 
style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Module</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    optimizer</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> torch</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">optim</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Optimizer</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    iteration</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#998418;--shiki-dark:#B8A965\"> int</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    out</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#998418;--shiki-dark:#B8A965\"> str</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> |</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> os</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">PathLike </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">|</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> BinaryIO </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">|</span><span style=\"color:#A65E2B;--shiki-dark:#C99076\"> IO</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#998418;--shiki-dark:#B8A965\">bytes</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span></span>\n<span class=\"line\"><span 
style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    torch</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">save</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span></span>\n<span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">        </span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">            \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">model</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> model</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">state_dict</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span></span>\n<span class=\"line\"><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">            \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">optimizer</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> optimizer</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">state_dict</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span></span>\n<span class=\"line\"><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">            
\"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">iteration</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> iteration</span></span>\n<span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">        </span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">&#125;</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        out</span></span>\n<span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">    </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span></code></pre>\n<p>Loading reverses this: read the dict back with <code>torch.load</code>, restore the model and optimizer states from it, and return the saved iteration count.</p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-python\"><span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">def</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> run_load_checkpoint</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    src</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#998418;--shiki-dark:#B8A965\"> str</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> |</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> os</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">PathLike </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">|</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> BinaryIO </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">|</span><span style=\"color:#A65E2B;--shiki-dark:#C99076\"> IO</span><span 
style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#998418;--shiki-dark:#B8A965\">bytes</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    model</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> torch</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">nn</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Module</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    optimizer</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> torch</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">optim</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Optimizer</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span></span>\n<span class=\"line\"><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\"> -</span><span style=\"color:#999999;--shiki-dark:#666666\">></span><span style=\"color:#998418;--shiki-dark:#B8A965\"> int</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    checkpoint </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> torch</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">load</span><span 
style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">src</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    model</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">load_state_dict</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">checkpoint</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">model</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    optimizer</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">load_state_dict</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">checkpoint</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">optimizer</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    return</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> checkpoint</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span 
style=\"color:#B56959;--shiki-dark:#C98A7D\">iteration</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span></code></pre>\n",
            "tags": []
        },
        {
            "id": "https://yuuko.site/2026/06/05/CS/OS/%E6%96%87%E4%BB%B6%E7%AE%A1%E7%90%86/",
            "url": "https://yuuko.site/2026/06/05/CS/OS/%E6%96%87%E4%BB%B6%E7%AE%A1%E7%90%86/",
            "title": "File Management",
            "date_published": "2026-06-04T16:00:00.000Z",
            "content_html": "<link rel=\"stylesheet\" type=\"text/css\" href=\"https://cdn.jsdelivr.net/hint.css/2.4.1/hint.min.css\"><h1><span id=\"文件管理\"> File Management</span></h1>\n<div class=\"note default subtitle no-icon\">\n<p>Everything is a file</p>\n</div>\n<p>A file is a collection of information stored on a persistent storage device. When a user performs I/O, the file is <strong>the basic object of logical operations</strong>, and the file system is the part of the operating system that manages files on the user's behalf.</p>\n<h2><span id=\"文件的定义与属性\"> Definition and Attributes of Files</span></h2>\n<p>A file is a named collection of related elements, and is either a <strong>structured file</strong> or an <strong>unstructured file</strong>.</p>\n<ul>\n<li>\n<p>A <strong>structured file</strong> consists of a number of records with a similar format. A record is a group of related data items that together describe one aspect of an object.</p>\n</li>\n<li>\n<p>An <strong>unstructured file</strong> is a continuous byte stream, such as a binary file or a plain text file, and is therefore also called a <strong>stream file</strong>. Compiled binaries such as <code>.lib</code>, <code>.dll</code>, and <code>.o</code> files are examples: a read/write pointer marks the next byte to operate on, and finding a particular region requires a sequential scan, which is slow.</p>\n</li>\n</ul>\n<p>A structured file is made up of records, so it is also called a <strong>record file</strong>. By whether the record lengths match, records are divided into</p>\n<ul>\n<li>Fixed-length records -- every record has the same length and each data item sits at a fixed position within the record, so the system can locate a record directly by its offset.</li>\n<li>Variable-length records -- record lengths (or the lengths of individual data items) differ, so a record can only be found by sequential search, which is slower.</li>\n</ul>\n<p>By how the records are organized, structured files are divided into</p>\n<ul>\n<li>Sequential files</li>\n<li>Indexed files</li>\n<li>Indexed sequential files</li>\n<li>Direct files (hashed files)</li>\n</ul>\n<p>In a <strong>sequential file</strong> the records are laid out linearly, in either a <strong>pile structure</strong> or a <strong>sequential structure</strong>. Records in a pile structure are ordered by arrival time, so lookup requires a sequential scan and is slow; records in a sequential structure are ordered by key, which permits faster lookups and, when the records are fixed-length, direct positioning by offset.</p>\n<p>Hardware with physical constraints such as magnetic tape can only store and use sequential files.</p>\n<p>An <strong>indexed file</strong> is typically used for tasks such as random queries. It maintains an index table whose entries hold an index number, a length, and a pointer. The pointer refers to the corresponding logical record; what level of address it holds depends on the programmer's design.</p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-text\"><span class=\"line\"><span>|index idx|length m|pointer ptr|</span></span></code></pre>\n<p>For fixed-length records, the start address of record <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>i</mi></mrow><annotation encoding=\"application/x-tex\">i</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" 
style=\"height:0.6595em;\"></span><span class=\"mord mathnormal\">i</span></span></span></span> is simply <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>A</mi><mi>i</mi></msub><mo>=</mo><mi>i</mi><mi>L</mi></mrow><annotation encoding=\"application/x-tex\">A_i = iL</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">A</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\">i</span><span class=\"mord mathnormal\">L</span></span></span></span>, where <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>L</mi></mrow><annotation encoding=\"application/x-tex\">L</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\">L</span></span></span></span> is the record length. For variable-length records, the start address of record <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>i</mi></mrow><annotation encoding=\"application/x-tex\">i</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6595em;\"></span><span class=\"mord mathnormal\">i</span></span></span></span> follows from the lengths <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>L</mi><mi>k</mi></msub></mrow><annotation encoding=\"application/x-tex\">L_k</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">L</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03148em;\">k</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span> of all preceding records, so those records must be read in order. Assuming one extra byte in front of each record stores its length, this gives</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math 
xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mi>A</mi><mi>i</mi></msub><mo>=</mo><munderover><mo>∑</mo><mrow><mi>k</mi><mo>=</mo><mn>0</mn></mrow><mrow><mi>i</mi><mo>−</mo><mn>1</mn></mrow></munderover><msub><mi>L</mi><mi>k</mi></msub><mo>+</mo><mi>i</mi></mrow><annotation encoding=\"application/x-tex\">A_i = \\sum_{k=0}^{i-1} L_k + i\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">A</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:3.1138em;vertical-align:-1.3021em;\"></span><span class=\"mop op-limits\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.8117em;\"><span style=\"top:-1.8479em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03148em;\">k</span><span class=\"mrel mtight\">=</span><span class=\"mord mtight\">0</span></span></span></span><span style=\"top:-3.05em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span><span class=\"mop op-symbol 
large-op\">∑</span></span></span><span style=\"top:-4.3em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">i</span><span class=\"mbin mtight\">−</span><span class=\"mord mtight\">1</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3021em;\"><span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">L</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03148em;\">k</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6595em;\"></span><span class=\"mord mathnormal\">i</span></span></span></span></span></p>\n<p><strong>Indexed sequential files</strong></p>\n<p>These combine indexed files with sequential files. The records are split into groups: the groups are reached through an index and need not be ordered relative to one another, but each group is stored as a sequential file, so its keys must be sorted before records can be located by offset.</p>\n<p>Lookup in an indexed sequential file follows the same logic as block search in <a href=\"/2026/04/18/CS/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84/%E6%9F%A5%E6%89%BE/\" title=\"查找\">Search</a>.</p>\n<p>For an ordinary sequential file with <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>N</mi></mrow><annotation encoding=\"application/x-tex\">N</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" 
style=\"margin-right:0.10903em;\">N</span></span></span></span> records, a sequential lookup takes <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mfrac><mi>N</mi><mn>2</mn></mfrac></mrow><annotation encoding=\"application/x-tex\">\\frac{N}{2}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.2173em;vertical-align:-0.345em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8723em;\"><span style=\"top:-2.655em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.394em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.345em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span> comparisons on average. With block search, suppose the records are split into <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>M</mi></mrow><annotation encoding=\"application/x-tex\">M</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" 
style=\"margin-right:0.10903em;\">M</span></span></span></span> blocks; a sequential search is then needed first across the blocks and then within one block, which costs on average</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mfrac><mi>M</mi><mn>2</mn></mfrac><mo>+</mo><mfrac><mi>N</mi><mrow><mn>2</mn><mi>M</mi></mrow></mfrac><mo>≥</mo><mn>2</mn><msqrt><mrow><mfrac><mi>M</mi><mn>2</mn></mfrac><mo>⋅</mo><mfrac><mi>N</mi><mrow><mn>2</mn><mi>M</mi></mrow></mfrac></mrow></msqrt><mo>=</mo><msqrt><mi>N</mi></msqrt></mrow><annotation encoding=\"application/x-tex\">\\frac{M}{2}+\\frac{N}{2M}\\geq 2\\sqrt{\\frac{M}{2}\\cdot\\frac{N}{2M}} = \\sqrt{N}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:2.0463em;vertical-align:-0.686em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3603em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">2</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">M</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" 
style=\"height:2.0463em;vertical-align:-0.686em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3603em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">2</span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">M</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">N</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">≥</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.44em;vertical-align:-0.769em;\"></span><span class=\"mord\">2</span><span class=\"mord sqrt\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.671em;\"><span class=\"svg-align\" style=\"top:-4.4em;\"><span class=\"pstrut\" style=\"height:4.4em;\"></span><span class=\"mord\" style=\"padding-left:1em;\"><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3603em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">2</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" 
style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">M</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3603em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">2</span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">M</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">N</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span><span style=\"top:-3.631em;\"><span class=\"pstrut\" style=\"height:4.4em;\"></span><span class=\"hide-tail\" style=\"min-width:1.02em;height:2.48em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"2.48em\" viewbox=\"0 0 400000 2592\" preserveaspectratio=\"xMinYMin slice\"><path 
d=\"M424,2478\nc-1.3,-0.7,-38.5,-172,-111.5,-514c-73,-342,-109.8,-513.3,-110.5,-514\nc0,-2,-10.7,14.3,-32,49c-4.7,7.3,-9.8,15.7,-15.5,25c-5.7,9.3,-9.8,16,-12.5,20\ns-5,7,-5,7c-4,-3.3,-8.3,-7.7,-13,-13s-13,-13,-13,-13s76,-122,76,-122s77,-121,77,-121\ns209,968,209,968c0,-2,84.7,-361.7,254,-1079c169.3,-717.3,254.7,-1077.7,256,-1081\nl0 -0c4,-6.7,10,-10,18,-10 H400000\nv40H1014.6\ns-87.3,378.7,-272.6,1166c-185.3,787.3,-279.3,1182.3,-282,1185\nc-2,6,-10,9,-24,9\nc-8,0,-12,-0.7,-12,-2z M1001 80\nh400000v40h-400000z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.769em;\"><span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.04em;vertical-align:-0.0645em;\"></span><span class=\"mord sqrt\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9755em;\"><span class=\"svg-align\" style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\" style=\"padding-left:0.833em;\"><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">N</span></span></span><span style=\"top:-2.9355em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"hide-tail\" style=\"min-width:0.853em;height:1.08em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"1.08em\" viewbox=\"0 0 400000 1080\" preserveaspectratio=\"xMinYMin slice\"><path d=\"M95,702\nc-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14\nc0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54\nc44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10\ns173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429\nc69,-144,104.5,-217.7,106.5,-221\nl0 
-0\nc5.3,-9.3,12,-14,20,-14\nH400000v40H845.2724\ns-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7\nc-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z\nM834 80h400000v40h-400000z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.0645em;\"><span></span></span></span></span></span></span></span></span></span></p>\n<p>Equality holds when <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>M</mi><mo>=</mo><msqrt><mi>N</mi></msqrt></mrow><annotation encoding=\"application/x-tex\">M = \\sqrt{N}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">M</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.04em;vertical-align:-0.1133em;\"></span><span class=\"mord sqrt\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9267em;\"><span class=\"svg-align\" style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\" style=\"padding-left:0.833em;\"><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">N</span></span></span><span style=\"top:-2.8867em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"hide-tail\" style=\"min-width:0.853em;height:1.08em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"1.08em\" viewbox=\"0 0 400000 1080\" preserveaspectratio=\"xMinYMin slice\"><path 
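d=\"M95,702\nc-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14\nc0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54\nc44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10\ns173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429\nc69,-144,104.5,-217.7,106.5,-221\nl0 -0\nc5.3,-9.3,12,-14,20,-14\nH400000v40H845.2724\ns-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7\nc-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z\nM834 80h400000v40h-400000z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1133em;\"><span></span></span></span></span></span></span></span></span>, at which point the search cost is lowest.</p>\n<p><strong>Hashed files</strong> locate records through a hash function.</p>\n<h3><span id=\"文件的属性\"> File attributes</span></h3>\n<p>File attributes include:</p>\n<ul>\n<li>Name</li>\n<li>Type</li>\n<li>Creator</li>\n<li>Owner</li>\n<li>Location</li>\n<li>Size</li>\n<li>Protection</li>\n<li>Creation time</li>\n</ul>\n<h2><span id=\"文件系统结构\"> File system structure</span></h2>\n<p>From the hardware upward, the file system is organized into these layers:</p>\n<ul>\n<li>I/O control -- handles transfers between disk and memory, moving file data into and out of memory, and connects I/O to the operating system</li>\n<li>Basic file system -- reads and writes physical blocks on disk</li>\n<li>File organization module -- maps a file's logical blocks to physical blocks</li>\n<li>Logical file system -- manages file system metadata and maintains the directory structure; it holds no actual file data.</li>\n</ul>\n<h1><span id=\"文件目录\"> File directories</span></h1>\n<p>Directory management aims to provide:</p>\n<ul>\n<li>Access by name</li>\n<li>Faster lookup</li>\n<li>File sharing -- several users can access files under the same directory</li>\n<li>Duplicate file names -- files in different subdirectories may share a name</li>\n</ul>\n<h2><span id=\"文件控制块fcb\"> File control block (FCB)</span></h2>\n<p>Just as a PCB describes a process, an FCB records the information that identifies a file. An FCB contains:</p>\n<ul>\n<li><strong>Basic information</strong> -- file name, physical address, logical structure, etc.</li>\n<li><strong>Access control information</strong> -- access permissions, etc.</li>\n<li><strong>Usage information</strong> -- creation time, last modification time, etc.</li>\n</ul>\n<p>To avoid loading entire FCBs into memory while searching for a file, UNIX decouples the file name from the descriptive information and organizes the latter into an <strong>index node (inode)</strong>. The inode effectively plays the role of the FCB: each directory entry holds an inode number that points to the corresponding inode, so the directory table maps file names to inode numbers.</p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-text\"><span 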
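class=\"line\"><span>|file name|inode number|</span></span></code></pre>\n<p>A directory entry is now much smaller than a full FCB, so each disk block holds more entries and a search needs fewer disk accesses on average.</p>\n<p>There are two kinds of index node: the <strong>on-disk inode</strong> and the <strong>in-memory inode</strong>.</p>\n<p>The on-disk inode holds the information about the file, including its identifier, type, permissions, block addresses, and length. Once the file is loaded into memory, the corresponding in-memory inode carries additional runtime information, including the inode number, status, access counter, and link pointers.</p>\n<h2><span id=\"文件目录的结构\"> Directory structures</span></h2>\n<p>Directory structures come in four kinds:</p>\n<ul>\n<li>Single-level directory</li>\n<li>Two-level directory</li>\n<li>Tree-structured directory</li>\n<li>Directed acyclic graph (DAG) directory</li>\n</ul>\n<p>A single-level directory arranges the FCBs linearly. It meets the basic need of access by name, but it cannot store two files with the same name, lookup is slow, and files cannot be shared.</p>\n<p>A two-level directory consists of a <strong>master file directory (MFD)</strong> and <strong>user file directories (UFD)</strong>, which supports multiple users. The MFD holds the user names and the storage locations of their user directories; each user directory holds that user's private files.</p>\n<p>Inside a user file directory the layout is still single-level, so flexibility and lookup efficiency remain poor.</p>\n<p>The tree-structured directory is the modern layout: it starts from the root node <code>/</code> and branches into subdirectories. Users' own directories typically live under <code>/home/*</code> on Linux or <code>/Users/*</code> on macOS.</p>\n<p>Paths in a tree directory are either absolute or relative. An absolute path is the full path from the root, such as <code>/var/root</code>. A relative path locates a file starting from the process's current directory. For example, the relative path of this post is <code>./source/_posts/OS/文件管理.md</code></p>\n<p>A DAG directory structure arises the same way as the conversion from an m-ary tree to a DAG described in <a href=\"/2026/04/12/CS/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84/%E5%9B%BE/\" title=\"图\">Graphs</a>: a file shared between directories is merged into a single common node, which turns the tree into a directed acyclic graph. The DAG structure, however, complicates per-user permission management and the logic for adding and deleting files, which raises the management overhead.</p>\n<h3><span id=\"文件目录的实现\"> Implementing directories</span></h3>\n<p>A directory is implemented as either a linear list or a hash table.</p>\n<ul>\n<li>Linear list: stores file names together with pointers to their data blocks; lookup is slow, with time complexity <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi mathvariant=\"script\">O</mi><mo stretchy=\"false\">(</mo><mi>n</mi><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">\\mathcal{O}(n)</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathcal\" style=\"margin-right:0.02778em;\">O</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">n</span><span class=\"mclose\">)</span></span></span></span></li>\n<li>Hash table: lookup is fast, but hash collisions must be handled; because directory lookup depends on disk I/O, active directories are loaded into memory on demand to improve response time.</li>\n</ul>\n<h2><span id=\"文件目录的操作\"> 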
Directory operations</span></h2>\n<p>As in everyday computer use, directory operations include:</p>\n<ul>\n<li>Searching for a file</li>\n<li>Creating/deleting files</li>\n<li>Creating/deleting directories</li>\n<li>Moving/listing/changing directories</li>\n</ul>\n<h2><span id=\"文件的物理结构\"> Physical structure of files</span></h2>\n<p>The physical structure of a file is the mapping between the abstract file object in the operating system and the storage blocks on disk.</p>\n<p>It has two complementary parts:</p>\n<ul>\n<li>File allocation -- manages the disk blocks that are in use</li>\n<li>Free-space management -- manages the disk blocks that are free</li>\n</ul>\n<h3><span id=\"连续分配方式\"> Contiguous allocation</span></h3>\n<p>Blocks are allocated as one contiguous run on disk. For example, a file that is <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>n</mi></mrow><annotation encoding=\"application/x-tex\">n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\">n</span></span></span></span> blocks long and starts at block <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>b</mi></mrow><annotation encoding=\"application/x-tex\">b</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6944em;\"></span><span class=\"mord mathnormal\">b</span></span></span></span> is assigned blocks <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>b</mi><mo separator=\"true\">,</mo><mi>b</mi><mo>+</mo><mn>1</mn><mo separator=\"true\">,</mo><mo>⋯</mo><mtext> </mtext><mo separator=\"true\">,</mo><mi>b</mi><mo>+</mo><mi>n</mi><mo>−</mo><mn>1</mn></mrow><annotation encoding=\"application/x-tex\">b,b+1,\\cdots , b+n-1</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8889em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\">b</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord 
mathnormal\">b</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8889em;vertical-align:-0.1944em;\"></span><span class=\"mord\">1</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"minner\">⋯</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\">b</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\">n</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">1</span></span></span></span></p>\n<p>Contiguous allocation can read any given block in <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi mathvariant=\"script\">O</mi><mo stretchy=\"false\">(</mo><mn>1</mn><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">\\mathcal{O}(1)</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathcal\" style=\"margin-right:0.02778em;\">O</span><span class=\"mopen\">(</span><span class=\"mord\">1</span><span class=\"mclose\">)</span></span></span></span> 
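time, but like contiguous memory allocation it can produce heavy external fragmentation. The file length also has to be known in advance, so growing a file dynamically is hard.</p>\n<p>As with insertion into a sequential list, adding or removing data in a contiguously allocated file means shifting the file as a whole, which is expensive.</p>\n<h3><span id=\"链接分配方式\"> Linked allocation</span></h3>\n<h4><span id=\"隐式链接\"> Implicit linking</span></h4>\n<p>Each disk block stores a pointer to the next block of the file, and users cannot manage these internal pointers. If the chain is broken, the data in every later block becomes unreachable.</p>\n<p>Disk blocks can be grouped into <strong>clusters</strong>, which reduces the pointer overhead. Blocks inside a cluster are contiguous, while the clusters themselves are still linked by pointers.</p>\n<h4><span id=\"显式链接\"> Explicit linking</span></h4>\n<p>Explicit linking keeps a <strong>file allocation table (FAT)</strong>, and there is only one for the whole file system. Each FAT entry records a block number and the number of the block that follows it. This is in effect a static linked list over disk blocks.</p>\n<p>The FAT has to stay resident in memory, which costs memory, but it supports both sequential and random access efficiently.</p>\n<h3><span id=\"索引分配\"> Indexed allocation</span></h3>\n<p>An index table records, for a single file, the disk blocks allocated to it; accessing the file only requires loading the blocks listed in the index table.</p>\n<p>For a very large file, multi-level index tables can be used. Much like the multi-level page tables in <a href=\"/2026/06/02/CS/OS/%E5%86%85%E5%AD%98%E7%AE%A1%E7%90%86/\" title=\"内存管理\">Memory management</a>, a multi-level index splits the block-number mapping into separate tables to save space, at the price of extra disk I/O.</p>\n<h4><span id=\"混合索引方式\"> Hybrid indexing</span></h4>\n<p>Hybrid indexing uses different indexing schemes for large and small files. An inode can hold several direct block addresses plus a few levels of indirect addresses, matching the multi-level index scheme. Choosing the scheme by file size lowers the overall cost.</p>\n<h2><span id=\"文件的操作\"> File operations</span></h2>\n<p>The basic file operations are create, delete, read, and write.</p>\n<p>Creating a file involves:</p>\n<ul>\n<li>Validating the file name and checking permissions</li>\n<li>Allocating an inode and filling in its fields</li>\n<li>Allocating disk blocks</li>\n<li>Creating the directory entry</li>\n<li>Updating the file system metadata</li>\n</ul>\n<p>Deleting a file involves:</p>\n<ul>\n<li>Locating the file: search the directory by path name to find the directory entry and obtain the inode</li>\n<li>Checking the usage state and hard-link count. If the file is open, deletion is deferred (Windows refuses the deletion outright, while Linux/macOS finish deleting once the file is closed); if the hard-link count is greater than 1, the inode and disk blocks are not reclaimed</li>\n<li>Removing the directory entry</li>\n<li>Releasing the inode, disk blocks, and memory resources -- temporary in-memory resources are released after the on-disk information</li>\n<li>Updating the file system metadata</li>\n</ul>\n<p>Before a file can be read or written, its metadata must be brought from disk into memory. This step is called <strong>opening the file</strong> and corresponds to the <code>open()</code> system call (the call switches from user mode to kernel mode because it involves disk I/O).</p>\n<p>Opening works as follows: the inode is found through the directory, and an entry based on that inode is kept in an <strong>open-file table</strong>, which records the files currently in use.</p>\n<p>The open-file table has two levels:</p>\n<ul>\n<li>System-wide open-file table -- attributes of every file open anywhere in the system; each entry has an open counter recording how many processes have the file open</li>\n<li>Per-process open-file table -- private to one process, recording the open state of the files that process has opened</li>\n</ul>\n<p>Reading a file: open the file; if the requested data is not in memory, load the file's disk blocks into memory, then advance the read pointer</p>\n<p>Writing a file: 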
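open the file; if the requested data is not in memory, load the file's disk blocks into memory, then advance the write pointer</p>\n<p>A file is closed through the <code>close()</code> system call. The corresponding open counter is decremented; when it reaches 0, the file's entry in the open-file table and the associated resources are released.</p>\n<p>Once a file is loaded into memory, the following extra information is kept:</p>\n<ul>\n<li>File pointer -- kept in the open-file table; when several processes open the same file, each open needs its own file pointer, which marks the current position within the file</li>\n<li>Open count -- the in-memory inode tracks how many open-file entries use it; the open-file table tracks how many processes have opened the inode</li>\n<li>Disk location of the file -- in the in-memory inode</li>\n<li>Access permissions -- in the in-memory inode</li>\n</ul>\n<p>In summary: the in-memory inode manages information about the in-memory object that represents the on-disk file; the open-file tables manage per-process state and the case where a file is opened more than once.</p>\n<h2><span id=\"文件的共享\"> File sharing</span></h2>\n<p>File sharing can be viewed from two sides:</p>\n<ul>\n<li>Sharing among users</li>\n<li>Sharing among processes</li>\n</ul>\n<p>Processes share files through the open-file tables and the open counters and pointers kept in them.</p>\n<p>Sharing on the user side is done with <strong>hard links</strong> and <strong>soft links</strong>.</p>\n<p>A hard link shares a file through its index node. After the file is loaded into memory, its physical address and attributes live in the in-memory inode, and each user's directory entry holds only the file name and a pointer to the inode.</p>\n<p>The inode keeps a link count <code>count</code>, the number of user directory entries that point to the file. When user A no longer needs the file, count is decremented and the entry in A's private directory is removed; only when count reaches 0 are the file and its inode actually released.</p>\n<p>A soft link is a link at the namespace level. For a soft-linked file, a file of type Link is allocated that stores only the path string of the real file; the string can be kept directly in the inode.</p>\n<p>Deleting the original file does not stop another user from accessing the Link file itself, but following the link then fails. A soft link maps a path rather than a pointer, so it does not change the link count in the original file's inode.</p>\n<h2><span id=\"文件保护\"> File protection</span></h2>\n<p>File protection mechanisms include:</p>\n<ul>\n<li>Password protection</li>\n<li>Encryption</li>\n<li>Access control</li>\n</ul>\n<p>Passwords and encryption keep other users from obtaining file contents illegitimately; access control manages each user's permissions on a file.</p>\n<p>On Linux/macOS, <code>chmod</code> sets file access permissions for three classes of user. This is a condensed form of an access-control list (ACL).</p>\n<ul>\n<li><code>u</code> -- file owner</li>\n<li><code>g</code> -- group users</li>\n<li><code>o</code> -- other users</li>\n</ul>\n<p>Three permissions can be granted:</p>\n<ul>\n<li><code>r</code> -- read</li>\n<li><code>w</code> -- write</li>\n<li><code>x</code> -- execute</li>\n</ul>\n<p>Permissions are written as three octal digits, each expanding to three permission bits, with one digit per user class. For example, <code>chmod 712 &lt;file&gt;</code> gives the file owner rwx (111), group users x (001), and other users w (010).</p>\n<h1><span id=\"文件系统\"> File systems</span></h1>\n<p>The file system is where the overall file management logic is realized on the disk and I/O side.</p>\n<h2><span id=\"fs-文件系统\"> FS -- File system</span></h2>\n<p>The file system (FS) is what actually manages the disk. A disk is usually divided into several partitions, and each partition contains an FS that manages the blocks inside it.</p>\n<p>An FS is made up of:</p>\n<ul>\n<li>Master boot record (MBR) -- 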
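located in sector 0 of the disk; the BIOS loads it into memory and runs it to identify the active partition and read that partition's first sector, the boot block</li>\n<li>Boot block -- the boot portion of a partition and also its first sector; it loads the operating system files stored in that partition.</li>\n<li>Superblock -- holds the partition's global information, such as the total number of blocks, the block size, the count of and pointer to free blocks, and the count of and pointer to free inodes</li>\n<li>Free-block information</li>\n</ul>\n<p>File systems differ across vendors and device types of secondary storage; examples include ext4, NTFS, and FAT.</p>\n<p>The operating system keeps the following file-system-related information in memory:</p>\n<ul>\n<li>In-memory mount table -- information about mounted file system partitions, including the device identifier, mount point, and file system type</li>\n<li>In-memory directory cache -- caches recently accessed directory contents, following the principle of locality, to reduce disk reads</li>\n<li>System-wide open-file table</li>\n<li>Per-process open-file table</li>\n</ul>\n<p>This in-memory information is loaded when the file system is mounted, updated as files are opened and used, and released at unmount.</p>\n<h3><span id=\"文件存储空间管理\"> File storage space management</span></h3>\n<p>File storage space management deals with the free blocks on disk.</p>\n<p><strong>Free table</strong>: like contiguous allocation, this method allocates from contiguous runs of disk space. The FS keeps a free table whose entries record the entry number, the starting block number, and the number of free blocks in each run.</p>\n<p><strong>Free list</strong>: links all free extents together in a linked list.</p>\n<p>The unit of a free list can be a single block or an extent. As with clusters, a free-extent chain links runs of contiguous free blocks, so blocks are contiguous inside an extent and the extents are ordered by the list.</p>\n<p><strong>Grouped linking</strong><br>\nGrouping extends how much free space the free list can describe. The first block of each group serves as an index block that records the block numbers and the count of the free blocks in the next group.</p>\n<p>This is a classic way to extend the addressable space through a multi-level mapping. The first-level free-block information is loaded into a dedicated in-memory stack, called the <strong>free block number stack</strong>; popping from the stack allocates free blocks in order.</p>\n<p><strong>Bitmap</strong></p>\n<p>A two-dimensional table of bits records whether each disk block is in use.</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mo fence=\"true\">{</mo><mtable rowspacing=\"0.36em\" columnalign=\"left left\" columnspacing=\"1em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><msub><mi>a</mi><mrow><mi>i</mi><mo separator=\"true\">,</mo><mi>j</mi></mrow></msub><mo>=</mo><mn>0</mn></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mtext>block (i,j) is free</mtext></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><msub><mi>a</mi><mrow><mi>i</mi><mo separator=\"true\">,</mo><mi>j</mi></mrow></msub><mo>=</mo><mn>1</mn></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mtext>block (i,j) is in use</mtext></mstyle></mtd></mtr></mtable></mrow><annotation encoding=\"application/x-tex\">\\begin{dcases}\na_{i,j} = 0 &amp; 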
\\text{block (i,j) is free}\\\\\na_{i,j} = 1 &amp; \\text{block (i,j) is in use}\n\\end{dcases}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:3em;vertical-align:-1.25em;\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size4\">{</span></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-l\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.69em;\"><span style=\"top:-3.69em;\"><span class=\"pstrut\" style=\"height:3.008em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\">a</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">i</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.05724em;\">j</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord\">0</span></span></span><span style=\"top:-2.25em;\"><span class=\"pstrut\" style=\"height:3.008em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\">a</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" 
style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">i</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.05724em;\">j</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.19em;\"><span></span></span></span></span></span><span class=\"arraycolsep\" style=\"width:1em;\"></span><span class=\"col-align-l\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.69em;\"><span style=\"top:-3.69em;\"><span class=\"pstrut\" style=\"height:3.008em;\"></span><span class=\"mord\"><span class=\"mord text\"><span class=\"mord\">block (i,j) is free</span></span></span></span><span style=\"top:-2.25em;\"><span class=\"pstrut\" style=\"height:3.008em;\"></span><span class=\"mord\"><span class=\"mord text\"><span class=\"mord\">block (i,j) is in use</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.19em;\"><span></span></span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<p>Block numbers run row by row, so in a bitmap written as <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>m</mi><mo>×</mo><mi>n</mi></mrow><annotation encoding=\"application/x-tex\">m\\times 
n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\">m</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\">n</span></span></span></span>, with m bits in each row and with rows, columns and block numbers all counted from 1, the block at coordinates <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mo stretchy=\"false\">(</mo><mi>i</mi><mo separator=\"true\">,</mo><mi>j</mi><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">(i,j)</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">i</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05724em;\">j</span><span class=\"mclose\">)</span></span></span></span> has block number</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi>b</mi><mo>=</mo><mi>m</mi><mo stretchy=\"false\">(</mo><mi>i</mi><mo>−</mo><mn>1</mn><mo stretchy=\"false\">)</mo><mo>+</mo><mi>j</mi></mrow><annotation encoding=\"application/x-tex\">b = m(i - 1)+j\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6944em;\"></span><span class=\"mord mathnormal\">b</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span 
class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\">m</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">i</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\">1</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.854em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05724em;\">j</span></span></span></span></span></p>\n<p>Reclaiming a block inverts this mapping, recovering the two-dimensional coordinates from the block number:</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mo fence=\"true\">{</mo><mtable rowspacing=\"0.36em\" columnalign=\"left left\" columnspacing=\"1em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mi>i</mi><mo>=</mo><mrow><mo fence=\"true\">⌈</mo><mfrac><mi>b</mi><mi>m</mi></mfrac><mo fence=\"true\">⌉</mo></mrow></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mi>j</mi><mo>=</mo><mo stretchy=\"false\">(</mo><mi>b</mi><mo>−</mo><mn>1</mn><mo stretchy=\"false\">)</mo><mi mathvariant=\"normal\">%</mi><mi>m</mi><mo>+</mo><mn>1</mn></mrow></mstyle></mtd></mtr></mtable></mrow><annotation encoding=\"application/x-tex\">\\begin{dcases}\ni = \\left\\lceil\\frac{b}{m}\\right\\rceil \\\\\nj = (b-1) \\% m + 1\n\\end{dcases}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span 
class=\"base\"><span class=\"strut\" style=\"height:3.84em;vertical-align:-1.67em;\"></span><span class=\"minner\"><span class=\"mopen\"><span class=\"delimsizing mult\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.05em;\"><span style=\"top:-2.5em;\"><span class=\"pstrut\" style=\"height:3.15em;\"></span><span class=\"delimsizinginner delim-size4\"><span>⎩</span></span></span><span style=\"top:-2.492em;\"><span class=\"pstrut\" style=\"height:3.15em;\"></span><span style=\"height:0.016em;width:0.8889em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.8889em\" height=\"0.016em\" style=\"width:0.8889em\" viewbox=\"0 0 888.89 16\" preserveaspectratio=\"xMinYMin\"><path d=\"M384 0 H504 V16 H384z M384 0 H504 V16 H384z\"/></svg></span></span><span style=\"top:-3.15em;\"><span class=\"pstrut\" style=\"height:3.15em;\"></span><span class=\"delimsizinginner delim-size4\"><span>⎨</span></span></span><span style=\"top:-4.292em;\"><span class=\"pstrut\" style=\"height:3.15em;\"></span><span style=\"height:0.016em;width:0.8889em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.8889em\" height=\"0.016em\" style=\"width:0.8889em\" viewbox=\"0 0 888.89 16\" preserveaspectratio=\"xMinYMin\"><path d=\"M384 0 H504 V16 H384z M384 0 H504 V16 H384z\"/></svg></span></span><span style=\"top:-4.3em;\"><span class=\"pstrut\" style=\"height:3.15em;\"></span><span class=\"delimsizinginner delim-size4\"><span>⎧</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.55em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-l\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.17em;\"><span style=\"top:-4.17em;\"><span class=\"pstrut\" style=\"height:3.45em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">i</span><span class=\"mspace\" 
style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size3\">⌈</span></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3714em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">m</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">b</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mclose delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size3\">⌉</span></span></span></span></span><span style=\"top:-2.212em;\"><span class=\"pstrut\" style=\"height:3.45em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.05724em;\">j</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">b</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\">1</span><span class=\"mclose\">)</span><span class=\"mord\">%</span><span class=\"mord mathnormal\">m</span><span class=\"mspace\" 
style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\">+</span><span class=\"mord\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.67em;\"><span></span></span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<h2><span id=\"vfs-虚拟文件系统\"> VFS -- Virtual File System</span></h2>\n<p>The VFS is a layer between the rest of the operating system and the concrete file systems. A modern operating system has to work with many different file systems, so the VFS presents one uniform API to the OS, which lets I/O be handled the same way regardless of the file system underneath.</p>\n<p>The VFS has four basic object types:</p>\n<ul>\n<li>Superblock -- corresponds to the superblock of a concrete file system and is loaded into memory when that file system is mounted.</li>\n<li>Vnode -- corresponds to an inode in a concrete file system. It wraps each file system's metadata: it points to the file system's private metadata and exposes it upward to the OS through a common interface, so every file system looks the same from above.</li>\n<li>Directory entry objects and file objects</li>\n</ul>\n<h2><span id=\"文件系统的挂载\"> Mounting File Systems</span></h2>\n<p>A Windows drive letter such as <code>C:\\</code> denotes a partition (volume); a single hard disk, for example, can be split into several partitions.<br>\n<img loading=\"lazy\" src=\"/picture/CS/disk.png\" alt=\"disk\"></p>\n<p>Each volume acts as its own root directory. The OS uses the drive letter to locate the corresponding file system and then resolves the path within that file system's directories.</p>\n<p>Unix mounts every file system into a single directory tree under one root. A mount point holds one device at a time, but the same device can be mounted at more than one mount point.</p>\n<p>Mounts inside WSL:</p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-text\"><span class=\"line\"><span>(base) asuna@Asuna:~$ df -h</span></span>\n<span class=\"line\"><span>Filesystem      Size  Used Avail Use% Mounted on</span></span>\n<span class=\"line\"><span>none            3.9G     0  3.9G   0% /usr/lib/modules/6.6.114.1-microsoft-standard-WSL2</span></span>\n<span class=\"line\"><span>none            3.9G  4.0K  3.9G   1% /mnt/wsl</span></span>\n<span class=\"line\"><span>drivers         201G  198G  3.0G  99% /usr/lib/wsl/drivers</span></span>\n<span class=\"line\"><span>/dev/sdd       1007G   71G  885G   8% /</span></span>\n<span class=\"line\"><span>none            3.9G   84K  3.9G   1% /mnt/wslg</span></span>\n<span class=\"line\"><span>none            3.9G     0  3.9G   0% /usr/lib/wsl/lib</span></span>\n<span class=\"line\"><span>rootfs          3.9G  2.7M  3.9G   1% /init</span></span>\n<span class=\"line\"><span>none            3.9G  604K  3.9G   1% /run</span></span>\n<span class=\"line\"><span>none            3.9G     0  3.9G   0% /run/lock</span></span>\n<span class=\"line\"><span>none            3.9G     0  3.9G   0% /run/shm</span></span>\n<span class=\"line\"><span>none            3.9G   76K  3.9G   1% /mnt/wslg/versions.txt</span></span>\n<span class=\"line\"><span>none            3.9G   76K  3.9G   1% /mnt/wslg/doc</span></span>\n<span class=\"line\"><span>C:\\             201G  198G  3.0G  99% /mnt/c</span></span>\n<span class=\"line\"><span>D:\\             753G  626G  128G  84% /mnt/d</span></span>\n<span class=\"line\"><span>E:\\             954G  837G  118G  88% /mnt/e</span></span>\n<span class=\"line\"><span>tmpfs           781M   20K  781M   1% /run/user/1000</span></span></code></pre>\n<p>WSL can access the volumes of the Windows host, which are mounted under <code>/mnt/*</code>, so the host's disk mounts are visible from inside WSL as well.</p>\n<p>macOS mounts file systems in much the same way as Linux.<br>\n<img loading=\"lazy\" src=\"/picture/CS/mount.png\" alt=\"mount\"></p>\n",
            "tags": []
        },
        {
            "id": "https://yuuko.site/2026/06/04/CS/LLM/optimizer/",
            "url": "https://yuuko.site/2026/06/04/CS/LLM/optimizer/",
            "title": "optimizer",
            "date_published": "2026-06-03T16:00:00.000Z",
            "content_html": "<link rel=\"stylesheet\" type=\"text/css\" href=\"https://cdn.jsdelivr.net/hint.css/2.4.1/hint.min.css\">",
            "tags": []
        },
        {
            "id": "https://yuuko.site/2026/06/02/CS/OS/%E5%86%85%E5%AD%98%E7%AE%A1%E7%90%86/",
            "url": "https://yuuko.site/2026/06/02/CS/OS/%E5%86%85%E5%AD%98%E7%AE%A1%E7%90%86/",
            "title": "Memory Management",
            "date_published": "2026-06-01T16:00:00.000Z",
            "content_html": "<link rel=\"stylesheet\" type=\"text/css\" href=\"https://cdn.jsdelivr.net/hint.css/2.4.1/hint.min.css\"><h1><span id=\"内存管理\"> Memory Management</span></h1>\n<p>Memory management is the part of the operating system that allocates, organizes, tracks, and reclaims memory, and that maps virtual addresses to physical addresses.</p>\n<h2><span id=\"内存的分配\"> Memory Allocation</span></h2>\n<p>Memory can be allocated contiguously or non-contiguously. Contiguous allocation gives each process a single unbroken region of physical memory, while non-contiguous allocation spreads a process across separate blocks that are tied together by a page table or segment table.</p>\n<h3><span id=\"连续内存分配方式\"> Contiguous Memory Allocation</span></h3>\n<p>Contiguous allocation hands a user process one contiguous range of physical memory. Because the range is unbroken, address translation usually needs only a hardware base register and a single addition, with no page-table or segment-table lookup of the kind that non-contiguous schemes (paging and segmentation) depend on.</p>\n<p>Contiguous allocation comes in three forms: single contiguous allocation, fixed partitioning, and dynamic partitioning.</p>\n<h4><span id=\"单一分区分配\"> Single Contiguous Allocation</span></h4>\n<p>Memory is divided into a system area and a user area, and the user area holds only one process at a time. This scheme produces no external fragmentation, but it uses memory so poorly that modern operating systems have abandoned it.</p>\n<h4><span id=\"固定分区分配\"> Fixed Partitioning</span></h4>\n<p>Single contiguous allocation keeps only one process in memory, so it cannot serve multiprogramming. Fixed partitioning divides <strong>physical memory</strong> into partitions whose sizes are set in advance, and each partition holds exactly one job.</p>\n<p>The partitions may all be the same size or may differ in size.</p>\n<ul>\n<li>Equal-size partitions -- inflexible: partitions that are too small cannot hold large jobs, and partitions that are too large waste space through internal fragmentation.</li>\n<li>Unequal-size partitions -- the system keeps a partition table, sorted by partition size, that records each partition's size, start address, and whether it is allocated.</li>\n</ul>\n<h4><span id=\"动态分区分配\"> Dynamic Partitioning</span></h4>\n<p>Dynamic partitioning allocates a contiguous region sized to each process's actual demand. When processes exit they leave behind <strong>memory fragments</strong>: small contiguous free regions that lie outside every allocated block and are therefore called <strong>external fragmentation</strong>. A later process that needs less memory than a fragment can be placed inside it, but when that process exits the fragment is left split into even smaller pieces, until eventually no fragment is large enough for any process.</p>\n<p>External fragmentation can be removed by compaction: the operating system periodically relocates processes so that the end of one sits directly against the start of the next, which merges the scattered free space into one region.</p>\n<p>Compaction relies on <strong>dynamic relocation</strong> and is expensive.</p>\n<p>Free partitions are usually kept in a linked list sorted by start address. When a block is released, merging it with neighbouring free partitions falls into four cases:</p>\n<ul>\n<li>Adjacent to the preceding free partition -- merge and update the size of the preceding partition; the number of free partitions is unchanged.</li>\n<li>Adjacent to the following free partition -- merge and update the start address and size of the following partition; the number of free partitions is unchanged.</li>\n<li>Adjacent to no free partition -- create a new entry and insert it into the free list; the number of free partitions increases by 1.</li>\n<li>Adjacent to free partitions on both sides -- merge all three into the preceding partition (preceding + released block + following) and delete the following partition's list entry; the number of free partitions decreases by 1.</li>\n</ul>\n<h4><span id=\"空闲内存块分配方式\"> Choosing a Free Block</span></h4>\n<p>A dynamically partitioned memory contains many free blocks. How should the system pick one that is large enough for a new job?</p>\n<p>The strategies divide into sequential search and indexed search, depending on how the free partitions are looked up.</p>\n<h5><span id=\"空闲内存块的顺序分配\"> Sequential Search</span></h5>\n<p>Because the free list is sorted by ascending address, <strong>First Fit</strong> simply scans it in order and allocates the first free partition that is large enough.</p>\n<p>Under First Fit the high-address end usually remains a large untouched block, which suits processes with large memory demands. The low-address end, however, tends to accumulate small external fragments, and every search restarts from the head of the list, so lookups are relatively costly.</p>\n<p><strong>Next Fit</strong> uses the same search but starts each scan where the previous one stopped. Moving the starting point destroys First Fit's tendency to preserve a large block at high addresses, so Next Fit generally performs worse than First Fit.</p>\n<p><strong>Best Fit</strong> keeps the free blocks sorted by ascending size and allocates the smallest block that fits. It aims for the tightest fit on each allocation, but it leaves behind many tiny external fragments, so overall utilization and performance are poor in practice.</p>\n<p><strong>Worst Fit</strong> keeps the free partitions sorted by descending size and always carves the request out of the largest block, so the remainder stays large enough to be useful and tiny fragments are avoided. The cost is that large blocks are used up quickly, leaving none for jobs with large memory demands.</p>\n<h5><span id=\"空闲内存块的索引搜索分配\"> Indexed Search</span></h5>\n<p>Free blocks can also be located through an index. Three common approaches are Quick Fit, the Buddy System, and hashing.</p>\n<p>Quick Fit pre-divides memory into blocks of fixed sizes chosen to match common process sizes. Merging on release is complicated, because a merged block does not necessarily match one of the predefined sizes, and internal fragmentation is unavoidable.</p>\n<p>The Buddy System repeatedly halves the memory space, which forms a binary tree of blocks. A request is served by splitting a larger block in half until the smallest block that fits is reached; when a block is released and its buddy is also free, the two are merged back into their parent block. More elaborate structures such as red-black trees or B-trees are usually not used for this purpose, since they would complicate the allocation logic and cost performance.</p>\n<p>Hashing uses a hash function to obtain the head pointer of a free-block list, so the matching list can be found quickly.</p>\n<h3><span id=\"离散内存分配方式\"> Non-Contiguous Memory Allocation</span></h3>\n<p>This covers paging and segmentation, which manage memory blocks through page-table and segment-table mappings.</p>\n<h4><span id=\"基本分页存储管理\"> Basic Paging</span></h4>\n<p>Paging divides physical memory into equal-size units called <strong>page frames</strong> (also physical blocks or frames), and divides the logical address space into regions of the same size called <strong>pages</strong>. The system allocates memory to processes in units of page frames.</p>\n<h5><span id=\"页与页框的映射方式\"> Mapping Pages to Page Frames</span></h5>\n<p>Each page in the logical address space has a unique <strong>page number</strong>, and each page frame in physical memory has a unique <strong>frame number</strong>.</p>\n<p>The mapping from page number to frame number is recorded in the <strong>page table</strong>. Each entry holds the frame number for one page together with status bits such as the valid bit and the accessed bit.</p>\n<p>A logical address consists of a page number and a page offset. The page number identifies the page that contains the address, and the offset gives the position relative to the start of that page. Together the two fields identify a single logical address.</p>\n<p>Translating a logical address into a physical address is the job of the <strong>address translation unit</strong>, which uses the page table to map the logical address into physical memory. A <strong>page-table register (PTR)</strong> holds the start address and length of the page table to speed up translation.</p>\n<p>Let the page size (which equals the frame size) be <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>L</mi></mrow><annotation encoding=\"application/x-tex\">L</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\">L</span></span></span></span>, and let the logical address be <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>A</mi></mrow><annotation encoding=\"application/x-tex\">A</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\">A</span></span></span></span>, the physical address be <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>E</mi></mrow><annotation encoding=\"application/x-tex\">E</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05764em;\">E</span></span></span></span>, and the page-table length be <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>M</mi></mrow><annotation encoding=\"application/x-tex\">M</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">M</span></span></span></span>.</p>\n<p>Address translation proceeds as follows:</p>\n<ul>\n<li>Compute the page number and offset from the logical address: page number <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>P</mi><mo>=</mo><mo stretchy=\"false\">⌊</mo><mfrac><mi>A</mi><mi>L</mi></mfrac><mo stretchy=\"false\">⌋</mo></mrow><annotation encoding=\"application/x-tex\">P = \\lfloor \\frac{A}{L} \\rfloor</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">P</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.2173em;vertical-align:-0.345em;\"></span><span class=\"mopen\">⌊</span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8723em;\"><span style=\"top:-2.655em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">L</span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.394em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">A</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.345em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mclose\">⌋</span></span></span></span> (the number of whole pages that precede the address), page offset <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>W</mi><mo>=</mo><mi>A</mi><mi mathvariant=\"normal\">%</mi><mi>L</mi></mrow><annotation encoding=\"application/x-tex\">W = A\\% L</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8056em;vertical-align:-0.0556em;\"></span><span class=\"mord mathnormal\">A</span><span class=\"mord\">%</span><span class=\"mord mathnormal\">L</span></span></span></span> (the position within the page).</li>\n<li>Bounds check: if <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>P</mi><mo>≥</mo><mi>M</mi></mrow><annotation encoding=\"application/x-tex\">P\\geq M</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8193em;vertical-align:-0.136em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">P</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">≥</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">M</span></span></span></span>, the page lies beyond the end of the page table and an out-of-bounds interrupt is raised.</li>\n<li>Look up page number <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>P</mi></mrow><annotation encoding=\"application/x-tex\">P</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">P</span></span></span></span> in the page table to obtain the frame number <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>T</mi></mrow><annotation encoding=\"application/x-tex\">T</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">T</span></span></span></span>, and compute the start address <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>b</mi></mrow><annotation encoding=\"application/x-tex\">b</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6944em;\"></span><span class=\"mord mathnormal\">b</span></span></span></span> of that frame.</li>\n<li>The physical address is the location in that frame at the same page offset.</li>\n</ul>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-text\"><span class=\"line\"><span>A = PL+W // logical side</span></span>\n<span class=\"line\"><span>T = σ(P) // page-table lookup</span></span>\n<span class=\"line\"><span>E = TL+W // physical side</span></span></code></pre>\n<p>If translation is slow or the page table is too large, the memory system's performance suffers. Techniques such as the TLB and multi-level page tables address these two problems.</p>\n<h5><span id=\"基于快表的地址变换机构\"> TLB-Based Address Translation</span></h5>\n<p>As noted in <a href=\"/2026/05/09/CS/%E8%AE%A1%E7%BB%84/%E8%99%9A%E6%8B%9F%E5%AD%98%E5%82%A8%E5%99%A8/\" title=\"虚拟存储器\">虚拟存储器</a> (the virtual memory post), the <strong>translation lookaside buffer (TLB)</strong> is a cache-based memory that supports fast parallel lookup.</p>\n<p>TLB entries take the same form as in-memory page-table entries, pairing a page number with a frame number. The TLB holds only the most frequently used entries, chosen by the principle of locality, which improves overall translation performance.</p>\n<p>With a TLB, the CPU first compares the page number against the entries in the TLB. On a hit it reads the frame number directly from the TLB; otherwise it falls back to the page table in memory.</p>\n<h5><span id=\"两级页表结构\"> Two-Level Page Tables</span></h5>\n<p>A two-level page table applies paging to the page table itself: the page table is split into page-sized pieces that are stored non-contiguously, and an outer table maps to each piece, so the page table is itself treated as an <strong>address space</strong> to be translated. In addition, only the page-table pieces in active use need to reside in memory; the rest can stay in secondary storage.</p>\n<p>Suppose a single page frame can hold <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>K</mi></mrow><annotation encoding=\"application/x-tex\">K</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">K</span></span></span></span> page-table entries.</p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-text\"><span class=\"line\"><span>P_1 = ⌊P_0/K⌋ // first-level (outer) index</span></span>\n<span class=\"line\"><span>P_2 = P_0 % K // second-level index</span></span>\n<span class=\"line\"><span>A = P_0L+W    // logical side -- P_0 is the overall logical page number</span></span>\n<span class=\"line\"><span>T = σ(P_0)    // page-table lookup via P_1, then P_2</span></span>\n<span class=\"line\"><span>E = TL+W      // physical side</span></span></code></pre>\n<p>In essence, a two-level page table groups pages into sets of K, so that</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mi>P</mi><mn>0</mn></msub><mo>=</mo><msub><mi>P</mi><mn>1</mn></msub><mi>K</mi><mo>+</mo><msub><mi>P</mi><mn>2</mn></msub></mrow><annotation encoding=\"application/x-tex\">P_0 = P_1K+P_2\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">P</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">0</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">P</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">K</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">P</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span></span></p>\n<p>The logical address therefore has three fields:</p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-text\"><span class=\"line\"><span>|first-level index|second-level index|page offset|</span></span></code></pre>\n<p>The hardware usually provides an outer page-table register that holds the start address of the first-level page table.</p>\n<h4><span id=\"基本分段存储管理\"> Basic Segmentation</span></h4>\n<p>Paging manages memory from the hardware's point of view, whereas segmentation manages it from the user's point of view.</p>\n<p>A segmented system divides a process's logical address space into segments of different sizes, such as the main program, subroutines, the stack, and data. Each segment is addressed from 0 and is given a contiguous range. A logical address is identified by a segment number and an offset within the segment.</p>\n<p>Logical address layout under segmentation:</p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-text\"><span class=\"line\"><span>|segment number S|segment offset W|</span></span></code></pre>\n<p>Segment-table entry layout:</p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-text\"><span class=\"line\"><span>|segment number|segment length|start address of the segment in main memory|</span></span></code></pre>\n<p>Each entry locates a segment in physical memory, and the segment address translation unit uses the segment table to map logical addresses to physical addresses.</p>\n<h5><span id=\"段与共享\"> Segments and Sharing</span></h5>\n<p>The system can maintain a shared segment table so that several processes can use the same segment. Code that never modifies itself can be executed by several processes concurrently, so the system places such pure code in read-only segments. Each process also keeps its own private data area, so private data is never exposed through sharing.</p>\n<h5><span id=\"段的保护\"> Segment Protection</span></h5>\n<ul>\n<li>Bounds protection -- the segment number must not exceed the segment-table length, and the offset must not exceed the segment length.</li>\n<li>Access-control protection -- access permission bits prevent illegal access.</li>\n</ul>\n<h4><span id=\"段页式存储管理\"> Segmented Paging</span></h4>\n<p>The process's address space is first divided into segments according to the program's logical structure, and each segment is then stored as pages. A logical address has three fields:</p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-text\"><span class=\"line\"><span>|segment number S|page number P|page offset W|</span></span></code></pre>\n<p>The segment part of the lookup goes through the segment table, exactly as in pure segmentation. The segment-table entry then leads to that segment's page table, which resolves the page number.</p>\n",
            "tags": []
        },
        {
            "id": "https://yuuko.site/2026/06/01/CS/LLM/moe/",
            "url": "https://yuuko.site/2026/06/01/CS/LLM/moe/",
            "title": "MoE -- Mixture of experts",
            "date_published": "2026-05-31T16:00:00.000Z",
            "content_html": "<link rel=\"stylesheet\" type=\"text/css\" href=\"https://cdn.jsdelivr.net/hint.css/2.4.1/hint.min.css\"><h1><span id=\"mixture-of-experts\"> Mixture of Experts</span></h1>\n",
            "tags": [
                "架构"
            ]
        },
        {
            "id": "https://yuuko.site/2026/06/01/CS/OS/deadlock/",
            "url": "https://yuuko.site/2026/06/01/CS/OS/deadlock/",
            "title": "Deadlock",
            "date_published": "2026-05-31T16:00:00.000Z",
            "content_html": "<link rel=\"stylesheet\" type=\"text/css\" href=\"https://cdn.jsdelivr.net/hint.css/2.4.1/hint.min.css\"><h1><span id=\"死锁\"> Deadlock</span></h1>\n<p>A deadlock arises when processes competing for resources form a chain of circular waits: each process holds some resources while waiting for resources held by another. When the wait-for chain closes into a directed cycle and every process in it is blocked, the system is deadlocked.</p>\n<h3><span id=\"死锁产生的原因\"> Causes of Deadlock</span></h3>\n<ul>\n<li>Competition for system resources -- deadlock can occur only when resources are insufficient and cannot be preempted.</li>\n<li>Illegal order of process advancement -- even when resources are sufficient, processes may request and release them in an unsafe order. Safety checks and the banker's algorithm are used to find a legal execution order.</li>\n</ul>\n<h3><span id=\"死锁产生的必要条件\"> Necessary Conditions for Deadlock</span></h3>\n<ul>\n<li>Mutual exclusion -- a resource can be used by only one process at a time; otherwise there is no competition for it.</li>\n<li>No preemption -- a resource cannot be taken away from the process that holds it; it can only be released voluntarily.</li>\n<li>Hold and wait -- a process whose request cannot be satisfied keeps the resources it already holds instead of releasing them.</li>\n<li>Circular wait -- there is a cycle of processes in which each waits for a resource held by the next.</li>\n</ul>\n<h2><span id=\"死锁的预防\"> Deadlock Prevention</span></h2>\n<p>Deadlock prevention rules out deadlock ahead of time by imposing restrictions that break at least one of the four necessary conditions.</p>\n<ul>\n<li>Break mutual exclusion -- largely infeasible; devices such as printers cannot be accessed by several processes at once.</li>\n<li>Break no preemption -- allow resources to be preempted. This can leave tasks in an inconsistent state, so it is usually infeasible as well.</li>\n<li>Break hold and wait -- a process may not request new resources while it holds non-preemptible ones.\n<ul>\n<li>Static allocation: all resources are granted at once before the process runs, and the process does not start if they are not all available. This can waste resources and cause starvation.</li>\n<li>Dynamic request and release: a process may request resources at run time, but it must finish with and release the resources it holds before requesting new ones.</li>\n</ul>\n</li>\n<li>Break circular wait -- number the resources and require processes to request them in ascending order, so a wait cycle cannot form.</li>\n</ul>\n<h2><span id=\"死锁避免\"> Deadlock Avoidance</span></h2>\n<p>Deadlock avoidance checks in software, at run time, whether the current allocation state still admits a safe process sequence.</p>\n<h3><span id=\"安全性检查\"> Safety Check</span></h3>\n<p>A safety check asks, for a given set of processes, whether the process sequence <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mo stretchy=\"false\">(</mo><msub><mi>P</mi><mn>1</mn></msub><mo separator=\"true\">,</mo><mo>⋯</mo><mtext> </mtext><mo separator=\"true\">,</mo><msub><mi>P</mi><mi>n</mi></msub><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">(P_1,\\cdots ,P_n)</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">P</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"minner\">⋯</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">P</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">n</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span></span></span></span> can run to completion, one process after another, without deadlock. Such a sequence is called a safe sequence. For a given allocation state, the system is safe if at least one safe sequence exists. Conversely, an unsafe system does not necessarily deadlock, because processes may never request their declared maximum or may release resources early.</p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-cpp\"><span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">#</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">include</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> &#x3C;</span><span 
style=\"color:#B56959;--shiki-dark:#C98A7D\">vector</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">></span></span>\n<span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">#</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">include</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> &#x3C;</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">stdexcept</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">></span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">int</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> num </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 3</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">  // 3 processes</span></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">int</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> resource </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 2</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"> // 2 mutually exclusive resource types</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#2E8F82;--shiki-dark:#5DA994\">vector</span><span style=\"color:#999999;--shiki-dark:#666666\">&#x3C;</span><span style=\"color:#59873A;--shiki-dark:#80A665\">std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#2E8F82;--shiki-dark:#5DA994\">vector</span><span style=\"color:#999999;--shiki-dark:#666666\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">int</span><span style=\"color:#999999;--shiki-dark:#666666\">></span><span 
style=\"color:#999999;--shiki-dark:#666666\">></span><span style=\"color:#59873A;--shiki-dark:#80A665\"> max</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2E8F82;--shiki-dark:#5DA994\">num</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#59873A;--shiki-dark:#80A665\">std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#59873A;--shiki-dark:#80A665\">vector</span><span style=\"color:#999999;--shiki-dark:#666666\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">int</span><span style=\"color:#999999;--shiki-dark:#666666\">></span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">resource</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#2E8F82;--shiki-dark:#5DA994\">vector</span><span style=\"color:#999999;--shiki-dark:#666666\">&#x3C;</span><span style=\"color:#59873A;--shiki-dark:#80A665\">std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#2E8F82;--shiki-dark:#5DA994\">vector</span><span style=\"color:#999999;--shiki-dark:#666666\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">int</span><span style=\"color:#999999;--shiki-dark:#666666\">></span><span style=\"color:#999999;--shiki-dark:#666666\">></span><span style=\"color:#59873A;--shiki-dark:#80A665\"> allocate</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2E8F82;--shiki-dark:#5DA994\">num</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#59873A;--shiki-dark:#80A665\">std</span><span 
style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#59873A;--shiki-dark:#80A665\">vector</span><span style=\"color:#999999;--shiki-dark:#666666\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">int</span><span style=\"color:#999999;--shiki-dark:#666666\">></span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">resource</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#2E8F82;--shiki-dark:#5DA994\">vector</span><span style=\"color:#999999;--shiki-dark:#666666\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">bool</span><span style=\"color:#999999;--shiki-dark:#666666\">></span><span style=\"color:#59873A;--shiki-dark:#80A665\"> finish</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2E8F82;--shiki-dark:#5DA994\">num</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#2E8F82;--shiki-dark:#5DA994\">false</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#2E8F82;--shiki-dark:#5DA994\">vector</span><span style=\"color:#999999;--shiki-dark:#666666\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">int</span><span style=\"color:#999999;--shiki-dark:#666666\">></span><span style=\"color:#59873A;--shiki-dark:#80A665\"> work</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span 
style=\"color:#2E8F82;--shiki-dark:#5DA994\">resource</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">   // currently available resources</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">for</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">int</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> i </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 0</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> i </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> num</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> i</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">++</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#123;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">     // in-order example: checks the sequence P0-P1-P2</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">    std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">vector</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">int</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">></span><span style=\"color:#59873A;--shiki-dark:#80A665\"> need</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">resource</span><span 
style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">    int</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> sign </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 0</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    for</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">int</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> j </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 0</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> j </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> resource</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> j</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">++</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#B07D48;--shiki-dark:#BD976A\">        need</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">j</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">]</span><span style=\"color:#999999;--shiki-dark:#666666\"> =</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> max</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">i</span><span 
style=\"color:#a65e2b;--shiki-dark:#d4976c\">]</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">j</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">]</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> -</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> allocate</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">i</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">]</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">j</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">]</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">        if</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\">need</span><span style=\"color:#a13865;--shiki-dark:#d9739f\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">j</span><span style=\"color:#a13865;--shiki-dark:#d9739f\">]</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">=</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> work</span><span style=\"color:#a13865;--shiki-dark:#d9739f\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">j</span><span style=\"color:#a13865;--shiki-dark:#d9739f\">]</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            sign </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">++</span><span 
style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">        </span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">&#125;</span></span>\n<span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">    </span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">&#125;</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    </span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    if</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">sign </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">==</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> resource</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#B07D48;--shiki-dark:#BD976A\">        finish</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">i</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">]</span><span style=\"color:#999999;--shiki-dark:#666666\"> =</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\"> true</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">        for</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">int</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> k </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 0</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> k 
</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> resource</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> k</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">++</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#B07D48;--shiki-dark:#BD976A\">            work</span><span style=\"color:#a13865;--shiki-dark:#d9739f\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">k</span><span style=\"color:#a13865;--shiki-dark:#d9739f\">]</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> +=</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> allocate</span><span style=\"color:#a13865;--shiki-dark:#d9739f\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">i</span><span style=\"color:#a13865;--shiki-dark:#d9739f\">]</span><span style=\"color:#a13865;--shiki-dark:#d9739f\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">k</span><span style=\"color:#a13865;--shiki-dark:#d9739f\">]</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">        </span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">&#125;</span></span>\n<span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">    </span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">&#125;</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    else</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">        throw</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span 
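style=\"color:#59873A;--shiki-dark:#80A665\">runtime_error</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">Unsafe process seq</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">    </span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">&#125;</span></span>\n<span class=\"line\"><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#125;</span></span>\n<span class=\"line\"></span></code></pre>\n<h3><span id=\"银行家算法\"> Banker's algorithm</span></h3>\n<p>The banker's algorithm does not need to enumerate every safe sequence. For a process whose request can be met with the currently available resources, the system tentatively allocates the resources and then runs the safety check on the resulting state: if the state is safe, the allocation is committed; otherwise it is rolled back.</p>\n<h2><span id=\"死锁检测\"> Deadlock detection</span></h2>\n<p>Deadlock detection determines, while processes are running, whether the system is currently deadlocked. It looks only at the current resource allocation and performs neither a safety check nor any prediction of future execution.</p>\n<p>Detection usually works by reducing a resource allocation graph. A directed edge from a process to a resource is a request edge (the process is requesting that resource); an edge from a resource to a process is an assignment edge (the resource has been allocated to that process).</p>\n<p><strong>Reduction</strong>: if a resource class has enough free units, reverse the request edge so that it becomes an assignment edge; if a process node has only incoming edges, all of its requests are satisfied, so it can run to completion, after which its incoming edges are deleted and its resources are released.</p>\n<p>If directed edges remain after the graph has been reduced as far as possible, the system is deadlocked. This gives the following theorem.</p>\n<p><strong>Deadlock theorem</strong>: the system is deadlocked if and only if its resource allocation graph cannot be completely reduced.</p>\n<h2><span id=\"死锁解除\"> Deadlock recovery</span></h2>\n<p>The operating system breaks an existing deadlock in one of the following ways:</p>\n<ul>\n<li>Resource preemption -- suspend deadlocked processes, take their resources, and give them to other processes. A process that stays suspended for a long time may starve.</li>\n<li>Process termination -- terminate deadlocked processes and reclaim their resources.</li>\n<li>Process rollback -- roll processes back to a state before the deadlock and continue along a safe sequence.</li>\n</ul>\n",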
            "tags": []
        },
        {
            "id": "https://yuuko.site/2026/05/29/Reading/kawabata/%E4%BC%8A%E8%B1%86%E7%9A%84%E8%88%9E%E5%A5%B3/",
            "url": "https://yuuko.site/2026/05/29/Reading/kawabata/%E4%BC%8A%E8%B1%86%E7%9A%84%E8%88%9E%E5%A5%B3/",
            "title": "《伊豆的舞女》 -- 川端康诚",
            "date_published": "2026-05-28T16:00:00.000Z",
            "content_html": "<link rel=\"stylesheet\" type=\"text/css\" href=\"https://cdn.jsdelivr.net/hint.css/2.4.1/hint.min.css\"><p>《伊豆的舞女》并不是我看过的第一本川端康诚的书，相比《雪国》、《千纸鹤》等刻作品刻画的日本压抑严肃社会背景下的扭曲的爱情，《伊豆的舞女》中的萍水相逢间平淡而朴素的好感与爱恋，却如山间的野花纯真，也如野花般易逝。</p>\n<div class=\"note default subtitle no-icon\">\n<p>在黑暗中，少年的体温温暖着我。我任凭泪泉涌流，我的头脑恍如变成一池清水，一滴滴溢了出来，后来什么也没有留下，顿时觉得舒畅了。</p>\n</div>\n",
            "tags": [
                "读书笔记",
                "川端康诚"
            ]
        },
        {
            "id": "https://yuuko.site/2026/05/29/CS/OS/%E8%BF%9B%E7%A8%8B%E7%9A%84%E5%90%8C%E6%AD%A5%E4%B8%8E%E4%BA%92%E6%96%A5/",
            "url": "https://yuuko.site/2026/05/29/CS/OS/%E8%BF%9B%E7%A8%8B%E7%9A%84%E5%90%8C%E6%AD%A5%E4%B8%8E%E4%BA%92%E6%96%A5/",
            "title": "进程的同步与互斥",
            "date_published": "2026-05-28T16:00:00.000Z",
            "content_html": "<link rel=\"stylesheet\" type=\"text/css\" href=\"https://cdn.jsdelivr.net/hint.css/2.4.1/hint.min.css\"><h1><span id=\"进程的同步与互斥\"> 进程的同步与互斥</span></h1>\n<p>进程会对临界资源的访问权限竞争。通常临界区只支持一个进程进行读/写，因为如果两条进程对临界区的相同数据进行写操作，有可能导致两个进程的发生先后不同，导致产生不同的执行结果。</p>\n<h3><span id=\"临界资源\"> 临界资源</span></h3>\n<p>临界资源是能被多个进程访问，但是被多个进程同时访问会产生冲突的资源。临界区是访问这一块共享资源的<strong>代码</strong>。</p>\n<p>进程对临界资源的访问分为四个部分</p>\n<ul>\n<li>进入区 -- 进入临界区前的检查部分</li>\n<li>临界区 -- 实际访问临界资源的代码段</li>\n<li>退出区 -- 离开临界区的释放资源的段</li>\n<li>剩余区 -- 其他部分</li>\n</ul>\n<h3><span id=\"进程的同步\"> 进程的同步</span></h3>\n<p>进程的同步指的是多个异步进程，需要彼此协调运行次序，而在执行流程上产生等待或者进程信息传递等制约关系。比如进程B需要进程A提供的相关数据，A在缓冲区写入信息、B在缓冲区读信息，那么进程A就是进程B的严格前序进程，这两个进程是同步的。</p>\n<h3><span id=\"进程的互斥\"> 进程的互斥</span></h3>\n<p>进程的互斥指的是，对于一个缓冲区有多个进程能进行读、写操作。当一个进程进入缓冲区后，其他需要访问缓冲区的进程只能等待当前进程结束才能进入缓冲区。</p>\n<p>避免多进程进入缓冲区的设计策略:</p>\n<ul>\n<li>空闲让进 -- 缓冲区空则放行</li>\n<li>忙则等待 -- 缓冲区有进程，其他进程就挂起等待</li>\n<li>有限等待 -- 挂起进程不能一直被挂起等待，应保证在有限时间内得到缓冲区资源</li>\n<li>让权等待 -- 等待期应当放弃CPU使用权，变为阻塞态，避免忙等待。具体体现在不应反复执行判断语句占用CPU资源。</li>\n</ul>\n<h2><span id=\"实现临界区互斥的方法\"> 实现临界区互斥的方法</span></h2>\n<p>实现临界区互斥的方法通常都需要基于上面四个设计策略构建，其中让权等待是节省CPU性能开销的可选择策略。</p>\n<p>实现方法分为软件实现方法与硬件实现方法。</p>\n<h3><span id=\"软件实现方法\"> 软件实现方法</span></h3>\n<h4><span id=\"单标志法\"> 单标志法</span></h4>\n<p>单标志法通过一个一个公共变量<code>turn</code>指示当前允许进入临界区的进程编号</p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-cpp\"><span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">#</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">include</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> &#x3C;</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">thread</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">></span></span>\n<span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">#</span><span 
style=\"color:#1E754F;--shiki-dark:#4D9375\">include</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> &#x3C;</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">iostream</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">></span></span>\n<span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">#</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">include</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> &#x3C;</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">atomic</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">></span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">atomic</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">int</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">></span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> turn </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 0</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">void</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> critical_section</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">int</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> id</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">    std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">cout </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">thread </span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> id </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\"> in critical section</span><span style=\"color:#A65E2B;--shiki-dark:#C99076\">\\n</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#125;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">void</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> remainder_section</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">int</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> id</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">    std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">cout </span><span 
style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">thread </span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> id </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\"> in remainder section</span><span style=\"color:#A65E2B;--shiki-dark:#C99076\">\\n</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#125;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">void</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> thread0</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    while</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">true</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span 
style=\"color:#1e754f;--shiki-dark:#4d9375\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">        while</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">turn </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">!=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 0</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">            // busy waiting</span></span>\n<span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">        </span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">&#125;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">        critical_section</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">0</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        turn </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 1</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">        remainder_section</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">0</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span 
style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">    </span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">&#125;</span></span>\n<span class=\"line\"><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#125;</span></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">void</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> thread1</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    while</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">true</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">        while</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">turn </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">!=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 1</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">&#123;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">        </span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">&#125;</span></span>\n<span class=\"line\"></span>\n<span 
class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">        critical_section</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        turn </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 0</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">        remainder_section</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">    </span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">&#125;</span></span>\n<span class=\"line\"><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#125;</span></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">int</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> main</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">    std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">thread </span><span style=\"color:#59873A;--shiki-dark:#80A665\">t0</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">thread0</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">    std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">thread </span><span style=\"color:#59873A;--shiki-dark:#80A665\">t1</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">thread1</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#B07D48;--shiki-dark:#BD976A\">    t0</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#59873A;--shiki-dark:#80A665\">join</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#B07D48;--shiki-dark:#BD976A\">    t1</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#59873A;--shiki-dark:#80A665\">join</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    return</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 0</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#125;</span></span></code></pre>\n<p>The two threads must enter the critical section in strict alternation as dictated by <code>turn</code>, so each depends on the other. If <code>thread0</code> stops requesting the critical section, 
<code>turn</code> ends up stuck at 0 and <code>thread1</code> can no longer enter, even though the critical section is free. This violates the <strong>enter when free (空闲让进)</strong> principle.</p>\n<h4><span id=\"双标志法\"> Double-flag method</span></h4>\n<p>In addition to <code>turn</code>, this method adds a Boolean array <code>flag[]</code> that records each process's intent to enter the critical section: <code>flag[i]=true</code> means that process i wants to enter.</p>\n<p><strong>Double-flag method, check first</strong></p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-cpp\"><span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">#</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">include</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">&#x3C;</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">iostream</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">></span></span>\n<span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">#</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">include</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">&#x3C;</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">thread</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">></span></span>\n<span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">#</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">include</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">&#x3C;</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">atomic</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">></span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">atomic</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">int</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">></span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> 
turn </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 0</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">atomic</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">bool</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">></span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> flag</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">2</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span><span style=\"color:#999999;--shiki-dark:#666666\"> =</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#123;</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">false</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">false</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#125;</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">void</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> critical_section</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">int</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> id</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">    std</span><span 
style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">cout </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">thread </span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> id </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\"> in critical section</span><span style=\"color:#A65E2B;--shiki-dark:#C99076\">\\n</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#125;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">void</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> remainder_section</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">int</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> id</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">    std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">cout </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">thread </span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> id </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\"> in remainder section</span><span style=\"color:#A65E2B;--shiki-dark:#C99076\">\\n</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#125;</span></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">void</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> thread0</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    while</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\">flag</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">[</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">]</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span 
style=\"color:#999999;--shiki-dark:#666666\"> ;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">   // first check whether the other process intends to enter</span></span>\n<span class=\"line\"><span style=\"color:#B07D48;--shiki-dark:#BD976A\">    flag</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">0</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#999999;--shiki-dark:#666666\"> =</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\"> true</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">    // then announce this process's own intent to enter</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">    critical_section</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">0</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#B07D48;--shiki-dark:#BD976A\">    flag</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">0</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#999999;--shiki-dark:#666666\"> =</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\"> false</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">    // clear the intent after leaving the critical section</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">    remainder_section</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">0</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#125;</span></span>\n<span 
class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">void</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> thread1</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    while</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\">flag</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">[</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">0</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">]</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\"> ;</span></span>\n<span class=\"line\"><span style=\"color:#B07D48;--shiki-dark:#BD976A\">    flag</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#999999;--shiki-dark:#666666\"> =</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\"> true</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">    critical_section</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#B07D48;--shiki-dark:#BD976A\">    flag</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#999999;--shiki-dark:#666666\"> 
=</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\"> false</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">    remainder_section</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#125;</span></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">int</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> main</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">    std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">thread </span><span style=\"color:#59873A;--shiki-dark:#80A665\">t0</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">thread0</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">    std</span><span 
style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">thread </span><span style=\"color:#59873A;--shiki-dark:#80A665\">t1</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">thread1</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#B07D48;--shiki-dark:#BD976A\">    t0</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#59873A;--shiki-dark:#80A665\">join</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#B07D48;--shiki-dark:#BD976A\">    t1</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#59873A;--shiki-dark:#80A665\">join</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    return</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 0</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#125;</span></span></code></pre>\n<p>The problem with the double-flag check-first method is that checking and setting are two separate steps: if both processes pass the intent check before either one sets its own flag, both can enter the critical section at the same time, which violates the <strong>wait-if-busy</strong> (mutual exclusion) principle.</p>\n<p><strong>Double-flag check-after method</strong></p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" 
style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-cpp\"><span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">/* The rest of the setup is the same as in the double-flag check-first method */</span></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">void</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> thread0</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#B07D48;--shiki-dark:#BD976A\">    flag</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">0</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#999999;--shiki-dark:#666666\"> =</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\"> true</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">      // first announce this process's intent to enter</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    while</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\">flag</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">[</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">]</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">      // then check whether the other process intends to enter</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">    critical_section</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">0</span><span 
style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#B07D48;--shiki-dark:#BD976A\">    flag</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">0</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#999999;--shiki-dark:#666666\"> =</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\"> false</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">    // clear the intent after leaving the critical section</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">    remainder_section</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">0</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#125;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">void</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> thread1</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#B07D48;--shiki-dark:#BD976A\">    flag</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#999999;--shiki-dark:#666666\"> =</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\"> true</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span 
style=\"color:#1E754F;--shiki-dark:#4D9375\">    while</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\">flag</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">[</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">0</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">]</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\"> ;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">    critical_section</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#B07D48;--shiki-dark:#BD976A\">    flag</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#999999;--shiki-dark:#666666\"> =</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\"> false</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">    remainder_section</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#125;</span></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">int</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> main</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span 
style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">    std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">thread </span><span style=\"color:#59873A;--shiki-dark:#80A665\">t0</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">thread0</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">    std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">thread </span><span style=\"color:#59873A;--shiki-dark:#80A665\">t1</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">thread1</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#B07D48;--shiki-dark:#BD976A\">    t0</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#59873A;--shiki-dark:#80A665\">join</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span 
style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#B07D48;--shiki-dark:#BD976A\">    t1</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#59873A;--shiki-dark:#80A665\">join</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    return</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 0</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#125;</span></span>\n<span class=\"line\"></span></code></pre>\n<p>The double-flag check-after method fixes the <strong>wait-if-busy</strong> (mutual exclusion) violation of the check-first method. However, if both processes set their flags before either one runs its check, each sees the other's flag and spins forever, so the critical section stays idle while both wait. This violates the <strong>enter-if-idle</strong> (progress) and <strong>bounded waiting</strong> principles.</p>\n<h4><span id=\"peterson-算法\"> Peterson's Algorithm</span></h4>\n<p>Peterson's algorithm combines the ideas behind <code>turn</code> and <code>flag</code>: <code>flag</code> records each process's intent to enter and enforces mutual exclusion, while <code>turn</code> breaks the tie when both processes want to enter at once, which prevents starvation.</p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-cpp\"><span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">#</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">include</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> &#x3C;</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">array</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">></span></span>\n<span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">#</span><span 
style=\"color:#1E754F;--shiki-dark:#4D9375\">include</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> &#x3C;</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">atomic</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">></span></span>\n<span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">#</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">include</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> &#x3C;</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">chrono</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">></span></span>\n<span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">#</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">include</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> &#x3C;</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">iostream</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">></span></span>\n<span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">#</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">include</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> &#x3C;</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">mutex</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">></span></span>\n<span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">#</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">include</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> &#x3C;</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">string</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">></span></span>\n<span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">#</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">include</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> &#x3C;</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">thread</span><span 
style=\"color:#B5695977;--shiki-dark:#C98A7D77\">></span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">using</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> namespace</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#59873A;--shiki-dark:#80A665\">chrono_literals</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">array</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#59873A;--shiki-dark:#80A665\">std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">atomic</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">bool</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">></span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 2</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">></span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> flag </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#123;</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">false</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\"> false</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#125;</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span 
style=\"color:#59873A;--shiki-dark:#80A665\">std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">atomic</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">int</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">></span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> turn</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#123;</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">0</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#125;</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">atomic</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">int</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">></span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> shared_counter</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#123;</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">0</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#125;</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">atomic</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">int</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">></span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> active_in_critical</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#123;</span><span 
style=\"color:#2F798A;--shiki-dark:#4C9A91\">0</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#125;</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">mutex output_mutex</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">void</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> log</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">int</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> process_id</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> const</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#2E8F82;--shiki-dark:#5DA994\">string</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x26;</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> message</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">    std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">lock_guard</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#59873A;--shiki-dark:#80A665\">std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">mutex</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">></span><span 
style=\"color:#59873A;--shiki-dark:#80A665\"> lock</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">output_mutex</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">    std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">cout </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">P</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> process_id </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\"> | </span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> message </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> '</span><span style=\"color:#A65E2B;--shiki-dark:#C99076\">\\n</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">'</span><span 
style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#125;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">void</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> peterson_lock</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">int</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> process_id</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">    const</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> int</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> other </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 1</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> -</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> process_id</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#B07D48;--shiki-dark:#BD976A\">    flag</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">process_id</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#59873A;--shiki-dark:#80A665\">store</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">true</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span 
style=\"color:#B07D48;--shiki-dark:#BD976A\">    turn</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#59873A;--shiki-dark:#80A665\">store</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">other</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">    log</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">process_id</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">wants to enter: flag[</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> +</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#59873A;--shiki-dark:#80A665\">to_string</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">process_id</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> +</span></span>\n<span class=\"line\"><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">                        \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">]=true, turn=</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> +</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#59873A;--shiki-dark:#80A665\">to_string</span><span 
style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">other</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">    bool</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> printed_wait_message </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\"> false</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    while</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\">flag</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">other</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">]</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#59873A;--shiki-dark:#80A665\">load</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> &#x26;&#x26;</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> turn</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#59873A;--shiki-dark:#80A665\">load</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> ==</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> other</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\"> 
</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">        if</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">!</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">printed_wait_message</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">            log</span><span style=\"color:#a13865;--shiki-dark:#d9739f\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">process_id</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">waits because flag[</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> +</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#59873A;--shiki-dark:#80A665\">to_string</span><span style=\"color:#bda437;--shiki-dark:#e6cc77\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">other</span><span style=\"color:#bda437;--shiki-dark:#e6cc77\">)</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> +</span></span>\n<span class=\"line\"><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">                                \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">]=true and turn=</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> +</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> std</span><span 
style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#59873A;--shiki-dark:#80A665\">to_string</span><span style=\"color:#bda437;--shiki-dark:#e6cc77\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">other</span><span style=\"color:#bda437;--shiki-dark:#e6cc77\">)</span><span style=\"color:#a13865;--shiki-dark:#d9739f\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            printed_wait_message </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\"> true</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">        </span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">&#125;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">        std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#59873A;--shiki-dark:#80A665\">this_thread</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#59873A;--shiki-dark:#80A665\">sleep_for</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">20</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">ms</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">    </span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">&#125;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    if</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">printed_wait_message</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">        log</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">process_id</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">can enter now</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">    </span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">&#125;</span></span>\n<span class=\"line\"><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#125;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">void</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> peterson_unlock</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">int</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> process_id</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#B07D48;--shiki-dark:#BD976A\">    flag</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">process_id</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span 
style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#59873A;--shiki-dark:#80A665\">store</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">false</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">    log</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">process_id</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">leaves: flag[</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> +</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#59873A;--shiki-dark:#80A665\">to_string</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">process_id</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> +</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">]=false</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#125;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">void</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> critical_section</span><span 
style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">int</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> process_id</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> int</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> round</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">    const</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> int</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> already_inside </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> active_in_critical</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#59873A;--shiki-dark:#80A665\">fetch_add</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    if</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">already_inside </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">!=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 0</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">        log</span><span 
style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">process_id</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">ERROR: mutual exclusion was broken</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">    </span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">&#125;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">    log</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">process_id</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">enters critical section, round </span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> +</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#59873A;--shiki-dark:#80A665\">to_string</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">round</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">    const</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> int</span><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> before </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> shared_counter</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#59873A;--shiki-dark:#80A665\">load</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">    std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#59873A;--shiki-dark:#80A665\">this_thread</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#59873A;--shiki-dark:#80A665\">sleep_for</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">80</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">ms</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#B07D48;--shiki-dark:#BD976A\">    shared_counter</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#59873A;--shiki-dark:#80A665\">store</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">before </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">+</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 1</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">    log</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">process_id</span><span 
style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">updates shared_counter: </span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> +</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#59873A;--shiki-dark:#80A665\">to_string</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">before</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> +</span></span>\n<span class=\"line\"><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">                        \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\"> -> </span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> +</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#59873A;--shiki-dark:#80A665\">to_string</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">before </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">+</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 1</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">    log</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">process_id</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span 
style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">exits critical section, round </span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> +</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#59873A;--shiki-dark:#80A665\">to_string</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">round</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#B07D48;--shiki-dark:#BD976A\">    active_in_critical</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#59873A;--shiki-dark:#80A665\">fetch_sub</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#125;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">void</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> process</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">int</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> process_id</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> int</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> rounds</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span 
style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    for</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">int</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> round </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 1</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> round </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> rounds</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> ++</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">round</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">        peterson_lock</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">process_id</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">        critical_section</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">process_id</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> round</span><span 
style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">        peterson_unlock</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">process_id</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">        std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#59873A;--shiki-dark:#80A665\">this_thread</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#59873A;--shiki-dark:#80A665\">sleep_for</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">process_id </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">==</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 0</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> ?</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 35</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">ms</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> :</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 55</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">ms</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">    </span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">&#125;</span></span>\n<span class=\"line\"><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#125;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">int</span><span 
style=\"color:#59873A;--shiki-dark:#80A665\"> main</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">    constexpr</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> int</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> rounds </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 2</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">    std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">cout </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">Peterson algorithm demo: two threads, one critical section</span><span style=\"color:#A65E2B;--shiki-dark:#C99076\">\\n</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">    std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">cout </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">----------------------------------------------------------</span><span 
style=\"color:#A65E2B;--shiki-dark:#C99076\">\\n</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">    std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">thread </span><span style=\"color:#59873A;--shiki-dark:#80A665\">p0</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">process</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 0</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> rounds</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">    std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">thread </span><span style=\"color:#59873A;--shiki-dark:#80A665\">p1</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">process</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 1</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> rounds</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#B07D48;--shiki-dark:#BD976A\">    p0</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span 
style=\"color:#59873A;--shiki-dark:#80A665\">join</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#B07D48;--shiki-dark:#BD976A\">    p1</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#59873A;--shiki-dark:#80A665\">join</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">    std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">cout </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">----------------------------------------------------------</span><span style=\"color:#A65E2B;--shiki-dark:#C99076\">\\n</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">    std</span><span style=\"color:#999999;--shiki-dark:#666666\">::</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">cout </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">Final shared_counter = </span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> </span><span 
style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> shared_counter</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#59873A;--shiki-dark:#80A665\">load</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">              </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\"> (expected </span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> rounds </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 2</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> \"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">)</span><span style=\"color:#A65E2B;--shiki-dark:#C99076\">\\n</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    return</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 0</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span 
style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#125;</span></span></code></pre>\n<p>The <code>while</code> loop in Peterson's algorithm keeps polling to check whether the process is allowed to enter the critical section (process 0, for example, keeps waiting as long as <code>flag[1] == true</code> and <code>turn == 1</code>). A waiting process therefore busy-waits instead of giving up the CPU, which violates the <strong>yield while waiting</strong> principle. Even so, Peterson's algorithm is the most complete of the pure-software mutual exclusion schemes.</p>\n<h3><span id=\"硬件实现算法\"> Hardware-based approaches</span></h3>\n<p>Hardware-based approaches amount to placing a lock on the critical resource.</p>\n<h4><span id=\"中断屏蔽\"> Interrupt masking</span></h4>\n<p>On a single-processor system only one process runs at a time, and a process switch can only be triggered by an interrupt. Disabling interrupts before entering the critical section and re-enabling them on exit therefore guarantees that no process switch occurs while the process is inside, so its access to the critical section cannot be disturbed.</p>\n<h4><span id=\"硬件指令-testandset与-swap\"> Hardware instructions: <code>TestAndSet</code> and <code>Swap</code></span></h4>\n<p>The <code>TestAndSet</code> primitive implements locking: in a single atomic step it reads the old value of the lock and sets the lock to <code>true</code>.</p>\n<p>Expressed in C++-style pseudocode:</p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-cpp\"><span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">#</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">include</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\"> &#x3C;</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">atomic</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">></span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">using</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> namespace</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> std</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">// Atomically set lock to true and return its previous value.</span></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">bool</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> TestAndSet</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2E8F82;--shiki-dark:#5DA994\">atomic</span><span 
style=\"color:#999999;--shiki-dark:#666666\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">bool</span><span style=\"color:#999999;--shiki-dark:#666666\">></span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x26;</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> lock</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    return</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> lock</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#59873A;--shiki-dark:#80A665\">exchange</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">true</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> memory_order_acquire</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#125;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">void</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> Unlock</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2E8F82;--shiki-dark:#5DA994\">atomic</span><span style=\"color:#999999;--shiki-dark:#666666\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">bool</span><span style=\"color:#999999;--shiki-dark:#666666\">></span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x26;</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> lock</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span 
style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#B07D48;--shiki-dark:#BD976A\">    lock</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#59873A;--shiki-dark:#80A665\">store</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">false</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> memory_order_release</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#125;</span></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">int</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> main</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    ...</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    while</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#59873A;--shiki-dark:#80A665\">TestAndSet</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">lock</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">        // busy wait: the lock was already held</span></span>\n<span class=\"line\"><span 
style=\"color:#999999;--shiki-dark:#666666\">    </span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">&#125;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">    fun_</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">       // critical section, executed while holding the lock</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">    Unlock</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">lock</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"> // unlock </span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">    fun_other</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#125;</span></span></code></pre>\n<p>The <code>Swap</code> instruction atomically exchanges the values of two variables.</p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-cpp\"><span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">#</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">include</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">&#x3C;</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">atomic</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">></span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">using</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> 
namespace</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> std</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">void</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> Swap</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2E8F82;--shiki-dark:#5DA994\">atomic</span><span style=\"color:#999999;--shiki-dark:#666666\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">bool</span><span style=\"color:#999999;--shiki-dark:#666666\">></span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x26;</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> a</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#2E8F82;--shiki-dark:#5DA994\"> atomic</span><span style=\"color:#999999;--shiki-dark:#666666\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">bool</span><span style=\"color:#999999;--shiki-dark:#666666\">></span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x26;</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> b</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#123;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"> // pseudocode: the hardware performs this exchange as one atomic instruction</span></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">    bool</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> tmp </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> a</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    a </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> b</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#59873A;--shiki-dark:#80A665\">load</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    b </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> tmp</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#125;</span></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">int</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> main</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    atomic</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">bool</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">></span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> key </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\"> true</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    atomic</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">bool</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">></span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> lock </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\"> false</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    while</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">key </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">!=</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\"> false</span><span 
style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">        Swap</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">lock</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> key</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">    </span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">&#125;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">    fun_</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"> // critical section</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    lock </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\"> false</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"> // unlock</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">    fun_other</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#125;</span></span></code></pre>\n<p>Here <code>Swap</code> atomically exchanges the process-local <code>key</code> with the shared <code>lock</code> that guards the critical section. A process starts with <code>key = true</code>. If <code>lock</code> is <code>false</code>, the exchange sets <code>lock</code> to <code>true</code> (locked) and leaves <code>key</code> at <code>false</code>, so the loop ends and the process enters. While the lock is held, every other process only swaps <code>true</code> for <code>true</code> and keeps looping, so it cannot enter. The drawback of locks built on <code>TS</code> or <code>Swap</code> is that processes outside the critical section must keep re-checking <code>key</code>, which is busy waiting.</p>\n<h4><span id=\"互斥锁\"> 
Mutex locks</span></h4>\n<p>A mutex lock (here a spin lock) follows the same logic as the <code>TS</code>/<code>Swap</code> instructions: <code>acquire()</code> and <code>release()</code> are built on one of these instructions to lock and unlock the critical section, and <code>acquire()</code> must test and set the flag in a single atomic step.</p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-cpp\"><span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">atomic</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">bool</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">></span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> available </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\"> true</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">void</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> acquire</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    while</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">!</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\">available</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#59873A;--shiki-dark:#80A665\">exchange</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">false</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"> // atomic test-and-set: spin until the old value was true</span></span>\n<span class=\"line\"><span 
style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#125;</span></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">void</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> release</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    available </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\"> true</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#125;</span></span></code></pre>\n<h3><span id=\"实际场景下的同步算法\"> Synchronization in practical scenarios</span></h3>\n<h4><span id=\"信号量机制\"> Semaphores</span></h4>\n<p>A semaphore generalizes the binary held/free state of a mutex lock to an integer count. It can represent the number of available resources or cap the number of concurrent accesses. Its acquire/release operations are atomic combinations of checking, decrementing and incrementing the count plus a wait/wake-up mechanism, not just a plain integer comparison.</p>\n<p>A semaphore can only be accessed through two standard primitives, <code>wait()</code> and <code>signal()</code>, also known as the P and V operations.</p>\n<ul>\n<li>Integer semaphore<br>\nThe semaphore is a single integer S.</li>\n</ul>\n<p>Pseudocode:</p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-text\"><span class=\"line\"><span>wait(S)&#123;</span></span>\n<span class=\"line\"><span>    while(S&#x3C;=0)&#123;</span></span>\n<span class=\"line\"><span>         // busy waiting</span></span>\n<span class=\"line\"><span>    &#125;</span></span>\n<span class=\"line\"><span>    S--;</span></span>\n<span class=\"line\"><span>&#125;</span></span>\n<span class=\"line\"><span>signal(S)&#123;</span></span>\n<span class=\"line\"><span>    S++;</span></span>\n<span class=\"line\"><span>&#125;</span></span></code></pre>\n<ul>\n<li>Record semaphore</li>\n</ul>\n<p>The semaphore is a struct that combines an integer S with a linked list L of waiting processes, so a process that cannot proceed is blocked instead of busy waiting.</p>\n<pre class=\"shiki shiki-themes 
vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-cpp\"><span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">struct</span><span style=\"color:#2E8F82;--shiki-dark:#5DA994\"> semaphore</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">    int</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> val</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    llist</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">process</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">></span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> *</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">L</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#125;</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">void</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> wait</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2E8F82;--shiki-dark:#5DA994\">semaphore</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> S</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#B07D48;--shiki-dark:#BD976A\">    S</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\">val</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> --</span><span 
style=\"color:#999999;--shiki-dark:#666666\"> ;</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    if</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\">S</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\">val</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 0</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">        // no resource left: add this process to L and block it</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">        block</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\">S</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\">L</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">    </span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">&#125;</span></span>\n<span class=\"line\"><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#125;</span></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">void</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> signal</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2E8F82;--shiki-dark:#5DA994\">semaphore</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> S</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span 
style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#B07D48;--shiki-dark:#BD976A\">    S</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\">val</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> ++</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    if</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\">S</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\">val</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 0</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">        // a process is still waiting: remove one from L and wake it up</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">        wakeup</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\">S</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\">L</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">    </span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">&#125;</span></span>\n<span class=\"line\"><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#125;</span></span>\n<span class=\"line\"></span></code></pre>\n<h5><span id=\"信号量与互斥\"> 
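Semaphores and mutual exclusion</span></h5>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-text\"><span class=\"line\"><span>semaphore S = 1; // initialize to 1 for mutual exclusion (an initial value n would admit up to n threads at once)</span></span>\n<span class=\"line\"><span>thread P1()&#123;</span></span>\n<span class=\"line\"><span>    P(S);    // lock</span></span>\n<span class=\"line\"><span>    fun();   // critical section</span></span>\n<span class=\"line\"><span>    V(S);    // unlock</span></span>\n<span class=\"line\"><span>&#125;</span></span></code></pre>\n<h5><span id=\"信号量与同步\"> Semaphores and synchronization</span></h5>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-text\"><span class=\"line\"><span>semaphore S = 0; // initial value 0: nothing has been signalled yet</span></span>\n<span class=\"line\"><span></span></span>\n<span class=\"line\"><span>thread P1()&#123;</span></span>\n<span class=\"line\"><span>    fun1();</span></span>\n<span class=\"line\"><span>    V(S);</span></span>\n<span class=\"line\"><span>&#125;</span></span>\n<span class=\"line\"><span></span></span>\n<span class=\"line\"><span>thread P2()&#123;</span></span>\n<span class=\"line\"><span>    P(S);    // waits until P1 has executed V(S)</span></span>\n<span class=\"line\"><span>    fun2();</span></span>\n<span class=\"line\"><span>&#125;</span></span></code></pre>\n<p>S only becomes 1 after P1 has finished <code>fun1()</code> and executed <code>V(S)</code>. Until then P2 waits inside <code>P(S)</code>, so <code>fun2()</code> always runs after <code>fun1()</code>. The semaphore thus fixes the precedence relation between the two processes.</p>\n<h5><span id=\"信号量与计算图\"> Semaphores and computation graphs</span></h5>\n<p>Semaphores can describe precedence relations between program segments, so they can realize a task flow shaped like a directed graph in which every edge corresponds to one synchronization problem. Giving each edge its own semaphore manages the individual precedence relations, and together they build the process precedence graph.</p>\n<p>A computation graph in a neural network is a DAG whose vertices are operator nodes and whose directed edges are tensor data flows. Since data dependencies naturally induce execution dependencies, it can be viewed as a precedence graph. From a parallel scheduling perspective, each dependency edge can be abstracted as a synchronization constraint, and a semaphore is one mechanism for expressing such a constraint.</p>\n<h4><span id=\"生产者-消费者问题\"> 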
Producer-consumer problem</span></h4>\n<p>The system contains several producers and consumers that share a buffer of n slots, and all reads and writes of the buffer must be mutually exclusive.</p>\n<p>In this scenario a process only needs to know whether the buffer may be accessed and whether it holds a free slot or a product. The mutex semaphore <code>mutex</code> enforces exclusive access, the counting semaphore <code>empty</code> holds the number of free slots, and the counting semaphore <code>full</code> holds the number of products currently in the buffer.</p>\n<p>C++ pseudocode:</p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-cpp\"><span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">semaphore mutex </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 1</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">  // mutual exclusion -- only one process may access the buffer at a time</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">semaphore empty </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> n</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">  // free slots -- initially the buffer capacity n</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">semaphore full </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 0</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">   // filled slots -- products in the buffer, initially 0</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">void</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> producer</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    while</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span 
style=\"color:#1E754F;--shiki-dark:#4D9375\">true</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">        produce</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">        // produce an item</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">        P</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">empty</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">         // acquire a free slot -- free slots -1</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">        P</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">mutex</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">         // lock the buffer -- no other process may access it</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">        put_product</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">    // put the item into the buffer </span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">        V</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">mutex</span><span 
style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">         // unlock the buffer </span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">        V</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">full</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">          // one more filled slot -- products +1</span></span>\n<span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">    </span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">&#125;</span></span>\n<span class=\"line\"><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#125;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">void</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> consumer</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    while</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">true</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">        P</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">full</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">          // acquire a filled slot -- products 
-1</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">        P</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">mutex</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">         // lock the buffer</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">        use_product</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">    // take an item from the buffer and consume it</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">        V</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">mutex</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">         // unlock the buffer</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">        V</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">empty</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">         // release a free slot -- free slots +1</span></span>\n<span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">    </span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">&#125;</span></span>\n<span class=\"line\"><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#125;</span></span>\n<span class=\"line\"></span></code></pre>\n<h4><span id=\"读者-写者问题\"> 
Readers-writers problem</span></h4>\n<p>The readers-writers problem is a variant of the producer-consumer problem. Writers write exclusively and readers may read in parallel, but reading and writing must never happen at the same time.</p>\n<p>The problem comes in two priority variants:</p>\n<ul>\n<li>Reader priority -- readers can keep taking the buffer: while at least one reader is reading, newly arriving readers enter ahead of any waiting writer</li>\n<li>Writer priority -- writers have non-preemptive priority: a writer still has to wait while readers are reading</li>\n</ul>\n<h5><span id=\"读者优先\"> Reader priority</span></h5>\n<p>Writers are admitted through the semaphore <code>rw = 1</code>. The reader side maintains a counter <code>count</code>, protected by <code>mutex</code>, that records how many readers are currently reading. The first reader locks <code>rw</code> and the last reader releases it, so a writer can enter the buffer only when the reader count has dropped to zero.</p>\n<p>C++ pseudocode:</p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-cpp\"><span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">int</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> count </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 0</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">        // number of readers currently reading</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">semaphore mutex </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 1</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">  // protects count</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">semaphore rw </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 1</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">     // held by a writer, or by the group of readers</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">void</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> reader</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    while</span><span 
style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">true</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">        P</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">mutex</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">   // lock the reader counter</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">        if</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">count </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">==</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 0</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">            P</span><span style=\"color:#a13865;--shiki-dark:#d9739f\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">rw</span><span style=\"color:#a13865;--shiki-dark:#d9739f\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">  // the first reader locks out writers</span></span>\n<span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">        </span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">&#125;</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        count </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">++</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span 
class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">        V</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">mutex</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">   // unlock the reader counter</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">        read</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">        P</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">mutex</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        count </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">--</span><span style=\"color:#999999;--shiki-dark:#666666\"> ;</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">        if</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">count </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">==</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 0</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">            V</span><span style=\"color:#a13865;--shiki-dark:#d9739f\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">rw</span><span style=\"color:#a13865;--shiki-dark:#d9739f\">)</span><span 
style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">        </span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">&#125;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">        V</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">mutex</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">    </span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">&#125;</span></span>\n<span class=\"line\"><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#125;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">void</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> writer</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    while</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">true</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">        P</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">rw</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">      // acquire rw; blocks while any reader is active</span></span>\n<span 
class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">        write</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">        V</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">rw</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">    </span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">&#125;</span></span>\n<span class=\"line\"><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#125;</span></span></code></pre>\n<h5><span id=\"写者优先\"> Writer Priority</span></h5>\n<p>The writer-priority scheme needs one more semaphore, <code>w</code>, which gives a writer exclusive access to the file.</p>\n<p>C++-style pseudocode:</p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-cpp\"><span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">int</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> count </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 0</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">       // number of active readers</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">semaphore mutex </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 1</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"> // lock for count</span></span>\n<span class=\"line\"><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">semaphore rw </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 1</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">    // gate that lets a waiting writer go ahead of new readers</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">semaphore w </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 1</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">     // exclusive write access to the file</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">void</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> writer</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    while</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">true</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">        P</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">rw</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">      // block newly arriving readers</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">        P</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">w</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">       // wait for exclusive write access</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">        write</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">        V</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">w</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">        V</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">rw</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">      // reopen the gate for readers and other writers</span></span>\n<span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">    </span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">&#125;</span></span>\n<span class=\"line\"><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#125;</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">void</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> reader</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    while</span><span 
style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">true</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">        P</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">rw</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">      // request entry; blocks while a writer is waiting or writing</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">        P</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">mutex</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">   // lock the reader counter</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">        if</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">count </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">==</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 0</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">            P</span><span style=\"color:#a13865;--shiki-dark:#d9739f\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">w</span><span style=\"color:#a13865;--shiki-dark:#d9739f\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">   // 
the first reader locks out writers (non-preemptive priority)</span></span>\n<span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">        </span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">&#125;</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        count </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">++</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">        V</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">mutex</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">        V</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">rw</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">        read</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">        P</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">mutex</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        count </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">--</span><span style=\"color:#999999;--shiki-dark:#666666\"> ;</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\"> 
       if</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">count </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">==</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 0</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">            V</span><span style=\"color:#a13865;--shiki-dark:#d9739f\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">w</span><span style=\"color:#a13865;--shiki-dark:#d9739f\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">        </span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">&#125;</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">        V</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">mutex</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">    </span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">&#125;</span></span>\n<span class=\"line\"><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#125;</span></span>\n<span class=\"line\"></span></code></pre>\n<h4><span id=\"哲学家进餐问题\"> The Dining Philosophers Problem</span></h4>\n<p>Philosophers sit around a round table, each with one chopstick on the left and one on the right, shared with the neighbors. To eat, a philosopher must pick up both chopsticks; if either one is unavailable, the philosopher waits.</p>\n<p>Designate one philosopher as philosopher <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>0</mn></mrow><annotation encoding=\"application/x-tex\">0</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span 
class=\"mord\">0</span></span></span></span> and number the others counterclockwise. With five philosophers, the chopsticks on the left and right of philosopher <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>i</mi></mrow><annotation encoding=\"application/x-tex\">i</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6595em;\"></span><span class=\"mord mathnormal\">i</span></span></span></span> are then numbered</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mo fence=\"true\">{</mo><mtable rowspacing=\"0.36em\" columnalign=\"left left\" columnspacing=\"1em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mspace width=\"1em\"><mi>i</mi></mspace></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mtext>left chopstick</mtext></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mo stretchy=\"false\">(</mo><mi>i</mi><mo>+</mo><mn>1</mn><mo stretchy=\"false\">)</mo><mi mathvariant=\"normal\">%</mi><mn>5</mn></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mtext>right chopstick</mtext></mstyle></mtd></mtr></mtable></mrow><annotation encoding=\"application/x-tex\">\\begin{dcases}\n\\quad  i &amp; \\text{left chopstick}\\\\\n(i+1)\\% 5 &amp;\\text{right chopstick}\n\\end{dcases}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:3em;vertical-align:-1.25em;\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size4\">{</span></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-l\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.69em;\"><span style=\"top:-3.69em;\"><span class=\"pstrut\" 
style=\"height:3.008em;\"></span><span class=\"mord\"><span class=\"mspace\" style=\"margin-right:1em;\"></span><span class=\"mord mathnormal\">i</span></span></span><span style=\"top:-2.25em;\"><span class=\"pstrut\" style=\"height:3.008em;\"></span><span class=\"mord\"><span class=\"mopen\">(</span><span class=\"mord mathnormal\">i</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\">1</span><span class=\"mclose\">)</span><span class=\"mord\">%5</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.19em;\"><span></span></span></span></span></span><span class=\"arraycolsep\" style=\"width:1em;\"></span><span class=\"col-align-l\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.69em;\"><span style=\"top:-3.69em;\"><span class=\"pstrut\" style=\"height:3.008em;\"></span><span class=\"mord\"><span class=\"mord text\"><span class=\"mord\">left chopstick</span></span></span></span><span style=\"top:-2.25em;\"><span class=\"pstrut\" style=\"height:3.008em;\"></span><span class=\"mord\"><span class=\"mord text\"><span class=\"mord\">right chopstick</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.19em;\"><span></span></span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<p>With <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>n</mi></mrow><annotation encoding=\"application/x-tex\">n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord 
mathnormal\">n</span></span></span></span> philosophers and <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>n</mi></mrow><annotation encoding=\"application/x-tex\">n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\">n</span></span></span></span> chopsticks, a deadlock can occur in which every philosopher holds exactly one chopstick: if all of them hold only their left chopstick (or only their right one), nobody can ever pick up a second chopstick.</p>\n<p>If at most <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>n</mi><mo>−</mo><mn>1</mn></mrow><annotation encoding=\"application/x-tex\">n-1</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\">n</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">1</span></span></span></span> philosophers are allowed to compete for chopsticks at any one time, then by the pigeonhole principle at least one of them can obtain two chopsticks. Once that philosopher finishes eating and releases both chopsticks, the others can continue as well. Following this idea, initializing the semaphore that counts diners to <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>n</mi><mo>−</mo><mn>1</mn></mrow><annotation encoding=\"application/x-tex\">n-1</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\">n</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" 
style=\"height:0.6444em;\"></span><span class=\"mord\">1</span></span></span></span> avoids deadlock.</p>\n",
            "tags": [
                "OS",
                "进程"
            ]
        },
        {
            "id": "https://yuuko.site/2026/05/25/CS/OS/%E8%BF%9B%E7%A8%8B%E4%B8%8E%E7%BA%BF%E7%A8%8B/",
            "url": "https://yuuko.site/2026/05/25/CS/OS/%E8%BF%9B%E7%A8%8B%E4%B8%8E%E7%BA%BF%E7%A8%8B/",
            "title": "进程与线程",
            "date_published": "2026-05-24T16:00:00.000Z",
            "content_html": "<link rel=\"stylesheet\" type=\"text/css\" href=\"https://cdn.jsdelivr.net/hint.css/2.4.1/hint.min.css\"><h1><span id=\"进程与线程\"> 进程与线程</span></h1>\n<p>进程是操作系统进行资源分配与调度的基本单位。</p>\n<h3><span id=\"进程的特征\"> 进程的特征</span></h3>\n<ul>\n<li>动态性 -- 进程有明确的生命周期，执行状态存在变化</li>\n<li>并发性 -- 多个进程可以在内存中存在于内存并在CPU中并发</li>\n<li>独立性 -- 作为申请系统资源的独立单元存在</li>\n<li>异步性 -- 进程间相互制约，申请CPU资源是先后进行的。这导致了实际执行的结果可能不可复现</li>\n</ul>\n<h2><span id=\"并行的效率\"> 并行的效率</span></h2>\n<p>Amdahl’s Law与 Gustafson’s Law 讨论了并行计算能提升多大的计算效率</p>\n<h3><span id=\"amdahls-law\"> Amdahl's Law</span></h3>\n<p>Amdahl's Law 讨论在<strong>等量任务</strong>的条件下，并行计算带来的计算效率优化上线。</p>\n<p>假设一个程序中可并行部分的比例为 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>p</mi></mrow><annotation encoding=\"application/x-tex\">p</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.625em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\">p</span></span></span></span>, 可以 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>N</mi></mrow><annotation encoding=\"application/x-tex\">N</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">N</span></span></span></span> 路并行， 那么整体加速比为</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mi>S</mi><mi>N</mi></msub><mo>=</mo><mfrac><mn>1</mn><mrow><mo stretchy=\"false\">(</mo><mn>1</mn><mo>−</mo><mi>p</mi><mo stretchy=\"false\">)</mo><mo>+</mo><mfrac><mi>p</mi><mi>N</mi></mfrac></mrow></mfrac></mrow><annotation 
encoding=\"application/x-tex\">S_N = \\frac{1}{(1-p)+\\frac{p}{N}}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.05764em;\">S</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0576em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.3524em;vertical-align:-1.031em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3214em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mopen\">(</span><span class=\"mord\">1</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord mathnormal\">p</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span 
class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7475em;\"><span style=\"top:-2.655em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.4461em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.345em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.031em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<p>Amdahl's Law shows that the speedup from parallelization is capped by the serial fraction of the program. As <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>N</mi><mo>→</mo><mi mathvariant=\"normal\">∞</mi></mrow><annotation encoding=\"application/x-tex\">N\\to \\infty</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span 
class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">N</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">→</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord\">∞</span></span></span></span>, the speedup approaches its maximum value</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mi>S</mi><mi mathvariant=\"normal\">∞</mi></msub><mo>=</mo><mfrac><mn>1</mn><mrow><mn>1</mn><mo>−</mo><mi>p</mi></mrow></mfrac></mrow><annotation encoding=\"application/x-tex\">S_\\infty = \\frac{1}{1-p}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.05764em;\">S</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:-0.0576em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">∞</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.2019em;vertical-align:-0.8804em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:1.3214em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">1</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord mathnormal\">p</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8804em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<h3><span id=\"gustafsons-law\"> Gustafson’s Law</span></h3>\n<p>Gustafson’s Law considers the case where the <strong>execution time is held fixed</strong> and asks how much more work parallel computing can complete in that time, i.e. the upper bound on the scaled speedup.</p>\n<p>Suppose the parallelizable fraction of a program is <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>p</mi></mrow><annotation encoding=\"application/x-tex\">p</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.625em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\">p</span></span></span></span>. 
Then the amount of work completed in the same time with N processors (the scaled speedup) is</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mi>S</mi><mi>N</mi></msub><mo>=</mo><mn>1</mn><mo>−</mo><mi>p</mi><mo>+</mo><mi>N</mi><mi>p</mi><mo>=</mo><mo stretchy=\"false\">(</mo><mi>N</mi><mo>−</mo><mn>1</mn><mo stretchy=\"false\">)</mo><mi>p</mi><mo>+</mo><mn>1</mn></mrow><annotation encoding=\"application/x-tex\">S_N = 1-p + Np = (N-1)p+1\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.05764em;\">S</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0576em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">1</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7778em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\">p</span><span class=\"mspace\" 
style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8778em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\">Np</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">N</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\">1</span><span class=\"mclose\">)</span><span class=\"mord mathnormal\">p</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">1</span></span></span></span></span></p>\n<p>This shows that, as parallel resources are added, the problem size can be scaled up so that the parallel part dominates the total work.</p>\n<h2><span id=\"进程的构成\"> Composition of a process</span></h2>\n<p>A process entity consists of three parts:</p>\n<ul>\n<li>Process control block (PCB)</li>\n<li>Program segment</li>\n<li>Associated data segment</li>\n</ul>\n<h4><span id=\"进程控制块-pcb\"> Process control block (PCB)</span></h4>\n<p>The PCB is the <strong>sole indicator</strong> that a process exists; throughout the lifetime of a process, the OS manages it through its PCB.</p>\n<p>The PCB holds four kinds of information:</p>\n<ul>\n<li><strong>Process identification</strong>: the user ID (UID), process ID (PID) and parent process ID (PPID), which identify the process; the UID records which user owns the process and is the basis for resource and security protection</li>\n<li><strong>Scheduling information</strong>: the current state of the process, its priority, the CPU time it has used, and other scheduling-related data</li>\n<li><strong>Resource and memory information</strong>: the memory used by the process, including the code pointer, data pointer, heap and stack pointers, the I/O device list and file descriptors</li>\n<li><strong>Processor state information</strong>: also called the <strong>CPU context</strong>; the values of the physical registers, such as the general-purpose registers, the program counter (PC), the program status word (PSW), 
and the stack pointer (SP)</li>\n</ul>\n<p>The PCB lives in kernel space and user programs cannot access it directly. Part of its contents can nevertheless be inspected from a shell:</p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-bash\"><span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"># Linux ([pid] stands for a real process ID)</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">ls</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\"> /proc/[pid]</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">          # per-process directory exposing kernel-held process information</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">cat</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\"> /proc/[pid]/status</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">  # status fields of the process (state, PID, PPID, UIDs, memory)</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"># macOS (has no /proc)</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">ps</span><span style=\"color:#A65E2B;--shiki-dark:#C99076\"> -l</span><span style=\"color:#A65E2B;--shiki-dark:#C99076\"> -p</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">pid</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">          </span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"># roughly comparable to cat /proc/[pid]/status</span></span>\n<span class=\"line\"></span></code></pre>\n<p>The effect of a parent's termination on its children differs between operating systems. On Linux, terminating the parent does not terminate the child; the orphaned child is adopted by <code>init</code> (PID 1) and keeps running.</p>\n<p>The processor state information in the PCB mirrors the register state that the process's instructions depend on. When the process is given a CPU time slice, the registers are reloaded from the processor state saved in its PCB in memory.</p>\n<h4><span id=\"程序段\"> Program segment</span></h4>\n<p>The program segment is the part the CPU can execute: the pure binary instructions of the program. Because these instructions do not change at run time, they can be shared by multiple processes.</p>\n<p>The program segment contains</p>\n<ul>\n<li><code>.init</code> -- 
initialization code that runs implicitly before the main program starts</li>\n<li><code>.text</code> -- the executable instructions of the program</li>\n<li><code>.rodata</code> -- read-only data</li>\n</ul>\n<h4><span id=\"数据段\"> Data segment</span></h4>\n<p>The data segment holds the data a process needs at run time: <code>.data</code>, <code>.bss</code>, the heap and the stack.</p>\n<p><code>.data</code> holds initialized global and static variables; <code>.bss</code> holds uninitialized ones.</p>\n<p><strong>Tip:</strong> where variables are stored</p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-c\"><span class=\"line\"><span style=\"color:#1e754f;--shiki-dark:#4d9375\">#include</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\"> &#x3C;stdlib.h></span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"> // malloc, free</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">int</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> a </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 1</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"> // initialized global: stored in .data</span></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">int</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> b</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">10</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"> // uninitialized global: stored in .bss</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">int</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> main</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">void</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#123;</span></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">    static</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> int</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> List</span><span 
style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">10</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"> // uninitialized static local: stored in .bss (an initialized one would go in .data)</span></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">    int</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> c </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 1</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"> // ordinary local variable: created on the stack</span></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">    int</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> *</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">p_arr </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">int*</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#59873A;--shiki-dark:#80A665\">malloc</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">10</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*sizeof</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">int</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"> // p_arr itself is on the stack; the block it points to is on the heap</span></span>\n<span class=\"line\"><span style=\"color:#59873A;--shiki-dark:#80A665\">    free</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">p_arr</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"> // release the heap block</span></span>\n<span class=\"line\"><span style=\"color:#1e754f;--shiki-dark:#4d9375\">    return</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 0</span><span style=\"color:#999999;--shiki-dark:#666666\">;</span></span>\n<span class=\"line\"><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">&#125;</span></span></code></pre>\n<h4><span id=\"共享库的存储映射区\"> 
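Memory-mapped region for shared libraries</span></h4>\n<p>This region is where dependent libraries are loaded, for example dynamic libraries such as <code>.so/.dll</code> files.</p>\n<p>Static libraries, by contrast, are linked into the executable at build time, so their code typically ends up in <code>.text</code>.</p>\n<h2><span id=\"进程的生命周期\"> Process lifecycle</span></h2>\n<p>A process moves through five states:</p>\n<ul>\n<li>New</li>\n<li>Ready</li>\n<li>Running</li>\n<li>Blocked</li>\n<li>Terminated</li>\n</ul>\n<p>with the following transitions:</p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-text\"><span class=\"line\"><span></span></span>\n<span class=\"line\"><span>New -> Ready  &#x3C;-> Running -> Terminated</span></span>\n<span class=\"line\"><span>         ^           |</span></span>\n<span class=\"line\"><span>         +------- Blocked</span></span></code></pre>\n<h3><span id=\"进程的创建\"> Process creation</span></h3>\n<p>In the New state, the OS builds the new process with the <strong>creation primitive</strong>:</p>\n<ul>\n<li>Assign a unique identifier and allocate a blank PCB</li>\n<li>Allocate the resources the new process needs, such as memory, files and I/O, and record them in the PCB</li>\n<li>Initialize the PCB</li>\n<li>Insert the process into the ready queue</li>\n</ul>\n<p>Events such as a terminal login, a job scheduler starting a task, a request from a user program, or the system providing a service all create new processes.</p>\n<h3><span id=\"进程的终止\"> Process termination</span></h3>\n<p>Termination falls into three cases: normal exit, abnormal exit and external intervention.</p>\n<ul>\n<li>Normal exit -- the process finishes its task and exits</li>\n<li>Abnormal exit -- an exception or interrupt forces the process to exit</li>\n<li>External intervention -- the process is stopped manually, or because system resources run short, or for security reasons</li>\n</ul>\n<p>Terminating a process takes the following steps:</p>\n<ul>\n<li>Look up the PCB and read the current state</li>\n<li>If the process is running, take the CPU away from it and schedule the next process</li>\n<li>Release its system resources and return them to the OS</li>\n<li>Terminate its child processes, depending on the system design</li>\n<li>Clear and free the PCB</li>\n</ul>\n<h3><span id=\"进程的阻塞与唤醒\"> Blocking and waking a process</span></h3>\n<p>When a process cannot continue because it has to wait for an external event, it calls the <strong>block primitive (Block)</strong> itself, moving from Running to Blocked.</p>\n<p>The block primitive works much like interrupt handling. Its steps are:</p>\n<ul>\n<li>Save the CPU context (PC, status registers) into the PCB</li>\n<li>Set the state field of the PCB to Blocked and put the PCB on the wait queue</li>\n<li>Schedule a process from the ready queue to run</li>\n</ul>\n<p>Once the awaited event has occurred, the kernel's interrupt handler or a cooperating process calls the <strong>wakeup primitive (Wakeup)</strong> to wake the process:</p>\n<ul>\n<li>Find the corresponding PCB in the wait queue</li>\n<li>Remove it from the queue and change its state field from Blocked to Ready</li>\n<li>Insert the PCB into the ready queue, where it waits for the CPU</li>\n</ul>\n<h2><span id=\"进程的通信\"> Inter-process communication</span></h2>\n<p>Processes are isolated units, each with its own address space, so exchanging information between them usually means requesting a memory region that serves as the communication medium.</p>\n<p>By efficiency and volume, communication is divided into:</p>\n<ul>\n<li>Low-level communication -- transfers control information</li>\n<li>High-level communication -- transfers large amounts of data efficiently</li>\n</ul>\n<p>High-level communication handles the transfer of large amounts of data and addresses, and comes in several forms:</p>\n<ul>\n<li>\n<p>Shared memory -- a dedicated memory region is set aside that several processes can all read and write in order to communicate. 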
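A synchronization and <strong>mutual-exclusion mechanism</strong> is needed so that concurrent reads and writes do not leave the shared region inconsistent.</p>\n</li>\n<li>\n<p>Message passing -- data is wrapped in formatted messages; the transfer is implemented by the kernel and is transparent to the user. There are two variants. In direct communication a message is delivered to a named process with the <strong>send primitive</strong> and the <strong>receive primitive</strong>; in indirect communication messages pass through an intermediate <strong>mailbox</strong>.</p>\n</li>\n<li>\n<p>Pipes<br>\nA pipe is a fixed-size in-memory buffer used for communication, and it is one-way. A pipe provides:</p>\n<ul>\n<li>Mutual exclusion: at any moment only one process, either the writer or the reader, accesses the pipe, so concurrent access cannot corrupt the data</li>\n<li>Synchronization: when the pipe is full the writer blocks automatically and resumes once the reader has drained data</li>\n<li>End-of-file detection: once the write end has been closed and the buffer is empty, reads return immediately with no data</li>\n</ul>\n</li>\n<li>\n<p>Signals -- a signal notifies a process that an event has occurred. One vector tracks the pending signals and another tracks the blocked signals. Signals come from:</p>\n<ul>\n<li>The kernel: e.g. <code>SIGSEGV</code>, raised on an illegal memory access</li>\n<li>Processes: another process, or the process itself, sends the signal, e.g. <code>SIGKILL</code> via <code>kill -9</code></li>\n</ul>\n</li>\n</ul>\n<p>Pending signals are handled only when the process is about to return from kernel mode to user mode. A signal is handled in one of two ways:</p>\n<ul>\n<li>Run the system's default action -- e.g. <code>SIGKILL</code> kills the process</li>\n<li>Run a user-defined signal handler -- the function registered for that signal is executed</li>\n</ul>\n<h2><span id=\"线程\"> Threads</span></h2>\n<p>A thread is the <strong>smallest unit of program execution flow</strong> and the basic unit of processor scheduling.</p>\n<p>A single process can contain several threads of execution. A thread does not own system resources such as an address space, but it does own a small amount of private state: a thread ID, a PC, a register set and a stack pointer. (These are the basic state of instruction execution; since each thread runs its own instruction stream independently, it needs its own copy of them.) The thread control block (<strong>TCB</strong>) records the thread's execution context, i.e. the saved values of those registers, such as the PC and the stack pointer.</p>\n<p>Threads in the same process can execute the same program code concurrently. A common example is a multithreaded backend service started by <code>uWSGI</code>, whose threads all run the same backend code base. They also share the process's resources and memory.</p>\n<p>Threads are scheduled onto the CPU independently and have independent lifecycles. On a multi-CPU system, threads can run on different CPU cores.</p>\n<p>Switching between threads costs less than switching between processes.</p>\n<h3><span id=\"线程控制块tcb\"> Thread control block (TCB)</span></h3>\n<p>The TCB contains</p>\n<ul>\n<li>Thread identifier (ID)</li>\n<li>Register set</li>\n<li>Thread state</li>\n<li>Scheduling priority</li>\n<li>Thread-local storage for the thread's private data</li>\n<li>Pointer to the thread's private stack</li>\n</ul>\n<h3><span id=\"线程的实现\"> Thread implementation</span></h3>\n<p>Threads are either user-level threads (ULT) or kernel-level threads (KLT), and several implementation models are built on the two.</p>\n<ul>\n<li>User-level thread model -- threads are created, scheduled and managed entirely by library functions in user space, and the kernel is unaware of them; user-level threads enter the kernel only through system calls, for kernel operations such as I/O</li>\n<li>Kernel-level thread model -- threads are supported and managed by the kernel itself. The price is that thread operations require switching between user mode and kernel mode, which adds overhead</li>\n</ul>\n<p>Combining the two gives the many-to-one, one-to-one and many-to-many models.</p>\n<p>The one-to-one model is the kernel-level thread model under another name.</p>\n<p>In the many-to-one model, many user-level threads are mapped onto a single kernel-level thread, so only one thread at a time can switch from user mode into kernel mode. The threads run concurrently, but the model cannot run them in parallel across multiple cores or CPUs.</p>\n<p>The many-to-many model maps several user-level threads onto several kernel-level threads, which allows parallelism while avoiding the high cost of the one-to-one model, where every thread needs its own kernel thread.</p>\n<p>A thread library is how threads are managed from user space: programmers create threads and manage their lifecycle through the library's functions (API). High-level languages can additionally manage many threads through a thread pool, for example <code>ThreadPoolExecutor</code> in Python's <code>concurrent.futures</code> module.</p>\n<h1><span id=\"cpu调度\"> 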
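CPU scheduling</span></h1>\n<p>CPU scheduling decides how CPU time is allocated when there are far more processes than CPU cores, so that processes run efficiently and fairly and a high degree of concurrency is achieved.</p>\n<h3><span id=\"调度的层次\"> Levels of scheduling</span></h3>\n<p>From the outside in, scheduling happens at three levels:</p>\n<ul>\n<li>High-level scheduling (job scheduling) -- brings a job from secondary storage into memory, allocates memory, I/O and other resources, and creates the corresponding PCB. It happens only once in the lifetime of a job.</li>\n<li>Medium-level scheduling (memory scheduling) -- improves memory utilization. When memory is tight, processes that have stayed blocked or ready for a long time are suspended and moved to secondary storage; when system resources allow such a process to continue, it is moved back into memory.</li>\n<li><strong>Low-level scheduling (process scheduling)</strong> -- assigns the CPU to processes according to the scheduling policy.</li>\n</ul>\n<p>The tasks of process scheduling are:</p>\n<ul>\n<li>Save the CPU context (e.g. PC, PSW) so that the previous process can later resume from where it stopped</li>\n<li>Select a ready process and change the state in its PCB from Ready to Running</li>\n<li>Hand over control of the CPU: restore the saved context and dispatch the process</li>\n</ul>\n<h3><span id=\"调度的实现\"> Implementing scheduling</span></h3>\n<p>Process scheduling is carried out by the scheduler, which consists of a queuer, a dispatcher and a context switcher.</p>\n<ul>\n<li>Queuer -- maintains the queue of ready processes</li>\n<li>Dispatcher -- picks a process according to the scheduling algorithm, assigns it the CPU and triggers the context switch</li>\n<li>Context switcher -- saves the context of the process currently on the CPU into its PCB, then loads the context of the dispatched process into the CPU registers</li>\n</ul>\n<h4><span id=\"上下文切换\"> Context switching</span></h4>\n<p>Context switching is the step that dominates the cost of process scheduling, because the PCB is what carries a process's state. A switch consists of saving the outgoing process's context into its PCB and restoring the incoming process's context from its PCB into the CPU registers. The full sequence is:</p>\n<ul>\n<li>Suspend the current process and write the CPU context into its PCB</li>\n<li>Move that PCB to the ready queue or a blocked queue</li>\n<li>Select another process and update its PCB (from Ready to Running)</li>\n<li>Restore the run-time information in that PCB into the CPU</li>\n<li>Assign the CPU, jump to the address in the PC and execute</li>\n</ul>\n<p>A context switch involves many memory accesses and extra instructions, so it is expensive in both time and resources.</p>\n<p>A context switch happens between processes, whereas a mode switch (between user mode and kernel mode) is a state change within one process or thread. Only the kernel can perform a context switch, so the CPU must be in kernel mode when one takes place.</p>\n<h4><span id=\"调度的时机\"> When scheduling happens</span></h4>\n<p>Process scheduling takes place:</p>\n<ul>\n<li>When a new process is created: parent and child are both ready, and the scheduler picks one of them to run</li>\n<li>When a process exits normally or is terminated. If no process is ready, the idle process (PID 0, lowest priority) is run</li>\n<li>When a process blocks: another process is scheduled to keep CPU utilization high</li>\n<li>When I/O becomes ready: the blocked process is moved to the Ready state</li>\n</ul>\n<p>Process switches cannot take place in the following cases (interrupts are disabled):</p>\n<ul>\n<li>While an interrupt is being handled.</li>\n<li>While an atomic operation is executing; the system masks interrupts for its duration.</li>\n</ul>\n<h4><span id=\"调度的方式\"> Scheduling modes</span></h4>\n<ul>\n<li>Non-preemptive scheduling: even if a more urgent or higher-priority process arrives, rescheduling waits until the current process finishes or gives up the CPU voluntarily</li>\n<li>Preemptive scheduling: when a higher-priority process becomes ready, the CPU is taken from the current process immediately and given to the new one</li>\n</ul>\n<h4><span id=\"调度的目标\"> Scheduling goals</span></h4>\n<p>The main goal of process scheduling is to keep CPU utilization as high as possible. <strong>CPU utilization</strong> is the useful CPU working time divided by the sum of the useful working time and the idle waiting time:</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" 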
display=\"block\"><semantics><mrow><mtext>CPU利用率</mtext><mo>=</mo><mfrac><mtext>CPU有效工作时间</mtext><mrow><mtext>CPU有效工作时间</mtext><mo>+</mo><mtext>CPU空闲等待时间</mtext></mrow></mfrac></mrow><annotation encoding=\"application/x-tex\">\\text{CPU利用率} = \\frac{\\text{CPU有效工作时间}}{\\text{CPU有效工作时间}+\\text{CPU空闲等待时间}}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord text\"><span class=\"mord\">CPU</span><span class=\"mord cjk_fallback\">利用率</span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.1297em;vertical-align:-0.7693em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3603em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord text\"><span class=\"mord\">CPU</span><span class=\"mord cjk_fallback\">有效工作时间</span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord text\"><span class=\"mord\">CPU</span><span class=\"mord cjk_fallback\">空闲等待时间</span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord text\"><span class=\"mord\">CPU</span><span class=\"mord cjk_fallback\">有效工作时间</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span 
class=\"vlist\" style=\"height:0.7693em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<p><strong>System throughput</strong> is the number of jobs the system completes per unit of time. Long jobs hold the CPU for a long time and therefore lower throughput.</p>\n<p>Turnaround time is the time a job takes over its whole lifetime, from submission to completion: waiting, queueing in the ready state, waiting for I/O in the blocked state, and actual execution on the CPU. The average turnaround time of <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>n</mi></mrow><annotation encoding=\"application/x-tex\">n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\">n</span></span></span></span> jobs is the mean of their individual turnaround times.</p>\n<h4><span id=\"调度算法\"> Scheduling algorithms</span></h4>\n<h5><span id=\"先来先服务fcfs算法\"> First-come, first-served (FCFS)</span></h5>\n<p>Under FCFS the process queue is FIFO: whoever arrives first runs first, and whoever arrives later runs later. It is a non-preemptive algorithm. FCFS is fair in arrival order, but it is unfriendly to short processes, which can wait a long time behind long ones, so efficiency is low. It also favors CPU-bound processes over I/O-bound ones: a process that does frequent I/O has to rejoin the back of the queue after each I/O and wait behind long CPU bursts, which lowers efficiency further.</p>\n<h5><span id=\"短作业优先sjf算法-短进程优先spf算法\"> Shortest job first (SJF) / shortest process first (SPF)</span></h5>\n<p>SJF estimates the run time of each job in the queue and gives the CPU to the job with the shortest estimate.</p>\n<p>The estimate is a simple weighted average. Given the most recent actual run time <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>t</mi><mi>n</mi></msub></mrow><annotation encoding=\"application/x-tex\">t_n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7651em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">t</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">n</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span>, the previous estimate <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>τ</mi><mi>n</mi></msub></mrow><annotation encoding=\"application/x-tex\">\\tau_n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.5806em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.1132em;\">τ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:-0.1132em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">n</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span>, and a weighting factor <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>α</mi></mrow><annotation encoding=\"application/x-tex\">\\alpha</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span></span></span></span> between 0 and 1, the estimate for the next run is</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mi>τ</mi><mrow><mi>n</mi><mo>+</mo><mn>1</mn></mrow></msub><mo>=</mo><mi>α</mi><msub><mi>t</mi><mi>n</mi></msub><mo>+</mo><mo stretchy=\"false\">(</mo><mn>1</mn><mo>−</mo><mi>α</mi><mo 
stretchy=\"false\">)</mo><msub><mi>τ</mi><mi>n</mi></msub></mrow><annotation encoding=\"application/x-tex\">\\tau_{n+1} = \\alpha t_n + (1-\\alpha)\\tau_n \n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6389em;vertical-align:-0.2083em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.1132em;\">τ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.1132em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">n</span><span class=\"mbin mtight\">+</span><span class=\"mord mtight\">1</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2083em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7651em;vertical-align:-0.15em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"mord\"><span class=\"mord mathnormal\">t</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">n</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">(</span><span class=\"mord\">1</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"mclose\">)</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.1132em;\">τ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:-0.1132em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">n</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span></span></p>\n<p>SJF is unfavorable to long jobs: if short jobs keep arriving, a long job may never get the CPU, which is <strong>starvation</strong> (the scheduling policy makes some process wait indefinitely). SJF also ranks jobs only by length and ignores process priority, so it cannot guarantee that urgent jobs are handled in time.</p>\n<h5><span id=\"高响应比优先调度算法\"> Highest response ratio next (HRRN)</span></h5>\n<p>HRRN ranks jobs by their response ratio, which takes both the estimated run time of a job and the time it has already waited into account: the ratio is (waiting time + estimated run time) / estimated run time.</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mtext>响应比</mtext><mo>=</mo><mfrac><mrow><mtext>等待时间</mtext><mo>+</mo><mtext>估计运行时间</mtext></mrow><mtext>估计运行时间</mtext></mfrac></mrow><annotation encoding=\"application/x-tex\">\\text{响应比} = 
\\frac{\\text{等待时间}+\\text{估计运行时间}}{\\text{估计运行时间}}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord text\"><span class=\"mord cjk_fallback\">响应比</span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.0463em;vertical-align:-0.686em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3603em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord text\"><span class=\"mord cjk_fallback\">估计运行时间</span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord text\"><span class=\"mord cjk_fallback\">等待时间</span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord text\"><span class=\"mord cjk_fallback\">估计运行时间</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<h5><span id=\"优先级调度算法\"> Priority scheduling</span></h5>\n<p>Jobs are scheduled according to their system priority.</p>\n<p>Depending on whether the CPU can be taken away from a process while it is running, priority scheduling is either</p>\n<ul>\n<li>preemptive priority scheduling, or</li>\n<li>non-preemptive priority scheduling</li>\n</ul>\n<p>Typically: system processes &gt; user processes, interactive processes &gt; non-interactive processes, I/O-bound processes &gt; 
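CPU-bound processes (I/O transfers are time-critical because the data involved often cannot be buffered, and I/O devices are much slower than the CPU)</p>\n<p>Priorities can be static or dynamic. A static priority is fixed when the process is created; scheduling is simple, but starvation is possible. A dynamic priority can change at run time based on waiting time or other properties of the process.</p>\n<h5><span id=\"时间片轮转rr算法\"> Round-robin (RR)</span></h5>\n<p>A time slice is a fixed execution window. If a process is still running when its slice expires, the CPU is handed to the next process and the preempted process goes back into the ready queue. Round-robin treats processes fairly and cannot starve them, but the frequent switches make its overhead relatively high.</p>\n<p>If a process finishes before its time slice is used up, rotation is triggered immediately and the CPU is given to the next process.</p>\n<h5><span id=\"多级队列调度算法\"> Multilevel queue scheduling</span></h5>\n<p>Ready processes are divided among several queues with <strong>different priorities</strong>, each using its <strong>own, possibly identical, scheduling algorithm</strong>. The risk is that processes in low-priority queues starve when high-priority queues keep receiving new processes.</p>\n<h5><span id=\"多级反馈队列调度算法\"> Multilevel feedback queue scheduling</span></h5>\n<p>Unlike multilevel queue scheduling, the multilevel feedback queue lets processes move between queues of different priority. It is usually built from queues with increasing time slices: the lower a queue's priority, the longer its time slice and the closer it behaves to FCFS.</p>\n<p>A newly arrived process normally enters the highest-priority queue. If it does not finish within that queue's time slice, it is preempted and moved to the tail of the next lower-priority queue; a process already in the lowest-priority queue normally returns to the tail of that same queue after being preempted.</p>\n<h4><span id=\"多处理机调度\"> Multiprocessor scheduling</span></h4>\n<p>Multiprocessor systems are either asymmetric (AMP) or symmetric (SMP). In AMP one master CPU makes all scheduling decisions and the other CPUs only execute tasks; in SMP all CPUs are peers that both schedule and execute, and the scheduler may assign any ready process to any CPU.</p>\n<p>On a multicore or multi-CPU system, when a process that ran on CPU1 is rescheduled onto CPU2, its register state has to be moved and the cache contents it built up on CPU1 have to be rebuilt. The system therefore tries to keep a process on a single CPU, which is called <strong>processor affinity</strong>.</p>\n<p>Keeping the load even across cores is called <strong>load balancing</strong>, and it requires migrating processes between CPUs. Seen this way, processor affinity and load balancing pull in opposite directions: balancing the load cancels some of the performance gained from affinity.</p>\n<p>Multiprocessor scheduling uses either a shared ready queue or private ready queues:</p>\n<ul>\n<li>Shared ready queue -- all CPUs take processes from a single queue; load is well balanced but processor affinity is poor</li>\n<li>Private ready queues -- each CPU has its own queue and, when idle, runs a process from it; processor affinity is high but load may be unbalanced</li>\n</ul>\n<p>Affinity is either soft or hard. With soft affinity the scheduler tries to keep a process on one CPU but does not guarantee it; with hard affinity a user process explicitly asks, through a <strong>system call</strong>, to be bound to a particular CPU.</p>\n<h2><span id=\"协程\"> Coroutines</span></h2>\n<p>A coroutine is a control structure implemented by the language that runs inside a thread and can be suspended and resumed. It gives up the CPU cooperatively through <code>yield()</code>, and switching between coroutines costs less than switching between threads.</p>\n",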
            "tags": []
        },
        {
            "id": "https://yuuko.site/2026/05/24/CS/LLM/deepnet/",
            "url": "https://yuuko.site/2026/05/24/CS/LLM/deepnet/",
            "title": "DeepNet",
            "date_published": "2026-05-23T16:00:00.000Z",
            "content_html": "<link rel=\"stylesheet\" type=\"text/css\" href=\"https://cdn.jsdelivr.net/hint.css/2.4.1/hint.min.css\"><h1><span id=\"deepnet\"> DeepNet</span></h1>\n<p>This post is a follow-up to Transformer 中的 Layer Normalization与梯度稳定性 and continues the study of Layer Normalization.</p>\n<h2><span id=\"warm-up-与初始化方式对于梯度下降稳定性的影响\"> How Warm-up and Initialization Affect the Stability of Gradient Descent</span></h2>\n<p>This section compares three experimental groups:</p>\n<ul>\n<li><strong>Group 1:</strong> Post-LN + standard Xavier initialization + no warm-up</li>\n<li><strong>Group 2:</strong> Post-LN + standard Xavier initialization + warm-up</li>\n<li><strong>Group 3:</strong> Post-LN + scaled Xavier initialization + warm-up</li>\n</ul>\n<p>Scaled Xavier initialization assigns to layer <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>l</mi></mrow><annotation encoding=\"application/x-tex\">l</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6944em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.01968em;\">l</span></span></span></span>, <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>l</mi><mo>∈</mo><mo stretchy=\"false\">[</mo><mn>1</mn><mo separator=\"true\">,</mo><mi>N</mi><mo stretchy=\"false\">]</mo></mrow><annotation encoding=\"application/x-tex\">l\\in[1,N]</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7335em;vertical-align:-0.0391em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.01968em;\">l</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">∈</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">[</span><span class=\"mord\">1</span><span class=\"mpunct\">,</span><span class=\"mspace\" 
style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">N</span><span class=\"mclose\">]</span></span></span></span>, a scaling factor that decreases with depth, <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>k</mi><mi>l</mi></msub><mo>=</mo><mi>N</mi><mo>+</mo><mn>1</mn><mo>−</mo><mi>l</mi></mrow><annotation encoding=\"application/x-tex\">k_l = N+1-l</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8444em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03148em;\">k</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.0315em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">N</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">1</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span 
class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6944em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.01968em;\">l</span></span></span></span>, so that</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msubsup><mi>W</mi><mi>o</mi><mi>l</mi></msubsup><mo>∼</mo><mi mathvariant=\"script\">N</mi><mo stretchy=\"false\">(</mo><mn>0</mn><mo separator=\"true\">,</mo><mfrac><mn>1</mn><mrow><msubsup><mi>k</mi><mi>l</mi><mn>2</mn></msubsup><msup><mi>d</mi><mo mathvariant=\"normal\" lspace=\"0em\" rspace=\"0em\">′</mo></msup></mrow></mfrac><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">W_o^l\\sim \\mathcal{N}(0,\\frac{1}{k_l^2d&#x27;})\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.1461em;vertical-align:-0.247em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8991em;\"><span style=\"top:-2.453em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">o</span></span></span><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.247em;\"><span></span></span></span></span></span></span><span class=\"mspace\" 
style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">∼</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.3087em;vertical-align:-0.9873em;\"></span><span class=\"mord mathcal\" style=\"margin-right:0.14736em;\">N</span><span class=\"mopen\">(</span><span class=\"mord\">0</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3214em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03148em;\">k</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7959em;\"><span style=\"top:-2.3987em;margin-left:-0.0315em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span><span style=\"top:-3.0448em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3013em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathnormal\">d</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6779em;\"><span style=\"top:-2.989em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span 
class=\"mord mtight\">′</span></span></span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9873em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mclose\">)</span></span></span></span></span></p>\n<p><img loading=\"lazy\" src=\"/picture/DeepNet/ScalingInit.png\" alt=\"scaling_init\"></p>\n<p>The metric used is the norm of the model's output update over the course of training, where <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>θ</mi><mi>i</mi></msub></mrow><annotation encoding=\"application/x-tex\">\\theta_i</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8444em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span> denotes the parameters after training round <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>i</mi></mrow><annotation 
encoding=\"application/x-tex\">i</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6595em;\"></span><span class=\"mord mathnormal\">i</span></span></span></span> and <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>θ</mi><mn>0</mn></msub></mrow><annotation encoding=\"application/x-tex\">\\theta_0</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8444em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">0</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span> denotes the initial parameters:</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi mathvariant=\"normal\">∥</mi><mi mathvariant=\"normal\">Δ</mi><msub><mi>F</mi><mi>i</mi></msub><mi mathvariant=\"normal\">∥</mi><mo>=</mo><mi mathvariant=\"normal\">∥</mi><mi>F</mi><mo stretchy=\"false\">(</mo><mi>x</mi><mo separator=\"true\">,</mo><msub><mi>θ</mi><mi>i</mi></msub><mo stretchy=\"false\">)</mo><mo>−</mo><mi>F</mi><mo stretchy=\"false\">(</mo><mi>x</mi><mo separator=\"true\">,</mo><msub><mi>θ</mi><mn>0</mn></msub><mo stretchy=\"false\">)</mo><mi mathvariant=\"normal\">∥</mi></mrow><annotation 
encoding=\"application/x-tex\">\\|\\Delta F_i\\| = \\|F(x,\\theta_i)-F(x,\\theta_0)\\|\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\">∥Δ</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">F</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord\">∥</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\">∥</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">F</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span 
class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">F</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">0</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mord\">∥</span></span></span></span></span></p>\n<p>In Figure 4(a), the model update of Group 1 explodes right at the initial steps. This agrees with the conclusion of <a href=\"/2026/05/23/CS/LLM/ln/\" title=\"Transformer 中的 Layer Normalization与梯度稳定性\">Transformer 中的 Layer Normalization与梯度稳定性</a>.</p>\n<p>Comparing the subfigures of Figure 3, the authors take the model update rather than the gradient norm as the metric for the stability of gradient descent, because between Groups 2 and 3, Group 3 has the larger gradient norm yet converges better.</p>\n<h2><span id=\"deepnet-的结构\"> The Structure of DeepNet</span></h2>\n<p>The paper's main contribution is the DeepNorm architecture. Relative to Post-LN, it multiplies the main input signal (the residual connection) by a constant <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>α</mi><mo>&gt;</mo><mn>1</mn></mrow><annotation encoding=\"application/x-tex\">\\alpha&gt;1</annotation></semantics></math></span><span 
class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.5782em;vertical-align:-0.0391em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">&gt;</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">1</span></span></span></span>, which scales up the input tensor before layer normalization.</p>\n<p><img loading=\"lazy\" src=\"/picture/DeepNet/DeepNet_architecture.png\" alt=\"deepnet_architecture\"></p>\n<p><strong>Lemma 1.</strong> Given <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>X</mi><mo>=</mo><mo stretchy=\"false\">(</mo><msub><mi mathvariant=\"bold\">x</mi><mn>1</mn></msub><mo separator=\"true\">,</mo><mo>⋯</mo><mtext> </mtext><mo separator=\"true\">,</mo><msub><mi mathvariant=\"bold\">x</mi><mi>n</mi></msub><mo stretchy=\"false\">)</mo><mo>∈</mo><msup><mi mathvariant=\"double-struck\">R</mi><mrow><mi>n</mi><mo>×</mo><mi>d</mi></mrow></msup></mrow><annotation encoding=\"application/x-tex\">X=(\\mathbf{x}_1,\\cdots,\\mathbf{x}_n)\\in \\mathbb{R}^{n\\times d}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">X</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathbf\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"minner\">⋯</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathbf\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">n</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">∈</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8491em;\"></span><span class=\"mord\"><span class=\"mord mathbb\">R</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8491em;\"><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal 
mtight\">n</span><span class=\"mbin mtight\">×</span><span class=\"mord mathnormal mtight\">d</span></span></span></span></span></span></span></span></span></span></span></span>, <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mrow><mi mathvariant=\"normal\">V</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">r</mi></mrow><mo stretchy=\"false\">(</mo><msub><mi mathvariant=\"bold\">x</mi><mi>i</mi></msub><mo stretchy=\"false\">)</mo><mo>=</mo><mn>1</mn><mo separator=\"true\">,</mo><mi mathvariant=\"bold\">E</mi><mo stretchy=\"false\">(</mo><msub><mi mathvariant=\"bold\">x</mi><mi>i</mi></msub><mo stretchy=\"false\">)</mo><mo>=</mo><mn>0</mn></mrow><annotation encoding=\"application/x-tex\">\\mathrm{Var}(\\mathbf{x}_i) = 1, \\mathbf{E}(\\mathbf{x}_i) = 0</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">Var</span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathbf\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span 
class=\"mord\">1</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathbf\">E</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathbf\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">0</span></span></span></span>, <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>q</mi><mi>i</mi></msub><mo>∈</mo><mo stretchy=\"false\">[</mo><mn>0</mn><mo separator=\"true\">,</mo><mn>1</mn><mo stretchy=\"false\">]</mo></mrow><annotation encoding=\"application/x-tex\">q_i\\in [0,1]</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7335em;vertical-align:-0.1944em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">q</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 
mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">∈</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">[</span><span class=\"mord\">0</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\">1</span><span class=\"mclose\">]</span></span></span></span>, we have</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mrow><mi mathvariant=\"normal\">S</mi><mi mathvariant=\"normal\">o</mi><mi mathvariant=\"normal\">f</mi><mi mathvariant=\"normal\">t</mi><mi mathvariant=\"normal\">m</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">x</mi></mrow><mo stretchy=\"false\">(</mo><msub><mi>q</mi><mn>1</mn></msub><mo separator=\"true\">,</mo><mo>⋯</mo><mtext> </mtext><mo separator=\"true\">,</mo><msub><mi>q</mi><mi>n</mi></msub><mo stretchy=\"false\">)</mo><mo>⋅</mo><mi>X</mi><mo>≍</mo><msub><mi mathvariant=\"bold\">x</mi><mi>i</mi></msub></mrow><annotation encoding=\"application/x-tex\">\\mathrm{Softmax}(q_1,\\cdots, q_n)\\cdot X \\asymp \\mathbf{x}_i\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">Softmax</span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">q</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span 
class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"minner\">⋯</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">q</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">n</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">X</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">≍</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" 
style=\"height:0.5944em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathbf\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span></span></p>\n<p>Here <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mo>≍</mo></mrow><annotation encoding=\"application/x-tex\">\\asymp</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4637em;\"></span><span class=\"mrel\">≍</span></span></span></span> denotes equal bound of magnitude: only the upper and lower bounds on the order of magnitude are compared, and the two vectors need not match in direction or in their element-wise values.<br>\n<strong>Proof:</strong></p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mtable rowspacing=\"0.25em\" columnalign=\"right left\" columnspacing=\"0em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow><mi mathvariant=\"normal\">S</mi><mi mathvariant=\"normal\">o</mi><mi mathvariant=\"normal\">f</mi><mi mathvariant=\"normal\">t</mi><mi mathvariant=\"normal\">m</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">x</mi></mrow><mo stretchy=\"false\">(</mo><msub><mi>q</mi><mn>1</mn></msub><mo separator=\"true\">,</mo><mo>⋯</mo><mtext> </mtext><mo separator=\"true\">,</mo><msub><mi>q</mi><mi>n</mi></msub><mo stretchy=\"false\">)</mo><mo>⋅</mo><mi>X</mi></mrow></mstyle></mtd><mtd><mstyle 
scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><msub><mrow><mo fence=\"true\">(</mo><mfrac><mrow><mi>exp</mi><mo>⁡</mo><msub><mi>q</mi><mi>i</mi></msub></mrow><mrow><mo>∑</mo><mi>exp</mi><mo>⁡</mo><msub><mi>q</mi><mi>j</mi></msub></mrow></mfrac><mo fence=\"true\">)</mo></mrow><mi>i</mi></msub><mo>⋅</mo><msup><mrow><mo fence=\"true\">(</mo><msub><mi mathvariant=\"bold\">x</mi><mn>1</mn></msub><mo separator=\"true\">,</mo><mo>⋯</mo><mtext> </mtext><mo separator=\"true\">,</mo><msub><mi mathvariant=\"bold\">x</mi><mi>n</mi></msub><mo fence=\"true\">)</mo></mrow><mi>T</mi></msup></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><mo>∑</mo><mfrac><mrow><msub><mi mathvariant=\"bold\">x</mi><mi>i</mi></msub><mi>exp</mi><mo>⁡</mo><msub><mi>q</mi><mi>i</mi></msub></mrow><mrow><mo>∑</mo><mi>exp</mi><mo>⁡</mo><msub><mi>q</mi><mi>j</mi></msub></mrow></mfrac></mrow></mstyle></mtd></mtr></mtable><annotation encoding=\"application/x-tex\">\\begin{aligned}\n\\mathrm{Softmax}(q_1,\\cdots, q_n) \\cdot X &amp;= \\left(\\frac{\\exp{q_i}}{\\sum \\exp{q_j}}\\right)_{i}\\cdot \\left(\\mathbf{x}_1,\\cdots ,\\mathbf{x}_n \\right)^T\\\\\n&amp;=\\sum \\frac{\\mathbf{x}_i\\exp q_i}{\\sum\\exp q_j}\n\\end{aligned}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:5.1654em;vertical-align:-2.3327em;\"></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-r\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.8327em;\"><span style=\"top:-4.8327em;\"><span class=\"pstrut\" style=\"height:3.45em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathrm\">Softmax</span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" 
style=\"margin-right:0.03588em;\">q</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"minner\">⋯</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">q</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">n</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">X</span></span></span><span style=\"top:-2.3894em;\"><span class=\"pstrut\" style=\"height:3.45em;\"></span><span class=\"mord\"></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span 
class=\"vlist\" style=\"height:2.3327em;\"><span></span></span></span></span></span><span class=\"col-align-l\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.8327em;\"><span style=\"top:-4.8327em;\"><span class=\"pstrut\" style=\"height:3.45em;\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"minner\"><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size3\">(</span></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.1076em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mop op-symbol small-op\" style=\"position:relative;top:0em;\">∑</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop\">exp</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">q</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.05724em;\">j</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" 
style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mop\">exp</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">q</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9721em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mclose delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size3\">)</span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:-0.5601em;\"><span style=\"top:-1.6782em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.0218em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"minner\"><span class=\"minner\"><span 
class=\"mopen delimcenter\" style=\"top:0em;\">(</span><span class=\"mord\"><span class=\"mord mathbf\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"minner\">⋯</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathbf\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">n</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose delimcenter\" style=\"top:0em;\">)</span></span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9812em;\"><span style=\"top:-3.2029em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" 
style=\"margin-right:0.13889em;\">T</span></span></span></span></span></span></span></span></span></span><span style=\"top:-2.3894em;\"><span class=\"pstrut\" style=\"height:3.45em;\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mop op-symbol large-op\" style=\"position:relative;top:0em;\">∑</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.1214em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mop op-symbol small-op\" style=\"position:relative;top:0em;\">∑</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop\">exp</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">q</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.05724em;\">j</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span 
class=\"mord\"><span class=\"mord\"><span class=\"mord mathbf\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop\">exp</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">q</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9721em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.3327em;\"><span></span></span></span></span></span></span></span></span></span></span></span></p>\n<p>Since</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" 
display=\"block\"><semantics><mrow><mi>n</mi><mo>≤</mo><mo>∑</mo><mi>exp</mi><mo>⁡</mo><msub><mi>q</mi><mi>j</mi></msub><mo>≤</mo><mi>n</mi><mi>e</mi></mrow><annotation encoding=\"application/x-tex\">n\\leq\\sum \\exp q_j\\leq ne\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7719em;vertical-align:-0.136em;\"></span><span class=\"mord mathnormal\">n</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">≤</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.6em;vertical-align:-0.55em;\"></span><span class=\"mop op-symbol large-op\" style=\"position:relative;top:0em;\">∑</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop\">exp</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">q</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.05724em;\">j</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">≤</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\">n</span><span class=\"mord mathnormal\">e</span></span></span></span></span></p>\n<p>we have</p>\n<p><span 
class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mfrac><mn>1</mn><mrow><mo>∑</mo><mi>exp</mi><mo>⁡</mo><msub><mi>q</mi><mi>j</mi></msub></mrow></mfrac><mo>=</mo><mi mathvariant=\"normal\">Θ</mi><mrow><mo fence=\"true\">(</mo><mfrac><mn>1</mn><mi>n</mi></mfrac><mo fence=\"true\">)</mo></mrow></mrow><annotation encoding=\"application/x-tex\">\\frac{1}{\\sum \\exp q_j} = \\Theta\\left(\\frac{1}{n}\\right)\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:2.2935em;vertical-align:-0.9721em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3214em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mop op-symbol small-op\" style=\"position:relative;top:0em;\">∑</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop\">exp</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">q</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.05724em;\">j</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" 
style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9721em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.4em;vertical-align:-0.95em;\"></span><span class=\"mord\">Θ</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size3\">(</span></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3214em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">n</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mclose delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size3\">)</span></span></span></span></span></span></span></p>\n<p>The output can therefore be written as a convex combination</p>\n<p><span 
class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mrow><mi mathvariant=\"normal\">S</mi><mi mathvariant=\"normal\">o</mi><mi mathvariant=\"normal\">f</mi><mi mathvariant=\"normal\">t</mi><mi mathvariant=\"normal\">m</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">x</mi></mrow><mo stretchy=\"false\">(</mo><msub><mi>q</mi><mn>1</mn></msub><mo separator=\"true\">,</mo><mo>⋯</mo><mtext> </mtext><mo separator=\"true\">,</mo><msub><mi>q</mi><mi>n</mi></msub><mo stretchy=\"false\">)</mo><mi>X</mi><mo>=</mo><munderover><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mi>n</mi></munderover><msub><mi>w</mi><mi>i</mi></msub><msub><mi mathvariant=\"bold\">x</mi><mi>i</mi></msub><mo separator=\"true\">,</mo><mspace width=\"1em\"><msub><mi>w</mi><mi>i</mi></msub><mo>=</mo><mfrac><mrow><mi>exp</mi><mo>⁡</mo><msub><mi>q</mi><mi>i</mi></msub></mrow><mrow><mo>∑</mo><mi>exp</mi><mo>⁡</mo><msub><mi>q</mi><mi>j</mi></msub></mrow></mfrac><mo>=</mo><mi mathvariant=\"normal\">Θ</mi><mrow><mo fence=\"true\">(</mo><mfrac><mn>1</mn><mi>n</mi></mfrac><mo fence=\"true\">)</mo></mrow><mi mathvariant=\"normal\">.</mi></mspace></mrow><annotation encoding=\"application/x-tex\">\\mathrm{Softmax}(q_1,\\cdots,q_n)X\n= \\sum_{i=1}^n w_i\\mathbf{x}_i,\n\\quad\nw_i=\\frac{\\exp{q_i}}{\\sum \\exp q_j}=\\Theta\\left(\\frac1n\\right).\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">Softmax</span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">q</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span 
style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"minner\">⋯</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">q</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">n</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">X</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.9291em;vertical-align:-1.2777em;\"></span><span class=\"mop op-limits\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.6514em;\"><span style=\"top:-1.8723em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 
mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">i</span><span class=\"mrel mtight\">=</span><span class=\"mord mtight\">1</span></span></span></span><span style=\"top:-3.05em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span><span class=\"mop op-symbol large-op\">∑</span></span></span><span style=\"top:-4.3em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">n</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.2777em;\"><span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02691em;\">w</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0269em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathbf\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span 
class=\"mspace\" style=\"margin-right:1em;\"></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02691em;\">w</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0269em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.0797em;vertical-align:-0.9721em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.1076em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mop op-symbol small-op\" style=\"position:relative;top:0em;\">∑</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop\">exp</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">q</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" 
style=\"margin-right:0.05724em;\">j</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mop\">exp</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">q</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9721em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.4em;vertical-align:-0.95em;\"></span><span class=\"mord\">Θ</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size3\">(</span></span><span class=\"mord\"><span class=\"mopen 
nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3214em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">n</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mclose delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size3\">)</span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\">.</span></span></span></span></span></p>\n<p>Therefore, from <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>w</mi><mi>i</mi></msub><mo>=</mo><mi mathvariant=\"normal\">Θ</mi><mrow><mo fence=\"true\">(</mo><mfrac><mn>1</mn><mi>n</mi></mfrac><mo fence=\"true\">)</mo></mrow></mrow><annotation encoding=\"application/x-tex\">w_i=\\Theta\\left(\\frac1n\\right)</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.5806em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02691em;\">w</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0269em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span 
class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.2em;vertical-align:-0.35em;\"></span><span class=\"mord\">Θ</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size1\">(</span></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8451em;\"><span style=\"top:-2.655em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">n</span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.394em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">1</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.345em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mclose delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size1\">)</span></span></span></span></span></span> we obtain</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math 
xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><munderover><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mi>n</mi></munderover><msub><mi>w</mi><mi>i</mi></msub><msub><mi mathvariant=\"bold\">x</mi><mi>i</mi></msub><mo>≍</mo><munderover><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mi>n</mi></munderover><mfrac><mn>1</mn><mi>n</mi></mfrac><msub><mi mathvariant=\"bold\">x</mi><mi>i</mi></msub><mi mathvariant=\"normal\">.</mi></mrow><annotation encoding=\"application/x-tex\">\\sum_{i=1}^n w_i\\mathbf{x}_i\n\\asymp\n\\sum_{i=1}^n \\frac1n\\mathbf{x}_i .\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:2.9291em;vertical-align:-1.2777em;\"></span><span class=\"mop op-limits\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.6514em;\"><span style=\"top:-1.8723em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">i</span><span class=\"mrel mtight\">=</span><span class=\"mord mtight\">1</span></span></span></span><span style=\"top:-3.05em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span><span class=\"mop op-symbol large-op\">∑</span></span></span><span style=\"top:-4.3em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">n</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.2777em;\"><span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02691em;\">w</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span 
class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0269em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathbf\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">≍</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.9291em;vertical-align:-1.2777em;\"></span><span class=\"mop op-limits\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.6514em;\"><span style=\"top:-1.8723em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">i</span><span class=\"mrel mtight\">=</span><span class=\"mord mtight\">1</span></span></span></span><span style=\"top:-3.05em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span><span class=\"mop op-symbol large-op\">∑</span></span></span><span style=\"top:-4.3em;margin-left:0em;\"><span class=\"pstrut\" 
style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">n</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.2777em;\"><span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3214em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">n</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mord\"><span class=\"mord mathbf\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord\">.</span></span></span></span></span></p>\n<p>Moreover, since every <span class=\"katex\"><span class=\"katex-mathml\"><math 
xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi mathvariant=\"bold\">x</mi><mi>i</mi></msub></mrow><annotation encoding=\"application/x-tex\">\\mathbf{x}_i</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.5944em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathbf\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span> satisfies <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi mathvariant=\"bold\">E</mi><mo stretchy=\"false\">(</mo><msub><mi mathvariant=\"bold\">x</mi><mi>i</mi></msub><mo stretchy=\"false\">)</mo><mo>=</mo><mn>0</mn></mrow><annotation encoding=\"application/x-tex\">\\mathbf{E}(\\mathbf{x}_i)=0</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathbf\">E</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathbf\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal 
mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">0</span></span></span></span> and <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mrow><mi mathvariant=\"normal\">V</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">r</mi></mrow><mo stretchy=\"false\">(</mo><msub><mi mathvariant=\"bold\">x</mi><mi>i</mi></msub><mo stretchy=\"false\">)</mo><mo>=</mo><mn>1</mn></mrow><annotation encoding=\"application/x-tex\">\\mathrm{Var}(\\mathbf{x}_i)=1</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">Var</span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathbf\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" 
style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">1</span></span></span></span>, all <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi mathvariant=\"bold\">x</mi><mi>i</mi></msub></mrow><annotation encoding=\"application/x-tex\">\\mathbf{x}_i</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.5944em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathbf\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span> share the same magnitude bound. 
Hence, when <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>n</mi></mrow><annotation encoding=\"application/x-tex\">n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\">n</span></span></span></span> vectors of the same order are combined in a weighted sum with weights of order <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mfrac><mn>1</mn><mi>n</mi></mfrac></mrow><annotation encoding=\"application/x-tex\">\\frac1n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" 
style=\"height:1.1901em;vertical-align:-0.345em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8451em;\"><span style=\"top:-2.655em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">n</span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.394em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">1</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.345em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span>, the output still has the same magnitude bound as a single <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi mathvariant=\"bold\">x</mi><mi>i</mi></msub></mrow><annotation encoding=\"application/x-tex\">\\mathbf{x}_i</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.5944em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathbf\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span 
class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span>, that is,</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><munderover><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mi>n</mi></munderover><mfrac><mn>1</mn><mi>n</mi></mfrac><msub><mi mathvariant=\"bold\">x</mi><mi>i</mi></msub><mo>≍</mo><msub><mi mathvariant=\"bold\">x</mi><mi>i</mi></msub><mi mathvariant=\"normal\">.</mi></mrow><annotation encoding=\"application/x-tex\">\\sum_{i=1}^n \\frac1n\\mathbf{x}_i \\asymp \\mathbf{x}_i .\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:2.9291em;vertical-align:-1.2777em;\"></span><span class=\"mop op-limits\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.6514em;\"><span style=\"top:-1.8723em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">i</span><span class=\"mrel mtight\">=</span><span class=\"mord mtight\">1</span></span></span></span><span style=\"top:-3.05em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span><span class=\"mop op-symbol large-op\">∑</span></span></span><span style=\"top:-4.3em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">n</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.2777em;\"><span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mopen 
nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3214em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">n</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mord\"><span class=\"mord mathbf\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">≍</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.5944em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathbf\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing 
reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord\">.</span></span></span></span></span></p>\n<p>Therefore</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mrow><mi mathvariant=\"normal\">S</mi><mi mathvariant=\"normal\">o</mi><mi mathvariant=\"normal\">f</mi><mi mathvariant=\"normal\">t</mi><mi mathvariant=\"normal\">m</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">x</mi></mrow><mo stretchy=\"false\">(</mo><msub><mi>q</mi><mn>1</mn></msub><mo separator=\"true\">,</mo><mo>⋯</mo><mtext> </mtext><mo separator=\"true\">,</mo><msub><mi>q</mi><mi>n</mi></msub><mo stretchy=\"false\">)</mo><mi>X</mi><mo>=</mo><munderover><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mi>n</mi></munderover><msub><mi>w</mi><mi>i</mi></msub><msub><mi mathvariant=\"bold\">x</mi><mi>i</mi></msub><mo>≍</mo><msub><mi mathvariant=\"bold\">x</mi><mi>i</mi></msub><mi mathvariant=\"normal\">.</mi></mrow><annotation encoding=\"application/x-tex\">\\mathrm{Softmax}(q_1,\\cdots,q_n)X\n= \\sum_{i=1}^n w_i\\mathbf{x}_i\n\\asymp \\mathbf{x}_i .\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">Softmax</span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">q</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" 
style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"minner\">⋯</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">q</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">n</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">X</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.9291em;vertical-align:-1.2777em;\"></span><span class=\"mop op-limits\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.6514em;\"><span style=\"top:-1.8723em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">i</span><span 
class=\"mrel mtight\">=</span><span class=\"mord mtight\">1</span></span></span></span><span style=\"top:-3.05em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span><span class=\"mop op-symbol large-op\">∑</span></span></span><span style=\"top:-4.3em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">n</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.2777em;\"><span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02691em;\">w</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0269em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathbf\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">≍</span><span class=\"mspace\" 
style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.5944em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathbf\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord\">.</span></span></span></span></span></p>\n<p>Lemma 1 estimates the order of magnitude of an attention head's output: the output has the same order as the input, so attention does not cause the model update to explode.</p>\n<p><strong>Lemma 2</strong> (Additivity of the order of variance)</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mrow><mi mathvariant=\"normal\">V</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">r</mi></mrow><mo stretchy=\"false\">(</mo><mi>X</mi><mo>+</mo><mi>Y</mi><mo stretchy=\"false\">)</mo><mo>≍</mo><mrow><mi mathvariant=\"normal\">V</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">r</mi></mrow><mo stretchy=\"false\">(</mo><mi>X</mi><mo stretchy=\"false\">)</mo><mo>+</mo><mrow><mi mathvariant=\"normal\">V</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">r</mi></mrow><mo stretchy=\"false\">(</mo><mi>Y</mi><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">\\mathrm{Var}(X+Y)\\asymp \\mathrm{Var}(X) + \\mathrm{Var}(Y)\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord 
mathrm\">Var</span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">X</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.22222em;\">Y</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">≍</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">Var</span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">X</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">Var</span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.22222em;\">Y</span><span class=\"mclose\">)</span></span></span></span></span></p>\n<p><strong>Proof:</strong><br>\nExpanding gives</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mtable rowspacing=\"0.25em\" columnalign=\"right left\" columnspacing=\"0em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow><mi mathvariant=\"normal\">V</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">r</mi></mrow><mo stretchy=\"false\">(</mo><mi>X</mi><mo>+</mo><mi>Y</mi><mo 
stretchy=\"false\">)</mo></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><mi mathvariant=\"double-struck\">E</mi><mo stretchy=\"false\">[</mo><mo stretchy=\"false\">(</mo><mi>X</mi><mo>+</mo><mi>Y</mi><msup><mo stretchy=\"false\">)</mo><mn>2</mn></msup><mo stretchy=\"false\">]</mo><mo>−</mo><mo stretchy=\"false\">[</mo><mi mathvariant=\"double-struck\">E</mi><mo stretchy=\"false\">(</mo><mi>X</mi><mo>+</mo><mi>Y</mi><mo stretchy=\"false\">)</mo><msup><mo stretchy=\"false\">]</mo><mn>2</mn></msup></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><mi mathvariant=\"double-struck\">E</mi><mo stretchy=\"false\">(</mo><msup><mi>X</mi><mn>2</mn></msup><mo stretchy=\"false\">)</mo><mo>−</mo><msup><mi mathvariant=\"double-struck\">E</mi><mn>2</mn></msup><mo stretchy=\"false\">(</mo><mi>X</mi><mo stretchy=\"false\">)</mo><mo>+</mo><mi mathvariant=\"double-struck\">E</mi><mo stretchy=\"false\">(</mo><msup><mi>Y</mi><mn>2</mn></msup><mo stretchy=\"false\">)</mo><mo>−</mo><msup><mi mathvariant=\"double-struck\">E</mi><mn>2</mn></msup><mo stretchy=\"false\">(</mo><mi>Y</mi><mo stretchy=\"false\">)</mo><mo>+</mo><mn>2</mn><mi mathvariant=\"double-struck\">E</mi><mo stretchy=\"false\">(</mo><mi>X</mi><mi>Y</mi><mo stretchy=\"false\">)</mo><mo>−</mo><mn>2</mn><mi mathvariant=\"double-struck\">E</mi><mo stretchy=\"false\">(</mo><mi>X</mi><mo stretchy=\"false\">)</mo><mi mathvariant=\"double-struck\">E</mi><mo stretchy=\"false\">(</mo><mi>Y</mi><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><mrow><mi mathvariant=\"normal\">V</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">r</mi></mrow><mo 
stretchy=\"false\">(</mo><mi>X</mi><mo stretchy=\"false\">)</mo><mo>+</mo><mrow><mi mathvariant=\"normal\">V</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">r</mi></mrow><mo stretchy=\"false\">(</mo><mi>Y</mi><mo stretchy=\"false\">)</mo><mo>+</mo><mn>2</mn><mrow><mi mathvariant=\"normal\">c</mi><mi mathvariant=\"normal\">o</mi><mi mathvariant=\"normal\">v</mi></mrow><mo stretchy=\"false\">(</mo><mi>X</mi><mo separator=\"true\">,</mo><mi>Y</mi><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd></mtr></mtable><annotation encoding=\"application/x-tex\">\\begin{aligned}\n\\mathrm{Var}(X+Y) &amp;= \\mathbb{E}[(X+Y)^2] - [\\mathbb{E}(X+Y)]^2\\\\\n&amp; = \\mathbb{E}(X^2) - \\mathbb{E}^2(X) +\\mathbb{E}(Y^2) - \\mathbb{E}^2(Y) +2\\mathbb{E}(XY) - 2\\mathbb{E}(X)\\mathbb{E}(Y)\\\\\n&amp; = \\mathrm{Var}(X) +\\mathrm{Var}(Y)+2\\mathrm{cov}(X,Y)\n\\end{aligned}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:4.5482em;vertical-align:-2.0241em;\"></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-r\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.5241em;\"><span style=\"top:-4.66em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathrm\">Var</span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">X</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.22222em;\">Y</span><span class=\"mclose\">)</span></span></span><span style=\"top:-3.1359em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"></span></span><span style=\"top:-1.6359em;\"><span class=\"pstrut\" 
style=\"height:3em;\"></span><span class=\"mord\"></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.0241em;\"><span></span></span></span></span></span><span class=\"col-align-l\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.5241em;\"><span style=\"top:-4.66em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord mathbb\">E</span><span class=\"mopen\">[(</span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">X</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.22222em;\">Y</span><span class=\"mclose\"><span class=\"mclose\">)</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8641em;\"><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span><span class=\"mclose\">]</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mopen\">[</span><span class=\"mord mathbb\">E</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">X</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord mathnormal\" 
style=\"margin-right:0.22222em;\">Y</span><span class=\"mclose\">)</span><span class=\"mclose\"><span class=\"mclose\">]</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8641em;\"><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span></span></span><span style=\"top:-3.1359em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord mathbb\">E</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">X</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8641em;\"><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathbb\">E</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8641em;\"><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" 
style=\"margin-right:0.07847em;\">X</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord mathbb\">E</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.22222em;\">Y</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8641em;\"><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathbb\">E</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8641em;\"><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.22222em;\">Y</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\">2</span><span class=\"mord mathbb\">E</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">X</span><span class=\"mord mathnormal\" style=\"margin-right:0.22222em;\">Y</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span 
class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\">2</span><span class=\"mord mathbb\">E</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">X</span><span class=\"mclose\">)</span><span class=\"mord mathbb\">E</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.22222em;\">Y</span><span class=\"mclose\">)</span></span></span><span style=\"top:-1.6359em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">Var</span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">X</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">Var</span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.22222em;\">Y</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\">2</span><span class=\"mord\"><span class=\"mord mathrm\" style=\"margin-right:0.01389em;\">cov</span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">X</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.22222em;\">Y</span><span class=\"mclose\">)</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:2.0241em;\"><span></span></span></span></span></span></span></span></span></span></span></span></p>\n<p><strong>Theorem 1</strong> For an <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>N</mi></mrow><annotation encoding=\"application/x-tex\">N</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">N</span></span></span></span>-layer DeepNet architecture <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>F</mi><mo stretchy=\"false\">(</mo><mi>x</mi><mo separator=\"true\">,</mo><mi>θ</mi><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">F(x,\\theta)</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">F</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">θ</span><span class=\"mclose\">)</span></span></span></span>, each layer can be written as <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>x</mi><mrow><mi>l</mi><mo>+</mo><mn>1</mn></mrow></msub><mo>=</mo><mi>f</mi><mo stretchy=\"false\">(</mo><msub><mi>x</mi><mi>l</mi></msub><mo separator=\"true\">,</mo><msub><mi>θ</mi><mi>l</mi></msub><mo stretchy=\"false\">)</mo><mo>=</mo><mrow><mi mathvariant=\"normal\">L</mi><mi mathvariant=\"normal\">N</mi></mrow><mo 
stretchy=\"false\">(</mo><mi>α</mi><msub><mi>x</mi><mi>l</mi></msub><mo>+</mo><msub><mi>G</mi><msub><mi>θ</mi><mi>l</mi></msub></msub><mo stretchy=\"false\">(</mo><msub><mi>x</mi><mi>l</mi></msub><mo stretchy=\"false\">)</mo><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">x_{l+1} = f(x_l,\\theta_l) = \\mathrm{LN}(\\alpha x_l+G_{\\theta_l}(x_l))</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6389em;vertical-align:-0.2083em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span><span class=\"mbin mtight\">+</span><span class=\"mord mtight\">1</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2083em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10764em;\">f</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 
size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">LN</span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span 
class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.0059em;vertical-align:-0.2559em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">G</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.02778em;\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3448em;\"><span style=\"top:-2.3488em;margin-left:-0.0278em;margin-right:0.0714em;\"><span class=\"pstrut\" style=\"height:2.5em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1512em;\"><span></span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2559em;\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing 
reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">))</span></span></span></span>, <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>θ</mi><mi>i</mi></msub><mo>∈</mo><mrow><mo fence=\"true\">{</mo><msub><mi>θ</mi><mn>1</mn></msub><mo separator=\"true\">,</mo><mo>⋯</mo><mtext> </mtext><mo separator=\"true\">,</mo><msub><mi>θ</mi><mrow><mn>2</mn><mi>N</mi></mrow></msub><mo fence=\"true\">}</mo></mrow></mrow><annotation encoding=\"application/x-tex\">\\theta_i\\in \\left\\{\\theta_1,\\cdots,\\theta_{2N}\\right\\}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8444em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">∈</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" 
style=\"top:0em;\">{</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"minner\">⋯</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose delimcenter\" style=\"top:0em;\">}</span></span></span></span></span></p>\n<p>Here the odd-indexed parameters are the Self-Attention weights and the even-indexed parameters are the FFN weights. Then the following bound holds:</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" 
display=\"block\"><semantics><mrow><mi mathvariant=\"normal\">∥</mi><mi mathvariant=\"normal\">Δ</mi><mi>F</mi><mi mathvariant=\"normal\">∥</mi><mo>≤</mo><munderover><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mrow><mn>2</mn><mi>N</mi></mrow></munderover><mfrac><msqrt><mrow><msubsup><mi>v</mi><mi>i</mi><mn>2</mn></msubsup><mo>+</mo><msubsup><mi>w</mi><mi>i</mi><mn>2</mn></msubsup></mrow></msqrt><mi>α</mi></mfrac><mi mathvariant=\"normal\">∥</mi><msubsup><mi>θ</mi><mi>i</mi><mo>∗</mo></msubsup><mo>−</mo><msub><mi>θ</mi><mi>i</mi></msub><mi mathvariant=\"normal\">∥</mi></mrow><annotation encoding=\"application/x-tex\">\\|\\Delta F\\|\\leq \\sum_{i=1}^{2N} \\frac{\\sqrt{v_i^2+w_i^2}}{\\alpha}\\|\\theta^*_i-\\theta_i\\|\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\">∥Δ</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">F</span><span class=\"mord\">∥</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">≤</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.9291em;vertical-align:-1.2777em;\"></span><span class=\"mop op-limits\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.6514em;\"><span style=\"top:-1.8723em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">i</span><span class=\"mrel mtight\">=</span><span class=\"mord mtight\">1</span></span></span></span><span style=\"top:-3.05em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span><span class=\"mop op-symbol large-op\">∑</span></span></span><span style=\"top:-4.3em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span 
class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.2777em;\"><span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.63em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.6855em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord sqrt\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9445em;\"><span class=\"svg-align\" style=\"top:-3.2em;\"><span class=\"pstrut\" style=\"height:3.2em;\"></span><span class=\"mord\" style=\"padding-left:1em;\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7959em;\"><span style=\"top:-2.4231em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span><span style=\"top:-3.0448em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span 
class=\"vlist\" style=\"height:0.2769em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02691em;\">w</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7959em;\"><span style=\"top:-2.4231em;margin-left:-0.0269em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span><span style=\"top:-3.0448em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2769em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-2.9045em;\"><span class=\"pstrut\" style=\"height:3.2em;\"></span><span class=\"hide-tail\" style=\"min-width:1.02em;height:1.28em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"1.28em\" viewbox=\"0 0 400000 1296\" preserveaspectratio=\"xMinYMin slice\"><path d=\"M263,681c0.7,0,18,39.7,52,119\nc34,79.3,68.167,158.7,102.5,238c34.3,79.3,51.8,119.3,52.5,120\nc340,-704.7,510.7,-1060.3,512,-1067\nl0 -0\nc4.7,-7.3,11,-11,19,-11\nH40000v40H1012.3\ns-271.3,567,-271.3,567c-38.7,80.7,-84,175,-136,283c-52,108,-89.167,185.3,-111.5,232\nc-22.3,46.7,-33.8,70.3,-34.5,71c-4.7,4.7,-12.3,7,-23,7s-12,-1,-12,-1\ns-109,-253,-109,-253c-72.7,-168,-109.3,-252,-110,-252c-10.7,8,-22,16.7,-34,26\nc-22,17.3,-33.3,26,-34,26s-26,-26,-26,-26s76,-59,76,-59s76,-60,76,-60z\nM1001 80h400000v40h-400000z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span 
class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2955em;\"><span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mord\">∥</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7387em;\"><span style=\"top:-2.453em;margin-left:-0.0278em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mbin mtight\">∗</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.247em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord\">∥</span></span></span></span></span></p>\n<p>In the simplest case, the Self-Attention layer and the FFN layer are both treated as <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>G</mi><mi>θ</mi></msub><mo stretchy=\"false\">(</mo><msub><mi>x</mi><mi>l</mi></msub><mo stretchy=\"false\">)</mo><mo>≍</mo><msub><mi>v</mi><mi>l</mi></msub><msub><mi>w</mi><mi>l</mi></msub><msub><mi>x</mi><mi>l</mi></msub></mrow><annotation encoding=\"application/x-tex\">G_\\theta(x_l) \\asymp v_lw_lx_l</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">G</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.02778em;\">θ</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">≍</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.5806em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02691em;\">w</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.0269em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span 
class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span>, so the recurrence for each layer's output is</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mtable rowspacing=\"0.25em\" columnalign=\"right left\" columnspacing=\"0em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><msub><mi>x</mi><mrow><mi>l</mi><mo>+</mo><mn>1</mn></mrow></msub><mo>=</mo><mi>f</mi><mo stretchy=\"false\">(</mo><msub><mi>x</mi><mi>l</mi></msub><mo separator=\"true\">,</mo><msub><mi>θ</mi><mi>l</mi></msub><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><mfrac><mrow><mi>α</mi><msub><mi>x</mi><mi>l</mi></msub><mo>+</mo><msub><mi>G</mi><msub><mi>θ</mi><mi>l</mi></msub></msub><mo stretchy=\"false\">(</mo><msub><mi>x</mi><mi>l</mi></msub><mo stretchy=\"false\">)</mo></mrow><msqrt><mrow><mrow><mi mathvariant=\"normal\">V</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">r</mi></mrow><mo stretchy=\"false\">(</mo><mi>α</mi><msub><mi>x</mi><mi>l</mi></msub><mo>+</mo><msub><mi>G</mi><msub><mi>θ</mi><mi>l</mi></msub></msub><mo stretchy=\"false\">(</mo><msub><mi>x</mi><mi>l</mi></msub><mo stretchy=\"false\">)</mo><mo stretchy=\"false\">)</mo></mrow></msqrt></mfrac></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>≍</mo><mfrac><mrow><mo stretchy=\"false\">(</mo><mi>α</mi><mo>+</mo><msub><mi>v</mi><mi>l</mi></msub><msub><mi>w</mi><mi>l</mi></msub><mo 
stretchy=\"false\">)</mo><msub><mi>x</mi><mi>l</mi></msub></mrow><msqrt><mrow><mo stretchy=\"false\">(</mo><msup><mi>α</mi><mn>2</mn></msup><mo>+</mo><msubsup><mi>v</mi><mi>l</mi><mn>2</mn></msubsup><msubsup><mi>w</mi><mi>l</mi><mn>2</mn></msubsup><mo stretchy=\"false\">)</mo><mrow><mi mathvariant=\"normal\">V</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">r</mi></mrow><mo stretchy=\"false\">(</mo><msub><mi>x</mi><mi>l</mi></msub><mo stretchy=\"false\">)</mo></mrow></msqrt></mfrac></mrow></mstyle></mtd></mtr></mtable><annotation encoding=\"application/x-tex\">\\begin{aligned}\nx_{l+1} = f(x_l,\\theta_l) &amp;= \\frac{\\alpha x_l+G_{\\theta_l}(x_l)}{\\sqrt{\\mathrm{Var}(\\alpha x_l+G_{\\theta_l}(x_l))}}\\\\\n&amp; \\asymp \\frac{(\\alpha+v_lw_l)x_l}{\\sqrt{(\\alpha^2+v_l^2w_l^2)\\mathrm{Var}(x_l)}}\\\\\n\\end{aligned}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:5.714em;vertical-align:-2.607em;\"></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-r\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:3.107em;\"><span style=\"top:-5.107em;\"><span class=\"pstrut\" style=\"height:3.427em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span><span class=\"mbin mtight\">+</span><span class=\"mord mtight\">1</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.2083em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10764em;\">f</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span></span></span><span style=\"top:-2.25em;\"><span class=\"pstrut\" style=\"height:3.427em;\"></span><span class=\"mord\"></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:2.607em;\"><span></span></span></span></span></span><span class=\"col-align-l\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:3.107em;\"><span style=\"top:-5.107em;\"><span class=\"pstrut\" style=\"height:3.427em;\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.427em;\"><span style=\"top:-2.1779em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord sqrt\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9321em;\"><span class=\"svg-align\" style=\"top:-3.2em;\"><span class=\"pstrut\" style=\"height:3.2em;\"></span><span class=\"mord\" style=\"padding-left:1em;\"><span class=\"mord\"><span class=\"mord mathrm\">Var</span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" 
style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">G</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.02778em;\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3448em;\"><span style=\"top:-2.3488em;margin-left:-0.0278em;margin-right:0.0714em;\"><span class=\"pstrut\" style=\"height:2.5em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1512em;\"><span></span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2559em;\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">))</span></span></span><span 
style=\"top:-2.8921em;\"><span class=\"pstrut\" style=\"height:3.2em;\"></span><span class=\"hide-tail\" style=\"min-width:1.02em;height:1.28em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"1.28em\" viewbox=\"0 0 400000 1296\" preserveaspectratio=\"xMinYMin slice\"><path d=\"M263,681c0.7,0,18,39.7,52,119\nc34,79.3,68.167,158.7,102.5,238c34.3,79.3,51.8,119.3,52.5,120\nc340,-704.7,510.7,-1060.3,512,-1067\nl0 -0\nc4.7,-7.3,11,-11,19,-11\nH40000v40H1012.3\ns-271.3,567,-271.3,567c-38.7,80.7,-84,175,-136,283c-52,108,-89.167,185.3,-111.5,232\nc-22.3,46.7,-33.8,70.3,-34.5,71c-4.7,4.7,-12.3,7,-23,7s-12,-1,-12,-1\ns-109,-253,-109,-253c-72.7,-168,-109.3,-252,-110,-252c-10.7,8,-22,16.7,-34,26\nc-22,17.3,-33.3,26,-34,26s-26,-26,-26,-26s76,-59,76,-59s76,-60,76,-60z\nM1001 80h400000v40h-400000z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3079em;\"><span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" 
style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">G</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.02778em;\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3448em;\"><span style=\"top:-2.3488em;margin-left:-0.0278em;margin-right:0.0714em;\"><span class=\"pstrut\" style=\"height:2.5em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1512em;\"><span></span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2559em;\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.13em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span><span style=\"top:-2.25em;\"><span class=\"pstrut\" style=\"height:3.427em;\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">≍</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.427em;\"><span style=\"top:-2.1777em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord sqrt\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9323em;\"><span class=\"svg-align\" style=\"top:-3.2em;\"><span class=\"pstrut\" style=\"height:3.2em;\"></span><span class=\"mord\" style=\"padding-left:1em;\"><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7401em;\"><span style=\"top:-2.989em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">v</span><span class=\"msupsub\"><span class=\"vlist-t 
vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7959em;\"><span style=\"top:-2.3987em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span><span style=\"top:-3.0448em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3013em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02691em;\">w</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7959em;\"><span style=\"top:-2.3987em;margin-left:-0.0269em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span><span style=\"top:-3.0448em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3013em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mord\"><span class=\"mord mathrm\">Var</span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" 
style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span></span></span><span style=\"top:-2.8923em;\"><span class=\"pstrut\" style=\"height:3.2em;\"></span><span class=\"hide-tail\" style=\"min-width:1.02em;height:1.28em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"1.28em\" viewbox=\"0 0 400000 1296\" preserveaspectratio=\"xMinYMin slice\"><path d=\"M263,681c0.7,0,18,39.7,52,119\nc34,79.3,68.167,158.7,102.5,238c34.3,79.3,51.8,119.3,52.5,120\nc340,-704.7,510.7,-1060.3,512,-1067\nl0 -0\nc4.7,-7.3,11,-11,19,-11\nH40000v40H1012.3\ns-271.3,567,-271.3,567c-38.7,80.7,-84,175,-136,283c-52,108,-89.167,185.3,-111.5,232\nc-22.3,46.7,-33.8,70.3,-34.5,71c-4.7,4.7,-12.3,7,-23,7s-12,-1,-12,-1\ns-109,-253,-109,-253c-72.7,-168,-109.3,-252,-110,-252c-10.7,8,-22,16.7,-34,26\nc-22,17.3,-33.3,26,-34,26s-26,-26,-26,-26s76,-59,76,-59s76,-60,76,-60z\nM1001 80h400000v40h-400000z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3077em;\"><span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" 
style=\"margin-right:0.03588em;\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02691em;\">w</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.0269em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span 
class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.13em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.607em;\"><span></span></span></span></span></span></span></span></span></span></span></span></p>\n<p>By Lemma 1, <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mrow><mi mathvariant=\"normal\">V</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">r</mi></mrow><mo stretchy=\"false\">(</mo><msub><mi>x</mi><mi>l</mi></msub><mo stretchy=\"false\">)</mo><mo>≍</mo><mrow><mi mathvariant=\"normal\">V</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">r</mi></mrow><mo stretchy=\"false\">(</mo><msub><mi>x</mi><mn>0</mn></msub><mo stretchy=\"false\">)</mo><mo>=</mo><mn>1</mn></mrow><annotation encoding=\"application/x-tex\">\\mathrm{Var}(x_l) \\asymp \\mathrm{Var}(x_0) = 1</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">Var</span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" 
style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">≍</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">Var</span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">0</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">1</span></span></span></span>, so</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mi>x</mi><mrow><mi>l</mi><mo>+</mo><mn>1</mn></mrow></msub><mo>=</mo><mi>f</mi><mo stretchy=\"false\">(</mo><msub><mi>x</mi><mi>l</mi></msub><mo separator=\"true\">,</mo><msub><mi>θ</mi><mi>l</mi></msub><mo stretchy=\"false\">)</mo><mo>≍</mo><mfrac><mrow><mi>α</mi><mo>+</mo><msub><mi>v</mi><mi>l</mi></msub><msub><mi>w</mi><mi>l</mi></msub></mrow><msqrt><mrow><msup><mi>α</mi><mn>2</mn></msup><mo>+</mo><msubsup><mi>v</mi><mi>l</mi><mn>2</mn></msubsup><msubsup><mi>w</mi><mi>l</mi><mn>2</mn></msubsup></mrow></msqrt></mfrac><msub><mi>x</mi><mi>l</mi></msub></mrow><annotation 
encoding=\"application/x-tex\">x_{l+1} = f(x_l,\\theta_l) \\asymp \\frac{\\alpha+v_lw_l}{\\sqrt{\\alpha^2+v_l^2w_l^2}}x_l\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6389em;vertical-align:-0.2083em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span><span class=\"mbin mtight\">+</span><span class=\"mord mtight\">1</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2083em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10764em;\">f</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">≍</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.3903em;vertical-align:-1.13em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.2603em;\"><span style=\"top:-2.1777em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord sqrt\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9323em;\"><span class=\"svg-align\" style=\"top:-3.2em;\"><span class=\"pstrut\" style=\"height:3.2em;\"></span><span class=\"mord\" style=\"padding-left:1em;\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7401em;\"><span style=\"top:-2.989em;margin-right:0.05em;\"><span class=\"pstrut\" 
style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7959em;\"><span style=\"top:-2.3987em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span><span style=\"top:-3.0448em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3013em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02691em;\">w</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7959em;\"><span style=\"top:-2.3987em;margin-left:-0.0269em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span><span style=\"top:-3.0448em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.3013em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-2.8923em;\"><span class=\"pstrut\" style=\"height:3.2em;\"></span><span class=\"hide-tail\" style=\"min-width:1.02em;height:1.28em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"1.28em\" viewbox=\"0 0 400000 1296\" preserveaspectratio=\"xMinYMin slice\"><path d=\"M263,681c0.7,0,18,39.7,52,119\nc34,79.3,68.167,158.7,102.5,238c34.3,79.3,51.8,119.3,52.5,120\nc340,-704.7,510.7,-1060.3,512,-1067\nl0 -0\nc4.7,-7.3,11,-11,19,-11\nH40000v40H1012.3\ns-271.3,567,-271.3,567c-38.7,80.7,-84,175,-136,283c-52,108,-89.167,185.3,-111.5,232\nc-22.3,46.7,-33.8,70.3,-34.5,71c-4.7,4.7,-12.3,7,-23,7s-12,-1,-12,-1\ns-109,-253,-109,-253c-72.7,-168,-109.3,-252,-110,-252c-10.7,8,-22,16.7,-34,26\nc-22,17.3,-33.3,26,-34,26s-26,-26,-26,-26s76,-59,76,-59s76,-60,76,-60z\nM1001 80h400000v40h-400000z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3077em;\"><span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 
mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02691em;\">w</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.0269em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.13em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span></span></p>\n<p>Therefore</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mfrac><mrow><mi 
mathvariant=\"normal\">∂</mi><msub><mi>f</mi><mi>l</mi></msub></mrow><mrow><mi mathvariant=\"normal\">∂</mi><msub><mi>x</mi><mi>l</mi></msub></mrow></mfrac><mo>≍</mo><mfrac><mrow><mi>α</mi><mo>+</mo><msub><mi>v</mi><mi>l</mi></msub><msub><mi>w</mi><mi>l</mi></msub></mrow><msqrt><mrow><msup><mi>α</mi><mn>2</mn></msup><mo>+</mo><msubsup><mi>v</mi><mi>l</mi><mn>2</mn></msubsup><msubsup><mi>w</mi><mi>l</mi><mn>2</mn></msubsup></mrow></msqrt></mfrac></mrow><annotation encoding=\"application/x-tex\">\\frac{\\partial f_l}{\\partial x_l} \\asymp \\frac{\\alpha+v_lw_l}{\\sqrt{\\alpha^2+v_l^2w_l^2}}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:2.2074em;vertical-align:-0.836em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3714em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" 
style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.10764em;\">f</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.1076em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.836em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">≍</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.3903em;vertical-align:-1.13em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.2603em;\"><span style=\"top:-2.1777em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord sqrt\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9323em;\"><span class=\"svg-align\" style=\"top:-3.2em;\"><span class=\"pstrut\" style=\"height:3.2em;\"></span><span class=\"mord\" style=\"padding-left:1em;\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"msupsub\"><span class=\"vlist-t\"><span 
class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7401em;\"><span style=\"top:-2.989em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7959em;\"><span style=\"top:-2.3987em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span><span style=\"top:-3.0448em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3013em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02691em;\">w</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7959em;\"><span style=\"top:-2.3987em;margin-left:-0.0269em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span><span style=\"top:-3.0448em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord 
mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3013em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-2.8923em;\"><span class=\"pstrut\" style=\"height:3.2em;\"></span><span class=\"hide-tail\" style=\"min-width:1.02em;height:1.28em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"1.28em\" viewbox=\"0 0 400000 1296\" preserveaspectratio=\"xMinYMin slice\"><path d=\"M263,681c0.7,0,18,39.7,52,119\nc34,79.3,68.167,158.7,102.5,238c34.3,79.3,51.8,119.3,52.5,120\nc340,-704.7,510.7,-1060.3,512,-1067\nl0 -0\nc4.7,-7.3,11,-11,19,-11\nH40000v40H1012.3\ns-271.3,567,-271.3,567c-38.7,80.7,-84,175,-136,283c-52,108,-89.167,185.3,-111.5,232\nc-22.3,46.7,-33.8,70.3,-34.5,71c-4.7,4.7,-12.3,7,-23,7s-12,-1,-12,-1\ns-109,-253,-109,-253c-72.7,-168,-109.3,-252,-110,-252c-10.7,8,-22,16.7,-34,26\nc-22,17.3,-33.3,26,-34,26s-26,-26,-26,-26s76,-59,76,-59s76,-60,76,-60z\nM1001 80h400000v40h-400000z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3077em;\"><span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span 
style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02691em;\">w</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.0269em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.13em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<p>Let <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>θ</mi><mi>l</mi></msub><mo>=</mo><mo stretchy=\"false\">(</mo><msub><mi>v</mi><mi>l</mi></msub><mo separator=\"true\">,</mo><msub><mi>w</mi><mi>l</mi></msub><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">\\theta_l = (v_l,w_l)</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8444em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" 
style=\"margin-right:0.02778em;\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02691em;\">w</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.0269em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span 
class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span></span></span></span></p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><msub><mi>f</mi><mi>l</mi></msub></mrow><mrow><mi mathvariant=\"normal\">∂</mi><msub><mi>θ</mi><mi>l</mi></msub></mrow></mfrac><mo>≍</mo><mfrac><mrow><mi>α</mi><msub><mi>x</mi><mi>l</mi></msub><mo stretchy=\"false\">(</mo><mi>α</mi><mo>−</mo><msub><mi>v</mi><mi>l</mi></msub><msub><mi>w</mi><mi>l</mi></msub><mo stretchy=\"false\">)</mo></mrow><mrow><mo stretchy=\"false\">(</mo><msup><mi>α</mi><mn>2</mn></msup><mo>+</mo><msubsup><mi>v</mi><mi>l</mi><mn>2</mn></msubsup><msubsup><mi>w</mi><mi>l</mi><mn>2</mn></msubsup><msup><mo stretchy=\"false\">)</mo><mfrac><mn>3</mn><mn>2</mn></mfrac></msup></mrow></mfrac><mrow><mo fence=\"true\">(</mo><msub><mi>w</mi><mi>l</mi></msub><mo separator=\"true\">,</mo><msub><mi>v</mi><mi>l</mi></msub><mo fence=\"true\">)</mo></mrow></mrow><annotation encoding=\"application/x-tex\">\\frac{\\partial f_l}{\\partial \\theta_l} \\asymp \\frac{\\alpha x_l(\\alpha -v_lw_l)}{(\\alpha^2+v_l^2w_l^2)^{\\frac{3}{2}}} \\left(w_l,v_l\\right)\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:2.2074em;vertical-align:-0.836em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3714em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" 
style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.10764em;\">f</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.1076em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.836em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span 
class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">≍</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.5579em;vertical-align:-1.1309em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.427em;\"><span style=\"top:-2.1704em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7401em;\"><span style=\"top:-2.989em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7959em;\"><span style=\"top:-2.3987em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span><span style=\"top:-3.0448em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.3013em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02691em;\">w</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7959em;\"><span style=\"top:-2.3987em;margin-left:-0.0269em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span><span style=\"top:-3.0448em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3013em;\"><span></span></span></span></span></span></span><span class=\"mclose\"><span class=\"mclose\">)</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9396em;\"><span style=\"top:-3.3486em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mopen nulldelimiter sizing reset-size3 size6\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8443em;\"><span style=\"top:-2.656em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span></span></span></span><span style=\"top:-3.2255em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line mtight\" style=\"border-bottom-width:0.049em;\"></span></span><span style=\"top:-3.384em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span 
class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">3</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.344em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter sizing reset-size3 size6\"></span></span></span></span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span 
class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02691em;\">w</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.0269em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.1309em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em;\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02691em;\">w</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.0269em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose delimcenter\" style=\"top:0em;\">)</span></span></span></span></span></span></p>\n<p>Therefore, the layer model update can be expanded as follows:</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mtable rowspacing=\"0.25em\" columnalign=\"right left\" columnspacing=\"0em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mi mathvariant=\"normal\">∥</mi><mi mathvariant=\"normal\">Δ</mi><mi>F</mi><mi mathvariant=\"normal\">∥</mi></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><mi mathvariant=\"normal\">∥</mi><mi>f</mi><mo stretchy=\"false\">(</mo><msubsup><mi>x</mi><mrow><mn>2</mn><mi>N</mi></mrow><mo>∗</mo></msubsup><mo separator=\"true\">,</mo><msubsup><mi>θ</mi><mrow><mn>2</mn><mi>N</mi></mrow><mo>∗</mo></msubsup><mo stretchy=\"false\">)</mo><mo>−</mo><mi>f</mi><mo stretchy=\"false\">(</mo><msub><mi>x</mi><mrow><mn>2</mn><mi>N</mi></mrow></msub><mo separator=\"true\">,</mo><msub><mi>θ</mi><mrow><mn>2</mn><mi>N</mi></mrow></msub><mo stretchy=\"false\">)</mo><mi 
mathvariant=\"normal\">∥</mi></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><mi mathvariant=\"normal\">∥</mi><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mi>f</mi></mrow><mrow><mi mathvariant=\"normal\">∂</mi><msub><mi>x</mi><mi>l</mi></msub></mrow></mfrac><mo stretchy=\"false\">(</mo><msub><mi>x</mi><mrow><mn>2</mn><mi>N</mi></mrow></msub><mo separator=\"true\">,</mo><msub><mi>θ</mi><mrow><mn>2</mn><mi>N</mi></mrow></msub><mo stretchy=\"false\">)</mo><mo stretchy=\"false\">(</mo><msubsup><mi>x</mi><mrow><mn>2</mn><mi>N</mi></mrow><mo>∗</mo></msubsup><mo>−</mo><msub><mi>x</mi><mrow><mn>2</mn><mi>N</mi></mrow></msub><mo stretchy=\"false\">)</mo><mo>+</mo><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mi>f</mi></mrow><mrow><mi mathvariant=\"normal\">∂</mi><msub><mi>θ</mi><mi>l</mi></msub></mrow></mfrac><mo stretchy=\"false\">(</mo><msub><mi>x</mi><mrow><mn>2</mn><mi>N</mi></mrow></msub><mo separator=\"true\">,</mo><msub><mi>θ</mi><mrow><mn>2</mn><mi>N</mi></mrow></msub><mo stretchy=\"false\">)</mo><mo stretchy=\"false\">(</mo><msubsup><mi>θ</mi><mrow><mn>2</mn><mi>N</mi></mrow><mo>∗</mo></msubsup><mo>−</mo><msub><mi>θ</mi><mrow><mn>2</mn><mi>N</mi></mrow></msub><mo stretchy=\"false\">)</mo><mi mathvariant=\"normal\">∥</mi></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>≤</mo><mi mathvariant=\"normal\">∥</mi><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mi>f</mi></mrow><mrow><mi mathvariant=\"normal\">∂</mi><msub><mi>x</mi><mi>l</mi></msub></mrow></mfrac><mo stretchy=\"false\">(</mo><msub><mi>x</mi><mrow><mn>2</mn><mi>N</mi></mrow></msub><mo separator=\"true\">,</mo><msub><mi>θ</mi><mrow><mn>2</mn><mi>N</mi></mrow></msub><mo stretchy=\"false\">)</mo><mi 
mathvariant=\"normal\">∥</mi><mi mathvariant=\"normal\">∥</mi><msubsup><mi>x</mi><mrow><mn>2</mn><mi>N</mi></mrow><mo>∗</mo></msubsup><mo>−</mo><msub><mi>x</mi><mrow><mn>2</mn><mi>N</mi></mrow></msub><mi mathvariant=\"normal\">∥</mi><mo>+</mo><mi mathvariant=\"normal\">∥</mi><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mi>f</mi></mrow><mrow><mi mathvariant=\"normal\">∂</mi><msub><mi>θ</mi><mi>l</mi></msub></mrow></mfrac><mo stretchy=\"false\">(</mo><msub><mi>x</mi><mrow><mn>2</mn><mi>N</mi></mrow></msub><mo separator=\"true\">,</mo><msub><mi>θ</mi><mrow><mn>2</mn><mi>N</mi></mrow></msub><mo stretchy=\"false\">)</mo><mi mathvariant=\"normal\">∥</mi><mi mathvariant=\"normal\">∥</mi><msubsup><mi>θ</mi><mrow><mn>2</mn><mi>N</mi></mrow><mo>∗</mo></msubsup><mo>−</mo><msub><mi>θ</mi><mrow><mn>2</mn><mi>N</mi></mrow></msub><mi mathvariant=\"normal\">∥</mi></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>≍</mo><mfrac><mrow><mi>α</mi><mo>+</mo><msub><mi>v</mi><mrow><mn>2</mn><mi>N</mi></mrow></msub><msub><mi>w</mi><mrow><mn>2</mn><mi>N</mi></mrow></msub></mrow><msqrt><mrow><msup><mi>α</mi><mn>2</mn></msup><mo>+</mo><msubsup><mi>v</mi><mrow><mn>2</mn><mi>N</mi></mrow><mn>2</mn></msubsup><msubsup><mi>w</mi><mrow><mn>2</mn><mi>N</mi></mrow><mn>2</mn></msubsup></mrow></msqrt></mfrac><mi mathvariant=\"normal\">∥</mi><msubsup><mi>x</mi><mrow><mn>2</mn><mi>N</mi></mrow><mo>∗</mo></msubsup><mo>−</mo><msub><mi>x</mi><mrow><mn>2</mn><mi>N</mi></mrow></msub><mi mathvariant=\"normal\">∥</mi><mo>+</mo><mfrac><mrow><mi>α</mi><msub><mi>x</mi><mrow><mn>2</mn><mi>N</mi></mrow></msub><mo stretchy=\"false\">(</mo><mi>α</mi><mo>−</mo><msub><mi>v</mi><mrow><mn>2</mn><mi>N</mi></mrow></msub><msub><mi>w</mi><mrow><mn>2</mn><mi>N</mi></mrow></msub><mo stretchy=\"false\">)</mo></mrow><mrow><mo 
stretchy=\"false\">(</mo><msup><mi>α</mi><mn>2</mn></msup><mo>+</mo><msubsup><mi>v</mi><mrow><mn>2</mn><mi>N</mi></mrow><mn>2</mn></msubsup><msubsup><mi>w</mi><mrow><mn>2</mn><mi>N</mi></mrow><mn>2</mn></msubsup><msup><mo stretchy=\"false\">)</mo><mfrac><mn>3</mn><mn>2</mn></mfrac></msup></mrow></mfrac><mi mathvariant=\"normal\">∥</mi><mrow><mo fence=\"true\">(</mo><msub><mi>w</mi><mrow><mn>2</mn><mi>N</mi></mrow></msub><mo separator=\"true\">,</mo><msub><mi>v</mi><mrow><mn>2</mn><mi>N</mi></mrow></msub><mo fence=\"true\">)</mo></mrow><mi mathvariant=\"normal\">∥</mi><mi mathvariant=\"normal\">∥</mi><msubsup><mi>θ</mi><mrow><mn>2</mn><mi>N</mi></mrow><mo>∗</mo></msubsup><mo>−</mo><msub><mi>θ</mi><mrow><mn>2</mn><mi>N</mi></mrow></msub><mi mathvariant=\"normal\">∥</mi><mi mathvariant=\"normal\">∣</mi></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>≤</mo><mi mathvariant=\"normal\">∥</mi><msubsup><mi>x</mi><mrow><mn>2</mn><mi>N</mi></mrow><mo>∗</mo></msubsup><mo>−</mo><msub><mi>x</mi><mrow><mn>2</mn><mi>N</mi></mrow></msub><mi mathvariant=\"normal\">∥</mi><mo>+</mo><mfrac><mrow><mi>α</mi><msub><mi>x</mi><mrow><mn>2</mn><mi>N</mi></mrow></msub><mo stretchy=\"false\">(</mo><mi>α</mi><mo>−</mo><msub><mi>v</mi><mrow><mn>2</mn><mi>N</mi></mrow></msub><msub><mi>w</mi><mrow><mn>2</mn><mi>N</mi></mrow></msub><mo stretchy=\"false\">)</mo></mrow><mrow><mo stretchy=\"false\">(</mo><msup><mi>α</mi><mn>2</mn></msup><mo>+</mo><msubsup><mi>v</mi><mrow><mn>2</mn><mi>N</mi></mrow><mn>2</mn></msubsup><msubsup><mi>w</mi><mrow><mn>2</mn><mi>N</mi></mrow><mn>2</mn></msubsup><msup><mo stretchy=\"false\">)</mo><mfrac><mn>3</mn><mn>2</mn></mfrac></msup></mrow></mfrac><msqrt><mrow><msubsup><mi>w</mi><mrow><mn>2</mn><mi>N</mi></mrow><mn>2</mn></msubsup><mo>+</mo><msubsup><mi>v</mi><mrow><mn>2</mn><mi>N</mi></mrow><mn>2</mn></msubsup></mrow></msqrt><mi 
mathvariant=\"normal\">∥</mi><msubsup><mi>θ</mi><mrow><mn>2</mn><mi>N</mi></mrow><mo>∗</mo></msubsup><mo>−</mo><msub><mi>θ</mi><mrow><mn>2</mn><mi>N</mi></mrow></msub><mi mathvariant=\"normal\">∥</mi></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>≍</mo><mi mathvariant=\"normal\">∥</mi><msubsup><mi>x</mi><mrow><mn>2</mn><mi>N</mi></mrow><mo>∗</mo></msubsup><mo>−</mo><msub><mi>x</mi><mrow><mn>2</mn><mi>N</mi></mrow></msub><mi mathvariant=\"normal\">∥</mi><mo>+</mo><mfrac><msqrt><mrow><msubsup><mi>w</mi><mrow><mn>2</mn><mi>N</mi></mrow><mn>2</mn></msubsup><mo>+</mo><msubsup><mi>v</mi><mrow><mn>2</mn><mi>N</mi></mrow><mn>2</mn></msubsup></mrow></msqrt><mi>α</mi></mfrac><mi mathvariant=\"normal\">∥</mi><msubsup><mi>θ</mi><mrow><mn>2</mn><mi>N</mi></mrow><mo>∗</mo></msubsup><mo>−</mo><msub><mi>θ</mi><mrow><mn>2</mn><mi>N</mi></mrow></msub><mi mathvariant=\"normal\">∥</mi><mtext> </mtext></mrow></mstyle></mtd></mtr></mtable><annotation encoding=\"application/x-tex\">\\begin{aligned}\n\\|\\Delta F\\| &amp;= \\|f(x_{2N}^*,\\theta_{2N}^*)-f(x_{2N},\\theta_{2N})\\|\\\\\n&amp; = \\|\\frac{\\partial f}{\\partial x_l}(x_{2N},\\theta_{2N})(x^*_{2N}-x_{2N}) +\\frac{\\partial f}{\\partial \\theta_l}(x_{2N},\\theta_{2N})(\\theta_{2N}^*-\\theta_{2N})\\|\\\\\n&amp;\\leq \\|\\frac{\\partial f}{\\partial x_l}(x_{2N},\\theta_{2N}) \\|  \\|x^*_{2N}-x_{2N}\\| +\\|\\frac{\\partial f}{\\partial \\theta_l}(x_{2N},\\theta_{2N})\\|\\|\\theta_{2N}^*-\\theta_{2N}\\|\\\\\n&amp;\\asymp \\frac{\\alpha+v_{2N}w_{2N}}{\\sqrt{\\alpha^2+v_{2N}^2w_{2N}^2}}\\|x^*_{2N}-x_{2N}\\|+ \\frac{\\alpha x_{2N}(\\alpha -v_{2N}w_{2N})}{(\\alpha^2+v_{2N}^2w_{2N}^2)^{\\frac{3}{2}}} \\|\\left(w_{2N},v_{2N}\\right)\\|\\|\\theta_{2N}^*-\\theta_{2N}\\||\\\\\n&amp;\\leq \\|x^*_{2N}-x_{2N}\\|+\\frac{\\alpha x_{2N}(\\alpha -v_{2N}w_{2N})}{(\\alpha^2+v_{2N}^2w_{2N}^2)^{\\frac{3}{2}}} 
\\sqrt{w_{2N}^2+v_{2N}^2}\\|\\theta_{2N}^*-\\theta_{2N}\\|\\\\\n&amp;\\asymp \\|x^*_{2N}-x_{2N}\\|+\\frac{\\sqrt{w_{2N}^2+v_{2N}^2}}{\\alpha}\\|\\theta_{2N}^*-\\theta_{2N}\\|\\\n\\end{aligned}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:14.838em;vertical-align:-7.169em;\"></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-r\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:7.669em;\"><span style=\"top:-10.459em;\"><span class=\"pstrut\" style=\"height:3.63em;\"></span><span class=\"mord\"><span class=\"mord\">∥Δ</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">F</span><span class=\"mord\">∥</span></span></span><span style=\"top:-8.4276em;\"><span class=\"pstrut\" style=\"height:3.63em;\"></span><span class=\"mord\"></span></span><span style=\"top:-5.9201em;\"><span class=\"pstrut\" style=\"height:3.63em;\"></span><span class=\"mord\"></span></span><span style=\"top:-3.3571em;\"><span class=\"pstrut\" style=\"height:3.63em;\"></span><span class=\"mord\"></span></span><span style=\"top:-0.5001em;\"><span class=\"pstrut\" style=\"height:3.63em;\"></span><span class=\"mord\"></span></span><span style=\"top:2.553em;\"><span class=\"pstrut\" style=\"height:3.63em;\"></span><span class=\"mord\"></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:7.169em;\"><span></span></span></span></span></span><span class=\"col-align-l\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:7.669em;\"><span style=\"top:-10.459em;\"><span class=\"pstrut\" style=\"height:3.63em;\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" 
style=\"margin-right:0.2778em;\"></span><span class=\"mord\">∥</span><span class=\"mord mathnormal\" style=\"margin-right:0.10764em;\">f</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7387em;\"><span style=\"top:-2.453em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mbin mtight\">∗</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.247em;\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7387em;\"><span style=\"top:-2.453em;margin-left:-0.0278em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mbin mtight\">∗</span></span></span></span><span class=\"vlist-s\">​</span></span><span 
class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.247em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10764em;\">f</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mord\">∥</span></span></span><span 
style=\"top:-8.4276em;\"><span class=\"pstrut\" style=\"height:3.63em;\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord\">∥</span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3714em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal\" style=\"margin-right:0.10764em;\">f</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.836em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord 
mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7387em;\"><span style=\"top:-2.453em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" 
style=\"margin-right:0.10903em;\">N</span></span></span></span><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mbin mtight\">∗</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.247em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3714em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span 
class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal\" style=\"margin-right:0.10764em;\">f</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.836em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord 
mathnormal\" style=\"margin-right:0.02778em;\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7387em;\"><span style=\"top:-2.453em;margin-left:-0.0278em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mbin mtight\">∗</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.247em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span 
class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mord\">∥</span></span></span><span style=\"top:-5.9201em;\"><span class=\"pstrut\" style=\"height:3.63em;\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">≤</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord\">∥</span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3714em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span 
class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal\" style=\"margin-right:0.10764em;\">f</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.836em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span></span><span 
class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mord\">∥∥</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7387em;\"><span style=\"top:-2.453em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mbin mtight\">∗</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.247em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord\">∥</span><span 
class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\">∥</span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3714em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal\" style=\"margin-right:0.10764em;\">f</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.836em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span 
class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mord\">∥∥</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7387em;\"><span style=\"top:-2.453em;margin-left:-0.0278em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" 
style=\"margin-right:0.10903em;\">N</span></span></span></span><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mbin mtight\">∗</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.247em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord\">∥</span></span></span><span style=\"top:-3.3571em;\"><span class=\"pstrut\" style=\"height:3.63em;\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">≍</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.2603em;\"><span style=\"top:-2.1738em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord sqrt\"><span class=\"vlist-t 
vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9362em;\"><span class=\"svg-align\" style=\"top:-3.2em;\"><span class=\"pstrut\" style=\"height:3.2em;\"></span><span class=\"mord\" style=\"padding-left:1em;\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7401em;\"><span style=\"top:-2.989em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7959em;\"><span style=\"top:-2.4065em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span><span style=\"top:-3.0448em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2935em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02691em;\">w</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.7959em;\"><span style=\"top:-2.4065em;margin-left:-0.0269em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span><span style=\"top:-3.0448em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2935em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-2.8962em;\"><span class=\"pstrut\" style=\"height:3.2em;\"></span><span class=\"hide-tail\" style=\"min-width:1.02em;height:1.28em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"1.28em\" viewbox=\"0 0 400000 1296\" preserveaspectratio=\"xMinYMin slice\"><path d=\"M263,681c0.7,0,18,39.7,52,119\nc34,79.3,68.167,158.7,102.5,238c34.3,79.3,51.8,119.3,52.5,120\nc340,-704.7,510.7,-1060.3,512,-1067\nl0 -0\nc4.7,-7.3,11,-11,19,-11\nH40000v40H1012.3\ns-271.3,567,-271.3,567c-38.7,80.7,-84,175,-136,283c-52,108,-89.167,185.3,-111.5,232\nc-22.3,46.7,-33.8,70.3,-34.5,71c-4.7,4.7,-12.3,7,-23,7s-12,-1,-12,-1\ns-109,-253,-109,-253c-72.7,-168,-109.3,-252,-110,-252c-10.7,8,-22,16.7,-34,26\nc-22,17.3,-33.3,26,-34,26s-26,-26,-26,-26s76,-59,76,-59s76,-60,76,-60z\nM1001 80h400000v40h-400000z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3038em;\"><span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" 
style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02691em;\">w</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0269em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.13em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mord\">∥</span><span 
class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7387em;\"><span style=\"top:-2.453em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mbin mtight\">∗</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.247em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord\">∥</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span 
class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.427em;\"><span style=\"top:-2.1704em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7401em;\"><span style=\"top:-2.989em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7959em;\"><span style=\"top:-2.4065em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span><span style=\"top:-3.0448em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2935em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02691em;\">w</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.7959em;\"><span style=\"top:-2.4065em;margin-left:-0.0269em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span><span style=\"top:-3.0448em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2935em;\"><span></span></span></span></span></span></span><span class=\"mclose\"><span class=\"mclose\">)</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9396em;\"><span style=\"top:-3.3486em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mopen nulldelimiter sizing reset-size3 size6\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8443em;\"><span style=\"top:-2.656em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span></span></span></span><span style=\"top:-3.2255em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line mtight\" style=\"border-bottom-width:0.049em;\"></span></span><span style=\"top:-3.384em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">3</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span 
class=\"vlist\" style=\"height:0.344em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter sizing reset-size3 size6\"></span></span></span></span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord 
mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02691em;\">w</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0269em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.1231em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mord\">∥</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em;\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02691em;\">w</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0269em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span></span><span 
class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose delimcenter\" style=\"top:0em;\">)</span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\">∥∥</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7387em;\"><span style=\"top:-2.453em;margin-left:-0.0278em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mbin mtight\">∗</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.247em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord\">∥∣</span></span></span><span style=\"top:-0.5001em;\"><span class=\"pstrut\" style=\"height:3.63em;\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">≤</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord\">∥</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7387em;\"><span style=\"top:-2.453em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 
mtight\"><span class=\"mbin mtight\">∗</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.247em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord\">∥</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.427em;\"><span style=\"top:-2.1704em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7401em;\"><span style=\"top:-2.989em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord 
mtight\">2</span></span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7959em;\"><span style=\"top:-2.4065em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span><span style=\"top:-3.0448em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2935em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02691em;\">w</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7959em;\"><span style=\"top:-2.4065em;margin-left:-0.0269em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span><span style=\"top:-3.0448em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span 
class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2935em;\"><span></span></span></span></span></span></span><span class=\"mclose\"><span class=\"mclose\">)</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9396em;\"><span style=\"top:-3.3486em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mopen nulldelimiter sizing reset-size3 size6\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8443em;\"><span style=\"top:-2.656em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span></span></span></span><span style=\"top:-3.2255em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line mtight\" style=\"border-bottom-width:0.049em;\"></span></span><span style=\"top:-3.384em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">3</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.344em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter sizing reset-size3 size6\"></span></span></span></span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span 
class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02691em;\">w</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0269em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 
mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.1231em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mord sqrt\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.2851em;\"><span class=\"svg-align\" style=\"top:-3.8em;\"><span class=\"pstrut\" style=\"height:3.8em;\"></span><span class=\"mord\" style=\"padding-left:1em;\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02691em;\">w</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7959em;\"><span style=\"top:-2.4065em;margin-left:-0.0269em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span><span style=\"top:-3.0448em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2935em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span 
class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7959em;\"><span style=\"top:-2.4065em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span><span style=\"top:-3.0448em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2935em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-3.2451em;\"><span class=\"pstrut\" style=\"height:3.8em;\"></span><span class=\"hide-tail\" style=\"min-width:1.02em;height:1.88em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"1.88em\" viewbox=\"0 0 400000 1944\" preserveaspectratio=\"xMinYMin slice\"><path d=\"M983 90\nl0 -0\nc4,-6.7,10,-10,18,-10 H400000v40\nH1013.1s-83.4,268,-264.1,840c-180.7,572,-277,876.3,-289,913c-4.7,4.7,-12.7,7,-24,7\ns-12,0,-12,0c-1.3,-3.3,-3.7,-11.7,-7,-25c-35.3,-125.3,-106.7,-373.3,-214,-744\nc-10,12,-21,25,-33,39s-32,39,-32,39c-6,-5.3,-15,-14,-27,-26s25,-30,25,-30\nc26.7,-32.7,52,-63,76,-91s52,-60,52,-60s208,722,208,722\nc56,-175.3,126.3,-397.3,211,-666c84.7,-268.7,153.8,-488.2,207.5,-658.5\nc53.7,-170.3,84.5,-266.8,92.5,-289.5z\nM1001 80h400000v40h-400000z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.5549em;\"><span></span></span></span></span></span><span class=\"mord\">∥</span><span class=\"mord\"><span class=\"mord mathnormal\" 
style=\"margin-right:0.02778em;\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7387em;\"><span style=\"top:-2.453em;margin-left:-0.0278em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mbin mtight\">∗</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.247em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord\">∥</span></span></span><span style=\"top:2.553em;\"><span class=\"pstrut\" style=\"height:3.63em;\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" 
style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">≍</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord\">∥</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7387em;\"><span style=\"top:-2.453em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mbin mtight\">∗</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.247em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord\">∥</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span 
class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.63em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.6938em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord sqrt\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9362em;\"><span class=\"svg-align\" style=\"top:-3.2em;\"><span class=\"pstrut\" style=\"height:3.2em;\"></span><span class=\"mord\" style=\"padding-left:1em;\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02691em;\">w</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7959em;\"><span style=\"top:-2.4065em;margin-left:-0.0269em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span><span style=\"top:-3.0448em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2935em;\"><span></span></span></span></span></span></span><span class=\"mspace\" 
style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7959em;\"><span style=\"top:-2.4065em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span><span style=\"top:-3.0448em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2935em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-2.8962em;\"><span class=\"pstrut\" style=\"height:3.2em;\"></span><span class=\"hide-tail\" style=\"min-width:1.02em;height:1.28em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"1.28em\" viewbox=\"0 0 400000 1296\" preserveaspectratio=\"xMinYMin slice\"><path d=\"M263,681c0.7,0,18,39.7,52,119\nc34,79.3,68.167,158.7,102.5,238c34.3,79.3,51.8,119.3,52.5,120\nc340,-704.7,510.7,-1060.3,512,-1067\nl0 -0\nc4.7,-7.3,11,-11,19,-11\nH40000v40H1012.3\ns-271.3,567,-271.3,567c-38.7,80.7,-84,175,-136,283c-52,108,-89.167,185.3,-111.5,232\nc-22.3,46.7,-33.8,70.3,-34.5,71c-4.7,4.7,-12.3,7,-23,7s-12,-1,-12,-1\ns-109,-253,-109,-253c-72.7,-168,-109.3,-252,-110,-252c-10.7,8,-22,16.7,-34,26\nc-22,17.3,-33.3,26,-34,26s-26,-26,-26,-26s76,-59,76,-59s76,-60,76,-60z\nM1001 80h400000v40h-400000z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span 
class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3038em;\"><span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mord\">∥</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7387em;\"><span style=\"top:-2.453em;margin-left:-0.0278em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mbin mtight\">∗</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.247em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" 
style=\"margin-right:0.10903em;\">N</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord\">∥</span><span class=\"mspace\"> </span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:7.169em;\"><span></span></span></span></span></span></span></span></span></span></span></span></p>\n<p>Applying this bound recursively gives</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi mathvariant=\"normal\">∥</mi><mi mathvariant=\"normal\">Δ</mi><mi>F</mi><mi mathvariant=\"normal\">∥</mi><mo>≤</mo><munderover><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mrow><mn>2</mn><mi>N</mi></mrow></munderover><mfrac><msqrt><mrow><msubsup><mi>w</mi><mi>i</mi><mn>2</mn></msubsup><mo>+</mo><msubsup><mi>v</mi><mi>i</mi><mn>2</mn></msubsup></mrow></msqrt><mi>α</mi></mfrac><mi mathvariant=\"normal\">∥</mi><msubsup><mi>θ</mi><mi>i</mi><mo>∗</mo></msubsup><mo>−</mo><msub><mi>θ</mi><mi>i</mi></msub><mi mathvariant=\"normal\">∥</mi></mrow><annotation encoding=\"application/x-tex\">\\|\\Delta F\\|\\leq\\sum_{i=1}^{2N}\\frac{\\sqrt{w_{i}^2+v_{i}^2}}{\\alpha}\\|\\theta_{i}^*-\\theta_{i}\\|\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\">∥Δ</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">F</span><span class=\"mord\">∥</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">≤</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:3.106em;vertical-align:-1.2777em;\"></span><span class=\"mop 
op-limits\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.8283em;\"><span style=\"top:-1.8723em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">i</span><span class=\"mrel mtight\">=</span><span class=\"mord mtight\">1</span></span></span></span><span style=\"top:-3.05em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span><span class=\"mop op-symbol large-op\">∑</span></span></span><span style=\"top:-4.3em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.2777em;\"><span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.63em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.6855em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord sqrt\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9445em;\"><span class=\"svg-align\" style=\"top:-3.2em;\"><span class=\"pstrut\" 
style=\"height:3.2em;\"></span><span class=\"mord\" style=\"padding-left:1em;\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02691em;\">w</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7959em;\"><span style=\"top:-2.4231em;margin-left:-0.0269em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span style=\"top:-3.0448em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2769em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7959em;\"><span style=\"top:-2.4231em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span style=\"top:-3.0448em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2769em;\"><span></span></span></span></span></span></span></span></span><span 
style=\"top:-2.9045em;\"><span class=\"pstrut\" style=\"height:3.2em;\"></span><span class=\"hide-tail\" style=\"min-width:1.02em;height:1.28em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"1.28em\" viewbox=\"0 0 400000 1296\" preserveaspectratio=\"xMinYMin slice\"><path d=\"M263,681c0.7,0,18,39.7,52,119\nc34,79.3,68.167,158.7,102.5,238c34.3,79.3,51.8,119.3,52.5,120\nc340,-704.7,510.7,-1060.3,512,-1067\nl0 -0\nc4.7,-7.3,11,-11,19,-11\nH40000v40H1012.3\ns-271.3,567,-271.3,567c-38.7,80.7,-84,175,-136,283c-52,108,-89.167,185.3,-111.5,232\nc-22.3,46.7,-33.8,70.3,-34.5,71c-4.7,4.7,-12.3,7,-23,7s-12,-1,-12,-1\ns-109,-253,-109,-253c-72.7,-168,-109.3,-252,-110,-252c-10.7,8,-22,16.7,-34,26\nc-22,17.3,-33.3,26,-34,26s-26,-26,-26,-26s76,-59,76,-59s76,-60,76,-60z\nM1001 80h400000v40h-400000z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2955em;\"><span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mord\">∥</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7387em;\"><span style=\"top:-2.453em;margin-left:-0.0278em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mbin mtight\">∗</span></span></span></span><span class=\"vlist-s\">​</span></span><span 
class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.247em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord\">∥</span></span></span></span></span></p>\n<p>Since the initialization of each <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>w</mi><mi>i</mi></msub><mo separator=\"true\">,</mo><msub><mi>v</mi><mi>i</mi></msub></mrow><annotation encoding=\"application/x-tex\">w_i,v_i</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.625em;vertical-align:-0.1944em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02691em;\">w</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0269em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal 
mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span> depends on the scaling factor <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>β</mi></mrow><annotation encoding=\"application/x-tex\">\\beta</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8889em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05278em;\">β</span></span></span></span>, the bound on the overall model update can be controlled through <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>α</mi><mo separator=\"true\">,</mo><mi>β</mi></mrow><annotation encoding=\"application/x-tex\">\\alpha, \\beta</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8889em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"mpunct\">,</span><span 
class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05278em;\">β</span></span></span></span>. By choosing these two values appropriately, we can stably train deeper networks while still guaranteeing convergence.</p>\n",
            "tags": []
        },
        {
            "id": "https://yuuko.site/2026/05/23/CS/LLM/ln/",
            "url": "https://yuuko.site/2026/05/23/CS/LLM/ln/",
            "title": "Layer Normalization and Gradient Stability in the Transformer",
            "date_published": "2026-05-22T16:00:00.000Z",
            "content_html": "<link rel=\"stylesheet\" type=\"text/css\" href=\"https://cdn.jsdelivr.net/hint.css/2.4.1/hint.min.css\"><h1><span id=\"on-layer-normalization-in-the-transformer-architecture\"> On Layer Normalization in the Transformer Architecture</span></h1>\n<p>This paper presents Pre-LN, which places the normalization layer inside the residual branch to suppress the gradient explosion that occurs at the start of training. Training with the Post-LN architecture requires warm-up (that is, training that starts from a reduced learning rate). By relocating the LN layer, Pre-LN improves gradient stability and reduces the relative gradient magnitude, which frees the model from warm-up.</p>\n<h2><span id=\"layer-normalization的作用\"> The Role of Layer Normalization</span></h2>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mrow><mi mathvariant=\"normal\">L</mi><mi mathvariant=\"normal\">N</mi></mrow><mo stretchy=\"false\">(</mo><mi>x</mi><mo stretchy=\"false\">)</mo><mo>=</mo><mi>γ</mi><mo>⊙</mo><mfrac><mrow><mi>x</mi><mo>−</mo><mi>μ</mi></mrow><msqrt><mrow><msup><mi>σ</mi><mn>2</mn></msup><mo>+</mo><mi>ε</mi></mrow></msqrt></mfrac><mo>+</mo><mi>β</mi></mrow><annotation encoding=\"application/x-tex\">\\mathrm{LN}(x) = \\gamma \\odot \\frac{x-\\mu}{\\sqrt{\\sigma^2+\\varepsilon}}+\\beta\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">LN</span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7778em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05556em;\">γ</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⊙</span><span class=\"mspace\" 
style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.1903em;vertical-align:-0.93em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.2603em;\"><span style=\"top:-2.1966em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord sqrt\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9134em;\"><span class=\"svg-align\" style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\" style=\"padding-left:0.833em;\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">σ</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7401em;\"><span style=\"top:-2.989em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord mathnormal\">ε</span></span></span><span style=\"top:-2.8734em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"hide-tail\" style=\"min-width:0.853em;height:1.08em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"1.08em\" viewbox=\"0 0 400000 1080\" preserveaspectratio=\"xMinYMin slice\"><path d=\"M95,702\nc-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14\nc0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54\nc44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10\ns173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429\nc69,-144,104.5,-217.7,106.5,-221\nl0 
-0\nc5.3,-9.3,12,-14,20,-14\nH400000v40H845.2724\ns-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7\nc-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z\nM834 80h400000v40h-400000z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1266em;\"><span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord mathnormal\">μ</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.93em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8889em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05278em;\">β</span></span></span></span></span></p>\n<p>Here <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>γ</mi><mo separator=\"true\">,</mo><mi>β</mi></mrow><annotation encoding=\"application/x-tex\">\\gamma, \\beta</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8889em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05556em;\">γ</span><span 
class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05278em;\">β</span></span></span></span> are learnable parameters.</p>\n<p>The <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi mathvariant=\"normal\">L</mi><mi mathvariant=\"normal\">N</mi></mrow><annotation encoding=\"application/x-tex\">\\mathrm{LN}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">LN</span></span></span></span></span> layer as a whole can be viewed as a normalization followed by an affine transformation. The inner normalization can be written as</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mrow><mi mathvariant=\"normal\">N</mi><mi mathvariant=\"normal\">o</mi><mi mathvariant=\"normal\">r</mi><mi mathvariant=\"normal\">m</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">l</mi></mrow><mo>:</mo><mi>x</mi><mo>→</mo><mfrac><mrow><mi>x</mi><mo>−</mo><mi>μ</mi></mrow><msqrt><mrow><msup><mi>σ</mi><mn>2</mn></msup><mo>+</mo><mi>ε</mi></mrow></msqrt></mfrac></mrow><annotation encoding=\"application/x-tex\">\\mathrm{Normal}: x\\to \\frac{x-\\mu}{\\sqrt{\\sigma^2+\\varepsilon}}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6944em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">Normal</span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">:</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\">x</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span 
class=\"mrel\">→</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.1903em;vertical-align:-0.93em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.2603em;\"><span style=\"top:-2.1966em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord sqrt\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9134em;\"><span class=\"svg-align\" style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\" style=\"padding-left:0.833em;\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">σ</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7401em;\"><span style=\"top:-2.989em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord mathnormal\">ε</span></span></span><span style=\"top:-2.8734em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"hide-tail\" style=\"min-width:0.853em;height:1.08em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"1.08em\" viewbox=\"0 0 400000 1080\" preserveaspectratio=\"xMinYMin slice\"><path 
d=\"M95,702\nc-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14\nc0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54\nc44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10\ns173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429\nc69,-144,104.5,-217.7,106.5,-221\nl0 -0\nc5.3,-9.3,12,-14,20,-14\nH400000v40H845.2724\ns-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7\nc-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z\nM834 80h400000v40h-400000z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1266em;\"><span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord mathnormal\">μ</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.93em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<p>After normalization, <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mrow><mi mathvariant=\"normal\">N</mi><mi mathvariant=\"normal\">o</mi><mi mathvariant=\"normal\">r</mi><mi mathvariant=\"normal\">m</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">l</mi></mrow><mo stretchy=\"false\">(</mo><mi>x</mi><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">\\mathrm{Normal}(x)</annotation></semantics></math></span><span class=\"katex-html\" 
aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">Normal</span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">)</span></span></span></span> becomes a standardized vector with mean <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>0</mn></mrow><annotation encoding=\"application/x-tex\">0</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">0</span></span></span></span> and variance <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>1</mn></mrow><annotation encoding=\"application/x-tex\">1</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">1</span></span></span></span>:</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mtable rowspacing=\"0.25em\" columnalign=\"right left\" columnspacing=\"0em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow><mi mathvariant=\"normal\">V</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">r</mi></mrow><mo stretchy=\"false\">(</mo><mrow><mi mathvariant=\"normal\">N</mi><mi mathvariant=\"normal\">o</mi><mi mathvariant=\"normal\">r</mi><mi mathvariant=\"normal\">m</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">l</mi></mrow><mo stretchy=\"false\">(</mo><mi>x</mi><mo stretchy=\"false\">)</mo><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" 
displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><mfrac><msup><mi>σ</mi><mn>2</mn></msup><mrow><msup><mi>σ</mi><mn>2</mn></msup><mo>+</mo><mi>ε</mi></mrow></mfrac><mo>→</mo><mn>1</mn></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mi mathvariant=\"normal\">E</mi><mo stretchy=\"false\">(</mo><mrow><mi mathvariant=\"normal\">N</mi><mi mathvariant=\"normal\">o</mi><mi mathvariant=\"normal\">r</mi><mi mathvariant=\"normal\">m</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">l</mi></mrow><mo stretchy=\"false\">(</mo><mi>x</mi><mo stretchy=\"false\">)</mo><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><mn>0</mn></mrow></mstyle></mtd></mtr></mtable><annotation encoding=\"application/x-tex\">\\begin{aligned}\n\\mathrm{Var}(\\mathrm{Normal}(x)) &amp;= \\frac{\\sigma^2}{\\sigma^2+\\varepsilon}\\to 1 \\\\\n\\mathrm{E}(\\mathrm{Normal}(x)) &amp;= 0\n\\end{aligned}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:4.0604em;vertical-align:-1.7802em;\"></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-r\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.2802em;\"><span style=\"top:-4.2802em;\"><span class=\"pstrut\" style=\"height:3.4911em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathrm\">Var</span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathrm\">Normal</span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">))</span></span></span><span style=\"top:-2.3709em;\"><span class=\"pstrut\" style=\"height:3.4911em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">E</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord 
mathrm\">Normal</span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">))</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.7802em;\"><span></span></span></span></span></span><span class=\"col-align-l\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.2802em;\"><span style=\"top:-4.2802em;\"><span class=\"pstrut\" style=\"height:3.4911em;\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.4911em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">σ</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7401em;\"><span style=\"top:-2.989em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord mathnormal\">ε</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord 
mathnormal\" style=\"margin-right:0.03588em;\">σ</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8141em;\"><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7693em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">→</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord\">1</span></span></span><span style=\"top:-2.3709em;\"><span class=\"pstrut\" style=\"height:3.4911em;\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord\">0</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.7802em;\"><span></span></span></span></span></span></span></span></span></span></span></span></p>\n<p>The learnable parameters <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>γ</mi><mo separator=\"true\">,</mo><mi>β</mi></mrow><annotation encoding=\"application/x-tex\">\\gamma, \\beta</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8889em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05556em;\">γ</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord 
mathnormal\" style=\"margin-right:0.05278em;\">β</span></span></span></span> can change the overall variance and mean of the vector, which increases the adjustment capacity of the <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi mathvariant=\"normal\">L</mi><mi mathvariant=\"normal\">N</mi></mrow><annotation encoding=\"application/x-tex\">\\mathrm{LN}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">LN</span></span></span></span></span> layer.</p>\n<h2><span id=\"post-layer-normalization-的-梯度爆炸与-warm-up\"> Gradient Explosion and Warm-up in Post-Layer Normalization</span></h2>\n<p>To estimate the order of magnitude of the gradients, the paper proposes a simplified model of Multi-Head Attention, then shows experimentally that the assumption also holds for the full MHA residual flow.</p>\n<p>The initial weights use Xavier initialization, so each weight <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>w</mi></mrow><annotation encoding=\"application/x-tex\">w</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.02691em;\">w</span></span></span></span> satisfies</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mrow><mi mathvariant=\"normal\">V</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">r</mi></mrow><mo stretchy=\"false\">(</mo><mi>w</mi><mo stretchy=\"false\">)</mo><mo>=</mo><mfrac><mn>2</mn><mrow><msub><mi>n</mi><mrow><mi>i</mi><mi>n</mi></mrow></msub><mo>+</mo><msub><mi>n</mi><mrow><mi>o</mi><mi>u</mi><mi>t</mi></mrow></msub></mrow></mfrac></mrow><annotation encoding=\"application/x-tex\">\\mathrm{Var}(w) = \\frac{2}{n_{in}+n_{out}} \n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span 
class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">Var</span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.02691em;\">w</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.1574em;vertical-align:-0.836em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3214em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\">n</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">in</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">n</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord 
mtight\"><span class=\"mord mathnormal mtight\">o</span><span class=\"mord mathnormal mtight\">u</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.836em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<p>One forward pass through a Post-LN layer is given by</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mo fence=\"true\">{</mo><mtable rowspacing=\"0.36em\" columnalign=\"left left\" columnspacing=\"1em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><msubsup><mover accent=\"true\"><mi>x</mi><mo>~</mo></mover><mi>t</mi><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup><mo>=</mo><mrow><mi mathvariant=\"normal\">L</mi><mi mathvariant=\"normal\">N</mi></mrow><mo stretchy=\"false\">(</mo><msubsup><mi>x</mi><mi>t</mi><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup><mo>+</mo><mrow><mi mathvariant=\"normal\">M</mi><mi mathvariant=\"normal\">H</mi><mi mathvariant=\"normal\">A</mi></mrow><mo stretchy=\"false\">(</mo><msubsup><mi>x</mi><mi>t</mi><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup><mo stretchy=\"false\">)</mo><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" 
displaystyle=\"true\"><mrow><msubsup><mi>x</mi><mrow><mi>t</mi><mo>+</mo><mn>1</mn></mrow><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup><mo>=</mo><mrow><mi mathvariant=\"normal\">L</mi><mi mathvariant=\"normal\">N</mi></mrow><mo stretchy=\"false\">(</mo><msubsup><mover accent=\"true\"><mi>x</mi><mo>~</mo></mover><mi>t</mi><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup><mo>+</mo><mrow><mi mathvariant=\"normal\">F</mi><mi mathvariant=\"normal\">F</mi><mi mathvariant=\"normal\">N</mi></mrow><mo stretchy=\"false\">(</mo><msubsup><mover accent=\"true\"><mi>x</mi><mo>~</mo></mover><mi>t</mi><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup><mo stretchy=\"false\">)</mo><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd></mtr></mtable></mrow><annotation encoding=\"application/x-tex\">\\begin{dcases}\n\\tilde{x}^{post}_t = \\mathrm{LN}(x^{post}_t+\\mathrm{MHA}(x^{post}_t))\\\\\nx^{post}_{t+1} = \\mathrm{LN}(\\tilde{x}^{post}_t+ \\mathrm{FFN}(\\tilde{x}^{post}_t))\n\\end{dcases}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:3em;vertical-align:-1.25em;\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size4\">{</span></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-l\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.69em;\"><span style=\"top:-3.69em;\"><span class=\"pstrut\" style=\"height:3.008em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6679em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathnormal\">x</span></span><span style=\"top:-3.35em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"accent-body\" 
style=\"left:-0.2222em;\"><span class=\"mord\">~</span></span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9115em;\"><span style=\"top:-2.4542em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">os</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2458em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">LN</span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9115em;\"><span style=\"top:-2.4542em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">os</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span 
class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2458em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">MHA</span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9115em;\"><span style=\"top:-2.4542em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">os</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2458em;\"><span></span></span></span></span></span></span><span class=\"mclose\">))</span></span></span><span style=\"top:-2.25em;\"><span class=\"pstrut\" style=\"height:3.008em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9115em;\"><span style=\"top:-2.4337em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">t</span><span class=\"mbin mtight\">+</span><span class=\"mord 
mtight\">1</span></span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">os</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3246em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">LN</span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6679em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathnormal\">x</span></span><span style=\"top:-3.35em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.2222em;\"><span class=\"mord\">~</span></span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9115em;\"><span style=\"top:-2.4542em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">os</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span 
class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2458em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">FFN</span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6679em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathnormal\">x</span></span><span style=\"top:-3.35em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.2222em;\"><span class=\"mord\">~</span></span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9115em;\"><span style=\"top:-2.4542em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">os</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2458em;\"><span></span></span></span></span></span></span><span class=\"mclose\">))</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.19em;\"><span></span></span></span></span></span></span></span><span class=\"mclose 
nulldelimiter\"></span></span></span></span></span></span></p>\n<p>The forward pass of a single Pre-LN layer is given by</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mo fence=\"true\">{</mo><mtable rowspacing=\"0.36em\" columnalign=\"left left\" columnspacing=\"1em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><msubsup><mover accent=\"true\"><mi>x</mi><mo>~</mo></mover><mi>t</mi><mrow><mi>p</mi><mi>r</mi><mi>e</mi></mrow></msubsup><mo>=</mo><msubsup><mi>x</mi><mi>t</mi><mrow><mi>p</mi><mi>r</mi><mi>e</mi></mrow></msubsup><mo>+</mo><mrow><mi mathvariant=\"normal\">M</mi><mi mathvariant=\"normal\">H</mi><mi mathvariant=\"normal\">A</mi></mrow><mo stretchy=\"false\">(</mo><mrow><mi mathvariant=\"normal\">L</mi><mi mathvariant=\"normal\">N</mi></mrow><mo stretchy=\"false\">(</mo><msubsup><mi>x</mi><mi>t</mi><mrow><mi>p</mi><mi>r</mi><mi>e</mi></mrow></msubsup><mo stretchy=\"false\">)</mo><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><msubsup><mi>x</mi><mrow><mi>t</mi><mo>+</mo><mn>1</mn></mrow><mrow><mi>p</mi><mi>r</mi><mi>e</mi></mrow></msubsup><mo>=</mo><msubsup><mover accent=\"true\"><mi>x</mi><mo>~</mo></mover><mi>t</mi><mrow><mi>p</mi><mi>r</mi><mi>e</mi></mrow></msubsup><mo>+</mo><mrow><mi mathvariant=\"normal\">F</mi><mi mathvariant=\"normal\">F</mi><mi mathvariant=\"normal\">N</mi></mrow><mo stretchy=\"false\">(</mo><mrow><mi mathvariant=\"normal\">L</mi><mi mathvariant=\"normal\">N</mi></mrow><mo stretchy=\"false\">(</mo><msubsup><mover accent=\"true\"><mi>x</mi><mo>~</mo></mover><mi>t</mi><mrow><mi>p</mi><mi>r</mi><mi>e</mi></mrow></msubsup><mo stretchy=\"false\">)</mo><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd></mtr></mtable></mrow><annotation encoding=\"application/x-tex\">\\begin{dcases}\n\\tilde{x}^{pre}_t = x^{pre}_t + 
\\mathrm{MHA}(\\mathrm{LN}(x^{pre}_t))\\\\\nx^{pre}_{t+1} = \\tilde{x}^{pre}_t+\\mathrm{FFN}(\\mathrm{LN}(\\tilde{x}^{pre}_t))\n\\end{dcases}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:3em;vertical-align:-1.25em;\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size4\">{</span></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-l\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.69em;\"><span style=\"top:-3.69em;\"><span class=\"pstrut\" style=\"height:3.008em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6679em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathnormal\">x</span></span><span style=\"top:-3.35em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.2222em;\"><span class=\"mord\">~</span></span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7823em;\"><span style=\"top:-2.4542em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">re</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.2458em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7823em;\"><span style=\"top:-2.4542em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">re</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2458em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">MHA</span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathrm\">LN</span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7823em;\"><span style=\"top:-2.4542em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span 
class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">re</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2458em;\"><span></span></span></span></span></span></span><span class=\"mclose\">))</span></span></span><span style=\"top:-2.25em;\"><span class=\"pstrut\" style=\"height:3.008em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7823em;\"><span style=\"top:-2.4337em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">t</span><span class=\"mbin mtight\">+</span><span class=\"mord mtight\">1</span></span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">re</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3246em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord\"><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6679em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathnormal\">x</span></span><span style=\"top:-3.35em;\"><span class=\"pstrut\" 
style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.2222em;\"><span class=\"mord\">~</span></span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7823em;\"><span style=\"top:-2.4542em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">re</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2458em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">FFN</span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathrm\">LN</span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6679em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathnormal\">x</span></span><span style=\"top:-3.35em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.2222em;\"><span class=\"mord\">~</span></span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7823em;\"><span 
style=\"top:-2.4542em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">re</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2458em;\"><span></span></span></span></span></span></span><span class=\"mclose\">))</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.19em;\"><span></span></span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<p>Let</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mi>J</mi><mrow><mi mathvariant=\"normal\">L</mi><mi mathvariant=\"normal\">N</mi></mrow></msub><mo stretchy=\"false\">(</mo><mi>x</mi><mo stretchy=\"false\">)</mo><mo>=</mo><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mrow><mi mathvariant=\"normal\">L</mi><mi mathvariant=\"normal\">N</mi></mrow><mo stretchy=\"false\">(</mo><mi>x</mi><mo stretchy=\"false\">)</mo></mrow><mrow><mi mathvariant=\"normal\">∂</mi><mi>x</mi></mrow></mfrac></mrow><annotation encoding=\"application/x-tex\">J_{\\mathrm{LN}}(x) = \\frac{\\partial \\mathrm{LN}(x)}{\\partial x}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" 
style=\"margin-right:0.09618em;\">J</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0962em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathrm mtight\">LN</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.113em;vertical-align:-0.686em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.427em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal\">x</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord\"><span class=\"mord mathrm\">LN</span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">)</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span 
class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<p>be the Jacobian matrix of the <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi mathvariant=\"normal\">L</mi><mi mathvariant=\"normal\">N</mi></mrow><annotation encoding=\"application/x-tex\">\\mathrm{LN}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">LN</span></span></span></span></span> layer.</p>\n<p>Then Post-LN satisfies</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mtable rowspacing=\"0.25em\" columnalign=\"right left\" columnspacing=\"0em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mi mathvariant=\"normal\">d</mi><msubsup><mover accent=\"true\"><mi>x</mi><mo>~</mo></mover><mi>t</mi><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><msub><mi>J</mi><mrow><mi mathvariant=\"normal\">L</mi><mi mathvariant=\"normal\">N</mi></mrow></msub><mo stretchy=\"false\">(</mo><msubsup><mi>x</mi><mi>t</mi><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup><mo>+</mo><mrow><mi mathvariant=\"normal\">M</mi><mi mathvariant=\"normal\">H</mi><mi mathvariant=\"normal\">A</mi></mrow><mo stretchy=\"false\">(</mo><msubsup><mi>x</mi><mi>t</mi><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup><mo stretchy=\"false\">)</mo><mo stretchy=\"false\">)</mo><mo>⋅</mo><mo stretchy=\"false\">(</mo><mi mathvariant=\"normal\">d</mi><msubsup><mi>x</mi><mi>t</mi><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup><mo>+</mo><mi 
mathvariant=\"normal\">d</mi><mrow><mi mathvariant=\"normal\">M</mi><mi mathvariant=\"normal\">H</mi><mi mathvariant=\"normal\">A</mi></mrow><mo stretchy=\"false\">(</mo><msubsup><mi>x</mi><mi>t</mi><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup><mo stretchy=\"false\">)</mo><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><msub><mi>J</mi><mrow><mi mathvariant=\"normal\">L</mi><mi mathvariant=\"normal\">N</mi></mrow></msub><mo stretchy=\"false\">(</mo><msubsup><mi>x</mi><mi>t</mi><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup><mo>+</mo><mrow><mi mathvariant=\"normal\">M</mi><mi mathvariant=\"normal\">H</mi><mi mathvariant=\"normal\">A</mi></mrow><mo stretchy=\"false\">(</mo><msubsup><mi>x</mi><mi>t</mi><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup><mo stretchy=\"false\">)</mo><mo stretchy=\"false\">)</mo><mo>⋅</mo><mo stretchy=\"false\">(</mo><mi>I</mi><mo>+</mo><msub><mi>J</mi><mrow><mi mathvariant=\"normal\">M</mi><mi mathvariant=\"normal\">H</mi><mi mathvariant=\"normal\">A</mi></mrow></msub><mo stretchy=\"false\">(</mo><msubsup><mi>x</mi><mi>t</mi><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup><mo stretchy=\"false\">)</mo><mo stretchy=\"false\">)</mo><mo>⋅</mo><mi mathvariant=\"normal\">d</mi><msubsup><mi>x</mi><mi>t</mi><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup></mrow></mstyle></mtd></mtr></mtable><annotation encoding=\"application/x-tex\">\\begin{aligned}\n\\mathrm{d}\\tilde{x}^{post}_t &amp;= J_{\\mathrm{LN}}(x^{post}_t+\\mathrm{MHA}(x^{post}_t))\\cdot (\\mathrm{d} x^{post}_t+ \\mathrm{d} \\mathrm{MHA}(x^{post}_t))\\\\\n&amp;=J_{\\mathrm{LN}}(x^{post}_t+\\mathrm{MHA}(x^{post}_t))\\cdot(I+J_{\\mathrm{MHA}}(x^{post}_t)) \\cdot 
\\mathrm{d}x^{post}_t\n\\end{aligned}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:3.1429em;vertical-align:-1.3215em;\"></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-r\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.8215em;\"><span style=\"top:-3.91em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">d</span><span class=\"mord\"><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6679em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathnormal\">x</span></span><span style=\"top:-3.35em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.2222em;\"><span class=\"mord\">~</span></span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9115em;\"><span style=\"top:-2.4542em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">os</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2458em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-2.3385em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span 
class=\"mord\"></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3215em;\"><span></span></span></span></span></span><span class=\"col-align-l\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.8215em;\"><span style=\"top:-3.91em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.09618em;\">J</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0962em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathrm mtight\">LN</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9115em;\"><span style=\"top:-2.4542em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord 
mathnormal mtight\">os</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2458em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">MHA</span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9115em;\"><span style=\"top:-2.4542em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">os</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2458em;\"><span></span></span></span></span></span></span><span class=\"mclose\">))</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mopen\">(</span><span class=\"mord mathrm\">d</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9115em;\"><span style=\"top:-2.4542em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" 
style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">os</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2458em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord mathrm\">d</span><span class=\"mord\"><span class=\"mord mathrm\">MHA</span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9115em;\"><span style=\"top:-2.4542em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">os</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2458em;\"><span></span></span></span></span></span></span><span class=\"mclose\">))</span></span></span><span style=\"top:-2.3385em;\"><span class=\"pstrut\" 
style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.09618em;\">J</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0962em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathrm mtight\">LN</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9115em;\"><span style=\"top:-2.4542em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">os</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2458em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span 
class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">MHA</span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9115em;\"><span style=\"top:-2.4542em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">os</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2458em;\"><span></span></span></span></span></span></span><span class=\"mclose\">))</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">I</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.09618em;\">J</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0962em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span 
class=\"mord mathrm mtight\">MHA</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9115em;\"><span style=\"top:-2.4542em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">os</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2458em;\"><span></span></span></span></span></span></span><span class=\"mclose\">))</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord mathrm\">d</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9115em;\"><span style=\"top:-2.4542em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span 
class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">os</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2458em;\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3215em;\"><span></span></span></span></span></span></span></span></span></span></span></span></p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mtable rowspacing=\"0.25em\" columnalign=\"right\" columnspacing><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mi mathvariant=\"normal\">d</mi><msubsup><mi>x</mi><mrow><mi>t</mi><mo>+</mo><mn>1</mn></mrow><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup><mo>=</mo><msub><mi>J</mi><mrow><mi mathvariant=\"normal\">L</mi><mi mathvariant=\"normal\">N</mi></mrow></msub><mo stretchy=\"false\">(</mo><msubsup><mover accent=\"true\"><mi>x</mi><mo>~</mo></mover><mi>t</mi><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup><mo>+</mo><mrow><mi mathvariant=\"normal\">F</mi><mi mathvariant=\"normal\">F</mi><mi mathvariant=\"normal\">N</mi></mrow><mo stretchy=\"false\">(</mo><msubsup><mover accent=\"true\"><mi>x</mi><mo>~</mo></mover><mi>t</mi><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup><mo stretchy=\"false\">)</mo><mo stretchy=\"false\">)</mo><mo>⋅</mo><mo stretchy=\"false\">(</mo><mi>I</mi><mo>+</mo><msub><mi>J</mi><mrow><mi mathvariant=\"normal\">F</mi><mi mathvariant=\"normal\">F</mi><mi mathvariant=\"normal\">N</mi></mrow></msub><mo stretchy=\"false\">(</mo><msubsup><mover 
accent=\"true\"><mi>x</mi><mo>~</mo></mover><mi>t</mi><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup><mo stretchy=\"false\">)</mo><mo stretchy=\"false\">)</mo><mi mathvariant=\"normal\">d</mi><msubsup><mover accent=\"true\"><mi>x</mi><mo>~</mo></mover><mi>t</mi><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup></mrow></mstyle></mtd></mtr></mtable><annotation encoding=\"application/x-tex\">\\begin{aligned}\n\\mathrm{d} x^{post}_{t+1} = J_\\mathrm{LN}(\\tilde{x}^{post}_t+\\mathrm{FFN}(\\tilde{x}^{post}_t))\\cdot (I+J_\\mathrm{FFN}(\\tilde{x}^{post}_t))\\mathrm{d}\\tilde{x}^{post}_t\n\\end{aligned}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.5715em;vertical-align:-0.5357em;\"></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-r\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.0357em;\"><span style=\"top:-3.1243em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">d</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9115em;\"><span style=\"top:-2.4337em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">t</span><span class=\"mbin mtight\">+</span><span class=\"mord mtight\">1</span></span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">os</span><span class=\"mord mathnormal 
mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3246em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.09618em;\">J</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0962em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathrm mtight\">LN</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6679em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathnormal\">x</span></span><span style=\"top:-3.35em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.2222em;\"><span class=\"mord\">~</span></span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9115em;\"><span style=\"top:-2.4542em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span 
class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">os</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2458em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">FFN</span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6679em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathnormal\">x</span></span><span style=\"top:-3.35em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.2222em;\"><span class=\"mord\">~</span></span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9115em;\"><span style=\"top:-2.4542em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">os</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.2458em;\"><span></span></span></span></span></span></span><span class=\"mclose\">))</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">I</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.09618em;\">J</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0962em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathrm mtight\">FFN</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6679em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathnormal\">x</span></span><span style=\"top:-3.35em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.2222em;\"><span class=\"mord\">~</span></span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9115em;\"><span style=\"top:-2.4542em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span 
class=\"mord mathnormal mtight\">t</span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">os</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2458em;\"><span></span></span></span></span></span></span><span class=\"mclose\">))</span><span class=\"mord mathrm\">d</span><span class=\"mord\"><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6679em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathnormal\">x</span></span><span style=\"top:-3.35em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.2222em;\"><span class=\"mord\">~</span></span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9115em;\"><span style=\"top:-2.4542em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">os</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.2458em;\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.5357em;\"><span></span></span></span></span></span></span></span></span></span></span></span></p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><msubsup><mi>x</mi><mrow><mi>t</mi><mo>+</mo><mn>1</mn></mrow><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup></mrow><mrow><mi mathvariant=\"normal\">∂</mi><msubsup><mi>x</mi><mi>t</mi><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup></mrow></mfrac><mo>=</mo><msub><mi>J</mi><mrow><mi mathvariant=\"normal\">L</mi><mi mathvariant=\"normal\">N</mi></mrow></msub><mo stretchy=\"false\">(</mo><msubsup><mover accent=\"true\"><mi>x</mi><mo>~</mo></mover><mi>t</mi><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup><mo>+</mo><mrow><mi mathvariant=\"normal\">F</mi><mi mathvariant=\"normal\">F</mi><mi mathvariant=\"normal\">N</mi></mrow><mo stretchy=\"false\">(</mo><msubsup><mover accent=\"true\"><mi>x</mi><mo>~</mo></mover><mi>t</mi><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup><mo stretchy=\"false\">)</mo><mo stretchy=\"false\">)</mo><mo>⋅</mo><mo stretchy=\"false\">(</mo><mi>I</mi><mo>+</mo><msub><mi>J</mi><mrow><mi mathvariant=\"normal\">F</mi><mi mathvariant=\"normal\">F</mi><mi mathvariant=\"normal\">N</mi></mrow></msub><mo stretchy=\"false\">(</mo><msubsup><mover accent=\"true\"><mi>x</mi><mo>~</mo></mover><mi>t</mi><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup><mo stretchy=\"false\">)</mo><mo stretchy=\"false\">)</mo><mo>⋅</mo><msub><mi>J</mi><mrow><mi mathvariant=\"normal\">L</mi><mi mathvariant=\"normal\">N</mi></mrow></msub><mo 
stretchy=\"false\">(</mo><msubsup><mi>x</mi><mi>t</mi><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup><mo>+</mo><mrow><mi mathvariant=\"normal\">M</mi><mi mathvariant=\"normal\">H</mi><mi mathvariant=\"normal\">A</mi></mrow><mo stretchy=\"false\">(</mo><msubsup><mi>x</mi><mi>t</mi><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup><mo stretchy=\"false\">)</mo><mo stretchy=\"false\">)</mo><mo>⋅</mo><mo stretchy=\"false\">(</mo><mi>I</mi><mo>+</mo><msub><mi>J</mi><mrow><mi mathvariant=\"normal\">M</mi><mi mathvariant=\"normal\">H</mi><mi mathvariant=\"normal\">A</mi></mrow></msub><mo stretchy=\"false\">(</mo><msubsup><mi>x</mi><mi>t</mi><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup><mo stretchy=\"false\">)</mo><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">\\frac{\\partial x^{post}_{t+1}}{\\partial x^{post}_{t}} = J_\\mathrm{LN}(\\tilde{x}^{post}_t+\\mathrm{FFN}(\\tilde{x}^{post}_t))\\cdot (I+J_\\mathrm{FFN}(\\tilde{x}^{post}_t))\\cdot J_{\\mathrm{LN}}(x^{post}_t+\\mathrm{MHA}(x^{post}_t))\\cdot(I+J_{\\mathrm{MHA}}(x^{post}_t))\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:2.6733em;vertical-align:-1.0472em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.6261em;\"><span style=\"top:-2.1985em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9115em;\"><span style=\"top:-2.4542em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing 
reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">os</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2458em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.7146em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9115em;\"><span style=\"top:-2.4337em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">t</span><span class=\"mbin mtight\">+</span><span class=\"mord mtight\">1</span></span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">os</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.3246em;\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.0472em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.1615em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.09618em;\">J</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0962em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathrm mtight\">LN</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6679em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathnormal\">x</span></span><span style=\"top:-3.35em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.2222em;\"><span class=\"mord\">~</span></span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9115em;\"><span style=\"top:-2.4542em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing 
reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">os</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2458em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.1615em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">FFN</span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6679em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathnormal\">x</span></span><span style=\"top:-3.35em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.2222em;\"><span class=\"mord\">~</span></span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9115em;\"><span style=\"top:-2.4542em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord 
mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">os</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2458em;\"><span></span></span></span></span></span></span><span class=\"mclose\">))</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">I</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.1615em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.09618em;\">J</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0962em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathrm mtight\">FFN</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6679em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathnormal\">x</span></span><span style=\"top:-3.35em;\"><span class=\"pstrut\" 
style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.2222em;\"><span class=\"mord\">~</span></span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9115em;\"><span style=\"top:-2.4542em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">os</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2458em;\"><span></span></span></span></span></span></span><span class=\"mclose\">))</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.1615em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.09618em;\">J</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0962em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathrm mtight\">LN</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9115em;\"><span style=\"top:-2.4542em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">os</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2458em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.1615em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">MHA</span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9115em;\"><span style=\"top:-2.4542em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord 
mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">os</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2458em;\"><span></span></span></span></span></span></span><span class=\"mclose\">))</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">I</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.1615em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.09618em;\">J</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0962em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathrm mtight\">MHA</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9115em;\"><span style=\"top:-2.4542em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" 
style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">os</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2458em;\"><span></span></span></span></span></span></span><span class=\"mclose\">))</span></span></span></span></span></p>\n<p>Pre-LN satisfies</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mtable rowspacing=\"0.25em\" columnalign=\"right left\" columnspacing=\"0em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mi mathvariant=\"normal\">d</mi><msubsup><mover accent=\"true\"><mi>x</mi><mo>~</mo></mover><mi>t</mi><mrow><mi>p</mi><mi>r</mi><mi>e</mi></mrow></msubsup></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><mo stretchy=\"false\">(</mo><mi>I</mi><mo>+</mo><msub><mi>J</mi><mrow><mi mathvariant=\"normal\">M</mi><mi mathvariant=\"normal\">H</mi><mi mathvariant=\"normal\">A</mi></mrow></msub><mo stretchy=\"false\">(</mo><mrow><mi mathvariant=\"normal\">L</mi><mi mathvariant=\"normal\">N</mi></mrow><mo stretchy=\"false\">(</mo><msubsup><mi>x</mi><mi>t</mi><mrow><mi>p</mi><mi>r</mi><mi>e</mi></mrow></msubsup><mo stretchy=\"false\">)</mo><mo stretchy=\"false\">)</mo><mo>⋅</mo><msub><mi>J</mi><mrow><mi mathvariant=\"normal\">L</mi><mi mathvariant=\"normal\">N</mi></mrow></msub><mo stretchy=\"false\">(</mo><msubsup><mi>x</mi><mi>t</mi><mrow><mi>p</mi><mi>r</mi><mi>e</mi></mrow></msubsup><mo 
stretchy=\"false\">)</mo><mo stretchy=\"false\">)</mo><mi mathvariant=\"normal\">d</mi><msubsup><mi>x</mi><mi>t</mi><mrow><mi>p</mi><mi>r</mi><mi>e</mi></mrow></msubsup></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mi mathvariant=\"normal\">d</mi><msubsup><mi>x</mi><mrow><mi>t</mi><mo>+</mo><mn>1</mn></mrow><mrow><mi>p</mi><mi>r</mi><mi>e</mi></mrow></msubsup></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><mo stretchy=\"false\">(</mo><mi>I</mi><mo>+</mo><msub><mi>J</mi><mrow><mi mathvariant=\"normal\">F</mi><mi mathvariant=\"normal\">F</mi><mi mathvariant=\"normal\">N</mi></mrow></msub><mo stretchy=\"false\">(</mo><mrow><mi mathvariant=\"normal\">L</mi><mi mathvariant=\"normal\">N</mi></mrow><mo stretchy=\"false\">(</mo><msubsup><mover accent=\"true\"><mi>x</mi><mo>~</mo></mover><mi>t</mi><mrow><mi>p</mi><mi>r</mi><mi>e</mi></mrow></msubsup><mo stretchy=\"false\">)</mo><mo stretchy=\"false\">)</mo><mo>⋅</mo><msub><mi>J</mi><mrow><mi mathvariant=\"normal\">L</mi><mi mathvariant=\"normal\">N</mi></mrow></msub><mo stretchy=\"false\">(</mo><msubsup><mover accent=\"true\"><mi>x</mi><mo>~</mo></mover><mi>t</mi><mrow><mi>p</mi><mi>r</mi><mi>e</mi></mrow></msubsup><mo stretchy=\"false\">)</mo><mo stretchy=\"false\">)</mo><mi mathvariant=\"normal\">d</mi><msubsup><mover accent=\"true\"><mi>x</mi><mo>~</mo></mover><mi>t</mi><mrow><mi>p</mi><mi>r</mi><mi>e</mi></mrow></msubsup></mrow></mstyle></mtd></mtr></mtable><annotation encoding=\"application/x-tex\">\\begin{aligned}\n\\mathrm{d} \\tilde{x}^{pre}_{t} &amp;= (I + J_{\\mathrm{MHA}}(\\mathrm{LN}(x^{pre}_t))\\cdot J_{\\mathrm{LN}}(x^{pre}_t))\\mathrm{d}x^{pre}_t\\\\\n\\mathrm{d} x^{pre}_{t+1}&amp;=(I + J_{\\mathrm{FFN}}(\\mathrm{LN}(\\tilde{x}^{pre}_t))\\cdot J_{\\mathrm{LN}}(\\tilde{x}^{pre}_t))\\mathrm{d}\\tilde{x}^{pre}_t\n\\end{aligned}\n</annotation></semantics></math></span><span class=\"katex-html\" 
aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:3em;vertical-align:-1.25em;\"></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-r\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.75em;\"><span style=\"top:-3.91em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">d</span><span class=\"mord\"><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6679em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathnormal\">x</span></span><span style=\"top:-3.35em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.2222em;\"><span class=\"mord\">~</span></span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7823em;\"><span style=\"top:-2.4542em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">re</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2458em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-2.41em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">d</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span 
class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7823em;\"><span style=\"top:-2.4337em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">t</span><span class=\"mbin mtight\">+</span><span class=\"mord mtight\">1</span></span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">re</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3246em;\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.25em;\"><span></span></span></span></span></span><span class=\"col-align-l\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.75em;\"><span style=\"top:-3.91em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">I</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.09618em;\">J</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0962em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathrm mtight\">MHA</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathrm\">LN</span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7823em;\"><span style=\"top:-2.4542em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">re</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2458em;\"><span></span></span></span></span></span></span><span class=\"mclose\">))</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.09618em;\">J</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0962em;margin-right:0.05em;\"><span 
class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathrm mtight\">LN</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7823em;\"><span style=\"top:-2.4542em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">re</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2458em;\"><span></span></span></span></span></span></span><span class=\"mclose\">))</span><span class=\"mord mathrm\">d</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7823em;\"><span style=\"top:-2.4542em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord 
mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">re</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2458em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-2.41em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">I</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.09618em;\">J</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0962em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathrm mtight\">FFN</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathrm\">LN</span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6679em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathnormal\">x</span></span><span style=\"top:-3.35em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span 
class=\"accent-body\" style=\"left:-0.2222em;\"><span class=\"mord\">~</span></span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7823em;\"><span style=\"top:-2.4542em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">re</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2458em;\"><span></span></span></span></span></span></span><span class=\"mclose\">))</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.09618em;\">J</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0962em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathrm mtight\">LN</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.6679em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathnormal\">x</span></span><span style=\"top:-3.35em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.2222em;\"><span class=\"mord\">~</span></span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7823em;\"><span style=\"top:-2.4542em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">re</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2458em;\"><span></span></span></span></span></span></span><span class=\"mclose\">))</span><span class=\"mord mathrm\">d</span><span class=\"mord\"><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6679em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathnormal\">x</span></span><span style=\"top:-3.35em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.2222em;\"><span class=\"mord\">~</span></span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7823em;\"><span style=\"top:-2.4542em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing 
reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">re</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2458em;\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.25em;\"><span></span></span></span></span></span></span></span></span></span></span></span></p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><msubsup><mi>x</mi><mrow><mi>t</mi><mo>+</mo><mn>1</mn></mrow><mrow><mi>p</mi><mi>r</mi><mi>e</mi></mrow></msubsup></mrow><mrow><mi mathvariant=\"normal\">∂</mi><msubsup><mi>x</mi><mi>t</mi><mrow><mi>p</mi><mi>r</mi><mi>e</mi></mrow></msubsup></mrow></mfrac><mo>=</mo><mo stretchy=\"false\">(</mo><mi>I</mi><mo>+</mo><msub><mi>J</mi><mrow><mi mathvariant=\"normal\">F</mi><mi mathvariant=\"normal\">F</mi><mi mathvariant=\"normal\">N</mi></mrow></msub><mo stretchy=\"false\">(</mo><mrow><mi mathvariant=\"normal\">L</mi><mi mathvariant=\"normal\">N</mi></mrow><mo stretchy=\"false\">(</mo><msubsup><mover accent=\"true\"><mi>x</mi><mo>~</mo></mover><mi>t</mi><mrow><mi>p</mi><mi>r</mi><mi>e</mi></mrow></msubsup><mo stretchy=\"false\">)</mo><mo stretchy=\"false\">)</mo><mo>⋅</mo><msub><mi>J</mi><mrow><mi mathvariant=\"normal\">L</mi><mi mathvariant=\"normal\">N</mi></mrow></msub><mo stretchy=\"false\">(</mo><msubsup><mover 
accent=\"true\"><mi>x</mi><mo>~</mo></mover><mi>t</mi><mrow><mi>p</mi><mi>r</mi><mi>e</mi></mrow></msubsup><mo stretchy=\"false\">)</mo><mo stretchy=\"false\">)</mo><mo>⋅</mo><mo stretchy=\"false\">(</mo><mi>I</mi><mo>+</mo><msub><mi>J</mi><mrow><mi mathvariant=\"normal\">M</mi><mi mathvariant=\"normal\">H</mi><mi mathvariant=\"normal\">A</mi></mrow></msub><mo stretchy=\"false\">(</mo><mrow><mi mathvariant=\"normal\">L</mi><mi mathvariant=\"normal\">N</mi></mrow><mo stretchy=\"false\">(</mo><msubsup><mi>x</mi><mi>t</mi><mrow><mi>p</mi><mi>r</mi><mi>e</mi></mrow></msubsup><mo stretchy=\"false\">)</mo><mo stretchy=\"false\">)</mo><mo>⋅</mo><msub><mi>J</mi><mrow><mi mathvariant=\"normal\">L</mi><mi mathvariant=\"normal\">N</mi></mrow></msub><mo stretchy=\"false\">(</mo><msubsup><mi>x</mi><mi>t</mi><mrow><mi>p</mi><mi>r</mi><mi>e</mi></mrow></msubsup><mo stretchy=\"false\">)</mo><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">\\frac{\\partial x^{pre}_{t+1}}{\\partial x^{pre}_t} = (I + J_{\\mathrm{FFN}}(\\mathrm{LN}(\\tilde{x}^{pre}_t))\\cdot J_{\\mathrm{LN}}(\\tilde{x}^{pre}_t))\\cdot(I + J_{\\mathrm{MHA}}(\\mathrm{LN}(x^{pre}_t))\\cdot J_{\\mathrm{LN}}(x^{pre}_t))\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:2.4287em;vertical-align:-0.9318em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.4969em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7823em;\"><span 
style=\"top:-2.4542em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">re</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2458em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.7146em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7823em;\"><span style=\"top:-2.4337em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">t</span><span class=\"mbin mtight\">+</span><span class=\"mord mtight\">1</span></span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">re</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.3246em;\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9318em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">I</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.0323em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.09618em;\">J</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0962em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathrm mtight\">FFN</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathrm\">LN</span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6679em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord 
mathnormal\">x</span></span><span style=\"top:-3.35em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.2222em;\"><span class=\"mord\">~</span></span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7823em;\"><span style=\"top:-2.4542em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">re</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2458em;\"><span></span></span></span></span></span></span><span class=\"mclose\">))</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.0323em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.09618em;\">J</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0962em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathrm mtight\">LN</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6679em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathnormal\">x</span></span><span style=\"top:-3.35em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.2222em;\"><span class=\"mord\">~</span></span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7823em;\"><span style=\"top:-2.4542em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">re</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2458em;\"><span></span></span></span></span></span></span><span class=\"mclose\">))</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">I</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span 
class=\"strut\" style=\"height:1.0323em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.09618em;\">J</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0962em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathrm mtight\">MHA</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathrm\">LN</span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7823em;\"><span style=\"top:-2.4542em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">re</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2458em;\"><span></span></span></span></span></span></span><span class=\"mclose\">))</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span 
class=\"strut\" style=\"height:1.0323em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.09618em;\">J</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0962em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathrm mtight\">LN</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7823em;\"><span style=\"top:-2.4542em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">re</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2458em;\"><span></span></span></span></span></span></span><span class=\"mclose\">))</span></span></span></span></span></p>\n<h3><span id=\"mha贡献的梯度流动\"> Gradient flow contributed by MHA</span></h3>\n<p>Under this article's assumptions about MHA, we have <span class=\"katex\"><span class=\"katex-mathml\"><math 
xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>W</mi><mi>Q</mi></msub><mo>=</mo><msub><mi>W</mi><mi>K</mi></msub><mo>=</mo><mn>0</mn></mrow><annotation encoding=\"application/x-tex\">W_Q = W_K = 0</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.9694em;vertical-align:-0.2861em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">Q</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.07153em;\">K</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" 
style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">0</span></span></span></span>, so the output of a single attention head is</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mtable rowspacing=\"0.25em\" columnalign=\"right left\" columnspacing=\"0em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mi>h</mi></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><mrow><mi mathvariant=\"normal\">S</mi><mi mathvariant=\"normal\">o</mi><mi mathvariant=\"normal\">f</mi><mi mathvariant=\"normal\">t</mi><mi mathvariant=\"normal\">m</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">x</mi></mrow><mo stretchy=\"false\">(</mo><mfrac><mrow><mi>Q</mi><msup><mi>K</mi><mi>T</mi></msup></mrow><msqrt><mi>d</mi></msqrt></mfrac><mo stretchy=\"false\">)</mo><mo>⋅</mo><mi>V</mi></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><mrow><mi mathvariant=\"normal\">S</mi><mi mathvariant=\"normal\">o</mi><mi mathvariant=\"normal\">f</mi><mi mathvariant=\"normal\">t</mi><mi mathvariant=\"normal\">m</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">x</mi></mrow><mo stretchy=\"false\">(</mo><mn mathvariant=\"bold\">0</mn><mo stretchy=\"false\">)</mo><mo>⋅</mo><mi>X</mi><mo>⋅</mo><msub><mi>W</mi><mi>V</mi></msub></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" 
displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><mfrac><mn>1</mn><mi>n</mi></mfrac><mi>X</mi><mo>⋅</mo><msub><mi>W</mi><mi>V</mi></msub></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><mfrac><mn>1</mn><mi>n</mi></mfrac><munderover><mo>∑</mo><mrow><mi>j</mi><mo>=</mo><mn>1</mn></mrow><mi>n</mi></munderover><msup><mi>x</mi><mi>j</mi></msup><msubsup><mi>w</mi><mi>V</mi><mi>j</mi></msubsup></mrow></mstyle></mtd></mtr></mtable><annotation encoding=\"application/x-tex\">\\begin{aligned}\nh &amp;= \\mathrm{Softmax}(\\frac{QK^T}{\\sqrt{d}})\\cdot V\\\\\n&amp; = \\mathrm{Softmax}(\\bold{0}) \\cdot X\\cdot W_V\\\\\n&amp; = \\frac{1}{n}X\\cdot W_V\\\\\n&amp; = \\frac{1}{n} \\sum_{j=1}^n x^j w_{V}^j\n\\end{aligned}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:9.9209em;vertical-align:-4.7105em;\"></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-r\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:5.2105em;\"><span style=\"top:-7.3435em;\"><span class=\"pstrut\" style=\"height:3.6514em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">h</span></span></span><span style=\"top:-5.2735em;\"><span class=\"pstrut\" style=\"height:3.6514em;\"></span><span class=\"mord\"></span></span><span style=\"top:-3.2921em;\"><span class=\"pstrut\" style=\"height:3.6514em;\"></span><span class=\"mord\"></span></span><span style=\"top:-0.6547em;\"><span class=\"pstrut\" style=\"height:3.6514em;\"></span><span class=\"mord\"></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:4.7105em;\"><span></span></span></span></span></span><span class=\"col-align-l\"><span class=\"vlist-t vlist-t2\"><span 
class=\"vlist-r\"><span class=\"vlist\" style=\"height:5.2105em;\"><span style=\"top:-7.3435em;\"><span class=\"pstrut\" style=\"height:3.6514em;\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">Softmax</span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.5183em;\"><span style=\"top:-2.1778em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord sqrt\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9322em;\"><span class=\"svg-align\" style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\" style=\"padding-left:0.833em;\"><span class=\"mord mathnormal\">d</span></span></span><span style=\"top:-2.8922em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"hide-tail\" style=\"min-width:0.853em;height:1.08em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"1.08em\" viewbox=\"0 0 400000 1080\" preserveaspectratio=\"xMinYMin slice\"><path d=\"M95,702\nc-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14\nc0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54\nc44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10\ns173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429\nc69,-144,104.5,-217.7,106.5,-221\nl0 -0\nc5.3,-9.3,12,-14,20,-14\nH400000v40H845.2724\ns-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7\nc-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z\nM834 80h400000v40h-400000z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span 
class=\"vlist\" style=\"height:0.1078em;\"><span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">Q</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">K</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8413em;\"><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.13889em;\">T</span></span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.93em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.22222em;\">V</span></span></span><span style=\"top:-5.2735em;\"><span class=\"pstrut\" style=\"height:3.6514em;\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">Softmax</span></span><span class=\"mopen\">(</span><span class=\"mord mathbf\">0</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" 
style=\"margin-right:0.2222em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">X</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.22222em;\">V</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-3.2921em;\"><span class=\"pstrut\" style=\"height:3.6514em;\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3214em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">n</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">X</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.22222em;\">V</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-0.6547em;\"><span class=\"pstrut\" style=\"height:3.6514em;\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3214em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">n</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">1</span></span></span></span><span 
class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop op-limits\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.6514em;\"><span style=\"top:-1.8723em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.05724em;\">j</span><span class=\"mrel mtight\">=</span><span class=\"mord mtight\">1</span></span></span></span><span style=\"top:-3.05em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span><span class=\"mop op-symbol large-op\">∑</span></span></span><span style=\"top:-4.3em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">n</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.4138em;\"><span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8747em;\"><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.05724em;\">j</span></span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02691em;\">w</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.9426em;\"><span style=\"top:-2.4065em;margin-left:-0.0269em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.22222em;\">V</span></span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.05724em;\">j</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2935em;\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:4.7105em;\"><span></span></span></span></span></span></span></span></span></span></span></span></p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mrow><mi mathvariant=\"normal\">M</mi><mi mathvariant=\"normal\">H</mi><mi mathvariant=\"normal\">A</mi></mrow><mo stretchy=\"false\">(</mo><mi>X</mi><mo stretchy=\"false\">)</mo><mo>=</mo><mrow><mi mathvariant=\"normal\">C</mi><mi mathvariant=\"normal\">o</mi><mi mathvariant=\"normal\">n</mi><mi mathvariant=\"normal\">c</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">t</mi></mrow><mo stretchy=\"false\">(</mo><msub><mi>h</mi><mn>1</mn></msub><mo separator=\"true\">,</mo><mo>⋯</mo><mtext> </mtext><mo separator=\"true\">,</mo><msub><mi>h</mi><mi>n</mi></msub><mo stretchy=\"false\">)</mo><mo>⋅</mo><msub><mi>W</mi><mi>O</mi></msub></mrow><annotation encoding=\"application/x-tex\">\\mathrm{MHA}(X) = \\mathrm{Concat}(h_1,\\cdots, h_n)\\cdot W_O\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span 
class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">MHA</span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">X</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">Concat</span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">h</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"minner\">⋯</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">h</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">n</span></span></span></span><span 
class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.02778em;\">O</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span></span></p>\n<p>Taking the differential of MHA:</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mtable rowspacing=\"0.25em\" columnalign=\"right left\" columnspacing=\"0em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow><mi mathvariant=\"normal\">d</mi><mtext> </mtext><mi mathvariant=\"normal\">M</mi><mi mathvariant=\"normal\">H</mi><mi mathvariant=\"normal\">A</mi></mrow><mo stretchy=\"false\">(</mo><mi>X</mi><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><mrow><mi mathvariant=\"normal\">d</mi><mtext> </mtext><mi mathvariant=\"normal\">C</mi><mi mathvariant=\"normal\">o</mi><mi mathvariant=\"normal\">n</mi><mi mathvariant=\"normal\">c</mi><mi 
mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">t</mi></mrow><mo stretchy=\"false\">(</mo><msub><mi>h</mi><mn>1</mn></msub><mo separator=\"true\">,</mo><mo>⋯</mo><mtext> </mtext><mo separator=\"true\">,</mo><msub><mi>h</mi><mi>n</mi></msub><mo stretchy=\"false\">)</mo><mo>⋅</mo><msub><mi>W</mi><mi>O</mi></msub><mo>+</mo><mrow><mi mathvariant=\"normal\">C</mi><mi mathvariant=\"normal\">o</mi><mi mathvariant=\"normal\">n</mi><mi mathvariant=\"normal\">c</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">t</mi></mrow><mo stretchy=\"false\">(</mo><msub><mi>h</mi><mn>1</mn></msub><mo separator=\"true\">,</mo><mo>⋯</mo><mtext> </mtext><mo separator=\"true\">,</mo><msub><mi>h</mi><mi>n</mi></msub><mo stretchy=\"false\">)</mo><mo>⋅</mo><mi mathvariant=\"normal\">d</mi><msub><mi>W</mi><mi>O</mi></msub></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><mrow><mi mathvariant=\"normal\">C</mi><mi mathvariant=\"normal\">o</mi><mi mathvariant=\"normal\">n</mi><mi mathvariant=\"normal\">c</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">t</mi></mrow><mo stretchy=\"false\">(</mo><mi mathvariant=\"normal\">d</mi><msub><mi>h</mi><mn>1</mn></msub><mo separator=\"true\">,</mo><mo>⋯</mo><mtext> </mtext><mo separator=\"true\">,</mo><mi mathvariant=\"normal\">d</mi><msub><mi>h</mi><mi>n</mi></msub><mo stretchy=\"false\">)</mo><mo>⋅</mo><msub><mi>W</mi><mi>O</mi></msub><mo>+</mo><mrow><mi mathvariant=\"normal\">C</mi><mi mathvariant=\"normal\">o</mi><mi mathvariant=\"normal\">n</mi><mi mathvariant=\"normal\">c</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">t</mi></mrow><mo stretchy=\"false\">(</mo><msub><mi>h</mi><mn>1</mn></msub><mo separator=\"true\">,</mo><mo>⋯</mo><mtext> </mtext><mo separator=\"true\">,</mo><msub><mi>h</mi><mi>n</mi></msub><mo stretchy=\"false\">)</mo><mo>⋅</mo><mi 
mathvariant=\"normal\">d</mi><msub><mi>W</mi><mi>O</mi></msub></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><mrow><mi mathvariant=\"normal\">C</mi><mi mathvariant=\"normal\">o</mi><mi mathvariant=\"normal\">n</mi><mi mathvariant=\"normal\">c</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">t</mi></mrow><mo stretchy=\"false\">(</mo><mi mathvariant=\"normal\">d</mi><mi>X</mi><mo>⋅</mo><msubsup><mi>W</mi><mi>V</mi><mi>i</mi></msubsup><mo stretchy=\"false\">)</mo><mo>⋅</mo><msub><mi>W</mi><mi>O</mi></msub></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><mi mathvariant=\"normal\">d</mi><mi>X</mi><mo>⋅</mo><mrow><mi mathvariant=\"normal\">C</mi><mi mathvariant=\"normal\">o</mi><mi mathvariant=\"normal\">n</mi><mi mathvariant=\"normal\">c</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">t</mi></mrow><mo stretchy=\"false\">(</mo><msubsup><mi>W</mi><mi>V</mi><mi>i</mi></msubsup><mo stretchy=\"false\">)</mo><mo>⋅</mo><msub><mi>W</mi><mi>O</mi></msub></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><mfrac><mn>1</mn><mi>n</mi></mfrac><munderover><mo>∑</mo><mrow><mi>j</mi><mo>=</mo><mi>i</mi></mrow><mi>n</mi></munderover><mrow><mo fence=\"true\">(</mo><mi mathvariant=\"normal\">d</mi><msup><mi>x</mi><mi>j</mi></msup><mo fence=\"true\">)</mo></mrow><mo>⋅</mo><mrow><mi mathvariant=\"normal\">C</mi><mi mathvariant=\"normal\">o</mi><mi mathvariant=\"normal\">n</mi><mi mathvariant=\"normal\">c</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">t</mi></mrow><mo 
stretchy=\"false\">(</mo><msubsup><mi>W</mi><mi>V</mi><mi>i</mi></msubsup><mo stretchy=\"false\">)</mo><mo>⋅</mo><msub><mi>W</mi><mi>O</mi></msub></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>:</mo><mo>=</mo><mfrac><mn>1</mn><mi>n</mi></mfrac><munderover><mo>∑</mo><mrow><mi>j</mi><mo>=</mo><mi>i</mi></mrow><mi>n</mi></munderover><mrow><mo fence=\"true\">(</mo><mi mathvariant=\"normal\">d</mi><msup><mi>x</mi><mi>j</mi></msup><mo fence=\"true\">)</mo></mrow><msub><mi>W</mi><mrow><mi>V</mi><mo separator=\"true\">,</mo><mi>l</mi></mrow></msub></mrow></mstyle></mtd></mtr></mtable><annotation encoding=\"application/x-tex\">\\begin{aligned}\n\\mathrm{d\\,MHA}(X)&amp;= \\mathrm{d\\,Concat}(h_1,\\cdots, h_n)\\cdot W_O+ \\mathrm{Concat}(h_1,\\cdots, h_n)\\cdot \\mathrm{d}W_O \\\\\n&amp; = \\mathrm{Concat}(\\mathrm{d}h_1,\\cdots,\\mathrm{d}h_n)\\cdot W_O + \\mathrm{Concat}(h_1,\\cdots, h_n)\\cdot \\mathrm{d}W_O\\\\\n&amp; = \\mathrm{Concat}(\\mathrm{d}X \\cdot W_V^i)\\cdot W_O\\\\\n&amp; = \\mathrm{d}X\\cdot \\mathrm{Concat}(W_V^i)\\cdot W_O\\\\\n&amp; = \\frac{1}{n}\\sum_{j=i}^n\\left(\\mathrm{d}x^j\\right)\\cdot \\mathrm{Concat}(W_V^i)\\cdot W_O\\\\\n&amp; :=\\frac{1}{n}\\sum_{j=i}^n\\left(\\mathrm{d}x^j\\right) W_{V,l}\n\\end{aligned}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:12.7997em;vertical-align:-6.1498em;\"></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-r\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:6.6498em;\"><span style=\"top:-9.4612em;\"><span class=\"pstrut\" style=\"height:3.6514em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathrm\">d</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span 
class=\"mord mathrm\">MHA</span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">X</span><span class=\"mclose\">)</span></span></span><span style=\"top:-7.9612em;\"><span class=\"pstrut\" style=\"height:3.6514em;\"></span><span class=\"mord\"></span></span><span style=\"top:-6.4266em;\"><span class=\"pstrut\" style=\"height:3.6514em;\"></span><span class=\"mord\"></span></span><span style=\"top:-4.8919em;\"><span class=\"pstrut\" style=\"height:3.6514em;\"></span><span class=\"mord\"></span></span><span style=\"top:-2.5805em;\"><span class=\"pstrut\" style=\"height:3.6514em;\"></span><span class=\"mord\"></span></span><span style=\"top:0.7847em;\"><span class=\"pstrut\" style=\"height:3.6514em;\"></span><span class=\"mord\"></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:6.1498em;\"><span></span></span></span></span></span><span class=\"col-align-l\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:6.6498em;\"><span style=\"top:-9.4612em;\"><span class=\"pstrut\" style=\"height:3.6514em;\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">d</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathrm\">Concat</span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">h</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord 
mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"minner\">⋯</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">h</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">n</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.02778em;\">O</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" 
style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">Concat</span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">h</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"minner\">⋯</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">h</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">n</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord mathrm\">d</span><span class=\"mord\"><span class=\"mord 
mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.02778em;\">O</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-7.9612em;\"><span class=\"pstrut\" style=\"height:3.6514em;\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">Concat</span></span><span class=\"mopen\">(</span><span class=\"mord mathrm\">d</span><span class=\"mord\"><span class=\"mord mathnormal\">h</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"minner\">⋯</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord 
mathrm\">d</span><span class=\"mord\"><span class=\"mord mathnormal\">h</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">n</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.02778em;\">O</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">Concat</span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">h</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" 
style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"minner\">⋯</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">h</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">n</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord mathrm\">d</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.02778em;\">O</span></span></span></span><span class=\"vlist-s\">​</span></span><span 
class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-6.4266em;\"><span class=\"pstrut\" style=\"height:3.6514em;\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">Concat</span></span><span class=\"mopen\">(</span><span class=\"mord mathrm\">d</span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">X</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8747em;\"><span style=\"top:-2.453em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.22222em;\">V</span></span></span><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.247em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span 
class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.02778em;\">O</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-4.8919em;\"><span class=\"pstrut\" style=\"height:3.6514em;\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord mathrm\">d</span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">X</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">Concat</span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8747em;\"><span style=\"top:-2.453em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.22222em;\">V</span></span></span><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.247em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.02778em;\">O</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-2.5805em;\"><span class=\"pstrut\" style=\"height:3.6514em;\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3214em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">n</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop op-limits\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.6514em;\"><span style=\"top:-1.8723em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.05724em;\">j</span><span class=\"mrel mtight\">=</span><span class=\"mord mtight\">1</span></span></span></span><span style=\"top:-3.05em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span><span class=\"mop op-symbol large-op\">∑</span></span></span><span style=\"top:-4.3em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">n</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.4138em;\"><span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size1\">(</span></span><span class=\"mord mathrm\">d</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8747em;\"><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.05724em;\">j</span></span></span></span></span></span></span></span><span class=\"mclose delimcenter\" style=\"top:0em;\"><span class=\"delimsizing 
size1\">)</span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">Concat</span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8747em;\"><span style=\"top:-2.453em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.22222em;\">V</span></span></span><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.247em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.02778em;\">O</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:0.7847em;\"><span class=\"pstrut\" style=\"height:3.6514em;\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">:=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3214em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">n</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop op-limits\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.6514em;\"><span style=\"top:-1.8723em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.05724em;\">j</span><span class=\"mrel mtight\">=</span><span class=\"mord mtight\">1</span></span></span></span><span style=\"top:-3.05em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span><span class=\"mop op-symbol large-op\">∑</span></span></span><span 
style=\"top:-4.3em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">n</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.4138em;\"><span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size1\">(</span></span><span class=\"mord mathrm\">d</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8747em;\"><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.05724em;\">j</span></span></span></span></span></span></span></span><span class=\"mclose delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size1\">)</span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.22222em;\">V</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.2861em;\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:6.1498em;\"><span></span></span></span></span></span></span></span></span></span></span></span></p>\n<p>where <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>W</mi><mrow><mi>V</mi><mo separator=\"true\">,</mo><mi>l</mi></mrow></msub></mrow><annotation encoding=\"application/x-tex\">W_{V,l}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.9694em;vertical-align:-0.2861em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.22222em;\">V</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span></span></span></span> is the equivalent random matrix:</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mi>W</mi><mrow><mi>V</mi><mo separator=\"true\">,</mo><mi>l</mi></mrow></msub><mo>=</mo><mrow><mi mathvariant=\"normal\">C</mi><mi mathvariant=\"normal\">o</mi><mi mathvariant=\"normal\">n</mi><mi 
mathvariant=\"normal\">c</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">t</mi></mrow><mo stretchy=\"false\">(</mo><msubsup><mi>W</mi><mi>V</mi><mi>i</mi></msubsup><mo stretchy=\"false\">)</mo><mo>⋅</mo><msub><mi>W</mi><mi>O</mi></msub></mrow><annotation encoding=\"application/x-tex\">W_{V,l} = \\mathrm{Concat}(W_V^i)\\cdot W_O \n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.9694em;vertical-align:-0.2861em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.22222em;\">V</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.1247em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">Concat</span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8747em;\"><span 
style=\"top:-2.453em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.22222em;\">V</span></span></span><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.247em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.02778em;\">O</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span></span></p>\n<p>The corresponding Jacobian matrix is</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mi>J</mi><mrow><mi mathvariant=\"normal\">M</mi><mi mathvariant=\"normal\">H</mi><mi 
mathvariant=\"normal\">A</mi></mrow></msub><mo>=</mo><mfrac><mn>1</mn><mi>n</mi></mfrac><msup><mn mathvariant=\"bold\">11</mn><mi>T</mi></msup><mo>⊗</mo><msub><mi>W</mi><mrow><mi>V</mi><mo separator=\"true\">,</mo><mi>l</mi></mrow></msub></mrow><annotation encoding=\"application/x-tex\">J_{\\mathrm{MHA}} = \\frac{1}{n}\\bold{1}\\bold{1}^T\\otimes W_{V,l}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.09618em;\">J</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0962em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathrm mtight\">MHA</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.0074em;vertical-align:-0.686em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3214em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">n</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" 
style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mord mathbf\">1</span><span class=\"mord\"><span class=\"mord mathbf\">1</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8913em;\"><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.13889em;\">T</span></span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⊗</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.9694em;vertical-align:-0.2861em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.22222em;\">V</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.2861em;\"><span></span></span></span></span></span></span></span></span></span></span></p>\n<p>The gradient flow through the residual connection is therefore</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi>I</mi><mo>+</mo><msub><mi>J</mi><mrow><mi mathvariant=\"normal\">M</mi><mi mathvariant=\"normal\">H</mi><mi mathvariant=\"normal\">A</mi></mrow></msub><mo>=</mo><mi>I</mi><mo>+</mo><mfrac><mn>1</mn><mi>n</mi></mfrac><msup><mn mathvariant=\"bold\">11</mn><mi>T</mi></msup><mo>⊗</mo><msub><mi>W</mi><mrow><mi>V</mi><mo separator=\"true\">,</mo><mi>l</mi></mrow></msub></mrow><annotation encoding=\"application/x-tex\">I+J_\\mathrm{MHA} = I+\\frac{1}{n}\\bold{1}\\bold{1}^T\\otimes W_{V,l}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">I</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.09618em;\">J</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0962em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathrm mtight\">MHA</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" 
style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">I</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.0074em;vertical-align:-0.686em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3214em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">n</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mord mathbf\">1</span><span class=\"mord\"><span class=\"mord mathbf\">1</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8913em;\"><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.13889em;\">T</span></span></span></span></span></span></span></span><span class=\"mspace\" 
style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⊗</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.9694em;vertical-align:-0.2861em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.22222em;\">V</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span></span></span></span></span></p>\n<h3><span id=\"ln层jacobian矩阵谱范数的阶估计\"> Order Estimate of the Spectral Norm of the LN Layer's Jacobian</span></h3>\n<p>Following the derivation above, we need to estimate the magnitude of the LN layer's Jacobian. Here we consider only the gradient of the normalization map without the affine transformation, because applying the affine transformation afterwards only rescales the gradient linearly.</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mrow><mi mathvariant=\"normal\">L</mi><mi mathvariant=\"normal\">N</mi></mrow><mo stretchy=\"false\">(</mo><mi>x</mi><mo stretchy=\"false\">)</mo><mo>=</mo><mfrac><mrow><mi>x</mi><mo>−</mo><mi>μ</mi></mrow><mi>σ</mi></mfrac></mrow><annotation encoding=\"application/x-tex\">\\mathrm{LN}(x) = \\frac{x-\\mu}{\\sigma}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">LN</span></span><span 
class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.9463em;vertical-align:-0.686em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.2603em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">σ</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord mathnormal\">μ</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<p>Take the centered (mean-subtracted) vector</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi>y</mi><mo>=</mo><mi>x</mi><mo stretchy=\"false\">(</mo><mi>I</mi><mo>−</mo><mfrac><mn>1</mn><mi>n</mi></mfrac><msup><mn mathvariant=\"bold\">1</mn><mi>T</mi></msup><mn mathvariant=\"bold\">1</mn><mo stretchy=\"false\">)</mo><mo>=</mo><mrow><mo fence=\"true\">(</mo><mtable rowspacing=\"0.16em\" 
columnalign=\"center\" columnspacing=\"1em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><msub><mi>x</mi><mn>1</mn></msub><mo>−</mo><mfrac><mn>1</mn><mi>n</mi></mfrac><mo>∑</mo><msub><mi>x</mi><mi>i</mi></msub></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><msub><mi>x</mi><mn>2</mn></msub><mo>−</mo><mfrac><mn>1</mn><mi>n</mi></mfrac><mo>∑</mo><msub><mi>x</mi><mi>i</mi></msub></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi mathvariant=\"normal\">⋮</mi><mpadded height=\"0em\" voffset=\"0em\"><mspace mathbackground=\"black\" width=\"0em\" height=\"1.5em\"></mspace></mpadded></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><msub><mi>x</mi><mi>n</mi></msub><mo>−</mo><mfrac><mn>1</mn><mi>n</mi></mfrac><mo>∑</mo><msub><mi>x</mi><mi>i</mi></msub></mrow></mstyle></mtd></mtr></mtable><mo fence=\"true\">)</mo></mrow></mrow><annotation encoding=\"application/x-tex\">y = x(I-\\frac{1}{n}\\bold{1}^T\\bold{1}) = \\begin{pmatrix}\nx_1 - \\frac{1}{n}\\sum x_i\\\\[1em]\nx_2 - \\frac{1}{n}\\sum x_i\\\\[1em]\n\\vdots\\\\[1em]\nx_n - \\frac{1}{n}\\sum x_i\n\\end{pmatrix}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.625em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">y</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\">x</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">I</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span 
class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.0074em;vertical-align:-0.686em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3214em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">n</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mord\"><span class=\"mord mathbf\">1</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8913em;\"><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.13889em;\">T</span></span></span></span></span></span></span></span><span class=\"mord mathbf\">1</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:8.4753em;vertical-align:-3.9877em;\"></span><span class=\"minner\"><span class=\"mopen\"><span class=\"delimsizing mult\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:4.4499em;\"><span style=\"top:-6.4499em;\"><span class=\"pstrut\" style=\"height:10.4em;\"></span><span style=\"width:0.875em;height:8.400em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.875em\" height=\"8.400em\" viewbox=\"0 0 875 8400\"><path d=\"M863,9c0,-2,-2,-5,-6,-9c0,0,-17,0,-17,0c-12.7,0,-19.3,0.3,-20,1\nc-5.3,5.3,-10.3,11,-15,17c-242.7,294.7,-395.3,682,-458,1162c-21.3,163.3,-33.3,349,\n-36,557 l0,4884c0.2,6,0,26,0,60c2,159.3,10,310.7,24,454c53.3,528,210,\n949.7,470,1265c4.7,6,9.7,11.7,15,17c0.7,0.7,7,1,19,1c0,0,18,0,18,0c4,-4,6,-7,6,-9\nc0,-2.7,-3.3,-8.7,-10,-18c-135.3,-192.7,-235.5,-414.3,-300.5,-665c-65,-250.7,-102.5,\n-544.7,-112.5,-882c-2,-104,-3,-167,-3,-189\nl0,-4892c0,-162.7,5.7,-314,17,-454c20.7,-272,63.7,-513,129,-723c65.3,\n-210,155.3,-396.3,270,-559c6.7,-9.3,10,-15.3,10,-18z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:3.9501em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-c\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:4.4877em;\"><span style=\"top:-7.3301em;\"><span class=\"pstrut\" style=\"height:3.6875em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" 
style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8451em;\"><span style=\"top:-2.655em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">n</span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.394em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">1</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.345em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop op-symbol small-op\" style=\"position:relative;top:0em;\">∑</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-5.1249em;\"><span class=\"pstrut\" style=\"height:3.6875em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord 
mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8451em;\"><span style=\"top:-2.655em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">n</span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.394em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">1</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.345em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop op-symbol small-op\" style=\"position:relative;top:0em;\">∑</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span 
class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-2.2649em;\"><span class=\"pstrut\" style=\"height:3.6875em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord\">⋮</span><span class=\"mord rule\" style=\"border-right-width:0em;border-top-width:1.5em;bottom:0em;\"></span></span></span></span><span style=\"top:-0.0598em;\"><span class=\"pstrut\" style=\"height:3.6875em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">n</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8451em;\"><span style=\"top:-2.655em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span 
class=\"mord mathnormal mtight\">n</span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.394em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">1</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.345em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop op-symbol small-op\" style=\"position:relative;top:0em;\">∑</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:3.9877em;\"><span></span></span></span></span></span></span></span><span class=\"mclose\"><span class=\"delimsizing mult\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:4.4499em;\"><span style=\"top:-6.4499em;\"><span class=\"pstrut\" style=\"height:10.4em;\"></span><span style=\"width:0.875em;height:8.400em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.875em\" 
height=\"8.400em\" viewbox=\"0 0 875 8400\"><path d=\"M76,0c-16.7,0,-25,3,-25,9c0,2,2,6.3,6,13c21.3,28.7,42.3,60.3,\n63,95c96.7,156.7,172.8,332.5,228.5,527.5c55.7,195,92.8,416.5,111.5,664.5\nc11.3,139.3,17,290.7,17,454c0,28,1.7,43,3.3,45l0,4809\nc-3,4,-3.3,16.7,-3.3,38c0,162,-5.7,313.7,-17,455c-18.7,248,-55.8,469.3,-111.5,664\nc-55.7,194.7,-131.8,370.3,-228.5,527c-20.7,34.7,-41.7,66.3,-63,95c-2,3.3,-4,7,-6,11\nc0,7.3,5.7,11,17,11c0,0,11,0,11,0c9.3,0,14.3,-0.3,15,-1c5.3,-5.3,10.3,-11,15,-17\nc242.7,-294.7,395.3,-681.7,458,-1161c21.3,-164.7,33.3,-350.7,36,-558\nl0,-4944c-2,-159.3,-10,-310.7,-24,-454c-53.3,-528,-210,-949.7,\n-470,-1265c-4.7,-6,-9.7,-11.7,-15,-17c-0.7,-0.7,-6.7,-1,-18,-1z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:3.9501em;\"><span></span></span></span></span></span></span></span></span></span></span></span></p>\n<p>where the norm ‖y‖ is taken in the root-mean-square sense (note the 1/n factor), so that it equals the standard deviation σ:</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mtable rowspacing=\"0.25em\" columnalign=\"right left\" columnspacing=\"0em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mi mathvariant=\"normal\">∥</mi><mi>y</mi><mi mathvariant=\"normal\">∥</mi></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><msqrt><mrow><mfrac><mn>1</mn><mi>n</mi></mfrac><munder><mo>∑</mo><mi>i</mi></munder><mo stretchy=\"false\">(</mo><msub><mi>x</mi><mi>i</mi></msub><mo>−</mo><mi>μ</mi><msup><mo stretchy=\"false\">)</mo><mn>2</mn></msup></mrow></msqrt></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><msqrt><mrow><mfrac><mn>1</mn><mi>n</mi></mfrac><munder><mo>∑</mo><mi>i</mi></munder><mo 
stretchy=\"false\">(</mo><msubsup><mi>x</mi><mi>i</mi><mn>2</mn></msubsup><mo>−</mo><mn>2</mn><mi>μ</mi><msub><mi>x</mi><mi>i</mi></msub><mo>+</mo><msup><mi>μ</mi><mn>2</mn></msup><mo stretchy=\"false\">)</mo></mrow></msqrt></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><msqrt><mrow><mfrac><mn>1</mn><mi>n</mi></mfrac><munder><mo>∑</mo><mi>i</mi></munder><msubsup><mi>x</mi><mi>i</mi><mn>2</mn></msubsup><mo>−</mo><msup><mi>μ</mi><mn>2</mn></msup></mrow></msqrt></mrow></mstyle></mtd></mtr></mtable><annotation encoding=\"application/x-tex\">\\begin{aligned}\n\\|y\\| &amp;= \\sqrt{\\frac{1}{n}\\sum_i (x_i-\\mu)^2}\\\\\n&amp;=\\sqrt{\\frac{1}{n} \\sum_i (x_i^2-2\\mu x_i+\\mu^2)}\\\\\n\n&amp;=\\sqrt{\\frac{1}{n}\\sum_i x_i^2-\\mu^2}\n\\end{aligned}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:10.02em;vertical-align:-4.76em;\"></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-r\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:5.26em;\"><span style=\"top:-7.26em;\"><span class=\"pstrut\" style=\"height:3.6558em;\"></span><span class=\"mord\"><span class=\"mord\">∥</span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">y</span><span class=\"mord\">∥</span></span></span><span style=\"top:-3.92em;\"><span class=\"pstrut\" style=\"height:3.6558em;\"></span><span class=\"mord\"></span></span><span style=\"top:-0.58em;\"><span class=\"pstrut\" style=\"height:3.6558em;\"></span><span class=\"mord\"></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:4.76em;\"><span></span></span></span></span></span><span class=\"col-align-l\"><span class=\"vlist-t vlist-t2\"><span 
class=\"vlist-r\"><span class=\"vlist\" style=\"height:5.26em;\"><span style=\"top:-7.26em;\"><span class=\"pstrut\" style=\"height:3.6558em;\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord sqrt\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.6558em;\"><span class=\"svg-align\" style=\"top:-5em;\"><span class=\"pstrut\" style=\"height:5em;\"></span><span class=\"mord\" style=\"padding-left:1em;\"><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3214em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">n</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop op-limits\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.05em;\"><span style=\"top:-1.8723em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span><span style=\"top:-3.05em;\"><span class=\"pstrut\" 
style=\"height:3.05em;\"></span><span><span class=\"mop op-symbol large-op\">∑</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.2777em;\"><span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord mathnormal\">μ</span><span class=\"mclose\"><span class=\"mclose\">)</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7401em;\"><span style=\"top:-2.989em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span></span></span><span style=\"top:-3.6158em;\"><span class=\"pstrut\" style=\"height:5em;\"></span><span class=\"hide-tail\" style=\"min-width:1.02em;height:3.08em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"3.08em\" viewbox=\"0 0 400000 3240\" preserveaspectratio=\"xMinYMin slice\"><path d=\"M473,2793\nc339.3,-1799.3,509.3,-2700,510,-2702 l0 -0\nc3.3,-7.3,9.3,-11,18,-11 
H400000v40H1017.7\ns-90.5,478,-276.2,1466c-185.7,988,-279.5,1483,-281.5,1485c-2,6,-10,9,-24,9\nc-8,0,-12,-0.7,-12,-2c0,-1.3,-5.3,-32,-16,-92c-50.7,-293.3,-119.7,-693.3,-207,-1200\nc0,-1.3,-5.3,8.7,-16,30c-10.7,21.3,-21.3,42.7,-32,64s-16,33,-16,33s-26,-26,-26,-26\ns76,-153,76,-153s77,-151,77,-151c0.7,0.7,35.7,202,105,604c67.3,400.7,102,602.7,104,\n606zM1001 80h400000v40H1017.7z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3842em;\"><span></span></span></span></span></span></span></span><span style=\"top:-3.92em;\"><span class=\"pstrut\" style=\"height:3.6558em;\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord sqrt\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.6558em;\"><span class=\"svg-align\" style=\"top:-5em;\"><span class=\"pstrut\" style=\"height:5em;\"></span><span class=\"mord\" style=\"padding-left:1em;\"><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3214em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">n</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose 
nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop op-limits\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.05em;\"><span style=\"top:-1.8723em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span><span style=\"top:-3.05em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span><span class=\"mop op-symbol large-op\">∑</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.2777em;\"><span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7959em;\"><span style=\"top:-2.4231em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span><span style=\"top:-3.0448em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2769em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\">2</span><span class=\"mord mathnormal\">μ</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span 
style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">μ</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7401em;\"><span style=\"top:-2.989em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span><span class=\"mclose\">)</span></span></span><span style=\"top:-3.6158em;\"><span class=\"pstrut\" style=\"height:5em;\"></span><span class=\"hide-tail\" style=\"min-width:1.02em;height:3.08em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"3.08em\" viewbox=\"0 0 400000 3240\" preserveaspectratio=\"xMinYMin slice\"><path d=\"M473,2793\nc339.3,-1799.3,509.3,-2700,510,-2702 l0 -0\nc3.3,-7.3,9.3,-11,18,-11 H400000v40H1017.7\ns-90.5,478,-276.2,1466c-185.7,988,-279.5,1483,-281.5,1485c-2,6,-10,9,-24,9\nc-8,0,-12,-0.7,-12,-2c0,-1.3,-5.3,-32,-16,-92c-50.7,-293.3,-119.7,-693.3,-207,-1200\nc0,-1.3,-5.3,8.7,-16,30c-10.7,21.3,-21.3,42.7,-32,64s-16,33,-16,33s-26,-26,-26,-26\ns76,-153,76,-153s77,-151,77,-151c0.7,0.7,35.7,202,105,604c67.3,400.7,102,602.7,104,\n606zM1001 80h400000v40H1017.7z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3842em;\"><span></span></span></span></span></span></span></span><span 
style=\"top:-0.58em;\"><span class=\"pstrut\" style=\"height:3.6558em;\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord sqrt\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.6558em;\"><span class=\"svg-align\" style=\"top:-5em;\"><span class=\"pstrut\" style=\"height:5em;\"></span><span class=\"mord\" style=\"padding-left:1em;\"><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3214em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">n</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop op-limits\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.05em;\"><span style=\"top:-1.8723em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span><span style=\"top:-3.05em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span><span class=\"mop op-symbol large-op\">∑</span></span></span></span><span 
class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.2777em;\"><span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7959em;\"><span style=\"top:-2.4231em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span><span style=\"top:-3.0448em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2769em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">μ</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7401em;\"><span style=\"top:-2.989em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span></span></span><span style=\"top:-3.6158em;\"><span class=\"pstrut\" style=\"height:5em;\"></span><span class=\"hide-tail\" style=\"min-width:1.02em;height:3.08em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"3.08em\" viewbox=\"0 0 400000 3240\" preserveaspectratio=\"xMinYMin slice\"><path d=\"M473,2793\nc339.3,-1799.3,509.3,-2700,510,-2702 l0 -0\nc3.3,-7.3,9.3,-11,18,-11 
H400000v40H1017.7\ns-90.5,478,-276.2,1466c-185.7,988,-279.5,1483,-281.5,1485c-2,6,-10,9,-24,9\nc-8,0,-12,-0.7,-12,-2c0,-1.3,-5.3,-32,-16,-92c-50.7,-293.3,-119.7,-693.3,-207,-1200\nc0,-1.3,-5.3,8.7,-16,30c-10.7,21.3,-21.3,42.7,-32,64s-16,33,-16,33s-26,-26,-26,-26\ns76,-153,76,-153s77,-151,77,-151c0.7,0.7,35.7,202,105,604c67.3,400.7,102,602.7,104,\n606zM1001 80h400000v40H1017.7z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3842em;\"><span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:4.76em;\"><span></span></span></span></span></span></span></span></span></span></span></span></p>\n<p>We have <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi mathvariant=\"script\">O</mi><mo stretchy=\"false\">(</mo><mi mathvariant=\"normal\">∥</mi><mi>y</mi><mi mathvariant=\"normal\">∥</mi><mo stretchy=\"false\">)</mo><mo>=</mo><mi mathvariant=\"script\">O</mi><mo stretchy=\"false\">(</mo><mi mathvariant=\"normal\">∥</mi><mi>x</mi><mi mathvariant=\"normal\">∥</mi><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">\\mathcal{O}(\\|y\\|) = \\mathcal{O}(\\|x\\|)</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathcal\" style=\"margin-right:0.02778em;\">O</span><span class=\"mopen\">(</span><span class=\"mord\">∥</span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">y</span><span class=\"mord\">∥</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" 
style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathcal\" style=\"margin-right:0.02778em;\">O</span><span class=\"mopen\">(</span><span class=\"mord\">∥</span><span class=\"mord mathnormal\">x</span><span class=\"mord\">∥</span><span class=\"mclose\">)</span></span></span></span></p>\n<p>Therefore,</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mrow><mi mathvariant=\"normal\">L</mi><mi mathvariant=\"normal\">N</mi></mrow><mo stretchy=\"false\">(</mo><mi>x</mi><mo stretchy=\"false\">)</mo><mo>=</mo><mfrac><mi>y</mi><msqrt><mrow><mfrac><mn>1</mn><mi>n</mi></mfrac><mo>∑</mo><msubsup><mi>y</mi><mi>j</mi><mn>2</mn></msubsup></mrow></msqrt></mfrac></mrow><annotation encoding=\"application/x-tex\">\\mathrm{LN}(x) = \\frac{y}{\\sqrt{\\frac{1}{n}\\sum y_j^2}}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">LN</span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.8376em;vertical-align:-1.73em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.1076em;\"><span style=\"top:-2.11em;\"><span class=\"pstrut\" style=\"height:3.2011em;\"></span><span class=\"mord\"><span class=\"mord sqrt\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.2011em;\"><span class=\"svg-align\" 
style=\"top:-3.8em;\"><span class=\"pstrut\" style=\"height:3.8em;\"></span><span class=\"mord\" style=\"padding-left:1em;\"><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8451em;\"><span style=\"top:-2.655em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">n</span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.394em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">1</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.345em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop op-symbol small-op\" style=\"position:relative;top:0em;\">∑</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">y</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7959em;\"><span style=\"top:-2.4231em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.05724em;\">j</span></span></span><span style=\"top:-3.0448em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span 
class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.413em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-3.1611em;\"><span class=\"pstrut\" style=\"height:3.8em;\"></span><span class=\"hide-tail\" style=\"min-width:1.02em;height:1.88em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"1.88em\" viewbox=\"0 0 400000 1944\" preserveaspectratio=\"xMinYMin slice\"><path d=\"M983 90\nl0 -0\nc4,-6.7,10,-10,18,-10 H400000v40\nH1013.1s-83.4,268,-264.1,840c-180.7,572,-277,876.3,-289,913c-4.7,4.7,-12.7,7,-24,7\ns-12,0,-12,0c-1.3,-3.3,-3.7,-11.7,-7,-25c-35.3,-125.3,-106.7,-373.3,-214,-744\nc-10,12,-21,25,-33,39s-32,39,-32,39c-6,-5.3,-15,-14,-27,-26s25,-30,25,-30\nc26.7,-32.7,52,-63,76,-91s52,-60,52,-60s208,722,208,722\nc56,-175.3,126.3,-397.3,211,-666c84.7,-268.7,153.8,-488.2,207.5,-658.5\nc53.7,-170.3,84.5,-266.8,92.5,-289.5z\nM1001 80h400000v40h-400000z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6389em;\"><span></span></span></span></span></span></span></span><span style=\"top:-3.4311em;\"><span class=\"pstrut\" style=\"height:3.2011em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.8781em;\"><span class=\"pstrut\" style=\"height:3.2011em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">y</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.73em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mtable 
rowspacing=\"0.25em\" columnalign=\"right left\" columnspacing=\"0em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mrow><mi mathvariant=\"normal\">L</mi><mi mathvariant=\"normal\">N</mi></mrow><mo stretchy=\"false\">(</mo><mi>x</mi><msub><mo stretchy=\"false\">)</mo><mi>i</mi></msub></mrow><mrow><mi mathvariant=\"normal\">∂</mi><msub><mi>y</mi><mi>j</mi></msub></mrow></mfrac></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><msqrt><mi>n</mi></msqrt><mfrac><mrow><msub><mi>δ</mi><mrow><mi>i</mi><mo separator=\"true\">,</mo><mi>j</mi></mrow></msub><msqrt><mrow><mo>∑</mo><msubsup><mi>y</mi><mi>j</mi><mn>2</mn></msubsup></mrow></msqrt><mo>−</mo><msub><mi>y</mi><mi>i</mi></msub><mfrac><msub><mi>y</mi><mi>j</mi></msub><msqrt><mrow><mo>∑</mo><msubsup><mi>y</mi><mi>j</mi><mn>2</mn></msubsup></mrow></msqrt></mfrac></mrow><mrow><mo>∑</mo><msubsup><mi>y</mi><mi>j</mi><mn>2</mn></msubsup></mrow></mfrac></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><mfrac><msqrt><mi>n</mi></msqrt><mrow><mi mathvariant=\"normal\">∥</mi><mi>y</mi><mi mathvariant=\"normal\">∥</mi></mrow></mfrac><mrow><mo fence=\"true\">(</mo><msub><mi>δ</mi><mrow><mi>i</mi><mo separator=\"true\">,</mo><mi>j</mi></mrow></msub><mo>−</mo><mfrac><mrow><msub><mi>y</mi><mi>i</mi></msub><msub><mi>y</mi><mi>j</mi></msub></mrow><mrow><mi mathvariant=\"normal\">∥</mi><mi>y</mi><msup><mi mathvariant=\"normal\">∥</mi><mn>2</mn></msup></mrow></mfrac><mo fence=\"true\">)</mo></mrow></mrow></mstyle></mtd></mtr></mtable><annotation encoding=\"application/x-tex\">\\begin{aligned}\n\\frac{\\partial \\mathrm{LN}(x)_i}{\\partial y_j} &amp; = \\sqrt{n} \\frac{\\delta_{i,j} \\sqrt{\\sum y_j^2} - y_i\\frac{y_j}{\\sqrt{\\sum y_j^2}}}{\\sum y_j^2}\\\\\n&amp; = 
\\frac{\\sqrt{n}}{\\|y\\|}\\left(\\delta_{i,j} -\\frac{y_iy_j}{\\|y\\|^2} \\right)\n\\end{aligned}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:6.5223em;vertical-align:-3.0112em;\"></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-r\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:3.5112em;\"><span style=\"top:-5.5112em;\"><span class=\"pstrut\" style=\"height:4.3961em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.427em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">y</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.05724em;\">j</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord\"><span class=\"mord 
mathrm\">LN</span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mclose\"><span class=\"mclose\">)</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9721em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span><span style=\"top:-2.6349em;\"><span class=\"pstrut\" style=\"height:4.3961em;\"></span><span class=\"mord\"></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:3.0112em;\"><span></span></span></span></span></span><span class=\"col-align-l\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:3.5112em;\"><span style=\"top:-5.5112em;\"><span class=\"pstrut\" style=\"height:4.3961em;\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord sqrt\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8492em;\"><span class=\"svg-align\" style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\" style=\"padding-left:0.833em;\"><span class=\"mord mathnormal\">n</span></span></span><span style=\"top:-2.8092em;\"><span 
class=\"pstrut\" style=\"height:3em;\"></span><span class=\"hide-tail\" style=\"min-width:0.853em;height:1.08em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"1.08em\" viewbox=\"0 0 400000 1080\" preserveaspectratio=\"xMinYMin slice\"><path d=\"M95,702\nc-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14\nc0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54\nc44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10\ns173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429\nc69,-144,104.5,-217.7,106.5,-221\nl0 -0\nc5.3,-9.3,12,-14,20,-14\nH400000v40H845.2724\ns-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7\nc-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z\nM834 80h400000v40h-400000z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1908em;\"><span></span></span></span></span></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.3961em;\"><span style=\"top:-2.4905em;\"><span class=\"pstrut\" style=\"height:3.1765em;\"></span><span class=\"mord\"><span class=\"mop op-symbol small-op\" style=\"position:relative;top:0em;\">∑</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">y</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7959em;\"><span style=\"top:-2.4231em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.05724em;\">j</span></span></span><span style=\"top:-3.0448em;margin-right:0.05em;\"><span class=\"pstrut\" 
style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.413em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-3.4065em;\"><span class=\"pstrut\" style=\"height:3.1765em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-4.3961em;\"><span class=\"pstrut\" style=\"height:3.1765em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03785em;\">δ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0379em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">i</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.05724em;\">j</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span><span class=\"mord sqrt\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.1765em;\"><span class=\"svg-align\" style=\"top:-3.8em;\"><span class=\"pstrut\" style=\"height:3.8em;\"></span><span class=\"mord\" style=\"padding-left:1em;\"><span class=\"mop op-symbol small-op\" style=\"position:relative;top:0em;\">∑</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">y</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.7959em;\"><span style=\"top:-2.4231em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.05724em;\">j</span></span></span><span style=\"top:-3.0448em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.413em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-3.1365em;\"><span class=\"pstrut\" style=\"height:3.8em;\"></span><span class=\"hide-tail\" style=\"min-width:1.02em;height:1.88em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"1.88em\" viewbox=\"0 0 400000 1944\" preserveaspectratio=\"xMinYMin slice\"><path d=\"M983 90\nl0 -0\nc4,-6.7,10,-10,18,-10 H400000v40\nH1013.1s-83.4,268,-264.1,840c-180.7,572,-277,876.3,-289,913c-4.7,4.7,-12.7,7,-24,7\ns-12,0,-12,0c-1.3,-3.3,-3.7,-11.7,-7,-25c-35.3,-125.3,-106.7,-373.3,-214,-744\nc-10,12,-21,25,-33,39s-32,39,-32,39c-6,-5.3,-15,-14,-27,-26s25,-30,25,-30\nc26.7,-32.7,52,-63,76,-91s52,-60,52,-60s208,722,208,722\nc56,-175.3,126.3,-397.3,211,-666c84.7,-268.7,153.8,-488.2,207.5,-658.5\nc53.7,-170.3,84.5,-266.8,92.5,-289.5z\nM1001 80h400000v40h-400000z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6635em;\"><span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">y</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span 
class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8087em;\"><span style=\"top:-2.5188em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord sqrt mtight\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9589em;\"><span class=\"svg-align\" style=\"top:-3.4286em;\"><span class=\"pstrut\" style=\"height:3.4286em;\"></span><span class=\"mord mtight\" style=\"padding-left:1.19em;\"><span class=\"mop op-symbol small-op mtight\" style=\"position:relative;top:0em;\">∑</span><span class=\"mspace mtight\" style=\"margin-right:0.1952em;\"></span><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03588em;\">y</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8051em;\"><span style=\"top:-2.1777em;margin-left:-0.0359em;margin-right:0.0714em;\"><span class=\"pstrut\" style=\"height:2.5em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.05724em;\">j</span></span></span><span style=\"top:-2.8448em;margin-right:0.0714em;\"><span class=\"pstrut\" style=\"height:2.5em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span 
class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.4612em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-2.9309em;\"><span class=\"pstrut\" style=\"height:3.4286em;\"></span><span class=\"hide-tail mtight\" style=\"min-width:0.853em;height:1.5429em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"1.5429em\" viewbox=\"0 0 400000 1080\" preserveaspectratio=\"xMinYMin slice\"><path d=\"M95,702\nc-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14\nc0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54\nc44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10\ns173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429\nc69,-144,104.5,-217.7,106.5,-221\nl0 -0\nc5.3,-9.3,12,-14,20,-14\nH400000v40H845.2724\ns-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7\nc-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z\nM834 80h400000v40h-400000z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.4977em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.5073em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03588em;\">y</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3281em;\"><span style=\"top:-2.357em;margin-left:-0.0359em;margin-right:0.0714em;\"><span class=\"pstrut\" style=\"height:2.5em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mathnormal mtight\" 
style=\"margin-right:0.05724em;\">j</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2819em;\"><span></span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8296em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.099em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span><span style=\"top:-2.6349em;\"><span class=\"pstrut\" style=\"height:4.3961em;\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.4773em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">∥</span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">y</span><span class=\"mord\">∥</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord sqrt\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8003em;\"><span class=\"svg-align\" style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\" style=\"padding-left:0.833em;\"><span class=\"mord 
mathnormal\">n</span></span></span><span style=\"top:-2.7603em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"hide-tail\" style=\"min-width:0.853em;height:1.08em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"1.08em\" viewbox=\"0 0 400000 1080\" preserveaspectratio=\"xMinYMin slice\"><path d=\"M95,702\nc-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14\nc0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54\nc44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10\ns173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429\nc69,-144,104.5,-217.7,106.5,-221\nl0 -0\nc5.3,-9.3,12,-14,20,-14\nH400000v40H845.2724\ns-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7\nc-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z\nM834 80h400000v40h-400000z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2397em;\"><span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.936em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size3\">(</span></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03785em;\">δ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0379em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">i</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\" 
style=\"margin-right:0.05724em;\">j</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.1076em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">∥</span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">y</span><span class=\"mord\"><span class=\"mord\">∥</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7401em;\"><span style=\"top:-2.989em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">y</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span 
class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">y</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.05724em;\">j</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.936em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mclose delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size3\">)</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:3.0112em;\"><span></span></span></span></span></span></span></span></span></span></span></span></p>\n<p>因此</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mtable rowspacing=\"0.25em\" columnalign=\"right left\" columnspacing=\"0em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><msub><mi>J</mi><mrow><mi mathvariant=\"normal\">L</mi><mi mathvariant=\"normal\">N</mi></mrow></msub><mo stretchy=\"false\">(</mo><mi>x</mi><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mrow><mi 
mathvariant=\"normal\">L</mi><mi mathvariant=\"normal\">N</mi></mrow><mo stretchy=\"false\">(</mo><mi>x</mi><mo stretchy=\"false\">)</mo></mrow><mrow><mi mathvariant=\"normal\">∂</mi><mi>y</mi></mrow></mfrac><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mi>y</mi></mrow><mrow><mi mathvariant=\"normal\">∂</mi><mi>x</mi></mrow></mfrac></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><mfrac><msqrt><mi>n</mi></msqrt><mrow><mi mathvariant=\"normal\">∥</mi><mi>y</mi><mi mathvariant=\"normal\">∥</mi></mrow></mfrac><mrow><mo fence=\"true\">(</mo><mi>I</mi><mo>−</mo><mfrac><mrow><msub><mi>y</mi><mi>i</mi></msub><msub><mi>y</mi><mi>j</mi></msub></mrow><mrow><mi mathvariant=\"normal\">∥</mi><mi>y</mi><msup><mi mathvariant=\"normal\">∥</mi><mn>2</mn></msup></mrow></mfrac><mo fence=\"true\">)</mo></mrow><mo stretchy=\"false\">(</mo><mi>I</mi><mo>−</mo><msup><mn mathvariant=\"bold\">1</mn><mi>T</mi></msup><mn mathvariant=\"bold\">1</mn><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd></mtr></mtable><annotation encoding=\"application/x-tex\">\\begin{aligned}\nJ_{\\mathrm{LN}}(x) &amp;= \\frac{\\partial \\mathrm{LN}(x)}{\\partial y}\\frac{\\partial y}{\\partial x}\\\\\n&amp;=\\frac{\\sqrt{n}}{\\|y\\|}\\left(I-\\frac{y_iy_j}{\\|y\\|^2}\\right)(I- \\bold{1}^T\\bold{1})\n\\end{aligned}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:5.3348em;vertical-align:-2.4174em;\"></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-r\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.9174em;\"><span style=\"top:-4.9677em;\"><span class=\"pstrut\" style=\"height:3.4773em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" 
style=\"margin-right:0.09618em;\">J</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0962em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathrm mtight\">LN</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">)</span></span></span><span style=\"top:-2.3099em;\"><span class=\"pstrut\" style=\"height:3.4773em;\"></span><span class=\"mord\"></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.4174em;\"><span></span></span></span></span></span><span class=\"col-align-l\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.9174em;\"><span style=\"top:-4.9677em;\"><span class=\"pstrut\" style=\"height:3.4773em;\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.427em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">y</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span 
class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord\"><span class=\"mord mathrm\">LN</span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">)</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8804em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3714em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal\">x</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">y</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span><span style=\"top:-2.3099em;\"><span class=\"pstrut\" style=\"height:3.4773em;\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord\"><span 
class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.4773em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">∥</span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">y</span><span class=\"mord\">∥</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord sqrt\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8003em;\"><span class=\"svg-align\" style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\" style=\"padding-left:0.833em;\"><span class=\"mord mathnormal\">n</span></span></span><span style=\"top:-2.7603em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"hide-tail\" style=\"min-width:0.853em;height:1.08em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"1.08em\" viewbox=\"0 0 400000 1080\" preserveaspectratio=\"xMinYMin slice\"><path d=\"M95,702\nc-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14\nc0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54\nc44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10\ns173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429\nc69,-144,104.5,-217.7,106.5,-221\nl0 -0\nc5.3,-9.3,12,-14,20,-14\nH400000v40H845.2724\ns-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7\nc-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z\nM834 80h400000v40h-400000z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.2397em;\"><span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.936em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size3\">(</span></span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">I</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.1076em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">∥</span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">y</span><span class=\"mord\"><span class=\"mord\">∥</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7401em;\"><span style=\"top:-2.989em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">y</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">y</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.05724em;\">j</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.936em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mclose delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size3\">)</span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">I</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathbf\">1</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8913em;\"><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" 
style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.13889em;\">T</span></span></span></span></span></span></span></span><span class=\"mord mathbf\">1</span><span class=\"mclose\">)</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.4174em;\"><span></span></span></span></span></span></span></span></span></span></span></span></p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi mathvariant=\"normal\">∥</mi><msub><mi>J</mi><mrow><mi mathvariant=\"normal\">L</mi><mi mathvariant=\"normal\">N</mi></mrow></msub><mo stretchy=\"false\">(</mo><mi>x</mi><mo stretchy=\"false\">)</mo><mi mathvariant=\"normal\">∥</mi><mo>=</mo><mi mathvariant=\"script\">O</mi><mo stretchy=\"false\">(</mo><mfrac><msqrt><mi>n</mi></msqrt><mrow><mi mathvariant=\"normal\">∥</mi><mi>y</mi><mi mathvariant=\"normal\">∥</mi></mrow></mfrac><mo stretchy=\"false\">)</mo><mo>=</mo><mi mathvariant=\"script\">O</mi><mo stretchy=\"false\">(</mo><mfrac><msqrt><mi>n</mi></msqrt><mrow><mi mathvariant=\"normal\">∥</mi><mi>x</mi><mi mathvariant=\"normal\">∥</mi></mrow></mfrac><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">\\|J_\\mathrm{LN}(x)\\| = \\mathcal{O}(\\frac{\\sqrt{n}}{\\|y\\|}) = \\mathcal{O}(\\frac{\\sqrt{n}}{\\|x\\|})\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\">∥</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.09618em;\">J</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span 
style=\"top:-2.55em;margin-left:-0.0962em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathrm mtight\">LN</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">)</span><span class=\"mord\">∥</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.4133em;vertical-align:-0.936em;\"></span><span class=\"mord mathcal\" style=\"margin-right:0.02778em;\">O</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.4773em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">∥</span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">y</span><span class=\"mord\">∥</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord sqrt\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8003em;\"><span class=\"svg-align\" style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\" style=\"padding-left:0.833em;\"><span class=\"mord mathnormal\">n</span></span></span><span style=\"top:-2.7603em;\"><span 
class=\"pstrut\" style=\"height:3em;\"></span><span class=\"hide-tail\" style=\"min-width:0.853em;height:1.08em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"1.08em\" viewbox=\"0 0 400000 1080\" preserveaspectratio=\"xMinYMin slice\"><path d=\"M95,702\nc-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14\nc0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54\nc44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10\ns173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429\nc69,-144,104.5,-217.7,106.5,-221\nl0 -0\nc5.3,-9.3,12,-14,20,-14\nH400000v40H845.2724\ns-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7\nc-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z\nM834 80h400000v40h-400000z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2397em;\"><span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.936em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.4133em;vertical-align:-0.936em;\"></span><span class=\"mord mathcal\" style=\"margin-right:0.02778em;\">O</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.4773em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">∥</span><span class=\"mord mathnormal\">x</span><span class=\"mord\">∥</span></span></span><span 
style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord sqrt\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8003em;\"><span class=\"svg-align\" style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\" style=\"padding-left:0.833em;\"><span class=\"mord mathnormal\">n</span></span></span><span style=\"top:-2.7603em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"hide-tail\" style=\"min-width:0.853em;height:1.08em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"1.08em\" viewbox=\"0 0 400000 1080\" preserveaspectratio=\"xMinYMin slice\"><path d=\"M95,702\nc-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14\nc0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54\nc44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10\ns173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429\nc69,-144,104.5,-217.7,106.5,-221\nl0 -0\nc5.3,-9.3,12,-14,20,-14\nH400000v40H845.2724\ns-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7\nc-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z\nM834 80h400000v40h-400000z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2397em;\"><span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.936em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mclose\">)</span></span></span></span></span></p>\n<p>Building on the results above, we now state the main theorem.</p>\n<blockquote>\n<p><strong>Definition 1.1</strong>: <span class=\"katex\"><span 
class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mo stretchy=\"false\">(</mo><mi>ε</mi><mo separator=\"true\">,</mo><mi>δ</mi><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">(\\varepsilon,\\delta)</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">ε</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03785em;\">δ</span><span class=\"mclose\">)</span></span></span></span>-Bounded random variable</p>\n<p>For a real random variable <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>Z</mi><mo>≥</mo><mn>0</mn></mrow><annotation encoding=\"application/x-tex\">Z\\geq 0</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8193em;vertical-align:-0.136em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">Z</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">≥</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">0</span></span></span></span>, if <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>Z</mi></mrow><annotation encoding=\"application/x-tex\">Z</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">Z</span></span></span></span> satisfies</p>\n<p><span 
class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi mathvariant=\"double-struck\">P</mi><mrow><mo fence=\"true\">(</mo><mfrac><mrow><mi>Z</mi><mo>−</mo><mi>μ</mi></mrow><mi>μ</mi></mfrac><mo>≤</mo><mi>ε</mi><mo fence=\"true\">)</mo></mrow><mo>≥</mo><mn>1</mn><mo>−</mo><mi>δ</mi></mrow><annotation encoding=\"application/x-tex\">\\mathbb{P}\\left(\\frac{Z-\\mu}{\\mu}\\leq \\varepsilon\\right) \\geq 1-\\delta\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:2.4em;vertical-align:-0.95em;\"></span><span class=\"mord mathbb\">P</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size3\">(</span></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3603em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">μ</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">Z</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord mathnormal\">μ</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8804em;\"><span></span></span></span></span></span><span 
class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">≤</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord mathnormal\">ε</span><span class=\"mclose delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size3\">)</span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">≥</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">1</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6944em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03785em;\">δ</span></span></span></span></span></p>\n<p>or, equivalently,</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi mathvariant=\"double-struck\">P</mi><mrow><mo fence=\"true\">(</mo><mfrac><mrow><mi>Z</mi><mo>−</mo><mi>μ</mi></mrow><mi>μ</mi></mfrac><mo>≥</mo><mi>ε</mi><mo fence=\"true\">)</mo></mrow><mo>≤</mo><mi>δ</mi></mrow><annotation encoding=\"application/x-tex\">\\mathbb{P}\\left(\\frac{Z-\\mu}{\\mu}\\geq \\varepsilon\\right) \\leq \\delta\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:2.4em;vertical-align:-0.95em;\"></span><span class=\"mord mathbb\">P</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size3\">(</span></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span 
class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3603em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">μ</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">Z</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord mathnormal\">μ</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8804em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">≥</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord mathnormal\">ε</span><span class=\"mclose delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size3\">)</span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">≤</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6944em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03785em;\">δ</span></span></span></span></span></p>\n<p>with <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>ε</mi><mo>&gt;</mo><mn>0</mn><mo separator=\"true\">,</mo><mn>0</mn><mo>&lt;</mo><mi>δ</mi><mo>&lt;</mo><mn>1</mn></mrow><annotation encoding=\"application/x-tex\">\\varepsilon &gt; 0, 
0&lt;\\delta &lt;1</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.5782em;vertical-align:-0.0391em;\"></span><span class=\"mord mathnormal\">ε</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">&gt;</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8389em;vertical-align:-0.1944em;\"></span><span class=\"mord\">0</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\">0</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">&lt;</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7335em;vertical-align:-0.0391em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03785em;\">δ</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">&lt;</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">1</span></span></span></span>, then the random variable <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>Z</mi></mrow><annotation encoding=\"application/x-tex\">Z</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">Z</span></span></span></span> is said to be <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mo stretchy=\"false\">(</mo><mi>ε</mi><mo separator=\"true\">,</mo><mi>δ</mi><mo stretchy=\"false\">)</mo></mrow><annotation 
encoding=\"application/x-tex\">(\\varepsilon,\\delta)</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">ε</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03785em;\">δ</span><span class=\"mclose\">)</span></span></span></span>-Bounded.</p>\n</blockquote>\n<p>This definition has the same structure as Chebyshev's inequality, which shows that every random variable with bounded variance is <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mo stretchy=\"false\">(</mo><mi>ε</mi><mo separator=\"true\">,</mo><mfrac><msup><mi>σ</mi><mn>2</mn></msup><msup><mi>ε</mi><mn>2</mn></msup></mfrac><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">(\\varepsilon, \\frac{\\sigma^2}{\\varepsilon^2})</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.3629em;vertical-align:-0.345em;\"></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">ε</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.0179em;\"><span style=\"top:-2.655em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">ε</span><span class=\"msupsub\"><span class=\"vlist-t\"><span 
class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7463em;\"><span style=\"top:-2.786em;margin-right:0.0714em;\"><span class=\"pstrut\" style=\"height:2.5em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.394em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03588em;\">σ</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8913em;\"><span style=\"top:-2.931em;margin-right:0.0714em;\"><span class=\"pstrut\" style=\"height:2.5em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.345em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mclose\">)</span></span></span></span>-Bounded, provided the variance is taken to be that of the normalized variable <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>Z</mi><mi mathvariant=\"normal\">/</mi><mi>μ</mi></mrow><annotation encoding=\"application/x-tex\">Z/\\mu</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">Z</span><span class=\"mord\">/</span><span class=\"mord mathnormal\">μ</span></span></span></span>, since the definition bounds the relative deviation.</p>\n<h3><span id=\"整体损失函数梯度谱范数\"> Spectral norm of the overall loss gradient</span></h3>\n<p>For the Post-LN architecture, the loss function is defined as the cross-entropy at the topmost layer, layer <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>L</mi></mrow><annotation encoding=\"application/x-tex\">L</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\">L</span></span></span></span>:</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math 
xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi mathvariant=\"script\">L</mi><mo stretchy=\"false\">(</mo><msubsup><mi>x</mi><mrow><mi>L</mi><mo>+</mo><mn>1</mn><mo separator=\"true\">,</mo><mi>i</mi></mrow><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup><mo stretchy=\"false\">)</mo><mo>=</mo><mo>−</mo><mi>log</mi><mo>⁡</mo><msub><mrow><mi mathvariant=\"normal\">s</mi><mi mathvariant=\"normal\">o</mi><mi mathvariant=\"normal\">f</mi><mi mathvariant=\"normal\">t</mi><mi mathvariant=\"normal\">m</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">x</mi></mrow><msub><mi>y</mi><mi>i</mi></msub></msub><mo stretchy=\"false\">(</mo><msup><mi>W</mi><mrow><mi>e</mi><mi>m</mi><mi>b</mi></mrow></msup><msubsup><mi>x</mi><mrow><mi>L</mi><mo>+</mo><mn>1</mn><mo separator=\"true\">,</mo><mi>i</mi></mrow><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup><mo stretchy=\"false\">)</mo><mo>=</mo><mo>−</mo><mi>log</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><mi mathvariant=\"double-struck\">P</mi><mo stretchy=\"false\">(</mo><mrow><mi mathvariant=\"normal\">S</mi><mi mathvariant=\"normal\">o</mi><mi mathvariant=\"normal\">f</mi><mi mathvariant=\"normal\">t</mi><mi mathvariant=\"normal\">m</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">x</mi></mrow><mo stretchy=\"false\">(</mo><msup><mi>W</mi><mrow><mi>e</mi><mi>m</mi><mi>b</mi></mrow></msup><msubsup><mi>x</mi><mrow><mi>L</mi><mo>+</mo><mn>1</mn><mo separator=\"true\">,</mo><mi>i</mi></mrow><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup><mo stretchy=\"false\">)</mo><mi mathvariant=\"normal\">∣</mi><mtext> </mtext><msub><mi>y</mi><mi>i</mi></msub><mo stretchy=\"false\">)</mo><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">\\mathcal{L}(x^{post}_{L+1,i}) = -\\log \\mathrm{softmax}_{y_i} (W^{emb}x^{post}_{L+1,i}) = -\\log(\\mathbb{P}(\\mathrm{Softmax}(W^{emb}x^{post}_{L+1,i})|\\, 
y_i))\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.3411em;vertical-align:-0.4296em;\"></span><span class=\"mord mathcal\">L</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9115em;\"><span style=\"top:-2.4065em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">L</span><span class=\"mbin mtight\">+</span><span class=\"mord mtight\">1</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\">i</span></span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">os</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.4296em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.3411em;vertical-align:-0.4296em;\"></span><span class=\"mord\">−</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop\">lo<span style=\"margin-right:0.01389em;\">g</span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord 
mathrm\">softmax</span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03588em;\">y</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3281em;\"><span style=\"top:-2.357em;margin-left:-0.0359em;margin-right:0.0714em;\"><span class=\"pstrut\" style=\"height:2.5em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.143em;\"><span></span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8991em;\"><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">e</span><span class=\"mord mathnormal mtight\">mb</span></span></span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9115em;\"><span 
style=\"top:-2.4065em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">L</span><span class=\"mbin mtight\">+</span><span class=\"mord mtight\">1</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\">i</span></span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">os</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.4296em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.3411em;vertical-align:-0.4296em;\"></span><span class=\"mord\">−</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop\">lo<span style=\"margin-right:0.01389em;\">g</span></span><span class=\"mopen\">(</span><span class=\"mord mathbb\">P</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathrm\">Softmax</span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8991em;\"><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span 
class=\"mord mathnormal mtight\">e</span><span class=\"mord mathnormal mtight\">mb</span></span></span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9115em;\"><span style=\"top:-2.4065em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">L</span><span class=\"mbin mtight\">+</span><span class=\"mord mtight\">1</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\">i</span></span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">os</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.4296em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mord\">∣</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">y</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">))</span></span></span></span></span></p>\n<p>The Pre-LN architecture has one additional LN block at the end, so its loss is</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi mathvariant=\"script\">L</mi><mo stretchy=\"false\">(</mo><msubsup><mi>x</mi><mrow><mi>f</mi><mi>i</mi><mi>n</mi><mi>a</mi><mi>l</mi><mo separator=\"true\">,</mo><mi>i</mi></mrow><mrow><mi>p</mi><mi>r</mi><mi>e</mi></mrow></msubsup><mo stretchy=\"false\">)</mo><mo>=</mo><mo>−</mo><mi>log</mi><mo>⁡</mo><msub><mrow><mi mathvariant=\"normal\">s</mi><mi mathvariant=\"normal\">o</mi><mi mathvariant=\"normal\">f</mi><mi mathvariant=\"normal\">t</mi><mi mathvariant=\"normal\">m</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">x</mi></mrow><msub><mi>y</mi><mi>i</mi></msub></msub><mo stretchy=\"false\">(</mo><msup><mi>W</mi><mrow><mi>e</mi><mi>m</mi><mi>b</mi></mrow></msup><msubsup><mi>x</mi><mrow><mi>f</mi><mi>i</mi><mi>n</mi><mi>a</mi><mi>l</mi><mo separator=\"true\">,</mo><mi>i</mi></mrow><mrow><mi>p</mi><mi>r</mi><mi>e</mi></mrow></msubsup><mo stretchy=\"false\">)</mo><mo>=</mo><mo>−</mo><mi>log</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><mi mathvariant=\"double-struck\">P</mi><mo stretchy=\"false\">(</mo><mrow><mi mathvariant=\"normal\">S</mi><mi mathvariant=\"normal\">o</mi><mi mathvariant=\"normal\">f</mi><mi mathvariant=\"normal\">t</mi><mi mathvariant=\"normal\">m</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">x</mi></mrow><mo stretchy=\"false\">(</mo><msup><mi>W</mi><mrow><mi>e</mi><mi>m</mi><mi>b</mi></mrow></msup><msubsup><mi>x</mi><mrow><mi>f</mi><mi>i</mi><mi>n</mi><mi>a</mi><mi>l</mi><mo separator=\"true\">,</mo><mi>i</mi></mrow><mrow><mi>p</mi><mi>r</mi><mi>e</mi></mrow></msubsup><mo stretchy=\"false\">)</mo><mi mathvariant=\"normal\">∣</mi><mtext> 
</mtext><msub><mi>y</mi><mi>i</mi></msub><mo stretchy=\"false\">)</mo><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">\\mathcal{L}(x^{pre}_{final,i}) = -\\log\\mathrm{softmax}_{y_i}(W^{emb}x^{pre}_{final,i}) = -\\log(\\mathbb{P}(\\mathrm{Softmax}(W^{emb}x^{pre}_{final,i})|\\, y_i))\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.2197em;vertical-align:-0.4374em;\"></span><span class=\"mord mathcal\">L</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7823em;\"><span style=\"top:-2.3987em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10764em;\">f</span><span class=\"mord mathnormal mtight\">ina</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\">i</span></span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">re</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.4374em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" 
style=\"height:1.3365em;vertical-align:-0.4374em;\"></span><span class=\"mord\">−</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop\">lo<span style=\"margin-right:0.01389em;\">g</span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathrm\">softmax</span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03588em;\">y</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3281em;\"><span style=\"top:-2.357em;margin-left:-0.0359em;margin-right:0.0714em;\"><span class=\"pstrut\" style=\"height:2.5em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.143em;\"><span></span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8991em;\"><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">e</span><span 
class=\"mord mathnormal mtight\">mb</span></span></span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7823em;\"><span style=\"top:-2.3987em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10764em;\">f</span><span class=\"mord mathnormal mtight\">ina</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\">i</span></span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">re</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.4374em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.3365em;vertical-align:-0.4374em;\"></span><span class=\"mord\">−</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop\">lo<span style=\"margin-right:0.01389em;\">g</span></span><span class=\"mopen\">(</span><span class=\"mord mathbb\">P</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathrm\">Softmax</span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" 
style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8991em;\"><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">e</span><span class=\"mord mathnormal mtight\">mb</span></span></span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7823em;\"><span style=\"top:-2.3987em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10764em;\">f</span><span class=\"mord mathnormal mtight\">ina</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\">i</span></span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">re</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.4374em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mord\">∣</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">y</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">))</span></span></span></span></span></p>\n<p>where</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msubsup><mi>x</mi><mrow><mi>f</mi><mi>i</mi><mi>n</mi><mi>a</mi><mi>l</mi><mo separator=\"true\">,</mo><mi>i</mi></mrow><mrow><mi>p</mi><mi>r</mi><mi>e</mi></mrow></msubsup><mo>=</mo><mrow><mi mathvariant=\"normal\">L</mi><mi mathvariant=\"normal\">N</mi></mrow><mo stretchy=\"false\">(</mo><msubsup><mi>x</mi><mrow><mi>L</mi><mo>+</mo><mn>1</mn><mo separator=\"true\">,</mo><mi>i</mi></mrow><mrow><mi>p</mi><mi>r</mi><mi>e</mi></mrow></msubsup><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">x^{pre}_{final,i} = \\mathrm{LN}(x^{pre}_{L+1,i})\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.2197em;vertical-align:-0.4374em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7823em;\"><span style=\"top:-2.3987em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10764em;\">f</span><span class=\"mord mathnormal mtight\">ina</span><span class=\"mord mathnormal mtight\" 
style=\"margin-right:0.01968em;\">l</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\">i</span></span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">re</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.4374em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.2119em;vertical-align:-0.4296em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">LN</span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7823em;\"><span style=\"top:-2.4065em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">L</span><span class=\"mbin mtight\">+</span><span class=\"mord mtight\">1</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\">i</span></span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">re</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.4296em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span></span></span></span></span></p>\n<p><strong>Theorem 1.</strong> Assume that <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi mathvariant=\"normal\">∥</mi><msubsup><mi>W</mi><mrow><mi>L</mi><mo>+</mo><mn>1</mn><mo separator=\"true\">,</mo><mi>i</mi></mrow><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup><mi mathvariant=\"normal\">∥</mi></mrow><annotation encoding=\"application/x-tex\">\\|W^{post}_{L+1,i}\\|</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.3411em;vertical-align:-0.4296em;\"></span><span class=\"mord\">∥</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9115em;\"><span style=\"top:-2.4065em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">L</span><span class=\"mbin mtight\">+</span><span class=\"mord mtight\">1</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\">i</span></span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">os</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.4296em;\"><span></span></span></span></span></span></span><span 
class=\"mord\">∥</span></span></span></span> and <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi mathvariant=\"normal\">∥</mi><msubsup><mi>W</mi><mrow><mi>L</mi><mo>+</mo><mn>1</mn><mo separator=\"true\">,</mo><mi>i</mi></mrow><mrow><mi>p</mi><mi>r</mi><mi>e</mi></mrow></msubsup><mi mathvariant=\"normal\">∥</mi></mrow><annotation encoding=\"application/x-tex\">\\|W^{pre}_{L+1,i}\\|</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.2119em;vertical-align:-0.4296em;\"></span><span class=\"mord\">∥</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7823em;\"><span style=\"top:-2.4065em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">L</span><span class=\"mbin mtight\">+</span><span class=\"mord mtight\">1</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\">i</span></span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">re</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.4296em;\"><span></span></span></span></span></span></span><span class=\"mord\">∥</span></span></span></span> are both <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mo 
stretchy=\"false\">(</mo><mi>ε</mi><mo separator=\"true\">,</mo><mi>δ</mi><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">(\\varepsilon,\\delta)</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">ε</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03785em;\">δ</span><span class=\"mclose\">)</span></span></span></span>-Bounded. Then the Frobenius norms of the loss gradients in the Post-LN and Pre-LN architectures satisfy</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mo fence=\"true\">{</mo><mtable rowspacing=\"0.36em\" columnalign=\"left left\" columnspacing=\"1em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><msub><mrow><mo fence=\"true\">∥</mo><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mover accent=\"true\"><mi mathvariant=\"script\">L</mi><mo>~</mo></mover><mo stretchy=\"false\">(</mo><msubsup><mi>x</mi><mrow><mi>L</mi><mo>+</mo><mn>1</mn></mrow><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup><mo stretchy=\"false\">)</mo></mrow><mrow><mi mathvariant=\"normal\">∂</mi><msup><mi>W</mi><mrow><mn>2</mn><mo separator=\"true\">,</mo><mi>L</mi></mrow></msup></mrow></mfrac><mo fence=\"true\">∥</mo></mrow><mi>F</mi></msub><mo>=</mo><mi mathvariant=\"script\">O</mi><mo stretchy=\"false\">(</mo><mi>d</mi><msqrt><mrow><mi>ln</mi><mo>⁡</mo><mi>d</mi></mrow></msqrt><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><msub><mrow><mo fence=\"true\">∥</mo><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mover accent=\"true\"><mi mathvariant=\"script\">L</mi><mo>~</mo></mover><mo 
stretchy=\"false\">(</mo><msubsup><mi>x</mi><mrow><mi>f</mi><mi>i</mi><mi>n</mi><mi>a</mi><mi>l</mi></mrow><mrow><mi>p</mi><mi>r</mi><mi>e</mi></mrow></msubsup><mo stretchy=\"false\">)</mo></mrow><mrow><mi mathvariant=\"normal\">∂</mi><msup><mi>W</mi><mrow><mn>2</mn><mo separator=\"true\">,</mo><mi>L</mi></mrow></msup></mrow></mfrac><mo fence=\"true\">∥</mo></mrow><mi>F</mi></msub><mo>=</mo><mi mathvariant=\"script\">O</mi><mo stretchy=\"false\">(</mo><mi>d</mi><msqrt><mfrac><mrow><mi>ln</mi><mo>⁡</mo><mi>d</mi></mrow><mi>L</mi></mfrac></msqrt><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd></mtr></mtable></mrow><annotation encoding=\"application/x-tex\">\\begin{dcases}\n\\left\\|\\frac{\\partial\\tilde{\\mathcal{L}} (x^{post}_{L+1})}{\\partial W^{2,L}}\\right\\|_F= \\mathcal{O}(d\\sqrt{\\ln d})\\\\[2em]\n\\left\\|\\frac{\\partial\\tilde{\\mathcal{L}}(x^{pre}_{final})}{\\partial W^{2,L}}\\right\\|_F=\\mathcal{O}(d\\sqrt{\\frac{\\ln d}{L}}) \n\\end{dcases}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:7.2317em;vertical-align:-3.3658em;\"></span><span class=\"minner\"><span class=\"mopen\"><span class=\"delimsizing mult\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:3.85em;\"><span style=\"top:-1.366em;\"><span class=\"pstrut\" style=\"height:3.816em;\"></span><span class=\"delimsizinginner delim-size4\"><span>⎩</span></span></span><span style=\"top:-1.358em;\"><span class=\"pstrut\" style=\"height:3.816em;\"></span><span style=\"height:1.816em;width:0.8889em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.8889em\" height=\"1.816em\" style=\"width:0.8889em\" viewbox=\"0 0 888.89 1816\" preserveaspectratio=\"xMinYMin\"><path d=\"M384 0 H504 V1816 H384z M384 0 H504 V1816 H384z\"/></svg></span></span><span style=\"top:-3.816em;\"><span class=\"pstrut\" style=\"height:3.816em;\"></span><span 
class=\"delimsizinginner delim-size4\"><span>⎨</span></span></span><span style=\"top:-4.958em;\"><span class=\"pstrut\" style=\"height:3.816em;\"></span><span style=\"height:1.816em;width:0.8889em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.8889em\" height=\"1.816em\" style=\"width:0.8889em\" viewbox=\"0 0 888.89 1816\" preserveaspectratio=\"xMinYMin\"><path d=\"M384 0 H504 V1816 H384z M384 0 H504 V1816 H384z\"/></svg></span></span><span style=\"top:-6.766em;\"><span class=\"pstrut\" style=\"height:3.816em;\"></span><span class=\"delimsizinginner delim-size4\"><span>⎧</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:3.35em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-l\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:3.8658em;\"><span style=\"top:-5.8658em;\"><span class=\"pstrut\" style=\"height:3.75em;\"></span><span class=\"mord\"><span class=\"minner\"><span class=\"minner\"><span class=\"mopen\"><span class=\"delimsizing mult\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.75em;\"><span style=\"top:-3.75em;\"><span class=\"pstrut\" style=\"height:5em;\"></span><span style=\"width:0.556em;height:3.000em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.556em\" height=\"3.000em\" viewbox=\"0 0 556 3000\"><path d=\"M145 15 v585 v1800 v585 c2.667,10,9.667,15,21,15\nc10,0,16.667,-5,20,-15 v-585 v-1800 v-585 c-2.667,-10,-9.667,-15,-21,-15\nc-10,0,-16.667,5,-20,15z M188 15 H145 v585 v1800 v585 h43z\nM367 15 v585 v1800 v585 c2.667,10,9.667,15,21,15\nc10,0,16.667,-5,20,-15 v-585 v-1800 v-585 c-2.667,-10,-9.667,-15,-21,-15\nc-10,0,-16.667,5,-20,15z M410 15 H367 v585 v1800 v585 h43z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:1.25em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.6621em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7673em;\"><span style=\"top:-2.989em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\">L</span></span></span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.7419em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9202em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathcal\">L</span></span><span style=\"top:-3.6023em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.2222em;\"><span class=\"mord\">~</span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.9115em;\"><span style=\"top:-2.4065em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">L</span><span class=\"mbin mtight\">+</span><span class=\"mord mtight\">1</span></span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">os</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3519em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mclose\"><span class=\"delimsizing mult\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.75em;\"><span style=\"top:-3.75em;\"><span class=\"pstrut\" style=\"height:5em;\"></span><span style=\"width:0.556em;height:3.000em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.556em\" height=\"3.000em\" viewbox=\"0 0 556 3000\"><path d=\"M145 15 v585 v1800 v585 c2.667,10,9.667,15,21,15\nc10,0,16.667,-5,20,-15 v-585 v-1800 v-585 c-2.667,-10,-9.667,-15,-21,-15\nc-10,0,-16.667,5,-20,15z M188 15 H145 v585 v1800 v585 h43z\nM367 15 v585 v1800 v585 c2.667,10,9.667,15,21,15\nc10,0,16.667,-5,20,-15 v-585 v-1800 v-585 c-2.667,-10,-9.667,-15,-21,-15\nc-10,0,-16.667,5,-20,15z M410 15 H367 v585 v1800 v585 h43z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span 
class=\"vlist\" style=\"height:1.25em;\"><span></span></span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:-0.8214em;\"><span style=\"top:-1.4003em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.13889em;\">F</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.2997em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord mathcal\" style=\"margin-right:0.02778em;\">O</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">d</span><span class=\"mord sqrt\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9811em;\"><span class=\"svg-align\" style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\" style=\"padding-left:0.833em;\"><span class=\"mop\">ln</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\">d</span></span></span><span style=\"top:-2.9411em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"hide-tail\" style=\"min-width:0.853em;height:1.08em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"1.08em\" viewbox=\"0 0 400000 1080\" preserveaspectratio=\"xMinYMin slice\"><path d=\"M95,702\nc-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14\nc0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54\nc44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10\ns173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429\nc69,-144,104.5,-217.7,106.5,-221\nl0 
-0\nc5.3,-9.3,12,-14,20,-14\nH400000v40H845.2724\ns-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7\nc-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z\nM834 80h400000v40h-400000z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.0589em;\"><span></span></span></span></span></span><span class=\"mclose\">)</span></span></span><span style=\"top:-1.6839em;\"><span class=\"pstrut\" style=\"height:3.75em;\"></span><span class=\"mord\"><span class=\"minner\"><span class=\"minner\"><span class=\"mopen\"><span class=\"delimsizing mult\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.75em;\"><span style=\"top:-3.75em;\"><span class=\"pstrut\" style=\"height:5em;\"></span><span style=\"width:0.556em;height:3.000em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.556em\" height=\"3.000em\" viewbox=\"0 0 556 3000\"><path d=\"M145 15 v585 v1800 v585 c2.667,10,9.667,15,21,15\nc10,0,16.667,-5,20,-15 v-585 v-1800 v-585 c-2.667,-10,-9.667,-15,-21,-15\nc-10,0,-16.667,5,-20,15z M188 15 H145 v585 v1800 v585 h43z\nM367 15 v585 v1800 v585 c2.667,10,9.667,15,21,15\nc10,0,16.667,-5,20,-15 v-585 v-1800 v-585 c-2.667,-10,-9.667,-15,-21,-15\nc-10,0,-16.667,5,-20,15z M410 15 H367 v585 v1800 v585 h43z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.25em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.7476em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span 
class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7673em;\"><span style=\"top:-2.989em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\">L</span></span></span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.8274em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9202em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathcal\">L</span></span><span style=\"top:-3.6023em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.2222em;\"><span class=\"mord\">~</span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7823em;\"><span style=\"top:-2.3987em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10764em;\">f</span><span class=\"mord mathnormal mtight\">ina</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" 
style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">re</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.4374em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mclose\"><span class=\"delimsizing mult\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.75em;\"><span style=\"top:-3.75em;\"><span class=\"pstrut\" style=\"height:5em;\"></span><span style=\"width:0.556em;height:3.000em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.556em\" height=\"3.000em\" viewbox=\"0 0 556 3000\"><path d=\"M145 15 v585 v1800 v585 c2.667,10,9.667,15,21,15\nc10,0,16.667,-5,20,-15 v-585 v-1800 v-585 c-2.667,-10,-9.667,-15,-21,-15\nc-10,0,-16.667,5,-20,15z M188 15 H145 v585 v1800 v585 h43z\nM367 15 v585 v1800 v585 c2.667,10,9.667,15,21,15\nc10,0,16.667,-5,20,-15 v-585 v-1800 v-585 c-2.667,-10,-9.667,-15,-21,-15\nc-10,0,-16.667,5,-20,15z M410 15 H367 v585 v1800 v585 h43z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.25em;\"><span></span></span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:-0.8214em;\"><span style=\"top:-1.4003em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.13889em;\">F</span></span></span></span><span 
class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.2997em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord mathcal\" style=\"margin-right:0.02778em;\">O</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">d</span><span class=\"mord sqrt\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.6766em;\"><span class=\"svg-align\" style=\"top:-4.4em;\"><span class=\"pstrut\" style=\"height:4.4em;\"></span><span class=\"mord\" style=\"padding-left:1em;\"><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3714em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">L</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mop\">ln</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\">d</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span><span style=\"top:-3.6366em;\"><span class=\"pstrut\" style=\"height:4.4em;\"></span><span class=\"hide-tail\" style=\"min-width:1.02em;height:2.48em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"2.48em\" viewbox=\"0 0 400000 2592\" 
preserveaspectratio=\"xMinYMin slice\"><path d=\"M424,2478\nc-1.3,-0.7,-38.5,-172,-111.5,-514c-73,-342,-109.8,-513.3,-110.5,-514\nc0,-2,-10.7,14.3,-32,49c-4.7,7.3,-9.8,15.7,-15.5,25c-5.7,9.3,-9.8,16,-12.5,20\ns-5,7,-5,7c-4,-3.3,-8.3,-7.7,-13,-13s-13,-13,-13,-13s76,-122,76,-122s77,-121,77,-121\ns209,968,209,968c0,-2,84.7,-361.7,254,-1079c169.3,-717.3,254.7,-1077.7,256,-1081\nl0 -0c4,-6.7,10,-10,18,-10 H400000\nv40H1014.6\ns-87.3,378.7,-272.6,1166c-185.3,787.3,-279.3,1182.3,-282,1185\nc-2,6,-10,9,-24,9\nc-8,0,-12,-0.7,-12,-2z M1001 80\nh400000v40h-400000z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7634em;\"><span></span></span></span></span></span><span class=\"mclose\">)</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:3.3658em;\"><span></span></span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<p>where <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msup><mi>W</mi><mrow><mn>2</mn><mo separator=\"true\">,</mo><mi>L</mi></mrow></msup></mrow><annotation encoding=\"application/x-tex\">W^{2,L}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8413em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8413em;\"><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal 
mtight\">L</span></span></span></span></span></span></span></span></span></span></span></span> is a parameter matrix in the FFN sublayer.</p>\n<p><strong>Proof:</strong><br>\nBy the chain rule,</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mtable rowspacing=\"0.25em\" columnalign=\"right left\" columnspacing=\"0em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mover accent=\"true\"><mi mathvariant=\"script\">L</mi><mo>~</mo></mover><mo stretchy=\"false\">(</mo><msubsup><mi>x</mi><mrow><mi>L</mi><mo>+</mo><mn>1</mn></mrow><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup><mo stretchy=\"false\">)</mo></mrow><mrow><mi mathvariant=\"normal\">∂</mi><msup><mi>W</mi><mrow><mn>2</mn><mo separator=\"true\">,</mo><mi>L</mi></mrow></msup></mrow></mfrac></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mover accent=\"true\"><mi mathvariant=\"script\">L</mi><mo>~</mo></mover><mo stretchy=\"false\">(</mo><msubsup><mi>x</mi><mrow><mi>L</mi><mo>+</mo><mn>1</mn></mrow><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup><mo stretchy=\"false\">)</mo></mrow><mrow><mi mathvariant=\"normal\">∂</mi><msubsup><mi>x</mi><mrow><mi>L</mi><mo>+</mo><mn>1</mn></mrow><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup></mrow></mfrac><mrow><mo fence=\"true\">(</mo><munderover><mo>∏</mo><mrow><mi>k</mi><mo>=</mo><mi>l</mi></mrow><mi>L</mi></munderover><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><msubsup><mi>x</mi><mrow><mi>k</mi><mo>+</mo><mn>1</mn></mrow><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup></mrow><mrow><mi mathvariant=\"normal\">∂</mi><msubsup><mi>x</mi><mi>k</mi><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup></mrow></mfrac><mo fence=\"true\">)</mo></mrow><mfrac><mrow><mi 
mathvariant=\"normal\">∂</mi><msubsup><mi>x</mi><mi>l</mi><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup></mrow><mrow><mi mathvariant=\"normal\">∂</mi><msup><mi>W</mi><mrow><mn>2</mn><mo separator=\"true\">,</mo><mi>L</mi></mrow></msup></mrow></mfrac></mrow></mstyle></mtd></mtr></mtable><annotation encoding=\"application/x-tex\">\\begin{aligned}\n\\frac{\\partial\\tilde{\\mathcal{L}} (x^{post}_{L+1})}{\\partial W^{2,L}}&amp;= \\frac{\\partial \\tilde{\\mathcal{L}}(x^{post}_{L+1})}{\\partial x^{post}_{L+1}}\\left(\\prod_{k=l}^L\\frac{\\partial x^{post}_{k+1}}{\\partial x^{post}_{k}}\\right)\\frac{\\partial x^{post}_l}{\\partial W^{2,L}} \n\\end{aligned}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:3.4304em;vertical-align:-1.4652em;\"></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-r\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.9652em;\"><span style=\"top:-3.9652em;\"><span class=\"pstrut\" style=\"height:3.8283em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.6621em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7673em;\"><span style=\"top:-2.989em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal 
mtight\">L</span></span></span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.7419em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9202em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathcal\">L</span></span><span style=\"top:-3.6023em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.2222em;\"><span class=\"mord\">~</span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9115em;\"><span style=\"top:-2.4065em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">L</span><span class=\"mbin mtight\">+</span><span class=\"mord mtight\">1</span></span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">os</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3519em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span></span></span></span><span 
class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.4652em;\"><span></span></span></span></span></span><span class=\"col-align-l\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.9652em;\"><span style=\"top:-3.9652em;\"><span class=\"pstrut\" style=\"height:3.8283em;\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.6621em;\"><span style=\"top:-2.1985em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9115em;\"><span style=\"top:-2.4065em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">L</span><span class=\"mbin mtight\">+</span><span class=\"mord mtight\">1</span></span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">os</span><span class=\"mord mathnormal 
mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3519em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.7419em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9202em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathcal\">L</span></span><span style=\"top:-3.6023em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.2222em;\"><span class=\"mord\">~</span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9115em;\"><span style=\"top:-2.4065em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">L</span><span class=\"mbin mtight\">+</span><span class=\"mord mtight\">1</span></span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">os</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.3519em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.1533em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size4\">(</span></span><span class=\"mop op-limits\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.8283em;\"><span style=\"top:-1.8479em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03148em;\">k</span><span class=\"mrel mtight\">=</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span style=\"top:-3.05em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span><span class=\"mop op-symbol large-op\">∏</span></span></span><span style=\"top:-4.3em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">L</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3021em;\"><span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.6611em;\"><span style=\"top:-2.1985em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span 
class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9115em;\"><span style=\"top:-2.3987em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03148em;\">k</span></span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">os</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3013em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.7496em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9115em;\"><span style=\"top:-2.3987em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03148em;\">k</span><span class=\"mbin mtight\">+</span><span class=\"mord mtight\">1</span></span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" 
style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">os</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3596em;\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.1028em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mclose delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size4\">)</span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.6028em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7673em;\"><span style=\"top:-2.989em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\">L</span></span></span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.6913em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span 
class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9115em;\"><span style=\"top:-2.3987em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">os</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3013em;\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.4652em;\"><span></span></span></span></span></span></span></span></span></span></span></span></p>\n<p><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mstyle displaystyle=\"true\" scriptlevel=\"0\"><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mover accent=\"true\"><mi>L</mi><mo>~</mo></mover></mrow><mrow><mi mathvariant=\"normal\">∂</mi><msubsup><mi>x</mi><mrow><mi>L</mi><mo>+</mo><mn>1</mn></mrow><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup></mrow></mfrac></mstyle></mrow><annotation 
encoding=\"application/x-tex\">\\dfrac{\\partial\\tilde{L}}{\\partial x^{post}_{L+1}}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:2.7505em;vertical-align:-1.1533em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.5972em;\"><span style=\"top:-2.1985em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9115em;\"><span style=\"top:-2.4065em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">L</span><span class=\"mbin mtight\">+</span><span class=\"mord mtight\">1</span></span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">os</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3519em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" 
style=\"margin-right:0.05556em;\">∂</span><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9202em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathnormal\">L</span></span><span style=\"top:-3.6023em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.2222em;\"><span class=\"mord\">~</span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.1533em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span> is bounded, because <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msubsup><mi>x</mi><mrow><mi>L</mi><mo>+</mo><mn>1</mn></mrow><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup></mrow><annotation encoding=\"application/x-tex\">x^{post}_{L+1}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.2633em;vertical-align:-0.3519em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9115em;\"><span style=\"top:-2.4065em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">L</span><span class=\"mbin mtight\">+</span><span class=\"mord mtight\">1</span></span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal 
mtight\">p</span><span class=\"mord mathnormal mtight\">os</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3519em;\"><span></span></span></span></span></span></span></span></span></span> is <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mo stretchy=\"false\">(</mo><mi>ε</mi><mo separator=\"true\">,</mo><mi>δ</mi><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">(\\varepsilon,\\delta)</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">ε</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03785em;\">δ</span><span class=\"mclose\">)</span></span></span></span>-bounded.</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mrow><mo fence=\"true\">∣</mo><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mover accent=\"true\"><mi>L</mi><mo>~</mo></mover></mrow><mrow><mi mathvariant=\"normal\">∂</mi><msubsup><mi>x</mi><mrow><mi>L</mi><mo>+</mo><mn>1</mn></mrow><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup></mrow></mfrac><mo fence=\"true\">∣</mo></mrow><mo>=</mo><mrow><mo fence=\"true\">∣</mo><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mi mathvariant=\"double-struck\">P</mi><mo stretchy=\"false\">(</mo><mrow><mi mathvariant=\"normal\">S</mi><mi mathvariant=\"normal\">o</mi><mi mathvariant=\"normal\">f</mi><mi mathvariant=\"normal\">t</mi><mi mathvariant=\"normal\">m</mi><mi mathvariant=\"normal\">a</mi><mi 
mathvariant=\"normal\">x</mi></mrow><mo stretchy=\"false\">(</mo><msup><mi>W</mi><mrow><mi>e</mi><mi>m</mi><mi>b</mi></mrow></msup><msubsup><mi>x</mi><mrow><mi>L</mi><mo>+</mo><mn>1</mn></mrow><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup><mi mathvariant=\"normal\">∣</mi><msub><mi>y</mi><mi>i</mi></msub><mo stretchy=\"false\">)</mo><mo stretchy=\"false\">)</mo></mrow><mrow><mi mathvariant=\"double-struck\">P</mi><mo stretchy=\"false\">(</mo><mrow><mi mathvariant=\"normal\">S</mi><mi mathvariant=\"normal\">o</mi><mi mathvariant=\"normal\">f</mi><mi mathvariant=\"normal\">t</mi><mi mathvariant=\"normal\">m</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">x</mi></mrow><mo stretchy=\"false\">(</mo><msup><mi>W</mi><mrow><mi>e</mi><mi>m</mi><mi>b</mi></mrow></msup><msubsup><mi>x</mi><mrow><mi>L</mi><mo>+</mo><mn>1</mn></mrow><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup><mi mathvariant=\"normal\">∣</mi><msub><mi>y</mi><mi>i</mi></msub><mo stretchy=\"false\">)</mo><mo stretchy=\"false\">)</mo><mo>⋅</mo><mi mathvariant=\"normal\">∂</mi><msubsup><mi>x</mi><mrow><mi>L</mi><mo>+</mo><mn>1</mn></mrow><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup></mrow></mfrac><mo fence=\"true\">∣</mo></mrow><mo>=</mo><mi mathvariant=\"script\">O</mi><mo stretchy=\"false\">(</mo><mn>1</mn><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">\\left|\\dfrac{\\partial\\tilde{L}}{\\partial x^{post}_{L+1}}\\right| = \\left|\\frac{\\partial \\mathbb{P}(\\mathrm{Softmax}(W^{emb}x^{post}_{L+1}|y_i))}{\\mathbb{P}(\\mathrm{Softmax}(W^{emb}x^{post}_{L+1}|y_i))\\cdot\\partial x^{post}_{L+1}}\\right|= \\mathcal{O}(1)\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:3em;vertical-align:-1.25em;\"></span><span class=\"minner\"><span class=\"mopen\"><span class=\"delimsizing mult\"><span class=\"vlist-t vlist-t2\"><span 
class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.75em;\"><span style=\"top:-3.75em;\"><span class=\"pstrut\" style=\"height:5em;\"></span><span style=\"width:0.333em;height:3.000em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.333em\" height=\"3.000em\" viewbox=\"0 0 333 3000\"><path d=\"M145 15 v585 v1800 v585 c2.667,10,9.667,15,21,15\nc10,0,16.667,-5,20,-15 v-585 v-1800 v-585 c-2.667,-10,-9.667,-15,-21,-15\nc-10,0,-16.667,5,-20,15z M188 15 H145 v585 v1800 v585 h43z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.25em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.5972em;\"><span style=\"top:-2.1985em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9115em;\"><span style=\"top:-2.4065em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">L</span><span class=\"mbin mtight\">+</span><span class=\"mord mtight\">1</span></span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">os</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.3519em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9202em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathnormal\">L</span></span><span style=\"top:-3.6023em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.2222em;\"><span class=\"mord\">~</span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.1533em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mclose\"><span class=\"delimsizing mult\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.75em;\"><span style=\"top:-3.75em;\"><span class=\"pstrut\" style=\"height:5em;\"></span><span style=\"width:0.333em;height:3.000em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.333em\" height=\"3.000em\" viewbox=\"0 0 333 3000\"><path d=\"M145 15 v585 v1800 v585 c2.667,10,9.667,15,21,15\nc10,0,16.667,-5,20,-15 v-585 v-1800 v-585 c-2.667,-10,-9.667,-15,-21,-15\nc-10,0,-16.667,5,-20,15z M188 15 H145 v585 v1800 v585 h43z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.25em;\"><span></span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" 
style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:3em;vertical-align:-1.25em;\"></span><span class=\"minner\"><span class=\"mopen\"><span class=\"delimsizing mult\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.75em;\"><span style=\"top:-3.75em;\"><span class=\"pstrut\" style=\"height:5em;\"></span><span style=\"width:0.333em;height:3.000em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.333em\" height=\"3.000em\" viewbox=\"0 0 333 3000\"><path d=\"M145 15 v585 v1800 v585 c2.667,10,9.667,15,21,15\nc10,0,16.667,-5,20,-15 v-585 v-1800 v-585 c-2.667,-10,-9.667,-15,-21,-15\nc-10,0,-16.667,5,-20,15z M188 15 H145 v585 v1800 v585 h43z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.25em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.6533em;\"><span style=\"top:-2.1985em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathbb\">P</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathrm\">Softmax</span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7751em;\"><span style=\"top:-2.989em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">e</span><span class=\"mord mathnormal mtight\">mb</span></span></span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord 
mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9115em;\"><span style=\"top:-2.4065em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">L</span><span class=\"mbin mtight\">+</span><span class=\"mord mtight\">1</span></span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">os</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3519em;\"><span></span></span></span></span></span></span><span class=\"mord\">∣</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">y</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">))</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span 
class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9115em;\"><span style=\"top:-2.4065em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">L</span><span class=\"mbin mtight\">+</span><span class=\"mord mtight\">1</span></span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">os</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3519em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.7419em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathbb\">P</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathrm\">Softmax</span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8491em;\"><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">e</span><span class=\"mord mathnormal 
mtight\">mb</span></span></span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9115em;\"><span style=\"top:-2.4065em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">L</span><span class=\"mbin mtight\">+</span><span class=\"mord mtight\">1</span></span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">os</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3519em;\"><span></span></span></span></span></span></span><span class=\"mord\">∣</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">y</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">))</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.1533em;\"><span></span></span></span></span></span><span class=\"mclose 
nulldelimiter\"></span></span><span class=\"mclose\"><span class=\"delimsizing mult\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.75em;\"><span style=\"top:-3.75em;\"><span class=\"pstrut\" style=\"height:5em;\"></span><span style=\"width:0.333em;height:3.000em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.333em\" height=\"3.000em\" viewbox=\"0 0 333 3000\"><path d=\"M145 15 v585 v1800 v585 c2.667,10,9.667,15,21,15\nc10,0,16.667,-5,20,-15 v-585 v-1800 v-585 c-2.667,-10,-9.667,-15,-21,-15\nc-10,0,-16.667,5,-20,15z M188 15 H145 v585 v1800 v585 h43z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.25em;\"><span></span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathcal\" style=\"margin-right:0.02778em;\">O</span><span class=\"mopen\">(</span><span class=\"mord\">1</span><span class=\"mclose\">)</span></span></span></span></span></p>\n<p>(The order estimates for the recursion are omitted here; the relevant Jacobian matrices are given above, and it suffices to estimate their order.) The key point is that</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mo fence=\"true\">{</mo><mtable rowspacing=\"0.36em\" columnalign=\"left left\" columnspacing=\"1em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mtext>Post-LN:</mtext></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><msup><mrow><mo fence=\"true\">∥</mo><msub><mi>J</mi><mrow><mi mathvariant=\"normal\">L</mi><mi mathvariant=\"normal\">N</mi></mrow></msub><mo 
stretchy=\"false\">(</mo><msubsup><mi>x</mi><mrow><mi>L</mi><mo>+</mo><mn>1</mn></mrow><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup><mo stretchy=\"false\">)</mo><mo fence=\"true\">∥</mo></mrow><mn>2</mn></msup><mo>=</mo><mi mathvariant=\"script\">O</mi><mo stretchy=\"false\">(</mo><mfrac><mi>n</mi><mrow><mi mathvariant=\"normal\">∥</mi><msubsup><mi>x</mi><mrow><mi>L</mi><mo>+</mo><mn>1</mn></mrow><mrow><mi>p</mi><mi>o</mi><mi>s</mi><mi>t</mi></mrow></msubsup><msup><mi mathvariant=\"normal\">∥</mi><mn>2</mn></msup></mrow></mfrac><mo stretchy=\"false\">)</mo><mo>=</mo><mi mathvariant=\"script\">O</mi><mo stretchy=\"false\">(</mo><mn>1</mn><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mtext>Pre-LN:</mtext></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><msup><mrow><mo fence=\"true\">∥</mo><msub><mi>J</mi><mrow><mi mathvariant=\"normal\">L</mi><mi mathvariant=\"normal\">N</mi></mrow></msub><mo stretchy=\"false\">(</mo><msubsup><mi>x</mi><mrow><mi>f</mi><mi>i</mi><mi>n</mi><mi>a</mi><mi>l</mi></mrow><mrow><mi>p</mi><mi>r</mi><mi>e</mi></mrow></msubsup><mo stretchy=\"false\">)</mo><mo fence=\"true\">∥</mo></mrow><mn>2</mn></msup><mo>=</mo><mi mathvariant=\"script\">O</mi><mo stretchy=\"false\">(</mo><mfrac><mi>n</mi><mrow><mi mathvariant=\"normal\">∥</mi><msubsup><mi>x</mi><mrow><mi>f</mi><mi>i</mi><mi>n</mi><mi>a</mi><mi>l</mi></mrow><mrow><mi>p</mi><mi>r</mi><mi>e</mi></mrow></msubsup><msup><mi mathvariant=\"normal\">∥</mi><mn>2</mn></msup></mrow></mfrac><mo stretchy=\"false\">)</mo><mo>=</mo><mi mathvariant=\"script\">O</mi><mo stretchy=\"false\">(</mo><mfrac><mn>1</mn><mi>L</mi></mfrac><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd></mtr></mtable></mrow><annotation encoding=\"application/x-tex\">\\begin{dcases}\n\\text{Post-LN:}&amp;\\left\\|J_{\\mathrm{LN}}(x^{post}_{L+1})\\right\\|^2 = \\mathcal{O}(\\frac{n}{\\|x^{post}_{L+1}\\|^2}) = 
\\mathcal{O}(1)\\\\[2em]\n\\text{Pre-LN:}&amp;\\left\\|J_{\\mathrm{LN}}(x^{pre}_{final})\\right\\|^2 = \\mathcal{O}(\\frac{n}{\\|x^{pre}_{final}\\|^2}) = \\mathcal{O}(\\frac{1}{L}) \n\\end{dcases}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:6.0249em;vertical-align:-2.7624em;\"></span><span class=\"minner\"><span class=\"mopen\"><span class=\"delimsizing mult\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:3.25em;\"><span style=\"top:-1.366em;\"><span class=\"pstrut\" style=\"height:3.216em;\"></span><span class=\"delimsizinginner delim-size4\"><span>⎩</span></span></span><span style=\"top:-1.358em;\"><span class=\"pstrut\" style=\"height:3.216em;\"></span><span style=\"height:1.216em;width:0.8889em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.8889em\" height=\"1.216em\" style=\"width:0.8889em\" viewbox=\"0 0 888.89 1216\" preserveaspectratio=\"xMinYMin\"><path d=\"M384 0 H504 V1216 H384z M384 0 H504 V1216 H384z\"/></svg></span></span><span style=\"top:-3.216em;\"><span class=\"pstrut\" style=\"height:3.216em;\"></span><span class=\"delimsizinginner delim-size4\"><span>⎨</span></span></span><span style=\"top:-4.358em;\"><span class=\"pstrut\" style=\"height:3.216em;\"></span><span style=\"height:1.216em;width:0.8889em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.8889em\" height=\"1.216em\" style=\"width:0.8889em\" viewbox=\"0 0 888.89 1216\" preserveaspectratio=\"xMinYMin\"><path d=\"M384 0 H504 V1216 H384z M384 0 H504 V1216 H384z\"/></svg></span></span><span style=\"top:-5.566em;\"><span class=\"pstrut\" style=\"height:3.216em;\"></span><span class=\"delimsizinginner delim-size4\"><span>⎧</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.75em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span 
class=\"mtable\"><span class=\"col-align-l\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:3.2624em;\"><span style=\"top:-5.501em;\"><span class=\"pstrut\" style=\"height:3.354em;\"></span><span class=\"mord\"><span class=\"mord text\"><span class=\"mord\">Post-LN:</span></span></span></span><span style=\"top:-1.715em;\"><span class=\"pstrut\" style=\"height:3.354em;\"></span><span class=\"mord\"><span class=\"mord text\"><span class=\"mord\">Pre-LN:</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.7624em;\"><span></span></span></span></span></span><span class=\"arraycolsep\" style=\"width:1em;\"></span><span class=\"col-align-l\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:3.2624em;\"><span style=\"top:-5.501em;\"><span class=\"pstrut\" style=\"height:3.354em;\"></span><span class=\"mord\"><span class=\"minner\"><span class=\"minner\"><span class=\"mopen\"><span class=\"delimsizing mult\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.85em;\"><span style=\"top:-2.85em;\"><span class=\"pstrut\" style=\"height:3.2em;\"></span><span style=\"width:0.556em;height:1.200em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.556em\" height=\"1.200em\" viewbox=\"0 0 556 1200\"><path d=\"M145 15 v585 v0 v585 c2.667,10,9.667,15,21,15\nc10,0,16.667,-5,20,-15 v-585 v0 v-585 c-2.667,-10,-9.667,-15,-21,-15\nc-10,0,-16.667,5,-20,15z M188 15 H145 v585 v0 v585 h43z\nM367 15 v585 v0 v585 c2.667,10,9.667,15,21,15\nc10,0,16.667,-5,20,-15 v-585 v0 v-585 c-2.667,-10,-9.667,-15,-21,-15\nc-10,0,-16.667,5,-20,15z M410 15 H367 v585 v0 v585 h43z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.35em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span 
class=\"mord mathnormal\" style=\"margin-right:0.09618em;\">J</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0962em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathrm mtight\">LN</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9115em;\"><span style=\"top:-2.4065em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">L</span><span class=\"mbin mtight\">+</span><span class=\"mord mtight\">1</span></span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">os</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3519em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mclose\"><span class=\"delimsizing mult\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.85em;\"><span style=\"top:-2.85em;\"><span class=\"pstrut\" 
style=\"height:3.2em;\"></span><span style=\"width:0.556em;height:1.200em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.556em\" height=\"1.200em\" viewbox=\"0 0 556 1200\"><path d=\"M145 15 v585 v0 v585 c2.667,10,9.667,15,21,15\nc10,0,16.667,-5,20,-15 v-585 v0 v-585 c-2.667,-10,-9.667,-15,-21,-15\nc-10,0,-16.667,5,-20,15z M188 15 H145 v585 v0 v585 h43z\nM367 15 v585 v0 v585 c2.667,10,9.667,15,21,15\nc10,0,16.667,-5,20,-15 v-585 v0 v-585 c-2.667,-10,-9.667,-15,-21,-15\nc-10,0,-16.667,5,-20,15z M410 15 H367 v585 v0 v585 h43z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.35em;\"><span></span></span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.1155em;\"><span style=\"top:-3.3644em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord mathcal\" style=\"margin-right:0.02778em;\">O</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.1076em;\"><span style=\"top:-2.1985em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">∥</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9115em;\"><span style=\"top:-2.4065em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span 
class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">L</span><span class=\"mbin mtight\">+</span><span class=\"mord mtight\">1</span></span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">os</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3519em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord\">∥</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7401em;\"><span style=\"top:-2.989em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">n</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.1533em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord mathcal\" style=\"margin-right:0.02778em;\">O</span><span class=\"mopen\">(</span><span class=\"mord\">1</span><span 
class=\"mclose\">)</span></span></span><span style=\"top:-1.715em;\"><span class=\"pstrut\" style=\"height:3.354em;\"></span><span class=\"mord\"><span class=\"minner\"><span class=\"minner\"><span class=\"mopen\"><span class=\"delimsizing mult\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.15em;\"><span style=\"top:-3.15em;\"><span class=\"pstrut\" style=\"height:3.8em;\"></span><span style=\"width:0.333em;height:1.800em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.333em\" height=\"1.800em\" viewbox=\"0 0 333 1800\"><path d=\"M145 15 v585 v600 v585 c2.667,10,9.667,15,21,15\nc10,0,16.667,-5,20,-15 v-585 v-600 v-585 c-2.667,-10,-9.667,-15,-21,-15\nc-10,0,-16.667,5,-20,15z M188 15 H145 v585 v600 v585 h43z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.65em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.09618em;\">J</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0962em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathrm mtight\">LN</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7823em;\"><span style=\"top:-2.3987em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" 
style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10764em;\">f</span><span class=\"mord mathnormal mtight\">ina</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">re</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.4374em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mclose\"><span class=\"delimsizing mult\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.15em;\"><span style=\"top:-3.15em;\"><span class=\"pstrut\" style=\"height:3.8em;\"></span><span style=\"width:0.556em;height:1.800em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.556em\" height=\"1.800em\" viewbox=\"0 0 556 1800\"><path d=\"M145 15 v585 v600 v585 c2.667,10,9.667,15,21,15\nc10,0,16.667,-5,20,-15 v-585 v-600 v-585 c-2.667,-10,-9.667,-15,-21,-15\nc-10,0,-16.667,5,-20,15z M188 15 H145 v585 v600 v585 h43z\nM367 15 v585 v600 v585 c2.667,10,9.667,15,21,15\nc10,0,16.667,-5,20,-15 v-585 v-600 v-585 c-2.667,-10,-9.667,-15,-21,-15\nc-10,0,-16.667,5,-20,15z M410 15 H367 v585 v600 v585 h43z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.65em;\"><span></span></span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.354em;\"><span style=\"top:-3.6029em;margin-right:0.05em;\"><span class=\"pstrut\" 
style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord mathcal\" style=\"margin-right:0.02778em;\">O</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.1076em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">∥</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7823em;\"><span style=\"top:-2.3987em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10764em;\">f</span><span class=\"mord mathnormal mtight\">ina</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">re</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.4374em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord\">∥</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.7401em;\"><span style=\"top:-2.989em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">n</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.1234em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord mathcal\" style=\"margin-right:0.02778em;\">O</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3214em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">L</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span 
class=\"mclose\">)</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.7624em;\"><span></span></span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<p>Theorem 1 shows that, at initialization, the gradient scale of Post-LN is of constant order, meaning it is independent of the model depth <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>L</mi></mrow><annotation encoding=\"application/x-tex\">L</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\">L</span></span></span></span> and therefore cannot sense or suppress the instability introduced by deeper layers; in contrast, the gradient scale of Pre-LN decays as <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>O</mi><mo stretchy=\"false\">(</mo><mfrac><mn>1</mn><msqrt><mi>L</mi></msqrt></mfrac><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">O(\\frac{1}{\\sqrt{L}})</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.3831em;vertical-align:-0.538em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">O</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8451em;\"><span style=\"top:-2.5374em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord sqrt mtight\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9323em;\"><span class=\"svg-align\" style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span 
class=\"mord mtight\" style=\"padding-left:0.833em;\"><span class=\"mord mathnormal mtight\">L</span></span></span><span style=\"top:-2.8923em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"hide-tail mtight\" style=\"min-width:0.853em;height:1.08em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"1.08em\" viewbox=\"0 0 400000 1080\" preserveaspectratio=\"xMinYMin slice\"><path d=\"M95,702\nc-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14\nc0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54\nc44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10\ns173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429\nc69,-144,104.5,-217.7,106.5,-221\nl0 -0\nc5.3,-9.3,12,-14,20,-14\nH400000v40H845.2724\ns-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7\nc-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z\nM834 80h400000v40h-400000z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1077em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.394em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">1</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.538em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mclose\">)</span></span></span></span>, so the initial gradient magnitude shrinks as the model gets deeper, which reduces the dependence on warmup.</p>\n",
            "tags": []
        },
        {
            "id": "https://yuuko.site/2026/05/23/CS/LLM/mHC/",
            "url": "https://yuuko.site/2026/05/23/CS/LLM/mHC/",
            "title": "From ResNet to mHC",
            "date_published": "2026-05-22T16:00:00.000Z",
            "content_html": "<link rel=\"stylesheet\" type=\"text/css\" href=\"https://cdn.jsdelivr.net/hint.css/2.4.1/hint.min.css\"><h1><span id=\"tensor-flow-的-演化历程\"> Tensor Flow 的 演化历程</span></h1>\n<p>整体的发展脉络分为四类</p>\n<ul>\n<li>跨层深度连接 Depth Connection</li>\n<li>同层广度连接 Width Connection</li>\n<li>把层看成动力系统 Continuous-Implicit Flow</li>\n<li>不同样本/token 走不同路径 Routing Connection</li>\n</ul>\n<h2><span id=\"plain-chain\"> Plain Chain</span></h2>\n<p>最原始的信息流就是层的复合，满足 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>x</mi><mrow><mi>l</mi><mo>+</mo><mn>1</mn></mrow></msub><mo>=</mo><msub><mi mathvariant=\"script\">F</mi><mi>l</mi></msub><mo stretchy=\"false\">(</mo><msub><mi>x</mi><mi>l</mi></msub><mo separator=\"true\">,</mo><msub><mi>θ</mi><mi>l</mi></msub><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">x_{l+1} = \\mathcal{F}_{l}(x_l,\\theta_l)</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6389em;vertical-align:-0.2083em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span><span class=\"mbin mtight\">+</span><span class=\"mord mtight\">1</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2083em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" 
style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathcal\" style=\"margin-right:0.09931em;\">F</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.0993em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord 
mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span></span></span></span>, and the derivative of such a deep network is</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><msub><mi>x</mi><mrow><mi>l</mi><mo>+</mo><mn>1</mn></mrow></msub></mrow><mrow><mi mathvariant=\"normal\">∂</mi><msub><mi>x</mi><mn>0</mn></msub></mrow></mfrac><mo>=</mo><munderover><mo>∏</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mi>l</mi></munderover><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><msub><mi mathvariant=\"script\">F</mi><mi>i</mi></msub></mrow><mrow><mi mathvariant=\"normal\">∂</mi><msub><mi>x</mi><mi>i</mi></msub></mrow></mfrac></mrow><annotation encoding=\"application/x-tex\">\\frac{\\partial x_{l+1}}{\\partial x_0} = \\prod_{i = 1}^l \\frac{\\partial \\mathcal{F}_i}{\\partial x_i}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:2.2074em;vertical-align:-0.836em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3714em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" 
style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">0</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span><span class=\"mbin mtight\">+</span><span class=\"mord mtight\">1</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2083em;\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.836em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:3.1138em;vertical-align:-1.2777em;\"></span><span class=\"mop op-limits\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.8361em;\"><span 
style=\"top:-1.8723em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">i</span><span class=\"mrel mtight\">=</span><span class=\"mord mtight\">1</span></span></span></span><span style=\"top:-3.05em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span><span class=\"mop op-symbol large-op\">∏</span></span></span><span style=\"top:-4.3em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.2777em;\"><span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3714em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" 
style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord\"><span class=\"mord mathcal\" style=\"margin-right:0.09931em;\">F</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0993em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.836em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<p>Gradient of Loss Function</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mi mathvariant=\"script\">L</mi></mrow><mrow><mi mathvariant=\"normal\">∂</mi><msub><mi>x</mi><mi>l</mi></msub></mrow></mfrac><mo>=</mo><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mi mathvariant=\"script\">L</mi></mrow><mrow><mi mathvariant=\"normal\">∂</mi><msub><mi>x</mi><mi>L</mi></msub></mrow></mfrac><munderover><mo>∏</mo><mrow><mi>i</mi><mo>=</mo><mi>l</mi></mrow><mrow><mi>L</mi><mo>−</mo><mn>1</mn></mrow></munderover><msub><mi>J</mi><msub><mi mathvariant=\"script\">F</mi><mi>i</mi></msub></msub><mo stretchy=\"false\">(</mo><msub><mi>x</mi><mi>i</mi></msub><mo stretchy=\"false\">)</mo></mrow><annotation 
encoding=\"application/x-tex\">\\frac{\\partial \\mathcal{L}}{\\partial x_l} = \\frac{\\partial \\mathcal{L}}{\\partial x_L}\\prod_{i=l}^{L-1}J_{\\mathcal{F}_i}(x_i)\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:2.2074em;vertical-align:-0.836em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3714em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathcal\">L</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.836em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span 
class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:3.1304em;vertical-align:-1.3021em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3714em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">L</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathcal\">L</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.836em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop op-limits\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.8283em;\"><span style=\"top:-1.8479em;margin-left:0em;\"><span 
class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">i</span><span class=\"mrel mtight\">=</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span style=\"top:-3.05em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span><span class=\"mop op-symbol large-op\">∏</span></span></span><span style=\"top:-4.3em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">L</span><span class=\"mbin mtight\">−</span><span class=\"mord mtight\">1</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3021em;\"><span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.09618em;\">J</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0962em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathcal mtight\" style=\"margin-right:0.09931em;\">F</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3281em;\"><span style=\"top:-2.357em;margin-left:-0.0993em;margin-right:0.0714em;\"><span class=\"pstrut\" style=\"height:2.5em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.143em;\"><span></span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2501em;\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span></span></span></span></span></p>\n<p>A plain chain is prone to vanishing or exploding gradients. The gradient is a product of per-layer Jacobians, so depending on whether the Jacobian norms are greater than 1 or less than 1, it grows exponentially (explosion) or decays to 0 (vanishing).</p>\n<h2><span id=\"resnet\"> ResNet</span></h2>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mi>x</mi><mrow><mi>l</mi><mo>+</mo><mn>1</mn></mrow></msub><mo>=</mo><msub><mi>x</mi><mi>l</mi></msub><mo>+</mo><msub><mi mathvariant=\"script\">F</mi><mi>l</mi></msub><mo stretchy=\"false\">(</mo><msub><mi>x</mi><mi>l</mi></msub><mo separator=\"true\">,</mo><msub><mi>θ</mi><mi>l</mi></msub><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">x_{l+1} = x_{l} + \\mathcal{F}_{l}(x_l,\\theta_l)\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6389em;vertical-align:-0.2083em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span 
class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span><span class=\"mbin mtight\">+</span><span class=\"mord mtight\">1</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2083em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathcal\" style=\"margin-right:0.09931em;\">F</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span 
style=\"top:-2.55em;margin-left:-0.0993em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span></span></span></span></span></p>\n<p>Treating each layer's update as a linearly additive residual turns the overall composite nonlinear mapping into an identity mapping plus a small perturbation.</p>\n<p><img 
loading=\"lazy\" src=\"/picture/mHC/resnet.png\" alt=\"resnet\"></p>\n<h2><span id=\"densenet\"> DenseNet</span></h2>\n<p>Building on ResNet's skip connections, DenseNet replaces the additive structure with concatenation into a higher-dimensional embedding: the input to each layer is the Concat of the outputs of all preceding layers, i.e.</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mi>x</mi><mrow><mi>l</mi><mo>+</mo><mn>1</mn></mrow></msub><mo>=</mo><msub><mi mathvariant=\"script\">F</mi><mi>l</mi></msub><mo stretchy=\"false\">(</mo><mrow><mi mathvariant=\"normal\">C</mi><mi mathvariant=\"normal\">o</mi><mi mathvariant=\"normal\">n</mi><mi mathvariant=\"normal\">c</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">t</mi></mrow><mo stretchy=\"false\">[</mo><msub><mi>x</mi><mn>0</mn></msub><mo separator=\"true\">,</mo><mo>⋯</mo><mtext> </mtext><mo separator=\"true\">,</mo><msub><mi>x</mi><mi>l</mi></msub><mo stretchy=\"false\">]</mo><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">x_{l+1} = \\mathcal{F}_l(\\mathrm{Concat}[x_0,\\cdots,x_l])\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6389em;vertical-align:-0.2083em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span><span class=\"mbin mtight\">+</span><span class=\"mord mtight\">1</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.2083em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathcal\" style=\"margin-right:0.09931em;\">F</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.0993em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathrm\">Concat</span></span><span class=\"mopen\">[</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">0</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"minner\">⋯</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mpunct\">,</span><span class=\"mspace\" 
style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">])</span></span></span></span></span></p>\n<p><img loading=\"lazy\" src=\"/picture/mHC/densenet.png\" alt=\"densenet\"><br>\nDenseNet's compute cost and dimensionality growth are severe. It has worked well in CV, but in LLMs its structure does not scale up well.</p>\n<h2><span id=\"post-ln-pre-ln及相关的衍生架构-深度transformer-中的-residual-stream与稳定化\"> Post-LN, Pre-LN, and derived architectures -- the residual stream and stabilization in deep Transformers</span></h2>\n<p><a href=\"/2026/05/23/CS/LLM/ln/\" title=\"Transformer 中的 Layer Normalization与梯度稳定性\">Transformer 中的 Layer Normalization与梯度稳定性</a> and <a href=\"/2026/05/24/CS/LLM/deepnet/\" title=\"DeepNet\">DeepNet</a> discuss these architectures. Building on the basic ResNet structure, they study how the placement of LayerNorm affects overall training quality, convergence speed, and stability.</p>\n<h2><span id=\"neural-ode\"> Neural ODE</span></h2>\n<h2><span id=\"deep-equilibrium-model\"> Deep Equilibrium Model</span></h2>\n<h2><span id=\"moe\"> MoE</span></h2>\n<p>MoE is a structural paradigm in which the tensor flow goes wider.</p>\n<p><a href=\"/2026/06/01/CS/LLM/moe/\" title=\"MoE -- Mixture of experts\">MoE -- Mixture of experts</a></p>\n<h1><span id=\"hyper-connections\"> Hyper-Connections</span></h1>\n<p>The Hyper-Connections paper argues that problems such as vanishing and exploding gradients in deep-network training have largely been addressed by Post-LN / Pre-LN (<a href=\"/2026/05/23/CS/LLM/ln/\" title=\"Transformer 中的 Layer Normalization与梯度稳定性\">Transformer 中的 Layer Normalization与梯度稳定性</a>), but that the two trade off against each other.</p>\n<ul>\n<li>Pre-LN: effectively addresses vanishing gradients, making it possible to train very deep networks. But it introduces Representation 
Collapse: features in deep layers become highly similar, so each layer ends up repeating the same work.</li>\n<li>Post-LN: mitigates representation collapse, but at the cost of reintroducing vanishing gradients, which makes training very unstable.</li>\n</ul>\n<p><img loading=\"lazy\" src=\"/picture/mHC/cos_similarity.png\" alt=\"cosine-similarity\"></p>\n<p>The experiments show lower cosine similarity for HC in deep networks, indicating that HC alleviates the representation collapse seen with Pre-LN.</p>\n<p>A Hyper-Connection is composed of two parts:</p>\n<ul>\n<li>Width-Connections</li>\n<li>Depth-Connections</li>\n</ul>\n<p>These are linear extensions of the residual connection in the width and depth directions. This amounts to inserting linear layers between n parallel ResNet streams (in SHC the weights are static learnable parameters independent of the input; in DHC the weights are generated dynamically from the input).</p>\n<p><img loading=\"lazy\" src=\"/picture/mHC/hc.png\" alt=\"hc\"></p>\n<p>The hidden vectors of the Hidden Matrix are identical at initialization, i.e.</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mi mathvariant=\"normal\">H</mi><mn>0</mn></msub><mo>=</mo><mo stretchy=\"false\">(</mo><msub><mi>h</mi><mn>0</mn></msub><mo separator=\"true\">,</mo><msub><mi>h</mi><mn>0</mn></msub><mo separator=\"true\">,</mo><mo>⋯</mo><mtext> </mtext><mo separator=\"true\">,</mo><msub><mi>h</mi><mn>0</mn></msub><mo stretchy=\"false\">)</mo><mo>∈</mo><msup><mi mathvariant=\"double-struck\">R</mi><mrow><mi>n</mi><mo>×</mo><mi>d</mi></mrow></msup></mrow><annotation encoding=\"application/x-tex\">\\mathrm{H}_0 = (h_0,h_0,\\cdots ,h_0) \\in \\mathbb{R}^{n\\times d }\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">H</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">0</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">h</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">0</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">h</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">0</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"minner\">⋯</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord 
mathnormal\">h</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">0</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">∈</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8991em;\"></span><span class=\"mord\"><span class=\"mord mathbb\">R</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8991em;\"><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">n</span><span class=\"mbin mtight\">×</span><span class=\"mord mathnormal mtight\">d</span></span></span></span></span></span></span></span></span></span></span></span></span></p>\n<p>Here the number of hidden vectors <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>n</mi></mrow><annotation encoding=\"application/x-tex\">n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\">n</span></span></span></span> is called the expansion rate. It is the width along which Width-Connections expand, and it does not change the amount of information carried by each hidden vector.</p>\n<p>Only after the subsequent nonlinear transformations do the hidden vectors diverge into vectors with distinct meanings, which together form the hidden matrix.</p>\n<p>From the figure we can write the layer-to-layer residual recurrence. For hidden vector <span class=\"katex\"><span 
class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>i</mi></mrow><annotation encoding=\"application/x-tex\">i</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6595em;\"></span><span class=\"mord mathnormal\">i</span></span></span></span> going from layer t to layer t+1, we have</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mi>h</mi><mrow><mi>i</mi><mo separator=\"true\">,</mo><mi>t</mi><mo>+</mo><mn>1</mn></mrow></msub><mo>=</mo><mo>∑</mo><msub><mi>α</mi><mrow><mi>j</mi><mo separator=\"true\">,</mo><mi>i</mi></mrow></msub><msub><mi>h</mi><mrow><mi>j</mi><mo separator=\"true\">,</mo><mi>t</mi></mrow></msub><mo>+</mo><msub><mi>β</mi><mi>i</mi></msub><mrow><mi mathvariant=\"normal\">L</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">y</mi><mi mathvariant=\"normal\">e</mi><mi mathvariant=\"normal\">r</mi></mrow><mo stretchy=\"false\">(</mo><mo>∑</mo><msub><mi>α</mi><mrow><mi>k</mi><mo separator=\"true\">,</mo><mn>0</mn></mrow></msub><msub><mi>h</mi><mrow><mi>k</mi><mo separator=\"true\">,</mo><mi>t</mi></mrow></msub><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">h_{i,t+1} = \\sum \\alpha_{j,i} h_{j,t} + \\beta_i \\mathrm{Layer}(\\sum \\alpha_{k,0} h_{k,t})\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.9805em;vertical-align:-0.2861em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">h</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord 
mathnormal mtight\">i</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\">t</span><span class=\"mbin mtight\">+</span><span class=\"mord mtight\">1</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.6em;vertical-align:-0.55em;\"></span><span class=\"mop op-symbol large-op\" style=\"position:relative;top:0em;\">∑</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0037em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.05724em;\">j</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\">i</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathnormal\">h</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" 
style=\"margin-right:0.05724em;\">j</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.6em;vertical-align:-0.55em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.05278em;\">β</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0528em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathrm\">Layer</span></span><span class=\"mopen\">(</span><span class=\"mop op-symbol large-op\" style=\"position:relative;top:0em;\">∑</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.0037em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03148em;\">k</span><span class=\"mpunct mtight\">,</span><span 
class=\"mord mtight\">0</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathnormal\">h</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03148em;\">k</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span></span></span></span></span></p>\n<p>The corresponding parameters can be encoded in matrix form:</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mrow><mi mathvariant=\"script\">H</mi><mi mathvariant=\"script\">C</mi></mrow><mo>=</mo><mrow><mo fence=\"true\">(</mo><mtable rowspacing=\"0.16em\" columnalign=\"center center\" columnspacing=\"1em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mn>0</mn></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mi>B</mi></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><msub><mi>A</mi><mi>m</mi></msub></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><msub><mi>A</mi><mi>r</mi></msub></mstyle></mtd></mtr></mtable><mo fence=\"true\">)</mo></mrow><mo>=</mo><mrow><mo fence=\"true\">(</mo><mtable rowspacing=\"0.16em\" columnalign=\"center center center center center\" columnspacing=\"1em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mn>0</mn></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" 
displaystyle=\"false\"><msub><mi>β</mi><mn>1</mn></msub></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><msub><mi>β</mi><mn>2</mn></msub></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mo lspace=\"0em\" rspace=\"0em\">⋯</mo></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><msub><mi>β</mi><mi>n</mi></msub></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><msub><mi>α</mi><mrow><mn>1</mn><mo separator=\"true\">,</mo><mn>0</mn></mrow></msub></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><msub><mi>α</mi><mrow><mn>1</mn><mo separator=\"true\">,</mo><mn>1</mn></mrow></msub></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><msub><mi>α</mi><mrow><mn>1</mn><mo separator=\"true\">,</mo><mn>2</mn></mrow></msub></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mo lspace=\"0em\" rspace=\"0em\">⋯</mo></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><msub><mi>α</mi><mrow><mn>1</mn><mo separator=\"true\">,</mo><mi>n</mi></mrow></msub></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><msub><mi>α</mi><mrow><mn>2</mn><mo separator=\"true\">,</mo><mn>0</mn></mrow></msub></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><msub><mi>α</mi><mrow><mn>2</mn><mo separator=\"true\">,</mo><mn>1</mn></mrow></msub></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><msub><mi>α</mi><mrow><mn>2</mn><mo separator=\"true\">,</mo><mn>2</mn></mrow></msub></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mo lspace=\"0em\" rspace=\"0em\">⋯</mo></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><msub><mi>α</mi><mrow><mn>2</mn><mo separator=\"true\">,</mo><mi>n</mi></mrow></msub></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi mathvariant=\"normal\">⋮</mi><mpadded height=\"0em\" 
voffset=\"0em\"><mspace mathbackground=\"black\" width=\"0em\" height=\"1.5em\"></mspace></mpadded></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mo lspace=\"0em\" rspace=\"0em\">⋱</mo></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><msub><mi>α</mi><mrow><mi>n</mi><mo separator=\"true\">,</mo><mn>0</mn></mrow></msub></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><msub><mi>α</mi><mrow><mi>n</mi><mo separator=\"true\">,</mo><mn>1</mn></mrow></msub></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><msub><mi>α</mi><mrow><mi>n</mi><mo separator=\"true\">,</mo><mn>2</mn></mrow></msub></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mo lspace=\"0em\" rspace=\"0em\">⋯</mo></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><msub><mi>α</mi><mrow><mi>n</mi><mo separator=\"true\">,</mo><mi>n</mi></mrow></msub></mstyle></mtd></mtr></mtable><mo fence=\"true\">)</mo></mrow></mrow><annotation encoding=\"application/x-tex\">\\mathcal{HC} = \n\\begin{pmatrix}\n0 &amp; B\\\\\nA_m &amp; A_r\n\\end{pmatrix} = \\begin{pmatrix}\n0 &amp; \\beta_1 &amp; \\beta_2 &amp; \\cdots &amp; \\beta_n\\\\\n\\alpha_{1,0} &amp; \\alpha_{1,1} &amp; \\alpha_{1,2} &amp; \\cdots &amp; \\alpha_{1,n}\\\\\n\\alpha_{2,0} &amp; \\alpha_{2,1} &amp; \\alpha_{2,2} &amp; \\cdots &amp; \\alpha_{2,n}\\\\\n\\vdots &amp; &amp;&amp;\\ddots\\\\\n\\alpha_{n,0} &amp; \\alpha_{n,1} &amp; \\alpha_{n,2} &amp; \\cdots &amp; \\alpha_{n,n}\\\\\n\\end{pmatrix}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord\"><span class=\"mord mathcal\" style=\"margin-right:0.00965em;\">H</span><span class=\"mord 
mathcal\" style=\"margin-right:0.05834em;\">C</span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.4em;vertical-align:-0.95em;\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size3\">(</span></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-c\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.45em;\"><span style=\"top:-3.61em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">0</span></span></span><span style=\"top:-2.41em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\">A</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">m</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.95em;\"><span></span></span></span></span></span><span class=\"arraycolsep\" style=\"width:0.5em;\"></span><span class=\"arraycolsep\" style=\"width:0.5em;\"></span><span class=\"col-align-c\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.45em;\"><span style=\"top:-3.61em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord 
mathnormal\" style=\"margin-right:0.05017em;\">B</span></span></span><span style=\"top:-2.41em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\">A</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.02778em;\">r</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.95em;\"><span></span></span></span></span></span></span></span><span class=\"mclose delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size3\">)</span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:6.66em;vertical-align:-3.08em;\"></span><span class=\"minner\"><span class=\"mopen\"><span class=\"delimsizing mult\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:3.55em;\"><span style=\"top:-5.55em;\"><span class=\"pstrut\" style=\"height:8.6em;\"></span><span style=\"width:0.875em;height:6.600em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.875em\" height=\"6.600em\" viewbox=\"0 0 875 6600\"><path d=\"M863,9c0,-2,-2,-5,-6,-9c0,0,-17,0,-17,0c-12.7,0,-19.3,0.3,-20,1\nc-5.3,5.3,-10.3,11,-15,17c-242.7,294.7,-395.3,682,-458,1162c-21.3,163.3,-33.3,349,\n-36,557 
l0,3084c0.2,6,0,26,0,60c2,159.3,10,310.7,24,454c53.3,528,210,\n949.7,470,1265c4.7,6,9.7,11.7,15,17c0.7,0.7,7,1,19,1c0,0,18,0,18,0c4,-4,6,-7,6,-9\nc0,-2.7,-3.3,-8.7,-10,-18c-135.3,-192.7,-235.5,-414.3,-300.5,-665c-65,-250.7,-102.5,\n-544.7,-112.5,-882c-2,-104,-3,-167,-3,-189\nl0,-3092c0,-162.7,5.7,-314,17,-454c20.7,-272,63.7,-513,129,-723c65.3,\n-210,155.3,-396.3,270,-559c6.7,-9.3,10,-15.3,10,-18z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:3.05em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-c\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:3.58em;\"><span style=\"top:-6.4275em;\"><span class=\"pstrut\" style=\"height:3.6875em;\"></span><span class=\"mord\"><span class=\"mord\">0</span></span></span><span style=\"top:-5.2275em;\"><span class=\"pstrut\" style=\"height:3.6875em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.0037em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">1</span><span class=\"mpunct mtight\">,</span><span class=\"mord mtight\">0</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-4.0275em;\"><span class=\"pstrut\" style=\"height:3.6875em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"msupsub\"><span 
class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.0037em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mpunct mtight\">,</span><span class=\"mord mtight\">0</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-2.1675em;\"><span class=\"pstrut\" style=\"height:3.6875em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord\">⋮</span><span class=\"mord rule\" style=\"border-right-width:0em;border-top-width:1.5em;bottom:0em;\"></span></span></span></span><span style=\"top:-0.9675em;\"><span class=\"pstrut\" style=\"height:3.6875em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\">a</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">n</span><span class=\"mpunct mtight\">,</span><span class=\"mord mtight\">0</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:3.08em;\"><span></span></span></span></span></span><span class=\"arraycolsep\" style=\"width:0.5em;\"></span><span class=\"arraycolsep\" style=\"width:0.5em;\"></span><span 
class=\"col-align-c\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:3.58em;\"><span style=\"top:-6.24em;\"><span class=\"pstrut\" style=\"height:3.5em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.05278em;\">β</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.0528em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-5.04em;\"><span class=\"pstrut\" style=\"height:3.5em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.0037em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">1</span><span class=\"mpunct mtight\">,</span><span class=\"mord mtight\">1</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-3.84em;\"><span class=\"pstrut\" style=\"height:3.5em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.0037em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mpunct mtight\">,</span><span class=\"mord mtight\">1</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-1.98em;\"><span class=\"pstrut\" style=\"height:3.5em;\"></span><span class=\"mord\"></span></span><span style=\"top:-0.78em;\"><span class=\"pstrut\" style=\"height:3.5em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.0037em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">n</span><span class=\"mpunct mtight\">,</span><span class=\"mord mtight\">1</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:3.08em;\"><span></span></span></span></span></span><span class=\"arraycolsep\" style=\"width:0.5em;\"></span><span class=\"arraycolsep\" style=\"width:0.5em;\"></span><span class=\"col-align-c\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:3.58em;\"><span style=\"top:-6.24em;\"><span class=\"pstrut\" 
style=\"height:3.5em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.05278em;\">β</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.0528em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-5.04em;\"><span class=\"pstrut\" style=\"height:3.5em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.0037em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">1</span><span class=\"mpunct mtight\">,</span><span class=\"mord mtight\">2</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-3.84em;\"><span class=\"pstrut\" style=\"height:3.5em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.0037em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing 
reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mpunct mtight\">,</span><span class=\"mord mtight\">2</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-1.98em;\"><span class=\"pstrut\" style=\"height:3.5em;\"></span><span class=\"mord\"></span></span><span style=\"top:-0.78em;\"><span class=\"pstrut\" style=\"height:3.5em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.0037em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">n</span><span class=\"mpunct mtight\">,</span><span class=\"mord mtight\">2</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:3.08em;\"><span></span></span></span></span></span><span class=\"arraycolsep\" style=\"width:0.5em;\"></span><span class=\"arraycolsep\" style=\"width:0.5em;\"></span><span class=\"col-align-c\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:3.58em;\"><span style=\"top:-6.24em;\"><span class=\"pstrut\" style=\"height:3.5em;\"></span><span class=\"mord\"><span class=\"minner\">⋯</span></span></span><span style=\"top:-5.04em;\"><span class=\"pstrut\" style=\"height:3.5em;\"></span><span 
class=\"mord\"><span class=\"minner\">⋯</span></span></span><span style=\"top:-3.84em;\"><span class=\"pstrut\" style=\"height:3.5em;\"></span><span class=\"mord\"><span class=\"minner\">⋯</span></span></span><span style=\"top:-1.98em;\"><span class=\"pstrut\" style=\"height:3.5em;\"></span><span class=\"mord\"><span class=\"minner\">⋱</span></span></span><span style=\"top:-0.78em;\"><span class=\"pstrut\" style=\"height:3.5em;\"></span><span class=\"mord\"><span class=\"minner\">⋯</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:3.08em;\"><span></span></span></span></span></span><span class=\"arraycolsep\" style=\"width:0.5em;\"></span><span class=\"arraycolsep\" style=\"width:0.5em;\"></span><span class=\"col-align-c\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:3.58em;\"><span style=\"top:-5.74em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.05278em;\">β</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:-0.0528em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">n</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-4.54em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span 
style=\"top:-2.55em;margin-left:-0.0037em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">1</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\">n</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-3.34em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.0037em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\">n</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-0.28em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:-0.0037em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">n</span><span class=\"mpunct mtight\">,</span><span class=\"mord 
mathnormal mtight\">n</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:3.08em;\"><span></span></span></span></span></span></span></span><span class=\"mclose\"><span class=\"delimsizing mult\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:3.55em;\"><span style=\"top:-5.55em;\"><span class=\"pstrut\" style=\"height:8.6em;\"></span><span style=\"width:0.875em;height:6.600em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.875em\" height=\"6.600em\" viewbox=\"0 0 875 6600\"><path d=\"M76,0c-16.7,0,-25,3,-25,9c0,2,2,6.3,6,13c21.3,28.7,42.3,60.3,\n63,95c96.7,156.7,172.8,332.5,228.5,527.5c55.7,195,92.8,416.5,111.5,664.5\nc11.3,139.3,17,290.7,17,454c0,28,1.7,43,3.3,45l0,3009\nc-3,4,-3.3,16.7,-3.3,38c0,162,-5.7,313.7,-17,455c-18.7,248,-55.8,469.3,-111.5,664\nc-55.7,194.7,-131.8,370.3,-228.5,527c-20.7,34.7,-41.7,66.3,-63,95c-2,3.3,-4,7,-6,11\nc0,7.3,5.7,11,17,11c0,0,11,0,11,0c9.3,0,14.3,-0.3,15,-1c5.3,-5.3,10.3,-11,15,-17\nc242.7,-294.7,395.3,-681.7,458,-1161c21.3,-164.7,33.3,-350.7,36,-558\nl0,-3144c-2,-159.3,-10,-310.7,-24,-454c-53.3,-528,-210,-949.7,\n-470,-1265c-4.7,-6,-9.7,-11.7,-15,-17c-0.7,-0.7,-6.7,-1,-18,-1z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:3.05em;\"><span></span></span></span></span></span></span></span></span></span></span></span></p>\n<p>For layer <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>t</mi></mrow><annotation encoding=\"application/x-tex\">t</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" 
style=\"height:0.6151em;\"></span><span class=\"mord mathnormal\">t</span></span></span></span>, let <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>τ</mi></mrow><annotation encoding=\"application/x-tex\">\\tau</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.1132em;\">τ</span></span></span></span> denote the Layer on the residual path. The single-layer recurrence can then be written as a matrix equation:</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mi mathvariant=\"normal\">H</mi><mrow><mi>t</mi><mo>+</mo><mn>1</mn></mrow></msub><mo>=</mo><msup><mi>B</mi><mi mathvariant=\"sans-serif\">T</mi></msup><mi>τ</mi><mo stretchy=\"false\">(</mo><msubsup><mi>A</mi><mi>m</mi><mi mathvariant=\"sans-serif\">T</mi></msubsup><msub><mi mathvariant=\"normal\">H</mi><mi>t</mi></msub><mo stretchy=\"false\">)</mo><mo>+</mo><msubsup><mi>A</mi><mi>r</mi><mi mathvariant=\"sans-serif\">T</mi></msubsup><msub><mi mathvariant=\"normal\">H</mi><mi>t</mi></msub></mrow><annotation encoding=\"application/x-tex\">\\mathrm{H}_{t+1} = B^{\\mathsf{T}}\\tau (A_m^\\mathsf{T} \\mathrm{H}_t) + A_r^{\\mathsf{T}} \\mathrm{H}_t\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8917em;vertical-align:-0.2083em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">H</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord 
mtight\"><span class=\"mord mathnormal mtight\">t</span><span class=\"mbin mtight\">+</span><span class=\"mord mtight\">1</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2083em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.1491em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.05017em;\">B</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8991em;\"><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathsf mtight\">T</span></span></span></span></span></span></span></span></span><span class=\"mord mathnormal\" style=\"margin-right:0.1132em;\">τ</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">A</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8991em;\"><span style=\"top:-2.453em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">m</span></span></span><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathsf mtight\">T</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.247em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span 
class=\"mord mathrm\">H</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.1461em;vertical-align:-0.247em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">A</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8991em;\"><span style=\"top:-2.453em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.02778em;\">r</span></span></span><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathsf mtight\">T</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.247em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathrm\">H</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" 
style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span></span></p>\n<h2><span id=\"stastic-hyper-connections\"> Static Hyper-Connections</span></h2>\n<p>In SHC, the weight matrix <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi mathvariant=\"script\">H</mi><mi mathvariant=\"script\">C</mi></mrow><annotation encoding=\"application/x-tex\">\\mathcal{HC}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord\"><span class=\"mord mathcal\" style=\"margin-right:0.00965em;\">H</span><span class=\"mord mathcal\" style=\"margin-right:0.05834em;\">C</span></span></span></span></span> is a fixed, learned parameter. Once training is finished, its entries do not change during inference.</p>\n<h2><span id=\"dynamic-hyper-connections\"> Dynamic Hyper-Connections</span></h2>\n<p>In DHC, the weight matrices depend on the input hidden matrix. This is not a form of online learning: the weights are computed dynamically as a function of the hidden matrix.</p>\n<p>For each input hidden matrix, DHC first applies the processing below and then outputs the parameters:</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mo fence=\"true\">{</mo><mtable rowspacing=\"0.36em\" columnalign=\"left left\" columnspacing=\"1em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><msub><mover accent=\"true\"><mi mathvariant=\"normal\">H</mi><mo stretchy=\"true\">‾</mo></mover><mi>t</mi></msub><mo>=</mo><mrow><mi mathvariant=\"normal\">L</mi><mi mathvariant=\"normal\">a</mi><mi mathvariant=\"normal\">y</mi><mi mathvariant=\"normal\">e</mi><mi mathvariant=\"normal\">r</mi><mi mathvariant=\"normal\">N</mi><mi 
mathvariant=\"normal\">o</mi><mi mathvariant=\"normal\">r</mi><mi mathvariant=\"normal\">m</mi></mrow><mo stretchy=\"false\">(</mo><msub><mi mathvariant=\"normal\">H</mi><mi>t</mi></msub><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mi mathvariant=\"script\">B</mi><mo stretchy=\"false\">(</mo><msub><mi mathvariant=\"normal\">H</mi><mi>t</mi></msub><mo stretchy=\"false\">)</mo><mo>=</mo><msub><mi>s</mi><mi>β</mi></msub><mo>∘</mo><mi>tanh</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><msub><mover accent=\"true\"><mi mathvariant=\"normal\">H</mi><mo stretchy=\"true\">‾</mo></mover><mi>t</mi></msub><msub><mi>W</mi><mi>β</mi></msub><msup><mo stretchy=\"false\">)</mo><mi mathvariant=\"sans-serif\">T</mi></msup><mo>+</mo><mi>B</mi></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><msub><mi mathvariant=\"script\">A</mi><mi>m</mi></msub><mo stretchy=\"false\">(</mo><msub><mi mathvariant=\"normal\">H</mi><mi>t</mi></msub><mo stretchy=\"false\">)</mo><mo>=</mo><msub><mi>s</mi><mi>m</mi></msub><mo>∘</mo><mi>tanh</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><msub><mover accent=\"true\"><mi mathvariant=\"normal\">H</mi><mo stretchy=\"true\">‾</mo></mover><mi>t</mi></msub><msub><mi>W</mi><mi>m</mi></msub><mo stretchy=\"false\">)</mo><mo>+</mo><msub><mi>A</mi><mi>m</mi></msub></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><msub><mi mathvariant=\"script\">A</mi><mi>r</mi></msub><mo stretchy=\"false\">(</mo><msub><mi mathvariant=\"normal\">H</mi><mi>t</mi></msub><mo stretchy=\"false\">)</mo><mo>=</mo><msub><mi>s</mi><mi>r</mi></msub><mo>∘</mo><mi>tanh</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><msub><mover accent=\"true\"><mi mathvariant=\"normal\">H</mi><mo stretchy=\"true\">‾</mo></mover><mi>t</mi></msub><msub><mi>W</mi><mi>r</mi></msub><mo 
stretchy=\"false\">)</mo><mo>+</mo><msub><mi>A</mi><mi>r</mi></msub></mrow></mstyle></mtd></mtr></mtable></mrow><annotation encoding=\"application/x-tex\">\\begin{dcases}\n\\overline{\\mathrm{H}}_t = \\mathrm{LayerNorm} (\\mathrm{H}_t) \\\\\n\\mathcal{B}(\\mathrm{H}_t) = s_\\beta\\circ \\tanh (\\overline{\\mathrm{H}}_tW_\\beta)^\\mathsf{T}+B\\\\\n\\mathcal{A}_m(\\mathrm{H}_t) = s_m\\circ \\tanh (\\overline{\\mathrm{H}}_t W_m) + A_m\\\\\n\\mathcal{A}_r(\\mathrm{H}_t) = s_r\\circ \\tanh (\\overline{\\mathrm{H}}_t W_r)+A_r\n\\end{dcases}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:5.76em;vertical-align:-2.63em;\"></span><span class=\"minner\"><span class=\"mopen\"><span class=\"delimsizing mult\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.95em;\"><span style=\"top:-1.6em;\"><span class=\"pstrut\" style=\"height:3.15em;\"></span><span class=\"delimsizinginner delim-size4\"><span>⎩</span></span></span><span style=\"top:-1.592em;\"><span class=\"pstrut\" style=\"height:3.15em;\"></span><span style=\"height:0.916em;width:0.8889em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.8889em\" height=\"0.916em\" style=\"width:0.8889em\" viewbox=\"0 0 888.89 916\" preserveaspectratio=\"xMinYMin\"><path d=\"M384 0 H504 V916 H384z M384 0 H504 V916 H384z\"/></svg></span></span><span style=\"top:-3.15em;\"><span class=\"pstrut\" style=\"height:3.15em;\"></span><span class=\"delimsizinginner delim-size4\"><span>⎨</span></span></span><span style=\"top:-4.292em;\"><span class=\"pstrut\" style=\"height:3.15em;\"></span><span style=\"height:0.916em;width:0.8889em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.8889em\" height=\"0.916em\" style=\"width:0.8889em\" viewbox=\"0 0 888.89 916\" preserveaspectratio=\"xMinYMin\"><path d=\"M384 0 H504 V916 H384z M384 0 H504 V916 H384z\"/></svg></span></span><span 
style=\"top:-5.2em;\"><span class=\"pstrut\" style=\"height:3.15em;\"></span><span class=\"delimsizinginner delim-size4\"><span>⎧</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.45em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-l\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:3.13em;\"><span style=\"top:-5.13em;\"><span class=\"pstrut\" style=\"height:3.008em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord overline\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8833em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">H</span></span></span><span style=\"top:-3.8033em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"overline-line\" style=\"border-bottom-width:0.04em;\"></span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">LayerNorm</span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathrm\">H</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span 
class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span></span></span><span style=\"top:-3.69em;\"><span class=\"pstrut\" style=\"height:3.008em;\"></span><span class=\"mord\"><span class=\"mord mathcal\" style=\"margin-right:0.03041em;\">B</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathrm\">H</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">s</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.05278em;\">β</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.2861em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">∘</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mop\">tanh</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord overline\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8833em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">H</span></span></span><span style=\"top:-3.8033em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"overline-line\" style=\"border-bottom-width:0.04em;\"></span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.05278em;\">β</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span><span class=\"mclose\"><span 
class=\"mclose\">)</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8991em;\"><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathsf mtight\">T</span></span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05017em;\">B</span></span></span><span style=\"top:-2.25em;\"><span class=\"pstrut\" style=\"height:3.008em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathcal\">A</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">m</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathrm\">H</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" 
style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">s</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">m</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">∘</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mop\">tanh</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord overline\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8833em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathrm\">H</span></span></span><span style=\"top:-3.8033em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"overline-line\" style=\"border-bottom-width:0.04em;\"></span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathnormal\" 
style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">m</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">A</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">m</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-0.81em;\"><span class=\"pstrut\" style=\"height:3.008em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathcal\">A</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.02778em;\">r</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span 
class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathrm\">H</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">s</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.02778em;\">r</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">∘</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mop\">tanh</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord overline\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8833em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord 
mathrm\">H</span></span></span><span style=\"top:-3.8033em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"overline-line\" style=\"border-bottom-width:0.04em;\"></span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.02778em;\">r</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">A</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" 
style=\"margin-right:0.02778em;\">r</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.63em;\"><span></span></span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<p>Compared with SHC, each &quot;output head&quot; is a local linear layer followed by a <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>tanh</mi><mo>⁡</mo></mrow><annotation encoding=\"application/x-tex\">\\tanh</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6944em;\"></span><span class=\"mop\">tanh</span></span></span></span> activation.</p>\n<p>In a single training run, DHC trains additional linear layers compared with SHC, but gains higher information density in exchange. These linear layers have roughly <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi mathvariant=\"script\">O</mi><mo stretchy=\"false\">(</mo><mi>d</mi><mi>n</mi><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">\\mathcal{O}(dn)</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathcal\" style=\"margin-right:0.02778em;\">O</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">d</span><span class=\"mord mathnormal\">n</span><span class=\"mclose\">)</span></span></span></span> parameters, so the added training cost is small. The nonlinear weights generated on the fly break through the expressive bottleneck of traditional residual connections, giving higher information utilization and more stable convergence.</p>\n<p>Experiments show that SHC performs close to DHC on small tasks, but on deeper networks and more complex tasks (such as image generation or pretraining large LLMs) DHC performs far better than SHC.</p>\n<p><img loading=\"lazy\" src=\"/picture/mHC/result.png\" alt=\"result\"></p>\n<h1><span 
id=\"manifold-constrained-hyper-connections\"> Manifold-Constrained Hyper-Connections</span></h1>\n",
            "tags": []
        },
        {
            "id": "https://yuuko.site/2026/05/21/CS/OS/%E5%9F%BA%E6%9C%AC%E8%A7%82%E5%BF%B5/",
            "url": "https://yuuko.site/2026/05/21/CS/OS/%E5%9F%BA%E6%9C%AC%E8%A7%82%E5%BF%B5/",
            "title": "操作系统的基本概念",
            "date_published": "2026-05-20T16:00:00.000Z",
            "content_html": "<link rel=\"stylesheet\" type=\"text/css\" href=\"https://cdn.jsdelivr.net/hint.css/2.4.1/hint.min.css\"><h1><span id=\"操作系统\"> 操作系统</span></h1>\n<p>操作系统是计算机将硬件系统和用户的连接，是计算机的<strong>Harness</strong>。计算机系统自下而上分为: 硬件、操作系统、应用与用户。操作系统负责<strong>管理硬件资源、为应用程序提供运行环境并充当硬件和用户之间的桥梁</strong>。</p>\n<h2><span id=\"操作系统的发展过程与分类\"> 操作系统的发展过程与分类</span></h2>\n<p>原始模式:手动操作阶段，人工通过穿孔纸带输入计算机。分别分为人工操作模式与脱机IO模式。脱机IO通过外围机读取数据到磁带再供CPU调入内存。外围机和高速磁带的读取速度远远高于手动输入，大幅提高CPU IO速度</p>\n<p>批处理方式: 基于脱机IO实现的自动连续处理，以提高CPU和系统资源的利用率。可分为</p>\n<ul>\n<li>单道批处理 --</li>\n<li>多道批处理</li>\n</ul>\n",
            "tags": []
        },
        {
            "id": "https://yuuko.site/2026/05/21/CS/%E8%AE%A1%E7%BB%84/Bus/",
            "url": "https://yuuko.site/2026/05/21/CS/%E8%AE%A1%E7%BB%84/Bus/",
            "title": "总线",
            "date_published": "2026-05-20T16:00:00.000Z",
            "content_html": "<link rel=\"stylesheet\" type=\"text/css\" href=\"https://cdn.jsdelivr.net/hint.css/2.4.1/hint.min.css\"><h1><span id=\"总线\"> 总线</span></h1>\n<p>总线分为CPU内的总线(片内总线)与CPU和其他功能部件之间的总线(系统总线)，I/O总线与通信总线。</p>\n<p>多总线硬件系统的系统总线通常可分为</p>\n<ul>\n<li>数据总线</li>\n<li>地址总线</li>\n<li>控制总线</li>\n</ul>\n<p>I/O 总线是CPU与内部I/O设备的总线结构，如显卡、网卡等，通常使用标准化的内部总线协议</p>\n<p>通信总线是CPU与外部I/O设备或者其他计算机通信的总线结构，常见协议有USB、RS-232、RS-485等</p>\n<h2><span id=\"总线结构\"> 总线结构</span></h2>\n<p>总线结构发展历程经历三种结构</p>\n<ul>\n<li>早期共享总线结构 -- 通过单一系统总线串联所有设备，传输冲突严重</li>\n<li>三总线结构 -- 将总线在物理结构上增加到为三条总线，系统总线、IO总线与DMA总线。传统总线仍然需要承载所有设备通信，具有共享式、广播式的特点，仍然限制了并发和拓展能力</li>\n<li>南北桥结构 -- 通过南桥/北桥芯片分离内部和外部的控制功能，北桥负责高速内部通信，如显卡、内存通信；南桥负责外部通信，如USB设备、磁盘与其他外设。南北桥之间通过桥间总线进行通信，北桥与CPU之间通过 <strong>前端总线(FSB)</strong> 进行通信。南北桥结构仍然因为FSB的吞吐影响了系统的并发能力。</li>\n<li>现代集成通信结构 -- 传统北桥被集成在CPU内作为<strong>片上集成</strong>，CPU与其他外部设备进行<strong>点对点互联</strong>。多核CPU核间、多CPU结构的CPU之间使用QPI高速串行链路通信。高速IO设备直接通过PCIe与CPU相联，低速设备采用集成度更高的南桥芯片PCH管理</li>\n</ul>\n<h2><span id=\"总线的性能指标\"> 总线的性能指标</span></h2>\n<ul>\n<li>总线传输周期 -- 完成一次完整总线事务的总时间，简称总线周期</li>\n<li>总线时钟频率 -- 总线周期的倒数表示总线事务信号的频率。现代总线事务的时钟独立于CPU时钟</li>\n<li>总线工作频率 -- 总线有效数据传输的频率</li>\n<li>总线宽度 -- 总线中数据线条数，标志总线并行数量</li>\n<li>总线带宽 -- 总线总传递速度，总线带宽 = 总线宽度 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mo>×</mo></mrow><annotation encoding=\"application/x-tex\">\\times</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6667em;vertical-align:-0.0833em;\"></span><span class=\"mord\">×</span></span></span></span> 总线频率</li>\n<li>总线复用 -- 地址总线/数据总线复用以减小引脚总数</li>\n<li>总线寻址能力 -- <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>n</mi></mrow><annotation encoding=\"application/x-tex\">n</annotation></semantics></math></span><span class=\"katex-html\" 
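aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\">n</span></span></span></span> address lines can address <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msup><mn>2</mn><mi>n</mi></msup></mrow><annotation encoding=\"application/x-tex\">2^n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6644em;\"></span><span class=\"mord\"><span class=\"mord\">2</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6644em;\"><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">n</span></span></span></span></span></span></span></span></span></span></span> storage units</li>\n</ul>\n<h2><span id=\"总线事务\"> Bus Transactions</span></h2>\n<p>A bus transaction is one complete information exchange between a master device and a slave device. It consists of three basic phases:</p>\n<ul>\n<li>Address phase -- the master sends the target address and operation type to the slave</li>\n<li>Slave response phase (data preparation phase) -- the slave prepares the data</li>\n<li>Data transfer phase -- the data is transferred over the bus</li>\n</ul>\n<p>Bus transfers fall into two modes:</p>\n<ul>\n<li>Non-burst mode -- each transfer moves one bus-width data unit and goes through all three basic phases</li>\n<li>Burst mode -- data at consecutive addresses is transferred back to back; the address lines carry only the starting address, and hardware generates the following addresses by auto-increment, which removes the time cost of sending each address</li>\n</ul>\n<p>Physically, bus communication is either serial or parallel. Parallel communication sends data over multiple data lines at the same time, but it suffers from crosstalk and timing skew, so at high frequencies its practical throughput falls below that of serial communication.</p>\n<p>Serial communication is divided into:</p>\n<ul>\n<li>Synchronous serial -- the sender's clock drives the receiver's clock so that the two stay strictly in step, and markers are added only at the boundaries of a data block. The hardware is more complex and more costly</li>\n<li>Asynchronous serial -- sender and receiver use independent clocks; each data unit begins with a start bit of 0 and ends with a stop bit of 1, and can carry a parity bit for error checking.</li>\n</ul>\n<h2><span id=\"总线定时\"> Bus Timing</span></h2>\n<p>Bus timing schemes are divided into:</p>\n<ul>\n<li>Synchronous timing -- master and slave share a single clock, and every operation completes within a fixed period</li>\n<li>Asynchronous timing -- timing is controlled by handshake signals between master and slave, in three variants: non-interlocked (master-slave), half-interlocked (master-slave-master), and fully interlocked (master-slave-master-slave). 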
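Asynchronous timing lets the bus cycle length vary and reliably connects devices whose operating speeds differ widely</li>\n<li>Semi-synchronous timing -- controlled through a <code>Wait</code> signal line. The master samples <code>Wait</code> on the rising clock edge: while <code>Wait</code> is high, the data is not ready and the transfer is held; the data can be read only once <code>Wait</code> goes low</li>\n<li>Split-transaction timing -- a bus transaction is split into a request phase and a response phase. In the request phase, the master acquires the bus, sends the address and command, and then releases the bus; in the response phase, the slave acquires the bus and replies. This scheme can handle contention among multiple masters, but the bus arbitration logic is more complex and the protocol overhead is higher.</li>\n</ul>\n",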
            "tags": [
                "硬件",
                "计组"
            ]
        }
    ]
}