songsenand
  • Joined on 2024-01-17
songsenand pushed to main at songsenand/SUInput 2026-02-20 23:30:44 +08:00
17324ffa10 Fix loss calculation logic for the initial step
songsenand pushed to main at songsenand/SUInput 2026-02-20 23:28:47 +08:00
4560a9ed06 Remove the global_step auto-increment logic and move it to the end of the loop
songsenand pushed to main at songsenand/SUInput 2026-02-20 23:21:46 +08:00
558d7f9fc9 Adjust model structure and parameters to improve performance
songsenand pushed to main at songsenand/SUInput 2026-02-16 10:27:01 +08:00
ae414bae6b feat(trainer): add residual blocks to increase model expressiveness
songsenand pushed to main at songsenand/SUInput 2026-02-15 23:01:58 +08:00
ab2dbc378b Fix loss weight calculation logic; correct the number of square-root applications to improve stability
songsenand pushed to main at songsenand/SUInput 2026-02-15 21:51:39 +08:00
cd25349d90 Delete old MoE model files
songsenand pushed to main at songsenand/SUInput 2026-02-15 01:48:50 +08:00
0d529c0c89 Adjust loss weight calculation and improve the training loop termination condition
songsenand pushed to main at songsenand/SUInput 2026-02-15 01:07:06 +08:00
94b44e6f71 Add loss weight support and refactor the structure of some modules
songsenand pushed to main at songsenand/SUInput 2026-02-15 00:25:48 +08:00
515f261824 Fix model loading to use the correct instance method for loading the state dict
songsenand pushed to main at songsenand/SUInput 2026-02-15 00:08:59 +08:00
fd913748ca Adjust dropout probabilities of the residual blocks and classification head, and add a residual module to the MoE model
songsenand pushed to main at songsenand/SUInput 2026-02-14 23:34:41 +08:00
e91f823d65 feat: optimize model input handling and expert count; improve training and inference compatibility
songsenand pushed to main at songsenand/SUInput 2026-02-14 17:07:42 +08:00
9fad2bf1d4 Fix loss calculation by using NLLLoss in place of the original criterion
songsenand pushed to main at songsenand/SUInput 2026-02-14 15:51:11 +08:00
f89635b201 Add package-data configuration to include extra data files for the trainer and suinput modules
songsenand pushed to main at songsenand/SUInput 2026-02-14 15:43:31 +08:00
d60997438e Update evaluation dataset sample file
songsenand pushed to main at songsenand/SUInput 2026-02-14 15:29:35 +08:00
b68f75b09d Fix char_info.pinyin access to use dictionary-style lookup for compatibility
songsenand pushed to main at songsenand/SUInput 2026-02-14 15:27:16 +08:00
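d2d65c7efa Adjust import order and fix pickle saving logic
134c8a09cf feat: refactor the pinyin input dataset and MoE model structure; optimize expert network configuration and evaluation logic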
songsenand pushed to main at songsenand/SUInput 2026-02-13 16:13:21 +08:00
7eb00c6207 feat(model): optimize expert output structure and add expert bias support
songsenand pushed to main at songsenand/SUInput 2026-02-13 15:44:52 +08:00
f4be47df78 feat(trainer): use hidden_size instead of d_model to compute the output dimension and add a pooling layer
songsenand pushed to main at songsenand/SUInput 2026-02-13 14:20:15 +08:00
d82c80f3a9 Fix classification head output dimension to use d_model instead of hidden_size
songsenand pushed to main at songsenand/SUInput 2026-02-13 14:15:51 +08:00
6923870171 Fix output dimension calculation error by using d_model instead of input_dim