MiniMax Sparse Attention (MSA): a Two-Branch Block-Sparse Attention Trained on a 109B-Parameter MoE With a 3T-Token Budget

MiniMax has launched MSA (MiniMax Sparse Attention), a sparse attention method built…

AllTopicsToday