Scaling Linear Mode Connectivity and Merging to Billion Parameter Pretrained Transformers

This is a companion discussion topic for the original entry at https://arxiv.org/abs/2606.23607