AIs collaborate and get better results via 'Co-LLM' developed by MIT researchers

Have you ever thought of combining the might of two AIs so you can get the best out of each of them? Well, that’s what researchers at MIT have tried to do by creating a new algorithm called Co-LLM.

Co-LLM pairs a general-purpose large language model (LLM) with an “assistant model”, an LLM specialized in a particular field, and helps them work together to produce more accurate results.

How is the approach useful?

The authors said this collaborative decoding is particularly useful in cross-domain settings, where a generalist base LLM learns to invoke domain-expert models. On instruction-following, domain-specific QA, and reasoning tasks, they showed that the joint system’s performance exceeded that of the individual models.

The research also indicated that generations produced by the base model alone can contain factual mistakes; Co-LLM, however, learns to call on the “assistant model” at exactly those positions to produce correct generations.

The authors also noted that their results suggest the “method can effectively combine the best of the models and achieve better performance than the ‘sum’ of their parts.”

Which LLMs were used?

The research used several LLMs, including LLAMA-7B as the base model and MEDITRON-70B as the “assistant” LLM. LLAMA-7B, developed by tech giant Meta Platforms, is a general-purpose model trained on one trillion tokens (words/characters), while MEDITRON, built on Meta’s Llama 2, caters to the medical field and is aimed at assisting with clinical decision-making and diagnosis.

The researchers also used LLEMMA, a model mainly designed to solve mathematics problems, developed by several researchers together with the research lab EleutherAI. To demonstrate that Co-LLM works across architectures, the authors also used MISTRAL models, developed by French startup Mistral AI.

How does Co-LLM work?

The researchers proposed a method to teach multiple LLMs to team up by interleaving their generations at the token level.

“By optimizing the marginal likelihood of a training set under our latent variable model, the base LLM automatically learns when to generate itself and when to call on one of the “assistant” language models to generate, all without direct supervision,” stated the research paper by Shannon Zejiang Shen, Hunter Lang, Bailin Wang, Yoon Kim, and David Sontag at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL).
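
In symbols, the idea can be sketched roughly as follows (the notation here is illustrative, not copied from the paper): with a binary latent variable $z_t$ marking whether the base model or the assistant produces token $t$, training maximizes a marginal likelihood of the form

$$
\log p(y \mid x) = \sum_{t} \log \sum_{z_t \in \{0,1\}} p(z_t \mid x, y_{<t}) \, p_{z_t}(y_t \mid x, y_{<t}),
$$

where $p_0$ is the base LLM, $p_1$ is the assistant, and $p(z_t \mid \cdot)$ is the learned per-token decision of when to defer. Because $z_t$ is never observed in the training data, the base model learns these deferral decisions without direct supervision.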

The authors noted that token-level collaboration during decoding lets the models fuse their expertise for the specific task at hand.

“We empirically show that Co-LLM can produce generations of better quality compared to using either of the models alone, in tasks ranging from math reasoning, medical question answering, and instruction following,” said the researchers.
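
As a rough illustration of what token-level interleaving looks like at decoding time, here is a minimal sketch. The model functions and the deferral rule below are stand-ins invented for the example; they are not the authors’ code or any real model API.

```python
# Toy sketch of Co-LLM-style token interleaving: a base model generates
# most tokens, and a per-token gate decides when to defer to an assistant.
# Everything here is a placeholder to show the control flow only.

def base_next_token(prefix):
    # Stand-in for the general-purpose base LLM's next-token prediction.
    return "<base>"

def assistant_next_token(prefix):
    # Stand-in for the domain-expert assistant LLM's next-token prediction.
    return "<assistant>"

def deferral_probability(prefix):
    # Stand-in for the learned per-token deferral decision; in Co-LLM this
    # is trained, here we simply defer on every third position.
    return 1.0 if len(prefix) % 3 == 2 else 0.0

def co_decode(prompt_tokens, max_new_tokens=9, threshold=0.5):
    """Interleave generations: at each step, either the base model or the
    assistant contributes the next token, depending on the gate."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        if deferral_probability(tokens) > threshold:
            tokens.append(assistant_next_token(tokens))  # expert fills in
        else:
            tokens.append(base_next_token(tokens))       # base continues
    return tokens

print(co_decode(["Example", "prompt"]))
```

The point of the sketch is the control flow: the assistant is queried only at positions where the gate fires, which is why the approach can make fewer calls to the large model than methods that consult it at every decoding step.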

Examples

The authors noted that Co-LLM enabled collaboration between LLAMA and domain-specific models, leading to better performance than either model achieved on its own.

Using LLEMMA as an assistant led to improved performance on math and reasoning tasks. Similarly, cooperation with MEDITRON models led to performance gains on some BioASQ subtasks (e.g. List, Summary), and the combined system outperformed fine-tuned LLAMA-7B, fine-tuned LLAMA-70B, and base MEDITRON-70B on average, according to the research paper.

BioASQ is a dataset for biomedical questions and answers.

Co-LLM versus other LLM collaboration approaches?

The authors compared Co-LLM with the “Proxy Tuning” approach.

Co-LLM could guide two models trained differently to work together, while “Proxy Tuning” only performed well when all its component models were pre-trained on the same domain mix.

“Our results show that Co-LLM is more effective at enabling collaboration between models from different domains. PT also requires more calls to the larger model, thus resulting in slower inference. Co-LLM makes fewer calls to both large and small models,” the researchers stated.

LLM team-up across architectures?

The researchers said that Co-LLM can easily enable collaboration between models of different architectures. As an example, they noted that Co-LLM can pair a dense model (MISTRAL-7B) with a sparse mixture-of-experts (MoE) model (MIXTRAL-8×7B), and the joint model achieves strong accuracy gains over either the fine-tuned MISTRAL-7B model or the MIXTRAL-8×7B model.


Written by

Ravikash Bakolia