Framework for Robust and Explainable Automated Large Language Model Selection
Acronym
AutoLLMSelect
Type
research
Duration
2025 - 2027
Content
Large Language Models (LLMs) are increasingly becoming part of academic and industrial processes due to their capacity to solve a wide range of problems across domains. However, an open question remains: given the multitude of LLMs available, how can the most appropriate LLM be selected for a specific supervised machine learning (ML) problem (with or without fine-tuning) without evaluating a large portfolio of LLMs on the labelled dataset of that problem? Evaluating a large LLM portfolio across multiple criteria introduces a high computational cost, which translates into a negative environmental impact, particularly increased carbon emissions. This proposal aims to (1) publish a comprehensive analysis of LLM benchmark datasets that would facilitate robust and unbiased LLM benchmarking, (2) take the first steps towards a robust, explainable, and evolving framework for automated LLM selection based on a multi-disciplinary approach that would reduce the cost of comparing a large LLM portfolio on ML datasets, and (3) evaluate the applicability of the framework on a use case from the field of sustainable development. Due to the high complexity of the problem, the proposal will present a proof of concept on a selected LLM portfolio, dataset portfolio, and set of performance metrics, based on data available in public benchmarks. The framework is designed to evolve and can be extended in the future with new LLMs, benchmark datasets, ML tasks, and performance metrics, by both our team and the wider community.
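Illustration (not the project's actual method): the sketch below shows, under stated assumptions, one common way such a selection framework can be framed, namely meta-learning over existing benchmark results. Dataset meta-features and historical benchmark scores (both synthetic here) train one performance predictor per LLM; for a new dataset, the LLM with the highest predicted score is recommended without evaluating the full portfolio. All names (llm_a, meta_feature_names, select_llm) are hypothetical placeholders, and scikit-learn is assumed to be available.

# Minimal meta-learning sketch for automated LLM selection (illustrative only).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)

# Hypothetical LLM portfolio and dataset meta-features (placeholder names).
llm_portfolio = ["llm_a", "llm_b", "llm_c"]
meta_feature_names = ["n_examples_log", "avg_text_length", "n_classes", "label_entropy"]

# Synthetic "benchmark history": meta-features of past datasets and the
# observed score of each LLM on them (stand-in for public benchmark data).
n_past_datasets = 60
X_meta = rng.random((n_past_datasets, len(meta_feature_names)))
scores = {name: rng.random(n_past_datasets) for name in llm_portfolio}

# Train one performance-prediction model per LLM in the portfolio.
predictors = {}
for name in llm_portfolio:
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(X_meta, scores[name])
    predictors[name] = model

def select_llm(new_dataset_meta_features: np.ndarray) -> str:
    """Recommend the LLM with the highest predicted score for a new dataset."""
    predicted = {
        name: float(model.predict(new_dataset_meta_features.reshape(1, -1))[0])
        for name, model in predictors.items()
    }
    return max(predicted, key=predicted.get)

# Example: meta-features of an unseen dataset (synthetic values).
new_meta = rng.random(len(meta_feature_names))
print("Recommended LLM:", select_llm(new_meta))

In the actual framework, the synthetic history would be replaced by curated public benchmark data, and the single regressor per LLM could be replaced by multi-criteria and explainable models; the sketch only conveys the cost-saving idea of predicting rather than measuring portfolio performance.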
Funding
HE-MSCA-2024-PF-01