Aleph Alpha's Pharia-1-LLM-7B Models Revolutionize Multilingual AI for Domain-Specific Tasks

German artificial intelligence startup Aleph Alpha has introduced a new foundation model family: the 7-billion-parameter (7B) large language model (LLM) Pharia-1-LLM-7B-control and its variant Pharia-1-LLM-7B-control-aligned. These models are designed to deliver concise, length-controlled responses and are optimized for German, French, and Spanish.

They are particularly suited for domain-specific applications in the automotive and engineering industries. The Pharia-1-LLM-7B-control-aligned version includes additional safety measures, and both models are available under the Open Aleph License for non-commercial research and educational use.

Model Architecture and Hyperparameters

Several ablations conducted on a 1B-parameter model guided the architecture and hyperparameter choices, with evaluations on benchmarks such as Lambada, TriviaQA, HellaSwag, and others. The initial hyperparameter search used a smaller proxy model, with the results upscaled to 1B parameters using maximal update parametrization (MuP). Although MuP was also intended for the 7B scale, training instabilities were encountered, leading to its abandonment in favor of heuristics similar to those used for Meta AI's Llama 2.
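
To make the role of MuP more concrete, the sketch below illustrates the core idea behind MuP-style hyperparameter transfer for Adam-type optimizers: learning rates for hidden (matrix-shaped) parameters shrink with the width multiplier, while vector-like parameters keep the base rate. The widths and the layer classification are hypothetical, and the sketch omits MuP's initialization and output multipliers; it is not the parametrization actually used for Pharia-1.

```python
import torch

def mup_style_param_groups(model, base_lr=1e-3, base_width=256, width=2048):
    """Rough illustration of MuP-style learning-rate transfer for Adam-type
    optimizers: hidden (matrix-shaped) weights have their LR divided by the
    width multiplier, while embeddings, biases, and norm parameters keep the
    base LR. Widths and the layer split are hypothetical placeholders."""
    width_mult = width / base_width
    hidden, vector_like = [], []
    for name, param in model.named_parameters():
        if param.ndim >= 2 and "embed" not in name:
            hidden.append(param)
        else:
            vector_like.append(param)
    return [
        {"params": hidden, "lr": base_lr / width_mult},
        {"params": vector_like, "lr": base_lr},
    ]

# Usage sketch: optimizer = torch.optim.AdamW(mup_style_param_groups(model))
```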

When the classical generative pre-trained transformer (GPT) architecture was compared with Llama 2, both performed similarly, though the GPT architecture showed an edge on TriviaQA, leading to its selection for the Pharia-1-LLM-7B models. Grouped-query attention (GQA) was introduced to improve inference-time performance, with a 1/9 key-value-to-query (kv:q) head ratio providing significant memory and throughput benefits without degrading quality. A larger base for the rotary embeddings and a Unigram tokenizer with a 128,000-token vocabulary were selected based on better downstream performance.
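
As a rough illustration of the attention change, the snippet below sketches grouped-query attention in plain PyTorch, where a small number of key-value heads is shared across a larger group of query heads. The head counts (36 query heads, 4 KV heads) and hidden size are hypothetical values chosen only to reproduce the 1/9 kv:q ratio; the actual Pharia-1 dimensions are not given in the article.

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(x, w_q, w_k, w_v, n_q_heads=36, n_kv_heads=4):
    """Minimal grouped-query attention: each of the n_kv_heads key-value heads
    is shared by a group of query heads (here a hypothetical 4:36 = 1/9 ratio),
    shrinking the KV cache relative to full multi-head attention."""
    b, t, d = x.shape
    head_dim = d // n_q_heads
    q = (x @ w_q).view(b, t, n_q_heads, head_dim).transpose(1, 2)
    k = (x @ w_k).view(b, t, n_kv_heads, head_dim).transpose(1, 2)
    v = (x @ w_v).view(b, t, n_kv_heads, head_dim).transpose(1, 2)
    group = n_q_heads // n_kv_heads
    k = k.repeat_interleave(group, dim=1)  # expand KV heads to match query heads
    v = v.repeat_interleave(group, dim=1)
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    return out.transpose(1, 2).reshape(b, t, d)

# Hypothetical sizes with a 1/9 kv:q head ratio (rotary embeddings omitted).
d, n_q, n_kv = 4608, 36, 4
x = torch.randn(2, 16, d)
w_q = torch.randn(d, d) * 0.02
w_k = torch.randn(d, n_kv * d // n_q) * 0.02
w_v = torch.randn(d, n_kv * d // n_q) * 0.02
out = grouped_query_attention(x, w_q, w_k, w_v, n_q, n_kv)
```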

Weight decay and learning-rate decay were also optimized: a weight decay of 1e-1, combined with decaying the learning rate to zero, yielded the best results.
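
A minimal PyTorch sketch of that optimizer setup is shown below. The peak learning rate, step count, and the cosine shape of the decay-to-zero are assumptions made for illustration; the article only specifies the 1e-1 weight decay and that the learning rate is decayed to zero.

```python
import torch

model = torch.nn.Linear(512, 512)          # stand-in for the transformer
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=1e-1)

total_steps = 10_000                       # illustrative; the real run is far longer
# Decay the learning rate all the way to zero over training (cosine shape assumed).
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=total_steps, eta_min=0.0)
# Call scheduler.step() once per optimizer step during training.
```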

Pre-training the Model

The Pharia-1-LLM-7B base model was trained using the Scaling codebase, known for its parallelization capabilities and performance optimizations. Training employed the bfloat16 format with a standard mixed-precision strategy, maintaining master copies of the weights and optimizer states in full precision and sharding these full-precision tensors across data-parallel workers using zero redundancy optimizer (ZeRO) stage 1.
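
Aleph Alpha trains with its own Scaling codebase, but the same ingredients can be approximated in plain PyTorch, as sketched below: bfloat16 autocast for the forward pass (activations in bf16 while master weights and optimizer states stay in full precision) and ZeroRedundancyOptimizer for ZeRO-stage-1-style sharding of optimizer state across data-parallel ranks. The model, loss, and hyperparameters are placeholders, and the snippet assumes torch.distributed has already been initialized.

```python
import torch
from torch.distributed.optim import ZeroRedundancyOptimizer

model = torch.nn.Linear(1024, 1024).cuda()   # placeholder for the 7B transformer

# ZeRO-stage-1-style sharding: optimizer states are partitioned across
# data-parallel ranks instead of being replicated on every worker.
optimizer = ZeroRedundancyOptimizer(
    model.parameters(),
    optimizer_class=torch.optim.AdamW,
    lr=3e-4,
    weight_decay=1e-1,
)

def train_step(inputs, targets):
    # bfloat16 mixed precision: compute in bf16, keep fp32 master weights.
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = torch.nn.functional.mse_loss(model(inputs), targets)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```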

The pre-training was conducted with a sequence length of 8192 tokens to establish baseline long-context abilities. To counter early instabilities observed when scaling sequence length, a warm-up strategy was implemented, gradually increasing from 512 to 2048 and finally to 8192 tokens over several thousand steps. The training process was executed with a global batch size of 1024, spanning 4.7 trillion tokens, covering a single epoch of the initial pre-training dataset.
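
A simple way to express such a sequence-length warm-up is as a step-indexed schedule, as in the sketch below; the step thresholds are hypothetical, since the article only states that the length grows from 512 to 2048 to 8192 tokens over several thousand steps.

```python
def sequence_length_at(step, schedule=((0, 512), (2_000, 2_048), (6_000, 8_192))):
    """Return the training sequence length for a given step.
    The (start_step, seq_len) thresholds are hypothetical placeholders."""
    length = schedule[0][1]
    for start_step, seq_len in schedule:
        if step >= start_step:
            length = seq_len
    return length

assert sequence_length_at(0) == 512
assert sequence_length_at(3_000) == 2_048
assert sequence_length_at(10_000) == 8_192
```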

Subsequently, a second epoch was trained on a different data mix, incorporating recently accessible high-quality English data while retaining the model's multilingual capabilities. This phase covered an additional 3 trillion tokens. The learning rate, which had been decayed to zero at the end of the first pre-training phase, was warmed up over 2,000 iterations to 3e-5 and then decayed to 3e-6 following a cosine schedule.
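
This second-phase schedule can be sketched as linear warm-up followed by cosine decay to a floor, as below. The total step count is a placeholder; only the 2,000-step warm-up, the 3e-5 peak, the 3e-6 floor, and the cosine shape come from the description above.

```python
import math

def second_phase_lr(step, total_steps, peak_lr=3e-5, final_lr=3e-6, warmup_steps=2_000):
    """Linear warm-up to peak_lr over warmup_steps, then cosine decay to final_lr."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return final_lr + 0.5 * (peak_lr - final_lr) * (1 + math.cos(math.pi * progress))

# Peak right after warm-up, floor at the end of the phase.
assert abs(second_phase_lr(1_999, 100_000) - 3e-5) < 1e-9
assert abs(second_phase_lr(100_000, 100_000) - 3e-6) < 1e-9
```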

In total, the Pharia-1-LLM-7B base model was trained on 7.7 trillion tokens, using 256 A100 graphics processing units (GPUs) for the first phase and 256 H100 GPUs for the second. Memory-reduction techniques removed the need for activation checkpointing, resulting in efficient step durations and high model throughput.

Fine-Tuning and Model Variants

Pharia-1-LLM-7B-control was optimized for instruction following using full-model fine-tuning and a curriculum strategy that trained on a blend of instruction datasets, including proprietary and multilingual data in English, German, Spanish, and French. Fine-tuning placed a focus on data minimization and anonymization.

Two variants were developed: Pharia-1-LLM-7B-control and Pharia-1-LLM-7B-control-aligned. The latter includes preference alignment and safety training, making it well suited for conversational applications where clarity and safety are important. This version uses a Kahneman-Tversky Optimization (KTO) alignment process, though it sometimes results in more verbose, generic responses. The control-aligned model is suited for chatbots and virtual assistants, while the non-aligned control model excels in tasks requiring direct, concise outputs, such as extraction and summarization.
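
For context, the sketch below shows the per-example KTO objective roughly as described by Ethayarajh et al. (2024): responses labelled desirable are pushed above a batch-level KL reference point, undesirable ones below it. This is a generic illustration with placeholder hyperparameters (beta and the lambda weights), not Aleph Alpha's implementation.

```python
import torch

def kto_loss(policy_logps, ref_logps, desirable, kl_ref, beta=0.1,
             lambda_d=1.0, lambda_u=1.0):
    """Per-example KTO loss sketch (after Ethayarajh et al., 2024).

    policy_logps / ref_logps: summed log-probabilities of each response under
    the policy and the frozen reference model; desirable: boolean tensor marking
    responses labelled as good; kl_ref: batch-level KL reference point (z0).
    beta and the lambda weights are placeholder values."""
    reward = policy_logps - ref_logps                      # r = log(pi / pi_ref)
    value_good = lambda_d * torch.sigmoid(beta * (reward - kl_ref))
    value_bad = lambda_u * torch.sigmoid(beta * (kl_ref - reward))
    losses = torch.where(desirable, lambda_d - value_good, lambda_u - value_bad)
    return losses.mean()

# Toy usage with made-up log-probabilities and labels.
policy_logps = torch.tensor([-12.0, -30.0])
ref_logps = torch.tensor([-15.0, -25.0])
desirable = torch.tensor([True, False])
loss = kto_loss(policy_logps, ref_logps, desirable, kl_ref=torch.tensor(0.0))
```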

Performance Evaluation

Evaluating generative AI (GenAI) models is challenging due to the inherent ambiguity in language, which complicates the creation of standardized metrics. Unlike other AI domains with clear metrics, language models can produce outputs subject to multiple interpretations, making evaluation particularly complex. Human annotators often prioritize assertiveness and length over factuality, affecting evaluation outcomes.

Additionally, evaluation scores can be unstable, as they are sensitive to model architecture and training details as well as to evaluation choices such as prompt composition and metric selection. Evaluation data might also leak into training datasets, leading to overfitting and skewed results. Many GenAI evaluation tasks do not align well with real-world scenarios, leading to discrepancies between benchmark scores and practical performance.

For instance, tasks in benchmarks like massive multi-task language understanding (MMLU) and Alpaca Eval may not accurately reflect real-world use cases, complicating the assessment of model usefulness. Evaluations of models like Pharia-1-LLM-7B-control and Pharia-1-LLM-7B-control-aligned against other multilingual models highlight these challenges.

Conclusion

In conclusion, the Pharia-1-LLM-7B models, including the control and control-aligned variants, are advanced language models optimized for multilingual instruction and domain-specific applications. The control-aligned version incorporates safety and preference alignment, making it suitable for conversational tasks. Despite challenges in evaluating generative AI due to language ambiguity and benchmark misalignments, these models offer significant advancements for research and educational purposes under the Open Aleph License.

Written by Soham Nandi

Soham Nandi is a technical writer based in Memari, India. His academic background is in Computer Science Engineering, specializing in Artificial Intelligence and Machine learning. He has extensive experience in Data Analytics, Machine Learning, and Python. He has worked on group projects that required the implementation of Computer Vision, Image Classification, and App Development.
