Tencent’s Hunyuan-Large AI Model Sets New Benchmark with 389 Billion Parameters

Tencent’s Hunyuan-Large, an AI model with 389 billion parameters, not only breaks performance records but also sets the stage for the next generation of scalable and efficient machine learning systems.

The four-step process of data synthesis in Hunyuan-Large’s pre-training: (1) instruction generation, (2) instruction evolution, (3) response generation, and (4) response filtering. Paper: Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent

*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as definitive, used to guide development decisions, or treated as established information in the field of artificial intelligence research.

A recent paper posted on the arXiv preprint server* introduced "Hunyuan-Large," a novel Transformer-based mixture of experts (MoE) model developed by Tencent. This open-source model is the largest of its kind, featuring 389 billion total parameters, of which 52 billion are activated per token, marking a significant advancement in artificial intelligence (AI).

Advancements in Large Language Models

Large language models (LLMs) have transformed AI, especially in natural language processing (NLP), computer vision (CV), and speech recognition. The success of models such as the chat generative pre-trained transformer (ChatGPT) has spurred the development of increasingly powerful LLMs that enable new ways of processing information. However, traditional dense architectures face challenges in scalability and efficiency, particularly in training speed and resource use.

MoE models have emerged as a promising solution: for each input, only a subset of specialized submodels (experts) is activated during inference, improving efficiency without compromising performance. This architecture handles complex tasks more effectively by allowing the model to draw on expert knowledge in specific domains.
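
To make the sparse-activation idea concrete, the following minimal PyTorch sketch shows a toy MoE layer in which a router scores every expert for each token but only the top-k experts are actually evaluated. The layer sizes, expert count, and routing details are illustrative placeholders, not Hunyuan-Large’s published architecture (which also adds a shared expert, discussed below).

```python
# Minimal sketch of a top-k mixture-of-experts (MoE) layer. Sizes, expert count,
# and routing details are illustrative placeholders, not Hunyuan-Large's design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, num_experts=8, top_k=1):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)  # scores every expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )
        self.top_k = top_k

    def forward(self, x):  # x: (num_tokens, d_model)
        weights, chosen = self.router(x).topk(self.top_k, dim=-1)  # keep only top-k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e  # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out  # only the selected experts ran for each token

# Example: route 4 tokens through the layer; each token activates just 1 of 8 experts.
# y = TinyMoELayer()(torch.randn(4, 512))
```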

Despite their potential, few large-scale open-source MoE models have been developed, limiting their widespread use. This study aims to address this gap by introducing Hunyuan-Large, which not only scales MoE architectures but also improves performance through innovative techniques.

Hunyuan-Large: An Open-Source MoE Model

In this paper, the authors presented Hunyuan-Large, which combines the classical Transformer architecture with an MoE design. During the pre-training phase, the model develops foundational capabilities. The post-training phase then refines task-specific skills and aligns the model with human preferences.

The model was pre-trained on a dataset of 7 trillion tokens, including 1.5 trillion tokens of high-quality synthetic data. The tokenizer uses a vocabulary of 128K tokens, a size chosen to balance compression rate against vocabulary cost and to enhance performance, particularly in supporting the Chinese language.
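
The four-step synthesis recipe highlighted in the figure caption above (instruction generation, instruction evolution, response generation, response filtering) can be pictured as a simple loop. The helpers below are hypothetical placeholders standing in for calls to generator and critique models; they are not the authors’ actual pipeline.

```python
# Hedged sketch of the four-step synthetic-data recipe described in the paper:
# (1) instruction generation, (2) instruction evolution, (3) response generation,
# (4) response filtering. Every helper below is a hypothetical placeholder for a
# model call or quality heuristic, not the authors' implementation.
def generate_instruction(seed_text: str) -> str:
    """Placeholder: ask a generator model to turn raw text into an instruction."""
    return f"Summarize the key points of the following passage: {seed_text[:80]}"

def evolve_instruction(instruction: str) -> str:
    """Placeholder: rewrite the instruction to be harder or more specific."""
    return instruction + " Answer in three bullet points, each with one example."

def generate_response(instruction: str) -> str:
    """Placeholder: ask a strong model to answer the evolved instruction."""
    return "1. ...\n2. ...\n3. ..."

def keep_response(instruction: str, response: str) -> bool:
    """Placeholder filter: a critique model or heuristic quality check."""
    return len(response.strip()) > 10

def synthesize(corpus):
    for seed in corpus:
        instruction = evolve_instruction(generate_instruction(seed))  # steps 1-2
        response = generate_response(instruction)                     # step 3
        if keep_response(instruction, response):                      # step 4
            yield {"instruction": instruction, "response": response}

# Usage: list(synthesize(["Raw web document text ...", "Another document ..."]))
```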

The model architecture comprises 64 layers, 80 attention heads, and a combination of shared and specialized experts. One key feature is Key-Value (KV) cache compression, which reduces memory usage during inference by combining grouped-query attention (GQA) with cross-layer attention (CLA). This reduces the KV cache size by nearly 95% compared to conventional attention, with negligible loss in model quality.
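
A back-of-the-envelope calculation shows where a roughly 95% cache reduction can come from when fewer KV heads (GQA) are combined with cache sharing across layers (CLA). The layer and head counts below come from the article; the KV-head group count, sharing factor, head dimension, and precision are illustrative assumptions rather than the model’s published configuration.

```python
# Back-of-the-envelope KV-cache estimate. Layer and head counts come from the
# article (64 layers, 80 attention heads); the GQA key-value head count, CLA
# sharing factor, head dimension, and precision are illustrative assumptions.
def kv_cache_bytes(seq_len, layers=64, kv_heads=80, head_dim=128,
                   bytes_per_value=2, cla_share=1):
    cached_layers = layers / cla_share  # CLA: adjacent layers reuse one KV cache
    per_token = 2 * cached_layers * kv_heads * head_dim * bytes_per_value  # K and V
    return seq_len * per_token

baseline = kv_cache_bytes(seq_len=8192)                              # full multi-head cache
compressed = kv_cache_bytes(seq_len=8192, kv_heads=8, cla_share=2)   # GQA + CLA
print(f"KV cache reduction: {1 - compressed / baseline:.1%}")        # ~95% with these factors
```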

The model also uses a mixed expert routing strategy, in which shared and specialized experts are activated based on a top-k scoring mechanism, ensuring efficient load balancing and strong performance. Additionally, it applies expert-specific learning rate scaling, assigning a different learning rate to each expert to improve training efficiency.
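
One way to picture expert-specific learning rates is as per-parameter-group settings in the optimizer, reusing the toy MoE layer sketched earlier. The square-root rule below, which ties each expert’s rate to the fraction of tokens it is expected to process, is a common heuristic shown as an assumption; the authors’ exact scaling formula may differ.

```python
# Hedged sketch of expert-specific learning-rate scaling as PyTorch parameter
# groups, reusing the TinyMoELayer sketch above. The sqrt(token-share) rule is an
# illustrative heuristic; the paper's exact scaling formula may differ.
import math
import torch

def build_optimizer(moe_layer, base_lr=3e-4):
    groups = [{"params": moe_layer.router.parameters(), "lr": base_lr}]
    num_experts = len(moe_layer.experts)
    # Each expert sees roughly top_k / num_experts of the tokens a dense layer would,
    # so give it a smaller learning rate scaled by the square root of that share.
    expert_lr = base_lr * math.sqrt(moe_layer.top_k / num_experts)
    for expert in moe_layer.experts:
        groups.append({"params": expert.parameters(), "lr": expert_lr})
    return torch.optim.AdamW(groups)

# Usage: optimizer = build_optimizer(TinyMoELayer())
```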

After pre-training, the model underwent supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). The researchers then evaluated Hunyuan-Large by benchmarking it against leading models, such as Llama 3.1 (Meta AI's large language model, version 3.1) and DeepSeek, across various tasks in both English and Chinese. They also compared it with dense and MoE models such as Llama 3.1-70B and Llama 3.1-405B.

An illustration of the recycle routing strategy in Hunyuan-Large, where each expert’s maximum capacity is set to 2. Token D, initially allocated to the overloaded Expert 1, is reassigned to a randomly selected Expert 4, which helps avoid losing valuable information. In traditional routing strategies, tokens from overloaded experts would simply be dropped, as shown in (a); the proposed strategy instead randomly reassigns these tokens to other experts, as shown in (b), where Token D is routed to Expert 4.
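
A capacity-limited router of the kind the caption describes might look like the sketch below: each expert accepts at most a fixed number of tokens, and overflow tokens are randomly reassigned to experts with spare capacity instead of being dropped. The capacity value, routing preferences, and tie-breaking rule are illustrative only.

```python
# Illustrative sketch of "recycle routing": tokens that overflow an expert's
# capacity are randomly reassigned to experts with spare room instead of being
# dropped. Capacity, preferences, and tie-breaking are toy values.
import random
from collections import defaultdict

def recycle_route(token_to_expert, num_experts, capacity=2, seed=0):
    rng = random.Random(seed)
    assignment, load, overflow = {}, defaultdict(int), []
    for token, expert in token_to_expert.items():
        if load[expert] < capacity:
            assignment[token] = expert
            load[expert] += 1
        else:
            overflow.append(token)  # plain routing would drop this token
    for token in overflow:
        open_experts = [e for e in range(num_experts) if load[e] < capacity]
        if open_experts:  # recycle the token to a randomly chosen non-full expert
            choice = rng.choice(open_experts)
            assignment[token] = choice
            load[choice] += 1
    return assignment

# Example loosely mirroring the figure: Expert 1 is over capacity, so Token D is
# recycled to another expert instead of being dropped.
print(recycle_route({"A": 1, "B": 1, "C": 2, "D": 1}, num_experts=5))
```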

Key Findings and Insights

The study showed that Hunyuan-Large outperformed existing models across various tasks. In comparative analyses, it demonstrated strong performance in commonsense understanding, question answering, mathematical reasoning, and coding. Notably, despite using fewer activated parameters, the model surpassed Llama 3.1-405B by 3.2% on the massive multitask language understanding (MMLU) and grade school math 8K (GSM8K) benchmarks.

The authors also highlighted the model's proficiency in managing long-context tasks, with the ability to process sequences of up to 256K tokens. This feature is crucial for applications that require integrating large amounts of information and context, thereby broadening Hunyuan-Large’s potential for real-world use. Key factors behind the model's success include high-quality synthetic data used during pre-training, expert routing strategies, and training protocols developed through empirical testing.

Additionally, the researchers emphasized that combining high-quality synthetic data and innovative training strategies was essential in boosting the model’s performance. The use of synthetic data not only expanded the training dataset but also enabled the model to learn complex patterns more effectively.

Applications

This research has significant implications across various industries. In education, Hunyuan-Large can be used to develop intelligent tutoring systems for personalized learning. In software development, it can assist with code generation and debugging. Its strong reasoning and language understanding abilities make it suitable for customer support, content creation, and information retrieval tasks. Furthermore, the model’s open-source nature encourages collaboration and innovation within the AI community. The availability of its code enables further experimentation and adaptation to diverse use cases.

Conclusion and Future Directions

In conclusion, Hunyuan-Large represents a significant advancement in the field of LLMs. It proved effective in various tasks, including language understanding, generation, logical reasoning, mathematical problem-solving, coding, and long-context processing. As the demand for advanced AI systems grows, Hunyuan-Large offers significant potential for future research and applications, paving the way for intelligent systems capable of effectively processing and understanding complex information.

Future work should optimize the model's efficiency and performance by exploring new scaling laws and learning rate schedules for MoE models. Further advancements in domain-specific capabilities and techniques for handling larger datasets and more complex tasks would enhance the model’s potential.


Journal reference:
  • Preliminary scientific report. Sun, X., et al. (2024). Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent. arXiv preprint, arXiv:2411.02265. DOI: 10.48550/arXiv.2411.02265, https://arxiv.org/abs/2411.02265

Written by

Muhammad Osama

Muhammad Osama is a full-time data analytics consultant and freelance technical writer based in Delhi, India. He specializes in transforming complex technical concepts into accessible content. He has a Bachelor of Technology in Mechanical Engineering with specialization in AI & Robotics from Galgotias University, India, and he has extensive experience in technical content writing, data science and analytics, and artificial intelligence.
