Built on advanced architecture and fine-tuned for e-commerce, Xmodel-1.5 showcases breakthrough performance in low-resource languages, setting a new benchmark in global AI innovation.
*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as definitive, used to guide development decisions, or treated as established information in the field of artificial intelligence research.
In an article submitted to the arXiv preprint* server, researchers at Xiaoduo AI introduced Xmodel-1.5, a 1-billion-parameter multilingual large language model pre-trained on approximately 2 trillion tokens.
The model performed competitively in Thai, Arabic, French, Chinese, and English. The researchers also released a Thai evaluation dataset annotated by students at Chulalongkorn University. The dataset comprises 359 culturally and linguistically accurate samples, focusing on polite and contextually appropriate responses.
While promising, the results also highlighted areas for improvement that the researchers hope to address in future multilingual artificial intelligence (AI) research. For instance, the model struggled with Thai slang, gender differentiation, and tone distinctions, occasionally producing unnatural responses.
Related Work
Past work on multilingual large language models (LLMs) has focused on addressing natural language processing (NLP) challenges across diverse languages, in both high-resource and low-resource contexts.
Notable models include the cross-lingual language model based on RoBERTa (XLM-R), the multilingual text-to-text transfer transformer (mT5), and the polyglot large language model (PolyLM), which set benchmarks for multilingual AI.
XLM-R demonstrated robust low-resource generalization, while mT5 excelled in cross-lingual tasks focusing on understanding and generation.
PolyLM utilized bilingual data and curriculum learning, achieving strong performance, particularly in lower-resource languages like Thai and Indonesian.
Multilingual Data Integration
The pretraining of Xmodel-1.5 involved a diverse, multilingual corpus, emphasizing low-resource languages from MultiWiki and CulturaX. Data ratios evolved during training, increasing the multilingual content from 5% to 10% over 600,000 iterations, to enhance low-resource language coverage.
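The article does not specify how the ratio was stepped up; the short sketch below illustrates one plausible way to ramp a multilingual sampling fraction from 5% to 10% over 600,000 iterations. The linear schedule and the function name are illustrative assumptions, not the authors' method.

```python
# Hypothetical sketch: ramping the multilingual sampling ratio during pretraining.
# The exact schedule is not reported; a simple linear increase is assumed here.

def multilingual_ratio(step: int, total_steps: int = 600_000,
                       start: float = 0.05, end: float = 0.10) -> float:
    """Fraction of each batch drawn from multilingual (low-resource) data at a given step."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * progress

# Example: halfway through training, roughly 7.5% of sampled data would be multilingual.
print(multilingual_ratio(300_000))  # 0.075
```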
A unigram tokenizer with a 65,280-token vocabulary was developed to balance efficiency and linguistic coverage, outperforming other tokenizers in compression. It utilized byte fallback and character coverage settings to handle rare tokens, ensuring adaptability for low-resource languages and code generation.
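As a rough illustration, the snippet below sketches how such a tokenizer could be trained with the SentencePiece library; the vocabulary size and byte-fallback setting follow the article, while the corpus path and character-coverage value are assumptions.

```python
# A minimal sketch of training a unigram tokenizer with byte fallback, assuming SentencePiece.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="multilingual_corpus.txt",  # hypothetical path to the mixed-language corpus
    model_prefix="xmodel_unigram",    # outputs xmodel_unigram.model / xmodel_unigram.vocab
    model_type="unigram",             # unigram tokenizer, as described in the article
    vocab_size=65280,                 # 65,280-token vocabulary
    character_coverage=0.9995,        # assumed value; broad coverage for diverse scripts
    byte_fallback=True,               # decompose rare or unknown characters into bytes
)

sp = spm.SentencePieceProcessor(model_file="xmodel_unigram.model")
print(sp.encode("สวัสดีครับ", out_type=str))  # Thai example; unseen characters fall back to bytes
```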
The architecture integrated rotary positional embeddings, root mean square normalization (RMSNorm), switched gated linear unit (SwiGLU), and grouped-query attention for improved context understanding and training efficiency.
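For readers unfamiliar with these components, the PyTorch sketch below shows minimal versions of two of them, RMSNorm and SwiGLU; the dimensions and hidden expansion factor are illustrative assumptions, not the authors' exact configuration.

```python
# Illustrative PyTorch sketches of RMSNorm and SwiGLU, two components named in the article.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root mean square normalization: scale by the inverse RMS, with no mean subtraction."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * x * rms

class SwiGLU(nn.Module):
    """Gated feed-forward block: SiLU(x W_gate) * (x W_up), projected back to the model width."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden_dim, bias=False)
        self.w_up = nn.Linear(dim, hidden_dim, bias=False)
        self.w_down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

x = torch.randn(2, 16, 512)              # (batch, sequence, hidden) with assumed sizes
y = SwiGLU(512, 1376)(RMSNorm(512)(x))   # hidden expansion factor is also an assumption
print(y.shape)                           # torch.Size([2, 16, 512])
```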
The training utilized 7 H800 GPUs, AdamW optimization, and a cosine learning rate schedule across 600,000 iterations, processing over 2 trillion tokens.
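The sketch below shows what such an optimization setup could look like in PyTorch, pairing AdamW with a cosine learning-rate schedule over 600,000 steps; the peak learning rate, weight decay, and the stand-in model are assumptions, since the article does not report those pretraining values.

```python
# A minimal sketch of AdamW with a cosine learning-rate schedule, as named in the article.
import torch

model = torch.nn.Linear(512, 512)  # stand-in module; not the 1B-parameter model
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)  # assumed values
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=600_000)

for step in range(3):              # placeholder for the 600,000-iteration pretraining loop
    optimizer.zero_grad()
    loss = model(torch.randn(8, 512)).pow(2).mean()  # dummy objective for illustration
    loss.backward()
    optimizer.step()
    scheduler.step()               # decay the learning rate along the cosine curve
```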
E-commerce RAG Fine-tuning
Instruction fine-tuning enhanced the model's performance on e-commerce retrieval-augmented generation (RAG) tasks using a comprehensive instruction dataset. The fine-tuning setup included a learning rate of 6e-5, a weight decay of 0.1, a warmup ratio of 0.03, and a batch size of 120.
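As an illustration, these reported hyperparameters could be expressed with Hugging Face TrainingArguments roughly as below; the output directory, epoch count, scheduler choice, and the per-device/accumulation split that yields an effective batch size of 120 are assumptions.

```python
# A hedged sketch of the reported fine-tuning hyperparameters as Hugging Face TrainingArguments.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="xmodel-1.5-instruct",   # hypothetical output path
    learning_rate=6e-5,                 # reported learning rate
    weight_decay=0.1,                   # reported weight decay
    warmup_ratio=0.03,                  # reported warmup ratio
    per_device_train_batch_size=20,     # 20 x 6 accumulation steps = 120 (assumed split)
    gradient_accumulation_steps=6,
    lr_scheduler_type="cosine",         # assumed, mirroring the pretraining schedule
    num_train_epochs=2,                 # assumed
)
```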
Progressive dataset construction integrated RAG and retrieval-augmented fine-tuning (RAFT) datasets, with significant contributions from Belle (56.04%) and other sources. This approach achieved a 92.47% satisfaction rate on e-commerce evaluations, as assessed by GPT-4o mini.
Xmodel Evaluation Insights
To ensure a fair comparison, Xmodel-1.5 was evaluated against several prominent decoder-only models, each with around 1 billion parameters. These included Open Pre-trained Transformers (OPT), Pythia, TinyLLaMA, MobileLLaMA, H2O-Danube, InternLM2, and Qwen2.5.
The evaluation focused on commonsense reasoning tasks using the LM Evaluation Harness, covering datasets such as the AI2 Reasoning Challenge (ARC), ARC-Easy, BoolQ, HellaSwag, OpenBookQA, physical interaction question answering (PIQA), scientific question answering (SciQ), and Winogrande.
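For context, a run over this suite with the harness's Python API might look like the sketch below; the model identifier is a placeholder, and the harness version, exact task names, and few-shot settings actually used by the authors are assumptions.

```python
# A sketch of running the commonsense-reasoning suite with EleutherAI's LM Evaluation Harness.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=XiaoduoAILab/Xmodel-1.5",  # hypothetical Hugging Face identifier
    tasks=["arc_challenge", "arc_easy", "boolq", "hellaswag",
           "openbookqa", "piqa", "sciq", "winogrande"],
    batch_size=8,
)
print(results["results"])  # per-task accuracy table
```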
Results highlighted Xmodel-1.5's competitive performance, surpassing models such as TinyLLaMA on multiple metrics. However, some models, such as Qwen2.5, outperformed Xmodel-1.5 in overall accuracy.
To assess multilingual capabilities, Xmodel-1.5 was tested on translated datasets, including ARC (Chinese), XCOPA (11 languages), PIQA_AR (Arabic), Belebele_tha_thai (Thai), multilingual massive multitask language understanding (mMMLU), and mHellaSwag.
These datasets evaluated the model's reasoning, comprehension, and knowledge across various languages and domains. Comparative results showed the model's strengths and limitations in multilingual tasks, with performance insights detailed in case studies. For example, Xmodel-1.5 achieved a Belebele_tha_thai score of 0.2756, outperforming PolyLM-1.7B.
Instruction-following performance was evaluated using benchmarks such as IFEval and MT-Bench, which measure language understanding and multi-turn dialogue capabilities. Xmodel-1.5-Instruct demonstrated moderate proficiency in these areas.
Further evaluation included diverse tasks encompassing open-domain question answering, machine translation, e-commerce, and cultural nuances. This evaluation revealed the model's adaptability across domains and highlighted areas for improvement, such as slang understanding and cultural context.
Overall, the results emphasize Xmodel-1.5's capabilities and areas needing refinement for enhanced instruction and multilingual performance.
Journal reference:
- Preliminary scientific report.
Qun, W., Yang, L., Qingquan, L., & Ling, J. (2024). Xmodel-1.5: An 1B-scale Multilingual LLM. ArXiv. https://arxiv.org/abs/2411.10083