Abstract
This project aims to develop IslamGPT 1.0, the first large language model (LLM) designed specifically for Arabic sources in Islamic Studies. By forming a strong interdisciplinary team of researchers, the project aims to overcome the limitations of ChatGPT, currently the only available AI chatbot that can converse in Arabic. To strengthen its academic character, IslamGPT 1.0 will be trained on an extensive body of classical and contemporary Arabic sources. Queries will be matched by semantic similarity, but the results will be grounded in annotated documents, which will help counter ChatGPT-like hallucinations. Recent advances in AI, driven by LLMs, have yielded remarkable applications across various domains. Nonetheless, the intricacies and distinctive terminology of the Arabic language and Islamic scholarship necessitate a specialized model. IslamGPT 1.0 is thus a significant stride toward harnessing the transformative potential of AI conversational agents to serve Islamic Studies. The project will improve accuracy and efficiency in addressing unresolved questions across the broad discipline of Islamic Studies. The project acknowledges the importance of high-quality input data. To produce a refined training dataset that meets the requirements of Islamic scholarship, the annotation will cover every paragraph in each source, recording characteristics such as author, historical context, scholars, disciplines, genres, and more. This annotation will facilitate the examination of scriptural references across different contexts, disciplines, and schools. Additionally, the project will provide insights into knowledge transmission and the genealogy of ideas, concepts, and ideologies. IslamGPT 1.0 will be built through interdisciplinary collaboration among experts in Islamic Studies (IS), digital humanities (DH), and computer science and AI engineering (CSE).
The project will build on a special database curated by this research team in a previous multiyear project, together with an extensive corpus of open-access classical and contemporary Arabic literature, yielding a vast data repository of billions of tokens. The significance of developing the first LLM focused on Islamic Studies in Arabic cannot be overstated. By harnessing the capabilities of GPT, this model will enable the exploration of previously unexplored terrain while delivering superior performance compared to conventional models designed only for data retrieval. IslamGPT 1.0 is just the beginning of a broader undertaking that aims to cover a comprehensive range of classical Islamic languages as well as modern languages. Acknowledging the expected limitations of IslamGPT 1.0, the project seeks to expand its scope continually by including sources written in other languages.
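The idea of querying by semantic similarity while returning only annotated paragraphs can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `AnnotatedParagraph` fields, the `search` function, and the placeholder corpus are hypothetical, and a simple bag-of-words cosine score stands in for whatever similarity model the project eventually adopts. It is not the project's implementation.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class AnnotatedParagraph:
    """One annotated paragraph. Field names are illustrative assumptions."""
    text: str
    author: str
    discipline: str
    genre: str


def tokenize(text: str) -> list[str]:
    # \w+ is Unicode-aware in Python 3, so Arabic words are matched as well.
    return re.findall(r"\w+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, corpus: list[AnnotatedParagraph], top_k: int = 3):
    """Return up to top_k (score, paragraph) pairs, best match first.

    Only paragraphs present in the annotated corpus can be returned,
    so every result carries its author, discipline, and genre.
    """
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(p.text))), p) for p in corpus]
    hits = [(score, p) for score, p in scored if score > 0.0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return hits[:top_k]


# Placeholder corpus: invented texts and labels, not real sources.
corpus = [
    AnnotatedParagraph("fasting rules for the traveller", "Author A", "fiqh", "manual"),
    AnnotatedParagraph("grammar of the verbal noun", "Author B", "nahw", "commentary"),
]

for score, paragraph in search("fasting rules", corpus):
    print(round(score, 3), paragraph.author, paragraph.discipline, paragraph.genre)
```

Because each result is a record from the annotated corpus rather than freely generated text, its provenance can be displayed alongside the answer, which is the property the abstract relies on to limit hallucinations.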
Team
Lead PI: Prof. Mohammed Ghaly
PI: Prof. Aiman Erbad
PI: Dr. Muetaz Alkhatib
PI: Dr. Emad Mohamed
PI: Dr. Samer Rashwani