An Empirical Research Study on the Efficacy of Pattern Matching Algorithms in the Arabic Language

Natural Language Processing offers many applications for searching information in texts across the world's languages. For Arabic, however, such applications are comparatively rare, owing to the complexity of processing the language and its distinctive characteristics.
In this project, three pattern matching algorithms were implemented, tested, and compared on Arabic text: Brute-Force (Naïve), Boyer-Moore-Horspool, and Knuth-Morris-Pratt.
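
For readers unfamiliar with the three algorithms, the following is a minimal Java sketch of each. It is not the project's actual code: the class name `PatternMatchers` and the method names are hypothetical, and each method is assumed to return the start offset of every (possibly overlapping) occurrence of the pattern.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; names are illustrative, not taken from the project.
final class PatternMatchers {

    // Brute force: try every alignment and compare left to right.
    static List<Integer> bruteForce(String text, String pattern) {
        List<Integer> hits = new ArrayList<>();
        int n = text.length(), m = pattern.length();
        if (m == 0) return hits;
        for (int i = 0; i + m <= n; i++) {
            int j = 0;
            while (j < m && text.charAt(i + j) == pattern.charAt(j)) j++;
            if (j == m) hits.add(i);
        }
        return hits;
    }

    // Boyer-Moore-Horspool: compare right to left, then shift by the
    // bad-character distance of the text character under the pattern's end.
    static List<Integer> horspool(String text, String pattern) {
        List<Integer> hits = new ArrayList<>();
        int n = text.length(), m = pattern.length();
        if (m == 0 || m > n) return hits;
        // A map is used because Arabic characters fall outside the ASCII range.
        Map<Character, Integer> shift = new HashMap<>();
        for (int k = 0; k < m - 1; k++) shift.put(pattern.charAt(k), m - 1 - k);
        int i = 0;
        while (i + m <= n) {
            int j = m - 1;
            while (j >= 0 && text.charAt(i + j) == pattern.charAt(j)) j--;
            if (j < 0) hits.add(i);
            i += shift.getOrDefault(text.charAt(i + m - 1), m);
        }
        return hits;
    }

    // Knuth-Morris-Pratt: precompute the failure table so the text index
    // never moves backwards.
    static List<Integer> kmp(String text, String pattern) {
        List<Integer> hits = new ArrayList<>();
        int n = text.length(), m = pattern.length();
        if (m == 0) return hits;
        int[] fail = new int[m];
        for (int i = 1, k = 0; i < m; i++) {
            while (k > 0 && pattern.charAt(i) != pattern.charAt(k)) k = fail[k - 1];
            if (pattern.charAt(i) == pattern.charAt(k)) k++;
            fail[i] = k;
        }
        for (int i = 0, k = 0; i < n; i++) {
            while (k > 0 && text.charAt(i) != pattern.charAt(k)) k = fail[k - 1];
            if (text.charAt(i) == pattern.charAt(k)) k++;
            if (k == m) {
                hits.add(i - m + 1);
                k = fail[k - 1]; // keep going to find overlapping matches
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Text "ktb alktab" and pattern "kt", written as Unicode escapes so the
        // file compiles regardless of the source encoding.
        String text = "\u0643\u062A\u0628 \u0627\u0644\u0643\u062A\u0627\u0628";
        String pattern = "\u0643\u062A";
        System.out.println(bruteForce(text, pattern)); // [0, 6]
        System.out.println(horspool(text, pattern));   // [0, 6]
        System.out.println(kmp(text, pattern));        // [0, 6]
    }
}
```

All three methods return the same offsets for a given input; they differ only in how many character comparisons they perform, which is the kind of difference an empirical comparison would measure.
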
The goal of this project is to solve the problem of information search in Arabic texts, regardless of whether diacritical marks are present. The platform uses Artificial Intelligence and Natural Language Processing algorithms not only to let the user search Arabic texts easily, but also to determine how often a search term occurs and to locate each occurrence precisely within the text.
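
One way to make search independent of diacritics is to strip them from both the text and the query before matching, while remembering where each remaining character came from. The sketch below illustrates that idea under stated assumptions: "diacritical symbols" is taken to mean the Arabic marks U+064B through U+0652 plus the superscript alef U+0670, the class and method names are hypothetical, and `String.indexOf` stands in for whichever matching algorithm the project actually uses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; not taken from the project.
final class DiacriticInsensitiveSearch {

    // Assumption: the marks to ignore are U+064B..U+0652 and U+0670.
    static boolean isDiacritic(char c) {
        return (c >= '\u064B' && c <= '\u0652') || c == '\u0670';
    }

    static String stripDiacritics(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!isDiacritic(c)) sb.append(c);
        }
        return sb.toString();
    }

    // Returns the start offset, in the ORIGINAL text, of every occurrence of
    // the query, ignoring diacritics on both sides. The list size is the
    // occurrence count.
    static List<Integer> findAll(String text, String query) {
        List<Integer> positions = new ArrayList<>();
        String q = stripDiacritics(query);
        if (q.isEmpty()) return positions;

        // Strip the text and record the original index of each kept character.
        StringBuilder stripped = new StringBuilder(text.length());
        int[] origIndex = new int[text.length()];
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!isDiacritic(c)) {
                origIndex[stripped.length()] = i;
                stripped.append(c);
            }
        }

        String t = stripped.toString();
        // indexOf is a stand-in for any of the matching algorithms.
        for (int from = t.indexOf(q); from >= 0; from = t.indexOf(q, from + 1)) {
            positions.add(origIndex[from]);
        }
        return positions;
    }
}
```

For example, searching the text كَتَبَ كتب for the undiacritized query كتب with `findAll` yields two offsets, one for the vocalized word and one for the plain word, so the same call answers both "how often" and "where".
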

Some Algorithms and Methods Used

The project uses various algorithms and methods for pattern matching in the Arabic language, including:

  • Pattern Matching Algorithm
  • Token Count Vectorizer
  • Arabic shakeel Function
  • Strip_shakeel Function
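
The source does not describe how the token count vectorizer works. As a rough illustration only, a token count can be sketched as a map from each whitespace-separated token to the number of times it appears; the class name `TokenCounter` and the whitespace tokenization are assumptions, not the project's implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class; the tokenization rule is an assumption.
final class TokenCounter {

    // Counts whitespace-separated tokens, keeping first-seen order.
    static Map<String, Integer> countTokens(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) continue; // empty or all-whitespace input
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }
}
```

If diacritics should not distinguish tokens, the text would need to be stripped of them before counting.
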

Tools and Dependencies

The project is built with Java, J2EE, Apache Tomcat, HTML5, CSS3, JavaScript, and MySQL.

Conclusion

This project demonstrates the potential of Artificial Intelligence and Natural Language Processing algorithms for searching Arabic texts, regardless of whether diacritical marks are present. Together, the algorithms and methods used in the project let the user not only search Arabic texts easily, but also determine how often a search term occurs and locate each occurrence precisely within the text.