NLP: Natural Language Processing (NLP) is a field of artificial intelligence (AI) that focuses on enabling machines to understand, interpret, and generate human language. It involves analyzing and processing large amounts of human language data, such as written text or speech, and extracting meaning and insights from it. Common applications include:
- chatbots
- voice assistants
- sentiment analysis
- language translation
- text summarization, and many more
- With the growing popularity of digital assistants and chatbots, NLP has become an essential tool for businesses to provide efficient and personalized customer service.
- Regular expressions (regex) are a pattern-matching language used in NLP to search, extract, and manipulate text. A regular expression is a sequence of literal characters and metacharacters (such as \d, + or .) that together describe a pattern in a text string.
- For example, regular expressions can be used to extract all email addresses or phone numbers from a text document, or to remove all punctuation marks or stop words from a piece of text.
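As a quick illustration of the cleanup use case, the sketch below removes punctuation with re.sub and then filters out stop words. The STOP_WORDS set is a tiny hypothetical list chosen only for this example; real projects usually take a fuller list from a library such as nltk.

```python
import re

text = "Hello, world! NLP is fun; isn't it?"

# [^\w\s] matches any character that is neither a word character nor whitespace,
# i.e. punctuation; replacing it with "" deletes it
no_punct = re.sub(r"[^\w\s]", "", text)
print(no_punct)  # Hello world NLP is fun isnt it

# Hypothetical, deliberately tiny stop-word list (assumption for the example)
STOP_WORDS = {"is", "it", "the", "a", "an"}

# Split into words, then keep only words not in the stop-word list
tokens = [w for w in re.findall(r"\w+", no_punct) if w.lower() not in STOP_WORDS]
print(tokens)  # ['Hello', 'world', 'NLP', 'fun', 'isnt']
```

Note that stripping the apostrophe turns "isn't" into "isnt"; whether that is acceptable depends on the downstream task.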
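Extracting Phone Numbers
To extract phone numbers, we match 10 consecutive digits with the pattern '\d{10}': '\d' matches any single digit, and {n} repeats the preceding token exactly n times, so you can replace n with however many digits you want to match.
We use the findall function (re.findall in Python's re module), which returns a list of every non-overlapping substring that matches the pattern.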
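A minimal sketch of the phone-number extraction, assuming Python's built-in re module and a made-up sample text. The second pattern, which adds \b word boundaries, is an addition not in the original: plain \d{10} also matches the first ten digits of any longer number, such as an order ID.

```python
import re

text = "Call 9876543210 or 0123456789. Order ID: 123456789012."

# \d matches one digit; {10} means "exactly 10 of the preceding token"
loose = re.findall(r"\d{10}", text)
print(loose)   # ['9876543210', '0123456789', '1234567890']

# \b (word boundary) on both sides rejects 10-digit slices of longer numbers
strict = re.findall(r"\b\d{10}\b", text)
print(strict)  # ['9876543210', '0123456789']
```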
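Extracting Email Addresses
Here we match the text against a pattern designed for email addresses: '[a-zA-Z0-9_]+@[a-zA-Z0-9_]+\.[a-zA-Z]+'
Inside square brackets, a-z means any character from a to z; similarly, A-Z matches uppercase letters and 0-9 matches digits. The + after each bracket means "one or more" of those characters, and \. matches a literal dot (an unescaped . would match any character). The range must be written A-Z: A-z would also match the symbols [ \ ] ^ _ and ` that sit between Z and a in ASCII.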
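A minimal sketch of email extraction with re.findall, assuming a made-up sample text. It uses the pattern with + quantifiers, an escaped dot, and A-Z ranges; without the + each bracket would match only a single character.

```python
import re

text = "Contact meet_123@example.com or support@mail.org for help."

# local part, "@", domain name, literal ".", top-level domain
pattern = r"[a-zA-Z0-9_]+@[a-zA-Z0-9_]+\.[a-zA-Z]+"

emails = re.findall(pattern, text)
print(emails)  # ['meet_123@example.com', 'support@mail.org']
```

This pattern is intentionally simple: it does not accept dots or hyphens in the local part or domain, so an address like first.last@mail.example.co.uk would be missed or only partly matched.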
You can view my full notebook on regular expressions in this Jupyter file:
https://github.com/meet5398/NLP-Natural-Language-Processing-/blob/57611b2b14c58a205c3f93a264daa88f31acc341/regular%20expression%20in%20NLP.ipynb
Text Tokenization: Text tokenization involves breaking text into smaller units or tokens, such as words or sentences. This process enables computers to analyze and understand human language.
- Tokenization is a crucial step in many natural language processing tasks, including sentiment analysis, named entity recognition, and machine translation.
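To make the idea concrete before turning to the libraries, here is a deliberately naive tokenizer built only on re. The function names naive_sentence_tokenize and naive_word_tokenize are hypothetical, and this rule-based splitting is a simplification: it breaks on every ., ! or ? followed by whitespace, so an abbreviation such as "Dr." would wrongly end a sentence. spaCy and nltk handle such cases more carefully.

```python
import re

def naive_sentence_tokenize(text):
    # Hypothetical helper: split after ., ! or ? when followed by whitespace
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def naive_word_tokenize(sentence):
    # Hypothetical helper: runs of word characters, or single punctuation marks
    return re.findall(r"\w+|[^\w\s]", sentence)

text = "Tokenization is crucial. It splits text into units!"
sentences = naive_sentence_tokenize(text)
words = [naive_word_tokenize(s) for s in sentences]

print(sentences)  # ['Tokenization is crucial.', 'It splits text into units!']
print(words[0])   # ['Tokenization', 'is', 'crucial', '.']
```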
spaCy: an open-source software library for advanced natural language processing, written in Python and Cython. It provides a variety of tools for language understanding and processing, including named entity recognition, dependency parsing, and word vectors. Its tokenizer returns objects (for example, Token objects for words and Span objects for sentences) rather than plain strings.
nltk (Natural Language Toolkit): a leading platform for building Python programs to work with human language data. It provides a range of tools for text processing and analysis, including tokenization, stemming, tagging, and parsing. Its tokenizers return plain strings (for example, sent_tokenize returns a list of sentence strings).
Before running the code, make sure you have the following installed:
- Python 3.x
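- spaCy library (can be installed via pip)
- English language model for spaCy (can be downloaded via python -m spacy download en_core_web_sm; the old "en" shortcut no longer works in spaCy v3)
- nltk library (can be installed via pip), plus the Punkt tokenizer data used by sent_tokenize and word_tokenize, downloadable with nltk.download('punkt') (or 'punkt_tab' on NLTK 3.9 and later)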
With spaCy, the output shows that each sentence is returned as an object (a Span) rather than a plain string.
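A minimal sketch of sentence tokenization with spaCy. To keep it runnable without downloading a model, it assumes spaCy v3 and swaps the trained en_core_web_sm pipeline for a blank English pipeline plus spaCy's rule-based "sentencizer" component; with the downloaded model you would instead start from spacy.load("en_core_web_sm"). Either way, doc.sents yields Span objects.

```python
import spacy
from spacy.tokens import Span

# Assumption: blank English pipeline + rule-based sentencizer instead of a trained model
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

doc = nlp("NLP is fun. Tokenizers split text into units.")
sentences = list(doc.sents)

for sent in sentences:
    # Each sentence is a Span object; .text gives its string form
    print(type(sent).__name__, repr(sent.text))
# Span 'NLP is fun.'
# Span 'Tokenizers split text into units.'
```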
With nltk, the output shows that each sentence is returned as a plain Python string.
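The equivalent sketch for nltk. The usual call is sent_tokenize(text) from nltk.tokenize, which needs the pretrained Punkt data downloaded first; to stay self-contained, this sketch assumes an untrained PunktSentenceTokenizer instead, which is enough for simple text like this but less reliable around abbreviations. The result is a list of ordinary str values.

```python
from nltk.tokenize.punkt import PunktSentenceTokenizer

# Assumption: untrained Punkt tokenizer in place of sent_tokenize (no download needed)
tokenizer = PunktSentenceTokenizer()
sentences = tokenizer.tokenize("NLP is fun. Tokenizers split text into units.")

for sent in sentences:
    # Each sentence is a plain Python str
    print(type(sent).__name__, repr(sent))
```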