spaCy Deliberate Practice
This post collects some of the things I learned from deliberate practice with spaCy.
What can spaCy do?
- Shallow parsing (chunking): grouping adjacent tokens into phrases based on their POS tags, such as noun phrases, verb phrases and prepositional phrases. spaCy exposes noun phrases directly through doc.noun_chunks.
- Named Entity Recognition (NER): locating named entities in text and classifying them into predefined categories such as people, organizations and locations.
- Available packages for NER:
  - Stanford NER: provides trained sequence models (linear-chain CRFs) and lets you train your own NER models on labeled data.
  - spaCy: its trained pipelines (for example en_core_web_sm) include NER out of the box.
  - NLTK: NER goes through three stages:
    - Word tokenization
    - POS tagging: the required NLTK data packages (tagger and chunker models, corpora) must be downloaded first with nltk.download.
    - Chunking: shallow parsing on top of the POS tags that adds phrase structure to the sentence (nltk.ne_chunk performs this step for named entities).
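- Verb phrase detection can be done via textacy.
- spaCy gives a dependency parse: each token's relation label is in token.dep_ and its syntactic parent in token.head.
- One can use regular expressions to match spaCy docs, either through the REGEX operator in a Matcher token pattern or with Python's re on doc.text combined with doc.char_span.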
- One can quickly remove stop words and punctuation (token.is_stop, token.is_punct) and lemmatize tokens with spaCy.
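- token.tag_ gives the fine-grained POS tag; token.pos_ gives the coarse-grained (Universal) POS tag.
- Word frequencies can be obtained by passing the tokens through a collections.Counter object.
- Lemmatization can be done via token.lemma_.
- spacy.lang.en.stop_words.STOP_WORDS gives the set of English stop words.
- nlp.vocab is the pipeline's shared vocabulary: a lookup table of Lexeme entries and strings, not a complete list of the words in a language.
- Every token has a set of very useful attributes and functions for NLP tasks (for example is_alpha, like_num, shape_).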
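- Sentence detection is automatic: with a trained pipeline the dependency parser sets sentence boundaries and doc.sents yields the sentences. One can also add the rule-based sentencizer or a custom component to define custom sentence boundaries.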
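Whatever component finds them, named entities end up in doc.ents. To keep this sketch runnable without downloading a model, it swaps the statistical NER of a trained pipeline for spaCy's rule-based entity_ruler; the patterns and the sentence are illustrative assumptions. With spacy.load("en_core_web_sm") you would read doc.ents the same way.

```python
import spacy

# Blank English pipeline: tokenizer only, no trained NER model.
nlp = spacy.blank("en")

# Rule-based entity_ruler used here in place of a statistical NER component.
# The patterns are illustrative assumptions, not part of any shipped model.
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "ORG", "pattern": "Stanford University"},   # phrase pattern
    {"label": "GPE", "pattern": [{"LOWER": "london"}]},   # token pattern
])

doc = nlp("Stanford University opened an office in London.")

# Entities are exposed the same way as with a trained pipeline: doc.ents
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(entities)  # [('Stanford University', 'ORG'), ('London', 'GPE')]
```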
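The three NLTK stages listed above can be sketched as follows, assuming NLTK is installed. To avoid downloading tagger data, the POS tags are supplied by hand and only the chunking stage actually runs; nltk.RegexpParser with the grammar shown is one common way to chunk, not the only one.

```python
import nltk

# Stages 1-2 (tokenization, POS tagging) would normally be
#   tagged = nltk.pos_tag(nltk.word_tokenize(text))
# which needs downloaded NLTK data; here the tags are written by hand.
tagged = [
    ("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
    ("jumped", "VBD"), ("over", "IN"), ("the", "DT"), ("lazy", "JJ"), ("dog", "NN"),
]

# Stage 3: chunking with a regular-expression grammar over POS tags.
# NP = optional determiner, any number of adjectives, one or more nouns.
grammar = "NP: {<DT>?<JJ>*<NN.*>+}"
parser = nltk.RegexpParser(grammar)
tree = parser.parse(tagged)

noun_phrases = [
    " ".join(word for word, _tag in subtree.leaves())
    for subtree in tree.subtrees(filter=lambda t: t.label() == "NP")
]
print(noun_phrases)  # ['The quick brown fox', 'the lazy dog']
```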
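Two ways to match a spaCy Doc with regular expressions, sketched on a blank English pipeline (tokenizer only, no model download): a REGEX operator inside a Matcher token pattern, and Python's re over doc.text mapped back to tokens with doc.char_span. The pattern name VERSION_RELEASE and the sentence are illustrative.

```python
import re

import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")  # tokenizer only; no trained model needed
doc = nlp("The v3 release replaced the v2 release last year.")

# Option 1: token-level regex inside a Matcher pattern
matcher = Matcher(nlp.vocab)
pattern = [{"TEXT": {"REGEX": r"^v\d+$"}}, {"LOWER": "release"}]
matcher.add("VERSION_RELEASE", [pattern])
matcher_hits = [doc[start:end].text for _, start, end in matcher(doc)]

# Option 2: plain re over doc.text, mapped back to a Span
# (char_span returns None if the match does not align with token boundaries)
regex_hits = []
for m in re.finditer(r"v\d+ release", doc.text):
    span = doc.char_span(m.start(), m.end())
    if span is not None:
        regex_hits.append(span.text)

print(matcher_hits)
print(regex_hits)  # ['v3 release', 'v2 release']
```

The Matcher route works token by token, so patterns can mix regexes with other attributes; the re route is handy when the pattern naturally spans characters rather than tokens.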
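Stop-word and punctuation filtering plus word counts only need the tokenizer and lexical attributes, so a blank English pipeline is enough for this sketch; the sentence is made up. Lemmatization is left out here because it needs a trained pipeline or lookup tables.

```python
from collections import Counter

import spacy
from spacy.lang.en.stop_words import STOP_WORDS

nlp = spacy.blank("en")  # tokenizer + lexical attributes (is_stop, is_punct) suffice
doc = nlp("The cat sat on the mat, and the cat slept. The mat was warm!")

# STOP_WORDS is a plain Python set of English stop words
print(len(STOP_WORDS), "the" in STOP_WORDS)

# Drop stop words, punctuation and whitespace, lowercasing what remains
content = [t.lower_ for t in doc if not (t.is_stop or t.is_punct or t.is_space)]

# Word frequencies via collections.Counter
freq = Counter(content)
print(freq.most_common(2))  # [('cat', 2), ('mat', 2)]
```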
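Custom sentence boundaries can be added as a pipeline component that sets token.is_sent_start. This sketch uses a blank pipeline with spaCy's rule-based sentencizer plus a hypothetical split_on_semicolon component. With a trained pipeline, such a component has to run before the parser, since spaCy refuses to change token.is_sent_start on an already parsed Doc.

```python
import spacy
from spacy.language import Language

# Hypothetical custom component: start a new sentence after every semicolon.
@Language.component("split_on_semicolon")
def split_on_semicolon(doc):
    for token in doc[:-1]:
        if token.text == ";":
            doc[token.i + 1].is_sent_start = True
    return doc

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")  # rule-based splitting on ., !, ? and similar
nlp.add_pipe("split_on_semicolon", after="sentencizer")

doc = nlp("spaCy splits sentences; this rule adds more. Done!")
sentences = [sent.text for sent in doc.sents]
print(sentences)  # ['spaCy splits sentences;', 'this rule adds more.', 'Done!']
```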