MTech NLP · Module 3 · Coding Topics Map
10 Topics · 6 Categories · 2 Colab Notebooks

POS Tagging & Context Free Grammars

Module 3 coding topics: from rule-based POS tagging to stochastic HMM/CRF methods, and from context-free grammars to constituency parsing. Each topic is mapped to libraries, difficulty, and NLP interview concepts.

Module 1 (NLP Pipeline) ✓ Module 2 (Morphology, N-Grams) ✓ → Module 3: POS + CFG
Rule-Based POS Tagging — Lexical & Rule-Based Methods · Topics 1 – 2
01
Topic
Theory + Code
Penn Treebank Tagset & NLTK Rule-Based POS
  • Penn Treebank 36-tag tagset: NN, VB, JJ, DT, IN, etc.
  • Default tagger, Regexp tagger, Lookup tagger
  • UnigramTagger, BigramTagger, TrigramTagger (backoff chain)
  • Accuracy evaluation on Brown Corpus
  • Brill Tagger: rule-learning from corpus errors
Libraries & Dataset
nltk · spacy · Brown Corpus (NLTK) · Penn Treebank
Build a complete backoff tagger chain (Unigram → Bigram → Trigram → Default). Visualise Penn Treebank tag distribution. Compare accuracy across tagger levels.
⬛⬜⬜⬜ Beginner
Historical + Interview
PTB 36 tags · Backoff tagging · Brill 1992 · Why trigram backoff?
Brown Corpus (1961) — first annotated English corpus. Penn Treebank (1993, Marcus et al.) — standard benchmark still used today.
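The backoff chain described above can be sketched with NLTK's tagger classes. The three hand-tagged sentences below are a hypothetical stand-in for the Brown Corpus, just to make the sketch self-contained:

```python
from nltk.tag import DefaultTagger, UnigramTagger, BigramTagger, TrigramTagger

# Tiny hand-tagged corpus standing in for the Brown Corpus (illustrative data).
train = [
    [("the", "DT"), ("dog", "NN"), ("runs", "VBZ")],
    [("a", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("chased", "VBD"), ("a", "DT"), ("dog", "NN")],
]

# Backoff chain: Trigram -> Bigram -> Unigram -> Default("NN").
# Each tagger defers to the next whenever it has no evidence for a context.
default = DefaultTagger("NN")
unigram = UnigramTagger(train, backoff=default)
bigram = BigramTagger(train, backoff=unigram)
trigram = TrigramTagger(train, backoff=bigram)

print(trigram.tag(["the", "dog", "sleeps"]))
# -> [('the', 'DT'), ('dog', 'NN'), ('sleeps', 'VBZ')]
```

In the exercise proper, `train` would be `nltk.corpus.brown.tagged_sents()` split into train/test, with `tagger.accuracy()` compared across the four levels.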
02
Topic
Theory + Code
Multiple Tags, Unknown Words & Class-Based N-Grams
  • Lexical ambiguity: words with multiple tags (e.g. "book", "run")
  • Unknown word handling: suffix rules, morphological cues
  • Class-based n-gram language models (Brown clustering)
  • Tag frequency distribution and most-likely-tag baseline
  • OOV rate analysis on held-out test sets
Libraries & Dataset
nltk · collections · regex · Brown Corpus
Analyse tag ambiguity in Brown Corpus. Build a suffix-based unknown word tagger. Compute OOV rates at different training sizes. Plot tag distribution.
⬛⬛⬜⬜ Intermediate
Historical + Interview
Lexical ambiguity · Suffix heuristics · OOV rate · Hapax legomena
The unknown-words problem persists into the transformer era, where it is handled by BPE/WordPiece. Understanding it here motivates subword tokenisation.
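A suffix-based unknown-word tagger of the kind this topic asks for can be sketched with NLTK's `RegexpTagger`; the rules below are illustrative, not an exhaustive rule set:

```python
from nltk.tag import RegexpTagger

# Suffix heuristics for unknown words; patterns are tried in order.
patterns = [
    (r".*ing$", "VBG"),          # gerunds: running, parsing
    (r".*ed$", "VBD"),           # past tense: walked, tagged
    (r".*ly$", "RB"),            # adverbs: quickly
    (r".*ness$", "NN"),          # de-adjectival nouns: happiness
    (r".*s$", "NNS"),            # plural nouns (very rough heuristic)
    (r"^-?\d+(\.\d+)?$", "CD"),  # numbers
    (r".*", "NN"),               # default: noun (most-likely-tag baseline)
]
suffix_tagger = RegexpTagger(patterns)

# Works on nonce words the training corpus never saw.
print(suffix_tagger.tag(["blicking", "wugs", "quickly", "42"]))
# -> [('blicking', 'VBG'), ('wugs', 'NNS'), ('quickly', 'RB'), ('42', 'CD')]
```

In the full exercise this tagger sits at the bottom of a backoff chain, replacing the blanket `DefaultTagger("NN")` for out-of-vocabulary words.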
Stochastic POS Tagging — HMM & MaxEnt · Topics 3 – 4
03
Topic
Theory + Code
Hidden Markov Model (HMM) for POS Tagging
  • HMM components: states (tags), observations (words), transitions, emissions
  • Viterbi algorithm: dynamic programming for most likely tag sequence
  • Forward algorithm: computing sequence probability P(O|λ)
  • Transition probabilities: P(tᵢ | tᵢ₋₁) from corpus counts
  • Emission probabilities: P(wᵢ | tᵢ) with Laplace smoothing
Libraries & Dataset
numpy · nltk · hmmlearn · Brown Corpus
Implement HMM POS tagger from scratch with Viterbi decoding. Visualise transition and emission matrices as heatmaps. Compare accuracy vs rule-based backoff chain.
⬛⬛⬛⬜ Advanced
Historical + Interview
Viterbi 1967 · Baum-Welch · Markov assumption · Decode vs Learn · Why HMM over rules?
HMMs came to NLP from speech recognition (Jelinek at IBM, 1976) and were adapted to POS tagging by Church (1988), one of the first purely statistical taggers to rival rule-based systems.
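The Viterbi decoder at the heart of this topic fits in a few lines of numpy. The toy probabilities below are hand-set for illustration (in the exercise they come from Brown Corpus counts with Laplace smoothing):

```python
import numpy as np

states = ["NN", "VB"]
# "fish" is tag-ambiguous: noun or verb.
pi = np.array([0.7, 0.3])        # initial tag probabilities P(t1)
A = np.array([[0.3, 0.7],        # transitions P(t_i | t_{i-1}); row = previous tag
              [0.8, 0.2]])
B = np.array([[0.6, 0.2],        # emissions P(w | t); row = tag, col = word index
              [0.1, 0.5]])       # word 0 = "they", word 1 = "fish"

def viterbi(obs_idx, pi, A, B):
    n_states, T = A.shape[0], len(obs_idx)
    delta = np.zeros((T, n_states))           # best path probability ending in state
    psi = np.zeros((T, n_states), dtype=int)  # backpointers
    delta[0] = pi * B[:, obs_idx[0]]
    for t in range(1, T):
        for j in range(n_states):
            scores = delta[t - 1] * A[:, j]
            psi[t, j] = np.argmax(scores)
            delta[t, j] = scores[psi[t, j]] * B[j, obs_idx[t]]
    # Backtrace the most likely tag sequence.
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]

tags = [states[i] for i in viterbi([0, 1], pi, A, B)]
print(tags)  # -> ['NN', 'VB']: "they/NN fish/VB"
```

In practice the products underflow on long sentences, so the scratch implementation should work in log-space (sums of log-probabilities instead of products).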
04
Topic
Theory + Code
Maximum Entropy (ME) Model for POS Tagging
  • MaxEnt principle: most uniform distribution consistent with constraints
  • Feature engineering: word shape, prefix, suffix, context window
  • sklearn LogisticRegression as MaxEnt implementation
  • Feature extraction pipeline for POS tagging
  • Comparison: HMM (generative) vs MaxEnt (discriminative)
Libraries & Dataset
sklearn · nltk · numpy · CoNLL-2000
Build a MaxEnt POS tagger using LogisticRegression with handcrafted features (word, suffix, shape, neighbours). Compare vs HMM on Brown Corpus accuracy.
⬛⬛⬜⬜ Intermediate
Historical + Interview
Generative vs Discriminative · MaxEnt · Ratnaparkhi 1996 · Feature engineering · L1/L2 regularisation
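A minimal sketch of the MaxEnt-as-LogisticRegression idea, using `DictVectorizer` to one-hot the handcrafted features. The feature names and toy sentences are illustrative choices, not a fixed template:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def features(sent, i):
    # Handcrafted cues: word identity, suffix, shape, left neighbour.
    w = sent[i]
    return {
        "word": w.lower(),
        "suffix2": w[-2:],
        "is_title": w.istitle(),
        "prev": sent[i - 1].lower() if i > 0 else "<s>",
    }

train_sents = [
    (["The", "dog", "runs"], ["DT", "NN", "VBZ"]),
    (["A", "cat", "sleeps"], ["DT", "NN", "VBZ"]),
    (["Dogs", "run"], ["NNS", "VBP"]),
]
X = [features(s, i) for s, _ in train_sents for i in range(len(s))]
y = [t for _, tags in train_sents for t in tags]

# LogisticRegression over indicator features is exactly a MaxEnt classifier.
model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X, y)

test_sent = ["The", "cat", "runs"]
pred = model.predict([features(test_sent, i) for i in range(3)])
print(list(pred))
```

Note the contrast with the HMM: this model scores each position discriminatively from overlapping features, with no generative story for the words.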
ML Models — SVM & CRF for Sequence Labelling · Topics 5 – 6
05
Topic
Theory + Code
SVM for POS Tagging (SVM-HMM)
  • SVM for multi-class sequence classification
  • Feature vectors: contextual, lexical, morphological
  • sklearn SVC with one-vs-rest strategy
  • Comparison: local SVM vs sequence models (HMM, CRF)
  • Confusion matrix analysis: which tags are confused?
Libraries & Dataset
sklearn · nltk · seaborn · Brown/CoNLL
Build a feature-rich SVM tagger. Compare SVM vs HMM vs MaxEnt on a small held-out test set. Plot per-tag F1 scores and a confusion matrix heatmap.
⬛⬛⬜⬜ Intermediate
Historical + Interview
Gimenez & Marquez 2004 · Local vs global model · Label dependency problem · Why CRF beats SVM?
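The one-vs-rest SVC setup plus confusion-matrix evaluation can be sketched as below; the eight word/tag pairs are toy data standing in for Brown/CoNLL features:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.metrics import confusion_matrix
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy word-level examples; in the exercise these come from full sentences
# with contextual features, not isolated words.
X = [{"word": w, "suffix2": w[-2:]} for w in
     ["dog", "cat", "runs", "sleeps", "the", "a", "dogs", "cats"]]
y = ["NN", "NN", "VBZ", "VBZ", "DT", "DT", "NNS", "NNS"]

# SVC wrapped in one-vs-rest: one binary margin per tag.
clf = make_pipeline(DictVectorizer(), OneVsRestClassifier(SVC(kernel="linear")))
clf.fit(X, y)

pred = clf.predict(X)
cm = confusion_matrix(y, pred, labels=["DT", "NN", "NNS", "VBZ"])
print(cm)  # rows = true tag, cols = predicted tag
```

The key limitation to notice: each prediction is local, so nothing stops the SVM from emitting a tag sequence no sentence could have, which is the label-dependency gap that CRFs close.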
06
Topic
Theory + Code
Conditional Random Field (CRF) for POS + NER
  • CRF: discriminative, globally normalised, sequence-aware
  • Feature templates: word, POS, suffix, BIO tags
  • sklearn-crfsuite for POS and NER jointly
  • F1 evaluation per entity/tag type
  • CRF vs HMM vs BERT-based models — evolution
Libraries & Dataset
sklearn-crfsuite · nltk · CoNLL-2003 (NER) · spacy
Train a CRF tagger with feature templates on CoNLL-2003. Evaluate with per-tag F1. Show CRF learning BIO tag transitions implicitly via features. Compare vs NLTK chunker.
⬛⬛⬛⬜ Advanced
Historical + Interview
Lafferty et al. 2001 · Global normalisation · CRF vs HMM · BIO scheme · Label bias problem
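The feature-template half of the CRF exercise is plain Python. A sketch of the kind of per-token feature dicts sklearn-crfsuite consumes (the template names here are an illustrative convention, not a library requirement):

```python
def word2features(sent, i):
    # One feature dict per token; overlapping cues are the point of a CRF.
    word = sent[i]
    feats = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.suffix3": word[-3:],
        "word.istitle": word.istitle(),
        "word.isdigit": word.isdigit(),
    }
    if i > 0:
        feats["-1:word.lower"] = sent[i - 1].lower()
    else:
        feats["BOS"] = True          # sentence-boundary marker
    if i < len(sent) - 1:
        feats["+1:word.lower"] = sent[i + 1].lower()
    else:
        feats["EOS"] = True
    return feats

def sent2features(sent):
    return [word2features(sent, i) for i in range(len(sent))]

feats = sent2features(["EU", "rejects", "German", "call"])
print(feats[1])
```

Lists of such feature lists (one per sentence) are then passed, with the parallel BIO tag lists, to `sklearn_crfsuite.CRF().fit(X, y)`; that step needs the CoNLL-2003 data and is omitted here.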
CFG Core — Context Free Grammars: Rules, Trees & Constituency · Topics 7 – 8
07
Topic
Theory + Code
CFG Rules, Parse Trees & Constituency
  • CFG formalism: terminals, non-terminals, production rules S → NP VP
  • Constituency vs dependency: phrase structure
  • NLTK CFG and ChartParser — build and parse sentences
  • Parse tree traversal: leaves, depth, subtrees
  • Sentence-level construction: S → NP VP, embedded clauses
Libraries & Dataset
nltk.CFG · nltk.ChartParser · nltk.Tree · Custom toy grammar
Define a CFG grammar with NLTK. Parse sample sentences. Draw parse trees. Demonstrate ambiguity (PP attachment). Compare CFG with dependency parsing.
⬛⬛⬜⬜ Intermediate
Historical + Interview
Chomsky 1956 · Chomsky Normal Form · CFG vs dependency · PP attachment ambiguity · CYK algorithm
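The CFG-plus-ChartParser workflow, including the PP-attachment ambiguity the exercise asks you to demonstrate, can be sketched with a toy grammar:

```python
import nltk

# Toy grammar: NP -> NP PP and VP -> VP PP together create the classic
# "saw the man with the telescope" attachment ambiguity.
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N | NP PP | 'I'
VP -> V NP | VP PP
PP -> P NP
Det -> 'the'
N -> 'man' | 'telescope'
V -> 'saw'
P -> 'with'
""")

parser = nltk.ChartParser(grammar)
sentence = "I saw the man with the telescope".split()
trees = list(parser.parse(sentence))
for t in trees:
    print(t)
print(len(trees), "parses")  # 2: PP attaches to the NP or to the VP
```

Each `Tree` can be drawn with `t.draw()` or pretty-printed with `t.pretty_print()`; counting parses per sentence is a direct way to quantify structural ambiguity.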
08
Topic
Theory + Code
NP, VP, Coordination & Agreement
  • Noun Phrase structure: Det + (Adj*) + N (+ PP*)
  • Verb Phrase & subcategorisation frames: intransitive, transitive, ditransitive
  • Coordination: NP and NP, VP and VP (symmetry constraint)
  • Agreement features: number, person, gender morphology
  • Feature-based CFG (FCFG) with NLTK for agreement checking
Libraries & Dataset
nltk.FeatureGrammar · nltk.RecursiveDescent · spacy · WSJ Penn Treebank
Build a feature-based CFG that enforces subject-verb agreement. Parse NPs with coordination. Demonstrate subcategorisation frames. Test grammatical vs ungrammatical sentences.
⬛⬛⬜⬜ Intermediate
Historical + Interview
Subcategorisation · Coordination constraint · Agreement in CFG · Feature structures
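Subject-verb agreement via a feature-based CFG can be sketched with NLTK's `FeatureGrammar`; the `?n` variable forces the NP and VP to unify on number:

```python
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureChartParser

# Minimal FCFG: NUM percolates from Det/N/V up to NP/VP, and S requires
# the subject and verb phrase to agree on it.
fcfg = FeatureGrammar.fromstring("""
% start S
S -> NP[NUM=?n] VP[NUM=?n]
NP[NUM=?n] -> Det[NUM=?n] N[NUM=?n]
VP[NUM=?n] -> V[NUM=?n]
Det[NUM=sg] -> 'this'
Det[NUM=pl] -> 'these'
N[NUM=sg] -> 'dog'
N[NUM=pl] -> 'dogs'
V[NUM=sg] -> 'barks'
V[NUM=pl] -> 'bark'
""")

parser = FeatureChartParser(fcfg)
good = list(parser.parse("this dog barks".split()))   # agreement holds
bad = list(parser.parse("these dog barks".split()))   # unification fails
print(len(good), len(bad))
```

The ungrammatical string gets zero parses because `Det[NUM=pl]` cannot unify with `N[NUM=sg]`, which is exactly the grammatical/ungrammatical contrast the exercise tests.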
Grammar — Probabilistic CFG & Grammar Induction · Topics 9 – 10
09
Topic
Theory + Code
Probabilistic CFG (PCFG) & CYK Parsing
  • PCFG: CFG rules annotated with probabilities P(α→β)
  • CYK (Cocke-Younger-Kasami) algorithm: bottom-up chart parsing
  • Chomsky Normal Form (CNF) conversion
  • Parsing ambiguity resolution via maximum probability parse
  • Reading PCFG from Penn Treebank with NLTK
Libraries & Dataset
nltk.PCFG · nltk.ViterbiParser · numpy · Penn Treebank (small)
Implement CYK algorithm from scratch on toy grammar. Use NLTK ViterbiParser for PCFG. Show how probability resolves PP-attachment ambiguity. Convert grammar to CNF.
⬛⬛⬛⬜ Advanced
Historical + Interview
CYK O(n³) complexity · CNF conversion · PCFG disambiguation · Earley parser · Treebank grammars
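The from-scratch CYK part of the exercise reduces to filling a triangular table over spans. A minimal recogniser for a toy CNF grammar (the probabilistic version adds a max-probability per cell instead of a set):

```python
# CYK recogniser for a toy grammar already in Chomsky Normal Form.
# `binary` maps a pair of RHS non-terminals to the set of LHS symbols;
# `lexical` maps a terminal word to its possible pre-terminals.
binary = {("NP", "VP"): {"S"}, ("Det", "N"): {"NP"}, ("V", "NP"): {"VP"}}
lexical = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "chased": {"V"}}

def cyk(words):
    n = len(words)
    # table[i][j] = set of non-terminals deriving words[i..j] inclusive
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        table[i][i] = set(lexical.get(w, ()))
    for span in range(2, n + 1):           # span length: the O(n) outer loop
        for i in range(n - span + 1):      # span start
            j = i + span - 1               # span end
            for k in range(i, j):          # split point: the third O(n) factor
                for B in table[i][k]:
                    for C in table[k + 1][j]:
                        table[i][j] |= binary.get((B, C), set())
    return "S" in table[0][n - 1]

print(cyk("the dog chased the cat".split()))  # -> True
print(cyk("dog the chased".split()))          # -> False
```

The three nested span/start/split loops make the O(n³) complexity visible; NLTK's `ViterbiParser` does the same chart computation with probabilities attached.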
10
Topic
Theory + Code
Modern Parsing: spaCy + Constituency vs Dependency
  • spaCy's dependency parser: arc labels, head words
  • Constituency parsing with benepar (Berkeley Neural Parser)
  • Comparing constituency trees vs dependency graphs
  • Extracting NPs, VPs, clauses programmatically
  • Discourse: parse trees for question answering, information extraction
Libraries & Dataset
spacy · benepar · nltk · displacy · WSJ sentences
Use spaCy's dep parser. Install benepar for constituency. Side-by-side comparison of both representations on same sentence. Extract all NPs with regex over parse trees.
⬛⬛⬜⬜ Intermediate
Historical + Interview
Constituency vs dependency · Arc-eager parsing · Neural parsers · Treebank accuracy (UAS/LAS) · Parse for downstream NLP
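spaCy and benepar need model downloads, so the NP-extraction step can be sketched offline on a hand-written bracketed tree (a hypothetical parse of the kind benepar would emit), using `nltk.Tree` traversal instead of regex over the bracket string:

```python
from nltk import Tree

# Hand-written constituency parse standing in for benepar output.
tree = Tree.fromstring(
    "(S (NP (DT the) (NN dog)) (VP (VBD chased) (NP (DT a) (NN cat))))"
)

# Collect every NP constituent by filtering subtrees on their label.
nps = [" ".join(st.leaves()) for st in tree.subtrees(lambda t: t.label() == "NP")]
print(nps)  # -> ['the dog', 'a cat']
```

With real models installed, the same extraction runs over `doc.noun_chunks` in spaCy or over the `Tree` that `benepar` attaches to each sentence, letting you compare the constituency NPs against the dependency-derived chunks side by side.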

// NLP Parsing History — An Analytical Timeline

1956
Chomsky — Formal Language Hierarchy
Introduced context-free grammars in "Three models for the description of language." Foundation of all parse theory.
1961
Brown Corpus — First Annotated Corpus
One million words of American English, POS-annotated. First large-scale empirical NLP resource.
1967
Viterbi Algorithm
Dynamic programming for finding the most likely path through an HMM. Core of all sequence decoders.
1976
Statistical NLP begins at IBM — Jelinek
Applied HMM to speech recognition. Same math later applied to POS tagging (Church, 1988).
1992
Brill Tagger — Error-Driven Rule Learning
Transformation-based learning: start with most-frequent tag, learn rules to fix errors. Hybrid of rule-based and statistical.
1993
Penn Treebank (Marcus et al.)
2.9M words of Wall Street Journal, constituency-annotated. Gold standard for parsing and POS evaluation for 30 years.
1996
MaxEnt POS Tagger (Ratnaparkhi)
First discriminative POS tagger using maximum entropy / logistic regression. 96.6% accuracy on PTB.
2001
CRF — Conditional Random Fields (Lafferty et al.)
Solved the label bias problem in MEMMs. Became the dominant sequence labelling method for a decade.
2011
Neural Network POS (Collobert et al.)
"Natural Language Processing (almost) from scratch." Word embeddings + CNN replaced hand-crafted features.
2018
BERT — Contextual Embeddings Replace CRF
Fine-tuned BERT achieves 97%+ POS accuracy. CRF head still used on top of BERT for NER/POS in production.
2024
LLM-based Parsing
GPT-4/Claude perform constituency and dependency parsing in-context. Classical grammar still needed for explainability and low-resource languages.

// Teaching Sequence

| Topic | Category | Difficulty | Core Libraries | Builds On | Key Interview Concept |
|---|---|---|---|---|---|
| 01 — Penn Treebank & Backoff Tagger | Rule-Based POS | ⬛⬜⬜⬜ | nltk, spacy | Module 1 pipeline | PTB tagset, backoff chain |
| 02 — Ambiguity & Unknown Words | Rule-Based POS | ⬛⬛⬜⬜ | nltk, regex | Topic 01 | OOV, lexical ambiguity |
| 03 — HMM POS Tagger + Viterbi | Stochastic | ⬛⬛⬛⬜ | numpy, hmmlearn | Module 2 N-grams | Viterbi, Markov assumption |
| 04 — MaxEnt / LogReg POS | Stochastic | ⬛⬛⬜⬜ | sklearn | Topic 03 | Generative vs discriminative |
| 05 — SVM POS Tagger | ML Models | ⬛⬛⬜⬜ | sklearn, seaborn | Topic 04 | Feature engineering |
| 06 — CRF POS + NER | ML Models | ⬛⬛⬛⬜ | sklearn-crfsuite | Topics 03–05 | Label bias, global normalisation |
| 07 — CFG Rules & Parse Trees | CFG Core | ⬛⬛⬜⬜ | nltk.CFG | Topic 01 (tags) | Chomsky hierarchy, CNF |
| 08 — NP/VP/Agreement/FCFG | Grammar Structures | ⬛⬛⬜⬜ | nltk.FeatureGrammar | Topic 07 | Subcategorisation, agreement |
| 09 — PCFG & CYK Algorithm | Grammar | ⬛⬛⬛⬜ | nltk.PCFG, numpy | Topics 07–08 | O(n³) CYK, PCFG disambiguation |
| 10 — Modern Parsing: spaCy+benepar | Parsing | ⬛⬛⬜⬜ | spacy, benepar | All above | Constituency vs dependency, UAS/LAS |