MTech NLP · Module 3 · Coding Topics Map
10 Topics · 6 Categories · 2 Colab Notebooks

POS Tagging & Context Free Grammars

Module 3 coding topics: from rule-based POS tagging to stochastic HMM/CRF methods, and from context-free grammars to constituency parsing. Each topic is mapped to libraries, difficulty, and NLP interview concepts.

Module 1 (NLP Pipeline) ✓ Module 2 (Morphology, N-Grams) ✓ → Module 3: POS + CFG
Rule-Based POS Tagging — Lexical & Rule-Based Methods · Topics 1 – 2
01
Topic
Theory + Code
Penn Treebank Tagset & NLTK Rule-Based POS
  • Penn Treebank 36-tag tagset: NN, VB, JJ, DT, IN, etc.
  • Default tagger, Regexp tagger, Lookup tagger
  • UnigramTagger, BigramTagger, TrigramTagger (backoff chain)
  • Accuracy evaluation on Brown Corpus
  • Brill Tagger: rule-learning from corpus errors
Libraries & Dataset
nltk · spacy · Brown Corpus (NLTK) · Penn Treebank
Build a complete backoff tagger chain (Unigram → Bigram → Trigram → Default). Visualise Penn Treebank tag distribution. Compare accuracy across tagger levels.
⬛⬜⬜⬜ Beginner
Historical + Interview
PTB 36 tags · Backoff tagging · Brill 1992 · Why trigram backoff?
Brown Corpus (1961) — first annotated English corpus. Penn Treebank (1993, Marcus et al.) — standard benchmark still used today.
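The backoff chain described above can be sketched with NLTK's tagger classes. The three hand-tagged sentences below are a hypothetical stand-in for the Brown Corpus, just to make the sketch self-contained:

```python
from nltk.tag import DefaultTagger, UnigramTagger, BigramTagger, TrigramTagger

# Tiny hand-tagged corpus standing in for the Brown Corpus (illustrative data).
train = [
    [("the", "DT"), ("dog", "NN"), ("runs", "VBZ")],
    [("a", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("chased", "VBD"), ("a", "DT"), ("dog", "NN")],
]

# Backoff chain: Trigram -> Bigram -> Unigram -> Default("NN").
# Each tagger defers to the next whenever it has no evidence for a context.
default = DefaultTagger("NN")
unigram = UnigramTagger(train, backoff=default)
bigram = BigramTagger(train, backoff=unigram)
trigram = TrigramTagger(train, backoff=bigram)

print(trigram.tag(["the", "dog", "sleeps"]))
# -> [('the', 'DT'), ('dog', 'NN'), ('sleeps', 'VBZ')]
```

In the exercise proper, `train` would be `nltk.corpus.brown.tagged_sents()` split into train/test, with `tagger.accuracy()` compared across the four levels.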
02
Topic
Theory + Code
Multiple Tags, Unknown Words & Class-Based N-Grams
  • Lexical ambiguity: words with multiple tags (e.g. "book", "run")
  • Unknown word handling: suffix rules, morphological cues
  • Class-based n-gram language models (Brown clustering)
  • Tag frequency distribution and most-likely-tag baseline
  • OOV rate analysis on held-out test sets
Libraries & Dataset
nltk · collections · regex · Brown Corpus
Analyse tag ambiguity in Brown Corpus. Build a suffix-based unknown word tagger. Compute OOV rates at different training sizes. Plot tag distribution.
⬛⬛⬜⬜ Intermediate
Historical + Interview
Lexical ambiguity · Suffix heuristics · OOV rate · Hapax legomena
The unknown-words problem persists into the transformer era, where it is handled by BPE/WordPiece. Understanding it here motivates subword tokenisation.
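A suffix-based unknown-word tagger of the kind this topic asks for can be sketched with NLTK's `RegexpTagger`; the rules below are illustrative, not an exhaustive rule set:

```python
from nltk.tag import RegexpTagger

# Suffix heuristics for unknown words; patterns are tried in order.
patterns = [
    (r".*ing$", "VBG"),          # gerunds: running, parsing
    (r".*ed$", "VBD"),           # past tense: walked, tagged
    (r".*ly$", "RB"),            # adverbs: quickly
    (r".*ness$", "NN"),          # de-adjectival nouns: happiness
    (r".*s$", "NNS"),            # plural nouns (very rough heuristic)
    (r"^-?\d+(\.\d+)?$", "CD"),  # numbers
    (r".*", "NN"),               # default: noun (most-likely-tag baseline)
]
suffix_tagger = RegexpTagger(patterns)

# Works on nonce words the training corpus never saw.
print(suffix_tagger.tag(["blicking", "wugs", "quickly", "42"]))
# -> [('blicking', 'VBG'), ('wugs', 'NNS'), ('quickly', 'RB'), ('42', 'CD')]
```

In the full exercise this tagger sits at the bottom of a backoff chain, replacing the blanket `DefaultTagger("NN")` for out-of-vocabulary words.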
Stochastic POS Tagging — HMM & MaxEnt · Topics 3 – 4
03
Topic
Theory + Code
Hidden Markov Model (HMM) for POS Tagging
  • HMM components: states (tags), observations (words), transitions, emissions
  • Viterbi algorithm: dynamic programming for most likely tag sequence
  • Forward algorithm: computing sequence probability P(O|λ)
  • Transition probabilities: P(tᵢ | tᵢ₋₁) from corpus counts
  • Emission probabilities: P(wᵢ | tᵢ) with Laplace smoothing
Libraries & Dataset
numpy · nltk · hmmlearn · Brown Corpus
Implement HMM POS tagger from scratch with Viterbi decoding. Visualise transition and emission matrices as heatmaps. Compare accuracy vs rule-based backoff chain.
⬛⬛⬛⬜ Advanced
Historical + Interview
Viterbi 1967 · Baum-Welch · Markov assumption · Decode vs Learn · Why HMM over rules?
HMMs came to NLP from speech recognition (Jelinek at IBM, 1976) and were adapted to POS tagging by Church (1988), one of the first purely statistical taggers to rival rule-based systems.
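The Viterbi decoder at the heart of this topic fits in a few lines of numpy. The toy probabilities below are hand-set for illustration (in the exercise they come from Brown Corpus counts with Laplace smoothing):

```python
import numpy as np

states = ["NN", "VB"]
# "fish" is tag-ambiguous: noun or verb.
pi = np.array([0.7, 0.3])        # initial tag probabilities P(t1)
A = np.array([[0.3, 0.7],        # transitions P(t_i | t_{i-1}); row = previous tag
              [0.8, 0.2]])
B = np.array([[0.6, 0.2],        # emissions P(w | t); row = tag, col = word index
              [0.1, 0.5]])       # word 0 = "they", word 1 = "fish"

def viterbi(obs_idx, pi, A, B):
    n_states, T = A.shape[0], len(obs_idx)
    delta = np.zeros((T, n_states))           # best path probability ending in state
    psi = np.zeros((T, n_states), dtype=int)  # backpointers
    delta[0] = pi * B[:, obs_idx[0]]
    for t in range(1, T):
        for j in range(n_states):
            scores = delta[t - 1] * A[:, j]
            psi[t, j] = np.argmax(scores)
            delta[t, j] = scores[psi[t, j]] * B[j, obs_idx[t]]
    # Backtrace the most likely tag sequence.
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]

tags = [states[i] for i in viterbi([0, 1], pi, A, B)]
print(tags)  # -> ['NN', 'VB']: "they/NN fish/VB"
```

In practice the products underflow on long sentences, so the scratch implementation should work in log-space (sums of log-probabilities instead of products).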
04
Topic
Theory + Code
Maximum Entropy (ME) Model for POS Tagging
  • MaxEnt principle: most uniform distribution consistent with constraints
  • Feature engineering: word shape, prefix, suffix, context window
  • sklearn LogisticRegression as MaxEnt implementation
  • Feature extraction pipeline for POS tagging
  • Comparison: HMM (generative) vs MaxEnt (discriminative)
Libraries & Dataset
sklearn · nltk · numpy · CoNLL-2000
Build a MaxEnt POS tagger using LogisticRegression with handcrafted features (word, suffix, shape, neighbours). Compare vs HMM on Brown Corpus accuracy.
⬛⬛⬜⬜ Intermediate
Historical + Interview
Generative vs Discriminative · MaxEnt · Ratnaparkhi 1996 · Feature engineering · L1/L2 regularisation
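A minimal sketch of the MaxEnt-as-LogisticRegression idea, using `DictVectorizer` to one-hot the handcrafted features. The feature names and toy sentences are illustrative choices, not a fixed template:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def features(sent, i):
    # Handcrafted cues: word identity, suffix, shape, left neighbour.
    w = sent[i]
    return {
        "word": w.lower(),
        "suffix2": w[-2:],
        "is_title": w.istitle(),
        "prev": sent[i - 1].lower() if i > 0 else "<s>",
    }

train_sents = [
    (["The", "dog", "runs"], ["DT", "NN", "VBZ"]),
    (["A", "cat", "sleeps"], ["DT", "NN", "VBZ"]),
    (["Dogs", "run"], ["NNS", "VBP"]),
]
X = [features(s, i) for s, _ in train_sents for i in range(len(s))]
y = [t for _, tags in train_sents for t in tags]

# LogisticRegression over indicator features is exactly a MaxEnt classifier.
model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X, y)

test_sent = ["The", "cat", "runs"]
pred = model.predict([features(test_sent, i) for i in range(3)])
print(list(pred))
```

Note the contrast with the HMM: this model scores each position discriminatively from overlapping features, with no generative story for the words.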
ML Models — SVM & CRF for Sequence Labelling · Topics 5 – 6
05
Topic
Theory + Code
SVM for POS Tagging (SVM-HMM)
  • SVM for multi-class sequence classification
  • Feature vectors: contextual, lexical, morphological
  • sklearn SVC with one-vs-rest strategy
  • Comparison: local SVM vs sequence models (HMM, CRF)
  • Confusion matrix analysis: which tags are confused?
Libraries & Dataset
sklearn · nltk · seaborn · Brown/CoNLL
Build a feature-rich SVM tagger. Compare SVM vs HMM vs MaxEnt on a small held-out test set. Plot per-tag F1 scores and a confusion matrix heatmap.
⬛⬛⬜⬜ Intermediate
Historical + Interview
Gimenez & Marquez 2004 · Local vs global model · Label dependency problem · Why CRF beats SVM?
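The one-vs-rest SVC setup plus confusion-matrix evaluation can be sketched as below; the eight word/tag pairs are toy data standing in for Brown/CoNLL features:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.metrics import confusion_matrix
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy word-level examples; in the exercise these come from full sentences
# with contextual features, not isolated words.
X = [{"word": w, "suffix2": w[-2:]} for w in
     ["dog", "cat", "runs", "sleeps", "the", "a", "dogs", "cats"]]
y = ["NN", "NN", "VBZ", "VBZ", "DT", "DT", "NNS", "NNS"]

# SVC wrapped in one-vs-rest: one binary margin per tag.
clf = make_pipeline(DictVectorizer(), OneVsRestClassifier(SVC(kernel="linear")))
clf.fit(X, y)

pred = clf.predict(X)
cm = confusion_matrix(y, pred, labels=["DT", "NN", "NNS", "VBZ"])
print(cm)  # rows = true tag, cols = predicted tag
```

The key limitation to notice: each prediction is local, so nothing stops the SVM from emitting a tag sequence no sentence could have, which is the label-dependency gap that CRFs close.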
06
Topic
Theory + Code
Conditional Random Field (CRF) for POS + NER
  • CRF: discriminative, globally normalised, sequence-aware
  • Feature templates: word, POS, suffix, BIO tags
  • sklearn-crfsuite for POS and NER jointly
  • F1 evaluation per entity/tag type
  • CRF vs HMM vs BERT-based models — evolution
Libraries & Dataset
sklearn-crfsuite · nltk · CoNLL-2003 (NER) · spacy
Train a CRF tagger with feature templates on CoNLL-2003. Evaluate with per-tag F1. Show CRF learning BIO tag transitions implicitly via features. Compare vs NLTK chunker.
⬛⬛⬛⬜ Advanced
Historical + Interview
Lafferty et al. 2001 · Global normalisation · CRF vs HMM · BIO scheme · Label bias problem
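The feature-template half of the CRF exercise is plain Python. A sketch of the kind of per-token feature dicts sklearn-crfsuite consumes (the template names here are an illustrative convention, not a library requirement):

```python
def word2features(sent, i):
    # One feature dict per token; overlapping cues are the point of a CRF.
    word = sent[i]
    feats = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.suffix3": word[-3:],
        "word.istitle": word.istitle(),
        "word.isdigit": word.isdigit(),
    }
    if i > 0:
        feats["-1:word.lower"] = sent[i - 1].lower()
    else:
        feats["BOS"] = True          # sentence-boundary marker
    if i < len(sent) - 1:
        feats["+1:word.lower"] = sent[i + 1].lower()
    else:
        feats["EOS"] = True
    return feats

def sent2features(sent):
    return [word2features(sent, i) for i in range(len(sent))]

feats = sent2features(["EU", "rejects", "German", "call"])
print(feats[1])
```

Lists of such feature lists (one per sentence) are then passed, with the parallel BIO tag lists, to `sklearn_crfsuite.CRF().fit(X, y)`; that step needs the CoNLL-2003 data and is omitted here.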
CFG Core — Context Free Grammars: Rules, Trees & Constituency · Topics 7 – 8
07
Topic
Theory + Code
CFG Rules, Parse Trees & Constituency
  • CFG formalism: terminals, non-terminals, production rules S → NP VP
  • Constituency vs dependency: phrase structure
  • NLTK CFG and ChartParser — build and parse sentences
  • Parse tree traversal: leaves, depth, subtrees
  • Sentence-level construction: S → NP VP, embedded clauses
Libraries & Dataset
nltk.CFG · nltk.ChartParser · nltk.Tree · Custom toy grammar
Define a CFG grammar with NLTK. Parse sample sentences. Draw parse trees. Demonstrate ambiguity (PP attachment). Compare CFG with dependency parsing.
⬛⬛⬜⬜ Intermediate
Historical + Interview
Chomsky 1956 · Chomsky Normal Form · CFG vs dependency · PP attachment ambiguity · CYK algorithm
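The CFG-plus-ChartParser workflow, including the PP-attachment ambiguity the exercise asks you to demonstrate, can be sketched with a toy grammar:

```python
import nltk

# Toy grammar: NP -> NP PP and VP -> VP PP together create the classic
# "saw the man with the telescope" attachment ambiguity.
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N | NP PP | 'I'
VP -> V NP | VP PP
PP -> P NP
Det -> 'the'
N -> 'man' | 'telescope'
V -> 'saw'
P -> 'with'
""")

parser = nltk.ChartParser(grammar)
sentence = "I saw the man with the telescope".split()
trees = list(parser.parse(sentence))
for t in trees:
    print(t)
print(len(trees), "parses")  # 2: PP attaches to the NP or to the VP
```

Each `Tree` can be drawn with `t.draw()` or pretty-printed with `t.pretty_print()`; counting parses per sentence is a direct way to quantify structural ambiguity.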
08
Topic
Theory + Code
NP, VP, Coordination & Agreement
  • Noun Phrase structure: Det + (Adj*) + N (+ PP*)
  • Verb Phrase & subcategorisation frames: intransitive, transitive, ditransitive
  • Coordination: NP and NP, VP and VP (symmetry constraint)
  • Agreement features: number, person, gender morphology
  • Feature-based CFG (FCFG) with NLTK for agreement checking
Libraries & Dataset
nltk.FeatureGrammar · nltk.RecursiveDescent · spacy · WSJ Penn Treebank
Build a feature-based CFG that enforces subject-verb agreement. Parse NPs with coordination. Demonstrate subcategorisation frames. Test grammatical vs ungrammatical sentences.
⬛⬛⬜⬜ Intermediate
Historical + Interview
Subcategorisation · Coordination constraint · Agreement in CFG · Feature structures
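Subject-verb agreement via a feature-based CFG can be sketched with NLTK's `FeatureGrammar`; the `?n` variable forces the NP and VP to unify on number:

```python
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureChartParser

# Minimal FCFG: NUM percolates from Det/N/V up to NP/VP, and S requires
# the subject and verb phrase to agree on it.
fcfg = FeatureGrammar.fromstring("""
% start S
S -> NP[NUM=?n] VP[NUM=?n]
NP[NUM=?n] -> Det[NUM=?n] N[NUM=?n]
VP[NUM=?n] -> V[NUM=?n]
Det[NUM=sg] -> 'this'
Det[NUM=pl] -> 'these'
N[NUM=sg] -> 'dog'
N[NUM=pl] -> 'dogs'
V[NUM=sg] -> 'barks'
V[NUM=pl] -> 'bark'
""")

parser = FeatureChartParser(fcfg)
good = list(parser.parse("this dog barks".split()))   # agreement holds
bad = list(parser.parse("these dog barks".split()))   # unification fails
print(len(good), len(bad))
```

The ungrammatical string gets zero parses because `Det[NUM=pl]` cannot unify with `N[NUM=sg]`, which is exactly the grammatical/ungrammatical contrast the exercise tests.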
Grammar — Probabilistic CFG & Grammar Induction · Topics 9 – 10
09
Topic
Theory + Code
Probabilistic CFG (PCFG) & CYK Parsing
  • PCFG: CFG rules annotated with probabilities P(α→β)
  • CYK (Cocke-Younger-Kasami) algorithm: bottom-up chart parsing
  • Chomsky Normal Form (CNF) conversion
  • Parsing ambiguity resolution via maximum probability parse
  • Reading PCFG from Penn Treebank with NLTK
Libraries & Dataset
nltk.PCFG · nltk.ViterbiParser · numpy · Penn Treebank (small)
Implement CYK algorithm from scratch on toy grammar. Use NLTK ViterbiParser for PCFG. Show how probability resolves PP-attachment ambiguity. Convert grammar to CNF.
⬛⬛⬛⬜ Advanced
Historical + Interview
CYK O(n³) complexity · CNF conversion · PCFG disambiguation · Earley parser · Treebank grammars
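The from-scratch CYK part of the exercise reduces to filling a triangular table over spans. A minimal recogniser for a toy CNF grammar (the probabilistic version adds a max-probability per cell instead of a set):

```python
# CYK recogniser for a toy grammar already in Chomsky Normal Form.
# `binary` maps a pair of RHS non-terminals to the set of LHS symbols;
# `lexical` maps a terminal word to its possible pre-terminals.
binary = {("NP", "VP"): {"S"}, ("Det", "N"): {"NP"}, ("V", "NP"): {"VP"}}
lexical = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "chased": {"V"}}

def cyk(words):
    n = len(words)
    # table[i][j] = set of non-terminals deriving words[i..j] inclusive
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        table[i][i] = set(lexical.get(w, ()))
    for span in range(2, n + 1):           # span length: the O(n) outer loop
        for i in range(n - span + 1):      # span start
            j = i + span - 1               # span end
            for k in range(i, j):          # split point: the third O(n) factor
                for B in table[i][k]:
                    for C in table[k + 1][j]:
                        table[i][j] |= binary.get((B, C), set())
    return "S" in table[0][n - 1]

print(cyk("the dog chased the cat".split()))  # -> True
print(cyk("dog the chased".split()))          # -> False
```

The three nested span/start/split loops make the O(n³) complexity visible; NLTK's `ViterbiParser` does the same chart computation with probabilities attached.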
10
Topic
Theory + Code
Modern Parsing: spaCy + Constituency vs Dependency
  • spaCy's dependency parser: arc labels, head words
  • Constituency parsing with benepar (Berkeley Neural Parser)
  • Comparing constituency trees vs dependency graphs
  • Extracting NPs, VPs, clauses programmatically
  • Discourse: parse trees for question answering, information extraction
Libraries & Dataset
spacy · benepar · nltk · displacy · WSJ sentences
Use spaCy's dep parser. Install benepar for constituency. Side-by-side comparison of both representations on same sentence. Extract all NPs with regex over parse trees.
⬛⬛⬜⬜ Intermediate
Historical + Interview
Constituency vs dependency · Arc-eager parsing · Neural parsers · Treebank accuracy (UAS/LAS) · Parse for downstream NLP
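spaCy and benepar need model downloads, so the NP-extraction step can be sketched offline on a hand-written bracketed tree (a hypothetical parse of the kind benepar would emit), using `nltk.Tree` traversal instead of regex over the bracket string:

```python
from nltk import Tree

# Hand-written constituency parse standing in for benepar output.
tree = Tree.fromstring(
    "(S (NP (DT the) (NN dog)) (VP (VBD chased) (NP (DT a) (NN cat))))"
)

# Collect every NP constituent by filtering subtrees on their label.
nps = [" ".join(st.leaves()) for st in tree.subtrees(lambda t: t.label() == "NP")]
print(nps)  # -> ['the dog', 'a cat']
```

With real models installed, the same extraction runs over `doc.noun_chunks` in spaCy or over the `Tree` that `benepar` attaches to each sentence, letting you compare the constituency NPs against the dependency-derived chunks side by side.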

// NLP Parsing History — An Analytical Timeline

1956
Chomsky — Formal Language Hierarchy
Introduced context-free grammars in "Three models for the description of language." Foundation of all parse theory.
1961
Brown Corpus — First Annotated Corpus
One million words of American English, POS-annotated. First large-scale empirical NLP resource.
1967
Viterbi Algorithm
Dynamic programming for finding the most likely path through an HMM. Core of all sequence decoders.
1976
Statistical NLP begins at IBM — Jelinek
Applied HMM to speech recognition. Same math later applied to POS tagging (Church, 1988).
1992
Brill Tagger — Error-Driven Rule Learning
Transformation-based learning: start with most-frequent tag, learn rules to fix errors. Hybrid of rule-based and statistical.
1993
Penn Treebank (Marcus et al.)
2.9M words of Wall Street Journal, constituency-annotated. Gold standard for parsing and POS evaluation for 30 years.
1996
MaxEnt POS Tagger (Ratnaparkhi)
First discriminative POS tagger using maximum entropy / logistic regression. 96.6% accuracy on PTB.
2001
CRF — Conditional Random Fields (Lafferty et al.)
Solved the label bias problem in MEMMs. Became the dominant sequence labelling method for a decade.
2011
Neural Network POS (Collobert et al.)
"Natural Language Processing (almost) from scratch." Word embeddings + CNN replaced hand-crafted features.
2018
BERT — Contextual Embeddings Replace CRF
Fine-tuned BERT achieves 97%+ POS accuracy. CRF head still used on top of BERT for NER/POS in production.
2024
LLM-based Parsing
GPT-4/Claude perform constituency and dependency parsing in-context. Classical grammar still needed for explainability and low-resource languages.

// Teaching Sequence

| Topic | Category | Difficulty | Core Libraries | Builds On | Key Interview Concept |
|---|---|---|---|---|---|
| 01 — Penn Treebank & Backoff Tagger | Rule-Based POS | ⬛⬜⬜⬜ | nltk, spacy | Module 1 pipeline | PTB tagset, backoff chain |
| 02 — Ambiguity & Unknown Words | Rule-Based POS | ⬛⬛⬜⬜ | nltk, regex | Topic 01 | OOV, lexical ambiguity |
| 03 — HMM POS Tagger + Viterbi | Stochastic | ⬛⬛⬛⬜ | numpy, hmmlearn | Module 2 N-grams | Viterbi, Markov assumption |
| 04 — MaxEnt / LogReg POS | Stochastic | ⬛⬛⬜⬜ | sklearn | Topic 03 | Generative vs discriminative |
| 05 — SVM POS Tagger | ML Models | ⬛⬛⬜⬜ | sklearn, seaborn | Topic 04 | Feature engineering |
| 06 — CRF POS + NER | ML Models | ⬛⬛⬛⬜ | sklearn-crfsuite | Topics 03–05 | Label bias, global normalisation |
| 07 — CFG Rules & Parse Trees | CFG Core | ⬛⬛⬜⬜ | nltk.CFG | Topic 01 (tags) | Chomsky hierarchy, CNF |
| 08 — NP/VP/Agreement/FCFG | Grammar Structures | ⬛⬛⬜⬜ | nltk.FeatureGrammar | Topic 07 | Subcategorisation, agreement |
| 09 — PCFG & CYK Algorithm | Grammar | ⬛⬛⬛⬜ | nltk.PCFG, numpy | Topics 07–08 | O(n³) CYK, PCFG disambiguation |
| 10 — Modern Parsing: spaCy+benepar | Parsing | ⬛⬛⬜⬜ | spacy, benepar | All above | Constituency vs dependency, UAS/LAS |