Module 3 coding topics progress from rule-based POS tagging to stochastic HMM and CRF methods, and from context-free grammars to constituency parsing. The table below maps each topic to its category, difficulty, core libraries, prerequisites, and key interview concept.

| Topic | Category | Difficulty | Core Libraries | Builds On | Key Interview Concept |
|---|---|---|---|---|---|
| 01 — Penn Treebank & Backoff Tagger | Rule-Based POS | ⬛⬜⬜⬜ | nltk, spacy | Module 1 pipeline | PTB tagset, backoff chain |
| 02 — Ambiguity & Unknown Words | Rule-Based POS | ⬛⬛⬜⬜ | nltk, regex | Topic 01 | OOV, lexical ambiguity |
| 03 — HMM POS Tagger + Viterbi | Stochastic | ⬛⬛⬛⬜ | numpy, hmmlearn | Module 2 N-grams | Viterbi, Markov assumption |
| 04 — MaxEnt / LogReg POS | Stochastic | ⬛⬛⬜⬜ | sklearn | Topic 03 | Generative vs discriminative |
| 05 — SVM POS Tagger | ML Models | ⬛⬛⬜⬜ | sklearn, seaborn | Topic 04 | Feature engineering |
| 06 — CRF POS + NER | ML Models | ⬛⬛⬛⬜ | sklearn-crfsuite | Topics 03–05 | Label bias, global normalisation |
| 07 — CFG Rules & Parse Trees | CFG Core | ⬛⬛⬜⬜ | nltk.CFG | Topic 01 (tags) | Chomsky hierarchy, CNF |
| 08 — NP/VP/Agreement/FCFG | Grammar Structures | ⬛⬛⬜⬜ | nltk.grammar.FeatureGrammar | Topic 07 | Subcategorisation, agreement |
| 09 — PCFG & CYK Algorithm | Grammar | ⬛⬛⬛⬜ | nltk.PCFG, numpy | Topics 07–08 | O(n³) CYK, PCFG disambiguation |
| 10 — Modern Parsing: spaCy+benepar | Parsing | ⬛⬛⬜⬜ | spacy, benepar | All above | Constituency vs dependency, UAS/LAS |
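
As a taste of the CNF and O(n³) CYK concepts listed for Topics 07 and 09, here is a minimal recognizer sketch in plain Python. The toy grammar, the names `BINARY_RULES` and `LEXICAL_RULES`, and the function `cyk_recognize` are assumptions made for illustration, not part of the module's materials. It is unweighted, so it only answers whether a sentence is derivable; a PCFG version would store probabilities in the chart instead of sets.

```python
from itertools import product

# Toy grammar in Chomsky Normal Form (hypothetical, for illustration only).
# Binary rules map a pair of right-hand-side symbols to possible left-hand sides.
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
# Lexical rules map a word to the preterminals that can produce it.
LEXICAL_RULES = {
    "the": {"Det"},
    "dog": {"N"},
    "cat": {"N"},
    "chased": {"V"},
}


def cyk_recognize(words, start="S"):
    """Return True if the toy CNF grammar derives `words` from `start`."""
    n = len(words)
    if n == 0:
        return False
    # chart[i][j] holds the nonterminals that can span words[i:j].
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        chart[i][i + 1] = set(LEXICAL_RULES.get(word, ()))
    # Loops over span length, start position, and split point give O(n^3).
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for left, right in product(chart[i][k], chart[k][j]):
                    chart[i][j] |= BINARY_RULES.get((left, right), set())
    return start in chart[0][n]


if __name__ == "__main__":
    print(cyk_recognize("the dog chased the cat".split()))
    print(cyk_recognize("dog the chased".split()))
```

The chart is indexed by word boundaries rather than word positions, which keeps the split-point loop free of off-by-one adjustments.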