MTech NLP · Module 4 · Coding Topics Map

Semantics, Lexical Relations
& Word Sense Disambiguation

Module 4 coding outline: from phrase attachment analysis and WordNet-based lexical relations to homonymy/polysemy, robust WSD systems, and machine-learning approaches — grounded in Allen (1994), Mitchell (1997) and Cover & Thomas information theory.

9 Coding Topics · 5 Categories · 2 Colab Notebooks
Allen 1994 — NLU · Mitchell 1997 — ML · Cover & Thomas — Info Theory · WordNet (Miller 1995) · Lesk 1986 — WSD · Yarowsky 1995 — WSD
Phrase Attachment — Structural Ambiguity in NP / VP / PP (Topics 1–2)
01 — Phrase Attachment Ambiguity Analysis (Theory + Code)
  • NP / VP / PP attachment ambiguity in fragment sentences
  • PP attachment problem: "I saw the man with the telescope"
  • Coordinating multiple attachments: noun phrase fragments
  • Allen (1994) Chapter 4 — attachment & semantic interpretation
  • Hindle & Rooth (1993) lexical association scores for PP attachment
Libraries & Approach
nltk · spacy · numpy
Build a corpus-based PP attachment classifier using lexical association (Hindle & Rooth method). Count verb-PP vs noun-PP attachments in corpus. Visualise attachment preferences and decision boundaries.
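The Hindle & Rooth score can be sketched in a few lines; the attachment counts below are invented stand-ins for real corpus statistics:

```python
import math

# Toy attachment counts (hypothetical corpus statistics, not real data):
# how often each preposition attaches to a given verb vs. a given noun.
verb_prep_counts = {("saw", "with"): 8, ("saw", "in"): 3}
noun_prep_counts = {("man", "with"): 2, ("man", "in"): 5}
verb_totals = {"saw": 20}
noun_totals = {"man": 15}

def lexical_association(verb, noun, prep):
    """Hindle & Rooth-style LA score: log2 P(prep|verb) / P(prep|noun).
    Positive => prefer verb attachment; negative => noun attachment.
    Add-one smoothing avoids zero probabilities."""
    p_v = (verb_prep_counts.get((verb, prep), 0) + 1) / (verb_totals[verb] + 1)
    p_n = (noun_prep_counts.get((noun, prep), 0) + 1) / (noun_totals[noun] + 1)
    return math.log2(p_v / p_n)

score = lexical_association("saw", "man", "with")
attachment = "verb (VP)" if score > 0 else "noun (NP)"
print(f"LA score = {score:.2f} -> attach PP to {attachment}")
```

With these toy counts, "with" associates more strongly with *saw* than with *man*, so the telescope PP attaches to the verb — the instrumental reading.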
⬛⬛⬜⬜ Intermediate
References + Interview
Allen Ch.4 · PP attachment · Hindle & Rooth 1993 · Structural ambiguity
Classic NLP benchmark — PP attachment was a major research problem 1987–2000. Still relevant for low-resource languages without neural parsers.
02 — Fragment Parsing: NP, VP & PP Chunks (Theory + Code)
  • Sentence fragments and partial parses (Allen 1994 §4.2)
  • NLTK RegexpParser for NP/VP/PP chunking
  • IOB labelling: Inside-Outside-Begin scheme
  • spaCy noun_chunks and verb phrase extraction
  • Comparing chunking vs full parsing accuracy
Libraries & Approach
nltk · spacy · sklearn
Write RegexpParser grammars for NP, VP, PP chunks. Apply to news corpus. Compare IOB accuracy against CoNLL-2000 baseline. Show how fragment parsing feeds downstream semantic analysis.
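A starting-point RegexpParser grammar, applied to a hand-tagged sentence so no tagger model needs downloading (the grammar is illustrative, not the full CoNLL-2000 chunk definition):

```python
import nltk

# A minimal cascaded NP/PP/VP chunk grammar
grammar = r"""
  NP: {<DT>?<JJ>*<NN.*>+}     # determiner + adjectives + nouns
  PP: {<IN><NP>}              # preposition + NP
  VP: {<VB.*><NP|PP>*}        # verb + optional NP/PP complements
"""
parser = nltk.RegexpParser(grammar)

# Pre-tagged tokens, so the example runs without any corpus data
tagged = [("I", "PRP"), ("saw", "VBD"), ("the", "DT"), ("man", "NN"),
          ("with", "IN"), ("the", "DT"), ("telescope", "NN")]
tree = parser.parse(tagged)
print(tree)
```

Because the stages run in order, the PP and VP rules can reference the NP chunks built by the first stage — the same cascading idea used for fragment parsing.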
⬛⬛⬜⬜ Intermediate
References + Interview
IOB scheme · Chunking vs parsing · Fragment semantics · CoNLL-2000
Lexical Relations — Relations Among Lexemes & Their Senses (Topics 3–4)
03 — WordNet: Synsets, Hypernyms & Lexical Relations (Theory + Code)
  • WordNet (Miller 1995) — synonym sets, lexical relations
  • Synsets: synonymy, hypernymy, hyponymy, meronymy, holonymy
  • Antonymy and entailment in verbs
  • Traversing the WordNet hierarchy: lowest common hypernym
  • Semantic similarity: path similarity, Wu-Palmer, Leacock-Chodorow
Libraries & Approach
nltk.corpus.wordnet · matplotlib
Explore WordNet synsets for polysemous words. Traverse hypernym chains. Compute all similarity measures. Visualise the noun hierarchy as a tree. Build a semantic similarity scorer for word pairs.
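The similarity measures can be prototyped on a tiny hand-built hypernym tree before moving to the real WordNet graph (the words and IS-A links below are assumptions for illustration):

```python
# Tiny hand-built hypernym hierarchy (a stand-in for WordNet's noun tree)
hypernym = {
    "dog": "canine", "canine": "carnivore", "carnivore": "mammal",
    "cat": "feline", "feline": "carnivore",
    "mammal": "animal", "animal": "entity",
}

def hypernym_path(word):
    """Chain from a word up to the root, like synset.hypernym_paths()."""
    path = [word]
    while path[-1] in hypernym:
        path.append(hypernym[path[-1]])
    return path

def depth(word):
    return len(hypernym_path(word))  # root has depth 1

def lcs(w1, w2):
    """Lowest common subsumer: first shared node walking up from w1."""
    p2 = hypernym_path(w2)
    for node in hypernym_path(w1):
        if node in p2:
            return node

def path_similarity(w1, w2):
    """1 / (shortest path length between the words + 1)."""
    common = lcs(w1, w2)
    d = hypernym_path(w1).index(common) + hypernym_path(w2).index(common)
    return 1 / (d + 1)

def wup_similarity(w1, w2):
    """Wu-Palmer: 2 * depth(LCS) / (depth(w1) + depth(w2))."""
    return 2 * depth(lcs(w1, w2)) / (depth(w1) + depth(w2))

print(f"LCS(dog, cat)   = {lcs('dog', 'cat')}")
print(f"path similarity = {path_similarity('dog', 'cat'):.2f}")
print(f"Wu-Palmer       = {wup_similarity('dog', 'cat'):.2f}")
```

In the notebook, `wn.synsets('dog')[0].path_similarity(...)` and `.wup_similarity(...)` replace these toy functions; the formulas are the same.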
⬛⬜⬜⬜ Beginner
References + Interview
Miller 1995 WordNet · Synset · Wu-Palmer sim · Hypernymy · Allen Ch.7
WordNet underpins NLTK's semantic toolkit. Understanding synsets is prerequisite for WSD, semantic similarity, and ontology-based NLP.
04 — Distributional Semantics & Word Vectors (Theory + Code)
  • Distributional hypothesis: "a word is known by the company it keeps" (Firth 1957)
  • Co-occurrence matrix construction from corpus
  • PMI (Pointwise Mutual Information) — Cover & Thomas §2.3
  • TF-IDF as a distributional similarity baseline
  • Cosine similarity for word vector comparison
Libraries & Approach
numpy · sklearn · scipy
Build a co-occurrence matrix from Brown Corpus. Compute PMI-weighted vectors. Find nearest neighbours by cosine similarity. Contrast distributional similarity vs WordNet path similarity for same word pairs.
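A pure-Python sketch of sentence-level co-occurrence counting and PMI, on a four-sentence toy corpus standing in for the Brown Corpus:

```python
import math
from collections import Counter
from itertools import combinations

# Toy corpus (assumption: the notebook would use the Brown Corpus instead)
sentences = [
    "the river bank flooded".split(),
    "the bank loan was approved".split(),
    "the river flooded the town".split(),
    "the loan from the bank".split(),
]

word_counts = Counter()
pair_counts = Counter()
for sent in sentences:
    words = set(sent)
    word_counts.update(words)           # sentence-level presence counts
    for w1, w2 in combinations(sorted(words), 2):
        pair_counts[(w1, w2)] += 1      # co-occurrence within a sentence

n = len(sentences)

def pmi(w1, w2):
    """Pointwise mutual information: log2 P(w1,w2) / (P(w1) * P(w2))."""
    p_joint = pair_counts[tuple(sorted((w1, w2)))] / n
    if p_joint == 0:
        return float("-inf")
    return math.log2(p_joint / ((word_counts[w1] / n) * (word_counts[w2] / n)))

print(f"PMI(river, flooded) = {pmi('river', 'flooded'):.2f}")
print(f"PMI(bank, loan)     = {pmi('bank', 'loan'):.2f}")
print(f"PMI(river, loan)    = {pmi('river', 'loan'):.2f}")
```

High PMI flags pairs that co-occur more than chance predicts; the unobserved pair gets negative infinity, which is why PPMI (clipping at zero) is the usual weighting in practice.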
⬛⬛⬜⬜ Intermediate
References + Interview
PMI (Cover & Thomas) · Distributional hyp. · Co-occurrence matrix · Cosine similarity
Homonymy & Polysemy — Disambiguation, Limitations & Robust WSD (Topics 5–6)
05 — Homonymy vs Polysemy: Detection & Analysis (Theory + Code)
  • Homonymy: bank (river) vs bank (financial) — unrelated meanings
  • Polysemy: window (glass pane) vs window (GUI widget) — related meanings
  • Monosemy: words with a single sense
  • WordNet synset count as polysemy proxy
  • Semantic relatedness test: cross-sense similarity
Libraries & Approach
nltk.corpus.wordnet · matplotlib · pandas
Analyse synset count distributions in WordNet. Classify words as homonymous vs polysemous using inter-synset similarity threshold. Plot polysemy vs word frequency. Show limitations of naive synset counting.
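The threshold idea can be sketched with hand-picked scores standing in for real inter-synset Wu-Palmer similarities:

```python
# Toy inter-sense similarity scores (illustrative numbers, standing in for
# Wu-Palmer similarities computed between every pair of a word's synsets)
sense_similarities = {
    "bank":   [0.12, 0.10, 0.15],  # low cross-sense similarity -> homonymy
    "window": [0.58, 0.62],        # related senses -> polysemy
    "oxygen": [],                  # single sense -> monosemy
}

def classify(word, threshold=0.3):
    sims = sense_similarities[word]
    if not sims:
        return "monosemous"
    # If even the *most similar* sense pair falls below the threshold,
    # treat the senses as unrelated (homonymy)
    return "polysemous" if max(sims) >= threshold else "homonymous"

for w in sense_similarities:
    print(w, "->", classify(w))
```

The threshold value is exactly the weak point the topic asks students to probe: shifting it reclassifies borderline words, which motivates the granularity discussion in Topic 06.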
⬛⬜⬜⬜ Beginner
References + Interview
Homonymy vs polysemy · Sense granularity · Allen Ch.7 §7.2 · Fine vs coarse WSD
06 — Limitations of WSD & Sense Granularity (Theory + Code)
  • Sense enumeration problem: where do senses stop?
  • Inter-annotator agreement (IAA) on WSD tasks
  • SemEval WSD tasks — benchmark history
  • Coarse vs fine-grained sense inventories
  • The "most frequent sense" (MFS) baseline — surprisingly strong
Libraries & Approach
nltk · pandas · seaborn
Compute MFS baseline on a word sample. Measure how often MFS is correct. Simulate IAA disagreement. Show that WSD accuracy depends heavily on sense granularity — coarser = easier. Discuss SemEval evaluation methodology.
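A minimal MFS baseline on toy sense-annotated instances (the labels below are invented stand-ins for SemCor annotations):

```python
from collections import Counter

# Toy sense-annotated instances (hypothetical, standing in for SemCor labels)
labeled = {
    "bank":  ["finance", "finance", "finance", "river", "finance", "river"],
    "plant": ["factory", "flora", "flora", "flora", "factory"],
}

def mfs_accuracy(labels):
    """Accuracy of always predicting the most frequent sense."""
    mfs, count = Counter(labels).most_common(1)[0]
    return mfs, count / len(labels)

for word, labels in labeled.items():
    sense, acc = mfs_accuracy(labels)
    print(f"{word}: MFS = {sense!r}, baseline accuracy = {acc:.0%}")
```

Any proposed WSD system has to beat this number per word; skewed sense distributions are why the baseline is so strong.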
⬛⬛⬜⬜ Intermediate
References + Interview
MFS baseline · SemEval · IAA in WSD · Sense inventory
WSD Algorithms — Dictionary-Based & Corpus-Based Robust WSD (Topics 7–8)
07 — Lesk Algorithm & Dictionary-Based WSD (Theory + Code)
  • Lesk (1986): sense = definition with most overlap with context
  • Simplified Lesk vs Extended Lesk (Banerjee & Pedersen 2002)
  • WordNet gloss overlap counting
  • Context window size effect on accuracy
  • Evaluation against SemEval gold annotations
Libraries & Approach
nltk.corpus.wordnet · numpy
Implement Simplified Lesk from scratch. Implement Extended Lesk (using gloss + examples + hypernym glosses). Evaluate on a set of target words with known senses. Compare window sizes. Visualise overlap scores per sense.
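A from-scratch Simplified Lesk over a hand-written two-sense gloss inventory (the glosses and stopword list are illustrative; the notebook pulls real glosses from WordNet):

```python
# Toy gloss inventory (hand-written stand-ins for WordNet definitions)
glosses = {
    "bank_finance": "an institution that accepts deposits and lends money",
    "bank_river":   "sloping land beside a body of water such as a river",
}

STOPWORDS = {"a", "an", "the", "that", "and", "of", "as", "such", "beside"}

def simplified_lesk(context, glosses):
    """Pick the sense whose gloss shares the most content words
    with the context sentence (Lesk 1986, simplified form)."""
    ctx = {w.lower() for w in context.split()} - STOPWORDS
    best_sense, best_overlap = None, -1
    for sense, gloss in glosses.items():
        overlap = len(ctx & ({w for w in gloss.split()} - STOPWORDS))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(simplified_lesk("he sat on the bank of the river fishing", glosses))
print(simplified_lesk("she opened an account to deposit money at the bank", glosses))
```

Note how brittle raw overlap is: "deposit" in the context fails to match "deposits" in the gloss — lemmatisation and Extended Lesk's gloss expansion address exactly this.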
⬛⬛⬜⬜ Intermediate
References + Interview
Lesk 1986 · Gloss overlap · Dictionary-based · Extended Lesk · Allen Ch.7
08 — Yarowsky's Bootstrapping & One-Sense-Per-Discourse (Theory + Code)
  • Yarowsky (1995): unsupervised WSD with seed words
  • "One sense per collocation" and "one sense per discourse" principles
  • Decision list learning from collocations
  • Log-likelihood ratio for feature selection (Cover & Thomas §2)
  • Information gain — Mitchell (1997) Ch. 3
Libraries & Approach
nltk · numpy · collections
Implement simplified Yarowsky-style decision list for "bank" disambiguation. Use log-likelihood ratio for feature ranking. Apply one-sense-per-discourse rule as post-processing. Measure accuracy against labeled examples.
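A toy decision-list sketch in the spirit of Yarowsky (1995): collocation features ranked by smoothed log-likelihood ratio, trained on invented seed-labeled contexts for "bank":

```python
import math

# Toy sense-labeled contexts for "bank" (hypothetical seed-labeled data)
data = [
    ({"money", "deposit"}, "finance"),
    ({"money", "loan"}, "finance"),
    ({"loan", "interest"}, "finance"),
    ({"river", "water"}, "river"),
    ({"river", "fishing"}, "river"),
    ({"water", "muddy"}, "river"),
]

def build_decision_list(data, smoothing=0.1):
    """Rank collocation features by |log P(finance|f) / P(river|f)|
    (Yarowsky's log-likelihood criterion, add-alpha smoothed)."""
    feats = {f for ctx, _ in data for f in ctx}
    rules = []
    for f in feats:
        fin = sum(1 for ctx, s in data if f in ctx and s == "finance")
        riv = sum(1 for ctx, s in data if f in ctx and s == "river")
        llr = math.log((fin + smoothing) / (riv + smoothing))
        rules.append((abs(llr), f, "finance" if llr > 0 else "river"))
    return sorted(rules, reverse=True)

def classify(context, rules, default="finance"):
    # Fire the single highest-ranked rule whose feature is in the context
    for _, feat, sense in rules:
        if feat in context:
            return sense
    return default  # fall back to the most frequent sense

rules = build_decision_list(data)
print(classify({"loan", "approved"}, rules))
print(classify({"fishing", "rod"}, rules))
```

The "one rule fires" design is what makes decision lists interpretable: each prediction traces back to a single collocation, and the one-sense-per-discourse pass can then overrule low-confidence instances.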
⬛⬛⬛⬜ Advanced
References + Interview
Yarowsky 1995 · Decision list · One-sense-per-discourse · Log-likelihood · Bootstrapping
ML & Info Theory — Machine Learning Approach to WSD (Topic 9)
09 — ML-Based WSD: Naive Bayes, Decision Trees & SVM (Theory + Code)
  • WSD as supervised classification (Mitchell 1997 Ch. 6)
  • Feature engineering: surrounding words, POS tags, collocations
  • Naive Bayes for WSD (Gale, Church & Yarowsky 1992)
  • Decision tree features (Mitchell 1997 Ch. 3) — information gain
  • SVM + TF-IDF features for context-window classification
  • Information theoretic measures: entropy, mutual information (Cover & Thomas)
Libraries & Approach
sklearn · nltk · numpy · pandas · matplotlib
Supervised WSD on SemCor corpus samples for "bank" and "plant". Feature extraction pipeline: context window words, POS tags, local collocations. Compare NB vs Decision Tree vs SVM. Feature importance analysis. Cross-validation evaluation.
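A minimal supervised pipeline in the spirit of Gale, Church & Yarowsky (1992): bag-of-words context features plus Naive Bayes, on invented context windows (real SemCor samples would replace these):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy context windows for "bank" (hypothetical stand-ins for SemCor samples)
contexts = [
    "deposit money savings account interest",
    "loan mortgage interest rate credit",
    "cash withdrawal teller branch account",
    "river water fishing muddy shore",
    "flood water river overflow shore",
    "grassy river slope water edge",
]
senses = ["finance", "finance", "finance", "river", "river", "river"]

# Bag-of-words features from the context window + Naive Bayes classifier
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(contexts, senses)

print(model.predict(["interest on the mortgage loan"]))
print(model.predict(["fishing by the muddy shore"]))
```

Swapping `MultinomialNB()` for `DecisionTreeClassifier()` or `LinearSVC()` in the same pipeline gives the model comparison the topic calls for; `TfidfVectorizer` substitutes directly for `CountVectorizer` in the SVM variant.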
⬛⬛⬛⬜ Advanced
References + Interview
Mitchell Ch.3,6 · Cover & Thomas · Gale et al. 1992 · Information gain · Supervised WSD · SemCor
Connects Module 3 ML (SVM, DT) to semantic NLP. Mitchell Ch. 3 decision trees + Ch. 6 Naive Bayes directly apply here.

// Semantics & WSD — Historical Timeline

1957
Firth — Distributional Hypothesis
"You shall know a word by the company it keeps." Foundation of all distributional semantics, PMI, word2vec.
1969
Quillian — Semantic Networks
First computational lexical knowledge base. Concepts connected by IS-A, PART-OF relations. Direct ancestor of WordNet.
1986
Lesk Algorithm — Dictionary-Based WSD
Simple but foundational: overlap between dictionary glosses determines word sense. Still a competitive baseline in 2024.
1991
Hindle & Rooth — PP Attachment from Corpus
Statistical approach to prepositional phrase attachment using lexical association (ACL 1991; journal version 1993). A pioneering corpus-driven approach to structural ambiguity.
1991
Miller et al. — WordNet 1.0
Princeton English lexical database. Synonyms grouped into synsets connected by semantic relations. Most-used NLP lexical resource, still actively maintained.
1992
Gale, Church & Yarowsky — Supervised WSD
Naive Bayes for WSD using surrounding words as features. Established supervised WSD as a classification task. Mitchell Ch. 6 Bayes directly applies.
1994
Allen — "Natural Language Understanding" textbook
Comprehensive treatment of attachment, lexical semantics, WSD in computational context. Core reference for this module.
1995
Yarowsky — One-Sense-Per-Discourse Unsupervised WSD
Bootstrapping from seed collocations. Decision list with log-likelihood. 96% accuracy rivalling supervised systems. Information theory core.
1997
Mitchell — Machine Learning textbook
Decision trees (Ch. 3) and Naive Bayes (Ch. 6) directly applicable to WSD feature classification. Information gain as splitting criterion.
1998
Senseval / SemEval Competitions Begin
Shared tasks for WSD evaluation (Senseval-1 in 1998, renamed SemEval in 2007). Established standard benchmarks. MFS baseline (~60%) proved hard to beat with classical methods.
2019
BERT Contextual Embeddings — WSD solved?
Hadiwinoto & Ng (2019): BERT representations give ~80% F1 on all-words WSD. But classical methods still matter for low-resource and interpretable systems.

// Teaching Sequence

Topic | Category | Difficulty | Key Library | Builds On | Reference
01 — PP Attachment Analysis | Phrase Attachment | ⬛⬛⬜⬜ | nltk, spacy | Module 3 CFG | Allen Ch.4, Hindle & Rooth
02 — NP/VP/PP Chunking (IOB) | Phrase Attachment | ⬛⬛⬜⬜ | nltk, sklearn | Topic 01 | Allen Ch.4, CoNLL-2000
03 — WordNet: Synsets & Similarity | Lexical Relations | ⬛⬜⬜⬜ | nltk.corpus.wordnet | Module 2 Morphology | Allen Ch.7, Miller 1995
04 — PMI & Distributional Semantics | Lexical Relations | ⬛⬛⬜⬜ | numpy, sklearn | Topic 03 | Cover & Thomas §2.3
05 — Homonymy vs Polysemy Analysis | Homo/Polysemy | ⬛⬜⬜⬜ | nltk.corpus.wordnet | Topic 03 | Allen Ch.7 §7.2
06 — WSD Limitations & MFS Baseline | Homo/Polysemy | ⬛⬛⬜⬜ | nltk, pandas | Topics 03–05 | Allen Ch.7, SemEval
07 — Lesk Algorithm (Dict-Based WSD) | WSD Algorithms | ⬛⬛⬜⬜ | nltk.corpus.wordnet | Topics 05–06 | Lesk 1986, Allen Ch.7
08 — Yarowsky Decision List WSD | WSD Algorithms | ⬛⬛⬛⬜ | nltk, numpy | Topics 06–07 | Yarowsky 1995, Cover & Thomas
09 — ML WSD: NB + DT + SVM | ML Approach | ⬛⬛⬛⬜ | sklearn, nltk | All above | Mitchell Ch.3,6; Gale et al. 1992