MTech NLP · Module 2

Concepts That Need Coding

A map of every Module 2 topic that involves programming, with tasks, tools, difficulty levels, and a recommended teaching sequence.

9 Coding Topics
5 Categories
3 Difficulty Levels
Module 2 Morphology & LMs
TOPIC 01
Inflectional & Derivational Morphology
Morphology
⬛⬜⬜⬜ Beginner
Analyse English word forms: identify stems, affixes, inflections (plays → play + s), and derivations (happy → happiness). Use NLTK's WordNet interface and its morphy lemmatizer.
NLTK · WordNet · spaCy
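The stem-plus-affix split described for this topic can be sketched without any library. The suffix tables, the tag labels, and the `analyse` helper below are illustrative assumptions rather than NLTK's API, and the tables are far from complete.

```python
# Toy suffix tables (assumed for illustration only).
INFLECTIONAL = {"ing": "PROG", "ed": "PAST", "s": "PL/3SG"}
DERIVATIONAL = {"ness": "ADJ->N", "ly": "ADJ->ADV", "er": "V->N"}


def analyse(word):
    """Return (stem, suffix, kind, label) for the longest matching suffix."""
    tables = (("derivational", DERIVATIONAL), ("inflectional", INFLECTIONAL))
    candidates = [
        (suffix, kind, label)
        for kind, table in tables
        for suffix, label in table.items()
    ]
    # Try longer suffixes first so "ness" wins over "s".
    for suffix, kind, label in sorted(candidates, key=lambda c: -len(c[0])):
        # Require a stem of at least two characters.
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            stem = word[: -len(suffix)]
            # Undo the y -> i spelling change seen in happy -> happi + ness.
            if kind == "derivational" and stem.endswith("i"):
                stem = stem[:-1] + "y"
            return stem, suffix, kind, label
    return word, "", "none", ""


print(analyse("plays"))
print(analyse("happiness"))
```

A table-driven splitter like this over-strips freely (it treats the "s" in "bus" as a suffix), which is one motivation for checking results against a lexicon such as WordNet.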
TOPIC 02
Regular Expressions for NLP
Regex
⬛⬜⬜⬜ Beginner
Write Python re patterns for real NLP tasks: tokenization, date/email extraction, phone number normalization, and pattern-based morpheme detection.
Python re · NLTK RegexpTokenizer
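A minimal sketch of the tasks listed for this topic, using only the standard `re` module. The patterns, the sample sentence, and the `normalise_phone` helper are assumptions chosen for readability; production-grade email and date patterns need to be considerably stricter.

```python
import re

# Words (with an optional apostrophe part such as "don't") or single punctuation marks.
TOKEN_RE = re.compile(r"\w+(?:'\w+)?|[^\w\s]")
# Deliberately loose email pattern: local part, "@", dotted domain.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Numeric dates such as 12/08/2024 or 12-08-2024 (day/month order is not checked).
DATE_RE = re.compile(r"\b(\d{1,2})[/-](\d{1,2})[/-](\d{4})\b")


def normalise_phone(raw):
    """Strip everything except digits from a phone number string."""
    return re.sub(r"\D", "", raw)


text = "Mail asha@example.org before 12/08/2024, don't wait!"
tokens = TOKEN_RE.findall(text)
emails = EMAIL_RE.findall(text)
dates = DATE_RE.findall(text)

print(tokens)
print(emails, dates)
print(normalise_phone("(080) 555-0123"))
```

Note that the simple tokenizer splits the email address and the date into pieces, so entity extraction usually runs before, or instead of, naive tokenization.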
TOPIC 03
Finite Automata (DFA & NFA)
Automata
⬛⬛⬜⬜ Intermediate
Implement a Deterministic Finite Automaton (DFA) and a Non-deterministic Finite Automaton (NFA) in Python. Use them to recognise morphological patterns (e.g., valid English plurals).
Python (pure) · networkx · matplotlib
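One possible pure-Python shape for the two machines is shown below. The state names, the `ANY` wildcard label, and the two toy languages ("ends in s" for the DFA, "ends in es" for the NFA) are assumptions made for this sketch; they are crude stand-ins for real plural recognition.

```python
ANY = "<any>"  # wildcard arc label used by this sketch only


def dfa_accepts(word, delta, start, accepting):
    """Run a DFA whose arcs are keyed by (state, symbol class)."""
    state = start
    for ch in word:
        state = delta.get((state, "s" if ch == "s" else "other"))
        if state is None:  # missing transition means reject
            return False
    return state in accepting


def nfa_accepts(word, transitions, start, accepting):
    """Simulate an NFA by tracking the set of active states."""
    current = {start}
    for ch in word:
        following = set()
        for state in current:
            following |= transitions.get((state, ch), set())
            following |= transitions.get((state, ANY), set())
        current = following
    return bool(current & accepting)


# DFA: q1 is reached exactly when the last symbol read was "s".
ENDS_IN_S = {
    ("q0", "s"): "q1", ("q0", "other"): "q0",
    ("q1", "s"): "q1", ("q1", "other"): "q0",
}

# NFA: loop on q0, then guess that the final "es" has started.
ENDS_IN_ES = {
    ("q0", ANY): {"q0"},
    ("q0", "e"): {"q1"},
    ("q1", "s"): {"q2"},
}

print(dfa_accepts("cats", ENDS_IN_S, "q0", {"q1"}))
print(nfa_accepts("foxes", ENDS_IN_ES, "q0", {"q2"}))
```

The NFA needs only three arcs because it can stay in `q0` and move to `q1` on the same "e"; the set-of-states simulation is the same idea that underlies the subset construction.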
TOPIC 04
Finite State Transducers (FST)
FST
⬛⬛⬛⬜ Advanced
Implement an FST that maps between surface forms and underlying morphological forms, e.g., "foxes" ↔ "fox+N+PL". Understand input/output tape duality.
Python (pure) · graphviz · fst library (optional)
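The two-tape idea can be sketched with a plain list of arcs, each carrying a lexical label and a surface label. Everything named here (`build_fst`, `transduce`, the state names, the two-noun lexicon) is hypothetical, and the plural spelling rule is folded directly into the arcs as a shortcut rather than modelled as a separate rule transducer.

```python
EPS = ""  # empty label: consumes or emits nothing on that tape


def build_fst(nouns):
    """Return (arcs, finals); each arc is (src, lexical, surface, dst)."""
    arcs, finals = [], {"end"}
    for noun in nouns:
        prev = "start"
        for i, ch in enumerate(noun):
            nxt = f"{noun}:{i}"
            arcs.append((prev, ch, ch, nxt))  # stem letters are copied unchanged
            prev = nxt
        tagged = f"{noun}:N"
        arcs.append((prev, "+N", EPS, tagged))
        arcs.append((tagged, "+SG", EPS, "end"))
        plural = "es" if noun.endswith(("s", "x", "z", "ch", "sh")) else "s"
        arcs.append((tagged, "+PL", plural, "end"))
    return arcs, finals


def transduce(text, arcs, finals, start="start", to_surface=True):
    """Return every string the FST pairs with `text` in the chosen direction."""
    results = []
    stack = [(start, 0, "")]
    while stack:
        state, pos, out = stack.pop()
        if pos == len(text) and state in finals:
            results.append(out)
        for src, lex, surf, dst in arcs:
            # Swapping the tapes inverts the transducer.
            inp, outp = (lex, surf) if to_surface else (surf, lex)
            if src == state and text.startswith(inp, pos):
                stack.append((dst, pos + len(inp), out + outp))
    return results


arcs, finals = build_fst(["fox", "cat"])
print(transduce("fox+N+PL", arcs, finals))                # generation
print(transduce("foxes", arcs, finals, to_surface=False))  # analysis
```

The same arc list serves both directions, which is the tape duality in practice: generation reads the lexical tape and writes the surface tape, and analysis does the reverse.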
TOPIC 05
Morphological Parsing with FST
FST
⬛⬛⬛⬜ Advanced
Use the FST framework to parse words into morpheme sequences. Input: surface word → Output: lemma + morphological tags (POS, tense, number, person).
Python (pure) · NLTK · spaCy morphology
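The input/output contract of a morphological parser (surface word in, lemma plus tags out) can be illustrated with a stem lexicon and a suffix table. This is a lookup-based stand-in for a real transducer, not an FST itself; `STEMS`, `SUFFIXES`, the tag names, and `parse` are all assumptions made for the example.

```python
# Toy lexicon: stem -> parts of speech it can take (assumed entries).
STEMS = {"walk": {"N", "V"}, "talk": {"N", "V"}, "fox": {"N"}}

# Surface suffix -> morphological tags, per part of speech.
SUFFIXES = {
    "N": [("", "+SG"), ("s", "+PL"), ("es", "+PL")],
    "V": [("", "+INF"), ("s", "+PRES+3SG"), ("ed", "+PAST"), ("ing", "+PROG")],
}


def parse(word):
    """Return every analysis 'lemma+POS+tags' licensed by the tables."""
    analyses = []
    for split in range(1, len(word) + 1):
        stem, suffix = word[:split], word[split:]
        for pos in sorted(STEMS.get(stem, ())):
            for surface, tags in SUFFIXES[pos]:
                if surface == suffix:
                    analyses.append(f"{stem}+{pos}{tags}")
    return analyses


print(parse("walks"))
print(parse("foxes"))
print(parse("walked"))
```

The two analyses returned for "walks" show why a parser returns a list: morphology alone cannot choose between the noun and verb readings. The tables also accept the misspelling "foxs", which is the kind of over-generation that orthographic rules in a full FST cascade are meant to block.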
TOPIC 06
Lexicon-Free FST
FST
⬛⬛⬛⬜ Advanced
Build a morphological analyser that works without a predefined word list, using only the phonological and orthographic rules encoded in the transducer itself.
Python (pure) · re
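Without a word list, the analyser has only spelling patterns to go on, which can be approximated with ordered rewrite rules. The rule set, the tag names, and the `guess` helper below are assumptions for illustration; ordered regex rules are a simplification of what a rule-encoding transducer would do.

```python
import re

# (pattern, stem template, tag); rules are tried in order, first match wins.
RULES = [
    (re.compile(r"^(.+[^aeiou])ies$"), r"\1y", "+PL"),       # ponies -> pony
    (re.compile(r"^(.+(?:s|x|z|ch|sh))es$"), r"\1", "+PL"),  # boxes -> box
    (re.compile(r"^(.+[^s])s$"), r"\1", "+PL"),              # wugs -> wug
    (re.compile(r"^(.+)ing$"), r"\1", "+PROG"),
    (re.compile(r"^(.+)ed$"), r"\1", "+PAST"),
]


def guess(word):
    """Return (stem, tag) from the first rule that fires, else (word, '')."""
    for pattern, template, tag in RULES:
        match = pattern.match(word)
        if match:
            return match.expand(template), tag
    return word, ""


for nonce in ["wugs", "blorxes", "zorbies", "glorping", "glass"]:
    print(nonce, guess(nonce))
```

Because nothing checks the result against a lexicon, the rules work on made-up words such as "wugs" but also mis-analyse real ones (for example, "bring" is read as "br" plus "ing").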
TOPIC 07
Porter Stemmer
Morphology
⬛⬜⬜⬜ Beginner
Implement the Porter Stemmer algorithm from scratch (5 rule phases). Compare with NLTK's built-in implementation and with Snowball/Lancaster stemmers.
Python (pure) · NLTK · re
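As a starting point for the from-scratch implementation, the sketch below covers only the measure m, Step 1a, and the first rule of Step 1b; the remaining steps follow the same condition-plus-rewrite pattern. The function names are this sketch's own, and the output has not been aligned with NLTK's variant of the algorithm.

```python
def is_consonant(word, i):
    """Porter's definition: 'y' is a consonant only when not preceded by one."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem):
    """Count vowel-to-consonant transitions, i.e. m in the form [C](VC)^m[V]."""
    shape = "".join(
        "C" if is_consonant(stem, i) else "V" for i in range(len(stem))
    )
    return shape.count("VC")


def step_1a(word):
    """Plural endings: SSES -> SS, IES -> I, SS -> SS, S -> (nothing)."""
    if word.endswith("sses"):
        return word[:-2]
    if word.endswith("ies"):
        return word[:-2]
    if word.endswith("ss"):
        return word
    if word.endswith("s"):
        return word[:-1]
    return word


def step_1b_eed(word):
    """First rule of Step 1b only: (m > 0) EED -> EE."""
    if word.endswith("eed") and measure(word[:-3]) > 0:
        return word[:-1]
    return word


for w in ["caresses", "ponies", "caress", "cats"]:
    print(w, "->", step_1a(w))
print(measure("troubles"), step_1b_eed("agreed"), step_1b_eed("feed"))
```

Stems such as "poni" are expected: a stemmer aims for a consistent key per word family, not a dictionary word, which is a useful point of contrast when comparing against a lemmatizer.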
TOPIC 08
N-Gram Language Model
N-Grams
⬛⬛⬜⬜ Intermediate
Build Unigram, Bigram, and Trigram language models from a corpus. Compute MLE probabilities, apply Laplace smoothing, and calculate sentence perplexity.
Python (pure) · NLTK · collections · numpy
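A bigram version of the pipeline (counts, MLE, Laplace smoothing, perplexity) fits in a few lines of standard-library Python. The three-sentence corpus, the `<s>` and `</s>` boundary markers, and the function names are assumptions for this sketch; the vocabulary size used for smoothing excludes `<s>` because it is never predicted, which is one common convention among several.

```python
import math
from collections import Counter

corpus = [
    ["i", "like", "nlp"],
    ["i", "like", "parsing"],
    ["you", "like", "nlp"],
]


def train_bigram(sentences):
    """Count unigrams and bigrams over sentences padded with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def prob(prev, word, unigrams, bigrams, k=0.0):
    """P(word | prev): k=0 is the MLE, k=1 is Laplace (add-one) smoothing."""
    vocab_size = len(unigrams) - 1  # every type except <s>
    return (bigrams[(prev, word)] + k) / (unigrams[prev] + k * vocab_size)


def perplexity(sentence, unigrams, bigrams, k=1.0):
    """Per-token perplexity of one sentence under the smoothed bigram model."""
    tokens = ["<s>"] + sentence + ["</s>"]
    pairs = list(zip(tokens, tokens[1:]))
    log_prob = sum(math.log(prob(p, w, unigrams, bigrams, k)) for p, w in pairs)
    return math.exp(-log_prob / len(pairs))


unigrams, bigrams = train_bigram(corpus)
print(prob("i", "like", unigrams, bigrams))         # MLE
print(prob("i", "like", unigrams, bigrams, k=1.0))  # Laplace
print(perplexity(["i", "like", "nlp"], unigrams, bigrams))
print(perplexity(["nlp", "like", "i"], unigrams, bigrams))
```

With `k=0` an unseen bigram has probability zero and an unseen context divides by zero, so perplexity is only defined here for the smoothed model; that failure is a useful way to motivate smoothing in class.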
TOPIC 09
N-Gram Spelling Correction
N-Grams
⬛⬛⬜⬜ Intermediate
Implement a noisy channel spelling corrector combining an edit-distance error model with an n-gram language model. Rank candidates by P(word) × P(typo | word), the language-model prior times the error-model likelihood.
Python (pure) · NLTK · difflib · numpy
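The ranking rule can be sketched with a unigram prior standing in for the n-gram language model and a two-valued error model. The tiny word-count corpus, the 0.95 and 0.05 channel probabilities, and the function names are all assumed values for illustration; a fuller version would multiply in P(word | previous word) and estimate the channel from real typo data.

```python
from collections import Counter

# Toy corpus used as the language model (assumed for this sketch).
WORD_COUNTS = Counter(
    "the cat sat on the mat the cart was at the car park".split()
)
TOTAL = sum(WORD_COUNTS.values())
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings one delete, transpose, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in LETTERS]
    inserts = [l + c + r for l, r in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def channel_prob(typo, word):
    """Crude error model: P(typo | word) depends only on edit distance 0 or 1."""
    return 0.95 if typo == word else 0.05


def correct(typo):
    """Pick the known word maximising P(word) * P(typo | word)."""
    candidates = {w for w in {typo} | edits1(typo) if w in WORD_COUNTS}
    if not candidates:
        return typo  # nothing within one edit: leave the input unchanged
    return max(
        candidates,
        key=lambda w: (WORD_COUNTS[w] / TOTAL) * channel_prob(typo, w),
    )


print(correct("teh"), correct("mmat"), correct("xyzzy"))
```

When two candidates have equal counts the unigram prior cannot separate them, which is the point at which bigram context starts to matter.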

Recommended Teaching Sequence

1. Topic 02 · Regex (start here)
2. Topic 01 · Morphology (concepts)
3. Topic 07 · Porter Stemmer (apply it)
4. Topic 03 · Finite Automata (theory)
5. Topic 04 · FST (extend FA)
6. Topic 05 · Morphological Parsing (apply FST)
7. Topic 06 · Lexicon-Free FST (extend)
8. Topic 08 · N-Gram LM (statistics)
9. Topic 09 · Spelling Correction (apply LM)

| TOPIC | CATEGORY | PRIMARY LIBRARY | DIFFICULTY | BUILDS ON |
|---|---|---|---|---|
| 01 · Morphology Analysis | Morphology | NLTK, WordNet | ⬛⬜⬜⬜ | Module 1 Pipeline |
| 02 · Regular Expressions | Regex | Python re | ⬛⬜⬜⬜ | Python basics |
| 03 · Finite Automata | Automata | Pure Python | ⬛⬛⬜⬜ | Topic 02 (Regex) |
| 04 · Finite State Transducers | FST | Pure Python | ⬛⬛⬛⬜ | Topic 03 (FA) |
| 05 · Morphological Parsing | FST | NLTK + spaCy | ⬛⬛⬛⬜ | Topics 01 + 04 |
| 06 · Lexicon-Free FST | FST | Pure Python | ⬛⬛⬛⬜ | Topic 05 |
| 07 · Porter Stemmer | Morphology | NLTK | ⬛⬜⬜⬜ | Topics 01 + 02 |
| 08 · N-Gram Language Model | N-Grams | NLTK + NumPy | ⬛⬛⬜⬜ | Module 1 basics |
| 09 · N-Gram Spell Correction | N-Grams | NLTK + difflib | ⬛⬛⬜⬜ | Topic 08 |