The question
Sarcasm is an interesting mode of language because it resists literal reading. Strip away tone and facial expressions, and from text alone it becomes much harder to tell whether something is sarcastic. I wanted to know whether a character-level RNN could generate sarcasm using nothing but sequential text.
Sarcasm might be one of the most context- and memory-dependent modes of language. For a model to generate it, it would have to learn patterns that hint at subtle ironic intent. That's what made the architecture comparison interesting.
What I built
I filtered the News Headlines Dataset for Sarcasm Detection down to sarcastic-only entries sourced from The Onion, around 26,000 headlines. Then I trained three character-level recurrent architectures on the same data: a vanilla RNN, an LSTM, and a GRU, each generating text one character at a time.
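The repo's exact loading code isn't shown here, but filtering the dataset down to sarcastic entries might look like the sketch below. The field names `is_sarcastic` and `headline` follow the dataset's published JSON-lines schema; the inline sample records are invented for illustration.

```python
import json

def load_sarcastic(lines):
    """Keep only headlines labeled sarcastic (is_sarcastic == 1)."""
    headlines = []
    for line in lines:
        record = json.loads(line)
        if record.get("is_sarcastic") == 1:
            headlines.append(record["headline"])
    return headlines

# Tiny inline sample in the dataset's JSON-lines format (made up for the demo).
sample = [
    '{"is_sarcastic": 1, "headline": "area man considers options", "article_link": "x"}',
    '{"is_sarcastic": 0, "headline": "markets close higher", "article_link": "y"}',
]
print(load_sarcastic(sample))  # → ['area man considers options']
```

In practice you'd pass the lines of the dataset file instead of the inline sample.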
The vanilla RNN was the baseline. I figured the vanishing gradient problem would hurt performance specifically on sarcasm, since irony often relies on context established earlier in the sentence. The LSTM addresses this with gates that decide what's remembered, what's forgotten, and what's passed forward. The GRU does the same but combines the forget and input gates into a single update gate, making it faster to train.
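To make the gate comparison concrete, here is a minimal NumPy sketch of a single GRU step. This is my own illustration, not the repo's code; the parameter layout and the toy character encoding are assumptions made for the demo.

```python
import numpy as np

def gru_step(x, h, W, U, b):
    """One GRU step. W, U, b hold stacked parameters for the update (z),
    reset (r), and candidate (n) transforms, each of hidden size H."""
    H = h.shape[0]
    seg = lambda M, i: M[i * H:(i + 1) * H]      # slice out gate i's parameters
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sigmoid(seg(W, 0) @ x + seg(U, 0) @ h + seg(b, 0))        # update gate
    r = sigmoid(seg(W, 1) @ x + seg(U, 1) @ h + seg(b, 1))        # reset gate
    n = np.tanh(seg(W, 2) @ x + r * (seg(U, 2) @ h) + seg(b, 2))  # candidate state
    # A single z both "forgets" old state and "inputs" the candidate —
    # the two jobs the LSTM assigns to separate gates.
    return (1 - z) * n + z * h

rng = np.random.default_rng(0)
E, H = 8, 16  # input (char-feature) size and hidden size — illustrative only
W = rng.normal(size=(3 * H, E))
U = rng.normal(size=(3 * H, H))
b = np.zeros(3 * H)
h = np.zeros(H)
for ch in "area man":  # toy encoding: each char as a scaled constant vector
    x = np.full(E, ord(ch) / 128.0)
    h = gru_step(x, h, W, U, b)
print(h.shape)  # → (16,)
```

Because the update gate `z` interpolates between the old state and the candidate, the hidden state stays bounded in [-1, 1], which is part of what keeps gradients better behaved than in a vanilla RNN.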
What I found
The LSTM and GRU both outperformed the vanilla RNN, which wasn't surprising — the gating mechanisms help them retain structure over longer sequences. The generated headlines were often grammatically plausible and occasionally genuinely funny, which says something about how much of The Onion's style lives in structure and rhythm rather than content. The model learned things like "Area Man" constructions and the specific deadpan framing without being told what any of those things meant.
Try it yourself
The code and trained models are on GitHub. To generate headlines, clone the repo and run:
python generate.py models/lstm_256_2_0.005.pt --prime_str "Area Man" --temperature 0.65 --predict_len 60

Swap the prime string, temperature, and length to see how the outputs change.
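The temperature flag reshapes the model's next-character distribution before sampling: lower values sharpen it toward the most likely character, higher values flatten it toward noise. A minimal sketch of that mechanism (my own illustration, not the repo's `generate.py` internals):

```python
import numpy as np

def sample_char(logits, temperature=0.65):
    """Sample an index from logits after temperature scaling.
    Lower temperature sharpens the distribution; higher flattens it."""
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    return np.random.choice(len(probs), p=probs)

logits = [2.0, 1.0, 0.1]  # made-up scores over a 3-character vocabulary
# At a very low temperature the top-scoring character dominates almost surely.
picks = [sample_char(logits, temperature=0.05) for _ in range(20)]
print(picks.count(0))  # almost always 20
```

At temperature 1.0 the sampler follows the model's raw distribution; at 0.65, as in the command above, it leans toward likelier characters while still allowing variation.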