Text summarization, topic models and RNNs

I’ve recently given a couple of talks (PyGotham video, PyGotham slides, Strata NYC slides) about text summarization.

I cover three ways of automatically summarizing text. One is an extremely simple extractive algorithm from the 1950s, one uses Latent Dirichlet Allocation, and one uses skip-thought vectors and recurrent neural networks.
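To give a flavour of the 1950s approach: it boils down to scoring each sentence by the frequency of its significant words and keeping the highest-scoring sentences. Here is a minimal sketch in that spirit (the stopword list and scoring are simplified illustrations, not the original algorithm's exact details):

```python
# A toy extractive summarizer in the spirit of the 1950s
# frequency-based approach: score each sentence by the average
# corpus frequency of its non-stopword words, keep the top ones.
from collections import Counter
import re

# Tiny illustrative stopword list (a real system would use a fuller one).
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "it", "that", "as"}


def summarize(text, num_sentences=1):
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = [w for w in re.findall(r"[a-z']+", text.lower())
             if w not in STOPWORDS]
    freq = Counter(words)

    def score(sentence):
        toks = [w for w in re.findall(r"[a-z']+", sentence.lower())
                if w not in STOPWORDS]
        return sum(freq[w] for w in toks) / (len(toks) or 1)

    ranked = set(sorted(sentences, key=score, reverse=True)[:num_sentences])
    # Emit the selected sentences in their original document order.
    return " ".join(s for s in sentences if s in ranked)
```

For example, `summarize("Cats sleep a lot. Cats chase mice and cats purr. Dogs bark.", 1)` picks the sentence densest in frequent words. Because it only copies sentences out of the source, this is extraction, not abstraction.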

The talk is conceptual, and avoids code and mathematics. So here is a list of resources if you’re interested in text summarization and want to dive deeper.

This list is hopefully also useful if you’re interested in topic modelling or neural networks for other reasons.

Topic modelling

Now that LDA is available in scikit-learn, it’s tempting to skip the mathematics of topic modelling entirely. But don’t. At least cover these two resources.

Neural networks

If you Google “neural networks” or, god forbid, “deep learning”, you will be overwhelmed with a torrent of bullshit. The first four items below make a great introductory reading/watching list. They’re hype-free and they’re by technical people who know what they’re talking about. If you cover those four, you’ll be well placed.