In this collection of projects I did for various classes, I present the use of character-level RNNs and LSTMs in different language modeling tasks.
These were inspired by Andrej Karpathy's blog post in which he used character-RNNs to produce text in the style of Shakespeare, Wikipedia articles and more.
Recent years have seen an increase in the creation of artistic productions such as movie screenplays, paintings and music by deep learning models called Artificial Neural Networks (ANNs, or simply neural networks), which were inspired by the human brain. A neural network enables a computer to learn tasks such as image recognition and audio and text generation through training. It consists of layers of small processing units called neurons, which are densely interconnected to accept, represent and output information.

Recurrent Neural Networks (RNNs)

Over the years, many different kinds of neural networks have been developed to target different tasks. One such ANN is the Recurrent Neural Network, which is used to process sequential data such as text thanks to its unique ability to ‘remember’ previous inputs by means of an internal loop and non-linear dynamics.

This means that it can ‘remember’ and create language structures: it can form sentences by remembering sequences of words, and form words by remembering sequences of characters. The latter is called a character-RNN, as it uses characters as inputs to produce words.

RNNs are an interesting topic in machine learning and natural language processing due to their somewhat magical ability to learn language structure based simply on sequences and predictions without explicit definitions of grammar.
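
As a concrete (if simplified) sketch of the idea, here is what a character-level RNN can look like in PyTorch. This is purely illustrative and not the exact framework or architecture used in the projects below: characters are mapped to indices, embedded, passed through a recurrent layer that carries a hidden state from one step to the next (the ‘internal loop’), and projected back onto the vocabulary to score the next character.

```python
import torch
import torch.nn as nn

class CharRNN(nn.Module):
    """Minimal character-level RNN: scores the next character given the previous ones."""
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)            # character index -> vector
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)  # the internal loop
        self.fc = nn.Linear(hidden_dim, vocab_size)                 # hidden state -> next-char scores

    def forward(self, x, hidden=None):
        # x: (batch, seq_len) tensor of character indices
        emb = self.embed(x)
        out, hidden = self.rnn(emb, hidden)  # hidden carries the 'memory' from step to step
        return self.fc(out), hidden
```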


Long Short Term Memory (LSTM)

Traditional RNNs work in a looping manner: the outputs of the network are fed back into it as inputs. This allows information to persist, making them very good at generating sequential data because past information can be incorporated into the generation of new information. However, as the gap grows between the past information that is needed and the point where it must be used, traditional RNNs fail to perform as well.

Long Short Term Memory models are special RNNs that show superior performance to vanilla RNNs on text generation and language modeling tasks.
LSTMs make up for the shortcomings of traditional RNNs: their architecture allows them to retain information for longer periods, letting the model connect information that was input much further in the past.
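
As an equally rough sketch (again in PyTorch, and again not the exact models from my projects), the character-level setup stays the same and only the recurrent layer changes: an LSTM layer adds a gated cell state that lets information survive across much longer gaps in the sequence.

```python
import torch
import torch.nn as nn

class CharLSTM(nn.Module):
    """Character-level LSTM: like the RNN above, but with gates that retain information longer."""
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=256, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, state=None):
        # state is a (hidden, cell) pair; the cell state is what allows the LSTM
        # to carry information across long stretches of text
        out, state = self.lstm(self.embed(x), state)
        return self.fc(out), state
```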

Language Modeling Projects
In this post, I have included outputs from five distinct datasets that I used to generate text:

Donald Trump Tweets
Quotes
Shakespeare's Works
Harry Potter Novels
NumPy Library
Donald Trump Tweets
I did this project for my Advanced Machine Learning Methods class in 2018.
You can view the complete project here.

Tweets generated by the model. What is interesting is that it tried to generate its own links and hashtags.

In this project I explored the generation of tweets using Long Short Term Memory (LSTM) models trained on Twitter data from Donald Trump. In terms of models, I explored different optimization methods and batch sizes for character-level LSTMs, and I also tried a word-level model to see whether the generated text would be better in terms of semantics, spelling and grammar.
I found that character-level models seemed to work better for this particular dataset and that the Adam optimizer generated text that made the most sense.
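
For reference, a character-level training loop along these lines might look like the sketch below. The hyperparameters shown (learning rate, number of epochs, gradient clipping value) are placeholders rather than the settings I actually compared in the project.

```python
import torch
import torch.nn as nn

def train(model, batches, vocab_size, epochs=10, lr=2e-3):
    """batches yields (x, y) index tensors of shape (batch_size, seq_len), with y shifted by one character."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # Adam gave the most coherent text here
    for _ in range(epochs):
        for x, y in batches:
            optimizer.zero_grad()
            logits, _ = model(x)
            loss = criterion(logits.reshape(-1, vocab_size), y.reshape(-1))
            loss.backward()
            nn.utils.clip_grad_norm_(model.parameters(), 5.0)  # keep recurrent gradients in check
            optimizer.step()
```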
Quotes 
I did this project for my Neural Networks and Deep Learning class in 2018.
You can view the complete project here.

Some interesting quotes generated from different architectures and hyperparameters. The underlined phrases are the seeds used; the words that follow were produced by the model.

In this project I used character-level LSTMs on a quotes dataset that I extracted from GitHub to produce similar quotes. I thought it would be interesting to see whether feeding in quotes from different people would enable the model to generate wisdom that combined all the knowledge represented in the dataset.
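
Generation from a seed works roughly as in the sketch below: the seed phrase is fed through the trained model to warm up its hidden state, and then characters are sampled one at a time. The names here (generate, char2idx, idx2char) are illustrative, not the actual code from the project.

```python
import torch

def generate(model, seed, char2idx, idx2char, length=200, temperature=0.8):
    """Warm the model up on a seed phrase, then sample one character at a time."""
    model.eval()
    x = torch.tensor([[char2idx[c] for c in seed]])
    out_text = seed
    with torch.no_grad():
        logits, state = model(x)                     # run the whole seed through the model
        for _ in range(length):
            probs = torch.softmax(logits[0, -1] / temperature, dim=-1)
            idx = torch.multinomial(probs, 1).item() # sample rather than argmax, for variety
            out_text += idx2char[idx]
            logits, state = model(torch.tensor([[idx]]), state)
    return out_text
```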
One of my personal favorites was:
Shakespeare's Works
I did this project for my Image Recognition class in 2018.
You can view the complete project here.

The above samples do not make much sense grammatically or semantically. While they do follow the format of a dialog, their contents are not indicative of a conversation the characters are having with one another; rather, each piece of dialog seems to belong to a separate conversation. (This may actually pass off as artistic, though!) However, the output from this particular model was interesting to me because its writing style is similar to the iambic pentameter used by Shakespeare.

With a different model,

While there are still some spelling errors and some random words, this model has learned the dialog format of Shakespeare’s plays pretty well. There are indications of overfitting: the model has memorized the names of the characters (Duchess of York, Romeo, King Lewis, etc.) and produced phrases such as “I do beseech you, sir”. However, these could just be due to the sampling temperature and not due to overfitting. Overall, there is definitely some improvement in the language and format of the dialog compared with model 1.
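
For context, the sampling temperature just rescales the model’s next-character scores before sampling: a lower temperature sharpens the distribution, which can make the output look memorized, while a higher one flattens it and introduces more randomness and errors. A tiny illustration (the scores are made up):

```python
import torch

logits = torch.tensor([2.0, 1.0, 0.1])         # hypothetical next-character scores
for t in (0.5, 1.0, 1.5):
    probs = torch.softmax(logits / t, dim=-1)  # lower t -> sharper, more conservative sampling
    print(t, [round(p, 3) for p in probs.tolist()])
```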

Harry Potter Novels
I did this project for my Image Recognition class in 2018.
You can view the complete project here.
It is evident from the samples that while the language and spelling are not that great (although the model cannot be blamed entirely for this, since Harry Potter contains a lot of made-up words), the dialog makes more sense than that generated using the Shakespeare dataset.
I decided to be adventurous and use a larger dataset of Harry Potter novels (1, 4, 6 and 7) to see if there would be language-level improvements in grammar, spelling, punctuation, etc. given a larger training set. I kept the model parameters the same, since I just wanted to isolate the effect of dataset size.

However, since the length of the sentences and the format of the text was different from that of Shakespeare, I increased the sequence length to 100. I also used a different prime, ‘Harry’.
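
Concretely, the change amounts to cutting the corpus into longer input/target windows and starting generation from a different prime. The helper below is a hypothetical sketch of that preparation step, not the project’s actual code.

```python
# Cut the encoded corpus into windows of 100 characters, matching the longer
# sentences in the Harry Potter text; targets are the inputs shifted by one character.
def make_windows(encoded_text, seq_len=100):
    windows = []
    for i in range(0, len(encoded_text) - seq_len - 1, seq_len):
        x = encoded_text[i : i + seq_len]
        y = encoded_text[i + 1 : i + seq_len + 1]
        windows.append((x, y))
    return windows

# Generation would then start from the prime 'Harry', e.g. generate(model, "Harry", char2idx, idx2char)
```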
One of my personal favorites was:
NumPy code
I did this project for my Image Recognition class in 2018.
You can view the complete project here.

This shows the model’s attempt at generating a function. 

This shows more interesting output. I compiled a few examples from what my model generated, as I was interested in seeing how the generated code would call functions. While there is some overfitting (the model outputs exact calls from the input text), there are some interesting calls, such as (12), wherein the model has created a new function called ‘are’ and passed parameters to it. While the calls may not be syntactically correct, it’s a pretty good attempt.
Here, I extracted some Python code snippets from the open-source NumPy library and trained a similar model on this new data. I chose the NumPy library since Python is closer to natural language than most other programming languages, and I thought this might generate interesting content.
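
Assembling such a corpus might look something like the sketch below; the directory name and helper are hypothetical, and the idea is simply to concatenate the library’s .py files into one long training string for the same character-level LSTM.

```python
from pathlib import Path

def load_code_corpus(root="numpy"):
    """Concatenate every .py file under a local clone of the library into one training string."""
    text = ""
    for path in sorted(Path(root).rglob("*.py")):
        text += path.read_text(encoding="utf-8", errors="ignore") + "\n"
    return text
```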
