Memory and context play a huge role in helping humans interpret the world. If you encounter a word like “spring” while reading, you don’t usually have to ask yourself whether it’s a verb meaning “to jump,” or a noun referring to the season after winter, or another noun referring to a coil of metal, because the context of the sentence makes it clear. But as with many tasks, what’s effortless for humans can be incredibly difficult for computers.
We’ve spent a good deal of time looking at various types of neural networks and their applications. We started with feedforward networks and their ability to sort and label objects based on shared characteristics, and then we explored convolutional networks, which are particularly well-suited to decoding images. These types of networks fall short, however, when it comes to processing sequences of inputs where interpreting each element builds on the previous one.
Fortunately, there’s another type of neural network that solves this very problem. Recurrent neural networks (RNNs) are ideal for processing sequences and lists, making them essential for many advanced natural language processing (NLP) tasks. They’re the secret sauce behind Google’s revamped Google Translate program, and they’re becoming increasingly important to a number of deep learning tasks, from machine translation, to natural language generation, to computer vision.
The Shortcomings of Feedforward Networks
Before we cover what makes recurrent networks special, let’s take a brief look again at a standard feedforward neural network. Generally, a fixed input leads to a fixed output. For example, you input a series of photographs, and for each one the network outputs whether a person is present. Another example: you might give the network a number of suspicious-looking financial transactions, and the network might give you the likelihood that each of those transactions is fraudulent. In both examples, the neural network assumes each input and output to be independent, i.e., whether a given transaction looks fishy has no bearing on whether the next one will.
This assumption works fine for many different tasks, but in other situations it’s a huge problem. Let’s take an important area of AI: machine translation. How do you teach a computer to understand the sentence: “I asked the tourism bureau for a complimentary guide”? Even if your algorithm understands the sentence at a grammatical level, it may have no idea how to choose between the various possible definitions for many of the individual words.
This was the case for many years at Google, where their translation service relied on “phrase-based statistical machine translation,” an algorithm that takes every word or phrase of a sentence in the source language and tries to find the most direct translation for it in the target language. In this approach, each word or phrase is treated totally separately. This leads to lots of situations where the translation might be technically accurate, but also absurd or confusing, especially with words that have multiple definitions. Thus, a sentence like “I asked the tourism bureau for a complimentary guide” might come out something like “I asked the exploration wardrobe for a polite escort.”
Adding Important Context
Obviously, to understand a word in a sentence, you have to understand the words that precede (and sometimes follow) it. In AI parlance, the standard neural networks can’t do this because they lack persistence, that is, the ability to accrue and retain information across inputs. Practically what that means is that these networks aren’t able to put any individual input or output in context, frequently leading them to miss the forest for the trees.
What makes RNNs “recurrent” is that they take into account all the preceding inputs when determining an output value. In a standard neural network, an input x goes through a hidden layer, where it’s transformed into the output y. In an RNN, though, the output y isn’t based on the input x alone; it also takes into account every x that preceded it. The precise techniques and math involved can get a little complicated, but for our purposes it’s enough to say that the network creates a “memory” of prior inputs that it keeps in mind as it moves through the sequence.
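That “memory” can be made concrete with a minimal sketch of a single recurrent step. This is an illustrative toy in NumPy, not code from any particular framework; the weight names (W_xh, W_hh, b_h) and sizes are assumptions chosen for the example.

```python
import numpy as np

def rnn_step(x, h_prev, W_xh, W_hh, b_h):
    """One recurrent step: the new hidden state depends on the current
    input x AND on h_prev, which summarizes every input seen so far."""
    return np.tanh(x @ W_xh + h_prev @ W_hh + b_h)

rng = np.random.default_rng(0)
input_size, hidden_size = 4, 3
W_xh = rng.normal(size=(input_size, hidden_size)) * 0.1  # input-to-hidden weights
W_hh = rng.normal(size=(hidden_size, hidden_size)) * 0.1  # hidden-to-hidden weights
b_h = np.zeros(hidden_size)

# Process a sequence of 5 input vectors, carrying the hidden state forward,
# so the final h reflects the entire sequence, not just the last input.
h = np.zeros(hidden_size)
for x in rng.normal(size=(5, input_size)):
    h = rnn_step(x, h, W_xh, W_hh, b_h)
```

The key design point is the loop: the same weights are reused at every step, and only the hidden state h changes as the sequence is consumed.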
Thus, when our RNN gets to the words “complimentary guide” it will take into account that the context is a tourism bureau, making it more likely to deduce that we’re talking about a free brochure rather than a courteous attendant.
We’ve looked at the problem of machine translation, but what are some other areas where RNNs are frequently used? The generalizability of RNNs makes them well suited to a variety of tasks involving sequential data, whether in the form of sentences or videos. It’s worth noting that even fixed-size inputs, like single images, can be processed in a sequential manner.
- Natural Language Generation. We’ve focused on machine translation in this article, but RNNs can be applied to many different NLP tasks. Natural language generation in particular benefits from the increased flexibility and expressivity that RNNs can facilitate. Natural language generating systems can be used to create reports and summarize data from complex systems without needing a human to intervene and interpret the results. It’s even been shown to generate not just syntactically correct natural language, but also valid (if unusual looking) XML and HTML, and even diagrams in LaTeX.
- Video classification and captioning. As the number of videos online continues to proliferate, the task of classifying and describing these videos becomes both more important and more difficult. The flexibility of RNNs to handle variable inputs and outputs makes them well suited to generating descriptions of videos based on salient features. In the case of computer vision, RNNs can also be combined with convolutional neural networks.
- Sentiment analysis. Sentiment analysis is widely used in both business intelligence and recommendation engines. Knowledge-based techniques for sentiment analysis consist of assigning positive or negative values to words and then searching for them in a text. However, these techniques can miss important contextual signals: whether a movie reviewer thought the film was “bad” or was just describing a character who did a “bad” thing is not something these techniques can easily distinguish. RNNs, however, can place these terms in context and potentially provide more accurate classifications.
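The context blindness of the knowledge-based approach in the last bullet is easy to demonstrate. Here is a toy sketch: the lexicon is hypothetical and far smaller than any real one, and the scoring rule is deliberately the naive sum-of-polarities method described above.

```python
# Hypothetical polarity lexicon (a real one would have thousands of entries).
LEXICON = {"good": 1, "great": 2, "bad": -1, "terrible": -2}

def lexicon_score(text):
    """Sum the polarity of every known word; unknown words count as 0.
    No context is considered: negation and attribution are invisible."""
    words = text.lower().replace(".", "").split()
    return sum(LEXICON.get(w, 0) for w in words)

print(lexicon_score("The acting was good"))   # 1
# Scores -1 even though the sentence is mild praise: the "not" is ignored.
print(lexicon_score("The film was not bad"))  # -1
```

An RNN reading the second sentence word by word would carry the “not” forward in its hidden state, which is exactly the contextual signal this lookup-based scorer throws away.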
Challenges for RNNs
- What if a sequence is really long? Theoretically, an RNN could remember arbitrarily long sequences, but in practice memory and processing constraints limit the number of steps the network can remember. This can be a problem in particularly long sentences, where understanding the word at the end of the sentence requires remembering the first word. (This can be especially tricky in languages like German, where the verb often comes at the end of the sentence.) These are called long-term dependencies, and managing them is a particular challenge for RNNs. Fortunately, there’s a special variation of RNN called Long Short-Term Memory networks (LSTMs) that addresses this very problem.
- What happens if a given input depends on something that comes after? Consider the phrase “chest of drawers.” It may not be clear to the algorithm that “chest” here refers to a piece of furniture and not a human torso until it reaches the word “drawers.” So how is it supposed to know? In this case, we can use another variation on the RNN called a Bidirectional RNN. These networks are essentially two RNNs stacked on top of one another, running in opposite directions. In these networks, the output is calculated based on the memory of both RNNs. Thus, the output for the word “chest” will take into account both the words preceding it, as well as the words “of drawers” that follow, making it much more likely to settle on the correct noun.
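The LSTM mentioned in the first challenge above handles long-term dependencies with gates that control what the memory keeps, writes, and exposes at each step. Below is a minimal sketch of one LSTM step in NumPy; the weight names and the convention of packing all four gates into one matrix are illustrative assumptions, not any library’s API.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. The cell state c is the long-term memory; the gates
    decide what to forget, what to write, and what to expose."""
    z = x @ W + h_prev @ U + b       # all four gate pre-activations at once
    H = h_prev.shape[0]
    f = sigmoid(z[:H])               # forget gate: how much old memory to keep
    i = sigmoid(z[H:2*H])            # input gate: how much new info to write
    o = sigmoid(z[2*H:3*H])          # output gate: how much memory to expose
    g = np.tanh(z[3*H:])             # candidate values to write
    c = f * c_prev + i * g           # updated cell state
    h = o * np.tanh(c)               # updated hidden state
    return h, c

rng = np.random.default_rng(1)
X, H = 4, 3
W = rng.normal(size=(X, 4 * H)) * 0.1
U = rng.normal(size=(H, 4 * H)) * 0.1
b = np.zeros(4 * H)

h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(6, X)):
    h, c = lstm_step(x, h, c, W, U, b)
```

Because the cell state c is updated additively (old memory times the forget gate, plus gated new input), information can survive many steps without being squashed at every one, which is what a plain RNN struggles with.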
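The bidirectional idea in the second challenge above can also be sketched in a few lines: one pass reads the sequence left to right, a second reads it right to left, and each position’s output concatenates both hidden states. This is an illustrative NumPy toy; the function names and parameter layout are assumptions for the example.

```python
import numpy as np

def rnn_pass(xs, W_xh, W_hh, b):
    """Run a simple RNN over a sequence, returning the state at each step."""
    h = np.zeros(W_hh.shape[0])
    states = []
    for x in xs:
        h = np.tanh(x @ W_xh + h @ W_hh + b)
        states.append(h)
    return states

def bidirectional(xs, fwd_params, bwd_params):
    """Each position's output sees the words before AND after it."""
    forward = rnn_pass(xs, *fwd_params)
    backward = rnn_pass(xs[::-1], *bwd_params)[::-1]  # re-align to input order
    return [np.concatenate([f, b]) for f, b in zip(forward, backward)]

rng = np.random.default_rng(2)
X, H = 4, 3
make_params = lambda: (rng.normal(size=(X, H)) * 0.1,   # input weights
                       rng.normal(size=(H, H)) * 0.1,   # recurrent weights
                       np.zeros(H))                     # bias
seq = rng.normal(size=(5, X))
outs = bidirectional(seq, make_params(), make_params())
```

At position 0 (“chest”), the backward half of the output already encodes the rest of the phrase, which is exactly how the network can disambiguate before the sentence ends.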
Implementing Recurrent Neural Networks
Like any deep learning project, RNNs are a complex endeavor that requires resources and expertise. Keep in mind, RNNs are a relatively recent development. There’s a lot of research and experimentation going on right now, which is exciting but also means that unfortunately the options for implementing RNNs aren’t as robust or mature as for other types of neural networks. That doesn’t mean they need to be coded entirely from scratch, however. Various types of RNNs can be implemented using major machine learning frameworks like Theano, Torch, and TensorFlow.
If you are interested in learning how RNNs might help your project, you’ll likely need a data scientist with extensive experience in neural networks. Depending on the specifics of your project, you may also look for expertise in natural language processing, computer vision, and unsupervised learning.
Ready to get started? Create an awesome job post that attracts the freelancers and skills you need.
This article originally appeared on Upwork.