Can Neural Networks Learn to FedSpeak?

Deciphering FedSpeak[1], the words and intentions of the Federal Reserve, has become a full-time job in financial markets and media. And markets have placed their bets leading up to the July 31 Federal Open Market Committee (FOMC) decision. As of today, the widely watched CME FedWatch Tool (which calculates probabilities of rate changes using Federal Funds Futures) shows a 100% probability of a Fed rate cut on July 31 (a 77.5% chance of a 25 bps cut and a 22.5% chance of a 50 bps cut).

I was inspired to see whether Artificial Intelligence could provide a new way to analyze the volumes of communication coming out of the Fed. So I built a Natural Language Processing (NLP) Fed sentiment model using FOMC statements going back to 1994. The model employs a Long Short-Term Memory (LSTM) neural network architecture and is built using PyTorch and related libraries. I plan to continue to collect, label, and feed more data into the model to keep expanding its predictive depth and breadth.

Once the model was trained and validated, I tested it using recent Fed communication including the most recent FOMC Statement and Minutes, as well as recent speeches, testimony, and comments by the ten FOMC voting members. The results were decidedly mixed:

Probability of Rate Decision Implied by NLP Model
(Note: no relevant comments from Michelle Bowman)

It is possible that the die is cast and the Fed will cut. Certainly, it looks like the vote will be split, perhaps 8-3 or 7-4. Regardless, betting on a rate cut is a crowded trade. Markets have a tendency to punish crowded trades and reward asymmetric, seemingly low-probability positions.

Part of the fever-pitch debate centers on the lack of compelling economic data to support a rate cut. I will address the quantitative data in my next post.

Detailed Predictions

I collected the most recent FOMC Statement and Minutes as well as the most recent speeches or comments from the voting FOMC members.

NLP Model Predictions for July 2019 FOMC Decision

[table id=22 /]

You can find the code for the model here. For more detail on how the model was built, keep reading.

How the Model Works

Training Data

I pulled Fed statements going back to 1994 and uploaded all of them along with the next rate decision that was made after a given statement, either Raise, Hold, or Lower.
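The labeling step can be sketched in a few lines of Python. This is a minimal illustration, not the actual data pipeline: the `meetings` list, its dates, and its rate changes are made-up placeholders, and the function names are hypothetical.

```python
from datetime import date

# Hypothetical sample rows: (meeting date, statement text, rate change in bps
# announced at that meeting). Illustrative values only, not real FOMC history.
meetings = [
    (date(2000, 1, 1), "statement A", 0),
    (date(2000, 3, 1), "statement B", 25),
    (date(2000, 5, 1), "statement C", 0),
    (date(2000, 7, 1), "statement D", -25),
]

def decision_label(change_bps):
    """Map a rate change to one of the three class labels."""
    if change_bps > 0:
        return "Raise"
    if change_bps < 0:
        return "Lower"
    return "Hold"

def label_with_next_decision(meetings):
    """Pair each statement with the decision made at the following meeting."""
    ordered = sorted(meetings, key=lambda m: m[0])
    labeled = []
    # The most recent statement has no following decision yet, so it is left out.
    for current, following in zip(ordered, ordered[1:]):
        labeled.append((current[1], decision_label(following[2])))
    return labeled

training_rows = label_with_next_decision(meetings)
```

Each row pairs a statement with what the Fed did next, which is the target the classifier later tries to predict.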

Model Architecture

The sentiment model is a Long Short-Term Memory (LSTM) neural network. LSTMs are a form of Recurrent Neural Networks (RNNs).

There are many types of neural network architectures, and new ones are being theorized and implemented all the time. Many neural networks for NLP employ RNNs because their architecture allows memory from previous steps in a sequence to be retained. This matters for problems where prior information bears on the current step, such as time series models (e.g., stock price prediction, where yesterday’s price may have an impact on today’s price). RNNs are also useful in NLP because a prior letter or word is quite useful in predicting the next letter or word.

A famous blog post about LSTMs (more on that below) by Christopher Olah provides some great diagrams to visualize how an RNN works.

Source: Understanding LSTM Networks, August 27, 2015

x_{t-1} represents the input x at time t-1, and h_{t-1} represents the output, or activations, based on the initially random weights inside A. In a standard neural network, the output h_{t-1} would have no bearing on the next step in the network, x_t or h_t. But in an RNN, the activations generated at x_{t-1} are incorporated, via a loop, into the next step so that memory is retained. This can be repeated over and over again.
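To make the loop concrete, here is a minimal sketch of a recurrent step with a single scalar state. The weights `w_x`, `w_h`, and `b` are arbitrary illustrative values rather than trained parameters, and a real RNN would use vectors and matrices instead of scalars.

```python
import math

def rnn_step(x_t, h_prev, w_x, w_h, b):
    """One recurrent step: the new state mixes the current input with the previous state."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)

def run_rnn(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Feed a sequence through the loop, carrying the hidden state forward."""
    h = 0.0  # initial hidden state
    states = []
    for x_t in inputs:
        h = rnn_step(x_t, h, w_x, w_h, b)
        states.append(h)
    return states

# Both sequences end with the same input (0.0) but have different histories.
with_history = run_rnn([1.0, 0.0])
without_history = run_rnn([0.0, 0.0])
```

The final states differ even though the last input is identical, because the earlier input is still present in the carried-over state.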

While a significant improvement for problems that require memory, RNNs have a major limitation: they generally have only short-term memory. If a text sequence is long enough, an RNN will have a hard time carrying information from earlier time steps to later ones. So if you are trying to process a paragraph of text, an RNN may lose important context from the beginning of that paragraph. More technically, RNNs suffer from the vanishing gradient problem: as the sequence gets longer, the gradient propagated back to the earliest steps shrinks toward zero, which makes it difficult to train the model effectively.
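The vanishing gradient can be seen numerically in a toy setting. The sketch below assumes a scalar tanh RNN with illustrative weights (`w_x`, `w_h` are not from the model in this post) and applies the chain rule to measure how much the final state depends on the initial state.

```python
import math

def gradient_through_time(inputs, w_x=0.5, w_h=0.8):
    """Return d h_T / d h_0 for a scalar tanh RNN, via the chain rule."""
    h = 0.0
    grad = 1.0
    for x_t in inputs:
        h = math.tanh(w_x * x_t + w_h * h)
        # Local derivative of this step with respect to the previous state.
        grad *= w_h * (1.0 - h * h)
    return grad

short_grad = gradient_through_time([1.0] * 5)
long_grad = gradient_through_time([1.0] * 50)
```

Every step multiplies the gradient by a factor smaller than one, so over 50 steps the influence of the first state is far smaller than over 5 steps.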

Olah’s blog post provided a clear example. If early text states “I grew up in France” and later text states “I speak fluent French,” and you are trying to predict the word after “fluent”, it is relevant to know that earlier in the text, it stated that I grew up in France. However, Olah states:

“It’s entirely possible for the gap between the relevant information and the point where it is needed to become very large.”

Understanding LSTM Networks, August 27, 2015

LSTMs correct for this limitation by allowing memory to be maintained across long sequences. Again, a visual from Olah helps.

Source: Understanding LSTM Networks, August 27, 2015

The key to the memory capability is a set of gates, weighted functions that govern the information flow in each layer of the network. There are three gates:

  • Forget Gate: decides what information to discard from the layer
  • Input Gate: decides which values from the input to update into the current layer
  • Output Gate: decides what to output based on input and the memory of the layer

All of this is accomplished, like in most neural networks, with matrix multiplication and the use of non-linear functions such as tanh, sigmoid, and more recently Rectified Linear Unit or ReLU. That level of detail is beyond the scope of this post.
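For readers who want a small taste of that detail anyway, the three gates can be sketched with scalars in place of matrices. This follows the standard LSTM update equations; the parameter dictionary `params` and its values are hypothetical and chosen only to show a cell that holds on to its memory.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step with scalar state; p maps each gate to (input weight, recurrent weight, bias)."""
    def pre(name):
        w, u, b = p[name]
        return w * x_t + u * h_prev + b

    f = sigmoid(pre("forget"))        # forget gate: how much old memory to keep
    i = sigmoid(pre("input"))         # input gate: how much new information to write
    g = math.tanh(pre("candidate"))   # candidate value offered to the memory
    o = sigmoid(pre("output"))        # output gate: how much memory to expose
    c_t = f * c_prev + i * g          # updated cell state (the memory)
    h_t = o * math.tanh(c_t)          # output for this step
    return h_t, c_t

# Hypothetical parameters: a strongly open forget gate and a nearly closed input gate.
params = {
    "forget": (0.0, 0.0, 10.0),
    "input": (0.0, 0.0, -10.0),
    "candidate": (1.0, 1.0, 0.0),
    "output": (0.0, 0.0, 0.0),
}
h_t, c_t = lstm_step(x_t=0.7, h_prev=0.2, c_prev=1.0, p=params)
```

With these settings the cell state passes through almost unchanged, which is how an LSTM can carry information across many steps.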

Transfer Learning with AWD-LSTM

The specific LSTM architecture that is used (AWD_LSTM) is inspired by this paper: Regularizing and Optimizing LSTM Language Models by Stephen Merity, Nitish Shirish Keskar, and Richard Socher, published August 7, 2017. The key idea of this architecture is to employ dropout, a popular method for regularizing neural networks, with the primary benefit of avoiding over-fitting.
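As a rough illustration of dropout itself, the sketch below randomly zeroes values and rescales the survivors (so-called inverted dropout). Note the simplification: the paper applies a DropConnect-style variant to the LSTM’s hidden-to-hidden weights, whereas this example drops plain values from a list. The function name and the seed are illustrative choices.

```python
import random

def inverted_dropout(values, p, rng):
    """Zero each value with probability p and rescale survivors by 1 / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * scale for v in values]

rng = random.Random(0)  # fixed seed so the example is repeatable
activations = [1.0] * 10
dropped = inverted_dropout(activations, p=0.5, rng=rng)
```

Because a different random subset is dropped on every training pass, the network cannot lean too heavily on any single connection, which is what curbs over-fitting.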

What makes the library so useful is that it allows for the application of Transfer Learning, which greatly accelerates model development. Once your input language data is processed into a form suitable for NLP training, you can fine-tune AWD_LSTM on it. AWD_LSTM is a pre-existing model trained on over 100 million tokens (mostly words) from “good” and “featured” English language articles in Wikipedia. This model has essentially learned the English language! By starting with AWD_LSTM, I didn’t need to start from scratch and could train my Fed sentiment model with a lot less data and computation time.

Building the Vocabulary

Behind the scenes, the model takes the input data (all of the statements) and, through the processes of Tokenization and Numericalization, creates a vocabulary and converts the text to numbers suitable for deep learning. I used the default settings, which limit the total vocabulary to 60,000 words and do not create tokens for any words that appear fewer than twice in the input data. Below is an example of the first 20 words in the vocabulary inside data_lm (the data object) after Tokenization and Numericalization. They are now just numbers, suitable for deep learning optimization.
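The idea can be sketched in plain Python. This is a simplified stand-in for what the library does: it splits on whitespace rather than using a real tokenizer, the `"<unk>"` placeholder for unknown words is a hypothetical name, and the two sample sentences are made up for illustration.

```python
from collections import Counter

def build_vocab(texts, max_vocab=60000, min_freq=2):
    """Build index-to-token and token-to-index mappings from raw texts."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # Keep the most frequent words that appear at least min_freq times.
    kept = [w for w, c in counts.most_common(max_vocab) if c >= min_freq]
    itos = ["<unk>"] + kept  # "<unk>" is a hypothetical placeholder for unknown words
    stoi = {w: i for i, w in enumerate(itos)}
    return itos, stoi

def numericalize(text, stoi):
    """Replace each word with its vocabulary index (0 for unknown words)."""
    return [stoi.get(word, stoi["<unk>"]) for word in text.lower().split()]

# Made-up sample sentences, for illustration only.
texts = [
    "the committee decided to maintain the target range",
    "the committee decided to lower the target range",
]
itos, stoi = build_vocab(texts)
ids = numericalize("the committee decided to raise the target range", stoi)
```

Words seen only once (“maintain”, “lower”) and words never seen (“raise”) all map to the unknown index, which is the effect of the minimum frequency setting.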

The Core Language Model

After some training and tweaking of learning rates, the core language model achieved an 83% accuracy rate:

This language model is now able to “Fed-Speak”. So when I fed (no pun intended) the model three words “Information received since” and asked it to produce the next 20 words, here is what it produced:
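The generation loop works by predicting one word, appending it, and repeating. The sketch below shows that loop with a deliberately crude stand-in: a bigram counter instead of the LSTM, always picking the most likely next word instead of sampling. The one-sentence corpus is a made-up example, not model output.

```python
from collections import Counter, defaultdict

def train_bigram(texts):
    """Count which word follows which; a crude stand-in for a trained language model."""
    following = defaultdict(Counter)
    for text in texts:
        words = text.split()
        for prev, nxt in zip(words, words[1:]):
            following[prev][nxt] += 1
    return following

def generate(seed, following, n_words):
    """Extend the seed one word at a time, always taking the most likely next word."""
    words = seed.split()
    for _ in range(n_words):
        candidates = following.get(words[-1])
        if not candidates:
            break  # no known continuation for the last word
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

# Made-up toy corpus, for illustration only.
corpus = [
    "Information received since the Committee met indicates that economic activity has been rising"
]
model = train_bigram(corpus)
text = generate("Information received since", model, n_words=20)
```

A real language model conditions on the whole preceding sequence rather than just the last word, which is why its output reads far more like genuine Fed prose.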

The Classification Model

Prior to moving on, I saved the encoder generated by the language model. Encoders are beyond the scope of this post; more information on them can be found here. Saving the encoder ensures that any classification/predictive work that is done uses the same vocabulary model that was just built. Recall, when I processed the data, I hand-labeled it with Raise, Lower, or Hold rates. I can now take the language model that learned to “Fed-Speak” and create a model to classify/predict rate change decisions.

I was able to train the classification/prediction model to about 76% accuracy. I’d certainly like it to be higher and would expect that a much larger dataset would aid in this goal.
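A classifier of this kind typically ends by turning three raw scores into probabilities with a softmax. The sketch below assumes that setup; the scores are hypothetical numbers, not output from the model in this post.

```python
import math

LABELS = ["Raise", "Hold", "Lower"]

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def predict(scores):
    """Return the most likely label and the probability assigned to each label."""
    probs = softmax(scores)
    best = max(range(len(LABELS)), key=lambda k: probs[k])
    return LABELS[best], dict(zip(LABELS, probs))

# Hypothetical raw scores for one document.
label, probs = predict([-1.0, 0.5, 1.2])
```

Reading the full probability split, rather than only the top label, is what makes it possible to call a prediction mixed or decisive.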

I look forward to adding quite a bit more training data to this Fed sentiment model, perhaps evolving it into a Central Bank sentiment model that incorporates data from the European Central Bank (ECB), Bank of Japan (BOJ), and People’s Bank of China (PBOC).

[1]: from Wikipedia: “The notion of fed speak originated from the fact that financial markets placed a heavy value on the statements made by Federal Reserve governors, which could in turn lead to a self-fulfilling prophecy. To prevent this, the governors developed a language, termed Fedspeak, in which ambiguous and cautious statements were made to purposefully obscure and detract meaning from the statement.”

Any opinions or forecasts contained herein reflect the personal and subjective judgments and assumptions of the author only. There can be no assurance that developments will transpire as forecasted and actual results will be different. The accuracy of data is not guaranteed but represents the author’s best judgment and can be derived from a variety of sources. The information is subject to change at any time without notice.