Believe Me When I Say To You, I Hope You Love a Trade War Too!

NLP Analysis of the President’s Tweets on Trade (With Some Inspiration From Russians by Sting)

In Europe and America, There’s a growing feeling of hysteria.

Markets are catching up to the staying power of the trade war. As I wrote on May 23rd here:

Investors continue to assume that what President Trump says and does on trade will be two different things. Actual events have proven otherwise, and global markets have not yet figured this out. We are on the verge of an upshift in geopolitical tension that will impact all markets.

Complexity Everywhere, May 23, 2019

Conditioned to respond to all the threats, In the rhetorical tweets of the President; Mister Trump says, “We will trade with you”.

Don’t take my word for it, take his. Since launching his Twitter account in 2009, the President has tweeted or retweeted more than 39,000 times. I built an NLP language model trained on over 97% of his tweets, as well as a sentiment classifier to determine the positive or negative sentiment of any tweet (achieving about 85% accuracy in training). The models employ ULMFiT, a state-of-the-art NLP transfer learning method developed by Jeremy Howard and Sebastian Ruder, that is fully implemented in the fastai library.

I don’t subscribe to this point of view, His tweets say it would be an ignorant thing to do.

I then took the ~690 tweets that included the words China, Chinese, or President Xi and ran them through the classifier. A staggering 97% of 413 tweets prior to Election Day in 2016 had negative sentiment, down to 62% since then. Of note, ALL 21 tweets about these topics during the 2016 campaign were negative. Read more about the model in the NLP Model Detail section below; full code can be found here.

Further, here is a word cloud of the most frequent words in President Trump’s tweets about China (excluding the words China, Trump, President, Trade, Russia, Mexico, Chinese). The four most common words are US, Now, Will, and Tariff.

It is up to you to decide if the President’s pre-Election Day tweets are more or less representative of his core beliefs. I believe they are more.

Negative China sentiment seems like good politics – the campaign clearly went all-in on negative with the Presidency on the line. And with 454 days until the election, the President is focused on politics, not markets.

Believe me when I say to you, Mr. Trump does not love a trade deal too.

How can I save my White House joy, from 2020’s deadly toy?

A trade war is good politics. I wrote in January, May, and June about a global battle being waged between Freedom and Autocracy. The President, despite being attacked for his own supposed autocratic instincts, has successfully framed himself at the center of this debate, on the side of Freedom. This message will continue to resonate in the heartland, even in places affected by the trade war.

There is monopoly on China sense, On either side of the political fence.

Don’t expect any help from leading Democratic candidates. The President has successfully boxed the Democratic candidates into a corner. How do you think a “Let’s Play Nice With China” slogan will do in Wisconsin? A combination of true belief in fairness and cold political calculus will likely push the Democratic challengers to produce even tougher platforms on China.

There is such thing as a winnable war, We don’t believe the narrative anymore.

Two trends in the economy are sure to factor into the President’s re-election strategy. Both support a protracted battle.

First, reports of companies pulling production out of China have increased in recent months, see here, here, and here. This will embolden the White House and the campaign to double down.

Second, the impact of job losses in key sectors such as farming may not matter to Trump’s reelection strategy. About 54% of farming jobs are in the 3 Pacific states, all of which will vote blue in 2020. In terms of the President’s Rust Belt firewall, Pennsylvania, Ohio, Wisconsin, and Michigan house only 5% of the country’s farming jobs.

[table id=23 /]

This topic requires more in-depth analysis and its own future post.

We share the same biology, regardless of ideology.

Cycles matter, and perhaps there is nothing new under the sun. As discussed in my 2019 Outlook here, public sector confidence is in a down cycle and the prevailing global order (established post-WW2, about 75 years ago) is undergoing a transition to a multi-polar framework. During this shift, the risk of a miscalculated move by a frustrated public sector actor is VERY HIGH. Perhaps that is a protracted trade war or overreach in Hong Kong or Kashmir.

Incidentally, all of this is occurring in alignment with the end of a long-term debt cycle, which typically lasts about the same length of time, per Ray Dalio. It is thus no surprise that major political and societal events also have a 72-75 year rhyme to them, as I wrote about here.

There is now historical precedent, To understand the words from the mouth of the president. Mr. Trump says he will protect you, Do you subscribe to this point of view?

Believe me when I say to you, Mr. Trump does not love a trade deal too.

Trade War NLP Model Details

Data Pre-Processing

I leveraged this great function built by Ronald Wahome to clean up the raw tweet data. It removed hashtags, punctuation, whitespace, tickers, etc. I also went through and removed all retweets of non-English language tweets.

import re
from string import punctuation

def processTweet(tweet):
    # Remove HTML special entities (e.g. &amp;)
    tweet = re.sub(r'\&\w*;', '', tweet)
    # Remove @username mentions
    tweet = re.sub(r'@[^\s]+', '', tweet)
    # Remove tickers
    tweet = re.sub(r'\$\w*', '', tweet)
    # Convert to lowercase
    tweet = tweet.lower()
    # Remove hyperlinks
    tweet = re.sub(r'https?:\/\/.*\/\w*', '', tweet)
    # Remove hashtags
    tweet = re.sub(r'#\w*', '', tweet)
    # Replace punctuation with a space (this also splits 's, 't, 've)
    tweet = re.sub(r'[' + re.escape(punctuation.replace('@', '')) + r']+', ' ', tweet)
    # Collapse runs of whitespace (including newline characters) into one space
    tweet = re.sub(r'\s\s+', ' ', tweet)
    # Remove any space remaining at the front of the tweet
    tweet = tweet.lstrip(' ')
    # Remove characters beyond the Basic Multilingual Plane (BMP) of Unicode
    tweet = ''.join(c for c in tweet if c <= '\uFFFF')
    return tweet

Language Model

In addition to fast.ai, this article by Prashant Rao was helpful in completing this analysis. As discussed in the ULMFiT paper and on fast.ai, the first step is:

“The LM is trained on a general-domain corpus to capture general features of the language in different layers.”

Howard, J, Ruder, S. Universal Language Model Fine-tuning for Text Classification

ULMFiT achieves this by training on Wikitext-103 (Merity et al., 2017b), which contains 28,595 preprocessed Wikipedia articles and 103 million words. The theory, as in all transfer learning, is to avoid expensive computation by leveraging a pre-trained model to analyze one’s own particular corpus. That leaves you to fine-tune the pre-trained model on your target data and then train a classifier (using the language model).

Tokenization and Numericalization

After reading the cleaned tweets into a Pandas dataframe, the first step is to create a DataBunch suitable for training. The primary decisions here are the minimum word frequency (2) and the maximum vocabulary size (60,000). The model tokenizes and numericalizes the text (see here for more on these steps), keeping only words that appear at least twice in the corpus and capping the total vocabulary at 60,000 words; all other words are mapped to an unknown-word token. Note that the tokenizer also inserts special tokens (e.g. xxmaj marks that the next word starts with a capital letter).
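To make the two settings concrete, here is a minimal plain-Python sketch of what a minimum frequency and a vocabulary cap do. It is an illustration only: it splits on whitespace, whereas fastai uses its own tokenizer and special tokens, and the function names and toy tweets are hypothetical.

```python
from collections import Counter

def build_vocab(texts, max_vocab=60000, min_freq=2, unk="xxunk"):
    """Toy vocabulary builder: keep frequent tokens, reserve id 0 for unknowns."""
    counts = Counter(tok for text in texts for tok in text.split())
    # Most frequent tokens first, capped at max_vocab (one slot is kept for unk),
    # then drop anything seen fewer than min_freq times.
    kept = [tok for tok, n in counts.most_common(max_vocab - 1) if n >= min_freq]
    itos = [unk] + kept                            # id -> token
    stoi = {tok: i for i, tok in enumerate(itos)}  # token -> id
    return itos, stoi

def numericalize(text, stoi, unk="xxunk"):
    """Replace each token with its integer id; rare or unseen tokens map to unk."""
    return [stoi.get(tok, stoi[unk]) for tok in text.split()]

# Hypothetical cleaned tweets, for illustration only.
tweets = ["china trade deal", "trade war with china", "tariff on china"]
itos, stoi = build_vocab(tweets)
ids = numericalize("china trade tariff", stoi)
```

Here only "china" and "trade" appear at least twice, so "tariff" falls back to the unknown-word id.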

data_lm = TextLMDataBunch.from_csv(path, 'ttat.csv', max_vocab = 60000, min_freq=2)

Sample of Tokenized Texts

Find the Optimal Learning Rate

I employ fastai’s learning rate finder, which gradually increases the learning rate over a short training run to identify the point at which the loss begins to diverge. I prefer using a learning rate at a steep point in the curve below.
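The idea behind a learning rate range test can be sketched on a toy problem. The example below is an assumption-laden illustration, not fastai's implementation: it minimizes a one-parameter quadratic with plain gradient descent, raises the learning rate exponentially at every step, and stops once the loss blows up relative to the best value seen. All names and the stopping factor of 4 are hypothetical choices.

```python
def lr_range_test(loss_fn, grad_fn, w0, lr_start=1e-3, lr_end=10.0, steps=100):
    """Toy learning rate range test: returns a list of (lr, loss) pairs."""
    w, best, history = w0, float("inf"), []
    for i in range(steps):
        # Exponentially interpolate the learning rate between lr_start and lr_end.
        lr = lr_start * (lr_end / lr_start) ** (i / (steps - 1))
        w = w - lr * grad_fn(w)          # one gradient descent step at this rate
        loss = loss_fn(w)
        history.append((lr, loss))
        if loss > 4 * best:              # loss has diverged: stop early
            break
        best = min(best, loss)
    return history

# Toy objective f(w) = w^2, whose gradient is 2w.
history = lr_range_test(lambda w: w * w, lambda w: 2 * w, w0=1.0)
last_lr, last_loss = history[-1]
```

Plotting loss against learning rate from `history` would give a curve of the same general shape as the finder plot: falling loss at moderate rates, then a sharp rise once the rate is too large.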

learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)

Learning Rate Finder Plot

Fine-Tune the Last Layer

Per ULMFiT and fast.ai, I first trained the last layer of the model, having selected 1e-3 as the max learning rate.

“Instead of using the same learning rate for all layers of the model, discriminative fine-tuning allows us to tune each layer with different learning rates. We empirically found it to work well to first choose the learning rate η^L of the last layer by fine-tuning only the last layer and using η^(l−1) = η^l / 2.6 as the learning rate for lower layers.”

Howard, et al (arXiv:1801.06146)

# Run one epoch on last layer
learn.fit_one_cycle(1, 1e-3, moms=(0.8,0.7))

Results of fine-tuning of the last layer

Unfreeze all Layers and Train

ULMFiT also employs Slanted Triangular Learning Rates (STLR) behind the scenes, based on the following rationale:

“…we would like the model to quickly converge to a suitable region of the parameter space in the beginning of training and then refine its parameters. Using the same learning rate (LR) or an annealed learning rate throughout training is not the best way to achieve this behavior. Instead, we propose slanted triangular learning rates (STLR), which first linearly increases the learning rate and then linearly decays it…”

Howard, et al (arXiv:1801.06146)

learn.unfreeze()  # make all layers trainable before the longer run
learn.fit_one_cycle(10, 1e-3, moms=(0.8,0.7))

Results after unfreezing all layers
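For reference, the slanted triangular schedule quoted above can be sketched directly from the formula in the ULMFiT paper, using the paper's suggested defaults (cut_frac = 0.1, ratio = 32, maximum rate 0.01). This illustrates the paper's schedule only; fastai's fit_one_cycle follows the related one-cycle policy, so it is not the exact schedule used in the run above.

```python
import math

def stlr(t, T, lr_max=0.01, cut_frac=0.1, ratio=32):
    """Slanted triangular learning rate at iteration t of T (ULMFiT paper formula)."""
    cut = math.floor(T * cut_frac)       # iteration at which the rate peaks
    if t < cut:
        p = t / cut                      # short linear warm-up
    else:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))  # long linear decay
    return lr_max * (1 + p * (ratio - 1)) / ratio

# Schedule over 100 iterations: rises for 10, then decays for 90.
schedule = [stlr(t, 100) for t in range(101)]
peak_step = schedule.index(max(schedule))
```

The rate starts at lr_max / ratio, peaks after the first tenth of training, and returns to lr_max / ratio at the end.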

Classifier Model

I now save the language model’s encoder weights for use in the classification model.

The classification model takes as input 100 tweets from the President, hand-labeled for positive or negative sentiment. I am operating on the notion that a small dataset can produce good results, given ULMFiT’s success.

Prepare Data and Build Learner

As before, I build a DataBunch and a learner object. I also make sure to load the encoder weights.

data_clas = TextClasDataBunch.from_csv(path, 'tr_ch_tw_train.csv', vocab=data_lm.train_ds.vocab, bs=8)
learn = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)

Train the Classifier

As proposed in ULMFiT, the key here is to train the classifier in stages. The paper presents two key concepts. First, they propose Concat Pooling, which is accomplished behind the scenes in the model:

“As input documents can consist of hundreds of words, information may get lost if we only consider the last hidden state of the model. For this reason, we concatenate the hidden state at the last time step hT of the document with both the max-pooled and the mean-pooled representation of the hidden states over as many time steps as fit in GPU memory.”

Howard, et al (arXiv:1801.06146)
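A minimal sketch of the concat pooling idea, written with plain Python lists rather than tensors. The function name and the toy hidden states are hypothetical; the real model does this on batched tensors inside the classifier head.

```python
def concat_pool(hidden_states):
    """Concatenate the last hidden state with max- and mean-pooled states.

    hidden_states: list of T vectors (lists of floats), one per time step.
    Returns a single vector three times the hidden size.
    """
    last = hidden_states[-1]
    dims = range(len(last))
    max_pool = [max(h[d] for h in hidden_states) for d in dims]
    mean_pool = [sum(h[d] for h in hidden_states) / len(hidden_states) for d in dims]
    return last + max_pool + mean_pool

# Three time steps with a hidden size of 2 (toy values).
H = [[1.0, -2.0], [3.0, 0.0], [2.0, 4.0]]
pooled = concat_pool(H)
```

With a hidden size of 2, the pooled vector has 6 entries: the final state, the per-dimension maximum, and the per-dimension mean.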

The second concept is Gradual Unfreezing, where layers are unfrozen in sequence.

“Rather than fine-tuning all layers at once, which risks catastrophic forgetting, we propose to gradually unfreeze the model starting from the last layer as this contains the least general knowledge (Yosinski et al., 2014): We first unfreeze the last layer and fine-tune all unfrozen layers for one epoch. We then unfreeze the next lower frozen layer and repeat, until we finetune all layers until convergence at the last iteration.”

Howard, et al (arXiv:1801.06146)
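The unfreezing order described in the quote can be sketched as a simple schedule. The layer group names below are hypothetical placeholders, not the model's actual module names.

```python
def gradual_unfreeze_schedule(layer_groups):
    """Return, for each stage, the layer groups that are trainable.

    Stage 1 trains only the last group; each later stage adds the next
    lower group until every group is trainable.
    """
    return [layer_groups[-n:] for n in range(1, len(layer_groups) + 1)]

# Hypothetical layer groups, ordered from lowest (most general) to highest.
groups = ["embedding", "lstm_1", "lstm_2", "lstm_3", "classifier_head"]
stages = gradual_unfreeze_schedule(groups)
```

Each stage would be followed by a short round of fine-tuning before the next group is unfrozen.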

So that is what I do while also applying the discriminative fine-tuning proposed in the paper.

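# Stage 1: train only the last layer group
learn.fit_one_cycle(1, 2e-2, moms=(0.8,0.7))

learn.freeze_to(-2)  # unfreeze the last two layer groups
learn.fit_one_cycle(1, slice(2e-2/(2.6**4),1e-2), moms=(0.8,0.7))

learn.freeze_to(-3)  # unfreeze the last three layer groups
learn.fit_one_cycle(1, slice(5e-3/(2.6**4),5e-3), moms=(0.8,0.7))

learn.unfreeze()  # unfreeze all layer groups
learn.fit_one_cycle(2, slice(1e-3/(2.6**4),1e-3), moms=(0.8,0.7))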
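The 2.6**4 divisor in the slices above ties back to the discriminative fine-tuning rule from the paper. As I understand it, fastai spreads a slice(low, high) geometrically across the learner's layer groups; assuming five layer groups, dividing the top rate by 2.6**4 makes each group's rate 2.6 times smaller than the one above it. The sketch below computes those rates directly and is an illustration of the rule, not fastai's own code.

```python
import math

def discriminative_lrs(lr_last, n_groups, factor=2.6):
    """Learning rate per layer group, lowest group first.

    Applies lr(l-1) = lr(l) / factor, starting from lr_last for the top group.
    """
    return [lr_last / factor ** (n_groups - 1 - i) for i in range(n_groups)]

# Five layer groups is an assumption about the classifier's structure.
lrs = discriminative_lrs(1e-3, n_groups=5)
```

The first entry equals 1e-3 / 2.6**4 (roughly 2.2e-5) and the last equals 1e-3, matching the two ends of the final slice.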

I ended up with about 85% accuracy.

Results after last training cycle

From here, I read in the complete set of Tweets that referenced the words China, Chinese, or President Xi and ran them through the classifier to produce the sentiment output.

test = pd.read_csv(path/'final_china - output.csv')
test_pred = test
test_pred['sentiment'] = test_pred['text'].apply(lambda row: str(learn.predict(row)[0]))
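From there, the before and after Election Day comparison could be computed along the following lines. This is a sketch under stated assumptions: the label strings "negative" and "positive", the tuple layout, and the sample rows are hypothetical and are not the author's data; only the 2016 Election Day date (November 8, 2016) is taken as given.

```python
from datetime import date

ELECTION_DAY = date(2016, 11, 8)

def negative_share(rows):
    """Share of rows labelled negative; rows are (tweet_date, sentiment) pairs."""
    rows = list(rows)
    if not rows:
        return None
    return sum(1 for _, sentiment in rows if sentiment == "negative") / len(rows)

# Toy rows for illustration only.
sample = [
    (date(2015, 6, 1), "negative"),
    (date(2016, 3, 9), "negative"),
    (date(2017, 4, 7), "positive"),
    (date(2019, 5, 10), "negative"),
]
before = negative_share(r for r in sample if r[0] < ELECTION_DAY)
after = negative_share(r for r in sample if r[0] >= ELECTION_DAY)
```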

Any opinions or forecasts contained herein reflect the personal and subjective judgments and assumptions of the author only. There can be no assurance that developments will transpire as forecasted and actual results will be different. The accuracy of data is not guaranteed but represents the author’s best judgment and can be derived from a variety of sources. The information is subject to change at any time without notice.