[Crypto data science #1] Programmatic analysis of 200k tweets BEFORE and AFTER the correction started. What has changed?

Wondering how people in general feel about Bitcoin? Bullish or bearish? Let's write a Twitter analyzer :) To find out, you obviously need a large amount of REAL DATA. And where can we usually find such data? Well, on social media, of course... I still sometimes can't believe we give away private information about our lives there every day... and for free. One of the hottest topics in IT over the last couple of years has been machine learning and artificial intelligence. Combining a large amount of social media data with some knowledge of machine learning can bring interesting insights into WHAT HAS CHANGED IN BITCOIN SINCE THE DECEMBER BULLRUN. Let's get into it.

"We've harvested tens of millions of Facebook profiles to change how the average American thinks and specifically targeted every individual according to his online actions"... Only several days ago, a former employee of Cambridge Analytica opened up about how data from social media is being misused...

Comparison - Twitter users about bitcoin BEFORE and AFTER correction

Steps

  1. Define time interval
                Before correction          Correction
    Bitcoin     11.09.2017 - 16.12.2017    17.12.2017 - 22.03.2018
    Ethereum    15.11.2017 - 14.01.2018    15.01.2018 - 22.03.201
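A minimal sketch of how the Bitcoin intervals listed above could be written down in code, so that every tweet can later be assigned to "before" or "correction" by its date. The names `BEFORE_CORRECTION`, `CORRECTION`, `days_in` and `period_of` are made up for this illustration and are not taken from the original script.

```python
from datetime import date, timedelta

# Bitcoin intervals as listed above (the post writes them as day.month.year).
BEFORE_CORRECTION = (date(2017, 9, 11), date(2017, 12, 16))
CORRECTION = (date(2017, 12, 17), date(2018, 3, 22))


def days_in(interval):
    """Yield every calendar day of an inclusive (start, end) interval."""
    start, end = interval
    for offset in range((end - start).days + 1):
        yield start + timedelta(days=offset)


def period_of(day):
    """Return which interval a day falls into, or None if it is outside both."""
    if BEFORE_CORRECTION[0] <= day <= BEFORE_CORRECTION[1]:
        return "before"
    if CORRECTION[0] <= day <= CORRECTION[1]:
        return "correction"
    return None
```

The two Bitcoin intervals cover 97 and 96 days, so at a thousand tweets per day they add up to roughly the 200k tweets mentioned in the title.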
  2. Get the data

    The first problem was how to get historical Twitter data, because their API is limited to just the last two weeks. In the end, I used this Git project, which basically serves as a workaround for the Twitter API and requests historical tweets directly via curl. I downloaded a thousand tweets for every day in the interval and worked only with the English ones.
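The selection described above (English only, at most a thousand tweets per day) could look roughly like this. It is only a sketch: the tweet records and their `date`, `lang` and `text` fields are assumptions made for the example, since the post does not show the format returned by the download tool.

```python
from collections import defaultdict


def select_tweets(tweets, per_day=1000, lang="en"):
    """Keep at most `per_day` tweets in the given language for every day.

    `tweets` is assumed to be an iterable of dicts with the hypothetical
    keys "date", "lang" and "text".
    """
    kept = defaultdict(list)
    for tweet in tweets:
        if tweet["lang"] != lang:
            continue  # skip non-English tweets
        bucket = kept[tweet["date"]]
        if len(bucket) < per_day:
            bucket.append(tweet)
    return dict(kept)
```

Capping every day at the same number keeps busy days from dominating the word counts and the sentiment averages.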

  3. Create a wordcloud from frequently used words

    I've created a basic script which converts each tweet to an array of words, which was required as a parameter of the wordcloud constructor. I really could not believe the constructor asked for such input, which in my eyes is not correct at all. Transforming tweets to an array is just an extra step for the user, but whatever...
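Converting a tweet to an array of words can be as simple as the sketch below. The function name `tweet_to_words` and the exact splitting rule (lowercase, split on anything that is not a letter or digit) are assumptions for illustration; the original script may tokenize differently.

```python
import re


def tweet_to_words(text):
    """Lowercase a tweet and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())
```

For example, `tweet_to_words("Bitcoin WILL rise again! #BTC")` gives `['bitcoin', 'will', 'rise', 'again', 'btc']`. Note that links are split too, which is how fragments like "https" end up among the words.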

    Bitcoin

    Comparison of wordclouds from before and after correction start:

    As you can see, both clouds contain many words that are typical for crypto-related tweets. To get a more interesting output, I had to get rid of these. At first I considered filtering out some % of the most used words, but that might affect the results, so in the end I decided to just ignore specific words such as bitcoin, blockchain, cryptocurrency, crypto, ethereum, btc, https, twitter... After filtering these out, the results immediately look much more interesting:
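The filtering step can be sketched as a word count that skips an ignore list. The set below contains only the words named above (the real list was longer, as the trailing dots suggest), and `top_words` is a hypothetical helper name.

```python
from collections import Counter

# Only the ignored words that are explicitly named in the post.
IGNORED = {
    "bitcoin", "blockchain", "cryptocurrency", "crypto",
    "ethereum", "btc", "https", "twitter",
}


def top_words(tweets_as_words, ignored=IGNORED, n=10):
    """Count words over all tweets, skipping ignored ones; return the n most common."""
    counts = Counter(
        word
        for words in tweets_as_words
        for word in words
        if word not in ignored
    )
    return counts.most_common(n)
```

An explicit ignore list is easy to reason about: a word disappears from the cloud only because it was named, not because it happened to cross a frequency threshold.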

  4. Analysis. What probably can't go unnoticed:
    • Now vs. Will - I think the fall of one word and the rise of the other were caused by sentences like "The future is happening NOW" or "Earn money with bitcoin NOW!", which were so popular during the whole bullrun. After the correction started, tweets like "Bitcoin WILL rise again" or "It is just a matter of time till Bitcoin WILL go higher again" took over
    • Cash & Money - Both of these words gained massively in popularity after the correction. It's probably because many people switched back to fiat currencies and locked in their profits in cash and money
    • Coinbase vs. Binance - I think the initial popularity of the word Coinbase was caused by sooo many new people entering the cryptoworld, most of whom probably used Coinbase to buy their first bitcoins - this was happening till mid December, during the bitcoin bullrun. But in the second half of December and in January, because of the crazy altcoin boom, people started moving more towards Binance, which had a crazy rise, is one of the biggest exchanges right now and lets people buy most of the TOP100 altcoins.
    • I feel there are some more interesting patterns in these wordclouds, but it's possible that my eyes just see what my brain thinks, so I'll leave it to you to see if there's something more.
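To check such impressions with numbers instead of eyes, the share of each word in both periods can be compared. This is only a sketch of one possible check, not part of the original analysis; `relative_freq` and `frequency_shift` are hypothetical names.

```python
from collections import Counter


def relative_freq(words):
    """Map each word to its share of all words in the list."""
    counts = Counter(words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def frequency_shift(before_words, after_words):
    """Return share(after) - share(before) for every word seen in either period."""
    before = relative_freq(before_words)
    after = relative_freq(after_words)
    vocabulary = set(before) | set(after)
    return {w: after.get(w, 0.0) - before.get(w, 0.0) for w in vocabulary}
```

Sorting the result by value puts words like "will" (positive shift) and "now" (negative shift) at opposite ends, if the pattern in the clouds is real.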
  5. Execute sentiment analysis

    I've trained 4 machine learning algorithms to rate each tweet on a scale from 0 to 1 according to how negative or positive it is. To train the algorithms, I used a dataset called Sentiment140, which contains 1.6 million labeled tweets. After the training phase, once the algorithms were clever enough, I fed them our Bitcoin tweets and let them evaluate these.
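The post does not say which four algorithms were used. As an illustration of the general idea (learn from labeled tweets, then score new ones between 0 and 1), here is a tiny naive Bayes classifier in plain Python. It is my choice for the sketch and not necessarily one of the four; the class name and the 0/1 labels are assumptions, so labels from a real dataset would first have to be mapped to 0 (negative) and 1 (positive).

```python
import math
from collections import Counter


class NaiveBayesSentiment:
    """Tiny multinomial naive Bayes; score() returns P(positive | words)."""

    def __init__(self):
        self.word_counts = {0: Counter(), 1: Counter()}  # 0 = negative, 1 = positive
        self.doc_counts = Counter()
        self.vocab = set()

    def train(self, labeled_tweets):
        """labeled_tweets: iterable of (list_of_words, label) with label 0 or 1."""
        for words, label in labeled_tweets:
            self.doc_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)

    def _log_score(self, words, label):
        counts = self.word_counts[label]
        total = sum(counts.values()) + len(self.vocab)  # add-one smoothing
        prior = self.doc_counts[label] / sum(self.doc_counts.values())
        # Words never seen in training carry no information and are skipped.
        return math.log(prior) + sum(
            math.log((counts[w] + 1) / total) for w in words if w in self.vocab
        )

    def score(self, words):
        """Return a value in [0, 1]; above 0.5 means more positive than negative."""
        diff = self._log_score(words, 1) - self._log_score(words, 0)
        # Numerically stable logistic function of the log-odds.
        if diff >= 0:
            return 1.0 / (1.0 + math.exp(-diff))
        e = math.exp(diff)
        return e / (1.0 + e)
```

The model needs at least one training example of each class before `score` can be called. Averaging `score` over all tweets of a day gives one sentiment value per day.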

    At first glance, the graph doesn't seem to change that much, but it's important to realize that Bitcoin is a worldwide phenomenon and everyone talks about it - therefore its sentiment doesn't oscillate as wildly as that of some shitcoin (which only 20 people talk about). Despite this, we can beautifully follow the events of the last months and see them affecting sentiment and pushing it up and down between scores of 0.6 and 0.7:

    • During the whole bullrun towards the end of 2017, sentiment almost never got under the "moving average" line
    • We can clearly see a drastic drop in sentiment during the "Bitcoin Cash weekend", when it fell to a score of 50% and there was a war on social media between the two camps
    • Sentiment got much worse already BEFORE the correction! - COULD WE PREDICT THIS???
    • During the whole correction, the sentiment of the 3-day groups was almost always under the moving average.
    • The only exception was the first bulltrap, when people immediately got excited because we were "mooning" again :D
    • After 6th-7th February, there was a huge sentiment boost as we strongly bounced off 6k. Sentiment grew from under 0.6 to 0.7 (roughly a 20% gain)
    • The general trend is hard to see on a small time scale, but if we "zoom out", we can see an uptrend
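The "3-day groups" and the "moving average" line mentioned above can be computed roughly as follows. This is a sketch under assumptions: the post does not state the window size or whether the average is trailing or centered, so the trailing window of 5 groups and the function names are made up for the example.

```python
def group_means(daily_scores, size=3):
    """Average consecutive days in groups of `size` (the last group may be shorter)."""
    groups = [daily_scores[i:i + size] for i in range(0, len(daily_scores), size)]
    return [sum(group) / len(group) for group in groups]


def moving_average(values, window=5):
    """Trailing moving average; early points use as many values as are available."""
    averages = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1):i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages


def below_average(values, window=5):
    """Flag every value that lies under its own trailing moving average."""
    return [v < avg for v, avg in zip(values, moving_average(values, window))]
```

Feeding the daily sentiment scores through `group_means` and then `below_average` shows which 3-day groups sit under the line, which is the pattern described for the correction period.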

    This graph looks a bit weird, but getting tweets for all those years would surely kill my PC :D. On the bigger timeframe, we can see there's an uptrend in average sentiment. Also, don't get angry at me, I'm actually not that negative on Bitcoin Cash, it's just for fun :D
  6. WOW, as always with coding, this took muuuuuch more time than originally estimated. I wanted to do an ETHEREUM analysis as well, but it'll have to be in another post cuz I'm straight up DEAD! The next coffee would kill me :)

    Last but not least, proof of the originality of my work - a wordcloud of the most popular hashtags (hashcloud?) over the whole 6 analyzed months:

    Thanks for reading!
    Martin

    ***


    You can find my latest posts here:

    1. DROPSHIPPING mania. How people passively earn thousands of $$$ and why I find it DISGUSTING...!!
    2. Are we getting more and more STUPID!? ...DEFENSE of 2 latest generations and the cloud storage
    3. [Skydive jump] How 10 minutes destroyed my next 2 weeks, but enhanced many of upcoming years
    4. 10minute freewrite SCHIZOPHRENIC entry - Same topic, 2 points of view, 2 entries...CHOOSE YOUR ATTITUDE!
    5. 12 hours in train - 1 weird comparison and yet another reason why I HATE alcohol
