
Mining Social Media Using Python 2.7

Greetings!

In this post, I will show you how to mine social media, or more precisely, Twitter! It is a simple process, and I will show you how to do it in Python 2.7 in a couple of steps.

Step 1 – Install Python Packages

First of all, here are the packages that we are going to use for this project: json and python-twitter.

The json module has been part of Python's standard library since version 2.6, so only python-twitter needs to be installed, and it pulls in its own dependencies. After that, you are ready to start!
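If you want to double-check the setup before going further, a quick sanity check can help. The sketch below is only an illustration: it assumes python-twitter was installed with something like `pip install python-twitter` and that it is imported under the module name `twitter`, as in the code later in this post. The helper name `is_importable` is made up for this example.

```python
def is_importable(module_name):
    """Return True if the named module can be imported, False otherwise."""
    try:
        __import__(module_name)
        return True
    except ImportError:
        return False


# 'json' ships with Python; 'twitter' is the module python-twitter provides.
for name in ('json', 'twitter'):
    status = 'ok' if is_importable(name) else 'missing'
    print('%s: %s' % (name, status))
```

If `twitter` is reported as missing, install the package and run the check again.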

Step 2 – Make a Twitter app

This is an easy step and I am going to walk you through it. First go here and log in to your Twitter account. This is Twitter's developer site, where you can build your own apps!

Click on the button “Create new app” at the top right corner. Fill in the blanks with your information and then click on “Create your Twitter application”. Here is an example.

After you have created your app, you will be redirected to the app's homepage. Go to Keys and Access Tokens and click on “Create My Access Token” at the bottom of the page. At the top of the page you can find your consumer key and secret, and at the bottom your access tokens. Here is an example.

Write down those keys and remember, they are secret! DO NOT SHARE! After that, you need to adjust your app's access level, just to avoid further validation (if you are only going to use it with your own account, you do not need to change this). Go to Permissions, select “Read Only”, and click Update Settings. That's it! Now we can write code.
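One way to keep the keys out of your source files is to read them from environment variables. The following is a minimal sketch, not part of the original code: the variable names (`TWITTER_CONSUMER_KEY` and so on) and the helper `load_credentials` are assumptions chosen for illustration.

```python
import os

# Hypothetical environment variable names; pick whatever fits your setup.
ENV_NAMES = {
    'consumer_key': 'TWITTER_CONSUMER_KEY',
    'consumer_secret': 'TWITTER_CONSUMER_SECRET',
    'access_token_key': 'TWITTER_ACCESS_TOKEN_KEY',
    'access_token_secret': 'TWITTER_ACCESS_TOKEN_SECRET',
}


def load_credentials(environ=os.environ):
    """Read the four Twitter credentials from a mapping of environment variables."""
    missing = [env for env in ENV_NAMES.values() if not environ.get(env)]
    if missing:
        raise KeyError('Missing environment variables: ' + ', '.join(sorted(missing)))
    return dict((field, environ[env]) for field, env in ENV_NAMES.items())


# Demonstration with a fake environment so the sketch runs anywhere.
demo_env = {
    'TWITTER_CONSUMER_KEY': 'ck',
    'TWITTER_CONSUMER_SECRET': 'cs',
    'TWITTER_ACCESS_TOKEN_KEY': 'atk',
    'TWITTER_ACCESS_TOKEN_SECRET': 'ats',
}
credentials = load_credentials(demo_env)
print(sorted(credentials.keys()))
```

The dictionary keys deliberately match the four argument names used by the class in Step 3, so the values can be passed straight to its constructor.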

Step 3 – Get the Tweets

First of all, we want to import the appropriate packages.

[code language="python"]
import twitter
import json
[/code]

The json module is needed because the Twitter API returns each tweet in JSON format. For example:

{"created_at": "Wed Mar 01 09:44:29 +0000 2017",
 "hashtags": [], 
 "id": 836874776106926080,
 "id_str": "836874776106926080",
 "lang": "en",
 "media": [
     {... "text": "First blog post https://t.co/Uqp7sA86Tw 
                   https://t.co/4zkWvT1EtN",
          "urls": [
                  {"expanded_url": "https://mydatam...", 
                   "url": "htt..."}], 
 "user": {"id": }, 
 "user_mentions": []}

We need to access the text field, so let’s see how we can accomplish that.
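Before connecting to anything, here is the idea in isolation. This is a small offline sketch: the `raw` string is a hand-trimmed version of the example tweet, keeping only a few fields and assuming that `text` sits at the top level of the object, as the code later in this post expects.

```python
import json

# A trimmed-down tweet containing only a few of the fields from the example.
raw = '''{"created_at": "Wed Mar 01 09:44:29 +0000 2017",
 "id": 836874776106926080,
 "id_str": "836874776106926080",
 "lang": "en",
 "text": "First blog post https://t.co/Uqp7sA86Tw https://t.co/4zkWvT1EtN"}'''

# json.loads turns the JSON string into a regular Python dictionary.
tweet = json.loads(raw)

# After parsing, the text field is an ordinary dictionary lookup.
text = tweet['text']
print(text)
```

Once the tweet is a dictionary, every other field (`lang`, `created_at`, and so on) can be read the same way.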

First, we need to connect to Twitter’s API. This is where we are going to use the API keys we generated earlier.

[code language="python"]
# Wrap the sampling logic in a class so it is easy to reuse
class SampleTwitter:
    # Class-level variables holding the API credentials
    consumer_key = ''
    consumer_secret = ''
    access_token_key = ''
    access_token_secret = ''

    def __init__(self, consumer_key, consumer_secret, access_token_key,
                 access_token_secret):
        # Store the Twitter tokens on the class
        SampleTwitter.consumer_key = consumer_key
        SampleTwitter.consumer_secret = consumer_secret
        SampleTwitter.access_token_key = access_token_key
        SampleTwitter.access_token_secret = access_token_secret
[/code]

As you can see, I created a class because I use this sampling a lot in my research; I suggest you do the same. When I create an object of the class, I pass in the API keys. Next, in the SampleTwitter class, I created a method called getTweets() that takes as input the account I want to sample. BE CAREFUL, Twitter limits how many tweets you can retrieve!

[code language="python"]
    # Use the python-twitter package to get the tweets,
    # where screen_name is the name of the account you want to sample
    def getTweets(self, screen_name):
        # Connect to the Twitter API
        api = twitter.Api(
            consumer_key=SampleTwitter.consumer_key,
            consumer_secret=SampleTwitter.consumer_secret,
            access_token_key=SampleTwitter.access_token_key,
            access_token_secret=SampleTwitter.access_token_secret)
        statuses = api.GetUserTimeline(screen_name=screen_name,
                                       count=200, include_rts=True,
                                       trim_user=False, exclude_replies=True)
        # Gather all tweets into a list
        tweets = []

        for status in statuses:
            # Each status is converted to a JSON string and parsed
            tweet = json.loads(str(status))
            tweets.append(tweet['text'])

        return tweets
[/code]

As you can see, inside the for loop I extract each tweet's text from its JSON representation. I also want to talk about the parameters of the GetUserTimeline call. Here I sampled up to the last 200 tweets, excluding replies (exclude_replies=True), including retweets (include_rts=True), and keeping the full user information (trim_user=False). You can find all the parameters here.
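A single call returns at most one page of tweets, so going further back means asking repeatedly for tweets older than the oldest one already seen. GetUserTimeline accepts a max_id parameter for this. The sketch below shows only the paging logic and runs offline: `collect_texts`, `fetch_page`, and the fake timeline are hypothetical stand-ins, not part of python-twitter. In real use, `fetch_page` would wrap a call to GetUserTimeline with `max_id` set.

```python
def collect_texts(fetch_page, max_pages=5):
    """Collect tweet texts page by page, walking backwards through a timeline.

    fetch_page is a hypothetical callable: given max_id (or None for the
    newest tweets) it returns a list of dicts with 'id' and 'text' keys,
    containing only tweets whose id is less than or equal to max_id.
    """
    texts = []
    max_id = None
    for _ in range(max_pages):
        page = fetch_page(max_id)
        if not page:
            break  # nothing older is left
        texts.extend(tweet['text'] for tweet in page)
        # Ask next time for tweets strictly older than the oldest one seen.
        max_id = min(tweet['id'] for tweet in page) - 1
    return texts


# Stand-in for the real API call, so the sketch runs without network access.
FAKE_TIMELINE = [{'id': i, 'text': 'tweet %d' % i} for i in range(10, 0, -1)]


def fake_fetch(max_id, page_size=4):
    older = [t for t in FAKE_TIMELINE if max_id is None or t['id'] <= max_id]
    return older[:page_size]


all_texts = collect_texts(fake_fetch)
print(len(all_texts))
```

The `max_pages` cap keeps the number of requests bounded, which matters because of the retrieval limits mentioned earlier.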

Step 4 – Call the class and iterate through the tweets

Finally, I created a main.py file to retrieve the tweets.

[code language="python"]
# Import your class
from sample import SampleTwitter

consumer_key = 'your consumer key'
consumer_secret = 'your consumer secret'
access_token_key = 'your access token key'
access_token_secret = 'your access token secret'
# Create your object
sampling = SampleTwitter(consumer_key, consumer_secret, access_token_key, access_token_secret)
# Call the getTweets() method with the account you want to sample
tweets = sampling.getTweets('siaterliskonsta')

# Iterate through the tweets
for tweet in tweets:
    print(tweet)
[/code]
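Because retrieval is limited, it can be handy to store the harvested tweets on disk instead of fetching them again on every run. The following is one possible sketch, not something from the original code: the helper names `save_tweets` and `load_tweets` and the file name `tweets.json` are made up for illustration, and the sample list stands in for the output of getTweets().

```python
import io
import json
import os
import tempfile


def save_tweets(tweets, path):
    """Write a list of tweet texts to a UTF-8 encoded JSON file."""
    # ensure_ascii=False keeps non-ASCII characters readable in the file.
    data = json.dumps(tweets, ensure_ascii=False, indent=2)
    if isinstance(data, bytes):  # Python 2 may return a byte string here
        data = data.decode('utf-8')
    with io.open(path, 'w', encoding='utf-8') as handle:
        handle.write(data)


def load_tweets(path):
    """Read the list of tweet texts back from the JSON file."""
    with io.open(path, 'r', encoding='utf-8') as handle:
        return json.load(handle)


# Stand-in data; in main.py this would be the list returned by getTweets().
sample = [u'First blog post', u'Caf\u00e9 time']
path = os.path.join(tempfile.mkdtemp(), 'tweets.json')
save_tweets(sample, path)
restored = load_tweets(path)
print(len(restored))
```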

Conclusion

That's it! You can now sample a Twitter account, harvest its tweets, and process the results. Be careful though: as I said before, there is a limit on how many tweets you can retrieve! Until next time, take care and have fun!

Yours,

Siaterlis Konstantinos

 

P.S. The whole code of this post is here.