In the following blog post, you will learn:
- How to download all of a user's Tweets (up to the 3,200 the API allows)
- How to mine the data using Python and extract the Tweets
- The basics of using Twython (a popular Python wrapper for the Twitter API)
- How you can use some of the Twython example scripts for your own exploration
What you'll need:
- Python 2.7 (it will probably work with Python 3 but it's untested)
- Twython (latest version compatible with Python 2 should be fine)
- Twitter API Keys
According to the Twitter API, you can extract at most a user's 3,200 most recent tweets, 200 per request. To make those requests I'm using Twython, a Python wrapper for the API. In the following tutorial, I'm going to show you how to get all of those tweets from a user.
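Those two limits fix how many requests a full download takes. As a quick worked example (the 3,200 and 200 figures come from the paragraph above; `requests_needed` is just an illustrative helper name):

```python
import math

MAX_TWEETS = 3200  # most recent tweets the timeline endpoint will return
PAGE_SIZE = 200    # maximum tweets per request (the count parameter)

def requests_needed(max_tweets, page_size):
    """Number of paged requests needed to cover max_tweets."""
    return int(math.ceil(max_tweets / float(page_size)))

print(requests_needed(MAX_TWEETS, PAGE_SIZE))  # 16
```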
After reading the docs and doing a little searching, I couldn't find anything on extracting the full 3,200 tweets. I figured there was just some magic happening on the API side, and that with multiple calls it would know to give you the next set of tweets, so I came up with something like this:
```python
for i in range(0, 16):  # iterate through 16 times to get max No. of tweets
    user_timeline = twitter.get_user_timeline(screen_name="craigaddyman", count=200)
    for tweet in user_timeline:
        print(tweet['text'])
```
This simply makes the same call 16 times (200 × 16 = 3,200, the maximum number of tweets). All it actually did, though, was extract the latest 200 tweets 16 times.
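To see why, here is a small offline simulation. `fake_get_user_timeline` is a hypothetical stand-in for the real call, not part of Twython: like the endpoint, it returns the newest `count` tweets (newest first) whenever no `max_id` is given. The tweet IDs are made up.

```python
def fake_get_user_timeline(count=200, max_id=None):
    """Hypothetical stand-in for the endpoint: newest-first tweets,
    optionally limited to IDs <= max_id (max_id is inclusive)."""
    all_ids = list(range(1000, 0, -1))  # 1,000 made-up tweet IDs, newest first
    if max_id is not None:
        all_ids = [i for i in all_ids if i <= max_id]
    return [{'id': i, 'text': 'tweet %d' % i} for i in all_ids[:count]]

seen = []
for i in range(0, 16):  # the naive loop: the identical call every time
    for tweet in fake_get_user_timeline(count=200):
        seen.append(tweet['id'])

print(len(seen))       # 3200 tweets collected...
print(len(set(seen)))  # ...but only 200 distinct ones
```

Nothing in the request changes between iterations, so the endpoint has no way of knowing you want the next page.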
After more reading, I came across a passage in the API docs explaining the max_id and since_id parameters. Essentially, they let you specify which tweet ID to extract from. This excerpt from the docs should make it a little clearer:
To use max_id correctly, an application's first request to a timeline endpoint should only specify a count. When processing this and subsequent responses, keep track of the lowest ID received. This ID should be passed as the value of the max_id parameter for the next request, which will only return Tweets with IDs lower than or equal to the value of the max_id parameter. Note that since the max_id parameter is inclusive, the Tweet with the matching ID will actually be returned again.
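The procedure in that excerpt can be sketched offline. As before, `fake_timeline` is a hypothetical in-memory stand-in for the endpoint (500 made-up IDs, newest first, `max_id` inclusive). The loop makes a count-only first request, tracks the lowest ID received, and then passes `lowest - 1` as `max_id` so the inclusive boundary tweet is not fetched a second time.

```python
def fake_timeline(count, max_id=None):
    """Hypothetical endpoint stand-in: newest-first IDs, max_id inclusive."""
    ids = list(range(500, 0, -1))  # 500 made-up tweet IDs, newest first
    if max_id is not None:
        ids = [i for i in ids if i <= max_id]
    return [{'id': i} for i in ids[:count]]

def fetch_all(count=200):
    collected = []
    batch = fake_timeline(count)  # first request: count only
    while batch:
        collected.extend(t['id'] for t in batch)
        lowest = min(t['id'] for t in batch)  # track the lowest ID received
        # max_id is inclusive, so ask only for IDs strictly below the lowest
        batch = fake_timeline(count, max_id=lowest - 1)
    return collected

ids = fetch_all()
print("%d collected, %d unique" % (len(ids), len(set(ids))))  # 500 collected, 500 unique
```

Passing `lowest` itself instead of `lowest - 1` would also work, but every page after the first would start with a tweet you already have.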
With this info and a little playing around, I came up with the approach below. First, find the ID of the most recent tweet the account posted, which you can do like this:
```python
user_timeline = twitter.get_user_timeline(screen_name="craigaddyman", count=1)
print(user_timeline[0]['id'])  # the response is a list, even with count=1
```
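The `[0]` is needed because `get_user_timeline` hands back a list of tweet dictionaries, not a single tweet. The snippet below parses a made-up, heavily trimmed response with the standard `json` module to show that shape; real tweet objects carry many more fields.

```python
import json

# A made-up, heavily trimmed user_timeline response for count=1:
# a JSON array holding a single tweet object.
raw = '[{"id": 467020906049835008, "text": "example tweet"}]'

user_timeline = json.loads(raw)      # a list, not a dict
latest_id = user_timeline[0]['id']   # the first (newest) tweet's ID
print(latest_id)  # 467020906049835008
```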
This ID will be something like 467020906049835008. Now we introduce the max_id parameter: we put this ID in a list, and as we extract the tweets we append each tweet's ID to that list. Because the API returns tweets newest first, the last item in the list is always the lowest ID from the previous extraction, and that is what we pass as max_id for the next request.
Here is the final code, along with the imports and authentication it requires:
```python
from twython import Twython  # pip install twython
import time  # standard lib

''' Go to https://apps.twitter.com/ to register your app to get your api keys '''
CONSUMER_KEY = ''
CONSUMER_SECRET = ''
ACCESS_KEY = ''
ACCESS_SECRET = ''

twitter = Twython(CONSUMER_KEY, CONSUMER_SECRET, ACCESS_KEY, ACCESS_SECRET)

lis = [467020906049835008]  # the latest starting tweet id (use the id from the count=1 call)
for i in range(0, 16):  # 16 calls x 200 tweets = 3,200, the API maximum
    # max_id is inclusive: the first call includes the starting tweet,
    # later calls start just below the lowest id already collected
    max_id = lis[-1] if i == 0 else lis[-1] - 1
    user_timeline = twitter.get_user_timeline(screen_name="craigaddyman", count=200,
                                              include_rts=False,  # the API's name for "skip retweets"
                                              max_id=max_id)
    if not user_timeline:  # no older tweets left
        break
    for tweet in user_timeline:
        print(tweet['text'])  # print the tweet
        lis.append(tweet['id'])  # append tweet ids; the last one is the lowest
    time.sleep(300)  # 5 minute rest between api calls
```
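To reuse the loop for other accounts, one option is to wrap it in a generator that takes the client as a parameter. `iter_timeline` and `FakeClient` are hypothetical names introduced for illustration. `FakeClient` mimics only the keyword-argument `get_user_timeline(...)` call used above, so the paging logic can be checked offline; with a real `Twython` instance you would pass `twitter` and a non-zero `pause`. As in the code above, an empty batch is treated as the end of the timeline.

```python
import time

def iter_timeline(client, screen_name, max_pages=16, pause=0):
    """Yield a user's tweets newest-first, paging backwards with max_id."""
    params = {'screen_name': screen_name, 'count': 200, 'include_rts': False}
    for _ in range(max_pages):  # 16 pages x 200 = 3,200 max
        batch = client.get_user_timeline(**params)
        if not batch:  # empty batch: assume the timeline is exhausted
            return
        for tweet in batch:
            yield tweet
        # batches are newest-first, so the last tweet has the lowest ID;
        # subtract 1 because max_id is inclusive
        params['max_id'] = batch[-1]['id'] - 1
        time.sleep(pause)  # pause between calls to respect rate limits

class FakeClient(object):
    """Hypothetical offline stand-in for a Twython client."""
    def __init__(self, n):
        self.ids = list(range(n, 0, -1))  # made-up IDs, newest first
        self.calls = 0

    def get_user_timeline(self, screen_name, count, include_rts, max_id=None):
        self.calls += 1
        ids = [i for i in self.ids if max_id is None or i <= max_id]
        return [{'id': i, 'text': 'tweet %d' % i} for i in ids[:count]]

client = FakeClient(450)
tweets = list(iter_timeline(client, 'craigaddyman'))
print(len(tweets))  # 450
```

One thing worth knowing about the pause: with the original `time.sleep(300)`, a full run of 16 requests spends 16 × 5 = 80 minutes just waiting.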
Let me know if you found this "mining twitter with python" tutorial useful.