Tweepy code samples
Common use-cases for the Twitter API and how to solve them in Python 3 using Tweepy
Quick links
Jump to some highlighted sections
Get users — Get tweets — Post tweet — Search API — Streaming
TL;DR
A summary of this page.
If you have authenticated with Twitter as per the Authentication instructions, then you can interact with the Twitter API using the api
object.
For example:
my_profile = api.me()
tweets = api.search(q="#foo")
print(tweets[0])
new_tweet = api.update_status("Hello, world!")
Keep reading this page for more details.
About
This section aims to make things easier by doing the groundwork for you and suggesting a good path, with recommended code snippets and samples of the data returned. This guide is not meant to be complete, but rather to cover typical situations in a way that is easy for beginners to follow.
This is based on the Tweepy docs, the Tweepy code and the Twitter API docs.
Snippet use:
You may copy and paste the code here into your own project and modify it as you need.
Pasting into a script and running it is straightforward. Pasting most code into the interactive terminal is also fine, but you'll get a syntax error if you paste a function that contains empty lines, so use a script for that.
Naming conventions
- A tweet is called a status in the API and Tweepy.
- A profile is called a user or author in the API and Tweepy.
- A username is called a screen name in the API and Tweepy.
These terms will be used interchangeably in this guide.
Tweepy API overview
The api
object returned in the auth section above will cover most of your needs for requesting the Twitter API, whether fetching or sending data.
The api object is an instance of the tweepy.API class, which is covered in the docs here. The docs are useful for seeing the allowed parameters, how they are used and what is returned.
The methods on tweepy.API
also include some useful links in their docstrings, pointing to the Twitter API endpoints docs. These do not appear in the Tweepy docs. Therefore you might want to look at the api.py script in the Tweepy repo to see these links.
Twitter API docs: API reference index - a list of all available endpoints. Tweepy implements most of these I think. For more info on the API, see Resources page.
How do I get a high volume of tweets?
- Add a waiting config option as per the auth guide so that Tweepy will automatically wait when it hits a rate limit.
- Use Paging so that Tweepy will iterate over multiple pages for you.
- Pick a token auth approach that gives the most tweets in a window. See the Rate limits section on the Twitter policies page. For example, an App-only Token is more suitable for search than an App Access Token (with user context).
Paging
Follow the Tweepy tutorial to get familiar with how to use a Cursor to do paging - iterate over multiple pages of items of say 100 tweets each.
Tweepy docs: Cursor tutorial. The tutorial also explains truncated and full text.
Setup the cursor
An api
method must be passed to the cursor, along with any parameters.
cursor = tweepy.Cursor(
api.search,
query,
count=100
)
Tweepy repo: Cursor class
Pages and items
When iterating over the cursor, you must specify if you want the response to be pages or items.
Pages is how the Twitter API works - you get multiple pages of say 100 tweets each, so you iterate over pages, each of which holds a list (or iterator) of tweets.
for page in cursor.pages():
    for tweet in page:
        print(tweet.id)
Or you can use the items approach, where Tweepy flattens multiple pages into what feels like one long list (or iterator).
for tweet in cursor.items():
    print(tweet.id)
Limit
The cursor will carry on until it gets all available data. You can optionally stop earlier by passing a limit.
In both examples below, we process 5 pages of 100 tweets each and get a total of 500 tweets.
# Limit by number of items.
for tweet in cursor.items(500):
    print(tweet.id)
# Limit by number of pages.
for page in cursor.pages(5):
    for tweet in page:
        print(tweet.id)
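To make the relationship between pages, items and the limit concrete, here is a plain-Python analogy that does not call Twitter at all. The `fake_pages` generator and the two helper functions are hypothetical stand-ins invented for this illustration; Tweepy's real Cursor is implemented differently.

```python
import itertools


def fake_pages(total_items, page_size):
    """Yield lists of integers, mimicking pages of tweet IDs (made-up data)."""
    for start in range(0, total_items, page_size):
        yield list(range(start, min(start + page_size, total_items)))


def limited_pages(pages, limit):
    """Analogous to cursor.pages(limit): stop after `limit` pages."""
    return itertools.islice(pages, limit)


def limited_items(pages, limit):
    """Analogous to cursor.items(limit): flatten pages, stop after `limit` items."""
    return itertools.islice(itertools.chain.from_iterable(pages), limit)


# 5 pages of 100 items, or 500 items: the same data either way.
by_page = [len(page) for page in limited_pages(fake_pages(1000, 100), 5)]
by_item = list(limited_items(fake_pages(1000, 100), 500))
print(by_page, len(by_item))
```

Both forms stop at the same point; the only difference is whether your loop body receives a whole page or a single item.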
Get users
Various approaches to get profiles of Twitter users
Use api.get_user to get one user by ID or screen name, or use api.lookup_users to get many users. Read on for more details.
Fetch the profile for the authenticated user
api.me()
Get the author of a tweet
Whenever you have a tweet object, you can find the profile that authored the tweet without doing a further API call.
tweet.author
See the models page in this guide for the attributes on a User instance.
Fetch profile by ID
Lookup a single profile
By screen name.
screen_name = "foo"
user = api.get_user(screen_name=screen_name)
Or by user ID.
user_id = 1234567
user = api.get_user(user_id=user_id)
Tweepy docs: API.get_user
Then you can inspect the user object or do actions on it. See the User section of the models page.
Example:
user.screen_name
# => "foo"
user.id
# => 1234567
user.followers_count
# => 99
Lookup user ID for a screen name
How to get the profile and user ID for a given screen name.
user = api.get_user(screen_name='foo')
Get the user ID as an int.
user_id = user.id
# => 1234567
Get the user ID as a str. You probably don't need this; use .id instead.
user_id = user.id_str
# => "1234567"
Lookup many profiles
Lookup one or more users at once using their screen names.
screen_names = ["foo", "bar", "baz"]
users = api.lookup_users(screen_names=screen_names)
Or lookup one or more users by their IDs.
user_ids = [123, 456, 789]
users = api.lookup_users(user_ids=user_ids)
Tweepy docs: API.lookup_users
The endpoint only lets you request up to 100 IDs at once, so you'll never get more than one page of results. Therefore, to get more results, batch your IDs into groups of 100 and then look up each group.
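The batching described above could be sketched as follows. The helper names `batched` and `lookup_all_users` are made up for this guide, and `api` is assumed to be an authenticated tweepy.API instance set up as per the Authentication instructions.

```python
def batched(values, size=100):
    """Yield consecutive slices of `values`, each with at most `size` items."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def lookup_all_users(api, user_ids):
    """Look up any number of user IDs, sending at most 100 per request.

    `api` is assumed to be an authenticated tweepy.API instance.
    """
    users = []
    for batch in batched(user_ids, 100):
        users.extend(api.lookup_users(user_ids=batch))
    return users
```

Each batch costs one request against the rate limit, so 250 IDs would take three requests.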
Search for user
users = api.search_users(q, count=20)
The count argument may not be greater than 20 according to Tweepy docs, but you may use paging.
Tweepy docs: API.search_users
Get followers of a user
Followers method
Get the followers of a given user.
- api.followers
Returns a user’s followers ordered in which they were added. If no user is specified by id/screen name, it defaults to the authenticated user.
- Specify user ID or screen name.
- Supports paging.
- Returns a list of
tweepy.User
objects.
for user in api.followers(screen_name="foo"):
    print(user.screen_name)
Follower IDs method
Similar to above, but only returns user IDs and not users.
- api.followers_ids
Returns an array containing the IDs of users following the specified user.
- Specify user ID or screen name.
- Supports paging.
- Returns a list of int objects.
This can be useful if you want to map user IDs to user IDs in a graph of followers, maybe combined with tweet IDs, without actually using profile data like screen name.
for user_id in api.followers_ids(screen_name="foo"):
    print(user_id)
With paging:
cursor = tweepy.Cursor(
    api.followers_ids,
    screen_name="foo",
    count=100
)
user_id_pages = list(cursor.pages())
You can combine this approach with the Lookup many profiles method, to look up a batch of users with known IDs.
users = []
for user_ids in user_id_pages:
    users.extend(api.lookup_users(user_ids=user_ids))
Each lookup request accepts at most 100 user IDs. Here we use the pages from above, so the IDs are already batched.
This uses two steps to get IDs and the users, so consider the rate limit impact for the first and second step.
Rate limits on follower approaches
See Rate Limits on the Twitter Policies page for details.
If you want to see which approach works better for you at scale, see these references from people who have done research:
API | Max Return/Call Size | Requests / 15-min window | Total Results Per Window |
---|---|---|---|
followers/list | 200 | 15 | 3000 |
followers/ids | 5000 | 15 | 75000 |
users/lookup | 100 | 180 | 18000 |
Twitter provides two ways to fetch the followers:
Fetching the full followers list (using followers/list in the Twitter API or api.followers in Tweepy) - Alec and mataxu have provided the approach to fetch this way in their answers. The rate limit means you can get at most 200 * 15 = 3000 followers in every 15-minute window.
The second approach involves two stages:
a) Fetching only the follower IDs first (using followers/ids in the Twitter API or api.followers_ids in Tweepy). You can get 5000 * 15 = 75K follower IDs in each 15-minute window.
b) Looking up their usernames or other data (using users/lookup in the Twitter API or api.lookup_users in Tweepy). This has a rate limit of about 100 * 180 = 18K lookups in each 15-minute window.
Considering the rate limits, the second approach gives followers data 6 times faster than the first approach.
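The arithmetic behind that comparison can be reproduced with a few lines, using only the numbers from the table above. The two-step approach is capped by whichever of its two endpoints allows fewer results per window, which is users/lookup.

```python
# (max results per call, calls per 15-minute window), taken from the table above.
LIMITS = {
    "followers/list": (200, 15),
    "followers/ids": (5000, 15),
    "users/lookup": (100, 180),
}

per_window = {name: size * calls for name, (size, calls) in LIMITS.items()}

one_step = per_window["followers/list"]
# The two-step approach is limited by its slower stage.
two_step = min(per_window["followers/ids"], per_window["users/lookup"])

print(per_window)
# => {'followers/list': 3000, 'followers/ids': 75000, 'users/lookup': 18000}
print(two_step / one_step)
# => 6.0
```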
Get tweets
If you want to do a search for tweets based on hashtags or phrases or that are directed at a user, go to the Search API section.
Links:
- Twitter API: Timelines overview
- Twitter API: Post, retrieve, and engage with Tweets
Get a user's most recent status
This may be truncated since you can't specify tweet mode as extended.
Note that the Twitter API says this is supplied if available, but this is not guaranteed, especially during high activity, so make your application robust enough to handle a missing status.
Get the most recent status on a user object.
user.status
See the Get user section for getting a user.
Get exactly one status for a given user.
api.user_timeline(screen_name=screen_name, count=1)
Get my timeline
Get tweets from your own user's timeline, as a mix of your own and your friends' tweets.
tweets = api.home_timeline()
Returns the 20 most recent statuses, including retweets, posted by the authenticating user and that user’s friends. This is the equivalent of /timeline/home on the Web.
Tweepy docs: API.home_timeline
Get a user's timeline
Get the most recent tweets by a user. You can specify user_id or screen_name to target a user.
screen_name = "foo"
tweets = api.user_timeline(screen_name=screen_name)
If you don't specify a user, the default behavior is for the authenticated user.
tweets = api.user_timeline()
The API doesn't say what the default is, but the max without paging is 200, so you can request 1 to 200 without paging.
tweets = api.user_timeline(count=200)
Tweepy docs: API.user_timeline
Twitter API docs: GET statuses/user_timeline - note the daily limit of 100k tweets and that only the 3,200 most recent tweets are available. Otherwise there is no real restriction on how many days or years you can go back.
Fuller examples
Get the latest 200
tweets of a user.
See Extended message section regarding the Tweet mode parameter.
screen_name = "foo"
tweets = api.user_timeline(
screen_name=screen_name,
count=200,
tweet_mode="extended",
)
for tweet in tweets:
    try:
        print(tweet.full_text)
    except AttributeError:
        print(tweet.text)
Using paging to get 1000
tweets - 3200
is the max for a timeline.
screen_name = "foo"
cursor = tweepy.Cursor(
api.user_timeline,
screen_name=screen_name,
count=200,
tweet_mode="extended",
)
for tweet in cursor.items(1000):
    try:
        print(tweet.full_text)
    except AttributeError:
        print(tweet.text)
Get expanded message on a user's retweets
Note that even though we use extended mode to show expanded rather than truncated tweets, the message of a retweet will still be truncated. So you can use this approach to get the full message from the original tweet.
Example from source.
tweets = api.user_timeline(id=2271808427, tweet_mode="extended")
# This is still truncated.
tweets[6].full_text
# => 'RT @blawson_lcsw: So proud of these amazing @HSESchools students who presented their ideas on how to help their peers manage stress in mean…'
# Original expanded text.
tweets[6].retweeted_status.full_text
# => 'So proud of these amazing @HSESchools students who presented their ideas on how to help their peers manage stress in meaningful ways! Thanks @HSEPrincipal for giving us your time!'
Tweepy docs: Handling Retweets in Extended Tweets guide.
Get the latest tweet from users
You can use this approach, which is fine to do for one user.
tweets = api.user_timeline(count=1)
If you need to go through 100 users and get their latest tweet, this would take 100 separate requests.
A more efficient way would be to look up the 100 profiles at once and then get the latest tweet on each user object.
screen_names = ["foo", "bar", "baz"]
users = api.lookup_users(screen_names=screen_names)
Each returned user may then carry its latest tweet as user.status, as covered in the Get a user's most recent status section.
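As a rough sketch, the latest tweet could be read from each looked-up user through the user.status attribute described in the Get a user's most recent status section. The helper name `latest_tweets` is made up for this guide, and since Twitter does not guarantee the status is present, the code falls back to None.

```python
def latest_tweets(users):
    """Map each screen name to the user's embedded latest status, or None."""
    return {
        user.screen_name: getattr(user, "status", None)
        for user in users
    }
```

Usage, assuming `api` is an authenticated tweepy.API instance: `latest = latest_tweets(api.lookup_users(screen_names=screen_names))`. Note that the embedded status may have truncated text.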
Fetch tweets by ID
If you know the ID of a tweet, you can fetch it. This is useful if you want to find the latest engagements count on a tweet, or if you have a list of just IDs from outside Tweepy and you want to turn them into Tweepy objects so you can get the message, author, date, etc.
Lookup a single tweet
tweet_id = 123
api.get_status(tweet_id)
Tweepy docs: API.get_status
Lookup many tweets
tweet_ids = [123, 456, 789]
api.statuses_lookup(tweet_ids)
Tweepy docs: API.statuses_lookup
Get retweets of a tweet
Get up to 100 retweets on a given tweet.
tweet_id = 123
count = 100
retweets = api.retweets(tweet_id, count)
Tweepy docs: API.retweets
See also:
retweets = tweet.retweets()
Get the target of a reply
Get the original tweet that the current tweet replies to, if there is one.
original_tweet_id = tweet.in_reply_to_status_id
if original_tweet_id is not None:
    original_tweet = api.get_status(original_tweet_id)
Get the ID of the user who was the target of the reply.
original_user_id = tweet.in_reply_to_user_id
Get the target of a retweet
If the current tweet is a retweet (i.e. starts with "RT @") then it will have the original tweet as an attribute. Use this code to get the original tweet and default to None if it does not exist.
original_tweet = getattr(tweet, "retweeted_status", None)
You can get the ID or author on that tweet.
Or you can just check whether the tweet is a retweet by checking whether the value is None.
Get media on a tweet
A tweet can have up to 4 media items on it and these can be photos, videos or GIFs.
Get the media by reading the entities attribute and getting the media field, which only exists if there actually are media items.
You must use extended mode, otherwise you will not see media.
Get a tweet - the example below uses api.get_status, but this can be applied to other cases.
tweet_id = 1256704946717822977
tweet = api.get_status(tweet_id, tweet_mode="extended")
Get the media list on the tweet.
media = tweet.entities.get("media", [])
Here we default to an empty list in case the key is not set.
Then you can get the HTTPS media URL on each item in the media list.
Example:
for item in media:
    url = item["media_url_https"]
    # => "https://pbs.twimg.com/media/EXC2A8vXgAEM7Nm.jpg"
Get tweet engagements
See more on the models page of this guide.
Get favorites
tweet.favorite_count
# => 0
Get the favorites list. Supports paging.
tweet.favorites
Get retweets
tweet.retweet_count
# => 0
Get a list of retweets of the tweet. This has a max of 100 but supports paging.
api.retweets(tweet.id)
# Untested
retweets = tweet.retweets()
Get retweeters
Get the user IDs of the users who retweeted the tweet. This has a max of 100 but supports paging.
# Untested
retweeters = tweet.retweeters
Filter tweets by language
Twitter assigns a tweet a language e.g. en
for English or it
for Italian. These languages are available to filter by when doing a search or stream and you can also read the attribute on a fetched tweet.
Twitter dev docs: Supported languages
Twitter API docs: Get Supported Languages endpoint. There is some sample output there.
Where does the value come from?
These language labels are inferred from the content of the tweet.
Tweepy docs say "Language detection is best-effort.".
Warning: In my experience this is not reliable. Tweets appear as unknown language, or several tweets by one user that I can see are all in one language get labelled as different languages. If you still want to use language, you can continue.
What about the settings of the user?
There is no account setting to change what language you are posting in.
There is a Display Language setting in Twitter account settings, but this only controls how the interface appears to you. The help text for the setting explains that it does not affect the content of Tweets.
See the Search API section on this page for more details on how to do searches.
Show the language
tweets = api.search("python")
for tweet in tweets:
    print(tweet.lang, tweet.text)
    if tweet.lang == "en":
        print(tweet.text)
Filter on the result
tweets = api.search("python")
for tweet in tweets:
    if tweet.lang == "en":
        print(tweet.text)
Filter query
Some endpoints let you specify languages so that only matching tweets will be returned.
Search filtered by language
From the api.search docs:
lang – Restricts tweets to the given language, given by an ISO 639-1 code. Language detection is best-effort.
e.g.
tweets = api.search("python", lang="en")
Streaming filtered by language
Note the use of languages, not lang.
e.g.
stream.filter(track=["python"], languages=["en"])
Get replies to a tweet
The only way to get replies to a tweet is using the Search API, which means you can only get replies which happened in the past week.
This approach gets all replies to a user with screen name foo. You can replace the handle with your own.
to:foo filter:replies
You can test this query in the browser.
Here is how to do it with Tweepy.
screen_name = "foo"
query = "to:{} filter:replies".format(screen_name)
tweets = api.search(query)
To get replies to a specific tweet, you'll have to check the tweet.in_reply_to_status_id attribute for a match on the target tweet's ID.
This can be further optimized by specifying a condition in the search which only shows tweets after the target tweet ID, but if you're iterating back from most recent tweets the way Twitter does then it only helps a bit.
You'll also have to apply recursive logic to get replies to replies.
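A minimal sketch of that matching step, assuming you already have a list of candidate reply tweets from the search above. The function name `replies_to_tweet` is made up for this guide and it only handles direct replies, not replies to replies.

```python
def replies_to_tweet(candidates, target_id):
    """Keep only the tweets that reply directly to the tweet with `target_id`."""
    return [
        tweet
        for tweet in candidates
        if getattr(tweet, "in_reply_to_status_id", None) == target_id
    ]
```

For the optimization mentioned above, the candidates could be fetched with something like `api.search(query, since_id=target_id, count=100)`, so that only tweets newer than the target are returned.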
Engage with a tweet
Note that you should only use these actions if you included them in your dev application, otherwise you may get blocked. Also, if you have a read-only app, you can upgrade it to a read and write app.
Please use these sparingly. The automation policy for the Twitter API allows use of these actions as long as they are not used indiscriminately. If you favorite or retweet every tweet on a timeline or in a stream, you may get blocked for spammy low-quality behavior. If you do a search for popular tweets matching a hashtag and engage with a few of them, this will be fine.
See this guide's Twitter policies page.
Favorite
tweet.favorite()
Retweet
tweet.retweet()
Reply
See Create a reply section.
Post tweet
FAQs
Important: Please understand what you are allowed to tweet before doing it.
Can I reply to a tweet or @mention someone?
Yes, but only if they have first messaged you. The Twitter automation policy is strict on this. Please make sure you understand it before replying to tweets.
Doing a search for tweets and replying to them without the user opting in (such as by tweeting to you) is considered spammy behavior and will likely get your account shut down.
Can I make a plain tweet?
If you just want to make a tweet message without replying or mentioning, yes you are allowed to do this using the API. For example a bot which posts content daily from Reddit or a weather or finance service. Or posts a random message from a list or posts a message from a schedule.
Tweet a text message
msg = 'Hello, world!'
tweet = api.update_status(msg)
Tweepy docs: API.update_status. Twitter API docs: POST statuses/update
To choose a random text message:
import random
msgs = ["Foo", "Bar baz"]
msg = random.choice(msgs)
Tweet a message with media
Upload an image or animated GIF. Video upload is not supported by Tweepy yet.
media_path = 'foo.gif'
msg = 'Hello, world!'
tweet = api.update_with_media(media_path, status=msg)
Tweepy docs: API.update_with_media.
Note that this method does still work, but the Tweepy docs say it is deprecated. The preferred approach is to use api.media_upload and then pass the returned media ID in the media_ids list parameter of the api.update_status method covered above.
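The preferred two-step approach might look like the sketch below. It assumes Tweepy 3.x, where `api.media_upload` returns a media object with a `media_id` attribute, and that `api` is an authenticated tweepy.API instance. The function name `tweet_with_media` is made up for this guide.

```python
def tweet_with_media(api, media_path, msg):
    """Upload one media file, then post a status that references it.

    `api` is assumed to be an authenticated tweepy.API instance (Tweepy 3.x).
    """
    # Step 1: upload the file and get back a media object.
    media = api.media_upload(media_path)
    # Step 2: attach the uploaded media to a new tweet by its ID.
    return api.update_status(status=msg, media_ids=[media.media_id])
```

Usage: `tweet = tweet_with_media(api, "foo.gif", "Hello, world!")`. To attach several items, upload each one and pass all the IDs in the media_ids list.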
Create a reply
A reply is a tweet directed at another tweet ID or user. When you reply to a tweet, it becomes a "thread" or "threaded conversation".
Read the Twitter policies page automation rules carefully before automating replies to users. Any message directed at a user without them requesting it from your bot can be considered spam by Twitter. Twitter docs are very specific on when you may reply.
A safe way to make replies is to reply to your own tweets only. This can be used to create a tweet chain such as a 10-part tutorial with text or images.
According to the Tweepy docs for this endpoint, you must do a mention of the screen name somewhere in your message along with using the reply parameter in order for your tweet to count as a reply.
Bearing the notices above in mind, here is how to create a reply.
Read more on the Twitter policies page of this guide.
Here is the general form:
tweet = api.update_status(
message,
in_reply_to_status_id=target_id,
)
Reply example
If you were replying to a tweet directed at your user:
target_id = tweet.id
screen_name = tweet.author.screen_name
msg = f"@{screen_name} thank you!"
api.update_status(
msg,
in_reply_to_status_id=target_id,
)
Reply
Below is how to make a reply chain, aka threaded tweets. This will make an initial tweet and then a series of replies, each one replying to the previous tweet.
A way to make replies without hitting policy restrictions is to make a tweet and then reply to yourself. This means you could chain together a list of say 10 items, perhaps with pictures, and group them together. I've seen this before and it is a great way to overcome the character limit, for example when writing a blog post.
Untested code - it might be better to reply to the initial ID only.
screen_name = api.me().screen_name
messages = [
    "foo bar",
    "fizz buzz",
    "#tweepy #twitterapi",
]
target_id = None
for message in messages:
    if target_id is None:
        print("Initial tweet!")
    else:
        print(f"Replying to tweet ID: {target_id}")
        # A reply must mention the screen name of the tweet it replies to.
        message = f"@{screen_name} {message}"
    tweet = api.update_status(
        message,
        in_reply_to_status_id=target_id,
    )
    # The next message replies to the tweet just posted.
    target_id = tweet.id
Handle time values
Tips on dealing with time values from the Twitter API
Date and time
The Twitter API often provides a datetime value in ISO 8601 format and Tweepy returns this to you as a string still.
e.g. "2020-05-03T18:01:41+00:00"
.
This section covers how to parse a datetime string to a timezone-aware datetime object, to make it more useful for calculations and representations.
import datetime
TIME_FORMAT_IN = r"%Y-%m-%dT%H:%M%z"
def parse_datetime(value):
    """
    Convert from Twitter datetime string to a datetime object.
    >>> parse_datetime("2020-01-24T08:37:37+00:00")
    datetime.datetime(2020, 1, 24, 8, 37, tzinfo=datetime.timezone.utc)
    """
    # Keep everything up to the minutes, dropping seconds and decimals.
    dt = ":".join(value.split(":", 2)[:2])
    # The timezone offset is the last 6 characters, e.g. "+00:00".
    tz = value[-6:]
    clean_value = f"{dt}{tz}"
    return datetime.datetime.strptime(clean_value, TIME_FORMAT_IN)
When splitting, we don't need the seconds or any decimal values. Plus, these have changed style before between API versions so are unreliable. So we just ignore everything after the 2nd colon (minutes) and pick up the timezone from the last 6 characters.
The datetime value from Twitter will always be in the UTC zone (GMT+00:00), regardless of your location or profile settings. Look up the datetime docs for more info.
Example usage:
>>> dt = parse_datetime(tweet.created_at)
>>> print(dt.year)
2020
Timestamp
If you get any numbers which are timestamps such as from the Rate Limit endpoint, you can convert them to datetime objects.
import datetime
timestamp = "1403602426"
datetime.datetime.fromtimestamp(float(timestamp))
# => datetime.datetime(2014, 6, 24, 11, 33, 46)
# The result is in your local timezone; the value above is from a UTC+2 machine.
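Since `fromtimestamp` without a timezone gives a naive datetime in local time, the result differs between machines. If you would rather get a timezone-aware UTC value, consistent with the tweet datetimes above, one option is to pass `tz` explicitly, as in this small sketch.

```python
import datetime

timestamp = "1403602426"

# Passing tz gives a timezone-aware datetime that is the same on every machine.
dt_utc = datetime.datetime.fromtimestamp(
    float(timestamp),
    tz=datetime.timezone.utc,
)
print(dt_utc.isoformat())
# => 2014-06-24T09:33:46+00:00
```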
Search API
The Twitter Search API lets you get tweets made in the past 6 to 9 days. The approaches below take you from getting 20 tweets to thousands of tweets but always bound by the time restriction.
If you want a live stream of tweets, see the Streaming section.
If you want to go back more than a week and are willing to pay, see the Batch historical tweets API docs.
Query syntax
Twitter has a flexible search syntax for using "and" / "or" logic and quoting phrases.
Twitter API docs on search:
Be sure to use the standard docs as the premium operators do not work on the free search services.
You can test a search query out in the Twitter search bar before trying it in the API.
Search query examples
Basic
Some examples to demonstrate common use of the search syntax.
- Single term
  foo
  #foo
  @some_handle
- Require all terms. Note that AND logic is implied. The order does not matter.
  foo bar baz
  to:some_handle foo
  from:some_handle foo
- Require at least one term - uses the OR keyword.
  foo OR bar
  #foo OR bar
- Exact match on phrase, i.e. all words must be used and in order.
  "foo bar"
- Exclusion - using a leading minus sign.
  foo -bar
- Groups
  - Require all groups.
    (foo OR bar) (spam OR eggs)
    (foo OR bar) -(spam OR eggs)
  - Require any group.
    (foo OR bar) OR (spam OR eggs)
- Exact match on a phrase
  "Foo bar"
  "Foo bar" OR "Fizz buzz" OR spam
Searching is case insensitive.
The to and from operators are provided by the Twitter docs. Using @some_handle might give the same results as to:some_handle but I have not tested this. Using @some_handle might include tweets by the user too.
When looking up a user, you may wish to leave off the @ to get more results which are still relevant, provided the handle is not a common word. I found this increases the volume.
When combining AND and OR functionality in a single rule, the AND logic is evaluated first, such that foo OR bar fizz is equivalent to foo OR (bar fizz). Parentheses are preferred for readability, though.
Note for the last example above that double-quoted phrases must come before ordinary terms, due to a known Twitter Search API bug.
Advanced
See the links in Query syntax section for more details.
Query | Description |
---|---|
to:some_handle | Mentions of user @some_handle. |
filter:retweets #bar | Retweets only about #bar. |
-filter:retweets #bar | Exclude retweets about #bar. |
filter:replies #bar | Replies only about #bar. |
to:some_handle filter:replies | Replies to @some_handle. |
Tweepy search method
Tweepy docs: API.search - that section explains how it works and what the method parameters do.
Twitter API docs: Standard search API
Define query
Create a variable which contains your query. The query should be a single string, not a list, and should match exactly what you'd put in the Twitter.com search bar (which also makes it easy to test).
Examples:
Basic.
query = "#python"
Complex - Use the rules linked above or see the Query syntax section.
query = "foo bar"
query = "foo OR bar"
An exact match phrase in quotes - just change the outside to single quotes.
query = '"foo bar"'
Basic
Return tweets for a search query. Only gives 20 tweets by default, so read on to get more.
tweets = api.search(query)
Or use q
explicitly, for the same result.
tweets = api.search(q=query)
Example of iterating over the results in tweets
object:
def process_tweet(tweet):
    print(tweet.id, tweet.author, tweet.text)
for tweet in tweets:
    process_tweet(tweet)
Get a page of 100 tweets
With the search API, you can specify a max of up to 100 items (tweets) per page. The other endpoints like user timelines seem to mostly allow up to 200 items on a page.
tweets = api.search(
query,
count=100
)
If you want to get the next 100 tweets after that, you could get the ID of the last (oldest) tweet and use it to start the search at the next page, by passing max_id=last_tweet_id - 1. You'd also have to check when there are no tweets left and then stop searching. However, it is much more practical to use Tweepy's Cursor approach to do paging, covered next.
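For illustration only, manual paging could be sketched like this. The function name `search_all` and its `max_pages` safety cap are made up for this guide. The `search` argument is assumed to be a callable that accepts the same keyword arguments as api.search (`q`, `count`, `max_id`), so you would pass `api.search` itself.

```python
def search_all(search, query, page_size=100, max_pages=10):
    """Collect tweets page by page, walking back in time using max_id.

    `search` is any callable that accepts the q, count and max_id keyword
    arguments, such as api.search on an authenticated tweepy.API instance.
    """
    tweets = []
    max_id = None
    for _ in range(max_pages):
        kwargs = {"q": query, "count": page_size}
        if max_id is not None:
            kwargs["max_id"] = max_id
        page = search(**kwargs)
        if not page:
            break  # No tweets left, so stop searching.
        tweets.extend(page)
        # Next request: only tweets older than the oldest one seen so far.
        max_id = min(tweet.id for tweet in page) - 1
    return tweets
```

Usage: `tweets = search_all(api.search, "#python")`. This is the bookkeeping that the Cursor approach handles for you.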
Get many tweets using paging
This approach uses the Paging approach to do multiple requests for pages of up to 100 tweets each, allowing you to get thousands of tweets.
The Twitter API imposes rate limiting against a token, to prevent abuse. So, after you've used up your quota of searches in a 15-minute window (whether new searches or paging on one search), you will have to wait until it resets before doing more queries. Any requests before then will fail (though other endpoints have their own limits). This waiting can be turned on as a config option when setting up authentication, as covered in the Installation section.
cursor = tweepy.Cursor(
api.search,
query,
count=100
)
for tweet in cursor.items():
    process_tweet(tweet)
See Paging section for more info.
Extended message
It is useful to use extended mode when doing a search.
- Do this with tweet_mode="extended".
- Twitter by default returns messages truncated to 140 characters (with an ellipsis), even though users may enter tweets of up to 280 characters. So use this option to get the full message.
- Note that retweet messages might still be truncated even with this option, but there is a workaround.
When using this option, make sure to use the tweet.full_text attribute and not tweet.text. But still allow a fallback to plain tweet.text, since the Tweepy docs say:
If status is a Retweet, it will not have an extended_tweet attribute, and status.text could be truncated.
Example:
tweets = api.search(
query,
tweet_mode="extended",
)
for tweet in tweets:
    try:
        print(tweet.full_text)
    except AttributeError:
        print(tweet.text)
Tweepy docs: Extended mode
As a function:
def get_message(tweet):
    """
    Robustly get a message on a tweet.
    This is ideal for extended mode, but also works on standard mode when
    tweets are truncated. And it handles retweets, which ALWAYS use the
    `.text` attribute even in extended mode according to the API docs.
    """
    try:
        return tweet.full_text
    except AttributeError:
        return tweet.text
print(get_message(tweet))
Result type
Set result_type to one of the following, according to the Twitter API:
- mixed - A balance of the other two. Default option.
- recent - The tweets that are the most recent.
- popular - The tweets with the highest engagements. Note that this list might be very short (just a few tweets) compared with running the recent query.
result_type = "popular"
count = 100
tweets = api.search(
query,
count=count,
result_type=result_type,
)
Limit date range
You can specify that the tweets should be up to a date. If you don't care about tweets in the last few days or you already stored them, this can be useful to go back further.
Add until as a parameter, with the date formatted as a YYYY-MM-DD string.
e.g.
api.search(
q=query,
until="2020-05-07",
)
You are still bound by the search API's limit of one week, so if you set until to be a week ago you'll get close to zero tweets.
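Rather than hard-coding the date, you might compute it relative to today. This is a small sketch; the helper name `until_days_ago` is made up for this guide, and using the current UTC date as the default is an assumption based on Twitter datetimes being in UTC.

```python
import datetime


def until_days_ago(days, today=None):
    """Return a YYYY-MM-DD string for the date `days` days before `today`.

    `today` defaults to the current UTC date.
    """
    if today is None:
        today = datetime.datetime.now(datetime.timezone.utc).date()
    return (today - datetime.timedelta(days=days)).isoformat()


print(until_days_ago(2, today=datetime.date(2020, 5, 9)))
# => 2020-05-07
```

Usage: `api.search(q=query, until=until_days_ago(2))`.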
Filter by location
Search for tweets at a point within a radius.
You can leave the search query parameter q
unset and this will still work.
Format of a geocode value:
LATITUDE,LONGITUDE,RADIUS
Example usage:
api.search(geocode="33.333,12.345,10km")
api.search(geocode="37.781157,-122.398720,1mi")
Twitter API docs: Standard Search API - see geocode
under Parameters.
Returns tweets by users located within a given radius of the given latitude/longitude. The location is preferentially taking from the Geotagging API, but will fall back to their Twitter profile.
The parameter value is specified by latitude,longitude,radius, where radius units must be specified as either mi (miles) or km (kilometers). Note that you cannot use the near operator via the API to geocode arbitrary locations; however you can use this geocode parameter to search near geocodes directly.
A maximum of 1,000 distinct "sub-regions" will be considered when using the radius modifier.
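To avoid formatting mistakes in the geocode value, such as stray spaces or a missing radius, you could build it with a small helper. The function name `make_geocode` is made up for this guide; it only assembles the LATITUDE,LONGITUDE,RADIUS string described above.

```python
def make_geocode(latitude, longitude, radius, unit="km"):
    """Build a geocode value such as "33.333,12.345,10km"."""
    if unit not in ("km", "mi"):
        raise ValueError("unit must be 'km' or 'mi'")
    return f"{latitude},{longitude},{radius}{unit}"


print(make_geocode(33.333, 12.345, 10))
# => 33.333,12.345,10km
```

Usage: `api.search(geocode=make_geocode(37.781157, -122.39872, 1, unit="mi"))`.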
Full search example
- Get tweets for a keyword search (Basic)
- Excluding replies and retweets based on the query (Advanced)
- Getting as many tweets as possible by setting max count using paging (Get many tweets using paging)
- Using full message text (Extended message)
- For a given language - this is not reliable but it is an option (Filter tweets by language)
See Authentication page of this guide for setting up the api
object.
search.py
import tweepy
def get_message(tweet):
    """
    Robustly get a message on a tweet.
    Even if not extended mode or is a retweet (always truncated).
    """
    try:
        return tweet.full_text
    except AttributeError:
        return tweet.text
# Exclude retweets and replies, keeping only original tweets about python.
query = "-filter:retweets -filter:replies python"
lang = "en"
cursor = tweepy.Cursor(
    api.search,
    q=query,
    count=100,
    tweet_mode="extended",
    lang=lang,
)
results = []
for tweet in cursor.items():
    parsed_tweet = {
        "id": tweet.id,
        "screen_name": tweet.author.screen_name,
        "message": get_message(tweet),
    }
    print(parsed_tweet)
    results.append(parsed_tweet)
print(len(results))
Get entities on tweets
Get media
How to get images on tweets.
This example is for the Search API but can work for other methods too such as User timeline.
Add include_entities to your request. This may not be needed on some endpoints, such as .search, where the default is True. Check the Tweepy docs.
Then use the media value, if one exists on a tweet's entities.
cursor = tweepy.Cursor(
    api.search,
    q=query,
    count=count,
    include_entities=True,
)
# A Cursor is not iterable directly, so use .items() to get tweets.
for tweet in cursor.items():
    if 'media' in tweet.entities:
        for image in tweet.entities['media']:
            print(image['media_url'])
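The same check can be wrapped in a helper that returns an empty list when a tweet has no media. The sketch below runs on mock entities dictionaries so it works without an API call; media_urls is a hypothetical name and the URL is a placeholder.

```python
def media_urls(entities):
    """Return media URLs from a tweet's entities dict, or an empty list."""
    return [item["media_url"] for item in entities.get("media", [])]


# Mock data shaped like tweet.entities (illustrative values only).
with_image = {
    "hashtags": [],
    "media": [{"type": "photo", "media_url": "http://example.com/image.jpg"}],
}
without_image = {"hashtags": []}

print(media_urls(with_image))
# ['http://example.com/image.jpg']
print(media_urls(without_image))
# []
```

In a real script you would call media_urls(tweet.entities) inside the loop over cursor.items().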
Streaming
This section focuses on the standard and free "filtered" Streaming API service. There are more services available, covered in the Other streams subsection.
What is streaming and how many tweets can I get?
The Search API gives about 90% of tweets and goes back 7 days, but you have to query it repeatedly if you want "live" data and this can result in reaching API limits.
The filtered streaming API lets you connect to the firehose of Twitter tweets made in realtime. You must specify a filter to apply - either keywords or users to track.
However, the volume is much lower than the search API.
Studies have estimated that using Twitter’s Streaming API users can expect to receive anywhere from 1% of the tweets to over 40% of tweets in near real-time.
The reason that you do not receive all of the tweets from the Twitter Streaming API is simply because Twitter doesn’t have the current infrastructure to support it, nor do they want to support it; hence, the Twitter Firehose. source
Streaming resources
Tweepy
- Streaming tutorial in the docs.
- streaming.py module in the repo. This is useful to find or override existing methods.
- See StreamListener class.
- See Stream class and Stream.filter method.
- streaming.py example script in the repo.
- test_streaming.py - Python tests for
streaming
module.
Twitter API docs
- Filter realtime Tweets
- Make sure to use "POST statuses/filter" as the other endpoints are premium only.
- Note deprecation warning:
This endpoint will be deprecated in favor of the filtered stream endpoint, now available in Twitter Developer Labs.
- POST statuses/filter endpoint reference
- Including URL and response structure.
- Including allowed parameters.
- Basic stream parameters
- Covers parameters in more detail.
- filter_level
- The default value is none, which is all available tweets. If you don't need all tweets or performance is an issue, you can set this to low or medium.
- language
- You can set this to a standard code like en. However, when using the Search API I found the labels were inconsistent, even on several tweets from the same person. Twitter guesses the language; it doesn't use your settings.
- Premium stream operators
- Additional parameters only available on the paid tier.
Setup stream listener class
Create a class which inherits from StreamListener.
Base
To get started, define a listener class using the example from the Tweepy docs Streaming tutorial.
class MyStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        """Called when a new status arrives"""
        print(status.text)

    def on_error(self, status_code):
        if status_code == 420:
            return False
That will:
- Print a tweet immediately when it arrives and then return None, which will keep the stream alive.
- Disconnect when throttled by rate limiting by returning False. Rate limiting is not measured as requests in a window like the search API, which means you can get a high volume of tweets in realtime. Read the Rate limits section on the Twitter Policies page for more info.
Some people name this class as _StdOutListener
.
Override more methods
Most of the methods just return nothing quietly, so you will need to override methods you care about so you can print the output to the console or write to a CSV or database. Check the link above to available methods - the docstrings explain them well.
For example, you could override on_direct_message
to handle that event.
You might want to handle some errors, or at least add printing to help with debugging.
Here are some error methods:
Method | Description |
---|---|
on_exception | Called when an unhandled exception occurs |
on_limit | Called when a limitation notice arrives |
on_error | Called when a non-200 status code is returned |
on_timeout | Called when stream connection times out |
on_disconnect | Called when Twitter sends a disconnect notice |
Twitter API docs: Streaming message types - includes error codes.
Setup stream instance
myStreamListener = MyStreamListener()
stream = tweepy.Stream(auth=auth, listener=myStreamListener)
Some people do this in one line instead:
stream = tweepy.Stream(auth=auth, listener=MyStreamListener())
Start streaming
Follow the sections below to start streaming with the stream
object.
Only the .filter
method is covered here as that is accessible without a premium account.
Follow tweets from or to users
Stream public tweets relating to one or more users.
According to docs this includes:
- Tweets and retweets from the user.
- Replies and retweets to the user's tweets.
- Original messages to user. i.e. Message starts with "@handle", but not mentions with the handle later in the message.
Twitter API docs: Basic stream parameters (see follow section).
First get the user IDs of one or more Twitter users to follow.
Make sure you specify user IDs and not screen names. If you need to, see the instructions on how to Lookup user ID for a screen name.
Then pass the follow parameter using a list
of strings.
e.g.
user_ids = ["1234567", "456789", "9876543"]
stream.filter(follow=user_ids)
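If your IDs come from mixed sources (for example integers from user objects), a small guard can convert them to strings and catch screen names passed by mistake. This is a sketch only; to_follow_ids is a hypothetical helper, not a Tweepy function.

```python
def to_follow_ids(user_ids):
    """Convert user IDs (int or str) to the list of strings used for follow."""
    result = []
    for user_id in user_ids:
        text = str(user_id).strip()
        # Screen names like "@foo" are not valid here, only numeric IDs.
        if not text.isdigit():
            raise ValueError(f"Not a numeric user ID: {user_id!r}")
        result.append(text)
    return result


print(to_follow_ids([1234567, "456789"]))
# ['1234567', '456789']
```

You could then call stream.filter(follow=to_follow_ids(ids)).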
Follow tweets matching keywords
Use the track parameter and one or more terms, like keywords or hashtags or URLs.
Example:
track = ["foo", "#bar", "fizz buzz"]
stream.filter(track=track)
- OR
- The Twitter API will look for a tweet which contains any (i.e. at least one) of the items in the list, so it uses OR logic.
- AND
- Use a space between words to use AND logic, e.g. "fizz buzz".
You cannot use quoted phrases. The API doc says: "Exact matching of phrases (equivalent to quoted phrases in most search engines) is not supported.".
The docs say you can track a URL but recommend including a space between the parts for the most inclusive search, e.g. "example com".
UTF-8 characters are supported but must be used explicitly in your search. e.g. 'touché'
, 'Twitter’s'
.
Twitter API docs: Basic stream parameters (see track section).
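To get a feel for the OR and AND logic, here is a rough local approximation that you can run without connecting to a stream. It is not how Twitter matches tweets internally: the tokenising regex and the matches_track name are assumptions for illustration, and real matching has more rules (see the docs linked above).

```python
import re


def matches_track(text, track):
    """Approximate track matching: any term matches, all words in a term must appear."""
    # Assumption: split the tweet into lowercase word-like tokens.
    words = set(re.findall(r"[\w#@]+", text.lower()))
    return any(
        all(word in words for word in term.lower().split())
        for term in track
    )


track = ["foo", "#bar", "fizz buzz"]
print(matches_track("I like Foo", track))       # True, "foo" matches
print(matches_track("buzz and fizz", track))    # True, both words of "fizz buzz" present
print(matches_track("only fizz here", track))   # False, "buzz" is missing
```

This can be handy for testing a track list against sample text before starting a stream.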
Full stream examples
tweepy_docs_example.py
"""
Streaming demo - Tweepy docs example.
Based on tutorial: http://docs.tweepy.org/en/latest/streaming_how_to.html
"""
import tweepy
CONSUMER_KEY = ""
CONSUMER_SECRET = ""
ACCESS_TOKEN = ""
ACCESS_SECRET = ""
class MyStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        print(status.text)

    def on_error(self, status_code):
        if status_code == 420:
            # Returning False in on_error disconnects the stream on rate limiting.
            # This is recommended.
            return False
        # Returning non-False reconnects the stream, with backoff.
auth = tweepy.auth.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)
myStreamListener = MyStreamListener()
myStream = tweepy.Stream(auth=auth, listener=myStreamListener)
# Follow tweets with the word "python".
# Note that this command is blocking, so any lines after this will not execute.
myStream.filter(track=["python"])
# Use async flag so that a separate thread is used.
# myStream.filter(track=['python'], is_async=True)
# Follow user ID "2211149702"
# myStream.filter(follow=["2211149702"])
tweepy_example_repo_example.py
"""
Stream watcher - from Tweepy example repo.
Based on PY 2 script here: https://github.com/tweepy/examples/blob/master/streamwatcher.py
"""
import time
from getpass import getpass
from textwrap import TextWrapper
import tweepy
CONSUMER_KEY = ''
CONSUMER_SECRET = ''
ACCESS_TOKEN = ''
ACCESS_SECRET = ''
class StreamWatcherListener(tweepy.StreamListener):
    status_wrapper = TextWrapper(width=60, initial_indent=' ', subsequent_indent=' ')

    def on_status(self, status):
        try:
            print(self.status_wrapper.fill(status.text))
            print('\n %s %s via %s\n'
                  % (status.author.screen_name, status.created_at, status.source))
        except Exception:
            # Catch any unicode errors while printing to console
            # and just ignore them to avoid breaking application.
            pass

    def on_error(self, status_code):
        print('An error has occurred! Status code = %s' % status_code)
        # Keep stream alive.
        return True

    def on_timeout(self):
        print('Snoozing Zzzzzz')
auth = tweepy.auth.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)
stream = tweepy.Stream(
auth,
StreamWatcherListener(),
timeout=None
)
Update stream
If you want to update a stream, you must stop it and then start a new stream, according to this Twitter dev page.
One filter rule on one allowed connection, disconnection required to adjust rule.
This also means you are not allowed to have more than one stream running at a time for an account: not in the same script, on the same machine or even on another machine.
One way is to stop your application, reconfigure it and then start it again.
If you want to keep the script running when switching streams, you can restart like this:
track = ["foo"]
# Start initial stream.
stream.filter(track=track, is_async=True)
time.sleep(5)
# Update. This won't get applied yet.
track.append("bar")
# Stop.
stream.running = False
# Start again.
stream.filter(track=track, is_async=True)
The gap will hopefully be very short so you don't lose much.
How do I stream faster?
The streaming API is meant to be realtime, but you may still experience a delay. In one case I heard of, a posted tweet took up to 5 seconds to appear in the stream, which I'd say is still good.
This delay might just be built into the way the Twitter API works.
Here are some ideas to improve performance when streaming:
- The obvious ones - improve your internet connection speed or improve your hardware. Use a remote machine through AWS to "rent" a machine in the cloud dedicated to your application. Besides choosing higher specs than your local machine, it can also be online and run 24/7.
- Run your script in unbuffered mode. Rather than waiting until the console output meets a threshold, tell Python to print immediately.
- e.g. python -u script.py
- If your performance bottleneck is processing the tweet locally (writing to CSV or database), you can make that task asynchronous by using RabbitMQ or similar.
- This may not improve the delay, but it will make sure your application can process every tweet that the Twitter Streaming API sends you and that you don't get disconnected (which can happen if the Twitter Streaming API decides you are handling the offloaded tweets too slowly).
- Example of a repo which does this (though it's archived, so it's not maintained and might not work).
- ukgovdatascience/twitter-mq-feed
A script that gets data from the Twitter real-time API, passes it to a message-queue (e.g. RabbitMQ) and stores tweets into MongoDB
- If using the premium streaming API, use an advanced filter.
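To illustrate the asynchronous processing idea from the list above without installing a message broker, here is a minimal in-process sketch using the standard library queue and threading modules. It is a stand-in for RabbitMQ, not an equivalent: the names tweet_queue, stored and worker are made up for this example, and the list stands in for your CSV file or database.

```python
import queue
import threading

tweet_queue = queue.Queue()
stored = []  # Stand-in for a CSV file or database table.
STOP = object()  # Sentinel telling the worker to finish.


def worker():
    """Take tweets off the queue and store them, away from the stream thread."""
    while True:
        item = tweet_queue.get()
        if item is STOP:
            break
        stored.append(item)  # Replace with the slow write in a real script.


thread = threading.Thread(target=worker, daemon=True)
thread.start()

# In a listener, on_status would only do this cheap put() call.
for message in ["Hello, world!", "Hello, Tweepy!"]:
    tweet_queue.put(message)

tweet_queue.put(STOP)
thread.join()
print(stored)
# ['Hello, world!', 'Hello, Tweepy!']
```

The point of the design is that the listener returns quickly, so slow writes do not hold up reading from the stream.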
Other streams
Decahose
Enterprise stream to get 10% of tweets.
Twitter API docs: Decahose API reference
Powertrack
Enterprise stream to get 100% of tweets.
The PowerTrack API provides customers with the ability to filter the full Twitter firehose, and only receive the data that they or their customers are interested in.
Twitter API docs: Powertrack API reference
Lab streams
Experimental Twitter API endpoints.
- Labs V2 Overview
- Sample stream v1 (replaces Sample realtime tweets endpoint)
The sampled stream endpoint allows developers to stream about 1% of all new public Tweets as they happen. You can connect no more than one client per session, and can disconnect and reconnect no more than 50 times per 15 minute window.
- Filtered stream v1
The filtered stream endpoints allow developers to filter the real-time stream of public Tweets. Developers can filter the real-time stream by applying a set of rules (specified using a combination of operators), and they can select the response format on connection.
This preview contains a streaming endpoint that delivers Tweets in real-time. It also contains a set of rules endpoints to create, delete and dry-run rule changes. During Labs, up to 10 rules (each one up to 512 characters long) can be set on your stream at the same time. Unlike the existing statuses/filter endpoint, these rules are retained and are not specified at connection time.
- COVID-19 stream
How do I store tweets?
You can easily write to a CSV file using the Python csv
module.
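As a minimal sketch, the example below writes mock rows (shaped like the parsed tweets from the search example) using csv.DictWriter. It writes to an in-memory buffer so it can run anywhere; the write_tweets name and the column names are assumptions for illustration.

```python
import csv
import io

# Mock data that would be fetched from the API.
rows = [
    {"id": 123, "screen_name": "foo", "message": "Hello, world!"},
    {"id": 124, "screen_name": "bar", "message": "Hello, Tweepy!"},
]


def write_tweets(out_file, rows):
    """Write a header and one line per tweet to an open text file."""
    writer = csv.DictWriter(out_file, fieldnames=["id", "screen_name", "message"])
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
write_tweets(buffer, rows)
print(buffer.getvalue())
```

To write to disk instead, open a file with open("tweets.csv", "w", newline="") and pass it in place of the buffer.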
Here are some options for storing in a database.
- Twitter MQ feed - this project stores in MongoDB.
- Streaming Twitter Data into a MySQL Database
SQLite
Demo script using SQLite
"""
Python SQLite demo.
The sqlite3 library is a Python builtin. Read more in the Python 3 docs:
https://docs.python.org/3/library/sqlite3.html
See also the SQLite docs:
https://www.sqlite.org/docs.html
"""
import sqlite3
conn = sqlite3.connect('db.sqlite')
cur = conn.cursor()
create_sql = """
CREATE TABLE IF NOT EXISTS tweet(
id INTEGER PRIMARY KEY,
status_id INTEGER,
screen_name VARCHAR(30),
message VARCHAR(255)
)
"""
cur.execute(create_sql)
conn.commit()
# Mock data that would be fetched from the API.
# Note each item in the list is a list.
tweets = [
[123, "foo", "Hello, world!"],
[124, "bar", "Hello, Tweepy!"],
]
# Note that id is not known upfront but can be left to autoincrement by specifying NULL.
insert_sql = """
INSERT INTO tweet VALUES (NULL, ?, ?, ?)
"""
cur.executemany(insert_sql, tweets)
fetch_sql = """
SELECT *
FROM tweet
"""
cur.execute(fetch_sql)
print(cur.fetchall())
conn.commit()
conn.close()
Direct messages
Methods relating to Twitter account direct messages.
Please ensure you comply with the Twitter API policies and do not spam users. See Twitter policies page to find links to appropriate docs.
Tweepy API docs: Direct message methods
Twitter API docs:
- Sending and receiving events overview
Receiving messages events
You can retrieve Direct Messages from up to the past 30 days with GET direct_messages/events/list.
Consuming Direct Messages in real-time can be accomplished via webhooks with the Account Activity API.
List messages
Get direct messages to the authenticated Twitter account (such as your bot) in the last 30 days.
dms = api.list_direct_messages()
The default value for count is 20
and this can be increased to 50
.
If you need to get more than that, use paging.
tweepy.Cursor(api.list_direct_messages, count=50).items(200)
Twitter API docs: List messages endpoint
Get message
Fetch a message by known ID.
dm_id = dms[0].id
dm = api.get_direct_message(dm_id)
Twitter API docs: Show message endpoint
Get attributes on a message object
- Get the text of a message.
dm.message_create['message_data']['text']
- Get recipient user ID:
dm.message_create['target']['recipient_id']
See the Direct message section on the Models page to see a preview of the full structure.
Show all data
Print the entire object, prettified with the json
builtin library.
import json
print(json.dumps(dm.message_create, indent=4))
Filter to messages from a certain user
user_id = "12345"
filtered_dms = [dm for dm in dms if dm.message_create['target']['recipient_id'] == user_id]
We use a list comprehension here with an if condition, as it is usually faster than a standard for loop and can be more readable (it fits on one line and needs no .append step).
If you don't have a user ID, see Lookup user ID for a screen name.
Here's a more complete example:
dms = api.list_direct_messages()
screen_name = "foo"
user_id = api.get_user(screen_name).id
for dm in dms:
    if dm.message_create['target']['recipient_id'] == str(user_id):
        print(dm.message_create['message_data']['text'])
Send message
Send a direct message to given user ID.
user_id = "123"
msg = "Hello, world!"
api.send_direct_message(user_id, msg)
If you don't have a user ID, see Lookup user ID for a screen name.
Twitter API docs: Create message endpoint - see optional parameters like quick_reply
and attachment
.
Get rate limit status
Twitter provides an endpoint to get the rate limit status for your token across all endpoints at once.
data = api.rate_limit_status()
The response is a dict in which you can look up an endpoint like this:
data['resources']['statuses']['/statuses/home_timeline']
data['resources']['users']['/users/lookup']
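Once you have looked up an endpoint, you can work out how long to wait before trying again. The sketch below uses mock data rather than a live call. The limit, remaining and reset keys reflect my understanding of the response shape (reset being a Unix timestamp), so check them against the Models page; seconds_until_reset is a hypothetical helper.

```python
import time


def seconds_until_reset(limit_info, now=None):
    """Return the number of seconds until the rate limit window resets."""
    now = time.time() if now is None else now
    return max(0, int(limit_info["reset"] - now))


# Mock response, shaped like api.rate_limit_status() output (values made up).
data = {
    "resources": {
        "statuses": {
            "/statuses/home_timeline": {"limit": 15, "remaining": 0, "reset": 1588852800},
        },
    },
}

info = data["resources"]["statuses"]["/statuses/home_timeline"]
if info["remaining"] == 0:
    # A fixed "now" is passed here so the example is repeatable.
    print(seconds_until_reset(info, now=1588852500))
    # 300
```

In a real script you would omit the now argument and sleep for the returned number of seconds.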
See more on the Rate limit status section of the models page.
Twitter API docs: Get app rate limit status
There is also a way to get the rate limit stats on the response object on a successful call, though this is not covered here.