Programmish
Twitter WHOIS using Python

Some test cases are pathological; they exist beyond the bounds of our initial planning, and in some cases beyond the boundary of what we want to support. While mapping social networks on Twitter, I kept running into edge cases where some people have too many followers, as evidenced by the Twitter API dropping a connection or returning a corrupt document. This can wreak havoc in a long-running process that only saves state periodically. An obvious workaround, though one that doesn’t detect pathological cases, is to maintain a continuously updated cache of nodes that have been walked, and to use threads or sub-processes to crawl the network graph. My software already takes into account a ‘sanity’ limit on what it can realistically process (and what is realistically interesting), but the way it tested this limit exposed another pathological case: super-super-nodes.
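The cache of walked nodes mentioned above could be sketched roughly as follows. This is a minimal illustration rather than the code my mapper uses: the `VisitedCache` class and the JSON file layout are assumptions made for the example, and it is written for Python 3 using only the standard library.

```python
import json
import os
import tempfile


class VisitedCache:
    """Set of walked node IDs that is flushed to disk on every addition.

    Hypothetical helper for illustration; the file format is an assumption.
    """

    def __init__(self, path):
        self.path = path
        self.seen = set()
        # Resume from an earlier run if a cache file already exists.
        if os.path.exists(path):
            with open(path) as handle:
                self.seen = set(json.load(handle))

    def __contains__(self, node_id):
        return node_id in self.seen

    def add(self, node_id):
        self.seen.add(node_id)
        # Write to a temporary file in the same directory, then rename it
        # over the old cache, so a crash mid-write cannot leave a
        # truncated cache file behind.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as handle:
            json.dump(sorted(self.seen), handle)
        os.replace(tmp_path, self.path)
```

A crawler would check `if user_id in cache` before walking a node and call `cache.add(user_id)` once the node is done. Rewriting the whole file on every addition is wasteful for large graphs; it is done here only to keep the sketch short.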

Some people have hundreds of friends, and this is interesting. Some users have thousands: still somewhat interesting, but beyond what can realistically be mapped when using the default rate limit of 100 queries per hour. So what’s the pathological case? 1,000,000+ friends or followers. Follow any social graph on Twitter and you’re bound to run into one of these nodes. The pathology exists in two places. The first is in my software: it’s unrealistic to detect this case by first trying to download a list of ‘friend’ IDs from the node (this was the initial means of mapping the social graph, which has since been altered to test for these extreme cases before mapping begins). The second is in the Twitter API itself, which will often return a 502 Bad Gateway response when faced with a friend/follower list that exceeds a few hundred thousand entries. That response is not terribly descriptive, especially as the API has predefined error conditions and responses for a range of other cases.
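Testing for the extreme case before mapping begins might look something like the sketch below. It assumes a profile dictionary carrying `friends_count` and `followers_count` fields; the `is_mappable` function and the `MAX_CONNECTIONS` threshold are hypothetical names and values chosen for illustration, not the limit my software actually uses. It is written for Python 3.

```python
# Hypothetical sanity limit; pick whatever is realistic for your rate limit.
MAX_CONNECTIONS = 5000


def is_mappable(profile, limit=MAX_CONNECTIONS):
    """Return True if a node is small enough to be worth mapping.

    `profile` is a dict of user details. Missing counts are treated as
    zero, which is an assumption of this sketch.
    """
    friends = profile.get("friends_count", 0)
    followers = profile.get("followers_count", 0)
    # Reject the node if either list is too large to download.
    return max(friends, followers) <= limit


if __name__ == "__main__":
    small = {"friends_count": 300, "followers_count": 120}
    huge = {"friends_count": 771817, "followers_count": 1145759}
    print(is_mappable(small))
    print(is_mappable(huge))
```

The point of the design is that a single profile lookup costs one request, whereas discovering the problem while downloading the ID list can cost many requests and may end in a 502 anyway.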

The upshot of all of this? When plumbing the depths of Twitter through Python (or any other language) and racking up user IDs, it can be useful to have a WHOIS analog for Twitter IDs. This script is a basic version of a Twitter WHOIS, accepting both user IDs and user names:

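#!/usr/bin/python
# Python 2 script: relies on urllib2, print statements and the
# third-party cjson module.
import sys
import cjson
import urllib2

def api_rate_limit():
    # Ask Twitter how many API requests remain in the current window.
    return api_request('account/rate_limit_status')

def api_user_info(userInfo):
    # userInfo may be either a numeric user ID or a user name.
    return api_request('users/show/' + userInfo)

def api_request(path):
    # Fetch a JSON document from the Twitter API and decode it.
    uri = 'http://twitter.com/' + path + '.json'
    handle = urllib2.urlopen(uri)
    return cjson.decode(handle.read(), all_unicode=False)

if __name__ == '__main__':
    # Profile fields to print, in display order.
    whoisInfo = ["name", "location", "followers_count", "friends_count", "statuses_count", "url"]
    for query in sys.argv[1:]:
        # Check the remaining quota before every lookup.
        api_limit = api_rate_limit()
        if api_limit['remaining_hits'] > 0:
            print "whois: %s" % (query)
            who = api_user_info(query)
            for whoisItem in whoisInfo:
                # Skip any field the API did not return for this user.
                if whoisItem in who:
                    print "    %s: %s" % (whoisItem, who[whoisItem])
        else:
            print "Out of API requests..."
            sys.exit(0)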

Example usage:

$ ./twitter_whois.py 813286
whois: 813286
    name: Barack Obama
    location: Chicago, IL
    followers_count: 1145759
    friends_count: 771817
    statuses_count: 269
    url: http:\/\/www.barackobama.com
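The script above lets any HTTP error propagate as an exception. Since a 502 Bad Gateway is often how an oversized friend/follower list shows up, a crawler may prefer to treat that status as a signal rather than a crash. The following is a rough sketch under that assumption: `fetch_or_flag` is a hypothetical helper, `fetch` stands in for any function that performs the request (such as a Python 3 port of `api_request`), and the code uses Python 3's `urllib.error` rather than the `urllib2` module used in the script.

```python
import urllib.error


def fetch_or_flag(fetch, path, pathological_codes=(502,)):
    """Call fetch(path) and return a (result, status) pair.

    On success the pair is (result, None). If the request fails with one
    of `pathological_codes`, the pair is (None, code) so the caller can
    mark the node as too large and move on. Other HTTP errors are
    re-raised unchanged.
    """
    try:
        return fetch(path), None
    except urllib.error.HTTPError as err:
        if err.code in pathological_codes:
            return None, err.code
        raise
```

A caller would skip (and cache) any node for which the second element is not `None`, instead of retrying it and burning through the hourly request quota.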