Programmish
Posts Tagged python
Twitter WHOIS using Python

Some test cases are pathological; they exist beyond the bounds of our initial planning, and in some cases beyond the boundary of what we want to support. While mapping social networks on Twitter, I kept running into edge cases where some people have too many followersAs evidenced by the Twitter API dropping a connection or returning a corrupt document. This can wreak havoc in a long running process that only saves state periodically. An obvious work around that doesn’t detect pathological cases is to maintain a constant cache of nodes that have been walked, and using threads or sub-processes to crawl the network graph.. My software already takes into account a ’sanity’ limit in what it can realistically process (and what is realistically interesting), but the mode in which it tested this ’sanity’ exposed another pathological case: super-super-nodes.

Some people have hundreds of friends, and this is interesting. Some users have thousands, still somewhat interesting, but beyond what is realistically able to be mapped when using the default rate limit of 100 queries per hour. So what’s the pathological case? 1,000,000+ friends or followers. Follow any social graph on Twitter and you’re bound to run into one of these nodes. The pathology exists in two places: it’s unrealistic for my software to attempt to determine this pathological case by first trying to download a list of ‘friend’ IDs from this node (this was the initial means of mapping the social graph, which has since be altered to test for these extreme cases before mapping begins). The second pathology is in the Twitter API itself, which will often return a 502 Bad Gateway response when faced with a friend/follower list that excedes a few hundred thousand entities. Not terribly descriptive, especially as the API has predefined error conditions and responses for a range of other cases.

The upshot of all of this? When purveying the depths of Twitter through Python (or any other language) and racking up user ID information, it can be useful to have a WHOIS analog for Twitter IDs. This script is a basic version of a Twitter Whois, accepting both User IDs and User names:

#!/usr/bin/python
import sys
import cjson
import urllib2

def api_rate_limit():
    return api_request('account/rate_limit_status')

def api_user_info(userInfo):
    return api_request('users/show/' + userInfo)

def api_request(path):
    uri = 'http://twitter.com/' + path + '.json'
    handle = urllib2.urlopen(uri)
    return cjson.decode(handle.read(), all_unicode=False)

if __name__ == '__main__':
    whoisInfo = ["name", "location", "followers_count", "friends_count", "statuses_count", "url"]
    for query in sys.argv[1:]:
        api_limit = api_rate_limit()
        if api_limit['remaining_hits'] > 0:
            print "whois: %s" % (query)
            who = api_user_info(query)
            for whoisItem in whoisInfo:
                if whoisItem in who.keys():
                    print "    %s: %s" % (whoisItem, who[whoisItem])
        else:
            print "Out of API requests..."
            sys.exit(0)

Example usage:

$./twitter_whois.py 813286
whois: 813286
    name: Barack Obama
    location: Chicago, IL
    followers_count: 1145759
    friends_count: 771817
    statuses_count: 269
    url: http:\/\/www.barackobama.com
Graham’s Number, Modular Exponentiation, and Python

Directly calculating Graham’s Number is impossible, to the point that even writing out approximations of what the calculation should look like exceed the capacity of the universe in all but the simplest forms. But we can have some fun in calculating what some of the least significant digits look like by observing the convergent properties of “Power Towers”.

Using a simple formula to calculate the rightmost digits of Graham’s Number is straight forward:

def calc_graham(d):
        x = 3
        for n in range(0,d):
                x = (3 ** x) % (10 ** d)
        return x

This is, of course, horrifically slow. Depending upon the number of digits d we want to find, we’ll be calculating 3 to the power of a number of length d digits d times. Trying to calculate anything more than 6 or 7 digits will bring most computers to a crawl, much less trying to compute 100 or more digits.

Modular Exponentiation to the rescue. Modular exponentiation is a technique used throughout computer science (especially cryptography) to simplify the computation of equations of the type:

C = be mod m

Our calculation of the rightmost digits in the Graham Number fit this exactly, all we need to do is whip up a modpow method for Python integers:

def modpow(base, exponent, modulus):
        result = 1
        while exponent > 0:
                if exponent & 1 == 1:
                        result = (result * base) % modulus
                exponent = exponent >> 1
                base = (base * base) % modulus

This method is based upon an algorithm in Bruce Schneier’s Applied Cryptography. With this algorithm in place, the entire program to calculate the rightmost digits of Graham’s Number becomes:

#!/usr/bin/python
import sys
import time

def calc_graham(d):
        x = 3
        td = 10**d
        print "Iterating..."
        for n in range(0,d):
                print " step %d" % (n + 1)
                x = modpow(3, x, td)
        return x

def modpow(base, exponent, modulus):
        result = 1
        while exponent > 0:
                if exponent & 1 == 1:
                        result = (result * base) % modulus
                exponent = exponent >> 1
                base = (base * base) % modulus
        return result

if __name__ == '__main__':
        digits = int(sys.argv[1])
        partitionSize = 50
        if len(sys.argv) > 2:
                partitionSize = int(sys.argv[2])

        st = time.time()
        graham = calc_graham(digits)
        ed = time.time()

        print "calculation took %f seconds" % (ed - st)
        print "right most %d digits of G are:\n" % (digits)

        grahamString = "%d" % (graham)
        while len(grahamString) > 0:
                print grahamString[:partitionSize]
                grahamString = grahamString[partitionSize:]

Calculating the least significant digits of Graham’s Number in this way is still limited in efficiency by the number of digits being computed, but is orders of magnitude faster than the straight forward approach. And the last 500 digits of G?

24259506950647383956574791365193517983345353625214300354012602677162267216041981
06522631693551887803881448314065252616878509555264605107117200099709291249544378
88749606288291172506300130362293491608025459461494578871427832350829242102091825
89675356043086993801689249889268099510169055919951195027887178308370183402364745
48882222161573228010132974509273445945043433009010969280253527518332898844615089
40424826501819385156253579639961899396790549663800322234872396701848518643905910
4575627262464195387
The Deep Perl Gotcha

Every computer language has gotchas — strange quirks of syntax or persistent languages bugs (which, unfortunately, will sometimes morph into features) that trip you up or cause bizarre behavior of what would otherwise be a sound chunk of code. Gotchas happen, and there isn’t much use in complaining about them, other than to try and get the underlying behavior fixed or changed. But sometimes a gotcha is so strange, so removed from what you would normally consider an edge condition, that you do a double take.

Diagnosing a gotcha when working on a larger body of code can be somewhat aided by the feel of a language, the expectations you’ve come to terms with when programming in a specific syntax. I was recently working on a medium sized (1000+ line) Perl program when I happened upon what I would later realize was a strange gotcha — the problem was that it took quite awhile to determine the actual source of the problem. Some gotchas arise from using new or different libraries; thankfully this wasn’t the case, as those problems can often cause collateral work that significantly effects the length of time a programming task can take. In this case the gotcha was so fundamental as to cause me to step back and reevaluate some basic assumptions about how standard language features worked in Perl and other languages. In fact I had to test out this language feature in a few other languages to reassure myself that:

  • I wasn’t, in fact, crazy.
  • Perl is, in fact, crazy.

It’s a pretty simple feature, when it comes down to it (nevermind the amount of time it took to isolate this problem, as it also suffered from being buried in a rather deep stack of recursive calls): iterate over an associative array of elements twice; the first time exit the iteration once an appropriate item is found, the second time iterate over the associative array completely.

Suffice it to say this is the behavior I expected, as programmed in Python:

#!/usr/bin/python

set = { 'a' : 'aaa', 'b' : 'bbb', 'c' : 'ccc', 'd' : 'ddd', 'e' : 'eee', 'f' : 'fff', 'g' : 'ggg' }

print "First pass..."
for key, val in set.items():
    print "%s -> %s" % (key, val)
    if val == 'bbb':
        break;
print

print "Second pass..."
for key, val in set.items():
    print "%s -> %s" % (key, val)

The results for this vary slightly depending upon how the hash elements are stored, but a sample run of this program provides the output:

First pass...
a -> aaa
c -> ccc
b -> bbb

Second pass...
a -> aaa
c -> ccc
b -> bbb
e -> eee
d -> ddd
g -> ggg
f -> fff

So the first pass prints the elements of the associative array up until a certain value is found, and the second iteration prints out all of the elements.

The same behavior is evident in Ruby:

#!/usr/bin/ruby
set = {
        'a' => 'aaa',
        'b' => 'bbb',
        'c' => 'ccc',
        'd' => 'ddd',
        'e' => 'eee',
        'f' => 'fff',
        'g' => 'ggg'
      }

puts "First pass..."
set.each do |k,v|
    puts k + " -> " + v
    if v == 'bbb' then
        break
    end
end
puts

puts "Second pass..."
set.each do |k,v|
    puts k + " -> " + v
end

generates

First pass...
a -> aaa
b -> bbb

Second pass...
a -> aaa
b -> bbb
c -> ccc
d -> ddd
e -> eee
f -> fff
g -> ggg

Now for the Perl version, which left me scratching my head:

#!/usr/bin/perl
use strict;
use warnings;

my %set = (
    -a => 'aaa',
    -b => 'bbb',
    -c => 'ccc',
    -d => 'ddd',
    -e => 'eee',
    -f => 'fff',
    -g => 'ggg'
);

print "First run...n";
while ( my ($key, $val) = each %set ) {
    print "$key -> $valn";
    last if ($val eq 'ggg');
}
print "n";

print "Second run...n";
while ( my ($key, $val) = each %set ) {
    print "$key -> $valn";
}

results in

First run...
-a -> aaa
-c -> ccc
-g -> ggg

Second run...
-f -> fff
-e -> eee
-d -> ddd
-b -> bbb

Look at the output for a minute. Let it sink in: the first iteration displays all of the elements up until a specific value is found, and the second iteration continues from where the first left off. Being contrary to my basic understanding of how such an algorithm would work, some digging turned up an entry in perldoc:

The next call to each after that will start iterating again. There is a single iterator for each hash, shared by all each, keys, and values function calls in the program; it can be reset by reading all the elements from the hash, or by evaluating keys HASH or values HASH .
via perldoc

Thankfully I had a work around in my code, but the workaround is non-optimal as it makes deep assumptions about the layout and quality of incoming values from a database. Most of the time you don’t need to iterate through values in an associate arry in an arbitrary break dependent manner, as you either know or don’t know the key for the value, but in cases where partial iteration through an associative array is necessary this quirk in Perl is a fair pain to work around.

NSTextView Highlighting with Python

Get the sample code here: PyObjC-Highlight.zip

Just about every program that displays, manages, or edits structured or parsable data has some form of syntax highlighting. This sample application presents a naive syntax highlighter for a YAML like markup language, using PyObjC and Cocoa.

Our highlighter covers three basic elements:

  • Comments, which start with a pound (#) symbol
  • Key – value pairs, which are separated by a colon (:)
  • List items, which start with a dash (-)

Comment lines are formatted with green text and a normal font. The Key of the key – value pair is colored blue and presented in a bold font. The dash of all list items is colored red, using a normal font. The remainder of the text should be represented in a black system default font.

In this example the text highlighting should be applied whenever the text changes, which although convenient for a simple example does cause issues for large documents that take a long time to parse and highlight, as the UI will freeze during the highlight process. There are a few solutions to this problem:

  • Run the highlighter in a background thread, triggered whenever typing has stopped
  • Use an incremental parser/highlighter that only updates the current and immediately surrounding lines

We’ll bypass these solutions in favor of a straightforward example. To implement a highlighter that updates whenever the text changes we’ll use the textDidChange delegate method of the NSText class.

def textDidChange_(self, notification):
	"""
	Delegate method called by the NSTextView whenever the contents of the
	text view have changed. This is called after the text has changed and
	been committed to the view. See the Cocoa reference documents:

http://developer.apple.com/documentation/Cocoa/Reference/ApplicationKit/Classes/NSText_Class/Reference/Reference.html

http://developer.apple.com/documentation/Cocoa/Reference/ApplicationKit/Classes/NSTextView_Class/Reference/Reference.html

	Specifically the sections on Delegate Methods for information on additional
	delegate methods relating to text control is NSTextView objects.
	"""

	# Retrieve the current contents of the document and start highlighting
	content = self.highlightedText.string()
	self.highlightText(content)

In this method we retrieve the contents of the NSTextView in our UI, and pass that content to our highlightText method:

def highlightText(self, content):
	"""
	Apply our customized highlighting to the provided content. It is assumed that
	this content was extracted from the NSTextView.
	"""

	# Calling the setAttributesForRange with no values creates
	# a default that "resets" the formatting on all of the content
	self.setAttributesForRange(None, None, None, None)

	# We'll highlight the content by breaking it down into lines, and
	# processing each line one by one. By storing how many characters
	# have been processed we can maintain an "offset" into the overall
	# content that we use to specify the range of text that is currently
	# being highlighted.
	contentLines = content.split("n")
	highlightOffset = 0

	for line in contentLines:

		if line.strip().startswith("#"):
			# Comment - we want to highlight the whole comment line
			self.setAttributesForRange(NSColor.greenColor(), None, highlightOffset, len(line))

		elif line.find(":") > -1:
			# Tag - we only want to highlight the tag, not the colon or the remainder of the line
			startOfLine = line[0: line.find(":")]
			yamlTag = startOfLine.strip("t ")
			yamlTagStart = line.find(yamlTag)
			self.setAttributesForRange(NSColor.blueColor(), "bold", highlightOffset + yamlTagStart, len(yamlTag))

		elif line.strip().startswith("-"):
			# List item - we only want to highlight the dash
			listIndex = line.find("-")
			self.setAttributesForRange(NSColor.redColor(), None, highlightOffset + listIndex, 1)

		# Add the processed line to our offset, as well as the newline that terminated the line
		highlightOffset += len(line) + 1

The technique we use for parsing our content is primitive (and error prone) but suffices for a simple example. After breaking the content into lines, each line is checked against a basic format for our three highlighting targets. If the line matches, we determine the range of the text from that line to which highlighting will be applied. We call our setAttributesForRange method with a color (an NSColor object), a font strength (“normal” or “bold”), and the location and length of the text in the original document to highlight.

Throughout the parsing we need to maintain a tally of how many characters have been processed, as the highlighting is being applied to the original content, and we need an index into that content. Normally in Python we would use two indices into a List to denote a slice or range; unfortunately Objective-C uses an index and length to create a range (specifically with the NSRange class). This conversion doesn’t add a lot of code to our example, but it can be a point in the code for bugs to pop up.

The actual highlighting takes place in the setAttributesForRange method:

def setAttributesForRange(self, color, font, rangeStart, rangeLength):
	"""
	Set the visual attributes for a range of characters in the NSTextView. If
	values for the color and font are None, defaults will be used.

	The rangeStart is an index into the contents of the NSTextView, and
	rangeLength is used in combination with this index to create an NSRange
	structure, which is passed to the NSTextView methods for setting
	text attributes. If either of these values are None, defaults will
	be provided.

	The "font" parameter is used as an key for the "fontMap", which contains
	the associated NSFont objects for each font style.
	"""
	fontMap = {
				"normal" : NSFont.systemFontOfSize_(self.fontSize),
				"bold" : NSFont.boldSystemFontOfSize_(self.fontSize)
				}

	# Setup sane defaults for the color, font and range if no values
	# are provided
	if color is None:
		color = NSColor.blackColor()
	if font is None:
		font = "normal"        

	if font not in fontMap:
		font = "normal"

	displayFont = fontMap[font]

	if rangeStart is None:
		rangeStart = 0

	if rangeLength is None:
		rangeLength = len(self.highlightedText.string()) - rangeStart

	# Set the attributes for the specified character range
	range = NSRange(rangeStart, rangeLength)
	self.highlightedText.setTextColor_range_(color, range)
	self.highlightedText.setFont_range_(displayFont, range)

This method takes a color (NSColor object), a font type (“normal” or “bold”), and the location and length of the text to highlight in the NSTextView. If any of the parameters is None a sensible default value is used. When called with all None parameters the default is to highlight the entire range of the document with a normal weight black font (in fact the method is called like this at the start of the highlightText method to reset the highlights on the document). The actual methods used to apply formatting are the setTextColor_range_ and setFont_range_ methods of the NSTextView.

Using PyObjC with NSThread

Get the sample code here: PyObjC-Thread.zip

This is a small sample project showing how to invoke Python methods as background tasks when writing applications using the PyObjC Python to Objective-C bridge. Normally any method invoked as the result of a User Interface action is run in the UI thread, which for most fast tasks is just fine, there will be no noticable degredation of user experience.

But when a long running task is run on the primary application thread, the UI will become non-responsive, and updates to the UI from within the task will be delayed until the task completes, which means that any indications of progress will be lost. This can be seen in the sample application with the startProgressNoThread_ and runProgressNoThread methods:

@objc.IBAction
def startProgressNoThread_(self, sender):
	"""
	This method is triggered by the "Progress without Thread" button in
	the UI.
	"""
	NSLog("Starting the progress bar without background thread")
	self.runProgressNoThread()

def runProgressNoThread(self):
	"""
	Update an NSProgressIndicator in a loop, with a small delay between
	each update to the indicator.

	Because this method is called on the application event loop, nothing
	will update on the UI until this method is complete. The NSProgressIndicator
	will not properly update, and the calls to disable and then enable the
	buttons will not happen properly. All of the UI updates will be delayed
	until this method has finished. This method also completely locks the UI,
	allowing no interaction or input.
	"""
	self.buttonsEnabled(False)

	self.progress_bar.setMinValue_(0.0)
	self.progress_bar.setMaxValue_(100.0)
	self.progress_bar.setIndeterminate_(False)

	for i in range(0, 101):
		self.progress_bar.setDoubleValue_(i * 1.0)
		time.sleep(0.05)

	self.buttonsEnabled(True)

These methods can be found in the PyObjC_ThreadAppDelegate.py file. The startProgressNoThread_ method is triggered by the Progress without Thread button in the application interface. If you run this application, and click on this button, you’ll notice that the interface locks up for the duration of the runProgressNoThread method, with no UI updates and no UI response. You can’t even exit the program normally. To fix this user experience we can take advantage of the NSThread Objective-C class, and run this task in a background thread. Adding a background thread introduces only two new lines of code, and modifies one existing line of code as compared to the non-threaded example.

To invoke a background thread we need to create an NSThread object and initialize it with a method to execute in the background, and then start the thread:

@objc.IBAction
def startProgressThreaded_(self, sender):
	"""
	This method is triggered by the "Progress with Thread" button in the
	UI. Instead of directly calling a Python method, we use the NSThread
	class to start a new background thread, passing the object method to
	be invoked in this new thread.
	"""
	NSLog("Starting progress bar with background thread")

	t = NSThread.alloc().initWithTarget_selector_object_(self,self.runProgressThreaded, None)
	t.start()

This method is triggered by the Progress with Thread button in our interface. Rather than directly calling our task (as the startProgressNoThread_ method does), we create an NSThread, with the method runProgressThreaded as an argument. The t.start() call kicks off our background thread.

Every thread in an Objective-C program uses a pool of memory to manage and control how and when memory is allocated to objects, and how that memory gets cleaned up. The application thread has already created the memory pool used by the UI and the rest of our program, but because we’re starting out own background thread, we need to make sure to create a new NSAutoreleasePool to be used by objects within our thread.

def runProgressThreaded(self):
	"""
	Update an NSProgressIndicator in a loop, with a small delay between
	each update to the indicator.

	This method is called in a background thread. Each thread in an Objective-C program
	has it's own NSAutoreleasePool for memory management. We need to create this pool
	before updating any objects in our UI. Creating and then releasing this autorelease
	pool only adds two lines of code, as compared to the runProgressNoThread method.

	While this method is running the UI will update normally and respond to user input.
	"""
	self.buttonsEnabled(False)

	pool = NSAutoreleasePool.alloc().init()

	self.progress_bar.setMinValue_(0.0)
	self.progress_bar.setMaxValue_(100.0)
	self.progress_bar.setIndeterminate_(False)

	for i in range(0, 101):
		self.progress_bar.setDoubleValue_(i * 1.0)
		time.sleep(0.05)

	pool.release()

	self.buttonsEnabled(True)

The runProgressThreaded method is nearly identical to the runProgressNoThread method, with the addition of the memory pool, which we create using the NSAutoreleasePool class. If you run the application you’ll see that when you click the Progress with Thread button the UI remains responsive, the NSProgressIndicator updates throughout the task process, and you can exit the program normally while the task is running.

This example is obviously contrived, but running a task in the background like this is feasible for any long running Python method.

Python and iCal — Completing Tasks

Once I could print out iCal Tasks using Python, I though I’d move on to marking tasks as complete using Python. The following program will take a list of Task UIDs as parameters on the command line, and mark the corresponding tasks as complete:

#!/usr/bin/python
from CalendarStore import CalCalendarStore
import sys

if len(sys.argv) < 2:
    print "Syntax:nt%s  ..." % (sys.argv[0])
    sys.exit(0)

store = CalCalendarStore.defaultCalendarStore()

for taskUID in sys.argv[1:]:
    task = store.taskWithUID_(taskUID)
    task.setIsCompleted_(True)
    store.saveTask_error_(task, None)
    print "%s marked complete" % (task.title())

This program attempts to retrieve each Task from the CalendarStore, mark each as complete, and print out a simple message. This does assume that all of the UIDs are valid, and skips the requisite error checking you’d need to make this a robust and useful tool, but the idea is there. The output of the program looks like:

./todoComplete.py A771147B-3F98-4895-8CB1-8466A7433DD0
Task #3 marked complete
Python and iCal — ToDo List

Digging around i PyObjC and the CalendarStore (which is the local shared calendar database for which ever user is logged in), I thought it would be fun to pull the Tasks from my ToDo list into Python. This small chunk of code will extract all available of your tasks from all of your calendars:

#!/usr/bin/python
from CalendarStore import CalCalendarStore

store = CalCalendarStore.defaultCalendarStore()
calendars = CalCalendarStore.calendars(store)
predicate = CalCalendarStore.taskPredicateWithCalendars_(calendars)
tasks = store.tasksWithPredicate_(predicate)

for task in tasks:
    completedMark = " "
    completeDate = task.completedDate()
    if completeDate is not None:
        completedMark = "#"
    print "%s(%s:%s) %s" % (completedMark, task.calendar().title(), task.priority(), task.title())
    print "   uid: %s" % (task.uid())

The end result is a list of tasks:

#(Home:5) Look, a task!
   uid: 1AD57B3A-C31A-4BBD-BCB7-19E78E4060B9
 (Home:1) Task #1
   uid: 240C7449-B74B-4666-98DF-1A6C6F6564CF
 (Work:5) Task #2
   uid: A36AFE1B-5BC6-4A3A-B18F-7B5DB4E4A5F3
 (Home:9) Task #3
   uid: A771147B-3F98-4895-8CB1-8466A7433DD0

The values in parentheses are the Calendar from which the task is taken, and the task priority. Tasks without a priority have a value of 0, priority of High is 1, Medium is 5, and Low is 9. The text following the parenthesized values is the title of the task, below which is the Task’s UID, which is used by the CalendarStore to reference specific tasks. Any tasks that have a completedDate of None are not yet marked as complete in iCal. Those tasks that are complete are printed with a hash (#) sign in front of them.

Python and Apple AddressBook

Among various other tasks for a project this week I’ve been writing some Python classes to query Apple’s AddressBook database. OS X 10.5 ships with PyObjC, a Python to Objective-C bridge, preinstalled, which makes this kind of task fairly straight forward. Or, at least, it should. A few minutes of digging discovered that what few examples of working with Python and AddressBook were easy to find were incomplete, broken, or counter to the task I was trying to accomplish. Add to that a lack of any good documentation on the PyObjC methods, and it fell to experimenting and digging through Objective-C documentation.



What follows is a method for retrieving a list of People from the local AddressBook as a list of Python Dictionaries.

from AddressBook import *
import pprint

def addressBookToList():
        """
        Read the current user's AddressBook database, converting each person
        in the address book into a Dictionary of values. Some values (addresses,
        phone numbers, email, etc) can have multiple values, in which case a
        list of all of those values is stored. The result of this method is
        a List of Dictionaries, with each person represented by a single record
        in the list.
        """
        # get the shared addressbook and the list of
        # people from the book.
        ab = ABAddressBook.sharedAddressBook()
        people = ab.people()

        peopleList = []

        # convert the ABPerson to a hash
        for person in people:
                thisPerson = {}
                props = person.allProperties()
                for prop in props:

                        # skip some properties
                        if prop == "com.apple.ABPersonMeProperty":
                            continue
                        elif prop == "com.apple.ABImageData":
                            continue

                        # How we convert the value depends on the ObjC
                        # class used to represent it
                        val = person.valueForProperty_(prop)
                        if type(val) == objc.pyobjc_unicode:
                                # Unicode String
                                thisPerson[prop.lower()] = val
                        elif issubclass(val.__class__, NSDate):
                                # NSDate
                                thisPerson[prop.lower()] = val.description()
                        elif type(val) == ABMultiValueCoreDataWrapper:
                                # List -- convert each item in the list
                                # into the proper format
                                thisPerson[prop.lower()] = []
                                for valIndex in range(0, val.count()):
                                        indexedValue = val.valueAtIndex_(valIndex)
                                        if type(indexedValue) == objc.pyobjc_unicode:
                                                # Unicode string
                                                thisPerson[prop.lower()].append(indexedValue)
                                        elif issubclass(indexedValue.__class__, NSDate):
                                                # Date
                                                thisPerson[prop.lower()].append(indexedValue.description())
                                        elif type(indexedValue) == NSCFDictionary:
                                                # NSDictionary -- convert to a Python Dictionary
                                                propDict = {}
                                                for propKey in indexedValue.keys():
                                                        propValue = indexedValue[propKey]
                                                        propDict[propKey.lower()] = propValue
                                                thisPerson[prop.lower()].append(propDict)
                peopleList.append(thisPerson)
        return peopleList

So, given an entry in the AddressBook that looked like:




This method will have the following Dictionary in the list of returned People:

{   u'address': [   {   u'city': u'Anytown',
                        u'country': u'USA',
                        u'countrycode': u'us',
                        u'state': u'NY',
                        u'street': u'123 Fake Street',
                        u'zip': u'10111'}],
    u'aiminstant': [u'john_doe_aim'],
    u'creation': u'2008-04-09 13:33:17 -0500',
    u'email': [u'john@doe.com'],
    u'first': u'John',
    u'last': u'Doe',
    u'modification': u'2008-04-09 13:33:17 -0500',
    u'phone': [u'555-555-1212'],
    u'uid': u'BBFAB17F-591A-4D3C-BE75-4FE5B25B984D:ABPerson'}
Adding logarithms to Python’s Decimal type

The Python Decimal class is a handy arbitrary precision numeric type for dealing with extraordinarily large or small numbers, or numbers requiring exact precision. The strength of this class is in it’s use of overridden methods that allow it to be used in standard mathematical constructs:

>>> Decimal(2) ** 16 - 1
Decimal("65535")

Because of these overridden methods, the class can in fact be passed to many of the math library’s methods:

>>> from math import *
>>> floor(Decimal("1.21"))
1.0
>>> fmod(Decimal("10.25"), Decimal("10"))
0.25
>>> modf(Decimal("1.234"))
(0.23399999999999999, 1.0)

Of course the short coming of this is what is returned: a built in float type. When using the math library methods we lose the precision of the Decimal type. We could attempt to use the math library log method to calculate the logarithm of a Decimal type:

>>> log(Decimal(100), 10)
2.0
>>> log(Decimal(100), 3.14)
4.0247145803329714

For a log that returns an integer this works well, but for anything requiring decimal precision we lose the accuracy and precision of the Decimal type. We can instead use an adaptation of the common logarithm, reconstructed to work as a class method for the Decimal type. The Python code:

def dec_log(self, base=10):
	cur_prec = getcontext().prec
	getcontext().prec += 2
	baseDec = Decimal(10)
	retValue = self

	if isinstance(base, Decimal):
		baseDec = base
	elif isinstance(base, float):
		baseDec = Decimal("%f" % (base))
	else:
		baseDec = Decimal(base)

	integer_part = Decimal(0)
	while retValue < 1:
		integer_part = integer_part - 1
		retValue = retValue * baseDec
	while retValue >= baseDec:
		integer_part = integer_part + 1
		retValue = retValue / baseDec

	retValue = retValue ** 10
	decimal_frac = Decimal(0)
	partial_part = Decimal(1)
	while cur_prec > 0:
		partial_part = partial_part / Decimal(10)
		digit = Decimal(0)
		while retValue >= baseDec:
			digit += 1
			retValue = retValue / baseDec
		decimal_frac = decimal_frac + digit * partial_part
		retValue = retValue ** 10
		cur_prec -= 1
	getcontext().prec -= 2

	return integer_part + decimal_frac

This algorithm calculates the integer and fractional portions of the logarithm in separate passes, composing the two when returning the value. Taking a cue from a comment by Jake Voytko on adding nth roots to Python, this algorithm adjusts the working precision of the Decimal types by adding 2 extra digits of precision during the calculation, and trimming two digits from the precision when the calculation is complete, to ensure that the trailing digits of the result are as accurate as possible (a technique I should have used in previous examples, had I consulted the Decimal numeric recipes).

The logarithm can be added to the Decimal type and used with no parameters, a float, an integer or a Decimal parameter. Without a parameter the logarithm defaults to log10:

>>> Decimal.log = dec_log
>>> Decimal(100).log()
Decimal("2.000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000")
>>> Decimal(100).log(3.14)
Decimal("4.024714580332970585367941961553706972817670933108784336660631220853887
020695465320530995672957013298")
>>> Decimal(1234).log(Decimal("1.1234"))
Decimal("61.17246795014182187657829372589717334008147684454521224410597688872638
875290237619255864354194950937")
Making Python √, adding nth roots

The simplest means of working with large numbers in Python is using the Decimal type, which supports exact representation of numbers with arbitrary precision. We’ve already used the Decimal type when working with π, γ, and ζ.

The Decimal type is especially useful as it overrides all of the basic mathematical operators, so instead of using method calls to perform operations on a Decimal number, you can use normal operators:

>>> Decimal(2) ** 2
Decimal("4")
>>> 1 / Decimal(4)
Decimal("0.25")

One of the built in methods for the Decimal type is sqrt, which generates the square root of the current Decimal value. But what if you want the cube root, or the fifth root, or even the 713th root of a number? We can add generic roots to the Decimal class with an n-th root algorithm.

This root finding algorithm is fast converging, so we can get results fairly quickly, and the method is fairly simple. For a given n-th root:
n√A
Where we want the nth root of A, we can use the following pattern:

  1. Make an initial guess at the value, which we’ll call x0
  2. At every iteration we build on the previous values for x, such that xk + 1 = 1 / n * ( (n – 1) * xk + A / xkn – 1)
  3. Repeat step 2 until the last iteration and this iteration have the same result, at which point we’ve reached the proper value in the current precision

The algorithm looks like this in Python:

from decimal import *

def nth_root_gen(num, n, digits, iter):
	getcontext().prec = digits
	a = Decimal(num)
	oneOverN = 1 / Decimal(n)
	nMinusOne = Decimal(n) - 1
	curVal = Decimal(num) / (Decimal(n) ** 2)
	if curVal <= Decimal("1.0"):
		curVal = Decimal("1.1")
	lastVal = 0
	curN = Decimal(1)
	while (lastVal != curVal) and (curN <= iter):
		lastVal = curVal
		curVal = oneOverN * ( (nMinusOne * curVal) + (a / (curVal ** nMinusOne)))
		yield curVal
		curN = curN + 1

def nth_root(num, n, digits):
	getcontext().prec = digits
	a = Decimal(num)
	oneOverN = 1 / Decimal(n)
	nMinusOne = Decimal(n) - 1
	curVal = Decimal(num) / (Decimal(n) ** 2)
	if curVal <= Decimal("1.0"):
		curVal = Decimal("1.1")
	lastVal = 0
	while lastVal != curVal:
		lastVal = curVal
		curVal = oneOverN * ( (nMinusOne * curVal) + (a / (curVal ** nMinusOne)))
	return curVal

def dec_ext_nth_root(self, root):
	a = self
	oneOverN = 1 / Decimal(root)
	nMinusOne = Decimal(root) - 1
	curVal = self / (Decimal(root) ** 2)
	if curVal <= Decimal("1.0"):
		curVal = Decimal("1.1")
	lastVal = 0
	while lastVal != curVal:
		lastVal = curVal
		curVal = oneOverN * ( (nMinusOne * curVal) + (a / (curVal ** nMinusOne)))
	return curVal

Two of these methods can be used directly in code or in the Python interpreter, nth_root_gen and nth_root. The former is a generator that yields it’s current value at every iteration, so you can monitor the value as it converges on the actual root value:

>>> for r in nth_root_gen(100, 7, 20, 100):
...     print r
...
1.9470038113262390670
1.9311017464441574735
1.9306979823757723882
1.9306977288833500144
1.9306977288832501670
1.9306977288832501670

or using the nth_root method:

>>> print nth_root(100, 7, 20)
1.9306977288832501670

With both of these we’re calculating the 7th root of 100, to an accuracy of 20 decimal places. In the generator version we also specify a maximum number of iterations to use before giving up the calculation, in this case 100.

The third method, dec_ext_nth_root, has a considerably different method signature. That’s because it’s designed to extend the built in Decimal class, by adding a generic nth_root method. It can be used like this:

Decimal.nth_root = dec_ext_nth_root

That only needs to be executed once for any script that wants to extend the Decimal class. After that, you can use the nth_root method attached to any Decimal object:

>>> Decimal.nth_root = dec_ext_nth_root
>>> a = Decimal(125)
>>> a.sqrt()
Decimal("11.180339887498948482")
>>> a.nth_root(2)
Decimal("11.180339887498948482")
>>> a.nth_root(3)
Decimal("5.0000000000000000000")
>>> a.nth_root(15)
Decimal("1.3797296614612148324")
>>> a.nth_root(120)
Decimal("1.0410563801900299289")

So we’ve fairly easily added a fast converging method to the Python Decimal object to handle arbitrary roots. There are, of course, some limitations: the algorithm we use depends on the Decimal classes exponentiation operator, which only supports whole number powers, which means that we an only calculate whole number roots. This means that we can find the 3rd root, but not the 3.14th root. This doesn’t effect the number we’re finding the root of, however. Any number that can be represented by the Decimal class we can take the root of, but the root type must be a whole number. We’re also limited to positive roots; we can’t calculate the -3rd root.