I’ve just released a new set of Python classes for managing NSPredicateEditor and converting NSPredicate objects into SQL snippets that can be used in non-managed databases (i.e., outside of CoreData). Details and code are available on GitHub: PredicateUtility.
Example Usage in an Application Delegate (see GitHub and included Example Application for more detail):
from Foundation import *
from AppKit import *
from objc import IBOutlet, IBAction
from PredicateUtility import *

class Example01AppDelegate(NSObject):
    editor = IBOutlet()

    def applicationDidFinishLaunching_(self, sender):
        # create the editor
        self.predicateManager = PredicateEditorManager.alloc().initWithPredicateEditor_(self.editor)

        # add a few search criteria
        self.predicateManager.addMappedCriteria("First Name", "firstname", operators = [PredicateEditorManager.OP_EQ, PredicateEditorManager.OP_BEGINSWITH])
        self.predicateManager.addMappedCriteria("Last Name", "lastname", operators = [PredicateEditorManager.OP_NE, PredicateEditorManager.OP_CONTAINS])
        self.predicateManager.addCriteria("zipcode")

        # build the predicate manager
        self.predicateManager.build()
        self.predicateManager.addRow()

    @IBAction
    def generateSQL_(self, id):
        print "sql: ", self.predicateManager.wrappedPredicate().toSQL()
Pulling information from the Census Bureau (http://www.census.gov/popest/counties/) and the USGS (specifically the Board on Geographic Names: http://geonames.usgs.gov/domestic/index.html), I’ve created two quick SVG maps as a follow-on to yesterday’s map of school density in the US. The top map is the density of cities by county, while the lower map details the US Census estimated 2008 population of each county. I’ve maintained similar color schemes for each to remove any ambiguity, and while the scaling of the city density map uses the same numeric thresholds as the school density map, the population density map breaks the county population down into 6 brackets:
Using the guide provided by Flowing Data, with a few modifications for reusability, I’ve generated an SVG map of the United States color coded for school (K-12) density. The original Scalable Vector Graphic map is the public domain FIPS County Code Map for the United States. School locations were compiled from SchoolDataDirect databases (State Education Data Center (Dist.). [2009/11/23]. [School District Locations and Addresses] [State Directories]. Washington, DC: Council of Chief State School Officers. Accessed on [November 23, 2009] from http://www.SchoolDataDirect.org).
The only times I end up working with PHP are when modifying a WordPress template to add new functionality or somehow tweak existing functionality. Most of the time this is a straightforward process; the WordPress functions and code layout are surprisingly clear, and tend to take care of obvious tasks for you, without having to munge around too much data by hand.
Alas, in this case I wanted to deviate from the standard output of a function, notably the wp_tag_cloud method (http://codex.wordpress.org/Template_Tags/wp_tag_cloud). Rather than an ordered or unordered list, I wanted to display all of the tags used in the blog in a single drop down list. I didn’t care how many posts were tagged with each; I just wanted a drop down list that would navigate to a tag archive. A little bit of digging and some experimentation led to the following code snippet:
<select name="tag-dropdown" onChange="document.location.href='<?php bloginfo('url'); ?>/?tag=' + this.options[this.selectedIndex].value;">
<option value=""><?php echo attribute_escape(__('Browse Archive by Tag')); ?></option>
<?php
$args = array(
'number' => 0,
'format' => 'array',
'taxonomy' => 'post_tag'
);
$tags = get_terms( $args['taxonomy'], $args );
foreach ($tags as $key => $tag ) {
$term = &get_term($tag->term_id, 'post_tag');
echo "<option value='" . $tag->slug . "'>" . $term->slug . "</option>";
}
?>
</select>
The results are a simple alphabetically ordered drop down of Tags used in the blog.
Some test cases are pathological; they exist beyond the bounds of our initial planning, and in some cases beyond the boundary of what we want to support. While mapping social networks on Twitter, I kept running into edge cases where some people have too many followers (as evidenced by the Twitter API dropping a connection or returning a corrupt document, which can wreak havoc in a long running process that only saves state periodically. An obvious workaround that doesn’t detect pathological cases is to maintain a constant cache of nodes that have been walked, and to use threads or sub-processes to crawl the network graph). My software already takes into account a ‘sanity’ limit in what it can realistically process (and what is realistically interesting), but the mode in which it tested this ‘sanity’ exposed another pathological case: super-super-nodes.
Some people have hundreds of friends, and this is interesting. Some users have thousands: still somewhat interesting, but beyond what can realistically be mapped when using the default rate limit of 100 queries per hour. So what’s the pathological case? 1,000,000+ friends or followers. Follow any social graph on Twitter and you’re bound to run into one of these nodes. The pathology exists in two places. The first is that it’s unrealistic for my software to attempt to determine this pathological case by first trying to download a list of ‘friend’ IDs from this node (this was the initial means of mapping the social graph, which has since been altered to test for these extreme cases before mapping begins). The second pathology is in the Twitter API itself, which will often return a 502 Bad Gateway response when faced with a friend/follower list that exceeds a few hundred thousand entities. Not terribly descriptive, especially as the API has predefined error conditions and responses for a range of other cases.
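A pre-check of this kind might look something like the minimal sketch below. It assumes a user record has already been fetched and exposes the followers_count and friends_count fields; the SANITY_LIMIT value and the is_mappable helper are hypothetical names chosen for illustration, not the crawler's actual code or anything defined by the Twitter API.

```python
# Hypothetical threshold: nodes with more connections than this are skipped.
SANITY_LIMIT = 5000


def is_mappable(user_info, limit=SANITY_LIMIT):
    """Return True if both the follower and friend counts are within the limit.

    user_info is assumed to be a dict shaped like a user lookup response,
    carrying 'followers_count' and 'friends_count'. Missing counts are
    treated as zero.
    """
    followers = user_info.get("followers_count", 0)
    friends = user_info.get("friends_count", 0)
    return followers <= limit and friends <= limit


if __name__ == "__main__":
    super_node = {"followers_count": 1145759, "friends_count": 771817}
    ordinary_node = {"followers_count": 120, "friends_count": 300}
    print(is_mappable(super_node))
    print(is_mappable(ordinary_node))
```

Checking the counts first costs a single request per node, rather than an attempt to download the full ID list.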
The upshot of all of this? When plumbing the depths of Twitter through Python (or any other language) and racking up user ID information, it can be useful to have a WHOIS analog for Twitter IDs. This script is a basic version of a Twitter Whois, accepting both user IDs and user names:
#!/usr/bin/python
import sys
import cjson
import urllib2

def api_rate_limit():
    return api_request('account/rate_limit_status')

def api_user_info(userInfo):
    return api_request('users/show/' + userInfo)

def api_request(path):
    uri = 'http://twitter.com/' + path + '.json'
    handle = urllib2.urlopen(uri)
    return cjson.decode(handle.read(), all_unicode=False)

if __name__ == '__main__':
    whoisInfo = ["name", "location", "followers_count", "friends_count", "statuses_count", "url"]
    for query in sys.argv[1:]:
        api_limit = api_rate_limit()
        if api_limit['remaining_hits'] > 0:
            print "whois: %s" % (query)
            who = api_user_info(query)
            for whoisItem in whoisInfo:
                if whoisItem in who.keys():
                    print " %s: %s" % (whoisItem, who[whoisItem])
        else:
            print "Out of API requests..."
            sys.exit(0)
Example usage:
$ ./twitter_whois.py 813286
whois: 813286
name: Barack Obama
location: Chicago, IL
followers_count: 1145759
friends_count: 771817
statuses_count: 269
url: http:\/\/www.barackobama.com
Directly calculating Graham’s Number is impossible, to the point that even writing out approximations of what the calculation should look like exceeds the capacity of the universe in all but the simplest forms. But we can have some fun calculating what some of the least significant digits look like by observing the convergent properties of “Power Towers”.
Using a simple formula to calculate the rightmost digits of Graham’s Number is straightforward:
def calc_graham(d):
    x = 3
    for n in range(0,d):
        x = (3 ** x) % (10 ** d)
    return x
This is, of course, horrifically slow. Depending upon the number of digits d we want to find, we’ll be calculating 3 to the power of a number of length d digits d times. Trying to calculate anything more than 6 or 7 digits will bring most computers to a crawl, much less trying to compute 100 or more digits.
Modular Exponentiation to the rescue. Modular exponentiation is a technique used throughout computer science (especially cryptography) to simplify the computation of equations of the type:
C = b^e mod m
Our calculation of the rightmost digits of Graham’s Number fits this exactly; all we need to do is whip up a modpow method for Python integers:
def modpow(base, exponent, modulus):
    result = 1
    while exponent > 0:
        # multiply in the current power of the base for each set bit
        if exponent & 1 == 1:
            result = (result * base) % modulus
        exponent = exponent >> 1
        base = (base * base) % modulus
    return result
This method is based upon an algorithm in Bruce Schneier’s Applied Cryptography. With this algorithm in place, the entire program to calculate the rightmost digits of Graham’s Number becomes:
#!/usr/bin/python
import sys
import time

def calc_graham(d):
    x = 3
    td = 10**d
    print "Iterating..."
    for n in range(0,d):
        print " step %d" % (n + 1)
        x = modpow(3, x, td)
    return x

def modpow(base, exponent, modulus):
    result = 1
    while exponent > 0:
        if exponent & 1 == 1:
            result = (result * base) % modulus
        exponent = exponent >> 1
        base = (base * base) % modulus
    return result

if __name__ == '__main__':
    digits = int(sys.argv[1])
    partitionSize = 50
    if len(sys.argv) > 2:
        partitionSize = int(sys.argv[2])
    st = time.time()
    graham = calc_graham(digits)
    ed = time.time()
    print "calculation took %f seconds" % (ed - st)
    print "right most %d digits of G are:\n" % (digits)
    grahamString = "%d" % (graham)
    while len(grahamString) > 0:
        print grahamString[:partitionSize]
        grahamString = grahamString[partitionSize:]
Calculating the least significant digits of Graham’s Number in this way is still limited in efficiency by the number of digits being computed, but is orders of magnitude faster than the straightforward approach. And the last 500 digits of G?
24259506950647383956574791365193517983345353625214300354012602677162267216041981 06522631693551887803881448314065252616878509555264605107117200099709291249544378 88749606288291172506300130362293491608025459461494578871427832350829242102091825 89675356043086993801689249889268099510169055919951195027887178308370183402364745 48882222161573228010132974509273445945043433009010969280253527518332898844615089 40424826501819385156253579639961899396790549663800322234872396701848518643905910 4575627262464195387
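Python’s built-in pow also accepts a modulus as a third argument, so a hand-written modular exponentiation routine can be cross-checked against it. The sketch below is written for Python 3 (unlike the Python 2 listings in this post), and the names calc_graham_modpow and calc_graham_builtin are mine, chosen only for this comparison.

```python
def modpow(base, exponent, modulus):
    """Right-to-left binary modular exponentiation."""
    result = 1
    base %= modulus
    while exponent > 0:
        if exponent & 1:
            result = (result * base) % modulus
        exponent >>= 1
        base = (base * base) % modulus
    return result


def calc_graham_modpow(d):
    """Rightmost d digits, using the hand-written modpow."""
    x, modulus = 3, 10 ** d
    for _ in range(d):
        x = modpow(3, x, modulus)
    return x


def calc_graham_builtin(d):
    """Rightmost d digits, using the built-in three-argument pow."""
    x, modulus = 3, 10 ** d
    for _ in range(d):
        x = pow(3, x, modulus)
    return x


if __name__ == "__main__":
    for d in (1, 2, 3, 20):
        assert calc_graham_modpow(d) == calc_graham_builtin(d)
    print(calc_graham_builtin(3))  # 387
```

The two versions should agree for every digit count; the built-in is simply the more convenient choice when no custom stepping or logging is needed.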
In my current day to day work I spend a lot of time writing Javascript, and a significant portion of that Javascript involves Events. Of course creating event observers is dead simple with Prototype, as is stopping event observers. For better or for worse Javascript isn’t C, and sometimes we lose track of which events we’re observing, and which we’ve cleared, and we start to create a fairly large Event.cache.
Stumbling across Juriy Zaytsev’s Event Counter Bookmarklet, I decided to do a bit of debugging of an internal data analyst application currently under development, and to my astonishment discovered that after a few minutes within the application I had nearly 4,000 Event observers hanging around. Granted, this application handles editing of rather complex documents, but something wasn’t getting cleaned out properly. After manually digging out the low hanging fruit, I had managed to drop half of those dangling observers, but I was still missing quite a few, and over time these smaller leaks were bound to bog down the application. I decided I needed a finer-grained bookmarklet that could enumerate the types of observers in the document. Building upon Juriy’s use of the inject method of Prototype Enumerables, I built the following simple script:
javascript:alert(
"Event Observers\n" + $H(Event.cache).inject( $H(),
function (acc, p) {
$H(p.value).each(
function (evtT) {
if ( !acc.get(evtT.key) )
acc.set(evtT.key, 0);
acc.set(evtT.key,
acc.get(evtT.key) + evtT.value.size()
);
}
);
return acc;
}
).inject( "",
function (acc,p) {
acc += "\n"
+ p.key
+ ":"
+ ( " ".times(40 - p.key.length) )
+ p.value;
return acc;
}
)
)
Which you can grab with this bookmarklet.
This bookmarklet produces a popup with an enumerated list of observers by type:
Anyone working with Javascript for any significant amount of time will probably run into the problem of namespace pollution, where a function (or a whole host of functions) defined in one library override those defined in another. With the proliferation of libraries that extend built in object Prototypes, this problem is extended to the basic behavior of objects acting in ways contrary to what we would expect. When this namespace pollution seeps into library functions and basic objects, it can cause incredibly pernicious problems that are difficult (if not impossible) to track down.
In any significantly useful library or application it’s impossible to completely avoid adding objects and functions to the global namespace, but a structured and sane approach is usually better than seeding objects throughout the global window object. Consider a simple Javascript file that we might load in a page:
var myValue = 42;
function foo() {
alert("Hey there, it's foo.");
}
function bar() {
alert("Hey there, it's bar.");
}
function baz() {
alert("Hey there, it's baz.");
}
Once loaded we can directly access these three functions by name, as they’ve become part of the global namespace. If we were to load another file, with another function named “foo”, “bar”, or “baz”, we’d have a conflict where we attempted to define the same function in the global namespace twice.
A better approach would be to create our own namespace, and store our functions inside of this distinct object. We’d still be extending the global namespace, but only once, for our top level namespace object. This is far from a new idea. Some examples build namespacing using PrototypeJS (or one from Mark Ziesemer), some use the Yahoo API for namespaces. All of these have in common the same basic idea: bring module or package style namespacing familiar from other languages into JavaScript. The module/package structure allows us to create nested sets of classes and objects that can all be accessed from a single base object, so instead of putting all of our functions in the global namespace, as we did above, we could create a different structure to store our functions and data:
com.programmish.values.myValue = 42;
com.programmish.lib.foo = function() {
alert("Hey there, it's foo.");
}
com.programmish.lib.bar = function() {
alert("Hey there, it's bar.");
}
com.programmish.lib.baz = function() {
alert("Hey there, it's baz.");
}
It’s a bit more code to write up front, but now we know our functions (and data) are isolated from other libraries or files we might load. The trick, with all of the namespacing solutions, is creating the namespace object hierarchy: the objects nested within objects. Here’s a simple version, using pure (no external libraries expected) JavaScript:
function N(namespace) {
var pieces = namespace.split('.'), node = window, piece;
while ( piece = pieces.shift() )
node = !node[piece] ? node[piece] = {} : node[piece];
}
Rewriting our example from above using this namespacing function:
// create the 'values' namespace
N('com.programmish.values');
com.programmish.values.myValue = 42;
// create the 'lib' namespace
N('com.programmish.lib');
com.programmish.lib.foo = function() {
alert("Hey there, it's foo.");
}
com.programmish.lib.bar = function() {
alert("Hey there, it's bar.");
}
com.programmish.lib.baz = function() {
alert("Hey there, it's baz.");
}
When we create a new namespace, any existing parts of that namespace that have already been created are left intact, with only new sections of the namespace created. This namespace code compacts nicely to 85 characters, which is plenty small to include in a top level script tag on any page:
function N(n){var p=n.split('.'),b=window,s;while(s=p.shift())b=!b[s]?b[s]={}:b[s];}
So please, use some form of JavaScript namespacing.
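For readers more at home in Python, the walk-and-create logic of N can be sketched with nested dictionaries. This is only an analogy: the root dict stands in for the window object, and the namespace function name is hypothetical. It shows that existing branches are reused rather than overwritten.

```python
def namespace(root, dotted_name):
    """Walk dotted_name through nested dicts under root, creating missing levels."""
    node = root
    for piece in dotted_name.split("."):
        # setdefault only creates a new dict when the key is absent,
        # so anything already stored under an existing level is preserved.
        node = node.setdefault(piece, {})
    return node


if __name__ == "__main__":
    root = {}
    namespace(root, "com.programmish.values")["myValue"] = 42
    namespace(root, "com.programmish.lib")["foo"] = lambda: "Hey there, it's foo."
    print(root["com"]["programmish"]["values"]["myValue"])  # 42
```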
Every computer language has gotchas: strange quirks of syntax or persistent language bugs (which, unfortunately, will sometimes morph into features) that trip you up or cause bizarre behavior in what would otherwise be a sound chunk of code. Gotchas happen, and there isn’t much use in complaining about them, other than to try and get the underlying behavior fixed or changed. But sometimes a gotcha is so strange, so removed from what you would normally consider an edge condition, that you do a double take.
Diagnosing a gotcha when working on a larger body of code can be somewhat aided by the feel of a language, the expectations you’ve come to terms with when programming in a specific syntax. I was recently working on a medium sized (1000+ line) Perl program when I happened upon what I would later realize was a strange gotcha; the problem was that it took quite a while to determine the actual source of the problem. Some gotchas arise from using new or different libraries; thankfully this wasn’t the case, as those problems can often cause collateral work that significantly affects the length of time a programming task can take. In this case the gotcha was so fundamental as to cause me to step back and reevaluate some basic assumptions about how standard language features worked in Perl and other languages. In fact I had to test out this language feature in a few other languages to reassure myself that:
It’s a pretty simple feature, when it comes down to it (never mind the amount of time it took to isolate this problem, as it also suffered from being buried in a rather deep stack of recursive calls): iterate over an associative array of elements twice; the first time exit the iteration once an appropriate item is found, the second time iterate over the associative array completely.
Suffice it to say this is the behavior I expected, as programmed in Python:
#!/usr/bin/python
set = { 'a' : 'aaa', 'b' : 'bbb', 'c' : 'ccc', 'd' : 'ddd', 'e' : 'eee', 'f' : 'fff', 'g' : 'ggg' }

print "First pass..."
for key, val in set.items():
    print "%s -> %s" % (key, val)
    if val == 'bbb':
        break

print
print "Second pass..."
for key, val in set.items():
    print "%s -> %s" % (key, val)
The results for this vary slightly depending upon how the hash elements are stored, but a sample run of this program provides the output:
First pass...
a -> aaa
c -> ccc
b -> bbb

Second pass...
a -> aaa
c -> ccc
b -> bbb
e -> eee
d -> ddd
g -> ggg
f -> fff
So the first pass prints the elements of the associative array up until a certain value is found, and the second iteration prints out all of the elements.
The same behavior is evident in Ruby:
#!/usr/bin/ruby
set = {
'a' => 'aaa',
'b' => 'bbb',
'c' => 'ccc',
'd' => 'ddd',
'e' => 'eee',
'f' => 'fff',
'g' => 'ggg'
}
puts "First pass..."
set.each do |k,v|
puts k + " -> " + v
if v == 'bbb' then
break
end
end
puts
puts "Second pass..."
set.each do |k,v|
puts k + " -> " + v
end
generates
First pass...
a -> aaa
b -> bbb

Second pass...
a -> aaa
b -> bbb
c -> ccc
d -> ddd
e -> eee
f -> fff
g -> ggg
Now for the Perl version, which left me scratching my head:
#!/usr/bin/perl
use strict;
use warnings;
my %set = (
-a => 'aaa',
-b => 'bbb',
-c => 'ccc',
-d => 'ddd',
-e => 'eee',
-f => 'fff',
-g => 'ggg'
);
print "First run...\n";
while ( my ($key, $val) = each %set ) {
    print "$key -> $val\n";
    last if ($val eq 'ggg');
}
print "\n";
print "Second run...\n";
while ( my ($key, $val) = each %set ) {
    print "$key -> $val\n";
}
results in
First run...
-a -> aaa
-c -> ccc
-g -> ggg

Second run...
-f -> fff
-e -> eee
-d -> ddd
-b -> bbb
Look at the output for a minute. Let it sink in: the first iteration displays all of the elements up until a specific value is found, and the second iteration continues from where the first left off. Since this was contrary to my basic understanding of how such an algorithm would work, I did some digging and turned up an entry in perldoc:
The next call to each after that will start iterating again. There is a single iterator for each hash, shared by all each, keys, and values function calls in the program; it can be reset by reading all the elements from the hash, or by evaluating keys HASH or values HASH .
via perldoc
Thankfully I had a workaround in my code, but the workaround is non-optimal as it makes deep assumptions about the layout and quality of incoming values from a database. Most of the time you don’t need to iterate through values in an associative array in an arbitrary, break-dependent manner, as you either know or don’t know the key for the value, but in cases where partial iteration through an associative array is necessary this quirk in Perl is a fair pain to work around.
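The shared-iterator behavior that perldoc describes can be imitated in Python by deliberately reusing a single iterator object, which makes the difference from the usual "fresh iterator per loop" idiom easier to see. This is a sketch for Python 3.7 or later (where dictionaries keep insertion order); the function names are hypothetical, and it illustrates the concept rather than how Perl implements each internally.

```python
settings = {"a": "aaa", "b": "bbb", "c": "ccc", "d": "ddd"}


def two_passes_shared(mapping, stop_value):
    """Iterate twice over ONE iterator object, breaking early the first time."""
    shared = iter(mapping.items())
    first, second = [], []
    for key, val in shared:
        first.append(key)
        if val == stop_value:
            break
    # The same iterator is reused, so this loop resumes after the break point.
    for key, val in shared:
        second.append(key)
    return first, second


def two_passes_fresh(mapping, stop_value):
    """Iterate twice, asking for a new iterator each time."""
    first, second = [], []
    for key, val in mapping.items():
        first.append(key)
        if val == stop_value:
            break
    for key, val in mapping.items():
        second.append(key)
    return first, second


if __name__ == "__main__":
    print(two_passes_shared(settings, "bbb"))  # (['a', 'b'], ['c', 'd'])
    print(two_passes_fresh(settings, "bbb"))   # (['a', 'b'], ['a', 'b', 'c', 'd'])
```

The first function reproduces the "second pass continues where the first left off" result; the second corresponds to resetting the iterator before looping again, which the perldoc entry says can be done by evaluating keys HASH or values HASH.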
Get the sample code here: PyObjC-Highlight.zip
Just about every program that displays, manages, or edits structured or parsable data has some form of syntax highlighting. This sample application presents a naive syntax highlighter for a YAML-like markup language, using PyObjC and Cocoa.
Our highlighter covers three basic elements:
Comment lines are formatted with green text and a normal font. The Key of the key – value pair is colored blue and presented in a bold font. The dash of all list items is colored red, using a normal font. The remainder of the text should be represented in a black system default font.

In this example the text highlighting should be applied whenever the text changes, which although convenient for a simple example does cause issues for large documents that take a long time to parse and highlight, as the UI will freeze during the highlight process. There are a few solutions to this problem:
We’ll bypass these solutions in favor of a straightforward example. To implement a highlighter that updates whenever the text changes we’ll use the textDidChange delegate method of the NSText class.
def textDidChange_(self, notification):
    """
    Delegate method called by the NSTextView whenever the contents of the
    text view have changed. This is called after the text has changed and
    been committed to the view. See the Cocoa reference documents:
    http://developer.apple.com/documentation/Cocoa/Reference/ApplicationKit/Classes/NSText_Class/Reference/Reference.html
    http://developer.apple.com/documentation/Cocoa/Reference/ApplicationKit/Classes/NSTextView_Class/Reference/Reference.html
    Specifically the sections on Delegate Methods for information on
    additional delegate methods relating to text control in NSTextView
    objects.
    """
    # Retrieve the current contents of the document and start highlighting
    content = self.highlightedText.string()
    self.highlightText(content)
In this method we retrieve the contents of the NSTextView in our UI, and pass that content to our highlightText method:
def highlightText(self, content):
    """
    Apply our customized highlighting to the provided content. It is assumed that
    this content was extracted from the NSTextView.
    """
    # Calling the setAttributesForRange with no values creates
    # a default that "resets" the formatting on all of the content
    self.setAttributesForRange(None, None, None, None)
    # We'll highlight the content by breaking it down into lines, and
    # processing each line one by one. By storing how many characters
    # have been processed we can maintain an "offset" into the overall
    # content that we use to specify the range of text that is currently
    # being highlighted.
    contentLines = content.split("\n")
    highlightOffset = 0
    for line in contentLines:
        if line.strip().startswith("#"):
            # Comment - we want to highlight the whole comment line
            self.setAttributesForRange(NSColor.greenColor(), None, highlightOffset, len(line))
        elif line.find(":") > -1:
            # Tag - we only want to highlight the tag, not the colon or the remainder of the line
            startOfLine = line[0: line.find(":")]
            yamlTag = startOfLine.strip("\t ")
            yamlTagStart = line.find(yamlTag)
            self.setAttributesForRange(NSColor.blueColor(), "bold", highlightOffset + yamlTagStart, len(yamlTag))
        elif line.strip().startswith("-"):
            # List item - we only want to highlight the dash
            listIndex = line.find("-")
            self.setAttributesForRange(NSColor.redColor(), None, highlightOffset + listIndex, 1)
        # Add the processed line to our offset, as well as the newline that terminated the line
        highlightOffset += len(line) + 1
The technique we use for parsing our content is primitive (and error prone) but suffices for a simple example. After breaking the content into lines, each line is checked against a basic format for our three highlighting targets. If the line matches, we determine the range of the text from that line to which highlighting will be applied. We call our setAttributesForRange method with a color (an NSColor object), a font strength (“normal” or “bold”), and the location and length of the text in the original document to highlight.
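The offset bookkeeping is easier to check when it is separated from Cocoa. The following is a rough, PyObjC-free sketch (Python 3) of the same three rules that returns (kind, location, length) tuples instead of applying attributes; the highlight_ranges name and the "comment", "key", and "dash" labels are hypothetical and not part of the sample application.

```python
def highlight_ranges(content):
    """Return (kind, location, length) tuples for each span to highlight.

    Mirrors the three rules described above: whole comment lines, the key
    before the first colon, and the dash of a list item.
    """
    ranges = []
    offset = 0
    for line in content.split("\n"):
        stripped = line.strip()
        if stripped.startswith("#"):
            ranges.append(("comment", offset, len(line)))
        elif ":" in line:
            key = line[:line.find(":")].strip("\t ")
            ranges.append(("key", offset + line.find(key), len(key)))
        elif stripped.startswith("-"):
            ranges.append(("dash", offset + line.find("-"), 1))
        # Advance past this line and the newline that terminated it.
        offset += len(line) + 1
    return ranges


if __name__ == "__main__":
    sample = "# config\nname: demo\n  - item"
    for kind, location, length in highlight_ranges(sample):
        print(kind, repr(sample[location:location + length]))
```

Slicing the original string with each returned location and length is a quick way to confirm that the offsets line up with the text that should be highlighted.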
Throughout the parsing we need to maintain a tally of how many characters have been processed, as the highlighting is being applied to the original content, and we need an index into that content. Normally in Python we would use two indices into a list to denote a slice or range; unfortunately Objective-C uses an index and length to create a range (specifically with the NSRange structure). This conversion doesn’t add a lot of code to our example, but it can be a point in the code for bugs to pop up.
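The conversion itself is small enough to show in isolation. In this sketch plain tuples stand in for NSRange so that it runs without PyObjC, and the helper names slice_to_range and range_to_slice are made up for the example.

```python
def slice_to_range(start, end):
    """Convert Python slice bounds [start:end] into a (location, length) pair."""
    return (start, end - start)


def range_to_slice(location, length):
    """Convert a (location, length) pair back into slice bounds (start, end)."""
    return (location, location + length)


if __name__ == "__main__":
    text = "name: demo"
    location, length = slice_to_range(0, 4)
    start, end = range_to_slice(location, length)
    print(text[start:end])  # name
```

Keeping the arithmetic in one place like this is one way to avoid passing an end index where a length is expected.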
The actual highlighting takes place in the setAttributesForRange method:
def setAttributesForRange(self, color, font, rangeStart, rangeLength):
    """
    Set the visual attributes for a range of characters in the NSTextView. If
    values for the color and font are None, defaults will be used.
    The rangeStart is an index into the contents of the NSTextView, and
    rangeLength is used in combination with this index to create an NSRange
    structure, which is passed to the NSTextView methods for setting
    text attributes. If either of these values are None, defaults will
    be provided.
    The "font" parameter is used as a key for the "fontMap", which contains
    the associated NSFont objects for each font style.
    """
    fontMap = {
        "normal" : NSFont.systemFontOfSize_(self.fontSize),
        "bold" : NSFont.boldSystemFontOfSize_(self.fontSize)
    }
    # Setup sane defaults for the color, font and range if no values
    # are provided
    if color is None:
        color = NSColor.blackColor()
    if font is None:
        font = "normal"
    if font not in fontMap:
        font = "normal"
    displayFont = fontMap[font]
    if rangeStart is None:
        rangeStart = 0
    if rangeLength is None:
        rangeLength = len(self.highlightedText.string()) - rangeStart
    # Set the attributes for the specified character range
    range = NSRange(rangeStart, rangeLength)
    self.highlightedText.setTextColor_range_(color, range)
    self.highlightedText.setFont_range_(displayFont, range)
This method takes a color (NSColor object), a font type (“normal” or “bold”), and the location and length of the text to highlight in the NSTextView. If any of the parameters is None a sensible default value is used. When called with all None parameters the default is to highlight the entire range of the document with a normal weight black font (in fact the method is called like this at the start of the highlightText method to reset the highlights on the document). The actual methods used to apply formatting are the setTextColor_range_ and setFont_range_ methods of the NSTextView.