Wiki: Python

Python Coding and Syntax Reference
by Oliver; Dec. 22, 2013
   

Introduction

This is a collection of miscellaneous Python syntax for reference. I'm new to Python—the language responsible for introducing the adjective Pythonic into English—but I've already discovered many of its selling points:
  • the Python shell, invoked by simply typing python on the command line
  • the ability to print out arrays and hashes with a simple print() statement (no loops!)
  • wide usage as a backend web programming language (e.g., Django)
  • regex functionality as good as Perl's
  • good math libraries, good plotting libraries, good libraries all around
  • the ability to easily make modules and have them double as stand-alone scripts
  • an awesome package manager, pip
  • Jupyter, a notebook GUI a bit like RStudio or Matlab
Python is so easy to use and supple—its developers seem to have magically removed the headache from programming—that it's hard to go back once you start using it. As Python's Wikipedia page notes:
An important goal of the Python developers is making Python fun to use.
It's also conquered a giant swath of territory: scientists like it for NumPy, SciPy, and Pandas; web people like it for Django and Flask; teachers like it because it's perfect for beginners; et cetera. Adding to its appeal, the official documentation is comprehensive and elegant.

This wiki—so you know what to expect—is more a reminder to myself than a carefully crafted article. Note that there are two versions of Python, Python 2.x and Python 3.x, which the docs call "the first ever intentionally backwards incompatible Python release." I assume Python 2 here, but I'll use a python3 tag when I want to explicitly discuss a Python 3 feature or point out a difference. For a good (free!) professional tutorial, see the book Dive Into Python (Python 2) or Dive Into Python 3 (Python 3).

The Python Shell

The first lesson of Python is that you can open up a Python shell (i.e., a program that interprets Python commands) on the command line simply by typing:
$ python
(where the $ denotes the ordinary bash prompt). Screenshot:

image

As you can see, the Python prompt is typically triple angle brackets:
>>>

Data Types

Here are some, but not all, of the Python data types:
  • int
  • float
  • str
  • list [ ]
  • tuple ( )
  • dict { }
  • set
  • bool
Three of these may be unfamiliar to you: lists are Python arrays; tuples are similar to arrays but they're immutable (so no pushing or popping); and dicts are Python's hashes—i.e., key-value pairs. For example, let's define a Python list:
>>> x = ['a', 'b', 'c', 1, 2, 3]
>>> x
['a', 'b', 'c', 1, 2, 3]
The indices of the list employ zero-based counting:
>>> x[0]
'a'
>>> x[5]
3
>>> x[-1] # negative indices count backwards
3
We can grab ranges, too. In Python, the list range:
x:y
signifies the range from x to y, not including y. For example:
>>> x[1:2]
['b']
>>> x[:3] # from the beginning up to (but not including) 3
['a', 'b', 'c']
>>> x[3:] # from 3 to the end
[1, 2, 3]
Note the same thing works on a plain old string:
>>> s = 'testing'
>>> s[0:4]
'test'
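The slicing rules above can be collected into a short sketch. The variable names and the third (step) component of the slice are my own additions for illustration; the code is written to run under Python 3 but behaves the same in Python 2:

```python
# Slices take the form seq[start:stop:step]; stop is never included.
letters = ['a', 'b', 'c', 1, 2, 3]

first_three = letters[:3]      # indices 0, 1, 2
last_two = letters[-2:]        # negative indices count from the end
every_other = letters[::2]     # step of 2: indices 0, 2, 4
backwards = letters[::-1]      # a step of -1 walks the sequence in reverse

# The same syntax works on strings
word = 'testing'
prefix = word[0:4]
```

A slice always returns a new object of the same type, so the original list and string are left untouched.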
Now, let's define a tuple:
>>> y = ('a', 'b', 'c', 1, 2, 3)
>>> y
('a', 'b', 'c', 1, 2, 3)
For these basic range operations, it behaves similarly to our list x:
>>> y[0]
'a'
>>> y[:3]
('a', 'b', 'c')
However, because tuples are immutable (see the Stackoverflow discussion here), x and y behave differently when it comes to changing elements:
>>> x[0] = 'z' # x is a list
>>> x
['z', 'b', 'c', 1, 2, 3]
>>> y[0] = 'z' # y is a tuple, so this doesn't work
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
Finally, let's define a dict:
>>> z = {'a': 1, 'b': 2, 'c': 3}
>>> z
{'a': 1, 'c': 3, 'b': 2}
We can input a key to access a value:
>>> z['a']
1
>>> z['b']
2
Also note you can cast variables from one type to another:
>>> z = {'a': 1, 'b': 2, 'c': 3}
>>> z
{'a': 1, 'c': 3, 'b': 2}
>>> str(z)
"{'a': 1, 'c': 3, 'b': 2}"
>>> list(z)
['a', 'c', 'b']
>>> set(z)
set(['a', 'c', 'b'])
Forget what variable type you're using? You can always call Python's type() function. E.g., here:
>>> type(z)
<type 'dict'>

Print

The print statement in Python is simple. In Python 2, you can print with either of these syntaxes:
>>> print 'joe'
joe
>>> print('joe')
joe
python3
However, in Python 3 print() is a proper function and thus accepts only this syntax:
>>> print('joe')
joe
Therefore, it's a good idea to always use parentheses whichever version of Python you're using. Check out What’s New In Python 3.0 for more differences. To quote that source:
Old: print "The answer is", 2*2
New: print("The answer is", 2*2)

Old: print x,           # Trailing comma suppresses newline
New: print(x, end=" ")  # Appends a space instead of a newline

Old: print              # Prints a newline
New: print()            # You must call the function!

Old: print >>sys.stderr, "fatal error"
New: print("fatal error", file=sys.stderr)

Old: print (x, y)       # prints repr((x, y))
New: print((x, y))      # Not the same as print(x, y)!
(Source: What’s New In Python 3.0)

A wonderfully convenient feature of Python is that it can handle printing objects of any datatype. E.g.:
>>> z = {'a': 1, 'b': 2, 'c': 3}
>>> print(z)
{'a': 1, 'c': 3, 'b': 2}
Want to print to stderr, not stdout, in your script? That's:
import sys
sys.stderr.write('Error\n')
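If you follow the advice above and always use print() as a function, the file argument gives you another way to reach stderr. Here is a minimal sketch, assuming Python 3 syntax; warn is a hypothetical helper name, not something built in:

```python
import sys

def warn(message, stream=sys.stderr):
    '''Write a message to stderr (or to any other file-like object)'''
    print(message, file=stream)

warn('Error')  # goes to stderr, so it won't pollute output piped from stdout
```

Taking the stream as a parameter makes the function easy to redirect, for example to a log file.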

Data Types, Continued: Objects in Python

In the section on Data Types, we defined a list, a tuple, and a dict:
>>> x
['a', 'b', 'c', 1, 2, 3]
>>> y
('a', 'b', 'c', 1, 2, 3)
>>> z
{'a': 1, 'c': 3, 'b': 2}
Python is a modern object-oriented programming language and, as such, x is an instance (or object, if you like) of the list class; y is an instance of the tuple class; and z is an instance of the dict class. Dive Into Python tells us about objects in Python:
Everything in Python is an object, and almost everything has attributes and methods. All functions have a built-in attribute __doc__, which returns the doc string defined in the function's source code. The sys module is an object which has (among other things) an attribute called path. And so forth.

Still, this begs the question. What is an object? Different programming languages define “object” in different ways. In some, it means that all objects must have attributes and methods; in others, it means that all objects are subclassable. In Python, the definition is looser; some objects have neither attributes nor methods, and not all objects are subclassable. But everything is an object in the sense that it can be assigned to a variable or passed as an argument to a function.

This is so important that I'm going to repeat it in case you missed it the first few times: everything in Python is an object. Strings are objects. Lists are objects. Functions are objects. Even modules are objects.
To see the attributes of, say, z, we can call the function dir() on it (scroll right):
>>> dir(z)
['__class__', '__cmp__', '__contains__', '__delattr__', '__delitem__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'clear', 'copy', 'fromkeys', 'get', 'has_key', 'items', 'iteritems', 'iterkeys', 'itervalues', 'keys', 'pop', 'popitem', 'setdefault', 'update', 'values', 'viewitems', 'viewkeys', 'viewvalues']
Note I use attributes to mean any variables inherent to z and any methods you can call on z (some people use attributes to mean the variables or properties only, not the methods). First, what's with the double underscores? This blog has a succinct explanation:
[A double underscore before and after a name indicates a] special method name used by Python. As far as one’s concerned, this is just a convention, a way for the Python system to use names that won’t conflict with user-defined names. You then typically override these methods and define the desired behaviour for when Python calls them. For example, you often override the __init__ method when writing a class.
Stackoverflow elaborates:
Note that names with double leading and trailing underscores are essentially reserved for Python itself: "Never invent such names; only use them as documented."
So these are internal functions we can ignore for now. The other methods, however, are useful for us. Python uses the dot syntax, so you can access the attribute of an object as:
object.attribute
Let's try some of the methods which dir() printed out. The keys() method prints a list of the dict's keys:
>>> z.keys()
['a', 'c', 'b']
The values() method prints a list of the dict's values:
>>> z.values()
[1, 3, 2]
And items() prints a list of (key, value) tuples:
>>> z.items()
[('a', 1), ('c', 3), ('b', 2)]
Look what happens if we call z.items instead of z.items():
>>> z.items
<built-in method items of dict object at 0x7fa213d42400>
The note says that items is a method of our object, and we also get a reference to the object or, if you like, its id. In CPython, the standard interpreter, 0x7fa213d42400 is the object z's memory address (in hexadecimal). You can also see this with the built-in function id():
>>> id(z)
140334094099456
As a sanity check, verify that 140334094099456 == 0x7fa213d42400:
>>> int('0x7fa213d42400', 0)
140334094099456
As Dive Into Python mentioned, we can examine our method's __doc__ attribute:
>>> print(z.items.__doc__)
D.items() -> list of D's (key, value) pairs, as 2-tuples
This is a good illustration of how the dot syntax conveniently allows us to chain things together, accessing an object's attribute's attribute, ad infinitum.

We'll see below that we define functions in Python using the keyword def. If we define a simple function describe:
>>> def describe(x): print(x.__doc__)
then:
>>> describe(z.items)
D.items() -> list of D's (key, value) pairs, as 2-tuples
>>> describe(z.items())
list() -> new empty list
list(iterable) -> new list initialized from iterable's items
In the first case, we're getting the doc string of the items function; in the second case, calling items returns a list so we're getting the doc string of a list. Python has two builtin functions, type and help, which you can try calling on z.items and z.items() as an exercise.

Into the Weeds on Python Objects

There's a great stackoverflow post that gets into the weeds on Python objects:
And Python has a very peculiar idea of what classes are, borrowed from the Smalltalk language.

In most languages, classes are just pieces of code that describe how to produce an object. That's kinda true in Python too:

>>> class ObjectCreator(object):
... pass
...

>>> my_object = ObjectCreator()
>>> print(my_object)
<__main__.ObjectCreator object at 0x8974f2c>


But classes are more than that in Python. Classes are objects too.

Yes, objects.

As soon as you use the keyword class, Python executes it and creates an OBJECT. The instruction

>>> class ObjectCreator(object):
... pass
...


creates in memory an object with the name "ObjectCreator".

This object (the class) is itself capable of creating objects (the instances), and this is why it's a class.

But still, it's an object, and therefore:
  • you can assign it to a variable
  • you can copy it
  • you can add attributes to it
  • you can pass it as a function parameter
e.g.:

>>> print(ObjectCreator) # you can print a class because it's an object
<class '__main__.ObjectCreator'>
>>> def echo(o):
... print(o)
...
>>> echo(ObjectCreator) # you can pass a class as a parameter
<class '__main__.ObjectCreator'>
>>> print(hasattr(ObjectCreator, 'new_attribute'))
False
>>> ObjectCreator.new_attribute = 'foo' # you can add attributes to a class
>>> print(hasattr(ObjectCreator, 'new_attribute'))
True
>>> print(ObjectCreator.new_attribute)
foo
>>> ObjectCreatorMirror = ObjectCreator # you can assign a class to a variable
>>> print(ObjectCreatorMirror.new_attribute)
foo
>>> print(ObjectCreatorMirror())
<__main__.ObjectCreator object at 0x8997b4c>
(Source: What is a metaclass in Python?)

I encourage you to read the complete post! The article also gives a little bit of Python trivia about the type() function:
Since classes are objects, they must be generated by something.

When you use the class keyword, Python creates this object automatically. But as with most things in Python, it gives you a way to do it manually.

Remember the function type? The good old function that lets you know what type an object is:

Well, type has a completely different ability, it can also create classes on the fly. type can take the description of a class as parameters, and return a class.

It works like this, fwiw:
>>> myClass = type('myClass', (), {'name': 'Oliver'})
>>> myClass
<class '__main__.myClass'>
>>> x = myClass()
>>> x
<__main__.myClass object at 0x109f2f610>
>>> x.name
'Oliver'
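To take the type() trick one step further, the attribute dictionary can hold functions as well as plain values, and they become methods. This is only a sketch: Person, Person2, and greet are hypothetical names chosen for illustration.

```python
# type(name, bases, attributes) builds a class without the class keyword.
def greet(self):
    return 'Hello, ' + self.name

# Person is created on the fly; greet becomes one of its methods
Person = type('Person', (object,), {'name': 'Oliver', 'greet': greet})

p = Person()
greeting = p.greet()

# The equivalent class written the usual way
class Person2(object):
    name = 'Oliver'

    def greet(self):
        return 'Hello, ' + self.name
```

Both spellings produce ordinary classes; the class keyword is just the more readable one.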

Conditional Logic

For example:
a = 0
b = 1

if a:
    print ('A')
elif b:
    print ('B')
else:
    print ('C')

# output is B
Python has no curly brackets { } to demarcate blocks of code. Instead, indentation serves this purpose. The conventional unit of indentation is 4 spaces, although you can use a different number of spaces as long as you're consistent within a block. Also note that, unlike in many programming languages, you don't need a semicolon at the end of a line, although you can still use one to put two statements on a single line:
>>> print('hello'); print('hello')
hello
hello
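The example at the top of this section works because 0 counts as false and 1 as true. Here is a small sketch of how that "truthiness" interacts with nested, indented blocks; describe is a hypothetical helper written only for this illustration:

```python
def describe(value):
    '''Return a label showing which branch ran'''
    if value:                          # 0, '', [] and None are all treated as false
        if isinstance(value, str):
            return 'non-empty string'  # nested block: indented one more level
        return 'truthy'
    elif value is None:
        return 'None'
    else:
        return 'falsy'

labels = [describe(v) for v in (0, 1, '', 'a', [], None)]
```

Each level of indentation marks one block, so the nested if belongs to the outer if, and the elif and else line up with it.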

Loops

Basic for loop to print 1 through 3:
>>> for i in [1, 2, 3]: print(i)
1
2
3
or:
>>> for i in range(1,3+1): print(i)
1
2
3
You can loop over more complicated data structures, like a list of tuples:
>>> for i in [('a', 1), ('c', 3), ('b', 2)]: print(i)
('a', 1)
('c', 3)
('b', 2)
If we loop with two variables, we get:
>>> for i,j in [('a', 1), ('c', 3), ('b', 2)]: print(i)
a
c
b
Or we can print out both:
for i,j in [('a', 1), ('c', 3), ('b', 2)]:
    print('i = ' + i + ', j = ' + str(j))
This yields:
i = a, j = 1
i = c, j = 3
i = b, j = 2
We get the same result if we have a dict z, such that:
>>> z = {'a': 1, 'c': 3, 'b': 2}
and our loop is:
>>> for i,j in z.items(): print('i = ' + i + ', j = ' + str(j))
And if we have two tuples, we can zip them together and get the same result again:
>>> x = ('a', 'c', 'b')
>>> y = (1, 3, 2)
>>> for i,j in zip(x,y): print('i = ' + i + ', j = ' + str(j))
Oftentimes, we want to loop through a list and print the index as well as the value. We can use the built-in function enumerate to accomplish this:
x = ['a', 'b', 'c']

for i,j in enumerate(x):
    print('index: ' + str(i) + ', element: ' + j)
This gives us:
index: 0, element: a
index: 1, element: b
index: 2, element: c
Read about zip, enumerate, and the other built-in functions in the official Python documentation. And remember, you can always get help in the Python shell:
>>> help(zip)
>>> help(enumerate)
Python has a syntactically compact way of doing loops called list comprehension, which we will see below.
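Two more things zip and enumerate can do, sketched with made-up variable names and written so that it runs under Python 3:

```python
names = ['a', 'c', 'b']
values = [1, 3, 2]

# zip() pairs elements up; dict() turns the pairs into key-value entries
lookup = dict(zip(names, values))

# enumerate() accepts a starting index, handy for 1-based numbering
numbered = []
for position, name in enumerate(names, 1):
    numbered.append(str(position) + ':' + name)

# zip() stops at the shorter input, so the extra 'z' is silently dropped
pairs = list(zip(['x', 'y', 'z'], [10, 20]))
```

The list() around the last zip is there because Python 3's zip returns a lazy iterator rather than a list.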

File I/O

Reading a file:
with open('myfile', 'r') as f:
    contents = f.read()
Reading a file line by line:
with open('myfile', 'r') as f:
    for line in f:
        print(line),
The with syntax takes care of closing the file object automatically. Note: in Python 2, the trailing comma suppresses the newline that print would otherwise append. In Python 3, write print(line, end='') instead.

Often, you want the file name to be passed in by the user:
import sys

with open(sys.argv[1], 'r') as f:
    for line in f:
        print(line),
Take input from stdin and write to a file:
import sys

with open('myfile', 'w') as f:
    for line in sys.stdin:
        f.write(line)
Read every row of stdin into a list:
import sys

# read all of stdin into a list of lines
contents = sys.stdin.read().split('\n')
Note we can save ourselves an indent by reading from and/or writing to multiple files at once like so:
with open('file1.txt', 'w') as f, open('file2.txt', 'w') as g:
    # do something ...
    f.write('write something\n')
    g.write('write something else\n')
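Here is a self-contained round trip (write, append, read back) that you can run without creating any files by hand. It is a sketch that uses a temporary directory; the file name is arbitrary:

```python
import os
import tempfile

# Work in a throwaway directory so the example leaves nothing behind
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'myfile.txt')

with open(path, 'w') as f:       # 'w' creates or overwrites the file
    f.write('first line\n')
    f.write('second line\n')

with open(path, 'a') as f:       # 'a' appends instead of overwriting
    f.write('third line\n')

# Strip the trailing newline from each line as we read
with open(path, 'r') as f:
    lines = [line.rstrip('\n') for line in f]

os.remove(path)
os.rmdir(tmpdir)
```

Note that write() does not add a newline for you, which is why each string ends in \n.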

Number Manipulation

Suppose:
>>> counter = 1
To increment:
>>> counter += 1 
>>> counter
2
Python has no ++ operator, so this is a syntax error:
counter++ # this doesn't exist

Integer operations return integers in Python 2, so note the difference between these two expressions:
>>> 1/2
0
>>> 1./2
0.5
python3
Note in Python 3, this behavior changes and 1/2 yields 0.5:
>>> 1/2
0.5
You can still get Python 2 style division with:
>>> 1//2
0
In both Python 2 and 3, use the ** operator to exponentiate:
>>> 3**2
9
>>> 3**3
27
Square root:
>>> 2**(1./2)
1.4142135623730951
Trigonometry (use radians):
>>> import math
>>> math.cos(0)
1.0
>>> math.sin(math.pi/2)
1.0
This uses the math module, which is built in to Python.

SciPy and NumPy are popular science and math libraries, which you have to install on your own. For example, to compute 52 choose 5, the number of 5 card combinations from a 52 card deck, it's:
>>> from scipy import special
>>> special.binom(52, 5)
2598960.0
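If SciPy isn't installed, the same number can be computed with the built-in math module. This is a sketch; choose is a hypothetical helper name, and the divmod and math.sqrt lines are extras I've added for illustration:

```python
import math

def choose(n, k):
    '''Number of ways to pick k items from n'''
    # // keeps the result an exact integer in both Python 2 and 3
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

hands = choose(52, 5)

# divmod() returns the integer quotient and the remainder together
quotient, remainder = divmod(7, 2)

# math.sqrt is an alternative to raising to the power 1/2
root = math.sqrt(2)
```

Unlike special.binom, which returns a float, this version returns an exact integer.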

String Manipulation

Split a string on tab, returning a list:
>>> a = 'hello\tkitty'
>>> print(a)
hello	kitty
>>> a.split('\t')
['hello', 'kitty']
Join a list on comma, returning a string:
>>> b = ['hello', 'kitty']
>>> ", ".join(b)
'hello, kitty'
Do a Perl-style chomp—i.e., strip a newline character off of the end of a string:
>>> c = 'hello\n'
>>> print(c)
hello

>>> print(c.rstrip('\n'))
hello
Define a multi-line string:
mystr='''This is
a multiline
string'''
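These methods are often chained together when parsing a line of text. A short sketch, where the record string is made up for illustration:

```python
record = '  gene1\t42\tchr1 \n'   # a made-up tab-delimited line

# strip() with no argument removes leading and trailing whitespace
fields = record.strip().split('\t')

# join is the inverse of split
csv_line = ','.join(fields)

# A few other common string methods
shout = fields[0].upper()
renamed = fields[2].replace('chr', 'chromosome ')
is_gene = fields[0].startswith('gene')
```

Strings are immutable, so each of these methods returns a new string rather than changing record.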

The .format() method

As we've already seen, one way to concatenate strings is to use a plus sign:
a = 'Hello '
b = 'kitty'
print(a + b)
# output is 'Hello kitty'
When you want to throw some variables into a string, a more professional way to do this is to take advantage of the string object's built-in format() method. E.g.:
>>> a = 'Hello'
>>> b = 'kitty'
>>> "{} {}".format(a, b)
'Hello kitty'
>>> "{0} {1}".format(a, b)
'Hello kitty'
>>> "{word1} {word2}".format(word1=a, word2=b)
'Hello kitty'
>>> "{word1} {word2}".format(word1='Hello', word2='kitty')
'Hello kitty'
If you have a dict object, you can use string's format() method like this:
>>> d = {'friend1': 'Kim', 'friend2': 'Joe'}
>>> "Hello {friend1}, Goodnight {friend2}".format(**d)
'Hello Kim, Goodnight Joe'
As this page says:
The special syntax ** before the dictionary indicates that the dictionary is not to be treated as a single actual parameter. Instead keyword arguments for all the entries in the dictionary effectively appear in its place.
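The format() method also controls how values are rendered, via a format spec after a colon inside the braces. A few specs I find useful, as a sketch (the values are arbitrary):

```python
# Format specs go after a colon inside the braces
pi_short = '{:.2f}'.format(3.14159)     # two decimal places
padded = '{:>6}'.format('cat')          # right-align in a field of width 6
zero_filled = '{:03d}'.format(7)        # pad an integer with zeros to width 3
thousands = '{:,}'.format(2598960)      # comma as thousands separator

# Positional arguments can be reused
echo = '{0}, {0}, {1}'.format('hello', 'kitty')
```

These work the same way in Python 2.7 and Python 3.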

Unicode

In Python 2, ASCII is the default character encoding. If you want to use the richer Unicode character set, you have to prefix your string with a "u". Let's try printing the Chinese and Japanese character for "cat", which is 猫:
>>> print('cat = \u732B')
cat = \u732B
>>> print(u'cat = \u732B')
cat = 猫
python3
In Python 3, however, strings are Unicode by default:
>>> print('cat = \u732B')
cat = 猫
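It helps to keep Unicode (the character set) separate from UTF-8 (one way of encoding those characters as bytes). A minimal sketch, assuming Python 3, where str holds text and bytes holds raw bytes:

```python
cat = '\u732B'

encoded = cat.encode('utf-8')       # text -> bytes
decoded = encoded.decode('utf-8')   # bytes -> text

n_chars = len(cat)       # one character ...
n_bytes = len(encoded)   # ... stored as three bytes in UTF-8
```

You typically decode bytes when reading from a file or network and encode when writing back out.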

List Operations

extend and append

extend and append are two methods to add elements to your list. Suppose we have a list x such that:
>>> x = ['a', 'b', 'c']
If we're adding a single-character string to the list, these two methods do the same thing:
>>> x.extend('d')
>>> x
['a', 'b', 'c', 'd']
>>> x.append('e')
>>> x
['a', 'b', 'c', 'd', 'e']
However, passing a list as an argument reveals the difference between the two. extend merges while append appends:
>>> x.extend(['f', 'g'])
>>> x
['a', 'b', 'c', 'd', 'e', 'f', 'g']
>>> x.append(['h', 'i'])
>>> x
['a', 'b', 'c', 'd', 'e', 'f', 'g', ['h', 'i']]
pop() removes and returns the last element of the list:
>>> x.pop()
['h', 'i']
>>> x
['a', 'b', 'c', 'd', 'e', 'f', 'g']
You can also pass pop an index:
>>> x
['a', 'b', 'c', 'd', 'e', 'f', 'g']
>>> x.pop(0)
'a'
>>> x
['b', 'c', 'd', 'e', 'f', 'g']
In addition to extend, you can join two lists by using +:
>>> ['x', 'y'] + ['q', 'r']
['x', 'y', 'q', 'r']
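One caveat about extend: it iterates over its argument, so a string longer than one character gets split into its characters. A short sketch that also shows insert and remove, two list methods not covered above:

```python
x = ['a', 'b', 'c']

# extend() iterates over its argument, so 'de' is added as 'd' and 'e'
x.extend('de')
after_extend = list(x)

# append() always adds its argument as a single element
x.append('fg')
after_append = list(x)

# insert() adds at a given index; remove() deletes the first matching value
x.insert(0, 'start')
x.remove('fg')
```

When in doubt, use append for a single item and extend for a collection of items.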

range

In Python 2, range is a function that returns a list:
>>> range(1,11)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
xrange constructs an xrange object that behaves similarly but avoids storing a list in memory. These are identical:
for i in range(1,11):
    print(i)
for i in xrange(1,11):
    print(i)
though the latter is more efficient if the range is large.

python3
In Python 3, range returns an iterable range object, not a list. The benefit is that, like Python 2 xrange (which doesn't exist in Python 3), it does not need to create a list in memory and can generate the next needed value for your iteration on the fly.
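A sketch of what the Python 3 range object can do (written for Python 3; the variable names are mine):

```python
r = range(1, 11)

as_list = list(r)                   # materialize the values when a real list is needed
evens = list(range(0, 10, 2))       # the third argument is the step
countdown = list(range(3, 0, -1))   # a negative step counts down

# A range knows its length and supports membership tests without building a list
size = len(range(1000000))
has_five = 5 in r
```

Because nothing is stored, range(1000000) costs about the same memory as range(10).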

Copying a List

Copy the list x into the new variable y:
>>> y = list(x)
>>> y
['b', 'c', 'd', 'e', 'f', 'g']
The following does not make a fresh copy of x:
>>> z = x
Observe:
>>> x
['b', 'c', 'd', 'e', 'f', 'g']
>>> z = x
>>> z[0] = 5
>>> z
[5, 'c', 'd', 'e', 'f', 'g']
>>> x
[5, 'c', 'd', 'e', 'f', 'g']
We see that changing z also changes x. What's going on? There's a good explanation here:
In Python variables are just tags attached to objects ... If we do: b = a. We didn’t copy the list referenced by a. We just created a new tag b and attached it to the list pointed [to] by a.
We can understand this better if we use the id function:
>>> y = list(x) 
>>> z = x
>>> id(x)
4462945920
>>> id(z) # z has the same id as x
4462945920
>>> id(y) # y doesn't because it's a new copy
4462979192
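One further wrinkle: list(x) makes what's called a shallow copy, which matters when the list contains other lists. The sketch below uses the standard copy module to contrast the two kinds of copy; the nested list is made up for illustration:

```python
import copy

x = [['a', 'b'], ['c', 'd']]   # a list containing lists

shallow = list(x)              # new outer list, but the inner lists are shared
deep = copy.deepcopy(x)        # new outer list and new inner lists

x[0][0] = 'z'                  # change an inner list through x

shallow_sees = shallow[0][0]   # the shallow copy sees the change
deep_sees = deep[0][0]         # the deep copy does not

# Slicing the whole list is another common way to make a shallow copy
also_shallow = x[:]
```

For flat lists of numbers or strings, a shallow copy is all you need.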

Example 1

Convert a list of strings into a list of indices:
>>> x = ['a', 'b', 'c']
>>> x = range(len(x))
>>> x
[0, 1, 2]

Example 2

Let's suppose we have a string x such that:
>>> print(x)
0       2
0.1     1
0.2     0
0.3     0
0.4     0
0.5     0
0.6     0
0.7     0
0.8     0
0.9     0
1       0
How can we get each column into a separate list? Python's split works on any whitespace so it evaporates both tabs and newlines:
>>> print(x.split())
['0', '2', '0.1', '1', '0.2', '0', '0.3', '0', '0.4', '0', '0.5', '0', '0.6', \
 '0', '0.7', '0', '0.8', '0', '0.9', '0', '1', '0']
As described in this stackoverflow post, the next step falls under the rubric of list slicing, according to the syntax:
some_list[start:stop:step]
As we've seen, if we leave out the stop part, the range defaults to the end:
>>> print(x.split()[0::2])
['0', '0.1', '0.2', '0.3', '0.4', '0.5', '0.6', '0.7', '0.8', '0.9', '1']
>>> print(x.split()[1::2])
['2', '1', '0', '0', '0', '0', '0', '0', '0', '0', '0']
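An alternative to slicing, sketched here on a shortened version of the same data, is to split the string into rows first and then transpose the rows with zip. This is just another way to get the columns, not necessarily a better one:

```python
# A shortened version of the two-column string, with tabs between columns
x = '0\t2\n0.1\t1\n0.2\t0\n'

# Split into rows, then split each row into fields
rows = [line.split() for line in x.splitlines()]

# zip(*rows) transposes rows into columns
col1, col2 = zip(*rows)

# Convert the strings to numbers
col1 = [float(v) for v in col1]
col2 = [int(v) for v in col2]
```

The advantage of this approach is that it generalizes to any number of columns.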

Dict Operations

Declare an empty dict:
d = {}
Now let's look at some simple operations. Let:
d = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
Basic dict operations we've already seen:
>>> for key in d: print(key)
a
c
b
d
>>> d.keys()
['a', 'c', 'b', 'd']
>>> d.values()
[1, 3, 2, 4]
>>> d.items()
[('a', 1), ('c', 3), ('b', 2), ('d', 4)]
>>> for key,value in d.items(): print(key, value)
('a', 1)
('c', 3)
('b', 2)
('d', 4)
python3
In Python 3, the dict methods keys(), values(), and items() don't return list objects but instead—for the purposes of efficiency—return iterable "view objects". This change does not concern us much because we can still iterate over them and cast them as lists if necessary (read more: What are Python dictionary view objects?).

In both Python 2 and 3, to get a list of sorted keys:
>>> d
{'a': 1, 'c': 3, 'b': 2, 'd': 4}
>>> sorted(d)
['a', 'b', 'c', 'd']
Check for the existence of a key:
if 'b' in d:
    print(d['b'])
else:
    print('not found')

# output is: 2
Python throws an error if the key doesn't exist:
>>> d['e']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'e'
A better and more succinct style to prevent this sort of key-not-found error is to use the dict object's get() method:
>>> d.get('b', 'notfound')
2
>>> d.get('e', 'notfound')
'notfound'
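The get() method, along with its cousin setdefault(), is handy for building up a dict as you go. A sketch with a made-up list of words:

```python
words = ['a', 'b', 'a', 'c', 'a', 'b']

# Count occurrences: get() supplies 0 the first time a key is seen
counts = {}
for w in words:
    counts[w] = counts.get(w, 0) + 1

# Group positions by word: setdefault() inserts an empty list if the key is missing
positions = {}
for i, w in enumerate(words):
    positions.setdefault(w, []).append(i)

# Sort the keys by their counts, largest first
by_count = sorted(counts, key=counts.get, reverse=True)
```

Passing counts.get as the sort key is another instance of functions being ordinary objects.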

Multi-dimensional Dicts

In Python, you can create a multi-dimensional or multi-tiered dict which, as you can read on StackOverflow, is "a dictionary where the values are themselves also dictionaries." The best way to do this is to use defaultdict from Python's collections. Suppose we have a text file, testfile.txt:
1       2       3
234     dfg     wre
x4      few     4k
Our goal is to slurp this up into a dictionary such that the first two columns represent keys and the last one is the value, e.g.,
d['1']['2'] = '3'
for the first row and so on.

Observe the following script, testme.py, first using a regular old dict:
#!/usr/bin/env python

import sys

d = {}

with open(sys.argv[1], 'r') as f:
    for line in f:
        (c1, c2, c3) = line.split()
        d[c1][c2] = c3

print(d)
Running this script yields an error:
$ ./testme.py testfile.txt
Traceback (most recent call last):
  File "./testme.py", line 12, in <module>
    d[c1][c2] = c3
KeyError: '1'
because Python is upset we haven't initialized the multi-dimensional dict properly. Now let's use a defaultdict instead of an ordinary one:
#!/usr/bin/env python

import sys
from collections import defaultdict

d = defaultdict(dict)

with open(sys.argv[1], 'r') as f:
    for line in f:
        (c1, c2, c3) = line.split()
        d[c1][c2] = c3

print(d)
Now it works like a charm:
$ ./testme.py testfile.txt
defaultdict(<type 'dict'>, {'1': {'2': '3'}, '234': {'dfg': 'wre'}, 'x4': {'few': '4k'}})
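Here is the same idea without the file, so you can paste it straight into the shell. It's a sketch: the three rows are inlined as a string, and the defaultdict(int) counter at the end is an extra I've added to show another common use:

```python
from collections import defaultdict

# The same three rows as testfile.txt, inlined so the sketch needs no file
text = '1 2 3\n234 dfg wre\nx4 few 4k\n'

d = defaultdict(dict)
for line in text.splitlines():
    c1, c2, c3 = line.split()
    d[c1][c2] = c3      # d[c1] is created as {} on first access

# defaultdict(int) makes a counter: missing keys start at 0
tally = defaultdict(int)
for ch in 'banana':
    tally[ch] += 1

plain = dict(d)         # convert back to an ordinary dict if needed
```

The argument to defaultdict is a function that is called with no arguments to produce the default value, which is why dict and int both work.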

Defining Functions in Python

To define a function use the keyword def, as in:
def myfunction():
    '''This function prints hello world'''
    print('hello world')
The first line of the function is a string which seems to be just floating there. This is called a docstring, and the Python website explains it here:
A docstring is a string literal that occurs as the first statement in a module, function, class, or method definition. Such a docstring becomes the __doc__ special attribute of that object.
So if we call the help function, we get:
>>> help(myfunction)
Help on function myfunction in module __main__:

myfunction()
    This function prints hello world
We'll also see this string if we examine the attribute myfunction.__doc__. While you don't have to write a docstring, it's best to use one in the interest of having well-documented code.

Functions, of course, can return a value as well as just doing something:
>>> def myfunction(x): return x + 1
>>> j = myfunction(3)
>>> j
4
Suppose we define a function to print the mean and variance of a list in a script called example.py:
import numpy as np

def print_basic_stats(x):
    '''Print the mean and variance of a list of numbers'''

    print('The mean is ' + str(np.mean(x)))
    print('The variance is ' + str(np.var(x)))
(numpy is a package with math and science functions.) We can access this function on the Python command line as follows:
>>> import example
>>> example.print_basic_stats([2,3,4])
The mean is 3.0
The variance is 0.666666666667
We can access it in another script, say example2.py, in much the same way:
import example

example.print_basic_stats([2,3,4])
There's one wrinkle here. I'm tacitly assuming that I opened up the Python interpreter in the directory where example.py resides. If I don't, I'll get a "No module named example" error. We'll have a similar problem if example2.py and example.py are in different directories. How can we ensure Python finds our example.py script? To answer this question, we have to know all the paths Python searches. We can get this information using the sys module:
>>> import sys
>>> sys.path
If you execute this command, you'll see a list of various directories. As the docs say:
[sys.path is a] list of strings that specifies the search path for modules. Initialized from the environment variable PYTHONPATH, plus an installation-dependent default.
So, if example.py is in the directory /some/path/python_examples, we can add this to Python's search path in example2.py as follows:
import sys

sys.path.append('/some/path/python_examples')

import example

example.print_basic_stats([2,3,4])
Now import example will work no matter which directory we run example2.py from. As an aside, you'll notice that after you do this import a file called example.pyc will be created in the directory where example.py resides (Python 3 puts it in a __pycache__ subdirectory instead). This is your program compiled into bytecode and you can read about it on this Stackoverflow link.
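The introduction mentioned that a module can double as a stand-alone script. The usual idiom is a check on the special variable __name__. Below is a sketch of a numpy-free variant of example.py; the file name example_stats.py and the function names are hypothetical:

```python
# example_stats.py (a hypothetical file name)

def mean(x):
    '''Return the arithmetic mean of a list of numbers'''
    return sum(x) / float(len(x))

def variance(x):
    '''Return the population variance (what np.var computes by default)'''
    m = mean(x)
    return sum((v - m) ** 2 for v in x) / float(len(x))

# This block runs only when the file is executed directly,
# not when it is imported from another script or from the shell
if __name__ == '__main__':
    print('The mean is ' + str(mean([2, 3, 4])))
    print('The variance is ' + str(variance([2, 3, 4])))
```

Running python example_stats.py would print the two statistics, while import example_stats would only define the functions.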

List Comprehension, Anonymous (Lambda) Functions, Map

List comprehension is a quick, Pythonic way to manipulate a list without invoking the full machinery of a for loop. Suppose we want to square the elements of the list a and save the result in another list, b. With a traditional for loop, that's:
>>> a = [1, 2, 3]
>>> b = []
>>> for i in a: b.append(i*i)
>>> b
[1, 4, 9]
With list comprehension, we can do it like this:
>>> a = [1, 2, 3]
>>> b = [i*i for i in a]
>>> b
[1, 4, 9]
You can read this as the list of i*i elements produced by the iteration:
for i in a
Another way to manipulate lists in Python is with the map function. In the last section, we saw how to define functions. Let's make a function to square a number:
>>> def squarefunction(x): return x*x
>>> squarefunction(2)
4
>>> squarefunction(3)
9
We can define a function without explicitly giving it a name using a lambda function. This is also known as an anonymous function. The simple square function is:
>>> lambda x: x*x
<function <lambda> at 0x108bb1d70>
What's neat is that we can save this function in a variable (again, without ever having given it a name):
>>> y = lambda x: x*x
>>> y(2)
4
>>> y(3)
9
The punchline is that these are all equivalent ways to square each element of our list:
>>> map(y, [1, 2, 3])
[1, 4, 9]
>>> map(lambda x: x*x, [1, 2, 3])
[1, 4, 9]
>>> map(squarefunction, [1, 2, 3])
[1, 4, 9]
>>> [i*i for i in [1, 2, 3]]
[1, 4, 9]
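A few variations on the same theme, sketched for Python 3 (the variable names are mine). Note that in Python 3, map returns a lazy iterator rather than a list, so the outputs above require wrapping the call in list():

```python
a = [1, 2, 3, 4, 5, 6]

# A condition at the end filters the elements
even_squares = [i * i for i in a if i % 2 == 0]

# The same idea with curly braces builds a dict or a set
square_of = {i: i * i for i in a}
remainders = {i % 3 for i in a}

# In Python 3, wrap map in list() to see the values
mapped = list(map(lambda v: v * v, a))
```

Dict and set comprehensions are available in Python 2.7 and later.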
Question: How would you produce the following string with list comprehension:
my/path/1 my/path/2 my/path/3 my/path/4 my/path/5
?

Answer:
" ".join(["my/path/" + str(j) for j in range(1,6)])

Example 1

Here's an example of creating a subset list according to whether or not the original list's elements contain some string:
mystr = "1,2:3,2:4"
# create a list from our string:
mylist = mystr.split(",")
# mylist is ['1', '2:3', '2:4']
Now suppose we want our new list to contain only elements of the original list which contain a colon:
mylist_subset = [s for s in mylist if ":" in s]
# mylist_subset is ['2:3', '2:4']

Example 2

Here's another example combining list comprehension and lambda functions, courtesy of my friend Ohad:
>>> from numpy import log2  # scipy.log2 was a re-export of numpy.log2, since deprecated in SciPy
>>> H = lambda x: [p*log2(p) for p in x if p>0]
>>> H([1, 2, 3])
[0.0, 2.0, 4.7548875021634682]
>>> H2 = lambda x: -sum([p*log2(p) for p in x if p>0])
>>> H2([1, 2, 3])
-6.7548875021634682

Regex

Regex (regular expression) reminder, which I stole somewhere off the internet:
# \d [0-9] Any digit
# \D [^0-9] Any character not a digit
# \w [0-9a-zA-Z_] Any "word character"
# \W [^0-9a-zA-Z_] Any character not a word character
# \s [ \t\n\r\f] whitespace (space, tab, newline, carriage return, form feed)
# \S [^ \t\n\r\f] Any non-whitespace character

# *      Match 0 or more times
# +      Match 1 or more times
# ?      Match 1 or 0 times
# {n}    Match exactly n times
# {n,}   Match at least n times
# {n,m}  Match at least n but not more than m times
Python can capture parts of a match in named groups. For example:
(?P<my_variable>\w+)
stores whatever it matches (a string of "word" characters of at least length 1) in a group named my_variable, which you can retrieve with match.group('my_variable'). re is the module that deals with regular expression operations in Python.

Example 1 (bioinformatics): grabbing sub-strings out of a string:
import re

line='gene_id "XLOC_033544"; transcript_id "TCONS_00092538";'

match = re.search(r'gene_id "(\S+)"; transcript_id "(\S+)";', line)
geneid = match.group(1)
tranid = match.group(2)

print(geneid, tranid)

# output is: ('XLOC_033544', 'TCONS_00092538')
Example 2 (bioinformatics): printing elements of a certain pattern:
>>> import re
>>> mystr="P1_F=44;P2_F=42;;INDEl;i=xyz;true="
>>> for i in mystr.split(";"):
...     if re.search(r'(\w+)=(\w+)', i): print(i)
...
P1_F=44
P2_F=42
i=xyz
This prints the semi-colon delimited elements that fit the pattern blob=blob.
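The named-group syntax (?P<name>...) mentioned at the top of this section can be applied to the line from Example 1. A sketch; the group names gene and transcript are my own choices:

```python
import re

line = 'gene_id "XLOC_033544"; transcript_id "TCONS_00092538";'

# (?P<name>...) labels a capture group so it can be fetched by name
pattern = r'gene_id "(?P<gene>\S+)"; transcript_id "(?P<transcript>\S+)";'
match = re.search(pattern, line)

gene = match.group('gene')
transcript = match.group('transcript')
as_dict = match.groupdict()     # all named groups at once

# re.search returns None when nothing matches, so test before calling group()
no_match = re.search(pattern, 'unrelated text')
```

Named groups make the code easier to read than numbered ones once a pattern has more than two or three groups.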

OOP Python (Object Oriented Programming in Python)

As we've seen, Python is all about object-oriented-ness. For example:
x = 5
instantiates x as an instance of the int class, and we can see all its attributes by calling dir(x).

Let's create our own super-simple "Circle" class—representing a circle—to see how Python's OOP machinery works. We'll put it in a file called cir.py:
#!/usr/bin/env python

import math

class Circle:
    '''A Circle Object'''

    def __init__(self, myradius):
        self.radius = myradius

    def getradius(self):
        return self.radius

    def getcircumference(self):
        return 2*math.pi*self.radius

    def getarea(self):
        return math.pi*self.radius*self.radius

    def setradius(self, r):
        self.radius = r
        print("You've set the radius to " + str(r))
Assuming cir.py is in Python's search path, we run:
>>> from cir import Circle
to import our class into the python shell. Now that we have access to our Circle class, we can make a Circle object:
>>> c = Circle(1.0)
We can call various get and set methods on our object:
>>> c.getarea()
3.141592653589793
>>> c.setradius(4.0)
You've set the radius to 4.0
>>> c.getarea()
50.26548245743669
We can print out the object, as is:
>>> print(c)
<cir.Circle instance at 0x106b1bc20>
And we can see all the attributes and methods of an object of type Circle using the built-in dir() function:
>>> dir(c)
['__doc__', '__init__', '__module__', 'getarea', 'getcircumference', 'getradius', 'radius', 'setradius']
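
One common refinement, sketched here and not part of cir.py, is to give the class a __str__ method so that print shows something friendlier than a memory address. The class name PrintableCircle is made up for this illustration:

```python
import math

class PrintableCircle(object):
    '''A circle that knows how to describe itself (illustrative only)'''

    def __init__(self, myradius):
        self.radius = myradius

    def getarea(self):
        return math.pi * self.radius * self.radius

    def __str__(self):
        # print() and str() call this method behind the scenes
        return "Circle of radius " + str(self.radius)

c = PrintableCircle(2.0)
print(c)
```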

Plotting with matplotlib

Matplotlib is an excellent tool for plotting. To use it, import pylab:
import pylab

pylab.xlabel('x ax')
pylab.ylabel('y ax')
pylab.title('My Plot')
pylab.plot([1,2,3,4,5,6],[2,4,7,3,0,2])
pylab.savefig("/my/path/test.png")
Produces:

image

Importing Modules in Python and the Namespace

What's the difference between:
import pylab
and:
from pylab import *
?

The latter floods every function from pylab into the namespace, so we can just type:
plot([1,2,3,4,5,6],[2,4,7,3,0,2])
while the former requires us to use:
pylab.plot([1,2,3,4,5,6],[2,4,7,3,0,2])
Needless to say, the plain import pylab is the safer choice because it won't risk collisions with homemade functions we've created. If we just want to use a specific function from a module, we can import it as:
from pylab import plot
This allows us to call plot rather than pylab.plot. Still, I prefer the verbose way because it makes your code more readable.
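
To see the kind of collision at stake, here is a small sketch that uses the standard math module instead of pylab (an arbitrary substitution so that it runs without matplotlib). A homemade function named sqrt is silently replaced when the same name is imported; a wildcard import does the same thing for every name in the module at once:

```python
def sqrt(x):
    '''homemade function that happens to share a name with math.sqrt'''
    return "my sqrt of " + str(x)

before = sqrt(4)         # calls our function

from math import sqrt    # rebinds the name sqrt, silently replacing our function
after = sqrt(4)          # now calls math.sqrt

import math
qualified = math.sqrt(4) # the verbose form leaves no ambiguity

print(before)
print(after)
print(qualified)
```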

__name__

__name__ is a special variable in Python. Consider this example from the docs:
def main():
    print('Running test...')

if __name__ == '__main__':
    main()
The function main() will only run if we execute this script from the command line. If we import it into another script, it won't. This gives us the convenient ability to use a script both as a stand-alone and as an imported module.
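
The effect can be demonstrated without leaving the interpreter. The sketch below writes a throwaway script (the file name demo_script.py and the ran_main flag are invented for this example) and executes it twice with runpy.run_path from the standard library, once pretending it was launched from the command line and once pretending it was imported:

```python
import os
import runpy
import tempfile

# a tiny script following the pattern above; it records whether main() ran
source = (
    "ran_main = False\n"
    "def main():\n"
    "    global ran_main\n"
    "    ran_main = True\n"
    "if __name__ == '__main__':\n"
    "    main()\n"
)
path = os.path.join(tempfile.mkdtemp(), "demo_script.py")
with open(path, "w") as f:
    f.write(source)

# run_name sets the value the script sees in __name__;
# run_path returns the script's global variables as a dict
as_script = runpy.run_path(path, run_name="__main__")
as_module = runpy.run_path(path, run_name="demo_script")

print(as_script["ran_main"])
print(as_module["ran_main"])
```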

Reading Arguments

Let myscript.py be:
#!/usr/bin/env python

import sys

print(sys.argv[0])
print(sys.argv[1])
then:
$ ./myscript.py test
./myscript.py
test
sys.argv[0] is the name of the script itself; sys.argv[1] is the first user-passed argument, and so on.

Exit script if no arguments:
if (len(sys.argv) == 1):
    sys.exit(0)

Example Reading Arguments: Using argparse

The argparse module provides a convenient way to get arguments into your Python script. Here's the syntax, following the docs' example:
#!/usr/bin/env python

import argparse

# -------------------------------------

def main():
    '''Main block'''

    args = get_arg()

    if args.verbose:
        print("verbosity turned on")
    if args.vcf:
        print(args.vcf)
    if args.sample:
        print(args.sample)

def get_arg():
    '''Get Arguments'''

    # http://docs.python.org/2/howto/argparse.html
    parser = argparse.ArgumentParser(description="run pipeline")
    parser.add_argument("-v", "--verbose", action="store_true", help="verbose mode")
    parser.add_argument("-f", "--vcf", help="vcf input file")
    parser.add_argument("-s", "--sample", type=int, help="sample index")
    args = parser.parse_args()

    return args

# -------------------------------------

if __name__ == "__main__":

    main()
Another nice feature of argparse lets you take input either from a file named as an argument or piped in from stdin (this line also needs import sys):
parser.add_argument("-i", "--input", type=argparse.FileType('r'), default=sys.stdin, help="input file")
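
A short sketch of how that option behaves. The function takes an optional argv list, which is my addition so the parser can be exercised without a real command line; passing None falls back to sys.argv as usual:

```python
import argparse
import sys

def get_arg(argv=None):
    '''Parse arguments; argv=None means "use the real command line"'''
    parser = argparse.ArgumentParser(description="read from a file or stdin")
    parser.add_argument("-i", "--input", type=argparse.FileType('r'),
                        default=sys.stdin, help="input file")
    return parser.parse_args(argv)

# with no -i flag, args.input is simply sys.stdin
args = get_arg([])
print(args.input is sys.stdin)
```

Either way args.input is an open file object, so the rest of the script can loop over its lines without caring where they came from.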

System Commands

Run a system command:
import subprocess

cmd="ls " + "../"

proc = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

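# it's already started running with the Popen call;
# communicate() reads stdout and stderr and waits for the command to finish
# (calling wait() first can deadlock if the command fills a pipe buffer)
(out, err) = proc.communicate()

# print return code
print(proc.returncode)

# print stdout stderr tuple
print((out, err))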
For example:
>>> import subprocess
>>> cmd="ls nonexistentfile"
>>> proc = subprocess.Popen(cmd, shell=True, 
...     stdout=subprocess.PIPE, stderr=subprocess.PIPE)
>>> proc.wait()
2
>>> print(proc.returncode)
2
>>> proc.communicate()
('', 'ls: cannot access nonexistentfile: No such file or directory\n')
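
A variation worth knowing: passing the command as a list, without shell=True, avoids quoting problems when arguments contain spaces or come from user input. In this sketch the child command is the Python interpreter itself (sys.executable), chosen only so the example doesn't depend on a particular Unix command being installed:

```python
import subprocess
import sys

# each list element is passed to the program as one argument, with no shell parsing
cmd = [sys.executable, "-c", "print('hello from a child process')"]

proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

# communicate() collects both streams and waits for the command to exit
out, err = proc.communicate()

print(proc.returncode)
print(out.decode().rstrip())    # in Python 3 the output arrives as bytes
```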

Making your Own Python Packages

Let's say we want to make a package to run system commands. We'll make a directory:
$ tree runsys/
runsys/
|-- __init__.py
|-- __init__.pyc
|-- runsys.py
`-- runsys.pyc
where runsys/runsys.py is:
#!/usr/bin/env python

import subprocess

def run_cmd(cmd, bool_verbose, bool_getstdout):
    """Run system cmd"""

    # echo command
    if (bool_verbose):
        print(cmd)

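    proc = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # communicate() waits for the command to finish and collects its output
    # (wait() alone can deadlock if the command fills a pipe buffer)
    (out, err) = proc.communicate()

    # return stdout
    if (bool_getstdout):
        return out.rstrip()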
Now in some other Python script we could do this:
from runsys.runsys import run_cmd

cmd="mkdir -p tmp"
run_cmd(cmd, 1, 0)

Useful Commands from the os Module

The os module gives you the functionality of some basic Unix commands in Python. Get it:
import os
Check for the existence of myfile:
os.path.isfile("myfile")
Get the abs path of myfile:
os.path.abspath("myfile")
Check if myfile is non-empty (size greater than zero bytes):
if ( os.path.getsize("myfile") > 0 ):
    ...
Get the cwd:
cwdir = os.getcwd()
Get the directory where your script itself resides:
script_dir = os.path.dirname(__file__)
If we want to make sure we get an absolute file path which reads through symbolic links, we could even do this:
script_dir = os.path.dirname(os.path.realpath(__file__))
Get the name of your script:
os.path.basename(__file__)
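
To try several of these calls together, here is a sketch that creates a scratch file in a temporary directory first, so it runs anywhere. The file names scratch.txt and missing.txt are arbitrary:

```python
import os
import tempfile

# make a scratch directory and file so the calls have something to inspect
tmpdir = tempfile.mkdtemp()
myfile = os.path.join(tmpdir, "scratch.txt")
with open(myfile, "w") as f:
    f.write("some text\n")

print(os.path.isfile(myfile))               # the file exists
print(os.path.getsize(myfile) > 0)          # and it is not empty
print(os.path.basename(myfile))             # file name without the directory
print(os.path.dirname(myfile) == tmpdir)    # directory without the file name
print(os.path.isfile(os.path.join(tmpdir, "missing.txt")))
```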

Hooking Python up to an Sqlite Database

Import the sqlite library:
import sqlite3
See the docs. For an example script, see Wiki: MySQL and SQLite.
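
As a quick taste, here is a minimal sketch that uses an in-memory database, so nothing is written to disk. The table and column names are invented for the example:

```python
import sqlite3

# ":memory:" creates a throwaway database in RAM instead of a file on disk
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# the table and column names here are made up
cur.execute("CREATE TABLE genes (gene_id TEXT, length INTEGER)")

# use ? placeholders rather than building SQL strings by hand
rows = [("XLOC_033544", 1200), ("XLOC_000001", 450)]
cur.executemany("INSERT INTO genes VALUES (?, ?)", rows)
conn.commit()

cur.execute("SELECT gene_id, length FROM genes WHERE length > ? ORDER BY length", (500,))
result = cur.fetchall()
print(result)

conn.close()
```

To keep the data between runs, pass a file path to sqlite3.connect instead of ":memory:".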

Installing Python Packages with Pip

You can use pip to install Python modules. First, you need to get pip itself. Install it, as described in the docs (assuming root access):
$ wget https://bootstrap.pypa.io/get-pip.py
$ sudo python get-pip.py
To install packages, the syntax is simple:
$ sudo pip install django
$ sudo pip install numpy
If you don't have root access, you can install the modules locally with the --user flag. For example, to use pylab, install matplotlib:
$ pip install --user matplotlib
This installs stuff in the directory:
$HOME/.local
See all of your installed modules:
$ pip freeze 

virtualenv

From 100 Useful Unix Commands: virtualenv is a command line tool to keep a series of packages isolated in a virtual environment.

Suppose you're working on a number of Python projects. One project has a number of dependencies and you've used pip to install them. Another project has a different set of dependencies, and so on. You could install all of your Python modules in your global copy of Python, but that could get messy. It would be nice if you could associate your dependencies with your particular project. This would also ensure that if two projects have conflicting dependencies (say they depend on different versions of the same module) you can get away with it. Moreover, it would allow you to freely install or update modules in your global Python worry-free, since this won't interfere with your projects. This is what virtualenv does and why it's a boon to Python users.

Following the docs, first install it:
$ sudo pip install virtualenv
To make a new Python installation in a folder called venv, run:
$ virtualenv venv
To emphasize the point, this is like a new copy of Python. To use this Python, type:
$ source venv/bin/activate
As a sanity check, examine which Python you're using:
(venv) $ which python
/some/path/venv/bin/python
It's virtualenv's copy! Now if you, say, install Django:
(venv) $ pip install Django
You can see that you only have the Django module (and wheel):
(venv) $ pip freeze
Django==1.8.7
wheel==0.24.0
Django's source code is going to be installed in a path such as:
venv/lib/python2.7/site-packages
It's common practice to record your project's Python dependencies in a requirements.txt file:
(venv) $ pip freeze > requirements.txt
Then somebody who, say, clones your project on GitHub can simply run:
$ virtualenv venv
$ source venv/bin/activate
(venv) $ pip install -r requirements.txt
to get the dependencies.

If you were doing a Django project, every time you wanted to start coding, the first order of business would be to turn on virtualenv and the last would be to turn it off. To exit virtualenv, type:
(venv) $ deactivate
python3

If you're using Python 3, this Stack Overflow post tells us how to get a python3 virtual environment:
$ sudo pip install --upgrade virtualenv # ensure virtualenv is up to date
$ virtualenv -p python3 venv

Simple Python CGI Script

Print out environment variables:
#!/usr/bin/env python

import os
print("Content-Type: text/html\n")
print("<html>")
print("hello world")
print("<br>")
print("<br>")

keys = sorted(os.environ.keys())
for k in keys:
    print(k)
    print(" ")
    print(os.environ[k])
    print("<br>")

print("</html>")
In the web browser, this will output stuff like:
hello world

ADDRFAM inet 
CONTENT_LENGTH 
CONTENT_TYPE 
DAEMON /usr/bin/uwsgi 
DOCUMENT_ROOT /path/to/html/root
GATEWAY_INTERFACE CGI/1.1 
HTTP_ACCEPT text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 
HTTP_ACCEPT_ENCODING gzip, deflate 
HTTP_ACCEPT_LANGUAGE en-us 
HTTP_CONNECTION keep-alive 
HTTP_COOKIE __unam=ac294eb-13ee2414bd8-5d900a97-6;
HTTP_HOST myuniversity.edu 
HTTP_USER_AGENT Mozilla/5.0 (Macintosh; Intel Mac OS X 10_5_8) \
 AppleWebKit/534.50.2 (KHTML, like Gecko) Version/5.0.6 Safari/533.22.3 
HTTP_X_FORWARDED_FOR 156.145.29.38 
PATH /usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin 
PIDFILE /var/run/uwsgi.pid 
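
One caveat with a page like this: environment values such as HTTP_USER_AGENT come from the client, so it's safer to escape them before putting them into HTML. Below is a sketch of the same script restructured around a helper function; the name env_to_html is hypothetical, and the escape function lives in html in Python 3 and in cgi in Python 2:

```python
import os

try:
    from html import escape      # Python 3
except ImportError:
    from cgi import escape       # Python 2

def env_to_html(env):
    '''Return an HTML page listing the key/value pairs in env (hypothetical helper)'''
    lines = ["<html>", "hello world", "<br>", "<br>"]
    for k in sorted(env):
        # escape <, > and & so that values cannot inject markup into the page
        lines.append(escape(k) + " " + escape(env[k]) + "<br>")
    lines.append("</html>")
    return "\n".join(lines)

print("Content-Type: text/html\n")
print(env_to_html(os.environ))
```

Building the page as a string also makes the function easy to test with an ordinary dict in place of os.environ.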

The IPython Shell and IPython Notebook GUI

IPython is an awesome program, which provides a much better Python shell with stuff like auto-complete, nice help, easy functionality for system commands, and more.

To invoke the IPython shell:
$ ipython
One way to get IPython is in the Anaconda Python Distribution which is described on the Anaconda website as a "Completely free enterprise-ready Python distribution for large-scale data processing, predictive analytics, and scientific computing."

Another nice IPython feature is the ability to run a Python shell in your web browser:
$ ipython notebook
Read about it here:

Example Problem: Re-format a Text File of Data

Here's the problem, borrowing from The Unix Intro. Take a file, example_data.txt, that looks like this:
,height,weight,salary,age
1,106,111,111300,62
2,124,91,79740,40
3,127,176,15500,46
And make it look like this:
1       height  106
2       height  124
3       height  127
1       weight  111
2       weight  91
3       weight  176
1       salary  111300
2       salary  79740
3       salary  15500
1       age     62
2       age     40
3       age     46
Let's start by opening the file and reading its contents:
>>> with open('example_data.txt', "r") as f: contents = f.read()
Examine our variable contents:
>>> contents
',height,weight,salary,age\n1,106,111,111300,62\n2,124,91,79740,40\n3,127,176,15500,46\n'
Let's convert this string into a list, by splitting on the newline character:
>>> contents.split('\n')
[',height,weight,salary,age', '1,106,111,111300,62', '2,124,91,79740,40', '3,127,176,15500,46', '']
Now lop off the empty field at the end:
>>> contents.split('\n')[:-1]
[',height,weight,salary,age', '1,106,111,111300,62', '2,124,91,79740,40', '3,127,176,15500,46']
Use list comprehension to split the elements of this list on the comma character:
>>> [x.split(',') for x in contents.split('\n')[:-1]]
[['', 'height', 'weight', 'salary', 'age'], ['1', '106', '111', '111300', '62'], ['2', '124', '91', '79740', '40'], ['3', '127', '176', '15500', '46']]
Now let's use a trick: if A is a list of lists, zip(*A) performs a matrix transpose (in Python 3, zip returns an iterator, so wrap the call in list() to display or slice the result):
>>> zip(*[x.split(',') for x in contents.split('\n')[:-1]])
[('', '1', '2', '3'), ('height', '106', '124', '127'), ('weight', '111', '91', '176'), ('salary', '111300', '79740', '15500'), ('age', '62', '40', '46')]
Let's loop through this list:
>>> for j in zip(*[x.split(',') for x in contents.split('\n')[:-1]])[1:]: print(j)
('height', '106', '124', '127')
('weight', '111', '91', '176')
('salary', '111300', '79740', '15500')
('age', '62', '40', '46')
We can transform any given tuple as follows:
>>> k = ('height', '106', '124', '127')
>>> [str(y+1) + '\t' + k[0] + '\t' + z for y,z in enumerate(k[1:])]
['1\theight\t106', '2\theight\t124', '3\theight\t127']
This is using list comprehension to meld the first element of the tuple, height, to each subsequent element and add a numerical index, as well. Let's apply this to each tuple in our list, and join everything with a newline to finish the job:
>>> for j in zip(*[x.split(',') for x in contents.split('\n')[:-1]])[1:]:
...  print('\n'.join([str(y+1) + '\t' + j[0] + '\t' + z for y,z in enumerate(j[1:])]))
1       height  106
2       height  124
3       height  127
1       weight  111
2       weight  91
3       weight  176
1       salary  111300
2       salary  79740
3       salary  15500
1       age     62
2       age     40
3       age     46
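
For comparison, here is a sketch of the same reshaping written as a function using the standard csv module and explicit loops. It is written for Python 3, the name wide_to_long is my own, and it reuses the index stored in the first column of each row rather than counting rows with enumerate (for this data the two agree). To keep it short, the sample string contains only two of the columns:

```python
import csv
import io

def wide_to_long(text):
    '''Reshape comma-delimited text with a header row into index/column/value lines'''
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    out = []
    # walk the columns (skipping the index column), then the rows within each
    for col in range(1, len(header)):
        for row in data:
            out.append(row[0] + '\t' + header[col] + '\t' + row[col])
    return '\n'.join(out)

contents = ',height,weight\n1,106,111\n2,124,91\n'
print(wide_to_long(contents))
```

The one-liner is more fun, but the csv module copes with quoted fields that contain commas, which a plain split(',') would break apart.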

Example Problem with Nested Dicts: Making a Multi-Dimensional Hash

I was recently given the following problem:
Consider a function incr_dict, which takes two arguments and behaves like this in Python:
>>> dct = {} 
>>> incr_dict(dct, ('a', 'b', 'c')) 
>>> dct 
{'a': {'b': {'c': 1}}} 
>>> incr_dict(dct, ('a', 'b', 'c')) 
>>> dct 
{'a': {'b': {'c': 2}}} 
>>> incr_dict(dct, ('a', 'b', 'f')) 
>>> dct  
{'a': {'b': {'c': 2, 'f': 1}}} 
>>> incr_dict(dct, ('a', 'r', 'f')) 
>>> dct 
{'a': {'r': {'f': 1}, 'b': {'c': 2, 'f': 1}}} 
>>> incr_dict(dct, ('a', 'z')) 
>>> dct 
{'a': {'r': {'f': 1}, 'b': {'c': 2,'f': 1}, 'z': 1}} 
incr_dict(dct, ('a', 'b', 'c')) is conceptually like:
dct['a']['b']['c'] += 1
except that it creates any necessary intermediate and leaf nodes.
Here's my solution, after reading this Stack Overflow post:
debug = 0	# boolean (0 = quiet, 1 = verbose)

# from http://stackoverflow.com/questions/14692690/access-python-nested-dictionary-items-via-a-list-of-keys
def getFromDict(dataDict, mapList):    
    '''get a given value from a nested dictionary from keys (provided as a list)'''
    for k in mapList:
        dataDict = dataDict[k]
    return dataDict

# from http://stackoverflow.com/questions/14692690/access-python-nested-dictionary-items-via-a-list-of-keys
def setInDict(dataDict, mapList, value): 
    '''set a given value in a nested dictionary for keys (provided as a list)'''
    for k in mapList[:-1]: 
        dataDict = dataDict[k]
    dataDict[mapList[-1]] = value

def incr_dict(dataDict, mapList): 
    '''increment a given value in a nested dictionary for keys (provided as a list) or, 
    if entry doesn't exist, create and set to 1'''
    if (debug):
        print("starting list " + str(mapList))
        print("starting dict " + str(dataDict))

    # go to one before the end
    for k in mapList[:-1]: 
        if (debug):
            print("list elt " + str(k))
            print("pre dict " + str(dataDict))

        # if key in dataDict, change dataDict to point to inner dict
        if k in dataDict:
            dataDict = dataDict[k]
        # else create key and make value empty dict, then change dataDict to point to empty dict
        else:
            dataDict[k] = {}
            dataDict = dataDict[k]

        if (debug):
            print("post dict " + str(dataDict))

    # now deal w last elt
    # if exists, increment 
    if mapList[-1] in dataDict:
        dataDict[mapList[-1]] += 1 
    # else create and set value to 1
    else:
        dataDict[mapList[-1]] = 1 

def main():
    # initialize empty dict
    dct = {}
    incr_dict(dct, ('a', 'b', 'c'))
    print("result")
    print(dct)
    incr_dict(dct, ('a', 'b', 'c'))
    print("result")
    print(dct)
    incr_dict(dct, ('a', 'b', 'f'))
    print("result")
    print(dct)
    incr_dict(dct, ('a', 'r', 'f'))
    print("result")
    print(dct)
    incr_dict(dct, ('a', 'z'))
    print("result")
    print(dct)
    # test long tuple
    incr_dict(dct, ('a', 'x', 'f', 'b', 'c', 'd', 'e', 'f', 'g', 'h'))
    print("result")
    print(dct)
    incr_dict(dct, ('a', 'x', 'f', 'b', 'c', 'd', 'e', 'f', 'g', 'i'))
    print("result")
    print(dct)

if __name__ == '__main__':
    main()
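
For what it's worth, the same behavior can be had more compactly with two dict methods, setdefault and get. This is an alternative sketch, not the solution above; the name incr_dict_short is made up to keep the two apart:

```python
def incr_dict_short(dct, keys):
    '''Same behavior as incr_dict for the cases shown, written with dict methods'''
    node = dct
    # setdefault returns the existing inner dict, or inserts and returns a new one
    for k in keys[:-1]:
        node = node.setdefault(k, {})
    # get supplies 0 when the leaf does not exist yet
    node[keys[-1]] = node.get(keys[-1], 0) + 1

dct = {}
incr_dict_short(dct, ('a', 'b', 'c'))
incr_dict_short(dct, ('a', 'b', 'c'))
incr_dict_short(dct, ('a', 'b', 'f'))
incr_dict_short(dct, ('a', 'z'))
print(dct)
```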

Easter Eggs

Read The Zen of Python by Tim Peters:
>>> import this
Open up this XKCD cartoon (Python 3+):
>>> import antigravity