Notes: Django

Misc Django Notes
by Oliver; 2017
   
 

Introduction

Here are some unpolished notes about Django, the well-known backend framework written in Python. For these notes, I started with Django version 1.11.1; where the Django 2.x syntax differs, I show that too.

A Note About Webdev

A quick note about webdev. When you're creating a website, it's going to go through many iterations before the finished product (if, indeed, it ever finishes—many sites undergo slow, continuous evolution). You'll probably host your site on AWS, but you don't want your rough drafts to be visible to the public. The way I like to solve this issue is to develop and host the website on my local computer, a Mac, and port it to AWS once it's good enough. With git, this isn't too hard and it provides a nice division between your development site and your publication-ready site.

Database: Postgres

First things first. Django needs a database. Let's choose postgres.

Install Postgres and Start It

On Mac, install postgres:
$ brew install postgresql
Initialize a location where postgres stores its data:
$ initdb /path/postgres_data -E utf8 
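You may also need to create a default database named after your user, which is what psql connects to when you don't name one: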
$ createdb
Start or stop postgres:
$ pg_ctl -D /path/postgres_data -l /path/logfile start 
$ pg_ctl -D /path/postgres_data stop 
We must create a db for our Django project. Let's call our database myDB:
$ createdb myDB 
Check out your postgres processes:
$ ps -Af | grep postgres
If you have to nuke the db, in the event of a crisis, that's:
$ dropdb myDB
You may also want to delete the migrations folder. We'll discuss migrations shortly; with the directory structure set up below, that's:
$ rm -r myProject/sitebackend/migrations

The Postgres Shell

Open the postgres shell:
$ psql
Ditto, but attach to a particular db:
$ psql myDB
Ditto, but attach to a particular db as a particular user:
$ psql -d myDB -U myUserName
In the postgres shell, list databases, then connect to one:
=> \l
=> \connect myDB
Show tables, and how big they are:
=> \dt+
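Show 10 rows from mytable (without an ORDER BY, postgres makes no promise about which 10):
=> SELECT * FROM mytable LIMIT 10;
Show the 10 rows with the largest id (the most recently inserted ones, if id is an auto-incrementing primary key, as it is for Django models by default):
=> SELECT * FROM mytable ORDER BY id DESC LIMIT 10;
Count the rows in mytable:
=> SELECT COUNT(*) FROM mytable;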
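If you want to experiment with these queries without a postgres server running, Python's built-in sqlite3 module is a handy stand-in. This is SQLite, not postgres, and the table and its contents are made up for illustration, but the LIMIT, ORDER BY ... DESC and COUNT(*) syntax above works the same way:

```python
import sqlite3

# in-memory SQLite database standing in for postgres (hypothetical table)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO mytable (name) VALUES (?)",
    [(f"row{i}",) for i in range(1, 26)],
)

# the 10 rows with the largest ids (here, the most recently inserted)
last_ten = conn.execute(
    "SELECT id, name FROM mytable ORDER BY id DESC LIMIT 10"
).fetchall()

# number of rows in the table
(row_count,) = conn.execute("SELECT COUNT(*) FROM mytable").fetchone()

print([row_id for row_id, _ in last_ten])
print(row_count)
```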

Starting your Django Project

Let's make an overarching directory for the project called mySite and go into it:
$ mkdir mySite
$ cd mySite
You'll want to use virtualenv so you don't mix up your Django-related Python packages and your globally installed Python packages. Also, let's be sure to use Python 3, not Python 2. Here we go:
$ virtualenv -p python3 venv
$ source venv/bin/activate
$ pip install Django==2.0 # or, for the latest version: pip install django
$ pip install psycopg2 # the PostgreSQL adapter that Django's postgresql backend uses
$ pip freeze > requirements.txt # record the packages we've installed
$ django-admin startproject myProject
Here's what our directory structure looks like so far:
mySite/
├── myProject
│   ├── manage.py
│   └── myProject
│       ├── __init__.py
│       ├── settings.py
│       ├── urls.py
│       └── wsgi.py
├── notes
└── venv
    ├── bin
    ├── include
    ├── lib
    └── pip-selfcheck.json

Modifying the Database in settings.py

We see that Django has created a settings.py, which is the configuration file for the project. In settings.py, change the default db from sqlite to your postgres db:
# DATABASES = {
#    'default': {
#        'ENGINE': 'django.db.backends.sqlite3',
#        'NAME': os.path.join(BASE_DIR, 'db.sqlite3'),
#    }
# }

# use postgres instead

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'myDB',
        'USER': 'myUserName',
        'PASSWORD': 'myPassword',
        'HOST': 'localhost',
        'PORT': '',
    }
}
Note: if you're using a public git repo, don't commit your settings file, because it contains your SECRET_KEY and your database password!
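If you'd rather commit settings.py, one common approach is to read the secrets from environment variables instead of hard-coding them. A minimal sketch, assuming hypothetical variable names (MYDB_NAME, MYDB_PASSWORD, etc.) that you would export in your shell or server config; the same idea works for SECRET_KEY:

```python
import os

def database_settings(env=None):
    """Build Django's DATABASES setting from environment variables.

    The variable names (MYDB_NAME, MYDB_USER, ...) are hypothetical;
    pick whatever names suit your setup.
    """
    if env is None:
        env = os.environ
    return {
        'default': {
            'ENGINE': 'django.db.backends.postgresql',
            'NAME': env.get('MYDB_NAME', 'myDB'),
            'USER': env.get('MYDB_USER', 'myUserName'),
            # empty default: the real password never appears in the repo
            'PASSWORD': env.get('MYDB_PASSWORD', ''),
            'HOST': env.get('MYDB_HOST', 'localhost'),
            'PORT': env.get('MYDB_PORT', ''),
        }
    }

# in settings.py you would then write:
# DATABASES = database_settings()
```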

Creating the Database Schema

Now that we've created our database with postgres's createdb command and linked to it in our settings.py file, we have to create the database schema. That's:
$ python manage.py migrate

Starting Git

Not using version control is not an option! Here are some standard commands to get git up and running.

Referring to the directory tree above, we're in the mySite/ directory. First, I like to make a .gitignore file that looks like this:
$ cat .gitignore
notes
*.pyc
settings.py
venv
migrations
Now start the repository:
$ echo "# myProject" >> README.md
$ git add .gitignore README.md requirements.txt
$ git commit -m 'first commit - add .gitignore, README, requirements.txt'
If you have an empty repository waiting on GitHub, hook it up:
$ git remote add origin git@github.com:myUserName/myProject.git
$ git push -u origin master

Starting an App within your Project

Follow the Django tutorial. From the myProject/ directory (the one containing manage.py), let's make an app called sitebackend:
$ python manage.py startapp sitebackend
Now our directory structure is looking something like this:
mySite/
├── README.md
├── notes
├── myProject
│   ├── manage.py
│   ├── notes
│   ├── myProject
│   │   ├── __init__.py
│   │   ├── __pycache__
│   │   ├── settings.py
│   │   ├── urls.py
│   │   └── wsgi.py
│   └── sitebackend
│       ├── __init__.py
│       ├── __pycache__
│       ├── admin.py
│       ├── apps.py
│       ├── migrations
│       ├── models.py
│       ├── tests.py
│       ├── urls.py
│       └── views.py
├── requirements.txt
└── venv
    ├── bin
    ├── include
    ├── lib
    └── pip-selfcheck.json
Following the docs, modify these files. Note that startapp doesn't create sitebackend/urls.py; you have to create it yourself:

myProject/sitebackend/urls.py:
from django.conf.urls import url

from . import views

urlpatterns = [
    url(r'^$', views.index, name='index'),
]
Django 2.x has a slightly different syntax:
Django 2.x
from django.urls import path

from . import views

urlpatterns = [
    path('', views.index, name='index'),
]
myProject/sitebackend/views.py:
from django.shortcuts import render
from django.http import HttpResponse

def index(request):
    return HttpResponse("Hello, world")
myProject/myProject/urls.py:
from django.conf.urls import include, url
from django.contrib import admin

urlpatterns = [
    url(r'^home/', include('sitebackend.urls')),
    url(r'^admin/', admin.site.urls),
]
Django 2.x
from django.contrib import admin
from django.urls import path
from django.urls import include

urlpatterns = [
    path('home/', include('sitebackend.urls')),
    path('admin/', admin.site.urls),
]
Finally, we'll add our app to the INSTALLED_APPS list in our settings.py file:
INSTALLED_APPS = [
    'sitebackend.apps.SitebackendConfig',
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
]
In the next section we'll try to serve our initial Django site.

Running the Django Mini-Server

When your project goes into production, you'll want to use a proper server like nginx (see Setting up Django and your web server with uWSGI and nginx). However, you can test your project during development without the hassle of configuring nginx. Django comes bundled with a mini-server. Run it:
$ python manage.py runserver
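By default this serves the site at 127.0.0.1:8000. To run on port 8001 instead:
$ python manage.py runserver localhost:8001
(That's via StackOverflow: Django change default runserver port.) If you're on AWS EC2, don't forget to open the port in your instance's security group. You'll also need to bind to all interfaces (e.g., runserver 0.0.0.0:8001) rather than localhost, and add the instance's address to ALLOWED_HOSTS in settings.py, so the server is reachable from outside.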

Creating your Models

Suppose we want to define a (biological) virus object. Here's an example models file, sitebackend/models.py:
from django.db import models

class Virus(models.Model):
    """full description of the viruses (nucleic acid info and taxonomy)"""
    name = models.CharField(max_length=70)
    # virus taxonomic identifier
    taxid = models.CharField(max_length=10)
    # DNA, RNA or RETRO
    nucleic1 = models.CharField(max_length=70)
    # ssDNA, dsDNA, (+)ssRNA, (-)ssRNA, dsRNA or RETRO
    nucleic2 = models.CharField(max_length=70)
    # taxonomic info
    order = models.CharField(max_length=70)
    family = models.CharField(max_length=70)
    subfamily = models.CharField(max_length=70)
    genus = models.CharField(max_length=70)
    species = models.CharField(max_length=70)
Now we need to transmit this schema into our database:
$ python manage.py makemigrations sitebackend
$ python manage.py sqlmigrate sitebackend 0001 # optional: print the SQL this migration will run
$ python manage.py migrate

Loading Data into your Database

I often have data in text files and face the issue of importing that data into the database. One way to accomplish this is to write a loader script. I'll create a scripts/ directory in mySite/:
mySite/
├── README.md
├── notes
├── myProject
├── requirements.txt
├── scripts
│   ├── loader1.py
│   └── notes
└── venv
Suppose our text file looks like this:
#taxid  nucleic1        nucleic2        order   family  subfamily       genus   specie  name
568715  RNA     (+)ssRNA        nan     Astroviridae    nan     nan     nan     Astrovirus MLB1
683172  RNA     (+)ssRNA        nan     Astroviridae    nan     nan     nan     Astrovirus MLB2
1247114 RNA     (+)ssRNA        nan     Astroviridae    nan     nan     nan     Astrovirus MLB3
645687  RNA     (+)ssRNA        nan     Astroviridae    nan     nan     nan     Astrovirus VA1
(taxid stands for the taxonomic identifier—a number that uniquely identifies a species. Read more about it here.)

Then we could write a script loader1.py as follows:
import os
import sys

# make the Django project importable and tell Django where its settings are
sys.path.append('../myProject')
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'myProject.settings')

import django
django.setup()
from sitebackend.models import Virus

# tab-delimited input file has a header:
# #taxid nucleic1 nucleic2 order family subfamily genus specie name

header = True
for line in sys.stdin:
    if header:
        # skip the header line
        header = False
        continue
    fields = line.rstrip("\r\n").split("\t")
    v = Virus(name = fields[8],
        taxid = fields[0],
        nucleic1 = fields[1],
        nucleic2 = fields[2],
        order = fields[3],
        family = fields[4],
        subfamily = fields[5],
        genus = fields[6],
        species = fields[7]
    )
    v.save()
Now we can run it as follows:
$ cd scripts
$ export DJANGO_SETTINGS_MODULE=myProject.settings
$ cat file.txt | python ./loader1.py
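Loader1 mixes parsing with database writes. Here is a minimal sketch of pulling the parsing into a plain function that you can test without Django. parse_virus_rows and COLUMNS are hypothetical names, and the sketch assumes the file is tab-delimited with a '#'-prefixed header, like the one above:

```python
# column order of the input file (its header spells the species column "specie")
COLUMNS = ['taxid', 'nucleic1', 'nucleic2', 'order', 'family',
           'subfamily', 'genus', 'species', 'name']

def parse_virus_rows(lines):
    """Yield one dict per data line, keyed by Virus field name.

    Skips the '#'-prefixed header and blank lines.
    """
    for line in lines:
        line = line.rstrip('\r\n')
        if not line or line.startswith('#'):
            continue
        fields = line.split('\t')
        if len(fields) != len(COLUMNS):
            raise ValueError(
                f'expected {len(COLUMNS)} columns, got {len(fields)}: {line!r}')
        yield dict(zip(COLUMNS, fields))
```

In loader1.py the loop could then shrink to Virus.objects.bulk_create([Virus(**row) for row in parse_virus_rows(sys.stdin)]), which also helps with the slowness discussed under Performance Issues.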

Dealing with Foreign Keys

Let's suppose we add a ViralProtein class to our models:
class ViralProtein(models.Model):
    """class for viral proteins"""
    # the virus (foreign key) 
    virus = models.ForeignKey(Virus, on_delete=models.CASCADE)
    # a description of the viral protein
    geneproteindescrip = models.CharField(max_length=150)
Every virus is made up of multiple viral proteins, so the relationship between virus and viral protein is one-to-many. Let's suppose we have a tab-delimited file, viral_proteins.txt, whose first column is the taxid and whose second column is a description. Then we could write a script loader2.py as follows:
import sys
sys.path.append('../myProject')

import django
django.setup()

from sitebackend.models import Virus
from sitebackend.models import ViralProtein

# load viral proteins information into database

# example usage:
# cat viral_proteins.txt | python ./loader2.py

# delete all preexisting entries in table
ViralProtein.objects.all().delete()

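for line in sys.stdin:
    fields = line.rstrip("\r\n").split("\t")
    mytaxid = fields[0].strip()
    try:
        # get the virus with the matching taxid (taxid is a CharField)
        myvirus = Virus.objects.get(taxid = mytaxid)
    except Virus.DoesNotExist:
        # catch only the "no match" case instead of a bare except,
        # so real errors (bad input, db problems) aren't silently hidden
        print("Taxid not found:", mytaxid)
        continue
    vp = ViralProtein(virus = myvirus,
        geneproteindescrip = fields[1],
    )
    vp.save()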
We'd run this script as:
$ cat viral_proteins.txt | python ./loader2.py
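Loader2 runs one Virus.objects.get() query per input line, which adds up on big files. One way around that, sketched here under the assumption that all the taxids fit in memory, is to read the known taxids once (e.g. with set(Virus.objects.values_list('taxid', flat=True))) and do the matching in plain Python. group_proteins is a hypothetical helper:

```python
from collections import defaultdict

def group_proteins(lines, known_taxids):
    """Group protein descriptions by taxid and collect unmatched taxids.

    lines: tab-delimited "taxid<TAB>description" strings
    known_taxids: set of taxid strings already in the Virus table
    """
    by_taxid = defaultdict(list)
    missing = []
    for line in lines:
        line = line.rstrip('\r\n')
        if not line:
            continue
        taxid, description = line.split('\t', 1)
        taxid = taxid.strip()
        if taxid in known_taxids:
            by_taxid[taxid].append(description)
        else:
            missing.append(taxid)
    return dict(by_taxid), missing
```

In the loader you could then build a taxid-to-Virus dict once ({v.taxid: v for v in Virus.objects.all()}) and create the ViralProtein objects in batches with bulk_create().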

Performance Issues

I tried using scripts in the above style, looping over thousands of files to load millions of elements into my postgres db. It was painfully slow. I found the solution in a few StackOverflow posts. Let's say you have a Django model called MyObject. Instead of calling:
MyObjectInstance.save()
on every new instance of MyObject, collect the instances in a big list and then, as these posts suggest, insert them all at once with the bulk_create() method, which issues far fewer queries:
MyObject.objects.bulk_create(myobjectlist)
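bulk_create() has a batch_size parameter, but it only splits the INSERT statements; the full list of instances still has to fit in memory. Keep in mind, too, that bulk_create() doesn't call each instance's save() method or send the pre_save/post_save signals. If the list gets too big, one option is to feed bulk_create() fixed-size batches. A minimal sketch of a batching helper (chunked is a hypothetical name, not a Django function):

```python
from itertools import islice

def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    if size < 1:
        raise ValueError('size must be at least 1')
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

# hypothetical usage in a loader script:
# for batch in chunked((MyObject(**row) for row in rows), 5000):
#     MyObject.objects.bulk_create(batch)
```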

Loading Data into your Database from Fixtures

Another way to load your data is via fixtures, which you can read about here. You can make a directory, e.g., here:
$ mkdir -p myProject/sitebackend/fixtures
(the docs say: "By default, Django looks in the fixtures directory inside each app for fixtures") and throw a file of JSON data in the directory. For example, suppose we have cancer objects in our models.py. Then our fixture might look like this:

myCancerData.json:
[
  {"fields": {"name": "Gastric cancer"}, "pk": 1, "model": "sitebackend.Cancer"}, 
  {"fields": {"name": "Colorectal cancer"}, "pk": 2, "model": "sitebackend.Cancer"}, 
  {"fields": {"name": "Glioma"}, "pk": 3, "model": "sitebackend.Cancer"}
]
pk is the primary key.

Dealing with Foreign Keys

Suppose we have another database table that links to our cancer table via foreign keys. How do we express that with fixtures? The answer is to use the cancer object's pk to link it. For example, suppose we have patient objects and each patient is associated with a particular cancer. Then our patient fixture might look like this:
[
  {"fields": {"patientid": 53, "study": 1, "cancer": 1}, "pk": 330, "model": "sitebackend.Patient"}, 
  {"fields": {"patientid": 89, "study": 1, "cancer": 2}, "pk": 227, "model": "sitebackend.Patient"}, 
  {"fields": {"patientid": 66, "study": 1, "cancer": 1}, "pk": 19, "model": "sitebackend.Patient"}
]
This captures the relationship that the patient with pk == 227 has Colorectal cancer.

We still haven't loaded the data in the database. To do that, run:
$ python manage.py loaddata myCancerData.json
manage.py loaddata is (pardon the language) finicky as fuck—i.e., the opposite of robust. I discovered the following super-annoying "gotchas":
  • using single quotes instead of double quotes throws an error (single-quoted strings aren't valid JSON)
  • a trailing comma at the end of the file ( },] as opposed to }] ) throws an error (JSON doesn't allow trailing commas)
  • loading 500,000 objects threw a mystery error; 250,000 objects was ok
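The first two gotchas are really JSON syntax rules, so it can save time to run a fixture through Python's json module before calling loaddata; python -m json.tool myCancerData.json does this from the command line and reports the line and column of the problem. A minimal sketch of the same check as a function (check_fixture is a hypothetical name):

```python
import json

def check_fixture(text):
    """Return None if `text` is valid JSON, else a short error message."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f'line {err.lineno}, column {err.colno}: {err.msg}'
    return None
```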
Also, it should be noted, if your data is in text files, you'll still have to write a script. Only this time it will be to transform your text file into JSON format.
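A minimal sketch of such a transform script for the cancer example, assuming a hypothetical text file with one cancer name per line. Letting json.dumps write the file sidesteps the quoting and trailing-comma gotchas, since it always emits double quotes and never adds a trailing comma. rows_to_fixture and to_fixture_json are hypothetical names:

```python
import json

def rows_to_fixture(names, model='sitebackend.Cancer', first_pk=1):
    """Turn a list of names into Django fixture records.

    Each record has the "model", "pk" and "fields" keys that loaddata expects.
    """
    return [
        {'model': model, 'pk': pk, 'fields': {'name': name}}
        for pk, name in enumerate(names, start=first_pk)
    ]

def to_fixture_json(records):
    # json.dumps always uses double quotes and never writes a trailing comma
    return json.dumps(records, indent=2)
```

Hypothetical usage: read the names with [line.strip() for line in open('cancers.txt') if line.strip()], write to_fixture_json(rows_to_fixture(names)) to sitebackend/fixtures/myCancerData.json, then run loaddata as above. If you hit the size problem mentioned above, you could split the records across several smaller fixture files.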

The Django REST framework

I like to use the Django REST framework. The point of this is to make your backend a lean, JSON-serving API and take care of all the front-end rendering with a javascript framework, like Angular or Vue. These javascript frameworks will digest your JSON and deal with it in a more elegant and interactive fashion than Django. You thus save yourself from having to use Django's templating engine and are easily set up to build a SPA ("single page application").

Follow their docs to install it:
$ pip install djangorestframework
then add it to your INSTALLED_APPS list in settings.py:
INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'sitebackend.apps.SitebackendConfig',
    'rest_framework',
]
Now we're going to take inspiration from this tutorial: http://www.django-rest-framework.org/tutorial/2-requests-and-responses/.

Edit sitebackend/views.py to be:
from django.shortcuts import render
from django.http import HttpResponse

# http://www.django-rest-framework.org/tutorial/2-requests-and-responses/
from rest_framework import status
from rest_framework.decorators import api_view
from rest_framework.response import Response

from sitebackend.models import Virus

def index(request):
    return HttpResponse("Hello, world")

@api_view(['GET',])
def get_virus_all(request):
    """
    Get list of virus objects

    Sample output:
    GET /virus
    [
        {
            "order": "nan",
            "species": "Adeno-associated dependoparvovirus A",
            "taxid": "10804",
            ...
        },
        ...
    ]
    """

    # return Response([i.__dict__ for i in Virus.objects.all()[0:10]])

    res = []
    for i in Virus.objects.all():
        i.__dict__.pop('_state', None)
        res.append(i.__dict__)

    return Response(res)
The reason I'm dropping the _state key, an internal attribute Django uses to track each instance's database state, is that it throws an error if you don't:
<django.db.models.base.ModelState object at ... > is not JSON serializable
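Popping _state straight out of i.__dict__ also mutates the model instance, which could break it if you later call save() on it. Below is a non-mutating sketch of the same idea in plain Python; FakeVirus and FakeState are hypothetical stand-ins for a model instance and Django's ModelState. In a real view, Virus.objects.values() already returns plain dicts without _state (so Response(list(Virus.objects.values())) works), and the REST framework's serializers are the more idiomatic route.

```python
import json

def public_fields(obj):
    """Return a copy of obj's attributes, minus private ones like '_state'.

    Unlike popping '_state' from obj.__dict__, this leaves obj untouched.
    """
    return {key: value for key, value in vars(obj).items()
            if not key.startswith('_')}

class FakeState:
    """Stand-in for Django's ModelState, which json can't serialize."""

class FakeVirus:
    """Stand-in for a Virus model instance (hypothetical)."""
    def __init__(self, id, name, taxid):
        self._state = FakeState()
        self.id = id
        self.name = name
        self.taxid = taxid

v = FakeVirus(1025, 'Barmah Forest virus', '11020')
print(json.dumps(public_fields(v)))
```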
Now we're going to hook this function up to the appropriate URL. Edit sitebackend/urls.py to be:
from django.conf.urls import url

from . import views

urlpatterns = [
    url(r'^$', views.index, name='index'),
    url(r'^virus/$', views.get_virus_all),
]
Django 2.x
from django.urls import path

from . import views

urlpatterns = [
    path('', views.index, name='index'),
    path('virus/', views.get_virus_all),
]
The result is a svelte, JSON-serving back-end! Here's what it looks like in the browser:

[Screenshot: the JSON list of virus objects in the browser]

Now your front-end javascript framework can crunch this data and go wild with it—filtering it, populating menus, etc.

The Django Shell

The Django shell is ideal for testing database queries. Fire it up:
$ python manage.py shell
Let's suppose we've defined a "sample" class in models.py, and our database is populated with sample objects. Here these samples represent biopsies from cancer tissue. Your particular project might have user objects or article objects or whatever, but no matter.

Get all sample objects:
In [1]: from sitebackend.models import Sample

In [2]: Sample.objects.all()
Out[2]: <QuerySet [<Sample: Sample object>, <Sample: Sample object>, <Sample: Sample object>, <Sample: Sample object>, <Sample: Sample object>, <Sample: Sample object>, <Sample: Sample object>, <Sample: Sample object>, <Sample: Sample object>, <Sample: Sample object>, <Sample: Sample object>, <Sample: Sample object>, <Sample: Sample object>, <Sample: Sample object>, <Sample: Sample object>, <Sample: Sample object>, <Sample: Sample object>, <Sample: Sample object>, <Sample: Sample object>, <Sample: Sample object>, '...(remaining elements truncated)...']>
Get the first sample object:
In [1]: Sample.objects.get(id = 1)
Out[1]: <Sample: Sample object>
To get the dictionary representation of the first sample object, we can look at the object's __dict__ attribute:
In [1]: Sample.objects.get(id = 1).__dict__
Out[1]:
{'id': 1,
 'patient_id': 1,
 'sampleid': 'P1.T'}
Note the difference between the .get and .filter methods: .get is used when you expect exactly one result and returns an instance of the model you're querying (it raises DoesNotExist if nothing matches and MultipleObjectsReturned if more than one object does), while .filter can match any number of objects, including none, and returns a QuerySet. Here's .filter:
In [1]: MyGene.objects.filter(name = 'TP53')
Out[1]: <QuerySet [<MyGene: MyGene object>]>

In [2]: type(MyGene.objects.filter(name = 'TP53'))
Out[2]: django.db.models.query.QuerySet
Here's .get:
In [3]: MyGene.objects.get(name = 'TP53')
Out[3]: <MyGene: MyGene object>

In [4]: type(MyGene.objects.get(name = 'TP53'))
Out[4]: sitebackend.models.MyGene
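To make the contrast concrete, here is a toy, pure-Python model of the two behaviors on a list of dicts. It is not how Django implements them (real lookups run SQL, and the real exceptions are MyGene.DoesNotExist and MyGene.MultipleObjectsReturned); toy_get, toy_filter and the gene records are hypothetical:

```python
class DoesNotExist(Exception):
    """Toy version of Model.DoesNotExist."""

class MultipleObjectsReturned(Exception):
    """Toy version of Model.MultipleObjectsReturned."""

def toy_filter(records, **criteria):
    """Return every record matching all criteria (possibly an empty list)."""
    return [r for r in records
            if all(r.get(k) == v for k, v in criteria.items())]

def toy_get(records, **criteria):
    """Return the single matching record, or raise like .get() does."""
    matches = toy_filter(records, **criteria)
    if not matches:
        raise DoesNotExist(criteria)
    if len(matches) > 1:
        raise MultipleObjectsReturned(criteria)
    return matches[0]

genes = [{'id': 1, 'name': 'TP53'},
         {'id': 2, 'name': 'KRAS'},
         {'id': 3, 'name': 'KRAS'}]
```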

The Django Shell, Part II

Let's demonstrate some more complicated (albeit still simple) queries in the Django shell. Suppose we have virus objects made up of viral protein objects. Let's arbitrarily look at the sixth viral protein in our database:
>>> ViralProtein.objects.all()[5].__dict__
{'_state': <django.db.models.base.ModelState object at 0x17cd18253>, 'id': 40221, 'virus_id': 1025, 'geneproteindescrip': 'product=E2protein#3#;structuralpolyprotein;', 'geneproteinids': 'NC_001786;NP_054024.1;', 'chrposition': 'nan', 'name': 'E2 protein#3#'}
We can get a single attribute with either of these two styles:
>>> ViralProtein.objects.all()[5].name
'E2 protein#3#'
>>> ViralProtein.objects.all()[5].__dict__['name']
'E2 protein#3#'
Which virus is this viral protein object associated with?
>>> Virus.objects.get(id=1025)
<Virus: Virus object>
Let's examine this object:
>>> Virus.objects.get(id=1025).__dict__
{'_state': <django.db.models.base.ModelState object at 0x1a5599a30>, 'id': 1025, 'name': 'Barmah Forest virus', 'taxid': '11020', 'nucleic1': 'RNA', 'nucleic2': '(+)ssRNA', 'order': 'nan', 'family': 'Togaviridae', 'subfamily': 'nan', 'genus': 'Alphavirus', 'species': 'nan'}
Now suppose we want all viral proteins associated with this virus:
>>> ViralProtein.objects.filter(virus_id = 1025)
<QuerySet [<ViralProtein: ViralProtein object>, <ViralProtein: ViralProtein object>, <ViralProtein: ViralProtein object>, <ViralProtein: ViralProtein object>, <ViralProtein: ViralProtein object>, <ViralProtein: ViralProtein object>, <ViralProtein: ViralProtein object>, <ViralProtein: ViralProtein object>, <ViralProtein: ViralProtein object>, <ViralProtein: ViralProtein object>, <ViralProtein: ViralProtein object>]>
What if we want the names of all these viral protein objects?
>>> [i.name for i in ViralProtein.objects.filter(virus_id = 1025)]
['E2 protein#3#', 'nsP1#1#', 'NP_597797.2 full_polyprotein 1..2411', 'YP_006491241.1 excised_polyprotein 1..742', 'nsP2#2#', 'transframe fusion protein#1#', 'C protein#1#', 'E1 protein#5#', '6K protein#4#', 'nsP3#3#', 'E3 protein#2#']
As the Django docs say, there's a neat syntax for "Lookups that span relationships": follow the foreign key with a double underscore, as in virus__nucleic1. For example, how many viral protein records in my database correspond to RNA viruses?
>>> len(ViralProtein.objects.filter(virus__nucleic1 = 'RNA'))
4858
You can chain filters like unix pipes. How many viral protein records in my database correspond to RNA viruses of genus Alphavirus?
>>> len(ViralProtein.objects.filter(virus__nucleic1 = 'RNA').filter(virus__genus='Alphavirus'))
128
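How many viral protein records in my database correspond to RNA viruses of genus Alphavirus with taxids greater than or equal to 15000? (Caveat: taxid is a CharField, so __gte here compares text, not numbers; for a true numeric comparison, taxid would need to be an IntegerField.)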
>>> len(ViralProtein.objects.filter(virus__nucleic1 = 'RNA').filter(virus__genus='Alphavirus').filter(virus__taxid__gte=15000))
28
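And so on! One more tip: len() pulls every matching row into Python just to count it, whereas .count() (e.g., ViralProtein.objects.filter(virus__nucleic1 = 'RNA').count()) has the database do a SELECT COUNT(*) instead.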
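To see why the CharField caveat matters, here is the same kind of comparison in plain Python with a few hypothetical taxids. Python's string ordering is used as an illustration; postgres's text ordering also depends on the database collation, but for digit-only strings the effect is typically the same:

```python
taxids = ['11020', '2', '15000', '9', '100000']

# what a comparison on a CharField does: compare text character by character
as_strings = [t for t in taxids if t >= '15000']

# what "greater than or equal to 15000" usually means: compare numbers
as_numbers = [t for t in taxids if int(t) >= 15000]

print(as_strings)
print(as_numbers)
```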

Serving Django with nginx

As noted above, refer to: Setting up Django and your web server with uWSGI and nginx. I also have a post on the subject here, which closely mirrors the above link.

One of the first steps that page mentions is to install the Python development package (the headers needed to compile uwsgi), and then to install uwsgi:
$ pip install uwsgi
Eventually, you have to start messing around with the nginx config file. Here are some sample nginx commands on my system (Amazon Linux):
$ sudo /etc/init.d/nginx start # start it
$ sudo /etc/init.d/nginx stop # stop it
$ sudo /etc/init.d/nginx restart
$ sudo nginx -t # test the config file syntax and print its path

Miscellaneous

Check your Django version, via StackOverflow:
$ python -c "import django; print(django.get_version())"
Update your version of Django, via The Docs:
$ python -m pip install -U Django
If you're upgrading Django, you might need to upgrade your other packages, as well. See: StackOverflow: How to upgrade all Python packages with pip?

Access-Control-Allow-Origin errors with your frontend framework? Install django-cors-headers (see Wikipedia: Same-origin policy; Wikipedia: Cross-origin resource sharing).

Enable simple password protection via nginx: see Restricting Access with HTTP Basic Authentication.