%%html
<script src="http://bits.csb.pitt.edu/asker.js/lib/asker.js"></script>
<script>
$3Dmolpromise = new Promise((resolve, reject) => {
require(['https://3dmol.org/build/3Dmol-nojquery.js'], function(){
resolve();});
});
require(['https://cdnjs.cloudflare.com/ajax/libs/Chart.js/2.2.2/Chart.js'], function(Ch){
Chart = Ch;
});
$('head').append('<link rel="stylesheet" href="http://bits.csb.pitt.edu/asker.js/themes/asker.default.css" />');
//the callback is provided a canvas object and data
var chartmaker = function(canvas, labels, data) {
var ctx = $(canvas).get(0).getContext("2d");
var dataset = {labels: labels,
datasets:[{
data: data,
backgroundColor: "rgba(150,64,150,0.5)",
fillColor: "rgba(150,64,150,0.8)",
}]};
var myBarChart = new Chart(ctx,{type:'bar',data:dataset,options:{legend: {display:false},
scales: {
yAxes: [{
ticks: {
min: 0,
}
}]}}});
};
$(".jp-InputArea .o:contains(html)").closest('.jp-InputArea').hide();
</script>
%%html
<div id="secstruct" style="width: 500px"></div>
<script>
$('head').append('<link rel="stylesheet" href="http://bits.csb.pitt.edu/asker.js/themes/asker.default.css" />');
var divid = '#secstruct';
jQuery(divid).asker({
id: divid,
question: "Which is not an example of protein secondary structure??",
answers: ["Twist","Helix","Sheet","Coil"],
server: "http://bits.csb.pitt.edu/asker.js/example/asker.cgi",
charter: chartmaker})
$(".jp-InputArea .o:contains(html)").closest('.jp-InputArea').hide();
</script>
ProDy stands for Protein Dynamics,
ProDy is designed for normal mode analysis, but also is
This lecture contains:
In interactive sessions, the following is safe (no name conflicts)
from prody import *
checkUpdates()
When developing software, prefer importing specific functions/classes or prefixing
import prody as pd
from prody import parsePDB
ProDy functions start with an action verb followed by one or more names or an abbreviation/extension, i.e. doSomething:
parseEXT()
: parse a file in EXT format, e.g. parsePDB
, parseDCD
writeEXT()
: write a file in EXT format, e.g. writePDB
, writeDCD
fetchSTH()
: download a file, e.g. fetchPDB
, fetchMSA
calcSTH()
: calculate something, e.g. calcRMSD
, calcGyradius
, calcANM
showSTH()
: show a plotting of something, e.g. showCrossCorrelations
, showProtein
saveSTH()
: save a ProDy object instance to disk, e.g. saveAtoms
loadSTH()
: save a ProDy object instance to disk, e.g. loadAtoms
Class names start with upper case letters and are abbreviations or words in camel case style, e.g. AtomGroup
, Selection
, ANM
, etc.
Class method naming is similar to function naming:
Class.numSomething
: return number of the thing, e.g. AtomGroup.numAtoms
returns number of atomsClass.getSomething
: return an attribute, e.g. AtomGroup.getNames
returns atom namesClass.setSomething
: set/update an attribute, e.g. AtomGroup.setNames
sets atom namesClass.buildMatrix
: builds a matrix, e.g. PCA.buildCovariance
calculates covariance matrixubi = parsePDB('1ubi')
ubi
<AtomGroup: 1ubi (683 atoms)>
TIP: to see a list of available functions, just type the action word (calc, show, get, etc.), and use TAB completion.
ubi.numAtoms()
683
calcGyradius(ubi)
12.085104173005442
%matplotlib inline
showProtein(ubi)
/home/anupam06/anaconda3/lib/python3.9/site-packages/prody/proteins/functions.py:265: MatplotlibDeprecationWarning: Axes3D(fig) adding itself to the figure is deprecated since 3.4. Pass the keyword argument auto_add_to_figure=False and use fig.add_axes(ax) to suppress this warning. The default value of auto_add_to_figure will change to False in mpl3.5 and True values will no longer work in 3.6. This is consistent with other Axes classes. show = Axes3D(cf)
<Axes3D:xlabel='x', ylabel='y'>
import py3Dmol
showProtein(ubi)
You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
jupyter labextension install jupyterlab_3dmol
<py3Dmol.view at 0x7f4561276ca0>
%%html
<div id="atomcnt" style="width: 500px"></div>
<script>
$('head').append('<link rel="stylesheet" href="http://bits.csb.pitt.edu/asker.js/themes/asker.default.css" />');
var divid = '#atomcnt';
jQuery(divid).asker({
id: divid,
question: "How many atoms does PDB 3ERK have?",
answers: ["1023","2849","3016","4872"],
server: "http://bits.csb.pitt.edu/asker.js/example/asker.cgi",
charter: chartmaker})
$(".jp-InputArea .o:contains(html)").closest('.jp-InputArea').hide();
</script>
ProDy has a custom format for directly saving atom groups.
saveAtoms(ubi)
'1ubi.ag.npz'
Can also read/write standard protein formats (PDB,PQR)
writePDB('ubi.pdb', ubi)
'ubi.pdb'
parsePDB(writePDB('ubi.pdb', ubi))
<AtomGroup: ubi (683 atoms)>
AtomGroups
¶Protein Data Bank (PDB) Structure files can be parsed using parsePDB
. For a given PDB identifier, this function will download the file if needed.
ag = parsePDB('1vrt')
Structure data parsed from the file is stored in an AtomGroup
instance:
type(ag)
prody.atomic.atomgroup.AtomGroup
Why AtomGroup
?
Molecule
, because structures are usually made up from multiple moleculesStructure
, because PDB format is sometimes used for storing small-moleculesAtomGroup
made sense for handling bunch of atoms, and is used by some other packages too.
AtomGroup
methods¶Check number of atoms and models (coordinate sets)
ag.numAtoms()
7953
ag.numCoordsets()
1
Use getSomething
methods to access data parsed from the file
names = ag.getNames()
names
array(['N', 'CA', 'C', ..., 'O', 'O', 'O'], dtype='<U6')
len(names)
7953
print(names[0])
N
ag.getBetas()
array([86.04, 85.23, 84.2 , ..., 64.55, 58.23, 46.98])
Note that get
is followed by a plural name, and method returns a numpy array
Use setSomething
methods to set attributes, but don't forget to pass an array or list with length equal to number of atoms
zeros = [0] * len(ag) # same as ag.numAtoms()
ag.setBetas(zeros)
ag.getBetas()
array([0., 0., 0., ..., 0., 0., 0.])
Atom
instances¶You can get a handle for specific atoms by indexing
a0 = ag[0]
print(a0)
Atom N (index 0)
Note that getSomething
and setSomething
methods are available, but that thing is in singular form:
a0.getName()
'N'
a0.getBeta() # we had just set it to zero
0.0
Slicing an AtomGroup
will return a handler called Selection
for specified subset of atoms. Let's get every other atom:
eoa = ag[::2] # eoa: every other atom
print(eoa)
Selection 'index 0:7953:2'
print(len(eoa))
print(len(ag))
3977 7953
get
and set
methods in plural form are also available for Selection
instances
eoa.getNames()
array(['N', 'C', 'CB', ..., 'O', 'O', 'O'], dtype='<U6')
Macromolecules have a hierarchical structure:
ag.numChains()
2
ag.numResidues()
1233
showProtein(ag)
You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
jupyter labextension install jupyterlab_3dmol
<py3Dmol.view at 0x7f4560066f10>
for ch in ag.iterChains():
print('%s - %d residues' % (str(ch), ch.numResidues()))
Chain A - 688 residues Chain B - 545 residues
for ch in ag.iterChains():
print(ch)
for res in ch:
print(' |-%s' % res)
break
print('...')
Chain A |-PRO 4 ... Chain B |-ILE 5 ...
Also iterAtoms
(default), iterBonds
, iterCoordsets
, iterFragments
, iterResidues
, iterSegments
Index with chain identifier to get a chain instance:
chA = ag['A']
print(chA)
type(chA)
Chain A
prody.atomic.chain.Chain
Index with a pair of chain identifier and residue number to get a handle for a residue:
chA_res10 = ag['A', 10]
print(chA_res10)
type(chA_res10)
VAL 10
prody.atomic.residue.Residue
Of course, plural forms of get
and set
methods and other methods are available:
chA_res10.getNames()
array(['N', 'CA', 'C', 'O', 'CB', 'CG1', 'CG2'], dtype='<U6')
chA_res10.numAtoms()
7
%%html
<div id="resname" style="width: 500px"></div>
<script>
$('head').append('<link rel="stylesheet" href="http://bits.csb.pitt.edu/asker.js/themes/asker.default.css" />');
var divid = '#resname';
jQuery(divid).asker({
id: divid,
question: "What is residue 12 of 3ERK?",
answers: ["VAL","ALA","PHE","ARG"],
server: "http://bits.csb.pitt.edu/asker.js/example/asker.cgi",
charter: chartmaker})
$(".jp-InputArea .o:contains(html)").closest('.jp-InputArea').hide();
</script>
The atom selection engine is what makes ProDy a powerful tool. Selection grammar is similar to that of VMD but with added flexibility of Python. AtomGroup.select
method accepts selection arguments and returns a Selection
instance:
sel = ag.select('protein and name CA')
print(sel)
print('# of atoms: %d' % sel.numAtoms())
set(sel.getNames())
Selection 'protein and name CA' # of atoms: 925
{'CA'}
Same as using the keyword 'ca' or 'calpha'
ag.select('ca') == ag.select('calpha') == ag.select('protein and name CA')
True
You can make proximity based selections:
kinase = parsePDB('3erk')
bindingsite = kinase.select('within 5 of (hetero and not water)')
print('# of atoms: %d' % bindingsite.numAtoms())
print(set(bindingsite.getResnames()))
# of atoms: 98 {'SB4', 'GLN', 'LEU', 'ASP', 'CYS', 'LYS', 'VAL', 'ILE', 'MET', 'SER', 'ALA', 'HOH'}
exwithin
excludes the initial selection
noligand = kinase.select('exwithin 5 of (hetero and not water)')
print('# of atoms: %d' % noligand.numAtoms())
print(set(noligand.getResnames()))
# of atoms: 73 {'GLN', 'LEU', 'ASP', 'CYS', 'LYS', 'VAL', 'ILE', 'MET', 'SER', 'ALA', 'HOH'}
%%html
<div id="withincnt" style="width: 500px"></div>
<script>
$('head').append('<link rel="stylesheet" href="http://bits.csb.pitt.edu/asker.js/themes/asker.default.css" />');
var divid = '#withincnt';
jQuery(divid).asker({
id: divid,
question: "How many non-water protein atoms are there within 5A of SB4?",
answers: ["67","70","73","78"],
server: "http://bits.csb.pitt.edu/asker.js/example/asker.cgi",
charter: chartmaker})
$(".jp-InputArea .o:contains(html)").closest('.jp-InputArea').hide();
</script>
Words used within the selection string that are not reserved keywords can be substituted with keyword arguments.
Select atoms that are close to a point in space:
import numpy as np
origin = np.zeros(3)
print(origin)
sel = ag.select('within 10 of origin', origin=origin)
sel
[0. 0. 0.]
<Selection: 'index 3426 to 3437 3439 to 3461' from 1vrt (35 atoms)>
calcDistance(sel, origin)[:5]
array([9.93856378, 9.16952812, 9.08634382, 9.79169985, 8.45576029])
ag.select('within 5 of center',center = calcCenter(ag))
<Selection: 'index 4457 to 4...49 to 7353 7941' from 1vrt (19 atoms)>
Dot operator can also be used to make selections:
ag.calpha
<Selection: 'calpha' from 1vrt (925 atoms)>
ag.name_CA_and_resname_ALA
<Selection: 'name CA and resname ALA' from 1vrt (37 atoms)>
See full documentation of selection grammar at: http://prody.csb.pitt.edu/manual/reference/atomic/select.html
AtomGroup
can handle multiple models in a PDB file. Models are called coordinate sets.
ubi = parsePDB('2k39')
ubi.numCoordsets()
116
Active coordinate set (ACS) can be queried or changed using getACSIndex
and setACSIndex
methods:
ubi.getACSIndex()
0
Coordinates are numpy
arrays
coords = ubi.getCoords()
coords.mean(axis=0)
array([26.02865475, 25.60081316, 20.56833875])
ubi.setACSIndex(115)
ubi.getCoords().mean(axis=0)
array([26.49875061, 26.09516897, 20.5057498 ])
Selection
s have their own active coordinate set indices:
ubi_ca = ubi.calpha
ubi_ca.getACSIndex()
115
ubi_ca.setACSIndex(0)
print(ubi_ca.getACSIndex())
print(ubi.getACSIndex())
0 115
Call copy
method to make a copy of AtomGroup
or Selection
instance
ag.copy()
<AtomGroup: 1vrt (7953 atoms)>
ca = ag.select('ca')
ca.copy()
<AtomGroup: 1vrt Selection 'ca' (925 atoms)>
AtomGroup
addition¶Lot's of customization has gone into ProDy classess handling atoms. Addition of AtomGoup
s (__add__
) results in a new AtomGroup
:
ag_copy = ag.copy()
moveAtoms(ag_copy, by=np.array([50, 50, 50]))
ag_copy['A'].setChid('C')
ag_copy['B'].setChid('D')
ag_new = ag + ag_copy
showProtein(ag_new)
You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
jupyter labextension install jupyterlab_3dmol
<py3Dmol.view at 0x7f4560113be0>
AtomGroup
membership¶You can use in
to see whether a Selection
is in another one (enabled by implementing Selection.__contains__()
method):
ag.calpha in ag.backbone
True
ag.backbone in ag.protein
True
ag.water in ag.protein
False
ag.water in ag
True
Selection
operations¶You can use bitwise operations on selections as follows:
chA = ag.chain_A
chB = ag.chain_B
print(chA & chB) # intersection
None
sel = chA | chB # union
sel.getSelstr()
'(chain A) or (chain B)'
~chA #not
<Selection: 'not (chain A)' from 1vrt (3461 atoms)>
AtomGroup
¶You can store arbitrary atomic data in AtomGroup
class and make it persistent on disk:
atoms = parsePDB('1p38')
atoms.getResnums()
array([ 4, 4, 4, ..., 771, 773, 776])
resnum_fract = atoms.getResnums() / 10.
resnum_fract
array([ 0.4, 0.4, 0.4, ..., 77.1, 77.3, 77.6])
atoms.setData('resnumfract', resnum_fract)
atoms.getData('resnumfract')
array([ 0.4, 0.4, 0.4, ..., 77.1, 77.3, 77.6])
saveAtoms(atoms)
'1p38.ag.npz'
atoms = loadAtoms('1p38.ag.npz')
atoms.getData('resnumfract')
array([ 0.4, 0.4, 0.4, ..., 77.1, 77.3, 77.6])
Can request multiple files at once. Will download to local directory.
fetchPDB('1p38', '1r39', '@!~#') # searches working directory, local resources, then downloads
['/home/anupam06/Documents/mmbios_workshop/day2/1p38.pdb.gz', '/home/anupam06/Documents/mmbios_workshop/day2/1r39.pdb.gz', None]
fetchPDBviaHTTP('1p38')
fetchPDBviaFTP('1p38')
p38 = parsePDB('1p38')
bound = parsePDB('1zz2')
showProtein(p38, bound)
You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
jupyter labextension install jupyterlab_3dmol
<py3Dmol.view at 0x7f45600d7e50>
matchChains
tries to match chains by sequence. Returns a list of all pairwise matches
apo_chA, bnd_chA, seqid, overlap = matchChains(p38, bound)[0]
print(bnd_chA)
print(seqid)
print(overlap)
AtomMap Chain A from 1zz2 -> Chain A from 1p38 99.5475113122172 92.08333333333333
Match is alpha carbon only
print(len(apo_chA),len(bnd_chA),len(p38),len(bound))
337 337 2962 2872
set(apo_chA.getNames())
{'CA'}
calcRMSD(apo_chA, bnd_chA)
72.93023086946586
bnd_chA, transformation = superpose(bnd_chA, apo_chA)
calcRMSD(bnd_chA, apo_chA)
1.8628014908695485
showProtein(p38, bound)
You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
jupyter labextension install jupyterlab_3dmol
<py3Dmol.view at 0x7f4560090be0>
PDB files often contain monomer coordinates, when the structure of the multimer can be obtained using symmetry operations. Let’s use some ProDy function to build tetrameric form of KcsA potassium channel:
# parse the monomer structure and PDB file header information
# the header includes transformations for forming tetramer
monomer, header = parsePDB('1k4c', header=True)
monomer
<AtomGroup: 1k4c (4534 atoms)>
showProtein(monomer, legend=True)
You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
jupyter labextension install jupyterlab_3dmol
<py3Dmol.view at 0x7f456119ca60>
without_K = monomer.not_name_K
without_K
<Selection: 'not name K' from 1k4c (4527 atoms)>
tetramer = buildBiomolecules(header, without_K)
tetramer
<AtomGroup: 1k4c Selection 'not name K' biomolecule 1 (18108 atoms)>
showProtein(tetramer)
You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
jupyter labextension install jupyterlab_3dmol
<py3Dmol.view at 0x7f456119cc10>