Understanding a genetic disease with the help of Bioinformatics
proposed by the SIB Swiss Institute of Bioinformatics
Insulin is a protein that allows sugar (glucose) to enter the body's cells
(mainly liver, adipose tissue and skeletal muscle).
This hormone plays a key role in the regulation of glucose
levels in the blood ('hypoglycemic' effect). It is produced by the
beta cells in the pancreas.
Type I diabetes (insulin-dependent; IDDM) is most often due to the absence
of insulin: for poorly understood reasons (virus, ...), the pancreas is no
longer able to produce the hormone.
Type II diabetes (non-insulin-dependent; NIDDM) is a metabolic disease
(insulin resistance). Obesity is
thought to be the primary cause of type II diabetes in people who are
genetically predisposed to the disease.
A very rare genetic variation - rs121908261 - leads to the production of a
non-functional insulin and is the cause of type I diabetes in a Norwegian
family (Molven et al., 2008). This workshop will explore how bioinformatics
can help to better understand the causes of this rare genetic disorder
... and also to learn more
Activity 1: The insulin gene and the human genome
Below is a piece of the gene sequence that encodes the insulin protein
- On which of our 23 chromosomes is this gene located?
Use the tool 'BLAT'
Technical information: 'BLAT' is a bioinformatics tool for
comparing a DNA sequence against the whole genome sequence (the human
genome has 3 billion nucleotides). If the sequence exists, BLAT finds the
sequence that is the most similar in just a few seconds. It's a bit like a
small 'google map' of the human genome.
* Copy the DNA sequence and paste it in the tool 'BLAT'
* Click on 'submit'
* In the page 'BLAT Search Result': choose the best score and click
- On which chromosome is the gene for insulin located?
- What are the beginning and end positions of the sequence on the
chromosome (nucleotide 'numbers')?
- For fun: write a random sequence (about 30 letters), always using
the 4-letter alphabet (a, t, g, c) into 'BLAT':
can you find it in the genome?
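Why a random 30-letter sequence is so unlikely to be found can be estimated with a little arithmetic. The sketch below is only a rough model: it assumes that the four letters are equally likely and independent of each other, which is not true of a real genome, and the function names are made up for this illustration.

```python
import random

GENOME_LENGTH = 3_000_000_000  # approximate size of the human genome, as stated above


def random_dna(length, rng=random):
    """Return a random sequence over the 4-letter alphabet a, t, g, c."""
    return "".join(rng.choice("atgc") for _ in range(length))


def expected_exact_hits(query_length, genome_length=GENOME_LENGTH):
    """Expected number of exact matches of a random query on one strand,
    assuming independent, equally likely letters (a simplification)."""
    return genome_length / 4 ** query_length


if __name__ == "__main__":
    print(random_dna(30))           # a sequence you could paste into BLAT
    print(expected_exact_hits(30))  # far below 1: a chance match is very unlikely
```

Under these assumptions there are 4^30 possible 30-letter sequences, far more than the 3 billion positions in the genome, so a random query is not expected to match by chance.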
Activity 2: Comparing DNA sequences - Diagnosing a rare genetic disease
About 1 nucleotide in 1000 differs from one person to another, and from one
genome to another. These differences are called variations or mutations.
Some have no effect on a person, while others may be associated with genetic
diseases.
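To get a feeling for what '1 nucleotide in 1000' means at the scale of a whole genome, here is a small back-of-the-envelope calculation. It only uses the two approximate figures given in this workshop (3 billion nucleotides, 1 difference per 1000); the function name is hypothetical.

```python
GENOME_LENGTH = 3_000_000_000  # nucleotides in the human genome (approximate)
ONE_DIFFERENCE_PER = 1000      # about 1 nucleotide in 1000 differs


def expected_differences(genome_length=GENOME_LENGTH, per=ONE_DIFFERENCE_PER):
    """Rough number of positions that differ between two genomes."""
    return genome_length // per


if __name__ == "__main__":
    print(expected_differences())
```

With these round numbers, two people differ at roughly 3 million positions, which is why finding the one variation that matters requires comparing sequences carefully.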
In 2008, scientists studied a Norwegian family in which several members had
diabetes (type I or type II) (Molven et al., 2008).
All family members with type I diabetes carry the same rare variation in
the gene that encodes insulin.
Here is the family's pedigree (phenotype and family relationship)
- Does this baby carry the mutation associated with diabetes?
To answer this question, researchers extracted DNA from 8 members of the
Norwegian family and sequenced part of the gene that encodes insulin.
Compare these sequences, and locate the variation shared by the family
members with diabetes.
You can do it manually, which will help you better understand the principle
of sequence comparison and alignment.
Take into account all the given clues and play with our strips of DNA
Build an alignment of these 8 sequences using a bioinformatics tool and look
out for the common variation among those with diabetes
* Copy these 8 sequences (including the lines starting with '>1') and
paste them into the 'Align' tool
* Click on 'Run Align'
* On the results page, in the left-hand column 'Highlight': select
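The idea behind looking for a common variation in an alignment can be sketched in a few lines of Python. This is a simplified illustration, not what the alignment tool does internally: it assumes the sequences are already aligned and of equal length, and the toy sequences, labels and function names below are invented for the example (they are not the sequences of the Norwegian family).

```python
def variable_columns(aligned):
    """Return the 0-based columns where the aligned sequences are not all identical."""
    sequences = list(aligned.values())
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must already be aligned to the same length")
    return [i for i in range(length) if len({seq[i] for seq in sequences}) > 1]


def columns_specific_to(aligned, affected):
    """Return (column, other_letters, shared_letter) for each column where all
    affected sequences share one letter that no other sequence carries."""
    hits = []
    for i in variable_columns(aligned):
        affected_letters = {aligned[name][i] for name in affected}
        other_letters = {seq[i] for name, seq in aligned.items() if name not in affected}
        if len(affected_letters) == 1 and affected_letters.isdisjoint(other_letters):
            hits.append((i, "".join(sorted(other_letters)), next(iter(affected_letters))))
    return hits


# Toy data: hypothetical labels, NOT the sequences of the Norwegian family.
TOY_ALIGNMENT = {
    "person_a": "aagacctgccgggag",
    "person_b": "aagacccgccgggag",
    "person_c": "aagacctgccgggag",
    "person_d": "aagacccgccgggag",
}
TOY_AFFECTED = {"person_a", "person_c"}

if __name__ == "__main__":
    for column, others, shared in columns_specific_to(TOY_ALIGNMENT, TOY_AFFECTED):
        print(f"column {column + 1}: {others} -> {shared}")
```

A real alignment tool also has to handle insertions and deletions, which this sketch deliberately ignores.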
For those who are curious:
.... additional information on the family (Molven et al., 2008)
The subject (1) with the c -> t (R55C) mutation (heterozygous mutation) is
a girl who presented with frank diabetes at the early age of 10. Her blood
glucose level was 17.6 mmol/l, which is very high. The girl’s mother (3)
has type I diabetes that was diagnosed when she was 13. Currently, she is
being treated with insulin. She also carries the heterozygous mutation. The
girl’s maternal grandfather (6) has type II diabetes, which was diagnosed at
the age of 40. He is currently being treated with insulin. Neither he nor
the healthy maternal grandmother carries the mutation. Thus, the girl’s
mother is carrying a de novo mutation, which must be a germline mutation
since it has been inherited by her daughter.
C-peptides, or connecting peptides, are the peptides which connect the
insulin’s A chain to the B chain. Both carriers (mother and daughter) of the
c -> t (R55C) mutation have C-peptide levels in the normal range, which
suggests that some insulin is being processed and secreted. Currently, no
one really knows why the mother and daughter have severe insulin deficiency
despite evidence of insulin secretion.
of the gene
for insulin (with the c -> t (R55C) mutation site
.... The list of known variants (in red) in the insulin gene; note that many
variations are neither pathogenic nor associated with diabetes.
Activity 3: DNA translation -> protein
Check the effect of the mutation c -> t (R55C)
Like all proteins, insulin is composed of a sequence of amino acids. The
order of the amino acids is determined by the nucleic acid sequence of the
gene.
Three letters of DNA (a codon) correspond to one amino acid (symbolized by
a letter: K for lysine, M for methionine, etc.).
This is a piece of the DNA sequence of the normal insulin gene.
aag acc cgc cgg gag
This is a piece of the DNA sequence of the insulin gene with the c -> t
variation, associated with type I diabetes.
aag acc tgc cgg gag
You could manually translate the nucleic acid sequences into amino acid
sequences (1-letter code) using the genetic code below:
- Does the c->t mutation change the amino acid
sequence of insulin?
- Does the aag -> aaa mutation change the amino
acid sequence of insulin?
You can also use the bioinformatics tool 'Translate'
The c -> t mutation in the insulin gene leads to the replacement of the
amino acid R (arginine, cgc codon) by the amino acid C (cysteine, tgc codon)
at position 55.
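The translation step can also be sketched in Python. The codon table below is deliberately partial: it contains only the codons needed for the two DNA fragments shown in this activity (plus aaa and cgt for comparison), taken from the standard genetic code, and the function name is made up for this illustration.

```python
# Partial codon table (standard genetic code), limited to the codons used here.
CODON_TABLE = {
    "aag": "K",  # lysine
    "aaa": "K",  # lysine
    "acc": "T",  # threonine
    "cgc": "R",  # arginine
    "cgg": "R",  # arginine
    "cgt": "R",  # arginine
    "tgc": "C",  # cysteine
    "gag": "E",  # glutamic acid
}


def translate(dna):
    """Translate a DNA fragment, read codon by codon from its first letter."""
    dna = dna.replace(" ", "").lower()
    usable = len(dna) - len(dna) % 3  # ignore an incomplete last codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    print(translate("aag acc cgc cgg gag"))  # normal fragment
    print(translate("aag acc tgc cgg gag"))  # fragment with the c -> t variation
    print(translate("aaa acc cgc cgg gag"))  # fragment with aag -> aaa
```

Comparing the three results shows that the c -> t change replaces one amino acid (R by C), whereas aag -> aaa leaves the protein fragment unchanged because both codons stand for lysine.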
This change prevents the insulin protein from being 'cut', a process which
is essential for insulin to be functional (Molven et al., 2008).
Insulin is cleaved by an enzyme called
The cleavage site recognized by insulinase is very specific: a change in the
amino acid sequence of the cleavage site (such as the R55C mutation) will
prevent the protease from cutting the protein at that site.
Activity 4: 3D structure of insulin
Since 1958, researchers have been able to crystallize proteins and then
'take a picture' of them by using X-rays. The results of these experiments
are then analyzed using bioinformatic programs which make it possible to view
the 3D structure of proteins such as insulin.
View the 3D structure of insulin
* Go to the PDB entry
* Select the 3D viewer 'Protein Workshop'.
A Jmol application will be launched and you will be asked to accept it.
Jmol is a viewer for chemical structures in 3D.
The Jmol application requires Java to be installed on your computer. Both
programs are free.
* In Shortcuts: Recolor the backbone 'By compound' - and then look at the
positions of the different amino acids (mouse over)
* In Tools: 'Surfaces' play with the Transparency slider
* In Tools: 'Visibility', 'atoms and bonds', click on 'Chain A: Insulin' and
see the atoms (balls and sticks) that are displayed
* In Option: Reset - to go back to the original image
!!! If Java is not working: open the entry
at ePDB (EBI): click on '3D Visualisation'
For fun, here are the raw experimental data, the
spatial coordinates (X, Y, Z) of every atom in each amino acid of insulin
(search ATOM in the page)
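For the curious, the lines starting with ATOM follow a fixed-column layout, so the coordinates can be read by position in the line. The sketch below assumes the classic PDB text format (X in columns 31-38, Y in 39-46, Z in 47-54); the helper that builds an example line and the coordinates it uses are invented for illustration and are not taken from the insulin entry.

```python
def make_atom_line(serial, name, res_name, chain, res_seq, x, y, z):
    """Build an illustrative ATOM line using the fixed-column PDB layout."""
    return (f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


def parse_atom_line(line):
    """Extract the atom name, residue, chain and X, Y, Z from one ATOM line."""
    return {
        "atom": line[12:16].strip(),
        "residue": line[17:20].strip(),
        "chain": line[21],
        "residue_number": int(line[22:26]),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
    }


def atoms(pdb_text):
    """Parse every line of a PDB-format text that starts with ATOM."""
    return [parse_atom_line(line) for line in pdb_text.splitlines()
            if line.startswith("ATOM")]


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    example = make_atom_line(1, "N", "GLY", "A", 1, -8.901, 4.127, -0.555)
    print(example)
    print(atoms(example))
```

Reading by column position rather than splitting on spaces matters because neighbouring fields in real files can touch each other without any space in between.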
There is no 3D structure data for insulin with the R55C mutation.
The amino acid sequence of a protein determines its shape and its function.
Here is a gallery of pictures which will give you an idea of the relative
sizes and shapes of different proteins (enlarged x 3,000,000) (pdf)
* Find the insulin among the different proteins and compare its size with
Activity 5: Is insulin specific to humans?
This is the full amino acid sequence of human insulin (in
- Is this protein specific to humans?
Do a 'BLAST' against a database of proteins called UniProtKB
Technical information: BLAST is a bioinformatics tool that
compares the sequence of a protein with millions of other sequences
contained in a database. If they exist, it finds those that resemble a
given sequence the most within a few seconds. We can thus find out quickly
whether a protein exists in a given species, or not.
* Copy the sequence and paste it into the tool 'BLAST'
* Select 'Target Database = UniProtKB/Swiss-Prot'
* Click on 'Run BLAST'
* Check the conservation of amino acids ('View alignment') and the
conservation of the disulfide bonds ('Highlight' 'disulfide bond', when
* Search on Google for images corresponding to the different Latin names of
the species (example 'Octodon
According to Wikipedia, insulin is a very old protein that may have
originated one billion years ago.
Apart from animals, insulin-like proteins are also known to exist in the
Fungi and Protista kingdoms.
* Select 'Target Database = ...Nematoda' or 'Target Database = ...Arthropoda'
Starting from the following set
from different species
* Select different species (mammals, birds, fish; include
* Do a multiple alignment (Align
* Result page: 'Highlight Annotation
' and 'Natural
- look at the cysteine conservation (involved in disulfide bonds)
- look at the conservation of the R55 amino acid.
- look at the 2 major conserved regions which correspond
to the A and B chains of mature insulin, respectively.
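What 'conservation' means in an alignment can be made concrete with a short Python sketch. It compares two already aligned sequences of equal length, counts identical positions, and lists the positions where every sequence carries a cysteine (C). The two A-chain sequences below (human and bovine) are quoted as commonly published and should be treated as an assumption to check against UniProtKB; the function names are hypothetical.

```python
# Insulin A chains, quoted as an assumption; verify against UniProtKB before reuse.
HUMAN_A_CHAIN = "GIVEQCCTSICSLYQLENYCN"
BOVINE_A_CHAIN = "GIVEQCCASVCSLYQLENYCN"


def percent_identity(a, b):
    """Percentage of positions carrying the same letter in two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must already be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def conserved_positions(sequences, residue="C"):
    """1-based positions where every sequence carries the given residue."""
    return [i + 1 for i, column in enumerate(zip(*sequences))
            if all(letter == residue for letter in column)]


if __name__ == "__main__":
    print(round(percent_identity(HUMAN_A_CHAIN, BOVINE_A_CHAIN), 1))
    print(conserved_positions([HUMAN_A_CHAIN, BOVINE_A_CHAIN]))
```

The same two functions can be applied to any set of aligned sequences copied from the alignment result page, for example to check whether the R at position 55 is kept across species by calling conserved_positions with residue="R".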
Introduction to phylogeny
You can also compare the insulin sequences of different species and sketch
a phylogenetic tree with PhiloPhylo (in French)
Activity 6: www.chromosomewalk.ch
is a virtual exhibition to (re)discover the world of genes, proteins and
From the list of human chromosomes: search for 'insulin'
Find out if you are a real expert: quiz
- On which chromosome is the insulin gene located?
- What is the size of the chromosome (number of nucleotides and
length in centimeters)?
- How many genes are there on this chromosome?
- Some additional information on insulin (information)