BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//talks.bham.ac.uk//v3//EN
BEGIN:VEVENT
CATEGORIES:Artificial Intelligence and Natural Computation se
minars
SUMMARY:One-shot Learning of Poisson Distributions - Infor
mation Theory of Audic-Claverie Statistic for Anal
ysing cDNA Arrays - Dr Peter Tino\, School of Comp
uter Science\, University of Birmingham
DTSTART:20100308T160000Z
DTEND:20100308T170000Z
UID:TALK312AT
URL:/talk/index/312
DESCRIPTION:It is of utmost importance for biologists to be ab
le to analyse patterns of expression levels of sel
ected genes in different tissues possibly obtained
under different conditions or treatment regimes.
Even subtle changes in gene expression levels can
be indicators of biologically crucial processes su
ch as cell differentiation and cell specialisation
. Measurement of gene expression levels can be per
formed either via hybridisation to microarrays\, o
r by counting gene tags (signatures) using e.g. Se
rial Analysis of Gene Expression (SAGE) or Massive
ly Parallel Signature Sequencing (MPSS) methodolog
ies.\n\nThe SAGE procedure results in a library of
short sequence tags\, each representing an expres
sed gene. The key assumption is that every mRNA co
py in the tissue has the same chance of ending up
as a tag in the library. Selecting a specific tag
from the pool of transcripts can be approximately
considered as sampling with replacement. The key s
tep in many SAGE studies is identification of `int
eresting' genes\, typically those that are differe
ntially expressed under different conditions/treat
ments. This is done by comparing the number of spe
cific tags found in the two SAGE libraries corresp
onding to different conditions or treatments.\n\nA
udic and Claverie were among the first to systemat
ically study the influence of random fluctuations
and sampling size on the reliability of digital ex
pression profile data. For a transcript representi
ng a small fraction of the library and a large num
ber N of clones\, the probability of observing x t
ags of the same gene will be well-approximated by
the Poisson distribution parametrised by its mean
(and variance) m>0\, where the unknown parameter m
signifies the number of transcripts of the given
type (tag) per N clones in the cDNA library.\n\nWh
en comparing two libraries\, it is assumed that un
der the null hypothesis of not differentially expr
essed genes the tag count x in one library comes f
rom the same underlying Poisson distribution as th
e tag count y in the other library. However\, each
SAGE library represents a single measurement only
! From a purely statistical standpoint\, resolving t
his issue is potentially quite problematic. One ca
n be excused for being rather sceptical about how
much can actually be learned about the underlying
unknown Poisson distribution from a single observa
tion.\n\nThe key instrument of the Audic-Claverie
approach is a distribution P over tag counts y in
one library informed by the tag count x in the oth
er library\, under the null hypothesis that the ta
g counts are generated from the same but unknown P
oisson distribution. P is obtained by Bayesian ave
raging (infinite mixture) of all possible Poisson
distributions with mixing proportions equal to the
posteriors (given x) under the flat prior over m.
\n\nWe ask: Given that the tag count samples from
SAGE libraries are *extremely* limited\, how usefu
l is the Audic-Claverie methodology really? We r
igorously analyse the A-C statistic P that forms the
backbone of the methodology and represents our kn
owledge of the underlying tag-generating process b
ased on one observation.\n\nWe will show that the
A-C statistic P and the underlying Poisson distrib
ution of the tag counts share the same mode struct
ure. Moreover\, the K-L divergence from the true u
nknown Poisson distribution to the A-C statistic i
s minimised when the A-C statistic is conditioned
on the mode of the Poisson distribution. Most impo
rtantly (and perhaps rather surprisingly)\, the ex
pectation of this K-L divergence never exceeds 1/2
bit! This constitutes a rigorous quantitative arg
ument\, extending the previous empirical Monte Car
lo studies\, that supports the widespread use of the
Audic-Claverie method\, even though\, by their very
nature\, SAGE libraries represent very sparse
samples.\n\nFull paper: http://www.biomedcentral.c
om/1471-2105/10/310/
LOCATION:UG40\, School of Computer Science
CONTACT:Per Kristian Lehre
END:VEVENT
END:VCALENDAR