Epidemic Spreading in Real Networks An Eigenvalue Viewpoint, Hacking and IT E-Book Dump Release

[ Pobierz całość w formacie PDF ]
Epidemic Spreading in Real Networks: An Eigenvalue Viewpoint
Yang Wang, Deepayan Chakrabarti, Chenxi Wang
, Christos Faloutsos
†
Carnegie Mellon University
5000 Forbes Avenue, Pittsburgh, PA, 15213
yangwang, deepayan, chenxi, christos@andrew.cmu.edu
Abstract
1. Introduction
How will a virus propagate in a real network?
Does an epidemic threshold exist for a nite power-
law graph, or any nite graph? How long does it
take to disinfect a network given particular values
of infection rate and virus death rate?
We answer the rst question by providing equa-
tions that accurately model virus propagation in
any network including real and synthesized network
graphs. We propose a general epidemic thresh-
old condition that applies to arbitrary graphs: we
prove that, under reasonable approximations, the
epidemic threshold for a network is closely related
to the largest eigenvalue of its adjacency matrix.
Finally, for the last question, we show that infec-
tions tend to zero exponentially below the epidemic
threshold.
We show that our epidemic threshold model
subsumes many known thresholds for special-case
graphs (e.g., Erdos-Renyi, BA power-law, homoge-
neous); we show that the threshold tends to zero for
innite power-law graphs. Finally, we illustrate the
predictive power of our model with extensive experi-
ments on real and synthesized graphs. We show that
our threshold condition holds for arbitrary graphs.
Computer viruses remain a signicant threat to
today’s networks and systems. Existing defense
mechanisms typically focus on local scanning of
virus signatures. While these mechanisms can de-
tect and prevent the spreading of known viruses,
they do little for globally optimal defenses. The
recent proliferation of malicious code that spreads
with virus code exacerbates the problem [10, 24, 25].
From a network dependability standpoint, the prop-
agation of malicious code represents a particular
form of fault propagation, which may lead to the ul-
timate demise of the network (consider distributed
denial-of-service attacks). With the exception of a
few specialized modeling studies [7, 8, 16, 19, 26],
much still remains unknown about the propagation
characteristics of computer viruses and the factors
that inuence them.
In this paper, we investigate epidemiological
modeling techniques to reason about computer vi-
ral propagation. Kephart and White [7, 8] are
among the rst to propose epidemiology-based an-
alytic models. Their studies, however, are based
on topologies that do not represent modern net-
works. Staniford et al. [23] reported a study of the
Code Red worm propagation, but did not attempt
to create an analytic model. The more recent stud-
ies by Pastor-Satorras et al. [16, 17, 18, 19, 20] and
Barabasi et al. [2, 4] focused on epidemic models
for power-law networks.
This work aims to develop a general analytic
model of virus propagation. Specically, we are in-
terested in models that capture the impact of the
underlying topology but are not limited by it. We
found that, contrary to prior beliefs, viral propaga-
tion is largely determined by intrinsic characteris-
tics of the network. Our model holds for arbitrary
graphs and renders surprisingly simple but accurate
predictions.
This work is partially supported by the National Science
Foundation under Grant No. CCR-0208853 and a grant from
NIST.
†
This work is partially supported by the National Science
Foundation under Grants No. IIS-9817496, IIS-9988876, IIS-
0083148, IIS-0113089, IIS-0209107 IIS-0205224 by the Penn-
sylvania Infrastructure Technology Alliance (PITA) Grant
No. 22-901-0001, and by the Defense Advanced Research
Projects Agency under Contract No. N66001-00-1-8936.
The layout of this paper is as follows: section 2
gives a background review of previous models. In
section 3, we describe our proposed model. We
show that our model conforms better to simulation
results than previous models over real networks. In
section 4, we revisit the issue of epidemic threshold
and present a new theory for arbitrary graphs—the
epidemic threshold of a given network is related in-
trinsically to the rst eigenvalue of its adjacency
matrix. We summarize in section 6.
follow a power law structure instead. Computer
viruses, therefore, are likely to propagate among
nodes with a high variance in connectivity.
Pastor-Satorras and Vespignani studied epidemic
spread for power-law networks where the connec-
tivity distribution is characterized as P (k) = k
−
(P (k) is the probability that a node has k links)
[14, 16, 18, 19]. Power-law networks have a highly
skewed connectivity distribution and for certain val-
ues of
resemble the Internet topology [6]. Pastor-
Satorras et al. developed an analytic model (we
refer to their model as the SV model) for the
Barabasi-Albert (BA) power-law topology (
= 3).
Their steady state prediction is,
2. Earlier models and their limitations
The class of epidemiological models that is most
widely used is the so-called homogeneous mod-
els [1, 11]. A homogeneous model assumes that
every individual has equal contact to everyone else
in the population, and that the rate of infection is
largely determined by the density of the infected
population. Kephart and White adopted a mod-
ied homogeneous model in which the communi-
cation among individuals is modeled as a directed
graph [7]: a directed edge from node i to node j
denotes that i can directly infect j. A rate of in-
fection, called the birth rate,
, is associated with
each edge. A virus curing rate,
, is associated with
each infected node.
If we denote the infected population at time t
as
t
, a deterministic time evolution of
t
in the
Kephart-White model (hereafter referred to as the
KW model) can be represented as
= 2e
−
/m
(3)
where m is the minimum connection in the net-
work. The SV model, however, depends critically
on the assumption
= 3, which does not hold for
real networks [9, 6]. This model yields less than
accurate predictions for networks that deviate from
the BA topology, as we will show later in the pa-
per. Pastor-Satorras et al. [18] also proposed an
epidemic threshold condition
k
k
2
SV
=
(4)
wherekis the expected connectivity andk
2
sig-
nies the connectivity divergence.
Following [19], Boguna and Satorras studied epi-
demic spreading in correlated networks where the
connectivity of a node is related to the connectiv-
ity of its neighbors [3]. These correlated networks
include Markovian networks where, in addition to
P (k), a function P (k|k
′
) determines the probability
that a node of degree k is connected to a node of
degree k
′
.
While some degree of correlations may exist in
real networks, it is often di
cult to character-
ize connectivity correlations with a simple P (k|k
′
)
function. Indeed, prior studies on real networks
[6, 15] have not found any conclusive evidence to
support the type of correlation as dened in [3].
Hence, we will not discuss models for correlated
networks further in this paper.
We present a new analytic model that does not
assume any particular propagation topology. We
will show later that our model subsumes previous
models that are tailored to t special-case graphs
(homogeneous, BA power-law, etc.).
d
t
dt
=
k
t
(1−
t
)−
t
(1)
wherekis the average connectivity. The steady
state solution for Equation 1 is
= 1−
/(
k)∗N ,
where N is the total number of nodes.
An important prediction of Equation 1 is the no-
tion of epidemic threshold. An epidemic threshold,
, is the critical
/
ratio beyond which epidemics
ensue. In a homogeneous or Erdos-Renyi network,
the epidemic threshold is,
1
k
hom
=
(2)
wherekis the average connectivity [7].
These earlier models provide a good approxima-
tion of virus propagation in networks where the
contact among individuals is su
ciently homoge-
neous. However, there is overwhelming evidence
that real networks (including social networks [21],
router and AS networks [6], and Gnutella overlay
graphs [22]) deviate from such homogeneity—they
3. The proposed model
Note that the third bullet above is due to poten-
tially concurrent curing and infection events.
We subsequently dene the healthy probability
of a node i at time t, 1−p
i,t
, to be
In this section, we describe a model that does
not assume homogeneous connectivity or any par-
ticular topology. We assume a connected network
G = (N, E), where N is the number of nodes in the
network and E is the set of edges. We assume a
universal infection rate
for each edge connected
to an infected node, and a virus death rate
for
each infected node. Table 1 lists the symbols used.
1−p
i,t
=
(1−p
i,t−1
)
i,t
+
p
i,t−1
i,t
+
1
2
p
i,t−1
(1−
i,t
)
i = 1 . . . N (6)
Note that for the last term on the right hand side
of Equation 6 we assume that the probability that
a curing event at node i takes place after infection
from neighbors is roughly 50%.
Given a network topology and particular values
of
and
, we can solve Equation 6 numerically and
obtain the time evolution of infected population,
t
,
where
t
=
Virus birth rate on a link connected
to an infected node
Virus curing rate on an infected node
t Time stamp
p
i,t
Probability that node i is infected at t
i,t
Probability that node i does not
receive infections from its neighbors at t
t
Infected population at time t
k Average degree of nodes in a network
k
2
Connectivity divergence
P
N
i=1
p
i,t
.
3.2. Experiments
In this section, we present a set of simulation
results. The simulations are conducted to answer
the question—how does our model perform in real,
BA power law, and homogeneous graphs? We use a
real network graph collected at the Oregon router
views
1
. This dataset contains 31180 links among
10900 AS peers. All synthesized power-law graphs
used in this study are generated using BRITE [12].
Unless otherwise specied, each simulation plot is
averaged over 15 individual runs.
We begin each simulation with a set of randomly
chosen infected nodes on a given network topology
2
.
Simulation proceeds in steps of one time unit. Dur-
ing each step, an infected node attempts to infect
each of its neighbors with probability
. In addi-
tion, every infected node is cured with probability
. An infection attempt on an already infected node
has no eect.
Figure 1 shows the time evolution of
as pre-
dicted by our model (see Equation 6) on the 10900-
node Oregon AS graph, plotted against simulation
results and the steady state prediction of the SV
model in Equation 3 (Since the SV model does not
estimate the transients, we plot the steady state
only.) As shown, our model yields a strictly more
precise result than the SV model.
Figure 2 compares the predictions of our model
against the SV model for Barabasi-Albert networks
(see Equation 3). The topology used in Figure 2 is
a synthesized 1000-node BA network. Since the SV
Table 1. Table of Symbols
3.1. Model
Our model assumes discrete time. During each
time interval, an infected node i tries to infect its
neighbors with probability
. At the same time,
i may be cured with probability
. We denote the
probability that a node i is infected at time t as p
i,t
.
We dene
i,t
, the probability that a node i will not
receive infections from its neighbors at time t as,
Y
i,t
=
(p
j,t−1
(1−
) + (1−p
j,t−1
))
j:neighbor of i
Y
=
(1−
∗p
j,t−1
)
(5)
j:neighbor of i
We assume that a node i is healthy at time t if
•i was healthy before t and did not receive in-
fections from its neighbors at t (dened by
i,t
)
OR
•i was infected before t, cured at t and did not
receive infections from its neighbors (dened
by
i,t
) OR
1
2
The number of initially-infected nodes does not aect
the equilibrium of the propagation. It is chosen based on
the particular values of
and
for each run
•i was infected before t, received and ignored
infections from its neighbors, and was subse-
quently cured at t
(a)
(b)
Figure 1. Experiments show the time evolution of infection in an 10900-node power-law network.
Both simulations were performed on an Oregon network graph, with
k= 5.72
and
= 0.14
. In
both cases, our model conforms much closer to the simulation results than the SV model.
model (see Equation 3) is specically tailored for
BA networks, we expect the comparison to be sim-
ply a sanity check. As shown, both models conform
nicely to the simulation results, though our model
appears to be slightly more precise.
cise as the KW model, which is designed specically
for homogeneous networks. In one case where
is
0.2 and
is 0.72, the simulated spreading appears
to follow our prediction more closely than that of
the KW model.
Figure 2. Experiments on BA topology:
shows time evolution of infected popula-
tion in a 1000-node power-law network.
Our model outperforms the SV model in
its steady state prediction.
Figure 3. Experiments on ER topology:
shows time evolution of infected popu-
lation in a 1000-node random Erd os net-
work. Our model generally yields similar
predictions to the KW model, but outper-
forms it when
is high.
Figure 3 shows simulation results of epidemic
spreading on a synthesized 1000-node random net-
work, plotted against the KW model [7] and our
model. The network is constructed according to
the Erdos-Renyi model [5]. Since an Erdos-Renyi
network is su
ciently close to being homogeneous
as far as epidemiological models are concerned, the
results in Figure 3 suggest that our model is as pre-
The experiments we show here, conducted on a
real network, a synthesized BA power-law network,
and an Erdos-Renyi network, illustrate the predic-
tive power of our model—as a general model, it sub-
sumes prior models and produces predictions that
equal or outperform predictions that target specic
topologies.
4. Epidemic threshold and eigenvalues
matrix A of the network.
Theorem 1 (Epidemic Threshold) If an epi-
demic dies out, then it is necessarily true that
As described previously, an epidemic threshold
is a critical state beyond which infections become
endemic. Predicting the epidemic threshold is an
important part of an epidemiological model. The
epidemic threshold of a graph depends fundamen-
tally on the graph itself. The challenge therefore is
to capture the essence of the graph in as few param-
eters as possible. We present one such model here
that predicts the epidemic threshold with a single
parameter—the largest eigenvalue of the adjacency
matrix of the graph—for arbitrary graphs.
We note that previous models have derived
threshold conditions for special-case graphs. For in-
stance, the epidemic threshold for a homogeneous
network is the inverse of the average connectivity,
k. Similarly, the threshold for innite power-law
networks is zero. However, a unifying model for
arbitrary, real graphs has not appeared in the lit-
erature. The closest model thus far is the one put
forth by Pastor-Satorras et al. (see Equation 4).
We show later that their model is not accurate for
arbitrary graphs.
In this section, we describe a general theory for
epidemic threshold that holds for arbitrary graphs.
We observe that the epidemic threshold is a con-
dition linking the virus’ birth and curing rate to
the adjacency matrix of the graph, such that an in-
fection becomes an epidemic if the condition holds,
and dies out if it does not. Our theory is surpris-
ingly simple yet accurate at the same time. We
show later in this section that this new threshold
condition subsumes prior models for special-case
graphs. Table 2 lists the symbols used in this sec-
tion.
1
1,A
, where
is the birth rate,
is the
curing rate and
1,A
is the largest eigenvalue of the
adjacency matrix A.
<
=
Proof: Restating Equation 6,
1−p
i,t
=
(1−p
i,t−1
)
i,t
+
p
i,t−1
i,t
+
1
2
p
i,t−1
(1−
i,t
)
i = 1 . . . N
Rearranging the terms,
1
2
p
i,t−1
+
1
2
−1
1−p
i,t
=
1 +
p
i,t−1
i,t
1
2
p
i,t−1
+ 1 +
1
2
−1
=
p
i,t−1
X
−
p
j,t−1
j
X
=
1 +
p
i,t−1
−p
i,t−1
−
p
j,t−1
(8)
j
This uses the approximation
(1−a)(1−b)≈1−a−b
(9)
when a≪1, b≪1.
We thus have
X
so,
p
i,t
=
(1−
)p
i,t−1
+
p
j,t−1
(10)
j
Converting Equation 10 to matrix notation (P
t
is the column vector (p
1,t
, p
2,t
, . . . , p
N,t
)),
P
t
= ((1−
) I +
A) P
t−1
(11)
Thus, P
t
is of the form
A
Adjacency matrix of the network
P
t
=
SP
t−1
(12)
trA
The transpose of matrix A
S
t
P
0
i,A
The i-th largest eigenvalue of A
=
(13)
u
i,A
Eigenvector of A corresponding to
i,A
where S = (1−
)I +
A.
We call S the system
S
The ‘system’ matrix describing the
equations of infection
matrix.
As we show in Lemma 1 in the Appendix, the
matrices A and S have the same eigenvectors u
i,S
,
and their eigenvalues,
i,A
and
i,S
, are closely re-
lated:
i,S
The i-th largest eigenvalue of S
Table 2. Symbols for eigenvalue analysis
i,S
= 1−
+
i,A
∀i
(14)
Next, we will show that our estimate for the epi-
demic threshold
is
Using the spectral decomposition, we can say
X
S
=
i,S
u
i,S
tr(u
i,S
)
1
1,A
=
(7)
i
X
and, S
t
i,S
u
i,S
tr(u
i,S
)
=
(15)
where
1,A
is the largest eigenvalue of the adjacency
i
[ Pobierz całość w formacie PDF ]

zanotowane.pl

doc.pisz.pl

pdf.pisz.pl

upanicza.keep.pl

Epidemic Spreading in Real ...

Epidemic Spreading in Real Networks An Eigenvalue Viewpoint, Hacking and IT E-Book Dump Release

Archiwum