Longest Common Subsequence
X = <x1, x2, x3, ...,xm>
is a sequence
Z = <z1, z2, z3, ...,zk>
is a subsequence of X if z1 = xi1, z2=
xi2, ...zk = xik
0 < i1 < i2... < ik
£ m
Given sequences X and Y, Z is a common subsequence if Z is a
subsequence to both X and Y. If Z is the longest possible subsequence of
both X and Y then it is the longest common subsequence.
X = < A, G, C, G, T, A, G >
Y = < G, T, C, A, G, A >
a common subsequence is < G, C, A > also < G, C, G, A >, <
G, T, A, G > and < G, C, A, G >, since no common subsequence of length 5
exists there are 3 LCS's: < G, C, G, A > , < G, T, A, G
>
and < G, C, A, G >.
Finding the LCS:
Could enumerate all subsequences of X (length m) ® 2mand
Y (length n) ® 2n must search for
hits and sort by length. This is an exponential algorithm. LCS has an
optimal substructure property based on prefixes. If X = <x1,
...,xm> the Xi = <x1, ...,xi>
is the ith prefix of X and X0 is empty.
Theorem 16.1 ( Optimal substructure for
LCS)
If X = <x1,
...,xm> and If Y =
<y1, ...,yn> are sequences and let Z = <z1,
...,zk> be some LCS of X and Y.
If xm = yn then zk
=
xmand Zk-1 is an LCS of Xm-1 and
Yn-1
If xm ¹ yn
then zk ¹xm
Þ Z is an LCS of Xm-1
and Y
If xm ¹ yn
then zk ¹yn
Þ Z is an LCS of X and
Y n-1
Proof:
If zk ¹xm then we
could add xm = yn to Z to get an LCS of
length k + 1. By contradiction it must be that zk = xm
= yn . |zk-1 | = k - 1 and is an LCS
of Xm-1 and Yn-1 . It is an LCS, if not
then $ W CS of Xm-1 and Yn-1
with | W | > k - 1 and so by appending xm =
yn we get a CS of X and Y of length greater than k, a
contradiction.
If zk ¹xm then z
is a CS of Xm-1 and Y. If $
W
a CS with | W | > k, then W would be a CS of X and Y, a
contradiction
Proof by reversing x and y
This means that to find the LCS of X and Y:
if xm = yn find LCS of
and
If xm ¹ yn
-
find LCS of Xm-1and Y
-
find LCS of X and Yn-1
and take the larger of 'a.' or 'b.'. Thus we start
with small problem,
find LCS and grow our solution:
Let c[i, j] = | W |, W is LCS of Xi and
Yj
c[i, j] = 0
if i*j = 0
=
c[i-1,j-1] + 1,
if i*j
> 0 and xi = yj
=
max (c[i, j-1].
c[i-1, j]) if i,j > 0
and xi ¹
yj
Could use this to write an exponential algorithm via
recursion.
However, there are only Q (m n)
sub-problems, so we use DP. Store c[i, j] and b[i, j] which points
to the
optimal sub-problem chosen. c[m, n] contains the length of the LCS
and
b[m, n] directions used to build it.
LCS-Length ( X, Y )
m ¬ length[X]
n ¬ length[Y]
for i ¬ 1 to m
do c[i,0] ¬ 0
for j ¬ 0 to n
do c[0, j] ¬ 0
for i ¬ 1 to m
do for j ¬ 1 to n
do if xi = yj
then c[i, j] ¬ c[i-1, j-1] + 1
b[i, j] ¬ "\"
else if c[i - 1, j] ³ c[i, j-1]
then c[i, j] ¬ c[i-1, j]
b[i, j] ¬ "Ý"
else c[i, j] ¬ c[i, j-1]
b[i,j] ¬ "¬"
return c and b

Running Time = O (m n) since each table entry
takes O (1)
time.
Note:
"\" means both the same
"Ý" means c[i -
1, j] ³
c[i, j-1]
"¬" means c[i -
1, j] <
c[i, j-1]
The "\" diagonal arrows lengthen the
LCS
Print-LCS (b, X, i, j)
-
If i = 0 or j = 0
then return
if b[i,j] = "\"
then Print-LCS (b, X, i-1,
j-1)
print
xi
else if b[i,j] = "Ý"
then Print-LCS ( b, X, i-1,
j)
else Print-LCS (b, X, i, j-1)
This takes O ( m + n ) since either i or j is
decremented in the
recursion.
|