
# Generalized Multi-Layer Representation of Grayscale Images

## Abstract

Thresholding is one of the earliest concepts developed in the image-processing community, and many researchers have generalized the thresholding problem to multithresholding. However, conventional multithresholding merely increases the number of threshold values used in the process. Here, we redefine multithresholding as computing layers that add up to give successively better approximations of the given image. The paper contains a comprehensive mathematical formulation of the problem and a proper solution to it. Simulation results show the efficiency of the proposed algorithm.

Keywords: image segmentation, multi-layer segmentation, adaptive thresholding.

## 1  Introduction

Thresholding is one of the most fundamental primitives of grayscale image processing. Accordingly, different approaches for threshold selection already exist in the literature [1,2]. Using a priori knowledge about the share of each class in the image [3] or finding the deepest valley in the histogram [4] are two simple examples from the sixties. More recent approaches incorporate edge information into the threshold decision [5] and fit parametric curves to the histogram [6].
A few approaches concern multilevel thresholding. As generalizations of single-level thresholding, multilevel thresholding approaches seek a set of thresholds for converting an image into an index array. Two representative examples of these approaches are [7,8].
In this paper we propose a novel method for multilevel thresholding which results in a multi-layer representation of an image. The proposed method extracts layers from the image that are combined step by step to give successively better approximations of it. To the best of the authors' knowledge, this approach is novel.

## 2  Proposed Method

Assume that $n$ realizations of an arbitrary random variable $X$ are given as $S_X=\{x_1,\ldots,x_n\}$, $x_i \in \mathbb{R}$. A multi-layer segmentation of $S_X$ is a set of binary planes and corresponding multipliers, selected so that together they give a proper approximation of the original signal. Below, we discuss this definition in full detail.
Let us start with single-layer segmentation. For the sequence $S_X$, a single-layer segmentation is a same-sized binary sequence $G^1_X=[g_{11},g_{12},\ldots,g_{1n}]$ together with two real values $r_1$ and $s_1$. The first approximation of $x_i$ is denoted by $\tilde{x}_{1i}$ and is defined as

$$\tilde{x}_{1i} = g_{1i}r_1 + (1-g_{1i})s_1. \tag{1}$$
Equation (1) shows that in the first layer the sequence is quantized into two levels, each assigned a single representative. The single-layer segmentation should be performed so as to produce the least possible distortion. Hence, to obtain proper values of $r_1$, $s_1$, and $g_{1i}$, one should minimize the objective function
$$D_1 = \sum_{i=1}^{n}\left(x_i-\tilde{x}_{1i}\right)^2 = \sum_{i=1}^{n}\left(x_i-g_{1i}r_1-(1-g_{1i})s_1\right)^2. \tag{2}$$
Computing the derivative of $D_1$ with respect to $r_1$ gives

$$\frac{\partial D_1}{\partial r_1} = -2\sum_{i=1}^{n} g_{1i}\left(x_i-g_{1i}r_1-(1-g_{1i})s_1\right). \tag{3}$$
Using the fact that $g_{1i}(1-g_{1i})$ is always zero and $g_{1i}^2=g_{1i}$, we get

$$r_1 = E_{g_{1i}=1}\{x_i\}. \tag{4}$$
Similarly, computing $\partial D_1/\partial s_1$ gives

$$s_1 = E_{g_{1i}=0}\{x_i\}. \tag{5}$$
Here, $E_{c_i}\{x_i\}$ denotes the expectation of $x_i$ over all those $i$ that satisfy $c_i$, where $c_i$ is a Boolean condition.
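As an illustration of this notation, the following NumPy sketch computes $r_1$, $s_1$, and the one-layer representation for a fixed candidate plane. The variable names and sample values are ours, chosen only for illustration:

```python
import numpy as np

# Made-up sample values and a fixed candidate plane, for illustration only.
x = np.array([12.0, 15.0, 200.0, 210.0, 205.0])
g = x <= 100                    # candidate first-layer plane g_{1i}
r1 = x[g].mean()                # E_{g_{1i}=1}{x_i} -> 13.5
s1 = x[~g].mean()               # E_{g_{1i}=0}{x_i} -> 205.0
x_tilde = np.where(g, r1, s1)   # the one-layer representation, as in (1)
```

Each sample is replaced by the mean of its own class, which is exactly what (4) and (5) prescribe.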
Note that computing $r_1$ and $s_1$ does not by itself solve the problem, because these values depend essentially on the layer definition $G^1_X$. To resolve this recursion, select an arbitrary value of $i$. The share of $i$ in $D_1$ is $d_{1i}$, defined as

$$d_{1i} = \left(x_i-g_{1i}r_1-(1-g_{1i})s_1\right)^2. \tag{6}$$
Hence, $g_{1i}$ should be selected so that $d_{1i}$ is minimized. As $g_{1i} \in \{0,1\}$, we only have to check whether

$$g_{1i}=1 \iff (x_i-r_1)^2 \le (x_i-s_1)^2, \tag{7}$$
or,

$$g_{1i}=1 \iff x_i \le \tfrac{1}{2}(r_1+s_1). \tag{8}$$
Assuming that $r_1 < s_1$, substituting (8) into (4) and (5) gives

$$E_{x_i \le \frac{1}{2}(r_1+s_1)}\{x_i\} = r_1, \tag{9}$$

$$E_{x_i > \frac{1}{2}(r_1+s_1)}\{x_i\} = s_1, \tag{10}$$
which should be solved simultaneously. We combine (9) and (10) to get

$$E_{x_i \le \frac{1}{2}(r_1+s_1)}\{x_i\} + E_{x_i > \frac{1}{2}(r_1+s_1)}\{x_i\} = r_1+s_1. \tag{11}$$
Now, substituting $q = \frac{1}{2}(r_1+s_1)$ in (11), we have

$$E_{x_i \le q}\{x_i\} + E_{x_i > q}\{x_i\} = 2q. \tag{12}$$
Hence, we are seeking the zero of

$$f(q) = \tfrac{1}{2}\left(E_{x_i \le q}\{x_i\} + E_{x_i > q}\{x_i\}\right) - q. \tag{13}$$
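A minimal NumPy sketch of $f$ as defined in (13) may look as follows. The function name and the endpoint fallbacks (which mirror the limits given in (14) and (15) below) are our own choices, not part of the paper:

```python
import numpy as np

def f(q, x):
    """Evaluate f(q) from (13): half the sum of the two class means, minus q.

    x is a 1-D array of sample values; classes are split at threshold q.
    """
    low, high = x[x <= q], x[x > q]
    # At the interval ends one class may be empty; fall back to the
    # endpoint value, matching the limits in (14) and (15).
    m_low = low.mean() if low.size else x.min()
    m_high = high.mean() if high.size else x.max()
    return 0.5 * (m_low + m_high) - q
```

For a two-valued sample such as `[0, 0, 10, 10]`, $f$ is positive at the minimum, negative at the maximum, and zero at the midpoint of the two class means, as the analysis below establishes.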
Note that for an arbitrary sequence of realizations of a random variable we have

$$\lim_{x \to \min\{X\}} E_{x_i \le x}\{x_i\} = \min\{X\}, \tag{14}$$

and

$$\lim_{x \to \max\{X\}} E_{x_i > x}\{x_i\} = \max\{X\}. \tag{15}$$
Hence,

$$f\left(\min\{X\}\right) = \tfrac{1}{2}\left(\bar{X} - \min\{X\}\right) \ge 0. \tag{16}$$
Here, $\bar{X}$ denotes the expectation of the sequence of values of $X$. Also, we have

$$f\left(\max\{X\}\right) = -\tfrac{1}{2}\left(\max\{X\} - \bar{X}\right) \le 0. \tag{17}$$
Note that (16) and (17) turn into equalities if and only if all the values in the sequence $X$ are identical. Neglecting this impractical situation, which results in $f(q) \equiv 0$, we have

$$f\left(\max\{X\}\right)\,f\left(\min\{X\}\right) < 0. \tag{18}$$
Hence, working in the interval $[\min\{X\},\max\{X\}]$, to which $q$ must belong, bi-section gives a proper approximation of the value of $q$ satisfying $f(q) \approx 0$. Note that $f:\mathbb{R}\to\mathbb{R}$ in (13) is a continuous function.
As a brief review, the method of bi-section for finding a zero of the continuous function $f:\mathbb{R}\to\mathbb{R}$ in the interval $[a,b]$, given that $f(a)f(b) < 0$, is as follows. First, $c=\frac{1}{2}(a+b)$ is computed. If $f(c)$ is zero, the goal is reached. Otherwise, one and only one of the two inequalities $f(c)f(a) < 0$ and $f(c)f(b) < 0$ holds, and the search is repeated in the corresponding interval $[a,c]$ or $[c,b]$. The algorithm stops when the length of the working interval falls below a preselected threshold. Here, we set the minimum interval length, which determines the precision of $q$, equal to $2^{-r}$ times the original range of the sequence. A nominal value of $r$ is 10, which yields a precision of approximately 0.1% of the interval length. Bi-section is proved to converge linearly, as the error at each stage is half the error of the previous stage. If the initial search interval has length $a$, after $j$ iterations it has length $2^{-j}a$. Hence, using the stopping threshold of $2^{-r}a$, the algorithm stops after at most $r$ iterations. As the algorithm computes $f(q)$ once per iteration, plus once for each of the two initial interval margins, the total number of evaluations of $f$ equals $r+2$. For the nominal value, this is only twelve evaluations of $f$.
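The bi-section search described above can be sketched in NumPy as follows. The function names and the `r_bits` parameter (playing the role of $r$) are our own; this is a sketch, not the authors' code:

```python
import numpy as np

def find_q(x, r_bits=10):
    """Approximate the zero of f in (13) by bi-section on [min(x), max(x)].

    Stops when the working interval is shorter than 2**-r_bits of the
    original range, as suggested in the text (r = 10 gives ~0.1% precision).
    """
    def f(q):
        low, high = x[x <= q], x[x > q]
        m_low = low.mean() if low.size else x.min()
        m_high = high.mean() if high.size else x.max()
        return 0.5 * (m_low + m_high) - q

    a, b = float(x.min()), float(x.max())
    tol = (b - a) * 2.0 ** (-r_bits)
    while b - a > tol:
        c = 0.5 * (a + b)
        # f(a) >= 0 and f(b) <= 0 by (16)-(17); keep the sign change.
        if f(c) > 0:
            a = c
        else:
            b = c
    return 0.5 * (a + b)
```

For a bimodal sample with modes at 10 and 200 in equal proportions, the search converges to $q = \frac{1}{2}(10+200) = 105$, the fixed point of (12).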
Returning to the original problem, each evaluation of the function $f$ in our case means performing $n$ comparisons and $n$ additions. Hence, the price of finding $q$ by bi-section equals $2n(r+2)$ flops (6.3 Mega-flops for a 512×512 image with $r$ equal to 10, elapsing about 6 ms on a one-Giga-flops processor). Here, we have assumed that the original image is used for computing $q$; the wiser idea is to use a down-sampled version. Assume that we down-sample the original signal by a factor of $l$ before feeding it to the procedure that finds the optimum $q$. Reducing the number of data realizations may only (rarely) decrease the number of iterations needed for convergence. Hence, after down-sampling, the cost of finding the optimum $q$ equals $\frac{2}{l}n(r+2)$. As such, by increasing $l$ and decreasing $r$, one can decrease the computational cost drastically. In Section III, we perform a thorough investigation to obtain practical values for $l$ and $r$. We emphasize that this down-sampling scheme is not useful when the realizations of the random variable are not correlated in the spatial (or time) domain: for data points gathered from a (computer-generated) random sequence, down-sampling results in data loss. Images, however, are highly correlated in the spatial domain and hence behave well in the proposed method.
There is another interpretation of the above process. PSNR depends on the amount of distortion, computed as the mean square difference between the original and the approximated signal. Hence, the one-layer segmentation produced here is the best one-bit representation of the given image in the mean square sense.
Now that the mathematics is developed, we turn to the original problem of multi-layer segmentation. Having the sequence $X=[x_1,\ldots,x_n]$, the $m$-layer segmentation is the set of $m$ binary planes $G^m=[g_{1,1},\ldots,g_{1,n}],\ldots,[g_{m,1},\ldots,g_{m,n}]$ plus the $2m$ real values $R^m=[r_1,\ldots,r_m]$ and $S^m=[s_1,\ldots,s_m]$. We define

$$\tilde{x}_{m,i} = \sum_{j=1}^{m}\left(g_{j,i}r_j + (1-g_{j,i})s_j\right), \tag{19}$$
as the $m$-layer representation of $x_i$. Note that (1) is a special case of (19) with $m=1$. The objective function for fitting the $\tilde{x}_{m,i}$ to the $x_i$ becomes

$$D_m = \sum_{i=1}^{n}\left(x_i - \sum_{j=1}^{m}\left(g_{j,i}r_j + (1-g_{j,i})s_j\right)\right)^2. \tag{20}$$
We call the combination of $G^m$, $R^m$, and $S^m$ a proper $m$-layer segmentation of $X$ if for all $1 \le k \le m$, $G^k$, $R^k$, and $S^k$ give a proper $k$-layer segmentation of $X$ (note that here $X$ denotes a sequence of values of a random variable). This recursive definition implies that to find the $(m+1)$-layer representation of $X$, one should first find its $m$-layer representation. Rewriting (19) as

$$\tilde{x}_{k,i} = \tilde{x}_{k-1,i} + g_{k,i}r_k + (1-g_{k,i})s_k, \tag{21}$$
and (20) as

$$D_k = \sum_{i=1}^{n}\left(\left(x_i - \tilde{x}_{k-1,i}\right) - g_{k,i}r_k - (1-g_{k,i})s_k\right)^2, \tag{22}$$
proves that the $k$-th layer is essentially the one-layer segmentation of the error of the $(k-1)$-layer representation. It follows that the process of finding the $m$-layer segmentation of a sequence of $n$ realizations (call it $X$) of a random variable is as follows: find the one-layer representation of $X$ and call it $X_1$. Set $X=X-X_1$ and find the one-layer representation of the new $X$; call the result $X_2$. Set $X=X-X_2$, and proceed likewise up to $m$ layers.
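The layer-by-layer procedure above can be sketched end to end. This is our own minimal NumPy implementation, with our naming, and is self-contained (it repeats the bi-section search for $q$ on each residual):

```python
import numpy as np

def multilayer(x, m, r_bits=10):
    """Greedy m-layer segmentation: each layer is the single-layer
    segmentation of the current residual, per (21)-(22).

    Returns the binary planes, the (r_k, s_k) pairs, and the
    m-layer approximation of x.
    """
    def find_q(res):
        # bi-section for the zero of f in (13), over the residual's range
        def f(q):
            low, high = res[res <= q], res[res > q]
            m_low = low.mean() if low.size else res.min()
            m_high = high.mean() if high.size else res.max()
            return 0.5 * (m_low + m_high) - q
        a, b = float(res.min()), float(res.max())
        tol = max((b - a) * 2.0 ** (-r_bits), 1e-12)
        while b - a > tol:
            c = 0.5 * (a + b)
            if f(c) > 0:
                a = c
            else:
                b = c
        return 0.5 * (a + b)

    approx = np.zeros_like(x, dtype=float)
    planes, levels = [], []
    res = x.astype(float)
    for _ in range(m):
        q = find_q(res)
        g = res <= q                              # layer plane, rule (8)
        r = res[g].mean() if g.any() else q       # (4): mean of the low class
        s = res[~g].mean() if (~g).any() else q   # (5): mean of the high class
        layer = np.where(g, r, s)
        approx += layer
        res = res - layer                         # X := X - X_k, as in the text
        planes.append(g)
        levels.append((r, s))
    return planes, levels, approx
```

For a four-valued signal such as `[0, 10, 100, 110]`, two layers already reproduce the signal exactly: the first layer captures the coarse split at 55, and the second captures the remaining ±5 residual.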
The computational cost of obtaining the $m$-layer representation of a sequence of $n$ values, using the down-sampling ratio $l$ and the precision $2^{-r}$, equals $m\left(\frac{2}{l}n(r+2)+2n\right)$. For the nominal values $r=10$ and $l=200$, the multiplier $\frac{2}{l}(r+2)$ is less than $\frac{1}{8}$. Hence, the computational cost of the proposed method is almost $2nm$, where $n$ is the number of data points and $m$ is the number of layers. As such, the cost is linear both in the number of data points and in the number of layers. In ordinary practice, we may need to compute the eight-layer representation of a 512×512 image. This operation takes 4.2 Mega-flops, elapsing less than 5 ms on a one-Giga-flops processor.

## 3  Experimental Results

Figures 1-a, b, and c show the Lena image, its 50-bin PDF, and the values of $f(x)$, respectively. Here, the optimum $q$ is approximated as 104.7. Figure 1-d shows the one-layer segmentation of Lena using $r_1=72.2$ and $s_1=150.8$. The PSNR is about 20 dB.
Figure 1: (a) Lena image. (b) PDF of (a). (c) Values of the function f defined in (13). (d) Single-layer segmentation result (PSNR=19.8dB).
Figure 2 examines the 512×512, 8-bpp Peppers image to find acceptable values of the precision and of $l$ (the down-sampling ratio used to reduce the cost of finding $q$). Figure 2-a shows that a precision of $2^{-4}$ is an acceptable compromise; larger values of $-\log_2(\text{precision})$ add cost without a rewarding increase in PSNR. Figure 2-b shows that reducing the number of data points by a factor of 1000 has no great influence on the PSNR. Hence, given that a large number of realizations greatly slows the search for $q$, in the remainder of this paper we down-sample the signal before feeding it to the routine that finds the optimum $q$, and we let the routine stop refining $q$ once the working interval is shorter than $\frac{1}{16}$ of the original interval. With these settings, the cost of computing the optimum $q$ declines to 3200 flops (about 3 µs on a one-Giga-flops processor). Using non-optimized MATLAB code, the operation takes 28 ms, of which 22 ms is spent on down-sampling (on a PIV with a clock frequency of 2871 MHz).
Figure 2: (a) PSNR of a single-layer segmentation of Peppers image with varying l and r. (b) Logarithm of the computational cost for different values of l and r.
Figure 3 shows the results of applying the proposed method to a 1-D signal, actually one line of a 512×512 image. The SNR increases through 23 dB, 27 dB, 31 dB, 34 dB, and 39 dB as the number of layers increases from one to five. We believe this figure clearly depicts the scope of the proposed method.
Figure 3: Results of applying the proposed method on a 1-D signal.
Figure 4 shows the unr730 image. The proposed method is applied to this image to produce the number of layers that gives a proper representation with PSNR $\ge$ 40 dB; seven layers are produced (here we show only the first five, as the other two are not visually distinguishable). Note the results of adding more and more layers: besides increasing the PSNR, each new layer results in an image with more details. Figure 5 shows PSNR versus layer number for this image.
Figure 4: Results of applying the proposed method on unr730 image. (a) Original image adopted from http://ipagwww.med.yale.edu/~staib. (b) Single-layer representation. (c) Two-layer representation. (d) Three-layer representation. (e) Four-layer representation. (f) Five-layer representation.
Figure 5: PSNR versus layer number for unr730 image.
Here, we compare the proposed method with one of the best available single-layer thresholding approaches. In an outstanding work, Otsu proposed a method for finding the optimal threshold of an image containing a bright object on a dark background [9]. The Otsu method divides a double-peak histogram into two compact, well-separated classes. That method differs from the proposed one in several major points. While the Otsu method works only for a specific class of images and focuses on the intra- and inter-class specifications, the proposed method gives a proper estimation of any arbitrary image. Also, Otsu's approach results in a binary (single-layer) representation of an image, while the proposed method can represent the given image in any number of layers. On the computational side, both methods rely on a search for a proper threshold; however, the proposed method utilizes a logarithmic-order search algorithm, whereas Otsu's approach scans an interval bin-by-bin from the beginning to find the optimum threshold.

## 4  Conclusions

A new multi-layer segmentation problem was defined and its practical meaning described, and a novel approach for solving this problem was proposed. The proposed method acts on a grayscale image to produce sequential layers; adding these layers one after another gives successive approximations of the image of increasing quality. The proposed method is shown to be both efficient and effective.

## Acknowledgement

This work was in part supported by a grant from ITRC. The first author wishes to thank Ms. Azadeh Yadollahi for her encouragement and invaluable ideas.

## A  Generalization

Assume that $x_1,\ldots,x_n \in \mathbb{R}$ are given, and that we need to approximate the $x_i$ using a single layer. The problem is to find the values $g_1,\ldots,g_n \in \{0,1\}$ and the two real values $r$ and $s$ for which $\tilde{x}_{1i}=rg_i+s(1-g_i)$ is an appropriate approximation. Here, appropriateness is measured by the following objective function, which is to be minimized:

$$D = \sum_{i=1}^{n} f\left(x_i - rg_i - s(1-g_i)\right). \tag{23}$$
Here, $f:\mathbb{R}\to\mathbb{R}^+\cup\{0\}$ is a continuous, differentiable function which is increasing on $\mathbb{R}^+\cup\{0\}$ and satisfies $\forall x \in \mathbb{R},\ f(x)=f(|x|)$. For an arbitrary value of $i$ we have

$$g_i = 1 \iff f(|x_i-r|) \le f(|x_i-s|), \tag{24}$$
or equivalently,

$$g_i = 1 \iff |x_i-r| \le |x_i-s|. \tag{25}$$
Assuming $s > r$, (25) turns into

$$g_i = 1 \iff x_i \le \tfrac{1}{2}(r+s). \tag{26}$$
Now, let us find the values of $r$ and $s$. We have

$$\frac{\partial D}{\partial r} = -\sum_{x_i \le (r+s)/2} f'(x_i-r) = 0, \tag{27}$$

$$\frac{\partial D}{\partial s} = -\sum_{x_i > (r+s)/2} f'(x_i-s) = 0. \tag{28}$$
For an arbitrary function $g:\mathbb{R}\to\mathbb{R}$ we define $E_g\{X\}$ as the value $r \in \mathbb{R}$ satisfying

$$\sum_{x \in X} g(x-r) = 0. \tag{29}$$
For example, for $g(x) \equiv x$, $E_g$ turns into the conventional average. Using this notation, (27) and (28) are converted to

$$E_{f'}\{x_i \le q\} + E_{f'}\{x_i > q\} = 2q, \tag{30}$$

whose solution gives the appropriate $q = \frac{1}{2}(r+s)$.
Let us compute $E_g\{X\}$ for an arbitrary function $g$ and a set $X$. Here, we focus on continuous functions $g$ satisfying $g(-x)=-g(x)$ and $x \ge 0 \Rightarrow g(x) \ge 0$. Define

$$G_X(r) = \sum_{x \in X} g(x-r) = \sum_{x \in X,\, x > r} g(x-r) - \sum_{x \in X,\, x \le r} g(r-x). \tag{31}$$
Now,

$$G_X\left(\min\{X\}\right) > 0, \qquad G_X\left(\max\{X\}\right) < 0. \tag{32}$$
Hence, using bi-section, the value of $r$ satisfying $G_X(r)=0$ can be approximated. Returning to the main problem, we are faced with an odd function, since

$$f'(-x) = \lim_{\Delta x \to 0} \frac{f(-x+\Delta x)-f(-x)}{\Delta x} = -\lim_{\Delta x \to 0} \frac{f(x-\Delta x)-f(x)}{-\Delta x} = -f'(x). \tag{33}$$

As $f'(x)$ is positive for $x \ge 0$, the method of bi-section finds the value of $E_{f'}\{X\}$ for any set $X \subset \mathbb{R}$. For a set $X \subset [a,b]$, finding $E_{f'}\{X\}$ with the precision of $2^{-\alpha}(b-a)$ takes $\alpha$ iterations. Denoting the cost of computing $f'$ by $\tau$, the cost equals $(\alpha+2)n\tau$.
Let us return to the main problem of solving (30). We are seeking the zero of the function

$$F_X(q) = E_{f'}\{x_i \le q\} + E_{f'}\{x_i > q\} - 2q. \tag{34}$$
Note that

$$\lim_{q \to \min\{X\}^+} E_{f'}\{x \in X,\, x \le q\} = \min\{X\}, \tag{35}$$

and

$$\lim_{q \to \max\{X\}^-} E_{f'}\{x \in X,\, x > q\} = \max\{X\}. \tag{36}$$
Hence,

$$F_X\left(\min\{X\}\right) = E_{f'}\{x \in X\} - \min\{X\}, \tag{37}$$

$$F_X\left(\max\{X\}\right) = E_{f'}\{x \in X\} - \max\{X\}. \tag{38}$$
We claim that for every function $f$ satisfying the conditions described above,

$$\min\{X\} \le E_{f'}\{x \in X\} \le \max\{X\}. \tag{39}$$
Note that if (39) is proved, then $F_X(\min\{X\}) \ge 0$ and $F_X(\max\{X\}) \le 0$, and so finding the zero of $F_X$ is possible using bi-section. To prove (39), note that

$$r \le \min\{X\} \;\Rightarrow\; x_i-r \ge 0 \;\Rightarrow\; f'(x_i-r) \ge 0 \;\Rightarrow\; \sum_{x \in X} f'(x-r) \ge 0, \tag{40}$$

$$r \ge \max\{X\} \;\Rightarrow\; r-x_i \ge 0 \;\Rightarrow\; f'(r-x_i) \ge 0 \;\Rightarrow\; \sum_{x \in X} f'(x-r) \le 0. \tag{41}$$

Hence, the zero of $\sum_{x \in X} f'(x-r)$ lies in the interval $[\min\{X\},\max\{X\}]$, proving (39).
For a specific value of $q$, computing (30) requires both $E_{f'}\{x_i \le q\}$ and $E_{f'}\{x_i > q\}$. Assuming that $X \subset [a,b]$ and that $q = a\beta + b(1-\beta)$ with $0 \le \beta \le 1$, finding the two measures with a precision of $2^{-\alpha}$ times the interval length takes

$$(\alpha+2)n\tau\beta + (\alpha+2)n\tau(1-\beta) = (\alpha+2)n\tau \tag{42}$$

flops (assuming that the points are spread almost uniformly). Hence, finding $q$ with the precision of $2^{-\alpha}(b-a)$ takes a total of $(\alpha+2)^2 n\tau$ flops.
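The generalized criterion of this appendix, solving $\sum_{x \in X} g(x-r)=0$ by bi-section as justified by (31) and (32), can be sketched as follows. The function name and the `alpha` parameter (playing the role of $\alpha$) are our own choices:

```python
import numpy as np

def generalized_mean(x, g, alpha=10):
    """Solve sum_i g(x_i - r) = 0 for r by bi-section, i.e. E_g{X}
    in the notation of (29).

    Requires g odd and nonnegative on the nonnegative reals, so that
    G_X in (31) changes sign between min(x) and max(x), cf. (32).
    """
    a, b = float(x.min()), float(x.max())
    tol = (b - a) * 2.0 ** (-alpha) if b > a else 0.0
    while b - a > tol:
        c = 0.5 * (a + b)
        # G_X(a) > 0 and G_X(b) < 0; keep the sub-interval with the sign change.
        if np.sum(g(x - c)) > 0:
            a = c
        else:
            b = c
    return 0.5 * (a + b)
```

With $g(x)=x$ this recovers the conventional average, as noted after (29); with an odd cubic $g(x)=x^3$ it yields a different, outlier-sensitive location estimate, still guaranteed by (39) to lie inside $[\min\{X\},\max\{X\}]$.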

## References

[1]
J. S. Weska, "A survey of threshold selection techniques," Computer Graphics and Image Processing, vol. 7(2), pp. 259-265, 1978.
[2]
B. Sankur, A. T. Abak, and U. Baris, "Assessment of thresholding algorithms for document processing," in Proceedings of the IEEE International Conference on Image Processing, Kobe, Japan, 1999, pp. 580-584.
[3]
W. Doyle, "Operations useful for similarity-invariant pattern recognition," Journal of Association for Computing Machinery, vol. 9(2), pp. 259-267, 1962.
[4]
J. M. S. Prewitt and M. L. Mendelsohn, "The analysis of cell images," Annual New York Academy of Science, vol. 128, pp. 1036-1053, 1966.
[5]
J. S. Weska, R. N. Nagel, and A. Rosenfeld, "A threshold selection technique," IEEE Transactions on Computers, vol. C-23(12), pp. 1322-1326, 1974.
[6]
N. Papamarkos and B. Gatos, "A new approach for multilevel threshold selection," CVGIP: Graphical Models and Image Processing, vol. 56(5), pp. 357-370, 1994.
[7]
F. Tomita, M. Yachida, and S. Tsuji, "Detection of homogenous regions by structural analysis," in Proceedings of the Joint Conference on Artificial Intelligence, Stanford, CA, 1973, pp. 564-571.
[8]
N. Papamarkos, C. Strouthopoulos, and I. Andreadis, "Multithresholding of color and gray-level images through a neural network technique," Image and Vision Computing, vol. 18, pp. 213-222, 2000.
[9]
N. Otsu, "A threshold selection method from gray level histograms," IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-9, pp. 62-66, 1979.
