redundancy as ingredients in a global Bayesian objective—this part is described in Section II, along with its emerging iterated numerical solver. Also novel in this work is the idea to train dictionaries for the denoising task, rather than use prechosen ones. As already mentioned earlier, when training is done on the corrupted image directly, the overall training-denoising algorithm becomes fused into one iterative procedure that comprises steps of denoising of the image, followed by an update of the dictionary. This is described in detail in Section III. In Section IV, we show some experimental results that demonstrate the effectiveness of this algorithm.
II. FROM LOCAL TO GLOBAL BAYESIAN RECONSTRUCTION
In this section, we start the presentation of the proposed denoising algorithm by first introducing how sparsity and redundancy are brought to use. We do that via the introduction of the Sparseland model. Once this is set, we will discuss how local treatment of image patches turns into a global prior in a Bayesian reconstruction framework.
A. Sparseland Model for Image Patches
We consider image patches of size $\sqrt{n}\times\sqrt{n}$ pixels, ordered lexicographically as column vectors $\mathbf{x}\in\mathbb{R}^n$. For the construction of the Sparseland model, we need to define a dictionary (matrix) $\mathbf{D}\in\mathbb{R}^{n\times k}$ (with $k>n$, implying that it is redundant). At the moment, we shall assume that this matrix is known and fixed. Put loosely, the proposed model suggests that every image patch, $\mathbf{x}$, could be represented sparsely over this dictionary, i.e., the solution of
$$\hat{\alpha} = \arg\min_{\alpha}\ \|\alpha\|_0 \quad \text{subject to} \quad \mathbf{D}\alpha \approx \mathbf{x} \tag{2}$$
is indeed very sparse, $\|\hat{\alpha}\|_0 \ll n$. The notation $\|\alpha\|_0$ stands for the count of the nonzero entries in $\alpha$. The basic idea here is that every signal instance from the family we consider can be represented as a linear combination of few columns (atoms) from the redundant dictionary $\mathbf{D}$.
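As a toy illustration of this idea (the dimensions and the random dictionary below are arbitrary choices for the sake of the example, not values used in this work), a signal that is exactly $L$-sparse over a redundant dictionary can be synthesized as follows:

# Toy Sparseland illustration: a signal composed of few atoms from a
# redundant dictionary. All dimensions here are arbitrary example values.
import numpy as np

rng = np.random.default_rng(0)
n, k, L = 16, 64, 3                    # signal size, atoms, sparsity level
D = rng.standard_normal((n, k))
D /= np.linalg.norm(D, axis=0)         # unit-norm atoms (columns)
alpha = np.zeros(k)
alpha[rng.choice(k, size=L, replace=False)] = rng.standard_normal(L)
x = D @ alpha                          # x has an exactly L-sparse representation
assert np.count_nonzero(alpha) == L    # ||alpha||_0 = L << n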
This model should be made more precise by replacing the rough constraint $\mathbf{D}\alpha \approx \mathbf{x}$ with a clear requirement to allow a bounded representation error, $\|\mathbf{D}\alpha - \mathbf{x}\|_2 \le \epsilon$. Also, one needs to define how deep the required sparsity is, adding a requirement of the form $\|\hat{\alpha}\|_0 \le L \ll n$, which states that the sparse representation uses no more than $L$ atoms from the dictionary for every image patch instance. Alternatively, a probabilistic characterization can be given, defining the probability to obtain a representation with $\|\alpha\|_0$ nonzeros as a decaying function of some sort. Considering the simpler option between the two, with the triplet $(\epsilon, L, \mathbf{D})$ in place, our model is well defined.
Now assume that $\mathbf{x}$ indeed belongs to the $(\epsilon, L, \mathbf{D})$-Sparseland signals. Consider a noisy version of it, $\mathbf{y}$, contaminated by an additive zero-mean white Gaussian noise with standard deviation $\sigma$. The MAP estimator for denoising this image patch is built by solving
$$\hat{\alpha} = \arg\min_{\alpha}\ \|\alpha\|_0 \quad \text{subject to} \quad \|\mathbf{D}\alpha - \mathbf{y}\|_2^2 \le T \tag{3}$$
where $T$ is dictated by $\epsilon$ and $\sigma$. The denoised image patch is, thus, given by $\hat{\mathbf{x}} = \mathbf{D}\hat{\alpha}$ [22], [41], [42]. Notice that the above optimization task can be changed to be
$$\hat{\alpha} = \arg\min_{\alpha}\ \|\mathbf{D}\alpha - \mathbf{y}\|_2^2 + \mu\|\alpha\|_0 \tag{4}$$
so that the constraint becomes a penalty. For a proper choice of $\mu$, the two problems are equivalent. We will use this alternative terminology from now on, as it makes the presentation of later parts simpler to follow.
While this problem is, in general, very hard to solve, the matching and basis pursuit algorithms can be used quite effectively [20]–[22] to get an approximate solution. Recent work established that those approximation techniques can be quite accurate if the solution is sparse enough to begin with [41], [42]. In this work, we will make use mainly of the orthonormal matching pursuit (OMP) because of its simplicity and efficiency [21].
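For concreteness, here is a minimal NumPy sketch of an error-constrained OMP of the kind referred to above. The function name omp and its interface are ours, for illustration only; the sketch assumes unit-norm dictionary columns and stops once the squared residual drops below the threshold $T$ of (3).

# Minimal error-constrained OMP sketch (illustrative, not the exact solver
# used in the experiments). D: n x k dictionary with unit-norm columns,
# y: signal of length n, T: squared-error stopping threshold.
import numpy as np

def omp(D, y, T, max_atoms=None):
    n, k = D.shape
    max_atoms = max_atoms if max_atoms is not None else n
    residual = y.astype(float).copy()
    support = []
    coeffs = np.zeros(k)
    while residual @ residual > T and len(support) < max_atoms:
        j = int(np.argmax(np.abs(D.T @ residual)))  # most correlated atom
        if j in support:                            # no further progress
            break
        support.append(j)
        # Re-fit all selected coefficients by least squares (orthogonalization).
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ sol
    if support:
        coeffs[support] = sol
    return coeffs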
B. From Local Analysis to a Global Prior
If we want to handle a larger image $\mathbf{X}$ of size $\sqrt{N}\times\sqrt{N}$, and we are still interested in using the above described model, one option is to redefine the model with a larger dictionary. Indeed, when using this model with a dictionary emerging from the contourlet or curvelet transforms, such scaling is simple and natural [26].
However, when we insist on using a specific fixed and small-size dictionary $\mathbf{D}\in\mathbb{R}^{n\times k}$, this option no longer exists. Thus, a natural question arises concerning the use of such a small dictionary in the first place. Two reasons come to mind: 1) when training takes place (as we will show in the next section), only small dictionaries can be composed; and, furthermore, 2) a small dictionary implies a locality of the resulting algorithms, which simplifies the overall image treatment.
We next describe possible ways to use such a small dictionary when treating a large image. A heuristic approach is to work on smaller patches of size $\sqrt{n}\times\sqrt{n}$ and tile the results. In doing so, visible artifacts may occur on block boundaries. One could also propose to work on overlapping patches and average the results in order to prevent such blockiness artifacts, as, indeed, practiced in [38]–[40]. As we shall see next, a systematic global approach towards this problem leads to this very option as a core ingredient in an overall algorithm.
If our knowledge of the unknown large image $\mathbf{X}$ is fully expressed in the fact that every patch in it belongs to the $(\epsilon, L, \mathbf{D})$-Sparseland model, then the natural generalization of the above MAP estimator is the replacement of (4) with
$$\left\{\hat{\alpha}_{ij}, \hat{\mathbf{X}}\right\} = \arg\min_{\alpha_{ij},\,\mathbf{X}}\ \lambda\|\mathbf{X}-\mathbf{Y}\|_2^2 + \sum_{ij}\mu_{ij}\|\alpha_{ij}\|_0 + \sum_{ij}\|\mathbf{D}\alpha_{ij} - \mathbf{R}_{ij}\mathbf{X}\|_2^2 \tag{5}$$
In this expression, the first term is the log-likelihood global force that demands the proximity between the measured image, $\mathbf{Y}$, and its denoised (and unknown) version $\mathbf{X}$. Put as a constraint, this penalty would have read $\|\mathbf{X}-\mathbf{Y}\|_2^2 \le \text{Const}\cdot\sigma^2$, and this reflects the direct relationship between $\lambda$ and $\sigma$. The second and the third terms are the image prior that makes sure that in the constructed image, $\mathbf{X}$, every patch $\mathbf{x}_{ij} = \mathbf{R}_{ij}\mathbf{X}$ of size $\sqrt{n}\times\sqrt{n}$ in every location (thus, the summation by $i,j$) has a sparse representation with bounded error. A similar conversion has also been practiced by Roth and Black when handling an MRF prior [29].
The matrix $\mathbf{R}_{ij}$ is an $n\times N$ matrix that extracts the $(i,j)$ block from the image. For an $\sqrt{N}\times\sqrt{N}$ image $\mathbf{X}$, the summation over $i,j$ includes $(\sqrt{N}-\sqrt{n}+1)^2$ items, considering all image patches of size $\sqrt{n}\times\sqrt{n}$ in $\mathbf{X}$ with overlaps. As to the coefficients $\mu_{ij}$, those must be location dependent, so as to comply with a set of constraints of the form $\|\mathbf{D}\alpha_{ij} - \mathbf{x}_{ij}\|_2^2 \le T$.
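As a practical aside, the operator $\mathbf{R}_{ij}$ is never assembled as an explicit $n\times N$ matrix in an implementation; array slicing realizes it directly. A small illustration follows (the helper name extract_patch is our own):

# R_ij as slicing: extract the (i, j) patch, lexicographically ordered.
import numpy as np

def extract_patch(X, i, j, p):
    """Return x_ij = R_ij X as a length p*p vector (p = sqrt(n))."""
    return X[i:i + p, j:j + p].reshape(-1)

X = np.arange(64, dtype=float).reshape(8, 8)  # toy sqrt(N) x sqrt(N) image
p = 4                                         # patch side sqrt(n)
patches = [extract_patch(X, i, j, p)
           for i in range(X.shape[0] - p + 1)
           for j in range(X.shape[1] - p + 1)]
assert len(patches) == (8 - 4 + 1) ** 2       # (sqrt(N)-sqrt(n)+1)^2 patches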
C. Numerical Solution
When the underlying dictionary $\mathbf{D}$ is assumed known, the proposed penalty term in (5) has two kinds of unknowns: the sparse representations $\hat{\alpha}_{ij}$ per each location, and the overall output image $\mathbf{X}$. Instead of addressing both together, we propose a block-coordinate minimization algorithm that starts with an initialization $\mathbf{X}=\mathbf{Y}$, and then seeks the optimal $\hat{\alpha}_{ij}$. In doing so, we get a complete decoupling of the minimization task into many smaller ones, each of the form
$$\hat{\alpha}_{ij} = \arg\min_{\alpha_{ij}}\ \mu_{ij}\|\alpha_{ij}\|_0 + \|\mathbf{D}\alpha_{ij} - \mathbf{x}_{ij}\|_2^2 \tag{6}$$
handling one image patch. Solving this using the orthonormal matching pursuit [21] is easy, gathering one atom at a time, and stopping when the error $\|\mathbf{D}\alpha_{ij} - \mathbf{x}_{ij}\|_2^2$ goes below $T$. This way, the choice of $\mu_{ij}$ has been handled implicitly. Thus, this stage works as a sliding window sparse coding stage, operated on each block of $\sqrt{n}\times\sqrt{n}$ at a time.
Given all $\hat{\alpha}_{ij}$, we can now fix those and turn to update $\mathbf{X}$. Returning to (5), we need to solve
$$\hat{\mathbf{X}} = \arg\min_{\mathbf{X}}\ \lambda\|\mathbf{X}-\mathbf{Y}\|_2^2 + \sum_{ij}\|\mathbf{D}\hat{\alpha}_{ij} - \mathbf{R}_{ij}\mathbf{X}\|_2^2 \tag{7}$$
This is a simple quadratic term that has a closed-form solution of the form
$$\hat{\mathbf{X}} = \left(\lambda\mathbf{I} + \sum_{ij}\mathbf{R}_{ij}^T\mathbf{R}_{ij}\right)^{-1}\left(\lambda\mathbf{Y} + \sum_{ij}\mathbf{R}_{ij}^T\mathbf{D}\hat{\alpha}_{ij}\right) \tag{8}$$
This rather cumbersome expression may mislead, as all it says is that averaging of the denoised patches is to be done, with some relaxation obtained by averaging with the original noisy image. The matrix to invert in the above expression is a diagonal one, and, thus, the calculation of (8) can also be done on a pixel-by-pixel basis, following the previously described sliding window sparse coding steps.
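The per-pixel reading of (8) can be sketched as follows: the numerator accumulates $\lambda\mathbf{Y}$ plus each denoised patch $\mathbf{D}\hat{\alpha}_{ij}$ returned to its location, while the denominator holds the diagonal of $\lambda\mathbf{I} + \sum_{ij}\mathbf{R}_{ij}^T\mathbf{R}_{ij}$, i.e., $\lambda$ plus the number of patches covering each pixel. The helper name average_patches and its interface are our own illustration.

# Closed-form update (8) as weighted per-pixel averaging (illustrative).
# Y: noisy image; denoised_patches: list of D @ alpha_ij vectors in the
# same row-major (i, j) order used for extraction; p: patch side; lam: lambda.
import numpy as np

def average_patches(Y, denoised_patches, p, lam):
    num = lam * Y.astype(float).copy()   # accumulates lam*Y + sum R^T D alpha
    den = lam * np.ones_like(num)        # diagonal of lam*I + sum R^T R
    idx = 0
    for i in range(Y.shape[0] - p + 1):
        for j in range(Y.shape[1] - p + 1):
            num[i:i + p, j:j + p] += denoised_patches[idx].reshape(p, p)
            den[i:i + p, j:j + p] += 1.0   # this patch covers these pixels
            idx += 1
    return num / den                     # inverting a diagonal matrix pixelwise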
So far, we have seen that the obtained denoising algorithm calls for sparse coding of small patches, and an averaging of their outcomes. However, if minimization of (5) is our goal, then this process should proceed. Given the updated X, we can repeat the sparse coding stage, this time working on patches from the already denoised image. Once this is done, a new averaging should be calculated, and so on, and so forth. Thus, we obtain exactly what Guleryuz suggested in his work—iterated denoising via sparse representation—and we may regard the analysis proposed here as a rigorous way to justify such an iterated scheme [38]–[40].
III. EXAMPLE-BASED SPARSITY AND REDUNDANCY
The entire discussion so far has been based on the assumption that the dictionary $\mathbf{D}\in\mathbb{R}^{n\times k}$ is known. We can certainly make some educated guesses as to which dictionaries to use. In fact, following Guleryuz's work, the DCT seems like a plausible choice [38]–[40]. Indeed, we might do better by using a redundant version of the DCT, as practiced in [36]. Still, the question remains: can we make a better choice for D based on training? We now turn to discuss this option. We start with the simpler (and less effective) option of training the
dictionary on a set of image patches taken from good quality images, and then turn to discuss the option of training on the corrupted image itself.
A. Training on the Corpus of Image Patches
Given a set of image patches $Z = \{\mathbf{z}_j\}_{j=1}^M$, each of size $\sqrt{n}\times\sqrt{n}$, and assuming that they emerge from a specific $(\epsilon, L, \mathbf{D})$-Sparseland model, we would like to estimate these model parameters, $(\epsilon, L, \mathbf{D})$. Put formally, we seek the dictionary $\mathbf{D}$ that minimizes
$$\varepsilon\left(\mathbf{D}, \{\alpha_j\}_{j=1}^M\right) = \sum_{j=1}^M \left[\mu_j\|\alpha_j\|_0 + \|\mathbf{D}\alpha_j - \mathbf{z}_j\|_2^2\right] \tag{9}$$
Just as before, the above expression seeks to get a sparse representation per each of the examples in $Z$, and obtain a small representation error. The choice for $\mu_j$ dictates how those two forces should be weighted, so as to make one of them a clear constraint. For example, constraining $\|\alpha_j\|_0 \le L$ implies specific values for $\mu_j$, while requiring $\|\mathbf{D}\alpha_j - \mathbf{z}_j\|_2 \le \epsilon$ leads to others.
The K-SVD proposes an iterative algorithm designed to handle the above task effectively [36], [37]. Adopting again the block-coordinate descent idea, the computations of $\mathbf{D}$ and $\{\alpha_j\}_{j=1}^M$ are separated. Assuming that $\mathbf{D}$ is known, the penalty posed in (9) reduces to a set of $M$ sparse coding operations, very much like the ones seen in (6). Thus, OMP can be used again to obtain the near-optimal (recall that OMP is an approximation algorithm, and, thus, a true minimization is not guaranteed) set of representation vectors $\{\hat{\alpha}_j\}_{j=1}^M$. Assuming these representation vectors fixed, the K-SVD proposes an update of the dictionary one column at a time. As it turns out, this update can be done optimally, leading to the need to perform an SVD operation on residual data matrices, computed only on the examples that use this atom. This way, the value of $\varepsilon(\mathbf{D}, \{\hat{\alpha}_j\}_{j=1}^M)$ is guaranteed to drop per an update of each dictionary atom, and along with this update, the representation coefficients change as well (see [36] and [37] for more details).
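A compact sketch of one such dictionary-update sweep is given below, assuming the examples are stacked as the columns of a matrix Z ($n\times M$) and the current sparse codes as the columns of A ($k\times M$); the function name and data layout are our own, not the reference implementation of [36], [37].

# One K-SVD dictionary-update sweep (illustrative sketch). Each atom is
# revised via a rank-1 SVD of the residual restricted to the examples that
# actually use it, and the matching coefficients are updated alongside.
import numpy as np

def ksvd_dictionary_update(D, A, Z):
    k = D.shape[1]
    for j0 in range(k):
        users = np.nonzero(A[j0, :])[0]   # examples using atom j0
        if users.size == 0:
            continue                      # atom unused; left untouched here
        # Residual on those examples, excluding atom j0's contribution.
        E = Z[:, users] - D @ A[:, users] + np.outer(D[:, j0], A[j0, users])
        U, s, Vt = np.linalg.svd(E, full_matrices=False)
        D[:, j0] = U[:, 0]                # best rank-1 approximation: new atom
        A[j0, users] = s[0] * Vt[0, :]    # updated coefficients for this atom
    return D, A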
When adapted to the denoising task at hand, a crucial step is the choice of the examples to train on. Is there really a universal dictionary that fits all images well? If there is one, which examples shall we use to find it? The experiments that follow in the next section bring us to the conclusion that while a reasonably good dictionary that fits all is indeed within reach, extracting state-of-the-art denoising performance calls for a more complex model that uses several dictionaries switched by content—an option we do not explore in this work.
Also, since the penalty minimized in (9) is a highly nonconvex functional, local minimum solutions are likely to haunt us. Thus, a wise initialization could be of great worth. In our experiments, we started with the already mentioned redundant DCT, which proves to be a good dictionary choice. This also enabled us to apply a smaller number of iterations.
Another puzzling issue is the redundancy factor $k/n$: how should we choose $k$, the number of columns in D? Is there an optimal choice? In this work, we do not address this important question, and simply choose a value we find empirically to perform well. Further work is required to explore this matter.
B. Training on the Corrupted Image
Instead of supplying an artificial set of examples to train on, as proposed above, one could take the patches from the corrupted image itself, $Z = \{\mathbf{y}_j\}_{j=1}^M$, where $M = (\sqrt{N}-\sqrt{n}+1)^2$.
Since the K-SVD dictionary learning process has in it a noise rejection capability (see experiments reported in [36]), this seems like a natural idea. Furthermore, rather than using unrelated examples that call for the universality assumption of the Sparseland model, this option tailors the dictionary to the image to be treated.
At first sight, this change in the origin of the examples to train on seems to be of technical worth only, with no impact on the overall algorithm. However, a close inspection of both the functional $\varepsilon(\mathbf{D}, \{\alpha_j\}_{j=1}^M)$ in (9) and the global MAP penalty in (5) reveals the close resemblance between the two. This implies that the dictionary design could be embedded within the Bayesian approach. Returning to (5), we can regard D also as an unknown, and define our problem as
$$\left\{\hat{\mathbf{D}}, \hat{\alpha}_{ij}, \hat{\mathbf{X}}\right\} = \arg\min_{\alpha_{ij},\,\mathbf{X},\,\mathbf{D}}\ \lambda\|\mathbf{X}-\mathbf{Y}\|_2^2 + \sum_{ij}\mu_{ij}\|\alpha_{ij}\|_0 + \sum_{ij}\|\mathbf{D}\alpha_{ij} - \mathbf{R}_{ij}\mathbf{X}\|_2^2 \tag{10}$$
Following the previously constructed algorithm, we can assume a fixed D and X, and compute the representations $\alpha_{ij}$. This requires, as before, a sparse coding stage that deploys the OMP. Given those representations, the dictionary can now be updated, using a sequence of K-SVD operations.
Once done, the output image can be computed using (8). However, an update of the output image X changes the noise level $\sigma$, which up until now has been considered known, and was used in the preceding two stages. Therefore, we choose to perform several more iterations of representation computation and dictionary update, using the same value of $\sigma$, before finding the output image X. This algorithm is described in detail in Fig. 1.
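Composing the helpers sketched earlier in this section (omp, extract_patch, ksvd_dictionary_update, average_patches), the overall loop just described can be outlined as below. The gain factor C in the error target and the parameter defaults are placeholders of our own, not the tuned settings of the experiments.

# Outline of the adaptive denoising loop (illustrative; reuses the sketches
# above, so it is not standalone). Y: noisy image; D0: initial (e.g.,
# redundant DCT) dictionary; sigma: noise level; lam: fidelity weight lambda.
import numpy as np

def denoise(Y, D0, sigma, lam, n_iters=10, C=1.15):
    D = D0.copy()
    p = int(np.sqrt(D.shape[0]))              # patch side sqrt(n)
    T = D.shape[0] * (C * sigma) ** 2         # per-patch squared-error target
    for _ in range(n_iters):                  # J rounds at the same sigma
        patches = [extract_patch(Y, i, j, p)
                   for i in range(Y.shape[0] - p + 1)
                   for j in range(Y.shape[1] - p + 1)]
        Z = np.stack(patches, axis=1)         # n x M matrix of examples
        A = np.stack([omp(D, z, T) for z in patches], axis=1)  # sparse coding
        D, A = ksvd_dictionary_update(D, A, Z)                 # K-SVD step
    denoised = list((D @ A).T)                # D * alpha_ij, patch by patch
    return average_patches(Y, denoised, p, lam)                # final (8)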
In evaluating the computational complexity of this algorithm, we consider all three stages—sparse coding (the OMP process) and dictionary update (these two stages are iterated J times), followed by the final averaging process. All stages can be done efficiently, requiring $O(nkLJ)$ operations per pixel, where n is the block dimension, k is the number of atoms in the dictionary, and L is the number of nonzero elements in each coefficient vector. L depends strongly on the noise level; e.g., for $\sigma=10$, the average L is 2.96, and for $\sigma=20$, the average L is 1.12.
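As a rough illustration only (the specific values are chosen for the sake of the example), taking $n = 64$ (i.e., $8\times 8$ blocks), $k = 256$, $L \approx 3$, and $J = 10$ gives on the order of $64\cdot 256\cdot 3\cdot 10 \approx 5\times 10^5$ operations per pixel.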
IV. CONCLUSION AND FURTHER WORK
This work has presented a simple method for image denoising, leading to state-of-the-art performance, equivalent to and sometimes surpassing recently published leading alternatives. The proposed method is based on local operations and involves sparse decompositions of each image block under one fixed over-complete dictionary, and simple averaging calculations. The content of the dictionary is of prime importance for the denoising process—we have shown that a dictionary trained on natural real images, as well as an adaptive dictionary trained on patches of the noisy image itself, both perform very well.
There are several research directions that we are currently considering, such as using several dictionaries and switching between them by content, optimizing the parameters, replacing the OMP by a better pursuit technique, and more. Beyond these, one direction we consider promising is a generalization to multiscale dictionaries. This work concentrated on small image patches, completely overlooking the global structure of the image, and the multiscale analysis that other techniques have exploited rather well. We are studying ways to extend this work to multiscale dictionaries, as it is clear that K-SVD cannot be directly deployed on larger blocks.