[Paper Implementation] Improving Document Binarization via Adversarial Noise-Texture Augmentation (2019)

Paper Code 2022. 7. 4. 20:18

# Paper implementation using PyTorch

- paper : https://arxiv.org/abs/1810.11120v1

# Architecture

# TANet (Texture Augmentation Network)

- Expands the document binarization dataset by generating images that keep the same content but carry a variety of noise textures

1. Separating content and style (texture)

- The clean image Ic and the reference noise image Ir are the inputs to the Content Encoder and the Style Encoder, respectively

- Encoder : extracts a latent representation of each image

- The content and style representations from the two encoders are concatenated and used as the decoder input

- Decoder : mirrors the encoder, uses skip connections from the encoder, and applies a tanh activation to the final output

2. Adversarial Loss

- A loss that focuses on getting the overall structure right

- The adversarial loss is computed between the TANet output image Ig and Ir, with the discriminator Dg telling them apart
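
As a rough illustration of the adversarial term, the sketch below applies binary cross-entropy to sigmoid discriminator outputs. The function names, the BCE formulation, and the non-saturating generator loss are assumptions made for illustration and are not taken from the paper. Only the real + fake split mirrors the d1_loss_real + d1_loss_fake noted in the Discriminator section.

```python
import torch
import torch.nn.functional as F


def discriminator_loss(d_real: torch.Tensor, d_fake: torch.Tensor) -> torch.Tensor:
    """d_real = Dg(Ir), d_fake = Dg(Ig); both are sigmoid probabilities in (0, 1)."""
    loss_real = F.binary_cross_entropy(d_real, torch.ones_like(d_real))   # Ir should score 1
    loss_fake = F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))  # Ig should score 0
    return loss_real + loss_fake


def generator_adversarial_loss(d_fake: torch.Tensor) -> torch.Tensor:
    """Assumed non-saturating form: the generator wants Dg(Ig) to be scored as real."""
    return F.binary_cross_entropy(d_fake, torch.ones_like(d_fake))
```

When updating the discriminator, Ig would normally be detached (`Dg(Ig.detach())`) so that gradients do not flow back into TANet.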

3. Style Loss

- Captures the fine detail of the texture rather than the overall structure

- The style loss transfers the texture from the reference image to the clean image (matching the Gram matrices)

- The Gram matrix is obtained as the inner product of the feature values from specific layers (conv1_1, conv2_1, conv3_1, conv4_1, conv5_1) of a pre-trained VGG-19.

-> captures the correlation between feature-map channels at each layer

-> reducing the loss between Gram matrices pushes the output toward a similar style

F^l_ik : activation of the i-th filter at position k in layer l.
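
With F^l_ik defined as above, the Gram matrix of a layer is G^l_ij = sum_k F^l_ik F^l_jk. A minimal PyTorch sketch of that computation follows. The `style_loss` helper, its equal layer weights, and its use of plain MSE are assumptions of this sketch, since the post does not spell out the normalization.

```python
import torch
import torch.nn.functional as F


def gram_matrix(feat: torch.Tensor) -> torch.Tensor:
    """feat: (B, C, H, W) feature map -> (B, C, C) Gram matrix.

    G[i, j] = sum_k F[i, k] * F[j, k], where k runs over the H*W positions.
    """
    b, c, h, w = feat.shape
    flat = feat.reshape(b, c, h * w)              # each channel becomes one row vector
    return torch.bmm(flat, flat.transpose(1, 2))  # channel-by-channel inner products


def style_loss(gen_feats, ref_feats):
    """Sum of MSE between Gram matrices over matching layers.

    Equal layer weights and the plain MSE are assumptions of this sketch.
    """
    return sum(F.mse_loss(gram_matrix(g), gram_matrix(r))
               for g, r in zip(gen_feats, ref_feats))
```

Here `gen_feats` and `ref_feats` would be lists of VGG feature maps for Ig and Ir, one entry per selected layer.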

4. Content Loss

- A loss that preserves the content (text) of the generated image : masked mean squared error

- Computes the pixel-wise difference between the content image Ic and the generated image Ig
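
A masked mean squared error could be sketched as below. How the mask is built is not stated in this post, so `text_mask` is a hypothetical helper that marks pixels darker than a threshold in the clean image as text, assuming images scaled to [-1, 1].

```python
import torch


def text_mask(clean: torch.Tensor, threshold: float = 0.0) -> torch.Tensor:
    """Hypothetical mask: 1 where the clean image is darker than `threshold` (text), else 0."""
    return (clean < threshold).float()


def masked_mse(generated: torch.Tensor, clean: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Mean of squared pixel differences, counted only where mask == 1."""
    squared = (generated - clean) ** 2 * mask
    # clamp avoids division by zero when the mask is empty
    return squared.sum() / mask.sum().clamp(min=1.0)
```

Restricting the error to the masked pixels lets the background texture change freely while the text strokes stay tied to Ic.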

Total Objective Function of TANet

# TANet Structure

# Generator

- encoder (style, content -> concat)

Each layer -> leaky ReLU, batch normalization

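| layer | conv1 | conv2 | conv3 | conv4 | conv5 | conv6 | conv7 | conv8 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| filter | 5x5x32 | 5x5x64 | 5x5x128 | 5x5x256 | 5x5x256 | 5x5x256 | 5x5x256 | 5x5x256 |
| output_size | 128x128 | 64x64 | 32x32 | 16x16 | 8x8 | 4x4 | 2x2 | 1x1 |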
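
The output sizes in the encoder table halve at every layer, which is what a 5x5 convolution with stride 2 and padding 2 produces on a 256x256 input. Stride, padding, and input size are not stated in the table, so they are assumptions here. A small check of that arithmetic:

```python
from typing import List


def conv_out(size: int, kernel: int = 5, stride: int = 2, padding: int = 2) -> int:
    """Standard convolution output-size formula (stride/padding values are assumed)."""
    return (size + 2 * padding - kernel) // stride + 1


def encoder_sizes(input_size: int = 256, num_layers: int = 8) -> List[int]:
    """Spatial size after each of the stacked stride-2 convolutions."""
    sizes = []
    size = input_size
    for _ in range(num_layers):
        size = conv_out(size)
        sizes.append(size)
    return sizes


print(encoder_sizes())  # [128, 64, 32, 16, 8, 4, 2, 1]
```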

- decoder

Each layer -> ReLU

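| layer | conv1 | conv2 | conv3 | conv4 | conv5 | conv6 | conv7 | conv8 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| filter | 5x5x512(256x2) | 5x5x512 | 5x5x512 | 5x5x512 | 5x5x256 | 5x5x128 | 5x5x64 | 5x5x32 |
| output_size | 2x2 | 4x4 | 8x8 | 16x16 | 32x32 | 64x64 | 128x128 | 256x256 |
| dropout | 0.5 | 0.5 | 0.5 |  |  |  |  |  |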
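
Putting the two generator tables together, a simplified PyTorch sketch might look like the following. Several things are assumptions rather than facts from this post: 3-channel 256x256 inputs, stride 2 with padding 2, transposed convolutions with `output_padding=1` for upsampling, a leaky ReLU slope of 0.2, and reading the third number in the decoder table as each layer's input width (which fits the 512 = 256x2 concatenated code). Skip connections are left out because the table does not say how they are merged.

```python
import torch
import torch.nn as nn

ENC_CH = [32, 64, 128, 256, 256, 256, 256, 256]  # output widths from the encoder table
DEC_IN = [512, 512, 512, 512, 256, 128, 64, 32]  # decoder table, read as input widths (assumption)


def make_encoder(in_ch: int = 3) -> nn.Sequential:
    layers, prev = [], in_ch
    for ch in ENC_CH:
        # 5x5 conv that halves the resolution, then leaky ReLU and batch norm
        layers += [nn.Conv2d(prev, ch, 5, stride=2, padding=2),
                   nn.LeakyReLU(0.2),
                   nn.BatchNorm2d(ch)]
        prev = ch
    return nn.Sequential(*layers)


def make_decoder(out_ch: int = 3) -> nn.Sequential:
    layers = []
    outs = DEC_IN[1:] + [out_ch]
    last = len(DEC_IN) - 1
    for i, (cin, cout) in enumerate(zip(DEC_IN, outs)):
        # 5x5 transposed conv that doubles the resolution
        layers.append(nn.ConvTranspose2d(cin, cout, 5, stride=2, padding=2, output_padding=1))
        if i == last:
            layers.append(nn.Tanh())             # final output uses tanh
        else:
            layers.append(nn.ReLU())
            if i < 3:
                layers.append(nn.Dropout(0.5))   # dropout on the first three layers
    return nn.Sequential(*layers)


class TANetGenerator(nn.Module):
    """Simplified sketch: two encoders, concatenated latent code, one decoder (no skips)."""

    def __init__(self):
        super().__init__()
        self.content_encoder = make_encoder()
        self.style_encoder = make_encoder()
        self.decoder = make_decoder()

    def forward(self, clean: torch.Tensor, reference: torch.Tensor) -> torch.Tensor:
        content = self.content_encoder(clean)     # (B, 256, 1, 1)
        style = self.style_encoder(reference)     # (B, 256, 1, 1)
        latent = torch.cat([content, style], dim=1)  # (B, 512, 1, 1)
        return self.decoder(latent)
```

Because the bottleneck is 1x1, batch normalization in training mode needs a batch size larger than 1.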

# Discriminator

- Discriminator loss : d1_loss_real + d1_loss_fake

| layer | conv1 | conv2 | conv3 | conv4 | linear |
| --- | --- | --- | --- | --- | --- |
| filter | 5x5x32 | 5x5x64 | 5x5x128 | 5x5x256 |  |
| output_size | 128x128 | 64x64 | 32x32 | 16x16 | sigmoid |
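
The discriminator table can be sketched in PyTorch as four stride-2 convolutions followed by a linear layer and a sigmoid. The 3-channel 256x256 input, stride 2 with padding 2, and the leaky ReLU between convolutions are assumptions, since the table lists only filter sizes and output sizes.

```python
import torch
import torch.nn as nn


class Discriminator(nn.Module):
    """Sketch of the table above: conv1..conv4 -> linear -> sigmoid."""

    def __init__(self, in_ch: int = 3, image_size: int = 256):
        super().__init__()
        channels = [32, 64, 128, 256]
        layers, prev = [], in_ch
        for ch in channels:
            # each 5x5 conv halves the resolution: 256 -> 128 -> 64 -> 32 -> 16
            layers += [nn.Conv2d(prev, ch, 5, stride=2, padding=2), nn.LeakyReLU(0.2)]
            prev = ch
        self.features = nn.Sequential(*layers)
        feat_size = image_size // 2 ** len(channels)   # 16 for a 256x256 input
        self.linear = nn.Linear(prev * feat_size * feat_size, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.features(x).flatten(1)          # (B, 256 * 16 * 16)
        return torch.sigmoid(self.linear(h))     # probability that x is a real noisy image
```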

# Style loss

- Uses a pre-trained VGG16 (max_pooling(1,2,2,1) applied per layer)

- The loss is computed between the gram_matrix values extracted by passing the generator output and the style image through VGG16

| layer | conv1_x | conv2_x | conv3_x | conv4_x | conv5_x |
| --- | --- | --- | --- | --- | --- |
| filters | [3x3, 64]x2 | [3x3, 128]x2 | [3x3, 256]x3 | [3x3, 512]x3 | [3x3, 512]x3 |
| used for gram_matrix | conv1_2 | conv2_2 | conv3_3 | conv4_3 | conv5_3 |

# Content loss

Same as the content loss equation

# BiNet (Binarization Network)

- Converts the images generated by TANet into their clean versions using BiNet

ABOUT ME

tkdrnjss