In computer vision, segmenting an image into separate segments or regions is a crucial operation. This article, “Segment Anything – A Foundation Model for Image Segmentation”, provides an introduction to Attention Res-UNet, a model well suited to delineating the distinct elements within an image.
In this article, we explore the idea of a foundation model designed for image segmentation. We cover its architecture, walk through its implementation in stages such as data preparation, model creation, training, and prediction, discuss the metrics used to evaluate its performance, and offer examples that illustrate its use across different fields.
Overview of Image Segmentation
The significance of image segmentation extends beyond mere visual understanding, permeating diverse domains and industries. In medical imaging, for instance, segmentation plays a pivotal role in delineating anatomical structures, identifying lesions, and assisting in disease diagnosis and treatment planning. Similarly, segmentation supports urban planning, environmental monitoring, and land cover classification in satellite image analysis. Furthermore, precise environment segmentation is essential for path planning, obstacle detection, and scene comprehension in the context of autonomous driving.
The growth of image segmentation techniques has been closely tied to the development of deep learning ever since the introduction of convolutional neural networks (CNNs). With their extraordinary ability to capture the complex spatial relationships and hierarchical representations found in images, these deep-learning architectures have transformed the field, enabling researchers and practitioners to achieve unprecedented levels of precision and efficiency in segmentation tasks across numerous domains.
What is Attention Res-UNet?
Attention ResUNet is an advanced neural network architecture for high-precision image segmentation, particularly in medical imaging. It integrates the strengths of U-Net’s encoder-decoder structure, ResNet’s residual learning, and attention mechanisms to enhance segmentation accuracy and efficiency. The residual blocks facilitate training deeper networks by maintaining gradient flow, while attention gates focus on relevant image regions, improving feature representation. This combination allows Attention ResUNet to deliver superior performance in tasks such as tumor detection, organ segmentation, and retinal vessel segmentation, making it a powerful tool for complex segmentation challenges.
Image Segmentation: Stepwise Implementation
Implementing a foundation model for image segmentation requires a methodical process composed of several crucial steps. A thorough explanation of the implementation procedure is provided below:
Step 1: Import necessary libraries
Python
import pandas as pd
import numpy as np
import os
import tensorflow as tf
import cv2
import matplotlib.pyplot as plt
import glob
from tqdm import tqdm
from skimage.io import imread, imshow
from skimage.transform import resize
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Input, Conv2D, Conv2DTranspose,
BatchNormalization, Activation,
MaxPooling2D, UpSampling2D,
Concatenate, Dropout, Lambda)
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras import backend as K
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Model
Step 2: Download and Extract Mask Images
Python
%%capture
# Download and Extract Mask Image Dataset
!wget https://dataverse.harvard.edu/api/access/datafile/3838943 -O mask_imgs.zip
!mkdir masks
!unzip mask_imgs.zip -d /content/masks
# Download and Extract Image Dataset Part 1
!wget https://dataverse.harvard.edu/api/access/datafile/3172585 -O imgs_1.zip
!mkdir imgs
!unzip imgs_1.zip -d /content/imgs
# Download and Extract Image Dataset Part 2
!wget https://dataverse.harvard.edu/api/access/datafile/3172584 -O imgs_2.zip
!unzip imgs_2.zip -d /content/imgs
Step 3: Load Mask Images and Image Dataset
Load Mask Images
Python
mask_img_list = os.listdir(
    '/content/masks/HAM10000_segmentations_lesion_tschandl')
df_mask_images = pd.DataFrame(mask_img_list, columns=['image_id'])
print('Mask Image Dataset Size: ', df_mask_images.size)
print(df_mask_images.sample(5))
Output:
Mask Image Dataset Size:  10015
                           image_id
856   ISIC_0026955_segmentation.png
8481  ISIC_0029327_segmentation.png
3663  ISIC_0031816_segmentation.png
5760  ISIC_0029516_segmentation.png
141   ISIC_0030870_segmentation.png
Load Image Dataset
Python
img_list = os.listdir('/content/imgs/')
df_images = pd.DataFrame(img_list, columns=['image_id'])
print('Image Dataset Size: ', df_images.size)
print(df_images.sample(5))
Output:
Image Dataset Size:  5000
              image_id
658   ISIC_0027306.jpg
3046  ISIC_0027958.jpg
153   ISIC_0026017.jpg
2846  ISIC_0027431.jpg
2056  ISIC_0026607.jpg
Step 4: Preprocessing
Load and Resize Images and Masks
Python
img_bad = []
mask_bad = []
not_found_imgs = []
start_val = 24306
i = start_val + 1
size = start_val + 1000
while i <= size:
    num = str(i)
    zeroes = 7 - len(num)
    mask_path = f'/content/masks/HAM10000_segmentations_lesion_tschandl/ISIC_{zeroes * "0"}{num}_segmentation.png'
    mask = cv2.imread(mask_path)
    if type(mask) is np.ndarray:
        mask = cv2.resize(mask, (128, 128))
        mask_bad.append(mask)
    else:
        not_found_imgs.append(i)
    img_path = f'/content/imgs/ISIC_{zeroes * "0"}{num}.jpg'
    img = cv2.imread(img_path)
    if type(img) is np.ndarray:
        img = cv2.resize(img, (128, 128))
        img_bad.append(img)
    # print(i)
    i += 1
mask_bad = np.array(mask_bad)
img_bad = np.array(img_bad)
print('Loaded Images:', len(img_bad))
print('Images Not Found:', not_found_imgs)
Output:
Loaded Images: 1000
Images Not Found: []
Display Image and Mask
Python
def RGBimshow(img):
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    plt.imshow(img)

i = 135
plt.rcParams['figure.figsize'] = [10, 5]
plt.figure(1)
plt.subplot(1, 2, 1)
RGBimshow(img_bad[i])
plt.figure(2)
plt.subplot(1, 2, 1)
plt.rcParams['figure.figsize'] = [10, 5]
plt.imshow(mask_bad[i])
Output:

Step 5: Model Building
- Model Architecture: Combines the U-Net topology with residual connections and attention mechanisms.
- Encoder Block: Each of the four encoder layers combines a residual convolutional block with max pooling for downsampling.
- Decoder Block: Each of the four decoder layers combines an attention block, upsampling, and a residual convolutional block.
- Output Layer: A final 1×1 convolution followed by batch normalization and sigmoid activation produces the output segmentation map.
The Attention Residual U-Net model is then trained on the prepared training data.
Define Helper Functions
Python
# Repeats the elements of a tensor along the specified axis
def repeat_elem(tensor, rep):
    # Lambda layer that repeats the elements of a tensor along axis 3 by a factor of rep.
    # If tensor has shape (None, 256, 256, 3), it returns a tensor of shape (None, 256, 256, 6)
    # for rep=2.
    return Lambda(lambda x, repnum: K.repeat_elements(x, repnum, axis=3),
                  arguments={'repnum': rep})(tensor)

# Residual convolutional block with optional batch normalization and dropout
def res_conv_block(x, filter_size, size, dropout, batch_norm=False):
    # First convolutional layer
    conv = Conv2D(size, (filter_size, filter_size), padding='same')(x)
    if batch_norm:
        conv = BatchNormalization(axis=3)(conv)
    conv = Activation('relu')(conv)
    # Second convolutional layer
    conv = Conv2D(size, (filter_size, filter_size), padding='same')(conv)
    if batch_norm:
        conv = BatchNormalization(axis=3)(conv)
    if dropout > 0:
        conv = Dropout(dropout)(conv)
    # Shortcut connection to the input
    shortcut = Conv2D(size, kernel_size=(1, 1), padding='same')(x)
    if batch_norm:
        shortcut = BatchNormalization(axis=3)(shortcut)
    # Adding the shortcut and the conv output
    res_path = tf.keras.layers.add([shortcut, conv])
    res_path = Activation('relu')(res_path)
    return res_path

# Creates a gating signal with optional batch normalization
def gating_signal(input, out_size, batch_norm=False):
    x = Conv2D(out_size, (1, 1), padding='same')(input)
    if batch_norm:
        x = BatchNormalization()(x)
    x = Activation('relu')(x)
    return x
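Before wiring these helpers into the full network, it can help to sanity-check their output shapes. The snippet below is a minimal, illustrative check (it assumes the imports and helper definitions above have already been run; the 32×32 tensor sizes are arbitrary):
Python
# Illustrative shape checks for the helpers above (not part of the pipeline).
inp = Input((32, 32, 3))
out = res_conv_block(inp, filter_size=3, size=16, dropout=0.1, batch_norm=True)
print(Model(inp, out).output_shape)       # expected: (None, 32, 32, 16)

small = Input((32, 32, 1))
rep = repeat_elem(small, 3)               # repeat along the channel axis
print(Model(small, rep).output_shape)     # expected: (None, 32, 32, 3)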
Attention Block
The attention block enhances feature maps from the encoder using a gating signal from the decoder.
Python
# Attention block for enhancing feature maps from the encoder using a gating signal from the decoder
def attention_block(x, gating, inter_shape):
    shape_x = K.int_shape(x)
    shape_g = K.int_shape(gating)
    # Downsample the input tensor x
    theta_x = Conv2D(inter_shape, (2, 2), strides=(2, 2), padding='same')(x)
    shape_theta_x = K.int_shape(theta_x)
    # Apply 1x1 convolution to the gating signal
    phi_g = Conv2D(inter_shape, (1, 1), padding='same')(gating)
    upsample_g = Conv2DTranspose(inter_shape, (3, 3),
                                 strides=(shape_theta_x[1] // shape_g[1],
                                          shape_theta_x[2] // shape_g[2]),
                                 padding='same')(phi_g)
    # Add the downsampled input tensor and the upsampled gating signal
    concat_xg = tf.keras.layers.add([upsample_g, theta_x])
    act_xg = Activation('relu')(concat_xg)
    # Apply a 1x1 convolution followed by a sigmoid activation
    psi = Conv2D(1, (1, 1), padding='same')(act_xg)
    sigmoid_xg = Activation('sigmoid')(psi)
    shape_sigmoid = K.int_shape(sigmoid_xg)
    # Upsample the attention map
    upsample_psi = UpSampling2D(size=(shape_x[1] // shape_sigmoid[1],
                                      shape_x[2] // shape_sigmoid[2]))(sigmoid_xg)
    # Repeat the attention map along the channel dimension
    upsample_psi = repeat_elem(upsample_psi, shape_x[3])
    # Multiply the input tensor by the attention map
    y = tf.keras.layers.multiply([upsample_psi, x])
    # Apply a 1x1 convolution and batch normalization
    result = Conv2D(shape_x[3], (1, 1), padding='same')(y)
    result_bn = BatchNormalization()(result)
    return result_bn
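The block above is easiest to understand through its shapes: a coarser decoder feature map gates a finer encoder feature map, and the output keeps the encoder map's spatial size and channel count. The following minimal sketch (with arbitrarily chosen 16×16 and 8×8 feature maps, relying on the functions defined above) traces those shapes:
Python
# Illustrative only: a 16x16 encoder skip connection gated by an 8x8 decoder signal.
skip = Input((16, 16, 64))                   # encoder feature map (skip connection)
coarse = Input((8, 8, 128))                  # coarser decoder feature map
gate = gating_signal(coarse, 64, batch_norm=True)
attended = attention_block(skip, gate, 64)
print(Model([skip, coarse], attended).output_shape)   # expected: (None, 16, 16, 64)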
Encoder Block
The encoder block consists of a residual convolutional block followed by a max pooling layer.
Python
# Encoder block consisting of a residual convolutional block and a max pooling layer
def encoder_block(inputs, filter_size, filter_num, dropout_rate, batch_norm):
    conv = res_conv_block(inputs, filter_size, filter_num,
                          dropout_rate, batch_norm)
    pool = MaxPooling2D(pool_size=(2, 2))(conv)
    return conv, pool
Decoder Block
Python
# Decoder block consisting of upsampling, attention mechanism, concatenation, and a residual convolutional block
def decoder_block(input, conv, filter_size, filter_num, dropout_rate, batch_norm, up_samp_size, axis):
    # Create a gating signal from the input
    gating = gating_signal(input, filter_num, batch_norm)
    # Create an attention block using the gating signal and the corresponding encoder output
    att = attention_block(conv, gating, filter_num)
    # Upsample the input
    up = UpSampling2D(size=(up_samp_size, up_samp_size),
                      data_format="channels_last")(input)
    # Concatenate the upsampled input with the attention output
    up = Concatenate(axis=axis)([up, att])
    # Apply a residual convolutional block to the concatenated output
    up_conv = res_conv_block(up, filter_size, filter_num, dropout_rate, batch_norm)
    return up_conv
Define Attention ResUNet Model
Python
def Attention_Res_UNet(input_shape, NUM_CLASSES=1, dropout_rate=0.0, batch_norm=True):
    FILTER_NUM = 64    # number of basic filters for the first layer
    FILTER_SIZE = 3    # size of the convolutional filter
    UP_SAMP_SIZE = 2   # size of upsampling filters
    inputs = Input(input_shape, dtype=tf.float32)
    axis = 3
    # Downsampling layers (Encoder Block)
    conv_128, pool_64 = encoder_block(
        inputs, FILTER_SIZE, FILTER_NUM, dropout_rate, batch_norm)
    conv_64, pool_32 = encoder_block(
        pool_64, FILTER_SIZE, 2*FILTER_NUM, dropout_rate, batch_norm)
    conv_32, pool_16 = encoder_block(
        pool_32, FILTER_SIZE, 4*FILTER_NUM, dropout_rate, batch_norm)
    conv_16, pool_8 = encoder_block(
        pool_16, FILTER_SIZE, 8*FILTER_NUM, dropout_rate, batch_norm)
    conv_8 = res_conv_block(
        pool_8, FILTER_SIZE, 16*FILTER_NUM, dropout_rate, batch_norm)
    # Upsampling layers (Decoder Block)
    up_conv_16 = decoder_block(conv_8, conv_16, FILTER_SIZE,
                               8*FILTER_NUM, dropout_rate, batch_norm, UP_SAMP_SIZE, axis)
    up_conv_32 = decoder_block(up_conv_16, conv_32, FILTER_SIZE,
                               4*FILTER_NUM, dropout_rate, batch_norm, UP_SAMP_SIZE, axis)
    up_conv_64 = decoder_block(up_conv_32, conv_64, FILTER_SIZE,
                               2*FILTER_NUM, dropout_rate, batch_norm, UP_SAMP_SIZE, axis)
    up_conv_128 = decoder_block(up_conv_64, conv_128, FILTER_SIZE,
                                FILTER_NUM, dropout_rate, batch_norm, UP_SAMP_SIZE, axis)
    # 1x1 convolutional layer
    conv_final = Conv2D(NUM_CLASSES, kernel_size=(1, 1))(up_conv_128)
    conv_final = BatchNormalization(axis=axis)(conv_final)
    conv_final = Activation('sigmoid')(conv_final)
    model = Model(inputs, conv_final, name="AttentionResUNet")
    return model
Model Summary
Python
input_shape = (128,128,3)
model = Attention_Res_UNet(input_shape)
model.summary()
Output:
Model: "AttentionResUNet" ┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ Connected to ┃ ┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩ │ input_layer_2 │ (None, 128, 128, │ 0 │ - │ │ (InputLayer) │ 3) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_96 (Conv2D) │ (None, 128, 128, │ 1,792 │ input_layer_2[0]… │ │ │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 128, 128, │ 256 │ conv2d_96[0][0] │ │ (BatchNormalizatio… │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ activation_62 │ (None, 128, 128, │ 0 │ batch_normalizat… │ │ (Activation) │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_98 (Conv2D) │ (None, 128, 128, │ 256 │ input_layer_2[0]… │ │ │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_97 (Conv2D) │ (None, 128, 128, │ 36,928 │ activation_62[0]… │ │ │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 128, 128, │ 256 │ conv2d_98[0][0] │ │ (BatchNormalizatio… │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 128, 128, │ 256 │ conv2d_97[0][0] │ │ (BatchNormalizatio… │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ add_26 (Add) │ (None, 128, 128, │ 0 │ batch_normalizat… │ │ │ 64) │ │ batch_normalizat… │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ activation_63 │ (None, 128, 128, │ 0 │ add_26[0][0] │ │ (Activation) │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ max_pooling2d_8 │ (None, 64, 64, │ 0 │ activation_63[0]… │ │ (MaxPooling2D) │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_99 (Conv2D) │ (None, 64, 64, │ 73,856 │ max_pooling2d_8[… │ │ │ 128) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 64, 64, │ 512 │ conv2d_99[0][0] │ │ (BatchNormalizatio… │ 128) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ activation_64 │ (None, 64, 64, │ 0 │ batch_normalizat… │ │ (Activation) │ 128) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_101 (Conv2D) │ (None, 64, 64, │ 8,320 │ max_pooling2d_8[… │ │ │ 128) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_100 (Conv2D) │ (None, 64, 64, │ 147,584 │ activation_64[0]… │ │ │ 128) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 64, 64, │ 512 │ conv2d_101[0][0] │ │ (BatchNormalizatio… │ 128) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 64, 64, │ 512 │ conv2d_100[0][0] │ │ (BatchNormalizatio… │ 128) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ add_27 (Add) │ (None, 64, 64, │ 0 │ batch_normalizat… │ │ │ 128) │ │ batch_normalizat… │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ activation_65 │ (None, 64, 64, │ 0 │ add_27[0][0] │ │ (Activation) │ 128) │ │ │ 
├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ max_pooling2d_9 │ (None, 32, 32, │ 0 │ activation_65[0]… │ │ (MaxPooling2D) │ 128) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_102 (Conv2D) │ (None, 32, 32, │ 295,168 │ max_pooling2d_9[… │ │ │ 256) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 32, 32, │ 1,024 │ conv2d_102[0][0] │ │ (BatchNormalizatio… │ 256) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ activation_66 │ (None, 32, 32, │ 0 │ batch_normalizat… │ │ (Activation) │ 256) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_104 (Conv2D) │ (None, 32, 32, │ 33,024 │ max_pooling2d_9[… │ │ │ 256) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_103 (Conv2D) │ (None, 32, 32, │ 590,080 │ activation_66[0]… │ │ │ 256) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 32, 32, │ 1,024 │ conv2d_104[0][0] │ │ (BatchNormalizatio… │ 256) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 32, 32, │ 1,024 │ conv2d_103[0][0] │ │ (BatchNormalizatio… │ 256) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ add_28 (Add) │ (None, 32, 32, │ 0 │ batch_normalizat… │ │ │ 256) │ │ batch_normalizat… │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ activation_67 │ (None, 32, 32, │ 0 │ add_28[0][0] │ │ (Activation) │ 256) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ max_pooling2d_10 │ (None, 16, 16, │ 0 │ activation_67[0]… │ │ (MaxPooling2D) │ 256) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_105 (Conv2D) │ (None, 16, 16, │ 1,180,160 │ max_pooling2d_10… │ │ │ 512) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 16, 16, │ 2,048 │ conv2d_105[0][0] │ │ (BatchNormalizatio… │ 512) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ activation_68 │ (None, 16, 16, │ 0 │ batch_normalizat… │ │ (Activation) │ 512) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_107 (Conv2D) │ (None, 16, 16, │ 131,584 │ max_pooling2d_10… │ │ │ 512) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_106 (Conv2D) │ (None, 16, 16, │ 2,359,808 │ activation_68[0]… │ │ │ 512) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 16, 16, │ 2,048 │ conv2d_107[0][0] │ │ (BatchNormalizatio… │ 512) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 16, 16, │ 2,048 │ conv2d_106[0][0] │ │ (BatchNormalizatio… │ 512) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ add_29 (Add) │ (None, 16, 16, │ 0 │ batch_normalizat… │ │ │ 512) │ │ batch_normalizat… │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ activation_69 │ (None, 16, 16, │ 0 │ add_29[0][0] │ │ (Activation) │ 512) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ max_pooling2d_11 │ (None, 8, 8, 512) │ 0 │ activation_69[0]… │ │ (MaxPooling2D) │ │ │ │ 
├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_108 (Conv2D) │ (None, 8, 8, │ 4,719,616 │ max_pooling2d_11… │ │ │ 1024) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 8, 8, │ 4,096 │ conv2d_108[0][0] │ │ (BatchNormalizatio… │ 1024) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ activation_70 │ (None, 8, 8, │ 0 │ batch_normalizat… │ │ (Activation) │ 1024) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_110 (Conv2D) │ (None, 8, 8, │ 525,312 │ max_pooling2d_11… │ │ │ 1024) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_109 (Conv2D) │ (None, 8, 8, │ 9,438,208 │ activation_70[0]… │ │ │ 1024) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 8, 8, │ 4,096 │ conv2d_110[0][0] │ │ (BatchNormalizatio… │ 1024) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 8, 8, │ 4,096 │ conv2d_109[0][0] │ │ (BatchNormalizatio… │ 1024) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ add_30 (Add) │ (None, 8, 8, │ 0 │ batch_normalizat… │ │ │ 1024) │ │ batch_normalizat… │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ activation_71 │ (None, 8, 8, │ 0 │ add_30[0][0] │ │ (Activation) │ 1024) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_111 (Conv2D) │ (None, 8, 8, 512) │ 524,800 │ activation_71[0]… │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 8, 8, 512) │ 2,048 │ conv2d_111[0][0] │ │ (BatchNormalizatio… │ │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ activation_72 │ (None, 8, 8, 512) │ 0 │ batch_normalizat… │ │ (Activation) │ │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_113 (Conv2D) │ (None, 8, 8, 512) │ 262,656 │ activation_72[0]… │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_transpose_8 │ (None, 8, 8, 512) │ 2,359,808 │ conv2d_113[0][0] │ │ (Conv2DTranspose) │ │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_112 (Conv2D) │ (None, 8, 8, 512) │ 1,049,088 │ activation_69[0]… │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ add_31 (Add) │ (None, 8, 8, 512) │ 0 │ conv2d_transpose… │ │ │ │ │ conv2d_112[0][0] │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ activation_73 │ (None, 8, 8, 512) │ 0 │ add_31[0][0] │ │ (Activation) │ │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_114 (Conv2D) │ (None, 8, 8, 1) │ 513 │ activation_73[0]… │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ activation_74 │ (None, 8, 8, 1) │ 0 │ conv2d_114[0][0] │ │ (Activation) │ │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ up_sampling2d_16 │ (None, 16, 16, 1) │ 0 │ activation_74[0]… │ │ (UpSampling2D) │ │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ lambda_8 (Lambda) │ (None, 16, 16, │ 0 │ up_sampling2d_16… │ │ │ 512) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ multiply_8 │ (None, 16, 16, │ 0 │ 
lambda_8[0][0], │ │ (Multiply) │ 512) │ │ activation_69[0]… │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_115 (Conv2D) │ (None, 16, 16, │ 262,656 │ multiply_8[0][0] │ │ │ 512) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ up_sampling2d_17 │ (None, 16, 16, │ 0 │ activation_71[0]… │ │ (UpSampling2D) │ 1024) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 16, 16, │ 2,048 │ conv2d_115[0][0] │ │ (BatchNormalizatio… │ 512) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ concatenate_8 │ (None, 16, 16, │ 0 │ up_sampling2d_17… │ │ (Concatenate) │ 1536) │ │ batch_normalizat… │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_116 (Conv2D) │ (None, 16, 16, │ 7,078,400 │ concatenate_8[0]… │ │ │ 512) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 16, 16, │ 2,048 │ conv2d_116[0][0] │ │ (BatchNormalizatio… │ 512) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ activation_75 │ (None, 16, 16, │ 0 │ batch_normalizat… │ │ (Activation) │ 512) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_118 (Conv2D) │ (None, 16, 16, │ 786,944 │ concatenate_8[0]… │ │ │ 512) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_117 (Conv2D) │ (None, 16, 16, │ 2,359,808 │ activation_75[0]… │ │ │ 512) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 16, 16, │ 2,048 │ conv2d_118[0][0] │ │ (BatchNormalizatio… │ 512) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 16, 16, │ 2,048 │ conv2d_117[0][0] │ │ (BatchNormalizatio… │ 512) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ add_32 (Add) │ (None, 16, 16, │ 0 │ batch_normalizat… │ │ │ 512) │ │ batch_normalizat… │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ activation_76 │ (None, 16, 16, │ 0 │ add_32[0][0] │ │ (Activation) │ 512) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_119 (Conv2D) │ (None, 16, 16, │ 131,328 │ activation_76[0]… │ │ │ 256) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 16, 16, │ 1,024 │ conv2d_119[0][0] │ │ (BatchNormalizatio… │ 256) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ activation_77 │ (None, 16, 16, │ 0 │ batch_normalizat… │ │ (Activation) │ 256) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_121 (Conv2D) │ (None, 16, 16, │ 65,792 │ activation_77[0]… │ │ │ 256) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_transpose_9 │ (None, 16, 16, │ 590,080 │ conv2d_121[0][0] │ │ (Conv2DTranspose) │ 256) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_120 (Conv2D) │ (None, 16, 16, │ 262,400 │ activation_67[0]… │ │ │ 256) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ add_33 (Add) │ (None, 16, 16, │ 0 │ conv2d_transpose… │ │ │ 256) │ │ conv2d_120[0][0] │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ activation_78 │ 
(None, 16, 16, │ 0 │ add_33[0][0] │ │ (Activation) │ 256) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_122 (Conv2D) │ (None, 16, 16, 1) │ 257 │ activation_78[0]… │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ activation_79 │ (None, 16, 16, 1) │ 0 │ conv2d_122[0][0] │ │ (Activation) │ │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ up_sampling2d_18 │ (None, 32, 32, 1) │ 0 │ activation_79[0]… │ │ (UpSampling2D) │ │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ lambda_9 (Lambda) │ (None, 32, 32, │ 0 │ up_sampling2d_18… │ │ │ 256) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ multiply_9 │ (None, 32, 32, │ 0 │ lambda_9[0][0], │ │ (Multiply) │ 256) │ │ activation_67[0]… │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_123 (Conv2D) │ (None, 32, 32, │ 65,792 │ multiply_9[0][0] │ │ │ 256) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ up_sampling2d_19 │ (None, 32, 32, │ 0 │ activation_76[0]… │ │ (UpSampling2D) │ 512) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 32, 32, │ 1,024 │ conv2d_123[0][0] │ │ (BatchNormalizatio… │ 256) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ concatenate_9 │ (None, 32, 32, │ 0 │ up_sampling2d_19… │ │ (Concatenate) │ 768) │ │ batch_normalizat… │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_124 (Conv2D) │ (None, 32, 32, │ 1,769,728 │ concatenate_9[0]… │ │ │ 256) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 32, 32, │ 1,024 │ conv2d_124[0][0] │ │ (BatchNormalizatio… │ 256) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ activation_80 │ (None, 32, 32, │ 0 │ batch_normalizat… │ │ (Activation) │ 256) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_126 (Conv2D) │ (None, 32, 32, │ 196,864 │ concatenate_9[0]… │ │ │ 256) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_125 (Conv2D) │ (None, 32, 32, │ 590,080 │ activation_80[0]… │ │ │ 256) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 32, 32, │ 1,024 │ conv2d_126[0][0] │ │ (BatchNormalizatio… │ 256) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 32, 32, │ 1,024 │ conv2d_125[0][0] │ │ (BatchNormalizatio… │ 256) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ add_34 (Add) │ (None, 32, 32, │ 0 │ batch_normalizat… │ │ │ 256) │ │ batch_normalizat… │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ activation_81 │ (None, 32, 32, │ 0 │ add_34[0][0] │ │ (Activation) │ 256) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_127 (Conv2D) │ (None, 32, 32, │ 32,896 │ activation_81[0]… │ │ │ 128) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 32, 32, │ 512 │ conv2d_127[0][0] │ │ (BatchNormalizatio… │ 128) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ activation_82 │ (None, 32, 32, │ 0 │ 
batch_normalizat… │ │ (Activation) │ 128) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_129 (Conv2D) │ (None, 32, 32, │ 16,512 │ activation_82[0]… │ │ │ 128) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_transpose_10 │ (None, 32, 32, │ 147,584 │ conv2d_129[0][0] │ │ (Conv2DTranspose) │ 128) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_128 (Conv2D) │ (None, 32, 32, │ 65,664 │ activation_65[0]… │ │ │ 128) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ add_35 (Add) │ (None, 32, 32, │ 0 │ conv2d_transpose… │ │ │ 128) │ │ conv2d_128[0][0] │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ activation_83 │ (None, 32, 32, │ 0 │ add_35[0][0] │ │ (Activation) │ 128) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_130 (Conv2D) │ (None, 32, 32, 1) │ 129 │ activation_83[0]… │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ activation_84 │ (None, 32, 32, 1) │ 0 │ conv2d_130[0][0] │ │ (Activation) │ │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ up_sampling2d_20 │ (None, 64, 64, 1) │ 0 │ activation_84[0]… │ │ (UpSampling2D) │ │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ lambda_10 (Lambda) │ (None, 64, 64, │ 0 │ up_sampling2d_20… │ │ │ 128) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ multiply_10 │ (None, 64, 64, │ 0 │ lambda_10[0][0], │ │ (Multiply) │ 128) │ │ activation_65[0]… │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_131 (Conv2D) │ (None, 64, 64, │ 16,512 │ multiply_10[0][0] │ │ │ 128) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ up_sampling2d_21 │ (None, 64, 64, │ 0 │ activation_81[0]… │ │ (UpSampling2D) │ 256) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 64, 64, │ 512 │ conv2d_131[0][0] │ │ (BatchNormalizatio… │ 128) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ concatenate_10 │ (None, 64, 64, │ 0 │ up_sampling2d_21… │ │ (Concatenate) │ 384) │ │ batch_normalizat… │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_132 (Conv2D) │ (None, 64, 64, │ 442,496 │ concatenate_10[0… │ │ │ 128) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 64, 64, │ 512 │ conv2d_132[0][0] │ │ (BatchNormalizatio… │ 128) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ activation_85 │ (None, 64, 64, │ 0 │ batch_normalizat… │ │ (Activation) │ 128) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_134 (Conv2D) │ (None, 64, 64, │ 49,280 │ concatenate_10[0… │ │ │ 128) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_133 (Conv2D) │ (None, 64, 64, │ 147,584 │ activation_85[0]… │ │ │ 128) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 64, 64, │ 512 │ conv2d_134[0][0] │ │ (BatchNormalizatio… │ 128) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 64, 64, │ 512 │ conv2d_133[0][0] │ │ (BatchNormalizatio… │ 
128) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ add_36 (Add) │ (None, 64, 64, │ 0 │ batch_normalizat… │ │ │ 128) │ │ batch_normalizat… │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ activation_86 │ (None, 64, 64, │ 0 │ add_36[0][0] │ │ (Activation) │ 128) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_135 (Conv2D) │ (None, 64, 64, │ 8,256 │ activation_86[0]… │ │ │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 64, 64, │ 256 │ conv2d_135[0][0] │ │ (BatchNormalizatio… │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ activation_87 │ (None, 64, 64, │ 0 │ batch_normalizat… │ │ (Activation) │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_137 (Conv2D) │ (None, 64, 64, │ 4,160 │ activation_87[0]… │ │ │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_transpose_11 │ (None, 64, 64, │ 36,928 │ conv2d_137[0][0] │ │ (Conv2DTranspose) │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_136 (Conv2D) │ (None, 64, 64, │ 16,448 │ activation_63[0]… │ │ │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ add_37 (Add) │ (None, 64, 64, │ 0 │ conv2d_transpose… │ │ │ 64) │ │ conv2d_136[0][0] │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ activation_88 │ (None, 64, 64, │ 0 │ add_37[0][0] │ │ (Activation) │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_138 (Conv2D) │ (None, 64, 64, 1) │ 65 │ activation_88[0]… │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ activation_89 │ (None, 64, 64, 1) │ 0 │ conv2d_138[0][0] │ │ (Activation) │ │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ up_sampling2d_22 │ (None, 128, 128, │ 0 │ activation_89[0]… │ │ (UpSampling2D) │ 1) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ lambda_11 (Lambda) │ (None, 128, 128, │ 0 │ up_sampling2d_22… │ │ │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ multiply_11 │ (None, 128, 128, │ 0 │ lambda_11[0][0], │ │ (Multiply) │ 64) │ │ activation_63[0]… │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_139 (Conv2D) │ (None, 128, 128, │ 4,160 │ multiply_11[0][0] │ │ │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ up_sampling2d_23 │ (None, 128, 128, │ 0 │ activation_86[0]… │ │ (UpSampling2D) │ 128) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 128, 128, │ 256 │ conv2d_139[0][0] │ │ (BatchNormalizatio… │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ concatenate_11 │ (None, 128, 128, │ 0 │ up_sampling2d_23… │ │ (Concatenate) │ 192) │ │ batch_normalizat… │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_140 (Conv2D) │ (None, 128, 128, │ 110,656 │ concatenate_11[0… │ │ │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 128, 128, │ 256 │ conv2d_140[0][0] │ │ (BatchNormalizatio… │ 64) │ │ │ 
├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ activation_90 │ (None, 128, 128, │ 0 │ batch_normalizat… │ │ (Activation) │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_142 (Conv2D) │ (None, 128, 128, │ 12,352 │ concatenate_11[0… │ │ │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_141 (Conv2D) │ (None, 128, 128, │ 36,928 │ activation_90[0]… │ │ │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 128, 128, │ 256 │ conv2d_142[0][0] │ │ (BatchNormalizatio… │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 128, 128, │ 256 │ conv2d_141[0][0] │ │ (BatchNormalizatio… │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ add_38 (Add) │ (None, 128, 128, │ 0 │ batch_normalizat… │ │ │ 64) │ │ batch_normalizat… │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ activation_91 │ (None, 128, 128, │ 0 │ add_38[0][0] │ │ (Activation) │ 64) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ conv2d_143 (Conv2D) │ (None, 128, 128, │ 65 │ activation_91[0]… │ │ │ 1) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ batch_normalizatio… │ (None, 128, 128, │ 4 │ conv2d_143[0][0] │ │ (BatchNormalizatio… │ 1) │ │ │ ├─────────────────────┼───────────────────┼────────────┼───────────────────┤ │ activation_92 │ (None, 128, 128, │ 0 │ batch_normalizat… │ │ (Activation) │ 1) │ │ │ └─────────────────────┴───────────────────┴────────────┴───────────────────┘ Total params: 39,090,377 (149.12 MB) Trainable params: 39,068,871 (149.04 MB) Non-trainable params: 21,506 (84.01 KB) Step 6: Model TrainingPrepare Data for Training
Python
s = img_bad.shape[0]
img = []
mask = []
y = []
for i in range(s):
    try:
        img.append(img_bad[i])
        mask.append(mask_bad[i][:, :, 0:1])
        y.append([0, 1, 0])
    except:
        pass
img = np.array(img)
mask = np.array(mask)
mask = mask.astype(bool)
y = np.array(y)
mask.shape
Output:
(1000, 128, 128, 1)
Define Callbacks and Compile Model
Python
call = EarlyStopping(monitor='val_accuracy', patience=5,
                     restore_best_weights=True)
arr = []

# Stores the prediction for one sample at the end of every epoch
class CustomCallback(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        arr.append(self.model.predict(img[1:2]).reshape(128, 128))

# Note: 'val_accuracy' is only reported when validation data is passed to fit()
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
              loss='binary_crossentropy', metrics=['accuracy'])
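Since EarlyStopping monitors val_accuracy, it only has an effect when validation data is supplied. The training run shown below fits on the full set without a validation split, so the callback has nothing to monitor. A hedged variant that holds out part of the data (an assumption for illustration, not the configuration that produced the logs below) could look like this:
Python
# Optional variant: reserve 10% of the samples for validation so that
# EarlyStopping's 'val_accuracy' monitor has a value to track.
history = model.fit(x=img, y=mask,
                    epochs=50,
                    validation_split=0.1,
                    callbacks=[call, CustomCallback()])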
Train Model
Python
history = model.fit(x=img, y=mask, epochs=50, callbacks=[call])
Output:
Epoch 1/50  32/32 [==============================] - 89s 962ms/step - loss: 0.4492 - accuracy: 0.8589
Epoch 2/50  32/32 [==============================] - 22s 676ms/step - loss: 0.3339 - accuracy: 0.9085
Epoch 3/50  32/32 [==============================] - 22s 678ms/step - loss: 0.2804 - accuracy: 0.9188
Epoch 4/50  32/32 [==============================] - 22s 679ms/step - loss: 0.2390 - accuracy: 0.9282
Epoch 5/50  32/32 [==============================] - 22s 684ms/step - loss: 0.2261 - accuracy: 0.9261
Epoch 6/50  32/32 [==============================] - 22s 690ms/step - loss: 0.2024 - accuracy: 0.9345
Epoch 7/50  32/32 [==============================] - 22s 692ms/step - loss: 0.1948 - accuracy: 0.9338
Epoch 8/50  32/32 [==============================] - 22s 693ms/step - loss: 0.1869 - accuracy: 0.9359
Epoch 9/50  32/32 [==============================] - 22s 694ms/step - loss: 0.1766 - accuracy: 0.9385
Epoch 10/50  32/32 [==============================] - 22s 697ms/step - loss: 0.1765 - accuracy: 0.9373
Epoch 11/50  32/32 [==============================] - 22s 700ms/step - loss: 0.1702 - accuracy: 0.9388
Epoch 12/50  32/32 [==============================] - 22s 702ms/step - loss: 0.1637 - accuracy: 0.9403
Epoch 13/50  32/32 [==============================] - 23s 705ms/step - loss: 0.1560 - accuracy: 0.9432
Epoch 14/50  32/32 [==============================] - 22s 699ms/step - loss: 0.1505 - accuracy: 0.9458
Epoch 15/50  32/32 [==============================] - 22s 700ms/step - loss: 0.1451 - accuracy: 0.9467
Epoch 16/50  32/32 [==============================] - 22s 702ms/step - loss: 0.1444 - accuracy: 0.9465
Epoch 17/50  32/32 [==============================] - 22s 700ms/step - loss: 0.1419 - accuracy: 0.9474
Epoch 18/50  32/32 [==============================] - 22s 698ms/step - loss: 0.1366 - accuracy: 0.9488
Epoch 19/50  32/32 [==============================] - 22s 701ms/step - loss: 0.1317 - accuracy: 0.9502
Epoch 20/50  32/32 [==============================] - 22s 701ms/step - loss: 0.1356 - accuracy: 0.9475
Epoch 21/50  32/32 [==============================] - 22s 700ms/step - loss: 0.1271 - accuracy: 0.9518
Epoch 22/50  32/32 [==============================] - 22s 699ms/step - loss: 0.1206 - accuracy: 0.9541
Epoch 23/50  32/32 [==============================] - 22s 700ms/step - loss: 0.1293 - accuracy: 0.9506
Epoch 24/50  32/32 [==============================] - 22s 701ms/step - loss: 0.1226 - accuracy: 0.9532
Epoch 25/50  32/32 [==============================] - 22s 701ms/step - loss: 0.1232 - accuracy: 0.9527
Epoch 26/50  32/32 [==============================] - 22s 699ms/step - loss: 0.1362 - accuracy: 0.9481
Epoch 27/50  32/32 [==============================] - 22s 699ms/step - loss: 0.1158 - accuracy: 0.9550
Epoch 28/50  32/32 [==============================] - 22s 699ms/step - loss: 0.1057 - accuracy: 0.9595
Epoch 29/50  32/32 [==============================] - 22s 700ms/step - loss: 0.1132 - accuracy: 0.9555
Epoch 30/50  32/32 [==============================] - 22s 702ms/step - loss: 0.1097 - accuracy: 0.9577
Epoch 31/50  32/32 [==============================] - 22s 702ms/step - loss: 0.0975 - accuracy: 0.9621
Epoch 32/50  32/32 [==============================] - 22s 700ms/step - loss: 0.0986 - accuracy: 0.9617
Epoch 33/50  32/32 [==============================] - 22s 698ms/step - loss: 0.1057 - accuracy: 0.9579
Epoch 34/50  32/32 [==============================] - 22s 701ms/step - loss: 0.0950 - accuracy: 0.9627
Epoch 35/50  32/32 [==============================] - 22s 703ms/step - loss: 0.0931 - accuracy: 0.9634
Epoch 36/50  32/32 [==============================] - 22s 700ms/step - loss: 0.0878 - accuracy: 0.9655
Epoch 37/50  32/32 [==============================] - 22s 699ms/step - loss: 0.1033 - accuracy: 0.9596
Epoch 38/50  32/32 [==============================] - 22s 700ms/step - loss: 0.0928 - accuracy: 0.9638
Epoch 39/50  32/32 [==============================] - 23s 704ms/step - loss: 0.0972 - accuracy: 0.9618
Epoch 40/50  32/32 [==============================] - 22s 700ms/step - loss: 0.0953 - accuracy: 0.9623
Epoch 41/50  32/32 [==============================] - 22s 698ms/step - loss: 0.0808 - accuracy: 0.9679
Epoch 42/50  32/32 [==============================] - 22s 701ms/step - loss: 0.0744 - accuracy: 0.9708
Epoch 43/50  32/32 [==============================] - 22s 703ms/step - loss: 0.0691 - accuracy: 0.9732
Epoch 44/50  32/32 [==============================] - 22s 700ms/step - loss: 0.0736 - accuracy: 0.9708
Epoch 45/50  32/32 [==============================] - 22s 698ms/step - loss: 0.0671 - accuracy: 0.9738
Epoch 46/50  32/32 [==============================] - 22s 700ms/step - loss: 0.0681 - accuracy: 0.9733
Epoch 47/50  32/32 [==============================] - 22s 702ms/step - loss: 0.0685 - accuracy: 0.9728
Epoch 48/50  32/32 [==============================] - 22s 700ms/step - loss: 0.0880 - accuracy: 0.9650
Epoch 49/50  32/32 [==============================] - 22s 699ms/step - loss: 0.0734 - accuracy: 0.9711
Epoch 50/50  32/32 [==============================] - 22s 700ms/step - loss: 0.0623 - accuracy: 0.9752
Step 7: Predictions
Helper Functions to Calculate Area
Python
def maskArea(img):
    DPI = 72
    INCH_TO_CM = 2.54
    sum_of_pixels = (img.sum() / 255)
    img_area = ((1 / DPI) ** 2) * (INCH_TO_CM ** 2) * sum_of_pixels
    return img_area

def area(img):
    DPI = 72
    INCH_TO_CM = 2.54
    sum_of_pixels = img.sum()
    img_area = ((1 / DPI) ** 2) * (INCH_TO_CM ** 2) * sum_of_pixels
    return img_area
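As a rough usage sketch (relying on the 72 DPI scale hard-coded in the helpers, which is an assumption of the original functions rather than a calibrated value), the estimated lesion area of a thresholded prediction can be compared with the ground-truth mask area:
Python
# Illustrative only: compare predicted and ground-truth lesion areas in cm^2.
pred = model.predict(img[0:1]).reshape(128, 128)
binary_pred = (pred >= 0.5)
print('Predicted lesion area (cm^2):   ', area(binary_pred))
# maskArea() expects 0-255 masks, such as the ones loaded with OpenCV above.
print('Ground truth lesion area (cm^2):', maskArea(mask_bad[0][:, :, 0]))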
Predict and Display Results
Python
i = 55
# Predict the mask for the i-th image
img_test_1 = model.predict(img[i:i+1]).reshape(128, 128)
# Plot the predicted mask
plt.figure(1)
plt.subplot(122)
plt.imshow(img_test_1, cmap=plt.cm.binary)
plt.title('Predicted Mask')
plt.colorbar()
# Plot the original image
plt.figure(2)
plt.subplot(122)
RGBimshow(img[i])
plt.title('Original Image')
# Plot the ground truth mask
plt.figure(3)
plt.subplot(122)
plt.imshow(mask[i].reshape(128, 128), cmap=plt.cm.binary)
plt.title('Ground Truth Mask')
# Convert the predicted mask to binary
img_test_2 = (img_test_1 >= 0.5)
# Plot the binary predicted mask
plt.figure(1)
plt.subplot(1, 2, 1)
plt.imshow(img_test_2)
plt.title('Binary Predicted Mask')
# Plot the binary predicted mask with binary colormap
plt.figure(2)
plt.subplot(1, 2, 1)
plt.imshow(img_test_2, cmap=plt.cm.binary)
plt.title('Binary Predicted Mask (Binary Colormap)')
i = 135
# Predict the mask for the i-th image
img_pred = model.predict(img[i:i+1]).reshape(128, 128)
img_pred = (img_pred >= 0.5)
# Plot the predicted mask
plt.figure(1)
plt.subplot(122)
plt.imshow(img_pred, cmap=plt.cm.binary)
plt.title('Predicted Mask')
plt.colorbar()
# Plot the original image
plt.figure(2)
plt.subplot(122)
RGBimshow(img[i])
plt.title('Original Image')
# Plot the ground truth mask
plt.figure(3)
plt.subplot(122)
plt.imshow(mask_bad[i][:, :, 0].reshape(128, 128), cmap=plt.cm.binary)
plt.title('Ground Truth Mask')
i = 57
# Predict the mask for the i-th image
img_pred = model.predict(img[i:i+1]).reshape(128, 128)
img_pred = (img_pred >= 0.5)
# Plot the predicted mask
plt.figure(1)
plt.subplot(122)
plt.imshow(img_pred, cmap=plt.cm.binary)
plt.title('Predicted Mask')
plt.colorbar()
# Plot the original image
plt.figure(2)
plt.subplot(122)
RGBimshow(img[i])
plt.title('Original Image')
# Plot the ground truth mask
plt.figure(3)
plt.subplot(122)
plt.imshow(mask_bad[i][:, :, 0].reshape(128, 128), cmap=plt.cm.binary)
plt.title('Ground Truth Mask')
Output:
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 21ms/step 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 22ms/step 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 22ms/step
Text(0.5, 1.0, 'Ground Truth Mask') 
Application of Attention Res-UNet in Image Segmentation
Applying foundation models to image segmentation provides an efficient way to handle challenging segmentation tasks in many fields. Using these pre-trained models has a number of advantages, including promoting innovation, increasing performance, and enabling effective model development. The following are some salient features that illustrate the application of foundation models to image segmentation:
- Effective Model Development: By offering pre-trained networks, foundation models expedite the development process and save time and resources that would otherwise be required for initial training. Model deployment for segmentation tasks is accelerated as a result.
- Benefits of Transfer Learning: By utilizing transfer learning, foundation models apply pre-training knowledge to particular segmentation tasks. This improves model performance and generalization by making it easier to collect generic visual properties and spatial relationships.
- Improved Performance: A foundation model’s accuracy and efficiency are increased when it is adjusted on a target dataset to better suit the specifics of the segmentation task. This flexibility guarantees that the model operates at its best across a variety of settings and datasets.
- Versatility Across Domains: Foundation models are versatile, effectively tackling segmentation problems in a range of domains such as object detection, medical imaging, and environmental monitoring. Their adaptability enables customization to satisfy specific objectives and use cases.
- Real-world Applications and Seamless Integration: These models allow for quick prototype and deployment in real-world applications by integrating smoothly into current workflows and frameworks. With their advanced segmentation capabilities, foundation models enable practitioners to tackle a wide range of problems, from medical diagnosis to urban planning.
When evaluating the efficacy and practicality of foundation models for image segmentation, case studies and performance evaluation are essential tools. Let’s examine these features in more detail:
- Metrics Assessment: Metrics such as Mean Average Precision (mAP), the Dice Coefficient, and Intersection over Union (IoU) are used to assess foundation models. These measures quantify segmentation robustness, accuracy, and consistency (a small computation sketch follows this list).
- Benchmarking Against Ground Truth: To verify the effectiveness of the model, segmentation results are compared to annotations from the ground truth. This guarantees that objects and regions inside photos are correctly delineated by the model.
- Validation on Test Datasets: To confirm the model’s generalization ability, extensive validation is carried out on various test datasets. To ensure reliability, this entails evaluating performance across many datasets and circumstances.
- Accuracy: The primary metric for evaluating the performance of Attention ResUNet models is accuracy. This involves comparing the model’s segmentation results against ground truth labels. Higher accuracy indicates better performance in accurately identifying and delineating objects in images.
- Speed and Efficiency: Apart from accuracy metrics, the computational efficiency of the Attention ResUNet model is also crucial, especially in real-time applications. Evaluation should include metrics such as inference time and model size to assess efficiency.
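As a small computation sketch (not part of the training pipeline above), IoU and the Dice coefficient for binary masks can be computed with NumPy as follows; the 0.5 threshold mirrors the one used in the prediction step:
Python
# Minimal IoU and Dice implementations for binary masks (illustrative).
def iou_score(y_true, y_pred, eps=1e-7):
    y_true, y_pred = y_true.astype(bool), y_pred.astype(bool)
    intersection = np.logical_and(y_true, y_pred).sum()
    union = np.logical_or(y_true, y_pred).sum()
    return (intersection + eps) / (union + eps)

def dice_coefficient(y_true, y_pred, eps=1e-7):
    y_true, y_pred = y_true.astype(bool), y_pred.astype(bool)
    intersection = np.logical_and(y_true, y_pred).sum()
    return (2 * intersection + eps) / (y_true.sum() + y_pred.sum() + eps)

# Example with a thresholded prediction from Step 7:
i = 55
pred = (model.predict(img[i:i+1]).reshape(128, 128) >= 0.5)
gt = mask[i].reshape(128, 128)
print('IoU: ', iou_score(gt, pred))
print('Dice:', dice_coefficient(gt, pred))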
Case Studies:
- Medical Imaging: In medical imaging, foundation models are used for tasks including organ segmentation, tumor detection, and disease diagnosis. Case examples illustrate how these models enhance diagnostic precision and support clinicians in making treatment decisions.
- Autonomous Driving: Foundation models provide for robust scene analysis, obstacle recognition, and path planning in the field of autonomous driving. Case examples demonstrate how these models improve self-driving car safety and navigation.
- Satellite Imagery Analysis: Applications of foundation models include land cover classification, urban planning, and environmental monitoring in satellite imaging analysis. Case studies demonstrate how useful these models are for gleaning insightful information from remote sensing data.
- Object Detection and Recognition: Foundation models are used for tasks like object detection and recognition across a variety of domains, going beyond segmentation. Case examples show how adaptable and effective these models are at identifying and categorizing objects of interest.
- Industrial Applications: Process optimization, defect detection, and quality control are all achieved in industrial settings through the use of foundation models. Case studies demonstrate how these models enhance productivity and dependability in production and manufacturing settings.
Conclusion
To sum up, foundation models offer a reliable and effective framework for model creation and implementation, marking a paradigm shift in the field of image segmentation. By utilizing pre-trained CNN architectures and transfer learning, researchers and practitioners can handle a variety of segmentation problems efficiently and accurately. Foundation models will become increasingly important as the field develops, driving innovation, improving image segmentation, and promoting cooperation across sectors and domains.