VGG-16 network
Reference implementation of the classic VGG-16 network
class model_VGG16(channel=3, im_height=224, im_width=224, Nclass=1000,
kernel_size=3, border_mode=(1, 1), flip_filters=False)
- channel: input channel number
- Nclass: output class number
The model accepts input with dimensions ordered as (B, C, H, W) and produces output of shape (B, N), where N = Nclass.
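As a rough sanity check on the default input size, the sketch below traces the spatial size through the five convolution stages of the classic VGG-16 layout (3x3 convolutions with 1-pixel padding, each stage closed by a 2x2 max-pool with stride 2). The helper names `conv_out` and `vgg16_feature_shape` are hypothetical, and the sketch assumes this class follows the classic layout with the default `kernel_size` and `border_mode`.

```python
def conv_out(size, kernel=3, pad=1, stride=1):
    """Output length of one convolution along a single spatial axis."""
    return (size + 2 * pad - kernel) // stride + 1


def vgg16_feature_shape(im_height=224, im_width=224):
    """Shape (C, H, W) of the last conv-stage output in the classic VGG-16 layout."""
    stage_convs = (2, 2, 3, 3, 3)               # conv layers per stage (classic VGG-16)
    stage_channels = (64, 128, 256, 512, 512)   # output channels per stage
    h, w = im_height, im_width
    for n_conv in stage_convs:
        for _ in range(n_conv):
            h, w = conv_out(h), conv_out(w)     # 3x3 conv with pad 1 keeps the size
        h, w = h // 2, w // 2                   # 2x2 max-pool, stride 2
    return stage_channels[-1], h, w


c, h, w = vgg16_feature_shape()
print((c, h, w), c * h * w)  # (512, 7, 7) 25088
```

With the default 224x224 input, the flattened feature vector fed to the dense layers has 25088 elements in the classic network.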
Depthwise Separable Convolution
Reference implementation of Depthwise Separable Convolution
class DSConv2D(in_channels, out_channels, kernel_size=(3,3), stride=(1,1),
dilation=(1,1), pad='valid')
- in_channels: int. Input shape is (B, in_channels, H_in, W_in)
- out_channels: int. Output shape is (B, out_channels, H_out, W_out)
- kernel_size: int scalar or tuple of int. Convolution kernel size
- stride: Factor by which to subsample the output
- pad: padding mode string (default 'valid') or 2-element tuple of int. Controls image border padding.
- dilation: factor by which to subsample (stride) the input.
The model performs a depthwise 2D convolution on each input channel separately, then maps the result to out_channels channels with a pointwise 1x1 convolution. No activation is applied inside.
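The two steps described above (per-channel depthwise convolution followed by a pointwise 1x1 convolution) can be sketched in plain Python on nested lists. This is only an illustration under stated assumptions: 'valid' padding, stride 1, no dilation, no filter flipping (cross-correlation), a single image without the batch axis, and hypothetical function names that are not part of the library.

```python
def depthwise_conv2d(x, dw_kernels):
    """x: [C][H][W], dw_kernels: [C][kh][kw]. One kernel per input channel."""
    out = []
    for ch, k in zip(x, dw_kernels):
        kh, kw = len(k), len(k[0])
        h_out, w_out = len(ch) - kh + 1, len(ch[0]) - kw + 1   # 'valid' padding
        out.append([[sum(ch[i + a][j + b] * k[a][b]
                         for a in range(kh) for b in range(kw))
                     for j in range(w_out)]
                    for i in range(h_out)])
    return out


def pointwise_conv2d(x, pw_weights):
    """x: [C][H][W], pw_weights: [C_out][C]. A 1x1 convolution mixes channels."""
    h, w = len(x[0]), len(x[0][0])
    return [[[sum(wt[c] * x[c][i][j] for c in range(len(x)))
              for j in range(w)]
             for i in range(h)]
            for wt in pw_weights]


def ds_conv2d(x, dw_kernels, pw_weights):
    """Depthwise step, then pointwise step; no activation in between or after."""
    return pointwise_conv2d(depthwise_conv2d(x, dw_kernels), pw_weights)


x = [[[1] * 3 for _ in range(3)], [[2] * 3 for _ in range(3)]]   # 2 channels, 3x3
dw = [[[1, 1], [1, 1]], [[1, 1], [1, 1]]]                        # one 2x2 kernel per channel
pw = [[1, 1], [1, -1], [0, 1]]                                   # maps 2 channels to 3
y = ds_conv2d(x, dw, pw)
print(len(y), len(y[0]), len(y[0][0]))  # 3 2 2
```

The depthwise step needs one kernel per input channel and the pointwise step needs in_channels x out_channels weights, which is where the saving over a full convolution comes from.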
ResNet bottleneck
Reference implementation of bottleneck building block of ResNet network
class ResNet_bottleneck(outer_channel=256, inner_channel=64, border_mode='same',
batchnorm_mode=1, activation=relu)
- outer_channel: channel number of block input
- inner_channel: channel number inside the block
- batchnorm_mode: {0 | 1 | 2}. 0 means no batch normalization applied; 1 means batch normalization applied to each cnn; 2 means batch normalization only applied to the last cnn
- activation: default = relu. Note no activation applied to the last element-wise sum output.
The model accepts input with dimensions ordered as (B, C, H, W) and produces output of the same shape.
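To see why the block narrows to inner_channel inside, the sketch below counts convolution weights for the standard ResNet bottleneck layout (1x1 reduce, 3x3, 1x1 expand) and compares it with two plain 3x3 convolutions at the outer width. It assumes this class follows that standard layout; biases and batch-normalization parameters are ignored, and the helper names are hypothetical.

```python
def conv_weights(c_in, c_out, k):
    """Number of weights in a k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def bottleneck_weights(outer_channel=256, inner_channel=64):
    """1x1 reduce -> 3x3 -> 1x1 expand, as in the standard ResNet bottleneck."""
    return (conv_weights(outer_channel, inner_channel, 1)
            + conv_weights(inner_channel, inner_channel, 3)
            + conv_weights(inner_channel, outer_channel, 1))


def plain_block_weights(channel=256):
    """Two 3x3 convolutions at full width, for comparison."""
    return 2 * conv_weights(channel, channel, 3)


print(bottleneck_weights())    # 69632
print(plain_block_weights())   # 1179648
```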
Feature Pyramid Network
Reference implementation of feature pyramid network
class model_FPN(input_channel=3, base_n_filters=64, batchnorm_mode=1)
- batchnorm_mode: same with
- return: 4-element tuple (p2, p3, p4, p5), CNN pyramid features at different scales, each with #channel = 4 * base_n_filters
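A small sketch of the expected output shapes may help when wiring the pyramid into a downstream head. The channel count (4 * base_n_filters) comes from the description above; the strides of 4, 8, 16 and 32 for p2 to p5 follow the original FPN paper and are an assumption here, as is the hypothetical helper name and an input size divisible by 32.

```python
def fpn_output_shapes(batch, im_height, im_width, base_n_filters=64):
    """Expected (B, C, H, W) shape of each pyramid level."""
    channels = 4 * base_n_filters
    # Assumption: strides as in the FPN paper, not verified against this class.
    strides = {"p2": 4, "p3": 8, "p4": 16, "p5": 32}
    return {name: (batch, channels, im_height // s, im_width // s)
            for name, s in strides.items()}


for name, shape in fpn_output_shapes(2, 256, 256).items():
    print(name, shape)
```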
Reference implementation of shuffle-net unit
class ShuffleUnit(in_channels=256, inner_channels=None, out_channels=None, group_num=4, border_mode='same',
batchnorm_mode=1, activation=relu, stride=(1,1), dilation=(1,1), fusion_mode='add')
- in_channels: channel number of unit input
- inner_channels: optional, channel number inside the unit, default =
- out_channels: channel number of unit output, only used when fusion_mode = 'concat', and must be > in_channels
- group_num: number of convolution groups
- border_mode: only 'same' allowed
- batchnorm_mode: {0 | 1 | 2}. 0 means no batch normalization applied; 1 means batch normalization applied to each cnn; 2 means batch normalization only applied to the last cnn
- activation: default = relu. Note no activation applied to the last output.
- stride, dilation: only used for depthwise separable convolution module inside
- fusion_mode: {'add' | 'concat'}. When 'concat', out_channels must be > in_channels.
- return: convolution result with #channel = in_channels when fusion_mode = 'add', #channel = out_channels when fusion_mode = 'concat'
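The operation that gives the unit its name is the channel shuffle between grouped convolutions, as described in the ShuffleNet paper. The sketch below applies it to a flat list of channel labels; `channel_shuffle` is a hypothetical helper for illustration, not the library's internal function.

```python
def channel_shuffle(channels, group_num):
    """Interleave channels across groups: view as (group_num, n), transpose, flatten."""
    total = len(channels)
    if total % group_num != 0:
        raise ValueError("channel count must be divisible by group_num")
    per_group = total // group_num
    groups = [channels[g * per_group:(g + 1) * per_group] for g in range(group_num)]
    # Take the i-th channel of every group in turn.
    return [groups[g][i] for i in range(per_group) for g in range(group_num)]


print(channel_shuffle(list(range(8)), group_num=4))  # [0, 2, 4, 6, 1, 3, 5, 7]
```

After the shuffle, each group of the next grouped convolution sees channels that originated from every group of the previous one.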
Reference implementation of shuffle-net unit stack
class ShuffleUnit_Stack(in_channels, inner_channels=None, out_channels=None, group_num=4, batchnorm_mode=1,
activation=relu, stack_size=3, stride=2, fusion_mode='concat')
- in_channels: channel number of input
- inner_channels: optional, channel number inside the shuffle-unit, default =
- out_channels: channel number of stack output, must >
- group_num: number of convolution groups
- batchnorm_mode: {0 | 1 | 2}. 0 means no batch normalization applied; 1 means batch normalization applied to each cnn; 2 means batch normalization only applied to the last cnn
- activation: default = relu. Note no activation applied to the last output.
- stack_size: number of shuffle-unit in the stack
- stride: int or tuple of int, convolution stride for the first unit, default=2
- fusion_mode: fusion_mode for the first unit.
Reference implementation of shuffle-net, without the final pooling & Dense layer.
class model_ShuffleNet(in_channels, group_num=4, stage_channels=(24, 272, 544, 1088), stack_size=(3, 7, 3),
batchnorm_mode=1, activation=relu)
- in_channels: channel number of input
- group_num: number of convolution groups
- stage_channels: channel number of each stage output.
- stack_size: size of each stack.
- batchnorm_mode: {0 | 1 | 2}. 0 means no batch normalization applied; 1 means batch normalization applied to each cnn; 2 means batch normalization only applied to the last cnn
- activation: default = relu. Note no activation applied to the last output.
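The default stage_channels has four entries while stack_size has three, which suggests that the first entry is the stem output and each stack maps one entry to the next. Under that assumption, and assuming the first unit of every stack fuses by 'concat' as in the ShuffleNet paper (shortcut channels concatenated with the convolution branch), the sketch below lists how many channels the convolution branch must produce per stage. The helper name is hypothetical.

```python
def shufflenet_stage_plan(stage_channels=(24, 272, 544, 1088)):
    """Per-stack channel bookkeeping for a 'concat' first unit."""
    plan = []
    for c_in, c_out in zip(stage_channels[:-1], stage_channels[1:]):
        if c_out <= c_in:
            raise ValueError("'concat' fusion needs out_channels > in_channels")
        # The shortcut keeps c_in channels, so the branch supplies the rest.
        plan.append({"in": c_in, "out": c_out, "branch": c_out - c_in})
    return plan


for stage in shufflenet_stage_plan():
    print(stage)
```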
Reference implementation of shufflenet_v2 unit
class ShuffleUnit_v2(in_channels=256, out_channels=None, border_mode='same', batchnorm_mode=1,
activation=relu, stride=1, dilation=1)
- in_channels: channel number of unit input
- out_channels: channel number of unit output, only used when stride > 1; when stride = 1, out_channels is fixed to in_channels.
- border_mode: only 'same' allowed
- batchnorm_mode: {0 | 1 | 2}. 0 means no batch normalization applied; 1 means batch normalization applied to each cnn; 2 means batch normalization only applied to the last cnn
- activation: default = relu. Note no activation applied to the last output.
- stride, dilation: only used for depthwise separable convolution module inside, must be integer scalars or tuple of integers.
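The stride-1 unit in the ShuffleNet v2 paper splits the channels in half, leaves one half untouched, processes the other half, concatenates both and shuffles with two groups, which is why the channel count cannot change at stride 1. The sketch below mimics this on channel labels; the `branch` argument stands in for the convolution branch, and the function name is hypothetical. Whether this class implements exactly this layout is an assumption.

```python
def shuffle_unit_v2_stride1(channels, branch=lambda half: half):
    """Channel split -> branch on one half -> concat -> 2-group channel shuffle."""
    if len(channels) % 2 != 0:
        raise ValueError("channel count must be even for the channel split")
    half = len(channels) // 2
    kept = channels[:half]                 # identity half
    processed = branch(channels[half:])    # stand-in for the convolution branch
    merged = kept + processed
    # Shuffle with two groups: interleave the two halves.
    return [merged[g * half + i] for i in range(half) for g in range(2)]


out = shuffle_unit_v2_stride1(["a", "b", "c", "d"],
                              branch=lambda half: [c.upper() for c in half])
print(out)  # ['a', 'C', 'b', 'D']
```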
Reference implementation of shufflenet_v2 unit stack
class ShuffleUnit_v2_Stack(in_channels, out_channels, batchnorm_mode=1, activation=relu, stack_size=3, stride=2)
- in_channels: channel number of input
- out_channels: channel number of stack output
- batchnorm_mode: {0 | 1 | 2}. 0 means no batch normalization applied; 1 means batch normalization applied to each cnn; 2 means batch normalization only applied to the last cnn
- activation: default = relu. Note no activation applied to the last output.
- stack_size: number of shuffle-unit in the stack
- stride: int or tuple of int, convolution stride for the first unit, default=2
Reference implementation of shufflenet_v2, without the final pooling & Dense layer.
class model_ShuffleNet_v2(in_channels, stage_channels=(24, 116, 232, 464, 1024), stack_size=(3, 7, 3),
batchnorm_mode=1, activation=relu)
- in_channels: channel number of input
- stage_channels: channel number of each stage output.
- stack_size: size of each stack.
- batchnorm_mode: {0 | 1 | 2}. 0 means no batch normalization applied; 1 means batch normalization applied to each cnn; 2 means batch normalization only applied to the last cnn
- activation: default = relu. Note no activation applied to the last output.
Model reference implementation of CTPN
class model_CTPN(k=10, do_side_refinement_regress=False,
batchnorm_mode=1, channel=3, im_height=None, im_width=None,
kernel_size=3, border_mode=(1, 1), VGG_flip_filters=False,
- k: anchor box number
- do_side_refinement_regress: whether to implement side refinement regression
- batchnorm_mode: {0|1}, whether to insert batch normalization at the end of each convolution stage of VGG-16 net, useful for cold start.
- channel: input channel number
- im_height, im_width: input image height/width, optional
- kernel_size: convolution kernel size of VGG-16 net
- border_mode: border mode of VGG-16 net
- VGG_flip_filters: whether to flip convolution kernels for VGG-16 net
- im2col: function corresponding to Caffe's im2col. If None, the CTPN implementation will not strictly follow the original paper.
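For orientation on the k parameter: the CTPN paper uses k = 10 vertical anchors per location, all 16 pixels wide, with heights running from 11 to 273 pixels by dividing by 0.7 each step. The sketch below generates such anchors; the function names are hypothetical, and it is an assumption that this implementation uses the same anchor set.

```python
def ctpn_anchor_heights(k=10, min_height=11.0, ratio=0.7):
    """Anchor heights as in the CTPN paper: start at 11 px, divide by 0.7 each step."""
    return [min_height / ratio ** i for i in range(k)]


def ctpn_anchors(center_x, center_y, k=10, width=16):
    """k fixed-width boxes (x1, y1, x2, y2) centered on one feature-map location."""
    return [(center_x - width / 2, center_y - h / 2,
             center_x + width / 2, center_y + h / 2)
            for h in ctpn_anchor_heights(k)]


print([round(h) for h in ctpn_anchor_heights()])
```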
U-net FCN
Reference implementation of U-net FCN
class model_Unet(channel=1, im_height=128, im_width=128, Nclass=2, kernel_size=3,
border_mode='same', base_n_filters=64, output_activation=softmax)
- channel: input channel number
- Nclass: output channel number
The model accepts input with dimensions ordered as (B, C, H, W) and produces output with dimensions ordered as (B, H, W, C). Note that the channel axis moves to the last position in the output.
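Because the class scores sit on the last axis of the output, a per-pixel label map is an argmax over that axis. The sketch below shows this on nested lists standing in for the (B, H, W, C) output; `label_map` is a hypothetical helper, and ties resolve to the lowest class index.

```python
def label_map(probs):
    """probs: nested lists shaped (B, H, W, C). Returns (B, H, W) class indices."""
    return [[[max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
             for row in image]
            for image in probs]


probs = [[[[0.9, 0.1], [0.2, 0.8]],
          [[0.5, 0.5], [0.3, 0.7]]]]   # B=1, H=2, W=2, C=2
print(label_map(probs))  # [[[0, 1], [0, 1]]]
```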
Shuffle-Seg network
Model reference implementation of ShuffleSeg
class model_ShuffleSeg(in_channels=1, Nclass=6, SF_group_num=4, SF_stage_channels=(24, 272, 544, 1088),
SF_stack_size=(3, 7, 3), SF_batchnorm_mode=1, SF_activation=relu)
- in_channels: channel number of input
- Nclass: output class number
- SF_group_num: number of convolution groups of the internal ShuffleNet encoder.
- SF_stage_channels: channel number of each stage output of the internal ShuffleNet encoder.
- SF_stack_size: size of each stack of the internal ShuffleNet encoder.
- SF_batchnorm_mode: {0 | 1 | 2}, for the internal ShuffleNet encoder. 0 means no batch normalization applied; 1 means batch normalization applied to each cnn; 2 means batch normalization only applied to the last cnn.
- SF_activation: default = relu. Activation of the internal ShuffleNet encoder.
Alternate 2D LSTM
LSTM2D implementation by alternating LSTM along different dimensions.
Input shape = (H, W, B, C)
class Alternate_2D_LSTM( input_dims, hidden_dim, peephole=True, initializer=init.Normal(0.1), grad_clipping=0,
hidden_activation=tanh, learn_ini=False, truncate_gradient=-1, mode=2)
All arguments are the same as for the LSTM module, except for mode.
- mode: {0 | 1 | 2}.
0: concat mode, 1D LSTM results from the horizontal and vertical dimensions are concatenated along the C dimension, i.e., result = concat(horizontal_LSTM(input), vertical_LSTM(input));
1: sequential mode, horizontal and vertical dimensions are processed sequentially, i.e., result = horizontal_LSTM(vertical_LSTM(input));
2: mixed mode, i.e., result = horizontal_LSTM(concat(input, vertical_LSTM(input)))
.forward(seq_input, h_ini=(None, None), c_ini=(None, None), seq_mask=None, backward=(False, False), return_final_state=False)
All arguments are the same as for LSTM.
.predict = .forward
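The three modes differ in what each 1D LSTM receives and in the channel size of the result. The sketch below does that bookkeeping from the formulas given for each mode, assuming every 1D LSTM emits hidden_dim channels and that input_dims is a single integer channel count. The helper name is hypothetical.

```python
def alternate_2d_lstm_dims(input_dims, hidden_dim, mode=2):
    """Channel sizes seen by each 1D LSTM and produced by the module, per mode."""
    if mode == 0:    # result = concat(horizontal_LSTM(input), vertical_LSTM(input))
        return {"vertical_in": input_dims, "horizontal_in": input_dims,
                "out": 2 * hidden_dim}
    if mode == 1:    # result = horizontal_LSTM(vertical_LSTM(input))
        return {"vertical_in": input_dims, "horizontal_in": hidden_dim,
                "out": hidden_dim}
    if mode == 2:    # result = horizontal_LSTM(concat(input, vertical_LSTM(input)))
        return {"vertical_in": input_dims, "horizontal_in": input_dims + hidden_dim,
                "out": hidden_dim}
    raise ValueError("mode must be 0, 1 or 2")


for m in (0, 1, 2):
    print(m, alternate_2d_lstm_dims(input_dims=32, hidden_dim=64, mode=m))
```

Only mode 0 doubles the channel dimension of the output; modes 1 and 2 return hidden_dim channels.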