VGG-16 network
Reference implementation of the classic VGG-16 network
class model_VGG16(channel=3, im_height=224, im_width=224, Nclass=1000,
kernel_size=3, border_mode=(1, 1), flip_filters=False)
- channel: input channel number
- Nclass: output class number
The model accepts input of shape (B, C, H, W) and produces output of shape (B, Nclass).
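As a rough illustration of this shape contract (using torchvision's stock VGG-16 as a stand-in, not this class):

```python
# Illustrative shape check of the (B, C, H, W) -> (B, Nclass) contract,
# using torchvision's standard VGG-16 as a stand-in; not this library's class.
import torch
import torchvision

net = torchvision.models.vgg16()          # 1000-class head, untrained
x = torch.randn(2, 3, 224, 224)           # (B, C, H, W)
y = net(x)
print(y.shape)                            # torch.Size([2, 1000]) == (B, Nclass)
```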
Depthwise Separable Convolution
Reference implementation of Depthwise Separable Convolution
class DSConv2D(in_channels, out_channels, kernel_size=(3,3), stride=(1,1),
dilation=(1,1), pad='valid')
- in_channels: int. Input shape is (B, in_channels, H_in, W_in)
- out_channels: int. Output shape is (B, out_channels, H_out, W_out)
- kernel_size: int scalar or tuple of int. Convolution kernel size
- stride: Factor by which to subsample the output
- pad: {'same' | 'valid' | 'full'} or 2-element tuple of int. Controls image border padding.
- dilation: factor by which to subsample (stride) the input.
The module applies a depthwise 2D convolution to each input channel separately, then maps the result to out_channels channels with a pointwise 1x1 convolution. No activation is applied inside.
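A minimal PyTorch sketch of the same idea (illustrative only, not the DSConv2D implementation itself):

```python
# Depthwise step: one filter per input channel (groups=in_channels);
# pointwise step: 1x1 conv mixes channels up to out_channels. No activation.
import torch
import torch.nn as nn

class DepthwiseSeparableConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1,
                 dilation=1, padding=0):
        super().__init__()
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size,
                                   stride=stride, dilation=dilation,
                                   padding=padding, groups=in_channels)
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):                      # x: (B, in_channels, H, W)
        return self.pointwise(self.depthwise(x))

x = torch.randn(1, 16, 32, 32)
print(DepthwiseSeparableConv2d(16, 64, padding=1)(x).shape)  # (1, 64, 32, 32)
```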
ResNet bottleneck
Reference implementation of bottleneck building block of ResNet network
class ResNet_bottleneck(outer_channel=256, inner_channel=64, border_mode='same',
batchnorm_mode=1, activation=relu)
- outer_channel: channel number of block input
- inner_channel: channel number inside the block
- batchnorm_mode: {0 | 1 | 2}. 0: no batch normalization; 1: batch normalization applied to each convolution; 2: batch normalization applied only to the last convolution
- activation: default = relu. Note that no activation is applied to the final element-wise sum output.
The model accepts input of shape (B, C, H, W) and produces output of the same shape.
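A rough PyTorch sketch of the bottleneck pattern described above (1x1 reduce, 3x3, 1x1 expand, identity shortcut, no activation after the final sum), assuming the batchnorm_mode=1 behaviour of batch normalization after every convolution; illustrative only:

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    def __init__(self, outer_channel=256, inner_channel=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(outer_channel, inner_channel, 1), nn.BatchNorm2d(inner_channel), nn.ReLU(),
            nn.Conv2d(inner_channel, inner_channel, 3, padding=1), nn.BatchNorm2d(inner_channel), nn.ReLU(),
            nn.Conv2d(inner_channel, outer_channel, 1), nn.BatchNorm2d(outer_channel),
        )

    def forward(self, x):            # (B, outer_channel, H, W) -> same shape
        return x + self.body(x)      # no activation on the sum, as documented

x = torch.randn(2, 256, 14, 14)
print(Bottleneck()(x).shape)         # torch.Size([2, 256, 14, 14])
```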
Feature Pyramid Network
Reference implementation of feature pyramid network
class model_FPN(input_channel=3, base_n_filters=64, batchnorm_mode=1)
- batchnorm_mode: same as ResNet_bottleneck
- return: 4-element tuple (p2, p3, p4, p5), CNN pyramid features at different scales, each with channel count = 4 * base_n_filters (see the sketch below)
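A compact sketch of the top-down FPN idea (lateral 1x1 convolutions plus upsample-and-add), assuming four backbone feature maps c2..c5 at strides 4/8/16/32; every output level shares one channel count, mirroring the note above. This is not the library's model_FPN:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFPN(nn.Module):
    def __init__(self, in_channels=(64, 128, 256, 512), out_channels=256):
        super().__init__()
        # lateral 1x1 convs project each backbone level to a common channel count
        self.lateral = nn.ModuleList([nn.Conv2d(c, out_channels, 1) for c in in_channels])
        # 3x3 smoothing convs applied to each merged level
        self.smooth = nn.ModuleList([nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                     for _ in in_channels])

    def forward(self, c2, c3, c4, c5):
        p5 = self.lateral[3](c5)
        p4 = self.lateral[2](c4) + F.interpolate(p5, scale_factor=2, mode='nearest')
        p3 = self.lateral[1](c3) + F.interpolate(p4, scale_factor=2, mode='nearest')
        p2 = self.lateral[0](c2) + F.interpolate(p3, scale_factor=2, mode='nearest')
        return tuple(s(p) for s, p in zip(self.smooth, (p2, p3, p4, p5)))

feats = [torch.randn(1, c, 224 // s, 224 // s)
         for c, s in zip((64, 128, 256, 512), (4, 8, 16, 32))]
for p in TinyFPN()(*feats):
    print(p.shape)    # all levels share 256 channels, at strides 4/8/16/32
```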
ShuffleUnit
Reference implementation of shuffle-net unit
class ShuffleUnit(in_channels=256, inner_channels=None, out_channels=None, group_num=4, border_mode='same',
batchnorm_mode=1, activation=relu, stride=(1,1), dilation=(1,1), fusion_mode='add')
- in_channels: channel number of unit input
- inner_channels: optional, channel number inside the unit, default = in_channels // 4
- out_channels: channel number of unit output, only used when fusion_mode = 'concat'; must be > in_channels
- group_num: number of convolution groups
- border_mode: only 'same' is allowed
- batchnorm_mode: {0 | 1 | 2}. 0: no batch normalization; 1: batch normalization applied to each convolution; 2: batch normalization applied only to the last convolution
- activation: default = relu. Note that no activation is applied to the final output.
- stride, dilation: only used by the depthwise separable convolution module inside
- fusion_mode: {'add' | 'concat'}. When 'concat', out_channels must be > in_channels.
- return: convolution result with channel count = in_channels when fusion_mode = 'add', and channel count = out_channels when fusion_mode = 'concat' (a sketch of the channel-shuffle operation follows)
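The defining operation of the unit is the channel shuffle that follows the grouped 1x1 convolution: channels are interleaved across groups so information can flow between them. A minimal sketch of just that operation (not the full unit):

```python
import torch

def channel_shuffle(x, group_num):
    b, c, h, w = x.shape
    assert c % group_num == 0
    # (B, G, C//G, H, W) -> swap group and per-group channel axes -> flatten back
    x = x.view(b, group_num, c // group_num, h, w)
    x = x.transpose(1, 2).contiguous()
    return x.view(b, c, h, w)

x = torch.arange(8, dtype=torch.float32).view(1, 8, 1, 1)
print(channel_shuffle(x, group_num=4).flatten().tolist())
# [0, 2, 4, 6, 1, 3, 5, 7] -- channels interleaved across the 4 groups
```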
ShuffleUnit_Stack
Reference implementation of shuffle-net unit stack
class ShuffleUnit_Stack(in_channels, inner_channels=None, out_channels=None, group_num=4, batchnorm_mode=1,
activation=relu, stack_size=3, stride=2, fusion_mode='concat')
- in_channels: channel number of input
- inner_channels: optional, channel number inside each shuffle unit, default = in_channels // 4
- out_channels: channel number of stack output, must be > in_channels
- group_num: number of convolution groups
- batchnorm_mode: {0 | 1 | 2}. 0: no batch normalization; 1: batch normalization applied to each convolution; 2: batch normalization applied only to the last convolution
- activation: default = relu. Note that no activation is applied to the final output.
- stack_size: number of shuffle units in the stack
- stride: int or tuple of int, convolution stride for the first unit, default=2
- fusion_mode: fusion_mode for the first unit.
ShuffleNet
Reference implementation of shuffle-net, without the final pooling & Dense layer.
class model_ShuffleNet(in_channels, group_num=4, stage_channels=(24, 272, 544, 1088), stack_size=(3, 7, 3),
batchnorm_mode=1, activation=relu)
- in_channels: channel number of input
- group_num: number of convolution groups
- stage_channels: channel number of each stage output.
- stack_size: size of each stack.
- batchnorm_mode: {0 | 1 | 2}. 0: no batch normalization; 1: batch normalization applied to each convolution; 2: batch normalization applied only to the last convolution
- activation: default = relu. Note that no activation is applied to the final output.
ShuffleUnit_v2
Reference implementation of shufflenet_v2 unit
class ShuffleUnit_v2(in_channels=256, out_channels=None, border_mode='same', batchnorm_mode=1,
activation=relu, stride=1, dilation=1)
- in_channels: channel number of unit input
- out_channels: channel number of unit output, only used when stride > 1; when stride = 1, out_channels is fixed to in_channels
- border_mode: only 'same' is allowed
- batchnorm_mode: {0 | 1 | 2}. 0: no batch normalization; 1: batch normalization applied to each convolution; 2: batch normalization applied only to the last convolution
- activation: default = relu. Note that no activation is applied to the final output.
- stride, dilation: only used by the depthwise separable convolution module inside; must be integer scalars or tuples of integers (see the sketch below)
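A sketch of the stride = 1 data flow described above: the input is split into two channel halves, one half passes through a small convolution branch, the halves are concatenated and channel-shuffled, so the output keeps in_channels channels. Illustrative only, not the ShuffleUnit_v2 code:

```python
import torch
import torch.nn as nn

def channel_shuffle(x, groups=2):
    b, c, h, w = x.shape
    return (x.view(b, groups, c // groups, h, w)
             .transpose(1, 2).contiguous().view(b, c, h, w))

class ShuffleV2UnitSketch(nn.Module):
    def __init__(self, in_channels):
        super().__init__()
        half = in_channels // 2
        self.branch = nn.Sequential(
            nn.Conv2d(half, half, 1), nn.BatchNorm2d(half), nn.ReLU(),
            nn.Conv2d(half, half, 3, padding=1, groups=half),   # depthwise 3x3
            nn.BatchNorm2d(half),
            nn.Conv2d(half, half, 1), nn.BatchNorm2d(half), nn.ReLU(),
        )

    def forward(self, x):                       # stride = 1 case only
        a, b = x.chunk(2, dim=1)                # channel split
        return channel_shuffle(torch.cat([a, self.branch(b)], dim=1))

x = torch.randn(1, 116, 28, 28)
print(ShuffleV2UnitSketch(116)(x).shape)        # (1, 116, 28, 28) == in_channels
```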
ShuffleUnit_v2_Stack
Reference implementation of shufflenet_v2 unit stack
class ShuffleUnit_v2_Stack(in_channels, out_channels, batchnorm_mode=1, activation=relu, stack_size=3, stride=2)
- in_channels: channel number of input
- out_channels: channel number of stack output
- batchnorm_mode: {0 | 1 | 2}. 0: no batch normalization; 1: batch normalization applied to each convolution; 2: batch normalization applied only to the last convolution
- activation: default = relu. Note that no activation is applied to the final output.
- stack_size: number of shuffle units in the stack
- stride: int or tuple of int, convolution stride for the first unit, default=2
ShuffleNet_v2
Reference implementation of shufflenet_v2, without the final pooling & Dense layer.
class model_ShuffleNet_v2(in_channels, stage_channels=(24, 116, 232, 464, 1024), stack_size=(3, 7, 3),
batchnorm_mode=1, activation=relu)
- in_channels: channel number of input
- stage_channels: channel number of each stage output.
- stack_size: size of each stack.
- batchnorm_mode: {0 | 1 | 2}. 0: no batch normalization; 1: batch normalization applied to each convolution; 2: batch normalization applied only to the last convolution
- activation: default = relu. Note that no activation is applied to the final output.
CTPN
Model reference implementation of CTPN
class model_CTPN(k=10, do_side_refinement_regress=False,
batchnorm_mode=1, channel=3, im_height=None, im_width=None,
kernel_size=3, border_mode=(1, 1), VGG_flip_filters=False,
im2col=None)
- k: anchor box number
- do_side_refinement_regress: whether to include the side-refinement regression branch
- batchnorm_mode: {0 | 1}, whether to insert batch normalization at the end of each convolution stage of the VGG-16 net; useful for cold starts.
- channel: input channel number
- im_height, im_width: input image height/width, optional
- kernel_size: convolution kernel size of VGG-16 net
- border_mode: border mode of VGG-16 net
- VGG_flip_filters: whether to flip the convolution kernels of the VGG-16 net
- im2col: function corresponding to Caffe's im2col(). If None, the CTPN implementation will not strictly follow the original paper.
U-net FCN
Reference implementation of U-net FCN
class model_Unet(channel=1, im_height=128, im_width=128, Nclass=2, kernel_size=3,
border_mode='same', base_n_filters=64, output_activation=softmax)
- channel: input channel number
- Nclass: output channel number
The model accepts input of shape (B, C, H, W) and produces output of shape (B, H, W, C).
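A small illustration of this output layout: class scores are moved to the last axis and softmax-ed per pixel, so output[b, i, j] is a probability vector over Nclass (a generic score map is used here, not the actual U-net):

```python
import torch
import torch.nn.functional as F

Nclass = 2
scores = torch.randn(4, Nclass, 128, 128)               # (B, C, H, W) score map
probs = F.softmax(scores.permute(0, 2, 3, 1), dim=-1)   # (B, H, W, Nclass)
print(probs.shape, float(probs[0, 0, 0].sum()))          # each pixel sums to 1.0
```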
Shuffle-Seg network
Model reference implementation of ShuffleSeg
class model_ShuffleSeg(in_channels=1, Nclass=6, SF_group_num=4, SF_stage_channels=(24, 272, 544, 1088),
SF_stack_size=(3, 7, 3), SF_batchnorm_mode=1, SF_activation=relu)
- in_channels: channel number of input
- Nclass: output class number
- SF_group_num: number of convolution groups for the internal ShuffleNet encoder.
- SF_stage_channels: channel number of each stage output for the internal ShuffleNet encoder.
- SF_stack_size: size of each stack for the internal ShuffleNet encoder.
- SF_batchnorm_mode: {0 | 1 | 2}. 0: no batch normalization; 1: batch normalization applied to each convolution; 2: batch normalization applied only to the last convolution. For the internal ShuffleNet encoder.
- SF_activation: default = relu. For the internal ShuffleNet encoder.
Alternate 2D LSTM
LSTM2D implementation by alternating LSTM along different dimensions.
Input shape = (H, W, B, C)
class Alternate_2D_LSTM(input_dims, hidden_dim, peephole=True, initializer=init.Normal(0.1), grad_clipping=0,
hidden_activation=tanh, learn_ini=False, truncate_gradient=-1, mode=2)
All the arguments are the same as in the LSTM module, except for mode.
- mode: {0 | 1 | 2}.
0: concat mode, the 1D LSTM results along the horizontal and vertical dimensions are concatenated along the C dimension, i.e., result = concat(horizontal_LSTM(input), vertical_LSTM(input));
1: sequential mode, the horizontal and vertical dimensions are processed sequentially, i.e., result = horizontal_LSTM(vertical_LSTM(input));
2: mixed mode, i.e., result = horizontal_LSTM(concat(input, vertical_LSTM(input))) (see the sketch at the end of this section)
.forward(seq_input, h_ini=(None, None), c_ini=(None, None), seq_mask=None, backward=(False, False), return_final_state=False)
All the arguments are the same as in the LSTM module.
.predict = .forward
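A rough sketch of how the three modes compose, using PyTorch LSTMs as stand-ins for the library's 1D LSTM and keeping the documented (H, W, B, C) layout; the helpers horizontal_lstm / vertical_lstm are hypothetical and only illustrate the mode equations above, not how Alternate_2D_LSTM is implemented:

```python
import torch
import torch.nn as nn

C_in, C_hid = 16, 32
lstm_h = nn.LSTM(C_in, C_hid)           # scans along W (horizontal)
lstm_v = nn.LSTM(C_in, C_hid)           # scans along H (vertical)

def horizontal_lstm(lstm, x):           # x: (H, W, B, C) -> (H, W, B, hidden)
    H, W, B, C = x.shape
    y, _ = lstm(x.permute(1, 0, 2, 3).reshape(W, H * B, C))
    return y.reshape(W, H, B, -1).permute(1, 0, 2, 3)

def vertical_lstm(lstm, x):             # x: (H, W, B, C) -> (H, W, B, hidden)
    H, W, B, C = x.shape
    y, _ = lstm(x.reshape(H, W * B, C))
    return y.reshape(H, W, B, -1)

x = torch.randn(8, 8, 2, C_in)
# mode 0: concatenate horizontal and vertical results along the channel axis
out0 = torch.cat([horizontal_lstm(lstm_h, x), vertical_lstm(lstm_v, x)], dim=-1)
# mode 1: sequential (the second LSTM must accept C_hid-channel input)
out1 = horizontal_lstm(nn.LSTM(C_hid, C_hid), vertical_lstm(lstm_v, x))
# mode 2: mixed, horizontal LSTM over concat(input, vertical result)
out2 = horizontal_lstm(nn.LSTM(C_in + C_hid, C_hid),
                       torch.cat([x, vertical_lstm(lstm_v, x)], dim=-1))
print(out0.shape, out1.shape, out2.shape)
```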