Training tool

  • 16 Aug 2023

Step 1. Prepare a computer with a GPU and a Linux (Ubuntu) system.

Step 2. Prepare a dataset; please refer to the label tool.

Step 3. Install the Docker engine; please refer to the SOP.

Step 4. Pull the YOLO training docker image.

docker pull adlinksec/train-yolo

Step 5. Enable X server permission.

Docker runs as the root user, but by default the X server does not grant the root user access. Therefore we need to grant permission before launching our EVA Docker.

xhost +

This command only needs to be executed once, unless the X server permissions are reset (for example, by rebooting the system). Closing the terminal does not reset the X server permissions.
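If you prefer not to open the X server to all hosts, a narrower alternative (an illustrative sketch, not something the tool requires; `grant_x_access` and `revoke_x_access` are hypothetical helper names) is to allow only local root connections and revoke them after training:

```shell
# Narrower alternative to `xhost +`: allow only local connections made by root.
# grant_x_access / revoke_x_access are hypothetical helpers, not part of the tool.
grant_x_access() {
  if [ -z "$DISPLAY" ]; then
    echo "no X display found; skipping xhost" >&2
    return 1
  fi
  xhost +local:root   # grant access to local root connections only
}

revoke_x_access() {
  xhost -local:root   # revoke the grant once training is done
}
```

`xhost +local:root` limits the grant to the local machine instead of every host on the network.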

Step 6. How to use the YOLO training docker.

Usage

docker run adlinksec/train-yolo -h
Usage: ./entrypoint.sh [-v4] [-tiny] [-gui] [-s SIZE] [-mb MAX_BATCHES] [-b BATCH] [-sub SUBDIVISIONS] [-name NAME] [-w PRETRAIN_WEIGHTS] [-fp16] [--zip ZIP_FILE] [--zip-url ZIP_URL] [--data-dir DATA_DIR]

The training script will read the lines of the label file that is assigned by the names field in the obj data.

The training data folder loading order is:

ZIP_URL > ZIP_FILE > DATA_DIR > /data

If any data location variable that was assigned a value fails its check (for example, the label file does not exist), the training flow will fail.
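The loading order and failure behavior can be sketched as follows (an illustrative sketch of the documented precedence, not the actual entrypoint.sh logic; `pick_data_source` is a hypothetical name):

```shell
# Sketch of the documented data-source precedence: first assigned source wins,
# and an assigned-but-invalid source fails instead of falling through.
# pick_data_source is a hypothetical helper, not part of entrypoint.sh.
pick_data_source() {
  if [ -n "$ZIP_URL" ]; then
    echo "zip-url:$ZIP_URL"
  elif [ -n "$ZIP_FILE" ]; then
    [ -f "$ZIP_FILE" ] || { echo "zip file not found: $ZIP_FILE" >&2; return 1; }
    echo "zip:$ZIP_FILE"
  elif [ -n "$DATA_DIR" ]; then
    [ -d "$DATA_DIR" ] || { echo "data dir not found: $DATA_DIR" >&2; return 1; }
    echo "dir:$DATA_DIR"
  else
    echo "dir:/data"   # fallback when nothing is assigned
  fi
}
```

Note that an assigned source is never skipped: if ZIP_FILE points to a missing file, the flow fails rather than falling back to DATA_DIR.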

Arguments:

    -v4         Train with yolov4

    -tiny       Train with tiny

    -gui        Enable GUI to display training figure

    -s          Train image size; SIZE should be a multiple of 32

    -h          Print help usage

    -mb         Max training batch iterations

    -b          Training batch size, default 32

    -sub        Training subdivision size, default 4

    -name       Output name of models, this will be suffix of model files

    -w          Pre-trained darknet weights model; changing the size or class number is not supported

    -fp16       Also generate precision FP16 model (Current only OpenVINO FP16)

    --zip       Path of the data zip file

    --zip-url   Data zip file download url

    --data-dir  Path of data folder

Environments:

    GPU_IDX     Indices of the Nvidia GPU cards to use for training; the card index starts at 0. e.g. 0,3 will use the 1st and 4th Nvidia GPU cards

    CFG_FILE    Path of Yolo cfg file

    OBJ_DATA    Path of Yolo object data
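These environment variables are passed with `-e`, e.g. `docker run -e GPU_IDX=0,3 … adlinksec/train-yolo …`. A small sanity check of the GPU_IDX format (a hypothetical helper, not part of the training image) might look like:

```shell
# Hypothetical sanity check for the GPU_IDX format described above:
# comma-separated 0-based indices such as "0" or "0,3".
# Not part of the adlinksec/train-yolo image itself.
valid_gpu_idx() {
  case "$1" in
    ""|*[!0-9,]*|,*|*,|*,,*) return 1 ;;  # empty, bad chars, or stray commas
    *) return 0 ;;
  esac
}
```
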

Command samples

Training YoloV3 with size 416

docker run -ti --net=host -e DISPLAY=$DISPLAY --gpus all -v /tmp/.X11-unix:/tmp/.X11-unix -v ~/.Xauthority:/root/.Xauthority --rm --volumes-from $(docker create adlinksec/hand_dataset) -v $PWD/models:/models adlinksec/train-yolo -gui -mb 1000

Training YoloV4-tiny with size 608

docker run -ti --net=host -e DISPLAY=$DISPLAY --gpus all -v /tmp/.X11-unix:/tmp/.X11-unix -v ~/.Xauthority:/root/.Xauthority --rm --volumes-from $(docker create adlinksec/hand_dataset) -v $PWD/models:/models adlinksec/train-yolo -gui -mb 1000 -v4 -tiny -s 608

Training YoloV4 with the model named "hand"

docker run -ti --net=host -e DISPLAY=$DISPLAY --gpus all -v /tmp/.X11-unix:/tmp/.X11-unix -v ~/.Xauthority:/root/.Xauthority --rm --volumes-from $(docker create adlinksec/hand_dataset) -v $PWD/models:/models adlinksec/train-yolo -gui -mb 1000 -v4 -name hand

Continue training the "hand" model to 4000 batches

docker run -ti --net=host -e DISPLAY=$DISPLAY --gpus all -v /tmp/.X11-unix:/tmp/.X11-unix -v ~/.Xauthority:/root/.Xauthority --rm --volumes-from $(docker create adlinksec/hand_dataset) -v $PWD/models:/models adlinksec/train-yolo -gui -mb 4000 -v4 -name hand_4000 -w /models/hand.weights

Train YoloV3-tiny helmet detector

docker run -ti --net=host -e DISPLAY=$DISPLAY --gpus all -v /tmp/.X11-unix:/tmp/.X11-unix -v ~/.Xauthority:/root/.Xauthority --rm -v $PWD/models:/models adlinksec/train-yolo -gui -mb 1000 -tiny --zip-url https://app.roboflow.com/ds/p93iBw9pxu?key=3K4jR07cID

Optimize the model on the edge device
After the training procedure, you will find the ONNX model in the models folder. Move it to your edge device (such as a Neon camera) and optimize it with the TensorRT tool.

  • FP32, batch size = 1
/usr/src/tensorrt/bin/trtexec --onnx=YOUR_MODEL_NAME.onnx --buildOnly --saveEngine=YOUR_MODEL_NAME.engine --maxBatch=1
  • FP16, batch size = 1
/usr/src/tensorrt/bin/trtexec --onnx=YOUR_MODEL_NAME.onnx --buildOnly --saveEngine=YOUR_MODEL_NAME.engine --maxBatch=1 --fp16
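Both builds can be wrapped in one helper (the helper name and the TRTEXEC override are illustrative; the default path matches the trtexec location used above, and pointing TRTEXEC at echo gives a dry run):

```shell
# Build both FP32 and FP16 engines from one ONNX model.
# build_engines is a hypothetical wrapper; the default TRTEXEC path matches
# the commands above. Override TRTEXEC (e.g. TRTEXEC=echo) for a dry run.
TRTEXEC=${TRTEXEC:-/usr/src/tensorrt/bin/trtexec}

build_engines() {
  model=$1   # model basename without the .onnx extension
  "$TRTEXEC" --onnx="$model.onnx" --buildOnly --saveEngine="$model.fp32.engine" --maxBatch=1
  "$TRTEXEC" --onnx="$model.onnx" --buildOnly --saveEngine="$model.fp16.engine" --maxBatch=1 --fp16
}
```

For example, `build_engines YOUR_MODEL_NAME` produces `YOUR_MODEL_NAME.fp32.engine` and `YOUR_MODEL_NAME.fp16.engine`.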

Inference on the edge device
Refer to the inference article.

