# MASTER-PyTorch ![](https://img.shields.io/badge/license-MIT-blue)
PyTorch reimplementation of ["MASTER: Multi-Aspect Non-local Network for Scene Text Recognition"](https://arxiv.org/abs/1910.02562) (Pattern Recognition 2021). This project differs from our original implementation, which was built on the company's private FastOCR codebase. A TensorFlow reimplementation is also available in the [MASTER-TF](https://github.com/jiangxiluning/MASTER-TF) repository, with almost identical performance. (PS: the logo is inspired by Master Oogway in Kung Fu Panda.)

## News

* 2021/07: [MASTER-mmocr](https://github.com/JiaquanYe/MASTER-mmocr), a reimplementation of MASTER based on mmocr. [@Jiaquan Ye](https://github.com/JiaquanYe)
* 2021/07: [TableMASTER-mmocr](https://github.com/JiaquanYe/TableMASTER-mmocr), the 2nd-place solution to the ICDAR 2021 Competition on Scientific Literature Parsing Task B, based on MASTER. [@Jiaquan Ye](https://github.com/JiaquanYe)
* 2021/07: A talk (in Chinese) can be found [here](https://www.bilibili.com/video/BV1T44y1m7vc).
* 2021/05: [Savior](https://github.com/novioleo/Savior), which aims to provide a simple, lightweight, fast, integrated, pipelined deployment framework for RPA, has now integrated MASTER for captcha recognition. [@Tao Luo](https://github.com/novioleo)
* 2021/04: Slides can be found [here](https://github.com/wenwenyu/MASTER-pytorch/blob/main/assets/MASTER.pdf).

## Honors based on MASTER

* 1st place (2021/05) solution to [ICDAR 2021 Competition on Scientific Table Image Recognition to LaTeX (Subtask I: Table structure reconstruction)](https://competitions.codalab.org/competitions/26979)
* 1st place (2021/05) solution to [ICDAR 2021 Competition on Scientific Table Image Recognition to LaTeX (Subtask II: Table content reconstruction)](https://competitions.codalab.org/competitions/26979)
* 2nd place (2021/05) solution to [ICDAR 2021 Competition on Scientific Literature Parsing Task B: Table recognition](https://icdar2021.org/program-2/competitions/competition-on-scientific-literature-parsing/)
* 1st place (2020/10) solution to [ICDAR 2019 Robust Reading Challenge on Reading Chinese Text on Signboard (task2)](https://rrc.cvc.uab.es/?ch=12&com=evaluation&task=2)
* 2nd and 5th places (2020/10) in [The 5th China Innovation Challenge on Handwritten Mathematical Expression Recognition](https://www.heywhale.com/home/competition/5f703ac023f41e002c3ed5e4/content/6)
* 4th place (2019/08) in [ICDAR 2017 Robust Reading Challenge on COCO-Text (task2)](https://rrc.cvc.uab.es/?ch=5&com=evaluation&task=2)
* More will be released

## Contents

* [Introduction](#introduction)
* [Requirements](#requirements)
* [Usage](#usage)
  * [Prepare Datasets](#prepare-datasets)
  * [Distributed training with config files](#distributed-training-with-config-files)
  * [Using Multiple Node](#using-multiple-node)
  * [Debug mode on one GPU/CPU training with config files](#debug-mode-on-one-gpucpu-training-with-config-files)
  * [Resuming from checkpoints](#resuming-from-checkpoints)
  * [Finetune from checkpoints](#finetune-from-checkpoints)
  * [Testing from checkpoints](#testing-from-checkpoints)
  * [Evaluation](#evaluation)
* [Customization](#customization)
  * [Checkpoints](#checkpoints)
  * [Tensorboard Visualization](#tensorboard-visualization)
* [TODO](#todo)
* [Citations](#citations)
* [License](#license)
* [Acknowledgements](#acknowledgements)

## Introduction

MASTER is a self-attention based scene text recognizer that (1) not only encodes the input-output attention but also learns self-attention, which encodes feature-feature and target-target relationships inside the encoder and decoder,
(2) learns a more powerful and robust intermediate representation against spatial distortion, and (3) offers better training and evaluation efficiency. The overall architecture is shown below.
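As a rough illustration of the self-attention building block that the encoder and decoder rely on (a generic sketch for exposition, not the repository's actual module code; the dimensions are arbitrary), standard scaled dot-product multi-head self-attention can be written as:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadSelfAttention(nn.Module):
    """Generic scaled dot-product multi-head self-attention (illustrative only)."""
    def __init__(self, d_model=512, num_heads=8):
        super().__init__()
        assert d_model % num_heads == 0
        self.h, self.d_k = num_heads, d_model // num_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)   # learned query/key/value projections
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):
        # x: (batch, seq_len, d_model), e.g. flattened feature-map positions or target embeddings
        b, n, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(b, n, self.h, self.d_k).transpose(1, 2) for t in (q, k, v))
        attn = F.softmax(q @ k.transpose(-2, -1) / self.d_k ** 0.5, dim=-1)  # (b, h, n, n)
        return self.out((attn @ v).transpose(1, 2).reshape(b, n, -1))

# Toy example: 240 flattened positions with d_model=512
feats = torch.randn(2, 240, 512)
print(MultiHeadSelfAttention()(feats).shape)  # torch.Size([2, 240, 512])
```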
## Requirements

* python==3.6
* torchvision==0.6.1
* pandas==1.0.5
* torch==1.5.1
* numpy==1.16.4
* tqdm==4.47.0
* Distance==0.1.3
* Pillow==7.2.0

```bash
pip install -r requirements.txt
```

## Usage

### Prepare Datasets

1. Prepare files in the correct format, as provided in the `data` folder.
   * Please see [data/README.md](data/README.md) for instructions on how to prepare the data in the format required by MASTER.
   * Synthetic image datasets: [SynthText (Synth800k)](https://www.robots.ox.ac.uk/~vgg/data/scenetext/), [MJSynth (Synth90k)](https://www.robots.ox.ac.uk/~vgg/data/text/), [SynthAdd (password: 627x)](https://pan.baidu.com/s/1uV0LtoNmcxbO-0YA7Ch4dg)
   * Real image datasets: [IIIT5K](http://cvit.iiit.ac.in/projects/SceneTextUnderstanding/IIIT5K.html), [SVT](http://www.iapr-tc11.org/dataset/SVT/svt.zip), [IC03](http://iapr-tc11.org/mediawiki/index.php?title=ICDAR_2003_Robust_Reading_Competitions), [IC13](http://rrc.cvc.uab.es/?ch=2), [IC15](https://rrc.cvc.uab.es/?ch=4), [COCO-Text](https://rrc.cvc.uab.es/?ch=5), [SVTP](https://github.com/Jyouhou/SceneTextPapers/raw/master/datasets/svt-p.zip), [CUTE80_Cropped](https://github.com/ocr-algorithm-and-data/CUTE80)
   * An example of cropping SynthText can be found at [data_utils/crop_synthtext.py](data_utils/crop_synthtext.py)
2. Modify the `train_dataset` and `val_dataset` args in the `config.json` file, including `txt_file`, `img_root`, `img_w`, `img_h`.
3. Modify `utils/keys.txt` if needed, according to the vocabulary of your dataset.
4. Modify `STRING_MAX_LEN` in `utils/label_util.py` if needed, according to the text length of your dataset.

### Distributed training with config files

Modify the configurations in the `configs/config.json` and `dist_train.sh` files, then run:

```bash
bash dist_train.sh
```

The application will be launched via `launch.py` on a 4-GPU node with one process per GPU (recommended). This is equivalent to

```bash
python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=4 \
--master_addr=127.0.0.1 --master_port=5555 \
train.py -c configs/config.json -d 1,2,3,4 --local_world_size 4
```

and is also equivalent to specifying the indices of available GPUs via `CUDA_VISIBLE_DEVICES` instead of the `-d` argument:

```bash
CUDA_VISIBLE_DEVICES=1,2,3,4 python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=4 \
--master_addr=127.0.0.1 --master_port=5555 \
train.py -c configs/config.json --local_world_size 4
```

Similarly, training can be launched with a single process that spans all 4 GPUs (if the node has 4 available GPUs) using (not recommended):

```bash
CUDA_VISIBLE_DEVICES=1,2,3,4 python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=1 \
--master_addr=127.0.0.1 --master_port=5555 \
train.py -c configs/config.json --local_world_size 1
```
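For context, the sketch below shows the per-process setup that `torch.distributed.launch` drives. It is illustrative only and not the repository's actual `train.py`: the launcher starts one process per GPU, passes a `--local_rank` argument to each, and sets the rendezvous environment variables.

```python
import argparse
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# torch.distributed.launch injects --local_rank and sets
# MASTER_ADDR / MASTER_PORT / RANK / WORLD_SIZE environment variables.
parser = argparse.ArgumentParser()
parser.add_argument('--local_rank', type=int, default=0)
args = parser.parse_args()

dist.init_process_group(backend='nccl')   # rendezvous via the env vars above
torch.cuda.set_device(args.local_rank)    # bind this process to one GPU

model = torch.nn.Linear(512, 512).cuda(args.local_rank)  # placeholder for the MASTER model
model = DDP(model, device_ids=[args.local_rank])
# ... build DistributedSampler-based data loaders and run the usual training loop ...
```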
### Using Multiple Node

You can enable multi-node multi-GPU training by setting the `nnodes` and `node_rank` args on the command line of every node. E.g., for 2 nodes with 4 GPUs each, run as follows.

Node 1, ip 192.168.0.10, run on node 1:

```bash
CUDA_VISIBLE_DEVICES=1,2,3,4 python -m torch.distributed.launch --nnodes=2 --node_rank=0 --nproc_per_node=4 \
--master_addr=192.168.0.10 --master_port=5555 \
train.py -c configs/config.json --local_world_size 4
```

Node 2, ip 192.168.0.15, run on node 2:

```bash
CUDA_VISIBLE_DEVICES=2,4,6,7 python -m torch.distributed.launch --nnodes=2 --node_rank=1 --nproc_per_node=4 \
--master_addr=192.168.0.10 --master_port=5555 \
train.py -c configs/config.json --local_world_size 4
```

### Debug mode on one GPU/CPU training with config files

This training mode lets you debug the code without the distributed setup. `-dist` must be set to `false` to turn off distributed mode, and `-d` specifies which single GPU to use.

```bash
python train.py -c configs/config.json -d 1 -dist false
```

### Resuming from checkpoints

You can resume from a previously saved checkpoint by:

```bash
python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=4 \
--master_addr=127.0.0.1 --master_port=5555 \
train.py -d 1,2,3,4 --local_world_size 4 --resume path/to/checkpoint
```

### Finetune from checkpoints

You can finetune from a previously saved checkpoint by:

```bash
python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=4 \
--master_addr=127.0.0.1 --master_port=5555 \
train.py -d 1,2,3,4 --local_world_size 4 --resume path/to/checkpoint --finetune true
```

### Testing from checkpoints

You can predict from a previously saved checkpoint by:

```bash
python test.py --checkpoint path/to/checkpoint --img_folder path/to/img_folder \
--width 160 --height 48 \
--output_folder path/to/output_folder \
--gpu 0 --batch_size 64
```

**Note**: `width` and `height` must be the same as the settings used during training.

### Evaluation

Evaluate sequence accuracy and edit distance accuracy:

```bash
python utils/calculate_metrics.py --predict-path predict_result.json --label-path label.txt
```

**Note**: `label.txt` is a multi-line file, with every line containing `{ImageFile:, Label:}`.

## Customization

### Checkpoints

You can specify the name of the training session in the `config.json` file:

```json
"name": "MASTER_Default",
"run_id": "example"
```

The checkpoints will be saved in `save_dir/name/run_id_timestamp/checkpoint_epoch_n`, with the timestamp in mmdd_HHMMSS format. A copy of the `config.json` file will be saved in the same folder.

**Note**: checkpoints contain:

```python
{
  'arch': arch,
  'epoch': epoch,
  'model_state_dict': self.model.state_dict(),
  'optimizer': self.optimizer.state_dict(),
  'monitor_best': self.monitor_best,
  'config': self.config
}
```
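Given that layout, a saved checkpoint can be inspected and its weights restored roughly as follows (a minimal sketch using standard PyTorch loading; constructing the model itself is left to `train.py`/`test.py`):

```python
import torch

# Inspect a checkpoint saved by this project (keys as listed above).
checkpoint = torch.load('path/to/checkpoint', map_location='cpu')
print(checkpoint['arch'], checkpoint['epoch'], checkpoint['monitor_best'])

# Once a model with a matching architecture has been constructed
# (e.g. from checkpoint['config']), restore its weights with:
# model.load_state_dict(checkpoint['model_state_dict'])
# and, when resuming training, the optimizer state with:
# optimizer.load_state_dict(checkpoint['optimizer'])
```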
### Tensorboard Visualization

This project supports Tensorboard visualization using either `torch.utils.tensorboard` or [TensorboardX](https://github.com/lanpa/tensorboardX).

1. **Install**

   If you are using PyTorch 1.1 or higher, install tensorboard with `pip install 'tensorboard>=1.14.0'`. Otherwise, install TensorboardX following the installation guide in [TensorboardX](https://github.com/lanpa/tensorboardX).

2. **Run training**

   Make sure that the `tensorboard` option in the config file is turned on.

   ```
   "tensorboard" : true
   ```

3. **Open the Tensorboard server**

   Type `tensorboard --logdir saved/log/` at the project root; the server will open at `http://localhost:6006`.

By default, loss values are logged. If you need more visualizations, use `add_scalar('tag', data)`, `add_image('tag', image)`, etc. in the `trainer._train_epoch` method. The `add_something()` methods in this project are basically wrappers for those of the `tensorboardX.SummaryWriter` and `torch.utils.tensorboard.SummaryWriter` modules.

**Note**: You don't have to specify the current step, since the `WriterTensorboard` class defined in `logger/visualization.py` tracks the current step.

## TODO

- [ ] Memory-Cache based Inference

## Citations

If you find MASTER useful, please cite our [paper](https://arxiv.org/abs/1910.02562):

```bibtex
@article{Lu2021MASTER,
  title={{MASTER}: Multi-Aspect Non-local Network for Scene Text Recognition},
  author={Ning Lu and Wenwen Yu and Xianbiao Qi and Yihao Chen and Ping Gong and Rong Xiao and Xiang Bai},
  journal={Pattern Recognition},
  year={2021}
}
```

## License

This project is licensed under the MIT License. See LICENSE for more details.

## Acknowledgements

* [PICK-pytorch](https://github.com/wenwenyu/PICK-pytorch)
* [PyTorch Template Project](https://github.com/victoresque/pytorch-template)