Electronics Science Technology and Application


ISSN: 2424-8460 (Online); 2251-2608 (Print)

Article Processing Charges (APCs): US$800

Publication Frequency: Quarterly


Published: 2026-01-30

Issue: Vol 12 No 4 (2025): Published

Section: Articles

Pinyin-enhanced BERT for Chinese spelling correction with continuous pinyin features and user-specific lexicon adaptation

Fan Cui

Chizhou University

Dan Wang

Chizhou University

Zundong Mao

Chizhou University


DOI: https://doi.org/10.59429/esta.v12i4.12673


Keywords: pinyin; convolutional neural network; Chinese spelling correction


Abstract

Chinese spelling correction (CSC) is an important task in natural language processing (NLP), with applications in text generation, intelligent input methods, and other scenarios. Although many advanced models have improved general-purpose correction performance, two major challenges remain. First, because pinyin-based input methods dominate Chinese text entry, errors between phonetically similar characters have become prevalent, yet existing models still struggle to correct them effectively. Second, low-frequency domain-specific terms (e.g., in medicine or law) make it difficult for general models to adapt to out-of-domain texts. To address these issues, we propose Pinpin-BERT, a CSC approach that enhances BERT with continuous pinyin features. Exploiting the observation that consecutive Chinese characters often form stable pinyin patterns, the model uses a convolutional neural network (CNN) to extract continuous pinyin features and integrates them with BERT’s semantic representations for more accurate correction. In addition, a user-defined lexicon is incorporated to improve domain adaptability. Experimental results on three official benchmark datasets and a self-constructed medical dataset show that the proposed method achieves consistently strong performance across all evaluations.
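The central mechanism described in the abstract, a CNN sliding over the pinyin of consecutive characters whose output is combined with BERT's per-character representations, can be illustrated with a small dependency-free Python sketch. It is not the authors' implementation: the toy embedding table, the kernel width of 3, the ReLU activation, and fusion by element-wise addition are all assumptions made for illustration (the paper may, for instance, concatenate the features or use a gate instead), and the "BERT" hidden states are hand-written stand-ins. The function names `embed_pinyin`, `conv1d_same`, and `fuse` are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def embed_pinyin(syllables: List[str], table: Dict[str, Vector], dim: int) -> List[Vector]:
    """Look up one embedding per character's pinyin syllable (unknown -> zeros)."""
    return [list(table.get(s, [0.0] * dim)) for s in syllables]


def conv1d_same(inputs: List[Vector], kernel: List[List[Vector]], bias: Vector) -> List[Vector]:
    """1D convolution along the character axis with zero 'same' padding and ReLU.

    inputs: seq_len x in_dim
    kernel: width x out_dim x in_dim, indexed kernel[k][o][i]
    bias:   out_dim
    Each output position sees a window of `width` consecutive syllables,
    so the feature for one character depends on its neighbors' pinyin.
    """
    width = len(kernel)
    if width % 2 != 1:
        raise ValueError("kernel width must be odd for symmetric 'same' padding")
    pad = width // 2
    out_dim = len(bias)
    outputs = []
    for t in range(len(inputs)):
        acc = list(bias)
        for k in range(width):
            src = t + k - pad
            if 0 <= src < len(inputs):  # positions outside the sentence act as zeros
                x = inputs[src]
                for o in range(out_dim):
                    acc[o] += sum(w * xi for w, xi in zip(kernel[k][o], x))
        outputs.append([max(0.0, v) for v in acc])  # ReLU
    return outputs


def fuse(bert_states: List[Vector], pinyin_feats: List[Vector]) -> List[Vector]:
    """Assumed fusion: element-wise sum of BERT hidden states and pinyin features."""
    if len(bert_states) != len(pinyin_feats):
        raise ValueError("sequence lengths differ")
    return [[h + p for h, p in zip(hs, ps)] for hs, ps in zip(bert_states, pinyin_feats)]


if __name__ == "__main__":
    # Toy 2-dim pinyin embeddings for 我们学习 (hypothetical values, not learned weights).
    table = {"wo": [1.0, 0.0], "men": [0.0, 1.0], "xue": [0.5, 0.5], "xi": [1.0, 1.0]}
    feats = conv1d_same(
        embed_pinyin(["wo", "men", "xue", "xi"], table, dim=2),
        kernel=[[[0.5, 0.5], [0.2, -0.1]]] * 3,  # width 3, out_dim 2, in_dim 2
        bias=[0.0, 0.0],
    )
    # Stand-in for BERT's last-layer hidden states (hypothetical numbers).
    bert_states = [[0.1, 0.2], [0.3, 0.1], [0.0, 0.4], [0.2, 0.2]]
    print(fuse(bert_states, feats))
```

Because each output position aggregates a window of neighboring syllables, a width-3 convolution gives every character a feature that reflects the pinyin on either side of it. In a trained model the kernel would presumably be learned together with the correction objective rather than fixed by hand as it is here.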


21 Woodlands Close #02-10 Primz Bizhub Singapore 737854

Email: editorial_office@as-pub.com