Postal Parcel Sorting System: English Paper and Chinese Translation

IMPROVED PARCEL SORTING BY COMBINING AUTOMATIC SPEECH AND CHARACTER RECOGNITION
Automatic postal sorting systems have traditionally relied on optical character recognition (OCR) technology. While OCR systems perform well for flat mail items such as envelopes, their performance deteriorates for parcels. In this study, we propose a new multimodal solution for parcel sorting that combines automatic speech recognition (ASR) technology with OCR in order to deliver better performance. Our multimodal approach is based on estimating OCR output confidence and then optionally using the ASR system output when the OCR results show low confidence. In particular, we propose a Levenshtein edit distance (LED) based measure to compute OCR confidence. Based on this OCR confidence measure, a dynamic fusion strategy is developed that forms its final decision from (i) the OCR output alone, (ii) the ASR output alone, or (iii) a combination of the ASR and OCR outputs. The proposed system is evaluated on speech and image data collected in real-world conditions. Our experiments show that the proposed multimodal solution achieves an overall zip code recognition rate of 90.2%, which is a substantial improvement over the ASR-only (81%) and OCR-only (80.6%) systems. This advancement represents an important contribution that leverages OCR and ASR technologies to improve address recognition in parcels.
Index Terms: Automatic Speech Recognition, Optical Character Recognition, Parcel Sorting, Address Recognition
1. INTRODUCTION
Parcel sorters are large machines that automatically sort and direct mail towards its destination. Parcel sorting technology provides greater speed, efficiency, and reliability to mail delivery while cutting operational cost. The key step in automatic parcel sorting is reliable automatic identification of the address information on individual mail items. Traditionally, optical character recognition (OCR) has been widely used for reading addresses on mail items. While OCR performs extremely well on flat mail items (also known as flats) such as envelopes, its performance is lower on parcels. Address labels on flats are generally more consistently oriented and provide a strong black-font-on-white-background contrast, resulting in better OCR performance. In contrast, address labels on parcels are inconsistently placed and often covered with plastic, and the parcel itself may be irregularly shaped, resulting in poorer OCR performance. Consequently, there is a need to improve automatic address recognition accuracy for parcels.
In this study, we explore the possibility of using automatic speech recognition (ASR) together with OCR to deliver improved address recognition accuracy for parcels. While multimodal approaches that combine OCR and ASR systems have been explored in other domains [1, 2], the application domain of postal automation is relatively underexplored. Here, it is important to note that parcel sorters are generally operated by a single person, whose primary job is to pick up the parcel and place it on the sorting machine conveyor belt. In the proposed multimodal solution, the operator is assigned a secondary job: reading the address label aloud while placing the parcel. In this manner, the necessary speech input for the ASR system can be obtained. It is also important to note that the operation is minimally affected, as the operator is generally only required to speak the zip code (or the first three digits of the zip code).
In the proposed multimodal approach, the ASR and OCR systems work independently to decode their best results, which are later combined to deliver superior performance. In particular, the combination logic examines the OCR output and generates a confidence score. If the OCR output is generated with high confidence, then the ASR output is not used for final processing of the result. Conversely, if the OCR output is generated with low confidence, then the OCR output is not trusted and the ASR output alone is used for processing the final output. Finally, if the OCR output is generated with medium confidence, then a combination of the OCR and ASR outputs is used to process the final result. In order to place the OCR output in the high, medium, and low confidence categories, a confidence measure based on Levenshtein edit distance (LED) is proposed. The LED-based measure is efficient and effective at utilizing the ASR output when the OCR output is unlikely to be correct.
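The three-way decision described in this section can be illustrated with a minimal Python sketch. It is not the authors' implementation: the text above does not state what the LED is measured against, which thresholds separate the confidence categories, or how the two outputs are combined in the medium case. The sketch assumes, purely for illustration, that the OCR string is compared against a list of valid zip codes (`valid_codes`), that the smallest edit distance to that list defines the category (`high_max` and `medium_max` are hypothetical thresholds), and that the medium case picks the OCR-plausible code closest to the ASR output. The function names `ocr_confidence_band` and `fuse` are likewise hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Standard Levenshtein edit distance (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def ocr_confidence_band(ocr_text, valid_codes, high_max=0, medium_max=1):
    """Assumed confidence measure: smallest LED from the OCR string to any
    valid zip code. The thresholds are hypothetical, not from the paper."""
    best = min(levenshtein(ocr_text, code) for code in valid_codes)
    if best <= high_max:
        return "high"
    if best <= medium_max:
        return "medium"
    return "low"


def fuse(ocr_text, asr_text, valid_codes, high_max=0, medium_max=1):
    """Return the final zip code decision from the OCR and ASR outputs."""
    band = ocr_confidence_band(ocr_text, valid_codes, high_max, medium_max)
    if band == "high":
        return ocr_text   # (i) OCR output alone
    if band == "low":
        return asr_text   # (ii) ASR output alone
    # (iii) Assumed combination rule: among the valid codes that the OCR
    # string nearly matches, choose the one closest to the ASR output.
    candidates = [c for c in valid_codes
                  if levenshtein(ocr_text, c) <= medium_max]
    return min(candidates, key=lambda c: levenshtein(asr_text, c))


if __name__ == "__main__":
    codes = ["75080", "75081", "10001"]
    print(fuse("75080", "10001", codes))   # OCR matches a valid code exactly
    print(fuse("7508O", "75081", codes))   # OCR is one edit away from two codes
    print(fuse("7S0B0X", "10001", codes))  # OCR is far from every valid code
```

In this sketch the ASR output is consulted only when the OCR string fails to match a valid code exactly, which mirrors the optional use of ASR described above. A real system would tune the thresholds on held-out data and would likely use recognizer scores or n-best lists in the combination step.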