OCR Model Survey and Detailed Installation
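1 Set up the Tesseract-OCR environment
1.1 Install Tesseract-OCR manually first. Download: https://digi.bib.uni-mannheim.de/tesseract/?C=M;O=D
Note: select the Chinese language pack during installation (the simplest approach is to tick every option in the installer).
Install it on the same drive as the code you will run.
After installing Tesseract-OCR, add its install path to the system environment variables. Then check the version and the supported languages from the command line: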
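    cd C:\Program Files\Tesseract-OCR
    tesseract -v
    tesseract --list-langs
If a language-related error appears, download the Chinese pack chi_sim.traineddata into the Tesseract install directory C:\Program Files\Tesseract-OCR (its tessdata subfolder is where Tesseract looks for .traineddata files). See 1.3 for the download links.
1.2 Install the Python library pytesseract
    pip install pytesseract
1.3 Download the language pack and place it in the Tesseract directory
Download:
https://github.com/tesseract-ocr/tesseract/wiki/Data-Files
https://tesseract-ocr.github.io/tessdoc/Data-Files
1.4 Code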
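Before calling the OCR function in this section, it can be useful to confirm from Python that the chi_sim pack from 1.3 is actually visible to Tesseract. The following is a minimal sketch using only the standard library. The helper names parse_langs and installed_langs are hypothetical, and the sketch assumes that the first line of the `tesseract --list-langs` output is a header followed by one language code per line.

```python
import shutil
import subprocess


def parse_langs(output):
    """Return the language codes found in `tesseract --list-langs` output.

    Assumption: the first non-empty line is a header and every following
    non-empty line is one language code.
    """
    lines = [ln.strip() for ln in output.splitlines() if ln.strip()]
    return lines[1:]


def installed_langs(exe="tesseract"):
    """Return installed language codes, or None if the executable is not on PATH."""
    path = shutil.which(exe)
    if path is None:
        return None
    proc = subprocess.run([path, "--list-langs"], capture_output=True, text=True)
    # Some releases may write the list to stderr, so fall back to it.
    return parse_langs(proc.stdout or proc.stderr)


if __name__ == "__main__":
    langs = installed_langs()
    if langs is None:
        print("tesseract was not found on PATH")
    elif "chi_sim" not in langs:
        print("chi_sim is missing; download chi_sim.traineddata (see 1.3)")
    else:
        print("chi_sim is available")
```

If the executable is not on PATH, installed_langs returns None instead of raising, which matches the case where tesseract_cmd has to be set explicitly.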
    def tesseract_to_str(image_path):
        """Tesseract-OCR: extract the text in an image and return it as a string."""
        import logging
        import os
        from PIL import Image
        import pytesseract

        if not os.path.isfile(image_path):
            logging.info("Invalid path, please check image_path: {}".format(image_path))
            return
        image = Image.open(image_path)
        # If the Tesseract install directory is not in the system environment
        # variables, the path to the executable must be set explicitly.
        pytesseract.pytesseract.tesseract_cmd = r'D:\Program_Files\Tesseract-OCR\tesseract.exe'
        tessdata_dir_config = '--tessdata-dir "D:/Program_Files/Tesseract-OCR/tessdata"'
        # Extract the text with pytesseract; recognizing Chinese requires lang='chi_sim'.
        print('-' * 20, 'Extracting text from the image', '-' * 20)
        try:
            text_from_image = pytesseract.image_to_string(image, config=tessdata_dir_config, lang='chi_sim')
        except Exception as e:
            logging.info("Text recognition failed: {}".format(e))
            return
        # print('-' * 20, 'Text extraction finished', '-' * 20)
        # print('text_from_tesseract: \n', text_from_image)
        return text_from_image
2 EasyOCR: an OCR library based on PyTorch
    pip install easyocr
Source code:
https://github.com/JaidedAI/EasyOCR
For a detailed API walkthrough, see https://blog.csdn.net/yohnyang/article/details/130300923
Model storage path:
Windows: C:\Users\username\.EasyOCR\
Linux: /root/.EasyOCR/
Code
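For orientation, the EasyOCR function in this section prints detection[1] for every result. With its default detail level, reader.readtext() returns one entry per detected region in the form (bounding box, text, confidence), so index 1 is the recognized string. The sketch below shows one way to collect those strings and drop low-confidence regions. The helper join_text and the min_conf threshold are hypothetical, and the sample list is hand-written in the same shape rather than real OCR output.

```python
def join_text(results, min_conf=0.0):
    """Join recognized strings from EasyOCR-style results, one per line.

    Each entry is assumed to be (bounding_box, text, confidence), the shape
    reader.readtext() returns with its default detail level.
    """
    return "\n".join(text for _bbox, text, conf in results if conf >= min_conf)


# Hand-written sample in the same shape; not real OCR output.
sample = [
    ([[0, 0], [80, 0], [80, 20], [0, 20]], "hello", 0.98),
    ([[0, 30], [80, 30], [80, 50], [0, 50]], "w0rld", 0.31),
]
print(join_text(sample, min_conf=0.5))
```

In practice the list passed in would be the value returned by reader.readtext(image_path).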
    def easyocr_to_str(image_path):
        import easyocr
        # import os
        # os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
        # reader = easyocr.Reader(['ch_sim', 'en'], gpu=False)
        print('result:1 \n')
        # Load the Simplified Chinese model and run on the CPU.
        reader = easyocr.Reader(['ch_sim'], gpu=False)
        print('result:2 \n')
        result = reader.readtext(image_path)
        print('result: \n', result)
        # Each detection is (bounding box, text, confidence); print the text.
        for detection in result:
            print(detection[1])
Problem
OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
Fix
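A fix suggested online is to add the following code:
    import os
    os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
However, this did not help. What did work was the following change in my own environment. Afterwards the error no longer appeared, even with the os lines above commented out:
In D:\ProgramFiles\miniconda3\envs\env_myenv\Library\bin, rename libiomp5md.dll to libiomp5md.dll.bk.
[Figure: example of recognized text]
3 Keras-OCR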
Source code:
https://gitcode.com/gh_mirrors/ke/keras-ocr/overview
Installation
keras-ocr supports Python >= 3.6 and TensorFlow >= 2.0.0.
Method 1: install from the main branch
    pip install git+https://github.com/faustomorales/keras-ocr.git#egg=keras-ocr
Method 2: install from PyPI
    pip install keras-ocr
4 Doctr: recognizes text regions, images, and tables in documents
Project page:
https://gitcode.com/gh_mirrors/do/doctr/overview
Installation
    pip install python-doctr[torch]
The first run downloads the models and stores them at:
C:\Users\hlj\.cache\doctr\models\db_resnet50-79bd7d70.pt
C:\Users\hlj\.cache\doctr\models\crnn_vgg16_bn-9762b0b0.pt
Drawback
No Chinese model is supported.
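For completeness, here is a minimal usage sketch for Doctr, written in the same style as the Tesseract function in section 1.4. It is an illustration rather than code from the original post. The function name doctr_to_str is hypothetical, and the two architecture names are chosen only because they match the model files listed in the cache directory. Given the drawback just noted, it should only be expected to work on Latin-script text.

```python
import os


def doctr_to_str(image_path):
    """Run a Doctr OCR pipeline on one image and return the recognized text."""
    if not os.path.isfile(image_path):
        return None
    # Imported lazily: loading Doctr also loads PyTorch, which is slow.
    from doctr.io import DocumentFile
    from doctr.models import ocr_predictor

    # Detection and recognition architectures matching the cached .pt files.
    model = ocr_predictor(det_arch="db_resnet50", reco_arch="crnn_vgg16_bn", pretrained=True)
    doc = DocumentFile.from_images(image_path)
    result = model(doc)
    # render() flattens the page/block/line/word hierarchy into plain text.
    return result.render()
```

Returning None for a missing file mirrors the early return in the Tesseract function, so the two helpers can be swapped when comparing engines.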