Project background
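In day-to-day development I often run into images that are really several images stitched together, for example three images concatenated horizontally into one. Can a yolov8-seg model identify the regions of the three sub-images in such a picture? That is what this article sets out to do.

Generating data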
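Assume the following concatenation layouts: (1) two images concatenated horizontally, where each image's height is resized to 768 and its width is scaled proportionally; (2) three images concatenated horizontally, resized the same way; (3) two images concatenated vertically, where each image's width is resized to 768 and its height is scaled proportionally; (4) three images concatenated vertically, resized the same way; (5) a 2×2 grid, where every image is resized to the same size (640×640) for a total size of 1280×1280.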
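As a quick sanity check on these layouts, here is a minimal sketch that computes the canvas size each layout produces from the source image sizes alone. The helper names `scaled_size` and `composite_size` and the mode string `"grid2x2"` are hypothetical and chosen only for illustration; the arithmetic is assumed to follow the rules listed above.

```python
def scaled_size(size, target, by):
    """Return (w, h) after scaling so that the side named by `by` equals target."""
    w, h = size
    if by == "height":
        ratio = target / h
        return int(w * ratio), target
    if by == "width":
        ratio = target / w
        return target, int(h * ratio)
    raise ValueError(f"unknown side: {by}")


def composite_size(sizes, mode, target=768):
    """Return the (width, height) of the composite canvas for a layout.

    `mode` is a hypothetical label: "horizontal", "vertical" or "grid2x2".
    """
    if mode == "horizontal":
        # Every image gets the same height; widths add up.
        scaled = [scaled_size(s, target, "height") for s in sizes]
        return sum(w for w, _ in scaled), target
    if mode == "vertical":
        # Every image gets the same width; heights add up.
        scaled = [scaled_size(s, target, "width") for s in sizes]
        return target, sum(h for _, h in scaled)
    if mode == "grid2x2":
        # Four 640x640 tiles, regardless of the source sizes.
        return 1280, 1280
    raise ValueError(f"unknown mode: {mode}")
```

For example, `composite_size([(1536, 1536), (384, 768)], "horizontal")` scales the first image to 768×768 and leaves the second at 384×768, so the canvas is 1152 wide and 768 high.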
The following script generates the segmentation dataset.
import os
import random

from PIL import Image


def list_path_all_files(dirname):
    """Recursively collect the paths of all .jpg files under dirname."""
    result = []
    for maindir, subdir, file_name_list in os.walk(dirname):
        for filename in file_name_list:
            if filename.lower().endswith(".jpg"):
                apath = os.path.join(maindir, filename)
                result.append(apath)
    return result


def resize_image(image, target_size, resize_by="height"):
    """Scale the image so the chosen side equals target_size, keeping the aspect ratio."""
    w, h = image.size
    # Image.ANTIALIAS was removed in Pillow 10; Image.LANCZOS is the same filter.
    if resize_by == "height":
        if h != target_size:
            ratio = target_size / h
            new_width = int(w * ratio)
            image = image.resize((new_width, target_size), Image.LANCZOS)
    elif resize_by == "width":
        if w != target_size:
            ratio = target_size / w
            new_height = int(h * ratio)
            image = image.resize((target_size, new_height), Image.LANCZOS)
    return image


def create_2x2_image(images):
    """Paste four images into a 1280x1280 grid and return it with the corner coordinates."""
    target_size = (640, 640)
    new_image = Image.new("RGB", (1280, 1280))
    coords = []
    for i, img in enumerate(images):
        img = img.resize(target_size, Image.LANCZOS)
        if i == 0:
            new_image.paste(img, (0, 0))
            coords.append((0, 0, 640, 0, 640, 640, 0, 640))
        elif i == 1:
            new_image.paste(img, (640, 0))
            coords.append((640, 0, 1280, 0, 1280, 640, 640, 640))
        elif i == 2:
            new_image.paste(img, (0, 640))
            coords.append((0, 640, 640, 640, 640, 1280, 0, 1280))
        elif i == 3:
            new_image.paste(img, (640, 640))
            coords.append((640, 640, 1280, 640, 1280, 1280, 640, 1280))
    return new_image, coords


def concatenate_images(image_list, mode="horizontal", target_size=768):
    """Concatenate images in a row or a column; return the new image and per-image corners."""
    if mode == "horizontal":
        resized_images = [resize_image(image, target_size, "height") for image in image_list]
        total_width = sum(image.size[0] for image in resized_images)
        max_height = target_size
        new_image = Image.new("RGB", (total_width, max_height))
        x_offset = 0
        coords = []
        for image in resized_images:
            new_image.paste(image, (x_offset, 0))
            coords.append((x_offset, 0,
                           x_offset + image.size[0], 0,
                           x_offset + image.size[0], max_height,
                           x_offset, max_height))
            x_offset += image.size[0]
    elif mode == "vertical":
        resized_images = [resize_image(image, target_size, "width") for image in image_list]
        total_height = sum(image.size[1] for image in resized_images)
        max_width = target_size
        new_image = Image.new("RGB", (max_width, total_height))
        y_offset = 0
        coords = []
        for image in resized_images:
            new_image.paste(image, (0, y_offset))
            coords.append((0, y_offset,
                           max_width, y_offset,
                           max_width, y_offset + image.size[1],
                           0, y_offset + image.size[1]))
            y_offset += image.size[1]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return new_image, coords


def generate_labels(coords, image_size):
    """Turn pixel corner coordinates into normalized YOLO segmentation label lines (class 0)."""
    labels = []
    width, height = image_size
    for coord in coords:
        x1, y1, x2, y2, x3, y3, x4, y4 = coord
        x1 /= width
        y1 /= height
        x2 /= width
        y2 /= height
        x3 /= width
        y3 /= height
        x4 /= width
        y4 /= height
        labels.append(f"0 {x1:.5f} {y1:.5f} {x2:.5f} {y2:.5f} {x3:.5f} {y3:.5f} {x4:.5f} {y4:.5f}")
    return labels


def generate_dataset(image_folder, output_folder, label_folder, num_images):
    """Create num_images composite images together with their label files."""
    image_paths = list_path_all_files(image_folder)
    os.makedirs(output_folder, exist_ok=True)
    os.makedirs(label_folder, exist_ok=True)
    for i in range(num_images):
        # Pick one of the five layouts at random.
        random_choice = random.randint(1, 5)
        if random_choice == 1:
            selected_images = [Image.open(random.choice(image_paths)) for _ in range(2)]
            new_image, coords = concatenate_images(selected_images, mode="horizontal")
        elif random_choice == 2:
            selected_images = [Image.open(random.choice(image_paths)) for _ in range(3)]
            new_image, coords = concatenate_images(selected_images, mode="horizontal")
        elif random_choice == 3:
            selected_images = [Image.open(random.choice(image_paths)) for _ in range(2)]
            new_image, coords = concatenate_images(selected_images, mode="vertical")
        elif random_choice == 4:
            selected_images = [Image.open(random.choice(image_paths)) for _ in range(3)]
            new_image, coords = concatenate_images(selected_images, mode="vertical")
        elif random_choice == 5:
            selected_images = [Image.open(random.choice(image_paths)) for _ in range(4)]
            new_image, coords = create_2x2_image(selected_images)
        output_image_path = os.path.join(output_folder, f"composite_image_paper_{i + 1:06d}.jpg")
        new_image.save(output_image_path, "JPEG")
        label_path = os.path.join(label_folder, f"composite_image_paper_{i + 1:06d}.txt")
        labels = generate_labels(coords, new_image.size)
        with open(label_path, "w") as label_file:
            for label in labels:
                label_file.write(label + "\n")


# Example usage
image_folder = "/ssd/xiedong/datasets/multilabelsTask/multilabels_new/10025doc_textPaperShot/"
# image_folder = "/ssd/xiedong/datasets/multilabelsTask/multilabels_new/"
output_folder = "/ssd/xiedong/datasets/composite_images_yolov8seg/images"
label_folder = "/ssd/xiedong/datasets/composite_images_yolov8seg/labels"
num_images = 10000
generate_dataset(image_folder, output_folder, label_folder, num_images)
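Before training, it can be worth spot-checking a few label files. The sketch below converts one label line back into pixel coordinates so the polygon can be compared with the composite image. The function name `parse_seg_label` is hypothetical; it assumes the label format written by the script above, which is a class id followed by normalized x y pairs.

```python
def parse_seg_label(line, image_size):
    """Convert one YOLO segmentation label line to (class_id, pixel polygon).

    The line is expected to look like "0 x1 y1 x2 y2 ..." with every
    coordinate normalized to the range [0, 1].
    """
    parts = line.split()
    class_id = int(parts[0])
    values = [float(v) for v in parts[1:]]
    if len(values) < 6 or len(values) % 2 != 0:
        raise ValueError("expected at least three x y pairs")
    width, height = image_size
    # Pair up x and y values and scale them back to pixels.
    polygon = [
        (round(x * width), round(y * height))
        for x, y in zip(values[0::2], values[1::2])
    ]
    return class_id, polygon
```

For a 1536×768 composite of two 768×768 images, the first label line should map back to the corners (0, 0), (768, 0), (768, 768), (0, 768).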
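Some of the generated images are genuinely hard, for example when the boundary between the sub-images is not obvious, and whether the model can handle them is an open question. That said, I expect the model can pick up on semantics or layout to some degree, so it still has a chance of getting them right.

Training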
I want an environment that I can reuse directly later on, so I set one up with Docker. The steps were:
docker run -it --gpus all --net host --shm-size=8g -v /ssd/xiedong/yolov8segdir:/ssd/xiedong/yolov8segdir ultralytics/ultralytics:8.2.62 bash
docker tag ultralytics/ultralytics:8.2.62 kevinchina/deeplearning:ultralytics-8.2.62
docker push kevinchina/deeplearning:ultralytics-8.2.62

Write a dataset config file, data.yaml:
cd /ssd/xiedong/yolov8segdir
vim data.yaml

path: /ssd/xiedong/yolov8segdir/composite_images_yolov8seg
train: images  # train images (relative to path)
val: images  # val images (relative to path)
test:  # test images (optional)

# Classes
names:
  0: paper

Run the following code to start training the model:
from ultralytics import YOLO

# Load a model
model = YOLO("yolov8m-seg.pt")  # load a pretrained model (recommended for training)

# Train the model on 3 GPUs (device indices 1, 2 and 3)
results = model.train(data="data.yaml", epochs=50, imgsz=640, device=[1, 2, 3], batch=180)
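The code downloads this pretrained model automatically. If the network is unreliable, you may need to fetch it yourself with wget and place it in the directory the training script is run from.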
https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8m-seg.pt
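If wget is not available, the same manual download can be sketched with the Python standard library. This is only an illustration: the helper name `ensure_weights` is hypothetical, and it simply skips the download when a file with the expected name is already present.

```python
import os
import urllib.request

WEIGHTS_URL = "https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8m-seg.pt"


def ensure_weights(url=WEIGHTS_URL, dest_dir="."):
    """Return the local path of the weights file, downloading it only if missing."""
    filename = url.rsplit("/", 1)[-1]
    dest = os.path.join(dest_dir, filename)
    if not os.path.exists(dest):
        # Plain HTTP(S) download; no retries or checksum verification.
        urllib.request.urlretrieve(url, dest)
    return dest


# Usage (run from the directory that holds the training script):
# weights_path = ensure_weights()
```

The function performs no integrity check, so an interrupted download would leave a truncated file behind that has to be deleted by hand before retrying.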
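Start training, with the training code saved as x03train.py:
python -m torch.distributed.run --nproc_per_node 3 x03train.py
That is all it takes to train.
The task appears to be an easy one.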
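Once the model is trained, the predicted polygons still have to be turned into sub-image crops. The sketch below shows one possible post-processing step: it converts a polygon in pixel coordinates into a bounding box clamped to the image. The name `polygon_to_box` is hypothetical. How the polygons are obtained is an assumption on my part, since the original shows no inference code; in Ultralytics they are typically read from `results[0].masks.xy`.

```python
import math


def polygon_to_box(polygon, image_size):
    """Return (left, top, right, bottom) enclosing the polygon, clamped to the image.

    `polygon` is a sequence of (x, y) points in pixel coordinates.
    """
    width, height = image_size
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    # Round outwards so the box fully contains the polygon, then clamp.
    left = max(0, math.floor(min(xs)))
    top = max(0, math.floor(min(ys)))
    right = min(width, math.ceil(max(xs)))
    bottom = min(height, math.ceil(max(ys)))
    return left, top, right, bottom
```

The returned tuple uses the same (left, upper, right, lower) order that PIL's `Image.crop` expects, so each box can be passed to it directly to cut out a sub-image.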