Python Programming Tips (Part 4)

Cycling through a list

import itertools

l = itertools.cycle([1,])
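As a minimal sketch of how the iterator returned by itertools.cycle behaves: it repeats the elements forever, so values are pulled with next() or bounded with itertools.islice. The list of colors is made-up sample data.

```python
import itertools

# Made-up sample data for illustration
colors = itertools.cycle(["red", "green", "blue"])

# cycle() never ends, so pull a fixed number of items with next()
first_five = [next(colors) for _ in range(5)]
print(first_five)

# islice() is another way to take a bounded slice from an endless iterator
next_four = list(itertools.islice(itertools.cycle([1, 2]), 4))
print(next_four)
```

Never pass a cycle object straight to list(), because the iteration does not terminate.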

conda installation

To configure conda, first run conda config --set show_channel_urls yes, then edit the .condarc file, and finally run conda clean -i to clear the index cache.

channels:
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/win-64
  - defaults
show_channel_urls: true
default_channels:
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/r
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/msys2
custom_channels:
  conda-forge: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  msys2: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  bioconda: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  menpo: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  pytorch: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  pytorch-lts: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  simpleitk: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud


remote_read_timeout_secs: 1000.0
ssl_verify: false

conda commands

conda create -n xxx python=3.9
conda env remove -n xxx
conda deactivate
conda init

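PyTorch installation

Check the hardware: run dxdiag and inspect the CPU and GPU.

Check the driver version and the CUDA version.

First determine the torch version and the CUDA version you need.

Then generate the install command: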

conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c nvidia
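After the install finishes, a quick check from Python can confirm which build was picked up. The following is a minimal sketch; the helper name describe_torch_install is hypothetical, and it returns a small dictionary instead of failing when torch is not installed.

```python
import importlib.util


def describe_torch_install() -> dict:
    """Report whether torch is importable and, if so, its version and CUDA status."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}

    import torch

    return {
        "installed": True,
        "version": torch.__version__,
        # None on CPU-only builds
        "cuda_build": torch.version.cuda,
        # False when the driver or GPU cannot be used
        "cuda_available": torch.cuda.is_available(),
    }


info = describe_torch_install()
print(info)
```

If cuda_available is False on a machine with an NVIDIA GPU, the driver version and the CUDA version of the installed build are the first things to compare.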

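Using fake-useragent==1.5.1

from fake_useragent import UserAgent

# Mobile user agents: Android and iOS browsers, version 120 or newer
ua = UserAgent(
    browsers=['firefox', 'chrome', 'safari'],
    os=["android", "ios"],
    min_version=120.0,
    platforms=["mobile"]
)

# Desktop user agents: Windows and macOS browsers, version 120 or newer
ua = UserAgent(
    browsers=["firefox", "chrome", "safari"],
    os=["windows", "macos"],
    min_version=120.0,
    platforms=["pc"],
)

# Each access returns a random user-agent string that matches the filters
ua.random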

Difference between two lists

list(set(account_shop_lst).difference(set(data_shop_lst)))
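The set-based version discards ordering and duplicates. The sketch below contrasts it with an order-preserving alternative; the shop names are made-up sample data.

```python
# Made-up sample data for illustration
account_shop_lst = ["shop_a", "shop_b", "shop_c", "shop_d", "shop_b"]
data_shop_lst = ["shop_b", "shop_d"]

# Set difference: fast, but the result has no guaranteed order
unordered = set(account_shop_lst).difference(data_shop_lst)

# Order-preserving variant: keep a set only for the membership test
exclude = set(data_shop_lst)
ordered = [shop for shop in account_shop_lst if shop not in exclude]

print(sorted(unordered))
print(ordered)
```

Note that difference() accepts any iterable, so wrapping the second list in set() is optional.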

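Replacing a target group with re.sub

import re

def _new_base_info(text: str, new_str: str) -> str:
    # Capture the key and the closing quote so that only the cookie value changes
    pattern = r'(cookie":").*?(")'
    # A function replacement keeps any backslashes in new_str literal,
    # and the substitution runs on the text argument itself
    return re.sub(pattern, lambda m: m.group(1) + new_str + m.group(2), text)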
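A more general way to express "replace only one group" is to rebuild each match around the span of that group. The sketch below assumes a hypothetical helper named replace_group and a made-up JSON-like sample string.

```python
import re


def replace_group(pattern: str, text: str, new_value: str, group: int = 1) -> str:
    """Replace only the span matched by `group`, leaving the rest of each match as-is."""

    def _swap(m: re.Match) -> str:
        whole = m.group(0)
        # Convert the group's absolute offsets to offsets inside the match
        start = m.start(group) - m.start(0)
        end = m.end(group) - m.start(0)
        return whole[:start] + new_value + whole[end:]

    return re.sub(pattern, _swap, text)


# Made-up sample data for illustration
sample = '{"id":"42","cookie":"old=1; sid=abc","ua":"x"}'
updated = replace_group(r'"cookie":"(.*?)"', sample, "new=2")
print(updated)
```

Because the replacement is built inside a function, the new value is inserted literally, so backslashes or sequences such as \1 in it are not interpreted by re.sub.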

Converting 万 (ten thousand) to a number

if "万" in item["soldCount"]:
    # float() instead of eval(): the value comes from scraped text
    item["soldCount"] = round(float(item["soldCount"].replace("万", "")) * 10000)
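The same conversion can be wrapped in a small function so that plain numeric strings pass through as well. This is a minimal sketch: the name parse_count and the UNITS table are hypothetical, and Decimal is used here to avoid binary floating-point rounding.

```python
from decimal import Decimal

# Suffix multipliers; only 万 appears in the text above, extend as needed
UNITS = {"万": 10_000}


def parse_count(raw: str) -> int:
    """Convert strings such as '1.2万' or '350' to an integer count."""
    raw = raw.strip()
    for suffix, factor in UNITS.items():
        if raw.endswith(suffix):
            # Decimal keeps 1.15 * 10000 exact instead of 11499.999...
            return int(Decimal(raw[: -len(suffix)]) * factor)
    return int(raw)


print(parse_count("1.2万"))
print(parse_count("350"))
```

Strings with extra decorations (for example a trailing plus sign) would raise an exception here and need their own cleanup step.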

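Python project structure

Besides the utils and init folders, many other files and folders are common in a project layout:

main.py or app.py:
- The main entry point or application start-up file.
models/:
- Business logic models, database models, or data structure definitions.
views/:
- For web applications, the view logic or view templates.
controllers/ or controllers.py:
- The controller layer in an MVC architecture; handles user input and calls the business logic.
services/:
- Service-layer code that implements the core business logic of the application.
repositories/:
- The data access layer; handles interaction with the database or other data stores.
migrations/:
- Database migration scripts.
tests/:
- Unit tests, integration tests, and other test code.
config/:
- Configuration files and environment-specific settings.
scripts/:
- Project-related scripts, such as installation and deployment scripts.
static/:
- For web applications, static files such as CSS, JavaScript, and images.
templates/:
- For web applications, the template files.
docs/ or documentation/:
- Project documentation, such as user manuals and developer docs.
requirements.txt:
- Lists the Python libraries the project depends on.
setup.py:
- Used to install and distribute the project.
README.md:
- The project description file; usually contains an overview, installation guide, and usage instructions.
.gitignore:
- Specifies the files and folders that Git should ignore.
.env:
- Holds environment variable settings.
pytest.ini or tox.ini:
- Configures the test framework.
Dockerfile:
- Defines the Docker configuration for containerized deployment.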
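A layout like the one listed above can be scaffolded with pathlib. The sketch below is only an illustration: the function name scaffold and the chosen subset of folders and files are assumptions, and it writes into a temporary directory so nothing is left behind.

```python
import tempfile
from pathlib import Path

# Hypothetical subset of the layout; adjust to the project at hand
DIRS = ["models", "services", "repositories", "tests", "config", "scripts", "docs"]
FILES = ["main.py", "requirements.txt", "README.md", ".gitignore"]


def scaffold(root: Path) -> list:
    """Create empty folders and files under root and return their names, sorted."""
    root.mkdir(parents=True, exist_ok=True)
    for name in DIRS:
        (root / name).mkdir(exist_ok=True)
    for name in FILES:
        (root / name).touch(exist_ok=True)
    return sorted(p.name for p in root.iterdir())


with tempfile.TemporaryDirectory() as tmp:
    created = scaffold(Path(tmp) / "demo_project")
    print(created)
```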

General-purpose gap recognition: captcha-recognizer

https://github.com/chenwei-zhao/captcha-recognizer

from captcha_recognizer.recognizer import Recognizer

# Pass in the full image that contains the gap; returns the box coordinates and a confidence score
recognizer = Recognizer()
box, confidence = recognizer.identify_gap(source=img_name, verbose=False)

Requires Python 3.8+ and torch 1.8+.

PaddlePaddle PaddleOCR

https://github.com/PaddlePaddle/PaddleOCR

Quick start: https://paddlepaddle.github.io/PaddleOCR/latest/quick_start.html

Install:

pip install paddlepaddle
pip install paddleocr

Using ddddocr

Object detection (det mode)

import ddddocr

ocr = ddddocr.DdddOcr(det=True, show_ad=False)
image_file = "big.png"
with open(image_file, "rb") as f:
    image = f.read()
# Each entry in poses is a bounding box [x1, y1, x2, y2]
poses = ocr.detection(image)

Drawing the boxes

import cv2

im = cv2.imread(image_file)
for box in poses:
    x1, y1, x2, y2 = box
    # Red rectangle (BGR color order), 2 px thick
    im = cv2.rectangle(im, (x1, y1), (x2, y2), color=(0, 0, 255), thickness=2)
cv2.imwrite("result_" + image_file, im)

Cropping

from PIL import Image

img = Image.open("big.png")
for num, box in enumerate(poses):
    # Save each detected region as 0.png, 1.png, ...
    cap = img.crop(box)
    cap.save(f"{num}.png")
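The crops above are numbered in whatever order the boxes arrive. If the file numbers should follow reading order, the boxes can be sorted first. This is a sketch under the assumption that the returned order is not guaranteed to be left to right (worth checking on your own images); the helper name and the sample boxes are made up.

```python
def sort_boxes_left_to_right(boxes):
    """Order [x1, y1, x2, y2] boxes by their left edge, then by their top edge."""
    return sorted(boxes, key=lambda box: (box[0], box[1]))


# Made-up sample boxes for illustration
sample_boxes = [[120, 10, 160, 50], [10, 12, 50, 52], [65, 8, 105, 48]]
ordered_boxes = sort_boxes_left_to_right(sample_boxes)
print(ordered_boxes)
```

Sorting before the enumerate loop means 0.png is the leftmost region, 1.png the next one, and so on.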

Recognizing characters

ocr = ddddocr.DdddOcr(show_ad=False)
for num in range(5):  # assumes five crops, 0.png to 4.png
    num_img = f"{num}.png"
    with open(num_img, "rb") as f:
        _bytes = f.read()
    result = ocr.classification(_bytes)
    print(num_img, result)

Locating the gap

Method 1

def get_distance_by_slide(small_name, big_name):
    """
    Match the slider piece against the background image that contains the gap.
    """
    slide = ddddocr.DdddOcr(det=False, ocr=False, show_ad=False)
    target_bytes = open(small_name, "rb").read()
    background_bytes = open(big_name, "rb").read()
    res = slide.slide_match(target_bytes, background_bytes, simple_target=True)
    return res["target"][0]

Method 2

def get_distance_by_two_img(full_img_name, cut_img_name):
    """
    Compare the complete image with the image that contains the gap.
    """
    slide = ddddocr.DdddOcr(det=False, ocr=False, show_ad=False)
    target_bytes = open(cut_img_name, "rb").read()
    background_bytes = open(full_img_name, "rb").read()
    res = slide.slide_comparison(target_bytes, background_bytes)
    return res["target"][0]

Picking the shorter of two lists

min_lst = min(ori_lst, des_lst, key=len)
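A short sketch of how min with key=len behaves, including the tie case; the list contents are made-up sample data.

```python
# Made-up sample data for illustration
ori_lst = [1, 2, 3]
des_lst = [4, 5]

# key=len compares the lists by length instead of by their contents
min_lst = min(ori_lst, des_lst, key=len)
max_lst = max(ori_lst, des_lst, key=len)

# With equal lengths, min() returns the first argument
tie = min([1, 2], [3, 4], key=len)

print(min_lst, max_lst, tie)
```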

Importing a module in a JS file

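// a.js, in the same directory as b.js
const b = require('./b.js');
b.encrypt("your data");  // call the encrypt function from b.js with an argument

b.js must explicitly export what it provides; there are two ways to do this: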

// b.js
function encrypt(data) {
  // function logic
}

module.exports = {
  encrypt: encrypt
};

// b.js
exports.encrypt = function(data) {
  // function logic
};