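A Complete Guide to Reverse Engineering and Open-Source Re-implementation in the AI Era

A general methodology, using Open Computer Use as the case study

A summary of hands-on lessons from re-implementing Codex Computer Use as open source. It applies to any similar "analyze → reverse → re-implement → open-source" project.

Project Background and Problem Definition
Technology Stack Overview
Stage 1: Reconnaissance and Information Gathering
Stage 2: Static Analysis and Reverse Engineering
Stage 3: Dynamic Analysis and Traffic Capture
Stage 4: Getting Past the Process Signature Restriction
Stage 5: Implementing the Core Functionality
Stage 6: Comparison Testing and the Eval Loop
Stage 7: Productization and Release
Stage 8: Nailing the Visual Details (Reverse Engineering the Cursor Animation)
General Methodology Summary
A Complete SOP for Similar Projects in the Future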

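1. Project Background and Problem Definition

1.1 What Is Computer Use

Computer Use is a technique that lets an AI agent operate a computer's interface (mouse, keyboard, applications) on its own to complete tasks. Traditional implementations take one of two forms:

Connectors mode: call the APIs of applications such as Gmail and Slack directly, without touching the interface
GUI mode: simulate mouse and keyboard events and operate directly on the screen

What OpenAI Codex adds is Background Computer Use, i.e. non-preemptive operation. Earlier GUI-based approaches generally had to "take over" the screen, so the user could not use the computer at the same time. Codex's approach lets the AI operate in the background while the user keeps working, largely undisturbed.

1.2 Breaking Down the Goal

Before starting, the team first pinned down exactly what was to be re-implemented:

Original goal: re-implement the "non-preemptive" core capability of Codex Computer Use
Broken down into:
  ├── Functional goals (must have)
  │   ├── Interact with the UI in the background
  │   ├── Return screenshots for multimodal reasoning
  │   └── Expose an MCP service (9 tools)
  └── Experience goals (nice to have)
      ├── Lively mouse-cursor animation
      ├── Floating permission-request window
      └── One-step install/release

Key decision: functionality first, experience second. Proceed in stages and do not let visual details block the main flow.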

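2. Technology Stack Overview

2.1 Analysis Toolchain

Tool	Purpose	How it works
`file` / `strings`	Basic binary information	Reads the ELF/Mach-O header and printable strings
`class-dump` / `nm`	Swift/ObjC symbol export	Extracts class and method names from the binary's symbol table
`Hopper Disassembler`	Assembly-level reverse engineering	Disassembles machine code and decompiles it to pseudocode
`mitmdump`	HTTPS man-in-the-middle capture	Intercepts traffic as a proxy and decrypts TLS
`mitmproxy` / `mitmweb`	Interactive traffic inspection	Console UI and web UI counterparts of mitmdump
`Codex AI`	AI-assisted binary analysis	Multimodal understanding of decompiler output

2.2 Development Stack

Layer	Technology	Purpose
Core implementation	Swift	Native macOS, access to the AX API
Service wrapper	MCP (Model Context Protocol)	Exposes a standardized tool interface
Signature workaround	Go	A CLI launcher that runs under Codex.app's signature
Packaging and release	npm / Node.js	`npm i -g open-computer-use`
Animation algorithm	Swift + Bézier curves	Computes the cursor's motion path
Media processing	ffmpeg / ImageMagick	Video frame extraction, image processing

2.3 How AI Tools Were Used

Codex (workhorse)
  ├── Analyze binaries
  ├── Run several sessions in parallel, each with its own task
  ├── Configure and run traffic capture
  └── Generate and verify code

Grok
  └── Dig up relevant technical leads on Twitter/X

Claude / GPT-4o
  └── Multimodal analysis (understanding screenshots and video frames)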

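3. Stage 1: Reconnaissance and Information Gathering

3.1 Goal: Find an Entry Point You Can Analyze

Before reverse engineering any closed system, the first step is always to find the actual files.

Steps

# 1. Locate the Computer Use plugin shipped with the Codex app
ls -la ~/.codex/plugins/cache/openai-bundled/computer-use/

# 2. Look at the directory layout
find ~/.codex/plugins/cache/openai-bundled/computer-use/ -type f | head -50

# 3. Check the bundle size to estimate the analysis effort
#    (use $HOME here: ~ is not expanded inside double quotes)
du -sh "$HOME/.codex/plugins/cache/openai-bundled/computer-use/1.0.750/Codex Computer Use.app"
# → 26.5MB, a manageable amount of work

# 4. Inspect the .app layout (a macOS app bundle is really a directory)
cd "$HOME/.codex/plugins/cache/openai-bundled/computer-use/1.0.750"
ls -la "Codex Computer Use.app/Contents/"
# → MacOS/ (executables), Frameworks/, Resources/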

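Key Findings

The whole feature is a standalone macOS app bundle, 26.5MB
The core executable is SkyComputerUseClient
It exposes its service over MCP (which means the interface is standardized)

Why This Step Matters

Finding the actual files means finding the "battlefield". Without files, every later analysis step is empty talk. File size determines the cost of the analysis, and file type determines the method.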
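
The size and file-type survey described above can also be scripted. The following is a minimal Python sketch, not part of the original workflow: `survey_bundle` is a hypothetical helper name, and it identifies Mach-O files only by their first four bytes, which is a rough heuristic (the fat-binary magic `0xCAFEBABE` is shared with Java class files).

```python
import os
import tempfile

# Magic numbers for thin Mach-O files (32/64-bit, both byte orders) and fat/universal binaries.
MACHO_MAGICS = {
    b"\xfe\xed\xfa\xce", b"\xce\xfa\xed\xfe",
    b"\xfe\xed\xfa\xcf", b"\xcf\xfa\xed\xfe",
    b"\xca\xfe\xba\xbe",
}


def survey_bundle(root):
    """Return (total_bytes, sorted relative paths of files that look like Mach-O)."""
    total = 0
    binaries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks so nothing is counted twice
            total += os.path.getsize(path)
            with open(path, "rb") as handle:
                if handle.read(4) in MACHO_MAGICS:
                    binaries.append(os.path.relpath(path, root))
    return total, sorted(binaries)
```

Pointing `survey_bundle` at the `.app` directory gives a quick answer to "how much is there to analyze, and which files deserve a disassembler".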

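3.2 Collecting Public Information

Before starting the analysis, collect all the publicly available "free intelligence":

Public information sources:
├── The official OpenAI blog (feature descriptions, screenshots)
├── Tweets from Software.inc team members on Twitter/X
├── App Store / release notes
├── The MCP specification (modelcontextprotocol.io)
└── Related open-source projects on GitHub

In practice: have Grok search the tweets from Software.inc and Ari, and infer technical keywords from the replies.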

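4. Stage 2: Static Analysis and Reverse Engineering

4.1 Extracting the Symbol Table (Swift Binary)

A compiled Swift binary still carries a lot of symbol information, so you can recover the class structure without running the program.

# Extract all symbols (function names, class names, protocols)
nm "Codex Computer Use.app/Contents/MacOS/SkyComputerUseClient" | grep -i "computer\|screen\|mouse\|cursor"

# Use class-dump to extract class definitions
# (it targets Objective-C metadata, so pure-Swift types may be missing or only partly shown)
class-dump "Codex Computer Use.app/Contents/MacOS/SkyComputerUseClient" > symbols.txt

# strings extracts readable strings (reveals API endpoints, error messages, and so on)
strings "Codex Computer Use.app/Contents/MacOS/SkyComputerUseClient" | grep -E "mcp|tool|accessibility|screen"

Key Findings

Key leads extracted from the symbol table:

- AXUIElement (related to the macOS Accessibility API)
- CGScreenCapture (screenshot API)
- SkyComputerUseClient (main class name)
- MCP tool related: 9 tool names in total
- osascript (AppleScript fallback)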
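
To make the keyword triage repeatable, the `strings | grep` step can be approximated in a few lines of Python. This is an illustrative sketch under simple assumptions: `printable_strings` mimics `strings` for ASCII only, and `group_by_keyword` is a hypothetical helper that buckets hits by lowercase substring.

```python
import re


def printable_strings(data: bytes, min_len: int = 4):
    """Return runs of printable ASCII at least min_len long, similar to `strings`."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


def group_by_keyword(strings, keywords):
    """Bucket strings under every keyword they contain (case-insensitive)."""
    groups = {keyword: [] for keyword in keywords}
    for text in strings:
        lowered = text.lower()
        for keyword in keywords:
            if keyword in lowered:
                groups[keyword].append(text)
    return groups
```

Reading the binary with `open(path, "rb").read()` and grouping by keywords such as `axui`, `mcp`, and `osascript` yields a small report that can be handed to an AI session as context.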

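4.2 AI-Assisted Analysis of Decompiler Output

Hand the decompiler screenshots or text straight to Codex/Claude:

Prompt template:
"This is a code fragment decompiled from [binary name]. Please analyze:
1. What is the core function of this code?
2. What are the key data structures and interface definitions?
3. How does it interact with [target system]?
Please organize the analysis into a document."

The architecture that was actually found:

SkyComputerUseClient
├── MCP Server Layer (external: 9 standardized tools)
│   ├── tool_1: screenshot (capture the screen)
│   ├── tool_2: click (click)
│   ├── tool_3: type_text (enter text)
│   ├── tool_4: scroll (scroll)
│   ├── tool_5: find_element (find a UI element)
│   ├── tool_6: get_ui_tree (get the AX tree)
│   ├── tool_7: run_applescript (run AppleScript)
│   ├── tool_8: key_press (press a key)
│   └── tool_9: wait (wait)
└── Core Interaction Layer (internal: three ways to operate the UI)
    ├── AX API (preferred, works in the background)
    ├── osascript (fallback when AX fails)
    └── CGEvent (mouse events, last resort)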

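4.3 Core Principle: the macOS Accessibility API

This is the technical foundation of the whole "non-preemptive" approach and has to be understood thoroughly.

What Is the Accessibility API (AX API)

To support users with visual impairments, macOS provides a set of interfaces, built around AXUIElement, for reading and operating UI elements programmatically.

// Core capabilities of the AX API
import ApplicationServices

// 1. Get an application's AX tree (its UI element tree); pid is the target process ID
let appRef = AXUIElementCreateApplication(pid)

// 2. Look up specific UI elements (by title, role, and so on)
var value: CFTypeRef?
AXUIElementCopyAttributeValue(appRef, kAXWindowsAttribute as CFString, &value)

// 3. Press a button in the background (the window does not need to be frontmost!)
AXUIElementPerformAction(buttonRef, kAXPressAction as CFString)

// 4. Enter text in the background
AXUIElementSetAttributeValue(fieldRef, kAXValueAttribute as CFString, "Hello" as CFTypeRef)

Key property: AX API operations generally do not require the target window to be frontmost, which is the root reason "non-preemptive" operation is possible. Some apps expose incomplete AX support, which is why fallbacks are needed.

The Three-Tier Fallback Strategy: AX → osascript → CGEvent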

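func performClick(element: AXUIElement?, fallbackScript: String, at point: CGPoint) {
    // Tier 1: AX API (most precise, works in the background)
    if let element = element,
       AXUIElementPerformAction(element, kAXPressAction as CFString) == .success {
        return
    }

    // Tier 2: AppleScript (good compatibility), e.g. a System Events "click button" script;
    // stop here if the script ran without error
    if let script = NSAppleScript(source: fallbackScript) {
        var scriptError: NSDictionary?
        _ = script.executeAndReturnError(&scriptError)
        if scriptError == nil { return }
    }

    // Last resort: synthetic CGEvent click (takes over the screen); a click needs both down and up
    for type in [CGEventType.leftMouseDown, .leftMouseUp] {
        CGEvent(mouseEventSource: nil, mouseType: type,
                mouseCursorPosition: point, mouseButton: .left)?.post(tap: .cghidEventTap)
    }
}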
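
The fallback idea itself is independent of macOS. As a language-neutral illustration, here is a small Python sketch of an ordered fallback chain; `run_with_fallback` and the strategy names are hypothetical and only model the control flow, not the AX, AppleScript, or CGEvent calls.

```python
def run_with_fallback(strategies):
    """Try (name, callable) pairs in order and return (name, result) of the first success.

    A strategy signals failure by raising an exception; if every strategy
    fails, a RuntimeError listing all collected errors is raised.
    """
    errors = []
    for name, action in strategies:
        try:
            return name, action()
        except Exception as exc:  # any failure means "try the next tier"
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all strategies failed: " + "; ".join(errors))
```

Returning the name of the tier that succeeded is a deliberate choice: logging it shows how often the background-safe path actually works versus the screen-grabbing last resort.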

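5. Stage 3: Dynamic Analysis and Traffic Capture

5.1 Why Capture Traffic

Static analysis tells you "what is there", but the exact parameter definitions of the tools (their JSON Schema) only become visible in a real call. Rewriting them by hand makes it very hard to match the original 100%.

Goal: capture the following directly from Codex's real network requests:

The full system prompt
The exact JSON Schema definitions of the 9 tools
The actual request/response format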

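5.2 Configuring mitmdump for HTTPS Man-in-the-Middle Capture

Installation and Basic Setup

# Install mitmproxy (includes mitmdump)
pip install mitmproxy
# or
brew install mitmproxy

# Start mitmdump on port 8080 and save the flows to a file
mitmdump -p 8080 --save-stream-file capture.mitm

# Watch traffic live (web UI)
mitmweb -p 8080

Installing the mitmproxy Root Certificate (Trusting HTTPS Decryption)

# 1. With mitmproxy running and the proxy configured, open mitm.it in a browser
# 2. Download the certificate for your platform
# 3. Mark the certificate as trusted in the macOS Keychain

# Or install it from the command line (needs admin rights)
sudo security add-trusted-cert -d -r trustRoot -k /Library/Keychains/System.keychain ~/.mitmproxy/mitmproxy-ca-cert.pem

Routing Codex Through the Proxy

# Option 1: system proxy (recommended)
# System Settings → Network → Details → Proxies → HTTP/HTTPS proxy → 127.0.0.1:8080

# Option 2: environment variables
export HTTP_PROXY=http://127.0.0.1:8080
export HTTPS_PROXY=http://127.0.0.1:8080

# Option 3: let Codex configure it itself (self-referential)
# Just tell Codex: "Please configure mitmdump for me and start capturing"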

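5.3 Self-Referential Capture: Have Codex Call Itself While Capturing

This is the most elegant move in the whole process:

User → tells Codex: "Call your computer-use plugin to run a screenshot task, and capture the traffic with mitmdump"
   ↓
Codex sets up the proxy environment
   ↓
Codex calls its own computer-use MCP tools
   ↓
mitmdump captures all HTTPS requests
   ↓
The user obtains the complete tool definitions and the system prompt

Captured Data Structure (illustrative example)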

{
  "model": "gpt-4o",
  "tools": [
    {
      "name": "screenshot",
      "description": "Capture a screenshot of the current screen state...",
      "input_schema": {
        "type": "object",
        "properties": {
          "app_name": {
            "type": "string",
            "description": "Target application name"
          }
        }
      }
    }
    // ... exact definitions of the other 8 tools
  ],
  "system": "You are a computer use agent capable of..."
}
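
Once a payload like this has been captured, a short script can turn it into a per-tool parameter summary for the re-implementation to match. The sketch below is an assumption-laden illustration: `summarize_tools` is a hypothetical helper, and it accepts either `input_schema` or `parameters` as the schema key because the exact field name depends on which API the client speaks.

```python
import json


def summarize_tools(payload: dict) -> dict:
    """Map each tool name to {parameter: "required" | "optional"}."""
    summary = {}
    for tool in payload.get("tools", []):
        # The schema key differs between APIs, so check the common spellings.
        schema = tool.get("input_schema") or tool.get("parameters") or {}
        required = set(schema.get("required", []))
        summary[tool["name"]] = {
            name: ("required" if name in required else "optional")
            for name in schema.get("properties", {})
        }
    return summary
```

Writing the result out with `json.dumps(summary, indent=2)` gives a compact checklist that both a human reviewer and an AI session can compare against the open-source tool definitions.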

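5.4 Writing a mitmdump Filter Script

# filter.py - capture only computer-use related requests
import json
from mitmproxy import http

captured = 0  # running counter so every request gets its own file

def request(flow: http.HTTPFlow) -> None:
    global captured
    url = flow.request.pretty_url
    if "computer-use" not in url and "api.openai.com" not in url:
        return
    if not flow.request.content:
        return
    try:
        data = json.loads(flow.request.content)
    except ValueError:
        return  # skip bodies that are not JSON
    captured += 1
    # Save the request body, one file per request so earlier captures are not overwritten
    with open(f"captured_{captured:03d}.json", "w") as f:
        json.dump(data, f, indent=2)
    print(f"[+] Captured request to {url}")

# Run: mitmdump -s filter.py -p 8080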

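6. Stage 4: Getting Past the Process Signature Restriction

6.1 Discovering the Problem

# Try to connect to the official plugin directly with an MCP client
npx @modelcontextprotocol/inspector "Codex Computer Use.app/Contents/MacOS/SkyComputerUseClient"

# Result: the process is killed immediately
# Error: Process terminated with signal 9 (SIGKILL)

6.2 Diagnosing the Cause

# Look at the process logs
log show --predicate 'process == "SkyComputerUseClient"' --last 5m

# Analyze the crash report
cat ~/Library/Logs/DiagnosticReports/SkyComputerUseClient*.crash | grep -A 20 "Exception"

# Key finding: the process uses SecCodeCopyGuestWithAttributes to verify the parent process's signature
# Only a parent process signed as Codex.app may launch it

6.3 Solution: a Launcher That Runs Under the Codex Signature

Idea: write a Go program and run it from within Codex.app's process context, so that the plugin's signature check sees a validly signed Codex process as its launcher.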

// launcher.go
package main

import (
    "fmt"
    "os"
    "os/exec"
    "path/filepath"
)

func main() {
    // Locate SkyComputerUseClient
    pluginPath := filepath.Join(
        os.Getenv("HOME"),
        ".codex/plugins/cache/openai-bundled/computer-use/1.0.750",
        "Codex Computer Use.app/Contents/MacOS/SkyComputerUseClient",
    )
    
    // Start the process with stdio passed through; the signing context comes from whoever launched this launcher
    cmd := exec.Command(pluginPath)
    cmd.Stdin = os.Stdin
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr
    
    if err := cmd.Run(); err != nil {
        fmt.Fprintf(os.Stderr, "Failed to launch: %v\n", err)
        os.Exit(1)
    }
}

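# Build
go build -o codex-launcher launcher.go

# Key point: the launcher has to run under Codex.app's signing context.
# codesign cannot put OpenAI's signature on your own binary,
# so the practical route is to start the launcher from inside the Codex.app process.

# In practice: have Codex itself execute this launcher.
# Codex is a validly signed process, so a child started from it passes the plugin's check.

6.4 Verifying Success

# Call the official MCP server through the CLI
# (a spec-compliant MCP server may expect an initialize handshake before tools/list)
echo '{"jsonrpc":"2.0","method":"tools/list","id":1}' | ./codex-launcher

# Expected output: the full list of 9 tools
{
  "result": {
    "tools": [
      {"name": "screenshot", ...},
      // ...
    ]
  }
}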
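
For a probe that follows the MCP lifecycle more closely, the client sends `initialize`, then the `notifications/initialized` notification, and only then `tools/list`, each as one line of JSON on stdin. The Python sketch below builds those frames and parses a reply; the client name, client version, and the protocol version string `2024-11-05` are assumptions, so adjust them to whatever the server accepts.

```python
import json


def handshake_messages(client_name="probe", protocol_version="2024-11-05"):
    """Return the three messages a minimal MCP client sends before listing tools."""
    return [
        {"jsonrpc": "2.0", "id": 1, "method": "initialize",
         "params": {"protocolVersion": protocol_version,
                    "capabilities": {},
                    "clientInfo": {"name": client_name, "version": "0.0.1"}}},
        {"jsonrpc": "2.0", "method": "notifications/initialized"},  # notification: no id
        {"jsonrpc": "2.0", "id": 2, "method": "tools/list"},
    ]


def to_stdio_frames(messages):
    """Serialize messages as newline-delimited JSON for the stdio transport."""
    return "".join(json.dumps(m, separators=(",", ":")) + "\n" for m in messages)


def tool_names(response_line):
    """Extract tool names from one tools/list response line."""
    reply = json.loads(response_line)
    return [tool["name"] for tool in reply.get("result", {}).get("tools", [])]
```

The output of `to_stdio_frames(handshake_messages())` can be piped into the launcher in place of the single `echo` line, and each response line fed to `tool_names`.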

7. Stage 5: Implementing the Core Functionality

7.1 Project Structure (Starting from harness-template)

open-computer-use/
├── Sources/
│   └── ComputerUse/
│       ├── main.swift              # Entry point
│       ├── MCPServer.swift         # MCP protocol layer
│       ├── AccessibilityEngine.swift # AX API core
│       ├── ScreenCapture.swift     # Screenshot module
│       ├── AppleScriptFallback.swift # Fallback path
│       └── Tools/                  # Implementations of the 9 MCP tools
│           ├── ScreenshotTool.swift
│           ├── ClickTool.swift
│           ├── TypeTextTool.swift
│           └── ...
├── Package.swift
├── install.sh                      # One-step install script
└── docs/                           # Analysis documents the AI keeps producing (LLM wiki)
    ├── architecture.md
    ├── ax-api-reference.md
    └── mcp-tools-spec.md

7.2 MCP Server Implementation

MCP (Model Context Protocol) is a standardized tool-interface protocol introduced by Anthropic. AI models call external tools through MCP.

// MCPServer.swift
import Foundation

struct MCPServer {
    // Tool registry
    let tools: [String: MCPTool] = [
        "screenshot": ScreenshotTool(),
        "click": ClickTool(),
        "type_text": TypeTextTool(),
        "scroll": ScrollTool(),
        "find_element": FindElementTool(),
        "get_ui_tree": GetUITreeTool(),
        "run_applescript": RunAppleScriptTool(),
        "key_press": KeyPressTool(),
        "wait": WaitTool(),
    ]
    
    // Handle a JSON-RPC request
    func handle(request: JSONRPCRequest) -> JSONRPCResponse {
        switch request.method {
        case "tools/list":
            return listTools()
        case "tools/call":
            return callTool(request)
        default:
            return errorResponse("Unknown method")
        }
    }
    
    // Talk to the MCP client over stdio, one JSON message per line
    func run() {
        while let line = readLine() {
            guard let data = line.data(using: .utf8),
                  let request = try? JSONDecoder().decode(JSONRPCRequest.self, from: data) else {
                continue
            }
            let response = handle(request: request)
            // Encode defensively instead of crashing, then flush so the client sees the reply at once
            guard let responseJSON = try? JSONEncoder().encode(response),
                  let text = String(data: responseJSON, encoding: .utf8) else { continue }
            print(text)
            fflush(stdout)
        }
    }
}
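
The dispatch logic is easy to prototype outside Swift before committing to types. The following Python sketch mirrors `handle` under simplifying assumptions: tools are plain dicts with hypothetical `description`, `schema`, and `run` keys, every tool returns text, and only `tools/list` and `tools/call` are covered.

```python
import json


def _error(req_id, code, message):
    """Build a JSON-RPC error response."""
    return {"jsonrpc": "2.0", "id": req_id, "error": {"code": code, "message": message}}


def handle(request, tools):
    """Dispatch one JSON-RPC request; return a response dict, or None for notifications."""
    req_id = request.get("id")
    method = request.get("method")
    if req_id is None:
        return None  # notifications carry no id and get no reply
    if method == "tools/list":
        listing = [{"name": name, "description": tool["description"],
                    "inputSchema": tool["schema"]} for name, tool in tools.items()]
        return {"jsonrpc": "2.0", "id": req_id, "result": {"tools": listing}}
    if method == "tools/call":
        params = request.get("params", {})
        tool = tools.get(params.get("name"))
        if tool is None:
            return _error(req_id, -32602, f"unknown tool: {params.get('name')}")
        text = tool["run"](params.get("arguments", {}))
        return {"jsonrpc": "2.0", "id": req_id,
                "result": {"content": [{"type": "text", "text": text}], "isError": False}}
    return _error(req_id, -32601, f"method not found: {method}")
```

Unlike the Swift loop above, this version distinguishes notifications from requests and uses the standard JSON-RPC error codes, two details a real client is likely to depend on.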

7.3 Example Implementations of Core Tools

// ScreenshotTool.swift
import AppKit
import CoreGraphics

struct ScreenshotTool: MCPTool {
    var name = "screenshot"
    var description = "Capture a screenshot of the current screen or a specific application window"
    var inputSchema: JSONSchema = [
        "type": "object",
        "properties": [
            "app_name": ["type": "string", "description": "Target app name (optional)"]
        ]
    ]
    
    func execute(input: [String: Any]) async throws -> MCPToolResult {
        let appName = input["app_name"] as? String
        
        // Capture logic (captureApp and captureScreen are helpers defined elsewhere)
        let screenshot: NSImage
        if let app = appName {
            screenshot = try captureApp(named: app)  // capture a specific app
        } else {
            screenshot = try captureScreen()  // capture the full screen
        }
        
        // Encode as PNG before base64; tiffRepresentation alone would be TIFF data labeled as PNG
        guard let tiff = screenshot.tiffRepresentation,
              let bitmap = NSBitmapImageRep(data: tiff),
              let png = bitmap.representation(using: .png, properties: [:]) else {
            throw ScreenshotError.encodingFailed  // error type assumed to be defined elsewhere
        }
        
        return MCPToolResult(
            content: [["type": "image", "data": png.base64EncodedString(), "mimeType": "image/png"]]
        )
    }
}

// AccessibilityEngine.swift - core background-operation implementation
import ApplicationServices

class AccessibilityEngine {
    // Find a UI element
    func findElement(inApp pid: pid_t, matching query: ElementQuery) -> AXUIElement? {
        let appRef = AXUIElementCreateApplication(pid)
        return searchUITree(root: appRef, query: query)
    }
    
    // Search the UI tree recursively (depth-first)
    private func searchUITree(root: AXUIElement, query: ElementQuery) -> AXUIElement? {
        var children: CFTypeRef?
        AXUIElementCopyAttributeValue(root, kAXChildrenAttribute as CFString, &children)
        
        guard let childArray = children as? [AXUIElement] else { return nil }
        
        for child in childArray {
            if matches(element: child, query: query) { return child }
            if let found = searchUITree(root: child, query: query) { return found }
        }
        return nil
    }
    
    // Click in the background (the window does not need to be frontmost!)
    func click(element: AXUIElement) throws {
        let result = AXUIElementPerformAction(element, kAXPressAction as CFString)
        guard result == .success else {
            throw AccessibilityError.actionFailed(result)
        }
    }
    
    // Enter text in the background
    func typeText(_ text: String, into element: AXUIElement) throws {
        let result = AXUIElementSetAttributeValue(
            element,
            kAXValueAttribute as CFString,
            text as CFTypeRef
        )
        guard result == .success else {
            throw AccessibilityError.setValueFailed(result)
        }
    }
}
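
AX trees can be large, so an unbounded recursive search may be slow in complex apps. As an illustration of one way to bound it, here is a Python sketch that searches a tree of plain dicts breadth-first with a node budget; the `role`, `title`, and `children` keys and the `max_nodes` limit are assumptions chosen for the example, not properties of the AX API.

```python
from collections import deque


def find_element(root, role=None, title=None, max_nodes=10000):
    """Breadth-first search for the first node matching role and/or title.

    Stops after visiting max_nodes nodes and returns None if nothing matched.
    """
    queue = deque([root])
    visited = 0
    while queue and visited < max_nodes:
        node = queue.popleft()
        visited += 1
        role_ok = role is None or node.get("role") == role
        title_ok = title is None or node.get("title") == title
        if role_ok and title_ok:
            return node
        queue.extend(node.get("children", []))
    return None
```

Breadth-first order tends to find shallow controls such as toolbar buttons before descending into deep content views, and the node budget keeps a pathological tree from stalling a tool call.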

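8. Stage 6: Comparison Testing and the Eval Loop

8.1 Designing the Comparison Framework

Once you can call the official version, set up a strict comparison regime:

#!/bin/bash
# Test script: run the same task against the official and the open-source version
# (the codex flags below are illustrative; adjust them to your CLI)
TASK="Take a screenshot of the current page in Safari"

echo "=== Official version ==="
echo "$TASK" | codex --use-plugin=computer-use 2>&1 | tee official_output.json

echo "=== Open-source version ==="
echo "$TASK" | codex --mcp-server=open-computer-use 2>&1 | tee opensource_output.json

# Compare the differences
diff official_output.json opensource_output.json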

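8.2 Multi-Dimensional Comparison Metrics

Comparison dimensions:
├── Functionality (P0)
│   ├── Tool-call success rate
│   ├── Screenshot quality (resolution, completeness)
│   ├── UI element location accuracy
│   └── Text input accuracy
├── Performance (P1)
│   ├── Response latency
│   ├── Memory usage
│   └── CPU usage
└── Stability (P2)
    ├── Stability over long runs
    └── Recovery from errors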
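
A raw text diff of two agent runs is noisy, because model output varies between runs. A structured comparison of the P0 metrics is often more useful. The Python sketch below assumes each run has already been reduced to a list of `{"tool": ..., "ok": ...}` records; that record format and the `compare_runs` helper are hypothetical.

```python
def success_rate(run):
    """Fraction of tool calls that succeeded; 0.0 for an empty run."""
    return sum(1 for call in run if call["ok"]) / len(run) if run else 0.0


def compare_runs(official, candidate):
    """Compare two runs by tool-call success rate and by the sequence of tools used."""
    return {
        "official_success_rate": success_rate(official),
        "candidate_success_rate": success_rate(candidate),
        "same_tool_sequence": [c["tool"] for c in official] == [c["tool"] for c in candidate],
    }
```

Tracking these numbers per task over time turns the comparison into a regression suite: a drop in the candidate's success rate or a diverging tool sequence points at the change that caused it.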

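8.3 Dogfooding: Testing with Your Own Product

In the final stage, open-computer-use was used to develop open-computer-use itself. This is both a test and the most realistic form of validation:

Tasks carried out with open-computer-use:
- Open Xcode and edit a file
- Run the tests in Terminal
- Take screenshots to verify UI changes
- Create a Git commit

9. Stage 7: Productization and Release

9.1 Permission Request UI (Modeled on the Software.inc Approach)

macOS requires two permissions to be granted explicitly:

Accessibility (required for the AX API)
Screen Recording (required for screenshots)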
// PermissionWindow.swift - draggable floating window
import AppKit

class PermissionFloatingWindow: NSPanel {
    init() {
        super.init(
            contentRect: NSRect(x: 0, y: 0, width: 320, height: 200),
            styleMask: [.titled, .closable, .miniaturizable, .utilityWindow],
            backing: .buffered,
            defer: false
        )
        
        // Float above all other windows
        self.level = .floating
        self.isMovableByWindowBackground = true  // drag anywhere to move
        
        // Check permission status and guide the user to enable what is missing
        setupPermissionChecks()
    }
    
    func setupPermissionChecks() {
        // Check the Accessibility permission
        let axEnabled = AXIsProcessTrusted()
        
        // Check the Screen Recording permission
        let screenEnabled = CGPreflightScreenCaptureAccess()
        
        // Update the UI according to the status (updateUI is defined elsewhere)
        updateUI(ax: axEnabled, screen: screenEnabled)
    }
    
    @objc func openAccessibilitySettings() {
        NSWorkspace.shared.open(URL(string: "x-apple.systempreferences:com.apple.preference.security?Privacy_Accessibility")!)
    }
}

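9.2 Publishing to npm

// package.json
{
  "name": "open-computer-use",
  "version": "1.0.0",
  "description": "Open-source alternative to Codex Computer Use",
  "bin": {
    "open-computer-use": "./bin/open-computer-use.js"
  },
  "scripts": {
    "postinstall": "node scripts/install.js"
  }
}

#!/usr/bin/env node
// bin/open-computer-use.js (the shebang has to be the first line of the file)
const { spawn } = require('child_process');
const path = require('path');

// Locate the Swift-built binary (assumes the install step places it next to this script)
const binaryPath = path.join(__dirname, 'ComputerUse');

// Start the MCP server and pass stdio straight through to the client
const child = spawn(binaryPath, process.argv.slice(2), { stdio: 'inherit' });
child.on('exit', (code) => process.exit(code ?? 1));

// scripts/install.js - build the Swift code automatically at install time
const { execSync } = require('child_process');
const path = require('path');
// Package.swift lives in the package root, one level above scripts/
execSync('swift build -c release', { cwd: path.join(__dirname, '..'), stdio: 'inherit' });

# Publish
npm publish

# User installation
npm install -g open-computer-use

# Add it to the Codex MCP configuration (one command)
open-computer-use install-to-codex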

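9.3 One-Step Codex MCP Integration

// Register the server in ~/.codex/config.toml
// (the Codex CLI keeps MCP servers under [mcp_servers.<name>] tables)
const fs = require('fs');
const os = require('os');
const path = require('path');

const configPath = path.join(os.homedir(), '.codex', 'config.toml');
const existing = fs.existsSync(configPath) ? fs.readFileSync(configPath, 'utf8') : '';

// Append the entry only once
if (!existing.includes('[mcp_servers.open-computer-use]')) {
    const entry = '\n[mcp_servers.open-computer-use]\ncommand = "open-computer-use"\nargs = []\n';
    fs.appendFileSync(configPath, entry);
}
console.log('✅ open-computer-use added to the Codex MCP configuration');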

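9.4 Logo Design (AI All the Way)

Prompt given to Codex:
"Design a logo for a project called open-computer-use.
Theme: an AI operating the computer in the background, with the mouse cursor as the core element.
Style: clean, modern, geeky.
Format: SVG, with several options to choose from."

→ The AI outputs several SVGs
→ Convert formats with ffmpeg/ImageMagick
→ The AI reviews its own work (send it screenshots to confirm)
→ Pick the final design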

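10. Stage 8: Nailing the Visual Details (Reverse Engineering the Cursor Animation)

This is the most hardcore part. It shows how even a visual effect can be reverse engineered.

10.1 Video Frame Analysis

# Download the demo video from the Software.inc author
# Extract frames with ffmpeg (30 frames per second)
mkdir -p frames
ffmpeg -i demo_video.mp4 -vf fps=30 frames/frame_%04d.png

# Have Codex analyze the key frames
# Find the sequence of cursor positions and work backwards to the motion curve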
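
Once cursor positions have been read off the frames, the motion curve can be characterized numerically. The sketch below is a minimal Python illustration that assumes the positions are already available as `(x, y)` pixel tuples, one per frame; `speed_profile` and `peak_position` are hypothetical helper names.

```python
import math


def speed_profile(points, fps):
    """Return the cursor speed in pixels per second between consecutive frames."""
    return [math.dist(a, b) * fps for a, b in zip(points, points[1:])]


def peak_position(speeds):
    """Return where the top speed occurs, as a fraction from 0.0 (start) to 1.0 (end)."""
    if len(speeds) < 2:
        return 0.0
    return speeds.index(max(speeds)) / (len(speeds) - 1)
```

A peak near 0.5 suggests a symmetric ease-in-out, while an early peak followed by a long tail is more typical of spring-like or human-like deceleration. That narrows down which family of curves to try first.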

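10.2 Pinpointing Keywords and Searching the Literature

Keywords extracted from Twitter replies: calculates natural and aesthetic motion paths

Have the AI search for related material:
├── Bézier curve mouse paths
├── Fitts' Law, which models how human pointing movements behave
├── Critically Damped Spring Animation
├── Related paper: "Natural Mouse Trajectory Simulation"
└── Open-source implementations: human-cursor, naturalmouser

10.3 Extracting the Algorithm from the Binary

When neither the AI nor the papers are precise enough, reverse the binary directly:

# Decompile the cursor-animation functions with Hopper/IDA
# Search for keywords in function names
strings SkyComputerUseClient | grep -i "cursor\|animate\|bezier\|easing"

# Locate the core function in the decompiler
# Have the AI analyze the assembly/pseudocode and reconstruct the algorithm

AI prompt:
"This is the cursor-animation function decompiled from SkyComputerUseClient.
Please analyze how the algorithm works and re-implement it in Swift:
[paste the decompiled pseudocode]"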

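10.4 Implementing the Cursor Path Algorithm

// CursorAnimator.swift
import CoreGraphics
import Foundation
import QuartzCore

class NaturalCursorAnimator {
    // Cubic Bézier path
    func generatePath(from start: CGPoint, to end: CGPoint) -> [CGPoint] {
        // Place natural-looking control points between the start and end points
        let midX = (start.x + end.x) / 2
        
        // Add a random offset so the curve bends slightly differently each time
        let offset = CGFloat.random(in: -20...20)
        let control1 = CGPoint(x: midX + offset, y: start.y + offset * 0.5)
        let control2 = CGPoint(x: midX - offset, y: end.y - offset * 0.5)
        
        // Sample points along the Bézier curve (sampleBezierCurve and distance are helpers defined elsewhere)
        return sampleBezierCurve(
            p0: start, p1: control1, p2: control2, p3: end,
            steps: max(2, Int(distance(start, end) / 5))  // longer moves get more samples, never fewer than 2
        )
    }
    
    // Speed curve: accelerate, then decelerate (ease-in-out, similar to human motion)
    func easeInOutCubic(_ t: CGFloat) -> CGFloat {
        if t < 0.5 {
            return 4 * t * t * t
        } else {
            let f = 2 * t - 2
            return 0.5 * f * f * f + 1
        }
    }
    
    // Move a virtual cursor along the path (the real mouse is not affected)
    func animateCursor(along path: [CGPoint], completion: @escaping () -> Void) {
        guard path.count > 1 else { completion(); return }
        let totalFrames = path.count  // one frame per sample keeps duration proportional to distance
        var frame = 0
        Timer.scheduledTimer(withTimeInterval: 1.0/60.0, repeats: true) { timer in
            guard frame <= totalFrames else {
                timer.invalidate()
                completion()
                return
            }
            
            // Map linear time to eased progress and use it to pick the position,
            // so the easing drives the movement itself and not only the opacity
            let t = CGFloat(frame) / CGFloat(totalFrames)
            let easedT = self.easeInOutCubic(t)
            let index = min(Int(easedT * CGFloat(path.count - 1)), path.count - 1)
            
            // Update the virtual cursor (drawn in an overlay window); fade out over the last 10%
            self.updateVirtualCursor(to: path[index], opacity: easedT > 0.9 ? 1 - (easedT - 0.9) * 10 : 1)
            frame += 1
        }
    }
}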
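
The math behind the path can be checked in isolation. The Python sketch below evaluates the standard cubic Bézier formula and applies ease-in-out cubic easing to the curve parameter; it stands in for the undefined `sampleBezierCurve` helper only as an illustration, and the function names and the `frames` parameter are assumptions.

```python
def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bézier curve at t in [0, 1]; points are (x, y) tuples."""
    u = 1 - t
    x = u**3 * p0[0] + 3 * u * u * t * p1[0] + 3 * u * t * t * p2[0] + t**3 * p3[0]
    y = u**3 * p0[1] + 3 * u * u * t * p1[1] + 3 * u * t * t * p2[1] + t**3 * p3[1]
    return (x, y)


def ease_in_out_cubic(t):
    """Slow start, fast middle, slow end; maps 0 to 0 and 1 to 1."""
    return 4 * t**3 if t < 0.5 else 1 - (-2 * t + 2) ** 3 / 2


def eased_path(p0, p1, p2, p3, frames):
    """Return one point per frame, with easing applied to the curve parameter."""
    if frames < 2:
        raise ValueError("frames must be at least 2")
    return [cubic_bezier(p0, p1, p2, p3, ease_in_out_cubic(i / (frames - 1)))
            for i in range(frames)]
```

Because the easing is applied to the parameter and the frames are evenly spaced in time, points cluster near both ends of the path, which is what makes the cursor appear to accelerate and then settle.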

11. General Methodology Summary

11.1 A Meta-Framework for Solving Problems

┌─────────────────────────────────────────────────────┐
│                 Problem Definition                  │
│   What (the goal) + Why (why it is feasible)        │
└─────────────────────┬───────────────────────────────┘
                      ↓
┌─────────────────────────────────────────────────────┐
│                Information Gathering                │
│   Public intel + static analysis + traffic capture  │
└─────────────────────┬───────────────────────────────┘
                      ↓
┌─────────────────────────────────────────────────────┐
│              Decompose and Parallelize              │
│   Independent modules, parallel AI sessions         │
└─────────────────────┬───────────────────────────────┘
                      ↓
┌─────────────────────────────────────────────────────┐
│                Implement and Verify                 │
│   Minimal version → comparison tests → dogfooding   │
└─────────────────────┬───────────────────────────────┘
                      ↓
┌─────────────────────────────────────────────────────┐
│               Productize and Release                │
│   Package → publish → screen recording → open source│
└─────────────────────────────────────────────────────┘

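11.2 Principles for Working with AI

Providing context to the AI is the human's core responsibility

The ceiling of what the AI can do is set by the quality of the context you give it. When the AI gets stuck, the problem is usually not that the AI is incapable, but that it lacks the right context.

What AI is good at:
✅ Analyzing existing code/binaries/documents
✅ Handling several independent tasks in parallel
✅ Generating code to a precise specification
✅ Implementing algorithms quickly when reference material is available
✅ Multimodal analysis of images/video

What humans need to do:
✅ Decide what to do and what not to do
✅ Collect and provide the key context
✅ Judge between competing options
✅ Spot the AI's blind spots and fill in the missing information
✅ Keep hold of the overall direction and pace

11.3 Troubleshooting Path When You Get Stuck

Problem: the implementation does not behave like the original
    ↓
Step 1: Is there more precise reference material? (traffic capture, reverse engineering, papers)
    ↓
Step 2: Is the context given to the AI specific enough?
    ↓
Step 3: Can an automated comparison test be set up?
    ↓
Step 4: Can the real algorithm be obtained by reversing the binary directly?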

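12. A Complete SOP for Similar Projects in the Future

A standard operating procedure for any project of the "analyze a closed system, re-implement it as open source" kind.

Phase 0: Preparation (30 minutes)

# 1. Create a new project from harness-template and clone it locally
gh repo create my-project --template your-org/harness-template --private --clone
cd my-project

# 2. Create a docs/ folder so the AI can keep accumulating documentation
mkdir -p docs

# 3. Break down the goal explicitly
cat > docs/goal.md << 'EOF'
## Goal
Re-implement: [target system name]

## Feature breakdown
- P0 (must have):
- P1 (important):
- P2 (optional):

## Success criteria
- [ ] Functional comparison tests pass
- [ ] Performance targets met
- [ ] Ready to release
EOF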

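Phase 1: Information Gathering (1-2 hours)

# 1. Find the target files (replace TARGET with part of the name;
#    square brackets would be read as a glob character class)
find / -name "*TARGET*" 2>/dev/null

# 2. Basic analysis
file target_binary
strings target_binary | tee docs/strings.txt
nm target_binary | tee docs/symbols.txt

# 3. Have the AI analyze the results and record them in docs
# Prompt: "Analyze these symbols, infer the system architecture, and write it to docs/architecture.md"

# 4. Collect public intelligence
# - Official blog / documentation
# - GitHub issues / PRs
# - Related discussion on Twitter/X
# - Academic papers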

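Phase 2: Dynamic Analysis (1-2 hours)

# Set up traffic capture
pip install mitmproxy
mitmdump -p 8080 --save-stream-file capture.mitm &

# Configure the proxy, then trigger the target feature
export HTTPS_PROXY=http://127.0.0.1:8080

# Have the AI analyze the captured traffic
# Prompt: "Analyze capture.mitm, extract all API definitions, and write them to docs/api-spec.md"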

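Phase 3: Core Implementation (4-6 hours)

# Run several AI sessions in parallel
# Session A: implement the core functionality
# Session B: implement supporting tools
# Session C: handle edge cases and error handling
# Session D: write tests

# The context of every session must include:
# - docs/architecture.md (system architecture)
# - docs/api-spec.md (interface definitions)
# - The specific requirements of the current module

Phase 4: Verification Loop (1-2 hours)

# Set up comparison tests
./scripts/compare.sh "test task description" official open-source

# Dogfooding
# Do real development work with your own tool to surface practical problems

# Fix → verify → iterate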

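Phase 5: Release (1 hour)

# Package
npm init && npm publish
# or
go build && goreleaser release

# Screen recording
# Have the AI recommend music sites and download royalty-free music
# (take video from the recording and audio from the music track)
ffmpeg -i screen_record.mov -i music.mp3 -map 0:v -map 1:a -shortest output.mp4

# Open-source: create the public repo from the current directory and push it
gh repo create open-[target-name] --public --source=. --push

Closing Words

"The AI era changes only the methods of solving problems. The geek, the person who solves the problems, stays the same."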