
Add template import feature

kyle 5 days ago
parent
commit
71e2503eb7

+ 446 - 0
generate-data-report-ppt/CUSTOM_TEMPLATE_DESIGN.md

@@ -0,0 +1,446 @@
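+# User-Defined PPT Template Support: Change Proposal
+
+## 1. Requirements Overview
+
+### Current State
+- The skill ships with 3 fixed built-in templates: `report-master.pptx` (daily report), `weekly-master.pptx` (weekly report), and `monthly-master.pptx` (monthly report)
+- Templates live in the `assets/` directory and are mapped by a hard-coded `report_type`
+- Build flow: load template → duplicate master slides → replace placeholders → insert charts/text → delete the original template pages
+
+### Goals
+- **Users can upload a custom `.pptx` template**, and the skill generates the report in that template's style (colors, fonts, layout, background)
+- **The workflow stays exactly the same**: data profiling → six confirmations → PPT generation → quality self-check
+- **Minimal change scope**: only the "template loading and style adaptation" stage changes; data analysis and insight generation logic are untouched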
+
+---
+
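+## 2. Core Design
+
+```
+┌──────────────────┐     ┌──────────────────┐     ┌──────────────────┐
+│ User uploads     │ ──→ │  TemplateParser  │ ──→ │ TemplateProfile  │
+│ template (.pptx) │     │ parses structure │     │ structured       │
+│                  │     │ and style        │     │ description      │
+└──────────────────┘     └──────────────────┘     └──────────────────┘
+                                                            │
+                              ┌─────────────────────────────┘
+                              ▼
+                    ┌──────────────────────┐
+                    │   ppt_builder.py     │
+                    │ uses TemplateProfile │
+                    │ to pick masters and  │
+                    │ apply colors/fonts   │
+                    └──────────────────────┘
+```
+
+**Key principles**:
+1. **Parse, don't constrain**: automatically detect master slide types, placeholders, colors, and fonts in the template, without requiring users to follow a fixed template spec
+2. **Mapping fallback**: if the user template lacks a master slide type (e.g. a table-of-contents page), fall back to the built-in template or skip that page
+3. **Style inheritance first**: colors/fonts for charts, cards, and text default to the styles extracted from the template; explicit user configuration can override them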
+
+---
+
+## 3. Module-Level Change List
+
+### 3.1 New module: `scripts/template_parser.py` (template parsing engine)
+
+**Responsibility**: read any `.pptx` template and output a structured `TemplateProfile`
+
+**Core data structures**:
+```python
+@dataclass
+class MasterSlideInfo:
+    slide_index: int           # index in prs.slides
+    master_type: str           # 'cover' | 'content' | 'toc' | 'end' | 'unknown'
+    placeholders: list[str]    # detected placeholders, e.g. ['{report_title}', '{page_title}']
+    content_top: int           # content area start Y (EMU), derived from {page_title} bottom + gap
+    has_footer: bool           # whether the slide has its own footer
+    has_background: bool       # whether the slide has a background image/shape
+
+@dataclass
+class TemplateProfile:
+    path: str
+    is_builtin: bool
+    slide_width: int           # slide width (EMU)
+    slide_height: int          # slide height (EMU)
+    master_slides: list[MasterSlideInfo]
+    placeholder_map: dict      # global placeholder → master slide index mapping
+    detected_theme: dict       # extracted colors {primary, accent, text, ...}
+    detected_fonts: dict       # extracted fonts {title_font, body_font, number_font}
+    safe_margins: dict         # {left, right, top, bottom} in EMU
+    
+    # convenience accessors
+    def get_master_for(self, page_type: str) -> MasterSlideInfo: ...
+    def get_content_top(self, page_type: str = 'content') -> int: ...
+```
+
+**Parsing logic**:
+
+| Check | Method | Notes |
+|-------|--------|-------|
+| Master slide type | `_detect_master_type(slide)` | Matches text features: `{report_title}` → cover, `{page_title}` → content, `{chapter` → toc, last slide + thank-you text → end |
+| Placeholders | `_scan_placeholders(slide)` | Regex match for text of the form `{\w+}` |
+| Content area | `_detect_content_top(slide)` | Migrates and extends the existing `_detect_content_top`: bottom of the title shape + dynamic gap |
+| Theme colors | `_extract_colors(slide)` | Extracts primary and accent colors from the fill/line colors of master shapes and the theme color table (`slide.slide_layout.slide_master.theme.color_scheme`) |
+| Fonts | `_extract_fonts(slide)` | Counts font usage across master slides; the most frequent font in the title area (top < 1.5M EMU) becomes title_font, the most frequent in the content area becomes body_font |
+| Dimensions | `prs.slide_width`, `prs.slide_height` | Reads the actual size; supports widescreen (16:9) and standard (4:3) |
+
+**Built-in template handling**:
+- The 3 built-in templates in `assets/` go through the same parsing flow and also produce a `TemplateProfile`
+- Built-in and custom templates therefore expose an identical interface downstream
+
+---
+
+### 3.2 Modified module: `scripts/report_config.py`
+
+**New fields**:
+```python
+@dataclass
+class ReportConfig:
+    # ... existing fields ...
+    template_path: str = ''           # already exists, unchanged
+    template_profile: Optional['TemplateProfile'] = None  # NEW: parsed template description
+    use_template_theme: bool = True   # NEW: whether template-extracted colors override the default theme
+    
+    # Confirmation item extension (item 5)
+    # Before: "page structure and template plan"
+    # Now also shows: template parsing results (available master slides, detected colors, fonts)
+```
+
+**ConfirmationSpec extension** (template info shown during user confirmation):
+```python
+@dataclass
+class TemplateConfirmationDetail:
+    template_name: str
+    detected_master_types: list[str]  # ['cover', 'content', 'toc', 'end']
+    detected_colors: dict
+    detected_fonts: dict
+    warnings: list[str]               # e.g. "No TOC master detected; one will be generated automatically"
+```
+
+---
+
+### 3.3 Modified module: `scripts/theme_manager.py`
+
+**New functions**:
+```python
+def extract_theme_from_template(template_profile: 'TemplateProfile') -> ThemeConfig:
+    """
+    Build a ThemeConfig from the colors extracted into the TemplateProfile.
+    Mapping rules:
+      - primary → the dark fill with the largest area on the master / first color in the theme color table
+      - accent → the most frequent bright color on the master (green/blue/orange)
+      - text → master body text color / default #333333
+      - card_bg → a light variant of primary at 10% opacity
+      - chart_series → the first 8 colors from the master color table
+    """
+
+def merge_theme(template_theme: ThemeConfig, user_theme: ThemeConfig) -> ThemeConfig:
+    """
+    Merge the template theme with explicit user configuration; user configuration wins.
+    """
+```
+
+**Changes to `get_theme()`**:
+- Add the preset `ThemePreset.FROM_TEMPLATE = 'from_template'`
+- When `config.theme.preset == FROM_TEMPLATE` or `config.use_template_theme == True`, call `extract_theme_from_template()`
+
+---
+
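+### 3.4 Modified module: `scripts/page_layouts.py`
+
+**Core problem**: the current layout constants (`SLIDE_WIDTH = 16256000`, etc.) are hard-coded 16:9 dimensions, so elements are misplaced when the user template is 4:3 or another size.
+
+**Change plan**:
+1. Keep the global constants as **default values**
+2. Add optional `slide_width` / `slide_height` parameters to all calculation functions
+3. Add a `LayoutContext` class that encapsulates the current template's dimensions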
+
+```python
+@dataclass
+class LayoutContext:
+    slide_width: int = SLIDE_WIDTH
+    slide_height: int = SLIDE_HEIGHT
+    content_top: int = int(CONTENT_TOP_BASE)
+    footer_top: int = FOOTER_TOP
+    margin_left: int = int(MARGIN_LEFT)
+    margin_right: int = int(MARGIN_RIGHT)
+    
+    @classmethod
+    def from_template_profile(cls, profile: 'TemplateProfile') -> 'LayoutContext': ...
+```
+
+**Update all layout function signatures**:
+```python
+# Before
+def get_kpi_grid(content_top_emu: int = None, ...) -> list[LayoutZone]:
+
+# After
+def get_kpi_grid(content_top_emu: int = None, 
+                 ctx: LayoutContext = None, ...) -> list[LayoutZone]:
+    ctx = ctx or LayoutContext()
+    ...
+```
+
+**Call-site change**: `ppt_builder.py` creates a `LayoutContext` before building each page and passes it to the layout functions.
+
+---
+
+### 3.5 Modified module: `scripts/ppt_builder.py` (core builder)
+
+#### 3.5.1 Template parsing entry point
+
+Add at the start of `build_report()` and `_build_without_save()`:
+```python
+def _resolve_template_profile(config: ReportConfig) -> TemplateProfile:
+    if config.template_profile:
+        return config.template_profile
+    if config.template_path:
+        return parse_template(config.template_path)  # template_parser.parse_template
+    # built-in template
+    return parse_template(_resolve_master_template(config))
+```
+
+#### 3.5.2 Master slide selection logic (key change)
+
+**Current**: duplication by fixed index
+```python
+slide = _duplicate_slide(prs, prs.slides[0])  # cover
+slide = _duplicate_slide(prs, prs.slides[1])  # content
+slide = _duplicate_slide(prs, prs.slides[3])  # end page
+```
+
+**New design**: dynamic selection through `TemplateProfile`
+```python
+def _duplicate_master_slide(prs, profile: TemplateProfile, page_type: str):
+    master_info = profile.get_master_for(page_type)
+    if master_info:
+        return _duplicate_slide(prs, prs.slides[master_info.slide_index])
+    # Fallback: use the original hard-coded indices
+    fallback_index = {'cover': 0, 'content': 1, 'toc': 1, 'end': 3}.get(page_type, 1)
+    return _duplicate_slide(prs, prs.slides[fallback_index])
+```
+
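+#### 3.5.3 Applying colors and fonts
+
+**Current**: hard-coded color constants + theme conversion
+```python
+C_PRIMARY = RGBColor(0x1E, 0x3A, 0x5F)  # hard-coded
+colors = theme_to_rgb_colors(config.theme)  # user theme
+```
+
+**New design**: three priority levels
+1. Explicit user configuration in `config.theme` (highest priority)
+2. The `detected_theme` extracted from the template (when `use_template_theme=True`)
+3. Default color constants (fallback)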
+
+```python
+def _resolve_colors(config: ReportConfig, profile: TemplateProfile) -> dict:
+    if config.theme and not config.use_template_theme:
+        return theme_to_rgb_colors(config.theme)
+    template_theme = extract_theme_from_template(profile)
+    return theme_to_rgb_colors(template_theme)
+```
+
+**Fonts work the same way**: template-extracted fonts → user configuration → default "Microsoft YaHei" (微软雅黑) / "Arial"
+
+#### 3.5.4 Passing the layout context
+
+```python
+def build_report(data_file: str, config: ReportConfig, output_path: str) -> str:
+    profile = _resolve_template_profile(config)
+    ctx = LayoutContext.from_template_profile(profile)
+    colors = _resolve_colors(config, profile)
+    fonts = _resolve_fonts(config, profile)  # new
+    
+    # All subsequent _build_xxx_page calls take additional profile, ctx, fonts arguments
+    ...
+```
+
+#### 3.5.5 Placeholder compatibility (key)
+
+Placeholder names in a user template may differ from those in the built-in templates. Define a **placeholder alias map**:
+
+```python
+PLACEHOLDER_ALIASES = {
+    '{report_title}': ['{report_title}', '{标题}', '{title}'],
+    '{page_title}': ['{page_title}', '{页面标题}', '{subtitle}'],
+    '{date}': ['{date}', '{日期}', '{report_date}'],
+    '{department}': ['{department}', '{部门}', '{source}'],
+    '{period}': ['{period}', '{周期}', '{report_period}'],
+}
+```
+
+`_replace_placeholder()` is extended to support alias matching:
+```python
+def _replace_placeholder(slide, placeholder, new_text, aliases=None):
+    aliases = aliases or []
+    targets = [placeholder] + aliases
+    for shape in slide.shapes:
+        if not shape.has_text_frame:
+            continue
+        for para in shape.text_frame.paragraphs:
+            for target in targets:
+                if target in para.text:
+                    para.text = para.text.replace(target, str(new_text))
+```
+
+---
+
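+### 3.6 Modified module: `scripts/quality_inspector.py`
+
+**Adaptation points**:
+1. Size checks no longer use the hard-coded `SLIDE_WIDTH/SLIDE_HEIGHT`; they read the actual `slide.slide_width / slide.slide_height`
+2. Font consistency check: allow the font combination extracted from the template (e.g. font A for titles, font B for body text) rather than enforcing a single font
+3. Content area detection: use `LayoutContext` or the `content_top` passed in from the template profile
+
+```python
+# Before (hard-coded)
+sw = int(slide.slide_width) if hasattr(slide, 'slide_width') else SLIDE_WIDTH
+
+# After (reads the actual value; the logic already exists, just confirm it takes effect)
+# Addition: support reading the safe margins in template_profile from config
+```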
+
+---
+
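+### 3.7 Modified module: `scripts/agent_analyzer.py` (user confirmation flow)
+
+**Confirmation item 5** (page structure and template plan) is enhanced:
+
+When the user provides a custom template, show the parsing results for confirmation:
+
+```
+[Template parsing results]
+- Master slides detected: cover (✓)  content (✓)  TOC (✗)  end (✓)
+- Colors detected: primary #1E3A5F, accent #10B981
+- Fonts detected: title=Microsoft YaHei, body=Microsoft YaHei
+- Content area start: 2.1 cm from the top
+- ⚠️ No TOC master detected; the TOC page will use the content master instead
+
+Apply the colors and fonts extracted from the template? [yes/no]
+```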
+
+---
+
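+## 4. File Change Summary
+
+| File | Change type | Change |
+|------|-------------|--------|
+| `scripts/template_parser.py` | **New** | Template parsing engine that outputs TemplateProfile |
+| `scripts/report_config.py` | Modified | Adds the `template_profile` and `use_template_theme` fields; adds `TemplateConfirmationDetail` |
+| `scripts/theme_manager.py` | Modified | Adds `extract_theme_from_template()`, `merge_theme()`, `ThemePreset.FROM_TEMPLATE` |
+| `scripts/page_layouts.py` | Modified | Adds `LayoutContext`; all functions support dynamic dimensions |
+| `scripts/ppt_builder.py` | Modified | Template parsing entry point, dynamic master selection, three-level color/font priority, placeholder aliases, LayoutContext passing |
+| `scripts/quality_inspector.py` | Modified | Reads actual size/margins; font check adapts to template fonts |
+| `scripts/agent_analyzer.py` | Modified | Confirmation item 5 shows template parsing results |
+| `SKILL.md` | Modified | Documentation update: custom template usage and template authoring guidelines |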
+
+---
+
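+## 5. User Interaction Flow (new)
+
+```
+User: I want to generate the report with my own template
+      (uploads my-template.pptx)
+
+Agent: Template received, parsing...
+       [calls template_parser.parse_template()]
+
+Agent: [Template parsing complete]
+       - Size: 16:9 widescreen
+       - Master slides: cover ✓ content ✓ end ✓
+       - Colors detected: primary #2B579A, accent #FF6B35
+       - Fonts detected: title=Source Han Sans, body=Microsoft YaHei
+       - ⚠️ No {page_title} placeholder found; page titles will be added automatically
+
+Agent: Apply the colors extracted from the template? [yes/no]
+User: yes
+
+Agent: (continues the existing 6-item confirmation flow; item 5 now includes template info)
+       ...
+
+Agent: (calls build_report(), which uses TemplateProfile internally)
+       → generates a PPT in the user's template style
+```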
+
+---
+
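+## 6. Template Authoring Guidelines (for users)
+
+For best results, users are advised to follow these guidelines in their templates:
+
+### 6.1 Master slide structure
+The template should ideally contain **4 master slides** (at minimum a cover page and a content page):
+| Master slide | Suggested placeholders | Purpose |
+|--------------|------------------------|---------|
+| Cover page | `{report_title}`, `{date}`, `{department}` | Report cover |
+| TOC page | `{chapter1_title}`, `{chapter1_desc}`, ... | Table of contents / navigation page |
+| Content page | `{page_title}`, `{source}`, `{period}` | Body page (charts, insights) |
+| End page | `{report_title}` | Closing page |
+
+### 6.2 Placeholder rules
+- Placeholders are wrapped in `{}`, e.g. `{report_title}`
+- Not all placeholders are required; missing ones are skipped automatically or filled in intelligently
+- Custom names are supported; the agent identifies them through semantic matching and the alias map
+
+### 6.3 Style design suggestions
+- **Theme colors**: set the theme colors in the master (Design → Variants → Colors); the agent extracts them automatically
+- **Fonts**: set the title and body fonts separately in the master; the agent detects them and applies them consistently
+- **Background**: solid, gradient, or image backgrounds are all fine and are fully preserved when slides are duplicated
+- **Header/footer**: existing header and footer graphics in the template are kept; the agent detects them to avoid adding duplicates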
+
+---
+
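+## 7. Risks and Fallback Strategies
+
+| Risk scenario | Fallback strategy |
+|---------------|-------------------|
+| User template has only 1 page | That page serves as the duplication source for cover/content/end pages; missing page types are skipped |
+| Master slide type cannot be identified | Default to page 1 = cover, last page = end, the rest = content |
+| Theme colors cannot be extracted | Fall back to `ThemePreset.BUSINESS_CLASSIC` |
+| User template has a non-standard size | `LayoutContext` reads the actual size and the layout functions adapt their calculations |
+| Placeholder names are fully custom | Match by semantic similarity (e.g. text box position, content features) |
+| Template has complex animations/media | python-pptx keeps the XML when duplicating elements, so these are usually preserved; unsupported elements such as video are skipped automatically |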
+
+---
+
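+## 8. Backward Compatibility
+
+- **No template provided at all**: the existing logic runs with the built-in templates in `assets/`, and behavior is identical to today
+- **`template_path` provided but no `template_profile`**: `template_parser` is called automatically, so old configurations keep working
+- **`theme` set explicitly** with `use_template_theme=False`: the user-specified theme is used exclusively and template colors are ignored
+- **Existing APIs** `build_daily_report()` / `build_report()` / `quality_assured_build()` keep their signatures and adapt internally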
+
+---
+
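+## 9. Suggested Implementation Priority
+
+Implement in the following order; each step can be tested independently:
+
+1. **P0 - Template parser** (`template_parser.py`)
+   - Parse the 3 built-in templates and verify that the resulting `TemplateProfile` matches the existing hard-coded logic
+   - Parse a user-uploaded template and verify that the structure is identified correctly
+
+2. **P0 - Dynamic layout context** (`page_layouts.py` + parameter passing in `ppt_builder.py`)
+   - All layout functions support `LayoutContext`
+   - Built-in templates go through the new flow, and the output should match the old version page by page
+
+3. **P1 - Dynamic master selection** (`ppt_builder.py`)
+   - Select the master slide to duplicate according to `TemplateProfile`
+   - Support master mapping for single-page and multi-page templates
+
+4. **P1 - Theme color/font extraction and application** (`theme_manager.py` + `ppt_builder.py`)
+   - Extract colors from the template and apply them to charts, cards, and text
+   - User configuration override mechanism
+
+5. **P2 - Placeholder aliases and enhanced matching**
+   - Support more placeholder naming variants
+   - Semantic matching as a fallback
+
+6. **P2 - Quality check adaptation** (`quality_inspector.py`)
+   - Read the actual slide dimensions
+   - Template font whitelist
+
+7. **P2 - User confirmation flow display** (`agent_analyzer.py` + `SKILL.md`)
+   - Confirmation item 5 shows template parsing results
+   - Documentation update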

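Review note on section 3.5.3 of the design doc: the stated order puts explicit user configuration first, but `_resolve_colors` as drafted only honors `config.theme` when `use_template_theme` is False, and that flag defaults to True. The sketch below shows one way a key-by-key merge could implement the stated order. It assumes themes are plain dicts with optional keys, because the real `ThemeConfig` is not visible in this diff; `resolve_theme` and `DEFAULT_THEME` are hypothetical names.

```python
from typing import Optional

# Hypothetical defaults for illustration; the real fallback preset lives in theme_manager.py.
DEFAULT_THEME = {"primary": "#1E3A5F", "accent": "#10B981", "text": "#333333"}


def resolve_theme(user_theme: Optional[dict],
                  template_theme: Optional[dict],
                  use_template_theme: bool = True) -> dict:
    """Merge key by key: user config > template colors (if enabled) > defaults."""
    resolved = dict(DEFAULT_THEME)
    if use_template_theme and template_theme:
        # Only take keys the parser actually managed to extract.
        resolved.update({k: v for k, v in template_theme.items() if v})
    if user_theme:
        # Explicit user values are applied last, so they always win.
        resolved.update({k: v for k, v in user_theme.items() if v})
    return resolved


merged = resolve_theme(
    user_theme={"accent": "#FF6B35"},
    template_theme={"primary": "#2B579A", "accent": "#00AA00"},
)
print(merged)
```

With this shape, a user who overrides only the accent color still inherits the template's primary color, which matches the role `merge_theme()` is described as having.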
+ 51 - 0
generate-data-report-ppt/SKILL.md

@@ -78,6 +78,8 @@ generate-data-report-ppt/
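 - Each theme includes: primary, secondary, accent, background, text, and series palette colors
 - Supports custom color overrides
 - `theme_to_rgb_colors()` converts to pptx RGBColor objects in one call
+- New: `extract_theme_from_template()` automatically extracts theme colors and fonts from the masters of a user-uploaded template
+- New: `ThemePreset.FROM_TEMPLATE` uses the template's own color scheme
 
 ### Smart analysis (agent_analyzer.py)
 - Automatically identifies quantifiable numeric metrics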
@@ -96,6 +98,14 @@ generate-data-report-ppt/
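 - `calculate_content_area()` computes the available content area
 - `calculate_fill_ratio()` computes the page content fill ratio
 - `ensure_safe_position()` keeps elements inside the page safe area
+- New: `LayoutContext` encapsulates the template's actual dimensions (width/height/margins/content area) and supports templates of any size, such as 16:9 and 4:3
+
+### Custom template parsing (template_parser.py)
+- Parses any `.pptx` template and automatically identifies master slide types (cover/content/TOC/end)
+- Extracts the placeholder list and supports alias mapping (e.g. `{report_title}` / `{标题}`)
+- Automatically detects theme colors, fonts, slide size, safe margins, and content area
+- Outputs a `TemplateProfile` that `ppt_builder.py` uses to select master slides dynamically and adapt the layout
+- Built-in and user templates go through the same parsing flow, which keeps the interface consistent
 
 ### Quality self-check (quality_rules.py + quality_inspector.py)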
 
@@ -154,6 +164,18 @@ config = ReportConfig(
 
 build_report('any_data.xlsx', config, 'output.pptx')
 
+# === Using a custom template ===
+config = ReportConfig(
+    title='销售数据报告',
+    period_type='monthly',
+    source_label='销售部',
+    template_path='my-template.pptx',   # template uploaded by the user
+    use_template_theme=True,             # automatically apply colors extracted from the template
+    quality_threshold=85,
+    max_fix_iterations=3,
+)
+build_report('any_data.xlsx', config, 'output_custom.pptx')
+
 # === With quality assurance (recommended) ===
 prs, issues = quality_assured_build('any_data.xlsx', config, 'output_qa.pptx')
 ```
@@ -176,3 +198,32 @@ Data profiling serves the confirmed business intent. It should map the confirmed
 For visual quality, treat master PPTX files as style assets, not rigid page contracts. If a template placeholder cannot be populated, remove the whole placeholder component. If a KPI grid consumes the available vertical space, do not add bottom insight text; use a later analysis page or a different layout instead.
 
 For analytical quality, load `references/professional-data-analyst-playbook.md` before generating or reviewing report narratives. A page that only restates totals, rankings, or category names without comparison, diagnosis, implication, or action is not acceptable even if layout quality passes.
+
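+## Custom Template Guidelines
+
+Users can provide a custom `.pptx` template; the skill parses it automatically and generates the report in its style.
+
+### Master slide structure (suggested)
+| Master slide | Suggested placeholders | Purpose |
+|--------------|------------------------|---------|
+| Cover page | `{report_title}`, `{date}`, `{department}` | Report cover |
+| TOC page | `{chapter1_title}`, `{chapter1_desc}`, ... | Table of contents / navigation page |
+| Content page | `{page_title}`, `{source}`, `{period}` | Body page (charts, insights) |
+| End page | `{report_title}` | Closing page |
+
+### Placeholder rules
+- Wrap placeholders in `{}`, e.g. `{report_title}`
+- Not all placeholders are required; missing ones are skipped automatically or filled in intelligently
+- Chinese aliases are supported: `{标题}`, `{日期}`, `{部门}`, `{页面标题}`, `{数据来源}`, `{页码}`, and so on
+
+### Style design suggestions
+- **Theme colors**: set the theme colors in the master; the agent extracts them automatically and applies them to charts, cards, and text
+- **Fonts**: set the title and body fonts separately in the master; the agent detects them and applies them consistently
+- **Background**: solid, gradient, or image backgrounds are all fine and are fully preserved when slides are duplicated
+- **Size**: any size such as 16:9 or 4:3 is supported; the layout engine adapts automatically
+- **Header/footer**: existing header and footer graphics are kept; the agent detects them to avoid adding duplicates
+
+### Color priority
+1. Explicit user configuration in `config.theme` (highest priority)
+2. Colors extracted from the template (effective when `use_template_theme=True`)
+3. Default business-classic theme (fallback)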

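Review note on the placeholder rules: scanning with `{\w+}` and alias lookup can be combined so that downstream code only ever sees canonical names. The sketch below assumes a subset of the alias table from the design doc; `canonical_placeholders` and `ALIAS_TO_CANONICAL` are hypothetical names, not functions from this commit. In Python 3, `\w` in a `str` pattern matches CJK characters, so the Chinese aliases are picked up by the same regex.

```python
import re

# Subset of the design doc's PLACEHOLDER_ALIASES, used here for illustration.
PLACEHOLDER_ALIASES = {
    "{report_title}": ["{report_title}", "{标题}", "{title}"],
    "{page_title}": ["{page_title}", "{页面标题}", "{subtitle}"],
    "{date}": ["{date}", "{日期}", "{report_date}"],
}

# Invert the table once so each alias resolves to its canonical name in O(1).
ALIAS_TO_CANONICAL = {
    alias: canonical
    for canonical, aliases in PLACEHOLDER_ALIASES.items()
    for alias in aliases
}

PLACEHOLDER_RE = re.compile(r"\{\w+\}")


def canonical_placeholders(text: str) -> list[str]:
    """Return the placeholders found in text, with known aliases normalized."""
    found = PLACEHOLDER_RE.findall(text)
    # Unknown tokens (e.g. KPI slots) pass through unchanged.
    return [ALIAS_TO_CANONICAL.get(token, token) for token in found]


result = canonical_placeholders("{标题} | {report_date} | {kpi1_value}")
print(result)
```

The SKILL.md list also names `{数据来源}` and `{页码}`, which do not appear in the alias table shown in the design doc, so the two lists may need to be reconciled.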
+ 9 - 6
generate-data-report-ppt/scripts/data_loader.py

@@ -447,6 +447,15 @@ def _clean_generic_dataframe(df: pd.DataFrame, skip_summary_rows=True) -> pd.Dat
 
     for col in df.columns:
         if df[col].dtype == 'object':
+            # Try numeric first to avoid small integers being mis-parsed as dates
+            try:
+                numeric = pd.to_numeric(df[col], errors='coerce')
+                if numeric.notna().sum() > len(df) * 0.7:
+                    df[col] = numeric
+                    continue
+            except Exception:
+                pass
+            # Then try datetime
             try:
                 try:
                     parsed = pd.to_datetime(df[col], errors='coerce', format='mixed')
@@ -457,12 +466,6 @@ def _clean_generic_dataframe(df: pd.DataFrame, skip_summary_rows=True) -> pd.Dat
                     continue
             except Exception:
                 pass
-            try:
-                numeric = pd.to_numeric(df[col], errors='coerce')
-                if numeric.notna().sum() > len(df) * 0.7:
-                    df[col] = numeric
-            except Exception:
-                pass
 
     return df
 

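Review note on the `data_loader.py` reorder: the change tries numeric conversion before datetime parsing and uses `continue` so that a column accepted as numeric is never re-parsed as dates. The dependency-free sketch below illustrates only that decision order. It is not the pandas code from the diff; `infer_column_kind` is a hypothetical helper, and applying the same 0.7 threshold to dates is an assumption because the datetime branch is elided in the hunk.

```python
from datetime import datetime


def _is_number(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False


def _is_iso_date(value: str) -> bool:
    try:
        datetime.fromisoformat(value)
        return True
    except ValueError:
        return False


def infer_column_kind(values: list[str], threshold: float = 0.7) -> str:
    """Classify a column, checking numeric first so small integers never become dates."""
    if not values:
        return "text"
    n = len(values)
    if sum(_is_number(v) for v in values) > n * threshold:
        return "numeric"  # mirrors the early `continue` in the diff
    if sum(_is_iso_date(v) for v in values) > n * threshold:  # threshold assumed
        return "datetime"
    return "text"


print(infer_column_kind(["1", "2", "3", "x"]))
print(infer_column_kind(["2024-01-01", "2024-01-02", "2024-01-03", "n/a"]))
```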
+ 7 - 0
generate-data-report-ppt/scripts/metrics_calculator.py

@@ -1148,6 +1148,13 @@ def calc_generic_metrics(df: pd.DataFrame, config) -> dict:
         series = df[col].dropna()
         agg = metric_def.aggregation
 
+        # Coerce object series to numeric when possible so sparse numeric
+        # columns (e.g. forecast/delivery with many NaNs) are handled correctly.
+        if not pd.api.types.is_numeric_dtype(series):
+            coerced = pd.to_numeric(series, errors='coerce').dropna()
+            if len(coerced) > 0:
+                series = coerced
+
         if agg == 'sum':
             val = int(series.sum()) if pd.api.types.is_numeric_dtype(series) else len(series)
         elif agg == 'count':

+ 115 - 48
generate-data-report-ppt/scripts/page_layouts.py

@@ -1,13 +1,18 @@
 """
 Dynamic page layout engine for the universal data report generator.
 Provides pre-defined layout templates and layout calculation utilities.
+Supports dynamic slide dimensions via LayoutContext for custom templates.
 """
 from pptx.util import Emu, Pt
 from pptx.dml.color import RGBColor
-from dataclasses import dataclass
+from dataclasses import dataclass, field
 from typing import Optional
 
 
+# ==============================================================================
+# DEFAULT CONSTANTS (backward compatible)
+# ==============================================================================
+
 SLIDE_WIDTH = 16256000
 SLIDE_HEIGHT = 9144000
 MARGIN_LEFT = Emu(762000)
@@ -19,6 +24,52 @@ FOOTER_HEIGHT = Emu(320000)
 CONTENT_WIDTH = SLIDE_WIDTH - MARGIN_LEFT - MARGIN_RIGHT
 
 
+# ==============================================================================
+# LAYOUT CONTEXT
+# ==============================================================================
+
+@dataclass
+class LayoutContext:
+    """Encapsulates slide dimensions and content geometry for a specific template."""
+    slide_width: int = SLIDE_WIDTH
+    slide_height: int = SLIDE_HEIGHT
+    content_top: int = field(default_factory=lambda: int(CONTENT_TOP_BASE))
+    footer_top: int = FOOTER_TOP
+    margin_left: int = field(default_factory=lambda: int(MARGIN_LEFT))
+    margin_right: int = field(default_factory=lambda: int(MARGIN_RIGHT))
+    margin_top: int = field(default_factory=lambda: int(MARGIN_TOP))
+
+    @property
+    def content_width(self) -> int:
+        return self.slide_width - self.margin_left - self.margin_right
+
+    @classmethod
+    def from_template_profile(cls, profile) -> 'LayoutContext':
+        """Build LayoutContext from a TemplateProfile (template_parser)."""
+        # Import here to avoid circular dependency at module load
+        from template_parser import TemplateProfile
+        if not isinstance(profile, TemplateProfile):
+            raise TypeError("profile must be a TemplateProfile instance")
+        
+        # Use content top from content master if available
+        content_top = profile.get_content_top("content")
+        margins = profile.safe_margins
+        
+        return cls(
+            slide_width=profile.slide_width,
+            slide_height=profile.slide_height,
+            content_top=content_top,
+            footer_top=profile.slide_height - int(Emu(320000)),  # default footer area
+            margin_left=margins.get("left", int(MARGIN_LEFT)),
+            margin_right=margins.get("right", int(MARGIN_RIGHT)),
+            margin_top=margins.get("top", int(MARGIN_TOP)),
+        )
+
+
+# ==============================================================================
+# LAYOUT ZONES
+# ==============================================================================
+
 @dataclass
 class LayoutZone:
     x: int
@@ -28,63 +79,75 @@ class LayoutZone:
     zone_type: str
 
 
-def calculate_content_area(content_top_emu: int = None) -> LayoutZone:
-    top = content_top_emu or int(CONTENT_TOP_BASE)
-    height = FOOTER_TOP - top - Emu(100000)
+def _resolve_ctx(ctx: Optional[LayoutContext] = None) -> LayoutContext:
+    return ctx if ctx is not None else LayoutContext()
+
+
+def calculate_content_area(content_top_emu: int = None,
+                           ctx: Optional[LayoutContext] = None) -> LayoutZone:
+    ctx = _resolve_ctx(ctx)
+    top = content_top_emu if content_top_emu is not None else ctx.content_top
+    height = ctx.footer_top - top - int(Emu(100000))
     return LayoutZone(
-        x=int(MARGIN_LEFT),
+        x=ctx.margin_left,
         y=top,
-        width=int(CONTENT_WIDTH),
-        height=int(height),
+        width=ctx.content_width,
+        height=max(int(height), int(Emu(500000))),
         zone_type='content_area',
     )
 
 
 def get_kpi_grid(content_top_emu: int = None, cols: int = 3, rows: int = 2,
                  card_width_emu: int = 4699000, card_height_emu: int = 3048000,
-                 gap_x_emu: int = 444500, gap_y_emu: int = 381000) -> list[LayoutZone]:
-    start_y = max(int(CONTENT_TOP_BASE), content_top_emu or int(CONTENT_TOP_BASE))
+                 gap_x_emu: int = 444500, gap_y_emu: int = 381000,
+                 ctx: Optional[LayoutContext] = None) -> list[LayoutZone]:
+    ctx = _resolve_ctx(ctx)
+    start_y = max(ctx.content_top, content_top_emu or ctx.content_top)
     zones = []
     for row in range(rows):
         for col in range(cols):
-            x = int(MARGIN_LEFT) + col * (card_width_emu + gap_x_emu)
+            x = ctx.margin_left + col * (card_width_emu + gap_x_emu)
             y = start_y + row * (card_height_emu + gap_y_emu)
             zones.append(LayoutZone(x=x, y=y, width=card_width_emu, height=card_height_emu, zone_type='kpi_card'))
     return zones
 
 
-def get_chart_left_zone(content_top_emu: int = None, chart_ratio: float = 0.6) -> LayoutZone:
-    content = calculate_content_area(content_top_emu)
-    chart_w = int(content.width * chart_ratio) - Emu(200000)
+def get_chart_left_zone(content_top_emu: int = None, chart_ratio: float = 0.6,
+                        ctx: Optional[LayoutContext] = None) -> LayoutZone:
+    content = calculate_content_area(content_top_emu, ctx)
+    chart_w = int(content.width * chart_ratio) - int(Emu(200000))
     return LayoutZone(
         x=content.x,
         y=content.y,
-        width=chart_w,
+        width=max(chart_w, int(Emu(1000000))),
         height=content.height,
         zone_type='chart_left',
     )
 
 
-def get_insight_right_zone(content_top_emu: int = None, chart_ratio: float = 0.6) -> LayoutZone:
-    content = calculate_content_area(content_top_emu)
+def get_insight_right_zone(content_top_emu: int = None, chart_ratio: float = 0.6,
+                           ctx: Optional[LayoutContext] = None) -> LayoutZone:
+    content = calculate_content_area(content_top_emu, ctx)
     chart_w = int(content.width * chart_ratio)
-    text_left = content.x + chart_w + Emu(200000)
+    text_left = content.x + chart_w + int(Emu(200000))
     text_w = content.x + content.width - text_left
     return LayoutZone(
         x=text_left,
         y=content.y,
-        width=text_w,
+        width=max(text_w, int(Emu(800000))),
         height=content.height,
         zone_type='insight_right',
     )
 
 
-def get_full_width_zone(content_top_emu: int = None) -> LayoutZone:
-    return calculate_content_area(content_top_emu)
+def get_full_width_zone(content_top_emu: int = None,
+                        ctx: Optional[LayoutContext] = None) -> LayoutZone:
+    return calculate_content_area(content_top_emu, ctx)
 
 
-def get_two_column_zones(content_top_emu: int = None, gap_emu: int = 381000) -> tuple[LayoutZone, LayoutZone]:
-    content = calculate_content_area(content_top_emu)
+def get_two_column_zones(content_top_emu: int = None, gap_emu: int = 381000,
+                         ctx: Optional[LayoutContext] = None) -> tuple[LayoutZone, LayoutZone]:
+    content = calculate_content_area(content_top_emu, ctx)
     half_w = (content.width - gap_emu) // 2
     left = LayoutZone(x=content.x, y=content.y, width=half_w, height=content.height, zone_type='column_left')
     right = LayoutZone(x=content.x + half_w + gap_emu, y=content.y, width=half_w, height=content.height, zone_type='column_right')
@@ -92,8 +155,9 @@ def get_two_column_zones(content_top_emu: int = None, gap_emu: int = 381000) ->
 
 
 def get_two_row_zones(content_top_emu: int = None, gap_emu: int = 381000,
-                      top_ratio: float = 0.55) -> tuple[LayoutZone, LayoutZone]:
-    content = calculate_content_area(content_top_emu)
+                      top_ratio: float = 0.55,
+                      ctx: Optional[LayoutContext] = None) -> tuple[LayoutZone, LayoutZone]:
+    content = calculate_content_area(content_top_emu, ctx)
     top_h = int(content.height * top_ratio)
     top = LayoutZone(x=content.x, y=content.y, width=content.width, height=top_h, zone_type='row_top')
     bottom = LayoutZone(
@@ -106,34 +170,34 @@ def get_two_row_zones(content_top_emu: int = None, gap_emu: int = 381000,
     return top, bottom
 
 
-def get_card_grid(n: int, content_top_emu: int = None, max_cols: int = 3) -> list[LayoutZone]:
-    content = calculate_content_area(content_top_emu)
+def get_card_grid(n: int, content_top_emu: int = None, max_cols: int = 3,
+                  ctx: Optional[LayoutContext] = None) -> list[LayoutZone]:
+    content = calculate_content_area(content_top_emu, ctx)
     cols = min(max_cols, n)
     rows = (n + cols - 1) // cols
-    card_w = (content.width - (cols - 1) * Emu(254000)) // cols
-    card_h = (content.height - (rows - 1) * Emu(254000)) // rows
+    card_w = (content.width - (cols - 1) * int(Emu(254000))) // cols
+    card_h = (content.height - (rows - 1) * int(Emu(254000))) // rows
 
     zones = []
     for i in range(n):
         col = i % cols
         row = i // cols
-        x = content.x + col * (card_w + Emu(254000))
-        y = content.y + row * (card_h + Emu(254000))
+        x = content.x + col * (card_w + int(Emu(254000)))
+        y = content.y + row * (card_h + int(Emu(254000)))
         zones.append(LayoutZone(x=x, y=y, width=card_w, height=card_h, zone_type=f'card_{i}'))
     return zones
 
 
-def get_alert_card_zones(n: int, content_top_emu: int = None) -> list[LayoutZone]:
-    content = calculate_content_area(content_top_emu)
-    card_h = Emu(2286000)
-    gap = Emu(254000)
-    return get_card_grid(n, content_top_emu, max_cols=3)
+def get_alert_card_zones(n: int, content_top_emu: int = None,
+                         ctx: Optional[LayoutContext] = None) -> list[LayoutZone]:
+    return get_card_grid(n, content_top_emu, max_cols=3, ctx=ctx)
 
 
-def get_issue_card_zones(n: int, content_top_emu: int = None) -> list[LayoutZone]:
-    content = calculate_content_area(content_top_emu)
-    card_h = Emu(2032000)
-    gap = Emu(254000)
+def get_issue_card_zones(n: int, content_top_emu: int = None,
+                         ctx: Optional[LayoutContext] = None) -> list[LayoutZone]:
+    content = calculate_content_area(content_top_emu, ctx)
+    card_h = int(Emu(2032000))
+    gap = int(Emu(254000))
     start_y = content.y
     zones = []
     for i in range(min(n, 3)):
@@ -142,25 +206,27 @@ def get_issue_card_zones(n: int, content_top_emu: int = None) -> list[LayoutZone
     return zones
 
 
-def get_table_zone(content_top_emu: int = None, ratio: float = 0.5) -> LayoutZone:
-    content = calculate_content_area(content_top_emu)
+def get_table_zone(content_top_emu: int = None, ratio: float = 0.5,
+                   ctx: Optional[LayoutContext] = None) -> LayoutZone:
+    content = calculate_content_area(content_top_emu, ctx)
     return LayoutZone(
         x=content.x,
-        y=content.y + int(content.height * ratio) + Emu(200000),
+        y=content.y + int(content.height * ratio) + int(Emu(200000)),
         width=content.width,
         height=int(content.height * (1 - ratio)),
         zone_type='table_bottom',
     )
 
 
-def detect_layout_slots(slide) -> dict:
+def detect_layout_slots(slide, ctx: Optional[LayoutContext] = None) -> dict:
+    ctx = _resolve_ctx(ctx)
     slots = {
         'has_header': False,
         'has_footer': False,
         'has_page_title': False,
-        'content_top': int(CONTENT_TOP_BASE),
-        'content_width': int(CONTENT_WIDTH),
-        'content_height': FOOTER_TOP - int(CONTENT_TOP_BASE) - Emu(100000),
+        'content_top': ctx.content_top,
+        'content_width': ctx.content_width,
+        'content_height': ctx.footer_top - ctx.content_top - int(Emu(100000)),
     }
     for shape in slide.shapes:
         if shape.has_text_frame:
@@ -193,8 +259,9 @@ def ensure_safe_position(shape, slide_width: int, slide_height: int) -> bool:
     return adjusted
 
 
-def calculate_fill_ratio(slide, content_top_emu: int = None) -> float:
-    content = calculate_content_area(content_top_emu)
+def calculate_fill_ratio(slide, content_top_emu: int = None,
+                         ctx: Optional[LayoutContext] = None) -> float:
+    content = calculate_content_area(content_top_emu, ctx)
     total_area = content.width * content.height
     if total_area <= 0:
         return 0.0

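Review note on `get_kpi_grid`: the grid origin now comes from `ctx`, but the card size defaults (`card_width_emu=4699000`, `gap_x_emu=444500`) are still fixed, so three columns span 14,986,000 EMU regardless of slide width. A standard 4:3 deck is 9,144,000 EMU wide, so the default grid would run off the slide. The sketch below shows one possible clamp; `fit_card_width` is a hypothetical helper, and the symmetric 762,000 EMU margins in the example are an assumption (only `MARGIN_LEFT` is visible in the diff).

```python
def fit_card_width(content_width: int, cols: int = 3,
                   card_width_emu: int = 4699000,
                   gap_x_emu: int = 444500) -> int:
    """Return the requested card width, shrunk only if the row would overflow."""
    # Width left for each card once the inter-card gaps are subtracted.
    available = (content_width - (cols - 1) * gap_x_emu) // cols
    return min(card_width_emu, available)


# Width of the default 3-column row using the diff's constants.
default_row_width = 3 * 4699000 + 2 * 444500

# 4:3 slide (9144000 EMU wide) with assumed 762000 EMU margins on both sides.
content_width_4x3 = 9144000 - 2 * 762000
card_width_4x3 = fit_card_width(content_width_4x3)

print(default_row_width, content_width_4x3, card_width_4x3)
```

A clamp like this leaves wide templates untouched while keeping cards inside `ctx.content_width` on narrower ones; whether heights should scale the same way is a separate design decision.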
File diff suppressed because it is too large
+ 336 - 133
generate-data-report-ppt/scripts/ppt_builder.py


+ 42 - 4
generate-data-report-ppt/scripts/quality_inspector.py

@@ -48,8 +48,9 @@ class QualityIssue:
 
 
 class QualityInspector:
-    def __init__(self, theme_colors: dict = None):
+    def __init__(self, theme_colors: dict = None, layout_context=None):
         self.theme_colors = theme_colors or {}
+        self.layout_context = layout_context
         self.fix_count = 0
         self.fix_log = []
 
@@ -388,6 +389,10 @@ class QualityInspector:
         return issues
 
     def _check_content(self, slide, page_idx, config, prs, page_type='content') -> list[QualityIssue]:
+        # Resolve dynamic content top from layout context if available
+        content_top_emu = None
+        if self.layout_context:
+            content_top_emu = self.layout_context.content_top
         issues = []
 
         if page_type in ('cover', 'end'):
@@ -411,7 +416,7 @@ class QualityInspector:
                     'C008', False, {'type': 'empty_page'}))
             return issues
 
-        fill_ratio = calculate_fill_ratio(slide)
+        fill_ratio = calculate_fill_ratio(slide, content_top_emu=content_top_emu)
 
         if page_type in ('kpi_overview', 'trend', 'distribution', 'ranking', 'summary') or page_type in FORECAST_PAGE_TYPES:
             if fill_ratio < FILL_RATIO_THRESHOLDS['sparse']:
@@ -615,8 +620,41 @@ class QualityInspector:
         elif fd.get('type') == 'placeholder':
             shape = fd.get('shape')
             if shape and shape.has_text_frame:
-                for para in shape.text_frame.paragraphs:
-                    para.text = re.sub(r'\{[^}]+\}', '', para.text)
+                text = shape.text_frame.text or ''
+                # For KPI placeholders, remove the entire shape and nearby card backgrounds
+                kpi_pattern = re.compile(r'\{kpi\d+_(label|value)\}')
+                if kpi_pattern.search(text):
+                    # Remove this text shape
+                    self._remove_shape(shape)
+                    # Also remove nearby rounded rectangle backgrounds
+                    try:
+                        sx = int(shape.left)
+                        sy = int(shape.top)
+                        sw = int(shape.width)
+                        sh = int(shape.height)
+                        pad = 300000
+                        for other in list(slide.shapes):
+                            try:
+                                ox = int(other.left)
+                                oy = int(other.top)
+                                ow = int(other.width)
+                                oh = int(other.height)
+                                in_region = (
+                                    ox >= sx - pad and ox + ow <= sx + sw + pad and
+                                    oy >= sy - pad and oy + oh <= sy + sh + pad
+                                )
+                                if in_region and other != shape:
+                                    # Check if it's a background shape (no text or empty text)
+                                    if not other.has_text_frame or not (other.text_frame.text or '').strip():
+                                        self._remove_shape(other)
+                            except Exception:
+                                pass
+                    except Exception:
+                        pass
+                else:
+                    # For other placeholders, just clear the text
+                    for para in shape.text_frame.paragraphs:
+                        para.text = re.sub(r'\{[^}]+\}', '', para.text)
                 fd['fixed'] = True
 
         elif fd.get('type') == 'edge_left':
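
The KPI cleanup in the hunk above removes the placeholder's text shape, then removes any text-free shape that lies entirely inside the placeholder's bounding box grown by 300000 EMU (roughly a third of an inch) on every side. A minimal standalone sketch of that containment test follows. `Box` and `within_padded_region` are hypothetical names used only for illustration; they stand in for python-pptx shapes and are not part of the commit.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Hypothetical stand-in for a shape's left/top/width/height, all in EMU."""
    left: int
    top: int
    width: int
    height: int


def within_padded_region(anchor: Box, other: Box, pad: int = 300000) -> bool:
    """Return True if `other` lies entirely inside `anchor` grown by `pad` on every side."""
    return (
        other.left >= anchor.left - pad
        and other.left + other.width <= anchor.left + anchor.width + pad
        and other.top >= anchor.top - pad
        and other.top + other.height <= anchor.top + anchor.height + pad
    )


kpi_text = Box(left=1000000, top=2000000, width=2000000, height=600000)
card_bg = Box(left=900000, top=1900000, width=2200000, height=800000)
page_banner = Box(left=0, top=0, width=12000000, height=900000)

print(within_padded_region(kpi_text, card_bg))      # True: the card fits inside the padded box
print(within_padded_region(kpi_text, page_banner))  # False: the banner extends well outside it
```

Because the test requires full containment, a decorative shape that only partly overlaps the KPI card is left in place, while any empty shape small enough to fit inside the padded box is removed whether or not it belongs to the card.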

+ 3 - 0
generate-data-report-ppt/scripts/report_config.py

@@ -74,6 +74,7 @@ class ThemePreset(str, Enum):
     DARK_PROFESSIONAL = 'dark_professional'
     WARM_BRAND = 'warm_brand'
     CUSTOM = 'custom'
+    FROM_TEMPLATE = 'from_template'
 
 
 @dataclass
@@ -181,6 +182,8 @@ class ReportConfig:
 
     theme: ThemeConfig = field(default_factory=ThemeConfig)
     template_path: str = ''
+    template_profile: Optional[object] = None  # TemplateProfile from template_parser
+    use_template_theme: bool = True
     visual_style_direction: str = ''
     page_structure_template: str = ''
 

+ 587 - 0
generate-data-report-ppt/scripts/template_parser.py

@@ -0,0 +1,587 @@
+"""
+Template parser engine for the universal data report generator.
+Reads any .pptx template and outputs a structured TemplateProfile describing
+master slide types, placeholders, colors, fonts, and layout geometry.
+"""
+from __future__ import annotations
+
+import os
+import re
+from dataclasses import dataclass, field
+from pathlib import Path
+from typing import Optional
+
+from pptx import Presentation
+from pptx.dml.color import RGBColor
+from pptx.util import Emu
+
+
+# ==============================================================================
+# DATA MODELS
+# ==============================================================================
+
+@dataclass
+class MasterSlideInfo:
+    slide_index: int
+    master_type: str  # 'cover' | 'content' | 'toc' | 'end' | 'unknown'
+    placeholders: list[str] = field(default_factory=list)
+    content_top: int = 0  # EMU
+    has_footer: bool = False
+    has_background: bool = False
+    shape_count: int = 0
+
+
+@dataclass
+class TemplateProfile:
+    path: str
+    is_builtin: bool
+    slide_width: int
+    slide_height: int
+    master_slides: list[MasterSlideInfo] = field(default_factory=list)
+    placeholder_map: dict[str, list[int]] = field(default_factory=dict)
+    detected_theme: dict[str, str] = field(default_factory=dict)
+    detected_fonts: dict[str, str] = field(default_factory=dict)
+    safe_margins: dict[str, int] = field(default_factory=dict)
+
+    def get_master_for(self, page_type: str) -> Optional[MasterSlideInfo]:
+        """Return the first master slide matching page_type, or None."""
+        for ms in self.master_slides:
+            if ms.master_type == page_type:
+                return ms
+        return None
+
+    def get_content_top(self, page_type: str = "content") -> int:
+        """Return content_top for the given page_type, or best guess."""
+        ms = self.get_master_for(page_type)
+        if ms and ms.content_top > 0:
+            return ms.content_top
+        # Fallback to any content page
+        for ms in self.master_slides:
+            if ms.master_type == "content" and ms.content_top > 0:
+                return ms.content_top
+        # Hard fallback
+        return int(Emu(1422400))
+
+    def get_master_index_for(self, page_type: str) -> int:
+        """Return slide index for page_type, with fallback rules."""
+        ms = self.get_master_for(page_type)
+        if ms:
+            return ms.slide_index
+        # Fallback heuristics
+        if page_type == "cover" and self.master_slides:
+            return self.master_slides[0].slide_index
+        if page_type == "end" and self.master_slides:
+            return self.master_slides[-1].slide_index
+        if page_type == "toc" and len(self.master_slides) >= 3:
+            return self.master_slides[2].slide_index
+        if len(self.master_slides) >= 2:
+            return self.master_slides[1].slide_index
+        return 0
+
+
+# ==============================================================================
+# PLACEHOLDER DETECTION
+# ==============================================================================
+
+_PLACEHOLDER_RE = re.compile(r"\{[^{}]+\}")
+
+# Canonical placeholder -> list of aliases (including itself)
+PLACEHOLDER_ALIASES: dict[str, list[str]] = {
+    "{report_title}": ["{report_title}", "{标题}", "{title}", "{报告标题}"],
+    "{report_type}": ["{report_type}", "{报告类型}", "{type}"],
+    "{date}": ["{date}", "{日期}", "{report_date}", "{报告日期}"],
+    "{department}": ["{department}", "{部门}", "{来源}", "{dept}"],
+    "{period}": ["{period}", "{周期}", "{report_period}", "{时间周期}"],
+    "{gen_time}": ["{gen_time}", "{生成时间}", "{generated_time}"],
+    "{page_title}": ["{page_title}", "{页面标题}", "{subtitle}", "{page_header}"],
+    "{source}": ["{source}", "{数据来源}", "{data_source}"],
+    "{page_num}": ["{page_num}", "{页码}", "{page_number}"],
+}
+
+# Chapter placeholders are generated dynamically
+for i in range(1, 13):
+    PLACEHOLDER_ALIASES[f"{{chapter{i}_title}}"] = [f"{{chapter{i}_title}}", f"{{章节{i}标题}}"]
+    PLACEHOLDER_ALIASES[f"{{chapter{i}_desc}}"] = [f"{{chapter{i}_desc}}", f"{{章节{i}描述}}"]
+
+# KPI placeholders
+for i in range(1, 13):
+    PLACEHOLDER_ALIASES[f"{{kpi{i}_label}}"] = [f"{{kpi{i}_label}}", f"{{kpi{i}_name}}"]
+    PLACEHOLDER_ALIASES[f"{{kpi{i}_value}}"] = [f"{{kpi{i}_value}}", f"{{kpi{i}_val}}"]
+
+
+def _scan_placeholders(slide) -> list[str]:
+    """Scan a slide for all placeholder-like strings {xxx}."""
+    found = set()
+    for shape in slide.shapes:
+        if shape.has_text_frame:
+            text = shape.text_frame.text or ""
+            for match in _PLACEHOLDER_RE.finditer(text):
+                found.add(match.group(0))
+    return sorted(found)
+
+
+def _normalize_placeholder(raw: str) -> Optional[str]:
+    """Map a raw placeholder to its canonical form, if known."""
+    raw_lower = raw.lower()
+    for canonical, aliases in PLACEHOLDER_ALIASES.items():
+        if raw_lower in [a.lower() for a in aliases]:
+            return canonical
+    return None
+
+
+# ==============================================================================
+# MASTER SLIDE TYPE DETECTION
+# ==============================================================================
+
+_TYPE_KEYWORDS: dict[str, list[str]] = {
+    "cover": ["{report_title}", "{date}", "{department}", "{report_type}", "{gen_time}"],
+    "content": ["{page_title}", "{source}", "{page_num}", "{period}"],
+    "toc": ["{chapter", "contents", "目录", "catalog", "agenda"],
+    "end": ["{report_title}", "感谢", "thank", "结语", "尾页", "end"],
+}
+
+
+def _detect_master_type(slide, slide_index: int, total_slides: int) -> str:
+    """Detect the semantic type of a master slide."""
+    texts = []
+    placeholders = []
+    for shape in slide.shapes:
+        if shape.has_text_frame:
+            t = (shape.text_frame.text or "").strip()
+            if t:
+                texts.append(t.lower())
+                placeholders.extend(_PLACEHOLDER_RE.findall(t))
+
+    text_block = " ".join(texts)
+    ph_block = " ".join(placeholders).lower()
+
+    scores: dict[str, int] = {"cover": 0, "content": 0, "toc": 0, "end": 0, "unknown": 0}
+
+    # Score by keywords
+    for ptype, keywords in _TYPE_KEYWORDS.items():
+        for kw in keywords:
+            if kw.lower() in ph_block or kw.lower() in text_block:
+                scores[ptype] += 1
+
+    # Position heuristics
+    if slide_index == 0:
+        scores["cover"] += 2
+    if slide_index == total_slides - 1:
+        scores["end"] += 2
+    if total_slides >= 3 and slide_index == 2:
+        scores["toc"] += 1
+
+    # Content page has page_title but not report_title (cover does)
+    if "{page_title}" in ph_block:
+        if "{report_title}" in ph_block:
+            # Both present: a cover, or a content page whose header repeats the report title.
+            # Shape position is not inspected here, so add only a weak cover signal.
+            scores["cover"] += 1
+        else:
+            scores["content"] += 3
+
+    # TOC strongly signaled by chapter placeholders
+    if "{chapter" in ph_block:
+        scores["toc"] += 5
+
+    # Distinguish end from cover: end usually lacks date/department placeholders
+    if "{date}" in ph_block and "{department}" in ph_block:
+        scores["cover"] += 2
+        scores["end"] -= 1
+
+    # Cover usually has KPI placeholders
+    if "{kpi1_label}" in ph_block:
+        scores["cover"] += 2
+
+    best = max(scores, key=lambda k: scores[k])
+    if scores[best] == 0:
+        # Default fallback by position
+        if slide_index == 0:
+            return "cover"
+        if slide_index == total_slides - 1:
+            return "end"
+        return "content"
+    return best
+
+
+# ==============================================================================
+# CONTENT TOP DETECTION
+# ==============================================================================
+
+def _detect_content_top(slide, default_gap: int = 381000) -> int:
+    """Detect content start Y by finding page_title placeholder bottom + gap."""
+    page_title_bottom = None
+    for shape in slide.shapes:
+        if not shape.has_text_frame:
+            continue
+        text = shape.text_frame.text or ""
+        # Match any page_title alias
+        if _matches_any_placeholder(text, "{page_title}"):
+            page_title_bottom = int(shape.top) + int(shape.height)
+            break
+
+    if page_title_bottom is not None:
+        return page_title_bottom + default_gap
+
+    # Fallback: find any text shape in the upper area that looks like a title
+    for shape in slide.shapes:
+        if not shape.has_text_frame:
+            continue
+        if int(shape.top) > Emu(500000) and int(shape.top) < Emu(1500000):
+            text = (shape.text_frame.text or "").strip()
+            if text and len(text) < 40 and "{" not in text:
+                return int(shape.top) + int(shape.height) + default_gap
+
+    return int(Emu(1422400))
+
+
+def _matches_any_placeholder(text: str, canonical: str) -> bool:
+    aliases = PLACEHOLDER_ALIASES.get(canonical, [canonical])
+    for alias in aliases:
+        if alias in text:
+            return True
+    return False
+
+
+# ==============================================================================
+# COLOR EXTRACTION
+# ==============================================================================
+
+def _extract_colors(slide) -> dict[str, str]:
+    """Extract dominant colors from a slide's shapes and theme."""
+    colors: dict[str, str] = {}
+
+    # Try theme color scheme first
+    try:
+        theme = slide.slide_layout.slide_master.theme
+        cs = theme.color_scheme
+        # Map theme colors
+        theme_map = {
+            "primary": cs.accent1,
+            "accent": cs.accent2,
+            "accent2": cs.accent3,
+            "accent_neg": cs.accent6,  # often red/orange
+            "text": cs.text1,
+            "background": cs.background1,
+        }
+        for key, color_obj in theme_map.items():
+            try:
+                rgb = color_obj.rgb
+                if rgb:
+                    colors[key] = _rgb_to_hex(rgb)
+            except Exception:
+                pass
+    except Exception:
+        pass
+
+    # Extract from shape fills (heuristic for primary color)
+    fill_colors: dict[str, int] = {}
+    text_colors: dict[str, int] = {}
+
+    for shape in slide.shapes:
+        # Fill colors
+        try:
+            if hasattr(shape, "fill") and shape.fill.type is not None:
+                if hasattr(shape.fill, "fore_color") and shape.fill.fore_color:
+                    rgb = getattr(shape.fill.fore_color, "rgb", None)
+                    if rgb:
+                        hex_str = _rgb_to_hex(rgb)
+                        fill_colors[hex_str] = fill_colors.get(hex_str, 0) + 1
+                        # Weight by area
+                        area = int(shape.width) * int(shape.height)
+                        fill_colors[hex_str] += area // 1000000000
+        except Exception:
+            pass
+
+        # Text colors
+        try:
+            if shape.has_text_frame:
+                for para in shape.text_frame.paragraphs:
+                    for run in para.runs:
+                        if run.font.color and run.font.color.rgb:
+                            hex_str = _rgb_to_hex(run.font.color.rgb)
+                            text_colors[hex_str] = text_colors.get(hex_str, 0) + 1
+        except Exception:
+            pass
+
+    # Determine primary from most common dark fill
+    dark_fills = {h: c for h, c in fill_colors.items() if _is_dark_color(h)}
+    if dark_fills:
+        primary = max(dark_fills, key=lambda k: dark_fills[k])
+        colors["primary"] = primary
+
+    # Determine accent from bright fills
+    bright_fills = {h: c for h, c in fill_colors.items() if _is_bright_color(h) and not _is_dark_color(h)}
+    if bright_fills:
+        accent = max(bright_fills, key=lambda k: bright_fills[k])
+        colors["accent"] = accent
+
+    # Text color
+    if text_colors:
+        text_col = max(text_colors, key=lambda k: text_colors[k])
+        if text_col.lstrip("#").upper() not in ("FFFFFF", "000000") or len(text_colors) == 1:
+            colors["text"] = text_col
+
+    return colors
+
+
+def _rgb_to_hex(rgb) -> str:
+    if rgb is None:
+        return "#333333"
+    try:
+        return f"#{rgb[0]:02X}{rgb[1]:02X}{rgb[2]:02X}"
+    except Exception:
+        try:
+            return f"#{int(rgb):06X}"
+        except Exception:
+            return "#333333"
+
+
+def _is_dark_color(hex_str: str) -> bool:
+    hex_str = hex_str.lstrip("#")
+    if len(hex_str) != 6:
+        return False
+    try:
+        r, g, b = int(hex_str[0:2], 16), int(hex_str[2:4], 16), int(hex_str[4:6], 16)
+        luminance = 0.299 * r + 0.587 * g + 0.114 * b
+        return luminance < 120
+    except Exception:
+        return False
+
+
+def _is_bright_color(hex_str: str) -> bool:
+    hex_str = hex_str.lstrip("#")
+    if len(hex_str) != 6:
+        return False
+    try:
+        r, g, b = int(hex_str[0:2], 16), int(hex_str[2:4], 16), int(hex_str[4:6], 16)
+        saturation = max(r, g, b) - min(r, g, b)
+        return saturation > 40
+    except Exception:
+        return False
+
+
+# ==============================================================================
+# FONT EXTRACTION
+# ==============================================================================
+
+def _extract_fonts(slide) -> dict[str, str]:
+    """Extract dominant title and body fonts from a slide."""
+    title_fonts: dict[str, int] = {}
+    body_fonts: dict[str, int] = {}
+
+    for shape in slide.shapes:
+        if not shape.has_text_frame:
+            continue
+        top = int(shape.top)
+        for para in shape.text_frame.paragraphs:
+            for run in para.runs:
+                font_name = run.font.name
+                if not font_name:
+                    continue
+                # Title area: top < 1.5M EMU (approx 4.2 cm at 360000 EMU per cm)
+                if top < Emu(1500000):
+                    title_fonts[font_name] = title_fonts.get(font_name, 0) + 1
+                else:
+                    body_fonts[font_name] = body_fonts.get(font_name, 0) + 1
+
+    result: dict[str, str] = {}
+    if title_fonts:
+        result["title_font"] = max(title_fonts, key=lambda k: title_fonts[k])
+    if body_fonts:
+        result["body_font"] = max(body_fonts, key=lambda k: body_fonts[k])
+    # Number font often same as body or Arial; keep it simple
+    result["number_font"] = result.get("body_font", "Arial")
+    return result
+
+
+# ==============================================================================
+# SAFE MARGIN DETECTION
+# ==============================================================================
+
+def _extract_safe_margins(slide) -> dict[str, int]:
+    """Estimate safe margins by looking at leftmost/topmost shapes."""
+    lefts = []
+    tops = []
+    for shape in slide.shapes:
+        try:
+            l = int(shape.left)
+            t = int(shape.top)
+            if l > 0 and l < Emu(2000000):
+                lefts.append(l)
+            if t > 0 and t < Emu(2000000):
+                tops.append(t)
+        except Exception:
+            pass
+
+    margins = {}
+    if lefts:
+        margins["left"] = min(lefts)
+        margins["right"] = min(lefts)
+    if tops:
+        margins["top"] = min(tops)
+    # Bottom margin harder to detect; use default
+    margins["bottom"] = int(Emu(254000))
+    return margins
+
+
+# ==============================================================================
+# BACKGROUND DETECTION
+# ==============================================================================
+
+def _has_background(slide) -> bool:
+    """Check if slide has explicit background shapes or images."""
+    try:
+        if slide.background.fill.type is not None:
+            return True
+    except Exception:
+        pass
+    for shape in slide.shapes:
+        try:
+            if int(shape.left) == 0 and int(shape.top) == 0:
+                if int(shape.width) > Emu(10000000) and int(shape.height) > Emu(5000000):
+                    return True
+        except Exception:
+            pass
+    return False
+
+
+def _has_footer(slide) -> bool:
+    """Check if slide has footer-like text at bottom."""
+    for shape in slide.shapes:
+        if not shape.has_text_frame:
+            continue
+        try:
+            top = int(shape.top)
+            if top > Emu(8000000):
+                text = (shape.text_frame.text or "").strip()
+                if text and any(_matches_any_placeholder(text, c) for c in ("{source}", "{period}", "{page_num}")):
+                    return True
+        except Exception:
+            pass
+    return False
+
+
+# ==============================================================================
+# MAIN PARSER
+# ==============================================================================
+
+def parse_template(path: str) -> TemplateProfile:
+    """Parse a .pptx template file and return a TemplateProfile."""
+    abs_path = os.path.abspath(path)
+    prs = Presentation(abs_path)
+
+    total_slides = len(prs.slides)
+    is_builtin = (Path(__file__).resolve().parent.parent / "assets") in Path(abs_path).resolve().parents
+
+    master_slides: list[MasterSlideInfo] = []
+    placeholder_map: dict[str, list[int]] = {}
+    all_colors: dict[str, dict[str, int]] = {}
+    all_fonts: dict[str, dict[str, int]] = {}
+
+    for idx, slide in enumerate(prs.slides):
+        mtype = _detect_master_type(slide, idx, total_slides)
+        placeholders = _scan_placeholders(slide)
+        content_top = _detect_content_top(slide)
+
+        ms = MasterSlideInfo(
+            slide_index=idx,
+            master_type=mtype,
+            placeholders=placeholders,
+            content_top=content_top,
+            has_footer=_has_footer(slide),
+            has_background=_has_background(slide),
+            shape_count=len(list(slide.shapes)),
+        )
+        master_slides.append(ms)
+
+        # Build placeholder -> master index map
+        for ph in placeholders:
+            canonical = _normalize_placeholder(ph) or ph
+            if canonical not in placeholder_map:
+                placeholder_map[canonical] = []
+            if idx not in placeholder_map[canonical]:
+                placeholder_map[canonical].append(idx)
+
+        # Aggregate colors
+        colors = _extract_colors(slide)
+        for k, v in colors.items():
+            if k not in all_colors:
+                all_colors[k] = {}
+            all_colors[k][v] = all_colors[k].get(v, 0) + 1
+
+        # Aggregate fonts
+        fonts = _extract_fonts(slide)
+        for k, v in fonts.items():
+            if k not in all_fonts:
+                all_fonts[k] = {}
+            all_fonts[k][v] = all_fonts[k].get(v, 0) + 1
+
+    # Determine final detected_theme by voting across master slides
+    detected_theme: dict[str, str] = {}
+    for key, vote in all_colors.items():
+        if vote:
+            detected_theme[key] = max(vote, key=lambda k: vote[k])
+
+    # Determine final detected_fonts by voting
+    detected_fonts: dict[str, str] = {}
+    for key, vote in all_fonts.items():
+        if vote:
+            detected_fonts[key] = max(vote, key=lambda k: vote[k])
+
+    # Safe margins: use first content-like slide or cover
+    safe_margins: dict[str, int] = {}
+    for ms in master_slides:
+        if ms.master_type in ("content", "cover"):
+            slide = prs.slides[ms.slide_index]
+            safe_margins = _extract_safe_margins(slide)
+            break
+    if not safe_margins:
+        safe_margins = {"left": int(Emu(762000)), "right": int(Emu(762000)), "top": int(Emu(254000)), "bottom": int(Emu(254000))}
+
+    # Resolve slide dimensions
+    slide_width = int(prs.slide_width) if prs.slide_width else 16256000
+    slide_height = int(prs.slide_height) if prs.slide_height else 9144000
+
+    return TemplateProfile(
+        path=abs_path,
+        is_builtin=is_builtin,
+        slide_width=slide_width,
+        slide_height=slide_height,
+        master_slides=master_slides,
+        placeholder_map=placeholder_map,
+        detected_theme=detected_theme,
+        detected_fonts=detected_fonts,
+        safe_margins=safe_margins,
+    )
+
+
+def get_builtin_template_profile(report_type: str = "daily") -> TemplateProfile:
+    """Parse a built-in template and return its profile."""
+    base = os.path.join(os.path.dirname(__file__), "..", "assets")
+    template_map = {
+        "daily": os.path.join(base, "report-master.pptx"),
+        "weekly": os.path.join(base, "weekly-master.pptx"),
+        "monthly": os.path.join(base, "monthly-master.pptx"),
+    }
+    path = template_map.get(report_type, template_map["daily"])
+    return parse_template(path)
+
+
+# ==============================================================================
+# DEBUG
+# ==============================================================================
+
+if __name__ == "__main__":
+    import json
+    for rtype in ["daily", "weekly", "monthly"]:
+        profile = get_builtin_template_profile(rtype)
+        print(f"\n=== {rtype.upper()} TEMPLATE PROFILE ===")
+        print(f"  Path: {profile.path}")
+        print(f"  Size: {profile.slide_width} x {profile.slide_height}")
+        print(f"  Masters:")
+        for ms in profile.master_slides:
+            print(f"    [{ms.slide_index}] {ms.master_type}: placeholders={ms.placeholders}, content_top={ms.content_top}")
+        print(f"  Theme: {profile.detected_theme}")
+        print(f"  Fonts: {profile.detected_fonts}")
+        print(f"  Margins: {profile.safe_margins}")
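
The parser above finds `{...}` tokens with a regular expression and maps each one to a canonical name through an alias table, leaving unknown tokens unmapped. The sketch below shows that flow on plain strings, without python-pptx. The alias table is abbreviated, and `ALIAS_TO_CANONICAL`, `scan_placeholders` and `normalize` are illustrative names rather than the commit's own. Inverting the table once is a choice made for this sketch; the diff scans the alias lists on every call instead.

```python
import re
from typing import Dict, List, Optional

PLACEHOLDER_RE = re.compile(r"\{[^{}]+\}")

# Abbreviated alias table for illustration; the table in the diff is larger.
ALIASES: Dict[str, List[str]] = {
    "{report_title}": ["{report_title}", "{标题}", "{title}"],
    "{page_title}": ["{page_title}", "{页面标题}", "{subtitle}"],
    "{date}": ["{date}", "{日期}"],
}

# Invert once so each lookup is a single dict access.
# Assumption: every alias belongs to exactly one canonical name.
ALIAS_TO_CANONICAL: Dict[str, str] = {
    alias.lower(): canonical
    for canonical, aliases in ALIASES.items()
    for alias in aliases
}


def scan_placeholders(text: str) -> List[str]:
    """Return the distinct {xxx} tokens found in text."""
    return sorted(set(PLACEHOLDER_RE.findall(text)))


def normalize(raw: str) -> Optional[str]:
    """Map a raw token to its canonical name, case-insensitively; None if unknown."""
    return ALIAS_TO_CANONICAL.get(raw.lower())


text = "{Title} | {日期} | {custom_field}"
resolved = {raw: normalize(raw) for raw in scan_placeholders(text)}
print(resolved)
```

Here `{Title}` resolves to `{report_title}`, `{日期}` resolves to `{date}`, and `{custom_field}` maps to `None`. Either lookup style only gives a stable answer when no alias is listed under two canonical names.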

+ 112 - 0
generate-data-report-ppt/scripts/theme_manager.py

@@ -96,6 +96,118 @@ def get_theme(preset: ThemePreset, custom_overrides: dict = None) -> ThemeConfig
     return PRESETS.get(preset, PRESETS[ThemePreset.BUSINESS_CLASSIC])
 
 
+def extract_theme_from_template(template_profile) -> ThemeConfig:
+    """
+    Build a ThemeConfig from a TemplateProfile's detected colors and fonts.
+    Adapts to dark templates by ensuring sufficient contrast for charts and text.
+    Falls back to BUSINESS_CLASSIC if no usable colors were detected.
+    """
+    detected = getattr(template_profile, 'detected_theme', {}) or {}
+    detected_fonts = getattr(template_profile, 'detected_fonts', {}) or {}
+    if not detected:
+        return PRESETS[ThemePreset.BUSINESS_CLASSIC].copy()
+
+    def _get(key: str, fallback: str) -> str:
+        return detected.get(key, detected.get(key.replace('_', ''), fallback))
+
+    primary = _get('primary', '#1E3A5F')
+    accent = _get('accent', '#10B981')
+    accent_neg = _get('accent_neg', '#EF4444')
+    text = _get('text', '#333333')
+    text_gray = _get('text_gray', '#666666')
+    bg = detected.get('background', '')
+
+    # Detect if template is dark-themed
+    is_dark = _is_dark_color(primary) or _is_dark_color(bg) or _is_dark_color(detected.get('dark', ''))
+
+    if is_dark:
+        # For dark templates: use bright, high-contrast colors
+        primary = '#38BDF8' if _is_dark_color(primary) else primary  # bright cyan-blue
+        text = '#F8FAFC' if _is_dark_color(text) else text  # near-white
+        text_gray = '#94A3B8' if _is_dark_color(text_gray) else text_gray  # light slate
+        card_bg = '#1E293B'  # dark slate card background
+        gray_bg = '#0F172A'  # very dark background
+        line = '#334155'  # medium slate line
+        series = ['#38BDF8', '#34D399', '#FBBF24', '#F87171', '#A78BFA', '#22D3EE', '#FB923C', '#60A5FA']
+    else:
+        # Light template: derive soft backgrounds from primary
+        card_bg = _lighten_hex(primary, 0.92)
+        gray_bg = _lighten_hex(primary, 0.96)
+        line = _lighten_hex(text, 0.80)
+        series = [primary, accent, accent_neg]
+        if 'accent2' in detected:
+            series.append(detected['accent2'])
+        else:
+            series.append('#ED7D31')
+        series.extend(['#64748B', '#EF4444', '#707070', '#4472C4'])
+        series = series[:8]
+
+    return ThemeConfig(
+        preset=ThemePreset.FROM_TEMPLATE,
+        name='模板提取主题',
+        primary=primary,
+        accent=accent,
+        accent_neg=accent_neg,
+        secondary=_get('secondary', '#64748B'),
+        dark=_get('dark', primary),
+        white='#FFFFFF',
+        gray_bg=gray_bg,
+        card_bg=card_bg,
+        text=text,
+        text_gray=text_gray,
+        line=line,
+        chart_series=series,
+        title_font=detected_fonts.get('title_font', '微软雅黑'),
+        body_font=detected_fonts.get('body_font', '微软雅黑'),
+        number_font=detected_fonts.get('number_font', 'Arial'),
+    )
+
+
+def _is_dark_color(hex_str: str) -> bool:
+    """Check if a hex color is dark (luminance < 120)."""
+    if not hex_str or not isinstance(hex_str, str):
+        return False
+    hex_str = hex_str.lstrip('#')
+    if len(hex_str) != 6:
+        return False
+    try:
+        r = int(hex_str[0:2], 16)
+        g = int(hex_str[2:4], 16)
+        b = int(hex_str[4:6], 16)
+        luminance = 0.299 * r + 0.587 * g + 0.114 * b
+        return luminance < 120
+    except Exception:
+        return False
+
+
+def _lighten_hex(hex_str: str, factor: float) -> str:
+    """Lighten a hex color by mixing with white. factor 0=original, 1=white."""
+    hex_str = hex_str.lstrip('#')
+    if len(hex_str) != 6:
+        return '#F2F2F2'
+    try:
+        r = int(int(hex_str[0:2], 16) * (1 - factor) + 255 * factor)
+        g = int(int(hex_str[2:4], 16) * (1 - factor) + 255 * factor)
+        b = int(int(hex_str[4:6], 16) * (1 - factor) + 255 * factor)
+        return f'#{min(255, r):02X}{min(255, g):02X}{min(255, b):02X}'
+    except Exception:
+        return '#F2F2F2'
+
+
+def merge_theme(template_theme: ThemeConfig, user_theme: ThemeConfig) -> ThemeConfig:
+    """Merge two ThemeConfigs, with user_theme overriding non-empty values."""
+    from dataclasses import fields
+    result = ThemeConfig()
+    for f in fields(ThemeConfig):
+        user_val = getattr(user_theme, f.name, None)
+        template_val = getattr(template_theme, f.name, None)
+        if user_val is not None and user_val != '' and user_val != f.default:
+            setattr(result, f.name, user_val)
+        elif template_val is not None and template_val != '' and template_val != f.default:
+            setattr(result, f.name, template_val)
+    return result
+
+
 def theme_to_rgb_colors(theme: ThemeConfig) -> dict:
     return {
         'primary': _hex_to_rgb(theme.primary),
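
The field-by-field rule in `merge_theme` above is easier to see on a small example. The sketch below applies the same precedence to a hypothetical three-field `MiniTheme`, which stands in for `ThemeConfig` (its real field list lives in `report_config.py` and is not shown in this diff). A user value wins when it is set and differs from the field default, the template value comes next, and the field default is the last resort.

```python
from dataclasses import dataclass, fields


@dataclass
class MiniTheme:
    """Hypothetical three-field stand-in for ThemeConfig."""
    primary: str = "#1E3A5F"
    accent: str = "#10B981"
    body_font: str = ""


def merge(template: MiniTheme, user: MiniTheme) -> MiniTheme:
    """User values override template values; defaults and empty strings count as unset."""
    result = MiniTheme()
    for f in fields(MiniTheme):
        user_val = getattr(user, f.name)
        template_val = getattr(template, f.name)
        if user_val not in (None, "", f.default):
            setattr(result, f.name, user_val)
        elif template_val not in (None, "", f.default):
            setattr(result, f.name, template_val)
    return result


template = MiniTheme(primary="#0B5FFF", accent="#FFB000", body_font="Arial")
user = MiniTheme(accent="#00AA55")
merged = merge(template, user)
print(merged)  # MiniTheme(primary='#0B5FFF', accent='#00AA55', body_font='Arial')
```

One consequence of treating "equals the default" as "unset": a user who explicitly picks a value that happens to match the field default cannot override the template's value for that field.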

BIN
generate-data-report-ppt/五菱报告模板.pptx


BIN
generate-data-report-ppt/海外订单日报_4月数据.xlsx


BIN
五菱报告模板.pptx


Some files were not shown because too many files changed in this diff